remove elements from XML file in java


I have generated an XML file from an Excel database, and it automatically contains an element called "offset". To make the new file match my needs, I want to remove this element using Java.
Here is the XML content:
<Root><models><id>2</id><modelName>Baseline</modelName><domain_id>2</domain_id><description> desctiption </description><years><Y2013>value1</Y2013><Y2014>value2</Y2014><Y2015>value3</Y2015><Y2016>value4</Y2016><Y2017>value5</Y2017></years><offset/></models></Root>
I wrote code that reads the content with a BufferedReader and writes it to a new file, using the following if condition:
while (fileContent !=null){
fileContent=xmlReader.readLine();
if (!(fileContent.equals("<offset/>"))){
System.out.println("here is the line:"+ fileContent);
XMLFile+=fileContent;
}
}
But it does not work

I would personally recommend using a proper XML parser like Java DOM to check and delete your nodes, rather than dealing with your XML as raw Strings (yuck). Try something like this to remove your 'offset' node.
File xmlFile = new File("your_xml.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xmlFile);
NodeList nList = doc.getElementsByTagName("offset");
// The NodeList is live: iterate backwards so removals do not shift unvisited items
for (int i = nList.getLength() - 1; i >= 0; i--) {
Node node = nList.item(i);
node.getParentNode().removeChild(node);
}
The above code removes any 'offset' nodes in an xml file.
If resources or speed are a concern (for example, when your_xml.xml is huge), you would be better off using SAX, which is faster (though a little more code-intensive) and doesn't hold the whole document in memory.
Once your Document has been edited, you'll probably want to serialize it to a String or write it to your OutputStream of choice.
Hope this helps.
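For completeness, here is a minimal sketch of the whole round trip: parse, remove the elements, and serialize the edited Document back to a String. The class and method names (OffsetRemover, removeElements) are made up for illustration, and the input is a shortened version of the XML from the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
final class OffsetRemover {

    /** Parses xml, removes every element named tagName, and returns the result as a String. */
    static String removeElements(String xml, String tagName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // getElementsByTagName returns a live list, so walk it backwards while removing.
            NodeList matches = doc.getElementsByTagName(tagName);
            for (int i = matches.getLength() - 1; i >= 0; i--) {
                Node node = matches.item(i);
                node.getParentNode().removeChild(node);
            }

            // An identity Transformer serializes the DOM tree back to text.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Root><models><id>2</id><offset/></models></Root>";
        System.out.println(removeElements(xml, "offset"));
    }
}
```

To write to a file instead, you could pass a File or an OutputStream to the StreamResult constructor rather than a StringWriter.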

Just try to replace the unwanted string with an empty string. It is quick... and seriously dirty. But might solve your problem quickly... just to reappear later on ;-)
fileContent = fileContent.replace("<offset/>", "");

String sXML= "<Root><models><id>2</id><modelName>Baseline</modelName><domain_id>2</domain_id><description> desctiption </description><years><Y2013>value1</Y2013><Y2014>value2</Y2014><Y2015>value3</Y2015><Y2016>value4</Y2016><Y2017>value5</Y2017></years><offset/></models></Root>";
System.out.println(sXML);
String sNewXML = sXML.replace("<offset/>", "");
System.out.println(sNewXML);

String xml = "<Root><models><id>2</id><modelName>Baseline</modelName><domain_id>2</domain_id><description> desctiption </description><years><Y2013>value1</Y2013><Y2014>value2</Y2014><Y2015>value3</Y2015><Y2016>value4</Y2016><Y2017>value5</Y2017></years><offset/></models></Root>";
xml = xml.replaceAll("<offset/>", "");
System.out.println(xml);

In your original code that you included you have:
while (fileContent !=null)
Are you initializing fileContent to some non-null value before that line? If not, the code inside your while block will not run.
But I do agree with the other posters that a simple replaceAll() would be more concise, and a real XML API is better if you want to do anything more sophisticated.
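A rough sketch of how the read loop could look with both points applied. Note that the XML in the question is all on one line, so a whole line will never equal "<offset/>"; that is why the sketch removes the text inside each line instead of comparing whole lines. LineFilter and copyWithout are names I made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper class, not part of any library.
final class LineFilter {

    /** Copies every line from reader, with each occurrence of unwanted removed. */
    static String copyWithout(BufferedReader reader, String unwanted) {
        StringBuilder result = new StringBuilder();
        try {
            String line;
            // Read first, then test for null, so the loop body never sees a null line.
            while ((line = reader.readLine()) != null) {
                result.append(line.replace(unwanted, "")).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        BufferedReader reader = new BufferedReader(
                new StringReader("<Root><models><id>2</id><offset/></models></Root>"));
        System.out.print(copyWithout(reader, "<offset/>"));
    }
}
```

This still treats XML as plain text, so it will miss variants such as <offset /> or <offset></offset>.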

Related

Extracting elements from an HTTP XML response--HTTP Client & Java

So I've gotten help from here already so I figured why not try it out again!? Any suggestions would be greatly appreciated.
I'm using HTTP client and making a POST request; the response is an XML body that looks like the following:
<?xml version="1.0" encoding="UTF-8"?>
<CartLink
xmlns="http://api.gsicommerce.com/schema/ews/1.0">
<Name>vSisFfYlAPwAAAE_CPBZ3qYh</Name>
<Uri>carts/vSisFfYlAPwAAAE_CPBZ3qYh</Uri>
</CartLink>
Now...
I have an HttpEntity which is
[HttpResponse].getEntity().
Then I get a String representation of the response (which is XML in this case) by saying
String content = EntityUtils.toString(HttpEntity)
I tried following some of the suggestions on this post: How to create a XML object from String in Java? but it did not seem to work for me. When I built up the document it still appeared to be null.
MY END GOAL here is just to get the NAME field.. i.e. the "vSisFfYlAPwAAAE_CPBZ3qYh" part. So do I want to build up a document and then extract it...? Or is there a simpler way? I've been trying different things and I can't seem to get it to work.
Thanks for all of the help guys, it is most appreciated!!

Instead of trying to extract the value with string manipulation, try to use Java's inbuilt ability to parse XML. That's a much better approach. Http Components returns responses in an XML format - there's a reason for that. :)
Here's probably one way to solve your problem:
// Parse the response using DocumentBuilder so you can get at elements easily
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
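// 'content' is the response body String from the question; parse(String) would treat it as a URI
Document doc = docBuilder.parse(new InputSource(new StringReader(content)));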
Element root = doc.getDocumentElement();
// Now let's say you have not one, but 'n' nodes that contain the value
// you're looking for. Use NodeList to get a list of all those nodes and just
// pull out the tag/attribute's value you want.
NodeList nameNodesList = doc.getElementsByTagName("Name");
ArrayList<String> nameValues = new ArrayList<>();
// Now iterate through the Nodelist to get the values you want.
for (int i=0; i<nameNodesList.getLength(); i++){
nameValues.add(nameNodesList.item(i).getTextContent());
}
The ArrayList "nameValues" will now hold every single value contained within "Name" tags. You could also create a HashMap to store a key value pair of Nodes and their respective text contents.
Hope this helps.
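If you only need the single Name value, something along these lines might be enough. It is a sketch that assumes the response body is already in a String (as in the question) and that the namespace URI is the one shown in the sample response; CartLinkParser and extractName are invented names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of HttpClient or any other library.
final class CartLinkParser {

    // Namespace URI copied from the sample response in the question.
    private static final String NS = "http://api.gsicommerce.com/schema/ews/1.0";

    /** Returns the text of the first Name element, or null if there is none. */
    static String extractName(String content) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // the response declares a default namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(content)));
            NodeList names = doc.getElementsByTagNameNS(NS, "Name");
            return names.getLength() > 0 ? names.item(0).getTextContent() : null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse response", e);
        }
    }

    public static void main(String[] args) {
        String content = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<CartLink xmlns=\"http://api.gsicommerce.com/schema/ews/1.0\">"
                + "<Name>vSisFfYlAPwAAAE_CPBZ3qYh</Name>"
                + "<Uri>carts/vSisFfYlAPwAAAE_CPBZ3qYh</Uri>"
                + "</CartLink>";
        System.out.println(extractName(content));
    }
}
```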

How to convert a String to an XML object in Java

I get a SOAP message from a web service, and I can convert the response string to an XML file using the below code. This works fine. But my requirement is not to write the SOAP message to a file. I just need to keep this XML document object in memory, and extract some elements to be used in further processing. However, if I just try to access the document object below, it comes as empty.
Can somebody please tell me how I can convert a String to an in-memory XML object (without having to write to a file)?
String xmlString = new String(data);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
try
{
builder = factory.newDocumentBuilder();
// Use String reader
Document document = builder.parse( new InputSource(
new StringReader( xmlString ) ) );
TransformerFactory tranFactory = TransformerFactory.newInstance();
Transformer aTransformer = tranFactory.newTransformer();
Source src = new DOMSource( document );
Result dest = new StreamResult( new File( "xmlFileName.xml" ) );
aTransformer.transform( src, dest );
}

Remove the 5 last lines of code, and you'll just have the DOM document in memory. Store this document in some field, rather than in a local variable.
If that isn't sufficient, then please explain, with code, what you mean with "if I just try to access the document object below, it comes as empty".

JB Nizet is right, the first steps create a DOM out of xmlString. That will load your xmlString (or SOAP message) into an in-memory Document. What the following steps are doing (all the things related with the Transform) is to serialize the DOM to a file (xmlFileName.xml), which is not what you want to do, right?
When you said that your DOM is empty, I think you tried to print out the content of your DOM with document.toString(), and it returned something like "[#document: null]". This doesn't mean your DOM is empty. Your DOM actually contains data. You now need to use the DOM API to get access to the nodes inside your document. Try something like document.getChildNodes(), document.getElementsByTagName(), etc.
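As a small illustration of that last point, the sketch below parses a String into an in-memory Document and reads from it through the DOM API instead of toString(). The XML snippet and the names InMemoryXml and parse are made up for the example; a real SOAP message would have its own element names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
final class InMemoryXml {

    /** Parses the given XML text into a DOM Document held entirely in memory. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document document = parse("<order><id>42</id></order>");
        // Ask the DOM for what you need rather than printing the Document itself.
        System.out.println(document.getDocumentElement().getNodeName());
        System.out.println(document.getElementsByTagName("id").item(0).getTextContent());
    }
}
```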

Can we modify the XML file, like deleting,updating the nodes using SAX parser at run-time?

I am new to xml parsing, can we update or modify like deleting the node of the XML using SAX parser at run-time, and stream the updated data as XML, or transform it to our own format if we require? As the DOM parser does.
dbFactory = DocumentBuilderFactory.newInstance();
docBuilder = dbFactory.newDocumentBuilder();
document = docBuilder.parse("src/"+xmlFile);
tranformerFactory = TransformerFactory.newInstance();
transformer = tranformerFactory.newTransformer();
for (int i = 0; i < inputElementsArrayToRemove.length; i++)
{
element = (Element)document.getElementsByTagName(inputElementsArrayToRemove[i]).item(0);
if(element != null)
{
// Removes the node from the document
element.getParentNode().removeChild(element);
}
}
// Normalize the DOM tree to combine all adjacent nodes
document.normalize();
// Here, transforming(Converting) the document source to the another XML file
Source source = new DOMSource(document);
Result dest = new StreamResult(new FileOutputStream(new File("src/"+resultXmlFileName)));
// transform method to write out the DOM as XML data.
transformer.transform(source, dest);

Yes, using the XMLReader and XMLFilter APIs. See this question and answer for an example.
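Since the linked example is not reproduced here, this is a rough sketch of the idea: a filter built on XMLFilterImpl swallows the events for the unwanted element, and an identity Transformer turns the remaining SAX events back into XML. ElementDroppingFilter and strip are names I invented, and the sketch only filters elements and their text, not comments.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter class: drops every element with the given name, including its content.
final class ElementDroppingFilter extends XMLFilterImpl {

    private final String unwanted;
    private int skipDepth = 0; // greater than 0 while inside an unwanted element

    ElementDroppingFilter(XMLReader parent, String unwanted) {
        super(parent);
        this.unwanted = unwanted;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (skipDepth > 0 || unwanted.equals(localName) || unwanted.equals(qName)) {
            skipDepth++; // swallow this start tag
        } else {
            super.startElement(uri, localName, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (skipDepth > 0) {
            skipDepth--; // swallow the matching end tag
        } else {
            super.endElement(uri, localName, qName);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) {
            super.characters(ch, start, length);
        }
    }

    /** Streams xml through the filter and returns the filtered document as a String. */
    static String strip(String xml, String unwanted) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader filter = new ElementDroppingFilter(factory.newSAXParser().getXMLReader(), unwanted);
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not filter XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(strip("<Root><models><id>2</id><offset/></models></Root>", "offset"));
    }
}
```

Because nothing is collected into a tree, memory use should stay roughly constant regardless of document size, which is the main reason to prefer this over DOM for large files.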

We can read or write XML using a SAX parser. To edit or update it, we need to use a DOM parser.

You cannot. A SAX parser produces a stream of events and feeds them to your handler.
The handler does whatever it likes with them.
A DOM parser produces a node tree, and you can modify it and save it back as XML
(but it also consumes memory).
An XPP parser allows you to pull XML events out of the stream one by one, and you can
wrap it with your own parser and modify events on the fly. I would speculate it comes closest
to your task.
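To give an idea of what the pull style looks like, here is a sketch that swaps in StAX (javax.xml.stream), the pull API that ships with the JDK, rather than XPP itself. It copies events from a reader to a writer and skips the ones belonging to the unwanted element. StaxElementRemover and strip are invented names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class built on StAX, used here as a stand-in for XPP.
final class StaxElementRemover {

    /** Copies xml event by event, leaving out every element named unwanted and its content. */
    static String strip(String xml, String unwanted) {
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);

            int skipDepth = 0; // greater than 0 while inside an unwanted element
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement() && (skipDepth > 0
                        || event.asStartElement().getName().getLocalPart().equals(unwanted))) {
                    skipDepth++;
                } else if (event.isEndElement() && skipDepth > 0) {
                    skipDepth--;
                } else if (skipDepth == 0) {
                    writer.add(event); // pass everything else through unchanged
                }
            }
            writer.flush();
            writer.close();
            reader.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not filter XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(strip("<Root><models><id>2</id><offset/></models></Root>", "offset"));
    }
}
```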

Nope. SAX is an event-based parser which tells us when it encounters tokens. It is used for reading. Can we write using it?

Parsing an XML file without root in Java

I have this XML file which doesn't have a root node. Other than manually adding a "fake" root element, is there any way I would be able to parse an XML file in Java? Thanks.

I suppose you could create a new implementation of InputStream that wraps the one you'll be parsing from. This implementation would return the bytes of the opening root tag before the bytes from the wrapped stream and the bytes of the closing root tag afterwards. That would be fairly simple to do.
I may be faced with this problem too. Legacy code, eh?
Ian.
Edit: You could also look at java.io.SequenceInputStream which allows you to append streams to one another. You would need to put your prefix and suffix in byte arrays and wrap them in ByteArrayInputStreams but it's all fairly straightforward.

Your XML document needs a root xml element to be considered well formed. Without this you will not be able to parse it with an xml parser.

One way is to provide your own dummy wrapper document without touching the original (not well-formed) 'xml', by pulling it in as an external entity:
Syntax
<!DOCTYPE some_root_elem SYSTEM "/home/ego/some.dtd"
[
<!ENTITY entity-name "Some value to be inserted at the entity">
]>
Example:
<!DOCTYPE dummy [
<!ENTITY data SYSTEM "http://wherever-my-data-is">
]>
<dummy>
&data;
</dummy>

You could use another parser like Jsoup. It can parse XML without a root.

I think even if any API would have an option for this, it will only return you the first node of the "XML" which will look like a root and discard the rest.
So the answer is probably to do it yourself. Scanner or StringTokenizer might do the trick.
Maybe some html parsers might help, they are usually less strict.

Here's what I did:
There's an old java.io.SequenceInputStream class, which is so old that it takes Enumeration rather than List or such.
With it, you can prepend and append the root element tags (<div> and </div> in my case) around your no-root XML stream. (You shouldn't do it by concatenating Strings due to performance and memory reasons.)
public void tryExtractHighestHeader(ParserContext context)
{
String xhtmlString = context.getBody();
if (xhtmlString == null || "".equals(xhtmlString))
return;
// The XHTML needs to be wrapped, because it has no root element.
ByteArrayInputStream divStart = new ByteArrayInputStream("<div>".getBytes(StandardCharsets.UTF_8));
ByteArrayInputStream divEnd = new ByteArrayInputStream("</div>".getBytes(StandardCharsets.UTF_8));
ByteArrayInputStream is = new ByteArrayInputStream(xhtmlString.getBytes(StandardCharsets.UTF_8));
// Collections.enumeration is in the JDK, so no extra library is needed for the Enumeration.
Enumeration<InputStream> streams = Collections.enumeration(Arrays.<InputStream>asList(divStart, is, divEnd));
try (SequenceInputStream wrapped = new SequenceInputStream(streams);) {
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(wrapped);
// From here you can do whatever you like, but keep in mind the extra element.
XPath xPath = XPathFactory.newInstance().newXPath();
}
catch (Exception e) {
throw new RuntimeException("Failed parsing XML: " + e.getMessage(), e);
}
}

Java change and move non-standard XML file

I am using a third party application and would like to change one of its files. The file is stored in XML but with an invalid doctype.
When I try to read it using a DocumentBuilder, it errors out because the doctype contains "file:///ReportWiz.dtd"
(as shown, with quotes), and I get an exception saying the file cannot be found. Is there a way to tell the DocumentBuilder to ignore this? I have tried setValidating(false) and setNamespaceAware(false) on the DocumentBuilderFactory.
The only solutions I can think of are
copy file line by line into a new file, omitting the offending line, doing what i need to do, then copying into another new file and inserting the offending line back in, or
doing mostly the same above but working with a FileStream of some sort (though I am not clear on how I could do this..help?)
DocumentBuilderFactory docFactory = DocumentBuilderFactory
.newInstance();
docFactory.setValidating(false);
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.parse(file);

Tell your DocumentBuilderFactory to ignore the DTD declaration like this:
docFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
See here for a list of available features.
You also might find JDOM a lot easier to work with than org.w3c.dom:
org.jdom.input.SAXBuilder builder = new SAXBuilder();
builder.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
org.jdom.Document doc = builder.build(file);

Handle resolution of the DTD manually, either by returning a copy of the DTD file (loaded from the classpath) or by returning an empty one. You can do this by setting an entity resolver on your document builder:
EntityResolver er = new EntityResolver() {
@Override
public InputSource resolveEntity(String publicId, String systemId)
throws SAXException, IOException {
if ("file:///ReportWiz.dtd".equals(systemId)) {
System.out.println(systemId);
InputStream zeroData = new ByteArrayInputStream(new byte[0]);
return new InputSource(zeroData);
}
return null;
}
};
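To show where such a resolver plugs in, here is a minimal sketch that sets it on the DocumentBuilder before parsing. It hands back an empty stream for any system ID ending in ".dtd" instead of matching the exact path, which is my own simplification. The Report element in the sample is made up, as are the names DtdSkippingParser and parseIgnoringDtd.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
final class DtdSkippingParser {

    /** Parses xml without trying to load the external DTD named in its DOCTYPE. */
    static Document parseIgnoringDtd(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Answer DTD lookups with an empty document; returning null keeps the default behaviour.
            builder.setEntityResolver((publicId, systemId) ->
                    systemId != null && systemId.endsWith(".dtd")
                            ? new InputSource(new StringReader(""))
                            : null);
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE Report SYSTEM \"file:///ReportWiz.dtd\">"
                + "<Report><title>demo</title></Report>";
        System.out.println(parseIgnoringDtd(xml).getDocumentElement().getNodeName());
    }
}
```

The DOCTYPE line itself stays in the parsed Document, so it should still be available if you serialize the file again afterwards.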

My first thought was dealing with it as a stream. You could make a new adapter at some level and just copy input to output except for the offending text.
If the file is shortish (under half a gig or so) you could also read the entire thing into a byte array and make your modifications there, then create a new stream from the byte array into your builder.
That's the advantage of the amazingly bulky way Java handles streams, you actually have a lot of flexibility.

Another thing I was debating was storing the whole file in a String, then doing my manipulations and writing the String out to a file. None of these seem clean or easy, but what is the best way to do this?

If you do not want to assume the parser is Xerces and want a generic solution, see this.
