string to a dom document in java - java

I have a string that contains XML. I want to load it into a Document in Java so I can later retrieve the attributes within it and store them in other strings. Could someone provide an example of this, please?

More details are needed for a proper answer, but you can start with DocumentBuilder. Note that parse(String) treats its argument as a URI, so wrap the XML string in an InputSource:
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = db.parse(new InputSource(new StringReader(yourString)));
You now have a Document from which you can access your XML attributes etc.

Check out http://www.java-tips.org/java-se-tips/javax.xml.parsers/how-to-read-xml-file-in-java.html. Hopefully it will give you a good starting point.

http://www.java-samples.com/showtutorial.php?tutorialid=152
This is a good example of how to parse out some XML. This should be a good start for you.

Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new ByteArrayInputStream(myXMLString.getBytes()));
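To tie the answers above together, here is a minimal sketch that parses XML held in a String and reads attributes of the root element into plain strings. The class name XmlStringParser, the method names and the sample attribute names are made up for illustration; only the javax.xml.parsers and org.w3c.dom calls are standard.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class XmlStringParser {

    // Parses XML held in a String (not a file path or URL) into a DOM Document.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Covers ParserConfigurationException, SAXException and IOException.
            throw new IllegalArgumentException("Could not parse XML string", e);
        }
    }

    // Reads an attribute of the root element; DOM returns "" when the attribute is absent.
    static String rootAttribute(String xml, String attributeName) {
        Element root = parse(xml).getDocumentElement();
        return root.getAttribute(attributeName);
    }
}
```

For example, XmlStringParser.rootAttribute("<user id=\"42\"/>", "id") should give back the string 42. For attributes on nested elements you would first look the element up, e.g. with getElementsByTagName, and then call getAttribute on it.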

Related

remove elements from XML file in java

I have generated an XML file from an Excel database, and it automatically contains an element called "offset". To make the new file match my needs, I want to remove this element using Java.
Here is the XML content:
<Root><models><id>2</id><modelName>Baseline</modelName><domain_id>2</domain_id><description> desctiption </description><years><Y2013>value1</Y2013><Y2014>value2</Y2014><Y2015>value3</Y2015><Y2016>value4</Y2016><Y2017>value5</Y2017></years><offset/></models></Root>
I wrote code that reads the content with a BufferedReader, writes it to a new file, and uses this condition:
while (fileContent !=null){
fileContent=xmlReader.readLine();
if (!(fileContent.equals("<offset/>"))){
System.out.println("here is the line:"+ fileContent);
XMLFile+=fileContent;
}
}
But it does not work
I would personally recommend using a proper XML parser like Java DOM to check and delete your nodes, rather than dealing with your XML as raw Strings (yuck). Try something like this to remove your 'offset' node.
File xmlFile = new File("your_xml.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xmlFile);
NodeList nList = doc.getElementsByTagName("offset");
// The NodeList is live, so iterate backwards to avoid skipping nodes as they are removed.
for (int i = nList.getLength() - 1; i >= 0; i--) {
    Node node = nList.item(i);
    node.getParentNode().removeChild(node);
}
The above code removes any 'offset' nodes in an xml file.
If resources or speed are a concern (for example, when your_xml.xml is huge), you would be better off using SAX, which is faster (though a little more code-intensive) and doesn't hold the whole XML document in memory.
Once your Document has been edited you'll probably want to convert it to a String to pass to your OutputStream of choice.
Hope this helps.
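Converting the edited Document back to a String is usually done with an identity Transformer. The following is a sketch only: the class name OffsetRemover and its method are hypothetical, and it reads the XML from a String rather than a File so that it stays self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class OffsetRemover {

    // Removes every element with the given tag name and returns the result as a String.
    static String removeElements(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // The NodeList is live, so walk it backwards while removing nodes.
            NodeList matches = doc.getElementsByTagName(tagName);
            for (int i = matches.getLength() - 1; i >= 0; i--) {
                Node node = matches.item(i);
                node.getParentNode().removeChild(node);
            }

            // A Transformer created without a stylesheet copies the DOM tree to text unchanged.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not rewrite XML", e);
        }
    }
}
```

To write to a file or stream instead, pass a StreamResult wrapping a File or an OutputStream to transform().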
Just try to replace the unwanted string with an empty string. It is quick... and seriously dirty. But might solve your problem quickly... just to reappear later on ;-)
fileContent = fileContent.replace("<offset/>", "");
String sXML= "<Root><models><id>2</id><modelName>Baseline</modelName><domain_id>2</domain_id><description> desctiption </description><years><Y2013>value1</Y2013><Y2014>value2</Y2014><Y2015>value3</Y2015><Y2016>value4</Y2016><Y2017>value5</Y2017></years><offset/></models></Root>";
System.out.println(sXML);
String sNewXML = sXML.replace("<offset/>", "");
System.out.println(sNewXML);
String xml = "<Root><models><id>2</id><modelName>Baseline</modelName><domain_id>2</domain_id><description> desctiption </description><years><Y2013>value1</Y2013><Y2014>value2</Y2014><Y2015>value3</Y2015><Y2016>value4</Y2016><Y2017>value5</Y2017></years><offset/></models></Root>";
xml = xml.replaceAll("<offset/>", "");
System.out.println(xml);
In your original code that you included you have:
while (fileContent !=null)
Are you initializing fileContent to some non-null value before that line? If not, the code inside your while block will not run.
But I do agree with the other posters that a simple replaceAll() would be more concise, and a real XML API is better if you want to do anything more sophisticated.
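For reference, the usual shape of a readLine() loop assigns and tests the line in one step, so the null returned at end of input is never dereferenced. This is only a sketch: the class name LineFilter is made up, and it reads from a String instead of a file so that it is self-contained. It also only helps when the unwanted tag sits on a line of its own, which is not the case for the single-line sample in the question; there, replace() or a DOM parser is the way to go.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper class, not part of any library.
class LineFilter {

    // Copies the input line by line, skipping lines that consist only of the unwanted text.
    static String dropLines(String input, String unwantedLine) {
        StringBuilder kept = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(input))) {
            String line;
            // Assign and test together: the loop ends as soon as readLine() returns null.
            while ((line = reader.readLine()) != null) {
                if (!line.trim().equals(unwantedLine)) {
                    kept.append(line).append('\n');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return kept.toString();
    }
}
```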

Extracting elements from an HTTP XML response--HTTP Client & Java

So I've gotten help from here already so I figured why not try it out again!? Any suggestions would be greatly appreciated.
I'm using HTTP client and making a POST request; the response is an XML body that looks like the following:
<?xml version="1.0" encoding="UTF-8"?>
<CartLink
xmlns="http://api.gsicommerce.com/schema/ews/1.0">
<Name>vSisFfYlAPwAAAE_CPBZ3qYh</Name>
<Uri>carts/vSisFfYlAPwAAAE_CPBZ3qYh</Uri>
</CartLink>
Now...
I have an HttpEntity which is
[HttpResponse].getEntity().
Then I get a String representation of the response (which is XML in this case) by saying
String content = EntityUtils.toString(HttpEntity)
I tried following some of the suggestions on this post: How to create a XML object from String in Java? but it did not seem to work for me. When I built up the document it still appeared to be null.
MY END GOAL here is just to get the NAME field.. i.e. the "vSisFfYlAPwAAAE_CPBZ3qYh" part. So do I want to build up a document and then extract it...? Or is there a simpler way? I've been trying different things and I can't seem to get it to work.
Thanks for all of the help guys, it is most appreciated!!
Instead of trying to extract the value with string manipulation, try to use Java's inbuilt ability to parse XML. That's a much better approach. Http Components returns responses in an XML format - there's a reason for that. :)
Here's probably one way to solve your problem:
// Parse the response using DocumentBuilder so you can get at elements easily
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
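// content is the String from EntityUtils.toString(...); parse(String) would treat it as a URI
Document doc = docBuilder.parse(new InputSource(new StringReader(content)));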
Element root = doc.getDocumentElement();
// Now let's say you have not one, but 'n' nodes that contain the value
// you're looking for. Use NodeList to get a list of all those nodes and just
// pull out the tag/attribute's value you want.
NodeList nameNodesList = doc.getElementsByTagName("Name");
ArrayList<String> nameValues = new ArrayList<>();
// Now iterate through the Nodelist to get the values you want.
for (int i=0; i<nameNodesList.getLength(); i++){
nameValues.add(nameNodesList.item(i).getTextContent());
}
The ArrayList "nameValues" will now hold every single value contained within "Name" tags. You could also create a HashMap to store a key value pair of Nodes and their respective text contents.
Hope this helps.
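Since the response in the question declares a default namespace, a namespace-aware lookup is a bit more robust. Here is a sketch for the single Name value the question asks for; the class name CartLinkReader and the method firstName are hypothetical, and the namespace URI is copied from the sample response. As an aside, if "appeared to be null" came from printing the Document, that may be a red herring: the JDK's default DOM implementation typically prints a document as [#document: null] even when parsing succeeded.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of HttpClient or any other library.
class CartLinkReader {

    // Namespace URI taken from the sample response in the question.
    static final String NS = "http://api.gsicommerce.com/schema/ews/1.0";

    // Returns the text of the first <Name> element in that namespace, or null if there is none.
    static String firstName(String content) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // must be set before the builder is created
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(content)));
            NodeList names = doc.getElementsByTagNameNS(NS, "Name");
            return names.getLength() > 0 ? names.item(0).getTextContent() : null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse response body", e);
        }
    }
}
```

You would call it with the string obtained from EntityUtils.toString(...), e.g. String name = CartLinkReader.firstName(content);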

java DOM lookupNamespaceURI is not able to locate namespace URI

I'm trying to follow the UniversalNamespaceResolver example from http://www.ibm.com/developerworks/xml/library/x-nmspccontext/index.html for resolving namespaces during XPath evaluation against an XML document. The problem I encountered is that the lookupNamespaceURI call below returns null for the XML given below:
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document dDoc = builder.parse(new InputSource(new StringReader(xml)));
String nsURI = dDoc.lookupNamespaceURI("h");
the XML:
<?xml version="1.0"?>
<h:root xmlns:h="http://www.w3.org/TR/html4/">
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
</h:root>
while I'd expect it to return "http://www.w3.org/TR/html4/".
When configuring a DocumentBuilder, you have to explicitly make the factory namespace aware before calling newDocumentBuilder() (a silly relic from the first days of XML, when there were no namespaces):
domFactory.setNamespaceAware(true);
As a side note, the advice in that article is not very good. It fundamentally misses the point that you don't care what the namespace prefixes are in the actual document; they are irrelevant. You need the XPath namespace resolver to match the XPath expressions that you are using, and that is all. If you do what they are suggesting, you will have to change your XPath code whenever the document's prefixes change, which is a horrible idea.
Note that they sort of concede this point in their last bullet, but the rest of the article seems to miss that this is the fundamental idea when using XPath.
But if you don't have control over the XML file, and someone can send you any prefixes they wish, it might be better to be independent of their choices. You can code your own namespace resolution as in Example 1 (HardcodedNamespaceResolver), and use them in your XPath expressions.
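A hard-coded resolver along those lines might look like the sketch below. It is an illustration, not the article's code: the class name HardcodedNamespaceXPath, the method firstCell and the prefix "html" are all made up. The point is that the prefix in the XPath expression is chosen by your code and only the namespace URI has to match the document.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class HardcodedNamespaceXPath {

    // Maps the prefix used in OUR XPath expressions to a URI, whatever prefix the document uses.
    static final NamespaceContext CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "html".equals(prefix) ? "http://www.w3.org/TR/html4/" : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return null; // not needed for XPath evaluation
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return Collections.emptyIterator(); // not needed for XPath evaluation
        }
    };

    // Returns the text of the first table cell, independent of the document's own prefix.
    static String firstCell(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(CONTEXT);
            return xpath.evaluate("/html:root/html:table/html:tr/html:td[1]", doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }
}
```

The same expression keeps working if the sender switches the document's prefix from h to anything else, as long as the namespace URI stays the same.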

SAXParseException when &ldquo; is used in XML

I'm getting a "org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 26; The entity "ldquo" was referenced, but not declared." exception when reading an XML document. I'm reading it as follows:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(xmlBody));
Document document = builder.parse(is);
And then there's an exception on builder.parse(is);
From searching I figured that it is necessary to declare some of those new entities externally, unfortunately, I cannot modify the original XML document.
How do I fix this problem?
Thanks
You wrote: "From searching I figured that it is necessary to declare some of those new entities externally, unfortunately, I cannot modify the original XML document."
Well, unless you declare the entity then the document isn't XML and you won't be able to process it using an XML parser.
When you are asked to process input that isn't well-formed XML, the best approach is to fix the process that created the document (the whole idea of using XML for interchange relies on it being well-formed XML). The alternatives are to "repair" the document to turn it into well-formed XML (which you say you can't do), or to forget the fact that it was intended to be XML, and treat it as you would any proprietary non-XML format.
Not a pleasant set of choices - but that's the mess you get into when people pay lip-service to XML but fail to conform to the letter of the standard.
Try
factory.setExpandEntityReferences(false);
This will prevent the parser from trying to expand entities.
EDIT: How about this http://xerces.apache.org/xerces2-j/features.html#dom.create-entity-ref-nodes -- The top of that page has an example of how to set features on the underlying parser. This should cause the parser to create entity-reference DOM nodes instead of trying to expand the entities.
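If "cannot modify the original" only rules out changing the source file, one way to "repair" the input in memory is to prepend an internal DTD subset that declares the missing entities before parsing. This is a sketch under several assumptions: the input has no DOCTYPE of its own, the parser configuration allows DOCTYPE declarations (the JDK default does), only ldquo and rdquo are missing, and the class and method names (EntityPatcher, declareEntities, parse) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class EntityPatcher {

    // Internal DTD subset declaring the entities the document uses but never declares.
    // The DOCTYPE name is only checked by validating parsers, so "root" is a placeholder.
    private static final String DOCTYPE =
            "<!DOCTYPE root [<!ENTITY ldquo \"&#8220;\"><!ENTITY rdquo \"&#8221;\">]>";

    // Inserts the DOCTYPE after the XML declaration if there is one, else at the very start.
    static String declareEntities(String xml) {
        if (xml.startsWith("<?xml")) {
            int end = xml.indexOf("?>") + 2;
            return xml.substring(0, end) + DOCTYPE + xml.substring(end);
        }
        return DOCTYPE + xml;
    }

    // Parses the patched text; the original String is left untouched.
    static Document parse(String xmlBody) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(declareEntities(xmlBody))));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

Any other HTML-style entity that appears in the input (nbsp, mdash and so on) would need its own declaration in the same way.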

how to use Pattern matcher in java?

Let's say the string is <title>xyz</title>
I want to extract the xyz out of the string.
I used:
Pattern titlePattern = Pattern.compile("&lttitle&gt\\s*(.+?)\\s*&lt/title&gt");
Matcher titleMatcher = titlePattern.matcher(line);
String title=titleMatcher.group(1));
but I am getting an error for titlePattern.matcher(line);
You say your error occurs earlier (what is the actual error? It runs without an error for me), but after solving that you will need to call find() on the matcher once to actually search for the pattern:
if(titleMatcher.find()){
String title = titleMatcher.group(1);
}
Note that if you really match against a string with non-escaped HTML entities like
<title>xyz</title>
Then your regular expression will have to use these, not the escaped entities:
"<title>\\s*(.+?)\\s*</title>"
Also, you should be careful about how far you try to get with this, as you can't really parse HTML or XML with regular expressions. If you are working with XML, it's much easier to use an XML parser, e.g. JDOM.
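Put together, the regex variant could look like this minimal sketch. The class name TitleExtractor and the method extract are made up for illustration, and it assumes the input contains literal angle brackets rather than escaped entities.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class TitleExtractor {

    // Compiled once; the lazy group captures the title without surrounding whitespace.
    private static final Pattern TITLE = Pattern.compile("<title>\\s*(.+?)\\s*</title>");

    // Returns the title text, or null when the line contains no <title> element.
    static String extract(String line) {
        Matcher matcher = TITLE.matcher(line);
        // group(1) is only valid after find() (or matches()) has returned true.
        return matcher.find() ? matcher.group(1) : null;
    }
}
```

Calling group(1) without a successful find() first throws an IllegalStateException, which is the most common way this kind of code fails.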
Not technically an answer but you shouldn't be using regular expressions to parse HTML. You can try and you can get away with it for simple tasks but HTML can get ugly. There are a number of Java libraries that can parse HTML/XML just fine. If you're going to be working a lot with HTML/XML it would be worth your time to learn them.
As others have suggested, it's probably not a good idea to parse HTML/XML with regex. You can parse XML Documents with the standard java API, but I don't recommend it. As Fabian Steeg already answered, it's probably better to use JDOM or a similar open source library for parsing XML.
With javax.xml.parsers you can do the following:
String xml = "<title>abc</title>";
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document doc = docBuilder.parse(new InputSource(new StringReader(xml)));
NodeList nodeList = doc.getElementsByTagName("title");
String title = nodeList.item(0).getTextContent();
This parses your XML string into a Document object which you can use for further lookups. The API is kinda horrible though.
Another way is to use XPath for the lookup:
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xPath = xpathFactory.newXPath();
String titleByXpath = xPath.evaluate("/title/text()", new InputSource(new StringReader(xml)));
// or use the Document for lookup
String titleFromDomByXpath = xPath.evaluate("/title/text()", doc);
