Finding Element in NodeList XML - java

Is there a way I can get the first Element from a NodeList? I'm using org.w3c.dom to handle XML files. I have already written large parts of my program using org.w3c.dom and only recently discovered dom4j, which has a method for this, but I cannot use it because of backward-compatibility issues with my other methods.
It is critical that I can find and pass the very first Element in my XML, and it must be of type org.w3c.dom.Element. However, not even doc.normalize(); has helped, and neither did using dom4j methods to find the element and casting it to org.w3c.dom.Element, as that is not allowed.
File file = new File("myXML.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(file);
doc.normalize();
Element sourceElem = doc.getDocumentElement();
NodeList nodeList = sourceElem.getChildNodes();
Element elem = null;
if (nodeList.item(0).getNodeType() == Node.ELEMENT_NODE) {
    elem = (Element) nodeList.item(0);
}
I'm getting a NullPointerException from other methods because it can't find the element.
I need it to work the same way this C# code does:
XmlDocument doc = new XmlDocument();
doc.Load("myXML.xml");
XmlElement elem = (XmlElement)doc.DocumentElement.ChildNodes[0];
EDIT: Or is there a way I can cast a dom4j Element back into an org.w3c.dom.Element?
EDIT2: Sample XML I need to access: http://pastebin.com/C3nvxhwx
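One possible explanation, assuming the XML is indented: getChildNodes() returns text nodes as well as elements, so item(0) is frequently the whitespace between the root tag and its first child, and the ELEMENT_NODE check then never succeeds. A minimal sketch of walking the siblings until an Element turns up is shown here. The class and method names are hypothetical, and the sample XML is made up rather than taken from the pastebin link.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative, not from the question.
final class FirstChildElement {

    // Returns the first child of parent that is an Element, or null if there is none.
    static Element firstChildElement(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        return null;
    }

    // Parses an XML string and returns the tag name of the root's first child element.
    static String firstChildName(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Element first = firstChildElement(doc.getDocumentElement());
            return first == null ? null : first.getTagName();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The newline and indentation after <root> form a text node that precedes <first>.
        System.out.println(firstChildName("<root>\n  <first/>\n  <second/>\n</root>")); // prints "first"
    }
}
```

The returned object is a plain org.w3c.dom.Element, so no dom4j conversion is involved.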

Related

How to skip certain elements when parsing XML file to DOM with Java

I am attempting to parse some XML documents into DOM so that I can run XPath queries against them. My code is in Java, and I have been using the Xerces org.apache.xerces.parsers.DOMParser implementation.
I am only interested in certain portions of the XML, under element elementICareAbout and can ignore other elements.
<top>
<elementICareAbout>...</elementICareAbout>
<elementToIgnore>...</elementToIgnore>
</top>
The XML file size can be quite large, and I would not like to have to hold onto elements in memory which I would not need as part of the processing, where I would expect an XPath query to /top/elementICareAbout to return data, but /top/elementToIgnore would just return nothing (as I don't need it to).
Looking over the Xerces DOMParser and the JAXP APIs, I don't see any way to explicitly ignore certain elements so that they are not part of the in-memory DOM tree after parsing.
Is there a good way to construct a partial DOM Document from an XML file tailored to the parts that I need?
You could write a SAX filter and insert it into the processing pipeline between the (SAX) parser and the document builder. Or with rather less coding you could write an XSLT 3.0 streaming transformation. Or you could write an XQuery to select the parts of the document you want, and run it using a query processor that supports document projection. It all depends how wedded you are to Java/DOM coding - my preference would be for higher-level languages than that.
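The SAX filter option could look roughly like the sketch below. It is only an illustration under a few assumptions: the class name SkipElementFilter and its helper methods are made up, elements are matched by their qualified name, and only element and character events are filtered (comments or processing instructions inside a skipped subtree are not handled). An identity Transformer is used to turn the filtered SAX events into a DOM.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: drops every element named skipName together with its subtree.
final class SkipElementFilter extends XMLFilterImpl {
    private final String skipName;
    private int skipDepth = 0; // greater than 0 while inside a skipped subtree

    SkipElementFilter(XMLReader parent, String skipName) {
        super(parent);
        this.skipName = skipName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Assumption: the underlying parser reports qualified names (the JDK parser does).
        if (skipDepth > 0 || qName.equals(skipName)) {
            skipDepth++;
            return;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (skipDepth > 0) {
            skipDepth--;
            return;
        }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) {
            super.characters(ch, start, length);
        }
    }

    // Builds a DOM from xml, leaving out all elements named skipName.
    static Document parseWithout(String xml, String skipName) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            SAXSource source = new SAXSource(new SkipElementFilter(reader, skipName),
                    new InputSource(new StringReader(xml)));
            DOMResult result = new DOMResult();
            // An identity transform copies the filtered SAX events into a DOM tree.
            TransformerFactory.newInstance().newTransformer().transform(source, result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build filtered DOM", e);
        }
    }

    // Convenience for trying it out: counts elements named countName in the filtered DOM.
    static int countAfterFiltering(String xml, String skipName, String countName) {
        return parseWithout(xml, skipName).getElementsByTagName(countName).getLength();
    }
}
```

Because the skipped events never reach the DOM builder, the ignored subtrees are never materialised in memory, and an XPath query for /top/elementToIgnore against the resulting Document simply finds nothing.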
You can also get the elements by tag name.
For example, suppose you have an XML file called Question.xml.
Question.xml
In the Java file, you can do the following:
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
// responseString holds the XML text to parse
InputSource is = new InputSource(new StringReader(responseString));
Document doc = dBuilder.parse(is);
doc.getDocumentElement().normalize();
NodeList nList = doc.getElementsByTagName("Question");
//get all lessons stored
for (int temp = 0; temp < nList.getLength(); temp++) {
    Node nNode = nList.item(temp);
    if (nNode.getNodeType() == Node.ELEMENT_NODE) {
        Element eElement = (Element) nNode;
        //Looking through elements by tagname
        String q1 = eElement.getElementsByTagName("q1").item(0).getTextContent();
    }
}

XPath Java count of child nodes

I want to count some child nodes of a given xml. But it always returns me 0 and I can't figure out why.
Here's the xml:
<FirstOne xmlns:xxx="http://www.w3.org/2001/XMLSchema-instance">
<Formulas xmlns:d2p1="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
<xxx:yyy>
<aa:bb>something</aa:bb>
<cc:dd>something</cc:dd>
</xxx:yyy>
<xxx:yyy>
<aa:bb>something</aa:bb>
<cc:dd>something</cc:dd>
</xxx:yyy>
<xxx:yyy>
<aa:bb>something</aa:bb>
<cc:dd>something</cc:dd>
</xxx:yyy>
</Formulas>
</FirstOne>
I want to count the number of "xxx:yyy". In this example 3.
I tried the following:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new FileInputStream(new File(fileArray[i].toString())));
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
String expression;
expression = "count(//Formulas/xxx:yyy)";
Double result = (Double) xpath.evaluate(expression, doc, XPathConstants.NUMBER);
It always gives me 0.0 ...
Thanks for your help!
The problems all stem from the namespaces.
Firstly, XPath evaluation is only defined over namespace-well-formed XML, so you need to ensure that the aa and cc prefixes are properly mapped to namespace URIs in the XML.
Secondly, you need to parse the XML into a DOM tree using a namespace-aware parser (for what I can only assume are historical reasons, DocumentBuilderFactory is not namespace-aware by default).
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new FileInputStream(new File(fileArray[i].toString())));
Now you have a proper namespace-well-formed DOM tree you need to handle the namespaces correctly in the XPath. You need to define a NamespaceContext telling the XPath how to relate prefixes and namespace URIs. Annoyingly there's no default implementation of this interface available in the core Java libraries but there are third-party implementations such as Spring's SimpleNamespaceContext, or it's only three methods to implement it yourself. With a SimpleNamespaceContext:
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
SimpleNamespaceContext nsCtx = new SimpleNamespaceContext();
xpath.setNamespaceContext(nsCtx);
nsCtx.bindNamespaceUri("x", "http://www.w3.org/2001/XMLSchema-instance");
With this context in place you can now select namespaced nodes in your XPath expression:
String expression = "count(//Formulas/x:yyy)";
(the prefixes you use are the ones in the NamespaceContext, not necessarily the ones in the original XML source).
While some DOM parsers and XPath implementations might let you get away with parsing non-namespace-aware and omitting the prefixes in the XPath expressions, this is an implementation detail and the behaviour is not defined by the specifications. It might work in one version but fail in another, or behave differently if you add additional JARs to your project that change the default parser, etc.
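For the "implement it yourself" route, a bare-bones NamespaceContext might look like the following sketch. It binds a single prefix only and skips parts of the interface contract (null checks, the reserved xml and xmlns prefixes), so treat it as an illustration rather than a complete implementation. The class name SinglePrefixContext and the trimmed-down sample XML are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical minimal NamespaceContext that knows exactly one prefix/URI pair.
final class SinglePrefixContext implements NamespaceContext {
    private final String prefix;
    private final String uri;

    SinglePrefixContext(String prefix, String uri) {
        this.prefix = prefix;
        this.uri = uri;
    }

    @Override
    public String getNamespaceURI(String p) {
        return prefix.equals(p) ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String namespaceUri) {
        return uri.equals(namespaceUri) ? prefix : null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceUri) {
        String p = getPrefix(namespaceUri);
        return p == null ? Collections.<String>emptyIterator() : Collections.singleton(p).iterator();
    }

    // Counts the yyy elements in the XMLSchema-instance namespace under Formulas.
    static double countYyy(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for namespace-qualified XPath matching
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The prefix "x" is local to the XPath; it need not match the prefix in the XML.
            xpath.setNamespaceContext(
                    new SinglePrefixContext("x", "http://www.w3.org/2001/XMLSchema-instance"));
            return (Double) xpath.evaluate("count(//Formulas/x:yyy)", doc, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<FirstOne xmlns:xxx=\"http://www.w3.org/2001/XMLSchema-instance\">"
                + "<Formulas><xxx:yyy/><xxx:yyy/><xxx:yyy/></Formulas></FirstOne>";
        System.out.println(countYyy(xml)); // prints 3.0
    }
}
```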
Since xxx is only the tag prefix, try using just count(//Formulas/yyy).

How to append xml nodes (as a string) into an existing XML Element node (only using java builtins)?

(Disclaimer: using Rhino inside RingoJS)
Let's say I have a document with an element. I don't see how I can append nodes, given as a string, to this element. In order to parse the string into XML nodes and then append them to the element, I tried to use a DocumentFragment, but I couldn't get anywhere. In short, I need something as easy as .NET's InnerXml, but it's not in the Java API.
var dbFactory = javax.xml.parsers.DocumentBuilderFactory.newInstance();
var dBuilder = dbFactory.newDocumentBuilder();
var doc = dBuilder.newDocument();
var el = doc.createElement('test');
var nodesToAppend = '<foo bar="1">Hi <baz>there</baz></foo>';
el.appendChild(???);
How can I do this without using any third party library ?
[EDIT] It's not obvious in the example but I'm not supposed to know the content of variable 'nodesToAppend'. So please, don't point me to tutorials about how to create elements in an xml document.
You can do this in Java; you should be able to derive the Rhino equivalent:
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.newDocument();
Element el = doc.createElement("test");
doc.appendChild(el);
String xml = "<foo bar=\"1\">Hi <baz>there</baz></foo>";
// Parse the string into a second, temporary document (needs java.nio.charset.StandardCharsets)
Document doc2 = dBuilder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
// Copy the parsed root element into doc so that it can be appended there
Node node = doc.importNode(doc2.getDocumentElement(), true);
el.appendChild(node);
Since doc and doc2 are two different Documents, the trick is to import the node from one document into the other, which is done with the importNode API above.
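If the string may hold several top-level nodes (or leading text), parsing it directly fails because it is not a well-formed document on its own. One way around that, sketched here with only JDK classes, is to wrap the string in a throwaway element before parsing and then import each child. The names XmlFragments, appendXml and the wrapper element are hypothetical, and the sketch assumes the fragment has no XML declaration and does not rely on namespace prefixes declared elsewhere.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper approximating an "inner XML" append with JDK classes only.
final class XmlFragments {

    private static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException("No DOM parser available", e);
        }
    }

    // Parses fragment (possibly several top-level nodes) and appends the nodes to target.
    static void appendXml(Element target, String fragment) {
        try {
            // A throwaway wrapper element turns the fragment into a well-formed document.
            String wrapped = "<wrapper>" + fragment + "</wrapper>";
            Document parsed = newBuilder().parse(new InputSource(new StringReader(wrapped)));
            Document owner = target.getOwnerDocument();
            NodeList children = parsed.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                // importNode makes a copy owned by the target's document; parsed is left unchanged.
                target.appendChild(owner.importNode(children.item(i), true));
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Fragment is not well-formed XML", e);
        }
    }

    // Creates <test>, appends the fragment, and lists the resulting child node names.
    static String childNamesAfterAppend(String fragment) {
        Document doc = newBuilder().newDocument();
        Element el = doc.createElement("test");
        doc.appendChild(el);
        appendXml(el, fragment);
        StringBuilder names = new StringBuilder();
        for (Node n = el.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (names.length() > 0) {
                names.append(',');
            }
            names.append(n.getNodeName());
        }
        return names.toString();
    }

    public static void main(String[] args) {
        System.out.println(childNamesAfterAppend("<foo bar=\"1\">Hi <baz>there</baz></foo><qux/>"));
    }
}
```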
I think your question is similar to this one, which already has an answer:
Java: How to read and write xml files?
Or see this link: http://www.mkyong.com/java/how-to-create-xml-file-in-java-dom/

XML DOM createchild takes Node instead of Element

I am creating an XML document using DOM as below, following online examples:
DocumentBuilderFactory docfac= DocumentBuilderFactory.newInstance();
DocumentBuilder docb= docFactory.newDocumentBuilder();
Document doc = docb.newDocument();
// root
Element rootElement = (Element)doc.createElement("TEST");
doc.appendChild(rootElement); //Compiler error
...
appendChild takes a Node object, not an Element object. I tried to use Node instead, but it seems there are no methods exposed on it to set an attribute, so I can't really use Node.
Any help would be really appreciated.
Thanks.
Please verify the imports: they should be import org.w3c.dom.Document; and import org.w3c.dom.Element;. Also change docFactory.newDocumentBuilder(); to docfac.newDocumentBuilder();, since the factory variable is named docfac.
There is no need to cast to org.w3c.dom.Element, because doc.createElement("TEST") already returns an org.w3c.dom.Element, which is a sub-interface of org.w3c.dom.Node.
org.w3c.dom.Element rootElement = doc.createElement("TEST");
doc.appendChild(rootElement);
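To make the Element/Node relationship concrete, here is a small self-contained sketch with the org.w3c.dom imports spelled out. The class name and the attribute name and value are made up for illustration.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical demo class; only the org.w3c.dom types matter here.
final class BuildDocumentDemo {

    // Builds <TEST id="1"/> and reads the attribute back through the document element.
    static String rootAttribute() {
        try {
            DocumentBuilderFactory docfac = DocumentBuilderFactory.newInstance();
            DocumentBuilder docb = docfac.newDocumentBuilder();
            Document doc = docb.newDocument();
            Element rootElement = doc.createElement("TEST");
            rootElement.setAttribute("id", "1"); // setAttribute is declared on Element, not on Node
            doc.appendChild(rootElement);        // accepted because Element extends Node
            Node asNode = doc.getDocumentElement();
            return ((Element) asNode).getAttribute("id");
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM parser available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootAttribute()); // prints 1
    }
}
```

If appendChild still refuses an Element with these imports in place, a different Element class (for example from another XML library) is probably being imported instead.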

Always get null when querying XML with XPath

I am using the following code to query some XML with XPath I get from a stream.
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(false);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(inputStream);
inputStream.close();
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("//FOO_ELEMENT");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(nodes.item(i).getNodeValue());
}
I have checked the stream for content by converting it to a string - and it's all there - so it's not as if there is no data in the stream.
This is just annoying me now - as I have tried various different bits of code and I still keep getting 'null' being printed at the "System.out.println" line - what am I missing here?
NOTE: I want to see the text inside the element.
In addition to what Brabster suggested, you may want to try
System.out.println(nodes.item(i).getTextContent());
or
System.out.println(nodes.item(i).getNodeName());
depending on what you're intending to display.
See http://java.sun.com/javase/6/docs/api/org/w3c/dom/Node.html
Not an expert in the Java XPath impl tbh, but this might help.
The javadocs say that the result of getNodeValue() will be null for most types of node, including Element nodes.
It's not totally clear what you expect to see in the output; element name, attributes, text? I'll guess text. In any XPath impl I have used, if you want the text content of the node, you have to XPath to
//FOO_ELEMENT/text()
Then the node's value is the text content of the node.
The getTextContent() method will return the text content of the node you've selected with the XPath, and of any descendant nodes, as per the javadoc. The solution above selects exactly the text nodes of any FOO_ELEMENT elements in the document.
Java EE Docs for Node <-- old docs, see comments for current docs.
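The difference between the three approaches can be checked with a short sketch like this one. The class name NodeValueDemo and the sample XML are assumptions for the example. It evaluates an XPath against a small document and returns either getNodeValue() or getTextContent() of the first match.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo comparing getNodeValue() and getTextContent() on XPath results.
final class NodeValueDemo {

    // Returns the first node matched by expression, or null if nothing matches.
    static Node first(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    static String nodeValue(String xml, String expression) {
        return first(xml, expression).getNodeValue();
    }

    static String textContent(String xml, String expression) {
        return first(xml, expression).getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<root><FOO_ELEMENT>hello <b>world</b></FOO_ELEMENT></root>";
        // An Element has no node value, so this prints "null".
        System.out.println(nodeValue(xml, "//FOO_ELEMENT"));
        // getTextContent() concatenates the text of the element and its descendants.
        System.out.println(textContent(xml, "//FOO_ELEMENT"));
        // text() selects the element's own text node, whose value is its text.
        System.out.println(nodeValue(xml, "//FOO_ELEMENT/text()"));
    }
}
```

Note that with nested markup the two working options differ: text() only reaches the element's direct text nodes, while getTextContent() also includes text from child elements.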
