DOM Parser wrong childNodes Count

This is strange, but let me try my best to put it across.
I have an XML file which I read from my desktop in the usual way and parse with a DOM parser.
<?xml version="1.0" encoding="UTF-8"?>
<Abase
xmlns="www.abc.com/Events/Abase.xsd">
<FVer>0</FVer>
<DV>abc App</DV>
<DP>abc Wallet</DP>
<Dversion>11</Dversion>
<sigID>Ss22</sigID>
<activity>Adding New cake</activity>
</Abase>
Reading the XML to get the children:
Document doc = docBuilder.parse("C://Users//Desktop//abc.xml");
Node root = doc.getElementsByTagName("Abase").item(0);
NodeList listOfNodes = root.getChildNodes(); //Sysout Prints 13
So here my logic works well. But when I push the same XML to a queue, read it back and get the child nodes, the number of child nodes is 6.
Document doc=docBuilder.parse(new InputSource(new ByteArrayInputStream(msg.getBytes("UTF-8"))));
Node root = doc.getElementsByTagName("Abase").item(0);
NodeList listOfNodes = root.getChildNodes(); //Sysout Prints 6
This breaks my logic for parsing the XML. Can anyone help me out?
UPDATE
Adding the sending logic:
javax.jms.TextMessage tmsg = session.createTextMessage();
tmsg.setText(inp);
sender.send(tmsg);
PROBLEM
If I read this XML from the desktop, the root has 13 children: 6 element nodes and 7 text nodes. The common logic is:
Read all the children and iterate through the list of child items.
If a node IS NOT a text node, enter the if block, create one parent element with two children and append it to the existing ROOT. Then get the node name and the text content of the element node and set them as the text content of the two children respectively.
So I now have a fresh ELEMENT NODE which has two children. As I no longer need the existing element nodes, which are still children of the root, I finally remove them.
This logic breaks completely if I push the XML to the queue and read it back to apply the same logic.
OUTPUT XML, which comes out fine when I read from the desktop; reading from the queue is the problem, because it breaks the complete tree.
<Abase
xmlns="www.abc.com/Events/Abase.xsd">
<Prop>
<propName>FVer</propName>
<propName>0</propName> //similarly for other nodes
</Prop>
</Abase>
Thanks

Well, there are 13 children if whitespace text nodes are included, but only 6 if whitespace text nodes are dropped. So there's some difference in the way the tree has been built between the two cases, that affects whether whitespace text nodes are retained or not.
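To make the 13 versus 6 difference concrete, here is a minimal sketch using the standard JAXP parser. It assumes (this is a guess, not something the question confirms) that the XML read from the queue has lost its line breaks somewhere along the way; the class name ChildCountDemo and the shortened sample documents are made up for illustration. Counting only element children gives the same answer in both cases, so logic that filters on the node type does not depend on the whitespace.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ChildCountDemo {

    // Parses an XML string with the default (non-validating) JAXP parser.
    // Failures are rethrown unchecked to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Counts only element children, ignoring text, comments and other node types.
    static int countElementChildren(Node parent) {
        int count = 0;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: the same elements, with and without line breaks.
        Node pretty = parse("<Abase>\n<FVer>0</FVer>\n<DV>abc App</DV>\n</Abase>").getDocumentElement();
        Node compact = parse("<Abase><FVer>0</FVer><DV>abc App</DV></Abase>").getDocumentElement();

        System.out.println(pretty.getChildNodes().getLength());  // 5: 2 elements + 3 whitespace text nodes
        System.out.println(compact.getChildNodes().getLength()); // 2: elements only
        System.out.println(countElementChildren(pretty));        // 2
        System.out.println(countElementChildren(compact));       // 2
    }
}
```

Iterating with a node-type check like this, instead of relying on getChildNodes().getLength(), keeps the restructuring logic independent of how the document was formatted.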

The document under "Output XML" means that there is something wrong on the sender side. My guess would be that inp isn't a String but some kind of object, and setText(inp) doesn't call inp.toString() but instead triggers some kind of serialization code which produces this odd XML that you're seeing.

Related

Difference between JSoup Element and JSoup Node

Can anyone please explain the difference between the Element object and Node object provided in JSoup ?
Which is the best thing to be used in which situation/condition.

A node is the generic name for any type of object in the DOM hierarchy.
An element is one specific type of node.
The JSoup class model reflects this:
Node
Element
Since Element extends Node anything you can do on a Node, you can do on an Element too. But Element provides additional behaviour which makes it easier to use, for example; an Element has properties such as id and class etc which make it easier to find them in a HTML document.
In most cases using Element (or one of the other subclasses of Node) will meet your needs and will be easier to code to. I suspect the only scenario in which you might need to fall back to Node is if there is a specific node type in the DOM for which JSoup does not provide a subclass of Node.
Here's an example showing the same HTML document inspection using both Node and Element:
String html = "<html><head><title>This is the head</title></head><body><p>This is the body</p></body></html>";
Document doc = Jsoup.parse(html);
Node root = doc.root();
// some content assertions, using Node
assertThat(root.childNodes().size(), is(1));
assertThat(root.childNode(0).childNodes().size(), is(2));
assertThat(root.childNode(0).childNode(0), instanceOf(Element.class));
assertThat(((Element) root.childNode(0).childNode(0)).text(), is("This is the head"));
assertThat(root.childNode(0).childNode(1), instanceOf(Element.class));
assertThat(((Element) root.childNode(0).childNode(1)).text(), is("This is the body"));
// the same content assertions, using Element
Elements head = doc.getElementsByTag("head");
assertThat(head.size(), is(1));
assertThat(head.first().text(), is("This is the head"));
Elements body = doc.getElementsByTag("body");
assertThat(body.size(), is(1));
assertThat(body.first().text(), is("This is the body"));
YMMV but I think the Element form is easier to use and much less error prone.

They seem the same, but they are different.
A Node can be an Element, but it can also be a TextNode.
For example:
<p>A<span>B</span></p>
On the <p> element:
.childNodes() // gets the list of nodes
-> A
-> <span>B</span>
.children() // gets the list of elements
-> <span>B</span>

How to get elements from XPath in Java

I want to get data from an XPath query:
Element location = (Element) doc.query("//location[location_name='"+ locationName +"']/*").get(0).getDocument().getRootElement();
System.out.println(location.toXML());
Element loc = location.getFirstChildElement("location");
System.out.println(loc.getFirstChildElement("location_name").getValue());
However, no matter what I choose, I always get 1 node (because of .get(0)). I don't know how to select the node which was selected by query.
I found that I should cast the node to Element, (XOM getting attribute from Node?) but the link only shows how to select the first node.

Call getParent() on the first element in the result:
Builder parse = new Builder();
Document xml = parse.build("/var/www/JAVA/toForum.xml");
System.out.println(xml.query("//location[@id=83]/*").get(0).getParent().toXML());
Produces the following output:
<location id="83">
<location_name>name</location_name>
<company_name>company a</company_name>
<machines>
<machine id="12">A</machine>
<machine id="312">B</machine>
</machines>
</location>

The call you make to getDocument() is returning the entirety of the XML document.
The call to query() returns a Nodes object directly containing references to the nodes that you are after.
If you change to
Element location = (Element) doc.query(
"//location[location_name='" + locationName + "']").get(0);
System.out.println(location.getFirstChildElement("location_name").getValue());
it should be ok
EDIT (by extraneon)
Some extra explanation not worthy of an answer by itself:
By doing
Element location =
(Element) doc.query("//location[location_name='"
+ locationName +"']/*").get(0)
.getDocument().getRootElement();
you search through the tree and get the requested node. But then you call getDocument().getRootElement() on the element you want, which gives you the uppermost element of the document.
The above query can thus be simplified to:
Element location = (Element)doc.getRootElement();
which is not what you intended.
It's a bit like a bungee jump. You go down to where you need to be (the element) but immediately go back to where you came from (the root element).

It's not clear (at least to me) what actually has to be done. From your query you should get a list of nodes matching the given criteria. You will get a NodeList, and then you can iterate over this NodeList and get the content of each node with getNodeValue, for example.
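The NodeList approach mentioned here belongs to the JDK's own DOM and javax.xml.xpath APIs rather than to XOM, which the question uses. As a rough sketch of that alternative, the following evaluates an XPath expression and collects the text of every match. The class name XPathNodeListDemo, the helper textOfMatches and the sample document are made up for illustration. One detail worth knowing: for element nodes getNodeValue() returns null, so the sketch reads getTextContent() instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathNodeListDemo {

    // Returns the text content of every node matched by the XPath expression.
    static List<String> textOfMatches(String xml, String expression) {
        try {
            NodeList matches = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < matches.getLength(); i++) {
                // getNodeValue() would be null for elements, so use getTextContent().
                result.add(matches.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical document modelled on the output shown in the first answer.
        String xml = "<locations>"
                + "<location id=\"83\"><location_name>name</location_name>"
                + "<company_name>company a</company_name></location>"
                + "<location id=\"84\"><location_name>other</location_name>"
                + "<company_name>company b</company_name></location>"
                + "</locations>";

        System.out.println(textOfMatches(xml, "//location[location_name='name']/company_name"));
        // prints [company a]
    }
}
```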

Problems setting a new node value in java, dom, xml parsing

I have the following code:
DocumentBuilder dBuilder = dbFactory_.newDocumentBuilder();
StringReader reader = new StringReader(s);
InputSource inputSource = new InputSource(reader);
Document doc_ = dBuilder.parse(inputSource);
and then I would like to create a new element in that node right under the root node with this code:
Node node = doc_.createElement("New_Node");
node.setNodeValue("New_Node_value");
doc_.getDocumentElement().appendChild(node);
The problem is that the node gets created and appended, but the value isn't set. I don't know whether I just can't see the value when I look at my XML because it's hidden in some way, but I don't think that's the case, because I tried to get the node value after the create call and it returns null.
I'm new to XML and DOM and I don't know where the value of the new node is stored. Is it like an attribute?
<New_Node value="New_Node_value" />
or does it put value here:
<New_Node> New_Node_value </New_Node>
Any help would be greatly appreciated,
Thanks, Josh

The following code:
Element node = doc_.createElement("New_Node");
node.setTextContent("This is the content"); //adds content
node.setAttribute("attrib", "attrib_value"); //adds an attribute
produces:
<New_Node attrib="attrib_value">This is the content</New_Node>
Hope this clarifies.

For clarification, when you create nodes use:
Attr x = doc.createAttribute(...);
Comment x = doc.createComment(...);
Element x = doc.createElement(...); // as @dogbane pointed out
Text x = doc.createTextNode(...);
instead of using the generic Node for what you get back from each method. It will make your code easier to read/debug.
Secondly, the getNodeValue() / setNodeValue() methods work differently depending on what type of Node you have. See the summary of the Node class for reference. For an Element, you can't use these methods, although for a Text node you can.
As @dogbane pointed out, use setTextContent() for the text between this element's tags. Note that this will destroy any existing child elements.
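The difference between the node types can be seen in a small sketch with the standard JAXP and org.w3c.dom classes. The class name NodeValueDemo and the helper elementWithText are made up for illustration. The DOM specification defines nodeValue as null for elements, and setting it there has no effect, which matches what the question observed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

public class NodeValueDemo {

    // Creates an empty DOM document; failures are rethrown unchecked for brevity.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("Could not create document", e);
        }
    }

    // Creates <name>value</name>; setTextContent adds a single Text child.
    static Element elementWithText(Document doc, String name, String value) {
        Element element = doc.createElement(name);
        element.setTextContent(value);
        return element;
    }

    public static void main(String[] args) {
        Document doc = newDocument();

        // On an Element, setNodeValue is a no-op and getNodeValue stays null.
        Element viaNodeValue = doc.createElement("New_Node");
        viaNodeValue.setNodeValue("New_Node_value");
        System.out.println(viaNodeValue.getNodeValue()); // null

        // On a Text node, the node value is the text itself.
        Text text = doc.createTextNode("before");
        text.setNodeValue("after");
        System.out.println(text.getNodeValue()); // after

        // setTextContent stores the value in a Text child of the element.
        Element viaTextContent = elementWithText(doc, "New_Node", "New_Node_value");
        System.out.println(viaTextContent.getTextContent());              // New_Node_value
        System.out.println(viaTextContent.getFirstChild().getNodeValue()); // New_Node_value
    }
}
```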

Here is another solution. It works in my case because the setTextContent() function does not exist there: I am working with Google Web Toolkit (GWT), a Java development framework, and I imported the XMLParser library so that I can use a DOM parser.
import com.google.gwt.xml.client.XMLParser;
Document doc = XMLParser.createDocument();
Element node = doc.createElement("New_Node");
node.appendChild(doc.createTextNode("value"));
doc.appendChild(node);
The result is:
<New_Node> value </New_Node>

<New_Node value="New_Node_value" />
Here 'value' is an attribute of the New_Node element. To get started with the DOM, I suggest http://www.w3schools.com/htmldom/default.asp

Parsing XML Textlist

I'm trying to parse an XML file. I'm able to parse a normal text node, but how do I parse a textlist? I only get the firstChild of the textlist, and sadly that's all. If I try to do
elem.nextSibling();
it is always null, which can't be right: I know there are two other values left.
Can someone provide an example, maybe?
Thanks!
XML example
<viewentry position="1" unid="7125D090682C3C3EC1257671002F66F4" noteid="962" siblings="65">
<entrydata columnnumber="0" name="Categories">
<textlist>
<text>Lore1</text>
<text>Lore2</text>
</textlist>
</entrydata>
<entrydata columnnumber="1" name="CuttedSubjects">
<text>
LoreImpsum....
</text>
</entrydata>
<entrydata columnnumber="2" name="$35">
<datetime>20091117T094224,57+01</datetime>
</entrydata>
</viewentry>

I assume you're using a DOM parser.
The first child of the <textlist> node is not the first <text> node but rather the raw text that contains the whitespace and carriage return between the end of <textlist> and the beginning of <text>. The output of the following snippet (using org.w3c.dom.* and javax.xml.parsers.*)
Node grandpa = document.getElementsByTagName("textlist").item(0);
Node daddy = grandpa.getFirstChild();
while (daddy != null) {
System.out.println(">>> " + daddy.getNodeName());
Node child = daddy.getFirstChild();
if (child != null)
System.out.println(">>>>>>>> " + child.getTextContent());
daddy = daddy.getNextSibling();
}
shows that <textlist> has five children: the two <text> elements and the three raw text pieces before, between and after them.
>>> #text
>>> text
>>>>>>>> Lore1
>>> #text
>>> text
>>>>>>>> Lore2
>>> #text
When parsing XML this way, it's easy to overlook that the structure of the DOM-tree can be complicated. You can quickly end up iterating over a NodeList in the wrong generation, and then you get nulls where you would expect siblings. This is one of the reasons why people came up with all kinds of xml-to-java stuff, from homegrown XMLHelper classes to XPath expressions to Digester to JAXB, so you need to go down to the DOM level only when you absolutely have to.
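If you do stay at the DOM level, one way to avoid the whitespace text nodes is to walk the siblings and keep only the <text> elements. The following is a minimal sketch of that idea. The class name TextListDemo, the helper textValues and the trimmed sample input are made up for illustration, and it assumes the document contains at least one <textlist>.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TextListDemo {

    // Collects the content of each <text> element directly under the first <textlist>.
    static List<String> textValues(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node textlist = document.getElementsByTagName("textlist").item(0);

            List<String> values = new ArrayList<>();
            for (Node child = textlist.getFirstChild(); child != null; child = child.getNextSibling()) {
                // Skip the whitespace-only #text nodes between the elements.
                if (child.getNodeType() == Node.ELEMENT_NODE && "text".equals(child.getNodeName())) {
                    values.add(child.getTextContent());
                }
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read textlist", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<entrydata columnnumber=\"0\" name=\"Categories\">\n"
                + "<textlist>\n"
                + "<text>Lore1</text>\n"
                + "<text>Lore2</text>\n"
                + "</textlist>\n"
                + "</entrydata>";

        System.out.println(textValues(xml)); // prints [Lore1, Lore2]
    }
}
```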

Java appending XML docs to existing docs

I have two XML docs that I've created and I want to combine these two inside of a new envelope. So I have
<alert-set>
<warning>National Weather Service...</warning>
<start-date>5/19/2009</start-date>
<end-date>5/19/2009</end-date>
</alert-set>
and
<weather-set>
<chance-of-rain type="percent">31</chance-of-rain>
<conditions>Partly Cloudy</conditions>
<temperature type="Fahrenheit">78</temperature>
</weather-set>
What I'd like to do is combine the two inside a root node: <DataSet> combined docs </DataSet>
I've tried creating a temporary doc and replacing children with the root nodes of the documents:
<DataSet>
<blank/>
<blank/>
</DataSet>
And I was hoping to replace the two blanks with the root elements of the two documents but I get "WRONG_DOCUMENT_ERR: A node is used in a different document than the one that created it." I tried adopting and importing the root nodes but I get the same error.
Is there not some easy way of combining documents without having to read through and create new elements for each node?
EDIT: Sample code snippets
Just trying to move one to the "blank" document for now... The importNode and adoptNode functions cannot import/adopt Document nodes, but they should be able to import the element node and its subtree... or if they do, it still does not seem to work for appending/replacing.
Document xmlDoc; //created elsewhere
Document weather = getWeather(latitude, longitude);
Element weatherRoot = weather.getDocumentElement();
Node root = xmlDoc.getDocumentElement();
Node adopt = weather.adoptNode(weatherRoot);
Node imported = weather.importNode(weatherRoot, true);
Node child = root.getFirstChild();
root.replaceChild(adopt, child); //initially tried replacing the <blank/> elements
root.replaceChild(imported, child);
root.appendChild(adopt);
root.appendChild(imported);
root.appendChild(adopt.cloneNode(true));
All of these throw the DOMException: WRONG_DOCUMENT_ERR: A node is used in a different document than the one that created it.
I think I'll have to figure out how to use stax or just reread the documents and create new elements... That kinda seems like too much work just to combine documents, though.

It's a bit tricky, but the following example runs:
public static void main(String[] args) {
DocumentImpl doc1 = new DocumentImpl();
Element root1 = doc1.createElement("root1");
Element node1 = doc1.createElement("node1");
doc1.appendChild(root1);
root1.appendChild(node1);
DocumentImpl doc2 = new DocumentImpl();
Element root2 = doc2.createElement("root2");
Element node2 = doc2.createElement("node2");
doc2.appendChild(root2);
root2.appendChild(node2);
DocumentImpl doc3 = new DocumentImpl();
Element root3 = doc3.createElement("root3");
doc3.appendChild(root3);
// root3.appendChild(root1); // Doesn't work -> DOMException
root3.appendChild(doc3.importNode(root1, true));
// root3.appendChild(root2); // Doesn't work -> DOMException
root3.appendChild(doc3.importNode(root2, true));
}
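The same idea can be sketched with only the standard JAXP classes instead of the Xerces-specific DocumentImpl. The point to notice is that importNode is called on the target document. The snippet in the question calls weather.adoptNode(...) and weather.importNode(...), that is, on the source document, so the resulting nodes still belong to weather, which is consistent with the WRONG_DOCUMENT_ERR it reports. The class name CombineDocsDemo and its helpers are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class CombineDocsDemo {

    // Creates a JAXP builder; failures are rethrown unchecked for brevity.
    static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException("Could not create builder", e);
        }
    }

    static Document parse(String xml) {
        try {
            return newBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Builds <rootName> containing a deep copy of each part's root element.
    static Document combine(String rootName, Document... parts) {
        Document target = newBuilder().newDocument();
        Element root = target.createElement(rootName);
        target.appendChild(root);
        for (Document part : parts) {
            // importNode is called on the TARGET document and returns a copy owned by it.
            root.appendChild(target.importNode(part.getDocumentElement(), true));
        }
        return target;
    }

    public static void main(String[] args) {
        Document alerts = parse("<alert-set><warning>National Weather Service...</warning></alert-set>");
        Document weather = parse("<weather-set><conditions>Partly Cloudy</conditions></weather-set>");

        Element root = combine("DataSet", alerts, weather).getDocumentElement();
        System.out.println(root.getFirstChild().getNodeName()); // alert-set
        System.out.println(root.getLastChild().getNodeName());  // weather-set
    }
}
```

Because importNode copies, the source documents are left unchanged; adoptNode on the target document would move the subtree instead.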

I know you got the issue solved already, but I still wanted to take a stab at this problem using the XOM library that I'm currently testing out (related to this question), and while doing that, offer a different approach than that of Andreas_D's answer.
(To simplify this example, I put your <alert-set> and <weather-set> into separate files, which I read into nu.xom.Document instances.)
import nu.xom.*;
[...]
Builder builder = new Builder();
Document alertDoc = builder.build(new File("src/xomtest", "alertset.xml"));
Document weatherDoc = builder.build(new File("src/xomtest", "weatherset.xml"));
Document mainDoc = builder.build("<DataSet><blank/><blank/></DataSet>", "");
Element root = mainDoc.getRootElement();
root.replaceChild(
root.getFirstChildElement("blank"), alertDoc.getRootElement().copy());
root.replaceChild(
root.getFirstChildElement("blank"), weatherDoc.getRootElement().copy());
The key is to make a copy of the elements to be inserted into mainDoc; otherwise you'll get a complaint that "child already has a parent".
Outputting mainDoc now gives:
<?xml version="1.0" encoding="UTF-8"?>
<DataSet>
<alert-set>
<warning>National Weather Service...</warning>
<start-date>5/19/2009</start-date>
<end-date>5/19/2009</end-date>
</alert-set>
<weather-set>
<chance-of-rain type="percent">31</chance-of-rain>
<conditions>Partly Cloudy</conditions>
<temperature type="Fahrenheit">78</temperature>
</weather-set>
</DataSet>
To my delight, this turned out to be very straight-forward to do with XOM. It only took a few minutes to write this, even though I'm definitely not very experienced with the library yet. (It would have been even easier without the <blank/> elements, i.e., starting with simply <DataSet></DataSet>.)
So, unless you have compelling reasons for using only the standard JDK tools, I warmly recommend trying out XOM as it can make XML handling in Java much more pleasant.
