XML DOM parsing with Java

I'm trying to parse this XML string:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<response type="success">
<lots>
<lot>32342</lot>
<lot>52644</lot>
</lots>
</response>
When I get the root node, "response", and call getChildNodes() on it, I get back a NodeList of length 3. What confuses me is the order in which the NodeList is built. I used some print statements to show what's in the list:
Item length: 3
Item (0): [#text:
]
Item (1): [lots: null]
Item (2): [#text:
]
So a text node comes first (which I assumed was two levels below the root), then the child element of the root, and then another text node.
Is there a particular order and/or reason the list is ordered in this way?

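You are seeing the whitespace text surrounding the child node "lots". The line breaks between the tags are text nodes in their own right and are direct children of "response", so getChildNodes() returns them in document order. They are marked here as {txt1} and {txt2}: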
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<response type="success">
{txt1}<lots>
<lot>32342</lot>
<lot>52644</lot>
</lots>{txt2}
</response>

Actually, I found the problem: I was using '\n' characters in the string, which caused the extra text nodes to be parsed.
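
A minimal sketch of the behaviour described above, using the XML from the question. The class name ChildNodesDemo and the helpers parseRoot and elementChildren are made up for illustration; only the javax.xml.parsers and org.w3c.dom calls are standard API. It shows the three children of "response" and one way to skip the whitespace text nodes by checking the node type.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildNodesDemo {
    // The XML from the question, including the '\n' between tags.
    static final String XML =
          "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n"
        + "<response type=\"success\">\n"
        + "<lots>\n"
        + "<lot>32342</lot>\n"
        + "<lot>52644</lot>\n"
        + "</lots>\n"
        + "</response>";

    // Hypothetical helper: parse a string and return its root element.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: keep only the element children, dropping text nodes.
    static List<Element> elementChildren(Element parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Element root = parseRoot(XML);
        // Whitespace text, <lots>, whitespace text.
        System.out.println("All children: " + root.getChildNodes().getLength());
        for (Element e : elementChildren(root)) {
            System.out.println("Element child: " + e.getTagName());
        }
    }
}
```

Filtering on Node.ELEMENT_NODE keeps the code independent of how the input happens to be indented.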

Related

How to get the element from the xml using Xpath Java

<employees>
<employee>
<firstName>Lokesh</firstName>
<lastName>Gupta</lastName>
<department>
<id>101</id>
<name>IT</name>
</department>
</employee>
</employees>
I want to get the element names using XPath.
I can count the elements with count(//employees/*) and count(//employees/employee/department/*);
these return the number of children of each parent.
I also need the element names. I tried //employees/employee/*/name() to get the names firstName, lastName and department,
and //employees/employee/department/*/name() to get id and name, but this fails with the error javax.xml.transform.TransformerException: Unknown nodetype: name.

You want the element names, not their values. In XPath 1.0, name() is a function rather than a path step, so it has to wrap the path instead of ending it.
Since javax.xml.xpath only supports XPath 1.0, you can use:
concat(name(//employees/employee/*[1]),",",name(//employees/employee/*[2]),",",name(//employees/employee/*[3]))
Output : firstName,lastName,department
concat(name(//employees/employee/department/*[1]),",",name(//employees/employee/department/*[2]))
Output : id,name
If you don't know the number of children of each parent element, use a loop. First, count and store the number of children (count(//employees/employee/*)), then loop over the position index i, evaluating name(//employees/employee/*[i]) and incrementing i on each iteration.
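
The loop approach could be sketched as follows. This is an illustration only: the class ChildNameLister and the method childNames are hypothetical names, and the XML is the sample from the question with whitespace removed. The javax.xml.xpath calls are the standard JDK API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ChildNameLister {
    // Sample document from the question (whitespace stripped for brevity).
    static final String XML = "<employees><employee>"
        + "<firstName>Lokesh</firstName><lastName>Gupta</lastName>"
        + "<department><id>101</id><name>IT</name></department>"
        + "</employee></employees>";

    // Hypothetical helper: names of the child elements under parentPath.
    static List<String> childNames(String xml, String parentPath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Step 1: count the children; XPath numbers come back as Double.
            Double count = (Double) xpath.evaluate(
                "count(" + parentPath + "/*)", doc, XPathConstants.NUMBER);

            // Step 2: ask for the name of the i-th child (XPath positions start at 1).
            List<String> names = new ArrayList<>();
            for (int i = 1; i <= count.intValue(); i++) {
                names.add(xpath.evaluate("name(" + parentPath + "/*[" + i + "])", doc));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childNames(XML, "//employees/employee"));
        System.out.println(childNames(XML, "//employees/employee/department"));
    }
}
```

This sketch assumes the parent path matches a single element, as it does in the sample. With several employee elements the count would cover all of them, so the path would need to pin one parent, for example //employees/employee[1].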

Empty default XML namespace xmlns="" attribute being added?

I have simple code that creates a root element and appends a child to it. The problem is that the child is written out with an empty xmlns="" attribute, which I don't expect. Only the first child is affected; the child at the second nesting level is fine.
So, the following code:
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.newDocument();
Element rootEl = doc.createElementNS("http://someNamespace.ru", "metamodel");
doc.appendChild(rootEl);
Element groupsEl = doc.createElement("groups");
// This appends with xmlns=""
rootEl.appendChild(groupsEl);
Element groupEl = doc.createElement("group");
// This appends normally
groupsEl.appendChild(groupEl);
results in this output:
<?xml version="1.0" encoding="UTF-8"?>
<metamodel xmlns="http://someNamespace.ru">
<groups xmlns="">
<group/>
</groups>
</metamodel>
Instead of -
<?xml version="1.0" encoding="UTF-8"?>
<metamodel xmlns="http://someNamespace.ru">
<groups>
<group/>
</groups>
</metamodel>
Note that, as I said above, the <group> tag has no xmlns attribute.

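Your desired markup shows all elements in the default namespace. To achieve this, you have to create every element in that namespace.
The actual output has <groups xmlns=""> because groups and its group child element were created in no namespace, so the serializer has to undeclare the default namespace on groups. The group element needs no attribute of its own because it inherits that undeclaration from its parent: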
Element groupsEl = doc.createElement("groups");
Change this to
Element groupsEl = doc.createElementNS("http://someNamespace.ru", "groups");
Similarly, change
Element groupEl = doc.createElement("group");
to
Element groupEl = doc.createElementNS("http://someNamespace.ru","group");
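
Putting the corrected calls together gives something like the sketch below. The class NamespaceDemo and its build and serialize helpers are hypothetical; serialization is assumed to go through the JDK's built-in Transformer, which the question does not show.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NamespaceDemo {
    static final String NS = "http://someNamespace.ru";

    // Build the document with every element created in the same namespace.
    static Document build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element rootEl = doc.createElementNS(NS, "metamodel");
            doc.appendChild(rootEl);
            Element groupsEl = doc.createElementNS(NS, "groups");
            rootEl.appendChild(groupsEl);
            Element groupEl = doc.createElementNS(NS, "group");
            groupsEl.appendChild(groupEl);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Assumed serialization step: write the DOM to a string without the XML declaration.
    static String serialize(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // No element is in "no namespace", so no xmlns="" should be needed.
        System.out.println(serialize(build()));
    }
}
```

A small constant such as NS avoids mistyping the namespace URI on one of the calls, which would bring the unwanted redeclaration back.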

How to get only first level nodes with Jsoup

I have following XML tree and I'm using Jsoup to parse it.
<?xml version="1.0" encoding="UTF-8" ?>
<nodes>
<node>
<name>NODE 1</name>
<value1>
<value1>NODE 1 VALUE 1</value1>
</value1>
<nodes>
<node>
<name>NODE 1 CHILD</name>
<value1>NODE 1 CHILD VALUE 1</value1>
</node>
</nodes>
</node>
<node>
<name>NODE 2</name>
<value1>NODE 2 VALUE 1</value1>
</node>
</nodes>
However, when I try to get only the first-level node elements, it returns all of them, including the nested ones. That is correct behaviour, because the nested elements clearly also match my query:
Elements elements = data.select("nodes > node");
Is there any way to get just first level node-elements without adding additional level information to XML data?

You can do something like this:
Elements elements = data.select("nodes").first().select("> node");
This will work as well:
Elements elements = data.select("> nodes > node");
The second form only works if you parsed the XML with Jsoup.parse(xml, "", Parser.xmlParser()) and the XML is structured as in your question, with <nodes> as the root element.

Java appending an element to XML document

I am trying to append an element to my xml document so it looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<students>
</students>
However, it ends up looking like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<students/>
This is the code I am using:
// results is the new XML document I created using DocumentBuilder.newDocument();
Element root = results.createElement("students");
results.appendChild(root);
How come it isn't looking like how I want it to?

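Java's DOM implementation follows the XML specification, which states: "An element with no content is said to be empty" (https://www.w3.org/TR/REC-xml/#sec-starttags). An empty element can be written either as a start-tag immediately followed by an end-tag or as an empty-element tag, so <students/> and <students></students> are equivalent. Your desired output has a line break between the tags, which means the element would need a text node as content.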
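
A rough illustration of the difference, assuming the document is written out with the JDK's built-in Transformer (the question does not show its serialization code). EmptyElementDemo, build and serialize are hypothetical names; the variable names results and root follow the question.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class EmptyElementDemo {
    // Build <students>, optionally giving it a line break as text content.
    static Document build(boolean withNewline) {
        try {
            Document results = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = results.createElement("students");
            results.appendChild(root);
            if (withNewline) {
                // With content, the element is no longer empty.
                root.appendChild(results.createTextNode("\n"));
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Assumed serialization step, omitting the XML declaration for brevity.
    static String serialize(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(build(false))); // empty element
        System.out.println(serialize(build(true)));  // element containing a line break
    }
}
```

Any XML parser treats the two empty forms identically, so adding the text node only matters if the exact layout of the file is important to a human reader or a non-XML tool.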

Finding all valid xpath from xml

I am trying to write a Java program that finds all the XPaths for a given XML document. I found an XPath generator on the internet, but it does not work when an element repeats multiple times. For example, given the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<Report>
<Name>
<FirstName>A</FirstName>
<LastName>B</LastName>
<MiddleName>C</MiddleName>
</Name>
<Name>
<FirstName>D</FirstName>
<LastName>E</LastName>
<MiddleName>S</MiddleName>
</Name>
</Report>
it produces the XPath
/Report/Name/FirstName for both FirstName nodes,
but the expected result is /Report/Name[1]/FirstName and /Report/Name[2]/FirstName.
Any ideas?

I think you may have to do this yourself.
Using a SAX parser will make it straightforward. Just maintain a stack of the elements you encounter and, for each open element, a count of the child names seen so far, so you can increment the indexes (/Report/Name[1], /Report/Name[2]) easily.
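
One possible shape for that SAX handler is sketched below. XPathLister and listPaths are hypothetical names, and the design (one map of sibling counts per open element) is just one way to keep the indexes. For simplicity it indexes every step, including the root.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class XPathLister extends DefaultHandler {
    // Path steps from the root down to the current element, e.g. "Name[2]".
    private final Deque<String> steps = new ArrayDeque<>();
    // For each open element, how many children of each name have been seen so far.
    private final Deque<Map<String, Integer>> siblingCounts = new ArrayDeque<>();
    final List<String> paths = new ArrayList<>();

    @Override
    public void startDocument() {
        siblingCounts.push(new HashMap<>()); // counts for the document level
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Increment the counter for this name among its siblings.
        int index = siblingCounts.peek().merge(qName, 1, Integer::sum);
        steps.addLast(qName + "[" + index + "]");
        siblingCounts.push(new HashMap<>()); // fresh counters for this element's children
        paths.add("/" + String.join("/", steps));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        steps.removeLast();
        siblingCounts.pop();
    }

    // Hypothetical helper: parse the XML string and return one indexed path per element.
    static List<String> listPaths(String xml) {
        try {
            XPathLister handler = new XPathLister();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.paths;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Report>"
            + "<Name><FirstName>A</FirstName><LastName>B</LastName></Name>"
            + "<Name><FirstName>D</FirstName><LastName>E</LastName></Name>"
            + "</Report>";
        listPaths(xml).forEach(System.out::println);
    }
}
```

For the sample in the question this yields paths such as /Report[1]/Name[1]/FirstName[1] and /Report[1]/Name[2]/FirstName[1]. Dropping the [1] where a name occurs only once would need a second pass, since the handler cannot know about later siblings when it first sees an element.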
