Java DOM XML Parsing How to walk through multiple node levels - java

I have the following xml structure
<clinic>
<category>
<employees>
<medic>
<medic_details>
<medic_name />
<medic_address />
</medic_details>
<pacients>
<pacient>
<pacient_details>
<pacient_name> ...
<pacient_address> ...
</pacient_details>
<diagnostic>
<disease>
<disease_name>Disease</disease_name>
<treatment>Treatment</treatment>
</disease>
<disease>
<disease_name>Disease</disease_name>
<treatment>Treatment</treatment>
</disease>
</diagnostic>
</pacient>
</pacients>
</medic>
</employees>
</category>
</clinic>
I have a JTextArea where I want to show information from the XML file. For example, to show each medic with their name, address, and the pacients they treat (with their respective names), I have the following code:
NodeList medicNList = doc.getElementsByTagName("medic");
for (int temp = 0; temp < medicNList.getLength(); temp++) {
    Node medicNode = medicNList.item(temp);
    Element eElement = (Element) medicNode;
    area.append("\n");
    area.append("Medic Name : " + getTagValue("medic_name", eElement) + "\n");
    area.append("Medic Address : " + getTagValue("medic_address", eElement) + "\n");
    area.append("\n");
    area.append("Pacients : \n");
    area.append("Pacient Name : " + getTagValue("pacient_name", eElement) + "\n");
    area.append("Pacient Address : " + getTagValue("pacient_address", eElement) + "\n");
}
My question is: if I want to have more than one disease per pacient, how do I display all of the diseases for each pacient? I don't know how to "walk" to the diagnostic node for each pacient and show the relevant data inside it.

Your code is incorrect as it stands. Each medic can have multiple pacient (patient) elements, so you should be iterating over the list of patients for each medic.
Then iterate over the diseases for each patient. You need to call getElementsByTagName at each nesting level of the XML, on the current element rather than on the document. You also need to skip over the pluralised wrapper elements such as <pacients>.
I would suggest you use an XPath library instead, as it can make the code a lot easier to read. There are plenty of good ones out there; I would recommend jaxen.
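The nested iteration described above could look roughly like the following. This is a minimal sketch using only the JDK's DOM API; the class name ClinicWalker and the helpers text, describe and parse are made up for illustration, and it builds a String instead of appending to the JTextArea.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ClinicWalker {

    // Text of the first descendant with the given tag, or "" if there is none.
    static String text(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? "" : list.item(0).getTextContent().trim();
    }

    // One loop per nesting level: medic -> pacient -> disease.
    static String describe(Document doc) {
        StringBuilder out = new StringBuilder();
        NodeList medics = doc.getElementsByTagName("medic");
        for (int m = 0; m < medics.getLength(); m++) {
            Element medic = (Element) medics.item(m);
            out.append("Medic Name : ").append(text(medic, "medic_name")).append('\n');

            // Search only inside the current medic, not the whole document.
            NodeList pacients = medic.getElementsByTagName("pacient");
            for (int p = 0; p < pacients.getLength(); p++) {
                Element pacient = (Element) pacients.item(p);
                out.append("  Pacient Name : ").append(text(pacient, "pacient_name")).append('\n');

                // Search only inside the current pacient.
                NodeList diseases = pacient.getElementsByTagName("disease");
                for (int d = 0; d < diseases.getLength(); d++) {
                    Element disease = (Element) diseases.item(d);
                    out.append("    Disease : ").append(text(disease, "disease_name"))
                       .append(", Treatment : ").append(text(disease, "treatment")).append('\n');
                }
            }
        }
        return out.toString();
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<clinic><medic><medic_details><medic_name>Ana</medic_name></medic_details>"
                + "<pacients><pacient><pacient_details><pacient_name>Bob</pacient_name></pacient_details>"
                + "<diagnostic><disease><disease_name>Flu</disease_name><treatment>Rest</treatment></disease>"
                + "<disease><disease_name>Cold</disease_name><treatment>Tea</treatment></disease>"
                + "</diagnostic></pacient></pacients></medic></clinic>";
        System.out.print(describe(parse(xml)));
    }
}
```

Because getElementsByTagName searches all descendants of the element it is called on, the <pacients> and <diagnostic> wrappers do not need to be visited explicitly here.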

I would give htmlcleaner a try.
HtmlCleaner is a Java library used to safely parse and transform any HTML found on the web to well-formed XML. It is designed to be small, fast, flexible and independent. HtmlCleaner may be used in Java code, as a command line tool or as an Ant task. The result of parsing is a lightweight document object model which can easily be transformed to standards like DOM or JDOM, or serialized to XML output in various ways (compact, pretty printed and so on).
You can use XPath with HtmlCleaner to get the contents of XML tags. Here is a nice
example: Xpath Example

Related

How to extract xml tag value without using the tag name in java?

I am using Java. I have an XML file which looks like this:
<?xml version="1.0"?>
<personaldetails>
<phno>1553294232</phno>
<email>
<official>xya@gmail.com</official>
<personal>bk@yahoo.com</personal>
</email>
</personaldetails>
Now I need to check each of the tag values for its type using specific conditions, and put them in separate files.
For example, for the above file I write conditions like: 10 digits means a phone number,
something in the format xxx@yy.com is an email.
So what I need to do is extract the value in each tag; if it matches a certain condition it is put in the first text file, and if not, in the second text file.
In that case, the first text file will contain:
1553294232
xya@gmail.com
bk@yahoo.com
and the rest of the values go in the second file.
I just don't know how to extract the tag values without using the tag name (that is, without using getElementsByTagName).
I mean this code should extract the email bk@yahoo.com even if I use a <mailing> tag instead of <personal>. It should not depend on the tag name.
I hope I am not being confusing. I am new to using XML in Java, so pardon me if my question is silly.
Please help.
Seems like a typical use case for XPath
XPath allows you to query XML in a very flexible way.
This tutorial could help:
http://www.javabeat.net/2009/03/how-to-query-xml-using-xpath/
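As a rough illustration of the XPath approach, the expression //*[not(*)] selects every element that has no child elements, whatever it is called. The sketch below uses the JDK's javax.xml.xpath API; the class name LeafValues and the method leafValues are invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LeafValues {

    // Returns the trimmed text of every element without child elements,
    // independent of the element's tag name.
    static List<String> leafValues(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList leaves = (NodeList) xpath.evaluate(
                "//*[not(*)]", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < leaves.getLength(); i++) {
            values.add(leaves.item(i).getTextContent().trim());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<personaldetails><phno>1553294232</phno>"
                + "<email><mailing>bk@yahoo.com</mailing></email></personaldetails>";
        System.out.println(leafValues(xml));
    }
}
```

Each returned value can then be checked against the conditions from the question without ever naming a tag.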
If you're using JavaScript, which could be the case since you mention getElementsByTagName(), you could just use jQuery selectors. They give you consistent behavior across browsers, and the jQuery library is useful for a lot of other things if you are not using it already: http://api.jquery.com/category/selectors/
Here for example is information on this:
http://www.switchonthecode.com/tutorials/xml-parsing-with-jquery
Since you don't know your element names, I would suggest creating a DOM tree and iterating through it. Whenever you reach a text value, try to match it against your ruleset (I would suggest using regex for this purpose) and then write it to the appropriate file.
This is a sample structure to help you get started; you will need to modify it based on your requirements:
public void parseXML() {
    try {
        DocumentBuilder documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = documentBuilder.parse(new File("test.xml"));
        getData(null, doc.getDocumentElement());
    } catch (Exception exe) {
        exe.printStackTrace();
    }
}

private void getData(Node parentNode, Node node) {
    switch (node.getNodeType()) {
        case Node.ELEMENT_NODE: {
            // Recurse into every child, passing this element as the parent
            if (node.hasChildNodes()) {
                NodeList list = node.getChildNodes();
                int size = list.getLength();
                for (int index = 0; index < size; index++) {
                    getData(node, list.item(index));
                }
            }
            break;
        }
        case Node.TEXT_NODE: {
            String data = node.getNodeValue();
            // Skip whitespace-only text between elements
            if (data.trim().length() > 0) {
                /*
                 * Here you need to check the data against your ruleset and perform your operation
                 */
                System.out.println(parentNode.getNodeName() + " :: " + node.getNodeValue());
            }
            break;
        }
    }
}
You might want to look at the Chain of Responsibility design pattern to design your ruleset.
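A ruleset of this kind can be sketched as an ordered list of regular expressions. This is a simpler stand-in for a full Chain of Responsibility, and it assumes e-mail addresses use the usual @ separator. The class name ValueClassifier, the rule names and the patterns themselves are illustrative assumptions, not a complete validation of phone numbers or e-mail addresses.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class ValueClassifier {

    // Hypothetical rules, tried in insertion order; the first match wins.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("phone", Pattern.compile("\\d{10}"));
        RULES.put("email", Pattern.compile("[^@\\s]+@[^@\\s]+\\.[A-Za-z]{2,}"));
    }

    // Returns the name of the first rule that matches the whole value, or "other".
    static String classify(String value) {
        String trimmed = value.trim();
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            if (rule.getValue().matcher(trimmed).matches()) {
                return rule.getKey();
            }
        }
        return "other";
    }

    public static void main(String[] args) {
        for (String value : new String[] {"1553294232", "bk@yahoo.com", "hello"}) {
            System.out.println(value + " -> " + classify(value));
        }
    }
}
```

In the traversal, classify would be called on each non-empty text value, and its result would decide which output file receives the value.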

extracting xml node(not text but complete xml ) and with other test nodes from xml file using SAX parser in java

I have to read from large XML files of about 500MB each. A batch typically processes 500 such files in each run. I have to extract text nodes from them and, at the same time, extract whole XML nodes. I used XPath with DOM in Java for ease of use, but that doesn't work due to memory issues, as I have limited resources.
I intend to use SAX or StAX in Java now. The text nodes can be easily extracted, but I don't know how to extract XML nodes using SAX.
a sample:
<?xml version="1.0"?>
<Library>
<Book name = "ABC">
<Author>John</Author>
<PrintingCompanyDT><Printer>Sam</Printer><Printmachine>Laser</Printmachine>
<AssocPrint>Oreilly</AssocPrint> </PrintingCompanyDT>
</Book>
<Book name = "123">
<Author>Mason</Author>
<PrintingCompanyDT><Printer>kelly</Printer><Printmachine>DOTPrint</Printmachine>
<AssocPrint>Oxford</AssocPrint> </PrintingCompanyDT>
</Book>
</Library>
The expected result:
1)Book: ABC:
Author:John
PrintCompany Detail XML:
<PrintingCompanyDT>
<Printer>Sam</Printer>
<Printmachine>Laser</Printmachine>
<AssocPrint>Oreilly</AssocPrint>
</PrintingCompanyDT>
2) Book: 123
Author : Mason
PrintCompany Detail XML:
<PrintingCompanyDT>
<Printer>kelly</Printer>
<Printmachine>DOTPrint</Printmachine>
<AssocPrint>Oxford</AssocPrint>
</PrintingCompanyDT>
If I try the regular way of appending characters in the public void characters(char ch[], int start, int length) method,
I get the following:
1)Book: ABC:
Author:John
PrintCompany Detail XML :
Sam
Laser
Oreilly
that is, only the text content and the whitespace, without the tags.
Can somebody suggest how to extract an XML node as it is from an XML file with a SAX or StAX parser in Java?
I'd be tempted to use XOM for this sort of task rather than SAX or StAX directly. XOM is a tree-based representation similar to DOM or JDOM but it has support for processing XML "twigs" in a kind of semi-streaming fashion, ideal for your kind of case where you have many similar elements that can be processed independently of one another. Also every Node has a toXML method that prints the node as XML.
import java.io.File;
import nu.xom.*;

public class LibraryProcessor extends NodeFactory {
    private Nodes empty = new Nodes();
    private int bookNum = 0;

    /** Called for each closing tag in the XML */
    public Nodes finishMakingElement(Element element) {
        if ("Book".equals(element.getLocalName())) {
            bookNum++;
            // process the complete Book element ...
            processBook(element);
            // ... and throw it away
            return empty;
        } else {
            // process other elements (except Book) in the normal way
            return super.finishMakingElement(element);
        }
    }

    private void processBook(Element book) {
        System.out.println(bookNum + ": " +
            book.getAttributeValue("name"));
        System.out.println("Author: " +
            book.getFirstChildElement("Author").getValue());
        System.out.println("PrintCompany Detail XML: " +
            book.getFirstChildElement("PrintingCompanyDT").toXML());
    }

    public static void main(String[] args) throws Exception {
        Builder builder = new Builder(new LibraryProcessor());
        builder.build(new File(args[0]));
    }
}
This will work its way through the XML document, calling processBook once for each Book element in turn. Within processBook you have access to the whole Book XML tree as XOM nodes, but without having to load the entire file into memory in one go - the best of both worlds. The "Factories, Filters, Subclassing, and Streaming" section of the XOM tutorial has more detail on this technique.
This example just shows the most basic bits of the XOM API, but it also provides powerful XPath support if you need to do more complex processing. For example, you can directly access the Printmachine element within processBook using
Element machine = (Element)book.query("PrintingCompanyDT/Printmachine").get(0);
or if the structure is not so regular, for example if PrintingCompanyDT is sometimes a direct child of Book and sometimes deeper (e.g. a grandchild) then you can use a query like
Element printingCompanyDT = (Element)book.query(".//PrintingCompanyDT").get(0);
(// being the XPath notation for finding descendants at any level, as opposed to / which looks only for direct children).
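If adding XOM is not an option, the markup can also be rebuilt by hand in a plain SAX handler: while inside the element of interest, append the tags in startElement and endElement as well as the text in characters. The following is only a sketch under some assumptions. The class name FragmentHandler and its methods are invented here, the parser is not namespace-aware, and comments, CDATA sections and processing instructions are not reproduced.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class FragmentHandler extends DefaultHandler {
    private final String target;                       // element whose markup is captured
    private final List<String> fragments = new ArrayList<>();
    private StringBuilder current;                     // non-null while inside a target element
    private int depth;                                 // open elements inside the current fragment

    FragmentHandler(String target) { this.target = target; }

    List<String> fragments() { return fragments; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (current == null && qName.equals(target)) {
            current = new StringBuilder();             // entering a new fragment
        }
        if (current != null) {
            depth++;
            current.append('<').append(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                current.append(' ').append(atts.getQName(i))
                       .append("=\"").append(escape(atts.getValue(i))).append('"');
            }
            current.append('>');
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.append(escape(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) return;
        current.append("</").append(qName).append('>');
        if (--depth == 0) {                            // the target element itself just closed
            fragments.add(current.toString());
            current = null;
        }
    }

    // Re-escapes characters that the parser has already unescaped.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    static List<String> extract(String xml, String target) throws Exception {
        FragmentHandler handler = new FragmentHandler(target);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.fragments();
    }
}
```

For 500MB inputs it would make more sense to process each fragment as soon as it is complete instead of collecting them all in a list; the list only keeps the sketch short.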

Using XPath to get child elements

I'm trying to use DocumentBuilder and XPath to parse an XML document with structure like:
<questionnaire>
<item>
<question>How have you been?</question>
<response>Great</response>
<response>Good</response>
<response>So-so</response>
<response>Bad</response>
<response>Rather not answer</response>
</item>
</questionnaire>
To access question I've done this (which works):
expression = "/questionnaire/item[" + i + "]/question";
setQuestion(xmlReader.read(expression, XPathConstants.STRING).toString());
Now I need some way to create a list of strings based on the response items. The number of responses is variable, so one question could have any number of responses. Does anyone know how to do this?
Thanks
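Wouldn't something like this do it? Note that the return type has to be XPathConstants.NODESET here: with XPathConstants.STRING, XPath returns only the text of the first matching node. xmlReader.read will then presumably return a NodeList that you have to iterate over.
expression = "/questionnaire/item[" + i + "]/response";
setResponse(xmlReader.read(expression, XPathConstants.NODESET));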
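Since xmlReader is the asker's own wrapper, here is a sketch of the same idea using the JDK's javax.xml.xpath API directly. The class name ResponseReader and the method responses are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ResponseReader {

    // Collects the text of every <response> under the item at the given
    // position (XPath positions start at 1).
    static List<String> responses(String xml, int itemIndex) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expression = "/questionnaire/item[" + itemIndex + "]/response";
        // NODESET yields all matching nodes; STRING would yield only the first one's text.
        NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<questionnaire><item><question>How have you been?</question>"
                + "<response>Great</response><response>Good</response></item></questionnaire>";
        System.out.println(responses(xml, 1));
    }
}
```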

Parsing XML Textlist

I'm trying to parse an XML file. I'm able to parse a normal text node, but how do I parse a textlist? I only get the firstChild of the textlist, and sadly that's all. If I try to do
elem.nextSibling();
it is always null, which can't be right: I know there are two other values left.
Can someone provide me with an example, maybe?
Thanks!
XML example
<viewentry position="1" unid="7125D090682C3C3EC1257671002F66F4" noteid="962" siblings="65">
<entrydata columnnumber="0" name="Categories">
<textlist>
<text>Lore1</text>
<text>Lore2</text>
</textlist>
</entrydata>
<entrydata columnnumber="1" name="CuttedSubjects">
<text>
LoreImpsum....
</text>
</entrydata>
<entrydata columnnumber="2" name="$35">
<datetime>20091117T094224,57+01</datetime>
</entrydata>
</viewentry>
I assume you're using a DOM parser.
The first child of the <textlist> node is not the first <text> node but rather the raw text that contains the whitespace and carriage return between the end of <textlist> and the beginning of <text>. The output of the following snippet (using org.w3c.dom.* and javax.xml.parsers.*)
Node grandpa = document.getElementsByTagName("textlist").item(0);
Node daddy = grandpa.getFirstChild();
while (daddy != null) {
    System.out.println(">>> " + daddy.getNodeName());
    Node child = daddy.getFirstChild();
    if (child != null)
        System.out.println(">>>>>>>> " + child.getTextContent());
    daddy = daddy.getNextSibling();
}
shows that <textlist> has five children: the two <text> elements and the three raw text pieces before, between and after them.
>>> #text
>>> text
>>>>>>>> Lore1
>>> #text
>>> text
>>>>>>>> Lore2
>>> #text
When parsing XML this way, it's easy to overlook that the structure of the DOM-tree can be complicated. You can quickly end up iterating over a NodeList in the wrong generation, and then you get nulls where you would expect siblings. This is one of the reasons why people came up with all kinds of xml-to-java stuff, from homegrown XMLHelper classes to XPath expressions to Digester to JAXB, so you need to go down to the DOM level only when you absolutely have to.
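One common way to stay in the right generation is to walk the siblings and keep only the element nodes, skipping the whitespace-only #text nodes. A small sketch with the JDK DOM API follows; the class name ElementChildren and both method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ElementChildren {

    // Text of each direct child element of parent; #text siblings are skipped.
    static List<String> childTexts(Node parent) {
        List<String> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                result.add(n.getTextContent().trim());
            }
        }
        return result;
    }

    // Parses the XML and returns the values of the first <textlist> element.
    static List<String> firstTextlist(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return childTexts(doc.getElementsByTagName("textlist").item(0));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<entrydata>\n<textlist>\n<text>Lore1</text>\n<text>Lore2</text>\n</textlist>\n</entrydata>";
        System.out.println(firstTextlist(xml));
    }
}
```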

How do I use XMLUnit to compare only certain parts of files?

How do I use XMLUnit to compare 2 or more nodes (of the same name) in 2 different files?
I have 2 XML files that look like this:
<SearchResults>
<result type="header"> ...ignore this.... </result>
<result type="secondheader">...ignore this....</result>
<result>....data1....</result>
<result>....data2....</result>
<result>....data3....</result>
<result type="footer">...ignore this....</result>
</SearchResults>
And here is my method that I use to compare so far. The problem is that I do not want to compare the parts of the xml that have a result tag with any kind of attribute flag on them. How can I do this?
public void compareXMLEqualityToLastTest() throws Exception {
    System.out.println("Checking differences.");
    File firstFile = new File("C:\\Eclipse\\workspace\\Tests\\log\\" +
        "Test_2.xml");
    String file1sub = readXMLFromFile(firstFile);
    File secondFile = new File("C:\\Eclipse\\workspace\\Tests\\log\\" +
        "Test_1.xml");
    String file2sub = readXMLFromFile(secondFile);
    assertXMLNotEqual("files are equal", file1sub, file2sub);
    assertXMLEqual("files are not equal", file1sub, file2sub);
}
I found a vague suggestion to use an ElementQualifier on page 5 of the XMLUnit manual, but I don't understand it yet. I wouldn't know how to tell it which nodes to compare.
Diff myDiff = new Diff(file1sub, file2sub);
myDiff.overrideElementQualifier(new ElementNameAndTextQualifier());
assertXMLEqual("But they are equal when an ElementQualifier controls " +
"which test element is compared with each control element", myDiff, true);
Should I follow that route and add this class to my project?
org.apache.wink.test.diff.DiffWithAttributeQualifier
The thought crossed my mind to put the nodes into a NodeList and then use org.custommonkey.xmlunit.compareNodeList, but that feels like a hack. Is there a better way?
Wouldn't it be easier to use XPath tests? I imagine something like this would work:
// select all result elements that have no type attribute
String xpath = "//result[not(@type)]";
assertXpathsEqual(xpath, file1sub, xpath, file2sub);
