XML Parsing in java [duplicate]


This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Best method to parse various custom XML documents in Java
Hi all,
I am a beginner in Java, so I hope this is an easy question. If I have an XML file, how can I parse it and get only the elements within a specific tag?
For example, if the XML file looks like this:
<date>2005-10-31</date>
<number>12345</number>
<purchased-by>
<name>My name</name>
<address>My address</address>
</purchased-by>
<order-items>
<item>
<code>687</code>
<type>CD</type>
<label>Some music</label>
</item>
<item>
<code>129851</code>
<type>DVD</type>
<label>Some video</label>
</item>
</order-items>
From this XML I want to parse only the elements within the order-items tag.
Is there a generic way to do this? Please let me know.
Thanks

As said in the comments, a short Google search should bring you to the Sun examples on how to do this. Basically, you have two main XML parsing methods in Java:
SAX, where you use a handler to grab only what you want from your XML and discard the rest
DOM, which parses the whole file and lets you access all elements through a tree structure.
Another very useful XML parsing method, more recent than these two and included in the JRE only since Java 6, is StAX. StAX was conceived as a middle ground between the tree-based approach of DOM and the event-based approach of SAX. It is similar to SAX in that parsing very large documents is easy, but here the application "pulls" information from the parser instead of the parser "pushing" events to the application. You can find more explanation on this subject here.
So, depending on what you want to achieve, you can use one of these approaches.
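To make the "pull" style concrete, here is a minimal StAX sketch, not a definitive implementation. It assumes the fragment from the question is wrapped in a single root element (a hypothetical `<order>`, since well-formed XML needs one root), and the class and method names are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxOrderItems {
    // Returns the text of every <label> found inside <order-items>.
    static List<String> labels(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> labels = new ArrayList<>();
        boolean inOrderItems = false;
        while (reader.hasNext()) {
            int event = reader.next();   // the application asks for the next event
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = reader.getLocalName();
                if (name.equals("order-items")) {
                    inOrderItems = true;
                } else if (inOrderItems && name.equals("label")) {
                    labels.add(reader.getElementText());
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && reader.getLocalName().equals("order-items")) {
                break;                   // nothing after </order-items> is read
            }
        }
        reader.close();
        return labels;
    }
}
```

Because the loop is driven by the application, it can simply stop once `</order-items>` has been seen.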

If you want to limit the processing to the <order-items> element, use SAX. A SAX parser visits all elements of the input "file" (or stream), and your handler can ignore anything that is not <order-items> or one of its children. SAX does not build a Document for you, so the handler has to collect the elements it keeps.
If the XML documents are rather small and performance is not a limiting factor, then simply parse the whole document (that's a 2-liner) and use XPath expressions to select the correct nodes.
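For the small-document case, the parse-then-XPath route could look roughly like this. It is a sketch using only the JDK's `javax.xml` APIs; the `<order>` root element in the usage and the class and method names are assumptions, not part of the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathOrderItems {
    // Parses the whole document, then selects the <code> of every item under <order-items>.
    static List<String> codes(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//order-items/item/code", doc, XPathConstants.NODESET);
        List<String> codes = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            codes.add(nodes.item(i).getTextContent());
        }
        return codes;
    }
}
```

To read from a file instead of a String, pass a `java.io.File` to `parse`.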

Use XPath. It lets you select nodes by name and by many other conditions. Very little code is needed to set it up.
IBM Example

This is a classic case for SAX. Register a handler that receives the tags and ignores everything other than order-items.
Apache Digester is probably a better way in general, but it is overkill for this specific task.
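A handler along those lines might look like the sketch below. It assumes the items carry no attributes and no namespaces, and that the fragment sits under one root element. The class name and the choice of a `Map` per item are just for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class OrderItemsHandler extends DefaultHandler {
    final List<Map<String, String>> items = new ArrayList<>();
    private boolean inOrderItems;
    private Map<String, String> current;          // non-null only while inside an <item>
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("order-items")) {
            inOrderItems = true;
        } else if (inOrderItems && qName.equals("item")) {
            current = new LinkedHashMap<>();
        }
        text.setLength(0);                        // start collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            text.append(ch, start, length);       // text outside <item> is ignored
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("order-items")) {
            inOrderItems = false;
        } else if (current != null && qName.equals("item")) {
            items.add(current);
            current = null;
        } else if (current != null) {
            current.put(qName, text.toString().trim());
        }
    }

    // Convenience entry point: parses the XML string and returns one map per item.
    static List<Map<String, String>> parse(String xml) throws Exception {
        OrderItemsHandler handler = new OrderItemsHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.items;
    }
}
```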

You can use a DOM Parser to build a Document and then extract whatever elements you want using the getElementsByTagName method.
Here is some sample code to help you get started:
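import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// parse the file and build a Document (parse() throws checked exceptions the caller must handle)
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File("file.xml"));
// get the list of elements called order-items
NodeList orderItemsNodes = doc.getElementsByTagName("order-items");
// iterate over the elements
for (int i = 0; i < orderItemsNodes.getLength(); i++) {
    Node orderItemNode = orderItemsNodes.item(i);
    // orderItemNode.getChildNodes() gives access to the <item> children
}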

It honestly depends on how you plan to use the item data. If you want to parse it into objects and then work with those, I would use JAXB unmarshalling. If you just want to pull the string values out of the code, type and label child elements of each item element, you could consider simple regex matching on the XML string: match the content of each item tag, then match each child element and extract its value. Keep in mind that regex matching is fragile and only works when the XML layout is simple and predictable.

Related

Dom4j vs JAXB for reading and updating large and complex XML files

I have an XML file with a stable tree structure and more than 5000 elements.
A fraction of it is below:
<Companies>
<Offices>
<RevenueInfo>
<TransactionId>14042015014606877</TransactionId>
<Company>
<Identification>
<GlobalId>25142400905</GlobalId>
<BranchId>373287734</BranchId>
<GeoId>874</GeoId>
<LastUpdated>2015-04-14T01:46:06.940</LastUpdated>
<RecordType>7785</RecordType>
</Identification>
<Info>
<DataEntry>
<EntryId>12345</EntryId>
</DataEntry>
<DataEntry>
<EntryId>34567</EntryId>
</DataEntry>
<DataEntry>
<EntryId>89076</EntryId>
</DataEntry>
<DataEntry>
<EntryId>13211</EntryId>
</DataEntry>
</Info>
...more elements
</Company>
</RevenueInfo>
</Offices>
</Companies>
I need to be able to update any of the values in the document based on user input and create a new XML file with the updated information. The user will pass the BranchId, the name of the element to update and, if the element occurs multiple times, its position (for example, for EntryId 12345 the user will pass 373287734 EntryId=1 010101).
I've been looking at JAXB. Creating the model classes for this kind of XML seems like a considerable effort, but it also seems like it would make locating the element to update and printing to file a lot easier.
Dom4j seems to have good performance results too, but I am not sure how parsing will be.
My question is, is JAXB the best approach in this case or can you suggest a better way to parse this type of XML?

In my experience JAXB only works well when the schema is simple and stable. In other cases you are better off using a generic tree model. The main generic models in the Java world are DOM, JDOM2, DOM4J, XOM, AXIOM. My own preferences are JDOM2 and XOM; DOM4J seems to me overcomplex, and somewhat old-fashioned. But it depends what you are looking for.
But then, the application you describe looks an ideal candidate for an "XML end-to-end" or XRX approach - XForms, XSLT, XQuery, XProc. You don't need Java at all.

Leaving performance and memory requirements aside, I would recommend trying XPath together with DOM4J (or JDOM, or even plain DOM). To select the company you could use an XPath expression like this:
"//Company[Identification/BranchId = '373287734']"
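Then, using the returned company element as the context node, you can get the element to be updated with a second, relative XPath expression (a path starting with // would search from the document root again, not from the company):
"(.//EntryId)[1]"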

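Put together with plain DOM and the JDK's XPath, the select-then-update idea from the XPath answer could be sketched as follows. The class and method names are hypothetical. The second expression is written relative to the company node, as `(.//EntryId)[1]`, so that it only looks inside the selected company. The inputs are concatenated into the expressions, so this assumes they are trusted.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class BranchUpdater {
    // Sets the index-th (1-based) <element> inside the company with the given BranchId
    // to value and returns the updated document as a string.
    // Assumption: branchId and element are trusted, since they are concatenated into XPath.
    static String update(String xml, String branchId, String element, int index, String value)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        Node company = (Node) xpath.evaluate(
                "//Company[Identification/BranchId = '" + branchId + "']",
                doc, XPathConstants.NODE);
        if (company == null) {
            throw new IllegalArgumentException("No company with BranchId " + branchId);
        }

        // Relative path: search only below the selected company.
        Node target = (Node) xpath.evaluate(
                "(.//" + element + ")[" + index + "]", company, XPathConstants.NODE);
        if (target == null) {
            throw new IllegalArgumentException("No " + element + " at position " + index);
        }
        target.setTextContent(value);

        // Serialize the modified tree with an identity transform.
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

For the input from the question this would be called as `update(xml, "373287734", "EntryId", 1, "010101")`. To create the new file directly, a `StreamResult` built from a `java.io.File` can replace the `StringWriter`.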
XML Parsing: Parsing the entire xml for one field

I have a very large XML document which I receive as input. From this XML I need just a single child element. Parsing the entire XML to retrieve just one element seems like performance overkill. Is there a better approach?
One approach would be to use the DocumentBuilder API to parse the XML and then use XPath to retrieve the desired field. But the parse method will still unnecessarily parse the entire XML. Is there an overloaded parse method in any parser implementation that takes an XPath expression and parses only the parts of the XML that it needs?

What you need is a SAX parser or a similar streaming parser. A SAX parser does not build the document in memory; it reports elements as it reads them, so your handler can stop the parse as soon as it has found the element it is looking for.
You can read about SAX parsers on Wikipedia. Also have a look at the Java docs for the SAX parser.

Although there is no way around parsing for the proper treatment of your XML data, there is definitely a way around building an in-memory representation of the entire document. Java offers SAX parsing, which is event-based. You can implement an event handler for XML events, ignoring everything on the way to the content that you need, and stopping after retrieving the part that you are looking for.
Here is a tutorial from Oracle showing how to use SAX APIs to retrieve counts of individual tags without building a document in memory.
Since some XPath processors can work on a stream of SAX events as well, you could potentially feed events to such a processor and look for the desired tag that way, too. However, this may be overkill when you only need to fetch a single element.
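The "stop after retrieving the part you are looking for" step is usually done by throwing an exception from the handler, since SAX has no explicit stop call. A minimal sketch, with hypothetical class and method names, assuming the wanted element contains only text:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class FirstElementFinder extends DefaultHandler {
    private final String wanted;
    private StringBuilder text;   // non-null only while inside the wanted element
    String value;                 // set once the element has been read completely

    FirstElementFinder(String wanted) {
        this.wanted = wanted;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals(wanted)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (text != null && qName.equals(wanted)) {
            value = text.toString();
            throw new SAXException("stop: " + wanted + " found");   // aborts the parse
        }
    }

    // Returns the text of the first element named `element`, or null if there is none.
    static String find(String xml, String element) throws Exception {
        FirstElementFinder handler = new FirstElementFinder(element);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            if (handler.value == null) {
                throw e;          // a real parse error, not our stop signal
            }
        }
        return handler.value;
    }
}
```

Everything before the element is still read and tokenized, but nothing after it is touched and no tree is built.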

XPath operates over the document object model, so you have to have a DOM in order to evaluate an XPath expression. Otherwise what would it evaluate against?
So XPath is out if you don't want to parse the whole document. Your other option is fast SAX parsing, where you ignore all SAX parsing events until you get to the element that you want, extract the text that you want, and then abandon the rest of the parsing process.
Another option is to go way simpler: use grep.

Xml Query in java?

I am new to validation in Java.
I have an XML file named Validation Limits with the following structure:
<parameter> </parameter>
<lowerLimit> </lowerLimit>
<upperLimit> </upperLimit>
<enable> </enable>
Depending on the enable status (true or false), I must perform the validation for the respective parameter.
What would be the best way to do this?
I have parsed the XML with DOM (I forgot to mention this earlier) and stored the values in arrays, but this gets complicated because of all the cross-referencing from one array to another. Any better approach that could replace the arrays would be helpful.
Thank you in advance.

Try using a DOM or SAX parser; they will do the parsing for you. You can find some good, free tutorials on the internet.
The difference between DOM and SAX is as follows: DOM loads the XML into a tree structure which you can browse through (i.e. the whole XML is loaded), whereas SAX parses the document and triggers events (calls methods) in the process. Both have advantages and disadvantages, but personally, for reasonably sized XML files, I would use DOM.
So, in your case: use DOM to get a tree of your XML document, locate the enable element, then read the other elements that depend on it.
Also, you can achieve this in pure XML, using XML Schema, although this might be too much for simple needs.
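One way to avoid the parallel arrays is to read each group of four elements into a small object and key it by parameter name. The sketch below assumes each group is wrapped in a hypothetical `<limit>` element under a `<limits>` root and that the limits are numeric. The question does not show the wrappers, so adjust the names to the real file.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ValidationLimits {
    // One entry of the limits file; replaces the parallel arrays.
    static final class Limit {
        final double lower;
        final double upper;
        final boolean enabled;

        Limit(double lower, double upper, boolean enabled) {
            this.lower = lower;
            this.upper = upper;
            this.enabled = enabled;
        }

        // A disabled limit accepts every value.
        boolean accepts(double value) {
            return !enabled || (value >= lower && value <= upper);
        }
    }

    // Reads every <limit> element (hypothetical wrapper name) into a map keyed by parameter.
    static Map<String, Limit> load(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("limit");
        Map<String, Limit> limits = new LinkedHashMap<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            limits.put(text(e, "parameter"), new Limit(
                    Double.parseDouble(text(e, "lowerLimit")),
                    Double.parseDouble(text(e, "upperLimit")),
                    Boolean.parseBoolean(text(e, "enable"))));
        }
        return limits;
    }

    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }
}
```

Validation then becomes a lookup such as `limits.get("temperature").accepts(42.0)`, where `temperature` is only an example parameter name.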

Grabbing values in XML elements in Java

I'm looking into how I can get values from specific XML nodes in an XML file that I have. In my application, I have the entire XML file in a string, and I want to grab the specific information from there. I've heard a little bit about DOM and SAX, but I don't exactly know where to start. Any help?

One of the easiest ways is to use XPath. Here's a tutorial.

You can either use XPath (example) or you can use DOM or SAX, as you mentioned. You can view my answer here (how to retrieve element value of XML using Java?) on SO.
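Since the XML is already in a String, the JDK's XPath API can evaluate an expression against it directly through an `InputSource`. A minimal sketch, with a made-up class name:

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathFromString {
    // Evaluates the expression against XML held in a String and returns the string value
    // of the first matching node (an empty string if nothing matches).
    static String valueOf(String xml, String expression) throws XPathExpressionException {
        return XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(xml)));
    }
}
```

Each call parses the string again, so for many lookups on the same document it is cheaper to parse once into a DOM and evaluate against that.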

Well, there is also Xstream
http://x-stream.github.io/index.html
It lets you go in both directions (object to XML, and XML to object).
Here is the "two minutes tutorial":
http://x-stream.github.io/tutorial.html

How to access a subset of XML data in Java when the XML data is too large to fit in memory?

What I would really like is a streaming API that works sort of like StAX, and sort of like DOM/JDom.
It would be streaming in the sense that it would be very lazy and not read things in until needed. It would also be streaming in the sense that it would read everything forwards (but not backwards).
Here's what code that used such an API would look like.
URL url = ...
XMLStream xml = XXXFactory(url.inputStream()) ;
// process each <book> element in this document.
// the <book> element may have subnodes.
// You get a DOM/JDOM like tree rooted at the next <book>.
while (xml.hasContent()) {
XMLElement book = xml.getNextElement("book");
processBook(book);
}
Does anything like this exist?

You could do the following:
Scan the XML file using SAX or StAX and immediately serialize everything back into a StringBuilder, i.e. create your own copy of the XML file.
If you encounter an endElement and you know you don't need the subtree you just parsed, clear the StringBuilder.
If you need it, you can build a DOM tree from the "copy" you created.
With this you can fall back on standard frameworks: one for conventional SAX parsing and one for conventional DOM building. Only the custom serialization might require some hacking.
It also helps if you know the subtree boundaries in advance (the book elements in your example). Otherwise further processing would be required.
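A variation on the same idea skips the intermediate string copy: stream with StAX and, whenever a `<book>` starts, copy just that subtree into a small DOM element. The sketch below is one possible shape, not an existing library API. The class and method names are made up, namespaces are ignored, and it assumes every book has a `<title>` child.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class SubtreeStreamer {
    // Copies the element the reader is positioned on (and everything below it) into a
    // detached DOM element, leaving the reader on that element's END_ELEMENT.
    static Element readSubtree(XMLStreamReader r, Document doc) throws XMLStreamException {
        Element root = newElement(r, doc);
        Node current = root;
        while (true) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    Element child = newElement(r, doc);
                    current.appendChild(child);
                    current = child;
                    break;
                }
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    current.appendChild(doc.createTextNode(r.getText()));
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    if (current == root) {
                        return root;
                    }
                    current = current.getParentNode();
                    break;
                default:
                    break;   // comments, processing instructions etc. are skipped
            }
        }
    }

    private static Element newElement(XMLStreamReader r, Document doc) {
        Element e = doc.createElement(r.getLocalName());
        for (int i = 0; i < r.getAttributeCount(); i++) {
            e.setAttribute(r.getAttributeLocalName(i), r.getAttributeValue(i));
        }
        return e;
    }

    // Example use: collect the <title> of every <book>, one subtree at a time.
    static List<String> titles(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> titles = new ArrayList<>();
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && r.getLocalName().equals("book")) {
                Element book = readSubtree(r, doc);   // only this <book> is held as a tree
                titles.add(book.getElementsByTagName("title").item(0).getTextContent());
            }
        }
        r.close();
        return titles;
    }
}
```

Each book element is never attached to the document, so it can be garbage collected as soon as it has been processed.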

One way to parse part of the document without fully loading it into memory is to use a streaming parser such as SAX.
Here are some official Sun examples of how to use SAX: http://java.sun.com/developer/codesamples/xml.html#sax
