XML Query in Java?

I am new to this validation process in Java...
--> XML file named Validation Limits
--> Structure of the XML:
<parameter> </parameter>
<lowerLimit> </lowerLimit>
<upperLimit> </upperLimit>
<enable> </enable>
--> Depending on the enable status ('true' or 'false'), I must perform the validation process for the respective parameter.
--> What could be the best possible method to perform this operation?
I have parsed the XML with DOM (forgot to mention this earlier) and stored the values in arrays, but this is complicated by a lot of referencing from one array to another. Any better method that could replace the array procedure would be helpful.
Thank you in advance.

Try using a DOM or SAX parser; they will do the parsing for you. You can find some good, free tutorials on the internet.
The difference between DOM and SAX is as follows: DOM loads the XML into a tree structure which you can browse through (i.e. the whole XML is loaded), whereas SAX parses the document and triggers events (calls methods) in the process. Both have advantages and disadvantages, but personally, for reasonably sized XML files, I would use DOM.
So, in your case: use DOM to get a tree of your XML document, locate the enable element, and read the other elements depending on its value.
Also, you can achieve this in pure XML, using XML Schema, although this might be too much for simple needs.
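For instance, here is a minimal DOM sketch of that approach. It rests on assumptions: the file name ValidationLimits.xml, numeric limits, and a layout where each parameter element contains lowerLimit, upperLimit and enable children are taken from my reading of the question, not from the original code.

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class LimitValidator {
    public static void main(String[] args) throws Exception {
        // Parse the whole file into an in-memory DOM tree.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File("ValidationLimits.xml")); // assumed file name
        NodeList params = doc.getElementsByTagName("parameter");
        for (int i = 0; i < params.getLength(); i++) {
            Element p = (Element) params.item(i);
            // Only validate when <enable> is 'true'.
            if (!Boolean.parseBoolean(text(p, "enable"))) {
                continue;
            }
            double lower = Double.parseDouble(text(p, "lowerLimit"));
            double upper = Double.parseDouble(text(p, "upperLimit"));
            System.out.println("validate against [" + lower + ", " + upper + "]");
        }
    }

    // Text content of the first child element with the given tag name.
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }
}

Keeping the values inside the tree like this avoids the cross-referencing between parallel arrays described in the question.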

Related

Android: DOM vs SAX vs XMLPullParser parsing?

I am parsing an XML document using a SAX parser.
I want to know which is better and faster to work with: DOM, SAX, or XMLPullParser.
It depends on what you are doing. If you have very large files, you should use a SAX parser, since it fires events and then releases them; nothing is stored in memory. With a SAX parser you cannot access elements in random order (there is no going back!), but DOM lets you access any part of the XML file, since it keeps the whole document in memory. Hope this answers your question.
If you want to know which parser is fastest, Xerces is going to be the fastest you'll find, and a SAX parser should give you more performance than DOM.
The SAX XML parser is already available in the Android SDK:
http://developer.android.com/reference/org/xml/sax/XMLReader.html
so it is easy to access.
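For completeness, a minimal SAX sketch with the standard javax.xml.parsers entry point (the file name is an assumption; the handler just prints element names):

import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxDemo {
    public static void main(String[] args) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // The parser pushes events into the handler while streaming the file.
        parser.parse(new File("data.xml"), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                System.out.println("start: " + qName);
            }
        });
    }
}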
One aspect by which different kinds of parsers can be classified is whether they need to load the entire XML document into memory up front. Parsers based on the Document Object Model (DOM) do that: they parse XML documents into a tree structure, which can then be traversed in-memory to read its contents. This allows you to traverse a document in arbitrary order, and gives rise to some useful APIs that can be slapped on top of DOM, such as XPath, a path query language that has been specifically designed for extracting information from trees. Using DOM alone isn’t much of a benefit because its API is clunky and it’s expensive to always read everything into memory even if you don’t need to. Hence, DOM parsers are, in most cases, not the optimal choice to parse XML on Android.
There is a class of parsers that don't need to load a document up front. These parsers are stream-based, which means they process an XML document while still reading it from the data source (the Web or a disk). This implies that you do not have random access to the XML tree as with DOM, because no internal representation of the document is being maintained. Stream parsers can be further distinguished from each other: there are push parsers that, while streaming the document, will call back to your application when encountering a new element. SAX parsers fall into this class. Then there are pull parsers, which are more like iterators or cursors: here the client must explicitly ask for the next element to be retrieved.
Source: Android in Practice.
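To make the push/pull distinction concrete, here is a minimal pull-style sketch using the JDK's StAX API (javax.xml.stream); the XML string is made up for the example:

import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class PullDemo {
    public static void main(String[] args) throws Exception {
        String xml = "<books><book>Android in Practice</book></books>";
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        // Unlike SAX, the client asks for each event explicitly, cursor-style.
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                System.out.println("start: " + reader.getLocalName());
            } else if (event == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace()) {
                System.out.println("text: " + reader.getText());
            }
        }
        reader.close();
    }
}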

Reading Huge XML File using StAX and XPath

The input file contains thousands of transactions in XML format and is around 10GB in size. The requirement is to pick each transaction XML based on the user input and send it to a processing system.
The sample content of the file:
<transactions>
  <txn id="1">
    <name>product 1</name>
    <price>29.99</price>
  </txn>
  <txn id="2">
    <name>product 2</name>
    <price>59.59</price>
  </txn>
</transactions>
The (technical) user is expected to give the input tag name, like <txn>.
We would like to make this solution more generic. The file content might be different, and users could give an XPath expression like "//transactions/txn" to pick individual transactions.
There are a few technical things we have to consider here:
The file can be in a shared location or on FTP.
Since the file size is huge, we can't load the entire file into the JVM.
Can we use a StAX parser for this scenario? It has to take an XPath expression as input and pick/select the transaction XML.
Looking for suggestions. Thanks in advance.
If performance is an important factor, and/or the document size is large (both of which seem to be the case here), the difference between an event parser (like SAX or StAX) and the native Java XPath implementation is that the latter builds a W3C DOM Document prior to evaluating the XPath expression. [It's interesting to note that all Java Document Object Model implementations like the DOM or Axiom use an event processor (like SAX or StAX) to build the in-memory representation, so if you can ever get by with only the event processor you're saving both memory and the time it takes to build a DOM.]
As I mentioned, the XPath implementation in the JDK operates upon a W3C DOM Document. You can see this in the Java JDK source code implementation by looking at com.sun.org.apache.xpath.internal.jaxp.XPathImpl, where prior to the evaluate() method being called the parser must first parse the source:
Document document = getParser().parse( source );
After this your 10GB of XML will be represented in memory (plus whatever overhead) — probably not what you want. While you may want a more "generic" solution, both your example XPath and your XML markup seem relatively simple, so there doesn't seem to be a really strong justification for an XPath (except perhaps programming elegance). The same would be true for the XProc suggestion: this would also build a DOM. If you truly need a DOM you could use Axiom rather than the W3C DOM. Axiom has a much friendlier API and builds its DOM over StAX, so it's fast, and uses Jaxen for its XPath implementation. Jaxen requires some kind of DOM (W3C DOM, DOM4J, or JDOM). This will be true of all XPath implementations, so if you don't truly need XPath sticking with just the events parser would be recommended.
SAX is the old streaming API, with StAX newer and a great deal faster. Using either the native JDK StAX implementation (javax.xml.stream) or the Woodstox StAX implementation (which is significantly faster, in my experience), I'd recommend creating an XML event filter that first matches on element type name (to capture your <txn> elements). This will create small bursts of events (element, attribute, text) that can be checked against your matching user values. Upon a suitable match you could either pull the necessary information from the events or pipe the bounded events to build a mini-DOM from them, if you find the result easier to navigate. But that might be overkill if the markup is simple.
This would likely be the simplest, fastest possible approach and avoid the memory overhead of building a DOM. If you passed the names of the element and attribute to the filter (so that your matching algorithm is configurable) you could make it relatively generic.
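A rough sketch of that filtering idea with the JDK StAX cursor API (the txn element and id attribute come from the question's sample; the matching value and file name are assumptions):

import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class TxnScanner {
    public static void main(String[] args) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new FileInputStream("transactions.xml"));
        String wantedId = "2"; // user-supplied match value (assumption)
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "txn".equals(reader.getLocalName())
                    && wantedId.equals(reader.getAttributeValue(null, "id"))) {
                printTxn(reader); // a small burst of events for one transaction
            }
        }
        reader.close();
    }

    // Reads the children of the current <txn> until its end tag.
    private static void printTxn(XMLStreamReader reader) throws Exception {
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.END_ELEMENT
                    && "txn".equals(reader.getLocalName())) {
                return;
            }
            if (event == XMLStreamConstants.START_ELEMENT) {
                System.out.println(reader.getLocalName() + " = " + reader.getElementText());
            }
        }
    }
}

Because only the events for the matching element are inspected, memory use stays flat no matter how large the input file is.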
StAX and XPath are very different things. StAX allows you to parse a streaming XML document in a forward direction only; XPath allows navigation in both directions. StAX is a very fast streaming XML parser, but if you want XPath, Java has a separate library for that.
Take a look at this question for a very similar discussion: Is there any XPath processor for SAX model?
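For reference, this is what the separate JDK XPath library (javax.xml.xpath) looks like in use. Note that, as discussed above, it parses the whole document into a DOM first, which is exactly the problem for a 10GB input (file name assumed):

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathDemo {
    public static void main(String[] args) throws Exception {
        // Builds the full in-memory DOM before any XPath evaluation.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File("transactions.xml"));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList txns = (NodeList) xpath.evaluate(
                "//transactions/txn", doc, XPathConstants.NODESET);
        System.out.println("matched " + txns.getLength() + " transactions");
    }
}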
We regularly parse 1GB+ complex XML files by using a SAX parser which does exactly what you described: it extracts partial DOM trees that can be conveniently queried using XPath.
I blogged about it here. It uses a SAX rather than a StAX parser, but may be worth a look.
It's definitely a use case for XProc, with a streaming and parallel processing implementation like QuiXProc (http://code.google.com/p/quixproc).
In this situation, you will have to use:
<p:for-each>
<p:iteration-source select="//transactions/txn"/>
<!-- you processing on a small file -->
</p:for-each>
You can even wrap each of the resulting transformations with a single line of XProc:
<p:wrap-sequence wrapper="transactions"/>
Hope this helps
A fun solution for processing huge XML files (>10GB):
Use ANTLR to create byte offsets for the parts of interest. This will save some memory compared with a DOM-based approach.
Use JAXB to read parts from those byte positions.
Details, worked through on the example of Wikipedia dumps (17GB), are in this SO answer: https://stackoverflow.com/a/43367629/1485527
Streaming Transformations for XML (STX) might be what you need.
Do you need to process it fast, or do you need fast lookups in the data? These requirements need different approaches.
For fast reading of the whole data set, StAX will be OK.
If you need fast lookups, you may need to load it into some database, e.g. Berkeley DB XML.

Code to insert arbitrary XML string into XML document by XPath

I am attempting to create a script that wraps a Groovy class that will take the following arguments:
An input XML file to update.
An arbitrary snippet to insert into the input file (it might not even be well-formed in and of itself; it would become part of a larger well-formed document).
XPath for the marker element (used for positioning the snippet in #2).
An action (insert before, insert after, append child).
Optional output XML file.
I'm at a loss for finding an API that will allow me to:
Find a node by XPath and
Cram XML from a String adjacent to the node.
Does anyone have some ideas for technologies that I can combine to achieve this effect? Small examples would be especially useful.
If the snippet is well-formed, most DOM implementations I've seen support the DocumentFragment node type, and some offer non-standard ways to inject DOM nodes parsed from a string.
EDIT: A quick Google search throws up the JavaDocs: http://download.oracle.com/javase/1.4.2/docs/api/org/w3c/dom/DocumentFragment.html
IIRC the API works like this (pseudo code):
parent = find_parent_node_of_fragment(document);
fragment = document.createDocumentFragment();
fragment.appendXML("<my>xmlstring</my>");
parent.appendChild(fragment);
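In the standard Java DOM API there is no appendXML method, so here is a working equivalent of the pseudo code above: parse the snippet as its own document and import it. This assumes the snippet is well-formed, and insertAfter is a hypothetical helper name:

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SnippetInserter {
    // Inserts the parsed snippet directly after the marker node.
    public static void insertAfter(Document doc, Node marker, String snippet)
            throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document snippetDoc = builder.parse(new InputSource(new StringReader(snippet)));
        // importNode copies the snippet's root element into the target document.
        Node imported = doc.importNode(snippetDoc.getDocumentElement(), true);
        // insertBefore(next sibling) == insert after the marker; a null
        // reference node appends at the end of the parent's children.
        marker.getParentNode().insertBefore(imported, marker.getNextSibling());
    }
}

The marker node itself can be located with javax.xml.xpath, which covers the "find a node by XPath" half of the question.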
If you don't have this luxury, or if your string is not well-formed, there is the option to inject CDATA.
If you can't make do with injecting CDATA (because you essentially want to affect nodes that follow, for instance the new node must become the parent of old nodes which will be enclosed in the new document), you could try an XSLT transformation.
I suspect what I was trying to do is non-trivial and would have required a much larger framework than what I had time for. I ended up abandoning this endeavor.

How to access a subset of XML data in Java when the XML data is too large to fit in memory?

What I would really like is a streaming API that works sort of like StAX, and sort of like DOM/JDom.
It would be streaming in the sense that it would be very lazy and not read things in until needed. It would also be streaming in the sense that it would read everything forwards (but not backwards).
Here's what code that used such an API would look like.
URL url = ...
XMLStream xml = XXXFactory(url.inputStream());
// process each <book> element in this document.
// the <book> element may have subnodes.
// You get a DOM/JDOM-like tree rooted at the next <book>.
while (xml.hasContent()) {
    XMLElement book = xml.getNextElement("book");
    processBook(book);
}
Does anything like this exist?
You could do the following:
Scan the XML file using SAX or StAX and immediately serialize everything back into a StringBuilder, i.e. create your own copy of the XML file.
If you encounter an endElement and you know you don't need the subtree you just parsed, clear the StringBuilder.
If you need it, you can build a DOM tree from the "copy" you created.
With this you can fall back on standard frameworks: one for conventional SAX parsing and one for conventional DOM building. Only the custom serialization might require some hacking.
It also helps if you know the tree boundaries in advance (book elements in your example); otherwise further processing would be required.
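A sketch of how the copy step can be done with JDK classes alone: position a StAX reader on the next book start tag and let an identity Transformer copy just that subtree into a DOMResult. The file name is assumed, and this relies on the common idiom that transforming a StAXSource positioned on a START_ELEMENT consumes exactly that element's subtree:

import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import org.w3c.dom.Node;

public class BookStream {
    public static void main(String[] args) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new FileInputStream("books.xml")); // assumed name
        Transformer copier = TransformerFactory.newInstance().newTransformer();
        while (reader.hasNext()) {
            // Skip forward to the next <book> start tag.
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "book".equals(reader.getLocalName())) {
                // Copy only this element's subtree into a small DOM.
                DOMResult result = new DOMResult();
                copier.transform(new StAXSource(reader), result);
                processBook(result.getNode().getFirstChild());
            }
        }
        reader.close();
    }

    private static void processBook(Node book) {
        // Navigate the mini-tree with the ordinary DOM API here.
        System.out.println("got element: " + book.getNodeName());
    }
}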
The only way to parse part of the document without fully loading it into memory is to use a SAX parser.
Here are some official SUN examples of how to use SAX: http://java.sun.com/developer/codesamples/xml.html#sax

ignore some XML tags in SAX

I'm parsing an XML document using SAX in Java.
I'm working with XML that describes research publications in different fields.
Among others, there are elements like "abstract" that briefly describe what the research paper is about. Basic HTML formatting is allowed in that field, but I don't want SAX to treat the HTML tags (like i, b, u, sub, sup and so on) as real XML tags and fire startElement() and endElement() events on those elements.
Is there a way to tell SAX to ignore some predefined set of XML tags and pass their XML code as-is to the characters() method?
I suspect not, without some work. I would perhaps slot in different SAX handlers as you encounter different elements, and push/pop them off a stack. So when you encounter an <abstract> element, you slot in a new handler that the SAX parser delegates to, and that is intelligent enough to process your HTML elements as you require. Not a trivial solution, I'm afraid.
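A simplified sketch of that idea, collapsed into a single handler with a pass-through mode instead of a full handler stack (the abstract element name comes from the question; the HTML re-serialization is deliberately naive and ignores attributes):

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class AbstractAwareHandler extends DefaultHandler {
    private boolean inAbstract = false;
    private final StringBuilder abstractText = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        if ("abstract".equals(qName)) {
            inAbstract = true;           // enter pass-through mode
            abstractText.setLength(0);
        } else if (inAbstract) {
            // Re-serialize the HTML tag as plain text instead of structure.
            abstractText.append('<').append(qName).append('>');
        }
        // ... normal structural handling for other elements goes here ...
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("abstract".equals(qName)) {
            inAbstract = false;          // leave pass-through mode
            System.out.println("abstract: " + abstractText);
        } else if (inAbstract) {
            abstractText.append("</").append(qName).append('>');
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inAbstract) {
            abstractText.append(ch, start, length);
        }
    }
}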
