I have an XML file with several <text> nodes. Each text node has attributes named "top" and "left" and has a child node named <textValue>. This XML file basically represents the coordinate positions of text in a PDF file that has been converted to XML using a PDF2HTML converter.
I want to parse the XML file using conditions such as:
1. Give me all the consecutive nodes in the XML file that have the same "top" attribute. Here, I am trying to get all nodes that have the same "top" attribute value but may have different "left" attribute values.
Which XML parser supports these kinds of queries? I am familiar with the basic DOM parser, which just lets me iterate through the elements and access their attribute values. Is there an XML parser that allows conditional queries to be written on top of it?
Thanks
You'll want to investigate XPath, which can do exactly this. Java provides robust, built-in support for this, and can operate on top of a DOM tree. See How to read XML using XPath in Java for one example on how to get started with this.
You are not looking for a parser; you need a query processor. Any XQuery-compatible processor can do that. Just use a pair of nested loops in your XQuery.
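As a rough illustration of the XPath route on top of a DOM tree, here is a minimal sketch using only the JDK's built-in javax.xml.xpath support. The class name ConsecutiveTopGroups and the method group are made up for this example, and the code assumes the elements are literally named text and carry a top attribute, as described in the question. The grouping of consecutive nodes is done in Java, since the XPath query only selects the nodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: names are illustrative, not from the original posts.
class ConsecutiveTopGroups {

    // Returns runs of adjacent <text> elements that share the same "top" value.
    static List<List<Element>> group(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Select every <text> element; the JDK implementation returns
            // the matches in document order.
            NodeList texts = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//text", doc, XPathConstants.NODESET);

            List<List<Element>> groups = new ArrayList<>();
            String previousTop = null;
            for (int i = 0; i < texts.getLength(); i++) {
                Element text = (Element) texts.item(i);
                String top = text.getAttribute("top");
                if (!top.equals(previousTop)) {
                    groups.add(new ArrayList<>()); // "top" changed: start a new run
                }
                groups.get(groups.size() - 1).add(text);
                previousTop = top;
            }
            return groups;
        } catch (Exception e) {
            // Parsing and XPath errors are wrapped to keep the sketch short.
            throw new IllegalStateException("Could not process XML", e);
        }
    }
}
```

If only one known coordinate is of interest, a predicate such as //text[@top='100'] selects those nodes directly, without any grouping code.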
I have two columns in a spreadsheet.
One column is an XPath expression used to get a value from an existing XML document.
The other column is an XPath expression from which I need to create my XSLT/output XML document. The value grabbed from the first column will be the value placed in the second column's element.
So for example if the second column has the XPath /A/B/C, I would create
<A>
  <B>
    <C><xsl:value-of select = "corresponding value from 1st column"/></C>
  </B>
</A>
If the next XPath is /A/B/D, I would add
<D><xsl:value-of select = "corresponding value from 1st column"/></D>
as a sibling of C.
I'm expected to create this output XML/XSLT structure by hand. However there are thousands of lines.
I'm looking for suggestions on how to do this programmatically in Java. I've never mixed Java/XML/XPath so maybe there are libraries that can help with this?
If it's an enormous undertaking I won't be able to justify it as opposed to just doing it by hand. If I can write something that gets me most of the way there I'd be happy.
Is this a pipe dream?
Sure, most Java XML libraries support retrieving DOM nodes via XPath quite painlessly. Often they use Jaxen as their backend, so make sure you have the Jaxen JAR on your classpath. See XPath support in JDOM, DOM4J, and XOM.
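For the generation side of the question, a sketch like the following may get most of the way there with nothing but the JDK's DOM classes (no Jaxen involved). It assumes every target path in the spreadsheet is a plain absolute path such as /A/B/C with no predicates, and that all paths share the same root element. The class name TemplateBuilder and its methods are hypothetical.

```java
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper: builds the nested output structure from spreadsheet rows.
class TemplateBuilder {
    static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";

    // Keys are target paths like "/A/B/C"; values are the source XPath expressions.
    static Document build(Map<String, String> targetToSource) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            for (Map.Entry<String, String> row : targetToSource.entrySet()) {
                Node current = doc;
                for (String step : row.getKey().split("/")) {
                    if (step.isEmpty()) {
                        continue; // skip the empty piece before the leading slash
                    }
                    current = findOrCreateChild(doc, current, step);
                }
                // The leaf gets an xsl:value-of pointing at the source expression.
                Element valueOf = doc.createElementNS(XSL_NS, "xsl:value-of");
                valueOf.setAttribute("select", row.getValue());
                current.appendChild(valueOf);
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build template", e);
        }
    }

    // Reuses an existing child element with this name, or appends a new one,
    // so that /A/B/D becomes a sibling of C under the same B.
    static Node findOrCreateChild(Document doc, Node parent, String name) {
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE && child.getNodeName().equals(name)) {
                return child;
            }
        }
        return parent.appendChild(doc.createElementNS(null, name));
    }
}
```

Passing an insertion-ordered map (for example a LinkedHashMap filled row by row) keeps the output elements in spreadsheet order. The resulting Document would still need to be wrapped in an xsl:stylesheet and xsl:template and serialized, which is left out here.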
I am new to validation in Java.
--> I have an XML file named Validation Limits.
--> Structure of the XML:
<parameter></parameter>
<lowerLimit></lowerLimit>
<upperLimit></upperLimit>
<enable></enable>
--> Depending on the enable status (true or false), I must perform the validation for the respective parameter.
--> What would be the best way to perform this operation?
I have parsed the XML with DOM (I forgot to mention this earlier) and stored the values in arrays, but this is complicated by the amount of cross-referencing from one array to another. Any better method that could replace the array approach would be helpful.
Thank you in advance.
Try using a DOM or SAX parser; they will do the parsing for you. You can find some good, free tutorials on the internet.
The difference between DOM and SAX is as follows: DOM loads the XML into a tree structure which you can browse through (i.e. the whole XML is loaded), whereas SAX parses the document and triggers events (calls methods) in the process. Both have advantages and disadvantages, but personally, for reasonably sized XML files, I would use DOM.
So, in your case: use DOM to get a tree of your XML document, locate the enable element, and read the other elements depending on its value.
Also, you can achieve this in pure XML, using XML Schema, although this might be too much for simple needs.
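One way to avoid the parallel arrays mentioned in the question is to read each entry into a small object and key the objects by parameter name. The sketch below assumes each group of parameter, lowerLimit, upperLimit and enable sits inside a wrapper element called limit. That wrapper name, the numeric limits, and the class names are assumptions, since the question does not show the full file.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical model: one Limit object per parameter instead of parallel arrays.
class ValidationLimits {

    static final class Limit {
        final double lower;
        final double upper;
        final boolean enabled;

        Limit(double lower, double upper, boolean enabled) {
            this.lower = lower;
            this.upper = upper;
            this.enabled = enabled;
        }

        // A disabled limit accepts everything; an enabled one checks the range.
        boolean accepts(double value) {
            return !enabled || (value >= lower && value <= upper);
        }
    }

    // Assumes each entry is wrapped in a <limit> element (not shown in the question).
    static Map<String, Limit> load(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList entries = doc.getElementsByTagName("limit");
            Map<String, Limit> limits = new HashMap<>();
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                limits.put(text(entry, "parameter"), new Limit(
                        Double.parseDouble(text(entry, "lowerLimit")),
                        Double.parseDouble(text(entry, "upperLimit")),
                        Boolean.parseBoolean(text(entry, "enable"))));
            }
            return limits;
        } catch (Exception e) {
            throw new IllegalStateException("Could not load validation limits", e);
        }
    }

    // Text of the first child element with the given tag name.
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }
}
```

Validation then becomes a single lookup, for example limits.get("temperature").accepts(25.0), with no index bookkeeping between arrays.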
I am attempting to create a script that wraps a Groovy class that will take the following arguments:
1. An input XML file to update.
2. An arbitrary snippet to insert into the input file (might not even be well-formed in and of itself; it would become part of a larger well-formed document).
3. XPath for the marker element (used for positioning the snippet in #2).
4. An action (insert before, insert after, append child).
5. Optional output XML file.
I'm at a loss for finding an API that will allow me to:
Find a node by XPath and
Cram XML from a String adjacent to the node.
Does anyone have some ideas for technologies that I can combine to achieve this effect? Small examples would be especially useful.
If the snippet is well-formed, you can use a DocumentFragment. The node type itself is standard DOM; what is non-standard is the ability, offered by some DOM implementations I've seen, to populate the fragment directly from a string.
EDIT: Quick Google search throws up some JavaDocs: http://download.oracle.com/javase/1.4.2/docs/api/org/w3c/dom/DocumentFragment.html
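IIRC the API works like this (pseudocode; appendXML is not part of the org.w3c.dom.DocumentFragment interface linked above, so treat it as implementation-specific):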
parent = find_parent_node_of_fragment(document);
fragment = document.createDocumentFragment();
fragment.appendXML("<my>xmlstring</my>");
parent.appendChild(fragment);
If you don't have this luxury, or if your string is not well-formed, there is the option to inject CDATA.
If you can't make do with injecting CDATA (because you essentially want to affect nodes that follow, for instance the new node must become the parent of old nodes which will be enclosed in the new document), you could try an XSLT transformation.
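With the standard Java DOM API, the same idea can be sketched by parsing the snippet separately, importing its nodes into a DocumentFragment, and inserting that fragment relative to a node found by XPath. The example below assumes the snippet becomes well-formed once it is wrapped in a temporary root element. The class SnippetInserter, the Action enum, and the wrapper element name are all invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: names are illustrative, not an existing library API.
class SnippetInserter {
    enum Action { INSERT_BEFORE, INSERT_AFTER, APPEND_CHILD }

    static Document insert(String xml, String snippet, String markerXPath, Action action) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // Locate the marker element; NODE yields null when nothing matches.
            Node marker = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(markerXPath, doc, XPathConstants.NODE);
            if (marker == null) {
                throw new IllegalArgumentException("No node matches " + markerXPath);
            }

            // Wrap the snippet in a temporary root so several top-level nodes can be parsed.
            Document parsed = builder.parse(
                    new InputSource(new StringReader("<wrapper>" + snippet + "</wrapper>")));
            DocumentFragment fragment = doc.createDocumentFragment();
            NodeList children = parsed.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                // importNode copies the node into the target document.
                fragment.appendChild(doc.importNode(children.item(i), true));
            }

            switch (action) {
                case INSERT_BEFORE:
                    marker.getParentNode().insertBefore(fragment, marker);
                    break;
                case INSERT_AFTER:
                    // A null reference node makes insertBefore append at the end.
                    marker.getParentNode().insertBefore(fragment, marker.getNextSibling());
                    break;
                case APPEND_CHILD:
                    marker.appendChild(fragment);
                    break;
            }
            return doc;
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalStateException("Could not insert snippet", e);
        }
    }
}
```

Writing the result to the optional output file could then be done with an identity javax.xml.transform.Transformer. A snippet that is still not well-formed after wrapping would fail at the second parse, so that case is not covered here.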
I suspect what I was trying to do is non-trivial and would have required a much larger framework than what I had time for. I ended up abandoning this endeavor.
Possible Duplicate:
Best method to parse various custom XML documents in Java
Hi all,
I am a beginner in Java, so I hope this question is an easy one. If I have an XML file, how can I parse it and get only the elements within a specific tag?
For example, if the XML file looks like this:
<date>2005-10-31</date>
<number>12345</number>
<purchased-by>
  <name>My name</name>
  <address>My address</address>
</purchased-by>
<order-items>
  <item>
    <code>687</code>
    <type>CD</type>
    <label>Some music</label>
  </item>
  <item>
    <code>129851</code>
    <type>DVD</type>
    <label>Some video</label>
  </item>
</order-items>
From this XML, I want to parse only the elements within the order-items tag.
Is there a generic way to do this? Please let me know.
Thanks
As said in the comments, a short Google search should bring you to the Sun examples on how to do this. Basically, you have two main XML parsing methods in Java:
SAX, where you use a handler to grab only what you want from your XML and ditch the rest
DOM, which parses your whole file and lets you grab all elements in a more tree-like fashion.
Another very useful XML parsing method, albeit a little more recent than these and included in the JRE only since Java 6, is StAX. StAX was conceived as a middle ground between the tree-based approach of DOM and the event-based approach of SAX. It is quite similar to SAX in that parsing very large documents is easy, but in this case the application "pulls" information from the parser, instead of the parser "pushing" events to the application. You can find more explanation on this subject here.
So, depending on what you want to achieve, you can use one of these approaches.
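To make the "pull" style concrete, here is a minimal StAX sketch using the JDK's javax.xml.stream API. It collects the label text of every item inside order-items. The class name OrderItemsStax is invented, and the code assumes the fragment from the question is enclosed in some root element so that it is a well-formed document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper showing the pull-parsing loop.
class OrderItemsStax {

    static List<String> labels(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            List<String> labels = new ArrayList<>();
            boolean insideOrderItems = false;
            // The application asks for the next event instead of being called back.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("order-items")) {
                        insideOrderItems = true;
                    } else if (insideOrderItems && name.equals("label")) {
                        // getElementText reads up to the matching end tag.
                        labels.add(reader.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("order-items")) {
                    insideOrderItems = false;
                }
            }
            reader.close();
            return labels;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read XML", e);
        }
    }
}
```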
If you want to limit the parsing operation itself to the <order-items> element, then you'll have to use SAX. A SAX parser visits all elements of the input "file" (or stream), and you can have your handler ignore anything that is not <order-items> or one of its children. The handler can then build a Document containing only these elements.
If the XML documents are rather small and performance is not a limiting factor, then simply parse the whole document (that's a 2-liner) and use XPath expressions to select the correct nodes.
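The parse-then-query approach might look roughly like this with the JDK's DOM and XPath classes. The class name OrderItemsXPath is made up, and the expression assumes the item and label element names from the sample in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: parse everything, then select with XPath.
class OrderItemsXPath {

    static List<String> itemLabels(String xml) {
        try {
            // Parsing the whole document into a DOM tree.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Only nodes below <order-items> match this expression.
            NodeList labels = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//order-items/item/label", doc, XPathConstants.NODESET);

            List<String> result = new ArrayList<>();
            for (int i = 0; i < labels.getLength(); i++) {
                result.add(labels.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not query XML", e);
        }
    }
}
```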
Use XPath. It lets you select nodes by name and by loads of other conditions. Very little code is needed to set it up.
IBM Example
It is a classic case for SAX. Register a handler that receives the tags and ignores everything other than order-items.
A probably better way is to use Apache Digester, but it is overkill for your specific task.
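A handler along those lines could be sketched as follows with the JDK's SAX classes. It keeps a flag for "currently inside order-items" and collects the text of each code element it sees there. The class name OrderItemsHandler and the choice to collect code values are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: ignores everything outside <order-items>.
class OrderItemsHandler extends DefaultHandler {
    private final List<String> codes = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean insideOrderItems;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("order-items")) {
            insideOrderItems = true;
        }
        text.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (insideOrderItems) {
            text.append(ch, start, length); // may be called several times per element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("order-items")) {
            insideOrderItems = false;
        } else if (insideOrderItems && qName.equals("code")) {
            codes.add(text.toString().trim());
        }
    }

    // Convenience entry point: parses the XML string and returns the collected codes.
    static List<String> codes(String xml) {
        try {
            OrderItemsHandler handler = new OrderItemsHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.codes;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

The parser still reads the whole input; the handler simply does no work for elements outside order-items.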
You can use a DOM Parser to build a Document and then extract whatever elements you want using the getElementsByTagName method.
Here is some sample code to help you get started:
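import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// parse the file and build a Document
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File("file.xml"));
// get the list of elements called order-items
NodeList orderItemsNodes = doc.getElementsByTagName("order-items");
// iterate over the elements
for (int i = 0; i < orderItemsNodes.getLength(); i++) {
    Node orderItemNode = orderItemsNodes.item(i);
    // work with orderItemNode here, e.g. via orderItemNode.getChildNodes()
}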
It honestly depends on how you are planning to use the item data. If you want to parse it into objects and then work with them, I would use JAXB unmarshalling. If you just want to extract the string values of the code, type, and label child elements of each item element, you might consider simple regex matching on the XML string: match the content of each item tag, then match each child element and extract its value.
I'm looking into how I can get values from specific XML nodes in an XML file that I have. In my application, I have the entire XML file in a string, and I want to grab the specific information from there. I've heard a little bit about DOM and SAX, but I don't exactly know where to start. Any help?
One of the easiest ways is to use XPath. Here's a tutorial.
You can either use XPath (example) or you can use DOM or SAX (as you mentioned). You can view my answer here (how to retrieve element value of XML using Java?) on SO.
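Since the XML in the question is already held in a string, one compact option is to hand that string to the JDK's XPath API through a StringReader. This is only a sketch: the class name XPathOnString and the method valueOf are hypothetical, and the expression passed in depends on the actual document.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates one XPath expression against XML held in a String.
class XPathOnString {

    // Returns the string value of the first matching node, or "" if nothing matches.
    static String valueOf(String xml, String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }
}
```

Each call parses the string again, so for many lookups against the same document it is probably better to build the DOM once and evaluate the expressions against it.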
Well, there is also XStream:
http://x-stream.github.io/index.html
It lets you go in both directions (object to XML, and XML to object).
Here is the "two minutes tutorial":
http://x-stream.github.io/tutorial.html