I am attempting to create a script that wraps a Groovy class that will take the following arguments:
An input XML file to update.
An arbitrary snippet to insert into the input file (might not even be well-formed in and of itself; it would become part of a larger well-formed document).
XPath for the marker element (used for positioning the snippet in #2).
An action (insert before, insert after, append child).
Optional output XML file.
I'm at a loss for finding an API that will allow me to:
Find a node by XPath and
Cram XML from a String adjacent to the node.
Does anyone have some ideas for technologies that I can combine to achieve this effect? Small examples would be especially useful.
If the snippet is well-formed, most DOM implementations I've seen support the DocumentFragment node type, and some add a non-standard method that lets you inject DOM nodes from a string.
EDIT: Quick Google search throws up some JavaDocs: http://download.oracle.com/javase/1.4.2/docs/api/org/w3c/dom/DocumentFragment.html
IIRC the api works like this (pseudo code):
parent = find_parent_node_of_fragment(document);
fragment = document.createDocumentFragment();
fragment.appendXML("<my>xmlstring</my>");
parent.appendChild(fragment);
If you don't have this luxury or if your string is not well formed there is the option to inject CDATA.
If you can't make do with injecting CDATA (because you essentially want to affect nodes that follow, for instance the new node must become the parent of old nodes which will be enclosed in the new document), you could try an XSLT transformation.
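The DocumentFragment idea can be sketched with the JDK's built-in DOM and XPath APIs. Java's org.w3c.dom.DocumentFragment has no appendXML method, so this sketch swaps in a different technique: it parses the snippet inside a throwaway root element and imports the resulting nodes. The class name SnippetInserter, the "wrapper" element, and the action names "before", "after" and "append" are assumptions made for illustration. The snippet may have several top-level nodes, but its tags must still be balanced.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SnippetInserter {

    /** action is "before", "after" or "append" (names chosen for this sketch). */
    static String insert(String xml, String snippet, String markerXPath, String action)
            throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // Wrap the snippet in a throwaway root so several top-level nodes still parse.
        Document wrapped = builder.parse(
                new InputSource(new StringReader("<wrapper>" + snippet + "</wrapper>")));

        // Copy the wrapper's children into a fragment owned by the target document.
        DocumentFragment fragment = doc.createDocumentFragment();
        for (Node n = wrapped.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            fragment.appendChild(doc.importNode(n, true));
        }

        // Locate the marker element.
        Node marker = (Node) XPathFactory.newInstance().newXPath()
                .evaluate(markerXPath, doc, XPathConstants.NODE);
        if (marker == null) {
            throw new IllegalArgumentException("No node matches " + markerXPath);
        }

        // "before" and "after" assume the marker is not the document element.
        switch (action) {
            case "before":
                marker.getParentNode().insertBefore(fragment, marker);
                break;
            case "after":
                // A null reference node makes insertBefore append at the end.
                marker.getParentNode().insertBefore(fragment, marker.getNextSibling());
                break;
            case "append":
                marker.appendChild(fragment);
                break;
            default:
                throw new IllegalArgumentException("Unknown action " + action);
        }

        // Serialize the updated document back to a string.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(insert("<root><a>x</a><c>y</c></root>",
                "<b>1</b><b>2</b>", "/root/c", "before"));
    }
}
```

Since Groovy can call Java classes directly, the same calls should work unchanged from a Groovy wrapper script.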
I suspect what I was trying to do is non-trivial and would have required a much larger framework than what I had time for. I ended up abandoning this endeavor.
Related
I have a XML file with several <text> nodes. Each text node has attributes named "top" and "left" and has a child node named <textValue>. This XML file basically represents the coordinate positions of text in a PDF file that has been converted to XML using a PDF2HTML converter.
I want to parse the XML file using conditions such as:
1. Give me all the consecutive nodes in the XML file that have the same "top" attribute. Here, I am trying to get all nodes that have the same "top" attribute but may have a different "left" attribute value.
Which XML parser supports these kinds of queries? I am familiar with basic DOM parser that just allows me to iterate through the elements and access its attribute value. Is there any XML parser that allows conditional queries to be written on top of it?
Thanks
You'll want to investigate XPath, which can do exactly this. Java provides robust, built-in support for this, and can operate on top of a DOM tree. See How to read XML using XPath in Java for one example on how to get started with this.
You are not looking for a parser, you need a query processor. Any XQuery-compatible processor can do that. Just use a pair of nested loops in your XQuery.
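As a rough illustration of the XPath route with the JDK alone: the sketch below selects every <text> element in document order and then groups consecutive elements that share a "top" value in plain Java, since XPath 1.0 has no grouping construct. The <page> root element and the class name SameTopGroups are assumptions; only <text>, <textValue>, "top" and "left" come from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SameTopGroups {

    /** Returns the text values, grouped into runs of consecutive nodes with an equal "top". */
    static List<List<String>> group(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // "//text[@top='10']" would select a single known row instead.
        NodeList texts = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//text", doc, XPathConstants.NODESET);

        List<List<String>> groups = new ArrayList<>();
        String previousTop = null;
        for (int i = 0; i < texts.getLength(); i++) {
            Element text = (Element) texts.item(i);
            String top = text.getAttribute("top");
            // Start a new group whenever the "top" value changes.
            if (!top.equals(previousTop)) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(text.getTextContent().trim());
            previousTop = top;
        }
        return groups;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<page>"
                + "<text top=\"10\" left=\"5\"><textValue>a</textValue></text>"
                + "<text top=\"10\" left=\"50\"><textValue>b</textValue></text>"
                + "<text top=\"20\" left=\"5\"><textValue>c</textValue></text>"
                + "</page>";
        System.out.println(group(xml));
    }
}
```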
I am new to this validation process in Java...
-->XML file named Validation Limits
-->Structure of the XML
<parameter> </parameter>
<lowerLimit> </lowerLimit>
<upperLimit> </upperLimit>
<enable> </enable>
-->Depending on the enable status, 'true' or 'false', I must perform the validation process for the respective parameter
-->What could be the best possible method to perform this operation?
I have parsed the XML with DOM (I forgot to mention this earlier) and stored the values in arrays, but this is complicated by a lot of cross-referencing from one array to another. Any better method that could replace the array procedure would be helpful.
Thank you in advance.
Try using a DOM or SAX parser; they will do the parsing for you. You can find some good, free tutorials on the internet.
The difference between DOM and SAX is as follows: DOM loads the XML into a tree structure which you can browse through (i.e. the whole XML is loaded), whereas SAX parses the document and triggers events (calls methods) in the process. Both have advantages and disadvantages, but personally, for reasonably sized XML files, I would use DOM.
So, in your case: use DOM to get a tree of your XML document, locate the attribute, see other elements depending on it.
Also, you can achieve this in pure XML, using XML Schema, although this might be too much for simple needs.
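To replace the parallel arrays, one option is to read each entry into a small object and key the objects by parameter name. This is only a sketch: the <limits> and <limit> wrapper elements, the numeric type of the limits, and the names ValidationLimits, Limit and accepts are assumptions, since the question shows only the four child elements.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ValidationLimits {

    /** One entry of the limits file; replaces four parallel arrays. */
    static final class Limit {
        final double lower;
        final double upper;
        final boolean enabled;

        Limit(double lower, double upper, boolean enabled) {
            this.lower = lower;
            this.upper = upper;
            this.enabled = enabled;
        }

        /** A disabled limit accepts every value, i.e. validation is skipped. */
        boolean accepts(double value) {
            return !enabled || (value >= lower && value <= upper);
        }
    }

    /** Reads every (assumed) <limit> entry into a map keyed by parameter name. */
    static Map<String, Limit> load(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList entries = doc.getElementsByTagName("limit");
        Map<String, Limit> limits = new HashMap<>();
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            limits.put(text(entry, "parameter"), new Limit(
                    Double.parseDouble(text(entry, "lowerLimit")),
                    Double.parseDouble(text(entry, "upperLimit")),
                    Boolean.parseBoolean(text(entry, "enable"))));
        }
        return limits;
    }

    /** Text of the first descendant element with the given tag name. */
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<limits><limit><parameter>temperature</parameter>"
                + "<lowerLimit>0</lowerLimit><upperLimit>100</upperLimit>"
                + "<enable>true</enable></limit></limits>";
        System.out.println(load(xml).get("temperature").accepts(150));
    }
}
```

Looking a parameter up by name then replaces the index bookkeeping between arrays.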
Hi all,
I am a beginner in Java, so I hope this question is an easy one. If I have an XML file, how can I parse it and get only the elements within a specific tag?
For example, if the XML file looks like this:
<date>2005-10-31</date>
<number>12345</number>
<purchased-by>
<name>My name</name>
<address>My address</address>
</purchased-by>
<order-items>
<item>
<code>687</code>
<type>CD</type>
<label>Some music</label>
</item>
<item>
<code>129851</code>
<type>DVD</type>
<label>Some video</label>
</item>
</order-items>
From this XML I want to parse only the elements within the order-items tag.
Is there any generic way to do this? Please let me know.
Thanks
As said in the comments, a short Google search should bring you to the Sun examples on how to do this. Basically, you have two main XML parsing methods in Java:
SAX, where you use a handler to grab only what you want from your XML and ditch the rest
DOM, which parses your whole file and allows you to grab all elements in a more tree-like fashion.
Another very useful XML parsing method, albeit a little more recent than these and included in the JRE only since Java 6, is StAX. StAX was conceived as a middle ground between the tree-based approach of DOM and the event-based approach of SAX. It is quite similar to SAX in that parsing very large documents is easy, but in this case the application "pulls" info from the parser, instead of the parser "pushing" events to the application. You can find more explanation on this subject here.
So, depending on what you want to achieve, you can use one of these approaches.
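To make the "pull" style concrete, here is a minimal StAX sketch that collects the <label> values found inside <order-items> and ignores everything else. The <order> root element and the class name OrderItemsStax are assumptions; the question's sample has no root element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class OrderItemsStax {

    /** Pulls events one at a time and keeps only <label> text found inside <order-items>. */
    static List<String> labels(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> labels = new ArrayList<>();
        boolean insideOrderItems = false;
        while (reader.hasNext()) {
            int event = reader.next(); // the application asks for the next event
            if (event == XMLStreamConstants.START_ELEMENT) {
                if ("order-items".equals(reader.getLocalName())) {
                    insideOrderItems = true;
                } else if (insideOrderItems && "label".equals(reader.getLocalName())) {
                    // getElementText() reads up to the matching end tag.
                    labels.add(reader.getElementText());
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && "order-items".equals(reader.getLocalName())) {
                insideOrderItems = false;
            }
        }
        reader.close();
        return labels;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order><number>12345</number><order-items>"
                + "<item><code>687</code><type>CD</type><label>Some music</label></item>"
                + "<item><code>129851</code><type>DVD</type><label>Some video</label></item>"
                + "</order-items></order>";
        System.out.println(labels(xml));
    }
}
```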
If you want to limit the parsing operation itself to the <order-items> element, then you'll have to use SAX. A SAX parser visits all elements of the input "file" (or stream), and you can define a handler that ignores anything that is not <order-items> or one of its descendants. The handler can then build a result (for example a Document) containing only these elements.
If the xml documents are rather small and performance is not a limiting factor, then simply parse the whole document (that's a 2-liner) and use XPath expressions to select the correct nodes.
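A sketch of the parse-then-query approach using only JDK classes. It assumes the sample is wrapped in an <order> root element (the question's XML shows none); the class name OrderItemsXPath is made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OrderItemsXPath {

    /** Returns one "code type label" line per <item> under <order-items>. */
    static String describeItems(String xml) throws Exception {
        // Parse the whole document into a DOM tree.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Select only the items below order-items.
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList items = (NodeList) xpath.evaluate(
                "//order-items/item", doc, XPathConstants.NODESET);

        StringBuilder lines = new StringBuilder();
        for (int i = 0; i < items.getLength(); i++) {
            Node item = items.item(i);
            // Relative expressions are evaluated against the current item.
            lines.append(xpath.evaluate("code", item)).append(' ')
                 .append(xpath.evaluate("type", item)).append(' ')
                 .append(xpath.evaluate("label", item)).append('\n');
        }
        return lines.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order><number>12345</number><order-items>"
                + "<item><code>687</code><type>CD</type><label>Some music</label></item>"
                + "<item><code>129851</code><type>DVD</type><label>Some video</label></item>"
                + "</order-items></order>";
        System.out.print(describeItems(xml));
    }
}
```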
Use XPath. It lets you select nodes on their name and loads of other conditions. Very little code involved to setup.
IBM Example
It is a classic case for SAX. Register a handler that receives the tags and ignores everything other than order-items.
Apache Digester is probably a better way in general, but it is overkill for your specific task.
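A minimal SAX handler along these lines might look as follows. It tracks how deep the parser currently is inside <order-items> and records leaf values only while that depth is positive. The <order> root element, the class name and the "name=value" output format are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class OrderItemsSaxHandler extends DefaultHandler {

    private int depth = 0; // greater than 0 while inside <order-items>
    private final StringBuilder text = new StringBuilder();
    private final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (depth > 0 || "order-items".equals(qName)) {
            depth++;
        }
        text.setLength(0); // start collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth > 0) {
            String value = text.toString().trim();
            if (!value.isEmpty()) {
                values.add(qName + "=" + value); // leaf element with text
            }
            text.setLength(0);
            depth--;
        }
    }

    /** Parses the document and returns the leaf values found inside <order-items>. */
    static List<String> parse(String xml) throws Exception {
        OrderItemsSaxHandler handler = new OrderItemsSaxHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<order><number>12345</number><order-items>"
                + "<item><code>687</code><type>CD</type></item></order-items></order>"));
    }
}
```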
You can use a DOM Parser to build a Document and then extract whatever elements you want using the getElementsByTagName method.
Here is some sample code to help you get started:
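import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// parse the file and build a Document
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File("file.xml"));
// get the list of elements called order-items
NodeList orderItemsNodes = doc.getElementsByTagName("order-items");
// iterate over the elements; the children of each one are its <item> elements
for (int i = 0; i < orderItemsNodes.getLength(); i++) {
    Node orderItemNode = orderItemsNodes.item(i);
    System.out.println(orderItemNode.getTextContent());
}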
It honestly depends on how you are planning to use the item data. If you want to parse it into objects and then work with them, I would use JAXB unmarshalling, but if you just want to strip string values from the code, type, and label child elements of each item element, you may consider simple regex matching on the XML string: match the content of each item tag, then match each child element and extract its value.
What I would really like is a streaming API that works sort of like StAX, and sort of like DOM/JDom.
It would be streaming in the sense that it would be very lazy and not read things in until needed. It would also be streaming in the sense that it would read everything forwards (but not backwards).
Here's what code that used such an API would look like.
URL url = ...
XMLStream xml = XXXFactory(url.inputStream()) ;
// process each <book> element in this document.
// the <book> element may have subnodes.
// You get a DOM/JDOM like tree rooted at the next <book>.
while (xml.hasContent()) {
XMLElement book = xml.getNextElement("book");
processBook(book);
}
Does anything like this exist?
You could do the following:
Scan the XML file using SAX or StAX and immediately serialize everything back into a StringBuilder, i.e. create your own copy of the XML file.
If you encounter an endElement and you know you don't need the subtree you just parsed, clear the StringBuilder.
If you need it, you can build a DOM tree from the "copy" you created.
With this you can fall back to standard frameworks, one for conventional SAX parsing and one for conventional DOM building. Only the custom serialization might require some hacking.
It also helps if you know the tree boundaries in advance (the book elements in your example). Otherwise further processing would be required.
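A variation on this idea avoids the hand-written serialization step: position a StAX reader on each <book> start tag and let an identity Transformer copy just that subtree into a DOM via StAXSource. This is a different technique from the StringBuilder copy described above, sketched under the assumption that every book has a <title> child and sits under a <library> root; both names, and the class name BookSubtrees, are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class BookSubtrees {

    /** Streams through the input and builds a small DOM tree for one <book> at a time. */
    static List<String> titles(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        Transformer copier = TransformerFactory.newInstance().newTransformer(); // identity copy
        List<String> titles = new ArrayList<>();

        while (reader.hasNext()) {
            if (reader.isStartElement() && "book".equals(reader.getLocalName())) {
                // Copy only the current <book> subtree into a fresh DOM document.
                DOMResult subtree = new DOMResult();
                copier.transform(new StAXSource(reader), subtree);
                Element book = ((Document) subtree.getNode()).getDocumentElement();
                titles.add(book.getElementsByTagName("title").item(0).getTextContent());
                // The copy may leave the reader at or just past </book>, so
                // re-check the current event instead of advancing blindly.
                continue;
            }
            reader.next();
        }
        reader.close();
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titles("<library><book><title>A</title></book>"
                + "<book><title>B</title></book></library>"));
    }
}
```

Only one book is held in memory as a tree at any time, which is the lazy, forward-only behaviour the question asks for.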
The way to parse part of the document without fully loading it into memory is to use a streaming parser such as SAX.
Here are some official SUN examples of how to use SAX: http://java.sun.com/developer/codesamples/xml.html#sax
I want to store some fragments of an XML file in separate files.
It seems there is no straightforward way to do it:
Reading the chunks fails.
I always get the exception
"javax.xml.transform.TransformerException: org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed."
It only works when there is only ONE 'root' element (which is not
the root element in the normal sense).
I understand that XML with multiple 'roots' is not well-formed,
but it should be treated as a chunk.
Please, before suggesting some work-around-solutions, tell me:
Are XML chunks valid at all?
And IF so, can they be read out using standard JDK6 API?
Test code:
String testChunk1 = "<e1>text</e1>";
String testChunk2 = "<e1>text</e1><e2>text</e2>";
// the following doesn't work with 'testChunk2'
StringReader sr = new StringReader(testChunk1);
StringWriter sw = new StringWriter();
TransformerFactory.newInstance().newTransformer().transform(
new StreamSource(sr), new StreamResult(sw));
System.out.println(sw.toString());
The W3C have been working towards defining a standard for XML fragment interchange. I'm mentioning it not because it's a solution to your problem, but it's definitely relevant to see that there's discussion of how to handle such things.
In the .NET world you can work with XML fragments and, for example, validate them against a schema. This suggests that it is worth searching for similar support in the Java libraries.
If you want to transform such fragments with XSLT, a very common approach is to put a wrapper element around them, which can then act as the root of the DOM.
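Applied to the test code from the question, the wrapper approach could look like this. The element name "wrapper" and the class name ChunkWrapper are arbitrary choices for the sketch.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ChunkWrapper {

    /** Runs an identity transform over a chunk that may have several top-level elements. */
    static String transform(String chunk) throws Exception {
        // The synthetic root turns the chunk into a well-formed document.
        String wrapped = "<wrapper>" + chunk + "</wrapper>";

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(wrapped)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<e1>text</e1><e2>text</e2>"));
    }
}
```

The output still contains the wrapper element; a real stylesheet would typically match its children and leave the wrapper out.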
While I suppose there must be some way, perhaps kludgy, to do what you want, I am not aware of any way to do it. The standard XML parsers expect well-formed XML, as you're discovering.
If you want to store your XML as a number of separate fragments in different files, then probably the best way to do this is to create your own Reader or InputStream that actually (behind the scenes) reads all of the fragments in order, and then provide that wrapped Reader or InputStream to the transformer. That way, the XML parser sees a single XML document but you can store it however you want.
If you do something like this, the fragments (except for the very first) cannot start with the standard XML header:
<?xml version="1.0" encoding="UTF-8" ?>
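One way to sketch the "single stream behind the scenes" idea with JDK classes is java.io.SequenceInputStream, which chains several streams into one. The sketch below also brackets the fragments with a synthetic root element so the parser sees one well-formed document, which means no fragment at all may carry an XML declaration. The root name "fragments" and the class name are assumptions, and strings stand in for the separate files.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class FragmentConcatenation {

    /** Presents several fragment streams to the parser as one document. */
    static InputStream asSingleDocument(List<InputStream> fragments) {
        List<InputStream> parts = new ArrayList<>();
        parts.add(stream("<fragments>"));
        parts.addAll(fragments); // in real use: one FileInputStream per fragment file
        parts.add(stream("</fragments>"));
        return new SequenceInputStream(Collections.enumeration(parts));
    }

    private static InputStream stream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    /** Parses the combined stream and counts the top-level elements of all chunks. */
    static int countTopLevelElements(String... chunks) throws Exception {
        List<InputStream> fragments = new ArrayList<>();
        for (String chunk : chunks) {
            fragments.add(stream(chunk));
        }
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(asSingleDocument(fragments));
        NodeList children = doc.getDocumentElement().getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countTopLevelElements("<e1>text</e1>", "<e1>text</e1><e2>text</e2>"));
    }
}
```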
Please, before suggesting some work-around-solutions, tell me: Are XML chunks valid at all?
Not in their own right.
You can include them (served as XML external parsed entities) in other documents through methods such as an entity reference, and you can parse them as chunks into existing documents using methods such as DOM Level 3 LS's parseWithContext() (which Java doesn't give you, sorry), but they aren't documents so any interfaces that require a full document cannot accept them.
Transformer requires a full document as input because XSLT works on full documents, and would be confused by something that contained zero or more-than-one root element. The usual trick is to create a single root element by wrapping the document in start and end tags, but this does mean you can't have an XML declaration(*), as mentioned by Eddie.
(*: actually it's known as the ‘Text Declaration’ when included in an external parsed entity, but the syntax is exactly the same.)