I want to store some fragments of an XML file in separate files.
It seems there is no straightforward way to do it: reading the chunks fails.
I always get the exception
"javax.xml.transform.TransformerException: org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed."
It only works when there is only ONE 'root' element (which is not
the root element in the normal sense).
I understand that XML with multiple 'roots' is not well-formed,
but it should be treated as a chunk.
Please, before suggesting some work-around-solutions, tell me:
Are XML chunks valid at all?
And IF so, can they be read using the standard JDK 6 API?
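Test code:
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
String testChunk1 = "<e1>text</e1>";
String testChunk2 = "<e1>text</e1><e2>text</e2>";
// this works with 'testChunk1' but fails with 'testChunk2'
StringReader sr = new StringReader(testChunk1);
StringWriter sw = new StringWriter();
TransformerFactory.newInstance().newTransformer().transform(
        new StreamSource(sr), new StreamResult(sw));
System.out.println(sw.toString());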
The W3C have been working towards defining a standard for XML fragment interchange. I'm mentioning it not because it solves your problem, but because it shows that there has been discussion of how to handle such things.
In the .NET world you can work with XML fragments and, for example, validate them against a schema. This suggests that it is worth searching for similar support in the Java libraries.
If you want to transform such fragments with XSLT, a very common approach is to put a wrapper element around them, which can then act as the root of the DOM.
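A minimal Java sketch of the wrapper approach, applied to the question's testChunk2. The wrapper name chunk-wrapper and the class and method names are illustrative assumptions, not part of any API; the identity Transformer is the same one the question uses.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ChunkWrapper {
    // Hypothetical wrapper name; it must not clash with elements used inside the chunks.
    static final String WRAPPER = "chunk-wrapper";

    /** Wraps a multi-root chunk in one element so the parser sees a single document. */
    static String transformChunk(String chunk) throws TransformerException {
        String wrapped = "<" + WRAPPER + ">" + chunk + "</" + WRAPPER + ">";
        Transformer t = TransformerFactory.newInstance().newTransformer(); // identity transform
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter sw = new StringWriter();
        t.transform(new StreamSource(new StringReader(wrapped)), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Without the wrapper this input fails; with it, both <e1> and <e2> come through.
        System.out.println(transformChunk("<e1>text</e1><e2>text</e2>"));
    }
}
```

The output still carries the wrapper element; if you need the bare chunk back, an XSLT template that copies only the wrapper's children is one way to drop it.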
While I suppose there must be some way, perhaps kludgy, to do what you want, I am not aware of any way to do it. The standard XML parsers expect well-formed XML, as you're discovering.
If you want to store your XML as a number of separate fragments in different files, then probably the best way to do this is to create your own Reader or InputStream that actually (behind the scenes) reads all of the fragments in order, and then provide that wrapped Reader or InputStream to the transformer. That way, the XML parser sees a single XML document but you can store it however you want.
If you do something like this, the fragments (except for the very first) cannot start with the standard XML header:
<?xml version="1.0" encoding="UTF-8" ?>
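The concatenating-stream idea can be sketched with java.io.SequenceInputStream, which reads a list of streams back to back. This is only a sketch under assumptions: the root element name is made up, every fragment is UTF-8 without its own declaration (as noted above), and in-memory streams stand in for the FileInputStreams you would open on the fragment files.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class FragmentConcatenation {
    static InputStream bytes(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    /** Presents "<root>", each fragment in order, and "</root>" as one stream. */
    static InputStream asOneDocument(List<InputStream> fragments) {
        List<InputStream> parts = new ArrayList<>();
        parts.add(bytes("<root>"));   // hypothetical root element name
        parts.addAll(fragments);      // e.g. one FileInputStream per fragment file
        parts.add(bytes("</root>"));
        return new SequenceInputStream(Collections.enumeration(parts));
    }

    /** Runs the identity transform on the combined stream, as in the question. */
    static String identityTransform(InputStream in) throws Exception {
        StringWriter sw = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new StreamSource(in), new StreamResult(sw));
        return sw.toString();
    }
}
```

The parser only ever sees one stream, so the fragment files themselves never need to be well-formed documents.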
Please, before suggesting some work-around-solutions, tell me: Are XML chunks valid at all?
Not in their own right.
You can include them (served as XML external parsed entities) in other documents through methods such as an entity reference, and you can parse them as chunks into existing documents using methods such as DOM Level 3 LS's parseWithContext() (which Java's built-in parser doesn't support, sorry), but they aren't documents, so any interface that requires a full document cannot accept them.
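A sketch of the entity-reference route using only the standard JDK parser: a tiny host document declares the chunk file as an external parsed entity and references it inside its single root. The host document, the entity name chunk and the class name are assumptions for illustration; it also relies on the default factory settings resolving external entities, which hardened configurations (advisable for untrusted input) switch off.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ExternalEntityChunk {
    /** Parses a small host document that pulls the chunk file in as an external parsed entity. */
    static Document parseWithChunk(Path chunkFile) throws Exception {
        String host = "<!DOCTYPE root [<!ENTITY chunk SYSTEM \"" + chunkFile.toUri() + "\">]>"
                + "<root>&chunk;</root>";
        // Default JDK settings expand external general entities; hardened settings refuse them.
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(host)));
    }

    public static void main(String[] args) throws Exception {
        Path chunk = Files.createTempFile("chunk", ".xml");
        // Two top-level elements: not a document, but a valid external parsed entity.
        Files.write(chunk, "<e1>text</e1><e2>text</e2>".getBytes(StandardCharsets.UTF_8));
        Document doc = parseWithChunk(chunk);
        System.out.println(doc.getElementsByTagName("e2").getLength());
    }
}
```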
Transformer requires a full document as input because XSLT works on full documents, and would be confused by something that contained zero or more-than-one root element. The usual trick is to create a single root element by wrapping the document in start and end tags, but this does mean you can't have an XML declaration(*), as mentioned by Eddie.
(*: actually it's known as the ‘Text Declaration’ when included in an external parsed entity, but the syntax is exactly the same.)
Related
I need to create a copy of an XML file in memory using Java and edit it in memory without affecting the original. After making changes to this in-memory XML, I need to pass it as input to a function. What is the appropriate approach? Please help me.
You can use Java's built-in API for XML parsing:
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
File file = new File("xml_file_name");
Document doc = builder.parse(file);
and then edit the Document in memory before sending it to your designated function.
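A sketch that completes the snippet above: parse, edit the in-memory Document, and serialize it for the receiving function. The edited attribute is a hypothetical example edit, and returning a String assumes the function takes XML text; pass the Document itself if it accepts one.

```java
import java.io.File;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

class InMemoryEdit {
    /** Parses the file into a DOM; the Document is an independent in-memory copy. */
    static Document load(File file) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(file);
    }

    /** Example edit (hypothetical attribute): only the in-memory tree changes. */
    static void edit(Document doc) {
        doc.getDocumentElement().setAttribute("edited", "true");
    }

    /** Serializes the edited tree for a function that expects XML text. */
    static String toXml(Document doc) throws Exception {
        StringWriter sw = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(sw));
        return sw.toString();
    }
}
```

The file on disk is never written, so the original stays untouched however much the Document is changed.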
Do what you wrote:
Read the file.
Write it to another file.
Edit that other file.
Pass it to the function. Here you have to decide if it's better to pass a file or a path.
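The four steps above can be sketched with java.nio.file; the temporary-file naming and the attribute set in editCopy are assumptions for illustration, and step 4 is simply handing the returned Path (or path.toFile()) to your function.

```java
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

class CopyThenEdit {
    /** Steps 1-2: read the original and write it to a temporary copy. */
    static Path copyOf(Path original) throws Exception {
        Path copy = Files.createTempFile("copy-", ".xml");
        return Files.copy(original, copy, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Step 3: edit the copy only (the attribute is a hypothetical example edit). */
    static void editCopy(Path copy) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(copy.toFile());
        doc.getDocumentElement().setAttribute("edited", "true");
        try (OutputStream out = Files.newOutputStream(copy)) { // truncates and rewrites the copy
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
        }
    }
}
```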
What you are looking for is ByteArrayOutputStream. http://docs.oracle.com/javase/7/docs/api/java/io/ByteArrayOutputStream.html
This lets you write bytes into an in-memory array, and most XML libraries accept any OutputStream implementation.
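A sketch of using ByteArrayOutputStream as the in-memory target: the identity Transformer writes a DOM into it, and ByteArrayInputStream turns the bytes back into an InputStream for the receiving function. The class and method names are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

class InMemoryBytes {
    /** Writes a DOM into a byte array held in memory instead of a file. */
    static byte[] toBytes(Document doc) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    /** Hands the in-memory copy to code that expects an InputStream. */
    static InputStream asInput(byte[] xml) {
        return new ByteArrayInputStream(xml);
    }
}
```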
Given the file is XML, you should consider loading it into the Document Object Model (DOM): https://docs.oracle.com/javase/tutorial/jaxp/dom/readingXML.html
That will make it easier for you to modify it and write it back as a valid XML document.
I would only suggest loading it as bytes/characters if you're operating on it at a byte level. An example of when that might be appropriate is if you're making some character encoding translation (say UTF-16 -> UTF-8) or removing 'illegal' characters.
Code that tries to parse and modify XML in place usually becomes dreadfully bloated if it covers all valid XML files.
Unless you're a domain expert for XML, pick a parser off the shelf. The shelf is pretty full of good libraries.
If the files may be large and your logic is amenable, I would prefer to use an XML streaming model such as SAX: https://docs.oracle.com/javase/tutorial/jaxp/sax/parsing.html
However, I get the impression you're not experienced with XML, and non-experts tend to struggle with the event-driven parsing model of SAX.
Try DOM first time out.
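For comparison, a minimal sketch of the SAX model: the parser calls startElement as each tag streams past, and the handler keeps only a counter, never a tree. The counting task is an assumed example.

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxCount {
    /** Counts elements with the given name as they stream past; no tree is kept in memory. */
    static int count(InputStream in, final String name) throws Exception {
        final int[] seen = {0};
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // The factory is not namespace-aware by default, so the name arrives in qName.
                if (qName.equals(name)) {
                    seen[0]++;
                }
            }
        });
        return seen[0];
    }
}
```

Any state you need across events (here the counter) has to live in the handler, which is the part of SAX that newcomers usually find awkward.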
I need to output large amounts of data to an XML file using JAXB. My question is a follow-up question to:
Can JAXB Incrementally Marshall An Object?
In Blaise Doughan's answer he said to first write the opening XML tag manually, followed by the repeated elements (which must be root elements), and then the closing tag. His example wrote to the console (System.out) rather than to a file. If a FileOutputStream is used instead, what is the best way to ensure the XML declaration (<?xml version="1.0" encoding="UTF-8" standalone="yes"?>) is written to the file before the opening XML tag? I doubt the best answer is to write it manually as well.
I reviewed the following answer:
How to stream large Files using JAXB Marshaller?
However, I would think JAXB would have a solution to this problem without using an external interface to do so.
If your object model fits into memory and you have a single root object, then JAXB can marshal it for you and write out the XML declaration.
If, on the other hand, you have a large number of objects that wouldn't fit into memory if referenced by a single root object, then you need to do things differently. You would need to start the document yourself, using StAX or an OutputStream/Writer directly, then marshal the objects one at a time, and then end the document yourself. With this approach you need to ensure the declaration is written out (StAX will handle this for you).
http://blog.bdoughan.com/2012/08/handle-middle-of-xml-document-with-jaxb.html
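A sketch of the StAX side of that approach: writeStartDocument emits the XML declaration, then the wrapper element is opened, items are written one at a time, and the document is closed. JAXB is no longer bundled with the JDK since Java 11, so plain StAX calls stand in for the per-object marshal step; the element names are hypothetical.

```java
import java.io.OutputStream;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StreamedDocument {
    /** Starts and ends the document with StAX; items are written one at a time in between. */
    static void write(OutputStream out, List<String> items) throws XMLStreamException {
        XMLStreamWriter xsw = XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
        xsw.writeStartDocument("UTF-8", "1.0"); // writes the XML declaration
        xsw.writeStartElement("items");          // hypothetical wrapper element
        for (String item : items) {
            // Stand-in for JAXB: with a Marshaller whose Marshaller.JAXB_FRAGMENT property is
            // true, you would call marshaller.marshal(obj, xsw) for each @XmlRootElement object.
            xsw.writeStartElement("item");
            xsw.writeCharacters(item);
            xsw.writeEndElement();
        }
        xsw.writeEndElement();
        xsw.writeEndDocument();
        xsw.flush();
        xsw.close(); // does not close the underlying OutputStream
    }
}
```

Passing a FileOutputStream as out gives the file-based version the question asks about; only the objects currently being marshalled need to be in memory.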
I am new to this validation process in Java...
-->XML file named Validation Limits
-->Structure of the XML
<parameter></parameter>
<lowerLimit></lowerLimit>
<upperLimit></upperLimit>
<enable></enable>
-->Depending on the enable status ('true' or 'false'), I must perform the validation for the respective parameter
-->What would be the best way to do this?
I have parsed the XML (DOM) [forgot to mention this earlier] and stored the values in arrays, but it gets complicated, with a lot of referencing from one array to another. Any better method that could replace the arrays would be helpful.
Thank you in advance.
Try using a DOM or SAX parser; they will do the parsing for you. You can find some good, free tutorials on the internet.
The difference between DOM and SAX is as follows: DOM loads the XML into a tree structure which you can browse through (i.e. the whole XML is loaded), whereas SAX parses the document and triggers events (calls methods) in the process. Both have advantages and disadvantages, but personally, for reasonably sized XML files, I would use DOM.
So, in your case: use DOM to get a tree of your XML document, locate the enable flag, and check the elements that depend on it.
Also, you can achieve this in pure XML, using XML Schema, although this might be too much for simple needs.
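A sketch of the DOM route that replaces the parallel arrays with one small object per parameter, stored in a map keyed by parameter name. It assumes each set of four values sits inside a limit element under one root (the question does not show the enclosing element, so that name is hypothetical), and that disabled or unknown parameters simply pass; the Limit class and isValid are illustrative.

```java
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ValidationLimits {
    /** One row of the limits file, replacing parallel arrays. */
    static final class Limit {
        final double lower;
        final double upper;
        final boolean enabled;

        Limit(double lower, double upper, boolean enabled) {
            this.lower = lower;
            this.upper = upper;
            this.enabled = enabled;
        }
    }

    static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    /** Reads every <limit> element (hypothetical name) into a map keyed by parameter. */
    static Map<String, Limit> load(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        NodeList rows = doc.getElementsByTagName("limit");
        Map<String, Limit> limits = new LinkedHashMap<>();
        for (int i = 0; i < rows.getLength(); i++) {
            Element row = (Element) rows.item(i);
            limits.put(text(row, "parameter"), new Limit(
                    Double.parseDouble(text(row, "lowerLimit")),
                    Double.parseDouble(text(row, "upperLimit")),
                    Boolean.parseBoolean(text(row, "enable"))));
        }
        return limits;
    }

    /** Disabled or unknown parameters pass; enabled ones must lie within [lower, upper]. */
    static boolean isValid(Map<String, Limit> limits, String parameter, double value) {
        Limit limit = limits.get(parameter);
        if (limit == null || !limit.enabled) {
            return true;
        }
        return value >= limit.lower && value <= limit.upper;
    }
}
```

Looking a parameter up by name replaces the index bookkeeping between arrays; whether unknown parameters should pass or fail is a policy choice you may want to flip.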
I am attempting to create a script that wraps a Groovy class that will take the following arguments:
An input XML file to update.
An arbitrary snippet to insert into the input file (might not even be well-formed in and of itself; it would become part of a larger well-formed document).
XPath for the marker element (used for positioning the snippet in #2).
An action (insert before, insert after, append child).
Optional output XML file.
I'm at a loss for finding an API that will allow me to:
Find a node by XPath and
Cram XML from a String adjacent to the node.
Does anyone have some ideas for technologies that I can combine to achieve this effect? Small examples would be especially useful.
If the snippet is well-formed, you can use the standard DOM DocumentFragment node type to inject a group of nodes at once; some DOM implementations also offer a non-standard way to fill a fragment directly from a string.
EDIT: A quick Google search turns up some Javadocs: http://download.oracle.com/javase/1.4.2/docs/api/org/w3c/dom/DocumentFragment.html
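In Java it works roughly like this (DocumentFragment has no method that parses a string, so the string is parsed separately and its nodes imported):
Node parent = findParentNodeOfFragment(document); // hypothetical helper, e.g. an XPath lookup
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Element parsed = builder.parse(new InputSource(new StringReader("<my>xmlstring</my>"))).getDocumentElement();
DocumentFragment fragment = document.createDocumentFragment();
fragment.appendChild(document.importNode(parsed, true));
parent.appendChild(fragment);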
If you don't have this luxury, or if your string is not well-formed, there is the option to inject it as CDATA.
If you can't make do with injecting CDATA (because you essentially want to affect nodes that follow, for instance the new node must become the parent of old nodes which will be enclosed in the new document), you could try an XSLT transformation.
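Putting the DocumentFragment approach together with an XPath lookup covers the three actions from the question. This is a sketch, not a library API: the Action enum and method names are made up, and the snippet is wrapped in a hypothetical wrap element so that several sibling elements can be parsed at once. It still requires the snippet to be well-formed once wrapped; unbalanced tags cannot go through a DOM.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SnippetInserter {
    enum Action { INSERT_BEFORE, INSERT_AFTER, APPEND_CHILD }

    /** Parses a snippet (possibly several siblings) into a fragment owned by doc. */
    static DocumentFragment parseSnippet(Document doc, String snippet) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document tmp = builder.parse(new InputSource(new StringReader("<wrap>" + snippet + "</wrap>")));
        DocumentFragment fragment = doc.createDocumentFragment();
        for (Node n = tmp.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            fragment.appendChild(doc.importNode(n, true)); // copies n into doc; tmp is unchanged
        }
        return fragment;
    }

    /** Finds the marker node by XPath and places the snippet relative to it. */
    static void insert(Document doc, String xpath, String snippet, Action action) throws Exception {
        Node marker = (Node) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODE);
        if (marker == null) {
            throw new IllegalArgumentException("No node matches " + xpath);
        }
        DocumentFragment fragment = parseSnippet(doc, snippet);
        switch (action) {
            case INSERT_BEFORE:
                marker.getParentNode().insertBefore(fragment, marker);
                break;
            case INSERT_AFTER:
                // A null reference node (no next sibling) makes insertBefore append at the end.
                marker.getParentNode().insertBefore(fragment, marker.getNextSibling());
                break;
            case APPEND_CHILD:
                marker.appendChild(fragment);
                break;
        }
    }
}
```

Writing the result to the optional output file is then the usual identity Transformer from a DOMSource to a StreamResult.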
I suspect what I was trying to do is non-trivial and would have required a much larger framework than what I had time for. I ended up abandoning this endeavor.
What I would really like is a streaming API that works sort of like StAX, and sort of like DOM/JDOM.
It would be streaming in the sense that it would be very lazy and not read things in until needed. It would also be streaming in the sense that it would read everything forwards (but not backwards).
Here's what code that used such an API would look like.
URL url = ...
XMLStream xml = XXXFactory(url.inputStream()) ;
// process each <book> element in this document.
// the <book> element may have subnodes.
// You get a DOM/JDOM like tree rooted at the next <book>.
while (xml.hasContent()) {
XMLElement book = xml.getNextElement("book");
processBook(book);
}
Does anything like this exist?
You could do the following:
Scan the XML file using SAX or StAX and immediately serialize everything back into a StringBuilder, i.e. create your own copy of the XML file.
If you encounter an endElement and you know you don't need the subtree you just parsed, clear the StringBuilder.
If you need it, you can build a DOM tree from the "copy" you created.
With this you can fall back to standard frameworks, one for conventional SAX parsing and one for conventional DOM building. Only the custom serialization might require some hacking.
It also helps if you know the subtree boundaries in advance (the book elements in your example); otherwise further processing would be required.
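A related JDK technique avoids the hand-written serialization step: position a StAX XMLStreamReader on a book start tag and let the identity Transformer copy just that subtree into a DOMResult via javax.xml.transform.stax.StAXSource. This swaps in StAXSource for the StringBuilder copy described above; the book element name comes from the question, while the class name and callback are assumptions. The loop re-checks the current event after each transform because the JDK's bridge may leave the reader just past the end tag.

```java
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class BookStream {
    /** Reads forwards only, materializing one small DOM tree per <book> element. */
    static void forEachBook(InputStream in, Consumer<Element> processBook) throws Exception {
        XMLStreamReader xsr = XMLInputFactory.newInstance().createXMLStreamReader(in);
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        try {
            while (xsr.hasNext()) {
                if (xsr.getEventType() == XMLStreamConstants.START_ELEMENT
                        && "book".equals(xsr.getLocalName())) {
                    DOMResult result = new DOMResult();
                    // Copies only this subtree; the reader ends at or just past </book>,
                    // so the loop re-checks the current event instead of calling next().
                    identity.transform(new StAXSource(xsr), result);
                    processBook.accept(((Document) result.getNode()).getDocumentElement());
                } else {
                    xsr.next();
                }
            }
        } finally {
            xsr.close();
        }
    }
}
```

Each book tree becomes garbage once the callback returns, so memory use is bounded by the largest single book rather than the whole file.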
To parse part of the document without loading all of it into memory, you need a streaming parser such as SAX.
Here are some official Sun examples of how to use SAX: http://java.sun.com/developer/codesamples/xml.html#sax