I need to create a copy of an XML file in memory using Java, and I need to edit this copy in memory without affecting the original file. After making changes to this XML in memory, I need to send it as input to a function. What is the appropriate option? Please help me.
You can use the Java native API for XML parsing:
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
File file = new File("xml_file_name");
Document doc = builder.parse(file);
and then edit the Document in memory before sending it to your designated function.
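For example, a minimal sketch (the attribute edit and 'myFunction' are placeholders for your own logic):

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(new File("xml_file_name")); // the file on disk is only read
Element root = doc.getDocumentElement();
root.setAttribute("version", "2"); // edits touch only the in-memory Document
myFunction(doc); // pass the modified in-memory copy to your function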
Do what you wrote:
Read the file.
Write it to another file.
Edit that copy.
Pass it to the function. Here you have to decide whether it's better to pass a File object or a path.
What you are looking for is ByteArrayOutputStream: http://docs.oracle.com/javase/7/docs/api/java/io/ByteArrayOutputStream.html
This lets you write to a byte array in memory, and most XML libraries will accept any implementation of OutputStream.
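For instance, a rough sketch of serializing an in-memory DOM ('doc') to a byte array and handing it on ('myFunction' is a placeholder):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

ByteArrayOutputStream baos = new ByteArrayOutputStream();
TransformerFactory.newInstance().newTransformer()
        .transform(new DOMSource(doc), new StreamResult(baos)); // serialize to memory, not disk
myFunction(new ByteArrayInputStream(baos.toByteArray())); // the original file is untouched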
Given the file is XML, you should consider loading it into the Document Object Model (DOM): https://docs.oracle.com/javase/tutorial/jaxp/dom/readingXML.html
That will make it easier to modify it and write it back out as a valid XML document.
I would only suggest loading it as bytes/characters if you're operating on it at the byte level. An example of when that might be appropriate is if you're doing a character-encoding translation (say UTF-16 -> UTF-8) or removing 'illegal' characters.
Code that tries to parse and modify XML in place usually becomes dreadfully bloated if it has to cover all valid XML files.
Unless you're a domain expert for XML, pick a parser off the shelf. The shelf is pretty full of good libraries.
If the files may be large and your logic is amenable to it, I would prefer an XML streaming model such as SAX: https://docs.oracle.com/javase/tutorial/jaxp/sax/parsing.html
However, I get the impression you're not very experienced yet, and non-experts tend to struggle with SAX's event-driven parsing model.
Try DOM first time out.
I have a Java program which queries a table that has millions of records and generates an XML document with each record as a node.
The challenge is that the program is running out of heap memory. I have allocated 2GB of heap for the program.
I am looking for alternative approaches to creating such a huge XML document.
Can we write out partial DOM object to file and release the memory?
For example, create 100 nodes in the DOM object, write them to the file, release the memory, then create the next 100 nodes, and so on.
Code to write a node to file
Transformer transformer = TransformerFactory.newInstance().newTransformer();
DOMSource source = new DOMSource(node);
StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);
But how do I release the DOM memory after writing the nodes to file?
Why do you need to generate a DOM? Try to write the XML directly. The most convenient API for outputting XML from Java is the StAX XMLStreamWriter interface. There are a number of implementations of XMLStreamWriter that generate lexical (serialized) XML, including the Saxon serializer which gives you considerable control over the way in which it is serialized (e.g. indentation and encoding) if you need it.
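To sketch the idea (the element names and the 'resultSet' are stand-ins for your own query results):

import java.io.FileOutputStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

XMLStreamWriter w = XMLOutputFactory.newFactory()
        .createXMLStreamWriter(new FileOutputStream("records.xml"), "UTF-8");
w.writeStartDocument("UTF-8", "1.0");
w.writeStartElement("records");
while (resultSet.next()) {          // stream one row at a time from the JDBC cursor
    w.writeStartElement("record");
    w.writeCharacters(resultSet.getString("name")); // the writer escapes text for you
    w.writeEndElement();
}
w.writeEndElement();
w.writeEndDocument();
w.close(); // only the current element is ever held in memory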
I would use a simple OutputStreamWriter and format the XML myself; you don't need to create a huge DOM structure. I think this is the fastest way.
Of course it depends on how much XML structure you need to produce. If one table row corresponds to one XML line, this should be the fastest way to do it.
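A minimal sketch of that (again, the 'resultSet' and column name are placeholders); note that with this approach you must escape attribute and text values yourself:

import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream("out.xml"), StandardCharsets.UTF_8))) {
    out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<rows>\n");
    while (resultSet.next()) {
        out.write("  <row id=\"" + resultSet.getString("id") + "\"/>\n"); // one row, one line
    }
    out.write("</rows>\n");
}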
For processing a huge document, SAX is often preferred precisely because it keeps in memory only what you have explicitly decided to keep in memory -- which means you can use a specialized, and hence smaller, data model. For tasks such as this one, where you have no need to crossreference different parts of the document, you may not need any data model at all and can just generate SAX events directly from the input data and feed those into the serializer.
(StAX is pretty much equivalent in this regard. I usually prefer to stay with SAX since it's part of the JAXP API package and should be present in just about every Java environment at this point, but StAX may be a bit easier to work with.)
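One way to feed SAX events straight into a serializer is JAXP's identity TransformerHandler; a sketch (the 'record' element and the 'names' data source are assumptions):

import java.io.FileOutputStream;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
TransformerHandler h = tf.newTransformerHandler(); // identity transform acts as a serializer
h.setResult(new StreamResult(new FileOutputStream("out.xml")));

h.startDocument();
h.startElement("", "records", "records", new AttributesImpl());
for (String name : names) {                 // no data model: events go straight out
    h.startElement("", "record", "record", new AttributesImpl());
    h.characters(name.toCharArray(), 0, name.length()); // the serializer handles escaping
    h.endElement("", "record", "record");
}
h.endElement("", "records", "records");
h.endDocument();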
I need to output large amounts of data to an XML file using JAXB. My question is a follow-up question to:
Can JAXB Incrementally Marshall An Object?
In Blaise Doughan's answer he said to first manually write the opening XML tag, followed by the repeated elements (which must be root elements), and then the closing tag. His example output went to the console (System.out) and not to a file. If a FileOutputStream were used instead, what is the best way to ensure the XML declaration (<?xml version="1.0" encoding="UTF-8" standalone="yes"?>) is written to the file before the opening XML tag? I would not think the best answer would be to manually write that as well.
I reviewed the following answer:
How to stream large Files using JAXB Marshaller?
However, I would think JAXB would have a solution to this problem without using an external interface to do so.
If your object model fits into memory and you have a single root object, then JAXB can marshal it for you and write out the XML declaration.
If, on the other hand, you have a large number of objects that wouldn't fit into memory if referenced by a single root object, then you need to do things differently. You would need to start the document yourself, using StAX or an OutputStream/Writer directly, then marshal the objects as you go, and then end the document yourself. With this approach you need to ensure the declaration is written out (StAX will handle this for you).
http://blog.bdoughan.com/2012/08/handle-middle-of-xml-document-with-jaxb.html
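The pattern from that post looks roughly like this (the 'Item' class and 'items' list are hypothetical; JAXB_FRAGMENT stops the marshaller from emitting its own declaration per object, while StAX's writeStartDocument produces the one declaration at the top):

import java.io.FileOutputStream;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

XMLStreamWriter xsw = XMLOutputFactory.newFactory()
        .createXMLStreamWriter(new FileOutputStream("out.xml"), "UTF-8");
xsw.writeStartDocument("UTF-8", "1.0"); // StAX writes the XML declaration for you
xsw.writeStartElement("items");

Marshaller m = JAXBContext.newInstance(Item.class).createMarshaller();
m.setProperty(Marshaller.JAXB_FRAGMENT, true); // suppress per-object declarations

for (Item item : items) {
    m.marshal(item, xsw); // marshal each object into the open document
}

xsw.writeEndElement();
xsw.writeEndDocument();
xsw.close();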
I am new to this validation process in Java...
--> XML file named "Validation Limits"
--> Structure of the XML:
<parameter> </parameter>
<lowerLimit> </lowerLimit>
<upperLimit> </upperLimit>
<enable> </enable>
--> Depending on the enable status ('true' or 'false'), I must perform the validation process for the respective parameter.
--> What could be the best possible method to perform this operation?
I have parsed the XML (DOM) [forgot to mention this earlier] and stored the values in arrays, but that gets complicated, with a lot of referencing from one array to another. Any better method that could replace the array procedure would be helpful.
Thank you in advance.
Try using a DOM or SAX parser; they will do the parsing for you. You can find some good, free tutorials on the internet.
The difference between DOM and SAX is as follows: DOM loads the XML into a tree structure which you can browse through (i.e. the whole XML is loaded), whereas SAX parses the document and triggers events (calls methods) in the process. Both have advantages and disadvantages, but personally, for reasonably sized XML files, I would use DOM.
So, in your case: use DOM to get a tree of your XML document, locate the attribute, see other elements depending on it.
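A sketch of what that might look like, assuming each <parameter> element wraps its own <lowerLimit>, <upperLimit> and <enable> children (the exact structure wasn't shown); a small helper method replaces the parallel arrays:

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder().parse(new File("ValidationLimits.xml"));
NodeList params = doc.getElementsByTagName("parameter");
for (int i = 0; i < params.getLength(); i++) {
    Element p = (Element) params.item(i);
    if (Boolean.parseBoolean(childText(p, "enable"))) { // validate only when enabled
        double lower = Double.parseDouble(childText(p, "lowerLimit"));
        double upper = Double.parseDouble(childText(p, "upperLimit"));
        // ... perform the validation against lower/upper here ...
    }
}

// helper: text content of the first child element with the given tag name
static String childText(Element parent, String tag) {
    return parent.getElementsByTagName(tag).item(0).getTextContent();
}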
Also, you can achieve this in pure XML, using XML Schema, although this might be too much for simple needs.
I need an XML parser to parse a file that is approximately 1.8 GB.
So the parser should not load the whole file into memory.
Any suggestions?
Aside from the recommended SAX parsing, you could use the StAX API (a kind of SAX evolution), which is included in the JDK (package javax.xml.stream).
StAX Project Home: http://stax.codehaus.org/Home
Brief introduction: http://www.xml.com/pub/a/2003/09/17/stax.html
Javadoc: https://docs.oracle.com/javase/8/docs/api/javax/xml/stream/package-summary.html
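A minimal cursor-style read with StAX might look like this (the 'record' element name is an assumption):

import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

XMLStreamReader r = XMLInputFactory.newFactory()
        .createXMLStreamReader(new FileInputStream("huge.xml"));
while (r.hasNext()) { // pull one event at a time; the 1.8 GB file never sits in memory
    if (r.next() == XMLStreamConstants.START_ELEMENT
            && "record".equals(r.getLocalName())) {
        String text = r.getElementText(); // text content of this element only
        // process 'text' here, then let it go
    }
}
r.close();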
Use a SAX based parser that presents you with the contents of the document in a stream of events.
The StAX API is easier to deal with compared to SAX. Here is a short tutorial.
Try VTD-XML. I've found it to be more performant, and more importantly, easier to use than SAX.
As others have said, use a SAX parser, as it is a streaming parser. Using the various events, you extract your information as necessary and then, on the fly store it someplace else (database, another file, what have you).
You can even store it in memory if you truly just need a minor subset, or if you're simply summarizing the file. Depends on the use case of course.
If you're spooling to a DB, make sure you take some care to make your process restartable or whatever. A lot can happen in 1.8GB that can fail in the middle.
Stream the file into a SAX parser and read it into memory in chunks.
SAX gives you a lot of control, and being event-driven makes sense. The API is a little hard to get a grip on, and you have to pay attention to some things, like when the characters() method is called. But the basic idea is that you write a content handler that gets called when the start and end of each XML element is read. You can then keep track of the current xpath in the document, identify which paths have the data you're interested in, and identify which path marks the end of a chunk that you want to save, hand off, or otherwise process.
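A bare-bones content handler along those lines (the 'record' element and process() are placeholders for your own chunk boundary and logic):

import java.io.File;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class ChunkHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override public void startElement(String uri, String local, String qName, Attributes atts) {
        text.setLength(0); // new element: reset the buffer
    }
    @Override public void characters(char[] ch, int start, int len) {
        text.append(ch, start, len); // note: may be called several times per element
    }
    @Override public void endElement(String uri, String local, String qName) {
        if ("record".equals(qName)) {
            process(text.toString()); // chunk complete: hand it off, then drop it
        }
    }
    private void process(String chunk) { /* write to DB, file, etc. */ }
}

SAXParserFactory.newInstance().newSAXParser()
        .parse(new File("huge.xml"), new ChunkHandler());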
Use almost any SAX Parser to stream the file a bit at a time.
I had a similar problem: I had to read a whole XML file and create a data structure in memory. The whole structure had to be loaded, and I had to do various operations on it. Many of the XML elements contained text, which I had to output to my output file but which wasn't important for the algorithm.
Firstly, as suggested here, I used SAX to parse the file and build up my data structure. My file was 4GB and I had an 8GB machine, so I figured maybe 3GB of the file was just text, and java.lang.String would probably need 6GB for that text, since it stores characters in UTF-16.
If the JVM takes up more space than the computer has physical RAM, the machine will swap. A mark-and-sweep garbage collection then accesses pages in random order and moves objects from one pool to another, which basically kills the machine.
So I decided to write all my strings out to disk in a file (the FS can obviously handle sequential-write of the 3GB just fine, and when reading it in the OS will use available memory for a file-system cache; there might still be random-access reads but fewer than a GC in java). I created a little helper class which you are more than welcome to download if it helps you: StringsFile javadoc | Download ZIP.
StringsFile file = new StringsFile();
StringInFile str = file.newString("abc"); // writes string to file
System.out.println("str is: " + str.toString()); // fetches string from file
+1 for StAX. It's easier to use than SAX because you don't need to write callbacks (you essentially just loop over all elements of the file until you're done), and it has (AFAIK) no limit on the size of the files it can process.
I want to store some fragments of an XML file in separate files.
It seems there is no way to do it in a straightforward way:
Reading the chunks fails.
I always get the Exception
"javax.xml.transform.TransformerException: org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed."
It only works when there is only ONE 'root' element (which is not the root element in the normal sense).
I understand that XML with multiple 'roots' is not well-formed, but it should be treated as a chunk.
Please, before suggesting some work-around-solutions, tell me:
Are XML chunks valid at all?
And if so, can they be read using the standard JDK 6 API?
Test code:
String testChunk1 = "<e1>text</e1>";
String testChunk2 = "<e1>text</e1><e2>text</e2>";
// the following doesn't work with 'testChunk2'
StringReader sr = new StringReader(testChunk1);
StringWriter sw = new StringWriter();
TransformerFactory.newInstance().newTransformer().transform(
new StreamSource(sr), new StreamResult(sw));
System.out.println(sw.toString());
The W3C has been working towards defining a standard for XML fragment interchange. I'm mentioning it not because it's a solution to your problem, but because it's definitely relevant to see that there's discussion of how to handle such things.
In the .NET world you can work with XML fragments and, for example, validate them against a schema. This suggests that it is worth searching for similar support in the Java libraries.
If you want to transform such fragments with XSLT, a very common approach is to put a wrapper element around them, which can then act as the root of the DOM.
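Applied to the test code above, that's a one-line change:

String chunk = "<e1>text</e1><e2>text</e2>"; // the chunk that failed before
StringReader sr = new StringReader("<wrapper>" + chunk + "</wrapper>"); // single root
StringWriter sw = new StringWriter();
TransformerFactory.newInstance().newTransformer().transform(
        new StreamSource(sr), new StreamResult(sw));
System.out.println(sw.toString()); // works; strip <wrapper> again on output if needed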
While I suppose there must be some way, perhaps kludgy, to do what you want, I am not aware of any way to do it. The standard XML parsers expect well-formed XML, as you're discovering.
If you want to store your XML as a number of separate fragments in different files, then probably the best way to do this is to create your own Reader or InputStream that actually (behind the scenes) reads all of the fragments in order, and then provide that wrapped Reader or InputStream to the transformer. That way, the XML parser sees a single XML document but you can store it however you want.
If you do something like this, the fragments (except for the very first) cannot start with the standard XML header:
<?xml version="1.0" encoding="UTF-8" ?>
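A sketch of such a wrapper using SequenceInputStream (the fragment file names and the <root> element are made up for illustration; the fragments themselves carry no XML declaration):

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

InputStream combined = new SequenceInputStream(Collections.enumeration(Arrays.<InputStream>asList(
        new ByteArrayInputStream("<root>".getBytes(StandardCharsets.UTF_8)),
        new FileInputStream("fragment1.xml"),
        new FileInputStream("fragment2.xml"),
        new ByteArrayInputStream("</root>".getBytes(StandardCharsets.UTF_8)))));
// the parser sees one well-formed document built from the stored fragments
Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder().parse(combined);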
Please, before suggesting some work-around-solutions, tell me: Are XML chunks valid at all?
Not in their own right.
You can include them (served as XML external parsed entities) in other documents through methods such as an entity reference, and you can parse them as chunks into existing documents using methods such as DOM Level 3 LS's parseWithContext() (which Java doesn't give you, sorry), but they aren't documents so any interfaces that require a full document cannot accept them.
Transformer requires a full document as input because XSLT works on full documents, and it would be confused by something that contained zero or more than one root element. The usual trick is to create a single root element by wrapping the fragment in start and end tags, but this does mean you can't have an XML declaration(*), as mentioned by Eddie.
(*: actually it's known as the ‘Text Declaration’ when included in an external parsed entity, but the syntax is exactly the same.)