I need to modify a single piece of information in an XML file. The file is about 100 lines. What would be the most memory-efficient way in Java to modify a single element in the whole file?
Is JAXB better? A simple SAX parser? Or some other way? Kindly suggest.
A SAX parser gives more control over the parsing and is faster than a DOM parser. JAXB will be easier in the sense of less code to write. XStream is another option, but it is similar to JAXB: a high-level API with some overhead, so it will be a bit slower than SAX.
I would not suggest direct string manipulation (String.indexOf() and String.replace()). It would be the fastest way to update a unique tag, but it is risky: the result might not be valid XML, and if the structure is not simple there is a risk of updating a tag at the wrong level :-)
Therefore, a SAX parser looks like the best bet to me.
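If you go the SAX route, one workable pattern is an identity transform over an XMLFilter that rewrites the text of the target element as the events stream past. A minimal sketch, assuming a hypothetical <price> element and placeholder file names (neither comes from the question):

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import java.io.File;

public class SaxUpdate {
    public static void main(String[] args) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        // Filter that swaps the text content of <price> for a new value;
        // every other event passes through unchanged.
        XMLFilterImpl filter = new XMLFilterImpl(reader) {
            private boolean inTarget;

            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) throws SAXException {
                super.startElement(uri, local, qName, atts);
                if ("price".equals(qName)) {
                    inTarget = true;
                    char[] replacement = "42.00".toCharArray();
                    super.characters(replacement, 0, replacement.length);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) throws SAXException {
                if (!inTarget) {   // drop the original text of the target element
                    super.characters(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) throws SAXException {
                if ("price".equals(qName)) {
                    inTarget = false;
                }
                super.endElement(uri, local, qName);
            }
        };

        // Identity transform: parse through the filter, serialize the result.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.transform(new SAXSource(filter, new InputSource("input.xml")),
                    new StreamResult(new File("output.xml")));
    }
}
```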
Your files are not big. The memory used to hold a 100-line XML file costs about as much as 5 milliseconds of a programmer's time. I would question your requirement: why do you need "the most memory-efficient way"? I would use XSLT or JDOM2, unless there is clear, quantified evidence that this will not meet an externally imposed performance requirement that cannot be solved by buying a bit more memory.
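For a 100-line file, the JDOM2 version is about as simple as it gets. A sketch, again assuming a hypothetical <price> child of the root element and placeholder file names:

```java
import org.jdom2.Document;
import org.jdom2.input.SAXBuilder;
import org.jdom2.output.Format;
import org.jdom2.output.XMLOutputter;
import java.io.FileWriter;

public class Jdom2Update {
    public static void main(String[] args) throws Exception {
        // Load the whole (small) document into memory.
        Document doc = new SAXBuilder().build("input.xml");

        // Change the text of a single element; "price" is a placeholder name.
        doc.getRootElement().getChild("price").setText("42.00");

        // Write the modified document back out.
        try (FileWriter out = new FileWriter("output.xml")) {
            new XMLOutputter(Format.getPrettyFormat()).output(doc, out);
        }
    }
}
```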
I have a requirement where I may have a 100 MB or bigger XML file containing a list of companies, and I need to add each company from that file into a database table.
I was thinking of using a SAX parser, but I was also considering a StAX parser. Can someone please help me decide which one to use?
Thanks.
StAX has a much easier to use API, so I think it is the better choice. SAX has a low-level push API, which is not very pleasant to work with (e.g. dealing with char[]), whereas StAX has a much nicer pull API.
Another potential advantage: with StAX you don't have to read the whole document; you can stop as soon as you have what you need.
There is a nice, though quite old, comparison of the Java XML parsing APIs found here.
Using StAX will allow you to minimize the amount of data kept in memory to only the most recently parsed record. Once you insert that record into your table, you no longer need to keep it in memory.
SAX can also stream, but because it is a push API you would have to track the insert-as-you-go state across handler callbacks (inserting when you encounter a record's closing element). That is possible, but it is more complicated with SAX than with StAX.
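A sketch of the StAX loop, assuming records look like <company><name>...</name></company> (the element names and insertIntoTable are placeholders, not from the question):

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.FileInputStream;

public class CompanyLoader {
    public static void main(String[] args) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try (FileInputStream in = new FileInputStream("companies.xml")) {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "company".equals(reader.getLocalName())) {
                    String name = null;
                    // Read just this record; nothing else stays in memory.
                    while (reader.hasNext()) {
                        int event = reader.next();
                        if (event == XMLStreamConstants.START_ELEMENT
                                && "name".equals(reader.getLocalName())) {
                            name = reader.getElementText();
                        } else if (event == XMLStreamConstants.END_ELEMENT
                                && "company".equals(reader.getLocalName())) {
                            break;
                        }
                    }
                    insertIntoTable(name);   // placeholder for your JDBC insert
                }
            }
            reader.close();
        }
    }

    private static void insertIntoTable(String name) {
        System.out.println("INSERT company: " + name);
    }
}
```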
I have a question that makes me think about how to improve the speed and memory usage of a system.
I will describe it by example. I have a file that contains strings like this:
<e>Customer</e>
<a1>Customer Id</a1>
<a2>Customer Name</a2>
<e>Person</e>
It is similar to an XML file.
Currently, when I read <e>Customer</e>, I scan ahead to the nearest tag and then take the substring from <e>Customer</e> up to that tag.
This makes the system do a lot of work, and I used only regular expressions for it. I am thinking of doing the same as a real compiler, with separate phases (lexical analysis, then parsing).
Any ideas?
Thanks in advance!
If you really don't want to use one of the free and reliable XML parsers, then a truly fast solution will almost certainly involve a state machine.
See this How to create a simple state machine in Java question for a good start.
Please be sure to have a very good reason for taking this route.
Regular expressions are not the right tool for parsing nested structures like this. Since your file looks a lot like XML, it may make sense to add what's missing to make it well-formed XML (i.e. an XML declaration and a single root element) and feed the result to an XML parser.
XML parsers are optimized for processing large volumes of data quickly (especially SAX-style parsers). You should see a significant improvement in performance if you switch from regular expressions over large volumes of text to an XML parser.
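A sketch of that idea using the fragment from the question: wrap it in a synthetic root element plus an XML declaration so a standard SAX parser will accept it, then print each element's name and text.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class FragmentParser {
    public static void main(String[] args) throws Exception {
        String fragment = "<e>Customer</e><a1>Customer Id</a1>"
                        + "<a2>Customer Name</a2><e>Person</e>";
        // Add the missing pieces: an XML declaration and a single root element.
        String xml = "<?xml version=\"1.0\"?><root>" + fragment + "</root>";

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    private final StringBuilder text = new StringBuilder();

                    @Override
                    public void startElement(String uri, String local, String qName,
                                             Attributes atts) {
                        text.setLength(0);   // reset the buffer for each element
                    }

                    @Override
                    public void characters(char[] ch, int start, int length) {
                        text.append(ch, start, length);
                    }

                    @Override
                    public void endElement(String uri, String local, String qName) {
                        if (!"root".equals(qName)) {
                            System.out.println(qName + " -> " + text);
                        }
                    }
                });
    }
}
```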
Just don't invest the time in writing your own XML lexer/parser (it's not worth it); use what is already out there.
For example, http://www.mkyong.com/tutorials/java-xml-tutorials/ is a good tutorial; just use Google.
I have a big XML file that can be downloaded from the internet. To parse it I tried using the DOM parser, but it doesn't let me skip certain tags; it gives me an error instead. Is there a way around this? If I understood correctly, the SAX parser allows you to skip tags while the DOM parser doesn't. Can someone kindly clarify this? If that is the case, I can't understand what the advantage of a DOM parser is. Thanks in advance.
DOM was designed as a language-independent object model to hold any XML data, and as such it is a large and complex system. It is well suited to a two-phase approach: first load an XML document, then perform various operations on it.
SAX, on the other hand, was designed as a fairly lightweight system using a single-phase approach, where user-specified operations are performed as the document is loaded. Some applications use SAX to build a smaller object model, with uninteresting information filtered out, which is then processed much like a DOM.
Note that although DOM and SAX are the well-known "standard" XML APIs, there are plenty of others available, and sometimes a particular application may be better off using a non-standard API. With XML the important bit is always the data; code can be rewritten.
Some quick points:
- SAX is faster than DOM.
- SAX is good for large documents because it uses comparatively less memory than DOM.
- SAX takes less time to read a document, whereas DOM takes more.
- With SAX we can read data but we can't modify it; with DOM we can modify the data.
- We can stop SAX parsing whenever and wherever we want.
- SAX parses sequentially, but with DOM we can also move backwards.
- For parsing machine-generated documents SAX is better; for human-readable documents DOM is useful.
What is a better way to parse large XML data, which is essentially a collection of XML records, in Java and Java-based frameworks? We get the data from a web service call and it runs to a few MB (typically 25 MB+). The data essentially corresponds to an unmarshalled list of objects. My objective is to create that list of objects from the XML.
I tried using the SAX parser and it takes a good 45 seconds to parse these 3000 objects.
What are the other recommended approaches?
Try pull parsing instead: use StAX?
First search hit comparing the two:
http://docs.oracle.com/cd/E17802_01/webservices/webservices/docs/1.6/tutorial/doc/SJSXP2.html
Have you profiled and seen where the bottlenecks are?
StAX is built into Java (since Java 6), but some recommend the Woodstox StAX implementation for even better performance. I have not tried it, though. http://woodstox.codehaus.org/
I tried using the SAX parser and it takes a good 45 seconds to parse these 3000 objects. What are the other recommended approaches?
There are only the following options:
DOM
SAX
StAX
SAX is the fastest of the three (see SAX vs DOM vs StAX), so if you switch to a different style I don't think you'll get any benefit, unless you are doing something wrong now.
Of course there are also the marshalling/unmarshalling frameworks such as JAXB, but IMO (I have not done any measurements) they could be slower, since they add an extra layer of abstraction on top of the XML processing.
SAX doesn't provide random access to the structure of the XML file, which means it offers a relatively fast and efficient method of parsing. Because a SAX parser deals with only one element at a time, implementations can be extremely memory-efficient, often making it the best choice for dealing with large files.
Parsing 25 MB of XML should not take 45 seconds. There is something else going on. Perhaps most of the time is spent waiting for an external DTD to be fetched from the web, I don't know. Before changing your approach, you need to understand where the costs are coming from, and therefore which part of the system will benefit from changes.
However, if you really do want to convert the XML into Java objects (not the application architecture I would choose, but never mind), then JAXB sounds a good bet. I haven't used JAXB much since I prefer to stick with XML-oriented languages like XSLT and XQuery, but when I did try JAXB I found it pretty fast. Of course it uses a SAX or StAX parser underneath.
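A sketch of the JAXB route, assuming a simple <companies><company>...</company></companies> shape; the class and element names here are invented for illustration. (On Java 11+ JAXB is a separate dependency, and newer versions live in the jakarta.xml.bind packages.)

```java
import javax.xml.bind.JAXBContext;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;
import java.io.File;
import java.util.List;

// Maps <companies> with repeated <company> children onto a Java list.
@XmlRootElement(name = "companies")
@XmlAccessorType(XmlAccessType.FIELD)
class Companies {
    @XmlElement(name = "company")
    List<Company> companies;
}

@XmlAccessorType(XmlAccessType.FIELD)
class Company {
    String id;    // hypothetical <id> element
    String name;  // hypothetical <name> element
}

public class JaxbDemo {
    public static void main(String[] args) throws Exception {
        JAXBContext ctx = JAXBContext.newInstance(Companies.class);
        Companies result = (Companies) ctx.createUnmarshaller()
                                          .unmarshal(new File("companies.xml"));
        System.out.println(result.companies.size() + " companies unmarshalled");
    }
}
```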
I want to do some manipulation of XML content in Java. See the XML below.
Source XML:
<ns1:Order xmlns:ns1="com.test.ns" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<OrderHeader>
<Image>Image as BinaryData of size 250KB</Image>
</OrderHeader>
</ns1:Order>
Target XML:
<OrderData>
<OrderHeader>
<Image>Image as BinaryData of size 250KB</Image>
</OrderHeader>
</OrderData>
As shown, I have the source XML and I want the target XML for it. The only difference is that the root element "ns1:Order" is replaced with "OrderData" in the target XML.
FYI, OrderHeader has one sub-element, Image, which holds binary image data of 250 KB (so this XML is going to be a large one). Also, the root element of the target XML, "OrderData", is known in advance.
Now, I want to achieve the above result in Java with the best performance. I already have the source XML content as a byte[], and I want the target XML content as a byte[] as well. I am open to using a SAX parser too.
Please suggest the solution with the best performance for doing the above.
Thanks in advance,
Nurali
Do you mean machine performance or human performance? Spending an infinite amount of programmer time to achieve a microscopic gain in machine performance is a strange trade-off to make these days, when a powerful computer costs about the same as half a day of a contract programmer's time.
I would recommend using XSLT. It might not be fastest, but it will be fast enough. For a simple transformation like this, XSLT performance will be dominated by parsing and serialization costs, and those won't be any worse than for any other solution.
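A sketch of the XSLT route for this exact transformation: an identity transform plus one template that renames the root element, taking byte[] in and byte[] out as the question asks. The stylesheet is inlined as a string to keep the example self-contained; in practice you would load it from a file and reuse a compiled Templates object across calls.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringReader;

public class RenameRootXslt {
    // Identity transform, except the ns1:Order root becomes OrderData.
    private static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + "    xmlns:ns1='com.test.ns'>"
        + "  <xsl:template match='ns1:Order'>"
        + "    <OrderData><xsl:apply-templates select='@*|node()'/></OrderData>"
        + "  </xsl:template>"
        + "  <xsl:template match='@*|node()'>"
        + "    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "  </xsl:template>"
        + "</xsl:stylesheet>";

    public static byte[] transform(byte[] source) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new StreamSource(new ByteArrayInputStream(source)),
                    new StreamResult(out));
        return out.toByteArray();   // target XML as byte[]
    }
}
```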
Not much will beat direct byte/String manipulation, for instance a regular expression.
But be warned: manipulating XML with regex is always a hot debate.
I have used XSLT to transform XML documents. That's another way to do it. There are several Java implementations of XSLT processors.
The fastest way to manipulate strings in Java is direct manipulation using a StringBuilder for the result. I wrote code to modify 20 MB strings that built a table of change locations and then copied the modified string into a new StringBuilder. For strings, XSLT and regex are much slower than direct manipulation, and SAX/DOM parsers are slower still.
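For the root-renaming question above, a sketch of what direct manipulation looks like. It assumes the root tag name does not occur elsewhere in the document and, like the target XML shown, deliberately drops the root's attributes; that fragility is exactly why this approach is debated.

```java
public class RootRename {

    // Renames the root element of an XML string without parsing it.
    // Assumes oldName appears only as the root tag; attributes on the
    // root (e.g. xmlns declarations) are dropped, as in the target XML.
    public static String renameRoot(String xml, String oldName, String newName) {
        int open = xml.indexOf("<" + oldName);              // root start tag
        int openEnd = xml.indexOf('>', open);               // end of start tag
        int close = xml.lastIndexOf("</" + oldName + ">");  // root end tag
        StringBuilder sb = new StringBuilder(xml.length());
        sb.append(xml, 0, open)
          .append('<').append(newName).append('>')
          .append(xml, openEnd + 1, close)                  // untouched body
          .append("</").append(newName).append('>')
          .append(xml, close + oldName.length() + 3, xml.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        String src = "<ns1:Order xmlns:ns1=\"com.test.ns\"><OrderHeader>"
                   + "<Image>...binary...</Image></OrderHeader></ns1:Order>";
        System.out.println(renameRoot(src, "ns1:Order", "OrderData"));
        // prints: <OrderData><OrderHeader>...</OrderHeader></OrderData>
    }
}
```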