Java marshaller performance

I've used the JAXB Marshaller as well as my own marshaller for marshalling plain Java bean objects into XML. Both of them take almost the same time to marshal. The performance is not acceptable and needs to be improved. What are possible ways to improve marshaller performance? Threading, for example?

Make sure you create the JAXBContext instance only once. Creating the context takes some time because it uses reflection to parse the classes' annotations.
Note that JAXBContext is thread-safe, but marshallers/unmarshallers are not, so you still have to create a marshaller for every thread. However, I found that creating marshallers from a JAXBContext you already hold is pretty fast.
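A minimal sketch of this pattern, assuming JAXB 2.x is on the classpath (bundled with Java 6 through 10, a separate dependency from Java 11 on). The Customer class is a hypothetical stand-in for your own annotated beans. The ThreadLocal is one way to give each thread its own marshaller; since marshallers are cheap to create from an existing context, calling CONTEXT.createMarshaller() inside toXml on every call would also work and avoids ThreadLocal cleanup concerns in application servers.

```java
import java.io.StringWriter;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.bind.annotation.XmlRootElement;

public class JaxbHolder {
    /** Example bean (hypothetical); replace with your own JAXB-annotated classes. */
    @XmlRootElement
    public static class Customer {
        public String name;
    }

    // Built once: JAXBContext is thread-safe but expensive to create (annotation reflection).
    private static final JAXBContext CONTEXT;
    static {
        try {
            CONTEXT = JAXBContext.newInstance(Customer.class);
        } catch (JAXBException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Marshallers are NOT thread-safe, so each thread lazily gets its own.
    private static final ThreadLocal<Marshaller> MARSHALLER = ThreadLocal.withInitial(() -> {
        try {
            return CONTEXT.createMarshaller();
        } catch (JAXBException e) {
            throw new IllegalStateException(e);
        }
    });

    /** Marshals a bean to an XML string using the calling thread's marshaller. */
    public static String toXml(Object bean) throws JAXBException {
        StringWriter out = new StringWriter();
        MARSHALLER.get().marshal(bean, out);
        return out.toString();
    }
}
```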

Seconding the use of JiBX. Like questzen, I found that JiBX was 9 times faster than JAXB in my performance tests.
Also, make sure you have Woodstox on the classpath when using JiBX. I found Woodstox's StAX implementation to be roughly 1050% faster than the Java 6 StAX implementation.
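To confirm that Woodstox is actually being picked up, you can ask the standard StAX factories which implementation they resolved to. This sketch uses only JDK classes. Woodstox registers itself through META-INF/services, so with its jar on the classpath the printed names should be com.ctc.wstx.stax.WstxInputFactory and com.ctc.wstx.stax.WstxOutputFactory; the javax.xml.stream.XMLInputFactory and javax.xml.stream.XMLOutputFactory system properties can also force a particular choice.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;

public class StaxImplCheck {
    /** Returns the class names of the StAX factories that the standard lookup resolves to. */
    public static String[] factoryClassNames() {
        return new String[] {
            XMLInputFactory.newInstance().getClass().getName(),
            XMLOutputFactory.newInstance().getClass().getName()
        };
    }

    public static void main(String[] args) {
        String[] names = factoryClassNames();
        System.out.println("XMLInputFactory:  " + names[0]);
        System.out.println("XMLOutputFactory: " + names[1]);
    }
}
```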

Beyond the other good suggestions, I suspect there is something wrong with the way you use JAXB; it generally performs reasonably well as long as:
You use JAXB version 2 (NEVER ever use the obsolete JAXB 1; it was a horribly slow, useless piece of crap), preferably a recent 2.1.x version from http://jaxb.dev.java.net
You use a SAX or StAX source/destination. NEVER use DOM unless you absolutely must for interoperability: DOM makes it 3-5x slower without any benefit (it just doubles the object model: POJO -> DOM -> XML; the DOM part is completely unnecessary)
Ideally, you use the fastest SAX/StAX parser available; Woodstox is faster than Sun's bundled StAX processor (and BEA's reference implementation is buggy and no faster than Sun's)
If JAXB is still more than 50% slower than a hand-written variant, I would profile it to see what else is going wrong. It should not be slow when used properly; I have measured it repeatedly and found it so fast that hand-writing converters is usually not worth the time and effort.
JiBX is a good package too, by the way, so I have nothing against trying it out. It might still be a bit faster than JAXB, just not 5x or 10x when both are used correctly.
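A sketch of the "no DOM" advice, assuming JAXB 2.x on the classpath. Marshaller.marshal has an overload that writes directly to a StAX XMLStreamWriter, so no intermediate DOM tree is built (compare marshalling into a DOMResult and serializing that afterwards). XMLOutputFactory.newInstance() returns whichever StAX implementation is installed, Woodstox included. Point is a hypothetical example bean.

```java
import java.io.OutputStream;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StaxMarshal {
    /** Example bean (hypothetical); replace with your own JAXB-annotated classes. */
    @XmlRootElement
    public static class Point {
        public int x;
        public int y;
    }

    // The implementation lookup is done once. Shared here for simplicity; with concurrent
    // callers, check that your StAX implementation's factory is safe to share.
    private static final XMLOutputFactory OUTPUT_FACTORY = XMLOutputFactory.newInstance();

    /** Marshals bean straight to StAX events: POJO -> XML, with no DOM in between. */
    public static void write(JAXBContext ctx, Object bean, OutputStream out)
            throws JAXBException, XMLStreamException {
        XMLStreamWriter writer = OUTPUT_FACTORY.createXMLStreamWriter(out, "UTF-8");
        try {
            Marshaller marshaller = ctx.createMarshaller();
            marshaller.marshal(bean, writer);
            writer.flush();
        } finally {
            writer.close(); // releases the writer; the underlying stream stays open
        }
    }
}
```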

If large XML trees are written, passing a BufferedOutputStream to the javax.xml.bind.Marshaller.marshal(Object jaxbElement, java.io.OutputStream os) method made a big difference in my case: the time needed to write a 100MB XML file dropped from 130 sec to 7 sec.
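The buffering tip as a sketch, assuming JAXB 2.x on the classpath. The 64 KiB buffer size is an arbitrary choice (BufferedOutputStream defaults to 8 KiB), and Report is a hypothetical example bean. The point is only that the marshaller's writes are collected in memory instead of each one going straight to the FileOutputStream, where every write() is a system call.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.bind.annotation.XmlRootElement;

public class BufferedMarshal {
    /** Example bean (hypothetical); replace with your own JAXB-annotated classes. */
    @XmlRootElement
    public static class Report {
        public String title;
    }

    public static void writeToFile(JAXBContext ctx, Object bean, File file)
            throws JAXBException, IOException {
        Marshaller marshaller = ctx.createMarshaller();
        // Wrap the raw file stream so output is written in large chunks.
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(file), 64 * 1024)) {
            marshaller.marshal(bean, out);
        } // close() flushes the remaining buffered bytes
    }
}
```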

In my experience, JiBX (http://jibx.sourceforge.net/) was nearly 10x faster than JAXB. Yes, I measured it for a performance spec. We used it to bind Java beans to large HL7 XML. That said, the way to get that performance is not to rely on the schema definition but to write custom bindings.

We have just tracked down a JAXB performance problem related to the default parser configuration used by Xerces. JAXB was very slow (30s+) for one data file (<1MB).
Quoting "How do I change the default parser configuration?" from http://xerces.apache.org/xerces2-j/faq-xni.html
The DOM and SAX parsers decide which parser configuration to use in the following order:
1. The org.apache.xerces.xni.parser.XMLParserConfiguration system property is queried for the class name of the parser configuration.
2. If a file called xerces.properties exists in the lib subdirectory of the JRE installation and the org.apache.xerces.xni.parser.XMLParserConfiguration property is defined in it, then its value will be read from the file.
3. The org.apache.xerces.xni.parser.XMLParserConfiguration file is requested from the META-INF/services/ directory. This file contains the class name of the parser configuration.
4. The org.apache.xerces.parsers.XIncludeAwareParserConfiguration is used as the default parser configuration.
Unmarshalling with JAXB causes this algorithm to be applied repeatedly, so a huge amount of time can be spent scanning the classpath again and again for a configuration file that doesn't exist. The fix is to use option 1, 2 or 3 (for option 3, create the configuration file under META-INF/services/); anything that prevents the repeated classpath scanning works.
Hope this helps someone; it took us days to track this down.
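Option 1 from the quoted FAQ can be applied in code, as in this sketch. It only sets a system property, so it is harmless when Apache Xerces is not on the classpath, and it must run before the first unmarshal. The command-line equivalent is -Dorg.apache.xerces.xni.parser.XMLParserConfiguration=org.apache.xerces.parsers.XIncludeAwareParserConfiguration; for option 3, the file would be META-INF/services/org.apache.xerces.xni.parser.XMLParserConfiguration containing that class name. Pinning the FAQ's default class is an assumption; use whichever configuration your Xerces version should load.

```java
public class XercesConfigFix {
    public static final String CONFIG_PROPERTY =
            "org.apache.xerces.xni.parser.XMLParserConfiguration";
    // The default configuration named in the Xerces FAQ.
    public static final String DEFAULT_CONFIG =
            "org.apache.xerces.parsers.XIncludeAwareParserConfiguration";

    /** Names the configuration explicitly, so lookup step 1 succeeds and no classpath scan occurs. */
    public static void pinParserConfiguration() {
        // Respect a value that was already supplied, e.g. with -D on the command line.
        if (System.getProperty(CONFIG_PROPERTY) == null) {
            System.setProperty(CONFIG_PROPERTY, DEFAULT_CONFIG);
        }
    }

    public static void main(String[] args) {
        pinParserConfiguration(); // call during startup, before any JAXB unmarshalling
        System.out.println(CONFIG_PROPERTY + " = " + System.getProperty(CONFIG_PROPERTY));
    }
}
```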

You can improve marshalling and unmarshalling performance by setting the JAXB fast-boot property at system level, which can give a significant performance improvement. Set it before the first JAXBContext is created:
System.setProperty("com.sun.xml.bind.v2.runtime.JAXBContextImpl.fastBoot","true");

Related

EclipseLink Moxy minimum libraries required for unmarshalling

I have been working with EclipseLink for the past couple of days to implement one of our small converter applications. The input for these is usually one document format type and now, i.e. in the future, a sophisticated metadata XML.
Since we have a schema for this and slight changes are still to be expected in the future, I wanted to give the JAXB approach a try, and I like it very much so far.
However, as I finished the application, I noticed that due to the use of eclipselink.jar, the application is rather large (~10MB) in comparison to similar converters (~1MB).
This is because, for reasons of our technical environment, there is no global classpath for the converter jars; every one of them needs to be self-sufficient.
This means that I copy every required jar into one big jar using Ant.
I am not fond of this approach myself, but so far I can only guess whether a different approach would be more elegant.
There are some smaller jars in the EclipseLink distribution containing fragments of the needed classes, but I found none that contained org.eclipse.persistence.jaxb.JAXBContextFactory (plus its dependencies).
It seems to me, though this is a lot of guesswork, that eclipselink.jar includes the complete-wellness-package-that-leaves-nothing-to-be-desired, which is a bit of overkill for me.
Long story short:
Is there a lightweight version of eclipselink.jar that would support unmarshalling XML for which I generated Java classes in advance? Or am I trying the impossible?
Thank you in advance
Christian

Instead of using eclipselink.jar, you can use the bundles. In that case you will need to include the following:
org.eclipse.persistence.asm.version.jar
org.eclipse.persistence.core.version.jar
org.eclipse.persistence.moxy.version.jar
The total is still larger than other providers, but we're working on fixing that.

What is the most efficient way of repeatedly writing to an XML file in Android?

I am writing an application which needs to add nodes to an existing XML file repeatedly, throughout the day. Here is a sample of the list of nodes that I am trying to append:
<gx:Track>
<when>2012-01-21T14:37:18Z</when>
<gx:coord>-0.12345 52.12345 274.700</gx:coord>
<when>2012-01-21T14:38:18Z</when>
<gx:coord>-0.12346 52.12346 274.700</gx:coord>
<when>2012-01-21T14:39:18Z</when>
<gx:coord>-0.12347 52.12347 274.700</gx:coord>
....
This can happen up to several times a second over a long time and I would like to know what the best or most efficient way of doing this is.
Here is what I am doing right now: I use a DocumentBuilderFactory to parse the XML file, look for the container node, append the child nodes, and then use a TransformerFactory to write it back to the SD card. However, I have noticed that as the file grows larger, this takes more and more time.
I have tried to think of a better way, and this is the only thing I can come up with: use RandomAccessFile to open the file and .seek() to a specific position in it. I would calculate that position as the file length minus the length of the content I 'know' comes after the point where I'm appending.
I'm pretty sure this method will work but it feels a bit blind as opposed to the ease of using a DocumentBuilderFactory.
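The RandomAccessFile idea can be sketched as follows. It assumes the file always ends with a fixed, known tail (for example the closing tags after the last track point) encoded in UTF-8; TrackAppender and its parameters are hypothetical names. Rather than writing blind, it first checks that the tail really is at the end, then writes the new nodes over it and re-appends the tail, so each append costs time proportional to the new fragment rather than the whole file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TrackAppender {
    private final byte[] tail; // bytes that must always close the file (hypothetical, caller-supplied)

    public TrackAppender(String tail) {
        this.tail = tail.getBytes(StandardCharsets.UTF_8);
    }

    /** Inserts fragment just before the known tail without re-reading the rest of the file. */
    public void append(String path, String fragment) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
            long insertAt = raf.length() - tail.length;
            if (insertAt < 0) {
                throw new IOException("File is shorter than the expected tail");
            }
            // Verify the assumption about the file's ending before modifying anything.
            byte[] actual = new byte[tail.length];
            raf.seek(insertAt);
            raf.readFully(actual);
            if (!Arrays.equals(actual, tail)) {
                throw new IOException("File does not end with the expected tail");
            }
            // Overwrite the old tail with the new nodes, then write the tail again.
            // The file only grows, so no truncation is needed.
            raf.seek(insertAt);
            raf.write(fragment.getBytes(StandardCharsets.UTF_8));
            raf.write(tail);
        }
    }
}
```

On Android, try-with-resources and StandardCharsets require API level 19 or later; a try/finally and Charset.forName("UTF-8") work on older versions. A crash between the two writes can leave the file without its tail, so if that matters, keep a backup or make the parser tolerant of a missing tail.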
Is there a better way of doing this?

You should try using JAXB. It's a Java XML binding library that ships with most Java 1.6 JDKs. JAXB lets you specify an XML Schema Definition file (and has some experimental support for DTDs). The library will then generate Java classes for you to use in your code, which translate back into an XML document.
It's very quick and useful, with optional support for validation. This would be a good starting point. This would be another good one to look at. Eclipse also has some great tools for generating the Java classes for you and provides a nice GUI tool for XSD creation. The Eclipse plugins are called Dali, I believe.

Compiling huge schema into Java

There are two major tools that provide a way to compile an XSD schema into Java: XMLBeans and JAXB.
The problem is that the XSD schema is really huge: 30MB of XML files. Most of the schema isn't used in my project, so I could comment out most of it, but that is not a good solution.
Currently my project uses XMLBeans, which compiles the schema only after major changes to it. It produces ~60MB of classes and takes ~30 min to compile.
Another solution is to use JAXB, which generates ~14MB of code without needing any edits. But it produces a huge ObjectFactory class, which fails to compile with a "too many constants" error. I can throw the class away and compile the schema without it, but as I understand it, it's a very useful class.
Any ideas how to handle this huge schema?

Could you create a script to extract the portion(s) of the schema you need and integrate that into your build process prior to mapping with XmlBeans or JAXB?
You could probably script this extraction fairly simply and easily in Python, Perl, Awk, etc; or even in XSL if you have expertise there (I've never spent enough contiguous time coding XSL to get proficient, so I'd probably stick to a scripting language, but that's just me).
e.g.:
python extract.py big-schema.xsd >small-schema.xsd
xsd2java <args> small-schema.xsd
...
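The extraction step could also be written in Java. The sketch below is a hypothetical SchemaExtractor, not an existing tool. It assumes a single schema file (xs:include and xs:import elements are kept but not followed), keeps the named top-level components reachable from the root names you pass in by following type, ref, base, itemType, substitutionGroup and memberTypes references, and drops everything else. Namespace prefixes are simply stripped, so matching errs on the side of keeping too much.

```java
import java.io.File;
import java.util.*;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;

public class SchemaExtractor {
    // Attributes whose values name other top-level schema components.
    private static final String[] REF_ATTRS =
            {"type", "ref", "base", "itemType", "substitutionGroup", "memberTypes"};

    /** Removes named top-level components that are not reachable from the given root names. */
    public static void prune(Document doc, Set<String> roots) {
        Element schema = doc.getDocumentElement();
        // Group by local name: an element and a type may legitimately share a name.
        Map<String, List<Element>> byName = new HashMap<>();
        for (Node n = schema.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && ((Element) n).hasAttribute("name")) {
                Element e = (Element) n;
                byName.computeIfAbsent(e.getAttribute("name"), k -> new ArrayList<>()).add(e);
            }
        }
        // Worklist traversal of the reference graph.
        Set<String> reached = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String name = work.pop();
            if (!reached.add(name)) continue;
            for (Element comp : byName.getOrDefault(name, Collections.emptyList())) {
                collectRefs(comp, work);
                NodeList desc = comp.getElementsByTagName("*");
                for (int i = 0; i < desc.getLength(); i++) {
                    collectRefs((Element) desc.item(i), work);
                }
            }
        }
        // Drop unreachable named components; unnamed ones (imports, annotations) stay.
        for (Map.Entry<String, List<Element>> entry : byName.entrySet()) {
            if (!reached.contains(entry.getKey())) {
                for (Element e : entry.getValue()) schema.removeChild(e);
            }
        }
    }

    private static void collectRefs(Element e, Deque<String> work) {
        for (String attr : REF_ATTRS) {
            if (!e.hasAttribute(attr)) continue;
            // memberTypes holds a whitespace-separated list of QNames.
            for (String qname : e.getAttribute(attr).trim().split("\\s+")) {
                if (!qname.isEmpty()) work.push(qname.substring(qname.indexOf(':') + 1));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java SchemaExtractor big-schema.xsd small-schema.xsd Root1 [Root2 ...]
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new File(args[0]));
        prune(doc, new HashSet<>(Arrays.asList(args).subList(2, args.length)));
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(new File(args[1])));
    }
}
```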
You might find that a subsequent update by the 3rd-party vendor would invalidate your extraction script, but unless they're making very large changes to the overall schema, you should be able to update the script fairly quickly, and it sounds like those updates should be fairly infrequent.
Incidentally, I'm a little partial to XmlBeans; when we did our own evaluation of XML-Java mapping tools, it seemed to handle constructs like xs:choice, xs:all, and type-substitution better than anything else we tried. But that was several years ago, and could certainly have changed by now. At this point, we're continuing to use it more out of institutional inertia than anything else, so take that recommendation with a dash of salt.

30MB of schema? What on earth is this? I'd be interested to know if it's available as a test case for schema processors.
Data mapping (a la JAXB) works best with small schemas. I've seen people really struggle when the schema gets as large as about 200 element types. You must be dealing with something a couple of orders of magnitude larger here; I would say it's a non-starter.

In Java, how does Commons Digester process an input XML file?

I am new to Java and I came across a statement in a Java project which says:
Digester digester = DigesterLoader.createDigester(getClass()
.getClassLoader().getResource("rules.xml"));
The rules.xml file contains various patterns, and every pattern has different attributes like classname, methodname and some other properties.
I googled Digester but couldn't find anything that would help me with the statement above. Can anyone tell me what steps are followed when executing that statement? And what is the advantage of this XML approach in the first place?

swapnil, as a user of Digester back in my Struts days, I can honestly say it's tricky to learn and debug. It's a tough library to familiarize yourself with: essentially you are setting up event handlers for certain elements, kind of like a SAX parser (in fact it uses SAX behind the scenes). You feed a rules engine XPath-like patterns for the nodes you are interested in and set up rules that instantiate POJOs and set properties on them with the data found in the XML file.
It's a great idea, and once you get used to it, it's good; however, if you have an XSD for your input XML file, I'd sooner recommend JAXB.
The one thing that is nice about Digester is it will only do things with elements you are interested in, so memory footprint ends up being nice and low.
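Digester's own API is not reproduced here. The toy class below (MiniDigester, a hypothetical name) only illustrates the mechanism described above using the JDK's SAX parser: it tracks the current element path, and when that path matches a registered pattern it runs the associated action. Elements without a rule cost nothing beyond parsing, which is why this style keeps memory use low. Real Digester patterns support much more, such as wildcards like */book, an object stack, and property setters.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Toy rule engine: fires an action when the current element path equals a registered pattern. */
public class MiniDigester extends DefaultHandler {
    private final Map<String, Consumer<Attributes>> rules = new HashMap<>();
    private final Deque<String> path = new ArrayDeque<>();

    /** Registers an action for an exact path such as "catalog/book". */
    public void addRule(String pattern, Consumer<Attributes> action) {
        rules.put(pattern, action);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.addLast(qName);
        Consumer<Attributes> action = rules.get(String.join("/", path));
        if (action != null) {
            action.accept(atts); // SAX reuses Attributes, so actions must copy what they need now
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        path.removeLast();
    }

    public void parse(String xml) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), this);
    }
}
```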

This is the method that's getting called here. XML is commonly used in Java for configuration, since XML files do not need to be compiled; having the same configuration in a Java file would mean recompiling it whenever it changes.

I assume you understand how the rules file is being loaded through the class loader? Because the lookup goes through getClassLoader(), the name "rules.xml" is resolved relative to the root of the classpath (not the class's own package, which is what getClass().getResource("rules.xml") would use), and the result is a URL giving the file's location.
As for Digester, I've not used it, but a quick read of http://commons.apache.org/digester/ should explain everything.

They used it at my last gig and all I remember is that it was extremely slow.

Castor performance issues

We recently upgraded from Castor 0.9.5.3 to Castor 1.2 and have noticed a dramatic drop in performance when calling unmarshal on XML. In both cases we unmarshal to Java classes generated by Castor. For comparison, with identical XML the unmarshal call used to take about 10-20ms and now takes about 2300ms. Is there something obvious I might be missing in our new Castor setup, maybe in a property file, or should I start looking at reverting to the old version? Could something in the generated Java classes be killing the unmarshal call? I would also consider alternatives to Castor if there were a good reason to drop it. We're using Java 1.5 on a WebLogic server.

We had very serious performance issues with Castor 1.0.5 and a .castor.cdr file (a few seconds to unmarshal, whereas it previously took milliseconds).
It turned out that the generated .castor.cdr file contained stale/wrong values (nonexistent types and descriptors). After deleting the offending lines in this file, everything was back to normal.
Hope this helps anyone who has the same problem!

You might want to consider using JiBX instead. It is considerably faster than Castor (9x faster on one project where I made the switch), and cleaner.
EDIT: See also my answer to this related question.

We wound up reverting to Castor 0.9.5.3, and performance jumped back up after we regenerated the Java classes from the new XSDs. I'm not sure exactly why there's such a big difference between the generated classes, but the 1.2 classes were about 2 orders of magnitude slower when unmarshaling.
EDIT: It looks like we could also improve performance by creating ClassDescriptorResolvers/a mapping file and turning off validation, but since we create about 1000 objects with the schema generation process, this isn't really viable from a cost perspective.

I have this issue too: generating a basic customer/address XML document containing 74 customers takes around 3s.
Reverting to 1.0.4 (for testing) brings this down to 1.4s, but hand-rolling the XML gets the output under 40ms. I know the frameworks add some overhead, but something must be causing this.
Has there been any profiling run on Castor?
I'll go investigate JiBX as suggested by Dan.
