Castor performance issues

Castor performance issues - java

We recently upgraded to Castor 1.2 from version 0.9.5.3 and we've noticed a dramatic drop in performance when calling unmarshal on XML. We're unmarshaling to java classes that were generated by castor in both cases. For comparison, using identical XML the time for the XML unmarshal call used to take about 10-20ms and now takes about 2300ms. Is there something obvious I might be missing in our new castor implementation, maybe in a property file that I missed or should I start looking at reverting to the old version? Maybe there was something in the java class file generation that is killing the unmarshal call? I might also consider Castor alternatives if there were a good reason to drop it in favor of something else. We're using java 1.5 in a weblogic server.

We had very serious performance issues by using castor 1.0.5, with .castor.cdr file (a few seconds to unmarshal, whereas it took milliseconds by the past).
It appeared that the .castor.cdr generated file contained old/wrong values (inexisting types and descriptor). After deleting the incriminated lines in this file, all was back to normal.
Hope this could help for anyone who have the same problem!

You might want to consider using JiBX instead. It is considerably faster than Castor (9x faster on one project where I made the switch), and cleaner.
EDIT: See also my answer to this related question.

We wound up reverting to Castor version 0.9.5.3 and the performance jumped back up after we regenerated the java classes from the new XSD's. I'm not sure why exactly there's such a big difference between the resulting java but the 1.2 classes were about 2 orders of magnitude slower when unmarshaling.
**EDIT:**It looks like by creating ClassDescriptorResolvers/mapping file and turing off validation that we could improve the performance also but since we create about 1000 objects with the schema generation process this isn't really viable from a cost perspective.

I too have this issue, when generating a basic customer/address set of XML it takes around 3s to generate a document including 74 customers.
Reverting to 1.0.4 (for testing) sees this return to 1.4s,
But hand rolling the XML sees the output at under 40ms.. I know the frameworks add some overhead, but there must be something causing this.
Has there been any profiling run on Castor?
I'll go investigate JiBX as suggested by Dan.

Related

EclipseLink Moxy minimum libraries required for unmarshalling

i have been working with EclipseLink in the past couple of days to implement one of our small converter applications. The input for these are usually one document format type and now i.e. in the future a sophisticated metadata xml.
Since we have a schema for this and there are still slight changes to be expected in the future, i wanted to give the JAXB approach a try and i like it very much so far.
However, as i finish the application i noticed that due to the usage of eclipselink.jar, the application is rather large (~10MB) in comparision to similar converters (~1MB).
This is due to the fact that there is, for reasons of technological environment, no global classpath for the converter jars, but every one of them needs to be self sufficient.
This means that i copy every required jar into one big jar using ant.
I am not quite fond of this approach myself but so far can only hint that some different approach may or may not be more elegant.
There are some smaller jars containing fragments of needed classes with the eclipselink distribution, but i found none that contained the
org.eclipse.persistence.jaxb.JAXBContextFactory
(plus the dependencies for this).
It seems to me, but this is a lot of guess work, that the
eclipselink.jar
includes the complete-wellness-package-that-leaves-nothing-to-be-desired and that is a bit of an overkill for me.
Long story short:
Is there a light weight version of the eclipselink.jar which would support the unmarshalling of an xml for which i generated java classes in advance? Or am i trying the impossible?
Thank you in advance
Christian

Instead of using eclipselink.jar, you can use the bundles. Then you will need to include the following
org.eclipse.persistence.asm.version.jar
org.eclipse.persistence.core.version.jar
org.eclipse.persistence.moxy.version.jar
The total is still larger than other providers, but we're working on fixing that.

Generating java code from wsdl using cxf gives code too large error

I have generated the code form wsdl to java using cxf 2.7.3 but when building the assembly I get "code too large" error. Indicating that one of the methods have exceeded java 64kb limit. I know exactly which class and to me this seems like bug in cxf. Actully Axis2 does the same so I was wondering if anyone knows how to solve this.
The code I'm playing around with is provided here in path eco-api-ex / examples / java /
How can I force the code generation to split up large generated method? or should I use some external tool for this?
[ERROR] \workspace\e-conomics\target\generated\src\main\java\com\e_conomic\Econo
micWebServiceSoap_EconomicWebServiceSoap12_Client.java:[34,23] error: code too l
arge

Don't run wsdl2java with the -client flag. The _Client.java class is just a sample class to show how to use the generated service class and proxies and such. It's not normally needed for anything. That SHOULD be the only class generated with a large method like that.

You've got a 3MB WSDL document there. (No wonder my browser was a bit unhappy when I tried to view a general XML document of that size.) It's got around 3000 elements defined in it; also 3k messages and 4.5k operations. I don't know exactly what you're hitting the limit in (there's a few places where registries of all entities of a particular type are constructed) but it doesn't matter too much. It's just far too large for most code to normally handle. (The limit you're hitting appears to be the one on the total size of bytecode for a method; hitting that is usually an indication of something somewhere else going badly awry: in this case, it's the bunker-busting WSDL document.)
Constructing a cut-down version that has a much smaller set of elements, messages and operations would be an excellent idea. Putting that cut-down version in your repository where Maven can find it (e.g., in src/main/wsdl) would also make a lot of sense, as it would stop you from downloading that 3MB document again each time you build.

Compiling huge schema into Java

There are two major tools which provides a way to compile XSD schema into Java: xmlbeans and JAXB.
The problem is the XSD schema is really huge: 30MB of XML files, most of the schema isn't used in my project, so I can comment out most of the code, but it not a good solution.
Currently my project uses xmlbeans which compiles the schema with major changes. It produces ~60MB of classes and it takes ~30 min to compile.
Another solution is to use JAXB, which generates ~14MB of code without need to edit the code. But it produces huge ObjectFactory class, which fails to compile with "too many constants" error. I can throw the class away and compile the schema without it, but as I understand, it's very useful class.
Any ideas how to handle this huge schema?

Could you create a script to extract the portion(s) of the schema you need and integrate that into your build process prior to mapping with XmlBeans or JAXB?
You could probably script this extraction fairly simply and easily in Python, Perl, Awk, etc; or even in XSL if you have expertise there (I've never spent enough contiguous time coding XSL to get proficient, so I'd probably stick to a scripting language, but that's just me).
e.g.:
python extract.py big-schema.xsd >small-schema.xsd
xsd2java <args> small-schema.xsd
...
You might find that a subsequent update by the 3rd-party vendor would invalidate your extraction script, but unless they're making very large changes to the overall schema, you should be able to update the script fairly quickly, and it sounds like those updates should be fairly infrequent.
Incidentally, I'm a little partial to XmlBeans; when we did our own evaluation of XML-Java mapping tools, it seemed to handle constructs like xs:choice, xs:all, and type-substitution better than anything else we tried. But that was several years ago, and could certainly have changed by now. At this point, we're continuing to use it more out of institutional inertia than anything else, so take that recommendation with a dash of salt.

30Mb of schema? What on earth is this - I'd be interested to know if it's available as a test case for schema processors.
Data mapping (a la JAXB) works best with small schemas. I've seen people really struggle when the schema gets as large as about 200 element types. You must be dealing with something a couple of orders of magnitude larger here - I would say it's a non starter.

Java marshaller performance

I've used JAXB Marshaller as well as my own marshaller for marshalling pure java bean objects into XML. It has been observed that both of them require almost same time to marshal. The performance is not acceptable and needs to be improved. What are possible ways where we can improve performance of marshaller? Like threading?

Make sure you create the JaxB context instance only once, creating the context takes some time as it uses reflection to parse the object's annotations.
Note that the JAXBContext is thread safe, but the marshallers\unmarshallers aren't, so you still have to create the marshaller for every thread. However I found that creating the marshallers when you already hold a jaxb context is pretty fast.

Seconding the Use of JibX. Like questzen, I found that JibX was 9 times faster than JAXB in my performance tests.
Also, make sure you have woodstox on the classpath when using JibX. I found woodstox's Stax Implementation is roughly 1050% faster than the Java6 implementation of Stax.

Byeond other good suggestions, I suggest there is something wrong with the way you use JAXB -- it is generally reasonably well performing as long as:
You use JAXB version 2 (NEVER ever use obsolete JAXB 1 -- that was horribly slow, useless piece of crap); preferably a recent 2.1.x version from http://jaxb.dev.java.net
Ensure that you use SAX or Stax source/destination; NEVER use DOM unless you absolute must for interoperability: using DOM will make it 3 - 5x slower, without any benefit (it just doubles object model: POJO -> DOM -> XML; DOM part is completely unnecessary)
Ideally use fastest SAX/Stax parser available; Woodstox is faster than Sun's bundled Stax processor (and BEA's ref. impl. is buggy, no faster than Sun's)
If JAXB is still more than 50% slower than manually written variant, I would profile it to see what else is going wrong. It should not work slowly when used properly -- I have measured it, continuously, and found it so fast that hand-writing converters is usually not worth time and effort.
Jibx is a good package too btw, so I have nothing against trying it out. It might still be bit faster than JAXB; just not 5x or 10x, when both are used correctly.

If large XML trees are written, providing a BufferedOutputStream to the
javax.xml.bind.Marshaller.marshal(Object jaxbElement, java.io.OutputStream os) method made a big difference in my case:
The time needed to write a 100MB XML file could be reduced from 130 sec to 7 sec.

In my experience, JIBX http://jibx.sourceforge.net/ was nearly 10X faster then JAXB. Yes, I measured it for a performance spec. We used it to bind java beans with large HL7 xml. That being said, the way to improve performance is not to rely on the schema definition but to write custom bindings.

We have just tracked down a JAXB performance problem related to the default parser configuration used by Xerces. JAXB performance was very slow (30s+) for one data file (<1Mb)
Quoting "How do I change the default parser configuration?" from http://xerces.apache.org/xerces2-j/faq-xni.html
The DOM and SAX parsers decide which parser configuration to use in the following order
The org.apache.xerces.xni.parser.XMLParserConfiguration system property is queried for the class name of the parser configuration.
If a file called xerces.properties exists in the lib subdirectory of the JRE installation and the org.apache.xerces.xni.parser.XMLParserConfiguration property is defined it, then its value will be read from the file.
The org.apache.xerces.xni.parser.XMLParserConfiguration file is requested from the META-INF/services/ directory. This file contains the class name of the parser configuration.
The org.apache.xerces.parsers.XIncludeAwareParserConfiguration is used as the default parser configuration.
Unmarshalling using JAXB results in this algorithm being repeatedly applied. So a huge amount of time can be spent repeatedly scanning the classpath, looking for the configuration file that doesn't exist. The fix is to do option 1, option 2 or option 3 (create the configuration file under META-INF). Anything to prevent the repeated classpath scanning.
Hope this helps someone - it's taken us days to track this down.

We can achieve the performance in Marshalling and unmarshalling by setting the fast booting property at system level. This will give lot of performance improvement.
System.setProperty("com.sun.xml.bind.v2.runtime.JAXBContextImpl.fastBoot","true");

Is it worth the effort to move from a hand crafted hibernate mapping file to annotaions?

I've got a webapp whose original code base was developed with a hand crafted hibernate mapping file. Since then, I've become fairly proficient at 'coding' my hbm.xml file. But all the cool kids are using annotations these days.
So, the question is: Is it worth the effort to refactor my code to use hibernate annotations? Will I gain anything, other than being hip and modern? Will I lose any of the control I have in my existing hand coded mapping file?
A sub-question is, how much effort will it be? I like my databases lean and mean. The mapping covers only a dozen domain objects, including two sets, some subclassing, and about 8 tables.
Thanks, dear SOpedians, in advance for your informed opinions.

"If it ain't broke - don't fix it!"
I'm an old fashioned POJO/POCO kind of guy anyway, but why change to annotations just to be cool? To the best of my knowledge you can do most of the stuff as annotations, but the more complex mappings are sometimes expressed more clearly as XML.

One thing you'll gain from using annotations instead of an external mapping file is that your mapping information will be on classes and fields which improves maintainability. You add a field, you immediately add the annotation. You remove one, you also remove the annotation. you rename a class or a field, the annotation is right there and you can rename the table or column as well. you make changes in class inheritance, it's taken into account. You don't have to go and edit an external file some time later. this makes the whole thing more efficient and less error prone.
On the other side, you'll lose the global view your mapping file used to give you.

I've recently done both in a project and found:
I prefer writing annotations to XML (plays well with static typing of Java, auto-complete in IDE, refactoring, etc). I like seeing the stuff all woven together rather than going back and forth between code and XML.
Encodes db information in your classes. Some people find that gross and unacceptable. I can't say it bothered me. It has to go somewhere and we're going to rebuild the WAR for a change regardless.
We actually went all the way to JPA annotations but there are definitely cases where the JPA annotations are not enough, so then had to use either Hibernate annotations or config to tweak.
Note that you can actually use both annotations AND hbm files. Might be a nice hybrid that specifies the O part in annotations and R part in hbm files but sounds like more trouble than it's worth.

As much as I like to move on to new and potentially better things I need to remember to not mess with things that aren't broken. So if having the hibernate mappings in a separate file is working for you now I wouldn't change it.

I definitely prefer annotations, having used them both. They are much more maintainable and since you aren't dealing with that many classes to re-map, I would say it's worth it. The annotations make refactoring much easier.

All the features are supported both in the XML and in annotations.
You will still be able to override your annotations with xml declaration.
As for the effort, i think it is worth it as you will be able to see all in one place and not switch between your code and the xml file (unless of-course you are using two monitors ;) )

The only thing you'll gain from using
annotations
I would probably argue that this is the thing you want to gain from using annotations. Because you don't get compile time safety with NHibernate this is the next best thing.

"If it ain't broke - don't fix it!"
#Macka - Thanks, I needed to hear that. And thanks to everyone for your answers.
While I am in the very fortunate position of having an insane amount of professional and creative control over my work, and can bring in just about any technology, library, or tool for just about any reason (baring expensive stuff) including "because all the cool kids are using it"...It does not really make sense to port what amounts to a significant portion of the core of an existing project.
I'll try out Hibernate or JPA annotations with a green-field project some time. Unfortunately, I rarely get new completely independent projects.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.