I need to query XML out of a MarkLogic server and marshal it into Java objects. What is a good way to go about this? Specifically:
Does using MarkLogic have any impact on the XML technology stack? (i.e. is there something about MarkLogic that leads to a different approach to searching for, reading and writing XML snippets?)
Should I process the XML myself using one of the XML APIs or is there a simpler way?
Is it worth using JAXB for this?
Someone asked a good question of why I am using Java. I am using Java/Java EE because I am strongest in that language. This is a one man project and I don't want to get stuck anywhere. The project is to develop web service APIs and data processing and transformation (CSV to XML) functionality. Java/Java EE can do this well and do it elegantly.
Note: I'm the EclipseLink JAXB (MOXy) lead, and a member of the JAXB 2 (JSR-222) expert group.
Does using MarkLogic have any impact on the XML technology stack?
(i.e. is there something about MarkLogic that leads to a different
approach to searching for, reading and writing XML snippets?)
Potentially. Some object-to-XML libraries support a larger variety of documents than other ones do. MOXy leverages XPath based mappings that allows it to handle a wider variety of documents. Below are some examples:
http://blog.bdoughan.com/2010/09/xpath-based-mapping-geocode-example.html
http://blog.bdoughan.com/2011/03/map-to-element-based-on-attribute-value.html
Should I process the XML myself using one of the XML APIs or is there
a simpler way?
Using a framework is generally easier. Java SE offers may standard libraries for processing XML: JAXB (javax.xml.bind), XPath (javax.xml.xpath), DOM, SAX, StAX. Since these standards there are also other implementations (i.e. MOXy and Apache JaxMe implement JAXB).
http://blog.bdoughan.com/2011/05/specifying-eclipselink-moxy-as-your.html
Is it worth using JAXB for this?
Yes.
There are a number of XML-> Java object marshall-ing libraries. I think you might want to look for an answer to this question by searching for generic Java XML marshalling/unmarshalling questions like this one:
Java Binding Vs Manually Defining Classes
Your use case is still not perfectly clear although the title edit helps - If you're looking for Java connectivity, you might also want to look at http://developer.marklogic.com/code/mljam which allows you to execute Java code from within MarkLogic XQuery.
XQSync uses XStream for this. As I understand it, JAXB is more powerful - but also more complex.
Having used JAXB to unmarshal xml served from XQuery for 5 years now, I have to say that I have found it to be exceptionally useful and time-saving. As for complexity, it is easy to learn and use for probably 90% of what you would be using it for. I've used it for both simple and complex schemas and found it to be very performant and time-saving. Executing Java code from within MarkLogic is usually a non-starter, because it runs in a separate VM on the Marklogic server, so it really can't leverage any session state or libraries from, say, a Java EE web application. With JAXB, it is very easy to take a result stream and convert it to Java objects. I really can't say enough good things about it. It has made my development efforts infinitely easier and allows you to leverage Java for those things that it does best (rich integration across various technologies and platforms, advanced business logic, fast memory management for heavy processing jobs, etc.) while still using XQuery for what it does best (i.e. searching and transforming content).
Does using MarkLogic have any impact on the XML technology stack?
No. By the time it comes out of MarkLogic, it's just XML that could have come from anywhere.
I need to query XML and marshal it into Java objects.
Why?
If you have a good reason for using Java, then we need to know what that reason is before we can tell you which Java technology is appropriate.
If you don't have a good reason for using Java, then you are better off using a high-level XML processing language such as XSLT or XQuery.
As for JAXB, it is appropriate when your schema is reasonably simple and stable. If the schema is complex (e.g. the schema for articles in an academic journal), then JAXB can be hopelessly unwieldy because of the number of classes that are generated. One problem with using it to process XQuery output is that it's very likely the XQuery output will not conform to any known schema, and the structure of the XQuery results will be different for each query that gets written.
Related
This question is somewhat related to
Fastest XML parser for small, simple documents in Java
but with a few more specifics.
I'm working on an application which needs to parse many (10s of millions), small (approx. 300k) xml documents. The current implementation is using xerces-j and it takes about 2.5 ms per xml document on a 1.5 GHz machine. I'd like to improve this performance. I came across this article
http://www.xml.com/pub/a/2007/05/16/xml-parser-benchmarks-part-2.html
claiming that libxml2 can parse about an order of magnitude faster than any java parsers. I'm not sure if I believe it, but it caught my attention. Has anyone tried using libxml2 from the jvm? If so, is it faster than java dom parsing (xerces)? I'm thinking I'd still need my java dom structure, but I'm guessing that copying from a c-structured dom into java-dom shouldn't take long. I must have java-dom - sax will not help me in this case.
update: I just wrote a test for libxml2 and it wasn't any faster than xerces... granted my c coding ability is extremely rusty.
update I broadened the question a bit here:
why is sax parsing faster than dom parsing ? and how does stax work?
and am open to the possibility of ditching dom.
Thanks
In Java, StAX JSR-173 is generally considered to be the fastest approach to parsing XML. There are multiple implementations of StAX, the Woodstox implementation is generally regarded as being fast.
To improve performance I would avoid DOM. What are you doing with the XML? If you are ultimately dealing with it as objects, the you should consider an OXM solution. The standard is JAXB JSR-222. JAXB implementations such as MOXy (I'm the tech lead) will even allow you to do a partial mapping which will improve performance:
http://bdoughan.blogspot.com/2010/09/xpath-based-mapping-geocode-example.html
First of all, your question does not contain a question. What do you want to know?
I suppose you were using JNI to convert the c-dom into a java-dom. I dont know if there are official numbers, but in my experience c+JNI often is slower than directly doing it in java.
If you really want to speed up your processing, try to get rid of the dom (why do you need it? Maybe we can think of a solution together). If all xml files have the same schema, use your own specialized data model (and a SAX parser).
If you only use a subset of xml (i.e. without namespaces, only few attributes), consider writing your own parser that directly produces more efficient java objects (but I would not recommend that).
I've been using JDOM for general XML parsing for a long time, but get the feeling that there must be something better, or at least more lightweight, for Java 5 or 6.
There's nothing wrong with the JDOM API, but I don't like having to include Xerces with my deployments. Is there a more lightweight alternative, if all I want to do is read in an XML file or write one out?
The best lightweight alternative is, in my opinion, XOM, but JDOM is still a very good API, and I see no reason to replace it.
It doesn't have a dependency on Xerces, though (at least, it doesn't need the Apache Xerces distro, it works alongside the Xerces that's packaged into the JRE).
I've used the javax.xml.stream package (XMLStreamReader/XMLStreamWriter) to read and write XML using xml pull/push techniques. It's worked for me so far.
We use JAXB - it generates the classes based on the schema. You can generate your files without a schema, and just annotate how you want the xml to be.
There was recently a fork of JDOM for java 5 called coffeeDOM. You should check it out.
You should check out Commons Digester (see the answer I've given here). It provides a very lightweight mechanism for parsing XML.
JDOM is very good and simple. There has been many new ways to parse XML after release of JDOM, but those has have different focus than simplicity. JAXB makes things simple in some cases when you have well known XML document has your schema does not get updated daily basis.
New push parsers are very good and even mandatory for very large XML files (hundreds of MBs).
Speed benefit for SAX parser can be ten fold.
Use one of the XML APIs that come standard with Java, so that you don't have to include any third-party libraries.
XML in the Java Platform Standard Edition (Java SE) 6
I would like to think JAXP is a good choise for you.
It's standard, included in JDK, it provides clear interface and allows to hook up any implementations..
If all what you need in is to read and write not very large and overcomplicated xml files, JAXP DOM api embedded in JDK will cover you requirements.
i want to do read simple XML file .i found
Simple way to do Xml in Java
There are also several parsers available just wanted to make sure that what are the advantages of using XOM parser over suns parser
Any suggestions?
XOM is extremely quick compared to the standard W3C DOM. If that's your priority, there's none better.
However, it's still a DOM-type API, and so it's not memory efficient. It's not a replacement for SAX or STAX.
You might want to check this question about the best XML library and its top (XOM) answer; lots of details about advantages of XOM. (Leave a comment if something is unclear; Peter Štibraný seems to know XOM inside and out.)
As mentioned, XOM is very quick and simple in most tasks compared to standard javax.xml. For examples, see this post in a question about the simplest way to read in an XML file in Java. I collected some nice examples that make XOM look pretty good (and javax.xml rather clumsy) there. :-)
So personally I've come to like XOM after evaluating (as you can see in the linked posts); for any new Java project I'd most likely choose XOM for XML handling. The only shortcoming I've found is that it doesn't directly support streaming XML (unlike dom4j where I'm coming from), but with a simple workaround it can stream just fine.
How do you need to access your data?
If it is one-pass, then you don't need to build the tree in memory. You can use SAX (fast, simple) or StAX (faster, not quite so simple).
If you need to keep the tree in memory to navigate, XOM or JDOM are good choices. DOM is the Choice Of Last Resort, whether it is level 1, 2, or 3, with or without extensions.
Xerces, which is the parser included with Java (although you should get the updated version from Apache and not use the one bundled with Java, even in 6.0), also has a streaming native interface called XNI.
If you want to hook other pre-made parts up in the chain, often SAX or StAX work well, since they might build their own model in memory. For example, the Saxon XSLT/XQuery engine works with DOM, SAX or StAX, but builds internally a TinyTree (default) or DOM (optional). DataDirect XQuery works with SAX, StAX or DOM also, but really likes StAX.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
Improve this question
I'm looking for something like dom4j, but without dom4j's warts, such as bad or missing documentation and seemingly stalled development status.
Background: I've been using and advocating dom4j, but don't feel completely right about it because I know the library is far from optimal (example: see how methods in XSLT related Stylesheet class are documented; what would you pass to run() as the String mode parameter?)
Requirements:
The library should make basic XML handling easier than it is when using pure JDK (javax.xml and org.w3c.dom packages). Things like this:
Read an XML document (from file or String) into an object, easily traverse and manipulate the DOM, do XPath queries and run XSLT against it.
Build an XML document in your Java code, add elements and attributes and data, and finally write the document into a file or String.
I really like what dom4j promises, actually: "easy to use, open source library for working with XML, XPath and XSLT [...] with full support for DOM, SAX and JAXP." And upcoming dom4j 2.0 does claim to fix everything: fully utilise Java 5 and add missing documentation. But unfortunately, if you look closer:
Warning: dom4j 2.0 is in pre-alpha
stage. It is likely it can't be
compiled. In case it can be compiled
at random it is likely it can't run.
In case it runs occasionally it can
explode suddenly. If you want to use
dom4j, you want version 1.6.1. Really.
...and the website has said that for a long time. So is there a good alternative to dom4j? Please provide some justification for your preferred library, instead of just dumping names and links. :-)
Sure, XOM :-)
XOM is designed to be easy to learn
and easy to use. It works very
straight-forwardly, and has a very
shallow learning curve. Assuming
you're already familiar with XML, you
should be able to get up and running
with XOM very quickly.
I use XOM for several years now, and I still like it very much. Easy to use, plenty of documentation and articles on the web, API doesn't change between releases. 1.2 was released recently.
XOM is the only XML API that makes no
compromises on correctness. XOM only
accepts namespace well-formed XML
documents, and only allows you to
create namespace well-formed XML
documents. (In fact, it's a little
stricter than that: it actually
guarantees that all documents are
round-trippable and have well-defined
XML infosets.) XOM manages your XML so
you don't have to. With XOM, you can
focus on the unique value of your
application, and trust XOM to get the
XML right.
Check out web page http://www.xom.nu/ for FAQ, Cookbook, design rationale, etc. If only everything was designed with so much love :-)
Author also wrote about What's Wrong with XML APIs (and how to fix them). (Basically, reasons why XOM exists in the first place)
Here is also 5-part Artima interview with author about XOM, where they talk about what's wrong with XML APIs, The Good, the Bad, and the DOM, A Design Review of JDOM, Lessons Learned from JDOM and finally Design Principles and XOM.
The one built into the JDK ... with a few additions.
Yes, it's painful to use: it is modeled after W3C specs that were clearly designed by committee. However, it is available anywhere, and if you settle on it you don't run into the "I like Dom4J," "I like JDOM," "I like StringBuffer" arguments that come from third-party libraries. Especially since such arguments can turn into different pieces of code using different libraries ...
However, as I said, I do enhance slightly: the Practical XML library is a collection of utility classes that make it easier to work with the DOM. Other than the XPath wrapper, there's nothing complex here, just a bunch of routines that I found myself rewriting for every job.
I've been using XMLTool for replacing Dom4j and it's working pretty well.
XML Tool uses Fluent Interface pattern to facilitate XML manipulations:
XMLTag tag = XMLDoc.newDocument(false)
.addDefaultNamespace("http://www.w3.org/2002/06/xhtml2/")
.addNamespace("wicket", "http://wicket.sourceforge.net/wicket-1.0")
.addRoot("html")
.addTag("wicket:border")
.gotoRoot().addTag("head")
.addNamespace("other", "http://other-ns.com")
.gotoRoot().addTag("other:foo");
System.out.println(tag.toString());
It's made for Java 5 and it's easy to create an iterable object over
selected elements:
for (XMLTag xmlTag : tag.getChilds()) {
System.out.println(xmlTag.getCurrentTagName());
}
I've always liked jdom. It was written to be more intuitive than DOM parsing(and SAX parsing always seems clumsy anyway).
From the mission statement:
There is no compelling reason for a
Java API to manipulate XML to be
complex, tricky, unintuitive, or a
pain in the neck. JDOMTM is both
Java-centric and Java-optimized. It
behaves like Java, it uses Java
collections, it is completely natural
API for current Java developers, and
it provides a low-cost entry point for
using XML.
That's pretty much been my experience - fairly intuitive navigation of node trees.
I use XStream, its a simple library to serialize objects to XML and back again.
it can be annotation-driven (like JAXB), but it has very simple and easy to use api and you can even generate JSON.
I'll add to the built-in answer by #kdgregory by saying why not JAXB?
With a few annotations its pretty easy to model most XML documents. I mean your probably going to parse the stuff and put in an object right?
JAXB 2.0 is built in to JDK 1.6 and unlike many other builtin javax libraries this one is pretty good (Kohusuke worked on it so you know its good).
In a recent project I had to do some XML parsing, and ended up using Simple Framework, recommended by a colleague.
I was quite happy with it in the end. It uses an annotation-based approach of mapping XML elements and attributes to Java classes and fields.
<example>
<a>
<b>
<x>foo</x>
</b>
<b>
<y>bar</y>
</b>
</a>
</example>
Corresponding Java code:
#Root
public class Example {
#Path("a/b[1]")
#Element
private String x;
#Path("a/b[2]")
#Element
private String y;
}
It's all quite different from dom4j or XOM. You avoid writing silly, boilerplatey XML handling code, but at first you'll probably bang your head against a wall for a while trying to get the annotations right.
(It was me who asked this question 4 years ago. While XOM seems a decent and quite popular dom4j replacement, I haven't come to fully embrace it. Curious that no-one had mentioned Simple Framework here. I decided to fix that, as I'd probably use it again.)
In our project we are using http://www.castor.org/ but just for small XML files. It's really easy to learn, needs just a mapping XML file (or none if the XML tags match perfectly class attributes) and it's done. It supports listeners (like callbacks) to perform additional processing. The cons: it is not a Java EE standard like JAXB.
you can try JAXB, with annotations its very handy and simple to do: Java Architecture for XML Binding.
I'm sometimes using Jericho, which is primarily HTML parser, but can parse any XML-like structure.
Of course it is only for the simplest XML operations, such as finding tags with given name, iterating through structure, replacing tags and its attributes, but aren't this the most use cases?
For building XML documetns, I suggest xmlenc. It is used in cassandra.
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
What are you using for binding XML to Java? JAXB, Castor, and XMLBeans are some of the available choices. The comparisons that I've seen are all three or four years old. I'm open to other suggestions. Marshalling / unmarshalling performance and ease of use are of particular interest.
Clarification: I'd like to see not just what framework you use, but your reasoning for using one over the others.
If you want to make an informed decision you need to be clear why you are translating between XML and java objects. The reason being that the different technologies in this space try to solve different problems. The different tools fall into two categories:
XML data binding - refers to the process of representing the information in an XML document as an object in computer memory. Typically, this means defining an XSD and generating a java source code equivalent. Interop between different languages is top priority (hence the use of XSD) - most typically for the implementation of SOAP-based web services.
XML serialisation - refers to writing out a graph of in memory objects to a stream, so that it can be reconstituted somewhere or sometime else. You write the java classes by hand; the xml representation is of secondary importance. Also, the need for performance is often greater and the need for interoperation with other languages such as .net is often lower.
For xml serialisation, Xstream is hard to beat. JAXB is the standard for XML binding.
In either case, if you are using J2EE you'll need to pay careful attention to classes retrieved from JPA since class proxies and persistence specific collection types can confuse binding / serialization tools.
JiBX. Previously I used Castor XML, but JiBX proved to be significantly better, particularly in terms of performance (a straight port of some application code from Castor XML to JiBX made it 9x faster). I also found the mapping format for JiBX to be more elegant than Castor's.
JiBX achieves its performance by using post-compilation bytecode manipulation rather than the reflection approach adopted by Castor. This has the advantage that it places fewer demands on the way that you write your mapped classes. There is no need for getters, setters and no-arg constructors just to satisfy the tools. Most of the time you can write the class without considering mapping issues and then map it without modifications.
If you have an XSD for the XML, and you don't need to bind the data to an existing set of classes, then I really like XMLBeans. Basically, it works like this:
Compile XSD
Use generated java classes to read/write documents conforming to this schema
Binding an XML document to the generated classes is as simple as:
EmployeesDocument empDoc = EmployeesDocument.Factory.parse(xmlFile);
We use xstream. Marshalling / unmarshalling is trivial. See their tutorial for examples.
Jibx is what is used around here. It is very fast, but the bindings can be a little tricky. However, it is especially useful if you have XML schemas describing your domain objects, as it really maps well to XSD (there's even a beta tool XSD2Jibx which can take XSDs and create stub domain classes and mappings, which you can then take and coax to fit your existing domain model).
It manipulates bytecode, so it must be run after the initial compilation of the Java .class files. You can use the Maven plugin for it, or just use it directly (the Eclipse plugin didn't seem to work for me).
I've used Jaxb with varying success. At the time (a couple of years back) the overall documentation was lackluster and the basic usage documentation (including where to download implementations) was difficult to find or varied.
The parser which wrote the Java classes was quite good with little discrepancy against the original XSD (though I think it had problems supporting abstract XML elements).
I haven't used it since, but I have an upcoming project which will require just such a framework and I will be interested to know how anyone else fairs with the above.
I used castor 7 years ago -- it worked fairly well. used DTDs. Not many choices at that time.
In current projects, I've used
1) JAXB -- standards based, Reference implementation available, command line and ant tools available. latest version - 2.1.8 needs java 5+.
2) XStream -- for Soap unmarshalling -- needs Java 5+. Is not as fast and standards compliant as JAXB latest.
BR,
~A
Related: XML serialization in Java?
XmlBeans is a good choice especially if you have 'broken' XSD/WSDL files.
Don mentioned
EmployeesDocument empDoc = EmployeesDocument.Factory.parse(xmlFile);
..but it can also take a Node, or a File, or just about any source.
No fighting with namespaces,
traverse to the object you want to unmarshall,
and Factory.parse it.
Wish I had found it 2 weeks ago.
We use Castor. It suits our needs fairly well.
I was wondering exactly the same question, and finally I found this performance tests made by IBM. http://www.ibm.com/developerworks/library/x-databdopt2/. JiBX is my choice I guess, hehe.