Alternatives for DOM API on Android? - java

We have an internal library that uses the org.w3c.dom DOM API to read and write XML. When attempting to use this library on Android I found that it no longer works. It appears that Android implements only a subset of the DOM API. I don't know the reasons for this, and I know that it's fixed in Android 2.2, but I still need to target older devices.
I know a number of alternative DOM libraries for "regular" Java, such as XOM and Dom4j. Can anyone recommend a DOM library that meets the following goals?
It has to work on Android.
It should be small (since people pay per MB).
Ideally, it should be similar to the org.w3c.dom API since I need to rewrite the existing code.
It's probably impossible to meet all three goals, but with two I would already be happy. Also, out of curiosity, does anyone know why the DOM API is not fully supported? I can understand the reasons for not implementing Java Sound etc., but XML seems quite essential to me.

My general recommendation would be to stay well away from a DOM parser, because the performance is abysmal compared to using a direct parser. You would be much better off parsing with XmlPullParser or SAX.
I know, you already have code that uses DOM, and that is how you want to do it. But believe me, you should not be using this on a mobile device.
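Here is a minimal sketch of the SAX approach. It uses only javax.xml.parsers and org.xml.sax, which ship with both Java SE and Android. The feed-like sample document and the `SaxTitles` class are hypothetical. The handler collects the text of every `<title>` element in one pass and never builds a tree:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTitles {
    // Collects the text content of every <title> element in a single streaming pass.
    public static List<String> titles(String xml) throws Exception {
        final List<String> result = new ArrayList<String>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a <title> element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("title".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // characters() may be called several times for one text node, so append.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName) && text != null) {
                    result.add(text.toString());
                    text = null;
                }
            }
        });
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<feed><entry><title>First</title></entry><entry><title>Second</title></entry></feed>";
        System.out.println(titles(xml)); // [First, Second]
    }
}
```

The handler keeps only the state it needs (the current title text), so memory use stays flat no matter how large the document is.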

Related

What pull parser implementation to use and when?

I need to use an XML pull parser. I found stax-api.jar, which seems to be part of com.sun.xml.* already, and it seems that something StAX-related is already implemented.
Unfortunately, com.sun.xml has no sources in JDK 6, so I can't tell.
There are also xmlpull, stax.codehaus.org and Apache Axiom, which kind of implement the StAX API. stax.codehaus.org seems to be a StAX reference implementation. Xmlpull seems to be made by the same people as the reference implementation, and Apache Axiom seems to be a StAX-based parser that was created for Apache Axis2.
Could you please clarify the main differences, which API to use, and when you would use one of these implementations and why?
Edit: Before you decide to close this question, note that the xmlpull.org and stax.codehaus.org releases are pretty old (5 years), and one really can't tell whether the StAX parser implementation is part of com.sun.xml.*.
I'd just like someone with pull parser experience to tell me what to use and why.
For instance, the Apache Abdera project (I'm parsing Atom feeds too) uses the Axiom implementation, which seems to implement its own Axiom API and also geronimo-stax-api_1.0_spec.
Aside from pointing out that the JDK/JRE bundles Sun's SJSXP, which works OK at this point, I would recommend AGAINST using the StAX reference implementation (stax.codehaus.org): do NOT use it for anything, ever. It has lots of remaining bugs (although many were fixed, the initial versions were horrible), isn't particularly fast, and doesn't implement even all mandatory features. Stay clear of it.
I am partial to Woodstox, which is by far the most complete implementation of XML features (on par with Xerces, about the only other Java XML parser that can say this), more performant than SJSXP, and an all-around solid parser and generator. This is why most modern Java XML web service frameworks and containers bundle Woodstox.
Or, if you want super-high performance, check out Aalto. It is the successor to Woodstox, with fewer features (no DTD handling) but 2x faster for many common cases.
And if you ever need non-blocking/async parsing (for NIO based input for example), Aalto is the only known Java XML parser to offer that feature.
As to Axiom: it is NOT a parser, but a tree model built on top of a StAX parser like Woodstox, so they didn't reinvent the wheel. XmlPull predates the StAX API by a couple of years. Basically, StAX standardization came about from people using XmlPull, liking what they saw, and Sun and BEA wanting to standardize the approach. There was some friction in the process, so in the end XmlPull was not discontinued when StAX was finalized, but one can think of StAX as its successor. XmlPull is still used on mobile devices; I think the Android platform includes it.
(disclaimers: I am involved in both Aalto and Woodstox projects; as well as provided more than a dozen bug fixes to both SJSXP and Stax RI)
As of Java 1.6, there is a StAX implementation inside the plain bundled JRE. You can use that. If you don't like the performance, drop in Woodstox.
Axiom is something else entirely, much more complex. Xmlpull seems to be going by the wayside in favor of one Stax implementation or another.
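For reference, here is a minimal sketch of the StAX cursor API (javax.xml.stream, bundled with Java SE 6 and later). `XMLInputFactory.newInstance()` returns whichever implementation it finds on the classpath, which is how dropping in Woodstox replaces the bundled parser without code changes. The element-counting task and the `StaxCount` class are illustrative only:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxCount {
    // Counts elements with the given local name using the pull (cursor) API:
    // the caller asks for the next event instead of receiving callbacks as in SAX.
    public static int countElements(String xml, String localName) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT && localName.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close(); // frees parser resources; does not close the underlying StringReader
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(countElements("<a><b/><c><b/></c></a>", "b")); // 2
    }
}
```

Because the loop is driven by the caller, it can stop early (for example with `break` once the needed data is found) without any special handler logic.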

Best method for text parsing on Android

I am trying to write a program in android that does a lot of string and xml parsing.
I need a suggestion on which way to go:
Use JNI and implement parsing in C++ and use C++ XML SAX parsing (Android NDK)
Go with java and parse xml with SAX
Have you considered a server-side component that could do the majority of the parsing and then send devices a stream that is significantly easier to parse? You could present a web interface in addition to a mobile interface. It would also be easier to set up subscriptions, for which you could charge a fee if you someday decided to.
I would suggest option 2.
It may not be worth it to go with the complexity of NDK just for the parsing purpose.

libxml2 from java

This question is somewhat related to
Fastest XML parser for small, simple documents in Java
but with a few more specifics.
I'm working on an application which needs to parse tens of millions of small (approx. 300k) XML documents. The current implementation uses Xerces-J, and it takes about 2.5 ms per XML document on a 1.5 GHz machine. I'd like to improve this performance. I came across this article
http://www.xml.com/pub/a/2007/05/16/xml-parser-benchmarks-part-2.html
claiming that libxml2 can parse about an order of magnitude faster than any Java parser. I'm not sure if I believe it, but it caught my attention. Has anyone tried using libxml2 from the JVM? If so, is it faster than Java DOM parsing (Xerces)? I'm thinking I'd still need my Java DOM structure, but I'm guessing that copying from a C-structured DOM into a Java DOM shouldn't take long. I must have a Java DOM; SAX will not help me in this case.
Update: I just wrote a test for libxml2 and it wasn't any faster than Xerces... granted, my C coding ability is extremely rusty.
Update: I broadened the question a bit here:
why is sax parsing faster than dom parsing ? and how does stax work?
and am open to the possibility of ditching DOM.
Thanks
In Java, StAX (JSR-173) is generally considered to be the fastest approach to parsing XML. There are multiple implementations of StAX; the Woodstox implementation is generally regarded as fast.
To improve performance I would avoid DOM. What are you doing with the XML? If you are ultimately dealing with it as objects, then you should consider an OXM solution. The standard is JAXB (JSR-222). JAXB implementations such as MOXy (I'm the tech lead) will even allow you to do a partial mapping, which will improve performance:
http://bdoughan.blogspot.com/2010/09/xpath-based-mapping-geocode-example.html
First of all, your question does not contain a question. What do you want to know?
I suppose you were using JNI to convert the C DOM into a Java DOM. I don't know if there are official numbers, but in my experience C plus JNI is often slower than doing it directly in Java.
If you really want to speed up your processing, try to get rid of the DOM (why do you need it? Maybe we can think of a solution together). If all XML files have the same schema, use your own specialized data model (and a SAX parser).
If you only use a subset of XML (i.e. without namespaces, with only a few attributes), consider writing your own parser that directly produces more efficient Java objects (but I would not recommend that).
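A minimal sketch of the "specialized data model plus SAX" suggestion, assuming every document follows one fixed schema. The `<order>`/`<item sku="..." qty="..."/>` format and the `Item` class are hypothetical. The point is that domain objects are created straight from SAX events, so no DOM is ever built:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ItemParser {
    // Hypothetical domain object: a plain Java class, not a DOM node.
    public static final class Item {
        public final String sku;
        public final int quantity;

        Item(String sku, int quantity) {
            this.sku = sku;
            this.quantity = quantity;
        }
    }

    // Assumes a fixed, hypothetical schema: <order><item sku="..." qty="..."/>...</order>.
    // Each <item> start tag becomes one Item; nothing else is kept in memory.
    public static List<Item> parse(String xml) throws Exception {
        final List<Item> items = new ArrayList<Item>();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName, String qName, Attributes attrs) {
                        if ("item".equals(qName)) {
                            items.add(new Item(attrs.getValue("sku"), Integer.parseInt(attrs.getValue("qty"))));
                        }
                    }
                });
        return items;
    }

    public static void main(String[] args) throws Exception {
        List<Item> items = parse("<order><item sku=\"A1\" qty=\"3\"/><item sku=\"B2\" qty=\"5\"/></order>");
        int total = 0;
        for (Item item : items) {
            total += item.quantity;
        }
        System.out.println(items.size() + " items, total quantity " + total); // 2 items, total quantity 8
    }
}
```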

XOM vs. javax.xml.parsers

I want to read a simple XML file. I found
Simple way to do Xml in Java
There are also several parsers available; I just wanted to know the advantages of using the XOM parser over Sun's parser.
Any suggestions?
XOM is extremely quick compared to the standard W3C DOM. If that's your priority, there's none better.
However, it's still a DOM-type API, so it's not memory efficient. It's not a replacement for SAX or StAX.
You might want to check this question about the best XML library and its top (XOM) answer; lots of details about advantages of XOM. (Leave a comment if something is unclear; Peter Štibraný seems to know XOM inside and out.)
As mentioned, XOM is very quick and simple in most tasks compared to standard javax.xml. For examples, see this post in a question about the simplest way to read in an XML file in Java. I collected some nice examples that make XOM look pretty good (and javax.xml rather clumsy) there. :-)
So personally I've come to like XOM after evaluating (as you can see in the linked posts); for any new Java project I'd most likely choose XOM for XML handling. The only shortcoming I've found is that it doesn't directly support streaming XML (unlike dom4j where I'm coming from), but with a simple workaround it can stream just fine.
How do you need to access your data?
If it is one-pass, then you don't need to build the tree in memory. You can use SAX (fast, simple) or StAX (faster, not quite so simple).
If you need to keep the tree in memory to navigate, XOM or JDOM are good choices. DOM is the Choice Of Last Resort, whether it is level 1, 2, or 3, with or without extensions.
Xerces, which is the parser included with Java (although you should get the updated version from Apache and not use the one bundled with Java, even in 6.0), also has a streaming native interface called XNI.
If you want to hook other pre-made parts up in the chain, SAX or StAX often work well, since those parts might build their own model in memory. For example, the Saxon XSLT/XQuery engine works with DOM, SAX or StAX, but internally builds a TinyTree (default) or DOM (optional). DataDirect XQuery also works with SAX, StAX or DOM, but really likes StAX.

What Java XML library do you recommend (to replace dom4j)? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I'm looking for something like dom4j, but without dom4j's warts, such as bad or missing documentation and seemingly stalled development status.
Background: I've been using and advocating dom4j, but don't feel completely right about it because I know the library is far from optimal (example: see how methods in the XSLT-related Stylesheet class are documented; what would you pass to run() as the String mode parameter?)
Requirements:
The library should make basic XML handling easier than it is when using pure JDK (javax.xml and org.w3c.dom packages). Things like this:
Read an XML document (from file or String) into an object, easily traverse and manipulate the DOM, do XPath queries and run XSLT against it.
Build an XML document in your Java code, add elements and attributes and data, and finally write the document into a file or String.
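For comparison, here is a sketch of those two requirements using only the JDK (javax.xml and org.w3c.dom): parse a String into a DOM, run an XPath query, add an element, and write the result back to a String. The document content and the `PlainJdkXml` class are made up. Most of the code is factory and cast boilerplate, which is what libraries like dom4j and XOM try to remove. Serialization here uses an identity `Transformer`. Running a real XSLT stylesheet would go through `TransformerFactory.newTransformer(Source)` instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PlainJdkXml {
    // Reads a document from a String into an org.w3c.dom tree.
    public static Document read(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Runs an XPath query and returns the number of matching nodes.
    public static int count(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    // Writes the tree back to a String using an identity Transformer (no stylesheet).
    public static String write(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = read("<books><book id=\"1\"/></books>");

        // Build: add a new element with an attribute and text content.
        Element book = doc.createElement("book");
        book.setAttribute("id", "2");
        book.setTextContent("Second");
        doc.getDocumentElement().appendChild(book);

        System.out.println(count(doc, "/books/book")); // 2
        System.out.println(write(doc));
    }
}
```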
I really like what dom4j promises, actually: "easy to use, open source library for working with XML, XPath and XSLT [...] with full support for DOM, SAX and JAXP." And upcoming dom4j 2.0 does claim to fix everything: fully utilise Java 5 and add missing documentation. But unfortunately, if you look closer:
Warning: dom4j 2.0 is in pre-alpha stage. It is likely it can't be compiled. In case it can be compiled at random it is likely it can't run. In case it runs occasionally it can explode suddenly. If you want to use dom4j, you want version 1.6.1. Really.
...and the website has said that for a long time. So is there a good alternative to dom4j? Please provide some justification for your preferred library, instead of just dumping names and links. :-)
Sure, XOM :-)
XOM is designed to be easy to learn and easy to use. It works very straight-forwardly, and has a very shallow learning curve. Assuming you're already familiar with XML, you should be able to get up and running with XOM very quickly.
I have been using XOM for several years now, and I still like it very much. It's easy to use, there's plenty of documentation and articles on the web, and the API doesn't change between releases. 1.2 was released recently.
XOM is the only XML API that makes no compromises on correctness. XOM only accepts namespace well-formed XML documents, and only allows you to create namespace well-formed XML documents. (In fact, it's a little stricter than that: it actually guarantees that all documents are round-trippable and have well-defined XML infosets.) XOM manages your XML so you don't have to. With XOM, you can focus on the unique value of your application, and trust XOM to get the XML right.
Check out the web page http://www.xom.nu/ for the FAQ, cookbook, design rationale, etc. If only everything were designed with so much love :-)
The author also wrote about What's Wrong with XML APIs (and how to fix them). (Basically, the reasons why XOM exists in the first place.)
There is also a 5-part Artima interview with the author about XOM, where they talk about what's wrong with XML APIs, The Good, the Bad, and the DOM, A Design Review of JDOM, Lessons Learned from JDOM and finally Design Principles and XOM.
The one built into the JDK ... with a few additions.
Yes, it's painful to use: it is modeled after W3C specs that were clearly designed by committee. However, it is available anywhere, and if you settle on it you don't run into the "I like Dom4J," "I like JDOM," "I like StringBuffer" arguments that come from third-party libraries. Especially since such arguments can turn into different pieces of code using different libraries ...
However, as I said, I do enhance slightly: the Practical XML library is a collection of utility classes that make it easier to work with the DOM. Other than the XPath wrapper, there's nothing complex here, just a bunch of routines that I found myself rewriting for every job.
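To illustrate the kind of routine meant here, below is a hypothetical helper. It is NOT part of the Practical XML API. It returns the direct child elements with a given name. This gets rewritten often because `getElementsByTagName` searches all descendants rather than just children, and `NodeList` is not `Iterable`:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helpers; these are NOT the Practical XML library's API.
public final class DomUtil {
    private DomUtil() {}

    // Returns the direct child elements of parent with the given tag name.
    // getChildNodes() also contains text and comment nodes, so those are filtered out.
    public static List<Element> children(Element parent, String tagName) {
        List<Element> result = new ArrayList<Element>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE && tagName.equals(node.getNodeName())) {
                result.add((Element) node);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader("<r>\n  <a>1</a>\n  <b/>\n  <a>2</a>\n</r>")))
                .getDocumentElement();
        for (Element a : children(root, "a")) {
            System.out.println(a.getTextContent()); // prints 1, then 2
        }
    }
}
```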
I've been using XMLTool to replace dom4j, and it's working pretty well.
XMLTool uses the fluent interface pattern to facilitate XML manipulations:
```java
import com.mycila.xmltool.XMLDoc;
import com.mycila.xmltool.XMLTag;

XMLTag tag = XMLDoc.newDocument(false)
        .addDefaultNamespace("http://www.w3.org/2002/06/xhtml2/")
        .addNamespace("wicket", "http://wicket.sourceforge.net/wicket-1.0")
        .addRoot("html")
        .addTag("wicket:border")
        .gotoRoot().addTag("head")
        .addNamespace("other", "http://other-ns.com")
        .gotoRoot().addTag("other:foo");
System.out.println(tag.toString());
```
It's made for Java 5, and it's easy to create an iterable object over selected elements:
```java
for (XMLTag xmlTag : tag.getChilds()) {
    System.out.println(xmlTag.getCurrentTagName());
}
```
I've always liked JDOM. It was written to be more intuitive than DOM parsing (and SAX parsing always seems clumsy anyway).
From the mission statement:
There is no compelling reason for a Java API to manipulate XML to be complex, tricky, unintuitive, or a pain in the neck. JDOM™ is both Java-centric and Java-optimized. It behaves like Java, it uses Java collections, it is completely natural API for current Java developers, and it provides a low-cost entry point for using XML.
That's pretty much been my experience - fairly intuitive navigation of node trees.
I use XStream. It's a simple library to serialize objects to XML and back again.
It can be annotation-driven (like JAXB), but it has a very simple and easy-to-use API, and you can even generate JSON.
I'll add to the built-in answer by @kdgregory by asking: why not JAXB?
With a few annotations it's pretty easy to model most XML documents. I mean, you're probably going to parse the stuff and put it in an object, right?
JAXB 2.0 is built into JDK 1.6, and unlike many other built-in javax libraries this one is pretty good (Kohsuke worked on it, so you know it's good).
In a recent project I had to do some XML parsing, and ended up using Simple Framework, recommended by a colleague.
I was quite happy with it in the end. It uses an annotation-based approach of mapping XML elements and attributes to Java classes and fields.
```xml
<example>
  <a>
    <b>
      <x>foo</x>
    </b>
    <b>
      <y>bar</y>
    </b>
  </a>
</example>
```
Corresponding Java code:
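```java
import org.simpleframework.xml.Element;
import org.simpleframework.xml.Path;
import org.simpleframework.xml.Root;

@Root
public class Example {

    @Path("a/b[1]")
    @Element
    private String x;

    @Path("a/b[2]")
    @Element
    private String y;
}
```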
It's all quite different from dom4j or XOM. You avoid writing silly, boilerplatey XML handling code, but at first you'll probably bang your head against a wall for a while trying to get the annotations right.
(It was me who asked this question 4 years ago. While XOM seems a decent and quite popular dom4j replacement, I haven't come to fully embrace it. Curious that no-one had mentioned Simple Framework here. I decided to fix that, as I'd probably use it again.)
In our project we are using http://www.castor.org/, but just for small XML files. It's really easy to learn: it needs just a mapping XML file (or none if the XML tags match the class attributes exactly), and that's it. It supports listeners (like callbacks) to perform additional processing. The downside: it is not a Java EE standard like JAXB.
You can try JAXB (Java Architecture for XML Binding); with annotations it's very handy and simple to use.
I'm sometimes using Jericho, which is primarily HTML parser, but can parse any XML-like structure.
Of course it is only suitable for the simplest XML operations, such as finding tags with a given name, iterating through the structure, and replacing tags and their attributes, but aren't those the most common use cases?
For building XML documents, I suggest xmlenc. It is used in Cassandra.
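xmlenc's own API isn't shown here. As a JDK-only sketch of the same streaming-output idea, `javax.xml.stream.XMLStreamWriter` writes elements in document order without keeping a tree in memory, and it escapes text for you. The `<people>`/`<person>` format and the `StreamWrite` class are hypothetical:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StreamWrite {
    // Writes one <person> element per name without building a document tree.
    // writeCharacters() escapes special characters such as '&' and '<'.
    public static String build(String[] names) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartDocument();
        w.writeStartElement("people");
        for (int i = 0; i < names.length; i++) {
            w.writeStartElement("person");
            w.writeAttribute("id", String.valueOf(i + 1));
            w.writeCharacters(names[i]);
            w.writeEndElement();
        }
        w.writeEndElement();
        w.writeEndDocument();
        w.close(); // releases the writer; the underlying StringWriter stays open
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(build(new String[] {"Ann", "Tom & Jerry"}));
    }
}
```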
