Does Java have a built in XML library for generating and parsing documents? If not, which third party one should I be using?
The Sun Java Runtime comes with the Xerces and Xalan implementations, which provide the ability to parse XML (via the DOM and SAX interfaces), perform XSL transformations, and execute XPath queries.
However, it is better to use the JAXP API to work with XML, since JAXP lets you ignore the underlying implementation (Xerces, Crimson, or any other). When you use JAXP, the JRE uses whichever service provider it can locate at runtime to perform the needed operations. As indicated previously, Xerces/Xalan will be used since they ship with the Sun JRE (though not necessarily with others), so you don't have to download and install a specific provider (say, a different version of Xerces, or Crimson).
A basic JAXP tutorial can be found in The J2EE 1.4 Tutorial (it's from the J2EE tutorial, but it will help).
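As a rough illustration of coding against JAXP rather than a specific parser, here is a minimal sketch that parses a string into a DOM tree. The class and method names (`JaxpDomExample`, `rootElementName`) are made up for this example; only the `javax.xml.parsers` and `org.w3c.dom` calls are standard API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class; not part of any library.
class JaxpDomExample {

    /** Parses an XML string through JAXP and returns the root element's name. */
    static String rootElementName(String xml) {
        try {
            // newInstance() picks whichever provider JAXP can locate at runtime.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNodeName();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootElementName("<greeting>hello</greeting>")); // prints "greeting"
    }
}
```

Nothing in this code names Xerces or any other implementation, which is the point of going through the factory.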
Do note that the Xerces/Xalan implementations provided by the Sun JRE will not be found in the org.apache.xerces.* or org.apache.xalan.* packages. Instead, they are present in the internal com.sun.org.apache.xerces.* and com.sun.org.apache.xalan.* packages.
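If you are curious which provider JAXP actually picked, one way to check is to print the class names of the factories it returns. This is only a sketch; `ProviderNames` and its methods are hypothetical names. On a Sun-derived JRE with no other provider on the classpath, the names would be expected to fall under the internal com.sun.org.apache packages mentioned above, but the result depends on your runtime.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical example class; not part of any library.
class ProviderNames {

    /** Class name of the DOM factory implementation that JAXP located. */
    static String domFactoryClass() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    /** Class name of the SAX factory implementation that JAXP located. */
    static String saxFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    /** Class name of the XSLT factory implementation that JAXP located. */
    static String transformerFactoryClass() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("DOM:  " + domFactoryClass());
        System.out.println("SAX:  " + saxFactoryClass());
        System.out.println("XSLT: " + transformerFactoryClass());
    }
}
```

Application code should not import these internal classes directly; this is for inspection only.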
By the way, JDOM is not an XML parser - it will use the parser provided to it by JAXP in order to provide you with an easier abstraction to work with.
Yes. It has two options in the javax.xml package: DOM builds documents in memory, and SAX is an event-based approach.
You may also want to look at JDOM, which is a 3rd party library that offers a combination of the two, and can be easier to use.
Yes. Java contains the javax.xml library. You can check out some samples at Sun's Java API for XML Code Samples.
However, I personally like using the JDOM library.
The javax.xml package contains Java's native XML solution, which is actually a special version of Xerces. You can do what you asked with it; however, using third-party libraries such as JDOM makes the whole process a lot easier.
Have a look at JAX-B. This is increasingly the "standard" way to do XML processing. It uses Java annotations to simplify the programming model. The reference gives sample code for reading and writing XML.
Java does come with a large set of packages and classes to handle XML. These are part of the Standard Edition JDK, and located under the javax.xml package.
Aside from reading XML and writing it with DOM or SAX, these packages also perform XSL transformations, JAX-B object marshalling and unmarshalling, XPath processing and web services SOAP handling. I advise you to read more about these online in Sun's excellent tutorials.
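To give a feel for one of these packages, here is a small sketch of XPath processing with `javax.xml.xpath`. The class name `XPathExample` and the sample document are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of any library.
class XPathExample {

    /** Evaluates an XPath expression against an XML string and returns the result as text. */
    static String evaluate(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // This overload parses the input and returns the string value of the result.
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<book><title>XML</title><year>2009</year></book>";
        System.out.println(evaluate(xml, "/book/title")); // prints "XML"
    }
}
```

For repeated queries against the same document, it is usually better to parse once into a DOM and evaluate expressions against that node instead of re-parsing the input each time.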
I can't tell you which one to use (few requirements specified, and there are a dozen libraries), but I would seriously consider XOM (here). Written by Elliotte Rusty Harold, it is quite complete in terms of the XML spec, and generally excellent. I have found it very easy to use. See the link above for Harold's motivation and criticism of other solutions.
You could have a look at the javax.xml package, which contains everything you need to work with XML documents in Java...
The Java API for XML Processing (JAXP) is part of the Java SE standard library. JAXP allows you to code against a standard interface and lets you pick the parser implementation later if needed.
The Java API for XML Processing, or JAXP for short, enables applications to parse and transform XML documents using an API that is independent of a particular XML processor implementation. JAXP also provides a pluggability feature which enables applications to easily switch between particular XML processor implementations.
You can use StAX (streaming API for XML)
http://en.wikipedia.org/wiki/StAX
http://www.xml.com/pub/a/2003/09/17/stax.html
https://sjsxp.dev.java.net/
StAX is optimized to process large XML files without causing OOM (out of memory) problems :)
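A minimal sketch of the pull style, assuming the StAX implementation bundled with Java 6 or later: the reader hands you one event at a time, so the whole document never needs to sit in memory. `StaxCountExample` and `countElements` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; not part of any library.
class StaxCountExample {

    /** Counts elements with the given local name while streaming through the input. */
    static int countElements(String xml, String name) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            int count = 0;
            try {
                // Pull events one by one; only the current event is held in memory.
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && name.equals(reader.getLocalName())) {
                        count++;
                    }
                }
            } finally {
                reader.close();
            }
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not read XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements("<items><item/><item/><other/></items>", "item")); // prints 2
    }
}
```

For a genuinely large file you would pass a `FileInputStream` or `FileReader` to `createXMLStreamReader` instead of a `StringReader`.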
As is said above... Java's SDK now comes with Xerces and Xalan. Xalan only implements XSLT 1.0, so if you want XSLT 2.0, you should look at Saxon from Michael Kay.
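For reference, an XSLT 1.0 transformation through the standard `javax.xml.transform` API might look like the sketch below. The class name `XsltExample` and the sample stylesheet are invented for illustration; the same code should work with another JAXP-compliant processor on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class; not part of any library.
class XsltExample {

    /** Applies an XSLT stylesheet to an XML string and returns the output. */
    static String transform(String xml, String stylesheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // A version="1.0" stylesheet with text output, so no XML declaration is emitted.
        String stylesheet =
                "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
              + "<xsl:output method=\"text\"/>"
              + "<xsl:template match=\"/\">Hello, <xsl:value-of select=\"/person/name\"/>!</xsl:template>"
              + "</xsl:stylesheet>";
        System.out.println(transform("<person><name>Ada</name></person>", stylesheet));
    }
}
```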
What is the difference between them? It is said that JAXP is only an API specification and that JDOM and DOM4J implement it; is that right? And do all of them need an XML parser, such as Xerces?
Thanks in advance!
JAXP (JSR-206)
Is a set of standard APIs for Java XML parsers. It covers the following areas:
DOM (org.w3c.dom package)
SAX (org.xml.sax package)
StAX/JSR-173 (javax.xml.stream)
XSLT (javax.xml.transform)
XPath (javax.xml.xpath)
Validation (javax.xml.validation)
Datatypes (javax.xml.datatype)
This standard was created by an expert group with representatives from many companies and individuals. Because it is a standard, there are multiple implementations (Xerces implements JAXP), and it can be included in the JDK.
Xerces
Is an open source Java XML parser that provides DOM and SAX implementations that are compliant with the JAXP standard.
JDOM and DOM4J
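Are open source Java XML libraries that provide their own tree APIs; they rely on an underlying parser (such as Xerces) to read the document.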
You're comparing apples and automobiles.
JAXP is an API that is now bundled with the JDK
JDOM is a different API, but also a library
DOM4J is also a different API and library
Xerces is an XML parser implemented in Java. A version of Xerces is also bundled in the JDK.
Which API you use is largely a question of personal preference. I like JDOM in part because I'm used to working with it. There are, similarly, several implementations of XML parsers. If you're programming in Java using a recent JDK, you will be able to use JAXP without having to add external libraries.
I need to use an XML pull parser. I can find stax-api.jar, which seems to already be part of com.sun.xml.*, and it seems that there is already something StAX-related implemented.
Unfortunately, com.sun.xml has no sources in JDK 6, so I can't tell.
There are also xmlpull, stax.codehaus.org and Apache Axiom, which more or less implement stax-api. stax.codehaus.org seems to be a StAX reference implementation. Xmlpull seems to be done by the same people as the reference implementation, and Apache Axiom seems to be a StAX-based parser that was created for Apache Axis2.
Could you please clarify what the main differences are, which API to use, and when and why you would use one of these implementations?
Edit: Before you decide to close this question, notice that the xmlpull.org and stax.codehaus.org releases are pretty old (5 years) and one really can't say whether the StAX parser implementation is part of com.sun.xml.*.
I'd just need someone with pull parser experience to tell me, what to use and why.
For instance, Apache Abdera project (I'm parsing atom feeds too) is using Axiom implementation that seems to be implementing its Axiom-api and also geronimo-stax-api_1.0_spec
Aside from pointing out that the JDK/JRE bundles Sun's SJSXP, which works OK at this point, I would recommend AGAINST using the Stax ref impl (stax.codehaus.org) -- do NOT use it for anything, ever. It has lots of remaining bugs (although many were fixed, initial versions were horrible), isn't particularly fast, and doesn't even implement all mandatory features. Stay clear of it.
I am partial to Woodstox, which is by far the most complete implementation for XML features (on par with Xerces, about the only other Java XML parser that can say this), more performant than Sjsxp, and all around solid parser and generator -- this is why most modern Java XML web service frameworks and containers bundle Woodstox.
Or, if you want super-high performance, check out Aalto. It is the successor to Woodstox, with fewer features (no DTD handling) but 2x faster for many common cases.
And if you ever need non-blocking/async parsing (for NIO based input for example), Aalto is the only known Java XML parser to offer that feature.
As to Axiom: it is NOT a parser, but a tree model built on top of a Stax parser like Woodstox, so they didn't reinvent the wheel. XmlPull predates the Stax API by a couple of years; basically, Stax standardization came about from people using XmlPull, liking what they saw, and Sun+BEA wanting to standardize the approach. There was some friction in the process, so in the end XmlPull was not discontinued when Stax was finalized, but one can think of Stax as its successor -- XmlPull is still used for mobile devices; I think the Android platform includes it.
(disclaimers: I am involved in both Aalto and Woodstox projects; as well as provided more than a dozen bug fixes to both SJSXP and Stax RI)
As of Java 1.6, there is a StAX implementation inside the plain bundled JRE. You can use that. If you don't like the performance, drop in Woodstox.
Axiom is something else entirely, much more complex. Xmlpull seems to be going by the wayside in favor of one Stax implementation or another.
I've been using JDOM for general XML parsing for a long time, but get the feeling that there must be something better, or at least more lightweight, for Java 5 or 6.
There's nothing wrong with the JDOM API, but I don't like having to include Xerces with my deployments. Is there a more lightweight alternative, if all I want to do is read in an XML file or write one out?
The best lightweight alternative is, in my opinion, XOM, but JDOM is still a very good API, and I see no reason to replace it.
It doesn't have a dependency on Xerces, though (at least, it doesn't need the Apache Xerces distro, it works alongside the Xerces that's packaged into the JRE).
I've used the javax.xml.stream package (XMLStreamReader/XMLStreamWriter) to read and write XML using xml pull/push techniques. It's worked for me so far.
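For the writing side, a minimal `XMLStreamWriter` sketch could look like this. The class name `StaxWriteExample` and the `note` element are made up for illustration; the writer takes care of escaping characters such as `<` in text content.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class; not part of any library.
class StaxWriteExample {

    /** Writes a single <note> element with a lang attribute and escaped text content. */
    static String writeNote(String lang, String text) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("note");
            writer.writeAttribute("lang", lang);
            writer.writeCharacters(text);   // special characters are escaped by the writer
            writer.writeEndElement();
            writer.writeEndDocument();      // closes anything still open
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeNote("en", "a < b"));
    }
}
```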
We use JAXB - it generates the classes based on the schema. You can generate your files without a schema, and just annotate how you want the xml to be.
There was recently a fork of JDOM for java 5 called coffeeDOM. You should check it out.
You should check out Commons Digester (see the answer I've given here). It provides a very lightweight mechanism for parsing XML.
JDOM is very good and simple. Many new ways to parse XML have appeared since JDOM was released, but they focus on things other than simplicity. JAXB makes things simple in cases where you have a well-known XML document and your schema does not get updated on a daily basis.
New push parsers are very good, and even mandatory for very large XML files (hundreds of MBs).
The speed benefit of a SAX parser can be tenfold.
Use one of the XML APIs that come standard with Java, so that you don't have to include any third-party libraries.
XML in the Java Platform Standard Edition (Java SE) 6
I would like to think JAXP is a good choice for you.
It's standard, included in the JDK, provides a clear interface, and allows you to hook up any implementation.
If all you need is to read and write XML files that are not very large or overcomplicated, the JAXP DOM API embedded in the JDK will cover your requirements.
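Reading with the JAXP DOM API goes through `DocumentBuilder.parse`; writing is less obvious, so here is a sketch of one common approach: build the tree, then serialize it with an identity `Transformer`. `DomWriteExample` and the `config`/`name` elements are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example class; not part of any library.
class DomWriteExample {

    /** Builds <config><name>...</name></config> in memory and serializes it to a string. */
    static String buildConfigXml(String nameValue) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("config");
            doc.appendChild(root);
            Element name = doc.createElement("name");
            name.setTextContent(nameValue);
            root.appendChild(name);

            // A Transformer created without a stylesheet copies its input to its output.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildConfigXml("demo"));
    }
}
```

To write to a file instead, pass `new StreamResult(new File(...))` with a path of your choosing.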
I want to read a simple XML file. I found
Simple way to do Xml in Java
There are also several parsers available. I just wanted to find out what the advantages of using the XOM parser over Sun's parser are.
Any suggestions?
XOM is extremely quick compared to the standard W3C DOM. If that's your priority, there's none better.
However, it's still a DOM-type API, and so it's not memory efficient. It's not a replacement for SAX or StAX.
You might want to check this question about the best XML library and its top (XOM) answer; lots of details about advantages of XOM. (Leave a comment if something is unclear; Peter Štibraný seems to know XOM inside and out.)
As mentioned, XOM is very quick and simple in most tasks compared to standard javax.xml. For examples, see this post in a question about the simplest way to read in an XML file in Java. I collected some nice examples that make XOM look pretty good (and javax.xml rather clumsy) there. :-)
So personally I've come to like XOM after evaluating (as you can see in the linked posts); for any new Java project I'd most likely choose XOM for XML handling. The only shortcoming I've found is that it doesn't directly support streaming XML (unlike dom4j where I'm coming from), but with a simple workaround it can stream just fine.
How do you need to access your data?
If it is one-pass, then you don't need to build the tree in memory. You can use SAX (fast, simple) or StAX (faster, not quite so simple).
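As a sketch of the one-pass case with SAX: the parser pushes events into a handler and no tree is built. `SaxOnePassExample` and `elementNames` are hypothetical names; the handler here just records element names in document order.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; not part of any library.
class SaxOnePassExample {

    /** Returns the qualified names of all elements, in the order their start tags appear. */
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // Called once per start tag as the parser streams through the input.
                names.add(qName);
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<a><b/><c/></a>")); // prints [a, b, c]
    }
}
```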
If you need to keep the tree in memory to navigate, XOM or JDOM are good choices. DOM is the Choice Of Last Resort, whether it is level 1, 2, or 3, with or without extensions.
Xerces, which is the parser included with Java (although you should get the updated version from Apache and not use the one bundled with Java, even in 6.0), also has a streaming native interface called XNI.
If you want to hook other pre-made parts up in the chain, often SAX or StAX work well, since they might build their own model in memory. For example, the Saxon XSLT/XQuery engine works with DOM, SAX or StAX, but builds internally a TinyTree (default) or DOM (optional). DataDirect XQuery works with SAX, StAX or DOM also, but really likes StAX.
I need to read an XML file using Java. Its contents are something like
<ReadingFile>
<csvFile>
<fileName>C:/Input.csv</fileName>
<delimiter>COMMA</delimiter>
<tableFieldNamesList>COMPANYNAME|PRODUCTNAME|PRICE</tableFieldNamesList>
<fieldProcessorDescriptorSize>20|20|20</fieldProcessorDescriptorSize>
<fieldName>company_name|product_name|price</fieldName>
</csvFile>
</ReadingFile>
Is there any special reader/JARs or should we read using FileInputStream?
Check out Java's JAXP APIs which come as standard. You can read the XML in from the file into a DOM (object model), or as SAX - a series of events (your code will receive an event for each start-of-element, end-of-element etc.). For both DOM and SAX, I would look at an API tutorial to get started.
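For a file shaped like the one in the question, a DOM-based sketch might look like the following. `ReadingFileExample` and `tableFieldNames` are hypothetical names, and the example parses a string to stay self-contained; for a real file you would call `builder.parse(new File(...))` with your own path instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class; not part of any library.
class ReadingFileExample {

    /** Reads the pipe-separated <tableFieldNamesList> value and splits it into field names. */
    static List<String> tableFieldNames(String xml) {
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        NodeList nodes = doc.getElementsByTagName("tableFieldNamesList");
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("No <tableFieldNamesList> element found");
        }
        // split() takes a regex, so the pipe character must be escaped.
        return Arrays.asList(nodes.item(0).getTextContent().trim().split("\\|"));
    }

    public static void main(String[] args) {
        String xml = "<ReadingFile><csvFile>"
                + "<fileName>C:/Input.csv</fileName>"
                + "<tableFieldNamesList>COMPANYNAME|PRODUCTNAME|PRICE</tableFieldNamesList>"
                + "</csvFile></ReadingFile>";
        System.out.println(tableFieldNames(xml)); // prints [COMPANYNAME, PRODUCTNAME, PRICE]
    }
}
```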
Alternatively, you may find JDOM easier/more intuitive to use.
Another suggestion: Try out Commons digester. This allows you to develop parsing code very quickly using a rule-based approach. There's a tutorial here and the library is available here
I also agree with Brian and Alzoid in that JAXB is great to get you up and running quickly. You can use the xjc binding compiler that ships with the JDK to auto generate your Java classes given an XML schema.
XStream would do very nicely here. Check out the one-page tutorial.
You can use external libraries like
Castor https://web.archive.org/web/1/http://articles.techrepublic%2ecom%2ecom/5100-10878_11-1046622.html
I have used Castor in the past. Here are a few other links that might help.
http://www.xml-training-guide.com/e-xml27.html
http://java.sun.com/j2se/1.4.2/docs/api/org/xml/sax/XMLReader.html
http://www.cafeconleche.org/books/xmljava/chapters/ch07.html
There are two major ways to parse XML with Java. The first is to use a SAX parser (see here), which is fairly simple.
The second option is to use a DOM parser (see here), which is more complicated but gives you more control.
JAXB is another technology that might suit your needs.