How to read an XML file with Java?


I don't need to read complex XML files. I just want to read the following configuration file with the simplest possible XML reader:
<config>
<db-host>localhost</db-host>
<db-port>3306</db-port>
<db-username>root</db-username>
<db-password>root</db-password>
<db-name>cash</db-name>
</config>
How do I read the above XML file with an XML reader in Java?

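I like JDOM (these are the JDOM 2 package names; JDOM 1.x uses org.jdom instead of org.jdom2):
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.input.SAXBuilder;
SAXBuilder parser = new SAXBuilder();
Document docConfig = parser.build("config.xml"); // throws JDOMException, IOException
Element elConfig = docConfig.getRootElement();
String host = elConfig.getChildText("db-host"); // the name must match the element exactly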

Since you want to parse config files, I think Apache Commons Configuration would be the best solution.
Commons Configuration provides a generic configuration interface which enables a Java application to read configuration data from a variety of sources (including XML).

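You could use a simple DOM parser to read the XML representation:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document dom = db.parse("config.xml");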
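Building on the DOM snippet above, here is a minimal self-contained sketch that reads every value from the question's <config> file. To keep it runnable without a file on disk, it parses the XML from a string; the class name DomConfigReader and the method readConfig are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomConfigReader {

    // The configuration file from the question, inlined so the sketch runs anywhere.
    static final String SAMPLE =
        "<config>"
        + "<db-host>localhost</db-host>"
        + "<db-port>3306</db-port>"
        + "<db-username>root</db-username>"
        + "<db-password>root</db-password>"
        + "<db-name>cash</db-name>"
        + "</config>";

    // Parses the XML into a DOM tree and copies every child element of the root
    // into a map of element name -> trimmed text content.
    static Map<String, String> readConfig(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Map<String, String> values = new LinkedHashMap<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip whitespace text nodes; only element children hold settings.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                Element e = (Element) child;
                values.put(e.getTagName(), e.getTextContent().trim());
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> cfg = readConfig(SAMPLE);
        String host = cfg.get("db-host");
        int port = Integer.parseInt(cfg.get("db-port"));
        System.out.println(host + ":" + port);
    }
}
```

To read the real file instead, pass new File("config.xml") to parse(...) in place of the InputSource.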

If you just need a simple solution that's included with the Java SDK (since 5.0), check out the XPath package. I'm sure others perform better, but this was all I needed. Here's an example:
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;
...
try {
XPath xpath = XPathFactory.newInstance().newXPath();
InputSource inputSource = new InputSource("strings.xml");
// result will equal "Save My Changes" (see XML below)
String result = xpath.evaluate("//string", inputSource);
}
catch(XPathExpressionException e) {
// do something
}
strings.xml
<?xml version="1.0" encoding="utf-8"?>
<resources>
<string name="saveLabel">Save My Changes</string>
</resources>
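The same XPath.evaluate(String, InputSource) call also works for the question's config file. A minimal sketch, assuming the XML is available as a string (use new InputSource("config.xml") to read the file instead); the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathConfigReader {

    static final String SAMPLE =
        "<config><db-host>localhost</db-host><db-port>3306</db-port>"
        + "<db-username>root</db-username><db-password>root</db-password>"
        + "<db-name>cash</db-name></config>";

    // Evaluates an XPath expression against the XML text and returns its string value
    // (an empty string if nothing matches). Each call re-parses the document, which is
    // fine for a handful of settings; for many lookups, parse once into a DOM Document.
    static String value(String xml, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        String host = value(SAMPLE, "/config/db-host");
        int port = Integer.parseInt(value(SAMPLE, "/config/db-port"));
        System.out.println(host + ":" + port);
    }
}
```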

There are several XML parsers for Java. One I've used and found particularly developer friendly is JDOM. And by developer friendly, I mean "java oriented" (i.e., you work with objects in your program), instead of "document oriented", as some other tools are.

I would recommend Commons Digester, which allows you to parse a file without writing reams of code. It uses a series of rules to determine what action it should perform when encountering a given element or attribute (a typical rule might be to create a particular business object).

For a similar use case in my application I used JAXB. With JAXB, reading XML files is like interacting with Java POJOs. To use JAXB you need either an XSD for this XML file (from which the classes are generated) or your own classes annotated by hand. You can look for more info here

If you want to be able to read and write objects to XML directly, you can use XStream.

Although I haven't tried XPath yet (it has only just come to my attention), I have tried a few solutions and haven't found anything that works for this scenario.
I decided to make a library that fulfills this need, as long as you are working under the assumptions mentioned in the README. Its advantage is that it uses SAX to parse the entire file and returns it to the user as a map, so you can look up values as key -> value.
https://github.com/daaso-consultancy/ConciseXMLParser
If something is missing, kindly let me know, as I only develop it based on my own needs and those of others.
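The library's own API isn't shown here, but the general idea it describes (one SAX pass that collects element text into a key -> value map) can be sketched with the JDK alone. This is an illustration assuming a flat file like the question's, not ConciseXMLParser's implementation; SaxMapReader is a hypothetical name.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxMapReader {

    // Collects the text of each element into a map keyed by element name.
    // Assumes a flat structure: element names are unique and leaf elements hold the values.
    static Map<String, String> read(String xml) throws Exception {
        Map<String, String> values = new HashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                text.setLength(0); // start collecting fresh text for this element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per element, so append rather than assign.
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String value = text.toString().trim();
                if (!value.isEmpty()) {
                    values.put(qName, value); // container elements have no own text and are skipped
                }
                text.setLength(0);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config><db-host>localhost</db-host><db-port>3306</db-port></config>";
        System.out.println(read(xml).get("db-host"));
    }
}
```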

Related

Which is the best way to fetch values from XML: JAXB or DOM?

Which is the more efficient way to read XML? I'm aware of two ways:
1) JAXB:
By annotating my classes with JAXB annotations, I can convert XML into Java objects and vice versa by unmarshalling and marshalling.
2) DOM:
Using a DOM parser to parse the XML, after which values can be accessed with XPath.
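Example of DOM:
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
File fXmlFile = new File("/Users/link1/input.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
doc.getDocumentElement().normalize();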
Given the business requirements, I'd like to use whichever of the two is faster and better. Suggestions and a few tactics would be appreciated.

First question to ask: does your XML always have the same structure, and can that structure be mapped onto a hierarchy of Java objects?
1. If yes -> use either JAXB or Jackson's XmlMapper.
2. If no (the structure of your XML varies) -> do you require random access to the data in your XML, with many reads and possibly some writes (after which you convert the data back to XML)?
2.1. If yes -> use DOM (it is designed for in-memory handling of the XML document tree, but has more overhead).
2.2. If no (more efficient XML parsing) -> do you need to parse all the information in the XML, or do you need XML validation?
2.2.1. If yes -> use SAX (it is included in the JDK and allows for validation).
2.2.2. If no -> use StAX (an XML pull parser that lets you read some values without parsing the full XML, but it does not offer validation).
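To illustrate the StAX branch (2.2.2): a pull parser lets the code stop as soon as it has the value it needs. A minimal sketch using the JDK's XMLStreamReader; the class name and the use of the earlier config elements are only for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxFirstValue {

    // Returns the text of the first element named `name`, or null if there is none.
    // The loop returns as soon as the element is found, so later events are never pulled.
    static String firstValue(String xml, String name) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(name)) {
                    // getElementText() reads text-only content up to the matching END_ELEMENT.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<config><db-host>localhost</db-host><db-port>3306</db-port></config>";
        System.out.println(firstValue(xml, "db-port"));
    }
}
```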

XSLT: Getting URI of a declared entity

I have an input XML that has entities declared in it. It looks something like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doctype PUBLIC "desc" "DTD.dtd" [
<!ENTITY SLSD_68115_jpg SYSTEM "68115.jpg" NDATA JPEG>
]>
The DTD.dtd file contains the necessary notation:
<!NOTATION JPEG SYSTEM "JPG" >
During XSLT transformation I would like to get the URI declared in the entity using the name 'SLSD_68115_jpg' like so:
<xsl:value-of select="unparsed-entity-uri('SLSD_68115_jpg')"/>
so that it returns something like "68115.jpg".
The problem is that it always returns an empty string. There is no way for me to modify the input XML. From what I found on the internet this seems to be a common problem, but I haven't found any final conclusions, solutions or alternatives.
It might be important to note that I had a problem before: since I am using a StreamSource, things like the systemId had to be set manually, and I think this is where the problem might be hidden. It's as if the transformer is unable to resolve the entity with the given ID.
I'm using Xalan. I probably need to provide more details, but I'm not sure what to add; I'll answer any questions if there are any.
Any help would be greatly appreciated.

I found out why unparsed-entity-uri was unable to resolve the declared entities. This might be a special case, but I will post the solution in case it saves someone else a lot of time.
I'm (very) new to XSLT. The XSL file I had to work with as a student, however, was pretty extreme, with multiple import statements and files containing more than 5K lines of code.
By the time processing reached the point where I needed the entities, the transformer was working on a different document, essentially a sub-document of the original one. That is fine in itself, but the entity declarations are not passed on to the sub-document, so there is no way for me to use the entities from that point onward.
As I said, I'm new to XSLT, but I think that lines like this one can cause the problem:
<xsl:apply-templates select="exslt:node-set($nodelist)"/>
because after this, entity references no longer resolve.
If this was trivial, my apologies for wasting your time.
Thanks to everyone nonetheless!

Instead of a StreamSource, try a SAXSource configured with a validating parser:
import java.io.File;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setValidating(true);
spf.setNamespaceAware(true);
XMLReader xmlr = spf.newSAXParser().getXMLReader();
InputSource input = new InputSource(
new File("/path/to/file.xml").toURI().toString());
// if you already have an InputStream/Reader then do
// input.setByteStream or input.setCharacterStream as appropriate
SAXSource source = new SAXSource(xmlr, input);
Or you can use a DOMSource in the same way:
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.Document;
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(true);
dbf.setNamespaceAware(true);
File f = new File("/path/to/file.xml");
Document doc = dbf.newDocumentBuilder().parse(f);
DOMSource source = new DOMSource(doc, f.toURI().toString());
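To check whether unparsed-entity-uri() sees the declarations, the SAXSource can be handed straight to a Transformer. A minimal sketch under some assumptions: the entity and the notation are both declared in the internal subset (the question's notation lives in an external DTD.dtd, which isn't available here), the stylesheet is inlined, and it runs on the JDK's built-in XSLT processor rather than a separately installed Xalan. The returned URI is typically resolved to an absolute form, so it may not be exactly "68115.jpg".

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class UnparsedEntityDemo {

    // Input document: notation and unparsed entity declared in the internal subset.
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<!DOCTYPE doc ["
        + "<!NOTATION JPEG SYSTEM \"JPG\">"
        + "<!ENTITY SLSD_68115_jpg SYSTEM \"68115.jpg\" NDATA JPEG>"
        + "<!ELEMENT doc EMPTY>"
        + "]><doc/>";

    // Stylesheet that just prints the URI of the named unparsed entity.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:value-of select=\"unparsed-entity-uri('SLSD_68115_jpg')\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String run() throws Exception {
        // Validating, namespace-aware SAX parser, as suggested in the answer.
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setValidating(true);
        spf.setNamespaceAware(true);
        XMLReader xmlr = spf.newSAXParser().getXMLReader();
        SAXSource source = new SAXSource(xmlr, new InputSource(new StringReader(XML)));

        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```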

Are there any advantages to using an XSLT stylesheet compared to manually parsing an XML file using a DOM parser?

For one of our applications, I've written a utility that uses Java's DOM parser. It basically takes an XML file, parses it and then processes the data, using the following methods to actually retrieve it:
getElementsByTagName()
item() (on the returned NodeList)
getFirstChild()
getNextSibling()
getTextContent()
Now I have to do the same thing again, but I am wondering whether it would be better to use an XSLT stylesheet. The organisation that sends us the XML file keeps changing their schema, which means we have to change our code to cater for these schema changes. I'm not very familiar with XSLT processing, so I'm trying to find out whether I'm better off using XSLT stylesheets rather than "manual parsing".
The reason XSLT stylesheets look attractive is that I think that if the schema for the XML file changes, I will only need to change the stylesheet. Is this correct?
The other thing I would like to know is which of the two (XSLT transformer or DOM parser) is better performance-wise. For the manual option, I just use the DOM parser to parse the XML file. How does the XSLT transformer actually parse the file? Does it add overhead compared to manually parsing the XML file? I ask because performance is important given the nature of the data I will be processing.
Any advice?
Thanks
Edit
Basically, what I am currently doing is parsing an XML file and processing the values in some of its elements. I don't transform the XML file into any other format. I just extract some values, fetch a row from an Oracle database and save a new row into a different table. The XML file I parse just contains reference values that I use to retrieve data from the database.
Is XSLT not suitable in this scenario? Is there a better approach I can use to avoid code changes when the schema changes?
Edit 2
Apologies for not being clear enough about what I am doing with the XML data. Basically, there is an XML file which contains some information. I extract this information from the XML file and use it to retrieve more information from a local database. The data in the XML file is more like reference keys for the data I need in the database. I then take the content I extracted from the XML file, plus the content I retrieved from the database using a specific key from the XML file, and save that data into another database table.
The problem I have is that I know how to write a DOM parser to extract the information I need from the XML file, but I was wondering whether using an XSLT stylesheet was a better option, as I wouldn't have to change the code if the schema changes.
Reading the responses below, it sounds like XSLT is only used for transforming an XML file into another XML file or some other format. Given that I don't intend to transform the XML file, there is probably no need to add the overhead of parsing the XSLT stylesheet as well as the XML file.

Transforming XML documents into other formats is XSLT's reason for being. You can use XSLT to output HTML, JSON, another XML document, or anything else you need. You don't specify what kind of output you want. If you're just grabbing the contents of a few elements, then maybe you won't want to bother with XSLT. For anything more, XSLT offers an elegant solution. This is primarily because XSLT understands the structure of the document it's working on. Its processing model is tree traversal and pattern matching, which is essentially what you're manually doing in Java.
You could use XSLT to transform your source data into the representation of your choice. Your code will always work on this structure. Then, when the organization you're working with changes the schema, you only have to change your XSLT to transform the new XML into your custom format. None of your other code needs to change. Why should your business logic care about the format of its source data?
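A minimal sketch of that idea with the JDK's built-in XSLT support: the stylesheet maps a partner's document onto a fixed internal structure, so a schema change on their side would ideally mean editing only the stylesheet. The element names (order, customerRef, productCode, lookup, key) are hypothetical; the real mapping depends on the partner's schema.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class NormalizeWithXslt {

    // Hypothetical partner format: the part that keeps changing.
    static final String PARTNER_XML =
        "<order><customerRef>C-17</customerRef><productCode>P-42</productCode></order>";

    // Maps the partner format onto an internal format the rest of the code relies on.
    // When the partner's schema changes, only the match/select expressions need editing.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"/order\">"
        + "<lookup>"
        + "<key name=\"customer\"><xsl:value-of select=\"customerRef\"/></key>"
        + "<key name=\"product\"><xsl:value-of select=\"productCode\"/></key>"
        + "</lookup>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String normalize(String partnerXml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(partnerXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(normalize(PARTNER_XML));
    }
}
```

The business logic would then read only the <lookup> format (for example with DOM or XPath), whatever the partner sends.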

You are right that XSLT's processing model based on a rule-based event-driven approach makes your code more resilient to changes in the schema.
Because it's a different processing model to the procedural/navigational approach that you use with DOM, there is a learning and familiarisation curve, which some people find frustrating; if you want to go this way, be patient, because it will be a while before the ideas click into place. Once you are there, it's much easier than DOM programming.
The performance of a good XSLT processor will be good enough for your needs. It's of course possible to write very inefficient code, just as it is in any language, but I've rarely seen a system where XSLT was the bottleneck. Very often the XML parsing takes longer than the XSLT processing (and that's the same cost as with DOM or JAXB or anything else.)
As others have said, a lot depends on what you want to do with the XML data, which you haven't really explained.

I think that what you need is actually an XPath expression. You could configure that expression in some property file or whatever you use to retrieve your setup parameters.
In this way, you'd just change the XPath expression whenever your customer hides away the info you use in yet another place.
Basically, XSLT is overkill; you just need XPath. A single XPath expression lets you home in on each value you are after.
Update
Since we are now talking about JDK 1.4, I've included below three different ways of fetching text from an XML file using XPath (as simple as possible, no NPE-guard fluff I'm afraid ;-)).
Starting from the most up to date.
0. First the sample XML config file
<?xml version="1.0" encoding="UTF-8"?>
<config>
<param id="MaxThread" desc="MaxThread" type="int">250</param>
<param id="rTmo" desc="RespTimeout (ms)" type="int">5000</param>
</config>
1. Using JAXP 1.3 standard part of Java SE 5.0
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
public class TestXPath {
private static final String CFG_FILE = "test.xml" ;
private static final String XPATH_FOR_PRM_MaxThread = "/config/param[@id='MaxThread']/text()";
public static void main(String[] args) {
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
docFactory.setNamespaceAware(true);
DocumentBuilder builder;
try {
builder = docFactory.newDocumentBuilder();
Document doc = builder.parse(CFG_FILE);
XPathExpression expr = XPathFactory.newInstance().newXPath().compile(XPATH_FOR_PRM_MaxThread);
Object result = expr.evaluate(doc, XPathConstants.NUMBER);
if ( result instanceof Double ) {
System.out.println( ((Double)result).intValue() );
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
2. Using JAXP 1.2 standard part of Java SE 1.4-2
import javax.xml.parsers.*;
import org.apache.xpath.XPathAPI;
import org.w3c.dom.*;
public class TestXPath {
private static final String CFG_FILE = "test.xml" ;
private static final String XPATH_FOR_PRM_MaxThread = "/config/param[@id='MaxThread']/text()";
public static void main(String[] args) {
try {
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
docFactory.setNamespaceAware(true);
DocumentBuilder builder = docFactory.newDocumentBuilder();
Document doc = builder.parse(CFG_FILE);
Node param = XPathAPI.selectSingleNode( doc, XPATH_FOR_PRM_MaxThread );
if ( param instanceof Text ) {
System.out.println( Integer.decode(((Text)(param)).getNodeValue() ) );
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
3. Using JAXP 1.1 standard part of Java SE 1.4 + jdom + jaxen
You need to add these 2 jars (available from www.jdom.org - binaries, jaxen is included).
import java.io.File;
import org.jdom.*;
import org.jdom.input.SAXBuilder;
import org.jdom.xpath.XPath;
public class TestXPath {
private static final String CFG_FILE = "test.xml" ;
private static final String XPATH_FOR_PRM_MaxThread = "/config/param[@id='MaxThread']/text()";
public static void main(String[] args) {
try {
SAXBuilder sxb = new SAXBuilder();
Document doc = sxb.build(new File(CFG_FILE));
Element root = doc.getRootElement();
XPath xpath = XPath.newInstance(XPATH_FOR_PRM_MaxThread);
Text param = (Text) xpath.selectSingleNode(root);
Integer maxThread = Integer.decode( param.getText() );
System.out.println( maxThread );
} catch (Exception e) {
e.printStackTrace();
}
}
}
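The answer's main suggestion, keeping the XPath expressions in a property file so that a schema change only means editing configuration, could look roughly like this with the Java 5+ API from example 1. The property keys, the inline properties text and the file name xpath.properties are hypothetical; in practice the expressions would be loaded from a file with Properties.load.

```java
import java.io.StringReader;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ConfiguredXPath {

    // Hypothetical property file contents: one XPath expression per value we need.
    static final String PROPS =
        "maxThread=/config/param[@id='MaxThread']\n"
        + "respTimeout=/config/param[@id='rTmo']\n";

    // Sample config from the answer.
    static final String XML =
        "<config>"
        + "<param id=\"MaxThread\" desc=\"MaxThread\" type=\"int\">250</param>"
        + "<param id=\"rTmo\" desc=\"RespTimeout (ms)\" type=\"int\">5000</param>"
        + "</config>";

    static Properties loadExpressions() throws Exception {
        Properties exprs = new Properties();
        exprs.load(new StringReader(PROPS)); // in practice: new FileReader("xpath.properties")
        return exprs;
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Looks up the XPath expression stored under `key` and evaluates it as a string.
    static String lookup(Document doc, Properties exprs, String key) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(exprs.getProperty(key), doc);
    }

    public static void main(String[] args) throws Exception {
        Properties exprs = loadExpressions();
        Document doc = parse(XML);
        int maxThread = Integer.parseInt(lookup(doc, exprs, "maxThread"));
        int timeout = Integer.parseInt(lookup(doc, exprs, "respTimeout"));
        System.out.println(maxThread + " " + timeout);
    }
}
```

If the partner later moves the value, only the line in the properties file changes; the Java code stays the same.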

Since performance is important, I would suggest using a SAX parser for this. JAXB will give you roughly the same performance as DOM parsing, PLUS it will be much easier and more maintainable. Handling the changes in the schema also should not affect you badly if you are using JAXB: just get the new schema and regenerate the classes. If you have a bridge between JAXB and your domain logic, then the changes can be absorbed in that layer without worrying about XML. I prefer treating XML as just a message that is used in the messaging layer. All the application code should be agnostic of the XML schema.

Is there a Java XML API that can parse a document without resolving character entities?

I have a program that needs to parse XML that contains character entities. The program itself doesn't need to have them resolved, and the list of them is large and will change, so I want to avoid explicit support for these entities if I can.
Here's a simple example:
<?xml version="1.0" encoding="UTF-8"?>
<xml>Hello there &something;</xml>
Is there a Java XML API that can parse a document successfully without resolving (non-standard) character entities? Ideally it would translate them into a special event or object that could be handled specially, but I'd settle for an option that would silently suppress them.
Answer & Example:
Skaffman gave me the answer: use a StAX parser with IS_REPLACING_ENTITY_REFERENCES set to false.
Here's the code I whipped up to try it out:
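import java.io.FileInputStream;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.EntityReference;
import javax.xml.stream.events.XMLEvent;
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
inputFactory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
XMLEventReader reader = inputFactory.createXMLEventReader(
new FileInputStream("your file here"));
while (reader.hasNext()) {
XMLEvent event = reader.nextEvent();
if (event.isEntityReference()) {
EntityReference ref = (EntityReference) event;
System.out.println("Entity Reference: " + ref.getName());
}
}
reader.close(); // closes the reader, not the underlying FileInputStream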
For the above XML, it will print "Entity Reference: something".

The StAX API supports not replacing character entity references, by way of the IS_REPLACING_ENTITY_REFERENCES property:
Requires the parser to replace internal entity references with their replacement text and report them as characters
This can be set on an XMLInputFactory, which is then in turn used to construct an XMLEventReader or XMLStreamReader. However, the API is careful to say that this property is only intended to force the implementation to perform the replacement, rather than forcing it not to replace them. Still, it's got to be worth a try.

Works for me only when disabling support of external entities:
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
inputFactory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
inputFactory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

SAX parsing with an org.xml.sax.EntityResolver might suit your purpose. You could certainly suppress them, and you could probably find a way to leave them unresolved.
This tutorial seems the most relevant: it shows how to resolve entities into strings.
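For external entities, an EntityResolver that returns an empty InputSource is one way to suppress them: the parser never fetches the resource and the reference expands to nothing. A minimal sketch; the entity name and the example.invalid URL are made up for illustration. Note that this only affects declared external entities; it does not make an undeclared reference like the question's &something; parse in a document without a DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SuppressExternalEntities {

    // Hypothetical document: &ext; refers to an external resource we never want to load.
    static final String XML =
        "<!DOCTYPE root [<!ENTITY ext SYSTEM \"http://example.invalid/ext.txt\">]>"
        + "<root>before-&ext;-after</root>";

    static String textContent(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            // Called for every external entity (and external DTD); returning an empty
            // source means nothing is fetched and the entity expands to no text.
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                return new InputSource(new StringReader(""));
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textContent(XML));
    }
}
```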

I am not a Java developer, but I "think" the Java XML classes support functionality similar to .NET's for accomplishing this. In .NET, on the XmlReaderSettings class you set the ProhibitDtd property to false and set the XmlResolver property to null. This causes the parser to ignore externally referenced entities without throwing an exception when they are read. I just did a Google search for "Java ignore entity" and got lots of hits, some of which appear to address this topic. I realize this is not a complete answer to your question, but it should point you in a useful direction.

Converting a raw file (binary data ) into XML file

I'm working on a project in which I have to take a raw file from the server and convert it into an XML file.
Is there any tool available in Java that can help me accomplish this task, the way JAXP can be used to parse an XML document?

I guess you will need your objects for later use, so create MyObject, a bean into which you load the values from your raw file, and then you can write it to someFile.xml:
import java.beans.XMLEncoder;
import java.io.FileOutputStream;
// MyObject is your own JavaBean: public class, public no-arg constructor, getters/setters
try (XMLEncoder encoder = new XMLEncoder(new FileOutputStream("someFile.xml"))) {
MyObject p = new MyObject();
p.setFirstName("Mite");
encoder.writeObject(p);
}
Or you can go with TransformerFactory if you don't need the objects for later use.

Yes. This assumes that the text in the raw file is already XML.
You start with the DocumentBuilderFactory to get a DocumentBuilder, and then you can use its parse() method to turn an input stream into a Document, which is an internal XML representation.
If the raw file contains something other than XML, you'll want to scan it somehow (your own code here) and use the stuff you find to build up from an empty Document.
I then usually use a Transformer from a TransformerFactory to convert the Document into XML text in a file, but there may be a simpler way.

JAXP can also be used to create a new, empty document:
Document dom = DocumentBuilderFactory.newInstance()
.newDocumentBuilder()
.newDocument();
Then you can use that Document to create elements, and append them as needed:
Element root = dom.createElement("root");
dom.appendChild(root);
But, as Jørn noted in a comment to your question, it all depends on what you want to do with this "raw" file: how should it be turned into XML. And only you know that.
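Putting the two steps together (build a Document from whatever the raw file contains, then serialize it with a Transformer) might look like the sketch below. It assumes, purely for illustration, that each raw line is a key=value pair; the real parsing step depends entirely on the raw format. Names like RawToXml, record and field are hypothetical.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class RawToXml {

    // Builds <record><field name="...">value</field>...</record> from "key=value" lines.
    // The "key=value" format is an assumption standing in for the real raw format.
    static String convert(List<String> rawLines) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element root = dom.createElement("record");
        dom.appendChild(root);
        for (String line : rawLines) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // skip lines that don't match the assumed format
            }
            Element field = dom.createElement("field");
            field.setAttribute("name", line.substring(0, eq).trim());
            field.setTextContent(line.substring(eq + 1).trim()); // escaped on output, e.g. & -> &amp;
            root.appendChild(field);
        }
        // Serialize the in-memory Document to XML text with an identity Transformer.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(convert(Arrays.asList("firstName=Mite", "city=A & B")));
    }
}
```

To write straight to a file, pass new StreamResult(new File("someFile.xml")) instead of the StringWriter.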

I think if you try to load it in an XmlDocument this will be fine

