Java XML parser?

I'm currently converting a program I wrote in Visual Basic .NET (the 2005 variety) into Java. It used built-in XML methods to parse and generate the user's saved data. Does Java have an equivalent built-in feature, or am I going to have to change my file-processing implementation? (I'd rather not; there's a lot of code I'd have to change.)

Yes, Java can parse XML. Here's an example that takes in a String that contains XML and builds a Document object out of it:
import java.io.StringReader;
import javax.xml.parsers.*;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
InputSource inputSource = new InputSource(new StringReader(xml));
Document document = documentBuilder.parse(inputSource);
You can then use the XPath API (javax.xml.xpath) to query the DOM.
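As a rough illustration of the XPath part, the sketch below parses a hypothetical settings document (the element and attribute names are invented) and runs two queries: one that returns a string, and one that returns a node set via XPathConstants.NODESET.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathQueryExample {
    // Hypothetical saved-data layout, used only for illustration.
    static final String SAMPLE = "<settings>"
            + "<user name=\"alice\"><score>10</score></user>"
            + "<user name=\"bob\"><score>7</score></user>"
            + "</settings>";

    // Same parsing approach as the snippet above: String -> InputSource -> Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the string value of the first node the expression selects.
    static String text(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc);
    }

    // Returns how many nodes the expression selects.
    static int count(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println(text(doc, "/settings/user[@name='bob']/score")); // 7
        System.out.println(count(doc, "/settings/user"));                   // 2
    }
}
```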
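As far as serializing objects to XML goes, the official implementation is JAXB. It will let you serialize and deserialize objects to and from XML. Note that JAXB was bundled with Java SE 6 through 10 but was removed from the JDK in Java 11, so on newer versions you have to add it as a separate dependency.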
You can also create a DOM object manually and add nodes to it, but it's a little more tedious:
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document document = documentBuilder.newDocument();
Element rootNode = document.createElement("root");
Element childNode = document.createElement("child");
childNode.setTextContent("I am a child node");
childNode.setAttribute("attr", "value");
rootNode.appendChild(childNode);
document.appendChild(rootNode);
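Generating the saved file is the other half of the question. One way to turn a DOM like the one above into text is the JDK's identity Transformer (javax.xml.transform), sketched below; the file name mentioned in the comment is only a placeholder.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomToXml {
    // Serializes a DOM Document to a String using an identity Transformer.
    static String toXml(Document document) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    // Builds the same small tree as the snippet above.
    static Document buildSample() throws Exception {
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element rootNode = document.createElement("root");
        Element childNode = document.createElement("child");
        childNode.setTextContent("I am a child node");
        childNode.setAttribute("attr", "value");
        rootNode.appendChild(childNode);
        document.appendChild(rootNode);
        return document;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml(buildSample()));
        // To write a file instead, pass new StreamResult(new java.io.File("data.xml")).
    }
}
```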

I'm assuming you mean that the properties/structure were generated from the classes/beans themselves? If so, then the answer is no (not without a third-party component). I've used XStream before, and that is about the closest I've gotten to .NET's XML class serialization.

Related

Can I parse XML in Java without taking XML file input from outside?

Generally, when using a DOM, SAX or XPath parser, we take the input from outside the Java code, like this:
File inputFile = new File("C:\\Users\\DELL\\Desktop\\catalog.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(inputFile);
So can you parse XML without reading it from a file like this? I want to write my XML alongside my Java code.
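Use DocumentBuilder.parse(new InputSource(new StringReader(xml))), where xml is a string containing the XML to be parsed. (InputStream is abstract and cannot wrap a Reader; InputSource is the class that accepts a character stream.)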
That's if you really must use DOM. I can't imagine why anyone uses it any more, when alternatives such as JDOM2 are so much better.
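A minimal sketch of that answer, keeping the XML in a constant right next to the code (the catalog content is made up for illustration). On Java 15 and later, a text block ("""...""") is a tidier way to embed multi-line XML; plain concatenation is used here so the sketch also compiles on older versions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class InlineXml {
    // XML kept directly in the Java source instead of an external file (illustrative content).
    static final String CATALOG =
            "<catalog>"
          + "<book id=\"bk101\"><title>XML Developer's Guide</title></book>"
          + "</catalog>";

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // InputSource wraps a character stream; no file or network access is involved.
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(CATALOG);
        System.out.println(doc.getDocumentElement().getNodeName()); // catalog
        System.out.println(doc.getElementsByTagName("title").item(0).getTextContent());
    }
}
```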

HTTP charset vs. XML encoding (UTF-8, UTF-16, etc.)

Which one should I use to parse the XML file? What is the recommended approach to parsing an XML file received over HTTP? My approach is to read the XML as a String and use DocumentBuilder to parse the String.
Is this the right approach?
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
Document doc = null;
InputSource is = null;
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
is = new InputSource(new StringReader(xmlString));
doc = dBuilder.parse(is);
XML specifies its own encoding in the XML declaration, <?xml version="1.0" encoding="..."?>, and defaults to UTF-8 when there is neither a declaration nor a byte order mark.
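Parsing a String through a StringReader means the bytes have already been decoded into characters, using an encoding that had to be guessed (for example, from the HTTP charset), and a parser given a character stream ignores the document's own encoding declaration. It is better to pass the parser the raw bytes, as a File or an InputStream, so that it can determine the encoding itself.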
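Another factor is the document's base URI (system ID), which the parser uses to resolve relative references to included documents, XSDs and DTDs. An XML catalog can help there by mapping such references to locally stored copies, so they don't have to be fetched over the network.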
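A minimal sketch of the byte-based approach, assuming the response body is available as an InputStream (for a real request, HttpURLConnection.getInputStream() would supply it; here a hypothetical ISO-8859-1 payload is simulated with a ByteArrayInputStream). The optional systemId argument sets the base URI that relative DTD, XSD or entity references are resolved against.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ByteStreamParse {
    // Simulated HTTP body: bytes encoded as ISO-8859-1, with a matching encoding declaration.
    static final byte[] SAMPLE_BODY =
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00e9</name>"
                    .getBytes(StandardCharsets.ISO_8859_1);

    // Parses raw bytes, so the parser itself reads the encoding declaration (or BOM).
    // systemId is the base URI for resolving relative references; it may be null.
    static Document parse(InputStream in, String systemId) throws Exception {
        InputSource source = new InputSource(in);
        if (systemId != null) {
            source.setSystemId(systemId);
        }
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(new ByteArrayInputStream(SAMPLE_BODY), null);
        // The 0xE9 byte is decoded as U+00E9 because the parser honored encoding="ISO-8859-1".
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```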

XPath Java count of child nodes

I want to count some child nodes of a given XML document, but it always returns 0 and I can't figure out why.
Here's the xml:
<FirstOne xmlns:xxx="http://www.w3.org/2001/XMLSchema-instance">
<Formulas xmlns:d2p1="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
<xxx:yyy>
<aa:bb>something</aa:bb>
<cc:dd>something</cc:dd>
</xxx:yyy>
<xxx:yyy>
<aa:bb>something</aa:bb>
<cc:dd>something</cc:dd>
</xxx:yyy>
<xxx:yyy>
<aa:bb>something</aa:bb>
<cc:dd>something</cc:dd>
</xxx:yyy>
</Formulas>
</FirstOne>
I want to count the number of xxx:yyy elements; in this example, 3.
I tried the following:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new FileInputStream(new File(fileArray[i].toString())));
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
String expression;
expression = "count(//Formulas/xxx:yyy)";
Double result = (Double) xpath.evaluate(expression, doc, XPathConstants.NUMBER);
It always gives me 0.0 ...
Thanks for your help!
The problems all stem from the namespaces.
Firstly, XPath evaluation is only defined over namespace-well-formed XML, so you need to ensure that the aa and cc prefixes are properly mapped to namespace URIs in the XML.
Secondly, you need to parse the XML into a DOM tree using a namespace-aware parser (for what I can only assume are historical reasons, DocumentBuilderFactory is not namespace-aware by default).
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new FileInputStream(new File(fileArray[i].toString())));
Now that you have a proper namespace-well-formed DOM tree, you need to handle the namespaces correctly in the XPath. You need to define a NamespaceContext telling the XPath how to relate prefixes and namespace URIs. Annoyingly, there's no default implementation of this interface in the core Java libraries, but there are third-party implementations such as Spring's SimpleNamespaceContext, or you can implement it yourself, since it has only three methods. With a SimpleNamespaceContext:
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
SimpleNamespaceContext nsCtx = new SimpleNamespaceContext();
xpath.setNamespaceContext(nsCtx);
nsCtx.bindNamespaceUri("x", "http://www.w3.org/2001/XMLSchema-instance");
With this context in place you can now select namespaced nodes in your XPath expression:
String expression = "count(//Formulas/x:yyy)";
(the prefixes you use are the ones in the NamespaceContext, not necessarily the ones in the original XML source).
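Implementing NamespaceContext yourself is a small job. The sketch below uses a hypothetical map-backed implementation (MapNamespaceContext is not a JDK class) together with the namespace-aware parse shown earlier. Because the question's XML leaves aa and cc unbound, the sample binds them to made-up URIs (urn:example:aa, urn:example:cc) so that it is namespace-well-formed.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CountNamespacedNodes {
    private static final String ITEM =
            "<xxx:yyy><aa:bb>something</aa:bb><cc:dd>something</cc:dd></xxx:yyy>";
    // The question's XML, with the aa/cc prefixes bound to hypothetical URIs.
    static final String SAMPLE =
            "<FirstOne xmlns:xxx=\"http://www.w3.org/2001/XMLSchema-instance\""
          + " xmlns:aa=\"urn:example:aa\" xmlns:cc=\"urn:example:cc\">"
          + "<Formulas>" + ITEM + ITEM + ITEM + "</Formulas></FirstOne>";

    // Minimal NamespaceContext backed by a prefix -> URI map.
    // (A complete one would also handle the reserved xml/xmlns prefixes.)
    static class MapNamespaceContext implements NamespaceContext {
        private final Map<String, String> prefixToUri = new HashMap<>();

        MapNamespaceContext bind(String prefix, String uri) {
            prefixToUri.put(prefix, uri);
            return this;
        }

        @Override
        public String getNamespaceURI(String prefix) {
            if (prefix == null) throw new IllegalArgumentException("prefix is null");
            return prefixToUri.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
        }

        @Override
        public String getPrefix(String namespaceURI) {
            for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
                if (e.getValue().equals(namespaceURI)) return e.getKey();
            }
            return null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            String prefix = getPrefix(namespaceURI);
            if (prefix == null) {
                return Collections.<String>emptyList().iterator();
            }
            return Collections.singletonList(prefix).iterator();
        }
    }

    static double countYyy(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for namespace-aware XPath
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // "x" is our own prefix; only the URI has to match the document's.
        xpath.setNamespaceContext(new MapNamespaceContext()
                .bind("x", "http://www.w3.org/2001/XMLSchema-instance"));
        return (Double) xpath.evaluate("count(//Formulas/x:yyy)", doc, XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countYyy(SAMPLE)); // 3.0
    }
}
```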
While some DOM parsers and XPath implementations might let you get away with parsing non-namespace-aware and omitting the prefixes in the XPath expressions, this is an implementation detail and the behaviour is not defined by the specifications. It might work in one version but fail in another, or behave differently if you add additional JARs to your project that change the default parser, etc.
Since xxx is only the element's prefix, use just count(//Formulas/yyy).

How to append xml nodes (as a string) into an existing XML Element node (only using java builtins)?

(Disclaimer: using Rhino inside RingoJS)
Let's say I have a document with an element. I don't see how I can append nodes given as a string to this element. To parse the string into XML nodes and then append them to the element, I tried to use a DocumentFragment, but I couldn't get anywhere. In short, I need something as easy as .NET's InnerXml, but it's not in the Java API.
var dbFactory = javax.xml.parsers.DocumentBuilderFactory.newInstance();
var dBuilder = dbFactory.newDocumentBuilder();
var doc = dBuilder.newDocument();
var el = doc.createElement('test');
var nodesToAppend = '<foo bar="1">Hi <baz>there</baz></foo>';
el.appendChild(???);
How can I do this without using any third party library ?
[EDIT] It's not obvious in the example but I'm not supposed to know the content of variable 'nodesToAppend'. So please, don't point me to tutorials about how to create elements in an xml document.
You can do this in Java; you should be able to derive the Rhino equivalent:
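import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.*;
import org.xml.sax.InputSource;
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.newDocument();
Element el = doc.createElement("test"); // Java string literals use double quotes
doc.appendChild(el);
String xml = "<foo bar=\"1\">Hi <baz>there</baz></foo>";
// Parse the string with the same builder; a StringReader avoids platform-charset issues
Document doc2 = dBuilder.parse(new InputSource(new StringReader(xml)));
// Copy the parsed root into doc (deep copy), then attach it to el
Node node = doc.importNode(doc2.getDocumentElement(), true);
el.appendChild(node);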
Since doc and doc2 are two different Documents, the trick is to import the node from one document into the other, which is done with the importNode API above.
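Closer to .NET's InnerXml, the string may contain several top-level nodes or leading text. A minimal sketch of a hypothetical helper, appendXml (not a JDK method), handles that by wrapping the fragment in a dummy element before parsing and then importing each child. It assumes the fragment is well-formed and has no DOCTYPE.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AppendXmlFragment {
    // Hypothetical helper: parses an XML fragment and appends copies of its nodes to target.
    static void appendXml(Element target, String fragment) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // The dummy <wrapper> root lets text and multiple sibling elements parse as one document.
        Document temp = builder.parse(
                new InputSource(new StringReader("<wrapper>" + fragment + "</wrapper>")));
        Document owner = target.getOwnerDocument();
        for (Node child = temp.getDocumentElement().getFirstChild();
                child != null; child = child.getNextSibling()) {
            // importNode makes a deep copy owned by target's document; temp is left unchanged.
            target.appendChild(owner.importNode(child, true));
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element el = doc.createElement("test");
        doc.appendChild(el);
        appendXml(el, "<foo bar=\"1\">Hi <baz>there</baz></foo>");

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        System.out.println(out);
    }
}
```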
I think your question is similar to this one, which has an answer:
Java: How to read and write xml files?
Or see this link: http://www.mkyong.com/java/how-to-create-xml-file-in-java-dom/

Writing XML according to a DTD

I would like to know if there is a way (in particular, an API) in Java to write XML in a SAX-like way (i.e., event-driven, unlike JDOM, which I cannot use) that takes a DTD and guarantees that my XML document is written correctly.
I have been using SAX for parsing, and I have written an XML writer layer myself as if I were writing a plain file (through an OutputStreamWriter), but I have found that my XML writer layer does not always follow the DTD rules.
SAX does not know how to write XML documents; it is intended for parsing them. So you can create the document in any way you want and then validate it against the DTD using the SAX API.
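A minimal sketch of that validation step, assuming the written document declares its DTD in a DOCTYPE (an internal subset is used here so the example is self-contained; the note/to/body vocabulary is made up). It checks a finished document; it does not stop invalid output while it is being written.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class SaxDtdValidation {
    private static final String DTD =
            "<!DOCTYPE note [<!ELEMENT note (to,body)>"
          + "<!ELEMENT to (#PCDATA)><!ELEMENT body (#PCDATA)>]>";
    static final String VALID = DTD + "<note><to>Tove</to><body>Hi</body></note>";
    static final String INVALID = DTD + "<note><body>Hi</body></note>"; // missing <to>

    // Re-parses the XML with DTD validation switched on.
    // Returns normally if valid; throws SAXParseException on the first validity error.
    static void validate(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // "validating" here means DTD validation
        SAXParser parser = factory.newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXParseException {
                throw e; // DefaultHandler silently ignores validity errors; fail instead
            }
        });
    }

    public static void main(String[] args) throws Exception {
        validate(VALID);
        System.out.println("valid document accepted");
        try {
            validate(INVALID);
        } catch (SAXParseException e) {
            System.out.println("invalid document rejected: " + e.getMessage());
        }
    }
}
```

For a file you have already written, parser.parse(new java.io.File(...), handler) works the same way.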
By the way, may I ask why you are limiting yourself to tools that were almost obsolete about 10 years ago? Why not use a higher-level API that converts objects to XML and vice versa, for example JAXB?
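The standard DocumentBuilder approach can validate for you.
This snippet is adapted from http://www.edankert.com/validate.html#Validate_using_internal_DTD. Note that it validates against a W3C XML Schema (contacts.xsd) rather than a DTD; to validate against a DTD instead, the document has to reference the DTD in its DOCTYPE, and you call setValidating(true) without setSchema.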
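import javax.xml.parsers.*;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// setValidating(true) would also switch on DTD validation; with setSchema, leave it false
factory.setValidating(false);
factory.setNamespaceAware(true);
SchemaFactory schemaFactory =
    SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
factory.setSchema(schemaFactory.newSchema(
    new Source[] {new StreamSource("contacts.xsd")}));
DocumentBuilder builder = factory.newDocumentBuilder();
// SimpleErrorHandler is a custom ErrorHandler class from the tutorial, not part of the JDK
builder.setErrorHandler(new SimpleErrorHandler());
Document document = builder.parse(new InputSource("document.xml"));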
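The snippet above depends on contacts.xsd, document.xml and the tutorial's SimpleErrorHandler. A self-contained variant, sketched under the assumption of a made-up one-element contact schema held in a string, shows the same setSchema mechanism with an inline ErrorHandler that turns validation errors into exceptions.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class XsdValidatingParse {
    // Hypothetical schema: <contact> must contain exactly one <name>.
    static final String XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xs:element name=\"contact\"><xs:complexType><xs:sequence>"
          + "<xs:element name=\"name\" type=\"xs:string\"/>"
          + "</xs:sequence></xs:complexType></xs:element>"
          + "</xs:schema>";
    static final String VALID = "<contact><name>Ann</name></contact>";
    static final String INVALID = "<contact><phone>1</phone></contact>";

    static Document parseValidated(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // setValidating stays false (it controls DTD validation); setSchema handles the XSD.
        factory.setSchema(SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD))));
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXParseException {
                throw e; // report schema violations as exceptions
            }
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseValidated(VALID).getDocumentElement().getNodeName());
        try {
            parseValidated(INVALID);
        } catch (SAXParseException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```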
