Comparing two xml files using JAVA - java

I have two XML files, say abc.xml and 123.xml, which are almost the same: they share the same content, but the second one, 123.xml, has more content than the first.
I want to read both files using Java and check whether the content in abc.xml for each tag is the same as that in 123.xml, something like an object comparison.
Please suggest how to read the XML files in Java and start comparing.
Thanks.

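If you just want to compare them, use this:
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
// Assert below is assumed to be JUnit's org.junit.Assert
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setCoalescing(true); // merge CDATA sections into adjacent text nodes
dbf.setIgnoringElementContentWhitespace(true); // only takes effect when the parser is in validating mode
dbf.setIgnoringComments(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc1 = db.parse(new File("file1.xml"));
doc1.normalizeDocument();
Document doc2 = db.parse(new File("file2.xml"));
doc2.normalizeDocument();
Assert.assertTrue(doc1.isEqualNode(doc2)); // true only if both trees are node-for-node equal
Otherwise, see XMLUnit: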
http://xmlunit.sourceforge.net/

I would go for XMLUnit.
It lets you make assertions about:
The differences between two pieces of XML
The outcome of transforming a piece of XML using XSLT
The evaluation of an XPath expression on a piece of XML
The validity of a piece of XML
Individual nodes in a piece of XML that are exposed by DOM Traversal
Good Luck!

I would use JAXB to generate Java objects from the XML files and then compare the Java objects. That would make the handling much easier.

In general, if you know that you have two files with identical structure but slightly different and unordered content you are going to have to "read" the files to compare the contents.
If you have the XML Schema for your XML files then you could use JAXB to create a set of classes that will represent the specific DOM that is defined by your XML schema. The benefit of this approach is that you will not have to parse the XML file through generic functions for elements and attributes but rather through the actual fields that make sense to your problem.
Of course, to be able to detect the presence of the same entry across both files you are going to have to "match" them through some common field (for example, some ID).
To help you with the duplicate discovery process you could use a suitable data structure from Java's collections, like Set (or one of its implementations).
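The matching step described above can be sketched with plain DOM and a Map, without JAXB. This is only an illustration: the element name entry and the attribute id are hypothetical stand-ins for whatever common field your files actually share.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class EntryMatcher {

    // Index every <entry> element by its "id" attribute (both names are hypothetical).
    static Map<String, String> indexById(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Map<String, String> byId = new LinkedHashMap<>();
        NodeList entries = doc.getElementsByTagName("entry");
        for (int i = 0; i < entries.getLength(); i++) {
            Element e = (Element) entries.item(i);
            byId.put(e.getAttribute("id"), e.getTextContent().trim());
        }
        return byId;
    }

    // IDs from the smaller file that are missing from, or differ in, the larger file.
    static Set<String> mismatches(String smallXml, String largeXml) throws Exception {
        Map<String, String> small = indexById(smallXml);
        Map<String, String> large = indexById(largeXml);
        Set<String> bad = new TreeSet<>();
        for (Map.Entry<String, String> e : small.entrySet()) {
            if (!e.getValue().equals(large.get(e.getKey()))) {
                bad.add(e.getKey());
            }
        }
        return bad;
    }

    public static void main(String[] args) throws Exception {
        String abc = "<list><entry id='1'>a</entry><entry id='2'>b</entry></list>";
        String big = "<list><entry id='2'>B</entry><entry id='1'>a</entry>"
                + "<entry id='3'>c</entry></list>";
        System.out.println(mismatches(abc, big)); // prints [2]
    }
}
```

Because entries are looked up by key, the order of entries in the two files does not matter, and extra entries in the larger file are ignored.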
I hope this helps.

Well, if you just want to compare and display the differences, you can use Guiffy.
It is a good tool. If you want to do the processing in the back end, use a DOM parser to load both files into two DOM objects and compare them attribute by attribute.

The right approach depends on two factors:
(a) how much control do you want over how the comparison is done? For example, do you need to control whether whitespace is significant, whether comments should be ignored, whether namespace prefixes should be ignored, whether redundant namespace declarations should be ignored, whether the XML declaration should be ignored?
(b) what answer do you want? (i) a boolean: same/different, (ii) a list of differences suitable for a human to process, (iii) a list of differences suitable for an application to process.
The two techniques I use are: (a) convert both files to Canonical XML and then compare strings. This gives very little control and only gives a boolean result. (b) compare the two trees using the XPath 2.0 deep-equal() function or the extended Saxon version saxon:deep-equal(). The Saxon version gives more control over how the comparison is done, and a more detailed report of the differences found (for human reading, not for application use).
If you want to write Java code, you could of course implement your own comparison logic - for example you could find an open source implementation of XPath deep-equal, and modify it to meet your requirements. It's only a hundred or so lines of code.
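As a rough idea of what such hand-written comparison logic might look like, here is a minimal sketch using only the JDK's DOM API. It is not the XPath deep-equal() algorithm and not Saxon's implementation; the choices it makes (ignore comments, blank text, namespace prefixes and namespace declarations, trim text) are assumptions picked for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SimpleDeepEqual {

    static Element parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setCoalescing(true); // merge CDATA sections into adjacent text
        return dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Child elements and non-blank text nodes; comments, PIs and blank text are skipped.
    static List<Node> significantChildren(Node parent) {
        List<Node> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            boolean element = n.getNodeType() == Node.ELEMENT_NODE;
            boolean text = n.getNodeType() == Node.TEXT_NODE
                    && !n.getNodeValue().trim().isEmpty();
            if (element || text) out.add(n);
        }
        return out;
    }

    static boolean isNamespaceDecl(Node attr) {
        String name = attr.getNodeName();
        return name.equals("xmlns") || name.startsWith("xmlns:");
    }

    static int plainAttrCount(NamedNodeMap attrs) {
        int count = 0;
        for (int i = 0; i < attrs.getLength(); i++) {
            if (!isNamespaceDecl(attrs.item(i))) count++;
        }
        return count;
    }

    // Attributes are matched by namespace URI + local name, so their order is irrelevant.
    static boolean attributesEqual(Node a, Node b) {
        NamedNodeMap aa = a.getAttributes();
        NamedNodeMap bb = b.getAttributes();
        if (plainAttrCount(aa) != plainAttrCount(bb)) return false;
        for (int i = 0; i < aa.getLength(); i++) {
            Node x = aa.item(i);
            if (isNamespaceDecl(x)) continue;
            Node y = bb.getNamedItemNS(x.getNamespaceURI(), x.getLocalName());
            if (y == null || !x.getNodeValue().equals(y.getNodeValue())) return false;
        }
        return true;
    }

    static boolean deepEqual(Node a, Node b) {
        if (a.getNodeType() != b.getNodeType()) return false;
        if (a.getNodeType() == Node.TEXT_NODE) {
            return a.getNodeValue().trim().equals(b.getNodeValue().trim());
        }
        // Elements: compare namespace URI + local name, so prefixes do not matter.
        if (!Objects.equals(a.getNamespaceURI(), b.getNamespaceURI())
                || !a.getLocalName().equals(b.getLocalName())) return false;
        if (!attributesEqual(a, b)) return false;
        List<Node> ca = significantChildren(a);
        List<Node> cb = significantChildren(b);
        if (ca.size() != cb.size()) return false;
        for (int i = 0; i < ca.size(); i++) {
            if (!deepEqual(ca.get(i), cb.get(i))) return false;
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        Element one = parse("<a:r xmlns:a='u' x='1' y='2'><a:c>t</a:c></a:r>");
        Element two = parse("<r xmlns='u' y='2' x='1'>\n <!-- note -->\n <c> t </c>\n</r>");
        System.out.println(deepEqual(one, two));
    }
}
```

Returning a boolean keeps the sketch short; collecting a path and a message at each `return false` would turn it into a list of differences instead.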

It's a bit overkill, but if your XML has a schema, you can convert it into an EMF metamodel and then use EMF Compare to compare the files.

Related

whitespace aware reading/writing of XML

I need to change some elements of an XML file which are under source control and write the file with no other differences to allow the developers to easily review the changes.
In detail I have a set of elements which need to have an id attribute in the xml code. I find these elements with a xpath expression and add an ID to it. But when the dom is written again, the formatting differs a bit.
the order of the attributes changes to alphabetical
the namespace definitions are moved to the elements (<ns1:root xmlns:ns1="abc" xmlns:ns2="xzy"><ns2:element/></ns1:root> changes to <root xmlns="abc"><element xmlns="xzy"/></root>)
line breaks and indentation change
The xml is read with javax.xml.parsers.SAXParser (namespaceaware: true) and written with javax.xml.transform.TransformerFactory (indent: yes).
The best way to preserve the formatting would be to alter the source string, is there a good way to do this without diving in too deep into the xml parsing thing?
Or is there a way to parse the xml to dom whitespace aware?
the order of the attributes changes to alphabetical
The order of the attributes is irrelevant as per the spec. If you have built a piece of software that relies on the order of the attributes in an XML file, then that software is broken, plain and easy.
the definition of the namespace is moved to the elements
That is irrelevant as well.
line breaks and indentation change
So is this.
The best way to preserve the formatting would be to alter the source string,
Absolutely not. Don't do that, that's wrong on every level. XML parsers are complex things because XML parsing is a complex thing. If it was as simple as doing a bunch of string search-and-replace operations, then XML parsers would do that instead of being complex.
XML is identical when the DOM it creates is identical. There are countless ways to serialize a DOM. You are at fault if any part of your program relies on the serialized representation of the DOM instead of on the DOM itself.
In any case, most serializers do offer some settings that influence their behavior. If you use the same serializer with the same configuration then you can expect a predictable outcome. That might help a little (i.e. when checking the file into a source control system), but it should not be a reason to start relying on it at the code level.
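The claim about attribute order can be checked with the JDK's own DOM API. The following is a small sketch (the class and method names are made up for illustration) that parses two strings differing only in attribute order and compares the resulting trees with isEqualNode.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class AttributeOrderDemo {

    static Element root(DocumentBuilder db, String xml) throws Exception {
        return db.parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    // True when both strings parse to equal element trees.
    static boolean sameDom(String xml1, String xml2) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        return root(db, xml1).isEqualNode(root(db, xml2));
    }

    public static void main(String[] args) throws Exception {
        // Same attributes, different order in the source text.
        System.out.println(sameDom(
                "<item id=\"1\" name=\"x\"/>",
                "<item name=\"x\" id=\"1\"/>"));
    }
}
```

Note that isEqualNode is stricter than the answer's notion of "identical" in other respects: it also compares prefixes and whitespace-only text nodes, so it is not a drop-in test for every kind of formatting difference.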

Xml Query in java?

I am new to this validation process in Java...
-->XML file named Validation Limits
-->Structure of the XML
<parameter> </parameter>
<lowerLimit> </lowerLimit>
<upperLimit> </upperLimit>
<enable> </enable>
-->Depending on the enable status ('true' or 'false'), I must perform the validation for the respective parameter.
-->What would be the best method to perform this operation?
I have parsed the XML with DOM (I forgot to mention this earlier) and stored the values in arrays, but this gets complicated because of all the cross-referencing between arrays. Any better method that could replace the arrays would be helpful.
Thank you in advance.
Try using a DOM or SAX parser; they will do the parsing for you. You can find some good, free tutorials on the internet.
The difference between DOM and SAX is as follows: DOM loads the XML into a tree structure which you can browse through (i.e. the whole XML is loaded), whereas SAX parses the document and triggers events (calls methods) in the process. Both have advantages and disadvantages, but personally, for reasonably sized XML files, I would use DOM.
So, in your case: use DOM to get a tree of your XML document, locate the enable element, and then read the other elements that depend on it.
Also, you can achieve this in pure XML, using XML Schema, although this might be too much for simple needs.
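One way to get rid of the parallel arrays is to read each group of elements into a small object and keep the objects in a Map keyed by parameter name. The sketch below assumes each group is wrapped in a limit element inside a limits root; those two wrapper names are hypothetical, since the question does not show them, and the limits are assumed to be numeric.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ValidationLimits {

    // One object per parameter replaces the parallel arrays.
    static final class Limit {
        final double lower;
        final double upper;
        final boolean enabled;

        Limit(double lower, double upper, boolean enabled) {
            this.lower = lower;
            this.upper = upper;
            this.enabled = enabled;
        }

        // A disabled limit accepts every value; an enabled one checks the range.
        boolean accepts(double value) {
            return !enabled || (value >= lower && value <= upper);
        }
    }

    // Text of the first descendant element with the given tag name.
    static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    static Map<String, Limit> load(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Map<String, Limit> limits = new LinkedHashMap<>();
        NodeList entries = doc.getElementsByTagName("limit"); // hypothetical wrapper element
        for (int i = 0; i < entries.getLength(); i++) {
            Element e = (Element) entries.item(i);
            limits.put(text(e, "parameter"), new Limit(
                    Double.parseDouble(text(e, "lowerLimit")),
                    Double.parseDouble(text(e, "upperLimit")),
                    Boolean.parseBoolean(text(e, "enable"))));
        }
        return limits;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<limits>"
                + "<limit><parameter>temperature</parameter><lowerLimit>10</lowerLimit>"
                + "<upperLimit>50</upperLimit><enable>true</enable></limit>"
                + "<limit><parameter>pressure</parameter><lowerLimit>1</lowerLimit>"
                + "<upperLimit>2</upperLimit><enable>false</enable></limit>"
                + "</limits>";
        Map<String, Limit> limits = load(xml);
        System.out.println(limits.get("temperature").accepts(60)); // false: out of range
        System.out.println(limits.get("pressure").accepts(60));    // true: check disabled
    }
}
```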

Code to insert arbitrary XML string into XML document by XPath

I am attempting to create a script that wraps a Groovy class that will take the following arguments:
An input XML file to update.
An arbitrary snippet to insert into the input file (might not even be well-formed in and of itself; it would become part of a larger well-formed document).
XPath for the marker element (used for positioning the snippet in #2).
An action (insert before, insert after, append child).
Optional output XML file.
I'm at a loss for finding an API that will allow me to:
Find a node by XPath and
Cram XML from a String adjacent to the node.
Does anyone have some ideas for technologies that I can combine to achieve this effect? Small examples would be especially useful.
If the snippet is well-formed, most DOM implementations I've seen support the DocumentFragment node type, and some add a non-standard method that lets you inject DOM nodes from a string.
EDIT: Quick Google search throws up some JavaDocs: http://download.oracle.com/javase/1.4.2/docs/api/org/w3c/dom/DocumentFragment.html
IIRC the API works like this (pseudocode; appendXML is not part of Java's org.w3c.dom.DocumentFragment):
parent = find_parent_node_of_fragment(document);
fragment = document.createDocumentFragment();
fragment.appendXML("<my>xmlstring</my>");
parent.appendChild(fragment);
If you don't have this luxury or if your string is not well formed there is the option to inject CDATA.
If you can't make do with injecting CDATA (because you essentially want to affect nodes that follow, for instance the new node must become the parent of old nodes which will be enclosed in the new document), you could try an XSLT transformation.
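With the JDK alone, the DocumentFragment idea can be approximated as follows. This is a sketch under two assumptions: the snippet becomes well-formed once wrapped in a dummy root element, and the action names before, after and append are invented here for illustration. It does not handle snippets that are only well-formed in the context of the target document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SnippetInserter {

    static String insert(String xml, String snippet, String xpath, String action)
            throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));

        // Wrap the snippet in a dummy root so several top-level nodes can be parsed.
        Document snip = db.parse(new InputSource(
                new StringReader("<wrapper>" + snippet + "</wrapper>")));

        // Copy the parsed nodes into a fragment owned by the target document.
        DocumentFragment fragment = doc.createDocumentFragment();
        for (Node n = snip.getDocumentElement().getFirstChild(); n != null;
                n = n.getNextSibling()) {
            fragment.appendChild(doc.importNode(n, true));
        }

        Node marker = (Node) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODE);
        if (marker == null) {
            throw new IllegalArgumentException("No node matches " + xpath);
        }
        switch (action) {
            case "before":
                marker.getParentNode().insertBefore(fragment, marker);
                break;
            case "after":
                // A null reference node makes insertBefore append at the end.
                marker.getParentNode().insertBefore(fragment, marker.getNextSibling());
                break;
            case "append":
                marker.appendChild(fragment);
                break;
            default:
                throw new IllegalArgumentException("Unknown action " + action);
        }

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(insert("<root><a/><c/></root>",
                "<b>1</b><b>2</b>", "/root/c", "before"));
    }
}
```

Inserting elements before or after the document element itself will fail, because a document can only have one root element.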
I suspect what I was trying to do is non-trivial and would have required a much larger framework than what I had time for. I ended up abandoning this endeavor.

Interpret a rule applying multiple xpath queries on multiple XML documents

I need to build a component which would take a few XML documents in input and check the following kind of rules:
XML1:/bookstore/book[price>35.00] != null
and (XML2:/city/name = 'Montreal'
or XML3://customer[@language] contains 'en')
Basically my component should be able to:
substitute the XML tokens (the name before the colon) with the corresponding XML document
apply xpath query on this XML document
check the xpath output against expected result ("=", "!=", "contains")
follow the basic syntax ("and", "or" and parentheses)
tell if the rule is true or false
Do you know any library which could help me? maybe JavaCC?
Thanks
For evaluating XPath expressions I recommend Jaxen.
Jaxen is an open source XPath library written in Java. It is adaptable to many different object models, including DOM, XOM, dom4j, and JDOM. It is also possible to write adapters that treat non-XML trees such as compiled Java byte code or Java beans as XML, thus enabling you to query these trees with XPath too.
The Java XPath API (Java 5 / javax.xml.xpath) is also an option, but I haven't tried it yet.
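For reference, evaluating the individual conditions with javax.xml.xpath might look like the sketch below. The rule from the question is hard-coded in Java rather than parsed, and two things are assumptions: that the third condition refers to a language attribute on customer, and that "contains" maps onto the XPath contains() function. Parsing the rule syntax itself (and, or, parentheses) is not covered.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathRuleDemo {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates an XPath 1.0 expression and converts the result to a boolean
    // (a non-empty node set counts as true).
    static boolean matches(Document doc, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Boolean) xpath.evaluate(expr, doc, XPathConstants.BOOLEAN);
    }

    // Hard-coded version of the example rule; the map keys mirror its XML1..XML3 tokens.
    static boolean rule(Map<String, Document> docs) throws Exception {
        return matches(docs.get("XML1"), "/bookstore/book[price>35.00]")
                && (matches(docs.get("XML2"), "/city/name = 'Montreal'")
                    || matches(docs.get("XML3"), "contains(//customer/@language, 'en')"));
    }

    public static void main(String[] args) throws Exception {
        Map<String, Document> docs = new HashMap<>();
        docs.put("XML1", parse("<bookstore><book><price>40.00</price></book></bookstore>"));
        docs.put("XML2", parse("<city><name>Quebec</name></city>"));
        docs.put("XML3", parse("<customers><customer language='en-CA'/></customers>"));
        System.out.println(rule(docs));
    }
}
```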
Somebody on the JavaCC mailing list pointed me in the right direction, mentioning Schematron. It led me to Probatron, which seems to be the best Java implementation available.
The Schematron web site claims that the language can "jump across links and between XML documents to check constraints", but it seems Probatron doesn't allow that. I may need to tweak it or find a workaround (like building a temporary XML document containing all my source documents). Apart from that, it looks like Probatron is the right library for me.

Are XML chunks valid?

I want to store some fragments of an XML file in separate files.
There seems to be no straightforward way to do it:
Reading the chunks fails.
I always get the exception
"javax.xml.transform.TransformerException: org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed."
It only works when there is only ONE 'root' element (which is not the root element in the normal sense).
I understand that XML with multiple 'roots' is not well-formed, but it should be treated as a chunk.
Please, before suggesting some work-around-solutions, tell me:
Are XML chunks valid at all?
And IF so, can they be read out using standard JDK6 API?
Test code:
String testChunk1 = "<e1>text</e1>";
String testChunk2 = "<e1>text</e1><e2>text</e2>";
// the following doesn't work with 'testChunk2'
StringReader sr = new StringReader(testChunk1);
StringWriter sw = new StringWriter();
TransformerFactory.newInstance().newTransformer().transform(
new StreamSource(sr), new StreamResult(sw));
System.out.println(sw.toString());
The W3C have been working towards defining a standard for XML fragment interchange. I'm mentioning it not because it's a solution to your problem, but it's definitely relevant to see that there's discussion of how to handle such things.
In the .NET world you can work with XML fragments and, for example, validate them against a schema. This suggests that it is worth searching for similar support in the Java libraries.
If you want to transform such fragments with XSLT, a very common approach is to put a wrapper element around them, which can then act as the root of the DOM.
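Applied to the test code from the question, the wrapper approach might look like this. It is a sketch only: the element name wrapper is arbitrary, and the regular expression that strips a leading XML declaration is an assumption about how the stored chunks look.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FragmentWrapper {

    // Wraps a multi-root chunk in a single root element and runs the identity transform.
    static String transformFragment(String fragment) throws Exception {
        // A declaration is only allowed at the very start of a document, so drop it first.
        String body = fragment.replaceFirst("^\\s*<\\?xml\\s[^>]*\\?>", "");
        String wrapped = "<wrapper>" + body + "</wrapper>";

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter sw = new StringWriter();
        t.transform(new StreamSource(new StringReader(wrapped)), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        String testChunk2 = "<e1>text</e1><e2>text</e2>";
        System.out.println(transformFragment(testChunk2));
    }
}
```

The wrapper element appears in the output, so whatever consumes the result has to expect it or strip it again.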
While I suppose there must be some way, perhaps kludgy, to do what you want, I am not aware of any way to do it. The standard XML parsers expect well-formed XML, as you're discovering.
If you want to store your XML as a number of separate fragments in different files, then probably the best way to do this is to create your own Reader or InputStream that actually (behind the scenes) reads all of the fragments in order, and then provide that wrapped Reader or InputStream to the transformer. That way, the XML parser sees a single XML document but you can store it however you want.
If you do something like this, the fragments (except for the very first) cannot start with the standard XML header:
<?xml version="1.0" encoding="UTF-8" ?>
Please, before suggesting some work-around-solutions, tell me: Are XML chunks valid at all?
Not in their own right.
You can include them (served as XML external parsed entities) in other documents through methods such as an entity reference, and you can parse them as chunks into existing documents using methods such as DOM Level 3 LS's parseWithContext() (which Java doesn't give you, sorry), but they aren't documents so any interfaces that require a full document cannot accept them.
Transformer requires a full document as input because XSLT works on full documents, and would be confused by something that contained zero or more-than-one root element. The usual trick is to create a single root element by wrapping the document in start and end tags, but this does mean you can't have an XML declaration(*), as mentioned by Eddie.
(*: actually it's known as the ‘Text Declaration’ when included in an external parsed entity, but the syntax is exactly the same.)
