How to compare two XML files with java? [duplicate] - java

I'm trying to write an automated test of an application that basically translates a custom message format into an XML message and sends it out the other end. I've got a good set of input/output message pairs so all I need to do is send the input messages in and listen for the XML message to come out the other end.
When it comes time to compare the actual output to the expected output I'm running into some problems. My first thought was just to do string comparisons on the expected and actual messages. This doens't work very well because the example data we have isn't always formatted consistently and there are often times different aliases used for the XML namespace (and sometimes namespaces aren't used at all.)
I know I can parse both strings and then walk through each element and compare them myself and this wouldn't be too difficult to do, but I get the feeling there's a better way or a library I could leverage.
So, boiled down, the question is:
Given two Java Strings which both contain valid XML how would you go about determining if they are semantically equivalent? Bonus points if you have a way to determine what the differences are.

Sounds like a job for XMLUnit
http://www.xmlunit.org/
https://github.com/xmlunit
Example:
public class SomeTest extends XMLTestCase {
#Test
public void test() {
String xml1 = ...
String xml2 = ...
XMLUnit.setIgnoreWhitespace(true); // ignore whitespace differences
// can also compare xml Documents, InputSources, Readers, Diffs
assertXMLEqual(xml1, xml2); // assertXMLEquals comes from XMLTestCase
}
}

The following will check if the documents are equal using standard JDK libraries.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setCoalescing(true);
dbf.setIgnoringElementContentWhitespace(true);
dbf.setIgnoringComments(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc1 = db.parse(new File("file1.xml"));
doc1.normalizeDocument();
Document doc2 = db.parse(new File("file2.xml"));
doc2.normalizeDocument();
Assert.assertTrue(doc1.isEqualNode(doc2));
normalize() is there to make sure there are no cycles (there technically wouldn't be any)
The above code will require the white spaces to be the same within the elements though, because it preserves and evaluates it. The standard XML parser that comes with Java does not allow you to set a feature to provide a canonical version or understand xml:space if that is going to be a problem then you may need a replacement XML parser such as xerces or use JDOM.

Xom has a Canonicalizer utility which turns your DOMs into a regular form, which you can then stringify and compare. So regardless of whitespace irregularities or attribute ordering, you can get regular, predictable comparisons of your documents.
This works especially well in IDEs that have dedicated visual String comparators, like Eclipse. You get a visual representation of the semantic differences between the documents.

The latest version of XMLUnit can help the job of asserting two XML are equal. Also XMLUnit.setIgnoreWhitespace() and XMLUnit.setIgnoreAttributeOrder() may be necessary to the case in question.
See working code of a simple example of XML Unit use below.
import org.custommonkey.xmlunit.DetailedDiff;
import org.custommonkey.xmlunit.XMLUnit;
import org.junit.Assert;
public class TestXml {
public static void main(String[] args) throws Exception {
String result = "<abc attr=\"value1\" title=\"something\"> </abc>";
// will be ok
assertXMLEquals("<abc attr=\"value1\" title=\"something\"></abc>", result);
}
public static void assertXMLEquals(String expectedXML, String actualXML) throws Exception {
XMLUnit.setIgnoreWhitespace(true);
XMLUnit.setIgnoreAttributeOrder(true);
DetailedDiff diff = new DetailedDiff(XMLUnit.compareXML(expectedXML, actualXML));
List<?> allDifferences = diff.getAllDifferences();
Assert.assertEquals("Differences found: "+ diff.toString(), 0, allDifferences.size());
}
}
If using Maven, add this to your pom.xml:
<dependency>
<groupId>xmlunit</groupId>
<artifactId>xmlunit</artifactId>
<version>1.4</version>
</dependency>

Building on Tom's answer, here's an example using XMLUnit v2.
It uses these maven dependencies
<dependency>
<groupId>org.xmlunit</groupId>
<artifactId>xmlunit-core</artifactId>
<version>2.0.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.xmlunit</groupId>
<artifactId>xmlunit-matchers</artifactId>
<version>2.0.0</version>
<scope>test</scope>
</dependency>
..and here's the test code
import static org.junit.Assert.assertThat;
import static org.xmlunit.matchers.CompareMatcher.isIdenticalTo;
import org.xmlunit.builder.Input;
import org.xmlunit.input.WhitespaceStrippedSource;
public class SomeTest extends XMLTestCase {
#Test
public void test() {
String result = "<root></root>";
String expected = "<root> </root>";
// ignore whitespace differences
// https://github.com/xmlunit/user-guide/wiki/Providing-Input-to-XMLUnit#whitespacestrippedsource
assertThat(result, isIdenticalTo(new WhitespaceStrippedSource(Input.from(expected).build())));
assertThat(result, isIdenticalTo(Input.from(expected).build())); // will fail due to whitespace differences
}
}
The documentation that outlines this is https://github.com/xmlunit/xmlunit#comparing-two-documents

Thanks, I extended this, try this ...
import java.io.ByteArrayInputStream;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
public class XmlDiff
{
private boolean nodeTypeDiff = true;
private boolean nodeValueDiff = true;
public boolean diff( String xml1, String xml2, List<String> diffs ) throws Exception
{
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setCoalescing(true);
dbf.setIgnoringElementContentWhitespace(true);
dbf.setIgnoringComments(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc1 = db.parse(new ByteArrayInputStream(xml1.getBytes()));
Document doc2 = db.parse(new ByteArrayInputStream(xml2.getBytes()));
doc1.normalizeDocument();
doc2.normalizeDocument();
return diff( doc1, doc2, diffs );
}
/**
* Diff 2 nodes and put the diffs in the list
*/
public boolean diff( Node node1, Node node2, List<String> diffs ) throws Exception
{
if( diffNodeExists( node1, node2, diffs ) )
{
return true;
}
if( nodeTypeDiff )
{
diffNodeType(node1, node2, diffs );
}
if( nodeValueDiff )
{
diffNodeValue(node1, node2, diffs );
}
System.out.println(node1.getNodeName() + "/" + node2.getNodeName());
diffAttributes( node1, node2, diffs );
diffNodes( node1, node2, diffs );
return diffs.size() > 0;
}
/**
* Diff the nodes
*/
public boolean diffNodes( Node node1, Node node2, List<String> diffs ) throws Exception
{
//Sort by Name
Map<String,Node> children1 = new LinkedHashMap<String,Node>();
for( Node child1 = node1.getFirstChild(); child1 != null; child1 = child1.getNextSibling() )
{
children1.put( child1.getNodeName(), child1 );
}
//Sort by Name
Map<String,Node> children2 = new LinkedHashMap<String,Node>();
for( Node child2 = node2.getFirstChild(); child2!= null; child2 = child2.getNextSibling() )
{
children2.put( child2.getNodeName(), child2 );
}
//Diff all the children1
for( Node child1 : children1.values() )
{
Node child2 = children2.remove( child1.getNodeName() );
diff( child1, child2, diffs );
}
//Diff all the children2 left over
for( Node child2 : children2.values() )
{
Node child1 = children1.get( child2.getNodeName() );
diff( child1, child2, diffs );
}
return diffs.size() > 0;
}
/**
* Diff the nodes
*/
public boolean diffAttributes( Node node1, Node node2, List<String> diffs ) throws Exception
{
//Sort by Name
NamedNodeMap nodeMap1 = node1.getAttributes();
Map<String,Node> attributes1 = new LinkedHashMap<String,Node>();
for( int index = 0; nodeMap1 != null && index < nodeMap1.getLength(); index++ )
{
attributes1.put( nodeMap1.item(index).getNodeName(), nodeMap1.item(index) );
}
//Sort by Name
NamedNodeMap nodeMap2 = node2.getAttributes();
Map<String,Node> attributes2 = new LinkedHashMap<String,Node>();
for( int index = 0; nodeMap2 != null && index < nodeMap2.getLength(); index++ )
{
attributes2.put( nodeMap2.item(index).getNodeName(), nodeMap2.item(index) );
}
//Diff all the attributes1
for( Node attribute1 : attributes1.values() )
{
Node attribute2 = attributes2.remove( attribute1.getNodeName() );
diff( attribute1, attribute2, diffs );
}
//Diff all the attributes2 left over
for( Node attribute2 : attributes2.values() )
{
Node attribute1 = attributes1.get( attribute2.getNodeName() );
diff( attribute1, attribute2, diffs );
}
return diffs.size() > 0;
}
/**
* Check that the nodes exist
*/
public boolean diffNodeExists( Node node1, Node node2, List<String> diffs ) throws Exception
{
if( node1 == null && node2 == null )
{
diffs.add( getPath(node2) + ":node " + node1 + "!=" + node2 + "\n" );
return true;
}
if( node1 == null && node2 != null )
{
diffs.add( getPath(node2) + ":node " + node1 + "!=" + node2.getNodeName() );
return true;
}
if( node1 != null && node2 == null )
{
diffs.add( getPath(node1) + ":node " + node1.getNodeName() + "!=" + node2 );
return true;
}
return false;
}
/**
* Diff the Node Type
*/
public boolean diffNodeType( Node node1, Node node2, List<String> diffs ) throws Exception
{
if( node1.getNodeType() != node2.getNodeType() )
{
diffs.add( getPath(node1) + ":type " + node1.getNodeType() + "!=" + node2.getNodeType() );
return true;
}
return false;
}
/**
* Diff the Node Value
*/
public boolean diffNodeValue( Node node1, Node node2, List<String> diffs ) throws Exception
{
if( node1.getNodeValue() == null && node2.getNodeValue() == null )
{
return false;
}
if( node1.getNodeValue() == null && node2.getNodeValue() != null )
{
diffs.add( getPath(node1) + ":type " + node1 + "!=" + node2.getNodeValue() );
return true;
}
if( node1.getNodeValue() != null && node2.getNodeValue() == null )
{
diffs.add( getPath(node1) + ":type " + node1.getNodeValue() + "!=" + node2 );
return true;
}
if( !node1.getNodeValue().equals( node2.getNodeValue() ) )
{
diffs.add( getPath(node1) + ":type " + node1.getNodeValue() + "!=" + node2.getNodeValue() );
return true;
}
return false;
}
/**
* Get the node path
*/
public String getPath( Node node )
{
StringBuilder path = new StringBuilder();
do
{
path.insert(0, node.getNodeName() );
path.insert( 0, "/" );
}
while( ( node = node.getParentNode() ) != null );
return path.toString();
}
}

AssertJ 1.4+ has specific assertions to compare XML content:
String expectedXml = "<foo />";
String actualXml = "<bar />";
assertThat(actualXml).isXmlEqualTo(expectedXml);
Here is the Documentation

Below code works for me
String xml1 = ...
String xml2 = ...
XMLUnit.setIgnoreWhitespace(true);
XMLUnit.setIgnoreAttributeOrder(true);
XMLAssert.assertXMLEqual(actualxml, xmlInDb);

skaffman seems to be giving a good answer.
another way is probably to format the XML using a commmand line utility like xmlstarlet(http://xmlstar.sourceforge.net/) and then format both the strings and then use any diff utility(library) to diff the resulting output files. I don't know if this is a good solution when issues are with namespaces.

I'm using Altova DiffDog which has options to compare XML files structurally (ignoring string data).
This means that (if checking the 'ignore text' option):
<foo a="xxx" b="xxx">xxx</foo>
and
<foo b="yyy" a="yyy">yyy</foo>
are equal in the sense that they have structural equality. This is handy if you have example files that differ in data, but not structure!

I required the same functionality as requested in the main question. As I was not allowed to use any 3rd party libraries, I have created my own solution basing on #Archimedes Trajano solution.
Following is my solution.
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.junit.Assert;
import org.w3c.dom.Document;
/**
* Asserts for asserting XML strings.
*/
public final class AssertXml {
private AssertXml() {
}
private static Pattern NAMESPACE_PATTERN = Pattern.compile("xmlns:(ns\\d+)=\"(.*?)\"");
/**
* Asserts that two XML are of identical content (namespace aliases are ignored).
*
* #param expectedXml expected XML
* #param actualXml actual XML
* #throws Exception thrown if XML parsing fails
*/
public static void assertEqualXmls(String expectedXml, String actualXml) throws Exception {
// Find all namespace mappings
Map<String, String> fullnamespace2newAlias = new HashMap<String, String>();
generateNewAliasesForNamespacesFromXml(expectedXml, fullnamespace2newAlias);
generateNewAliasesForNamespacesFromXml(actualXml, fullnamespace2newAlias);
for (Entry<String, String> entry : fullnamespace2newAlias.entrySet()) {
String newAlias = entry.getValue();
String namespace = entry.getKey();
Pattern nsReplacePattern = Pattern.compile("xmlns:(ns\\d+)=\"" + namespace + "\"");
expectedXml = transletaNamespaceAliasesToNewAlias(expectedXml, newAlias, nsReplacePattern);
actualXml = transletaNamespaceAliasesToNewAlias(actualXml, newAlias, nsReplacePattern);
}
// nomralize namespaces accoring to given mapping
DocumentBuilder db = initDocumentParserFactory();
Document expectedDocuemnt = db.parse(new ByteArrayInputStream(expectedXml.getBytes(Charset.forName("UTF-8"))));
expectedDocuemnt.normalizeDocument();
Document actualDocument = db.parse(new ByteArrayInputStream(actualXml.getBytes(Charset.forName("UTF-8"))));
actualDocument.normalizeDocument();
if (!expectedDocuemnt.isEqualNode(actualDocument)) {
Assert.assertEquals(expectedXml, actualXml); //just to better visualize the diffeences i.e. in eclipse
}
}
private static DocumentBuilder initDocumentParserFactory() throws ParserConfigurationException {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(false);
dbf.setCoalescing(true);
dbf.setIgnoringElementContentWhitespace(true);
dbf.setIgnoringComments(true);
DocumentBuilder db = dbf.newDocumentBuilder();
return db;
}
private static String transletaNamespaceAliasesToNewAlias(String xml, String newAlias, Pattern namespacePattern) {
Matcher nsMatcherExp = namespacePattern.matcher(xml);
if (nsMatcherExp.find()) {
xml = xml.replaceAll(nsMatcherExp.group(1) + "[:]", newAlias + ":");
xml = xml.replaceAll(nsMatcherExp.group(1) + "=", newAlias + "=");
}
return xml;
}
private static void generateNewAliasesForNamespacesFromXml(String xml, Map<String, String> fullnamespace2newAlias) {
Matcher nsMatcher = NAMESPACE_PATTERN.matcher(xml);
while (nsMatcher.find()) {
if (!fullnamespace2newAlias.containsKey(nsMatcher.group(2))) {
fullnamespace2newAlias.put(nsMatcher.group(2), "nsTr" + (fullnamespace2newAlias.size() + 1));
}
}
}
}
It compares two XML strings and takes care of any mismatching namespace mappings by translating them to unique values in both input strings.
Can be fine tuned i.e. in case of translation of namespaces. But for my requirements just does the job.

This will compare full string XMLs (reformatting them on the way). It makes it easy to work with your IDE (IntelliJ, Eclipse), cos you just click and visually see the difference in the XML files.
import org.apache.xml.security.c14n.CanonicalizationException;
import org.apache.xml.security.c14n.Canonicalizer;
import org.apache.xml.security.c14n.InvalidCanonicalizerException;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import java.io.IOException;
import java.io.StringReader;
import static org.apache.xml.security.Init.init;
import static org.junit.Assert.assertEquals;
public class XmlUtils {
static {
init();
}
public static String toCanonicalXml(String xml) throws InvalidCanonicalizerException, ParserConfigurationException, SAXException, CanonicalizationException, IOException {
Canonicalizer canon = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS);
byte canonXmlBytes[] = canon.canonicalize(xml.getBytes());
return new String(canonXmlBytes);
}
public static String prettyFormat(String input) throws TransformerException, ParserConfigurationException, IOException, SAXException, InstantiationException, IllegalAccessException, ClassNotFoundException {
InputSource src = new InputSource(new StringReader(input));
Element document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src).getDocumentElement();
Boolean keepDeclaration = input.startsWith("<?xml");
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
LSSerializer writer = impl.createLSSerializer();
writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
writer.getDomConfig().setParameter("xml-declaration", keepDeclaration);
return writer.writeToString(document);
}
public static void assertXMLEqual(String expected, String actual) throws ParserConfigurationException, IOException, SAXException, CanonicalizationException, InvalidCanonicalizerException, TransformerException, IllegalAccessException, ClassNotFoundException, InstantiationException {
String canonicalExpected = prettyFormat(toCanonicalXml(expected));
String canonicalActual = prettyFormat(toCanonicalXml(actual));
assertEquals(canonicalExpected, canonicalActual);
}
}
I prefer this to XmlUnit because the client code (test code) is cleaner.

Using XMLUnit 2.x
In the pom.xml
<dependency>
<groupId>org.xmlunit</groupId>
<artifactId>xmlunit-assertj3</artifactId>
<version>2.9.0</version>
</dependency>
Test implementation (using junit 5) :
import org.junit.jupiter.api.Test;
import org.xmlunit.assertj3.XmlAssert;
public class FooTest {
#Test
public void compareXml() {
//
String xmlContentA = "<foo></foo>";
String xmlContentB = "<foo></foo>";
//
XmlAssert.assertThat(xmlContentA).and(xmlContentB).areSimilar();
}
}
Other methods : areIdentical(), areNotIdentical(), areNotSimilar()
More details (configuration of assertThat(~).and(~) and examples) in this documentation page.
XMLUnit also has (among other features) a DifferenceEvaluator to do more precise comparisons.
XMLUnit website

Using JExamXML with java application
import com.a7soft.examxml.ExamXML;
import com.a7soft.examxml.Options;
.................
// Reads two XML files into two strings
String s1 = readFile("orders1.xml");
String s2 = readFile("orders.xml");
// Loads options saved in a property file
Options.loadOptions("options");
// Compares two Strings representing XML entities
System.out.println( ExamXML.compareXMLString( s1, s2 ) );

Since you say "semantically equivalent" I assume you mean that you want to do more than just literally verify that the xml outputs are (string) equals, and that you'd want something like
<foo> some stuff here</foo></code>
and
<foo>some stuff here</foo></code>
do read as equivalent. Ultimately it's going to matter how you're defining "semantically equivalent" on whatever object you're reconstituting the message from. Simply build that object from the messages and use a custom equals() to define what you're looking for.

Related

XPath to select multiple child values

How can i select all/multiple the values which satify below condition
XPATH
/X/X1/X2[#code='b']/X3/value
XML:
<X>
<X1>
<X2 code="a">
<X3>
<value>x3a1</value>
</X3>
</X2>
</X1>
<X1>
<X2 code="b">
<X3>
<value>x3b11</value>
</X3>
</X2>
</X1>
<X1>
<X2 code="b">
<X3>
<value>X3b12</value>
</X3>
</X2>
</X1>
</X>
Code:
import org.dom4j.Document;
import org.dom4j.Node;
Document doc = reader.read(new StringReader(xml));
Node valueNode = doc.selectSingleNode(XPATH);
Expected value
x3b11, X3b12
Instead of Document.selectSingleNode(), use Document.selectNodes() to select multiple nodes.
Also consider XPath.selectNodes(); here is a full example from the DOM4J Cookbook:
import java.util.Iterator;
import org.dom4j.Documet;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.XPath;
public class DeployFileLoaderSample {
private org.dom4j.Document doc;
private org.dom4j.Element root;
public void browseRootChildren() {
/*
Let's look how many "James" are in our XML Document an iterate them
( Yes, there are three James in this project ;) )
*/
XPath xpathSelector = DocumentHelper.createXPath("/people/person[#name='James']");
List results = xpathSelector.selectNodes(doc);
for ( Iterator iter = results.iterator(); iter.hasNext(); ) {
Element element = (Element) iter.next();
System.out.println(element.getName());
}
// select all children of address element having person element with attribute and value "Toby" as parent
String address = doc.valueOf( "//person[#name='Toby']/address" );
// Bob's hobby
String hobby = doc.valueOf( "//person[#name='Bob']/hobby/#name" );
// the second person living in UK
String name = doc.value( "/people[#country='UK']/person[2]" );
// select people elements which have location attriute with the value "London"
Number count = doc.numberValueOf( "//people[#location='London']" );
}
}

How to keep this code repeating more than once

My code pulls the links and adds them to the HashSet. I want the link to replace the original link and repeat the process till no more new links can be found to add. The program keeps running but the link isn't updating and the program gets stuck in an infinite loop doing nothing. How do I get the link to update so the program can repeat until no more links can be found?
package downloader;
import java.io.IOException;
import java.net.URL;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class Stage2 {
public static void main(String[] args) throws IOException {
int q = 0;
int w = 0;
HashSet<String> chapters = new HashSet();
String seen = new String("/manga/manabi-ikiru-wa-fuufu-no-tsutome/i1778063/v1/c1");
String source = new String("https://mangapark.net" + seen);
// 0123456789
while( q == w ) {
String source2 = new String(source.substring(21));
String last = new String(source.substring(source.length() - 12));
String last2 = new String(source.substring(source.length() - 1));
chapters.add(seen);
for (String link : findLinks(source)) {
if(link.contains("/manga") && !link.contains(last) && link.contains("/i") && link.contains("/c") && !chapters.contains(link)) {
chapters.add(link);
System.out.println(link);
seen = link;
System.out.print(chapters);
System.out.println(seen);
}
}
}
System.out.print(chapters);
}
private static Set<String> findLinks(String url) throws IOException {
Set<String> links = new HashSet<>();
Document doc = Jsoup.connect(url)
.data("query", "Java")
.userAgent("Mozilla")
.cookie("auth", "token")
.timeout(3000)
.get();
Elements elements = doc.select("a[href]");
for (Element element : elements) {
links.add(element.attr("href"));
}
return links;
}
}
Your progamm didn't stop becouse yout while conditions never change:
while( q == w )
is always true. I run your code without the while and I got 2 links print twice(!) and the programm stop.
If you want the links to the other chapters you have the same problem like me. In the element
Element element = doc.getElementById("sel_book_1");
the links are after the pseudoelement ::before. So they will not be in your Jsoup Document.
Here is my questsion to this topic:
How can I find a HTML tag with the pseudoElement ::before in jsoup

Tokenizing text content of an XML element using Dom Java

I have an XML file that contains tags such as:
<P>(b) <E T="03">Filing of financial reports.</E> (1)(i) Except as provided in paragraphs (b)(3) and (h) of this section,</p>
I need to parse the text content and get the results back as an array of strings ["(b)", "Filing of financial reports.", "(1)(i) Except as provided in paragraphs (b) (3) and (h) of this section,"].
In other words, I need to tokenize the text content of a <p> element according to <E T=03"> and store the results in an array of strings.
There's nothing to "tokenize", as the parsing has already been done for you when the DOM was built. The <P> node contains both text and child nodes. This is what the DOM looks like:
P
|
+---text "(b) "
|
+---E
| |
| +---attribute T=03
| |
| +---text "Filing of financial reports."
|
+---text "Except as provided ..."
To get the results you want you need to navigate through the sub-nodes of <P> and extract all the text nodes.
here's one way to do it using jsoup library:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
class Test {
public static void main(String args[]) throws Exception {
String xml = "<P>(b) <E T=\"03\">Filing of financial reports.</E> (1)(i) Except as provided in paragraphs (b)(3) and (h) of this section,</p>";
Document doc = Jsoup.parse(xml);
for (Element e : doc.select("p"))
for (Node child : e.childNodes()) {
if (child instanceof TextNode) {
System.out.println(((TextNode) child).text());
} else {
System.out.println(((Element) child).text());
}
}
}
}
output:
(b)
Filing of financial reports.
(1)(i) Except as provided in paragraphs (b)(3) and (h) of this section,
Use XPath. If you don't want to use specialized Java libraries, you may just use standard Java API such us:
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
public class ExtractingAllTextNodes {
private static final String XML = "<P>(b) <E T=\"03\">Filing of financial reports.</E> (1)(i) Except as provided in paragraphs (b)(3) and (h) of this section,</P>";
public static void main(final String[] args) throws Exception {
final XPath xPath = XPathFactory.newInstance().newXPath();
final DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
final DocumentBuilder builder = builderFactory.newDocumentBuilder();
final String expression = "//text()";
final Document xmlDocument = builder.parse(new ByteArrayInputStream(XML.getBytes()));
final NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
System.out.println("=> " + nodeList.item(i).getTextContent());
}
}
}
Output:
=> (b)
=> Filing of financial reports.
=> (1)(i) Except as provided in paragraphs (b)(3) and (h) of this section,
Depending on your needs, you may alter the XPath expression.
Ok. I finally managed to find a solution to the problem. The code is somewhat complex but it uses Dom which is the standard library for XML parsing:
public static void parseSection(Element sec){
NodeList pTags = ((Element) (((NodeList) sec
.getElementsByTagName("contents")).item(0)))
.getElementsByTagName("P");
int pTagIndex = 0;
while (pTagIndex < pTags.getLength()) {
System.out.println(pTagIndex);
Node pTag = pTags.item(pTagIndex);
NodeList pTagChildren = pTag.getChildNodes();
int pTagChildrenIndex = 0;
while(pTagChildrenIndex < pTagChildren.getLength()){
Node pTagChild = pTagChildren.item(pTagChildrenIndex);
if(pTagChild.getNodeName().equals("#text")){
System.out.println("Text: " + pTagChild.getNodeValue());
} else if(pTagChild.getNodeName().equals("E")){
System.out.println("E: " + pTagChild.getTextContent());
}
pTagChildrenIndex ++;
}

Personal Project "RSS FEED" XML Parser

I am relatively new to Java and I have been trying to figure out how to reach the following tags for output for a couple of long, LONG days now. I would really appreciate some insight into the problem. It seems like everything I could find and or try just does not pan out right. (Excuse the cheesy news articles)
<item>
<pubDate>Sat, 21 Sep 2013 02:30:23 EDT</pubDate>
<title>
<![CDATA[
Carmen Bryan Lashes Out at Beyonce Fans for Throwing Shade (#carmenbryan)
]]>
</title>
<link>
http://www.vladtv.com/blog/174937/carmen-bryan-lashes-out-at-beyonce-fans-for-throwing-shade/
</link>
<guid>
http://www.vladtv.com/blog/174937/carmen-bryan-lashes-out-at-beyonce-fans-for-throwing-shade/
</guid>
<description>
<![CDATA[
<img ... /><br />.
<p>In response to someone who reminded Bryan that Jay Z has Beyonce now, she tweeted.</p>
<p>Check out what else Bryan had to say above.</p>
<p>Source: </p>
]]>
</description>
</item>
I have managed to parse the XML and print out the content in both the title and description element tags, however the output for the description element tag also includes all its child element tags. I would like to use this project in future to build on my Java portfolio, please help!
My code so far:
public class NewXmlReader
{
/**
* #param args the command line arguments
*/
public static void main(String[] args) {
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document docXml = builder.parse(NewXMLReaderHandlers.inputHandler());
docXml.getDocumentElement().normalize();
NewXMLReaderHandlers.handleItemTags(docXml, "item");
} catch (ParserConfigurationException | SAXException parserConfigurationException) {
System.out.println("You Are Not XML formated !!");
parserConfigurationException.printStackTrace();
} catch (IOException iOException) {
System.out.println("URL NOT FOUND");
iOException.getCause();
}
}
}
public class NewXMLReaderHandlers {
private static int ARTICLELENGTH;
public static String inputHandler() throws IOException {
InputStreamReader inputStream = new InputStreamReader(System.in);
BufferedReader bufferRead = new BufferedReader(inputStream);
System.out.println("Please Enter A Proper URL: ");
String urlPageString = bufferRead.readLine();
return urlPageString;
}
public static void handleItemTags( Document document, String rssFeedParentTopicTag){
NodeList listOfArticles = document.getElementsByTagName(rssFeedParentTopicTag);
NewXMLReaderHandlers.ARTICLELENGTH = listOfArticles.getLength();
String rootElement = document.getDocumentElement().getNodeName();
if (rootElement == "rss"){
System.out.println("We Have An RSS Feed To Parse");
for (int i = 0; i < NewXMLReaderHandlers.ARTICLELENGTH; i++) {
Node itemNode = (Node) listOfArticles.item(i);
if (itemNode.getNodeType() == Node.ELEMENT_NODE) {
Element itemElement= (Element) itemNode;
tagContent (itemElement, "title");
tagContent (itemElement, "description");
}
}
}
}
public static void tagContent (Element item, String tagName) {
NodeList tagNodeList = item.getElementsByTagName(tagName);
Element tagElement = (Element)tagNodeList.item(0);
NodeList tagTElist = tagElement.getChildNodes();
Node tagNode = tagTElist.item(0);
// System.out.println( " - " + tagName + " : " + tagNode.getNodeValue() + "\n");
if(tagName == "description"){
System.out.println( " - " + tagName + " : " + tagNode.getNodeValue() + "\n\n");
System.out.println(" Do We Have Any Siblings? " + tagNode.getNextSibling().getNodeValue() + "\n");
}
}
}
For my money, the easiest solution would be to use the XPath API.
Essentially, it's a query language for XML. See XPath Tutorial for a primer.
This example uses the RSS feed from SO, which uses <entry...> instead of <item>, but I've used the same technique for other RSS (and XML) files and even very complex HTML documents...
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class TestRSSFeed {
public static void main(String[] args) {
try {
// Read the feed...
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
Document doc = factory.newDocumentBuilder().parse("http://stackoverflow.com/feeds/tag?tagnames=java&sort=newest");
Element root = doc.getDocumentElement();
// Create a xPath instance
XPath xPath = XPathFactory.newInstance().newXPath();
// Find all the nodes that are named <entry...> any where in
// the document that live under the parent node...
XPathExpression expression = xPath.compile("//entry");
NodeList nl = (NodeList) expression.evaluate(root, XPathConstants.NODESET);
System.out.println("Found " + nl.getLength() + " items...");
for (int index = 0; index < nl.getLength(); index++) {
Node node = nl.item(index);
// This is a sub node search.
// The search is based on the parent node and looks for a single
// node titled "title" that belongs to the parent node...
// I did this because I'm only expecting a single node...
expression = xPath.compile("title");
Node child = (Node) expression.evaluate(node, XPathConstants.NODE);
System.out.println(child.getTextContent());
}
} catch (IOException | ParserConfigurationException | SAXException exp) {
exp.printStackTrace();
} catch (XPathExpressionException ex) {
ex.printStackTrace();
}
}
}
Now, you can do some pretty complex queries, but I thought I'd start with a basic example ;)
Just in case anyone is still left wondering about how i managed to solve the CDATA puzzle:
The logic is as follows:
Once you get the program to extract all the xml to display the correct node tree as the rss feed displays, if any xml data is wrapped in CDATA tags, the only way to access that information is by creating new xml based on the text content in the CDATA tag. Once you parse the new document, you should be able to access all the data you need.

In Java, how do I parse an xml schema (xsd) to learn what's valid at a given element?

I'd like to be able to read in an XML schema (i.e. xsd) and from that know what are valid attributes, child elements, values as I walk through it.
For example, let's say I have an xsd that this xml will validate against:
<root>
<element-a type="something">
<element-b>blah</element-b>
<element-c>blahblah</element-c>
</element-a>
</root>
I've tinkered with several libraries and I can confidently get <root> as the root element. Beyond that I'm lost.
Given an element I need to know what child elements are required or allowed, attributes, facets, choices, etc. Using the above example I'd want to know that element-a has an attribute type and may have children element-b and element-c...or must have children element-b and element-c...or must have one of each...you get the picture I hope.
I've looked at numerous libraries such as XSOM, Eclipse XSD, Apache XmlSchema and found they're all short on good sample code. My search of the Internet has also been unsuccessful.
Does anyone know of a good example or even a book that demonstrates how to go through an XML schema and find out what would be valid options at a given point in a validated XML document?
clarification
I'm not looking to validate a document, rather I'd like to know the options at a given point to assist in creating or editing a document. If I know "I am here" in a document, I'd like to determing what I can do at that point. "Insert one of element A, B, or C" or "attach attribute 'description'".
This is a good question. Although, it is old, I did not find an acceptable answer. The thing is that the existing libraries I am aware of (XSOM, Apache XmlSchema) are designed as object models. The implementors did not have the intention to provide any utility methods — you should consider implement them yourself using the provided object model.
Let's see how querying context-specific elements can be done by the means of Apache XmlSchema.
You can use their tutorial as a starting point. In addition, Apache CFX framework provides the XmlSchemaUtils class with lots of handy code examples.
First of all, read the XmlSchemaCollection as illustrated by the library's tutorial:
XmlSchemaCollection xmlSchemaCollection = new XmlSchemaCollection();
xmlSchemaCollection.read(inputSource, new ValidationEventHandler());
Now, XML Schema defines two kinds of data types:
Simple types
Complex types
Simple types are represented by the XmlSchemaSimpleType class. Handling them is easy. Read the documentation: https://ws.apache.org/commons/XmlSchema/apidocs/org/apache/ws/commons/schema/XmlSchemaSimpleType.html. But let's see how to handle complex types. Let's start with a simple method:
#Override
public List<QName> getChildElementNames(QName parentElementName) {
XmlSchemaElement element = xmlSchemaCollection.getElementByQName(parentElementName);
XmlSchemaType type = element != null ? element.getSchemaType() : null;
List<QName> result = new LinkedList<>();
if (type instanceof XmlSchemaComplexType) {
addElementNames(result, (XmlSchemaComplexType) type);
}
return result;
}
XmlSchemaComplexType may stand for both real type and for the extension element. Please see the public static QName getBaseType(XmlSchemaComplexType type) method of the XmlSchemaUtils class.
private void addElementNames(List<QName> result, XmlSchemaComplexType type) {
XmlSchemaComplexType baseType = getBaseType(type);
XmlSchemaParticle particle = baseType != null ? baseType.getParticle() : type.getParticle();
addElementNames(result, particle);
}
When you handle XmlSchemaParticle, consider that it can have multiple implementations. See: https://ws.apache.org/commons/XmlSchema/apidocs/org/apache/ws/commons/schema/XmlSchemaParticle.html
private void addElementNames(List<QName> result, XmlSchemaParticle particle) {
if (particle instanceof XmlSchemaAny) {
} else if (particle instanceof XmlSchemaElement) {
} else if (particle instanceof XmlSchemaGroupBase) {
} else if (particle instanceof XmlSchemaGroupRef) {
}
}
The other thing to bear in mind is that elements can be either abstract or concrete. Again, the JavaDocs are the best guidance.
Many of the solutions for validating XML in java use the JAXB API. There's an extensive tutorial available here. The basic recipe for doing what you're looking for with JAXB is as follows:
Obtain or create the XML schema to validate against.
Generate Java classes to bind the XML to using xjc, the JAXB compiler.
Write java code to:
Open the XML content as an input stream.
Create a JAXBContext and Unmarshaller
Pass the input stream to the Unmarshaller's unmarshal method.
The parts of the tutorial you can read for this are:
Hello, world
Unmarshalling XML
I see you have tried Eclipse XSD. Have you tried Eclipse Modeling Framework (EMF)? You can:
Generating an EMF Model using XML Schema (XSD)
Create a dynamic instance from your metamodel (3.1 With the dynamic instance creation tool)
This is for exploring the xsd. You can create the dynamic instance of the root element then you can right click the element and create child element. There you will see what the possible children element and so on.
As for saving the created EMF model to an xml complied xsd: I have to look it up. I think you can use JAXB for that (How to use EMF to read XML file?).
Some refs:
EMF: Eclipse Modeling Framework, 2nd Edition (written by creators)
Eclipse Modeling Framework (EMF)
Discover the Eclipse Modeling Framework (EMF) and Its Dynamic Capabilities
Creating Dynamic EMF Models From XSDs and Loading its Instances From XML as SDOs
This is a fairly complete sample on how to parse an XSD using XSOM:
import java.io.File;
import java.util.Iterator;
import java.util.Vector;
import org.xml.sax.ErrorHandler;
import com.sun.xml.xsom.XSComplexType;
import com.sun.xml.xsom.XSElementDecl;
import com.sun.xml.xsom.XSFacet;
import com.sun.xml.xsom.XSModelGroup;
import com.sun.xml.xsom.XSModelGroupDecl;
import com.sun.xml.xsom.XSParticle;
import com.sun.xml.xsom.XSRestrictionSimpleType;
import com.sun.xml.xsom.XSSchema;
import com.sun.xml.xsom.XSSchemaSet;
import com.sun.xml.xsom.XSSimpleType;
import com.sun.xml.xsom.XSTerm;
import com.sun.xml.xsom.impl.Const;
import com.sun.xml.xsom.parser.XSOMParser;
import com.sun.xml.xsom.util.DomAnnotationParserFactory;
public class XSOMNavigator
{
public static class SimpleTypeRestriction
{
public String[] enumeration = null;
public String maxValue = null;
public String minValue = null;
public String length = null;
public String maxLength = null;
public String minLength = null;
public String[] pattern = null;
public String totalDigits = null;
public String fractionDigits = null;
public String whiteSpace = null;
public String toString()
{
String enumValues = "";
if (enumeration != null)
{
for(String val : enumeration)
{
enumValues += val + ", ";
}
enumValues = enumValues.substring(0, enumValues.lastIndexOf(','));
}
String patternValues = "";
if (pattern != null)
{
for(String val : pattern)
{
patternValues += "(" + val + ")|";
}
patternValues = patternValues.substring(0, patternValues.lastIndexOf('|'));
}
String retval = "";
retval += minValue == null ? "" : "[MinValue = " + minValue + "]\t";
retval += maxValue == null ? "" : "[MaxValue = " + maxValue + "]\t";
retval += minLength == null ? "" : "[MinLength = " + minLength + "]\t";
retval += maxLength == null ? "" : "[MaxLength = " + maxLength + "]\t";
retval += pattern == null ? "" : "[Pattern(s) = " + patternValues + "]\t";
retval += totalDigits == null ? "" : "[TotalDigits = " + totalDigits + "]\t";
retval += fractionDigits == null ? "" : "[FractionDigits = " + fractionDigits + "]\t";
retval += whiteSpace == null ? "" : "[WhiteSpace = " + whiteSpace + "]\t";
retval += length == null ? "" : "[Length = " + length + "]\t";
retval += enumeration == null ? "" : "[Enumeration Values = " + enumValues + "]\t";
return retval;
}
}
private static void initRestrictions(XSSimpleType xsSimpleType, SimpleTypeRestriction simpleTypeRestriction)
{
XSRestrictionSimpleType restriction = xsSimpleType.asRestriction();
if (restriction != null)
{
Vector<String> enumeration = new Vector<String>();
Vector<String> pattern = new Vector<String>();
for (XSFacet facet : restriction.getDeclaredFacets())
{
if (facet.getName().equals(XSFacet.FACET_ENUMERATION))
{
enumeration.add(facet.getValue().value);
}
if (facet.getName().equals(XSFacet.FACET_MAXINCLUSIVE))
{
simpleTypeRestriction.maxValue = facet.getValue().value;
}
if (facet.getName().equals(XSFacet.FACET_MININCLUSIVE))
{
simpleTypeRestriction.minValue = facet.getValue().value;
}
if (facet.getName().equals(XSFacet.FACET_MAXEXCLUSIVE))
{
simpleTypeRestriction.maxValue = String.valueOf(Integer.parseInt(facet.getValue().value) - 1);
}
if (facet.getName().equals(XSFacet.FACET_MINEXCLUSIVE))
{
simpleTypeRestriction.minValue = String.valueOf(Integer.parseInt(facet.getValue().value) + 1);
}
if (facet.getName().equals(XSFacet.FACET_LENGTH))
{
simpleTypeRestriction.length = facet.getValue().value;
}
if (facet.getName().equals(XSFacet.FACET_MAXLENGTH))
{
simpleTypeRestriction.maxLength = facet.getValue().value;
}
if (facet.getName().equals(XSFacet.FACET_MINLENGTH))
{
simpleTypeRestriction.minLength = facet.getValue().value;
}
if (facet.getName().equals(XSFacet.FACET_PATTERN))
{
pattern.add(facet.getValue().value);
}
if (facet.getName().equals(XSFacet.FACET_TOTALDIGITS))
{
simpleTypeRestriction.totalDigits = facet.getValue().value;
}
if (facet.getName().equals(XSFacet.FACET_FRACTIONDIGITS))
{
simpleTypeRestriction.fractionDigits = facet.getValue().value;
}
if (facet.getName().equals(XSFacet.FACET_WHITESPACE))
{
simpleTypeRestriction.whiteSpace = facet.getValue().value;
}
}
if (enumeration.size() > 0)
{
simpleTypeRestriction.enumeration = enumeration.toArray(new String[] {});
}
if (pattern.size() > 0)
{
simpleTypeRestriction.pattern = pattern.toArray(new String[] {});
}
}
}
private static void printParticle(XSParticle particle, String occurs, String absPath, String indent)
{
boolean repeats = particle.isRepeated();
occurs = " MinOccurs = " + particle.getMinOccurs() + ", MaxOccurs = " + particle.getMaxOccurs() + ", Repeats = " + Boolean.toString(repeats);
XSTerm term = particle.getTerm();
if (term.isModelGroup())
{
printGroup(term.asModelGroup(), occurs, absPath, indent);
}
else if(term.isModelGroupDecl())
{
printGroupDecl(term.asModelGroupDecl(), occurs, absPath, indent);
}
else if (term.isElementDecl())
{
printElement(term.asElementDecl(), occurs, absPath, indent);
}
}
private static void printGroup(XSModelGroup modelGroup, String occurs, String absPath, String indent)
{
System.out.println(indent + "[Start of Group " + modelGroup.getCompositor() + occurs + "]" );
for (XSParticle particle : modelGroup.getChildren())
{
printParticle(particle, occurs, absPath, indent + "\t");
}
System.out.println(indent + "[End of Group " + modelGroup.getCompositor() + "]");
}
private static void printGroupDecl(XSModelGroupDecl modelGroupDecl, String occurs, String absPath, String indent)
{
System.out.println(indent + "[GroupDecl " + modelGroupDecl.getName() + occurs + "]");
printGroup(modelGroupDecl.getModelGroup(), occurs, absPath, indent);
}
private static void printComplexType(XSComplexType complexType, String occurs, String absPath, String indent)
{
System.out.println();
XSParticle particle = complexType.getContentType().asParticle();
if (particle != null)
{
printParticle(particle, occurs, absPath, indent);
}
}
private static void printSimpleType(XSSimpleType simpleType, String occurs, String absPath, String indent)
{
SimpleTypeRestriction restriction = new SimpleTypeRestriction();
initRestrictions(simpleType, restriction);
System.out.println(restriction.toString());
}
public static void printElement(XSElementDecl element, String occurs, String absPath, String indent)
{
absPath += "/" + element.getName();
String typeName = element.getType().getBaseType().getName();
if(element.getType().isSimpleType() && element.getType().asSimpleType().isPrimitive())
{
// We have a primitive type - So use that instead
typeName = element.getType().asSimpleType().getPrimitiveType().getName();
}
boolean nillable = element.isNillable();
System.out.print(indent + "[Element " + absPath + " " + occurs + "] of type [" + typeName + "]" + (nillable ? " [nillable] " : ""));
if (element.getType().isComplexType())
{
printComplexType(element.getType().asComplexType(), occurs, absPath, indent);
}
else
{
printSimpleType(element.getType().asSimpleType(), occurs, absPath, indent);
}
}
public static void printNameSpace(XSSchema s, String indent)
{
String nameSpace = s.getTargetNamespace();
// We do not want the default XSD namespaces or a namespace with nothing in it
if(nameSpace == null || Const.schemaNamespace.equals(nameSpace) || s.getElementDecls().isEmpty())
{
return;
}
System.out.println("Target namespace: " + nameSpace);
Iterator<XSElementDecl> jtr = s.iterateElementDecls();
while (jtr.hasNext())
{
XSElementDecl e = (XSElementDecl) jtr.next();
String occurs = "";
String absPath = "";
XSOMNavigator.printElement(e, occurs, absPath,indent);
System.out.println();
}
}
public static void xsomNavigate(File xsdFile)
{
ErrorHandler errorHandler = new ErrorReporter(System.err);
XSSchemaSet schemaSet = null;
XSOMParser parser = new XSOMParser();
try
{
parser.setErrorHandler(errorHandler);
parser.setAnnotationParser(new DomAnnotationParserFactory());
parser.parse(xsdFile);
schemaSet = parser.getResult();
}
catch (Exception exp)
{
exp.printStackTrace(System.out);
}
if(schemaSet != null)
{
// iterate each XSSchema object. XSSchema is a per-namespace schema.
Iterator<XSSchema> itr = schemaSet.iterateSchema();
while (itr.hasNext())
{
XSSchema s = (XSSchema) itr.next();
String indent = "";
printNameSpace(s, indent);
}
}
}
public static void printFile(String fileName)
{
File fileToParse = new File(fileName);
if (fileToParse != null && fileToParse.canRead())
{
xsomNavigate(fileToParse);
}
}
}
And for your Error Reporter use:
import java.io.OutputStream;
import java.io.PrintStream;
import java.text.MessageFormat;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
public class ErrorReporter implements ErrorHandler {
private final PrintStream out;
public ErrorReporter( PrintStream o ) { this.out = o; }
public ErrorReporter( OutputStream o ) { this(new PrintStream(o)); }
public void warning(SAXParseException e) throws SAXException {
print("[Warning]",e);
}
public void error(SAXParseException e) throws SAXException {
print("[Error ]",e);
}
public void fatalError(SAXParseException e) throws SAXException {
print("[Fatal ]",e);
}
private void print( String header, SAXParseException e ) {
out.println(header+' '+e.getMessage());
out.println(MessageFormat.format(" line {0} at {1}",
new Object[]{
Integer.toString(e.getLineNumber()),
e.getSystemId()}));
}
}
For your main use:
public class WDXSOMParser
{
public static void main(String[] args)
{
String fileName = null;
if(args != null && args.length > 0 && args[0] != null)
fileName = args[0];
else
fileName = "C:\\xml\\CollectionComments\\CollectionComment1.07.xsd";
//fileName = "C:\\xml\\PropertyListingContractSaleInfo\\PropertyListingContractSaleInfo.xsd";
//fileName = "C:\\xml\\PropertyPreservation\\PropertyPreservation.xsd";
XSOMNavigator.printFile(fileName);
}
}
It's agood bit of work depending on how compex your xsd is but basically.
if you had
<Document>
<Header/>
<Body/>
<Document>
And you wanted to find out where were the alowable children of header you'd (taking account of namespaces)
Xpath would have you look for '/element[name="Document"]/element[name="Header"]'
After that it depends on how much you want to do. You might find it easier to write or find something that loads an xsd into a DOM type structure.
Course you are going to possibly find all sorts of things under that elment in xsd, choice, sequence, any, attributes, complexType, SimpleContent, annotation.
Loads of time consuming fun.
Have a look at this.
How to parse schema using XOM Parser.
Also, here is the project home for XOM

Categories

Resources