How to validate an XML document against an XSD schema using JDom - java

I am working on an application that uses JDOM for parsing XML documents.
The existing code is as follows:
private Document openDocumentAtPath(File file) {
    // Create a sax builder for building the JDOM document
    SAXBuilder builder = new SAXBuilder();
    // JDOM document to be created from XML document
    Document doc = null;
    // Try to build the document
    try {
        // Get the file into a single string
        BufferedReader input = new BufferedReader(
                new FileReader( file ) );
        String content = "";
        String line = null;
        while( ( line = input.readLine() ) != null ) {
            content += "\n" + line;
        }
        StringReader reader = new StringReader( content );
        doc = builder.build(reader);
    }// Only thrown when a XML document is not well-formed
    catch ( JDOMException e ) {
        System.out.println(this.file + " is not well-formed!");
        System.out.println("Error Message: " + e.getMessage());
    }
    catch (IOException e) {
        System.out.println("Cannot access: " + this.file.toString());
        System.out.println("Error Message: " + e.getMessage());
    }
    return doc;
}
Now I also want to validate the XML against an XSD. I read the API documentation, which says to use JAXP, but I don't know how to do that.
The application uses JDOM 1.1.1, and the examples I found online use classes that are not available in this version. Can someone explain how to validate an XML document against an XSD, especially with this version?

How about simply copy-pasting code from the JDOM FAQ?

Or, use JDOM 2.0.1, and change the line:
SAXBuilder builder = new SAXBuilder();
to be
SAXBuilder builder = new SAXBuilder(XMLReaders.XSDVALIDATING);
See the JDOM 2.0.1 javadoc (examples at bottom of page): http://hunterhacker.github.com/jdom/jdom2/apidocs/org/jdom2/input/sax/package-summary.html
Oh, and I should update the FAQs
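If upgrading is not an option, another route is to run the validation as a separate JAXP step before handing the input to SAXBuilder, which does not depend on the JDOM version at all. Below is a minimal sketch using only javax.xml.validation from the JDK. The class name XsdCheck and its method are hypothetical helpers made up for this illustration, and the sketch assumes both the XML and the XSD are available as strings.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper (not part of JDOM): validates XML text against XSD text via JAXP.
class XsdCheck {

    /** Returns null when the XML is valid, otherwise the validator's error message. */
    static String validate(String xml, String xsd) {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // Compile the schema; this throws SAXException if the XSD itself is broken
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, the first validation error throws SAXException
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

When working with files instead of strings, new StreamSource(file) can be passed in both places. If validate returns null, the same input can then be built with the existing SAXBuilder code.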

Related

How to use the return value of a java method in an XML

I have an xml like this
<step>
<navigationButtonSelector>.step-one-action-button</navigationButtonSelector>
<formField>
<name>emailAddress</name>
<value>new2@gmail.com</value>
</formField>
</step>
Now I can't hardcode the value (new2@gmail.com) here. Instead I have to generate a random string and use it whenever I run my automation suite. Can I generate a random string at runtime within the XML itself?
The other possibility is to create a Java method that returns a random String. How could I use that value in my XML file? Please help.
This code may look bloated but actually is very powerful in case you have to perform more editing. It shows the whole roundtrip from parsing, modifying and serializing the XML data from and to String.
String xml = "<step>\n" +
" <navigationButtonSelector>.step-one-action-button</navigationButtonSelector>\n" +
" <formField>\n" +
" <name>emailAddress</name>\n" +
" <value>new2@gmail.com</value>\n" +
" </formField>\n" +
"</step>";
// parse document
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = db.parse(new InputSource(new StringReader(xml)));
// find editing location
XPath xpath = XPathFactory.newInstance().newXPath();
Element value = (Element)xpath.evaluate("/step/formField/value", doc, XPathConstants.NODE);
// replace value
while (value.hasChildNodes()) {
value.removeChild(value.getFirstChild());
}
value.appendChild(doc.createTextNode("a random string"));
// serialize document
StringWriter sw = new StringWriter();
Transformer t = TransformerFactory.newDefaultInstance().newTransformer();
t.transform(new DOMSource(doc), new StreamResult(sw));
xml = sw.toString();
System.out.println(xml);
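To connect this with the "Java method that returns a random String" from the question, here is a minimal sketch that generates a throwaway address and writes it into the value element. The class name RandomValueInjector, the user- prefix and the example.com domain are assumptions made up for this illustration; only JDK classes are used.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.UUID;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only.
class RandomValueInjector {

    /** Builds a random address such as user-1a2b3c4d@example.com (domain is a placeholder). */
    static String randomEmail() {
        return "user-" + UUID.randomUUID().toString().substring(0, 8) + "@example.com";
    }

    /** Replaces the text of /step/formField/value and returns the serialized XML. */
    static String setValue(String xml, String newValue) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node value = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate("/step/formField/value", doc, XPathConstants.NODE);
            if (value == null) {
                throw new IllegalArgumentException("No /step/formField/value element found");
            }
            // setTextContent drops existing children and adds a single text node
            value.setTextContent(newValue);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter sw = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(sw));
            return sw.toString();
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException | TransformerException e) {
            throw new IllegalStateException("Cannot rewrite XML", e);
        }
    }
}
```

A test setup could call RandomValueInjector.setValue(xml, RandomValueInjector.randomEmail()) once per run and feed the result to the automation suite instead of the static file.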

Javax DocumentBuilder produces “double-UTF-8’ed” charset encoding

I’ve got a Java DOM Document which MyFilter has rewritten. From logging output I know that the contents of the Document are still correct. I am using the following lines to convert theDocument to a List<String> to pass it back through an interface:
Transformer transformer = TransformerFactory.newInstance().newTransformer();
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
transformer.transform(new DOMSource(theDocument), new StreamResult(buffer));
return Arrays.asList(new String(buffer.toByteArray()).split("\r?\n"));
The filter is called from this file copying method using org.apache.commons.io.FileUtils:
List<String> lines = FileUtils.readLines(source, "UTF-8");
if (filters != null) {
for (final MyFilter filter : filters) {
lines = filter.filter(lines);
}
}
FileUtils.writeLines(destination, "UTF-8", lines);
This works perfectly fine on my machine (where I could debug it), but on other machines just running the code, any non-ASCII characters reproducibly get double-UTF-8’ed (e.g., Größe becomes GrÃ¶ÃŸe). The code is executed within a web app running in Tomcat. I am sure the machines are configured differently, but what I want is to get the non-corrupt result on any configuration.
Any ideas what I could be missing?
Once you have the Document object, you have to read its content.
After that, you write it to a file using the LSSerializer interface, which the DOM standard provides for this purpose.
By default, the LSSerializer produces an XML document without spaces or line breaks. As a result, the output looks less pretty, but it is actually more suitable for parsing by another program because it is free from unnecessary white space.
If you want white space, you use yet another magic incantation after creating the serializer:
ser.getDomConfig().setParameter("format-pretty-print", true);
The code looks like this:
private String getContentFromDocument(Document doc) {
    String content;
    DOMImplementation impl = doc.getImplementation();
    DOMImplementationLS implLS = (DOMImplementationLS) impl.getFeature("LS", "3.0");
    LSSerializer ser = implLS.createLSSerializer();
    ser.getDomConfig().setParameter("format-pretty-print", true);
    content = ser.writeToString(doc);
    return content;
}
And once you have the string content, you can write it to a file, like this:
public void writeToXmlFile(String xmlContent) {
    File theDir = new File("./output");
    if (!theDir.exists())
        theDir.mkdir();
    String fileName = "./output/" + this.getClass().getSimpleName() + "_"
            + Calendar.getInstance().getTimeInMillis() + ".xml";
    try (OutputStream stream = new FileOutputStream(new File(fileName))) {
        try (OutputStreamWriter out = new OutputStreamWriter(stream, StandardCharsets.UTF_8)) {
            out.write(xmlContent);
            out.write("\n");
        }
    } catch (IOException ex) {
        System.err.println("Cannot write to file! " + ex.getMessage());
    }
}
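One caveat with writeToString: the DOM Load and Save specification says the result uses the encoding of the string type, i.e. UTF-16, and in the JDK implementations I am aware of the XML declaration then says encoding="UTF-16", which no longer matches a file written as UTF-8. A possible way around this is to let the serializer write bytes itself through an LSOutput with an explicit encoding. This is a minimal sketch under that assumption; Utf8LsWriter and its methods are hypothetical names for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only.
class Utf8LsWriter {

    /** Serializes the document to bytes, with UTF-8 as both byte encoding and declared encoding. */
    static byte[] toUtf8Bytes(Document doc) {
        DOMImplementationLS ls =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        LSOutput output = ls.createLSOutput();
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        output.setByteStream(buffer);
        output.setEncoding("UTF-8");
        serializer.write(doc, output);
        return buffer.toByteArray();
    }

    /** Small parsing helper so the sketch is self-contained. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Cannot parse XML", e);
        }
    }
}
```

The returned bytes can be written to a FileOutputStream as they are, without going through an OutputStreamWriter.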
BTW:
Have you tried obtaining the Document object in a slightly simpler way, like this:
DocumentBuilderFactory documentFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = documentFactory.newDocumentBuilder();
Document doc = builder.parse(new File(fileName));
You can try this as well. It should be enough for parsing an XML file.
I finally found it: The problem was within the String(byte[]) constructor which interprets byte[] relative to the platform’s default charset. This should at least have been tagged deprecated. The transformer obviously produces UTF-8 output independent of the platform. Changing the method like below passes the same charset to both:
final String ENCODING = "UTF-8";
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.ENCODING, ENCODING);
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
transformer.transform(new DOMSource(theDocument), new StreamResult(buffer));
return Arrays.asList(new String(buffer.toByteArray(), ENCODING).split("\r?\n"));
To get it working, it does not really matter which encoding is used, as long as both sides use the same one. However, it is good to choose a Unicode charset, as otherwise unmappable characters may get lost. Note that the charset will be reflected in the XML declaration, so when the List<String> gets saved later, it is important to save it accordingly.
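An alternative that sidesteps the byte[] to String conversion entirely is to let the transformer write characters into a StringWriter, so the platform default charset never comes into play. This is a minimal sketch of that idea, not the method from the question; CharsetSafeLines and its methods are hypothetical names, and the non-ASCII letters are written as Unicode escapes so the source file encoding does not matter.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only.
class CharsetSafeLines {

    /** Serializes the document to characters (no bytes involved) and splits it into lines. */
    static List<String> toLines(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return Arrays.asList(out.toString().split("\r?\n"));
        } catch (TransformerException e) {
            throw new IllegalStateException("Cannot serialize document", e);
        }
    }

    /** Small parsing helper so the sketch is self-contained. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Cannot parse XML", e);
        }
    }
}
```

For example, CharsetSafeLines.toLines(CharsetSafeLines.parse("<a>Gr\u00F6\u00DFe</a>")) keeps the two non-ASCII letters intact regardless of the machine's default charset. The XML declaration still names an encoding, so the lines should be saved with that same charset.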

Apache Tika : How to use XPath queries

I am parsing an XML file using Apache Tika. I would like to extract certain tags with their content from the XML and store them in a HashMap. Right now, I can extract the entire content of the XML, but the tags are lost:
//detecting the file type
BodyContentHandler handler = new BodyContentHandler();
Metadata metadata = new Metadata();
FileInputStream inputstream = null;
try
{
    inputstream = new FileInputStream(new File(ParseXML.class.getClassLoader().getResource("xml/a.xml").toURI()));
}
catch (URISyntaxException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
ParseContext pcontext = new ParseContext();
//Xml parser
XMLParser xmlparser = new XMLParser();
xmlparser.parse(inputstream, handler, metadata, pcontext);
System.out.println("Contents of the document:" + handler.toString());
System.out.println("Metadata of the document:");
String[] metadataNames = metadata.names();
for(String name : metadataNames) {
    System.out.println(name + ": " + metadata.get(name));
}
This shows me the entire content of the XML.
Now I want to extract certain parts of the XML, and since Tika allows XPath queries, I tried this:
XPathParser xhtmlParser = new XPathParser("xhtml", XHTMLContentHandler.XHTML);
Matcher divContentMatcher = xhtmlParser.parse("/Product/Source/Publisher/PublisherName[@nameType='Person']");
ContentHandler xhandler = new MatchingContentHandler(
        new ToXMLContentHandler(), divContentMatcher);
AutoDetectParser parser = new AutoDetectParser();
Metadata xmetadata = new Metadata();
try (FileInputStream stream = new FileInputStream(new File(ParseXML.class.getClassLoader().getResource("xml/a.xml").toURI()))) {
    parser.parse(stream, xhandler, xmetadata);
    System.out.println(xhandler.toString());
} catch (URISyntaxException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
}
But it does not show any output! I was hoping it would only give me the nodes specified in the XPath query.
Any idea what's going on?
By the way, here is the corresponding XML:
<Product productID="xvc22" shortProductID="x" language="en">
<ProductStatus statusType="Published" />
<Source>
<Publisher sequence="1" primaryIndicator="Yes">
<PublisherID idType="Shortname">jjkjkj</PublisherID>
<PublisherID idType="BM">6666</PublisherID>
<PublisherName nameType="Legal">ABT</PublisherName>
<PublisherName nameType="Person">
<LastName>pppp</LastName>
<FirstName>lkkk</FirstName>
</PublisherName>
</Publisher>
</Source>
</Product>
Also, when I test the query on
http://www.freeformatter.com/xpath-tester.html
I see the correct result, i.e.
Element='<PublisherName nameType="Person">
<LastName>pppp</LastName>
<FirstName>lkkk</FirstName>
</PublisherName>'
Is this some syntax issue with Java or Tika?
EDIT
Note that if I parse without Tika, it works:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new File(ParseXML.class.getClassLoader().getResource("xml/a.xml").toURI()));
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("/Product/Source/Publisher/PublisherName[@nameType='Person']");
System.out.println(expr.evaluate(doc, XPathConstants.STRING));
This prints out
pppp
lkkk
which is perfect. So why can't Tika parse the XPath query?
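This does not explain the Tika behaviour, but on the plain JAXP variant from the EDIT: evaluating with XPathConstants.STRING returns only the text content, so the tags are still lost there. If the tags are needed, one possibility is to ask for XPathConstants.NODE and serialize that node. A minimal sketch using only JDK classes; XPathNodeToXml and its method are hypothetical names for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only.
class XPathNodeToXml {

    /** Returns the first node matching the expression as XML text, or null if nothing matches. */
    static String extract(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node node = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODE);
            if (node == null) {
                return null;
            }
            // An identity transform over the matched node serializes just that subtree
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter sw = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(sw));
            return sw.toString();
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException | TransformerException e) {
            throw new IllegalStateException("Cannot evaluate XPath", e);
        }
    }
}
```

The returned string could then be stored in the HashMap under whatever key identifies the query.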

Converting XML to document in java creates null document

I'm trying to parse xml, downloaded from the web, in java, following examples from here (stackoverflow) and other sources.
First I pack the xml in a string:
String xml = getXML(url, logger);
If I printout the xml string at this point:
System.out.println("XML " + xml);
I get a printout of the xml so I'm assuming there is no fault up to this point.
Then I try to create a document that I can evaluate:
InputSource is= new InputSource(new StringReader(xml));
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(is);
If I print out the document here:
System.out.println("Doc: " + doc);
I get:
Doc: [#document: null]
When I later try to evaluate expressions with Xpath I get java.lang.NullPointerException and also when just trying to get the length of the root:
System.out.println("Root length " + rootNode.getLength());
which leads me to believe the document (and later the node) is truly null.
When I try to print out the InputSource or the Node I get, e.g.,
Input Source: org.xml.sax.InputSource@29453f44
which I don't know how to interpret.
Can any one see what I've done wrong or suggest a way forward?
Thanks in advance.
You may need another way to render the document as a string.
For JDOM:
public static String toString(final Document document) {
    try {
        final ByteArrayOutputStream out = new ByteArrayOutputStream(1024);
        final XMLOutputter outp = new XMLOutputter();
        outp.output(document, out);
        final String string = out.toString("UTF-8");
        return string;
    }
    catch (final Exception e) {
        throw new IllegalStateException("Cannot stringify document.", e);
    }
}
The output
org.xml.sax.InputSource@29453f44
is simply the class name plus the hash code of the instance (as defined in the Object class). It indicates that the class of the instance does not override toString.
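Since the question uses org.w3c.dom rather than JDOM, a quick way to confirm that parsing worked is to look at the document element instead of printing the Document itself. For a Document node the DOM defines the node name as #document and the node value as null, so a "null" in the printout is expected and says nothing about the parse. A minimal sketch; DomSanityCheck and its methods are hypothetical names for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only.
class DomSanityCheck {

    /** Parses XML text the same way as in the question (namespace aware). */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Cannot parse XML", e);
        }
    }

    /** Summarizes what the parser actually produced. */
    static String describe(Document doc) {
        Element root = doc.getDocumentElement();
        return "nodeName=" + doc.getNodeName()
                + ", nodeValue=" + doc.getNodeValue()
                + ", root=" + root.getNodeName()
                + ", children=" + root.getChildNodes().getLength();
    }
}
```

If describe shows the expected root element, the document is fine and the NullPointerException most likely comes from a later step, for instance an XPath expression that matches nothing.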

Cannot create XML Document from String

I am trying to create an org.w3c.dom.Document from an XML string. I am using "How to convert string to xml file in java" as a basis. I am not getting an exception; the problem is that my document is always null. The XML is system generated and well formed. I wish to convert it to a Document object so that I can add new nodes etc.
public static org.w3c.dom.Document stringToXML(String xmlSource) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    InputStream input = IOUtils.toInputStream(xmlSource); //uses Apache commons to obtain InputStream
    BOMInputStream bomIn = new BOMInputStream(input); //create BOMInputStream from InputStream
    InputSource is = new InputSource(bomIn); // InputSource with BOM removed
    Document document = builder.parse(new InputSource(new StringReader(xmlSource)));
    Document document2 = builder.parse(is);
    System.out.println("Document=" + document.getDoctype()); // always null
    System.out.println("Document2=" + document2.getDoctype()); // always null
    return document;
}
I have tried these things: I created a BOMInputStream thinking that a BOM was causing the conversion to fail. I thought that this was my issue but passing the BOMInputStream to the InputSource doesn't make a difference. I have even tried passing a literal String of simple XML and nothing but null. The toString method returns [#document:null]
I am using Xpages, a JSF implementation that uses Java 6. Full name of Document class used to avoid confusion with Xpage related class of the same name.
Don't rely on what toString is telling you. It is providing diagnostic information that it thinks is useful about the current class, which is, in this case, nothing more than...
"["+getNodeName()+": "+getNodeValue()+"]";
Which isn't going to help you. Instead, you will need to try and transform the model back into a String, for example...
String text
        = "<fruit>"
        + "<banana>yellow</banana>"
        + "<orange>orange</orange>"
        + "<pear>yellow</pear>"
        + "</fruit>";
InputStream is = null;
try {
    is = new ByteArrayInputStream(text.getBytes());
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document document = builder.parse(is);
    System.out.println("Document=" + document.toString()); // prints [#document: null]
    Transformer tf = TransformerFactory.newInstance().newTransformer();
    tf.setOutputProperty(OutputKeys.INDENT, "yes");
    tf.setOutputProperty(OutputKeys.METHOD, "xml");
    tf.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
    ByteArrayOutputStream os = null;
    try {
        os = new ByteArrayOutputStream();
        DOMSource domSource = new DOMSource(document);
        StreamResult sr = new StreamResult(os);
        tf.transform(domSource, sr);
        System.out.println(new String(os.toByteArray()));
    } finally {
        try {
            os.close();
        } finally {
        }
    }
} catch (ParserConfigurationException | SAXException | IOException | TransformerConfigurationException exp) {
    exp.printStackTrace();
} catch (TransformerException exp) {
    exp.printStackTrace();
} finally {
    try {
        is.close();
    } catch (Exception e) {
    }
}
Which outputs...
Document=[#document: null]
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<fruit>
    <banana>yellow</banana>
    <orange>orange</orange>
    <pear>yellow</pear>
</fruit>
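A side note on the getDoctype() calls in the question: that method returns the document type declaration, and for XML without a <!DOCTYPE ...> declaration it returns null, so "always null" there does not mean parsing failed either. A minimal sketch showing the difference; DoctypeDemo and its methods are hypothetical names for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only.
class DoctypeDemo {

    /** Parses XML text into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Cannot parse XML", e);
        }
    }

    /** True only when the XML carries a DOCTYPE declaration. */
    static boolean hasDoctype(String xml) {
        return parse(xml).getDoctype() != null;
    }

    /** Name of the root element, which is present for any well-formed document. */
    static String rootName(String xml) {
        return parse(xml).getDocumentElement().getNodeName();
    }
}
```

For the fruit example, hasDoctype is false while rootName still returns fruit, so getDocumentElement() is the more useful check here.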
You can try using this: http://www.wissel.net/blog/downloads/SHWL-8MRM36/$File/SimpleXMLDoc.java
