Ignoring DTD when parsing XML - java

How can I ignore the DTD declaration when parsing file with XOM xml library. My file has the following line :
<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">
//rest of stuff here
And when I try to build() my document I get a filenotfound exception for the DTD file. I know I don't have this file and I don't care about it, so how can it be removed when using XOM?
Here is a code snippet:
public BlastXMLParser(String filePath) {
Builder b = new Builder(false);
//not a good idea to have exception-throwing code in constructor
try {
_document = b.build(filePath);
} catch (ParsingException ex) {
Logger.getLogger(BlastXMLParser.class.getName()).log(Level.SEVERE,"err", ex);
} catch (IOException ex) {
//
}
private Elements getBlastReads() {
Element root = _document.getRootElement();
Elements rootChildren = root.getChildElements();
for (int i = 0; i < rootChildren.size(); i++) {
Element child = rootChildren.get(i);
if (child.getLocalName().equals("BlastOutput_iterations")) {
return child.getChildElements();
}
}
return null;
}
}
I get a NullPointerException at this line:
Element root = _document.getRootElement();
With the DTD line removed from the source XML file I can successfully parse it, but this is not an option in the final production system.

The preferred solution would be to implement an EntityResolver that intercepts requests for the DTD and redirects these to an embedded copy. If you
don't have access to the DTD and
are absolutely sure you won't need it (apart from validation it might also declare character entities that are used in the document) and
you are using the Xerces XML Parser implementation
you can disable fetching of DTD by setting the corresponding SAX feature. In XOM this should be possible by passing an XMLReader to the Builder constructor like this:
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;
...
XMLReader xmlreader = XMLReaderFactory.createXMLReader();
xmlreader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
Builder builder = new Builder(xmlreader);

If not using XOM but simply JAXP the abovementioned solution just need to be tweaked into
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setAttribute("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(...);

According to their documentation this is the way to parse document without any validation.
try {
Builder parser = new Builder();
Document doc = parser.build("http://www.cafeconleche.org/");
}
catch (ParsingException ex) {
System.err.println("Cafe con Leche is malformed today. How embarrassing!");
}
catch (IOException ex) {
System.err.println("Could not connect to Cafe con Leche. The site may be down.");
}
If you do want to validate XML schema you have to call new Builder(true):
try {
Builder parser = new Builder(true);
Document doc = parser.build("http://www.cafeconleche.org/");
}
catch (ValidityException ex) {
System.err.println("Cafe con Leche is invalid today. (Somewhat embarrassing.)");
}
catch (ParsingException ex) {
System.err.println("Cafe con Leche is malformed today. (How embarrassing!)");
}
catch (IOException ex) {
System.err.println("Could not connect to Cafe con Leche. The site may be down.");
}
Pay attention that now yet another exception can be thrown: ValidityException

Related

How to do "builder.parse(xml file stored inside of this jar)"?

I am trying to use a file object, with the path of a XML file inside of the current jar file, which is running, in builder.parse(not the absolute path to the xml file);
DocumentBuilder builder = dbf.newDocumentBuilder();
Document doc = builder.parse("resources/userConfig.xml");
The code works in eclipse but doesn't in a exported jar file. When i run the exported jar it can't find the XML in C:\Users...
For your case you need the one but last method below, that takes an InputStream. The last method is added as an example for your case with the file in a jar in the classpath. You may want to do the exceptionhandling differently.
public class XMLLib {
public static DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
public static Document readXML(File file) {
try {
final DocumentBuilder builder = builderFactory.newDocumentBuilder();
return builder.parse(file);
} catch (ParserConfigurationException ex) {
return null;
} catch (SAXException | IOException ex) {
return null;
}
}
public static Document readXML(InputStream is) {
try {
final DocumentBuilder builder = builderFactory.newDocumentBuilder();
return builder.parse(is);
} catch (ParserConfigurationException ex) {
return null;
} catch (SAXException | IOException ex) {
return null;
}
}
//example call:
public static Document getDocumentResource(String resourcepath){
try (InputStream is = XMLLib.class.getResourceAsStream(resourcepath)){
return readXML(is);
} catch (IOException ex) {}
return null;
}
}
Some more explaination:
In the Eclipse environment the xml-file is extracted, so you can access it directly as a File(as it is a "File").
Now with everything in a jar the xml doesn't exist as a "File" anymore so you have to extract it from the jar file.
The classloader is able to do so via getResourceAsStream - just like it can read other resources and classes.
So in essence, what this is doing is loading the xml like java loads a class. For that to work, the resourcepath must be given with respect to the classpath. Only the InputStream version above will work(or some other aproach e.g. using Path and Filesystem)
Usualy if you have a resource in "resources/userConfig.xml" the path is simple "/userConfig.xml". (but that depends on how the project is assembled)
I can only guess(depends on how the project is assembled) in your case you need:
Document document = XMLLib.getDocumentResource("/userConfig.xml");
Important: the InputStream Version works with the file in the jar and with the file extracted(this is allways) - as long as it can be found by the classloader.

JAXB unmarshal: unexpected element

Background:
I am using JAXB to unmarshal XML into Java objects. Originally, I was using just JAXB to perform the unmarshal. Then a static analysis was performed on the code and a high criticality issue was raised for XML External Entity Injection. After a little research, I found a suggestion (https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Prevention_Cheat_Sheet#JAXB_Unmarshaller) to use a parser configured to prevent external entities from being parsed. An example of what to do was provided:
//Disable XXE
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setFeature("http://xml.org/sax/features/external-general-entities", false);
spf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
spf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
//Do unmarshall operation
Source xmlSource = new SAXSource(spf.newSAXParser().getXMLReader(), new InputSource(new StringReader(xml)));
JAXBContext jc = JAXBContext.newInstance(Object.class);
Unmarshaller um = jc.createUnmarshaller();
um.unmarshal(xmlSource);
I have not done this exactly as shown, but I believe I have done the same in effect:
XMLReader reader = getXMLReader();
if (reader == null) {
logger.warn("Unable to create XML reader");
return;
}
JAXBContext context = JAXBContext.newInstance(messageClass);
Unmarshaller unmarshaller = context.createUnmarshaller();
for (File file : files) {
try {
InputSource source = new InputSource(new FileReader(file));
Source xmlSource = new SAXSource(reader, source);
JAXBElement<? extends BaseType> object =
(JAXBElement<? extends BaseType>) unmarshaller.unmarshal(xmlSource);
messages.add(object.getValue());
} catch (FileNotFoundException e) {
logger.error("Exception", e);
}
}
...
private XMLReader getXMLReader() {
SAXParserFactory factory = SAXParserFactory.newInstance();
try {
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
} catch (SAXNotRecognizedException | SAXNotSupportedException
| ParserConfigurationException e) {
logger.error("Exception", e);
}
XMLReader reader = null;
try {
reader = factory.newSAXParser().getXMLReader();
} catch (SAXException | ParserConfigurationException e) {
logger.error("Exception", e);
}
return reader;
}
Problem:
After implementing the correction, I am now getting an unmarshal exception when the program attempts to read in XML:
javax.xml.bind.UnmarshalException: unexpected element (uri:"", local:"ns1:TypeXYZ"). Expected elements are <{protected namespace URI}TypeABC>,...<{protected namespace URI}TypeXYZ>,...
Before the above fix where I was just using JAXB to unmarshal, it was able to properly parse the provided XML with no problem.
I assume that the SAX parser expects the XML to provide extra information that's missing, or that it needs to be configured to ignore whatever it's complaining about. I tried a few other "features" (http://xml.org/sax/features/namespace-prefixes=true and http://xml.org/sax/features/validation=false), but that did not resolve the problem.
I have no control over the XML schema that defines the XML types, nor do I have control over how the corresponding Java classes are generated.
Any information to help me understand what's going on and that helps me resolve this problem, would be very much appreciated.
After a little experimentation I was able to resolve the error by setting the following features:
factory.setFeature("http://xml.org/sax/features/validation", false);
factory.setFeature("http://xml.org/sax/features/namespaces", true);
factory.setFeature("http://xml.org/sax/features/namespace-prefixes", true);

How to prevent XML Injection like XML Bomb and XXE attack

I am developing an android application with
android:minSdkVersion="14"
In this app in need to parse an xml.For that I am using a DOM parser like this
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = null;
Document doc = null;
try {
dBuilder = dbFactory.newDocumentBuilder();
} catch (ParserConfigurationException e) {
e.printStackTrace();
}
But when the code is checked for security I got two security issues on line
dBuilder = dbFactory.newDocumentBuilder();, which are
1.XML Entity Expansion Injection (XML Bomb)
2.XML External Entity Injection (XXE attack)
After some researching I added the line
dbFactory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
But now I am getting an exception when this line is executed
javax.xml.parsers.ParserConfigurationException: http://javax.xml.XMLConstants/feature/secure-processing
Can anybody help me?
Did you try the following snippet from OWASP page?
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException; // catching unsupported features
...
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
// This is the PRIMARY defense. If DTDs (doctypes) are disallowed, almost all XML entity attacks are prevented
// Xerces 2 only - http://xerces.apache.org/xerces2-j/features.html#disallow-doctype-decl
String FEATURE = "http://apache.org/xml/features/disallow-doctype-decl";
dbf.setFeature(FEATURE, true);
// If you can't completely disable DTDs, then at least do the following:
// Xerces 1 - http://xerces.apache.org/xerces-j/features.html#external-general-entities
// Xerces 2 - http://xerces.apache.org/xerces2-j/features.html#external-general-entities
FEATURE = "http://xml.org/sax/features/external-general-entities";
dbf.setFeature(FEATURE, false);
// Xerces 1 - http://xerces.apache.org/xerces-j/features.html#external-parameter-entities
// Xerces 2 - http://xerces.apache.org/xerces2-j/features.html#external-parameter-entities
FEATURE = "http://xml.org/sax/features/external-parameter-entities";
dbf.setFeature(FEATURE, false);
// and these as well, per Timothy Morgan's 2014 paper: "XML Schema, DTD, and Entity Attacks" (see reference below)
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);
// And, per Timothy Morgan: "If for some reason support for inline DOCTYPEs are a requirement, then
// ensure the entity settings are disabled (as shown above) and beware that SSRF attacks
// (http://cwe.mitre.org/data/definitions/918.html) and denial
// of service attacks (such as billion laughs or decompression bombs via "jar:") are a risk."
// remaining parser logic
...
catch (ParserConfigurationException e) {
// This should catch a failed setFeature feature
logger.info("ParserConfigurationException was thrown. The feature '" +
FEATURE +
"' is probably not supported by your XML processor.");
...
}
catch (SAXException e) {
// On Apache, this should be thrown when disallowing DOCTYPE
logger.warning("A DOCTYPE was passed into the XML document");
...
}
catch (IOException e) {
// XXE that points to a file that doesn't exist
logger.error("IOException occurred, XXE may still possible: " + e.getMessage());
...
}
String jaxbContext = "com.fnf.dfbatch.jaxb";
JAXBContext jc = null;
Unmarshaller u = null;
String FEATURE_GENERAL_ENTITIES = "http://xml.org/sax/features/external-general-entities";
String FEATURE_PARAMETER_ENTITIES = "http://xml.org/sax/features/external-parameter-entities";
try {
jc = JAXBContext.newInstance(jaxbContext);
u = jc.createUnmarshaller();
/*jobsDef = (BatchJobs) u.unmarshal(DfBatchDriver.class
.getClassLoader().getResourceAsStream(
DfJobManager.configFile));*/
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature(FEATURE_GENERAL_ENTITIES, false);
dbf.setFeature(FEATURE_PARAMETER_ENTITIES, false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(DfBatchDriver.class
.getClassLoader().getResourceAsStream(
DfJobManager.configFile));
jobsDef = (BatchJobs) u.unmarshal(document);

validation a xml with several xsd schema DOM java

After search on internet and in differnets forums, have not found my answer.
I have a XML file which is define by two XSD schema.
For write the XML file, there are two ways to write the XML file :
(I have to delete the "<" charactere to display the XML file)
First methode to write it :
?xml version="1.0" encoding="UTF-8" standalone="yes"?>
Policy xmlns="http://www.W3C.com/Policy/v3#" xmlns:ns2="http://www.W3C.com /PolicyExtension/v3#">
DigestAlg Algorithm="http://test"/>
Transforms>
Transform Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n20010315"></Transform>
/Transforms>
ns2:Validation>
ns2:ConditionID>1.0.1</ns2:ConditionID>
ns2:TConditionID>1.0.2</ns2:TConditionID>
/ns2:Validation>
/Policy>"
second methodes :
?xml version="1.0" encoding="UTF-8" standalone="yes"?>
Policy xmlns="http://www.W3C.com/Policy/v3#">
DigestAlg Algorithm="http://test"/>
Transforms>
Transform Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n20010315"></Transform>
/Transforms>
Validation xmlns:ns2="http://www.W3C.com/PolicyExtension/v3#">
ConditionID>1.0.1</ns2:ConditionID>
TConditionID>1.0.2</ns2:TConditionID>
/Validation>
/Policy>
For pasring my XML files, i use :
InputStream doc = new FileInputStream(myXMLFile);
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
List<Source> sourceListSchema = new ArrayList<Source>();
sourceListSchema.add(new StreamSource(SignaturePolicy.class.getResourceAsStream(MY_XSD_SCHEMA_1)));
sourceListSchema.add(new StreamSource(SignaturePolicy.class.getResourceAsStream(MY_XSD_SCHEMA_2)));
Schema schema;
try {
Source[] sourceTmp = new Source[1];
schema = sf.newSchema(sourceListSchema.toArray(sourceTmp));
} catch (SAXException e) {
LogMachine.logger.severe(
"SAXException : The schema can not be parse :"+e.getMessage());
}
dbf.setIgnoringElementContentWhitespace(true);
dbf.setNamespaceAware(true);
dbf.setIgnoringComments(true);
dbf.setSchema(schema);
DocumentBuilder db;
try {
db = dbf.newDocumentBuilder();
documentPolicy = db.parse(Doc);
} catch (ParserConfigurationException e) {
LogMachine.logger.severe(
"ParserConfigurationException : the file can not be parse by DOM :"+e.getMessage());
} catch (SAXException e) {
LogMachine.logger.severe(
"SAXException : the file can not be parse by DOM :"+e.getMessage());
} catch (IOException e) {
LogMachine.logger.severe(
"IOException : the file can not be open like a file :"+e.getMessage());
}
When I want to parse this documents with DOM, the first XML file display an error
Exception in thread "main" org.w3c.dom.ls.LSException: The prefix "ns2" for element "ns2:Validation" is not bound.
But the second XML file is well parse.
Someone can help me to parse the two documents ??
Thank you for you help

How to validate an XML file against an XSD file?

I'm generating some xml files that needs to conform to an xsd file that was given to me. How should I verify they conform?
The Java runtime library supports validation. Last time I checked this was the Apache Xerces parser under the covers. You should probably use a javax.xml.validation.Validator.
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;
import java.net.URL;
import org.xml.sax.SAXException;
//import java.io.File; // if you use File
import java.io.IOException;
...
URL schemaFile = new URL("http://host:port/filename.xsd");
// webapp example xsd:
// URL schemaFile = new URL("http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd");
// local file example:
// File schemaFile = new File("/location/to/localfile.xsd"); // etc.
Source xmlFile = new StreamSource(new File("web.xml"));
SchemaFactory schemaFactory = SchemaFactory
.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
try {
Schema schema = schemaFactory.newSchema(schemaFile);
Validator validator = schema.newValidator();
validator.validate(xmlFile);
System.out.println(xmlFile.getSystemId() + " is valid");
} catch (SAXException e) {
System.out.println(xmlFile.getSystemId() + " is NOT valid reason:" + e);
} catch (IOException e) {}
The schema factory constant is the string http://www.w3.org/2001/XMLSchema which defines XSDs. The above code validates a WAR deployment descriptor against the URL http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd but you could just as easily validate against a local file.
You should not use the DOMParser to validate a document (unless your goal is to create a document object model anyway). This will start creating DOM objects as it parses the document - wasteful if you aren't going to use them.
Here's how to do it using Xerces2. A tutorial for this, here (req. signup).
Original attribution: blatantly copied from here:
import org.apache.xerces.parsers.DOMParser;
import java.io.File;
import org.w3c.dom.Document;
public class SchemaTest {
public static void main (String args[]) {
File docFile = new File("memory.xml");
try {
DOMParser parser = new DOMParser();
parser.setFeature("http://xml.org/sax/features/validation", true);
parser.setProperty(
"http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation",
"memory.xsd");
ErrorChecker errors = new ErrorChecker();
parser.setErrorHandler(errors);
parser.parse("memory.xml");
} catch (Exception e) {
System.out.print("Problem parsing the file.");
}
}
}
We build our project using ant, so we can use the schemavalidate task to check our config files:
<schemavalidate>
<fileset dir="${configdir}" includes="**/*.xml" />
</schemavalidate>
Now naughty config files will fail our build!
http://ant.apache.org/manual/Tasks/schemavalidate.html
Since this is a popular question, I will point out that java can also validate against "referred to" xsd's, for instance if the .xml file itself specifies XSD's in the header, using xsi:schemaLocation or xsi:noNamespaceSchemaLocation (or xsi for particular namespaces) ex:
<document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://www.example.com/document.xsd">
...
or schemaLocation (always a list of namespace to xsd mappings)
<document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.example.com/my_namespace http://www.example.com/document.xsd">
...
The other answers work here as well, because the .xsd files "map" to the namespaces declared in the .xml file, because they declare a namespace, and if matches up with the namespace in the .xml file, you're good. But sometimes it's convenient to be able to have a custom resolver...
From the javadocs: "If you create a schema without specifying a URL, file, or source, then the Java language creates one that looks in the document being validated to find the schema it should use. For example:"
SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
Schema schema = factory.newSchema();
and this works for multiple namespaces, etc.
The problem with this approach is that the xmlsns:xsi is probably a network location, so it'll by default go out and hit the network with each and every validation, not always optimal.
Here's an example that validates an XML file against any XSD's it references (even if it has to pull them from the network):
public static void verifyValidatesInternalXsd(String filename) throws Exception {
InputStream xmlStream = new new FileInputStream(filename);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);
factory.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
"http://www.w3.org/2001/XMLSchema");
DocumentBuilder builder = factory.newDocumentBuilder();
builder.setErrorHandler(new RaiseOnErrorHandler());
builder.parse(new InputSource(xmlStream));
xmlStream.close();
}
public static class RaiseOnErrorHandler implements ErrorHandler {
public void warning(SAXParseException e) throws SAXException {
throw new RuntimeException(e);
}
public void error(SAXParseException e) throws SAXException {
throw new RuntimeException(e);
}
public void fatalError(SAXParseException e) throws SAXException {
throw new RuntimeException(e);
}
}
You can avoid pulling referenced XSD's from the network, even though the xml files reference url's, by specifying the xsd manually (see some other answers here) or by using an "XML catalog" style resolver. Spring apparently also can intercept the URL requests to serve local files for validations. Or you can set your own via setResourceResolver, ex:
Source xmlFile = new StreamSource(xmlFileLocation);
SchemaFactory schemaFactory = SchemaFactory
.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema();
Validator validator = schema.newValidator();
validator.setResourceResolver(new LSResourceResolver() {
#Override
public LSInput resolveResource(String type, String namespaceURI,
String publicId, String systemId, String baseURI) {
InputSource is = new InputSource(
getClass().getResourceAsStream(
"some_local_file_in_the_jar.xsd"));
// or lookup by URI, etc...
return new Input(is); // for class Input see
// https://stackoverflow.com/a/2342859/32453
}
});
validator.validate(xmlFile);
See also here for another tutorial.
I believe the default is to use DOM parsing, you can do something similar with SAX parser that is validating as well saxReader.setEntityResolver(your_resolver_here);
Using Java 7 you can follow the documentation provided in package description.
// create a SchemaFactory capable of understanding WXS schemas
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
// load a WXS schema, represented by a Schema instance
Source schemaFile = new StreamSource(new File("mySchema.xsd"));
Schema schema = factory.newSchema(schemaFile);
// create a Validator instance, which can be used to validate an instance document
Validator validator = schema.newValidator();
// validate the DOM tree
try {
validator.validate(new StreamSource(new File("instance.xml"));
} catch (SAXException e) {
// instance document is invalid!
}
If you have a Linux-Machine you could use the free command-line tool SAXCount. I found this very usefull.
SAXCount -f -s -n my.xml
It validates against dtd and xsd.
5s for a 50MB file.
In debian squeeze it is located in the package "libxerces-c-samples".
The definition of the dtd and xsd has to be in the xml! You can't config them separately.
With JAXB, you could use the code below:
#Test
public void testCheckXmlIsValidAgainstSchema() {
logger.info("Validating an XML file against the latest schema...");
MyValidationEventCollector vec = new MyValidationEventCollector();
validateXmlAgainstSchema(vec, inputXmlFileName, inputXmlSchemaName, inputXmlRootClass);
assertThat(vec.getValidationErrors().isEmpty(), is(expectedValidationResult));
}
private void validateXmlAgainstSchema(final MyValidationEventCollector vec, final String xmlFileName, final String xsdSchemaName, final Class<?> rootClass) {
try (InputStream xmlFileIs = Thread.currentThread().getContextClassLoader().getResourceAsStream(xmlFileName);) {
final JAXBContext jContext = JAXBContext.newInstance(rootClass);
// Unmarshal the data from InputStream
final Unmarshaller unmarshaller = jContext.createUnmarshaller();
final SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
final InputStream schemaAsStream = Thread.currentThread().getContextClassLoader().getResourceAsStream(xsdSchemaName);
unmarshaller.setSchema(sf.newSchema(new StreamSource(schemaAsStream)));
unmarshaller.setEventHandler(vec);
unmarshaller.unmarshal(new StreamSource(xmlFileIs), rootClass).getValue(); // The Document class is the root object in the XML file you want to validate
for (String validationError : vec.getValidationErrors()) {
logger.trace(validationError);
}
} catch (final Exception e) {
logger.error("The validation of the XML file " + xmlFileName + " failed: ", e);
}
}
class MyValidationEventCollector implements ValidationEventHandler {
private final List<String> validationErrors;
public MyValidationEventCollector() {
validationErrors = new ArrayList<>();
}
public List<String> getValidationErrors() {
return Collections.unmodifiableList(validationErrors);
}
#Override
public boolean handleEvent(final ValidationEvent event) {
String pattern = "line {0}, column {1}, error message {2}";
String errorMessage = MessageFormat.format(pattern, event.getLocator().getLineNumber(), event.getLocator().getColumnNumber(),
event.getMessage());
if (event.getSeverity() == ValidationEvent.FATAL_ERROR) {
validationErrors.add(errorMessage);
}
return true; // you collect the validation errors in a List and handle them later
}
}
One more answer: since you said you need to validate files you are generating (writing), you might want to validate content while you are writing, instead of first writing, then reading back for validation. You can probably do that with JDK API for Xml validation, if you use SAX-based writer: if so, just link in validator by calling 'Validator.validate(source, result)', where source comes from your writer, and result is where output needs to go.
Alternatively if you use Stax for writing content (or a library that uses or can use stax), Woodstox can also directly support validation when using XMLStreamWriter. Here's a blog entry showing how that is done:
If you are generating XML files programatically, you may want to look at the XMLBeans library. Using a command line tool, XMLBeans will automatically generate and package up a set of Java objects based on an XSD. You can then use these objects to build an XML document based on this schema.
It has built-in support for schema validation, and can convert Java objects to an XML document and vice-versa.
Castor and JAXB are other Java libraries that serve a similar purpose to XMLBeans.
Using Woodstox, configure the StAX parser to validate against your schema and parse the XML.
If exceptions are caught the XML is not valid, otherwise it is valid:
// create the XSD schema from your schema file
XMLValidationSchemaFactory schemaFactory = XMLValidationSchemaFactory.newInstance(XMLValidationSchema.SCHEMA_ID_W3C_SCHEMA);
XMLValidationSchema validationSchema = schemaFactory.createSchema(schemaInputStream);
// create the XML reader for your XML file
WstxInputFactory inputFactory = new WstxInputFactory();
XMLStreamReader2 xmlReader = (XMLStreamReader2) inputFactory.createXMLStreamReader(xmlInputStream);
try {
// configure the reader to validate against the schema
xmlReader.validateAgainst(validationSchema);
// parse the XML
while (xmlReader.hasNext()) {
xmlReader.next();
}
// no exceptions, the XML is valid
} catch (XMLStreamException e) {
// exceptions, the XML is not valid
} finally {
xmlReader.close();
}
Note: If you need to validate multiple files, you should try to reuse your XMLInputFactory and XMLValidationSchema in order to maximize the performance.
Are you looking for a tool or a library?
As far as libraries goes, pretty much the de-facto standard is Xerces2 which has both C++ and Java versions.
Be fore warned though, it is a heavy weight solution. But then again, validating XML against XSD files is a rather heavy weight problem.
As for a tool to do this for you, XMLFox seems to be a decent freeware solution, but not having used it personally I can't say for sure.
Validate against online schemas
Source xmlFile = new StreamSource(Thread.currentThread().getContextClassLoader().getResourceAsStream("your.xml"));
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = factory.newSchema(Thread.currentThread().getContextClassLoader().getResource("your.xsd"));
Validator validator = schema.newValidator();
validator.validate(xmlFile);
Validate against local schemas
Offline XML Validation with Java
I had to validate an XML against XSD just one time, so I tried XMLFox. I found it to be very confusing and weird. The help instructions didn't seem to match the interface.
I ended up using LiquidXML Studio 2008 (v6) which was much easier to use and more immediately familiar (the UI is very similar to Visual Basic 2008 Express, which I use frequently). The drawback: the validation capability is not in the free version, so I had to use the 30 day trial.

Categories

Resources