I have the following code which takes XML as input and produces a bunch of other files as output.
public void transformXml(InputStream inputFileStream, Path outputDir) {
try {
Resource resource = resourceLoader
.getResource("classpath:demo.xslt");
LOGGER.info("Creating output XMLs and Assessment Report in {}", outputDir);
final File outputFile = new File(outputDir.toString());
final Processor processor = getSaxonProcessor();
XsltCompiler compiler = processor.newXsltCompiler();
XsltExecutable stylesheet = compiler.compile(new StreamSource(resource.getFile()));
Xslt30Transformer transformer = stylesheet.load30();
Serializer out = processor.newSerializer(outputFile);
out.setOutputProperty(Serializer.Property.METHOD, "xml");
transformer.transform(new StreamSource(inputFileStream), out);
LOGGER.debug("Generated DTD XMLs and Assessment Report successfully in {}", outputDir);
} catch (SaxonApiException e) {
throw new XmlTransformationException("Error occured during transformation", e);
} catch (IOException e) {
throw new XmlTransformationException("Error occured during loading XSLT file", e);
}
}
private Processor getSaxonProcessor() {
final Configuration configuration = Configuration.newConfiguration();
configuration.disableLicensing();
Processor processor = new Processor(configuration);
return processor;
}
The XML input contains a DOCTYPE tag which resolves to a DTD that is not available to me. Hence why I am wanting to use a catalog to point it to a dummy DTD which is on my classpath.
I am struggling to find a way to this. Most examples that I find out there, are not using the s9api implementation. Any ideas?
Instead of
new StreamSource(inputFileStream)
you should instantiate a SAXSource, containing an XMLReader initialized to use the catalog resolver as its EntityResolver.
If you need to do the same thing for other source documents, such as those read using doc() or document(), you should supply a URIResolver which itself returns a SAXSource initialized in the same way.
There are other ways of doing it using Saxon configuration properties, but I think the above is the simplest.
I have some classes with JAXB annotations, I have created some instances and I need to validate them against my XSD files. I should be able to get the details of what is wrong when the objects are invalid.
So far I haven't had luck, I know about this class ValidationEventHandler but apperantly I can use it with the Unmarshaller class, the problem is that I have to validate the objects not the raw XML.
I have this code:
MyClass myObject = new MyClass();
JAXBContext jaxbContext = JAXBContext.newInstance("x.y.z");
JAXBSource jaxbSource = new JAXBSource(jaxbContext, myObject);
SchemaFactory factory = SchemaFactory
.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Source schemaFile = new StreamSource(getClass().getClassLoader()
.getResourceAsStream("mySchema.xsd"));
Schema schema = factory.newSchema(schemaFile);
Validator validator = schema.newValidator();
validator.validate(jaxbSource);
This code will work, it will validate the object and throw an exception with the message, something like this:
cvc-pattern-valid: Value '12345678901' is not facet-valid with respect
to pattern '\d{10}' for type 'id'.]
The problem is that I need specific details, with a string like that I would have to parse all the messages.
You can set an instance of ErrorHandler on the Validator to catch individual errors:
Validator validator = schema.newValidator();
validator.setErrorHandler(new MyErrorHandler());
validator.validate(source);
MyErrorHandler
Below is a sample implementation of the ErrorHandler interface. If you don't rethrow the exception the validation will continue.
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
public class MyErrorHandler implements ErrorHandler {
public void warning(SAXParseException exception) throws SAXException {
System.out.println("\nWARNING");
exception.printStackTrace();
}
public void error(SAXParseException exception) throws SAXException {
System.out.println("\nERROR");
exception.printStackTrace();
}
public void fatalError(SAXParseException exception) throws SAXException {
System.out.println("\nFATAL ERROR");
exception.printStackTrace();
}
}
For More Information
http://blog.bdoughan.com/2010/11/validate-jaxb-object-model-with-xml.html
I. If you validate a complex object hierarchy, you can create the Marshaller yourself and set its listener:
Marshaller marshaller = jaxbContext.createMarshaller();
marshaller.setListener(yourListener);
JAXBSource source = new JAXBSource(marshaller, object);
This listener will get notified with instances of your objects as it walks the hierarchy.
II. Add an ErrorHandler from the other answer. At least with Wildfly 15 the messages look like:
cvc-maxInclusive-valid: Value '360.953674' is not facet-valid with respect to maxInclusive '180.0' for type '#AnonType_longitudeGeographicalPosition'.'
cvc-type.3.1.3: The value '360.953674' of element 'longitude' is not valid.'
So you can parse out the element name, which is the guilty terminal field name.
III. Combine I and II with some introspection and you can reconstruct a full Java Beans style path to the erroneous field if necessary.
I have
an XML document,
base XSD file and
extended XSD file.
Both XSD files have one namespace.
File 3) includes file 2): <xs:include schemaLocation="someschema.xsd"></xs:include>
XML document (file 1) has following root tag:
<tagDefinedInSchema xmlns="http://myurl.com/myapp/myschema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://myurl.com/myapp/myschema schemaFile2.xsd">
where schemaFile2.xsd is the file 3 above.
I need to validate file 1 against both schemas, without
modifying the file itself and
merging two schemas in one file.
How can I do this in Java?
UPD: Here is the code I'm using.
SchemaFactory schemaFactory = SchemaFactory
.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
DocumentBuilderFactory documentFactory = DocumentBuilderFactory
.newInstance();
documentFactory.setNamespaceAware(namespaceAware);
DocumentBuilder builder = documentFactory.newDocumentBuilder();
Document document = builder.parse(new ByteArrayInputStream(xmlData
.getBytes("UTF-8")));
File schemaLocation = new File(schemaFileName);
Schema schema = schemaFactory.newSchema(schemaLocation);
Validator validator = schema.newValidator();
Source source = new DOMSource(document);
validator.validate(source);
UPD 2: This works for me:
public static void validate(final String xmlData,
final String schemaFileName, final boolean namespaceAware)
throws SAXException, IOException {
final SchemaFactory schemaFactory = SchemaFactory
.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
schemaFactory.setResourceResolver(new MySchemaResolver());
final Schema schema = schemaFactory.newSchema();
final Validator validator = schema.newValidator();
validator.setResourceResolver(schemaFactory.getResourceResolver());
final InputSource is = new InputSource(new ByteArrayInputStream(xmlData
.getBytes("UTF-8")));
validator.validate(new SAXSource(is), new SAXResult(new XMLReaderAdapter()));
}
class MySchemaResolver implements LSResourceResolver {
#Override
public LSInput resolveResource(final String type,
final String namespaceURI, final String publicId, String systemId,
final String baseURI) {
final LSInput input = new DOMInputImpl();
try {
if (systemId == null) {
systemId = SCHEMA1;
}
FileInputStream fis = new FileInputStream(
new File("path_to_schema_directory/" + systemId));
input.setByteStream(fis);
return input;
} catch (FileNotFoundException ex) {
LOGGER.error("File Not found", ex);
return null;
}
}
}
A bit of terminology: you have one schema here, which is built from two schema documents.
If you specify schemaFile2.xsd to the API when building the Schema, it should automatically pull in the other document via the xs:include. If you suspect that this isn't happening, you need to explain what the symptoms are that cause you to believe this.
It may seem a bit inefficient, but couldn't you validate against schema A, create a new validator using schema B and validate against that one too?
I want to validate an XML file against an XSD schema. The XML files root element does not have any namespace or xsi details. It has no attributes so just <root>.
I have tried the following code from http://www.ibm.com/developerworks/xml/library/x-javaxmlvalidapi.html with no luck as I receive
cvc-elt.1: Cannot find the declaration of element 'root'
SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
File schemaFile = new File("schema.xsd");
Schema xsdScheme = factory.newSchema(schemaFile);
Validator validator = xsdScheme.newValidator();
Source source = new StreamSource(xmlfile);
validator.validate(source);
The xml validates fine with the namespace headers included etc (added via xmlspy), but I would have thought the xml namespace could be declared without having to manually edit the source file?
Edit and Solution:
public static void validateAgainstXSD(File file) {
try {
SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
File schemaFile = new File("path/to/xsd");
Schema xsdScheme = factory.newSchema(schemaFile);
Validator validator = xsdScheme.newValidator();
SAXSource source = new SAXSource(
new NamespaceFilter(XMLReaderFactory.createXMLReader()),
new InputSource(new FileInputStream(file)));
validator.validate(source,null);
} catch (Exception e) {
e.printStackTrace();
}
}
protected static class NamespaceFilter extends XMLFilterImpl {
String requiredNamespace = "namespace";
public NamespaceFilter(XMLReader parent) {
super(parent);
}
#Override
public void startElement(String arg0, String arg1, String arg2, Attributes arg3) throws SAXException {
if(!arg0.equals(requiredNamespace))
arg0 = requiredNamespace;
super.startElement(arg0, arg1, arg2, arg3);
}
}
You have two separate concerns you need to take care of:
Declaring the namespace that your document uses.
Putting an xsi:schemaLocation attribute in the file to give a hint (!) where the schema is.
You can safely skip the second part, as the location is really only a hint. You cannot skip the first part. The namespace declared in the XML file is matched against the schema. Important, this:
<xml> ... </xml>
Is not the same as this:
<xml xmlns="urn:foo"> ... </xml>
So you need to declare your namespace in the XML document, otherwise it will not correspond to your schema and you will get this error.
I'm generating some xml files that needs to conform to an xsd file that was given to me. How should I verify they conform?
The Java runtime library supports validation. Last time I checked this was the Apache Xerces parser under the covers. You should probably use a javax.xml.validation.Validator.
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;
import java.net.URL;
import org.xml.sax.SAXException;
//import java.io.File; // if you use File
import java.io.IOException;
...
URL schemaFile = new URL("http://host:port/filename.xsd");
// webapp example xsd:
// URL schemaFile = new URL("http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd");
// local file example:
// File schemaFile = new File("/location/to/localfile.xsd"); // etc.
Source xmlFile = new StreamSource(new File("web.xml"));
SchemaFactory schemaFactory = SchemaFactory
.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
try {
Schema schema = schemaFactory.newSchema(schemaFile);
Validator validator = schema.newValidator();
validator.validate(xmlFile);
System.out.println(xmlFile.getSystemId() + " is valid");
} catch (SAXException e) {
System.out.println(xmlFile.getSystemId() + " is NOT valid reason:" + e);
} catch (IOException e) {}
The schema factory constant is the string http://www.w3.org/2001/XMLSchema which defines XSDs. The above code validates a WAR deployment descriptor against the URL http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd but you could just as easily validate against a local file.
You should not use the DOMParser to validate a document (unless your goal is to create a document object model anyway). This will start creating DOM objects as it parses the document - wasteful if you aren't going to use them.
Here's how to do it using Xerces2. A tutorial for this, here (req. signup).
Original attribution: blatantly copied from here:
import org.apache.xerces.parsers.DOMParser;
import java.io.File;
import org.w3c.dom.Document;
public class SchemaTest {
public static void main (String args[]) {
File docFile = new File("memory.xml");
try {
DOMParser parser = new DOMParser();
parser.setFeature("http://xml.org/sax/features/validation", true);
parser.setProperty(
"http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation",
"memory.xsd");
ErrorChecker errors = new ErrorChecker();
parser.setErrorHandler(errors);
parser.parse("memory.xml");
} catch (Exception e) {
System.out.print("Problem parsing the file.");
}
}
}
We build our project using ant, so we can use the schemavalidate task to check our config files:
<schemavalidate>
<fileset dir="${configdir}" includes="**/*.xml" />
</schemavalidate>
Now naughty config files will fail our build!
http://ant.apache.org/manual/Tasks/schemavalidate.html
Since this is a popular question, I will point out that java can also validate against "referred to" xsd's, for instance if the .xml file itself specifies XSD's in the header, using xsi:schemaLocation or xsi:noNamespaceSchemaLocation (or xsi for particular namespaces) ex:
<document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://www.example.com/document.xsd">
...
or schemaLocation (always a list of namespace to xsd mappings)
<document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.example.com/my_namespace http://www.example.com/document.xsd">
...
The other answers work here as well, because the .xsd files "map" to the namespaces declared in the .xml file, because they declare a namespace, and if matches up with the namespace in the .xml file, you're good. But sometimes it's convenient to be able to have a custom resolver...
From the javadocs: "If you create a schema without specifying a URL, file, or source, then the Java language creates one that looks in the document being validated to find the schema it should use. For example:"
SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
Schema schema = factory.newSchema();
and this works for multiple namespaces, etc.
The problem with this approach is that the xmlsns:xsi is probably a network location, so it'll by default go out and hit the network with each and every validation, not always optimal.
Here's an example that validates an XML file against any XSD's it references (even if it has to pull them from the network):
public static void verifyValidatesInternalXsd(String filename) throws Exception {
InputStream xmlStream = new new FileInputStream(filename);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);
factory.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
"http://www.w3.org/2001/XMLSchema");
DocumentBuilder builder = factory.newDocumentBuilder();
builder.setErrorHandler(new RaiseOnErrorHandler());
builder.parse(new InputSource(xmlStream));
xmlStream.close();
}
public static class RaiseOnErrorHandler implements ErrorHandler {
public void warning(SAXParseException e) throws SAXException {
throw new RuntimeException(e);
}
public void error(SAXParseException e) throws SAXException {
throw new RuntimeException(e);
}
public void fatalError(SAXParseException e) throws SAXException {
throw new RuntimeException(e);
}
}
You can avoid pulling referenced XSD's from the network, even though the xml files reference url's, by specifying the xsd manually (see some other answers here) or by using an "XML catalog" style resolver. Spring apparently also can intercept the URL requests to serve local files for validations. Or you can set your own via setResourceResolver, ex:
Source xmlFile = new StreamSource(xmlFileLocation);
SchemaFactory schemaFactory = SchemaFactory
.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema();
Validator validator = schema.newValidator();
validator.setResourceResolver(new LSResourceResolver() {
#Override
public LSInput resolveResource(String type, String namespaceURI,
String publicId, String systemId, String baseURI) {
InputSource is = new InputSource(
getClass().getResourceAsStream(
"some_local_file_in_the_jar.xsd"));
// or lookup by URI, etc...
return new Input(is); // for class Input see
// https://stackoverflow.com/a/2342859/32453
}
});
validator.validate(xmlFile);
See also here for another tutorial.
I believe the default is to use DOM parsing, you can do something similar with SAX parser that is validating as well saxReader.setEntityResolver(your_resolver_here);
Using Java 7 you can follow the documentation provided in package description.
// create a SchemaFactory capable of understanding WXS schemas
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
// load a WXS schema, represented by a Schema instance
Source schemaFile = new StreamSource(new File("mySchema.xsd"));
Schema schema = factory.newSchema(schemaFile);
// create a Validator instance, which can be used to validate an instance document
Validator validator = schema.newValidator();
// validate the DOM tree
try {
validator.validate(new StreamSource(new File("instance.xml"));
} catch (SAXException e) {
// instance document is invalid!
}
If you have a Linux-Machine you could use the free command-line tool SAXCount. I found this very usefull.
SAXCount -f -s -n my.xml
It validates against dtd and xsd.
5s for a 50MB file.
In debian squeeze it is located in the package "libxerces-c-samples".
The definition of the dtd and xsd has to be in the xml! You can't config them separately.
With JAXB, you could use the code below:
#Test
public void testCheckXmlIsValidAgainstSchema() {
logger.info("Validating an XML file against the latest schema...");
MyValidationEventCollector vec = new MyValidationEventCollector();
validateXmlAgainstSchema(vec, inputXmlFileName, inputXmlSchemaName, inputXmlRootClass);
assertThat(vec.getValidationErrors().isEmpty(), is(expectedValidationResult));
}
private void validateXmlAgainstSchema(final MyValidationEventCollector vec, final String xmlFileName, final String xsdSchemaName, final Class<?> rootClass) {
try (InputStream xmlFileIs = Thread.currentThread().getContextClassLoader().getResourceAsStream(xmlFileName);) {
final JAXBContext jContext = JAXBContext.newInstance(rootClass);
// Unmarshal the data from InputStream
final Unmarshaller unmarshaller = jContext.createUnmarshaller();
final SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
final InputStream schemaAsStream = Thread.currentThread().getContextClassLoader().getResourceAsStream(xsdSchemaName);
unmarshaller.setSchema(sf.newSchema(new StreamSource(schemaAsStream)));
unmarshaller.setEventHandler(vec);
unmarshaller.unmarshal(new StreamSource(xmlFileIs), rootClass).getValue(); // The Document class is the root object in the XML file you want to validate
for (String validationError : vec.getValidationErrors()) {
logger.trace(validationError);
}
} catch (final Exception e) {
logger.error("The validation of the XML file " + xmlFileName + " failed: ", e);
}
}
class MyValidationEventCollector implements ValidationEventHandler {
private final List<String> validationErrors;
public MyValidationEventCollector() {
validationErrors = new ArrayList<>();
}
public List<String> getValidationErrors() {
return Collections.unmodifiableList(validationErrors);
}
#Override
public boolean handleEvent(final ValidationEvent event) {
String pattern = "line {0}, column {1}, error message {2}";
String errorMessage = MessageFormat.format(pattern, event.getLocator().getLineNumber(), event.getLocator().getColumnNumber(),
event.getMessage());
if (event.getSeverity() == ValidationEvent.FATAL_ERROR) {
validationErrors.add(errorMessage);
}
return true; // you collect the validation errors in a List and handle them later
}
}
One more answer: since you said you need to validate files you are generating (writing), you might want to validate content while you are writing, instead of first writing, then reading back for validation. You can probably do that with JDK API for Xml validation, if you use SAX-based writer: if so, just link in validator by calling 'Validator.validate(source, result)', where source comes from your writer, and result is where output needs to go.
Alternatively if you use Stax for writing content (or a library that uses or can use stax), Woodstox can also directly support validation when using XMLStreamWriter. Here's a blog entry showing how that is done:
If you are generating XML files programatically, you may want to look at the XMLBeans library. Using a command line tool, XMLBeans will automatically generate and package up a set of Java objects based on an XSD. You can then use these objects to build an XML document based on this schema.
It has built-in support for schema validation, and can convert Java objects to an XML document and vice-versa.
Castor and JAXB are other Java libraries that serve a similar purpose to XMLBeans.
Using Woodstox, configure the StAX parser to validate against your schema and parse the XML.
If exceptions are caught the XML is not valid, otherwise it is valid:
// create the XSD schema from your schema file
XMLValidationSchemaFactory schemaFactory = XMLValidationSchemaFactory.newInstance(XMLValidationSchema.SCHEMA_ID_W3C_SCHEMA);
XMLValidationSchema validationSchema = schemaFactory.createSchema(schemaInputStream);
// create the XML reader for your XML file
WstxInputFactory inputFactory = new WstxInputFactory();
XMLStreamReader2 xmlReader = (XMLStreamReader2) inputFactory.createXMLStreamReader(xmlInputStream);
try {
// configure the reader to validate against the schema
xmlReader.validateAgainst(validationSchema);
// parse the XML
while (xmlReader.hasNext()) {
xmlReader.next();
}
// no exceptions, the XML is valid
} catch (XMLStreamException e) {
// exceptions, the XML is not valid
} finally {
xmlReader.close();
}
Note: If you need to validate multiple files, you should try to reuse your XMLInputFactory and XMLValidationSchema in order to maximize the performance.
Are you looking for a tool or a library?
As far as libraries goes, pretty much the de-facto standard is Xerces2 which has both C++ and Java versions.
Be fore warned though, it is a heavy weight solution. But then again, validating XML against XSD files is a rather heavy weight problem.
As for a tool to do this for you, XMLFox seems to be a decent freeware solution, but not having used it personally I can't say for sure.
Validate against online schemas
Source xmlFile = new StreamSource(Thread.currentThread().getContextClassLoader().getResourceAsStream("your.xml"));
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = factory.newSchema(Thread.currentThread().getContextClassLoader().getResource("your.xsd"));
Validator validator = schema.newValidator();
validator.validate(xmlFile);
Validate against local schemas
Offline XML Validation with Java
I had to validate an XML against XSD just one time, so I tried XMLFox. I found it to be very confusing and weird. The help instructions didn't seem to match the interface.
I ended up using LiquidXML Studio 2008 (v6) which was much easier to use and more immediately familiar (the UI is very similar to Visual Basic 2008 Express, which I use frequently). The drawback: the validation capability is not in the free version, so I had to use the 30 day trial.