How to validate a document using a grammar in Xerces

How to validate a document using a grammar in Xerces - java

I have following situation
- I create XML-documents (DocumentImpl) on the fly (using data). So the XML is never written to disc.
- I create XSD-schemas on the fly (using data-definitions), these also are never written to disc. The grammars are complex with assertions, so they need to be used as XMLSchema v1.1
- I store the grammars (SchemaGrammar) from the XSD-schemas in a hashmap, this is because the same grammars are often used more times.
Now my question,
I want to validate the documents against a grammar. I know which grammar to use. They are found by the according data-definition-name.
My problem is that I cannot find example code how to do this, because all the examples seem to work from streams or files, while I have the objects ready.

I think, it works like this:
`
XMLGrammarPoolImpl pool = new XMLGrammarPoolImpl();
pool.putGrammar(grammar);
XMLSchema11Factory factory = new XMLSchema11Factory();
Schema schema = factory.newSchema(pool);
Validator validator = schema.newValidator();
DOMSource source = new DOMSource(document);
validator.validate(source);
`

Related

Validating XSD itself

Could anyone please tell me how to validate an XSD file itself (not XML against XSD)? I have checked many forums and sites (including SO) and most of them refers some or the other online validator. But this is not a one-time check for us. Our application involves an XSL transformation using an XSD, so we need to determine whether the XSD to be used is itself in a valid format or not, as in, all the tags match, with a starting and a closing one. Certain tags aren't allowed as a child tag, etc. That's why we need a proper java code to achieve the same.
Any help would be highly appreciated.

You can validate an XSD file against the w3 XSD schema that can be found here.
Use the same validation techniques you validate any other XML file with an XSD file, only the source document would be your XSD file.

you can use xmllint for that:
xmllint --noout --dtdvalid http://www.w3.org/2001/XMLSchema.dtd my-schema.xsd

You can try javax.xml.validation package
SchemaFactory f = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema s = f.newSchema(new File("1.xsd"));
Schema.newSchema() API
Parses the specified File as a schema and returns it as a Schema

You can validate your XSD online here.
Just copy and paste your XSD here and click on validate Schema , it will give you the result.

Validate a XSD file

I want to validate an XSD file (not XML). The approach i am using is to treat the XSD as any other XML file and use this www.w3.org/2001/XMLSchema.xsd as the schema.
I am using the following code:
String schemaLang = "http://www.w3.org/2001/XMLSchema";
SchemaFactory factory = SchemaFactory.newInstance(schemaLang);
Schema schema = factory.newSchema(new StreamSource("C:\\Users\\aprasad\\Desktop\\XMLSchema.xsd"));
Validator validator = schema.newValidator();
validator.validate(new StreamSource("shiporder.xsd"));
But i am getting the following error:
Failed to read schema document 'XMLSchema.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
Not sure what the error is as the file path is correct.
Please tell me the correct approach to validate an XSD file.

You need to have two additional files right beside XMLSchema.xsd. These are:
XMLSchema.dtd
datatypes.dtd
XMLSchema.xsd references these two files.
Right beside, so if XMLSchema.xsd is located at C:/XMLSchema.xsd then you have to have C:/XMLSchema.dtd and C:/datatypes.dtd.
SchemaFactory instances use (see SchemaFactory.setResourceResolver(LSResourceResolver)) by default an internal class called XMLCatalogResolver which implements LSResourceResolver. The former (I assume) looks for referenced files beside the referer.
If you look really hard then the cause of your SAXParseException is a FileNotFoundException that says the the system couldn't find the XMLSchema.dtd file.
Other than this, your code is OK (and your schema too).

According to the javadoc for the StreamSource class, if you use the constructor method that takes a String, that string needs to be a valid URI. For example, if you are trying to reference a local file, you may need to prefix the path with file:/. Alternatively, you can pass a File object to the constructor:
Schema schema = factory.newSchema(new File(new StreamSource("C:\\Users\\aprasad\\Desktop\\XMLSchema.xsd")));
In summary, it would be beneficial in this case to do some simple testing to rule out problems caused by your program not finding the necessary files, for example
File schemaFile1 = new File("C:\\Users\\aprasad\\Desktop\\XMLSchema.xsd");
File schemaFile2 = new File("shiporder.xsd");
assert schemaFile1.exists();
assert schemaFile2.exists();

I wonder what you are trying to achieve? If factory.newSchema(X) throws no exception, then X must be a valid schema(*). That seems a much more straightforward thing to do than validating against the schema for schema documents.
(*) the reverse isn't necessarily true of course: X might be valid against the schema for schema documents, but be invalid for other reasons, such as violating a UPA constraint.

How to transform XML with XSL using Java

I am currently using the standard javax.xml.transform library to transform my XML to CSV using XSL. My XSL file is large - at around 950 lines. My XML files can be quite large also.
It was working fine in the prototype stage with a fraction of the XSL in place at around 50 lines or so. Now in the 'final system' when it performs the transform it comes up with the error Branch target offset too large for short.
private String transformXML() {
String formattedOutput = "";
try {
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer =
tFactory.newTransformer( new StreamSource( xslFilename ) );
StreamSource xmlSource = new StreamSource(new ByteArrayInputStream( xmlString.getBytes() ) );
ByteArrayOutputStream baos = new ByteArrayOutputStream();
transformer.transform( xmlSource, new StreamResult( baos ) );
formattedOutput = baos.toString();
} catch( Exception e ) {
e.printStackTrace();
}
return formattedOutput;
}
I came across a few postings on this error but not too sure what to do.
Am I doing anything wrong code wise?
Are there any alternative 3rd Party transformers available that could do this?
Thanks,
Andez

Try Saxon instead.
Your code would stay the same. All you would need to do is set javax.xml.transform.TransformerFactory to net.sf.saxon.TransformerFactoryImpl in the JVM's system properties.

Use saxon. offtop: if you use the same stylesheet to transform many XML files, you might want to consider templates (pre-compiled stylesheets):
javax.xml.transform.Templates style = tFactory.newTemplates(xslSource);
style.newTransformer().transform(...);

I came across a post on the net that mentioned apache XALAN. So I added the jars to my project. Everything has started working since even though I do not directly reference any XALAN classes in my code. As far as I can tell it still should use the jaxax.xml classes.
Not too sure what is happening there. But it is working.

As an alternative to Saxon, you can split up your large template into smaller templates.
Template definitions contained in XSLT stylesheets are compiled by SAP
JVM's XSLT compiler "Xalan" into Java methods for faster execution of
transformations. Java bytecode branch instructions contained in these
Java methods are limited to 32K offsets. Large template definitions
can now lead to very large Java methods, where the branch offset would
need to be larger than 32K. Therefore these stylesheets cannot be
compiled to Java methods and therefore cannot be used for
transformations.
Solution
Since each template definition of an XSLT stylesheet is compiled into
a separate Java method, using multiple smaller templates can be used
as solution. A very large template can be broken into multiple smaller
templates by using the "call-template" element.
It is described in-depth in this article Size limitation for XSLT stylesheets.
Sidenote: I would only recommend this as a last resort if saxon is not available, as this requires quite a few changes to your xsl file.

Is there a Java XML API that can parse a document without resolving character entities?

I have program that needs to parse XML that contains character entities. The program itself doesn't need to have them resolved, and the list of them is large and will change, so I want to avoid explicit support for these entities if I can.
Here's a simple example:
<?xml version="1.0" encoding="UTF-8"?>
<xml>Hello there &something;</xml>
Is there a Java XML API that can parse a document successfully without resolving (non-standard) character entities? Ideally it would translate them into a special event or object that could be handled specially, but I'd settle for an option that would silently suppress them.
Answer & Example:
Skaffman gave me the answer: use a StAX parser with IS_REPLACING_ENTITY_REFERENCES set to false.
Here's the code I whipped up to try it out:
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
inputFactory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
XMLEventReader reader = inputFactory.createXMLEventReader(
new FileInputStream("your file here"));
while (reader.hasNext()) {
XMLEvent event = reader.nextEvent();
if (event.isEntityReference()) {
EntityReference ref = (EntityReference) event;
System.out.println("Entity Reference: " + ref.getName());
}
}
For the above XML, it will print "Entity Reference: something".

The STaX API has support for the notion of not replacing character entity references, by way of the IS_REPLACING_ENTITY_REFERENCES property:
Requires the parser to replace
internal entity references with their
replacement text and report them as
characters
This can be set into an XmlInputFactory, which is then in turn used to construct an XmlEventReader or XmlStreamReader. However, the API is careful to say that this property is only intended to force the implementation to perform the replacement, rather than forcing it to not replace them. Still, it's got to be worth a try.

Works for me only when disabling support of external entities:
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
inputFactory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
inputFactory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

A SAX parse with an org.xml.sax.EntityResolver might suit your purpose. You could for sure suppress them, and you could probably find a way to leave them unresolved.
This tutorial seems the most relevant: it shows how to resolve entities into strings.

I am not a Java developer, but I "think" Java xml classes support a similar functionality to .net for accomplishing this. IN .net the xmlreadersettings class you set the ProhibitDtd property false and set the XmlResolver property to null. This will cause the parser to ignore externally referenced entities without throwing an exception when they are read. I just did a google search for "Java ignore enity" and got lots of hits, some of which appear to address this topic. I realize this is not a total answer to your question but it should point you in a useful direction.

How can I validate documents against Schematron schemas in Java?

As far as I can tell, JAXP by default supports W3C XML Schema and RelaxNG from Java 6.
I can see a few APIs, mostly experimental or incomplete, on the schematron.com links page.
Is there an approach on validating schematron in Java that's complete, efficient and can be used with the JAXP API?

Jing supports pre-ISO Schematron validation (note that Jing's implementation is based also on XSLT).
There are also XSLT implementations that can be very easily invoked from Java. You need to decide what version of Schematron you are interested in and then get the corresponding stylesheet - all of them should be available from schematron.com. The process is very simple simple, involving basically 2 steps:
apply the skeleton XSLT on your Schematron schema to get a new XSLT stylesheet that represents your Schematron schema in XSLT
apply the obtained XSLT on your instance document or documents to validate them
JAXP is just an API and it does not require support for Relax NG from an implementation. You need to check the specific implementation that you use to see if that supports Relax NG or not.

A pure Java Schematron implementation is located at https://github.com/phax/ph-schematron/
It brings support for both the XSLT approach and the pure Java approach.

You can check out SchematronAssert (disclosure: my code). It is meant primarily for unit testing, but you may use it for normal code too. It is implemented using XSLT.
Unit testing example:
ValidationOutput result = in(booksDocument)
.forEvery("book")
.check("author")
.validate();
assertThat(result).hasNoErrors();
Standalone validation example:
StreamSource schemaSource = new StreamSource(... your schematron schema ...);
StreamSource xmlSource = new StreamSource(... your xml document ... );
StreamResult output = ... here your SVRL will be saved ...
// validation
validator.validate(xmlSource, schemaSource, output);
Work with an object representation of SVRL:
ValidationOutput output = validator.validate(xmlSource, schemaSource);
// look at the output
output.getFailures() ...
output.getReports() ...

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

How to validate a document using a grammar in Xerces - java

Related

Validating XSD itself

Validate a XSD file

How to transform XML with XSL using Java

Is there a Java XML API that can parse a document without resolving character entities?

How can I validate documents against Schematron schemas in Java?

Categories

Resources