Validate an XSD file - java

I want to validate an XSD file (not XML). The approach I am using is to treat the XSD like any other XML file and use www.w3.org/2001/XMLSchema.xsd as the schema.
I am using the following code:
String schemaLang = "http://www.w3.org/2001/XMLSchema";
SchemaFactory factory = SchemaFactory.newInstance(schemaLang);
Schema schema = factory.newSchema(new StreamSource("C:\\Users\\aprasad\\Desktop\\XMLSchema.xsd"));
Validator validator = schema.newValidator();
validator.validate(new StreamSource("shiporder.xsd"));
But I am getting the following error:
Failed to read schema document 'XMLSchema.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
Not sure what the error is as the file path is correct.
Please tell me the correct approach to validate an XSD file.

You need to have two additional files right beside XMLSchema.xsd. These are:
XMLSchema.dtd
datatypes.dtd
XMLSchema.xsd references these two files.
"Right beside" means in the same directory: if XMLSchema.xsd is located at C:/XMLSchema.xsd, then you need C:/XMLSchema.dtd and C:/datatypes.dtd as well.
By default, SchemaFactory instances use an internal class called XMLCatalogResolver, which implements LSResourceResolver (see SchemaFactory.setResourceResolver(LSResourceResolver)). I assume this resolver looks for referenced files beside the referring document.
If you look really hard, you will see that the cause of your SAXParseException is a FileNotFoundException, which says that the system couldn't find the XMLSchema.dtd file.
Other than this, your code is OK (and your schema too).
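A quick way to confirm this before building the schema is to check that the two DTD files really sit next to XMLSchema.xsd. The following is only a sketch: the class name SchemaSiblingCheck and the method missingSiblings are made up for illustration, and the list of required files is simply the two named above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class SchemaSiblingCheck {
    // Files that XMLSchema.xsd references, according to the answer above.
    private static final String[] REQUIRED = {"XMLSchema.dtd", "datatypes.dtd"};

    /** Returns the names of the required files that are missing from the schema's directory. */
    static List<String> missingSiblings(File schemaFile) {
        // Resolve to an absolute path first so a bare file name still has a parent directory.
        File dir = schemaFile.getAbsoluteFile().getParentFile();
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!new File(dir, name).isFile()) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

An empty list means both files were found; anything else names the file the parser will probably fail on.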

According to the javadoc for the StreamSource class, if you use the constructor method that takes a String, that string needs to be a valid URI. For example, if you are trying to reference a local file, you may need to prefix the path with file:/. Alternatively, you can pass a File object to the constructor:
Schema schema = factory.newSchema(new StreamSource(new File("C:\\Users\\aprasad\\Desktop\\XMLSchema.xsd")));
In summary, it would be beneficial in this case to do some simple testing to rule out problems caused by your program not finding the necessary files, for example
File schemaFile1 = new File("C:\\Users\\aprasad\\Desktop\\XMLSchema.xsd");
File schemaFile2 = new File("shiporder.xsd");
assert schemaFile1.exists();
assert schemaFile2.exists();

I wonder what you are trying to achieve? If factory.newSchema(X) throws no exception, then X must be a valid schema(*). That seems a much more straightforward thing to do than validating against the schema for schema documents.
(*) the reverse isn't necessarily true of course: X might be valid against the schema for schema documents, but be invalid for other reasons, such as violating a UPA constraint.
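As a rough sketch of that idea, the check can be reduced to "does the factory compile it?". The class and method names here (XsdCompileCheck, compiles) are invented for the example, and the XSD is passed in as a string only to keep the snippet self-contained; with a file you would call factory.newSchema(File) instead.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class XsdCompileCheck {
    /** Returns true if the JAXP schema factory can compile the given XSD text. */
    static boolean compiles(String xsdText) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            factory.newSchema(new StreamSource(new StringReader(xsdText)));
            return true;
        } catch (SAXException e) {
            // Thrown for malformed XML as well as for schema-level errors.
            return false;
        }
    }
}
```

In real use you would probably log the exception message rather than swallow it, since it says what is wrong with the schema.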

Related

Is there any handy way to inspect whether an XML file contains invalid characters

I am writing a Java program that parses/unmarshals XML files to Java objects.
This program takes XML files that are generated by a third party and that I have no control over.
Upon receiving the files, the program checks whether they are in a valid format using their respective XSDs:
URL schemaFile = this.getClass().getClassLoader().getResource("xsd/some.xsd");
Source xmlFile = new StreamSource(new File("/path/to/xml"));
SchemaFactory schemaFactory = SchemaFactory.newInstance(W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema(schemaFile);
Validator validator = schema.newValidator();
validator.validate(xmlFile);
then starts parsing/unmarshalling them individually using JAXP.
The problem I am facing is that even after the validation above, I sometimes get the following error. (The validator above does not seem to check whether the XML contains invalid characters; it only compares the input with its XSD.)
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[xxx,xxx]
Is there any handy way to check whether an XML file contains invalid characters, either programmatically or with some tool?
I have extracted the portion(line 245) where the exception occurs using "sed -n '240,250p'".
sample.xml
Do you have a whitelist of allowed characters? Here's one pattern:
For each streamed character, if it is not whitelisted replace it with nothing.
Ask whether your file content after filtering is the same as before (diff pattern)
If the content in both files is not equal then the source file had invalid characters.
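The filter-and-compare pattern above could look roughly like this. The whitelist used here is an assumption: it is the Char production from the XML 1.0 specification, which may or may not be the set your files should be held to. The class and method names are invented for the example.

```java
final class XmlCharCheck {
    /** True if the code point is allowed by the XML 1.0 Char production (assumed whitelist). */
    static boolean isAllowed(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the text with every non-whitelisted character replaced by nothing. */
    static String stripDisallowed(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isAllowed(cp)) {
                out.appendCodePoint(cp);
            }
            // Advance by one or two chars depending on whether this was a surrogate pair.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    /** The "diff" step: the source had invalid characters if filtering changed it. */
    static boolean hasDisallowed(String text) {
        return !stripDisallowed(text).equals(text);
    }
}
```

This only checks individual characters. An XMLStreamException can have other causes (an unescaped & or a broken encoding declaration, for instance), so a clean result here does not guarantee that the file parses.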

Validating XSD itself

Could anyone please tell me how to validate an XSD file itself (not XML against an XSD)? I have checked many forums and sites (including SO), and most of them point to one online validator or another. But this is not a one-time check for us. Our application involves an XSL transformation using an XSD, so we need to determine whether the XSD to be used is itself valid: that all the tags match, each with an opening and a closing one, that certain tags don't appear as children where they aren't allowed, and so on. That's why we need proper Java code to do this.
Any help would be highly appreciated.
You can validate an XSD file against the w3 XSD schema that can be found here.
Use the same technique you would use to validate any other XML file against an XSD; only the source document is your XSD file.
You can use xmllint for that:
xmllint --noout --dtdvalid http://www.w3.org/2001/XMLSchema.dtd my-schema.xsd
You can try javax.xml.validation package
SchemaFactory f = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema s = f.newSchema(new File("1.xsd"));
SchemaFactory.newSchema(File) API:
Parses the specified File as a schema and returns it as a Schema
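If you want all the problems with their positions rather than just the first exception, one option is to install an ErrorHandler on the factory. This is a sketch, not a definitive implementation: XsdErrorCollector and collectProblems are names made up for the example, warnings are deliberately ignored, and how many errors get reported before the parser gives up depends on the JAXP implementation in use.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class XsdErrorCollector {
    /** Compiles the XSD and returns each reported error as "line:column message". */
    static List<String> collectProblems(Source xsd) {
        final List<String> problems = new ArrayList<>();
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        factory.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* warnings ignored in this sketch */ }
            @Override public void error(SAXParseException e) { problems.add(describe(e)); }
            @Override public void fatalError(SAXParseException e) { problems.add(describe(e)); }
        });
        try {
            factory.newSchema(xsd);
        } catch (SAXException e) {
            // The parser may still abort, e.g. after a fatal error; record it if nothing was reported.
            if (problems.isEmpty()) {
                problems.add(String.valueOf(e.getMessage()));
            }
        }
        return problems;
    }

    /** Convenience overload for XSD text held in memory. */
    static List<String> collectProblems(String xsdText) {
        return collectProblems(new StreamSource(new StringReader(xsdText)));
    }

    private static String describe(SAXParseException e) {
        return e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage();
    }
}
```

An empty list means the factory accepted the schema; for a file on disk, pass new StreamSource(new File("1.xsd")) so that relative imports can still be resolved.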
You can validate your XSD online here.
Just copy and paste your XSD there and click on Validate Schema; it will give you the result.

How do I validate an XML with a schema file that imports another schema file?

I want to validate XML files against this schema (that is inside the zip); it imports two other XSD files.
<import namespace="http://www.w3.org/2000/09/xmldsig#"
schemaLocation="xmldsig-core-schema.xsd"/>
<import namespace="http://www.w3.org/2001/04/xmlenc#"
schemaLocation="xenc-schema.xsd"/>
The 2 files are also available here:
http://www.forum-datenaustausch.ch/xmldsig-core-schema.zip
http://www.forum-datenaustausch.ch/xenc-schema.zip.
On validation, I get this error:
src-resolve: Cannot resolve the name 'xenc:EncryptedData' to a(n) 'element declaration' component.
My validation/unmarshalling code looks like this (using moxy as a JAXB provider):
jaxbContext = JAXBContext.newInstance(type.getRequestType().getPackage().getName());
Unmarshaller um = jaxbContext.createUnmarshaller();
SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = sf.newSchema(new StreamSource(this.getClass().getResourceAsStream("/xsd/" + type.getXsdName())));
um.setSchema(schema);
root = um.unmarshal(new StreamSource(new ByteArrayInputStream(xmlData)), type.getRequestType());
Before you ask what the type does: I wrote code that can import all types of invoices from http://www.forum-datenaustausch.ch/. But versions 4.3 and above use the two additional schema files. How can I validate the XML files?
Have a look at the accepted answer for this post. Based on the little I can see in your post, and from what I can remember, the problem with the way you're loading the XSD has to do with the fact that doing it your way, the factory doesn't get a base uri, hence it can't apply its smarts to resolve references to the other two XSDs. I have to assume that your other two XSDs are also packed as resources in the same jar, in the same directory.
I have to admit that I am intrigued by the error message, which seems to imply the schema it loaded is valid, so it might be an XML issue; if the above doesn't help you, then you should consider posting the offending XML as well...
UPDATE: As per my comments, it is working as described. See code snippet below:
SchemaFactory schemaFactory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
Schema compiledSchema = schemaFactory.newSchema(new SOTests().getClass()
.getClassLoader().getResource("generalInvoiceRequest_430.xsd"));
Validator validator = compiledSchema.newValidator();
try {
validator.validate(new StreamSource("D:\\...\\dentist_ersred_TG_430.xml"));
System.out.println("Valid.");
}
catch (SAXException ex) {
System.out.print("Not valid because..." + ex.getMessage());
}
You don't need to load more than the first XSD, and you shouldn't, since all the imports provide hints to the schema location.
The XSD files are all in the same directory in the generated jar, and they must be, since the relative URIs used for imports indicate same parent.
The program output with one of the files downloaded from here:
Valid.
And after introducing invalid content:
Not valid because...cvc-complex-type.2.4.a: Invalid content was found starting with element 'wrong'. One of '{"http://www.forum-datenaustausch.ch/invoice":processing}' is expected.
My recommendations to whoever reads this:
Use getResource() to get a URL; as I've said above, a StreamSource built from an InputStream doesn't have a base URI, hence relative references in your xsd:import schemaLocation attributes cannot be dereferenced.
Let the schema factory do the loading for you when schema locations are explicitly provided.
Make sure that, when provided, relative references match the folder structure in your jar file.
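To make the base-URI point concrete, here is a small self-contained sketch. Everything in it is hypothetical (the class name, the two tiny schemas, the urn:example namespaces); it writes main.xsd and types.xsd into one directory and compiles main.xsd through a URL, so that the relative schemaLocation can be resolved against it.

```java
import java.io.IOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class ImportResolutionDemo {
    // Hypothetical schema that is imported by relative path.
    static final String TYPES_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='urn:example:types'>"
      + "<xs:element name='code' type='xs:string'/>"
      + "</xs:schema>";

    // Hypothetical main schema; its import only works if types.xsd can be located.
    static final String MAIN_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " xmlns:t='urn:example:types' targetNamespace='urn:example:main'>"
      + "<xs:import namespace='urn:example:types' schemaLocation='types.xsd'/>"
      + "<xs:element name='root'><xs:complexType><xs:sequence>"
      + "<xs:element ref='t:code'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Writes both schemas into dir and compiles main.xsd from its URL. */
    static Schema loadFromUrl(Path dir) throws IOException, SAXException {
        Files.write(dir.resolve("types.xsd"), TYPES_XSD.getBytes(StandardCharsets.UTF_8));
        Path main = dir.resolve("main.xsd");
        Files.write(main, MAIN_XSD.getBytes(StandardCharsets.UTF_8));

        // The URL gives the factory a base URI, so 'types.xsd' resolves to the sibling file.
        URL mainUrl = main.toUri().toURL();
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(mainUrl);
    }
}
```

If you must load from an InputStream, calling setSystemId(...) on the StreamSource with the schema's URL should serve the same purpose, since that is where the base URI comes from.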
Similar problems here. I tried to import the GeneralInvoice schema into integration software.
The parser didn't like the DOCTYPE tag at the start of the imported schemas. Solution: deleted the DOCTYPE section.
The parser didn't like anyAttribute, anyURI, etc. Solution: changed anyURI to string and commented out the other stuff.
I had to import the xenc and xmldsig XSDs first and then import GeneralInvoice in a second step.
Hope this helps

How to validate a document using a grammar in Xerces

I have following situation
- I create XML-documents (DocumentImpl) on the fly (using data). So the XML is never written to disc.
- I create XSD schemas on the fly (using data definitions); these are also never written to disc. The grammars are complex, with assertions, so they need to be processed as XML Schema 1.1.
- I store the grammars (SchemaGrammar) from the XSD schemas in a hashmap, because the same grammars are often used multiple times.
Now my question:
I want to validate the documents against a grammar. I know which grammar to use; it is looked up by the corresponding data-definition name.
My problem is that I cannot find example code how to do this, because all the examples seem to work from streams or files, while I have the objects ready.
I think, it works like this:
XMLGrammarPoolImpl pool = new XMLGrammarPoolImpl();
pool.putGrammar(grammar);
XMLSchema11Factory factory = new XMLSchema11Factory();
Schema schema = factory.newSchema(pool);
Validator validator = schema.newValidator();
DOMSource source = new DOMSource(document);
validator.validate(source);

Creating XML Schema from URL works but from Local File fails?

I need to validate XML Schema Instance (XSD) documents which are programmatically generated so I'm using the following Java snippet, which works fine:
SchemaFactory factory = SchemaFactory.newInstance(
XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema xsdSchema = factory.newSchema( // Reads URL every time...
new URL("http://www.w3.org/2001/XMLSchema.xsd"));
Validator xsdValidator = xsdSchema.newValidator();
xsdValidator.validate(new StreamSource(schemaInstanceStream));
However, when I save the XML Schema definition file locally and refer to it this way:
Schema schema = factory.newSchema(
new File("test/xsd/XMLSchema.xsd"));
It fails with the following exception:
org.xml.sax.SAXParseException: schema_reference.4: Failed to read schema document 'file:/Users/foo/bar/test/xsd/XMLSchema.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
I've ensured that the file exists and is readable by doing exists() and canRead() assertions on the File object. I've also downloaded the file with a couple different utilities (web browser, wget) to ensure that there is no corruption.
Any idea why I can validate XSD instance documents when I generate the schema from the HTTP URL but I get the above exception when trying to generate from a local file with the same contents?
[Edit]
To elaborate, I've tried multiple forms of factory.newSchema(...) using Readers and InputStreams (instead of the File directly) and still get exactly the same error. Moreover, I've dumped the file contents before using it or the various input streams to ensure it's the right one. Quite vexing.
Full Answer
It turns out that XMLSchema.xsd references three additional files, which must also be stored locally, and that it contains an import statement whose schemaLocation attribute must be changed. Here are the files that must be saved in the same directory:
XMLSchema.xsd - change schemaLocation to "xml.xsd" in the "import" element for XML Namespace.
XMLSchema.dtd - as is.
datatypes.dtd - as is.
xml.xsd - as is.
Thanks to @Blaise Doughan and @Tomasz Nurkiewicz for their hints.
I assume you are trying to load XMLSchema.xsd. Please also download XMLSchema.dtd and datatypes.dtd and put them in the same directory. This should push you a little bit further.
UPDATE
Is XMLSchema.xsd importing any other schemas by relative paths that are not on the local file system?
Your relative path may not be correct with respect to your working directory. Try entering a fully qualified path to eliminate the possibility that the file cannot be found.
org.xml.sax.SAXParseException: schema_reference.4: Failed to read schema document 'file:/Users/foo/bar/test/xsd/XMLSchema.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
