I have a xml file and I want to convert it into a java object using JAXB. I am getting an exception related to validation. It seems JAXB is validating it against the DTD which is declared in the xml file. Unfortunately the DTD is not at the location which is mentioned in the xml file. So I kept a local copy and used an EntityResolver to make JAXB use the local DTD. And the code worked like a charm.
Below is the code
JAXBContext context = JAXBContext.newInstance(Student.class);
Unmarshaller unmarshaller = context.createUnmarshaller();
final SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
final XMLReader reader = saxParserFactory.newSAXParser().getXMLReader();
reader.setEntityResolver(new EntityResolver() {
#Override
public InputSource resolveEntity(final String publicId, final String systemId) throws SAXException, IOException {
return new InputSource(getClass().getResourceAsStream("/student.dtd"));
}
});
final SAXSource saxSource = new SAXSource(reader, new InputSource(inputStream));
student = (Student) unmarshaller.unmarshal(saxSource);
This code is inside my method parse(xmlFilePath). Every time I call parse method a new EntityResolver is created. Is this not redundant? Can I create one EntityResolver in the class's constructor and pass it to the setEntityResolver method?
public StudentParser() {
this.entityResolver = new EntityResolver() {
#Override
public InputSource resolveEntity(final String publicId, final String systemId) throws SAXException, IOException {
return new InputSource(getClass().getResourceAsStream("/student.dtd"));
}
}
}
Inside parse method
public Student parse(filePath) {
...
...
reader.setEntityResolver(this.entityResolver);
...
}
First of all, check these answers:
How to disable DTD fetching using JAXB2.0
Make DocumentBuilder.parse ignore DTD references
How to read well formed XML in Java, but skip the schema?
How do I turn off validation when parsing well-formed XML using DocumentBuilder.parse?
How to ignore inline DTD when parsing XML file in Java
In short, you can disable the DTD processing completely.
Next, as long as your entity resolver is thread-safe and reusable you definitely don't need to create new instances every time.
We are trying to track down a bug. We get the above error in the logs.
Can anyone explain what this message means? Are there any typical reasons for getting this message?
The stacktrace is:
org.apache.axiom.om.OMException: java.lang.IllegalArgumentException: local part cannot be "null" when creating a QName
at org.apache.axiom.om.impl.builder.StAXOMBuilder.next(StAXOMBuilder.java:206)
at org.apache.axiom.om.impl.llom.OMNodeImpl.build(OMNodeImpl.java:318)
at org.apache.axiom.om.impl.llom.OMElementImpl.build(OMElementImpl.java:618)
at org.apache.axis2.jaxws.message.util.impl.SAAJConverterImpl.toOM(SAAJConverterImpl.java:147)
at org.apache.axis2.jaxws.message.impl.XMLPartImpl._convertSE2OM(XMLPartImpl.java:77)
at org.apache.axis2.jaxws.message.impl.XMLPartBase.getContentAsOMElement(XMLPartBase.java:203)
at org.apache.axis2.jaxws.message.impl.XMLPartBase.getAsOMElement(XMLPartBase.java:255)
at org.apache.axis2.jaxws.message.impl.MessageImpl.getAsOMElement(MessageImpl.java:464)
at org.apache.axis2.jaxws.message.util.MessageUtils.putMessageOnMessageContext(MessageUtils.java:202)
at org.apache.axis2.jaxws.core.controller.AxisInvocationController.prepareRequest(AxisInvocationController.java:370)
at org.apache.axis2.jaxws.core.controller.InvocationController.invoke(InvocationController.java:120)
at org.apache.axis2.jaxws.client.proxy.JAXWSProxyHandler.invokeSEIMethod(JAXWSProxyHandler.java:317)
at org.apache.axis2.jaxws.client.proxy.JAXWSProxyHandler.invoke(JAXWSProxyHandler.java:148)
I got the same error message (local part cannot be "null" when creating a QName) while trying to construct a org.w3c.dom.Document from String. The problem went away after calling setNamespaceAware(true) on DocumentBuilderFactory. Working code snippet is given below.
private static Document getDocumentFromString(final String xmlContent)
throws Exception
{
DocumentBuilderFactory documentBuilderFactory =
DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
try
{
return documentBuilderFactory
.newDocumentBuilder()
.parse(new InputSource(new StringReader(xmlContent)));
}
catch (Exception e)
{
throw new RuntimeException(e);
}
}
It means you are creating a DOM element or attribute using one of the namespace methods like createElementNS thus
document.createElementNS(namespace, null)
or createElementNS or setAttrbuteNS
and the second argument, the qname is null, or includes a prefix but no local part as in "foo:".
EDIT:
I would try to run the XML it's parsing through a validator. It's likely there's some tag or attribute name like foo: or foo:bar:baz that is a valid XML identifier but invalid according to the additional restrictions introduced by XML namespaces.
After whole hours in searching, I just wanted to share the Answer in this thread helped me with a Talend code migration - involving SOAP messages - from java8 to java11.
// Used libraries
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.soap.SOAPBody;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
...
// This is the node I want to replace: "<DataArea><Contact/></DataArea>"
// <SOAP-ENV:Body> > SOAP Action (e.g. <ns:GetContact>) > <GetContactRequest> > <DataArea>
SOAPBody soapBodyRequest = objSOAPMessage.getSOAPPart().getEnvelope().getBody();
Node nodeDataArea = soapBodyRequest.getFirstChild().getFirstChild().getFirstChild();
// Build a valid Node object starting from a string e.g. "<Contact> etc etc nested-etc </Contact>"
DocumentBuilderFactory objDocumentBuilderFactory = DocumentBuilderFactory.newInstance();
// As per java11, this is essential. It forces the Factory to consider the ':' as a namespace separator rather than part of a tag.
objDocumentBuilderFactory.setNamespaceAware(true);
// Create the node to replace "<DataArea><Contact/></DataArea>" with "<DataArea><Contact>content and nested tags</Contact></DataArea>"
Node nodeParsedFromString = objDocumentBuilderFactory.newDocumentBuilder().parse(new ByteArrayInputStream(strDataArea.getBytes())).getDocumentElement();
// Import the newly parsed Node object in the request envelop being built and replace the existing node.
nodeDataArea.replaceChild(
/*newChild*/nodeDataArea.getOwnerDocument().importNode(nodeParsedFromString, true),
/*oldChild*/nodeDataArea.getFirstChild()
);
This throws the Local part cannot be "null" when creating a QName exception if you don't put the .setNamespaceAware(true) instruction
Although this is an old thread, I hope this answer would help some one else searching this error.
I have faced the same error when I was trying to build a web app using maven-enunciate-cxf-plugin:1.28.
For me this was caused after I added a checked exception to my web service signature:
#WebMethod(operationName = "enquirySth")
public IBANResponse enquirySth(String input) throws I
InvalidInputException { ...
I have used JAX-WS Spec for exception throwing but no success. At the end I have found this issue in Enunciate issue tracker system which indicates this issue is resolved in current version, but I think it still exist.
Finally I have done following workaround to fix my problem:
adding #XmlRootElement to my FaultBean.
#XmlRootElement
#XmlAccessorType(XmlAccessType.FIELD)
#XmlType(name = "FaultBean",propOrder = {"errorDescription", "errorCode"})
public class FaultBean implements Serializable {
#XmlElement(required = true, nillable = true)
protected String errorDescription;
#XmlElement(required = true, nillable = true)
protected String errorCode;
public FaultBean() {
}
public String getErrorDescription() {
return this.errorDescription;
}
public void setErrorDescription(String var1) {
this.errorDescription = var1;
}
public String getErrorCode() {
return this.errorCode;
}
public void setErrorCode(String var1) {
this.errorCode = var1;
}
}
That's it. Hope it helps.
I'm trying to parse a simple fragment of HTML with NekoHTML :
<h1>This is a basic test</h1>
To do so, I've set a specific Neko feature not to have any HTML, HEAD or BODY tag calling startElement(..) callback.
Unfortunatly, it doesn't work for me.. I certainly missed something but can't figured out what it would be.
Here is a very simple code to reproduce my problem :
public static class MyContentHandler implements ContentHandler {
public void characters(char[] ch, int start, int length) throws SAXException {
String text = String.valueOf(ch, start, length);
System.out.println(text);
}
public void startElement(String nameSpaceURI, String localName, String rawName, Attributes attributes) throws SAXException {
System.out.println(rawName);
}
public void endElement(String nameSpaceURI, String localName, String rawName) throws SAXException {
System.out.println("end " + localName);
}
}
And the main() to launch a test :
public static void main(String[] args) throws SAXException, IOException {
SAXParser saxReader = new SAXParser();
// set the feature like explained in documentation : http://nekohtml.sourceforge.net/faq.html#fragments
saxReader.setFeature("http://cyberneko.org/html/features/balance-tags/document-fragment", true);
saxReader.setContentHandler(new MyContentHandler());
saxReader.parse(new InputSource(new StringInputStream("<h1>This is a basic test</h1>")));
}
The corresponding output :
HTML
HEAD
end HEAD
BODY
H1
This is a basic test
end H1
end BODY
end HTML
whereas I was expecting
H1
This is a basic test
end H1
Any idea ?
I finally got it !
Actually, I was parsing my HTML string in a GWT application, where I've added the gwt-dev.jar dependency. This jar packages a lot of external librairies, like the xercesImpl. But the version of embedded xerces classes does not match the one requiered by NeokHTML.
As a (strange) result, it appears that NeokHTML SAX parser didn't use any custom feature when using gwt-dev embedded xerces version.
So, I had to rework some code to remove the gwt-dev dependency, which by the way is not recommanded to be added to any standard GWT project.
I'm using the Apache web service xml rpc library to make requests to an rpc service. Somewhere in that process is a xml document with a DTD reference to http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd, which the library attempts to download when parsing the XML. That download fails with a 503 status code because the w3c is blocking repeated downloads of this largely static document from Java clients.
The solution is XML Catalogs to locally cache the DTD. However, while I can find examples of setting an EntityHandler on a JAXP SAXParser instance directly to enable catalog parser support, I don't actually have access to the underlying parser here. It's just being used by the xml rpc library. Is there any way I can set a global property or something that will tell JAXP to use XML catalogs?
I think you want the system property xml.catalog.files.
Take a look at http://xml.apache.org/commons/components/resolver/resolver-article.html
BTW, this was the third hit on a Google search for jaxp catalog
Unfortunately, setting xml.catalog.files does NOT have any effect on the parser factory. Ideally it should, of course, but the only way to use a resolver is to somehow add a method that delegates resolution to the catalog resolver in the handler that the SAX parser uses.
If you are already using a SAX parser, that's pretty easy:
final CatalogResolver catalogResolver = new CatalogResolver();
DefaultHandler handler = new DefaultHandler() {
public InputSource resolveEntity (String publicId, String systemId) {
return catalogResolver.resolveEntity(publicId, systemId);
}
public void startElement(String namespaceURI, String lname, String qname,
Attributes attrs) {
// the stuff you'd normally do
}
...
};
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
SAXParser saxParser = factory.newSAXParser();
String url = args.length == 0 ? "http://horstmann.com/index.html" : args[0];
saxParser.parse(new URL(url).openStream(), handler);
Otherwise, you'll need to figure out if you can supply your own entity resolver. With a javax.xml.parsers.DocumentBuilder, you can. With the scala.xml.XML object, you can't but you can use subterfuge:
val res = new com.sun.org.apache.xml.internal.resolver.tools.CatalogResolver
val loader = new factory.XMLLoader[Elem] {
override def adapter = new parsing.NoBindingFactoryAdapter() {
override def resolveEntity(publicId: String, systemId: String) = {
res.resolveEntity(publicId, systemId)
}
}
}
val doc = loader.load(new URL("http://horstmann.com/index.html"))enter code here
I'm generating some xml files that needs to conform to an xsd file that was given to me. How should I verify they conform?
The Java runtime library supports validation. Last time I checked this was the Apache Xerces parser under the covers. You should probably use a javax.xml.validation.Validator.
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;
import java.net.URL;
import org.xml.sax.SAXException;
//import java.io.File; // if you use File
import java.io.IOException;
...
URL schemaFile = new URL("http://host:port/filename.xsd");
// webapp example xsd:
// URL schemaFile = new URL("http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd");
// local file example:
// File schemaFile = new File("/location/to/localfile.xsd"); // etc.
Source xmlFile = new StreamSource(new File("web.xml"));
SchemaFactory schemaFactory = SchemaFactory
.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
try {
Schema schema = schemaFactory.newSchema(schemaFile);
Validator validator = schema.newValidator();
validator.validate(xmlFile);
System.out.println(xmlFile.getSystemId() + " is valid");
} catch (SAXException e) {
System.out.println(xmlFile.getSystemId() + " is NOT valid reason:" + e);
} catch (IOException e) {}
The schema factory constant is the string http://www.w3.org/2001/XMLSchema which defines XSDs. The above code validates a WAR deployment descriptor against the URL http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd but you could just as easily validate against a local file.
You should not use the DOMParser to validate a document (unless your goal is to create a document object model anyway). This will start creating DOM objects as it parses the document - wasteful if you aren't going to use them.
Here's how to do it using Xerces2. A tutorial for this, here (req. signup).
Original attribution: blatantly copied from here:
import org.apache.xerces.parsers.DOMParser;
import java.io.File;
import org.w3c.dom.Document;
public class SchemaTest {
public static void main (String args[]) {
File docFile = new File("memory.xml");
try {
DOMParser parser = new DOMParser();
parser.setFeature("http://xml.org/sax/features/validation", true);
parser.setProperty(
"http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation",
"memory.xsd");
ErrorChecker errors = new ErrorChecker();
parser.setErrorHandler(errors);
parser.parse("memory.xml");
} catch (Exception e) {
System.out.print("Problem parsing the file.");
}
}
}
We build our project using ant, so we can use the schemavalidate task to check our config files:
<schemavalidate>
<fileset dir="${configdir}" includes="**/*.xml" />
</schemavalidate>
Now naughty config files will fail our build!
http://ant.apache.org/manual/Tasks/schemavalidate.html
Since this is a popular question, I will point out that java can also validate against "referred to" xsd's, for instance if the .xml file itself specifies XSD's in the header, using xsi:schemaLocation or xsi:noNamespaceSchemaLocation (or xsi for particular namespaces) ex:
<document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://www.example.com/document.xsd">
...
or schemaLocation (always a list of namespace to xsd mappings)
<document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.example.com/my_namespace http://www.example.com/document.xsd">
...
The other answers work here as well, because the .xsd files "map" to the namespaces declared in the .xml file, because they declare a namespace, and if matches up with the namespace in the .xml file, you're good. But sometimes it's convenient to be able to have a custom resolver...
From the javadocs: "If you create a schema without specifying a URL, file, or source, then the Java language creates one that looks in the document being validated to find the schema it should use. For example:"
SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
Schema schema = factory.newSchema();
and this works for multiple namespaces, etc.
The problem with this approach is that the xmlsns:xsi is probably a network location, so it'll by default go out and hit the network with each and every validation, not always optimal.
Here's an example that validates an XML file against any XSD's it references (even if it has to pull them from the network):
public static void verifyValidatesInternalXsd(String filename) throws Exception {
InputStream xmlStream = new new FileInputStream(filename);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);
factory.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
"http://www.w3.org/2001/XMLSchema");
DocumentBuilder builder = factory.newDocumentBuilder();
builder.setErrorHandler(new RaiseOnErrorHandler());
builder.parse(new InputSource(xmlStream));
xmlStream.close();
}
public static class RaiseOnErrorHandler implements ErrorHandler {
public void warning(SAXParseException e) throws SAXException {
throw new RuntimeException(e);
}
public void error(SAXParseException e) throws SAXException {
throw new RuntimeException(e);
}
public void fatalError(SAXParseException e) throws SAXException {
throw new RuntimeException(e);
}
}
You can avoid pulling referenced XSD's from the network, even though the xml files reference url's, by specifying the xsd manually (see some other answers here) or by using an "XML catalog" style resolver. Spring apparently also can intercept the URL requests to serve local files for validations. Or you can set your own via setResourceResolver, ex:
Source xmlFile = new StreamSource(xmlFileLocation);
SchemaFactory schemaFactory = SchemaFactory
.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema();
Validator validator = schema.newValidator();
validator.setResourceResolver(new LSResourceResolver() {
#Override
public LSInput resolveResource(String type, String namespaceURI,
String publicId, String systemId, String baseURI) {
InputSource is = new InputSource(
getClass().getResourceAsStream(
"some_local_file_in_the_jar.xsd"));
// or lookup by URI, etc...
return new Input(is); // for class Input see
// https://stackoverflow.com/a/2342859/32453
}
});
validator.validate(xmlFile);
See also here for another tutorial.
I believe the default is to use DOM parsing, you can do something similar with SAX parser that is validating as well saxReader.setEntityResolver(your_resolver_here);
Using Java 7 you can follow the documentation provided in package description.
// create a SchemaFactory capable of understanding WXS schemas
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
// load a WXS schema, represented by a Schema instance
Source schemaFile = new StreamSource(new File("mySchema.xsd"));
Schema schema = factory.newSchema(schemaFile);
// create a Validator instance, which can be used to validate an instance document
Validator validator = schema.newValidator();
// validate the DOM tree
try {
validator.validate(new StreamSource(new File("instance.xml"));
} catch (SAXException e) {
// instance document is invalid!
}
If you have a Linux-Machine you could use the free command-line tool SAXCount. I found this very usefull.
SAXCount -f -s -n my.xml
It validates against dtd and xsd.
5s for a 50MB file.
In debian squeeze it is located in the package "libxerces-c-samples".
The definition of the dtd and xsd has to be in the xml! You can't config them separately.
With JAXB, you could use the code below:
#Test
public void testCheckXmlIsValidAgainstSchema() {
logger.info("Validating an XML file against the latest schema...");
MyValidationEventCollector vec = new MyValidationEventCollector();
validateXmlAgainstSchema(vec, inputXmlFileName, inputXmlSchemaName, inputXmlRootClass);
assertThat(vec.getValidationErrors().isEmpty(), is(expectedValidationResult));
}
private void validateXmlAgainstSchema(final MyValidationEventCollector vec, final String xmlFileName, final String xsdSchemaName, final Class<?> rootClass) {
try (InputStream xmlFileIs = Thread.currentThread().getContextClassLoader().getResourceAsStream(xmlFileName);) {
final JAXBContext jContext = JAXBContext.newInstance(rootClass);
// Unmarshal the data from InputStream
final Unmarshaller unmarshaller = jContext.createUnmarshaller();
final SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
final InputStream schemaAsStream = Thread.currentThread().getContextClassLoader().getResourceAsStream(xsdSchemaName);
unmarshaller.setSchema(sf.newSchema(new StreamSource(schemaAsStream)));
unmarshaller.setEventHandler(vec);
unmarshaller.unmarshal(new StreamSource(xmlFileIs), rootClass).getValue(); // The Document class is the root object in the XML file you want to validate
for (String validationError : vec.getValidationErrors()) {
logger.trace(validationError);
}
} catch (final Exception e) {
logger.error("The validation of the XML file " + xmlFileName + " failed: ", e);
}
}
class MyValidationEventCollector implements ValidationEventHandler {
private final List<String> validationErrors;
public MyValidationEventCollector() {
validationErrors = new ArrayList<>();
}
public List<String> getValidationErrors() {
return Collections.unmodifiableList(validationErrors);
}
#Override
public boolean handleEvent(final ValidationEvent event) {
String pattern = "line {0}, column {1}, error message {2}";
String errorMessage = MessageFormat.format(pattern, event.getLocator().getLineNumber(), event.getLocator().getColumnNumber(),
event.getMessage());
if (event.getSeverity() == ValidationEvent.FATAL_ERROR) {
validationErrors.add(errorMessage);
}
return true; // you collect the validation errors in a List and handle them later
}
}
One more answer: since you said you need to validate files you are generating (writing), you might want to validate content while you are writing, instead of first writing, then reading back for validation. You can probably do that with JDK API for Xml validation, if you use SAX-based writer: if so, just link in validator by calling 'Validator.validate(source, result)', where source comes from your writer, and result is where output needs to go.
Alternatively if you use Stax for writing content (or a library that uses or can use stax), Woodstox can also directly support validation when using XMLStreamWriter. Here's a blog entry showing how that is done:
If you are generating XML files programatically, you may want to look at the XMLBeans library. Using a command line tool, XMLBeans will automatically generate and package up a set of Java objects based on an XSD. You can then use these objects to build an XML document based on this schema.
It has built-in support for schema validation, and can convert Java objects to an XML document and vice-versa.
Castor and JAXB are other Java libraries that serve a similar purpose to XMLBeans.
Using Woodstox, configure the StAX parser to validate against your schema and parse the XML.
If exceptions are caught the XML is not valid, otherwise it is valid:
// create the XSD schema from your schema file
XMLValidationSchemaFactory schemaFactory = XMLValidationSchemaFactory.newInstance(XMLValidationSchema.SCHEMA_ID_W3C_SCHEMA);
XMLValidationSchema validationSchema = schemaFactory.createSchema(schemaInputStream);
// create the XML reader for your XML file
WstxInputFactory inputFactory = new WstxInputFactory();
XMLStreamReader2 xmlReader = (XMLStreamReader2) inputFactory.createXMLStreamReader(xmlInputStream);
try {
// configure the reader to validate against the schema
xmlReader.validateAgainst(validationSchema);
// parse the XML
while (xmlReader.hasNext()) {
xmlReader.next();
}
// no exceptions, the XML is valid
} catch (XMLStreamException e) {
// exceptions, the XML is not valid
} finally {
xmlReader.close();
}
Note: If you need to validate multiple files, you should try to reuse your XMLInputFactory and XMLValidationSchema in order to maximize the performance.
Are you looking for a tool or a library?
As far as libraries goes, pretty much the de-facto standard is Xerces2 which has both C++ and Java versions.
Be fore warned though, it is a heavy weight solution. But then again, validating XML against XSD files is a rather heavy weight problem.
As for a tool to do this for you, XMLFox seems to be a decent freeware solution, but not having used it personally I can't say for sure.
Validate against online schemas
Source xmlFile = new StreamSource(Thread.currentThread().getContextClassLoader().getResourceAsStream("your.xml"));
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = factory.newSchema(Thread.currentThread().getContextClassLoader().getResource("your.xsd"));
Validator validator = schema.newValidator();
validator.validate(xmlFile);
Validate against local schemas
Offline XML Validation with Java
I had to validate an XML against XSD just one time, so I tried XMLFox. I found it to be very confusing and weird. The help instructions didn't seem to match the interface.
I ended up using LiquidXML Studio 2008 (v6) which was much easier to use and more immediately familiar (the UI is very similar to Visual Basic 2008 Express, which I use frequently). The drawback: the validation capability is not in the free version, so I had to use the 30 day trial.