Tell JAXP the path to DTD file - java

I have xml files with a reference to a dtd file.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE varman SYSTEM "referenced.dtd">
...
I managed to read these files with JAXP, but only if referenced.dtd is located in the same folder as the XML file. Otherwise I get an exception saying that the DTD file could not be loaded. I could not find a place to insert a handler or anything else that would resolve this missing resource. Please enlighten me!

Use the JAXP property settings to allow access to external paths (see the JAXP properties documentation).
Catch the exception that is thrown when the property is not supported:
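// Returns true if the parser recognizes the JAXP 1.5 accessExternalDTD property.
public boolean isNewPropertySupported() {
    try {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser parser = spf.newSAXParser();
        // Allow external DTDs to be loaded over the "file" protocol only.
        parser.setProperty("http://javax.xml.XMLConstants/property/accessExternalDTD", "file");
    } catch (ParserConfigurationException ex) {
        fail(ex.getMessage()); // presumably JUnit's fail(), so this runs inside a test class
    } catch (SAXException ex) {
        String err = ex.getMessage();
        // Guard against a null message before searching it.
        if (err != null && err.contains("Property 'http://javax.xml.XMLConstants/property/accessExternalDTD' is not recognized.")) {
            // expected, JAXP 1.5 not supported
            return false;
        }
    }
    return true;
}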
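The question also asks where a handler can be plugged in. A minimal sketch of one option, assuming a DOM parse: set an EntityResolver on the DocumentBuilder and redirect the request for referenced.dtd to another folder. The class name DtdRedirect and the dtdDir parameter are made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of JAXP.
final class DtdRedirect {

    /** Parses xml, loading any system ID that ends in "referenced.dtd" from dtdDir instead. */
    static Document parse(String xml, File dtdDir)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) -> {
            if (systemId != null && systemId.endsWith("referenced.dtd")) {
                // Point the parser at the copy of the DTD in dtdDir.
                return new InputSource(new File(dtdDir, "referenced.dtd").toURI().toString());
            }
            return null; // anything else: let the parser resolve it as usual
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

The same resolver can be set on an XMLReader if you parse with SAX instead of DOM.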

Related

Adding a catalog to XSLT Saxon s9api in Java

I have the following code which takes XML as input and produces a bunch of other files as output.
public void transformXml(InputStream inputFileStream, Path outputDir) {
try {
Resource resource = resourceLoader
.getResource("classpath:demo.xslt");
LOGGER.info("Creating output XMLs and Assessment Report in {}", outputDir);
final File outputFile = new File(outputDir.toString());
final Processor processor = getSaxonProcessor();
XsltCompiler compiler = processor.newXsltCompiler();
XsltExecutable stylesheet = compiler.compile(new StreamSource(resource.getFile()));
Xslt30Transformer transformer = stylesheet.load30();
Serializer out = processor.newSerializer(outputFile);
out.setOutputProperty(Serializer.Property.METHOD, "xml");
transformer.transform(new StreamSource(inputFileStream), out);
LOGGER.debug("Generated DTD XMLs and Assessment Report successfully in {}", outputDir);
} catch (SaxonApiException e) {
throw new XmlTransformationException("Error occurred during transformation", e);
} catch (IOException e) {
throw new XmlTransformationException("Error occurred during loading XSLT file", e);
}
}
private Processor getSaxonProcessor() {
final Configuration configuration = Configuration.newConfiguration();
configuration.disableLicensing();
Processor processor = new Processor(configuration);
return processor;
}
The XML input contains a DOCTYPE declaration that refers to a DTD which is not available to me. That is why I want to use a catalog to point it to a dummy DTD which is on my classpath.
I am struggling to find a way to do this. Most examples that I can find do not use the s9api implementation. Any ideas?
Instead of
new StreamSource(inputFileStream)
you should instantiate a SAXSource, containing an XMLReader initialized to use the catalog resolver as its EntityResolver.
If you need to do the same thing for other source documents, such as those read using doc() or document(), you should supply a URIResolver which itself returns a SAXSource initialized in the same way.
There are other ways of doing it using Saxon configuration properties, but I think the above is the simplest.
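A minimal sketch of the SAXSource approach, using only JDK classes. The class name ResolvingSource is made up, and the resolver here is a stand-in that answers every external entity request with an empty dummy DTD. A real catalog resolver (for example one obtained from javax.xml.catalog.CatalogManager on Java 9+) would be set in the same place. The identity method only shows that parsing succeeds without the real DTD.

```java
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper, not part of Saxon or the JDK.
final class ResolvingSource {

    /** Wraps the input in a SAXSource whose XMLReader never fetches the real DTD. */
    static SAXSource of(InputStream input) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // XSLT processors expect namespace-aware SAX events
        XMLReader reader = spf.newSAXParser().getXMLReader();
        // Stand-in for a catalog resolver: every external entity becomes an empty stream.
        reader.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        return new SAXSource(reader, new InputSource(input));
    }

    /** Runs the JDK identity transform over the source and returns the serialized result. */
    static String identity(SAXSource source) throws Exception {
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer().transform(source, new StreamResult(out));
        return out.toString();
    }
}
```

In the code from the question, the result of ResolvingSource.of(inputFileStream) would be passed to transformer.transform(...) in place of new StreamSource(inputFileStream).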

How to do "builder.parse(xml file stored inside of this jar)"?

I am trying to parse an XML file that is stored inside the currently running jar file by passing builder.parse a relative path (not the absolute path to the XML file):
DocumentBuilder builder = dbf.newDocumentBuilder();
Document doc = builder.parse("resources/userConfig.xml");
The code works in Eclipse but not in an exported jar file. When I run the exported jar, it can't find the XML in C:\Users...
For your case you need the second-to-last method below, which takes an InputStream. The last method is added as an example for your case, with the file in a jar on the classpath. You may want to handle the exceptions differently.
public class XMLLib {
public static DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
public static Document readXML(File file) {
try {
final DocumentBuilder builder = builderFactory.newDocumentBuilder();
return builder.parse(file);
} catch (ParserConfigurationException ex) {
return null;
} catch (SAXException | IOException ex) {
return null;
}
}
public static Document readXML(InputStream is) {
try {
final DocumentBuilder builder = builderFactory.newDocumentBuilder();
return builder.parse(is);
} catch (ParserConfigurationException ex) {
return null;
} catch (SAXException | IOException ex) {
return null;
}
}
//example call:
public static Document getDocumentResource(String resourcepath){
try (InputStream is = XMLLib.class.getResourceAsStream(resourcepath)){
return readXML(is);
} catch (IOException ex) {}
return null;
}
}
Some more explanation:
In the Eclipse environment the XML file is extracted, so you can access it directly as a File (as it is a "File").
Now, with everything in a jar, the XML doesn't exist as a "File" anymore, so you have to read it out of the jar file.
The classloader is able to do so via getResourceAsStream, just like it can read other resources and classes.
So in essence, this loads the XML the way Java loads a class. For that to work, the resource path must be given relative to the classpath. Only the InputStream version above will work (or some other approach, e.g. using Path and FileSystem).
Usually, if you have a resource in "resources/userConfig.xml", the path is simply "/userConfig.xml" (but that depends on how the project is assembled).
I can only guess, since it depends on how the project is assembled, but in your case you probably need:
Document document = XMLLib.getDocumentResource("/userConfig.xml");
Important: the InputStream version works both with the file in the jar and with the file extracted (that is, always), as long as it can be found by the classloader.

JAXB unmarshal: unexpected element

Background:
I am using JAXB to unmarshal XML into Java objects. Originally, I was using just JAXB to perform the unmarshal. Then a static analysis was performed on the code and a high criticality issue was raised for XML External Entity Injection. After a little research, I found a suggestion (https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Prevention_Cheat_Sheet#JAXB_Unmarshaller) to use a parser configured to prevent external entities from being parsed. An example of what to do was provided:
//Disable XXE
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setFeature("http://xml.org/sax/features/external-general-entities", false);
spf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
spf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
//Do unmarshall operation
Source xmlSource = new SAXSource(spf.newSAXParser().getXMLReader(), new InputSource(new StringReader(xml)));
JAXBContext jc = JAXBContext.newInstance(Object.class);
Unmarshaller um = jc.createUnmarshaller();
um.unmarshal(xmlSource);
I have not done this exactly as shown, but I believe I have done the same in effect:
XMLReader reader = getXMLReader();
if (reader == null) {
logger.warn("Unable to create XML reader");
return;
}
JAXBContext context = JAXBContext.newInstance(messageClass);
Unmarshaller unmarshaller = context.createUnmarshaller();
for (File file : files) {
try {
InputSource source = new InputSource(new FileReader(file));
Source xmlSource = new SAXSource(reader, source);
JAXBElement<? extends BaseType> object =
(JAXBElement<? extends BaseType>) unmarshaller.unmarshal(xmlSource);
messages.add(object.getValue());
} catch (FileNotFoundException e) {
logger.error("Exception", e);
}
}
...
private XMLReader getXMLReader() {
SAXParserFactory factory = SAXParserFactory.newInstance();
try {
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
} catch (SAXNotRecognizedException | SAXNotSupportedException
| ParserConfigurationException e) {
logger.error("Exception", e);
}
XMLReader reader = null;
try {
reader = factory.newSAXParser().getXMLReader();
} catch (SAXException | ParserConfigurationException e) {
logger.error("Exception", e);
}
return reader;
}
Problem:
After implementing the correction, I am now getting an unmarshal exception when the program attempts to read in XML:
javax.xml.bind.UnmarshalException: unexpected element (uri:"", local:"ns1:TypeXYZ"). Expected elements are <{protected namespace URI}TypeABC>,...<{protected namespace URI}TypeXYZ>,...
Before the above fix where I was just using JAXB to unmarshal, it was able to properly parse the provided XML with no problem.
I assume that the SAX parser expects the XML to provide extra information that's missing, or that it needs to be configured to ignore whatever it's complaining about. I tried a few other "features" (http://xml.org/sax/features/namespace-prefixes=true and http://xml.org/sax/features/validation=false), but that did not resolve the problem.
I have no control over the XML schema that defines the XML types, nor do I have control over how the corresponding Java classes are generated.
Any information to help me understand what's going on and that helps me resolve this problem, would be very much appreciated.
After a little experimentation I was able to resolve the error by setting the following features:
factory.setFeature("http://xml.org/sax/features/validation", false);
factory.setFeature("http://xml.org/sax/features/namespaces", true);
factory.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
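A likely explanation, offered as an assumption rather than something confirmed in the thread: a SAXParserFactory is not namespace-aware by default, so the reader reports the element with an empty namespace URI and only the prefixed name "ns1:TypeXYZ", which JAXB cannot match. The sketch below (the class name NamespaceProbe and the urn:example URI are made up) shows what SAX reports for the root element with namespace processing switched on and off.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only.
final class NamespaceProbe {

    /** Returns "{uri}localName" exactly as SAX reports it for the root element. */
    static String rootName(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware); // corresponds to the "namespaces" feature
        StringBuilder seen = new StringBuilder();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (seen.length() == 0) { // record the root element only
                    seen.append('{').append(uri).append('}').append(localName);
                }
            }
        });
        return seen.toString();
    }
}
```

For `<ns1:TypeXYZ xmlns:ns1="urn:example"/>`, the namespace-aware call reports the URI urn:example with local name TypeXYZ. The non-aware call reports no namespace URI. Calling factory.setNamespaceAware(true) in getXMLReader() should therefore have the same effect as setting the namespaces feature by hand.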

Validating an XML file against several XSD schemas with DOM in Java

After searching the internet and different forums, I have not found an answer.
I have an XML file which is defined by two XSD schemas.
There are two ways to write the XML file:
(I had to delete the leading "<" characters to display the XML.)
First way to write it:
?xml version="1.0" encoding="UTF-8" standalone="yes"?>
Policy xmlns="http://www.W3C.com/Policy/v3#" xmlns:ns2="http://www.W3C.com /PolicyExtension/v3#">
DigestAlg Algorithm="http://test"/>
Transforms>
Transform Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n20010315"></Transform>
/Transforms>
ns2:Validation>
ns2:ConditionID>1.0.1</ns2:ConditionID>
ns2:TConditionID>1.0.2</ns2:TConditionID>
/ns2:Validation>
/Policy>"
Second way:
?xml version="1.0" encoding="UTF-8" standalone="yes"?>
Policy xmlns="http://www.W3C.com/Policy/v3#">
DigestAlg Algorithm="http://test"/>
Transforms>
Transform Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n20010315"></Transform>
/Transforms>
Validation xmlns:ns2="http://www.W3C.com/PolicyExtension/v3#">
ConditionID>1.0.1</ns2:ConditionID>
TConditionID>1.0.2</ns2:TConditionID>
/Validation>
/Policy>
To parse my XML files, I use:
InputStream doc = new FileInputStream(myXMLFile);
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
List<Source> sourceListSchema = new ArrayList<Source>();
sourceListSchema.add(new StreamSource(SignaturePolicy.class.getResourceAsStream(MY_XSD_SCHEMA_1)));
sourceListSchema.add(new StreamSource(SignaturePolicy.class.getResourceAsStream(MY_XSD_SCHEMA_2)));
Schema schema;
try {
Source[] sourceTmp = new Source[1];
schema = sf.newSchema(sourceListSchema.toArray(sourceTmp));
} catch (SAXException e) {
LogMachine.logger.severe(
"SAXException : The schema can not be parse :"+e.getMessage());
}
dbf.setIgnoringElementContentWhitespace(true);
dbf.setNamespaceAware(true);
dbf.setIgnoringComments(true);
dbf.setSchema(schema);
DocumentBuilder db;
try {
db = dbf.newDocumentBuilder();
documentPolicy = db.parse(doc);
} catch (ParserConfigurationException e) {
LogMachine.logger.severe(
"ParserConfigurationException : the file can not be parse by DOM :"+e.getMessage());
} catch (SAXException e) {
LogMachine.logger.severe(
"SAXException : the file can not be parse by DOM :"+e.getMessage());
} catch (IOException e) {
LogMachine.logger.severe(
"IOException : the file can not be open like a file :"+e.getMessage());
}
When I parse these documents with DOM, the first XML file produces an error:
Exception in thread "main" org.w3c.dom.ls.LSException: The prefix "ns2" for element "ns2:Validation" is not bound.
But the second XML file is parsed without problems.
Can someone help me parse both documents?
Thank you for your help.

Ignoring DTD when parsing XML

How can I ignore the DTD declaration when parsing a file with the XOM XML library? My file starts with the following lines:
<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">
//rest of stuff here
And when I try to build() my document I get a FileNotFoundException for the DTD file. I know I don't have this file and I don't care about it, so how can it be ignored when using XOM?
Here is a code snippet:
public BlastXMLParser(String filePath) {
    Builder b = new Builder(false);
    //not a good idea to have exception-throwing code in constructor
    try {
        _document = b.build(filePath);
    } catch (ParsingException ex) {
        Logger.getLogger(BlastXMLParser.class.getName()).log(Level.SEVERE,"err", ex);
    } catch (IOException ex) {
        //
    }
}
private Elements getBlastReads() {
    Element root = _document.getRootElement();
    Elements rootChildren = root.getChildElements();
    for (int i = 0; i < rootChildren.size(); i++) {
        Element child = rootChildren.get(i);
        if (child.getLocalName().equals("BlastOutput_iterations")) {
            return child.getChildElements();
        }
    }
    return null;
}
I get a NullPointerException at this line:
Element root = _document.getRootElement();
With the DTD line removed from the source XML file I can successfully parse it, but this is not an option in the final production system.
The preferred solution would be to implement an EntityResolver that intercepts requests for the DTD and redirects these to an embedded copy. If you don't have access to the DTD, are absolutely sure you won't need it (apart from validation, it might also declare character entities that are used in the document), and are using the Xerces XML parser implementation, you can disable fetching of the DTD by setting the corresponding SAX feature. In XOM this should be possible by passing an XMLReader to the Builder constructor like this:
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;
...
XMLReader xmlreader = XMLReaderFactory.createXMLReader();
xmlreader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
Builder builder = new Builder(xmlreader);
If you are not using XOM but plain JAXP, the above solution just needs to be tweaked into:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setAttribute("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(...);
According to their documentation, this is the way to parse a document without any validation:
try {
Builder parser = new Builder();
Document doc = parser.build("http://www.cafeconleche.org/");
}
catch (ParsingException ex) {
System.err.println("Cafe con Leche is malformed today. How embarrassing!");
}
catch (IOException ex) {
System.err.println("Could not connect to Cafe con Leche. The site may be down.");
}
If you do want to validate the document against its DTD, you have to call new Builder(true):
try {
Builder parser = new Builder(true);
Document doc = parser.build("http://www.cafeconleche.org/");
}
catch (ValidityException ex) {
System.err.println("Cafe con Leche is invalid today. (Somewhat embarrassing.)");
}
catch (ParsingException ex) {
System.err.println("Cafe con Leche is malformed today. (How embarrassing!)");
}
catch (IOException ex) {
System.err.println("Could not connect to Cafe con Leche. The site may be down.");
}
Note that yet another exception can now be thrown: ValidityException.
