Workaround for XMLSchema not supporting maxOccurs larger than 5000 - java

My problem is with parsing an XSD Schema that has elements with maxOccurs larger than 5000 (but not unbounded).
This is actually a know issue in either Xerces (which I'm using, version 2.9.1) or JAXP, as described here: http://bugs.sun.com/view_bug.do;jsessionid=85335466c2c1fc52f0245d20b2e?bug_id=4990915
I already know that if I changed the maxOccurs numbers in my XSD from numbers larger than 5000 to unbounded all works well. Sadly, this is not an option in my case (I cannot meddle with the XSD file).
My question is:
Does someone know some other workaround in Xerces for this issue? Or
Can someone recommend another XML parser that does not have this limitation?
Thanks!

I had the same problem. I used this:
System.setProperty("jdk.xml.maxOccurLimit", "XXXXX");

I have found a solution that doesn't require changing the parser.
There is a FEATURE_SECURE_PROCESSING feature which puts that 5000 limitation on maxOccurs (along with several others).
And here is the document describing the limitations: http://docs.oracle.com/javase/7/docs/technotes/guides/xml/jaxp/JAXP-Compatibility_160.html#JAXP_security

I came across this thread when looking for solutions for this problem when using xjc command in console.
For anyone who is using xjc command to parse xsd, this works for me:
$ xjc -nv foo.xsd
Be aware though:
By default, the XJC binding compiler performs strict validation of the source schema before processing it. Use this option to disable strict schema validation. This does not mean that the binding compiler will not perform any validation, but means that it will perform a less-strict validation.
So if you think your xsd is from a good source, using less strict validation should not be a problem.

If you use Eclipse IDE with the Dali plugin for JAXB you may obtain the aforementioned error in the console.
Such error can be avoided if you uncheck 'Use strict validation' on panel 'Classes Generator Options' when establishing the options for JAXB generation from a XSD file.
Such panel is the third one after 'Java Project' and 'Generate classes from Schema'.
Adding the additional argument -nv suggested by #minjun-yu also works. Instead of applying my first suggestion, you can set such argument in the fourth panel labeled 'Classes Generator Extension Configurations'
On parsing data to load the JAXB generated classes, if you are validating against a schema, you still may obtain the SAXException. As pointed by #mzywiol and #marioosh, the exception is avoided setting a special feature on creating the SchemaFactory
SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
// Avoid SAXParseException on maxOccurs > 5000
sf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, false);
URL xsdURL = TestParse.class.getResource(xsdLocation);
Schema schema = sf.newSchema(xsdURL);
JAXBContext ctx = JAXBContext.newInstance(MyJAXBClass.class.getPackage().getName());
Unmarshaller unmarshaller = ctx.createUnmarshaller();
unmarshaller.setSchema(schema);

Related

How to fix CWE 73 External Control of File Name or Path

During veracode scan i got CWE 73 issue in my result. Can someone suggest me how to fix this solution for the below coding scenario?
The existing solutions provide is not working,also i would like to know any ESAPI properties can be used to get rid of this issue?
try
{
String serviceFile = System.getProperty("PROP", "");
logger.info("service A", "Loading service file [" + serviceFile+ "].");//Security Issue CWE 73 Occurs in this line
}
There are several solutions for it:
Validate with a whitelist but use the input from the entry point As
we mentioned at Use a list of hardcoded values.
Validate with a simple regular expression whitelist
Canonicalise the input and validate the path
I used the first and second solutions and work fine.
More info at: https://community.veracode.com/s/article/how-do-i-fix-cwe-73-external-control-of-file-name-or-path-in-java
this document provides detailed information on recommended remedies any explicit custom solution. In my case, I tried whitelisting or blacklisting patterns in the provided method parameters however that still did not resolve CWE-73 risk. I think thats expected and the document does state this:
A static engine is limited in what it can detect. It can only scan your code, it won’t actually run your code (unlike Dynamic Scanning). So it won’t pick up any custom validation (if conditions, regular expressions, etc.). But that doesn’t mean that this is not a perfectly valid solution that removes the risk!
I tried my custom blacklisted pattern based solution and annotated the method with FilePathCleanser as documented in here and it resolved CWE-73 and veracode no longer shows that as a risk.
Try to validate this PROP string with Recommended OWASP ESAPI Validator methods, like below.
Example:
String PropParam= ESAPI.validator().getValidInput("",System.getProperty("PROP", ""),"FileRegex",false);
FileRegex is the Regular Expression against which PROP string i.e. path of the file is getting validated.
Define FileRegex = ^(.+)\/([^\/]+)$ in Validator.properties.

Generating BPEL files programmatically?

Is there a way to generate BPEL programmatically in Java?
I tried using the BPEL Eclipse Designer API to write this code:
Process process = null;
try {
Resource.Factory.Registry reg =Resource.Factory.Registry.INSTANCE;
Map<String, Object> m = reg.getExtensionToFactoryMap();
m.put("bpel", new BPELResourceFactoryImpl());//it works with XMLResourceFactoryImpl()
//create resource
URI uri =URI.createFileURI("myBPEL2.bpel");
ResourceSet rSet = new ResourceSetImpl();
Resource bpelResource = rSet.createResource(uri);
//create/populate process
process = BPELFactory.eINSTANCE.createProcess();
process.setName("myBPEL");
Sequence mySeq = BPELFactory.eINSTANCE.createSequence();
mySeq.setName("mainSequence");
process.setActivity(mySeq);
//save resource
bpelResource.getContents().add(process);
Map<String,String> map= new HashMap<String, String>();
map.put("bpel", "http://docs.oasis-open.org/wsbpel/2.0/process/executable");
map.put("tns", "http://matrix.bpelprocess");
map.put("xsd", "http://www.w3.org/2001/XMLSchema");
bpelResource.save(map);
}
catch (Exception e) {
e.printStackTrace();
}
}
but I received an error:
INamespaceMap cannot be attached to an eObject ...
I read this message by Simon:
I understand that using the BPEL model outside of eclipse might be desirable, but it was never intended by us. Thus, this isn't supported
Is there any other API that can help?
You might want to give JAXB a try. It helps you to transform the official BPEL XSD into Java classes. You use those classes to construct your BPEL document and output it.
I had exactly the same problem with the BPELUnit [1], so I started a module in BPELUnit that has the first things necessary for generating and reading BPEL Models [2] although it is far from complete. Supported is only BPEL 2.0 (1.1 will follow later) and handlers are also currently not supported (but will be added). It is under active development because BPELUnit's code coverage component will be based on it so it will get BPEL-feature complete over time. You are happily invited to contribute if you need to close gaps earlier.
You can check it out from GitHub or grap the Maven artifact.
As of now there is no documentation but you can have a look at the JUnit tests that read and write processes.
If this is not suitable for, I'd like to share some experiences with you:
Do not use JAXB: You will need to read and write XML Namespaces which are not preserved with JAXB. That's why I have chosen XMLBeans. DOM would be the other alternative that I can think of.
The inheritance in the XML Schema is not really developer friendly. That's why there are own interface structures and wrappers around the XMLBeans generated classes.
Daniel
[1] http://www.bpelunit.net
[2] https://github.com/bpelunit/bpelunit/tree/master/net.bpelunit.model.bpel
This has been solved using the unify framework API after adding the necessary classes to handle correlation. BPELUnit stated by #Daniel seems to be another alternative.
The Eclipse BPEL API is based on an EMF Model. So you could generate your own artifacts using JET or Xpand based on that. This way there is no requirement to run inside Eclipse.
Although you may can't use BPEL outside of Eclipse, have you considered moving parts of your application inside it?
The BPEL XML Schemas are listed in the appendig of the spec. So you could also base your work on that and integrate with existing BPEL applications where necessary.
In case anyone is looking to solve the above problem while still running inside eclipse environment.
The problem can be resolved as stated by Luca Pino here by adding:
AdapterRegistry.INSTANCE.registerAdapterFactory( BPELPackage.eINSTANCE, BasicBPELAdapterFactory.INSTANCE );
before the resource creation line i.e.
Resource bpelResource = rSet.createResource(uri);
Note: Another solution, to the same problem, also stating how to resolve the dependencies to make this code work, can be found in my other answer here.

XmlPullParserException parsing attribute that contains entity

I'm using kXML2 on a legacy JavaME project.
I'm receiving an XML where some attributes contain encoded entites. When I retrieve that attribute value with the call:
parser.getAttributeValue
It throws an Exception:
XmlPullParserException: unresolved
I have downloaded the last version of this parser, but it still shows this behavior.
If I remove the problematic line from XML, then there are no errors.
Ok, here is what is happening:
The parser must decode entities in attributes, unless you set this property:
parser.setFeature(KXmlParser.FEATURE_PROCESS_DOCDECL, true);
But this implementation throws an exception when that line is called. Allright, so I debugged into the parser source code and I found out that this pull-parser implementation has trouble with entities that are not very common.
So I must inflate the parser entity replacement map with my own "odd" entities for it to work, like this:
parser.defineEntityReplacementText("Ntilde", "Ñ");
And then everything works fine.

Saxon is slow parsing

I am trying to parse some xml with saxon to make some xpath querying on it but got 2 problems : the first one is that saxon is very long to build a very short document in xhtml.
code is this :
Processor processorInstance = new Processor(false);
processorInstance.setConfigurationProperty(FeatureKeys.DTD_VALIDATION, false);
XPathCompiler XPathCompilerInstance = processorInstance.newXPathCompiler();
XPathCompilerInstance.setBackwardsCompatible(false);
String expressionTitre = "//div[#class='score_global']/preceding-sibling::img[1]";
XPathExecutable XPathExecutableInstance = XPathCompilerInstance.compile(expressionTitre);
XPathSelector selector = XPathExecutableInstance.load();
logger.info("Xpath compiled.");
// Phase 2, load xml document.
DocumentBuilder documentBuilderInstance = processorInstance.newDocumentBuilder();
documentBuilderInstance.setSchemaValidator(null);
documentBuilderInstance.setLineNumbering(false);
documentBuilderInstance.setRetainPSVI(false);
XdmNode context = documentBuilderInstance.build(new File("sample/sample.xml")); // This line takes ages to return.
What I don't understand is that if I do it with SAX, it loads at normal speed :(.
What did I forget to provide in saxon ?
Java 1.6
Saxon 9.1.0.8
Second problem is that he is unable to process accented characters while my xml was like this:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
So I removed xml:lang en lang= attributes but got no better luck :(
Do you have any ideas ?
Thank you !
Well After much reading, it was simply necessary to define a CatalogResolver and downloading locally the Xhtml dtds. I dropped saxon and used simple JaxP/SaxReader instead.
This page http://xml.apache.org/commons/components/resolver/resolver-article.html proved very interesting.
Hope this considerations will prove themselves useful to someone :)
Ok, I've found out that although I configured Saxon not to validate, he nonetheless tried to resolve the URI and did not manage to find it locally, so he went online and gets & 503 from W3c which takes a long time to return.
I removed the DTD declaration in my xml, and it worked.
My next step is to make it stop to try to resolve it. I am currently reading saxon doc and playing with entity resolver and it should be ok.

Is Scala/Java not respecting w3 "excess dtd traffic" specs?

I'm new to Scala, so I may be off base on this, I want to know if the problem is my code. Given the Scala file httpparse, simplified to:
object Http {
import java.io.InputStream;
import java.net.URL;
def request(urlString:String): (Boolean, InputStream) =
try {
val url = new URL(urlString)
val body = url.openStream
(true, body)
}
catch {
case ex:Exception => (false, null)
}
}
object HTTPParse extends Application {
import scala.xml._;
import java.net._;
def fetchAndParseURL(URL:String) = {
val (true, body) = Http request(URL)
val xml = XML.load(body) // <-- Error happens here in .load() method
"True"
}
}
Which is run with (URL doesn't matter, this is a joke example):
scala> HTTPParse.fetchAndParseURL("http://stackoverflow.com")
The result invariably:
java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/html4/strict.dtd
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1187)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:973)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEnti...
I've seen the Stack Overflow thread on this with respect to Java, as well as the W3C's System Team Blog entry about not trying to access this DTD via the web. I've also isolated the error to the XML.load() method, which is a Scala library method as far as I can tell.
My Question: How can I fix this? Is this something that is a by product of my code (cribbed from Raphael Ferreira's post), a by product of something Java specific that I need to address as in the previous thread, or something that is Scala specific? Where is this call happening, and is it a bug or a feature? ("Is it me? It's her, right?")
I've bumped into the SAME issue, and I haven't found an elegant solution (I'm thinking into posting the question to the Scala mailing list) Meanwhile, I found a workaround: implement your own SAXParserFactoryImpl so you can set the f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true); property. The good thing is it doesn't require any code change to the Scala code base (I agree that it should be fixed, though).
First I'm extending the default parser factory:
package mypackage;
public class MyXMLParserFactory extends SAXParserFactoryImpl {
public MyXMLParserFactory() throws SAXNotRecognizedException, SAXNotSupportedException, ParserConfigurationException {
super();
super.setFeature("http://xml.org/sax/features/validation", false);
super.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false);
super.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
super.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
}
}
Nothing special, I just want the chance to set the property.
(Note: that this is plain Java code, most probably you can write the same in Scala too)
And in your Scala code, you need to configure the JVM to use your new factory:
System.setProperty("javax.xml.parsers.SAXParserFactory", "mypackage.MyXMLParserFactory");
Then you can call XML.load without validation
Without addressing, for now, the problem, what do you expect to happen if the function request return false below?
def fetchAndParseURL(URL:String) = {
val (true, body) = Http request(URL)
What will happen is that an exception will be thrown. You could rewrite it this way, though:
def fetchAndParseURL(URL:String) = (Http request(URL)) match {
case (true, body) =>
val xml = XML.load(body)
"True"
case _ => "False"
}
Now, to fix the XML parsing problem, we'll disable DTD loading in the parser, as suggested by others:
def fetchAndParseURL(URL:String) = (Http request(URL)) match {
case (true, body) =>
val f = javax.xml.parsers.SAXParserFactory.newInstance()
f.setNamespaceAware(false)
f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
val MyXML = XML.withSAXParser(f.newSAXParser())
val xml = MyXML.load(body)
"True"
case _ => "False"
}
Now, I put that MyXML stuff inside fetchAndParseURL just to keep the structure of the example as unchanged as possible. For actual use, I'd separate it in a top-level object, and make "parser" into a def instead of val, to avoid problems with mutable parsers:
import scala.xml.Elem
import scala.xml.factory.XMLLoader
import javax.xml.parsers.SAXParser
object MyXML extends XMLLoader[Elem] {
override def parser: SAXParser = {
val f = javax.xml.parsers.SAXParserFactory.newInstance()
f.setNamespaceAware(false)
f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
f.newSAXParser()
}
}
Import the package it is defined in, and you are good to go.
This is a scala problem. Native Java has an option to disable loading the DTD:
f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
There are no equivalent in scala.
If you somewhat want to fix it yourself, check scala/xml/parsing/FactoryAdapter.scala and put the line in
278 def loadXML(source: InputSource): Node = {
279 // create parser
280 val parser: SAXParser = try {
281 val f = SAXParserFactory.newInstance()
282 f.setNamespaceAware(false)
<-- insert here
283 f.newSAXParser()
284 } catch {
285 case e: Exception =>
286 Console.err.println("error: Unable to instantiate parser")
287 throw e
288 }
GClaramunt's solution worked wonders for me. My Scala conversion is as follows:
package mypackage
import org.xml.sax.{SAXNotRecognizedException, SAXNotSupportedException}
import com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl
import javax.xml.parsers.ParserConfigurationException
#throws(classOf[SAXNotRecognizedException])
#throws(classOf[SAXNotSupportedException])
#throws(classOf[ParserConfigurationException])
class MyXMLParserFactory extends SAXParserFactoryImpl() {
super.setFeature("http://xml.org/sax/features/validation", false)
super.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false)
super.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false)
super.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
}
As mentioned his the original post, it is necessary to place the following line in your code somewhere:
System.setProperty("javax.xml.parsers.SAXParserFactory", "mypackage.MyXMLParserFactory")
It works. After some detective work, the details as best I can figure them:
Trying to parse a developmental RESTful interface, I build the parser and get the above (rather, a similar) error. I try various parameters to change the XML output, but get the same error. I try to connect to an XML document I quickly whip up (cribbed stupidly from the interface itself) and get the same error. Then I try to connect to anything, just for kicks, and get the same (again, likely only similar) error.
I started questioning whether it was an error with the sources or the program, so I started searching around, and it looks like an ongoing issue- with many Google and SO hits on the same topic. This, unfortunately, made me focus on the upstream (language) aspects of the error, rather than troubleshoot more downstream at the sources themselves.
Fast forward and the parser suddenly works on the original XML output. I confirmed that there was some additional work has been done server side (just a crazy coincidence?). I don't have either earlier XML but suspect that it is related to the document identifiers being changed.
Now, the parser works fine on the RESTful interface, as well any well formatted XML I can throw at it. It also fails on all XHTML DTD's I've tried (e.g. www.w3.org). This is contrary to what #SeanReilly expects, but seems to jive with what the W3 states.
I'm still new to Scala, so can't determine if I have a special, or typical case. Nor can I be assured that this problem won't re-occur for me in another form down the line. It does seem that pulling XHTML will continue to cause this error unless one uses a solution similar to those suggested by #GClaramunt $ #J-16 SDiZ have used. I'm not really qualified to know if this is a problem with the language, or my implementation of a solution (likely the later)
For the immediate timeframe, I suspect that the best solution would've been for me to ensure that it was possible to parse that XML source-- rather than see that other's have had the same error and assume there was a functional problem with the language.
Hope this helps others.
There are two problems with what you are trying to do:
Scala's xml parser is trying to physically retrieve the DTD when it shouldn't. J-16 SDiZ seems to have some advice for this problem.
The Stack overflow page you are trying to parse isn't XML. It's Html4 strict.
The second problem isn't really possible to fix in your scala code. Even once you get around the dtd problem, you'll find that the source just isn't valid XML (empty tags aren't closed properly, for example).
You have to either parse the page with something besides an XML parser, or investigate using a utility like tidy to convert the html to xml.
My knowledge of Scala is pretty poor, but couldn't you use ConstructingParser instead?
val xml = new java.io.File("xmlWithDtd.xml")
val parser = scala.xml.parsing.ConstructingParser.fromFile(xml, true)
val doc = parser.document()
println(doc.docElem)
For scala 2.7.7 I managed to do this with scala.xml.parsing.XhtmlParser
Setting Xerces switches only works if you are using Xerces. An entity resolver works for any JAXP parser.
There are more generalized entity resolvers out there, but this implementation does the trick when all I'm trying to do is parse valid XHTML.
http://code.google.com/p/java-xhtml-cache-dtds-entityresolver/
Shows how trivial it is to cache the DTDs and forgo the network traffic.
In any case, this is how I fix it. I always forget. I always get the error. I always go fetch this entity resolver. Then I'm back in business.

Categories

Resources