Handle external Entities and Stylesheet in Sax Parser (XML)

I want to ignore external entities and external stylesheets (e.g. <?xml-stylesheet type="text/xsl" href="......."?>).
I know I have to configure the XMLReader to ignore external entities, but I don't know how to ignore stylesheets.
import org.apache.xerces.parsers.SAXParser;
import org.xml.sax.XMLReader;
//...
final XMLReader parser = new SAXParser();
// Ignore entities. These URIs name SAX features, so they are set with setFeature rather than setProperty.
parser.setFeature("http://xml.org/sax/features/external-general-entities", false);
// Is this correct?
parser.setFeature("http://xml.org/sax/features/external-general-entities", false);
Are there more settings needed to avoid external entities and stylesheets?
How can I tell whether a document contains external entities or stylesheets?

Working for me:
public class SaxParser extends DefaultHandler
implements ContentHandler, DTDHandler, EntityResolver{
public transient static final String STYLE_SHEET_TAG = "xml-stylesheet";
public transient static final String EXTERNAL_ENTITY = "ExternalEntity";
public static void main(String[] args) {
new SaxParser().execute();
}
public void execute() {
String pathFileXml = "test/XML.xml";
final XMLReader parser = new SAXParser();
parser.setContentHandler(this);
parser.setDTDHandler(this);
parser.setEntityResolver(this);
try {
parser.parse(pathFileXml);
} catch (IOException e) {
e.printStackTrace();
} catch (SAXException e) {
if (SaxParser.STYLE_SHEET_TAG.equals(e.getMessage())
|| SaxParser.EXTERNAL_ENTITY.equals(e.getMessage())) {
System.out.println("CATCH ERRORE");
}
e.printStackTrace();
}
System.out.println("OK");
}
@Override
public void processingInstruction(String target, String data)
throws SAXException {
System.out.println("Processing Instruction");
System.out.println("PI=> target: " + target + ", data: " + data);
if (STYLE_SHEET_TAG.equalsIgnoreCase(target.trim())) {
throw new SAXException(STYLE_SHEET_TAG);
}
return;
}
@Override
public InputSource resolveEntity(String publicId, String systemId)
throws IOException, SAXException {
System.out.println("publicId: " + publicId + ", systemId: " + systemId);
throw new SAXException(SaxParser.EXTERNAL_ENTITY);
}
}

The external stylesheet declaration is a standard processing instruction.
You can ignore processing instructions by simply not overriding the handler method:
void processingInstruction(java.lang.String target, java.lang.String data) {}
in your SAX handler; the implementation inherited from DefaultHandler does nothing.
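A minimal sketch of both halves of the question, using the JDK's built-in SAX parser rather than the Xerces SAXParser class from the question. It assumes the parser recognizes the two standard SAX feature URIs for external entities (the JDK's bundled parser does). The class and method names are invented for illustration. A SAX parser only reports the xml-stylesheet instruction to the handler and does not fetch the stylesheet, so the handler here merely records the targets it sees, which is one way to detect that a stylesheet declaration is present.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the original question or answer.
final class IgnoreExternalsSketch {

    /** Counts elements and records the targets of processing instructions. */
    static final class RecordingHandler extends DefaultHandler {
        final List<String> piTargets = new ArrayList<>();
        int elementCount;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elementCount++;
        }

        @Override
        public void processingInstruction(String target, String data) {
            // Nothing is loaded here; the instruction is only noted.
            piTargets.add(target);
        }
    }

    static RecordingHandler parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Standard SAX features: do not include external general or parameter entities.
            factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
            factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            RecordingHandler handler = new RecordingHandler();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        RecordingHandler h = parse(
                "<?xml-stylesheet type=\"text/xsl\" href=\"style.xsl\"?><a><b/></a>");
        System.out.println(h.elementCount + " elements, PI targets: " + h.piTargets);
    }
}
```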

Related

XMLStreamReader to StreamSource in java

I want to convert an XMLStreamReader object to a StreamSource object so that I can apply XSD validation. Can anyone help me with this conversion?
public XMLStreamReader getStreamReader(InputStream inputStream) {
try {
XMLInputFactory xif = XMLInputFactory.newInstance();
return new XmlStreamReaderDelegate(xif.createXMLStreamReader(inputStream, "UTF-8"));
} catch (XMLStreamException e) {
//logger.info(XmlValidatorSettings.class,e.getMessage(),e.getMessage());
throw new UnmarshallingException(e);
}
}
I need to call this method to validate the XML against XSD.
public List<SAXParseException> validateSrcXmlAgainstSchema(Source xmlFile, String schemaFilePath) throws SAXException, IOException {
List<SAXParseException> exceptions = new LinkedList<SAXParseException>();
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema(new File(schemaFilePath));
// logger.debug("Schema Initialised for Schema located in : " + schemaFilePath);
XmlValidationExceptionHandler validationErrorHandler = new XmlValidationExceptionHandler();
Validator validator = schema.newValidator();
validator.setErrorHandler(validationErrorHandler);
// logger.debug("START - validate xml against schema ");
validator.validate(xmlFile);
// logger.debug("END - validate xml against schema ");
exceptions = validationErrorHandler.getExceptionList();
return exceptions;
}
The XmlStreamReaderDelegate class is used to work around XML case-sensitivity issues.
public class XmlStreamReaderDelegate extends StreamReaderDelegate {
public XmlStreamReaderDelegate(XMLStreamReader xsr) {
super(xsr);
}
@Override
public String getAttributeLocalName(int index) {
return super.getAttributeLocalName(index).toLowerCase().intern();
}
@Override
public String getLocalName() {
return super.getLocalName().toLowerCase().intern();
}
}
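One possible route, sketched under the assumption that the goal is only to hand the reader to Validator.validate: that method accepts any Source, and the JDK ships javax.xml.transform.stax.StAXSource, which wraps an XMLStreamReader directly, so a conversion to StreamSource may not be needed. The class name, the inline schema and the sample documents below are invented for illustration; a delegate such as XmlStreamReaderDelegate could be passed to StAXSource in the same way.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; names are assumptions, not from the question.
final class StaxValidationSketch {

    /** Validates xml against xsd and returns the collected error messages. */
    static List<String> validate(String xml, String xsd) {
        final List<String> messages = new ArrayList<>();
        try {
            SchemaFactory schemaFactory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // Collect problems instead of aborting on the first one.
            validator.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { messages.add(e.getMessage()); }
                @Override public void error(SAXParseException e) { messages.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) { messages.add(e.getMessage()); }
            });
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            // StAXSource wraps the reader; it must still be at the start of the document.
            validator.validate(new StAXSource(reader));
        } catch (Exception e) {
            throw new IllegalStateException("Validation could not be run", e);
        }
        return messages;
    }

    public static void main(String[] args) {
        String xsd = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
                + "<xs:element name=\"a\" type=\"xs:int\"/></xs:schema>";
        System.out.println(validate("<a>5</a>", xsd));
        System.out.println(validate("<a>x</a>", xsd));
    }
}
```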

Parsing a big xml file Java

I have big xml files (~1GB) with this structure:
<?xml version="1.0" encoding="UTF-8"?>
<GenoExchange xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.ncbi.nlm.nih.gov/SNP/geno" xsi:schemaLocation="http://www.ncbi.nlm.nih.gov/SNP/geno ftp://ftp.ncbi.nlm.nih.gov/snp/specs/genoex_1_5.xsd" dbSNPBuildNo="146" reportId="MT" reportType="chromosome">
<Population popId="638" handle="TSC-CSHL" locPopId="TSC_42_AA">
<popClass self="NORTH AMERICA"/>
</Population>
<SnpInfo rsId="1041870" observed="C/T">
<SnpLoc genomicAssembly="107:GRCh38.p2" geneId="4512" geneSymbol="COX1" chrom="MT" start="6150" locType="2" rsOrientToChrom="fwd" contigAllele="T" contig="NC_012920:1"/>
<SsInfo ssId="1508548" locSnpId="TSC0349089" ssOrientToRs="fwd">
<ByPop popId="1303" sampleSize="184">
<AlleleFreq allele="T" freq="1"/>
<AlleleFreq allele="C" freq="0"/>
</ByPop>
</SsInfo>
</SnpInfo>
<SnpInfo rsId="1029293" observed="C/T">
<SnpLoc genomicAssembly="107:GRCh38.p2" geneId="4512" geneSymbol="COX1" chrom="MT" start="6307" locType="2" rsOrientToChrom="fwd" contigAllele="C" contig="NC_012920:1"/>
<SsInfo ssId="1494519" locSnpId="TSC0254145" ssOrientToRs="fwd">
<ByPop popId="639" sampleSize="82">
<AlleleFreq allele="T" freq="0"/>
<AlleleFreq allele="C" freq="1"/>
</ByPop>
<ByPop popId="1303" sampleSize="184">
<AlleleFreq allele="T" freq="0"/>
<AlleleFreq allele="C" freq="1"/>
</ByPop>
</SsInfo>
</SnpInfo>
I want to find a specific rsId, for example rsId="1029293", and extract all the information inside that node. I don't want to process the whole file. I only want to find that ID, extract its information and stop iterating.
From what I have read, SAX or StAX parsers are the better choice. I'm using SAX; this is my code:
class UserHandler extends DefaultHandler {
String rsID = null;
String i = "1029293";
@Override
public void startElement(String uri,
String localName, String qName, Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("SnpInfo")) {
rsID = attributes.getValue("rsId");
//System.out.println("value: " + rsID);
}
if((i).equals(rsID) &&
qName.equalsIgnoreCase("SnpInfo")){
System.out.println("Start Element: " + qName + " " + rsID);
}
if ((i).equals(rsID) && qName.equalsIgnoreCase("SsInfo")) {
String a = attributes.getValue("ssId");
System.out.println("SSID: " + a);
}
if ((i).equals(rsID) && qName.equalsIgnoreCase("ByPop")) {
String p = attributes.getValue("popId");
System.out.println("POPID: " + p);
}
if ((i).equals(rsID) && qName.equalsIgnoreCase("AlleleFreq")) {
String p = attributes.getValue("allele");
String f = attributes.getValue("freq");
System.out.println("ALLELE: " + p + " FREQ: " + f);
}
if ((i).equals(rsID) && qName.equalsIgnoreCase("GTypeFreq")) {
String p = attributes.getValue("gtype");
String f = attributes.getValue("freq");
System.out.println("GTYPE: " + p + " FREQ: " + f);
}
}
@Override
public void endElement(String uri,
String localName, String qName) throws SAXException {
if (qName.equalsIgnoreCase("SnpInfo")) {
if((i).equals(rsID)
&& qName.equalsIgnoreCase("SnpInfo"))
System.out.println("End Element: " + qName);
}
}
}
public class XMLParser {
public static void main(String argv[]) {
try {
InputStream fileStream = new FileInputStream("/home/xml/gt_chr10.xml.gz");
InputStream gzipStream = new GZIPInputStream(fileStream);
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
UserHandler userhandler = new UserHandler();
saxParser.parse(gzipStream, userhandler);
} catch (Exception e) {
e.printStackTrace();
}
}
}
My problem is that my code searches the whole file for the ID, which takes more than 2 minutes each time. That is too slow for my purposes.
Is there a better approach?

Using StAX gives you more control when parsing XML, since you actively pull events from the stream. You can pull the next event, handle it and, once you have found your data, simply terminate the loop (using a flag or even a return statement if you must):
InputStream in = ...
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLEventReader eventReader = factory.createXMLEventReader(in);
boolean found = false;
while (!found && eventReader.hasNext()) {
XMLEvent event = eventReader.nextEvent();
switch (event.getEventType()) {
case XMLStreamConstants.START_ELEMENT:
// your logic here
// once you found your element, you can terminate the loop
found = true;
break;
case XMLStreamConstants.END_ELEMENT:
// your logic here
break;
}
}
(Exception and resource handling omitted for brevity.)
On a side note, you will gain a little performance by combining your repeated if ((i).equals(rsID) && ...) conditions into a single outer check, with the detail checks in nested ifs:
if ((i).equals(rsID)) {
if(qName.equalsIgnoreCase("GTypeFreq")) {
...
}
}
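The loop above can be filled in along these lines. This is only a sketch under the assumption that the element and attribute names match the sample in the question (SnpInfo, AlleleFreq, rsId, allele, freq); the class and method names are invented, and the small inline document stands in for the real file.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class; names are assumptions for illustration.
final class StaxEarlyExitSketch {

    /** Returns "allele=freq" pairs of the first SnpInfo with the given rsId, then stops reading. */
    static List<String> alleleFreqs(Reader in, String wantedRsId) {
        List<String> result = new ArrayList<>();
        try {
            XMLEventReader events = XMLInputFactory.newInstance().createXMLEventReader(in);
            boolean inside = false;
            while (events.hasNext()) {
                XMLEvent event = events.nextEvent();
                if (event.isStartElement()) {
                    StartElement start = event.asStartElement();
                    String name = start.getName().getLocalPart();
                    if ("SnpInfo".equals(name)) {
                        inside = wantedRsId.equals(attr(start, "rsId"));
                    } else if (inside && "AlleleFreq".equals(name)) {
                        result.add(attr(start, "allele") + "=" + attr(start, "freq"));
                    }
                } else if (inside && event.isEndElement()
                        && "SnpInfo".equals(event.asEndElement().getName().getLocalPart())) {
                    break; // the wanted node is complete; the rest of the stream is never pulled
                }
            }
            events.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read XML", e);
        }
        return result;
    }

    private static String attr(StartElement element, String name) {
        Attribute a = element.getAttributeByName(new QName(name));
        return a == null ? null : a.getValue();
    }

    public static void main(String[] args) {
        String xml = "<GenoExchange xmlns=\"http://www.ncbi.nlm.nih.gov/SNP/geno\">"
                + "<SnpInfo rsId=\"1\"><AlleleFreq allele=\"T\" freq=\"1\"/></SnpInfo>"
                + "<SnpInfo rsId=\"2\"><AlleleFreq allele=\"T\" freq=\"0\"/>"
                + "<AlleleFreq allele=\"C\" freq=\"1\"/></SnpInfo>"
                + "<SnpInfo rsId=\"3\"><AlleleFreq allele=\"A\" freq=\"1\"/></SnpInfo>"
                + "</GenoExchange>";
        System.out.println(alleleFreqs(new StringReader(xml), "2"));
    }
}
```

Stopping early only helps when the wanted node is not near the end of the file; everything before it still has to be read.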

You can throw an exception in your endElement handler to make the parser abort parsing (http://www.ibm.com/developerworks/library/x-tipsaxstop/):
@Override
public void endElement(String uri,
String localName, String qName) throws SAXException {
if (qName.equalsIgnoreCase("SnpInfo") && (i).equals(rsID)) {
System.out.println("End Element: " + qName);
// The wanted element has been read completely, so abort parsing here.
throw new SAXException("Element found.");
}
}
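A self-contained sketch of the same idea, with one variation that is not in the answer: a dedicated exception subclass, so the caller can tell the deliberate stop apart from a real parse error. The class names are invented, and the handler assumes the default, non-namespace-aware JDK parser, where qName carries the plain element name.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; names are assumptions for illustration.
final class SaxEarlyStopSketch {

    /** Marker exception used only to abort parsing once the wanted node is done. */
    static final class StopParsing extends SAXException {
        StopParsing() {
            super("Wanted element found, stopping");
        }
    }

    /** Returns how many start tags were visited before parsing ended or was aborted. */
    static int elementsVisited(String xml, final String wantedRsId) {
        final int[] visited = {0};
        DefaultHandler handler = new DefaultHandler() {
            private boolean inside;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                visited[0]++;
                if ("SnpInfo".equals(qName)) {
                    inside = wantedRsId.equals(atts.getValue("rsId"));
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if (inside && "SnpInfo".equals(qName)) {
                    throw new StopParsing();
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsing expected) {
            // Deliberate early exit, not an error.
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return visited[0];
    }

    public static void main(String[] args) {
        String xml = "<G><SnpInfo rsId=\"1\"><X/></SnpInfo><SnpInfo rsId=\"2\"><X/></SnpInfo>"
                + "<SnpInfo rsId=\"3\"><X/></SnpInfo></G>";
        System.out.println(elementsVisited(xml, "2"));
    }
}
```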

The only way to avoid parsing the whole file every time you run this is to put the data in an XML database. Parsing a 1 GB file is going to take about a minute, give or take, depending on the speed of your machine and the processing you do on each node.
A streamed XSLT 3.0 solution is simply:
<xsl:transform version="3.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xpath-default-namespace="http://www.ncbi.nlm.nih.gov/SNP/geno">
<xsl:template name="xsl:initial-template">
<xsl:stream href="input.xml">
<xsl:copy-of select="/GenoExchange/SnpInfo[@rsId='1041870'][1]"/>
</xsl:stream>
</xsl:template>
</xsl:transform>
No need to write all that pesky SAX or StAX code.
I put the "[1]" predicate in to allow the processor to abandon the search when it has found the first hit.

The best approach is to use VTD-XML and XPath. A 1 GB XML file takes about 1.5 GB of heap space and less than 10 seconds on a 3 to 4 year old Intel processor. See the code example below. One more thing: if you want to eliminate parsing entirely, you can create a VTD+XML file so that any subsequent query can directly access the VTD index portion, which could easily triple or quadruple your application's performance.
import com.ximpleware.*;
public class simpleXpathSearch{
public static void main(String s[]) throws VTDException,java.io.UnsupportedEncodingException,java.io.IOException{
VTDGen vg = new VTDGen();
vg.setLCLevel(5);
if (!vg.parseFile("input.xml", false))
return;
VTDNav vn = vg.getNav();
AutoPilot ap = new AutoPilot(vn);
ap.selectXPath("/*/*[@rsId='1029293']");
int i=0;
while((i=ap.evalXPath())!=-1){
// your code logic here
}
}
}

//Main class
public static void main(String[] args) {
SAXReader.read();
}
//SAXReader
public static void read(){
try {
XMLReader processor = XMLReaderFactory.createXMLReader();
processor.setContentHandler(new SAXController());
processor.parse(new InputSource("MyXML.xml"));
} catch (SAXException | IOException e) {
System.err.println(e.getMessage());
}
}
//SAXController
// The SAXController extends DefaultHandler
private int tab = 0;
private void tabulation() {
for (int i=0; i<tab; i++)
System.out.print(" ");
}
@Override
public void startDocument() {
tabulation();
System.out.println("Starting XML Document");
tab++;
}
@Override
public void endDocument() {
tab--;
tabulation();
System.out.println("Ending XML Document");
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes)
throws SAXException {
tabulation();
System.out.print(localName);
if (attributes.getLength()>0) {
for (int i=0; i<attributes.getLength(); i++) {
System.out.print(attributes.getLocalName(i)+": "+attributes.getValue(i));
}
}
System.out.println();
tab++;
}
@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
tab--;
tabulation();
System.out.println(localName);
}
@Override
public void characters(char[] ch, int start, int length)
throws SAXException {
String content= new String(ch, start, length);
content= content.replaceAll("[\t\n]", "").trim();
if (!content.equals("")) {
tabulation();
System.out.println(content);
}
}

Java-Read values from excel to a object

I am trying to read a very large Excel file with Apache POI. I have managed to read the file, but I am not able to store the values in an object. I need to do so because I have to change some values and then add them to a table.
Below is the code:
protected void processSheet(StylesTable styles,
ReadOnlySharedStringsTable strings, InputStream sheetInputStream)
throws IOException, SAXException {
InputSource sheetSource = new InputSource(sheetInputStream);
SAXParserFactory saxFactory = SAXParserFactory.newInstance();
try {
SAXParser saxParser = saxFactory.newSAXParser();
XMLReader sheetParser = saxParser.getXMLReader();
System.out.println("sheet is not read");
ContentHandler handler = new XSSFSheetXMLHandler(styles, strings, new SheetContentsHandler() {
@Override
public void startRow(int rowNum) {
System.out.println("rwnum"+rowNum);
}
@Override
public void endRow() {
System.out.println("endrow");
}
@Override
public void cell(String cellReference, String formattedValue) {
}
@Override
public void headerFooter(String text, boolean isHeader, String tagName) {
}
},
false//means result instead of formula
);
sheetParser.setContentHandler(handler);
sheetParser.parse(sheetSource);
} catch (ParserConfigurationException e) {
throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
}
}

parse an xml string in java?

How do you parse XML stored in a Java String object?
Java's XMLReader only parses XML documents from a URI or an InputStream. Is it not possible to parse from a String containing XML data?
Right now I have the following:
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser sp = factory.newSAXParser();
XMLReader xr = sp.getXMLReader();
ContactListXmlHandler handler = new ContactListXmlHandler();
xr.setContentHandler(handler);
xr.p
} catch (ParserConfigurationException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (SAXException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
And in my handler I have this:
public class ContactListXmlHandler extends DefaultHandler implements Resources {
private List<ContactName> contactNameList = new ArrayList<ContactName>();
private ContactName contactItem;
private StringBuffer sb;
public List<ContactName> getContactNameList() {
return contactNameList;
}
@Override
public void startDocument() throws SAXException {
// TODO Auto-generated method stub
super.startDocument();
sb = new StringBuffer();
}
@Override
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
// TODO Auto-generated method stub
super.startElement(uri, localName, qName, attributes);
if(localName.equals(XML_CONTACT_NAME)){
contactItem = new ContactName();
}
sb.setLength(0);
}
@Override
public void characters(char[] ch, int start, int length){
// TODO Auto-generated method stub
try {
super.characters(ch, start, length);
} catch (SAXException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
sb.append(ch, start, length);
}
@Override
public void endDocument() throws SAXException {
// TODO Auto-generated method stub
super.endDocument();
}
/**
* where the real stuff happens
*/
@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
// TODO Auto-generated method stub
//super.endElement(arg0, arg1, arg2);
if(contactItem != null){
if (localName.equalsIgnoreCase("title")) {
contactItem.setUid(sb.toString());
Log.d("handler", "setTitle = " + sb.toString());
} else if (localName.equalsIgnoreCase("link")) {
contactItem.setFullName(sb.toString());
} else if (localName.equalsIgnoreCase("item")){
Log.d("handler", "adding rss item");
contactNameList.add(contactItem);
}
sb.setLength(0);
}
}
}
Thanks in advance

The SAXParser can read an InputSource.
An InputSource can take a Reader in its constructor.
So you can parse an XML string via a StringReader:
new InputSource(new StringReader("... your xml here...."));
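Put together, parsing a String with SAX could look like the following minimal sketch. The class name and the tiny handler are invented for illustration; the ContactListXmlHandler from the question could be passed in the same way.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; names are assumptions for illustration.
final class ParseXmlStringSketch {

    /** Parses XML held in a String and returns the element names in document order. */
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // The StringReader turns the String into a character stream for the InputSource.
            InputSource source = new InputSource(new StringReader(xml));
            parser.parse(source, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    names.add(qName);
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<data><employee><name>John</name></employee></data>"));
    }
}
```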

Try jcabi-xml (see this blog post) with a one-liner:
XML xml = new XMLDocument("<document>...</document>");

Your XML might be simple enough to parse manually using the DOM or SAX API, but I'd still suggest using an XML serialization API such as JAXB, XStream, or Simple instead because writing your own XML serialization/deserialization code is a drag.
Note that the XStream FAQ erroneously claims that you must use generated classes with JAXB:
How does XStream compare to JAXB (Java API for XML Binding)?
JAXB is a Java binding tool. It generates Java code from a schema and
you are able to transform from those classes into XML matching the
processed schema and back. Note, that you cannot use your own objects,
you have to use what is generated.
It seems this was true at one time, but JAXB 2.0 no longer requires you to use Java classes generated from a schema.
If you go this route, be sure to check out the side-by-side comparisons of the serialization/marshalling APIs I've mentioned:
http://blog.bdoughan.com/2010/10/how-does-jaxb-compare-to-xstream.html
http://blog.bdoughan.com/2010/10/how-does-jaxb-compare-to-simple.html

Take a look at this: http://www.rgagnon.com/javadetails/java-0573.html
import javax.xml.parsers.*;
import org.xml.sax.InputSource;
import org.w3c.dom.*;
import java.io.*;
public class ParseXMLString {
public static void main(String arg[]) {
String xmlRecords =
"<data>" +
" <employee>" +
" <name>John</name>" +
" <title>Manager</title>" +
" </employee>" +
" <employee>" +
" <name>Sara</name>" +
" <title>Clerk</title>" +
" </employee>" +
"</data>";
try {
DocumentBuilderFactory dbf =
DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(xmlRecords));
Document doc = db.parse(is);
NodeList nodes = doc.getElementsByTagName("employee");
// iterate the employees
for (int i = 0; i < nodes.getLength(); i++) {
Element element = (Element) nodes.item(i);
NodeList name = element.getElementsByTagName("name");
Element line = (Element) name.item(0);
System.out.println("Name: " + getCharacterDataFromElement(line));
NodeList title = element.getElementsByTagName("title");
line = (Element) title.item(0);
System.out.println("Title: " + getCharacterDataFromElement(line));
}
}
catch (Exception e) {
e.printStackTrace();
}
/*
output :
Name: John
Title: Manager
Name: Sara
Title: Clerk
*/
}
public static String getCharacterDataFromElement(Element e) {
Node child = e.getFirstChild();
if (child instanceof CharacterData) {
CharacterData cd = (CharacterData) child;
return cd.getData();
}
return "?";
}
}

JAXB: How to ignore namespace during unmarshalling XML document?

My schema specifies a namespace, but the documents don't. What's the simplest way to ignore namespace during JAXB unmarshalling (XML -> object)?
In other words, I have
<foo><bar></bar></foo>
instead of,
<foo xmlns="http://tempuri.org/"><bar></bar></foo>

Here is an extension/edit of VonC's solution, just in case someone doesn't want to go through the hassle of implementing their own filter to do this. It also shows how to output a JAXB element without the namespace present. This is all accomplished using a SAX filter.
Filter implementation:
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;
public class NamespaceFilter extends XMLFilterImpl {
private String usedNamespaceUri;
private boolean addNamespace;
//State variable
private boolean addedNamespace = false;
public NamespaceFilter(String namespaceUri,
boolean addNamespace) {
super();
if (addNamespace)
this.usedNamespaceUri = namespaceUri;
else
this.usedNamespaceUri = "";
this.addNamespace = addNamespace;
}
@Override
public void startDocument() throws SAXException {
super.startDocument();
if (addNamespace) {
startControlledPrefixMapping();
}
}
@Override
public void startElement(String arg0, String arg1, String arg2,
Attributes arg3) throws SAXException {
super.startElement(this.usedNamespaceUri, arg1, arg2, arg3);
}
@Override
public void endElement(String arg0, String arg1, String arg2)
throws SAXException {
super.endElement(this.usedNamespaceUri, arg1, arg2);
}
@Override
public void startPrefixMapping(String prefix, String url)
throws SAXException {
if (addNamespace) {
this.startControlledPrefixMapping();
} else {
//Remove the namespace, i.e. don't call startPrefixMapping for parent!
}
}
private void startControlledPrefixMapping() throws SAXException {
if (this.addNamespace && !this.addedNamespace) {
//We should add namespace since it is set and has not yet been done.
super.startPrefixMapping("", this.usedNamespaceUri);
//Make sure we don't do it twice
this.addedNamespace = true;
}
}
}
This filter is designed both to add the namespace if it is not present:
new NamespaceFilter("http://www.example.com/namespaceurl", true);
and to remove any present namespace:
new NamespaceFilter(null, false);
The filter can be used during parsing as follows:
//Prepare JAXB objects
JAXBContext jc = JAXBContext.newInstance("jaxb.package");
Unmarshaller u = jc.createUnmarshaller();
//Create an XMLReader to use with our filter
XMLReader reader = XMLReaderFactory.createXMLReader();
//Create the filter (to add namespace) and set the xmlReader as its parent.
NamespaceFilter inFilter = new NamespaceFilter("http://www.example.com/namespaceurl", true);
inFilter.setParent(reader);
//Prepare the input, in this case a java.io.File (output)
InputSource is = new InputSource(new FileInputStream(output));
//Create a SAXSource specifying the filter
SAXSource source = new SAXSource(inFilter, is);
//Do unmarshalling
Object myJaxbObject = u.unmarshal(source);
To use this filter to output XML from a JAXB object, have a look at the code below.
//Prepare JAXB objects
JAXBContext jc = JAXBContext.newInstance("jaxb.package");
Marshaller m = jc.createMarshaller();
//Define an output file
File output = new File("test.xml");
//Create a filter that will remove the xmlns attribute
NamespaceFilter outFilter = new NamespaceFilter(null, false);
//Do some formatting; this is optional and may affect performance
OutputFormat format = new OutputFormat();
format.setIndent(true);
format.setNewlines(true);
//Create a new org.dom4j.io.XMLWriter that will serve as the
//ContentHandler for our filter.
XMLWriter writer = new XMLWriter(new FileOutputStream(output), format);
//Attach the writer to the filter
outFilter.setContentHandler(writer);
//Tell JAXB to marshall to the filter which in turn will call the writer
m.marshal(myJaxbObject, outFilter);
This will hopefully help someone since I spent a day doing this and almost gave up twice ;)
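The effect of such a filter can be observed without JAXB. The sketch below is a stripped-down variant, not the NamespaceFilter above: it only forces one namespace URI onto every element and lets a plain handler print what it receives. The class names are invented, and the parent reader is assumed to be the JDK's namespace-aware SAX parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical demo class; a reduced illustration of the filtering idea.
final class ForceNamespaceSketch {

    /** Rewrites the namespace URI of every element event passing through. */
    static final class ForceNamespace extends XMLFilterImpl {
        private final String forcedUri;

        ForceNamespace(XMLReader parent, String forcedUri) {
            super(parent);
            this.forcedUri = forcedUri;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            super.startElement(forcedUri, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            super.endElement(forcedUri, localName, qName);
        }
    }

    /** Returns "{uri}localName" for each element as seen downstream of the filter. */
    static List<String> seenNames(String xml, String forcedUri) {
        final List<String> seen = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parent = factory.newSAXParser().getXMLReader();
            ForceNamespace filter = new ForceNamespace(parent, forcedUri);
            // In the answer above, JAXB's unmarshaller takes the place of this handler.
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    seen.add("{" + uri + "}" + localName);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(seenNames("<foo><bar></bar></foo>", "http://tempuri.org/"));
    }
}
```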

I had encoding problems with the XMLFilter solution, so I made an XMLStreamReader delegate that ignores namespaces:
class XMLReaderWithoutNamespace extends StreamReaderDelegate {
public XMLReaderWithoutNamespace(XMLStreamReader reader) {
super(reader);
}
@Override
public String getAttributeNamespace(int arg0) {
return "";
}
@Override
public String getNamespaceURI() {
return "";
}
}
InputStream is = new FileInputStream(name);
XMLStreamReader xsr = XMLInputFactory.newFactory().createXMLStreamReader(is);
XMLReaderWithoutNamespace xr = new XMLReaderWithoutNamespace(xsr);
Unmarshaller um = jc.createUnmarshaller();
Object res = um.unmarshal(xr);
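What the delegate changes can be checked in isolation, without JAXB. The following sketch uses invented class and method names and simply compares the namespace reported for the root element with and without the wrapper.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.util.StreamReaderDelegate;

// Hypothetical demo class; names are assumptions for illustration.
final class StripNamespaceSketch {

    /** Same idea as the delegate above: always report an empty namespace. */
    static final class NoNamespaceReader extends StreamReaderDelegate {
        NoNamespaceReader(XMLStreamReader reader) {
            super(reader);
        }

        @Override
        public String getNamespaceURI() {
            return "";
        }

        @Override
        public String getAttributeNamespace(int index) {
            return "";
        }
    }

    /** Returns the namespace URI reported for the root element. */
    static String rootNamespace(String xml, boolean strip) {
        try {
            XMLStreamReader raw =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            XMLStreamReader reader = strip ? new NoNamespaceReader(raw) : raw;
            reader.nextTag(); // move from the document start to the root element
            return reader.getNamespaceURI();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<foo xmlns=\"http://tempuri.org/\"><bar></bar></foo>";
        System.out.println("raw: " + rootNamespace(xml, false));
        System.out.println("stripped: '" + rootNamespace(xml, true) + "'");
    }
}
```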

I believe you must add the namespace to your XML document, for example with a SAX filter.
That means:
Define a new class implementing the ContentHandler interface, which will intercept SAX events before JAXB gets them.
Define an XMLReader which will set the content handler.
Then link the two together:
public static Object unmarshallWithFilter(Unmarshaller unmarshaller,
java.io.File source) throws FileNotFoundException, JAXBException
{
FileReader fr = null;
try {
fr = new FileReader(source);
XMLReader reader = new NamespaceFilterXMLReader();
InputSource is = new InputSource(fr);
SAXSource ss = new SAXSource(reader, is);
return unmarshaller.unmarshal(ss);
} catch (SAXException e) {
//not technically a jaxb exception, but close enough
throw new JAXBException(e);
} catch (ParserConfigurationException e) {
//not technically a jaxb exception, but close enough
throw new JAXBException(e);
} finally {
FileUtil.close(fr); //replace this with some safe close method you have
}
}

In my situation I have many namespaces, and after some debugging I found another solution that only changes the NamespaceFilter class. For my situation (unmarshalling only) this works fine.
import java.util.Collection;
import javax.xml.namespace.QName;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;
import com.sun.xml.bind.v2.runtime.unmarshaller.SAXConnector;
public class NamespaceFilter extends XMLFilterImpl {
private SAXConnector saxConnector;
@Override
public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
if(saxConnector != null) {
Collection<QName> expected = saxConnector.getContext().getCurrentExpectedElements();
for(QName expectedQname : expected) {
if(localName.equals(expectedQname.getLocalPart())) {
super.startElement(expectedQname.getNamespaceURI(), localName, qName, atts);
return;
}
}
}
super.startElement(uri, localName, qName, atts);
}
@Override
public void setContentHandler(ContentHandler handler) {
super.setContentHandler(handler);
if(handler instanceof SAXConnector) {
saxConnector = (SAXConnector) handler;
}
}
}

Another way to add a default namespace to an XML Document before feeding it to JAXB is to use JDom:
Parse XML to a Document
Iterate through and set namespace on all Elements
Unmarshall using a JDOMSource
Like this:
public class XMLObjectFactory {
private static Namespace DEFAULT_NS = Namespace.getNamespace("http://tempuri.org/");
public static Object createObject(InputStream in) {
try {
SAXBuilder sb = new SAXBuilder(false);
Document doc = sb.build(in);
setNamespace(doc.getRootElement(), DEFAULT_NS, true);
Source src = new JDOMSource(doc);
JAXBContext context = JAXBContext.newInstance("org.tempuri");
Unmarshaller unmarshaller = context.createUnmarshaller();
JAXBElement<?> root = (JAXBElement<?>) unmarshaller.unmarshal(src);
return root.getValue();
} catch (Exception e) {
throw new RuntimeException("Failed to create Object", e);
}
}
private static void setNamespace(Element elem, Namespace ns, boolean recurse) {
elem.setNamespace(ns);
if (recurse) {
for (Object o : elem.getChildren()) {
setNamespace((Element) o, ns, recurse);
}
}
}
}

This is just a modification of lunicon's answer (https://stackoverflow.com/a/24387115/3519572) if you want to replace one namespace for another during parsing. And if you want to see what exactly is going on, just uncomment the output lines and set a breakpoint.
public class XMLReaderWithNamespaceCorrection extends StreamReaderDelegate {
private final String wrongNamespace;
private final String correctNamespace;
public XMLReaderWithNamespaceCorrection(XMLStreamReader reader, String wrongNamespace, String correctNamespace) {
super(reader);
this.wrongNamespace = wrongNamespace;
this.correctNamespace = correctNamespace;
}
@Override
public String getAttributeNamespace(int arg0) {
// System.out.println("--------------------------\n");
// System.out.println("arg0: " + arg0);
// System.out.println("getAttributeName: " + getAttributeName(arg0));
// System.out.println("super.getAttributeNamespace: " + super.getAttributeNamespace(arg0));
// System.out.println("getAttributeLocalName: " + getAttributeLocalName(arg0));
// System.out.println("getAttributeType: " + getAttributeType(arg0));
// System.out.println("getAttributeValue: " + getAttributeValue(arg0));
// System.out.println("getAttributeValue(correctNamespace, LN):"
// + getAttributeValue(correctNamespace, getAttributeLocalName(arg0)));
// System.out.println("getAttributeValue(wrongNamespace, LN):"
// + getAttributeValue(wrongNamespace, getAttributeLocalName(arg0)));
String origNamespace = super.getAttributeNamespace(arg0);
boolean replace = (((wrongNamespace == null) && (origNamespace == null))
|| ((wrongNamespace != null) && wrongNamespace.equals(origNamespace)));
return replace ? correctNamespace : origNamespace;
}
@Override
public String getNamespaceURI() {
// System.out.println("getNamespaceCount(): " + getNamespaceCount());
// for (int i = 0; i < getNamespaceCount(); i++) {
// System.out.println(i + ": " + getNamespacePrefix(i));
// }
//
// System.out.println("super.getNamespaceURI: " + super.getNamespaceURI());
String origNamespace = super.getNamespaceURI();
boolean replace = (((wrongNamespace == null) && (origNamespace == null))
|| ((wrongNamespace != null) && wrongNamespace.equals(origNamespace)));
return replace ? correctNamespace : origNamespace;
}
}
usage:
InputStream is = new FileInputStream(xmlFile);
XMLStreamReader xsr = XMLInputFactory.newFactory().createXMLStreamReader(is);
XMLReaderWithNamespaceCorrection xr =
new XMLReaderWithNamespaceCorrection(xsr, "http://wrong.namespace.uri", "http://correct.namespace.uri");
rootJaxbElem = (JAXBElement<SqgRootType>) um.unmarshal(xr);
handleSchemaError(rootJaxbElem, pmRes);
