How to read and modify fragments of XML using StAX in Java?


My goal is to read objects (featureMember) into DOM, change them, and write them back into a new XML file. The XML is too big to load into a DOM as a whole. I figured that what I need is StAX and TransformerFactory, but I can't make it work.
This is what I've done so far:
private void change(File pathIn, File pathOut) {
try {
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLOutputFactory factoryOut = XMLOutputFactory.newInstance();
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
XMLEventReader in = factory.createXMLEventReader(new FileReader(pathIn));
XMLEventWriter out = factoryOut.createXMLEventWriter(new FileWriter(pathOut));
while (in.hasNext()) {
XMLEvent e = in.nextTag();
if (e.getEventType() == XMLStreamConstants.START_ELEMENT) {
if (((StartElement) e).getName().getLocalPart().equals("featureMember")) {
DOMResult result = new DOMResult();
t.transform(new StAXSource(in), result);
Node domNode = result.getNode();
System.out.println(domNode);
}
}
out.add(e);
}
in.close();
out.close();
} catch (FileNotFoundException e1) {
e1.printStackTrace();
} catch (IOException e1) {
e1.printStackTrace();
} catch (TransformerConfigurationException e1) {
e1.printStackTrace();
} catch (XMLStreamException e1) {
e1.printStackTrace();
} catch (TransformerException e1) {
e1.printStackTrace();
}
}
I get an exception (on t.transform()):
Exception in thread "AWT-EventQueue-0" java.lang.IllegalStateException: StAXSource(XMLEventReader) with XMLEventReader not in XMLStreamConstants.START_DOCUMENT or XMLStreamConstants.START_ELEMENT state
A simplified version of my XML looks like this (it has namespaces):
<?xml version="1.0" encoding="UTF-8"?>
<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml/3.2" gml:id="featureCollection">
<gml:featureMember>
</eg:RST>
<eg:pole>Krakow</eg:pole>
<eg:localId>id1234</eg:localId>
</gml:featureMember>
<gml:featureMember>
<eg:RST>1002</eg:RST>
<eg:pole>Rzeszow</eg:pole>
<eg:localId>id1235</eg:localId>
</gml:featureMember>
</gml:FeatureCollection>
I have a list of localIds of the objects (featureMember) that I want to change, together with the corresponding new RST or pole value (which one changes depends on the user):
localId (id1234) RST (1001)
localId (id1236) RST (1003)
...

The problem you're having is that when you create the StAXSource, your START_ELEMENT event has already been consumed. So the XMLEventReader is probably at some whitespace text node event, or something else that can't be an XML document source. You can use the peek() method to view the next event without consuming it. Make sure there is an event with hasNext() first, though.
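One way to sidestep the consumed-event problem is the cursor API: with an XMLStreamReader the current event is the START_ELEMENT itself, so the reader can be handed to a StAXSource directly. The sketch below is a variant, not your XMLEventReader code. The names FragmentToDom and readFragments are made up for illustration, and the fragments are collected in a list only to keep the demo short (for a huge file you would process each Document and drop it). Where the reader ends up after transform() may differ between transformer implementations, so the loop re-checks the current event instead of advancing blindly.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import org.w3c.dom.Document;

// Hypothetical helper: copies every <featureMember> subtree into its own small DOM.
class FragmentToDom {

    static List<Document> readFragments(String xml) {
        List<Document> fragments = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            // A Transformer created without a stylesheet copies its input unchanged.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            while (reader.hasNext()) {
                if (reader.getEventType() == XMLStreamConstants.START_ELEMENT
                        && "featureMember".equals(reader.getLocalName())) {
                    DOMResult result = new DOMResult();
                    identity.transform(new StAXSource(reader), result);
                    fragments.add((Document) result.getNode());
                    // The transform consumed the subtree. Depending on the implementation
                    // the reader now sits on the fragment's end tag or one event further,
                    // so do not call next() here; let the loop inspect the current event.
                } else {
                    reader.next();
                }
            }
            reader.close();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return fragments;
    }

    public static void main(String[] args) {
        String xml = "<gml:FeatureCollection xmlns:gml=\"http://www.opengis.net/gml/3.2\""
                + " xmlns:eg=\"acme.com\">"
                + "<gml:featureMember><eg:localId>id1234</eg:localId></gml:featureMember>"
                + "<gml:featureMember><eg:localId>id1235</eg:localId></gml:featureMember>"
                + "</gml:FeatureCollection>";
        for (Document d : readFragments(xml)) {
            System.out.println(d.getDocumentElement().getNodeName()
                    + ": " + d.getDocumentElement().getTextContent());
        }
    }
}
```

Each returned Document can be modified with the normal DOM API and then serialized, for example with another identity transform into a StreamResult.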
I'm not 100% sure what you wish to accomplish, so here are some things you could do depending on the scenario.
EDIT: I just read some of the comments on your question which make things a bit more clear. The below could still help you to achieve the desired result with some adjustment. Also note that Java XSLT processors allow for extension functions and extension elements, which can call into Java code from an XSLT stylesheet. This can be a powerful method to extend basic XSLT functionality with external resources such as database queries.
If you want the input XML to be transformed into one output XML, you might be better off simply using an XSLT stylesheet transformation. In your code, you create a transformer without any templates, so it becomes the default "identity transformer", which just copies input to output. Suppose your input XML is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml/3.2" gml:id="featureCollection" xmlns:eg="acme.com">
<gml:featureMember>
<eg:RST/>
<eg:pole>Krakow</eg:pole>
<eg:localId>id1234</eg:localId>
</gml:featureMember>
<gml:featureMember>
<eg:RST>1002</eg:RST>
<eg:pole>Rzeszow</eg:pole>
<eg:localId>id1235</eg:localId>
</gml:featureMember>
</gml:FeatureCollection>
I've bound the eg prefix to some dummy namespace since it was missing from your sample and fixed the malformed RST element.
The following program runs an XSLT transformation on your input and writes the result to an output file.
package xsltplayground;
import java.io.File;
import java.net.URL;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
public class XSLTplayground {
public static void main(String[] args) throws Exception {
URL url = XSLTplayground.class.getResource("sample.xml");
File input = new File(url.toURI());
URL url2 = XSLTplayground.class.getResource("stylesheet.xsl");
File xslt = new File(url2.toURI());
URL url3 = XSLTplayground.class.getResource(".");
File output = new File(new File(url3.toURI()), "output.xml");
change(input, output, xslt);
}
private static void change(File pathIn, File pathOut, File xsltFile) {
try {
// Creating transformer with XSLT file
TransformerFactory tf = TransformerFactory.newInstance();
Source xsltSource = new StreamSource(xsltFile);
Transformer t = tf.newTransformer(xsltSource);
// Input source
Source input = new StreamSource(pathIn);
// Output target
Result output = new StreamResult(pathOut);
// Transforming
t.transform(input, output);
} catch (TransformerConfigurationException ex) {
Logger.getLogger(XSLTplayground.class.getName()).log(Level.SEVERE, null, ex);
} catch (TransformerException ex) {
Logger.getLogger(XSLTplayground.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
Here's a sample stylesheet.xsl file, which for convenience I just dumped into the same package as the input XML and class.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:eg="acme.com">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*" />
</xsl:copy>
</xsl:template>
<xsl:template match="gml:featureMember">
<gml:member>
<xsl:apply-templates select="node()|@*" />
</gml:member>
</xsl:template>
</xsl:stylesheet>
The above stylesheet will copy everything by default, but when it gets to a <gml:featureMember> element it will wrap the contents into a new <gml:member> element. Just a very simple example of what you can do with XSLT.
The output would be:
<?xml version="1.0" encoding="UTF-8"?>
<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:eg="acme.com" gml:id="featureCollection">
<gml:member>
<eg:RST/>
<eg:pole>Krakow</eg:pole>
<eg:localId>id1234</eg:localId>
</gml:member>
<gml:member>
<eg:RST>1002</eg:RST>
<eg:pole>Rzeszow</eg:pole>
<eg:localId>id1235</eg:localId>
</gml:member>
</gml:FeatureCollection>
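Since both input and output are file streams, you never have to build a DOM yourself. Be aware, though, that an XSLT 1.0 processor typically still builds its own internal tree of the whole input, so check memory use against a file of realistic size. XSLT in Java is pretty fast and efficient, so this might suffice.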
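To bring the stylesheet idea closer to the localId/RST pairs from the question, the following sketch passes one pair in as stylesheet parameters and overrides only the matching eg:RST element. The parameter names (localId, newRst), the class name RstPatcher and the inline stylesheet are assumptions for illustration; the eg prefix uses the dummy acme.com binding from the sample input.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: identity copy plus one override for eg:RST.
class RstPatcher {

    private static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:eg='acme.com'>"
        + "<xsl:param name='localId'/>"
        + "<xsl:param name='newRst'/>"
        // Default rule: copy every node and attribute as-is.
        + "<xsl:template match='node()|@*'>"
        + "<xsl:copy><xsl:apply-templates select='node()|@*'/></xsl:copy>"
        + "</xsl:template>"
        // Override: replace the text of RST when the sibling localId matches.
        + "<xsl:template match='eg:RST'>"
        + "<xsl:copy><xsl:apply-templates select='@*'/>"
        + "<xsl:choose>"
        + "<xsl:when test='../eg:localId = $localId'><xsl:value-of select='$newRst'/></xsl:when>"
        + "<xsl:otherwise><xsl:apply-templates select='node()'/></xsl:otherwise>"
        + "</xsl:choose></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String patch(String xml, String localId, String newRst) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            t.setParameter("localId", localId);
            t.setParameter("newRst", newRst);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<gml:FeatureCollection xmlns:gml='http://www.opengis.net/gml/3.2'"
                + " xmlns:eg='acme.com'>"
                + "<gml:featureMember><eg:RST/><eg:pole>Krakow</eg:pole>"
                + "<eg:localId>id1234</eg:localId></gml:featureMember>"
                + "</gml:FeatureCollection>";
        System.out.println(patch(xml, "id1234", "1001"));
    }
}
```

The test sits inside the template rather than in the match pattern because XSLT 1.0 does not allow variable references in match patterns. For many pairs, a lookup document or an extension function would probably be a better fit than one parameter pair per run.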
Maybe you actually want to split every occurrence of some element into its own output file, with some changes to it. Here's an example of code that uses StAX for splitting off the <gml:featureMember> elements as separate documents. You could then iterate over the created files and transform them however you want (XSLT would again be a good choice). Obviously the error handling would need to be a bit more robust. This is just for demonstration.
package xsltplayground;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.URL;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import javax.xml.transform.stream.StreamSource;
public class XSLTplayground {
public static void main(String[] args) throws Exception {
URL url = XSLTplayground.class.getResource("sample.xml");
File input = new File(url.toURI());
URL url2 = XSLTplayground.class.getResource("stylesheet.xsl");
File xslt = new File(url2.toURI());
URL url3 = XSLTplayground.class.getResource(".");
File output = new File(url3.toURI());
change(input, output, xslt);
}
private static void change(File pathIn, File directoryOut, File xsltFile) throws InterruptedException {
try {
// Creating a StAX event reader from the input
XMLInputFactory xmlIf = XMLInputFactory.newFactory();
XMLEventReader reader = xmlIf.createXMLEventReader(new StreamSource(pathIn));
// Create a StAX output factory
XMLOutputFactory xmlOf = XMLOutputFactory.newInstance();
int counter = 1;
// Keep going until no more events
while (reader.hasNext()) {
// Peek into the next event to find out what it is
XMLEvent next = reader.peek();
// If it's the start of a featureMember element, commence output
if (next.isStartElement()
&& next.asStartElement().getName().getLocalPart().equals("featureMember")) {
File output = new File(directoryOut, "output_" + counter + ".xml");
try (OutputStream ops = new FileOutputStream(output)) {
XMLEventWriter writer = xmlOf.createXMLEventWriter(ops);
copy(reader, writer);
writer.flush();
writer.close();
}
counter++;
} else {
// Not in a featureMember element: ignore
reader.next();
}
}
} catch (XMLStreamException ex) {
Logger.getLogger(XSLTplayground.class.getName()).log(Level.SEVERE, null, ex);
} catch (IOException ex) {
Logger.getLogger(XSLTplayground.class.getName()).log(Level.SEVERE, null, ex);
}
}
private static void copy(XMLEventReader reader, XMLEventWriter writer) throws XMLStreamException {
// Creating an XMLEventFactory
XMLEventFactory ef = XMLEventFactory.newFactory();
// Writing an XML document start
writer.add(ef.createStartDocument());
int depth = 0;
boolean stop = false;
while (!stop) {
XMLEvent next = reader.nextEvent();
writer.add(next);
if (next.isStartElement()) {
depth++;
} else if (next.isEndElement()) {
depth--;
if (depth == 0) {
writer.add(ef.createEndDocument());
stop = true;
}
}
}
}
}
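Neither program above applies the question's list of localId/RST pairs. One way to do that while staying streaming, sketched here under assumptions, is to buffer the events of one featureMember at a time, look up its localId, and swap the text of RST while writing the buffer back out. The class name StreamingRstUpdate, the method update and the Map parameter are illustrative only; elements are matched by local name, as in the code above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: rewrites RST for the featureMembers whose localId is in the map.
class StreamingRstUpdate {

    private static final XMLEventFactory EVENTS = XMLEventFactory.newInstance();

    static String update(String xml, Map<String, String> rstByLocalId) {
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            while (reader.hasNext()) {
                // peek() so the start tag is still unread when the subtree is buffered
                if (isStart(reader.peek(), "featureMember")) {
                    writeMember(readSubtree(reader), rstByLocalId, writer);
                } else {
                    writer.add(reader.nextEvent());
                }
            }
            writer.flush();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
    }

    private static boolean isStart(XMLEvent e, String local) {
        return e.isStartElement() && e.asStartElement().getName().getLocalPart().equals(local);
    }

    private static boolean isEnd(XMLEvent e, String local) {
        return e.isEndElement() && e.asEndElement().getName().getLocalPart().equals(local);
    }

    // Consumes one element, from its start tag through the matching end tag.
    private static List<XMLEvent> readSubtree(XMLEventReader reader) throws XMLStreamException {
        List<XMLEvent> events = new ArrayList<>();
        int depth = 0;
        do {
            XMLEvent e = reader.nextEvent();
            if (e.isStartElement()) depth++;
            else if (e.isEndElement()) depth--;
            events.add(e);
        } while (depth > 0);
        return events;
    }

    private static void writeMember(List<XMLEvent> events, Map<String, String> rstByLocalId,
                                    XMLEventWriter writer) throws XMLStreamException {
        // First pass: collect the text of localId (it may come after RST).
        StringBuilder id = new StringBuilder();
        boolean inId = false;
        for (XMLEvent e : events) {
            if (isStart(e, "localId")) inId = true;
            else if (isEnd(e, "localId")) inId = false;
            else if (inId && e.isCharacters()) id.append(e.asCharacters().getData());
        }
        String newRst = rstByLocalId.get(id.toString().trim());
        // Second pass: write the events, swapping the RST text if a new value exists.
        boolean inRst = false;
        for (XMLEvent e : events) {
            if (newRst != null && inRst && e.isCharacters()) continue; // drop old value
            if (newRst != null && isEnd(e, "RST")) {
                writer.add(EVENTS.createCharacters(newRst));
                inRst = false;
            }
            if (newRst != null && isStart(e, "RST")) inRst = true;
            writer.add(e);
        }
    }
}
```

Only one featureMember is held in memory at a time, so memory use should stay flat as long as individual members are small. For real files you would create the reader and writer from streams instead of strings.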

Related

PL/SQL Java source class error: javax.xml.transform.TransformerConfigurationException: Could not compile stylesheet

I'm transforming XML with XSLT and a Java extension inside an Oracle database.
When I parse the XML with the XSL, I get the error:
errorjavax.xml.transform.TransformerConfigurationException: Could not compile stylesheet
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:858)
I have tested without the extension in the XSLT and it works.
I have also tested the transformation in another environment and it works there.
The java version is 1.6.0_43.
The xslt is:
<xsl:stylesheet version="1.0"
xmlns:java="http://xml.apache.org/xalan/java/XsltTransformer"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" />
<xsl:template match="APSDocument">
<atag value='{java:GetDateTimeNow()}' ></atag>
</xsl:template>
</xsl:stylesheet>
The XML:
<APSDocument Tag="APP002">
<Section Tag="APPLICATION_FINISHED">
<SectionBody>
<Field Tag="APP_FINISHED">1</Field>
</SectionBody>
</Section>
</APSDocument>
The Java class:
CREATE OR REPLACE AND RESOLVE JAVA SOURCE NAMED "XsltTransformer" AS
import java.io.*;
import java.text.*;
import java.util.Date;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
import org.xml.sax.InputSource;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import java.sql.*;
public class XsltTransformer
{
public static String TransformToSvTLV(String xmlDoc, String xsltDoc) throws Exception
{
try
{
XsltTransformer xsltTransformer = new XsltTransformer();
Document svmlDoc = xsltTransformer.TransformToSvML(xmlDoc, xsltDoc);
return xsltTransformer.TransformSvMLToStTlv(svmlDoc);
} catch (Exception e)
{
StringWriter sw = new StringWriter();
PrintWriter pw = new PrintWriter(sw);
e.printStackTrace(pw);
return sw.toString();
}
}
private Document TransformToSvML(String xmlDoc, String xsltDoc) throws Exception
{
TransformerFactory tFactory=TransformerFactory.newInstance();
Source xslSourceDoc=new StreamSource(new StringReader(xsltDoc));
Source xmlSourceDoc=new StreamSource(new StringReader(xmlDoc));
StringWriter writer = new StringWriter();
Transformer trasform=tFactory.newTransformer(xslSourceDoc);
trasform.transform(xmlSourceDoc, new StreamResult(writer));
//System.out.println(writer.toString());
DocumentBuilder docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = docBuilder.parse(new InputSource(new StringReader(writer.toString())));
doc.getDocumentElement().normalize();
return doc;
}
public static String GetDateTimeNow()
{
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd/MM/yyyy hh:mm");
return simpleDateFormat.format((new Date())).replace('/', '.');
}
private String TransformXmlNodeToSvTlv(Node svmlNode, String endChar)throws Exception
{
return "do nothing";
}
}
Calling the TransformToSvTLV method:
FUNCTION EtlTransformToSvXML(xmlDoc LONG,xsltDoc LONG) RETURN VARCHAR2
IS LANGUAGE JAVA NAME
'XsltTransformer.TransformToSvTLV(java.lang.String, java.lang.String)
return java.lang.String';
Thanks in advance!

The most probable problem in this case is that the program is not able to read the XSLT file.
A possible solution I would suggest is to use
new StreamSource(new File("path/to/your/xslt"))

I have no experience with Oracle or Xalan, but I have seen issues similar to this in other environments. With this in mind, and considering the question has been active for some time, I'm going out on a limb here to make a suggestion rather than provide a definite answer.
Often XSL compiler failures are due to a problem with namespaces or permissions (especially when the code is hosted within a database). My layman's understanding is that the XSLT processor will attempt to create an instance of the extension class during compilation of the XSLT. If this is correct, then having the correct namespace and permissions will be significant.
The only example I've seen using XSLT function extensions in Oracle uses a namespace URI as follows: "http://www.oracle.com/XSL/Transform/java/classname", where classname is fully qualified (i.e. module.class). Of course, your URI root may differ, but I'd certainly consider this as a possible cause for the error.

Java Stax for Complex / Large XML

I have an XML file that is 4.2 GB! Obviously parsing the entire DOM is not practical. I have been looking at SAX and STAX to accomplish parsing this gigantic XML file. However all the examples I've seen are simple. The XML file I am dealing with has nested on nested on nested. There are areas where it goes 10+ levels.
I found this tutorial but I'm not sure if it's a viable solution.
http://www.javacodegeeks.com/2013/05/parsing-xml-using-dom-sax-and-stax-parser-in-java.html (bottom example using STAX)
I'm not really sure how to handle nested objects.
I have created Java objects to mimic the structure of the XML. Here are a few, too many to display.
Record.java
public class Record implements Serializable {
String uid;
StaticData staticData;
DynamicData dynamicData;
}
Summary.java
public class Summary {
EWUID ewuid;
PubInfo pubInfo;
Titles titles;
Names names;
DocTypes docTypes;
Publishers publishers;
}
EWUID.java
public class EWUID {
String collId;
String edition;
}
PubInfo.java
public class PubInfo {
String coverDate;
String hasAbstract;
String issue;
String pubMonth;
String pubType;
String pubYear;
String sortDate;
String volume;
}
This is the code I've come up with so far.
public class TRWOSParser {
XMLEventReader eventReader;
XMLInputFactory inputFactory;
InputStream inputStream;
public TRWOSParser(String file) throws FileNotFoundException, XMLStreamException {
inputFactory = XMLInputFactory.newInstance();
inputStream = new FileInputStream(file);
eventReader = inputFactory.createXMLEventReader(inputStream);
}
public void parse() throws XMLStreamException{
while (eventReader.hasNext()) {
XMLEvent event = eventReader.nextEvent();
if (event.isStartElement()) {
StartElement startElement = event.asStartElement();
if (startElement.getName().getLocalPart().equals("record")) {
Record record = new Record();
Iterator<Attribute> attributes = startElement.getAttributes();
while (attributes.hasNext()) {
Attribute attribute = attributes.next();
if (attribute.getName().toString().equals("UID")) {
System.out.println("UID: " + attribute.getValue());
}
}
}
}
}
}
}
Update:
The data in the XML is licensed so I cannot show the full file. This is a very very small segment in which I have scrambled the data.
<?xml version="1.0" encoding="UTF-8"?>
<records>
<REC>
<UID>WOS:000310438600004</UID>
<static_data>
<summary>
<EWUID>
<WUID coll_id="WOS" />
<edition value="WOS.SCI" />
</EWUID>
<pub_info coverdate="NOV 2012" has_abstract="N" issue="5" pubmonth="NOV" pubtype="Journal" pubyear="2012" sortdate="2012-11-01" vol="188">
<page begin="1662" end="1663" page_count="2">1662-1663</page>
</pub_info>
<titles count="6">
<title type="source">JOURNAL OF UROLOGY</title>
<title type="source_abbrev">J UROLOGY</title>
<title type="abbrev_iso">J. Urol.</title>
<title type="abbrev_11">J UROL</title>
<title type="abbrev_29">J UROL</title>
<title type="item">Something something</title>
</titles>
<names count="1">
<name addr_no="1 2 3" reprint="Y" role="author" seq_no="1">
<display_name>John Doe</display_name>
<full_name>John Doe</full_name>
<wos_standard>Doe, John</wos_standard>
<first_name>John</first_name>
<last_name>Doe</last_name>
</name>
</names>
<doctypes count="1">
<doctype>Editorial Material</doctype>
</doctypes>
<publishers>
<publisher>
<address_spec addr_no="1">
<full_address>360 PARK AVE SOUTH, NEW YORK, NY 10010-1710 USA</full_address>
<city>NEW YORK</city>
</address_spec>
<names count="1">
<name addr_no="1" role="publisher" seq_no="1">
<display_name>ELSEVIER SCIENCE INC</display_name>
<full_name>ELSEVIER SCIENCE INC</full_name>
</name>
</names>
</publisher>
</publishers>
</summary>
</static_data>
</REC>
</records>

A similar solution to lscoughlin's answer is to use DOM4J, which has mechanisms to deal with this scenario: http://dom4j.sourceforge.net/
In my opinion it is more straightforward and easier to follow. It might not support namespaces, though.

I'm making two assumptions 1) that there is an early level of repetition, and 2) that you can do something meaningful with a partial document.
Let's assume you can move some level of nesting in, and then handle the document multiple times, removing the nodes at the working level each time you "handle" the document. This means that only a single working subtree will be in memory at any given time.
Here's a working code snippet:
package bigparse;
import static javax.xml.stream.XMLStreamConstants.CHARACTERS;
import static javax.xml.stream.XMLStreamConstants.END_DOCUMENT;
import static javax.xml.stream.XMLStreamConstants.END_ELEMENT;
import static javax.xml.stream.XMLStreamConstants.START_DOCUMENT;
import static javax.xml.stream.XMLStreamConstants.START_ELEMENT;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
public class BigParse {
public static void main(String... args) {
XMLInputFactory factory = XMLInputFactory.newInstance();
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
try {
XMLStreamReader streamReader = factory.createXMLStreamReader(new FileReader("src/main/resources/test.xml"));
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document document = documentBuilder.newDocument();
Element rootElement = null;
Element currentElement = null;
int branchLevel = 0;
int maxBranchLevel = 1;
while (streamReader.hasNext()) {
int event = streamReader.next();
switch (event) {
case START_DOCUMENT:
continue;
case START_ELEMENT:
if (branchLevel < maxBranchLevel) {
Element workingElement = readElementOnly(streamReader, document);
if (rootElement == null) {
document.appendChild(workingElement);
rootElement = document.getDocumentElement();
currentElement = rootElement;
} else {
currentElement.appendChild(workingElement);
currentElement = workingElement;
}
branchLevel++;
} else {
workingLoop(streamReader, document, currentElement);
}
continue;
case CHARACTERS:
currentElement.setTextContent(streamReader.getText());
continue;
case END_ELEMENT:
if (currentElement != rootElement) {
currentElement = (Element) currentElement.getParentNode();
branchLevel--;
}
continue;
case END_DOCUMENT:
break;
}
}
} catch (ParserConfigurationException
| FileNotFoundException
| XMLStreamException e) {
throw new RuntimeException(e);
}
}
private static Element readElementOnly(XMLStreamReader streamReader, Document document) {
Element workingElement = document.createElement(streamReader.getLocalName());
for (int attributeIndex = 0; attributeIndex < streamReader.getAttributeCount(); attributeIndex++) {
workingElement.setAttribute(
streamReader.getAttributeLocalName(attributeIndex),
streamReader.getAttributeValue(attributeIndex));
}
return workingElement;
}
private static void workingLoop(final XMLStreamReader streamReader, final Document document, final Element fragmentRoot)
throws XMLStreamException {
Element startElement = readElementOnly(streamReader, document);
fragmentRoot.appendChild(startElement);
Element currentElement = startElement;
while (streamReader.hasNext()) {
int event = streamReader.next();
switch (event) {
case START_DOCUMENT:
continue;
case START_ELEMENT:
Element workingElement = readElementOnly(streamReader, document);
currentElement.appendChild(workingElement);
currentElement = workingElement;
continue;
case CHARACTERS:
currentElement.setTextContent(streamReader.getText());
continue;
case END_ELEMENT:
if (currentElement != startElement) {
currentElement = (Element) currentElement.getParentNode();
continue;
} else {
handleDocument(document, startElement);
fragmentRoot.removeChild(startElement);
startElement = null;
return;
}
}
}
}
// THIS FUNCTION DOES SOMETHING MEANINGFUL
private static void handleDocument(Document document, Element startElement) {
System.out.println(stringify(document));
}
private static String stringify(Document document) {
try {
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
StreamResult result = new StreamResult(new StringWriter());
DOMSource source = new DOMSource(document);
transformer.transform(source, result);
String xmlString = result.getWriter().toString();
return xmlString;
} catch (Exception e) {
throw new RuntimeException(e);
}
}
}
EDIT: I made an incredibly silly mistake. It's fixed now. It's working but imperfect -- should be enough to lead you in a useful direction.
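If a partial DOM is more than you need, deep nesting can also be handled with the cursor API alone by tracking the open elements on a stack and handing each finished record to a callback. The following is a minimal sketch under assumptions: the RecScanner and Rec names are made up, Rec holds only three fields instead of the full Record/Summary object model from the question, and the element and attribute names (REC, UID, pub_info/pubyear, titles/title) are taken from the scrambled sample.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical scanner: emits one small object per <REC>, never the whole file.
class RecScanner {

    // Illustrative holder, deliberately smaller than the classes in the question.
    static final class Rec {
        String uid;
        String pubYear;
        final List<String> titles = new ArrayList<>();
    }

    static void scan(Reader in, Consumer<Rec> handler) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            Deque<String> path = new ArrayDeque<>();   // names of the currently open elements
            StringBuilder text = new StringBuilder();  // text of the innermost open element
            Rec current = null;
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT: {
                        String name = r.getLocalName();
                        path.push(name);
                        text.setLength(0);
                        if ("REC".equals(name)) {
                            current = new Rec();
                        } else if (current != null && "pub_info".equals(name)) {
                            current.pubYear = r.getAttributeValue(null, "pubyear");
                        }
                        break;
                    }
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        text.append(r.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT: {
                        String name = path.pop();
                        String parent = path.peek();   // null once the root is closed
                        if (current != null) {
                            if ("UID".equals(name) && "REC".equals(parent)) {
                                current.uid = text.toString().trim();
                            } else if ("title".equals(name) && "titles".equals(parent)) {
                                current.titles.add(text.toString().trim());
                            } else if ("REC".equals(name)) {
                                handler.accept(current);   // record complete, hand it over
                                current = null;
                            }
                        }
                        text.setLength(0);
                        break;
                    }
                    default:
                        break;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<records><REC><UID>WOS:000310438600004</UID><static_data><summary>"
                + "<pub_info pubyear=\"2012\"/><titles count=\"1\">"
                + "<title type=\"source\">JOURNAL OF UROLOGY</title></titles>"
                + "</summary></static_data></REC></records>";
        scan(new StringReader(xml), rec ->
                System.out.println(rec.uid + " " + rec.pubYear + " " + rec.titles));
    }
}
```

Checking the parent name is what keeps same-named elements at different depths apart, for example the name elements under summary versus those under publisher. For the real file you would pass a buffered file reader or stream instead of a StringReader.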

Consider using an XSLT 3.0 streaming transformation of the form:
<xsl:template name="main">
<xsl:stream href="bigInput.xml">
<xsl:for-each select="copy-of(/records/REC)">
<!-- process one record -->
</xsl:for-each>
</xsl:stream>
</xsl:template>
You can process this using Saxon-EE 9.6.
The "process one record" logic could use the Saxon SQL extension, or it could invoke an extension function: the context node will be a REC element with its contained tree, fully navigable within the subtree, but with no ability to navigate outside the REC element currently being processed.

How to refer dynamically to an XML file in XQuery in Saxon

I am using the XQuery processor Saxon.
Now we write our XQuery in a ".xqy" file where we refer to the XML file on which we will perform our XQuery.
Please see the example below:
for $x in doc("books.xml")/books/book
where $x/price>30
return $x/title
Now I want to use dynamically generated XML that is not stored at some path. Say, for example, I want to refer to the XML below, which is available as a string.
How to do that?
String book=
<books>
<book category="JAVA">
<title lang="en">Learn Java in 24 Hours</title>
<author>Robert</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="DOTNET">
<title lang="en">Learn .Net in 24 hours</title>
<author>Peter</author>
<year>2011</year>
<price>40.50</price>
</book>
<book category="XML">
<title lang="en">Learn XQuery in 24 hours</title>
<author>Robert</author>
<author>Peter</author>
<year>2013</year>
<price>50.00</price>
</book>
<book category="XML">
<title lang="en">Learn XPath in 24 hours</title>
<author>Jay Ban</author>
<year>2010</year>
<price>16.50</price>
</book>
</books>
Java code:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import javax.xml.xquery.XQConnection;
import javax.xml.xquery.XQDataSource;
import javax.xml.xquery.XQException;
import javax.xml.xquery.XQPreparedExpression;
import javax.xml.xquery.XQResultSequence;
import com.saxonica.xqj.SaxonXQDataSource;
public class XQueryTester {
public static void main(String[] args){
try {
execute();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (XQException e) {
e.printStackTrace();
}
}
private static void execute() throws FileNotFoundException, XQException{
InputStream inputStream = new FileInputStream(new File("books.xqy"));
XQDataSource ds = new SaxonXQDataSource();
XQConnection conn = ds.getConnection();
XQPreparedExpression exp = conn.prepareExpression(inputStream);
XQResultSequence result = exp.executeQuery();
while (result.next()) {
System.out.println(result.getItemAsString(null));
}
}
}

If you are looking for a way to bind the input (the context item) of the query using Java, I recommend using Saxon's S9API (the most intuitive API for XSLT, XPath and XQuery processing in Java).
Here is how to instantiate Saxon, compile the query, parse the input and evaluate the query with the input document bound as its context item:
// the Saxon processor object
Processor saxon = new Processor(false);
// compile the query
XQueryCompiler compiler = saxon.newXQueryCompiler();
XQueryExecutable exec = compiler.compile(new File("yours.xqy"));
// parse the string as a document node
DocumentBuilder builder = saxon.newDocumentBuilder();
String input = "<xml>...</xml>";
Source src = new StreamSource(new StringReader(input));
XdmNode doc = builder.build(src);
// instantiate the query, bind the input and evaluate
XQueryEvaluator query = exec.load();
query.setContextItem(doc);
XdmValue result = query.evaluate();
Note that if the Java code is generating the XML document, I strongly advise you, if possible, to use S9API to build the tree directly in memory instead of generating the XML document as a string and then parsing it.

Here is how I implemented it, as suggested in the answer above:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xquery.XQConnection;
import javax.xml.xquery.XQDataSource;
import javax.xml.xquery.XQException;
import javax.xml.xquery.XQPreparedExpression;
import javax.xml.xquery.XQResultSequence;
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XQueryCompiler;
import net.sf.saxon.s9api.XQueryEvaluator;
import net.sf.saxon.s9api.XQueryExecutable;
import net.sf.saxon.s9api.XdmNode;
import net.sf.saxon.s9api.XdmValue;
import com.saxonica.xqj.SaxonXQDataSource;
public class Xml { public static void main(String[] args){
try {
process();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (Exception e) {
e.printStackTrace();
}
}
public static void process() throws SaxonApiException, IOException{
// the Saxon processor object
Processor saxon = new Processor(false);
// compile the query
XQueryCompiler compiler = saxon.newXQueryCompiler();
XQueryExecutable exec;
exec = compiler.compile(new File("E:\\abc.xqy"));
// parse the string as a document node
DocumentBuilder builder = saxon.newDocumentBuilder();
String input = "<data><employee id=\"1\"><name>A</name>"
+ "<title>Manager</title></employee><employee id=\"2\"><name>B</name>"
+ "<title>Manager</title></employee></data>";
Source src = new StreamSource(new StringReader(input));
XdmNode doc = builder.build(src);
// instantiate the query, bind the input and evaluate
XQueryEvaluator query = exec.load();
query.setContextItem(doc);
XdmValue result = query.evaluate();
System.out.println(result.itemAt(0));
}
}

Prettify XML in org.w3c.dom.Document to file

Summary: I want to save an org.w3c.dom.Document to a file with nice indentation (pretty-print it). The code below, which uses a Transformer, does the job in some cases, but not in all cases (see example). Can you help me fix this?
I have an org.w3c.dom.Document (not org.jdom.Document) and want to automatically format it nicely and print it into a file. How can I do that? I tried this, but it doesn't work if there are additional newlines in the document:
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
public class Main {
public static void main(String[] args) {
try {
String input = "<asdf>\n\n<a>text</a></asdf>";
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new ByteArrayInputStream(input.getBytes()));
System.out.println("-- input -------------------\n" + input + "\n----------------------------");
System.out.println("-- output ------------------");
prettify(doc);
System.out.println("----------------------------");
} catch (Exception e) {}
}
public static void prettify(Document doc) {
try {
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
transformer.transform(new DOMSource(doc), new StreamResult(System.out));
} catch (Exception e) {}
}
}
I have directed the output to System.out so that you can run it easily wherever you want (for instance on Ideone.com). You can see that the output is not pretty. If I remove the \n\n from the input string, everything is fine. Also, the document usually doesn't come from a string but from a file, and it gets modified heavily before I want to prettify it.
This Transformer seems to be the right way, but I am missing something. Can you tell me, what I am doing wrong?
SSCCE output:
-- input -------------------
<asdf>
<a>text</a></asdf>
----------------------------
-- output ------------------
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<asdf>
<a>text</a>
</asdf>
----------------------------
Expected output:
-- input -------------------
<asdf>
<a>text</a></asdf>
----------------------------
-- output ------------------
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<asdf>
<a>text</a>
</asdf>
----------------------------

Try this. It needs org.apache.xml.serialize.XMLSerializer and org.apache.xml.serialize.OutputFormat (both from Apache Xerces), plus java.io.Writer and java.io.StringWriter:
OutputFormat format = new OutputFormat(document); //document is an instance of org.w3c.dom.Document
format.setLineWidth(65);
format.setIndenting(true);
format.setIndent(2);
Writer out = new StringWriter();
XMLSerializer serializer = new XMLSerializer(out, format);
serializer.serialize(document);
String formattedXML = out.toString();
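
If adding the Xerces serializer classes is not an option, a JDK-only alternative is to remove whitespace-only text nodes from the document before handing it to the Transformer, since those nodes appear to be what keeps the indenter in the question from producing clean output. This is a minimal sketch, assuming whitespace-only text nodes carry no meaning in your document. The names WhitespaceStripper, stripWhitespaceNodes and prettify are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of any library
final class WhitespaceStripper {

    /** Removes every text node that consists only of whitespace. */
    static void stripWhitespaceNodes(Document doc) throws Exception {
        NodeList blanks = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//text()[normalize-space(.) = '']", doc, XPathConstants.NODESET);
        for (int i = 0; i < blanks.getLength(); i++) {
            Node blank = blanks.item(i);
            blank.getParentNode().removeChild(blank);
        }
    }

    /** Strips blank text nodes, then serializes the document with indentation. */
    static String prettify(Document doc) throws Exception {
        stripWhitespaceNodes(doc);
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        // Implementation-specific property: understood by the JDK's built-in
        // transformer, other implementations may simply ignore it
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Same input as in the question
        String input = "<asdf>\n\n<a>text</a></asdf>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8)));
        System.out.println(prettify(doc));
    }
}
```

Note that prettify modifies the document it is given. If the original whitespace must be preserved, work on a copy (for example one obtained via cloneNode(true)).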

Java XML parser blocks (very unusual and strange!)

I have a very strange case:
I tried to parse several XHTML-conformant websites using the default Java XML parser(s). The test blocks during parsing (not during downloading).
Could this be a bug, or does the parser try to download additional referenced resources during parsing (which would be a "nice" anti-feature)?
With simple data, it works. (TEST1)
With complex data, it blocks. (TEST2)
(I tried en.wikipedia.org and validator.w3.org)
When blocking occurs, CPU is idle.
Tested with JDK6 and JDK7, same results.
Please see the test case below; the source is ready for copy + paste + run.
Source
import java.io.*;
import java.net.*;
import java.nio.charset.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import org.w3c.dom.*;
public class _XmlParsingBlocks {
private static Document parseXml(String data)
throws Exception {
Transformer t = TransformerFactory.newInstance().newTransformer();
DocumentBuilder b = DocumentBuilderFactory.newInstance().newDocumentBuilder();
DOMResult out = new DOMResult(b.newDocument());
t.transform(new StreamSource(new StringReader(data)), out);
return (Document) out.getNode();
}
private static byte[] streamToByteArray(InputStream is)
throws IOException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
for (;;) {
byte[] buffer = new byte[256];
int count = is.read(buffer);
if (count == -1) {
is.close();
break;
}
baos.write(buffer, 0, count);
}
return baos.toByteArray();
}
private static void test(byte[] data)
throws Exception {
String asString = new String(data, Charset.forName("UTF-8"));
System.out.println("===== PARSING STARTED =====");
Document doc = parseXml(asString);
System.out.println("===== PARSING ENDED =====");
}
public static void main(String[] args)
throws Exception {
{
System.out.println("********** TEST 1");
test("<html>test</html>".getBytes("UTF-8"));
}
{
System.out.println("********** TEST 2");
URL url = new URL("http://validator.w3.org/");
URLConnection connection = url.openConnection();
InputStream is = connection.getInputStream();
byte[] data = streamToByteArray(is);
System.out.println("===== DOWNLOAD FINISHED =====");
test(data);
}
}
}
Output
********** TEST 1
===== PARSING STARTED =====
===== PARSING ENDED =====
********** TEST 2
===== DOWNLOAD FINISHED =====
===== PARSING STARTED =====
[here it blocks]

In the last few months, W3C has started blocking requests for common DTDs such as the XHTML DTD, because it can't cope with the traffic generated. If you're not using a proxy server that caches the DTDs, you will need to use an EntityResolver or a catalog to redirect the references to a local copy.

Looking at the page you downloaded, it contains some more http: URLs.
This is the start:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
I could imagine that the XML parser is trying to download the referenced DTD here, to be able to validate the XML content.
Try adding the preamble to your simple document, or removing it from your complex one, to see if this changes anything.
Switch the parser to non-validating and see if this helps. (Alternatively, there are some options to configure how the parser behaves; setURIResolver looks promising, for example.)
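
Keep in mind that a non-validating parser may still fetch the external DTD, for example to pick up entity declarations, so switching validation off is not necessarily enough on its own. With the Xerces-based parser bundled with the JDK, the feature http://apache.org/xml/features/nonvalidating/load-external-dtd can be turned off as well. The following is a minimal sketch under that assumption; other parser implementations may not recognize the feature, and the class name OfflineDomParser is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class, not part of any library
final class OfflineDomParser {

    // Xerces-specific feature: when false, a non-validating parser
    // does not load the external DTD subset
    private static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    static Document parse(byte[] data) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        // Throws ParserConfigurationException if the parser in use
        // does not know this feature
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(data));
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n"
                + " \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>t</title></head><body/></html>";
        Document doc = parse(xhtml.getBytes(StandardCharsets.UTF_8));
        System.out.println(doc.getDocumentElement().getLocalName());
    }
}
```

The trade-off is that entities declared only in the DTD are then unknown to the parser, so a document that uses them may fail to parse.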

Solution: prefetch the DTDs (or better, use copies stored offline) and serve them through a custom EntityResolver.
When no external XML entities (such as &nbsp;) are expected to be used, an empty InputSource can be returned; see the inner enum. Otherwise, a prepared mapping of DTD URI -> byte array can be used to avoid downloading the DTDs.
Class
import java.io.*;
import java.util.*;
import javax.annotation.*;
import org.xml.sax.*;
public final class PrefetchedEntityResolver
implements EntityResolver {
/**
* NOTE: {@link #RETURN_NULL} seems to cause default behavior
* (which is: downloading the DTD);
* use {@link #RETURN_EMPTY_DATA} to ensure "offline" behavior
* (which could lead to entity parsing errors).
*/
public static enum NoMatchBehavior {
THROW_EXCEPTION, RETURN_NULL, RETURN_EMPTY_DATA;
}
private final SortedMap<String, byte[]> prefetched;
private final NoMatchBehavior noMatchBehavior;
public PrefetchedEntityResolver(NoMatchBehavior noMatchBehavior,
@Nullable SortedMap<String, byte[]> prefetched) {
this.noMatchBehavior = noMatchBehavior;
this.prefetched = new TreeMap<>(prefetched == null
? Collections.<String, byte[]>emptyMap() : prefetched);
}
@Override
public InputSource resolveEntity(String name, String uri)
throws SAXException, IOException {
byte[] data = prefetched.get(uri);
if (data == null) {
switch (noMatchBehavior) {
case RETURN_NULL:
return null;
case RETURN_EMPTY_DATA:
return new InputSource(new ByteArrayInputStream(new byte[]{}));
case THROW_EXCEPTION:
throw new SAXException("no prefetched DTD found for: " + uri);
default:
throw new Error("unsupported: " + noMatchBehavior.toString());
}
}
return new InputSource(new ByteArrayInputStream(data));
}
}
Usage
public static Document parseXml(byte[] data)
throws Exception {
DocumentBuilderFactory df = DocumentBuilderFactory.newInstance();
df.setValidating(false);
df.setXIncludeAware(false);
df.setCoalescing(false);
df.setExpandEntityReferences(false);
DocumentBuilder b = df.newDocumentBuilder();
b.setEntityResolver(new PrefetchedEntityResolver(
PrefetchedEntityResolver.NoMatchBehavior.RETURN_EMPTY_DATA,
/* pass some prepared SortedMap<String, byte[]> */));
ByteArrayInputStream bais = new ByteArrayInputStream(data);
return b.parse(bais);
}
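
To make the idea of a prepared mapping concrete, here is a self-contained sketch that serves a DTD from memory instead of from the network. It uses a lambda in place of the PrefetchedEntityResolver class, and the system ID http://example.invalid/mini.dtd, the one-line DTD and the class name PrefetchedDtdDemo are all invented for illustration. In a real application the byte arrays would presumably be loaded from DTD files shipped with the program.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.SortedMap;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of any library
final class PrefetchedDtdDemo {

    static Document parse(byte[] data, SortedMap<String, byte[]> prefetched) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) -> {
            byte[] dtd = prefetched.get(systemId);
            // Unknown DTDs resolve to empty content, so the parser never goes online
            return new InputSource(new ByteArrayInputStream(dtd != null ? dtd : new byte[0]));
        });
        return builder.parse(new ByteArrayInputStream(data));
    }

    public static void main(String[] args) throws Exception {
        // Made-up system ID mapped to a made-up DTD that declares a single entity
        SortedMap<String, byte[]> prefetched = new TreeMap<>();
        prefetched.put("http://example.invalid/mini.dtd",
                "<!ENTITY nbsp \"&#160;\">".getBytes(StandardCharsets.UTF_8));

        String xml = "<!DOCTYPE html SYSTEM \"http://example.invalid/mini.dtd\">"
                + "<html>a&nbsp;b</html>";
        Document doc = parse(xml.getBytes(StandardCharsets.UTF_8), prefetched);

        // The entity is expanded from the in-memory DTD, without any network access
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Keying the map by system ID mirrors the resolveEntity implementation above, which looks up its second argument.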

Perhaps your "count == -1" condition needs to become "count <= 0" ?
