Java program to convert a Text file to XML using XSL

Java program to convert a Text file to XML using XSL - java

I am trying to implement a small example where I want to convert content in a text file to XML file using XSL as transformer. I came across this example - XSL - create well formed xml from text file in SO and I was trying to implement the same but facing some issues.
I am using the same text file as input and the XSL file mentioned in answer of the SO post. This is the Java program I am trying to use:
public class Parser {
public static void main(String[] args) {
String path="src/";
String text = path+"input.txt";
String xslt = path+"input.xsl";
String output = path+"output.xml";
System.setProperty("javax.xml.transform.TransformerFactory",
"net.sf.saxon.TransformerFactoryImpl");
try {
TransformerFactory tf = TransformerFactory.newInstance();
Transformer tr = tf.newTransformer(new StreamSource(xslt));
tr.transform(new StreamSource(text), new StreamResult(
new FileOutputStream(output)));
System.out.println("Output to " + output);
} catch (Exception e) {
System.out.println(e);
e.printStackTrace();
}
}
}
I am getting exception as:
Error on line 1 column 1 of input.txt:
SXXP0003: Error reported by XML parser: Content is not allowed in prolog.
net.sf.saxon.trans.XPathException: org.xml.sax.SAXParseException: Content is not allowed in prolog.
net.sf.saxon.trans.XPathException: org.xml.sax.SAXParseException: Content is not allowed in prolog.
at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:418)
at net.sf.saxon.event.Sender.send(Sender.java:214)
at net.sf.saxon.event.Sender.send(Sender.java:50)
at net.sf.saxon.Controller.transform(Controller.java:1611)
at three.Parser.main(Parser.java:21)
Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:195)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:174)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:388)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1427)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1036)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:647)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:404)
... 4 more
It seems I cannot use the text file as input in my program. Can someone please help me in fixing the issue.
Update:
I have solved it using the Saxon S9 API (using Jar - saxon9he.jar) as suggested by Martin in his answer, here is the JAVA code that worked.
import java.io.File;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.QName;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.Serializer;
import net.sf.saxon.s9api.XsltCompiler;
import net.sf.saxon.s9api.XsltExecutable;
import net.sf.saxon.s9api.XsltTransformer;
public class Parser {
public static void main(String[] args) throws SaxonApiException {
Processor proc = new Processor(false);
XsltCompiler comp = proc.newXsltCompiler();
XsltExecutable exp = comp.compile(new StreamSource(new File(
"src/input.xsl")));
Serializer out = new Serializer();
out.setOutputProperty(Serializer.Property.METHOD, "xml");
out.setOutputProperty(Serializer.Property.INDENT, "yes");
out.setOutputFile(new File("src/output.xml"));
XsltTransformer trans = exp.load();
trans.setInitialTemplate(new QName("main"));
trans.setDestination(out);
trans.transform();
System.out.println("Output written to text file");
}
}

The code to transform text to XML depends on XSLT version 2.0 and an XSLT 2.0 processor like Saxon 9. The JAXP API you are trying to use is solely useful with an XSLT 1.0 approach of having an XML input document as the primary source to the XSLT code. Thus if you want to use that API then you need to make sure you pass a dummy input XML to the transformer, while the URI of the plain text file should be passed in as a parameter. I would however suggest to use the Saxon S9 API to simply start the stylesheet with a named template main, also passing in the plain text URI as a parameter.

You can't feed plain text to an XSL transformer. It only accepts well-formed XML as input.
So the code in the linked question starts the transformer with no input and then inside of XSLT, it loads the text with
<xsl:variable name="csv" select="unparsed-text($pathToCSV, $encoding)" />

Related

Which methods can be used to return valid and invalid XML data from a file in Java?

I have the following data that is supposed to be XML:
<?xml version="1.0" encoding="UTF-8"?>
<Product>
<id>1</id>
<description>A new product</description>
<price>123.45</price>
</Product>
<Product>
<id>1</id>
<description>A new product</description>
<price>123.45</price>
</Product>
<ProductTTTTT>
<id>1</id>
<description>A new product</description>
<price>123.45</price>
</Product>
<Product>
<id>1</id>
<description>A new product</description>
<price>123.45</price>
</ProductAAAAAA>
So, basically I have multiple root elements (product)...
The point is that I'm trying to transform this data into 2 XML documents, 1 for valid nodes and other for invalid nodes.
Valid node:
<Product>
...
</Product>
Invalid nodes: <ProductTTTTT>...</Product> and <Product>...</ProductAAAAAA>
Then I am thinking how I can achieve this using JAVA (not web).
If I am not wrong, validating it with a XSD will invalidate the whole file, so not an option.
Using default JAXB parser (unmarshaller) will lead to item above since internally it creates a XSD of my entity.
Using XPath just (from what I know) will just return the whole file, I did not find a way to get something like GET !VALID (It is just to explain...)
Using XQuery (maybe?).. by the way, how to use XQuery with JAXB?
XSL(T) will lead to same thing on XPath, since it uses XPath to select the content.
So... which method can I use to achieve the objective? (And if possible, provide links or code please)

Firstly, you're confusing valid and well-formed. You say you want to find invalid elements, but your examples aren't just invalid, they are ill-formed. That means that no XML parser is going to do anything with them other than throwing an error message at you. You can't use JAXB or XPath, or XQuery, or XSLT, or anything to process something that isn't XML.
You say "unfortunately I do not have access to the system that sends this xml format". I'm not sure why you call it an XML format: it isn't. I also don't understand why you (and many others on StackOverflow) are prepared to spend your time digging in garbage like this rather than telling the sender to get their act together. If you were served a salad with maggots in it, would you try to pick them out, or would you send it back for replacement? You should adopt a zero-tolerance approach to bad data; that's the only way senders will learn to improve the quality.

If the file contains lines with start and end tags who's name begins with "Product", you could:
use a file scanner to split this document into individual pieces whenever a line starts with <Product or </Product
attempt to parse the extracted text as XML using an XML API.
If it succeeds, add that object to a list of "good" well-formed XML documents
then perform any additional schema validation or validity checks
If it throws a parse error, catch it, and add that snippet of text to the list of "bad" items that need to be cleaned up or otherwise handled
An example to get you started:
package com.stackoverflow.questions.52012383;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
public class FileSplitter {
public static void parseFile(File file, String elementName)
throws ParserConfigurationException, IOException {
List<Document> good = new ArrayList<>();
List<String> bad = new ArrayList<>();
String start-tag = "<" + elementName;
String end-tag = "</" + elementName;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
StringBuffer buffer = new StringBuffer();
String line;
boolean append = false;
try (Scanner scanner = new Scanner(file)) {
while (scanner.hasNextLine()) {
line = scanner.nextLine();
if (line.startsWith(startTag)) {
append = true; //start accumulating content
} else if (line.startsWith(endTag)) {
append = false;
buffer.append(line);
//instead of the line above, you could hard-code the ending tag to compensate for bad data:
// buffer.append(endTag + ">");
try { // to parse as XML
builder = factory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(buffer.toString())));
good.add(document); // parsed successfully, add it to the good list
buffer.setLength(0); //reset the buffer to start a new XML doc
} catch (SAXException ex) {
bad.add(buffer.toString()); // something is wrong, not well-formed XML
}
}
if (append) { // accumulate content
buffer.append(line);
}
}
System.out.println("Good items: " + good.size() + " Bad items: " + bad.size());
//do stuff with the good/bad results...
}
}
public static void main(String args[])
throws ParserConfigurationException, IOException {
File file = new File("/tmp/test.xml");
parseFile(file, "Product");
}
}

How to append XML (String) to XmlEventWriter (StAX) which already created the start of document

In order to create big XML files, we decided to make use of the StAX API. The basic structure is build by using the low-level api's: createStartDocument(), createStartElement(). This works as expected.
However, in some cases we like to append existing XML data which resides in a String (retrieved from database). The following snippet illustrates this:
import java.lang.*;
import java.io.*;
import javax.xml.stream.*;
public class Example {
public static void main(String... args) throws XMLStreamException {
XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
XMLEventFactory eventFactory = XMLEventFactory.newInstance();
StringWriter writer = new StringWriter();
XMLEventWriter eventWriter = outputFactory.createXMLEventWriter(writer);
eventWriter.add(eventFactory.createStartDocument("UTF-8", "1.0"));
eventWriter.add(eventFactory.createStartElement("ns0", "http://example.org", "root"));
eventWriter.add(eventFactory.createNamespace("ns0", "http://example.org"));
//In here, we want to append a piece of XML which is stored in a string variable.
String xml = "<fragments><fragment><data>This is pure text.</data></fragment></fragments>";
eventWriter.add(inputFactory.createXMLEventReader(new StringReader(xml)));
eventWriter.add(eventFactory.createEndDocument());
System.out.println(writer.toString());
}
}
With the above code, depending on the implementation, we are not getting the expected result:
Woodstox: The following exception is thrown:'Can not output XML declaration, after output has already been done'. It seems that the XMLEventReader starts off with a startDocument event, but since a startDocument event was already triggered programatically, it throws the error.
JDK: It appends <?xml version="1.0" ... <fragments><fragment>... -> Which leads to invalid XML.
I have also tried to append the XML by using:
eventFactory.createCharacters(xml);
The problem here is that even though the XML is appended, the < and > are transformed into &lt and &gt. Therefore, this results in invalid XML.
Am I missing an API that allows me to simply append a String as XML?

You can first consume any StartDocument if necessary:
String xml = "<fragments><fragment><data>This is pure text.</data></fragment></fragments>";
XMLEventReader xer = inputFactory.createXMLEventReader(new StringReader(xml));
if (xer.peek().isStartDocument())
{
xer.nextEvent();
}
eventWriter.add(xer);

Parsing XML file containing HTML entities in Java without changing the XML

I have to parse a bunch of XML files in Java that sometimes -- and invalidly -- contain HTML entities such as —, > and so forth. I understand the correct way of dealing with this is to add suitable entity declarations to the XML file before parsing. However, I can't do that as I have no control over those XML files.
Is there some kind of callback I can override that is invoked whenever the Java XML parser encounters such an entity? I haven't been able to find one in the API.
I'd like to use:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = dbf.newDocumentBuilder();
Document doc = parser.parse( stream );
I found that I can override resolveEntity in org.xml.sax.helpers.DefaultHandler, but how do I use this with the higher-level API?
Here's a full example:
public class Main {
public static void main( String [] args ) throws Exception {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = dbf.newDocumentBuilder();
Document doc = parser.parse( new FileInputStream( "test.xml" ));
}
}
with test.xml:
<?xml version="1.0" encoding="UTF-8"?>
<foo>
<bar>Some text — invalid!</bar>
</foo>
Produces:
[Fatal Error] :3:20: The entity "nbsp" was referenced, but not declared.
Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 20; The entity "nbsp" was referenced, but not declared.
Update: I have been poking around in the JDK source code with a debugger, and boy, what an amount of spaghetti. I have no idea what the design is there, or whether there is one. Just how many layers of an onion can one layer on top of each other?
They key class seems to be com.sun.org.apache.xerces.internal.impl.XMLEntityManager, but I cannot find any code that either lets me add stuff into it before it gets used, or that attempts to resolve entities without going through that class.

I would use a library like Jsoup for this purpose. I tested the following below and it works. I don't know if this helps. It can be located here: http://jsoup.org/download
public static void main(String args[]){
String html = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo>" +
"<bar>Some text — invalid!</bar></foo>";
Document doc = Jsoup.parse(html, "", Parser.xmlParser());
for (Element e : doc.select("bar")) {
System.out.println(e);
}
}
Result:
<bar>
Some text — invalid!
</bar>
Loading from a file can be found here:
http://jsoup.org/cookbook/input/load-document-from-file

Issue - 1: I have to parse a bunch of XML files in Java that sometimes -- and
invalidly -- contain HTML entities such as —
XML has only five predefined entities. The —, is not among them. It works only when used in plain HTML or in legacy JSP. So, SAX will not help. It can be done using StaX which has high level iterator based API. (Collected from this link)
Issue - 2: I found that I can override resolveEntity in
org.xml.sax.helpers.DefaultHandler, but how do I use this with the
higher-level API?
Streaming API for XML, called StaX, is an API for reading and writing XML Documents.
StaX is a Pull-Parsing model. Application can take the control over parsing the XML documents by pulling (taking) the events from the parser.
The core StaX API falls into two categories and they are listed below. They are
Cursor based API: It is low-level API. cursor-based API allows the application to process XML as a stream of tokens aka events
Iterator based API: The higher-level iterator-based API allows the application to process XML as a series of event objects, each of which communicates a piece of the XML structure to the application.
STaX API has support for the notion of not replacing character entity references, by way of the IS_REPLACING_ENTITY_REFERENCES property:
Requires the parser to replace internal entity references with their
replacement text and report them as characters
This can be set into an XmlInputFactory, which is then in turn used to construct an XmlEventReader or XmlStreamReader.
However, the API is careful to say that this property is only intended to force the implementation to perform the replacement, rather than forcing it to notreplace them.
You may try it. Hope it will solve your issue. For your case,
Main.java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.EntityReference;
import javax.xml.stream.events.XMLEvent;
public class Main {
public static void main(String[] args) {
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
inputFactory.setProperty(
XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
XMLEventReader reader;
try {
reader = inputFactory
.createXMLEventReader(new FileInputStream("F://test.xml"));
while (reader.hasNext()) {
XMLEvent event = reader.nextEvent();
if (event.isEntityReference()) {
EntityReference ref = (EntityReference) event;
System.out.println("Entity Reference: " + ref.getName());
}
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (XMLStreamException e) {
e.printStackTrace();
}
}
}
test.xml:
<?xml version="1.0" encoding="UTF-8"?>
<foo>
<bar>Some text — invalid!</bar>
</foo>
Output:
Entity Reference: nbsp
Entity Reference: mdash
Credit goes to #skaffman.
Related Link:
http://www.journaldev.com/1191/how-to-read-xml-file-in-java-using-java-stax-api
http://www.journaldev.com/1226/java-stax-cursor-based-api-read-xml-example
http://www.vogella.com/tutorials/JavaXML/article.html
Is there a Java XML API that can parse a document without resolving character entities?
UPDATE:
Issue - 3: Is there a way to use StaX to "filter" the entities (replacing them
with something else, for example) and still produce a Document at the
end of the process?
To create a new document using the StAX API, it is required to create an XMLStreamWriter that provides methods to produce XML opening and closing tags, attributes and character content.
There are 5 methods of XMLStreamWriter for document.
xmlsw.writeStartDocument(); - initialises an empty document to which
elements can be added
xmlsw.writeStartElement(String s) -creates a new element named s
xmlsw.writeAttribute(String name, String value)- adds the attribute
name with the corresponding value to the last element produced by a
call to writeStartElement. It is possible to add attributes as long
as no call to writeElementStart,writeCharacters or writeEndElement
has been done.
xmlsw.writeEndElement - close the last started element
xmlsw.writeCharacters(String s) - creates a new text node with
content s as content of the last started element.
A sample example is attached with it:
StAXExpand.java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import java.util.Arrays;
public class StAXExpand {
static XMLStreamWriter xmlsw = null;
public static void main(String[] argv) {
try {
xmlsw = XMLOutputFactory.newInstance()
.createXMLStreamWriter(System.out);
CompactTokenizer tok = new CompactTokenizer(
new FileReader(argv[0]));
String rootName = "dummyRoot";
// ignore everything preceding the word before the first "["
while(!tok.nextToken().equals("[")){
rootName=tok.getToken();
}
// start creating new document
xmlsw.writeStartDocument();
ignorableSpacing(0);
xmlsw.writeStartElement(rootName);
expand(tok,3);
ignorableSpacing(0);
xmlsw.writeEndDocument();
xmlsw.flush();
xmlsw.close();
} catch (XMLStreamException e){
System.out.println(e.getMessage());
} catch (IOException ex) {
System.out.println("IOException"+ex);
ex.printStackTrace();
}
}
public static void expand(CompactTokenizer tok, int indent)
throws IOException,XMLStreamException {
tok.skip("[");
while(tok.getToken().equals("#")) {// add attributes
String attName = tok.nextToken();
tok.nextToken();
xmlsw.writeAttribute(attName,tok.skip("["));
tok.nextToken();
tok.skip("]");
}
boolean lastWasElement=true; // for controlling the output of newlines
while(!tok.getToken().equals("]")){ // process content
String s = tok.getToken().trim();
tok.nextToken();
if(tok.getToken().equals("[")){
if(lastWasElement)ignorableSpacing(indent);
xmlsw.writeStartElement(s);
expand(tok,indent+3);
lastWasElement=true;
} else {
xmlsw.writeCharacters(s);
lastWasElement=false;
}
}
tok.skip("]");
if(lastWasElement)ignorableSpacing(indent-3);
xmlsw.writeEndElement();
}
private static char[] blanks = "\n".toCharArray();
private static void ignorableSpacing(int nb)
throws XMLStreamException {
if(nb>blanks.length){// extend the length of space array
blanks = new char[nb+1];
blanks[0]='\n';
Arrays.fill(blanks,1,blanks.length,' ');
}
xmlsw.writeCharacters(blanks, 0, nb+1);
}
}
CompactTokenizer.java
import java.io.Reader;
import java.io.IOException;
import java.io.StreamTokenizer;
public class CompactTokenizer {
private StreamTokenizer st;
CompactTokenizer(Reader r){
st = new StreamTokenizer(r);
st.resetSyntax(); // remove parsing of numbers...
st.wordChars('\u0000','\u00FF'); // everything is part of a word
// except the following...
st.ordinaryChar('\n');
st.ordinaryChar('[');
st.ordinaryChar(']');
st.ordinaryChar('#');
}
public String nextToken() throws IOException{
st.nextToken();
while(st.ttype=='\n'||
(st.ttype==StreamTokenizer.TT_WORD &&
st.sval.trim().length()==0))
st.nextToken();
return getToken();
}
public String getToken(){
return (st.ttype == StreamTokenizer.TT_WORD) ? st.sval : (""+(char)st.ttype);
}
public String skip(String sym) throws IOException {
if(getToken().equals(sym))
return nextToken();
else
throw new IllegalArgumentException("skip: "+sym+" expected but"+
sym +" found ");
}
}
For more, you can follow the tutorial
https://docs.oracle.com/javase/tutorial/jaxp/stax/example.html
http://www.ibm.com/developerworks/library/x-tipstx2/index.html
http://www.iro.umontreal.ca/~lapalme/ForestInsteadOfTheTrees/HTML/ch09s03.html
http://staf.sourceforge.net/current/STAXDoc.pdf

Another approach, since you're not using a rigid OXM approach anyway.
You might want to try using a less rigid parser such as JSoup?
This will stop immediate problems with invalid XML schemas etc, but it will just devolve the problem into your code.

Just to throw in a different approach to a solution:
You might envelope your input stream with a stream inplementation that replaces the entities by something legal.
While this is a hack for sure, it should be a quick and easy solution (or better say: workaround).
Not as elegant and clean as a xml framework internal solution, though.

I made yesterday something similar i need to add value from unziped XML in stream to database.
//import I'm not sure if all are necessary :)
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
//I didnt checked this code now because i'm in work for sure its work maybe
you will need to do little changes
InputSource is = new InputSource(new FileInputStream("test.xml"));
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(is);
XPathFactory xpf = XPathFactory.newInstance();
XPath xpath = xpf.newXPath();
String words= xpath.evaluate("/foo/bar", doc.getDocumentElement());
ParsingHexToChar.parseToChar(words);
// lib which i use common-lang3.jar
//metod to parse
public static String parseToChar( String words){
String decode= org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4(words);
return decode;
}

Try this using org.apache.commons package :
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = dbf.newDocumentBuilder();
InputStream in = new FileInputStream(xmlfile);
String unescapeHtml4 = IOUtils.toString(in);
CharSequenceTranslator obj = new AggregateTranslator(new LookupTranslator(EntityArrays.ISO8859_1_UNESCAPE()),
new LookupTranslator(EntityArrays.HTML40_EXTENDED_UNESCAPE())
);
unescapeHtml4 = obj.translate(unescapeHtml4);
StringReader readerInput= new StringReader(unescapeHtml4);
InputSource is = new InputSource(readerInput);
Document doc = parser.parse(is);

Creating XML from XPATH

I am a new bee with xml and XSL, work with legacy platforms...
I am looking for a solution to create XML from XPATH. Happen to see this post How to Generate an XML File from a set of XPath Expressions?
which helped me a lot.
Similar to the request discussed under "Comments" section, I am trying to pass whole XSLT as string, and receiving result as a sting back using Saxon. Receiving result as string, no issues. But when passing XSL as string, it complain about "document()" which is part of < xsl:variable name="vStylesheet" select="document('')" />. Error is "SXXP0003: Error reported by XML parser: Content is not allowed in prolog."
My basic requirement is I should be able to pass XSL (whole file or "vPop" portion) as a string and should receive result in another string without involving any files. That way I can improve the performance and make it generic so that anyone in our shop who does not know how to deal with XML can still generate one...
My java code looks like..
public static String simpleTransform(final String xsltStr) {
String strResult = "";
TransformerFactory tFactory = TransformerFactory.newInstance();
String tempXMLStr = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<OpnXMLTAG>Dummy</OpnXMLTAG>";
try {
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
StreamSource XSLSource = new StreamSource(new StringReader(xsltStr));
Transformer transformer = tFactory.newTransformer(XSLSource );
transformer.transform(new StreamSource(new StringReader(tempXMLStr)), result);
strResult = writer.toString();
} catch (Exception e) {
e.printStackTrace();
}
return strResult;
}
And XSLT string I am passing is the same from earlier post.

When you call document(''), it treats '' as a relative URI reference to be resolved against the base URI of the stylesheet. This won't work if the base URI of the stylesheet is unknown, which is the case when you supply it as a StreamSource wrapping a StringReader with no systemId.
In the case of Saxon, document('') actually needs to re-read the stylesheet. It doesn't keep the source file around at run-time just in case it's needed. So you'll need to (a) supply a URI as the systemId property on the StreamSource (any URI will do, it won't actually be read), and (b) supply a URIResolver to resolve the call on document('') and supply the original string.

StAX XML formatting in Java

Is it possible using StAX (specifically woodstox) to format the output xml with newlines and tabs, i.e. in the form:
<element1>
<element2>
someData
</element2>
</element1>
instead of:
<element1><element2>someData</element2></element1>
If this is not possible in woodstox, is there any other lightweight libs that can do this?

There is com.sun.xml.txw2.output.IndentingXMLStreamWriter
XMLOutputFactory xmlof = XMLOutputFactory.newInstance();
XMLStreamWriter writer = new IndentingXMLStreamWriter(xmlof.createXMLStreamWriter(out));

Using the JDK Transformer:
public String transform(String xml) throws XMLStreamException, TransformerException
{
Transformer t = TransformerFactory.newInstance().newTransformer();
t.setOutputProperty(OutputKeys.INDENT, "yes");
t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
Writer out = new StringWriter();
t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
return out.toString();
}

Via the JDK: transformer.setOutputProperty(OutputKeys.INDENT, "yes");.

If you're using the StAX cursor API, you can indent the output by wrapping the XMLStreamWriter in an indenting proxy. I tried this in my own project and it worked nicely.

Rather than relying on a com.sun...class that might go away (or get renamed com.oracle...class), I recommend downloading the StAX utility classes from java.net. This package contains a IndentingXMLStreamWriter class that works nicely. (Source and javadoc are included in the download.)

How about StaxMate:
http://www.cowtowncoder.com/blog/archives/2006/09/entry_21.html
Works well with Woodstox, fast, low-memory usage (no in-memory tree built), and indents like so:
SMOutputFactory sf = new SMOutputFactory(XMLOutputFactory.newInstance());
SMOutputDocument doc = sf.createOutputDocument(new FileOutputStream("output.xml"));
doc.setIndentation("\n ", 1, 2); // for unix linefeed, 2 spaces per level
// write doc like:
SMOutputElement root = doc.addElement("element1");
root.addElement("element2").addCharacters("someData");
doc.closeRoot(); // important, flushes, closes output

If you're using the iterating method (XMLEventReader), can't you just attach a new line '\n' character to the relevant XMLEvents when writing to your XML file?

Not sure about stax, but there was a recent discussion about pretty printing xml here
pretty print xml from java
this was my attempt at a solution
How to pretty print XML from Java?
using the org.dom4j.io.OutputFormat.createPrettyPrint() method

if you are using XMLEventWriter, then an easier way to do that is:
XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
XMLEventWriter writer = outputFactory.createXMLEventWriter(w);
XMLEventFactory eventFactory = XMLEventFactory.newInstance();
Characters newLine = eventFactory.createCharacters("\n");
writer.add(startRoot);
writer.add(newLine);

With Spring Batch this requires a subclass since this JIRA BATCH-1867
public class IndentingStaxEventItemWriter<T> extends StaxEventItemWriter<T> {
#Setter
#Getter
private boolean indenting = true;
#Override
protected XMLEventWriter createXmlEventWriter( XMLOutputFactory outputFactory, Writer writer) throws XMLStreamException {
if ( isIndenting() ) {
return new IndentingXMLEventWriter( super.createXmlEventWriter( outputFactory, writer ) );
}
else {
return super.createXmlEventWriter( outputFactory, writer );
}
}
}
But this requires an additionnal dependency because Spring Batch does not include the code to indent the StAX output:
<dependency>
<groupId>net.java.dev.stax-utils</groupId>
<artifactId>stax-utils</artifactId>
<version>20070216</version>
</dependency>

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Java program to convert a Text file to XML using XSL - java

You can't feed plain text to an XSL transformer. It only accepts well-formed XML as input. So the code in the linked question starts the transformer with no input and then inside of XSLT, it loads the text with <xsl:variable name="csv" select="unparsed-text($pathToCSV, $encoding)" />

Related

Which methods can be used to return valid and invalid XML data from a file in Java?

How to append XML (String) to XmlEventWriter (StAX) which already created the start of document

Parsing XML file containing HTML entities in Java without changing the XML

Creating XML from XPATH

StAX XML formatting in Java

Categories

Resources