Creating XML from XPATH - java

I am a new bee with xml and XSL, work with legacy platforms...
I am looking for a solution to create XML from XPATH. Happen to see this post How to Generate an XML File from a set of XPath Expressions?
which helped me a lot.
Similar to the request discussed under "Comments" section, I am trying to pass whole XSLT as string, and receiving result as a sting back using Saxon. Receiving result as string, no issues. But when passing XSL as string, it complain about "document()" which is part of < xsl:variable name="vStylesheet" select="document('')" />. Error is "SXXP0003: Error reported by XML parser: Content is not allowed in prolog."
My basic requirement is I should be able to pass XSL (whole file or "vPop" portion) as a string and should receive result in another string without involving any files. That way I can improve the performance and make it generic so that anyone in our shop who does not know how to deal with XML can still generate one...
My java code looks like..
public static String simpleTransform(final String xsltStr) {
String strResult = "";
TransformerFactory tFactory = TransformerFactory.newInstance();
String tempXMLStr = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<OpnXMLTAG>Dummy</OpnXMLTAG>";
try {
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
StreamSource XSLSource = new StreamSource(new StringReader(xsltStr));
Transformer transformer = tFactory.newTransformer(XSLSource );
transformer.transform(new StreamSource(new StringReader(tempXMLStr)), result);
strResult = writer.toString();
} catch (Exception e) {
e.printStackTrace();
}
return strResult;
}
And XSLT string I am passing is the same from earlier post.

When you call document(''), it treats '' as a relative URI reference to be resolved against the base URI of the stylesheet. This won't work if the base URI of the stylesheet is unknown, which is the case when you supply it as a StreamSource wrapping a StringReader with no systemId.
In the case of Saxon, document('') actually needs to re-read the stylesheet. It doesn't keep the source file around at run-time just in case it's needed. So you'll need to (a) supply a URI as the systemId property on the StreamSource (any URI will do, it won't actually be read), and (b) supply a URIResolver to resolve the call on document('') and supply the original string.

Related

How to append XML (String) to XmlEventWriter (StAX) which already created the start of document

In order to create big XML files, we decided to make use of the StAX API. The basic structure is build by using the low-level api's: createStartDocument(), createStartElement(). This works as expected.
However, in some cases we like to append existing XML data which resides in a String (retrieved from database). The following snippet illustrates this:
import java.lang.*;
import java.io.*;
import javax.xml.stream.*;
public class Example {
public static void main(String... args) throws XMLStreamException {
XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
XMLEventFactory eventFactory = XMLEventFactory.newInstance();
StringWriter writer = new StringWriter();
XMLEventWriter eventWriter = outputFactory.createXMLEventWriter(writer);
eventWriter.add(eventFactory.createStartDocument("UTF-8", "1.0"));
eventWriter.add(eventFactory.createStartElement("ns0", "http://example.org", "root"));
eventWriter.add(eventFactory.createNamespace("ns0", "http://example.org"));
//In here, we want to append a piece of XML which is stored in a string variable.
String xml = "<fragments><fragment><data>This is pure text.</data></fragment></fragments>";
eventWriter.add(inputFactory.createXMLEventReader(new StringReader(xml)));
eventWriter.add(eventFactory.createEndDocument());
System.out.println(writer.toString());
}
}
With the above code, depending on the implementation, we are not getting the expected result:
Woodstox: The following exception is thrown:'Can not output XML declaration, after output has already been done'. It seems that the XMLEventReader starts off with a startDocument event, but since a startDocument event was already triggered programatically, it throws the error.
JDK: It appends <?xml version="1.0" ... <fragments><fragment>... -> Which leads to invalid XML.
I have also tried to append the XML by using:
eventFactory.createCharacters(xml);
The problem here is that even though the XML is appended, the < and > are transformed into &lt and &gt. Therefore, this results in invalid XML.
Am I missing an API that allows me to simply append a String as XML?
You can first consume any StartDocument if necessary:
String xml = "<fragments><fragment><data>This is pure text.</data></fragment></fragments>";
XMLEventReader xer = inputFactory.createXMLEventReader(new StringReader(xml));
if (xer.peek().isStartDocument())
{
xer.nextEvent();
}
eventWriter.add(xer);

Marshalling CDATA elements with CDATA_SECTION_ELEMENTS adds carriage return characters

I'm working on an application that exports and imports data from / to a DB. The format of the data extract is XML and I'm using JAXB for the serialization / (un)marshalling. I want some elements to be marshalled as CDATA elements and am using this solution which sets OutputKeys.CDATA_SECTION_ELEMENTS to the Transformer properties.
So far this was working quite well, but now I came to a field in the DB that itself contains an XML string, which I also would like to place inside of a CDATA element. Now, for some reason the Transformer is now adding some unnecessary carriage return characters (\r) to each line end, so that the output looks like this:
This is my code:
private static final String IDENT_LENGTH = "3";
private static final String CDATA_XML_ELEMENTS = "text definition note expression mandatoryExpression optionalExpression settingsXml";
public static void marshall(final Object rootObject, final Schema schema, final Writer writer) throws Exception {
final JAXBContext ctx = JAXBContext.newInstance(rootObject.getClass());
final Document document = createDocument();
final Marshaller marshaller = ctx.createMarshaller();
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
marshaller.setSchema(schema);
marshaller.marshal(rootObject, document);
createTransformer().transform(new DOMSource(document), new StreamResult(writer));
}
private static Document createDocument() throws ParserConfigurationException {
// the DocumentBuilderFactory is actually being hold in a singleton
final DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
return builderFactory.newDocumentBuilder().newDocument();
}
private static Transformer createTransformer() throws TransformerConfigurationException, TransformerFactoryConfigurationError {
// the TransformerFactory is actually being hold in a singleton
final TransformerFactory transformerFactory = TransformerFactory.newInstance();
final Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.STANDALONE, "yes");
transformer.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, CDATA_XML_ELEMENTS);
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", IDENT_LENGTH);
return transformer;
}
I'm passing a FileWriter to the marshall method.
My annotated model class looks like this:
#XmlType
#XmlRootElement
public class DashboardSettings {
#XmlElement
private String settingsXml;
public String getSettingsXml() {
return settingsXml;
}
public void setSettingsXml(final String settingsXml) {
this.settingsXml = settingsXml;
}
}
NOTE:
The XML string coming from the DB has Windows style line endings, i.e. \r and \n. I have the feeling that JAXB expects currently Linux style input (i. e. only \n) and is therefore adding a \r character because I'm currently running on a Windows machine. So the question is actually, what's the best way to solve this? Is there any parameter I can pass to control the line ending characters when marshalling? Or should I convert the line endings to Linux style prior marshalling? How will my program behave on different platforms (Windows, Linux, Mac OS)?
I don't necessarily need a platform independent solution, it's OK if the output is in Windows, Linux or whatever style. What I want to avoid is the combination \r\r\n as shown in the above screenshot.
I realise this question is pretty old, but I ran into a similar problem, so maybe an answer can help someone else.
It seems to be an issue with CDATA sections. In my case, I was using the createCDATASection method to create them. When the code was running on a Windows machine, an additional CR was added, as in your example.
I've tried a bunch of things to solve this "cleanly", to no avail.
In my project, the XML document was then exported to a string to POST to a Linux server. So once the string was generated, I just removed the CR characters, leaving only the LF:
myXmlString.replaceAll("\\r", "");
I might not be an appropriate solution for the specific question, but once again, it may help you (or someone else) find a solution.
Note: I'm stuck with Java 7 for this specific project, so it may have been fixed in a more recent version.

XML parsing java confirmation

this is the way i made an XML file to a Java object(s).
i used "xjc" and a valid XML schema, and i got back some "generated" *.java files.
i imported them into a different package in eclipse.
I am reading the XML file in 2 way now.
1) Loading the XML file:
System.out.println("Using FILE approach:");
File f = new File ("C:\\test_XML_files\\complex.apx");
JAXBElement felement = (JAXBElement) u.unmarshal(f);
MyObject fmainMyObject = (MyObject) felement.getValue ();
2) Using a DOM buider:
System.out.println("Using DOM BUILDER Approach:");
JAXBElement element = (JAXBElement) u.unmarshal(test());;
MyObject mainMyObject = (MyObject ) element.getValue ();
now in method "test()" the code below is included:
public static Node test(){
Document document = parseXmlDom();
return document.getFirstChild();
}
private static Document parseXmlDom() {
Document document = null;
try {
// getting the default implementation of DOM builder
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
// parsing the XML file
document = builder.parse(new File("C:\\test_XML_files\\MyXML_FILE.apx"));
} catch (Exception e) {
// catching all exceptions
System.out.println();
System.out.println(e.toString());
}
return document;
}
is this the standard way of doing XML to an Java Object?
I tested if I could access the object and everything works fine. (so far)
Do you suggest a different approach?? or is this one sufficient?
I don't know about a "standard way", but either way looks OK to me. The first way looks simpler ( less code ) so that's the way I'd probably do it, unless there were other factors / requirements.
FWIW, I'd expect that the unmarshal(File) method was implemented to do pretty much what you are doing in your second approach.
However, it is really up to you (and your co-workers) to make judgments about what is "sufficient" for your project.

Tagsoup fails to parse html document from a StringReader ( java )

I have this function:
private Node getDOM(String str) throws SearchEngineException {
DOMResult result = new DOMResult();
try {
XMLReader reader = new Parser();
reader.setFeature(Parser.namespacesFeature, false);
reader.setFeature(Parser.namespacePrefixesFeature, false);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(new SAXSource(reader,new InputSource(new StringReader(str))), result);
} catch (Exception ex) {
throw new SearchEngineException("NukatSearchEngine.getDom: " + ex.getMessage());
}
return result.getNode();
}
It takes a String that contains the html document sent by the http server after a POST request, but fails to parse it properly - I only get like four nodes from the entire document. The string itself looks fine - if I print it out and copypasta it into a text document I see the page I expected.
When I use an overloaded version of the above method:
private Node getDOM(URL url) throws SearchEngineException {
DOMResult result = new DOMResult();
try {
XMLReader reader = new Parser();
reader.setFeature(Parser.namespacesFeature, false);
reader.setFeature(Parser.namespacePrefixesFeature, false);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(new SAXSource(reader, new InputSource(url.openStream())), result);
} catch (Exception ex) {
throw new SearchEngineException("NukatSearchEngine.getDom: " + ex.getMessage());
}
return result.getNode();
}
then everything works just fine - I get a proper DOM tree, but I need to somehow retrieve the POST answer from server.
Storing the string in a file and reading it back does not work - still getting the same results.
What could be the problem?
Is it maybe a problem with the xml encoding?
This seems like an encoding problem. In the code example of yours that doesn't work you're passing the url as a string into the constructor, which uses it as the systemId, and you get problems with Tagsoup parsing the html. In the example that works you're passing the stream in to the InputSource constructor. The difference is that when you pass in the stream then the SAX implementation can figure out the encoding from the stream.
If you want to test this you could try these steps:
Stream the html you're parsing through a java.io.InputStreamReader and call getEncoding on it to see what encoding it detects.
In your first example code, call setEncoding on the InputSource passing in the encoding that the inputStreamReader reported.
See if the first example, changed to explicitly set the encoding, parses the html correctly.
There's a discussion of this toward the end of an article on using the SAX InputSource.
To get a POST response you first need to do a POST request, new InputSource(url.openStream()) probably opens a connection and reads the response from a GET request. Check out Sending a POST Request Using a URL.
Other possibilities that might be interesting to check out for doing POST requests and getting the response:
Jersey Web Client
HtmlUnit

StAX XML formatting in Java

Is it possible using StAX (specifically woodstox) to format the output xml with newlines and tabs, i.e. in the form:
<element1>
<element2>
someData
</element2>
</element1>
instead of:
<element1><element2>someData</element2></element1>
If this is not possible in woodstox, is there any other lightweight libs that can do this?
There is com.sun.xml.txw2.output.IndentingXMLStreamWriter
XMLOutputFactory xmlof = XMLOutputFactory.newInstance();
XMLStreamWriter writer = new IndentingXMLStreamWriter(xmlof.createXMLStreamWriter(out));
Using the JDK Transformer:
public String transform(String xml) throws XMLStreamException, TransformerException
{
Transformer t = TransformerFactory.newInstance().newTransformer();
t.setOutputProperty(OutputKeys.INDENT, "yes");
t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
Writer out = new StringWriter();
t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
return out.toString();
}
Via the JDK: transformer.setOutputProperty(OutputKeys.INDENT, "yes");.
If you're using the StAX cursor API, you can indent the output by wrapping the XMLStreamWriter in an indenting proxy. I tried this in my own project and it worked nicely.
Rather than relying on a com.sun...class that might go away (or get renamed com.oracle...class), I recommend downloading the StAX utility classes from java.net. This package contains a IndentingXMLStreamWriter class that works nicely. (Source and javadoc are included in the download.)
How about StaxMate:
http://www.cowtowncoder.com/blog/archives/2006/09/entry_21.html
Works well with Woodstox, fast, low-memory usage (no in-memory tree built), and indents like so:
SMOutputFactory sf = new SMOutputFactory(XMLOutputFactory.newInstance());
SMOutputDocument doc = sf.createOutputDocument(new FileOutputStream("output.xml"));
doc.setIndentation("\n ", 1, 2); // for unix linefeed, 2 spaces per level
// write doc like:
SMOutputElement root = doc.addElement("element1");
root.addElement("element2").addCharacters("someData");
doc.closeRoot(); // important, flushes, closes output
If you're using the iterating method (XMLEventReader), can't you just attach a new line '\n' character to the relevant XMLEvents when writing to your XML file?
Not sure about stax, but there was a recent discussion about pretty printing xml here
pretty print xml from java
this was my attempt at a solution
How to pretty print XML from Java?
using the org.dom4j.io.OutputFormat.createPrettyPrint() method
if you are using XMLEventWriter, then an easier way to do that is:
XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
XMLEventWriter writer = outputFactory.createXMLEventWriter(w);
XMLEventFactory eventFactory = XMLEventFactory.newInstance();
Characters newLine = eventFactory.createCharacters("\n");
writer.add(startRoot);
writer.add(newLine);
With Spring Batch this requires a subclass since this JIRA BATCH-1867
public class IndentingStaxEventItemWriter<T> extends StaxEventItemWriter<T> {
#Setter
#Getter
private boolean indenting = true;
#Override
protected XMLEventWriter createXmlEventWriter( XMLOutputFactory outputFactory, Writer writer) throws XMLStreamException {
if ( isIndenting() ) {
return new IndentingXMLEventWriter( super.createXmlEventWriter( outputFactory, writer ) );
}
else {
return super.createXmlEventWriter( outputFactory, writer );
}
}
}
But this requires an additionnal dependency because Spring Batch does not include the code to indent the StAX output:
<dependency>
<groupId>net.java.dev.stax-utils</groupId>
<artifactId>stax-utils</artifactId>
<version>20070216</version>
</dependency>

Categories

Resources