I need to transform a DOMSource into a StreamSource, because a third-party library only accepts stream sources for SOAP.
Performance is not so much of an issue in this case, so I came up with this horribly verbose set of commands:
DOMSource src = new DOMSource(document);
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer();
StreamResult result = new StreamResult();
ByteArrayOutputStream out = new ByteArrayOutputStream();
result.setOutputStream(out);
transformer.transform(src, result);
ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());
StreamSource streamSource = new StreamSource(in);
Isn't there a simpler way to do this?
This is as good a way as any. Because your third party library only accepts XML in lexical form, you have no alternative but to serialize the DOM so that the external library can re-parse it. Stupid design - tell them so.
Related
I have a xml file as object in Java as org.w3c.dom.Document doc and I want to convert this into File file. How can I convert the type Document to File?
thanks
I want to add metadata elements in an existing xml file (standard dita) with type File.
I know a way to add elements to the file, but then I have to convert the file to a org.w3c.dom.Document. I did that with the method loadXML:
private Document loadXML(File f) throws Exception{
DocumentBuilder b = DocumentBuilderFactory.newInstance().newDocumentBuilder();
return builder.parse(f);
After that I change the org.w3c.dom.Document, then I want to continue with the flow of the program and I have to convert the Document doc back to a File file.
What is a efficient way to do that? Or what is a better solution to get some elements in a xml File without converting it?
You can use a Transformer class to output the entire XML content to a File, as showed below:
Document doc =...
// write the content into xml file
DOMSource source = new DOMSource(doc);
FileWriter writer = new FileWriter(new File("/tmp/output.xml"));
StreamResult result = new StreamResult(writer);
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.transform(source, result);
With JDK 1.8.0 a short way is to use the built-in XMLSerializer (which was introduced with JDK 1.4 as a fork of Apache Xerces)
import com.sun.org.apache.xml.internal.serialize.XMLSerializer;
Document doc = //use your method loadXML(File f)
//change Document
java.io.Writer writer = new java.io.FileWriter("MyOutput.xml");
XMLSerializer xml = new XMLSerializer(writer, null);
xml.serialize(doc);
Use an object of type OutputFormat to configure output, for example like this:
OutputFormat format = new OutputFormat(Method.XML, StandardCharsets.UTF_8.toString(), true);
format.setIndent(4);
format.setLineWidth(80);
format.setPreserveEmptyAttributes(true);
format.setPreserveSpace(true);
XMLSerializer xml = new XMLSerializer(writer, format);
Note that the classes are from com.sun.* package which is not documented and therefore generally is not seen as the preferred way of doing things. However, with javax.xml.transform.OutputKeys you cannot specify the amount of indentation or line width for example. So, if this is important then this solution should help.
I am trying to read a XML file using FileInputStreamReader class. But, when I try to read large XML files, some problems occur. Which java class is more suitable to read large XML files and what are the most efficient parsers to parse XML files?
I think that DOM parsing is a good way to parse XML files
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = factory.newDocumentBuilder();
Document doc = docBuilder.parse(yourFile);
in this way you start the parsing of your XML document, then you can modify the nodes and change what you want to change
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
DOMSource source = new DOMSource(yourDoc);
StreamResult result = new StreamResult(yourFile);
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.transform(source, result);
And this second parte is to save all your changes. The "setOutputPropery" method is not mandatory, it's only used to give to the XML file a nice indent.
Take a look at this website:
http://xmpp.wordpress.com:8008/firehose.xml?type=text/plain
It constantly streams data. You can transform this content using the newest version of XSLT (v3), with a command like this:
<xsl:stream href="http://xmpp.wordpress.com:8008/firehose.xml?type=text/plain">
If I want to write some Java code to initiate the transformation (using Saxon, which has implemented xsl:stream), I can do this:
// XSL
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer(new StreamSource(new FileInputStream(xslFile)));
// XML
StreamSource xmlSource = new StreamSource(new FileInputStream(xmlFile));
// Output
MyCustomContentHandler handler = new MyCustomContentHandler();
PrintStream outputPrintStream = new PrintStream(new BufferedOutputStream(new FileOutputStream(outputFile)), true);
handler.setPrintStream(outputPrintStream);
Result result = new SAXResult(handler);
// Transform
transformer.transform(xmlSource, result);
This works. If you let it run for a bit, then open the output file, you’ll see data in it. If you re-open it a bit later, you’ll see even more data. The key to this is the custom content handler that processes the various SAX events.
But suppose that I don’t really want a custom content handler. Suppose I just want to keep the output of the XSLT as is. I can modify my code like this:
// XSL
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer(new StreamSource(new FileInputStream(xslFile)));
// XML
StreamSource xmlSource = new StreamSource(new FileInputStream(xmlFile));
// Output
TransformerHandler transformerHandler = ((SAXTransformerFactory) SAXTransformerFactory.newInstance()).newTransformerHandler();
transformerHandler.setResult(new StreamResult(new PrintWriter(new FileOutputStream(outputFile, true), true)));
// or this…
//transformerHandler.setResult(new StreamResult(new FileOutputStream(outputFile)));
// or this…
//transformerHandler.setResult(new StreamResult(new FileWriter(outputFile)));
ContentHandler contentHandler = (ContentHandler) transformerHandler;
SAXResult result = new SAXResult(transformerHandler);
// Transform
transformer.transform(xmlSource, result);
The good news is that I no longer need a custom content handler, and my output now matches the output of the XSLT exactly. The bad news is that although this code works with non-streaming XSLT, it does not work with streaming XSLT. Despite my various attempts at setting the result (see the “or this…” statements above), nothing is written to the file. I suspect there’s a buffering problem of some sort.
Question: How can I combine the best of these two together? How can I transform a streaming XSLT without having to use a custom content handler?
This seems to be a rerun of a thread on the saxon-help list in June:
http://sourceforge.net/p/saxon/mailman/message/32472658/
The conclusion there was that the output was somehow being buffered in the output stream pipeline. Saxon is emitting events representing the transformation result, as you see by supplying a ContentHandler, but the serialization of these events is being buffered in the I/O system.
At this time, it does not appear to be possible to do what I want to do. My current solution is to use a custom content handler (per my question above) and run its results through a standard XSLT identity transformation. A bit ugly and not very efficient, but it works.
How to convert doc or docx into HTML in Java. Using Apache POI, I was able to convert doc to html but unable to convert docx into html? Please show me sample code? This code work with doc but not docx.
HWPFDocumentCore wordDocument = WordToHtmlUtils.loadDoc(stream);
WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
wordToHtmlConverter.processDocument(wordDocument);
Document htmlDocument = wordToHtmlConverter.getDocument();
ByteArrayOutputStream out = new ByteArrayOutputStream();
DOMSource domSource = new DOMSource(htmlDocument);
StreamResult streamResult = new StreamResult(out);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer serializer = tf.newTransformer();
serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
serializer.setOutputProperty(OutputKeys.INDENT, "yes");
serializer.setOutputProperty(OutputKeys.METHOD, "html");
serializer.transform(domSource, streamResult);
out.close();
String result = new String(out.toByteArray());
There is no reason why this shouldn't / can't work.
Please review the following:
How to extract plain text from a DOCX file using the new OOXML support in Apache POI 3.5?
https://stackoverflow.com/a/5507019/751158
In short, make sure you're using an up-to-date version of POI, and have all of the required libraries.
(If you need additional assistance, please explain what isn't working. Are you getting compile-time errors? Run-time errors? Unexpected output?)
Should be easy and obvious but I cant find a way - the XMLOutputFactory accepts anly OutputStream, Result or another Writer to generate a new XMLStreamWriter. What I have at hand is an XMLStreamReader which has no methods for extracting a Result or an OutputStream.
If the solution would be easier using the Event API, that would be OK too.
Thank you
You could use a javax.xml.transform.Transformer to convert a StAXSource wrapping the reader to a StAXResult wrapping the writer.
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
StAXSource source = new StAXSource(xmlStreamReader);
StAXResult result = new StAXResult(xmlStreamWriter);
t.transform(source, result);
Using the Event API you could also use the folloiwng:
http://download.oracle.com/javase/6/docs/api/javax/xml/stream/XMLEventWriter.html#add(javax.xml.stream.XMLEventReader)