I have a custom XML format that I need to convert to a Microsoft Word document (.doc/.docx). There are several Java APIs, such as Apache POI and docx4j, but I couldn't find any support for creating a Word document using XSLT.
docx4j definitely supports converting docx to XML, but what about the other way around?
I searched and found WordML.
So I thought of using XSLT to first convert my custom XML to WordML, and then convert the WordML to doc/docx:
Custom XML----> WordML ------> Docx/Doc
My questions are :
How would I convert WordML to docx/doc (using an open source method)?
Is it possible to achieve TOC linking in WordML?
Kindly tell me whether I am doing this correctly, or suggest a better way. Thanks.
Related
I was wondering if there is any Java API that can generate a Word document in a similar way to Apache FOP.
With FOP, it is possible to specify a style sheet which defines the layout of the page on which the data (stored in an XML file) is printed.
The Transformer object used with the FOP library is in charge of that.
Is there an equivalent API for Word documents?
With FOP, you can try XML to RTF, which Word accepts.
From their webpage, XMLmind XSL-FO Converter apparently generates:
RTF (can be opened in Word 2000+),
WordprocessingML (can be opened in Word 2003+),
Office Open XML (.docx, can be opened in Word 2007+),
Putting FO to one side, here are 2 different approaches:
The first would be to write an XSLT to convert your XML to Flat OPC XML. Most parts in the Flat OPC XML would simply be copied there by your XSLT. (Generate that template content in Word, using "save as XML"). You'll be focusing mainly on populating the document.xml part. Word can open a Flat OPC XML file, or you can use docx4j (a project I work on) to convert Flat OPC XML to docx.
The second would be to use the docx4j Flying Saucer fork to convert your XML + CSS to docx content. See the code samples. You may need to customise it a bit; one way of feeding it CSS is this file. This actually ought to work pretty well; there is stuff there for mapping class attributes to Word styles, so if you could adorn your XML with class attributes, you could get even better results.
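As a rough illustration of the first approach, here is a minimal sketch that uses only the JDK's built-in XSLT support to turn a small custom XML document into Flat OPC XML. The `<report>`/`<para>` input format, the class name `FlatOpcSketch` and the method `toFlatOpc` are invented for the example. A real package saved from Word contains more parts (styles, settings and so on), which the stylesheet would simply copy through.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class FlatOpcSketch {

    // Hypothetical stylesheet: each <para> of an invented <report> format becomes a Word paragraph.
    static final String XSLT = String.join("\n",
        "<xsl:stylesheet version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'",
        "    xmlns:pkg='http://schemas.microsoft.com/office/2006/xmlPackage'",
        "    xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>",
        "  <xsl:output method='xml' indent='no'/>",
        "  <xsl:template match='/report'>",
        "    <xsl:processing-instruction name='mso-application'>progid=\"Word.Document\"</xsl:processing-instruction>",
        "    <pkg:package>",
        "      <pkg:part pkg:name='/_rels/.rels'",
        "          pkg:contentType='application/vnd.openxmlformats-package.relationships+xml'>",
        "        <pkg:xmlData>",
        "          <Relationships xmlns='http://schemas.openxmlformats.org/package/2006/relationships'>",
        "            <Relationship Id='rId1' Target='word/document.xml'",
        "                Type='http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument'/>",
        "          </Relationships>",
        "        </pkg:xmlData>",
        "      </pkg:part>",
        "      <pkg:part pkg:name='/word/document.xml'",
        "          pkg:contentType='application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml'>",
        "        <pkg:xmlData>",
        "          <w:document><w:body>",
        "            <xsl:apply-templates select='para'/>",
        "          </w:body></w:document>",
        "        </pkg:xmlData>",
        "      </pkg:part>",
        "    </pkg:package>",
        "  </xsl:template>",
        "  <xsl:template match='para'>",
        "    <w:p><w:r><w:t><xsl:value-of select='.'/></w:t></w:r></w:p>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    /** Applies the stylesheet above to the given custom XML and returns Flat OPC XML as a string. */
    static String toFlatOpc(String customXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(customXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toFlatOpc("<report><para>Hello</para><para>World</para></report>"));
    }
}
```

The resulting string can be saved with an .xml extension and opened in Word, or handed to docx4j for conversion to docx as described above.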
I will assume that your input is an XML document, or at least a CSV file.
1) Create an XSLT stylesheet to transform your input into the Word document format. The result will be a file we will call content.xml. You can apply the stylesheet to the input from Java.
2) Create an MS Word shell and put the content.xml into the shell. There are tools within Apache POI to do this.
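To make step 2 more concrete: a .docx file is a zip archive, so the "shell" can be sketched with nothing but `java.util.zip`. This is a stand-in for the POI tooling mentioned above, not POI itself. The class and method names are invented, and the part the answer calls content.xml is stored here as `word/document.xml`, the conventional name for the main part in a .docx package.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class DocxShellSketch {

    // Declares the content type of each part in the package.
    static final String CONTENT_TYPES =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
        + "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">"
        + "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>"
        + "<Default Extension=\"xml\" ContentType=\"application/xml\"/>"
        + "<Override PartName=\"/word/document.xml\" ContentType="
        + "\"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml\"/>"
        + "</Types>";

    // Points from the package root to the main document part.
    static final String ROOT_RELS =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
        + "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">"
        + "<Relationship Id=\"rId1\" Target=\"word/document.xml\" Type="
        + "\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument\"/>"
        + "</Relationships>";

    /** Wraps the XSLT output (the main document part) in a minimal .docx package. */
    static byte[] pack(String documentXml) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            put(zip, "[Content_Types].xml", CONTENT_TYPES);
            put(zip, "_rels/.rels", ROOT_RELS);
            put(zip, "word/document.xml", documentXml);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buffer.toByteArray();
    }

    private static void put(ZipOutputStream zip, String name, String content) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(content.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    /** Lists the entry names of a zip archive, in order; handy for checking the package. */
    static List<String> entryNames(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String documentXml =
            "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">"
            + "<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body></w:document>";
        Files.write(Paths.get("sketch.docx"), pack(documentXml));
    }
}
```

A library such as POI or docx4j is still the better choice once the document needs styles, images or headers, because each of those adds further parts and relationships to the package.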
I may have taken your question too literally. You might also be able to generate the document using Apache POI API. Also, if your MS Word document doesn't have tables, you can use Apache FOP to generate an RTF document, which can then be easily translated to a .docx file.
Apache POI provides Java APIs for reading and writing Microsoft Excel, Word, and PowerPoint files.
You can check out POI's Javadocs here.
I have a Java web application that generates an MS Word document in the WordML format (a single XML file in Word 2003 XML format with a .xml file extension). I would like to automatically convert this into the newer Office Open XML format so that the document could be saved as a .docx file (which in essence is a zip file containing multiple XML files).
This has to be fully automated, and cannot require the user to download the file and convert it manually. Furthermore, the user cannot be assumed to have MS Word installed (they could be using LibreOffice instead).
I have been looking for a Java library I could use to do this, but couldn't find any that converts .xml to .docx. The only converter I could find was JODconverter but it doesn't support conversion from .xml to .docx.
Is there a Java library that could do this sort of conversion? Or should I be looking for a non-Java solution? Maybe a Python module could do this? (For example, a Python script could take the files generated by the Java app and convert them to .docx.)
If you can't modify your app to emit Flat OPC XML, you could write an XSLT to convert from Word 2003 XML format to Flat OPC XML. They are quite similar.
Then, docx4j (disclosure: I maintain this) supports Flat OPC XML to docx.
I believe I will have to use an XSLFO stylesheet document for the XMLs I use to convert to PDF. And then I would need to use the Transform API of Java to convert XML to PDF.
The XML data can be read using a parser and directly be converted to PDF (XML -> PDF) using a library like iText instead of using complex conversions (XML -> XSLT -> XSLFO -> PDF).
Check out Apache FOP. We use this at work and it's worked well for us. http://xmlgraphics.apache.org/fop/quickstartguide.html
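The XML -> XSLT -> XSL-FO half of that pipeline needs only the JDK. The sketch below is illustrative: the `<report>`/`<para>` input, the class name `FoSketch` and the page geometry are assumptions, and the final rendering step (handing the FO to FOP to produce the PDF) is left out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class FoSketch {

    // Hypothetical stylesheet: one A4 page sequence, one fo:block per <para>.
    static final String XSLT = String.join("\n",
        "<xsl:stylesheet version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'",
        "    xmlns:fo='http://www.w3.org/1999/XSL/Format'>",
        "  <xsl:output method='xml' indent='no'/>",
        "  <xsl:template match='/report'>",
        "    <fo:root>",
        "      <fo:layout-master-set>",
        "        <fo:simple-page-master master-name='A4' page-height='29.7cm'",
        "            page-width='21cm' margin='2cm'>",
        "          <fo:region-body/>",
        "        </fo:simple-page-master>",
        "      </fo:layout-master-set>",
        "      <fo:page-sequence master-reference='A4'>",
        "        <fo:flow flow-name='xsl-region-body'>",
        "          <xsl:apply-templates select='para'/>",
        "        </fo:flow>",
        "      </fo:page-sequence>",
        "    </fo:root>",
        "  </xsl:template>",
        "  <xsl:template match='para'>",
        "    <fo:block><xsl:value-of select='.'/></fo:block>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    /** Transforms the custom XML into an XSL-FO document, returned as a string. */
    static String toFo(String customXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(customXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toFo("<report><para>Hello</para></report>"));
    }
}
```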
I am looking for APIs and documentation that would let me convert any file to PDF.
The file may be a Word document, an Excel spreadsheet, a PowerPoint presentation, etc.
For example, I have a .doc file and I just want to convert it to PDF using Java.
Any suggestions would be helpful.
I would recommend taking a look at Flying Saucer (formerly xhtmlrenderer), which makes creating PDF files from XML and HTML files extremely easy (internally it uses iText).
HTML/XML can be used as an intermediate format, which makes this quite a flexible solution.
Use
http://pdfbox.apache.org/
and
http://poi.apache.org/
If you want to generate a PDF from an XML document, you can try Apache FOP, which follows the XSL-FO standard.
http://xmlgraphics.apache.org/fop/
So a smart process could be: extract data from your various document formats using POI, odftoolkit (for OpenDocument) or other tools, inject it into an XML container, and then translate that into PDF using FOP.
Apache's POI API is the best option for converting files to PDF.
You can use iText. It is well documented and comes with a ton of examples.
The JPedal Java library is usually used to convert PDF to XML or HTML. However, I need to know whether it is possible to extract data from an HTML5 document and save it as XML using the JPedal API.
Is there any alternative?
Also, I am trying to parse an HTML5 document using Java and store it as XML. Are there any good solutions for finding just specific tags and producing XML from them?
Please let me know. Thank you.
There are a number of Java HTML parsers out there, but I recommend using the HTML5 parser from validator.nu available for download from here: http://about.validator.nu/htmlparser/.
Written to use the HTML5 parser algorithm by one of the main protagonists of HTML5, Henri Sivonen of Mozilla, you won't find a more reliable HTML parser and it creates a true DOM that can be manipulated using standard XML tools and queried for hyperlinks using XPath. There are examples of how to use XSLT transformations with it and how to get an XML serialization of the created DOM.
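Once a DOM is available, picking out specific tags and writing them as XML takes only the standard `javax.xml` APIs. The sketch below is an assumption-laden illustration: to stay free of external dependencies it parses well-formed markup with the JDK's own `DocumentBuilder` instead of the validator.nu parser, and the `<links>`/`<link>` output format and the class name are invented. A DOM built by an HTML5 parser may place elements in the XHTML namespace, in which case the XPath expression would need a namespace context.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class LinkExtractorSketch {

    /** Finds every <a> element with an href and returns them as a small XML document. */
    static String extractLinks(String wellFormedMarkup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Stand-in for the DOM an HTML5 parser would produce.
            Document page = builder.parse(new InputSource(new StringReader(wellFormedMarkup)));

            // Query the DOM for the tags of interest.
            NodeList anchors = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//a[@href]", page, XPathConstants.NODESET);

            // Build the output document: one <link> per matching anchor.
            Document result = builder.newDocument();
            Element root = result.createElement("links");
            result.appendChild(root);
            for (int i = 0; i < anchors.getLength(); i++) {
                Element anchor = (Element) anchors.item(i);
                Element link = result.createElement("link");
                link.setAttribute("href", anchor.getAttribute("href"));
                link.setTextContent(anchor.getTextContent());
                root.appendChild(link);
            }

            // Serialize the new DOM with an identity transform.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter xml = new StringWriter();
            serializer.transform(new DOMSource(result), new StreamResult(xml));
            return xml.toString();
        } catch (ParserConfigurationException | SAXException | IOException
                 | XPathExpressionException | TransformerException e) {
            throw new IllegalStateException("Could not extract links", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractLinks(
            "<html><body><p><a href=\"https://example.org/\">Example</a></p><a>no target</a></body></html>"));
    }
}
```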