XSL stylesheet for generating Word documents from Java

I was wondering whether there is a Java API that can generate Word documents in a way similar to Apache FOP.
With FOP, it is possible to specify a style sheet that defines the layout of the page on which the data (stored in an XML file) is printed.
The Transformer object within the FOP library is in charge of that.
Is there an equivalent API for Word documents?

With FOP, you can try XML to RTF, which Word accepts.
From their webpage, XMLmind XSL-FO Converter apparently generates:
RTF (can be opened in Word 2000+),
WordprocessingML (can be opened in Word 2003+),
Office Open XML (.docx, can be opened in Word 2007+).
Putting FO to one side, here are 2 different approaches:
The first would be to write an XSLT to convert your XML to Flat OPC XML. Most parts in the Flat OPC XML would simply be copied there by your XSLT. (Generate that template content in Word, using "save as XML"). You'll be focusing mainly on populating the document.xml part. Word can open a Flat OPC XML file, or you can use docx4j (a project I work on) to convert Flat OPC XML to docx.
The second would be to use the docx4j Flying Saucer fork to convert your XML + CSS to docx content. See the code samples. You may need to customise it a bit; one way of feeding it CSS is this file. This actually ought to work pretty well; there is stuff there for mapping class attributes to Word styles, so if you could adorn your XML with class attributes, you could get even better results.

I will assume that your input is an XML document, or at least a CSV file.
1) Create an XSLT stylesheet to transform your input into the Word document format. The result will be a file we will call content.xml. You can apply the stylesheet to the input from Java.
2) Create an MS Word shell and put content.xml into the shell. There are tools within Apache POI to do this.
I may have taken your question too literally. You might also be able to generate the document using Apache POI API. Also, if your MS Word document doesn't have tables, you can use Apache FOP to generate an RTF document, which can then be easily translated to a .docx file.
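Step 1 can be done with the JDK's built-in javax.xml.transform API, without any extra library. The following is a minimal sketch rather than a complete Word generator: the input element names (items, item) and the class name XsltToWordXml are made up for illustration, and the stylesheet only emits a bare WordprocessingML body with one paragraph per item.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class; names and sample data are illustrative only.
final class XsltToWordXml {

    // Stylesheet that turns each <item> into one WordprocessingML paragraph.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>"
        + "<xsl:template match='/items'>"
        + "<w:document><w:body>"
        + "<xsl:apply-templates select='item'/>"
        + "</w:body></w:document>"
        + "</xsl:template>"
        + "<xsl:template match='item'>"
        + "<w:p><w:r><w:t><xsl:value-of select='.'/></w:t></w:r></w:p>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the given stylesheet to the input XML and returns the result as a string.
    static String transform(String stylesheet, String inputXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter result = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(inputXml)),
                                  new StreamResult(result));
            return result.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String input = "<items><item>First</item><item>Second</item></items>";
        System.out.println(transform(STYLESHEET, input));
    }
}
```

In a real setup the stylesheet would be loaded from a file and the result written to the content.xml file from step 1 by pointing the StreamSource and StreamResult at files instead of strings.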

Apache POI provides Java APIs for reading and writing Microsoft Excel, Word, and PowerPoint files.
You can check out POI's Javadocs here.

Related

Convert the RTF to PDF using JAVA which reads the tables in rtf document

What ways are there to convert an RTF to PDF that contains a table in the document in Windows or Unix using Java?
The options we have tried so far are:
iText - The table inside the RTF document does not come through properly once converted to PDF. In short, the PDF doesn't contain the table. Here is the code gist. ITEXT for rtf to pdf java code gist
POI - Does Apache POI support RTF document parsing? As far as I could find, it is not supported. POI support for RTF
Tika - Using Tika I am able to read the document, but the table in the RTF is not parsed correctly and I don't know how to convert it to PDF. TIKA java code for reading rtf
We have looked into other options as well. Is it possible to convert RTF to PDF with Java?
Other options we looked into are in this link
Yes, it's possible. Take a look at JasperReports!
http://community.jaspersoft.com/project/jasperreports-library
There is also a good API available from Jaspersoft to code your custom PDF-engine and your custom datasource. Start with iReport (UI-Editor).

Generate PDF files using iText and apache velocity template(.vm)

What is the general workflow to generate a PDF using iText and an Apache Velocity template file (.vm) in Java?
I am interested in knowing steps like: parse template file, put Java object in context and steps to be performed to generate pdf etc.
I know this is a very basic question. But I am not able to find even a single example of this type on the web. I found XDocReport, but I am interested to know other alternatives as well.
Please help me with some sample project link or at least the steps to get started.
Yes, you can.
It all depends on how complex you want the PDFs to be.
Here are the steps for basic functionality:
Generate an HTML file using an Apache Velocity template file (.vm).
Use com.itextpdf.text.html.simpleparser.HTMLWorker (deprecated) to parse/convert that HTML file into a PDF.
Additionally, you can use com.itextpdf.text.pdf.PdfCopy.PageStamp to add content (borders, stamps, notes, annotations etc) to an existing PDF.
There is also com.itextpdf.tool.xml.XMLWorker for more advanced HTML conversion (adding style sheets etc)
Generating a PDF directly with iText and an Apache Velocity template file (.vm) in Java is not possible because:
PDF is a binary format,
Velocity generates plain text content.
In other words, Velocity cannot generate PDF.
XDocReport is able to generate a docx/odt report by merging a docx/odt template, which contains some Velocity/Freemarker syntax, with a Java context. The generated docx/odt report can then be converted to pdf/xhtml.
It works because docx/odt files are zip archives that contain several XML entries. If you unzip a docx you will see word/document.xml. In this entry, you will see the content that you typed in MS Word. word/document.xml is plain text, so Velocity can be used in this case.
Here is the XDocReport process to generate a PDF from a docx template that uses Velocity:
Load the docx template. This step consists of unzipping the docx and storing each XML entry in a map (entry name as key, byte array as value). For instance, the map contains the key word/document.xml with the XML content of this entry as its value.
Loop over each XML entry that must be merged with the Java context. For instance, word/document.xml is merged with the Java context using Velocity, and the result of the merge replaces the word/document.xml value in the map.
Rebuild a new docx by zipping all entries of the map.
At this step we have a generated docx (the report).
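The unzip, merge and re-zip steps need nothing beyond java.util.zip. The sketch below is not XDocReport's code: it swaps Velocity for a naive ${name} string replacement so that it runs on the JDK alone, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch of the unzip / merge / re-zip steps; not XDocReport's implementation.
final class DocxTemplateSketch {

    // Step 1: unzip the docx into a map of entry name -> entry bytes (order preserved).
    static Map<String, byte[]> unzip(byte[] docx) {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(docx))) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) continue;
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                while ((n = zin.read(buffer)) != -1) out.write(buffer, 0, n);
                entries.put(entry.getName(), out.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    // Step 2: merge one XML entry with the context. A stand-in for Velocity:
    // every ${key} is replaced by the XML-escaped value.
    static byte[] merge(byte[] xml, Map<String, String> context) {
        String text = new String(xml, StandardCharsets.UTF_8);
        for (Map.Entry<String, String> var : context.entrySet()) {
            String value = var.getValue()
                .replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
            text = text.replace("${" + var.getKey() + "}", value);
        }
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Step 3: zip all entries of the map back into a docx.
    static byte[] zip(Map<String, byte[]> entries) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> entry : entries.entrySet()) {
                zout.putNextEntry(new ZipEntry(entry.getKey()));
                zout.write(entry.getValue());
                zout.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Runs the three steps; only word/document.xml is merged in this sketch.
    static byte[] generate(byte[] template, Map<String, String> context) {
        Map<String, byte[]> entries = unzip(template);
        byte[] document = entries.get("word/document.xml");
        if (document != null) {
            entries.put("word/document.xml", merge(document, context));
        }
        return zip(entries);
    }

    // Usage: java DocxTemplateSketch template.docx report.docx
    public static void main(String[] args) throws IOException {
        byte[] report = generate(Files.readAllBytes(Paths.get(args[0])),
                                 Collections.singletonMap("name", "World"));
        Files.write(Paths.get(args[1]), report);
    }
}
```

One caveat with the naive replacement: Word may split typed text across several runs, so a placeholder is not guaranteed to appear as one contiguous string in word/document.xml. A real template engine integration has to deal with that.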
To convert it to another format, XDocReport provides a docx-to-pdf converter based on Apache POI and iText. Here is the XDocReport process to convert a docx to PDF:
Load the docx with Apache POI.
Loop over each POI structure (XWPFParagraph, etc.) to create the corresponding iText structure (iText Paragraph).
Note that XDocReport is modular and you can use other converters as well.
First, we use a FreeMarker template to generate an HTML file, and then render the HTML to a PDF file with ITextRenderer. Finally, we can view the PDF file in the browser; there is a very useful JavaScript tool for this called pdf.js. Maybe you can try it.

How to convert WordML to Office Open XML in a Java web application?

I have a Java web application that generates an MS Word document in the WordML format (a single XML file in Word 2003 XML format with a .xml file extension). I would like to automatically convert this into the newer Office Open XML format so that the document could be saved as a .docx file (which in essence is a zip file containing multiple XML files).
This has to be fully automated, and cannot require the user to download the file and convert it manually. Furthermore, the user cannot be assumed to have MS Word installed (they could be using LibreOffice instead).
I have been looking for a Java library I could use to do this, but couldn't find any that converts .xml to .docx. The only converter I could find was JODconverter but it doesn't support conversion from .xml to .docx.
Is there a Java library that could do this sort of conversion? Or should I be looking for a non-Java solution? Maybe a Python module could do this? (For example, a Python script could take the files generated by the Java app and convert them to .docx.)
If you can't modify your app to emit Flat OPC XML, you could write an XSLT to convert from Word 2003 XML format to Flat OPC XML. They are quite similar.
Then, docx4j (disclosure: I maintain this) supports Flat OPC XML to docx.

How to convert xsl-fo to docx (Office Open XML) in Java?

I'm looking for an open-source or commercially friendly Java library to convert XSL-FO to docx (Office Open XML) format.
I'm planning to use XSL-FO to produce PDF documents (with Apache FOP), so I thought generating Word documents (docx) from the same source XML could be a good idea.
UPDATE: I forgot to mention that I'm using Java.
Alternatively, you could do: your source xml -> docx -> xsl-fo -> pdf.
or easier perhaps: source xml -> Flat OPC XML -> xsl-fo -> pdf.
Once you have a docx (or a Flat OPC XML document), transforming that to PDF via FOP is easy with docx4j (since you mention FOP, I'm assuming Java is ok for you).
The benefit of this approach is that you style your output docx as desired, and get the xsl fo "for free".
Flat OPC XML is convenient, because it is docx as a single XML file (ie no need to unzip). So you can create it easily via XSLT. To see it, create a document in Word 2007, and choose "save as .. xml".
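To make the "docx as a single XML file" point concrete, here is a small sketch that lists the parts inside a Flat OPC document using only the JDK's DOM parser. The sample package is hand-written and heavily trimmed (a file saved by Word contains many more parts), and the class name FlatOpcParts is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch: lists the parts of a Flat OPC package (part name -> content type).
final class FlatOpcParts {

    static final String PKG_NS = "http://schemas.microsoft.com/office/2006/xmlPackage";

    // Trimmed, hand-written sample: one relationships part and the main document part.
    static final String SAMPLE =
        "<pkg:package xmlns:pkg='" + PKG_NS + "'>"
        + "<pkg:part pkg:name='/_rels/.rels'"
        + " pkg:contentType='application/vnd.openxmlformats-package.relationships+xml'>"
        + "<pkg:xmlData>"
        + "<Relationships xmlns='http://schemas.openxmlformats.org/package/2006/relationships'>"
        + "<Relationship Id='rId1' Target='word/document.xml' Type="
        + "'http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument'/>"
        + "</Relationships>"
        + "</pkg:xmlData></pkg:part>"
        + "<pkg:part pkg:name='/word/document.xml' pkg:contentType="
        + "'application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml'>"
        + "<pkg:xmlData>"
        + "<w:document xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>"
        + "<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body>"
        + "</w:document>"
        + "</pkg:xmlData></pkg:part>"
        + "</pkg:package>";

    // Parses the Flat OPC XML and returns each part's name mapped to its content type.
    static Map<String, String> listParts(String flatOpcXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(flatOpcXml)));
            NodeList parts = doc.getElementsByTagNameNS(PKG_NS, "part");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < parts.getLength(); i++) {
                Element part = (Element) parts.item(i);
                result.put(part.getAttributeNS(PKG_NS, "name"),
                           part.getAttributeNS(PKG_NS, "contentType"));
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse Flat OPC XML", e);
        }
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> part : listParts(SAMPLE).entrySet()) {
            System.out.println(part.getKey() + " : " + part.getValue());
        }
    }
}
```

Each pkg:part corresponds to one entry of the zipped docx, which is why an XSLT can copy most parts through unchanged and only generate the content of /word/document.xml.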

Parse Pdf File and write content in word file using java

How can I parse a PDF file and write its content to a Word file using Java?
For parsing a PDF file in Java, you can use Apache PDFBox: http://incubator.apache.org/pdfbox/
For reading/writing Word (or other Office) file formats in Java, try POI: http://poi.apache.org/
Both are free.
Try the iText java library:
iText is an ideal library for developers looking to enhance web- and other applications with dynamic PDF document generation and/or manipulation.
It can be used for your parsing step.
As for generating word documents - the OpenOffice Java API might be able to generate Word compatible docs (no personal experience with this API).
You might want to try any of these:
http://incubator.apache.org/pdfbox/
https://pdf-renderer.dev.java.net/
Once you have read the contents of the PDF file, you can just as well store them in an ODT file or a text file. For an ODT file, try http://odftoolkit.openoffice.org.
Best!
You could use iText if the source PDF is mostly text. Images and such are quite hard to handle while parsing. If it's text only, it's as easy as 10 lines of code. See the iText manual for examples.
For writing Word files there's only Apache POI. It can be a little tricky to figure out, but for such a simple task it shouldn't be a problem.
