How to convert xsl-fo to docx (Office Open XML) in Java? - java

I'm looking for an open-source or commercially friendly Java library to convert XSL-FO to docx (Office Open XML) format.
I'm planning to use XSL-FO to produce PDF documents (with Apache FOP), so I thought generating Word documents (docx) out of the same source XML could be a good idea.
UPDATE: I forgot to mention that I'm using Java.

Alternatively, you could do: your source xml -> docx -> xsl-fo -> pdf.
or easier perhaps: source xml -> Flat OPC XML -> xsl-fo -> pdf.
Once you have a docx (or a Flat OPC XML document), transforming that to PDF via FOP is easy with docx4j (since you mention FOP, I'm assuming Java is ok for you).
The benefit of this approach is that you style your output docx as desired, and get the XSL-FO "for free".
Flat OPC XML is convenient because it is a docx as a single XML file (i.e. no need to unzip), so you can create it easily via XSLT. To see an example, create a document in Word 2007 and choose "save as .. xml".
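As a rough illustration of the "source xml -> Flat OPC XML" step, here is a minimal sketch that uses only the JDK's javax.xml.transform API. The source vocabulary (doc/para), the class name FlatOpcSketch and the stylesheet itself are assumptions made up for this example. The package contains only the two parts I believe are the bare minimum (the relationships part and word/document.xml), whereas a file saved from Word carries many more (styles, settings, and so on).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class FlatOpcSketch {

    // Hypothetical stylesheet: maps <doc><para>..</para></doc> to a minimal Flat OPC package.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:pkg='http://schemas.microsoft.com/office/2006/xmlPackage'"
      + " xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>"
      + "<xsl:template match='/doc'>"
      // Processing instruction that tells Windows to open the file with Word.
      + "<xsl:processing-instruction name='mso-application'>progid=\"Word.Document\"</xsl:processing-instruction>"
      + "<pkg:package>"
      // Part 1: package relationships, pointing at the main document part.
      + "<pkg:part pkg:name='/_rels/.rels'"
      + " pkg:contentType='application/vnd.openxmlformats-package.relationships+xml'>"
      + "<pkg:xmlData>"
      + "<Relationships xmlns='http://schemas.openxmlformats.org/package/2006/relationships'>"
      + "<Relationship Id='rId1' Target='word/document.xml'"
      + " Type='http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument'/>"
      + "</Relationships>"
      + "</pkg:xmlData></pkg:part>"
      // Part 2: the main document, one Word paragraph per source <para>.
      + "<pkg:part pkg:name='/word/document.xml'"
      + " pkg:contentType='application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml'>"
      + "<pkg:xmlData><w:document><w:body>"
      + "<xsl:for-each select='para'>"
      + "<w:p><w:r><w:t><xsl:value-of select='.'/></w:t></w:r></w:p>"
      + "</xsl:for-each>"
      + "</w:body></w:document></pkg:xmlData></pkg:part>"
      + "</pkg:package>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the source XML and returns the Flat OPC XML as a string. */
    static String toFlatOpc(String sourceXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(sourceXml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toFlatOpc("<doc><para>First paragraph</para><para>Second</para></doc>"));
    }
}
```

In a real project the template parts would more likely be copied from a document saved by Word, with the stylesheet only filling in the body of word/document.xml.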

Related

Having Freemarker-like templates for PDF generation (with itext)

In a Java project, I would like to use FreeMarker or something similar to it
(Quick start guide), but to generate a (textual) PDF file, with iText for example.
Of course, the workflow could look like this:
Template (Freemarker) -> Text (IText) -> PDF
...but I feel it is kind of a naive approach. I want the PDF to have some formatting, tables, etc.
Does anyone know how to design this properly?
As PDF is a binary format, the best approach seems to be to generate some standard text format for which a PDF conversion exists:
Texts --[FreeMarker]--> HTML or DocBook XML --[HTML-to-iText]--> PDF
This also allows validation of the FreeMarker templates.

xsl stylesheet for generating word documents from Java

I was wondering if there is any Java API that can generate Word documents in a way similar to Apache FOP.
With FOP, it is possible to specify a style sheet which defines the layout of the page in which the data (stored in an XML file) is printed.
The Transformer object within the FOP library is in charge of that.
Is there any equivalent API for Word documents?
With FOP, you can try XML to RTF, which Word accepts.
From their webpage, XMLmind XSL-FO Converter apparently generates:
RTF (can be opened in Word 2000+),
WordprocessingML (can be opened in Word 2003+),
Office Open XML (.docx, can be opened in Word 2007+).
Putting FO to one side, here are 2 different approaches:
The first would be to write an XSLT to convert your XML to Flat OPC XML. Most parts in the Flat OPC XML would simply be copied there by your XSLT. (Generate that template content in Word, using "save as XML"). You'll be focusing mainly on populating the document.xml part. Word can open a Flat OPC XML file, or you can use docx4j (a project I work on) to convert Flat OPC XML to docx.
The second would be to use the docx4j Flying Saucer fork to convert your XML + CSS to docx content. See the code samples. You may need to customise it a bit; one way of feeding it CSS is this file. This actually ought to work pretty well; there is stuff there for mapping class attributes to Word styles, so if you could adorn your XML with class attributes, you could get even better results.
I will assume that your input is an XML document, or at least a CSV file.
1) Create an XSLT stylesheet to transform your input into the Word document format. The result will be a file we will call content.xml. You can apply the stylesheet to the input from Java.
2) Create an MS Word shell and put the content.xml into the shell. There are tools within Apache POI to do this.
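To make step 2 more concrete, here is a sketch of what such a "shell" amounts to for .docx: a zip archive holding a content-types part, a relationships part and the main document part. It deliberately uses plain java.util.zip rather than Apache POI, and the class name DocxShellSketch is made up. The assumptions are that the stylesheet output is a complete WordprocessingML w:document, and that it is stored under the conventional name word/document.xml (not content.xml). I believe these three parts are the minimum Word accepts, but treat that as an assumption and test with your Word version.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class DocxShellSketch {

    // Declares the content type of every part in the package.
    static final String CONTENT_TYPES =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
      + "<Types xmlns='http://schemas.openxmlformats.org/package/2006/content-types'>"
      + "<Default Extension='rels'"
      + " ContentType='application/vnd.openxmlformats-package.relationships+xml'/>"
      + "<Default Extension='xml' ContentType='application/xml'/>"
      + "<Override PartName='/word/document.xml'"
      + " ContentType='application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml'/>"
      + "</Types>";

    // Points from the package root to the main document part.
    static final String RELS =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
      + "<Relationships xmlns='http://schemas.openxmlformats.org/package/2006/relationships'>"
      + "<Relationship Id='rId1' Target='word/document.xml'"
      + " Type='http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument'/>"
      + "</Relationships>";

    /** Wraps an already generated w:document in a minimal docx (zip) container. */
    static byte[] wrap(String documentXml) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            put(zip, "[Content_Types].xml", CONTENT_TYPES);
            put(zip, "_rels/.rels", RELS);
            put(zip, "word/document.xml", documentXml);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    private static void put(ZipOutputStream zip, String name, String xml) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(xml.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    public static void main(String[] args) {
        String documentXml =
            "<w:document xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>"
          + "<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body></w:document>";
        System.out.println(wrap(documentXml).length + " bytes");
    }
}
```

The returned bytes can be written to disk with java.nio.file.Files.write under a .docx file name.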
I may have taken your question too literally. You might also be able to generate the document using Apache POI API. Also, if your MS Word document doesn't have tables, you can use Apache FOP to generate an RTF document, which can then be easily translated to a .docx file.
Apache POI provides Java APIs for reading and writing Microsoft Excel, Word, and PowerPoint files.
You can check out POI's Javadocs here.

Generate PDF files using iText and apache velocity template(.vm)

What is the general workflow to generate a PDF using iText and an Apache Velocity template file (.vm) in Java?
I am interested in knowing steps like: parse template file, put Java object in context and steps to be performed to generate pdf etc.
I know this is a very basic question. But I am not able to find even a single example of this type on the web. I found XDocReport, but I am interested to know other alternatives as well.
Please help me with some sample project link or at least the steps to get started.
Yes, you can.
It all depends on how complex you want the PDFs to be.
Here are the steps for basic functionality:
Generate an HTML file using an Apache Velocity template file (.vm).
Use com.itextpdf.text.html.simpleparser.HTMLWorker (deprecated) to parse/convert that HTML file into a PDF.
Additionally, you can use com.itextpdf.text.pdf.PdfCopy.PageStamp to add content (borders, stamps, notes, annotations etc) to an existing PDF.
There is also com.itextpdf.tool.xml.XMLWorker for more advanced HTML conversion (adding style sheets etc)
Generating PDF using iText and an Apache Velocity template file (.vm) in Java directly is not possible because:
PDF is binary format,
Velocity generates plain text content.
In other words, Velocity cannot generate PDF.
XDocReport is able to generate a docx/odt report by merging a docx/odt template, which contains some Velocity/Freemarker syntax, with a Java context. The generated docx/odt report can then be converted to pdf/xhtml.
It works because docx/odt files are zip archives which contain several XML entries. If you unzip a docx you will see word/document.xml. In this entry, you will see the content that you have typed with MS Word. word/document.xml is plain text, so Velocity can be used in this case.
Here is the XDocReport process for generating a PDF from a docx template that uses Velocity:
Load the docx template. This step unzips the docx and stores each XML entry in a map (entry name as key, byte array as value). For instance, the map contains the key word/document.xml with the XML content of that entry as its value.
Loop over each XML entry that must be merged with the Java context. For instance, word/document.xml is merged with the Java context using Velocity, and the result of the merge replaces the word/document.xml value in the map.
Rebuild a new docx by zipping the entries of the map.
At this step we have a generated docx (the report).
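The three steps above (unzip into a map, merge, zip again) can be sketched with java.util.zip alone. This is not XDocReport's code: the class DocxMergeSketch and its methods are made up for illustration, and a naive ${name} string replacement stands in for the Velocity merge. A real merge would also have to XML-escape the values and cope with Word splitting a placeholder across several runs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class DocxMergeSketch {

    /** Step 1: unzip the template into a map of entry name -> entry bytes. */
    static Map<String, byte[]> load(byte[] docx) {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docx))) {
            for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
                if (!e.isDirectory()) {
                    entries.put(e.getName(), zip.readAllBytes()); // reads the current entry only
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return entries;
    }

    /** Step 2: merge one XML entry with the context (naive stand-in for Velocity). */
    static void merge(Map<String, byte[]> entries, String entryName, Map<String, String> context) {
        byte[] raw = entries.get(entryName);
        if (raw == null) {
            return; // nothing to merge for this entry name
        }
        String xml = new String(raw, StandardCharsets.UTF_8);
        for (Map.Entry<String, String> variable : context.entrySet()) {
            xml = xml.replace("${" + variable.getKey() + "}", variable.getValue());
        }
        entries.put(entryName, xml.getBytes(StandardCharsets.UTF_8));
    }

    /** Step 3: zip the map back into a docx. */
    static byte[] save(Map<String, byte[]> entries) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> entry : entries.entrySet()) {
                zip.putNextEntry(new ZipEntry(entry.getKey()));
                zip.write(entry.getValue());
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Runs all three steps on word/document.xml. */
    static byte[] generate(byte[] template, Map<String, String> context) {
        Map<String, byte[]> entries = load(template);
        merge(entries, "word/document.xml", context);
        return save(entries);
    }
}
```

With a template whose document.xml contains the text Hello ${name}, calling generate(template, Map.of("name", "World")) yields a docx whose document.xml contains Hello World.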
To convert it to another format, XDocReport provides a docx-to-pdf converter based on Apache POI and iText. Here is the XDocReport process for converting a docx to PDF:
Load the docx with Apache POI.
Loop over the POI structures (XWPFParagraph, etc.) to create the corresponding iText structures (iText Paragraph).
Note that XDocReport is modular and you can use other converters as well.
First, we use a FreeMarker template to generate an HTML file, and then render the HTML to a PDF file with ITextRenderer. Finally, the PDF file can be viewed in the browser with a very useful JavaScript tool called pdf.js. Maybe you can try it.

How can I create PDF using XSL and Java?

I believe I will have to use an XSLFO stylesheet document for the XMLs I use to convert to PDF. And then I would need to use the Transform API of Java to convert XML to PDF.
The XML data can be read using a parser and directly be converted to PDF (XML -> PDF) using a library like iText instead of using complex conversions (XML -> XSLT -> XSLFO -> PDF).
Check out Apache FOP. We use this at work and it's worked well for us. http://xmlgraphics.apache.org/fop/quickstartguide.html
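For the XML -> XSLT -> XSL-FO half of that pipeline, the JDK's own javax.xml.transform API is enough. The sketch below assumes a made-up source format (letter/line) and a made-up class name XmlToFoSketch. It compiles the stylesheet once into a Templates object so it can be reused across calls. Rendering the resulting FO to PDF is the part FOP does; it is left out here so that the example has no dependency outside the JDK.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XmlToFoSketch {

    // Hypothetical stylesheet: one fo:block per <line> on a single A4 page master.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
      + "<xsl:template match='/letter'>"
      + "<fo:root>"
      + "<fo:layout-master-set>"
      + "<fo:simple-page-master master-name='A4' page-height='29.7cm' page-width='21cm' margin='2cm'>"
      + "<fo:region-body/>"
      + "</fo:simple-page-master>"
      + "</fo:layout-master-set>"
      + "<fo:page-sequence master-reference='A4'>"
      + "<fo:flow flow-name='xsl-region-body'>"
      + "<xsl:for-each select='line'>"
      + "<fo:block><xsl:value-of select='.'/></fo:block>"
      + "</xsl:for-each>"
      + "</fo:flow>"
      + "</fo:page-sequence>"
      + "</fo:root>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Compiled once; Templates objects are designed to be shared between threads.
    private static final Templates TEMPLATES = compile();

    private static Templates compile() {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(XSLT)));
        } catch (TransformerException e) {
            throw new IllegalStateException("Stylesheet does not compile", e);
        }
    }

    /** Transforms the source XML into an XSL-FO document, returned as a string. */
    static String toFo(String sourceXml) {
        try {
            StringWriter out = new StringWriter();
            TEMPLATES.newTransformer().transform(
                    new StreamSource(new StringReader(sourceXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toFo("<letter><line>Hello</line><line>World</line></letter>"));
    }
}
```

The FO string (or, more efficiently, the transformation result streamed directly) is what would then be handed to FOP, as described in the quick start guide linked above.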

How to convert from pptx to html

I have an OOXML file that contains the notes section from a pptx slide (extracted using POI).
Is there a way (framework, software) to convert it to HTML and still keep the original formatting (bold, italic...)
of the notes?
Edit:
Never mind, I developed my own pptx-notes-to-HTML parser.
I have used JODConverter library to convert openoffice ppt to html before.
http://code.google.com/p/jodconverter/wiki/GettingStarted
There is a library called docx4j (docx4java.org) that can extract MS XML formatted files (docx, xlsx, pptx). Check it out, along with the sample SvgExporter at http://www.docx4java.org/svn/docx4j/trunk/docx4j/src/pptx4j/java/org/pptx4j/convert/out/svginhtml/.
I used this library and it worked well in extracting the DOCX format as HTML or PDF.
