Creating Docx, PDF, XSL-FO - java

[Background Info]
We had a solution in place that used server-side Word automation to convert HTM documents into Docx, PDF or print documents. This solution broke in the latest version of Windows Server 2012. We learned that Microsoft does not intend Word to be used this way, and after troubleshooting with Microsoft support engineers we have concluded that it will never work.
[Currently]
I am currently researching technologies and tools that my company can use to regain this functionality. We need to be able to create Docx and PDF files and to print to a local printer.
I have already looked into a number of tools and am currently leaning towards Apache FOP, which seems to handle PDF and printing for us.
However, I'm looking for advice and suggested tools for implementing a pure Java approach. Our application currently creates HTM files with all the required information, so ideally we would like to take these HTM files and "convert" them into Docx/XSL-FO format.
[Question]
So here is the question I'm hoping you can help me with:
What are the best tools I can use to get from
HTM to Docx
HTM to PDF
Or what would be the best process for achieving this? Has anyone had success finding a solution for this in the past?
Thank you

It depends on the level of control you need and on the complexity of the source HTML. There are HTML-to-FO stylesheets, but you might find them wanting for your specific needs.
So you could use the Jericho parser to read the HTML and generate FO yourself. Alternatively, you can generate the target format directly using Apache PDFBox (for PDF) and Apache POI (for Docx).
It all boils down to the level of control you want or need.
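To make the stylesheet route concrete, here is a minimal sketch that uses the JDK's built-in XSLT support (JAXP, javax.xml.transform) to turn well-formed XHTML into XSL-FO. The stylesheet is a hypothetical toy that only handles h1 and p elements; a real HTML-to-FO stylesheet covers far more (tables, lists, images, styling), and raw HTM output usually has to be cleaned into well-formed XHTML first, which is where an HTML parser comes in.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XhtmlToFo {

    // Toy stylesheet for illustration only: turns <h1> and <p> elements of a
    // well-formed (X)HTML body into fo:block elements on an A4 page.
    // local-name() lets it match elements with or without the XHTML namespace.
    static final String XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + "<xsl:template match='/'>"
        + "<fo:root><fo:layout-master-set>"
        + "<fo:simple-page-master master-name='A4' page-width='210mm'"
        + " page-height='297mm' margin='20mm'><fo:region-body/></fo:simple-page-master>"
        + "</fo:layout-master-set>"
        + "<fo:page-sequence master-reference='A4'><fo:flow flow-name='xsl-region-body'>"
        + "<xsl:apply-templates select=\"//*[local-name()='body']/*\"/>"
        + "</fo:flow></fo:page-sequence></fo:root>"
        + "</xsl:template>"
        + "<xsl:template match=\"*[local-name()='h1']\">"
        + "<fo:block font-size='18pt' font-weight='bold'><xsl:value-of select='.'/></fo:block>"
        + "</xsl:template>"
        + "<xsl:template match=\"*[local-name()='p']\">"
        + "<fo:block space-after='6pt'><xsl:value-of select='.'/></fo:block>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet with the JDK's built-in XSLT processor.
    static String toFo(String xhtml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
                + "<h1>Invoice</h1><p>Thank you for your order.</p></body></html>";
        // The FO printed here is the kind of input Apache FOP renders to PDF or a printer.
        System.out.println(toFo(xhtml));
    }
}
```

The design choice here is to keep the HTML-to-FO mapping in a stylesheet rather than in Java code, so it can be extended element by element without recompiling.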

docx4j-ImportXHTML will get you from XHTML to docx. From there, you can use docx4j (or some other solution, e.g. LibreOffice/OpenOffice) to go from docx to PDF.
docx4j supports docx to XSL-FO and, by default, uses FOP to render it.

Related

converting ppt to html

I want to implement a feature that lets users view PowerPoint presentations on the web.
The simple approach is to convert the PowerPoint to images, but I think that if you do, video and audio can no longer be used.
So the idea was to convert the PowerPoint to HTML and place it where I want. However, there is little support for converting PowerPoint directly to HTML. I have been looking for open-source projects and libraries that solve this, but have not found any yet.
The development environment is Java 8 + Spring Boot.
If you are OK with converting your PPT files to PDF before converting them to HTML, then pdf2htmlEX could be worth looking at. It is the best tool I could find for this kind of work, as it converts PDFs to HTML very precisely (have a look at its examples). You should be able to find wrapper libraries in the Maven repository so that you can call it from your Java applications.
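As an alternative to a wrapper library, a minimal sketch of the two-step pipeline could simply shell out to the command-line tools with ProcessBuilder. Using headless LibreOffice (soffice) for the PPT-to-PDF step is an assumption not made in the answer above, as are the file names; both soffice and pdf2htmlEX are assumed to be installed and on the PATH.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class PptToHtml {

    // Step 1 (assumption): headless LibreOffice converts the presentation to PDF
    // and writes it into outDir under the same base name.
    static List<String> pptToPdfCommand(String pptPath, String outDir) {
        return Arrays.asList("soffice", "--headless",
                "--convert-to", "pdf", "--outdir", outDir, pptPath);
    }

    // Step 2: pdf2htmlEX renders the PDF into a single HTML file.
    static List<String> pdfToHtmlCommand(String pdfPath, String htmlName) {
        return Arrays.asList("pdf2htmlEX", pdfPath, htmlName);
    }

    // Runs one command, forwarding its console output, and fails on a non-zero exit code.
    static void run(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).inheritIO().start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("Command failed (" + exit + "): " + command);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical paths for illustration.
        run(pptToPdfCommand("slides.pptx", "out"));
        run(pdfToHtmlCommand("out/slides.pdf", "slides.html"));
    }
}
```

Building the argument lists separately from running them keeps the commands easy to check and to adapt, for example in a Spring Boot service that converts uploads in the background.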
If you are OK with using an iframe, you may use a Microsoft solution: https://products.office.com/it-IT/office-online/view-office-documents-online
You may use this code:
<iframe src='https://view.officeapps.live.com/op/embed.aspx?src=[your_ppt_url]' width='100%' height='600px' frameborder='0'></iframe>
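Since the question mentions Java 8 and Spring Boot, here is a minimal sketch of a helper that builds that iframe markup. Percent-encoding the document URL is an assumption on my part (it is passed as a query parameter, so encoding it seems safer); the example URL is hypothetical, and because Microsoft's viewer fetches the file itself, the URL presumably has to be publicly reachable.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class OfficeViewerEmbed {

    private static final String VIEWER =
            "https://view.officeapps.live.com/op/embed.aspx?src=";

    // Builds the iframe markup for a .ppt/.pptx URL; the document URL is
    // percent-encoded because it travels inside the viewer's query string.
    static String iframeFor(String pptUrl) {
        try {
            String src = VIEWER + URLEncoder.encode(pptUrl, "UTF-8");
            return "<iframe src='" + src
                    + "' width='100%' height='600px' frameborder='0'></iframe>";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(iframeFor("https://example.com/decks/demo.pptx"));
    }
}
```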
There's an older Node package called PPTX2HTML. It outputs a lot of messy markup on a canvas element, but it might work. They even have a demo website to try it out. They seem to have broken the PowerPoint file up into parseable XML and rendered the elements.

Is there a way (in java) to generate human editable Microsoft Word documents from human readable template?

I am searching for a way for my Java application to generate Word documents using some kind of template (the data for the document will be provided by the application).
Here are the requirements:
- The template should be editable by a non-developer. Creating a Jasper template with the appropriate tool, or editing a Word document that uses some kind of templating language, is acceptable. Asking them to edit the document's XML file is not.
- The results should be easy for a person to edit in Microsoft Word. For example, documents generated by Jasper or BIRT do not qualify, as their table-based layout prevents easy editing.
So far I have looked at the following solutions, none of which meets both requirements:
Jasper: the generated documents are not easy to edit.
BIRT: same problem.
Generating the XML with a template engine (Velocity, FreeMarker): I cannot ask the end client to edit this kind of XML file.
You can check out Templater. It has a pretty good demo page.
Disclaimer: I'm the author.
LibreOffice
LibreOffice is an open-source implementation of an app suite similar to Microsoft Office. Besides supporting the standardized OpenDocument format, it also reads and writes Microsoft Office formats.
LibreOffice offers a Java API. So you may be able to programmatically create documents from a template.
In the past we’ve done something similar, modifying a document with search-and-replace and document-variables.
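The search-and-replace idea also works without driving LibreOffice at all. As a JDK-only alternative (not what the answer above describes), here is a minimal sketch that treats a .docx as the ZIP archive it is and replaces hypothetical ${key} placeholders in word/document.xml. One important caveat: Word often splits text the author typed into several runs (w:r elements), so a placeholder may not appear contiguously in the XML; this naive version only catches placeholders that stay in one run, which is one reason dedicated template libraries exist. File names and keys are placeholders.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class DocxPlaceholderFiller {

    // Copies every entry of a .docx (a ZIP archive) and, inside
    // word/document.xml, replaces ${key} placeholders with XML-escaped values.
    static byte[] fill(byte[] docx, Map<String, String> values) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(docx));
             ZipOutputStream out = new ZipOutputStream(result)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                byte[] data = readEntry(in);
                if ("word/document.xml".equals(entry.getName())) {
                    String xml = new String(data, StandardCharsets.UTF_8);
                    for (Map.Entry<String, String> v : values.entrySet()) {
                        xml = xml.replace("${" + v.getKey() + "}", escapeXml(v.getValue()));
                    }
                    data = xml.getBytes(StandardCharsets.UTF_8);
                }
                out.putNextEntry(new ZipEntry(entry.getName()));
                out.write(data);
                out.closeEntry();
            }
        } // closing the ZipOutputStream writes the archive's central directory
        return result.toByteArray();
    }

    // Reads the current ZIP entry to its end (Java 8 has no InputStream.readAllBytes).
    private static byte[] readEntry(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    // Escapes characters that would otherwise break the XML text content.
    private static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file names and placeholder key, for illustration only.
        Map<String, String> values = new HashMap<>();
        values.put("customer", "ACME & Sons");
        byte[] template = Files.readAllBytes(Paths.get("template.docx"));
        Files.write(Paths.get("result.docx"), fill(template, values));
    }
}
```

The template stays an ordinary Word document that a non-developer can edit, and the output is again an ordinary, editable .docx.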
Apache Poi
Apache Poi is an open-source library for reading and writing Microsoft Office compatible documents.
I don't know its details but you might take a look.
JODReports (open source) and Docmosis (commercial) are designed to use normal/human-managed documents as templates (Word, OpenOffice, etc), merge in your data and return editable documents, PDFs etc. Please note I work for Docmosis.
Both JODReports and Docmosis provide a Java API.
If you are interested in automating Open Office or Libre Office directly (as mentioned in Basil's answer), this blog about converting Doc to Pdf will give you a quick-start to:
load a doc file as a template
search and replace
export to file (pdf in the example)
To change the output format to Doc instead of PDF:
propertyValues[1].Value = "writer_pdf_Export";
to
propertyValues[1].Value = "MS Word 97";
I hope that helps.
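For comparison, the same export filter names can also be selected without the UNO API, because the --convert-to option of soffice accepts an extension:filter pair. A minimal sketch, assuming soffice is on the PATH; the input and output names are hypothetical.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class SofficeExport {

    // Headless LibreOffice conversion with an explicit export filter, e.g.
    // "writer_pdf_Export" for PDF or "MS Word 97" for .doc.
    static List<String> exportCommand(String input, String extension,
                                      String filterName, String outDir) {
        return Arrays.asList("soffice", "--headless",
                "--convert-to", extension + ":" + filterName,
                "--outdir", outDir, input);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical file name for illustration.
        List<String> cmd = exportCommand("filled.odt", "doc", "MS Word 97", "out");
        // ProcessBuilder passes each list element as a single argument,
        // so the spaces in "MS Word 97" need no shell quoting.
        int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
        System.out.println("soffice exited with " + exit);
    }
}
```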
I was searching for this kind of solution as well and found XDocReport, which includes an example with a table. I will give it a try.

Convert html+css+js to PDF

I want to create something like this (code is here):
in PDF format. I'm using Google Charts, and according to this forum converting a chart to PDF is impossible. I've already tried iText + XMLWorker, but I think there are problems with CSS and no JavaScript support at all.
So my questions are: how can I convert HTML + CSS + JS to a PDF file? Or are there other ways to approach this?
As promised in the comment, I've asked Raf. This was his answer:
One way to use XML Worker for HTML+CSS+JS is to use a browser engine to preprocess the HTML. Examples of such a browser engine are WebKit (Chrome, Safari) and Gecko (Firefox). These can interpret the CSS and JS and give you HTML that is ready to be parsed by XML Worker.
Examples of competing products are:
wkhtmltopdf, a command line tool that uses WebKit as its rendering engine.
Prince XML supports HTML+CSS+JS to PDF using their own engine.
Maybe there are others, but this is what Raf told me. I hope this helps.
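To try the wkhtmltopdf route from Java, a minimal sketch could shell out to the command-line tool. It assumes wkhtmltopdf is installed and on the PATH; the URL, file name and 2000 ms delay are placeholders. The --javascript-delay option makes wkhtmltopdf wait the given number of milliseconds for JavaScript (such as the Google Charts drawing code) to finish before rendering.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class HtmlToPdf {

    // wkhtmltopdf renders the page with WebKit, so the page's JavaScript runs
    // before the PDF is written; the delay gives asynchronous charts time to draw.
    static List<String> command(String inputUrlOrFile, String outputPdf, int jsDelayMs) {
        return Arrays.asList("wkhtmltopdf",
                "--javascript-delay", Integer.toString(jsDelayMs),
                inputUrlOrFile, outputPdf);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical input page and output file.
        List<String> cmd = command("http://localhost:8080/report.html", "report.pdf", 2000);
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        if (p.waitFor() != 0) {
            throw new IOException("wkhtmltopdf failed: " + cmd);
        }
    }
}
```

The delay is a trade-off: too short and the charts may be missing from the PDF, too long and every conversion is slowed down, so it likely needs tuning per page.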

Using templates for creating PDF files

I have a PDF template (with header and footer), and I want to generate documents based on that template.
Is there any way to do that with iText? Thank you.
P.S. Right now I generate the document on the fly, i.e. every time I generate the header, the footer and the content itself.
UPDATE: I have found an incredible library called PD4ML. It's not free, but not that expensive, and it has really cool features such as on-the-fly HTML-to-PDF conversion, support for a lot of HTML and CSS tags, and even its own JSP tag library! So I really recommend it when you need something other than the heavy, memory-hungry JasperReports.
You can use JasperReports library and the iReport visual designer.
JasperReports uses iText to produce PDFs from "jasper" templates, which are XML files (following the jrxml DTD) compiled into Java classes, but it also lets you use the same template to generate MS Office files (with POI), HTML, etc.
I'm not sure about iText, but you can use BIRT for this purpose: http://www.eclipse.org/birt/. It's overkill to use it just for PDF creation, since you can do a lot (more than you can imagine) with it.
If you can choose your template format, I would go with JODReports and JODConverter.
JODReports uses an ODT template and fills in the mappings in the template from your Java code.
JODConverter uses LibreOffice to convert the filled template to PDF or any other format LibreOffice can export.
You need LibreOffice installed on a machine (possibly a remote one) and running as a service.
I used it back in 2012 but am not sure whether the project is still active.

Creating PDF, HTML, and optionally RTF documents from the same source using Java?

I was looking at using iText to create both a pdf and html version of a document with RTF as a possible option. According to this question this is no longer possible with iText. Is there a library that will allow me to create a document in Java and output it as both PDF and HTML? The ability to output RTF would be nice but is not required.
As that answer to the other question states, you can just use the iText RTF Library.
I have used PD4ML to convert HTML to PDF. Although it is a commercial product, it is very reliable and supports CSS well.
JasperReports: if you look at this package, it supports export to:
pdf
html
rtf
xls
xml
You have two options to create the documents:
via iReport - a visual designer for reports
via an API, where you construct everything with Java code.
Note that even though JasperReports's main function is to create reports, it can very well create other documents, with no tabular data for example.
You could also try Docmosis since that supports the output formats provided by OpenOffice (including the ones you specified) and you can often do the job with a lot less code.
