Convert html+css+js to PDF - java

I want to create something like this (code is here):
in PDF format. I'm using Google Charts, and according to this forum, converting a chart to PDF is impossible. I've already tried iText + XML Worker, but it has problems with CSS and, I think, no JavaScript support at all.
So the question is: how can I convert HTML+CSS+JS to a .pdf file? Or is there perhaps another way to approach the problem?

As promised in the comment, I've asked Raf. This was his answer:
One way to use XML Worker for HTML+CSS+JS is to use a browser engine to preprocess the HTML. Examples of such a browser engine are WebKit (Chrome, Safari) and Gecko (Firefox). These can interpret the CSS and JS and give you HTML that is ready to be parsed by XML Worker.
Examples of competing products are:
wkhtmltopdf, a command line tool that uses WebKit as its rendering engine.
Prince XML supports HTML+CSS+JS to PDF using their own engine.
Maybe there are others, but this is what Raf told me. I hope this helps.
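Since the question is about Java, here is a minimal sketch of driving wkhtmltopdf from Java through ProcessBuilder. It assumes the wkhtmltopdf binary is installed and on the PATH. The class name, method names, and file names are made up for illustration, and the `--javascript-delay` value is only a guess at how long the chart scripts need before the page is rendered.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class WkhtmltopdfSketch {

    // Builds the command line: wkhtmltopdf --javascript-delay <ms> <input> <output>
    static List<String> buildCommand(String input, String outputPdf, int jsDelayMillis) {
        List<String> cmd = new ArrayList<>();
        cmd.add("wkhtmltopdf");                 // assumption: binary is on the PATH
        cmd.add("--javascript-delay");          // wait for scripts (e.g. charts) to finish drawing
        cmd.add(String.valueOf(jsDelayMillis));
        cmd.add(input);                         // local HTML file or URL
        cmd.add(outputPdf);
        return cmd;
    }

    // Runs the tool and returns its exit code (0 means success).
    static int convert(String input, String outputPdf, int jsDelayMillis)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, outputPdf, jsDelayMillis))
                .inheritIO()                    // show the tool's own output in our console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 2) {
            System.out.println("exit code: " + convert(args[0], args[1], 2000));
        } else {
            // Without arguments, only show what would be executed.
            System.out.println(String.join(" ", buildCommand("chart.html", "chart.pdf", 2000)));
        }
    }
}
```

Whether the charts actually appear in the PDF depends on the wkhtmltopdf build and on how the page loads its scripts, so treat this as a starting point rather than a guaranteed fix.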

Related

converting ppt to html

I want to implement a feature that lets users view PowerPoint presentations on the web.
This could be done simply by converting the PowerPoint to images, but I think video and audio would then no longer work.
So the idea was to convert the PowerPoint to HTML and place it where I want. However, I am not able to implement the PowerPoint-to-HTML conversion myself. I have been looking for open-source tools or libraries to solve this problem, but I have not found any yet.
The development environment is Java 8 + Spring Boot.
If you are OK with converting your PPT files to PDF before converting them to HTML, then pdf2htmlEX could be worth looking at. It is the best tool I could find for this kind of work, as it converts PDFs to HTML very precisely (have a look at the examples 1,2,3,4). You should be able to find wrapper libraries in the Maven repo so that you can call it from your Java applications.
If you are OK with using an iframe, you can use a Microsoft solution: https://products.office.com/it-IT/office-online/view-office-documents-online
You may use this code:
<iframe src='https://view.officeapps.live.com/op/embed.aspx?src=[you_ppt_url]' width='100%' height='600px' frameborder='0'></iframe>
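For a Java 8 + Spring Boot backend, the embed markup could be generated on the server. The sketch below only URL-encodes the presentation address and plugs it into the viewer URL from the snippet above. The class and method names are hypothetical, and as far as I know the presentation has to be reachable by the Office viewer over the public internet.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class OfficeEmbedSketch {

    // Viewer endpoint taken from the iframe snippet above.
    static final String VIEWER = "https://view.officeapps.live.com/op/embed.aspx?src=";

    // Returns the viewer URL with the presentation URL encoded as a query parameter.
    static String embedUrl(String pptUrl) {
        try {
            return VIEWER + URLEncoder.encode(pptUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Returns the complete iframe element for the given presentation URL.
    static String iframeFor(String pptUrl) {
        return "<iframe src='" + embedUrl(pptUrl)
                + "' width='100%' height='600px' frameborder='0'></iframe>";
    }

    public static void main(String[] args) {
        // example.com is a placeholder, not a real presentation.
        System.out.println(iframeFor("https://example.com/slides/deck.pptx"));
    }
}
```

Encoding the address matters because a raw URL containing `&` or `?` would otherwise be cut off or misread by the viewer.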
There's an older Node package called PPTX2HTML. It outputs a bunch of garbled code on a canvas element, but it might work. They even have a demo website to try it out. They seem to have broken the PowerPoint up into parseable XML and rendered the elements.

Creating Docx, PDF, XSL-FO

[Background Info]
We had a solution in place that used server-side Word automation to convert HTM documents into Docx, PDF, or printed documents. This solution broke in the latest version of Windows Server 2012. We learned that Microsoft does not intend Word to be used in this manner, and after troubleshooting with Microsoft support engineers we have come to the conclusion that it will never work.
[Currently]
I am currently researching potential technologies and tools that my company can use to regain this functionality. We need to be able to create Docx, PDF and print files to a local printer.
I have looked into a number of tools already and am currently leaning towards Apache FOP, which seems to handle PDF and printing for us.
However, I'm looking for advice and suggested tools that we could use to implement a pure Java approach. Currently our application creates HTM files with all the required information, so ideally we would like to take these HTM files and "convert" them into Docx/XSL-FO format.
[Question]
So here is my question, which I hope you will be able to help me with.
What are the best tools that I can use to get from
HTM to Docx
HTM to PDF
Or what would be the best process for achieving this? Has anyone had success finding a solution for this in the past?
Thank You
It depends on the level of control and the complexity of the source HTML. There are HTML to FO stylesheets but you might find them wanting for your specific need.
So you could use the Jericho parser to read the HTML and generate FO yourself. Or you could generate the target formats directly, using Apache PDFBox for PDF and Apache POI for Docx.
It all boils down to the level of control you want/need
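To give an idea of what an HTML to FO stylesheet involves, here is a minimal sketch using only the JDK's built-in XSLT support. The stylesheet is a deliberately tiny, made-up one that handles just `h1` and `p`. It assumes the input is well-formed XML without a namespace, which real-world HTM files often are not, so a cleanup step would probably be needed first. The resulting FO string is what would then be handed to FOP.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class HtmlToFoSketch {

    // Hypothetical minimal stylesheet: one A4 page master, h1 and p mapped to fo:block.
    static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/html'>"
        + "<fo:root>"
        + "<fo:layout-master-set>"
        + "<fo:simple-page-master master-name='page'"
        + " page-height='29.7cm' page-width='21cm' margin='2cm'>"
        + "<fo:region-body/>"
        + "</fo:simple-page-master>"
        + "</fo:layout-master-set>"
        + "<fo:page-sequence master-reference='page'>"
        + "<fo:flow flow-name='xsl-region-body'>"
        + "<xsl:apply-templates select='body/*'/>"
        + "</fo:flow>"
        + "</fo:page-sequence>"
        + "</fo:root>"
        + "</xsl:template>"
        + "<xsl:template match='h1'>"
        + "<fo:block font-size='18pt' font-weight='bold'><xsl:value-of select='.'/></fo:block>"
        + "</xsl:template>"
        + "<xsl:template match='p'>"
        + "<fo:block><xsl:value-of select='.'/></fo:block>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Transforms a well-formed, namespace-free HTML string into an XSL-FO string.
    static String toFo(String xhtml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Input could not be transformed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toFo("<html><body><h1>Invoice</h1><p>Total: 42</p></body></html>"));
    }
}
```

Every HTML construct you care about (tables, lists, images, inline styles) needs its own template, which is why generic stylesheets tend to fall short and why the level of control you need drives the choice.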
docx4j-ImportXHTML will get you from XHTML to docx. From there, you can use docx4j (or some other solution, e.g. LibreOffice/OpenOffice) to do docx to PDF.
docx4j also supports docx to XSL-FO, and by default uses FOP.

Convert Lotus Notes RichText to PDF programmatically

I want to convert Notes rich text into PDF in a server program (preferably Java). Is there any sample code showing how to do that? Converting to HTML/MIME isn't an option, since the conversion process is too lossy.
I did some tests with DXL, some XSLT code and XSL:FO, via FOP. It produced some PDF output. Project abandoned due to lack of funding (read: no customer).
The basics, in a recent document: http://www.ibm.com/developerworks/xml/library/x-xslfo/
You'll need to find some third-party software to help, as this isn't possible out of the box. Here's one that looks promising: SWING Software's Lotus Notes Export to PDF

Convert HTML page into MS word using java or any API

I want to convert an HTML page into an MS Word document. I want to know which APIs would be helpful, and whether there is any other way to do this.
The entire page is to be converted into .doc (e.g. if there is a table in the HTML page, a similar table must be created in the Word document).
Apache POI does not provide an option to format the Word document like the HTML page.
I need something that can give me a completely formatted Word document.
Some of the things I have looked at are jsoup, docx4j, JasperReports, and JODConverter.
I tried parsing the HTML page using jsoup, and I get the contents of
the page in my Java program. Now I need to pass these contents to a
doc/docx file. Can docx4j help me get a formatted docx file?
Please help.
Thank you.
I would go with Ashwini Raman's suggestion. It won't work in every scenario: for a complex HTML document with many images and the like, Word will not do a good job. But for most cases it should be fine. Otherwise, there is a complex task ahead of you. You will have to parse your HTML document, for example with the jsoup library, and then use the docx4j library to create your Word document.
Links to both are here:
http://www.docx4java.org/trac/docx4j
http://jsoup.org/
Even when you do it this way, the formatting might be iffy.
To answer your original question: no, there is no ready-made library that does what you are expecting. At least I haven't come across any.
I found a workaround for this. First I get the parsed objects using jsoup and pass them to a document template. I am now looking for options that make it easy to create templates and generate the document dynamically.
I have asked another question regarding the same.

Creating PDF, HTML, and optionally RTF documents from the same source using Java?

I was looking at using iText to create both a pdf and html version of a document with RTF as a possible option. According to this question this is no longer possible with iText. Is there a library that will allow me to create a document in Java and output it as both PDF and HTML? The ability to output RTF would be nice but is not required.
As that answer to the other question states, you can just use the iText RTF Library.
I have used PD4ML to convert HTML to PDF. Even though it is a commercial product, it is very reliable and supports CSS well.
JasperReports. If you look at this package it supports export to:
pdf
html
rtf
xls
xml
You have two options to create the documents:
via iReport - a visual designer for reports
via an API, where you construct everything with Java code.
Note that even though JasperReports's main function is to create reports, it can very well create other documents, with no tabular data for example.
You could also try Docmosis since that supports the output formats provided by OpenOffice (including the ones you specified) and you can often do the job with a lot less code.
