Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I am developing a JSF project and I have a question about reporting.
The idea is to offer the users reports in both HTML and PDF formats.
This should work by developing the reports in HTML+CSS+JS and, whenever a user needs a PDF report, simply converting the HTML+CSS+JS to PDF.
Does somebody know a free Java library for converting the HTML to PDF?
The conversion should be transparent to the user.
Other suggestions are welcome.
Thanks in advance.
It is better to use the wkhtmltopdf tool to convert your HTML to PDF.
Free solution: wkhtmltopdf - uses WebKit under the hood.
Commercial solution: PrinceXML - uses its own Acid2-compliant HTML rendering engine.
Apache FOP is one option; it is an XSLT-based solution, although it does not support HTML5. Flying Saucer and wkhtmltopdf are free solutions that are worth a try. Commercial libraries like PrinceXML offer CSS3 support. Pdfcrowd is yet another commercial solution.
Would Jasper Reports be an option? From the same report file you can generate many formats, PDF and HTML (+CSS) are two of them. Plus there is a GUI report designer.
Try out Flying Saucer.
It uses iText internally and is a pretty good library.
Closed 5 years ago.
Could anyone please tell me a way to create a PDF document in a J2EE application other than iText?
We previously used iText, but the problem is that the HTML file (which is generated from JSP) displays differently from the generated PDF. So I need some other way to create a PDF that looks the same as the JSP display.
Can anyone please suggest libraries other than iText?
Thanks in Advance.
You are probably using iText 5 and XML Worker. Have you tried iText 7 and pdfHTML? See the HTML to PDF tutorial.
You will need:
iText 7: https://github.com/itext/itext7
the pdfHTML add-on: https://github.com/itext/i7j-pdfhtml
You claim:
the problem is the html file (which is generated from Jsp) display is different with the generated PDF.
That is certainly true when you use HTMLWorker (which you shouldn't) and it's true in many cases for XML Worker. But we rewrote iText from scratch because of the mismatch between the old iText architecture and the requirements when converting HTML to PDF.
If you have a problem with the HTML to PDF conversion, please explain the problem in a question and tag that question as an iText question. If we can improve iText 7 + pdfHTML, why wouldn't we do that?
This is always tricky because HTML and PDF have differing purposes (with a lot of overlap). That means "simply" converting between the two is sometimes not going to work well.
You can:
Snapshot an image of the HTML and PDF the image. This has various downsides (can't search / extract text easily, larger, poor zoom, pagination) but is simple if it doesn't conflict with your requirements.
Use a PDF system (like iText) to construct the PDF as desired using the same data. This is obviously more work (possibly a lot more), but is the optimal result in terms of PDF quality and fitness for purpose.
Simplify/adjust your HTML so it converts better into PDF. This depends on what HTML tools/libraries you are using - you might not have much control over the HTML.
Try various other conversion libraries to see if you find a tool that works better for your HTML.
I've been searching high and low for an up to date solution to this age old problem.
Long story short I want to take css + html -> pdf and do it in java.
I don't want to use an API as the data is sensitive. Googling provides me with countless sites/services that offer to do this but I'm looking for a stand alone tool and looking for one that will work nicely from my java server. I've found this awesome looking command line tool but it's a command line tool and spawning processes off a web server starts to get sketchy IMO (but I'm always willing to hear otherwise). Additionally flying saucer seems to be a standard choice, but I've heard mixed reviews.
Here is a 5 year old question on the subject, but I figure things have changed! Especially with all the work being done in the area of front end unit testing with dom manipulation I figure there might be some less than conventional solutions and I'm willing to hear them all!
Any help would be greatly appreciated.
You might try combining CSSBox, which converts HTML+CSS to SVG, with a library such as Batik to create your PDF, as proposed for example here. FlyingSaucer could also do the job.
The choice depends on your further requirements. E.g. are you processing "street HTML" or well-formed documents? What about the pages in the resulting PDF? What about interactive elements in the HTML pages?
In other words, the only way is to try at least a few of the options in practice; then you can ask more specific questions about particular problems.
How can I convert my JSP/HTML file to PDF?
I want to convert a particular part of my webpage to a PDF file. Is it possible?
Yes. Take a good look at both Apache FOP and iText. No matter what you use, you'll probably have to do a little fiddling.
I used HTMLDoc a couple of years ago and had pretty good luck with it.
Try wkhtmltopdf. It is a command-line utility that takes an HTML file or web address and a save location for the PDF. It is very easy to use and utilizes the same rendering engine as Safari. It works MUCH better than many of the other parsers that I have used (which don't always support CSS and other advanced layout features).
Take a look at html2ps (Perl) or html2ps (PHP). However, neither of the two is implemented in Java.
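A minimal sketch of calling the tool from a Java server, assuming a `wkhtmltopdf` binary is installed and on the `PATH`. The class name, method names, and the timeout are hypothetical choices made for illustration; only the basic `wkhtmltopdf <input> <output>` invocation comes from the tool itself.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; not part of wkhtmltopdf or any library.
final class WkhtmltopdfRunner {

    // Builds the argument list: wkhtmltopdf <input> <output>.
    // The input may be a local HTML file path or a web address.
    static List<String> buildCommand(String input, Path outputPdf) {
        return List.of("wkhtmltopdf", input, outputPdf.toString());
    }

    // Runs the tool and waits at most timeoutSeconds (an assumed safety limit,
    // so a stuck render cannot hold a server thread forever).
    static void convert(String input, Path outputPdf, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, outputPdf))
                .redirectErrorStream(true)                        // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)  // avoid blocking on a full pipe
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            throw new IOException("wkhtmltopdf timed out");
        }
        if (process.exitValue() != 0) {
            throw new IOException("wkhtmltopdf exited with code " + process.exitValue());
        }
    }
}
```

Passing the arguments as a list (rather than one shell string) means no shell is involved, which sidesteps quoting problems when the input path or URL comes from user data. A call might look like `WkhtmltopdfRunner.convert("report.html", Path.of("report.pdf"), 30)`.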
You might also want to read this article.
The Flying Saucer library is the best one to use. It works on top of iText and makes the task of conversion very easy.
I would like to read the text and binary attachments in a saved Outlook message (.msg file) from a Java application, without resorting to native code (JNI, Java Native Interface).
Apache POI-HSMF seems to be in the right direction, but it's in very early stages of development...
msgparser is a small open source Java library that parses Outlook .msg files and provides their content using Java objects. msgparser uses the Apache POI - POIFS library to parse the message files which use the OLE 2 Compound Document format.
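Because .msg files are OLE 2 compound documents, a cheap sanity check before handing a file to any parser is to look at its first eight bytes. The sketch below uses only the JDK; the class and method names are hypothetical. Note that the signature is shared by other OLE 2 files (legacy .doc and .xls, for example), so a match only says "compound document", not "Outlook message".

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper; not part of msgparser or Apache POI.
final class Ole2Sniffer {

    // Header signature of the OLE 2 Compound Document format.
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // True if the data starts with the OLE 2 signature.
    static boolean hasOle2Signature(byte[] data) {
        if (data == null || data.length < OLE2_SIGNATURE.length) {
            return false;
        }
        byte[] head = Arrays.copyOf(data, OLE2_SIGNATURE.length);
        return Arrays.equals(head, OLE2_SIGNATURE);
    }

    // Reads only the first eight bytes of the file and checks them.
    static boolean hasOle2Signature(Path file) throws IOException {
        byte[] header = new byte[OLE2_SIGNATURE.length];
        try (InputStream in = Files.newInputStream(file)) {
            int read = in.readNBytes(header, 0, header.length);
            return read == header.length && hasOle2Signature(header);
        }
    }
}
```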
You could use Apache POIFS, which seems to be a little more mature, but that would appear to duplicate the efforts of POI-HSMF.
You could use POI-HSMF and contribute changes to get the features you need working. That's often how FOSS projects like that expand.
You could use com4j, j-Interop, or some other COM-level interop feature and interact directly with the COM interfaces that provide access to the structured document. That would be much easier than trying to hit it directly through JNI.
Have you tried to use Jython with the Python win32 extensions (http://www.jython.org/Project/ + http://python.net/crew/mhammond/win32/)?
If this is for a "personal" or "internal" project Jython with Python may be a very good choice. If you are building a "shrink wrapped" software package this may not be the best option.
Apache POI-HSMF.
You can start from the example given in the link below.
http://svn.apache.org/viewvc/poi/trunk/src/scratchpad/examples/src/org/apache/poi/hsmf/examples/Msg2txt.java?revision=821500&view=markup&pathrev=821500
For more, read the library docs.
Has anybody attempted to extract text from a PDF using an OCR library and Java? What did you find to be the most reliable library for text extraction? Most of the approaches I've seen (tesseract, GOCR) are C libraries that would require some JNI code to be written.
I'm familiar with PDFBox, which is now an Apache incubator project at version 0.8.x, but its text extraction isn't always accurate. I'm looking for an alternative approach that is somewhat more reliable.
I've not tried Asprise JavaPDF yet (I'm in the process of trying it), but I wanted to know more about the OCR approach, if it's possible.
Any help would be appreciated.
If you have a text-based PDF, I'd strongly recommend PDFTextStream. It's not free, but licensing is reasonable, and it is much much better than PDFBox. PDFBox chokes on many PDF files which are generated by newer tools, and is not too consistent about PDFs it can handle. PDFTextStream handles any PDF I throw at it, including PDFs with embedded PNG images, which PDFBox can not do.
If you heckle the PDFTextStream folks to add OCR, they may listen up.
We use ABBYY FineReader Engine 11. They have a Java wrapper.
Pros:
It works great with all the languages (English, Russian, Uzbek, etc.) and does real OCR (even if your PDF has no OCR text, it renders the pages first and then runs OCR on them).
Cons:
It costs money. You have to buy a developer license and an end-user license.
And it is EXTREMELY slow.
If you want to run OCR on a text-based PDF, you may have to convert it to an image first.
You can use Java wrappers of Tesseract - tesjeract or Tess4J - to perform OCR. However, for PDF, you'll need to convert to image (PNG or TIFF) first before feeding it to the OCR engine.
VietOCR calls Tesseract executable to perform the text extraction. It uses GhostScript to do PDF-to-image conversion.
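The same two-step pipeline (render the PDF to images, then OCR each image) can be sketched from Java by building the two command lines yourself. This is an illustration, not VietOCR's actual code: it assumes `gs` and `tesseract` binaries are on the `PATH` (on Windows the Ghostscript console binary is usually `gswin64c`), and the class name, method names, and the 300 dpi figure used in the example are hypothetical choices.

```java
import java.util.List;

// Hypothetical helper that only builds the command lines; run them with
// ProcessBuilder and check the exit codes in real code.
final class OcrPipelineCommands {

    // Ghostscript: render every page of the PDF to a 24-bit PNG.
    // pagePattern is an output template such as "page-%03d.png".
    static List<String> rasterizeCommand(String pdf, String pagePattern, int dpi) {
        return List.of(
                "gs", "-dNOPAUSE", "-dBATCH",
                "-sDEVICE=png16m",
                "-r" + dpi,
                "-sOutputFile=" + pagePattern,
                pdf);
    }

    // Tesseract: OCR one image; the text is written to <outputBase>.txt.
    static List<String> ocrCommand(String image, String outputBase, String language) {
        return List.of("tesseract", image, outputBase, "-l", language);
    }
}
```

For example, `rasterizeCommand("scan.pdf", "page-%03d.png", 300)` followed by `ocrCommand("page-001.png", "page-001", "eng")` for each rendered page. Higher resolutions tend to improve recognition at the cost of slower rendering and larger intermediate files.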