Create a pdf for J2EE applications [closed] - java

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
Could anyone please tell me a way to create a PDF document in a J2EE application other than iText?
We previously used iText, but the problem is that the HTML page (generated from a JSP) looks different from the generated PDF. So I need some other way to create a PDF that looks the same as the JSP page.
Could anyone please suggest libraries other than iText?
Thanks in advance.

You are probably using iText 5 and XML Worker. Have you tried iText 7 and pdfHTML? See the HTML to PDF tutorial.
You will need:
iText 7: https://github.com/itext/itext7
the pdfHTML add-on: https://github.com/itext/i7j-pdfhtml
You claim:
the problem is the html file (which is generated from Jsp) display is different with the generated PDF.
That is certainly true when you use HTMLWorker (which you shouldn't) and it's true in many cases for XML Worker. But we rewrote iText from scratch because of the mismatch between the old iText architecture and the requirements when converting HTML to PDF.
If you have a problem with the HTML to PDF conversion, please explain the problem in a question and tag that question as an iText question. If we can improve iText 7 + pdfHTML, why wouldn't we do that?

This is always tricky because HTML and PDF have differing purposes (with a lot of overlap). That means "simply" converting between the two is sometimes not going to work well.
You can:
Snapshot an image of the HTML and PDF the image. This has various downsides (can't search / extract text easily, larger, poor zoom, pagination) but is simple if it doesn't conflict with your requirements.
Use a PDF system (like iText) to construct the PDF as desired using the same data. This is obviously more work (possibly a lot more), but is the optimal result in terms of PDF quality and fitness for purpose.
Simplify/adjust your HTML so it converts better into PDF. This depends on what HTML tools/libraries you are using - you might not have much control over the HTML.
Try various other conversion libraries to see if you find a tool that works better for your HTML.

Related

Java convert html with css+js to pdf [closed]

Closed 7 years ago.
I am developing a JSF project and I have a question about reporting.
The idea is to offer the users reports in both HTML and PDF formats.
The plan is to develop the reports in HTML+CSS+JS and, whenever a user needs a PDF report, convert the HTML+CSS+JS to PDF.
Does somebody know a free Java library for converting the HTML to PDF?
This should be transparent to the user.
Other proposals are welcome.
Thanks in advance.
It's better to use the wkhtmltopdf tool to convert your HTML to PDF.
Free solution: wkhtmltopdf - uses WebKit under the hood.
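As a rough sketch of the wkhtmltopdf route, assuming the `wkhtmltopdf` binary is installed and on the `PATH`, a Java application can simply shell out to it. The class name `WkHtmlToPdf` is illustrative. The command line is the tool's basic `wkhtmltopdf <input> <output>` form, where the input can be a local HTML file or a URL (for example, the URL of the rendered JSF page).

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

final class WkHtmlToPdf {

    /** Builds "wkhtmltopdf <source> <pdf>"; source may be a file path or a URL. */
    static List<String> command(String source, Path pdf) {
        return List.of("wkhtmltopdf", source, pdf.toString());
    }

    /** Runs the converter and fails if it exits with a non-zero status. */
    static void convert(String source, Path pdf) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(source, pdf))
                .redirectErrorStream(true)                        // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)  // Java 9+: never block on a full pipe
                .start();
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("wkhtmltopdf exited with status " + exit);
        }
    }
}
```

A call such as `WkHtmlToPdf.convert("http://localhost:8080/app/report.xhtml", Path.of("report.pdf"))` (hypothetical URL) lets WebKit render the page with its CSS and JavaScript. Note that the external process does not share the user's session, so a page behind a login may need to be rendered to a file first.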
Commercial solution: PrinceXML - uses its own Acid2-compliant HTML rendering engine.
Apache FOP is one option; it is an XSLT-based (XSL-FO) solution, although it does not support HTML5. Flying Saucer and wkhtmltopdf are free solutions worth a try. Commercial libraries like PrinceXML offer support for CSS3. Pdfcrowd is yet another commercial solution.
Would Jasper Reports be an option? From the same report file you can generate many formats, PDF and HTML (+CSS) are two of them. Plus there is a GUI report designer.
Try out Flying Saucer.
It uses iText internally and is a pretty good library.

Faster than JODCONVERTER [closed]

Closed 5 years ago.
I have been improving a document management project, and one requirement is to render documents (Word, PDF, etc.) in a web page. PDF can be rendered with an iframe, object or embed tag and a servlet, but other documents like Word and Excel cannot be rendered in the web page. My solution is to convert these documents to PDF or HTML when they are requested and render them that way. I've tried converting them with JODConverter, and it works, but converting a Word (.docx) document of almost 700 pages takes 25-30 seconds to PDF and 30-35 seconds to HTML. That is too long; users should not have to wait that much. The documents will be stored on our server, not anywhere else. Is there anything that converts faster, or a better solution?
Thanks!
You could use jodconverter + LibreOffice 3.5.* or jodconverter + OpenOffice.org 3.4.1 (I have tried both recently and they are way faster than LibreOffice 3.6+/4.0+) in combination with a lazy/parallel conversion process to improve response times.
You cannot convert 700 pages of content in a snap. Even Google Docs puts your uploaded documents on a cloud conversion queue. So you can implement a similar queue that lazily converts your documents one by one, and show a proper message to the user while the conversion is pending. The queue must of course save the converted file to the filesystem, so you can display it any time you want. You must also plan for the disk space this requires.
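A minimal sketch of such a queue using only `java.util.concurrent`. The `Converter` interface is a hypothetical hook where a JODConverter call (or any other converter) would go; the class name and the `<name>.pdf` cache layout are also assumptions. A single worker thread converts documents one at a time, each source is converted at most once, and the web layer polls `isReady` to decide between showing the document and a "please wait" message.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class LazyConversionQueue implements AutoCloseable {

    /** Hypothetical hook: wrap JODConverter (or any other converter) behind this. */
    interface Converter {
        void convert(Path source, Path target) throws Exception;
    }

    // A single worker thread converts documents one at a time, in request order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    // One job per source file, so repeated requests share the same conversion.
    private final ConcurrentMap<Path, Future<Path>> jobs = new ConcurrentHashMap<>();
    private final Converter converter;
    private final Path cacheDir;

    LazyConversionQueue(Converter converter, Path cacheDir) {
        this.converter = converter;
        this.cacheDir = cacheDir;
    }

    /** Queues the conversion (once per source) and returns a handle the web layer can poll. */
    Future<Path> request(Path source) {
        return jobs.computeIfAbsent(source, s -> worker.submit(() -> {
            Path target = cacheDir.resolve(s.getFileName() + ".pdf");
            if (Files.exists(target)) {
                return target;                      // converted earlier, served from disk
            }
            Path tmp = null;
            try {
                tmp = Files.createTempFile(cacheDir, "convert-", ".tmp");
                converter.convert(s, tmp);
                // Publish atomically so readers never see a half-written file.
                return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
            } catch (Exception e) {
                if (tmp != null) {
                    Files.deleteIfExists(tmp);
                }
                jobs.remove(s);                     // let a later request retry
                throw e;
            }
        }));
    }

    /** False while the conversion is still pending. */
    boolean isReady(Path source) {
        Future<Path> job = jobs.get(source);
        return job != null && job.isDone();
    }

    @Override
    public void close() {
        worker.shutdown();
    }
}
```

Because converted files persist in `cacheDir`, a restart does not repeat work; the flip side is the disk-space cost mentioned above, which a real version would manage with some eviction policy.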
A simpler solution is to open the file in a new browser tab with the correct MIME type; if the browser is IE and Microsoft Office is installed, it will hopefully open the file natively in the browser. However, this is not a platform-independent solution.
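In a servlet, serving a file "with the correct MIME type" comes down to two response headers. A small JDK-only sketch, assuming documents are identified by file extension (the class and method names are illustrative): the servlet would pass these values to `response.setContentType(...)` and `response.setHeader("Content-Disposition", ...)`.

```java
import java.util.Locale;
import java.util.Map;

final class InlineDocumentHeaders {

    // Registered MIME types for the formats discussed above.
    private static final Map<String, String> TYPES = Map.of(
            "pdf", "application/pdf",
            "doc", "application/msword",
            "docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            "xls", "application/vnd.ms-excel",
            "xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");

    /** Maps the file extension to a MIME type, falling back to a generic binary type. */
    static String contentType(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return TYPES.getOrDefault(ext, "application/octet-stream");
    }

    /** "inline" asks the browser to display the file instead of downloading it. */
    static String contentDisposition(String fileName) {
        // Double quotes are stripped so they cannot break the quoted filename.
        return "inline; filename=\"" + fileName.replace("\"", "") + "\"";
    }
}
```

Whether the browser actually displays an Office file inline still depends on the browser and installed software, as noted above.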

PDF Generation Library for Java [closed]

Closed 6 years ago.
I know this has been asked before, but I'm still undecided on which PDF generation framework to use for my current project.
My requirements:
on-the-fly generation of PDF documents (mainly order forms, invoices)
Java based
easy to layout
should be open source
easy to change layout
A lot of people seem to use iText, but I have some concerns (apart from the changed licence) regarding separation of concerns: In an HTML context there's good MVC support, where I usually stick to Spring MVC and FreeMarker to separate logic and layout. I'm a little bit worried that with iText you end up mixing code and layout a lot.
I am aware that Apache FOP could be a solution here, but then again I find XSLT tedious to work with, and I have read that FOP can be slow when it comes to high throughput of many documents. Is that true?
I also considered JasperReports, but from my understanding this is more suited for reports containing tabular datasets rather than single documents such as invoices which require a lot of layout formatting?
Any thoughts on this?
Give JasperReports a try. Use iReport to create the .jrxml files. JasperReports can handle complex layouts. For parts of the report based on different queries, have a look at subreports embedded in the main report.
Just like Adrian Smith's solution, this approach separates report layout editing from data sourcing.
I have implemented a good solution where my software creates a format-independent "pure" XML file, then I give my boss the XSD and he puts it into Altova StyleVision where he can WYSIWYG design reports based on data he plucks out from the XSD. That software produces an XSLT. So my program:
Produces the format-independent "pure" XML
Transforms it with the XSLT, the output of which is XSL-FO
Uses Apache FOP to convert the XSL-FO into PDF
This is a really great solution: it means that I (as a programmer) no longer have to change my code each time my boss wants to change a color in the report. My job is simply to produce "pure" XML.
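The middle step of this pipeline needs nothing beyond the JDK. Here is a minimal sketch of it using the standard `javax.xml.transform` API; the `<invoice>` XML and the hand-written stylesheet are hypothetical stand-ins for the "pure" XML and the StyleVision-generated XSLT. The resulting XSL-FO string is what would then be handed to Apache FOP (not shown here) to produce the PDF.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XmlToFo {

    // Hypothetical stylesheet: one A4 page with two blocks pulled from the XML.
    static final String INVOICE_XSLT =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
            + " xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
            + "<xsl:template match=\"/invoice\">"
            + "<fo:root>"
            + "<fo:layout-master-set>"
            + "<fo:simple-page-master master-name=\"A4\" page-height=\"29.7cm\""
            + " page-width=\"21cm\" margin=\"2cm\">"
            + "<fo:region-body/>"
            + "</fo:simple-page-master>"
            + "</fo:layout-master-set>"
            + "<fo:page-sequence master-reference=\"A4\">"
            + "<fo:flow flow-name=\"xsl-region-body\">"
            + "<fo:block font-size=\"14pt\">Invoice for <xsl:value-of select=\"customer\"/></fo:block>"
            + "<fo:block>Total: <xsl:value-of select=\"total\"/></fo:block>"
            + "</fo:flow>"
            + "</fo:page-sequence>"
            + "</fo:root>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    /** Applies the XSLT to the format-independent XML and returns the XSL-FO document. */
    static String transform(String pureXml, String xslt) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter fo = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(pureXml)), new StreamResult(fo));
        return fo.toString();
    }
}
```

The Java side only ever knows about the data XML and "some stylesheet", which is what lets the layout change without code changes.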
Update: I should also point out that I give my boss access to our SVN repository with TortoiseSVN, which is easy enough to use that he can use it without error. So he can check the XSLT files straight into SVN and run the build/deploy without even having to interrupt my work. Obviously that workflow only works with people who are exact enough not to make mistakes, but it works out well for us.
Based on my experience, I would suggest considering the following Java PDF libraries for creating PDF reports:
DynamicReports
Apache PDFBox
iText PDF
PDF Clown
For your requirements, I think DynamicReports would be the right choice. I have been using DynamicReports for the last 3 years for all my PDF reporting requirements. With very little code, you can easily create a truly dynamic PDF. DynamicReports is a wrapper around JasperReports, so it uses JasperReports internally.
Docmosis allows you to create templates in Word or OpenOffice writer - separating concerns nicely and layout is then in the most familiar tools.
I have been using JODConverter for a while and I really like it.
What we do is use JODReports to generate dynamic OpenOffice.org documents (which internally uses FreeMarker). Then we convert these documents to PDF documents using JODConverter.
It sounds like a lot of work, but it really isn't.
One possibility is
to create your documents in PostScript format and then
convert them to PDF using Ghostscript (ps2pdf)
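A minimal sketch of that two-step approach, assuming Ghostscript's `ps2pdf` script is installed and on the `PATH`. The class name and the simple one-page layout (Helvetica, 12 pt, fixed line spacing, ASCII text, no page breaking) are illustrative assumptions; the PostScript operators used (`findfont`, `scalefont`, `setfont`, `moveto`, `show`, `showpage`) are standard.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

final class PsReport {

    /** Wraps text in a PostScript string literal, escaping \, ( and ). */
    static String psString(String s) {
        return "(" + s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")";
    }

    /** One page of Helvetica text lines; this sketch does no page breaking. */
    static String toPostScript(List<String> lines) {
        StringBuilder ps = new StringBuilder("%!PS\n");
        ps.append("/Helvetica findfont 12 scalefont setfont\n");
        int y = 750;                                  // y grows upward from the page bottom
        for (String line : lines) {
            ps.append("72 ").append(y).append(" moveto ").append(psString(line)).append(" show\n");
            y -= 16;                                  // 16 pt line spacing
        }
        ps.append("showpage\n");
        return ps.toString();
    }

    /** Runs Ghostscript's ps2pdf script: ps2pdf input.ps output.pdf */
    static void ps2pdf(Path ps, Path pdf) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("ps2pdf", ps.toString(), pdf.toString()).inheritIO().start();
        if (p.waitFor() != 0) {
            throw new IOException("ps2pdf failed for " + ps);
        }
    }
}
```

Writing `toPostScript(...)` to a `.ps` file and then calling `ps2pdf(...)` gives the PDF; the trade-off is that all layout (line breaks, pagination, tables) is your own code.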

Convert PDF to Word in Java [closed]

Closed 7 years ago.
Is it possible to convert PDF to Word in Java? I'm not talking about parsing a PDF document and then custom render it again to Word. I want a Java library that can directly convert it.
Reading PDF documents is a very involved process and there are no good free libraries for extracting non-text information from PDF documents in Java. Worse yet, PDF documents have a lot of layout information that is hard to reconstruct, for example a table in a Word document becomes some lines and a bunch of pieces of text in PDF.
It is almost impossible to recreate semantic information from an arbitrary PDF. If you have the same tool that wrote it you have somewhat more chance but even so there is much uncertainty. The only thing you can be sure of in a (text) PDF is the position of each character on the page. (Note that some PDFs include bitmaps in which textual information occurs and that has to rely on OCR).
There are several groups in computer science departments and elsewhere spending very significant effort trying to get semantic information. We collaborate with Penn State (one of the leaders), and they are working on extracting tables. In good cases they get 90%, in bad ones 50%.
So the answer is formally that you cannot, but you may occasionally be fortunate. (We do a lot of this for chemistry and count ourselves lucky if we get 50% on a regular basis).
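To make the point about character positions concrete, here is a minimal, hand-assembled one-page PDF built with nothing but the JDK (the class name is illustrative, and only ASCII text without parentheses or backslashes is supported). The page's entire content is the stream `BT /F1 12 Tf 72 720 Td (...) Tj ET`: select a font, move to the point (72, 720), paint a string. Nothing records whether that string is a heading, a paragraph or a table cell, which is exactly what a PDF-to-Word converter would have to guess.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class TinyPdf {

    /** Builds a one-page PDF; text must be ASCII without '(', ')' or '\'. */
    static byte[] build(String text) {
        // The page "content": font F1 at 12 pt, move to x=72, y=720 (points), show the string.
        String content = "BT /F1 12 Tf 72 720 Td (" + text + ") Tj ET";
        String[] objects = {
            "<< /Type /Catalog /Pages 2 0 R >>",
            "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
            "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792]"
                + " /Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
            "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
            "<< /Length " + content.length() + " >>\nstream\n" + content + "\nendstream"
        };

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<Integer> offsets = new ArrayList<>();
        write(out, "%PDF-1.4\n");
        for (int i = 0; i < objects.length; i++) {
            offsets.add(out.size());                  // byte offset recorded for the xref table
            write(out, (i + 1) + " 0 obj\n" + objects[i] + "\nendobj\n");
        }

        // Cross-reference table: fixed 20-byte entries pointing at each object.
        int xrefOffset = out.size();
        StringBuilder tail = new StringBuilder();
        tail.append("xref\n0 ").append(objects.length + 1).append('\n');
        tail.append("0000000000 65535 f \n");
        for (int offset : offsets) {
            tail.append(String.format(Locale.ROOT, "%010d 00000 n \n", offset));
        }
        tail.append("trailer\n<< /Size ").append(objects.length + 1).append(" /Root 1 0 R >>\n");
        tail.append("startxref\n").append(xrefOffset).append("\n%%EOF\n");
        write(out, tail.toString());
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        out.writeBytes(s.getBytes(StandardCharsets.US_ASCII));   // Java 11+
    }
}
```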
You can try to do it with the iText library. Read the PDF and then write it as an RTF.
This is not that simple, though, as you have to preserve the different styles the PDF has.
You can use some external tools.
Install a free program like "Free PDF to Doc" and execute it from your Java program.
This works fine in most cases.
Use the Acrobat Pro SDK from your Java code.
Best of luck

PDF Text Extraction Approach Using OCR [closed]

Closed 7 years ago.
Has anybody attempted to extract text from a PDF using an OCR library and Java? What did you find to be the most reliable library for text extraction? Most of the approaches I've seen (Tesseract, GOCR) are C libraries that would require some JNI code to be written.
I'm familiar with PDFBox, which is now an Apache incubator project at version 0.8.x, but its text extraction isn't always accurate. I'm looking for an alternative approach that is somewhat more reliable.
I haven't tried Asprise JavaPDF yet (I'm in the process of trying it), but I wanted to know more about the OCR approach (if it's possible).
Any help would be appreciated.
If you have a text-based PDF, I'd strongly recommend PDFTextStream. It's not free, but licensing is reasonable, and it is much much better than PDFBox. PDFBox chokes on many PDF files which are generated by newer tools, and is not too consistent about PDFs it can handle. PDFTextStream handles any PDF I throw at it, including PDFs with embedded PNG images, which PDFBox can not do.
If you heckle the PDFTextStream folks to add OCR, they may listen up.
We use ABBYY FineReader Engine 11. They have a Java wrapper.
Pros:
It works great with all the languages (English, Russian, Uzbek, etc.) and does real OCR (even for a PDF without a text layer, it renders the pages first and then OCRs them).
Cons:
It costs. You have to buy developer license and end-user license.
And it is EXTREMELY slow.
If you want to OCR a PDF, you may have to convert its pages to images first.
You can use Java wrappers of Tesseract - tesjeract or Tess4J - to perform OCR. However, for PDF, you'll need to convert to image (PNG or TIFF) first before feeding it to the OCR engine.
VietOCR calls Tesseract executable to perform the text extraction. It uses GhostScript to do PDF-to-image conversion.
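A rough sketch of the same Ghostscript-then-Tesseract pipeline driven from Java through the command-line tools, which avoids writing JNI code. It assumes `gs` and `tesseract` (with the needed language data) are installed and on the `PATH`; the class and method names are illustrative, and this is not VietOCR's actual code. The flags are standard Ghostscript options (`-sDEVICE=png16m` for 24-bit PNG, `-r300` for 300 dpi, `%03d` in the output name for the page number) plus Tesseract's basic `tesseract <image> <outputbase> -l <lang>` form, which writes `<outputbase>.txt`.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

final class PdfOcrCommands {

    /** Ghostscript: render every page to a 300 dpi PNG named page-001.png, page-002.png, ... */
    static List<String> rasterize(Path pdf, Path outDir) {
        return List.of("gs", "-dSAFER", "-dBATCH", "-dNOPAUSE",
                "-sDEVICE=png16m", "-r300",
                "-sOutputFile=" + outDir.resolve("page-%03d.png"),
                pdf.toString());
    }

    /** Tesseract: OCR one page image; the text lands in <image name without .png>.txt */
    static List<String> ocr(Path png, String language) {
        String image = png.toString();
        String base = image.endsWith(".png") ? image.substring(0, image.length() - 4) : image;
        return List.of("tesseract", image, base, "-l", language);
    }

    /** Runs one external command, passing its output through, and checks the exit status. */
    static void run(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).inheritIO().start();
        if (p.waitFor() != 0) {
            throw new IOException("Command failed: " + String.join(" ", command));
        }
    }
}
```

A caller would `run(rasterize(...))`, list the generated `page-*.png` files, `run(ocr(page, "eng"))` for each, and read back the `.txt` files.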
