PDF Generation Library for Java [closed]


Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
I know this has been asked before, but I'm still undecided on which PDF generation framework to use for my current project.
My requirements
on-the-fly generation of PDF documents (mainly order forms, invoices)
Java based
easy to layout
should be open source
easy to change layout
A lot of people seem to use iText, but I have some concerns (apart from the changed licence) regarding separation of concerns: In an HTML context there's good MVC support, where I usually stick to Spring MVC and FreeMarker to separate logic and layout. I'm a little bit worried that with iText you end up mixing code and layout a lot.
I am aware that Apache FOP could be a solution here, but then again I find XSLT tedious to work with, and I read that FOP can be slow when it comes to high throughput of many documents?
I also considered JasperReports, but from my understanding this is more suited for reports containing tabular datasets rather than single documents such as invoices which require a lot of layout formatting?
Any thoughts on this?

Give JasperReports a try. Use iReport to create the .jrxml files. JasperReports can handle complex layouts. For those parts of the report that are based on different queries, have a look at embedding subreports in the main report.
Just like @Adrian Smith's solution, this approach separates the report layout editing from the data sourcing.

I have implemented a good solution where my software creates a format-independent "pure" XML file, then I give my boss the XSD and he puts it into Altova StyleVision where he can WYSIWYG design reports based on data he plucks out from the XSD. That software produces an XSLT. So my program:
Produces the format-independent "pure" XML
Transforms it with the XSLT, the output of which is XSL-FO
Uses Apache FOP to convert the XSL-FO into PDF
This is a really great solution: I (as a programmer) no longer have to change my code each time my boss wants to change a color in the report; my job is simply to produce "pure" XML.
Update: I should also point out that I give my boss access to our SVN repository with TortoiseSVN, which is easy enough to use that he can use it without errors. So he can check the XSLT files straight into SVN and run the build/deploy without even having to interrupt my work. Obviously that workflow only works with people who are precise enough not to make mistakes, but it works out well for us in this case.
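Steps 1 and 2 of this pipeline need nothing beyond the JDK, which ships an XSLT 1.0 processor in `javax.xml.transform`. The following is a minimal sketch of the idea, not the answerer's code: the `<invoice>`/`<item>` data format and the inline stylesheet are hypothetical stand-ins for the "pure" XML and the StyleVision-generated XSLT, and step 3 (rendering the XSL-FO with Apache FOP) is only indicated by a comment.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class InvoiceToFo {

    // Hypothetical stylesheet standing in for the designer-produced XSLT:
    // it maps <invoice>/<item> data onto a minimal one-page XSL-FO document.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + "    xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + " <xsl:template match='/invoice'>"
        + "  <fo:root>"
        + "   <fo:layout-master-set>"
        + "    <fo:simple-page-master master-name='A4' page-height='29.7cm' page-width='21cm' margin='2cm'>"
        + "     <fo:region-body/>"
        + "    </fo:simple-page-master>"
        + "   </fo:layout-master-set>"
        + "   <fo:page-sequence master-reference='A4'>"
        + "    <fo:flow flow-name='xsl-region-body'>"
        + "     <fo:block font-size='16pt'>Invoice <xsl:value-of select='@number'/></fo:block>"
        + "     <xsl:for-each select='item'><fo:block><xsl:value-of select='.'/></fo:block></xsl:for-each>"
        + "    </fo:flow>"
        + "   </fo:page-sequence>"
        + "  </fo:root>"
        + " </xsl:template>"
        + "</xsl:stylesheet>";

    /** Step 2: turns layout-free data XML into XSL-FO with the JDK's built-in XSLT processor. */
    static String toXslFo(String pureXml, String xslt) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter fo = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(pureXml)), new StreamResult(fo));
        return fo.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Step 1: the application emits only data, with no layout information.
        String xml = "<invoice number='42'><item>Widget x 2</item><item>Gadget x 1</item></invoice>";
        System.out.println(toXslFo(xml, XSLT));
        // Step 3 (not shown): pass this XSL-FO to Apache FOP to render the PDF.
    }
}
```

Changing the look of the invoice then means editing only the stylesheet, which is the separation of layout from code that the question asks about.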

Based on my experience, I would suggest considering the following Java PDF libraries for creating PDF reports:
DynamicReports
Apache PDFBox
iText PDF
PDF Clown
For your requirements, I think DynamicReports would be the right choice. I have been using DynamicReports for the last 3 years for all my PDF reporting requirements. With very little code, you can easily create a truly dynamic PDF. DynamicReports is a wrapper around JasperReports, so it uses JasperReports internally.

Docmosis allows you to create templates in Word or OpenOffice writer - separating concerns nicely and layout is then in the most familiar tools.

I have been using JODConverter for a while and I really like it.
What we do is use JODReports to generate dynamic OpenOffice.org documents (which internally uses FreeMarker). Then we convert these documents to PDF documents using JODConverter.
It sounds like a lot of work, but it really isn't.

One possibility is
to create your documents in PostScript format and then
convert them to PDF using Ghostscript (ps2pdf)
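A minimal sketch of that approach, assuming Ghostscript's `ps2pdf` script is installed and on the `PATH`. The class name, the layout (Helvetica 12 pt, 72 pt left margin, 16 pt line spacing) and the sample invoice lines are illustrative choices, and only ASCII text is handled.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class PostScriptInvoice {

    /** Wraps text in a PostScript string literal, escaping \ ( and ). ASCII input assumed. */
    static String psString(String s) {
        return "(" + s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")";
    }

    /** Builds a one-page PostScript program that draws each line of text. */
    static String build(List<String> lines) {
        StringBuilder ps = new StringBuilder("%!PS\n");
        ps.append("/Helvetica findfont 12 scalefont setfont\n");
        int y = 800; // the origin is bottom-left; 800 pt is near the top of an A4 page (842 pt)
        for (String line : lines) {
            ps.append("72 ").append(y).append(" moveto ").append(psString(line)).append(" show\n");
            y -= 16; // 16 pt line spacing
        }
        return ps.append("showpage\n").toString();
    }

    /** Writes the .ps file and converts it with ps2pdf (assumed to be on the PATH). */
    static void toPdf(String ps, Path psFile, Path pdfFile) throws IOException, InterruptedException {
        Files.writeString(psFile, ps, StandardCharsets.US_ASCII); // throws on non-ASCII text
        Process p = new ProcessBuilder("ps2pdf", psFile.toString(), pdfFile.toString())
                .inheritIO().start();
        if (p.waitFor() != 0) throw new IOException("ps2pdf exited with an error");
    }

    public static void main(String[] args) throws Exception {
        String ps = build(List.of("Invoice 42", "Widget x 2    19.98", "Gadget x 1     4.50"));
        toPdf(ps, Path.of("invoice.ps"), Path.of("invoice.pdf"));
    }
}
```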

Related

Create a pdf for J2EE applications [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
Could anyone please tell me a way to create a PDF document in a J2EE application other than iText?
We previously used iText, but the problem is that the HTML (generated from a JSP) displays differently from the generated PDF. So I need another way to create a PDF that looks the same as the JSP display.
Can anyone please suggest libraries other than iText?
Thanks in advance.

You are probably using iText 5 and XML Worker. Have you tried iText 7 and pdfHTML? See the HTML to PDF tutorial.
You will need:
iText 7: https://github.com/itext/itext7
the pdfHTML add-on: https://github.com/itext/i7j-pdfhtml
You claim:
the problem is the html file (which is generated from Jsp) display is different with the generated PDF.
That is certainly true when you use HTMLWorker (which you shouldn't) and it's true in many cases for XML Worker. But we rewrote iText from scratch because of the mismatch between the old iText architecture and the requirements when converting HTML to PDF.
If you have a problem with the HTML to PDF conversion, please explain the problem in a question and tag that question as an iText question. If we can improve iText 7 + pdfHTML, why wouldn't we do that?

This is always tricky because HTML and PDF have differing purposes (with a lot of overlap). That means "simply" converting between the two is sometimes not going to work well.
You can
Snapshot an image of the HTML and PDF the image. This has various downsides (can't search / extract text easily, larger, poor zoom, pagination) but is simple if it doesn't conflict with your requirements.
Use a PDF system (like iText) to construct the PDF as desired using the same data. This is obviously more work (possibly a lot more), but is the optimal result in terms of PDF quality and fitness for purpose.
Simplify/adjust your HTML so it converts better into PDF. This depends on what HTML tools/libraries you are using - you might not have much control over the HTML.
Try various other conversion libraries to see if you find a tool that works better for your HTML.

Apache POI or docx4j for dealing with docx documents [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Which do you think is better for reading a docx document as Java objects, and why?
In other words, which library supports most of the Word tags?

Disclosure: I lead the docx4j project
Although docx4j can also handle pptx and xlsx, it is mostly used for docx manipulation. By way of illustration, as at the time of writing, there are nearly 1000 topics in the docx4j forum. The pptx forum has only 10% of the volume.
Whatever you want to do with the docx document, docx4j ought to be able to help you. There's a single page overview of a generic workflow.
For many common requirements, docx4j provides higher level API. These include:
Create/open/save docx (of course)
Report/document generation, using a variety of approaches: (i) variable substitution, (ii) XML data binding (particularly strong), and (iii) Mailmerge
Export as HTML, XHTML
Export as PDF (with font support)
For anything else, you can manipulate the JAXB representation of the docx to your heart's content. JAXB is a Java community standard, included in Java 6, and with a strong alternative implementation in EclipseLink's MOXy. (POI uses XML Beans instead of JAXB)
There's a web app to help you explore a docx, and generate Java code to create corresponding Java objects.
Of course, if there is some specific task you have in mind, it may be that docx4j or POI has a particular strength there.
Both docx4j and POI are ASL v2 licensed.
docx4j is actively maintained; its source code is on GitHub.
In addition, commercial support is available for docx4j if you want it, as are several commercial extensions, e.g. MergeDocx.
docx4j does rely on POI as a library for its implementation of the OLE 2 Compound Document format, which we're grateful for.

I think Apache POI's main focus is on spreadsheets, though it has features for reading Word documents, and it uses XMLBeans to do so.
Docx4j mainly deals with docx documents using JAXB. JAXB allows XML-to-Java-object conversion, hence I think docx4j would be preferable in your case.

If you are dealing with docx documents, docx4j is more convenient than Apache POI.
You can use the following links to learn the basics of docx4j. There is also a nice docx4j forum.
1. http://blog.iprofs.nl/2012/09/06/creating-word-documents-with-docx4j/
2. http://www.smartjava.org/content/create-complex-word-docx-documents-programatically-docx4j?

I tried Apache POI, but the problem is that when printing anything from a docx file (e.g. printing all "Heading1" elements), a lot of bad data and whitespace gets printed. Docx4j avoids this bad data; I tried it.
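For context on what both libraries wrap: a .docx file is a ZIP package whose main part is normally `word/document.xml`. In it, each paragraph is a `<w:p>` element, its style is referenced by `<w:pPr><w:pStyle w:val="..."/>`, and its visible text lives in `<w:t>` elements. The following JDK-only sketch (the class name `DocxHeadings` is hypothetical) collects the text of paragraphs with a given style ID; joining only `<w:t>` content is one way to avoid picking up whitespace from the markup. It assumes the conventional part name and the English style ID `Heading1`, and it ignores text boxes, tracked changes, tabs and breaks, which docx4j or POI would handle for you.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class DocxHeadings {
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of every paragraph whose style ID equals styleId, e.g. "Heading1". */
    static List<String> paragraphsWithStyle(InputStream docx, String styleId)
            throws IOException, SAXException, ParserConfigurationException {
        byte[] xml = null;
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Assumption: conventional part name (strictly, _rels/.rels says where it is).
                if (entry.getName().equals("word/document.xml")) {
                    xml = zip.readAllBytes(); // reads only the current entry
                    break;
                }
            }
        }
        if (xml == null) throw new IOException("word/document.xml not found");

        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        // Refuse DTDs: a cheap guard against XXE when the file comes from outside.
        f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = f.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        List<String> result = new ArrayList<>();
        NodeList paras = doc.getElementsByTagNameNS(W, "p");
        for (int i = 0; i < paras.getLength(); i++) {
            Element p = (Element) paras.item(i);
            NodeList styles = p.getElementsByTagNameNS(W, "pStyle");
            if (styles.getLength() == 0) continue;
            String style = ((Element) styles.item(0)).getAttributeNS(W, "val");
            if (!styleId.equals(style)) continue;
            // Join only <w:t> text, so whitespace between XML elements is never picked up.
            StringBuilder text = new StringBuilder();
            NodeList texts = p.getElementsByTagNameNS(W, "t");
            for (int j = 0; j < texts.getLength(); j++) text.append(texts.item(j).getTextContent());
            result.add(text.toString());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            paragraphsWithStyle(in, "Heading1").forEach(System.out::println);
        }
    }
}
```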

Is Sun Xml doclet still available? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
I am looking for a doclet that can generate javadoc in xml format instead of the default html.
After some search, I found there was a Sun XML doclet, previously located at http://www.sun.com/xml/developers/doclet/
However, this link no longer works. Does anyone have a copy of the Sun XML doclet, or know of any alternative XML doclet?

Finally found an alternative xml doclet from http://jeldoclet.sourceforge.net/
It is very lightweight and powerful :)

I am not sure if this is what you are after, but there's a project on SourceForge called XHTML Doclet. It's supposed to produce XHTML javadoc output instead of the traditional HTML. Anyway, I doubt it has anything to do with Sun's implementation.
You didn't mention XHTML in your question, so I am not convinced XHTML is the XML you are looking for.

Although this question looks like an obscure one, I believe it is in fact quite important. (The problem is that the question itself was asked somewhat incorrectly.)
So, you are (or were) looking for an XML doclet... But what XML are you talking about?
For instance, XHTML is actually "XML" and an XHTML-generating doclet would be about the same as the standard one (not sure if the Standard Doclet already generates XHTML).
A doclet generating DITA output (a DITA-doclet) would be also an "XML doclet" at the same time, because DITA is XML too!
The same could be said about lots of other possible doclets generating formats such as:
XSL-FO, SVG, Microsoft Office XML, Office Open XML and so on.
In fact, a certain XML vocabulary can be created for about anything. That's why "XML" is eXtensible Markup Language!
All those "XML doclets" would be completely different from each other and quite heavyweight: if you consider how much work the XHTML doclet (aka the Standard Doclet) is, why would other XML doclets (e.g. a DITA doclet) be any simpler?
So you may guess why there are not many pure "XML doclets" now (and those that once existed were abandoned long ago). I think that's simply because people eventually realized that it is impossible to create a single doclet (or documentation generator, for that matter) for pretty much anything, just as you cannot develop a program that does anything you wish.
So why, then, don't I think the whole question (about the XML doclet) is senseless?
Because what you are essentially looking for is not a universal "XML doclet" as such; that thing cannot exist! Rather, you need a tool that makes it easy to develop a custom XML doclet for your particular XML vocabulary. In that form, I believe, the question is entirely legitimate!
What is that tool? A general programming language (Java itself) would be that "tool" of course. But the task actually isn't that wide. A more focused thing can exist that would already automate lots of operations common to the documentation generation in general (and to Javadoc in particular).
Such a tool does exist! (and quite long ago at that)
It is called DocFlex/Javadoc: http://www.filigris.com/products/docflex_javadoc/
In DocFlex/Javadoc, the actual doclets are programmed in the form of special templates using a graphical Template Designer. Those templates are then interpreted by the Template Interpreter, which is wrapped as a Javadoc doclet. The templates themselves have some analogy to XSLT scripts (though they are not based on XSLT).
That template system helps to automate lots of routine tasks and will allow you to concentrate more on the data processing itself and the design of the result output.
Although DocFlex/Javadoc currently focuses more on generating HTML and RTF, it can generate any XML markup as well: its third supported output format is plain text (TXT), and XML files are plain-text files.
Any XML tags can be specified in the templates to be emitted as part of the TXT output.
So you will get XML as a result, and any XML at that: exactly what you have programmed the templates to produce.
Basically, it works the same as an XSLT script, which converts some XML file(s) into another XML output. The difference is that the data source here is not XML files but the Doclet API!
In fact, DocFlex/Javadoc itself is an offshoot of a much bigger project. Another offshoot is an XML Schema documentation generator, which appears to become quite popular here, e.g. see: How to convert xsd to human readable documentation?
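To illustrate the point that Java itself can serve as the tool for a custom XML doclet, here is a minimal sketch against the standard `jdk.javadoc.doclet` API (JDK 9 and later), run in-process through `javax.tools.DocumentationTool`. It is not related to DocFlex or the old Sun doclet: the XML vocabulary (`<javadoc>`, `<class>`, `<method>`, `<comment>`) is invented for this example, and collecting the output in a static buffer instead of writing a file is a simplification.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.ExecutableElement;
import javax.lang.model.element.TypeElement;
import javax.lang.model.util.ElementFilter;
import javax.tools.DocumentationTool;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;
import jdk.javadoc.doclet.Doclet;
import jdk.javadoc.doclet.DocletEnvironment;
import jdk.javadoc.doclet.Reporter;

/** A tiny doclet that emits its own (hypothetical) XML vocabulary instead of HTML. */
public class XmlDoclet implements Doclet {
    // Simplification: output is collected here instead of being written to a file.
    static final StringBuilder OUT = new StringBuilder();

    @Override public void init(Locale locale, Reporter reporter) { }
    @Override public String getName() { return "XmlDoclet"; }
    @Override public Set<? extends Option> getSupportedOptions() { return Set.of(); }
    @Override public SourceVersion getSupportedSourceVersion() { return SourceVersion.latest(); }

    @Override
    public boolean run(DocletEnvironment env) {
        OUT.append("<javadoc>\n");
        for (TypeElement type : ElementFilter.typesIn(env.getIncludedElements())) {
            OUT.append("  <class name=\"").append(esc(type.getQualifiedName().toString())).append("\">\n");
            comment(env, type, "    ");
            for (ExecutableElement m : ElementFilter.methodsIn(type.getEnclosedElements())) {
                if (!env.isIncluded(m)) continue; // honours javadoc's access filter (default: protected)
                OUT.append("    <method name=\"").append(esc(m.getSimpleName().toString())).append("\">\n");
                comment(env, m, "      ");
                OUT.append("    </method>\n");
            }
            OUT.append("  </class>\n");
        }
        OUT.append("</javadoc>\n");
        return true;
    }

    private static void comment(DocletEnvironment env, Element e, String indent) {
        String doc = env.getElementUtils().getDocComment(e); // raw comment text, or null
        if (doc != null) {
            OUT.append(indent).append("<comment>").append(esc(doc.trim())).append("</comment>\n");
        }
    }

    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Runs javadoc in-process on the given sources with this doclet and returns the XML. */
    static String document(List<File> sources) {
        OUT.setLength(0);
        DocumentationTool tool = ToolProvider.getSystemDocumentationTool(); // null on a bare JRE
        try (StandardJavaFileManager fm = tool.getStandardFileManager(null, null, null)) {
            Iterable<? extends JavaFileObject> units = fm.getJavaFileObjectsFromFiles(sources);
            if (!tool.getTask(null, fm, null, XmlDoclet.class, null, units).call()) {
                throw new IllegalStateException("javadoc reported errors");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return OUT.toString();
    }

    public static void main(String[] args) {
        System.out.print(document(List.of(new File(args[0]))));
    }
}
```

From the javadoc command line, the same class would instead be passed with `-doclet XmlDoclet -docletpath <dir>`.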

Looking for a "Universal" Document viewer component/library [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
I am looking for an applet with functionality similar to the Oracle/Stellent OutsideIn ActiveX control or the Autonomy KeyView technology, which act as browser plug-ins allowing the rendering/display of a large number of file formats (word processing, spreadsheet, graphics, etc.). I currently use the Stellent solution, but due to restrictions at some of our clients I would prefer something that exists as a Java applet or Silverlight control, or that has a Java API I could build an applet on top of (neither of the two I mentioned does).
At a bare minimum it would need to display at least the following formats:
MS Word, Excel, PowerPoint
MS Outlook MSG files
Adobe PDF
Standard image formats: BMP, PNG, JPEG, TIFF
WordPerfect
HTML
Any suggestions?

If a commercial product is an option, ViewOne is a nice product. It's an applet, and you can view a large variety of documents with it.

It's not a plugin, but Multivalent is a Java library and browser for a large number of document formats, though probably not all the ones you'd like to cover.
It does at least cover the PDF, HTML, and any reasonable image format, but not any of the proprietary formats.

If you are looking for a pure Java component that supports all these formats, I'm pretty confident that it doesn't exist. If what you want is to embed a browser, MS Office, Acrobat etc., you will need an ActiveX container.
Here are some choices:
JDIC - if you are using Swing (see the Document Viewer demo.)
SWT ActiveX container - if you are using SWT
TeamDev WinPack - if your time is more valuable than your money ;-) The product is very polished, the price is reasonable and the support is excellent.
Note that with any of these you need to have installed Acrobat, MS Office (or the free doc viewers) and whatever other applications you need to edit the file formats.

Have you looked at the Adeptol AJAX Document Viewer?
It is a no-plugin, non-applet, no-install viewer that supports more than 300 file types.
See ajaxdocumentviewer.com

You may be interested in Net-it Central. It uses an ActiveX plugin or a Java applet and works with several different formats. I am currently using it for Word and Excel.

PDF Text Extraction Approach Using OCR [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Has anybody attempted to extract text from a PDF using an OCR library and Java? What did you find to be the most reliable library for text extraction? Most of the approaches I've seen (Tesseract, GOCR) are C libraries that would require some JNI code to be written.
I'm familiar with PDFBox, which is now an Apache Incubator project at version 0.8.x, but its text extraction isn't always accurate. I'm looking for an alternative approach that is somewhat more reliable.
I haven't tried Asprise JavaPDF yet (I'm in the process of trying it), but I wanted to know more about the OCR approach (if it's possible).
Any help would be appreciated.

If you have a text-based PDF, I'd strongly recommend PDFTextStream. It's not free, but licensing is reasonable, and it is much better than PDFBox. PDFBox chokes on many PDF files generated by newer tools and is not very consistent about which PDFs it can handle. PDFTextStream handles any PDF I throw at it, including PDFs with embedded PNG images, which PDFBox cannot do.
If you heckle the PDFTextStream folks to add OCR, they may listen up.

We use ABBYY FineReader Engine 11. It has a Java wrapper.
Pros:
It works great with all languages (English, Russian, Uzbek, etc.) and does real OCR (even for a PDF without a text layer, it first renders the pages and then runs OCR on them).
Cons:
It costs. You have to buy developer license and end-user license.
And it is EXTREMELY slow.

If you want to extract text from a PDF using OCR, you may have to convert it to an image first.

You can use Java wrappers of Tesseract - tesjeract or Tess4J - to perform OCR. However, for PDF, you'll need to convert to image (PNG or TIFF) first before feeding it to the OCR engine.
VietOCR calls Tesseract executable to perform the text extraction. It uses GhostScript to do PDF-to-image conversion.
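A minimal sketch of the same render-then-OCR pipeline from Java, driving the command-line tools rather than a wrapper library. It assumes `gs` (Ghostscript; on Windows the executable is typically `gswin64c`) and a Tesseract version that accepts `stdout` as the output base are on the `PATH`, and that the `eng` language data is installed. The 300 dpi resolution, the `page-%03d.png` naming and the class name `PdfOcr` are illustrative choices.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PdfOcr {

    /** Ghostscript command rendering every page of the PDF to page-001.png, page-002.png, ... */
    static List<String> ghostscriptCommand(Path pdf, Path outDir, int dpi) {
        return List.of("gs", "-dSAFER", "-dBATCH", "-dNOPAUSE",
                "-sDEVICE=png16m", "-r" + dpi,
                "-sOutputFile=" + outDir.resolve("page-%03d.png"),
                pdf.toString());
    }

    /** Tesseract command that OCRs one image and writes the recognized text to stdout. */
    static List<String> tesseractCommand(Path image, String lang) {
        return List.of("tesseract", image.toString(), "stdout", "-l", lang);
    }

    /** Runs a command and returns its standard output; stderr goes to our console. */
    static String run(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).redirectError(ProcessBuilder.Redirect.INHERIT).start();
        String out = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        if (p.waitFor() != 0) throw new IOException("command failed: " + cmd);
        return out;
    }

    /** Renders the PDF to images with Ghostscript, then OCRs each page with Tesseract. */
    static String ocr(Path pdf, String lang) throws IOException, InterruptedException {
        Path dir = Files.createTempDirectory("ocr-pages"); // left in place for simplicity
        run(ghostscriptCommand(pdf, dir, 300));
        List<Path> pages;
        try (Stream<Path> files = Files.list(dir)) {
            // Zero-padded names make lexicographic order equal page order (up to 999 pages).
            pages = files.filter(f -> f.toString().endsWith(".png")).sorted().collect(Collectors.toList());
        }
        StringBuilder text = new StringBuilder();
        for (Path page : pages) {
            text.append(run(tesseractCommand(page, lang)));
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(ocr(Path.of(args[0]), "eng"));
    }
}
```

Tess4J wraps the same engine in-process, which would replace the second external command with a library call.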
