Common Java-based PDF construction libraries/tools? - java

I'm looking for two or three of the most widely used libraries for the Java platform for creating PDFs on the fly.
The one requirement I'm focusing on is the ability to control specific formatting such as page layout, font sizes and typefaces (this will be a dynamically created legal document with frustratingly specific type standards).
I'm not actually going to be the one implementing this (I'm not a Java developer), but I'm trying to get the ball rolling and need to pass along some options for our dev team to start investigating.
I'm investigating iText at the moment, which seems to be a well-established option. I'm not yet sure how robust/flexible the templating abilities are, though.
EDIT: I just realized that there's probably no one 'right' answer for this question, so maybe this is better as part of the Wiki.

iText is probably the best all around free tool.
PDFLib is another choice if you are willing to pay for the license. It has a few more features and has a native implementation backing the Java API.
There is always FOP (from Apache) if you are willing to deal with XSLT and XSL-FO, but I believe they haven't updated those engines in a while.
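To give a feel for the XSLT and XSL-FO route, here is a minimal sketch that uses only the JDK's built-in XSLT processor to turn a small XML document into XSL-FO with a fixed page size, typeface and font sizes. The `<contract>` input format, the class name and the stylesheet are all made up for illustration. Rendering the resulting FO to PDF is the part FOP would do, and that step is not shown here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XslFoSketch {

    // Hypothetical stylesheet: maps a made-up <contract> document onto XSL-FO,
    // fixing the page size, margins, typeface and font sizes in one place.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
      + "<xsl:output method='xml' indent='yes'/>"
      + "<xsl:template match='/contract'>"
      + "<fo:root>"
      + "<fo:layout-master-set>"
      + "<fo:simple-page-master master-name='legal'"
      + " page-width='8.5in' page-height='14in' margin='1in'>"
      + "<fo:region-body/>"
      + "</fo:simple-page-master>"
      + "</fo:layout-master-set>"
      + "<fo:page-sequence master-reference='legal'>"
      + "<fo:flow flow-name='xsl-region-body' font-family='Times' font-size='12pt'>"
      + "<fo:block font-size='14pt' font-weight='bold'>"
      + "<xsl:value-of select='title'/></fo:block>"
      + "<xsl:for-each select='clause'>"
      + "<fo:block space-before='6pt'><xsl:value-of select='.'/></fo:block>"
      + "</xsl:for-each>"
      + "</fo:flow>"
      + "</fo:page-sequence>"
      + "</fo:root>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet over the input XML and returns the XSL-FO as a string.
    static String toXslFo(String contractXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(contractXml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<contract><title>Lease Agreement</title>"
                   + "<clause>The tenant pays rent monthly.</clause></contract>";
        System.out.println(toXslFo(xml));
    }
}
```

The appeal of this approach for a document with strict type standards is that all layout decisions live in the stylesheet, so they can be changed without touching Java code.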

I agree that iText is a great tool. However, the current version of iText is not free if you intend to use it in a closed source project. See Wikipedia:
In the end of 2009, iText version 5 is released under Affero GPL license. This license is drastically different from the previous license that iText had been distributed under, in that it requires anyone using iText 5 under a free license to provide the users with the full source of their application. Projects that do not want to provide their source code are required to purchase a commercial license for a non-disclosed price or they cannot upgrade to iText 5.
However, you may still use iText 4 under the LGPL license.

Take a look at Apache FOP. Very powerful.

iText will probably serve most of your purposes. However, if you are looking to convert from RTF or DOC to PDF, you can use a Java plugin for open source tools like OpenOffice (openoffice.org).
Hope this is helpful,
R

iText is probably your #1 standard in this area. You might also consider JODReports or Docmosis since they can do template-based reporting using standard word processor documents as templates.

Have you considered http://jasperforge.org/?

Related

Java Server question: Cheap/Mid-Level Creating report documents with designer (PDF/HTML)

What are some ways to create PDF reports in a Java server environment without having to write Java code to do so, or with only minimal Java code?
We have used iText and various htmltopdf solutions. Those work, but they take a lot of Java code to create the documents, and you have to code the positioning of all the elements.
Is there a solution that has a designer tool? You design a report template with the designer and then deploy the template on a server?
We could pay for an enterprise solution.
You might be interested in JasperReports and iReport (which is sort of a designer IDE for JasperReports).
You can use JCopist to generate PDFs using FreeMarker templates rather than writing Java code. Another option, mostly suitable for JSF-based projects, is to use JBoss Seam's iText-based, template-driven PDF generation tools: http://docs.jboss.org/seam/1.1.1.GA/reference/en/html/itext.html
You don't say if you're prepared to pay for an enterprise solution. If you are, then Thunderhead may be an option. It provides the means for templates (as you've specified) and can generate documents off the back of these in a variety of formats. You can interface to it via a variety of means (JMS / WebService / COM - not sure about the COM, on reflection). It has ActiveX controls to allow users to edit templates (with appropriate and fine-grained permissioning) and the template editing resembles a Word-based editor. It's very powerful indeed.
You should look at Docmosis. It uses standard word or openoffice documents as templates - so you design your document in a normal word processor. Depending on what you want to do in your templates this can be an ideal way to build reports since most developers (and users) already know how to work with word processors. You can then have Docmosis manipulate the document, merge in data and produce various formats. Have a look at the demo on the website.

Any lib/API for filling in the fields in a Microsoft Word doc?

I have a document that needs to be filled in (it is a Microsoft Word doc), and I have no idea how to fill it in or integrate this with my current web apps.
Is there any good Java API/lib that could be used? Preferably a free one.
Here is an example of the doc that needs to be filled in:
http://drop.io/callmeblessed/asset/debt-agremeent-certificate-doc
Apache POI - the Java API for Microsoft Documents
If Leniel's suggestion doesn't work (I would suggest trying POI first, as well), there's the OpenOffice.org Java UNO API, which has a different implementation. It introduces a significant runtime dependency, but if POI doesn't cut it, it's the obvious second choice.
Docmosis can do this as long as you have control of the source document and can specify placeholders etc. It's free and makes use of OpenOffice to do the format conversions. The Docmosis engine can do document manipulation (population, repetition, deletion etc). Load balancing and scalability features are paid for though.

Word document creation API in Java

I would like to create a Word document using a template, replace some variables (fields) and save it as a new Word document.
I was thinking of using Apache POI (http://poi.apache.org/). Is it the best for this purpose?
Can you share your impressions of it?
I've worked with POI before and it's certainly able to generate Word documents. But the devil is in the details.
Word has thousands of features: You can put numbered lists starting at #13 with negative indents into two joined cells of a table included in another table that is itself part of a bullet list... you get the idea. When the POI documentation says they are a work in progress, that reflects what will probably be an eternal state of trying to catch up to the (to us, undocumented) specification of Word.
Documents with a reasonably "normal" set of used features are well supported by POI, whose interfaces and methods are reasonable and consistent but sometimes require a bit of work. But as Pascal says, documents with a not too exorbitant set of features are also supported by RTF.
I have almost no experience "doing" RTF but it's probably a bit simpler than working with POI.
If you're working in an environment or for a customer who insists that your produced documents be .DOC rather than .RTF, then POI is pretty much your only choice, unless you can introduce a step where you use a bit of Office automation to convert RTF into DOC.
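For a sense of how little is needed for a basic RTF file, here is a minimal sketch in plain Java. It only covers one font, one size and paragraph breaks; the class and method names are invented for illustration, and anything beyond this (tables, lists, styles) would need a proper look at the RTF specification.

```java
final class RtfSketch {

    // Escapes the three characters RTF treats specially, turns newlines into
    // paragraph breaks, and writes anything outside 7-bit ASCII as a Unicode
    // control word (signed 16-bit decimal) followed by a '?' fallback character.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '\\' || c == '{' || c == '}') {
                sb.append('\\').append(c);
            } else if (c == '\n') {
                sb.append("\\par ");
            } else if (c < 128) {
                sb.append(c);
            } else {
                sb.append("\\u").append((int) (short) c).append('?');
            }
        }
        return sb.toString();
    }

    // Builds a complete one-font RTF document around the given body text.
    static String document(String fontName, int pointSize, String body) {
        return "{\\rtf1\\ansi\\deff0"
             + "{\\fonttbl{\\f0 " + escape(fontName) + ";}}"
             + "\\f0\\fs" + (pointSize * 2) + " "   // font size is given in half-points
             + escape(body)
             + "\\par}";
    }

    public static void main(String[] args) {
        // Redirect the output to a file ending in .rtf to open it in a word processor.
        System.out.println(document("Times New Roman", 12,
                "Debt Agreement\nThe debtor owes the sum stated below."));
    }
}
```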
Update: I've had a couple more ideas in the meantime.
Using POI or creating RTF documents is something that you could do on practically any platform. At my job, all servers doing processing like this happen to be running Linux, for example.
However, in the likely case that your programs will run under Windows, there is another alternative: Jacob http://www.land-of-kain.de/docs/jacob/
Jacob is a COM interface for Java; it essentially allows you to "remote control" Windows programs such as Word and Excel. The document I linked to above is not to Jacob's own site but to a single page with "cookie cutter" recipes for using Jacob. The project itself is on SourceForge: http://sourceforge.net/projects/jacob-project/ But people claim, and rightly so, that the documentation is a bit lacking.
Jacob has the advantage over all other solutions that you're dealing with the "real" Word and therefore all capabilities of Word are available to you. This would be an alternative if there are detail aspects of your document that just can't be handled with POI or via the RTF format.
This is obviously way too late, but since 2013 there is a much better, more flexible solution for Word document creation.
http://www.docx4java.org/trac/docx4j
I have had much more luck with docx4j than I ever did with POI.
I'm not sure of the exact status of the Word document support in POI but, according to the POI website, work is still in progress (I can't say what this means exactly). So, at this time, I would not use POI but would rather try to generate an RTF document. For this, you could:
Use RTFTemplate, which is an RTF-to-RTF engine that generates an RTF document by merging an RTF model with data.
Use iText, which is primarily a PDF generator but can also generate RTF.
Build your own custom solution (but I wouldn't do that).
I'd go for iText.
If you use a template, and do not want to create the Word document from scratch, then as far as I know POI is a pretty good solution. You open the template and select the zones you want to replace.
They say POI is still in development, but I've been using it in a production environment and it works pretty well at the moment.
I know this question is a bit old, but I think many people still find this with search engines, so I post another possibility to do what you want right here:
If the one and only goal is to have a Word Template and to replace some values in it, you might consider saving a Word Template as single xml (not docx) and then processing it with simple Java and without any Framework. If you want to do more (e.g. create lists or tables) you might also consider understanding the xml format and writing your own helpers before loading a Framework like POI.
Here is an example on how to do that:
http://dev-notes.com/code.php?q=10
This is the fast version. If you want a nicer version, you could try using an XML processor.
PS: users might notice that the file extension is not doc but xml and they may blame you for that, but that's ok... just rename it to doc, Word will recognize the format and everyone is happy again ;)
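The approach above can be sketched in plain Java as follows. This is not the code from the linked page; the `${name}` placeholder syntax, the class name and the inline template string are assumptions made for illustration. In practice the template would be the text of the file saved from Word as a single XML file.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WordXmlTemplateSketch {

    // Assumed placeholder syntax: ${name}, where name is letters, digits or underscores.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    // Escapes characters that would otherwise break the surrounding XML markup.
    static String escapeXml(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;");
    }

    // Replaces each known placeholder in a single pass; unknown ones are left as-is.
    static String fill(String templateXml, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(templateXml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value == null) ? m.group() : escapeXml(value);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the contents of a template saved from Word as single XML.
        String template =
            "<w:p><w:r><w:t>Dear ${name}, you owe ${amount}.</w:t></w:r></w:p>";
        Map<String, String> values = new LinkedHashMap<>();
        values.put("name", "Smith & Sons");
        values.put("amount", "100 EUR");
        System.out.println(fill(template, values));
    }
}
```

One caveat: Word sometimes splits text that looks contiguous on screen across several runs in the XML, so it is worth checking that each placeholder ended up intact in the saved file.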
You should look into the Aspose.Words components. They have recently begun providing a Java version of the component.
See the following link: Aspose.Word for Java
This supports Word automation, creation and advanced features such as mail merging without the need for an instance of Microsoft Word on the machine. The real benefits are that you are able to work within the context of an actual word document and not having to compromise by creating RTFs etc.
The Java version is not currently as fully featured as the .Net version but the main core functionality is there and they are pushing very hard to have a feature equivalent version soon.
Also, if you purchase the Java version you get a year's free upgrades / support as the new releases are created.
If you are working with docx documents, docx4j is an option. Like POI, it's open source.
I created and use this: http://code.google.com/p/java2word

Open source alternative to DITA Open Toolkit

I'm working on a web app that will need to process DITA documents from persistent storage (likely a JCR). The DITA Open Toolkit is the only DITA implementation I'm aware of, but it requires all of your documents to exist on the filesystem. Ideally, I'd like something that works like the DITA OT, but allows you to provide a resolver (much like an XSLT URIResolver) to pull referenced content from other sources.
If people have other ideas, such as using a virtual filesystem to trick the DITA OT into working, I'd love to hear those too. Thanks!
Edit: I forgot to mention in the original post that I'm looking for an open-source solution, as this is for a project released under the Educational Community License.
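To make the resolver idea concrete, here is a rough sketch of what I have in mind, written against the standard `javax.xml.transform.URIResolver` interface. The in-memory map is only a stand-in for the real store (a JCR in my case), and the `repo:` URIs and class name are invented for the example. The DITA OT does not accept anything like this, as far as I know.

```java
import java.io.StringReader;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

final class RepositoryResolverSketch implements URIResolver {

    // Stand-in for the persistent store; keys are hypothetical absolute URIs.
    private final Map<String, String> store = new HashMap<>();

    void put(String uri, String xml) {
        store.put(uri, xml);
    }

    // Resolves href against base (if any) and looks the result up in the store.
    // Returning null tells the processor to fall back to its default resolution.
    @Override
    public Source resolve(String href, String base) {
        String key = (base == null || base.isEmpty())
                ? href
                : URI.create(base).resolve(href).toString();
        String xml = store.get(key);
        if (xml == null) {
            return null;
        }
        // The system ID lets further relative references resolve against this document.
        return new StreamSource(new StringReader(xml), key);
    }

    public static void main(String[] args) {
        RepositoryResolverSketch resolver = new RepositoryResolverSketch();
        resolver.put("repo:/topics/intro.dita", "<topic id=\"intro\"/>");
        Source source = resolver.resolve("intro.dita", "repo:/topics/map.ditamap");
        System.out.println(source == null ? "not found" : source.getSystemId());
    }
}
```

With plain XSLT, an object like this can be attached through `Transformer.setURIResolver`; what I'm after is a DITA processor that offers an equivalent hook.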
After some evaluation, the newest version of the XMLMind Dita Converter (ditac) is really up to the job. Performance is at least double that of the Open Toolkit for building identical projects: http://www.xmlmind.com/ditac/
One thing to note about XMLMind Dita Converter (ditac) is that it's released under the Mozilla Public License, which according to http://www.gnu.org/licenses/license-list.html#GPLIncompatibleLicenses is not compatible with GPL.
Look at Arbortext (specifically Arbortext Content Manager). Arbortext supports xinclude, catalog files, and it also has a production-ready PDF and digital media publishing tool that you don't get with the OTK. The OTK isn't really meant to be for production.
Yes, I'm a vendor (now), but I started as an implementer more than a decade ago. I answer a lot of community questions and sponsor two dozen resources for getting people's questions answered. The best of which is the SF Bay PTC Arbortext User Group (Virtual).
Are you looking to do something like what Juniper is doing? (I can only post one link, so it's going to be mine..) go to juniper dot net, choose support, technical documentation, ex-series platforms, any of the ex series docs. They're showing topics on the web directly (it's also inside the source code on the router and in the pdf books). It would help if I understood what you're trying to do.
Feel free to reach out to me offline.
This new set of DITA XProc pipelines on the EMC Developer Network might be worth looking into. It can be downloaded free for development (and there's an XProc engine there as well).
This package appeared at the end of October 2010.
Quote: "The aim of the project is to provide an alternative to the DITA Open Toolkit (DITA-OT) that does not rely on file system-based processing, has no direct dependency on Java and Ant, and makes use of the XML processing capabilities of XProc to offer greater flexibility, extensibility, portability, and ultimately also better performance. The pipelines use standard XProc features as much as possible, so with little or no effort, users should be able to use them with any compliant XProc implementation. The pipelines have been tested with EMC Documentum XProc Engine (Calumet) version 1.0.12."
My coworker just told me about DITA Compiler. Apparently it's part of XMLMind.
According to him, the implementation isn't quite complete.
Maybe DITA2Go can help:
http://www.dita2go.com/
DITA2Go allows your files to be anyplace you please, as you requested. It also has numerous extensions beyond what the OT provides, such as scoped keydefs and ditavals, which are under consideration for DITA 1.3. It was created with intense collaboration of two TC members working on major live projects, and is used by hundreds of people currently.
It is also about ten times as fast as the OT, thanks to C++, and requires no programming skills at all to use.
It is free, but it is not Open Source. It is fully supported and the developers fix bugs immediately and often add new features in a day or two on request. It shares a large part of its code with a commercial product, Mif2Go, which is the tool used by about 25% of FrameMaker users who are moving to DITA, according to a recent survey.
I don't see a requirement for the tools used to create a freely-licensed document to be Open Source themselves. There are absolutely no restrictions on use of the output, which obviously belongs to the user, not to Omni Systems.
HTH!

PDFLib opinions/experiences

My organization is considering PDFlib for dynamically creating PDF files (http://www.pdflib.com/) in our Java (Spring/Tomcat) environment.
Does anyone have experiences that they can share about the pros/cons of this library?
We've been using PDFlib for a few years but we switched to DynaPDF recently (we are not using Java but C++). There never were any issues with PDFlib: it was always stable and reliable (and we really used all features, including spot colors and importing of other PDFs).
It contains very good documentation and their support is fine, too.
Unfortunately, depending on what features of PDFlib you need, it is very expensive. We requested a 3-platform license without royalties (the PDI-enabled version), and were offered a license for around €20,000. This is a bit expensive for a small company like ours.
So eventually we moved on to DynaPDF, which is less expensive and creates PDF files just as reliably. We got a license including source code for about €1000. I'm not sure if they provide Java wrappers, though.
Also this question might be interesting for you.
Hope that helps.
I've been using PDFlib for about 3 years now and it's been great for me. I guess it really depends on what you want to use it for, but for me it's been really good. I do a lot of file manipulations and so far it's been able to do everything I need very well. Support could be better, but overall it's not too bad, and the software itself is great.
