I'm looking for a simple (free) way to convert an arbitrary document to a PDF from within a program. There are any number of free PDF printers, but I need to be able to call the conversion within a program without human intervention. The program is being developed in Java, but will run exclusively in a Windows environment so calling an exe seems like a good solution if such a conversion program exists.
I have had some success with JodConverter, which is a Java-based wrapper around the OpenOffice.org API. Basically, you can run OpenOffice as a server and automate the action of opening a document in OpenOffice (which supports many, many types) and saving it as PDF. JodConverter makes that a lot easier and has built-in support for running as a web service if you're interested in that.
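If you would rather call an executable, as the question suggests, LibreOffice (a fork of OpenOffice.org) can also do a one-shot conversion from the command line without a long-running server. The following is a minimal sketch, not the JodConverter API. It assumes an `soffice` binary that understands `--headless`, `--convert-to pdf` and `--outdir` (older OpenOffice.org releases may not), and the class and method names are made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper around LibreOffice's command-line converter.
// Assumes an soffice binary that supports --headless, --convert-to and --outdir.
class HeadlessPdfConverter {
    private final String sofficePath; // path to soffice(.exe); the location is installation-specific

    HeadlessPdfConverter(String sofficePath) {
        this.sofficePath = sofficePath;
    }

    /** Builds the command line; kept separate so it can be logged or inspected. */
    List<String> buildCommand(File input, File outputDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add(sofficePath);
        cmd.add("--headless");      // no GUI, no user interaction
        cmd.add("--convert-to");
        cmd.add("pdf");             // target format; output keeps the input's base name
        cmd.add("--outdir");
        cmd.add(outputDir.getPath());
        cmd.add(input.getPath());
        return cmd;
    }

    /** Runs the conversion and returns the PDF it expects to find, or throws. */
    File convert(File input, File outputDir, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(input, outputDir))
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD) // Java 9+; a full pipe can't block the child
                .start();
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            throw new IOException("Conversion timed out: " + input);
        }
        if (p.exitValue() != 0) {
            throw new IOException("soffice exited with code " + p.exitValue());
        }
        String name = input.getName();
        int dot = name.lastIndexOf('.');
        File pdf = new File(outputDir, (dot > 0 ? name.substring(0, dot) : name) + ".pdf");
        if (!pdf.isFile()) {
            throw new IOException("No PDF produced for " + input);
        }
        return pdf;
    }
}
```

For example, `new HeadlessPdfConverter("C:\\Program Files\\LibreOffice\\program\\soffice.exe").convert(new File("report.docx"), new File("out"), 120)` would be expected to leave `out\report.pdf` behind; the install path there is only a guess.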
Downsides: 1) Like OpenOffice itself, the conversion for certain complicated proprietary documents is not perfect; some of your Word documents may not look exactly identical as PDFs. 2) OpenOffice as a server is not entirely stable; if you hit it with a bunch of requests it will crash. One (somewhat expensive -- I think a few thousand dollars US) alternative is Sun's StarOffice Server, which does exactly the same thing as JodConverter (wrap OpenOffice) but adds pooling of OpenOffice instances and other stability support.
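Some of that stability can be recovered on your own side by never sending the office process more than a few requests at a time. A minimal sketch using `java.util.concurrent.Semaphore`; `ConversionThrottle` is a hypothetical name, the limit is something you would have to tune, and unlike a real pool it does not restart a crashed instance.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical throttle: caps how many conversions reach the office process at once.
class ConversionThrottle {
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    ConversionThrottle(int maxConcurrent) {
        // fair = true so waiting requests are served roughly in arrival order
        this.permits = new Semaphore(maxConcurrent, true);
    }

    /** Runs one conversion, blocking while the concurrency limit is reached. */
    <T> T run(Callable<T> conversion) throws Exception {
        permits.acquire();
        try {
            int now = inFlight.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // remember the highest concurrency seen
            return conversion.call();
        } finally {
            inFlight.decrementAndGet();
            permits.release();
        }
    }

    /** Highest number of conversions observed running at the same time. */
    int peakConcurrency() {
        return peak.get();
    }
}
```

Each request thread would wrap its call, e.g. `throttle.run(() -> converter.convert(in, outDir, 120))`; requests beyond the limit simply wait instead of piling onto the server.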
The most accurate PDF conversion tools are made by Adobe (and they do have server-based converters with API support), but they are very expensive - tens of thousands of dollars US.
Simple... free... PDF... arbitrary input... At least the requirements are easy and reasonable.
Seriously, those requirements just aren't going to be met. If you are willing to pay money for a library that does some of this, you can check out Amyuni - It's a great library, but the type of stuff you are asking for is squarely in native win32 land - not something that's going to happen in Java. And even with that in place, it's not going to be simple.
I suppose you could do something with Ghostscript as well (many of the free PDF converters use it). But even then, you still have to deal with the conversion from arbitrary input issue.
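Ghostscript itself only reads PostScript and PDF, so something still has to turn the arbitrary input into PostScript first (for example, printing through a PostScript printer driver). Once you have a .ps file, the PDF step is a single command. A sketch assuming the Windows console build (`gswin32c.exe` or `gswin64c.exe`) is installed; `GhostscriptCommand` is a hypothetical helper name.

```java
import java.util.List;

// Hypothetical helper that assembles a Ghostscript command line for PostScript -> PDF.
class GhostscriptCommand {
    static List<String> psToPdf(String gsExe, String inputPs, String outputPdf) {
        return List.of(
                gsExe,
                "-dBATCH",                   // exit when done instead of waiting for input
                "-dNOPAUSE",                 // don't pause between pages
                "-dSAFER",                   // restrict file operations by the PostScript program
                "-sDEVICE=pdfwrite",         // Ghostscript's PDF output device
                "-sOutputFile=" + outputPdf,
                inputPs);
    }
}
```

You could then run it with `new ProcessBuilder(GhostscriptCommand.psToPdf("gswin64c", "in.ps", "out.pdf")).inheritIO().start().waitFor();`.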
There are other libraries available that can display lots of different file formats (even without the native application available) - perhaps something like that would work. Here's one (owned by Oracle now, so you know it's gotta be good ;-) : Outside In.
(BTW - iText is most definitely not going to do what you are asking about. I love iText, I use iText - heck, I'm a developer for part of iText - but it's most definitely not a PDF print driver, which is more in line with what you are going for).
For Java, the most recommended library is iText.
I am working on software to store legal documents and I was thinking that PDF might be an ideal format to work in. However, I am a little confused about which variant of the PDF format would best suit my needs.
I have the following requirements for the documents:
will be stored for a minimum of 7 years if not longer
not editable
contain both images and text (images will be in .jpg format ideally)
I was originally looking at using PDF/A-1 however I have discovered that this format does not seem to like using JPEG images, or at least it doesn't when using JODConverter.
Any suggestions/explanations as to which format would best meet these needs would be much appreciated!
For the requirements you described, PDF/A-1b (yes, b at the end!) is the ideal format. The b is for basic -- it has less strict requirements to meet than PDF/A-1a (a at the end), which is for accessible (or advanced, as I remember it).
If you have no difficulty implementing PDF/A-1a, you may as well go for it. However, depending on your source documents, PDF/A-1a may be extremely difficult and nearly impossible to generate (as it requires the additional tagging of the file's content for the accessibility features).
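Which level a file claims is recorded in its XMP metadata as `pdfaid:part` and `pdfaid:conformance`, so a quick look there tells you whether your toolchain produced what you asked for. This is a heuristic sketch, not a validator: it only reads the declaration, it relies on the metadata stream being stored uncompressed (which, as far as I know, PDF/A-1 requires), and `PdfAClaim` is a made-up name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Heuristic only: finds the PDF/A identification in XMP metadata.
// It says nothing about whether the file actually conforms.
class PdfAClaim {
    // XMP may write the properties as attributes (pdfaid:part="1")
    // or as elements (<pdfaid:part>1</pdfaid:part>).
    private static final Pattern PART =
            Pattern.compile("pdfaid:part\\s*(?:=\\s*[\"']|>)\\s*(\\d+)");
    private static final Pattern CONFORMANCE =
            Pattern.compile("pdfaid:conformance\\s*(?:=\\s*[\"']|>)\\s*([ABUabu])");

    /** Returns e.g. "PDF/A-1b" if the text declares both a part and a level. */
    static Optional<String> fromText(String text) {
        Matcher part = PART.matcher(text);
        Matcher conf = CONFORMANCE.matcher(text);
        if (part.find() && conf.find()) {
            return Optional.of("PDF/A-" + part.group(1) + conf.group(1).toLowerCase(Locale.ROOT));
        }
        return Optional.empty();
    }

    /** Scans the raw file; finds nothing if the metadata stream is compressed. */
    static Optional<String> fromFile(Path pdf) throws IOException {
        // ISO-8859-1 maps every byte to one char, so binary content can't break decoding
        String raw = new String(Files.readAllBytes(pdf), StandardCharsets.ISO_8859_1);
        return fromText(raw);
    }
}
```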
As for JPEG: of course PDF/A-1b supports JPEGs. It does not allow JPEG2000 compression to be used, because that algorithm was patent-encumbered at the time the PDF/A-1 standard was defined. PDF/A-1b generating software therefore must re-compress objects that use this type of compression with one of the other methods (which does not pose a big practical problem, though).
You may also want to look at the PDF/A Competence Center (PDFA) website. (Disclosure: I'm a member of the PDFA.)
PDF/A-1 is a good format for long-term storage (as that's its intention), and so it tries to remove external dependencies. This includes things like embedding fonts and DISABLING external hyperlinks (which also makes sense, but can be a gotcha). Some useful info is on the Adobe site (look at the key-specifications tab). PDF sounds like the right answer to your requirements.
The images being embedded should not be a problem. JODConverter is perhaps doing something wrong (or the version of OpenOffice/LibreOffice you are using underneath). You could try switching parts of that underlying infrastructure (OO/LO), try experimenting directly in the OpenOffice/LibreOffice GUI (export as PDF/A-1 and see what the results are), or try some other tools in the chain (e.g. Docmosis, though that is based on similar technology).
On the J2EE platform, the data in my datagrid comes from a Java DAO (from the database). Which way is better for exporting the datagrid's data to Excel: on the server with the Java DAO, or on the client with Flex? Thanks, and forgive my poor English...
I would suggest the server-side implementation approach. At the company where I work we tried both (admittedly, it was .NET). We had so many problems with exporting documents (also PDF [isn't it ironic??]) in combination with Flex. I remember the following problems:
you might need a more current version of your Flash plugin (>= 10.0) (could cause legacy problems)
there were problems specifying different data types like dates
in the end it took more man-days than intended
A server-side implementation might be more effort (at first), but
Java is more powerful
it's faster
more configuration is possible
(garbage collection works a lot better)
Your real question is where there is more support.
In Flex, I found two solutions for this problem:
The first is using the ActionScript library as3xls (http://code.google.com/p/as3xls/). It exports a real Excel document, but it has a drawback: in my experience it supports only English characters and doesn't export Unicode characters.
The second is saving a fake .xls file that contains an HTML table. Yes, it works!
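If you end up doing the export on the server instead, the same HTML-table trick is easy to produce from Java. A minimal sketch; `HtmlXlsExport` is a hypothetical name, and the UTF-8 `meta` tag is included on the assumption that Excel honors it when opening HTML (worth checking with the Excel versions your users have), which would sidestep the non-English character problem.

```java
import java.util.List;

// Hypothetical server-side version of the "fake .xls" trick:
// an HTML table that Excel opens when served under an .xls name.
class HtmlXlsExport {
    /** Escapes the characters that would otherwise break the HTML markup. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    static String toHtmlTable(List<String> headers, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        // Declares UTF-8 so non-ASCII text survives (assumption: Excel respects this meta tag)
        sb.append("<html><head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">")
          .append("</head><body><table>\n<tr>");
        for (String h : headers) sb.append("<th>").append(escape(h)).append("</th>");
        sb.append("</tr>\n");
        for (List<String> row : rows) {
            sb.append("<tr>");
            for (String cell : row) sb.append("<td>").append(escape(cell)).append("</td>");
            sb.append("</tr>\n");
        }
        return sb.append("</table></body></html>\n").toString();
    }
}
```

Serve the string as UTF-8 bytes with a `Content-Type: application/vnd.ms-excel` header and a file name ending in `.xls`. Newer Excel versions warn that the file's format doesn't match its extension, which is the price of this trick.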
I've also dealt with this issue with my flex app on whether to do the datagrid export on the client side or server side.
I had originally exported the datagrid on the client side using as3xls (as mentioned above). It worked well at first, but then started giving me real headaches once its limitations were reached (namely, not being able to export non-ASCII characters). This is when I stopped using as3xls.
If you do not require the datagrid export to have any specific formatting, another option would be to export the datagrid contents to a comma-delimited string and save the string to a CSV file, all from within the client. This way you avoid issues with specific data types (dates), have complete control over the text exported, and write out to a really loose and flexible file format. And you save yourself from having to code the infrastructure to round-trip the datagrid contents to your server and back in cases where the size and formatting of the exported data may not really justify it.
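The only subtle part of the CSV route is quoting. The answer describes doing this in the Flex client, but the rules are the same in any language; here they are sketched in Java following the common RFC 4180 conventions (`CsvExport` is a hypothetical name).

```java
import java.util.List;
import java.util.stream.Collectors;

// Minimal CSV writer following RFC 4180-style quoting.
class CsvExport {
    /** Quotes a field only when needed and doubles any embedded quotes. */
    static String field(String value) {
        if (value == null) return "";
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (!needsQuotes) return value;
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    static String line(List<String> values) {
        return values.stream().map(CsvExport::field).collect(Collectors.joining(","));
    }

    static String toCsv(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            sb.append(line(row)).append("\r\n"); // RFC 4180 uses CRLF line endings
        }
        return sb.toString();
    }
}
```

If the file contains non-ASCII text and is saved as UTF-8, Excel generally needs a byte-order mark at the start of the file to detect the encoding; otherwise it may show garbled characters.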
If you do require formatting or the vastly superior power of java to handle your export, I would suggest using an MS Office API like Apache POI to give you the ability to format your data much better into standard XLS or XLSX documents. This ended up being the solution we went with (except we are using SmartXLS as our Excel API), for the greater ability to control exactly how the data exported was to be laid out and formatted, plus delivering XLS/XLSX files to clients is more professional, and it's easier to provide to those less computer savvy clients than is a CSV.
I would like to create a word document using a template, replace some variables (fields) and save it as a new word document.
I was thinking of using Apache POI (http://poi.apache.org/). Is it the best tool for this purpose?
Can you share your impressions of it?
I've worked with POI before and it's certainly able to generate Word documents. But the devil is in the details.
Word has thousands of features: You can put numbered lists starting at #13 with negative indents into two joined cells of a table included in another table that is itself part of a bullet list... you get the idea. When the POI documentation says they are a work in progress, that reflects what will probably be an eternal state of trying to catch up to the (to us, undocumented) specification of Word.
Documents with a reasonably "normal" set of used features are well supported by POI, whose interfaces and methods are reasonable and consistent but sometimes require a bit of work. But as Pascal says, documents with a not too exorbitant set of features are also supported by RTF.
I have almost no experience "doing" RTF but it's probably a bit simpler than working with POI.
If you're working in an environment or for a customer who insists that your produced documents be .DOC rather than .RTF, then POI is pretty much your only choice, unless you can introduce a step where you use a bit of Office automation to convert RTF into DOC.
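To give a feel for how simple RTF is at its core, here is a minimal document writer. It is a sketch, not a full RTF implementation: one font, plain paragraphs, and `SimpleRtf` is a hypothetical name. The escaping follows the RTF rules for backslashes and braces, and writes non-ASCII characters as RTF Unicode escapes (a signed 16-bit decimal code followed by a fallback character).

```java
import java.util.List;

// Hypothetical minimal RTF writer: one paragraph per string, default font only.
class SimpleRtf {
    /** Escapes RTF control characters and writes non-ASCII characters as Unicode escapes. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '\\' || c == '{' || c == '}') {
                sb.append('\\').append(c);                      // RTF control characters
            } else if (c == '\n') {
                sb.append("\\line ");                           // line break within a paragraph
            } else if (c > 127) {
                // signed 16-bit decimal code, then '?' as the fallback for old readers
                sb.append("\\u").append((int) (short) c).append('?');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    static String document(List<String> paragraphs) {
        StringBuilder sb = new StringBuilder("{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\n");
        for (String p : paragraphs) {
            // font size is given in half-points: 24 = 12 pt
            sb.append("\\pard\\f0\\fs24 ").append(escape(p)).append("\\par\n");
        }
        return sb.append("}").toString();
    }
}
```

Saving the result of `SimpleRtf.document(List.of("Dear customer,", "Your order has shipped."))` with an `.rtf` extension should give a file Word opens directly.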
Update: I've had a couple more ideas in the meantime.
Using POI or creating RTF documents is something that you could do on practically any platform. At my job, all servers doing processing like this happen to be running Linux, for example.
However, in the likely case that your programs will run under Windows, there is another alternative: Jacob http://www.land-of-kain.de/docs/jacob/
Jacob is a COM interface for Java; it essentially allows you to "remote control" Windows programs such as Word and Excel. The document I linked to above is not to Jacob's own site but to a single page with "cookie cutter" recipes for using Jacob. The project itself is on SourceForge: http://sourceforge.net/projects/jacob-project/ But people claim, and rightly so, that the documentation is a bit lacking.
Jacob has the advantage over all other solutions that you're dealing with the "real" Word and therefore all capabilities of Word are available to you. This would be an alternative if there are detail aspects of your document that just can't be handled with POI or via the RTF format.
This is obviously way too late, but since 2013 there has been a much better, more flexible solution for Word document creation:
http://www.docx4java.org/trac/docx4j
I have had much more luck with docx4j than I ever did with POI.
I'm not sure of the exact status of Word document support in POI, but according to the POI website, work is still in progress (I can't say exactly what this means). So, at this time, I would not use POI but would rather try to generate an RTF document. For this, you could:
Use RTFTemplate which is a RTF to RTF Engine that can generate RTF document as the result of the merge of a RTF model and data.
Use iText, which is primarily a PDF generator but can also generate RTF.
Build your own custom solution (but I wouldn't do that).
I'd go for iText.
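As an illustration of what a merge engine like RTFTemplate automates, and of why a custom solution takes more care than it seems, here is a hand-rolled placeholder merge over an RTF template. It is not RTFTemplate's API; the `[[name]]` placeholder convention and `RtfMerge` are made up for this sketch. One caveat: when Word saves the template it may insert control words in the middle of a placeholder, so check the raw file and retype any placeholder that got split.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-rolled placeholder merge over an RTF template (not RTFTemplate's API).
// The [[name]] placeholder convention is made up for this sketch.
class RtfMerge {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\[\\[([A-Za-z0-9_]+)]]");

    /** Escapes text so it can be dropped into the body of an RTF document. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '\\' || c == '{' || c == '}') {
                sb.append('\\').append(c);                      // RTF control characters
            } else if (c == '\n') {
                sb.append("\\line ");                           // line break within a paragraph
            } else if (c > 127) {
                // signed 16-bit decimal code, then '?' as the fallback character
                sb.append("\\u").append((int) (short) c).append('?');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Replaces known placeholders in one pass; unknown ones are left visible. */
    static String merge(String rtfTemplate, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(rtfTemplate);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = value == null ? m.group() : escape(value);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement)); // Java 9+ overload
        }
        m.appendTail(out);
        return out.toString();
    }
}
```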
If you use a template and do not want to create the Word document from scratch, POI is, as far as I know, a pretty good solution. You open the template and select the zones you want to replace.
They say POI is still in development, but I've been using it in a production environment and it works pretty well so far.
I know this question is a bit old, but I think many people still find this with search engines, so I post another possibility to do what you want right here:
If the one and only goal is to have a Word template and to replace some values in it, you might consider saving the Word template as a single XML file (not DOCX) and then processing it with plain Java, without any framework. If you want to do more (e.g. create lists or tables), you might also consider understanding the XML format and writing your own helpers before pulling in a framework like POI.
Here is an example on how to do that:
http://dev-notes.com/code.php?q=10
This is the quick version; if you want a nicer one, you could try using an XML processor.
PS: users might notice that the file extension is not doc but xml and they may blame you for that, but that's ok... just rename it to doc, word will recognize the format and everyone is happy again ;)
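The string-replacement version can be sketched in plain Java as follows. It assumes the template is saved as a UTF-8 XML file and that placeholders are typed as `${name}` (a made-up convention); `WordXmlTemplate` is a hypothetical name. Values are XML-escaped so characters like `&` and `<` don't corrupt the document, and the replacement happens in a single pass so a value that happens to contain `${...}` is not expanded again. As with any text-level approach, Word sometimes splits a placeholder across several runs when saving, so check the XML if a placeholder isn't replaced.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Quick-and-dirty merge for a Word template saved as one XML file.
// The ${name} placeholder syntax is a made-up convention typed into the template.
class WordXmlTemplate {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_]+)\\}");

    static String xmlEscape(String s) {
        return s.replace("&", "&amp;")   // first, so the entities added below aren't escaped again
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Replaces known placeholders in one pass; unknown ones are left as-is. */
    static String fill(String xml, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(xml);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = value == null ? m.group() : xmlEscape(value);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement)); // Java 9+ overload
        }
        m.appendTail(out);
        return out.toString();
    }

    static void fillFile(Path template, Path output, Map<String, String> values) throws IOException {
        // Assumes the template was saved as UTF-8 encoded XML
        String xml = new String(Files.readAllBytes(template), StandardCharsets.UTF_8);
        Files.write(output, fill(xml, values).getBytes(StandardCharsets.UTF_8));
    }
}
```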
You should look into the Aspose.Words components. They have recently begun providing a Java version of the component.
See the following link: Aspose.Words for Java
This supports Word automation, document creation and advanced features such as mail merging without the need for an instance of Microsoft Word on the machine. The real benefit is that you are able to work within the context of an actual Word document and don't have to compromise by creating RTFs, etc.
The Java version is not currently as fully featured as the .Net version but the main core functionality is there and they are pushing very hard to have a feature equivalent version soon.
Also, if you purchase the Java version you get a year of free upgrades/support as new releases are created.
If you are working with DOCX documents, docx4j is an option. Like POI, it's open source.
I created and use this: http://code.google.com/p/java2word
My organization is considering PDFlib for dynamically creating PDF files (http://www.pdflib.com/) in our Java (Spring/Tomcat) environment.
Does anyone have experiences that they can share about the pro/cons of this Library?
We used PDFlib for a few years but recently switched to DynaPDF (we are not using Java but C++). There were never any issues with PDFlib: it always worked stably and reliably (and we really used all its features, including spot colors and importing other PDFs).
It has very good documentation, and their support is fine, too.
Unfortunately, depending on what features of PDFlib you need, it is very expensive. We requested a 3-platform license without royalties (the PDI-enabled version), and were offered a licence for around 20,000 €. This is a bit expensive for a small company like ours.
So eventually we moved on to DynaPDF, which is less expensive and creates PDF files just as reliably. We got a license including source code for about €1000. I'm not sure if they provide Java wrappers, though.
Also this question might be interesting for you.
Hope that helps.
I've been using PDFlib for about 3 years now and it's been great for me. I guess it really depends on what you want to use it for, but for me it's been really good. I do a lot of file manipulations, and so far it's been able to do everything I need very well. Support could be better, but overall it's not too bad, and the software itself is great.
I read elsewhere (a response by "hazzen" here) that .NET has "a binding for the entire Office suite outlined here that allows you to write COM-based methods that you can call from Office. It is intended for automation, but you can write any managed code you want and have Excel call into it."
I'm interested in the same thing for Java. My present solution runs a standard Windows program that launches Java, whereupon any results are essentially sent to standard out, and the intermediate program captures these and feeds the result back into Excel, also using what amounts to std-out.
There has to be a better way!
Is there such a "binding" available for Java?
I'd also be pleased by any pointers to web articles or whathaveyou that teach about this kind of integration issue.
JACOB, j-Interop and J-Integra might do something like that. Ockham's Flashlight: Java/COM, Java/Win32 Integration resources has more links.
Apache POI has Java APIs for Excel files which work quite nicely. It also handles Word/Outlook/PowerPoint files, but I recall Excel support being its strong point.