I want to create a PDF using pdfbox (https://pdfbox.apache.org/cookbook/documentcreation.html). However, pdfbox does not seem to provide dynamic text layout mechanisms like those a text editor like OpenOffice provides (automatic text flow using predefined text formattings like block format, centered text, line breaks etc.).
Is there any Java library that provides that functionality on top of pdfbox or separate from it? Or do you have any free code available?
I had the same problem, that's why I started PDFBox-Layout. It has support for simple word wrapping, text and paragraph alignment, pagination, vertical and column layout, and markup for easy bold/italic highlighting.
See the Wiki for more information. Maybe you will find it useful :-)
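To give an idea of what such a layout layer does underneath, here is a minimal sketch of greedy word wrapping and centering in plain Java. It is not PDFBox-Layout's actual code; the class name, the method names and the fixed 6-units-per-character metric are assumptions for illustration. With PDFBox you would typically measure text with PDFont.getStringWidth(text) / 1000 * fontSize instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Greedy word wrapping plus a centering offset, independent of any PDF library. */
class WordWrapSketch {

    /** Breaks text into lines no wider than maxWidth, as measured by the width function. */
    static List<String> wrap(String text, double maxWidth, ToDoubleFunction<String> width) {
        List<String> lines = new ArrayList<>();
        String current = "";
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only for blank input
            }
            String candidate = current.isEmpty() ? word : current + " " + word;
            if (!current.isEmpty() && width.applyAsDouble(candidate) > maxWidth) {
                lines.add(current); // the word does not fit, so start a new line with it
                current = word;
            } else {
                current = candidate;
            }
        }
        if (!current.isEmpty()) {
            lines.add(current);
        }
        return lines;
    }

    /** x offset at which a centered line starts inside a box of the given width. */
    static double centeredOffset(String line, double boxWidth, ToDoubleFunction<String> width) {
        return (boxWidth - width.applyAsDouble(line)) / 2.0;
    }

    public static void main(String[] args) {
        // Hypothetical metric: every character is 6 units wide.
        ToDoubleFunction<String> width = s -> s.length() * 6.0;
        for (String line : wrap("The quick brown fox jumps over the lazy dog", 60, width)) {
            System.out.printf("%5.1f  %s%n", centeredOffset(line, 60, width), line);
        }
    }
}
```

A word that is wider than the box on its own still gets a line to itself here; a real layout engine would also need hyphenation or forced breaks for that case.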
BlockFrame (on GitHub) is another layout framework for PDFBox, filling a different space to PDFBox-Layout. PDFBox-Layout seems oriented to text, but BlockFrame is designed for complex data structures. It's also designed with extensibility in mind.
I needed something to print crosswords I'd generated, and wound up coding a framework. If there's interest, I'll extend and maintain it. It should be possible to use BlockFrame to draw small, complex sections of larger PDF documents, as well as generate an entire PDF.
Feedback would be appreciated.
I had a similar problem in Ruby. I used Prawn in the past, which has a syntax similar to pdfbox. Lots of primitives.
I found it was a lot better to use an HTML+CSS to PDF solution. I'm already generating HTML and CSS, and it's easy to make print-specific CSS. Then I use either wkhtmltopdf or princexml to generate the PDF. Both are command-line tools that run on a variety of platforms.
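Since the original question was about Java, here is a rough sketch of driving wkhtmltopdf from Java with ProcessBuilder. It assumes the wkhtmltopdf binary is installed and on the PATH; the class name and the file names are placeholders. The --print-media-type option asks the tool to apply print CSS rather than screen CSS.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Shells out to wkhtmltopdf; assumes the binary is installed and on the PATH. */
class HtmlToPdfSketch {

    /** Builds the command line: wkhtmltopdf --print-media-type input.html output.pdf */
    static List<String> command(Path html, Path pdf) {
        return Arrays.asList("wkhtmltopdf", "--print-media-type", html.toString(), pdf.toString());
    }

    /** Runs the converter and returns its exit code (0 means success). */
    static int convert(Path html, Path pdf) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(html, pdf))
                .inheritIO() // pass the tool's progress and error messages through
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder file names; report.html must exist for the conversion to succeed.
        int exit = convert(Paths.get("report.html"), Paths.get("report.pdf"));
        System.out.println("wkhtmltopdf exited with " + exit);
    }
}
```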
Related
Background:
I am not able to install a third-party library such as iText or anything similar. I have to write a PDF package myself.
I am looking for a resource that explains how to embed fonts and text in the PDF file format. So far, I have a pretty good grasp of adding rectangles, ISO-8859-1 text and n-hedrons using Path elements.
Now, the next step is supporting UTF-8 (or, more generally, different character sets) with different fonts. Reading ISO 32000-1:2008, I cannot work out how to do that (the document is very technical and I am still a junior developer).
I found PDFBox, but I am having a hard time understanding the overcomplicated principles and decisions made.
If anyone has a reference to simple code examples or cookbooks on how to handle text properly in the PDF file format, I would appreciate a link.
If the language matters:
I am using Java, but I am really looking for a thorough text or article covering the topic, with examples in any language.
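For orientation, one route the specification describes for non-Latin text is a Type 0 (composite) font with the predefined Identity-H encoding and a CIDFontType2 descendant whose CIDToGIDMap is Identity. The strings in the content stream are then sequences of 2-byte glyph IDs rather than UTF-8 bytes. The sketch below shows only that encoding step; the class name and the glyph IDs in the map are hypothetical (real IDs come from the embedded font's cmap table), and a complete writer would also need the font file stream and a ToUnicode CMap.

```java
import java.util.HashMap;
import java.util.Map;

/** Encodes text for a Type 0 font with Identity-H encoding as a PDF hex string. */
class IdentityHSketch {

    /**
     * Maps each Unicode code point to a glyph ID and writes it as a
     * 2-byte big-endian code. Unmapped characters fall back to glyph 0 (.notdef).
     */
    static String toHexString(String text, Map<Integer, Integer> glyphIds) {
        StringBuilder hex = new StringBuilder("<");
        text.codePoints().forEach(cp -> {
            int gid = glyphIds.getOrDefault(cp, 0);
            hex.append(String.format("%04X", gid));
        });
        return hex.append(">").toString();
    }

    public static void main(String[] args) {
        // Hypothetical glyph IDs for U+0416 (Cyrillic Zhe) and 'a'.
        Map<Integer, Integer> glyphIds = new HashMap<>();
        glyphIds.put(0x0416, 571);
        glyphIds.put((int) 'a', 68);
        // The result is shown with the Tj operator inside a BT ... ET text object.
        System.out.println(toHexString("\u0416a", glyphIds) + " Tj");
    }
}
```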
I have hundreds of rich PDFs that need to be generated from my application; they have images and colorful content. I am looking to build a framework that supports a template and a data model and takes care of the rest, so that adding a new PDF would just mean adding a new template. In the past I have used FreeMarker to generate HTML and then printed that HTML to PDF. Are there any better, more recent solutions to this problem?
There are various things you could do:
generate XML data, apply an XSLT transformation to style it, and convert the resulting HTML document to a PDF
code a small class that converts whatever data format you have to a pdf document (you would need to do all the layout through code)
create a template PDF document (using whatever design program you want), insert form fields, and have iText fill the form (several of our customers go for this approach)
Keep in mind that JasperReports uses a proprietary format, whereas the approaches I suggested use only open and well-established formats.
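The first option (XML data styled through XSLT) can be sketched with the JDK's built-in javax.xml.transform API. The invoice element names and the stylesheet below are made up for illustration, and the class name is hypothetical; the XHTML it produces would then be handed to whichever HTML-to-PDF converter you choose.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Applies an XSLT stylesheet to XML data to produce XHTML for a PDF converter. */
class XsltToHtmlSketch {

    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up data model: one invoice with a number.
        String xml = "<invoice><number>42</number></invoice>";
        String xslt =
              "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='/invoice'>"
            + "<html><body><h1>Invoice <xsl:value-of select='number'/></h1></body></html>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        System.out.println(transform(xml, xslt));
    }
}
```

Adding a new document type then means adding a new stylesheet, while the Java code stays the same.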
Take a look at JasperReports.
I've written some Java code using the iText library to generate a PDF report, but specifying the layout seems very manual and takes a lot of time, re-running the code to test small adjustments.
Does anyone know of a report designer for PDFs which would work with Java? It doesn't have to be iText based, that's just what I'm using at the moment.
Yes, JasperReports. It has the iReport visual designer. Also, the API is pretty straightforward.
As far as I know, BIRT is an alternative to JasperReports.
Keep in mind that both are complex reporting solutions that support exporting to a number of different formats.
You could look at Docmosis which uses Word or OpenOffice documents to provide the layout and formatting (that is, a template). It has a property you can set that will "watch" templates for changes and automatically process the changes in the background, so you can keep changing a template, then render it again without any code/compile changes.
I am generating PDF documents for various purposes using the iText library.
It's like one class per PDF document. There are a lot of similarities among the classes, listed below:
The fields have (x,y) location
A field can be wrapped after some number of words
A field can have a value which is a function of one or more parameters
Subsequent pages of the PDF may have to keep the same layout or use a different one
I am thinking of doing this layout business through an XML file. Any thoughts or innovative ideas for solving this are welcome.
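A minimal sketch of the XML idea, using only the JDK's DOM parser: each field's position and wrapping rule live in the XML file, and one generic class reads them. The element name, the attribute names and the Field class are all hypothetical; a single renderer would then loop over the parsed fields and draw each value at its (x, y) position with whatever PDF library is in use.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Reads field positions and wrapping rules from an XML layout description. */
class XmlLayoutSketch {

    /** One positioned field; wrapAfterWords == 0 means "do not wrap". */
    static final class Field {
        final String name;
        final float x;
        final float y;
        final int wrapAfterWords;

        Field(String name, float x, float y, int wrapAfterWords) {
            this.name = name;
            this.x = x;
            this.y = y;
            this.wrapAfterWords = wrapAfterWords;
        }
    }

    static List<Field> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("field");
            List<Field> fields = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                int wrap = e.hasAttribute("wrapAfterWords")
                        ? Integer.parseInt(e.getAttribute("wrapAfterWords")) : 0;
                fields.add(new Field(e.getAttribute("name"),
                        Float.parseFloat(e.getAttribute("x")),
                        Float.parseFloat(e.getAttribute("y")),
                        wrap));
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid layout XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<page>"
                + "<field name='customer' x='72' y='700' wrapAfterWords='5'/>"
                + "<field name='total' x='400' y='120'/>"
                + "</page>";
        for (Field f : parse(xml)) {
            System.out.println(f.name + " at (" + f.x + ", " + f.y + "), wrap after "
                    + f.wrapAfterWords + " words");
        }
    }
}
```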
Take a look at the PDFBox library, which is now in the Apache Incubator.
PDFBox is nice; I have used it before and got good help from the developer. You might also want to have a look at XSL-FO. It is an XML-based formatting language that can be rendered to PDF (and other formats) using Apache FOP.
What about Prince? It's a formatting engine that uses CSS for styling, and it has a Java API. It's not free, though (apart from the free Personal License).
Flying Saucer supports using XHTML/CSS to create PDFs.
Open source implementation will be preferred.
Obviously, it isn't an easy task: PDF formatting is much richer than HTML's (plus you must extract images and link them, etc.).
Simple text extraction is much simpler (although not trivial...).
I see in the sidebar of your question a similar question: Converting PDF to HTML with Python which points to a library (poppler, which is apparently written in C++, perhaps can be accessed with JNI/JNA) and to a related question which offers even more answers.
The only ones I know of have to be paid for.
BFO
JPedal
Try using PDFBox from the Apache Software Foundation.