Why is PDFBox PDFRenderer slow?


I want to convert a PDF to a TIFF using PDFBox 2.x and the PDFRenderer class.
But it runs very slowly compared to Ghostscript.
Here's my sample code:
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
public class SpeedTest
{
    static long startTime = System.currentTimeMillis();
    // Prints the seconds elapsed since the previous call, then restarts the timer.
    public static void logTime(String msg)
    {
        long now = System.currentTimeMillis();
        System.out.println(String.format("%.3f: %s", (now - startTime) / 1000.0, msg));
        startTime = now;
    }
    public static void main(String[] args) throws Exception
    {
        //System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider");
        String pdfFileName = args[0];
        String tiffFileName = args[1];
        // try-with-resources closes the document when done
        try (PDDocument document = PDDocument.load(new File(pdfFileName)))
        {
            logTime(pdfFileName + " loaded.");
            PDFRenderer pdfRenderer = new PDFRenderer(document);
            logTime("initialized renderer.");
            // Render the first page (index 0) at 600 DPI
            BufferedImage img = pdfRenderer.renderImageWithDPI(0, 600, ImageType.RGB);
            logTime("page rendered as image.");
            ImageIO.write(img, "TIFF", new File(tiffFileName));
            logTime("image saved as TIFF.");
        }
    }
}
The output is as follows:
0.521: sample.pdf loaded.
0.013: initialized renderer.
2.910: page rendered as image.
2.005: image saved as TIFF.
As you can see, the call to pdfRenderer.renderImageWithDPI takes almost 3 seconds, and the ImageIO.write call takes another 2 seconds.
Doing the same with Ghostscript, the complete task finishes in about 0.4 seconds:
time gs -dQUIET -dBATCH -dNOPAUSE -sstdout=/dev/null -sDEVICE=tifflzw -r600 -dFirstPage=1 -dLastPage=1 -sOutputFile=sample.tif sample.pdf
real 0m0.389s
user 0m0.340s
sys 0m0.048s
I have also already tried setting
System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider");
because I'm running Java 8 (1.8.0_161, to be precise), but that makes no difference.
Thanks for any ideas,
regards
Thomas

Upgrade to JDK 1.8.0_191, which was released in October 2018, or to JDK 9.0.4 or later.
From the PDFBox docs:
PDFBox and Java 8
Important notice when using PDFBox with Java 8
before 1.8.0_191 or Java 9 before 9.0.4
Due to the change of the java color management module towards
“LittleCMS”, users can experience slow performance in color
operations. A solution is to disable LittleCMS in favor of the old
KCMS (Kodak Color Management System) by:
Starting with -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
or Calling
System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider")
Sources:
https://bugs.openjdk.java.net/browse/JDK-8041125
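For the System.setProperty variant, here is a minimal sketch. It assumes (as far as I can tell) that the JDK reads this property when it first initializes color management, so the call should come at the very top of main, before any rendering code runs. The class name CmmSetup and the helper selectKcmsIfUnset are made up for illustration; only the property name and value come from the notice quoted above. A value already passed with -D on the command line is left untouched.

```java
class CmmSetup {
    // Property name and value as quoted in the PDFBox notice.
    static final String CMM_PROPERTY = "sun.java2d.cmm";
    static final String KCMS_PROVIDER = "sun.java2d.cmm.kcms.KcmsServiceProvider";

    // Hypothetical helper: selects KCMS unless a value was already supplied
    // (for example with -D on the command line). Returns the effective value.
    static String selectKcmsIfUnset() {
        if (System.getProperty(CMM_PROPERTY) == null) {
            System.setProperty(CMM_PROPERTY, KCMS_PROVIDER);
        }
        return System.getProperty(CMM_PROPERTY);
    }

    public static void main(String[] args) {
        // Do this first, before any PDFBox or AWT color code is touched.
        System.out.println(CMM_PROPERTY + " = " + selectKcmsIfUnset());
        // ... load the document and render pages here ...
    }
}
```

Whether this actually helps depends on the JDK in use, so it is worth timing a render with and without it.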

According to my experiments, the extra slowness occurs only for the first rendered page of a document. If you render all pages of a multi-page document, the pages after the first one render faster. The absolute rendering speed also depends very much on the DPI value used.
Render 6 document pages at 600 DPI
4.903s: page 0 rendered as image.
4.205s: page 1 rendered as image.
3.946s: page 2 rendered as image.
3.866s: page 3 rendered as image.
3.761s: page 4 rendered as image.
3.633s: page 5 rendered as image.
Render 6 document pages at 300 DPI
3.241s: page 0 rendered as image.
1.308s: page 1 rendered as image.
1.155s: page 2 rendered as image.
1.156s: page 3 rendered as image.
1.109s: page 4 rendered as image.
1.083s: page 5 rendered as image.
Render 6 document pages at 150 DPI
2.507s: page 0 rendered as image.
0.555s: page 1 rendered as image.
0.386s: page 2 rendered as image.
0.373s: page 3 rendered as image.
0.410s: page 4 rendered as image.
0.361s: page 5 rendered as image.
Render 6 document pages at 72 DPI
2.455s: page 0 rendered as image.
0.333s: page 1 rendered as image.
0.213s: page 2 rendered as image.
0.190s: page 3 rendered as image.
0.175s: page 4 rendered as image.
0.171s: page 5 rendered as image.
I think the problem here is that AWT graphics does all rendering in software, so with a constant pixel fill rate the rendering time scales quadratically with the DPI value. The slowness of the first page is probably some initialization overhead. (But that's all a wild guess at the moment.)
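To put numbers on the quadratic scaling, here is a small sketch that counts the pixels of a rendered page at each DPI value used above. The page size of 595 x 842 points (A4 portrait) is an assumption, since the size of the test pages is not known; the class and method names are made up for illustration. It only relies on PDF user space having 72 points per inch.

```java
class PixelScaling {
    static final double POINTS_PER_INCH = 72.0;

    // Converts a length in PDF points to pixels at the given resolution.
    static long toPixels(double points, int dpi) {
        return Math.round(points * dpi / POINTS_PER_INCH);
    }

    // Number of pixels in the rendered image of a page of the given size.
    static long pixelCount(double widthPt, double heightPt, int dpi) {
        return toPixels(widthPt, dpi) * toPixels(heightPt, dpi);
    }

    public static void main(String[] args) {
        double widthPt = 595;  // assumed A4 portrait width in points
        double heightPt = 842; // assumed A4 portrait height in points
        long base = pixelCount(widthPt, heightPt, 72);
        for (int dpi : new int[] {72, 150, 300, 600}) {
            long w = toPixels(widthPt, dpi);
            long h = toPixels(heightPt, dpi);
            long pixels = w * h;
            System.out.printf("%d DPI: %d x %d px = %d pixels, %.1f times the 72 DPI count%n",
                    dpi, w, h, pixels, (double) pixels / base);
        }
    }
}
```

Doubling the DPI quadruples the pixel count, which fits the roughly 3x to 4x jump in steady-state render time between 300 and 600 DPI in the measurements above.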

Related

Java2DRendererBuilder BufferedImage From HTML Size Incorrect

I am using com.openhtmltopdf.java2d.Java2DRendererBuilder to create BufferedImage objects from html. The html may contain image objects as base64 text.
Java2DRendererBuilder builder = new Java2DRendererBuilder();
builder.withHtmlContent(html, null);
builder.useDefaultPageSize(210, 297, BaseRendererBuilder.PageSizeUnits.MM);
builder.useFastMode();
builder.toSinglePage(bufferedImagePageProcessor);
builder.runFirstPage();
int i = 0;
for (BufferedImage imagePage : imagePages) {
    log.debug("image width: "+imagePage.getWidth());
    log.debug("image height: "+imagePage.getHeight());
    saveImageFile(imagePage, "normal"+System.currentTimeMillis());
}
The problem is that when the html contains an image with a large width, the resulting BufferedImage does not fit within the page. Instead, part of the image is not shown.
I realize I can change the default page size to a larger value to accommodate the larger image. However, I don't know what value to set for the width since I won't know beforehand if the html has any images or what their width would be. I am hoping to find a way to accommodate any (reasonable) width on an A4 size page automatically.

PDFRenderer creates images of greater size than the PDF itself

I have a sample of scanned PDFs that I need to edit and re-export. I use PDFBox to render the PDF into a series of images (one image per page), I perform some OpenCV calculations on the rasterized jpegs and then I intend to insert them back into a new pdf file.
Example: PDF is 423kb, Page 1 is 313kb, Page 2 is 287kb, Page 3 is 319kb, Page 4 is 485kb, and Page 5 is 470kb.
The problem is that the output images are larger in file size than the PDF itself. This results in my OCR efforts taking much longer than is acceptable (5 minutes vs 30 seconds per document). The only way to keep the JPEGs from inflating in size is to leave them at the default DPI of 72. This produces poor-quality images that cannot be used.
Why is this happening? I should be able to get back images that have a size less than or equal to the PDF in question (without sacrificing quality). I'm not doing anything weird to the images, just removing watermarks.
Here's some code illustrating how I'm extracting the jpegs from the PDF.
File file = new File(fileName);
PDDocument document = PDDocument.load(file);
PDFRenderer renderer = new PDFRenderer(document);
BufferedImage[] pageArray = new BufferedImage[document.getNumberOfPages()];
int pageCounter = 0;
for(PDPage page : document.getPages()) {
    pageArray[pageCounter] = renderer.renderImageWithDPI(pageCounter, 160);
    pageCounter++;
}

Creating PDF using JAVA (Netbeans) with images and multi pages

I am developing a Java program with the following requirements:
The application will take 5 input fields and 3 images (browse and "attach" to the Java application).
Once the "form" is completed it will be submitted using a button called "submit".
Once submitted, the Java application will create a PDF file with the 5 input texts and the 3 attached images.
I should be able to control which goes to which page number.
How do I implement such a solution with iText?

The application will take 5 input fields and 3 images (browse and "attach" to the Java application).
Once the "form" is completed it will be submitted using a button called "submit".
These first two requirements are unclear; are they to be implemented in a Java GUI (AWT? Swing? FX?), in some independent web UI (Plain HTML? Vaadin?), or in some derived UI (Portlet? ...)?
But as the question title "Creating PDF using JAVA (Netbeans) with images and multi pages" focuses on the PDF creation, let's look at the third and fourth requirements.
Once submitted, the Java application will create a PDF file with the 5 input texts and the 3 attached images.
I should be able to control which goes to which page number.
Let's assume you already have those inputs in the variables
String text1, text2, text3, text4, text5;
byte[] image1, image2, image3;
The framework
With iText you now create the document like this:
import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfWriter;
...
// where you want to create the PDF;
// use a FileOutputStream for creating the PDF in the file system
// use a ByteArrayOutputStream for creating the PDF in a byte[] in memory
OutputStream output = ...;
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, output);
document.open();
// Add content for the first page(s)
...
// Start a new page
document.newPage();
// Add content for the next page(s)
...
// Start a new page
document.newPage();
// etc etc
document.close();
Adding text
You can add text in one of the "Add content for the ... page(s)" sections using
import com.itextpdf.text.Paragraph;
...
document.add(new Paragraph(text1));
Adding an image
You can add an image in one of the "Add content for the ... page(s)" sections using
import com.itextpdf.text.Image;
...
document.add(Image.getInstance(image1));
Adding at a given position
Adding text or images as described above leaves the layout details to iText, and iText fills the page from top to bottom except some margins.
If you want to control the positioning of the content yourself (which also means you have to take care that the content parts do not overlap or are drawn outside the page area), you can do so like this:
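import com.itextpdf.text.Element;
import com.itextpdf.text.Image;
import com.itextpdf.text.Phrase;
import com.itextpdf.text.pdf.ColumnText;
import com.itextpdf.text.pdf.PdfContentByte;
...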
PdfContentByte canvas = writer.getDirectContent();
Phrase phrase = new Phrase(text2);
ColumnText.showTextAligned(canvas, Element.ALIGN_LEFT, phrase, 200, 572, 0);
Image img = Image.getInstance(image2);
img.setAbsolutePosition(200, 200);
canvas.addImage(img);
There are many more ways to manipulate your content, e.g. choosing a font, choosing text sizes, scaling images, rotating content, and so on. Simply have a look at the iText samples from the book iText in Action - Second Edition.

You can use XSL-FO. A basic example here. After this, you can search and add other options for your PDF.

Filling landscape PDF with PDFBox

I am trying to fill a PDF form with PDFBox, and I managed to do it well with a portrait-oriented document. But I have a problem when filling a document in landscape mode. The fields are filled in, but the text orientation is wrong. It appears vertically, as if the page were still in portrait mode but rotated by 90 degrees.
Here is my simplified code:
PDDocument pdfDoc = PDDocument.load(MY_FILE);
PDDocumentCatalog docCatalog = pdfDoc.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
acroForm.getField("aAddressLine1").setValue("ADDRESS1_HERE");
acroForm.getField("aAddressLine2").setValue("ADDRESS1_HERE");
acroForm.getField("country").setValue("COUNTRY_HERE");
pdfDoc.save(PATH_HERE);
pdfDoc.close();
Did you manage to fill a PDF document in landscape mode?
Thanks for your help.

The short answer
I'm afraid PDFBox does not yet (as of version 1.8.2) allow you to fill in landscape PDFs like the one you provided, because it does not seem to query and factor in information about the page the form field is located on.
The long answer
There are different ways you can define a page to be A4 landscape:
You can define it to have the A4 landscape dimensions directly by means of a media box definition:
/MediaBox [0, 0, 842, 595]
In this case the coordinates of your aAddressLine1 would be
/Rect[23.1711 86.8914 292.121 100.132]
or you can define it to have the A4 portrait dimensions and being rotated by 90° (or 270° obviously):
/MediaBox [0, 0, 595, 842]
/Rotate 90
In this case the coordinates of your aAddressLine1 are
/Rect[86.8914 23.1711 100.132 292.121]
Your example document uses the latter method.
Now PDFBox, when creating an appearance stream for that field, only looks at the rectangle defining the field but ignores the properties of the page. Thus, PDFBox sees a very narrow and very tall text field and fills it in just like that. It is completely unaware that the result will be rotated in a PDF viewer.
What it should have done is to also look at the page the field is located on. If that page has a /Rotate entry, it should create an appearance stream for the field which displays the text rotated in the opposite direction.
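The effect of the page rotation on the field can be illustrated with plain arithmetic. The following is only a sketch of the idea, not PDFBox code: the class FieldOrientation and the method displayedSize are made up for illustration, and the rectangle is the /Rect value of aAddressLine1 from above. It computes the field's width and height as stored, and as a viewer shows them once /Rotate is applied.

```java
class FieldOrientation {
    // rect is a PDF /Rect array [llx lly urx ury]; pageRotation is the page's /Rotate value.
    // Returns {width, height} of the field as a viewer displays it.
    static double[] displayedSize(double[] rect, int pageRotation) {
        double width = Math.abs(rect[2] - rect[0]);
        double height = Math.abs(rect[3] - rect[1]);
        int normalized = ((pageRotation % 360) + 360) % 360;
        // A quarter turn swaps the roles of width and height on screen.
        boolean swapped = normalized == 90 || normalized == 270;
        return swapped ? new double[] {height, width} : new double[] {width, height};
    }

    public static void main(String[] args) {
        double[] rect = {86.8914, 23.1711, 100.132, 292.121};
        double[] stored = displayedSize(rect, 0);   // what a rotation-unaware library sees
        double[] shown = displayedSize(rect, 90);   // what the viewer displays
        System.out.printf("stored: %.2f wide, %.2f high%n", stored[0], stored[1]);
        System.out.printf("shown:  %.2f wide, %.2f high%n", shown[0], shown[1]);
    }
}
```

A library that only looks at the stored rectangle sees a field about 13 points wide and 269 points high, while the viewer shows a wide, flat field, which is why the appearance stream would have to draw the text rotated against the page rotation.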
Alternatives
In a comment you also asked
Do you know another library I could use if PDFBox can't do what I want?
I have tested the feat with iText 5.4.2:
PdfReader reader = new PdfReader(MY_FILE);
OutputStream os = new FileOutputStream(PATH_HERE);
PdfStamper stamper = new PdfStamper(reader, os);
AcroFields acroFields = stamper.getAcroFields();
acroFields.setField("aAddressLine1", "ADDRESS1_HERE");
acroFields.setField("aAddressLine2", "ADDRESS1_HERE");
stamper.close();
(The free iText version is licensed under the AGPL; you have to decide whether that's ok for your project. There is a commercial license, too, if it's not ok.)
I'm sure other PDF libraries also can do that, it's not too exotic a feature after all...
But I also tested PDF Clown 0.1.3 (trunk version), which did not work either:
File file = new File(MY_FILE);
Document document = file.getDocument();
Form form = document.getForm();
form.getFields().get("aAddressLine1").setValue("ADDRESS1_HERE");
form.getFields().get("aAddressLine2").setValue("ADDRESS1_HERE");
file.save(new java.io.File(PATH_HERE), SerializationModeEnum.Incremental);
file.close();

Loading Java applet with height set to 100%

I have a web page that displays a Java applet. The applet is resized with JavaScript when the window is resized, which works fine.
The width and height of the applet is set to 100%. When the applet is loading, an image is displayed
image = "preloader.gif"
Using IE 6/7 everything works fine. But in Firefox, the applet has a height of approximately 200 pixels. The width is correct at 100%. Therefore, the preloader image is cut in half. After the applet has loaded and the javascript resizes the page, width and height are set correctly.
If I change the HTML code and use fixed sizes for the applet, the object displays correctly during loading, but cannot be resized afterwards.
Is there any solution to this problem?
Thanks,
Daniel
P.S. I'm using the object/embed tag, but the problem is the same if I use the applet tag.

I would advise you to implement a CSS based solution and create the applet with 100% size of the containing div. I use this method, it's simple, solid and cross browser reliable. Is there any reason you can't do this?

Hard to say without looking at the page. One possibility is the applet loads before the javascript gets run. In which case you could try loading the applet using javascript (instead of coding it directly in the html).

I was getting the same kind of short-and-wide applet in Firefox and Opera.
Now I create the applet dynamically, and this allows me to calculate the size of its containing div depending on the viewport size. This leads me to believe that you would get what you want if you used a containing div with a size specified in points and not as 100%.
The code I use to create the applet
function initJavaView() {
    ...
    var viewportHeight = window.innerHeight ? window.innerHeight :
        $(window).height();
    var height = viewportHeight - appletArea.offsetTop - 8;
    html = '<div style="width:100%;height:' + height + 'px;">';
    if (!$.browser.msie /*&& !$.browser.mozilla*/){
        html = html + '<object type="application/x-java-applet;version=1.5" ';
    } else {
        html = html +
            '<object ' +
            'classid = "clsid:8AD9C840-044E-11D1-B3E9-00805F499D93" '+
            'codebase = "http://java.sun.com/update/1.5.0/jinstall-1_5-windows-i586.cab#Version=1,5,0,0" ';
    }
    html = html + ' width="100%" height="100%">' +
    ...
    appletArea.innerHTML = html;
};
The code is running at http://books.verg.es/elements_of_ux.html?format=java

I had the same problem and couldn't find an answer, but I found a solution.
You have to embed the applet code in a JavaScript function and fill the innerHTML of the body with it (or of whichever element should hold the applet) shortly after the page is loaded. So...
//optional: add styles
function styleApplet(){
    document.getElementsByTagName('body')[0].style.overflow = 'hidden';
}
//buffer that collects the applet markup
var buffer = '';
//complementary function so the applet code stays readable
function documentWrite(chars){
    buffer += chars;
}
//function that adds all the code at once; it is compulsory
function executeWrite(){
    document.getElementsByTagName('body')[0].innerHTML = buffer;
    buffer = '';
}
function writeApplet(){
    documentWrite('<applet code="..... </applet>');
    documentWrite('..... ');
    documentWrite('</applet>');
    executeWrite();
}
$(document).ready(function(){
    setTimeout(writeApplet, 100);
    setTimeout(styleApplet, 100); //optional
});
Any additions to the answer are welcome :-)
