ITextRenderer: Adjust page height to content - java

I'm using ITextRenderer to generate a PDF from HTML and what I need to do is a cash register receipt.
This receipt has dynamic width and, of course, dynamic content. This said, the height of content will always be different and right now I'm struggling to find a way of adjusting the height of the PDF page to the content.
If it's too big the receipt has a long white section in the end and if it's to short the PDF get's paginated and I need it to be in one page only.
I'm using #page {size: Wpx Hpx;} to set the page size, but it's almost impossible (would be very painful) to calculate the content height based on width and data.
This is the code that generates the PDF:
ITextRenderer renderer = new ITextRenderer();
byte[] bytes = htmlDocumentString.toString().getBytes("UTF-8");
ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
InputSource is = new InputSource(bais);
Document doc = builder.parse(is);
renderer.setDocument(doc, null);
renderer.layout();
renderer.createPDF(outputStream);
outputStream.flush();
outputStream.close();
I've also tried renderer.getSharedContext().setPrint(false);, but this throws a NPE.
Also #page {-fs-page-sequence: "none";} without any luck.

The solution I found is not even close to perfect, but works!
#page {
size: Wpx 1px;
}
* {
page-break-inside: always;
}
This will generate 1px pages for the entire content. Then I just have to tell the printer to print all the pages with 0px margin between pages.
Why this solution is not perfect? The file size goes from 1 or 2KB to 200KB.. not very good, when streaming through 3G.

Related

Convert html to pdf in landscape mode using iText

I'm trying to convert html to pdf using iText.
Here is the simple code that is working fine :
ByteArrayOutputStream pdfStream = new ByteArrayOutputStream();
HtmlConverter.convertToPdf(htmlAsStringToConvert, pdfStream)
Now, I want to convert the pdf to LANDSCAPE mode, so I've tried :
ConverterProperties converterProperties = new ConverterProperties();
MediaDeviceDescription mediaDeviceDescription = new MediaDeviceDescription(MediaType.SCREEN);
mediaDeviceDescription.setOrientation(LANDSCAPE);
converterProperties.setMediaDeviceDescription(mediaDeviceDescription);
HtmlConverter.convertToPdf(htmlAsStringToConvert, pdfStream, converterProperties);
and also :
PdfDocument pdfDoc = new PdfDocument(writer);
pdfDoc.setDefaultPageSize(PageSize.A4.rotate());
HtmlConverter.convertToPdf(htmlAsStringToConvert, pdfDoc, new ConverterProperties()).
I've also mixed both, but the result remains the same, the final PDF is still in default mode.
The best way to achieve landscape page size when converting HTML to PDF is to provide the corresponding CSS instruction for the page to become landscape.
This is done with the following CSS:
#page {
size: landscape;
}
Now, if you have your input HTML document in htmlAsStringToConvert variable then you can process it as an HTML using Jsoup library which iText embeds. Basically we are just adding the necessary CSS instruction into our <head>:
Document htmlDoc = Jsoup.parse(htmlAsStringToConvert);
htmlDoc.head().append("<style>" +
"#page { size: landscape; } "
+ "</style>");
HtmlConverter.convertToPdf(htmlDoc.outerHtml(), new FileOutputStream(outPdf));
Beware that if you already have #page declarations in your HTML then the one you append might be in conflict with the ones you already have - in this case, you need to make sure you insert your declaration as the latest one (this should be the case with the code above as long as all of your declaration are in <head> element).

PDFRenderer creates images of greater size than the PDF itself

I have a sample of scanned PDFs that I need to edit and re-export. I use PDFBox to render the PDF into a series of images (one image per page), I perform some OpenCV calculations on the rasterized jpegs and then I intend to insert them back into a new pdf file.
Example: PDF is 423kb, Page 1 is 313kb, Page 2 is 287kb, Page 3 is 319kb, Page 4 is 485kb, and Page 5 is 470kb.
Problem is that the output images are greater in size than the PDF itself. This results in my OCR efforts taking much longer than is acceptable (5 minutes vs 30 seconds per document). The only way to keep the jpegs from inflating in size is to leave them with a default DPI of 72. This produces poor quality images that cannot be used.
Why is this happening? I should be able to get back images that have a size less than or equal to the PDF in question (without sacrificing quality). I'm not doing anything weird to the images, just removing watermarks.
Here's some code illustrating how I'm extracting the jpegs from the PDF.
File file = new File(fileName);
PDDocument document = PDDocument.load(file);
PDFRenderer renderer = new PDFRenderer(document);
BufferedImage[] pageArray = new BufferedImage[document.getNumberOfPages()];
int pageCounter = 0;
for(PDPage page : document.getPages()) {
pageArray[pageCounter] = renderer.renderImageWithDPI(pageCounter, 160);
pageCounter++;
}

extract thumbnail from 3d pdf using itextpdf

When I view a 3D pdf (aka PDF/E) with Adobe Acrobat Reader, it shows a thumbnail on the left side:
Is it possible to extract this thumbnail from the pdf using itext or is it generated on the fly by the viewer?
This is possible, though from what I am seeing I doubt your PDF has a specific thumbnail image and just renders the page in the thumbnail.
First, let's create a PDF that has a thumbnail according to the PDF specification since I couldn't find one. Section 12.3.4 of ISO-3200-2 (the PDF specification) states the following:
The thumbnail image for a page shall be an image XObject specified by the Thumb entry in the page object...
This can be easily created using iText like so:
PdfWriter writer = new PdfWriter(OUTPUT_FILE);
PdfDocument pdfDocument = new PdfDocument(writer);
Document document = new Document(pdfDocument);
document.add(new Paragraph("Hello world"));
PdfImageXObject thumbnail = new PdfImageXObject(ImageDataFactory.create(getInput("itext.png")));
pdfDocument.getFirstPage().getPdfObject().put(PdfName.Thumb, thumbnail.getPdfObject());
document.close();
Where getInput("itext.png") resolves to a full path of our image:
This gives us output.pdf
You'll note that neither Acrobat nor Reader display the thumbnail image- they simply render the page. Other readers do use our new thumbnail:
Since you are using reader I would think this means the thumbnail in your PDF is simply the rendered page since thumbnails appear to be ignored.
To answer your question, getting the thumbnail is simply the reverse of the operation above- we get the Page's dictionary and look for a /Thumb entry
PdfReader reader = new PdfReader(OUTPUT_FILE);
PdfDocument pdfDocument = new PdfDocument(reader);
PdfStream thumbnailStream = pdfDocument.getFirstPage().getPdfObject().getAsStream(PdfName.Thumb);
if (thumbnailStream != null) {
PdfImageXObject thumbnail = new PdfImageXObject(thumbnailStream);
BufferedImage image = thumbnail.getBufferedImage();
//Output to file, memory, etc
}

Changing page zoom of an existing pdf with PDFBox

I have a pdf which I'm iterating through using PDFBox as below:
PDDocument doc = PDDocument.load(new ByteArrayInputStream(bytearray));
PDDocumentCatalog catalog = doc.getDocumentCatalog();
for(PDPage page : catalog.getPages()){
...
}
I want to set the default magnification for the pages so that when it is opened through a pdf reader, it opens at 75% zoom by default. Is this possible? I've seen few posts where the zoom is set using PDPageXYZDestination, but I'm not sure whether that is applicable in my case.
Thanks.
Do this, it applies to the first seen page only, i.e. when opening:
PDDocumentCatalog catalog = doc.getDocumentCatalog();
PDPage page = doc.getPage(0); // zero-based; you can also put another number to jump to a specific existing page
PDPageXYZDestination dest = new PDPageXYZDestination();
dest.setPage(page);
dest.setZoom(0.75f);
dest.setLeft((int) page.getCropBox().getLowerLeftX());
dest.setTop((int) page.getCropBox().getUpperRightY());
PDActionGoTo action = new PDActionGoTo();
action.setDestination(dest);
catalog.setActions(null);
catalog.setOpenAction(action);
doc.save(...);

Filling landscape PDF with PDFBox

I try to fill a PDF form with PDFBox and I managed to do it well with a portrait oriented document. But I have a problem when filling a document in landscape mode. The fields are filled up, but the text orientation is not good. It appear vertically like if it was still in portrait but in a rotation of 90 degrees.
Here is my simplified code:
PDDocument pdfDoc = PDDocument.load(MY_FILE);
PDDocumentCatalog docCatalog = pdfDoc.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
acroForm.getField("aAddressLine1").setValue("ADDRESS1_HERE");
acroForm.getField("aAddressLine2").setValue("ADDRESS1_HERE");
acroForm.getField("country").setValue("COUNTRY_HERE");
pdfDoc.save(PATH_HERE);
pdfDoc.close();
Did you manage to fill a PDF document in landscape mode?
Thanks for your help.
The short answer
I'm afraid PDFBox does not yet (as of version 1.8.2) allow you to fill in landscape PDFs like the one you provided because it does not seem to query and factor in informations about the page the form field is located on.
The long answer
There are different ways you can define a page to be A4 landscape:
You can define it to have the A4 landscape dimensions directly by means of a media box definition:
/MediaBox [0, 0, 842, 595]
In this case the coordinates of your aAddressLine1 would be
/Rect[23.1711 86.8914 292.121 100.132]
or you can define it to have the A4 portrait dimensions and being rotated by 90° (or 270° obviously):
/MediaBox [0, 0, 595, 842]
/Rotate 90
In this case the coordinates of your aAddressLine1 are
/Rect[86.8914 23.1711 100.132 292.121]
Your example document uses the latter method.
Now PDFBox, when creating an appearance stream for that field, only looks at the rectangle defining the field but ignores the properties of the page. Thus, PDFBox sees a very narrow and very high textfield and fills it in just like that. It is completely unaware that the result will be rotated in a PDF viewer.
What it should have done is to also look at the page the field is located on. If that page has a /Rotate entry, it should create an appearance stream for the field which displays the text rotated in the opposite direction.
Alternatives
In a comment you also asked
Do you know another library I could use if PDFBox can't do what I want?
I have tested the feat with iText 5.4.2:
PdfReader reader = new PdfReader(MY_FILE);
OutputStream os = new FileOutputStream(PATH_HERE);
PdfStamper stamper = new PdfStamper(reader, os);
AcroFields acroFields = stamper.getAcroFields();
acroFields.setField("aAddressLine1", "ADDRESS1_HERE");
acroFields.setField("aAddressLine2", "ADDRESS1_HERE");
stamper.close();
(The free iText version is licensed under the AGPL; you have to decide whether that's ok for your project. There is a commercial license, too, if it's not ok.)
I'm sure other PDF libraries also can do that, it's not too exotic a feature after all...
But I also tested PDF Clown 0.1.3 (trunk version), which did not work either:
File file = new File(MY_FILE);
Document document = file.getDocument();
Form form = document.getForm();
form.getFields().get("aAddressLine1").setValue("ADDRESS1_HERE");
form.getFields().get("aAddressLine2").setValue("ADDRESS1_HERE");
file.save(new java.io.File(PATH_HERE), SerializationModeEnum.Incremental);
file.close();

Categories

Resources