PDFBox renderImage produces incorrect image dimensions at specified scale - java

I am using the very useful PDFBox to build a simple pdf stamping GUI.
I noticed a serious issue with a particular document however.
When I specify a particular scale factor for the rendering, the expected output image size is different.
What is worse, the scaling factor of the resulting image along the horizontal axis differs from that along the vertical axis.
Here is the code I used:
/**
* @param pdfPath The path to the pdf document
* @param page The pdf page number (zero-based)
*/
public BufferedImage loadPdfImage(String pdfPath, int page) {
File file = new File(pdfPath);
try (PDDocument doc = PDDocument.load(file)) {
pageCount = doc.getNumberOfPages();
PDPage pDPage = doc.getPage(page);
float w = pDPage.getCropBox().getWidth();
float h = pDPage.getCropBox().getHeight();
System.out.println("Pdf opening: width: "+w+", height: "+h);
PDFRenderer renderer = new PDFRenderer(doc);
float dpiRatio = 1.5f;
BufferedImage img = renderer.renderImage(page, dpiRatio);
float dpiXRatio = img.getWidth() / w;
float dpiYRatio = img.getHeight()/ h;
System.out.println("dpiXRatio: "+dpiXRatio+", dpiYRatio: "+dpiYRatio);
return img;
} catch (IOException ex) {
System.out.println( "invalid pdf found. Please check");
}
return null;
}
The code above loads most pdf documents that I have tried it on and converts given pages within them to BufferedImage objects.
For the said document however, it seems to be unable to render the converted image at the supplied scale-factor.
Is there anything wrong with my code, or is this a known bug?
Thanks.
EDIT
I am using PDFBOX v2.0.15
And the page has no rotation.

The error was mine; for the most part.
I had used the MediaBox to compute the scale factors and unfortunately the MediaBox and CropBox of the pdf file in question were not the same.
For example:
cropbox-rect: [8.50394,34.0157,586.496,807.984]
mediabox-rect: [0.0,0.0,595.0,842.0]
After correcting for this, the scale factors matched much better along both axes, apart from small errors caused by the image dimensions being whole numbers of pixels.
Those errors are small enough for me to ignore, though.
When stamping, all I had to do was to make the necessary corrections for the cropbox. For example to draw the image(stamp) at P(x,y), I would do:
x += cropBox.getLowerLeftX();
y += cropBox.getLowerLeftY();
before calling the draw image functionality.
It all came out fine!
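The arithmetic behind this can be sketched as follows. This is only an illustration: it models the two boxes as plain numbers instead of PDFBox objects, the class name CropBoxMath and its methods are made up for the example, and it assumes that the renderer truncates the scaled crop box size to whole pixels.

```java
// Sketch only: models page boxes as plain numbers, not PDFBox objects.
final class CropBoxMath {
    // Box corners in PDF user units (points): lower-left and upper-right.
    final double llx, lly, urx, ury;

    CropBoxMath(double llx, double lly, double urx, double ury) {
        this.llx = llx; this.lly = lly; this.urx = urx; this.ury = ury;
    }

    double width()  { return urx - llx; }
    double height() { return ury - lly; }

    // Assumption: the scaled size is truncated to whole pixels (at least 1).
    static int scaledPixels(double points, double scale) {
        return (int) Math.max(Math.floor(points * scale), 1);
    }

    // Shifts a point measured from the crop box corner into page coordinates.
    double[] toPageCoordinates(double x, double y) {
        return new double[] { x + llx, y + lly };
    }

    public static void main(String[] args) {
        // Box values taken from the document discussed above.
        CropBoxMath crop = new CropBoxMath(8.50394, 34.0157, 586.496, 807.984);
        CropBoxMath media = new CropBoxMath(0.0, 0.0, 595.0, 842.0);

        int w = scaledPixels(crop.width(), 1.5);
        int h = scaledPixels(crop.height(), 1.5);
        System.out.printf("image: %d x %d%n", w, h);
        // Ratios against the crop box stay close to the requested 1.5 ...
        System.out.printf("vs crop box:  %.4f, %.4f%n", w / crop.width(), h / crop.height());
        // ... while ratios against the media box differ per axis.
        System.out.printf("vs media box: %.4f, %.4f%n", w / media.width(), h / media.height());
    }
}
```

Dividing the image size by the media box dimensions is what produces two different apparent scale factors; dividing by the crop box dimensions leaves only the rounding error.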

Related

Why does PDFBox read the image width/height wrong? (always assumes "width" is the bigger one)

I'm using the PDFBox library (see here) to convert an image to PDF. The goal is to have an image scaled to a full A4 page in the PDF file. And it works well, except for one thing:
The image height and width seem to be mixed up. The width is always assumed to be the bigger value of them both. I have 2 images: One has the dimensions (according to the Windows file details) 4032x2268 (landscape) and the other one 2268x4032 (portrait).
When I load the images in PDFBox, the width is always 4032 and the height 2268. The goal is to create a landscape PDF for one and a portrait PDF for the other one. This weird "bug" (?) causes the portrait image to be converted to a landscape PDF, which of course causes the image to be rotated (which is inconvenient).
Here's the relevant part of my code:
public byte[] imageToPDF(MultipartFile file) throws IOException {
PDDocument pdf = new PDDocument();
PDImageXObject pdImage = PDImageXObject.createFromByteArray(pdf, file.getBytes(), file.getOriginalFilename());
// scale image to fit the full page
PDPage page;
int imageWidth;
int imageHeight;
if (pdImage.getWidth() > pdImage.getHeight()) {
// landscape pdf
float pageHeight = PDRectangle.A4.getWidth();
float pageWidth = PDRectangle.A4.getHeight();
page = new PDPage(new PDRectangle(pageWidth, pageHeight));
imageWidth = (int)pageWidth;
imageHeight = (int)(((double)imageWidth / (double)pdImage.getWidth()) * (double)pdImage.getHeight());
} else {
// portrait pdf
float pageHeight = PDRectangle.A4.getHeight();
float pageWidth = PDRectangle.A4.getWidth();
page = new PDPage(new PDRectangle(pageWidth, pageHeight));
imageHeight = (int)pageHeight;
imageWidth = (int)(((double)imageHeight / (double)pdImage.getHeight()) * (double)pdImage.getWidth());
}
...
}
pdImage.getWidth() is always greater than pdImage.getHeight(), no matter which of the two images I use. Does anyone have an idea?

Unable to extract values from PDF for specific coordinates using java apache pdfbox

My task is to extract text from a PDF at specific coordinates.
I am using Apache PDFBox for the data extraction.
To get the x, y, height and width values from the PDF I am using the PDF-XChange tool, which reports them in millimetres. When I pass these values to the rectangle, I get an empty value.
public String getTextUsingPositionsUsingPdf(String pdfLocation, int pageNumber, double x, double y, double width,
double height) throws IOException {
String extractedText = "";
// PDDocument Creates an empty PDF document. You need to add at least
// one page for the document to be valid.
// Using load method we can load a PDF document
PDDocument document = null;
PDPage page = null;
try {
if (pdfLocation.endsWith(".pdf")) {
document = PDDocument.load(new File(pdfLocation));
int getDocumentPageCount = document.getNumberOfPages();
System.out.println(getDocumentPageCount);
// Get a specific page. The parameter is the page index, which starts
// at 0: to access the first page, the page index is 0.
if (getDocumentPageCount > 0) {
page = document.getPage(pageNumber + 1);
} else if (getDocumentPageCount == 0) {
page = document.getPage(0);
}
// To create a rectangle by passing the x axis, y axis, width and height
Rectangle2D rect = new Rectangle2D.Double(x, y, width, height);
String regionName = "region1";
// Strip the text from PDF using PDFTextStripper Area with the
// help of Rectangle and named need to given for the rectangle
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.setSortByPosition(true);
stripper.addRegion(regionName, rect);
stripper.extractRegions(page);
System.out.println("Region is " + stripper.getTextForRegion("region1"));
extractedText = stripper.getTextForRegion("region1");
} else {
System.out.println("No data return");
}
} catch (IOException e) {
System.out.println("The file not found" + "");
} finally {
document.close();
}
// Return the extracted text and this can be used for assertion
return extractedText;
}
Please tell me whether my approach is correct.
I used this PDF: tutorialspoint.com/uipath/uipath_tutorial.pdf. I am trying to find the text "a part of contests", which has x = 55.6 mm, y = 168.8 mm, width = 210.0 mm and height = 297.0 mm, but I get an empty value.
I tested your method with those inputs:
System.out.println("Extracting like Venkatachalam Neelakantan from uipath_tutorial.pdf\n");
float MM_TO_UNITS = 1/(10*2.54f)*72;
String text = getTextUsingPositionsUsingPdf("src/test/resources/mkl/testarea/pdfbox2/extract/uipath_tutorial.pdf",
0, 55.6 * MM_TO_UNITS, 168.8 * MM_TO_UNITS, 210.0 * MM_TO_UNITS, 297.0 * MM_TO_UNITS);
System.out.printf("\n---\nResult:\n%s\n", text);
(ExtractText test testUiPathTutorial)
and got the result
part of contents of this e-book in any manner without written consent
te the contents of our website and tutorials as timely and as precisely as
, the contents may contain inaccuracies or errors. Tutorials Point (I) Pvt.
guarantee regarding the accuracy, timeliness or completeness of our
tents including this tutorial. If you discover any errors on our website or
ease notify us at contact@tutorialspoint.com
i
Assuming you actually were looking for "a part of contents", not "a part of contests", merely the 'a' is missing; probably when measuring you looked for the beginning of the visible letter drawing but the actual glyph origin is a bit before that. If you choose a slightly smaller x, e.g. 54.6 mm, you'll also get the 'a'.
It obviously is no surprise that you get more than "a part of contents", considering the width and height of your rectangle.
Should you wonder about the MM_TO_UNITS constant, have a look at this answer.
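For reference, the conversion behind MM_TO_UNITS can be written out as a small stand-alone sketch. It assumes the default PDF user unit of 1/72 inch (a page can in principle declare a different user unit), and the class and method names are made up for the example.

```java
// Sketch only: converts millimetres to default PDF user units (points).
final class MillimetreUnits {
    // 1 inch = 25.4 mm = 72 default user units, so 1 mm = 72 / 25.4 units.
    static final double MM_TO_UNITS = 72.0 / 25.4;

    static double mmToUnits(double mm) {
        return mm * MM_TO_UNITS;
    }

    public static void main(String[] args) {
        // The rectangle from the question, measured in millimetres.
        System.out.printf("x      = %.2f%n", mmToUnits(55.6));
        System.out.printf("y      = %.2f%n", mmToUnits(168.8));
        System.out.printf("width  = %.2f%n", mmToUnits(210.0));
        System.out.printf("height = %.2f%n", mmToUnits(297.0));
    }
}
```

Note that 210 mm by 297 mm is the size of a whole A4 page, which is why the extracted region runs to the right and bottom edges of the page.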

How to correctly copy pages from one PDF document to another using iTextPDF 7?

The following code is a simplified version of a method that receives a destination pdf document and a source pdf document and copies the pages from the source to the destination. The page copy has to be a little smaller (95%) of the original one since the page in the destination document will receive some extra text as header and footer.
try {
for(int pageIndex = 1; pageIndex<=pdfSource.getNumberOfPages(); ++pageIndex) {
PdfPage sourcePage = pdfSource.getPage(pageIndex);
Rectangle sourceRect = sourcePage.getPageSizeWithRotation();
PdfPage page = pdfDest.addNewPage(PageSize.A4);
// Transformation matrix
PdfCanvas canvas = new PdfCanvas(page);
AffineTransform transformationMatrix = AffineTransform.getScaleInstance(
(page.getPageSize().getWidth() / sourcePage.getWidth()) * 0.95,
(page.getPageSize().getHeight() / sourcePage.getHeight()) * 0.95);
canvas.concatMatrix(transformationMatrix);
try {
PdfFormXObject pageCopy = sourcePage.copyAsFormXObject(pdfDest);
float x = (float)(page.getPageSize().getWidth()*0.05);
float y = (float)(page.getPageSize().getHeight()*0.05);
canvas.addXObject(pageCopy, x, y);
pageCopy.flush();
} catch(Exception e) {
// bla bla bla
}
// Reset transformations
transformationMatrix = AffineTransform.getScaleInstance(
sourceRect.getWidth() / page.getPageSize().getWidth(),
sourceRect.getHeight() / page.getPageSize().getHeight()
);
canvas.concatMatrix(transformationMatrix);
canvas.setFillColorRgb(0.0f, 0.0f, 0.65f)
.setFontAndSize(PdfFontFactory.createFont(FontConstants.COURIER), 11);
// Adding some extra text
// blah blah blah
}
} finally {
pdfSource.close();
}
As a general rule, this method will be called many times, since many documents will be appended to the destination PDF document.
All works fine, except when we have a big list of source documents. In this context, big is above 100.
So, when we have a list of, for instance, 30 source documents, all of them are processed correctly and the destination document contains the pages of all 30 documents.
When this list is longer than 100 documents, we get an OutOfMemoryError. We think there is a memory leak here, but we can't find it.
What are we missing?

PDFBOX unable to find number of pixels which contain a particular character

I am using PDFBox for PDF creation. Is there any function in PDFBox that gives the font size in pixels? For example, the letters A and a take different amounts of space when printed; obviously A takes more pixels than a. How can I find the number of pixels a character or a word is supposed to take?
First of all the concept of pixels is a bit vague.
Generally a document is of a certain size, e.g. inches/cm etc.
The javadocs for PDFBox show that PDFont has a few methods to determine the width of a string or character.
Take a look at for example these pages:
getStringWidth(String text)
getWidth(int code)
getWidthFromFont(int code)
These units are in 1/1000 of an Em. Also see this page.
For a full example:
float fontSize = 12;
String text = "a";
PDRectangle pageSize = PDRectangle.A4;
PDFont font = PDType1Font.HELVETICA_BOLD;
PDDocument doc = new PDDocument();
PDPage page = new PDPage(pageSize);
doc.addPage(page);
PDPageContentStream stream = new PDPageContentStream(doc,page);
stream.setFont( font, fontSize );
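// getStringWidth returns the width in 1/1000 text-space units at font size 1.
double charWidth = font.getStringWidth(text);
charWidth *= fontSize; // adjust for font size: now points multiplied by 1000.
stream.beginText();
// newLineAtOffset replaces the deprecated moveTextPositionByAmount in PDFBox 2.x.
stream.newLineAtOffset(0, 10);
float widthLeft = pageSize.getWidth();
widthLeft *= 1000.0; // due to charWidth being x1000.
while(widthLeft > charWidth){
stream.showText(text);
widthLeft -= charWidth;
}
// Every beginText must be matched by endText before the stream is closed.
stream.endText();
stream.close();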
// Save the results and ensure that the document is properly closed:
doc.save( "example.pdf");
doc.close();
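The unit conversions used above can also be sketched without PDFBox. This is only an illustration: the glyph width of 556 units is an assumed input standing in for whatever getStringWidth returns for the font in use, the 300 dpi figure is an arbitrary example resolution, and the class and method names are made up.

```java
// Sketch only: plain arithmetic for glyph widths, independent of PDFBox.
final class GlyphWidthMath {
    // Converts a width in 1/1000 text-space units to points at a font size.
    static double widthInPoints(double widthUnits, double fontSize) {
        return widthUnits / 1000.0 * fontSize;
    }

    // Points are 1/72 inch, so pixels depend on the output resolution.
    static double pointsToPixels(double points, double dpi) {
        return points * dpi / 72.0;
    }

    // Number of whole glyphs of the given width that fit on one line.
    static int glyphsPerLine(double lineWidthPt, double glyphWidthPt) {
        return (int) Math.floor(lineWidthPt / glyphWidthPt);
    }

    public static void main(String[] args) {
        double glyphUnits = 556.0;    // assumed width of "a" in 1/1000 units
        double fontSize = 12.0;
        double a4WidthPt = 595.27563; // A4 width in points (210 mm)

        double glyphPt = widthInPoints(glyphUnits, fontSize);
        System.out.println("glyph width in points: " + glyphPt);
        System.out.println("glyph width in pixels at 300 dpi: " + pointsToPixels(glyphPt, 300.0));
        System.out.println("glyphs per A4 line: " + glyphsPerLine(a4WidthPt, glyphPt));
    }
}
```

The pixel figure only has meaning once a resolution is chosen, which is why the PDF API itself reports widths in font units rather than pixels.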

Add page numbers to Merged PDF with different Pages sizes using IText API

I am trying to add page numbers in the top-right corner of the pages of merged PDF files using iText, but the content size of my PDFs differs. After merging, when I print the page sizes I get approximately the same height and width for each page, yet I cannot see the page numbers because of the content size difference. Please see the code below and the attached PDFs, which I am using for merging and adding page numbers.
public class PageNumber {
public static void main(String[] args) {
PageNumber number = new PageNumber();
try {
String DOC_ONE_PATH = "C:/Users/Admin/Downloads/codedetailsforartwork/elebill.pdf";
String DOC_TWO_PATH = "C:/Users/Admin/Downloads/codedetailsforartwork/PP-P0109916.pdf";
String DOC_THREE_PATH = "C:/Users/Admin/Downloads/codedetailsforartwork/result.pdf";
String[] files = { DOC_ONE_PATH, DOC_TWO_PATH };
Document document = new Document();
PdfCopy copy = new PdfCopy(document, new FileOutputStream(DOC_THREE_PATH));
document.open();
PdfReader reader;
int n;
for (int i = 0; i < files.length; i++) {
reader = new PdfReader(files[i]);
n = reader.getNumberOfPages();
for (int page = 0; page < n; ) {
copy.addPage(copy.getImportedPage(reader, ++page));
}
copy.freeReader(reader);
reader.close();
}
// step 5
document.close();
number.manipulatePdf(
"C:/Users/Admin/Downloads/codedetailsforartwork/result.pdf",
"C:/Users/Admin/Downloads/codedetailsforartwork/PP-P0109916_1.pdf");
} catch (IOException | DocumentException | APIException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public static void manipulatePdf(String src, String dest)
throws IOException, DocumentException, APIException {
PdfReader reader = new PdfReader(src);
int n = reader.getNumberOfPages();
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
PdfContentByte pagecontent;
for (int i = 0; i < n;) {
pagecontent = stamper.getOverContent(++i);
System.out.println(i);
com.itextpdf.text.Rectangle pageSize = reader.getPageSize(i);
pageSize.normalize();
float height = pageSize.getHeight();
float width = pageSize.getWidth();
System.out.println(width + " " + height);
ColumnText.showTextAligned(pagecontent, Element.ALIGN_CENTER,
new Phrase(String.format("page %d of %d", i, n)),
width - 200, height-85, 0);
}
stamper.close();
reader.close();
}
}
PDF files Zip
@Bruno's answer explains, or references answers that explain, all the relevant facts on the issue at hand.
In a nutshell, the two issues of the OP's code are:
he uses reader.getPageSize(i); while this indeed returns the page size, PDF viewers do not display the whole page size but merely the crop box on it. Thus, the OP should use reader.getCropBox(i) instead. According to the PDF specification, "the crop box defines the region to which the contents of the page shall be clipped (cropped) when displayed or printed. ... The default value is the page’s media box."
he uses pageSize.getWidth() and pageSize.getHeight() to determine the upper right corner but should use pageSize.getRight() and pageSize.getTop() instead. The boxes defining the PDF coordinate system may not have the origin in their lower left corner.
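The difference between the two ways of locating the corner can be illustrated with a box whose origin is not at its lower-left corner. This is a minimal sketch, not iText code: PageBox is a made-up stand-in for the rectangle class, and the box values are hypothetical.

```java
// Sketch only: a stand-in for a PDF page box, not the iText Rectangle class.
final class PageBox {
    final float left, bottom, right, top;

    PageBox(float left, float bottom, float right, float top) {
        this.left = left; this.bottom = bottom; this.right = right; this.top = top;
    }

    float width()  { return right - left; }
    float height() { return top - bottom; }

    // Anchor measured from the actual upper-right corner of the box.
    float[] fromTopRight(float dx, float dy) {
        return new float[] { right - dx, top - dy };
    }

    // Anchor computed from width and height, valid only if the box starts at (0,0).
    float[] fromWidthHeight(float dx, float dy) {
        return new float[] { width() - dx, height() - dy };
    }

    public static void main(String[] args) {
        // Hypothetical A4-sized crop box whose lower-left corner is (100, 50).
        PageBox box = new PageBox(100f, 50f, 695f, 892f);
        float[] good = box.fromTopRight(200f, 85f);
        float[] bad = box.fromWidthHeight(200f, 85f);
        System.out.println("from right/top:    " + good[0] + ", " + good[1]);
        System.out.println("from width/height: " + bad[0] + ", " + bad[1]);
    }
}
```

With an origin offset of (100, 50), the two results differ by exactly that offset, which is enough to push a page number away from where it was intended to appear.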
I don't understand why you are defining the position of the page number like this:
com.itextpdf.text.Rectangle pageSize = reader.getPageSize(i);
pageSize.normalize();
float height = pageSize.getHeight();
float width = pageSize.getWidth();
where you use
x = width - 200;
y = height - 85;
How does that make sense?
If you have an A4 page in portrait with (0,0) as the coordinate of the lower-left corner, the page number will be added at position x = 395; y = 757. However, (0,0) isn't always the coordinate of the lower-left corner, so the first A4 page with the origin at another position will already put the page number at another position. If the page size is different, the page number will move to other places.
It's as if you're totally unaware of previously answered questions such as How should I interpret the coordinates of a rectangle in PDF? and Where is the Origin (x,y) of a PDF page?
I know, I know, finding these specific answers on StackOverflow is hard, but I've spent many weeks organizing the best iText questions on StackOverflow on the official web site. See for instance: How should I interpret the coordinates of a rectangle in PDF? and Where is the origin (x,y) of a PDF page?
These Q&As are even available in a free ebook! If you take a moment to educate yourself by reading the documentation, you'll find the answer to the question How to position text relative to page? that was already answered on StackOverflow in 2013: How to position text relative to page using iText?
For instance, if you want to position your page number at the bottom and in the middle, you need to define your coordinates like this:
float x = pageSize.getLeft() + pageSize.getWidth() / 2;
float y = pageSize.getBottom() + 10;
ColumnText.showTextAligned(pagecontent, Element.ALIGN_CENTER,
new Phrase(String.format("page %d of %d", i, n)), x, y, 0);
I hope this answer will inspire you to read the documentation. I've spent weeks of work on organizing that documentation and it's frustrating when I discover that people don't read it.
