Calculating the absolute position of a Field in a PDF form

I am trying to get the absolute position of a PDF field, and my code is as follows.
float[] _advisor = reader.getAcroFields().getFieldPositions("_advisor");
float[] _test = reader.getAcroFields().getFieldPositions("_test");
float[] _owner = reader.getAcroFields().getFieldPositions("_owner");
All the fields are vertically aligned at the same left position.
The problem is that the first two fields are on the same page of the PDF and their xLeft values are the same, but the last field, _owner, is on the second page and its xLeft value is off by a large amount. Do I need to subtract an offset for fields on a different page?

Some things to consider:
The default coordinate used by iText has its origin at the lower left corner of the page.
iText will return coordinates in points, rather than in pixels.
You can display a ruler and grid overlay in Adobe Reader, which lets you easily gauge where each component is. Check whether these readings match the values iText gives you.
If you still think iText is giving you the wrong values, please provide us access to your pdf, and provide us with the values you expect to receive (and why).

One possible issue could be that your mediabox has a different positioning than 0,0. I needed that once so I "normalized" the values like this:
PdfDictionary pageDict = reader.getPageN(pageNumber);
PdfArray mediaBox = (PdfArray)PdfReader.getPdfObject(pageDict.get(PdfName.MEDIABOX));
//check whether the mediabox has a different positioning than 0,0
if(((PdfNumber)mediaBox.getPdfObject(0)).floatValue()!=0){
//normalize X coordinates
lowerLeftX = lowerLeftX-((PdfNumber)mediaBox.getPdfObject(0)).floatValue();
upperRightX = upperRightX-((PdfNumber)mediaBox.getPdfObject(0)).floatValue();
}
if(((PdfNumber)mediaBox.getPdfObject(1)).floatValue()!=0){
//normalize Y coordinates
lowerLeftY = lowerLeftY-((PdfNumber)mediaBox.getPdfObject(1)).floatValue();
upperRightY = upperRightY-((PdfNumber)mediaBox.getPdfObject(1)).floatValue();
}

Related

Different height and width property for PDF File in PDFBox

For a certain PDF file, if I use page.getMediaBox().getWidth() and page.getMediaBox().getHeight() to get the width and height of a page using PDFBox, it shows values which are different from the values I am getting using the PDFBoxDebugger. What might be the reason? I am attaching the screenshot for the PDFDebugger. I am using PDFBox version 2.0.9. The values I am getting from page.getMediaBox().getWidth() and page.getMediaBox().getHeight() are 531.36597 and 647.99603 respectively, which do not match the PDFBoxDebugger values. (This only occurs for the first page of the PDF; for further pages it works fine.)

As Tilman already stated in a comment, the values to expect are
a width of 1282.2 - 750.834 = 531.366 and
a height of 849.593 - 201.597 = 647.996 (corrected value).
The observed values
531.36597 and 647.99603
correspond to the expected values well enough considering the accuracy of the float type.
I assume the OP misunderstands the values of the MediaBox array. They do not contain the width or height as explicit values but the coordinates of two opposite corners of the box.
The MediaBox value is specified to have the type rectangle, cf. ISO 32000-1 table 30 Entries in a page object. And a rectangle is specified as
a specific array object used to describe locations on a page and bounding boxes for a variety of objects and written as an array of four numbers giving the coordinates of a pair of diagonally opposite corners,
cf. ISO 32000-1 section 4.40 rectangle.
As also already mentioned by Tilman you probably should be looking at the CropBox instead.
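To make the arithmetic concrete, here is a minimal plain-Java sketch (no PDFBox involved) that derives width and height from a rectangle array. The class and method names are made up for illustration, and the order of the four numbers in the sample array is an assumption inferred from the subtractions above.

```java
// Sketch only: a PDF rectangle holds two diagonally opposite corners,
// not a width and a height.
final class RectangleSize {

    /** Width of a rectangle given as [x1, y1, x2, y2]. */
    static float width(float[] rect) {
        // abs() because the spec does not guarantee which corner comes first
        return Math.abs(rect[2] - rect[0]);
    }

    /** Height of a rectangle given as [x1, y1, x2, y2]. */
    static float height(float[] rect) {
        return Math.abs(rect[3] - rect[1]);
    }

    public static void main(String[] args) {
        // Assumed array order; the values are the ones quoted in this answer.
        float[] mediaBox = {750.834f, 201.597f, 1282.2f, 849.593f};
        System.out.println(width(mediaBox) + " x " + height(mediaBox));
    }
}
```

The results agree with the expected 531.366 and 647.996 up to float precision.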

Image overlay on pdf using itext java

I am trying to overlay an image on PDF pages. When I do that using Adobe Acrobat and set the vertical distance from top and left to 0, the image is overlaid correctly at the required location.
I am trying to achieve the same using the iText API but can't seem to position the image at the correct location on the PDF.
The position values were found by trial and error. The size of the PDF is 612x792 and the size of the image is 1699.0x817.0, so I scaled the image to fit the PDF size.
The left edges of the image and the PDF align correctly, but the tops do not. I tried many values and somehow 792/2+100 matches, but that will change if I get a different PDF or image.
Somehow Adobe Reader is able to do that. Is there a way to align left and top in iText or any other library?
The pdf is existing pdf generated from some other source.
Updated source code
public void manipulatePdfNoTransparency(String inputFileName,
String outputfileName, String overlayFilePath,
int altPage) throws IOException, DocumentException {
System.out.println("outputfileName :"+outputfileName);
PdfReader reader = new PdfReader(inputFileName);
int n = reader.getNumberOfPages();
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(outputfileName));
stamper.setRotateContents(false);
// image watermark
Image img = Image.getInstance(overlayFilePath);
float yOffset=calculateYOffset(reader.getPageSize(1).getWidth(), reader.getPageSize(1)
.getHeight(),img.getWidth(),img.getHeight());
img.scaleToFit(reader.getPageSize(1).getWidth(), reader.getPageSize(1)
.getHeight());
Rectangle pagesize;
float x, y;
// loop over every page
//int i=1;
pagesize = reader.getPageSize(1);
x = (pagesize.getLeft() + pagesize.getRight()) / 2;
y = (pagesize.getTop() + pagesize.getBottom()) / 2;
img.setAbsolutePosition(0,yOffset);
for (int i = 1; i <= n; i = i + altPage) {
stamper.getUnderContent(i).addImage(img);
}
stamper.close();
reader.close();
System.out.println("File created at "+outputfileName);
}
public static float calculateYOffset(float pdfWidth,float pdfHeight, float originalImageWidth,float originalImageHeight) {
// size of image 1699.0x817.0
// size of pdf 612X792
//This means that the scaled image has a height of 817 * (612/1699) = ca. 294.3 PDF coordinate system units.
System.out.println("pdfWidth : "+pdfWidth+ " pdfHeight : "+pdfHeight+" originalImageWidth : "+originalImageWidth+" originalImageHeight : "+originalImageHeight);
float scaledImageHeight = originalImageHeight*pdfWidth / originalImageWidth;
//The image shall be positioned on the page by putting its top left corner onto the top left corner of the page.
//Thus, the x coordinate of its lower left corner is 0, and the y coordinate of its lower left corner is
//the y coordinate of the upper left corner of the page minus the height of the scaled image,
//i.e. ca. 792 - 294.3 = 497.7.
float yOffset = pdfHeight-scaledImageHeight;
System.out.println("yoffset : "+ yOffset);
return yOffset;
}

First let's take a look at this line:
img.scaleToFit(
reader.getPageSize(1).getWidth(),
reader.getPageSize(1).getHeight());
The scaleToFit() method resizes an image while keeping the aspect ratio intact. You seem to overlook that, so let me give you an example of what it means to keep the aspect ratio intact.
Suppose that you have an image img400x600 that measures 400 x 600 user units, and you scale that image to fit a rectangle of 200 x 10,000 user units:
img400x600.scaleToFit(200, 10000);
What will be the size of img400x600? You seem to think that the size will be 200 x 10,000, but that assumption is incorrect. The new size of the image will be 200 x 300, because the aspect ratio is width = 0.66666 * height.
You complain that the size of the image doesn't equal the size of the page when you use scaleToFit(), but that is normal if the aspect ratio of the image is different from the aspect ratio of the page.
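The arithmetic behind an aspect-preserving fit can be sketched in plain Java as follows. This only mirrors the math described here; it is not iText's implementation, and the class and method names are hypothetical.

```java
// Sketch only: the math of fitting an image into a box without distortion.
final class ScaleToFitSketch {

    /** Returns {width, height} after fitting into maxWidth x maxHeight, keeping the aspect ratio. */
    static float[] fit(float width, float height, float maxWidth, float maxHeight) {
        // The smaller of the two factors is the one that makes both dimensions fit.
        float factor = Math.min(maxWidth / width, maxHeight / height);
        return new float[] {width * factor, height * factor};
    }

    public static void main(String[] args) {
        float[] size = fit(400f, 600f, 200f, 10000f);
        System.out.println(size[0] + " x " + size[1]); // prints "200.0 x 300.0"
    }
}
```

Only one of the two dimensions ends up equal to the box dimension, unless the image and the box happen to have the same aspect ratio.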
If you really want the image to have the same size of the page, you need to use the scaleAbsolute() method:
img.scaleAbsolute(
reader.getPageSize(1).getWidth(),
reader.getPageSize(1).getHeight());
However, this might result in really distorted images, because scaleAbsolute() doesn't respect the aspect ratio. For instance: I have seen developers who used scaleAbsolute() on portraits of people, and the result was that the picture of these people became ugly; either their head became extremely fat, or it became extremely thin, depending on the difference in aspect ratio.
Now let's take a look at this line:
img.setAbsolutePosition(0,y+100);
That is a very strange line. You are making the assumption that the x coordinate of the lower left corner is 0; I understand that y + 100 was obtained through trial and error.
Let's see what the official documentation has to say about defining the offset of objects added to an existing PDF document. There are several FAQ items on this subject:
Where is the origin (x,y) of a PDF page?
How to position text relative to page?
How should I interpret the coordinates of a rectangle in PDF?
...
You currently only look at the value of the /MediaBox (you obtain this value using the getPageSize() method), and you ignore the /CropBox (in case it is present).
If I were you, I'd do this:
Rectangle pageSize = reader.getCropBox(pageNumber);
if (pageSize == null)
pageSize = reader.getPageSize(pageNumber);
Once you have the pageSize, you need to take the offset into account when adding content. The origin might not coincide with the (0, 0) coordinate:
img.setAbsolutePosition(pageSize.getLeft(), pageSize.getBottom());
As you can see: there is no need for trial and error. All the values that you need can be calculated.
Update:
In the comments, @mkl clarifies that @Gagan wants the image to fit the height exactly. That is easy to achieve.
If the aspect ratio needs to be preserved, it's sufficient to scaleToFit the height like this:
img.scaleToFit(100000f, pageSize.getHeight());
In this case, the image won't be deformed, but part of the image will not be visible.
If the aspect ratio doesn't need to be preserved, the image can be scaled like this:
img.scaleAbsolute(pageSize.getWidth(), pageSize.getHeight());
If this still doesn't answer the question, I suggest that the OP clarifies what it is that is unclear about the math.

In a comment to the question I mentioned
somehow 792/2+100 matches this - actually that is off by about 1.7. You only need very simple math to calculate this.
and the OP responded
when you say it is off by 1.7 and simple math is required to calculate this. could you please let me know what math and how you arrived at 1.7.
This answer explains that math.
Assumed requirements
From the question and later comments by the OP I deduced these requirements:
An image shall be overlaid over a PDF page.
The image for this shall be scaled, keeping its aspect ratio. After scaling it shall completely fit onto the page and at least one dimension shall equal the corresponding page dimension.
It shall be positioned on the page by putting its top left corner onto the top left corner of the page, no rotation shall be applied.
The crop box of the PDF page coincides with the media box.
Calculation at hand
In the case at hand, the size of the pdf is 612X792 and the size of the image is 1699.0x817.0. Furthermore, the OP's comments imply that the bottom left corner of the page actually is the origin of the coordinate system.
To scale the horizontal extent of the image to exactly fit the page, one has to scale by 612/1699. To scale the vertical extent to exactly fit the page, one has to scale by 792/817. To make the whole image fit the page with the aspect ratio kept, one has to use the smaller scaling factor, 612/1699. This is what the OP's
img.scaleToFit(reader.getPageSize(1).getWidth(),reader.getPageSize(1).getHeight());
does, assuming the crop box coincides with the media box.
This means that the scaled image has a height of 817 * (612/1699) = ca. 294.3 PDF coordinate system units.
When positioning an image on a PDF page, you usually do that by giving the coordinates where the bottom left corner of the image shall go.
The image shall be positioned on the page by putting its top left corner onto the top left corner of the page. Thus, the x coordinate of its lower left corner is 0, and the y coordinate of its lower left corner is the y coordinate of the upper left corner of the page minus the height of the scaled image, i.e. ca. 792 - 294.3 = 497.7.
Thus, the scaled image shall be positioned at (0, 497.7).
The numbers the OP found by trial and error are 0 for x and middle height plus 100 for y. The middle height is (792 + 0)/2 = 396. Thus, he uses the coordinates (0, 496) which (see above) vertically are off by ca. 1.7.
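For reference, the whole calculation can be written as a small plain-Java sketch that works for any page rectangle and image size, under the requirements assumed above. The names are made up for illustration; the page box is passed in as left, bottom, right, top so that a box whose lower left corner is not the origin is handled as well.

```java
// Sketch only: scale an image to fit the page (aspect ratio kept) and
// put its top left corner onto the top left corner of the page.
final class TopLeftPlacement {

    /** Returns {x, y, scaledWidth, scaledHeight}; (x, y) is the image's lower left corner. */
    static float[] place(float left, float bottom, float right, float top,
                         float imageWidth, float imageHeight) {
        float pageWidth = right - left;
        float pageHeight = top - bottom;
        // Use the smaller factor so the whole image fits onto the page.
        float factor = Math.min(pageWidth / imageWidth, pageHeight / imageHeight);
        float scaledWidth = imageWidth * factor;
        float scaledHeight = imageHeight * factor;
        // Lower left corner = top edge of the page minus the scaled image height.
        return new float[] {left, top - scaledHeight, scaledWidth, scaledHeight};
    }

    public static void main(String[] args) {
        // Numbers from the question: page 612 x 792, image 1699 x 817.
        float[] p = place(0f, 0f, 612f, 792f, 1699f, 817f);
        System.out.println("x=" + p[0] + " y=" + p[1]); // y comes out at roughly 497.7
    }
}
```

With iText, the first two values would be the arguments for setAbsolutePosition.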

Converting PDFBox coordinates to pixel coordinates of PDPage::convertToImage

I'm using PDFBox's PDPage::convertToImage to display PDF pages in Java. I'm trying to create click-able areas on the PDF page's image based on COSObjects in the page (namely, AcroForm fields). The problem is the PDF seems to use a completely different coordinate system:
System.out.println(field.getDictionary().getItem(COSName.RECT));
yields
COSArray{[COSFloat{149.04}, COSFloat{678.24}, COSInt{252}, COSFloat{697.68}]}
If I were to estimate the actual dimensions of the field's rectangle on the image, it would be 40,40,50,10 (x,y,width,height). There's no obvious correlation between the two and I can't seem to find any information about this with Google.
How can I determine the pixel position of a PDPage's COSObjects?

The pdf coordinate system is not that different from the coordinate system used in images. The only differences are:
the y-axis points up, not down
the scale is most likely different.
You can convert from pdf coordinates to image coordinates using these formulae:
x_image = x_pdf * width_image / width_pdf
y_image = (height_pdf - y_pdf) * height_image / height_pdf
To get the page size, simply use the mediabox size of the page that contains the annotation:
PDRectangle pageBounds = page.getMediaBox();
You may have missed the correlation between the array from the pdf and your image coordinate estimates, since a rectangle in pdf is represented as array [x_left, y_bottom, x_right, y_top].
Fortunately PDFBox provides classes that operate on a higher level than the cos structure. Use this to your advantage and use e.g. PDRectangle you get from the PDAnnotation using getRectangle() instead of accessing the COSArray you extract from the field's dictionary.
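Applied to a whole rectangle, the conversion might look like the following plain-Java sketch. The class and method names are hypothetical, the 612 x 792 page size and the image size in main are assumptions (the question does not state them), and the media box is assumed to have its lower left corner at (0, 0).

```java
// Sketch only: map a PDF rectangle [xLeft, yBottom, xRight, yTop] to an
// image rectangle {x, y, width, height} whose origin is the top left corner.
final class PdfToImageCoords {

    static float[] toImageRect(float[] pdfRect, float pageWidth, float pageHeight,
                               float imageWidth, float imageHeight) {
        float scaleX = imageWidth / pageWidth;
        float scaleY = imageHeight / pageHeight;
        float x = pdfRect[0] * scaleX;
        // The top edge in PDF space becomes the y coordinate in image space.
        float y = (pageHeight - pdfRect[3]) * scaleY;
        float width = (pdfRect[2] - pdfRect[0]) * scaleX;
        float height = (pdfRect[3] - pdfRect[1]) * scaleY;
        return new float[] {x, y, width, height};
    }

    public static void main(String[] args) {
        // Rectangle from the question; page and image sizes are assumed.
        float[] fieldRect = {149.04f, 678.24f, 252f, 697.68f};
        float[] r = toImageRect(fieldRect, 612f, 792f, 612f, 792f);
        System.out.println(r[0] + ", " + r[1] + ", " + r[2] + ", " + r[3]);
    }
}
```

If the media box does not start at (0, 0), its lower left corner would have to be subtracted from the PDF coordinates first.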

In PDFBox, how to change the origin (0,0) point of a PDRectangle object?

The Situation:
In PDFBox, PDRectangle objects' default origin (0,0) seems to be the lower-left corner of a page.
For example, the following code gives you a square at the lower-left corner of a page, and each side is 100 units long.
PDRectangle rectangle = new PDRectangle(0, 0, 100, 100);
The Question:
Is it possible to change the origin to the UPPER-LEFT corner, so that, for example, the code above will give you the same square at the UPPER-LEFT corner of the page instead?
The reason I ask:
I was using PDFTextStripper to get the coordinates of the text (by using the getX() and getY() methods of the extracted TextPosition objects). The coordinates retrieved from TextPosition objects seem to have an origin (0,0) at the UPPER-LEFT CORNER. I want the coordinates of my PDRectangle objects to have the same origin as the coordinates of my TextPosition objects.
I have tried to adjust the Y-coordinates of my PDRectangle by "page height minus Y-coordinate". This gives me the desired result, but it's not elegant. I want an elegant solution.
Note:
Someone has asked a similar question. The answer is what I tried, which is not the most elegant.
how to change the coordiantes of a text in a pdf page from lower left to upper left

You can change coordinate systems somewhat but most likely things won't get more elegant in the end.
To start with...
First of all let's clear up some misconception:
You assume
In PDFBox, PDRectangle objects' default origin (0,0) seems to be the lower-left corner of a page.
This is not true for all cases, merely often.
The area containing the displayed page area (on paper or on screen) usually is defined by the CropBox entry of the page in question:
CropBox rectangle (Optional; inheritable) A rectangle, expressed in default user space units, that shall define the visible region of default user space.
When the page is displayed or printed, its contents shall be clipped (cropped) to this rectangle and then shall be imposed on the output medium in some implementation-defined manner.
... The positive x axis extends horizontally to the right and the positive y axis vertically upward, as in standard mathematical practice (subject to alteration by the Rotate entry in the page dictionary).
... In PostScript, the origin of default user space always corresponds to the lower-left corner of the output medium. While this convention is common in PDF documents as well, it is not required; the page dictionary’s CropBox entry can specify any rectangle of default user space to be made visible on the medium.
Thus, the origin (0,0) can literally be anywhere, it may be at the lower left, at the upper left, in the middle of the page or even far outside the displayed page area.
And by means of the Rotate entry, that area can even be rotated (by 90°, 180°, or 270°).
Putting the origin (as you seem to have observed) in the lower left merely is done by convention.
Furthermore you seem to think that the coordinate system is constant. This also is not the case, there are operations by which you can transform the user space coordinate system drastically, you can translate, rotate, mirror, skew, and/or scale it!
Thus, even if at the beginning the coordinate system is the usual one, origin in lower left, x-axis going right, y-axis going up, it may be changed to something weird some way into the page content description. Drawing your rectangle new PDRectangle(0, 0, 100, 100) there might produce some rhomboid form just right of the page center.
What you can do...
As you see, coordinates in PDF user space are a very dynamic matter. What you can do to tame the situation depends on the context you use your rectangle in.
Unfortunately you were quite vague in the description of what you do. Thus, this will be somewhat vague, too.
Coordinates in the page content
If you want to draw some rectangle on an existing page, you first of all need a page content stream to write to, i.e. a PDPageContentStream instance, and it should be prepared in a manner guaranteeing that the original user space coordinate system has not been disturbed. You get such an instance by using the constructor with three boolean arguments, setting all of them to true:
PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true, true);
Then you can apply a transformation to the coordinate system. You want the top left to be the origin and the y-value increasing downwards. If the crop box of the page tells you the top left has coordinates (xtl, ytl), therefore, you apply
contentStream.concatenate2CTM(new AffineTransform(1, 0, 0, -1, xtl, ytl));
and from here on you have a coordinate system you wanted, origin top left and y coordinates mirrored.
Be aware of one thing, though: If you are going to draw text, too, not only the text insertion point y coordinate is mirrored but also the text itself unless you counteract that by adding an also mirroring text matrix! If you want to add much text, therefore, this may not be as elegant as you want.
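The effect of the matrix (1, 0, 0, -1, xtl, ytl) can be checked without PDFBox, using only java.awt.geom.AffineTransform. This is a small sketch; the class name, the method name and the example top left corner (0, 792) are assumptions for illustration.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

// Sketch only: shows where a point given in "origin top left, y downwards"
// coordinates ends up in default user space.
final class TopLeftTransformDemo {

    static Point2D toUserSpace(double xtl, double ytl, double x, double y) {
        // Same six values as in the concatenate2CTM call above.
        AffineTransform flip = new AffineTransform(1, 0, 0, -1, xtl, ytl);
        return flip.transform(new Point2D.Double(x, y), null);
    }

    public static void main(String[] args) {
        // Assumed crop box top left corner: (0, 792).
        Point2D p = toUserSpace(0, 792, 100, 50);
        System.out.println(p.getX() + ", " + p.getY()); // prints "100.0, 742.0"
    }
}
```

A point 50 units below the top edge in the new system is thus 742 units above the bottom edge in the original one.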
Coordinates for annotations
If you don't want to use the rectangle in the content stream but instead for adding annotations, you are not subject to the transformations mentioned above, but you cannot make use of them, either.
Thus, in this context you have to take the crop box as it is and transform your rectangle accordingly.
Why PDFBox text extraction coordinates are as they are
Essentially for putting lines of text together in the right order and sorting the lines correctly, you don't want such a weird situation but instead a simple stable coordinate system. Some PDFBox developers chose the top-left-origin, y-increasing-downwards variant for that, and so the TextPosition coordinates have been normalized to that scheme.
In my opinion, a better choice would have been to use the default user space coordinates for easier re-use of the coordinates. You might, therefore, want to try working with textPosition.getTextMatrix().getTranslateX(), textPosition.getTextMatrix().getTranslateY() for a TextPosition textPosition.

The following seems to be the best way to "adjust" the TextPosition coordinates:
x_adjusted = x_original + page.findCropBox().getLowerLeftX();
y_adjusted = -y_original + page.findCropBox().getUpperRightY();
where page is the PDPage on which the TextPosition object is located

The accepted answer created some problems for me. Also, text being mirrored and adjusting for that just didn't seem like the right solution for me. So here's what I came up with and so far, this has worked pretty smoothly.
Solution (example available below):
Call the getAdjustedPoints(...) method with your original points (given in inches), as if you were drawing on paper where x=0 and y=0 is the top left corner.
This method returns a float array (length 4) that can be used to draw a rectangle.
The array order is x, y, width and height. Just pass these values to the addRect(...) method.
private float[] getAdjustedPoints(PDPage page, float x, float y, float width, float height) {
float resizedWidth = getSizeFromInches(width);
float resizedHeight = getSizeFromInches(height);
return new float[] {
getAdjustedX(page, getSizeFromInches(x)),
getAdjustedY(page, getSizeFromInches(y)) - resizedHeight,
resizedWidth, resizedHeight
};
}
private float getSizeFromInches(float inches) {
// 72 is POINTS_PER_INCH - it's defined in the PDRectangle class
return inches * 72f;
}
private float getAdjustedX(PDPage page, float x) {
return x + page.getCropBox().getLowerLeftX();
}
private float getAdjustedY(PDPage page, float y) {
return -y + page.getCropBox().getUpperRightY();
}
Example:
private PDPage drawPage1(PDDocument document) {
PDPage page = new PDPage(PDRectangle.LETTER);
try {
// Gray Color Box
PDPageContentStream contentStream = new PDPageContentStream(document, page, PDPageContentStream.AppendMode.APPEND, false, false);
contentStream.setNonStrokingColor(Color.decode(MyColors.Gallery));
float [] p1 = getAdjustedPoints(page, 0f, 0f, 8.5f, 1f);
contentStream.addRect(p1[0], p1[1], p1[2], p1[3]);
contentStream.fill();
// Disco Color Box
contentStream.setNonStrokingColor(Color.decode(MyColors.Disco));
p1 = getAdjustedPoints(page, 4.5f, 1f, 4, 0.25f);
contentStream.addRect(p1[0], p1[1], p1[2], p1[3]);
contentStream.fill();
contentStream.close();
} catch (Exception e) { }
return page;
}
As you can see, I've drawn two rectangular boxes.
To draw them, I used the following coordinates, which assume that x=0 and y=0 is the top left.
Gray Color Box: x=0, y=0, w=8.5, h=1
Disco Color Box: x=4.5 y=1, w=4, h=0.25
Here's an image of my result.

Add the height of the PDF (Easiest Solution)

global positioning an image with itext

Does anybody know if there are any special coordinates in iText for globally positioning an image at the bottom right of the document?
I'm not sure it exists...

First we need to know if you're talking about a Document that is created from scratch or about adding an Image to an existing document.
If you create a document from scratch, then the coordinate of the bottom-right corner depends on the page size you've used when creating the Document object. For instance: for an A4 page, the bottom right corner has x = 595 ; y = 0 (the measurements are in user units, which correspond to points by default). So if you want to position an image in the bottom right corner, you need to use img.setAbsolutePosition(595 - img.getScaledWidth(), 0); and then just use document.add(img); to add the image.
DISCLAIMER: if you use a page size that differs from the default, or if you define a CropBox, you'll need to adapt the coordinates accordingly.
If you want to add an image to an existing document, you need to inspect the page size, and you need to check if there's a CropBox. You need to calculate the offset of the image based on these values. Again you can use setAbsolutePosition(), but you need to add the image to a PdfContentByte object obtained using the getOverContent() or getUnderContent() method (assuming that you're using PdfStamper).
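The offset calculation for the bottom right corner is small enough to sketch in plain Java. The names are hypothetical; pageRight and pageBottom stand for the right and bottom edges of whichever box applies (the crop box if there is one, otherwise the page size), and the crop box values in main are made up.

```java
// Sketch only: where the image's lower left corner has to go so that its
// bottom right corner sits on the bottom right corner of the page.
final class BottomRightPlacement {

    /** Returns {x, y} for use as an absolute position. */
    static float[] position(float pageRight, float pageBottom, float scaledImageWidth) {
        return new float[] {pageRight - scaledImageWidth, pageBottom};
    }

    public static void main(String[] args) {
        // A4 page created from scratch, image scaled to 100 units wide.
        float[] a4 = position(595f, 0f, 100f);
        System.out.println(a4[0] + ", " + a4[1]); // prints "495.0, 0.0"

        // Existing page with an assumed crop box [20 30 575 812].
        float[] cropped = position(575f, 30f, 100f);
        System.out.println(cropped[0] + ", " + cropped[1]); // prints "475.0, 30.0"
    }
}
```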
