PDFBOX unable to find number of pixels which contain a particular character

PDFBOX unable to find number of pixels which contain a particular character - java

I am using PDFBOX for pdf creation. In pdfbox is there any function which will give the font size in pixels? For example letters A and a, will take different spaces for printing. Where obviosely A will take more pixels than a. How can I get find number of pixels supposed to take a character or a word?

First of all the concept of pixels is a bit vague.
Generally a document is of a certain size, e.g. inches/cm etc.
The javadocs for PDFBox shows that PDFont has a few methods to determine the width of a string or character.
Take a look at for example these pages:
getStringWidth(String text)
getWidth(int code)
getWidthFromFont(int code)
These units are in 1/1000 of an Em. Also see this page.
For a full example:
float fontSize = 12;
String text = "a";
PDRectangle pageSize = PDRectangle.A4;
PDFont font = PDType1Font.HELVETICA_BOLD;
PDDocument doc = new PDDocument();
PDPage page = new PDPage(pageSize);
doc.addPage(page);
PDPageContentStream stream = new PDPageContentStream(doc,page);
stream.setFont( font, fontSize );
// charWidth is in points multiplied by 1000.
double charWidth = font.getStringWidth(text);
charWidth *= fontSize; // adjust for font-size.
stream.beginText();
stream.moveTextPositionByAmount(0,10);
float widthLeft = pageSize.getWidth();
widthLeft *= 1000.0; //due to charWidth being x1000.
while(widthLeft > charWidth){
stream.showText(text);
widthLeft -= charWidth;
}
stream.close();
// Save the results and ensure that the document is properly closed:
doc.save( "example.pdf");
doc.close();

Related

PDFBox renderImage produces incorrect image dimensions at specified scale

I am using the very useful PDFBox to build a simple pdf stamping GUI.
I noticed a serious issue with a particular document however.
When I specify a particular scale factor for the rendering, the expected output image size is different.
What is worse? the scaling factor used for the resultant image along the horizontal axis is different from that along the vertical axis.
Here is the code I used:
/**
* #param pdfPath The path to the pdf document
* #param page The pdf page number(is zero based)
*/
public BufferedImage loadPdfImage(String pdfPath, int page) {
File file = new File(pdfPath);
try (PDDocument doc = PDDocument.load(file)) {
pageCount = doc.getNumberOfPages();
PDPage pDPage = doc.getPage(page);
float w = pDPage.getCropBox().getWidth();
float h = pDPage.getCropBox().getHeight();
System.out.println("Pdf opening: width: "+w+", height: "+h);
PDFRenderer renderer = new PDFRenderer(doc);
float dpiRatio = 1.5f;
BufferedImage img = renderer.renderImage(page, dpiRatio);
float dpiXRatio = img.getWidth() / w;
float dpiYRatio = img.getHeight()/ h;
System.out.println("dpiXRatio: "+dpiXRatio+", dpiYRatio: "+dpiYRatio);
return img;
} catch (IOException ex) {
System.out.println( "invalid pdf found. Please check");
}
return null;
}
The code above loads most pdf documents that I have tried it on and converts given pages within them to BufferedImage objects.
For the said document however, it seems to be unable to render the converted image at the supplied scale-factor.
Is there anything wrong with my code? or is it a known bug?
Thanks.
EDIT
I am using PDFBOX v2.0.15
And the page has no rotation.

The error was mine; for the most part.
I had used the MediaBox to compute the scale factors and unfortunately the MediaBox and CropBox of the pdf file in question were not the same.
For example:
cropbox-rect: [8.50394,34.0157,586.496,807.984]
mediabox-rect: [0.0,0.0,595.0,842.0]
After making corrections for these, the scale-factors matched better along both axes, save for the errors due to the fact that the image sizes are integer numbers.
This is negligible enough for me to neglect, though.
When stamping, all I had to do was to make the necessary corrections for the cropbox. For example to draw the image(stamp) at P(x,y), I would do:
x += cropBox.getLowerLeftX();
y += cropBox.getLowerLeftY();
before calling the draw image functionality.
It all came out fine!

Unable to extract values from PDF for specific coordinates using java apache pdfbox

My task is to extract text from PDF for a specific coordinates.
I have used Apache Pdfbox client for data extraction .
To get the x, y , height and width coordinates from the PDF i am using PDF X change tool which is in Millimeter. When i pass the value in the rectangle the values are not getting empty value.
public String getTextUsingPositionsUsingPdf(String pdfLocation, int pageNumber, double x, double y, double width,
double height) throws IOException {
String extractedText = "";
// PDDocument Creates an empty PDF document. You need to add at least
// one page for the document to be valid.
// Using load method we can load a PDF document
PDDocument document = null;
PDPage page = null;
try {
if (pdfLocation.endsWith(".pdf")) {
document = PDDocument.load(new File(pdfLocation));
int getDocumentPageCount = document.getNumberOfPages();
System.out.println(getDocumentPageCount);
// Get specific page. THe parameter is pageindex which starts with // 0. If we need to
// access the first page then // the pageIdex is 0 PDPage
if (getDocumentPageCount > 0) {
page = document.getPage(pageNumber + 1);
} else if (getDocumentPageCount == 0) {
page = document.getPage(0);
}
// To create a rectangle by passing the x axis, y axis, width and height
Rectangle2D rect = new Rectangle2D.Double(x, y, width, height);
String regionName = "region1";
// Strip the text from PDF using PDFTextStripper Area with the
// help of Rectangle and named need to given for the rectangle
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.setSortByPosition(true);
stripper.addRegion(regionName, rect);
stripper.extractRegions(page);
System.out.println("Region is " + stripper.getTextForRegion("region1"));
extractedText = stripper.getTextForRegion("region1");
} else {
System.out.println("No data return");
}
} catch (IOException e) {
System.out.println("The file not found" + "");
} finally {
document.close();
}
// Return the extracted text and this can be used for assertion
return extractedText;
}
Please suggest whether my way is correct or not..

I have used this PDF tutorialspoint.com/uipath/uipath_tutorial.pdf.. Where i am trying to find the text "a part of contests" which is have x = 55.6 mm y = 168.8 width = 210.0 mm and height = 297.0. But i am getting empty value
I tested your method with those inputs:
System.out.println("Extracting like Venkatachalam Neelakantan from uipath_tutorial.pdf\n");
float MM_TO_UNITS = 1/(10*2.54f)*72;
String text = getTextUsingPositionsUsingPdf("src/test/resources/mkl/testarea/pdfbox2/extract/uipath_tutorial.pdf",
0, 55.6 * MM_TO_UNITS, 168.8 * MM_TO_UNITS, 210.0 * MM_TO_UNITS, 297.0 * MM_TO_UNITS);
System.out.printf("\n---\nResult:\n%s\n", text);
(ExtractText test testUiPathTutorial)
and got the result
part of contents of this e-book in any manner without written consent
te the contents of our website and tutorials as timely and as precisely as
, the contents may contain inaccuracies or errors. Tutorials Point (I) Pvt.
guarantee regarding the accuracy, timeliness or completeness of our
tents including this tutorial. If you discover any errors on our website or
ease notify us at contact#tutorialspoint.com
i
Assuming you actually were looking for "a part of contents", not "a part of contests", merely the 'a' is missing; probably when measuring you looked for the beginning of the visible letter drawing but the actual glyph origin is a bit before that. If you choose a slightly smaller x, e.g. 54.6 mm, you'll also get the 'a'.
It obviously is no surprise that you get more than "a part of contents", considering the width and height of your rectangle.
Should you wonder about the MM_TO_UNITS constant, have a look at this answer.

create a one page PDF from two PDFs using PDFBOX

I have a small (quarter inch) one page PDF I created with PDFBOX with text (A). I want to put that small one page PDF (A) on the top of an existing PDF page (B), preserving the existing content of the PDF page (B). In the end, I will have a one page PDF, representing the small PDF on top(A), and the existing PDF intact making up the rest (B). How can I accomplish this with PDFBOX?

To join two pages one atop the other onto one target page, you can make use of the PDFBox LayerUtility for importing pages as form XObjects in a fashion similar to PDFBox SuperimposePage example, e.g. with this helper method:
void join(PDDocument target, PDDocument topSource, PDDocument bottomSource) throws IOException {
LayerUtility layerUtility = new LayerUtility(target);
PDFormXObject topForm = layerUtility.importPageAsForm(topSource, 0);
PDFormXObject bottomForm = layerUtility.importPageAsForm(bottomSource, 0);
float height = topForm.getBBox().getHeight() + bottomForm.getBBox().getHeight();
float width, topMargin, bottomMargin;
if (topForm.getBBox().getWidth() > bottomForm.getBBox().getWidth()) {
width = topForm.getBBox().getWidth();
topMargin = 0;
bottomMargin = (topForm.getBBox().getWidth() - bottomForm.getBBox().getWidth()) / 2;
} else {
width = bottomForm.getBBox().getWidth();
topMargin = (bottomForm.getBBox().getWidth() - topForm.getBBox().getWidth()) / 2;
bottomMargin = 0;
}
PDPage targetPage = new PDPage(new PDRectangle(width, height));
target.addPage(targetPage);
PDPageContentStream contentStream = new PDPageContentStream(target, targetPage);
if (bottomMargin != 0)
contentStream.transform(Matrix.getTranslateInstance(bottomMargin, 0));
contentStream.drawForm(bottomForm);
contentStream.transform(Matrix.getTranslateInstance(topMargin - bottomMargin, bottomForm.getBBox().getHeight()));
contentStream.drawForm(topForm);
contentStream.close();
}
(JoinPages method join)
You use it like this:
try ( PDDocument document = new PDDocument();
PDDocument top = ...;
PDDocument bottom = ...) {
join(document, top, bottom);
document.save("joinedPage.pdf");
}
(JoinPages test testJoinSmallAndBig)
The result looks like this:

Just as an additional point to #mkl's answer.
If anybody is looking to scale the PDFs before placing them on the page use,
contentStream.transform(Matrix.getScaleInstance(<scaling factor in x axis>, <scaling factor in y axis>)); //where 1 is the scaling factor if you want the page as the original size
This way you can rescale your PDFs.

How to watermark PDFs using text or images?

I have a bunch of PDF documents in a folder and I want to augment them with a watermark. What are my options from a Java serverside context?
Preferably the watermark will support transparency. Both vector and raster is desirable.

Please take a look at the TransparentWatermark2 example. It adds transparent text on each odd page and a transparent image on each even page of an existing PDF document.
This is how it's done:
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
int n = reader.getNumberOfPages();
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
// text watermark
Font f = new Font(FontFamily.HELVETICA, 30);
Phrase p = new Phrase("My watermark (text)", f);
// image watermark
Image img = Image.getInstance(IMG);
float w = img.getScaledWidth();
float h = img.getScaledHeight();
// transparency
PdfGState gs1 = new PdfGState();
gs1.setFillOpacity(0.5f);
// properties
PdfContentByte over;
Rectangle pagesize;
float x, y;
// loop over every page
for (int i = 1; i <= n; i++) {
pagesize = reader.getPageSizeWithRotation(i);
x = (pagesize.getLeft() + pagesize.getRight()) / 2;
y = (pagesize.getTop() + pagesize.getBottom()) / 2;
over = stamper.getOverContent(i);
over.saveState();
over.setGState(gs1);
if (i % 2 == 1)
ColumnText.showTextAligned(over, Element.ALIGN_CENTER, p, x, y, 0);
else
over.addImage(img, w, 0, 0, h, x - (w / 2), y - (h / 2));
over.restoreState();
}
stamper.close();
reader.close();
}
As you can see, we create a Phrase object for the text and an Image object for the image. We also create a PdfGState object for the transparency. In our case, we go for 50% opacity (change the 0.5f into something else to experiment).
Once we have these objects, we loop over every page. We use the PdfReader object to get information about the existing document, for instance the dimensions of every page. We use the PdfStamper object when we want to stamp extra content on the existing document, for instance adding a watermark on top of each single page.
When changing the graphics state, it is always safe to perform a saveState() before you start and to restoreState() once you're finished. You code will probably also work if you don't do this, but believe me: it can save you plenty of debugging time if you adopt the discipline to do this as you can get really strange effects if the graphics state is out of balance.
We apply the transparency using the setGState() method and depending on whether the page is an odd page or an even page, we add the text (using ColumnText and an (x, y) coordinate calculated so that the text is added in the middle of each page) or the image (using the addImage() method and the appropriate parameters for the transformation matrix).
Once you've done this for every page in the document, you have to close() the stamper and the reader.
Caveat:
You'll notice that pages 3 and 4 are in landscape, yet there is a difference between those two pages that isn't visible to the naked eye. Page 3 is actually a page of which the size is defined as if it were a page in portrait, but it is rotated by 90 degrees. Page 4 is a page of which the size is defined in such a way that the width > the height.
This can have an impact on the way you add a watermark, but if you use getPageSizeWithRotation(), iText will adapt. This may not be what you want: maybe you want the watermark to be added differently.
Take a look at TransparentWatermark3:
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
int n = reader.getNumberOfPages();
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
stamper.setRotateContents(false);
// text watermark
Font f = new Font(FontFamily.HELVETICA, 30);
Phrase p = new Phrase("My watermark (text)", f);
// image watermark
Image img = Image.getInstance(IMG);
float w = img.getScaledWidth();
float h = img.getScaledHeight();
// transparency
PdfGState gs1 = new PdfGState();
gs1.setFillOpacity(0.5f);
// properties
PdfContentByte over;
Rectangle pagesize;
float x, y;
// loop over every page
for (int i = 1; i <= n; i++) {
pagesize = reader.getPageSize(i);
x = (pagesize.getLeft() + pagesize.getRight()) / 2;
y = (pagesize.getTop() + pagesize.getBottom()) / 2;
over = stamper.getOverContent(i);
over.saveState();
over.setGState(gs1);
if (i % 2 == 1)
ColumnText.showTextAligned(over, Element.ALIGN_CENTER, p, x, y, 0);
else
over.addImage(img, w, 0, 0, h, x - (w / 2), y - (h / 2));
over.restoreState();
}
stamper.close();
reader.close();
}
In this case, we don't use getPageSizeWithRotation() but simply getPageSize(). We also tell the stamper not to compensate for the existing page rotation: stamper.setRotateContents(false);
Take a look at the difference in the resulting PDFs:
In the first screen shot (showing page 3 and 4 of the resulting PDF of TransparentWatermark2), the page to the left is actually a page in portrait rotated by 90 degrees. iText however, treats it as if it were a page in landscape just like the page to the right.
In the second screen shot (showing page 3 and 4 of the resulting PDF of TransparentWatermark3), the page to the left is a page in portrait rotated by 90 degrees and we add the watermark as if the page is in portrait. As a result, the watermark is also rotated by 90 degrees. This doesn't happen with the page to the right, because that page has a rotation of 0 degrees.
This is a subtle difference, but I thought you'd want to know.
If you want to read this answer in French, please read Comment créer un filigrane transparent en PDF?

Best option is iText. Check a watermark demo here
Important part of the code (where the watermar is inserted) is this:
public class Watermark extends PdfPageEventHelper {
#Override
public void onEndPage(PdfWriter writer, Document document) {
// insert here your watermark
}
Read carefully the example.
onEndPage() method will be something like (in my logo-watermarks I use com.itextpdf.text.Image;):
Image image = Image.getInstance(this.getClass().getResource("/path/to/image.png"));
// set transparency
image.setTransparency(transparency);
// set position
image.setAbsolutePosition(absoluteX, absoluteY);
// put into document
document.add(image);

Setting a text style to underlined in PDFBox

I'm trying to add underlined text to a blank pdf page using PDFBox, but I haven't been able to find any examples online. All questions on stackoverflow point to extracting underlined text, but not creating it. Has this function not been implemented for PDFBox? Looking at the PDFBox documentation, it seems that fonts are pre-rendered as bold, italic, and regular.
For example, Times New Roman Regular is denoted as:
PDFont font = PDType1Font.TIMES_ROMAN.
Times New Roman Bold is denoted as:
PDFont font = PDType1Font.TIMES_BOLD
Italicized is denoted as:
PDFont font = PDType1Font.TIMES_ITALIC
There seems to be no underlined option. Is there anyway to underline text, or is this not a feature?

I'm not sure if this is a better alternative or not, but I followed Tilman Hausherr and drew a line in comparison to my text. For instance, I have the following:
public processPDF(int xOne, int yOne, int xTwo, int yTwo)
{
//create pdf and its contents for one page
PDDocument document = new PDDocument();
File file = new File("hello.pdf");
PDPage page = new PDPage();
PDFont font = PDType1Font.HELVETICA_BOLD;
PDPageContentStream contentStream;
try {
//create content stream
contentStream = new PDPageContentStream(document, page);
//being to create our text for our page
contentStream.beginText();
contentStream.setFont( font, largeTitle);
//position of text
contentStream.moveTextPositionByAmount(xOne, yOne, xTwo, yTwo);
contentStream.drawString("Hello");
contentStream.endText();
//begin to draw our line
contentStream.drawLine(xOne, yOne - .5, xTwo, yYwo - .5);
//close and save document
document.save(file);
document.close();
} catch (Exception e) {
e.printStackTrace();
}
}
where our parameters xOne, yOne, xTwo, and yTwo are our locations of the text. The line has us subtract .5 from yOne and yTwo to move it a pinch below our text location, ultimately setting it to look like underlined text.
There may be better ways, but this was the route I went.

I use below function for underlined the string.
public class UnderlineText {
PDFont font = PDType1Font.HELVETICA_BOLD;
float fontSize = 10f;
String str = "Hello";
public static void main(String[] args) {
new UnderlineText().generatePDF(20, 200);
}
public void generatePDF(int sX, int sY)
{
//create pdf and its contents for one page
PDDocument document = new PDDocument();
File file = new File("underlinePdfbox.pdf");
PDPage page = new PDPage();
PDPageContentStream contentStream;
try {
document.addPage(page);
//create content stream
contentStream = new PDPageContentStream(document, page);
//being text for our page
contentStream.beginText();
contentStream.setFont( font, fontSize);
contentStream.newLineAtOffset(sX, sY);
contentStream.showText(str);
contentStream.endText();
//Draw Underline
drawLine(contentStream, str, 1, sX, sY, -2);
//close and save document
contentStream.close();
document.save(file);
document.close();
} catch (Exception e) {
e.printStackTrace();
}
}
public void drawLine(PDPageContentStream contentStream, String text, float lineWidth, float sx, float sy, float linePosition) throws IOException {
//Calculate String width
float stringWidth = fontSize * font.getStringWidth(str) / 1000;
float lineEndPoint = sx + stringWidth;
//begin to draw our line
contentStream.setLineWidth(lineWidth);
contentStream.moveTo(sx, sy + linePosition);
contentStream.lineTo(lineEndPoint, sy + linePosition);
contentStream.stroke();
}
}
drawLine is a function which i created for drawing a line for specific string. You can adjust line as per specification using position attribute.
Minus (-) value in position field create under line. you can use positive value for over-line and stroke-line.(For example -2 for underline, 10 for over-line, 2 for stroke-line for above code)
Also you can manage the width for line.

Try this answer:
highlight text using pdfbox when it's location in the pdf is known
This method using PDAnnotationTextMarkup, it has four values
/**
* The types of annotation.
*/
public static final String SUB_TYPE_HIGHLIGHT = "Highlight";
/**
* The types of annotation.
*/
public static final String SUB_TYPE_UNDERLINE = "Underline";
/**
* The types of annotation.
*/
public static final String SUB_TYPE_SQUIGGLY = "Squiggly";
/**
* The types of annotation.
*/
public static final String SUB_TYPE_STRIKEOUT = "StrikeOut";
Hope it helps

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.