I'm trying to find certain text in a PDF and make its font color white. As a proof of concept, I've already succeeded in finding text and highlighting it in the PDF, based on the code written by mkl here: find position of text in pdf
Based on the coordinates I get back, is it possible to change the font color of the text inside the rectangle instead of highlighting it? Alternatively, can I add a white rectangle to cover the text?
Thanks in advance
Edit: I have started adding the rectangles to the PDF; however, as stated, they are not in the correct position. This is what I have so far (don't mind the style, it's just a POC):
TextPositionSequence class by mkl
byte[] content = ...;
PDDocument document = PDDocument.load(content);
for (int page = 1; page <= document.getNumberOfPages(); page++) {
List<TextPositionSequence> hits = Collections.emptyList(); // stays empty (not null) if the search below throws
try {
hits = findSubwordsImproved(document, page, "[" + searchTerm + "]");
} catch (IOException e) {
e.printStackTrace();
}
for (TextPositionSequence hit : hits) {
TextPosition lastPosition = hit.textPositionAt(hit.length() - 1);
TextPosition firstPosition = hit.textPositionAt(0);
PDPage actualPage = document.getPage(page - 1);
PDRectangle cropBox = actualPage.getCropBox();
float x = firstPosition.getTextMatrix().getTranslateX() + cropBox.getLowerLeftX();
float y = firstPosition.getTextMatrix().getTranslateY() + cropBox.getLowerLeftY();
float w = hit.getWidth();
try {
PDPageContentStream contents = new PDPageContentStream(document, actualPage, PDPageContentStream.AppendMode.APPEND, false);
contents.setNonStrokingColor(Color.RED);
contents.addRect(x, y, w, firstPosition.getHeight());
contents.fill();
contents.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
I've solved it with the following code. I fiddled with the rectangle height a bit to get the box to cover the entire text. This might need tweaking in the future:
float posXInit = hit.getX();
float posXEnd = lastPosition.getXDirAdj() + lastPosition.getWidth();
float posYInit = firstPosition.getPageHeight() - firstPosition.getYDirAdj();
float posYEnd = firstPosition.getPageHeight() - lastPosition.getYDirAdj();
float height = firstPosition.getHeight();
PDPageContentStream contents = new PDPageContentStream(document, actualPage, PDPageContentStream.AppendMode.APPEND, false, true);
contents.setNonStrokingColor(Color.WHITE);
contents.addRect(posXInit, posYEnd - height / 3, hit.getWidth(), height * 2);
contents.fill();
contents.close();
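The coordinate flip in the solution above can be checked without PDFBox. Here is a minimal sketch in plain Java, assuming (as the solution does) that getYDirAdj() is measured downward from the top of the page while addRect expects a lower-left origin. The class, the method name and the sample numbers are made up for illustration; the height/3 and height*2 factors are the same hand-tuned values as above.

```java
final class CoverRectSketch {

    /** Rectangle in PDF user space: origin at the bottom-left, y grows upward. */
    static final class Rect {
        final float x, y, width, height;

        Rect(float x, float y, float width, float height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /**
     * Converts a top-down baseline position into a bottom-up cover rectangle.
     * xDirAdj and yDirAdj stand in for the values a TextPosition would report.
     */
    static Rect coverRect(float xDirAdj, float yDirAdj, float textWidth,
                          float glyphHeight, float pageHeight) {
        // Flip the y axis: distance from the top becomes distance from the bottom.
        float baselineY = pageHeight - yDirAdj;
        // Start a third of the glyph height below the baseline to cover descenders.
        float lowerY = baselineY - glyphHeight / 3f;
        // Twice the glyph height leaves some headroom above the text.
        return new Rect(xDirAdj, lowerY, textWidth, glyphHeight * 2f);
    }

    public static void main(String[] args) {
        // Hypothetical hit: 92 units below the top of a 792-unit-high page.
        Rect r = coverRect(100f, 92f, 60f, 9f, 792f);
        System.out.println(r.x + " " + r.y + " " + r.width + " " + r.height);
    }
}
```

The four values of the resulting Rect are what would be passed to contents.addRect(x, y, width, height).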
My aim is to add an image to a PDF and write text above this image. I have centered the image, and the text should be centered above the image with a little margin between them.
Currently the image is added and centered, but the text is not centered.
Here is my current code. The interesting part is where the method drawTitelAtTop is called. There I add the height of the newly added image plus a margin of 3 to the y position. I calculate the x coordinate from the incoming text, but there is some miscalculation. Any advice?
private static void addScaledImage(ImageData imgData, PDDocument pdDocument, Dimension thePdfDimension) {
ImageHelper helper = Scalr::resize;
byte[] scaledImage = ImageUtils.resizeImageKeepAspectRatio(helper, imgData.getImageBinary(), thePdfDimension.width);
PDRectangle rectangle = pdDocument.getPage(0).getMediaBox();
PDPage page = new PDPage(rectangle);
pdDocument.addPage(page);
PDImageXObject pdImage = null;
try {
pdImage = PDImageXObject.createFromByteArray(pdDocument, scaledImage, null);
LOG.debug("size of scaled image is x: {0} y {1}", pdImage.getWidth(), pdImage.getHeight());
int xForImage = (thePdfDimension.width - pdImage.getWidth()) / 2 ;
int yForImage = (thePdfDimension.height - pdImage.getHeight()) / 2;
LOG.debug("new x {0} new y {1}", xForImage, yForImage);
try (PDPageContentStream contentStream = new PDPageContentStream(pdDocument, page, AppendMode.APPEND, true, true)) {
if (StringUtils.isNotBlank(imgData.getTitle())) {
yForImage = yForImage - 20; // shift the image down to make room for the title (was xForImage - 20, presumably a typo)
contentStream.drawImage(pdImage, xForImage, yForImage, pdImage.getWidth(), pdImage.getHeight());
drawTitelAtTop(imgData, page, xForImage , yForImage + pdImage.getHeight() + 3, contentStream);
} else {
contentStream.drawImage(pdImage, xForImage, yForImage, pdImage.getWidth(), pdImage.getHeight());
}
}
} catch (IOException e) {
throw new RuntimeException(e);
}
}
private static void drawTitelAtTop(ImageData imgData, PDPage page, int x, int y, PDPageContentStream contentStream) throws IOException {
PDFont font = PDType1Font.COURIER;
int fontSize = FONT_SIZE_FOR_TITLE;
float titleWidth = font.getStringWidth(imgData.getTitle()) / 1000 * fontSize;
LOG.debug("title width is " + titleWidth);
contentStream.setFont(font, fontSize);
contentStream.beginText();
float tx = ((x - titleWidth) / 2) + x;
//float tx = x;
//float ty = page.getMediaBox().getHeight() - marginTop + (marginTop / 4);
float ty = y;
LOG.debug("title offset x {0} y {1}", tx, ty);
contentStream.newLineAtOffset(tx,
ty);
contentStream.showText(imgData.getTitle());
contentStream.endText();
}
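One possible cause of the offset: drawTitelAtTop only receives the left edge of the image, not its width, so ((x - titleWidth) / 2) + x centers relative to x rather than relative to the image. Below is a minimal sketch of the arithmetic in plain Java, without PDFBox. All names and sample numbers are hypothetical, and it assumes the image width would be passed in as an extra parameter.

```java
final class TitleCenteringSketch {

    /** Converts a width in 1/1000 text-space units to user-space units. */
    static float scaledWidth(float glyphUnits, float fontSize) {
        return glyphUnits / 1000f * fontSize;
    }

    /** x at which text of titleWidth starts so that it is centered over the image. */
    static float centeredTextX(float imageX, float imageWidth, float titleWidth) {
        return imageX + (imageWidth - titleWidth) / 2f;
    }

    /** Baseline y for the title: top edge of the image plus a small margin. */
    static float titleBaselineY(float imageY, float imageHeight, float margin) {
        return imageY + imageHeight + margin;
    }

    public static void main(String[] args) {
        // Courier is monospaced at 600 units per glyph, so 10 characters are 6000 units.
        float titleWidth = scaledWidth(6000f, 12f);
        float tx = centeredTextX(100f, 400f, titleWidth);
        float ty = titleBaselineY(200f, 300f, 3f);
        System.out.println("tx=" + tx + " ty=" + ty);
    }
}
```

In the code above, this would mean handing pdImage.getWidth() to drawTitelAtTop and using the result as the first argument of newLineAtOffset.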
I have code written with the PDFBox API that highlights words in a PDF, but when I convert the highlighted PDF pages to images, everything I highlighted disappears from the image.
The screenshot below shows the highlighted text; for highlighting I used PDFBox's PDAnnotationTextMarkup class:
[Screenshot: Highlighted PDF Page]
Below is the image after converting the PDF page to an image:
[Screenshot: Highlighted PDF Page Image after converting]
Below is the code I used for converting the PDF to images:
PDDocument document = PDDocument.load(new File(pdfFilename));
PDFRenderer pdfRenderer = new PDFRenderer(document);
int pageCounter = 0;
for (PDPage page : document.getPages())
{
BufferedImage bim = pdfRenderer.renderImageWithDPI(pageCounter, 300, ImageType.RGB);
ImageIOUtil.writeImage(bim, pdfFilename + "-" + (pageCounter++) + ".png", 300);
}
document.close();
Please suggest what is wrong here: why is PDFRenderer not able to render the PDF page image along with the highlighted red box?
Below is the code I used to highlight the text in PDF:
private void highlightText(String pdfFilePath, String highlightedPdfFilePath) {
try {
// Loading an existing document
File file = new File(highlightedPdfFilePath);
if (!file.exists()) {
file = new File(pdfFilePath);
}
PDDocument document = PDDocument.load(file);
// extended PDFTextStripper class
PDFTextStripper stripper = new PDFTextHighlighter();
// Get number of pages
int number_of_pages = document.getDocumentCatalog().getPages().getCount();
// The method writeText will invoke an override version of
// writeString
Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
stripper.writeText(document, dummy);
// Print collected information
System.out.println("tokenStream:::"+tokenStream);
System.out.println("tokenStream size::"+tokenStream.size());
System.out.println("coordinates size::"+coordinates.size());
double page_height;
double page_width;
double width, height, minx, maxx, miny, maxy;
int rotation;
// scan each page and highlight all the words inside them
for (int page_index = 0; page_index < number_of_pages; page_index++) {
// get current page
PDPage page = document.getPage(page_index);
// Get annotations for the selected page
List<PDAnnotation> annotations = page.getAnnotations();
// Define a color to use for highlighting text
PDColor red = new PDColor(new float[] { 1, 0, 0 }, PDDeviceRGB.INSTANCE);
// Page height and width
page_height = page.getMediaBox().getHeight();
page_width = page.getMediaBox().getWidth();
// Scan collected coordinates
for (int i = 0; i < coordinates.size(); i++) {
if (!differencePgaeNumber.contains(page_index)) {
differencePgaeNumber.add(page_index);
}
// if the current coordinates are not related to the current
// page, ignore them
if ((int) coordinates.get(i)[4] != (page_index + 1))
continue;
else {
// get rotation of the page...portrait..landscape..
rotation = (int) coordinates.get(i)[7];
// page rotated of 90degrees
if (rotation == 90) {
height = coordinates.get(i)[5];
width = coordinates.get(i)[6];
width = (page_height * width) / page_width;
// define coordinates of a rectangle
maxx = coordinates.get(i)[1];
minx = coordinates.get(i)[1] - height;
miny = coordinates.get(i)[0];
maxy = coordinates.get(i)[0] + width;
} else // i should add here the cases -90/-180 degrees
{
height = coordinates.get(i)[5];
minx = coordinates.get(i)[0];
maxx = coordinates.get(i)[2];
miny = page_height - coordinates.get(i)[1];
maxy = page_height - coordinates.get(i)[3] + height;
}
// Add an annotation for each scanned word
PDAnnotationTextMarkup txtMark = new PDAnnotationTextMarkup(
PDAnnotationTextMarkup.SUB_TYPE_HIGHLIGHT);
txtMark.setColor(red);
txtMark.setConstantOpacity((float) 0.3); // 30%
// transparent
PDRectangle position = new PDRectangle();
position.setLowerLeftX((float) minx);
position.setLowerLeftY((float) miny);
position.setUpperRightX((float) maxx);
position.setUpperRightY((float) ((float) maxy + height));
txtMark.setRectangle(position);
float[] quads = new float[8];
quads[0] = position.getLowerLeftX(); // x1
quads[1] = position.getUpperRightY() - 2; // y1
quads[2] = position.getUpperRightX(); // x2
quads[3] = quads[1]; // y2
quads[4] = quads[0]; // x3
quads[5] = position.getLowerLeftY() - 2; // y3
quads[6] = quads[2]; // x4
quads[7] = quads[5]; // y4
txtMark.setQuadPoints(quads);
txtMark.setContents(tokenStream.get(i).toString());
annotations.add(txtMark);
}
}
}
// Saving the document in a new file
File highlighted_doc = new File(highlightedPdfFilePath);
document.save(highlighted_doc);
document.close();
} catch (IOException e) {
System.out.println(e);
}
}
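As far as I know, PDFRenderer draws an annotation from its appearance stream, and an annotation created in code has none until you build it. You need to construct the visual appearance of the annotation with this call: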
txtMark.constructAppearances(document);
I'm trying to rotate text using PDFBox, but I couldn't achieve it. I tried to set the text matrix, but my text is not rotating as intended.
Does someone have an idea of how I could rotate my text by 90 degrees?
This is my code:
contentStream.beginText();
float tx = titleWidth / 2;
float ty = titleHeight / 2;
contentStream.setTextMatrix(Matrix.getTranslateInstance(tx, ty));
contentStream.setTextMatrix(Matrix.getRotateInstance(Math.toRadians(90),tx,ty));
contentStream.setTextMatrix(Matrix.getTranslateInstance(-tx, -ty));
contentStream.newLineAtOffset(xPos, yPos);
contentStream.setFont(font, fontSize);
contentStream.showText("Tets");
contentStream.endText();
Thank You
Here's a solution that draws three pages: one with the text unrotated, one with the text rotated but keeping the coordinates as if planning landscape printing, and one with what you wanted (rotated around the center of the text). My solution is only close to that: it rotates around the bottom center of the text.
public static void main(String[] args) throws IOException
{
PDDocument doc = new PDDocument();
PDPage page1 = new PDPage();
doc.addPage(page1);
PDPage page2 = new PDPage();
doc.addPage(page2);
PDPage page3 = new PDPage();
doc.addPage(page3);
PDFont font = PDType1Font.HELVETICA;
float fontSize = 20;
int xPos = 100;
int yPos = 400;
float titleWidth = font.getStringWidth("Tets") / 1000 * fontSize; // getStringWidth is in 1/1000 text units, so scale by the font size
float titleHeight = fontSize;
float tx = titleWidth / 2;
float ty = titleHeight / 2;
try (PDPageContentStream contentStream = new PDPageContentStream(doc, page1))
{
contentStream.beginText();
contentStream.newLineAtOffset(xPos, yPos);
contentStream.setFont(font, fontSize);
contentStream.showText("Tets");
contentStream.endText();
}
// classic case of rotated page
try (PDPageContentStream contentStream = new PDPageContentStream(doc, page2))
{
contentStream.beginText();
Matrix matrix = Matrix.getRotateInstance(Math.toRadians(90), 0, 0);
matrix.translate(0, -page2.getMediaBox().getWidth());
contentStream.setTextMatrix(matrix);
contentStream.newLineAtOffset(xPos, yPos);
contentStream.setFont(font, fontSize);
contentStream.showText("Tets");
contentStream.endText();
}
// rotation around text
try (PDPageContentStream contentStream = new PDPageContentStream(doc, page3))
{
contentStream.beginText();
Matrix matrix = Matrix.getRotateInstance(Math.toRadians(90), 0, 0);
matrix.translate(0, -page3.getMediaBox().getWidth());
contentStream.setTextMatrix(matrix);
contentStream.newLineAtOffset(yPos - titleWidth / 2 - fontSize, page3.getMediaBox().getWidth() - xPos - titleWidth / 2 - fontSize);
contentStream.setFont(font, fontSize);
contentStream.showText("Tets");
contentStream.endText();
}
doc.save("saved.pdf");
doc.close();
}
This example rotates around the left end of the text's baseline and uses the matrix translation to position the text at a specific point.
showText() always draws at 0,0, which is the position before the rotation. The matrix translation then positions the text after the rotation.
If you want a different rotation point for your text, change the pre-rotation offset in the contentStream.newLineAtOffset(0, 0) line.
float angle = 35;
double radians = Math.toRadians(angle);
for (int x : new int[] {50,85,125, 200})
for (int y : new int[] {40, 95, 160, 300}) {
contentStream.beginText();
// Notice the post rotation position
Matrix matrix = Matrix.getRotateInstance(radians,x,y);
contentStream.setTextMatrix(matrix);
// Notice the pre rotation position
contentStream.newLineAtOffset(0, 0);
contentStream.showText(".(" + x + "," + y + ")");
contentStream.endText();
}
To get the height and the width of the text you want to rotate use font.getBoundingBox().getHeight()/1000*fontSize and font.getStringWidth(text)/1000*fontSize.
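For rotating around the center of the text instead of the left baseline, the translation to pass to Matrix.getRotateInstance can be worked out by hand. The following is a minimal sketch in plain Java (no PDFBox needed), assuming the matrix rotates about the origin and then translates by (tx, ty), as described above. The class and method names are made up for illustration, and the width and height are the approximations from the formulas above.

```java
final class RotateAroundCenterSketch {

    /**
     * Returns {tx, ty} such that, after rotating by radians and translating,
     * the center of the text box (width/2, height/2) lands on (cx, cy).
     */
    static double[] translationFor(double cx, double cy, double textWidth,
                                   double textHeight, double radians) {
        double cos = Math.cos(radians);
        double sin = Math.sin(radians);
        double halfW = textWidth / 2;
        double halfH = textHeight / 2;
        // Where the rotation alone would move the text center.
        double rotatedCenterX = halfW * cos - halfH * sin;
        double rotatedCenterY = halfW * sin + halfH * cos;
        // Shift so that the rotated center ends up at (cx, cy).
        return new double[] {cx - rotatedCenterX, cy - rotatedCenterY};
    }

    /** Applies "rotate by radians, then translate by (tx, ty)" to a point. */
    static double[] transform(double radians, double tx, double ty,
                              double px, double py) {
        double cos = Math.cos(radians);
        double sin = Math.sin(radians);
        return new double[] {px * cos - py * sin + tx, px * sin + py * cos + ty};
    }

    public static void main(String[] args) {
        double angle = Math.toRadians(90);
        double[] t = translationFor(300, 400, 40, 20, angle);
        double[] center = transform(angle, t[0], t[1], 20, 10);
        System.out.println("translation: " + t[0] + ", " + t[1]);
        System.out.println("center after transform: " + center[0] + ", " + center[1]);
    }
}
```

The two returned values would go into Matrix.getRotateInstance(radians, (float) tx, (float) ty), with newLineAtOffset(0, 0) left unchanged.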
I am using Java to write output to a PDDocument, then appending that document to an existing one before serving it to the client.
Most of it is working well. I only have a small problem trying to handle content overflow while writing to that PDDocument.
I want to keep track of where text is being inserted into the document so that when the "cursor" so to speak goes past a certain point, I'll create a new page, add it to the document, create a new content stream, and continue as normal.
Here is some code that shows what I'd like to do:
// big try block
PDDocument doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
PDPageContentStream content = new PDPageContentStream(doc, page);
int fontSize = 12;
content.beginText();
content.setFont(...);
content.moveTextPositionByAmount(margin, pageHeight-margin);
for ( each element in a collection of values ) {
content.moveTextPositionByAmount(0, -fontSize); // only moves down in document
// at this point, check if past the end of page, if so add a new page
if (content.getTextYPosition(...) < margin) { // wishful thinking, doesn't exist
content.endText();
content.close();
page = new PDPage();
doc.addPage(page);
content = new PDPageContentStream(doc, page);
content.beginText();
content.setFont(...);
content.moveTextPositionByAmount(margin, pageHeight-(margin+fontSize));
}
content.drawString(...);
}
content.endText();
content.close();
The important bit is the content.getTextYPosition(). It doesn't actually exist, but I'm sure PDPageContentStream must be keeping track of a similar value. Is there any way to access this value?
Thanks.
Create a heightCounter variable that tracks how far you've moved the text location. Its initial value can be your starting Y position.
PDRectangle mediabox = page.findMediaBox();
float margin = 72;
float width = mediabox.getWidth() - 2 * margin;
float startX = mediabox.getLowerLeftX() + margin;
float startY = mediabox.getUpperRightY() - margin;
float heightCounter = startY;
Every time you move the text position, subtract that amount from heightCounter. Once heightCounter is less than the amount you are about to move the text position by, create a new page.
contentStream.beginText();
contentStream.setFont(pdfFont, fontSize);
contentStream.moveTextPositionByAmount(startX, startY);
for (String line : lines) {
if(height>705){ // 705 is the height of my bottom line where I want the cutoff; you can find yours by printing the values with System.out.println
line = line.trim();
float charSpacing = 0;
if (line.length() > 1) {
float size = fontSize * pdfFont.getStringWidth(line) / 1000;
float free = width - size;
if (free > 0) {
charSpacing = free / (line.length() - 1);
}
}
contentStream.drawString(line);
contentStream.moveTextPositionByAmount(0, -leading);
System.out.println("content Stream line :" + line);
height--;
System.out.println("value of height:"+ height);
}
else{
contentStream.endText();
contentStream.close();
page = new PDPage(PDPage.PAGE_SIZE_A4);
doc.addPage(page);
contentStream = new PDPageContentStream(doc, page,false, true, true);
contentStream.beginText();
contentStream.setFont(pdfFont, fontSize);
contentStream.moveTextPositionByAmount(startX, startY);
System.out.println("Height counter value :"+ height);
System.out.println("line insde the second page:" + line);
contentStream.drawString(line);
System.out.println("Output on second page:"+contentStream.toString());
contentStream.moveTextPositionByAmount(0, -leading);
height=mediabox.getHeight() - 2 * margin; //
}
}
contentStream.endText();
contentStream.close();
}
For reference, this is the configuration I declare at the top of the method.
Happy coding!
PDPageContentStream contentStream = new PDPageContentStream(doc, page,true, true, true);
//PDPage page1 = new PDPage(PDPage.PAGE_SIZE_A4);
PDPageContentStream contentStream1 = new PDPageContentStream(doc, page,true, true, true);
PDFont pdfFont = PDType1Font.COURIER;
PDFont fontBold = PDType1Font.TIMES_BOLD;
float leading = 1.5f * fontSize;
PDRectangle mediabox = page.getMediaBox();
float margin = 45;
float width = mediabox.getWidth() - 2 * margin;
float height = mediabox.getHeight() - 2 * margin;
float startX = mediabox.getLowerLeftX() + margin - statVarX;
float startY = mediabox.getUpperRightY() - margin - statVarY;
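The bookkeeping behind both answers can be tried out without PDFBox. Here is a minimal sketch in plain Java that tracks the y position itself and starts a new "page" once the next line would fall below the bottom margin. It follows the order used in the question (move down, check, then draw). The class and method names are hypothetical, and a list of lines stands in for a real PDPage.

```java
import java.util.ArrayList;
import java.util.List;

final class PageBreakSketch {

    /** Splits lines into pages by tracking the current y position manually. */
    static List<List<String>> paginate(List<String> lines, float pageHeight,
                                       float margin, float leading) {
        List<List<String>> pages = new ArrayList<>();
        List<String> current = new ArrayList<>();
        pages.add(current);
        // Same starting point as moveTextPositionByAmount(margin, pageHeight - margin).
        float y = pageHeight - margin;
        for (String line : lines) {
            // Move down one line, like moveTextPositionByAmount(0, -leading).
            y -= leading;
            if (y < margin) {
                // Past the bottom margin: this is where a real implementation would
                // end the text, close the stream and add a new PDPage.
                current = new ArrayList<>();
                pages.add(current);
                y = pageHeight - margin - leading;
            }
            current.add(line);
        }
        return pages;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= 7; i++) {
            lines.add("line " + i);
        }
        List<List<String>> pages = paginate(lines, 100f, 10f, 20f);
        for (int i = 0; i < pages.size(); i++) {
            System.out.println("page " + (i + 1) + ": " + pages.get(i));
        }
    }
}
```

Because the position is a plain variable owned by the caller, nothing has to be read back from PDPageContentStream.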
How can I add page number to a page in a document generated using PDFBox?
Can anybody tell me how to add page numbers to a document after I merge different PDFs? I am using the PDFBox library in Java.
This is my code, and it works well, but I need to add page numbers.
PDFMergerUtility ut = new PDFMergerUtility();
ut.addSource("c:\\pdf1.pdf");
ut.addSource("c:\\pdf2.pdf");
ut.addSource("c:\\pdf3.pdf");
ut.mergeDocuments();
You may want to look at the PDFBox sample AddMessageToEachPage.java. The central code is:
try (PDDocument doc = PDDocument.load(new File(file)))
{
PDFont font = PDType1Font.HELVETICA_BOLD;
float fontSize = 36.0f;
for( PDPage page : doc.getPages() )
{
PDRectangle pageSize = page.getMediaBox();
float stringWidth = font.getStringWidth( message )*fontSize/1000f;
// calculate to center of the page
int rotation = page.getRotation();
boolean rotate = rotation == 90 || rotation == 270;
float pageWidth = rotate ? pageSize.getHeight() : pageSize.getWidth();
float pageHeight = rotate ? pageSize.getWidth() : pageSize.getHeight();
float centerX = rotate ? pageHeight/2f : (pageWidth - stringWidth)/2f;
float centerY = rotate ? (pageWidth - stringWidth)/2f : pageHeight/2f;
// append the content to the existing stream
try (PDPageContentStream contentStream = new PDPageContentStream(doc, page, AppendMode.APPEND, true, true))
{
contentStream.beginText();
// set font and font size
contentStream.setFont( font, fontSize );
// set text color to red
contentStream.setNonStrokingColor(255, 0, 0);
if (rotate)
{
// rotate the text according to the page rotation
contentStream.setTextMatrix(Matrix.getRotateInstance(Math.PI / 2, centerX, centerY));
}
else
{
contentStream.setTextMatrix(Matrix.getTranslateInstance(centerX, centerY));
}
contentStream.showText(message);
contentStream.endText();
}
}
doc.save( outfile );
}
The 1.8.x equivalent was:
PDDocument doc = null;
try
{
doc = PDDocument.load( file );
List allPages = doc.getDocumentCatalog().getAllPages();
PDFont font = PDType1Font.HELVETICA_BOLD;
float fontSize = 36.0f;
for( int i=0; i<allPages.size(); i++ )
{
PDPage page = (PDPage)allPages.get( i );
PDRectangle pageSize = page.findMediaBox();
float stringWidth = font.getStringWidth( message )*fontSize/1000f;
// calculate to center of the page
int rotation = page.findRotation();
boolean rotate = rotation == 90 || rotation == 270;
float pageWidth = rotate ? pageSize.getHeight() : pageSize.getWidth();
float pageHeight = rotate ? pageSize.getWidth() : pageSize.getHeight();
double centeredXPosition = rotate ? pageHeight/2f : (pageWidth - stringWidth)/2f;
double centeredYPosition = rotate ? (pageWidth - stringWidth)/2f : pageHeight/2f;
// append the content to the existing stream
PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true,true);
contentStream.beginText();
// set font and font size
contentStream.setFont( font, fontSize );
// set text color to red
contentStream.setNonStrokingColor(255, 0, 0);
if (rotate)
{
// rotate the text according to the page rotation
contentStream.setTextRotation(Math.PI/2, centeredXPosition, centeredYPosition);
}
else
{
contentStream.setTextTranslation(centeredXPosition, centeredYPosition);
}
contentStream.drawString( message );
contentStream.endText();
contentStream.close();
}
doc.save( outfile );
}
finally
{
if( doc != null )
{
doc.close();
}
}
Instead of the message, you can add page numbers. And instead of the center, you can use any position.
(The example can be improved, though: the MediaBox is the wrong choice, the CropBox should be used, and the page rotation handling only appears to properly handle 0° and 90°; 180° and 270° create upside-down writing.)
It is easy; try the following code:
public static void addPageNumbers(PDDocument document, String numberingFormat, int offset_X, int offset_Y) throws IOException {
int page_counter = 1;
for(PDPage page : document.getPages()){
PDPageContentStream contentStream = new PDPageContentStream(document, page, PDPageContentStream.AppendMode.APPEND, true, false);
contentStream.beginText();
contentStream.setFont(PDType1Font.TIMES_ITALIC, 10);
PDRectangle pageSize = page.getMediaBox();
float x = pageSize.getLowerLeftX();
float y = pageSize.getLowerLeftY();
contentStream.newLineAtOffset(x+ pageSize.getWidth()-offset_X, y+offset_Y);
String text = MessageFormat.format(numberingFormat,page_counter);
contentStream.showText(text);
contentStream.endText();
contentStream.close();
++page_counter;
}
}
public static void main(String[] args) throws Exception {
File file = new File("your input pdf path");
PDDocument document = PDDocument.load(file);
addPageNumbers(document,"Page {0}",60,18);
document.save(new File("output pdf path"));
document.close();
}
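If you want "Page X of Y" labels that are right-aligned inside the CropBox (as the note in the first answer suggests), the label text and its position can be computed separately from the drawing. Below is a minimal sketch in plain Java. The class and method names are hypothetical, and the box values and text width are assumed to come from page.getCropBox() and font.getStringWidth(text) / 1000 * fontSize.

```java
import java.text.MessageFormat;
import java.util.Locale;

final class PageNumberSketch {

    /** Builds a label such as "Page 3 of 12" from a MessageFormat pattern. */
    static String label(String pattern, int pageNumber, int pageCount) {
        // Locale.ROOT keeps the number formatting independent of the machine's locale.
        return new MessageFormat(pattern, Locale.ROOT)
                .format(new Object[] {pageNumber, pageCount});
    }

    /** Returns {x, y} for text that ends marginX left of the box's right edge. */
    static float[] bottomRightOrigin(float boxLowerLeftX, float boxLowerLeftY,
                                     float boxWidth, float textWidth,
                                     float marginX, float marginY) {
        float x = boxLowerLeftX + boxWidth - marginX - textWidth;
        float y = boxLowerLeftY + marginY;
        return new float[] {x, y};
    }

    public static void main(String[] args) {
        System.out.println(label("Page {0} of {1}", 3, 12));
        float[] origin = bottomRightOrigin(0f, 0f, 612f, 50f, 36f, 18f);
        System.out.println(origin[0] + ", " + origin[1]);
    }
}
```

For documents with a thousand pages or more, a pattern such as "Page {0,number,#}" avoids the grouping separator that plain {0} would insert.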