Cannot capture annotations in a PDImageXObject using PDFBox


I am highlighting certain words (in green) on each page of my input PDF document using PDAnnotationTextMarkup. (The sample screenshot of the highlighted page is not reproduced here.)
However, when I crop this part of the page and save it as a PDImageXObject, the green highlights are missing from the resulting image.
The code I am using to crop the image is as follows:
public PDImageXObject cropAndSaveImage(int pageNumber, Rectangle dimensions)
{
PDImageXObject pdImage = null;
// Extract arguments from the rectangle object
float x = dimensions.getXValue();
float y = dimensions.getYValue();
float w = dimensions.getWidth();
float h = dimensions.getHeight();
PDRectangle pdRectangle;
int pageResolution = 140;
try{
// Fetch the source page
PDPage sourcePage = document.getPage(pageNumber-1);
// Calculate height of the source page
PDRectangle mediaBox = sourcePage.getMediaBox();
float sourcePageHeight = mediaBox.getHeight();
// Fetch the original crop box of the page
PDRectangle originalCrop = sourcePage.getCropBox();
/*
* PDF Crop Box - Here we initialize the rectangle area
* that is needed to crop a region from the PDF document
*/
if(pageOriginShifted)
{
pdRectangle = new PDRectangle(x, sourcePageHeight - y, w, h);
}
else
{
pdRectangle = new PDRectangle(x, y, w, h);
}
// Crop the required rectangle from the source page
sourcePage.setCropBox(pdRectangle);
// PDF renderer
PDFRenderer renderer = new PDFRenderer(document);
// Convert to an image
BufferedImage bufferedImage = renderer.renderImageWithDPI(pageNumber-1, pageResolution, ImageType.RGB);
pdImage = LosslessFactory.createFromImage(document, bufferedImage);
// Restore the original page back to the document
sourcePage.setCropBox(originalCrop);
return pdImage;
}catch(Exception e)
{
Logger.logWithoutTimeStamp(LOG_TAG + "cropAndSaveImage()", Logger.ERROR, e);
}
return null;
}
I am quite perplexed as to why the text highlighted in green won't show up. Any help in this regard is highly appreciated (I cannot attach the input PDF document owing to privacy issues).
Thanks in advance,
Bharat.

Related

Why does PDFBox read the image width/height wrong? (always assumes "width" is the bigger one)

I'm using the PDFBox library to convert an image to PDF. The goal is to have an image scaled to a full A4 page in the PDF file. It works well, except for one thing:
The image height and width seem to be mixed up. The width is always assumed to be the bigger of the two values. I have two images: one has the dimensions (according to the Windows file details) 4032x2268 (landscape) and the other one 2268x4032 (portrait).
When I load the images in PDFBox, the width is always 4032 and the height 2268. The goal is to create a landscape PDF for one and a portrait PDF for the other. This weird "bug" (?) causes the portrait image to be converted to a landscape PDF, which of course causes the image to be rotated (which is inconvenient).
Here's the relevant part of my code:
public byte[] imageToPDF(MultipartFile file) throws IOException {
PDDocument pdf = new PDDocument();
PDImageXObject pdImage = PDImageXObject.createFromByteArray(pdf, file.getBytes(), file.getOriginalFilename());
// scale image to fit the full page
PDPage page;
int imageWidth;
int imageHeight;
if (pdImage.getWidth() > pdImage.getHeight()) {
// landscape pdf
float pageHeight = PDRectangle.A4.getWidth();
float pageWidth = PDRectangle.A4.getHeight();
page = new PDPage(new PDRectangle(pageWidth, pageHeight));
imageWidth = (int)pageWidth;
imageHeight = (int)(((double)imageWidth / (double)pdImage.getWidth()) * (double)pdImage.getHeight());
} else {
// portrait pdf
float pageHeight = PDRectangle.A4.getHeight();
float pageWidth = PDRectangle.A4.getWidth();
page = new PDPage(new PDRectangle(pageWidth, pageHeight));
imageHeight = (int)pageHeight;
imageWidth = (int)(((double)imageHeight / (double)pdImage.getHeight()) * (double)pdImage.getWidth());
}
...
}
pdImage.getWidth() is always greater than pdImage.getHeight(), no matter which of the two images I use. Does anyone have an idea?
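One possible explanation, stated here as an assumption rather than a confirmed diagnosis: phone cameras often store every photo with the same pixel layout and record the intended rotation in the EXIF Orientation tag. Viewers that honour the tag report the rotated dimensions, while code that reads only the raw pixel data sees the stored ones. The sketch below uses only the JDK and shows how the two sets of dimensions relate. The class and method names are hypothetical, and reading the tag from the file would need a metadata library that is not shown.

```java
public class ExifDimensions {

    /**
     * Returns {width, height} as a viewer that honours the EXIF Orientation
     * tag would report them. Orientation values 5 to 8 include a quarter
     * turn, so the stored width and height trade places.
     */
    public static int[] displayedSize(int storedWidth, int storedHeight, int exifOrientation) {
        boolean quarterTurn = exifOrientation >= 5 && exifOrientation <= 8;
        return quarterTurn
                ? new int[] { storedHeight, storedWidth }
                : new int[] { storedWidth, storedHeight };
    }

    public static void main(String[] args) {
        // Stored as 4032x2268 with orientation 6 (rotate 90 degrees clockwise to display).
        int[] size = displayedSize(4032, 2268, 6);
        System.out.println(size[0] + "x" + size[1]); // prints 2268x4032
    }
}
```

If this is what is happening, the landscape/portrait decision would have to be based on the displayed size rather than on pdImage.getWidth() and pdImage.getHeight() alone.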

How to insert a image created by decode of a string to a pdf in pdfbox

I'm trying to insert an image (which needs to be converted from a string by java.util.Base64.getDecoder().decode(imageInputString)) to a certain position of a pdf file.
The main logic of the code will be:
//create a PDImageXObject myImage first (or something that could be used in addImage method.
//And this is what I could not figure out how to accomplish.
//open the pdf file and use addImage to insert the image to the specific page at specific position.
PDDocument document = PDDocument.load(pdfFile);
PDPageContentStream contentStream = new PDPageContentStream(document, pageNumber);
contentStream.addImage(myImage,x,y);
document.save();
Most of the tutorials I found create myImage by reading an image file. Could someone help me see whether I can do the same thing with a byte[], which is the output of java.util.Base64.getDecoder().decode(imageInputString)?
Thanks!

You can use the static method PDImageXObject.createFromByteArray(), which detects the file type based on contents and will decide which PDF image type / image compression is best. (javadoc)

Thanks to Tilman Hausherr.
Here is the final code (just the core part):
int pageNumber = j;
PDPage page = document.getPage(pageNumber);
PDResources resources = page.getResources();
byte[] ba = java.util.Base64.getDecoder().decode(base64str);
PDImageXObject sigimg = PDImageXObject.createFromByteArray(document,ba,"signature");
float imgW = sigimg.getWidth();
float imgH = sigimg.getHeight();
PDPageContentStream contentStream = new PDPageContentStream(document, page,PDPageContentStream.AppendMode.APPEND, true,true);
PDRectangle sigRect = field.getWidgets().get(0).getRectangle();
float fieldW = sigRect.getWidth();
float fieldH = sigRect.getHeight();
if (imgW > fieldW || imgH > fieldH){
if(imgW/fieldW > imgH/fieldH){
sigimg.setWidth(Math.round(fieldW));
sigimg.setHeight(Math.round(imgH/imgW*fieldW));
}
else{
sigimg.setWidth(Math.round(imgW/imgH*fieldH));
sigimg.setHeight(Math.round(fieldH));
}
}
contentStream.drawImage(sigimg,sigRect.getLowerLeftX(),sigRect.getLowerLeftY());
contentStream.close();
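
The scaling logic in the code above (shrink the image only when it is larger than the field, and keep its aspect ratio) can be isolated into a small helper. This is a minimal JDK-only sketch; the class and method names are hypothetical and not part of PDFBox.

```java
public class FitInBox {

    /**
     * Scales (imgW, imgH) down so that it fits inside (boxW, boxH) while
     * keeping the aspect ratio. Images that already fit are left unchanged.
     */
    public static int[] fit(float imgW, float imgH, float boxW, float boxH) {
        if (imgW <= boxW && imgH <= boxH) {
            return new int[] { Math.round(imgW), Math.round(imgH) };
        }
        // The side that overflows the most determines the scale factor.
        float scale = Math.min(boxW / imgW, boxH / imgH);
        return new int[] { Math.round(imgW * scale), Math.round(imgH * scale) };
    }

    public static void main(String[] args) {
        int[] size = fit(400, 200, 100, 100);
        System.out.println(size[0] + "x" + size[1]); // prints 100x50
    }
}
```

The two values returned by fit would then be passed to sigimg.setWidth() and sigimg.setHeight() as in the code above.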

create a one page PDF from two PDFs using PDFBOX

I have a small (quarter inch) one page PDF I created with PDFBOX with text (A). I want to put that small one page PDF (A) on the top of an existing PDF page (B), preserving the existing content of the PDF page (B). In the end, I will have a one page PDF, representing the small PDF on top(A), and the existing PDF intact making up the rest (B). How can I accomplish this with PDFBOX?

To join two pages one atop the other onto one target page, you can make use of the PDFBox LayerUtility for importing pages as form XObjects in a fashion similar to PDFBox SuperimposePage example, e.g. with this helper method:
void join(PDDocument target, PDDocument topSource, PDDocument bottomSource) throws IOException {
LayerUtility layerUtility = new LayerUtility(target);
PDFormXObject topForm = layerUtility.importPageAsForm(topSource, 0);
PDFormXObject bottomForm = layerUtility.importPageAsForm(bottomSource, 0);
float height = topForm.getBBox().getHeight() + bottomForm.getBBox().getHeight();
float width, topMargin, bottomMargin;
if (topForm.getBBox().getWidth() > bottomForm.getBBox().getWidth()) {
width = topForm.getBBox().getWidth();
topMargin = 0;
bottomMargin = (topForm.getBBox().getWidth() - bottomForm.getBBox().getWidth()) / 2;
} else {
width = bottomForm.getBBox().getWidth();
topMargin = (bottomForm.getBBox().getWidth() - topForm.getBBox().getWidth()) / 2;
bottomMargin = 0;
}
PDPage targetPage = new PDPage(new PDRectangle(width, height));
target.addPage(targetPage);
PDPageContentStream contentStream = new PDPageContentStream(target, targetPage);
if (bottomMargin != 0)
contentStream.transform(Matrix.getTranslateInstance(bottomMargin, 0));
contentStream.drawForm(bottomForm);
contentStream.transform(Matrix.getTranslateInstance(topMargin - bottomMargin, bottomForm.getBBox().getHeight()));
contentStream.drawForm(topForm);
contentStream.close();
}
(JoinPages method join)
You use it like this:
try ( PDDocument document = new PDDocument();
PDDocument top = ...;
PDDocument bottom = ...) {
join(document, top, bottom);
document.save("joinedPage.pdf");
}
(JoinPages test testJoinSmallAndBig)
The result looks like this:

Just as an additional point to @mkl's answer:
If anybody is looking to scale the PDFs before placing them on the page, use
contentStream.transform(Matrix.getScaleInstance(<scaling factor in x axis>, <scaling factor in y axis>)); // a factor of 1 keeps the original size
This way you can rescale your PDFs.

Drawing vector images on PDF with PDFBox

I would like to draw a vector image on a PDF with Apache PDFBox.
This is the code I use to draw regular images
PDPage page = (PDPage) document.getDocumentCatalog().getAllPages().get(1);
PDPageContentStream contentStream = new PDPageContentStream(document, page, true, true);
BufferedImage _prevImage = ImageIO.read(new FileInputStream("path/to/image.png"));
PDPixelMap prevImage = new PDPixelMap(document, _prevImage);
contentStream.drawXObject(prevImage, prevX, prevY, imageWidth, imageHeight);
If I use a svg or wmf image instead of png, the resulting PDF document comes corrupted.
The main reason I want the image to be a vector image is that with PNG or JPG the image looks horrible, I think it gets somehow compressed so it looks bad. With vector images this shouldn't happen (well, when I export svg paths as PDF in Inkscape it doesn't happen, vector paths are preserved).
Is there a way to draw a svg or wmf (or other vector) to PDF using Apache PDFBox?
I'm currently using PDFBox 1.8, if that matters.

See the library pdfbox-graphics2d, touted in this Jira.
You can draw the SVG, via Batik or Salamander or whatever, onto the class PdfBoxGraphics2D, which is parallel to iText's template.createGraphics(). See the GitHub page for samples.
PDDocument document = ...;
PDPage page = ...; // page whereon to draw
String svgXML = "<svg>...</svg>";
double leftX = ...;
double bottomY = ...; // PDFBox coordinates are oriented bottom-up!
// I set these to the SVG size, which I calculated via Salamander.
// Maybe it doesn't matter, as long as the SVG fits on the graphic.
float graphicsWidth = ...;
float graphicsHeight = ...;
// Draw the SVG onto temporary graphics.
var graphics = new PdfBoxGraphics2D(document, graphicsWidth, graphicsHeight);
try {
int x = 0;
int y = 0;
drawSVG(svgXML, graphics, x, y); // with Batik, Salamander, or whatever you like
} finally {
graphics.dispose();
}
// Graphics are not visible till a PDFormXObject is added.
var xform = graphics.getXFormObject();
try (var contentWriter = new PDPageContentStream(document, page, AppendMode.APPEND, false)) { // false = don't compress
// XForm objects have to be placed via transform,
// since they cannot be placed via coordinates like images.
var transform = AffineTransform.getTranslateInstance(leftX, bottomY);
xform.setMatrix(transform);
// Now the graphics become visible.
contentWriter.drawForm(xform);
}
And ... in case you want also to scale the SVG graphics to 25% size:
// Way 1: Scale the SVG beforehand
svgXML = String.format("<svg transform=\"scale(%f)\">%s</svg>", .25, svgXML);
// Way 2: Scale in the transform (before calling xform.setMatrix())
transform.concatenate(AffineTransform.getScaleInstance(.25, .25));
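
To see what "Way 2" does to the coordinates, the following JDK-only sketch applies the same translate-then-concatenate-scale transform to a single point. The class and method names are hypothetical; only java.awt.geom is used, so it runs without PDFBox.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

public class ScaleThenTranslate {

    /** Maps a point of the form XObject to page coordinates. */
    public static Point2D place(double leftX, double bottomY, double scale, double x, double y) {
        AffineTransform transform = AffineTransform.getTranslateInstance(leftX, bottomY);
        // concatenate() makes the scale apply to the point first, the translation second,
        // so the lower-left corner of the graphic stays at (leftX, bottomY).
        transform.concatenate(AffineTransform.getScaleInstance(scale, scale));
        return transform.transform(new Point2D.Double(x, y), null);
    }

    public static void main(String[] args) {
        Point2D p = place(100, 200, 0.25, 40, 80);
        System.out.println(p.getX() + ", " + p.getY()); // prints 110.0, 220.0
    }
}
```

Because the scale is concatenated after the translation is set up, the origin of the graphic still lands at (leftX, bottomY) and only the content shrinks.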

I do this, but not directly.
First, convert your SVG documents into PDF documents with the FOP library and Batik:
https://xmlgraphics.apache.org/fop/dev/design/svg.html
Second, use LayerUtility in PDFBox to import the new PDF document as a PDXObjectForm. After that, you just need to include the PDXObjectForm in your final PDF document.

The final working solution for me that loads an SVG file and overlays it on a PDF file (this renders the SVG in a 500x500 box at (0,0) coordinate which is bottom left of the PDF document):
package com.example.svgadder;
import java.io.*;
import java.nio.*;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.pdmodel.PDPageContentStream.AppendMode;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import de.rototor.pdfbox.graphics2d.PdfBoxGraphics2D;
import java.awt.geom.AffineTransform;
import com.kitfox.svg.SVGDiagram;
import com.kitfox.svg.SVGException;
import com.kitfox.svg.SVGUniverse;
public class App
{
public static void main( String[] args ) throws Exception {
App app = new App();
}
public App() throws Exception {
// loading PDF and SVG files
File pdfFile = new File("input.pdf");
File svgFile = new File("input.svg");
PDDocument doc = PDDocument.load(pdfFile);
PDPage page = doc.getPage(0);
SVGUniverse svgUniverse = new SVGUniverse();
SVGDiagram diagram = svgUniverse.getDiagram(svgUniverse.loadSVG(svgFile.toURI().toURL()));
PdfBoxGraphics2D graphics = new PdfBoxGraphics2D(doc, 500, 500);
try {
diagram.render(graphics);
} finally {
graphics.dispose();
}
PDFormXObject xform = graphics.getXFormObject();
try (PDPageContentStream contentWriter = new PDPageContentStream(doc, page, AppendMode.APPEND, false)) {
AffineTransform transform = AffineTransform.getTranslateInstance(0, 0);
xform.setMatrix(transform);
contentWriter.drawForm(xform);
}
doc.save("res.pdf");
doc.close();
}
}
Please use svgSalamander from here:
https://github.com/mgarin/svgSalamander
Please use what Coemgenus suggested for scaling your final overlaid SVG. I tried the 2nd option and it works well.
Nirmal

Setting a text style to underlined in PDFBox

I'm trying to add underlined text to a blank PDF page using PDFBox, but I haven't been able to find any examples online. All the questions on Stack Overflow are about extracting underlined text, not creating it. Has this function not been implemented in PDFBox? Looking at the PDFBox documentation, it seems that fonts are provided as separate bold, italic, and regular variants.
For example, Times New Roman Regular is denoted as:
PDFont font = PDType1Font.TIMES_ROMAN.
Times New Roman Bold is denoted as:
PDFont font = PDType1Font.TIMES_BOLD
Italicized is denoted as:
PDFont font = PDType1Font.TIMES_ITALIC
There seems to be no underlined option. Is there any way to underline text, or is this not a feature?

I'm not sure if this is a better alternative or not, but I followed Tilman Hausherr's suggestion and drew a line relative to my text. For instance, I have the following:
public void processPDF(int xOne, int yOne, int xTwo, int yTwo)
{
//create pdf and its contents for one page
PDDocument document = new PDDocument();
File file = new File("hello.pdf");
PDPage page = new PDPage();
document.addPage(page);
PDFont font = PDType1Font.HELVETICA_BOLD;
float largeTitle = 18; // font size; assumed value, it was not defined in the original snippet
PDPageContentStream contentStream;
try {
//create content stream
contentStream = new PDPageContentStream(document, page);
//begin to create our text for our page
contentStream.beginText();
contentStream.setFont( font, largeTitle);
//position of text
contentStream.moveTextPositionByAmount(xOne, yOne);
contentStream.drawString("Hello");
contentStream.endText();
//begin to draw our line
contentStream.drawLine(xOne, yOne - .5f, xTwo, yTwo - .5f);
//close the content stream, then save and close the document
contentStream.close();
document.save(file);
document.close();
} catch (Exception e) {
e.printStackTrace();
}
}
where the parameters xOne, yOne, xTwo, and yTwo give the location of the text: the text starts at (xOne, yOne), and the line runs from there to (xTwo, yTwo). Subtracting .5 from yOne and yTwo moves the line a pinch below the text, so that it looks like underlined text.
There may be better ways, but this was the route I went.

I use the function below to underline a string.
public class UnderlineText {
PDFont font = PDType1Font.HELVETICA_BOLD;
float fontSize = 10f;
String str = "Hello";
public static void main(String[] args) {
new UnderlineText().generatePDF(20, 200);
}
public void generatePDF(int sX, int sY)
{
//create pdf and its contents for one page
PDDocument document = new PDDocument();
File file = new File("underlinePdfbox.pdf");
PDPage page = new PDPage();
PDPageContentStream contentStream;
try {
document.addPage(page);
//create content stream
contentStream = new PDPageContentStream(document, page);
//begin text for our page
contentStream.beginText();
contentStream.setFont( font, fontSize);
contentStream.newLineAtOffset(sX, sY);
contentStream.showText(str);
contentStream.endText();
//Draw Underline
drawLine(contentStream, str, 1, sX, sY, -2);
//close and save document
contentStream.close();
document.save(file);
document.close();
} catch (Exception e) {
e.printStackTrace();
}
}
public void drawLine(PDPageContentStream contentStream, String text, float lineWidth, float sx, float sy, float linePosition) throws IOException {
//Calculate String width
float stringWidth = fontSize * font.getStringWidth(text) / 1000;
float lineEndPoint = sx + stringWidth;
//begin to draw our line
contentStream.setLineWidth(lineWidth);
contentStream.moveTo(sx, sy + linePosition);
contentStream.lineTo(lineEndPoint, sy + linePosition);
contentStream.stroke();
}
}
drawLine is a function I created to draw a line for a specific string. You can adjust the line using the linePosition argument.
A negative value puts the line below the baseline, which gives an underline; positive values give a strike-through or an overline (for the code above, for example: -2 for an underline, 2 for a strike-through, 10 for an overline).
You can also control the width of the line.
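
The arithmetic behind drawLine can be checked without PDFBox. The sketch below assumes the string width is already known in thousandths of the font size, which is the unit the answer divides by 1000 after calling font.getStringWidth. The class name, the method names, and the sample width of 2500 are hypothetical.

```java
public class UnderlineGeometry {

    /** Converts a width given in thousandths of the font size to page units. */
    public static float widthInPoints(float glyphSpaceWidth, float fontSize) {
        return fontSize * glyphSpaceWidth / 1000f;
    }

    /** Returns {startX, startY, endX, endY} of a horizontal line offset from the text baseline. */
    public static float[] line(float baselineX, float baselineY,
                               float glyphSpaceWidth, float fontSize, float offset) {
        float endX = baselineX + widthInPoints(glyphSpaceWidth, fontSize);
        float y = baselineY + offset; // negative offset: below the baseline (underline)
        return new float[] { baselineX, y, endX, y };
    }

    public static void main(String[] args) {
        // Text at (20, 200), font size 10, assumed string width of 2500 units, underline offset -2.
        float[] l = line(20, 200, 2500, 10, -2);
        System.out.println(l[0] + ", " + l[1] + " -> " + l[2] + ", " + l[3]);
        // prints 20.0, 198.0 -> 45.0, 198.0
    }
}
```

The four values correspond to the arguments of the moveTo and lineTo calls in drawLine.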

Try this answer:
highlight text using pdfbox when it's location in the pdf is known
This method uses PDAnnotationTextMarkup, which supports four subtypes:
/**
* The types of annotation.
*/
public static final String SUB_TYPE_HIGHLIGHT = "Highlight";
/**
* The types of annotation.
*/
public static final String SUB_TYPE_UNDERLINE = "Underline";
/**
* The types of annotation.
*/
public static final String SUB_TYPE_SQUIGGLY = "Squiggly";
/**
* The types of annotation.
*/
public static final String SUB_TYPE_STRIKEOUT = "StrikeOut";
Hope it helps
