How to make text invisible in an existing PDF - java

I want to make all the text in an existing PDF transparent.
Option 1: select all the text, find a color property and change it to "colorless"
Or, if there is no such property
Option 2: Parse the page content Stream and all Form XObjects for that page, detect text blocks (BT/ET), and set the render mode to invisble.
This seems to be a complex operation.
Here is my example file
The following code is generating PDF(example pdf file):
Document document = new Document(new Rectangle(width, height));
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(filename));
document.open();
PdfContentByte picCanvas = null;
PdfContentByte txtCanvas = null;
if (isUnderPic) {
txtCanvas = writer.getDirectContentUnder();
picCanvas = writer.getDirectContent();
} else {
txtCanvas = writer.getDirectContent();
picCanvas = writer.getDirectContentUnder();
}
BaseFont bf = null;
if (null != pageList) {
int[] dpi = { 0, 0 };
if (dpiType == 1) {
dpi[0] = 300;
dpi[1] = 300;
} else if (dpiType == 2) {
dpi[0] = 600;
dpi[1] = 600;
}
for (int i = 0; i < pageList.size(); i++) {
PDFPage page = pageList.get(i);
Image pageImage = null;
if (pdfType == 3) {
pageImage = Image.getInstance(page.getBinImage());
} else {
pageImage = Image.getInstance(page.getOriImage());
}
if (pageImage.getWidth() > 0) {
pageImage.scaleAbsolute(page.getWidth(), page.getHeight());
}
pageImage.setAbsolutePosition(0, 0);
picCanvas.addImage(pageImage);
if (pdfType == 2 || pdfType == 3) {
for (PageElement ele : page.getElementList()) {
if (ele.getType().equals(PDFConstant.ElementType.PDF_ELEMENT_CHAR)) {
txtCanvas.beginText();
if (isColor) {
txtCanvas.setTextRenderingMode(PdfContentByte.TEXT_RENDER_MODE_FILL);
txtCanvas.setColorFill(BaseColor.RED);
} else {
txtCanvas.setTextRenderingMode(PdfContentByte.TEXT_RENDER_MODE_INVISIBLE);
}
String font = ele.getFont();
try {
bf = fonts.get(font);
if (null == bf) {
bf = BaseFont.createFont(font, "UniGB-UCS2-H", BaseFont.NOT_EMBEDDED);
fonts.put(font, bf);
}
} catch (Exception e) {
bf = BaseFont.createFont("STSong-Light", "UniGB-UCS2-H", BaseFont.NOT_EMBEDDED);
fonts.put(font, bf);
}
txtCanvas.setFontAndSize(bf, ele.getFontSize());
txtCanvas.setTextMatrix(ele.getPageX(), ele.getPageY(page.getRcInPage()));
txtCanvas.showText(ele.getCode());
txtCanvas.endText();
}
}
}
if (StringUtils.isNotBlank(cutPath)) {
for (PageElement ele : page.getElementList()) {
if (ele.getType().equals(PDFConstant.ElementType.PDF_ELEMENT_PIC) && StringUtils.isNotBlank(ele.getCutPicSrc())) {
ImageTools.cutPic(ele.getRcInImage(), page.getOriImage(), ele.getCutPicSrc(), dpi);
}
}
}
if (pdfType == 3) {
logger.debug("pdfType == 3");
for (PageElement ele : page.getElementList()) {
if (ele.getType().equals(PDFConstant.ElementType.PDF_ELEMENT_PIC) && StringUtils.isNotBlank(ele.getCutPicSrc())) {
if (new File(ele.getCutPicSrc()).exists()) {
Image cutCover = Image.getInstance(ImageTools.drawImage((int) ele.getWidth(), (int) ele.getHeight()));
if (cutCover.getWidth() > 0) {
cutCover.scaleAbsolute(ele.getWidth(), ele.getHeight());
}
cutCover.setAbsolutePosition(ele.getPageX(), ele.getPageY(page.getRcInPage()));
picCanvas.addImage(cutCover);
Image pic = Image.getInstance(ele.getCutPicSrc());
if (pic.getWidth() > 0) {
pic.scaleAbsolute(ele.getWidth(), ele.getHeight());
}
pic.setAbsolutePosition(ele.getPageX(), ele.getPageY(page.getRcInPage()));
picCanvas.addImage(pic);
}
}
}
}
if (i + 1 < pageList.size()) {
document.setPageSize(new Rectangle(pageList.get(i + 1).getWidth(), pageList.get(i + 1).getHeight()));
} else {
document.setPageSize(new Rectangle(pageList.get(i).getWidth(), pageList.get(i).getHeight()));
}
document.newPage();
}
}
document.close();

I've taken a look at your PDF and I see that the PDF is a scanned image. The text isn't really text: it consists of an image. Your question is invalid because it assumes that the text consists of vector data (defined using PDF syntax, such as BT and ET). In reality, the text is a bunch of pixels and any pixel doesn't know whether it belongs to a text glyph or an image. In short: you're using the wrong approach. You are trying to solve a problem using PDF software whereas you should be using a tool that manipulates raster images.
This is the image I extracted from the PDF:
The OP claims that there are two layers: one with an image, one with text. That may very well be true, but the image also contains rasterized text and it is impossible to remove that text from the image by changing the PDF syntax.
You may be able to cover the text if you know the coordinates, but that will largely depend on the accuracy of the OCR operation.
If your requirement is not to cover the text in the image, but the text of the vector layer, it's sufficient to add the syntax that adds the image after the syntax that adds the vector text. If the image is opaque, it will cover all the text. This is done in the RepeatImage example:
PdfReader reader = new PdfReader(src);
// We assume that there's a single large picture on the first page
PdfDictionary page = reader.getPageN(1);
PdfDictionary resources = page.getAsDict(PdfName.RESOURCES);
PdfDictionary xobjects = resources.getAsDict(PdfName.XOBJECT);
PdfName imgName = xobjects.getKeys().iterator().next();
Image img = Image.getInstance((PRIndirectReference)xobjects.getAsIndirectObject(imgName));
img.setAbsolutePosition(0, 0);
img.scaleAbsolute(reader.getPageSize(1));
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
stamper.getOverContent(1).addImage(img);
stamper.close();
reader.close();
Take a look at the resulting PDF; now you can still select the vector text, but it's no longer visible.

Related

Java Apache POI: insert an image "infront the text"

I have a placeholder image in my docx file and I want to replace it with new image. The problem is - the placeholder image has an attribute "in front of text", but the new image has not. As a result the alignment breaks. Here is my code snippet and the docx with placeholder and the resulting docx.
.......
replaceImage(doc, "Рисунок 1", qr, 50, 50);
ByteArrayOutputStream out = new ByteArrayOutputStream();
doc.write(out);
out.close();
return out.toByteArray();
}
}
public XWPFDocument replaceImage(XWPFDocument document, String imageOldName, byte[] newImage, int newImageWidth, int newImageHeight) throws Exception {
try {
int imageParagraphPos = -1;
XWPFParagraph imageParagraph = null;
List<IBodyElement> documentElements = document.getBodyElements();
for (IBodyElement documentElement : documentElements) {
imageParagraphPos++;
if (documentElement instanceof XWPFParagraph) {
imageParagraph = (XWPFParagraph) documentElement;
if (imageParagraph.getCTP() != null && imageParagraph.getCTP().toString().trim().contains(imageOldName)) {
break;
}
}
}
if (imageParagraph == null) {
throw new Exception("Unable to replace image data due to the exception:\n"
+ "'" + imageOldName + "' not found in in document.");
}
ParagraphAlignment oldImageAlignment = imageParagraph.getAlignment();
// remove old image
boolean isDeleted = document.removeBodyElement(imageParagraphPos);
// now add new image
XWPFParagraph newImageParagraph = document.createParagraph();
XWPFRun newImageRun = newImageParagraph.createRun();
newImageParagraph.setAlignment(oldImageAlignment);
try (InputStream is = new ByteArrayInputStream(newImage)) {
newImageRun.addPicture(is, XWPFDocument.PICTURE_TYPE_JPEG, "qr",
Units.toEMU(newImageWidth), Units.toEMU(newImageHeight));
}
// set new image at the old image position
document.setParagraph(newImageParagraph, imageParagraphPos);
// NOW REMOVE REDUNDANT IMAGE FORM THE END OF DOCUMENT
document.removeBodyElement(document.getBodyElements().size() - 1);
return document;
} catch (Exception e) {
throw new Exception("Unable to replace image '" + imageOldName + "' due to the exception:\n" + e);
}
}
The image with placeholder:
enter image description here
The resulting image:
enter image description here
To replace picture templates in Microsoft Word there is no need to delete them.
The storage is as so:
The embedded media is stored as binary file. This is the picture data (XWPFPictureData). In the document a picture element (XWPFPicture) links to that picture data.
The XWPFPicture has settings for position, size and text flow. These dont need to be changed.
The changing is needed in XWPFPictureData. There one can replace the old binary content with the new.
So the need is to find the XWPFPicture in the document. There is a non visual picture name stored while inserting the picture in the document. So if one knows that name, then this could be a criteriea to find the picture.
If found one can get the XWPFPictureData from found XWPFPicture. There is method XWPFPicture.getPictureDatato do so. Then one can replace the old binary content of XWPFPictureData with the new. XWPFPictureData is a package part. So it has PackagePart.getOutputStream to get an output stream to write to.
Following complete example shows that all.
The source.docx needs to have an embedded picture named "QRTemplate.jpg". This is the name of the source file used while inserting the picture into Word document using Word GUI. And there needs to be a file QR.jpg which contains the new content.
The result.docx then has all pictures named "QRTemplate.jpg" replaced with the content of the given file QR.jpg.
import java.io.FileInputStream;
import java.io.OutputStream;
import java.io.FileOutputStream;
import org.apache.poi.xwpf.usermodel.*;
public class WordReplacePictureData {
static XWPFPicture getPictureByName(XWPFRun run, String pictureName) {
if (pictureName == null) return null;
for (XWPFPicture picture : run.getEmbeddedPictures()) {
String nonVisualPictureName = picture.getCTPicture().getNvPicPr().getCNvPr().getName();
if (pictureName.equals(nonVisualPictureName)) {
return picture;
}
}
return null;
}
static void replacePictureData(XWPFPictureData source, String pictureResultPath) {
try ( FileInputStream in = new FileInputStream(pictureResultPath);
OutputStream out = source.getPackagePart().getOutputStream();
) {
byte[] buffer = new byte[2048];
int length;
while ((length = in.read(buffer)) > 0) {
out.write(buffer, 0, length);
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
static void replacePicture(XWPFRun run, String pictureName, String pictureResultPath) {
XWPFPicture picture = getPictureByName(run, pictureName);
if (picture != null) {
XWPFPictureData source = picture.getPictureData();
replacePictureData(source, pictureResultPath);
}
}
public static void main(String[] args) throws Exception {
String templatePath = "./source.docx";
String resultPath = "./result.docx";
String pictureTemplateName = "QRTemplate.jpg";
String pictureResultPath = "./QR.jpg";
try ( XWPFDocument document = new XWPFDocument(new FileInputStream(templatePath));
FileOutputStream out = new FileOutputStream(resultPath);
) {
for (IBodyElement bodyElement : document.getBodyElements()) {
if (bodyElement instanceof XWPFParagraph) {
XWPFParagraph paragraph = (XWPFParagraph)bodyElement;
for (XWPFRun run : paragraph.getRuns()) {
replacePicture(run, pictureTemplateName, pictureResultPath);
}
}
}
document.write(out);
}
}
}
I have a dirty workaround. Since the text block on the right side of the image is static, I replaced the text with screen-shot on the original docx. And now, when the placeholder image been substituted by the new image, everything is rendered as expected.

Background image itextpdf 5.5

I'm using itextpdf 5.5, can't change it to 7.
I have the problem with background image.
I have a document (text and tables) without stamp and I want to add stamp to it.
This is how I download existing doc.
PdfReader pdfReader = new PdfReader("/doc.pdf");
PdfImportedPage page = writer.getImportedPage(pdfReader, 1);
PdfContentByte pcb = writer.getDirectContent();
pcb.addTemplate(page, 0,0);
And this is how I download stamp image and add it to my doc.
PdfContentByte canvas = writer.getDirectContentUnder();
URL resource = getClass().getResource(getStamp());
Image background = new Jpeg(resource);
background.scaleToFit(463F, 132F);
background.setAbsolutePosition(275F, 100F);
canvas.addImage(background);
But when I download my document - I don't see the stamp. I tried to change getDirectContent() to getDirectContentUnder() when I download my doc but this leads to the opposite situation - my stamp isn't in background.
My first doc is generated this way.
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
Document document = new Document(PageSize.A4);
try {
PdfWriter writer = PdfWriter.getInstance(document, outputStream);
document.open();
Paragraph title = new Paragraph(formatUtil.msg("my.header"), fontBold);
title.setAlignment(Element.ALIGN_CENTER);
document.add(title);
Template tmpl = fmConfig.getConfiguration().getTemplate("template.ftl");
Map<String, Object> params = new HashMap<>();
StringWriter writer = new StringWriter();
params.put("param", "param");
tmpl.process(params, writer);
document.add(new Paragraph(writer.toString(), fontCommon));
PdfPTable table = new PdfPTable(2);
document.add(table);
PdfContentByte canvas = writer.getDirectContentUnder();
Image background = new Jpeg(getClass().getResource("background.jpg"));
background.scaleAbsolute(PageSize.A4);
background.setAbsolutePosition(0,0);
canvas.addImage(background);
} finally {
if (document.isOpen()) {
document.close();
}
}
In comments it became clear that the task was to add some content (a bitmap image) to a PDF so that it is over the background (another bitmap image) added to the UnderContent during generation and under the text in the original DirectContent.
iText does not contain high-level code for such content manipulation. While the structure of iText generated PDFs would allow for such code, PDFs generated or manipulated by other PDF libraries may have a different structure; the structure in iText generated PDFs may even be changed during post-processing using other libraries. Thus, it is understandable that no high-level feature for this is provided by iText.
To implement the task nonetheless, therefore, we have to base our code on lower level iText APIs. In this context we can make use of the PdfContentStreamEditor helper class from this answer which already abstracts some details away. (That question originally is about iTextSharp for C# but further down in the answer Java versions of the code also are provided.)
In detail, we extend the PdfContentStreamEditor to remove the former UnderContent (and provide it as list of instructions). Now we can in a first step apply this editor to your file and in a second step add this former UnderContent plus an image over it to the intermediary file. (This could also be done in a single step but that would require a more complex, less maintainable editor class.)
First the new UnderContentRemover content stream editor class:
public class UnderContentRemover extends PdfContentStreamEditor {
/**
* Clears state of {#link UnderContentRemover}, in particular
* the collected content. Use this if you use this instance for
* multiple edit runs.
*/
public void clear() {
afterUnderContent = false;
underContent.clear();
depth = 0;
}
/**
* Retrieves the collected UnderContent instructions
*/
public List<List<PdfObject>> getUnderContent() {
return new ArrayList<List<PdfObject>>(underContent);
}
/**
* Adds the given instructions (which may previously have been
* retrieved using {#link #getUnderContent()}) to the given
* {#link PdfContentByte} instance.
*/
public static void write (PdfContentByte canvas, List<List<PdfObject>> operations) throws IOException {
for (List<PdfObject> operands : operations) {
int index = 0;
for (PdfObject object : operands) {
object.toPdf(canvas.getPdfWriter(), canvas.getInternalBuffer());
canvas.getInternalBuffer().append(operands.size() > ++index ? (byte) ' ' : (byte) '\n');
}
}
}
protected void write(PdfContentStreamProcessor processor, PdfLiteral operator, List<PdfObject> operands) throws IOException {
String operatorString = operator.toString();
if (afterUnderContent) {
super.write(processor, operator, operands);
return;
} else if ("q".equals(operatorString)) {
depth++;
} else if ("Q".equals(operatorString)) {
depth--;
if (depth < 1)
afterUnderContent = true;
} else if (depth == 0) {
afterUnderContent = true;
super.write(processor, operator, operands);
return;
}
underContent.add(new ArrayList<>(operands));
}
boolean afterUnderContent = false;
List<List<PdfObject>> underContent = new ArrayList<>();
int depth = 0;
}
(UnderContentRemover)
As you see, its write method stores the leading instructions forwarded to it in the underContent list until it finds the restore-graphics-state (Q) instruction matching the initial save-graphics-state instruction (q). After that it instead forwards all further instructions to the parent write implementation which writes them to the edited page content.
We can use this for the task at hand as follows:
PdfReader pdfReader = new PdfReader(YOUR_DOCUMENT);
List<List<List<PdfObject>>> underContentByPage = new ArrayList<>();
byte[] sourceWithoutUnderContent = null;
try ( ByteArrayOutputStream outputStream = new ByteArrayOutputStream() ) {
PdfStamper pdfStamper = new PdfStamper(pdfReader, outputStream);
UnderContentRemover underContentRemover = new UnderContentRemover();
for (int i = 1; i <= pdfReader.getNumberOfPages(); i++) {
underContentRemover.clear();
underContentRemover.editPage(pdfStamper, i);
underContentByPage.add(underContentRemover.getUnderContent());
}
pdfStamper.close();
pdfReader.close();
sourceWithoutUnderContent = outputStream.toByteArray();
}
Image background = YOUR_IMAGE_TO_ADD_INBETWEEN;
background.scaleToFit(463F, 132F);
background.setAbsolutePosition(275F, 100F);
pdfReader = new PdfReader(sourceWithoutUnderContent);
byte[] sourceWithStampInbetween = null;
try ( ByteArrayOutputStream outputStream = new ByteArrayOutputStream() ) {
PdfStamper pdfStamper = new PdfStamper(pdfReader, outputStream);
for (int i = 1; i <= pdfReader.getNumberOfPages(); i++) {
PdfContentByte canvas = pdfStamper.getUnderContent(i);
UnderContentRemover.write(canvas, underContentByPage.get(i-1));
canvas.addImage(background);
}
pdfStamper.close();
pdfReader.close();
sourceWithStampInbetween = outputStream.toByteArray();
}
Files.write(new File("PdfLikeVladimirSafonov-WithStampInbetween.pdf").toPath(), sourceWithStampInbetween);
(AddImageInBetween test testForVladimirSafonov)

PDFBox join two pdfside by side optimizing disk space

I am using PDFBox to join two PDFs side by side.
I am using the following code:
PDDocument outDoc = new PDDocument();
int maxPages = targetDoc.getNumberOfPages();
if (sourceDoc.getNumberOfPages() > targetDoc.getNumberOfPages()) {
maxPages = sourceDoc.getNumberOfPages();
}
PDPage sourceIndexPage;
PDPage targetIndexPage;
PDRectangle pdf1Frame;
PDRectangle pdf2Frame;
PDRectangle outPdfFrame;
COSDictionary dict;
PDPage outPdfPage;
LayerUtility layerUtility;
PDFormXObject sourceFormPDF;
PDFormXObject targetFormPDF;
AffineTransform afLeft;
AffineTransform afRight;
for (int indexPage = 0; indexPage < maxPages; indexPage++) {
// Create output PDF frame
try {
sourceIndexPage = sourceDoc.getPage(indexPage);
} catch (IndexOutOfBoundsException error) {
sourceDoc.addPage(new PDPage());
sourceIndexPage = targetDoc.getPage(indexPage);
}
try {
targetIndexPage = targetDoc.getPage(indexPage);
} catch (IndexOutOfBoundsException error) {
targetDoc.addPage(new PDPage());
targetIndexPage = targetDoc.getPage(indexPage);
}
sourceIndexPage.setRotation(0);
targetIndexPage.setRotation(0);
pdf1Frame = sourceIndexPage.getCropBox();
pdf2Frame = targetIndexPage.getCropBox();
outPdfFrame = new PDRectangle(pdf1Frame.getWidth() + pdf2Frame.getWidth(),
Math.max(pdf1Frame.getHeight(), pdf2Frame.getHeight()));
// Create output page with calculated frame and add it to the document
dict = new COSDictionary();
dict.setItem(COSName.TYPE, COSName.PAGE);
dict.setItem(COSName.MEDIA_BOX, outPdfFrame);
dict.setItem(COSName.CROP_BOX, outPdfFrame);
dict.setItem(COSName.ART_BOX, outPdfFrame);
outPdfPage = new PDPage(dict);
outDoc.addPage(outPdfPage);
// Source PDF pages has to be imported as form XObjects to be able to insert them at a specific point in the output page
// pageNumber
layerUtility = new LayerUtility(outDoc);
sourceFormPDF = layerUtility.importPageAsForm(sourceDoc, indexPage);
targetFormPDF = layerUtility.importPageAsForm(targetDoc, indexPage);
// Add form objects to output page
afLeft = new AffineTransform();
layerUtility.appendFormAsLayer(outPdfPage, sourceFormPDF, afLeft, "left " + indexPage);
afRight = AffineTransform.getTranslateInstance(pdf1Frame.getWidth(), 0.0);
layerUtility.appendFormAsLayer(outPdfPage, targetFormPDF, afRight, "right" + indexPage);
}
outDoc.save("oudDoc.pdf");
The issue I have is that for some documents, the size of the outDoc is too high. I expected it to be something around dim source document + dim target document, but it is 10x, 20x more in reality.
Looking inside the document's structure, I noticed that I am repeating common resources that in the original PDFs were separated. Is there a way to compress/optimize my code to have less space on disk?
We solved the problem by postprocessing the generated pdf with ghostscript

PDFBox: How to "flatten" a PDF-form?

How do I "flatten" a PDF-form (remove the form-field but keep the text of the field) with PDFBox?
Same question was answered here:
a quick way to do this, is to remove the fields from the acrofrom.
For this you just need to get the document catalog, then the acroform
and then remove all fields from this acroform.
The graphical representation is linked with the annotation and stay in
the document.
So I wrote this code:
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
public class PdfBoxTest {
public void test() throws Exception {
PDDocument pdDoc = PDDocument.load(new File("E:\\Form-Test.pdf"));
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
if (acroForm == null) {
System.out.println("No form-field --> stop");
return;
}
#SuppressWarnings("unchecked")
List<PDField> fields = acroForm.getFields();
// set the text in the form-field <-- does work
for (PDField field : fields) {
if (field.getFullyQualifiedName().equals("formfield1")) {
field.setValue("Test-String");
}
}
// remove form-field but keep text ???
// acroForm.getFields().clear(); <-- does not work
// acroForm.setFields(null); <-- does not work
// acroForm.setFields(new ArrayList()); <-- does not work
// ???
pdDoc.save("E:\\Form-Test-Result.pdf");
pdDoc.close();
}
}
With PDFBox 2 it's now possible to "flatten" a PDF-form easily by calling the flatten method on a PDAcroForm object. See Javadoc: PDAcroForm.flatten().
Simplified code with an example call of this method:
//Load the document
PDDocument pDDocument = PDDocument.load(new File("E:\\Form-Test.pdf"));
PDAcroForm pDAcroForm = pDDocument.getDocumentCatalog().getAcroForm();
//Fill the document
...
//Flatten the document
pDAcroForm.flatten();
//Save the document
pDDocument.save("E:\\Form-Test-Result.pdf");
pDDocument.close();
Note: dynamic XFA forms cannot be flatten.
For migration from PDFBox 1.* to 2.0, take a look at the official migration guide.
This works for sure - I've ran into this problem, debugged all-night, but finally figured out how to do this :)
This is assuming that you have capability to edit the PDF in some way/have some control over the PDF.
First, edit the forms using Acrobat Pro. Make them hidden and read-only.
Then you need to use two libraries: PDFBox and PDFClown.
PDFBox removes the thing that tells Adobe Reader that it's a form; PDFClown removes the actual field. PDFClown must be done first, then PDFBox (in that order. The other way around doesn't work).
Single field example code:
// PDF Clown code
File file = new File("Some file path");
Document document = file.getDocument();
Form form = file.getDocument.getForm();
Fields fields = form.getFields();
Field field = fields.get("some_field_name");
PageStamper stamper = new PageStamper();
FieldWidgets widgets = field.getWidgets();
Widget widget = widgets.get(0); // Generally is 0.. experiment to figure out
stamper.setPage(widget.getPage());
// Write text using text form field position as pivot.
PrimitiveComposer composer = stamper.getForeground();
Font font = font.get(document, "some_path");
composer.setFont(font, 10);
double xCoordinate = widget.getBox().getX();
double yCoordinate = widget.getBox().getY();
composer.showText("text i want to display", new Point2D.Double(xCoordinate, yCoordinate));
// Actually delete the form field!
field.delete();
stamper.flush();
// Create new buffer to output to...
Buffer buffer = new Buffer();
file.save(buffer, SerializationModeEnum.Standard);
byte[] bytes = buffer.toByteArray();
// PDFBox code
InputStream pdfInput = new ByteArrayInputStream(bytes);
PDDocument pdfDocument = PDDocument.load(pdfInput);
// Tell Adobe we don't have forms anymore.
PDDocumentCatalog pdCatalog = pdfDocument.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
COSDictionary acroFormDict = acroForm.getDictionary();
COSArray cosFields = (COSArray) acroFormDict.getDictionaryObject("Fields");
cosFields.clear();
// Phew. Finally.
pdfDocument.save("Some file path");
Probably some typos here and there, but this should be enough to get the gist :)
After reading about pdf reference guide, I have discovered that you can quite easily set read-only mode for AcroForm fields by adding "Ff" key (Field flags) with value 1.
This is what documentation stands about that:
If set, the user may not change the value of the field.
Any associated widget annotations will not interact
with the user; that is, they will not respond to mouse
clicks or change their appearance in response to
mouse motions. This flag is useful for fields whose
values are computed or imported from a database.
so the code could look like that (using pdfbox lib):
public static void makeAllWidgetsReadOnly(PDDocument pdDoc) throws IOException {
PDDocumentCatalog catalog = pdDoc.getDocumentCatalog();
PDAcroForm form = catalog.getAcroForm();
List<PDField> acroFormFields = form.getFields();
System.out.println(String.format("found %d acroFrom fields", acroFormFields.size()));
for(PDField field: acroFormFields) {
makeAcroFieldReadOnly(field);
}
}
private static void makeAcroFieldReadOnly(PDField field) {
field.getDictionary().setInt("Ff",1);
}
setReadOnly did work for me as shown below -
#SuppressWarnings("unchecked")
List<PDField> fields = acroForm.getFields();
for (PDField field : fields) {
if (field.getFullyQualifiedName().equals("formfield1")) {
field.setReadOnly(true);
}
}
Solution to flattening acroform AND retaining the form field values using pdfBox:
see solution at https://mail-archives.apache.org/mod_mbox/pdfbox-users/201604.mbox/%3C3BC7E352-9447-4458-AAC3-5A9B70B4CCAA#fileaffairs.de%3E
The solution that worked for me with pdfbox 2.0.1:
File myFile = new File("myFile.pdf");
PDDocument pdDoc = PDDocument.load(myFile);
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm pdAcroForm = pdCatalog.getAcroForm();
// set the NeedAppearances flag to false
pdAcroForm.setNeedAppearances(false);
field.setValue("new-value");
pdAcroForm.flatten();
pdDoc.save("myFlattenedFile.pdf");
pdDoc.close();
I didn't need to do the 2 extra steps in the above solution link:
// correct the missing page link for the annotations
// Add the missing resources to the form
I created my pdf form in OpenOffice 4.1.1 and exported to pdf. The 2 items selected in the OpenOffice export dialogue were:
selected "create Pdf Form"
Submit format of "PDF" - I found this gave smaller pdf file size than selecting "FDF" but still operated as a pdf form.
Using PdfBox I populated the form fields and created a flattened pdf file that removed the form fields but retained the form field values.
In order to really "flatten" an acrobat form field there seems to be much more to do than at the first glance.
After examining the PDF standard I managed to achieve real flatening in three steps:
save field value
remove widgets
remove form field
All three steps can be done with pdfbox (I used 1.8.5). Below I will sketch how I did it.
A very helpful tool in order to understand whats going on is the PDF Debugger.
Save the field
This is the most complicated step of the three.
In order to save the field's value you have to save its content to the pdf's content for each of the field's widgets. Easiest way to do so is drawing each widget's appearance to the widget's page.
void saveFieldValue( PDField field ) throws IOException
{
PDDocument document = getDocument( field );
// see PDField.getWidget()
for( PDAnnotationWidget widget : getWidgets( field ) )
{
PDPage parentPage = getPage( widget );
try (PDPageContentStream contentStream = new PDPageContentStream( document, parentPage, true, true ))
{
writeContent( contentStream, widget );
}
}
}
void writeContent( PDPageContentStream contentStream, PDAnnotationWidget widget )
throws IOException
{
PDAppearanceStream appearanceStream = getAppearanceStream( widget );
PDXObject xobject = new PDXObjectForm( appearanceStream.getStream() );
AffineTransform transformation = getPositioningTransformation( widget.getRectangle() );
contentStream.drawXObject( xobject, transformation );
}
The appearance is an XObject stream containing all of the widget's content (value, font, size, rotation, etc.). You simply need to place it at the right position on the page which you can extract from the widget's rectangle.
Remove widgets
As noted above each field may have multiple widgets. A widget takes care of how a form field can be edited, triggers, displaying when not editing and such stuff.
In order to remove one you have to remove it from its page's annotations.
void removeWidget( PDAnnotationWidget widget ) throws IOException
{
PDPage widgetPage = getPage( widget );
List<PDAnnotation> annotations = widgetPage.getAnnotations();
PDAnnotation deleteCandidate = getMatchingCOSObjectable( annotations, widget );
if( deleteCandidate != null && annotations.remove( deleteCandidate ) )
widgetPage.setAnnotations( annotations );
}
Note that the annotations may not contain the exact PDAnnotationWidget since it's a kind of a wrapper. You have to remove the one with matching COSObject.
Remove form field
As final step you remove the form field itself. This is not very different to the other posts above.
void removeFormfield( PDField field ) throws IOException
{
PDAcroForm acroForm = field.getAcroForm();
List<PDField> acroFields = acroForm.getFields();
List<PDField> removeCandidates = getFields( acroFields, field.getPartialName() );
if( removeAll( acroFields, removeCandidates ) )
acroForm.setFields( acroFields );
}
Note that I used a custom removeAll method here since the removeCandidates.removeAll() didn't work as expected for me.
Sorry that I cannot provide all the code here but with the above you should be able to write it yourself.
I don't have enough points to comment but SJohnson's response of setting the field to read only worked perfectly for me. I am using something like this with PDFBox:
private void setFieldValueAndFlatten(PDAcroForm form, String fieldName, String fieldValue) throws IOException {
PDField field = form.getField(fieldName);
if(field != null){
field.setValue(fieldValue);
field.setReadonly(true);
}
}
This will write your field value and then when you open the PDF after saving it will have your value and not be editable.
This is the code I came up with after synthesizing all of the answers I could find on the subject. This handles flattening text boxes, combos, lists, checkboxes, and radios:
public static void flattenPDF (PDDocument doc) throws IOException {
//
// find the fields and their kids (widgets) on the input document
// (each child widget represents an appearance of the field data on the page, there may be multiple appearances)
//
PDDocumentCatalog catalog = doc.getDocumentCatalog();
PDAcroForm form = catalog.getAcroForm();
List<PDField> tmpfields = form.getFields();
PDResources formresources = form.getDefaultResources();
Map formfonts = formresources.getFonts();
PDAnnotation ann;
//
// for each input document page convert the field annotations on the page into
// content stream
//
List<PDPage> pages = catalog.getAllPages();
Iterator<PDPage> pageiterator = pages.iterator();
while (pageiterator.hasNext()) {
//
// get next page from input document
//
PDPage page = pageiterator.next();
//
// add the fonts from the input form to this pages resources
// so the field values will display in the proper font
//
PDResources pageResources = page.getResources();
Map pageFonts = pageResources.getFonts();
pageFonts.putAll(formfonts);
pageResources.setFonts(pageFonts);
//
// Create a content stream for the page for appending
//
PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true);
//
// Find the appearance widgets for all fields on the input page and insert them into content stream of the page
//
for (PDField tmpfield : tmpfields) {
List widgets = tmpfield.getKids();
if(widgets == null) {
widgets = new ArrayList();
widgets.add(tmpfield.getWidget());
}
Iterator<COSObjectable> widgetiterator = widgets.iterator();
while (widgetiterator.hasNext()) {
COSObjectable next = widgetiterator.next();
if (next instanceof PDField) {
PDField foundfield = (PDField) next;
ann = foundfield.getWidget();
} else {
ann = (PDAnnotation) next;
}
if (ann.getPage().equals(page)) {
COSDictionary dict = ann.getDictionary();
if (dict != null) {
if(tmpfield instanceof PDVariableText || tmpfield instanceof PDPushButton) {
COSDictionary ap = (COSDictionary) dict.getDictionaryObject("AP");
if (ap != null) {
contentStream.appendRawCommands("q\n");
COSArray rectarray = (COSArray) dict.getDictionaryObject("Rect");
if (rectarray != null) {
float[] rect = rectarray.toFloatArray();
String s = " 1 0 0 1 " + Float.toString(rect[0]) + " " + Float.toString(rect[1]) + " cm\n";
contentStream.appendRawCommands(s);
}
COSStream stream = (COSStream) ap.getDictionaryObject("N");
if (stream != null) {
InputStream ioStream = stream.getUnfilteredStream();
ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
byte[] buffer = new byte[4096];
int amountRead = 0;
while ((amountRead = ioStream.read(buffer, 0, buffer.length)) != -1) {
byteArray.write(buffer, 0, amountRead);
}
contentStream.appendRawCommands(byteArray.toString() + "\n");
}
contentStream.appendRawCommands("Q\n");
}
} else if (tmpfield instanceof PDChoiceButton) {
COSDictionary ap = (COSDictionary) dict.getDictionaryObject("AP");
if(ap != null) {
contentStream.appendRawCommands("q\n");
COSArray rectarray = (COSArray) dict.getDictionaryObject("Rect");
if (rectarray != null) {
float[] rect = rectarray.toFloatArray();
String s = " 1 0 0 1 " + Float.toString(rect[0]) + " " + Float.toString(rect[1]) + " cm\n";
contentStream.appendRawCommands(s);
}
COSName cbValue = (COSName) dict.getDictionaryObject(COSName.AS);
COSDictionary d = (COSDictionary) ap.getDictionaryObject(COSName.D);
if (d != null) {
COSStream stream = (COSStream) d.getDictionaryObject(cbValue);
if(stream != null) {
InputStream ioStream = stream.getUnfilteredStream();
ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
byte[] buffer = new byte[4096];
int amountRead = 0;
while ((amountRead = ioStream.read(buffer, 0, buffer.length)) != -1) {
byteArray.write(buffer, 0, amountRead);
}
if (!(tmpfield instanceof PDCheckbox)){
contentStream.appendRawCommands(byteArray.toString() + "\n");
}
}
}
COSDictionary n = (COSDictionary) ap.getDictionaryObject(COSName.N);
if (n != null) {
COSStream stream = (COSStream) n.getDictionaryObject(cbValue);
if(stream != null) {
InputStream ioStream = stream.getUnfilteredStream();
ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
byte[] buffer = new byte[4096];
int amountRead = 0;
while ((amountRead = ioStream.read(buffer, 0, buffer.length)) != -1) {
byteArray.write(buffer, 0, amountRead);
}
contentStream.appendRawCommands(byteArray.toString() + "\n");
}
}
contentStream.appendRawCommands("Q\n");
}
}
}
}
}
}
// delete any field widget annotations and write it all to the page
// leave other annotations on the page
COSArrayList newanns = new COSArrayList();
List anns = page.getAnnotations();
ListIterator annotiterator = anns.listIterator();
while (annotiterator.hasNext()) {
COSObjectable next = (COSObjectable) annotiterator.next();
if (!(next instanceof PDAnnotationWidget)) {
newanns.add(next);
}
}
page.setAnnotations(newanns);
contentStream.close();
}
//
// Delete all fields from the form and their widgets (kids)
//
for (PDField tmpfield : tmpfields) {
List kids = tmpfield.getKids();
if(kids != null) kids.clear();
}
tmpfields.clear();
// Tell Adobe we don't have forms anymore.
PDDocumentCatalog pdCatalog = doc.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
COSDictionary acroFormDict = acroForm.getDictionary();
COSArray cosFields = (COSArray) acroFormDict.getDictionaryObject("Fields");
cosFields.clear();
}
Full class here:
https://gist.github.com/jribble/beddf7620536939f88db
This is the answer of Thomas, from the PDFBox-Mailinglist:
You will need to get the Fields over the COSDictionary. Try this
code...
PDDocument pdDoc = PDDocument.load(new File("E:\\Form-Test.pdf"));
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
COSDictionary acroFormDict = acroForm.getDictionary();
COSArray fields = acroFormDict.getDictionaryObject("Fields");
fields.clear();
I thought I'd share our approach that worked with PDFBox 2+.
We've used the PDAcroForm.flatten() method.
The fields needed some preprocessing and most importantly the nested field structure had to be traversed and DV and V checked for values.
Finally what worked was the following:
private static void flattenPDF(String src, String dst) throws IOException {
PDDocument doc = PDDocument.load(new File(src));
PDDocumentCatalog catalog = doc.getDocumentCatalog();
PDAcroForm acroForm = catalog.getAcroForm();
PDResources resources = new PDResources();
acroForm.setDefaultResources(resources);
List<PDField> fields = new ArrayList<>(acroForm.getFields());
processFields(fields, resources);
acroForm.flatten();
doc.save(dst);
doc.close();
}
private static void processFields(List<PDField> fields, PDResources resources) {
fields.stream().forEach(f -> {
f.setReadOnly(true);
COSDictionary cosObject = f.getCOSObject();
String value = cosObject.getString(COSName.DV) == null ?
cosObject.getString(COSName.V) : cosObject.getString(COSName.DV);
System.out.println("Setting " + f.getFullyQualifiedName() + ": " + value);
try {
f.setValue(value);
} catch (IOException e) {
if (e.getMessage().matches("Could not find font: /.*")) {
String fontName = e.getMessage().replaceAll("^[^/]*/", "");
System.out.println("Adding fallback font for: " + fontName);
resources.put(COSName.getPDFName(fontName), PDType1Font.HELVETICA);
try {
f.setValue(value);
} catch (IOException e1) {
e1.printStackTrace();
}
} else {
e.printStackTrace();
}
}
if (f instanceof PDNonTerminalField) {
processFields(((PDNonTerminalField) f).getChildren(), resources);
}
});
}
If the PDF document doesn't actually contain form fields but you still want to flatten other elements like markups, the following works quite well. FYI It was implemented for C#
public static void FlattenPdf(string fileName)
{
PDDocument doc = PDDocument.load(new java.io.File(fileName));
java.util.List annots = doc.getPage(0).getAnnotations();
for (int i = 0; i < annots.size(); ++i)
{
PDAnnotation annot = (PDAnnotation)annots.get(i);
annot.setLocked(true);
annot.setReadOnly(true);
annot.setNoRotate(true);
}
doc.save(fileName);
doc.close();
}
This effectively locks all markups in the document and they will no longer be editable.
pdfbox c# annotations

Don't know how to create a centered "image + text" watermark in pdf files with iText (Java)

I'm using the iText library, and I'm trying to add a watermark at the bottom of the page. The watermark is simple, it has to be centered an has an image on the left and a text on the right.
At this point, I have the image AND the text in a png format. I can calculate the position where I want to put the image (centered) calculating the page size and image size, but now I want to include the text AS text (better legibility, etc.).
Can I embed the image and the text in some component and then calculate the position like I'm doing now? Another solutions or ideas?
Here is my actual code:
try {
PdfReader reader = new PdfReader("example.pdf");
int numPages = reader.getNumberOfPages();
PdfStamper stamp = new PdfStamper(reader, new FileOutputStream("pdfWithWatermark.pdf"));
int i = 0;
Image watermark = Image.getInstance("watermark.png");
PdfContentByte addMark;
while (i < numPages) {
i++;
float x = reader.getPageSizeWithRotation(i).getWidth() - watermark.getWidth();
watermark.setAbsolutePosition(x/2, 15);
addMark = stamp.getUnderContent(i);
addMark.addImage(watermark);
}
stamp.close();
}
catch (Exception i1) {
logger.info("Exception adding watermark.");
i1.printStackTrace();
}
Thank you in advance!
you better check this:
import com.lowagie.text.*;
import java.io.*;
import com.lowagie.text.pdf.*;
import java.util.*;
class pdfWatermark
{
public static void main(String args[])
{
try
{
PdfReader reader = new PdfReader("text.pdf");
int n = reader.getNumberOfPages();
// Create a stamper that will copy the document to a new file
PdfStamper stamp = new PdfStamper(reader,
new FileOutputStream("text1.pdf"));
int i = 1;
PdfContentByte under;
PdfContentByte over;
Image img = Image.getInstance("watermark.jpg");
BaseFont bf = BaseFont.createFont(BaseFont.HELVETICA,
BaseFont.WINANSI, BaseFont.EMBEDDED);
img.setAbsolutePosition(200, 400);
while (i < n)
{
// Watermark under the existing page
under = stamp.getUnderContent(i);
under.addImage(img);
// Text over the existing page
over = stamp.getOverContent(i);
over.beginText();
over.setFontAndSize(bf, 18);
over.showText("page " + i);
over.endText();
i++;
}
stamp.close();
}
catch (Exception de)
{}
}
}
(source)
is a bit ugly but, can't you add the image and the text to a table and then center it?

Categories

Resources