Incorrect alignment of text in PDF field with PDFBox

Incorrect alignment of text in PDF field with PDFBox - java

I'm trying to compile a PDF form programmatically with Java, using PDFBox 1.8.16.
My workflow has been the following:
I exported my design with InDesign as a PDF form with named fields
I wrote a Java script that parses a CSV and compile the form programmatically
The problem I'm getting is on the alignment of the text inside the field, after the completion. As you can see, the image aligns to the bottom of the text field.
The strange behavior I'm getting is when I select the field just like I'd do to edit the content: when I do that, the text aligns correctly.
The code I'm using to fill the text field is:
private void setField(String fieldName, String content, String fontName, Boolean big) throws IOException {
PDDocumentCatalog docCatalog;
PDAcroForm acroForm;
PDField field;
COSDictionary dict;
COSString defaultAppearance;
docCatalog = pdfTemplate.getDocumentCatalog();
acroForm = docCatalog.getAcroForm();
field = acroForm.getField(fieldName);
dict = field.getDictionary();
defaultAppearance = (COSString) dict.getDictionaryObject(COSName.DA);
if (defaultAppearance != null) {
if (big) {
dict.setString(COSName.DA, "/" + fontName + " 16.17 Tf 0 g");
} else {
dict.setString(COSName.DA, "/" + fontName + " 8.67 Tf 0 g");
}
if (field instanceof PDTextbox) {
field = new PDTextbox(acroForm, dict);
field.setValue(content);
}
}
}
I don't know what other info could be useful, but I'm ready to provide anything I'm asked about.

Related

PDFBox: No glyph for U+0054 in font AAAAAD+FreeSerifBold

my PDFBox throws following error: No glyph for U+0054 in font AAAAAD+FreeSerifBold.
I found several similar threads on stackoverflow but I couldn't fix my problem by them.
My code is similar to code example:
public QuoteWorkerPdf() throws IOException {
// Create PDF with one blank page
document = PDDocument.load(
getClass().getResourceAsStream("data/quote_template.pdf"));
page = (PDPage) document.getDocumentCatalog().getPages().get(0);
printable = new PDFPrintable(document);
// get the document catalog
PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
// as there might not be an AcroForm entry a null check is necessary
if (acroForm != null)
{
// Retrieve an individual field and set its value.
PDTextField field = (PDTextField) acroForm.getField( "q2_quotationPrepared" );
field.setValue("TextEntry");
// If a field is nested within the form tree a fully qualified name
// might be provided to access the field.
//field = (PDTextField) acroForm.getField( "fieldsContainer.nestedSampleField" );
//field.setValue("Text Entry");
}
// Save and close the filled out form.
document.save("target/FillFormField.pdf");
}
U+0054 is "T" which is the first letter of the string.
For pdf form creation I use www.jotform.com.
Does anybody know how can I solve this?
Stacktrace:
Exception in thread "main" java.lang.IllegalArgumentException: No glyph for U+0054 in font AAAAAD+FreeSerifBold
at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.encode(PDCIDFontType2.java:363)
at org.apache.pdfbox.pdmodel.font.PDType0Font.encode(PDType0Font.java:398)
at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:324)
at org.apache.pdfbox.pdmodel.font.PDFont.getStringWidth(PDFont.java:353)
at org.apache.pdfbox.pdmodel.interactive.form.PlainText$Paragraph.getLines(PlainText.java:174)
at org.apache.pdfbox.pdmodel.interactive.form.PlainTextFormatter.format(PlainTextFormatter.java:182)
at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.insertGeneratedAppearance(AppearanceGeneratorHelper.java:508)
at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceContent(AppearanceGeneratorHelper.java:364)
at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceValue(AppearanceGeneratorHelper.java:237)
at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.constructAppearances(PDTextField.java:263)
at org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField.applyChange(PDTerminalField.java:228)
at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.setValue(PDTextField.java:218)
at aaalabel.diefinder.QuoteWorkerPdf.<init>(QuoteWorkerPdf.java:69)
at aaalabel.diefinder.QuoteWorkerPdf.main(QuoteWorkerPdf.java:114)

This code is tailored to your file. It changes the default appearance string to use a different font. See also this answer that is somewhat related but more general.
The problem with your input file is that the font used in the field is subsetted, so it doesn't have all glyphs you would expect.
PDDocument doc = PDDocument.load(new File("82667884384374 (1).pdf"));
PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
PDTextField field = (PDTextField) acroForm.getField("q2_quotationPrepared");
COSName helvName = acroForm.getDefaultResources().add(PDType1Font.HELVETICA); // use different font if you want. Do not subset!
field.setDefaultAppearance("/" + helvName.getName() + " 10 Tf 0 g"); // modifies your existing DA string
field.setValue("TextEntry");
doc.save(new File("82667884384374 (1)-new.pdf"));
doc.close();

Why are Form Fields set with PDFBox(2.0.11) not being displayed?

I'm using PDFBox 2.0.11 to open a PDF Form and pulling out the values. This works as expected. When I try to set a value it appears to work. When I open the PDF the value is not displayed. If I click in the field, the value is then displayed as set, but then disappears again when I click out of the field.
This seems to be a common issue, but none of the fixes I've seen seem to work.
if(file.exists())
{
PDDocument doc = PDDocument.load(file);
doc.setAllSecurityToBeRemoved(true);
PDDocumentCatalog catalog = doc.getDocumentCatalog();
PDAcroForm form = catalog.getAcroForm();
// Add Font
PDResources resources = new PDResources();
resources.put(COSName.getPDFName("Helv"), PDType1Font.HELVETICA);
form.setDefaultResources(resources);
// End Add Font
form.setNeedAppearances(false);
List<PDField> fields = form.getFields();
for (Object field : fields)
{
if (field instanceof PDTextField) {
PDTextField pdTextbox = (PDTextField) field;
System.out.println("PDTextBox " + pdTextbox.getFullyQualifiedName() + " " + pdTextbox.getValue());
if(pdTextbox.getFullyQualifiedName().equalsIgnoreCase("a3_5"))
{
try {
pdTextbox.getWidgets().get(0).setHidden(false);
pdTextbox.setValue("5500");
}
catch(Exception e){
e.printStackTrace();
}
}
}
else
{
System.out.print(field);
System.out.print(" = ");
System.out.print(field.getClass());
System.out.println();
}
}
doc.save("..._MINE_UPDATE.pdf");
doc.close();
}
Stack Trace
java.io.IOException: Could not find font: /Helvetica
at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.processSetFont(PDDefaultAppearanceString.java:179)
at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.processOperator(PDDefaultAppearanceString.java:132)
at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.processAppearanceStringOperators(PDDefaultAppearanceString.java:108)
at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.<init>(PDDefaultAppearanceString.java:86)
at org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.getDefaultAppearanceString(PDVariableText.java:93)
at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.<init>(AppearanceGeneratorHelper.java:100)
at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.constructAppearances(PDTextField.java:262)
at org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField.applyChange(PDTerminalField.java:228)
at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.setValue(PDTextField.java:218)
at com.controller.TestPDFBox.loadData(TestPDFBox.java:87)
Skipped for loop
for (COSName fontResourceName : widgetResources.getFontNames())
{
try
{
if (acroFormResources.getFont(fontResourceName) == null)
{
LOG.debug("Adding font resource " + fontResourceName + " from widget to AcroForm");
acroFormResources.put(fontResourceName, widgetResources.getFont(fontResourceName));
}
}
catch (IOException e)
{
LOG.warn("Unable to match field level font with AcroForm font");
}
}

Thanks to Tilman Hausherr for helping me get to an answer that ultimately stemmed from a quirk with MacOSX's Preview application.
Preview for some reason strips out functionality that also causes you to be unable to set values correctly in the PDF.
The code above works correctly, although I did make a change to the Add Font section to the following.
// Add Font
PDResources resources = form.getDefaultResources();
if(resources == null)
{
resources = new PDResources();
}
resources.put(COSName.getPDFName("Helvetica"), PDType1Font.HELVETICA);
if(form.getDefaultResources() == null)
{
form.setDefaultResources(resources);
}
// End Add Font

In case it is not obvious: Don't create / edit / save your template pdf with Mac's Preview to use with PDFBox.
I ran into the same issue and had to re-create the PDF in Acrobat Pro. With this PDF the above code worked perfectly fine.

How to resolve "...is not available in this font's encoding"?

So I am using PDFBox to fill in some pdfs. So far everything was great - I created a form in pdf with Avenir Light font, and I could fill it in. However, the problem that just now showed up, is that when I am trying to fill the pdf using letters such as ł, ą, ć ... I get the following error:
U+0142 is not available in this font's encoding: MacRomanEncoding with differences
with different numbers.
Now, my question is - how can I fix this, so that I can fill the form automatically? When I open the pdf in Acrobat Reader, I can insert those letters, and I dont get any errors. Here is how I set the field:
public void setField(PDDocument document, PDField field, String value ) throws IOException {
if( field != null && value != null) {
try{
field.setValue(value);
} catch (Exception e){
e.printStackTrace();
}
}
else {
System.err.println( "No field found with name:" + field.getPartialName() );
}
}
UPDATE
I've been trying to upload my own Avenir-Light.tff like this:
PDFont font = PDType1Font.HELVETICA;
PDResources res = new PDResources();
COSName fontName = res.add(font);
acroForm.setDefaultResources(res);
String da = "/" + fontName.getName() + " 12 Tf 0 g";
acroForm.setDefaultAppearance(da);
However, this doesn't seem to have any impact on the printed fields, and throws almost the same message:
U+0104 ('Aogonek') is not available in this font Helvetica (generic: ArialMT) encoding: WinAnsiEncoding

PDFBox define 14 standard fonts in PDType1Font :
PDType1Font.TIMES_ROMAN PDType1Font.TIMES_BOLD
PDType1Font.TIMES_ITALI PDType1Font.TIMES_BOLD_ITALIC
PDType1Font.HELVETICA PDType1Font.HELVETICA_BOLD
PDType1Font.HELVETICA_OBLIQUE
PDType1Font.HELVETICA_BOLD_OBLIQUE PDType1Font.COURIER
PDType1Font.COURIER_BOLD PDType1Font.COURIER_OBLIQUE
PDType1Font.COURIER_BOLD_OBLIQUE PDType1Font.SYMBOL
PDType1Font.ZAPF_DINGBATS
So if you want to use Avenir-Light you have to load it from a .ttf file. You can do this as #TilmanHausherr suggested PDType0Font.load(doc, new File("path/Avenir-Light.ttf"), false).
PDFont font = PDType0Font.load(doc, new File("path/Avenir-Light.ttf"), false);
PDResources res = new PDResources();
COSName fontName = res.add(font);
acroForm.setDefaultResources(res);
String da = "/" + fontName.getName() + " 12 Tf 0 g";
acroForm.setDefaultAppearance(da);
Update
Do you know why it also displays a warning if form of: OpenType Layout
tables used in font Avenir-Light are not implemented in PDFBox and
will be ignored?
Avenir-light font uses OpenType Layout tables (Advanced Typographic) that PDFBox does not support yet. This advaned typographics will be ignored

Apache PDFBox - Need Help Reading a Form

I am trying to read a PDF form using the Apache PDFBox. As I figured, if I use the PDFTextStripper it retrieves the text not the form data. How do I read the form data that is entered in the PDF? I have just a few fields to read and they are simple String and number.
Thanks!
Update: I tried the following code:
PDDocumentCatalog catalog = pd.getDocumentCatalog();
PDAcroForm form = catalog.getAcroForm();
if (form == null) {
System.out.println("No form...stop");
return;
}
List<PDField> fields = new ArrayList<PDField>();
fields = form.getFields();
log.info("---Fields Size:"+fields.size());
for(int i=0;i<fields.size();i++) {
log.info("i:"+i);
PDField field = (PDField) fields.get(i);
String value = field.getValue();
String name = field.getFullyQualifiedName();
log.info("name:"+name);
}
This gives me the output: name:topmostSubform[0]

PDFBox: How to "flatten" a PDF-form?

How do I "flatten" a PDF-form (remove the form-field but keep the text of the field) with PDFBox?
Same question was answered here:
a quick way to do this, is to remove the fields from the acrofrom.
For this you just need to get the document catalog, then the acroform
and then remove all fields from this acroform.
The graphical representation is linked with the annotation and stay in
the document.
So I wrote this code:
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
public class PdfBoxTest {
public void test() throws Exception {
PDDocument pdDoc = PDDocument.load(new File("E:\\Form-Test.pdf"));
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
if (acroForm == null) {
System.out.println("No form-field --> stop");
return;
}
#SuppressWarnings("unchecked")
List<PDField> fields = acroForm.getFields();
// set the text in the form-field <-- does work
for (PDField field : fields) {
if (field.getFullyQualifiedName().equals("formfield1")) {
field.setValue("Test-String");
}
}
// remove form-field but keep text ???
// acroForm.getFields().clear(); <-- does not work
// acroForm.setFields(null); <-- does not work
// acroForm.setFields(new ArrayList()); <-- does not work
// ???
pdDoc.save("E:\\Form-Test-Result.pdf");
pdDoc.close();
}
}

With PDFBox 2 it's now possible to "flatten" a PDF-form easily by calling the flatten method on a PDAcroForm object. See Javadoc: PDAcroForm.flatten().
Simplified code with an example call of this method:
//Load the document
PDDocument pDDocument = PDDocument.load(new File("E:\\Form-Test.pdf"));
PDAcroForm pDAcroForm = pDDocument.getDocumentCatalog().getAcroForm();
//Fill the document
...
//Flatten the document
pDAcroForm.flatten();
//Save the document
pDDocument.save("E:\\Form-Test-Result.pdf");
pDDocument.close();
Note: dynamic XFA forms cannot be flatten.
For migration from PDFBox 1.* to 2.0, take a look at the official migration guide.

This works for sure - I've ran into this problem, debugged all-night, but finally figured out how to do this :)
This is assuming that you have capability to edit the PDF in some way/have some control over the PDF.
First, edit the forms using Acrobat Pro. Make them hidden and read-only.
Then you need to use two libraries: PDFBox and PDFClown.
PDFBox removes the thing that tells Adobe Reader that it's a form; PDFClown removes the actual field. PDFClown must be done first, then PDFBox (in that order. The other way around doesn't work).
Single field example code:
// PDF Clown code
File file = new File("Some file path");
Document document = file.getDocument();
Form form = file.getDocument.getForm();
Fields fields = form.getFields();
Field field = fields.get("some_field_name");
PageStamper stamper = new PageStamper();
FieldWidgets widgets = field.getWidgets();
Widget widget = widgets.get(0); // Generally is 0.. experiment to figure out
stamper.setPage(widget.getPage());
// Write text using text form field position as pivot.
PrimitiveComposer composer = stamper.getForeground();
Font font = font.get(document, "some_path");
composer.setFont(font, 10);
double xCoordinate = widget.getBox().getX();
double yCoordinate = widget.getBox().getY();
composer.showText("text i want to display", new Point2D.Double(xCoordinate, yCoordinate));
// Actually delete the form field!
field.delete();
stamper.flush();
// Create new buffer to output to...
Buffer buffer = new Buffer();
file.save(buffer, SerializationModeEnum.Standard);
byte[] bytes = buffer.toByteArray();
// PDFBox code
InputStream pdfInput = new ByteArrayInputStream(bytes);
PDDocument pdfDocument = PDDocument.load(pdfInput);
// Tell Adobe we don't have forms anymore.
PDDocumentCatalog pdCatalog = pdfDocument.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
COSDictionary acroFormDict = acroForm.getDictionary();
COSArray cosFields = (COSArray) acroFormDict.getDictionaryObject("Fields");
cosFields.clear();
// Phew. Finally.
pdfDocument.save("Some file path");
Probably some typos here and there, but this should be enough to get the gist :)

After reading about pdf reference guide, I have discovered that you can quite easily set read-only mode for AcroForm fields by adding "Ff" key (Field flags) with value 1.
This is what documentation stands about that:
If set, the user may not change the value of the field.
Any associated widget annotations will not interact
with the user; that is, they will not respond to mouse
clicks or change their appearance in response to
mouse motions. This flag is useful for fields whose
values are computed or imported from a database.
so the code could look like that (using pdfbox lib):
public static void makeAllWidgetsReadOnly(PDDocument pdDoc) throws IOException {
PDDocumentCatalog catalog = pdDoc.getDocumentCatalog();
PDAcroForm form = catalog.getAcroForm();
List<PDField> acroFormFields = form.getFields();
System.out.println(String.format("found %d acroFrom fields", acroFormFields.size()));
for(PDField field: acroFormFields) {
makeAcroFieldReadOnly(field);
}
}
private static void makeAcroFieldReadOnly(PDField field) {
field.getDictionary().setInt("Ff",1);
}

setReadOnly did work for me as shown below -
#SuppressWarnings("unchecked")
List<PDField> fields = acroForm.getFields();
for (PDField field : fields) {
if (field.getFullyQualifiedName().equals("formfield1")) {
field.setReadOnly(true);
}
}

Solution to flattening acroform AND retaining the form field values using pdfBox:
see solution at https://mail-archives.apache.org/mod_mbox/pdfbox-users/201604.mbox/%3C3BC7E352-9447-4458-AAC3-5A9B70B4CCAA#fileaffairs.de%3E
The solution that worked for me with pdfbox 2.0.1:
File myFile = new File("myFile.pdf");
PDDocument pdDoc = PDDocument.load(myFile);
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm pdAcroForm = pdCatalog.getAcroForm();
// set the NeedAppearances flag to false
pdAcroForm.setNeedAppearances(false);
field.setValue("new-value");
pdAcroForm.flatten();
pdDoc.save("myFlattenedFile.pdf");
pdDoc.close();
I didn't need to do the 2 extra steps in the above solution link:
// correct the missing page link for the annotations
// Add the missing resources to the form
I created my pdf form in OpenOffice 4.1.1 and exported to pdf. The 2 items selected in the OpenOffice export dialogue were:
selected "create Pdf Form"
Submit format of "PDF" - I found this gave smaller pdf file size than selecting "FDF" but still operated as a pdf form.
Using PdfBox I populated the form fields and created a flattened pdf file that removed the form fields but retained the form field values.

In order to really "flatten" an acrobat form field there seems to be much more to do than at the first glance.
After examining the PDF standard I managed to achieve real flatening in three steps:
save field value
remove widgets
remove form field
All three steps can be done with pdfbox (I used 1.8.5). Below I will sketch how I did it.
A very helpful tool in order to understand whats going on is the PDF Debugger.
Save the field
This is the most complicated step of the three.
In order to save the field's value you have to save its content to the pdf's content for each of the field's widgets. Easiest way to do so is drawing each widget's appearance to the widget's page.
void saveFieldValue( PDField field ) throws IOException
{
PDDocument document = getDocument( field );
// see PDField.getWidget()
for( PDAnnotationWidget widget : getWidgets( field ) )
{
PDPage parentPage = getPage( widget );
try (PDPageContentStream contentStream = new PDPageContentStream( document, parentPage, true, true ))
{
writeContent( contentStream, widget );
}
}
}
void writeContent( PDPageContentStream contentStream, PDAnnotationWidget widget )
throws IOException
{
PDAppearanceStream appearanceStream = getAppearanceStream( widget );
PDXObject xobject = new PDXObjectForm( appearanceStream.getStream() );
AffineTransform transformation = getPositioningTransformation( widget.getRectangle() );
contentStream.drawXObject( xobject, transformation );
}
The appearance is an XObject stream containing all of the widget's content (value, font, size, rotation, etc.). You simply need to place it at the right position on the page which you can extract from the widget's rectangle.
Remove widgets
As noted above each field may have multiple widgets. A widget takes care of how a form field can be edited, triggers, displaying when not editing and such stuff.
In order to remove one you have to remove it from its page's annotations.
void removeWidget( PDAnnotationWidget widget ) throws IOException
{
PDPage widgetPage = getPage( widget );
List<PDAnnotation> annotations = widgetPage.getAnnotations();
PDAnnotation deleteCandidate = getMatchingCOSObjectable( annotations, widget );
if( deleteCandidate != null && annotations.remove( deleteCandidate ) )
widgetPage.setAnnotations( annotations );
}
Note that the annotations may not contain the exact PDAnnotationWidget since it's a kind of a wrapper. You have to remove the one with matching COSObject.
Remove form field
As final step you remove the form field itself. This is not very different to the other posts above.
void removeFormfield( PDField field ) throws IOException
{
PDAcroForm acroForm = field.getAcroForm();
List<PDField> acroFields = acroForm.getFields();
List<PDField> removeCandidates = getFields( acroFields, field.getPartialName() );
if( removeAll( acroFields, removeCandidates ) )
acroForm.setFields( acroFields );
}
Note that I used a custom removeAll method here since the removeCandidates.removeAll() didn't work as expected for me.
Sorry that I cannot provide all the code here but with the above you should be able to write it yourself.

I don't have enough points to comment but SJohnson's response of setting the field to read only worked perfectly for me. I am using something like this with PDFBox:
private void setFieldValueAndFlatten(PDAcroForm form, String fieldName, String fieldValue) throws IOException {
PDField field = form.getField(fieldName);
if(field != null){
field.setValue(fieldValue);
field.setReadonly(true);
}
}
This will write your field value and then when you open the PDF after saving it will have your value and not be editable.

This is the code I came up with after synthesizing all of the answers I could find on the subject. This handles flattening text boxes, combos, lists, checkboxes, and radios:
public static void flattenPDF (PDDocument doc) throws IOException {
//
// find the fields and their kids (widgets) on the input document
// (each child widget represents an appearance of the field data on the page, there may be multiple appearances)
//
PDDocumentCatalog catalog = doc.getDocumentCatalog();
PDAcroForm form = catalog.getAcroForm();
List<PDField> tmpfields = form.getFields();
PDResources formresources = form.getDefaultResources();
Map formfonts = formresources.getFonts();
PDAnnotation ann;
//
// for each input document page convert the field annotations on the page into
// content stream
//
List<PDPage> pages = catalog.getAllPages();
Iterator<PDPage> pageiterator = pages.iterator();
while (pageiterator.hasNext()) {
//
// get next page from input document
//
PDPage page = pageiterator.next();
//
// add the fonts from the input form to this pages resources
// so the field values will display in the proper font
//
PDResources pageResources = page.getResources();
Map pageFonts = pageResources.getFonts();
pageFonts.putAll(formfonts);
pageResources.setFonts(pageFonts);
//
// Create a content stream for the page for appending
//
PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true);
//
// Find the appearance widgets for all fields on the input page and insert them into content stream of the page
//
for (PDField tmpfield : tmpfields) {
List widgets = tmpfield.getKids();
if(widgets == null) {
widgets = new ArrayList();
widgets.add(tmpfield.getWidget());
}
Iterator<COSObjectable> widgetiterator = widgets.iterator();
while (widgetiterator.hasNext()) {
COSObjectable next = widgetiterator.next();
if (next instanceof PDField) {
PDField foundfield = (PDField) next;
ann = foundfield.getWidget();
} else {
ann = (PDAnnotation) next;
}
if (ann.getPage().equals(page)) {
COSDictionary dict = ann.getDictionary();
if (dict != null) {
if(tmpfield instanceof PDVariableText || tmpfield instanceof PDPushButton) {
COSDictionary ap = (COSDictionary) dict.getDictionaryObject("AP");
if (ap != null) {
contentStream.appendRawCommands("q\n");
COSArray rectarray = (COSArray) dict.getDictionaryObject("Rect");
if (rectarray != null) {
float[] rect = rectarray.toFloatArray();
String s = " 1 0 0 1 " + Float.toString(rect[0]) + " " + Float.toString(rect[1]) + " cm\n";
contentStream.appendRawCommands(s);
}
COSStream stream = (COSStream) ap.getDictionaryObject("N");
if (stream != null) {
InputStream ioStream = stream.getUnfilteredStream();
ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
byte[] buffer = new byte[4096];
int amountRead = 0;
while ((amountRead = ioStream.read(buffer, 0, buffer.length)) != -1) {
byteArray.write(buffer, 0, amountRead);
}
contentStream.appendRawCommands(byteArray.toString() + "\n");
}
contentStream.appendRawCommands("Q\n");
}
} else if (tmpfield instanceof PDChoiceButton) {
COSDictionary ap = (COSDictionary) dict.getDictionaryObject("AP");
if(ap != null) {
contentStream.appendRawCommands("q\n");
COSArray rectarray = (COSArray) dict.getDictionaryObject("Rect");
if (rectarray != null) {
float[] rect = rectarray.toFloatArray();
String s = " 1 0 0 1 " + Float.toString(rect[0]) + " " + Float.toString(rect[1]) + " cm\n";
contentStream.appendRawCommands(s);
}
COSName cbValue = (COSName) dict.getDictionaryObject(COSName.AS);
COSDictionary d = (COSDictionary) ap.getDictionaryObject(COSName.D);
if (d != null) {
COSStream stream = (COSStream) d.getDictionaryObject(cbValue);
if(stream != null) {
InputStream ioStream = stream.getUnfilteredStream();
ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
byte[] buffer = new byte[4096];
int amountRead = 0;
while ((amountRead = ioStream.read(buffer, 0, buffer.length)) != -1) {
byteArray.write(buffer, 0, amountRead);
}
if (!(tmpfield instanceof PDCheckbox)){
contentStream.appendRawCommands(byteArray.toString() + "\n");
}
}
}
COSDictionary n = (COSDictionary) ap.getDictionaryObject(COSName.N);
if (n != null) {
COSStream stream = (COSStream) n.getDictionaryObject(cbValue);
if(stream != null) {
InputStream ioStream = stream.getUnfilteredStream();
ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
byte[] buffer = new byte[4096];
int amountRead = 0;
while ((amountRead = ioStream.read(buffer, 0, buffer.length)) != -1) {
byteArray.write(buffer, 0, amountRead);
}
contentStream.appendRawCommands(byteArray.toString() + "\n");
}
}
contentStream.appendRawCommands("Q\n");
}
}
}
}
}
}
// delete any field widget annotations and write it all to the page
// leave other annotations on the page
COSArrayList newanns = new COSArrayList();
List anns = page.getAnnotations();
ListIterator annotiterator = anns.listIterator();
while (annotiterator.hasNext()) {
COSObjectable next = (COSObjectable) annotiterator.next();
if (!(next instanceof PDAnnotationWidget)) {
newanns.add(next);
}
}
page.setAnnotations(newanns);
contentStream.close();
}
//
// Delete all fields from the form and their widgets (kids)
//
for (PDField tmpfield : tmpfields) {
List kids = tmpfield.getKids();
if(kids != null) kids.clear();
}
tmpfields.clear();
// Tell Adobe we don't have forms anymore.
PDDocumentCatalog pdCatalog = doc.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
COSDictionary acroFormDict = acroForm.getDictionary();
COSArray cosFields = (COSArray) acroFormDict.getDictionaryObject("Fields");
cosFields.clear();
}
Full class here:
https://gist.github.com/jribble/beddf7620536939f88db

This is the answer of Thomas, from the PDFBox-Mailinglist:
You will need to get the Fields over the COSDictionary. Try this
code...
PDDocument pdDoc = PDDocument.load(new File("E:\\Form-Test.pdf"));
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
COSDictionary acroFormDict = acroForm.getDictionary();
COSArray fields = acroFormDict.getDictionaryObject("Fields");
fields.clear();

I thought I'd share our approach that worked with PDFBox 2+.
We've used the PDAcroForm.flatten() method.
The fields needed some preprocessing and most importantly the nested field structure had to be traversed and DV and V checked for values.
Finally what worked was the following:
private static void flattenPDF(String src, String dst) throws IOException {
PDDocument doc = PDDocument.load(new File(src));
PDDocumentCatalog catalog = doc.getDocumentCatalog();
PDAcroForm acroForm = catalog.getAcroForm();
PDResources resources = new PDResources();
acroForm.setDefaultResources(resources);
List<PDField> fields = new ArrayList<>(acroForm.getFields());
processFields(fields, resources);
acroForm.flatten();
doc.save(dst);
doc.close();
}
private static void processFields(List<PDField> fields, PDResources resources) {
fields.stream().forEach(f -> {
f.setReadOnly(true);
COSDictionary cosObject = f.getCOSObject();
String value = cosObject.getString(COSName.DV) == null ?
cosObject.getString(COSName.V) : cosObject.getString(COSName.DV);
System.out.println("Setting " + f.getFullyQualifiedName() + ": " + value);
try {
f.setValue(value);
} catch (IOException e) {
if (e.getMessage().matches("Could not find font: /.*")) {
String fontName = e.getMessage().replaceAll("^[^/]*/", "");
System.out.println("Adding fallback font for: " + fontName);
resources.put(COSName.getPDFName(fontName), PDType1Font.HELVETICA);
try {
f.setValue(value);
} catch (IOException e1) {
e1.printStackTrace();
}
} else {
e.printStackTrace();
}
}
if (f instanceof PDNonTerminalField) {
processFields(((PDNonTerminalField) f).getChildren(), resources);
}
});
}

If the PDF document doesn't actually contain form fields but you still want to flatten other elements like markups, the following works quite well. FYI It was implemented for C#
public static void FlattenPdf(string fileName)
{
PDDocument doc = PDDocument.load(new java.io.File(fileName));
java.util.List annots = doc.getPage(0).getAnnotations();
for (int i = 0; i < annots.size(); ++i)
{
PDAnnotation annot = (PDAnnotation)annots.get(i);
annot.setLocked(true);
annot.setReadOnly(true);
annot.setNoRotate(true);
}
doc.save(fileName);
doc.close();
}
This effectively locks all markups in the document and they will no longer be editable.
pdfbox c# annotations

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Incorrect alignment of text in PDF field with PDFBox - java

Related

PDFBox: No glyph for U+0054 in font AAAAAD+FreeSerifBold

Why are Form Fields set with PDFBox(2.0.11) not being displayed?

How to resolve "...is not available in this font's encoding"?

Apache PDFBox - Need Help Reading a Form

PDFBox: How to "flatten" a PDF-form?

Categories

Resources