I have an interactive PDF with a couple of fields. When some of the fields are filled in, the others are calculated. In Adobe Acrobat Reader this works fine.
Now I fill in the document with PDFBox as follows:
public static void setField(PDDocument pdfDocument, String name, String value ) throws IOException {
PDDocumentCatalog docCatalog = pdfDocument.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
PDField field = acroForm.getField( name );
if( field != null ) {
field.setValue(value);
} else {
System.err.println( "No field found with name:" + name );
}
}
The fields are filled in but I have two problems:
For every field I get:
May 04, 2021 11:57:04 AM org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper getFormattedValue
INFO: Field contains a formatting action but no ScriptingHandler has been supplied - formatted value might be incorrect
The fields that are normally auto calculated are not filled in. Do I need to trigger some actions or is it because the field is not formatted like a string or a number?
I need to remove the text rise property (set via setTextRise) from a Text element if the raised or lowered text would end up outside the page area.
PdfDocument pdfDoc = new PdfDocument(pdfWriter);
Document doc = new Document(pdfDoc, PageSize.A5);
doc.setMargins(0,0,0,36);
for (int i = 0; i <50 ; i++) {
Text t = new Text("hello " + i);
if(i ==0){
t.setTextRise(7);
}
if(i==31){
t.setTextRise(-35);
}
Paragraph p = new Paragraph(t);
p.setNextRenderer(new ParagraphRen(p,doc));
p.setFixedLeading(fixedLeading);
doc.add(p);
}
doc.close();
}
class ParagraphRen extends ParagraphRenderer{
private float heightDoc;
private float marginTop;
private float marginBot;
public ParagraphRen(Paragraph modelElement, Document doc) {
super(modelElement);
this.heightDoc =doc.getPdfDocument().getDefaultPageSize().getHeight();
this.marginTop = doc.getTopMargin();
this.marginBot = doc.getBottomMargin();
}
@Override
public void drawChildren(DrawContext drawContext) {
super.drawChildren(drawContext);
Rectangle rect = this.getOccupiedAreaBBox();
List<IRenderer> childRenderers = this.getChildRenderers();
//check first line
if(rect.getTop()<=heightDoc- marginTop) {
for (IRenderer iRenderer : childRenderers) {
if (iRenderer.getModelElement().hasProperty(72)) {
Object property = iRenderer.getModelElement().getProperty(72);
float v = (Float) property + rect.getTop();
//check text more AreaPage
if(v >heightDoc){
iRenderer.getModelElement().deleteOwnProperty(72);
}
}
}
}
//check last line
if(rect.getBottom()-marginBot-rect.getHeight()*2<0){
for (IRenderer iRenderer : childRenderers) {
if (iRenderer.getModelElement().hasProperty(72)) {
Object property = iRenderer.getModelElement().getProperty(72);
//if setRise(-..) more margin bottom setRise remove
if(rect.getBottom()-marginBot-rect.getHeight()+(Float) property<0)
iRenderer.getModelElement().deleteOwnProperty(72);
}
}
}
}
}
Here I check whether the text rise on the first lines pushes the text beyond the page area; if so, I remove the text rise property.
Likewise, if the text rise on the last lines (setTextRise(-35)) pushes the text below the bottom margin, I remove it.
But it doesn't work: the properties are not removed.
Your problem is as follows: the drawChildren method gets called after layout has been done. At this stage iText usually doesn't consider the properties of any elements: it just places each element in its occupied area, which has been calculated before, at the layout() stage.
You can overcome it with layout emulation.
Let's add all your paragraphs to a div rather than directly to the document. Then emulate adding this div to the document:
LayoutResult result = div.createRendererSubTree().setParent(doc.getRenderer()).layout(new LayoutContext(new LayoutArea(0, PageSize.A5)));
In the snippet above I've tried to lay out our div on an A5-sized document.
Now you can inspect the layout result and change some elements, which will then be processed for real with Document#add. For example, to get the laid-out paragraph at index 30 one can use:
((DivRenderer)result.getSplitRenderer()).getChildRenderers().get(30);
Some more tips:
The split renderer represents the part of the content which iText can place in the area; the overflow renderer represents the content which does not fit.
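Once the emulated layout has given you the occupied area of each line, the decision itself can be kept in a small pure function. The following is only a sketch: RiseCheck and riseFits are hypothetical names, not iText API, and the coordinates are assumed to be in points with the origin at the bottom of the page, as in the question's renderer.

```java
// Hypothetical helper (not part of iText): decides whether a text rise
// keeps the text inside the printable area of the page.
final class RiseCheck {
    private RiseCheck() {}

    /**
     * @param lineTop      top y of the laid-out line
     * @param lineBottom   bottom y of the laid-out line
     * @param rise         requested text rise (positive = up, negative = down)
     * @param pageHeight   height of the page
     * @param marginBottom bottom margin of the document
     * @return true if the risen text stays within the page, false if the rise should be dropped
     */
    static boolean riseFits(float lineTop, float lineBottom, float rise,
                            float pageHeight, float marginBottom) {
        if (rise > 0) {
            // Raised text must not go above the top edge of the page.
            return lineTop + rise <= pageHeight;
        }
        // Lowered (or unchanged) text must not go below the bottom margin.
        return lineBottom + rise >= marginBottom;
    }

    public static void main(String[] args) {
        float pageHeight = 595f;  // assumed A5 height in points
        float marginBottom = 0f;  // assumed, matching setMargins(0, 0, 0, 36)
        System.out.println(riseFits(595f, 580f, 7f, pageHeight, marginBottom));
        System.out.println(riseFits(300f, 285f, -35f, pageHeight, marginBottom));
    }
}
```

With a check like this you would call setTextRise only on those Text elements for which the emulated layout says the rise fits, and then add the elements to the document for real.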
How can I add chapter titles to a ColumnText?
I need to make a PDF like this:
| ColumnText column1 | ColumnText column2 |
| PdfPTable with content | PdfPTable with content |
| | Chapter 2 title |
| Chapter 1 title | |
And then add TOC to this document.
I create the document with ColumnText objects and a table in each of them, but I can't add a Chapter to the table.
I can add a Chapter only to the document body, but in that case the chapter title is not inside the ColumnText.
Image of one page of the result document here
Your question isn't clear in the sense that you don't tell us if you want a TOC like this:
If this is the case, you are using the wrong terminology, as what you see in the Bookmarks panel can be referred to as Outlines or bookmarks.
If you say you want a TOC, you want something like this:
I mention both, because you talk about the Chapter (a class you should no longer use) and that class creates bookmarks/outlines, not a TOC.
I have created a PDF file that has both bookmarks and a TOC: columns_with_toc.pdf. Please take a look at the CreateTOCinColumn example to find out how it's done.
Just like you, I create a ColumnText object with titles and tables:
ColumnText ct = new ColumnText(writer.getDirectContent());
int start;
int end;
for (int i = 0; i <= 20; ) {
start = (i * 10) + 1;
i++;
end = i * 10;
String title = String.format("Numbers from %s to %s", start, end);
Chunk c = new Chunk(title);
c.setGenericTag(title);
ct.addElement(c);
ct.addElement(createTable(start, end));
}
int column = 0;
do {
if (column == 3) {
document.newPage();
column = 0;
}
ct.setSimpleColumn(COLUMNS[column++]);
} while (ColumnText.hasMoreText(ct.go()));
The result looks like this:
In spite of the rules for posting a question on StackOverflow, you didn't post a code sample, but there is at least one difference between your code and mine:
c.setGenericTag(title);
In this line, we declare a generic tag. This tag is used by the TOCCreation class that looks like this:
public class TOCCreation extends PdfPageEventHelper {
protected PdfOutline root;
protected List<TOCEntry> toc = new ArrayList<TOCEntry>();
public TOCCreation() {
}
public void setRoot(PdfOutline root) {
this.root = root;
}
public List<TOCEntry> getToc() {
return toc;
}
@Override
public void onGenericTag(PdfWriter writer, Document document, Rectangle rect, String text) {
PdfDestination dest = new PdfDestination(PdfDestination.XYZ, rect.getLeft(), rect.getTop(), 0);
new PdfOutline(root, dest, text);
TOCEntry entry = new TOCEntry();
entry.action = PdfAction.gotoLocalPage(writer.getPageNumber(), dest, writer);
entry.title = text;
toc.add(entry);
}
}
As you can see, we create a PdfDestination based on the position of the title:
PdfDestination dest = new PdfDestination(PdfDestination.XYZ, rect.getLeft(), rect.getTop(), 0);
If you want bookmarks, you can create a PdfOutline like this:
new PdfOutline(root, dest, text);
If you want a TOC, you can store a String and a PdfAction in a List:
TOCEntry entry = new TOCEntry();
entry.action = PdfAction.gotoLocalPage(writer.getPageNumber(), dest, writer);
entry.title = text;
toc.add(entry);
Now that we understand the TOCCreation class, we take a look at how to use it:
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(dest));
TOCCreation event = new TOCCreation();
writer.setPageEvent(event);
document.open();
event.setRoot(writer.getRootOutline());
We create an event object, pass it to the writer and after we've opened the document, we pass the root of the outline tree to the event. The bookmarks will be created automatically, the TOC won't. If you want to add the TOC, you need something like this:
document.newPage();
for (TOCEntry entry : event.getToc()) {
Chunk c = new Chunk(entry.title);
c.setAction(entry.action);
document.add(new Paragraph(c));
}
You now have a list of titles which you can click to jump to the corresponding table.
I have a PDF form as shown in the image (FORM_PDF).
Using PDFBox in Java, I have retrieved the text of the form fields.
My Code:
File file = new File("example.pdf");
PDDocument doc = PDDocument.load(file);
PDDocumentCatalog catalog = doc.getDocumentCatalog();
PDAcroForm form = catalog.getAcroForm();
PDFieldTree fields = form.getFieldTree();
for (PDField field : fields) {
Object value = field.getValueAsString();
String name = field.getPartialName();
System.out.print(name);
System.out.print(" = ");
System.out.print(value);
System.out.println();
}
Output :
Given Name Text Box = Jignesh
Family Name Text Box = Jignesh
House nr Text Box = xyz
Address 2 Text Box = pqr
I want below also to be retrieved
Given Name:
Family Name:
Address 1:
as
Given Name Text = Given Name:
Family Name Text = Family Name:
House nr Text = Address 1:
Address 2 Text = Address 2:
Since the values above belong to form fields, they were all retrieved easily. I also want to extract the labels of the form, because I want to map each label to its field.
Please help with the same.
Thanks a lot.
The PDFBox content stream is created per page, but the fields come from the form, which comes from the catalog, which comes from the PDF document itself. So I'm not sure which fields are on which pages, and it's causing text to be written to incorrect locations/pages.
I.e., I'm processing fields per page, but I don't know which fields are on which pages.
Is there a way to tell which field is on which page? Or, is there a way to get just the fields on the current page?
Thank you!
Mark
code snippet:
PDDocument pdfDoc = PDDocument.load(file);
PDDocumentCatalog docCatalog = pdfDoc.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
// Get field names
List<PDField> fieldList = acroForm.getFields();
List<PDPage> pages = pdfDoc.getDocumentCatalog().getAllPages();
for (PDPage page : pages) {
PDPageContentStream contentStream = new PDPageContentStream(pdfDoc, page, true, true, true);
processFields(acroForm, fieldList, contentStream, page);
contentStream.close();
}
The PDFbox content stream is done per page, but the fields come from the form which comes from the catalog, which comes from the pdf doc itself. So I'm not sure which fields are on which pages
The reason for this is that PDFs contain a global object structure defining the form. A form field in this structure may have 0, 1, or more visualizations on 0, 1, or more actual PDF pages. Furthermore, in case of only 1 visualization, a merge of field object and visualization object is allowed.
PDFBox 1.8.x
Unfortunately PDFBox in its PDAcroForm and PDField objects represents only this object structure and does not provide easy access to the associated pages. By accessing the underlying structures, though, you can build the connection.
The following code should make clear how to do that:
@SuppressWarnings("unchecked")
public void printFormFields(PDDocument pdfDoc) throws IOException {
PDDocumentCatalog docCatalog = pdfDoc.getDocumentCatalog();
List<PDPage> pages = docCatalog.getAllPages();
Map<COSDictionary, Integer> pageNrByAnnotDict = new HashMap<COSDictionary, Integer>();
for (int i = 0; i < pages.size(); i++) {
PDPage page = pages.get(i);
for (PDAnnotation annotation : page.getAnnotations())
pageNrByAnnotDict.put(annotation.getDictionary(), i + 1);
}
PDAcroForm acroForm = docCatalog.getAcroForm();
for (PDField field : (List<PDField>)acroForm.getFields()) {
COSDictionary fieldDict = field.getDictionary();
List<Integer> annotationPages = new ArrayList<Integer>();
List<COSObjectable> kids = field.getKids();
if (kids != null) {
for (COSObjectable kid : kids) {
COSBase kidObject = kid.getCOSObject();
if (kidObject instanceof COSDictionary)
annotationPages.add(pageNrByAnnotDict.get(kidObject));
}
}
Integer mergedPage = pageNrByAnnotDict.get(fieldDict);
if (mergedPage == null)
if (annotationPages.isEmpty())
System.out.printf("i Field '%s' not referenced (invisible).\n", field.getFullyQualifiedName());
else
System.out.printf("a Field '%s' referenced by separate annotation on %s.\n", field.getFullyQualifiedName(), annotationPages);
else
if (annotationPages.isEmpty())
System.out.printf("m Field '%s' referenced as merged on %s.\n", field.getFullyQualifiedName(), mergedPage);
else
System.out.printf("x Field '%s' referenced as merged on %s and by separate annotation on %s. (Not allowed!)\n", field.getFullyQualifiedName(), mergedPage, annotationPages);
}
}
Beware, there are two shortcomings in the PDFBox PDAcroForm form field handling:
The PDF specification allows the global object structure defining the form to be a deep tree, i.e. the actual fields do not have to be direct children of the root but may be organized by means of inner tree nodes. PDFBox ignores this and expects the fields to be direct children of the root.
Some PDFs in the wild, foremost older ones, do not contain the field tree but only reference the field objects from the pages via the visualizing widget annotations. PDFBox does not see these fields in its PDAcroForm.getFields list.
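To illustrate the first point: collecting the actual fields from a deep tree is a plain recursive walk in which fully qualified names are formed by joining the partial names with periods. The sketch below uses a hypothetical Node class instead of PDFBox types, and it assumes that a node without kids is a terminal field (in a real PDF the kids of a terminal field may be widget annotations, which this model ignores).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a form field tree; none of these names are PDFBox API.
final class FieldTreeSketch {
    private FieldTreeSketch() {}

    /** Stand-in for a field dictionary: a partial name plus child fields. */
    static final class Node {
        final String partialName;
        final List<Node> kids = new ArrayList<>();

        Node(String partialName) {
            this.partialName = partialName;
        }

        Node add(Node kid) {
            kids.add(kid);
            return this;
        }
    }

    /** Returns the fully qualified names of all terminal fields, in tree order. */
    static List<String> terminalFieldNames(List<Node> roots) {
        List<String> names = new ArrayList<>();
        for (Node root : roots) {
            collect(root, "", names);
        }
        return names;
    }

    private static void collect(Node node, String prefix, List<String> out) {
        // A fully qualified name joins the partial names from the root down with periods.
        String name = prefix.isEmpty() ? node.partialName : prefix + "." + node.partialName;
        if (node.kids.isEmpty()) {
            out.add(name); // assumed terminal field
            return;
        }
        for (Node kid : node.kids) {
            collect(kid, name, out); // inner node: descend
        }
    }

    public static void main(String[] args) {
        Node address = new Node("address").add(new Node("street")).add(new Node("city"));
        Node person = new Node("person").add(new Node("name")).add(address);
        System.out.println(terminalFieldNames(Arrays.asList(person, new Node("signature"))));
    }
}
```

Code that only looks at the direct children of the root would report "person" and "signature" here and miss the three fields nested below "person".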
PS: @mikhailvs in his answer correctly shows that you can retrieve a page object from a field widget using PDField.getWidget().getPage() and determine its page number using catalog.getAllPages().indexOf. While fast, this getPage() method has a drawback: it retrieves the page reference from an optional entry of the widget annotation dictionary. Thus, if the PDF you process has been created by software that fills that entry, all is well, but if the PDF creator has not filled that entry, all you get is a null page.
PDFBox 2.0.x
In 2.0.x some methods for accessing the elements in question have changed, but not the situation as a whole: to safely retrieve the page of a widget you still have to iterate through the pages and find a page that references the annotation.
The safe method:
int determineSafe(PDDocument document, PDAnnotationWidget widget) throws IOException
{
COSDictionary widgetObject = widget.getCOSObject();
PDPageTree pages = document.getPages();
for (int i = 0; i < pages.getCount(); i++)
{
for (PDAnnotation annotation : pages.get(i).getAnnotations())
{
COSDictionary annotationObject = annotation.getCOSObject();
if (annotationObject.equals(widgetObject))
return i;
}
}
return -1;
}
The fast method:
int determineFast(PDDocument document, PDAnnotationWidget widget)
{
PDPage page = widget.getPage();
return page != null ? document.getPages().indexOf(page) : -1;
}
Usage:
PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
if (acroForm != null)
{
for (PDField field : acroForm.getFieldTree())
{
System.out.println(field.getFullyQualifiedName());
for (PDAnnotationWidget widget : field.getWidgets())
{
System.out.print(widget.getAnnotationName() != null ? widget.getAnnotationName() : "(NN)");
System.out.printf(" - fast: %s", determineFast(document, widget));
System.out.printf(" - safe: %s\n", determineSafe(document, widget));
}
}
}
(DetermineWidgetPage.java)
(In contrast to the 1.8.x code the safe method here simply searches for the page of a single field. If in your code you have to determine the page of many widgets, you should create a lookup Map like in the 1.8.x case.)
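A minimal sketch of such a lookup map follows. It is written generically so that it does not depend on a particular PDFBox version: PageLookupSketch, indexByPage and pageOf are hypothetical names, and you would feed it one list per page containing the objects returned by getCOSObject() for that page's annotations. Comparing by identity (IdentityHashMap) is an assumption here; it only works if the widget reached via the field and the annotation reached via the page are the same object instance.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not PDFBox API): builds the page lookup once
// so that each widget lookup afterwards is a single map access.
final class PageLookupSketch {
    private PageLookupSketch() {}

    /**
     * @param annotationsPerPage one list per page (in page order) holding that page's annotation objects
     * @return map from annotation object to zero-based page index, compared by identity
     */
    static <T> Map<T, Integer> indexByPage(List<? extends List<? extends T>> annotationsPerPage) {
        Map<T, Integer> pageByAnnotation = new IdentityHashMap<>();
        for (int page = 0; page < annotationsPerPage.size(); page++) {
            for (T annotation : annotationsPerPage.get(page)) {
                // Keep the first page if an annotation is (incorrectly) referenced twice.
                pageByAnnotation.putIfAbsent(annotation, page);
            }
        }
        return pageByAnnotation;
    }

    /** Returns the zero-based page index of the widget, or -1 if no page references it. */
    static <T> int pageOf(Map<T, Integer> index, T widget) {
        return index.getOrDefault(widget, -1);
    }
}
```

Building the map costs one pass over all pages; without it, calling the safe method for every widget walks all pages again each time.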
Example documents
A document for which the fast method fails: aFieldTwice.pdf
A document for which the fast method works: test_duplicate_field2.pdf
Granted this answer may not help the OP (a year later), but if someone else runs into it, here is the solution:
PDDocumentCatalog catalog = doc.getDocumentCatalog();
int pageNumber = catalog.getAllPages().indexOf(yourField.getWidget().getPage());
This example uses Lucee (cfml) https://lucee.org/
A big thank you to mkl as his answer above is invaluable and I couldn't have built this function without his help.
Call the function pageForSignature(doc, fieldName); it returns the zero-based number of the page on which the field resides, or -1 if fieldName is not found.
<cfscript>
try{
/*
java is used by using CreateObject()
*/
variables.File = CreateObject("java", "java.io.File");
//references lucee bundle directory - typically on tomcat: /usr/local/tomcat/lucee-server/bundles
variables.PDDocument = CreateObject("java", "org.apache.pdfbox.pdmodel.PDDocument", "org.apache.pdfbox.app", "2.0.18")
function determineSafe(doc, widget){
var i = '';
var widgetObject = widget.getCOSObject();
var pages = doc.getPages();
var annotation = '';
var annotationObject = '';
for (i = 0; i < pages.getCount(); i=i+1){
for (annotation in pages.get(i).getAnnotations()){
if(annotation.getSubtype() eq 'widget'){
annotationObject = annotation.getCOSObject();
if (annotationObject.equals(widgetObject)){
return i;
}
}
}
}
return -1;
}
function pageForSignature(doc, fieldName){
try{
var acroForm = doc.getDocumentCatalog().getAcroForm();
var field = '';
var widget = '';
var annotation = '';
var pageNo = '';
for(field in acroForm.getFields()){
if(field.getPartialName() == fieldName){
for(widget in field.getWidgets()){
for(annotation in widget.getPage().getAnnotations()){
if(annotation.getSubtype() == 'widget'){
pageNo = determineSafe(doc, widget);
doc.close();
return pageNo;
}
}
}
}
}
return -1;
}catch(e){
doc.close();
writeDump(label="catch error",var='#e#');
}
}
doc = PDDocument.init().load(File.init('/**********/myfile.pdf'));
//returns no, page numbers start at 0
pageNo = pageForSignature(doc, 'twtzceuxvx');
writeDump(label="pageForSignature(doc, fieldName)", var="#pageNo#");
</cfscript>
A general solution that works whether the field has a single widget or multiple widgets (for example, when the same fully qualified field name appears more than once):
List<PDAnnotationWidget> widget=field.getWidgets();
PDDocumentCatalog catalog = doc.getDocumentCatalog();
for (int i = 0; i < widget.size(); i++) {
int pageNumber = 1 + catalog.getPages().indexOf(widget.get(i).getPage());
/* The field coordinates can also be read here; this works for single and multiple widgets alike. */
//PDRectangle r = widget.get(i).getRectangle();
}