Apache PDFBox: Get alignment and font from a PDAnnotationWidget or PDTextField

I have an existing PDF file with form fields that can be filled in by a user. These form fields have a font and a text alignment which were defined when the PDF file was created.
I use Apache PDFBox to find the form field in the pdf:
PDDocument document = PDDocument.load(pdfFile);
PDAcroForm form = document.getDocumentCatalog().getAcroForm();
PDTextField textField = (PDTextField)form.getField("anyFieldName");
if (textField == null) {
textField = (PDTextField)form.getField("fieldsContainer.anyFieldName");
}
List<PDAnnotationWidget> widgets = textField.getWidgets();
PDAnnotationWidget annotation = null;
if (widgets != null && !widgets.isEmpty()) {
annotation = widgets.get(0);
/* font and alignment needed here */
}
If I set the content of the form field with
textField.setValue("This is the text");
then the text in the form field has the same font and alignment as predefined for this field.
But I need the alignment and the font for a second field (which is not a form field, by the way).
How can I find out which alignment (left, center, right) and which font (I need a PDType1Font and its size in points) is defined for this form field? Something like font = annotation.getFont() and alignment = annotation.getAlignment(), neither of which exists.
How do I get the font and the alignment?
Edit
Where I need the font is this:
PDPageContentStream content = new PDPageContentStream(document, page, AppendMode.APPEND, false);
content.setFont(font, size); /* Here I need font and size from the text field above */
content.beginText();
content.showText("My very nice text");
content.endText();
I need the font for the setFont() call.

To get the PDFont, do this:
String defaultAppearance = textField.getDefaultAppearance(); // usually like "/Helv 12 Tf 0 0 1 rg"
// font resource name, then the size (may be fractional; 0 means auto-size), then the Tf operator
Pattern p = Pattern.compile("/(\\S+)\\s+(\\d*\\.?\\d+)\\s+Tf");
Matcher m = p.matcher(defaultAppearance);
if (!m.find())
{
    // without a match, m.group() below would throw, so stop here
    throw new IllegalStateException("No font in default appearance: " + defaultAppearance);
}
String fontName = m.group(1);
float fontSize = Float.parseFloat(m.group(2));
PDAnnotationWidget widget = textField.getWidgets().get(0);
PDResources res = widget.getAppearance().getNormalAppearance().getAppearanceStream().getResources();
PDFont fieldFont = res.getFont(COSName.getPDFName(fontName));
if (fieldFont == null)
{
    // fall back to the default resources of the AcroForm
    fieldFont = textField.getAcroForm().getDefaultResources().getFont(COSName.getPDFName(fontName));
}
System.out.println(fieldFont + "; " + fontSize);
This retrieves the font object from the resource dictionary of the normal appearance stream of the first widget of your field. If the font isn't there, the default resource dictionary of the AcroForm is checked. Note that there are no null checks; you need to add them. At the bottom of the code you'll get a PDFont object and a number.
For the alignment, call textField.getQ(). It returns the quadding value of the field: 0 for left, 1 for centered, 2 for right.
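The two lookups described above (font name and size from the default appearance string, alignment from the Q value) can be sketched without PDFBox on the classpath. This is only an illustration: the class DefaultAppearance and its methods are hypothetical helpers, not part of PDFBox, and the pattern assumes the string contains a "/Name size Tf" sequence.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a PDFBox class): reads font name and size from a DA string. */
final class DefaultAppearance {
    // "/<name> <size> Tf": the name is any run of non-whitespace, the size may be fractional
    private static final Pattern FONT_OP = Pattern.compile("/(\\S+)\\s+(\\d*\\.?\\d+)\\s+Tf");

    final String fontName;
    final float fontSize; // 0 means the viewer auto-sizes the text

    private DefaultAppearance(String fontName, float fontSize) {
        this.fontName = fontName;
        this.fontSize = fontSize;
    }

    /** Parses e.g. "/Helv 12 Tf 0 0 1 rg"; throws if no font operator is present. */
    static DefaultAppearance parse(String da) {
        if (da == null) {
            throw new IllegalArgumentException("no default appearance string");
        }
        Matcher m = FONT_OP.matcher(da);
        if (!m.find()) {
            throw new IllegalArgumentException("no font operator in: " + da);
        }
        return new DefaultAppearance(m.group(1), Float.parseFloat(m.group(2)));
    }

    /** Maps the quadding value returned by getQ() to a readable name. */
    static String alignmentName(int q) {
        switch (q) {
            case 0: return "left";
            case 1: return "center";
            case 2: return "right";
            default: throw new IllegalArgumentException("unexpected Q value: " + q);
        }
    }
}
```

With PDFBox available, one would pass textField.getDefaultAppearance() to parse(...) and textField.getQ() to alignmentName(...), then look the font name up in the resources as in the answer.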

Related

PDFBox: No glyph for U+0054 in font AAAAAD+FreeSerifBold

My PDFBox code throws the following error: No glyph for U+0054 in font AAAAAD+FreeSerifBold.
I found several similar threads on Stack Overflow, but they did not help me fix my problem.
My code is similar to the example code:
public QuoteWorkerPdf() throws IOException {
// Create PDF with one blank page
document = PDDocument.load(
getClass().getResourceAsStream("data/quote_template.pdf"));
page = (PDPage) document.getDocumentCatalog().getPages().get(0);
printable = new PDFPrintable(document);
// get the document catalog
PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
// as there might not be an AcroForm entry a null check is necessary
if (acroForm != null)
{
// Retrieve an individual field and set its value.
PDTextField field = (PDTextField) acroForm.getField( "q2_quotationPrepared" );
field.setValue("TextEntry");
// If a field is nested within the form tree a fully qualified name
// might be provided to access the field.
//field = (PDTextField) acroForm.getField( "fieldsContainer.nestedSampleField" );
//field.setValue("Text Entry");
}
// Save and close the filled out form.
document.save("target/FillFormField.pdf");
}
U+0054 is "T" which is the first letter of the string.
For pdf form creation I use www.jotform.com.
Does anybody know how I can solve this?
Stacktrace:
Exception in thread "main" java.lang.IllegalArgumentException: No glyph for U+0054 in font AAAAAD+FreeSerifBold
at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.encode(PDCIDFontType2.java:363)
at org.apache.pdfbox.pdmodel.font.PDType0Font.encode(PDType0Font.java:398)
at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:324)
at org.apache.pdfbox.pdmodel.font.PDFont.getStringWidth(PDFont.java:353)
at org.apache.pdfbox.pdmodel.interactive.form.PlainText$Paragraph.getLines(PlainText.java:174)
at org.apache.pdfbox.pdmodel.interactive.form.PlainTextFormatter.format(PlainTextFormatter.java:182)
at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.insertGeneratedAppearance(AppearanceGeneratorHelper.java:508)
at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceContent(AppearanceGeneratorHelper.java:364)
at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceValue(AppearanceGeneratorHelper.java:237)
at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.constructAppearances(PDTextField.java:263)
at org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField.applyChange(PDTerminalField.java:228)
at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.setValue(PDTextField.java:218)
at aaalabel.diefinder.QuoteWorkerPdf.<init>(QuoteWorkerPdf.java:69)
at aaalabel.diefinder.QuoteWorkerPdf.main(QuoteWorkerPdf.java:114)

This code is tailored to your file. It changes the default appearance string to use a different font. See also this answer that is somewhat related but more general.
The problem with your input file is that the font used in the field is subsetted, so it doesn't have all glyphs you would expect.
PDDocument doc = PDDocument.load(new File("82667884384374 (1).pdf"));
PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
PDTextField field = (PDTextField) acroForm.getField("q2_quotationPrepared");
COSName helvName = acroForm.getDefaultResources().add(PDType1Font.HELVETICA); // use different font if you want. Do not subset!
field.setDefaultAppearance("/" + helvName.getName() + " 10 Tf 0 g"); // replaces the field's existing DA string
field.setValue("TextEntry");
doc.save(new File("82667884384374 (1)-new.pdf"));
doc.close();
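The code above overwrites the whole default appearance string, including its colour. If the original colour operators should survive, only the font part of the string needs to change. A minimal sketch of that string surgery, assuming the DA string has the usual "/Name size Tf" shape; DaFontSwap and withFont are hypothetical names, not PDFBox API:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: swaps the font in a DA string and keeps everything else. */
final class DaFontSwap {
    private static final Pattern FONT_OP = Pattern.compile("/\\S+\\s+\\d*\\.?\\d+\\s+Tf");

    static String withFont(String da, String resourceName, int size) {
        String fontOp = "/" + resourceName + " " + size + " Tf";
        if (da == null || da.trim().isEmpty()) {
            return fontOp + " 0 g"; // no DA at all: fall back to black text
        }
        Matcher m = FONT_OP.matcher(da);
        if (!m.find()) {
            return fontOp + " " + da.trim(); // DA had no font operator: prepend one
        }
        // quoteReplacement guards against '$' or '\' in the resource name
        return m.replaceFirst(Matcher.quoteReplacement(fontOp));
    }
}
```

The result would then be passed to field.setDefaultAppearance(...), with resourceName being the name returned by acroForm.getDefaultResources().add(...).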

write on existing form-pdf with pdfbox

I am relatively new to Java and I want to replace an existing iText based Javascript with pdfbox. (Java 2.0)
I have a pdf-Formsheet (but this sheet has no Acroform entries) and I want to fill it with information (Name, Birthdate and so on). The pdf is in a rectangular special size (like a contact card).
My code so far:
File file = new File("ToBeFilled.pdf");
PDDocument document = PDDocument.load(file);
System.out.println("PDF loaded");
//Retrieving the page
PDPage page = (PDPage)document.getPages().get( 0 );
PDFont font = PDType1Font.HELVETICA_BOLD;
PDPageContentStream content = new PDPageContentStream(document, page, PDPageContentStream.AppendMode.APPEND, true);
content.beginText();
//Setting the font to the Content stream
content.setFont(font, 30);
//Setting the position for the line (float x, float y), (0,0) = lower left corner
content.newLineAtOffset(100, 400);
String text = "This is the sample document and we are adding content to it.";
String text1 = "This is an example of adding text to a page in the pdf document. we can add as many lines";
String text2 = "as we want like this using the ShowText() method of the ContentStream class";
//Adding text in the form of string
content.showText(text);
//Adding text in the form of string
content.newLine();
content.showText(text1);
content.newLine();
content.showText(text2);
//Ending the content stream
content.endText();
System.out.println("Text added");
content.close();
//Saving the document
document.save("newPrint.pdf");
//Closing the document
document.close();
The text does not show. What am I missing here? I thought with the correct text-positions I could simply write on the pdf?

The code itself works.
Maybe the position in content.newLineAtOffset(100, 400); is too large for your little card, so the text lands outside the page bounds.
By the way, you have to call setLeading(float) for newLine() to have a meaningful effect.
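Both points can be checked with plain arithmetic before any text is drawn. The sketch below assumes a card of 243 x 153 points, which is only a guess since the question does not state the page size; in real code the values would come from the page's media box. TextPlacement is a hypothetical helper, not PDFBox API.

```java
/** Hypothetical helper: checks text positions against the page size. */
final class TextPlacement {
    /** True if the point (x, y) lies on a page of the given size; the origin is the lower left corner. */
    static boolean isInside(float x, float y, float pageWidth, float pageHeight) {
        return x >= 0 && x <= pageWidth && y >= 0 && y <= pageHeight;
    }

    /** Baseline y of each line when every newLine() moves down by the leading. */
    static float[] baselines(float startY, float leading, int lineCount) {
        float[] ys = new float[lineCount];
        for (int i = 0; i < lineCount; i++) {
            ys[i] = startY - i * leading;
        }
        return ys;
    }
}
```

Under that assumption, isInside(100, 400, 243, 153) is false, so the text from the question would be drawn above the visible area. With a leading of 0, which is the default until setLeading is called, baselines(...) returns the same y for every line, so the three lines would be printed on top of each other.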

How to resolve "...is not available in this font's encoding"?

I am using PDFBox to fill in some PDFs. So far everything worked well: I created a PDF form with the Avenir Light font, and I could fill it in. However, when I try to fill the PDF using letters such as ł, ą, ć ..., I get the following error:
U+0142 is not available in this font's encoding: MacRomanEncoding with differences
with different numbers.
Now, my question is: how can I fix this so that I can fill the form automatically? When I open the PDF in Acrobat Reader, I can insert those letters, and I don't get any errors. Here is how I set the field:
public void setField(PDDocument document, PDField field, String value ) throws IOException {
if( field != null && value != null) {
try{
field.setValue(value);
} catch (Exception e){
e.printStackTrace();
}
}
else {
System.err.println( "No field found with name:" + field.getPartialName() );
}
}
UPDATE
I've been trying to load my own Avenir-Light.ttf like this:
PDFont font = PDType1Font.HELVETICA;
PDResources res = new PDResources();
COSName fontName = res.add(font);
acroForm.setDefaultResources(res);
String da = "/" + fontName.getName() + " 12 Tf 0 g";
acroForm.setDefaultAppearance(da);
However, this doesn't seem to have any impact on the printed fields, and throws almost the same message:
U+0104 ('Aogonek') is not available in this font Helvetica (generic: ArialMT) encoding: WinAnsiEncoding

PDFBox defines the 14 standard fonts in PDType1Font:
PDType1Font.TIMES_ROMAN
PDType1Font.TIMES_BOLD
PDType1Font.TIMES_ITALIC
PDType1Font.TIMES_BOLD_ITALIC
PDType1Font.HELVETICA
PDType1Font.HELVETICA_BOLD
PDType1Font.HELVETICA_OBLIQUE
PDType1Font.HELVETICA_BOLD_OBLIQUE
PDType1Font.COURIER
PDType1Font.COURIER_BOLD
PDType1Font.COURIER_OBLIQUE
PDType1Font.COURIER_BOLD_OBLIQUE
PDType1Font.SYMBOL
PDType1Font.ZAPF_DINGBATS
So if you want to use Avenir-Light, you have to load it from a .ttf file with PDType0Font.load, as @TilmanHausherr suggested. The last argument, false, turns off subsetting so that the full font is embedded:
PDFont font = PDType0Font.load(doc, new FileInputStream("path/Avenir-Light.ttf"), false);
PDResources res = new PDResources();
COSName fontName = res.add(font);
acroForm.setDefaultResources(res);
String da = "/" + fontName.getName() + " 12 Tf 0 g";
acroForm.setDefaultAppearance(da);
Update
Do you know why it also displays a warning in the form of: OpenType Layout
tables used in font Avenir-Light are not implemented in PDFBox and
will be ignored?
The Avenir-Light font uses OpenType Layout tables (advanced typography) that PDFBox does not support yet. These advanced typographic features are ignored.
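To see in advance which characters will trip a standard font, the text can be tested against the encoding before setValue is called. The sketch below uses Java's windows-1252 charset as a stand-in for WinAnsiEncoding; the two are closely related but not identical, so treat the result as an approximation. WinAnsiCheck is a hypothetical helper, not PDFBox API.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

/** Hypothetical helper: approximates "is this text encodable in WinAnsiEncoding?". */
final class WinAnsiCheck {
    // Assumption: windows-1252 is close enough to WinAnsiEncoding for a pre-check
    private static final Charset CP1252 = Charset.forName("windows-1252");

    static boolean fits(String text) {
        return CP1252.newEncoder().canEncode(text);
    }

    /** Returns the first character that does not fit, as "U+XXXX", or null if all fit. */
    static String firstMissing(String text) {
        CharsetEncoder encoder = CP1252.newEncoder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!encoder.canEncode(new String(Character.toChars(cp)))) {
                return String.format("U+%04X", cp);
            }
            i += Character.charCount(cp);
        }
        return null;
    }
}
```

For a value containing ł (U+0142) or Ą (U+0104), fits(...) returns false, which matches the errors in the question and signals that a full Unicode font loaded through PDType0Font is needed for that field.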

Setting multi-line text to form fields in PDFBox

I'm using PDFBox to fill form fields in a pdf using below code:
PDField nameField = form.getField("name");
if(null != nameField){
nameField.setValue(data.get("name")); // data is a hashmap
nameField.setReadonly(true);
}
The problem is, if the text is long it doesn't split to multiple lines, even though I have enabled the "multi-line" option for the field in the pdf. Do I have to do anything from the code as well to enable this?
Thanks.

Remember to:
Set the resources for the fonts used by the text field.
Associate those resources with the PDAcroForm of the PDDocument.
Get a widget for the PDTextField.
Get a rectangle for the widget.
Set the width and the height of the widget's rectangle.
That should solve it. In my case, I use a height of 20 for a single-line text field and 80 for a multiline one; you can see them as the last argument of the PDRectangle constructor. The PDRectangle specifies the position and the dimensions of the widget, and the text field appears exactly as the rectangle describes.
public static PDTextField addTextField(PDDocument pdDoc,PDAcroForm pda,String value,
String default_value,Boolean multiline,float txtfieldsyposition,float pagesheight)
{
int page = (int) (txtfieldsyposition/pagesheight);
if(page+1> pdDoc.getNumberOfPages())
{
ensurePageCapacity(pdDoc,page+1);//add 1 page to doc if needed
}
PDTextField pdtff = new PDTextField(pda);
PDFont font = new PDType1Font(FontName.TIMES_ROMAN);
String appearance = "/TIMES 10 Tf 0 0 0 rg";
try
{
PDFont font_ = new PDType1Font(FontName.HELVETICA);
PDResources resources = new PDResources();
resources.put(COSName.getPDFName("Helv"), font_);
resources.put(COSName.getPDFName("TIMES"), font);
pda.setDefaultResources(resources);
org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget widget = pdtff.getWidgets().get(0);
PDRectangle rect = null;
if(!multiline)
rect = new PDRectangle(80, (pagesheight - (txtfieldsyposition % pagesheight)), 450, 20);
else
rect = new PDRectangle(80,(pagesheight-(txtfieldsyposition%pagesheight)),450,80);
PDPage pd_page = pdDoc.getPage(page);
System.out.println(pd_page.getBBox().getHeight());
widget.setRectangle(rect);
widget.setPage(pd_page);
PDAppearanceCharacteristicsDictionary fieldAppearance = new PDAppearanceCharacteristicsDictionary(new COSDictionary());
fieldAppearance.setBorderColour(new PDColor(new float[]{0,0,0}, PDDeviceRGB.INSTANCE));
fieldAppearance.setBackground(new PDColor(new float[]{1,1,1}, PDDeviceRGB.INSTANCE)); // white; DeviceRGB components range from 0 to 1
widget.setAppearanceCharacteristics(fieldAppearance);
widget.setPrinted(true);
pd_page.getAnnotations().add(widget);
System.out.println("before appearance " +pdtff.getDefaultAppearance());
pdtff.setDefaultAppearance(appearance);
System.out.println("after appearance "+pdtff.getDefaultAppearance());
if(multiline)
{
pdtff.setMultiline(true);
}
pdtff.setDefaultValue("");
pdtff.setValue(value.replaceAll("\u202F"," "));
pdtff.setPartialName( page +""+(int)txtfieldsyposition);
}
catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
catch(IllegalArgumentException e)
{
e.printStackTrace();
}
return pdtff;
}
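The method above turns one running vertical position, counted from the top of the first page, into a page index and a y coordinate on that page. That arithmetic is easier to follow in isolation. A small sketch, using 842 points (roughly A4 height) as an assumed page height; FieldPlacement is a hypothetical name:

```java
/** Hypothetical helper mirroring the position arithmetic of addTextField above. */
final class FieldPlacement {
    /** Zero-based index of the page that the running position falls on. */
    static int pageIndex(float runningY, float pageHeight) {
        return (int) (runningY / pageHeight);
    }

    /** Lower-left y of the widget rectangle, measured from the bottom of that page. */
    static float lowerLeftY(float runningY, float pageHeight) {
        return pageHeight - (runningY % pageHeight);
    }
}
```

A running position of 1000 on 842-point pages gives page index 1 and a lower-left y of 684. Because the rectangle grows upwards from its lower-left corner, a multiline field of height 80 placed there covers y from 684 to 764.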

Adding Header to existing PDF File using PDFBox

I am trying to add a header to an existing PDF file. It works, but the table headers in the existing PDF are messed up by the change in the font. If I remove the font setting, the header doesn't show up. Here is my code:
// the document
PDDocument doc = null;
try
{
doc = PDDocument.load( file );
List allPages = doc.getDocumentCatalog().getAllPages();
//PDFont font = PDType1Font.HELVETICA_BOLD;
for( int i=0; i<allPages.size(); i++ )
{
PDPage page = (PDPage)allPages.get( i );
PDRectangle pageSize = page.findMediaBox();
PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true,true);
PDFont font = PDType1Font.TIMES_ROMAN;
float fontSize = 15.0f;
contentStream.beginText();
// set font and font size
contentStream.setFont( font, fontSize);
contentStream.moveTextPositionByAmount(700, 1150);
contentStream.drawString( message);
contentStream.endText();
//contentStream.
contentStream.close();}
doc.save( outfile );
}
finally
{
if( doc != null )
{
doc.close();
}
}
}

Essentially you are running into a PDFBox bug in the current version 1.8.2.
A workaround:
Add a getFonts call of the page resources after creating the new content stream before using a font:
PDPage page = (PDPage)allPages.get( i );
PDRectangle pageSize = page.findMediaBox();
PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true,true);
page.getResources().getFonts(); // <<<<<<<<
PDFont font = PDType1Font.TIMES_ROMAN;
float fontSize = 15.0f;
contentStream.beginText();
The bug itself:
The bug is in the method PDResources.addFont which is called from PDPageContentStream.setFont:
public String addFont(PDFont font)
{
return addFont(font, MapUtil.getNextUniqueKey( fonts, "F" ));
}
It uses the current content of the fonts member variable to determine a unique name for the new font resource on the page at hand. Unfortunately this member variable still can be (and in your case is) uninitialized at this time. This results in the MapUtil.getNextUniqueKey( fonts, "F" ) call to always return F0.
The fonts member variable is then initialized implicitly during the later addFont(PDFont, String) call.
Thus, if unfortunately there already existed a font named F0 on that page, it is replaced by the new font.
Having tested with your PDF this is exactly what happens in your case. As the existing font F0 uses some custom encoding while your replacement font uses a standard one, the text originally written using F0 now looks like gibberish.
The work-around mentioned above implicitly initializes that member variable and, thus, prevents the font replacement.
If you plan to use PDFBox in production for this task, you might want to report the bug.
PS: As mentioned in the comments above there is another bug to observe in context with inherited resources. It should be brought to the PDFBox development's attention, too.
PPS: The issue at hand meanwhile has been fixed in PDFBox for versions 1.8.3 and 2.0.0, cf. PDFBOX-1753.
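The effect of the uninitialized member can be reproduced with a few lines of plain Java. The method below is a simplified stand-in for MapUtil.getNextUniqueKey, written from the behaviour described above and not copied from the PDFBox source; FontKeys is a hypothetical class name.

```java
import java.util.Map;

/** Simplified stand-in for the key generation described above (not PDFBox code). */
final class FontKeys {
    /** Returns prefix + the first counter value that is not yet a key in the map. */
    static String nextUniqueKey(Map<String, ?> map, String prefix) {
        int counter = 0;
        // a null (uninitialized) map is treated like an empty one, so the loop never runs
        while (map != null && map.containsKey(prefix + counter)) {
            counter++;
        }
        return prefix + counter;
    }
}
```

With a null map the result is always F0, even if the page already has a font of that name, and putting the new font under F0 then replaces the existing one. Once the map has been loaded, for example with F0 and F1 in it, the same call returns F2 and nothing is overwritten, which is what the getFonts() work-around achieves.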
