How to resolve "...is not available in this font's encoding"? - java

So I am using PDFBox to fill in some pdfs. So far everything was great - I created a form in pdf with Avenir Light font, and I could fill it in. However, the problem that just now showed up, is that when I am trying to fill the pdf using letters such as ł, ą, ć ... I get the following error:
U+0142 is not available in this font's encoding: MacRomanEncoding with differences
with different numbers.
Now, my question is - how can I fix this, so that I can fill the form automatically? When I open the pdf in Acrobat Reader, I can insert those letters, and I dont get any errors. Here is how I set the field:
public void setField(PDDocument document, PDField field, String value ) throws IOException {
if( field != null && value != null) {
try{
field.setValue(value);
} catch (Exception e){
e.printStackTrace();
}
}
else {
System.err.println( "No field found with name:" + field.getPartialName() );
}
}
UPDATE
I've been trying to upload my own Avenir-Light.tff like this:
PDFont font = PDType1Font.HELVETICA;
PDResources res = new PDResources();
COSName fontName = res.add(font);
acroForm.setDefaultResources(res);
String da = "/" + fontName.getName() + " 12 Tf 0 g";
acroForm.setDefaultAppearance(da);
However, this doesn't seem to have any impact on the printed fields, and throws almost the same message:
U+0104 ('Aogonek') is not available in this font Helvetica (generic: ArialMT) encoding: WinAnsiEncoding

PDFBox define 14 standard fonts in PDType1Font :
PDType1Font.TIMES_ROMAN PDType1Font.TIMES_BOLD
PDType1Font.TIMES_ITALI PDType1Font.TIMES_BOLD_ITALIC
PDType1Font.HELVETICA PDType1Font.HELVETICA_BOLD
PDType1Font.HELVETICA_OBLIQUE
PDType1Font.HELVETICA_BOLD_OBLIQUE PDType1Font.COURIER
PDType1Font.COURIER_BOLD PDType1Font.COURIER_OBLIQUE
PDType1Font.COURIER_BOLD_OBLIQUE PDType1Font.SYMBOL
PDType1Font.ZAPF_DINGBATS
So if you want to use Avenir-Light you have to load it from a .ttf file. You can do this as #TilmanHausherr suggested PDType0Font.load(doc, new File("path/Avenir-Light.ttf"), false).
PDFont font = PDType0Font.load(doc, new File("path/Avenir-Light.ttf"), false);
PDResources res = new PDResources();
COSName fontName = res.add(font);
acroForm.setDefaultResources(res);
String da = "/" + fontName.getName() + " 12 Tf 0 g";
acroForm.setDefaultAppearance(da);
Update
Do you know why it also displays a warning if form of: OpenType Layout
tables used in font Avenir-Light are not implemented in PDFBox and
will be ignored?
Avenir-light font uses OpenType Layout tables (Advanced Typographic) that PDFBox does not support yet. This advaned typographics will be ignored

Related

Incorrect alignment of text in PDF field with PDFBox

I'm trying to compile a PDF form programmatically with Java, using PDFBox 1.8.16.
My workflow has been the following:
I exported my design with InDesign as a PDF form with named fields
I wrote a Java script that parses a CSV and compile the form programmatically
The problem I'm getting is on the alignment of the text inside the field, after the completion. As you can see, the image aligns to the bottom of the text field.
The strange behavior I'm getting is when I select the field just like I'd do to edit the content: when I do that, the text aligns correctly.
The code I'm using to fill the text field is:
private void setField(String fieldName, String content, String fontName, Boolean big) throws IOException {
PDDocumentCatalog docCatalog;
PDAcroForm acroForm;
PDField field;
COSDictionary dict;
COSString defaultAppearance;
docCatalog = pdfTemplate.getDocumentCatalog();
acroForm = docCatalog.getAcroForm();
field = acroForm.getField(fieldName);
dict = field.getDictionary();
defaultAppearance = (COSString) dict.getDictionaryObject(COSName.DA);
if (defaultAppearance != null) {
if (big) {
dict.setString(COSName.DA, "/" + fontName + " 16.17 Tf 0 g");
} else {
dict.setString(COSName.DA, "/" + fontName + " 8.67 Tf 0 g");
}
if (field instanceof PDTextbox) {
field = new PDTextbox(acroForm, dict);
field.setValue(content);
}
}
}
I don't know what other info could be useful, but I'm ready to provide anything I'm asked about.

PDFBox: No glyph for U+0050 in extracted font

I'm trying to create new page in document and write some text to it, while using the font contained in the file.
The font is extracted from the resources:
PDPage page = document.getPage(0);
PDResources res = page.getResources();
List<PDFont> fonts = new ArrayList<>();
for (COSName fontName : res.getFontNames()) {
PDFont font = res.getFont(fontName);
System.out.println(font);
fonts.add(font);
}
And later used to write some text:
stream.beginText();
stream.setFont(fonts.get(0), 12);
stream.setTextMatrix(Matrix.getTranslateInstance(20, 50));
stream.showText("Protokol");
stream.endText();
The showText method always fails with error
No glyph for U+0050 (P) in font QZHBRL+ArialMT
But the glyph is there, as verified by FontForge:
Also the method hasGlyph returns true.
The complete project including the PDF is available at github repository showing the issue
Here you actually ran into an open PDFBox TODO, your stream.showText eventually calls encode of the underlying CID font for each character and here we have:
public class PDCIDFontType2 extends PDCIDFont
{
...
public byte[] encode(int unicode)
{
int cid = -1;
if (isEmbedded)
{
...
// otherwise we require an explicit ToUnicode CMap
if (cid == -1)
{
//TODO: invert the ToUnicode CMap?
// see also PDFBOX-4233
cid = 0;
}
}
...
if (cid == 0)
{
throw new IllegalArgumentException(
String.format("No glyph for U+%04X (%c) in font %s", unicode, (char) unicode, getName()));
}
return encodeGlyphId(cid);
}
...
}
(org.apache.pdfbox.pdmodel.font.PDCIDFontType2)
Where PDFBox could not otherwise determine a mapping from Unicode to glyph code (if (cid == -1)), the JavaDoc comments indicate another way to determine a glyph code, an inverse lookup of the ToUnicode map. If this was implemented, PDFBox could have determined a glyph ID and written your text.
Unfortunately it is not implemented yet.
This has been fixed in issue PDFBOX-5103. This will be available in PDFBox 2.0.23 and until then, in a snapshot build.

Problem about font encoding in PDF/A generation

So here is my problem :
I'm currently working on an java application that will archive document in a PDF/A-1. I'm using PdfBox for pdf generation and when I can't generate a valid PDF/A-1 pdf, because of the font. The font is embedded inside the pdf file but this website : https://www.pdf-online.com/osa/validate.aspx tell me that this is not a valid PDF/A because of :
The key Encoding has a value Identity-H which is prohibited.
I look on internet on what is this Identity-H encoding and it seem that it's the way that font are encoded, like the ansi encoding.
I've already tried to get different font like Helvetica or arial unicode Ms but nothing work, there is alway this Identity-H encoding.I'm a bit lost with all this mess in encoding so if someone can explain me it'll be great. Also here is the code I write to embedded a font in the pdf :
// load the font as this needs to be embedded
PDFont font = PDType0Font.load(doc, getClass().getClassLoader().getResourceAsStream(fontfile), true);
if (!font.isEmbedded())
{
throw new IllegalStateException("PDF/A compliance requires that all fonts used for"
+ " text rendering in rendering modes other than rendering mode 3 are embedded.");
}
Thanks for your help :)
Problem solved :
I used the example of apache : CreatePDFA ( I have no clue why that work and not my code ) : Example in examples/src/main/java/org/apache/pdfbox/examples
I add to fit the PDF/A-3 requirement :
doc.getDocumentCatalog().setLanguage("en-US");
PDMarkInfo mark = new PDMarkInfo(); // new PDMarkInfo(page.getCOSObject());
PDStructureTreeRoot treeRoot = new PDStructureTreeRoot();
doc.getDocumentCatalog().setMarkInfo(mark);
doc.getDocumentCatalog().setStructureTreeRoot(treeRoot);
doc.getDocumentCatalog().getMarkInfo().setMarked(true);
PDDocumentInformation info = doc.getDocumentInformation();
info.setCreationDate(date);
info.setModificationDate(date);
info.setAuthor("KairosPDF");
info.setProducer("KairosPDF");
info.setCreator("KairosPDF");
info.setTitle("Generated PDf");
info.setSubject("PDF/A3-A");
Here is my code to embedded a file to the pdf :
private final PDDocument doc = new PDDocument();
private final PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
private final PDDocumentNameDictionary names = new PDDocumentNameDictionary(doc.getDocumentCatalog());
private final Map<String, PDComplexFileSpecification> efMap = new HashMap<>();
public void addFile(PDDocument doc, File child) throws IOException {
File file = new File(child.getPath());
Calendar date = Calendar.getInstance();
//first create the file specification, which holds the embedded file
PDComplexFileSpecification fs = new PDComplexFileSpecification();
fs.setFileUnicode(child.getName());
fs.setFile(child.getName());
InputStream is = new FileInputStream(file);
PDEmbeddedFile ef = new PDEmbeddedFile(doc, is);
//Setting
ef.setSubtype("application/octet-stream");
ef.setSize((int) file.length() + 1);
ef.setCreationDate(date);
ef.setModDate(date);
COSDictionary dictionary = fs.getCOSObject();
dictionary.setItem(COSName.getPDFName("AFRelationship"), COSName.getPDFName("Data"));
fs.setEmbeddedFile(ef);
efMap.put(child.getName(), fs);
efTree.setNames(efMap);
names.setEmbeddedFiles(efTree);
doc.getDocumentCatalog().setNames(names);
is.close();
}
The only problem left is this error from the validation :
File specification 'Test.txt' not associated with an object.
Hope it'll help some.

PDFBox: No glyph for U+0054 in font AAAAAD+FreeSerifBold

my PDFBox throws following error: No glyph for U+0054 in font AAAAAD+FreeSerifBold.
I found several similar threads on stackoverflow but I couldn't fix my problem by them.
My code is similar to code example:
public QuoteWorkerPdf() throws IOException {
// Create PDF with one blank page
document = PDDocument.load(
getClass().getResourceAsStream("data/quote_template.pdf"));
page = (PDPage) document.getDocumentCatalog().getPages().get(0);
printable = new PDFPrintable(document);
// get the document catalog
PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
// as there might not be an AcroForm entry a null check is necessary
if (acroForm != null)
{
// Retrieve an individual field and set its value.
PDTextField field = (PDTextField) acroForm.getField( "q2_quotationPrepared" );
field.setValue("TextEntry");
// If a field is nested within the form tree a fully qualified name
// might be provided to access the field.
//field = (PDTextField) acroForm.getField( "fieldsContainer.nestedSampleField" );
//field.setValue("Text Entry");
}
// Save and close the filled out form.
document.save("target/FillFormField.pdf");
}
U+0054 is "T" which is the first letter of the string.
For pdf form creation I use www.jotform.com.
Does anybody know how can I solve this?
Stacktrace:
Exception in thread "main" java.lang.IllegalArgumentException: No glyph for U+0054 in font AAAAAD+FreeSerifBold
at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.encode(PDCIDFontType2.java:363)
at org.apache.pdfbox.pdmodel.font.PDType0Font.encode(PDType0Font.java:398)
at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:324)
at org.apache.pdfbox.pdmodel.font.PDFont.getStringWidth(PDFont.java:353)
at org.apache.pdfbox.pdmodel.interactive.form.PlainText$Paragraph.getLines(PlainText.java:174)
at org.apache.pdfbox.pdmodel.interactive.form.PlainTextFormatter.format(PlainTextFormatter.java:182)
at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.insertGeneratedAppearance(AppearanceGeneratorHelper.java:508)
at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceContent(AppearanceGeneratorHelper.java:364)
at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceValue(AppearanceGeneratorHelper.java:237)
at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.constructAppearances(PDTextField.java:263)
at org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField.applyChange(PDTerminalField.java:228)
at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.setValue(PDTextField.java:218)
at aaalabel.diefinder.QuoteWorkerPdf.<init>(QuoteWorkerPdf.java:69)
at aaalabel.diefinder.QuoteWorkerPdf.main(QuoteWorkerPdf.java:114)
This code is tailored to your file. It changes the default appearance string to use a different font. See also this answer that is somewhat related but more general.
The problem with your input file is that the font used in the field is subsetted, so it doesn't have all glyphs you would expect.
PDDocument doc = PDDocument.load(new File("82667884384374 (1).pdf"));
PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
PDTextField field = (PDTextField) acroForm.getField("q2_quotationPrepared");
COSName helvName = acroForm.getDefaultResources().add(PDType1Font.HELVETICA); // use different font if you want. Do not subset!
field.setDefaultAppearance("/" + helvName.getName() + " 10 Tf 0 g"); // modifies your existing DA string
field.setValue("TextEntry");
doc.save(new File("82667884384374 (1)-new.pdf"));
doc.close();

Why are Form Fields set with PDFBox(2.0.11) not being displayed?

I'm using PDFBox 2.0.11 to open a PDF Form and pulling out the values. This works as expected. When I try to set a value it appears to work. When I open the PDF the value is not displayed. If I click in the field, the value is then displayed as set, but then disappears again when I click out of the field.
This seems to be a common issue, but none of the fixes I've seen seem to work.
if(file.exists())
{
PDDocument doc = PDDocument.load(file);
doc.setAllSecurityToBeRemoved(true);
PDDocumentCatalog catalog = doc.getDocumentCatalog();
PDAcroForm form = catalog.getAcroForm();
// Add Font
PDResources resources = new PDResources();
resources.put(COSName.getPDFName("Helv"), PDType1Font.HELVETICA);
form.setDefaultResources(resources);
// End Add Font
form.setNeedAppearances(false);
List<PDField> fields = form.getFields();
for (Object field : fields)
{
if (field instanceof PDTextField) {
PDTextField pdTextbox = (PDTextField) field;
System.out.println("PDTextBox " + pdTextbox.getFullyQualifiedName() + " " + pdTextbox.getValue());
if(pdTextbox.getFullyQualifiedName().equalsIgnoreCase("a3_5"))
{
try {
pdTextbox.getWidgets().get(0).setHidden(false);
pdTextbox.setValue("5500");
}
catch(Exception e){
e.printStackTrace();
}
}
}
else
{
System.out.print(field);
System.out.print(" = ");
System.out.print(field.getClass());
System.out.println();
}
}
doc.save("..._MINE_UPDATE.pdf");
doc.close();
}
Stack Trace
java.io.IOException: Could not find font: /Helvetica
at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.processSetFont(PDDefaultAppearanceString.java:179)
at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.processOperator(PDDefaultAppearanceString.java:132)
at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.processAppearanceStringOperators(PDDefaultAppearanceString.java:108)
at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.<init>(PDDefaultAppearanceString.java:86)
at org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.getDefaultAppearanceString(PDVariableText.java:93)
at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.<init>(AppearanceGeneratorHelper.java:100)
at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.constructAppearances(PDTextField.java:262)
at org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField.applyChange(PDTerminalField.java:228)
at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.setValue(PDTextField.java:218)
at com.controller.TestPDFBox.loadData(TestPDFBox.java:87)
Skipped for loop
for (COSName fontResourceName : widgetResources.getFontNames())
{
try
{
if (acroFormResources.getFont(fontResourceName) == null)
{
LOG.debug("Adding font resource " + fontResourceName + " from widget to AcroForm");
acroFormResources.put(fontResourceName, widgetResources.getFont(fontResourceName));
}
}
catch (IOException e)
{
LOG.warn("Unable to match field level font with AcroForm font");
}
}
Thanks to Tilman Hausherr for helping me get to an answer that ultimately stemmed from a quirk with MacOSX's Preview application.
Preview for some reason strips out functionality that also causes you to be unable to set values correctly in the PDF.
The code above works correctly, although I did make a change to the Add Font section to the following.
// Add Font
PDResources resources = form.getDefaultResources();
if(resources == null)
{
resources = new PDResources();
}
resources.put(COSName.getPDFName("Helvetica"), PDType1Font.HELVETICA);
if(form.getDefaultResources() == null)
{
form.setDefaultResources(resources);
}
// End Add Font
In case it is not obvious: Don't create / edit / save your template pdf with Mac's Preview to use with PDFBox.
I ran into the same issue and had to re-create the PDF in Acrobat Pro. With this PDF the above code worked perfectly fine.

Categories

Resources