Write cyrillic chars into PDF form fields with PDFBox

I am using pdfbox 2.0.5 to fill out form fields of a PDF document using this code:
doc = PDDocument.load(inputStream);
PDDocumentCatalog catalog = doc.getDocumentCatalog();
PDAcroForm form = catalog.getAcroForm();
for (PDField field : form.getFieldTree()){
field.setValue("должен");
}
I get this error: U+0434 ('afii10069') is not available in this font Times-Roman (generic: TimesNewRomanPSMT) encoding: StandardEncoding with differences
The PDF document itself contains Cyrillic text, which is displayed fine. I have tried using different fonts. With "Arial Unicode MS", Adobe Reader asks me to download a 50 MB "Adobe Acrobat Reader DC Font Pack". Is this a requirement for Cyrillic characters?
Which font do I have to specify in the text field to handle Cyrillic (or Asian) characters?
Thanks,
Ropo

Adobe handles this by reusing the font file embedded in the {/Ubuntu} font and creating a new font resource from it. Here is a quick hack that can serve as a guide to achieving something similar. The code is specific to a sample I have.
PDDocument doc = PDDocument.load(new File(...));
PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
PDResources formResources = acroForm.getDefaultResources();
PDTrueTypeFont font = (PDTrueTypeFont) formResources.getFont(COSName.getPDFName("Ubuntu"));
// here is the 'magic' to reuse the font as a new font resource
TrueTypeFont ttFont = font.getTrueTypeFont();
PDFont font2 = PDType0Font.load(doc, ttFont, true);
ttFont.close();
formResources.put(COSName.getPDFName("F0"), font2);
PDTextField formField = (PDTextField) acroForm.getField("Text2");
formField.setDefaultAppearance("/F0 0 Tf 0 g");
formField.setValue("öäüинформацию");
doc.save(...);
doc.close();
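
Why the original Times-Roman field rejects "должен" while a Type 0 font accepts it can be illustrated without PDFBox. A simple font maps text through a single-byte encoding that has no slot for Cyrillic. The sketch below is only an analogy: it uses the JDK's ISO-8859-1 charset as a stand-in for such an encoding, and the class and method names are made up for this illustration. It does not reproduce what PDFBox does internally.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of PDFBox.
final class EncodingCheck {

    /** Returns the first code point the encoder cannot represent, or -1 if every one is encodable. */
    static int firstUnencodable(String text, CharsetEncoder encoder) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!encoder.canEncode(new String(Character.toChars(cp)))) {
                return cp;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    /** Convenience overload: ISO-8859-1 stands in for a single-byte font encoding (an assumption of this sketch). */
    static int firstUnencodableInLatin1(String text) {
        return firstUnencodable(text, StandardCharsets.ISO_8859_1.newEncoder());
    }

    public static void main(String[] args) {
        // Cyrillic text fails on its first letter, U+0434
        System.out.printf("U+%04X%n", firstUnencodableInLatin1("\u0434\u043e\u043b\u0436\u0435\u043d"));
        // Umlauts fit into the single-byte encoding, so nothing is reported
        System.out.println(firstUnencodableInLatin1("\u00f6\u00e4\u00fc"));
    }
}
```

A check of this kind could be run before calling setValue to decide whether a field needs a Unicode-capable font resource like the one created above.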

The solution was trivial:
form.setNeedAppearances(true);
And then I remove the blue box of the field with:
field.setReadOnly(true);

Related

pdfbox: how to load a font once and use it many times?

I am trying to create a lot of pdf files in a loop.
for(int i=0; i<10000; ++i){
PDDocument doc = PDDocument.load(inputstream);
PDPage page = doc.getPage(0);
PDPageContentStream content = new PDPageContentStream(doc, page, PDPageContentStream.AppendMode.APPEND, true, true);
content.beginText();
//what happens here?
PDFont font = PDType0Font.load(doc, Thread.currentThread().getContextClassLoader().getResourceAsStream("font/simsun.ttf") );
content.setFont(font, 10);
//...
doc.save(outstream);
doc.close();
}
What happens when PDType0Font.load(...) is called? Because the TTF file is large (10 MB), will it create a big short-lived font object 10,000 times? If so, is there a way to make the font behave like the built-in PDType1Font fonts, so that I can load it once and use it many times in the loop?
I encountered a full GC problem here, and I'm trying to figure it out.

Create the font at the fontbox level:
TrueTypeFont ttf = new TTFParser().parse(...);
You can now reuse ttf in different PDDocument objects like this:
PDFont font = PDType0Font.load(doc, ttf, true);
When done with all documents, don't forget to close ttf.
See also PDFontTest.testPDFBox3826() in the source code.

itext pdf change default font size in Paragraph not working

While using iText 5 on Android to render a PDF from XHTML, I am trying to change the font size, but the change is not reflected.
I would like to know of a substitute (or hack) for CSS, as iText 5 is not supporting CSS.
preparedText = output.toString("UTF-8");
list = XMLWorkerHelper.parseToElementList(preparedText, null);
// URL path =Thread.currentThread().getContextClassLoader().getResource("fontname");
// FontFactory.register(path.toString(), "test_font");
Font titleFont = FontFactory.getFont(FontFactory.HELVETICA_BOLD,7f);
paragraph.setFont(titleFont);
paragraph.addAll(list);
publishProgress(88);
// write to document
document.open();
document.newPage();
Paragraph p= new Paragraph(paragraph);
p.setFont(titleFont);
document.add(p);
document.close();

The font you set on a paragraph applies only to text added to the paragraph afterwards; it does not change text that was added earlier. To set the font of text passed to the constructor, use the constructor that also accepts a font parameter.
Thus, instead of
Paragraph p= new Paragraph(paragraph);
p.setFont(titleFont);
use
Paragraph p = new Paragraph(paragraphText, titleFont);
or
Paragraph p = new Paragraph();
p.setFont(titleFont);
p.add(paragraphText);

PDFBox incorrect text appearance after copy/paste

I’m using PDFBox 2.0.4 to create PDF documents with acroForms. Here is my test code example:
PDDocument document = new PDDocument();
PDPage page = new PDPage(PDRectangle.A4);
document.addPage(page);
PDAcroForm acroForm = new PDAcroForm(document);
document.getDocumentCatalog().setAcroForm(acroForm);
String dir = "../testPdfBox/src/main/resources/fonts/";
PDType0Font font = PDType0Font.load(document, new File(dir + "Roboto-Regular.ttf"));
PDResources resources = new PDResources();
String fontName = resources.add(font).getName();
acroForm.setDefaultResources(resources);
String defaultAppearanceString = format("/%s 12 Tf 0 g", fontName);
acroForm.setDefaultAppearance(defaultAppearanceString);
PDTextField field = new PDTextField(acroForm);
field.setPartialName("SampleField");
field.setDefaultAppearance(defaultAppearanceString);
acroForm.getFields().add(field);
PDAnnotationWidget widget = field.getWidgets().get(0);
PDRectangle rect = new PDRectangle(50, 750, 200, 50);
widget.setRectangle(rect);
widget.setPage(page);
widget.setPrinted(true);
page.getAnnotations().add(widget);
field.setValue("Sample field 123456");
acroForm.flatten();
document.save("target/SimpleForm.pdf");
document.close();
Everything works fine. But when I copy text from the created document and paste it into Notepad or Word, it turns into squares:
􀀷􀁅􀁑􀁔􀁐􀁉􀀄􀁊􀁍􀁉􀁐􀁈􀀄􀀕􀀖􀀗􀀘􀀙􀀚
I searched a lot about this problem. The most common answer is that the created PDF has no ToUnicode CMap. So I explored my document with CanOpener for Acrobat.
Indeed, there is no ToUnicode CMap, but everything works properly if I do not call acroForm.flatten(). When the form fields are not flattened, I can copy and paste text from the document and it looks correct. Nevertheless, I need all fields to be flattened.
So, I have two questions:
Why is there a problem with copying and pasting text from a flattened form when everything is fine in a non-flattened one?
What can I do to avoid the copy/paste problem?
Is the only solution to create the ToUnicode CMap on my own, like in this example?
My test pdf files are available here.

Please replace
PDType0Font font = PDType0Font.load(document, new File(dir + "Roboto-Regular.ttf"));
with
PDType0Font font = PDType0Font.load(document, new FileInputStream(dir + "Roboto-Regular.ttf"), false);
This makes sure that the font is embedded in full and not just as a subset.

How to add text watermark to pdf in Java using Apache PDFBox?

I cannot find any tutorial on adding a text watermark to a PDF file. Can you please guide me? I am very new to PDFBox.
It's not a duplicate; the link in the comment didn't help me. I want to add text, not an image, to the PDF.

Here is an example using PDFBox 2.0.2. It loads a PDF and writes some text in semi-transparent red at the page origin, which is the bottom left corner of a typical page. If the PDF has multiple pages, the watermark appears on every page. It might not be production ready, as I am not sure whether some additional null conditions need to be checked, but it should get you going in the right direction.
Keep in mind that this particular block of code does not modify the original PDF; it creates a new PDF named Tmp_(filename) as the output.
private static void watermarkPDF (File fileStored) throws IOException {
File tmpPDF = new File(fileStored.getParent() + System.getProperty("file.separator") +"Tmp_"+fileStored.getName());
// try-with-resources closes the document even if an exception is thrown
try (PDDocument doc = PDDocument.load(fileStored)) {
for(PDPage page:doc.getPages()){
// one content stream per page, appended to the existing page content
try (PDPageContentStream cs = new PDPageContentStream(doc, page, AppendMode.APPEND, true, true)) {
String ts = "Some sample text";
PDFont font = PDType1Font.HELVETICA_BOLD;
float fontSize = 14.0f;
PDExtendedGraphicsState r0 = new PDExtendedGraphicsState();
r0.setNonStrokingAlphaConstant(0.5f);//50% opacity
cs.setGraphicsStateParameters(r0);
cs.setNonStrokingColor(255,0,0);//Red
cs.beginText();
cs.setFont(font, fontSize);
cs.setTextMatrix(Matrix.getTranslateInstance(0f,0f));
cs.showText(ts);
cs.endText();
}
}
doc.save(tmpPDF);
}
}
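
The translation (0, 0) in the code above starts the text at the page origin. If the watermark should instead end near the right edge, the x offset can be computed from the page width and the text width. The following is a minimal sketch of that arithmetic only; the class name, method name and margin parameter are hypothetical. In PDFBox 2.x the inputs would typically come from page.getMediaBox().getWidth() and font.getStringWidth(ts), the latter being expressed in 1/1000 text space units.

```java
// Hypothetical helper for illustration only; not part of PDFBox.
final class WatermarkPosition {

    /**
     * Returns the x coordinate at which text must start so that it ends
     * `margin` points left of the right page edge.
     *
     * @param pageWidth  page width in points
     * @param glyphUnits string width in 1/1000 text space units
     * @param fontSize   font size in points
     * @param margin     distance to keep from the right edge, in points
     */
    static float rightAlignedX(float pageWidth, float glyphUnits, float fontSize, float margin) {
        float textWidth = glyphUnits / 1000f * fontSize;
        return pageWidth - margin - textWidth;
    }

    public static void main(String[] args) {
        // 595 points is roughly the width of an A4 page; 5000 units is a made-up string width
        System.out.println(rightAlignedX(595f, 5000f, 14f, 10f)); // prints 515.0
    }
}
```

The result would replace the first argument of Matrix.getTranslateInstance, with a small positive y value to lift the text off the bottom edge.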

Adding Header to existing PDF File using PDFBox

I am trying to add a header to an existing PDF file. It works, but the table headers in the existing PDF are messed up by the change in the font. If I remove the call that sets the font, the header doesn't show up. Here is my code:
// the document
PDDocument doc = null;
try
{
doc = PDDocument.load( file );
List allPages = doc.getDocumentCatalog().getAllPages();
//PDFont font = PDType1Font.HELVETICA_BOLD;
for( int i=0; i<allPages.size(); i++ )
{
PDPage page = (PDPage)allPages.get( i );
PDRectangle pageSize = page.findMediaBox();
PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true,true);
PDFont font = PDType1Font.TIMES_ROMAN;
float fontSize = 15.0f;
contentStream.beginText();
// set font and font size
contentStream.setFont( font, fontSize);
contentStream.moveTextPositionByAmount(700, 1150);
contentStream.drawString( message);
contentStream.endText();
//contentStream.
contentStream.close();}
doc.save( outfile );
}
finally
{
if( doc != null )
{
doc.close();
}
}
}

Essentially you are running into a PDFBox bug in the current version 1.8.2.
A workaround:
Call getFonts on the page resources after creating the new content stream and before using a font:
PDPage page = (PDPage)allPages.get( i );
PDRectangle pageSize = page.findMediaBox();
PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true,true);
page.getResources().getFonts(); // <<<<<<<<
PDFont font = PDType1Font.TIMES_ROMAN;
float fontSize = 15.0f;
contentStream.beginText();
The bug itself:
The bug is in the method PDResources.addFont which is called from PDPageContentStream.setFont:
public String addFont(PDFont font)
{
return addFont(font, MapUtil.getNextUniqueKey( fonts, "F" ));
}
It uses the current content of the fonts member variable to determine a unique name for the new font resource on the page at hand. Unfortunately this member variable can still be (and in your case is) uninitialized at this time. As a result, the MapUtil.getNextUniqueKey( fonts, "F" ) call always returns F0.
The fonts variable is then initialized implicitly during the addFont(PDFont, String) call later.
Thus, if a font named F0 already exists on that page, it is replaced by the new font.
Having tested with your PDF this is exactly what happens in your case. As the existing font F0 uses some custom encoding while your replacement font uses a standard one, the text originally written using F0 now looks like gibberish.
The work-around mentioned above implicitly initializes that member variable and, thus, prevents the font replacement.
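
The effect can be reproduced in isolation. The sketch below is not the PDFBox source; it is a simplified stand-in written under the assumption that the key lookup treats an uninitialized (null) map like an empty one, which matches the behavior described above. All names in it are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the key lookup described above; not the PDFBox implementation.
final class UniqueKeyDemo {

    /** Returns prefix + N for the smallest N >= 0 that is not a key of the map; a null map counts as empty. */
    static String nextUniqueKey(Map<String, ?> map, String prefix) {
        int counter = 0;
        while (map != null && map.containsKey(prefix + counter)) {
            counter++;
        }
        return prefix + counter;
    }

    public static void main(String[] args) {
        // Fonts that already exist on the page
        Map<String, String> pageFonts = new HashMap<>();
        pageFonts.put("F0", "existing font with custom encoding");

        // Uninitialized cache: the lookup cannot see F0 and hands out the same name again
        System.out.println(nextUniqueKey(null, "F"));      // prints F0
        // Initialized cache (what the getFonts work-around achieves): a free name is chosen
        System.out.println(nextUniqueKey(pageFonts, "F")); // prints F1
    }
}
```

With the name F0 handed out twice, the later put into the font dictionary overwrites the existing entry, which is the replacement observed in the PDF.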
If you plan to use PDFBox in production for this task, you might want to report the bug.
PS: As mentioned in the comments above, there is another bug to watch out for in connection with inherited resources. It should be brought to the PDFBox developers' attention, too.
PPS: The issue at hand meanwhile has been fixed in PDFBox for versions 1.8.3 and 2.0.0, cf. PDFBOX-1753.
