PDFBox returns missing descendant font dictionary - java

When extracting the first page of a PDF, I get java.io.IOException: Missing descendant font dictionary.
The extraction code is the following:
PDDocument pdDocument = PDDocument.load(file);
PageExtractor pageExtractor = new PageExtractor(pdDocument, 1, 1);
PDDocument singlePageDocument = pageExtractor.extract();
It only happens with a few PDFs, and the error points to the font definitions, but I am unclear on how Apache PDFBox (I am using version 2.0.18) processes fonts in a PDF.
Any tips?
Thanks

Related

How to create fields automatically in Java without Adobe Acrobat

I have to fill PDF fields.
What I have done so far is to open my PDF as a form with Adobe Acrobat and then save it. It turns into a PDF form, and with Apache PDFBox I can achieve what I want to do.
Unfortunately, I must not open it with an external program. If I skip this little trick, I get an empty list from:
PDDocument pdfDoc = PDDocument.load(new File(path));
PDDocumentCatalog docCatalog = pdfDoc.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
acroForm.getFields(); // empty
Is it possible to create the fields dynamically in Java, starting from a simple PDF?
Thanks in advance

PDF Box flatten PDF causes weird spacing

I'm having an issue with PDFBox flattening a PDF generated by Adobe Acrobat DC.
The Adobe Acrobat text field I created is absolutely the default text field.
In my example below, I have a PatientName field with the text value "Douglas McDouggelman".
When I flatten the PDF, here's what it looks like:
Anyone know what's up with this bizarre spacing?
It appears that the space + next character are combined. This is what it looks like when you try to select that character.
Code:
try (PDDocument document = PDDocument.load(pdfFormInputStream)) {
    PDDocumentCatalog catalog = document.getDocumentCatalog();
    PDAcroForm acroForm = catalog.getAcroForm();
    acroForm.getField("PatientName").setValue("Douglas McDouggelman");
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    if (flattenPdfs) {
        acroForm.flatten();
    }
    document.save(byteArrayOutputStream);
}
I realized this PDF had been made by another group, and who knows what they did. So I found the source Word document, recreated the form in Adobe Acrobat DC, added the fields back to the document, and then it was totally fine.
PDFBox was not the problem; it was some unknown incorrect step taken by whoever originally prepared the PDF.

How to embed a standard font into a generated PDF with PDFBox

I need to add some text to PDF/A files using the Apache PDFBox library for Java. Because the result must be a valid PDF/A file, all fonts used must be embedded in it. I know that I can embed a TTF font using PDFBox, but I'd like to avoid shipping a font file with the application, so I was wondering whether there is a way to embed one of the standard fonts available in PDFBox as if it were an external font.
For example, when I write something using one of the standard fonts, the PDF validator complains about this:
I've used the following code to write the text:
PDFont standardFont = PDType1Font.HELVETICA_BOLD;
PDPage pag = new PDPage();
pag.setResources(new PDResources());
PDPageContentStream contentStream = new PDPageContentStream(pdfFile, pag);
//Begin the Content stream
contentStream.beginText();
//Setting the font to the Content stream
contentStream.setFont(standardFont, 12);
//Setting the position for the line
contentStream.newLineAtOffset(25, 500);
//Adding text in the form of string
contentStream.showText("JUST A SAMPLE STRING");
//Ending the content stream
contentStream.endText();
//Closing the content stream
contentStream.close();
pdfFile.addPage(pag);
pdfFile.save(file);
pdfFile.close();
Is there any option to force the font to be embedded when setting it?
Thanks in advance,
Only one font is bundled with PDFBox. You can load it this way:
PDFont font = PDType0Font.load(doc, SomePdfboxClass.class.getResourceAsStream(
"/org/apache/pdfbox/resources/ttf/LiberationSans-Regular.ttf"));

Showing emoji in PDF or Excel

I have data containing emoji in a database, and I want to display it in a generated document such as a PDF or an Excel file.
I am using a Spring Boot application. Please suggest a Java library for generating either PDF or Excel files that supports emoji.
iText supports this, assuming that your emoji is a Unicode character and that you use a font containing the correct glyph for that character.
Best way to test this is to try it.
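Before trying it with a PDF library, it can help to confirm what the emoji looks like as a Java string. The following is only a minimal sketch using the standard library; the class and method names (EmojiCodePoints, codePointCount, hasSupplementaryCodePoint) are made up for illustration. It shows that an emoji such as U+1F600 is a single Unicode code point that Java stores as two UTF-16 chars, so any code that handles the text char by char may split it.

```java
public class EmojiCodePoints {

    /** Counts Unicode code points, not UTF-16 chars. */
    static int codePointCount(String text) {
        return text.codePointCount(0, text.length());
    }

    /** True if the text has a code point outside the Basic Multilingual Plane. */
    static boolean hasSupplementaryCodePoint(String text) {
        return text.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        // U+1F600 GRINNING FACE, built from its code point to avoid source-encoding issues
        String smiley = new String(Character.toChars(0x1F600));

        System.out.println(smiley.length());                   // number of UTF-16 chars
        System.out.println(codePointCount(smiley));            // number of code points
        System.out.println(hasSupplementaryCodePoint(smiley)); // outside the BMP?
    }
}
```

If the code point count is what you expect, the remaining question is whether the chosen font has a glyph for that code point, which depends on the font file and is not checked here.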
This is how to get started with iText:
https://developers.itextpdf.com/content/itext-7-jump-start-tutorial/installing-itext-7
And this is a small code-snippet that adds text to a document with different fonts:
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
Document document = new Document(pdf);
PdfFont font = PdfFontFactory.createFont(FontConstants.TIMES_ROMAN);
PdfFont bold = PdfFontFactory.createFont(FontConstants.TIMES_BOLD);
Text title =
new Text("The Strange Case of Dr. Jekyll and Mr. Hyde").setFont(bold);
Text author = new Text("Robert Louis Stevenson").setFont(font);
Paragraph p = new Paragraph().add(title).add(" by ").add(author);
document.add(p);
document.close();
For more information check out the tutorials.
https://developers.itextpdf.com/content/itext-7-building-blocks/chapter-1

DOCX to PDF conversion with a Korean font

I hope someone can help me. This is about DOCX to PDF conversion when the DOCX document contains Korean characters.
I'm able to convert a DOCX document to PDF with docx4j.
The conversion itself works and I can see the result in the PDF. But if my DOCX document contains Korean text, none of it shows up in the PDF; only the Latin digits appear.
What do I have to do to get the Korean text from the DOCX document into my PDF?
Here is my code:
String docXPath = "E:/contract2Files/test.docx";
File docXFile = new File(docXPath);
WordprocessingMLPackage wordprocessingMLPackage = WordprocessingMLPackage.load(docXFile);
// Derive the output path from the input path (a String, not a File)
String out = docXPath.replace("docx", "pdf");
File pdfFile = new File(out);
OutputStream pdfFileOs = new java.io.FileOutputStream(pdfFile);
org.docx4j.convert.out.pdf.PdfConversion c = new JanoPdfConversion(wordprocessingMLPackage);
c.output(pdfFileOs);
Please try http://www.docx4java.org/docx4j/docx4j-3_0-beta2.zip (link updated 15 Nov)
You might need to configure your font mapper, though things work out of the box with the Identity mapper on my Windows box, since I have the relevant font installed.
If this doesn't help, please put a sample docx somewhere StackOverflow users can see it.
Thanks again. I tried the newest jar file together with the code below, and it worked! Now I get the Korean letters in the PDF.
ThemePart themePart =
wordprocessingMLPackage.getMainDocumentPart().getThemePart();
org.docx4j.dml.BaseStyles.FontScheme fontScheme = themePart.getFontScheme();
org.docx4j.dml.TextFont textFont = fontScheme.getMinorFont().getLatin();
textFont.setTypeface("Malgun Gothic");
