PDF Box flatten PDF causes weird spacing

PDF Box flatten PDF causes weird spacing - java

I'm having an issue with PDF box flattening a PDF generated by Adobe Acrobat DC.
The Adobe Acrobat text field I created is absolutely the default text field.
In my example below, I have a PatientName field with the text value "Douglas McDouggelman".
When I flatten the PDF, here's what it looks like:
Anyone know what's up with this bizarre spacing?
It appears that the space + next character are combined. This is what it looks like when you try to select that character.
Code:
try (PDDocument document = PDDocument.load(pdfFormInputStream)) {
PDDocumentCatalog catalog = document.getDocumentCatalog();
PDAcroForm acroForm = catalog.getAcroForm();
acroForm.getField("PatientName").setValue("Douglas McDouggelman");
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
if (flattenPdfs) {
acroForm.flatten();
}
document.save(byteArrayOutputStream);
}

I realized this PDF was from some other group who made it and who knows what they did. So I found the source word document, repeated the creation of the form from Adobe DC, added the fields back to the document, then it was totally fine.
PDF box was not the problem... it was some unknown incorrect step that the person who originally prepared the pdf did.

Related

How to create fields automaticlly in Java without Adobe Acrobat

I have to fill pdf fields.
What I have done so far is to open my pdf like a form with Adobe Acrobat and then save it. It turns into a pdf forms and with Apache PdfBox I can achieve what I want to do.
Unfortunately, I must not open it with an external program. And if I don't do this little trick, I have an empty array with :
DDocument pdfDoc = PDDocument.load(new File(path));
PDDocumentCatalog docCatalog = pdfDoc.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
acroForm.getFields() //empty
Is it possible to create dynamicly in java my future fields from a simple pdf?
Thanks in advance

PDFBox returns missing descendant font dictionary

when extracting the first page of a PDF I get java.io.IOException: Missing descendant font dictionary.
The extraction code is the following:
PDDocument pdDocument = PDDocument.load(file);
PageExtractor pageExtractor = new PageExtractor(pdDocument, 1, 1);
PDDocument singlePageDocument = pageExtractor.extract();
It only happens with few PDFs and the error points to the Fonts definitions, but I am unclear on how fonts in PDF are processed by Apache PDFBox (using version v2.0.18).
Any tip?
Thanks

showing emoji in pdf or excel

I have the data containing emoji in database. I want to display in the generated document such as pdf or in excel format.
I am using spring boot application. Please suggest any java library for generating either PDF or excel which supports emoji.

iText supports this. Assuming
your emoji is a unicode character
you use a font that contains the correct glyph for this unicode character
Best way to test this is to try it.
This is how to get started with iText:
https://developers.itextpdf.com/content/itext-7-jump-start-tutorial/installing-itext-7
And this is a small code-snippet that adds text to a document with different fonts:
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
Document document = new Document(pdf);
PdfFont font = PdfFontFactory.createFont(FontConstants.TIMES_ROMAN);
PdfFont bold = PdfFontFactory.createFont(FontConstants.TIMES_BOLD);
Text title =
new Text("The Strange Case of Dr. Jekyll and Mr. Hyde").setFont(bold);
Text author = new Text("Robert Louis Stevenson").setFont(font);
Paragraph p = new Paragraph().add(title).add(" by ").add(author);
document.add(p);
document.close();
For more information check out the tutorials.
https://developers.itextpdf.com/content/itext-7-building-blocks/chapter-1

PdfBox - Unable to extract some text from pdf

I am trying to extract text from a pdf using pdfbox. However I am unable to extract all the text from a table. See the image below (snipped from the pdf)
(some confidential text has been highlighted)
I am able to get the text out of the 1st table (in orange) and the 3rd table (General Information one). But I am unable to extract anything out of the 2nd table.
In the output I just see a couple of blank lines between the output of 1st and 3rd table.
Here is my code.
PDDocument doc = PDDocument.load(new File("...."));
PDFTextStripper pdfStripper = new PDFTextStripper();
String text = pdfStripper.getText(doc);
System.out.println(text);
doc.close();
Any inputs or suggestions?

I found the issue. The content was being displayed but it was being re-arranged.
The PDF had a couple of tables placed one after the other. The content of this table was being displayed after content of a few tables placed just after it. So for example if I had 6 tables and this was the 2nd table from the top. It's content was being displayed on 5th position instead of 2nd position.
As suggested by Tilman in comments the use of pdfStripper.setSortByPosition(true) results into expected content in expected places.

Filling landscape PDF with PDFBox

I try to fill a PDF form with PDFBox and I managed to do it well with a portrait oriented document. But I have a problem when filling a document in landscape mode. The fields are filled up, but the text orientation is not good. It appear vertically like if it was still in portrait but in a rotation of 90 degrees.
Here is my simplified code:
PDDocument pdfDoc = PDDocument.load(MY_FILE);
PDDocumentCatalog docCatalog = pdfDoc.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
acroForm.getField("aAddressLine1").setValue("ADDRESS1_HERE");
acroForm.getField("aAddressLine2").setValue("ADDRESS1_HERE");
acroForm.getField("country").setValue("COUNTRY_HERE");
pdfDoc.save(PATH_HERE);
pdfDoc.close();
Did you manage to fill a PDF document in landscape mode?
Thanks for your help.

The short answer
I'm afraid PDFBox does not yet (as of version 1.8.2) allow you to fill in landscape PDFs like the one you provided because it does not seem to query and factor in informations about the page the form field is located on.
The long answer
There are different ways you can define a page to be A4 landscape:
You can define it to have the A4 landscape dimensions directly by means of a media box definition:
/MediaBox [0, 0, 842, 595]
In this case the coordinates of your aAddressLine1 would be
/Rect[23.1711 86.8914 292.121 100.132]
or you can define it to have the A4 portrait dimensions and being rotated by 90° (or 270° obviously):
/MediaBox [0, 0, 595, 842]
/Rotate 90
In this case the coordinates of your aAddressLine1 are
/Rect[86.8914 23.1711 100.132 292.121]
Your example document uses the latter method.
Now PDFBox, when creating an appearance stream for that field, only looks at the rectangle defining the field but ignores the properties of the page. Thus, PDFBox sees a very narrow and very high textfield and fills it in just like that. It is completely unaware that the result will be rotated in a PDF viewer.
What it should have done is to also look at the page the field is located on. If that page has a /Rotate entry, it should create an appearance stream for the field which displays the text rotated in the opposite direction.
Alternatives
In a comment you also asked
Do you know another library I could use if PDFBox can't do what I want?
I have tested the feat with iText 5.4.2:
PdfReader reader = new PdfReader(MY_FILE);
OutputStream os = new FileOutputStream(PATH_HERE);
PdfStamper stamper = new PdfStamper(reader, os);
AcroFields acroFields = stamper.getAcroFields();
acroFields.setField("aAddressLine1", "ADDRESS1_HERE");
acroFields.setField("aAddressLine2", "ADDRESS1_HERE");
stamper.close();
(The free iText version is licensed under the AGPL; you have to decide whether that's ok for your project. There is a commercial license, too, if it's not ok.)
I'm sure other PDF libraries also can do that, it's not too exotic a feature after all...
But I also tested PDF Clown 0.1.3 (trunk version), which did not work either:
File file = new File(MY_FILE);
Document document = file.getDocument();
Form form = document.getForm();
form.getFields().get("aAddressLine1").setValue("ADDRESS1_HERE");
form.getFields().get("aAddressLine2").setValue("ADDRESS1_HERE");
file.save(new java.io.File(PATH_HERE), SerializationModeEnum.Incremental);
file.close();

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

PDF Box flatten PDF causes weird spacing - java

Related

How to create fields automaticlly in Java without Adobe Acrobat

PDFBox returns missing descendant font dictionary

showing emoji in pdf or excel

PdfBox - Unable to extract some text from pdf

Filling landscape PDF with PDFBox

Categories

Resources