Overlapping characters in text field iText PDF - java

I have PDF with text field which contains some characters. But the language specific characters are overlapping.
When it gains focus, text changes and displays correctly. When lost focus, displays incorrectly.
When text is edited displays also correctly.
File test_extended_filled.pdf see bellow
How I created PDF:
Created odg template in OpenOffice Draw 4.0.1 -> test.odg
Exported as PDF -> test.pdf
Edited test.pdf with Adobe Acrobat X Pro 10.0.0 and resaved with extended functions (needed to save on local PC) -> test_extended.pdf
Filled form by java (pdfstamper) -> test_extended_filled.pdf
Bonus: when i change font by pdfstamper in java it looks like changes are applied only on focused text too. -> test_extended_filled_font_size.pdf
Note: When I fill test.pdf from 2. it's displayed correctly -> text_filled.pdf
Attached files (go to download section):
https://rapidshare.com/share/ACC0D81E9235A6DA2CC2353BD21A4C37
After I added
stamper.getAcroFields().addSubstitutionFont
it's better, but some characters still overlap. -> test_extended_filled_font_size_with_substitution_font.pdf
http://rapidshare.com/share/0EE3238F37E9115C36A7A74706B09826
Any ideas?

Please take a look at the FillFormSpecialChars example and the resulting PDF.
Open Office doesn't really create nice forms. As mkl already indicates, the NeedAppearances flag can cause problems, the border of the fields is drawn onto the page content instead of being part of the widget annotation, etc...
In your case, you've defined a font that isn't optimal for special characters. Using a substitution font isn't ideal, because you can clearly see that drawing the glyphs isn't that much of a problem. The problem is that the metrics are all wrong. It's as if the special characters have an advance of 0 glyph units. In this case, you should change the font using the setFieldProperty() method.

Related

PDFBox - 2.0.3 - PDFTextStripper picking up old text from page prior to cropping/rotating

I'm attempting to perform some string validation against individual PDF pages in a file via the use of Apache PDFBox.
I'm going to be utilizing PDFTextStripper for the majority of this, so my first issue to tackle was the fact that all the PDFs i'm going to be validating against are generated as 2up; e.g Page 1 of 2 and page 2 of 2 were on the same page or if you imagine you literally scanned a book face down into a scanner - In addition to this, they were oriented incorrectly, and needed rotating 90 degrees so PDFTextStripper could read them properly.
Using elements of the below questions/solutions, i have built a method which first crops the page exactly in half, exports the cropped pages in order to a new file, rotates each page to the correct orientation and then saves the file;
Rotate PDF around its center using PDFBox in java
Split a PDF page in two parts [duplicate]
Visually, my method is seemingly working as expected until i run PDFTextStripper against it - It appears to be returning the text of not just the page i want, but also the page i cropped out of it.
To confirm the issue, I extracted a single page out of the entire document and saved it as a new file - when running PDFTextStripper, i still get the same results even though all i can see is literally one page. Adobe search doesn't bring up the hidden, legacy data either.
I can only assume that during my transform method, i need to redefine the cropped page with only the contents of the cropped page.
My question is, how can i do this?
p.s - i haven't posted my code as it's basically a amalgamation of the solutions provided in the aforementioned links above - however if it i needed, i can provide
The PDFTextStripper ignores the CropBox you set to crop the pages. It also ignores whether text is covered by some filled rectangle or image or whether the text is invisible, it extracts all text (except text in patterns or contains in Type 3 font characters).
You might want to try the PDFTextStripperByArea instead. This class (which is derived from PDFTextStripper) restricts itself to regions you can define.
(Unfortunately these regions have to be defined using a different coordinate system than the one used for the CropBox, so usually you will have to transform the coordinates first.)

How to add a row in a itext PdfPTable dynamically?

I have to create a pdf using itext which will contain a button, when clicked should add a row in an existing PdfPTable. I wrote some code to create a PushbuttonField. While trying to set action I can only find PdfAction.javaScript. I am not able to figure out how to add a row in a table. I tried searching online but all I could find is PdfAction.javaScript
Any help would be greatly appreciated. Thank you.
When you create a PDF file, you draw text, lines and shapes to a canvas. That is also what happens when you add a PdfPTable to a Document. If you look at the syntax of the PDF page, you won't recognize a table. You'll find text (the content of the cells), lines (the borders), and shapes (the backgrounds), but you won't find a table. If the table is distributed over different pages, the "table" on one page won't know that it is related to the "table" on the other page.
Sure, you can add semantic structure to the document by introducing marked content, and by creating a structure tree, but that mechanism which we call Tagged PDF can't be used to make the PDF "editable" the same way a Word document is editable. Tagged PDF is (among others) used to allow assistive technology to present the content to the visually impaired (e.g. in the context of PDF/UA). The presence of structure doesn't change the fact that all text, all lines, and all shapes are added at absolute positions.
This is very different from HTML where the position on a page of a <table>, <tr>, <th>, or <td> is calculated at the moment the page is rendered. In HTML this position can even change when you resize the browser window.
There is no such thing in PDF (except if you use XFA (*), a technology that is deprecated since ISO 32000-2). All content on a page has a fixed position, hardcoded into the page's content stream. Changing the size of the PDF viewer window won't change anything to the position of the page content.
Because of all of this, your question is invalid. It is impossible to create a button in PDF that adds a row to a table, because:
In many cases there is no table: there is just a bunch of text, lines, and shapes at absolute positions,
Even if there is the notion of a table (using Tagged PDF): the visual represenation of that table is fixed at creation time, it can't be changed at consumption time.
You want to use an ordinary PDF viewer as if it were a PDF editor. That is impossible for all the reasons listed above.
(*) XFA was deprecated for different reasons. One of the most important reasons it is the lack of support for XFA. There aren't many viewers that support XFA. If you would post a follow-up question asking *"How can I create an XFA document?", the answer would be: "Don't do this!" Creating XFA is extremely complex, and once you've succeeded in creating an XFA form, you'll discover that many of your customers won't be able to consume the file because their viewer doesn't support the format.

PDF/A's Checkbox - iText 2.1.7

I need to print a PDF/A document with my Java App, which implements iText 2.1.7. When I use a PDF Templates, my app works fine and print the checkbox checked if they satisfy the conditions.
But, when I use PDF/A Templates, my app doesn't fill these checkbox, but they receive the values correctly.
Can anyone help me? I dont know if this problem is in the template or in the code.
Can you describe in more detail what you are doing?
thesis I:
If you have a (non PDF/A) PDF with form fields and then you programmatically change a checkbox value with iText then the change is visible in the PDF (e.g. you can see it in adobe reader and on printed paper)
Correct?
thesis II:
If you have a PDF/A compliant PDF and change the value of a checkbox then the change is not visible in the PDF (neither in adobe reader nor on printed paper)
Correct?
In the PDF format you need to differentiate between the appearance of a field and the (data) value of a field. Normally PDF/A documents are made for long term archiving and not to change values. If you still do that you need to ensure that you also update the appearance of a form field.
Try to update the appearance of the field you change (after you set the new value). You can do that with the following method:
AcroField fields;
boolean success = fields.regenerateField(String yourCheckboxFieldName);

How to add a non-printable image in xsl / apache-fop

I'm trying to generate an xsl to be printed in a pre-printed sheet which works fine.
Now i want to give the user a better previsualization (in the pdf screen version) adding a background image which emulates the "pre-printed" stuf on the sheet to give the user a "context" of what is he printing.
The question is: Is there any way I can set a background image in xsl (using apache fop) visible only in pdf but not in the printed version of it?
Thank you all for reading or givin any advice.
Although as the comments state, you can't have content in the PDF that does not come out in a physical printed copy, here is one possible work around for you. Depending on how your users are ultimately going to be using FOP for PDF rendering and how your a driving the work flow, it's possible to pass a parameter into an xslt file before the transofrmation phase is run, so potentially, you could do a dual rendering of the same PDF, one that is presented to the user where the background image is enabled, and one that gets printed, you could just set a variable similar to how they do in this Example, and call it something like $isPreview, and just use a simple if or choose statement to check for 'Y' or 'N'.
Since you are sending to a printer, you may even want to take advantage of FOP's ability to generate to Postscript rather than PDF, I've used this feature quite extensively for print documents using FOP while also producing a PDF copy for electronic delivery via email or hosted services, and I've yet to find any discrepancy between the PDF rendering and what is printed after sending a rendered postscript file, so it should work well for you as well.
As I said, this is not truly a solution to your problem as you've presented it, but as a work around, it could get you the desired results if your clever about how you implement it.
I don;t think the statement that it is not possible is true, I am just not sure how to create such a PDF with FOP. Certainly you can add an image field. One would use a button field and place the image in the button. Then you would set the properties of that button to not print (printable false).
PDF support images in fields: https://answers.acrobatusers.com/adding-image-field-form-q41825.aspx
RenderX supports PDF Form fields but I do not see where they support an image inside the button, only text: http://www.renderx.com/reference.html#PDF%20Forms. But they do support setting a field to "printable".

Splitting a pdf with pdfbox, but losing the font

I wrote some code in Java using the pdfbox API that splits a pdf document into it's individual pages, looks through the pages for a specific string, and then makes a new pdf from the page with the string on it. My problem is that when the new page is saved, I lose my font. I just made a quick word document to test it and the default font was calibri, so when I run the program I get an error box that reads: "Cannot extract the embedded font..." So it replaces the font with some other default.
I have seen a lot of example code that shows how to change the font when you are inputting text to be placed in the pdf, but nothing that sets the font for the pdf.
If anyone is familiar with a way to do this, (or can find documentation/examples), I would greatly appreciate it!
Edit: forgot to include some sample code
if (pageContent.indexOf(findThis) >= 0){
PDPage pageToRip = pages.get(i);
>>set the font of pageToRip here
res.importPage(pageToRip); //res is the new document that will be saved
}
I don't know if that helps any, but I figured I'd include it.
Also, this is what the change looks like if the pdf is written in calibri and split:
Note: This might be a nonissue, it depends on the font used in the files that will need to be processed. I tried some things besides Calibri and it worked out fine.
From How to extract fonts from a PDF:
You actually cannot extract a font from a PDF, not even if the font is
fully embedded. There are two reasons why this is not feasible:
•Most fonts are copyrighted, making it illegal to use an extractor.
•When a font is embedded in a PDF, not all of the font data are
included. Obviously the font outline data are included as well as the
font width tables. Other information, such as data about ligatures,
are irrelevant within the PDF so those data do not get enclosed in a
PDF. I am not aware of any font extraction tools but if you come
across one, the above reasons should make it clear that these
utilities are to be avoided.

Categories

Resources