When I sign a document and I use an image in the signature appearance, I get this result:
However, I want the result to be like this:
This is my code:
PdfSignatureAppearance appearance = stamper.getSignatureAppearance();
appearance.setReason(info.getReason());
appearance.setLocation(info.getLocation());
appearance.setVisibleSignature(t.getRectangle(), 1, "SIGN");
appearance.setSignatureGraphic(image);
appearance.setCertificationLevel(PdfSignatureAppearance.NOT_CERTIFIED);
appearance.setRenderingMode(PdfSignatureAppearance.RenderingMode.GRAPHIC);
MakeSignature.signDetached(appearance, digest, signature, chain, null, null, null, 0, MakeSignature.CryptoStandard.CADES);
The visual representation of a digital signature in a PDF document is a widget annotation. A widget annotation is a specific type of annotation.
In PDF, you have the actual content of a page. This content is stored in a content stream. A page is defined in a page dictionary. There's a /Contents entry in this page dictionary that refers to one or more content streams. Together these content streams contain the syntax that describes the content of a page.
Annotations are not part of any of those streams. Annotations are referred to from the page dictionary using the /Annots entry. In your case, there is an /Annots entry that refers to a widget annotation with the appearance of your signature.
All annotations are rendered on top of the actual page content.
To make a long story short: you have a PDF with some page content and the appearance of an annotation is added on top of that page content. This is normal. This is as described in ISO-32000-1. You are now asking for a PDF viewer to render annotations under the page content. This is not possible according to the PDF specification.
The short answer to your question is: you are asking something that can't be done. The closest you'll get to a solution is to make the image transparent (but that's another question).
While Bruno's answer is correct (there is no way to put the signature annotation appearance beneath the content), in some cases you can make it appear to be.
Blend mode use
Bruno already mentioned that one approach might be to make the image transparent. Regular (alpha channel style) transparency has one major disadvantage, though: the original content or the image (probably both) will appear somewhat pale. But there is a related technique in PDF for putting foreground and backdrop together in a special way: blend modes.
In particular if your pdf content is mostly black and white, using the blend mode Darken or Multiply will create a result looking like the desired image.
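To see why these two blend modes help, here is a minimal sketch of the per-component arithmetic as I understand it from ISO 32000-1 (Multiply takes the product of backdrop and source, Darken takes the darker of the two). The class name and the sample colour values are made up for illustration; this is not iText code.

```java
// Sketch of the separable blend functions Multiply and Darken.
// Colour components range from 0.0 (black) to 1.0 (white).
final class BlendModeSketch {

    // Multiply: the result is the product of backdrop and source.
    static double multiply(double backdrop, double source) {
        return backdrop * source;
    }

    // Darken: the result is the darker of backdrop and source.
    static double darken(double backdrop, double source) {
        return Math.min(backdrop, source);
    }

    public static void main(String[] args) {
        double blackText = 0.0;  // page content: a black glyph
        double paper = 1.0;      // page content: empty white paper
        double ink = 0.4;        // hypothetical dark pixel of the signature image
        double imageWhite = 1.0; // white background pixel of the signature image

        // Black page text stays black under both modes, whatever the image pixel is.
        System.out.println(multiply(blackText, ink));
        System.out.println(darken(blackText, imageWhite));
        // On empty paper the image pixel shows through unchanged.
        System.out.println(multiply(paper, ink));
        System.out.println(darken(paper, ink));
    }
}
```

In other words, the white parts of the image no longer hide the text, and the text is not made paler, which is what makes the signature look as if it were beneath the content.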
Signature graphic in content
If your PDF has not been signed yet, you can actually add the signature image to the page background itself, e.g. using the UnderContent of the PdfStamper. (If the page has an opaque background, e.g. due to a white filled rectangle as background, this will become a bit more difficult but still manageable.)
Page content in signature appearance
Alternatively you can draw a copy of the page content over the image in the signature annotation appearance. This also works for previously signed PDFs but is also sensitive to opaque page contents.
I'm attempting to perform some string validation against individual PDF pages in a file via the use of Apache PDFBox.
I'm going to be utilizing PDFTextStripper for the majority of this, so my first issue to tackle was the fact that all the PDFs I'm going to be validating against are generated as 2-up, e.g. page 1 of 2 and page 2 of 2 are on the same physical page, as if you had scanned an open book face down. In addition to this, they were oriented incorrectly and needed rotating by 90 degrees so PDFTextStripper could read them properly.
Using elements of the questions/solutions below, I have built a method which first crops each page exactly in half, exports the cropped pages in order to a new file, rotates each page to the correct orientation and then saves the file:
Rotate PDF around its center using PDFBox in java
Split a PDF page in two parts [duplicate]
Visually, my method seems to work as expected until I run PDFTextStripper against the result. It appears to return the text of not just the page I want, but also of the page I cropped out of it.
To confirm the issue, I extracted a single page out of the entire document and saved it as a new file. When running PDFTextStripper, I still get the same results even though all I can see is one page. Adobe search doesn't bring up the hidden, legacy data either.
I can only assume that during my transform method, I need to redefine the cropped page with only the contents of the cropped page.
My question is, how can I do this?
P.S. I haven't posted my code as it's basically an amalgamation of the solutions provided in the links above. However, if it is needed, I can provide it.
The PDFTextStripper ignores the CropBox you set to crop the pages. It also ignores whether text is covered by some filled rectangle or image, or whether the text is invisible; it extracts all text (except text in patterns or contained in Type 3 font characters).
You might want to try the PDFTextStripperByArea instead. This class (which is derived from PDFTextStripper) restricts itself to regions you can define.
(Unfortunately these regions have to be defined using a different coordinate system than the one used for the CropBox, so usually you will have to transform the coordinates first.)
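As a rough sketch of that transformation: the CropBox is given in PDF user space (origin at the lower left, y growing upward), while the regions are, as far as I know, expected with the origin at the top left and y growing downward. The helper below is hypothetical and assumes an unrotated page whose page box starts at (0, 0); rotated pages or shifted boxes need additional handling.

```java
import java.awt.geom.Rectangle2D;

// Sketch: convert a CropBox-style rectangle into a top-left based region.
final class RegionCoordinates {

    // (llx, lly) and (urx, ury) are the lower left and upper right corners
    // in PDF user space; pageHeight is the height of the uncropped page.
    static Rectangle2D.Double toTopLeftRegion(double llx, double lly,
                                              double urx, double ury,
                                              double pageHeight) {
        double width = urx - llx;
        double height = ury - lly;
        // The top edge of the rectangle becomes its y coordinate measured from the top.
        double yFromTop = pageHeight - ury;
        return new Rectangle2D.Double(llx, yFromTop, width, height);
    }

    public static void main(String[] args) {
        // Lower half of a 612 x 792 page (hypothetical sizes).
        Rectangle2D.Double lowerHalf = toTopLeftRegion(0, 0, 612, 396, 792);
        System.out.println(lowerHalf);
    }
}
```

The resulting rectangle is what you would then pass to PDFTextStripperByArea.addRegion(name, rectangle) before calling extractRegions(page) and getTextForRegion(name).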
I have to create a pdf using itext which will contain a button, when clicked should add a row in an existing PdfPTable. I wrote some code to create a PushbuttonField. While trying to set action I can only find PdfAction.javaScript. I am not able to figure out how to add a row in a table. I tried searching online but all I could find is PdfAction.javaScript
Any help would be greatly appreciated. Thank you.
When you create a PDF file, you draw text, lines and shapes to a canvas. That is also what happens when you add a PdfPTable to a Document. If you look at the syntax of the PDF page, you won't recognize a table. You'll find text (the content of the cells), lines (the borders), and shapes (the backgrounds), but you won't find a table. If the table is distributed over different pages, the "table" on one page won't know that it is related to the "table" on the other page.
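To illustrate, here is a small sketch that writes out the kind of page syntax a 2x2 "table" boils down to. It is not what iText produces byte for byte; the class, the font resource name /F1 and all positions are made up for the example. Only standard content stream operators are used (m, l and S for lines; BT, Tf, Td, Tj and ET for text).

```java
import java.util.Locale;

// Sketch: a "table" as nothing more than lines and text at absolute positions.
final class TableAsContentStream {

    // (x, y) is the top left corner of the grid in PDF user space (y grows upward).
    // Assumes cell texts contain no parentheses or backslashes (no escaping is done)
    // and that the page has a font resource named /F1 (hypothetical name).
    static String draw(String[][] cells, int x, int y, int colWidth, int rowHeight) {
        int rows = cells.length;
        int cols = cells[0].length;
        StringBuilder cs = new StringBuilder();
        // Horizontal borders: move to the start, line to the end, stroke.
        for (int r = 0; r <= rows; r++) {
            int ly = y - r * rowHeight;
            cs.append(String.format(Locale.ROOT, "%d %d m %d %d l S\n",
                    x, ly, x + cols * colWidth, ly));
        }
        // Vertical borders.
        for (int c = 0; c <= cols; c++) {
            int lx = x + c * colWidth;
            cs.append(String.format(Locale.ROOT, "%d %d m %d %d l S\n",
                    lx, y, lx, y - rows * rowHeight));
        }
        // Cell contents: each text is placed independently of all others.
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                int tx = x + c * colWidth + 4;
                int ty = y - (r + 1) * rowHeight + 6;
                cs.append(String.format(Locale.ROOT, "BT /F1 10 Tf %d %d Td (%s) Tj ET\n",
                        tx, ty, cells[r][c]));
            }
        }
        return cs.toString();
    }

    public static void main(String[] args) {
        String[][] cells = { { "A1", "B1" }, { "A2", "B2" } };
        System.out.print(draw(cells, 50, 700, 100, 20));
    }
}
```

Nothing in the output says "table", "row" or "cell"; a viewer only sees six lines and four independent pieces of text.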
Sure, you can add semantic structure to the document by introducing marked content, and by creating a structure tree, but that mechanism which we call Tagged PDF can't be used to make the PDF "editable" the same way a Word document is editable. Tagged PDF is (among others) used to allow assistive technology to present the content to the visually impaired (e.g. in the context of PDF/UA). The presence of structure doesn't change the fact that all text, all lines, and all shapes are added at absolute positions.
This is very different from HTML where the position on a page of a <table>, <tr>, <th>, or <td> is calculated at the moment the page is rendered. In HTML this position can even change when you resize the browser window.
There is no such thing in PDF (except if you use XFA (*), a technology that has been deprecated since ISO 32000-2). All content on a page has a fixed position, hardcoded into the page's content stream. Changing the size of the PDF viewer window won't change the position of the page content.
Because of all of this, your question is invalid. It is impossible to create a button in PDF that adds a row to a table, because:
In many cases there is no table: there is just a bunch of text, lines, and shapes at absolute positions,
Even if there is the notion of a table (using Tagged PDF): the visual representation of that table is fixed at creation time, it can't be changed at consumption time.
You want to use an ordinary PDF viewer as if it were a PDF editor. That is impossible for all the reasons listed above.
(*) XFA was deprecated for different reasons. One of the most important reasons is the lack of support for XFA. There aren't many viewers that support XFA. If you were to post a follow-up question asking "How can I create an XFA document?", the answer would be: "Don't do this!" Creating XFA is extremely complex, and once you've succeeded in creating an XFA form, you'll discover that many of your customers won't be able to consume the file because their viewer doesn't support the format.
I want to change the background color of an already present pdf to transparent or white,
and I am using pdfBox for performing other tasks on the pdf, I found some documentation here:
setBackroundColor - pdfBox
But I have no idea how to use it as I am not accustomed to Java.
Can someone possibly provide some example code for doing this?
I want to change the background color of an already present pdf to transparent or white
According to the PDF specification ISO 32000-1, section 11.4.7:
Ordinarily, the page shall be imposed directly on an output medium, such as paper or a display screen. The page group shall be treated as an isolated group, whose results shall then be composited with a backdrop colour appropriate for the medium. The backdrop is nominally white, although varying according to the actual properties of the medium. However, some conforming readers may choose to provide a different backdrop, such as a checker board or grid to aid in visualizing the effects of transparency in the artwork.
PDF viewers most often do use this white backdrop. Thus, if your PDF on standard viewers displays a different color in the back, this normally is due to some area filling operation(s) somewhere in the page content stream.
Thus, there is no simple single attribute of the PDF to set somewhere; instead you have to parse the page content, find the operations which paint what you perceive as background, and change them. There are numerous different operations which may be used for this task, though, and these operations may also be used for purposes other than background coloring. Thus, there is no single method to change backgrounds.
If you have a single specific PDF or PDFs generated alike, please provide a sample document to allow helping you to find a way to change the perceived background color.
PS: The PDLayoutAttributeObject.setBackgroundColor method you found refers to the creation of so-called Layout Attributes which
specify parameters of the layout process used to produce the appearance described by a
document’s PDF content. [...]
NOTE The intent is that these parameters can be used to reflow the content or export it to some other document format with at least basic styling preserved.
(section 14.8.5.4 in the PDF specification ISO 32000-1)
Thus, they are provided only in PDFs intended for content reflow or content export and are not used by regular PDF viewers.
I have a PDF with a text field which contains some characters, but the language-specific characters are overlapping.
When the field gains focus, the text changes and displays correctly. When it loses focus, it displays incorrectly.
When the text is edited, it also displays correctly.
For the file test_extended_filled.pdf see below.
How I created PDF:
1. Created odg template in OpenOffice Draw 4.0.1 -> test.odg
2. Exported as PDF -> test.pdf
3. Edited test.pdf with Adobe Acrobat X Pro 10.0.0 and resaved with extended functions (needed to save on local PC) -> test_extended.pdf
4. Filled form by Java (PdfStamper) -> test_extended_filled.pdf
Bonus: when I change the font by PdfStamper in Java, it looks like the changes are applied only to focused text too. -> test_extended_filled_font_size.pdf
Note: When I fill test.pdf from 2. it's displayed correctly -> text_filled.pdf
Attached files (go to download section):
https://rapidshare.com/share/ACC0D81E9235A6DA2CC2353BD21A4C37
After I added
stamper.getAcroFields().addSubstitutionFont
it's better, but some characters still overlap. -> test_extended_filled_font_size_with_substitution_font.pdf
http://rapidshare.com/share/0EE3238F37E9115C36A7A74706B09826
Any ideas?
Please take a look at the FillFormSpecialChars example and the resulting PDF.
Open Office doesn't really create nice forms. As mkl already indicates, the NeedAppearances flag can cause problems, the border of the fields is drawn onto the page content instead of being part of the widget annotation, etc...
In your case, you've defined a font that isn't optimal for special characters. Using a substitution font isn't ideal, because you can clearly see that drawing the glyphs isn't that much of a problem. The problem is that the metrics are all wrong. It's as if the special characters have an advance of 0 glyph units. In this case, you should change the font using the setFieldProperty() method.
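To make the metrics problem concrete, here is a minimal sketch of how glyph positions follow from advance widths (expressed in glyph units, i.e. thousandths of the font size). The class and the width values are hypothetical and only illustrate the arithmetic; the viewer and iText do this internally when building the field appearance.

```java
import java.util.Map;

// Sketch: x offsets of glyphs computed from advance widths in glyph units.
final class AdvanceWidthSketch {

    // Characters missing from the metrics are treated as having an advance of 0,
    // which is the situation described above for the special characters.
    static double[] glyphOffsets(String text, Map<Character, Integer> advances, double fontSize) {
        double[] offsets = new double[text.length()];
        double x = 0;
        for (int i = 0; i < text.length(); i++) {
            offsets[i] = x;
            int advance = advances.getOrDefault(text.charAt(i), 0);
            x += advance * fontSize / 1000.0;
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Hypothetical metrics that know 'a' and 'j' but not the special character.
        Map<Character, Integer> advances = Map.of('a', 556, 'j', 222);
        // "\u010Daj": c with caron, followed by "aj".
        double[] offsets = glyphOffsets("\u010Daj", advances, 10.0);
        for (double offset : offsets) {
            System.out.println(offset);
        }
    }
}
```

The first two glyphs end up at the same x offset, so the special character is drawn correctly but the next glyph is painted right on top of it. A font with proper metrics for those characters avoids this.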
I have a scenario where I need a Java app to be able to extract content from a PDF file in one of 2 modes: TEXT_ONLY or ALL. In text mode, only visible text ("visible" as if a human being was reading the PDF) is read out into strings. In all mode, all content (text, images, etc.) is read out of the file.
For instance, if a PDF file was to have 1 page in it, and that page had 3 paragraphs of contiguous text, and was word-wrapping 2 images, then TEXT_ONLY would extract all 3 paragraphs, and ALL would extract all 3 paragraphs and both images:
while(page.hasMoreText())
    textList.add(page.nextTextChunk());
if(allMode)
    while(page.hasMoreImages())
        imageList.add(page.nextImage());
I know Apache Tika uses PDFBox under the hood, but am worried that this kind of functionality is shaded/prohibited by Tika (in which case, I probably need to do this directly from PDFBox).
So I ask: is this possible, and if so, which library is more appropriate for me to use? Am I going about this entirely the wrong way? Any pitfalls/caveats I am not considering here?
To expound on some of the reasons why #markStephens points you towards resources giving some background on PDF:
In text mode, only visible text ("visible" as if a human being was reading the PDF) is read out into strings.
Your definition "visible" as if a human being was reading the PDF is not yet very well-defined:
Is text 1 pt in size visible? When zooming in, a human can read it; at standard magnification not, though. Which size would be the limit?
Is text in RGB (128, 129, 128) on a background of (128, 128, 128) visible? How different do the colors have to be?
Is text displayed in some white noise pattern on a background of some other white noise pattern visible? How different do the patterns have to be?
Is text that is only partially on-screen visible? If yes, is one visible pixel enough? And what about some character 'I' in a giant size where the visible page area fits into the dot on the letter?
What about text covered by some annotation which can easily be moved, probably even by some automatically executed JavaScript code in the file?
What about text in some optional content group only visible when printing?
...
I would expect most available PDF text parsing libraries to ignore all these circumstances and extract the text, at most respecting a crop box. In case of images with added, invisible OCR'ed text the extraction of that text in general is desired.
For instance, if a PDF file was to have 1 page in it, and that page had 3 paragraphs of contiguous text, and was word-wrapping 2 images, then TEXT_ONLY would extract all 3 paragraphs, and ALL would extract all 3 paragraphs and both images:
PDF (in general) does not know about paragraphs, just some groups of glyphs positioned somewhere on the page. Recognizing paragraphs is a task which cannot be guaranteed to work properly as there are heuristics at work. If, furthermore, you have multicolumn text with an irregular separation, maybe even some image in between (making it hard to decide whether there are two columns divided by the image or whether there is one column with an integrated image), you can count on recognition of the text flow let alone text elements like paragraphs, sections, etc. to fail miserably.
If your PDFs are either properly tagged or all generated by a tool chain for which patterns in the created PDF content streams betray text structures, you may have more luck. In case of the latter, though, your solution would have to be custom-made for that tool chain.
but am worried that this kind of functionality is shaded/prohibited by Tika (in which case, I probably need to do this directly from pdfBox).
There you point towards another point of interest: PDFs can be marked that text extraction is forbidden while they otherwise can be displayed by anyone. While technically PDFs marked like that can be handled just like documents without that mark with just one decoding step (essentially they are encrypted with a publicly known password), doing so is clearly acting against the declared intention of the author and violating his copyright.
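For reference, that mark is a bit in the permission flags (the /P entry of the encryption dictionary). As I read Table 22 of ISO 32000-1, bit 5 (counting from 1) stands for "copy or otherwise extract text and graphics". The following is only a sketch of how a tool can check and respect that flag; the class name and sample values are my own. PDFBox exposes the same information via AccessPermission.canExtractContent().

```java
// Sketch: checking the "extract content" permission bit of a /P value.
final class PermissionBits {

    // Bit 5 (1-based) corresponds to the value 1 << 4 = 16.
    static final int EXTRACT_CONTENT = 1 << 4;

    static boolean mayExtractContent(int permissions) {
        return (permissions & EXTRACT_CONTENT) != 0;
    }

    public static void main(String[] args) {
        // Hypothetical /P values: -4 has all permission bits set,
        // -3904 has bits 3 to 6 and 9 to 12 cleared.
        System.out.println(mayExtractContent(-4));
        System.out.println(mayExtractContent(-3904));
    }
}
```

A text extraction tool that honours the author's intention would check this flag first and refuse to extract when it is not set.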
So I ask: is this possible, and if so, which library is more appropriate for me to use? Am I going about this entirely the wrong way? Any pitfalls/caveats I am not considering here?
As long as you expect 100% accuracy for generic input, you should reconsider your architecture.
If the PDFs are all you have and a solution that is as effective as possible is OK, on the other hand, there are multiple possible libraries for you, iText and PDFBox to name but two. Which is best for you depends on further factors, e.g. on whether you need some generic solution or all PDFs are created by a tool chain as above.
In any case you'll have to do some programming yourself, though, to fine-tune them for your use case.