Can java use pdfbox to turn the red stamp into black

Can java use pdfbox to turn the red stamp into black - java

I tried to use the following code to convert the stamp color, but it didn't work
PDFDocumentSignature signature = signatures.get(i);
PDPageContentStream contents2 = new PDPageContentStream(pdDocument, pages,PDPageContentStream.AppendMode.APPEND, false, false);
PDExtendedGraphicsState r01 = new PDExtendedGraphicsState();
r01.setBlendMode(BlendMode.SATURATION);
contents2.setGraphicsStateParameters(r01);
contents2.setNonStrokingColor(Color.DARK_GRAY);
contents2.addRect(signature.getX(), signature.getY(), signature.getHeight(), signature.getHeight());
contents2.fill();
contents2.close();`

As it turned out in the comments to the question, the red stamp in question is a signature form field widget.
This explains why the OP's code could not de-saturate the stamp: That code de-saturates a portion of the static page content. Widgets (and annotations in general), though, are not part of the static page content but are rendered on top of it.
Thus, we have to manipulate the content stream of the widget annotation in question or another annotation over it.
You can manipulate the signature appearance like this:
PDDocument pdf = ...;
PDAcroForm acroForm = pdf.getDocumentCatalog().getAcroForm();
PDTerminalField acroField = (PDTerminalField) acroForm.getField("Signature1");
PDAnnotationWidget widget = acroField.getWidgets().get(0);
PDAppearanceStream appearance = widget.getAppearance().getNormalAppearance().getAppearanceStream();
byte[] originalBytes;
try ( InputStream oldContent = appearance.getContents() ) {
originalBytes = IOUtils.toByteArray(oldContent);
}
try ( PDPageContentStream canvas = new PDPageContentStream(pdf, appearance) ) {
canvas.appendRawCommands(originalBytes);
PDExtendedGraphicsState r01 = new PDExtendedGraphicsState();
r01.setBlendMode(BlendMode.SATURATION);
canvas.setGraphicsStateParameters(r01);
canvas.setNonStrokingColor(Color.DARK_GRAY);
PDRectangle bbox = appearance.getBBox();
canvas.addRect(bbox.getLowerLeftX(), bbox.getLowerLeftY(), bbox.getWidth(), bbox.getHeight());
canvas.fill();
}
pdf.save(RESULT);
(ChangeAppearance test testRemoveAppearanceSaturation)
before
after

Related

How to insert a image created by decode of a string to a pdf in pdfbox

I'm trying to insert an image (which needs to be converted from a string by java.util.Base64.getDecoder().decode(imageInputString)) to a certain position of a pdf file.
The main logic of the code will be:
//create a PDImageXObject myImage first (or something that could be used in addImage method.
//And this is what I could not figure out how to accomplish.
//open the pdf file and use addImage to insert the image to the specific page at specific position.
PDDocument document = PDDocument.load(pdfFile);
PDPageContentStream contentStream = new PDPageContentStream(document, pageNumber);
contentStream.addImage(myImage,x,y);
document.save();
Most of the tutorial I found created the myImage from reading an image file. Could someone help me to see if I could do the same thing but with a byte [], which is the output of java.util.Base64.getDecoder().decode(imageInputString)?
Thanks!

You can use the static method PDImageXObject.createFromByteArray(), which detects the file type based on contents and will decide which PDF image type / image compression is best. (javadoc)

Thanks to Tilman Hausherr.
Here is the final code (just the core part):
int pageNumber = j;
PDPage page = document.getPage(pageNumber);
PDResources resources = page.getResources();
byte[] ba = java.util.Base64.getDecoder().decode(base64str);
PDImageXObject sigimg = PDImageXObject.createFromByteArray(document,ba,"signature");
float imgW = sigimg.getWidth();
float imgH = sigimg.getHeight();
PDPageContentStream contentStream = new PDPageContentStream(document, page,PDPageContentStream.AppendMode.APPEND, true,true);
PDRectangle sigRect = field.getWidgets().get(0).getRectangle();
float fieldW = sigRect.getWidth();
float fieldH = sigRect.getHeight();
if (imgW > fieldW || imgH > fieldH){
if(imgW/fieldW > imgH/fieldH){
sigimg.setWidth(Math.round(fieldW));
sigimg.setHeight(Math.round(imgH/imgW*fieldW));
}
else{
sigimg.setWidth(Math.round(imgW/imgH*fieldH));
sigimg.setHeight(Math.round(fieldH));
}
}
contentStream.drawImage(sigimg,sigRect.getLowerLeftX(),sigRect.getLowerLeftY());
contentStream.close();

PDFBox incorrect text appearance after copy/paste

I’m using PDFBox 2.0.4 to create PDF documents with acroForms. Here is my test code example:
PDDocument document = new PDDocument();
PDPage page = new PDPage(PDRectangle.A4);
document.addPage(page);
PDAcroForm acroForm = new PDAcroForm(document);
document.getDocumentCatalog().setAcroForm(acroForm);
String dir = "../testPdfBox/src/main/resources/fonts/";
PDType0Font font = PDType0Font.load(document, new File(dir + "Roboto-Regular.ttf"));
PDResources resources = new PDResources();
String fontName = resources.add(font).getName();
acroForm.setDefaultResources(resources);
String defaultAppearanceString = format("/%s 12 Tf 0 g", fontName);
acroForm.setDefaultAppearance(defaultAppearanceString);
PDTextField field = new PDTextField(acroForm);
field.setPartialName("SampleField");
field.setDefaultAppearance(defaultAppearanceString);
acroForm.getFields().add(field);
PDAnnotationWidget widget = field.getWidgets().get(0);
PDRectangle rect = new PDRectangle(50, 750, 200, 50);
widget.setRectangle(rect);
widget.setPage(page);
widget.setPrinted(true);
page.getAnnotations().add(widget);
field.setValue("Sample field 123456");
acroForm.flatten();
document.save("target/SimpleForm.pdf");
document.close();
Everything works fine. But when I try to copy text from the created document and paste it to the NotePad or Word it becomes squares.
􀀷􀁅􀁑􀁔􀁐􀁉􀀄􀁊􀁍􀁉􀁐􀁈􀀄􀀕􀀖􀀗􀀘􀀙􀀚
I search a lot about this problem. The most popular answer is that there is no toUnicode cmap in created PDF. So I explore my document with CanOpener for Acrobat:
Yes, there is no toUnicode cmap, but everything works properly, if not to use acroForm.flatten(). When form fields are not flattened, I can copy/paste text from the document and it looks correct. Nevertheless I need all fields to be flattened.
So, I have two questions:
Why there is a problem with copy/pasting text in flattened form, and everything is ok in non-flattened?
What can I do to avoid problem with text copy/pasting?
Is there only one solution - to create toUnicode CMap by my own, like in this example?
My test pdf files are available here.

Please replace
PDType0Font font = PDType0Font.load(document, new File(dir + "Roboto-Regular.ttf"));
with
PDType0Font font = PDType0Font.load(document, new FileInputStream(dir + "Roboto-Regular.ttf"), false);
This makes sure that the font is embedded in full and not just as a subset.

Render Type3 font character as image using PDFBox

In my project, I'm stuck with necessity to parse PDF file, that contains some characters rendered by Type3 fonts. So, what I need to do is to render such characters into BufferedImage for further processing.
I'm not sure if I'm looking in correct way, but I'm trying to get PDType3CharProc for such characters:
PDType3Font font = (PDType3Font)textPosition.getFont();
PDType3CharProc charProc = font.getCharProc(textPosition.getCharacterCodes()[0]);
and the input stream of this procedure contains following data:
54 0 1 -1 50 43 d1
q
49 0 0 44 1.1 -1.1 cm
BI
/W 49
/H 44
/BPC 1
/IM true
ID
<some binary data here>
EI
Q
but unfortunately I don't have any idea how can I use this data to render character into an image using PDFBox (or any other Java libraries).
Am I looking in correct direction, and what can I do with this data?
If not, are there some other tools that can solve such problem?

Unfortunately PDFBox out-of-the-box does not provide a class to render contents of arbitrary XObjects (like the type 3 font char procs), at least as far as I can see.
But it does provide a class for rendering complete PDF pages; thus, to render a given type 3 font glyph, one can simply create a page containing only that glyph and render this temporary page!
Assuming, for example, the type 3 font is defined on the first page of a PDDocument document and has name F1, all its char procs can be rendered like this:
PDPage page = document.getPage(0);
PDResources pageResources = page.getResources();
COSName f1Name = COSName.getPDFName("F1");
PDType3Font fontF1 = (PDType3Font) pageResources.getFont(f1Name);
Map<String, Integer> f1NameToCode = fontF1.getEncoding().getNameToCodeMap();
COSDictionary charProcsDictionary = fontF1.getCharProcs();
for (COSName key : charProcsDictionary.keySet())
{
COSStream stream = (COSStream) charProcsDictionary.getDictionaryObject(key);
PDType3CharProc charProc = new PDType3CharProc(fontF1, stream);
PDRectangle bbox = charProc.getGlyphBBox();
if (bbox == null)
bbox = charProc.getBBox();
Integer code = f1NameToCode.get(key.getName());
if (code != null)
{
PDDocument charDocument = new PDDocument();
PDPage charPage = new PDPage(bbox);
charDocument.addPage(charPage);
charPage.setResources(pageResources);
PDPageContentStream charContentStream = new PDPageContentStream(charDocument, charPage);
charContentStream.beginText();
charContentStream.setFont(fontF1, bbox.getHeight());
charContentStream.getOutput().write(String.format("<%2X> Tj\n", code).getBytes());
charContentStream.endText();
charContentStream.close();
File result = new File(RESULT_FOLDER, String.format("4700198773-%s-%s.png", key.getName(), code));
PDFRenderer renderer = new PDFRenderer(charDocument);
BufferedImage image = renderer.renderImageWithDPI(0, 96);
ImageIO.write(image, "PNG", result);
charDocument.close();
}
}
(RenderType3Character.java test method testRender4700198773)
Considering the textPosition variable in the OP's code, he quite likely attempts this from a text extraction use case. Thus, he'll have to either pre-generate the bitmaps as above and simply look them up by name or adapt the code to match the available information in his use case (e.g. he might not have the original page at hand, only the font object; in that case he cannot copy the resources of the original page but instead may create a new resources object and add the font object to it).
Unfortunately the OP did not provide a sample PDF. Thus I used one from another stack overflow question, 4700198773.pdf from extract text with custom font result non readble for my test. There obviously might remain issues with the OP's own files.

I stumbled upon the same issue and I was able to render Type3 font by modifying PDFRenderer and the underlying PageDrawer:
class Type3PDFRenderer extends PDFRenderer
{
private PDFont font;
public Type3PDFRenderer(PDDocument document, PDFont font)
{
super(document);
this.font = font;
}
#Override
protected PageDrawer createPageDrawer(PageDrawerParameters parameters) throws IOException
{
FontType3PageDrawer pd = new FontType3PageDrawer(parameters, this.font);
pd.setAnnotationFilter(super.getAnnotationsFilter());//as done in the super class
return pd;
}
}
class FontType3PageDrawer extends PageDrawer
{
private PDFont font;
public FontType3PageDrawer(PageDrawerParameters parameters, PDFont font) throws IOException
{
super(parameters);
this.font = font;
}
#Override
public PDGraphicsState getGraphicsState()
{
PDGraphicsState gs = super.getGraphicsState();
gs.getTextState().setFont(this.font);
return gs;
}
}
Simply use Type3PDFRenderer instead of PDFRendered. Of course if you have multiple fonts this needs some more modification to handle them.
Edit: tested with pdfbox 2.0.9

Using PDFBox, how to set an appearance stream on annotation?

I am trying to highlight some text and convert it to image.
I tried some stuff but the annotation did not came out on the image.
Looking for help found this issue
http://issues.apache.org/jira/browse/PDFBOX-2162
which said that I must set appearance-stream to the annotation, something that acrobat reader do it automatically, but when converting to image it is needed. I could not figure out how to set the appearance-stream to the annotation.
looked for some examples on annotations and appearance-stream
(which I could not find an example that do both.. :-( )
This is what I have so far:
PDDocument document = PDDocument.load("sometest.pdf");
List<PDPage> pages = document.getDocumentCatalog().getAllPages();
PDPage page = pages.get(0);
List<PDAnnotation> annotations = page.getAnnotations();
PDAnnotationTextMarkup annotation = new PDAnnotationTextMarkup(PDAnnotationTextMarkup.SUB_TYPE_HIGHLIGHT);
final PDRectangle boundingBox = new PDRectangle();
//here I set the boundingbox coordinates
annotation.setRectangle(boundingBox);
final float[] quads = this.getQuads(boundingBox);
annotation.setQuadPoints(quads);
annotation.setContents("bla bla");
annotation.setConstantOpacity((float) 0.9);
PDGamma c = new PDGamma();
//Here I set the RGB
annotation.setColour(c);
annotation.setPrinted(true);
//create the Form for the appearance stream
PDResources holderFormResources = new PDResources();
PDStream holderFormStream = new PDStream(document);
PDXObjectForm holderForm = new PDXObjectForm(holderFormStream);
holderForm.setResources(holderFormResources);
holderForm.setBBox(boundingBox);
holderForm.setFormType(1);
// trying to set the appreanceStream for the annotation
PDAppearanceDictionary appearance = new PDAppearanceDictionary();
appearance.getCOSObject().setDirect(true);
PDAppearanceStream appearanceStream = new PDAppearanceStream(holderForm.getCOSStream());
holderForm.getCOSStream().createFilteredStream();
appearance.setNormalAppearance(appearanceStream);
annotation.setAppearance(appearance);
annotations.add(annotation);
//convert to image
BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_BGR, 600);
ImageIO.write(image, "jpg", new File("test.jpg"));
document.save("test.pdf");
document.close();
The "test.pdf" come out with the annotation, but the image not.
What am I missing?

Adding Header to existing PDF File using PDFBox

I am trying to add a Header to an existing PDF file. It works but the table header in the existing PDF are messed up by the change in the font. If I remove setting the font then the header doesn't show up. Here is my code:
// the document
PDDocument doc = null;
try
{
doc = PDDocument.load( file );
List allPages = doc.getDocumentCatalog().getAllPages();
//PDFont font = PDType1Font.HELVETICA_BOLD;
for( int i=0; i<allPages.size(); i++ )
{
PDPage page = (PDPage)allPages.get( i );
PDRectangle pageSize = page.findMediaBox();
PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true,true);
PDFont font = PDType1Font.TIMES_ROMAN;
float fontSize = 15.0f;
contentStream.beginText();
// set font and font size
contentStream.setFont( font, fontSize);
contentStream.moveTextPositionByAmount(700, 1150);
contentStream.drawString( message);
contentStream.endText();
//contentStream.
contentStream.close();}
doc.save( outfile );
}
finally
{
if( doc != null )
{
doc.close();
}
}
}`

Essentially you are running into a PDFBox bug in the current version 1.8.2.
A workaround:
Add a getFonts call of the page resources after creating the new content stream before using a font:
PDPage page = (PDPage)allPages.get( i );
PDRectangle pageSize = page.findMediaBox();
PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true,true);
page.getResources().getFonts(); // <<<<<<<<
PDFont font = PDType1Font.TIMES_ROMAN;
float fontSize = 15.0f;
contentStream.beginText();
The bug itself:
The bug is in the method PDResources.addFont which is called from PDPageContentStream.setFont:
public String addFont(PDFont font)
{
return addFont(font, MapUtil.getNextUniqueKey( fonts, "F" ));
}
It uses the current content of the fonts member variable to determine a unique name for the new font resource on the page at hand. Unfortunately this member variable still can be (and in your case is) uninitialized at this time. This results in the MapUtil.getNextUniqueKey( fonts, "F" ) call to always return F0.
The font variable then is initialized implicitly during the addFont(PDFont, String) call later.
Thus, if unfortunately there already existed a font named F0 on that page, it is replaced by the new font.
Having tested with your PDF this is exactly what happens in your case. As the existing font F0 uses some custom encoding while your replacement font uses a standard one, the text originally written using F0 now looks like gibberish.
The work-around mentioned above implicitly initializes that member variable and, thus, prevents the font replacement.
If you plan to use PDFBox in production for this task, you might want to report the bug.
PS: As mentioned in the comments above there is another bug to observe in context with inherited resources. It should be brought to the PDFBox development's attention, too.
PPS: The issue at hand meanwhile has been fixed in PDFBox for versions 1.8.3 and 2.0.0, cf. PDFBOX-1753.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Can java use pdfbox to turn the red stamp into black - java

Related

How to insert a image created by decode of a string to a pdf in pdfbox

PDFBox incorrect text appearance after copy/paste

Render Type3 font character as image using PDFBox

Using PDFBox, how to set an appearance stream on annotation?

Adding Header to existing PDF File using PDFBox

Categories

Resources