iText add Watermark to selected pages - java

I need to add a watermark to every page that has certain text, such as "PROCEDURE DELETED".
Based on Bruno Lowagie's suggestion in "Adding watermark directly to the stream", I so far have the PdfWatermark class with:
protected Phrase watermark = new Phrase("DELETED", new Font(FontFamily.HELVETICA, 60, Font.NORMAL, BaseColor.PINK));
ArrayList<Integer> arrPages = new ArrayList<Integer>();
boolean pdfChecked = false;
@Override
public void onEndPage(PdfWriter writer, Document document) {
if(pdfChecked == false) {
detectPages(writer, document);
pdfChecked = true;
}
int pageNum = writer.getPageNumber();
if(arrPages.contains(pageNum)) {
PdfContentByte canvas = writer.getDirectContentUnder();
ColumnText.showTextAligned(canvas, Element.ALIGN_CENTER, watermark, 298, 421, 45);
}
}
And this works fine if I add, say, the number 3 to the arrPages ArrayList in my custom detectPages method - it shows the desired watermark on page 3.
What I am having trouble with is how to search the document for the text string, given that inside onEndPage I only have access to the PdfWriter writer and the com.itextpdf.text.Document document.
Here is what I have tried, unsuccessfully:
private void detectPages(PdfWriter writer, Document document) {
try {
//arrPages.add(3);
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
PdfWriter.getInstance(document, byteArrayOutputStream);
// the following code does not work
PdfReader reader = new PdfReader(writer.getDirectContent().getPdfDocument());
PdfContentByte canvas = writer.getDirectContent();
PdfImportedPage page;
for (int i = 0; i < reader.getNumberOfPages(); ) {
page = writer.getImportedPage(reader, ++i);
canvas.addTemplate(page, 1f, 0, 0.4f, 0.4f, 72, 50 * i);
canvas.beginText();
canvas.showTextAligned(Element.ALIGN_CENTER,
//search String
String.valueOf((char)(181 + i)), 496, 150 + 50 * i, 0);
//if(detected) arrPages.add(i);
canvas.endText();
}
Am I on the right track with this as a solution or do I need to back out?
Can anyone supply the missing link needed to scan the doc and pick out "PROCEDURE DELETED" pages?
EDIT: I am using iText 5.0.4 - cannot upgrade to 5.5.X at this time, but could probably upgrade to latest version below that.
EDIT2: More information: this is the approach used to add text to the document (doc):
String processed = processText(template);
List<Element> objects = HTMLWorker.parseToListLog(new StringReader(processed),
styles, interfaceProps, errors);
for (Element elem : objects) {
doc.add(elem);
}
That is called in an addText method I control. The template is simply HTML from a database LOB. The processText method checks the HTML for custom markers enclosed in curly braces, as in ${replaceMe}.
This seems to be the place to identify the "PROCEDURE DELETED" string during generation of the document, but I don't see the path to Chunk.setGenericTag().
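For readers unfamiliar with this kind of template step, here is a minimal sketch of a ${...} marker substitution using only the JDK. The class name MarkerReplacer, the method replaceMarkers and the map of values are hypothetical; the question does not show the real processText, so this only illustrates the idea, and it is also the stage at which a "deleted" marker could be spotted before the HTML is parsed.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the question's processText step (not the OP's code).
class MarkerReplacer {
    // Matches markers of the form ${name}; the allowed name characters are an assumption.
    private static final Pattern MARKER = Pattern.compile("\\$\\{([A-Za-z0-9_]+)\\}");

    // Replaces every known marker with its value and leaves unknown markers untouched.
    static String replaceMarkers(String html, Map<String, String> values) {
        Matcher matcher = MARKER.matcher(html);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = values.getOrDefault(matcher.group(1), matcher.group(0));
            // quoteReplacement keeps '$' and '\' in the value from being read as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```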
EDIT3: Table difficulties
List<Element> objects = HTMLWorker.parseToListLog(new StringReader(processed),
styles, interfaceProps, errors);
for (Element elem : objects) {
// the code below does not work
if (elem instanceof PdfPTable) {
PdfPTable table = (PdfPTable) elem;
ArrayList<Chunk> chks = table.getChunks();
for(Chunk chk : chks){
if(chk.toString().contains("TEST DELETED")) {
chk.setGenericTag("delete_tag");
}
}
}
doc.add(elem);
}
Commenters mkl and Bruno suggested detecting the "PROCEDURE DELETED" keywords at the time they are added to the doc. However, since the keywords are necessarily inside a table, they have to be detected through PdfPTable rather than the simpler Element.
I could not do it with the code above. Any suggestions on exactly how to find text inside a table cell and do a string comparison on it?
EDIT4: Based on some experimentation, I would like to make some assertions; please show me the way through them:
Using Chunk.setGenericTag() is required to trigger the onGenericTag handler.
For some reason, (PdfPTable) table.getChunks() does not return any chunks, at least none that my system picks up. This is counterintuitive, and possibly a setup, version, or code bug is causing this behavior.
Therefore, a selection text string inside a table cannot be used to trigger a watermark.

Thanks to all the help from @mkl and @Bruno, I finally got a workable solution. Anybody who can give a more elegant approach is welcome to; there is certainly room.
Two classes were in play in all the snippets given in the original question. Below is partial code that embodies the working solution.
public class PdfExporter
//a custom class to build content
//create some content to add to pdf
String text = "<div>Lots of important content here</div>";
//assume the section is known to be deleted
if(section_deleted) {
//add a special marker to the text, in white-on-white
text = "<span style=\"color:white;\">.!^DEL^?.</span>" + text;
}
//based on some indicator, we know we want to force a new page for a new section
boolean ensureNewPage = true;
//use an instance of PdfDocHandler to add the content
pdfDocHandler.addText(text, ensureNewPage);
public class PdfDocHandler extends PdfPageEventHelper
private boolean isDEL;
public boolean addText(String text, boolean ensureNewPage)
if (ensureNewPage) {
//turn isDEL off: a forced pagebreak indicates a new section
isDEL = false;
}
//attempt to find the special DELETE marker in first chunk
//NOTE: this can be done several ways
ArrayList<Chunk> chks = elem.getChunks();
if(chks.size()>0) {
if(chks.get(0).getContent().contains(".!^DEL^?.")) {
//special delete marker found in text
isDEL = true;
}
}
//doc is an iText Document
doc.add(elem);
public void onEndPage(PdfWriter writer, Document document) {
if(isDEL) {
//set the watermark
Phrase watermark = new Phrase("DELETED", new Font(Font.FontFamily.HELVETICA, 60, Font.NORMAL, BaseColor.PINK));
PdfContentByte canvas = writer.getDirectContentUnder();
ColumnText.showTextAligned(canvas, Element.ALIGN_CENTER, watermark, 298, 421, 45);
}
}


Remove FixedLeading at the first line on each page

I want to remove the effect of setFixedLeading on the first line of each page (the document has 100+ pages).
I add a lot of text (more than 100 pages, generated in a loop). I set padding and margin to 0, but there is still a top indent. Why is that, and how can I remove it?
public static final String DEST = "PDF.pdf";
public static void main(String[] args) throws FileNotFoundException {
PdfDocument pdfDoc = new PdfDocument(new PdfWriter(DEST));
Document doc = new Document(pdfDoc);
doc.setMargins(0,0,0,0);
for (int i = 0; i <20 ; i++) {
Paragraph element = new Paragraph("p " + i);
element.setPadding(0);
element.setMargin(0);
element.setFixedLeading(55);
doc.add(element);
}
doc.close();
}
PDF file:
https://pdfhost.io/v/Byt9LHJcy_PDFpdf.pdf
At the time of element creation you don't know the page it will end up on nor its resultant position. I don't think there is a property that allows you to configure the behavior depending on whether it's the top element on a page (such property would be too custom and tied to a specific workflow).
Fortunately, the layout mechanism is quite flexible and you can implement the desired behavior in a couple of lines of code.
First off, let's not use setFixedLeading and set the top margin for all paragraphs instead:
Document doc = new Document(pdfDocument);
doc.setMargins(0, 0, 0, 0);
for (int i = 0; i < 20; i++) {
Paragraph element = new Paragraph("p " + i);
element.setPadding(0);
element.setMargin(0);
element.setMarginTop(50);
doc.add(element);
}
doc.close();
This changes pretty much nothing in the visual result; it's just another way of doing things.
Now we need a custom renderer to tweak the behavior of a paragraph if it is rendered at the top of the page. We are going to override the layout method and check whether the area we are given is located at the top of the page. If so, we will not apply the top margin:
private static class CustomParagraphRenderer extends ParagraphRenderer {
Document document;
public CustomParagraphRenderer(Paragraph modelElement, Document document) {
super(modelElement);
this.document = document;
}
@Override
public IRenderer getNextRenderer() {
return new ParagraphRenderer((Paragraph) modelElement);
}
@Override
public LayoutResult layout(LayoutContext layoutContext) {
if (layoutContext.getArea().getBBox().getTop() == document.getPdfDocument().getDefaultPageSize().getHeight()) {
((Paragraph)getModelElement()).setMarginTop(0);
}
return super.layout(layoutContext);
}
}
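A side note on the check in layout: it compares two float values with ==. That works here because the document margins are 0, so the first area on a page starts exactly at the page height. If rounding ever became an issue (an assumption on my part, not something observed in this answer), a tolerance-based comparison is a more defensive variant. The helper below is a JDK-only sketch; the name TopOfPageCheck and the EPSILON value are made up for illustration.

```java
// Hypothetical helper: decides whether a layout area starts at the top edge of the page.
class TopOfPageCheck {
    // Tolerance in PDF user units (points); the exact value is an arbitrary choice.
    static final float EPSILON = 1e-3f;

    // areaTop: top coordinate of the layout area; pageHeight: height of the page.
    static boolean isAtPageTop(float areaTop, float pageHeight) {
        return Math.abs(areaTop - pageHeight) < EPSILON;
    }
}
```

Inside the custom renderer, the condition would then be called with the same two values that are currently compared with ==.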
Now the only thing we need to do is to set the custom renderer instance to each paragraph in the loop:
element.setNextRenderer(new CustomParagraphRenderer(element, doc));
Visual result:

IText 7 How To Add Div or Paragraph in Header Without Overlapping Page Content?

I am facing the following problem, for which I haven't found a solution yet. I am implementing a platform for a medical laboratory. For every incident they want to write the report in the system and then generate and print it from the system. I am using iText 7 to accomplish this.
They have a very strange template. At the top of the first page they want to print a specific table, while at the top of every other page they want to print something else. So I need to know when the page changes in order to print the corresponding table at the top of the page.
After reading various sources, I ended up creating the first page normally and then adding a header event handler that checks the page number and runs on every page except page 1.
public class VariableHeaderEventHandler implements IEventHandler {
@Override
public void handleEvent(Event event) {
System.out.println("THIS IS ME: HEADER EVENT HANDLER STARTED.....");
PdfDocumentEvent documentEvent = (PdfDocumentEvent) event;
PdfDocument pdfDoc = documentEvent.getDocument();
PdfPage page = documentEvent.getPage();
Rectangle pageSize = page.getPageSize();
int pageNumber = pdfDoc.getPageNumber(page);
if (pageNumber == 1) return; //Do nothing in the first page...
System.out.println("Page size: " + pageSize.getHeight());
Rectangle rectangle = new Rectangle(pageSize.getLeft() + 30, pageSize.getHeight()-234, pageSize.getWidth() - 60, 200);
PdfCanvas pdfCanvas = new PdfCanvas(page.newContentStreamBefore(), page.getResources(), pdfDoc);
pdfCanvas.rectangle(rectangle);
pdfCanvas.setFontAndSize(FontsAndStyles.getRegularFont(), 10);
Canvas canvas = new Canvas(pdfCanvas, pdfDoc, rectangle);
Div header = new Div();
Paragraph paragraph = new Paragraph();
Text text = new Text("Διαγνωστικό Εργαστήριο Ιστοπαθολογίας και Μοριακής Παθολογοανατομικής").addStyle(FontsAndStyles.getBoldStyle());
paragraph.add(text);
paragraph.add(new Text("\n"));
text = new Text("Μοριακή Διάγνωση σε Συνεργασία με").addStyle(FontsAndStyles.getBoldStyle());
paragraph.add(text);
paragraph.add(new Text("\n"));
text = new Text("Γκούρβας Βίκτωρας, M.D., Ph.D.").addStyle(FontsAndStyles.getBoldStyle());
paragraph.add(text);
paragraph.add(new Text("\n"));
text = new Text("Τσιμισκή 33, Τ.Κ. 54624, ΘΕΣΣΑΛΟΝΙΚΗ").addStyle(FontsAndStyles.getNormalStyle());
paragraph.add(text);
paragraph.add(new Text("\n"));
text = new Text("Τήλ/Φάξ: 2311292924 Κιν.: 6932104909 e-mail: vgourvas@gmail.com").addStyle(FontsAndStyles.getNormalStyle());
paragraph.add(text);
header.add(paragraph);
// =============Horizontal Line BOLD============
SolidLine solidLine = new SolidLine((float) 1.5);
header.add(new LineSeparator(solidLine));
// ========Horizontal Line BOLD End==========
text = new Text("ΠΑΘΟΛΟΓΟΑΝΑΤΟΜΙΚΗ ΕΞΕΤΑΣΗ").addStyle(FontsAndStyles.getBoldStyle());
paragraph = new Paragraph().add(text);
header.add(paragraph);
header.setTextAlignment(TextAlignment.CENTER);
canvas.add(header);
canvas.close();
}
However, the problem I am facing now is that the header overlaps the content, and I can't figure out how to set different margins per page. For example, from page 2 onwards I would like a different topMargin.
Has anyone faced this problem before and found a working solution? Am I implementing this correctly? Is there a better way to accomplish the same result?
Thanks in advance,
Toutoudakis Michail
You should create your own custom document renderer and decrease the area which would be used to place content for each page except for the first one.
Please look at the snippet below, and at the updateCurrentArea method in particular.
class CustomDocumentRenderer extends DocumentRenderer {
public CustomDocumentRenderer(Document document) {
super(document);
}
@Override
public IRenderer getNextRenderer() {
return new CustomDocumentRenderer(this.document);
}
@Override
protected LayoutArea updateCurrentArea(LayoutResult overflowResult) {
LayoutArea area = super.updateCurrentArea(overflowResult);
if (currentPageNumber > 1) {
area.setBBox(area.getBBox().decreaseHeight(200));
}
return area;
}
}
Then just set the renderer on your document:
Document doc = new Document(pdfDoc);
doc.setRenderer(new CustomDocumentRenderer(doc));
The resultant pdf which I get for your document looks as follows:
There is another solution, however. Once you've added at least one element to your document, you can change the document's default margins. The change will be applied to all pages created afterwards (in your case, pages 2, 3, ...).
doc.add(new Paragraph("At least one element should be added. Otherwise the first page wouldn't be created and changing of the default margins would affect it."));
doc.setMargins(200, 36, 36, 36);
// now you can be sure that all the next pages would have new margins

Add Object onStartPage itextPdf except the last Page

I am adding a rectangle on top of my page for all pages but I do not want the rectangle on the last page. Here is my code:
@Override
public void onStartPage(PdfWriter writer, Document output) {
Font bold = new Font(Font.FontFamily.HELVETICA, 16, Font.BOLD);
bold.setStyle(Font.UNDERLINE);
bold.setColor(new BaseColor(171, 75, 15));
PdfContentByte cb = writer.getDirectContent();
// Bottom left coordinates x & y, followed by width, height and radius of corners.
cb.roundRectangle(100f, 1180f, 400f, 100f, 5f); // I don't want this on the last page
cb.stroke();
try {
output.add(new Paragraph("STATEMENT OF ACCOUNT", bold));
output.add(new Paragraph(new Phrase(new Chunk(" "))));
output.add(new Paragraph(new Phrase(new Chunk(" "))));
output.add(new Paragraph(new Phrase(new Chunk(" "))));
output.add(new Paragraph(new Phrase(new Chunk(" "))));
Image logo = Image.getInstance(imagepath);
logo.setAbsolutePosition(780, 1230);
logo.scaleAbsolute(200, 180);
writer.getDirectContent().addImage(logo);
} catch (Exception ex) {
System.out.println(ex.getMessage());
}
}
Is there a way to either skip or remove this rectangle from the last Page of the document?
First of all, the iText developers have often stressed that one MUST NOT add content to the PDF in onStartPage. The reason is that under certain circumstances unused pages are created and onStartPage is called for them, but they are then dropped. If you add content to them in onStartPage, though, they are not dropped but remain in your document.
Thus, always use onEndPage to add any content to a page.
In your use case there is yet another reason for using onEndPage: Usually it only becomes clear that a given page is the last page when the last bit of content has been added to the document. This usually occurs after onStartPage has been called for the page but before onEndPage has.
Thus, after you've added the last bit of regular page content to the document, you can simply set a flag in the page event listener that the current page is the final document page. Now the following onEndPage call knows it processes the final page and can add content differently.
So the page event listener would look like this
class MyPageEventListener extends PdfPageEventHelper {
public boolean lastPage = false;
@Override
public void onEndPage(PdfWriter writer, Document output) {
if (!lastPage) {
// add extra content for pages before the last one
} else {
// add extra content for the last page
}
}
...
}
and be used like this
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, TARGET);
MyPageEventListener pageEventListener = new MyPageEventListener();
writer.setPageEvent(pageEventListener);
document.open();
// add all regular content to the document
pageEventListener.lastPage = true;
document.close();

Render Type3 font character as image using PDFBox

In my project I need to parse a PDF file that contains some characters rendered with Type3 fonts. What I need to do is render such characters into a BufferedImage for further processing.
I'm not sure I'm looking in the right direction, but I'm trying to get the PDType3CharProc for such characters:
PDType3Font font = (PDType3Font)textPosition.getFont();
PDType3CharProc charProc = font.getCharProc(textPosition.getCharacterCodes()[0]);
and the input stream of this procedure contains the following data:
54 0 1 -1 50 43 d1
q
49 0 0 44 1.1 -1.1 cm
BI
/W 49
/H 44
/BPC 1
/IM true
ID
<some binary data here>
EI
Q
Unfortunately, I have no idea how to use this data to render the character into an image using PDFBox (or any other Java library).
Am I looking in the right direction, and what can I do with this data?
If not, are there other tools that can solve this problem?
Unfortunately PDFBox out-of-the-box does not provide a class to render contents of arbitrary XObjects (like the type 3 font char procs), at least as far as I can see.
But it does provide a class for rendering complete PDF pages; thus, to render a given type 3 font glyph, one can simply create a page containing only that glyph and render this temporary page!
Assuming, for example, the type 3 font is defined on the first page of a PDDocument document and has name F1, all its char procs can be rendered like this:
PDPage page = document.getPage(0);
PDResources pageResources = page.getResources();
COSName f1Name = COSName.getPDFName("F1");
PDType3Font fontF1 = (PDType3Font) pageResources.getFont(f1Name);
Map<String, Integer> f1NameToCode = fontF1.getEncoding().getNameToCodeMap();
COSDictionary charProcsDictionary = fontF1.getCharProcs();
for (COSName key : charProcsDictionary.keySet())
{
COSStream stream = (COSStream) charProcsDictionary.getDictionaryObject(key);
PDType3CharProc charProc = new PDType3CharProc(fontF1, stream);
PDRectangle bbox = charProc.getGlyphBBox();
if (bbox == null)
bbox = charProc.getBBox();
Integer code = f1NameToCode.get(key.getName());
if (code != null)
{
PDDocument charDocument = new PDDocument();
PDPage charPage = new PDPage(bbox);
charDocument.addPage(charPage);
charPage.setResources(pageResources);
PDPageContentStream charContentStream = new PDPageContentStream(charDocument, charPage);
charContentStream.beginText();
charContentStream.setFont(fontF1, bbox.getHeight());
charContentStream.getOutput().write(String.format("<%02X> Tj\n", code).getBytes());
charContentStream.endText();
charContentStream.close();
File result = new File(RESULT_FOLDER, String.format("4700198773-%s-%s.png", key.getName(), code));
PDFRenderer renderer = new PDFRenderer(charDocument);
BufferedImage image = renderer.renderImageWithDPI(0, 96);
ImageIO.write(image, "PNG", result);
charDocument.close();
}
}
(RenderType3Character.java test method testRender4700198773)
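One detail of the loop above deserves a note: the character code is written into the content stream as a PDF hexadecimal string. In such a string every byte is two hex digits, white space is ignored, and a missing final digit is read as 0, so codes below 0x10 need zero padding. The JDK-only sketch below shows the formatting in isolation; the class and method names are hypothetical, and it assumes single-byte codes, as used by simple fonts such as Type 3.

```java
// Hypothetical helper showing how a single-byte character code becomes a "Tj" operation.
class HexStringOperand {
    // Returns e.g. "<0A> Tj\n" for code 10. %02X pads with '0', while %2X would pad with a space.
    static String showTextOperator(int code) {
        if (code < 0 || code > 0xFF) {
            throw new IllegalArgumentException("expected a single-byte code, got " + code);
        }
        return String.format("<%02X> Tj\n", code);
    }
}
```

With a width-only format such as %2X, code 10 would be emitted as "< A>", which a PDF parser reads as the single byte 0xA0 rather than 0x0A.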
Considering the textPosition variable in the OP's code, he quite likely attempts this from a text extraction use case. Thus, he'll have to either pre-generate the bitmaps as above and simply look them up by name or adapt the code to match the available information in his use case (e.g. he might not have the original page at hand, only the font object; in that case he cannot copy the resources of the original page but instead may create a new resources object and add the font object to it).
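The "pre-generate and look up by name" option could be as simple as a map from glyph name to rendered bitmap. The following is only a sketch with hypothetical names (GlyphImageCache, store, lookup); it uses nothing but the JDK and leaves the actual rendering to code like the loop shown earlier.

```java
import java.awt.image.BufferedImage;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical cache: filled once while rendering all char procs, queried during text extraction.
class GlyphImageCache {
    private final Map<String, BufferedImage> imagesByGlyphName = new HashMap<>();

    // Remembers the bitmap rendered for one glyph name (the key of the char procs dictionary).
    void store(String glyphName, BufferedImage image) {
        imagesByGlyphName.put(glyphName, image);
    }

    // Returns the bitmap for a glyph name, or an empty Optional if none was rendered.
    Optional<BufferedImage> lookup(String glyphName) {
        return Optional.ofNullable(imagesByGlyphName.get(glyphName));
    }
}
```

In the rendering loop, store(key.getName(), image) would replace or complement the ImageIO.write call. During extraction the glyph name first has to be derived from the character code via the font's encoding, which is not shown here.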
Unfortunately the OP did not provide a sample PDF. Thus, for my test I used one from another Stack Overflow question: 4700198773.pdf from "extract text with custom font result non readble". There obviously might remain issues with the OP's own files.
I stumbled upon the same issue and I was able to render Type3 font by modifying PDFRenderer and the underlying PageDrawer:
class Type3PDFRenderer extends PDFRenderer
{
private PDFont font;
public Type3PDFRenderer(PDDocument document, PDFont font)
{
super(document);
this.font = font;
}
@Override
protected PageDrawer createPageDrawer(PageDrawerParameters parameters) throws IOException
{
FontType3PageDrawer pd = new FontType3PageDrawer(parameters, this.font);
pd.setAnnotationFilter(super.getAnnotationsFilter());//as done in the super class
return pd;
}
}
class FontType3PageDrawer extends PageDrawer
{
private PDFont font;
public FontType3PageDrawer(PageDrawerParameters parameters, PDFont font) throws IOException
{
super(parameters);
this.font = font;
}
@Override
public PDGraphicsState getGraphicsState()
{
PDGraphicsState gs = super.getGraphicsState();
gs.getTextState().setFont(this.font);
return gs;
}
}
Simply use Type3PDFRenderer instead of PDFRenderer. Of course, if you have multiple fonts, this needs some more modification to handle them.
Edit: tested with pdfbox 2.0.9

Reading a table or cell value in a pdf file using java?

I have gone through Java and PDF forums looking for a way to extract a text value from a table in a PDF file, but couldn't find any solution except JPedal (which is not open source and requires a license).
So I would like to know of any open-source APIs, such as PDFBox or iText, that achieve the same result as JPedal.
Ref. Example:
In comments the OP clarified that he locates the text value from the table in a pdf file he wants to extract
By providing X and Y co-ordinates
Thus, while the question initially sounded like generic extraction of tabular data from PDFs (which can be difficult at least), it actually is essentially about extracting the text from a rectangular region on a page given by coordinates.
This is possible using either of the libraries you mentioned (and surely others, too).
iText
To restrict the region from which you want to extract text, you can use the RegionTextRenderFilter in a FilteredTextRenderListener, e.g.:
/**
* Parses a specific area of a PDF to a plain text file.
* @param pdf the original PDF
* @param txt the resulting text
* @throws IOException
*/
public void parsePdf(String pdf, String txt) throws IOException {
PdfReader reader = new PdfReader(pdf);
PrintWriter out = new PrintWriter(new FileOutputStream(txt));
Rectangle rect = new Rectangle(70, 80, 490, 580);
RenderFilter filter = new RegionTextRenderFilter(rect);
TextExtractionStrategy strategy;
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filter);
out.println(PdfTextExtractor.getTextFromPage(reader, i, strategy));
}
out.flush();
out.close();
reader.close();
}
(ExtractPageContentArea sample from iText in Action, 2nd edition)
Beware, though, iText extracts text based on the basic text chunks in the content stream, not based on each individual glyph in such a chunk. Thus, the whole chunk is processed if only the tiniest part of it is in the area.
This may or may not suit you.
If you run into the problem that more is extracted than you wanted, you should split the chunks into their constituent glyphs beforehand. This Stack Overflow answer explains how to do that.
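To make the chunk-level behavior concrete, here is a simplified, JDK-only model of a region filter. It assumes the decision is made per chunk, by testing whether the chunk's baseline segment touches the rectangle. That is my reading of how such a filter behaves, not a copy of iText's code, and the names RegionFilterSketch and acceptsChunk are hypothetical.

```java
import java.awt.geom.Rectangle2D;

// Simplified model of a region filter that accepts or rejects whole text chunks.
class RegionFilterSketch {
    // A chunk is accepted as a whole if any part of its horizontal baseline
    // (from xStart to xEnd at height y) touches the region.
    static boolean acceptsChunk(Rectangle2D region, double xStart, double xEnd, double y) {
        return region.intersectsLine(xStart, y, xEnd, y);
    }

    // The rectangle (70, 80)-(490, 580) from the example above, given as x, y, width, height.
    static Rectangle2D exampleRegion() {
        return new Rectangle2D.Double(70, 80, 420, 500);
    }
}
```

A chunk whose baseline runs from x = 480 to x = 600 is accepted although most of it lies outside the region, which is exactly the over-extraction described above. A chunk starting at x = 500 is rejected.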
PDFBox
To restrict the region from which you want to extract text, you can use the PDFTextStripperByArea, e.g.:
PDDocument document = PDDocument.load( args[0] );
if( document.isEncrypted() )
{
document.decrypt( "" );
}
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.setSortByPosition( true );
Rectangle rect = new Rectangle( 10, 280, 275, 60 );
stripper.addRegion( "class1", rect );
List allPages = document.getDocumentCatalog().getAllPages();
PDPage firstPage = (PDPage)allPages.get( 0 );
stripper.extractRegions( firstPage );
System.out.println( "Text in the area:" + rect );
System.out.println( stripper.getTextForRegion( "class1" ) );
(ExtractTextByArea from the PDFBox 1.8.8 examples)
Try PDFTextStream. With it I am at least able to identify the column values. Earlier I was using iText and got stuck defining an extraction strategy, which is hard.
This API separates column cells by inserting extra spaces. The spacing is fixed, so you can build logic on it (this was missing in iText).
import com.snowtide.PDF;
import com.snowtide.pdf.Document;
import com.snowtide.pdf.OutputTarget;
public class PDFText {
public static void main(String[] args) throws java.io.IOException {
String pdfFilePath = "xyz.pdf";
Document pdf = PDF.open(pdfFilePath);
StringBuilder text = new StringBuilder(1024);
pdf.pipe(new OutputTarget(text));
pdf.close();
System.out.println(text);
}
}
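The "logic" on top of the extra spaces can be a plain split on runs of spaces. The sketch below assumes that cells in a row are separated by two or more spaces while words inside a cell are separated by exactly one. This should be checked against the real output, and the class name ColumnSplitter is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical post-processing step for one line of extracted text.
class ColumnSplitter {
    // Splits a line into cells wherever two or more consecutive spaces occur.
    static List<String> splitColumns(String line) {
        return Arrays.asList(line.trim().split(" {2,}"));
    }
}
```

For example, splitColumns("Name   Qty    Unit price") yields the three cells Name, Qty and Unit price.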
A related question has been asked on Stack Overflow.
