How do I crop each page in pdf using PDFBOX in Java?

I want to remove the bottom part of each page in a PDF without changing the page size. What is the recommended way to do this in Java with PDFBox? How can I remove the footer from each page?
Is there a way to use a PDRectangle to delete all text and images within it?
Here is a snippet of what I tried. Using a rectangle with setCropBox seems to lose the page size; maybe the CropBox is not intended for this?
PDRectangle rectangle = new PDRectangle();
rectangle.setUpperRightY(mypage.findCropBox().getUpperRightY());
rectangle.setLowerLeftY(50);
rectangle.setUpperRightX(mypage.findCropBox().getUpperRightX());
rectangle.setLowerLeftX(mypage.findCropBox().getLowerLeftX());
mypage.setCropBox(rectangle);
croppedDoc.addPage(mypage);
croppedDoc.save(filename);
croppedDoc.close();
The closest example I could find in the PDFBox cookbook shows how to remove an entire page, but that is not what I'm looking for. I'd like to delete just a few elements from the page:
http://pdfbox.apache.org/userguide/cookbook.html

I'm also a newbie, but take a look at this page, in particular, the description of TrimBox. If there's no TrimBox on the page, it defaults to CropBox, which would cause what you're seeing.
In general, don't expect the PDFBox docs to tell you much of anything about PDF itself - to use PDFBox well I think you need to go elsewhere - AFAIK, mostly just to the PDF specification. I haven't even skimmed it yet, though!

The CropBox is the way to go if you want to remove a portion of a page while keeping a rectangular region visible. If you want the page size to remain the same, you need the MediaBox to remain the same.
From the PDF Spec:
CropBox - rectangle (Optional; inheritable) A rectangle, expressed in default user space units, defining the visible region of default user space. When the page is displayed or printed, its contents are to be clipped (cropped) to this rectangle and then imposed on the output medium in some implementation-defined manner (see Section 10.10.1, “Page Boundaries”). Default value: the value of MediaBox.
MediaBox - rectangle (Required; inheritable) A rectangle (see Section 3.8.4, “Rectangles”), expressed in default user space units, defining the boundaries of the physical medium on which the page is intended to be displayed or printed (see Section 10.10.1, “Page Boundaries”).
I have seen (faulty) applications and libraries that force the CropBox and the MediaBox to be the same; double-check that this is not what is happening in your case.
Also take into account that the coordinate origin (0,0) in PDF is the bottom-left corner. Some libraries translate to a top-left origin for you and others do not, so double-check this for the library you are using as well.
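To make the arithmetic concrete, here is a minimal sketch in plain Java rather than the PDFBox API. The Box class and the cropBottom helper are hypothetical names introduced only for illustration; in PDFBox the input would be the page's MediaBox and the result would be passed to setCropBox, as in the question's snippet. It assumes the bottom-left origin described above.

```java
// Hypothetical stand-in for a PDF rectangle; not part of PDFBox.
final class Box {
    final float llx, lly, urx, ury; // lower-left and upper-right corners

    Box(float llx, float lly, float urx, float ury) {
        this.llx = llx;
        this.lly = lly;
        this.urx = urx;
        this.ury = ury;
    }

    float width()  { return urx - llx; }
    float height() { return ury - lly; }
}

final class CropSketch {
    // Returns a crop box that hides footerHeight units at the bottom of the page.
    // The media box passed in is not modified.
    static Box cropBottom(Box mediaBox, float footerHeight) {
        if (footerHeight < 0 || footerHeight > mediaBox.height()) {
            throw new IllegalArgumentException("footerHeight out of range");
        }
        // The origin is bottom-left, so raising the lower-left y trims the bottom edge.
        return new Box(mediaBox.llx, mediaBox.lly + footerHeight, mediaBox.urx, mediaBox.ury);
    }

    public static void main(String[] args) {
        Box media = new Box(0, 0, 612, 792); // US Letter, in points
        Box crop = cropBottom(media, 50);
        System.out.println("MediaBox height: " + media.height());
        System.out.println("CropBox height:  " + crop.height());
    }
}
```

Note that a viewer clips the page to the CropBox, so the visible area shrinks even though the MediaBox is unchanged. That may be what looks like a lost page size in the question.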

Related

Get the extreme right, left, top, bottom position of an image - iText

I am setting a margin for a PDF and checking whether the contents of each page exceed the margin.
I can easily do that if the contents of a page are just text.
Here's what I am doing:
I am using TextMarginFinder. I set the left margin of the PDF based on the book size and compare it with finder.getLlx(), since finder.getLlx() returns the leftmost position of the text on that page.
TextMarginFinder finder;
if(leftmar>=finder.getLlx())
{
    errormargin=1; //left margin error
    System.out.println("Page: "+i+"Margin Error:LeftMArginError ");
}
But this does not work if the page contains an image. Even when the image extends beyond the margin, the code above reports no error, because finder.getLlx() seems to consider only text.
Two questions:
1) While looping through the pages of the PDF, how can I check whether a particular page contains an image?
2) If it contains an image, how can I obtain its extreme positions?
Update after mkl's suggestion
if(leftmar>=finder.getLlx())
{
    errormargin=1; //left margin error
    System.out.println("finder.getLlx() value ="+finder.getLlx()+", leftmar Value="+leftmar);
}
if(rightmar<= finder.getUrx()){
    errormargin=1; //right margin error
    System.out.println("finder.getUrx() value ="+finder.getUrx()+", rightmar Value="+rightmar);
}
if(margintop >= finder.getUry()){
    errormargin=3; //top margin error
    System.out.println("finder.getUry() value ="+finder.getUry()+", margintop Value="+margintop);
}
if(marginbottom >= finder.getLly()){
    errormargin=3; //bottom margin error
    System.out.println("finder.getLly() value ="+finder.getLly()+", marginbottom Value="+marginbottom);
}

This is more an answer to what the OP actually wanted: a way to retrieve the bounding box of all content on a page.
The OP already uses the iText TextMarginFinder render listener class to determine the bounding box of the text on a page. In the context of this answer, an analogous class, MarginFinder, has been developed which considers not only text but also other kinds of content, e.g. bitmap images and vector graphics.
Thus, replacing TextMarginFinder with MarginFinder makes it possible to find the bounding box of any content on the page.
Please be aware:
Any content is considered; the margin finder does not check whether the content makes a visible difference. E.g. white text, white bitmap areas, and white rectangles are all considered content, so the bounding box encompasses such invisible content, too. White rectangles in particular might be a problem here or there, as some software first paints a white rectangle over the whole page area.
Clipping paths are not considered. Thus, even content that is never drawn (because it is clipped away) makes the bounding box expand.
Page borders are not considered, either. Thus, off-page content like printer marks may make the bounding box expand even more.
The code calculating the bounding box for vector graphics is not exact: it simply returns the bounding box of all control points, which in the case of Bezier curves may be larger than the curve itself. It also ignores line widths and wedge types, which results in somewhat-off coordinates.
Annotations are not considered. Thus, the resulting bounding box may be too small if annotations are expected to be considered as well, e.g. for forms.
In spite of these shortcomings, the render listener usually returns correct results. If this is not enough, the class can be extended accordingly.
PS: Anyone who is interested in the original question may find answers in the MarginFinder render listener class and its use.
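As a rough illustration of the idea (not the MarginFinder class itself), the sketch below takes the union of a list of content rectangles and checks it against an allowed area. It assumes the rectangles for text chunks and images have already been collected by some render listener. ContentBoundsSketch, boundingBox and withinMargins are hypothetical names, and java.awt.geom.Rectangle2D is used only as a convenient rectangle type.

```java
import java.awt.geom.Rectangle2D;
import java.util.Arrays;
import java.util.List;

final class ContentBoundsSketch {
    // Union of all content rectangles (text chunks, images, paths) on one page.
    // Returns null when the page has no content.
    static Rectangle2D boundingBox(List<Rectangle2D> contentRects) {
        Rectangle2D bounds = null;
        for (Rectangle2D r : contentRects) {
            if (bounds == null) {
                bounds = (Rectangle2D) r.clone();
            } else {
                bounds.add(r); // grows bounds to the union of both rectangles
            }
        }
        return bounds;
    }

    // True when the content bounding box lies entirely inside the allowed area.
    static boolean withinMargins(Rectangle2D content, Rectangle2D allowed) {
        if (content == null) {
            return true; // an empty page cannot violate a margin
        }
        return content.getMinX() >= allowed.getMinX()
            && content.getMinY() >= allowed.getMinY()
            && content.getMaxX() <= allowed.getMaxX()
            && content.getMaxY() <= allowed.getMaxY();
    }

    public static void main(String[] args) {
        // A4 page (595 x 842 points) with a 36-point margin on every side.
        Rectangle2D allowed = new Rectangle2D.Double(36, 36, 523, 770);
        Rectangle2D text = new Rectangle2D.Double(72, 100, 400, 600);
        Rectangle2D image = new Rectangle2D.Double(10, 300, 200, 150); // sticks out on the left
        Rectangle2D all = boundingBox(Arrays.asList(text, image));
        System.out.println("Within margins: " + withinMargins(all, allowed));
    }
}
```

The caveats listed above still apply to any approach like this: invisible or clipped content enlarges the union just as visible content does.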

iTextPDF hyperlink not linking to the right place

I have a bunch of PDFs which I have merged by this point in the code. At the beginning of the merged PDF I have a contents page which links to each of those PDFs. These PdfAction.gotoLocalPage links sometimes don't work correctly and instead jump to a position between the bottom of one page and the top of the next; the PDF bookmark links, however, always work fine.
The code for the bookmark:
int pageToLinkTo=prevSectionPageCount+sectionPageCount+numberOfIndexPages+currentIndexPage+1;
document.put("Title", documentName);
document.put("Action", "GoTo");
document.put("Page",String.format("%d Fit", pageToLinkTo));
The code for the contents page link:
PdfAction action = PdfAction.gotoLocalPage(pageToLinkTo, new PdfDestination(PdfDestination.FIT,-1,-1,0), stamper.getWriter());
chunk.setAction(action);
Both of these evaluate to the same page. Could there be something wrong with the source PDF files? The only notable difference between the links that work and the links that jump to the wrong place is that the source PDFs have slightly different page sizes (0.1 of an inch different).
Any help would be appreciated!
Thanks

I see that you create your destination like this:
new PdfDestination(PdfDestination.FIT,-1,-1,0)
This is a strange way to create a destination so that the page is displayed to fit the viewer window. Please take a look at The ABC of PDF with iText. The book isn't finished yet, but it's free and in table 3.7, you can see which destinations take how many parameters.
If you want the page to fit the viewer window, you don't need any extra parameters:
new PdfDestination(PdfDestination.FIT)
There is a destination that takes three extra parameters:
new PdfDestination(PdfDestination.XYZ, x, y, z)
In this case x and y are coordinates and z is the zoom factor. I think that you are confusing the PDF viewer by adding x, y and z parameters when all you want is to fit the page in the viewer window.
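The parameter counts can be summarized in a small sketch. It does not use iText and says nothing about how iText's PdfDestination handles extra arguments. It only encodes the explicit destination types from the PDF specification and builds the corresponding destination array as text. DestinationSketch, Type and destination are hypothetical names, and the page reference "12 0 R" is a made-up example.

```java
final class DestinationSketch {
    // Explicit destination types from the PDF specification, with the number
    // of numeric operands that follow the type name in the destination array.
    enum Type {
        XYZ("XYZ", 3),     // left, top, zoom
        FIT("Fit", 0),
        FITH("FitH", 1),   // top
        FITV("FitV", 1),   // left
        FITR("FitR", 4),   // left, bottom, right, top
        FITB("FitB", 0),
        FITBH("FitBH", 1), // top
        FITBV("FitBV", 1); // left

        final String pdfName;
        final int operands;

        Type(String pdfName, int operands) {
            this.pdfName = pdfName;
            this.operands = operands;
        }
    }

    // Builds the textual form of a destination array, e.g. "[12 0 R /Fit]".
    // Rejects calls whose operand count does not match the destination type.
    static String destination(String pageRef, Type type, float... params) {
        if (params.length != type.operands) {
            throw new IllegalArgumentException(
                "/" + type.pdfName + " takes " + type.operands + " operand(s), got " + params.length);
        }
        StringBuilder sb = new StringBuilder("[").append(pageRef).append(" /").append(type.pdfName);
        for (float p : params) {
            sb.append(' ').append(p);
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(destination("12 0 R", Type.FIT));
        System.out.println(destination("12 0 R", Type.XYZ, 0f, 842f, 1f));
    }
}
```

In this sketch, passing three extra values together with Type.FIT is rejected, which mirrors the advice above: a Fit destination needs no coordinates or zoom.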

How to fill out horizontal PDF forms using the PDFBox

How does one fill out a horizontal pdf form with the PDFBox library?
I access my fields and fill them using the supplied example code, and it works fine. But if the PDF page is rotated to a horizontal (landscape) orientation, the filled-in text is still laid out in the vertical orientation.
I have tried rotating the page first and then filling the form but the fields seem to be independent. I have also tried formatting the field through the various set methods defined for PDField and PDTextbox but this has no effect either.
Finally, I know that some of the rotation properties are controlled through the PDAnnotation and PDAnnotationWidget but trying to set their PDAppearanceCharacteristics has no effect on the initial text rotation. Rather, a user is required to interact with the field in order for this to take effect.
Thanking in advance,
J3lly

global positioning an image with itext

Does anybody know whether iText has any special coordinates for positioning an image globally at the bottom right of the document?
I'm not sure it exists...

First we need to know if you're talking about a Document that is created from scratch or about adding an Image to an existing document.
If you create a document from scratch, then the coordinate of the bottom-right corner depends on the page size you've used when creating the Document object. For instance, for an A4 page the bottom-right corner has x = 595 ; y = 0 (measurements are in user units, which correspond to points by default). So if you want to position an image in the bottom-right corner, you need to use img.setAbsolutePosition(595 - img.getScaledWidth(), 0); and then just use document.add(img); to add the image.
DISCLAIMER: if you use a page size that differs from the default, or if you define a CropBox, you'll need to adapt the coordinates accordingly.
If you want to add an image to an existing document, you need to inspect the page size, and you need to check if there's a CropBox. You need to calculate the offset of the image based on these values. Again you can use setAbsolutePosition(), but you need to add the image to a PdfContentByte object obtained using the getOverContent() or getUnderContent() method (assuming that you're using PdfStamper).
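The offset calculation can be sketched in plain Java as follows. BottomRightSketch and bottomRight are hypothetical names, not iText API. The box values stand for whatever page size or CropBox you read from the document, and imageWidth stands for the value of img.getScaledWidth().

```java
final class BottomRightSketch {
    // Returns {x, y} for the lower-left corner of an image so that the image sits
    // in the bottom-right corner of the given box (page size or CropBox).
    // boxLeft and boxBottom are the coordinates of the box's lower-left corner.
    static float[] bottomRight(float boxLeft, float boxBottom, float boxWidth, float imageWidth) {
        float x = boxLeft + boxWidth - imageWidth; // right edge minus image width
        float y = boxBottom;                       // origin is bottom-left, so y is the bottom edge
        return new float[] { x, y };
    }

    public static void main(String[] args) {
        // A4 page created from scratch, 100-point-wide image.
        float[] a4 = bottomRight(0f, 0f, 595f, 100f);
        System.out.println("A4: x=" + a4[0] + ", y=" + a4[1]);

        // Existing page whose CropBox starts at (20, 30) and is 555 points wide.
        float[] cropped = bottomRight(20f, 30f, 555f, 100f);
        System.out.println("CropBox: x=" + cropped[0] + ", y=" + cropped[1]);
    }
}
```

The resulting pair would then be passed to setAbsolutePosition() before adding the image.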

iText PDF colors are inconsistent in Acrobat

I'm generating a multipage PDF from Java using iText. Problem: the lines on my charts shift color between certain pages.
Here's a screenshot of the transition between pages:
This was taken from Adobe Reader. The lines are the correct color in OS X Preview.app.
In Reader the top is #73C352, the bottom is #35FF69. In Preview.app the line is #00FE7E.
Any thoughts on what could be causing this discrepancy? I saved the PDF from Preview.app and opened it in Adobe Reader, and the colors are still off.
Here is the PDF that is having trouble. Open it in Adobe Reader and look at the transition between pages 11 & 12.
On checking this out further, it appears that the java.awt.print.PrinterJob is calling print() for each pageIndex twice. This might be a clue.

The problem with the pages with darker colors is that they include a pattern object with a transparent image. When transparency is involved, Adobe Acrobat switches automatically to a custom CMYK profile, and this causes the darker colors. Only Acrobat does this; other viewers behave just fine. The solution is either to remove the pattern object with the transparent image (it seems to be a drawing artifact of the PDF generator engine and is not used anywhere on the page) or to make the page part of a transparency group and specify that the transparency group uses an RGB colorspace.

Several different possibilities, yes.
Different color matching. If you're using a "calibrated" color space on one page and a "device" color space on another, the same RGB/CMYK values can produce visually different values.
If the graph is inside a Form XObject, the same graph can appear differently depending on the current graphic state when the form is drawn.
If you could post a link to your PDF, I could probably give you a specific answer.
Ouch. That PDF is painful to schlep through. I'd like to have some words with whoever wrote their PDF converter. Harsh ones. Lots of unnecessary clipping ("text" is being clipped hither and yon, page 7 for example), poor use of patterns for images, but not using patterns when it would actually help, drawing text as paths, and on and on...
EDIT: Which is precisely the sort of stuff you see when rendering Java UI via a PdfGraphics2D object. You CAN keep the text as text though. It's just a matter of how you create the PdfGraphics2D instance.
Okay, so the color of the line itself is identical. 0 1 0.4 RG. HOWEVER, there is some "transparency stuff" going on.
On pages that have images with soft masks or extended graphic states that change the transparency, the green line appears darker. On pages without, it appears brighter.
I suspect that all those other PDF viewers that draw the lines consistently don't support transparency at all, or only poorly.
