Java PDFBOX writing to pages beyond first page

I am writing a Java program that reads a CSV file and writes the information in a certain format into a PDF. I am using Java 12.0.2+10 and PDFBox-app-2.0.22 in Apache NetBeans 12.0. After much study I've learned the basics, and my program works well for one-page documents. The problem comes when my content exceeds the space on page 1. Then I need to create a new page, add it to the document, and create a new content stream for page 2. The name of the content stream must be different from the one on page 1, as that variable name has already been used. This requires me to duplicate the page 1 code that formats and displays the text, lines, etc., but with the new content stream name. If my content exceeds 2 pages, I have to create a third page and again duplicate all the code that formats and shows the text. This makes for very large code that is difficult to maintain. Isn't there a way to make the content stream name a variable, so that I can have one code block that writes to the different pages of my document?
PDDocument doc = new PDDocument(); // creating instance of PDFdoc
PDPage page1 = new PDPage(PDRectangle.LETTER);
doc.addPage(page1);
PDPageContentStream content = new PDPageContentStream(doc, page1);
for (int i = 0; i < lineCount; i++) {
    // code to format text; each line gets its own text object so the offset below is absolute
    content.beginText();
    content.setFont(fontBold, fontSize10);
    content.newLineAtOffset(leftMargin, pageHeight - marginTop - page1HeaderHeight - (pageLineCount * lineSpacing10));
    content.showText("text to display"); // display the text
    // etc...
    content.endText();
}
content.close(); // closes content stream for page 1 (after the loop, not inside it)
// page 2 code block
PDPage page2 = new PDPage(PDRectangle.LETTER); // set page size to 8.5 x 11.0" = US letter
doc.addPage(page2); // adding a page in PDF file
PDPageContentStream content2 = new PDPageContentStream(doc, page2, AppendMode.APPEND, true, true);
content2.beginText();
content2.setFont(fontBold, fontSize10);
content2.newLineAtOffset(leftMargin, pageHeight - marginTop - page1HeaderHeight - (pageLineCount * lineSpacing10));
content2.showText("text to display"); // display the text
// etc...
content2.endText();
content2.close(); // closes content stream for page 2
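The content stream variable does not actually need a new name per page: one PDPageContentStream variable can be closed and reassigned whenever a page fills up, so the formatting code exists only once. Below is a minimal sketch for PDFBox 2.0.x (the version in the question), assuming plain single-line text rows; the margins, line spacing, font, and the class name MultiPageWriter are hypothetical placeholders, not values from the question.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

public class MultiPageWriter {
    // Hypothetical layout values, chosen only for illustration.
    static final float LEFT_MARGIN = 50;
    static final float TOP_MARGIN = 50;
    static final float BOTTOM_MARGIN = 50;
    static final float LINE_SPACING = 12;
    static final float FONT_SIZE = 10;

    /** Writes every line, starting a new page whenever the current one is full. */
    static PDDocument writeLines(List<String> lines) throws IOException {
        PDDocument doc = new PDDocument();
        PDFont font = PDType1Font.HELVETICA_BOLD; // standard 14 font constant in PDFBox 2.0.x
        PDPageContentStream content = null;       // ONE variable, reassigned for each page
        float y = 0;                              // baseline of the line just written
        for (String line : lines) {
            if (content != null && y - LINE_SPACING >= BOTTOM_MARGIN) {
                // Room left on this page: move down one line (offset is relative).
                content.newLineAtOffset(0, -LINE_SPACING);
                y -= LINE_SPACING;
            } else {
                // First line, or the page is full: finish the old stream, start a new page.
                if (content != null) {
                    content.endText();
                    content.close();
                }
                PDPage page = new PDPage(PDRectangle.LETTER);
                doc.addPage(page);
                content = new PDPageContentStream(doc, page);
                content.beginText();
                content.setFont(font, FONT_SIZE);
                y = page.getMediaBox().getHeight() - TOP_MARGIN;
                content.newLineAtOffset(LEFT_MARGIN, y);
            }
            content.showText(line);
        }
        if (content != null) {
            content.endText();
            content.close();
        }
        return doc;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= 150; i++) {
            lines.add("Line " + i);
        }
        try (PDDocument doc = writeLines(lines)) {
            doc.save("multipage.pdf");
        }
    }
}
```

With these values a Letter page holds 58 lines, so the 150 lines in main end up on three pages. Your own CSV formatting would go where showText is called. If you also draw lines or rectangles, close the text object with endText first, because PDFBox 2.0 rejects path operations inside a text block.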

Related

Remove or Update added Image icon from pdf page using OpenPdf based on iText Core

I have added an icon as an Image object to a PDF page with OpenPDF, which is based on iText core. Here is my code:
// input stream from file
InputStream inputStream = new FileInputStream(file);
// we create a reader for a certain document
PdfReader reader = new PdfReader(inputStream);
// we create a stamper that will copy the document to a new file
PdfStamper stamp = new PdfStamper(reader, new FileOutputStream(file));
// adding content to each page
PdfContentByte over;
// get watermark icon
Image img = Image.getInstance(PublicFunction.getByteFromDrawable(context, R.drawable.ic_chat_lawone_new));
img.setAnnotation(new Annotation(0, 0, 0, 0, "https://github.com/LibrePDF/OpenPDF"));
img.setAbsolutePosition(pointF.x, pointF.y);
img.scaleAbsolute(50, 50);
// get page file number count
int pageNumbers = reader.getNumberOfPages();
if (pageNumbers < pageIndex) {
    // closing PdfStamper will generate the new PDF file
    stamp.close();
    throw new PDFException("page index is out of pdf file page numbers", new Throwable());
}
// image added to the target page
over = stamp.getOverContent(pageIndex);
if (over == null) {
    stamp.close();
    throw new PDFException("getOverContent is null", new Throwable());
}
over.addImage(img);
// closing PdfStamper will generate the new PDF file
stamp.close();
// close reader
reader.close();
Now I need to delete the added image object, or update its color, when the user clicks it. I have a click handler that returns a MotionEvent; what I need is a way to delete, update, or replace the added image object.
Any ideas?!

In your parallel OpenPDF issue 464 you additionally posted:
Here is my progress:
Now I can access the XObjects added to the PDF file, and I can remove them from the PDF page this way:
// input stream from file
InputStream inputStream = new FileInputStream(file);
// we create a reader for a certain document
PdfReader pdfReader = new PdfReader(inputStream);
// get page file number count
int pageNumbers = pdfReader.getNumberOfPages();
// we create a stamper that will copy the document to a new file
PdfStamper pdfStamper = new PdfStamper(pdfReader, new FileOutputStream(file));
// get page
PdfDictionary page = pdfReader.getPageN(currPage);
// get page resources
PdfDictionary resources = page.getAsDict(PdfName.RESOURCES);
// /Annots is an entry of the page dictionary, not of the resources
PdfArray annots = page.getAsArray(PdfName.ANNOTS);
PdfDictionary xobjects = resources.getAsDict(PdfName.XOBJECT);
// remove XObjects (iterate over a copy of the keys to avoid a ConcurrentModificationException)
if (xobjects != null) {
    for (PdfName key : new ArrayList<>(xobjects.getKeys())) {
        xobjects.remove(key);
    }
}
// remove annots
if (annots != null) {
    while (annots.size() > 0) {
        annots.remove(0);
    }
}
// close pdf stamper
pdfStamper.close();
// close pdf reader
pdfReader.close();
So the XObjects are removed from the screen, but there is still a problem!!!
When I remove them and try to add a new one, last deleted object appears and add into the pdf page! REALLY!!! :))
I think there must be another place these objects need to be removed from.
What happens here is that you indeed do remove the bitmap image resources from the page:
for (PdfName key : xobjects.getKeys()) {
    xobjects.remove(key);
}
but you don't remove the instructions for drawing these resources from the content stream. This has two consequences:
Strictly speaking, your PDF becomes invalid, as the content stream references a resource that is not defined in the page resources. Depending on the viewer in question, this might result in warning messages.
If the PDF is further processed and some new XObject is added to the same page with the same resource name, the original image drawing instruction now again has a resource to draw and makes it appear at the original position once more.
This explains your observation:
When I remove them and try to add a new one, last deleted object appears and add into the pdf page! REALLY!!! :))
I assume you used the same source image in your test, so it looked like the original image appeared again at the original position when actually the new image appeared there.
Thus, instead of merely removing the image XObject, you have the choice of either
also removing the XObject drawing instruction from the content stream or
replacing the image XObject by an invisible XObject instead.
The former option in general is non-trivial, in particular if your PDF-tool-to-be also allows other changes of the page content. In case of iText 5 or iText 7 I'd propose using the PdfContentStreamEditor or PdfCanvasEditor class, respectively, to find and remove Do operations from the page content streams, but I have no OpenPDF version of that class yet.
What you can do quite easily, though, is replacing the image resources by form XObjects without any content:
PdfTemplate pdfTemplate = PdfTemplate.createTemplate(pdfStamper.getWriter(), 1, 1);
// replace Xobjects
for (PdfName key : xobjects.getKeys()) {
    xobjects.put(key, pdfTemplate.getIndirectReference());
}
(RemoveImage test testRemoveImageAddedLikeHamidReza)
Beware, replacing all XObjects by empty XObjects has an obvious disadvantage, it replaces all XObjects, not merely the ones your tool created before! Thus, if the original PDFs processed by your tool also drew XObjects in their immediate content streams, those XObjects also are rendered invisible. If you don't want that, you need some specific criteria to recognize the image XObjects you added and only replace them.
Furthermore, there are other problems afoot: Each time you process the OverContent of a page in a PdfStamper, the pre-existing content of that page is wrapped into a q / Q (save-graphics-state / restore-graphics-state) envelope to prevent changes of the graphics state in that previous content bleed through and mix up your OverContent additions. Thus, if you manipulate a file many times in your tool, the original page content may be wrapped in a fairly deep nesting of such envelopes. Unfortunately PDF readers may support only a limited nesting depth, e.g. ISO 32000-1 mentions a maximum depth of 28 envelopes.
Thus, if you still have the chance to overhaul your design, you should consider putting the images into annotation appearances instead of into the page content. After all, you already do generate annotations, currently merely to transport a link, so you could also generate annotations with appearances.
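To put numbers on the nesting problem, the following pure-Java sketch simulates repeated stamping passes, assuming each pass adds exactly one q / Q envelope around the previous content, and measures the resulting nesting depth. The tokenizer is deliberately naive (it splits on whitespace and does not understand strings, inline images, or operators glued to delimiters), so it is not a real content stream parser; the class name and the sample content are hypothetical.

```java
public class QNesting {
    /**
     * Maximum q/Q nesting depth of a content stream.
     * Naive: splits on whitespace only, so it ignores strings,
     * inline images, and operators glued to delimiters.
     */
    static int maxDepth(String content) {
        int depth = 0;
        int max = 0;
        for (String token : content.trim().split("\\s+")) {
            if (token.equals("q")) {
                depth++;
                max = Math.max(max, depth);
            } else if (token.equals("Q")) {
                depth--;
            }
        }
        return max;
    }

    /** Simulates stamping passes that each wrap the existing content in one q ... Q envelope. */
    static String wrap(String content, int passes) {
        String result = content;
        for (int i = 0; i < passes; i++) {
            result = "q\n" + result + "\nQ";
        }
        return result;
    }

    public static void main(String[] args) {
        String original = "q 50 0 0 50 100 100 cm /img0 Do Q"; // hypothetical page content
        for (int passes : new int[] {1, 10, 28, 30}) {
            int depth = maxDepth(wrap(original, passes));
            System.out.println(passes + " passes -> max depth " + depth
                    + (depth > 28 ? " (beyond the 28 levels mentioned in ISO 32000-1)" : ""));
        }
    }
}
```

Because the sample content already has one q / Q level of its own, 28 stamping passes are enough to push it past the 28 levels mentioned above.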

How to define pdf size while splitting multi page pdf into single pdf [duplicate]

Here is how I split a large PDF (144 mb):
public int SplitAndSave(string inputPath, string outputPath)
{
    FileInfo file = new FileInfo(inputPath);
    string name = file.Name.Substring(0, file.Name.LastIndexOf("."));
    using (PdfReader reader = new PdfReader(inputPath))
    {
        for (int pagenumber = 1; pagenumber <= reader.NumberOfPages; pagenumber++)
        {
            string filename = pagenumber.ToString() + ".pdf";
            Document document = new Document();
            PdfCopy copy = new PdfCopy(document, new FileStream(outputPath + "\\" + filename, FileMode.Create));
            document.Open();
            copy.AddPage(copy.GetImportedPage(reader, pagenumber));
            document.Close();
        }
        return reader.NumberOfPages;
    }
}
For most PDFs (small ones, and I guess older formats), all works fine. But for bigger ones (which perhaps use something like refstreams for better compression), each split file opens as a single page, but its file size equals the original PDF's size. What can I do?

In case of your document Top_Gear_Magazine_2012_09.pdf the reason is indeed the one I mentioned: All pages refer to object 2 0 R as their /Resources, and the dictionary in 2 0 obj in turn references all images in the PDF.
To split that document into partial documents containing only the images required, you should preprocess the document by first finding out which images belong to which pages and then creating individual /Resources dictionaries for all pages.
As you already use iText in this context, you can also use it to find out which images are required for which pages. Use the iText parser package to initially parse the PDF page by page using a RenderListener implementation whose RenderImage method simply remembers which image objects are used on the current page. (As a special twist, iText hides the name of the image XObject in question; you get the indirect object, though, and can query its object and generation number which suffices for the next step.)
In a second step, you open the document in a PdfStamper and iterate over the pages. For each page you retrieve the /Resources dictionary and copy it, but only copy those XObject references that point to one of the image objects whose object and generation numbers you remembered for the respective page during the first step. Finally you set the diminished copy as the /Resources dictionary of the page in question.
The resulting PDF should split just fine.
PS: A very similar issue recently came up on the iText mailing list. In that thread the solution recipe given here has been improved: to get around the difficulties caused by iText hiding the XObject name, I would now propose intervening before the name is lost by using a different ContentOperator for "Do". Here is the Java version:
class Do implements ContentOperator
{
    public void invoke(PdfContentStreamProcessor processor, PdfLiteral operator, ArrayList<PdfObject> operands) throws IOException
    {
        PdfName xobjectName = (PdfName)operands.get(0);
        names.add(xobjectName);
    }
    final List<PdfName> names = new ArrayList<PdfName>();
}
This content operator simply collects the names of the used xobjects, i.e. the xobject resources to keep for the given page.
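The same two-step recipe can also be sketched with PDFBox 2.0.x instead of iText; this is a swap of library, not the code from the mailing-list thread. Like the Do operator above, it collects the operand names of all Do operators in each page's content stream, then gives each page its own shallow copy of /Resources whose /XObject dictionary keeps only those names. The class name ResourcePruner is hypothetical, and the sketch ignores XObjects that are only reachable through annotation appearances or other non-Do paths.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;

public class ResourcePruner {
    /** Names used as the operand of a "Do" operator in the page's own content stream. */
    static Set<COSName> usedXObjectNames(PDPage page) throws IOException {
        PDFStreamParser parser = new PDFStreamParser(page);
        parser.parse();
        List<Object> tokens = parser.getTokens();
        Set<COSName> names = new HashSet<>();
        for (int i = 1; i < tokens.size(); i++) {
            Object token = tokens.get(i);
            if (token instanceof Operator && "Do".equals(((Operator) token).getName())
                    && tokens.get(i - 1) instanceof COSName) {
                names.add((COSName) tokens.get(i - 1));
            }
        }
        return names;
    }

    /** Gives every page its own /Resources copy whose /XObject entry keeps only used names. */
    static void prune(PDDocument doc) throws IOException {
        for (PDPage page : doc.getPages()) {
            PDResources resources = page.getResources();
            if (resources == null) {
                continue;
            }
            Set<COSName> used = usedXObjectNames(page);
            COSDictionary shared = resources.getCOSObject();
            COSDictionary copy = new COSDictionary(shared); // shallow copy, other entries stay shared
            COSBase xobjects = shared.getDictionaryObject(COSName.XOBJECT);
            if (xobjects instanceof COSDictionary) {
                COSDictionary all = (COSDictionary) xobjects;
                COSDictionary kept = new COSDictionary();
                for (COSName name : all.keySet()) {
                    if (used.contains(name)) {
                        kept.setItem(name, all.getItem(name)); // getItem keeps indirect references
                    }
                }
                copy.setItem(COSName.XOBJECT, kept);
            }
            page.setResources(new PDResources(copy));
        }
    }
}
```

After prune, a page that used to share one big resource dictionary only references the XObjects it actually draws, so a subsequent per-page split should no longer drag every image along.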

Create a pdf with dimensions 1700pixels*2200pixels in java using pdfBox

I have an application which opens a PDF file with dimensions of 1700 pixels * 2200 pixels. From it, I get the dimensions of a rectangle drawn over the PDF.
When I try to create the same rectangle on a PDF using PDFBox, the page it creates has these dimensions:
System.out.println(page.getMediaBox().getHeight());
System.out.println(page.getMediaBox().getWidth());
results in :
612
792
How do I convert the PDF coordinates from 1700*2200 to 612*792?

Your output
612
792
of
System.out.println(page.getMediaBox().getHeight());
System.out.println(page.getMediaBox().getWidth());
seems to indicate that you created that PDPage using the default constructor, i.e. new PDPage(), as that constructor sets the page size to the US Letter page format.
If you want pages in a different format, you should use the constructor PDPage(PDRectangle), e.g.:
PDRectangle rec = new PDRectangle(1700, 2200);
PDDocument document = new PDDocument();
PDPage page = new PDPage(rec);
document.addPage(page);
This creates a PDF with a page whose size is 1700x2200 user space units, i.e. about 23.6"x30.6".
BTW, you talk about a PDF file with dimensions of 1700pixels*2200pixels, but PDFs don't know the unit 'pixel'. They know the user space unit, which defaults to 1/72" and therefore more or less corresponds to the unit point. In particular, this does not imply a resolution.
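If you want to keep the Letter-sized page and instead map the rectangle onto it, the conversion is a scale plus a y-axis flip. A minimal sketch, assuming the rectangle comes from the 1700x2200 image with its origin at the top-left corner and y growing downward (typical for image coordinates, but an assumption here), and that the target page is US Letter (612x792). Since 612/1700 = 792/2200 = 0.36, both axes scale by the same factor; if the image was rendered from a Letter page, that corresponds to 200 dpi, as 72/200 = 0.36. The class and method names are hypothetical.

```java
public class PixelToPdf {
    /**
     * Maps a rectangle from image pixels (origin top-left, y down)
     * to PDF user space (origin bottom-left, y up).
     * Returns {x, y, width, height} in the order PDPageContentStream.addRect expects.
     */
    static float[] toUserSpace(float px, float py, float pw, float ph,
                               float imageWidth, float imageHeight,
                               float pageWidth, float pageHeight) {
        float sx = pageWidth / imageWidth;
        float sy = pageHeight / imageHeight;
        float x = px * sx;
        float y = pageHeight - (py + ph) * sy; // bottom edge after flipping the y axis
        return new float[] { x, y, pw * sx, ph * sy };
    }

    public static void main(String[] args) {
        float[] r = toUserSpace(100, 200, 500, 300, 1700, 2200, 612, 792);
        System.out.printf("x=%.2f y=%.2f w=%.2f h=%.2f%n", r[0], r[1], r[2], r[3]);
    }
}
```

For the example rectangle (100, 200, 500, 300) the result is about (36, 612, 180, 108), which can be passed to addRect followed by stroke() on a content stream of the Letter page.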

PDFBox LayerUtility - Importing layers into existing PDF

I am using PDFBox to manipulate PDF content. I have a big PDF file (say 500 pages). I also have a few other single-page PDF files, each containing only a single image and around 8-15 KB in size at most. What I need to do is import these single-page PDFs as overlays onto certain pages of the big PDF file.
I have tried PDFBox's LayerUtility, and it works, but it creates a very large output file. The source PDF is about 1 MB before processing, and after adding the smaller PDF files, the size goes up to 64 MB. And sometimes I need to add two smaller PDFs onto the bigger one.
Is there a better way to do this or am I just doing this wrong? Posting code below trying to add two layers onto a single page:
...
...
..
overlayDoc[pCounter] = PDDocument.load("data\\" + overlay + ".pdf");
outputPage[pCounter] = (PDPage) overlayDoc[pCounter].getDocumentCatalog().getAllPages().get(0);
LayerUtility lu = new LayerUtility( overlayDoc[pCounter] );
form[pCounter] = lu.importPageAsForm( bigPDFDoc, Integer.parseInt(pageNo)-1);
lu.appendFormAsLayer( outputPage[pCounter], form[pCounter], aTrans, "OVERLAY_"+pCounter );
outputDoc.addPage(outputPage[pCounter]);
mOverlayDoc[pCounter] = PDDocument.load("data\\" + overlay2 + ".pdf");
mOutputPage[pCounter] = (PDPage) mOverlayDoc[pCounter].getDocumentCatalog().getAllPages().get(0);
LayerUtility lu2 = new LayerUtility( mOverlayDoc[pCounter] );
mForm[pCounter] = lu2.importPageAsForm(outputDoc, outputDoc.getNumberOfPages()-1);
lu2.appendFormAsLayer( mOutputPage[pCounter], mForm[pCounter], aTrans, "OVERLAY_2"+pCounter ); // lu2 targets mOverlayDoc, which owns mOutputPage
outputDoc.removePage(outputPage[pCounter]);
outputDoc.addPage(mOutputPage[pCounter]);
...
...

With code like the following I don't see any unexpected growth of size:
PDDocument bigDocument = PDDocument.load(BIG_SOURCE_FILE);
LayerUtility layerUtility = new LayerUtility(bigDocument);
List bigPages = bigDocument.getDocumentCatalog().getAllPages();
// import each page to superimpose only once
PDDocument firstSuperDocument = PDDocument.load(FIRST_SUPER_FILE);
PDXObjectForm firstForm = layerUtility.importPageAsForm(firstSuperDocument, 0);
PDDocument secondSuperDocument = PDDocument.load(SECOND_SUPER_FILE);
PDXObjectForm secondForm = layerUtility.importPageAsForm(secondSuperDocument, 0);
// These things can easily be done in a loop, too
AffineTransform affineTransform = new AffineTransform(); // Identity... your requirements may differ
layerUtility.appendFormAsLayer((PDPage) bigPages.get(0), firstForm, affineTransform, "Superimposed0");
layerUtility.appendFormAsLayer((PDPage) bigPages.get(1), secondForm, affineTransform, "Superimposed1");
layerUtility.appendFormAsLayer((PDPage) bigPages.get(2), firstForm, affineTransform, "Superimposed2");
bigDocument.save(BIG_TARGET_FILE);
As you see I superimposed the first page of FIRST_SUPER_FILE on two pages of the target file but I only imported the page once. Thus, also the resources of that imported page are imported only once.
This approach is open for loops, too, but don't import the same page multiple times! Instead import all required template pages once up front as forms and in the later loop reference those forms again and again.
(I hope this solves your issue. If not, supply more code and the sample PDFs to reproduce your issue.)
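The code above uses the PDFBox 1.8 API. In PDFBox 2.0.x, PDXObjectForm became PDFormXObject and getAllPages() was replaced by getPages() / getPage(int), so the same import-once idea might look like the following loop. The placement format ({targetPageIndex, overlayIndex} pairs, 0-based) and the class name are hypothetical; the overlay documents should stay open until the big document has been saved.

```java
import java.awt.geom.AffineTransform;
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.multipdf.LayerUtility;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;

public class OverlayOnce {
    /**
     * Superimposes overlay pages onto pages of bigDocument.
     * Each placement is a hypothetical {targetPageIndex, overlayIndex} pair.
     */
    static void superimpose(PDDocument bigDocument, List<PDDocument> overlays,
                            List<int[]> placements) throws IOException {
        LayerUtility layerUtility = new LayerUtility(bigDocument);
        // Import each overlay page exactly once, so its resources are copied only once.
        PDFormXObject[] forms = new PDFormXObject[overlays.size()];
        for (int i = 0; i < forms.length; i++) {
            forms[i] = layerUtility.importPageAsForm(overlays.get(i), 0);
        }
        AffineTransform identity = new AffineTransform(); // your requirements may differ
        int layerCount = 0;
        for (int[] placement : placements) {
            // Layer names must be unique in the document, hence the running counter.
            layerUtility.appendFormAsLayer(bigDocument.getPage(placement[0]),
                    forms[placement[1]], identity, "Superimposed" + layerCount++);
        }
    }
}
```

Two placements with the same target page index put two overlays onto one page, which covers the "two smaller PDFs on one page" case from the question while still importing each overlay only once.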

Java - apache PDFBox two A3 papers to one A2?

I've got a printer which doesn't support the feature I need.
The printer prints A2 paper size. I would like to print two A3 size pages, that would fit on a single A2 paper, but my printer doesn't support this.
I already called the company's support, but they told me I need to buy a newer printer because mine doesn't support this function. (It's very funny, because an even older version of that printer does support this function.)
So I tried to use the Apache PDFBox, where I can load my pdf file like this:
File pdfFile = new File(path);
PDDocument pdfDocument = load(pdfFile);
The file I loaded is size A3. I think it would be enough if I could get a new PDDocument with A2 paper size. Then put my loaded pdfFile twice in an A2 paper.
All in all, I need the file I loaded there two times on one page. I just don't know how to do that.
Best regards.

You might want to look at PageCombinationSample.java from PDF Clown (a different library than PDFBox), which according to its JavaDoc does nearly what you need:
This sample demonstrates how to combine multiple pages into single bigger pages (for example
two A4 modules into one A3 module) using form XObjects [PDF:1.6:4.9].
Form XObjects are a convenient way to represent contents multiple times on multiple pages as
templates.
The central code:
// 1. Opening the source PDF file...
File sourceFile = new File(filePath);
// 2. Instantiate the target PDF file!
File file = new File();
// 3. Source page combination into target file.
Document document = file.getDocument();
Pages pages = document.getPages();
int pageIndex = -1;
PrimitiveComposer composer = null;
Dimension2D targetPageSize = PageFormat.getSize(SizeEnum.A4);
for(Page sourcePage : sourceFile.getDocument().getPages())
{
    pageIndex++;
    int pageMod = pageIndex % 2;
    if(pageMod == 0)
    {
        if(composer != null)
        {composer.flush();}
        // Add a page to the target document!
        Page page = new Page(
            document,
            PageFormat.getSize(SizeEnum.A3, OrientationEnum.Landscape)
            ); // Instantiates the page inside the document context.
        pages.add(page); // Puts the page in the pages collection.
        // Create a composer for the target content stream!
        composer = new PrimitiveComposer(page);
    }
    // Add the form to the target page!
    composer.showXObject(
        sourcePage.toXObject(document), // Converts the source page into a form inside the target document.
        new Point2D.Double(targetPageSize.getWidth() * pageMod, 0),
        targetPageSize,
        XAlignmentEnum.Left,
        YAlignmentEnum.Top,
        0
        );
}
composer.flush();
// 4. Serialize the PDF file!
serialize(file, "Page combination", "combining multiple pages into single bigger ones", "page combination");
// 5. Closing the PDF file...
sourceFile.close();
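Since the question asks about PDFBox, here is a rough PDFBox 2.0.x counterpart of the same form XObject idea, assuming the source pages are A3 portrait with their media box at the origin. Each source page is imported once with LayerUtility.importPageAsForm and drawn twice, side by side, on an A2 landscape page; two 297 mm wide A3 pages exactly fill the 594 mm width of A2 landscape. The class name TwoUpA2 is hypothetical.

```java
import java.io.IOException;

import org.apache.pdfbox.multipdf.LayerUtility;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.util.Matrix;

public class TwoUpA2 {
    /** Places each page of an A3 portrait source twice, side by side, on an A2 landscape page. */
    static PDDocument twoUp(PDDocument source) throws IOException {
        PDDocument target = new PDDocument();
        LayerUtility layerUtility = new LayerUtility(target);
        // A2 landscape: swap width and height of the portrait A2 constant.
        PDRectangle a2Landscape = new PDRectangle(PDRectangle.A2.getHeight(), PDRectangle.A2.getWidth());
        float half = a2Landscape.getWidth() / 2; // equals the width of one A3 portrait page
        for (int i = 0; i < source.getNumberOfPages(); i++) {
            // Import the source page once as a form XObject, then draw it twice.
            PDFormXObject form = layerUtility.importPageAsForm(source, i);
            PDPage page = new PDPage(a2Landscape);
            target.addPage(page);
            try (PDPageContentStream cs = new PDPageContentStream(target, page)) {
                for (int copy = 0; copy < 2; copy++) {
                    cs.saveGraphicsState();
                    cs.transform(Matrix.getTranslateInstance(half * copy, 0));
                    cs.drawForm(form);
                    cs.restoreGraphicsState();
                }
            }
        }
        return target;
    }
}
```

Save the returned document with save(...) and close both documents afterwards. Because the form is imported once per source page, its resources are stored only once even though it is drawn twice.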
