Java - Apache PDFBox: two A3 pages onto one A2 page?

I've got a printer which doesn't support a feature I need.
The printer prints on A2 paper. I would like to print two A3 pages side by side on a single A2 sheet, but my printer doesn't support this.
I already called the company's support, but they told me I need to buy a newer printer because mine doesn't support this function. (It's very funny, because an even older version of that printer does support it.)
So I tried Apache PDFBox, where I can load my PDF file like this:
File pdfFile = new File(path);
PDDocument pdfDocument = PDDocument.load(pdfFile);
The file I loaded is A3. I think it would be enough to create a new PDDocument with an A2 page and then put my loaded page onto that A2 page twice.
All in all, I need the loaded file twice on one page. I just don't know how to do that.
Best regards.

You might want to look at the PDF Clown sample PageCombinationSample.java, which according to its JavaDoc does nearly what you need:
This sample demonstrates how to combine multiple pages into single bigger pages (for example
two A4 modules into one A3 module) using form XObjects [PDF:1.6:4.9].
Form XObjects are a convenient way to represent contents multiple times on multiple pages as
templates.
The central code:
// 1. Opening the source PDF file...
File sourceFile = new File(filePath);
// 2. Instantiate the target PDF file!
File file = new File();
// 3. Source page combination into target file.
Document document = file.getDocument();
Pages pages = document.getPages();
int pageIndex = -1;
PrimitiveComposer composer = null;
Dimension2D targetPageSize = PageFormat.getSize(SizeEnum.A4);
for(Page sourcePage : sourceFile.getDocument().getPages())
{
  pageIndex++;
  int pageMod = pageIndex % 2;
  if(pageMod == 0)
  {
    if(composer != null)
    {composer.flush();}
    // Add a page to the target document!
    Page page = new Page(
      document,
      PageFormat.getSize(SizeEnum.A3, OrientationEnum.Landscape)
      ); // Instantiates the page inside the document context.
    pages.add(page); // Puts the page in the pages collection.
    // Create a composer for the target content stream!
    composer = new PrimitiveComposer(page);
  }
  // Add the form to the target page!
  composer.showXObject(
    sourcePage.toXObject(document), // Converts the source page into a form inside the target document.
    new Point2D.Double(targetPageSize.getWidth() * pageMod, 0),
    targetPageSize,
    XAlignmentEnum.Left,
    YAlignmentEnum.Top,
    0
    );
}
composer.flush();
// 4. Serialize the PDF file!
serialize(file, "Page combination", "combining multiple pages into single bigger ones", "page combination");
// 5. Closing the PDF file...
sourceFile.close();
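The sample above combines two A4 pages onto an A3 sheet. For the A3 to A2 case the arithmetic is the same one ISO 216 size up: two portrait A3 pages (297 × 420 mm) side by side make exactly one landscape A2 sheet (594 × 420 mm, about 1683.8 × 1190.6 pt). The stdlib-only sketch below computes that sheet size in PDF points, which sheet and slot a source page lands in (the same `pageIndex % 2` idea as in the sample), and the lower-left offset of each slot. The class and method names are hypothetical. With PDFBox 2.0.x one would presumably import each source page once via `LayerUtility.importPageAsForm`, and on a new `PDPage` of this size translate by the offset (`PDPageContentStream.transform` with `Matrix.getTranslateInstance`) before calling `drawForm`; that wiring is not shown here.

```java
/** Hypothetical helper: 2-up placement of equal-sized portrait pages on one landscape sheet. */
final class TwoUpLayout {
    // PDF user space: 1 unit = 1/72 inch, and 1 inch = 25.4 mm
    static final double POINTS_PER_MM = 72.0 / 25.4;

    /** Width/height pair in PDF points. */
    static final class Size {
        final double width;
        final double height;

        Size(double width, double height) {
            this.width = width;
            this.height = height;
        }
    }

    /** Converts a size given in millimetres to PDF points. */
    static Size fromMm(double widthMm, double heightMm) {
        return new Size(widthMm * POINTS_PER_MM, heightMm * POINTS_PER_MM);
    }

    // ISO 216 portrait sizes
    static final Size A3_PORTRAIT = fromMm(297, 420);
    static final Size A2_PORTRAIT = fromMm(420, 594);

    /** Target sheet: two source pages wide, one source page high. */
    static Size sheetFor(Size page) {
        return new Size(page.width * 2, page.height);
    }

    /** 0-based sheet a 0-based source page is placed on. */
    static int sheetIndex(int sourceIndex) {
        return sourceIndex / 2;
    }

    /** Slot on that sheet: 0 = left half, 1 = right half. */
    static int slot(int sourceIndex) {
        return sourceIndex % 2;
    }

    /** Lower-left {x, y} of a slot, assuming a bottom-left origin as in PDFBox. */
    static double[] offset(Size page, int slot) {
        return new double[] { page.width * slot, 0 };
    }
}
```

To print the same A3 page twice, as asked in the question, the same imported form would simply be drawn at slot 0 and at slot 1.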

Related

Java PDFBox: writing to pages beyond the first page

I am writing a Java program that reads a CSV file and then writes the information in a certain format into a PDF. I am using Java 12.0.2+10 and PDFBox-app-2.0.22 in Apache NetBeans 12.0. After much study I've learned the basics, and my program works well for single-page documents. The problem comes when my content exceeds the space on page 1. Then I need to create a new page, add it to the document, and create a new content stream for page 2. The name of the content stream must be different from the one for page 1, as that variable name has already been used. This requires me to duplicate the code used for page 1 that formats and displays the text, lines, etc., just with the new content stream name. If my content exceeds 2 pages, I have to create a third page and again duplicate all the code that formats and shows the text. This makes for very large and hard-to-maintain code. Isn't there a way to make the content stream name a variable, so that I can have one code block that writes to the different pages of my document?
PDDocument doc = new PDDocument(); // creating instance of PDF doc
PDPage page1 = new PDPage(PDRectangle.LETTER);
doc.addPage(page1);
PDPageContentStream content = new PDPageContentStream(doc, page1);
content.beginText();
for (int i = 0; i < lineCount; i++) {
    // code to format text
    content.setFont(fontBold, fontSize10);
    content.newLineAtOffset(leftMargin, pageHeight - marginTop - page1HeaderHeight - (pageLineCount * lineSpacing10));
    content.showText("text to display"); // display the text
    // etc...
}
content.endText();
content.close(); // closes content stream for page 1
// page 2 code block
PDPage page2 = new PDPage(PDRectangle.LETTER); // set page size to 8.5 x 11.0" = US letter
doc.addPage(page2); // adding a page in PDF file
PDPageContentStream content2 = new PDPageContentStream(doc, page2, AppendMode.APPEND, true, true);
content2.beginText();
content2.setFont(fontBold, fontSize10);
content2.newLineAtOffset(leftMargin, pageHeight - marginTop - page1HeaderHeight - (pageLineCount * lineSpacing10));
content2.showText("text to display"); // display the text
// etc...
content2.endText();
content2.close(); // closes content stream for page 2
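Java variable names are not tied to pages: the same `content` variable can be closed and then reassigned to a new `PDPageContentStream` for the next page, so one loop can serve any number of pages. The stdlib-only sketch below models just the paging decision, with pages represented as lists of strings; the class, its fields, and the "start a new page once the baseline drops below the bottom margin" rule are hypothetical, and the comments mark where the PDFBox 2.0 calls from the question would go.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one text loop that spills onto as many pages as needed. */
final class Paginator {
    final double pageHeight, marginTop, marginBottom, headerHeight, lineSpacing;

    Paginator(double pageHeight, double marginTop, double marginBottom,
              double headerHeight, double lineSpacing) {
        this.pageHeight = pageHeight;
        this.marginTop = marginTop;
        this.marginBottom = marginBottom;
        this.headerHeight = headerHeight;
        this.lineSpacing = lineSpacing;
    }

    /** Baseline y of the n-th line on a page, mirroring the question's formula. */
    double baselineY(int lineOnPage) {
        return pageHeight - marginTop - headerHeight - lineOnPage * lineSpacing;
    }

    List<List<String>> paginate(List<String> lines) {
        List<List<String>> pages = new ArrayList<>();
        List<String> current = null; // stands in for the single 'content' variable
        int lineOnPage = 0;
        for (String line : lines) {
            if (current == null || baselineY(lineOnPage) < marginBottom) {
                // PDFBox: if (content != null) { content.endText(); content.close(); }
                //         PDPage page = new PDPage(PDRectangle.LETTER); doc.addPage(page);
                //         content = new PDPageContentStream(doc, page); content.beginText();
                current = new ArrayList<>();
                pages.add(current);
                lineOnPage = 0;
            }
            // PDFBox: e.g. content.setTextMatrix(Matrix.getTranslateInstance(leftMargin, y));
            //         content.showText(line);
            current.add(line);
            lineOnPage++;
        }
        // PDFBox: after the loop, end and close the last content stream
        return pages;
    }
}
```

With a US Letter page (792 pt high), 72 pt top and bottom margins, no header and 12 pt line spacing, 55 lines fit per page, so 120 lines end up on three pages. Moving the formatting code into a method that takes the current `PDPageContentStream` as a parameter removes the duplication in the same way.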

Remove or Update added Image icon from pdf page using OpenPdf based on iText Core

I have added an icon as an Image object to a PDF page with OpenPDF, which is based on iText core. Here is my code:
// input stream from file
InputStream inputStream = new FileInputStream(file);
// we create a reader for a certain document
PdfReader reader = new PdfReader(inputStream);
// we create a stamper that will copy the document to a new file
PdfStamper stamp = new PdfStamper(reader, new FileOutputStream(file));
// adding content to each page
PdfContentByte over;
// get watermark icon
Image img = Image.getInstance(PublicFunction.getByteFromDrawable(context, R.drawable.ic_chat_lawone_new));
img.setAnnotation(new Annotation(0, 0, 0, 0, "https://github.com/LibrePDF/OpenPDF"));
img.setAbsolutePosition(pointF.x, pointF.y);
img.scaleAbsolute(50, 50);
// get page file number count
int pageNumbers = reader.getNumberOfPages();
if (pageNumbers < pageIndex) {
// closing PdfStamper will generate the new PDF file
stamp.close();
throw new PDFException("page index is out of pdf file page numbers", new Throwable());
}
// annotation added into target page
over = stamp.getOverContent(pageIndex);
if (over == null) {
stamp.close();
throw new PDFException("getUnderContent is null", new Throwable());
}
over.addImage(img);
// closing PdfStamper will generate the new PDF file
stamp.close();
// close reader
reader.close();
Now I need to delete the added image object, or update its color, when the user clicks it. I have the click handler that returns a MotionEvent; now I need to delete, update or replace the added image object.
Any idea?
In your parallel OpenPDF issue 464 you posted additionally:
Here is my progress.
Now I can access the XObjects added to the PDF file, and I can remove them from the PDF page this way:
// input stream from file
InputStream inputStream = new FileInputStream(file);
// we create a reader for a certain document
PdfReader pdfReader = new PdfReader(inputStream);
// get page file number count
int pageNumbers = pdfReader.getNumberOfPages();
// we create a stamper that will copy the document to a new file
PdfStamper pdfStamper = new PdfStamper(pdfReader, new FileOutputStream(file));
// get page
PdfDictionary page = pdfReader.getPageN(currPage);
PdfDictionary resources = page.getAsDict(PdfName.RESOURCES);
// get page annotations (/Annots is an entry of the page dictionary, not of its /Resources)
PdfArray annots = page.getAsArray(PdfName.ANNOTS);
PdfDictionary xobjects = resources.getAsDict(PdfName.XOBJECT);
// remove Xobjects
for (PdfName key : xobjects.getKeys()) {
xobjects.remove(key);
}
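// remove annots, always taking the first element until the array is empty
// (avoids removing from a list while a for-each loop iterates over it)
while (annots != null && annots.size() > 0) {
    annots.remove(0);
}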
// close pdf stamper
pdfStamper.close();
// close pdf reader
pdfReader.close();
So the XObjects are removed from the screen, but there is still a problem!
When I remove them and try to add a new one, last deleted object appears and add into the pdf page! REALLY!!! :))
I think there must be another place these objects have to be removed from.
What happens here is that you indeed do remove the bitmap image resources from the page:
for (PdfName key : xobjects.getKeys()) {
xobjects.remove(key);
}
but you don't remove the instructions for drawing these resources from the content stream. This has two consequences:
Your PDF strictly speaking becomes invalid as a resource is referenced from the content stream which is not defined in the page resources. Depending on the viewer in question this might result in warning messages.
If the PDF is further processed and some new XObject is added to the same page with the same resource name, the original image drawing instruction now again has a resource to draw and makes it appear at the original position once more.
This explains your observation:
When I remove them and try to add a new one, last deleted object appears and add into the pdf page! REALLY!!! :))
I assume you used the same source image in your test, so it looked like the original image appeared again at the original position when actually the new image appeared there.
Thus, instead of merely removing the image XObject, you have the choice of either
also removing the XObject drawing instruction from the content stream or
replacing the image XObject by an invisible XObject instead.
The former option is in general non-trivial, in particular if your PDF tool-to-be also allows other changes of the page content. For iText 5 or iText 7 I'd propose using the PdfContentStreamEditor or PdfCanvasEditor class, respectively, to find and remove Do operations from the page content streams, but I have no OpenPDF version of that class yet.
What you can do quite easily, though, is replacing the image resources by form XObjects without any content:
PdfTemplate pdfTemplate = PdfTemplate.createTemplate(pdfStamper.getWriter(), 1, 1);
// replace Xobjects
for (PdfName key : xobjects.getKeys()) {
xobjects.put(key, pdfTemplate.getIndirectReference());
}
(RemoveImage test testRemoveImageAddedLikeHamidReza)
Beware: replacing all XObjects with empty XObjects has an obvious disadvantage, it replaces all XObjects, not merely the ones your tool created before! Thus, if the original PDFs processed by your tool also drew XObjects in their immediate content streams, those XObjects are rendered invisible, too. If you don't want that, you need some specific criteria to recognize the image XObjects you added and replace only those.
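The behaviour described above can be reproduced without any PDF library. The sketch below is a toy model, not OpenPDF code: a page is a list of `Do` operations plus a map from resource names to XObjects, and all names (including `EMPTY_FORM` and the set of "own" names) are hypothetical. It shows that deleting only the resource leaves the `Do` behind, that re-adding a resource under the same name revives the old drawing instruction, and that swapping only the entries your tool recorded for an empty form keeps every `Do` resolvable while hiding just those images.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of a page: content-stream "Do" operations plus an XObject resource map. */
final class ToyPage {
    static final String EMPTY_FORM = "<empty form>";

    final List<String> doOperations = new ArrayList<>();  // "/Name Do" in drawing order
    final Map<String, String> xobjects = new HashMap<>(); // resource name -> XObject

    /** Naive stamping: append a Do and (re)define the resource under that name. */
    void addImage(String name, String image) {
        doOperations.add(name);
        xobjects.put(name, image);
    }

    /** Names of visibly drawn XObjects; a Do without a resource draws nothing here. */
    List<String> render() {
        List<String> drawn = new ArrayList<>();
        for (String name : doOperations) {
            String xobject = xobjects.get(name);
            if (xobject != null && !EMPTY_FORM.equals(xobject)) {
                drawn.add(name);
            }
        }
        return drawn;
    }

    /** Replace only the XObjects this tool recorded as its own by an empty form. */
    void hideOwnImages(Set<String> ownNames) {
        for (String name : ownNames) {
            if (xobjects.containsKey(name)) {
                xobjects.put(name, EMPTY_FORM);
            }
        }
    }
}
```

In the model, removing `Im1` and then adding a new image under the name `Im1` makes two drawings appear, one of them from the stale instruction. How a real tool would remember which names it added is left open by the answer.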
Furthermore, there are other problems afoot: each time you process the OverContent of a page in a PdfStamper, the pre-existing content of that page is wrapped in a q / Q (save-graphics-state / restore-graphics-state) envelope to prevent changes of the graphics state in that previous content from bleeding through and mixing up your OverContent additions. Thus, if you manipulate a file many times in your tool, the original page content may end up wrapped in a fairly deep nesting of such envelopes. Unfortunately, PDF readers may support only a limited nesting depth; e.g. ISO 32000-1 mentions a maximum depth of 28 envelopes.
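A rough way to watch that nesting grow is to scan the content stream for `q` and `Q` operators. The stdlib-only sketch below does that with naive whitespace tokenizing (operators hidden inside string literals or inline image data would confuse it) and models one OverContent pass simply as "wrap the old content in q/Q, then append the addition". The method names are hypothetical and the wrapping is a simplification of what PdfStamper actually writes.

```java
/** Hypothetical helpers for watching q/Q nesting grow across stamping passes. */
final class QNesting {
    /** Deepest q/Q nesting, using naive whitespace tokenizing. */
    static int maxDepth(String content) {
        int depth = 0;
        int max = 0;
        for (String token : content.trim().split("\\s+")) {
            if (token.equals("q")) {
                depth++;
                max = Math.max(max, depth);
            } else if (token.equals("Q")) {
                depth--;
            }
        }
        return max;
    }

    /** Simplified OverContent pass: wrap the old content in q/Q, then append the addition. */
    static String stampOnce(String content, String addition) {
        return "q " + content + " Q " + addition;
    }
}
```

In this model each pass adds one level, so three edits of the same page give a depth of 3; comparing the result against 28 would give an early warning before readers with that limit run into trouble.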
Thus, if you still have the chance to overhaul your design, you should consider putting the images into annotation appearances instead of into the page content. After all, you already do generate annotations, currently merely to transport a link, so you could also generate annotations with appearances.

How to define pdf size while splitting multi page pdf into single pdf [duplicate]

Here is how I split a large PDF (144 mb):
public int SplitAndSave(string inputPath, string outputPath)
{
FileInfo file = new FileInfo(inputPath);
string name = file.Name.Substring(0, file.Name.LastIndexOf("."));
using (PdfReader reader = new PdfReader(inputPath))
{
for (int pagenumber = 1; pagenumber <= reader.NumberOfPages; pagenumber++)
{
string filename = pagenumber.ToString() + ".pdf";
Document document = new Document();
PdfCopy copy = new PdfCopy(document, new FileStream(outputPath + "\\" + filename, FileMode.Create));
document.Open();
copy.AddPage(copy.GetImportedPage(reader, pagenumber));
document.Close();
}
return reader.NumberOfPages;
}
}
For most PDFs (small ones, and I guess in an older format) everything works fine. But for bigger ones (which perhaps use something like refstreams for better compression), each split file correctly opens as a single page, but its file size equals that of the original PDF. What can I do?
In case of your document Top_Gear_Magazine_2012_09.pdf the reason is indeed the one I mentioned: All pages refer to object 2 0 R as their /Resources, and the dictionary in 2 0 obj in turn references all images in the PDF.
To split that document into partial documents containing only the images required, you should preprocess the document by first finding out which images belong to which pages and then creating individual /Resources dictionaries for all pages.
As you already use iText in this context, you can also use it to find out which images are required for which pages. Use the iText parser package to initially parse the PDF page by page using a RenderListener implementation whose RenderImage method simply remembers which image objects are used on the current page. (As a special twist, iText hides the name of the image XObject in question; you get the indirect object, though, and can query its object and generation number which suffices for the next step.)
In a second step, you open the document in a PdfStamper and iterate over the pages. For each page you retrieve the /Resources dictionary and copy it, but only copy those XObject references that reference one of the image objects whose object and generation numbers you remembered for the respective page during the first step. Finally, you set the diminished copy as the /Resources dictionary of the page in question.
The resulting PDF should split just fine.
PS: A very similar issue recently came up on the iText mailing list. In that thread the solution recipe given here has been improved: to get around the difficulties caused by iText hiding the XObject name, I would now propose to intervene before the name is lost by using a different ContentOperator for "Do". Here is the Java version:
class Do implements ContentOperator
{
    public void invoke(PdfContentStreamProcessor processor, PdfLiteral operator, ArrayList<PdfObject> operands) throws IOException
    {
        // the single operand of "Do" is the resource name of the XObject to draw
        PdfName xobjectName = (PdfName)operands.get(0);
        names.add(xobjectName);
    }

    final List<PdfName> names = new ArrayList<PdfName>();
}
This content operator simply collects the names of the used xobjects, i.e. the xobject resources to keep for the given page.
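The two steps (collect the XObject names a page draws, then give that page its own copy of /Resources containing only those XObjects) can be sketched without iText. This stdlib-only model treats the content stream as a plain string tokenized naively on whitespace and the shared /XObject dictionary as a map from name to object number; the class and method names are hypothetical. In iText 5 the collecting part would instead be done by the `Do` operator above, registered with `PdfContentStreamProcessor.registerContentOperator("Do", ...)`. Note that `Do` operations inside form XObjects live in the forms' own content streams and are not seen by a page-level scan like this.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of per-page /XObject pruning before splitting. */
final class XObjectPruner {
    /** Names used as the operand of "Do", e.g. "/Im3 Do" yields "Im3". */
    static Set<String> usedXObjectNames(String content) {
        Set<String> names = new LinkedHashSet<>();
        String[] tokens = content.trim().split("\\s+");
        for (int i = 1; i < tokens.length; i++) {
            if (tokens[i].equals("Do") && tokens[i - 1].startsWith("/")) {
                names.add(tokens[i - 1].substring(1));
            }
        }
        return names;
    }

    /** Copy of the shared /XObject map restricted to the names this page draws. */
    static Map<String, Integer> pruned(Map<String, Integer> sharedXObjects, Set<String> used) {
        Map<String, Integer> copy = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : sharedXObjects.entrySet()) {
            if (used.contains(e.getKey())) {
                copy.put(e.getKey(), e.getValue());
            }
        }
        return copy;
    }
}
```

Setting such a pruned copy as each page's /Resources, instead of the shared `2 0 R` dictionary, is what lets PdfCopy write only the images a page actually needs.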

Add named destinations to an existing PDF document with iText

I have a PDF previously created with FOP, and I need to add some named destinations to it, so that later another program can open and navigate the document with the Adobe PDF open parameters, namely the #nameddest=destination_name parameter.
I don't need to add bookmarks or other dynamic content, just some destinations with a name, i.e. a /Dests collection with the names defined in the resulting PDF.
I use iText 5.3.0 and I have read chapter 7 of iText in Action (2nd edition), but I still cannot figure out how to add the destinations and use them with #nameddest in a browser.
I'm reading and manipulating the document with PdfReader and PdfStamper. I already know in advance where to put every destination after having parsed the document with a customized Listener and a PdfContentStreamProcessor, searching for a specific text marker on each page.
This is a shortened version of my code:
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader, new BufferedOutputStream(dest));
// search text markers for destinations, page by page
for (int i=1; i<=reader.getNumberOfPages(); i++) {
// get a list of markers for this page, as obtained with a custom Listener and a PdfContentStreamProcessor
List<MyDestMarker> markers = ((MyListener)listener).getMarkersForThisPage();
// add a destination for every text marker in the current page
Iterator<MyDestMarker> it = markers.iterator();
while(it.hasNext()) {
MyDestMarker marker = it.next();
String name = marker.getName();
String x = marker.getX();
float y = Float.parseFloat(marker.getY()); // PdfDestination expects a float coordinate
// create a new destination (not named "dest", which is already used for the output above)
PdfDestination pdfDest = new PdfDestination(PdfDestination.FITH, y); // or XYZ
// add as a named destination -> does not work, only for new documents?
stamper.getWriter().addNamedDestination(name, i /* current page */, pdfDest);
// alternatives
PdfContentByte content = stamper.getOverContent(i);
content.localDestination(name, pdfDest); // doesn't work either -> no named dest found
// add dest name to a list for later use with Pdf Open Parameters
destinations.add(name);
}
}
stamper.close();
reader.close();
I also tried creating a PdfAnnotation with PdfFormField.createLink(), but that only gives me the annotation; without a named destination defined, it does not work.
Any solution for this? Do I need to add some "ghost" content over the existing one with Chunks or something else?
Thanks in advance.
edit 01-27-2016:
I recently found an answer to my question in the examples section of the iText website.
Unfortunately, the example provided does not work for me when I test it with a PDF that has no previously defined destinations; the example's source, primes.pdf, already contains a /Dests array. This behaviour appears to be consistent with the iText code, since the writer loads the destinations into a map attribute of PdfDocument which is not "inherited" by the stamper on closing.
That said, I got it working using the addNamedDestination() method of PdfStamper, added in version 5.5.7; this method loads a named destination into a local map attribute of the class, which is later processed and consolidated into the document when the stamper is closed.
This approach raised a new issue, though: navigation with the PDF open parameters (#, #nameddest=) works fine with IE but not with Chrome v47 (and probably not with Firefox either). I tracked the problem down to the order in which the destination names are defined and referenced inside the document: the stamper uses a HashMap as the container for the destinations, which of course does not guarantee the order of its entries, and for whatever reason Chrome refuses to recognise destinations not listed in "natural" order. So the only way I got it to work was replacing the namedDestinations HashMap with a naturally ordered TreeMap.
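The TreeMap fix works because a TreeMap iterates its keys in natural order (lexicographic for String), whereas a HashMap makes no ordering promise at all. A minimal, iText-independent sketch of collecting destination names that way; the class and method names are hypothetical, and plain page numbers stand in for the PdfDestination values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical registry that hands out destination names in sorted order. */
final class SortedDestinations {
    private final Map<String, Integer> pageByName = new TreeMap<>(); // natural String order

    void add(String name, int page) {
        pageByName.put(name, page);
    }

    /** Names in the order a TreeMap iterates them, independent of insertion order. */
    List<String> namesInWriteOrder() {
        return new ArrayList<>(pageByName.keySet());
    }
}
```

Note that lexicographic order places `chapter10` before `chapter2`; that is the order such a map yields, not a numeric one.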
Hope this helps others with the same issue.
I had the same need in a previous project: I had to display and navigate a PDF document with the acrobat.jar viewer, and to navigate I needed named destinations in the PDF. I looked around the web for a possible solution, but without luck. Then this idea struck me.
I recreated the existing PDF with iText, going through each page and adding a local destination to each page, and I got what I wanted. Below is a snippet of my code:
OutputStream outputStream = new FileOutputStream(new File(filename));
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, outputStream);
document.open();
PdfContentByte cb = writer.getDirectContent();
InputStream in1 = new FileInputStream(new File(inf1));
PdfReader reader = new PdfReader(in1);
for (int i = 1; i <= reader.getNumberOfPages(); i++)
{
    document.newPage();
    document.setMargins(0.0F, 18.0F, 18.0F, 18.0F);
    PdfImportedPage page = writer.getImportedPage(reader, i);
    // a chunk showing the page number, which also carries it as local destination name
    document.add(new Chunk(String.valueOf(i)).setLocalDestination(String.valueOf(i)));
    System.out.println(i);
    cb.addTemplate(page, 0.0F, 0.0F);
}
outputStream.flush();
document.close();
reader.close();
outputStream.close();
I thought it might help you.

PDFBox LayerUtility - Importing layers into existing PDF

I am using PDFBox to manipulate PDF content. I have a big PDF file (say 500 pages). I also have a few other single-page PDF files, each containing only a single image and at most around 8-15 KB in size. I need to import these single-page PDFs as overlays onto certain pages of the big PDF file.
I have tried PDFBox's LayerUtility, and it works, but it creates a very large output file. The source PDF is about 1 MB before processing, and after adding the smaller PDF files the size goes up to 64 MB. Sometimes I also need to put two of the smaller PDFs onto the bigger one.
Is there a better way to do this, or am I just doing it wrong? Here is my code, which tries to add two layers onto a single page:
...
...
..
overlayDoc[pCounter] = PDDocument.load("data\\" + overlay + ".pdf");
outputPage[pCounter] = (PDPage) overlayDoc[pCounter].getDocumentCatalog().getAllPages().get(0);
LayerUtility lu = new LayerUtility( overlayDoc[pCounter] );
form[pCounter] = lu.importPageAsForm( bigPDFDoc, Integer.parseInt(pageNo)-1);
lu.appendFormAsLayer( outputPage[pCounter], form[pCounter], aTrans, "OVERLAY_"+pCounter );
outputDoc.addPage(outputPage[pCounter]);
mOverlayDoc[pCounter] = PDDocument.load("data\\" + overlay2 + ".pdf");
mOutputPage[pCounter] = (PDPage) mOverlayDoc[pCounter].getDocumentCatalog().getAllPages().get(0);
LayerUtility lu2 = new LayerUtility( mOverlayDoc[pCounter] );
mForm[pCounter] = lu2.importPageAsForm(outputDoc, outputDoc.getNumberOfPages()-1);
lu2.appendFormAsLayer( mOutputPage[pCounter], mForm[pCounter], aTrans, "OVERLAY_2"+pCounter );
outputDoc.removePage(outputPage[pCounter]);
outputDoc.addPage(mOutputPage[pCounter]);
...
...
With code like the following I don't see any unexpected growth in size:
PDDocument bigDocument = PDDocument.load(BIG_SOURCE_FILE);
LayerUtility layerUtility = new LayerUtility(bigDocument);
List bigPages = bigDocument.getDocumentCatalog().getAllPages();
// import each page to superimpose only once
PDDocument firstSuperDocument = PDDocument.load(FIRST_SUPER_FILE);
PDXObjectForm firstForm = layerUtility.importPageAsForm(firstSuperDocument, 0);
PDDocument secondSuperDocument = PDDocument.load(SECOND_SUPER_FILE);
PDXObjectForm secondForm = layerUtility.importPageAsForm(secondSuperDocument, 0);
// These things can easily be done in a loop, too
AffineTransform affineTransform = new AffineTransform(); // Identity... your requirements may differ
layerUtility.appendFormAsLayer((PDPage) bigPages.get(0), firstForm, affineTransform, "Superimposed0");
layerUtility.appendFormAsLayer((PDPage) bigPages.get(1), secondForm, affineTransform, "Superimposed1");
layerUtility.appendFormAsLayer((PDPage) bigPages.get(2), firstForm, affineTransform, "Superimposed2");
bigDocument.save(BIG_TARGET_FILE);
As you see, I superimposed the first page of FIRST_SUPER_FILE on two pages of the target file, but I imported that page only once. Thus the resources of that imported page are also imported only once.
This approach is open for loops, too, but don't import the same page multiple times! Instead import all required template pages once up front as forms and in the later loop reference those forms again and again.
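"Import once, reference many times" is essentially memoization. A stdlib-only sketch of such a cache, assuming a hypothetical importer function that stands in for `layerUtility.importPageAsForm(sourceDocument, pageIndex)`; the class name and the `file#page` key format are hypothetical too.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache: each template page is imported at most once, then reused. */
final class FormCache<F> {
    private final Map<String, F> formsByKey = new HashMap<>();
    private final Function<String, F> importer; // stand-in for the real import call
    private int importCount = 0;

    FormCache(Function<String, F> importer) {
        this.importer = importer;
    }

    /** Returns the cached form for "file#page", importing it only on first use. */
    F formFor(String file, int pageIndex) {
        String key = file + "#" + pageIndex;
        return formsByKey.computeIfAbsent(key, k -> {
            importCount++;
            return importer.apply(k);
        });
    }

    int importCount() {
        return importCount;
    }
}
```

Inside the loop over target pages, each `appendFormAsLayer` call would then receive `cache.formFor(...)`, so superimposing one template on many pages still imports its resources a single time.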
(I hope this solves your issue. If not, supply more code and the sample PDFs to reproduce your issue.)
