extracting one page from pdf file using iText

extracting one page from pdf file using iText - java

I want to return one page from pdf files from java servlet (to reduce file size download), using itext library.
using this code
try {
PdfReader reader = new PdfReader(input);
Document document = new Document(reader.getPageSizeWithRotation(page_number) );
PdfSmartCopy copy1 = new PdfSmartCopy(document, response.getOutputStream());
copy1.setFullCompression();
document.open();
copy1.addPage(copy1.getImportedPage(reader, page_i) );
copy1.freeReader(reader);
reader.close();
document.close();
} catch (DocumentException e) {
e.printStackTrace();
}
this code returns the page, but the file size is large and some times equals the original file size, even it is just a one page.

I have downloaded a single file from your repository: Abdomen.pdf
I have then used the following code to "burst" that PDF:
public static void main(String[] args) throws DocumentException, IOException {
PdfReader reader = new PdfReader("resources/Abdomen.pdf");
int n = reader.getNumberOfPages();
reader.close();
String path;
PdfStamper stamper;
for (int i = 1; i <= n; i++) {
reader = new PdfReader("resources/abdomen.pdf");
reader.selectPages(String.valueOf(i));
path = String.format("results/abdomen/p-%s.pdf", i);
stamper = new PdfStamper(reader,new FileOutputStream(path));
stamper.close();
reader.close();
}
}
To "burst" means to split in separate pages. While the original file Abdomen.pdf is 72,570 KB (about 70.8 MB), the separate pages are much smaller:
I can not reproduce the problem you describe.

A bit more updated and a lot cleaner (5.5.6 and up) :
/**
* Manipulates a PDF file src with the file dest as result
* #param src the original PDF
* #param dest the resulting PDF
* #throws IOException
* #throws DocumentException
*/
public void manipulatePdf(String src, String dest)
throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
SmartPdfSplitter splitter = new SmartPdfSplitter(reader);
int part = 1;
while (splitter.hasMorePages()) {
splitter.split(new FileOutputStream("results/merge/part_" + part + ".pdf"), 200000);
part++;
}
reader.close();
}

Related

Background image itextpdf 5.5

I'm using itextpdf 5.5, can't change it to 7.
I have the problem with background image.
I have a document (text and tables) without stamp and I want to add stamp to it.
This is how I download existing doc.
PdfReader pdfReader = new PdfReader("/doc.pdf");
PdfImportedPage page = writer.getImportedPage(pdfReader, 1);
PdfContentByte pcb = writer.getDirectContent();
pcb.addTemplate(page, 0,0);
And this is how I download stamp image and add it to my doc.
PdfContentByte canvas = writer.getDirectContentUnder();
URL resource = getClass().getResource(getStamp());
Image background = new Jpeg(resource);
background.scaleToFit(463F, 132F);
background.setAbsolutePosition(275F, 100F);
canvas.addImage(background);
But when I download my document - I don't see the stamp. I tried to change getDirectContent() to getDirectContentUnder() when I download my doc but this leads to the opposite situation - my stamp isn't in background.
My first doc is generated this way.
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
Document document = new Document(PageSize.A4);
try {
PdfWriter writer = PdfWriter.getInstance(document, outputStream);
document.open();
Paragraph title = new Paragraph(formatUtil.msg("my.header"), fontBold);
title.setAlignment(Element.ALIGN_CENTER);
document.add(title);
Template tmpl = fmConfig.getConfiguration().getTemplate("template.ftl");
Map<String, Object> params = new HashMap<>();
StringWriter writer = new StringWriter();
params.put("param", "param");
tmpl.process(params, writer);
document.add(new Paragraph(writer.toString(), fontCommon));
PdfPTable table = new PdfPTable(2);
document.add(table);
PdfContentByte canvas = writer.getDirectContentUnder();
Image background = new Jpeg(getClass().getResource("background.jpg"));
background.scaleAbsolute(PageSize.A4);
background.setAbsolutePosition(0,0);
canvas.addImage(background);
} finally {
if (document.isOpen()) {
document.close();
}
}

In comments it became clear that the task was to add some content (a bitmap image) to a PDF so that it is over the background (another bitmap image) added to the UnderContent during generation and under the text in the original DirectContent.
iText does not contain high-level code for such content manipulation. While the structure of iText generated PDFs would allow for such code, PDFs generated or manipulated by other PDF libraries may have a different structure; the structure in iText generated PDFs may even be changed during post-processing using other libraries. Thus, it is understandable that no high-level feature for this is provided by iText.
To implement the task nonetheless, therefore, we have to base our code on lower level iText APIs. In this context we can make use of the PdfContentStreamEditor helper class from this answer which already abstracts some details away. (That question originally is about iTextSharp for C# but further down in the answer Java versions of the code also are provided.)
In detail, we extend the PdfContentStreamEditor to remove the former UnderContent (and provide it as list of instructions). Now we can in a first step apply this editor to your file and in a second step add this former UnderContent plus an image over it to the intermediary file. (This could also be done in a single step but that would require a more complex, less maintainable editor class.)
First the new UnderContentRemover content stream editor class:
public class UnderContentRemover extends PdfContentStreamEditor {
/**
* Clears state of {#link UnderContentRemover}, in particular
* the collected content. Use this if you use this instance for
* multiple edit runs.
*/
public void clear() {
afterUnderContent = false;
underContent.clear();
depth = 0;
}
/**
* Retrieves the collected UnderContent instructions
*/
public List<List<PdfObject>> getUnderContent() {
return new ArrayList<List<PdfObject>>(underContent);
}
/**
* Adds the given instructions (which may previously have been
* retrieved using {#link #getUnderContent()}) to the given
* {#link PdfContentByte} instance.
*/
public static void write (PdfContentByte canvas, List<List<PdfObject>> operations) throws IOException {
for (List<PdfObject> operands : operations) {
int index = 0;
for (PdfObject object : operands) {
object.toPdf(canvas.getPdfWriter(), canvas.getInternalBuffer());
canvas.getInternalBuffer().append(operands.size() > ++index ? (byte) ' ' : (byte) '\n');
}
}
}
protected void write(PdfContentStreamProcessor processor, PdfLiteral operator, List<PdfObject> operands) throws IOException {
String operatorString = operator.toString();
if (afterUnderContent) {
super.write(processor, operator, operands);
return;
} else if ("q".equals(operatorString)) {
depth++;
} else if ("Q".equals(operatorString)) {
depth--;
if (depth < 1)
afterUnderContent = true;
} else if (depth == 0) {
afterUnderContent = true;
super.write(processor, operator, operands);
return;
}
underContent.add(new ArrayList<>(operands));
}
boolean afterUnderContent = false;
List<List<PdfObject>> underContent = new ArrayList<>();
int depth = 0;
}
(UnderContentRemover)
As you see, its write method stores the leading instructions forwarded to it in the underContent list until it finds the restore-graphics-state (Q) instruction matching the initial save-graphics-state instruction (q). After that it instead forwards all further instructions to the parent write implementation which writes them to the edited page content.
We can use this for the task at hand as follows:
PdfReader pdfReader = new PdfReader(YOUR_DOCUMENT);
List<List<List<PdfObject>>> underContentByPage = new ArrayList<>();
byte[] sourceWithoutUnderContent = null;
try ( ByteArrayOutputStream outputStream = new ByteArrayOutputStream() ) {
PdfStamper pdfStamper = new PdfStamper(pdfReader, outputStream);
UnderContentRemover underContentRemover = new UnderContentRemover();
for (int i = 1; i <= pdfReader.getNumberOfPages(); i++) {
underContentRemover.clear();
underContentRemover.editPage(pdfStamper, i);
underContentByPage.add(underContentRemover.getUnderContent());
}
pdfStamper.close();
pdfReader.close();
sourceWithoutUnderContent = outputStream.toByteArray();
}
Image background = YOUR_IMAGE_TO_ADD_INBETWEEN;
background.scaleToFit(463F, 132F);
background.setAbsolutePosition(275F, 100F);
pdfReader = new PdfReader(sourceWithoutUnderContent);
byte[] sourceWithStampInbetween = null;
try ( ByteArrayOutputStream outputStream = new ByteArrayOutputStream() ) {
PdfStamper pdfStamper = new PdfStamper(pdfReader, outputStream);
for (int i = 1; i <= pdfReader.getNumberOfPages(); i++) {
PdfContentByte canvas = pdfStamper.getUnderContent(i);
UnderContentRemover.write(canvas, underContentByPage.get(i-1));
canvas.addImage(background);
}
pdfStamper.close();
pdfReader.close();
sourceWithStampInbetween = outputStream.toByteArray();
}
Files.write(new File("PdfLikeVladimirSafonov-WithStampInbetween.pdf").toPath(), sourceWithStampInbetween);
(AddImageInBetween test testForVladimirSafonov)

Masking aadhaar number in an UIDAI aadhaar pdf

Does anybody know how to mask aadhaar number in an aadhaar pdf downloaded from UIDAI website.
I have already tried below code for itext. It is not working for an aadhaar PDF but is working for other normal pdfs
public static void main(String[] args) throws IOException, DocumentException {
File file = new File(DEST);
file.getParentFile().mkdirs();
new MainClassName().manipulatePdf(SRC, DEST);
}
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
PdfDictionary dict = reader.getPageN(1);
PdfObject object = dict.getDirectObject(PdfName.CONTENTS);
PdfArray refs = null;
if (dict.get(PdfName.CONTENTS).isArray()) {
refs = dict.getAsArray(PdfName.CONTENTS);
} else if (dict.get(PdfName.CONTENTS).isIndirect()) {
refs = new PdfArray(dict.get(PdfName.CONTENTS));
}
for (int i = 0; i < refs.getArrayList().size(); i++) {
PRStream stream = (PRStream) refs.getDirectObject(i);
byte[] data = PdfReader.getStreamBytes(stream);
stream.setData(new String(data).replace("8989 7890 4567", "XXXX XXXX 4567").getBytes());
}
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
stamper.close();
reader.close();
}
Tried suggestions from this post too (i.e, using pdfbox)
Search and replace text in PDF using JAVA
Even that is not working for an aadhaar PDF but is working for a normal pdf
(I know we can download a masked aadhaar pdf from UIDAI website itself, but I need to do it through JAVA)

you can convert your PDF to an Image then with help of Google Cloud OCR
and then mask text and convert it back to PDF.
Here is Sample Project in javascript.
https://github.com/rinkeshjain/aadhaar-mask

merge multiple pdfs in order

hey guys sorry for long post and bad language and if there is unnecessary details
i created multiple 1page pdfs from one pdf template using excel document
i have now
something like this
tempfile0.pdf
tempfile1.pdf
tempfile2.pdf
...
im trying to merge all files in one single pdf using itext5
but it semmes that the pages in the resulted pdf are not in the order i wanted
per exemple
tempfile0.pdf in the first page
tempfile1. int the 2000 page
here is the code im using.
the procedure im using is:
1 filling a from from a hashmap
2 saving the filled form as one pdf
3 merging all the files in one single pdf
public void fillPdfitext(int debut,int fin) throws IOException, DocumentException {
for (int i =debut; i < fin; i++) {
HashMap<String, String> currentData = dataextracted[i];
// textArea.appendText("\n"+pdfoutputname +" en cours de preparation\n ");
PdfReader reader = new PdfReader(this.sourcePdfTemplateFile.toURI().getPath());
String outputfolder = this.destinationOutputFolder.toURI().getPath();
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(outputfolder+"\\"+"tempcontrat"+debut+"-" +i+ "_.pdf"));
// get the document catalog
AcroFields acroForm = stamper.getAcroFields();
// as there might not be an AcroForm entry a null check is necessary
if (acroForm != null) {
for (String key : currentData.keySet()) {
try {
String fieldvalue=currentData.get(key);
if (key=="ADRESSE1"){
fieldvalue = currentData.get("ADRESSE1")+" "+currentData.get("ADRESSE2") ;
acroForm.setField("ADRESSE", fieldvalue);
}
if (key == "IMEI"){
acroForm.setField("NUM_SERIE_PACK", fieldvalue);
}
acroForm.setField(key, fieldvalue);
// textArea.appendText(key + ": "+fieldvalue+"\t\t");
} catch (Exception e) {
// e.printStackTrace();
}
}
stamper.setFormFlattening(true);
}
stamper.close();
}
}
this is the code for merging
public void Merge() throws IOException, DocumentException
{
File[] documentPaths = Main.objetapp.destinationOutputFolder.listFiles((dir, name) -> name.matches( "tempcontrat.*\\.pdf" ));
Arrays.sort(documentPaths, NameFileComparator.NAME_INSENSITIVE_COMPARATOR);
byte[] mergedDocument;
try (ByteArrayOutputStream memoryStream = new ByteArrayOutputStream())
{
Document document = new Document();
PdfSmartCopy pdfSmartCopy = new PdfSmartCopy(document, memoryStream);
document.open();
for (File docPath : documentPaths)
{
PdfReader reader = new PdfReader(docPath.toURI().getPath());
try
{
reader.consolidateNamedDestinations();
PdfImportedPage pdfImportedPage = pdfSmartCopy.getImportedPage(reader, 1);
pdfSmartCopy.addPage(pdfImportedPage);
}
finally
{
pdfSmartCopy.freeReader(reader);
reader.close();
}
}
document.close();
mergedDocument = memoryStream.toByteArray();
}
FileOutputStream stream = new FileOutputStream(this.destinationOutputFolder.toURI().getPath()+"\\"+
this.sourceDataFile.getName().replaceFirst("[.][^.]+$", "")+".pdf");
try {
stream.write(mergedDocument);
} finally {
stream.close();
}
documentPaths=null;
Runtime r = Runtime.getRuntime();
r.gc();
}
my question is how to keep the order of the files the same in the resulting pdf

It is because of naming of files. Your code
new FileOutputStream(outputfolder + "\\" + "tempcontrat" + debut + "-" + i + "_.pdf")
will produce:
tempcontrat0-0_.pdf
tempcontrat0-1_.pdf
...
tempcontrat0-10_.pdf
tempcontrat0-11_.pdf
...
tempcontrat0-1000_.pdf
Where tempcontrat0-1000_.pdf will be placed before tempcontrat0-11_.pdf, because you are sorting it alphabetically before merge.
It will be better to left pad file number with 0 character using leftPad() method of org.apache.commons.lang.StringUtils or java.text.DecimalFormat and have it like this tempcontrat0-000000.pdf, tempcontrat0-000001.pdf, ... tempcontrat0-9999999.pdf.
And you can also do it much simpler and skip writing into file and then reading from file steps and merge documents right after the form fill and it will be faster. But it depends how many and how big documents you are merging and how much memory do you have.
So you can save the filled document into ByteArrayOutputStream and after stamper.close() create new PdfReader for bytes from that stream and call pdfSmartCopy.getImportedPage() for that reader. In short cut it can look like:
// initialize
PdfSmartCopy pdfSmartCopy = new PdfSmartCopy(document, memoryStream);
for (int i = debut; i < fin; i++) {
ByteArrayOutputStream out = new ByteArrayOutputStream();
// fill in the form here
stamper.close();
PdfReader reader = new PdfReader(out.toByteArray());
reader.consolidateNamedDestinations();
PdfImportedPage pdfImportedPage = pdfSmartCopy.getImportedPage(reader, 1);
pdfSmartCopy.addPage(pdfImportedPage);
// other actions ...
}

function that can use iText to concatenate / merge pdfs together - causing some issues

I'm using the following code to merge PDFs together using iText:
public static void concatenatePdfs(List<File> listOfPdfFiles, File outputFile) throws DocumentException, IOException {
Document document = new Document();
FileOutputStream outputStream = new FileOutputStream(outputFile);
PdfWriter writer = PdfWriter.getInstance(document, outputStream);
document.open();
PdfContentByte cb = writer.getDirectContent();
for (File inFile : listOfPdfFiles) {
PdfReader reader = new PdfReader(inFile.getAbsolutePath());
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
document.newPage();
PdfImportedPage page = writer.getImportedPage(reader, i);
cb.addTemplate(page, 0, 0);
}
}
outputStream.flush();
document.close();
outputStream.close();
}
This usually works great! But once and a while, it's rotating some of the pages by 90 degrees? Anyone ever have this happen?
I am looking into the PDFs themselves to see what is special about the ones that are being flipped.

There are errors once in a while because you are using the wrong method to concatenate documents. Please read chapter 6 of my book and you'll notice that using PdfWriter to concatenate (or merge) PDF documents is wrong:
You completely ignore the page size of the pages in the original document (you assume they are all of size A4),
You ignore page boundaries such as the crop box (if present),
You ignore the rotation value stored in the page dictionary,
You throw away all interactivity that is present in the original document, and so on.
Concatenating PDFs is done using PdfCopy, see for instance the FillFlattenMerge2 example:
Document document = new Document();
PdfCopy copy = new PdfSmartCopy(document, new FileOutputStream(dest));
document.open();
PdfReader reader;
String line = br.readLine();
// loop over readers
// add the PDF to PdfCopy
reader = new PdfReader(baos.toByteArray());
copy.addDocument(reader);
reader.close();
// end loop
document.close();
There are other examples in the book.

In case anyone is looking for it, using Bruno Lowagie's correct answer above, here is the version of the function that does not seem to have the page flipping issue i described above:
public static void concatenatePdfs(List<File> listOfPdfFiles, File outputFile) throws DocumentException, IOException {
Document document = new Document();
FileOutputStream outputStream = new FileOutputStream(outputFile);
PdfCopy copy = new PdfSmartCopy(document, outputStream);
document.open();
for (File inFile : listOfPdfFiles) {
PdfReader reader = new PdfReader(inFile.getAbsolutePath());
copy.addDocument(reader);
reader.close();
}
document.close();
}

Editing PDF text using Java

Is there a way I can edit a PDF from Java?
I have a PDF document which contains placeholders for text that I need to be replaced using Java, but all the libraries that I saw created PDF from scratch and small editing functionality.
Is there anyway I can edit a PDF or is this impossible?

You can do it with iText. I tested it with following code. It adds a chunk of text and a red circle over each page of an existing PDF.
/* requires itextpdf-5.1.2.jar or similar */
import java.io.*;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.*;
public class AddContentToPDF {
public static void main(String[] args) throws IOException, DocumentException {
/* example inspired from "iText in action" (2006), chapter 2 */
PdfReader reader = new PdfReader("C:/temp/Bubi.pdf"); // input PDF
PdfStamper stamper = new PdfStamper(reader,
new FileOutputStream("C:/temp/Bubi_modified.pdf")); // output PDF
BaseFont bf = BaseFont.createFont(
BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED); // set font
//loop on pages (1-based)
for (int i=1; i<=reader.getNumberOfPages(); i++){
// get object for writing over the existing content;
// you can also use getUnderContent for writing in the bottom layer
PdfContentByte over = stamper.getOverContent(i);
// write text
over.beginText();
over.setFontAndSize(bf, 10); // set font and size
over.setTextMatrix(107, 740); // set x,y position (0,0 is at the bottom left)
over.showText("I can write at page " + i); // set text
over.endText();
// draw a red circle
over.setRGBColorStroke(0xFF, 0x00, 0x00);
over.setLineWidth(5f);
over.ellipse(250, 450, 350, 550);
over.stroke();
}
stamper.close();
}
}

I modified the code found a bit and it was working as follows
public class Principal {
public static final String SRC = "C:/tmp/244558.pdf";
public static final String DEST = "C:/tmp/244558-2.pdf";
public static void main(String[] args) throws IOException, DocumentException {
File file = new File(DEST);
file.getParentFile().mkdirs();
new Principal().manipulatePdf(SRC, DEST);
}
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
PdfDictionary dict = reader.getPageN(1);
PdfObject object = dict.getDirectObject(PdfName.CONTENTS);
PdfArray refs = null;
if (dict.get(PdfName.CONTENTS).isArray()) {
refs = dict.getAsArray(PdfName.CONTENTS);
} else if (dict.get(PdfName.CONTENTS).isIndirect()) {
refs = new PdfArray(dict.get(PdfName.CONTENTS));
}
for (int i = 0; i < refs.getArrayList().size(); i++) {
PRStream stream = (PRStream) refs.getDirectObject(i);
byte[] data = PdfReader.getStreamBytes(stream);
stream.setData(new String(data).replace("NULA", "Nulo").getBytes());
}
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
stamper.close();
reader.close();
}
}

Take a look at iText and this sample code

Take a look at aspose and this sample code

I've done this using LibreOffice Draw.
You start by manually opening a pdf in Draw, checking that it renders OK, and saving it as a Draw .odg file.
That's a zipped xml file, so you can modify it in code to find and replace the placeholders.
Next (from code) you use a command line call to Draw to generate the pdf.
Success!
The main issue is that Draw doesn't handle fonts embedded in a pdf. If the font isn't also installed on your system - then it will render oddly, as Draw will replace it with a standard one that inevitably has different sizing.
If this approach is of interest, I'll put together some shareable code.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

extracting one page from pdf file using iText - java

Related

Background image itextpdf 5.5

Masking aadhaar number in an UIDAI aadhaar pdf

merge multiple pdfs in order

function that can use iText to concatenate / merge pdfs together - causing some issues

Editing PDF text using Java

Categories

Resources