merge multiple pdfs in order

merge multiple pdfs in order - java

hey guys sorry for long post and bad language and if there is unnecessary details
i created multiple 1page pdfs from one pdf template using excel document
i have now
something like this
tempfile0.pdf
tempfile1.pdf
tempfile2.pdf
...
im trying to merge all files in one single pdf using itext5
but it semmes that the pages in the resulted pdf are not in the order i wanted
per exemple
tempfile0.pdf in the first page
tempfile1. int the 2000 page
here is the code im using.
the procedure im using is:
1 filling a from from a hashmap
2 saving the filled form as one pdf
3 merging all the files in one single pdf
public void fillPdfitext(int debut,int fin) throws IOException, DocumentException {
for (int i =debut; i < fin; i++) {
HashMap<String, String> currentData = dataextracted[i];
// textArea.appendText("\n"+pdfoutputname +" en cours de preparation\n ");
PdfReader reader = new PdfReader(this.sourcePdfTemplateFile.toURI().getPath());
String outputfolder = this.destinationOutputFolder.toURI().getPath();
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(outputfolder+"\\"+"tempcontrat"+debut+"-" +i+ "_.pdf"));
// get the document catalog
AcroFields acroForm = stamper.getAcroFields();
// as there might not be an AcroForm entry a null check is necessary
if (acroForm != null) {
for (String key : currentData.keySet()) {
try {
String fieldvalue=currentData.get(key);
if (key=="ADRESSE1"){
fieldvalue = currentData.get("ADRESSE1")+" "+currentData.get("ADRESSE2") ;
acroForm.setField("ADRESSE", fieldvalue);
}
if (key == "IMEI"){
acroForm.setField("NUM_SERIE_PACK", fieldvalue);
}
acroForm.setField(key, fieldvalue);
// textArea.appendText(key + ": "+fieldvalue+"\t\t");
} catch (Exception e) {
// e.printStackTrace();
}
}
stamper.setFormFlattening(true);
}
stamper.close();
}
}
this is the code for merging
public void Merge() throws IOException, DocumentException
{
File[] documentPaths = Main.objetapp.destinationOutputFolder.listFiles((dir, name) -> name.matches( "tempcontrat.*\\.pdf" ));
Arrays.sort(documentPaths, NameFileComparator.NAME_INSENSITIVE_COMPARATOR);
byte[] mergedDocument;
try (ByteArrayOutputStream memoryStream = new ByteArrayOutputStream())
{
Document document = new Document();
PdfSmartCopy pdfSmartCopy = new PdfSmartCopy(document, memoryStream);
document.open();
for (File docPath : documentPaths)
{
PdfReader reader = new PdfReader(docPath.toURI().getPath());
try
{
reader.consolidateNamedDestinations();
PdfImportedPage pdfImportedPage = pdfSmartCopy.getImportedPage(reader, 1);
pdfSmartCopy.addPage(pdfImportedPage);
}
finally
{
pdfSmartCopy.freeReader(reader);
reader.close();
}
}
document.close();
mergedDocument = memoryStream.toByteArray();
}
FileOutputStream stream = new FileOutputStream(this.destinationOutputFolder.toURI().getPath()+"\\"+
this.sourceDataFile.getName().replaceFirst("[.][^.]+$", "")+".pdf");
try {
stream.write(mergedDocument);
} finally {
stream.close();
}
documentPaths=null;
Runtime r = Runtime.getRuntime();
r.gc();
}
my question is how to keep the order of the files the same in the resulting pdf

It is because of naming of files. Your code
new FileOutputStream(outputfolder + "\\" + "tempcontrat" + debut + "-" + i + "_.pdf")
will produce:
tempcontrat0-0_.pdf
tempcontrat0-1_.pdf
...
tempcontrat0-10_.pdf
tempcontrat0-11_.pdf
...
tempcontrat0-1000_.pdf
Where tempcontrat0-1000_.pdf will be placed before tempcontrat0-11_.pdf, because you are sorting it alphabetically before merge.
It will be better to left pad file number with 0 character using leftPad() method of org.apache.commons.lang.StringUtils or java.text.DecimalFormat and have it like this tempcontrat0-000000.pdf, tempcontrat0-000001.pdf, ... tempcontrat0-9999999.pdf.
And you can also do it much simpler and skip writing into file and then reading from file steps and merge documents right after the form fill and it will be faster. But it depends how many and how big documents you are merging and how much memory do you have.
So you can save the filled document into ByteArrayOutputStream and after stamper.close() create new PdfReader for bytes from that stream and call pdfSmartCopy.getImportedPage() for that reader. In short cut it can look like:
// initialize
PdfSmartCopy pdfSmartCopy = new PdfSmartCopy(document, memoryStream);
for (int i = debut; i < fin; i++) {
ByteArrayOutputStream out = new ByteArrayOutputStream();
// fill in the form here
stamper.close();
PdfReader reader = new PdfReader(out.toByteArray());
reader.consolidateNamedDestinations();
PdfImportedPage pdfImportedPage = pdfSmartCopy.getImportedPage(reader, 1);
pdfSmartCopy.addPage(pdfImportedPage);
// other actions ...
}

Related

Delete paragraph from PDF - Java

Greeting,
What is the easiest way to delete text/paragraph from a PDF document. The program takes a PDF document and creates a separate PDF document for each page. In each document I have text from the original that I would like to delete.
I tried a couple of examples but it doesn't work or they use old libraries
I am using the iText 7 library
private void processPDF(String src, String dest) throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
PdfDictionary dict = reader.getPageN(1);
PdfObject object = dict.getDirectObject(PdfName.CONTENTS);
if (object instanceof PRStream) {
PRStream stream = (PRStream) object;
byte[] data = PdfReader.getStreamBytes(stream);
String dd = new String(data, "UTF8")
.replace("Hand made software", "");
stream.setData(dd.getBytes("UTF8"));
if (dd.contains("Hand made software")) {
System.out.println("Contains");
}
}
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
stamper.close();
reader.close();
}
private void processPDF2(String src, String dest) throws InvalidPasswordException, IOException {
Map<String, String> map = new HashMap<>();
map.put("Hand made software", "");
File template = new File(src);
PDDocument document = PDDocument.load(template);
List<PDField> fields = document.getDocumentCatalog().getAcroForm().getFields();
for (PDField field : fields) {
for (Map.Entry<String, String> entry : map.entrySet()) {
if (entry.getKey().equals(field.getFullyQualifiedName())) {
field.setValue(entry.getValue());
field.setReadOnly(true);
}
}
}
File out = new File(dest);
document.save(out);
document.close();
}
I want to delete line "Hand made software"

You can make it easy iterating through the PDF elements. First create a PDF reader and writter that will read the template located on src string path.
File template = new File(src);
PdfReader reader = new PdfReader(template);
File out = new File(dest);
PdfWritter writter = new PdfWritter(out);
Then create a Document object by firstly creating a PdfDocument:
PdfDocument pdf = new PdfDocument(reader, writter);
Document document = new Document(pdf);
Last iterate through the elements of the pdf document until "Hand made software" line is found:
for (int i = 0; i < document.getRoots().size(); i++) {
if (document.getRoots().get(i) instanceof Paragraph) { //iterate only through paragraphs
Paragraph paragraph = (Paragraph) document.getRoots().get(i);
if(paragraph.getText().equals("Hand made software")){ //if the paragraph equals to the string to be removed, remove from the document
document.getRoots().remove(i);
i--;
}
}
}
Finally close de document to save the changes
document.close();

Merging of different file format into single pdf using itext is giving corrupt pdf

I have a multipart form in which different type of files will be uploaded in single request like pdf,png,jpg etc. I need to read these files and create final merged single pdf file. Does iText will support this kind of merger of different file types.
I can able to merge all image type but when i try to merge pdf with imgae getting some issues.Any sample example on this.Below code creating final pdf without images.
public void generateMergedPDF(Map<String, String> dataMap, MultipartFile[] files) throws Exception {
ClassPathResource resource = new ClassPathResource("templatepdfForm.pdf");
FileOutputStream userInfo = new FileOutputStream(new File("C:\\pdf\\megepath\\updateddocument.pdf"));
//Update filed values to template Starts
PdfReader reader = new PdfReader(resource.getInputStream());
PdfStamper stamper = new PdfStamper(reader, userInfo);
stamper.setFormFlattening(true);
AcroFields form = stamper.getAcroFields();
Map<String, Item> fieldMap = form.getFields();
for (String key : fieldMap.keySet()) {
String fieldValue = dataMap.get(key);
if (fieldValue != null) {
form.setField(key, fieldValue);
}
}
stamper.close();
//Update filed values to template Ends
Document mergePdfDoc = new Document();
PdfCopy pdfCopy;
boolean smartCopy = false;
FileOutputStream finalFile = new FileOutputStream("C:\\pdf\\finalmergedfile.pdf");
//Merge updated pdf with multipart content
if(smartCopy)
pdfCopy = new PdfSmartCopy(mergePdfDoc, finalFile);
else
pdfCopy = new PdfCopy(mergePdfDoc, finalFile);
PdfWriter writer = PdfWriter.getInstance(mergePdfDoc, finalFile);
mergePdfDoc.open();
PdfReader mergeReader = new PdfReader(new FileInputStream(new File("C:\\pdf\\megepath\\updateddocument.pdf")));
pdfCopy.addDocument(mergeReader);
pdfCopy.freeReader(mergeReader);
mergeReader.close();
PdfReader[] pdfReader = new PdfReader[files.length];
for(int i=0; i<=files.length-1;i++) {
if(FileContentType.APPLICATION_TYPE.getContentTypes().contains(files[i].getContentType())) {
//To add multipart pdf content
pdfReader[i] = new PdfReader(files[i].getInputStream());
pdfCopy.addDocument(pdfReader[i]);
pdfCopy.freeReader(pdfReader[i]);
pdfReader[i].close();
}else if(FileContentType.IMAGE_TYPE.getContentTypes().contains(files[i].getContentType())) {
//To add multipart image content
System.out.println("Image Type Loop");
Image fileImage = Image.getInstance(files[i].getBytes());
mergePdfDoc.setPageSize(fileImage);
mergePdfDoc.newPage();
claimImage.setAbsolutePosition(0, 0);
mergePdfDoc.add(fileImage);
}
}
pdfCopy.setMergeFields();
//mergePdfDoc.close(); //If i enable this close stream closed error
//writer.close(); //If i enable this close stream closed error
memInfo.close();
finalFile.close();
}

iText PdfAConformanceException: PDF array is out of bounds

i'm using iText 5.5.5 with Java5.
I'm trying to merge some PDF/A. when I got a "PdfAConformanceException: PDF array is out of bounds".
Trying to find error I find the "bad PDF" that cause the error and when I try to copy just it exception throw again. This error don't appear always, it appear just when this PDF/A is in the "job chain"; I tried with some other files and it's all fine. I cant share with you source PDF 'couse it's restricted.
That's my piece of code:
_log.info("Start Document Merge");
// Output pdf
ByteArrayOutputStream bos = new ByteArrayOutputStream();
com.itextpdf.text.Document document = new com.itextpdf.text.Document();
PdfCopy copy = new PdfACopy(document, bos, PdfAConformanceLevel.PDF_A_1B);
PageStamp stamp = null;
PdfReader reader = null;
PdfContentByte content = null;
int outPdfPageCount = 0;
BaseFont baseFont = BaseFont.createFont("Arial", BaseFont.WINANSI, BaseFont.EMBEDDED);
copyOutputIntents(reader, copy);
// Loop over the pages in that document
try {
int numberOfPages = reader.getNumberOfPages();
for (int i = 1; i <= numberOfPages; i++) {
PdfImportedPage pagecontent = copy.getImportedPage(reader, i);
_log.debug("Handling page numbering [" + i + "]");
stamp = copy.createPageStamp(pagecontent);
content = stamp.getUnderContent();
content.beginText();
content.setFontAndSize(baseFont, Configuration.NumPagSize);
content.showTextAligned(PdfContentByte.ALIGN_CENTER, String.format("%s %s ", Configuration.NumPagPrefix, i), Configuration.NumPagX, Configuration.NumPagY, 0);
content.endText();
stamp.alterContents();
copy.addPage(pagecontent);
outPdfPageCount++;
if (outPdfPageCount > Configuration.MaxPages) {
_log.error("Pdf Page Count > MaxPages");
throw new PackageException(Constants.ERROR_104_TEXT, Constants.ERROR_104);
}
}
copy.freeReader(reader);
reader.close();
copy.createXmpMetadata();
document.close();
} catch (Exception e) {
_log.error("Error during mergin Document, skip");
_log.debug(MiscUtil.stackToString(e));
}
return bos.toByteArray();
That's the full stacktrace:
com.itextpdf.text.pdf.PdfAConformanceException: PDF array is out of bounds.
at com.itextpdf.text.pdf.internal.PdfA1Checker.checkPdfObject(PdfA1Checker.java:269)
at com.itextpdf.text.pdf.internal.PdfAChecker.checkPdfAConformance(PdfAChecker.java:208)
at com.itextpdf.text.pdf.internal.PdfAConformanceImp.checkPdfIsoConformance(PdfAConformanceImp.java:71)
at com.itextpdf.text.pdf.PdfWriter.checkPdfIsoConformance(PdfWriter.java:3480)
at com.itextpdf.text.pdf.PdfWriter.checkPdfIsoConformance(PdfWriter.java:3476)
at com.itextpdf.text.pdf.PdfArray.toPdf(PdfArray.java:165)
at com.itextpdf.text.pdf.PdfDictionary.toPdf(PdfDictionary.java:149)
at com.itextpdf.text.pdf.PdfArray.toPdf(PdfArray.java:175)
at com.itextpdf.text.pdf.PdfDictionary.toPdf(PdfDictionary.java:149)
at com.itextpdf.text.pdf.PdfIndirectObject.writeTo(PdfIndirectObject.java:158)
at com.itextpdf.text.pdf.PdfWriter$PdfBody.write(PdfWriter.java:420)
at com.itextpdf.text.pdf.PdfWriter$PdfBody.add(PdfWriter.java:398)
at com.itextpdf.text.pdf.PdfWriter$PdfBody.add(PdfWriter.java:373)
at com.itextpdf.text.pdf.PdfWriter$PdfBody.add(PdfWriter.java:369)
at com.itextpdf.text.pdf.PdfWriter.addToBody(PdfWriter.java:843)
at com.itextpdf.text.pdf.PdfCopy.addToBody(PdfCopy.java:839)
at com.itextpdf.text.pdf.PdfCopy.addToBody(PdfCopy.java:821)
at com.itextpdf.text.pdf.PdfCopy.copyIndirect(PdfCopy.java:426)
at com.itextpdf.text.pdf.PdfCopy.copyIndirect(PdfCopy.java:446)
at com.itextpdf.text.pdf.PdfCopy.copyObject(PdfCopy.java:577)
at com.itextpdf.text.pdf.PdfCopy.copyDictionary(PdfCopy.java:503)
at com.itextpdf.text.pdf.PdfCopy.copyObject(PdfCopy.java:573)
at com.itextpdf.text.pdf.PdfCopy.copyDictionary(PdfCopy.java:503)
at com.itextpdf.text.pdf.PdfCopy.copyObject(PdfCopy.java:573)
at com.itextpdf.text.pdf.PdfCopy.copyDictionary(PdfCopy.java:493)
at com.itextpdf.text.pdf.PdfCopy.copyDictionary(PdfCopy.java:519)
at com.itextpdf.text.pdf.PdfCopy.addPage(PdfCopy.java:663)
at com.itextpdf.text.pdf.PdfACopy.addPage(PdfACopy.java:115)
at it.m2sc.engageone.documentpackage.generator.PackageGenerator.mergePDF(PackageGenerator.java:256)

In that specific case, the problem depends by a specific Font ( Gulim ) that is too big to be embedded in PDF/A-1 file. When that font was removed, everything war run fine.

extracting one page from pdf file using iText

I want to return one page from pdf files from java servlet (to reduce file size download), using itext library.
using this code
try {
PdfReader reader = new PdfReader(input);
Document document = new Document(reader.getPageSizeWithRotation(page_number) );
PdfSmartCopy copy1 = new PdfSmartCopy(document, response.getOutputStream());
copy1.setFullCompression();
document.open();
copy1.addPage(copy1.getImportedPage(reader, page_i) );
copy1.freeReader(reader);
reader.close();
document.close();
} catch (DocumentException e) {
e.printStackTrace();
}
this code returns the page, but the file size is large and some times equals the original file size, even it is just a one page.

I have downloaded a single file from your repository: Abdomen.pdf
I have then used the following code to "burst" that PDF:
public static void main(String[] args) throws DocumentException, IOException {
PdfReader reader = new PdfReader("resources/Abdomen.pdf");
int n = reader.getNumberOfPages();
reader.close();
String path;
PdfStamper stamper;
for (int i = 1; i <= n; i++) {
reader = new PdfReader("resources/abdomen.pdf");
reader.selectPages(String.valueOf(i));
path = String.format("results/abdomen/p-%s.pdf", i);
stamper = new PdfStamper(reader,new FileOutputStream(path));
stamper.close();
reader.close();
}
}
To "burst" means to split in separate pages. While the original file Abdomen.pdf is 72,570 KB (about 70.8 MB), the separate pages are much smaller:
I can not reproduce the problem you describe.

A bit more updated and a lot cleaner (5.5.6 and up) :
/**
* Manipulates a PDF file src with the file dest as result
* #param src the original PDF
* #param dest the resulting PDF
* #throws IOException
* #throws DocumentException
*/
public void manipulatePdf(String src, String dest)
throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
SmartPdfSplitter splitter = new SmartPdfSplitter(reader);
int part = 1;
while (splitter.hasMorePages()) {
splitter.split(new FileOutputStream("results/merge/part_" + part + ".pdf"), 200000);
part++;
}
reader.close();
}

continuing text to next page

I want to generate a PDF of questions and their options using iText. I am able to generate the PDF but the problem is sometimes questions get printed at the end of a page and options go to the next page.
How can I determine that a question and its option will not fit in the same page?
This means that if question and options will not fit in the same page then that they must be placed on the next page.
UPDATED
com.itextpdf.text.Document document = new com.itextpdf.text.Document(PageSize.A4,50,50,15,15);
ByteArrayOutputStream OutputStream = new ByteArrayOutputStream();
PdfWriter writer = PdfWriter.getInstance(document, OutputStream);
document.open();
Paragraph paragraph = new Paragraph("Paper Name Here",new Font(FontFamily.TIMES_ROMAN,15,Font.BOLD));
paragraph.setAlignment(Element.ALIGN_CENTER);
document.add(paragraph);
document.addTitle("Paper Name Here");
document.addAuthor("corp");
com.itextpdf.text.List list = new com.itextpdf.text.List(true);
for (long i = 1; i <= 20 ; i++)
{
List<MultipleChoiceSingleCorrect> multipleChoiceSingleCorrects = new MultipleChoiceSingleCorrectServicesImpl().getItemDetailsByItemID(i);
for (MultipleChoiceSingleCorrect multipleChoiceSingleCorrect : multipleChoiceSingleCorrects) {
list.add(multipleChoiceSingleCorrect.getItemText());
RomanList oplist = new RomanList();
oplist.setIndentationLeft(20);
for (OptionSingleCorrect optionSingleCorrect : multipleChoiceSingleCorrect.getOptionList()) {
oplist.add(optionSingleCorrect.getOptionText());
}
list.add(oplist);
}
}
document.add(list);
document.close();
after this I m getting abnormal page brakes means some times question is at end of page and option jumps to next page.(AS shown in image below)

What you are interested in are the setKeepTogether(boolean) methods :
for Paragraph
or for PdfPTable
This will keep the object in one page, forcing the creation of a new page if the content doesn't fit in the remaining page.

with the help of Alexis Pigeon I done with this code. So special thanks to him.
I have added question text to Paragraph after that all options kept in an list.
Option list opList added in paragraph, this paragraph add to an ListItem and this ListItem
added to an master list.
This way question splitting on two pages is resolved but I m not getting question numbers.. I already set master list as numbered=true com.itextpdf.text.List list = new com.itextpdf.text.List(true);
Code:-
try {
String Filename="PaperName.pdf";
com.itextpdf.text.Document document = new com.itextpdf.text.Document(PageSize.A4,50,50,15,15);
ByteArrayOutputStream OutputStream = new ByteArrayOutputStream();
PdfWriter writer = PdfWriter.getInstance(document, OutputStream);
document.open();
Paragraph paragraph = new Paragraph("Paper Name Here",new Font(FontFamily.TIMES_ROMAN,15,Font.BOLD));
paragraph.setAlignment(Element.ALIGN_CENTER);
paragraph.setSpacingAfter(20);
document.add(paragraph);
document.addTitle("Paper Name Here");
document.addAuthor("crop");
document.addCreator("crop");
com.itextpdf.text.List list = new com.itextpdf.text.List(true);
for (long i = 1; i <= 20 ; i++)
{
List<MultipleChoiceSingleCorrect> multipleChoiceSingleCorrects = new MultipleChoiceSingleCorrectServicesImpl().getItemDetailsByItemID(i);
for (MultipleChoiceSingleCorrect multipleChoiceSingleCorrect : multipleChoiceSingleCorrects) {
Paragraph paragraph2 =new Paragraph();
paragraph2.setKeepTogether(true);
paragraph2.add(multipleChoiceSingleCorrect.getItemText());
paragraph2.add(Chunk.NEWLINE);
RomanList oplist = new RomanList();
oplist.setIndentationLeft(20);
for (OptionSingleCorrect optionSingleCorrect : multipleChoiceSingleCorrect.getOptionList()) {
oplist.add(optionSingleCorrect.getOptionText());
}
paragraph2.add(oplist);
paragraph2.setSpacingBefore(20);
ListItem listItem =new ListItem();
listItem.setKeepTogether(true);
listItem.add(paragraph2);
list.add(listItem);
}
}
document.add(list);
document.close();
response.setContentLength(OutputStream.size());
response.setContentType("application/pdf");
response.setHeader("Content-disposition", "attachment; filename=" + Filename);
ServletOutputStream out = response.getOutputStream();
OutputStream.writeTo(out);
out.flush();
}
catch (Exception e)
{
e.printStackTrace();
}

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

merge multiple pdfs in order - java

Related

Delete paragraph from PDF - Java

Merging of different file format into single pdf using itext is giving corrupt pdf

iText PdfAConformanceException: PDF array is out of bounds

extracting one page from pdf file using iText

continuing text to next page

Categories

Resources