How to merge two PDF files into one in Java?

How to merge two PDF files into one in Java? - java

I want to merge many PDF files into one using PDFBox and this is what I've done:
PDDocument document = new PDDocument();
for (String pdfFile: pdfFiles) {
PDDocument part = PDDocument.load(pdfFile);
List<PDPage> list = part.getDocumentCatalog().getAllPages();
for (PDPage page: list) {
document.addPage(page);
}
part.close();
}
document.save("merged.pdf");
document.close();
Where pdfFiles is an ArrayList<String> containing all the PDF files.
When I'm running the above, I'm always getting:
org.apache.pdfbox.exceptions.COSVisitorException: Bad file descriptor
Am I doing something wrong? Is there any other way of doing it?

Why not use the PDFMergerUtility of pdfbox?
PDFMergerUtility ut = new PDFMergerUtility();
ut.addSource(...);
ut.addSource(...);
ut.addSource(...);
ut.setDestinationFileName(...);
ut.mergeDocuments();

A quick Google search returned this bug: "Bad file descriptor while saving a document w. imported PDFs".
It looks like you need to keep the PDFs to be merged open, until after you have saved and closed the combined PDF.

This is a ready to use code, merging four pdf files with itext.jar from http://central.maven.org/maven2/com/itextpdf/itextpdf/5.5.0/itextpdf-5.5.0.jar, more on http://tutorialspointexamples.com/
import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;
/**
* This class is used to merge two or more
* existing pdf file using iText jar.
*/
public class PDFMerger {
static void mergePdfFiles(List<InputStream> inputPdfList,
OutputStream outputStream) throws Exception{
//Create document and pdfReader objects.
Document document = new Document();
List<PdfReader> readers =
new ArrayList<PdfReader>();
int totalPages = 0;
//Create pdf Iterator object using inputPdfList.
Iterator<InputStream> pdfIterator =
inputPdfList.iterator();
// Create reader list for the input pdf files.
while (pdfIterator.hasNext()) {
InputStream pdf = pdfIterator.next();
PdfReader pdfReader = new PdfReader(pdf);
readers.add(pdfReader);
totalPages = totalPages + pdfReader.getNumberOfPages();
}
// Create writer for the outputStream
PdfWriter writer = PdfWriter.getInstance(document, outputStream);
//Open document.
document.open();
//Contain the pdf data.
PdfContentByte pageContentByte = writer.getDirectContent();
PdfImportedPage pdfImportedPage;
int currentPdfReaderPage = 1;
Iterator<PdfReader> iteratorPDFReader = readers.iterator();
// Iterate and process the reader list.
while (iteratorPDFReader.hasNext()) {
PdfReader pdfReader = iteratorPDFReader.next();
//Create page and add content.
while (currentPdfReaderPage <= pdfReader.getNumberOfPages()) {
document.newPage();
pdfImportedPage = writer.getImportedPage(
pdfReader,currentPdfReaderPage);
pageContentByte.addTemplate(pdfImportedPage, 0, 0);
currentPdfReaderPage++;
}
currentPdfReaderPage = 1;
}
//Close document and outputStream.
outputStream.flush();
document.close();
outputStream.close();
System.out.println("Pdf files merged successfully.");
}
public static void main(String args[]){
try {
//Prepare input pdf file list as list of input stream.
List<InputStream> inputPdfList = new ArrayList<InputStream>();
inputPdfList.add(new FileInputStream("..\\pdf\\pdf_1.pdf"));
inputPdfList.add(new FileInputStream("..\\pdf\\pdf_2.pdf"));
inputPdfList.add(new FileInputStream("..\\pdf\\pdf_3.pdf"));
inputPdfList.add(new FileInputStream("..\\pdf\\pdf_4.pdf"));
//Prepare output stream for merged pdf file.
OutputStream outputStream =
new FileOutputStream("..\\pdf\\MergeFile_1234.pdf");
//call method to merge pdf files.
mergePdfFiles(inputPdfList, outputStream);
} catch (Exception e) {
e.printStackTrace();
}
}
}

Multiple pdf merged method using org.apache.pdfbox:
public void mergePDFFiles(List<File> files,
String mergedFileName) {
try {
PDFMergerUtility pdfmerger = new PDFMergerUtility();
for (File file : files) {
PDDocument document = PDDocument.load(file);
pdfmerger.setDestinationFileName(mergedFileName);
pdfmerger.addSource(file);
pdfmerger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly());
document.close();
}
} catch (IOException e) {
logger.error("Error to merge files. Error: " + e.getMessage());
}
}
From main program, call mergePDFFiles method using list of files and target file name.
String mergedFileName = "Merged.pdf";
mergePDFFiles(files, mergedFileName);
After calling mergePDFFiles, load merged file
File mergedFile = new File(mergedFileName);

package article14;
import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.util.PDFMergerUtility;
public class Pdf
{
public static void main(String args[])
{
new Pdf().createNew();
new Pdf().combine();
}
public void combine()
{
try
{
PDFMergerUtility mergePdf = new PDFMergerUtility();
String folder ="pdf";
File _folder = new File(folder);
File[] filesInFolder;
filesInFolder = _folder.listFiles();
for (File string : filesInFolder)
{
mergePdf.addSource(string);
}
mergePdf.setDestinationFileName("Combined.pdf");
mergePdf.mergeDocuments();
}
catch(Exception e)
{
}
}
public void createNew()
{
PDDocument document = null;
try
{
String filename="test.pdf";
document=new PDDocument();
PDPage blankPage = new PDPage();
document.addPage( blankPage );
document.save( filename );
}
catch(Exception e)
{
}
}
}

If you want to combine two files where one overlays the other (example: document A is a template and document B has the text you want to put on the template), this works:
after creating "doc", you want to write your template (templateFile) on top of that -
PDDocument watermarkDoc = PDDocument.load(getServletContext()
.getRealPath(templateFile));
Overlay overlay = new Overlay();
overlay.overlay(watermarkDoc, doc);

Using iText (existing PDF in bytes)
public static byte[] mergePDF(List<byte[]> pdfFilesAsByteArray) throws DocumentException, IOException {
ByteArrayOutputStream outStream = new ByteArrayOutputStream();
Document document = null;
PdfCopy writer = null;
for (byte[] pdfByteArray : pdfFilesAsByteArray) {
try {
PdfReader reader = new PdfReader(pdfByteArray);
int numberOfPages = reader.getNumberOfPages();
if (document == null) {
document = new Document(reader.getPageSizeWithRotation(1));
writer = new PdfCopy(document, outStream); // new
document.open();
}
PdfImportedPage page;
for (int i = 0; i < numberOfPages;) {
++i;
page = writer.getImportedPage(reader, i);
writer.addPage(page);
}
}
catch (Exception e) {
e.printStackTrace();
}
}
document.close();
outStream.close();
return outStream.toByteArray();
}

Related

How to merge or concat to byte [ ] arrays of PDF's type in java but am getting pdf1 as output [duplicate]

I want to merge many PDF files into one using PDFBox and this is what I've done:
PDDocument document = new PDDocument();
for (String pdfFile: pdfFiles) {
PDDocument part = PDDocument.load(pdfFile);
List<PDPage> list = part.getDocumentCatalog().getAllPages();
for (PDPage page: list) {
document.addPage(page);
}
part.close();
}
document.save("merged.pdf");
document.close();
Where pdfFiles is an ArrayList<String> containing all the PDF files.
When I'm running the above, I'm always getting:
org.apache.pdfbox.exceptions.COSVisitorException: Bad file descriptor
Am I doing something wrong? Is there any other way of doing it?

Why not use the PDFMergerUtility of pdfbox?
PDFMergerUtility ut = new PDFMergerUtility();
ut.addSource(...);
ut.addSource(...);
ut.addSource(...);
ut.setDestinationFileName(...);
ut.mergeDocuments();

A quick Google search returned this bug: "Bad file descriptor while saving a document w. imported PDFs".
It looks like you need to keep the PDFs to be merged open, until after you have saved and closed the combined PDF.

This is a ready to use code, merging four pdf files with itext.jar from http://central.maven.org/maven2/com/itextpdf/itextpdf/5.5.0/itextpdf-5.5.0.jar, more on http://tutorialspointexamples.com/
import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;
/**
* This class is used to merge two or more
* existing pdf file using iText jar.
*/
public class PDFMerger {
static void mergePdfFiles(List<InputStream> inputPdfList,
OutputStream outputStream) throws Exception{
//Create document and pdfReader objects.
Document document = new Document();
List<PdfReader> readers =
new ArrayList<PdfReader>();
int totalPages = 0;
//Create pdf Iterator object using inputPdfList.
Iterator<InputStream> pdfIterator =
inputPdfList.iterator();
// Create reader list for the input pdf files.
while (pdfIterator.hasNext()) {
InputStream pdf = pdfIterator.next();
PdfReader pdfReader = new PdfReader(pdf);
readers.add(pdfReader);
totalPages = totalPages + pdfReader.getNumberOfPages();
}
// Create writer for the outputStream
PdfWriter writer = PdfWriter.getInstance(document, outputStream);
//Open document.
document.open();
//Contain the pdf data.
PdfContentByte pageContentByte = writer.getDirectContent();
PdfImportedPage pdfImportedPage;
int currentPdfReaderPage = 1;
Iterator<PdfReader> iteratorPDFReader = readers.iterator();
// Iterate and process the reader list.
while (iteratorPDFReader.hasNext()) {
PdfReader pdfReader = iteratorPDFReader.next();
//Create page and add content.
while (currentPdfReaderPage <= pdfReader.getNumberOfPages()) {
document.newPage();
pdfImportedPage = writer.getImportedPage(
pdfReader,currentPdfReaderPage);
pageContentByte.addTemplate(pdfImportedPage, 0, 0);
currentPdfReaderPage++;
}
currentPdfReaderPage = 1;
}
//Close document and outputStream.
outputStream.flush();
document.close();
outputStream.close();
System.out.println("Pdf files merged successfully.");
}
public static void main(String args[]){
try {
//Prepare input pdf file list as list of input stream.
List<InputStream> inputPdfList = new ArrayList<InputStream>();
inputPdfList.add(new FileInputStream("..\\pdf\\pdf_1.pdf"));
inputPdfList.add(new FileInputStream("..\\pdf\\pdf_2.pdf"));
inputPdfList.add(new FileInputStream("..\\pdf\\pdf_3.pdf"));
inputPdfList.add(new FileInputStream("..\\pdf\\pdf_4.pdf"));
//Prepare output stream for merged pdf file.
OutputStream outputStream =
new FileOutputStream("..\\pdf\\MergeFile_1234.pdf");
//call method to merge pdf files.
mergePdfFiles(inputPdfList, outputStream);
} catch (Exception e) {
e.printStackTrace();
}
}
}

Multiple pdf merged method using org.apache.pdfbox:
public void mergePDFFiles(List<File> files,
String mergedFileName) {
try {
PDFMergerUtility pdfmerger = new PDFMergerUtility();
for (File file : files) {
PDDocument document = PDDocument.load(file);
pdfmerger.setDestinationFileName(mergedFileName);
pdfmerger.addSource(file);
pdfmerger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly());
document.close();
}
} catch (IOException e) {
logger.error("Error to merge files. Error: " + e.getMessage());
}
}
From main program, call mergePDFFiles method using list of files and target file name.
String mergedFileName = "Merged.pdf";
mergePDFFiles(files, mergedFileName);
After calling mergePDFFiles, load merged file
File mergedFile = new File(mergedFileName);

package article14;
import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.util.PDFMergerUtility;
public class Pdf
{
public static void main(String args[])
{
new Pdf().createNew();
new Pdf().combine();
}
public void combine()
{
try
{
PDFMergerUtility mergePdf = new PDFMergerUtility();
String folder ="pdf";
File _folder = new File(folder);
File[] filesInFolder;
filesInFolder = _folder.listFiles();
for (File string : filesInFolder)
{
mergePdf.addSource(string);
}
mergePdf.setDestinationFileName("Combined.pdf");
mergePdf.mergeDocuments();
}
catch(Exception e)
{
}
}
public void createNew()
{
PDDocument document = null;
try
{
String filename="test.pdf";
document=new PDDocument();
PDPage blankPage = new PDPage();
document.addPage( blankPage );
document.save( filename );
}
catch(Exception e)
{
}
}
}

If you want to combine two files where one overlays the other (example: document A is a template and document B has the text you want to put on the template), this works:
after creating "doc", you want to write your template (templateFile) on top of that -
PDDocument watermarkDoc = PDDocument.load(getServletContext()
.getRealPath(templateFile));
Overlay overlay = new Overlay();
overlay.overlay(watermarkDoc, doc);

Using iText (existing PDF in bytes)
public static byte[] mergePDF(List<byte[]> pdfFilesAsByteArray) throws DocumentException, IOException {
ByteArrayOutputStream outStream = new ByteArrayOutputStream();
Document document = null;
PdfCopy writer = null;
for (byte[] pdfByteArray : pdfFilesAsByteArray) {
try {
PdfReader reader = new PdfReader(pdfByteArray);
int numberOfPages = reader.getNumberOfPages();
if (document == null) {
document = new Document(reader.getPageSizeWithRotation(1));
writer = new PdfCopy(document, outStream); // new
document.open();
}
PdfImportedPage page;
for (int i = 0; i < numberOfPages;) {
++i;
page = writer.getImportedPage(reader, i);
writer.addPage(page);
}
}
catch (Exception e) {
e.printStackTrace();
}
}
document.close();
outStream.close();
return outStream.toByteArray();
}

How can I append an existing PDF to a generated PDF with flying saucer for Java [duplicate]

I want to merge many PDF files into one using PDFBox and this is what I've done:
PDDocument document = new PDDocument();
for (String pdfFile: pdfFiles) {
PDDocument part = PDDocument.load(pdfFile);
List<PDPage> list = part.getDocumentCatalog().getAllPages();
for (PDPage page: list) {
document.addPage(page);
}
part.close();
}
document.save("merged.pdf");
document.close();
Where pdfFiles is an ArrayList<String> containing all the PDF files.
When I'm running the above, I'm always getting:
org.apache.pdfbox.exceptions.COSVisitorException: Bad file descriptor
Am I doing something wrong? Is there any other way of doing it?

Why not use the PDFMergerUtility of pdfbox?
PDFMergerUtility ut = new PDFMergerUtility();
ut.addSource(...);
ut.addSource(...);
ut.addSource(...);
ut.setDestinationFileName(...);
ut.mergeDocuments();

A quick Google search returned this bug: "Bad file descriptor while saving a document w. imported PDFs".
It looks like you need to keep the PDFs to be merged open, until after you have saved and closed the combined PDF.

This is a ready to use code, merging four pdf files with itext.jar from http://central.maven.org/maven2/com/itextpdf/itextpdf/5.5.0/itextpdf-5.5.0.jar, more on http://tutorialspointexamples.com/
import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;
/**
* This class is used to merge two or more
* existing pdf file using iText jar.
*/
public class PDFMerger {
static void mergePdfFiles(List<InputStream> inputPdfList,
OutputStream outputStream) throws Exception{
//Create document and pdfReader objects.
Document document = new Document();
List<PdfReader> readers =
new ArrayList<PdfReader>();
int totalPages = 0;
//Create pdf Iterator object using inputPdfList.
Iterator<InputStream> pdfIterator =
inputPdfList.iterator();
// Create reader list for the input pdf files.
while (pdfIterator.hasNext()) {
InputStream pdf = pdfIterator.next();
PdfReader pdfReader = new PdfReader(pdf);
readers.add(pdfReader);
totalPages = totalPages + pdfReader.getNumberOfPages();
}
// Create writer for the outputStream
PdfWriter writer = PdfWriter.getInstance(document, outputStream);
//Open document.
document.open();
//Contain the pdf data.
PdfContentByte pageContentByte = writer.getDirectContent();
PdfImportedPage pdfImportedPage;
int currentPdfReaderPage = 1;
Iterator<PdfReader> iteratorPDFReader = readers.iterator();
// Iterate and process the reader list.
while (iteratorPDFReader.hasNext()) {
PdfReader pdfReader = iteratorPDFReader.next();
//Create page and add content.
while (currentPdfReaderPage <= pdfReader.getNumberOfPages()) {
document.newPage();
pdfImportedPage = writer.getImportedPage(
pdfReader,currentPdfReaderPage);
pageContentByte.addTemplate(pdfImportedPage, 0, 0);
currentPdfReaderPage++;
}
currentPdfReaderPage = 1;
}
//Close document and outputStream.
outputStream.flush();
document.close();
outputStream.close();
System.out.println("Pdf files merged successfully.");
}
public static void main(String args[]){
try {
//Prepare input pdf file list as list of input stream.
List<InputStream> inputPdfList = new ArrayList<InputStream>();
inputPdfList.add(new FileInputStream("..\\pdf\\pdf_1.pdf"));
inputPdfList.add(new FileInputStream("..\\pdf\\pdf_2.pdf"));
inputPdfList.add(new FileInputStream("..\\pdf\\pdf_3.pdf"));
inputPdfList.add(new FileInputStream("..\\pdf\\pdf_4.pdf"));
//Prepare output stream for merged pdf file.
OutputStream outputStream =
new FileOutputStream("..\\pdf\\MergeFile_1234.pdf");
//call method to merge pdf files.
mergePdfFiles(inputPdfList, outputStream);
} catch (Exception e) {
e.printStackTrace();
}
}
}

Multiple pdf merged method using org.apache.pdfbox:
public void mergePDFFiles(List<File> files,
String mergedFileName) {
try {
PDFMergerUtility pdfmerger = new PDFMergerUtility();
for (File file : files) {
PDDocument document = PDDocument.load(file);
pdfmerger.setDestinationFileName(mergedFileName);
pdfmerger.addSource(file);
pdfmerger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly());
document.close();
}
} catch (IOException e) {
logger.error("Error to merge files. Error: " + e.getMessage());
}
}
From main program, call mergePDFFiles method using list of files and target file name.
String mergedFileName = "Merged.pdf";
mergePDFFiles(files, mergedFileName);
After calling mergePDFFiles, load merged file
File mergedFile = new File(mergedFileName);

package article14;
import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.util.PDFMergerUtility;
public class Pdf
{
public static void main(String args[])
{
new Pdf().createNew();
new Pdf().combine();
}
public void combine()
{
try
{
PDFMergerUtility mergePdf = new PDFMergerUtility();
String folder ="pdf";
File _folder = new File(folder);
File[] filesInFolder;
filesInFolder = _folder.listFiles();
for (File string : filesInFolder)
{
mergePdf.addSource(string);
}
mergePdf.setDestinationFileName("Combined.pdf");
mergePdf.mergeDocuments();
}
catch(Exception e)
{
}
}
public void createNew()
{
PDDocument document = null;
try
{
String filename="test.pdf";
document=new PDDocument();
PDPage blankPage = new PDPage();
document.addPage( blankPage );
document.save( filename );
}
catch(Exception e)
{
}
}
}

If you want to combine two files where one overlays the other (example: document A is a template and document B has the text you want to put on the template), this works:
after creating "doc", you want to write your template (templateFile) on top of that -
PDDocument watermarkDoc = PDDocument.load(getServletContext()
.getRealPath(templateFile));
Overlay overlay = new Overlay();
overlay.overlay(watermarkDoc, doc);

Using iText (existing PDF in bytes)
public static byte[] mergePDF(List<byte[]> pdfFilesAsByteArray) throws DocumentException, IOException {
ByteArrayOutputStream outStream = new ByteArrayOutputStream();
Document document = null;
PdfCopy writer = null;
for (byte[] pdfByteArray : pdfFilesAsByteArray) {
try {
PdfReader reader = new PdfReader(pdfByteArray);
int numberOfPages = reader.getNumberOfPages();
if (document == null) {
document = new Document(reader.getPageSizeWithRotation(1));
writer = new PdfCopy(document, outStream); // new
document.open();
}
PdfImportedPage page;
for (int i = 0; i < numberOfPages;) {
++i;
page = writer.getImportedPage(reader, i);
writer.addPage(page);
}
}
catch (Exception e) {
e.printStackTrace();
}
}
document.close();
outStream.close();
return outStream.toByteArray();
}

convert Word document to .pdf

I am not able to read Microsoft Word page by page and write into PDF file.
I am having Apache poi jar 3.15 and itext jar 5.0.6. In Apache poi,i am having 5 jars namely poi-3.14.jar,poi-ooxml-3.14.jar,poi-ooxml-schemas-3.14.jar ,XmlBeans 2.6.0.jar.Poi Scratchpad 3.14.jar My code is shown below.
public class Script
{
public void WordToPdf()
{
POIFSFileSystem fs = null;
Document document = new Document();
try
{
System.out.println("Starting the test");
fs = new POIFSFileSystem(new FileInputStream("C:\\Users\\admin\\Desktop\\Srieedher.doc"));
HWPFDocument doc = new HWPFDocument(fs);
WordExtractor we = new WordExtractor(doc);
OutputStream file = new FileOutputStream(new File("C:\\Users\\admin\\Desktop\\test.pdf"));
PdfWriter writer = PdfWriter.getInstance(document, file);
}
catch (Exception e) {
System.out.println("Exception during test");
e.printStackTrace();
}
finally
{
document.close();
}
}
public static void main(String args[])
{
Script1 s=new Script();
s1.WordToPdf();
}
}

How to read raw text from pdf file using java [duplicate]

I need to extract text (word by word) from a pdf file.
import java.io.*;
import com.itextpdf.text.*;
import com.itextpdf.text.pdf.*;
import com.itextpdf.text.pdf.parser.*;
public class pdf {
private static String INPUTFILE = "http://ontology.buffalo.edu/ontology%28PIC%29.pdf" ;
private static String OUTPUTFILE = "c:/new3.pdf";
public static void main(String[] args) throws DocumentException,
IOException {
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document,
new FileOutputStream(OUTPUTFILE));
document.open();
PdfReader reader = new PdfReader(INPUTFILE);
int n = reader.getNumberOfPages();
PdfImportedPage page;
// Go through all pages
for (int i = 1; i <= n; i++) {
page = writer.getImportedPage(reader, i);
System.out.println(i);
Image instance = Image.getInstance(page);
document.add(instance);
}
document.close();
PdfReader readerN = new PdfReader(OUTPUTFILE);
PdfTextExtractor parse = new PdfTextExtractor();
for (int i = 1; i <= n; i++)
System.out.println(parser.getTextFromPage(reader,i));
}
When I compile the code, I have this error:
the constructor PdfTextExtractor is undefined
How do I fix this?

PDFTextExtractor only contains static methods and the constructor is private. itext
You can call it like so:
String myLine = PDFTextExtractor.getTextFromPage(reader, pageNumber)

If you want to get all the text from the PDF file and save it to a text file you can use below code.
Use pdfutil.jar library.
import java.io.IOException;
import java.io.PrintWriter;
import com.testautomationguru.utility.PDFUtil;
public class PDFToText{
public static void main(String[] args) {
try {
String pdfFilePath = "C:\\abc.pdf";
PDFUtil pdfUtil = new PDFUtil();
String content = pdfUtil.getText(pdfFilePath);
PrintWriter out = new PrintWriter("C:\\abc.txt");
out.println(content);
out.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}

// Try Apache PDF Box
import java.io.FilterInputStream;
import java.io.InputStream;
import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;
// Your PDF file
String filePath = "";
InputStream inputStream = null;
try
{
inputStream = new FileInputStream(filePath);
PDFParser parser = new PDFParser(inputStream);
// This will parse the stream and populate the COSDocument object.
parser.parse();
// Get the document that was parsed.
COSDocument cosDoc = parser.getDocument();
// This class will take a pdf document and strip out all of the text and
// ignore the formatting and such.
PDFTextStripper pdfStripper = new PDFTextStripper();
// This is the in-memory representation of the PDF document
PDDocument pdDoc = new PDDocument(cosDoc);
pdfStripper.setStartPage(1);
pdfStripper.setEndPage(pdDoc.getNumberOfPages());
// This will return the text of a document.
def statementPDF = pdfStripper.getText(pdDoc);
}
catch(Exception e)
{
String errorMessage += "\nUnexpected Exception: " + e.getClass() + "\n" + e.getMessage();
for (trace in e.getStackTrace())
{
errorMessage += "\n\t" + trace;
}
}
finally
{
if (inputStream != null)
{
inputStream.close();
}
}

Create pdf and merge with pdfbox

This is what I want to do:
Make 2 different pdf files using pdfbox
Merge these two files together using pdfmerger
I know how to do this if I'm saving #1 to the server side local hard drive and load the files for the #2. But what I want to do is using "directly from the memory". I've searched all the methods from this pdfboxes but still couldn't find it.
This is my code getting from the local file
Thank you.
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDTrueTypeFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.util.PDFMergerUtility;
/**
* This is an example that creates a simple document
* with a ttf-font.
*
* #author Michael Niedermair
* #version $Revision: 1.2 $
*/
public class Test2
{
/**
* create the second sample document from the PDF file format specification.
*
* #param file The file to write the PDF to.
* #param message The message to write in the file.
* #param fontfile The ttf-font file.
*
* #throws IOException If there is an error writing the data.
* #throws COSVisitorException If there is an error writing the PDF.
*/
public void doIt(final String file, final String message) throws IOException, COSVisitorException
{
// the document
PDDocument doc = null;
try
{
doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
PDFont font = PDType1Font.HELVETICA_BOLD;
PDPageContentStream contentStream = new PDPageContentStream(doc, page);
contentStream.beginText();
contentStream.setFont(font, 12);
contentStream.moveTextPositionByAmount(100, 700);
contentStream.drawString(message);
contentStream.endText();
contentStream.close();
doc.save(file);
System.out.println(file + " created!");
}
finally
{
if (doc != null)
{
doc.close();
}
}
}
/**
* This will create a hello world PDF document
* with a ttf-font.
* <br />
* see usage() for commandline
*
* #param args Command line arguments.
*/
public static void main(String[] args)
{
Test2 app = new Test2();
Test2 app2 = new Test2();
try {
app.doIt("C:/here.pdf", "hello");
app2.doIt("C:/here2.pdf", "helloagain");
PDFMergerUtility merger = new PDFMergerUtility();
merger.addSource("C:/here.pdf");
merger.addSource("C:/here2.pdf");
OutputStream bout2 = new BufferedOutputStream(new FileOutputStream("C:/hereisthefinal.pdf"));
merger.setDestinationStream(bout2);
merger.mergeDocuments();
} catch (COSVisitorException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}

You just need to use the PdfMergeUtility.addSource(InputStream) method to add source from an inputstream and not from a physical file.
With a quick glance at the API, what you could do is use the PDDocument.save(OutputStream) method to write the file into a byte array in memory, something like this should work.
static byte[] doIt(String message) {
PDDocument doc = new PDDocument();
// add the message
ByteArrayOutputStream baos = new ByteArrayOutputStream();
doc.save(baos);
return baos.toByteArray();
}
void main(String args[]) {
byte[] pdf1 = doIt("hello");
byte[] pdf2 = doIt("world");
PDFMergerUtility merger = new PDFMergerUtility();
merger.addSource(new ByteArrayInputStream(pdf1));
merger.addSource(new ByteArrayInputStream(pdf2));
// do the rest with the merger
}

I use this to merge some documents (InputStreams) and write the merged document in a HttpServletResponse.
PDFMergerUtility mergedDoc = new PDFMergerUtility();
ByteArrayOutputStream colDocOutputstream = new ByteArrayOutputStream();
for (int i = 0; i < documentCount; i++)
{
ByteArrayOutputStream tempZipOutstream = new ByteArrayOutputStream();
...
mergedDoc.addSource(new ByteArrayInputStream(tempZipOutstream.toByteArray()));
}
mergedDoc.setDestinationStream(colDocOutputstream);
mergedDoc.mergeDocuments();
response.setContentLength(colDocOutputstream.size());
response.setContentType("application/pdf");
response.setHeader("Content-Disposition", "attachment; filename=mergedDocument.pdf");
response.setHeader("Pragma", "public");
response.setHeader("Cache-Control", "max-age=0");
response.addDateHeader("Expires", 0);
response.getOutputStream().write(colDocOutputstream.toByteArray());

You can use this way also:-
1) Create List of InputStream
2) Instantiate PDFMergerUtility class
3) Set Destination Output Stream
4) Add all InputStreams to PDFMerger as Source files which needs to be merged.
5) Merge the documents by calling "PDFmerger.mergeDocuments();"
List<InputStream> locations=new ArrayList<InputStream>();
locations.add(new FileInputStream("E:/Filenet Project Support/MergePDFs_Sample_Code/Attorney_new_form.pdf"));
locations.add(new FileInputStream("E:/Filenet Project Support/MergePDFs_Sample_Code/JH.pdf"));
locations.add(new FileInputStream("E:/Filenet Project Support/MergePDFs_Sample_Code/Interpreter_new_form.pdf"));
//Instantiating PDFMergerUtility class
PDFMergerUtility PDFmerger = new PDFMergerUtility();
//Setting Destination Output Stream
OutputStream out = new FileOutputStream("E:/Filenet Project Support/MergePDFs_Sample_Code/merged.pdf");
//Adding all InputStreams to PDFMerger as Source files which needs to be merged.
PDFmerger.addSources(locations);
//Setting Destination Output Stream
PDFmerger.setDestinationStream(out);
//Merging the two documents
PDFmerger.mergeDocuments();
System.out.println("Documents merged");

Using REST and PDFBOX
#RequestMapping(value = "/getMergePdf", method = RequestMethod.GET)
public ResponseEntity<byte[]> getMergePdf(#RequestParam(value = "filePath", required = true) String filePath,
#RequestParam(value = "newFileName", required = true) String newFileName) throws IOException {
// Step 1: Loading an Existing PDF Document
File file = new File(filePath);
File[] listFile = file.listFiles();
// Step 2: Instantiating the PDFMergerUtility class
PDFMergerUtility mergePdf = new PDFMergerUtility();
// Step 3: Setting the source files
for (File pdfName : listFile) {
mergePdf.addSource(pdfName);
}
// Step 4: Setting the destination file
ByteArrayOutputStream pdfDocOutputstream = new ByteArrayOutputStream();
mergePdf.setDestinationFileName(newFileName + ".pdf");
mergePdf.setDestinationStream(pdfDocOutputstream);
mergePdf.mergeDocuments(MemoryUsageSetting.setupTempFileOnly());
// Step 5: write in Response
HttpHeaders headers = new HttpHeaders();
headers.setContentType(MediaType.APPLICATION_PDF);
// Here you have to set the actual filename of your pdf
headers.setContentDispositionFormData(mergePdf.getDestinationFileName(), mergePdf.getDestinationFileName());
headers.setCacheControl("must-revalidate, post-check=0, pre-check=0");
ResponseEntity<byte[]> response = new ResponseEntity<>(pdfDocOutputstream.toByteArray(), headers, HttpStatus.OK);
return response;
}

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

How to merge two PDF files into one in Java? - java

Why not use the PDFMergerUtility of pdfbox? PDFMergerUtility ut = new PDFMergerUtility(); ut.addSource(...); ut.addSource(...); ut.addSource(...); ut.setDestinationFileName(...); ut.mergeDocuments();

A quick Google search returned this bug: "Bad file descriptor while saving a document w. imported PDFs". It looks like you need to keep the PDFs to be merged open, until after you have saved and closed the combined PDF.

Related

How to merge or concat to byte [ ] arrays of PDF's type in java but am getting pdf1 as output [duplicate]

How can I append an existing PDF to a generated PDF with flying saucer for Java [duplicate]

convert Word document to .pdf

How to read raw text from pdf file using java [duplicate]

Create pdf and merge with pdfbox

Categories

Resources