How to append a PDF file to an existing one with iText?

How to append a PDF file to an existing one with iText? - java

In an application I am trying to append multiple PDF files to a single already existing file. Using iText I found this
Using iText I found this tutorial, which, in my case doesn't seem to work.
Here are some ways I've tried to make it work.
String path = "path/to/destination.pdf";
PdfCopy mergedFile = new PdfCopy(pdf, new FileOutputStream(path));
PdfReader reader;
for(String toMergePath : toMergePaths){
reader = new PdfReader(toMergePath);
mergedFile.addDocument(reader);
mergedFile.freeReader(reader);
reader.close();
}
mergedFile.close();
When I try to add the document logcat tells me that the document is not open.
But, pdf (the original document) is already open by other methods, and closed only after this one. And, mergedFile is exactly like in the tutorial, which, I believe, must be right.
Did anyone experience the same problem? Otherwise, do anyone know a better method to do what I want to do?
I've seen other solutions copying the bite from one page and append them to a new file but I'm affraid this will "compile" the annotations which I need.
Thank you for your help,
Cordially,
Matthieu Meunier

I hope this code will help you.
public static void mergePdfs(){
try {
String[] files = { "D:\\1.pdf" ,"D:\\2.pdf" ,"D:\\3.pdf" ,"D:\\4.pdf"};
Document pDFCombineUsingJava = new Document();
PdfCopy copy = new PdfCopy(pDFCombineUsingJava , new FileOutputStream("D:\\CombinedFile.pdf"));
pDFCombineUsingJava.open();
PdfReader ReadInputPDF;
int number_of_pages;
for (int i = 0; i < files.length; i++) {
ReadInputPDF = new PdfReader(files[i]);
copy.addDocument(ReadInputPDF);
copy.freeReader(ReadInputPDF);
}
pDFCombineUsingJava.close();
}
catch (Exception i)
{
System.out.println(i);
}
}

Related

ArrayList<String> in PDF from a new row

I want to send some survey in PDF from java, I tryed different methods. I use with StringBuffer and without, but always see text in PDF in one row.
public void writePdf(OutputStream outputStream) throws Exception {
Paragraph paragraph = new Paragraph();
Document document = new Document();
PdfWriter.getInstance(document, outputStream);
document.open();
document.addTitle("Survey PDF");
ArrayList nameArrays = new ArrayList();
StringBuffer sb = new StringBuffer();
int i = -1;
for (String properties : textService.getAnswer()) {
nameArrays.add(properties);
i++;
}
for (int a= 0; a<=i; a++){
System.out.println("nameArrays.get(a) -"+nameArrays.get(a));
sb.append(nameArrays.get(a));
}
paragraph.add(sb.toString());
document.add(paragraph);
document.close();
}
textService.getAnswer() this - ArrayList<String>
Could you please advise how to separate the text in order each new sentence will be starting from new row?
Now I see like this:

You forgot the newline character \n and your code seems a bit overcomplicated.
Try this:
StringBuffer sb = new StringBuffer();
for (String property : textService.getAnswer()) {
sb.append(property);
sb.append('\n');
}

What about:
nameArrays.add(properties+"\n");

You might be able to fix that by simply appending "\n" to the strings that you collecting in your list; but I think: that very much depends on the PDF library you are using.
You see, "newlines" or "paragraphs" are to a certain degree about formatting. It seems like a conceptual problem to add that "formatting" information to the data that you are processing.
Meaning: you might want to check if your library allows you to provide strings - and then have the library do the formatting for you!
In other words: instead of giving strings with newlines; you should check if you can keep using strings without newlines, but if there is way to have the PDF library add line breaks were appropriate.
Side note on code quality: you are using raw types:
ArrayList nameArrays = new ArrayList();
should better be
ArrayList<String> names = new ArrayList<>();
[ I also changed the name - there is no point in putting the type of a collection into the variable name! ]

This method is for save values in array list into a pdf document. In the mfilePath variable "/" in here you can give folder name. As a example "/example/".
and also for mFileName variable you can use name. I give the date and time that document will created. don't give static name other vice your values are overriding in same pdf.
private void savePDF()
{
com.itextpdf.text.Document mDoc = new com.itextpdf.text.Document();
String mFileName = new SimpleDateFormat("YYYY-MM-DD-HH-MM-SS", Locale.getDefault()).format(System.currentTimeMillis());
String mFilePath = Environment.getExternalStorageDirectory() + "/" + mFileName + ".pdf";
try
{
PdfWriter.getInstance(mDoc, new FileOutputStream(mFilePath));
mDoc.open();
for(int d = 0; d < g; d++)
{
String mtext = answers.get(d);
mDoc.add(new Paragraph(mtext));
}
mDoc.close();
}
catch (Exception e)
{
}
}

Change order pages of PDF document in iTextSharp

I'm trying to change reorder pages of my PDF document, but i can't and I don't know why.
I read several articals about changing order, it's java(iText) and i have got few problems with it.(exampl1, exampl2, example3). This example on c#, but there is using other method(exampl4)
I want take my TOC on 12 page and put to 2 page. After 12 page I have other content. This is my template for change order of pages:
String.Format("1,%s, 2-%s, %s-%s", toc, toc-1, toc+1, n)
This is my method for changing order of pages:
public void ChangePageOrder(string path)
{
MemoryStream baos = new MemoryStream();
PdfReader sourcePDFReader = new PdfReader(path);
int toc = 12;
int n = sourcePDFReader.NumberOfPages;
sourcePDFReader.SelectPages(String.Format("1,%s, 2-%s, %s-%s", toc, toc-1, toc+1, n));
using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
{
PdfStamper stamper = new PdfStamper(sourcePDFReader, fs);
stamper.Close();
}
}
Here is call to method:
...
doc.Close();
ChangePageOrder(filePath);
What am I doing not right?
Thank you.

Your code can't work because you are using path to create the PdfReader as well as to create the FileStream. You probably get an error such as "The file is in use" or "The file can't be accessed".
This is explained here:
StackOverflow: How to update a PDF without creating a new PDF?
Official web site:
How to update a PDF without creating a new PDF?
You create a MemoryStream() named baos, but you aren't using that object anywhere. One way to solve your problem, is to replace the FileStream when you first create your PDF by that MemoryStream, and then use the bytes stored in that memory stream to create a PdfReader instance. In that case, PdfStamper won't be writing to a file that is in use.
Another option would be to use a different path. For instance: first you write the document to a file named my_story_unordered.pdf (created by PdfWriter), then you write the document to a file named my_story_reordered.pdf (created by PdfStamper).
It's also possible to create the final document in one go. In that case, you need to switch to linear mode. There's an example in my book "iText in Action - Second Edition" that shows how to do this: MovieHistory1
In the C# port of this example, you have:
writer.SetLinearPageMode();
In normal circumstances, iText will create a page tree with branches and leaves. As soon a a branch has more than 10 leaves, a new branch is created. With setLinearPageMode(), you tell iText not to do this. The complete page tree will consist of one branch with nothing but leaves (no extra branches). This is bad from the point of view of performance when viewing the document, but it's acceptable if the number of pages in your document is limited.
Once you've switched to page mode, you can reorder the pages like this:
document.NewPage();
// get the total number of pages that needs to be reordered
int total = writer.ReorderPages(null);
// change the order
int[] order = new int[total];
for (int i = 0; i < total; i++) {
order[i] = i + toc;
if (order[i] > total) {
order[i] -= total;
}
}
// apply the new order
writer.ReorderPages(order);
Summarized: if your document doesn't have many pages, use the ReorderPages method. If your document has many pages, use the method you've been experimenting with, but do it correctly. Don't try to write to the file that you are still trying to read.

Without going into details about what you should do you can loop through all pages from a pdf, put them into a new pdf doc with all the pages. You can put your logic inside the for loop.
reader = new PdfReader(sourcePDFpath);
sourceDocument = new Document(reader.GetPageSizeWithRotation(startpage));
pdfCopyProvider = new PdfCopy(sourceDocument, new System.IO.FileStream(outputPDFpath, System.IO.FileMode.Create));
sourceDocument.Open();
for (int i = startpage; i <= endpage; i++)
{
importedPage = pdfCopyProvider.GetImportedPage(reader, i);
pdfCopyProvider.AddPage(importedPage);
}
sourceDocument.Close();
reader.Close();

Converting a docx containing a chart to PDF

I've got a docx4j generated file which contains several tables, titles and, finally, an excel-generated curve chart.
I have tried many approaches in order to convert this file to PDF, but did not get to any successful result.
Docx4j with xsl-fo did not work, most of the things included in the docx file are not yet implemented and show up in red text as "not implemented".
JODConverter did not work either, I got a resulting PDF in which everything was pretty good (just little formatting/styling issues) BUT the graph did not show up.
Finally, the closest approach was using Apache POI: The resulting PDF was identical to my docx file, but still no chart showing up.
I already know Aspose would solve this pretty easily, but I am looking for an open source, free solution.
The code I am using with Apache POI is as follows:
public static void convert(String inputPath, String outputPath)
throws XWPFConverterException, IOException {
PdfConverter converter = new PdfConverter();
converter.convert(new XWPFDocument(new FileInputStream(new File(
inputPath))), new FileOutputStream(new File(outputPath)),
PdfOptions.create());
}
I do not know what to do to get the chart inside the PDF, could anybody tell me how to proceed?
Thanks in advance.

I don't know if this helps you but you could use "jacob" (I don't know if its possible with apache poi or docx4j)
With this solution you open "Word" yourself and export it as pdf.
!Word needs to be installed on the computer!
Heres the download-page: http://sourceforge.net/projects/jacob-project/
try {
if (System.getProperty("os.arch").contains("64")) {
System.load(DLL_64BIT_PATH);
} else {
System.load(DLL_32BIT_PATH);
}
} catch (UnsatisfiedLinkError e) {
//TODO
} catch (IOException e) {
//TODO
}
ActiveXComponent oleComponent = new ActiveXComponent("Word.Application");
oleComponent.setProperty("Visible", false);
Variant var = Dispatch.get(oleComponent, "Documents");
Dispatch document = var.getDispatch();
Dispatch activeDoc = Dispatch.call(document, "Open", fileName).toDispatch();
// https://msdn.microsoft.com/EN-US/library/office/ff845579.aspx
Dispatch.call(activeDoc, "ExportAsFixedFormat", new Object[] { "path to pdfFile.pdf", new Integer(17), false, 0 });
Object args[] = { new Integer(0) };//private static final int DO_NOT_SAVE_CHANGES = 0;
Dispatch.call(activeDoc, "Close", args);
Dispatch.call(oleComponent, "Quit");

How to edit MS Word documents using Java?

I do have few Word templates, and my requirement is to replace some of the words/place holders in the document based on the user input, using Java. I tried lot of libraries including 2-3 versions of docx4j but nothing work well, they all just didn't do anything!
I know this question has been asked before, but I tried all options I know. So, using what java library I can "really" replace/edit these templates? My preference goes to the "easy to use / Few line of codes" type libraries.
I am using Java 8 and my MS Word templates are in MS Word 2007.
Update
This code is written by using the code sample provided by SO member Joop Eggen
public Main() throws URISyntaxException, IOException, ParserConfigurationException, SAXException
{
URI docxUri = new URI("C:/Users/Yohan/Desktop/yohan.docx");
Map<String, String> zipProperties = new HashMap<>();
zipProperties.put("encoding", "UTF-8");
FileSystem zipFS = FileSystems.newFileSystem(docxUri, zipProperties);
Path documentXmlPath = zipFS.getPath("/word/document.xml");
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(Files.newInputStream(documentXmlPath));
byte[] content = Files.readAllBytes(documentXmlPath);
String xml = new String(content, StandardCharsets.UTF_8);
//xml = xml.replace("#DATE#", "2014-09-24");
xml = xml.replace("#NAME#", StringEscapeUtils.escapeXml("Sniper"));
content = xml.getBytes(StandardCharsets.UTF_8);
Files.write(documentXmlPath, content);
}
However this returns the below error
java.nio.file.ProviderNotFoundException: Provider "C" Not found
at: java.nio.file.FileSystems.newFileSystem(FileSystems.java:341) at java.nio.file.FileSystems.newFileSystem(FileSystems.java:341)
at java.nio.fileFileSystems.newFileSystem(FileSystems.java:276)

One may use for docx (a zip with XML and other files) a java zip file system and XML or text processing.
URI docxUri = ,,, // "jar:file:/C:/... .docx"
Map<String, String> zipProperties = new HashMap<>();
zipProperties.put("encoding", "UTF-8");
try (FileSystem zipFS = FileSystems.newFileSystem(docxUri, zipProperties)) {
Path documentXmlPath = zipFS.getPath("/word/document.xml");
When using XML:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(Files.newInputStream(documentXmlPath));
//Element root = doc.getDocumentElement();
You can then use XPath to find the places, and write the XML back again.
It even might be that you do not need XML but could replace place holders:
byte[] content = Files.readAllBytes(documentXmlPath);
String xml = new String(content, StandardCharsets.UTF_8);
xml = xml.replace("#DATE#", "2014-09-24");
xml = xml.replace("#NAME#", StringEscapeUtils.escapeXml("Sniper")));
...
content = xml.getBytes(StandardCharsets.UTF_8);
Files.delete(documentXmlPath);
Files.write(documentXmlPath, content);
For a fast development, rename a copy of the .docx to a name with the .zip file extension, and inspect the files.
File.write should already apply StandardOpenOption.TRUNCATE_EXISTING, but I have added Files.delete as some error occured. See comments.

Try Apache POI. POI can work with doc and docx, but docx is more documented therefore support of it better.
UPD: You can use XDocReport, which use POI. Also I recomend to use xlsx for templates because it more suitable and more documented

I have spent a few days on this issue, until I found that what makes the difference is the try-with-resources on the FileSystem instance, appearing in Joop Eggen's snippet but not in question snippet:
try (FileSystem zipFS = FileSystems.newFileSystem(docxUri, zipProperties))
Without such try-with-resources block, the FileSystem resource will not be closed (as explained in Java tutorial), and the word document not modified.

Stepping back a bit, there are about 4 different approaches for editing words/placeholders:
MERGEFIELD or DOCPROPERTY fields (if you are having problems with this in docx4j, then you have probably not set up your input docx correctly)
content control databinding
variable replacement on the document surface (either at the DOM/SAX level, or using a library)
do stuff as XHTML, then import that
Before choosing one, you should decide whether you also need to be able to handle:
repeating data (eg adding table rows)
conditional content (eg entire paragraphs which will either be present or absent)
adding images
If you need these, then MERGEFIELD or DOCPROPERTY fields are probably out (though you can also use IF fields, if you can find a library which supports them). And adding images makes DOM/SAX manipulation as advocated in one of the other answers, messier and error prone.
The other things to consider are:
your authors: how technical are they? What does that imply for the authoring UI?
the "user input" you mention for variable replacement, is this given, or is obtaining it part of the problem you are solving?

Please try this to edit or replace the word in document
public class UpdateDocument {
public static void main(String[] args) throws IOException {
UpdateDocument obj = new UpdateDocument();
obj.updateDocument(
"c:\\test\\template.docx",
"c:\\test\\output.docx",
"Piyush");
}
private void updateDocument(String input, String output, String name)
throws IOException {
try (XWPFDocument doc = new XWPFDocument(
Files.newInputStream(Paths.get(input)))
) {
List<XWPFParagraph> xwpfParagraphList = doc.getParagraphs();
//Iterate over paragraph list and check for the replaceable text in each paragraph
for (XWPFParagraph xwpfParagraph : xwpfParagraphList) {
for (XWPFRun xwpfRun : xwpfParagraph.getRuns()) {
String docText = xwpfRun.getText(0);
//replacement and setting position
docText = docText.replace("${name}", name);
xwpfRun.setText(docText, 0);
}
}
// save the docs
try (FileOutputStream out = new FileOutputStream(output)) {
doc.write(out);
}
}
}
}

How to create image from PDF using PDFBox in JAVA

I want to create an image from first page of PDF . I am using PDFBox . After researching in web , I have found the following snippet of code :
public class ExtractImages
{
public static void main(String[] args)
{
ExtractImages obj = new ExtractImages();
try
{
obj.read_pdf();
}
catch (IOException ex)
{
System.out.println("" + ex);
}
}
void read_pdf() throws IOException
{
PDDocument document = null;
try
{
document = PDDocument.load("H:\\ct1_answer.pdf");
}
catch (IOException ex)
{
System.out.println("" + ex);
}
List<PDPage>pages = document.getDocumentCatalog().getAllPages();
Iterator iter = pages.iterator();
int i =1;
String name = null;
while (iter.hasNext())
{
PDPage page = (PDPage) iter.next();
PDResources resources = page.getResources();
Map pageImages = resources.getImages();
if (pageImages != null)
{
Iterator imageIter = pageImages.keySet().iterator();
while (imageIter.hasNext()) {
String key = (String) imageIter.next();
PDXObjectImage image = (PDXObjectImage) pageImages.get(key);
image.write2file("H:\\image" + i);
i ++;
}
}
}
}
}
In the above code there is no error . But the output of this code is nothing . I have expected that the above code will produce a series of image which will be saved in H drive . But there is no image in that code produced from this code . Why ?

Without trying to be rude, here is what the code you posted does inside its main working loop:
PDPage page = (PDPage) iter.next();
PDResources resources = page.getResources();
Map pageImages = resources.getImages();
It's getting each page from the PDF file, getting the resources from the page, and extracting the embedded images. It then writes those to disk.
If you are to be a competent software developer you need to be able to research and read documentation. With Java, that means Javadocs. Googling PDPage (or explicitly going to the apache site) turns up the Javadoc for PDPage.
On that page you find two versions of the method convertToImage() for converting the PDPage to an image. Problem solved.
Except ...
Unfortunately, they return a java.awt.image.BufferedImage which based on other questions you have asked is a problem because it is not supported on the Android platform which is what you're working on.
In short, you can't use Apache's PDFBox on Android to do what you're trying to do.
Searching on StackOverflow you find this same question posed several times in different forms, which will lead you to this: https://stackoverflow.com/questions/4665957/pdf-parsing-library-for-android/4766335#4766335 with the following answer that would be of interest to you: https://stackoverflow.com/a/4779852/302916
Unfortunately even the one that the aforementioned answer says will work ... is not very user friendly; there's no "How to" or docs that I can find. It's also labeled as "alpha". This is probably not something for the feint hearted as it's going to require reading and understanding their code to even start using it.

I copied your above code and added following libs to my buildpath in eclipse. It is working.
Apache PDFBox 1.7.1 libs
Commons Logging 1.1.1 libs

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.