Merge documents to create TOC in iText (Java)

Merge documents to create TOC in iText (Java) - java

When creating documents with iText that need a table of contents, I have usually used a process where I create the main document in memory, create the TOC as a separate document in memory (using dummy links), merge them as a third document, and then use a PdfStamper to reconcile the links into the document and write it to a file.
This works with all versions of iText except the most recent (5.5.6). I will include a simple program that does this process (the real programs are much more complex). When running this with iText 5.5.5 or earlier, it creates the desired document (2 pages with the first page containing text that provides a link to open the second page). With 5.5.6 the call to makeRemoteNamedDestinationsLocal causes an exception com.itextpdf.text.pdf.PdfDictionary cannot be cast to com.itextpdf.text.pdf.PdfArray.
As this had always worked until the latest version, I have some suspicion that this may be a bug in the newest version. Is this a bug, or am I doing something wrong? How should I do this task if it is not a bug? Additionally, how are bug reports usually submitted for iText? From the website, it looks like they expect a question to be submitted here as a report.
import com.itextpdf.text.pdf.*;
import com.itextpdf.text.pdf.draw.*;
import java.io.*;
// WORKS CORRECTLY USING itext version 5.5.5
// FAILS WITH 5.5.6
// CAUSES AN EXCEPTION
// "com.itextpdf.text.pdf.PdfDictionary cannot be cast to com.itextpdf.text.pdf.PdfArray"
// with makeRemoteNamedDestinationsLocal()
public class testPdf {
public static void main (String[] args) throws Exception {
// Create simple document
ByteArrayOutputStream main = new ByteArrayOutputStream();
Document doc = new Document(new Rectangle(612f,792f),54f,54f,36f,36f);
PdfWriter pdfwrite = PdfWriter.getInstance(doc,main);
doc.open();
doc.add(new Paragraph("Testing Page"));
doc.close();
// Create TOC document
ByteArrayOutputStream two = new ByteArrayOutputStream();
Document doc2 = new Document(new Rectangle(612f,792f),54f,54f,36f,36f);
PdfWriter pdfwrite2 = PdfWriter.getInstance(doc2,two);
doc2.open();
Chunk chn = new Chunk("<<-- Link To Testing Page -->>");
chn.setRemoteGoto("DUMMY.PDF","page-num-1");
doc2.add(new Paragraph(chn));
doc2.close();
// Merge documents
ByteArrayOutputStream three = new ByteArrayOutputStream();
PdfReader reader1 = new PdfReader(main.toByteArray());
PdfReader reader2 = new PdfReader(two.toByteArray());
Document doc3 = new Document();
PdfCopy DocCopy = new PdfCopy(doc3,three);
doc3.open();
DocCopy.addPage(DocCopy.getImportedPage(reader2,1));
DocCopy.addPage(DocCopy.getImportedPage(reader1,1));
DocCopy.addNamedDestination("page-num-1",2,new PdfDestination(PdfDestination.FIT));
doc3.close();
// Fix references and write to file
PdfReader finalReader = new PdfReader(three.toByteArray());
// Fails on this line
finalReader.makeRemoteNamedDestinationsLocal();
PdfStamper stamper = new PdfStamper(finalReader,new FileOutputStream("Testing.pdf"));
stamper.close();
}
}

You have detected a bug that was introduced in iText 5.5.6. This has already been fixed in our repository:
Thank you for reporting this bug. You can find the fix on github: https://github.com/itext/itextpdf/commit/eac1a4318e6c31b054e0726ad44d0da5b8a720c2

Related

Form fields created using iText5 PDF Java library, duplicates the fields when uploaded to Adobe sign tool

Downloaded pdf with form field from code
Adobe sign tool duplicating the form fields
I have tried creating date fields and signature fields for PDF using iText5 Library and generated the PDF. In the downloaded pdf when opened in Adobe Acrobat Reader, I see only one field created, but when the pdf is uploaded to Adobe Sign, the form fields are duplicated.
Below is my code:
Document document = new Document(PageSize.A4,40,40,60,50);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfWriter writer = PdfWriter.getInstance(document,baos);
PdfPTable signatureTable = createPdfTable(2);
signatureTable.setWidths(new int[] {1,1});
PdfPCell signatureCell = new PdfPCell();
TextField signatureText = new TextField(writer,signatureCell,"TCSig1_es_:signer1:signature");
signatureText.setBackgroundColor(BaseColor.WHITE);
PdfFormField signField = signatureText.getTextField();
signField.setPlaceInPage(4);
FieldPositioningEvents signatureEvent = new FieldPositioningEvents(writer, signField);
signatureCell.setCellEvent(signatureEvent);
signatureTable.addCell(signatureCell);
document.add(signatureTable);

Using iText 2.1.7 to merge large PDFs

I am using an older version of iText (2.1.7) to merge PDFs. Because that is the last version under the MPL available to me. I cannot change this.
Anyways. I am trying to merge multiple PDFs. Everything seems to work ok, but when I go over about 1500 pages, then the generated PDF fails to open (behaves as if it is corrupted)
This is how I am doing it:
private byte[] mergePDFs(List<byte[]> pdfBytesList) throws DocumentException, IOException {
Document document = new Document();
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
PdfCopy copy = new PdfCopy(document, outputStream);
document.open();
for (byte[] pdfByteArray : pdfBytesList) {
ByteArrayInputStream readerStream = new ByteArrayInputStream(pdfByteArray);
PdfReader reader = new PdfReader(readerStream);
for (int i = 0; i < reader.getNumberOfPages(); ) {
copy.addPage(copy.getImportedPage(reader, ++i));
}
copy.freeReader(reader);
reader.close();
}
document.close();
return outputStream.toByteArray();
}
Is this the correct approach? Is there anything about this that would hint at breaking when going over a certain amount of pages? There are no exceptions thrown or anything.

For anyone curious, the issue had nothing to do with iText and instead was the code responsible for returning the response from iText.

How to copy/move AcroForm fields from one document to new blank one using IText5 or IText7?

I need to copy whole AcroForm including field positions and values from template PDF to a new blank PDF file. How can I do that?
In short words - I need to get rid of "background" from the template and leave only filed forms.
The whole point of this is to create a PDF with content that would be printed on pre-printed templates.
I am using IText 5 but I can switch to 7 if usefull examples would be provided

After a lot of trial and error I have found the solution to "How to copy AcfroForm fields into another PDF". It is a iText v7 version. I hope it will help somebody someday.
private byte[] copyFormElements(byte[] sourceTemplate) throws IOException {
PdfReader completeReader = new PdfReader(new ByteArrayInputStream(sourceTemplate));
PdfDocument completeDoc = new PdfDocument(completeReader);
ByteArrayOutputStream out = new ByteArrayOutputStream();
PdfWriter offsetWriter = new PdfWriter(out);
PdfDocument offsetDoc = new PdfDocument(offsetWriter);
offsetDoc.initializeOutlines();
PdfPage blank = offsetDoc.addNewPage();
PdfAcroForm originalForm = PdfAcroForm.getAcroForm(completeDoc, false);
// originalForm.getPdfObject().copyTo(offsetDoc,false);
PdfAcroForm offsetForm = PdfAcroForm.getAcroForm(offsetDoc, true);
for (String name : originalForm.getFormFields().keySet()) {
PdfFormField field = originalForm.getField(name);
PdfDictionary copied = field.getPdfObject().copyTo(offsetDoc, false);
PdfFormField copiedField = PdfFormField.makeFormField(copied, offsetDoc);
offsetForm.addField(copiedField, blank);
}
offsetDoc.close();
completeDoc.close();
return out.toByteArray();
}

Did you check the PdfCopyForms object:
Allows you to add one (or more) existing PDF document(s) to create a new PDF and add the form of another PDF document to this new PDF.
I didn't find an example, but you could try something like this:
PdfReader reader1 = new PdfReader(src1); // a document with a form
PdfReader reader2 = new PdfReader(src2); // a document without a form
PdfCopyForms copy = new PdfCopyForms(new FileOutputStream(dest));
copy.AddDocument(reader1); // add the document without the form
copy.CopyDocumentFields(reader2); // add the fields of the document with the form
copy.close();
reader1.close();
reader2.close();
I see that the class is deprecated. I'm not sure of that's because iText 7 makes it much easier to do this, or if it's because there were technical problems with the class.

Converting a pdf to word document using java

I've successfully converted JPEG to Pdf using Java, but don't know how to convert Pdf to Word using Java, the code for converting JPEG to Pdf is given below.
Can anyone tell me how to convert Pdf to Word (.doc/ .docx) using Java?
import java.io.FileOutputStream;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.Document;
public class JpegToPDF {
public static void main(String[] args) {
try {
Document convertJpgToPdf = new Document();
PdfWriter.getInstance(convertJpgToPdf, new FileOutputStream(
"c:\\java\\ConvertImagetoPDF.pdf"));
convertJpgToPdf.open();
Image convertJpg = Image.getInstance("c:\\java\\test.jpg");
convertJpgToPdf.add(convertJpg);
convertJpgToPdf.close();
System.out.println("Successfully Converted JPG to PDF in iText");
} catch (Exception i1) {
i1.printStackTrace();
}
}
}

In fact, you need two libraries. Both libraries are open source. The first one is iText, it is used to extract the text from a PDF file. The second one is POI, it is ued to create the word document.
The code is quite simple:
//Create the word document
XWPFDocument doc = new XWPFDocument();
// Open the pdf file
String pdf = "myfile.pdf";
PdfReader reader = new PdfReader(pdf);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
// Read the PDF page by page
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
TextExtractionStrategy strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
// Extract the text
String text=strategy.getResultantText();
// Create a new paragraph in the word document, adding the extracted text
XWPFParagraph p = doc.createParagraph();
XWPFRun run = p.createRun();
run.setText(text);
// Adding a page break
run.addBreak(BreakType.PAGE);
}
// Write the word document
FileOutputStream out = new FileOutputStream("myfile.docx");
doc.write(out);
// Close all open files
out.close();
reader.close();
Beware: With the used extraction strategy, you will lose all formatting. But you can fix this, by inserting your own, more complex extraction strategy.

You can use 7-pdf library
have a look at this it may help :
http://www.7-pdf.de/sites/default/files/guide/manuals/library/index.html
PS: itext has some issues when given file is non RGB image, try this out!!

Although it's far from being a pure Java solution OpenOffice/LibreOfffice allows one to connect to it through a TCP port; it's possible to use that to convert documents. If this looks like an acceptable solution, JODConverter can help you.

Remove page from PDF

I'm currently using iText and I'm wondering if there is a way to delete a page from a PDF file?
I have opened it up with a reader etc., and I want to remove a page before it is then saved back to a new file; how can I do that?

The 'better' way to 'delete' pages is doing
reader.selectPages("1-5,10-12");
Which means we only select pages 1-5, 10-12 effectively 'deleting' pages 6-9.

Get the reader of existing pdf file by
PdfReader pdfReader = new PdfReader("source pdf file path");
Now update the reader by
pdfReader.selectPages("1-5,15-20");
then get the pdf stamper object to write the changes into a file by
PdfStamper pdfStamper = new PdfStamper(pdfReader,
new FileOutputStream("destination pdf file path"));
close the PdfStamper by
pdfStamper.close();
It will close the PdfReader too.
Cheers.....

For iText 7 I found this example:
PdfReader pdfReader = new PdfReader(PATH + name + ".pdf");
PdfDocument srcDoc = new PdfDocument(pdfReader);
PdfDocument resultDoc = new PdfDocument(new PdfWriter(PATH + name + "_cut.pdf"));
resultDoc.initializeOutlines();
srcDoc.copyPagesTo(1, 2, resultDoc);
resultDoc.close();
srcDoc.close();
See also here: clone-reordering-pages
and here: clone-splitting-pdf-file

You can use a PdfStamper in combination with PdfCopy.
In this answer it is explained how to copy a whole document. If you change the criteria for the loop in the sample code you can remove the pages you don't need.

Here is a removing function ready for real life usage. Proven to work ok with itext 2.1.7. It does not use "strigly typing" also.
/**
* Removes given pages from a document.
* #param reader document
* #param pagesToRemove pages to remove; 1-based
*/
public static void removePages(PdfReader reader, int... pagesToRemove) {
int pagesTotal = reader.getNumberOfPages();
List<Integer> allPages = new ArrayList<>(pagesTotal);
for (int i = 1; i <= pagesTotal; i++) {
allPages.add(i);
}
for (int page : pagesToRemove) {
allPages.remove(new Integer(page));
}
reader.selectPages(allPages);
}

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Merge documents to create TOC in iText (Java) - java

You have detected a bug that was introduced in iText 5.5.6. This has already been fixed in our repository: Thank you for reporting this bug. You can find the fix on github: https://github.com/itext/itextpdf/commit/eac1a4318e6c31b054e0726ad44d0da5b8a720c2

Related

Form fields created using iText5 PDF Java library, duplicates the fields when uploaded to Adobe sign tool

Using iText 2.1.7 to merge large PDFs

How to copy/move AcroForm fields from one document to new blank one using IText5 or IText7?

Converting a pdf to word document using java

Remove page from PDF

Categories

Resources