iText PdfAConformanceException: PDF array is out of bounds - java

i'm using iText 5.5.5 with Java5.
I'm trying to merge some PDF/A. when I got a "PdfAConformanceException: PDF array is out of bounds".
Trying to find error I find the "bad PDF" that cause the error and when I try to copy just it exception throw again. This error don't appear always, it appear just when this PDF/A is in the "job chain"; I tried with some other files and it's all fine. I cant share with you source PDF 'couse it's restricted.
That's my piece of code:
_log.info("Start Document Merge");
// Output pdf
ByteArrayOutputStream bos = new ByteArrayOutputStream();
com.itextpdf.text.Document document = new com.itextpdf.text.Document();
PdfCopy copy = new PdfACopy(document, bos, PdfAConformanceLevel.PDF_A_1B);
PageStamp stamp = null;
PdfReader reader = null;
PdfContentByte content = null;
int outPdfPageCount = 0;
BaseFont baseFont = BaseFont.createFont("Arial", BaseFont.WINANSI, BaseFont.EMBEDDED);
copyOutputIntents(reader, copy);
// Loop over the pages in that document
try {
int numberOfPages = reader.getNumberOfPages();
for (int i = 1; i <= numberOfPages; i++) {
PdfImportedPage pagecontent = copy.getImportedPage(reader, i);
_log.debug("Handling page numbering [" + i + "]");
stamp = copy.createPageStamp(pagecontent);
content = stamp.getUnderContent();
content.beginText();
content.setFontAndSize(baseFont, Configuration.NumPagSize);
content.showTextAligned(PdfContentByte.ALIGN_CENTER, String.format("%s %s ", Configuration.NumPagPrefix, i), Configuration.NumPagX, Configuration.NumPagY, 0);
content.endText();
stamp.alterContents();
copy.addPage(pagecontent);
outPdfPageCount++;
if (outPdfPageCount > Configuration.MaxPages) {
_log.error("Pdf Page Count > MaxPages");
throw new PackageException(Constants.ERROR_104_TEXT, Constants.ERROR_104);
}
}
copy.freeReader(reader);
reader.close();
copy.createXmpMetadata();
document.close();
} catch (Exception e) {
_log.error("Error during mergin Document, skip");
_log.debug(MiscUtil.stackToString(e));
}
return bos.toByteArray();
That's the full stacktrace:
com.itextpdf.text.pdf.PdfAConformanceException: PDF array is out of bounds.
at com.itextpdf.text.pdf.internal.PdfA1Checker.checkPdfObject(PdfA1Checker.java:269)
at com.itextpdf.text.pdf.internal.PdfAChecker.checkPdfAConformance(PdfAChecker.java:208)
at com.itextpdf.text.pdf.internal.PdfAConformanceImp.checkPdfIsoConformance(PdfAConformanceImp.java:71)
at com.itextpdf.text.pdf.PdfWriter.checkPdfIsoConformance(PdfWriter.java:3480)
at com.itextpdf.text.pdf.PdfWriter.checkPdfIsoConformance(PdfWriter.java:3476)
at com.itextpdf.text.pdf.PdfArray.toPdf(PdfArray.java:165)
at com.itextpdf.text.pdf.PdfDictionary.toPdf(PdfDictionary.java:149)
at com.itextpdf.text.pdf.PdfArray.toPdf(PdfArray.java:175)
at com.itextpdf.text.pdf.PdfDictionary.toPdf(PdfDictionary.java:149)
at com.itextpdf.text.pdf.PdfIndirectObject.writeTo(PdfIndirectObject.java:158)
at com.itextpdf.text.pdf.PdfWriter$PdfBody.write(PdfWriter.java:420)
at com.itextpdf.text.pdf.PdfWriter$PdfBody.add(PdfWriter.java:398)
at com.itextpdf.text.pdf.PdfWriter$PdfBody.add(PdfWriter.java:373)
at com.itextpdf.text.pdf.PdfWriter$PdfBody.add(PdfWriter.java:369)
at com.itextpdf.text.pdf.PdfWriter.addToBody(PdfWriter.java:843)
at com.itextpdf.text.pdf.PdfCopy.addToBody(PdfCopy.java:839)
at com.itextpdf.text.pdf.PdfCopy.addToBody(PdfCopy.java:821)
at com.itextpdf.text.pdf.PdfCopy.copyIndirect(PdfCopy.java:426)
at com.itextpdf.text.pdf.PdfCopy.copyIndirect(PdfCopy.java:446)
at com.itextpdf.text.pdf.PdfCopy.copyObject(PdfCopy.java:577)
at com.itextpdf.text.pdf.PdfCopy.copyDictionary(PdfCopy.java:503)
at com.itextpdf.text.pdf.PdfCopy.copyObject(PdfCopy.java:573)
at com.itextpdf.text.pdf.PdfCopy.copyDictionary(PdfCopy.java:503)
at com.itextpdf.text.pdf.PdfCopy.copyObject(PdfCopy.java:573)
at com.itextpdf.text.pdf.PdfCopy.copyDictionary(PdfCopy.java:493)
at com.itextpdf.text.pdf.PdfCopy.copyDictionary(PdfCopy.java:519)
at com.itextpdf.text.pdf.PdfCopy.addPage(PdfCopy.java:663)
at com.itextpdf.text.pdf.PdfACopy.addPage(PdfACopy.java:115)
at it.m2sc.engageone.documentpackage.generator.PackageGenerator.mergePDF(PackageGenerator.java:256)

In that specific case, the problem depends by a specific Font ( Gulim ) that is too big to be embedded in PDF/A-1 file. When that font was removed, everything war run fine.

Related

PDFBox join two pdfside by side optimizing disk space

I am using PDFBox to join two PDFs side by side.
I am using the following code:
PDDocument outDoc = new PDDocument();
int maxPages = targetDoc.getNumberOfPages();
if (sourceDoc.getNumberOfPages() > targetDoc.getNumberOfPages()) {
maxPages = sourceDoc.getNumberOfPages();
}
PDPage sourceIndexPage;
PDPage targetIndexPage;
PDRectangle pdf1Frame;
PDRectangle pdf2Frame;
PDRectangle outPdfFrame;
COSDictionary dict;
PDPage outPdfPage;
LayerUtility layerUtility;
PDFormXObject sourceFormPDF;
PDFormXObject targetFormPDF;
AffineTransform afLeft;
AffineTransform afRight;
for (int indexPage = 0; indexPage < maxPages; indexPage++) {
// Create output PDF frame
try {
sourceIndexPage = sourceDoc.getPage(indexPage);
} catch (IndexOutOfBoundsException error) {
sourceDoc.addPage(new PDPage());
sourceIndexPage = targetDoc.getPage(indexPage);
}
try {
targetIndexPage = targetDoc.getPage(indexPage);
} catch (IndexOutOfBoundsException error) {
targetDoc.addPage(new PDPage());
targetIndexPage = targetDoc.getPage(indexPage);
}
sourceIndexPage.setRotation(0);
targetIndexPage.setRotation(0);
pdf1Frame = sourceIndexPage.getCropBox();
pdf2Frame = targetIndexPage.getCropBox();
outPdfFrame = new PDRectangle(pdf1Frame.getWidth() + pdf2Frame.getWidth(),
Math.max(pdf1Frame.getHeight(), pdf2Frame.getHeight()));
// Create output page with calculated frame and add it to the document
dict = new COSDictionary();
dict.setItem(COSName.TYPE, COSName.PAGE);
dict.setItem(COSName.MEDIA_BOX, outPdfFrame);
dict.setItem(COSName.CROP_BOX, outPdfFrame);
dict.setItem(COSName.ART_BOX, outPdfFrame);
outPdfPage = new PDPage(dict);
outDoc.addPage(outPdfPage);
// Source PDF pages has to be imported as form XObjects to be able to insert them at a specific point in the output page
// pageNumber
layerUtility = new LayerUtility(outDoc);
sourceFormPDF = layerUtility.importPageAsForm(sourceDoc, indexPage);
targetFormPDF = layerUtility.importPageAsForm(targetDoc, indexPage);
// Add form objects to output page
afLeft = new AffineTransform();
layerUtility.appendFormAsLayer(outPdfPage, sourceFormPDF, afLeft, "left " + indexPage);
afRight = AffineTransform.getTranslateInstance(pdf1Frame.getWidth(), 0.0);
layerUtility.appendFormAsLayer(outPdfPage, targetFormPDF, afRight, "right" + indexPage);
}
outDoc.save("oudDoc.pdf");
The issue I have is that for some documents, the size of the outDoc is too high. I expected it to be something around dim source document + dim target document, but it is 10x, 20x more in reality.
Looking inside the document's structure, I noticed that I am repeating common resources that in the original PDFs were separated. Is there a way to compress/optimize my code to have less space on disk?
We solved the problem by postprocessing the generated pdf with ghostscript

Merging PDF Files in-between another PDF File in java using IText or PDFBox

I have two PDF files A and B, I have a requirement where i need to Merge both these PDF files based on a condition.Like,If i find a string like "attach PDF" in my A, i have to merge the B file into A from that particular page in A. For Example,If i spot the word in Page No 3 in my A file I need to merge the B file from Page No:3. I'm using I-Text 5.5.10. Is it possible to achieve this in I-Text or PDFBox. Here is what i have tried as of now.
public static void mergePdf() throws IOException, DocumentException
{
PdfReader reader1 = new PdfReader("C:\\Users\\user1\\Downloads\\generatedSample.pdf");
PdfReader reader2 = new PdfReader("C:\\Users\\user1\\Desktop\\sample1.pdf");
Document document = new Document();
document.addHeader("Header Text", "");
FileOutputStream fos = new FileOutputStream("C:\\Users\\user1\\Downloads\\MergeFile.pdf");
PdfCopy copy = new PdfCopy(document, fos);
document.open();
PdfImportedPage page;
PdfCopy.PageStamp stamp;
Phrase phrase;
BaseFont bf = BaseFont.createFont();
Font font = new Font(bf, 9);
int n = reader1.getNumberOfPages();
for (int i = 1; i <= reader1.getNumberOfPages(); i++)
{
page = copy.getImportedPage(reader1, i);
stamp = copy.createPageStamp(page);
ColumnText.showTextAligned(stamp.getOverContent(), Element.ALIGN_CENTER, null, 520, 5, 0);
stamp.alterContents();
copy.addPage(page);
}
for (int i = 1; i <= reader2.getNumberOfPages(); i++)
{
page = copy.getImportedPage(reader2, i);
stamp = copy.createPageStamp(page);
ColumnText.showTextAligned(stamp.getOverContent(), Element.ALIGN_CENTER, null, 520, 5, 0);
stamp.alterContents();
copy.addPage(page);
}
document.close();
reader1.close();
reader2.close();
}
Thanks for the solution in advance !!

merge multiple pdfs in order

hey guys sorry for long post and bad language and if there is unnecessary details
i created multiple 1page pdfs from one pdf template using excel document
i have now
something like this
tempfile0.pdf
tempfile1.pdf
tempfile2.pdf
...
im trying to merge all files in one single pdf using itext5
but it semmes that the pages in the resulted pdf are not in the order i wanted
per exemple
tempfile0.pdf in the first page
tempfile1. int the 2000 page
here is the code im using.
the procedure im using is:
1 filling a from from a hashmap
2 saving the filled form as one pdf
3 merging all the files in one single pdf
public void fillPdfitext(int debut,int fin) throws IOException, DocumentException {
for (int i =debut; i < fin; i++) {
HashMap<String, String> currentData = dataextracted[i];
// textArea.appendText("\n"+pdfoutputname +" en cours de preparation\n ");
PdfReader reader = new PdfReader(this.sourcePdfTemplateFile.toURI().getPath());
String outputfolder = this.destinationOutputFolder.toURI().getPath();
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(outputfolder+"\\"+"tempcontrat"+debut+"-" +i+ "_.pdf"));
// get the document catalog
AcroFields acroForm = stamper.getAcroFields();
// as there might not be an AcroForm entry a null check is necessary
if (acroForm != null) {
for (String key : currentData.keySet()) {
try {
String fieldvalue=currentData.get(key);
if (key=="ADRESSE1"){
fieldvalue = currentData.get("ADRESSE1")+" "+currentData.get("ADRESSE2") ;
acroForm.setField("ADRESSE", fieldvalue);
}
if (key == "IMEI"){
acroForm.setField("NUM_SERIE_PACK", fieldvalue);
}
acroForm.setField(key, fieldvalue);
// textArea.appendText(key + ": "+fieldvalue+"\t\t");
} catch (Exception e) {
// e.printStackTrace();
}
}
stamper.setFormFlattening(true);
}
stamper.close();
}
}
this is the code for merging
public void Merge() throws IOException, DocumentException
{
File[] documentPaths = Main.objetapp.destinationOutputFolder.listFiles((dir, name) -> name.matches( "tempcontrat.*\\.pdf" ));
Arrays.sort(documentPaths, NameFileComparator.NAME_INSENSITIVE_COMPARATOR);
byte[] mergedDocument;
try (ByteArrayOutputStream memoryStream = new ByteArrayOutputStream())
{
Document document = new Document();
PdfSmartCopy pdfSmartCopy = new PdfSmartCopy(document, memoryStream);
document.open();
for (File docPath : documentPaths)
{
PdfReader reader = new PdfReader(docPath.toURI().getPath());
try
{
reader.consolidateNamedDestinations();
PdfImportedPage pdfImportedPage = pdfSmartCopy.getImportedPage(reader, 1);
pdfSmartCopy.addPage(pdfImportedPage);
}
finally
{
pdfSmartCopy.freeReader(reader);
reader.close();
}
}
document.close();
mergedDocument = memoryStream.toByteArray();
}
FileOutputStream stream = new FileOutputStream(this.destinationOutputFolder.toURI().getPath()+"\\"+
this.sourceDataFile.getName().replaceFirst("[.][^.]+$", "")+".pdf");
try {
stream.write(mergedDocument);
} finally {
stream.close();
}
documentPaths=null;
Runtime r = Runtime.getRuntime();
r.gc();
}
my question is how to keep the order of the files the same in the resulting pdf
It is because of naming of files. Your code
new FileOutputStream(outputfolder + "\\" + "tempcontrat" + debut + "-" + i + "_.pdf")
will produce:
tempcontrat0-0_.pdf
tempcontrat0-1_.pdf
...
tempcontrat0-10_.pdf
tempcontrat0-11_.pdf
...
tempcontrat0-1000_.pdf
Where tempcontrat0-1000_.pdf will be placed before tempcontrat0-11_.pdf, because you are sorting it alphabetically before merge.
It will be better to left pad file number with 0 character using leftPad() method of org.apache.commons.lang.StringUtils or java.text.DecimalFormat and have it like this tempcontrat0-000000.pdf, tempcontrat0-000001.pdf, ... tempcontrat0-9999999.pdf.
And you can also do it much simpler and skip writing into file and then reading from file steps and merge documents right after the form fill and it will be faster. But it depends how many and how big documents you are merging and how much memory do you have.
So you can save the filled document into ByteArrayOutputStream and after stamper.close() create new PdfReader for bytes from that stream and call pdfSmartCopy.getImportedPage() for that reader. In short cut it can look like:
// initialize
PdfSmartCopy pdfSmartCopy = new PdfSmartCopy(document, memoryStream);
for (int i = debut; i < fin; i++) {
ByteArrayOutputStream out = new ByteArrayOutputStream();
// fill in the form here
stamper.close();
PdfReader reader = new PdfReader(out.toByteArray());
reader.consolidateNamedDestinations();
PdfImportedPage pdfImportedPage = pdfSmartCopy.getImportedPage(reader, 1);
pdfSmartCopy.addPage(pdfImportedPage);
// other actions ...
}

iText rotate() won't orient the first page

Everything I have read regarding iText says you should be able to set the page size and then create a new page. But for some reason when I try this my first page isn't rotated. But my second is. Any ideas?
response.setContentType("application/pdf");
Document document = new Document();
try{
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
PdfWriter.getInstance(document, buffer);
document.open();
//Start a new page
document.setPageSize(PageSize.LETTER.rotate()); // 11" x 8.5" new Rectangle(792f, 612f)
document.newPage();
Paragraph topText = new Paragraph();
// add some content here...
document.close();
DataOutput dataOutput = new DataOutputStream(response.getOutputStream());
byte[] bytes = buffer.toByteArray();
response.setContentLength(bytes.length);
for(int i = 0; i < bytes.length; i++) {
dataOutput.writeByte(bytes[i]);
}
} catch (DocumentException e) {
e.printStackTrace();
}
document.newPage() really means "finish the current page and open a new one". This implies that after you open() a document, you already have a blank page (with whatever size the document had set before) ready.
You should set your page size before opening the document:
document.setPageSize(PageSize.LETTER.rotate());
document.open();

Apache POI HWPF - problem in convert doc file to pdf

I am currently working Java project with use of apache poi.
Now in my project I want to convert doc file to pdf file. The conversion done successfully but I only get text in pdf not any text style or text colour.
My pdf file looks like a black & white. While my doc file is coloured and have different style of text.
This is my code,
POIFSFileSystem fs = null;
Document document = new Document();
try {
System.out.println("Starting the test");
fs = new POIFSFileSystem(new FileInputStream("/document/test2.doc"));
HWPFDocument doc = new HWPFDocument(fs);
WordExtractor we = new WordExtractor(doc);
OutputStream file = new FileOutputStream(new File("/document/test.pdf"));
PdfWriter writer = PdfWriter.getInstance(document, file);
Range range = doc.getRange();
document.open();
writer.setPageEmpty(true);
document.newPage();
writer.setPageEmpty(true);
String[] paragraphs = we.getParagraphText();
for (int i = 0; i < paragraphs.length; i++) {
org.apache.poi.hwpf.usermodel.Paragraph pr = range.getParagraph(i);
// CharacterRun run = pr.getCharacterRun(i);
// run.setBold(true);
// run.setCapitalized(true);
// run.setItalic(true);
paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n", "");
System.out.println("Length:" + paragraphs[i].length());
System.out.println("Paragraph" + i + ": " + paragraphs[i].toString());
// add the paragraph to the document
document.add(new Paragraph(paragraphs[i]));
}
System.out.println("Document testing completed");
} catch (Exception e) {
System.out.println("Exception during test");
e.printStackTrace();
} finally {
// close the document
document.close();
}
}
please help me.
Thnx in advance.
If you look at Apache Tika, there's a good example of reading some style information from a HWPF document. The code in Tika generates HTML based on the HWPF contents, but you should find that something very similar works for your case.
The Tika class is
https://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
One thing to note about word documents is that everything in any one Character Run has the same formatting applied to it. A Paragraph is therefore made up of one or more Character Runs. Some styling is applied to a Paragraph, and other parts are done on the runs. Depending on what formatting interests you, it may therefore be on the paragraph or the run.
If you use WordExtractor, you will get text only. Try using CharacterRun class. You will get style along with text. Please refer following Sample code.
Range range = doc.getRange();
for (int i = 0; i < range.numParagraphs(); i++) {
org.apache.poi.hwpf.usermodel.Paragraph poiPara = range.getParagraph(i);
int j = 0;
while (true) {
CharacterRun run = poiPara.getCharacterRun(j++);
System.out.println("Color "+run.getColor());
System.out.println("Font size "+run.getFontSize());
System.out.println("Font Name "+run.getFontName());
System.out.println(run.isBold()+" "+run.isItalic()+" "+run.getUnderlineCode());
System.out.println("Text is "+run.text());
if (run.getEndOffset() == poiPara.getEndOffset()) {
break;
}
}
}

Categories

Resources