Converting Tiff to PDF: PDF is corrupted

Converting Tiff to PDF: PDF is corrupted - java

I followed this example of iText 7 to convert a multi-page Tiff into a multi-page PDF, but when I open the PDF it's corrupted. Adobe Reader displays an error and Chrome shows this:
(Every page looks like that, but they aren't identical).
This is the code I used:
File newPdfFile = new File("<path...>/converted_file.pdf");
URL tiffUrl = UrlUtil.toURL("<path...>/original_file.tif");
IRandomAccessSource ras = new RandomAccessSourceFactory().createSource(tiffUrl);
RandomAccessFileOrArray rafoa = new RandomAccessFileOrArray(ras);
int numberOfPages = TiffImageData.getNumberOfPages(rafoa);
PdfDocument pdf = new PdfDocument(new PdfWriter(new FileOutputStream(newPdfFile)));
Document document = new Document(pdf);
for(int i = 1; i <= numberOfPages; ++i) {
Image image = new Image(ImageDataFactory.createTiff(tiffUrl, true, i, true));
document.add(image);
}
document.close();
pdf.close();
And this is the code I used with iText 5.5.11, which works but uses a deprecated constructor of RandomAccessFileOrArray:
File newPdfFile = new File("<path...>/converted_file.pdf");
RandomAccessFileOrArray rafoa = new RandomAccessFileOrArray("<path...>/original_file.tif");
int numberOfPages = TiffImage.getNumberOfPages(rafoa);
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(newPdfFile));
document.open();
for (int i = 1; i <= numberOfPages; ++i) {
Image image = TiffImage.getTiffImage(rafoa, i);
Rectangle pageSize = new Rectangle(image.getWidth(), image.getHeight());
document.setPageSize(pageSize);
document.newPage();
document.add(image);
}
document.close();
Unfortunately I can't provide sample files because they are confidential/classified...
What could be the issue?
P.S.: I tried with the same tiff used in the example code I followed and it works. What's wrong with my tiffs? In the file properties, other than the dimensions and resolution there's:
Bit depth: 1
Compression: CCITT T.4
Resolution unit: 2

Ok, thanks to Michaël Demey's suggestions I managed to get the proper pdf using iText 7.
Here's the maven imports:
<dependency>
<groupId>com.sun.media</groupId>
<artifactId>jai_imageio</artifactId>
<version>1.1</version>
</dependency>
<dependency>
<groupId>com.itextpdf</groupId>
<artifactId>layout</artifactId>
<version>7.0.3</version>
</dependency>
And here's the code:
import com.itextpdf.io.image.ImageDataFactory;
import com.itextpdf.kernel.geom.PageSize;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.Image;
import java.io.File;
import java.io.FileOutputStream;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
[...]
File newPdfFile = new File("<path...>/converted_file.pdf");
ImageReader reader = ImageIO.getImageReadersByFormatName("TIFF").next();
reader.setInput(ImageIO.createImageInputStream(new File("<path...>/original_file.tif")));
int numberOfPages = reader.getNumImages(true);
PdfDocument pdf = new PdfDocument(new PdfWriter(new FileOutputStream(newPdfFile)));
Document document = new Document(pdf);
for(int i = 0; i < numberOfPages; ++i) {// in javax.imageio.ImageReader they start from 0!
java.awt.Image img = reader.read(i);
Image tempImage = new Image(ImageDataFactory.create(img, null));
pdf.addNewPage(new PageSize(tempImage.getImageWidth(), tempImage.getImageHeight()));
tempImage.setFixedPosition(i + 1, 0, 0);
document.add(tempImage);
}
document.close();
pdf.close();

I had also faced the same issue .I was able to convert file which is of "TIFF" extension and with that of "TIF" extension i was not able to get it converted properly .Then i tried something that was able to do it.Try changing as below and check from
Image image = new Image(ImageDataFactory.createTiff(tiffUrl, true, i, true));
to
Image image = new Image(ImageDataFactory.createTiff(tiffUrl, true, i, false));
This made my conversion work.Hope it works for you as well

Related

How to convert a PDF to a JSON/EXCEL/WORD file?

I need to get data from the pdf file with its header for further comparing with DB data
I tried to use the pdfbox , google vision ocr , itext, but all libraries gave me a row without structure and headers.
Example: Date\nNumber\nStatus\n12\12\2020\n442334\delivered
I will trying convert pdf to excel/word and get data from them, but for this realisation i need reading pdf and write data in excel/word
How can I get data with headers?

"Date\nNumber\nStatus\n12/12/2020\n442334\ndelivered" looks structured enough to me. You could just split it at the "\n"s. That would require some knowledge of the table structure, though.
I've made good experience with Google Vision OCR. How are you calling it?

I not found answer on my question.
I'm use this code for my task :
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.text.PDFTextStripperByArea;
import java.awt.*;
import java.io.File;
import java.io.IOException;
public class ExtractTextByArea {
public String getTextFromCoordinate(String filepath,int x,int y,int width,int height) {
String result = "";
try (PDDocument document = PDDocument.load(new File(filepath))) {
if (!document.isEncrypted()) {
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.setSortByPosition(true);
// Rectangle rect = new Rectangle(260, 35, 70, 10);
Rectangle rect = new Rectangle(x,y,width,height);
stripper.addRegion("class1", rect);
PDPage firstPage = document.getPage(0);
stripper.extractRegions( firstPage );
// System.out.println("Text in the area:" + rect);
result = stripper.getTextForRegion("class1");
}
} catch (IOException e){
System.err.println("Exception while trying to read pdf document - " + e);
}
return result;
}
}

Java - extract text from pdf from selected area to txt

The idea is next,
user selects a pdf file, and then this file converted into an image and such an image is displayed in the application.
In the image the user can choose positions that wants to read from a pdf file, and when the finish with selection position in the background program reads the original pdf and text stored in a txt file.
It is important that the resulting image from pdf file is the same size as himself pdf file
The next code convert pdf to image. I use pdfrenderer-0.9.1.jar
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import javax.imageio.ImageIO;
import com.sun.pdfview.PDFFile;
import com.sun.pdfview.PDFPage;
public class Pdf2Image {
public static void main(String[] args) {
File file = new File("E:\\invoice-template-1.pdf");
RandomAccessFile raf;
try {
raf = new RandomAccessFile(file, "r");
FileChannel channel = raf.getChannel();
ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
PDFFile pdffile = new PDFFile(buf);
// draw the first page to an image
int num=pdffile.getNumPages();
for(int i=0;i<num;i++)
{
PDFPage page = pdffile.getPage(i);
//get the width and height for the doc at the default zoom
int width=(int)page.getBBox().getWidth();
int height=(int)page.getBBox().getHeight();
Rectangle rect = new Rectangle(0,0,width,height);
int rotation=page.getRotation();
Rectangle rect1=rect;
if(rotation==90 || rotation==270)
rect1=new Rectangle(0,0,rect.height,rect.width);
//generate the image
BufferedImage img = (BufferedImage)page.getImage(
rect.width, rect.height, //width & height
rect1, // clip rect
null, // null for the ImageObserver
true, // fill background with white
true // block until drawing is done
);
ImageIO.write(img, "png", new File("E:/invoice-template-"+i+".png"));
}
}
catch (FileNotFoundException e1) {
System.err.println(e1.getLocalizedMessage());
} catch (IOException e) {
System.err.println(e.getLocalizedMessage());
}
}
}
Then the image is displayed to the user in JavaFX application in ImageView components.
Can you help me to get the exact position of the mouse, the mouse when the user selects a portion of the image from which you want to read the text in the pdf file?
With this code I read pdf file and get text from the set position, only I must to manually input position:( . I use pdfbox-1.3.1.jar.
I would like to position the client chooses to keep a picture in the list and read the text from the pdf file with all of these positions.
File file = new File("E:/invoice-template-1.pdf");
PDDocument document = PDDocument.load(file);
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.setSortByPosition(true);
Rectangle rect1 = new Rectangle(38, 275, 15, 100);
Rectangle rect2 = new Rectangle(54, 275, 40, 100);
stripper.addRegion("row1column1", rect1);
stripper.addRegion("row1column2", rect2);
List allPages = document.getDocumentCatalog().getAllPages();
List<PDPage> pages = document.getDocumentCatalog().getAllPages();
int j = 0;
for (PDPage page : pages) {
stripper.extractRegions(page);
stripper.setSortByPosition(true);
List<String> regions = stripper.getRegions();
for (String region : regions) {
String text = stripper.getTextForRegion(region);
System.out.println("Region: " + region + " on Page " + j);
System.out.println("\tText: \n" + text);
}
For example,
in the next invoice, I want to select the 4 positions to export the text, and when you select the picture, the dimensions of keeping in the list, then go through the list and from those positions export text from pdf file.

How to insert the values in page pdf with itext pdf

I tried to insert values into the pdf like this :
String nomeArquivo ="tempRelatorioPDF.pdf";
File file = new File(nomeArquivo);
file.createNewFile();
PdfWriter.getInstance(doc, new FileOutputStream(file));
doc.open();
PdfWriter writer = PdfWriter.getInstance(doc, new `FileOutputStream(nomeArquivo));`
Document doc = new Document(PageSize.A4.rotate());
while(iteratorRow.hasNext()){
Object valorCelulaRow = iteratorRow.next();
Paragraph p3 = new Paragraph();
p3.add(colunas[i].getName());
p3.add("\n");
doc.add((com.itextpdf.text.Element)p3);
if(valorCelulaRow != null){
Paragraph p = new Paragraph(valorCelulaRow.toString());
doc.add(p);
}
doc.close();
Displays the error: PDF FILE IS CORRUPT actually gives error opening the file appears to be corrupt.
**
ADJUSTMENTS PRIOR TO GET TO THIS POINT
**
I use the following libraries for this:
import com.itextpdf.text.Chunk;
import com.itextpdf.text.ListItem;
*>>>>> import com.itextpdf.text.Paragraph;*
...
import com.itextpdf.text.DocumentException;
*>>>>> import com.lowagie.text.Document;*
import com.lowagie.text.Element;
import com.lowagie.text.Font;
...
There is a conlito when retreat commented library and additio the itext. the doc.Add in error.
import com.itextpdf.text.Document;
// import com.lowagie.text.Document;
ERROR After the change of import reference
Document doc = new Document(PageSize.A4.rotate());
//The constructor Document(Rectangle) is undefined
PdfWriter.getInstance(doc, new FileOutputStream(file));
PdfWriter writer = PdfWriter.getInstance(doc, new FileOutputStream(nomeArquivo));
//The method getInstance(Document, OutputStream) in the type PdfWriter is not applicable for the arguments (Document, FileOutputStream)
[partially solved]
Add the "(com.itextpdf.text.Element)"
change "Add" for "add";
The debug data is loaded into the variable 'p' and 'P3'. But the PDF document is generated with 0kb.

This should work, I'm not sure how it would not. I don't think you should be casting anything. Paragraph allows for direct construction using a String parameter. I take it you want the paragraph to have value of valueCellRow.toString() correct?
Document doc = new Document(PageSize.A4.rotate());
while(iteratorRow.hasNext()) {
Object valorCelulaRow = iteratorRow.next();
if(valorCelulaRow != null) {
Paragraph p = new Paragraph(valueCellRow.toString());
doc.Add (p);
}
}
doc.close();
If all else fails, it's your libraries (com.lowagie.text.Element vs com.itextpdf.text.Element) clashing. Try use:
doc.Add((com.itextpdf.text.Element) p);
Though you shouldn't need to.

How to generate a downloadable PDF with pdfbox (Corrupted PDF)?

How do I make a PDF file downloadable in a link?
I'm building a web application using JSF, when the user clicks in a "Save as PDF" link a PDF should be available to be downloaded.
So far I've a working code that generates the PDF file, but the file is saved on my desktop and what I want to do is that when the user clicks on the link the pdf file should be downloadable instead of being stored in the app.
UPDATE 3:
Thank you for your help guys, I modifed my code with your suggestions and it's working.
UPDATE 2:
I'm getting the following error: Adoble Reader could not open "yourfile.pdf" because is either not a supported file type or because the file has been damaged
UPDATE 1:
I'm adding my current code with the changes you have pointed me out, however I'm still struggling to make this work
This is my method that generated the PDF:
public ByteArrayOutputStream createPDF() throws IOException, COSVisitorException {
PDDocument document;
PDPage page;
PDFont font;
PDPageContentStream contentStream;
PDJpeg front;
PDJpeg back;
InputStream inputFront;
InputStream inputBack;
ByteArrayOutputStream output = new ByteArrayOutputStream();
// Creating Document
document = new PDDocument();
// Creating Pages
for(int i=0; i<2; i++) {
page = new PDPage();
// Adding page to document
document.addPage(page);
// Adding FONT to document
font = PDType1Font.HELVETICA;
// Retrieve Image to be added to the PDF
inputFront = new FileInputStream(new File("D:/Media/imageFront.jpg"));
inputBack = new FileInputStream(new File("D:/Media/imageBack.jpg"));
BufferedImage buffFront = ImageIO.read(inputFront);
BufferedImage resizedFront = Scalr.resize(buffFront, 460);
BufferedImage buffBack = ImageIO.read(inputBack);
BufferedImage resizedBack = Scalr.resize(buffBack, 460);
front = new PDJpeg(document, resizedFront);
back = new PDJpeg(document, resizedBack);
// Next we start a new content stream which will "hold" the to be created content.
contentStream = new PDPageContentStream(document, page);
// Let's define the content stream
contentStream.beginText();
contentStream.setFont(font, 8);
contentStream.moveTextPositionByAmount(10, 770);
contentStream.drawString("Amount: $1.00");
contentStream.endText();
contentStream.beginText();
contentStream.setFont(font, 8);
contentStream.moveTextPositionByAmount(200, 770);
contentStream.drawString("Sequence Number: 123456789");
contentStream.endText();
contentStream.beginText();
contentStream.setFont(font, 8);
contentStream.moveTextPositionByAmount(10, 760);
contentStream.drawString("Account: 123456789");
contentStream.endText();
contentStream.beginText();
contentStream.setFont(font, 8);
contentStream.moveTextPositionByAmount(200, 760);
contentStream.drawString("Captura Date: 04/25/2011");
contentStream.endText();
contentStream.beginText();
contentStream.setFont(font, 8);
contentStream.moveTextPositionByAmount(10, 750);
contentStream.drawString("Bank Number: 123456789");
contentStream.endText();
contentStream.beginText();
contentStream.setFont(font, 8);
contentStream.moveTextPositionByAmount(200, 750);
contentStream.drawString("Check Number: 123456789");
contentStream.endText();
// Let's close the content stream
contentStream.close();
}
// Finally Let's save the PDF
document.save(output);
document.close();
return output;
}
This is my servlet that call the previous code and generates the output and set the header:
try {
ByteArrayOutputStream output = new ByteArrayOutputStream();
output = createPDF();
response.addHeader("Content-Type", "application/force-download");
response.addHeader("Content-Disposition", "attachment; filename=\"yourFile.pdf\"");
response.getOutputStream().write(output.toByteArray());
} catch (Exception ex) {
ex.printStackTrace();
}
I'm not sure what I'm missing since when I try to open the PDF I got the error: Adoble Reader could not open "yourfile.pdf" because is either not a supported file type or because the file has been damaged

You need to set the proper http headers in order to tell the browser to download the file.
response.addHeader("Content-Type", "application/force-download")
response.addHeader("Content-Disposition", "attachment; filename=\"yourFile.pdf\"")

I have not done this in awhile, so bear with me, but what you do is instead of saving the pdf to a file via a stream, you save the stream in memory as a byte array and then when the user clicks on the link, you set the MIME type to PDF and then open up the byte array as a stream which you return as the response. I apologize for being a bit hazy on details. I think I also used jpedal and iText to get it done.
I can't show you all of the code, but here is some:
import com.itextpdf.text.BaseColor;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.Phrase;
import com.itextpdf.text.pdf.PdfPCell;
import com.itextpdf.text.pdf.PdfPTable;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.draw.DottedLineSeparator;
... // class, etc.
public ByteArrayOutputStream createOrderFormPdf(OrderFormDTO dto)
throws DocumentException {
Document document = new Document();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfWriter.getInstance(document, baos);
document.open();
Paragraph header = new Paragraph();
header.add(new Phrase(...));
OrderFormDtoPdfAdapter pdfAdapter = new OrderFormDtoPdfAdapter(dto);
header.add(pdfAdapter.getPdfHeaderTable());
document.add(header);
... // other code
Paragraph footer = new Paragraph();
footer.add(pdfAdapter.getPDFFooterTable());
document.add(footer);
Paragraph paragraph = new Paragraph();
PdfTableUtils.addEmptyLine(paragraph, 2);
document.add(paragraph);
document.add(new DottedLineSeparator());
document.close();
return baos;
}
You can then write out the baos on your response as a pdf using the correct MIME type.

Creating PDF from TIFF image using iText

I'm currently generating PDF files from TIFF images using iText.
Basically the procedure is as follows:
1. Read the TIFF file.
2. For each "page" of the TIFF, instantiate an Image object and write that to a Document instance, which is the PDF file.
I'm having a hard time understanding how to add those images to the PDF keeping the original resolution.
I've tried to scale the Image to the dimensions in pixels of the original image of the TIFF, for instance:
// Pixel Dimensions 1728 × 2156 pixels
// Resolution 204 × 196 ppi
RandomAccessFileOrArray tiff = new RandomAccessFileOrArray("/path/to/tiff/file");
Document pdf = new Document(PageSize.LETTER);
Image temp = TiffImage.getTiffImage(tiff, page);
temp.scaleAbsolute(1728f, 2156f);
pdf.add(temp);
I would really appreciate if someone can shed some light on this. Perhaps I'm missing the functionality of the Image class methods...
Thanks in advance!

I think if you scale the image then you can not retain the original resolution (please correct me if I am wrong :)).
What you can try doing is to creat a PDF document with different sized pages (if images are of different resolution in the tif image).
Try the following code. It sets the size of PDF page equal to that of image file and then create that PDF page. the PDF page size varies according to the image size so the resolution is maintained :)
import java.io.FileOutputStream;
import java.io.IOException;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Image;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.RandomAccessFileOrArray;
import com.itextpdf.text.pdf.codec.TiffImage;
public class Tiff2Pdf {
/**
* #param args
* #throws DocumentException
* #throws IOException
*/
public static void main(String[] args) throws DocumentException,
IOException {
String imgeFilename = "/home/saurabh/Downloads/image.tif";
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(
document,
new FileOutputStream("/home/saurabh/Desktop/out"
+ Math.random() + ".pdf"));
writer.setStrictImageSequence(true);
document.open();
document.add(new Paragraph("Multipages tiff file"));
Image image;
RandomAccessFileOrArray ra = new RandomAccessFileOrArray(imgeFilename);
int pages = TiffImage.getNumberOfPages(ra);
for (int i = 1; i <= pages; i++) {
image = TiffImage.getTiffImage(ra, i);
Rectangle pageSize = new Rectangle(image.getWidth(),
image.getHeight());
document.setPageSize(pageSize);
document.add(image);
document.newPage();
}
document.close();
}
}

I've found that this line doesn't work well:
document.setPageSize(pageSize);
If your TIFF files only contain one image then you're better off using this instead:
RandomAccessFileOrArray ra = new RandomAccessFileOrArray(imageFilePath);
Image image = TiffImage.getTiffImage(ra, 1);
Rectangle pageSize = new Rectangle(image.getWidth(), image.getHeight());
Document document = new Document(pageSize);
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(outputFileName));
writer.setStrictImageSequence(true);
document.open();
document.add(image);
document.newPage();
document.close();
This will result in a page size that fits the image size exactly, so no scaling is required.

Another example non-deprecated up to iText 5.5 with the first page issue fixed. I'm using 5.5.11 Itext.
import java.io.FileOutputStream;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import com.itextpdf.text.Document;
import com.itextpdf.text.Image;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.io.FileChannelRandomAccessSource;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.RandomAccessFileOrArray;
import com.itextpdf.text.pdf.codec.TiffImage;
public class Test1 {
public static void main(String[] args) throws Exception {
RandomAccessFile aFile = new RandomAccessFile("/myfolder/origin.tif", "r");
FileChannel inChannel = aFile.getChannel();
FileChannelRandomAccessSource fcra = new FileChannelRandomAccessSource(inChannel);
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream("/myfolder/destination.pdf"));
document.open();
RandomAccessFileOrArray rafa = new RandomAccessFileOrArray(fcra);
int pages = TiffImage.getNumberOfPages(rafa);
Image image;
for (int i = 1; i <= pages; i++) {
image = TiffImage.getTiffImage(rafa, i);
Rectangle pageSize = new Rectangle(image.getWidth(), image.getHeight());
document.setPageSize(pageSize);
document.newPage();
document.add(image);
}
document.close();
aFile.close();
}
}

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Converting Tiff to PDF: PDF is corrupted - java

Related

How to convert a PDF to a JSON/EXCEL/WORD file?

Java - extract text from pdf from selected area to txt

How to insert the values in page pdf with itext pdf

How to generate a downloadable PDF with pdfbox (Corrupted PDF)?

Creating PDF from TIFF image using iText

Categories

Resources