Creating PDF from TIFF image using iText

I'm currently generating PDF files from TIFF images using iText.
Basically the procedure is as follows:
1. Read the TIFF file.
2. For each "page" of the TIFF, instantiate an Image object and add it to a Document instance, which represents the PDF file.
I'm having a hard time understanding how to add those images to the PDF while keeping their original resolution.
I've tried scaling each Image to the pixel dimensions of the original TIFF page, for instance:
// Pixel Dimensions 1728 × 2156 pixels
// Resolution 204 × 196 ppi
RandomAccessFileOrArray tiff = new RandomAccessFileOrArray("/path/to/tiff/file");
Document pdf = new Document(PageSize.LETTER);
Image temp = TiffImage.getTiffImage(tiff, page);
temp.scaleAbsolute(1728f, 2156f);
pdf.add(temp);
I would really appreciate it if someone could shed some light on this. Perhaps I'm misunderstanding what the Image class methods do...
Thanks in advance!

I think that if you scale the image, you cannot retain the original resolution (please correct me if I am wrong :)).
What you can try instead is to create a PDF document whose pages have different sizes (in case the images in the TIFF file have different dimensions).
Try the following code. It sets the size of each PDF page to the size of the corresponding image before creating that page. Because the page size follows the image size, no scaling is applied and the resolution is maintained :)
import java.io.FileOutputStream;
import java.io.IOException;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Image;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.RandomAccessFileOrArray;
import com.itextpdf.text.pdf.codec.TiffImage;
public class Tiff2Pdf {
/**
* @param args
* @throws DocumentException
* @throws IOException
*/
public static void main(String[] args) throws DocumentException,
IOException {
String imgeFilename = "/home/saurabh/Downloads/image.tif";
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(
document,
new FileOutputStream("/home/saurabh/Desktop/out"
+ Math.random() + ".pdf"));
writer.setStrictImageSequence(true);
document.open();
document.add(new Paragraph("Multipages tiff file"));
Image image;
RandomAccessFileOrArray ra = new RandomAccessFileOrArray(imgeFilename);
int pages = TiffImage.getNumberOfPages(ra);
for (int i = 1; i <= pages; i++) {
image = TiffImage.getTiffImage(ra, i);
Rectangle pageSize = new Rectangle(image.getWidth(),
image.getHeight());
document.setPageSize(pageSize);
document.add(image);
document.newPage();
}
document.close();
}
}
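On the resolution question itself: PDF page and image sizes are measured in user units of 1/72 inch by default, so an image that is W pixels wide at D dpi occupies W × 72 / D points at its native resolution. The sketch below works through the numbers from the question (1728 × 2156 pixels at 204 × 196 ppi) using only the standard library. The class and method names are made up for illustration and are not part of iText, and the calculation assumes the resolution stored in the TIFF is correct.

```java
// Illustration only: PixelsToPoints and toPoints are hypothetical names, not iText API.
final class PixelsToPoints {

    /** Default PDF user space has 72 units (points) per inch. */
    static final float POINTS_PER_INCH = 72f;

    /** Returns the size in points of `pixels` pixels at `dpi` pixels per inch. */
    static float toPoints(int pixels, float dpi) {
        if (dpi <= 0f) {
            throw new IllegalArgumentException("dpi must be positive");
        }
        return pixels * POINTS_PER_INCH / dpi;
    }

    public static void main(String[] args) {
        // Numbers taken from the question: 1728 x 2156 pixels at 204 x 196 ppi.
        float widthPt = toPoints(1728, 204f);
        float heightPt = toPoints(2156, 196f);
        System.out.println(widthPt + " x " + heightPt + " points");
    }
}
```

For the fax-style image in the question this gives roughly 609.9 × 792 points, which is close to US Letter (612 × 792). Those are the kind of values one would pass to a call such as scaleAbsolute, and to the page Rectangle, instead of the raw pixel counts.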

I've found that this line doesn't work well, because a new page size only takes effect on the next page:
document.setPageSize(pageSize);
If your TIFF files contain only one image, you're better off using this instead:
RandomAccessFileOrArray ra = new RandomAccessFileOrArray(imageFilePath);
Image image = TiffImage.getTiffImage(ra, 1);
Rectangle pageSize = new Rectangle(image.getWidth(), image.getHeight());
Document document = new Document(pageSize);
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(outputFileName));
writer.setStrictImageSequence(true);
document.open();
document.add(image);
document.newPage();
document.close();
This will result in a page size that fits the image size exactly, so no scaling is required.

Here is another example that avoids deprecated APIs (as of iText 5.5) and fixes the first-page issue. I'm using iText 5.5.11.
import java.io.FileOutputStream;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import com.itextpdf.text.Document;
import com.itextpdf.text.Image;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.io.FileChannelRandomAccessSource;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.RandomAccessFileOrArray;
import com.itextpdf.text.pdf.codec.TiffImage;
public class Test1 {
public static void main(String[] args) throws Exception {
RandomAccessFile aFile = new RandomAccessFile("/myfolder/origin.tif", "r");
FileChannel inChannel = aFile.getChannel();
FileChannelRandomAccessSource fcra = new FileChannelRandomAccessSource(inChannel);
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream("/myfolder/destination.pdf"));
document.open();
RandomAccessFileOrArray rafa = new RandomAccessFileOrArray(fcra);
int pages = TiffImage.getNumberOfPages(rafa);
Image image;
for (int i = 1; i <= pages; i++) {
image = TiffImage.getTiffImage(rafa, i);
Rectangle pageSize = new Rectangle(image.getWidth(), image.getHeight());
document.setPageSize(pageSize);
document.newPage();
document.add(image);
}
document.close();
aFile.close();
}
}

Related

Merge PDF documents and images into one PDF

I have read the examples in the section on merging PDF documents, but I couldn't come up with a more efficient solution for the following task:
I would like to merge a series of PDF and image files that arrive in any order (original post). The inefficiency comes from the fact that I need to create a dummy one-page PDF for each image using PdfWriter and then read it back from a byte array using PdfReader.
Question: Is there a more efficient way of doing the same (maybe via PdfCopy#addPage())?
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfSmartCopy;
import com.itextpdf.text.pdf.PdfWriter;
/**
* Helper class that creates PDF from given image(s) (JPEG, PNG, ...) or PDFs.
*/
public class MergeToPdf {
public static void main(String[] args) throws IOException, DocumentException {
if (args.length < 2) {
System.err.println("At least two arguments are required: in1.pdf [, image2.jpg ...], out.pdf");
System.exit(1);
}
Document mergedDocument = new Document();
PdfSmartCopy pdfCopy = new PdfSmartCopy(mergedDocument, new FileOutputStream(args[args.length - 1]));
mergedDocument.open();
for (int i = 0; i < args.length - 1; i++) {
PdfReader reader;
if (args[i].toLowerCase().endsWith(".pdf")) {
System.out.println("Adding PDF " + args[i] + "...");
// Copy PDF document:
reader = new PdfReader(args[i]);
}
else {
System.out.println("Adding image " + args[i] + "...");
final ByteArrayOutputStream byteStream = new ByteArrayOutputStream();
final Document imageDocument = new Document();
PdfWriter.getInstance(imageDocument, byteStream);
imageDocument.open();
// Create single page with the dimensions as source image and no margins:
Image image = Image.getInstance(args[i]);
image.setAbsolutePosition(0, 0);
imageDocument.setPageSize(image);
imageDocument.newPage();
imageDocument.add(image);
imageDocument.close();
// Copy PDF document with only one page carrying the image:
reader = new PdfReader(byteStream.toByteArray());
}
pdfCopy.addDocument(reader);
reader.close();
}
mergedDocument.close();
}
}

Converting Tiff to PDF: PDF is corrupted

I followed this example of iText 7 to convert a multi-page Tiff into a multi-page PDF, but when I open the PDF it's corrupted. Adobe Reader displays an error and Chrome shows this:
(Every page looks like that, but they aren't identical).
This is the code I used:
File newPdfFile = new File("<path...>/converted_file.pdf");
URL tiffUrl = UrlUtil.toURL("<path...>/original_file.tif");
IRandomAccessSource ras = new RandomAccessSourceFactory().createSource(tiffUrl);
RandomAccessFileOrArray rafoa = new RandomAccessFileOrArray(ras);
int numberOfPages = TiffImageData.getNumberOfPages(rafoa);
PdfDocument pdf = new PdfDocument(new PdfWriter(new FileOutputStream(newPdfFile)));
Document document = new Document(pdf);
for(int i = 1; i <= numberOfPages; ++i) {
Image image = new Image(ImageDataFactory.createTiff(tiffUrl, true, i, true));
document.add(image);
}
document.close();
pdf.close();
And this is the code I used with iText 5.5.11, which works but uses a deprecated constructor of RandomAccessFileOrArray:
File newPdfFile = new File("<path...>/converted_file.pdf");
RandomAccessFileOrArray rafoa = new RandomAccessFileOrArray("<path...>/original_file.tif");
int numberOfPages = TiffImage.getNumberOfPages(rafoa);
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(newPdfFile));
document.open();
for (int i = 1; i <= numberOfPages; ++i) {
Image image = TiffImage.getTiffImage(rafoa, i);
Rectangle pageSize = new Rectangle(image.getWidth(), image.getHeight());
document.setPageSize(pageSize);
document.newPage();
document.add(image);
}
document.close();
Unfortunately I can't provide sample files because they are confidential/classified...
What could be the issue?
P.S.: I tried with the same tiff used in the example code I followed and it works. What's wrong with my tiffs? In the file properties, other than the dimensions and resolution there's:
Bit depth: 1
Compression: CCITT T.4
Resolution unit: 2

OK, thanks to Michaël Demey's suggestions I managed to get a proper PDF using iText 7.
Here are the Maven dependencies:
<dependency>
<groupId>com.sun.media</groupId>
<artifactId>jai_imageio</artifactId>
<version>1.1</version>
</dependency>
<dependency>
<groupId>com.itextpdf</groupId>
<artifactId>layout</artifactId>
<version>7.0.3</version>
</dependency>
And here's the code:
import com.itextpdf.io.image.ImageDataFactory;
import com.itextpdf.kernel.geom.PageSize;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.Image;
import java.io.File;
import java.io.FileOutputStream;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
[...]
File newPdfFile = new File("<path...>/converted_file.pdf");
ImageReader reader = ImageIO.getImageReadersByFormatName("TIFF").next();
reader.setInput(ImageIO.createImageInputStream(new File("<path...>/original_file.tif")));
int numberOfPages = reader.getNumImages(true);
PdfDocument pdf = new PdfDocument(new PdfWriter(new FileOutputStream(newPdfFile)));
Document document = new Document(pdf);
for(int i = 0; i < numberOfPages; ++i) {// in javax.imageio.ImageReader they start from 0!
java.awt.Image img = reader.read(i);
Image tempImage = new Image(ImageDataFactory.create(img, null));
pdf.addNewPage(new PageSize(tempImage.getImageWidth(), tempImage.getImageHeight()));
tempImage.setFixedPosition(i + 1, 0, 0);
document.add(tempImage);
}
document.close();
pdf.close();

I faced the same issue. I was able to convert files with the "TIFF" extension, but files with the "TIF" extension were not converted properly. What eventually worked for me was changing the last argument of createTiff (the direct flag) from true to false. Try changing
Image image = new Image(ImageDataFactory.createTiff(tiffUrl, true, i, true));
to
Image image = new Image(ImageDataFactory.createTiff(tiffUrl, true, i, false));
This made my conversion work. I hope it works for you as well.

Java - extract text from pdf from selected area to txt

The idea is as follows:
The user selects a PDF file, which is then converted into an image, and that image is displayed in the application.
On the image, the user selects the areas they want to read from the PDF file. When the selection is finished, the program reads the original PDF in the background and stores the text from those areas in a txt file.
It is important that the resulting image has the same dimensions as the PDF page itself.
The following code converts the PDF to images. I use pdfrenderer-0.9.1.jar.
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import javax.imageio.ImageIO;
import com.sun.pdfview.PDFFile;
import com.sun.pdfview.PDFPage;
public class Pdf2Image {
public static void main(String[] args) {
File file = new File("E:\\invoice-template-1.pdf");
RandomAccessFile raf;
try {
raf = new RandomAccessFile(file, "r");
FileChannel channel = raf.getChannel();
ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
PDFFile pdffile = new PDFFile(buf);
// draw each page to an image
int num=pdffile.getNumPages();
for(int i=0;i<num;i++)
{
PDFPage page = pdffile.getPage(i);
//get the width and height for the doc at the default zoom
int width=(int)page.getBBox().getWidth();
int height=(int)page.getBBox().getHeight();
Rectangle rect = new Rectangle(0,0,width,height);
int rotation=page.getRotation();
Rectangle rect1=rect;
if(rotation==90 || rotation==270)
rect1=new Rectangle(0,0,rect.height,rect.width);
//generate the image
BufferedImage img = (BufferedImage)page.getImage(
rect.width, rect.height, //width & height
rect1, // clip rect
null, // null for the ImageObserver
true, // fill background with white
true // block until drawing is done
);
ImageIO.write(img, "png", new File("E:/invoice-template-"+i+".png"));
}
}
catch (FileNotFoundException e1) {
System.err.println(e1.getLocalizedMessage());
} catch (IOException e) {
System.err.println(e.getLocalizedMessage());
}
}
}
The image is then displayed to the user in a JavaFX application, in an ImageView component.
Can you help me get the exact mouse position when the user selects the portion of the image whose text should be read from the PDF file?
With the code below I read the PDF file and get the text from a given position, but I have to enter the position manually :(. I use pdfbox-1.3.1.jar.
I would like to keep the positions the user selects on the image in a list and then read the text from the PDF file at all of those positions.
File file = new File("E:/invoice-template-1.pdf");
PDDocument document = PDDocument.load(file);
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.setSortByPosition(true);
Rectangle rect1 = new Rectangle(38, 275, 15, 100);
Rectangle rect2 = new Rectangle(54, 275, 40, 100);
stripper.addRegion("row1column1", rect1);
stripper.addRegion("row1column2", rect2);
List<PDPage> pages = document.getDocumentCatalog().getAllPages();
int j = 0;
for (PDPage page : pages) {
// extract the text of every registered region on this page
stripper.extractRegions(page);
List<String> regions = stripper.getRegions();
for (String region : regions) {
String text = stripper.getTextForRegion(region);
System.out.println("Region: " + region + " on Page " + j);
System.out.println("\tText: \n" + text);
}
j++;
}
document.close();
For example, in the following invoice I want to select 4 areas to export text from. When an area is selected on the image, its dimensions should be kept in a list; the program then goes through the list and exports the text at those positions from the PDF file.
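One possible way to get from a mouse selection to a region for PDFTextStripperByArea, sketched under assumptions: if the ImageView displays the rendered page at a size different from the page size in points, the selection rectangle has to be scaled by pageSize / displayedSize before it is passed to addRegion. The helper below uses only java.awt types. The class name, method name, and the displayed size of 1190 × 1684 are hypothetical, and the sketch assumes both coordinate systems have their origin in the top-left corner.

```java
import java.awt.Rectangle;
import java.awt.geom.Rectangle2D;

// Illustration only: SelectionMapper is a hypothetical helper, not part of PDFBox or JavaFX.
final class SelectionMapper {

    /**
     * Scales a selection made on the displayed image (viewW x viewH pixels)
     * to the coordinate space of the PDF page (pageW x pageH points).
     * The result is rounded outwards so the region never shrinks.
     */
    static Rectangle toPageRegion(Rectangle2D selection,
                                  double viewW, double viewH,
                                  double pageW, double pageH) {
        if (viewW <= 0 || viewH <= 0) {
            throw new IllegalArgumentException("view size must be positive");
        }
        double sx = pageW / viewW;
        double sy = pageH / viewH;
        int x = (int) Math.floor(selection.getX() * sx);
        int y = (int) Math.floor(selection.getY() * sy);
        int w = (int) Math.ceil(selection.getWidth() * sx);
        int h = (int) Math.ceil(selection.getHeight() * sy);
        return new Rectangle(x, y, w, h);
    }

    public static void main(String[] args) {
        // An A4 page (595 x 842 points) assumed to be shown at twice its size.
        Rectangle2D selection = new Rectangle2D.Double(76, 550, 30, 200);
        Rectangle region = toPageRegion(selection, 1190, 1684, 595, 842);
        System.out.println(region);
    }
}
```

With these sample numbers the mapped region is (38, 275, 15, 100), the same values as rect1 in the snippet above. In the application, the selection rectangle would come from the mouse press and release coordinates on the ImageView, and each mapped rectangle would be stored in the list and registered with addRegion.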

Convert Tiff to Pdf in java using itext

I am using the code below to convert TIFF to PDF.
It works fine for TIFF images with dimensions 850*1100. But when I give it an input TIFF image with other dimensions (e.g. 1574*732, 684*353, or another 850*1100 image), I get the error below. Please help me convert TIFF images of different dimensions to PDF.
The error for the code below is: Compression JPEG is only supported with a single strip. This image has 45 strips.
RandomAccessFileOrArray myTifFile = null;
com.itextpdf.text.Document tiffToPDF= null;
PdfWriter pdfWriter = null;
try{
myTifFile = new RandomAccessFileOrArray(fileName);
int numberOfPages = TiffImage.getNumberOfPages(myTifFile);
tiffToPDF = new com.itextpdf.text.Document(PageSize.LETTER_LANDSCAPE);
String temp = fileName.substring(0, fileName.lastIndexOf("."));
pdfWriter = PdfWriter.getInstance(tiffToPDF, new FileOutputStream(temp+".pdf"));
pdfWriter.setStrictImageSequence(true);
tiffToPDF.open();
for(int tiffImageCounter = 1;tiffImageCounter <= numberOfPages;tiffImageCounter++)
{
Image img = TiffImage.getTiffImage(myTifFile, tiffImageCounter);
img.setAbsolutePosition(0,0);
img.scaleToFit(612,792);
tiffToPDF.add(img);
tiffToPDF.newPage();
}
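// close the document so the PDF is actually written out
tiffToPDF.close();
}
catch (Exception e) {
e.printStackTrace();
}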

This code shows how you can convert TIFF to PDF.
import com.itextpdf.text.pdf.RandomAccessFileOrArray;
//Read Tiff File, Get number of Pages
import com.itextpdf.text.pdf.codec.TiffImage;
//We need the library below to write the final
//PDF file which has our image converted to PDF
import java.io.FileOutputStream;
//The image class to extract separate images from Tiff image
import com.itextpdf.text.Image;
//PdfWriter object to write the PDF document
import com.itextpdf.text.pdf.PdfWriter;
//Document object to add logical image files to PDF
import com.itextpdf.text.Document;
public class TiffToPDF {
public static void main(String args[]){
try{
//Read the Tiff File
RandomAccessFileOrArray myTiffFile=new RandomAccessFileOrArray("c:\\java\\test.tif");
//Find number of images in Tiff file
int numberOfPages=TiffImage.getNumberOfPages(myTiffFile);
System.out.println("Number of images in TIFF file: " + numberOfPages);
Document TifftoPDF=new Document();
PdfWriter.getInstance(TifftoPDF, new FileOutputStream("c:\\java\\tiff2Pdf.pdf"));
TifftoPDF.open();
//Run a for loop to extract each image from the Tiff file
//into an Image object and add it to the PDF, one page at a time
for(int i=1;i<=numberOfPages;i++){
Image tempImage=TiffImage.getTiffImage(myTiffFile, i);
TifftoPDF.add(tempImage);
}
TifftoPDF.close();
System.out.println("Tiff to PDF Conversion in Java Completed" );
}
catch (Exception i1){
i1.printStackTrace();
}
}
}

Writing image into pdf file in java

I'm writing code that converts Microsoft PowerPoint (ppt) slides into images and writes the generated images into a PDF file. The following code generates the images and writes them into the PDF file, but the problem I'm facing is that each image exceeds the PDF page size, so I can see only about 75% of the image and the rest is cut off. Another thing to notice is that the images in the PDF file look zoomed or expanded. Take a look at the following snippet of code:
for (int i = 0; i < slide.length; i++) {
BufferedImage img = new BufferedImage(pgsize.width, pgsize.height, BufferedImage.TYPE_INT_RGB);
Graphics2D graphics = img.createGraphics();
graphics.setPaint(Color.white);
graphics.fill(new Rectangle(0, 0, pgsize.width, pgsize.height));
slide[i].draw(graphics);
fileName="C:/DATASTORE/slide-"+(i+1)+".png";
FileOutputStream out = new FileOutputStream(fileName);
javax.imageio.ImageIO.write(img, "png", out);
out.flush();
out.close();
com.lowagie.text.Image image =com.lowagie.text.Image.getInstance(fileName);
image.setWidthPercentage(40.0f);
doc.add((image));
}
doc.close();
} catch(DocumentException de) {
System.err.println(de.getMessage());
}
If anybody knows the solution, please help me fix this. Thank you.
Here is the code that accomplishes the task I wanted. I'm now getting the desired results after following Bruno Lowagie's recommendations.
But, as Bruno Lowagie pointed out earlier, there is a problem in the generated PNG image. The generated PNG is not correct because a shape or image in the slide overlaps with the text of the slide. Can you help me identify and fix the error?
import java.awt.Color;
import java.awt.Dimension;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import com.itextpdf.text.Image;
import java.awt.image.BufferedImage;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.hslf.model.Slide;
import org.apache.poi.hslf.usermodel.SlideShow;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.pdf.PdfWriter;
public class ConvertSlidesIntoImages {
public static void main(String[] args){
try {
FileInputStream is = new FileInputStream("C:/DATASTORE/testPPT.ppt");
SlideShow ppt = new SlideShow(is);
is.close();
String fileName;
Dimension pgsize = ppt.getPageSize();
Slide[] slide = ppt.getSlides();
Document doc=new Document();
PdfWriter.getInstance(doc, new FileOutputStream("c:/DATASTORE/convertPPTSlidesIntoPDFImages.pdf"));
doc.open();
for (int i = 0; i < slide.length; i++) {
BufferedImage img = new BufferedImage(pgsize.width, pgsize.height, BufferedImage.TYPE_INT_RGB);
Graphics2D graphics = img.createGraphics();
graphics.setPaint(Color.white);
graphics.fill(new Rectangle(0, 0, pgsize.width, pgsize.height));
slide[i].draw(graphics);
fileName="C:/DATASTORE/slide-"+(i+1)+".png";
FileOutputStream out = new FileOutputStream(fileName);
javax.imageio.ImageIO.write(img, "png", out);
out.flush();
out.close();
com.itextpdf.text.Image image =com.itextpdf.text.Image.getInstance(fileName);
doc.setPageSize(new com.itextpdf.text.Rectangle(image.getScaledWidth(), image.getScaledHeight()));
doc.newPage();
image.setAbsolutePosition(0, 0);
doc.add(image);
}
doc.close();
}catch(DocumentException de) {
System.err.println(de.getMessage());
}
catch(Exception ex) {
ex.printStackTrace();
}
}
}
Thank you

First this: If the png stored as "C:/DATASTORE/slide-"+(i+1)+".png" isn't correct, the slide in the PDF won't be correct either.
And this: Your code snippet doesn't show us how you create the Document object. By default, the page size is A4 in portrait. It goes without saying that images that are bigger than 595 x 842 don't fit that page.
Now the answer: There are two ways to solve this.
Either you change the size of the image (not with setWidthPercentage() unless you've calculated the actual percentage) and you add it at the position (0, 0) so that it doesn't take into account the margins. For instance:
image.scaleToFit(595, 842);
image.setAbsolutePosition(0, 0);
doc.add(image);
doc.newPage();
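As a rough illustration of what fitting an image into a box involves: a uniform fit multiplies both dimensions by the smaller of the two ratios boxWidth / imageWidth and boxHeight / imageHeight, so the aspect ratio is preserved. The sketch below is plain Java with hypothetical names and a made-up 720 × 540 image. It is not iText's implementation of scaleToFit, only the arithmetic one would expect from such a method.

```java
// Illustration only: FitToBox is a hypothetical helper, not part of iText.
final class FitToBox {

    /** Returns {width, height} of the image after scaling it uniformly to fit inside the box. */
    static float[] fit(float imgW, float imgH, float boxW, float boxH) {
        if (imgW <= 0f || imgH <= 0f) {
            throw new IllegalArgumentException("image size must be positive");
        }
        // The smaller ratio is the limiting dimension.
        float scale = Math.min(boxW / imgW, boxH / imgH);
        return new float[] { imgW * scale, imgH * scale };
    }

    public static void main(String[] args) {
        // A hypothetical 720 x 540 slide image on an A4 portrait page (595 x 842 points).
        float[] size = fit(720f, 540f, 595f, 842f);
        System.out.println(size[0] + " x " + size[1]);
    }
}
```

Here the width is the limiting dimension, so the image ends up about 595 points wide and about 446 points high, leaving the rest of the page empty. That unused space is one reason why adapting the page to the image can be the better choice.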
A better solution would be to adapt the size of the page to the size of the image.
Document doc = new Document(new Rectangle(image.getScaledWidth(), image.getScaledHeight()));
// create a writer, open the document
image.setAbsolutePosition(0, 0);
doc.add(image);
doc.newPage();
If the size of the images varies, you can change the page size while adding images like this:
doc.setPageSize(new Rectangle(image.getScaledWidth(), image.getScaledHeight()));
doc.newPage();
image.setAbsolutePosition(0, 0);
doc.add(image);
It is important to understand that the new page size will only come into effect after doc.newPage();
CAVEAT 1:
If your PDF only holds the last slide, you're probably putting all the slides on the same page, and the last slide covers them all. You need to invoke the newPage() method each time you add an image (as done in a code snippet in my answer).
CAVEAT 2:
Your allegation is wrong. According to the API docs, there is a method setPageSize(Rectangle rect), maybe you're using the wrong Rectangle class. If you didn't follow my advice (which IMHO wouldn't be wise), you're probably looking for com.lowagie.text.Rectangle instead of java.awt.Rectangle.
CAVEAT 3:
This is similar to CAVEAT 2: there are indeed no such methods in the class java.awt.Image, but as documented in the API docs, the class com.itextpdf.text.Image has a getScaledWidth() method and a getScaledHeight() method.
