I am using the below code for converting tiff to pdf
It works fine for tiff images of dimensions 850*1100.But when I am trying to give the input tiff image of dimensions(Eg :- 1574*732, 684*353 or other 850*1100), I am getting the below error. Please help me how to convert tiff images of different dimensions to pdf.
Error Occured for below code .Compression JPEG is only supported with a single strip. This image has 45 strips.
RandomAccessFileOrArray myTifFile = null;
com.itextpdf.text.Document tiffToPDF= null;
PdfWriter pdfWriter = null;
try{
myTifFile = new RandomAccessFileOrArray(fileName);
int numberOfPages = TiffImage.getNumberOfPages(myTifFile);
tiffToPDF = new com.itextpdf.text.Document(PageSize.LETTER_LANDSCAPE);
String temp = fileName.substring(0, fileName.lastIndexOf("."));
pdfWriter = PdfWriter.getInstance(tiffToPDF, new FileOutputStream(temp+".pdf"));
pdfWriter.setStrictImageSequence(true);
tiffToPDF.open();
for(int tiffImageCounter = 1;tiffImageCounter <= numberOfPages;tiffImageCounter++)
{
Image img = TiffImage.getTiffImage(myTifFile, tiffImageCounter);
img.setAbsolutePosition(0,0);
img.scaleToFit(612,792);
tiffToPDF.add(img);
tiffToPDF.newPage();
}
}
This code will explain how you can convert tiff to pdf.. more information can be found here and here
import com.itextpdf.text.pdf.RandomAccessFileOrArray;
//Read Tiff File, Get number of Pages
import com.itextpdf.text.pdf.codec.TiffImage;
//We need the library below to write the final
//PDF file which has our image converted to PDF
import java.io.FileOutputStream;
//The image class to extract separate images from Tiff image
import com.itextpdf.text.Image;
//PdfWriter object to write the PDF document
import com.itextpdf.text.pdf.PdfWriter;
//Document object to add logical image files to PDF
import com.itextpdf.text.Document;
public class TiffToPDF {
public static void main(String args[]){
try{
//Read the Tiff File
RandomAccessFileOrArray myTiffFile=new RandomAccessFileOrArray("c:\\java\\test.tif");
//Find number of images in Tiff file
int numberOfPages=TiffImage.getNumberOfPages(myTiffFile);
System.out.println("Number of Images in Tiff File" + numberOfPages);
Document TifftoPDF=new Document();
PdfWriter.getInstance(TifftoPDF, new FileOutputStream("c:\\java\\tiff2Pdf.pdf"));
TifftoPDF.open();
//Run a for loop to extract images from Tiff file
//into a Image object and add to PDF recursively
for(int i=1;i<=numberOfPages;i++){
Image tempImage=TiffImage.getTiffImage(myTiffFile, i);
TifftoPDF.add(tempImage);
}
TifftoPDF.close();
System.out.println("Tiff to PDF Conversion in Java Completed" );
}
catch (Exception i1){
i1.printStackTrace();
}
}
}
Related
I need to get data from the pdf file with its header for further comparing with DB data
I tried to use the pdfbox , google vision ocr , itext, but all libraries gave me a row without structure and headers.
Example: Date\nNumber\nStatus\n12\12\2020\n442334\delivered
I will trying convert pdf to excel/word and get data from them, but for this realisation i need reading pdf and write data in excel/word
How can I get data with headers?
"Date\nNumber\nStatus\n12/12/2020\n442334\ndelivered" looks structured enough to me. You could just split it at the "\n"s. That would require some knowledge of the table structure, though.
I've made good experience with Google Vision OCR. How are you calling it?
I not found answer on my question.
I'm use this code for my task :
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.text.PDFTextStripperByArea;
import java.awt.*;
import java.io.File;
import java.io.IOException;
public class ExtractTextByArea {
public String getTextFromCoordinate(String filepath,int x,int y,int width,int height) {
String result = "";
try (PDDocument document = PDDocument.load(new File(filepath))) {
if (!document.isEncrypted()) {
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.setSortByPosition(true);
// Rectangle rect = new Rectangle(260, 35, 70, 10);
Rectangle rect = new Rectangle(x,y,width,height);
stripper.addRegion("class1", rect);
PDPage firstPage = document.getPage(0);
stripper.extractRegions( firstPage );
// System.out.println("Text in the area:" + rect);
result = stripper.getTextForRegion("class1");
}
} catch (IOException e){
System.err.println("Exception while trying to read pdf document - " + e);
}
return result;
}
}
Please let me know what method can be used to convert pdf to image in iText7.
In Itexsharp, there was an option to convert pdf file to images. Following is the link. PDF to Image Using iTextSharp
http://www.c-sharpcorner.com/UploadFile/a0927b/create-pdf-document-and-convert-to-image-programmatically/
Below is the sample code created using the following refernce link.
itext7 pdf to image
this is not working as expected. It is not converting the pdf to image. It is creating a 1kb blank image.
string fileName = System.IO.Path.GetFileNameWithoutExtension(inputFilePath);
var pdfReader = new PdfReader(inputFilePath);
var pdfDoc = new iText.Kernel.Pdf.PdfDocument(pdfReader);
int pagesLength = pdfDoc.GetNumberOfPages()+1;
for (int i = 1; i < pagesLength; i++)
{
if (!File.Exists(System.IO.Path.Combine(imageFileDir, fileName + "_" +
`enter code here`(startIndex + i) + ".png")) && i < pagesLength)
{
PdfPage pdfPages = pdfDoc.GetPage(i);
PdfWriter writer = new PdfWriter(System.IO.Path.Combine(imageFileDir, fileName + "_" + (startIndex + i) + ".png"), new WriterProperties().SetFullCompressionMode(true));
PdfDocument pdf = new PdfDocument(writer);
PdfFormXObject pageCopy = pdfPages.CopyAsFormXObject(pdf);
iText.Layout.Element.Image image = new iText.Layout.Element.Image(pageCopy);
}
}
Quoting Bruno:
iText does not convert PDFs to raster images (such as .jpg, .png,...). You are misinterpreting the examples that create an Image instance based on an existing page. Those examples create an XObject that can be reused in a new PDF as if it were a vector image; they don't convert a PDF page to a raster image.
What you can use for this (which is what we at iText internally use for testing) is GhostScript. It takes a pdf as input and converts it to a series of images (one image per page).
I've successfully converted JPEG to Pdf using Java, but don't know how to convert Pdf to Word using Java, the code for converting JPEG to Pdf is given below.
Can anyone tell me how to convert Pdf to Word (.doc/ .docx) using Java?
import java.io.FileOutputStream;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.Document;
public class JpegToPDF {
public static void main(String[] args) {
try {
Document convertJpgToPdf = new Document();
PdfWriter.getInstance(convertJpgToPdf, new FileOutputStream(
"c:\\java\\ConvertImagetoPDF.pdf"));
convertJpgToPdf.open();
Image convertJpg = Image.getInstance("c:\\java\\test.jpg");
convertJpgToPdf.add(convertJpg);
convertJpgToPdf.close();
System.out.println("Successfully Converted JPG to PDF in iText");
} catch (Exception i1) {
i1.printStackTrace();
}
}
}
In fact, you need two libraries. Both libraries are open source. The first one is iText, it is used to extract the text from a PDF file. The second one is POI, it is ued to create the word document.
The code is quite simple:
//Create the word document
XWPFDocument doc = new XWPFDocument();
// Open the pdf file
String pdf = "myfile.pdf";
PdfReader reader = new PdfReader(pdf);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
// Read the PDF page by page
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
TextExtractionStrategy strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
// Extract the text
String text=strategy.getResultantText();
// Create a new paragraph in the word document, adding the extracted text
XWPFParagraph p = doc.createParagraph();
XWPFRun run = p.createRun();
run.setText(text);
// Adding a page break
run.addBreak(BreakType.PAGE);
}
// Write the word document
FileOutputStream out = new FileOutputStream("myfile.docx");
doc.write(out);
// Close all open files
out.close();
reader.close();
Beware: With the used extraction strategy, you will lose all formatting. But you can fix this, by inserting your own, more complex extraction strategy.
You can use 7-pdf library
have a look at this it may help :
http://www.7-pdf.de/sites/default/files/guide/manuals/library/index.html
PS: itext has some issues when given file is non RGB image, try this out!!
Although it's far from being a pure Java solution OpenOffice/LibreOfffice allows one to connect to it through a TCP port; it's possible to use that to convert documents. If this looks like an acceptable solution, JODConverter can help you.
I converted .docx file to .pdf file, the text is converting fine, but the images in the .docx file is not appearing, instead it is represented as some special characters, below is my code:
import com.lowagie.text.Document;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;
import java.io.File;
import java.io.FileOutputStream;
public class PDFConversion {
/**
* 14. This method is used to convert the given file to a PDF format 15.
*
* #param inputFile
* - Name and the path of the file 16.
* #param outputFile
* - Name and the path where the PDF file to be saved 17.
* #param isPictureFile
* 18.
*/
private void createPdf(String inputFile, String outputFile, boolean isPictureFile) {
/**
* 22. Create a new instance for Document class 23.
*/
Document pdfDocument = new Document();
String pdfFilePath = outputFile;
try {
FileOutputStream fileOutputStream = new FileOutputStream(pdfFilePath);
PdfWriter writer = null;
writer = PdfWriter.getInstance(pdfDocument, fileOutputStream);
writer.open();
pdfDocument.open();
/**
* 34. Proceed if the file given is a picture file 35.
*/
if (isPictureFile) {
pdfDocument.add(com.lowagie.text.Image.getInstance(inputFile));
}
/**
* 41. Proceed if the file given is (.txt,.html,.doc etc) 42.
*/
else {
File file = new File(inputFile);
pdfDocument.add(new Paragraph(org.apache.commons.io.FileUtils
.readFileToString(file)));
}
pdfDocument.close();
writer.close();
} catch (Exception exception) {
System.out.println("Document Exception!" + exception);
}
}
public static void main(String args[]) {
PDFConversion pdfConversion = new PDFConversion();
pdfConversion.createPdf("C:/Users/LENOVO/Downloads/The_JFileChooser_Component.doc",
"E:/The_JFileChooser_Component.pdf", false);
// For other files
// pdfConversion.createPdf("C:/shunmuga/sample.html",
// "C:/shunmuga/sample.pdf", false);
}
}
I'm not sure what it could be, but for some alternatives have a look at:
Apose.Words Library for Java it has some really cool features one of them being docx to pdf conversion by a few simple lines (and it's reliable):
Document doc = new Document("d:/test/mydoc.docx");
doc.Save("d:/test/Out.pdf", SaveFormat.Pdf);
Docx4j
which can be used to convert docx and many others to PDF, it does this by first using HTML/XML based on IText then converts it to a PDF (All libararies are included within docx4j, just added the itext link for completeness):
org.docx4j.convert.out.pdf.PdfConversion c
= new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(wordMLPackage);//using xml
// = new org.docx4j.convert.out.pdf.viaHTML.Conversion(wordMLPackage);//using html
// = new org.docx4j.convert.out.pdf.viaIText.Conversion(wordMLPackage);//using itext libs
If that's not enough it has sample source code for you to try.
xdocreport also comes with a lot of samples for conversion (haven't downloaded them, but it should have the doc/docx to PDF converter source)
I'm currently generating PDF files from TIFF images using iText.
Basically the procedure is as follows:
1. Read the TIFF file.
2. For each "page" of the TIFF, instantiate an Image object and write that to a Document instance, which is the PDF file.
I'm having a hard time understanding how to add those images to the PDF keeping the original resolution.
I've tried to scale the Image to the dimensions in pixels of the original image of the TIFF, for instance:
// Pixel Dimensions 1728 × 2156 pixels
// Resolution 204 × 196 ppi
RandomAccessFileOrArray tiff = new RandomAccessFileOrArray("/path/to/tiff/file");
Document pdf = new Document(PageSize.LETTER);
Image temp = TiffImage.getTiffImage(tiff, page);
temp.scaleAbsolute(1728f, 2156f);
pdf.add(temp);
I would really appreciate if someone can shed some light on this. Perhaps I'm missing the functionality of the Image class methods...
Thanks in advance!
I think if you scale the image then you can not retain the original resolution (please correct me if I am wrong :)).
What you can try doing is to creat a PDF document with different sized pages (if images are of different resolution in the tif image).
Try the following code. It sets the size of PDF page equal to that of image file and then create that PDF page. the PDF page size varies according to the image size so the resolution is maintained :)
import java.io.FileOutputStream;
import java.io.IOException;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Image;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.RandomAccessFileOrArray;
import com.itextpdf.text.pdf.codec.TiffImage;
public class Tiff2Pdf {
/**
* #param args
* #throws DocumentException
* #throws IOException
*/
public static void main(String[] args) throws DocumentException,
IOException {
String imgeFilename = "/home/saurabh/Downloads/image.tif";
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(
document,
new FileOutputStream("/home/saurabh/Desktop/out"
+ Math.random() + ".pdf"));
writer.setStrictImageSequence(true);
document.open();
document.add(new Paragraph("Multipages tiff file"));
Image image;
RandomAccessFileOrArray ra = new RandomAccessFileOrArray(imgeFilename);
int pages = TiffImage.getNumberOfPages(ra);
for (int i = 1; i <= pages; i++) {
image = TiffImage.getTiffImage(ra, i);
Rectangle pageSize = new Rectangle(image.getWidth(),
image.getHeight());
document.setPageSize(pageSize);
document.add(image);
document.newPage();
}
document.close();
}
}
I've found that this line doesn't work well:
document.setPageSize(pageSize);
If your TIFF files only contain one image then you're better off using this instead:
RandomAccessFileOrArray ra = new RandomAccessFileOrArray(imageFilePath);
Image image = TiffImage.getTiffImage(ra, 1);
Rectangle pageSize = new Rectangle(image.getWidth(), image.getHeight());
Document document = new Document(pageSize);
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(outputFileName));
writer.setStrictImageSequence(true);
document.open();
document.add(image);
document.newPage();
document.close();
This will result in a page size that fits the image size exactly, so no scaling is required.
Another example non-deprecated up to iText 5.5 with the first page issue fixed. I'm using 5.5.11 Itext.
import java.io.FileOutputStream;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import com.itextpdf.text.Document;
import com.itextpdf.text.Image;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.io.FileChannelRandomAccessSource;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.RandomAccessFileOrArray;
import com.itextpdf.text.pdf.codec.TiffImage;
public class Test1 {
public static void main(String[] args) throws Exception {
RandomAccessFile aFile = new RandomAccessFile("/myfolder/origin.tif", "r");
FileChannel inChannel = aFile.getChannel();
FileChannelRandomAccessSource fcra = new FileChannelRandomAccessSource(inChannel);
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream("/myfolder/destination.pdf"));
document.open();
RandomAccessFileOrArray rafa = new RandomAccessFileOrArray(fcra);
int pages = TiffImage.getNumberOfPages(rafa);
Image image;
for (int i = 1; i <= pages; i++) {
image = TiffImage.getTiffImage(rafa, i);
Rectangle pageSize = new Rectangle(image.getWidth(), image.getHeight());
document.setPageSize(pageSize);
document.newPage();
document.add(image);
}
document.close();
aFile.close();
}
}