PDFBox locks JPEG input file until application exits - java

I'm using PDFBox RC2 in a Windows 7 environment, Java 1.8_66. I'm using it to create a PDF from a collection of 200dpi page-sized image files, both JPEG and PNG.
It turns out that when adding JPEG files to a PDF, the PDImageXObject.createFromFile() routine fails to close an internal file handle, thus locking the image file for the lifetime of the application. When adding PNG files to a PDF, there is no problem.
Here's some sample code that reproduces the issue. Using process explorer (from sysinternals), view the open file handles for the java.exe process and run this code. My test uses about 20 full sized JPEG files. Note that after the method exits, several locked files still remain behind.
public Boolean CreateFromImages_Broken(String pdfFilename, String[] imageFilenames) {
PDDocument doc = new PDDocument();
for (String imageFilename : imageFilenames) {
try {
PDPage page = new PDPage();
doc.addPage(page);
PDImageXObject pdImage = PDImageXObject.createFromFile(imageFilename, doc);
// at this point, if the imageFilename is a jpeg, pdImage holds onto a handle for
// the given imageFilename and that file remains locked until the application is closed
try (PDPageContentStream contentStream = new PDPageContentStream(doc, page)) {
float scale = (float)72.0 / 200;
page.setMediaBox(new PDRectangle((int)(pdImage.getWidth() * scale), (int)(pdImage.getHeight() * scale)));
contentStream.drawImage(pdImage, 0, 0, pdImage.getWidth()*scale, pdImage.getHeight()*scale);
}
} catch (IOException ioe) {
return false;
}
}
try {
doc.save(pdfFilename);
doc.close();
} catch (IOException ex) {
return false;
}
return true;
}

As a workaround, I reviewed the source code for PNG and JPEG handling, and I've had success by implementing this, which seems to work for both file types:
public Boolean CreateFromImages_FIXED(String pdfFilename, String[] imageFilenames) {
PDDocument doc = new PDDocument();
for (String imageFilename : imageFilenames) {
FileInputStream fis = null;
try {
PDPage page = new PDPage();
doc.addPage(page);
PDImageXObject pdImage = null;
// work around JPEG issue by opening up our own stream, with which
// we can close ourselves instead of PDFBOX leaking it. For PNG
// images, the createFromFile seems to be OK
if (imageFilename.toLowerCase().endsWith(".jpg")) {
fis = new FileInputStream(new File(imageFilename));
pdImage = JPEGFactory.createFromStream(doc, fis);
} else {
pdImage = PDImageXObject.createFromFile(imageFilename, doc);
}
try (PDPageContentStream contentStream = new PDPageContentStream(doc, page)) {
float scale = (float)72.0 / 200;
page.setMediaBox(new PDRectangle((int)(pdImage.getWidth() * scale), (int)(pdImage.getHeight() * scale)));
contentStream.drawImage(pdImage, 0, 0, pdImage.getWidth()*scale, pdImage.getHeight()*scale);
if (fis != null) {
fis.close();
fis = null;
}
}
} catch (IOException ioe) {
return false;
}
}
try {
doc.save(pdfFilename);
doc.close();
} catch (IOException ex) {
return false;
}
return true;
}

Related

AcroForm not visible when merging documents

I'm trying to merge documents side by side with PDFBox, using the following code:
function void generateSideBySidePDF() {
File pdf1File = new File(FILE1_PATH);
File pdf2File = new File(FILE2_PATH);
File outPdfFile = new File(OUTFILE_PATH);
PDDocument pdf1 = null;
PDDocument pdf2 = null;
PDDocument outPdf = null;
try {
pdf1 = PDDocument.load(pdf1File);
pdf2 = PDDocument.load(pdf2File);
outPdf = new PDDocument();
// Create output PDF frame
PDRectangle pdf1Frame = pdf1.getPage(0).getCropBox();
PDRectangle pdf2Frame = pdf2.getPage(0).getCropBox();
PDRectangle outPdfFrame = new PDRectangle(pdf1Frame.getWidth()+pdf2Frame.getWidth(), Math.max(pdf1Frame.getHeight(), pdf2Frame.getHeight()));
// Create output page with calculated frame and add it to the document
COSDictionary dict = new COSDictionary();
dict.setItem(COSName.TYPE, COSName.PAGE);
dict.setItem(COSName.MEDIA_BOX, outPdfFrame);
dict.setItem(COSName.CROP_BOX, outPdfFrame);
dict.setItem(COSName.ART_BOX, outPdfFrame);
PDPage outPdfPage = new PDPage(dict);
outPdf.addPage(outPdfPage);
// Source PDF pages has to be imported as form XObjects to be able to insert them at a specific point in the output page
LayerUtility layerUtility = new LayerUtility(outPdf);
PDFormXObject formPdf1 = layerUtility.importPageAsForm(pdf1, 0);
PDFormXObject formPdf2 = layerUtility.importPageAsForm(pdf2, 0);
// Add form objects to output page
AffineTransform afLeft = new AffineTransform();
layerUtility.appendFormAsLayer(outPdfPage, formPdf1, afLeft, "left");
AffineTransform afRight = AffineTransform.getTranslateInstance(pdf1Frame.getWidth(), 0.0);
layerUtility.appendFormAsLayer(outPdfPage, formPdf2, afRight, "right");
outPdf.save(outPdfFile);
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (pdf1 != null) pdf1.close();
if (pdf2 != null) pdf2.close();
if (outPdf != null) outPdf.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
However the form fields contained in the original documents are not displayed in the final PDF. I also tried to set the acroform on the final document, doing:
outDoc.getDocumentCatalog().setAcroForm(acroForm);
but it doesn't work.

How to convert pdf file in to multipagetiff images using pdfBox? [duplicate]

I'm trying to convert PDFs as represented by the org.apache.pdfbox.pdmodel.PDDocument class and the icafe library (https://github.com/dragon66/icafe/) to a multipage tiff with group 4 compression and 300 dpi. The sample code works for me for 288 dpi but strangely NOT for 300 dpi, the exported tiff remains just white. Has anybody an idea what the issue is here?
The sample pdf which I use in the example is located here: http://www.bergophil.ch/a.pdf
import java.awt.image.BufferedImage;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import cafe.image.ImageColorType;
import cafe.image.ImageParam;
import cafe.image.options.TIFFOptions;
import cafe.image.tiff.TIFFTweaker;
import cafe.image.tiff.TiffFieldEnum.Compression;
import cafe.io.FileCacheRandomAccessOutputStream;
import cafe.io.RandomAccessOutputStream;
public class Pdf2TiffConverter {
public static void main(String[] args) {
String pdf = "a.pdf";
PDDocument pddoc = null;
try {
pddoc = PDDocument.load(pdf);
} catch (IOException e) {
}
try {
savePdfAsTiff(pddoc);
} catch (IOException e) {
}
}
private static void savePdfAsTiff(PDDocument pdf) throws IOException {
BufferedImage[] images = new BufferedImage[pdf.getNumberOfPages()];
for (int i = 0; i < images.length; i++) {
PDPage page = (PDPage) pdf.getDocumentCatalog().getAllPages()
.get(i);
BufferedImage image;
try {
// image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 288); //works
image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 300); // does not work
images[i] = image;
} catch (IOException e) {
e.printStackTrace();
}
}
FileOutputStream fos = new FileOutputStream("a.tiff");
RandomAccessOutputStream rout = new FileCacheRandomAccessOutputStream(
fos);
ImageParam.ImageParamBuilder builder = ImageParam.getBuilder();
ImageParam[] param = new ImageParam[1];
TIFFOptions tiffOptions = new TIFFOptions();
tiffOptions.setTiffCompression(Compression.CCITTFAX4);
builder.imageOptions(tiffOptions);
builder.colorType(ImageColorType.BILEVEL);
param[0] = builder.build();
TIFFTweaker.writeMultipageTIFF(rout, param, images);
rout.close();
fos.close();
}
}
Or is there another library to write multi-page TIFFs?
EDIT:
Thanks to dragon66 the bug in icafe is now fixed. In the meantime I experimented with other libraries and also with invoking ghostscript. As I think ghostscript is very reliable as id is a widely used tool, on the other hand I have to rely that the user of my code has an ghostscript-installation, something like this:
/**
* Converts a given pdf as specified by its path to an tiff using group 4 compression
*
* #param pdfFilePath The absolute path of the pdf
* #param tiffFilePath The absolute path of the tiff to be created
* #param dpi The resolution of the tiff
* #throws MyException If the conversion fails
*/
private static void convertPdfToTiffGhostscript(String pdfFilePath, String tiffFilePath, int dpi) throws MyException {
// location of gswin64c.exe
String ghostscriptLoc = context.getGhostscriptLoc();
// enclose src and dest. with quotes to avoid problems if the paths contain whitespaces
pdfFilePath = "\"" + pdfFilePath + "\"";
tiffFilePath = "\"" + tiffFilePath + "\"";
logger.debug("invoking ghostscript to convert {} to {}", pdfFilePath, tiffFilePath);
String cmd = ghostscriptLoc + " -dQUIET -dBATCH -o " + tiffFilePath + " -r" + dpi + " -sDEVICE=tiffg4 " + pdfFilePath;
logger.debug("The following command will be invoked: {}", cmd);
int exitVal = 0;
try {
exitVal = Runtime.getRuntime().exec(cmd).waitFor();
} catch (Exception e) {
logger.error("error while converting to tiff using ghostscript", e);
throw new MyException(ErrorMessages.GHOSTSTSCRIPT_ERROR, e);
}
if (exitVal != 0) {
logger.error("error while converting to tiff using ghostscript, exitval is {}", exitVal);
throw new MyException(ErrorMessages.GHOSTSTSCRIPT_ERROR);
}
}
I found that the produced tif from ghostscript strongly differs in quality from the tiff produced by icafe (the group 4 tiff from ghostscript looks greyscale-like)
It's been a while since the question was asked and I finally find time and a wonderful ordered dither matrix which allows me to give some details on how "icafe" can be used to get similar or better results than calling external ghostscript executable. Some new features were added to "icafe" recently such as better quantization and ordered dither algorithms which is used in the following example code.
Here the sample pdf I am going to use is princeCatalogue. Most of the following code is from the OP with some changes due to package name change and more ImageParam control settings.
import java.awt.image.BufferedImage;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import com.icafe4j.image.ImageColorType;
import com.icafe4j.image.ImageParam;
import com.icafe4j.image.options.TIFFOptions;
import com.icafe4j.image.quant.DitherMethod;
import com.icafe4j.image.quant.DitherMatrix;
import com.icafe4j.image.tiff.TIFFTweaker;
import com.icafe4j.image.tiff.TiffFieldEnum.Compression;
import com.icafe4j.io.FileCacheRandomAccessOutputStream;
import com.icafe4j.io.RandomAccessOutputStream;
public class Pdf2TiffConverter {
public static void main(String[] args) {
String pdf = "princecatalogue.pdf";
PDDocument pddoc = null;
try {
pddoc = PDDocument.load(pdf);
} catch (IOException e) {
}
try {
savePdfAsTiff(pddoc);
} catch (IOException e) {
}
}
private static void savePdfAsTiff(PDDocument pdf) throws IOException {
BufferedImage[] images = new BufferedImage[pdf.getNumberOfPages()];
for (int i = 0; i < images.length; i++) {
PDPage page = (PDPage) pdf.getDocumentCatalog().getAllPages()
.get(i);
BufferedImage image;
try {
// image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 288); //works
image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 300); // does not work
images[i] = image;
} catch (IOException e) {
e.printStackTrace();
}
}
FileOutputStream fos = new FileOutputStream("a.tiff");
RandomAccessOutputStream rout = new FileCacheRandomAccessOutputStream(
fos);
ImageParam.ImageParamBuilder builder = ImageParam.getBuilder();
ImageParam[] param = new ImageParam[1];
TIFFOptions tiffOptions = new TIFFOptions();
tiffOptions.setTiffCompression(Compression.CCITTFAX4);
builder.imageOptions(tiffOptions);
builder.colorType(ImageColorType.BILEVEL).ditherMatrix(DitherMatrix.getBayer8x8Diag()).applyDither(true).ditherMethod(DitherMethod.BAYER);
param[0] = builder.build();
TIFFTweaker.writeMultipageTIFF(rout, param, images);
rout.close();
fos.close();
}
}
For ghostscript, I used command line directly with the same parameters provided by the OP. The screenshots for the first page of the resulted TIFF images are showing below:
The lefthand side shows the output of "ghostscript" and the righthand side the output of "icafe". It can be seen, at least in this case, the output from "icafe" is better than the output from "ghostscript".
Using CCITTFAX4 compression, the file size from "ghostscript" is 2.22M and the file size from "icafe" is 2.08M. Both are not so good given the fact dither is used while creating the black and white output. In fact, a different compression algorithm will create way smaller file size. For example, using LZW, the same output from "icafe" is only 634K and if using DEFLATE compression the output file size went down to 582K.
Here's some code to save in a multipage tiff which I use with PDFBox. It requires the TIFFUtil class from PDFBox (it isn't public, so you have to make a copy).
void saveAsMultipageTIFF(ArrayList<BufferedImage> bimTab, String filename, int dpi) throws IOException
{
Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("tiff");
ImageWriter imageWriter = writers.next();
ImageOutputStream ios = ImageIO.createImageOutputStream(new File(filename));
imageWriter.setOutput(ios);
imageWriter.prepareWriteSequence(null);
for (BufferedImage image : bimTab)
{
ImageWriteParam param = imageWriter.getDefaultWriteParam();
IIOMetadata metadata = imageWriter.getDefaultImageMetadata(new ImageTypeSpecifier(image), param);
param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
TIFFUtil.setCompressionType(param, image);
TIFFUtil.updateMetadata(metadata, image, dpi);
imageWriter.writeToSequence(new IIOImage(image, null, metadata), param);
}
imageWriter.endWriteSequence();
imageWriter.dispose();
ios.flush();
ios.close();
}
I experimented on this for myself some time ago by using this code:
https://www.java.net/node/670205 (I used solution 2)
However...
If you create an array with lots of images, your memory consumption
really goes up. So it would probably be better to render an image, then
add it to the tiff file, then render the next page and lose the
reference of the previous one so that the gc can get the space if needed.
Refer to my github code for an implementation with PDFBox.
Since some dependencies used by solutions for this problem looks not maintained. I got a solution by using latest version (2.0.16) pdfbox:
ByteArrayOutputStream imageBaos = new ByteArrayOutputStream();
ImageOutputStream output = ImageIO.createImageOutputStream(imageBaos);
ImageWriter writer = ImageIO.getImageWritersByFormatName("TIFF").next();
try (final PDDocument document = PDDocument.load(new File("/tmp/tmp.pdf"))) {
PDFRenderer pdfRenderer = new PDFRenderer(document);
int pageCount = document.getNumberOfPages();
BufferedImage[] images = new BufferedImage[pageCount];
// ByteArrayOutputStream[] baosArray = new ByteArrayOutputStream[pageCount];
writer.setOutput(output);
ImageWriteParam params = writer.getDefaultWriteParam();
params.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
// Compression: None, PackBits, ZLib, Deflate, LZW, JPEG and CCITT
// variants allowed
params.setCompressionType("Deflate");
writer.prepareWriteSequence(null);
for (int page = 0; page < pageCount; page++) {
BufferedImage image = pdfRenderer.renderImageWithDPI(page, DPI, ImageType.RGB);
images[page] = image;
IIOMetadata metadata = writer.getDefaultImageMetadata(new ImageTypeSpecifier(image), params);
writer.writeToSequence(new IIOImage(image, null, metadata), params);
// ImageIO.write(image, "tiff", baosArray[page]);
}
System.out.println("imageBaos size: " + imageBaos.size());
// Finished write to output
writer.endWriteSequence();
document.close();
} catch (IOException e) {
e.printStackTrace();
throw new Exception(e);
} finally {
// avoid memory leaks
writer.dispose();
}
Then you may using imageBaos write to your local file. But if you want to pass your image to ByteArrayOutputStream and return to privious method like me. Then we need other steps.
After processing is done, the image bytes would be available in the ImageOutputStream
output object. We need to position the offset to the beginning of the output object and then read the butes to write to new ByteArrayOutputStream, a concise way like this:
ByteArrayOutputStream bos = new ByteArrayOutputStream();
long counter = 0;
while (true) {
try {
bos.write(ios.readByte());
counter++;
} catch (EOFException e) {
System.out.println("End of Image Stream");
break;
} catch (IOException e) {
System.out.println("Error processing the Image Stream");
break;
}
}
return bos
Or you can just ImageOutputStream.flush() at end to get your imageBaos Byte then return.
Inspired by Yusaku answer,
I made my own version,
This can convert multiple pdf pages to a byte array.
I Used pdfbox 2.0.16 in combination with imageio-tiff 3.4.2
//PDF converter to tiff toolbox method.
private byte[] bytesToTIFF(#Nonnull byte[] in) {
int dpi = 300;
ImageWriter writer = ImageIO.getImageWritersByFormatName("TIFF").next();
try(ByteArrayOutputStream imageBaos = new ByteArrayOutputStream(255)){
writer.setOutput(ImageIO.createImageOutputStream(imageBaos));
writer.prepareWriteSequence(null);
PDDocument document = PDDocument.load(in);
PDFRenderer pdfRenderer = new PDFRenderer(document);
ImageWriteParam params = writer.getDefaultWriteParam();
for (int page = 0; page < document.getNumberOfPages(); page++) {
BufferedImage image = pdfRenderer.renderImageWithDPI(page, dpi, ImageType.RGB);
IIOMetadata metadata = writer.getDefaultImageMetadata(new ImageTypeSpecifier(image), params);
writer.writeToSequence(new IIOImage(image, null, metadata), params);
}
LOG.trace("size found: {}", imageBaos.size());
writer.endWriteSequence();
writer.reset();
return imageBaos.toByteArray();
} catch (Exception ex) {
LOG.warn("can't instantiate the bytesToTiff method with: PDF", ex);
} finally {
writer.dispose();
}
}

Convert CCITT Group 3 1-Dimensional TIFF to PDF using iText in Java

I am experiencing an EOF Exception as follows when attempting to read tiff files using iText 5.5.10
ExceptionConverter: java.io.EOFException
at com.itextpdf.text.pdf.RandomAccessFileOrArray.readFully(RandomAccessFileOrArray.java:249)
at com.itextpdf.text.pdf.RandomAccessFileOrArray.readFully(RandomAccessFileOrArray.java:241)
at com.itextpdf.text.pdf.codec.TiffImage.getTiffImage(TiffImage.java:209)
at com.itextpdf.text.pdf.codec.TiffImage.getTiffImage(TiffImage.java:314)
at com.itextpdf.text.pdf.codec.TiffImage.getTiffImage(TiffImage.java:302)
at com.itextpdf.text.Image.getInstance(Image.java:428)
at com.itextpdf.text.Image.getInstance(Image.java:374)
at TiffToPdf.main(TiffToPdf.java:137)
The code I am using is:
byte[] data = null;
Image img = null;
try {
data = Files.readAllBytes(Paths.get("tiff.tif"));
img = Image.getInstance(data, true);
}
catch (Exception e) {
e.printStackTrace();
}
I have tried skipping the Image step and using the TiffImage class explicitly but I experience the same error.
byte[] data = null;
Image img = null;
try {
data = Files.readAllBytes(Paths.get("tiff.tif"));
RandomAccessSourceFactory factory = new RandomAccessSourceFactory();
RandomAccessSource fileBytes = factory.createSource(data);
RandomAccessFileOrArray s = new RandomAccessFileOrArray(fileBytes);
img = TiffImage.getTiffImage(s, true, 1, true);
}
catch (Exception e) {
e.printStackTrace();
}
I noticed that there are 2 classes within iText called TIFFFaxDecompressor and TIFFFaxDecoder but I haven't been able to find any resources online on how to use them.
with your given tiff image, the following code does worked for me i.e., converted to pdf successfully.
byte[] data = null;
com.itextpdf.text.Image img = null;
try {
//System.out.println(Paths.get("src/main/resources/tiff.tif"));
data = Files.readAllBytes(Paths.get("src/main/resources/file.tif"));
RandomAccessSourceFactory factory = new RandomAccessSourceFactory();
RandomAccessSource fileBytes = factory.createSource(data);
RandomAccessFileOrArray s = new RandomAccessFileOrArray(fileBytes);
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream("src/main/resources/destination.pdf"));
document.open();
int pages = TiffImage.getNumberOfPages(s);
Image image;
for (int i = 1; i <= pages; i++) {
image = TiffImage.getTiffImage(s, i);
Rectangle pageSize = new Rectangle(image.getWidth(),
image.getHeight());
document.setPageSize(pageSize);
document.newPage();
document.add(image);
}
document.close();
} catch (Exception e) {
e.printStackTrace();
}

Converting PDF to multipage tiff (Group 4)

I'm trying to convert PDFs as represented by the org.apache.pdfbox.pdmodel.PDDocument class and the icafe library (https://github.com/dragon66/icafe/) to a multipage tiff with group 4 compression and 300 dpi. The sample code works for me for 288 dpi but strangely NOT for 300 dpi, the exported tiff remains just white. Has anybody an idea what the issue is here?
The sample pdf which I use in the example is located here: http://www.bergophil.ch/a.pdf
import java.awt.image.BufferedImage;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import cafe.image.ImageColorType;
import cafe.image.ImageParam;
import cafe.image.options.TIFFOptions;
import cafe.image.tiff.TIFFTweaker;
import cafe.image.tiff.TiffFieldEnum.Compression;
import cafe.io.FileCacheRandomAccessOutputStream;
import cafe.io.RandomAccessOutputStream;
public class Pdf2TiffConverter {
public static void main(String[] args) {
String pdf = "a.pdf";
PDDocument pddoc = null;
try {
pddoc = PDDocument.load(pdf);
} catch (IOException e) {
}
try {
savePdfAsTiff(pddoc);
} catch (IOException e) {
}
}
private static void savePdfAsTiff(PDDocument pdf) throws IOException {
BufferedImage[] images = new BufferedImage[pdf.getNumberOfPages()];
for (int i = 0; i < images.length; i++) {
PDPage page = (PDPage) pdf.getDocumentCatalog().getAllPages()
.get(i);
BufferedImage image;
try {
// image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 288); //works
image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 300); // does not work
images[i] = image;
} catch (IOException e) {
e.printStackTrace();
}
}
FileOutputStream fos = new FileOutputStream("a.tiff");
RandomAccessOutputStream rout = new FileCacheRandomAccessOutputStream(
fos);
ImageParam.ImageParamBuilder builder = ImageParam.getBuilder();
ImageParam[] param = new ImageParam[1];
TIFFOptions tiffOptions = new TIFFOptions();
tiffOptions.setTiffCompression(Compression.CCITTFAX4);
builder.imageOptions(tiffOptions);
builder.colorType(ImageColorType.BILEVEL);
param[0] = builder.build();
TIFFTweaker.writeMultipageTIFF(rout, param, images);
rout.close();
fos.close();
}
}
Or is there another library to write multi-page TIFFs?
EDIT:
Thanks to dragon66 the bug in icafe is now fixed. In the meantime I experimented with other libraries and also with invoking ghostscript. As I think ghostscript is very reliable as id is a widely used tool, on the other hand I have to rely that the user of my code has an ghostscript-installation, something like this:
/**
* Converts a given pdf as specified by its path to an tiff using group 4 compression
*
* #param pdfFilePath The absolute path of the pdf
* #param tiffFilePath The absolute path of the tiff to be created
* #param dpi The resolution of the tiff
* #throws MyException If the conversion fails
*/
private static void convertPdfToTiffGhostscript(String pdfFilePath, String tiffFilePath, int dpi) throws MyException {
// location of gswin64c.exe
String ghostscriptLoc = context.getGhostscriptLoc();
// enclose src and dest. with quotes to avoid problems if the paths contain whitespaces
pdfFilePath = "\"" + pdfFilePath + "\"";
tiffFilePath = "\"" + tiffFilePath + "\"";
logger.debug("invoking ghostscript to convert {} to {}", pdfFilePath, tiffFilePath);
String cmd = ghostscriptLoc + " -dQUIET -dBATCH -o " + tiffFilePath + " -r" + dpi + " -sDEVICE=tiffg4 " + pdfFilePath;
logger.debug("The following command will be invoked: {}", cmd);
int exitVal = 0;
try {
exitVal = Runtime.getRuntime().exec(cmd).waitFor();
} catch (Exception e) {
logger.error("error while converting to tiff using ghostscript", e);
throw new MyException(ErrorMessages.GHOSTSTSCRIPT_ERROR, e);
}
if (exitVal != 0) {
logger.error("error while converting to tiff using ghostscript, exitval is {}", exitVal);
throw new MyException(ErrorMessages.GHOSTSTSCRIPT_ERROR);
}
}
I found that the produced tif from ghostscript strongly differs in quality from the tiff produced by icafe (the group 4 tiff from ghostscript looks greyscale-like)
It's been a while since the question was asked and I finally find time and a wonderful ordered dither matrix which allows me to give some details on how "icafe" can be used to get similar or better results than calling external ghostscript executable. Some new features were added to "icafe" recently such as better quantization and ordered dither algorithms which is used in the following example code.
Here the sample pdf I am going to use is princeCatalogue. Most of the following code is from the OP with some changes due to package name change and more ImageParam control settings.
import java.awt.image.BufferedImage;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import com.icafe4j.image.ImageColorType;
import com.icafe4j.image.ImageParam;
import com.icafe4j.image.options.TIFFOptions;
import com.icafe4j.image.quant.DitherMethod;
import com.icafe4j.image.quant.DitherMatrix;
import com.icafe4j.image.tiff.TIFFTweaker;
import com.icafe4j.image.tiff.TiffFieldEnum.Compression;
import com.icafe4j.io.FileCacheRandomAccessOutputStream;
import com.icafe4j.io.RandomAccessOutputStream;
public class Pdf2TiffConverter {
public static void main(String[] args) {
String pdf = "princecatalogue.pdf";
PDDocument pddoc = null;
try {
pddoc = PDDocument.load(pdf);
} catch (IOException e) {
}
try {
savePdfAsTiff(pddoc);
} catch (IOException e) {
}
}
private static void savePdfAsTiff(PDDocument pdf) throws IOException {
BufferedImage[] images = new BufferedImage[pdf.getNumberOfPages()];
for (int i = 0; i < images.length; i++) {
PDPage page = (PDPage) pdf.getDocumentCatalog().getAllPages()
.get(i);
BufferedImage image;
try {
// image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 288); //works
image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 300); // does not work
images[i] = image;
} catch (IOException e) {
e.printStackTrace();
}
}
FileOutputStream fos = new FileOutputStream("a.tiff");
RandomAccessOutputStream rout = new FileCacheRandomAccessOutputStream(
fos);
ImageParam.ImageParamBuilder builder = ImageParam.getBuilder();
ImageParam[] param = new ImageParam[1];
TIFFOptions tiffOptions = new TIFFOptions();
tiffOptions.setTiffCompression(Compression.CCITTFAX4);
builder.imageOptions(tiffOptions);
builder.colorType(ImageColorType.BILEVEL).ditherMatrix(DitherMatrix.getBayer8x8Diag()).applyDither(true).ditherMethod(DitherMethod.BAYER);
param[0] = builder.build();
TIFFTweaker.writeMultipageTIFF(rout, param, images);
rout.close();
fos.close();
}
}
For ghostscript, I used command line directly with the same parameters provided by the OP. The screenshots for the first page of the resulted TIFF images are showing below:
The lefthand side shows the output of "ghostscript" and the righthand side the output of "icafe". It can be seen, at least in this case, the output from "icafe" is better than the output from "ghostscript".
Using CCITTFAX4 compression, the file size from "ghostscript" is 2.22M and the file size from "icafe" is 2.08M. Both are not so good given the fact dither is used while creating the black and white output. In fact, a different compression algorithm will create way smaller file size. For example, using LZW, the same output from "icafe" is only 634K and if using DEFLATE compression the output file size went down to 582K.
Here's some code to save in a multipage tiff which I use with PDFBox. It requires the TIFFUtil class from PDFBox (it isn't public, so you have to make a copy).
void saveAsMultipageTIFF(ArrayList<BufferedImage> bimTab, String filename, int dpi) throws IOException
{
Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("tiff");
ImageWriter imageWriter = writers.next();
ImageOutputStream ios = ImageIO.createImageOutputStream(new File(filename));
imageWriter.setOutput(ios);
imageWriter.prepareWriteSequence(null);
for (BufferedImage image : bimTab)
{
ImageWriteParam param = imageWriter.getDefaultWriteParam();
IIOMetadata metadata = imageWriter.getDefaultImageMetadata(new ImageTypeSpecifier(image), param);
param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
TIFFUtil.setCompressionType(param, image);
TIFFUtil.updateMetadata(metadata, image, dpi);
imageWriter.writeToSequence(new IIOImage(image, null, metadata), param);
}
imageWriter.endWriteSequence();
imageWriter.dispose();
ios.flush();
ios.close();
}
I experimented on this for myself some time ago by using this code:
https://www.java.net/node/670205 (I used solution 2)
However...
If you create an array with lots of images, your memory consumption
really goes up. So it would probably be better to render an image, then
add it to the tiff file, then render the next page and lose the
reference of the previous one so that the gc can get the space if needed.
Refer to my github code for an implementation with PDFBox.
Since some dependencies used by solutions for this problem looks not maintained. I got a solution by using latest version (2.0.16) pdfbox:
ByteArrayOutputStream imageBaos = new ByteArrayOutputStream();
ImageOutputStream output = ImageIO.createImageOutputStream(imageBaos);
ImageWriter writer = ImageIO.getImageWritersByFormatName("TIFF").next();
try (final PDDocument document = PDDocument.load(new File("/tmp/tmp.pdf"))) {
PDFRenderer pdfRenderer = new PDFRenderer(document);
int pageCount = document.getNumberOfPages();
BufferedImage[] images = new BufferedImage[pageCount];
// ByteArrayOutputStream[] baosArray = new ByteArrayOutputStream[pageCount];
writer.setOutput(output);
ImageWriteParam params = writer.getDefaultWriteParam();
params.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
// Compression: None, PackBits, ZLib, Deflate, LZW, JPEG and CCITT
// variants allowed
params.setCompressionType("Deflate");
writer.prepareWriteSequence(null);
for (int page = 0; page < pageCount; page++) {
BufferedImage image = pdfRenderer.renderImageWithDPI(page, DPI, ImageType.RGB);
images[page] = image;
IIOMetadata metadata = writer.getDefaultImageMetadata(new ImageTypeSpecifier(image), params);
writer.writeToSequence(new IIOImage(image, null, metadata), params);
// ImageIO.write(image, "tiff", baosArray[page]);
}
System.out.println("imageBaos size: " + imageBaos.size());
// Finished write to output
writer.endWriteSequence();
document.close();
} catch (IOException e) {
e.printStackTrace();
throw new Exception(e);
} finally {
// avoid memory leaks
writer.dispose();
}
Then you may using imageBaos write to your local file. But if you want to pass your image to ByteArrayOutputStream and return to privious method like me. Then we need other steps.
After processing is done, the image bytes would be available in the ImageOutputStream
output object. We need to position the offset to the beginning of the output object and then read the butes to write to new ByteArrayOutputStream, a concise way like this:
ByteArrayOutputStream bos = new ByteArrayOutputStream();
long counter = 0;
while (true) {
try {
bos.write(ios.readByte());
counter++;
} catch (EOFException e) {
System.out.println("End of Image Stream");
break;
} catch (IOException e) {
System.out.println("Error processing the Image Stream");
break;
}
}
return bos
Or you can just ImageOutputStream.flush() at end to get your imageBaos Byte then return.
Inspired by Yusaku answer,
I made my own version,
This can convert multiple pdf pages to a byte array.
I Used pdfbox 2.0.16 in combination with imageio-tiff 3.4.2
//PDF converter to tiff toolbox method.
private byte[] bytesToTIFF(#Nonnull byte[] in) {
int dpi = 300;
ImageWriter writer = ImageIO.getImageWritersByFormatName("TIFF").next();
try(ByteArrayOutputStream imageBaos = new ByteArrayOutputStream(255)){
writer.setOutput(ImageIO.createImageOutputStream(imageBaos));
writer.prepareWriteSequence(null);
PDDocument document = PDDocument.load(in);
PDFRenderer pdfRenderer = new PDFRenderer(document);
ImageWriteParam params = writer.getDefaultWriteParam();
for (int page = 0; page < document.getNumberOfPages(); page++) {
BufferedImage image = pdfRenderer.renderImageWithDPI(page, dpi, ImageType.RGB);
IIOMetadata metadata = writer.getDefaultImageMetadata(new ImageTypeSpecifier(image), params);
writer.writeToSequence(new IIOImage(image, null, metadata), params);
}
LOG.trace("size found: {}", imageBaos.size());
writer.endWriteSequence();
writer.reset();
return imageBaos.toByteArray();
} catch (Exception ex) {
LOG.warn("can't instantiate the bytesToTiff method with: PDF", ex);
} finally {
writer.dispose();
}
}

PdfBox: issues when creating pdf from bmp

When generating a PDF form BMP the result is allways curios.
Input "hellowworld.bmp"
Output (only the relevant part)
why is there a loss of quality
why is it repeated three times
why is there a black square ( green Frame)
Heres how i test it:
#Test
public final void testWriteSingleBMPtoPDF() throws IOException {
Assert.assertTrue("File existst", TestFileHelper.getBMP(BMPS.HELLOWORLD).exists());
Assert.assertTrue("File readable", TestFileHelper.getBMP(BMPS.HELLOWORLD).canRead());
ArrayList<File> doc = new ArrayList<EncodedPage>();
doc.add(createPage(BMPS.HELLOWORLD));
File result = null;
try {
result = ConvertPDF.bmpToPDF(doc);
} catch (COSVisitorException e) {
e.printStackTrace();
}
Assert.assertTrue("File existst", result.exists());
Assert.assertTrue("File readable", result.canRead());
System.out.println("Please Check >"+result+"<");
}
Heres the part of my java implementation
public static File bmpToPDF(ArrayList<File> inputDoc)
PDDocument document = new PDDocument();
String saveTo = "C:\\temp\\" + System.currentTimeMillis() + ".pdf";
for (File bmpPage : inputDoc) {
PDPage page = null;
PDXObjectImage ximage = null;
page = new PDPage();
document.addPage(page);
BufferedImage awtImage = ImageIO.read(bmpPage);
ximage = new PDPixelMap(document, awtImage);
PDPageContentStream content = new PDPageContentStream(document, page);
content.drawImage(ximage, 0, 0);
content.close();
}
document.save(saveTo);
document.close();
return new File(saveTo) ;
Version of Apache PDFBox is 1.7.1

Categories

Resources