How to convert a PDF file into multipage TIFF images using PDFBox? [duplicate]

I'm trying to convert PDFs, represented by the org.apache.pdfbox.pdmodel.PDDocument class, to a multipage TIFF with Group 4 compression at 300 dpi, using the icafe library (https://github.com/dragon66/icafe/). The sample code works for me at 288 dpi but, strangely, NOT at 300 dpi: the exported TIFF is entirely white. Does anybody have an idea what the issue is here?
The sample PDF used in the example is located here: http://www.bergophil.ch/a.pdf
import java.awt.image.BufferedImage;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import cafe.image.ImageColorType;
import cafe.image.ImageParam;
import cafe.image.options.TIFFOptions;
import cafe.image.tiff.TIFFTweaker;
import cafe.image.tiff.TiffFieldEnum.Compression;
import cafe.io.FileCacheRandomAccessOutputStream;
import cafe.io.RandomAccessOutputStream;
public class Pdf2TiffConverter {
public static void main(String[] args) {
String pdf = "a.pdf";
PDDocument pddoc = null;
try {
pddoc = PDDocument.load(pdf);
} catch (IOException e) {
}
try {
savePdfAsTiff(pddoc);
} catch (IOException e) {
}
}
private static void savePdfAsTiff(PDDocument pdf) throws IOException {
BufferedImage[] images = new BufferedImage[pdf.getNumberOfPages()];
for (int i = 0; i < images.length; i++) {
PDPage page = (PDPage) pdf.getDocumentCatalog().getAllPages()
.get(i);
BufferedImage image;
try {
// image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 288); //works
image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 300); // does not work
images[i] = image;
} catch (IOException e) {
e.printStackTrace();
}
}
FileOutputStream fos = new FileOutputStream("a.tiff");
RandomAccessOutputStream rout = new FileCacheRandomAccessOutputStream(
fos);
ImageParam.ImageParamBuilder builder = ImageParam.getBuilder();
ImageParam[] param = new ImageParam[1];
TIFFOptions tiffOptions = new TIFFOptions();
tiffOptions.setTiffCompression(Compression.CCITTFAX4);
builder.imageOptions(tiffOptions);
builder.colorType(ImageColorType.BILEVEL);
param[0] = builder.build();
TIFFTweaker.writeMultipageTIFF(rout, param, images);
rout.close();
fos.close();
}
}
Or is there another library to write multi-page TIFFs?
EDIT:
Thanks to dragon66, the bug in icafe is now fixed. In the meantime I experimented with other libraries and also with invoking ghostscript. I think ghostscript is very reliable, as it is a widely used tool; on the other hand, I have to rely on the user of my code having a ghostscript installation. Something like this:
/**
* Converts a given pdf as specified by its path to an tiff using group 4 compression
*
* @param pdfFilePath The absolute path of the pdf
* @param tiffFilePath The absolute path of the tiff to be created
* @param dpi The resolution of the tiff
* @throws MyException If the conversion fails
*/
private static void convertPdfToTiffGhostscript(String pdfFilePath, String tiffFilePath, int dpi) throws MyException {
// location of gswin64c.exe
String ghostscriptLoc = context.getGhostscriptLoc();
// enclose src and dest. with quotes to avoid problems if the paths contain whitespaces
pdfFilePath = "\"" + pdfFilePath + "\"";
tiffFilePath = "\"" + tiffFilePath + "\"";
logger.debug("invoking ghostscript to convert {} to {}", pdfFilePath, tiffFilePath);
String cmd = ghostscriptLoc + " -dQUIET -dBATCH -o " + tiffFilePath + " -r" + dpi + " -sDEVICE=tiffg4 " + pdfFilePath;
logger.debug("The following command will be invoked: {}", cmd);
int exitVal = 0;
try {
exitVal = Runtime.getRuntime().exec(cmd).waitFor();
} catch (Exception e) {
logger.error("error while converting to tiff using ghostscript", e);
throw new MyException(ErrorMessages.GHOSTSTSCRIPT_ERROR, e);
}
if (exitVal != 0) {
logger.error("error while converting to tiff using ghostscript, exitval is {}", exitVal);
throw new MyException(ErrorMessages.GHOSTSTSCRIPT_ERROR);
}
}
I found that the TIFF produced by ghostscript differs strongly in quality from the TIFF produced by icafe (the Group 4 TIFF from ghostscript looks greyscale-like).
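Quoting the paths by hand and handing a single command string to Runtime.exec depends on how that string is split into arguments. A minimal sketch of an alternative, assuming the same Ghostscript switches as above: each argument is passed separately to ProcessBuilder, so paths containing spaces need no manual quoting. The class and method names (GhostscriptCommand, buildCommand, run) are hypothetical and only for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class GhostscriptCommand {

    // Builds the argument list. Each element reaches the process as one argument,
    // so paths with spaces must NOT be wrapped in extra quotes.
    static List<String> buildCommand(String gsExecutable, String pdfPath, String tiffPath, int dpi) {
        List<String> cmd = new ArrayList<>();
        cmd.add(gsExecutable);      // e.g. the path to gswin64c.exe
        cmd.add("-dQUIET");
        cmd.add("-dBATCH");
        cmd.add("-o");
        cmd.add(tiffPath);
        cmd.add("-r" + dpi);
        cmd.add("-sDEVICE=tiffg4");
        cmd.add(pdfPath);
        return cmd;
    }

    // Starts Ghostscript, waits for it to finish and returns its exit code.
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd)
                .inheritIO() // forward Ghostscript's output to this process's console
                .start();
        return process.waitFor();
    }
}
```

A non-zero return value from run can then be handled exactly as exitVal is handled in the method above.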

It's been a while since the question was asked, and I have finally found the time, and a wonderful ordered dither matrix, to give some details on how "icafe" can be used to get results similar to or better than calling an external ghostscript executable. Some new features were added to "icafe" recently, such as better quantization and ordered dither algorithms, which are used in the following example code.
The sample PDF I am going to use is princeCatalogue. Most of the following code is from the OP, with some changes due to the package name change and additional ImageParam settings.
import java.awt.image.BufferedImage;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import com.icafe4j.image.ImageColorType;
import com.icafe4j.image.ImageParam;
import com.icafe4j.image.options.TIFFOptions;
import com.icafe4j.image.quant.DitherMethod;
import com.icafe4j.image.quant.DitherMatrix;
import com.icafe4j.image.tiff.TIFFTweaker;
import com.icafe4j.image.tiff.TiffFieldEnum.Compression;
import com.icafe4j.io.FileCacheRandomAccessOutputStream;
import com.icafe4j.io.RandomAccessOutputStream;
public class Pdf2TiffConverter {
public static void main(String[] args) {
String pdf = "princecatalogue.pdf";
PDDocument pddoc = null;
try {
pddoc = PDDocument.load(pdf);
} catch (IOException e) {
}
try {
savePdfAsTiff(pddoc);
} catch (IOException e) {
}
}
private static void savePdfAsTiff(PDDocument pdf) throws IOException {
BufferedImage[] images = new BufferedImage[pdf.getNumberOfPages()];
for (int i = 0; i < images.length; i++) {
PDPage page = (PDPage) pdf.getDocumentCatalog().getAllPages()
.get(i);
BufferedImage image;
try {
image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 300);
images[i] = image;
} catch (IOException e) {
e.printStackTrace();
}
}
FileOutputStream fos = new FileOutputStream("a.tiff");
RandomAccessOutputStream rout = new FileCacheRandomAccessOutputStream(
fos);
ImageParam.ImageParamBuilder builder = ImageParam.getBuilder();
ImageParam[] param = new ImageParam[1];
TIFFOptions tiffOptions = new TIFFOptions();
tiffOptions.setTiffCompression(Compression.CCITTFAX4);
builder.imageOptions(tiffOptions);
builder.colorType(ImageColorType.BILEVEL).ditherMatrix(DitherMatrix.getBayer8x8Diag()).applyDither(true).ditherMethod(DitherMethod.BAYER);
param[0] = builder.build();
TIFFTweaker.writeMultipageTIFF(rout, param, images);
rout.close();
fos.close();
}
}
For ghostscript, I used the command line directly, with the same parameters provided by the OP. Screenshots of the first page of the resulting TIFF images are shown below:
The left-hand side shows the output of "ghostscript" and the right-hand side the output of "icafe". It can be seen that, at least in this case, the output from "icafe" is better than the output from "ghostscript".
With CCITTFAX4 compression, the file from "ghostscript" is 2.22M and the file from "icafe" is 2.08M. Neither is very good, given that dithering is used when creating the black and white output. In fact, a different compression algorithm produces a much smaller file: with LZW, the same output from "icafe" is only 634K, and with DEFLATE compression it goes down to 582K.

Here's some code to save in a multipage tiff which I use with PDFBox. It requires the TIFFUtil class from PDFBox (it isn't public, so you have to make a copy).
void saveAsMultipageTIFF(ArrayList<BufferedImage> bimTab, String filename, int dpi) throws IOException
{
Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("tiff");
ImageWriter imageWriter = writers.next();
ImageOutputStream ios = ImageIO.createImageOutputStream(new File(filename));
imageWriter.setOutput(ios);
imageWriter.prepareWriteSequence(null);
for (BufferedImage image : bimTab)
{
ImageWriteParam param = imageWriter.getDefaultWriteParam();
IIOMetadata metadata = imageWriter.getDefaultImageMetadata(new ImageTypeSpecifier(image), param);
param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
TIFFUtil.setCompressionType(param, image);
TIFFUtil.updateMetadata(metadata, image, dpi);
imageWriter.writeToSequence(new IIOImage(image, null, metadata), param);
}
imageWriter.endWriteSequence();
imageWriter.dispose();
ios.flush();
ios.close();
}
I experimented with this myself some time ago, using this code:
https://www.java.net/node/670205 (I used solution 2)
However...
If you create an array with lots of images, your memory consumption really goes up. So it would probably be better to render an image, add it to the TIFF file, then render the next page and drop the reference to the previous one so that the GC can reclaim the space if needed.
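The page-at-a-time idea can be sketched as follows. This is only a sketch, assuming Java 9 or later (where the JDK ships its own TIFF ImageIO plugin). The PageRenderer interface is a hypothetical stand-in for whatever PDFBox call renders page i, and the class name is made up for illustration.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

final class StreamingTiffWriter {

    // Hypothetical stand-in for the PDF renderer: returns the image of one page.
    interface PageRenderer {
        BufferedImage render(int pageIndex) throws IOException;
    }

    // Renders and writes one page per iteration, so no array of images is kept
    // and earlier pages become eligible for garbage collection.
    static void write(File target, int pageCount, PageRenderer renderer) throws IOException {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("tiff").next();
        try (ImageOutputStream out = ImageIO.createImageOutputStream(target)) {
            writer.setOutput(out);
            writer.prepareWriteSequence(null);
            for (int i = 0; i < pageCount; i++) {
                BufferedImage page = renderer.render(i); // the only page referenced right now
                writer.writeToSequence(new IIOImage(page, null, null), null);
            }
            writer.endWriteSequence();
        } finally {
            writer.dispose();
        }
    }
}
```

With PDFBox 2.x, the renderer argument could be a lambda that wraps a call such as renderImageWithDPI on a PDFRenderer; that wiring is left out here to keep the sketch free of external dependencies.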

Refer to my github code for an implementation with PDFBox.

Since some of the dependencies used by other solutions to this problem no longer appear to be maintained, I came up with a solution using the latest version (2.0.16) of pdfbox:
ByteArrayOutputStream imageBaos = new ByteArrayOutputStream();
ImageOutputStream output = ImageIO.createImageOutputStream(imageBaos);
ImageWriter writer = ImageIO.getImageWritersByFormatName("TIFF").next();
try (final PDDocument document = PDDocument.load(new File("/tmp/tmp.pdf"))) {
    PDFRenderer pdfRenderer = new PDFRenderer(document);
    int pageCount = document.getNumberOfPages();
    writer.setOutput(output);
    ImageWriteParam params = writer.getDefaultWriteParam();
    params.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
    // Compression: None, PackBits, ZLib, Deflate, LZW, JPEG and CCITT
    // variants allowed
    params.setCompressionType("Deflate");
    writer.prepareWriteSequence(null);
    for (int page = 0; page < pageCount; page++) {
        // DPI is assumed to be a constant defined elsewhere (e.g. 300).
        // Each page is written immediately, so no array of images is kept in memory.
        BufferedImage image = pdfRenderer.renderImageWithDPI(page, DPI, ImageType.RGB);
        IIOMetadata metadata = writer.getDefaultImageMetadata(new ImageTypeSpecifier(image), params);
        writer.writeToSequence(new IIOImage(image, null, metadata), params);
    }
    System.out.println("imageBaos size: " + imageBaos.size());
    // Finished writing to output
    writer.endWriteSequence();
    // The document is closed automatically by try-with-resources.
} catch (IOException e) {
    e.printStackTrace();
    throw new Exception(e);
} finally {
    // avoid memory leaks
    writer.dispose();
}
You can then write imageBaos to a local file. If, like me, you want to return the image as a ByteArrayOutputStream to the calling method, some further steps are needed.
After processing is done, the image bytes are available in the ImageOutputStream object (output). We need to move its position back to the beginning and then read the bytes into a new ByteArrayOutputStream, for example like this:
ByteArrayOutputStream bos = new ByteArrayOutputStream();
try {
    // Rewind to the start of the image stream (only possible while it has not been flushed).
    output.seek(0);
    while (true) {
        bos.write(output.readByte());
    }
} catch (EOFException e) {
    // readByte() signals the end of the stream with an EOFException.
    System.out.println("End of Image Stream");
} catch (IOException e) {
    System.out.println("Error processing the Image Stream");
}
return bos;
Alternatively, call flush() (or, more reliably, close()) on the ImageOutputStream at the end so that imageBaos contains the bytes, then return those.
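The second route can be sketched like this, again assuming Java 9 or later for the built-in TIFF writer. The class name InMemoryTiff and the method toTiffBytes are hypothetical; the point is only the ordering: the ImageOutputStream is closed before the byte array is read.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.List;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

final class InMemoryTiff {

    // Writes all pages into one multipage TIFF and returns its bytes.
    static byte[] toTiffBytes(List<BufferedImage> pages) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ImageWriter writer = ImageIO.getImageWritersByFormatName("tiff").next();
        try (ImageOutputStream out = ImageIO.createImageOutputStream(baos)) {
            writer.setOutput(out);
            writer.prepareWriteSequence(null);
            for (BufferedImage page : pages) {
                writer.writeToSequence(new IIOImage(page, null, null), null);
            }
            writer.endWriteSequence();
        } finally {
            writer.dispose();
        }
        // 'out' is closed at this point, so any bytes it was caching have reached baos.
        return baos.toByteArray();
    }
}
```

Reading baos only after the try block avoids the byte-by-byte copy loop entirely.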

Inspired by Yusaku's answer, I made my own version, which can convert a multi-page PDF to a byte array.
I used pdfbox 2.0.16 in combination with imageio-tiff 3.4.2.
// Toolbox method: converts a PDF (as bytes) to a multipage TIFF (as bytes).
private byte[] bytesToTIFF(@Nonnull byte[] in) {
    int dpi = 300;
    ImageWriter writer = ImageIO.getImageWritersByFormatName("TIFF").next();
    try (ByteArrayOutputStream imageBaos = new ByteArrayOutputStream();
         PDDocument document = PDDocument.load(in)) {
        // Close the ImageOutputStream before reading imageBaos, so all cached bytes are written out.
        try (ImageOutputStream output = ImageIO.createImageOutputStream(imageBaos)) {
            writer.setOutput(output);
            writer.prepareWriteSequence(null);
            PDFRenderer pdfRenderer = new PDFRenderer(document);
            ImageWriteParam params = writer.getDefaultWriteParam();
            for (int page = 0; page < document.getNumberOfPages(); page++) {
                BufferedImage image = pdfRenderer.renderImageWithDPI(page, dpi, ImageType.RGB);
                IIOMetadata metadata = writer.getDefaultImageMetadata(new ImageTypeSpecifier(image), params);
                writer.writeToSequence(new IIOImage(image, null, metadata), params);
            }
            writer.endWriteSequence();
        }
        LOG.trace("size found: {}", imageBaos.size());
        return imageBaos.toByteArray();
    } catch (Exception ex) {
        LOG.warn("could not convert the PDF to TIFF", ex);
        // The original had no return on this path; an empty array is one possible choice.
        return new byte[0];
    } finally {
        writer.dispose();
    }
}

Related

Convert CCITT Group 3 1-Dimensional TIFF to PDF using iText in Java

I am getting the following EOFException when attempting to read TIFF files using iText 5.5.10:
ExceptionConverter: java.io.EOFException
at com.itextpdf.text.pdf.RandomAccessFileOrArray.readFully(RandomAccessFileOrArray.java:249)
at com.itextpdf.text.pdf.RandomAccessFileOrArray.readFully(RandomAccessFileOrArray.java:241)
at com.itextpdf.text.pdf.codec.TiffImage.getTiffImage(TiffImage.java:209)
at com.itextpdf.text.pdf.codec.TiffImage.getTiffImage(TiffImage.java:314)
at com.itextpdf.text.pdf.codec.TiffImage.getTiffImage(TiffImage.java:302)
at com.itextpdf.text.Image.getInstance(Image.java:428)
at com.itextpdf.text.Image.getInstance(Image.java:374)
at TiffToPdf.main(TiffToPdf.java:137)
The code I am using is:
byte[] data = null;
Image img = null;
try {
data = Files.readAllBytes(Paths.get("tiff.tif"));
img = Image.getInstance(data, true);
}
catch (Exception e) {
e.printStackTrace();
}
I have tried skipping the Image step and using the TiffImage class explicitly but I experience the same error.
byte[] data = null;
Image img = null;
try {
data = Files.readAllBytes(Paths.get("tiff.tif"));
RandomAccessSourceFactory factory = new RandomAccessSourceFactory();
RandomAccessSource fileBytes = factory.createSource(data);
RandomAccessFileOrArray s = new RandomAccessFileOrArray(fileBytes);
img = TiffImage.getTiffImage(s, true, 1, true);
}
catch (Exception e) {
e.printStackTrace();
}
I noticed that there are 2 classes within iText called TIFFFaxDecompressor and TIFFFaxDecoder but I haven't been able to find any resources online on how to use them.

With the TIFF image you provided, the following code worked for me, i.e. it converted the image to PDF successfully.
byte[] data = null;
com.itextpdf.text.Image img = null;
try {
//System.out.println(Paths.get("src/main/resources/tiff.tif"));
data = Files.readAllBytes(Paths.get("src/main/resources/file.tif"));
RandomAccessSourceFactory factory = new RandomAccessSourceFactory();
RandomAccessSource fileBytes = factory.createSource(data);
RandomAccessFileOrArray s = new RandomAccessFileOrArray(fileBytes);
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream("src/main/resources/destination.pdf"));
document.open();
int pages = TiffImage.getNumberOfPages(s);
Image image;
for (int i = 1; i <= pages; i++) {
image = TiffImage.getTiffImage(s, i);
Rectangle pageSize = new Rectangle(image.getWidth(),
image.getHeight());
document.setPageSize(pageSize);
document.newPage();
document.add(image);
}
document.close();
} catch (Exception e) {
e.printStackTrace();
}

Java. Broken image from URL

I am trying to load this image from a URL, but the image I get is broken, like this.
Code:
@Override
protected void paintComponent(Graphics g) {
super.paintComponent(g);
Graphics2D g2 = (Graphics2D)g;
ByteArrayOutputStream out = new ByteArrayOutputStream();
try {
URL url = new URL("http://s.developers.org.ua/img/announces/java_1.jpg");
BufferedInputStream in = new BufferedInputStream(url.openStream());
byte[] b = new byte[512];
while (in.read(b)!=-1)
out.write(b);
Image img = ImageIO.read(new ByteArrayInputStream(out.toByteArray()));
g2.drawImage(img, 0, 0, getWidth(), getHeight(), null);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}

Don't read images inside the paintComponent method; it will make your application appear sluggish, as the method is executed on the event dispatch thread (EDT). Also, the image will be re-read whenever your component is repainted, meaning you'll download it over and over. Instead, read it up front, or in a separate thread (i.e. use a SwingWorker), and only invoke g.drawImage(...) from inside the paintComponent method.
The reason for the broken image is your byte copying code, where you don't pay attention to how many bytes are read (as long as the value isn't -1), but instead unconditionally copy 512 bytes. However, you don't need to do that here, you can simply pass the stream to ImageIO.read, like this, making the code simpler and more readable:
URL url = new URL("http://s.developers.org.ua/img/announces/java_1.jpg");
try (BufferedInputStream in = new BufferedInputStream(url.openStream())) {
BufferedImage img = ImageIO.read(in);
}
Adding the extra try (try-with-resources) block makes sure your stream is also properly closed to avoid resource leaks.
For completeness, to fix the byte copying code, the correct version would be:
// ... as above ...
byte[] b = new byte[512];
int bytesRead; // Keep track of the number of bytes read into 'b'
while ((bytesRead = in.read(b)) != -1)
out.write(b, 0, bytesRead);

I don't know if this is the only problem, but you might write more than you get.
I suggest that you change your writing code to:
int len;
while ((len=in.read(b))!=-1)
out.write(b, 0, len);
Otherwise, if the last buffer is not exactly 512 bytes long, you'll write too much
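To see the effect concretely, here is a small self-contained sketch that copies 1000 bytes through a 512-byte buffer in both ways. The class and method names are made up for illustration, and an in-memory stream stands in for the URL stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class CopyDemo {

    // Buggy variant: always writes the whole buffer, even after a partial read.
    static byte[] copyIgnoringLength(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] b = new byte[512];
        while (in.read(b) != -1) {
            out.write(b);
        }
        return out.toByteArray();
    }

    // Correct variant: writes only the bytes that were actually read.
    static byte[] copy(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] b = new byte[512];
        int len;
        while ((len = in.read(b)) != -1) {
            out.write(b, 0, len);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] source = new byte[1000];
        System.out.println(copyIgnoringLength(new ByteArrayInputStream(source)).length); // prints 1024
        System.out.println(copy(new ByteArrayInputStream(source)).length);               // prints 1000
    }
}
```

The 24 extra bytes in the buggy copy are stale data left in the buffer from the previous read. With a network stream, partial reads can also happen in the middle of the transfer, so the corruption is not limited to the end of the image.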

I have some code that copies files from a URL to local storage. So far the results are identical to the source. With some modification, it may help you solve your problem.
import java.awt.Image;
import java.awt.image.BufferedImage;
import java.io.File;
import java.net.URL;
import java.util.ArrayList;
import org.apache.commons.io.FilenameUtils;
import javax.imageio.ImageIO;
public class ImagesUrlToImagesLocal {
public ArrayList<String> getIt(ArrayList<String> urlFile)
{
ArrayList<String> strResult = new ArrayList<String>();
Image imagesUrl = null;
String baseName = null;
String extension = null;
File outputfile = null;
try {
for (int i = 0; i < urlFile.size(); i++)
{
URL url = new URL(urlFile.get(i));
baseName = FilenameUtils.getBaseName(urlFile.get(i));
extension = FilenameUtils.getExtension(urlFile.get(i));
imagesUrl = ImageIO.read(url);
BufferedImage image = (BufferedImage) imagesUrl;
outputfile = new File("temp_images/" + baseName + "." + extension);
ImageIO.write(image, extension, outputfile);
strResult.add("temp_images/" + baseName + "." + extension);
}
} catch (Exception e) {
e.printStackTrace();
}
return strResult;
}
}

ImageIO saves back to original size

I've been searching the internet for a solution, but I still haven't found an answer to my problem.
I've been working on a program that reads an image file from my PC and edits it with Java Graphics to add some text, objects, etc. After that, Java ImageIO saves the modified image.
So far this works nicely, but I have a problem with the size of the image: the original image and the modified image don't have the same size.
The original is a 2x3 inch image, while the modified one, which is supposed to be 2x3 inches too, comes out at 8x14 inches. So it has become BIGGER than the original.
What solution/code would give me a 2x3 inch output image that still has a 'nice quality'?
UPDATE:
So, here's the code I used.
public Picture(String filename) {
try {
File file = new File("originalpic.jpg");
image = ImageIO.read(file);
width = image.getWidth();
}
catch (IOException e) {
throw new RuntimeException("Could not open file: " + filename);
}
}
private void write(int id) {
try {
ImageIO.write(image, "jpg", new File("newpic.jpg"));
} catch (IOException e) {
e.printStackTrace();
}
}
2nd UPDATE:
I now know what the problem with the new image is. When I checked it in Photoshop, it had a different image resolution from the original: the original has 300 pixels/inch, while the new image has 72 pixels/inch.
How can I change the resolution using Java?

To set the image resolution (of the JFIF segment), you can probably use the IIOMetadata for JPEG.
Something along the lines of:
public class MetadataTest {
public static void main(String[] args) throws IOException {
BufferedImage image = new BufferedImage(100, 100, BufferedImage.TYPE_3BYTE_BGR);
ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
writer.setOutput(ImageIO.createImageOutputStream(new File("foo.jpg")));
ImageWriteParam param = writer.getDefaultWriteParam();
IIOMetadata metadata = writer.getDefaultImageMetadata(ImageTypeSpecifier.createFromRenderedImage(image), param);
IIOMetadataNode root = (IIOMetadataNode) metadata.getAsTree(metadata.getNativeMetadataFormatName());
IIOMetadataNode jfif = (IIOMetadataNode) root.getElementsByTagName("app0JFIF").item(0);
jfif.setAttribute("resUnits", "1");
jfif.setAttribute("Xdensity", "300");
jfif.setAttribute("Ydensity", "300");
metadata.mergeTree(metadata.getNativeMetadataFormatName(), root);
writer.write(null, new IIOImage(image, null, metadata), param);
}
}
Note: this code should not be used verbatim, but adding iteration, error handling, stream closing etc, clutters the example too much.
See JPEG Image Metadata DTD for documentation on the metadata format, and what options you can control.
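For reference, the same idea with stream closing added might look like the sketch below. It is an illustration under assumptions, not a definitive implementation: the class name JpegDpi and both method names are made up here, and it relies on the standard JDK JPEG plugin emitting an app0JFIF node for RGB images. As a worked example of why the density matters: an image of 600x900 pixels prints as 2x3 inches at 300 pixels/inch, but as roughly 8.3x12.5 inches at 72 pixels/inch, even though the pixel data is identical.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.metadata.IIOMetadataNode;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.ImageOutputStream;

// Hypothetical helper class, not part of any library.
public class JpegDpi {

    /** Writes the image as a JPEG whose JFIF header declares the given DPI. */
    public static void write(BufferedImage image, File target, int dpi) throws IOException {
        // A file-backed ImageOutputStream does not truncate, so start clean.
        Files.deleteIfExists(target.toPath());
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        try (ImageOutputStream out = ImageIO.createImageOutputStream(target)) {
            writer.setOutput(out);
            ImageWriteParam param = writer.getDefaultWriteParam();
            IIOMetadata metadata = writer.getDefaultImageMetadata(
                    ImageTypeSpecifier.createFromRenderedImage(image), param);
            String format = metadata.getNativeMetadataFormatName();
            IIOMetadataNode root = (IIOMetadataNode) metadata.getAsTree(format);
            IIOMetadataNode jfif =
                    (IIOMetadataNode) root.getElementsByTagName("app0JFIF").item(0);
            jfif.setAttribute("resUnits", "1"); // 1 = dots per inch
            jfif.setAttribute("Xdensity", Integer.toString(dpi));
            jfif.setAttribute("Ydensity", Integer.toString(dpi));
            metadata.mergeTree(format, root);
            writer.write(null, new IIOImage(image, null, metadata), param);
        } finally {
            writer.dispose();
        }
    }

    /** Reads back the horizontal density stored in the JFIF header. */
    public static int readXDensity(File source) throws IOException {
        ImageReader reader = ImageIO.getImageReadersByFormatName("jpeg").next();
        try (ImageInputStream in = ImageIO.createImageInputStream(source)) {
            reader.setInput(in);
            IIOMetadata metadata = reader.getImageMetadata(0);
            IIOMetadataNode root = (IIOMetadataNode)
                    metadata.getAsTree(metadata.getNativeMetadataFormatName());
            IIOMetadataNode jfif =
                    (IIOMetadataNode) root.getElementsByTagName("app0JFIF").item(0);
            return Integer.parseInt(jfif.getAttribute("Xdensity"));
        } finally {
            reader.dispose();
        }
    }
}
```

In the question's write method, the call ImageIO.write(image, "jpg", new File("newpic.jpg")) could then be replaced by something like JpegDpi.write(image, new File("newpic.jpg"), 300).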

Converting PDF Pages to JPG on Java-GAE

I am searching for an open-source Java library that enables me to render single pages of PDFs as JPG or PNG on the server side.
Unfortunately, it mustn't use any java.awt.* classes other than
java.awt.datatransfer.DataFlavor
java.awt.datatransfer.MimeType
java.awt.datatransfer.Transferable
If there is any way, a little code-snippet would be fantastic.

I believe ICEpdf might have what you are looking for.
I used this open-source project a while back to turn uploaded PDFs into images for use in an online catalog.
import java.awt.image.BufferedImage;
import java.io.FileNotFoundException;
import java.io.IOException;
import org.icepdf.core.exceptions.PDFException;
import org.icepdf.core.exceptions.PDFSecurityException;
import org.icepdf.core.pobjects.Document;
import org.icepdf.core.pobjects.Page;
import org.icepdf.core.util.GraphicsRenderingHints;
public byte[][] convert(byte[] pdf, String format) {
Document document = new Document();
try {
document.setByteArray(pdf, 0, pdf.length, null);
} catch (PDFException ex) {
System.out.println("Error parsing PDF document " + ex);
} catch (PDFSecurityException ex) {
System.out.println("Error encryption not supported " + ex);
} catch (FileNotFoundException ex) {
System.out.println("Error file not found " + ex);
} catch (IOException ex) {
System.out.println("Error handling PDF document " + ex);
}
byte[][] imageArray = new byte[document.getNumberOfPages()][];
// save page captures to bytearray.
float scale = 1.75f;
float rotation = 0f;
// Paint each pages content to an image and write the image to file
for (int i = 0; i < document.getNumberOfPages(); i++) {
BufferedImage image = (BufferedImage)
document.getPageImage(i,
GraphicsRenderingHints.SCREEN,
Page.BOUNDARY_CROPBOX, rotation, scale);
try {
//get the picture util object
PictureUtilLocal pum = (PictureUtilLocal) Component
.getInstance("pictureUtil");
//load image into util
pum.loadBuffered(image);
//write image in desired format
imageArray[i] = pum.imageToByteArray(format, 1f);
System.out.println("\t capturing page " + i);
} catch (IOException e) {
e.printStackTrace();
}
image.flush();
}
// clean up resources
document.dispose();
return imageArray;
}
A word of caution, though: I have had trouble with this library throwing a segfault on OpenJDK, while it worked fine on Sun's JDK. I'm not sure what it would do on GAE. I can't remember which version had the problem, so just be aware.

You can use the Apache PDFBox API for this purpose. The following code converts a PDF into JPG images, page by page.
// Uses the PDFBox 1.x API; getAllPages() and convertToImage() were removed in 2.x.
public void convertPDFToJPG(String src, String FolderPath) {
try {
File folder1 = new File(FolderPath + "\\");
// comparePDF is a helper class that is not shown here
comparePDF cmp = new comparePDF();
cmp.rmdir(folder1);
// load the PDF file into the document object
PDDocument doc = PDDocument.load(new FileInputStream(src));
// get all pages from the document and store them in a list
List<PDPage> pages = doc.getDocumentCatalog().getAllPages();
// create an iterator so it is easy to access each page from the list
Iterator<PDPage> i = pages.iterator();
int count = 1; // count variable used to separate each image file
// convert every page of the PDF document to a unique image file
System.out.println("Please wait...");
while (i.hasNext()) {
PDPage page = i.next();
BufferedImage bi = page.convertToImage();
ImageIO.write(bi, "jpg", new File(FolderPath + "\\Page" + count + ".jpg"));
count++;
}
doc.close(); // release the document once all pages are written
System.out.println("Conversion complete");
} catch (IOException ie) {
ie.printStackTrace();
}
}
