What is the best solution to compress PDF with PDFBox?


I have a PDF file to save, but first I have to compress it with the best possible quality, and I must use an open source library (like Apache PDFBox®).
So far, what I do is take all the image resources, compress them, and put them back into the PDF, but the compression ratio is too low. This is just the fragment of the code where I set the compression parameters:
PDImageXObject imageXObject = (PDImageXObject) pdxObject;
ImageWriter imageWriter = ImageIO
.getImageWritersByFormatName(FileType.JPEG.name().toLowerCase()).next();
ImageWriteParam imageWriteParam = imageWriter.getDefaultWriteParam();
imageWriteParam.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
imageWriteParam.setCompressionQuality(COMPRESSION_FACTOR);
Is there some other mechanism to optimize a PDF? So far, compressing only the images gives a rather poor result.
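For reference, here is a self-contained sketch of the same ImageIO approach as the fragment above, with the parts a fragment usually leaves out (setting the output stream, writing, disposing the writer). It uses the format name "jpeg" directly because the FileType enum from the question is not shown, and it assumes an RGB image without alpha; the gradient test image is purely hypothetical.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

public class JpegEncode {
    /** Encodes an RGB image as JPEG with an explicit quality between 0 (smallest) and 1 (best). */
    static byte[] toJpeg(BufferedImage image, float quality) throws IOException {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream out = ImageIO.createImageOutputStream(bytes)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose(); // release resources held by the writer
        }
        return bytes.toByteArray();
    }

    /** Hypothetical textured test image; in practice this would be the image decoded from the PDF. */
    static BufferedImage testImage() {
        BufferedImage img = new BufferedImage(256, 256, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 256; y++)
            for (int x = 0; x < 256; x++)
                img.setRGB(x, y, (x << 16) | (y << 8) | ((x ^ y) & 0xFF));
        return img;
    }

    public static void main(String[] args) throws IOException {
        BufferedImage img = testImage();
        for (float q : new float[] {0.9f, 0.5f, 0.2f})
            System.out.println("quality " + q + ": " + toJpeg(img, q).length + " bytes");
    }
}
```

Printing the size for a few quality values is a quick way to see how much the quality setting alone can buy before touching anything else in the PDF.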

On compression: indeed, images are probably the largest culprits.
Images: the image dimensions (width and height) contribute to the file size too, not only the lossy image quality (your COMPRESSION_FACTOR). In general I would start by compressing a JPEG file outside the PDF. That way you can find the strongest compression that still displays and prints (!) adequately. Photos are best stored as JPEG; vector graphics (like diagrams) are best kept as vector graphics, for example from Encapsulated PostScript, rather than as images.
Repeated images like page logos should be stored once and referenced from each page, not stored repeatedly. This is the kind of optimisation used for internet streaming.
Fonts: the standard fonts need no space, and fully embedded fonts need the most (needed for PDFs with forms, for instance). Subset-embedded fonts are a third possibility: only the glyphs actually used are stored.
The PDF's own data: text and other content streams can be stored uncompressed, compressed with output restricted to 7-bit ASCII, or compressed further using the full byte range (e.g. the Flate filter). The ASCII option is a bit outdated.
I am not using PDFBox at the moment, so I leave the PDFBox specifics to you.
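To illustrate the point about repeated images, here is a hedged, library-neutral sketch that finds byte-identical images by hashing their encoded bytes (SHA-256 is an assumption; any strong digest works), so a writer could store each distinct image once and reuse it. It only detects exact duplicates: the same logo encoded twice with different settings will not match. Wiring the result into PDFBox is not shown, and the byte arrays in main are placeholders.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ImageDedup {
    /**
     * For each encoded image, returns the index of the first byte-identical image,
     * so that image can be stored once and referenced everywhere else.
     */
    static int[] firstOccurrence(List<byte[]> images) throws NoSuchAlgorithmException {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        Map<String, Integer> seen = new HashMap<>();
        int[] result = new int[images.size()];
        for (int i = 0; i < images.size(); i++) {
            // digest() also resets the MessageDigest for the next image
            String key = Base64.getEncoder().encodeToString(sha256.digest(images.get(i)));
            Integer first = seen.putIfAbsent(key, i);
            result[i] = (first == null) ? i : first;
        }
        return result;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] logo = {1, 2, 3, 4}; // placeholder bytes standing in for encoded images
        byte[] photo = {9, 8, 7};
        System.out.println(Arrays.toString(firstOccurrence(Arrays.asList(logo, photo, logo.clone()))));
        // prints [0, 1, 0]
    }
}
```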

Related

How to reduce the size of split PDF document in PDFBox? [duplicate]


Image size increased twice when converting from JPG to PNG using Thumbnailator

I am using Thumbnailator to compress images in my application. Everything works fine except when I try to convert a JPG image to PNG: in that case the image gets twice as big after compressing. This is the code I use to convert the image:
File a=new File("C:\\Users\\muthu\\Downloads\\SampleJPGImage_5mbmb.jpg");
Thumbnails.of(a).scale(1).outputQuality(0.5).toFile("C:\\Users\\muthu\\Downloads\\SampleJPGImage_5mbmb1.png");
Using plain Java does the same; the code is as follows:
BufferedImage bufferedImage = ImageIO.read(new File("C:\\Users\\muthu\\Downloads\\SampleJPGImage_5mbmb.jpg"));
ImageIO.write(bufferedImage, "png", new File("C:\\Users\\muthu\\Downloads\\javaPngimage.png"));
For example, a 5 MB image file is converted to a 32 MB file. I must not resize the image to compress it. I am stuck with this.

JPEG and PNG are both compressed image formats.
JPEG compresses the pixels using frequency transforms and quantisation. It can be a lossy or lossless compression format.
PNG is a lossless compression format with different compression mechanisms. I dare say the "quality" parameter doesn’t actually change the image at all.
The biggest common image file type is BMP (.bmp), which stores 3 bytes (RGB) per pixel plus a header. It’s worth keeping this size in mind when deciding whether an image file is "big" or not. JPEG compression is pretty good.
It sounds like your image has a lot of details that can be compressed well in the frequency domain (JPEG) but compress poorly as PNG.
Simplest solution: a JPEG format thumbnail. If you need to use PNG and you are resizing your image, I’d suggest resizing the JPEG first and then converting it to PNG.
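A small experiment makes the comparison concrete. This sketch encodes the same image with ImageIO as PNG and as JPEG (default quality) and compares both with the raw 3-bytes-per-pixel size. Seeded random noise is an assumption: it is an extreme stand-in for a photo with lots of fine detail, which is exactly the case where lossless PNG cannot do much.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;
import javax.imageio.ImageIO;

public class PngVsJpeg {
    /** Size in bytes of the image encoded by ImageIO in the given format ("png", "jpg"). */
    static int encodedSize(BufferedImage img, String format) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(img, format, out)) throw new IOException("No writer for " + format);
        return out.size();
    }

    /** Fine-grained random noise: a worst-case stand-in for a highly detailed photo. */
    static BufferedImage noisyImage(int w, int h, long seed) {
        Random rnd = new Random(seed);
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                img.setRGB(x, y, rnd.nextInt(0x1000000));
        return img;
    }

    public static void main(String[] args) throws IOException {
        BufferedImage img = noisyImage(800, 600, 42);
        long raw = 3L * img.getWidth() * img.getHeight(); // 3 bytes per pixel, header not counted
        System.out.println("raw RGB: " + raw + " bytes");
        System.out.println("png:     " + encodedSize(img, "png") + " bytes");
        System.out.println("jpg:     " + encodedSize(img, "jpg") + " bytes");
    }
}
```

On such detailed content the PNG ends up close to the raw size, while the JPEG stays well below it, which matches the behaviour described in the question.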

Java LZW Compress & Decompress with Image

I've checked many sources about LZW compression, but none of them worked with image files.
Here are the resources I have checked so far:
https://www.codemiles.com/java/lzw-data-compression-decompression-algorithm-java-code-t99.html
With this one, the compressed file is bigger than the original file.
https://codereview.stackexchange.com/questions/122080/simplifying-lzw-compression-decompression
Could you please give any resource that work with image compression?
Thanks!!!

Compressing an already compressed image is not a good idea, because the first compression removes any statistical hints the second compressor can use. That's at least true for contemporary compression algorithms, such as those used in the JPEG, PNG, GIF, TIFF, and WebP image formats. Typically, a compressed file, viewed in a hex editor, looks quite like a stream of random bytes, and random data (or non-random data with statistical properties similar to random data) cannot be compressed. Usually the result is even bigger than the original, due to some overhead in the storage format. Clever compressors detect this condition and revert to simply storing the original data, rather than compressing it.
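The effect is easy to reproduce with the JDK alone. This sketch uses DEFLATE (java.util.zip.Deflater, the algorithm behind PNG and ZIP) rather than LZW, simply because it ships with Java; seeded random bytes stand in for already-compressed data, which is an assumption based on the "looks like random bytes" observation above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

public class Recompress {
    /** zlib/DEFLATE-compresses the input at the highest compression level. */
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Highly redundant input: the same sentence repeated. */
    static byte[] repeatedText(int copies) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < copies; i++) sb.append("The quick brown fox jumps over the lazy dog. ");
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    /** Seeded random bytes: a stand-in for already-compressed data. */
    static byte[] randomBytes(int n, long seed) {
        byte[] b = new byte[n];
        new Random(seed).nextBytes(b);
        return b;
    }

    public static void main(String[] args) {
        byte[] text = repeatedText(200);
        byte[] once = deflate(text);
        byte[] noise = randomBytes(10000, 1L);
        System.out.println("text:  " + text.length + " -> " + once.length + " bytes");
        System.out.println("again: " + once.length + " -> " + deflate(once).length + " bytes");
        System.out.println("noise: " + noise.length + " -> " + deflate(noise).length + " bytes");
    }
}
```

The redundant text shrinks dramatically, while the random bytes come out slightly larger than they went in: the storage overhead mentioned above.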
So if you think that your image might be compressed further, you'll have to decompress it first. Then you can try a different compressor that might yield better results. However, I doubt that any LZW variant will give you any significant gain over JPEG. While it's a really clever enhancement of the Lempel-Ziv family of compression algorithms, LZW is a purely lossless technique, and hence has an innate limit on the attainable compression ratio, rooted in the statistical distribution of the image data. JPEG and other lossy image formats trade image quality for size, and thus can easily outperform lossless techniques.
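Since the question asks for working LZW code, here is a minimal textbook LZW sketch over raw bytes, with a round-trip decoder. It is deliberately simplified: codes are returned as a list of integers rather than bit-packed, the dictionary grows without limit, and the size estimate assumes one fixed code width. Real formats such as GIF and TIFF add variable code widths and dictionary resets. The demo inputs are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class Lzw {
    /** LZW-compresses bytes into dictionary codes (no bit packing, unbounded dictionary). */
    static List<Integer> compress(byte[] data) {
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) dict.put(String.valueOf((char) i), i);
        String w = "";
        List<Integer> codes = new ArrayList<>();
        for (byte b : data) {
            char c = (char) (b & 0xFF);
            String wc = w + c;
            if (dict.containsKey(wc)) {
                w = wc;                      // keep extending the current match
            } else {
                codes.add(dict.get(w));      // emit the longest known prefix
                dict.put(wc, dict.size());   // learn the new string
                w = String.valueOf(c);
            }
        }
        if (!w.isEmpty()) codes.add(dict.get(w));
        return codes;
    }

    /** Rebuilds the same dictionary on the fly and expands the codes back into bytes. */
    static byte[] decompress(List<Integer> codes) {
        if (codes.isEmpty()) return new byte[0];
        Map<Integer, String> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) dict.put(i, String.valueOf((char) i));
        String w = dict.get(codes.get(0));
        StringBuilder result = new StringBuilder(w);
        for (int k = 1; k < codes.size(); k++) {
            int code = codes.get(k);
            String entry;
            if (dict.containsKey(code)) entry = dict.get(code);
            else if (code == dict.size()) entry = w + w.charAt(0); // code not yet known: pattern cScSc
            else throw new IllegalArgumentException("Bad LZW code: " + code);
            result.append(entry);
            dict.put(dict.size(), w + entry.charAt(0));
            w = entry;
        }
        byte[] out = new byte[result.length()];
        for (int i = 0; i < out.length; i++) out[i] = (byte) result.charAt(i);
        return out;
    }

    /** Size estimate if every code used one fixed width (at least 9 bits, since codes exceed 255). */
    static long fixedWidthBits(List<Integer> codes) {
        int max = 255;
        for (int c : codes) max = Math.max(max, c);
        int width = Math.max(9, 32 - Integer.numberOfLeadingZeros(max));
        return (long) width * codes.size();
    }

    static byte[] sampleText(int copies) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < copies; i++) sb.append("TOBEORNOTTOBEORTOBEORNOT ");
        return sb.toString().getBytes(java.nio.charset.StandardCharsets.US_ASCII);
    }

    static byte[] randomBytes(int n, long seed) {
        byte[] b = new byte[n];
        new Random(seed).nextBytes(b);
        return b;
    }

    public static void main(String[] args) {
        for (byte[] data : new byte[][] {sampleText(300), randomBytes(10000, 3L)}) {
            List<Integer> codes = compress(data);
            System.out.println(data.length * 8L + " bits -> ~" + fixedWidthBits(codes) + " bits");
        }
    }
}
```

Redundant text shrinks, but random-looking input (like an already compressed JPEG) grows, which is the behaviour reported for the first linked implementation.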
Note that the GIF format is a special case. While it uses lossless LZW compression, it requires a color palette of up to 256 entries. To encode a colorful image like a photograph as GIF, you'll have to quantize the color space first to get a 256-color palette. This is once again a lossy technique, albeit quite different from the algorithms used by lossy JPEG and WebP. Quantized GIF images of photos compress excellently due to the reduction of RGB information in the image, but show noticeable degradation in color gradients, such as those found in human faces, flower petals, and a cloudy sky.
As an aside: if GIF allowed larger color palettes (say, 1024 entries), it might become a real killer format for photographic images. Maybe it's time for a GIF17a format update?!

Reduce image size (bytes) in ITextPDF

I'm using the iText PDF library to build a very image-intensive PDF document in Java. Each page has a dozen images on it. The original source images are very high resolution, and I'm using scaleToFit to render the image to the size I need.
The problem I have is that the PDF document is still very large. My understanding is that the entire original high resolution image is being included, and the scaling I'm using only affects the actual rendering, not the size of the image that's included in the file.
I've verified this by removing the scaling — the pages were rendered with the high resolution images overlapping each other and the edge of pages, and the PDF was the same size as when the scaling was in place.
So, here's the question — how can I reduce the size of the PDF file by scaling down each image? If I lose a little bit of image quality that's ok. Rescaling the source images manually will be difficult.

So I've found a way to do it. I now load the image into a BufferedImage, and then scale that using the hints found here: how do I scale a BufferedImage.
This gives me a BufferedImage, which I then convert into an iText image using
Image returnedImage = Image.getInstance(pcb, bufferedImage, quality);
Where quality is currently 0.6. That's acceptable for the work I'm doing.
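One way to decide how far to scale the BufferedImage: iText's scaleToFit takes the target size in PDF user-space units (points, 1/72 inch, by default), so the box can be converted to pixels at a chosen resolution first. This sketch assumes 150 dpi is sufficient (a hypothetical choice) and resizes with standard Graphics2D rendering hints; the iText conversion step shown above is not repeated, and the 2000x1500 source and 144x108 pt box are made-up numbers.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class FitForPdf {
    /** Pixels needed to cover the given length in PDF points (1/72 inch) at the target dpi. */
    static int pixelsFor(float points, int dpi) {
        return Math.max(1, Math.round(points / 72f * dpi));
    }

    /** Shrinks src to fit inside maxW x maxH pixels, keeping the aspect ratio; never upscales. */
    static BufferedImage fitWithin(BufferedImage src, int maxW, int maxH) {
        double factor = Math.min(1.0,
                Math.min((double) maxW / src.getWidth(), (double) maxH / src.getHeight()));
        int w = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(src.getHeight() * factor));
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        g.setRenderingHint(RenderingHints.KEY_RENDERING, RenderingHints.VALUE_RENDER_QUALITY);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return dst;
    }

    public static void main(String[] args) {
        // Hypothetical: a 2000x1500 source shown in a 144x108 pt box, rendered for 150 dpi.
        BufferedImage src = new BufferedImage(2000, 1500, BufferedImage.TYPE_INT_RGB);
        BufferedImage small = fitWithin(src, pixelsFor(144, 150), pixelsFor(108, 150));
        System.out.println(small.getWidth() + "x" + small.getHeight()); // prints 300x225
    }
}
```

Embedding a 300x225 image instead of the 2000x1500 original removes most of the pixels before any JPEG quality setting is even applied.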

How to save optimized png images with java's ImageIO?

I am generating lots of images in java and saving them through the ImageIO.write method like this:
final BufferedImage img = createSomeImage();
ImageIO.write(img, "png", new File("/some/file.png"));
I was happy with the results until Google's Firefox add-on 'Page Speed' told me that I could save up to 60% of the size by optimizing the images. The images are QR codes; each is around 900 B, and the versions optimized by the Firefox plugin are around 300 B.
I'd like to save such optimized 300 B images directly from Java.
So here is my question again: how do I save optimized PNG images with Java's ImageIO?

Use PngEncoderB to convert your BufferedImage into a PNG encoded byte array.
You can apply a filter to it, which helps prepare the image for better optimization. This is what OptiPNG does, except that OptiPNG works out which filter gives the best compression.
You might have to try applying each filter to see which one is consistently better for you. With 2-bit color, I think the only filter that might help is "up", so I'm guessing that's the one to use.
Once you get the image to a PNG encoded byte array, you can write that directly to a file.
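To show what "applying a filter" means, here is a minimal hand-rolled PNG encoder sketch (not PngEncoderB's API) that writes the same filter type on every row, either None or Up, then DEFLATE-compresses the scanlines into an IDAT chunk. It assumes 8-bit grayscale for simplicity rather than the 2-bit case mentioned above, and the QR-like test pattern is hypothetical. Comparing the two printed sizes on your own images is the "try each filter" step.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public class TinyPng {
    static final int FILTER_NONE = 0;
    static final int FILTER_UP = 2;

    /** Encodes 8-bit grayscale pixels (row-major, one byte each) as PNG, one filter type for all rows. */
    static byte[] encodeGray(byte[] pixels, int width, int height, int filter) throws IOException {
        byte[] scanlines = new byte[height * (width + 1)]; // filter-type byte + pixels per row
        for (int y = 0; y < height; y++) {
            int row = y * (width + 1);
            scanlines[row] = (byte) filter;
            for (int x = 0; x < width; x++) {
                int cur = pixels[y * width + x] & 0xFF;
                // "Up" stores the difference to the pixel above; the row above row 0 counts as zeros.
                int above = (filter == FILTER_UP && y > 0) ? pixels[(y - 1) * width + x] & 0xFF : 0;
                scanlines[row + 1 + x] = (byte) (cur - above);
            }
        }
        ByteArrayOutputStream idat = new ByteArrayOutputStream(); // zlib stream of filtered rows
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try (DeflaterOutputStream z = new DeflaterOutputStream(idat, deflater)) {
            z.write(scanlines);
        } finally {
            deflater.end();
        }
        ByteArrayOutputStream ihdr = new ByteArrayOutputStream();
        DataOutputStream h = new DataOutputStream(ihdr);
        h.writeInt(width);
        h.writeInt(height);
        h.writeByte(8); // bit depth
        h.writeByte(0); // colour type 0 = grayscale
        h.writeByte(0); // compression method 0 = deflate
        h.writeByte(0); // filter method 0 (per-row filter types)
        h.writeByte(0); // no interlacing
        ByteArrayOutputStream png = new ByteArrayOutputStream();
        png.write(new byte[] {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'});
        writeChunk(png, "IHDR", ihdr.toByteArray());
        writeChunk(png, "IDAT", idat.toByteArray());
        writeChunk(png, "IEND", new byte[0]);
        return png.toByteArray();
    }

    /** Chunk layout: length, type, data, CRC-32 over type + data. */
    static void writeChunk(ByteArrayOutputStream out, String type, byte[] data) throws IOException {
        byte[] typeBytes = type.getBytes(StandardCharsets.US_ASCII);
        CRC32 crc = new CRC32();
        crc.update(typeBytes);
        crc.update(data);
        DataOutputStream d = new DataOutputStream(out);
        d.writeInt(data.length);
        d.write(typeBytes);
        d.write(data);
        d.writeInt((int) crc.getValue());
        d.flush();
    }

    /** Hypothetical QR-like test pattern: black/white square modules of the given size. */
    static byte[] qrLike(int width, int height, int module) {
        byte[] px = new byte[width * height];
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++) {
                int mx = x / module, my = y / module;
                boolean dark = ((mx * 31 + my * 17 + mx * my) % 3) == 0;
                px[y * width + x] = (byte) (dark ? 0 : 255);
            }
        return px;
    }

    public static void main(String[] args) throws IOException {
        int w = 174, h = 174; // e.g. 29 modules of 6 px
        byte[] px = qrLike(w, h, 6);
        System.out.println("None: " + encodeGray(px, w, h, FILTER_NONE).length + " bytes");
        System.out.println("Up:   " + encodeGray(px, w, h, FILTER_UP).length + " bytes");
        // The returned byte[] can be written straight to a file, e.g. with java.nio.file.Files.write.
    }
}
```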
