I'm trying to get a BufferedImage from a PDXObjectImage that has a png suffix with:
PDResources pdResources = pdPage.getResources();
Map<String, PDXObject> xobjects = (Map<String, PDXObject>) pdResources.getXObjects();
if (xobjects != null) {
for (String key : xobjects.keySet()) {
PDXObject xobject = xobjects.get(key);
if (xobject instanceof PDXObjectImage) {
PDXObjectImage imageObject = (PDXObjectImage) xobject;
String suffix = imageObject.getSuffix();
if (suffix != null) {
BufferedImage image = imageObject.getRGBImage();
}
}
}
}
This code works fine for JPG PDXObjectImages, but image is null for PNG images.
What is the right way to get a BufferedImage from a PDXObjectImage that has PNG suffix?
I also tried:
BufferedImage image = ImageIO.read(((PDPixelMap)imageObject).getPDStream().createInputStream());
But again image is null.
I'm using org.apache.pdfbox version 1.8.11.
I finally moved to PDFBox 2.0, which gave a clear warning that no JBIG2 decoder was installed. Adding the following Maven dependency solved the problem:
<dependency>
<groupId>com.levigo.jbig2</groupId>
<artifactId>levigo-jbig2-imageio</artifactId>
<version>1.6.5</version>
</dependency>
@TilmanHausherr thanks.
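A quick way to check whether such a decoder is visible to ImageIO is to ask for readers by format name. This is only a sketch using the JDK's javax.imageio API; the class name ImageReaderCheck is made up for illustration, and "JBIG2" as the format name registered by the plugin is an assumption that may need adjusting.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

// Hypothetical helper, not part of PDFBox.
class ImageReaderCheck {

    // True if ImageIO has at least one reader registered for the given format name.
    static boolean hasReader(String formatName) {
        ImageIO.scanForPlugins(); // pick up plugins that were added to the classpath
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    public static void main(String[] args) {
        // "JBIG2" is an assumed format name; the plugin may register a different one.
        for (String name : new String[] {"png", "jpeg", "JBIG2"}) {
            System.out.println(name + " reader available: " + hasReader(name));
        }
    }
}
```

If the last line reports false even with the dependency declared, the plugin jar is probably not on the runtime classpath.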
I'm building a chat platform and implementing attachment uploads.
For that I need a couple of libraries to compress the files (PDF, image, video).
I'm using Lambda to do this.
Image compression is working fine.
I'm working on video compression right now (mp4, avi, x264).
The difficult part is the PDF.
A PDF may contain images or other embedded content. For PDFs I'm using this library:
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>3.0.0-RC1</version>
</dependency>
And the code for the compression is this
public class PdfCompressor implements MediaCompressor {
public byte[] compress(MediaData media) throws IOException {
byte[] data = media.getData();
if (data == null) {
throw new IllegalArgumentException("Data must not be null");
}
try (PDDocument document = Loader.loadPDF(new ByteArrayInputStream(data));
ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
for (PDPage page : document.getPages()) {
PDResources resources = page.getResources();
if (resources == null) {
continue;
}
for (COSName xObjectName : resources.getXObjectNames()) {
PDXObject xObject = resources.getXObject(xObjectName);
if (xObject instanceof PDImageXObject) {
compressImage((PDImageXObject) xObject, document, outputStream);
}
}
}
return outputStream.toByteArray();
}
}
private void compressImage(PDImageXObject imageObject, PDDocument document, ByteArrayOutputStream outputStream) throws IOException {
// Only compress supported image formats
String format = imageObject.getSuffix();
if (format == null || !(format.equals("jpg") || format.equals("jpeg") || format.equals("png") || format.equals("bmp"))) {
outputStream.write(imageObject.getStream().toByteArray());
return;
}
// Compress the image
BufferedImage image = imageObject.getImage();
ByteArrayOutputStream compressedStream = new ByteArrayOutputStream();
PDImageXObject compressedImage = LosslessFactory.createFromImage(document, image);
ImageIO.write(compressedImage.getImage(), format, compressedStream);
byte[] compressedImageData = compressedStream.toByteArray();
// Check if compressed size is less than original size
if (compressedImageData.length < imageObject.getStream().getLength()) {
outputStream.write(compressedImageData);
} else {
outputStream.write(imageObject.getStream().toByteArray());
}
}
}
But the problem is that the code increases the size of the PDF and also corrupts the file, so I can't open it afterwards.
The code runs in Lambda, which then tries to upload the result to an S3 bucket.
I found some pretty good solutions in this forum for extracting images from PDF documents with PDFBox. I used the following code snippet, which I found in one post:
PDPageTree list = document.getPages();
for (PDPage page : list) {
PDResources pdResources = page.getResources();
for (COSName c : pdResources.getXObjectNames()) {
try {
PDXObject imageObj = pdResources.getXObject(c);
if (imageObj instanceof PDImageXObject) {
// save image to list
BufferedImage bImage = ((PDImageXObject) imageObj).getImage();
acceptedImages.add(bImage);
}
} catch (MissingImageReaderException mex) {
log.warn("Missing Image Reader for format: ", mex);
}
}
}
The problem is that, in rare cases, some extracted images have the wrong orientation. When I look at the PDF document, the pictures are displayed correctly, but some of the extracted images are rotated by n × 90°. I guess the rotation information is stored somewhere in the PDF?
Run the PrintImageLocations.java example from the source code download and analyse the CTM ("current transformation matrix") to extract the rotation with Math.round(Math.toDegrees(Math.atan2(ctmNew.getShearY(), ctmNew.getScaleY()))).
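The expression can be tried out without PDFBox. The sketch below is only an illustration: it uses java.awt.geom.AffineTransform as a stand-in for the PDFBox matrix (assumption: getShearY() and getScaleY() carry the same meaning in both), and the class name CtmRotation and the sample matrix are made up.

```java
import java.awt.geom.AffineTransform;

// Hypothetical helper; AffineTransform stands in for the PDFBox CTM here.
class CtmRotation {

    // Same expression as in the answer, applied to the two raw matrix entries.
    static long rotationDegrees(double shearY, double scaleY) {
        return Math.round(Math.toDegrees(Math.atan2(shearY, scaleY)));
    }

    // Convenience overload that reads the entries from a transform.
    static long rotationDegrees(AffineTransform ctm) {
        return rotationDegrees(ctm.getShearY(), ctm.getScaleY());
    }

    public static void main(String[] args) {
        // Made-up CTM: an image drawn 100 x 25 units large and rotated by a quarter turn.
        AffineTransform ctm = AffineTransform.getRotateInstance(Math.PI / 2);
        ctm.scale(100, 25);
        System.out.println("rotation in degrees: " + rotationDegrees(ctm));
    }
}
```

Because atan2 returns values between -180° and 180°, a three-quarter turn is reported as -90 rather than 270. With non-uniform scaling the result is only exact for multiples of 90°, which is the case described in the question.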
Task: Extract all images from a PDF and save them to local files.
Problem: When I extract images from a PDF, some of them do not come out correctly.
Suppose my PDF has 3 images. When I extract them, a few are saved rotated, inverted, or at some other angle.
I am not sure which properties the image has inside the PDF.
Question:
1. How can I tell whether an image is rotated?
2. If it is, by which angle is it rotated, so that I can undo the rotation before saving it locally?
Is there any method or property that provides this information?
Below is the code I have used:
@Override
protected void processOperator(Operator operator, List<COSBase> operands) throws IOException {
String operation = operator.getName();
if ("Do".equals(operation)) {
COSName objectName = (COSName) operands.get(0);
PDXObject xobject = getResources().getXObject(objectName);
if (xobject instanceof PDImageXObject) {
PDImageXObject image = (PDImageXObject) xobject;
int imageWidth = image.getWidth();
int imageHeight = image.getHeight();
System.out.println(image.getMetadata());//output null
// save image locally
BufferedImage bImage = image.getImage();
System.out.println(bImage.getPropertyNames());//output null
ImageIO.write(bImage, "PNG", new File("C:/PdfBox_Examples/" + "image_" + imageNumber + ".png"));
System.out.println("Image saved.");
imageNumber++;
} else if (xobject instanceof PDFormXObject) {
PDFormXObject form = (PDFormXObject) xobject;
showForm(form);
}
} else {
super.processOperator(operator, operands);
}
}
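Once an angle is known (for instance from the current transformation matrix, which PDFStreamEngine exposes through getGraphicsState().getCurrentTransformationMatrix()), the extracted BufferedImage can be turned back before it is written. The following is a minimal JDK-only sketch for quarter turns; the class name QuarterTurn is hypothetical, and whether a given angle has to be corrected clockwise or counter-clockwise is an assumption that should be verified against a sample file.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper for undoing rotations that are multiples of 90 degrees.
class QuarterTurn {

    // Returns src rotated clockwise by quarterTurns * 90 degrees
    // (src itself when no rotation is needed). Negative values turn counter-clockwise.
    static BufferedImage rotateClockwise(BufferedImage src, int quarterTurns) {
        int turns = ((quarterTurns % 4) + 4) % 4;
        BufferedImage result = src;
        for (int i = 0; i < turns; i++) {
            result = rotateOnce(result);
        }
        return result;
    }

    // One clockwise quarter turn: source pixel (x, y) moves to (h - 1 - y, x).
    private static BufferedImage rotateOnce(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage dst = new BufferedImage(h, w, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                dst.setRGB(h - 1 - y, x, src.getRGB(x, y));
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        BufferedImage wide = new BufferedImage(4, 2, BufferedImage.TYPE_INT_ARGB);
        BufferedImage turned = rotateClockwise(wide, 1);
        System.out.println(turned.getWidth() + " x " + turned.getHeight());
    }
}
```

In the code above, such a helper could be applied to bImage right before the ImageIO.write call.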
I've some very basic code which inserts an image into an existing PDF:
public class InsertImg
{
public static void main (final String[] args) throws IOException
{
PDDocument document = PDDocument.load (new File ("original.pdf"));
PDPage page = document.getPage (0);
byte[] imgBytes = Files.readAllBytes (Paths.get ("signature.png"));
PDImageXObject pdImage = PDImageXObject.createFromByteArray (document, imgBytes, "name_of_image");
PDPageContentStream content = new PDPageContentStream (document, page, AppendMode.APPEND, true, true);
content.drawImage (pdImage, 50.0f, 350.0f, 100.0f, 25.0f);
content.close ();
document.save (new File ("result.pdf"));
document.close ();
}
}
While this code worked fine in PDFBox 2.0.8 for all image files, under version 2.0.12 it works only for some images.
(Background: We would like to insert an image of a signature into an existing, already generated letter. The signatures are all generated with the same software. In version 2.0.12 not all signatures can be inserted anymore; in version 2.0.8 all signatures could be inserted.)
The generated pdf-file "result.pdf" cannot be opened in Acrobat Reader. Acrobat Reader shows only the original pdf "original.pdf", but does not display the signature-image. It says "error in page. please contact the creator of the pdf".
However, most images can be inserted, so it is likely that the problem depends on the very image used.
The images are all OK; they are PNGs and were checked and verified with various imaging programs, e.g. GIMP or IrfanView.
Furthermore, the code above has always worked fine with PDFBox 2.0.8. After an update of PDFBox to version 2.0.12 the problem showed up, and the newest version, 2.0.16, still produces the error, still on the same image files, and still not on all.
NB: When I comment out the following line, no error shows up in Acrobat Reader, so the problem must be somewhere within drawImage.
// content.drawImage (pdImage, 50.0f, 350.0f, 100.0f, 25.0f);
and the rest of the code seems to be fine.
Also, I've just tried starting with an empty PDF and not loading an already generated one.
PDDocument document = new PDDocument ();
PDPage page = new PDPage ();
document.addPage (page);
[...]
The problem here is still the same, so the issue does not depend on the underlying PDF.
This is a bug present since 2.0.12 (wrong alternate colorspace for gray images created with the LosslessFactory) that has been fixed in PDFBOX-4607 and will be in release 2.0.17. Display works in all viewers I have tested except Adobe Reader, even though the alternate colorspace shouldn't be used when an ICC colorspace is available. Here's some code to fix PDFs (it assumes that images are only on the top level of a page, i.e. images in other structures are not considered):
for (PDPage page : doc.getPages())
{
PDResources resources = page.getResources();
if (resources == null)
{
continue;
}
for (COSName name : resources.getXObjectNames())
{
PDXObject xObject = resources.getXObject(name);
if (xObject instanceof PDImageXObject)
{
PDImageXObject img = (PDImageXObject) xObject;
if (img.getColorSpace() instanceof PDICCBased)
{
PDICCBased icc = (PDICCBased) img.getColorSpace();
if (icc.getNumberOfComponents() == 1 && PDDeviceRGB.INSTANCE.equals(icc.getAlternateColorSpace()))
{
List<PDColorSpace> list = new ArrayList<>();
list.add(PDDeviceGray.INSTANCE);
icc.setAlternateColorSpaces(list);
}
}
}
}
}
I'm trying to add an image from a URL to my PDF. The code is:
Image image=Image.getInstance("http://www.google.com/intl/en_ALL/images/logos/images_logo_lg.gif");
image.scaleToFit((float)200.0, (float)49.0);
paragraph.add(image);
But it does not work. What can be wrong?
This is a known issue when loading .gif from a remote location with iText.
A fix for this would be to download the .gif with Java (not via the getInstance method of iText's Image class) and to use the downloaded bytes in the getInstance method of the Image class.
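The download step could look roughly like the sketch below. It uses only the JDK; the class name RemoteImageBytes and its methods are made up for illustration, and error handling is reduced to wrapping IOException.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;

// Hypothetical helper, not part of iText.
class RemoteImageBytes {

    // Reads the stream to its end and returns everything that was read.
    static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Downloads the resource behind the given URL into memory.
    static byte[] download(String url) {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return readAll(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The resulting array can then be handed to Image.getInstance(byte[]) instead of passing the URL to iText.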
Edit:
I went ahead and fixed remote gif loading in iText, it is included from iText 5.4.1 and later.
Adding an image to an iText PDF is not possible through a URL.
The only way to add an image to the PDF is to download all images into a local directory and apply the code below:
String photoPath = Environment.getExternalStorageDirectory() + "/abc.png";
BitmapFactory.Options options = new BitmapFactory.Options();
options.inSampleSize = 8;
final Bitmap b = BitmapFactory.decodeFile(photoPath, options);
ByteArrayOutputStream stream = new ByteArrayOutputStream();
Bitmap.createScaledBitmap(b, 10, 10, false);
b.compress(Bitmap.CompressFormat.PNG, 30, stream);
Image img = null;
byte[] byteArray = stream.toByteArray();
try {
img = Image.getInstance(byteArray);
} catch (BadElementException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
The approach you used for adding images to an iText PDF is the one meant for local files, not URLs.
For URLs, the following will solve the problem:
String imageUrl = "http://www.google.com/intl/en_ALL/"
+ "images/logos/images_logo_lg.gif";
Image image = Image.getInstance(new URL(imageUrl));
You may then add this image to a previously opened document using document.add(image).
For further reference, please visit the [Java IText: Image docs].