Extracting Images with PDFBox - Images being split into strips

I'm using PDFBox to extract the images out of a PDF. My code is based on the example provided by PDFBox.
However, some images are extracted as a number of horizontal 'strips': a single image in the PDF comes out as several separate images (five, for example) that have to be stitched back together to form the whole image.
Why is this happening, and how do I stop it so that I can extract the full, single image?
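If the producer of the PDF stored the picture as several separate image XObjects, the extraction example has no single image to return and simply writes out each strip. Assuming that is what is happening here, one workaround is to stitch the strips back together after extraction. The sketch below uses only java.awt.image.BufferedImage; the class name StripStitcher is hypothetical, and it assumes the strips are already in top-to-bottom order.

```java
import java.awt.image.BufferedImage;
import java.util.List;

public class StripStitcher {

    /**
     * Stacks horizontal strips top to bottom into one image.
     * Assumes the list is already ordered from the top strip to the bottom strip.
     */
    public static BufferedImage stitchVertically(List<BufferedImage> strips) {
        if (strips.isEmpty()) {
            throw new IllegalArgumentException("no strips to stitch");
        }
        // The result is as wide as the widest strip and as tall as all strips combined.
        int width = 0;
        int height = 0;
        for (BufferedImage strip : strips) {
            width = Math.max(width, strip.getWidth());
            height += strip.getHeight();
        }
        BufferedImage result = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        int y = 0;
        for (BufferedImage strip : strips) {
            int w = strip.getWidth();
            int h = strip.getHeight();
            // Copy the strip's ARGB pixels into the result, directly below the previous strip.
            int[] pixels = strip.getRGB(0, 0, w, h, null, 0, w);
            result.setRGB(0, y, w, h, pixels, 0, w);
            y += h;
        }
        return result;
    }
}
```

To order the strips reliably, you would need each strip's position on the page (from the transformation matrix in effect when it is drawn) rather than the order in which the resources happen to be listed.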

Related

iText PDF: adding an image after text of dynamic length

I am using the iText PDF Java library to generate a PDF document.
I need to add an image to a PDF page after a text snippet. In the current design, the text is added to the PDF first, and the images are added later from a PdfPageEventHelper.
The problem is that the text snippet can be of any length, yet I have to provide x and y coordinates when adding the image. How can I calculate them from the length of the text on that page so that the text and the image don't overlap?
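The arithmetic part of the question can be sketched independently of iText. PDF user space has its origin at the bottom-left of the page, so an image placed below the text needs its lower-left corner at the text's end position minus a gap minus the image height. Where the text ended would have to come from iText itself; iText 5's PdfWriter has a getVerticalPosition(boolean) method for this, but check that your version provides it. The class and method names below are hypothetical.

```java
public class ImagePlacement {

    /**
     * Returns the y coordinate (PDF user space, origin at the bottom-left of the page)
     * for the lower-left corner of an image placed just below the current text,
     * or Float.NaN if the image would cross the bottom margin.
     *
     * @param currentY     vertical position where the text ended, in points
     *                     (assumed to come from something like PdfWriter.getVerticalPosition(true))
     * @param imageHeight  scaled height of the image, in points
     * @param bottomMargin bottom margin of the page, in points
     * @param gap          space to leave between the text and the image, in points
     */
    public static float imageY(float currentY, float imageHeight, float bottomMargin, float gap) {
        // PDF y grows upwards, so "below the text" means subtracting.
        float y = currentY - gap - imageHeight;
        // A NaN result signals that the caller should start a new page instead.
        return y >= bottomMargin ? y : Float.NaN;
    }
}
```

For example, text ending at y = 700 with a 10 pt gap and a 100 pt tall image gives a lower-left corner at y = 590.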

Embed Graphics2D into iText without absolute positioning

I'm creating PDFs with iText (LGPL) that include some text and self-drawn (Graphics2D) images.
My current solution is to draw the images on a BufferedImage and then include it in the PDF, which has several drawbacks:
When printed, the images look ugly. One workaround is to use larger images; at 3000×3000 pixels they look OK. But this leads to the next problem: time. Compressing one image takes several seconds (I haven't found a way to disable compression, and the files would be huge without it).
PdfGraphics2D from iText looks good, but it has one major drawback: it can only draw to the background of the PDF, and there seems to be no way to wrap it in some kind of element.
Is there a way to draw on a PDF without having to use absolute positions? I'm using Graphics2D because it is also used to provide a preview in the UI.

You can wrap a PdfTemplate inside an Image object without losing any of the vector image's quality. In most cases, you'll use the Image object to add raster images to a PDF document as an Image XObject. In this case, however, the PdfTemplate will be added as a Form XObject using its original vector data. Another situation where this happens is when you add a WMF file: such a file is converted into PDF syntax automatically.
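Because the UI preview and the PDF both work through java.awt.Graphics2D, one way to keep a single drawing code path is to put the drawing in a method that accepts any Graphics2D. The sketch below is a hypothetical example (ChartPainter and its two-bar chart are made up for illustration) and uses only the JDK; here it renders into a BufferedImage, which stands in for the preview.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class ChartPainter {

    /**
     * Hypothetical drawing routine: paints a simple two-bar chart onto any Graphics2D.
     * The same method can target a Swing component, a BufferedImage,
     * or the Graphics2D of a PDF template, so the drawing code exists only once.
     */
    public static void paint(Graphics2D g, int width, int height) {
        g.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, width, height);
        // Left bar: 80% of the height, in red.
        g.setColor(Color.RED);
        g.fillRect(width / 8, height / 5, width / 4, height - height / 5);
        // Right bar: 40% of the height, in blue.
        g.setColor(Color.BLUE);
        g.fillRect(5 * width / 8, 3 * height / 5, width / 4, height - 3 * height / 5);
    }

    /** Raster target for the preview; a PDF target would pass a vector Graphics2D to paint() instead. */
    public static BufferedImage renderToImage(int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            paint(g, width, height);
        } finally {
            g.dispose();
        }
        return img;
    }
}
```

With iText 5 on the classpath, the PDF side would roughly be: create a template with writer.getDirectContent().createTemplate(w, h), obtain a Graphics2D from template.createGraphics(w, h), call ChartPainter.paint(...) on it, dispose it, and then add Image.getInstance(template) to the document like any other element. Treat these calls as a starting point and check them against the iText version you use.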

Java: Find image within image?

Say I have two images: one is a .bmp of some text and the other is a BufferedImage. How would I go about finding whether the .bmp is inside the BufferedImage?
I'm really lost on how to find an image within an image. Finding a color is easier because it's just one thing to search for, but an image seems much harder.

One solution to this problem is template matching.
This means sliding your template (the image you want to find) over the image you want to search in and, at every position, comparing the similarity of all pixels.
The position of your template in the image is where this procedure returns its maximum similarity.
As suggested in the comments, you can use OpenCV for this task, since it supports template matching.
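For small images, the procedure can be sketched in plain Java without OpenCV. This version scores each position with the sum of squared differences (SSD) over the RGB channels, so the best match is the position with the lowest score, which corresponds to the highest similarity. The class name TemplateMatcher is hypothetical; the .bmp can be loaded with javax.imageio.ImageIO.read(...), which reads BMP files. The brute-force search costs roughly image area times template area, which is one reason to prefer OpenCV (Imgproc.matchTemplate in its Java bindings) for large images.

```java
import java.awt.Point;
import java.awt.image.BufferedImage;

public class TemplateMatcher {

    /**
     * Slides the template over the image and returns the top-left corner of the
     * position with the lowest sum of squared RGB differences, or null if the
     * template is larger than the image.
     */
    public static Point findBestMatch(BufferedImage image, BufferedImage template) {
        int iw = image.getWidth(), ih = image.getHeight();
        int tw = template.getWidth(), th = template.getHeight();
        if (tw > iw || th > ih) {
            return null;
        }
        long bestScore = Long.MAX_VALUE;
        Point best = null;
        for (int y = 0; y <= ih - th; y++) {
            for (int x = 0; x <= iw - tw; x++) {
                long score = 0;
                // Stop early once this position can no longer beat the best score.
                for (int ty = 0; ty < th && score < bestScore; ty++) {
                    for (int tx = 0; tx < tw; tx++) {
                        int a = image.getRGB(x + tx, y + ty);
                        int b = template.getRGB(tx, ty);
                        score += sq(((a >> 16) & 0xFF) - ((b >> 16) & 0xFF))
                               + sq(((a >> 8) & 0xFF) - ((b >> 8) & 0xFF))
                               + sq((a & 0xFF) - (b & 0xFF));
                    }
                }
                if (score < bestScore) {
                    bestScore = score;
                    best = new Point(x, y);
                }
            }
        }
        return best;
    }

    private static long sq(int v) {
        return (long) v * v;
    }
}
```

An exact copy of the template scores 0; a score threshold on the best match would tell you whether the template is present at all.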

How to get scale and position parameters for a picture from XSLFPictureShape in PPTX?

I use the Apache POI library to parse .pptx files into their constituents: text, tables, images and so on. While working with this tool, I constantly run into the fact that I cannot get complete information about the elements of a slide. Can I get information from XSLFPictureShape about the properties of the picture placed in the container (such as offset, scale, crop, etc.), not only the data type and the file? I always get the original picture, even when it has been cropped, scaled or had other effects applied to it; that is why I am asking for the parameters above.

Browsing PDF page content. Problem with color space and DPI

I am writing a program that validates PDF files. I am using the iText Java library to get the content of a file, but I have some problems parsing it. I need information about the color space and DPI of each image. How can I get the position and dimensions of an image in a PDF? I tried browsing each XObject of the PDF but got stuck: I cannot find any information about the width and height of the image in the PDF.
Are there any other libraries that can help me?
Thank you for all answers and tips.

The image object in the PDF file stores only the Width and the Height of the image in pixels. In order to know the position and size of the image on the page, in PDF points, you have to execute the page content stream, to create a virtual rendering. The image is painted on the page using the 'Do' operator and its position and size are given by the current transformation matrix that is in place when the 'Do' operator is executed.
The DPI for a specific drawing instance is computed as 72*imageSizeInPixels/imageSizeInPoints, imageSizeInPoints being computed as described above.
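Once the current transformation matrix [a b c d e f] in effect at the 'Do' operator is known, the rest is arithmetic. An image XObject is painted into the unit square of user space, so the CTM maps the image's x axis to the vector (a, b) and its y axis to (c, d); their lengths are the drawn width and height in points. The sketch below assumes the default user unit (1 point = 1/72 inch) and that you already have the CTM; in iText 5 the content-stream parser's ImageRenderInfo exposes it through getImageCTM(), but check this against your version. The class name ImageDpi is hypothetical.

```java
public class ImageDpi {

    /** Position, size in points and resolution of one drawing of an image. */
    public static final class Placement {
        public final double x, y, widthPt, heightPt, dpiX, dpiY;

        Placement(double x, double y, double widthPt, double heightPt, double dpiX, double dpiY) {
            this.x = x;
            this.y = y;
            this.widthPt = widthPt;
            this.heightPt = heightPt;
            this.dpiX = dpiX;
            this.dpiY = dpiY;
        }
    }

    /**
     * @param ctm      {a, b, c, d, e, f}: the CTM in effect when 'Do' paints the image
     * @param widthPx  /Width of the image XObject, in pixels
     * @param heightPx /Height of the image XObject, in pixels
     */
    public static Placement compute(double[] ctm, int widthPx, int heightPx) {
        double a = ctm[0], b = ctm[1], c = ctm[2], d = ctm[3];
        double widthPt = Math.hypot(a, b);   // length of the transformed x axis
        double heightPt = Math.hypot(c, d);  // length of the transformed y axis
        // DPI = 72 * imageSizeInPixels / imageSizeInPoints, computed per axis.
        double dpiX = 72.0 * widthPx / widthPt;
        double dpiY = 72.0 * heightPx / heightPt;
        // (e, f) is where the image's (0,0) corner lands: its lower-left corner when not rotated or flipped.
        return new Placement(ctm[4], ctm[5], widthPt, heightPt, dpiX, dpiY);
    }
}
```

For example, a 300×150 pixel image drawn with the CTM [144 0 0 72 100 200] occupies 144×72 points starting at (100, 200), which gives 150 DPI on both axes.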
