Edit images in PDF file using COSStream object - java

I am trying to edit images in a PDF file using the PDFBox library. So far my example works only for JPEG images: ImageIO.read() fails to decode images with the 'png' suffix. Here is the code. My question: how can I do the same for all types of images in PDF documents? Can I still use ImageIO for this, or do I need another approach?
public static void main(String[] args) throws Exception {
PDDocument doc = PDDocument.load("docs/input1.pdf");
// Get all images from first page
Map<String, PDXObjectImage> pageImages = ((PDPage) doc.getDocumentCatalog().getAllPages().get(0)).getResources().getImages();
if (pageImages != null)
{
// iterate by images
Iterator<String> imageIter = pageImages.keySet().iterator();
while (imageIter.hasNext())
{
String key = imageIter.next();
PDXObjectImage image = pageImages.get(key); // get page image object
String suffix = image.getSuffix(); // get image suffix
String imageName = key+'.'+suffix; // compose image name
System.out.print("process "+imageName+"... ");
COSStream s = image.getCOSStream(); // get COSStream to manipulate
BufferedImage img = ImageIO.read(s.getFilteredStream()); // get BufferedImage to edit
if(img == null)
{
System.out.println("Can't decode");
}
else
{
paint(img.createGraphics()); // draw on it
ImageIO.write(img, suffix, new File("out/"+imageName)); // write file to check result...
// encode image back to COSStream
OutputStream out = s.createFilteredStream();
ImageIO.write(img, suffix, out);
out.close();
System.out.println("done");
}
}
}
doc.save("out/output1.pdf"); // save document
}
/**
* Draw a red rectangle for testing
* @param g graphics context to draw on
*/
public static void paint(Graphics2D g) {
int xpoints[] = {25, 245, 245, 25};
int ypoints[] = {25, 25, 545, 545};
g.setColor(Color.RED);
g.fillPolygon(xpoints, ypoints, 4);
}
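A likely reason the 'png' images fail (my reading of the PDFBox 1.x API, not something stated in this thread): only DCTDecode images store a complete JPEG file in their stream, which is why ImageIO.read() works for them. Images that PDFBox 1.x wraps as PDPixelMap (suffix 'png') store raw pixel samples, usually Flate-compressed, with no PNG header, so ImageIO finds no format it recognizes and returns null. If you want to stay with the stream-level approach, the decoded samples can be wrapped in a BufferedImage by hand. This minimal sketch assumes an 8-bit-per-component DeviceRGB image whose decoded data has no predictor bytes left; the class and method names are hypothetical.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: converts between packed 8-bit R,G,B samples
// (row by row, top row first, no row padding) and a BufferedImage.
public class RawRgbSamples {

    /** Builds an image from packed 8-bit R,G,B samples. */
    public static BufferedImage fromRgbSamples(byte[] samples, int width, int height) {
        if (samples.length < width * height * 3) {
            throw new IllegalArgumentException("Not enough samples for " + width + "x" + height);
        }
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        int i = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int r = samples[i++] & 0xFF; // mask: Java bytes are signed
                int g = samples[i++] & 0xFF;
                int b = samples[i++] & 0xFF;
                img.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return img;
    }

    /** Inverse operation: flattens an image back to packed 8-bit R,G,B samples. */
    public static byte[] toRgbSamples(BufferedImage img) {
        int w = img.getWidth();
        int h = img.getHeight();
        byte[] out = new byte[w * h * 3];
        int i = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = img.getRGB(x, y); // ARGB; the alpha byte is dropped
                out[i++] = (byte) (rgb >> 16);
                out[i++] = (byte) (rgb >> 8);
                out[i++] = (byte) rgb;
            }
        }
        return out;
    }
}
```

In PDFBox 1.x the decoded samples would come from s.getUnfilteredStream() rather than getFilteredStream(), and edited samples could be written back with s.createUnfilteredStream() so that PDFBox re-applies the stream's filters. Check image.getBitsPerComponent() and the color space first: grayscale, indexed, CMYK or 1-bit images use a different sample layout. The getRGBImage() approach in the answer avoids this bookkeeping.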

It's better not to work with the stream of a PDXObjectImage directly, but to create a new PDXObjectImage instance and replace the old one in the resources collection. This is a more generic approach. Use getRGBImage() to convert a PDXObjectImage to a BufferedImage, and a constructor (PDPixelMap, PDJpeg, etc.) to convert the edited result back to a PDXObjectImage. Note that you may still have problems with JBIG2 and JPEG 2000 images due to bugs. Here is the code I use to find and convert all images in a document:
// Recursive resource processor
// Images can also be nested inside PDXObjectForm objects
protected static void processResources(PDResources resources, PDDocument doc, String filename) throws IllegalArgumentException, SecurityException, IOException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, JBIG2Exception, ColorSpaceException, ICCProfileException
{
if(resources == null) return;
Map<String, PDXObject> xObjects = resources.getXObjects();
if (xObjects == null) return;
// iterate by images
Iterator<String> imageIter = xObjects.keySet().iterator();
while (imageIter.hasNext())
{
String key = imageIter.next();
PDXObject o = xObjects.get(key);
if(o instanceof PDXObjectImage)
xObjects.put(key, processImage((PDXObjectImage) o /*, some additional parms... */));
if(o instanceof PDXObjectForm)
processResources(((PDXObjectForm) o).getResources(), doc, filename);
}
resources.setXObjects(xObjects);
}
Note the resources.setXObjects() call at the end: without it, changes you make to the collection obtained from resources.getXObjects() will not be written back to the document.

Related

Compressing PDF with Java library

I'm building a chat platform and I'm implementing attachment uploads.
For that I need a couple of libraries to compress the files (PDF, image, video).
I'm using Lambda to do this.
Image compression is working fine.
I'm working on video compression (mp4, avi, x264) right now.
The difficult part is the PDF.
A PDF may contain images or other content. For this I'm using the following dependency:
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>3.0.0-RC1</version>
</dependency>
And the code for the compression is this
public class PdfCompressor implements MediaCompressor {
public byte[] compress(MediaData media) throws IOException {
byte[] data = media.getData();
if (data == null) {
throw new IllegalArgumentException("Data must not be null");
}
try (PDDocument document = Loader.loadPDF(new ByteArrayInputStream(data));
ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
for (PDPage page : document.getPages()) {
PDResources resources = page.getResources();
if (resources == null) {
continue;
}
for (COSName xObjectName : resources.getXObjectNames()) {
PDXObject xObject = resources.getXObject(xObjectName);
if (xObject instanceof PDImageXObject) {
compressImage((PDImageXObject) xObject, document, outputStream);
}
}
}
return outputStream.toByteArray();
}
}
private void compressImage(PDImageXObject imageObject, PDDocument document, ByteArrayOutputStream outputStream) throws IOException {
// Only compress supported image formats
String format = imageObject.getSuffix();
if (format == null || !(format.equals("jpg") || format.equals("jpeg") || format.equals("png") || format.equals("bmp"))) {
outputStream.write(imageObject.getStream().toByteArray());
return;
}
// Compress the image
BufferedImage image = imageObject.getImage();
ByteArrayOutputStream compressedStream = new ByteArrayOutputStream();
PDImageXObject compressedImage = LosslessFactory.createFromImage(document, image);
ImageIO.write(compressedImage.getImage(), format, compressedStream);
byte[] compressedImageData = compressedStream.toByteArray();
// Check if compressed size is less than original size
if (compressedImageData.length < imageObject.getStream().getLength()) {
outputStream.write(compressedImageData);
} else {
outputStream.write(imageObject.getStream().toByteArray());
}
}
}
But the problem is that the code increases the size of the PDF and, worse, damages the file, so I can't open it afterwards.
This runs in Lambda, and I then try to upload the result to an S3 bucket.
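A few things in the code above would explain both symptoms (my reading of it, not an answer from this thread): compress() never calls document.save(outputStream) but writes raw image bytes into outputStream instead, so the result is not a PDF at all; the re-encoded images are never put back into the document; and LosslessFactory produces Flate-compressed pixels, which for photos are usually larger than the original JPEG data. A lossy JPEG re-encode is the more likely way to shrink photos. The sketch below covers only that step with plain ImageIO; the class name, method names and quality value are assumptions.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

// Hypothetical helper: lossy JPEG re-encoding with an explicit quality setting.
public class JpegRecompressor {

    /** Re-encodes an image as JPEG with the given quality (0.0f to 1.0f). */
    public static byte[] toJpeg(BufferedImage image, float quality) throws IOException {
        // JPEG has no alpha channel: flatten onto an opaque white RGB canvas first
        BufferedImage rgb = new BufferedImage(image.getWidth(), image.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(image, 0, 0, null);
        } finally {
            g.dispose();
        }

        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(bytes)) {
            ImageWriteParam param = writer.getDefaultWriteParam();
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(quality);
            writer.setOutput(ios);
            writer.write(null, new IIOImage(rgb, null, null), param);
        } finally {
            writer.dispose();
        }
        // The image stream is closed (and flushed) before we read the buffer
        return bytes.toByteArray();
    }

    /** Keeps the re-encoded data only if it is actually smaller than the original stream. */
    public static boolean isWorthReplacing(byte[] recompressed, long originalStreamLength) {
        return recompressed.length < originalStreamLength;
    }
}
```

In PDFBox 2.x/3.x the resulting bytes could then be wrapped with JPEGFactory.createFromByteArray(document, jpegBytes), swapped in with resources.put(xObjectName, newImage), and the whole document written once at the end with document.save(outputStream). Images that rely on transparency would lose it as JPEG, so you may want to skip those.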

Java Apache POI: insert an image "in front of text"

I have a placeholder image in my docx file and I want to replace it with a new image. The problem is that the placeholder image has the text-wrapping attribute "In front of text", but the new image does not. As a result, the alignment breaks. Here is my code snippet, the docx with the placeholder, and the resulting docx.
.......
replaceImage(doc, "Рисунок 1", qr, 50, 50); // "Рисунок 1" ("Figure 1" in Russian) is the placeholder picture's name
ByteArrayOutputStream out = new ByteArrayOutputStream();
doc.write(out);
out.close();
return out.toByteArray();
}
}
public XWPFDocument replaceImage(XWPFDocument document, String imageOldName, byte[] newImage, int newImageWidth, int newImageHeight) throws Exception {
try {
int imageParagraphPos = -1;
XWPFParagraph imageParagraph = null;
List<IBodyElement> documentElements = document.getBodyElements();
for (IBodyElement documentElement : documentElements) {
imageParagraphPos++;
if (documentElement instanceof XWPFParagraph) {
imageParagraph = (XWPFParagraph) documentElement;
if (imageParagraph.getCTP() != null && imageParagraph.getCTP().toString().trim().contains(imageOldName)) {
break;
}
}
}
if (imageParagraph == null) {
throw new Exception("Unable to replace image data due to the exception:\n"
+ "'" + imageOldName + "' not found in document.");
}
ParagraphAlignment oldImageAlignment = imageParagraph.getAlignment();
// remove old image
boolean isDeleted = document.removeBodyElement(imageParagraphPos);
// now add new image
XWPFParagraph newImageParagraph = document.createParagraph();
XWPFRun newImageRun = newImageParagraph.createRun();
newImageParagraph.setAlignment(oldImageAlignment);
try (InputStream is = new ByteArrayInputStream(newImage)) {
newImageRun.addPicture(is, XWPFDocument.PICTURE_TYPE_JPEG, "qr",
Units.toEMU(newImageWidth), Units.toEMU(newImageHeight));
}
// set new image at the old image position
document.setParagraph(newImageParagraph, imageParagraphPos);
// NOW REMOVE THE REDUNDANT IMAGE FROM THE END OF THE DOCUMENT
document.removeBodyElement(document.getBodyElements().size() - 1);
return document;
} catch (Exception e) {
throw new Exception("Unable to replace image '" + imageOldName + "' due to the exception:\n" + e);
}
}
Screenshots (not reproduced here): the image with the placeholder, and the resulting image.
To replace picture templates in Microsoft Word there is no need to delete them.
The storage works as follows:
The embedded media is stored as a binary file. This is the picture data (XWPFPictureData). In the document, a picture element (XWPFPicture) links to that picture data.
The XWPFPicture holds the settings for position, size and text flow. These don't need to be changed.
The change is needed in XWPFPictureData, where one can replace the old binary content with the new one.
So the task is to find the XWPFPicture in the document. When a picture is inserted into a document, a non-visual picture name is stored with it. If one knows that name, it can serve as the criterion for finding the picture.
Once found, one can get the XWPFPictureData from the XWPFPicture using the method XWPFPicture.getPictureData. Then one can replace the old binary content of XWPFPictureData with the new one. XWPFPictureData is a package part, so it has PackagePart.getOutputStream to get an output stream to write to.
The following complete example shows all of this.
The source.docx needs to have an embedded picture named "QRTemplate.jpg". This is the name of the source file used while inserting the picture into Word document using Word GUI. And there needs to be a file QR.jpg which contains the new content.
The result.docx then has all pictures named "QRTemplate.jpg" replaced with the content of the given file QR.jpg.
import java.io.FileInputStream;
import java.io.OutputStream;
import java.io.FileOutputStream;
import org.apache.poi.xwpf.usermodel.*;
public class WordReplacePictureData {
static XWPFPicture getPictureByName(XWPFRun run, String pictureName) {
if (pictureName == null) return null;
for (XWPFPicture picture : run.getEmbeddedPictures()) {
String nonVisualPictureName = picture.getCTPicture().getNvPicPr().getCNvPr().getName();
if (pictureName.equals(nonVisualPictureName)) {
return picture;
}
}
return null;
}
static void replacePictureData(XWPFPictureData source, String pictureResultPath) {
try ( FileInputStream in = new FileInputStream(pictureResultPath);
OutputStream out = source.getPackagePart().getOutputStream();
) {
byte[] buffer = new byte[2048];
int length;
while ((length = in.read(buffer)) > 0) {
out.write(buffer, 0, length);
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
static void replacePicture(XWPFRun run, String pictureName, String pictureResultPath) {
XWPFPicture picture = getPictureByName(run, pictureName);
if (picture != null) {
XWPFPictureData source = picture.getPictureData();
replacePictureData(source, pictureResultPath);
}
}
public static void main(String[] args) throws Exception {
String templatePath = "./source.docx";
String resultPath = "./result.docx";
String pictureTemplateName = "QRTemplate.jpg";
String pictureResultPath = "./QR.jpg";
try ( XWPFDocument document = new XWPFDocument(new FileInputStream(templatePath));
FileOutputStream out = new FileOutputStream(resultPath);
) {
for (IBodyElement bodyElement : document.getBodyElements()) {
if (bodyElement instanceof XWPFParagraph) {
XWPFParagraph paragraph = (XWPFParagraph)bodyElement;
for (XWPFRun run : paragraph.getRuns()) {
replacePicture(run, pictureTemplateName, pictureResultPath);
}
}
}
document.write(out);
}
}
}
I have a dirty workaround. Since the text block on the right side of the image is static, I replaced that text with a screenshot in the original docx. Now, when the placeholder image is substituted by the new image, everything renders as expected.

How do I know whether the image is rotated or inverted while extracting from pdf using PDFBox

Task: extract all images from a PDF and save them to local files.
Problem: when I extract the images, not all of them come out correctly.
Suppose my PDF contains 3 images. When I extract them, some are saved rotated, inverted, or at other angles.
I am not sure which properties of an image in a PDF control this.
Question:
1. How can I tell whether an image is rotated?
2. If it is, by which angle? Then I could undo the rotation before saving it locally.
Is there any method or property by which we can know the above properties?
Below is the code I have used:
@Override
protected void processOperator(Operator operator, List<COSBase> operands) throws IOException {
String operation = operator.getName();
if ("Do".equals(operation)) {
COSName objectName = (COSName) operands.get(0);
PDXObject xobject = getResources().getXObject(objectName);
if (xobject instanceof PDImageXObject) {
PDImageXObject image = (PDImageXObject) xobject;
int imageWidth = image.getWidth();
int imageHeight = image.getHeight();
System.out.println(image.getMetadata());//output null
// save image to local disk
BufferedImage bImage = image.getImage();
System.out.println(bImage.getPropertyNames());//output null
ImageIO.write(bImage, "PNG", new File("C:/PdfBox_Examples/" + "image_" + imageNumber + ".png"));
System.out.println("Image saved.");
imageNumber++;
} else if (xobject instanceof PDFormXObject) {
PDFormXObject form = (PDFormXObject) xobject;
showForm(form);
}
} else {
super.processOperator(operator, operands);
}
}
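The rotation is not stored as a property of the image itself: the extracted BufferedImage holds the samples as stored, and how they appear on the page is decided by the current transformation matrix (CTM) in effect when the Do operator paints the image into the unit square. A minimal sketch, assuming you already have the six CTM values [a b c d e f], that derives the rotation angle and whether the image is mirrored; the class and method names are hypothetical.

```java
// Hypothetical helper: interprets the linear part (a, b, c, d) of the CTM that
// maps the image's unit square onto the page. PDF user space has y pointing up,
// so positive angles are counter-clockwise as seen on the page.
public class ImagePlacement {

    /** Angle of the image's x axis in degrees, counter-clockwise, in [-180, 180]. */
    public static double rotationDegrees(double a, double b) {
        return Math.toDegrees(Math.atan2(b, a));
    }

    /** True if the placement mirrors the image (negative determinant). */
    public static boolean isMirrored(double a, double b, double c, double d) {
        return a * d - b * c < 0;
    }

    /** Rotation rounded to the nearest multiple of 90 degrees, in [0, 360). */
    public static int quadrantDegrees(double a, double b) {
        long quarterTurns = Math.round(rotationDegrees(a, b) / 90.0);
        return (int) (((quarterTurns % 4) + 4) % 4) * 90;
    }
}
```

In PDFBox 2.x, inside the Do branch above, the matrix should be available as getGraphicsState().getCurrentTransformationMatrix(), whose getScaleX(), getShearY(), getShearX() and getScaleY() return a, b, c and d. To save the image the way it looks on the page, rotate the BufferedImage counter-clockwise by that angle (and flip it if it is mirrored). The page's own /Rotate entry (PDPage.getRotation()) turns the whole page clockwise in viewers on top of this.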

ImageIO write specific tiff

I am trying to convert an image into a specific TIFF, like ImageMagick does, using ImageIO with the TwelveMonkeys plugins (https://github.com/haraldk/TwelveMonkeys). I have an input image and want to write a TIFF with the following properties:
PLANAR_CONFIGURATION = 1
SAMPLES_PER_PIXEL = 1
BITS_PER_SAMPLE = 1
Y_RESOLUTION = 196
X_RESOLUTION = 204
IMAGE_WIDTH = 1728
Any idea how to produce this from the input stream? Currently the image is just converted into a plain TIFF:
BufferedImage image = ImageIO.read(inputstream);
ImageIO.write( image, "tiff", outputstream );
As @fmw42 says, you have to make the image 1-bit yourself. The TIFFImageWriter plugin writes the image it is passed as-is. Fortunately, this is not difficult to do.
Here's an easy (but not very sophisticated) way to convert the image to binary:
private static BufferedImage toBinary(BufferedImage original) {
if (original.getType() == BufferedImage.TYPE_BYTE_BINARY) {
return original;
}
// Quick and unsophisticated way to convert to B/W binary, using default dither and threshold (fixed, 50% I think)
BufferedImage image = new BufferedImage(original.getWidth(), original.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
Graphics2D g = image.createGraphics();
try {
g.setRenderingHint(RenderingHints.KEY_DITHERING, RenderingHints.VALUE_DITHER_ENABLE);
g.setComposite(AlphaComposite.Src);
g.drawImage(original, 0, 0, null);
}
finally {
g.dispose();
}
return image;
}
I'll leave it as an exercise to write more advanced solutions, using adaptive thresholding, error-diffusion dithering etc.
Now you can use the following code, and you're nearly there:
public static void main(String[] args) throws IOException {
BufferedImage original = ImageIO.read(new File(args[0]));
ImageIO.write(toBinary(original), "TIFF", new File("out.tif"));
}
Unfortunately, this will not set the X and Y Resolution tags. If you need that as well, you have to dig a little deeper into the ImageIO API, and figure out how to use the metadata to control the output. Note that only some of the values in the metadata may be set in this way. Other values will be computed from the image data passed in, and some may be filled in with default values by the writer.
You can use the following code (the toBinary method is the same as above):
public static void main(String[] args) throws IOException {
BufferedImage original = ImageIO.read(new File(args[0]));
BufferedImage image = toBinary(original);
ImageWriter writer = ImageIO.getImageWritersByFormatName("TIFF").next();
try (ImageOutputStream stream = ImageIO.createImageOutputStream(new File("out.tif"))) {
// You may use the param to control compression
ImageWriteParam param = writer.getDefaultWriteParam();
IIOMetadata metadata = writer.getDefaultImageMetadata(ImageTypeSpecifier.createFromRenderedImage(image), param);
Node root = metadata.getAsTree("com_sun_media_imageio_plugins_tiff_image_1.0"); // "javax_imageio_tiff_image_1.0" will work in later versions
Node ifd = root.getFirstChild();
// Add X and Y resolution tags
ifd.appendChild(createResTag("282", "XResolution", "204/1"));
ifd.appendChild(createResTag("283", "YResolution", "196/1"));
// Merge changes back to metadata
metadata.mergeTree("com_sun_media_imageio_plugins_tiff_image_1.0", root);
// Write full image, with metadata
writer.setOutput(stream);
writer.write(null, new IIOImage(image, null, metadata), param);
}
finally {
writer.dispose();
}
}
private static IIOMetadataNode createResTag(String tagNumber, String tagName, String tagValue) {
IIOMetadataNode res = new IIOMetadataNode("TIFFField");
res.setAttribute("number", tagNumber);
res.setAttribute("name", tagName); // Tag name is optional
IIOMetadataNode value = new IIOMetadataNode("TIFFRational");
value.setAttribute("value", tagValue);
IIOMetadataNode rationals = new IIOMetadataNode("TIFFRationals");
rationals.appendChild(value);
res.appendChild(rationals);
return res;
}
PS: The TwelveMonkeys TIFF plugin currently doesn't write PlanarConfiguration: 1, as this is the default value, and there's no way to force it. But it should not matter, as all compliant TIFF software must use the default value in this case.
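The answer does not cover the IMAGE_WIDTH = 1728 requirement (1728 pixels is the standard fax line width). One way, sketched here as an assumption rather than part of the answer, is to scale and binarize in a single draw before writing. The 196/204 factor keeps the physical aspect ratio for the question's 204 x 196 dpi resolution; the class, method and constant names are hypothetical.

```java
import java.awt.AlphaComposite;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper: scales an image to fax width and converts it to 1 bit per pixel.
public class FaxPage {

    /** Line width in pixels required by the question. */
    static final int FAX_WIDTH = 1728;
    /** Horizontal and vertical resolution from the question, in dpi. */
    static final double X_DPI = 204;
    static final double Y_DPI = 196;

    /** Scales to FAX_WIDTH (keeping the physical aspect ratio) and converts to TYPE_BYTE_BINARY. */
    public static BufferedImage toFax(BufferedImage original) {
        // At 204 x 196 dpi pixels are slightly taller than wide, so fewer rows are needed
        double scale = (double) FAX_WIDTH / original.getWidth();
        int height = (int) Math.max(1, Math.round(original.getHeight() * scale * Y_DPI / X_DPI));

        BufferedImage result = new BufferedImage(FAX_WIDTH, height, BufferedImage.TYPE_BYTE_BINARY);
        Graphics2D g = result.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.setRenderingHint(RenderingHints.KEY_DITHERING, RenderingHints.VALUE_DITHER_ENABLE);
            g.setComposite(AlphaComposite.Src);
            g.drawImage(original, 0, 0, FAX_WIDTH, height, null); // scale while drawing
        } finally {
            g.dispose();
        }
        return result;
    }
}
```

The result can be passed to the writer code above in place of toBinary(original); since it is already TYPE_BYTE_BINARY, toBinary would return it unchanged anyway.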

Editing PDF text using Java

Is there a way I can edit a PDF from Java?
I have a PDF document which contains placeholders for text that I need to replace using Java, but all the libraries I have seen create PDFs from scratch and offer only minimal editing functionality.
Is there any way I can edit a PDF, or is this impossible?
You can do it with iText. I tested it with the following code, which adds a chunk of text and a red circle over each page of an existing PDF.
/* requires itextpdf-5.1.2.jar or similar */
import java.io.*;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.*;
public class AddContentToPDF {
public static void main(String[] args) throws IOException, DocumentException {
/* example inspired from "iText in action" (2006), chapter 2 */
PdfReader reader = new PdfReader("C:/temp/Bubi.pdf"); // input PDF
PdfStamper stamper = new PdfStamper(reader,
new FileOutputStream("C:/temp/Bubi_modified.pdf")); // output PDF
BaseFont bf = BaseFont.createFont(
BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED); // set font
//loop on pages (1-based)
for (int i=1; i<=reader.getNumberOfPages(); i++){
// get object for writing over the existing content;
// you can also use getUnderContent for writing in the bottom layer
PdfContentByte over = stamper.getOverContent(i);
// write text
over.beginText();
over.setFontAndSize(bf, 10); // set font and size
over.setTextMatrix(107, 740); // set x,y position (0,0 is at the bottom left)
over.showText("I can write at page " + i); // set text
over.endText();
// draw a red circle
over.setRGBColorStroke(0xFF, 0x00, 0x00);
over.setLineWidth(5f);
over.ellipse(250, 450, 350, 550);
over.stroke();
}
stamper.close();
}
}
I modified some code I found, and it works as follows:
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.*;
public class Principal {
public static final String SRC = "C:/tmp/244558.pdf";
public static final String DEST = "C:/tmp/244558-2.pdf";
public static void main(String[] args) throws IOException, DocumentException {
File file = new File(DEST);
file.getParentFile().mkdirs();
new Principal().manipulatePdf(SRC, DEST);
}
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
PdfDictionary dict = reader.getPageN(1);
PdfObject contents = dict.get(PdfName.CONTENTS);
PdfArray refs = null;
if (contents != null && contents.isArray()) {
refs = dict.getAsArray(PdfName.CONTENTS);
} else if (contents != null && contents.isIndirect()) {
refs = new PdfArray(contents);
}
if (refs != null) { // a page without /Contents has nothing to replace
for (int i = 0; i < refs.size(); i++) {
PRStream stream = (PRStream) refs.getDirectObject(i);
byte[] data = PdfReader.getStreamBytes(stream);
// ISO-8859-1 maps each byte to one char, so bytes outside the placeholder survive the round trip.
// This only finds "NULA" if it is stored literally in a single string operand.
stream.setData(new String(data, StandardCharsets.ISO_8859_1).replace("NULA", "Nulo").getBytes(StandardCharsets.ISO_8859_1));
}
}
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
stamper.close();
reader.close();
}
}
Take a look at iText and this sample code
Take a look at Aspose and this sample code
I've done this using LibreOffice Draw.
You start by manually opening a pdf in Draw, checking that it renders OK, and saving it as a Draw .odg file.
That's a zipped xml file, so you can modify it in code to find and replace the placeholders.
Next (from code) you use a command line call to Draw to generate the pdf.
Success!
The main issue is that Draw doesn't handle fonts embedded in a PDF. If the font isn't also installed on your system, the output will render oddly, because Draw substitutes a standard font that inevitably has different sizing.
If this approach is of interest, I'll put together some shareable code.
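The find-and-replace step can be done with java.util.zip alone. A minimal sketch, assuming the placeholders appear verbatim in content.xml (LibreOffice can split text into several spans if its formatting changes mid-word, in which case a plain string replace will miss it); the class and method names are hypothetical. The ODF package format expects the mimetype entry first and uncompressed, so entries are copied in their original order and STORED entries stay STORED.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: fills placeholders in the content.xml of an .odg package.
public class OdgPlaceholderReplacer {

    /** Copies the package, replacing each placeholder key in content.xml with its (XML-escaped) value. */
    public static byte[] replacePlaceholders(byte[] odg, Map<String, String> replacements) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(odg));
             ZipOutputStream out = new ZipOutputStream(result)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                byte[] data = in.readAllBytes(); // data of the current entry only
                if ("content.xml".equals(entry.getName())) {
                    String xml = new String(data, StandardCharsets.UTF_8);
                    for (Map.Entry<String, String> r : replacements.entrySet()) {
                        xml = xml.replace(r.getKey(), escapeXml(r.getValue()));
                    }
                    data = xml.getBytes(StandardCharsets.UTF_8);
                }
                ZipEntry copy = new ZipEntry(entry.getName());
                if (entry.getMethod() == ZipEntry.STORED) {
                    // STORED entries need size and CRC up front
                    CRC32 crc = new CRC32();
                    crc.update(data);
                    copy.setMethod(ZipEntry.STORED);
                    copy.setSize(data.length);
                    copy.setCompressedSize(data.length);
                    copy.setCrc(crc.getValue());
                }
                out.putNextEntry(copy);
                out.write(data);
                out.closeEntry();
            }
        }
        return result.toByteArray();
    }

    /** Escapes the characters that are not allowed literally in XML text. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

The returned bytes can be written to a .odg file and converted from code, for example by running `soffice --headless --convert-to pdf --outdir out filled.odg` through ProcessBuilder.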
