Itext rectangle will not bleed to edge of page - java

I am attempting to modify the background-color of a single page of a multi-page PDF document created using iText.
The easiest way to do this appeared to be by creating a Rectangle the entire size of the page, with the specified background color, and applying it to the page in question using the PdfContentByte utility. (having explored using the Document API, this seemed not to be the best option, since this applied the styling to ALL pages in the document, which I did not want).
When run, on close inspection, I can see that there is a single pixel along the upper, right and bottom margins, which remains white, the rest of the page being the correct color. I have played with the rectangle to ensure no margins were created, but to no avail. Find the code I am using below.
Rectangle r = new Rectangle(0, 0, helper.getPageWidth(), helper.getPageHeight());
r.setBackgroundColor(Constants.GREEN);
PdfContentByte cb = helper.getWriter().getDirectContent();
cb.rectangle(r);
cb.setColorFill(Constants.GREEN);
cb.setColorStroke(Constants.GREEN);
cb.fillStroke();
It seems whatever I try, I cannot get rid of the single white pixel row along these 3 sides of the page. Does anyone have any idea how to bleed to the VERY edge of an iText page?

First:Please mention the itext version you are using.I'm currently used your code snippet and made some changes and it work out well.May be full code snippet will help me to find out whats wrong in your code.
(prime suspect to me this line Rectangle r = new Rectangle(0,0,helper.getPageWidth(),helper.getPageHeight()))
I've attached the output and the code i used.
package com.pra.itext;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Rectangle;
import com.lowagie.text.pdf.PdfContentByte;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.PdfStamper;
import java.awt.Color;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
/**
*
* #author Prajit
*/
public class ItextRect {
public static void main(String[] args) {
PdfReader rdrPdf = null;
PdfStamper stmprPdf = null;
try {
rdrPdf = new PdfReader("E:/Head.First.Servlets&Jsp.pdf");
stmprPdf = new PdfStamper(rdrPdf, new FileOutputStream(new File("D:/Example.pdf")));
for (int pgCnt = 1; pgCnt <= rdrPdf.getNumberOfPages(); pgCnt++) {
if (pgCnt == 1) {
PdfContentByte pdfCntntByt = stmprPdf.getUnderContent(pgCnt);
Rectangle r = new Rectangle(rdrPdf.getPageSize(pgCnt));
r.setBackgroundColor(Color.red);
pdfCntntByt.rectangle(r);
pdfCntntByt.setColorFill(Color.red);
pdfCntntByt.setColorStroke(Color.red);
pdfCntntByt.fillStroke();
}
}
stmprPdf.close();
rdrPdf.close();
} catch (DocumentException de) {
System.err.println(de.getMessage());
} catch (IOException ioe) {
System.err.println(ioe.getMessage());
}
}
}

Related

How to convert a PDF to a JSON/EXCEL/WORD file?

I need to get data from the pdf file with its header for further comparing with DB data
I tried to use the pdfbox , google vision ocr , itext, but all libraries gave me a row without structure and headers.
Example: Date\nNumber\nStatus\n12\12\2020\n442334\delivered
I will trying convert pdf to excel/word and get data from them, but for this realisation i need reading pdf and write data in excel/word
How can I get data with headers?
"Date\nNumber\nStatus\n12/12/2020\n442334\ndelivered" looks structured enough to me. You could just split it at the "\n"s. That would require some knowledge of the table structure, though.
I've made good experience with Google Vision OCR. How are you calling it?
I not found answer on my question.
I'm use this code for my task :
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.text.PDFTextStripperByArea;
import java.awt.*;
import java.io.File;
import java.io.IOException;
public class ExtractTextByArea {
public String getTextFromCoordinate(String filepath,int x,int y,int width,int height) {
String result = "";
try (PDDocument document = PDDocument.load(new File(filepath))) {
if (!document.isEncrypted()) {
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.setSortByPosition(true);
// Rectangle rect = new Rectangle(260, 35, 70, 10);
Rectangle rect = new Rectangle(x,y,width,height);
stripper.addRegion("class1", rect);
PDPage firstPage = document.getPage(0);
stripper.extractRegions( firstPage );
// System.out.println("Text in the area:" + rect);
result = stripper.getTextForRegion("class1");
}
} catch (IOException e){
System.err.println("Exception while trying to read pdf document - " + e);
}
return result;
}
}

GrayScale Images rendering to dark area when written using JAI / ImageIO

Can anyone explain why this happens. I read an image and render it into an output writer. If it is a color file (or black and white), it renders fine. However, if the source image is grayscale, all I get is a black box.
Sample files available at https://www.dropbox.com/sh/kyfsh5curobwxrw/AACfWr1NhX8lPUZpzVGWIPQia?dl=0
My pom plugin dependancy snippets follow.
<dependency>
<groupId>javax.media</groupId>
<artifactId>jai_core</artifactId>
<version>1.1.3</version>
</dependency>
<dependency>
<groupId>com.sun.media</groupId>
<artifactId>jai_imageio</artifactId>
<version>1.1</version>
</dependency>
A test program. I understand that this bit of code in itself is really of no value, but in reality it is part of a larger suite of operations. This code represents my efforts to narrow down the issue to a small piece of code.
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.ImageOutputStream;
import java.awt.*;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
public class GrayScaleImaging {
public static void main(String[] args)
throws IOException {
//works
// final File inputFile = new File("/home/vinayb/Downloads/page1_color.tif");
// final File outputFile = new File("/home/vinayb/Downloads/page1_color_mod.tif");
//doesn't work
final File inputFile = new File("/home/vinayb/Downloads/page1_grayscale.tif");
final File outputFile = new File("/home/vinayb/Downloads/page1_grayscale_mod.tif");
if (outputFile.exists()) {
outputFile.delete();
}
ImageReader imageReader = null;
ImageWriter imageWriter = null;
Graphics2D g = null;
try (final ImageInputStream imageInputStream = ImageIO.createImageInputStream(inputFile);
final ImageOutputStream imageOutputStream = ImageIO.createImageOutputStream(outputFile);) {
//setup reader
imageReader = ImageIO.getImageReaders(imageInputStream).next();
imageReader.setInput(imageInputStream);
//read image
final BufferedImage initialImage = imageReader.read(0);
//prepare graphics for the output
final BufferedImage finalImage = new BufferedImage(initialImage.getWidth(), initialImage.getHeight(), imageType(initialImage));
g = finalImage.createGraphics();
//do something to the image
//doSomething(g)
//draw image
g.drawImage(initialImage, 0, 0, initialImage.getWidth(), initialImage.getHeight(), null);
//setup writer based on reader
imageWriter = ImageIO.getImageWriter(imageReader);
imageWriter.setOutput(imageOutputStream);
//write
imageWriter.write(null, new IIOImage(initialImage, null, imageReader.getImageMetadata(0)), imageWriter.getDefaultWriteParam());
} finally {
//cleanup
if (imageWriter != null) {
imageWriter.dispose();
}
if (imageReader != null) {
imageReader.dispose();
}
if (g != null) {
g.dispose();
}
}
}
private static int imageType(BufferedImage bufferedImage) {
return bufferedImage.getType() == 0 ? BufferedImage.TYPE_INT_ARGB : bufferedImage.getType();
}
}
Okay, I have now fixed my decoder, thanks for the sample file! :-)
Anyway, after some research, I have come to the conclusion that the problem is definitively the sample file, not your code nor the library you are using.
The issue with this file is that the TIFF metadata contains PhotometricInterpretation == 3/Palette and ColorMap tags. Ie. the image uses indexed color model/palette. If the image is read as it should according to (my understanding of) the spec, using the supplied color map, the image comes out all black. If instead I ignore this, and rather read it as gray scale (assuming PhotometricInterpretation == 1/BlackIsZero), it comes out as black text on white (light gray) background.
Edit:
A better explanation, is that the values in the color map are all 8 bit quantities (using the low 8 bits of each color entry) instead of using the full 16 bit as they should... If I detect this while reading and creating a palette using only the low 8 bits, the image comes out as intended (as in DropBox). This is still a bad image according to the spec, but detectable.

Writing image into pdf file in java

I'm writing a code to convert Microsoft power-point(ppt) slides into images and to write the generated images into pdf file. Following code generates and writes the images into pdf file but the problem i'm facing is, when i write image into pdf file it's size is exceeding the pdf page size and i can view only 75% of the image rest is invisible. One more thing to notice here is, written images in pdf file look like zoomed or expanded. Take a look at the following snippet of code:
for (int i = 0; i < slide.length; i++) {
BufferedImage img = new BufferedImage(pgsize.width, pgsize.height, BufferedImage.TYPE_INT_RGB);
Graphics2D graphics = img.createGraphics();
graphics.setPaint(Color.white);
graphics.fill(new Rectangle(0, 0, pgsize.width, pgsize.height));
slide[i].draw(graphics);
fileName="C:/DATASTORE/slide-"+(i+1)+".png";
FileOutputStream out = new FileOutputStream(fileName);
javax.imageio.ImageIO.write(img, "png", out);
out.flush();
out.close();
com.lowagie.text.Image image =com.lowagie.text.Image.getInstance(fileName);
image.setWidthPercentage(40.0f);
doc.add((image));
}
doc.close();
} catch(DocumentException de) {
System.err.println(de.getMessage());
}
If anybody knows the solution please help me to rectify. Thank you.
Here is the code it accomplishes the task i wished. Now i'm getting the desired results after following Bruno Lowagie recommendations.
But, as Bruno Lowagie pointed out earlier, their is a problem in generated png image. The generated png image is not correct because shape or image in the slide overlaps with the texts of the slide. Can you help me to identify and rectify the error?
import java.awt.Color;
import java.awt.Dimension;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import com.itextpdf.text.Image;
import java.awt.image.BufferedImage;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.hslf.model.Slide;
import org.apache.poi.hslf.usermodel.SlideShow;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.pdf.PdfWriter;
public class ConvertSlidesIntoImages {
public static void main(String[] args){
try {
FileInputStream is = new FileInputStream("C:/DATASTORE/testPPT.ppt");
SlideShow ppt = new SlideShow(is);
is.close();
String fileName;
Dimension pgsize = ppt.getPageSize();
Slide[] slide = ppt.getSlides();
Document doc=new Document();
PdfWriter.getInstance(doc, new FileOutputStream("c:/DATASTORE/convertPPTSlidesIntoPDFImages.pdf"));
doc.open();
for (int i = 0; i < slide.length; i++) {
BufferedImage img = new BufferedImage(pgsize.width, pgsize.height, BufferedImage.TYPE_INT_RGB);
Graphics2D graphics = img.createGraphics();
graphics.setPaint(Color.white);
graphics.fill(new Rectangle(0, 0, pgsize.width, pgsize.height));
slide[i].draw(graphics);
fileName="C:/DATASTORE/slide-"+(i+1)+".png";
FileOutputStream out = new FileOutputStream(fileName);
javax.imageio.ImageIO.write(img, "png", out);
out.flush();
out.close();
com.itextpdf.text.Image image =com.itextpdf.text.Image.getInstance(fileName);
doc.setPageSize(new com.itextpdf.text.Rectangle(image.getScaledWidth(), image.getScaledHeight()));
doc.newPage();
image.setAbsolutePosition(0, 0);
doc.add(image);
}
doc.close();
}catch(DocumentException de) {
System.err.println(de.getMessage());
}
catch(Exception ex) {
ex.printStackTrace();
}
}
Thank you
First this: If the png stored as "C:/DATASTORE/slide-"+(i+1)+".png" isn't correct, the slide in the PDF won't be correct either.
And this: Your code snippet doesn't show us how you create the Document object. By default, the page size is A4 in portrait. It goes without saying that images that are bigger than 595 x 842 don't fit that page.
Now the answer: There are two ways to solve this.
Either you change the size of the image (not with setWidthPercentage() unless you've calculated the actual percentage) and you add it a the position (0, 0) so that it doesn't take into account the margins. For instance:
image.scaleToFit(595, 842);
image.setAbsolutePosition(0, 0);
doc.add(image);
doc.newPage();
A better solution would be to adapt the size of the page to the size of the image.
Document doc = new Document(new Rectangle(image.getScaledWidth(), image.getScaledHeight()));
// create a writer, open the document
image.setAbsolutePosition(0, 0);
doc.add(image);
doc.newPage();
If the size of the images varies, you can change the page size while adding images like this:
doc.setPageSize(new Rectangle(image.getScaledWidth(), image.getScaledHeight()));
doc.newPage();
image.setAbsolutePosition(0, 0);
doc.add(image);
It is important to understand that the new page size will only come into effect after doc.newPage();
CAVEAT 1:
If your PDF only holds the last slide, you're probably putting all the slides on the same page, and the last slide covers them all. You need to invoke the newPage() method each time you add an image (as done in a code snippet in my answer).
CAVEAT 2:
Your allegation is wrong. According to the API docs, there is a method setPageSize(Rectangle rect), maybe you're using the wrong Rectangle class. If you didn't follow my advice (which IMHO wouldn't be wise), you're probably looking for com.lowagie.text.Rectangle instead of java.awt.Rectangle.
CAVEAT 3:
This is similar to CAVEAT 2, there are indeed no such methods in the class java.awt.Image, but as documented in the API docs, the class com.itextpdf.text.Image has a getScaleWidth() method and a getScaledHeight() method.

Programmatically Unhide Slides in PPT File

I'm writing a little tool that converts slides in PPT Files to pngs, the problem I'm having is with hidden slides. How can I change a slide to be visible in java? Im currently using Apache POI for conversion to PNGs, although this doesn't work for clipart so I am tempted with exporting it to a PDF using unoconv first, then minipulating that. But doing it like this doesn't take in to account all the hidden slides. So how could I programmatically change the hidden slides to be visible?
This is kind of a hack and has been tested only with a PPT from Libre Office with POI 3.9 / POI-Scratchpad 3.8.
The spec ([MS-PPT].pdf / version 3.0 / page 201) says, that Bit 3 (fHidden) of Byte 18 specifies whether the corresponding slide is hidden and is not displayed during the slide show
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.lang.reflect.Field;
import org.apache.poi.hslf.model.Slide;
import org.apache.poi.hslf.record.Record;
import org.apache.poi.hslf.record.RecordTypes;
import org.apache.poi.hslf.record.UnknownRecordPlaceholder;
import org.apache.poi.hslf.usermodel.SlideShow;
public class UnhidePpt {
public static void main(String[] args) throws Exception {
FileInputStream fis = new FileInputStream("hiddenslide.ppt");
SlideShow ppt = new SlideShow(fis);
fis.close();
Field f = UnknownRecordPlaceholder.class.getDeclaredField("_contents");
f.setAccessible(true);
for (Slide slide : ppt.getSlides()) {
for (Record record : slide.getSlideRecord().getChildRecords()) {
if (record instanceof UnknownRecordPlaceholder
&& record.getRecordType() == RecordTypes.SSSlideInfoAtom.typeID) {
UnknownRecordPlaceholder urp = (UnknownRecordPlaceholder)record;
byte contents[] = (byte[])f.get(urp);
contents[18] &= (255-4);
f.set(urp, contents);
}
}
}
FileOutputStream fos = new FileOutputStream("unhidden.ppt");
ppt.write(fos);
fos.close();
}
}

Creating PDF from TIFF image using iText

I'm currently generating PDF files from TIFF images using iText.
Basically the procedure is as follows:
1. Read the TIFF file.
2. For each "page" of the TIFF, instantiate an Image object and write that to a Document instance, which is the PDF file.
I'm having a hard time understanding how to add those images to the PDF keeping the original resolution.
I've tried to scale the Image to the dimensions in pixels of the original image of the TIFF, for instance:
// Pixel Dimensions 1728 × 2156 pixels
// Resolution 204 × 196 ppi
RandomAccessFileOrArray tiff = new RandomAccessFileOrArray("/path/to/tiff/file");
Document pdf = new Document(PageSize.LETTER);
Image temp = TiffImage.getTiffImage(tiff, page);
temp.scaleAbsolute(1728f, 2156f);
pdf.add(temp);
I would really appreciate if someone can shed some light on this. Perhaps I'm missing the functionality of the Image class methods...
Thanks in advance!
I think if you scale the image then you can not retain the original resolution (please correct me if I am wrong :)).
What you can try doing is to creat a PDF document with different sized pages (if images are of different resolution in the tif image).
Try the following code. It sets the size of PDF page equal to that of image file and then create that PDF page. the PDF page size varies according to the image size so the resolution is maintained :)
import java.io.FileOutputStream;
import java.io.IOException;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Image;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.RandomAccessFileOrArray;
import com.itextpdf.text.pdf.codec.TiffImage;
public class Tiff2Pdf {
/**
* #param args
* #throws DocumentException
* #throws IOException
*/
public static void main(String[] args) throws DocumentException,
IOException {
String imgeFilename = "/home/saurabh/Downloads/image.tif";
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(
document,
new FileOutputStream("/home/saurabh/Desktop/out"
+ Math.random() + ".pdf"));
writer.setStrictImageSequence(true);
document.open();
document.add(new Paragraph("Multipages tiff file"));
Image image;
RandomAccessFileOrArray ra = new RandomAccessFileOrArray(imgeFilename);
int pages = TiffImage.getNumberOfPages(ra);
for (int i = 1; i <= pages; i++) {
image = TiffImage.getTiffImage(ra, i);
Rectangle pageSize = new Rectangle(image.getWidth(),
image.getHeight());
document.setPageSize(pageSize);
document.add(image);
document.newPage();
}
document.close();
}
}
I've found that this line doesn't work well:
document.setPageSize(pageSize);
If your TIFF files only contain one image then you're better off using this instead:
RandomAccessFileOrArray ra = new RandomAccessFileOrArray(imageFilePath);
Image image = TiffImage.getTiffImage(ra, 1);
Rectangle pageSize = new Rectangle(image.getWidth(), image.getHeight());
Document document = new Document(pageSize);
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(outputFileName));
writer.setStrictImageSequence(true);
document.open();
document.add(image);
document.newPage();
document.close();
This will result in a page size that fits the image size exactly, so no scaling is required.
Another example non-deprecated up to iText 5.5 with the first page issue fixed. I'm using 5.5.11 Itext.
import java.io.FileOutputStream;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import com.itextpdf.text.Document;
import com.itextpdf.text.Image;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.io.FileChannelRandomAccessSource;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.RandomAccessFileOrArray;
import com.itextpdf.text.pdf.codec.TiffImage;
public class Test1 {
public static void main(String[] args) throws Exception {
RandomAccessFile aFile = new RandomAccessFile("/myfolder/origin.tif", "r");
FileChannel inChannel = aFile.getChannel();
FileChannelRandomAccessSource fcra = new FileChannelRandomAccessSource(inChannel);
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream("/myfolder/destination.pdf"));
document.open();
RandomAccessFileOrArray rafa = new RandomAccessFileOrArray(fcra);
int pages = TiffImage.getNumberOfPages(rafa);
Image image;
for (int i = 1; i <= pages; i++) {
image = TiffImage.getTiffImage(rafa, i);
Rectangle pageSize = new Rectangle(image.getWidth(), image.getHeight());
document.setPageSize(pageSize);
document.newPage();
document.add(image);
}
document.close();
aFile.close();
}
}

Categories

Resources