Java - extract text from pdf from selected area to txt

The idea is as follows:
the user selects a PDF file, the file is converted into an image, and that image is displayed in the application.
On the image, the user can select the regions they want to read from the PDF file. When they finish selecting, the program reads the original PDF in the background and stores the text in a txt file.
It is important that the image generated from the PDF has the same size as the PDF page itself.
The following code converts the PDF to images. I use pdfrenderer-0.9.1.jar.
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import javax.imageio.ImageIO;
import com.sun.pdfview.PDFFile;
import com.sun.pdfview.PDFPage;
public class Pdf2Image {
public static void main(String[] args) {
File file = new File("E:\\invoice-template-1.pdf");
RandomAccessFile raf;
try {
raf = new RandomAccessFile(file, "r");
FileChannel channel = raf.getChannel();
ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
PDFFile pdffile = new PDFFile(buf);
// draw each page to an image
int num=pdffile.getNumPages();
for(int i=0;i<num;i++)
{
PDFPage page = pdffile.getPage(i);
//get the width and height for the doc at the default zoom
int width=(int)page.getBBox().getWidth();
int height=(int)page.getBBox().getHeight();
Rectangle rect = new Rectangle(0,0,width,height);
int rotation=page.getRotation();
Rectangle rect1=rect;
if(rotation==90 || rotation==270)
rect1=new Rectangle(0,0,rect.height,rect.width);
//generate the image
BufferedImage img = (BufferedImage)page.getImage(
rect.width, rect.height, //width & height
rect1, // clip rect
null, // null for the ImageObserver
true, // fill background with white
true // block until drawing is done
);
ImageIO.write(img, "png", new File("E:/invoice-template-"+i+".png"));
}
}
catch (FileNotFoundException e1) {
System.err.println(e1.getLocalizedMessage());
} catch (IOException e) {
System.err.println(e.getLocalizedMessage());
}
}
}
Then the image is displayed to the user in a JavaFX application, in an ImageView component.
How can I get the exact mouse position when the user selects the portion of the image whose text should be read from the PDF file?
With the following code I read the PDF file and get the text at given positions, but I have to enter the positions manually :( . I use pdfbox-1.3.1.jar.
I would like to keep the positions the user selects on the picture in a list, and then read the text from the PDF file at all of those positions.
File file = new File("E:/invoice-template-1.pdf");
PDDocument document = PDDocument.load(file);
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.setSortByPosition(true);
Rectangle rect1 = new Rectangle(38, 275, 15, 100);
Rectangle rect2 = new Rectangle(54, 275, 40, 100);
stripper.addRegion("row1column1", rect1);
stripper.addRegion("row1column2", rect2);
List<PDPage> pages = document.getDocumentCatalog().getAllPages();
int j = 0;
for (PDPage page : pages) {
// extract the text of all registered regions on this page
stripper.extractRegions(page);
List<String> regions = stripper.getRegions();
for (String region : regions) {
String text = stripper.getTextForRegion(region);
System.out.println("Region: " + region + " on Page " + j);
System.out.println("\tText: \n" + text);
}
j++;
}
document.close();
For example,
in the following invoice I want to select four regions whose text should be exported. As each region is selected on the picture, its dimensions are kept in a list; the program then goes through the list and exports the text at those positions from the PDF file.
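One way to avoid typing the positions by hand is to record the drag rectangle on the displayed image and scale it back to page units before passing it to addRegion. A minimal sketch, assuming the image shows the whole page with no cropping or rotation, and assuming the region coordinates use the same top-left origin as the image (worth verifying against your PDFBox version). SelectionMapper and its parameter names are hypothetical; the press and release points would come from the mouse events on the ImageView, and the image size is the size as displayed.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper: converts a mouse drag on the displayed image
// into a rectangle in page units.
final class SelectionMapper {
    private SelectionMapper() {}

    static Rectangle2D toPageRegion(double pressX, double pressY,
                                    double releaseX, double releaseY,
                                    double imageWidth, double imageHeight,
                                    double pageWidth, double pageHeight) {
        // Scale factors from displayed pixels to page units.
        double scaleX = pageWidth / imageWidth;
        double scaleY = pageHeight / imageHeight;
        // Normalize so the drag may go in any direction.
        double x = Math.min(pressX, releaseX);
        double y = Math.min(pressY, releaseY);
        double w = Math.abs(releaseX - pressX);
        double h = Math.abs(releaseY - pressY);
        return new Rectangle2D.Double(x * scaleX, y * scaleY, w * scaleX, h * scaleY);
    }
}
```

Each returned rectangle can be stored in a list and registered with stripper.addRegion(name, rect) before calling extractRegions. If the image is rendered at the page's own width and height, as in the conversion code above, both scale factors are 1.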

how to insert every image in a new page to a ( word document )

I have some images, and I'm trying to insert each image on a new page of a Word document. My code works for only one image.
I'm trying to write a program that inserts the first image on the first page of the document
and then automatically starts a new page for the second one, and so on.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.util.Units;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
public class Test{
public static void main(String[] args) throws Exception {
XWPFDocument document = new XWPFDocument();
XWPFParagraph paragraph = document.createParagraph();
XWPFRun run = paragraph.createRun();
FileOutputStream fout = new FileOutputStream( new File("D:\\word java.docx"));
File image = new File("C:\\Users\\Pictures\\image1.jpg");
File image2 = new File("C:\\Users\\Pictures\\image2.jpg");
File image3 = new File("C:\\Users\\Pictures\\image2.jpg"); // I want to insert these three images into one Word document
FileInputStream imageData= new FileInputStream(image);
int imageType = XWPFDocument.PICTURE_TYPE_JPEG;
String imageFileName = image.getName();
int width = 450;
int height = 400;
run.addPicture(imageData, imageType, imageFileName,
Units.toEMU(width),
Units.toEMU(height));
document.write(fout);
fout.close();
document.close();
}
}

You already know how to add pictures to an XWPFRun. So your question now seems to be how to do that multiple times for different picture file paths, and how to put a page break after each inserted picture.
For the first, we need a loop. Put the picture paths into a List; then you can loop over all of them using a "for-each" loop.
For the second, take a look at the API documentation of XWPFRun. There we find XWPFRun.addBreak -> BreakType.PAGE.
Complete example:
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.apache.poi.util.Units;
import java.util.List;
import java.util.ArrayList;
public class CreateWordPicturesInSinglePages {
public static void main(String[] args) throws Exception {
List<String> picturePaths = new ArrayList<String>();
picturePaths.add("./image1.jpg");
picturePaths.add("./image2.jpg");
picturePaths.add("./image3.jpg");
XWPFDocument document= new XWPFDocument();
XWPFParagraph paragraph = document.createParagraph();
XWPFRun run = paragraph.createRun();
run.setText("The pictures: ");
FileInputStream in;
File image;
// a loop over all picture paths
for (String picturePath : picturePaths) {
try { // maybe something gets wrong while image IO
image = new File(picturePath);
in = new FileInputStream(image);
int imageType = XWPFDocument.PICTURE_TYPE_JPEG;
String imageFileName = image.getName();
int width = 450;
int height = 400;
// add picture
paragraph = document.createParagraph();
run = paragraph.createRun();
run.addPicture(in, imageType, imageFileName, Units.toEMU(width), Units.toEMU(height));
// add text below the picture
paragraph = document.createParagraph();
run = paragraph.createRun();
run.setText("Image file-name: " + imageFileName);
// add page break
paragraph = document.createParagraph();
run = paragraph.createRun();
run.addBreak(BreakType.PAGE);
} catch (Exception ex) {
ex.printStackTrace();
}
}
FileOutputStream out = new FileOutputStream("./CreateWordPicturesInSinglePages.docx");
document.write(out);
out.close();
document.close();
}
}
The code is tested and works using current apache poi 5.2.2.
Result in Word:

You could use the setPageBreak() method. Below is a runnable example of how this could be done. Of course, the image paths need to point to the images you want to add to the Word document. In the example, the image paths are placed in a String[] array named filePaths, and the text to be added below each image is placed in a parallel String[] array named imageText. Each image is placed on its own document page with its respective text. Please read the comments in the code:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.util.Units;
import org.apache.poi.xwpf.usermodel.ParagraphAlignment;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
public class Apache_POI_Demo {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
/* These are the images we want to place into a WORD document
but only one image per page. We place them into a String[]
Array for easier handling, especially if you plan to
insert lots of them into the document. */
String[] filePaths = {"C:/Users/Devils/Pictures/DUCKS/image1.jpg", // Image 1
"C:/Users/Devils/Pictures/DUCKS/image2.jpg", // Image 2
"C:/Users/Devils/Pictures/DUCKS/image3.jpg"}; // Image 3
/* There are specific Strings related to each image. These
strings will be applied under each respective image on
each page. */
String[] imageText = {"Just Some Cute Ducks", // For image1
"A Couple more Cute Ducks!", // For image2
"A Bit Of A Happy Duck"}; // For image3
// Create a new WORD document ('Try With Resources' used!)
try (XWPFDocument doc = new XWPFDocument()) {
// The pages we want (one page for each image).
XWPFParagraph[] page = new XWPFParagraph[filePaths.length];
/* Iterate through each image, load it in,
and apply it to the document. */
for (int pg = 0; pg < filePaths.length; pg++) {
// New paragraph for this current page.
page[pg] = doc.createParagraph();
// Align everything to center
page[pg].setAlignment(ParagraphAlignment.CENTER);
// Create Page
XWPFRun run = page[pg].createRun();
// Load Image File...
File image = new File(filePaths[pg]);
// 'Try With Resources' used! here as well.
try (FileInputStream imageData = new FileInputStream(image)) {
int imageType = XWPFDocument.PICTURE_TYPE_JPEG; // Image type is jpg
String imageFileName = image.getName(); // Get image file Name (only)
int width = 450; // Set image Width
int height = 400; // Set image Height
// Add image as first paragraph in document page.
run.addPicture(imageData, imageType, imageFileName,
Units.toEMU(width),
Units.toEMU(height));
// Add text center under image ("Image #n") as part of first paragraph.
run.setFontFamily("Courier"); // Set the Font name for text.
run.setBold(true); // Set Text Bold.
run.setItalic(true); // Set Text Italic.
run.setColor("0000ff"); // Set text color Blue (hex RGB color value).
run.setFontSize(11); // Set the Font Size for this text.
run.setText("Image #" + (pg + 1)); // The text to add to document...
run.setText(" - " + imageFileName); // The text to add to document line.
// Add a second paragraph to document on current page.
XWPFParagraph paragraph2 = doc.createParagraph();
// Also align this paragraph text to center of document.
paragraph2.setAlignment(ParagraphAlignment.CENTER);
// Create the paragraph 2 object
XWPFRun run2 = paragraph2.createRun();
run2.setFontFamily("Times New Roman"); // Set the Font for this paragraph
run2.setBold(false); // Set Font Bold OFF
run2.setItalic(false); // Set Font italic OFF
run2.setFontSize(14); // Set the Font Size to 14
run2.setText(imageText[pg]); // Apply the desired text from Array
page[pg].setPageBreak(true); // Apply a new Page Break. ******
}
}
// Save (Write) the WORD Document.
try (FileOutputStream out = new FileOutputStream("D:\\POI_test\\word java.docx")) {
doc.write(out);
}
}
catch (IOException | InvalidFormatException ex) {
System.err.println(ex.getMessage());
}
}
}
With the Images I used, the following was the result within WORD:

OpenCV library in Tomcat(8.5.32) Server unable to execute

I am facing an issue while setting up image processing code. Despite trying several code changes and different approaches, the issue persists.
Libraries used: OpenCV 3.4.2
Jars used: opencv-3.4.2-0
tess4j-3.4.8
Lines added in pom.xml:
<!-- https://mvnrepository.com/artifact/org.openpnp/opencv -->
<dependency>
<groupId>org.openpnp</groupId>
<artifactId>opencv</artifactId>
<version>3.4.2-0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/net.sourceforge.tess4j/tess4j -->
<dependency>
<groupId>net.sourceforge.tess4j</groupId>
<artifactId>tess4j</artifactId>
<version>3.4.8</version>
</dependency>
Steps for OpenCV installation:
Download opencv.exe from the official site.
Run opencv.exe; it creates an opencv folder.
The OpenCV library is now available for use in Eclipse.
Steps for Tesseract installation:
Download the tess4j.zip file from the official link.
Extract the zip folder after download.
Provide the path of the tess4j folder.
These are the steps we performed for the setup in Eclipse:
We added the native library by providing the path to the OpenCV library in the build path settings.
We downloaded Tesseract for image reading.
We provided the path to Tesseract in the code.
We used System.loadLibrary(Core.NATIVE_LIBRARY_NAME) and OpenCV.loadLocally() to load the library.
Then we exported a WAR for deployment.
No changes or setup were made in Apache Tomcat.
To load the libraries in Tomcat, some additional setup was needed:
For the new code we used a static "load library" class (as suggested in solutions on Stack Overflow).
There, System.loadLibrary does not work.
We had to use System.load with a hardcoded path, which results in an internal error.
We call System.load twice in the static class, because there are two paths in the opencv folder. The first one gives a std error (bad allocation).
This is the first one:
System.load("C:\\Users\\Downloads\\opencv\\build\\bin\\opencv_java342.dll");
The second one gives an assertion error, depending on which of the two is placed first.
This is the second one:
System.load("C:\\User\\Downloads\\opencv\\build\\java\\x64\\opencv_java3412.dll");
The code executes about halfway and then exits; so far it has never reached the Tesseract part.
Here is the code:
import java.awt.Image;
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.TimeUnit;
import javax.swing.ImageIcon;
import javax.swing.JFrame;
import javax.swing.JLabel;
import org.apache.commons.logging.impl.Log4JLogger;
import org.apache.log4j.Logger;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.opencv.core.Core;
import org.opencv.core.CvException;
import org.opencv.core.Mat;
import org.opencv.core.MatOfPoint;
import org.opencv.core.Rect;
import org.opencv.core.Size;
import org.opencv.highgui.HighGui;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;
import net.sourceforge.tess4j.Tesseract;
import nu.pattern.OpenCV;
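public class ReadImageBox {
// logger used throughout this class (log4j, per the import above)
private static final Logger logger = Logger.getLogger(ReadImageBox.class);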
public String readDataFromImage(String imageToReadPath,String tesseractPath)
{
String result = "";
try {
String i = Core.NATIVE_LIBRARY_NAME;
System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
logger.info("Img to read "+imageToReadPath);
String imagePath =imageToReadPath; // bufferNameOfImagePath = "";
logger.info(imagePath);
/*
* The class Mat represents an n-dimensional dense numerical single-channel or
* multi-channel array. It can be used to store real or complex-valued vectors
* and matrices, grayscale or color images, voxel volumes, vector fields, point
* clouds, tensors, histograms (though, very high-dimensional histograms may be
* better stored in a SparseMat ).
*/
logger.info("imagepath::"+imagePath);
OpenCV.loadLocally();
logger.info("imagepath::"+imagePath);
//logger.info("Library Information"+Core.getBuildInformation());
logger.info("imagepath::"+imagePath);
Mat source = Imgcodecs.imread(imagePath);
logger.info("Source image "+source);
String directoryPath = imagePath.substring(0,imagePath.lastIndexOf('/'));
logger.info("Going for Image Processing :" + directoryPath);
// calling image processing here to process the data from it
result = updateImage(100,20,10,3,3,2,source, directoryPath,tesseractPath);
logger.info("Data read "+result);
return result;
}
catch (UnsatisfiedLinkError error) {
// Output expected UnsatisfiedLinkErrors.
logger.error(error);
}
catch (Exception exception)
{
logger.error(exception);
}
return result;
}
public static String updateImage(int boxSize, int horizontalRemoval, int verticalRemoval, int gaussianBlur,
int denoisingClosing, int denoisingOpening, Mat source, String tempDirectoryPath,String tesseractPath) throws Exception{
// Tesseract Object
logger.info("Tesseract Path :"+tesseractPath);
Tesseract tesseract = new Tesseract();
tesseract.setDatapath(tesseractPath);
// Creating the empty destination matrix for further processing
Mat grayScaleImage = new Mat();
Mat gaussianBlurImage = new Mat();
Mat thresholdImage = new Mat();
Mat morph = new Mat();
Mat morphAfterOpreation = new Mat();
Mat dilate = new Mat();
Mat hierarchy = new Mat();
logger.info("Image type"+source.type());
// Converting the image to gray scale and saving it in the grayScaleImage matrix
Imgproc.cvtColor(source, grayScaleImage, Imgproc.COLOR_RGB2GRAY);
//Imgproc.cvtColor(source, grayScaleImage, 0);
// Applying Gaussain Blur
logger.info("source image "+source);
Imgproc.GaussianBlur(grayScaleImage, gaussianBlurImage, new org.opencv.core.Size(gaussianBlur, gaussianBlur),
0);
// OTSU threshold
Imgproc.threshold(gaussianBlurImage, thresholdImage, 0, 255, Imgproc.THRESH_OTSU | Imgproc.THRESH_BINARY_INV);
logger.info("Threshold image "+gaussianBlur);
// remove the lines of any table inside the invoice
Mat horizontal = thresholdImage.clone();
Mat vertical = thresholdImage.clone();
int horizontal_size = horizontal.cols() / 30;
if(horizontal_size%2==0)
horizontal_size+=1;
// showWaitDestroy("Horizontal Lines Detected", horizontal);
Mat horizontalStructure = Imgproc.getStructuringElement(Imgproc.MORPH_RECT,
new org.opencv.core.Size(horizontal_size, 1));
Imgproc.erode(horizontal, horizontal, horizontalStructure);
Imgproc.dilate(horizontal, horizontal, horizontalStructure);
int vertical_size = vertical.rows() / 30;
if(vertical_size%2==0)
vertical_size+=1;
// Create structure element for extracting vertical lines through morphology
// operations
Mat verticalStructure = Imgproc.getStructuringElement(Imgproc.MORPH_RECT,
new org.opencv.core.Size(1, vertical_size));
// Apply morphology operations
Imgproc.erode(vertical, vertical, verticalStructure);
Imgproc.dilate(vertical, vertical, verticalStructure);
Core.absdiff(thresholdImage, horizontal, thresholdImage);
Core.absdiff(thresholdImage, vertical, thresholdImage);
logger.info("Vertical Structure "+verticalStructure);
Mat newImageFortest = thresholdImage;
logger.info("Threshold image "+thresholdImage);
// applying Closing operation
Imgproc.morphologyEx(thresholdImage, morph, Imgproc.MORPH_CLOSE, Imgproc.getStructuringElement(
Imgproc.MORPH_RECT, new Size(denoisingClosing, denoisingClosing)));
logger.info("Morph image "+morph);
// applying Opening operation
Imgproc.morphologyEx(morph, morphAfterOpreation, Imgproc.MORPH_OPEN, Imgproc.getStructuringElement(
Imgproc.MORPH_RECT, new Size(denoisingOpening, denoisingOpening)));
logger.info("Morph After operation image "+morphAfterOpreation);
// Applying dilation on the threshold image to create bounding box edges
Imgproc.dilate(morphAfterOpreation, dilate,
Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(boxSize, boxSize)));
logger.info("Dilate image "+dilate);
// creating string buffer object
String text = "";
try
{
// finding contours
List<MatOfPoint> contourList = new ArrayList<MatOfPoint>(); // A list to store all the contours
// finding contours
Imgproc.findContours(dilate, contourList, hierarchy, Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_NONE);
logger.info("Contour List "+contourList);
// Creating a copy of the image
//Mat copyOfImage = source;
Mat copyOfImage = newImageFortest;
logger.info("Copy of Image "+copyOfImage);
// Rectangle for cropping
Rect rectCrop = new Rect();
logger.info("Rectangle Crop New Object "+rectCrop);
// loop through the identified contours and crop them from the image to feed
// into Tesseract-OCR
for (int i = 0; i < contourList.size(); i++) {
// getting bound rectangle
rectCrop = Imgproc.boundingRect(contourList.get(i));
logger.info("Rectangle cropped"+rectCrop);
// cropping Image
Mat croppedImage = copyOfImage.submat(rectCrop.y, rectCrop.y + rectCrop.height, rectCrop.x,
rectCrop.x + rectCrop.width);
// writing cropped image to disk
logger.info("Path to write cropped image "+ tempDirectoryPath);
String writePath = tempDirectoryPath + "/croppedImg.png";
logger.info("writepath"+writePath);
// imagepath = imagepath.
Imgcodecs.imwrite(writePath, croppedImage);
try {
// extracting text from cropped image, goes to the image, extracts text and adds
// them to stringBuffer
logger.info("Exact Path where Image was written with Name "+ writePath);
String textExtracted = (tesseract
.doOCR(new File(writePath)));
//Adding Seperator
textExtracted = textExtracted + "_SEPERATOR_";
logger.info("Text Extracted "+textExtracted);
textExtracted = textExtracted + "\n";
text = textExtracted + text;
logger.info("Text Extracted Completely"+text);
// System.out.println("Andar Ka Text => " + text.toString());
} catch (Exception exception) {
logger.error(exception);
}
writePath = "";
logger.info("Making write Path empty for next Image "+ writePath);
}
}
catch(CvException ae)
{
logger.error("cv",ae);
}
catch(UnsatisfiedLinkError ae)
{
logger.error("unsatdif",ae);
}
catch(Exception ae)
{
logger.error("general",ae);
}
// converting into string
return text.toUpperCase();
}
// convert Mat to Image for GUI output
public static Image toBufferedImage(Mat m) {
// getting BYTE_GRAY formed image
int type = BufferedImage.TYPE_BYTE_GRAY;
if (m.channels() > 1) {
type = BufferedImage.TYPE_3BYTE_BGR;
}
int bufferSize = m.channels() * m.cols() * m.rows();
byte[] b = new byte[bufferSize];
m.get(0, 0, b); // get all the pixels
// creating buffered Image
BufferedImage image = new BufferedImage(m.cols(), m.rows(), type);
final byte[] targetPixels = ((DataBufferByte) image.getRaster().getDataBuffer()).getData();
System.arraycopy(b, 0, targetPixels, 0, b.length);
// returning Image
return image;
}
// method to display Mat format images using the GUI
private static void showWaitDestroy(String winname, Mat img) {
HighGui.imshow(winname, img);
HighGui.moveWindow(winname, 500, 0);
HighGui.waitKey(0);
HighGui.destroyWindow(winname);
}
}
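Regarding the two System.load calls described above: a JVM loads a given native library at most once per class loader, and loading two different OpenCV builds side by side is unlikely to work. A minimal sketch of a guard that loads one library once, assuming a single DLL whose version matches the opencv jar. NativeLibraryLoader and the path in the usage note are hypothetical, not part of OpenCV or Tomcat.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical helper: makes sure one native library is loaded only once.
final class NativeLibraryLoader {
    private static final AtomicBoolean LOADED = new AtomicBoolean(false);

    private NativeLibraryLoader() {}

    /** Loads the library via System.load; returns true if this call did the load. */
    public static boolean loadOnce(String absolutePath) {
        return loadOnce(absolutePath, System::load);
    }

    /** Same as above, with the loading action passed in (useful for testing). */
    public static boolean loadOnce(String absolutePath, Consumer<String> loader) {
        // Only the first caller gets past this check.
        if (!LOADED.compareAndSet(false, true)) {
            return false;
        }
        try {
            loader.accept(absolutePath);
            return true;
        } catch (UnsatisfiedLinkError | RuntimeException e) {
            // Loading failed: reset the flag so a later attempt is possible.
            LOADED.set(false);
            throw e;
        }
    }
}
```

A call such as NativeLibraryLoader.loadOnce("C:/path/to/opencv_java342.dll") in a static initializer would replace both System.load lines. In Tomcat, placing such a class in a shared library folder rather than inside the WAR is a commonly suggested way to keep redeployments from loading the DLL through a second class loader; I have not verified that against this setup.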

Drawing vector images on PDF with PDFBox

I would like to draw a vector image on a PDF with Apache PDFBox.
This is the code I use to draw regular images
PDPage page = (PDPage) document.getDocumentCatalog().getAllPages().get(1);
PDPageContentStream contentStream = new PDPageContentStream(document, page, true, true);
BufferedImage _prevImage = ImageIO.read(new FileInputStream("path/to/image.png"));
PDPixelMap prevImage = new PDPixelMap(document, _prevImage);
contentStream.drawXObject(prevImage, prevX, prevY, imageWidth, imageHeight);
If I use an SVG or WMF image instead of a PNG, the resulting PDF document comes out corrupted.
The main reason I want a vector image is that with PNG or JPG the image looks horrible; I think it gets compressed somehow. With vector images this shouldn't happen (when I export SVG paths as PDF in Inkscape, the vector paths are preserved).
Is there a way to draw a svg or wmf (or other vector) to PDF using Apache PDFBox?
I'm currently using PDFBox 1.8, if that matters.

See the library pdfbox-graphics2d, touted in this Jira.
You can draw the SVG, via Batik or Salamander or whatever, onto the class PdfBoxGraphics2D, which is parallel to iText's template.createGraphics(). See the GitHub page for samples.
PDDocument document = ...;
PDPage page = ...; // page whereon to draw
String svgXML = "<svg>...</svg>";
double leftX = ...;
double bottomY = ...; // PDFBox coordinates are oriented bottom-up!
// I set these to the SVG size, which I calculated via Salamander.
// Maybe it doesn't matter, as long as the SVG fits on the graphic.
float graphicsWidth = ...;
float graphicsHeight = ...;
// Draw the SVG onto temporary graphics.
var graphics = new PdfBoxGraphics2D(document, graphicsWidth, graphicsHeight);
try {
int x = 0;
int y = 0;
drawSVG(svgXML, graphics, x, y); // with Batik, Salamander, or whatever you like
} finally {
graphics.dispose();
}
// Graphics are not visible till a PDFormXObject is added.
var xform = graphics.getXFormObject();
try (var contentWriter = new PDPageContentStream(document, page, AppendMode.APPEND, false)) { // false = don't compress
// XForm objects have to be placed via transform,
// since they cannot be placed via coordinates like images.
var transform = AffineTransform.getTranslateInstance(leftX, bottomY);
xform.setMatrix(transform);
// Now the graphics become visible.
contentWriter.drawForm(xform);
}
And ... in case you want also to scale the SVG graphics to 25% size:
// Way 1: Scale the SVG beforehand
svgXML = String.format("<svg transform=\"scale(%f)\">%s</svg>", .25, svgXML);
// Way 2: Scale in the transform (before calling xform.setMatrix())
transform.concatenate(AffineTransform.getScaleInstance(.25, .25));

I do this, but not directly.
First, transform your SVG documents into PDF documents with the FOP library and Batik.
https://xmlgraphics.apache.org/fop/dev/design/svg.html.
Second, you can use LayerUtility in PDFBox to turn the new PDF document into a PDXObjectForm. After that, you just need to include the PDXObjectForm in your final PDF document.

The final working solution for me that loads an SVG file and overlays it on a PDF file (this renders the SVG in a 500x500 box at (0,0) coordinate which is bottom left of the PDF document):
package com.example.svgadder;
import java.io.*;
import java.nio.*;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.pdmodel.PDPageContentStream.AppendMode;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import de.rototor.pdfbox.graphics2d.PdfBoxGraphics2D;
import java.awt.geom.AffineTransform;
import com.kitfox.svg.SVGDiagram;
import com.kitfox.svg.SVGException;
import com.kitfox.svg.SVGUniverse;
public class App
{
public static void main( String[] args ) throws Exception {
App app = new App();
}
public App() throws Exception {
// loading PDF and SVG files
File pdfFile = new File("input.pdf");
File svgFile = new File("input.svg");
PDDocument doc = PDDocument.load(pdfFile);
PDPage page = doc.getPage(0);
SVGUniverse svgUniverse = new SVGUniverse();
SVGDiagram diagram = svgUniverse.getDiagram(svgUniverse.loadSVG(svgFile.toURI().toURL()));
PdfBoxGraphics2D graphics = new PdfBoxGraphics2D(doc, 500, 500);
try {
diagram.render(graphics);
} finally {
graphics.dispose();
}
PDFormXObject xform = graphics.getXFormObject();
try (PDPageContentStream contentWriter = new PDPageContentStream(doc, page, AppendMode.APPEND, false)) {
AffineTransform transform = AffineTransform.getTranslateInstance(0, 0);
xform.setMatrix(transform);
contentWriter.drawForm(xform);
}
doc.save("res.pdf");
doc.close();
}
}
Please use svgSalamander from here:
https://github.com/mgarin/svgSalamander
Please use what Coemgenus suggested for scaling your final overlaid SVG. I tried the 2nd option and it works well.
Nirmal

Creating PDF from TIFF image using iText

I'm currently generating PDF files from TIFF images using iText.
Basically the procedure is as follows:
1. Read the TIFF file.
2. For each "page" of the TIFF, instantiate an Image object and write that to a Document instance, which is the PDF file.
I'm having a hard time understanding how to add those images to the PDF keeping the original resolution.
I've tried to scale the Image to the dimensions in pixels of the original image of the TIFF, for instance:
// Pixel Dimensions 1728 × 2156 pixels
// Resolution 204 × 196 ppi
RandomAccessFileOrArray tiff = new RandomAccessFileOrArray("/path/to/tiff/file");
Document pdf = new Document(PageSize.LETTER);
Image temp = TiffImage.getTiffImage(tiff, page);
temp.scaleAbsolute(1728f, 2156f);
pdf.add(temp);
I would really appreciate if someone can shed some light on this. Perhaps I'm missing the functionality of the Image class methods...
Thanks in advance!

I think that if you scale the image you cannot retain the original resolution (please correct me if I am wrong :)).
What you can try is to create a PDF document with differently sized pages (in case the images in the TIFF file have different dimensions).
Try the following code. It sets the size of each PDF page equal to that of the image before creating the page, so the page size varies with the image size and the resolution is maintained :)
import java.io.FileOutputStream;
import java.io.IOException;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Image;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.RandomAccessFileOrArray;
import com.itextpdf.text.pdf.codec.TiffImage;
public class Tiff2Pdf {
/**
* @param args
* @throws DocumentException
* @throws IOException
*/
public static void main(String[] args) throws DocumentException,
IOException {
String imgeFilename = "/home/saurabh/Downloads/image.tif";
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(
document,
new FileOutputStream("/home/saurabh/Desktop/out"
+ Math.random() + ".pdf"));
writer.setStrictImageSequence(true);
document.open();
document.add(new Paragraph("Multipages tiff file"));
Image image;
RandomAccessFileOrArray ra = new RandomAccessFileOrArray(imgeFilename);
int pages = TiffImage.getNumberOfPages(ra);
for (int i = 1; i <= pages; i++) {
image = TiffImage.getTiffImage(ra, i);
Rectangle pageSize = new Rectangle(image.getWidth(),
image.getHeight());
document.setPageSize(pageSize);
document.add(image);
document.newPage();
}
document.close();
}
}

I've found that this line doesn't work well:
document.setPageSize(pageSize);
If your TIFF files only contain one image then you're better off using this instead:
RandomAccessFileOrArray ra = new RandomAccessFileOrArray(imageFilePath);
Image image = TiffImage.getTiffImage(ra, 1);
Rectangle pageSize = new Rectangle(image.getWidth(), image.getHeight());
Document document = new Document(pageSize);
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(outputFileName));
writer.setStrictImageSequence(true);
document.open();
document.add(image);
document.newPage();
document.close();
This will result in a page size that fits the image size exactly, so no scaling is required.
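Note that iText measures in points (1/72 inch), so using the pixel dimensions directly as the page size gives a page whose physical size ignores the TIFF's resolution. If the printed size matters, one option is to convert pixels to points first. A small sketch of that arithmetic, independent of iText; ImageUnits is a hypothetical helper, and the sample numbers in the note below are the ones from the question (1728 × 2156 pixels at 204 × 196 ppi).

```java
// Hypothetical helper: converts a pixel dimension to PDF points
// using the image resolution in pixels per inch.
final class ImageUnits {
    static final float POINTS_PER_INCH = 72f;

    private ImageUnits() {}

    static float pixelsToPoints(int pixels, int dpi) {
        if (dpi <= 0) {
            throw new IllegalArgumentException("dpi must be positive");
        }
        // inches = pixels / dpi; points = inches * 72
        return pixels * POINTS_PER_INCH / dpi;
    }
}
```

With the numbers from the question, the result is about 609.9 × 792 points, close to a Letter page (612 × 792). Those values could be passed to scaleAbsolute, or used as the page size, instead of the raw pixel counts.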

Another example that avoids deprecated APIs as of iText 5.5, with the first-page issue fixed. I'm using iText 5.5.11.
import java.io.FileOutputStream;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import com.itextpdf.text.Document;
import com.itextpdf.text.Image;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.io.FileChannelRandomAccessSource;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.RandomAccessFileOrArray;
import com.itextpdf.text.pdf.codec.TiffImage;
public class Test1 {
public static void main(String[] args) throws Exception {
RandomAccessFile aFile = new RandomAccessFile("/myfolder/origin.tif", "r");
FileChannel inChannel = aFile.getChannel();
FileChannelRandomAccessSource fcra = new FileChannelRandomAccessSource(inChannel);
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream("/myfolder/destination.pdf"));
document.open();
RandomAccessFileOrArray rafa = new RandomAccessFileOrArray(fcra);
int pages = TiffImage.getNumberOfPages(rafa);
Image image;
for (int i = 1; i <= pages; i++) {
image = TiffImage.getTiffImage(rafa, i);
Rectangle pageSize = new Rectangle(image.getWidth(), image.getHeight());
document.setPageSize(pageSize);
document.newPage();
document.add(image);
}
document.close();
aFile.close();
}
}

Don't know how to create a centered "image + text" watermark in pdf files with iText (Java)

I'm using the iText library, and I'm trying to add a watermark at the bottom of the page. The watermark is simple: it has to be centered, with an image on the left and text on the right.
At this point, I have both the image AND the text in a single PNG. I can calculate where to put the image (centered) from the page size and the image size, but now I want to include the text AS text (better legibility, etc.).
Can I embed the image and the text in some component and then calculate the position like I'm doing now? Any other solutions or ideas?
Here is my current code:
try {
PdfReader reader = new PdfReader("example.pdf");
int numPages = reader.getNumberOfPages();
PdfStamper stamp = new PdfStamper(reader, new FileOutputStream("pdfWithWatermark.pdf"));
int i = 0;
Image watermark = Image.getInstance("watermark.png");
PdfContentByte addMark;
while (i < numPages) {
i++;
float x = reader.getPageSizeWithRotation(i).getWidth() - watermark.getWidth();
watermark.setAbsolutePosition(x/2, 15);
addMark = stamp.getUnderContent(i);
addMark.addImage(watermark);
}
stamp.close();
}
catch (Exception i1) {
logger.info("Exception adding watermark.");
i1.printStackTrace();
}
Thank you in advance!

You might want to look at this:
import com.lowagie.text.*;
import java.io.*;
import com.lowagie.text.pdf.*;
import java.util.*;
class pdfWatermark
{
public static void main(String args[])
{
try
{
PdfReader reader = new PdfReader("text.pdf");
int n = reader.getNumberOfPages();
// Create a stamper that will copy the document to a new file
PdfStamper stamp = new PdfStamper(reader,
new FileOutputStream("text1.pdf"));
int i = 1;
PdfContentByte under;
PdfContentByte over;
Image img = Image.getInstance("watermark.jpg");
BaseFont bf = BaseFont.createFont(BaseFont.HELVETICA,
BaseFont.WINANSI, BaseFont.EMBEDDED);
img.setAbsolutePosition(200, 400);
while (i <= n) // page numbers run from 1 to n inclusive
{
// Watermark under the existing page
under = stamp.getUnderContent(i);
under.addImage(img);
// Text over the existing page
over = stamp.getOverContent(i);
over.beginText();
over.setFontAndSize(bf, 18);
over.showText("page " + i);
over.endText();
i++;
}
stamp.close();
}
catch (Exception de)
{
de.printStackTrace();
}
}
}
(source)

It's a bit ugly, but can't you add the image and the text to a table and then center it?
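An alternative to the table is to compute the positions directly: treat the image, a gap, and the text as one row and center that row on the page. A minimal sketch of the arithmetic only; WatermarkLayout is a hypothetical helper, and the text width is assumed to be measured separately (in iText, BaseFont.getWidthPoint(text, fontSize) should give it).

```java
// Hypothetical helper: computes left edges for an image followed by text,
// so that image + gap + text is centered horizontally on the page.
final class WatermarkLayout {
    private WatermarkLayout() {}

    /** Returns {imageX, textX} in the same units as the inputs. */
    static float[] centeredRow(float pageWidth, float imageWidth, float gap, float textWidth) {
        float total = imageWidth + gap + textWidth;
        float imageX = (pageWidth - total) / 2f;   // left edge of the image
        float textX = imageX + imageWidth + gap;   // left edge of the text
        return new float[] { imageX, textX };
    }
}
```

The first value would go to setAbsolutePosition for the image, and the second to the text position (for example via setTextMatrix) between beginText() and endText().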
