My Spring Boot Java application uses the Apache PDFBox library (version 2.0.6) to generate PDFs. I want decimal values to be right-aligned, meaning all decimal points should line up on the same vertical line. I have also attached a screenshot.
stream.beginText();
stream.newLineAtOffset(xCordinate, yCordinate);
stream.showText(String.valueOf(item.getQuantity()));
List<String> resultList = processTextData(TextUtil.isEmpty(item.getDescription()) ? "-" : item.getDescription());
int y = 0;
int x = 50;
int tempYcordinate = yCordinate;
for (String string : resultList) {
stream.newLineAtOffset(x, y);
stream.showText(processStringForPdf(string));
x = 0;
y = -8;
}
tempYcordinate = tempYcordinate - (8 * resultList.size());
stream.endText();
stream.beginText();
stream.newLineAtOffset(285, yCordinate);
stream.showText("$" + NumberFormat.getInstance(Locale.US).format(Util.round(item.getUnitPrice())));
stream.newLineAtOffset(65, 0);
stream.showText("$" + NumberFormat.getInstance(Locale.US).format(Util.round(item.getExtPrice())));
stream.endText();
yCordinate = tempYcordinate;
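To right-align the text you need to compute the width of the text to show and set the output position to
(right alignment position) - (text width)
Because every value below is formatted with the same number of decimals, right-aligning the text also lines up the decimal points.
Find below a small snippet which shows the principle. You need to amend the snippet for your needs.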
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
public class RightAlignDemo {
public static void main(String[] args) throws IOException {
File file = new File("out.pdf");
PDDocument doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
PDPageContentStream stream = new PDPageContentStream(doc, page);
PDType1Font font = PDType1Font.TIMES_ROMAN;
int fontSize = 12;
stream.setFont(font, fontSize);
double[] values = {0, 0.1, 0.01, 12.12, 123.12, 1234.12, 123456.12};
int columnOneLeftX = 50;
int columnTwoRightX = 170;
int columnThreeOffsetX = 10;
for (int i = 0; i < values.length; i++) {
stream.beginText();
stream.newLineAtOffset(columnOneLeftX, 700 - (i*10));
// show some left aligned non fixed width text
stream.showText("value " + values[i]);
// format the double value with thousands separator and
// two decimals
String text = String.format("%,.2f", values[i]);
// get the width of the formatted value
float textWidth = getTextWidth(font, fontSize, text);
// align the position to (right alignment minus text width)
stream.newLineAtOffset(columnTwoRightX - textWidth, 0);
stream.showText(text);
// align the position back to columnTwoRightX plus offset for
// column three
stream.newLineAtOffset(textWidth + columnThreeOffsetX, 0);
stream.showText("description " + i);
stream.endText();
}
stream.close();
doc.save(file);
doc.close();
}
private static float getTextWidth(PDType1Font font, int fontSize,
String text) throws IOException {
return (font.getStringWidth(text) / 1000.0f) * fontSize;
}
}
PDF output
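The question asks for the decimal points to line up, which plain right alignment only guarantees when every value has the same number of decimals. The following is a minimal sketch of the offset arithmetic for both cases. The class and method names are hypothetical, and the width function is a stand-in: with PDFBox it would be the getTextWidth helper from the snippet above.

```java
import java.util.Locale;
import java.util.function.ToDoubleFunction;

final class DecimalAlign {
    /** X at which to start drawing so that the text ends at rightX. */
    static double rightAlignedX(String text, double rightX, ToDoubleFunction<String> width) {
        return rightX - width.applyAsDouble(text);
    }

    /** X at which to start drawing so that the decimal point starts at dotX. */
    static double decimalAlignedX(String text, double dotX, ToDoubleFunction<String> width) {
        int dot = text.indexOf('.');
        // Text without a decimal point is treated as consisting of an integer part only.
        String integerPart = dot < 0 ? text : text.substring(0, dot);
        return dotX - width.applyAsDouble(integerPart);
    }

    public static void main(String[] args) {
        // Stand-in measurer (assumption): 6 units per character.
        // With PDFBox: s -> font.getStringWidth(s) / 1000f * fontSize
        ToDoubleFunction<String> width = s -> s.length() * 6.0;
        for (double v : new double[] {0.1, 12.12, 1234.5}) {
            // Locale.US keeps '.' as the decimal separator regardless of the JVM default.
            String text = String.format(Locale.US, "%,.2f", v);
            System.out.println(text + " -> x=" + decimalAlignedX(text, 170, width));
        }
    }
}
```

The returned value is an absolute x position. Since newLineAtOffset moves relative to the start of the current line, you would either subtract the current x or start a new text block for each column.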
I am exporting a PDF in my program, and I want to create a table with Apache PDFBox that takes up about 50-60% of the page width.
However, I didn't manage to find anything about centering the rows or the table itself.
I found out how to align text within a row or cell, but if I create a row that does not use the full width of the page, it is always left-aligned, and I don't know how to center it, since neither the row nor the table has a setAlign method.
I'm using Boxable on top of it (https://github.com/dhorions/boxable).
public void Test() throws IOException {
//Set margins
float margin = 10;
//Initialize Document
PDDocument doc = new PDDocument();
PDPage page = addNewPage(doc);
//Initialize table
float tableWidth = page.getMediaBox().getWidth() - (2 * margin);
float yStartNewPage = page.getMediaBox().getHeight() - (2 * margin);
boolean drawContent = true;
boolean drawLines = true;
float yStart = yStartNewPage;
float bottomMargin = 70;
BaseTable table = new BaseTable(yStart, yStartNewPage, bottomMargin, tableWidth, margin, doc, page, drawLines,
drawContent);
// set default line spacing for entire table
table.setLineSpacing(1.5f);
Row<PDPage> row = table.createRow(10);
// set single spacing for entire row
row.setLineSpacing(1f);
// my first 3x wider cell
Cell<PDPage> cell = row.createCell((3*100/15f), "1",
HorizontalAlignment.get("center"), VerticalAlignment.get("top"));
cell.setFontSize(6);
// my other 12 equal cells
for(int i=2; i<14; i++){
Cell<PDPage> cell2 = row.createCell((100/15f), String.valueOf(i),
HorizontalAlignment.get("center"), VerticalAlignment.get("top"));
cell2.setFontSize(6);
}
table.draw();
//Save the document
File file = new File("target/test.pdf");
System.out.println("Sample file saved at : " + file.getAbsolutePath());
Files.createParentDirs(file);
doc.save(file);
doc.close();
}
Adjust set position and table width:
public void Test() throws IOException {
//Set margins
float margin = 10;
//Initialize Document
PDDocument doc = new PDDocument();
PDPage page = addNewPage(doc);
//Initialize table
float tableWidth = page.getMediaBox().getWidth() - (2 * margin);
float yStartNewPage = page.getMediaBox().getHeight() - (2 * margin);
boolean drawContent = true;
boolean drawLines = true;
float yStart = yStartNewPage;
float bottomMargin = 70;
BaseTable table = new BaseTable(yStart, yStartNewPage, bottomMargin, tableWidth, margin, doc, page, drawLines, drawContent);
// set default line spacing for entire table
table.setLineSpacing(1.5f);
Row<PDPage> row = table.createRow(10);
// set single spacing for entire row
row.setLineSpacing(1f);
// my first 3x wider cell
Cell<PDPage> cell = row.createCell((3*100/15f), "1",
HorizontalAlignment.get("center"), VerticalAlignment.get("top"));
cell.setFontSize(6);
// my other 12 equal cells
for(int i=2; i<14; i++){
Cell<PDPage> cell2 = row.createCell((100/15f), String.valueOf(i),
HorizontalAlignment.get("center"), VerticalAlignment.get("top"));
cell2.setFontSize(6);
}
table.draw();
//Save the document
File file = new File("target/test.pdf");
System.out.println("Sample file saved at : " + file.getAbsolutePath());
Files.createParentDirs(file);
doc.save(file);
doc.close();
}
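To make "adjust position and table width" concrete, here is a rough sketch of the arithmetic for a centered table that covers a fraction of the page. The class and method names are hypothetical. It assumes that the margin argument passed to the BaseTable constructor above acts as the left offset of the table, which you should verify against the Boxable version you use.

```java
final class CenteredTable {
    /** Width of a table that spans the given fraction (0..1) of the page width. */
    static float tableWidth(float pageWidth, float fraction) {
        return pageWidth * fraction;
    }

    /** Left offset that leaves equal free space on both sides of the table. */
    static float leftMargin(float pageWidth, float tableWidth) {
        return (pageWidth - tableWidth) / 2f;
    }

    public static void main(String[] args) {
        // Example page width in points; in the code above this would be
        // page.getMediaBox().getWidth().
        float pageWidth = 612f;
        float width = tableWidth(pageWidth, 0.6f);
        float margin = leftMargin(pageWidth, width);
        System.out.println("tableWidth=" + width + ", margin=" + margin);
    }
}
```

Under that assumption, the two results would replace the tableWidth and margin values passed to the BaseTable constructor.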
I have code written using the PDFBox API that highlights words in a PDF, but when I convert the highlighted PDF pages to images, everything I highlighted disappears from the image.
Below screenshot is with highlighted text, for highlighting I have used PDFBox's PDAnnotationTextMarkup class:
Highlighted PDF Page
Below is the image after converting the pdf page to image:
Highlighted PDF Page Image after converting
Below is the code I have used for converting PDF to Image:
PDDocument document = PDDocument.load(new File(pdfFilename));
PDFRenderer pdfRenderer = new PDFRenderer(document);
int pageCounter = 0;
for (PDPage page : document.getPages())
{
BufferedImage bim = pdfRenderer.renderImageWithDPI(pageCounter, 300, ImageType.RGB);
ImageIOUtil.writeImage(bim, pdfFilename + "-" + (pageCounter++) + ".png", 300);
}
document.close();
Please suggest what is wrong here: why is PDFRenderer not able to render the PDF page image along with the highlighted red box?
Below is the code I used to highlight the text in PDF:
private void highlightText(String pdfFilePath, String highlightedPdfFilePath) {
try {
// Loading an existing document
File file = new File(highlightedPdfFilePath);
if (!file.exists()) {
file = new File(pdfFilePath);
}
PDDocument document = PDDocument.load(file);
// extended PDFTextStripper class
PDFTextStripper stripper = new PDFTextHighlighter();
// Get number of pages
int number_of_pages = document.getDocumentCatalog().getPages().getCount();
// The method writeText will invoke an override version of
// writeString
Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
stripper.writeText(document, dummy);
// Print collected information
System.out.println("tokenStream:::"+tokenStream);
System.out.println("tokenStream size::"+tokenStream.size());
System.out.println("coordinates size::"+coordinates.size());
double page_height;
double page_width;
double width, height, minx, maxx, miny, maxy;
int rotation;
// scan each page and highlight all the words inside them
for (int page_index = 0; page_index < number_of_pages; page_index++) {
// get current page
PDPage page = document.getPage(page_index);
// Get annotations for the selected page
List<PDAnnotation> annotations = page.getAnnotations();
// Define a color to use for highlighting text
PDColor red = new PDColor(new float[] { 1, 0, 0 }, PDDeviceRGB.INSTANCE);
// Page height and width
page_height = page.getMediaBox().getHeight();
page_width = page.getMediaBox().getWidth();
// Scan collected coordinates
for (int i = 0; i < coordinates.size(); i++) {
if (!differencePgaeNumber.contains(page_index)) {
differencePgaeNumber.add(page_index);
}
// if the current coordinates are not related to the current
// page, ignore them
if ((int) coordinates.get(i)[4] != (page_index + 1))
continue;
else {
// get rotation of the page...portrait..landscape..
rotation = (int) coordinates.get(i)[7];
// page rotated of 90degrees
if (rotation == 90) {
height = coordinates.get(i)[5];
width = coordinates.get(i)[6];
width = (page_height * width) / page_width;
// define coordinates of a rectangle
maxx = coordinates.get(i)[1];
minx = coordinates.get(i)[1] - height;
miny = coordinates.get(i)[0];
maxy = coordinates.get(i)[0] + width;
} else // i should add here the cases -90/-180 degrees
{
height = coordinates.get(i)[5];
minx = coordinates.get(i)[0];
maxx = coordinates.get(i)[2];
miny = page_height - coordinates.get(i)[1];
maxy = page_height - coordinates.get(i)[3] + height;
}
// Add an annotation for each scanned word
PDAnnotationTextMarkup txtMark = new PDAnnotationTextMarkup(
PDAnnotationTextMarkup.SUB_TYPE_HIGHLIGHT);
txtMark.setColor(red);
txtMark.setConstantOpacity((float) 0.3); // 30%
// transparent
PDRectangle position = new PDRectangle();
position.setLowerLeftX((float) minx);
position.setLowerLeftY((float) miny);
position.setUpperRightX((float) maxx);
position.setUpperRightY((float) ((float) maxy + height));
txtMark.setRectangle(position);
float[] quads = new float[8];
quads[0] = position.getLowerLeftX(); // x1
quads[1] = position.getUpperRightY() - 2; // y1
quads[2] = position.getUpperRightX(); // x2
quads[3] = quads[1]; // y2
quads[4] = quads[0]; // x3
quads[5] = position.getLowerLeftY() - 2; // y3
quads[6] = quads[2]; // x4
quads[7] = quads[5]; // y4
txtMark.setQuadPoints(quads);
txtMark.setContents(tokenStream.get(i).toString());
annotations.add(txtMark);
}
}
}
// Saving the document in a new file
File highlighted_doc = new File(highlightedPdfFilePath);
document.save(highlighted_doc);
document.close();
} catch (IOException e) {
System.out.println(e);
}
}
You need to construct the visual appearance of the annotation with this call:
txtMark.constructAppearances(document);
I am using PDFBox to generate PDF files in Java. The problem is that when I add long text content to the document, it is not displayed properly: only part of it is shown, and on a single line.
I want the text to span multiple lines.
My code is given below:
PDPageContentStream pdfContent=new PDPageContentStream(pdfDocument, pdfPage, true, true);
pdfContent.beginText();
pdfContent.setFont(pdfFont, 11);
pdfContent.moveTextPositionByAmount(30,750);
pdfContent.drawString("I am trying to create a PDF file with a lot of text contents in the document. I am using PDFBox");
pdfContent.endText();
My output:
Adding to the answer of Mark you might want to know where to split your long string. You can use the PDFont method getStringWidth for that.
Putting everything together you get something like this (with minor differences depending on the PDFBox version):
PDFBox 1.8.x
PDDocument doc = null;
try
{
doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
PDPageContentStream contentStream = new PDPageContentStream(doc, page);
PDFont pdfFont = PDType1Font.HELVETICA;
float fontSize = 25;
float leading = 1.5f * fontSize;
PDRectangle mediabox = page.getMediaBox();
float margin = 72;
float width = mediabox.getWidth() - 2*margin;
float startX = mediabox.getLowerLeftX() + margin;
float startY = mediabox.getUpperRightY() - margin;
String text = "I am trying to create a PDF file with a lot of text contents in the document. I am using PDFBox";
List<String> lines = new ArrayList<String>();
int lastSpace = -1;
while (text.length() > 0)
{
int spaceIndex = text.indexOf(' ', lastSpace + 1);
if (spaceIndex < 0)
spaceIndex = text.length();
String subString = text.substring(0, spaceIndex);
float size = fontSize * pdfFont.getStringWidth(subString) / 1000;
System.out.printf("'%s' - %f of %f\n", subString, size, width);
if (size > width)
{
if (lastSpace < 0)
lastSpace = spaceIndex;
subString = text.substring(0, lastSpace);
lines.add(subString);
text = text.substring(lastSpace).trim();
System.out.printf("'%s' is line\n", subString);
lastSpace = -1;
}
else if (spaceIndex == text.length())
{
lines.add(text);
System.out.printf("'%s' is line\n", text);
text = "";
}
else
{
lastSpace = spaceIndex;
}
}
contentStream.beginText();
contentStream.setFont(pdfFont, fontSize);
contentStream.moveTextPositionByAmount(startX, startY);
for (String line: lines)
{
contentStream.drawString(line);
contentStream.moveTextPositionByAmount(0, -leading);
}
contentStream.endText();
contentStream.close();
doc.save("break-long-string.pdf");
}
finally
{
if (doc != null)
{
doc.close();
}
}
(BreakLongString.java test testBreakString for PDFBox 1.8.x)
PDFBox 2.0.x
PDDocument doc = null;
try
{
doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
PDPageContentStream contentStream = new PDPageContentStream(doc, page);
PDFont pdfFont = PDType1Font.HELVETICA;
float fontSize = 25;
float leading = 1.5f * fontSize;
PDRectangle mediabox = page.getMediaBox();
float margin = 72;
float width = mediabox.getWidth() - 2*margin;
float startX = mediabox.getLowerLeftX() + margin;
float startY = mediabox.getUpperRightY() - margin;
String text = "I am trying to create a PDF file with a lot of text contents in the document. I am using PDFBox";
List<String> lines = new ArrayList<String>();
int lastSpace = -1;
while (text.length() > 0)
{
int spaceIndex = text.indexOf(' ', lastSpace + 1);
if (spaceIndex < 0)
spaceIndex = text.length();
String subString = text.substring(0, spaceIndex);
float size = fontSize * pdfFont.getStringWidth(subString) / 1000;
System.out.printf("'%s' - %f of %f\n", subString, size, width);
if (size > width)
{
if (lastSpace < 0)
lastSpace = spaceIndex;
subString = text.substring(0, lastSpace);
lines.add(subString);
text = text.substring(lastSpace).trim();
System.out.printf("'%s' is line\n", subString);
lastSpace = -1;
}
else if (spaceIndex == text.length())
{
lines.add(text);
System.out.printf("'%s' is line\n", text);
text = "";
}
else
{
lastSpace = spaceIndex;
}
}
contentStream.beginText();
contentStream.setFont(pdfFont, fontSize);
contentStream.newLineAtOffset(startX, startY);
for (String line: lines)
{
contentStream.showText(line);
contentStream.newLineAtOffset(0, -leading);
}
contentStream.endText();
contentStream.close();
doc.save(new File(RESULT_FOLDER, "break-long-string.pdf"));
}
finally
{
if (doc != null)
{
doc.close();
}
}
(BreakLongString.java test testBreakString for PDFBox 2.0.x)
The result
This looks as expected.
Of course there are numerous improvements to make but this should show how to do it.
Adding unconditional line breaks
In a comment aleskv asked:
could you add line breaks when there are \n in the string?
One can easily extend the solution to unconditionally break at newline characters by first splitting the string at '\n' characters and then iterating over the split result.
E.g. if instead of the long string from above
String text = "I am trying to create a PDF file with a lot of text contents in the document. I am using PDFBox";
you want to process this even longer string with embedded new line characters
String textNL = "I am trying to create a PDF file with a lot of text contents in the document. I am using PDFBox.\nFurthermore, I have added some newline characters to the string at which lines also shall be broken.\nIt should work alright like this...";
you can simply replace
String text = "I am trying to create a PDF file with a lot of text contents in the document. I am using PDFBox";
List<String> lines = new ArrayList<String>();
int lastSpace = -1;
while (text.length() > 0)
{
[...]
}
in the solutions above by
String textNL = "I am trying to create a PDF file with a lot of text contents in the document. I am using PDFBox.\nFurthermore, I have added some newline characters to the string at which lines also shall be broken.\nIt should work alright like this...";
List<String> lines = new ArrayList<String>();
for (String text : textNL.split("\n"))
{
int lastSpace = -1;
while (text.length() > 0)
{
[...]
}
}
(from BreakLongString.java test testBreakStringNL)
The result:
I know it's a bit late, but I had a small problem with mkl's solution: if the last line would contain only one word, the algorithm writes that word on the previous line.
For example, take the text "Lorem ipsum dolor sit amet", which should get a line break after "sit":
Lorem ipsum dolor sit
amet
But it does this:
Lorem ipsum dolor sit amet
I came up with my own solution, which I want to share with you.
/**
* @param text The text to write on the page.
* @param x The position on the x-axis.
* @param y The position on the y-axis.
* @param allowedWidth The maximum allowed width of the whole text (e.g. the width of the page - a defined margin).
* @param page The page for the text.
* @param contentStream The content stream to set the text properties and write the text.
* @param font The font used to write the text.
* @param fontSize The font size used to write the text.
* @param lineHeight The line height of the font (typically 1.2 * fontSize or 1.5 * fontSize).
* @throws IOException
*/
private void drawMultiLineText(String text, int x, int y, int allowedWidth, PDPage page, PDPageContentStream contentStream, PDFont font, int fontSize, int lineHeight) throws IOException {
List<String> lines = new ArrayList<String>();
String myLine = "";
// get all words from the text
// keep in mind that words are separated by spaces -> "Lorem ipsum!!!!:)" -> words are "Lorem" and "ipsum!!!!:)"
String[] words = text.split(" ");
for(String word : words) {
if(!myLine.isEmpty()) {
myLine += " ";
}
// test the width of the current line + the current word
int size = (int) (fontSize * font.getStringWidth(myLine + word) / 1000);
if(size > allowedWidth) {
// if the line would be too long with the current word, add the line without the current word
lines.add(myLine);
// and start a new line with the current word
myLine = word;
} else {
// if the current line + the current word would fit, add the current word to the line
myLine += word;
}
}
// add the rest to lines
lines.add(myLine);
for(String line : lines) {
contentStream.beginText();
contentStream.setFont(font, fontSize);
contentStream.moveTextPositionByAmount(x, y);
contentStream.drawString(line);
contentStream.endText();
y -= lineHeight;
}
}
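The greedy word-by-word idea can also be tried out without PDFBox. The sketch below is not the code from either answer: it is a hypothetical helper that breaks at every '\n' first and then wraps each paragraph word by word. The width function is an assumption standing in for fontSize * font.getStringWidth(s) / 1000.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

final class LineWrap {
    /** Splits text into lines no wider than maxWidth, measured by the given width function. */
    static List<String> wrap(String text, double maxWidth, ToDoubleFunction<String> width) {
        List<String> lines = new ArrayList<>();
        // Hard breaks first: every '\n' always starts a new line.
        for (String paragraph : text.split("\n", -1)) {
            String line = "";
            for (String word : paragraph.trim().split(" +")) {
                String candidate = line.isEmpty() ? word : line + " " + word;
                if (!line.isEmpty() && width.applyAsDouble(candidate) > maxWidth) {
                    // The word does not fit: close the current line and start a new one.
                    lines.add(line);
                    line = word;
                } else {
                    // The word fits, or it is the first word and has to be placed anyway.
                    line = candidate;
                }
            }
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in measurer (assumption): one unit per character.
        for (String line : wrap("Lorem ipsum dolor sit amet", 21, String::length)) {
            System.out.println(line);
        }
    }
}
```

A single word wider than maxWidth is kept on its own line rather than split, and lines never carry a trailing space.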
// For PDFBox 2.0.x
// Adds new pages dynamically according to the length of the content
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
public class Document_Creation {
public static void main (String args[]) throws IOException {
PDDocument doc = null;
try
{
doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
PDPageContentStream contentStream = new PDPageContentStream(doc, page);
PDFont pdfFont = PDType1Font.HELVETICA;
float fontSize = 25;
float leading = 1.5f * fontSize;
PDRectangle mediabox = page.getMediaBox();
float margin = 72;
float width = mediabox.getWidth() - 2*margin;
float startX = mediabox.getLowerLeftX() + margin;
float startY = mediabox.getUpperRightY() - margin;
String text = "I am trying to create a PDF file with a lot of text contents in the document. I am using PDFBox.An essay is, generally, a piece of writing that gives the author's own argument — but the definition is vague, overlapping with those of an article, a pamphlet, and a short story. Essays have traditionally been sub-classified as formal and informal. Formal essays are characterized by serious purpose, dignity, logical organization, length,whereas the informal essay is characterized by the personal element (self-revelation, individual tastes and experiences, confidential manner), humor, graceful style, rambling structure, unconventionality or novelty of theme.Lastly, one of the most attractive features of cats as housepets is their ease of care. Cats do not have to be walked. They get plenty of exercise in the house as they play, and they do their business in the litter box. Cleaning a litter box is a quick, painless procedure. Cats also take care of their own grooming. Bathing a cat is almost never necessary because under ordinary circumstances cats clean themselves. Cats are more particular about personal cleanliness than people are. In addition, cats can be left home alone for a few hours without fear. Unlike some pets, most cats will not destroy the furnishings when left alone. They are content to go about their usual activities until their owners return.";
List<String> lines = new ArrayList<String>();
int lastSpace = -1;
while (text.length() > 0)
{
int spaceIndex = text.indexOf(' ', lastSpace + 1);
if (spaceIndex < 0)
spaceIndex = text.length();
String subString = text.substring(0, spaceIndex);
float size = fontSize * pdfFont.getStringWidth(subString) / 1000;
System.out.printf("'%s' - %f of %f\n", subString, size, width);
if (size > width)
{
if (lastSpace < 0)
lastSpace = spaceIndex;
subString = text.substring(0, lastSpace);
lines.add(subString);
text = text.substring(lastSpace).trim();
System.out.printf("'%s' is line\n", subString);
lastSpace = -1;
}
else if (spaceIndex == text.length())
{
lines.add(text);
System.out.printf("'%s' is line\n", text);
text = "";
}
else
{
lastSpace = spaceIndex;
}
}
contentStream.beginText();
contentStream.setFont(pdfFont, fontSize);
contentStream.newLineAtOffset(startX, startY);
float currentY=startY;
for (String line: lines)
{
currentY -=leading;
if(currentY<=margin)
{
contentStream.endText();
contentStream.close();
PDPage new_Page = new PDPage();
doc.addPage(new_Page);
contentStream = new PDPageContentStream(doc, new_Page);
contentStream.beginText();
contentStream.setFont(pdfFont, fontSize);
contentStream.newLineAtOffset(startX, startY);
currentY=startY;
}
contentStream.showText(line);
contentStream.newLineAtOffset(0, -leading);
}
contentStream.endText();
contentStream.close();
doc.save("C:/Users/VINAYAK/Desktop/docccc/break-long-string.pdf");
}
finally
{
if (doc != null)
{
doc.close();
}
}
}
}
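The page-break decision can be separated from the drawing code. The following sketch groups already wrapped lines into pages. It is a simplified variant of the loop above, not a copy of it: it checks the baseline of each line before drawing. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class Paginate {
    /**
     * Groups lines into pages. The first line of each page sits at startY, each
     * following line is one leading lower, and no line is placed below bottomMargin.
     */
    static List<List<String>> paginate(List<String> lines, float startY,
                                       float bottomMargin, float leading) {
        List<List<String>> pages = new ArrayList<>();
        List<String> current = new ArrayList<>();
        float y = startY;
        for (String line : lines) {
            // Start a new page when the next baseline would fall below the margin.
            // The isEmpty check guarantees at least one line per page.
            if (y < bottomMargin && !current.isEmpty()) {
                pages.add(current);
                current = new ArrayList<>();
                y = startY;
            }
            current.add(line);
            y -= leading;
        }
        if (!current.isEmpty()) {
            pages.add(current);
        }
        return pages;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= 40; i++) {
            lines.add("line " + i);
        }
        // Values taken from the example above: startY = 792 - 72, margin = 72, leading = 37.5
        System.out.println(paginate(lines, 720f, 72f, 37.5f).size() + " pages");
    }
}
```

Each inner list would then be drawn with its own PDPage and PDPageContentStream, as in the loop above.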
Just draw the string in a position below, typically done within a loop:
float textx = margin+cellMargin;
float texty = y-15;
for(int i = 0; i < content.length; i++){
for(int j = 0 ; j < content[i].length; j++){
String text = content[i][j];
contentStream.beginText();
contentStream.moveTextPositionByAmount(textx,texty);
contentStream.drawString(text);
contentStream.endText();
textx += colWidth;
}
texty-=rowHeight;
textx = margin+cellMargin;
}
These are the important lines:
contentStream.beginText();
contentStream.moveTextPositionByAmount(textx,texty);
contentStream.drawString(text);
contentStream.endText();
Just keep drawing new strings in new positions. For an example using a table, see here:
http://fahdshariff.blogspot.ca/2010/10/creating-tables-with-pdfbox.html
contentStream.moveTextPositionByAmount(textx,texty) is the key point.
For example, an A4 page is approximately 595 x 842 points (width x height), so you have to position your text relative to the size of your page.
PDFBox supports various page formats, so the height and width vary with the page format.
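Page dimensions in PDF are expressed in points, with 72 points to the inch by default. As a small sketch, the conversion from millimetres can be written as below. The class and method names are hypothetical. In practice you would normally read the size from page.getMediaBox(), or use a predefined constant such as PDRectangle.A4 in PDFBox 2.0.x.

```java
final class PageUnits {
    static final float POINTS_PER_INCH = 72f;
    static final float MM_PER_INCH = 25.4f;

    /** Converts a length in millimetres to PDF points (1/72 inch). */
    static float mmToPoints(float mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    public static void main(String[] args) {
        // A4 is 210 mm x 297 mm.
        System.out.println("A4 width in points:  " + mmToPoints(210f));
        System.out.println("A4 height in points: " + mmToPoints(297f));
    }
}
```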
Pdfbox-layout abstracts out all the tedious details of managing the layout. As a complete Kotlin example, here is how to convert a text file to a pdf without worrying about line wrapping and pagination.
import org.apache.pdfbox.pdmodel.font.PDType1Font
import rst.pdfbox.layout.elements.Document
import rst.pdfbox.layout.elements.Paragraph
import java.io.File
fun main() {
val textFile = "input.txt"
val pdfFile = "output.pdf"
val font = PDType1Font.COURIER
val fontSize = 12f
val document = Document(40f, 50f, 40f, 60f)
val paragraph = Paragraph()
File(textFile).forEachLine {
paragraph.addText("$it\n", fontSize, font)
}
document.add(paragraph)
document.save(File(pdfFile))
}
I'm trying to extract text with coordinates from a PDF file using PDFBox.
I combined some methods and information found on the internet (Stack Overflow too), but the problem is that the coordinates don't seem to be right. When I try to use the coordinates to draw a rectangle on top of the text, for example, the rectangle is painted elsewhere.
This is my code (please don't judge the style; it was written very quickly just to test):
TextLine.java
import java.util.List;
import org.apache.pdfbox.text.TextPosition;
/**
*
* @author samue
*/
public class TextLine {
public List<TextPosition> textPositions = null;
public String text = "";
}
myStripper.java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;
/**
*
* @author samue
*/
public class myStripper extends PDFTextStripper {
public myStripper() throws IOException
{
}
@Override
protected void startPage(PDPage page) throws IOException
{
startOfLine = true;
super.startPage(page);
}
@Override
protected void writeLineSeparator() throws IOException
{
startOfLine = true;
super.writeLineSeparator();
}
@Override
public String getText(PDDocument doc) throws IOException
{
lines = new ArrayList<TextLine>();
return super.getText(doc);
}
@Override
protected void writeWordSeparator() throws IOException
{
TextLine tmpline = null;
tmpline = lines.get(lines.size() - 1);
tmpline.text += getWordSeparator();
super.writeWordSeparator();
}
@Override
protected void writeString(String text, List<TextPosition> textPositions) throws IOException
{
TextLine tmpline = null;
if (startOfLine) {
tmpline = new TextLine();
tmpline.text = text;
tmpline.textPositions = textPositions;
lines.add(tmpline);
} else {
tmpline = lines.get(lines.size() - 1);
tmpline.text += text;
tmpline.textPositions.addAll(textPositions);
}
if (startOfLine)
{
startOfLine = false;
}
super.writeString(text, textPositions);
}
boolean startOfLine = true;
public ArrayList<TextLine> lines = null;
}
click event on AWT button
private void jButton1MouseClicked(java.awt.event.MouseEvent evt) {
// TODO add your handling code here:
try {
File file = new File("C:\\Users\\samue\\Desktop\\mwb_I_201711.pdf");
PDDocument doc = PDDocument.load(file);
myStripper stripper = new myStripper();
stripper.setStartPage(1); // fix it to first page just to test it
stripper.setEndPage(1);
stripper.getText(doc);
TextLine line = stripper.lines.get(1); // the line i want to paint on
float minx = -1;
float maxx = -1;
for (TextPosition pos: line.textPositions)
{
if (pos == null)
continue;
if (minx == -1 || pos.getTextMatrix().getTranslateX() < minx) {
minx = pos.getTextMatrix().getTranslateX();
}
if (maxx == -1 || pos.getTextMatrix().getTranslateX() > maxx) {
maxx = pos.getTextMatrix().getTranslateX();
}
}
TextPosition firstPosition = line.textPositions.get(0);
TextPosition lastPosition = line.textPositions.get(line.textPositions.size() - 1);
float x = minx;
float y = firstPosition.getTextMatrix().getTranslateY();
float w = (maxx - minx) + lastPosition.getWidth();
float h = lastPosition.getHeightDir();
PDPageContentStream contentStream = new PDPageContentStream(doc, doc.getPage(0), PDPageContentStream.AppendMode.APPEND, false);
contentStream.setNonStrokingColor(Color.RED);
contentStream.addRect(x, y, w, h);
contentStream.fill();
contentStream.close();
File fileout = new File("C:\\Users\\samue\\Desktop\\pdfbox.pdf");
doc.save(fileout);
doc.close();
} catch (Exception ex) {
ex.printStackTrace(); // do not swallow errors silently while testing
}
}
Any suggestions? What am I doing wrong?
This is just another case of the excessive PDFTextStripper coordinate normalization. Just like you, I had thought that by using TextPosition.getTextMatrix() (instead of getX() and getY()) one would get the actual coordinates, but no, even these matrix values have to be corrected (at least in PDFBox 2.0.x; I haven't checked 1.8.x) because the matrix is multiplied by a translation that makes the lower left corner of the crop box the origin.
Thus, in your case (in which the lower left of the crop box is not the origin), you have to correct the values, e.g. by replacing
float x = minx;
float y = firstPosition.getTextMatrix().getTranslateY();
by
PDRectangle cropBox = doc.getPage(0).getCropBox();
float x = minx + cropBox.getLowerLeftX();
float y = firstPosition.getTextMatrix().getTranslateY() + cropBox.getLowerLeftY();
Instead of
you now get
Obviously, though, you will also have to correct the height somewhat. This is due to the way the PDFTextStripper determines the text height:
// 1/2 the bbox is used as the height todo: why?
float glyphHeight = bbox.getHeight() / 2;
(from showGlyph(...) in LegacyPDFStreamEngine, the parent class of PDFTextStripper)
While the font bounding box indeed usually is too large, half of it often is not enough.
The following code worked for me:
// Definition of font baseline, ascent, descent: https://en.wikipedia.org/wiki/Ascender_(typography)
//
// The origin of the text coordinate system is the top-left corner where Y increases downward.
// TextPosition.getX(), getY() return the baseline.
TextPosition firstLetter = textPositions.get(0);
TextPosition lastLetter = textPositions.get(textPositions.size() - 1);
// Looking at LegacyPDFStreamEngine.showGlyph(), ascender and descender heights are calculated like
// CapHeight: https://stackoverflow.com/a/42021225/14731
float ascent = firstLetter.getFont().getFontDescriptor().getAscent() / 1000 * lastLetter.getFontSize();
Point topLeft = new Point(firstLetter.getX(), firstLetter.getY() - ascent);
float descent = lastLetter.getFont().getFontDescriptor().getDescent() / 1000 * lastLetter.getFontSize();
// Descent is negative, so we need to negate it to move downward.
Point bottomRight = new Point(lastLetter.getX() + lastLetter.getWidth(),
lastLetter.getY() - descent);
In other words, we are creating a bounding box from the font's ascender down to its descender.
If you want to render these coordinates with the origin in the bottom-left corner, see https://stackoverflow.com/a/28114320/14731 for more details. You'll need to apply a transform like this:
contents.transform(new Matrix(1, 0, 0, -1, 0, page.getHeight()));
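To see what that matrix does to a single point, the same six coefficients can be fed to java.awt.geom.AffineTransform from the JDK. This is only an illustration of the arithmetic and does not use the PDFBox Matrix class. The class and method names are hypothetical.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

final class FlipY {
    /**
     * Maps a point from a top-left origin (y grows downward) to a bottom-left
     * origin (y grows upward) for a page of the given height.
     */
    static Point2D flip(double x, double y, double pageHeight) {
        // Coefficients (1, 0, 0, -1, 0, pageHeight): x' = x, y' = pageHeight - y
        AffineTransform t = new AffineTransform(1, 0, 0, -1, 0, pageHeight);
        return t.transform(new Point2D.Double(x, y), null);
    }

    public static void main(String[] args) {
        // A point 100 units below the top edge of a 792 unit high page.
        Point2D p = flip(10, 100, 792);
        System.out.println(p.getX() + ", " + p.getY());
    }
}
```

Applying the flip twice returns the original point, so the same transform converts in both directions.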
I am using the Apache PDFBox Java library to create PDFs. Is there a way to create a data table using PDFBox? If there is no such API, I would need to draw the table manually using drawLine etc. Any suggestions on how to go about this?
Source: Creating tables with PDFBox
The following method draws a table with the specified table content. It's a bit of a hack and will only work for small strings of text. It does not perform word wrapping, but you can get an idea of how it is done. Give it a go!
/**
* @param page the page whose media box determines the table width
* @param contentStream the content stream to draw into
* @param y the y-coordinate of the first row
* @param margin the padding on left and right of table
* @param content a 2d array containing the table data
* @throws IOException
*/
public static void drawTable(PDPage page, PDPageContentStream contentStream,
float y, float margin,
String[][] content) throws IOException {
final int rows = content.length;
final int cols = content[0].length;
final float rowHeight = 20f;
final float tableWidth = page.findMediaBox().getWidth() - margin - margin;
final float tableHeight = rowHeight * rows;
final float colWidth = tableWidth/(float)cols;
final float cellMargin=5f;
//draw the rows
float nexty = y ;
for (int i = 0; i <= rows; i++) {
contentStream.drawLine(margin, nexty, margin+tableWidth, nexty);
nexty-= rowHeight;
}
//draw the columns
float nextx = margin;
for (int i = 0; i <= cols; i++) {
contentStream.drawLine(nextx, y, nextx, y-tableHeight);
nextx += colWidth;
}
//now add the text
contentStream.setFont( PDType1Font.HELVETICA_BOLD , 12 );
float textx = margin+cellMargin;
float texty = y-15;
for(int i = 0; i < content.length; i++){
for(int j = 0 ; j < content[i].length; j++){
String text = content[i][j];
contentStream.beginText();
contentStream.moveTextPositionByAmount(textx,texty);
contentStream.drawString(text);
contentStream.endText();
textx += colWidth;
}
texty-=rowHeight;
textx = margin+cellMargin;
}
}
Usage:
PDDocument doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage( page );
PDPageContentStream contentStream = new PDPageContentStream(doc, page);
String[][] content = {{"a","b", "1"},
{"c","d", "2"},
{"e","f", "3"},
{"g","h", "4"},
{"i","j", "5"}} ;
drawTable(page, contentStream, 700, 100, content);
contentStream.close();
doc.save("test.pdf" );
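Since the method above does not wrap text, here is a minimal sketch of greedy word wrapping that could be run on each cell before drawing. The class CellTextWrapper, the method wrap, and the widthOf measurer are hypothetical names for illustration. The measurer in main is a stand-in that pretends every character is 6 units wide; with PDFBox you would presumably measure with font.getStringWidth(s) / 1000 * fontSize instead (that call throws IOException, so it needs wrapping).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

class CellTextWrapper {
    /**
     * Greedily splits text into lines no wider than maxWidth.
     * A single word wider than maxWidth is kept on its own line, not split.
     */
    public static List<String> wrap(String text, double maxWidth,
                                    ToDoubleFunction<String> widthOf) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only for blank input
            }
            String candidate = current.length() == 0 ? word : current + " " + word;
            if (current.length() > 0 && widthOf.applyAsDouble(candidate) > maxWidth) {
                // The word does not fit: close the current line and start a new one.
                lines.add(current.toString());
                current = new StringBuilder(word);
            } else {
                current = new StringBuilder(candidate);
            }
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in measurer (an assumption): every character is 6 units wide.
        ToDoubleFunction<String> widthOf = s -> s.length() * 6.0;
        for (String line : wrap("a fairly long cell description", 60.0, widthOf)) {
            System.out.println(line);
        }
    }
}
```

In drawTable you would pass colWidth - 2 * cellMargin as maxWidth, draw each returned line on its own baseline, and make rowHeight depend on the largest line count in the row.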
I created a small API for creating tables using PDFBox.
It can be found on GitHub (https://github.com/dhorions/boxable).
A sample of a generated PDF can be found here: http://goo.gl/a7QvRM
Any hints or suggestions are welcome.
Since I had the same problem some time ago I started to build a small library for it which I am also trying to keep up to date.
It uses Apache PDFBox 2.x and can be found here:
https://github.com/vandeseer/easytable
It allows for quite some customizations like setting the font, background color, padding etc. on the cell level, vertical and horizontal alignment, cell spanning, word wrapping and images in cells.
Drawing tables across several pages is also possible.
You can create tables like this for instance:
The code for this example can be found here; other examples are in the same folder as well.
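The accepted answer is nice, but it works with Apache PDFBox 1.x only. For Apache PDFBox 2.x you need to modify the code slightly: findMediaBox() becomes getMediaBox(), drawLine(...) becomes moveTo(...), lineTo(...) and stroke(), moveTextPositionByAmount(...) becomes newLineAtOffset(...), and drawString(...) becomes showText(...).
So here is the same code, made compatible with Apache PDFBox 2.x: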
The method drawTable:
public static void drawTable(PDPage page, PDPageContentStream contentStream,
float y, float margin, String[][] content) throws IOException {
final int rows = content.length;
final int cols = content[0].length;
final float rowHeight = 20.0f;
final float tableWidth = page.getMediaBox().getWidth() - 2.0f * margin;
final float tableHeight = rowHeight * (float) rows;
final float colWidth = tableWidth / (float) cols;
//draw the rows
float nexty = y ;
for (int i = 0; i <= rows; i++) {
contentStream.moveTo(margin, nexty);
contentStream.lineTo(margin + tableWidth, nexty);
contentStream.stroke();
nexty-= rowHeight;
}
//draw the columns
float nextx = margin;
for (int i = 0; i <= cols; i++) {
contentStream.moveTo(nextx, y);
contentStream.lineTo(nextx, y - tableHeight);
contentStream.stroke();
nextx += colWidth;
}
//now add the text
contentStream.setFont(PDType1Font.HELVETICA_BOLD, 12.0f);
final float cellMargin = 5.0f;
float textx = margin + cellMargin;
float texty = y - 15.0f;
for (final String[] aContent : content) {
for (String text : aContent) {
contentStream.beginText();
contentStream.newLineAtOffset(textx, texty);
contentStream.showText(text);
contentStream.endText();
textx += colWidth;
}
texty -= rowHeight;
textx = margin + cellMargin;
}
}
The usage, updated to use the try-with-resources statement so that the resources are closed properly:
try (PDDocument doc = new PDDocument()) {
PDPage page = new PDPage();
doc.addPage(page);
try (PDPageContentStream contentStream = new PDPageContentStream(doc, page)) {
String[][] content = {{"a", "b", "1"},
{"c", "d", "2"},
{"e", "f", "3"},
{"g", "h", "4"},
{"i", "j", "5"}};
drawTable(page, contentStream, 700.0f, 100.0f, content);
}
doc.save("test.pdf");
}