Highlevel interface to PDF-Generation - java

I need to generate PDF-documents with java. I've tried with iTextPDF, and with ApachePDFbox. I have the impression, these are libraries to do all possible things with PDF.
But I only have a few requirements:
create a PDF-document from Scratch
add chapters having a title
adding plain text to a chapter
adding text from HTML to a chapter
adding an image to the chapter, being scaled to fit if needed
adding a table to the chapter
create a header and a footer
having a background image
There is quite a learning curve to do these things with the libraries mentioned above. So I'm dreaming of a high level API making my life much easier, having these few methods:
createChapter(String title)
addPlainText(Chapter chapter,String text)
addHtml(Chapter chapter,String html)
addImage(Chapter chapter,byte[] bytes) (because it's coming out of the DB)
addTable(Chapter chapter, TableModel table)
addHeader(HeaderFooterModel header)
addFooter(HeaderFooterModel footer)
addBackGroundImage(Chapter chapter,byte[] bytes)
Is there something like that already available? This would be cool and safe some time.

Following class is a great start at having a very high level API for iText.
I implemented most of the methods you requested.
The remainder is left as a challenge to the reader.
package stackoverflow;
import com.itextpdf.html2pdf.HtmlConverter;
import com.itextpdf.io.image.ImageDataFactory;
import com.itextpdf.kernel.color.Color;
import com.itextpdf.kernel.color.DeviceRgb;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.IBlockElement;
import com.itextpdf.layout.element.IElement;
import com.itextpdf.layout.element.Image;
import com.itextpdf.layout.element.Paragraph;
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class ITextForDummiesFacade
{
// iText IO
private PdfDocument pdfDocument;
private Document layoutDocument;
// font sizes
private float regularFontSize = 12f;
private float chapterTitleFontSize = 14f;
// font colors
private Color chapterFontColor = new DeviceRgb(249, 157, 37);
private Color regularFontColor = new DeviceRgb(100, 100, 100);
// structure
private Map<String, Integer> chapterNames = new HashMap<>();
private Map<Integer, List<IElement>> elementsPerChapter = new HashMap<>();
public ITextForDummiesFacade(OutputStream os) throws IOException
{
this.pdfDocument = new PdfDocument(new PdfWriter(os));
this.layoutDocument = new Document(pdfDocument);
}
public ITextForDummiesFacade(File outputFile) throws IOException
{
this.pdfDocument = new PdfDocument(new PdfWriter(outputFile));
this.layoutDocument = new Document(pdfDocument);
}
public boolean createChapter(String title)
{
if(chapterNames.containsKey(title))
return false;
int nextID = chapterNames.size();
chapterNames.put(title, nextID);
elementsPerChapter.put(nextID, new ArrayList<IElement>());
elementsPerChapter.get(nextID).add(new Paragraph(title)
.setFontSize(chapterTitleFontSize)
.setFontColor(chapterFontColor));
return true;
}
public boolean addPlainText(String chapter, String text)
{
if(!chapterNames.containsKey(chapter))
return false;
int ID = chapterNames.get(chapter);
elementsPerChapter.get(ID).add(new Paragraph(text)
.setFontSize(regularFontSize)
.setFontColor(regularFontColor));
return true;
}
public boolean addHTML(String chapter, String HTML)
{
if(!chapterNames.containsKey(chapter))
return false;
int ID = chapterNames.get(chapter);
try
{
elementsPerChapter.get(ID).addAll(HtmlConverter.convertToElements(HTML));
} catch (IOException e)
{
e.printStackTrace();
return false;
}
return true;
}
public boolean addImage(String chapter, byte[] image)
{
if(!chapterNames.containsKey(chapter))
return false;
int ID = chapterNames.get(chapter);
elementsPerChapter.get(ID).add(new Image(ImageDataFactory.create(image)));
return true;
}
private void write()
{
for(int i=0;i<chapterNames.size();i++)
{
for(IElement e : elementsPerChapter.get(i))
if(e instanceof IBlockElement)
layoutDocument.add((IBlockElement) e);
}
}
public void close()
{
write();
layoutDocument.flush();
layoutDocument.close();
}
}
You can then easily call this facade to do the work for you.
File outputFile = new File(System.getProperty("user.home"), "output.pdf");
ITextForDummiesFacade facade = new ITextForDummiesFacade(outputFile);
facade.createChapter("Chapter 1");
facade.addPlainText("Chapter 1","Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.");
facade.close();

Related

Using PdfCleanUpTool or PdfAutoSweep causes some text to change to bold, line weights increase and double points change to hearts

I have weird problem when I try to use iText 7. Some parts of the PDF are modified (text to change to bold, line weights increase and double points change to hearts). In iText version 5.4.4 this didn't happened, but every version since that I have tried cause this same problem (5 or 7).
Does anyone have a clued why this is happening and is there anything I could to do to bypass this problem? Any help would be appreciated!
If more information is needed, I will try to provide it.
Below is simple code that I used to test iText 7.
Example PDF Files
package javaapplication1;
import com.itextpdf.kernel.colors.ColorConstants;
import com.itextpdf.kernel.geom.Rectangle;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.pdfcleanup.PdfCleanUpLocation;
import com.itextpdf.pdfcleanup.PdfCleanUpTool;
import com.itextpdf.pdfcleanup.autosweep.ICleanupStrategy;
import com.itextpdf.pdfcleanup.autosweep.PdfAutoSweep;
import com.itextpdf.pdfcleanup.autosweep.RegexBasedCleanupStrategy;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
public class JavaApplication1 {
public static final String DEST = "D:/TEMP/TEMP/PDF/result/orientation_result.pdf";
public static final String DEST2 = "D:/TEMP/TEMP/PDF/result/orientation_result2.pdf";
public static final String DEST3 = "D:/TEMP/TEMP/PDF/result/orientation_result3.pdf";
public static final String SRC = "D:/TEMP/TEMP/PDF/TEST_PDF.pdf";
public static void main(String[] args) throws IOException {
File file = new File(DEST);
file.getParentFile().mkdirs();
new JavaApplication1().manipulatePdf(DEST);
new JavaApplication1().manipulatePdf2(DEST2);
try (PdfDocument pdf = new PdfDocument(new PdfReader(SRC), new PdfWriter(DEST3))) {
final ICleanupStrategy cleanupStrategy = new RegexBasedCleanupStrategy(Pattern.compile("2019", Pattern.CASE_INSENSITIVE)).setRedactionColor(ColorConstants.PINK);
final PdfAutoSweep autoSweep = new PdfAutoSweep(cleanupStrategy);
autoSweep.cleanUp(pdf);
} catch (Exception e) {
System.out.println(e.toString());
}
}
protected void manipulatePdf(String dest) throws IOException {
PdfDocument pdfDoc = new PdfDocument(new PdfReader(SRC), new PdfWriter(dest));
List<PdfCleanUpLocation> cleanUpLocations = new ArrayList<PdfCleanUpLocation>();
// The arguments of the PdfCleanUpLocation constructor: the number of page to be cleaned up,
// a Rectangle defining the area on the page we want to clean up,
// a color which will be used while filling the cleaned area.
PdfCleanUpLocation location = new PdfCleanUpLocation(1, new Rectangle(97, 405, 383, 40),
ColorConstants.GRAY);
cleanUpLocations.add(location);
PdfCleanUpTool cleaner = new PdfCleanUpTool(pdfDoc, cleanUpLocations);
cleaner.cleanUp();
pdfDoc.close();
}
protected void manipulatePdf2(String dest) throws IOException {
PdfDocument pdfDoc = new PdfDocument(new PdfReader(SRC), new PdfWriter(dest));
// If the second argument is true, then regions to be erased are extracted from the redact annotations
// contained inside the given document. If the second argument is false (that's default behavior),
// then use PdfCleanUpTool.addCleanupLocation(PdfCleanUpLocation)
// method to set regions to be erased from the document.
PdfCleanUpTool cleaner = new PdfCleanUpTool(pdfDoc, true);
cleaner.cleanUp();
pdfDoc.close();
}
}
Sorry for late response. I managed now verify that updating to latest versions corrected this problem. Thanks to #mkl pointing this out.
Problem solved

How to change the graphical attributes of a point in an Excel sunburst chart through Apache POI

I have a requirement to color the different data points in an Excel sunburst chart programatically. By default, Excel creates the chart looking like this.
I need to be able to make something like this.
I've been able to load the chart and series, what I have not been able to work out is how to get to each of the points and change the fill color, and is that even possible.
The data to create this chart is :
Level 1,Level 2,Level 3,Series 1
A,A.a,A.a.1,5
A,A.a,A.a.2,5
A,A.b,A.b.1,5
A,A.b,A.b.2,5
B,B.a,B.a.1,5
B,B.a,B.a.2,5
B,B.b,B.b.1,5
B,B.b,B.b.2,5
C,C.a,C.a.1,5
C,C.a,C.a.2,5
C,C.b,C.b.1,5
C,C.b,C.b.2,5
My code so far
import java.io.IOException;
import java.util.List;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.apache.poi.xddf.usermodel.chart.XDDFChartData;
import org.apache.poi.xssf.usermodel.XSSFChart;
import org.apache.poi.xssf.usermodel.XSSFDrawing;
import org.apache.poi.xssf.usermodel.XSSFSheet;
public class Format {
public static void main(String[] args) {
// TODO Auto-generated method stub
try {
XSSFWorkbook xwb = new XSSFWorkbook("ChartExample.xlsx");
XSSFSheet sheet = xwb.getSheetAt(0);
System.out.println("Loaded sheet is " + sheet.getSheetName());
XSSFDrawing drawing = sheet.getDrawingPatriarch();
List <XSSFChart> charts = drawing.getCharts();
System.out.println("No of Charts " + charts.size());
XSSFChart chart = charts.get(0);
List<XDDFChartData> series = chart.getChartSeries();
System.out.println("No of Data Series " + series.size());
XDDFChartData data = series.get(0);
// How do I now get to the data points and then set the fill color for that point?
xwb.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
How do I now get to the say Point C.b.1 and set it's fill color to red?
Thanks in advance.
If the goal is really to modify an Excel sunburst chart, then getting the sunburst chart XML will only be possible very low level by parsing the XML directly.
You even will not get the sunburst chart using List <XSSFChart> charts = drawing.getCharts();. A sunburst chart is not a XSSFChart. XSSFChart is of type application/vnd.openxmlformats-officedocument.drawingml.chart+xml while sunburst chart is of type application/vnd.ms-office.chartex+xml. This is because the sunburst chart is an extended chart type which is not available in versions of Office Open XML up to year 2007. But those old versions of Office Open XML is what apache poi is developed on.
But we can using at least parts of apache poi and must programming the XSSFChartEx class instead the XSSFChart our own then. Unfortunately also a class XSSFChartExRelation is needed because such an relation class of course also not exists already.
Example:
Excel source:
Code:
import java.io.IOException;
import java.io.OutputStream;
import java.io.FileOutputStream;
import java.io.FileInputStream;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFDrawing;
import org.apache.poi.ooxml.POIXMLDocumentPart;
import org.apache.poi.ooxml.POIXMLRelation;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.xmlbeans.XmlObject;
import org.apache.xmlbeans.XmlCursor;
import javax.xml.namespace.QName;
public class FormatSunBurstChart {
private static void setDataPointColor(XmlObject series, int number, String colorHex) {
XmlCursor cursor = series.newCursor();
cursor.toLastChild();
cursor.beginElement(new QName("http://schemas.microsoft.com/office/drawing/2014/chartex", "dataPt", "cx"));
cursor.insertAttributeWithValue("idx", "" + number);
cursor.beginElement(new QName("http://schemas.microsoft.com/office/drawing/2014/chartex", "spPr", "cx"));
cursor.beginElement(new QName("http://schemas.openxmlformats.org/drawingml/2006/main", "solidFill", "a"));
cursor.beginElement(new QName("http://schemas.openxmlformats.org/drawingml/2006/main", "srgbClr", "a"));
cursor.insertAttributeWithValue("val", colorHex);
cursor.dispose();
}
public static void main(String[] args) {
try {
XSSFWorkbook workbook = new XSSFWorkbook(new FileInputStream("ChartExample.xlsx"));
XSSFSheet sheet = workbook.getSheetAt(0);
System.out.println("Loaded sheet is " + sheet.getSheetName());
XSSFDrawing drawing = sheet.getDrawingPatriarch();
if (drawing != null) {
for (POIXMLDocumentPart dpart : drawing.getRelations()) {
PackagePart ppart = dpart.getPackagePart();
if ("application/vnd.ms-office.chartex+xml".equals(ppart.getContentType())) {
XSSFChartEx xssfChartEx = new XSSFChartEx(ppart);
String rId = drawing.getRelationId(dpart);
drawing.addRelation(
rId,
new XSSFChartExRelation(
"application/vnd.ms-office.chartex+xml",
"http://schemas.microsoft.com/office/2014/relationships/chartEx",
"/xl/charts/chartEx#.xml"),
xssfChartEx
);
XmlObject series = xssfChartEx.getSeries(0);
setDataPointColor(series, 1, "FF0000");
setDataPointColor(series, 2, "FFFF00");
setDataPointColor(series, 3, "00FF00");
setDataPointColor(series, 14, "FFFF00");
setDataPointColor(series, 16, "00FF00");
setDataPointColor(series, 18, "00FF00");
setDataPointColor(series, 19, "FF0000");
setDataPointColor(series, 20, "00FF00");
System.out.println(series);
}
}
}
FileOutputStream out = new FileOutputStream("ChartExampleChanged.xlsx");
workbook.write(out);
workbook.close();
out.close();
} catch (Exception e) {
e.printStackTrace();
}
}
private static class XSSFChartEx extends POIXMLDocumentPart {
private XmlObject chartExXmlObject;
private XSSFChartEx(PackagePart part) throws Exception {
super(part);
chartExXmlObject = XmlObject.Factory.parse(part.getInputStream());
}
private XmlObject getChartExXmlObject() {
return chartExXmlObject;
}
private XmlObject getSeries(int number) {
XmlObject[] result = chartExXmlObject.selectPath(
"declare namespace cx='http://schemas.microsoft.com/office/drawing/2014/chartex' " +
".//cx:chart/cx:plotArea/cx:plotAreaRegion/cx:series"
);
return result[number];
}
#Override
protected void commit() throws IOException {
PackagePart part = getPackagePart();
OutputStream out = part.getOutputStream();
chartExXmlObject.save(out);
out.close();
}
}
private static class XSSFChartExRelation extends POIXMLRelation {
private XSSFChartExRelation(String type, String rel, String defaultName) {
super(type, rel, defaultName);
}
}
}
Note: apache poi version 4.0.0 is used here.
Excel result:

pdf clown- not highlighting specific search keyword

I am using pdf-clown with pdfclown-0.2.0-HEAD.jar.I have written below code for highlighting search the keyword in Chinese language pdf file and same code is working fine with english pdf file.
import java.awt.Color;
import java.awt.Desktop;
import java.awt.geom.Rectangle2D;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.URL;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.io.BufferedInputStream;
import java.io.File;
import org.pdfclown.documents.Page;
import org.pdfclown.documents.contents.ITextString;
import org.pdfclown.documents.contents.TextChar;
import org.pdfclown.documents.contents.colorSpaces.DeviceRGBColor;
import org.pdfclown.documents.interaction.annotations.TextMarkup;
import org.pdfclown.documents.interaction.annotations.TextMarkup.MarkupTypeEnum;
import org.pdfclown.files.SerializationModeEnum;
import org.pdfclown.util.math.Interval;
import org.pdfclown.util.math.geom.Quad;
import org.pdfclown.tools.TextExtractor;
public class pdfclown2 {
private static int count;
public static void main(String[] args) throws IOException {
highlight("ebook.pdf","C:\\Users\\Downloads\\6.pdf");
System.out.println("OK");
}
private static void highlight(String inputPath, String outputPath) throws IOException {
URL url = new URL(inputPath);
InputStream in = new BufferedInputStream(url.openStream());
org.pdfclown.files.File file = null;
try {
file = new org.pdfclown.files.File("C:\\Users\\Desktop\\pdf\\test123.pdf");
Map<String, String> m = new HashMap<String, String>();
m.put("亿元或","hi");
m.put("收入亿来","hi");
System.out.println("map size"+m.size());
long startTime = System.currentTimeMillis();
// 2. Iterating through the document pages...
TextExtractor textExtractor = new TextExtractor(true, true);
for (final Page page : file.getDocument().getPages()) {
Map<Rectangle2D, List<ITextString>> textStrings = textExtractor.extract(page);
for (Map.Entry<String, String> entry : m.entrySet()) {
Pattern pattern;
String serachKey = entry.getKey();
final String translationKeyword = entry.getValue();
/*
if ((serachKey.contains(")") && serachKey.contains("("))
|| (serachKey.contains("(") && !serachKey.contains(")"))
|| (serachKey.contains(")") && !serachKey.contains("(")) || serachKey.contains("?")
|| serachKey.contains("*") || serachKey.contains("+")) {s
pattern = Pattern.compile(Pattern.quote(serachKey), Pattern.CASE_INSENSITIVE);
}
else*/
pattern = Pattern.compile(serachKey, Pattern.CASE_INSENSITIVE);
// 2.1. Extract the page text!
//System.out.println(textStrings.toString().indexOf(entry.getKey()));
// 2.2. Find the text pattern matches!
final Matcher matcher = pattern.matcher(TextExtractor.toString(textStrings));
// 2.3. Highlight the text pattern matches!
textExtractor.filter(textStrings, new TextExtractor.IIntervalFilter() {
public boolean hasNext() {
// System.out.println(matcher.find());
// if(key.getMatchCriteria() == 1){
if (matcher.find()) {
return true;
}
/*
* } else if(key.getMatchCriteria() == 2) { if
* (matcher.hitEnd()) { count++; return true; } }
*/
return false;
}
public Interval<Integer> next() {
return new Interval<Integer>(matcher.start(), matcher.end());
}
public void process(Interval<Integer> interval, ITextString match) {
// Defining the highlight box of the text pattern
// match...
System.out.println(match);
/* List<Quad> highlightQuads = new ArrayList<Quad>();
{
Rectangle2D textBox = null;
for (TextChar textChar : match.getTextChars()) {
Rectangle2D textCharBox = textChar.getBox();
if (textBox == null) {
textBox = (Rectangle2D) textCharBox.clone();
} else {
if (textCharBox.getY() > textBox.getMaxY()) {
highlightQuads.add(Quad.get(textBox));
textBox = (Rectangle2D) textCharBox.clone();
} else {
textBox.add(textCharBox);
}
}
}
textBox.setRect(textBox.getX(), textBox.getY(), textBox.getWidth(), textBox.getHeight());
highlightQuads.add(Quad.get(textBox));
}*/
List<Quad> highlightQuads = new ArrayList<Quad>();
List<TextChar> textChars = match.getTextChars();
Rectangle2D firstRect = textChars.get(0).getBox();
Rectangle2D lastRect = textChars.get(textChars.size()-1).getBox();
Rectangle2D rect = firstRect.createUnion(lastRect);
highlightQuads.add(Quad.get(rect).get(rect));
// subtype can be Highlight, Underline, StrikeOut, Squiggly
new TextMarkup(page, highlightQuads, translationKeyword, MarkupTypeEnum.Highlight);
}
public void remove() {
throw new UnsupportedOperationException();
}
});
}
}
SerializationModeEnum serializationMode = SerializationModeEnum.Standard;
file.save(new java.io.File(outputPath), serializationMode);
System.out.println("file created");
long endTime = System.currentTimeMillis();
System.out.println("seconds take for execution is:"+(endTime-startTime)/1000);
} catch (Exception e) {
e.printStackTrace();
}
finally{
in.close();
}
}
}
Kindly provide your inputs to highlight specific search keyword for non english pdf files.
I am serching the keyword in below text which is in chinese langauage.
普双套习近平修宪普京利用双套车绕开宪法装班要走普京
enter image description here
Your PDF Clown version
The PDF Clown version you retrieved here from Tymate's maven repository on github has been pushed there April 23rd, 2015. The final (as of now) check-in to the PDF Clown subversion source code repository TRUNK on sourceforge, on the other hand, is from May 27th, 2015. There actually are some 30 checkins after April 23rd, 2015. Thus, you definitely do not use the most current version of this apparently dead PDF library project.
Using the current 0.2.0 snapshot
I tested your code with the 0.2.0 development version compiled from that trunk and the result indeed is different:
It is better insofar as the highlights have the width of the sought character and are located nearer to the actual character position. There still is a bug, though, as the second and third match highlights are somewhat off.
Fixing the bug
The remaining problem actually is not related to the language of the text. It simply is a bug in the processing of one type of the PDF text drawing commands, so it can be observed in documents with text in arbitrary languages. Due to the fact that these commands nowadays are used very seldom only, though, the bug is hardly ever observed, let alone reported. Your PDF, on the other hand, makes use of that kind of text drawing commands.
The bug is in the ShowText class (package org.pdfclown.documents.contents.objects). At the end of the scan method the text line matrix in the graphics state is updated like this if the ShowText instance actually is a ShowTextToNextLine instance derived from it:
if(textScanner == null)
{
state.setTm(tm);
if(this instanceof ShowTextToNextLine)
{state.setTlm((AffineTransform)tm.clone());}
}
The text line matrix here is set to the text matrix after the move to the next line and the drawing of the text. This is wrong, it must instead be set to text matrix right after the move to the next line before the drawing of the text.
This can be fixed e.g. like this:
if(textScanner == null)
{
state.setTm(tm);
if(this instanceof ShowTextToNextLine)
state.getTlm().concatenate(new AffineTransform(1, 0, 0, 1, 0, -state.getLead()));
}
With this change in place the result looks like this:

How to generate a Table of Contents "TOC" with iText?

I've created a document with some Chapters.
How to generate a TOC for this document?
It should look like this:
TOC:
Chapter 1 3
Chapter 2 4
Chapter 3 6
Chapter 4 9
Chapter 5 10
This is possible by using PdfTemplates. PdfTemplates are a kind of placeholder that you can fill afterwards.
Update with the hints from Bruno:
To generate at TOC at the beginning, you need to put some Placeholders for all the page numbers in the TOC. Those PdfTemplates you collect in a Map. Then, when you add the Chapters to the document, you can fill those placeholders.
This example shows how:
package com.example.main;
import java.io.FileOutputStream;
import java.util.HashMap;
import java.util.Map;
import com.itextpdf.text.Chapter;
import com.itextpdf.text.Chunk;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Font;
import com.itextpdf.text.FontFactory;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfPageEventHelper;
import com.itextpdf.text.pdf.PdfTemplate;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.draw.VerticalPositionMark;
public class Main extends PdfPageEventHelper {
private final Document document;
private final PdfWriter writer;
private final BaseFont baseFont = BaseFont.createFont();
private final Font chapterFont = FontFactory.getFont(FontFactory.HELVETICA, 24, Font.NORMAL);
// table to store placeholder for all chapters and sections
private final Map<String, PdfTemplate> tocPlaceholder = new HashMap<String, PdfTemplate>();
// store the chapters and sections with their title here.
private final Map<String, Integer> pageByTitle = new HashMap<>();
public static void main(final String[] args) throws Exception {
final Main main = new Main();
main.document.add(new Paragraph("This is an example to generate a TOC."));
main.createTOC(10);
main.createChapters(10);
main.document.close();
}
public Main() throws Exception {
this.document = new Document(PageSize.A6);
this.writer = PdfWriter.getInstance(this.document, new FileOutputStream("test.pdf"));
this.writer.setPageEvent(this);
this.document.open();
}
#Override
public void onChapter(final PdfWriter writer, final Document document, final float paragraphPosition, final Paragraph title) {
this.pageByTitle.put(title.getContent(), writer.getPageNumber());
}
#Override
public void onSection(final PdfWriter writer, final Document document, final float paragraphPosition, final int depth, final Paragraph title) {
this.pageByTitle.put(title.getContent(), writer.getPageNumber());
}
private void createTOC(final int count) throws DocumentException {
// add a small introduction chapter the shouldn't be counted.
final Chapter intro = new Chapter(new Paragraph("This is TOC ", this.chapterFont), 0);
intro.setNumberDepth(0);
this.document.add(intro);
for (int i = 1; i < count + 1; i++) {
// Write "Chapter i"
final String title = "Chapter " + i;
final Chunk chunk = new Chunk(title).setLocalGoto(title);
this.document.add(new Paragraph(chunk));
// Add a placeholder for the page reference
this.document.add(new VerticalPositionMark() {
#Override
public void draw(final PdfContentByte canvas, final float llx, final float lly, final float urx, final float ury, final float y) {
final PdfTemplate createTemplate = canvas.createTemplate(50, 50);
Main.this.tocPlaceholder.put(title, createTemplate);
canvas.addTemplate(createTemplate, urx - 50, y);
}
});
}
}
private void createChapters(final int count) throws DocumentException {
for (int i = 1; i < count + 1; i++) {
// append the chapter
final String title = "Chapter " + i;
final Chunk chunk = new Chunk(title, this.chapterFont).setLocalDestination(title);
final Chapter chapter = new Chapter(new Paragraph(chunk), i);
chapter.setNumberDepth(0);
chapter.addSection("Foobar1");
chapter.addSection("Foobar2");
this.document.add(chapter);
// When we wrote the chapter, we now the pagenumber
final PdfTemplate template = this.tocPlaceholder.get(title);
template.beginText();
template.setFontAndSize(this.baseFont, 12);
template.setTextMatrix(50 - this.baseFont.getWidthPoint(String.valueOf(this.writer.getPageNumber()), 12), 0);
template.showText(String.valueOf(this.writer.getPageNumber()));
template.endText();
}
}
}
The generated PDF looks like this:
TableOfContents.pdf
The answer by Christian Schneider seems somewhat complex. I would also use page event, but I would use the onChapter() method to create a list of chapter titles and page numbers. If you also need Section titles, use the onSection() method to keep track of the sections too.
Once you have this list, create the TOC at the end of the document. If you want to move the TOC to the front, read my answer to this question: PDF Page re-ordering using itext

Merging MS Word documents with Java

I'm looking for java libraries that read and write MS Word Document.
What I have to do is:
read a template file, .dot or .doc, and fill it with some data read from DB
take data from another Word document and merging that with the file described above, preserving paragraphs formats
users may make updates to the file.
I've searched and found POI Apache and UNO OpenOffice.
The first one can easily read a template and replace any placeholders with my own data from DB. I didn't found anything about merging two, or more, documents.
OpenOffice UNO looks more stable but complex too. Furthermore I'm not sure that it has the ability to merge documents..
We are looking the right direction?
Another solution i've thought was to convert doc file to docx. In that way I found more libraries that can help us merging documents.
But how can I do that?
Thanks!
You could take a look at Docmosis since it provides the four features you have mentioned (data population, template/document merging, DOC format and java interface). It has a couple of flavours (download, online service), but you could sign up for a free trial of the cloud service to see if Docmosis can do what you want (then you don't have to install anything) or read the online documentation.
It uses OpenOffice under the hood (you can see from the developer guide installation instructions) which does pretty decent conversions between documents. The UNO API has some complications - I would suggest either Docmosis or JODReports to isolate your project from UNO directly.
Hope that helps.
import java.io.File;
import java.util.List;
import javax.xml.bind.JAXBException;
import org.docx4j.dml.CTBlip;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.Part;
import org.docx4j.openpackaging.parts.PartName;
import org.docx4j.openpackaging.parts.WordprocessingML.ImageBmpPart;
import org.docx4j.openpackaging.parts.WordprocessingML.ImageEpsPart;
import org.docx4j.openpackaging.parts.WordprocessingML.ImageGifPart;
import org.docx4j.openpackaging.parts.WordprocessingML.ImageJpegPart;
import org.docx4j.openpackaging.parts.WordprocessingML.ImagePngPart;
import org.docx4j.openpackaging.parts.WordprocessingML.ImageTiffPart;
import org.docx4j.openpackaging.parts.relationships.RelationshipsPart;
import org.docx4j.openpackaging.parts.relationships.RelationshipsPart.AddPartBehaviour;
import org.docx4j.relationships.Relationship;
public class MultipleDocMerge {
public static void main(String[] args) throws Docx4JException, JAXBException {
File first = new File("D:\\Mreg.docx");
File second = new File("D:\\Mreg1.docx");
File third = new File("D:\\Mreg4&19.docx");
File fourth = new File("D:\\test12.docx");
WordprocessingMLPackage f = WordprocessingMLPackage.load(first);
WordprocessingMLPackage s = WordprocessingMLPackage.load(second);
WordprocessingMLPackage a = WordprocessingMLPackage.load(third);
WordprocessingMLPackage e = WordprocessingMLPackage.load(fourth);
List body = s.getMainDocumentPart().getJAXBNodesViaXPath("//w:body", false);
for(Object b : body){
List filhos = ((org.docx4j.wml.Body)b).getContent();
for(Object k : filhos)
f.getMainDocumentPart().addObject(k);
}
List body1 = a.getMainDocumentPart().getJAXBNodesViaXPath("//w:body", false);
for(Object b : body1){
List filhos = ((org.docx4j.wml.Body)b).getContent();
for(Object k : filhos)
f.getMainDocumentPart().addObject(k);
}
List body2 = e.getMainDocumentPart().getJAXBNodesViaXPath("//w:body", false);
for(Object b : body2){
List filhos = ((org.docx4j.wml.Body)b).getContent();
for(Object k : filhos)
f.getMainDocumentPart().addObject(k);
}
List<Object> blips = e.getMainDocumentPart().getJAXBNodesViaXPath("//a:blip", false);
for(Object el : blips){
try {
CTBlip blip = (CTBlip) el;
RelationshipsPart parts = e.getMainDocumentPart().getRelationshipsPart();
Relationship rel = parts.getRelationshipByID(blip.getEmbed());
Part part = parts.getPart(rel);
if(part instanceof ImagePngPart)
System.out.println(((ImagePngPart) part).getBytes());
if(part instanceof ImageJpegPart)
System.out.println(((ImageJpegPart) part).getBytes());
if(part instanceof ImageBmpPart)
System.out.println(((ImageBmpPart) part).getBytes());
if(part instanceof ImageGifPart)
System.out.println(((ImageGifPart) part).getBytes());
if(part instanceof ImageEpsPart)
System.out.println(((ImageEpsPart) part).getBytes());
if(part instanceof ImageTiffPart)
System.out.println(((ImageTiffPart) part).getBytes());
Relationship newrel = f.getMainDocumentPart().addTargetPart(part,AddPartBehaviour.RENAME_IF_NAME_EXISTS);
blip.setEmbed(newrel.getId());
f.getMainDocumentPart().addTargetPart(e.getParts().getParts().get(new PartName("/word/"+rel.getTarget())));
} catch (Exception ex){
ex.printStackTrace();
} }
File saved = new File("D:\\saved1.docx");
f.save(saved);
}
}
I've developed the next class (using Apache POI):
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBody;
public class WordMerge {
private final OutputStream result;
private final List<InputStream> inputs;
private XWPFDocument first;
public WordMerge(OutputStream result) {
this.result = result;
inputs = new ArrayList<>();
}
public void add(InputStream stream) throws Exception{
inputs.add(stream);
OPCPackage srcPackage = OPCPackage.open(stream);
XWPFDocument src1Document = new XWPFDocument(srcPackage);
if(inputs.size() == 1){
first = src1Document;
} else {
CTBody srcBody = src1Document.getDocument().getBody();
first.getDocument().addNewBody().set(srcBody);
}
}
public void doMerge() throws Exception{
first.write(result);
}
public void close() throws Exception{
result.flush();
result.close();
for (InputStream input : inputs) {
input.close();
}
}
}
And its use:
public static void main(String[] args) throws Exception {
FileOutputStream faos = new FileOutputStream("/home/victor/result.docx");
WordMerge wm = new WordMerge(faos);
wm.add( new FileInputStream("/home/victor/001.docx") );
wm.add( new FileInputStream("/home/victor/002.docx") );
wm.doMerge();
wm.close();
}
The Apache POI code does not work for Images.

Categories

Resources