I have a placeholder image in my docx file and I want to replace it with new image. The problem is - the placeholder image has an attribute "in front of text", but the new image has not. As a result the alignment breaks. Here is my code snippet and the docx with placeholder and the resulting docx.
.......
replaceImage(doc, "Рисунок 1", qr, 50, 50);
ByteArrayOutputStream out = new ByteArrayOutputStream();
doc.write(out);
out.close();
return out.toByteArray();
}
}
public XWPFDocument replaceImage(XWPFDocument document, String imageOldName, byte[] newImage, int newImageWidth, int newImageHeight) throws Exception {
try {
int imageParagraphPos = -1;
XWPFParagraph imageParagraph = null;
List<IBodyElement> documentElements = document.getBodyElements();
for (IBodyElement documentElement : documentElements) {
imageParagraphPos++;
if (documentElement instanceof XWPFParagraph) {
imageParagraph = (XWPFParagraph) documentElement;
if (imageParagraph.getCTP() != null && imageParagraph.getCTP().toString().trim().contains(imageOldName)) {
break;
}
}
}
if (imageParagraph == null) {
throw new Exception("Unable to replace image data due to the exception:\n"
+ "'" + imageOldName + "' not found in in document.");
}
ParagraphAlignment oldImageAlignment = imageParagraph.getAlignment();
// remove old image
boolean isDeleted = document.removeBodyElement(imageParagraphPos);
// now add new image
XWPFParagraph newImageParagraph = document.createParagraph();
XWPFRun newImageRun = newImageParagraph.createRun();
newImageParagraph.setAlignment(oldImageAlignment);
try (InputStream is = new ByteArrayInputStream(newImage)) {
newImageRun.addPicture(is, XWPFDocument.PICTURE_TYPE_JPEG, "qr",
Units.toEMU(newImageWidth), Units.toEMU(newImageHeight));
}
// set new image at the old image position
document.setParagraph(newImageParagraph, imageParagraphPos);
// NOW REMOVE REDUNDANT IMAGE FORM THE END OF DOCUMENT
document.removeBodyElement(document.getBodyElements().size() - 1);
return document;
} catch (Exception e) {
throw new Exception("Unable to replace image '" + imageOldName + "' due to the exception:\n" + e);
}
}
The image with placeholder:
enter image description here
The resulting image:
enter image description here
To replace picture templates in Microsoft Word there is no need to delete them.
The storage is as so:
The embedded media is stored as binary file. This is the picture data (XWPFPictureData). In the document a picture element (XWPFPicture) links to that picture data.
The XWPFPicture has settings for position, size and text flow. These dont need to be changed.
The changing is needed in XWPFPictureData. There one can replace the old binary content with the new.
So the need is to find the XWPFPicture in the document. There is a non visual picture name stored while inserting the picture in the document. So if one knows that name, then this could be a criteriea to find the picture.
If found one can get the XWPFPictureData from found XWPFPicture. There is method XWPFPicture.getPictureDatato do so. Then one can replace the old binary content of XWPFPictureData with the new. XWPFPictureData is a package part. So it has PackagePart.getOutputStream to get an output stream to write to.
Following complete example shows that all.
The source.docx needs to have an embedded picture named "QRTemplate.jpg". This is the name of the source file used while inserting the picture into Word document using Word GUI. And there needs to be a file QR.jpg which contains the new content.
The result.docx then has all pictures named "QRTemplate.jpg" replaced with the content of the given file QR.jpg.
import java.io.FileInputStream;
import java.io.OutputStream;
import java.io.FileOutputStream;
import org.apache.poi.xwpf.usermodel.*;
public class WordReplacePictureData {
static XWPFPicture getPictureByName(XWPFRun run, String pictureName) {
if (pictureName == null) return null;
for (XWPFPicture picture : run.getEmbeddedPictures()) {
String nonVisualPictureName = picture.getCTPicture().getNvPicPr().getCNvPr().getName();
if (pictureName.equals(nonVisualPictureName)) {
return picture;
}
}
return null;
}
static void replacePictureData(XWPFPictureData source, String pictureResultPath) {
try ( FileInputStream in = new FileInputStream(pictureResultPath);
OutputStream out = source.getPackagePart().getOutputStream();
) {
byte[] buffer = new byte[2048];
int length;
while ((length = in.read(buffer)) > 0) {
out.write(buffer, 0, length);
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
static void replacePicture(XWPFRun run, String pictureName, String pictureResultPath) {
XWPFPicture picture = getPictureByName(run, pictureName);
if (picture != null) {
XWPFPictureData source = picture.getPictureData();
replacePictureData(source, pictureResultPath);
}
}
public static void main(String[] args) throws Exception {
String templatePath = "./source.docx";
String resultPath = "./result.docx";
String pictureTemplateName = "QRTemplate.jpg";
String pictureResultPath = "./QR.jpg";
try ( XWPFDocument document = new XWPFDocument(new FileInputStream(templatePath));
FileOutputStream out = new FileOutputStream(resultPath);
) {
for (IBodyElement bodyElement : document.getBodyElements()) {
if (bodyElement instanceof XWPFParagraph) {
XWPFParagraph paragraph = (XWPFParagraph)bodyElement;
for (XWPFRun run : paragraph.getRuns()) {
replacePicture(run, pictureTemplateName, pictureResultPath);
}
}
}
document.write(out);
}
}
}
I have a dirty workaround. Since the text block on the right side of the image is static, I replaced the text with screen-shot on the original docx. And now, when the placeholder image been substituted by the new image, everything is rendered as expected.
Related
I am currently trying to generate a new docx file from an existing template in dotx format. I want to change the firstname, lastname, etc. in the header but I'm not able to access them for some reason...
My approach is the following:
public void generateDocX(Long id) throws IOException, InvalidFormatException {
//Get user per id
EmployeeDTO employeeDTO = employeeService.getEmployee(id);
//Location where the new docx file will be saved
FileOutputStream outputStream = new FileOutputStream(new File("/home/user/Documents/project/src/main/files/" + employeeDTO.getId() + "header.docx"));
//Get the template for generating the new docx file
File template = new File("/home/user/Documents/project/src/main/files/template.dotx");
OPCPackage pkg = OPCPackage.open(template);
XWPFDocument document = new XWPFDocument(pkg);
for (XWPFHeader header : document.getHeaderList()) {
List<XWPFParagraph> paragraphs = header.getParagraphs();
System.out.println("Total paragraphs in header are: " + paragraphs.size());
System.out.println("Total elements in the header are: " + header.getBodyElements().size());
for (XWPFParagraph paragraph : paragraphs) {
System.out.println("Paragraph text is: " + paragraph.getText());
List<XWPFRun> runs = paragraph.getRuns();
for (XWPFRun run : runs) {
String runText = run.getText(run.getTextPosition());
System.out.println("Run text is: " + runText);
}
}
}
//Write the changes to the new docx file and close the document
document.write(outputStream);
document.close();
}
The output in the console is either 1, null or empty string... I've tried several approaches from here, here and here but without any luck...
Here is what's inside the template.dotx
IBody.getParagraphs and IBody.getBodyElements- only get the paragraphs or body elements which are directly in that IBody. But your paragraphs are not directly in there but are in a separate text box or text frame. That's why they cannot be got this way.
Since *.docx is a ZIP archive containijg XML files for document, headers and footers, one could get all text runs of one IBody by creating a XmlCursor which selects all w:r XML elements. For a XWPFHeader this could look like so:
private List<XmlObject> getAllCTRs(XWPFHeader header) {
CTHdrFtr ctHdrFtr = header._getHdrFtr();
XmlCursor cursor = ctHdrFtr.newCursor();
cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:r");
List<XmlObject> ctrInHdrFtr = new ArrayList<XmlObject>();
while (cursor.hasNextSelection()) {
cursor.toNextSelection();
XmlObject obj = cursor.getObject();
ctrInHdrFtr.add(obj);
}
return ctrInHdrFtr;
}
Now we have a list of all XML elements in that header which are text-run-elements in Word.
We could have a more general getAllCTRs which gets all CTR elements from any kind of IBody like so:
private List<XmlObject> getAllCTRs(IBody iBody) {
XmlCursor cursor = null;
List<XmlObject> ctrInIBody = new ArrayList<XmlObject>();
if (iBody instanceof XWPFHeaderFooter) {
XWPFHeaderFooter headerFooter = (XWPFHeaderFooter)iBody;
CTHdrFtr ctHdrFtr = headerFooter._getHdrFtr();
cursor = ctHdrFtr.newCursor();
} else if (iBody instanceof XWPFDocument) {
XWPFDocument document = (XWPFDocument)iBody;
CTDocument1 ctDocument1 = document.getDocument();
cursor = ctDocument1.newCursor();
} else if (iBody instanceof XWPFAbstractFootnoteEndnote) {
XWPFAbstractFootnoteEndnote footEndnote = (XWPFAbstractFootnoteEndnote)iBody;
CTFtnEdn ctFtnEdn = footEndnote.getCTFtnEdn();
cursor = ctFtnEdn.newCursor();
}
if (cursor != null) {
cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:r");
while (cursor.hasNextSelection()) {
cursor.toNextSelection();
XmlObject obj = cursor.getObject();
ctrInIBody.add(obj);
}
}
return ctrInIBody ;
}
Now we have a list of all XML elements in that IBody which are text-run-elements in Word.
Having that we can get the text out of them like so:
private void printAllTextInTextRunsOfIBody(IBody iBody) throws Exception {
List<XmlObject> ctrInIBody = getAllCTRs(iBody);
for (XmlObject obj : ctrInIBody) {
CTR ctr = CTR.Factory.parse(obj.xmlText());
for (CTText ctText : ctr.getTList()) {
String text = ctText.getStringValue();
System.out.println(text);
}
}
}
This probably shows the next challenge. Because Word is very messy in creating text-run-elements. For example your placeholder <<Firstname>> can be split into text-runs << + Firstname + >>. The reason migt be different formatting or spell checking or something else. Even this is possible: << + Lastname + >>; << + YearOfBirth + >>. Or even this: <<Firstname + >> << + Lastname>>; << + YearOfBirth>>. You see, replacing the placeholders with text is nearly impossible because the placeholders may be split into multiple tex-runs.
To avoid this the template.dotx needs to be created from users who know what they are doing.
At first turn spell check off. Grammar check as well. If not, all found possible spell errors or grammar violations are in separate text-runs to mark them accordingly.
Second make sure the whole placeholder is eaqual formatted. Different formatted text also must be in separate text-runs.
I am really skeptic that this will work properly. But try it yourself.
Complete example:
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileInputStream;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;
import org.apache.xmlbeans.XmlObject;
import org.apache.xmlbeans.XmlCursor;
import java.util.List;
import java.util.ArrayList;
public class WordEditAllIBodys {
private List<XmlObject> getAllCTRs(IBody iBody) {
XmlCursor cursor = null;
List<XmlObject> ctrInIBody = new ArrayList<XmlObject>();
if (iBody instanceof XWPFHeaderFooter) {
XWPFHeaderFooter headerFooter = (XWPFHeaderFooter)iBody;
CTHdrFtr ctHdrFtr = headerFooter._getHdrFtr();
cursor = ctHdrFtr.newCursor();
} else if (iBody instanceof XWPFDocument) {
XWPFDocument document = (XWPFDocument)iBody;
CTDocument1 ctDocument1 = document.getDocument();
cursor = ctDocument1.newCursor();
} else if (iBody instanceof XWPFAbstractFootnoteEndnote) {
XWPFAbstractFootnoteEndnote footEndnote = (XWPFAbstractFootnoteEndnote)iBody;
CTFtnEdn ctFtnEdn = footEndnote.getCTFtnEdn();
cursor = ctFtnEdn.newCursor();
}
if (cursor != null) {
cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:r");
while (cursor.hasNextSelection()) {
cursor.toNextSelection();
XmlObject obj = cursor.getObject();
ctrInIBody.add(obj);
}
}
return ctrInIBody ;
}
private void printAllTextInTextRunsOfIBody(IBody iBody) throws Exception {
List<XmlObject> ctrInIBody = getAllCTRs(iBody);
for (XmlObject obj : ctrInIBody) {
CTR ctr = CTR.Factory.parse(obj.xmlText());
for (CTText ctText : ctr.getTList()) {
String text = ctText.getStringValue();
System.out.println(text);
}
}
}
private void replaceTextInTextRunsOfIBody(IBody iBody, String placeHolder, String textValue) throws Exception {
List<XmlObject> ctrInIBody = getAllCTRs(iBody);
for (XmlObject obj : ctrInIBody) {
CTR ctr = CTR.Factory.parse(obj.xmlText());
for (CTText ctText : ctr.getTList()) {
String text = ctText.getStringValue();
if (text != null && text.contains(placeHolder)) {
text = text.replace(placeHolder, textValue);
ctText.setStringValue(text);
obj.set(ctr);
}
}
}
}
public void generateDocX() throws Exception {
FileOutputStream outputStream = new FileOutputStream(new File("./" + 1234 + "header.docx"));
//Get the template for generating the new docx file
File template = new File("./template.dotx");
XWPFDocument document = new XWPFDocument(new FileInputStream(template));
//traverse all headers
for (XWPFHeader header : document.getHeaderList()) {
printAllTextInTextRunsOfIBody(header);
replaceTextInTextRunsOfIBody(header, "<<Firstname>>", "Axel");
replaceTextInTextRunsOfIBody(header, "<<Lastname>>", "Richter");
replaceTextInTextRunsOfIBody(header, "<<ProfessionalTitle>>", "Skeptic");
}
//traverse all footers
for (XWPFFooter footer : document.getFooterList()) {
printAllTextInTextRunsOfIBody(footer);
replaceTextInTextRunsOfIBody(footer, "<<Firstname>>", "Axel");
replaceTextInTextRunsOfIBody(footer, "<<Lastname>>", "Richter");
replaceTextInTextRunsOfIBody(footer, "<<ProfessionalTitle>>", "Skeptic");
}
//traverse document body; note: tables needs not be traversed separately because they are in document body
printAllTextInTextRunsOfIBody(document);
replaceTextInTextRunsOfIBody(document, "<<Firstname>>", "Axel");
replaceTextInTextRunsOfIBody(document, "<<Lastname>>", "Richter");
replaceTextInTextRunsOfIBody(document, "<<ProfessionalTitle>>", "Skeptic");
//traverse all footnotes
for (XWPFFootnote footnote : document.getFootnotes()) {
printAllTextInTextRunsOfIBody(footnote);
replaceTextInTextRunsOfIBody(footnote, "<<Firstname>>", "Axel");
replaceTextInTextRunsOfIBody(footnote, "<<Lastname>>", "Richter");
replaceTextInTextRunsOfIBody(footnote, "<<ProfessionalTitle>>", "Skeptic");
}
//traverse all endnotes
for (XWPFEndnote endnote : document.getEndnotes()) {
printAllTextInTextRunsOfIBody(endnote);
replaceTextInTextRunsOfIBody(endnote, "<<Firstname>>", "Axel");
replaceTextInTextRunsOfIBody(endnote, "<<Lastname>>", "Richter");
replaceTextInTextRunsOfIBody(endnote, "<<ProfessionalTitle>>", "Skeptic");
}
//since document was opened from *.dotx the content type needs to be changed
document.getPackage().replaceContentType(
"application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml",
"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml");
//Write the changes to the new docx file and close the document
document.write(outputStream);
outputStream.close();
document.close();
}
public static void main(String[] args) throws Exception {
WordEditAllIBodys app = new WordEditAllIBodys();
app.generateDocX();
}
}
Btw.: Since your document was opened from *.dotx the content type needs to be changed from wordprocessingml.template to wordprocessingml.document. Else Word will not open the resulting *.docx document. See Converting a file with ".dotx" extension (template) to "docx" (Word File).
As I am skeptic about the replacing-placeholder-text-approach, my preferred way is filling forms. See Problem with processing word document java. Of course such form fields cannot be used in header or footer. So headers or footers schould be created form scratch at whole.
hey guys sorry for long post and bad language and if there is unnecessary details
i created multiple 1page pdfs from one pdf template using excel document
i have now
something like this
tempfile0.pdf
tempfile1.pdf
tempfile2.pdf
...
im trying to merge all files in one single pdf using itext5
but it semmes that the pages in the resulted pdf are not in the order i wanted
per exemple
tempfile0.pdf in the first page
tempfile1. int the 2000 page
here is the code im using.
the procedure im using is:
1 filling a from from a hashmap
2 saving the filled form as one pdf
3 merging all the files in one single pdf
public void fillPdfitext(int debut,int fin) throws IOException, DocumentException {
for (int i =debut; i < fin; i++) {
HashMap<String, String> currentData = dataextracted[i];
// textArea.appendText("\n"+pdfoutputname +" en cours de preparation\n ");
PdfReader reader = new PdfReader(this.sourcePdfTemplateFile.toURI().getPath());
String outputfolder = this.destinationOutputFolder.toURI().getPath();
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(outputfolder+"\\"+"tempcontrat"+debut+"-" +i+ "_.pdf"));
// get the document catalog
AcroFields acroForm = stamper.getAcroFields();
// as there might not be an AcroForm entry a null check is necessary
if (acroForm != null) {
for (String key : currentData.keySet()) {
try {
String fieldvalue=currentData.get(key);
if (key=="ADRESSE1"){
fieldvalue = currentData.get("ADRESSE1")+" "+currentData.get("ADRESSE2") ;
acroForm.setField("ADRESSE", fieldvalue);
}
if (key == "IMEI"){
acroForm.setField("NUM_SERIE_PACK", fieldvalue);
}
acroForm.setField(key, fieldvalue);
// textArea.appendText(key + ": "+fieldvalue+"\t\t");
} catch (Exception e) {
// e.printStackTrace();
}
}
stamper.setFormFlattening(true);
}
stamper.close();
}
}
this is the code for merging
public void Merge() throws IOException, DocumentException
{
File[] documentPaths = Main.objetapp.destinationOutputFolder.listFiles((dir, name) -> name.matches( "tempcontrat.*\\.pdf" ));
Arrays.sort(documentPaths, NameFileComparator.NAME_INSENSITIVE_COMPARATOR);
byte[] mergedDocument;
try (ByteArrayOutputStream memoryStream = new ByteArrayOutputStream())
{
Document document = new Document();
PdfSmartCopy pdfSmartCopy = new PdfSmartCopy(document, memoryStream);
document.open();
for (File docPath : documentPaths)
{
PdfReader reader = new PdfReader(docPath.toURI().getPath());
try
{
reader.consolidateNamedDestinations();
PdfImportedPage pdfImportedPage = pdfSmartCopy.getImportedPage(reader, 1);
pdfSmartCopy.addPage(pdfImportedPage);
}
finally
{
pdfSmartCopy.freeReader(reader);
reader.close();
}
}
document.close();
mergedDocument = memoryStream.toByteArray();
}
FileOutputStream stream = new FileOutputStream(this.destinationOutputFolder.toURI().getPath()+"\\"+
this.sourceDataFile.getName().replaceFirst("[.][^.]+$", "")+".pdf");
try {
stream.write(mergedDocument);
} finally {
stream.close();
}
documentPaths=null;
Runtime r = Runtime.getRuntime();
r.gc();
}
my question is how to keep the order of the files the same in the resulting pdf
It is because of naming of files. Your code
new FileOutputStream(outputfolder + "\\" + "tempcontrat" + debut + "-" + i + "_.pdf")
will produce:
tempcontrat0-0_.pdf
tempcontrat0-1_.pdf
...
tempcontrat0-10_.pdf
tempcontrat0-11_.pdf
...
tempcontrat0-1000_.pdf
Where tempcontrat0-1000_.pdf will be placed before tempcontrat0-11_.pdf, because you are sorting it alphabetically before merge.
It will be better to left pad file number with 0 character using leftPad() method of org.apache.commons.lang.StringUtils or java.text.DecimalFormat and have it like this tempcontrat0-000000.pdf, tempcontrat0-000001.pdf, ... tempcontrat0-9999999.pdf.
And you can also do it much simpler and skip writing into file and then reading from file steps and merge documents right after the form fill and it will be faster. But it depends how many and how big documents you are merging and how much memory do you have.
So you can save the filled document into ByteArrayOutputStream and after stamper.close() create new PdfReader for bytes from that stream and call pdfSmartCopy.getImportedPage() for that reader. In short cut it can look like:
// initialize
PdfSmartCopy pdfSmartCopy = new PdfSmartCopy(document, memoryStream);
for (int i = debut; i < fin; i++) {
ByteArrayOutputStream out = new ByteArrayOutputStream();
// fill in the form here
stamper.close();
PdfReader reader = new PdfReader(out.toByteArray());
reader.consolidateNamedDestinations();
PdfImportedPage pdfImportedPage = pdfSmartCopy.getImportedPage(reader, 1);
pdfSmartCopy.addPage(pdfImportedPage);
// other actions ...
}
I am using Apache POI XWPF components and java, to extract data from a .xml file into a word document. So far so good, but I am struggling to create a table of contents.
I have to create a table of contents at the start of the method and then I update it at the end to get all the new headers. Currently I use doc.createTOC(), where doc is a variable created from XWPFDocument, to create the table at the start and then I use doc.enforceUpdateFields() to update everything at the end of the document. But when I open the document after I ran the program, the table of contents is empty, but the navigation panel does include some of the headers I specified.
A comment recommended that I include some code. So i started off by create the document from a template:
XWPFDocument doc = new XWPFDocument(new FileInputStream("D://Template.docx"));
I then create a table of contents:
doc.createTOC();
Then throughout the method I add headers to the document:
XWPFParagraph documentControlHeading = doc.createParagraph();
documentControlHeading.setPageBreak(true);
documentControlHeading.setAlignment(ParagraphAlignment.LEFT);
documentControlHeading.setStyle("Tier1Header");
After all the headers are added, I want to update the document so that all the new headers will appear in the table of contents. I do this buy using the following command:
doc.enforceUpdateFields();
Hmmm... I am looking at the createTOC() method code, and it appears that it looks for styles that look like Heading #. So Tier1Header would not be found. Try creating your text first, and use styles like Heading 1 for your headings. Then add the TOC using createTOC(). It should find all the headings when the TOC is created. I do not know if enforceUpdateFields() affects the TOC.
//Your docx template should contain the following or something similar text //which will be searched for and replaced with a WORD TOC.
//${TOC}
public static void main(String[] args) throws IOException, OpenXML4JException {
XWPFDocument docTemplate = null;
try {
File file = new File(PATH_TO_FILE); //"C:\\Reports\\Template.docx";
FileInputStream fis = new FileInputStream(file);
docTemplate = new XWPFDocument(fis);
generateTOC(docTemplate);
saveDocument(docTemplate);
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
if (docTemplate != null) {
docTemplate.close();
}
}
}
private static void saveDocument(XWPFDocument docTemplate) throws FileNotFoundException, IOException {
FileOutputStream outputFile = null;
try {
outputFile = new FileOutputStream(OUTFILENAME);
docTemplate.write(outputFile);
} finally {
if (outputFile != null) {
outputFile.close();
}
}
}
public static void generateTOC(XWPFDocument document) throws InvalidFormatException, FileNotFoundException, IOException {
String findText = "${TOC}";
String replaceText = "";
for (XWPFParagraph p : document.getParagraphs()) {
for (XWPFRun r : p.getRuns()) {
int pos = r.getTextPosition();
String text = r.getText(pos);
if (text != null && text.contains(findText)) {
text = text.replace(findText, replaceText);
r.setText(text, 0);
addField(p, "TOC \\o \"1-3\" \\h \\z \\u");
break;
}
}
}
}
private static void addField(XWPFParagraph paragraph, String fieldName) {
CTSimpleField ctSimpleField = paragraph.getCTP().addNewFldSimple();
// ctSimpleField.setInstr(fieldName + " \\* MERGEFORMAT ");
ctSimpleField.setInstr(fieldName);
ctSimpleField.addNewR().addNewT().setStringValue("<<fieldName>>");
}
This is the code of createTOC(), obtained by inspecting XWPFDocument.class:
public void createTOC() {
CTSdtBlock block = getDocument().getBody().addNewSdt();
TOC toc = new TOC(block);
for (XWPFParagraph par : this.paragraphs) {
String parStyle = par.getStyle();
if ((parStyle != null) && (parStyle.startsWith("Heading"))) try {
int level = Integer.valueOf(parStyle.substring("Heading".length())).intValue();
toc.addRow(level, par.getText(), 1, "112723803");
} catch (NumberFormatException e) {
e.printStackTrace();
}
}
}
As you can see, it adds to the TOC all paragraphs having styles named "HeadingX", with X being a number. But, unfortunately, that's not sufficent. The method, in fact, is bugged/uncomplete in its implementation.
The page number passed to addRow() is always 1, it's not even calculated.
So, at the end, you will have a TOC with all your paragraphs and the trailing dots giving the proper indentation, but the pages will be always equal to "1".
EDIT
...but, there's a solution here.
I am currently modifying the logic for generating a PDF report, using itext package (itext-2.1.7.jar). The report is generated first, then - table of contents (TOC) and then TOC is inserted into the report file, using PdfStamper and it's method - replacePage:
PdfStamper stamper = new PdfStamper(new PdfReader(finalFile.getAbsolutePath()),
new FileOutputStream(reportFile));
stamper.replacePage(new PdfReader(tocFile.getAbsolutePath()), 1, 2);
However, I wanted TOC to also have internal links to point to the right chapter. Using replacePage() method, unfortunately, removes all links and annotations, so I tried another way. I've used a method I've found to merge the report file and the TOC file. It did the trick, meaning that the links were now preserved, but they don't seem to work, the user can click them, but nothing happens. I've placed internal links on other places in the report file, for testing purposes, and they seem to work, but they don't work when placed on the actual TOC. TOC is generated separately, here's how I've created the links (addLeadingDots is a local method, not pertinent to the problem):
Chunk anchor = new Chunk(idx + ". " + chapterName + getLeadingDots(chapterName, pageNumber) + " " + pageNumber, MainReport.FONT12);
String anchorLocation = idx+chapterName.toUpperCase().replaceAll("\\s","");
anchor.setLocalGoto(anchorLocation);
line = new Paragraph();
line.add(anchor);
line.setLeading(6);
table.addCell(line);
And this is how I've created anchor destinations on the chapters in the report file:
Chunk target = new Chunk(title.toUpperCase(), font);
String anchorDestination = title.toUpperCase().replace(".", "").replaceAll("\\s","");
target.setLocalDestination(anchorDestination);
Paragraph p = new Paragraph(target);
p.setAlignment(Paragraph.ALIGN_CENTER);
doc.add(p);
Finally, here's the method that supposed to merge report file and the TOC:
public static void mergePDFs(String originalFilePath, String fileToInsertPath, String outputFile, int location) {
PdfReader originalFileReader = null;
try {
originalFileReader = new PdfReader(originalFilePath);
} catch (IOException ex) {
System.out.println("ITextHelper.addPDFToPDF(): can't read original file: " + ex);
}
PdfReader fileToAddReader = null;
try {
fileToAddReader = new PdfReader(fileToInsertPath);
} catch (IOException ex) {
System.out.println("ITextHelper.addPDFToPDF(): can't read fileToInsert: " + ex);
}
if (originalFileReader != null && fileToAddReader != null) {
int numberOfOriginalPages = originalFileReader.getNumberOfPages();
Document document = new Document();
PdfCopy copy = null;
try {
copy = new PdfCopy(document, new FileOutputStream(outputFile));
document.open();
for (int i = 1; i <= numberOfOriginalPages; i++) {
if (i == location) {
for (int j = 1; j <= fileToAddReader.getNumberOfPages(); j++) {
copy.addPage(copy.getImportedPage(fileToAddReader, j));
}
}
copy.addPage(copy.getImportedPage(originalFileReader, i));
}
document.close();
} catch (DocumentException | FileNotFoundException ex) {
m_logger.error("ITextHelper.addPDFToPDF(): can't read output location: " + ex);
} catch (IOException ex) {
m_logger.error(Level.SEVERE, ex);
}
}
}
So, the main question is, what happens to the links, after both pdf documents are merged and how to make them work? I'm not too experienced with itext, so any help is appreciated.
Thanks for your help
How do I "flatten" a PDF-form (remove the form-field but keep the text of the field) with PDFBox?
Same question was answered here:
a quick way to do this, is to remove the fields from the acrofrom.
For this you just need to get the document catalog, then the acroform
and then remove all fields from this acroform.
The graphical representation is linked with the annotation and stay in
the document.
So I wrote this code:
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
public class PdfBoxTest {
public void test() throws Exception {
PDDocument pdDoc = PDDocument.load(new File("E:\\Form-Test.pdf"));
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
if (acroForm == null) {
System.out.println("No form-field --> stop");
return;
}
#SuppressWarnings("unchecked")
List<PDField> fields = acroForm.getFields();
// set the text in the form-field <-- does work
for (PDField field : fields) {
if (field.getFullyQualifiedName().equals("formfield1")) {
field.setValue("Test-String");
}
}
// remove form-field but keep text ???
// acroForm.getFields().clear(); <-- does not work
// acroForm.setFields(null); <-- does not work
// acroForm.setFields(new ArrayList()); <-- does not work
// ???
pdDoc.save("E:\\Form-Test-Result.pdf");
pdDoc.close();
}
}
With PDFBox 2 it's now possible to "flatten" a PDF-form easily by calling the flatten method on a PDAcroForm object. See Javadoc: PDAcroForm.flatten().
Simplified code with an example call of this method:
//Load the document
PDDocument pDDocument = PDDocument.load(new File("E:\\Form-Test.pdf"));
PDAcroForm pDAcroForm = pDDocument.getDocumentCatalog().getAcroForm();
//Fill the document
...
//Flatten the document
pDAcroForm.flatten();
//Save the document
pDDocument.save("E:\\Form-Test-Result.pdf");
pDDocument.close();
Note: dynamic XFA forms cannot be flatten.
For migration from PDFBox 1.* to 2.0, take a look at the official migration guide.
This works for sure - I've ran into this problem, debugged all-night, but finally figured out how to do this :)
This is assuming that you have capability to edit the PDF in some way/have some control over the PDF.
First, edit the forms using Acrobat Pro. Make them hidden and read-only.
Then you need to use two libraries: PDFBox and PDFClown.
PDFBox removes the thing that tells Adobe Reader that it's a form; PDFClown removes the actual field. PDFClown must be done first, then PDFBox (in that order. The other way around doesn't work).
Single field example code:
// PDF Clown code
File file = new File("Some file path");
Document document = file.getDocument();
Form form = file.getDocument.getForm();
Fields fields = form.getFields();
Field field = fields.get("some_field_name");
PageStamper stamper = new PageStamper();
FieldWidgets widgets = field.getWidgets();
Widget widget = widgets.get(0); // Generally is 0.. experiment to figure out
stamper.setPage(widget.getPage());
// Write text using text form field position as pivot.
PrimitiveComposer composer = stamper.getForeground();
Font font = font.get(document, "some_path");
composer.setFont(font, 10);
double xCoordinate = widget.getBox().getX();
double yCoordinate = widget.getBox().getY();
composer.showText("text i want to display", new Point2D.Double(xCoordinate, yCoordinate));
// Actually delete the form field!
field.delete();
stamper.flush();
// Create new buffer to output to...
Buffer buffer = new Buffer();
file.save(buffer, SerializationModeEnum.Standard);
byte[] bytes = buffer.toByteArray();
// PDFBox code
InputStream pdfInput = new ByteArrayInputStream(bytes);
PDDocument pdfDocument = PDDocument.load(pdfInput);
// Tell Adobe we don't have forms anymore.
PDDocumentCatalog pdCatalog = pdfDocument.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
COSDictionary acroFormDict = acroForm.getDictionary();
COSArray cosFields = (COSArray) acroFormDict.getDictionaryObject("Fields");
cosFields.clear();
// Phew. Finally.
pdfDocument.save("Some file path");
Probably some typos here and there, but this should be enough to get the gist :)
After reading about pdf reference guide, I have discovered that you can quite easily set read-only mode for AcroForm fields by adding "Ff" key (Field flags) with value 1.
This is what documentation stands about that:
If set, the user may not change the value of the field.
Any associated widget annotations will not interact
with the user; that is, they will not respond to mouse
clicks or change their appearance in response to
mouse motions. This flag is useful for fields whose
values are computed or imported from a database.
so the code could look like that (using pdfbox lib):
public static void makeAllWidgetsReadOnly(PDDocument pdDoc) throws IOException {
PDDocumentCatalog catalog = pdDoc.getDocumentCatalog();
PDAcroForm form = catalog.getAcroForm();
List<PDField> acroFormFields = form.getFields();
System.out.println(String.format("found %d acroFrom fields", acroFormFields.size()));
for(PDField field: acroFormFields) {
makeAcroFieldReadOnly(field);
}
}
private static void makeAcroFieldReadOnly(PDField field) {
field.getDictionary().setInt("Ff",1);
}
setReadOnly did work for me as shown below -
#SuppressWarnings("unchecked")
List<PDField> fields = acroForm.getFields();
for (PDField field : fields) {
if (field.getFullyQualifiedName().equals("formfield1")) {
field.setReadOnly(true);
}
}
Solution to flattening acroform AND retaining the form field values using pdfBox:
see solution at https://mail-archives.apache.org/mod_mbox/pdfbox-users/201604.mbox/%3C3BC7E352-9447-4458-AAC3-5A9B70B4CCAA#fileaffairs.de%3E
The solution that worked for me with pdfbox 2.0.1:
File myFile = new File("myFile.pdf");
PDDocument pdDoc = PDDocument.load(myFile);
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm pdAcroForm = pdCatalog.getAcroForm();
// set the NeedAppearances flag to false
pdAcroForm.setNeedAppearances(false);
field.setValue("new-value");
pdAcroForm.flatten();
pdDoc.save("myFlattenedFile.pdf");
pdDoc.close();
I didn't need to do the 2 extra steps in the above solution link:
// correct the missing page link for the annotations
// Add the missing resources to the form
I created my pdf form in OpenOffice 4.1.1 and exported to pdf. The 2 items selected in the OpenOffice export dialogue were:
selected "create Pdf Form"
Submit format of "PDF" - I found this gave smaller pdf file size than selecting "FDF" but still operated as a pdf form.
Using PdfBox I populated the form fields and created a flattened pdf file that removed the form fields but retained the form field values.
In order to really "flatten" an acrobat form field there seems to be much more to do than at the first glance.
After examining the PDF standard I managed to achieve real flatening in three steps:
save field value
remove widgets
remove form field
All three steps can be done with pdfbox (I used 1.8.5). Below I will sketch how I did it.
A very helpful tool in order to understand whats going on is the PDF Debugger.
Save the field
This is the most complicated step of the three.
In order to save the field's value you have to save its content to the pdf's content for each of the field's widgets. Easiest way to do so is drawing each widget's appearance to the widget's page.
void saveFieldValue( PDField field ) throws IOException
{
PDDocument document = getDocument( field );
// see PDField.getWidget()
for( PDAnnotationWidget widget : getWidgets( field ) )
{
PDPage parentPage = getPage( widget );
try (PDPageContentStream contentStream = new PDPageContentStream( document, parentPage, true, true ))
{
writeContent( contentStream, widget );
}
}
}
void writeContent( PDPageContentStream contentStream, PDAnnotationWidget widget )
throws IOException
{
PDAppearanceStream appearanceStream = getAppearanceStream( widget );
PDXObject xobject = new PDXObjectForm( appearanceStream.getStream() );
AffineTransform transformation = getPositioningTransformation( widget.getRectangle() );
contentStream.drawXObject( xobject, transformation );
}
The appearance is an XObject stream containing all of the widget's content (value, font, size, rotation, etc.). You simply need to place it at the right position on the page which you can extract from the widget's rectangle.
Remove widgets
As noted above each field may have multiple widgets. A widget takes care of how a form field can be edited, triggers, displaying when not editing and such stuff.
In order to remove one you have to remove it from its page's annotations.
void removeWidget( PDAnnotationWidget widget ) throws IOException
{
PDPage widgetPage = getPage( widget );
List<PDAnnotation> annotations = widgetPage.getAnnotations();
PDAnnotation deleteCandidate = getMatchingCOSObjectable( annotations, widget );
if( deleteCandidate != null && annotations.remove( deleteCandidate ) )
widgetPage.setAnnotations( annotations );
}
Note that the annotations may not contain the exact PDAnnotationWidget since it's a kind of a wrapper. You have to remove the one with matching COSObject.
Remove form field
As final step you remove the form field itself. This is not very different to the other posts above.
void removeFormfield( PDField field ) throws IOException
{
PDAcroForm acroForm = field.getAcroForm();
List<PDField> acroFields = acroForm.getFields();
List<PDField> removeCandidates = getFields( acroFields, field.getPartialName() );
if( removeAll( acroFields, removeCandidates ) )
acroForm.setFields( acroFields );
}
Note that I used a custom removeAll method here since the removeCandidates.removeAll() didn't work as expected for me.
Sorry that I cannot provide all the code here but with the above you should be able to write it yourself.
I don't have enough points to comment but SJohnson's response of setting the field to read only worked perfectly for me. I am using something like this with PDFBox:
private void setFieldValueAndFlatten(PDAcroForm form, String fieldName, String fieldValue) throws IOException {
PDField field = form.getField(fieldName);
if(field != null){
field.setValue(fieldValue);
field.setReadonly(true);
}
}
This will write your field value and then when you open the PDF after saving it will have your value and not be editable.
This is the code I came up with after synthesizing all of the answers I could find on the subject. This handles flattening text boxes, combos, lists, checkboxes, and radios:
public static void flattenPDF (PDDocument doc) throws IOException {
//
// find the fields and their kids (widgets) on the input document
// (each child widget represents an appearance of the field data on the page, there may be multiple appearances)
//
PDDocumentCatalog catalog = doc.getDocumentCatalog();
PDAcroForm form = catalog.getAcroForm();
List<PDField> tmpfields = form.getFields();
PDResources formresources = form.getDefaultResources();
Map formfonts = formresources.getFonts();
PDAnnotation ann;
//
// for each input document page convert the field annotations on the page into
// content stream
//
List<PDPage> pages = catalog.getAllPages();
Iterator<PDPage> pageiterator = pages.iterator();
while (pageiterator.hasNext()) {
//
// get next page from input document
//
PDPage page = pageiterator.next();
//
// add the fonts from the input form to this pages resources
// so the field values will display in the proper font
//
PDResources pageResources = page.getResources();
Map pageFonts = pageResources.getFonts();
pageFonts.putAll(formfonts);
pageResources.setFonts(pageFonts);
//
// Create a content stream for the page for appending
//
PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true);
//
// Find the appearance widgets for all fields on the input page and insert them into content stream of the page
//
for (PDField tmpfield : tmpfields) {
List widgets = tmpfield.getKids();
if(widgets == null) {
widgets = new ArrayList();
widgets.add(tmpfield.getWidget());
}
Iterator<COSObjectable> widgetiterator = widgets.iterator();
while (widgetiterator.hasNext()) {
COSObjectable next = widgetiterator.next();
if (next instanceof PDField) {
PDField foundfield = (PDField) next;
ann = foundfield.getWidget();
} else {
ann = (PDAnnotation) next;
}
if (ann.getPage().equals(page)) {
COSDictionary dict = ann.getDictionary();
if (dict != null) {
if(tmpfield instanceof PDVariableText || tmpfield instanceof PDPushButton) {
COSDictionary ap = (COSDictionary) dict.getDictionaryObject("AP");
if (ap != null) {
contentStream.appendRawCommands("q\n");
COSArray rectarray = (COSArray) dict.getDictionaryObject("Rect");
if (rectarray != null) {
float[] rect = rectarray.toFloatArray();
String s = " 1 0 0 1 " + Float.toString(rect[0]) + " " + Float.toString(rect[1]) + " cm\n";
contentStream.appendRawCommands(s);
}
COSStream stream = (COSStream) ap.getDictionaryObject("N");
if (stream != null) {
InputStream ioStream = stream.getUnfilteredStream();
ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
byte[] buffer = new byte[4096];
int amountRead = 0;
while ((amountRead = ioStream.read(buffer, 0, buffer.length)) != -1) {
byteArray.write(buffer, 0, amountRead);
}
contentStream.appendRawCommands(byteArray.toString() + "\n");
}
contentStream.appendRawCommands("Q\n");
}
} else if (tmpfield instanceof PDChoiceButton) {
COSDictionary ap = (COSDictionary) dict.getDictionaryObject("AP");
if(ap != null) {
contentStream.appendRawCommands("q\n");
COSArray rectarray = (COSArray) dict.getDictionaryObject("Rect");
if (rectarray != null) {
float[] rect = rectarray.toFloatArray();
String s = " 1 0 0 1 " + Float.toString(rect[0]) + " " + Float.toString(rect[1]) + " cm\n";
contentStream.appendRawCommands(s);
}
COSName cbValue = (COSName) dict.getDictionaryObject(COSName.AS);
COSDictionary d = (COSDictionary) ap.getDictionaryObject(COSName.D);
if (d != null) {
COSStream stream = (COSStream) d.getDictionaryObject(cbValue);
if(stream != null) {
InputStream ioStream = stream.getUnfilteredStream();
ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
byte[] buffer = new byte[4096];
int amountRead = 0;
while ((amountRead = ioStream.read(buffer, 0, buffer.length)) != -1) {
byteArray.write(buffer, 0, amountRead);
}
if (!(tmpfield instanceof PDCheckbox)){
contentStream.appendRawCommands(byteArray.toString() + "\n");
}
}
}
COSDictionary n = (COSDictionary) ap.getDictionaryObject(COSName.N);
if (n != null) {
COSStream stream = (COSStream) n.getDictionaryObject(cbValue);
if(stream != null) {
InputStream ioStream = stream.getUnfilteredStream();
ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
byte[] buffer = new byte[4096];
int amountRead = 0;
while ((amountRead = ioStream.read(buffer, 0, buffer.length)) != -1) {
byteArray.write(buffer, 0, amountRead);
}
contentStream.appendRawCommands(byteArray.toString() + "\n");
}
}
contentStream.appendRawCommands("Q\n");
}
}
}
}
}
}
// delete any field widget annotations and write it all to the page
// leave other annotations on the page
COSArrayList newanns = new COSArrayList();
List anns = page.getAnnotations();
ListIterator annotiterator = anns.listIterator();
while (annotiterator.hasNext()) {
COSObjectable next = (COSObjectable) annotiterator.next();
if (!(next instanceof PDAnnotationWidget)) {
newanns.add(next);
}
}
page.setAnnotations(newanns);
contentStream.close();
}
//
// Delete all fields from the form and their widgets (kids)
//
for (PDField tmpfield : tmpfields) {
List kids = tmpfield.getKids();
if(kids != null) kids.clear();
}
tmpfields.clear();
// Tell Adobe we don't have forms anymore.
PDDocumentCatalog pdCatalog = doc.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
COSDictionary acroFormDict = acroForm.getDictionary();
COSArray cosFields = (COSArray) acroFormDict.getDictionaryObject("Fields");
cosFields.clear();
}
Full class here:
https://gist.github.com/jribble/beddf7620536939f88db
This is the answer of Thomas, from the PDFBox-Mailinglist:
You will need to get the Fields over the COSDictionary. Try this
code...
PDDocument pdDoc = PDDocument.load(new File("E:\\Form-Test.pdf"));
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
COSDictionary acroFormDict = acroForm.getDictionary();
COSArray fields = acroFormDict.getDictionaryObject("Fields");
fields.clear();
I thought I'd share our approach that worked with PDFBox 2+.
We've used the PDAcroForm.flatten() method.
The fields needed some preprocessing and most importantly the nested field structure had to be traversed and DV and V checked for values.
Finally what worked was the following:
private static void flattenPDF(String src, String dst) throws IOException {
PDDocument doc = PDDocument.load(new File(src));
PDDocumentCatalog catalog = doc.getDocumentCatalog();
PDAcroForm acroForm = catalog.getAcroForm();
PDResources resources = new PDResources();
acroForm.setDefaultResources(resources);
List<PDField> fields = new ArrayList<>(acroForm.getFields());
processFields(fields, resources);
acroForm.flatten();
doc.save(dst);
doc.close();
}
private static void processFields(List<PDField> fields, PDResources resources) {
fields.stream().forEach(f -> {
f.setReadOnly(true);
COSDictionary cosObject = f.getCOSObject();
String value = cosObject.getString(COSName.DV) == null ?
cosObject.getString(COSName.V) : cosObject.getString(COSName.DV);
System.out.println("Setting " + f.getFullyQualifiedName() + ": " + value);
try {
f.setValue(value);
} catch (IOException e) {
if (e.getMessage().matches("Could not find font: /.*")) {
String fontName = e.getMessage().replaceAll("^[^/]*/", "");
System.out.println("Adding fallback font for: " + fontName);
resources.put(COSName.getPDFName(fontName), PDType1Font.HELVETICA);
try {
f.setValue(value);
} catch (IOException e1) {
e1.printStackTrace();
}
} else {
e.printStackTrace();
}
}
if (f instanceof PDNonTerminalField) {
processFields(((PDNonTerminalField) f).getChildren(), resources);
}
});
}
If the PDF document doesn't actually contain form fields but you still want to flatten other elements like markups, the following works quite well. FYI It was implemented for C#
public static void FlattenPdf(string fileName)
{
PDDocument doc = PDDocument.load(new java.io.File(fileName));
java.util.List annots = doc.getPage(0).getAnnotations();
for (int i = 0; i < annots.size(); ++i)
{
PDAnnotation annot = (PDAnnotation)annots.get(i);
annot.setLocked(true);
annot.setReadOnly(true);
annot.setNoRotate(true);
}
doc.save(fileName);
doc.close();
}
This effectively locks all markups in the document and they will no longer be editable.
pdfbox c# annotations