How to set plain header in docx file using apache poi?

I would like to create a header for docx document using apache poi but I have difficulties. I have no working code to show. I would like to ask for some piece of code as starting point.

There's an Apache POI unit test that covers exactly your case: TestXWPFHeader#testSetHeader(). It starts with a document that has no headers or footers set, then adds them.
Your code would basically be something like:
XWPFHeaderFooterPolicy policy = sampleDoc.getHeaderFooterPolicy();
if (policy.getDefaultHeader() == null && policy.getFirstPageHeader() == null
        && policy.getDefaultFooter() == null) {
    // Need to create some new headers
    // The easy way, gives a single empty paragraph
    XWPFHeader headerD = policy.createHeader(XWPFHeaderFooterPolicy.DEFAULT);
    headerD.getParagraphs().get(0).createRun().setText("Hello Header World!");
    // Or the full control way
    CTP ctP1 = CTP.Factory.newInstance();
    CTR ctR1 = ctP1.addNewR();
    CTText t = ctR1.addNewT();
    t.setStringValue("Paragraph in header");
    XWPFParagraph p1 = new XWPFParagraph(ctP1, sampleDoc);
    XWPFParagraph[] pars = new XWPFParagraph[1];
    pars[0] = p1;
    policy.createHeader(XWPFHeaderFooterPolicy.FIRST, pars);
} else {
    // Already has a header, change it
}
See the XWPFHeaderFooterPolicy JavaDocs for a bit more on creating headers and footers.
The API isn't the nicest, so ideally some kind soul would submit a patch to improve it (hint hint...!), but it does work, as the unit tests show.

Based on the previous answer, just copy and paste:
public void test1() throws IOException {
    XWPFDocument sampleDoc = new XWPFDocument();
    XWPFHeaderFooterPolicy policy = sampleDoc.getHeaderFooterPolicy();
    // in a newly created, empty document the policy is always null
    if (policy == null) {
        CTSectPr sectPr = sampleDoc.getDocument().getBody().addNewSectPr();
        policy = new XWPFHeaderFooterPolicy(sampleDoc, sectPr);
    }
    if (policy.getDefaultHeader() == null && policy.getFirstPageHeader() == null
            && policy.getDefaultFooter() == null) {
        XWPFHeader headerD = policy.createHeader(XWPFHeaderFooterPolicy.DEFAULT);
        headerD.getParagraphs().get(0).createRun().setText("Hello Header World!");
    }
    FileOutputStream out = new FileOutputStream(System.currentTimeMillis() + "_test1_header.docx");
    sampleDoc.write(out);
    out.close();
    sampleDoc.close();
}

Related

Apache Poi XWPF - How do we split a docx into two sections?

I have an existing document (in bytes) that I parsed into XWPFDocument using
InputStream is = new ByteArrayInputStream(docuByte);
XWPFDocument docx = new XWPFDocument(OPCPackage.open(is));
This document has at least 5 pages. I am planning to set blank footers on first two pages (title and TOC page), and a page footer from third page and up.
In order to do this, I understand that I need to separate the document into two different sections.
section 1 - first and second page
section 2 - third page and up
However, I could not find a method that would enable me to split the document into two sections. Would anyone know how to implement this?

Up to now, XWPFDocument has no dedicated method for adding section breaks, so one needs to use the underlying org.openxmlformats.schemas.wordprocessingml.x2006.main.* classes.
A section break in Office Open XML Word documents (*.docx) is a paragraph whose paragraph properties contain section properties. So the task is to insert such a paragraph into the document. For inserting a paragraph, XWPFDocument provides the method insertNewParagraph(org.apache.xmlbeans.XmlCursor cursor). But to get this cursor position, one needs to know where the paragraph shall be inserted. This can be an already present paragraph containing a certain text, for example.
The inserted section properties then apply to the section above that paragraph.
The document body also has section properties, which apply to the last section.
The following code shows that. It searches for a paragraph containing a certain text. Then it inserts, before the found paragraph, a paragraph having section properties that are a copy of the former last section properties. Then it removes all header/footer settings from the newly inserted section properties. After that, the section above the newly inserted paragraph has no header/footer settings, while the former header/footer settings remain for the last section.
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
public class WordInsertSectionbreak {
static org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSectPr getDocumentBodySectPr(XWPFDocument document) {
org.openxmlformats.schemas.wordprocessingml.x2006.main.CTDocument1 ctDocument = document.getDocument();
org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBody ctBody = ctDocument.getBody();
org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSectPr ctSectPrDocumentBody = ctBody.getSectPr();
return ctSectPrDocumentBody;
}
static org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSectPr getNextSectPr(XWPFParagraph paragraph) {
// get the section settings of next section in document
org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSectPr ctSectPrNextSect = null;
// maybe next section settings are in a paragraph
XWPFDocument document = paragraph.getDocument();
int pos = document.getPosOfParagraph(paragraph);
for (int p = pos; p < document.getParagraphs().size(); p++) {
paragraph = document.getParagraphArray(p);
if (paragraph.getCTP().getPPr() != null) {
ctSectPrNextSect = paragraph.getCTP().getPPr().getSectPr();
}
if (ctSectPrNextSect != null) break;
}
// if not in a paragraph, the next section settings are in the document body
if (ctSectPrNextSect == null) {
ctSectPrNextSect = getDocumentBodySectPr(document);
}
return ctSectPrNextSect;
}
static XWPFParagraph insertSectionbreak(XWPFDocument document, org.apache.xmlbeans.XmlCursor cursor) {
XWPFParagraph paragraph = null;
// insert a paragraph for section settings for new section above and section break.
paragraph = document.insertNewParagraph(cursor);
// get next section properties, which were section properties for previous section above
org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSectPr ctSectPrNextSect = getNextSectPr(paragraph);
// set a copy of section properties for previous section above as section properties for new section
if (ctSectPrNextSect != null) {
org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSectPr ctSectPrNewSect = (org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSectPr)ctSectPrNextSect.copy();
paragraph.getCTP().addNewPPr().setSectPr(ctSectPrNewSect);
return paragraph;
}
return null;
}
static XWPFParagraph getParagraphByText(XWPFDocument document, String text) {
for (XWPFParagraph paragraph : document.getParagraphs()) {
String paragraphText = paragraph.getText();
if (paragraphText.contains(text)) {
return paragraph;
}
}
return null;
}
static void removeHeadersAndFooters(XWPFParagraph sectionBreakParagraph) {
if (sectionBreakParagraph == null) return;
if (sectionBreakParagraph.getCTP().getPPr() != null) {
org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSectPr ctSectPr = sectionBreakParagraph.getCTP().getPPr().getSectPr();
// nothing to do if the paragraph carries no section properties
if (ctSectPr == null) return;
// remove all header and footer references from the section,
// iterating backwards so the remaining indices stay valid
for (int i = ctSectPr.getHeaderReferenceArray().length-1; i >= 0; i--) {
ctSectPr.removeHeaderReference(i);
}
for (int i = ctSectPr.getFooterReferenceArray().length-1; i >= 0; i--) {
ctSectPr.removeFooterReference(i);
}
}
}
public static void main(String[] args) throws Exception {
XWPFDocument document = new XWPFDocument(new FileInputStream("./WordDocument.docx"));
XWPFParagraph paragraph = getParagraphByText(document, "Some text to mark where section break shall be inserted");
if (paragraph != null) {
XWPFParagraph sectionBreakParagraph = insertSectionbreak(document, paragraph.getCTP().newCursor());
if (sectionBreakParagraph != null) {
removeHeadersAndFooters(sectionBreakParagraph);
}
}
FileOutputStream out = new FileOutputStream("./WordDocumentResult.docx");
document.write(out);
out.close();
document.close();
}
}
Code is tested and works using current apache poi 5.2.2.

iText Fill Form / Copy Page to new Document

I'm using iText to fill a template PDF that contains an AcroForm.
Now I want to use this template to create a new PDF with a dynamic number of pages.
My idea is to fill the template PDF, copy the page with the filled-in fields, and add it to a new file. The main problem is that our customer wants to design the template themselves, so I'm not sure whether this is the right way to solve the problem.
I've written the code below, which doesn't work yet: I get the error com.itextpdf.io.IOException: PDF header not found.
My code:
int x = 1;
try (PdfDocument finalDoc = new PdfDocument(new PdfWriter("C:\\Users\\...Final.pdf"))) {
for (HashMap<String, String> map : testValues) {
String path1 = "C:\\Users\\.....Temp.pdf";
InputStream template = templateValues.get("Template");
PdfWriter writer = new PdfWriter(path1);
try (PdfDocument pdfDoc = new PdfDocument(new PdfReader(template), writer)) {
PdfAcroForm form = PdfAcroForm.getAcroForm(pdfDoc, true);
for (HashMap.Entry<String, String> map2 : map.entrySet()) {
if (form.getField(map2.getKey()) != null) {
Map<String, PdfFormField> fields = form.getFormFields();
fields.get(map2.getKey()).setValue(map2.getValue());
}
}
} catch (IOException | PdfException ex) {
System.err.println("Ex2: " + ex.getMessage());
}
if (x != 0 && (x % 5) == 0) {
try (PdfDocument tempDoc = new PdfDocument(new PdfReader(path1))) {
PdfPage page = tempDoc.getFirstPage();
finalDoc.addPage(page.copyTo(finalDoc));
} catch (IOException | PdfException ex) {
System.err.println("Ex3: " + ex.getMessage());
}
}
x++;
}
} catch (IOException | PdfException ex) {
System.err.println("Ex: " + ex.getMessage());
}

Part 1 - PDF Header is Missing
This appears to be caused by re-reading, within a loop, an InputStream that has already been read (and, depending on the configuration of the PdfReader, closed). The fix depends on the specific type of InputStream being used. If you want to leave it as a plain InputStream (vs. a more specific and more capable InputStream type), you'll need to first slurp the bytes from the stream into memory (e.g. a ByteArrayOutputStream) and then create your PdfReaders from those bytes.
i.e.
ByteArrayOutputStream templateBuffer = new ByteArrayOutputStream();
int c;
// read() returns -1 at end of stream; 0 is a valid byte value
while ((c = template.read()) != -1) templateBuffer.write(c);
for (/* your loop */) {
...
PdfDocument filledInAcroFormTemplate = new PdfDocument(new PdfReader(new ByteArrayInputStream(templateBuffer.toByteArray())), new PdfWriter(tmp));
...
}
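The buffering idea can also be sketched with plain JDK classes, independent of iText. ReusableTemplate is a hypothetical helper name, not part of any library: it drains the source stream once and hands out a fresh stream for every loop iteration, so each reader starts at the first byte. The bytes used in main are stand-in data, not a real PDF.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: buffers a one-shot InputStream so it can be re-read any number of times.
final class ReusableTemplate {
    private final byte[] bytes;

    // Drains the stream exactly once; the caller remains responsible for closing it.
    ReusableTemplate(InputStream source) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = source.read(chunk)) != -1) { // -1 signals end of stream
            buffer.write(chunk, 0, n);
        }
        this.bytes = buffer.toByteArray();
    }

    // Each call returns an independent stream positioned at the first byte.
    InputStream open() {
        return new ByteArrayInputStream(bytes);
    }

    public static void main(String[] args) throws IOException {
        byte[] standIn = "%PDF-1.7".getBytes(StandardCharsets.US_ASCII); // not a real PDF
        ReusableTemplate template = new ReusableTemplate(new ByteArrayInputStream(standIn));
        for (int i = 0; i < 3; i++) {
            try (InputStream in = template.open()) {
                System.out.println("iteration " + i + " first byte: " + in.read());
            }
        }
    }
}
```

In the question's loop, new PdfReader(template.open()) would then take the place of new PdfReader(template).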
Part 2 - other problems
A couple of things:
Make sure to grab the recently released 7.0.1 version of iText, since it includes a couple of fixes for AcroForm handling.
You can probably get away with using ByteArrayOutputStreams for your temporary PDFs (vs. writing them out to files). I'll use this approach in the example below.
PdfDocument/PdfPage is in the "kernel" module, yet AcroForms are in the "form" module (meaning PdfPage is intentionally unaware of AcroForms). IPdfPageExtraCopier is sort of the bridge between the modules. In order to properly copy AcroForms, you need to use the two-arg copyTo() version, passing an instance of PdfPageFormCopier.
Field names must be unique in the document (the "absolute" field name, that is; I'll skip field hierarchies for now). Since we're looping through and adding the fields from the template multiple times, we need a strategy for renaming the fields to ensure uniqueness (the current API is actually a little bit clunky in this area).
File acroFormTemplate = new File("someTemplate.pdf");
Map<String, String> someMapOfFieldToValues = new HashMap<>();
try (
PdfDocument finalOutput = new PdfDocument(new PdfWriter(new FileOutputStream(new File("finalOutput.pdf"))));
) {
for (/* some looping condition */int x = 0; x < 5; x++) {
// for each iteration of the loop, create a temporary in-memory
// PDF to handle form field edits.
ByteArrayOutputStream tmp = new ByteArrayOutputStream();
try (
PdfDocument filledInAcroFormTemplate = new PdfDocument(new PdfReader(new FileInputStream(acroFormTemplate)), new PdfWriter(tmp));
) {
PdfAcroForm acroForm = PdfAcroForm.getAcroForm(filledInAcroFormTemplate, true);
// look fields up by their String key in the form's field map
for (Map.Entry<String, PdfFormField> entry : acroForm.getFormFields().entrySet()) {
if (someMapOfFieldToValues.containsKey(entry.getKey())) {
entry.getValue().setValue(someMapOfFieldToValues.get(entry.getKey()));
}
}
// NOTE that because we're adding the template multiple times
// we need to adopt a field renaming strategy to ensure field
// uniqueness in the final document. For demonstration's sake
// we'll just rename them prefixed w/ our loop counter
List<String> fieldNames = new ArrayList<>();
fieldNames.addAll(acroForm.getFormFields().keySet()); // copy the names first to avoid a ConcurrentModificationException
for (String fieldName : fieldNames) {
acroForm.renameField(fieldName, x+"_"+fieldName);
}
}
// the temp PDF needs to be "closed" for all the PDF finalization
// magic to happen...so open up new read-only version to act as
// the source for the merging from our in-memory bucket-o-bytes
try (
PdfDocument readOnlyFilledInAcroFormTemplate = new PdfDocument(new PdfReader(new ByteArrayInputStream(tmp.toByteArray())));
) {
// although PdfPage.copyTo will probably work for simple pages, PdfDocument.copyPagesTo
// is a more comprehensive copy (wider support for copying Outlines and Tagged content)
// so it's more suitable for general page-copy use. Also, since we're copying AcroForm
// content, we need to use the PdfPageFormCopier
readOnlyFilledInAcroFormTemplate.copyPagesTo(1, 1, finalOutput, new PdfPageFormCopier());
}
}
}
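The renaming step in the example above copies the field names into a separate list before renaming. The reason can be shown with an ordinary map, without iText: changing a map while iterating over its own key set throws a ConcurrentModificationException, whereas iterating over a snapshot of the keys is safe. FieldRenamer and prefixAll are hypothetical names for this sketch, and the map stands in for the form's field collection.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating the "snapshot the names, then rename" pattern.
final class FieldRenamer {

    // Renames every key to "<copyIndex>_<key>", keeping the associated value.
    // Assumes no existing key already has the form "<copyIndex>_<otherKey>".
    static <V> void prefixAll(Map<String, V> fields, int copyIndex) {
        List<String> names = new ArrayList<>(fields.keySet()); // snapshot, detached from the map
        for (String name : names) {
            V value = fields.remove(name);
            fields.put(copyIndex + "_" + name, value);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "first value");
        fields.put("city", "second value");
        prefixAll(fields, 2);
        System.out.println(fields.keySet());
    }
}
```

Prefixing with the loop counter is only one possible scheme; any scheme works as long as the resulting names are unique across all copies of the template.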

Close your PdfDocuments when you are done with adding content to them.

Adding footer to ms word using POI api

I searched a lot and found some sample code, but none of it works. Either I get a NullPointerException, or the document is generated but opening the file (.docx) fails with the message
A text/xml declaration may occur only at the very beginning of input.
I thought the problem might be that I add content first and the footer afterwards, so I moved my footer code to the very beginning. Now I get an
index out of bounds exception
Here is my complete code:
String fileName ="Book.docx";
String folderPath=SystemProperties.get(SystemProperties.TMP_DIR)+File.separator+"liferay" + File.separator + "document_preview";
String filePath=folderPath+File.separator+fileName;
File file=new File(filePath);
XWPFDocument document = new XWPFDocument();
XWPFParagraph paragraphOne = document.createParagraph();
paragraphOne.setAlignment(ParagraphAlignment.CENTER);
XWPFRun paragraphOneRunOne = paragraphOne.createRun();
paragraphOneRunOne.setText("Training Report");
paragraphOneRunOne.addBreak();
XWPFTable table = document.createTable();
XWPFTableRow tableRowOne = table.getRow(0);
tableRowOne.getCell(0).setText("No");
tableRowOne.createCell().setText("Name");
XWPFHeaderFooterPolicy headerFooterPolicy = document.getHeaderFooterPolicy();
if (headerFooterPolicy == null) {
CTBody body = document.getDocument().getBody();
CTSectPr sectPr = body.getSectPr();
if (sectPr == null) {
sectPr = body.addNewSectPr();
}
headerFooterPolicy = new XWPFHeaderFooterPolicy(document, sectPr);
}
CTP ctP1 = CTP.Factory.newInstance();
CTR ctR1 = ctP1.addNewR();
CTText t = ctR1.addNewT();
t.setStringValue("first footer");
XWPFParagraph codePara = new XWPFParagraph(ctP1);
XWPFParagraph[] newparagraphs = new XWPFParagraph[1];
newparagraphs[0] = codePara;
XWPFFooter xwpfFooter = null;
xwpfFooter = headerFooterPolicy.createFooter(XWPFHeaderFooterPolicy.DEFAULT);
FileOutputStream fileoutOfTraining = new FileOutputStream(file);
document.write(fileoutOfTraining);
fileoutOfTraining.flush();
fileoutOfTraining.close();
downloadOperation(file, fileName, resourceResponse);
code in downloadOperation method
HttpServletResponse httpServletResponse =PortalUtil.getHttpServletResponse(resourceResponse);
BufferedInputStream input = null;
BufferedOutputStream output = null;
httpServletResponse.setHeader("Content-Disposition", "attachment; filename=\""+fileName+"\"; filename*=UTF-8''"+fileName);
int DEFAULT_BUFFER_SIZE=1024;
try {
input = new BufferedInputStream(new FileInputStream(file), DEFAULT_BUFFER_SIZE);
} catch (FileNotFoundException e) {
e.printStackTrace();
}
try {
resourceResponse.flushBuffer();
output = new BufferedOutputStream(httpServletResponse.getOutputStream(), DEFAULT_BUFFER_SIZE);
} catch (IOException e) {
e.printStackTrace();
}
byte[] buffer = new byte[2*DEFAULT_BUFFER_SIZE];
int length;
try {
while ((length = input.read(buffer)) > 0) {
output.write(buffer, 0, length);
}
output.flush();
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
output.close();
} catch (IOException e) {
e.printStackTrace();
}
}
Please help me generate the footer. If I add the footer code after my paragraph and table, there is no runtime error, but the generated file fails to open. If I place the footer code before the content I want to add to the document, I get an "index out of bounds" exception.
Any code snippet, or at least some pointers towards the solution, would be appreciated.
Thanks

I faced this issue; the solution is to use the POI 3.10-FINAL jars.
Version 3.9 has this problem.
Remove the jars of the previous version and add the jars of 3.10-FINAL, in which this bug is fixed.
The required jars are:
poi-3.10-FINAL.jar
poi-ooxml-3.10-FINAL.jar
poi-ooxml-schemas-3.10-FINAL.jar
They are easily available online:
http://mvnrepository.com/artifact/org.apache.poi/poi/3.10-FINAL
XWPFDocument document = new XWPFDocument();
CTP ctp = CTP.Factory.newInstance();
CTR ctr = ctp.addNewR();
CTRPr rpr = ctr.addNewRPr();
CTText textt = ctr.addNewT();
textt.setStringValue( " Page 1" );
XWPFParagraph codePara = new XWPFParagraph( ctp, document );
XWPFParagraph[] newparagraphs = new XWPFParagraph[1];
newparagraphs[0] = codePara;
CTSectPr sectPr = document.getDocument().getBody().addNewSectPr();
XWPFHeaderFooterPolicy headerFooterPolicy = new XWPFHeaderFooterPolicy( document, sectPr );
headerFooterPolicy.createFooter( STHdrFtr.DEFAULT, newparagraphs );
The above code works perfectly; please use the 3.10 jars mentioned above.

I don't know if adding an image to a header from scratch works in the new version, but I know for a fact that "templating" works perfectly. I created the template document in Word and set up the header the way I needed it (in my case, a logo image on the right, a paragraph with dummy text on the left, and another image at the bottom of the header separating those two objects from the content, all repeated on every page), and left the rest of the document empty. Then, instead of creating the XWPFDocument by calling new XWPFDocument(), I created it this way:
XWPFDocument doc = new XWPFDocument(new FileInputStream("pathTo/template.docx"));
Then just fill the document with your content, and if you need to update the header text for different exports like I did, call an update-header function, which in my case looks something like this:
public void updateHeader() throws InvalidFormatException, IOException {
// load the header policy from template and update the paragraph text
XWPFHeaderFooterPolicy headerFooterPolicy = document
.getHeaderFooterPolicy();
XWPFHeader defaultHeader = headerFooterPolicy.getDefaultHeader();
defaultHeader.getParagraphs().get(0).getRuns().get(0)
.setText(profileName, 0);
// this is only to put some space between the content in the header and the real content
defaultHeader.getParagraphs()
.get(defaultHeader.getParagraphs().size() - 1)
.setSpacingAfter(300);
}
As far as I know, this works since 3.10, and if you stumble upon some java security issues, try the current nightly version where many of these security issues were resolved. For me it even worked on Google App Engine.

Read .doc file content and write into pdf file in java

I'm writing Java code that uses Apache POI to read an MS Office .doc file and the iText API to create and write a PDF file. Reading the text and tables in the .doc file already works. Now I'm looking for a way to read the images in the document. I have written the following code to read images from the document file. Why is this code not working?
public static void main(String[] args) {
POIFSFileSystem fs = null;
Document document = new Document();
WordExtractor extractor = null ;
try {
fs = new POIFSFileSystem(new FileInputStream("C:\\DATASTORE\\tableandImage.doc"));
HWPFDocument hdocument=new HWPFDocument(fs);
extractor = new WordExtractor(hdocument);
OutputStream fileOutput = new FileOutputStream(new File("C:/DATASTORE/tableandImage.pdf"));
PdfWriter.getInstance(document, fileOutput);
document.open();
Range range=hdocument.getRange();
String readText=null;
PdfPTable createTable;
CharacterRun run;
PicturesTable picture;
for(int i=0;i<range.numParagraphs();i++) {
Paragraph par = range.getParagraph(i);
readText=par.text();
if(!par.isInTable()) {
if(readText.endsWith("\n")) {
readText=readText+"\n";
document.add(new com.itextpdf.text.Paragraph(readText));
} if(readText.endsWith("\r")) {
readText += "\n";
document.add(new com.itextpdf.text.Paragraph(readText));
}
run =range.getCharacterRun(i);
picture=hdocument.getPicturesTable();
if(picture.hasPicture(run)) {
//if(run.isSpecialCharacter()) {
Picture pic=picture.extractPicture(run, true);
byte[] picturearray=pic.getContent();
com.itextpdf.text.Image image=com.itextpdf.text.Image.getInstance(picturearray);
document.add(image);
}
} else if (par.isInTable()) {
Table table = range.getTable(par);
TableRow tRow1= table.getRow(0);
int numColumns=tRow1.numCells();
createTable=new PdfPTable(numColumns);
for (int rowId=0;rowId<table.numRows();rowId++) {
TableRow tRow = table.getRow(rowId);
for (int cellId=0;cellId<tRow.numCells();cellId++) {
TableCell tCell = tRow.getCell(cellId);
PdfPCell c1 = new PdfPCell(new Phrase(tCell.text()));
createTable.addCell(c1);
}
}
document.add(createTable);
}
}
}catch(IOException e) {
System.out.println("IO Exception");
e.printStackTrace();
}
catch(Exception exep) {
exep.printStackTrace();
}finally {
document.close();
}
}
The problems are:
1. The condition if(picture.hasPicture(run)) is never satisfied, although the document contains a JPEG image.
2. I'm getting the following exception while reading the table:
java.lang.IllegalArgumentException: This paragraph is not the first one in the table
at org.apache.poi.hwpf.usermodel.Range.getTable(Range.java:876)
at pagecode.ReadDocxOrDocFile.main(ReadDocxOrDocFile.java:113)
Can anybody help me solve the problem?
Thank you.

Regarding your exception:
Your code iterates over all paragraphs and calls isInTable() for each one of them. Since tables are commonly composed of several such paragraphs, your call to getTable() also gets executed several times for a single table.
However, what your code should do instead is to find the first paragraph of a table, then process all paragraphs therein (via getRow(m).getCell(n)) and ultimately continue with the outer loop in the first paragraph after the table. Codewise this may look roughly like the following (assuming no merged cells, no nested tables and no other funny edge cases):
if (par.isInTable()) {
Table table = range.getTable(par);
for (int rn=0; rn<table.numRows(); rn++) {
TableRow row = table.getRow(rn);
for (int cn=0; cn<row.numCells(); cn++) {
TableCell cell = row.getCell(cn);
for (int pn=0; pn<cell.numParagraphs(); pn++) {
Paragraph cellParagraph = cell.getParagraph(pn);
// your PDF conversion code goes here
}
}
}
i += table.numParagraphs()-1; // skip the already processed (table-)paragraphs in the outer loop
}
Regarding the pictures issue:
Am I guessing right that you are trying to obtain the picture which is anchored within a given paragraph? Unfortunately, the predefined methods of POI only work if the picture is not embedded within a field (which is rather rare, actually). For field-based images (i.e. preview images of embedded OLEs) you should do something like the following (untested!):
PictureStore pictureStore = new PictureStore(hdocument);
// bla bla ...
for (int cr=0; cr < par.numCharacterRuns(); cr++) {
CharacterRun characterRun = par.getCharacterRun(cr);
Field field = hdocument.getFields().getFieldByStartOffset(FieldsDocumentPart.MAIN, characterRun.getStartOffset());
if (field != null && field.getType() == 0x3A) { // 0x3A is type "EMBED"
Picture pic = pictureStore.getPicture(field.secondSubrange(characterRun));
}
}
For a list of possible values of Field.getType() see here.

PDFBox: How to "flatten" a PDF-form?

How do I "flatten" a PDF-form (remove the form-field but keep the text of the field) with PDFBox?
The same question was answered here:
A quick way to do this is to remove the fields from the acroform.
For this you just need to get the document catalog, then the acroform,
and then remove all fields from this acroform.
The graphical representation is linked with the annotation and stays in
the document.
So I wrote this code:
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
public class PdfBoxTest {
public void test() throws Exception {
PDDocument pdDoc = PDDocument.load(new File("E:\\Form-Test.pdf"));
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
if (acroForm == null) {
System.out.println("No form-field --> stop");
return;
}
@SuppressWarnings("unchecked")
List<PDField> fields = acroForm.getFields();
// set the text in the form-field <-- does work
for (PDField field : fields) {
if (field.getFullyQualifiedName().equals("formfield1")) {
field.setValue("Test-String");
}
}
// remove form-field but keep text ???
// acroForm.getFields().clear(); <-- does not work
// acroForm.setFields(null); <-- does not work
// acroForm.setFields(new ArrayList()); <-- does not work
// ???
pdDoc.save("E:\\Form-Test-Result.pdf");
pdDoc.close();
}
}

With PDFBox 2 it's now possible to "flatten" a PDF-form easily by calling the flatten method on a PDAcroForm object. See Javadoc: PDAcroForm.flatten().
Simplified code with an example call of this method:
//Load the document
PDDocument pDDocument = PDDocument.load(new File("E:\\Form-Test.pdf"));
PDAcroForm pDAcroForm = pDDocument.getDocumentCatalog().getAcroForm();
//Fill the document
...
//Flatten the document
pDAcroForm.flatten();
//Save the document
pDDocument.save("E:\\Form-Test-Result.pdf");
pDDocument.close();
Note: dynamic XFA forms cannot be flattened.
For migration from PDFBox 1.* to 2.0, take a look at the official migration guide.

This works for sure. I ran into this problem and debugged all night, but finally figured out how to do it :)
This assumes that you are able to edit the PDF in some way or have some control over it.
First, edit the forms using Acrobat Pro. Make them hidden and read-only.
Then you need to use two libraries: PDFBox and PDFClown.
PDFBox removes the thing that tells Adobe Reader that it's a form; PDFClown removes the actual field. PDFClown must be run first, then PDFBox (the other way around doesn't work).
Single field example code:
// PDF Clown code
File file = new File("Some file path");
Document document = file.getDocument();
Form form = document.getForm();
Fields fields = form.getFields();
Field field = fields.get("some_field_name");
PageStamper stamper = new PageStamper();
FieldWidgets widgets = field.getWidgets();
Widget widget = widgets.get(0); // Generally is 0.. experiment to figure out
stamper.setPage(widget.getPage());
// Write text using text form field position as pivot.
PrimitiveComposer composer = stamper.getForeground();
Font font = Font.get(document, "some_path");
composer.setFont(font, 10);
double xCoordinate = widget.getBox().getX();
double yCoordinate = widget.getBox().getY();
composer.showText("text i want to display", new Point2D.Double(xCoordinate, yCoordinate));
// Actually delete the form field!
field.delete();
stamper.flush();
// Create new buffer to output to...
Buffer buffer = new Buffer();
file.save(buffer, SerializationModeEnum.Standard);
byte[] bytes = buffer.toByteArray();
// PDFBox code
InputStream pdfInput = new ByteArrayInputStream(bytes);
PDDocument pdfDocument = PDDocument.load(pdfInput);
// Tell Adobe we don't have forms anymore.
PDDocumentCatalog pdCatalog = pdfDocument.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
COSDictionary acroFormDict = acroForm.getDictionary();
COSArray cosFields = (COSArray) acroFormDict.getDictionaryObject("Fields");
cosFields.clear();
// Phew. Finally.
pdfDocument.save("Some file path");
Probably some typos here and there, but this should be enough to get the gist :)

After reading the PDF reference, I discovered that you can quite easily make AcroForm fields read-only by adding the "Ff" key (field flags) with the value 1.
This is what the documentation states about it:
If set, the user may not change the value of the field.
Any associated widget annotations will not interact
with the user; that is, they will not respond to mouse
clicks or change their appearance in response to
mouse motions. This flag is useful for fields whose
values are computed or imported from a database.
So the code could look like this (using the PDFBox library):
public static void makeAllWidgetsReadOnly(PDDocument pdDoc) throws IOException {
PDDocumentCatalog catalog = pdDoc.getDocumentCatalog();
PDAcroForm form = catalog.getAcroForm();
List<PDField> acroFormFields = form.getFields();
System.out.println(String.format("found %d AcroForm fields", acroFormFields.size()));
for(PDField field: acroFormFields) {
makeAcroFieldReadOnly(field);
}
}
private static void makeAcroFieldReadOnly(PDField field) {
field.getDictionary().setInt("Ff",1);
}
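One caveat: writing the value 1 sets Ff to exactly the ReadOnly bit and discards any other flags the field already carried (for a text field, for example, the Multiline flag is bit 13, value 4096). A minimal sketch of setting only the ReadOnly bit with plain integer arithmetic follows. FieldFlags is a hypothetical helper, not a PDFBox class; reading the current flag value from the field dictionary and writing the result back is left out here.

```java
// Hypothetical helper for working with the integer value of a field's /Ff entry.
final class FieldFlags {
    // Bit position 1 (value 1) of /Ff is the ReadOnly flag.
    static final int READ_ONLY = 1;

    // Sets the ReadOnly bit and leaves all other flag bits untouched.
    static int withReadOnly(int currentFlags) {
        return currentFlags | READ_ONLY;
    }

    // Tests whether the ReadOnly bit is set.
    static boolean isReadOnly(int flags) {
        return (flags & READ_ONLY) != 0;
    }

    public static void main(String[] args) {
        int multiline = 1 << 12; // bit position 13: Multiline flag of a text field
        int updated = withReadOnly(multiline);
        System.out.println(updated + " readOnly=" + isReadOnly(updated));
    }
}
```

For a field without any existing flags the result is the same as writing 1 directly.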

setReadOnly did work for me as shown below -
@SuppressWarnings("unchecked")
List<PDField> fields = acroForm.getFields();
for (PDField field : fields) {
if (field.getFullyQualifiedName().equals("formfield1")) {
field.setReadOnly(true);
}
}

Solution to flattening acroform AND retaining the form field values using pdfBox:
see solution at https://mail-archives.apache.org/mod_mbox/pdfbox-users/201604.mbox/%3C3BC7E352-9447-4458-AAC3-5A9B70B4CCAA@fileaffairs.de%3E
The solution that worked for me with pdfbox 2.0.1:
File myFile = new File("myFile.pdf");
PDDocument pdDoc = PDDocument.load(myFile);
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm pdAcroForm = pdCatalog.getAcroForm();
// set the NeedAppearances flag to false
pdAcroForm.setNeedAppearances(false);
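// look up the field to fill; "formfield1" is the field name used in the question, adjust it to your form
PDField field = pdAcroForm.getField("formfield1");
field.setValue("new-value");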
pdAcroForm.flatten();
pdDoc.save("myFlattenedFile.pdf");
pdDoc.close();
I didn't need to do the 2 extra steps in the above solution link:
// correct the missing page link for the annotations
// Add the missing resources to the form
I created my pdf form in OpenOffice 4.1.1 and exported to pdf. The 2 items selected in the OpenOffice export dialogue were:
selected "create Pdf Form"
Submit format of "PDF" - I found this gave smaller pdf file size than selecting "FDF" but still operated as a pdf form.
Using PdfBox I populated the form fields and created a flattened pdf file that removed the form fields but retained the form field values.

Really "flattening" an Acrobat form field seems to take much more than it appears at first glance.
After examining the PDF standard I managed to achieve real flattening in three steps:
save field value
remove widgets
remove form field
All three steps can be done with PDFBox (I used 1.8.5). Below I will sketch how I did it.
A very helpful tool for understanding what's going on is the PDF Debugger.
Save the field
This is the most complicated step of the three.
In order to save the field's value you have to write it into the page content for each of the field's widgets. The easiest way to do so is to draw each widget's appearance onto the widget's page.
void saveFieldValue( PDField field ) throws IOException
{
PDDocument document = getDocument( field );
// see PDField.getWidget()
for( PDAnnotationWidget widget : getWidgets( field ) )
{
PDPage parentPage = getPage( widget );
try (PDPageContentStream contentStream = new PDPageContentStream( document, parentPage, true, true ))
{
writeContent( contentStream, widget );
}
}
}
void writeContent( PDPageContentStream contentStream, PDAnnotationWidget widget )
throws IOException
{
PDAppearanceStream appearanceStream = getAppearanceStream( widget );
PDXObject xobject = new PDXObjectForm( appearanceStream.getStream() );
AffineTransform transformation = getPositioningTransformation( widget.getRectangle() );
contentStream.drawXObject( xobject, transformation );
}
The appearance is an XObject stream containing all of the widget's content (value, font, size, rotation, etc.). You simply need to place it at the right position on the page which you can extract from the widget's rectangle.
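The answer does not show getPositioningTransformation. A minimal sketch, assuming the appearance stream's bounding box starts at the origin and needs no scaling or rotation, is a pure translation to the rectangle's lower-left corner. The class name and the plain float parameters (standing in for the coordinates taken from widget.getRectangle()) are assumptions made for illustration:

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

final class WidgetPlacement {
    // Builds a translation that moves the appearance's origin to the
    // lower-left corner of the widget rectangle (no scaling, no rotation).
    static AffineTransform positioningTransformation(float lowerLeftX, float lowerLeftY) {
        return AffineTransform.getTranslateInstance(lowerLeftX, lowerLeftY);
    }

    public static void main(String[] args) {
        AffineTransform t = positioningTransformation(72f, 700f);
        // The appearance's origin ends up at the rectangle's lower-left corner.
        Point2D origin = t.transform(new Point2D.Double(0, 0), null);
        System.out.println(origin.getX() + ", " + origin.getY());
    }
}
```

If the appearance's bounding box does not match the widget rectangle in size, a scale factor would be needed as well; that case is not covered here.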
Remove widgets
As noted above, each field may have multiple widgets. A widget controls how a form field is edited, which triggers it fires, and how it is displayed when not being edited.
In order to remove one you have to remove it from its page's annotations.
void removeWidget( PDAnnotationWidget widget ) throws IOException
{
PDPage widgetPage = getPage( widget );
List<PDAnnotation> annotations = widgetPage.getAnnotations();
PDAnnotation deleteCandidate = getMatchingCOSObjectable( annotations, widget );
if( deleteCandidate != null && annotations.remove( deleteCandidate ) )
widgetPage.setAnnotations( annotations );
}
Note that the annotations may not contain the exact PDAnnotationWidget since it's a kind of a wrapper. You have to remove the one with matching COSObject.
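getMatchingCOSObjectable is not shown in the answer either. One way to sketch the idea without PDFBox on the classpath is to compare the wrapped objects by identity rather than the wrappers themselves. The Wrapper interface below is a hypothetical stand-in for COSObjectable, and comparing with == is an assumption about what "matching" means here:

```java
import java.util.Arrays;
import java.util.List;

final class WrapperMatch {
    // Hypothetical stand-in for PDFBox's COSObjectable.
    interface Wrapper {
        Object getCOSObject();
    }

    // Returns the first candidate that wraps the same underlying object as target, or null.
    static <T extends Wrapper> T findMatching(List<T> candidates, Wrapper target) {
        for (T candidate : candidates) {
            if (candidate.getCOSObject() == target.getCOSObject()) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Object shared = new Object();
        Wrapper inList = () -> shared;
        Wrapper other = () -> "unrelated";
        Wrapper target = () -> shared; // a different wrapper around the same object
        List<Wrapper> annotations = Arrays.asList(other, inList);
        System.out.println(findMatching(annotations, target) == inList);
    }
}
```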
Remove form field
As a final step you remove the form field itself. This is not very different from the other posts above.
void removeFormfield( PDField field ) throws IOException
{
PDAcroForm acroForm = field.getAcroForm();
List<PDField> acroFields = acroForm.getFields();
List<PDField> removeCandidates = getFields( acroFields, field.getPartialName() );
if( removeAll( acroFields, removeCandidates ) )
acroForm.setFields( acroFields );
}
Note that I used a custom removeAll method here since the removeCandidates.removeAll() didn't work as expected for me.
Sorry that I cannot provide all the code here but with the above you should be able to write it yourself.

I don't have enough points to comment but SJohnson's response of setting the field to read only worked perfectly for me. I am using something like this with PDFBox:
private void setFieldValueAndFlatten(PDAcroForm form, String fieldName, String fieldValue) throws IOException {
PDField field = form.getField(fieldName);
if(field != null){
field.setValue(fieldValue);
field.setReadonly(true);
}
}
This writes your field value; when you open the PDF after saving, the field shows that value and is no longer editable.

This is the code I came up with after synthesizing all of the answers I could find on the subject. This handles flattening text boxes, combos, lists, checkboxes, and radios:
public static void flattenPDF (PDDocument doc) throws IOException {
//
// find the fields and their kids (widgets) on the input document
// (each child widget represents an appearance of the field data on the page, there may be multiple appearances)
//
PDDocumentCatalog catalog = doc.getDocumentCatalog();
PDAcroForm form = catalog.getAcroForm();
List<PDField> tmpfields = form.getFields();
PDResources formresources = form.getDefaultResources();
Map formfonts = formresources.getFonts();
PDAnnotation ann;
//
// for each input document page convert the field annotations on the page into
// content stream
//
List<PDPage> pages = catalog.getAllPages();
Iterator<PDPage> pageiterator = pages.iterator();
while (pageiterator.hasNext()) {
//
// get next page from input document
//
PDPage page = pageiterator.next();
//
// add the fonts from the input form to this pages resources
// so the field values will display in the proper font
//
PDResources pageResources = page.getResources();
Map pageFonts = pageResources.getFonts();
pageFonts.putAll(formfonts);
pageResources.setFonts(pageFonts);
//
// Create a content stream for the page for appending
//
PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true);
//
// Find the appearance widgets for all fields on the input page and insert them into content stream of the page
//
for (PDField tmpfield : tmpfields) {
List widgets = tmpfield.getKids();
if(widgets == null) {
widgets = new ArrayList();
widgets.add(tmpfield.getWidget());
}
Iterator<COSObjectable> widgetiterator = widgets.iterator();
while (widgetiterator.hasNext()) {
COSObjectable next = widgetiterator.next();
if (next instanceof PDField) {
PDField foundfield = (PDField) next;
ann = foundfield.getWidget();
} else {
ann = (PDAnnotation) next;
}
if (ann.getPage().equals(page)) {
COSDictionary dict = ann.getDictionary();
if (dict != null) {
if(tmpfield instanceof PDVariableText || tmpfield instanceof PDPushButton) {
COSDictionary ap = (COSDictionary) dict.getDictionaryObject("AP");
if (ap != null) {
contentStream.appendRawCommands("q\n");
COSArray rectarray = (COSArray) dict.getDictionaryObject("Rect");
if (rectarray != null) {
float[] rect = rectarray.toFloatArray();
String s = " 1 0 0 1 " + Float.toString(rect[0]) + " " + Float.toString(rect[1]) + " cm\n";
contentStream.appendRawCommands(s);
}
COSStream stream = (COSStream) ap.getDictionaryObject("N");
if (stream != null) {
InputStream ioStream = stream.getUnfilteredStream();
ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
byte[] buffer = new byte[4096];
int amountRead = 0;
while ((amountRead = ioStream.read(buffer, 0, buffer.length)) != -1) {
byteArray.write(buffer, 0, amountRead);
}
contentStream.appendRawCommands(byteArray.toString() + "\n");
}
contentStream.appendRawCommands("Q\n");
}
} else if (tmpfield instanceof PDChoiceButton) {
COSDictionary ap = (COSDictionary) dict.getDictionaryObject("AP");
if(ap != null) {
contentStream.appendRawCommands("q\n");
COSArray rectarray = (COSArray) dict.getDictionaryObject("Rect");
if (rectarray != null) {
float[] rect = rectarray.toFloatArray();
String s = " 1 0 0 1 " + Float.toString(rect[0]) + " " + Float.toString(rect[1]) + " cm\n";
contentStream.appendRawCommands(s);
}
COSName cbValue = (COSName) dict.getDictionaryObject(COSName.AS);
COSDictionary d = (COSDictionary) ap.getDictionaryObject(COSName.D);
if (d != null) {
COSStream stream = (COSStream) d.getDictionaryObject(cbValue);
if(stream != null) {
InputStream ioStream = stream.getUnfilteredStream();
ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
byte[] buffer = new byte[4096];
int amountRead = 0;
while ((amountRead = ioStream.read(buffer, 0, buffer.length)) != -1) {
byteArray.write(buffer, 0, amountRead);
}
if (!(tmpfield instanceof PDCheckbox)){
contentStream.appendRawCommands(byteArray.toString() + "\n");
}
}
}
COSDictionary n = (COSDictionary) ap.getDictionaryObject(COSName.N);
if (n != null) {
COSStream stream = (COSStream) n.getDictionaryObject(cbValue);
if(stream != null) {
InputStream ioStream = stream.getUnfilteredStream();
ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
byte[] buffer = new byte[4096];
int amountRead = 0;
while ((amountRead = ioStream.read(buffer, 0, buffer.length)) != -1) {
byteArray.write(buffer, 0, amountRead);
}
contentStream.appendRawCommands(byteArray.toString() + "\n");
}
}
contentStream.appendRawCommands("Q\n");
}
}
}
}
}
}
// delete any field widget annotations and write it all to the page
// leave other annotations on the page
COSArrayList newanns = new COSArrayList();
List anns = page.getAnnotations();
ListIterator annotiterator = anns.listIterator();
while (annotiterator.hasNext()) {
COSObjectable next = (COSObjectable) annotiterator.next();
if (!(next instanceof PDAnnotationWidget)) {
newanns.add(next);
}
}
page.setAnnotations(newanns);
contentStream.close();
}
//
// Delete all fields from the form and their widgets (kids)
//
for (PDField tmpfield : tmpfields) {
List kids = tmpfield.getKids();
if(kids != null) kids.clear();
}
tmpfields.clear();
// Tell Adobe we don't have forms anymore.
PDDocumentCatalog pdCatalog = doc.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
COSDictionary acroFormDict = acroForm.getDictionary();
COSArray cosFields = (COSArray) acroFormDict.getDictionaryObject("Fields");
cosFields.clear();
}
Full class here:
https://gist.github.com/jribble/beddf7620536939f88db
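The translation written before each appearance stream is the PDF cm operator: the matrix 1 0 0 1 x y moves the origin to the lower-left corner of the widget rectangle without scaling. One caveat with Float.toString is that very small or very large values come out in exponent notation (for example 1.0E-4), which PDF content streams do not accept. A small sketch of building the operator string with plain decimal output; the class and method names are hypothetical:

```java
import java.math.BigDecimal;

final class ContentOps {
    // Formats a float as a plain decimal, never in exponent notation.
    static String num(float value) {
        return new BigDecimal(Float.toString(value)).toPlainString();
    }

    // Builds the "cm" operator that translates the coordinate system by (x, y).
    static String translate(float x, float y) {
        return "1 0 0 1 " + num(x) + " " + num(y) + " cm\n";
    }

    public static void main(String[] args) {
        System.out.print(translate(10f, 20.5f));
        System.out.println(num(0.0001f));
    }
}
```

The returned string could be passed to appendRawCommands in place of the hand-built string s in the code above.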

This is Thomas's answer from the PDFBox mailing list:
You will need to get the fields via the COSDictionary. Try this code...
PDDocument pdDoc = PDDocument.load(new File("E:\\Form-Test.pdf"));
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
COSDictionary acroFormDict = acroForm.getDictionary();
COSArray fields = (COSArray) acroFormDict.getDictionaryObject("Fields");
fields.clear();

I thought I'd share our approach that worked with PDFBox 2+.
We've used the PDAcroForm.flatten() method.
The fields needed some preprocessing; most importantly, the nested field structure had to be traversed and the DV and V entries checked for values.
Finally what worked was the following:
private static void flattenPDF(String src, String dst) throws IOException {
PDDocument doc = PDDocument.load(new File(src));
PDDocumentCatalog catalog = doc.getDocumentCatalog();
PDAcroForm acroForm = catalog.getAcroForm();
PDResources resources = new PDResources();
acroForm.setDefaultResources(resources);
List<PDField> fields = new ArrayList<>(acroForm.getFields());
processFields(fields, resources);
acroForm.flatten();
doc.save(dst);
doc.close();
}
private static void processFields(List<PDField> fields, PDResources resources) {
fields.stream().forEach(f -> {
f.setReadOnly(true);
COSDictionary cosObject = f.getCOSObject();
String value = cosObject.getString(COSName.DV) == null ?
cosObject.getString(COSName.V) : cosObject.getString(COSName.DV);
System.out.println("Setting " + f.getFullyQualifiedName() + ": " + value);
try {
f.setValue(value);
} catch (IOException e) {
if (e.getMessage().matches("Could not find font: /.*")) {
String fontName = e.getMessage().replaceAll("^[^/]*/", "");
System.out.println("Adding fallback font for: " + fontName);
resources.put(COSName.getPDFName(fontName), PDType1Font.HELVETICA);
try {
f.setValue(value);
} catch (IOException e1) {
e1.printStackTrace();
}
} else {
e.printStackTrace();
}
}
if (f instanceof PDNonTerminalField) {
processFields(((PDNonTerminalField) f).getChildren(), resources);
}
});
}
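The fallback-font branch above depends on parsing the exception message. A small standalone sketch of just that parsing step, to show what the two regular expressions do; the class and method names are hypothetical, and the message format is taken from the code above rather than from any documented contract, so it may differ between PDFBox versions:

```java
final class FontNameFromError {
    // Returns the font name after the first '/', or null if the message
    // does not look like "Could not find font: /<name>".
    static String missingFontName(String message) {
        if (message == null || !message.matches("Could not find font: /.*")) {
            return null;
        }
        // Strip everything up to and including the first '/'.
        return message.replaceAll("^[^/]*/", "");
    }

    public static void main(String[] args) {
        System.out.println(missingFontName("Could not find font: /Helv")); // prints Helv
    }
}
```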

If the PDF document doesn't actually contain form fields but you still want to flatten other elements such as markups, the following works quite well. Note that it was implemented in C#.
public static void FlattenPdf(string fileName)
{
PDDocument doc = PDDocument.load(new java.io.File(fileName));
java.util.List annots = doc.getPage(0).getAnnotations();
for (int i = 0; i < annots.size(); ++i)
{
PDAnnotation annot = (PDAnnotation)annots.get(i);
annot.setLocked(true);
annot.setReadOnly(true);
annot.setNoRotate(true);
}
doc.save(fileName);
doc.close();
}
This effectively locks all markups in the document and they will no longer be editable.