Updating the text of a XWPFParagraph using Apache POI


I have been able to loop through all paragraphs in a document and get at their text, and I have read and understood how to create a document from scratch. But how can I update or replace the text in a paragraph? I can call createRun on a paragraph, but that just adds a new piece of text to it.
...
FileInputStream fis = new FileInputStream("Muu.docx");
XWPFDocument myDoc = new XWPFDocument(fis);
XWPFParagraph[] myParas = myDoc.getParagraphs();
...
My theory is that I need to get at the existing "run" in the paragraph I want to change (or delete the paragraph and add it again), but I cannot find methods to do that.

You can't change the text on a XWPFParagraph directly. A XWPFParagraph is made up of one or more XWPFRun instances. These provide the way to set the text.
To change the text, your code would want to be something like:
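public void changeText(XWPFParagraph p, String newText) {
    List<XWPFRun> runs = p.getRuns();
    // Remove every run except the first, working backwards so the indexes stay valid
    for (int i = runs.size() - 1; i > 0; i--) {
        p.removeRun(i);
    }
    // A paragraph can have no runs at all; create one in that case
    XWPFRun run = runs.isEmpty() ? p.createRun() : runs.get(0);
    // Position 0 replaces the existing text of the run instead of appending to it
    run.setText(newText, 0);
}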
That ensures the paragraph has only one text run (the first one) and replaces all of its text with the text you provided.

Related

How to delete first character after table using POI

I am attempting to format a Word document that has multiple tables. I need to delete the line breaks that occur after each table. How do I achieve this programmatically in Java?
I am currently trying it with the following code, and it does not work:
org.apache.xmlbeans.XmlCursor cursor = xwpfTable.getCTTbl().newCursor();
cursor.toEndToken();
cursor.toNextToken();
cursor.removeChars(2);
Further clarification: we are receiving unformatted Word files from an external source. We need to eliminate the paragraph (the extra line in between tables) when the table has only 1 row. Currently I am using a macro and achieving this with the following code:
For Each t In doc.Tables
    Set myrange = doc.Characters(t.Range.End + 1)
    If myrange.Text = Chr(13) Then
        myrange.Delete
    End If
Next t
Thanks in advance
What I am trying to remove:

According to your screenshot, you want to remove the empty paragraphs that are placed immediately after tables.
This is possible, although I am wondering why those paragraphs are there. After those paragraphs are removed, Word no longer treats the tables as separate tables; they can only be edited as rows within one table. Is this what you want?
Anyway, as said, removing the empty paragraphs after the tables is possible. To do so, you could traverse the body elements of the document. If an XWPFTable is immediately followed by an XWPFParagraph and this XWPFParagraph does not have any text runs in it, then remove that XWPFParagraph from the document.
Example:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.xwpf.usermodel.*;

public class WordRemoveEmptyParagraphs {

    public static void main(String[] args) throws Exception {
        XWPFDocument document = new XWPFDocument(new FileInputStream("./WordTables.docx"));

        int thisBodyElementPos = 0;
        int nextBodyElementPos = 1;
        IBodyElement thisBodyElement = null;
        IBodyElement nextBodyElement = null;

        if (document.getBodyElements().size() > 1) { // document must have at least two body elements
            do {
                thisBodyElement = document.getBodyElements().get(thisBodyElementPos);
                nextBodyElement = document.getBodyElements().get(nextBodyElementPos);
                if (thisBodyElement instanceof XWPFTable && nextBodyElement instanceof XWPFParagraph) {
                    XWPFParagraph paragraph = (XWPFParagraph) nextBodyElement;
                    if (paragraph.getRuns().size() == 0) { // if paragraph does not have any text runs in it
                        document.removeBodyElement(nextBodyElementPos);
                    }
                }
                thisBodyElementPos++;
                nextBodyElementPos = thisBodyElementPos + 1;
            } while (nextBodyElementPos < document.getBodyElements().size());
        }

        FileOutputStream out = new FileOutputStream("./WordTablesChanged.docx");
        document.write(out);
        out.close();
        document.close();
    }
}

How to compute =SUM(Above) function in docx using apache poi

I am trying to work with Apache POI on a docx file, and I am stuck at using formulas in a table. For instance, see the image:
I did try setting the text to "=SUM(ABOVE)", but it doesn't work this way.
I think I might need to set custom XML data here, but I am not sure how to proceed. I tried the following piece of code:
XWPFTable table = document.createTable();
//create first row
XWPFTableRow tableRowOne = table.getRow(0);
table.getRow(0).createCell();
table.getRow(0).getCell(0).setText("10");
table.getRow(0).createCell();
table.getRow(0).getCell(1).setText("=SUM(ABOVE)");

What I do for such requirements is as follows:
First, create the simplest possible Word document containing the required things using the Word GUI. Then look at what Word has created to get an idea of what needs to be created using Apache POI.
Concretely, here:
Create the simplest possible table in Word that has a field {=SUM(ABOVE)} in it. Save that as *.docx. Now unzip that *.docx (Office Open XML files like *.docx are simply ZIP archives). Have a look at /word/document.xml in that archive. There you will find something like:
<w:tc>
<w:p>
<w:fldSimple w:instr="=SUM(ABOVE)"/>
...
</w:p>
</w:tc>
This is the XML for a table cell containing a paragraph, which in turn contains a fldSimple element whose instr attribute holds the formula.
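The manual unzip step described above can also be done from code. The following is a minimal sketch that uses only java.util.zip from the JDK (Java 9 or later for readAllBytes). The class name DocxXmlPeek and the method readDocumentXml are made up for illustration and are not part of Apache POI, and the sketch assumes the part is stored as UTF-8, which is what Word normally writes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper (not part of Apache POI): reads the raw XML of the main document part.
class DocxXmlPeek {

    // Returns the content of word/document.xml, or null if the archive has no such entry.
    static String readDocumentXml(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                // Assumption: the part is encoded as UTF-8
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of a file saved from Word, e.g. the one containing {=SUM(ABOVE)}
        if (args.length == 1) {
            System.out.println(readDocumentXml(args[0]));
        }
    }
}
```

Searching the printed XML for fldSimple is then a quick way to see how Word stored the field.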
Now we know that we need the table cell (XWPFTableCell) and the XWPFParagraph in it. Then we need to set a fldSimple element in this paragraph whose instr attribute contains the formula.
This would be as simple as
paragraphInCell.getCTP().addNewFldSimple().setInstr("=SUM(ABOVE)");
But of course something must tell Word that the formula needs to be calculated when the document opens. The simplest solution is to mark the field as "dirty". Word then updates the field while opening the document, and it also shows a confirmation dialog about the need for updating.
Complete example using Apache POI 4.1.0:
import java.io.FileOutputStream;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSimpleField;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.STOnOff;

public class CreateWordTableSumAbove {

    public static void main(String[] args) throws Exception {
        XWPFDocument document = new XWPFDocument();

        XWPFParagraph paragraph = document.createParagraph();
        XWPFRun run = paragraph.createRun();
        run.setText("The table:");

        //create the table
        XWPFTable table = document.createTable(4, 3);
        table.setWidth("100%");
        for (int row = 0; row < 3; row++) {
            for (int col = 0; col < 3; col++) {
                if (col < 2) table.getRow(row).getCell(col).setText("row " + row + ", col " + col);
                else table.getRow(row).getCell(col).setText("" + ((row + 1) * 1234));
            }
        }

        //set Sum row
        table.getRow(3).getCell(0).setText("Sum:");

        //get paragraph from cell where the sum field shall be contained
        XWPFParagraph paragraphInCell = null;
        if (table.getRow(3).getCell(2).getParagraphs().size() == 0) paragraphInCell = table.getRow(3).getCell(2).addParagraph();
        else paragraphInCell = table.getRow(3).getCell(2).getParagraphs().get(0);

        //set sum field in
        CTSimpleField sumAbove = paragraphInCell.getCTP().addNewFldSimple();
        sumAbove.setInstr("=SUM(ABOVE)");
        //set sum field dirty, so it must be calculated while opening the document
        sumAbove.setDirty(STOnOff.TRUE);

        paragraph = document.createParagraph();

        FileOutputStream out = new FileOutputStream("create_table.docx");
        document.write(out);
        out.close();
        document.close();
    }
}
All of this only works properly when the document is opened in Microsoft Word. LibreOffice Writer is not able to store such formula fields in Office Open XML (*.docx) format, nor is it able to read such Office Open XML formula fields properly.

How to add a hyperlink to a XWPFRun

I want to format the text of an XWPFRun as a hyperlink. I am able to add it to the paragraph with the code given below, but that adds it on a separate line.
public static void appendExternalHyperlink(String url, String text, XWPFParagraph paragraph) {
    // Add the link as an external relationship
    String id = paragraph.getDocument().getPackagePart().addExternalRelationship(url, XWPFRelation.HYPERLINK.getRelation()).getId();
    // Append the link and bind it to the relationship
    CTHyperlink cLink = paragraph.getCTP().addNewHyperlink();
    cLink.setId(id);
    // Create the linked text
    CTText ctText = CTText.Factory.newInstance();
    ctText.setStringValue(text);
    CTR ctr = CTR.Factory.newInstance();
    ctr.setTArray(new CTText[]{ctText});
    // Style the run as blue, underlined text using a single set of run properties
    CTRPr rpr = ctr.addNewRPr();
    CTColor colour = CTColor.Factory.newInstance();
    colour.setVal("0000FF");
    rpr.setColor(colour);
    rpr.addNewU().setVal(STUnderline.SINGLE);
    // Insert the linked text into the link
    cLink.setRArray(new CTR[]{ctr});
}
And I invoke it like:
XWPFParagraph eduPara = doc.createParagraph();
eduPara.setAlignment(ParagraphAlignment.LEFT);
eduPara.setVerticalAlignment(TextAlignment.TOP);
XWPFRun eduRun7 = eduPara.createRun();
appendExternalHyperlink(center.getEduImpFile(), center.getEduImpFile(), eduPara);
eduRun7.addBreak();
Here center is an object that holds the values I need to print. The getter methods return Strings.
The output I get is as follows:
[Screenshot: program output]
I want the hyperlink to be on the same line as the previous run, which generates the text "File uploaded:".

This was a mistake on my part: the hyperlink wrapped to the next line because there was not enough space left on the current line to place it.

Copy contents from docx with bullets intact with Apache POI

I am trying to copy contents from a docx file to the clipboard eventually. The code I have come up with so far is:
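package config;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.xmlbeans.XmlException;

public class buffer {
    public static void main(String[] args) throws IOException, XmlException {
        XWPFDocument srcDoc = new XWPFDocument(new FileInputStream("D:\\rules.docx"));
        XWPFDocument destDoc = new XWPFDocument();
        OutputStream out = new FileOutputStream("D:\\test.docx");
        for (IBodyElement bodyElement : srcDoc.getBodyElements()) {
            // Skip tables and other non-paragraph elements; casting them would throw a ClassCastException
            if (!(bodyElement instanceof XWPFParagraph)) continue;
            XWPFParagraph srcPr = (XWPFParagraph) bodyElement;
            XWPFParagraph dstPr = destDoc.createParagraph();
            dstPr.createRun();
            int pos = destDoc.getParagraphs().size() - 1;
            destDoc.setParagraph(srcPr, pos);
        }
        destDoc.write(out);
        out.close();
    }
}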
This does fetch the bullets but numbers them. I want to retain the original bullet format. Is there a way to do this?

You'll need to handle the numbering definition (in the numbering part) correctly.
The most reliable approach is to copy the definition (both the instance list and the abstract one) across and renumber it (i.e. give it a new ID) so that it is unique.
Then, of course, you'll need to update the IDs in your paragraphs to match.
Note that the above is a solution only for the question you have asked.
You'll run into problems if your content contains a rel to some other part (e.g. an image). And you're not handling the style definitions etc.

How do you find/replace a placeholder in a .docx file with Apache POI?

I have a file, "template.docx", in which I would like to have placeholders (e.g. [serial number]) that can be replaced with a string or maybe a table. I am using Apache POI, and no, I cannot use docx4j.
Is there a way to have the program iterate over all occurrences of "[serial number]" and replace them with a string? Many of these tags will be inside a large table, so is there some Apache POI equivalent to pressing Ctrl+F in Word and using Replace All?
Any suggestions would be appreciated, thanks.

An XWPFDocument (docx) has different kinds of sub-elements, such as XWPFParagraphs, XWPFTables, XWPFNumbering etc.
Once you have created the XWPFDocument object via:
document = new XWPFDocument(inputStream);
you can iterate through all of its paragraphs:
document.getParagraphsIterator();
While iterating through the paragraphs, each paragraph gives you multiple XWPFRuns, which are blocks of text with the same styling. Sometimes text with the same styling is nevertheless split into multiple XWPFRuns; in that case you should look into this question to avoid splitting of your runs, which helps identify your placeholders without merging multiple runs within the same paragraph. At this point, assuming your placeholder is not split across multiple runs, you can iterate over the XWPFRuns of each paragraph and look for text matching your placeholder. Something like this will help:
XWPFParagraph para = (XWPFParagraph) xwpfParagraphElement;
for (XWPFRun run : para.getRuns()) {
    if (run.getText(0) != null) {
        String text = run.getText(0);
        Matcher expressionMatcher = expression.matcher(text);
        if (expressionMatcher.find() && expressionMatcher.groupCount() > 0) {
            System.out.println("Expression Found...");
        }
    }
}
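Here expression is a Pattern built from a regular expression for the particular placeholder, and expressionMatcher is the Matcher created from it. Try a regex that also matches optional text before and after your placeholder, i.e. one with three capturing groups of the form (prefix)(PlaceHolderGroup)(suffix); in my experience that works best, and the capturing groups are what make the groupCount() check above succeed.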
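For a literal placeholder such as the [serial number] tag from the question, one pitfall is that the square brackets are regex metacharacters. The following is a minimal sketch using only java.util.regex; the class PlaceholderRegexDemo and its method names are hypothetical, and it works on a plain String standing in for the text of one run.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: literal placeholder matching on the text of a single run.
class PlaceholderRegexDemo {

    // Builds a pattern that matches the placeholder literally; without quoting,
    // "[serial number]" would be read as a character class.
    static Pattern patternFor(String placeholder) {
        return Pattern.compile(Pattern.quote(placeholder));
    }

    // Replaces every occurrence of the placeholder in the given text.
    static String replaceAll(String runText, String placeholder, String value) {
        Matcher matcher = patternFor(placeholder).matcher(runText);
        // quoteReplacement keeps '$' and '\' in the value from being read as group references
        return matcher.replaceAll(Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        String runText = "Device [serial number] was shipped.";
        // prints "Device SN-$100 was shipped."
        System.out.println(replaceAll(runText, "[serial number]", "SN-$100"));
    }
}
```

The returned string is what would then be written back into the run with run.setText(text, 0).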
Once you find the right XWPFRun, extract the text of interest from it and create the replacement text, which should be easy enough. Then replace the previous text of this particular run with the new text:
run.setText(text, 0);
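A run-by-run search only finds a placeholder that sits entirely inside one run. If Word has split it across runs, one possible workaround is to join the run texts, replace in the joined text, and put the result into the first run while emptying the others. The sketch below models this with plain strings standing in for run texts; SplitPlaceholderDemo and replaceAcrossRuns are hypothetical names, and with real XWPFRuns this approach would give the whole text the formatting of the first run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: each String stands in for the text of one run in a paragraph.
class SplitPlaceholderDemo {

    // Returns new run texts in which the placeholder is replaced even if it was split
    // across runs. All text ends up in the first run; the other runs become empty.
    static List<String> replaceAcrossRuns(List<String> runTexts, String placeholder, String value) {
        String joined = String.join("", runTexts);
        if (runTexts.isEmpty() || !joined.contains(placeholder)) {
            return new ArrayList<>(runTexts); // nothing to replace, keep the runs as they are
        }
        List<String> result = new ArrayList<>();
        result.add(joined.replace(placeholder, value));
        for (int i = 1; i < runTexts.size(); i++) {
            result.add(""); // keep the run count, but move the text to the first run
        }
        return result;
    }

    public static void main(String[] args) {
        // The placeholder is split between the first and the second run
        List<String> runs = List.of("Serial: [serial", " number]", " end");
        System.out.println(replaceAcrossRuns(runs, "[serial number]", "12345"));
    }
}
```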
If you wanted to replace this whole XWPFRun with a completely new XWPFRun, or perhaps insert a new paragraph or table after the paragraph owning this run, you would probably run into a few problems: (A) a ConcurrentModificationException, which means you cannot modify the list (of XWPFRuns) you are iterating over, and (B) finding the position at which to insert the new element. To resolve these issues, keep a List<XWPFParagraph> holding the paragraphs after which a new element is to be inserted. Once you have that list, you can iterate over it and, for each paragraph in it, simply get a cursor and insert the new element at that cursor:
for (XWPFParagraph para : paras) {
    XmlCursor cursor = (XmlCursor) para.getCTP().newCursor();
    XWPFTable newTable = para.getBody().insertNewTbl(cursor);
    //Generate your XWPF table based on what's inside para with your own logic
}
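The collect-first, modify-later idea is independent of POI and can be sketched with plain collections. In the sketch below, strings stand in for body elements, and the class and method names are hypothetical. The first pass only reads the list, and the second pass inserts from the back so that positions found earlier stay valid.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: strings stand in for the body elements of a document.
class CollectThenInsertDemo {

    // Returns a copy of the list with a marker inserted after every element
    // that contains the placeholder.
    static List<String> insertAfterMatches(List<String> elements, String placeholder, String marker) {
        // Pass 1: only read the list and remember the positions of the matches
        List<Integer> matchPositions = new ArrayList<>();
        for (int i = 0; i < elements.size(); i++) {
            if (elements.get(i).contains(placeholder)) {
                matchPositions.add(i);
            }
        }
        // Pass 2: modify a copy, going backwards so that earlier positions stay valid
        List<String> result = new ArrayList<>(elements);
        for (int k = matchPositions.size() - 1; k >= 0; k--) {
            result.add(matchPositions.get(k) + 1, marker);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> body = List.of("Intro", "Table goes here: [table]", "Outro");
        // prints [Intro, Table goes here: [table], <new table>, Outro]
        System.out.println(insertAfterMatches(body, "[table]", "<new table>"));
    }
}
```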
To create an XWPFTable, read this.
Hope this helps someone.

// Note: this snippet uses docx4j (WordprocessingMLPackage), not Apache POI
// Text nodes are w:t elements in the Word document
final String XPATH_TO_SELECT_TEXT_NODES = "//w:t";
try {
    // Open the input file
    File dir = new File("D:\\temp\\test.docx");
    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new FileInputStream(dir));
    // Build a list of "text" elements
    List<?> texts = wordMLPackage.getMainDocumentPart().getJAXBNodesViaXPath(XPATH_TO_SELECT_TEXT_NODES, true);
    HashMap<String, String> mappings = new HashMap<String, String>();
    mappings.put("1", "one");
    mappings.put("2", "two");
    // Loop through all "text" elements and replace those whose whole value is a key
    for (Object obj : texts) {
        Text text = (Text) ((JAXBElement<?>) obj).getValue();
        String textToReplace = text.getValue();
        if (mappings.containsKey(textToReplace)) {
            text.setValue(mappings.get(textToReplace));
        }
    }
    wordMLPackage.save(new java.io.File("D:/temp/forPrint.docx")); // your path
} catch (Exception e) {
    // Report failures instead of silently swallowing them
    e.printStackTrace();
}
