Apache POI: ${my_placeholder} is treated as three different runs

I have a .docx template with placeholders to be filled, such as ${programming_language}, ${education}, etc.
The placeholder keywords must be easily distinguished from the other plain words, hence they are enclosed with ${ }.
for (XWPFTable table : doc.getTables()) {
for (XWPFTableRow row : table.getRows()) {
for (XWPFTableCell cell : row.getTableCells()) {
for (XWPFParagraph paragraph : cell.getParagraphs()) {
for (XWPFRun run : paragraph.getRuns()) {
System.out.println("run text: " + run.text());
/** replace text here, etc. */
}
}
}
}
}
I want to extract the placeholders together with the enclosing ${ } characters. The problem is that the enclosing characters seem to be treated as separate runs...
run text: ${
run text: programming_language
run text: }
run text: Some plain text here
run text: ${
run text: education
run text: }
Instead, I would like to achieve the following effect:
run text: ${programming_language}
run text: Some plain text here
run text: ${education}
I have tried using other enclosing characters, such as: { }, < >, # #, etc.
I do not want to do some weird concatenations of runs, etc. I want to have it in a single XWPFRun.
If I cannot find a proper solution, I think I will just name them like so: VAR_PROGRAMMING_LANGUAGE, VAR_EDUCATION.

Apache POI (4.1.2 at the time of writing) provides TextSegment to deal with these Word text-run issues. XWPFParagraph.searchText searches for a string in a paragraph and returns a TextSegment. This gives access to the run in which the found text begins and the run in which it ends (BeginRun and EndRun), as well as to the start character position in the begin run and the end character position in the end run (BeginChar and EndChar).
It additionally provides the index of the text element within the text run (BeginText and EndText). This should always be 0, because default text runs have only one text element.
Having this, we can do the following:
Replace the found partial string in the begin run with the replacement. To do so, take the text that precedes the searched string and append the replacement to it. After that, the begin run fully contains the replacement.
Delete all text runs between the begin run and the end run, as they contain parts of the searched string that are no longer needed.
Keep only the text after the searched string in the end run.
This way we can replace text that is spread across multiple text runs.
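The three steps above can be sketched without POI on a plain list of strings, where each element stands for the text of one run. This is only an illustration of the bookkeeping, not POI code: the class RunReplaceSketch and the method replaceAcrossRuns are names made up for this sketch, and a real paragraph also carries formatting that a list of strings does not model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of a paragraph: each list element is the text of one run.
final class RunReplaceSketch {

    // Replaces the first occurrence of 'find', even if it spans several runs.
    // The list must be mutable. Returns true if a replacement was made.
    static boolean replaceAcrossRuns(List<String> runs, String find, String replacement) {
        if (find.isEmpty()) {
            return false;
        }
        int start = String.join("", runs).indexOf(find);
        if (start < 0) {
            return false;
        }
        int end = start + find.length() - 1; // index of the last matched character

        // Locate the begin and end run and the character offsets inside them.
        int beginRun = -1, beginChar = -1, endRun = -1, endChar = -1;
        int offset = 0;
        for (int i = 0; i < runs.size(); i++) {
            int len = runs.get(i).length();
            if (beginRun < 0 && start < offset + len) {
                beginRun = i;
                beginChar = start - offset;
            }
            if (end < offset + len) {
                endRun = i;
                endChar = end - offset;
                break;
            }
            offset += len;
        }

        String textBefore = runs.get(beginRun).substring(0, beginChar);
        String textAfter = runs.get(endRun).substring(endChar + 1);
        if (beginRun == endRun) {
            runs.set(beginRun, textBefore + replacement + textAfter);
        } else {
            runs.set(beginRun, textBefore + replacement); // step 1: begin run holds the replacement
            runs.set(endRun, textAfter);                  // step 3: end run keeps only the text after
            for (int i = endRun - 1; i > beginRun; i--) { // step 2: drop the runs in between
                runs.remove(i);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> runs = new ArrayList<>(
                Arrays.asList("Language: ${", "programming_language", "} and more"));
        replaceAcrossRuns(runs, "${programming_language}", "Java");
        System.out.println(runs);
    }
}
```

The runs in between are removed from the highest index downwards so that the indexes of the remaining runs stay valid while deleting.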
The following example shows this.
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;
public class WordReplaceTextSegment {
static public void replaceTextSegment(XWPFParagraph paragraph, String textToFind, String replacement) {
TextSegment foundTextSegment = null;
PositionInParagraph startPos = new PositionInParagraph(0, 0, 0);
while((foundTextSegment = paragraph.searchText(textToFind, startPos)) != null) { // search all text segments having text to find
System.out.println(foundTextSegment.getBeginRun()+":"+foundTextSegment.getBeginText()+":"+foundTextSegment.getBeginChar());
System.out.println(foundTextSegment.getEndRun()+":"+foundTextSegment.getEndText()+":"+foundTextSegment.getEndChar());
// maybe there is text before textToFind in begin run
XWPFRun beginRun = paragraph.getRuns().get(foundTextSegment.getBeginRun());
String textInBeginRun = beginRun.getText(foundTextSegment.getBeginText());
String textBefore = textInBeginRun.substring(0, foundTextSegment.getBeginChar()); // we only need the text before
// maybe there is text after textToFind in end run
XWPFRun endRun = paragraph.getRuns().get(foundTextSegment.getEndRun());
String textInEndRun = endRun.getText(foundTextSegment.getEndText());
String textAfter = textInEndRun.substring(foundTextSegment.getEndChar() + 1); // we only need the text after
if (foundTextSegment.getEndRun() == foundTextSegment.getBeginRun()) {
textInBeginRun = textBefore + replacement + textAfter; // if we have only one run, we need the text before, then the replacement, then the text after in that run
} else {
textInBeginRun = textBefore + replacement; // else we need the text before followed by the replacement in begin run
endRun.setText(textAfter, foundTextSegment.getEndText()); // and the text after in end run
}
beginRun.setText(textInBeginRun, foundTextSegment.getBeginText());
// runs between begin run and end run needs to be removed
for (int runBetween = foundTextSegment.getEndRun() - 1; runBetween > foundTextSegment.getBeginRun(); runBetween--) {
paragraph.removeRun(runBetween); // remove not needed runs
}
}
}
public static void main(String[] args) throws Exception {
XWPFDocument doc = new XWPFDocument(new FileInputStream("source.docx"));
String textToFind = "${This is the text to find}"; // might be in different runs
String replacement = "Replacement text";
for (XWPFParagraph paragraph : doc.getParagraphs()) { //go through all paragraphs
if (paragraph.getText().contains(textToFind)) { // paragraph contains text to find
replaceTextSegment(paragraph, textToFind, replacement);
}
}
FileOutputStream out = new FileOutputStream("result.docx");
doc.write(out);
out.close();
doc.close();
}
}
The code above does not work in all cases because XWPFParagraph.searchText has bugs. So here is a better searchText method:
/**
* This method parses the paragraph and searches for the string searched.
* If it finds the string, it returns a TextSegment describing its position
* (begin and end run, text element and character); otherwise it returns null.
*
* @param searched the string to search for
* @param startPos the position in the paragraph at which the search starts
*/
static TextSegment searchText(XWPFParagraph paragraph, String searched, PositionInParagraph startPos) {
int startRun = startPos.getRun(),
startText = startPos.getText(),
startChar = startPos.getChar();
int beginRunPos = 0, candCharPos = 0;
boolean newList = false;
//CTR[] rArray = paragraph.getRArray(); //This does not contain all runs. It lacks hyperlink runs for ex.
java.util.List<XWPFRun> runs = paragraph.getRuns();
int beginTextPos = 0, beginCharPos = 0; //must be outside the for loop
//for (int runPos = startRun; runPos < rArray.length; runPos++) {
for (int runPos = startRun; runPos < runs.size(); runPos++) {
//int beginTextPos = 0, beginCharPos = 0, textPos = 0, charPos; //int beginTextPos = 0, beginCharPos = 0 must be outside the for loop
int textPos = 0, charPos;
//CTR ctRun = rArray[runPos];
CTR ctRun = runs.get(runPos).getCTR();
XmlCursor c = ctRun.newCursor();
c.selectPath("./*");
try {
while (c.toNextSelection()) {
XmlObject o = c.getObject();
if (o instanceof CTText) {
if (textPos >= startText) {
String candidate = ((CTText) o).getStringValue();
if (runPos == startRun) {
charPos = startChar;
} else {
charPos = 0;
}
for (; charPos < candidate.length(); charPos++) {
if ((candidate.charAt(charPos) == searched.charAt(0)) && (candCharPos == 0)) {
beginTextPos = textPos;
beginCharPos = charPos;
beginRunPos = runPos;
newList = true;
}
if (candidate.charAt(charPos) == searched.charAt(candCharPos)) {
if (candCharPos + 1 < searched.length()) {
candCharPos++;
} else if (newList) {
TextSegment segment = new TextSegment();
segment.setBeginRun(beginRunPos);
segment.setBeginText(beginTextPos);
segment.setBeginChar(beginCharPos);
segment.setEndRun(runPos);
segment.setEndText(textPos);
segment.setEndChar(charPos);
return segment;
}
} else {
candCharPos = 0;
}
}
}
textPos++;
} else if (o instanceof CTProofErr) {
c.removeXml();
} else if (o instanceof CTRPr) {
//do nothing
} else {
candCharPos = 0;
}
}
} finally {
c.dispose();
}
}
return null;
}
This will be called like:
...
while((foundTextSegment = searchText(paragraph, textToFind, startPos)) != null) {
...

As someone commented on your question, you can't control where or when Word splits a paragraph into runs. If the other answer didn't help you, here is how I got around it:
First of all, this "solution" has a big problem, but I will still post it here in case someone can solve it.
public void mainMethod(XWPFParagraph paragraph) {
if (paragraph.getRuns().size() > 1) {
String myRun = unifyRuns(paragraph.getRuns());
// make the verification of placeholders ${...}
paragraph.getRuns().get(0).setText(myRun, 0); // position 0 overwrites the run's first text element; setText(myRun) would append another one
while(paragraph.getRuns().size() > 1) {
paragraph.removeRun(1);
}
}
}
private String unifyRuns(List<XWPFRun> runElements) {
StringBuilder unifiedRun = new StringBuilder();
for (XWPFRun run : runElements) {
unifiedRun.append(run.text()); // append the run's text explicitly instead of relying on toString()
}
return unifiedRun.toString();
}
The code may contain some errors since I'm writing it from memory.
The problem here is that Word does not split paragraphs into runs for no reason: when a paragraph contains text with different formatting (such as font family or font size), Word puts the differently formatted parts into different runs.
In the text "Here's my bold text", Word will split the text to separate the bold from the normal text. So the code above is a bad solution if you are using POI to create large documents with different kinds of formatting. In that case you would first need to check whether the run is actually bold, and only then handle the placeholders.
Again, this is a "solution" that I found, and it's not complete yet. Sorry for any English errors, I'm using Google Translate to write this answer.
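One way to soften the formatting problem described above is to merge only neighbouring runs that share the same formatting. The following is a minimal sketch of that idea on a plain-Java model, not POI code: the Run class with a single bold flag and the method mergeAdjacent are assumptions made for this illustration, and a real check would have to compare every formatting property that matters in the document.

```java
import java.util.ArrayList;
import java.util.List;

final class MergeRunsSketch {

    // Hypothetical stand-in for a run: its text plus one formatting flag.
    static final class Run {
        final String text;
        final boolean bold;

        Run(String text, boolean bold) {
            this.text = text;
            this.bold = bold;
        }
    }

    // Merges neighbouring runs only when their formatting matches,
    // so differently formatted text stays in separate runs.
    static List<Run> mergeAdjacent(List<Run> runs) {
        List<Run> merged = new ArrayList<>();
        for (Run run : runs) {
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last).bold == run.bold) {
                merged.set(last, new Run(merged.get(last).text + run.text, run.bold));
            } else {
                merged.add(run);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Run> runs = new ArrayList<>();
        runs.add(new Run("${", false));
        runs.add(new Run("education", false));
        runs.add(new Run("}", false));
        runs.add(new Run(" bold text", true));
        for (Run r : mergeAdjacent(runs)) {
            System.out.println((r.bold ? "bold:   " : "normal: ") + r.text);
        }
    }
}
```

With this approach a placeholder that was split into equally formatted runs ends up in one run, while the bold text keeps its own run.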

Related

Checking if item lore contains a string (loren.contains("§eSigned from "))

I only want to check for:
if (lore.contains("§eSigned of ")) {
but the check does not recognize that the lore contains "§eSigned of ".
I wrote a Minecraft command /sign with which you can add a lore to an item ("Signed of playerrank | playername").
Then I wanted to add an /unsign command to remove this lore.
ItemStack is = p.getItemInHand();
ItemMeta im = is.getItemMeta();
List<String> lore = im.hasLore() ? im.getLore() : new ArrayList<String>();
if (lore.contains("§eSigned of " + getChatName(p))) { // this line is important!
for (int i = 0; i < 3; i++) {
int size = lore.size();
lore.remove(size - 1);
}
im.setLore(lore);
is.setItemMeta(im);
p.setItemInHand(is);
sendMessage(p, "§aThis item is no longer signed");
} else {
sendMessage(p, "§aThis item is not signed!");
}
return CommandResult.None;
Everything works fine until you, for example, change your name. Then you can't remove the signature because getChatName(p) has changed.
To fix this, I only want to check
if (lore.contains("§eSigned of ")) {
but then the check fails and returns false (it says the lore does not contain "§eSigned of ").
I tried a lot, but it only works with the string "§eSigned of " plus getChatName(p).
According to the documentation, contains searches for the specific string, so it should work as I thought, right?
Additional information:
getChatName(p) returns the rank of the player and the player name, like: "Member | domi"
sendMessage(p, "") sends a simple message in the Minecraft chat

The problem you run into is that List.contains(Object) only returns true if an element of the list equals the given string as a whole; it does not look inside the elements. What you actually need is a check whether any string in the list starts with "§eSigned of ".
I would suggest adding a function isSignedItem like this:
private boolean isSignedItem(List<String> lore) {
for (String st : lore)
if (st.startsWith("§eSigned of "))
return true;
return false;
}
and then to use this function to check if the item is signed or not:
[...]
List<String> lore = im.hasLore() ? im.getLore() : new ArrayList<String>();
if (isSignedItem(lore)) { // this line is important!
for (int i = 0; i < 3; i++) {
int size = lore.size();
lore.remove(size - 1);
}
[...]
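To make the difference visible outside of a Minecraft plugin, here is a small self-contained sketch. The class name LoreCheckSketch and the sample lore lines are made up for the illustration; only the prefix and the "Member | domi" chat name come from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LoreCheckSketch {

    static final String PREFIX = "§eSigned of ";

    // True if any lore line starts with the signature prefix.
    static boolean isSigned(List<String> lore) {
        return lore.stream().anyMatch(line -> line.startsWith(PREFIX));
    }

    public static void main(String[] args) {
        List<String> lore = new ArrayList<>(
                Arrays.asList("Sharp sword", PREFIX + "Member | domi"));

        // List.contains compares whole elements with equals(), so the bare prefix is not found.
        System.out.println(lore.contains(PREFIX)); // false
        // The prefix check looks at the beginning of each element instead.
        System.out.println(isSigned(lore));        // true
    }
}
```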

How to get hyperlink boundaries of inline words with Aspose Words for Android?

The Android app reads paragraphs and some of their properties from an MS Word document with the Aspose Words for Android library. It gets the paragraph text, the style name and the is-style-separator value. Some words in a paragraph line are hyperlinks. How can I get the start and end boundaries of the hyperlinked words? For example:
This is an inline hyperlink paragraph example in which the start bound is 18 and the end bound is 27.
public static ArrayList<String[]> GetBookLinesByTag(String file) {
ArrayList<String[]> bookLines = new ArrayList<>();
try {
Document doc = new Document(file);
ParagraphCollection paras = doc.getFirstSection().getBody().getParagraphs();
for(int i = 0; i < paras.getCount(); i++){
String styleName = paras.get(i).getParagraphFormat().getStyleName().trim();
String isStyleSeparator = Integer.toString(paras.get(i).getBreakIsStyleSeparator() ? 1 : 0);
String content = paras.get(i).toString(SaveFormat.TEXT).trim();
bookLines.add(new String[]{content, styleName, isStyleSeparator});
}
} catch (Exception e){ e.printStackTrace(); } // report the error instead of silently swallowing it
return bookLines;
}
Edit:
Thanks, Alexey Noskov, solved with your help.
public static ArrayList<String[]> GetBookLinesByTag(String file) {
ArrayList<String[]> bookLines = new ArrayList<>();
try {
Document doc = new Document(file);
ParagraphCollection paras = doc.getFirstSection().getBody().getParagraphs();
for(int i = 0; i < paras.getCount(); i++){
String styleName = paras.get(i).getParagraphFormat().getStyleName().trim();
String isStyleSeparator = Integer.toString(paras.get(i).getBreakIsStyleSeparator() ? 1 : 0);
String content = paras.get(i).toString(SaveFormat.TEXT).trim();
for (Field field : paras.get(i).getRange().getFields()) {
if (field.getType() == FieldType.FIELD_HYPERLINK) {
FieldHyperlink hyperlink = (FieldHyperlink) field;
String urlId = hyperlink.getSubAddress();
String urlText = hyperlink.getResult();
// Reformat linked text: urlText:urlId
content = urlText + ":" + urlId;
}
}
bookLines.add(new String[]{content, styleName, isStyleSeparator});
}
} catch (Exception e){ e.printStackTrace(); } // report the error instead of silently swallowing it
return bookLines;
}

Hyperlinks in MS Word documents are represented as fields. If you press Alt+F9 in MS Word you will see something like this
{ HYPERLINK "https://aspose.com" }
Follow the link to learn more about fields in Aspose.Words document model and in MS Word.
https://docs.aspose.com/display/wordsjava/Introduction+to+Fields
In your case you need to locate the position of FieldStart, which is the start position, and then measure the length of the content between FieldSeparator and FieldEnd; the start position plus that length is the end position.
Disclosure: I work on the Aspose.Words team.
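The arithmetic behind "start position plus length" can be checked on the example sentence from the question with plain strings. This sketch does not use the Aspose API at all: it assumes the linked word is "hyperlink" (which matches the stated bounds) and finds it with indexOf, whereas in a real document the start would come from the position of FieldStart as described above. The names LinkBoundsSketch and bounds are made up for the illustration.

```java
final class LinkBoundsSketch {

    // Returns {start, end} of linkText inside paragraphText, or null if it is absent.
    // The end bound is exclusive: start plus the length of the linked text.
    static int[] bounds(String paragraphText, String linkText) {
        int start = paragraphText.indexOf(linkText);
        if (start < 0) {
            return null;
        }
        return new int[] { start, start + linkText.length() };
    }

    public static void main(String[] args) {
        String paragraph = "This is an inline hyperlink paragraph example";
        int[] b = bounds(paragraph, "hyperlink");
        System.out.println(b[0] + ".." + b[1]); // 18..27
    }
}
```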

Replace a word in a XSLFTextRun with Apache POI

I am using Apache POI to modify a pptx. I am trying to replace one word in an XSLFTextShape while keeping the formatting of the other words in the XSLFTextShape.
What I tried so far is the following:
private static void replaceText(XSLFTextShape textShape, String marker, String newText){
textShape.setText(textShape.getText().replace(marker, newText));
}
This is replacing the word I want, but the formatting of the other words in the same XSLFTextShape is changed. For example: If I have a word in the same XSLFTextShape which is red, the color of this word is changed to black even though I am not changing anything in this word.
Therefore I tried to replace the word in the XSLFTextRun. This is the code I wrote:
private static void replaceText(XSLFTextShape textShape, String marker, String newText){
List<XSLFTextParagraph> textParagraphList = textShape.getTextParagraphs();
textParagraphList.forEach(textParagraph -> {
List<XSLFTextRun> textRunList = textParagraph.getTextRuns();
textRunList.forEach(textRun -> {
if(textRun.getRawText().contains(marker)){
textRun.setText(textRun.getRawText().replace(marker, newText));
}
});
});
//String text = textShape.getText();
//textShape.setText(textShape.getText().replace(marker, newText));
//String text2 = textShape.getText();
}
I am not getting any error when running this code, but the word is not replaced and I really don't get why.
If I add the line textShape.setText(textShape.getText().replace(marker, newText)); it is replaced. But while debugging, I see that textShape.getText() gives the same result before and after this line.
Thanks for your help!

The following example works:
private static void replace(XSLFTextParagraph paragraph, String searchValue, String replacement) {
List<XSLFTextRun> textRuns = paragraph.getTextRuns();
for (XSLFTextRun textRun : textRuns) {
if (hasReplaceableItem(textRun.getRawText(), searchValue)) {
String replacedText = StringUtils.replace(textRun.getRawText(), searchValue, replacement);
textRun.setText(replacedText);
break;
}
}
}
private static boolean hasReplaceableItem(String runText, String searchValue) {
return StringUtils.contains(runText, searchValue);
}
For this example I used:
org.apache.poi:poi-ooxml:5.0.0
org.apache.commons:commons-lang3:3.9

OK, so I made it work with a (not very nice) workaround:
I save the text and style of each run per paragraph, clear the text in the shape, and then add new paragraphs with the text runs (including style). Here is the code:
private static void replaceTextButKeepStyle(XSLFTextShape textShape, final String marker, final String text) {
List<XSLFTextParagraph> textParagraphList = textShape.getTextParagraphs();
ArrayList<ArrayList<TextAndStyle>> textAndStylesTable = new ArrayList<>();
for (int i = 0; i < textParagraphList.size(); i++) {
ArrayList<TextAndStyle> textAndStylesParagraph = new ArrayList<>();
XSLFTextParagraph textParagraph = textParagraphList.get(i);
List<XSLFTextRun> textRunList = textParagraph.getTextRuns();
for (Iterator it2 = textRunList.iterator(); it2.hasNext(); ) {
Object object = it2.next();
XSLFTextRun textRun = (XSLFTextRun) object;
//get Color:
PaintStyle.SolidPaint solidPaint = (PaintStyle.SolidPaint) textRun.getFontColor();
int color = solidPaint.getSolidColor().getColor().getRGB();
//save text & styles:
TextAndStyle textAndStyle = new TextAndStyle(textRun.getRawText(), textRun.isBold(),
color, textRun.getFontSize());
//replace text if marker:
if (textAndStyle.getText().contains(marker)) {
textAndStyle.setText(textAndStyle.getText().replace(marker, text));
}
textAndStylesParagraph.add(textAndStyle);
}
textAndStylesTable.add(textAndStylesParagraph);
}
//delete text and add new text with correct styles:
textShape.clearText();
textAndStylesTable.forEach(textAndStyles -> {
XSLFTextParagraph textParagraph = textShape.addNewTextParagraph();
textAndStyles.forEach(textAndStyle -> {
TextRun newTextrun = textParagraph.addNewTextRun();
newTextrun.setText(textAndStyle.getText());
newTextrun.setFontColor(new Color(textAndStyle.getColorRgb()));
newTextrun.setBold(textAndStyle.isBold());
newTextrun.setFontSize(textAndStyle.getFontsize());
});
});
}
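The TextAndStyle helper used above is not shown in the answer. A minimal sketch of what it might look like, reconstructed only from the constructor call and the getters and setters the code above uses, is given here; the field names and types are assumptions (the font size is kept as a Double because that is what the text run returns).

```java
// Hypothetical reconstruction of the TextAndStyle holder used in the workaround above.
final class TextAndStyle {

    private String text;
    private final boolean bold;
    private final int colorRgb;
    private final Double fontsize;

    TextAndStyle(String text, boolean bold, int colorRgb, Double fontsize) {
        this.text = text;
        this.bold = bold;
        this.colorRgb = colorRgb;
        this.fontsize = fontsize;
    }

    String getText() { return text; }

    void setText(String text) { this.text = text; }

    boolean isBold() { return bold; }

    int getColorRgb() { return colorRgb; }

    Double getFontsize() { return fontsize; }

    public static void main(String[] args) {
        TextAndStyle t = new TextAndStyle("Hello ${name}", true, 0xFF0000, 18.0);
        t.setText(t.getText().replace("${name}", "World"));
        System.out.println(t.getText() + " bold=" + t.isBold());
    }
}
```

Only the text is mutable, since the workaround replaces the marker but copies the style values unchanged.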

How to highlight text in a paragraph in MS Word using POI

I am developing a compare tool for Word documents. Whenever the two documents differ, I need to highlight the differing substring in the paragraph. When I try to highlight using a run, the whole paragraph is highlighted instead of the substring.
Can you please advise how I can achieve this for a substring?

I had the same problem. Here I post a sample method where you highlight a substring contained in a run.
private int highlightSubsentence(String sentence, XWPFParagraph p, int i) {
//get the current run - its style has to be saved here (as currentStyle, see the Style class below) before the run is removed
XWPFRun currentRun = p.getRuns().get(i);
String currentRunText = currentRun.text();
int sentenceLength = sentence.length();
int sentenceBeginIndex = currentRunText.indexOf(sentence);
int addedRuns = 0;
p.removeRun(i);
//Create, if necessary, a run before the highlight part
if (sentenceBeginIndex > 0) {
XWPFRun before = p.insertNewRun(i);
before.setText(currentRunText.substring(0, sentenceBeginIndex));
//here I might need to re-introduce the style of the deleted run
addedRuns++;
}
// highlight the interesting part
XWPFRun sentenceRun = p.insertNewRun(i + addedRuns);
sentenceRun.setText(currentRunText.substring(sentenceBeginIndex, sentenceBeginIndex + sentenceLength));
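currentStyle.copyStyle(sentenceRun); // currentStyle must have been captured from currentRun before p.removeRun(i)
// reuse the run properties and shading that copyStyle may already have created instead of adding duplicates
if (sentenceRun.getCTR().getRPr() == null) {
sentenceRun.getCTR().addNewRPr();
}
if (sentenceRun.getCTR().getRPr().getShd() == null) {
sentenceRun.getCTR().getRPr().addNewShd();
}
CTShd cTShd = sentenceRun.getCTR().getRPr().getShd();
cTShd.setFill("00FFFF");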
//Create, if necessary, a run after the highlight part
if (sentenceBeginIndex + sentenceLength != currentRunText.length()) {
XWPFRun after = p.insertNewRun(i + addedRuns + 1);
after.setText(currentRunText.substring(sentenceBeginIndex + sentenceLength));
//here I might need to re-introduce the style of the deleted run
addedRuns++;
}
return addedRuns;
}
You might need to save the formatting of the run you delete so that the new runs get the old formatting.
Also, if the string you need to highlight is spread over more than one run, you will need to highlight all of them, but the core method is the one I posted.
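The splitting that the method performs on the run text (an optional part before, the highlighted part, an optional part after) can be looked at in isolation with plain strings. This is a sketch of that logic only, with made-up names (SplitRunSketch, split); unlike the method above, it also covers the case where the substring is not found in the run.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SplitRunSketch {

    // Splits runText into: text before the match (if any), the match itself,
    // and text after the match (if any). If there is no match, the text is returned unsplit.
    static List<String> split(String runText, String sentence) {
        int begin = sentence.isEmpty() ? -1 : runText.indexOf(sentence);
        if (begin < 0) {
            return Collections.singletonList(runText); // nothing to highlight
        }
        int end = begin + sentence.length();
        List<String> parts = new ArrayList<>();
        if (begin > 0) {
            parts.add(runText.substring(0, begin)); // becomes the run before the highlight
        }
        parts.add(runText.substring(begin, end));   // becomes the highlighted run
        if (end < runText.length()) {
            parts.add(runText.substring(end));      // becomes the run after the highlight
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("Here's my bold text", "bold"));
    }
}
```

Each returned part corresponds to one new run, so the number of parts minus one is the number of runs added to the paragraph.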
On the style question:
I had a class Style that saved all the styles of the old run in private fields (for the respective types, look at what the XWPFRun getters return).
These are the sub-styles that I needed; there are obviously others that I didn't cover.
Style(XWPFRun run, XWPFDefaultRunStyle defaultRunStyle) {
fontSize = run.getFontSize();
fontFamily = run.getFontFamily();
bold = run.isBold();
italic = run.isItalic();
strike = run.isStrikeThrough();
underline = run.getUnderline();
color = run.getColor();
shadingColor = getShadeColor(run);
highlightColor = getHighlightedColor(run);
}
I copied the old style in the new run when needed.
public void copyStyle(XWPFRun newRun) {
if (fontSize != -1) {
newRun.setFontSize(fontSize);
}
newRun.setFontFamily(fontFamily);
newRun.setBold(bold);
newRun.setItalic(italic);
newRun.setStrikeThrough(strike);
newRun.setColor(color);
newRun.setUnderline(underline);
if (shadingColor != null) {
addShading(newRun, shadingColor);
}
if (highlightColor != null) {
addHighlight(newRun, highlightColor);
}
}
To add the shading and highlight that were already present I used:
public static void addHighlight(XWPFRun run, STHighlightColor.Enum hexColor) {
if (run.getCTR().getRPr() == null) {
run.getCTR().addNewRPr();
}
if (run.getCTR().getRPr().getHighlight() == null) {
run.getCTR().getRPr().addNewHighlight();
}
run.getCTR().getRPr().getHighlight().setVal(hexColor);
}
public static void addShading(XWPFRun run, Object hexColor) {
if (run.getCTR().getRPr() == null) {
run.getCTR().addNewRPr();
}
if (run.getCTR().getRPr().getShd() == null) {
run.getCTR().getRPr().addNewShd();
}
run.getCTR().getRPr().getShd().setFill(hexColor);
}

Create table in word doc using docx4j in specific bookmark:

I’m creating a table at a specific bookmark location (that is, the table is displayed in the Word document), but after converting the Word document to PDF the table does not show up in the PDF, because the bookmark is inside a w:p!
<w:p w:rsidR="00800BD9" w:rsidRDefault="00800BD9">
<w:bookmarkStart w:id="0" w:name="abc"/>
<w:bookmarkEnd w:id="0"/>
</w:p>
Now I want to find the paragraph (using the bookmark), then replace the paragraph with the table.
Does anybody have any suggestion?
Here is my code:
private void replaceBookmarkContents(List<Object> paragraphs, Map<DataFieldName, String> data) throws Exception {
Tbl table = TblFactory.createTable(3,3,9600);
RangeFinder rt = new RangeFinder("CTBookmark", "CTMarkupRange");
new TraversalUtil(paragraphs, rt);
for (CTBookmark bm : rt.getStarts()) {
// do we have data for this one?
if (bm.getName()==null) continue;
String value = data.get(new DataFieldName(bm.getName()));
if (value==null) continue;
try {
// Can't just remove the object from the parent,
// since in the parent, it may be wrapped in a JAXBElement
List<Object> theList = null;
if (bm.getParent() instanceof P) {
System.out.println("OK!");
theList = ((ContentAccessor)(bm.getParent())).getContent();
} else {
continue;
}
int rangeStart = -1;
int rangeEnd=-1;
int i = 0;
for (Object ox : theList) {
Object listEntry = XmlUtils.unwrap(ox);
if (listEntry.equals(bm)) {
if (DELETE_BOOKMARK) {
rangeStart=i;
} else {
rangeStart=i+1;
}
} else if (listEntry instanceof CTMarkupRange) {
if ( ((CTMarkupRange)listEntry).getId().equals(bm.getId())) {
if (DELETE_BOOKMARK) {
rangeEnd=i;
} else {
rangeEnd=i-1;
}
break;
}
}
i++;
}
if (rangeStart>0) {
// Delete the bookmark range
for (int j=rangeStart; j>0; j--) {
theList.remove(j);
}
// now add a run
org.docx4j.wml.R run = factory.createR();
run.getContent().add(table);
theList.add(rangeStart, run);
}
} catch (ClassCastException cce) {
log.error(cce.getMessage(), cce);
}
}
}

The problem with blindly replacing a bookmark inside a paragraph with a table is that you'll end up with w:p/w:tbl (or with your code, w:p/w:r/w:tbl!) which is not allowed by the file format.
To avoid this issue, you could make the bookmark a sibling of the paragraph, or you could change your code so that once you have found a bookmark inside a paragraph, if you are replacing it with a table, you replace the parent p instead.
Note that bookmark find/replace is a brittle basis for document generation. If your requirements are other than modest, you'd be better off using content controls instead.
