Replace a word in an XSLFTextRun with Apache POI


I am using Apache POI to modify a .pptx file. I am trying to replace one word in an XSLFTextShape while keeping the formatting of the other words in the same XSLFTextShape.
What I have tried so far is the following:
private static void replaceText(XSLFTextShape textShape, String marker, String newText) {
    textShape.setText(textShape.getText().replace(marker, newText));
}
This replaces the word I want, but the formatting of the other words in the same XSLFTextShape changes. For example, if a word in the same XSLFTextShape is red, its color changes to black even though I am not changing anything in that word.
Therefore I tried to replace the word inside each XSLFTextRun instead. This is the code I wrote:
private static void replaceText(XSLFTextShape textShape, String marker, String newText) {
    List<XSLFTextParagraph> textParagraphList = textShape.getTextParagraphs();
    textParagraphList.forEach(textParagraph -> {
        List<XSLFTextRun> textRunList = textParagraph.getTextRuns();
        textRunList.forEach(textRun -> {
            if (textRun.getRawText().contains(marker)) {
                textRun.setText(textRun.getRawText().replace(marker, newText));
            }
        });
    });
    //String text = textShape.getText();
    //textShape.setText(textShape.getText().replace(marker, newText));
    //String text2 = textShape.getText();
}
I do not get any error when running this code, but the word is not replaced, and I really don't understand why.
If I add the line textShape.setText(textShape.getText().replace(marker, newText)); the word is replaced. But while debugging, I see that textShape.getText() returns the same result before and after this line.
Thanks for your help!

The following example works:
// StringUtils is org.apache.commons.lang3.StringUtils (commons-lang3)
private static void replace(XSLFTextParagraph paragraph, String searchValue, String replacement) {
    List<XSLFTextRun> textRuns = paragraph.getTextRuns();
    for (XSLFTextRun textRun : textRuns) {
        if (hasReplaceableItem(textRun.getRawText(), searchValue)) {
            String replacedText = StringUtils.replace(textRun.getRawText(), searchValue, replacement);
            textRun.setText(replacedText);
            break; // stops after the first run that contains searchValue
        }
    }
}

private static boolean hasReplaceableItem(String runText, String searchValue) {
    return StringUtils.contains(runText, searchValue);
}
For this example I used:
org.apache.poi:poi-ooxml:5.0.0
org.apache.commons:commons-lang3:3.9

OK, so I made it work with a (not very nice) workaround:
I save the text and style of each run per paragraph, delete the text in the shape, and then add new paragraphs with the text runs (including their style). Here is the code:
// uses java.awt.Color, java.util.*, org.apache.poi.sl.usermodel.PaintStyle, org.apache.poi.xslf.usermodel.*
private static void replaceTextButKeepStyle(XSLFTextShape textShape, final String marker, final String text) {
    List<XSLFTextParagraph> textParagraphList = textShape.getTextParagraphs();
    List<List<TextAndStyle>> textAndStylesTable = new ArrayList<>();
    for (XSLFTextParagraph textParagraph : textParagraphList) {
        List<TextAndStyle> textAndStylesParagraph = new ArrayList<>();
        for (XSLFTextRun textRun : textParagraph.getTextRuns()) {
            // get color; fall back to black when the run has no solid color
            // (getFontColor() can be null or a non-solid paint, which the plain cast would not survive)
            int color = Color.BLACK.getRGB();
            PaintStyle fontColor = textRun.getFontColor();
            if (fontColor instanceof PaintStyle.SolidPaint) {
                color = ((PaintStyle.SolidPaint) fontColor).getSolidColor().getColor().getRGB();
            }
            // save text & styles
            TextAndStyle textAndStyle = new TextAndStyle(textRun.getRawText(), textRun.isBold(),
                    color, textRun.getFontSize());
            // replace text if it contains the marker
            if (textAndStyle.getText().contains(marker)) {
                textAndStyle.setText(textAndStyle.getText().replace(marker, text));
            }
            textAndStylesParagraph.add(textAndStyle);
        }
        textAndStylesTable.add(textAndStylesParagraph);
    }
    // delete the text and add new text with the saved styles
    textShape.clearText();
    textAndStylesTable.forEach(textAndStyles -> {
        XSLFTextParagraph textParagraph = textShape.addNewTextParagraph();
        textAndStyles.forEach(textAndStyle -> {
            XSLFTextRun newTextRun = textParagraph.addNewTextRun();
            newTextRun.setText(textAndStyle.getText());
            newTextRun.setFontColor(new Color(textAndStyle.getColorRgb()));
            newTextRun.setBold(textAndStyle.isBold());
            newTextRun.setFontSize(textAndStyle.getFontsize());
        });
    });
}
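The answer does not show the TextAndStyle class. A minimal holder that matches the calls made on it could look like the following; this is a hypothetical reconstruction, and the field types are assumptions inferred from the code (XSLFTextRun.getFontSize() returns a Double, and the color is kept as the packed int from java.awt.Color#getRGB()).

```java
// Hypothetical reconstruction: the answer never shows TextAndStyle, so the
// fields below are inferred from the calls made on it.
class TextAndStyle {
    private String text;
    private final boolean bold;
    private final int colorRgb;     // packed RGB value, as returned by java.awt.Color#getRGB()
    private final Double fontsize;  // may be null when the run inherits its size

    TextAndStyle(String text, boolean bold, int colorRgb, Double fontsize) {
        this.text = text;
        this.bold = bold;
        this.colorRgb = colorRgb;
        this.fontsize = fontsize;
    }

    String getText() { return text; }

    void setText(String text) { this.text = text; }

    boolean isBold() { return bold; }

    int getColorRgb() { return colorRgb; }

    Double getFontsize() { return fontsize; }
}
```

Only the text is mutable, because the replacement changes the text while the captured style must stay as it was.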

Related

How to properly handle comma inside a quoted string using opencsv?

I'm trying to read a CSV file that contains both quoted and unquoted strings.
If a string is quoted, its quote characters should be kept.
Besides that, if a string contains a comma, it should not be split.
I've tried multiple ways, but nothing works so far.
Current test data:
"field1 (with use of , we lose the other part)",some description
field2,"Dear %s, some text"
Printing the first field of each mapped bean:
Expected result:
"field1 (with use of , we lose the other part)"
field2
Current result:
"field1 (with use of
field2
Here is the code:
public class CsvToBeanReaderTest {
    @Test
    void shouldIncludeDoubleQuotes() {
        String testData =
                "\"field1 (with use of , we lose the other part)\",some description\n"
                + "field2,\"Dear %s, some text\"";
        RFC4180ParserBuilder rfc4180ParserBuilder = new RFC4180ParserBuilder();
        rfc4180ParserBuilder.withQuoteChar(ICSVWriter.NO_QUOTE_CHARACTER);
        ICSVParser rfc4180Parser = rfc4180ParserBuilder.build();
        CSVReaderBuilder builder = new CSVReaderBuilder(new StringReader(testData));
        CSVReader reader = builder
                .withCSVParser(rfc4180Parser)
                .build();
        List<TestClass> result = new CsvToBeanBuilder<TestClass>(reader)
                .withType(TestClass.class)
                .withEscapeChar('\"')
                .build()
                .parse();
        result.forEach(testClass -> System.out.println(testClass.getField1()));
    }

    private List<TestClass> readTestData(String testData) {
        return new CsvToBeanBuilder<TestClass>(new StringReader(testData))
                .withType(TestClass.class)
                .withSeparator(',')
                .withSkipLines(0)
                .withIgnoreEmptyLine(true)
                .build()
                .parse();
    }

    public static final class TestClass {
        @CsvBindByPosition(position = 0)
        private String field1;
        @CsvBindByPosition(position = 1)
        private String description;

        public String toCsvFormat() {
            return String.join(",", field1, description);
        }

        public String getField1() {
            return field1;
        }
    }
}
I've found that if I comment out or remove rfc4180ParserBuilder.withQuoteChar(ICSVWriter.NO_QUOTE_CHARACTER);, the string is parsed correctly, but the quote characters, which should be kept, are lost. Are there any suggestions for what can be done? (I would prefer not to switch to another CSV library.)

Apache POI: ${my_placeholder} is treated as three different runs

I have a .docx template with placeholders to be filled, such as ${programming_language}, ${education}, etc.
The placeholder keywords must be easily distinguished from the other plain words, hence they are enclosed with ${ }.
for (XWPFTable table : doc.getTables()) {
    for (XWPFTableRow row : table.getRows()) {
        for (XWPFTableCell cell : row.getTableCells()) {
            for (XWPFParagraph paragraph : cell.getParagraphs()) {
                for (XWPFRun run : paragraph.getRuns()) {
                    System.out.println("run text: " + run.text());
                    /** replace text here, etc. */
                }
            }
        }
    }
}
I want to extract the placeholders together with the enclosing ${ } characters. The problem is that the enclosing characters seem to be treated as separate runs...
run text: ${
run text: programming_language
run text: }
run text: Some plain text here
run text: ${
run text: education
run text: }
Instead, I would like to achieve the following effect:
run text: ${programming_language}
run text: Some plain text here
run text: ${education}
I have tried using other enclosing characters, such as: { }, < >, # #, etc.
I do not want to do some weird concatenations of runs, etc. I want to have it in a single XWPFRun.
If I cannot find a proper solution, I think I will just use names like VAR_PROGRAMMING_LANGUAGE and VAR_EDUCATION.

Apache POI 4.1.2 (current at the time of writing) provides TextSegment to deal with these Word text-run issues. XWPFParagraph.searchText searches for a string in a paragraph and returns a TextSegment. This provides access to the begin run and the end run of that text in the paragraph (BeginRun and EndRun). It also provides access to the start character position in the begin run and the end character position in the end run (BeginChar and EndChar).
It additionally provides access to the index of the text element in the text run (BeginText and EndText). This should always be 0, because default text runs have only one text element.
Having this, we can do the following:
Replace the found partial string in the begin run with the replacement. To do so, take the text that comes before the searched string and concatenate the replacement to it. After that, the begin run fully contains the replacement.
Delete all text runs between the begin run and the end run, as they contain parts of the searched string that are no longer needed.
Keep only the text after the searched string in the end run.
Doing so, we are able to replace text that spans multiple text runs.
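To make the three steps concrete without a Word file, here is a plain-Java sketch that models each run as a String and applies the same splice. The class RunSplice, the method replaceSegment and the list-of-strings model are illustrative assumptions, not POI API; beginRun/beginChar/endRun/endChar play the role of the TextSegment getters, with endChar inclusive (which is why the POI code uses getEndChar() + 1).

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model: each element of 'runs' stands for the text of one run.
class RunSplice {
    static List<String> replaceSegment(List<String> runs, int beginRun, int beginChar,
                                       int endRun, int endChar, String replacement) {
        List<String> result = new ArrayList<>(runs);
        String textBefore = runs.get(beginRun).substring(0, beginChar);
        String textAfter = runs.get(endRun).substring(endChar + 1); // endChar is inclusive
        if (beginRun == endRun) {
            // Only one run involved: before + replacement + after stay in that run.
            result.set(beginRun, textBefore + replacement + textAfter);
        } else {
            // Step 1: the begin run keeps its prefix followed by the replacement.
            result.set(beginRun, textBefore + replacement);
            // Step 3: the end run keeps only the text after the match.
            result.set(endRun, textAfter);
            // Step 2: drop the runs in between, going backwards so indices stay valid.
            for (int i = endRun - 1; i > beginRun; i--) {
                result.remove(i);
            }
        }
        return result;
    }
}
```

For runs "Name: ", "${", "name", "}", "!" and a segment from run 1, char 0 to run 3, char 0, the result is "Name: ", "Ada", "", "!", which still reads "Name: Ada!" when joined.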
The following example shows this.
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;
public class WordReplaceTextSegment {
    public static void replaceTextSegment(XWPFParagraph paragraph, String textToFind, String replacement) {
        TextSegment foundTextSegment = null;
        PositionInParagraph startPos = new PositionInParagraph(0, 0, 0);
        // note: the search always restarts at (0, 0, 0), so this loop never ends if replacement contains textToFind
        while ((foundTextSegment = paragraph.searchText(textToFind, startPos)) != null) { // search all text segments having text to find
            System.out.println(foundTextSegment.getBeginRun() + ":" + foundTextSegment.getBeginText() + ":" + foundTextSegment.getBeginChar());
            System.out.println(foundTextSegment.getEndRun() + ":" + foundTextSegment.getEndText() + ":" + foundTextSegment.getEndChar());
            // maybe there is text before textToFind in begin run
            XWPFRun beginRun = paragraph.getRuns().get(foundTextSegment.getBeginRun());
            String textInBeginRun = beginRun.getText(foundTextSegment.getBeginText());
            String textBefore = textInBeginRun.substring(0, foundTextSegment.getBeginChar()); // we only need the text before
            // maybe there is text after textToFind in end run
            XWPFRun endRun = paragraph.getRuns().get(foundTextSegment.getEndRun());
            String textInEndRun = endRun.getText(foundTextSegment.getEndText());
            String textAfter = textInEndRun.substring(foundTextSegment.getEndChar() + 1); // we only need the text after
            if (foundTextSegment.getEndRun() == foundTextSegment.getBeginRun()) {
                textInBeginRun = textBefore + replacement + textAfter; // only one run: text before, then the replacement, then text after
            } else {
                textInBeginRun = textBefore + replacement; // otherwise: text before followed by the replacement in begin run
                endRun.setText(textAfter, foundTextSegment.getEndText()); // and the text after in end run
            }
            beginRun.setText(textInBeginRun, foundTextSegment.getBeginText());
            // runs between begin run and end run need to be removed
            for (int runBetween = foundTextSegment.getEndRun() - 1; runBetween > foundTextSegment.getBeginRun(); runBetween--) {
                paragraph.removeRun(runBetween); // remove runs that are no longer needed
            }
        }
    }
    public static void main(String[] args) throws Exception {
        String textToFind = "${This is the text to find}"; // might be in different runs
        String replacement = "Replacement text";
        try (FileInputStream in = new FileInputStream("source.docx");
             XWPFDocument doc = new XWPFDocument(in);
             FileOutputStream out = new FileOutputStream("result.docx")) {
            for (XWPFParagraph paragraph : doc.getParagraphs()) { // go through all paragraphs
                if (paragraph.getText().contains(textToFind)) { // paragraph contains text to find
                    replaceTextSegment(paragraph, textToFind, replacement);
                }
            }
            doc.write(out);
        }
    }
}
The above code does not work in all cases, because XWPFParagraph.searchText has bugs. So here is a better searchText method:
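// Requires org.apache.xmlbeans.XmlCursor, org.apache.xmlbeans.XmlObject and
// CTR, CTText, CTProofErr, CTRPr from org.openxmlformats.schemas.wordprocessingml.x2006.main
/**
 * Parses the paragraph and searches for the given string, starting at startPos.
 * If the string is found, returns a TextSegment describing its position;
 * otherwise returns null.
 *
 * @param paragraph the paragraph to search
 * @param searched  the string to search for
 * @param startPos  the position to start searching from
 */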
static TextSegment searchText(XWPFParagraph paragraph, String searched, PositionInParagraph startPos) {
    int startRun = startPos.getRun(),
        startText = startPos.getText(),
        startChar = startPos.getChar();
    int beginRunPos = 0, candCharPos = 0;
    boolean newList = false;
    //CTR[] rArray = paragraph.getRArray(); //This does not contain all runs. It lacks hyperlink runs for ex.
    java.util.List<XWPFRun> runs = paragraph.getRuns();
    int beginTextPos = 0, beginCharPos = 0; //must be outside the for loop
    //for (int runPos = startRun; runPos < rArray.length; runPos++) {
    for (int runPos = startRun; runPos < runs.size(); runPos++) {
        //int beginTextPos = 0, beginCharPos = 0, textPos = 0, charPos; //int beginTextPos = 0, beginCharPos = 0 must be outside the for loop
        int textPos = 0, charPos;
        //CTR ctRun = rArray[runPos];
        CTR ctRun = runs.get(runPos).getCTR();
        XmlCursor c = ctRun.newCursor();
        c.selectPath("./*");
        try {
            while (c.toNextSelection()) {
                XmlObject o = c.getObject();
                if (o instanceof CTText) {
                    if (textPos >= startText) {
                        String candidate = ((CTText) o).getStringValue();
                        if (runPos == startRun) {
                            charPos = startChar;
                        } else {
                            charPos = 0;
                        }
                        for (; charPos < candidate.length(); charPos++) {
                            if ((candidate.charAt(charPos) == searched.charAt(0)) && (candCharPos == 0)) {
                                beginTextPos = textPos;
                                beginCharPos = charPos;
                                beginRunPos = runPos;
                                newList = true;
                            }
                            if (candidate.charAt(charPos) == searched.charAt(candCharPos)) {
                                if (candCharPos + 1 < searched.length()) {
                                    candCharPos++;
                                } else if (newList) {
                                    TextSegment segment = new TextSegment();
                                    segment.setBeginRun(beginRunPos);
                                    segment.setBeginText(beginTextPos);
                                    segment.setBeginChar(beginCharPos);
                                    segment.setEndRun(runPos);
                                    segment.setEndText(textPos);
                                    segment.setEndChar(charPos);
                                    return segment;
                                }
                            } else {
                                candCharPos = 0;
                            }
                        }
                    }
                    textPos++;
                } else if (o instanceof CTProofErr) {
                    c.removeXml();
                } else if (o instanceof CTRPr) {
                    //do nothing
                } else {
                    candCharPos = 0;
                }
            }
        } finally {
            c.dispose();
        }
    }
    return null;
}
It is called like this:
...
while((foundTextSegment = searchText(paragraph, textToFind, startPos)) != null) {
...
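For comparison, a different technique avoids the character-by-character scan: concatenate the run texts, search with indexOf, and map the flat offsets back to run and character positions. The sketch below does this on plain strings; RunSearch, findAcrossRuns and the int[] result layout ({beginRun, beginChar, endRun, endChar}, endChar inclusive) are hypothetical, and unlike the XmlCursor version it does not see proofErr or other non-text content inside a run.

```java
import java.util.List;

// Alternative technique (not the XmlCursor scan above): search the concatenated
// run texts, then translate the flat offsets back into run/char positions.
class RunSearch {
    // Returns {beginRun, beginChar, endRun, endChar} with endChar inclusive, or null.
    static int[] findAcrossRuns(List<String> runs, String searched, int fromIndex) {
        if (searched.isEmpty()) {
            return null;
        }
        StringBuilder all = new StringBuilder();
        for (String run : runs) {
            all.append(run);
        }
        int start = all.indexOf(searched, fromIndex);
        if (start < 0) {
            return null;
        }
        int[] begin = locate(runs, start);
        int[] end = locate(runs, start + searched.length() - 1);
        return new int[] {begin[0], begin[1], end[0], end[1]};
    }

    // Maps a flat character offset to {runIndex, charIndexInRun}; empty runs are skipped.
    private static int[] locate(List<String> runs, int offset) {
        int runStart = 0;
        for (int i = 0; i < runs.size(); i++) {
            int len = runs.get(i).length();
            if (offset < runStart + len) {
                return new int[] {i, offset - runStart};
            }
            runStart += len;
        }
        throw new IndexOutOfBoundsException("offset " + offset);
    }
}
```

Searching "${education}" in the runs "Some ", "${", "education", "}", " text" gives {1, 0, 3, 0}: the match starts at the first character of run 1 and ends at the first character of run 3.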

As someone commented on your question, you cannot control where or when Word splits a paragraph into runs. If the other answer still doesn't help you, here is how I got around it:
First of all, this "solution" has a big problem, but I am posting it anyway in case someone can solve it.
public void mainMethod(XWPFParagraph paragraph) {
    if (paragraph.getRuns().size() > 1) {
        String myRun = unifyRuns(paragraph.getRuns());
        // check for the ${...} placeholders here
        // setText(text, 0) replaces the first text element; setText(text) would append to it
        paragraph.getRuns().get(0).setText(myRun, 0);
        while (paragraph.getRuns().size() > 1) {
            paragraph.removeRun(1);
        }
    }
}

private String unifyRuns(List<XWPFRun> runElements) {
    StringBuilder unifiedRun = new StringBuilder();
    for (XWPFRun run : runElements) {
        unifiedRun.append(run.text()); // text() rather than relying on toString()
    }
    return unifiedRun.toString();
}
The code may contain some errors, since I'm writing it from memory.
The problem here is that Word does not split paragraphs into runs for nothing: when text has different formatting (such as font family or font size), Word puts it in separate runs.
In the text "Here's my bold text", Word will split the text to separate the bold and the normal text. So the code above is a bad solution if you are using POI to create large documents with different kinds of formatting, because merging the runs keeps only the formatting of the first run. In that case you would first need to check whether a run is actually bold, and only then handle the placeholders.
Again, this is a "solution" that I found, and it's not complete yet. Sorry for any English errors; I'm using Google Translate to write this answer.
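One way to address the formatting problem described above is to merge only neighbouring runs that share the same formatting, so a placeholder split into "${", "name", "}" is reunited while bold and normal text stay apart. This is a sketch on a hypothetical StyledRun class, not on XWPFRun; it compares only the bold flag, and a real version would also have to compare font family, size, color and so on before merging.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: merge adjacent runs only when their (simplified) formatting matches.
class RunMerger {
    // Hypothetical stand-in for a run: its text plus one formatting property.
    static final class StyledRun {
        final String text;
        final boolean bold;

        StyledRun(String text, boolean bold) {
            this.text = text;
            this.bold = bold;
        }
    }

    static List<StyledRun> mergeSameStyle(List<StyledRun> runs) {
        List<StyledRun> merged = new ArrayList<>();
        for (StyledRun run : runs) {
            if (!merged.isEmpty() && merged.get(merged.size() - 1).bold == run.bold) {
                // Same formatting as the previous run: append the text to it.
                StyledRun last = merged.remove(merged.size() - 1);
                merged.add(new StyledRun(last.text + run.text, last.bold));
            } else {
                // Formatting changes here, so keep a run boundary.
                merged.add(run);
            }
        }
        return merged;
    }
}
```

A placeholder whose pieces carry different formatting would still stay split, so this reduces the problem rather than removing it.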

How to get hyperlink boundaries of inline words with Aspose Words for Android?

My Android app reads paragraphs and some of their properties from an MS Word document with the Aspose.Words for Android library. It gets the paragraph text, the style name, and whether the paragraph break is a style separator. Some words in a paragraph have hyperlinks. How can I get the start and end boundaries of a hyperlinked word? For example:
This is an inline hyperlink paragraph example, in which the linked word "hyperlink" starts at 18 and ends at 27.
public static ArrayList<String[]> GetBookLinesByTag(String file) {
    ArrayList<String[]> bookLines = new ArrayList<>();
    try {
        Document doc = new Document(file);
        ParagraphCollection paras = doc.getFirstSection().getBody().getParagraphs();
        for (int i = 0; i < paras.getCount(); i++) {
            String styleName = paras.get(i).getParagraphFormat().getStyleName().trim();
            String isStyleSeparator = Integer.toString(paras.get(i).getBreakIsStyleSeparator() ? 1 : 0);
            String content = paras.get(i).toString(SaveFormat.TEXT).trim();
            bookLines.add(new String[]{content, styleName, isStyleSeparator});
        }
    } catch (Exception e) {}
    return bookLines;
}
Edit:
Thanks, Alexey Noskov, solved with your help.
public static ArrayList<String[]> GetBookLinesByTag(String file) {
    ArrayList<String[]> bookLines = new ArrayList<>();
    try {
        Document doc = new Document(file);
        ParagraphCollection paras = doc.getFirstSection().getBody().getParagraphs();
        for (int i = 0; i < paras.getCount(); i++) {
            String styleName = paras.get(i).getParagraphFormat().getStyleName().trim();
            String isStyleSeparator = Integer.toString(paras.get(i).getBreakIsStyleSeparator() ? 1 : 0);
            String content = paras.get(i).toString(SaveFormat.TEXT).trim();
            for (Field field : paras.get(i).getRange().getFields()) {
                if (field.getType() == FieldType.FIELD_HYPERLINK) {
                    FieldHyperlink hyperlink = (FieldHyperlink) field;
                    String urlId = hyperlink.getSubAddress();
                    String urlText = hyperlink.getResult();
                    // Reformat linked text: urlText:urlId
                    content = urlText + ":" + urlId;
                }
            }
            bookLines.add(new String[]{content, styleName, isStyleSeparator});
        }
    } catch (Exception e) {}
    return bookLines;
}

Hyperlinks in MS Word documents are represented as fields. If you press Alt+F9 in MS Word, you will see something like this:
{ HYPERLINK "https://aspose.com" }
Follow the link to learn more about fields in the Aspose.Words document model and in MS Word:
https://docs.aspose.com/display/wordsjava/Introduction+to+Fields
In your case, you need to locate the position of the FieldStart node; this is the start position. Then measure the length of the content between FieldSeparator and FieldEnd; the start position plus that length is the end position.
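As a rough cross-check of the numbers in the question, a plain-string approximation (not the node-based approach described above, and not part of the Aspose.Words API) can compute the same bounds from the paragraph text and the link's displayed text, for example the value of hyperlink.getResult() in the question's code. The class LinkBounds is hypothetical; it assumes the link text occurs only once in the paragraph (or that the first occurrence is the link) and returns an exclusive end index, which matches the 18/27 example.

```java
// Plain-string approximation, not the Aspose.Words node API.
class LinkBounds {
    // Returns {start, endExclusive}, or null if linkText is not in paragraphText.
    static int[] bounds(String paragraphText, String linkText) {
        int start = paragraphText.indexOf(linkText); // first occurrence only
        if (start < 0) {
            return null;
        }
        return new int[] {start, start + linkText.length()};
    }
}
```

For "This is an inline hyperlink paragraph example" and the link text "hyperlink", this yields 18 and 27. If the same word appears earlier in the paragraph, the result is wrong, which is why the node positions are the more reliable source.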
Disclosure: I work on the Aspose.Words team.

How to Highlight Text in a Paragraph in MS Word using POI

I am developing a compare tool for Word documents. Whenever there is a difference between the two documents, I need to highlight the substring in the paragraph. When I try to highlight using a run, it highlights the whole paragraph instead of the substring.
Can you please guide me on how to achieve this for a substring?

I had the same problem. Here is a sample method that highlights a substring contained in a run.
private int highlightSubsentence(String sentence, XWPFParagraph p, int i) {
    // get the current run and save its style (Style is the helper class shown below)
    XWPFRun currentRun = p.getRuns().get(i);
    String currentRunText = currentRun.text();
    int sentenceLength = sentence.length();
    int sentenceBeginIndex = currentRunText.indexOf(sentence);
    if (sentenceBeginIndex < 0) {
        return 0; // the sentence is not in this run: nothing to highlight
    }
    Style currentStyle = new Style(currentRun, p.getDocument().getStyles().getDefaultRunStyle());
    int addedRuns = 0;
    p.removeRun(i);
    //Create, if necessary, a run before the highlighted part
    if (sentenceBeginIndex > 0) {
        XWPFRun before = p.insertNewRun(i);
        before.setText(currentRunText.substring(0, sentenceBeginIndex));
        currentStyle.copyStyle(before); // re-introduce the style of the deleted run
        addedRuns++;
    }
    // highlight the interesting part
    XWPFRun sentenceRun = p.insertNewRun(i + addedRuns);
    sentenceRun.setText(currentRunText.substring(sentenceBeginIndex, sentenceBeginIndex + sentenceLength));
    currentStyle.copyStyle(sentenceRun);
    // reuse the existing rPr/shd: calling addNewRPr() after copyStyle would add a second rPr element
    addShading(sentenceRun, "00FFFF");
    //Create, if necessary, a run after the highlighted part
    if (sentenceBeginIndex + sentenceLength != currentRunText.length()) {
        XWPFRun after = p.insertNewRun(i + addedRuns + 1);
        after.setText(currentRunText.substring(sentenceBeginIndex + sentenceLength));
        currentStyle.copyStyle(after); // re-introduce the style of the deleted run
        addedRuns++;
    }
    return addedRuns;
}
You might need to save the formatting style of the run you delete in order to have the new runs with the old formatting.
Also, if the string you need to highlight is spread over more than one run, you will need to highlight all of them, but the core method is the one I posted.
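The core of highlightSubsentence is cutting one run's text into up to three pieces: an optional part before the substring, the substring itself (which gets the shading), and an optional part after it. A plain-Java sketch of just that split, using the hypothetical names RunSplitter and splitAround, makes the index arithmetic easy to test in isolation.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the split only; no POI objects are involved.
class RunSplitter {
    // Returns the pieces in order, or an empty list if the sentence is not in the run.
    static List<String> splitAround(String runText, String sentence) {
        List<String> pieces = new ArrayList<>();
        int begin = runText.indexOf(sentence);
        if (begin < 0 || sentence.isEmpty()) {
            return pieces;
        }
        int end = begin + sentence.length();
        if (begin > 0) {
            pieces.add(runText.substring(0, begin)); // becomes the run before the highlight
        }
        pieces.add(runText.substring(begin, end));   // becomes the shaded run
        if (end < runText.length()) {
            pieces.add(runText.substring(end));      // becomes the run after the highlight
        }
        return pieces;
    }
}
```

The number of pieces minus one corresponds to the addedRuns value that highlightSubsentence returns, since the original run is removed and each piece becomes a new run.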
On the Style Question:
I had a class Style that saved all the styles of the old run in private fields (for the respective types, look at what the XWPFRun getters return).
These are the sub-styles that I needed; there are obviously others I didn't cover.
Style(XWPFRun run, XWPFDefaultRunStyle defaultRunStyle) {
    fontSize = run.getFontSize();
    fontFamily = run.getFontFamily();
    bold = run.isBold();
    italic = run.isItalic();
    strike = run.isStrikeThrough();
    underline = run.getUnderline();
    color = run.getColor();
    shadingColor = getShadeColor(run);
    highlightColor = getHighlightedColor(run);
}
I copied the old style in the new run when needed.
public void copyStyle(XWPFRun newRun) {
    if (fontSize != -1) {
        newRun.setFontSize(fontSize);
    }
    // skip properties the old run did not set explicitly (their getters return null)
    if (fontFamily != null) {
        newRun.setFontFamily(fontFamily);
    }
    newRun.setBold(bold);
    newRun.setItalic(italic);
    newRun.setStrikeThrough(strike);
    if (color != null) {
        newRun.setColor(color);
    }
    newRun.setUnderline(underline);
    if (shadingColor != null) {
        addShading(newRun, shadingColor);
    }
    if (highlightColor != null) {
        addHighlight(newRun, highlightColor);
    }
}
To re-add shading and highlighting that were already present, I used:
public static void addHighlight(XWPFRun run, STHighlightColor.Enum hexColor) {
    if (run.getCTR().getRPr() == null) {
        run.getCTR().addNewRPr();
    }
    if (run.getCTR().getRPr().getHighlight() == null) {
        run.getCTR().getRPr().addNewHighlight();
    }
    run.getCTR().getRPr().getHighlight().setVal(hexColor);
}

public static void addShading(XWPFRun run, Object hexColor) {
    if (run.getCTR().getRPr() == null) {
        run.getCTR().addNewRPr();
    }
    if (run.getCTR().getRPr().getShd() == null) {
        run.getCTR().getRPr().addNewShd();
    }
    run.getCTR().getRPr().getShd().setFill(hexColor);
}

Eclipse select text from under the cursor/caret and return it

I am working on an Eclipse plugin and adding some features to my editor. I have this method, which gets the highlighted text from the editor and returns it as a string:
public String getCurrentSelection() {
    IEditorPart part = PlatformUI.getWorkbench().getActiveWorkbenchWindow()
            .getActivePage().getActiveEditor();
    if (part instanceof ITextEditor) {
        final ITextEditor editor = (ITextEditor) part;
        ISelection sel = editor.getSelectionProvider().getSelection();
        // check against the interface that is actually cast to
        if (sel instanceof ITextSelection) {
            ITextSelection textSel = (ITextSelection) sel;
            return textSel.getText();
        }
    }
    return null;
}
But now I want the method to select the whole word when I place the cursor inside it, and return that word as a string.
Apart from a complicated algorithm where I parse the entire editor, get the cursor location, search for spaces to the left and right, and so on, is there an easier way to get the word under the cursor as a string?

I managed to get something working. For anyone facing the same problem, the following code works (for me, at least):
private String getTextFromCursor() {
    IEditorPart part = PlatformUI.getWorkbench().getActiveWorkbenchWindow()
            .getActivePage().getActiveEditor();
    TextEditor editor = null;
    if (part instanceof TextEditor) {
        editor = (TextEditor) part;
    }
    if (editor == null) {
        return "";
    }
    StyledText text = (StyledText) editor.getAdapter(Control.class);
    int caretOffset = text.getCaretOffset();
    IDocumentProvider dp = editor.getDocumentProvider();
    IDocument doc = dp.getDocument(editor.getEditorInput());
    // CWordFinder is part of Eclipse CDT's internal UI text support
    IRegion findWord = CWordFinder.findWord(doc, caretOffset);
    String text2 = "";
    if (findWord.getLength() != 0) {
        // StyledText.getText(start, end) treats end as inclusive, hence the - 1
        text2 = text.getText(findWord.getOffset(), findWord.getOffset() + findWord.getLength() - 1);
    }
    return text2;
}
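If CDT's CWordFinder is not available, a minimal stand-in is to expand left and right from the caret offset while the characters belong to a word. This sketch assumes that "word" means Java identifier characters (Character.isJavaIdentifierPart); the class WordAtCaret and its findWord method are hypothetical, not Eclipse API. Like the IRegion in the answer, it returns the word's offset and length, with length 0 when the caret is not touching a word.

```java
// Hypothetical stand-in for CWordFinder.findWord, working on plain text.
class WordAtCaret {
    // Returns {offset, length} of the identifier-like word around caretOffset.
    static int[] findWord(CharSequence text, int caretOffset) {
        int start = caretOffset;
        // walk left while the previous character is part of a word
        while (start > 0 && Character.isJavaIdentifierPart(text.charAt(start - 1))) {
            start--;
        }
        int end = caretOffset;
        // walk right while the current character is part of a word
        while (end < text.length() && Character.isJavaIdentifierPart(text.charAt(end))) {
            end++;
        }
        return new int[] {start, end - start};
    }
}
```

With the IDocument from the answer, it could be called as WordAtCaret.findWord(doc.get(), caretOffset), and the word then read with doc.get().substring(offset, offset + length).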
