Removing an XWPFParagraph keeps the paragraph symbol (¶) for it - java

I am trying to remove a set of contiguous paragraphs from a Microsoft Word document, using Apache POI.
From what I have understood, deleting a paragraph is possible by removing all of its runs, this way:
/*
 * Deletes the given paragraph.
 */
public static void deleteParagraph(XWPFParagraph p) {
    if (p != null) {
        List<XWPFRun> runs = p.getRuns();
        // Delete all the runs
        for (int i = runs.size() - 1; i >= 0; i--) {
            p.removeRun(i);
        }
        p.setPageBreak(false); // Remove the eventual page break
    }
}
It does work, but with a strange side effect: the block of removed paragraphs does not disappear from the document. Instead, it is converted into a set of empty lines, as if every paragraph had been replaced by a line break.
When I print the paragraphs' content from code, I see a space for each removed paragraph. Looking at the document itself with formatting marks displayed, I can see this:
The vertical column of ¶ corresponds to the block of deleted elements.
Do you have any idea why this happens? I'd like my paragraphs to be removed completely.
I also tried replacing the text (with setText()) and resetting any spacing and indentation that might be added automatically, this way:
p.setSpacingAfter(0);
p.setSpacingAfterLines(0);
p.setSpacingBefore(0);
p.setSpacingBeforeLines(0);
p.setIndentFromLeft(0);
p.setIndentFromRight(0);
p.setIndentationFirstLine(0);
p.setIndentationLeft(0);
p.setIndentationRight(0);
But with no luck.

I would delete the paragraphs themselves rather than only the runs inside them. Removing all runs leaves an empty paragraph behind, which is why the ¶ marks remain. Deleting paragraphs is not part of the apache poi high level API. But XWPFDocument.getDocument().getBody() gives access to the low level CTBody, which has a removeP(int i) method.
Example:
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import java.awt.Desktop;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
public class WordRemoveParagraph {
    /*
     * Deletes the given paragraph.
     */
    public static void deleteParagraph(XWPFParagraph p) {
        XWPFDocument doc = p.getDocument();
        int pPos = doc.getPosOfParagraph(p);
        //doc.getDocument().getBody().removeP(pPos);
        doc.removeBodyElement(pPos);
    }
    public static void main(String[] args) throws IOException, InvalidFormatException {
        XWPFDocument doc = new XWPFDocument(new FileInputStream("source.docx"));
        int pNumber = doc.getParagraphs().size() - 1;
        while (pNumber >= 0) {
            XWPFParagraph p = doc.getParagraphs().get(pNumber);
            if (p.getParagraphText().contains("delete")) {
                deleteParagraph(p);
            }
            pNumber--;
        }
        FileOutputStream out = new FileOutputStream("result.docx");
        doc.write(out);
        out.close();
        doc.close();
        System.out.println("Done");
        Desktop.getDesktop().open(new File("result.docx"));
    }
}
This deletes all paragraphs from the document source.docx where the text contains "delete" and saves the result in result.docx.
Edited:
Although doc.getDocument().getBody().removeP(pPos); works, it does not update the XWPFDocument's paragraphs list. So it breaks paragraph iterators and any other access to that list, since the list is only rebuilt when the document is read again.
So the better approach is to use doc.removeBodyElement(pPos); instead. removeBodyElement(int pos) does exactly the same as doc.getDocument().getBody().removeP(pos); if pos points to a paragraph in the document body, since that paragraph is a body element too. In addition, it updates the XWPFDocument's paragraphs list.
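The loop in main above walks the paragraph list from the last index down to 0. Here is a minimal sketch of why that direction matters, using a plain List<String> as a stand-in for the document's paragraphs. The class and method names are illustrative only and are not part of Apache POI. Removing an element only shifts the elements after it, so the indexes that have not been visited yet stay valid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BackwardRemoval {

    // Returns a copy of items without the entries that contain marker.
    // Walking from the last index to the first means a removal never
    // changes the index of an element that still has to be checked.
    public static List<String> removeMatching(List<String> items, String marker) {
        List<String> result = new ArrayList<>(items);
        for (int i = result.size() - 1; i >= 0; i--) {
            if (result.get(i).contains(marker)) {
                result.remove(i); // remove(int index), not remove(Object)
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> paragraphs =
                Arrays.asList("keep", "delete me", "delete me too", "also keep");
        System.out.println(removeMatching(paragraphs, "delete"));
    }
}
```

A forward loop that removes by index would have to avoid incrementing the index after each removal, otherwise it would skip the element that slides into the freed position.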

When you are inside of a table you need to use the functions of the XWPFTableCell instead of the XWPFDocument:
cell.removeParagraph(cell.getParagraphs().indexOf(para));

Related

Apache Poi - Java-: How to add text containing blank lines as separate paragraphs to a Word document using Apache POI?

I am unable to add text containing blank lines as separate paragraphs to a Word document.
For example, I try to add the following text, which contains 3 different paragraphs separated by blank lines:
Some text here.
Another text here.
Another one here.
What I get is "1. Some text here. 2. Another text here. 3. Another one here.", as if they were all the same paragraph.
Is it possible to add a text containing blank lines as separate paragraphs to a Word document using Apache POI?
public static void addingMyParagraphs(XWPFDocument doc, String text) throws InvalidFormatException, IOException {
    XWPFParagraph p = doc.createParagraph();
    XWPFRun run = p.createRun();
    run.setText(text);
    run.setFontFamily("Times new Roman");
}
In the method below, the myText variable is a TextArea that is part of a JavaFX application.
public void CreatingDocument() throws IOException, InvalidFormatException {
    String theText = myText.getText();
    addingMyParagraphs(doc, theText);
    FileOutputStream output = new FileOutputStream("MyDocument.docx");
    doc.write(output);
    output.close();
}
}
You need to split your text into "paragraphs" and add each paragraph separately to your WORD document. This has nothing to do with JavaFX.
Here is an example that uses text blocks to simulate the text that is entered into the [JavaFX] TextArea. Explanations after the code.
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
public class PoiWord0 {
    public static void main(String[] args) {
        // The blank lines between the entries are the paragraph delimiters
        String text = """
                1. Some text here.

                2. Another text here.

                3. Another one here.
                """;
        // Split on blank lines (lines holding only spaces or tabs)
        String[] paras = text.split("(?m)^[ \\t]*\\r?\\n");
        try (XWPFDocument doc = new XWPFDocument();
             FileOutputStream output = new FileOutputStream("MyDocument.docx")) {
            for (String para : paras) {
                // One Word paragraph per block of text
                XWPFParagraph p = doc.createParagraph();
                XWPFRun run = p.createRun();
                run.setText(para.stripTrailing());
            }
            doc.write(output);
        }
        catch (IOException xIo) {
            xIo.printStackTrace();
        }
    }
}
I assume that a paragraph delimiter is a blank line, so I split the text on the blank lines. This still leaves the trailing newline character in each element of the array, so I use stripTrailing() to remove it.
Now I have an array of paragraphs, so I simply add a new paragraph to the [WORD] document for each array element.
Note that the above code was written using JDK 15; text blocks became a standard language feature in that release.
The regex for splitting the text came from the SO question entitled "Remove empty line from a multi-line string with Java".
try-with-resources was added in Java 7.
stripTrailing() was added in JDK 11.
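For readers on an older JDK, the splitting step can be sketched without text blocks or stripTrailing(). This is only an illustration under a few assumptions: the class and method names are made up for this example, it uses the same regex as the answer, and it additionally drops empty chunks, which the answer's code does not do. It contains no Apache POI calls, so each returned string would still have to be passed to createParagraph()/createRun()/setText() as shown above.

```java
import java.util.ArrayList;
import java.util.List;

public class ParagraphSplitter {

    // Splits text on blank lines (lines containing only spaces or tabs)
    // and returns the non-empty chunks with trailing whitespace removed.
    public static List<String> splitParagraphs(String text) {
        List<String> result = new ArrayList<>();
        for (String part : text.split("(?m)^[ \\t]*\\r?\\n")) {
            // Stand-in for stripTrailing(), which needs JDK 11
            String trimmed = part.replaceAll("\\s+$", "");
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "1. Some text here.\n\n2. Another text here.\n\n3. Another one here.\n";
        for (String para : splitParagraphs(text)) {
            System.out.println("[" + para + "]");
        }
    }
}
```

Dropping empty chunks means that several blank lines in a row do not produce empty paragraphs in the document; whether that is wanted depends on the input.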

How to delete first character after table using POI

I am attempting to format a Word document that has multiple tables. I need to delete the line breaks that occur after each table. How do I achieve this programmatically in Java?
I am currently trying the following code, and it does not work:
org.apache.xmlbeans.XmlCursor cursor = xwpfTable.getCTTbl().newCursor();
cursor.toEndToken();
cursor.toNextToken();
cursor.removeChars(2);
Further clarification: we receive unformatted Word files from an external source. We need to eliminate the paragraph (the extra line between tables) when the table has only 1 row. Currently I am using a macro and achieve this with the following code:
For Each t In doc.Tables
    Set myrange = doc.Characters(t.Range.End + 1)
    If myrange.Text = Chr(13) Then
        myrange.Delete
    End If
Next t
Thanks in advance
What I am trying to remove:
According to your screenshot, you want to remove the empty paragraphs that are placed immediately after tables.
This is possible, although I am wondering why those paragraphs are there. After removing them, Word no longer treats the tables as separate tables but as rows within one table. Is this what you want?
Anyway, as said, removing the empty paragraphs after the tables is possible. To do so, you can traverse the body elements of the document. If an XWPFTable is immediately followed by an XWPFParagraph and this XWPFParagraph does not have any text runs in it, then remove that XWPFParagraph from the document.
Example:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.xwpf.usermodel.*;
public class WordRemoveEmptyParagraphs {
    public static void main(String[] args) throws Exception {
        XWPFDocument document = new XWPFDocument(new FileInputStream("./WordTables.docx"));
        int thisBodyElementPos = 0;
        int nextBodyElementPos = 1;
        IBodyElement thisBodyElement = null;
        IBodyElement nextBodyElement = null;
        if (document.getBodyElements().size() > 1) { // document must have at least two body elements
            do {
                thisBodyElement = document.getBodyElements().get(thisBodyElementPos);
                nextBodyElement = document.getBodyElements().get(nextBodyElementPos);
                if (thisBodyElement instanceof XWPFTable && nextBodyElement instanceof XWPFParagraph) {
                    XWPFParagraph paragraph = (XWPFParagraph)nextBodyElement;
                    if (paragraph.getRuns().size() == 0) { // if paragraph does not have any text runs in it
                        document.removeBodyElement(nextBodyElementPos);
                    }
                }
                thisBodyElementPos++;
                nextBodyElementPos = thisBodyElementPos + 1;
            } while (nextBodyElementPos < document.getBodyElements().size());
        }
        FileOutputStream out = new FileOutputStream("./WordTablesChanged.docx");
        document.write(out);
        out.close();
        document.close();
    }
}

Setting a style on an XWPFRun by style ID

I am trying to apply named styles to individual runs in an XWPFDocument, and I am seeing strange results.
The javadoc for XWPFRun describes the setStyle method, but the style appears to not be applied in the final document. I say appears, because in the QuickLook preview in Finder, the style does appear on the run as expected. In the example below, I am applying a named style to the hyperlink, which appears as expected in the preview on the right, but not in Word on the left.
So clearly POI is actually doing something to apply the style, but Word is not rendering the style. I tried several other .docx readers, all of which produced similar results.
So I started peeling apart the style and applying the attributes to the run individually, which does work in Word. This is one of those things that seems like I must just be missing something. I can of course write a routine that can read in an existing style and apply it to a run like this, but I would much rather not. I have searched for answers, but this part of POI seems to be very much a work in progress.
So am I just missing something obvious, or am I going to just have to suck it up and do this the painful way?
//This does not work.
run.setStyle(styleId);
if(docStyles.styleExist(styleId))
{
    /*
        In order to set the style on the run, we need to manually
        determine the properties of the style, and set them on the
        run individually.
        This makes no sense.
    */
    XWPFStyle style = docStyles.getStyle(styleId);
    CTStyle ctStyle = style.getCTStyle();
    CTRPr ctRpr = ctStyle.getRPr();
    if (ctRpr.isSetB())
    {
        CTOnOff onOff = ctRpr.getB();
        STOnOff.Enum stOnOff = onOff.getVal();
        boolean bold = (stOnOff == STOnOff.TRUE);
        run.setBold(bold);
    }
    if(ctRpr.isSetU())
    {
        CTUnderline underline = ctRpr.getU();
        STUnderline.Enum val = underline.getVal();
        UnderlinePatterns underlinePattern = UnderlinePatterns.valueOf(val.intValue());
        run.setUnderline(underlinePattern);
    }
    // ... //
}
else
{
    System.out.println("404: Style not found");
}
If the XWPFDocument is created from a template, then this template must already contain the named style "Hyperlink". That means /word/styles.xml must contain the entry in the latent styles
...
<w:latentStyles...
...
<w:lsdException w:name="Hyperlink" w:qFormat="1"/>
...
as well as the style definition
...
<w:style w:type="character" w:styleId="Hyperlink">
    <w:name w:val="Hyperlink"/>
    <w:basedOn w:val="..."/>
    <w:uiPriority w:val="99"/>
    <w:unhideWhenUsed/>
    <w:qFormat/>
    <w:rsid w:val="00072FE4"/>
    <w:rPr>
        <w:color w:val="0000FF" w:themeColor="hyperlink"/>
        <w:u w:val="single"/>
    </w:rPr>
</w:style>
...
If that is true then the following code works for me using apache poi 4.0.0:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTHyperlink;
public class CreateWordStyledHyperlinkRunFromTemplate {
    static XWPFHyperlinkRun createHyperlinkRun(XWPFParagraph paragraph, String uri) throws Exception {
        String rId = paragraph.getPart().getPackagePart().addExternalRelationship(
            uri,
            XWPFRelation.HYPERLINK.getRelation()
        ).getId();
        CTHyperlink cthyperLink = paragraph.getCTP().addNewHyperlink();
        cthyperLink.setId(rId);
        cthyperLink.addNewR();
        return new XWPFHyperlinkRun(
            cthyperLink,
            cthyperLink.getRArray(0),
            paragraph
        );
    }
    public static void main(String[] args) throws Exception {
        XWPFDocument document = new XWPFDocument(new FileInputStream("Template.docx"));
        XWPFParagraph paragraph = document.createParagraph();
        XWPFRun run = paragraph.createRun();
        run.setText("This is a text paragraph having a link to Google ");
        XWPFHyperlinkRun hyperlinkrun = createHyperlinkRun(paragraph, "https://www.google.de");
        hyperlinkrun.setText("https://www.google.de");
        XWPFStyles styles = document.getStyles();
        if (styles.styleExist("Hyperlink")) {
            System.out.println("Style Hyperlink exists."); //Template must contain named style "Hyperlink" already
            hyperlinkrun.setStyle("Hyperlink");
        } else {
            hyperlinkrun.setColor("0000FF");
            hyperlinkrun.setUnderline(UnderlinePatterns.SINGLE);
        }
        run = paragraph.createRun();
        run.setText(" in it.");
        FileOutputStream out = new FileOutputStream("CreateWordStyledHyperlinkRunFromTemplate.docx");
        document.write(out);
        out.close();
        document.close();
    }
}
Note that in apache poi 4.0.0 there is no way to create an XWPFHyperlinkRun other than using the low level org.openxmlformats.schemas.wordprocessingml.x2006.main.CTHyperlink class.

Copy contents from docx with bullets intact with Apache POI

I am trying to copy contents from a docx file to the clipboard eventually. The code I have come up with so far is:
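package config;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.xmlbeans.XmlException;
public class buffer {
    public static void main(String[] args) throws IOException, XmlException {
        XWPFDocument srcDoc = new XWPFDocument(new FileInputStream("D:\\rules.docx"));
        XWPFDocument destDoc = new XWPFDocument();
        OutputStream out = new FileOutputStream("D:\\test.docx");
        for (IBodyElement bodyElement : srcDoc.getBodyElements()) {
            // Only paragraphs are copied; tables and other body elements are skipped
            if (!(bodyElement instanceof XWPFParagraph)) {
                continue;
            }
            XWPFParagraph srcPr = (XWPFParagraph) bodyElement;
            XWPFParagraph dstPr = destDoc.createParagraph();
            dstPr.createRun();
            int pos = destDoc.getParagraphs().size() - 1;
            destDoc.setParagraph(srcPr, pos);
        }
        destDoc.write(out);
        out.close();
    }
}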
This does fetch the bullets but numbers them. I want to retain the original bullet format. Is there a way to do this?
You'll need to handle the numbering definition (in the numbering part) correctly.
The most reliable thing to do would be to copy the definition (both the instance list and the abstract one) across, and renumber it (i.e. give it a new ID) so that it is unique.
Then of course you'll need to update the IDs in your paragraphs to match.
Note that the above is a solution only for the question you have asked.
You'll run into problems if your content contains a rel to some other part (e.g. an image). And you're not handling the style definitions etc.

How to edit docx using Java

I need to replace certain words or phrases in a docx file and save it under another name. I know that my problem is not unique and I tried to find a solution on the web. But I still can't get the result that I need.
I found two ways to solve my task but hit a dead end in each case.
1. Unpack the docx like a zip file, change the XML with the main content and pack it into an archive again. But after those manipulations I can't open the changed docx in MS Word. It is odd because I can do the same steps by hand (without Java, using WinRar) and get a correct result file.
So can you explain how to archive the docx content with Java so that I get a correct file?
2. Use an external API. I was advised to use the docx4j Java library. But all that I can do with it is replace a label (like ${label}) in a template with other words (I used the VariableReplace sample). But I want to change arbitrary words without using a template with labels.
I hope for some help.
I had this code. I hope it helps you resolve your problem. With it, you can read a .docx, find the word that you want to change, change it, and save the resulting paragraphs in a new document.
//WriteDocx.java
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.*;
import java.io.*;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
public class WriteDocx
{
    public static void main(String[] args) throws Exception {
        int count = 0;
        XWPFDocument document = new XWPFDocument();
        XWPFDocument docx = new XWPFDocument(new FileInputStream("Bonjour1.docx"));
        // Extract the plain text of the source document (its formatting is not carried over)
        XWPFWordExtractor we = new XWPFWordExtractor(docx);
        String text = we.getText();
        if (text.contains("SMS")) {
            text = text.replace("SMS", "sms");
            System.out.println(text);
        }
        // Count the line breaks to know how many paragraphs to create
        char[] c = text.toCharArray();
        for (int i = 0; i < c.length; i++) {
            if (c[i] == '\n') {
                count++;
            }
        }
        StringTokenizer st = new StringTokenizer(text, "\n");
        // Heading paragraph of the new document
        XWPFParagraph para = document.createParagraph();
        para.setAlignment(ParagraphAlignment.CENTER);
        XWPFRun run = para.createRun();
        run.setBold(true);
        run.setFontSize(36);
        run.setText("Apache POI works well!");
        List<XWPFParagraph> paragraphs = new ArrayList<XWPFParagraph>();
        int k = 0;
        for (k = 0; k < count + 1; k++) {
            paragraphs.add(document.createParagraph());
        }
        // Write one line of the replaced text into each paragraph
        k = 0;
        while (st.hasMoreElements()) {
            paragraphs.get(k).setAlignment(ParagraphAlignment.LEFT);
            paragraphs.get(k).setSpacingAfter(0);
            paragraphs.get(k).setSpacingBefore(0);
            run = paragraphs.get(k).createRun();
            run.setText(st.nextElement().toString());
            k++;
        }
        // Close the output stream once the document has been written
        try (FileOutputStream out = new FileOutputStream("test2.docx")) {
            document.write(out);
        }
    }
}
PS: In the line XWPFDocument docx = new XWPFDocument(new FileInputStream("Bonjour1.docx")),
you must replace "Bonjour1.docx" with the name of the file in which you want to replace certain words or phrases.
I use the Apache POI library.
I took some of the code from the site "Handling MS Word Documents using Apache POI".
UPDATE
If you want to change arbitrary words, you can do that easily enough with docx4j.
But first you need to find them.
You can find your words using an XPath query, or by traversing the document tree in Java.
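For the first approach mentioned in the question (treating the .docx as a zip archive), one thing worth checking is that every entry in the new archive keeps exactly the name it had in the original, with no extra top-level folder. The following is a minimal sketch under stated assumptions: it uses only java.util.zip, the class and method names are made up for this example, and it assumes the main content lives in word/document.xml, which is the usual location in files written by Word. A plain string replacement like this only finds a phrase that sits inside a single text run in the XML; Word often splits visible text across several runs, so this is not a general solution.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class DocxZipRewrite {

    // Copies every zip entry under its original name and replaces
    // the text only inside word/document.xml.
    public static void rewrite(InputStream in, OutputStream out,
                               String find, String replacement) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(in);
             ZipOutputStream zout = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // A fresh ZipEntry avoids reusing stale size/CRC metadata
                zout.putNextEntry(new ZipEntry(entry.getName()));
                byte[] data = readAll(zin);
                if (entry.getName().equals("word/document.xml")) {
                    String xml = new String(data, StandardCharsets.UTF_8);
                    data = xml.replace(find, replacement).getBytes(StandardCharsets.UTF_8);
                }
                zout.write(data);
                zout.closeEntry();
            }
        }
    }

    // Reads the current stream (here: the current zip entry) to its end.
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DocxZipRewrite input.docx output.docx SMS sms
        rewrite(new FileInputStream(args[0]), new FileOutputStream(args[1]), args[2], args[3]);
    }
}
```

If the replacement text can contain characters such as & or <, it would have to be XML-escaped first, which this sketch does not do.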
