Create a table in a Word document at a specific bookmark using docx4j (Java)

I'm creating a table at a specific bookmark location (the table is displayed in the Word document), but after converting the document to PDF the table is not shown in the PDF, because the bookmark is inside a w:p:
<w:p w:rsidR="00800BD9" w:rsidRDefault="00800BD9">
<w:bookmarkStart w:id="0" w:name="abc"/>
<w:bookmarkEnd w:id="0"/>
</w:p>
Now I want to find the paragraph (using the bookmark), then replace the paragraph with the table.
Does anybody have any suggestion?
Here is my code:
private void replaceBookmarkContents(List<Object> paragraphs, Map<DataFieldName, String> data) throws Exception {
Tbl table = TblFactory.createTable(3,3,9600);
RangeFinder rt = new RangeFinder("CTBookmark", "CTMarkupRange");
new TraversalUtil(paragraphs, rt);
for (CTBookmark bm : rt.getStarts()) {
// do we have data for this one?
if (bm.getName()==null) continue;
String value = data.get(new DataFieldName(bm.getName()));
if (value==null) continue;
try {
// Can't just remove the object from the parent,
// since in the parent, it may be wrapped in a JAXBElement
List<Object> theList = null;
if (bm.getParent() instanceof P) {
System.out.println("OK!");
theList = ((ContentAccessor)(bm.getParent())).getContent();
} else {
continue;
}
int rangeStart = -1;
int rangeEnd=-1;
int i = 0;
for (Object ox : theList) {
Object listEntry = XmlUtils.unwrap(ox);
if (listEntry.equals(bm)) {
if (DELETE_BOOKMARK) {
rangeStart=i;
} else {
rangeStart=i+1;
}
} else if (listEntry instanceof CTMarkupRange) {
if ( ((CTMarkupRange)listEntry).getId().equals(bm.getId())) {
if (DELETE_BOOKMARK) {
rangeEnd=i;
} else {
rangeEnd=i-1;
}
break;
}
}
i++;
}
if (rangeStart>0) {
// Delete the bookmark range
for (int j=rangeStart; j>0; j--) {
theList.remove(j);
}
// now add a run
org.docx4j.wml.R run = factory.createR();
run.getContent().add(table);
theList.add(rangeStart, run);
}
} catch (ClassCastException cce) {
log.error(cce.getMessage(), cce);
}
}
}

The problem with blindly replacing a bookmark inside a paragraph with a table is that you'll end up with w:p/w:tbl (or with your code, w:p/w:r/w:tbl!) which is not allowed by the file format.
To avoid this issue, you could make the bookmark a sibling of the paragraph, or you could change your code so that once you have found a bookmark inside a paragraph, if you are replacing it with a table, you replace the parent p instead.
Note that bookmark find/replace is a brittle basis for document generation. If your requirements are other than modest, you'd be better off using content controls instead.
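At the XML level, the second option (replacing the parent paragraph rather than the bookmark itself) could be sketched as follows. This is a minimal illustration using the JDK's DOM API on a WordprocessingML fragment, not docx4j code; the class name, the method names and the empty w:tbl placeholder are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BookmarkParagraphReplacer {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Parses a namespace-aware WordprocessingML fragment. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse fragment", e);
        }
    }

    /**
     * Finds the bookmark with the given name and, if its parent is a w:p,
     * replaces that whole paragraph with a new (empty) w:tbl element.
     * Returns true if a replacement was made.
     */
    static boolean replaceParagraphWithTable(Document doc, String bookmarkName) {
        NodeList starts = doc.getElementsByTagNameNS(W_NS, "bookmarkStart");
        for (int i = 0; i < starts.getLength(); i++) {
            Element bm = (Element) starts.item(i);
            if (!bookmarkName.equals(bm.getAttributeNS(W_NS, "name"))) {
                continue;
            }
            Node p = bm.getParentNode();
            if (p == null || !"p".equals(p.getLocalName())) {
                continue;
            }
            // Placeholder table: real code would build rows and cells here
            Element tbl = doc.createElementNS(W_NS, "w:tbl");
            // The table takes the paragraph's place, so it ends up as a sibling
            // of the other paragraphs instead of a child of one
            p.getParentNode().replaceChild(tbl, p);
            return true;
        }
        return false;
    }
}
```

In docx4j the same idea would apply to the content list of the paragraph's parent rather than to the paragraph's own content list.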

Related

Apache POI: ${my_placeholder} is treated as three different runs

I have a .docx template with placeholders to be filled, such as ${programming_language}, ${education}, etc.
The placeholder keywords must be easily distinguished from the other plain words, hence they are enclosed with ${ }.
for (XWPFTable table : doc.getTables()) {
for (XWPFTableRow row : table.getRows()) {
for (XWPFTableCell cell : row.getTableCells()) {
for (XWPFParagraph paragraph : cell.getParagraphs()) {
for (XWPFRun run : paragraph.getRuns()) {
System.out.println("run text: " + run.text());
/** replace text here, etc. */
}
}
}
}
}
I want to extract the placeholders together with the enclosing ${ } characters. The problem is that the enclosing characters seem to be treated as separate runs:
run text: ${
run text: programming_language
run text: }
run text: Some plain text here
run text: ${
run text: education
run text: }
Instead, I would like to achieve the following effect:
run text: ${programming_language}
run text: Some plain text here
run text: ${education}
I have tried using other enclosing characters, such as: { }, < >, # #, etc.
I do not want to do some weird concatenations of runs, etc. I want to have it in a single XWPFRun.
If I cannot find a proper solution, I will probably just name the placeholders like this: VAR_PROGRAMMING_LANGUAGE, VAR_EDUCATION.
Apache POI (4.1.2 at the time of writing) provides TextSegment to deal with these Word text-run issues. XWPFParagraph.searchText searches for a string in a paragraph and returns a TextSegment. This gives access to the begin run and the end run of that text in the paragraph (BeginRun and EndRun), and to the start character position in the begin run and the end character position in the end run (BeginChar and EndChar).
It also gives access to the index of the text element within the text run (BeginText and EndText). This should normally be 0, because text runs usually contain only one text element.
With this, we can do the following:
Replace the found partial string in the begin run with the replacement. To do so, take the text that precedes the searched string and append the replacement to it. After that, the begin run contains the whole replacement.
Delete all text runs between the begin run and the end run, since they contain parts of the searched string that are no longer needed.
Keep only the text after the searched string in the end run.
This way we can replace text that is spread over multiple text runs.
The following example shows this.
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;
public class WordReplaceTextSegment {
static public void replaceTextSegment(XWPFParagraph paragraph, String textToFind, String replacement) {
TextSegment foundTextSegment = null;
PositionInParagraph startPos = new PositionInParagraph(0, 0, 0);
while((foundTextSegment = paragraph.searchText(textToFind, startPos)) != null) { // search all text segments having text to find
System.out.println(foundTextSegment.getBeginRun()+":"+foundTextSegment.getBeginText()+":"+foundTextSegment.getBeginChar());
System.out.println(foundTextSegment.getEndRun()+":"+foundTextSegment.getEndText()+":"+foundTextSegment.getEndChar());
// maybe there is text before textToFind in begin run
XWPFRun beginRun = paragraph.getRuns().get(foundTextSegment.getBeginRun());
String textInBeginRun = beginRun.getText(foundTextSegment.getBeginText());
String textBefore = textInBeginRun.substring(0, foundTextSegment.getBeginChar()); // we only need the text before
// maybe there is text after textToFind in end run
XWPFRun endRun = paragraph.getRuns().get(foundTextSegment.getEndRun());
String textInEndRun = endRun.getText(foundTextSegment.getEndText());
String textAfter = textInEndRun.substring(foundTextSegment.getEndChar() + 1); // we only need the text after
if (foundTextSegment.getEndRun() == foundTextSegment.getBeginRun()) {
textInBeginRun = textBefore + replacement + textAfter; // if we have only one run, we need the text before, then the replacement, then the text after in that run
} else {
textInBeginRun = textBefore + replacement; // else we need the text before followed by the replacement in begin run
endRun.setText(textAfter, foundTextSegment.getEndText()); // and the text after in end run
}
beginRun.setText(textInBeginRun, foundTextSegment.getBeginText());
// runs between begin run and end run needs to be removed
for (int runBetween = foundTextSegment.getEndRun() - 1; runBetween > foundTextSegment.getBeginRun(); runBetween--) {
paragraph.removeRun(runBetween); // remove not needed runs
}
}
}
public static void main(String[] args) throws Exception {
XWPFDocument doc = new XWPFDocument(new FileInputStream("source.docx"));
String textToFind = "${This is the text to find}"; // might be in different runs
String replacement = "Replacement text";
for (XWPFParagraph paragraph : doc.getParagraphs()) { //go through all paragraphs
if (paragraph.getText().contains(textToFind)) { // paragraph contains text to find
replaceTextSegment(paragraph, textToFind, replacement);
}
}
FileOutputStream out = new FileOutputStream("result.docx");
doc.write(out);
out.close();
doc.close();
}
}
The code above does not work in all cases because XWPFParagraph.searchText has bugs, so here is a better searchText method:
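/**
* This method parses the paragraph and searches for the string searched.
* If it finds the string, it returns a TextSegment describing the position
* of the string in the paragraph; otherwise it returns null.
* Requires org.apache.xmlbeans.XmlCursor and org.apache.xmlbeans.XmlObject to be imported.
*
* @param paragraph the paragraph to search in
* @param searched the string to search for
* @param startPos the position at which the search starts
*/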
static TextSegment searchText(XWPFParagraph paragraph, String searched, PositionInParagraph startPos) {
int startRun = startPos.getRun(),
startText = startPos.getText(),
startChar = startPos.getChar();
int beginRunPos = 0, candCharPos = 0;
boolean newList = false;
//CTR[] rArray = paragraph.getRArray(); //This does not contain all runs. It lacks hyperlink runs for ex.
java.util.List<XWPFRun> runs = paragraph.getRuns();
int beginTextPos = 0, beginCharPos = 0; //must be outside the for loop
//for (int runPos = startRun; runPos < rArray.length; runPos++) {
for (int runPos = startRun; runPos < runs.size(); runPos++) {
//int beginTextPos = 0, beginCharPos = 0, textPos = 0, charPos; //int beginTextPos = 0, beginCharPos = 0 must be outside the for loop
int textPos = 0, charPos;
//CTR ctRun = rArray[runPos];
CTR ctRun = runs.get(runPos).getCTR();
XmlCursor c = ctRun.newCursor();
c.selectPath("./*");
try {
while (c.toNextSelection()) {
XmlObject o = c.getObject();
if (o instanceof CTText) {
if (textPos >= startText) {
String candidate = ((CTText) o).getStringValue();
if (runPos == startRun) {
charPos = startChar;
} else {
charPos = 0;
}
for (; charPos < candidate.length(); charPos++) {
if ((candidate.charAt(charPos) == searched.charAt(0)) && (candCharPos == 0)) {
beginTextPos = textPos;
beginCharPos = charPos;
beginRunPos = runPos;
newList = true;
}
if (candidate.charAt(charPos) == searched.charAt(candCharPos)) {
if (candCharPos + 1 < searched.length()) {
candCharPos++;
} else if (newList) {
TextSegment segment = new TextSegment();
segment.setBeginRun(beginRunPos);
segment.setBeginText(beginTextPos);
segment.setBeginChar(beginCharPos);
segment.setEndRun(runPos);
segment.setEndText(textPos);
segment.setEndChar(charPos);
return segment;
}
} else {
candCharPos = 0;
}
}
}
textPos++;
} else if (o instanceof CTProofErr) {
c.removeXml();
} else if (o instanceof CTRPr) {
//do nothing
} else {
candCharPos = 0;
}
}
} finally {
c.dispose();
}
}
return null;
}
This will be called like:
...
while((foundTextSegment = searchText(paragraph, textToFind, startPos)) != null) {
...
As someone already commented on your question, you cannot control where or when Word splits a paragraph into runs. If the other answer does not help, here is how I worked around it.
First of all, this "solution" has a big problem, but I will post it anyway in case someone can improve on it.
public void mainMethod(XWPFParagraph paragraph) {
if (paragraph.getRuns().size() > 1) {
String myRun = unifyRuns(paragraph.getRuns());
// make the verification of placeholders ${...}
paragraph.getRuns().get(0).setText(myRun, 0); // position 0 replaces the existing text; setText(myRun) alone would append to it
while(paragraph.getRuns().size() > 1) {
paragraph.removeRun(1);
}
}
}
private String unifyRuns(List<XWPFRun> runElements) {
StringBuilder unifiedRun = new StringBuilder();
for (XWPFRun run : runElements) {
unifiedRun.append(run.text()); // append the run's text rather than the XWPFRun object itself
}
return unifiedRun.toString();
}
The code may contain errors, since I am writing it from memory.
The problem here is that Word does not split paragraphs into runs for no reason: text with different formatting (such as font family or font size) is placed in different runs.
In the text "Here's my bold text", Word will split the text to separate the bold part from the normal part. So the code above is a bad solution if you are using POI to create large documents with different kinds of formatting. In that case you would first need to check whether a run is actually bold (or otherwise formatted differently), and only then handle the placeholders.
Again, this is a "solution" that I found, and it is not complete yet. Sorry for any English errors; I am using Google Translate to write this answer.
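One way to soften the formatting problem described above is to merge only neighbouring runs that share the same formatting, so that a placeholder split into "${", "education" and "}" becomes one run while bold and normal text stay separate. The following is a plain-Java sketch of that idea, not POI code: SimpleRun is a hypothetical stand-in for XWPFRun, and the single bold flag stands in for the full set of run properties you would have to compare.

```java
import java.util.ArrayList;
import java.util.List;

class RunMerger {
    /** Hypothetical stand-in for a Word run: some text plus one formatting flag. */
    static final class SimpleRun {
        final String text;
        final boolean bold;

        SimpleRun(String text, boolean bold) {
            this.text = text;
            this.bold = bold;
        }
    }

    /** Merges neighbouring runs only when their formatting is identical. */
    static List<SimpleRun> mergeAdjacent(List<SimpleRun> runs) {
        List<SimpleRun> merged = new ArrayList<>();
        for (SimpleRun run : runs) {
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last).bold == run.bold) {
                // Same formatting: extend the previous run instead of adding a new one
                merged.set(last, new SimpleRun(merged.get(last).text + run.text, run.bold));
            } else {
                // Different formatting (or first run): keep it as its own run
                merged.add(run);
            }
        }
        return merged;
    }
}
```

With real XWPFRun objects the comparison would need to cover every property that matters in the template (font, size, colour and so on), which is the part this sketch leaves out.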

Getting the first level of categorisation from a Notes view

I have a categorized Notes view. Let's say the first categorized column is TypeOfVehicle, the second categorized column is Model, and the third categorized column is Manufacturer.
I would like to collect only the values of the first category and return them as a JSON object.
I am facing two problems:
- I cannot read the value of the category: the column values are empty, and when I try to access the underlying document it is null.
- The script won't move on to the next category/sibling on the same level.
Can someone explain what I am doing wrong here?
private Object getFirstCategory() {
JsonJavaObject json = new JsonJavaObject();
try{
String server = null;
String filepath = null;
server = props.getProperty("server");
filepath = props.getProperty("filename");
Database db;
db = utils.getSession().getDatabase(server, filepath);
if (db.isOpen()) {
View vw = db.getView("transport");
if (null != vw) {
vw.setAutoUpdate(false);
ViewNavigator nav;
nav = vw.createViewNav();
JsonJavaArray arr = new JsonJavaArray();
Integer count = 0;
ViewEntry tmpentry;
ViewEntry entry = nav.getFirst();
while (null != entry) {
Vector<?> columnValues = entry.getColumnValues();
if(entry.isCategory()){
System.out.println("entry notesid = " + entry.getNoteID());
Document doc = entry.getDocument();
if(null != doc){
if (doc.hasItem("TypeOfVehicle ")){
System.out.println("category has not " + "TypeOfVehicle ");
}
else{
System.out.println("category IS " + doc.getItemValueString("TypeOfVehicle "));
}
} else{
System.out.println("doc is null");
}
JsonJavaObject row = new JsonJavaObject();
JsonJavaObject jo = new JsonJavaObject();
String TypeOfVehicle = String.valueOf(columnValues.get(0));
if (null != TypeOfVehicle ) {
if (!TypeOfVehicle .equals("")){
jo.put("TypeOfVehicle ", TypeOfVehicle );
} else{
jo.put("TypeOfVehicle ", "Not categorized");
}
} else {
jo.put("TypeOfVehicle ", "Not categorized");
}
row.put("request", jo);
arr.put(count, row);
count++;
tmpentry = nav.getNextSibling(entry);
entry.recycle();
entry = tmpentry;
} else{
//tmpentry = nav.getNextCategory();
//entry.recycle();
//entry = tmpentry;
}
}
json.put("data", arr);
vw.setAutoUpdate(true);
vw.recycle();
}
}
} catch (Exception e) {
OpenLogUtil.logErrorEx(e, JSFUtil.getXSPContext().getUrl().toString(), Level.SEVERE, null);
}
return json;
}
What you're doing wrong is trying to treat any single view entry as both a category and a document. A single view entry can only be one of a category, a document, or a total.
If you have an entry for which isCategory() returns true, then for the same entry:
- isDocument() will return false.
- getDocument() will return null.
- getNoteID() will return an empty string.
If the only thing you need is top-level categories, then get the first entry from the navigator and iterate over entries using nav.getNextSibling(entry) as you're already doing, but:
- Don't try to get documents, note ids, or fields.
- Use entry.getColumnValues().get(0) to get the value of the first column for each category.
If the view contains any uncategorised documents, it's possible that entry.getColumnValues().get(0) might throw an exception, so you should also check that entry.getColumnValues().size() is at least 1 before trying to get a value.
If you need any extra data beyond just top-level categories, then note that subcategories and documents are children of their parent categories.
If an entry has a subcategory, nav.getChild(entry) will get the first subcategory of that entry.
If an entry has no subcategories, but is a category which contains documents, nav.getChild(entry) will get the first document in that category.
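The size check on the column values mentioned above can be isolated in a small helper that only takes the Vector, so it can be tried without a Domino server. This is a sketch; the class and method names are assumptions made for the example, and the "Not categorized" fallback is taken from the question's code.

```java
import java.util.Vector;

class CategoryValues {
    /**
     * Returns the first column value as text, or the fallback when the
     * vector is null or empty, or when its first value is null or blank.
     */
    static String firstColumnOrDefault(Vector<?> columnValues, String fallback) {
        if (columnValues == null || columnValues.isEmpty()) {
            return fallback;
        }
        Object first = columnValues.get(0);
        if (first == null) {
            return fallback;
        }
        String text = String.valueOf(first).trim();
        return text.isEmpty() ? fallback : text;
    }
}
```

Inside the navigator loop this would be called as CategoryValues.firstColumnOrDefault(entry.getColumnValues(), "Not categorized") for each top-level category entry.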

Apache POI Java - Write to excel and dynamically update cells

My Java Spring Boot app needs to create a new Excel file based on the contents of my DB. My current solution takes all the data from my DB and inserts it into my Excel sheet, but I want to improve it so that I don't have to state each cell value explicitly. For example, although it works, my data has 34 fields, so I write the userRow.createCell line 34 times, once per field, which is repetitive. Ideally I want to say: create cell(n) and take all the values from each row in the DB. How can this be done? With another for loop inside this for loop? Every example I found online states each cell value explicitly.
List<CaseData> cases = (List<CaseData>) model.get("cases");
Sheet sheet = workbook.createSheet("PIE Cases");
int rowCount = 1;
for (CaseData pieCase : cases) {
Row userRow = sheet.createRow(rowCount++);
userRow.createCell(0).setCellValue(pieCase.getCaseId());
userRow.createCell(1).setCellValue(pieCase.getAcknowledgementReceivedDate());
}
Use the Reflection API
Example:
try {
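Class<CaseData> caseDataObj = CaseData.class;
// Note: getDeclaredMethods() returns the methods in no particular order; sort them if the column order matters
Method[] methods = caseDataObj.getDeclaredMethods();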
Sheet sheet = workbook.createSheet("PIE Cases");
int rowCount = 1;
for(CaseData cd : cases) {
int cellIndex = 0;
Row userRow = sheet.createRow(rowCount++);
for (Method method : methods) {
String methodName = method.getName();
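if (methodName.startsWith("get") && method.getParameterCount() == 0) {
// String.valueOf avoids a ClassCastException for getters that do not return String
Object value = method.invoke(cd);
userRow.createCell(cellIndex++).setCellValue(value == null ? "" : String.valueOf(value));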
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
There are probably many ways to do this. You can try something like the following, which is how I usually handle cases like yours.
public enum DATA {
CASE_ID(0),
ACK_RECIEVED(1),
ETC(2);
//ETC(3) and so on
public int index;
DATA(int index) {
this.index = index;
}
public Object parse(CaseData data) throws Exception {
switch (this) {
case CASE_ID:
return data.getCaseId();
case ACK_RECIEVED:
return data.getAcknowledgementReceivedDate();
case ETC:
return "etc...";
default: return null;
}
}
}
Then, the implementation is:
List<CaseData> cases = (List<CaseData>) model.get("cases");
Sheet sheet = workbook.createSheet("PIE Cases");
int rowCount = 1;
for (CaseData pieCase : cases) {
Row userRow = sheet.createRow(rowCount++);
for (DATA DAT : DATA.values()) {
userRow.createCell(DAT.index).setCellValue(String.valueOf(DAT.parse(pieCase))); // setCellValue has no Object overload, so convert to String
}
}
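A third variation on the two answers above is to keep an ordered list of getter references, which avoids both reflection and the switch statement. The sketch below uses only the JDK so that it can be run on its own: the nested CaseData class is a hypothetical two-field stand-in for the real entity, and toRow produces the cell texts that would then be written with createCell(i).setCellValue(...).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class ColumnMapping {
    /** Hypothetical stand-in for the CaseData entity, reduced to two fields. */
    static final class CaseData {
        private final String caseId;
        private final String acknowledgementReceivedDate;

        CaseData(String caseId, String acknowledgementReceivedDate) {
            this.caseId = caseId;
            this.acknowledgementReceivedDate = acknowledgementReceivedDate;
        }

        String getCaseId() {
            return caseId;
        }

        String getAcknowledgementReceivedDate() {
            return acknowledgementReceivedDate;
        }
    }

    /** One extractor per column; the list order is the column order. */
    static final List<Function<CaseData, Object>> COLUMNS = Arrays.asList(
            CaseData::getCaseId,
            CaseData::getAcknowledgementReceivedDate);

    /** Turns one record into the cell texts of one row; null values become empty cells. */
    static List<String> toRow(CaseData data) {
        List<String> cells = new ArrayList<>();
        for (Function<CaseData, Object> column : COLUMNS) {
            Object value = column.apply(data);
            cells.add(value == null ? "" : String.valueOf(value));
        }
        return cells;
    }
}
```

Adding a field then means adding one line to COLUMNS, and the column order is explicit rather than depending on the order in which reflection returns the methods.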

How to highlight text in a paragraph in MS Word using POI

I am developing a comparison tool for Word documents. Whenever the two documents differ, I need to highlight the differing substring in the paragraph. When I try to highlight it using a run, the whole paragraph is highlighted instead of just the substring.
Can you please advise how I can achieve this for a substring?
I had the same problem. Here I post a sample method where you highlight a substring contained in a run.
private int highlightSubsentence(String sentence, XWPFParagraph p, int i) {
//get the current run Style - here I might need to save the current style
XWPFRun currentRun = p.getRuns().get(i);
String currentRunText = currentRun.text();
int sentenceLength = sentence.length();
int sentenceBeginIndex = currentRunText.indexOf(sentence);
int addedRuns = 0;
p.removeRun(i);
//Create, if necessary, a run before the highlight part
if (sentenceBeginIndex > 0) {
XWPFRun before = p.insertNewRun(i);
before.setText(currentRunText.substring(0, sentenceBeginIndex));
//here I might need to re-introduce the style of the deleted run
addedRuns++;
}
// highlight the interesting part
XWPFRun sentenceRun = p.insertNewRun(i + addedRuns);
sentenceRun.setText(currentRunText.substring(sentenceBeginIndex, sentenceBeginIndex + sentenceLength));
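// currentStyle is assumed to be a Style (see below) created from currentRun before the run was removed
currentStyle.copyStyle(sentenceRun);
// addShading (see below, assumed accessible from here) reuses run properties that copyStyle may already have created;
// calling addNewRPr() here would add a second rPr element in that case
addShading(sentenceRun, "00FFFF");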
//Create, if necessary, a run after the highlight part
if (sentenceBeginIndex + sentenceLength != currentRunText.length()) {
XWPFRun after = p.insertNewRun(i + addedRuns + 1);
after.setText(currentRunText.substring(sentenceBeginIndex + sentenceLength));
//here I might need to re-introduce the style of the deleted run
addedRuns++;
}
return addedRuns;
}
You might need to save the formatting of the run you delete so that the new runs keep the old formatting.
Also, if the string you need to highlight is spread over more than one run, you will need to highlight all of them, but the core method is the one I posted.
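Note that the method above assumes the sentence really occurs in the run: if indexOf returns -1, the substring calls throw after the run has already been removed. The splitting step can be isolated and guarded as in the following plain-Java sketch (the class and method names are assumptions made for the example; no POI types are involved).

```java
class HighlightSplit {
    /**
     * Splits runText around the first occurrence of target.
     * Returns {before, match, after}, or null when target does not occur.
     */
    static String[] split(String runText, String target) {
        int begin = runText.indexOf(target);
        if (begin < 0) {
            // Nothing to highlight: the caller should leave the run untouched
            return null;
        }
        int end = begin + target.length();
        return new String[] {
            runText.substring(0, begin),   // text for the optional "before" run
            runText.substring(begin, end), // text for the highlighted run
            runText.substring(end)         // text for the optional "after" run
        };
    }
}
```

A caller would remove and re-create runs only when the result is not null, and would skip the "before" or "after" run when the corresponding part is empty.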
On the style question:
I had a class Style that saved all the styles of the old run in private fields (for the respective types, look at what the XWPFRun getters return).
These are the sub-styles that I needed. There are obviously others that I didn't cover.
Style(XWPFRun run, XWPFDefaultRunStyle defaultRunStyle) {
fontSize = run.getFontSize();
fontFamily = run.getFontFamily();
bold = run.isBold();
italic = run.isItalic();
strike = run.isStrikeThrough();
underline = run.getUnderline();
color = run.getColor();
shadingColor = getShadeColor(run);
highlightColor = getHighlightedColor(run);
}
I copied the old style in the new run when needed.
public void copyStyle(XWPFRun newRun) {
if (fontSize != -1) {
newRun.setFontSize(fontSize);
}
newRun.setFontFamily(fontFamily);
newRun.setBold(bold);
newRun.setItalic(italic);
newRun.setStrikeThrough(strike);
newRun.setColor(color);
newRun.setUnderline(underline);
if (shadingColor != null) {
addShading(newRun, shadingColor);
}
if (highlightColor != null) {
addHighlight(newRun, highlightColor);
}
}
To add shading and highlighting that were already present, I used:
public static void addHighlight(XWPFRun run, STHighlightColor.Enum hexColor) {
if (run.getCTR().getRPr() == null) {
run.getCTR().addNewRPr();
}
if (run.getCTR().getRPr().getHighlight() == null) {
run.getCTR().getRPr().addNewHighlight();
}
run.getCTR().getRPr().getHighlight().setVal(hexColor);
}
public static void addShading(XWPFRun run, Object hexColor) {
if (run.getCTR().getRPr() == null) {
run.getCTR().addNewRPr();
}
if (run.getCTR().getRPr().getShd() == null) {
run.getCTR().getRPr().addNewShd();
}
run.getCTR().getRPr().getShd().setFill(hexColor);
}

How to fetch data of multiple HTML tables through Web Scraping in Java

I was trying to scrape data from a website, and to some extent I succeeded. The problem is that the web page I am scraping contains multiple HTML tables. When I run my program, it only writes the data of the first table to the CSV file and does not retrieve the other tables. My Java code is as follows.
public static void parsingHTML() throws Exception {
//tbodyElements = doc.getElementsByTag("tbody");
for (int i = 1; i <= 1; i++) {
Elements table = doc.getElementsByTag("table");
if (table.isEmpty()) {
throw new Exception("Table is not found");
}
elements = table.get(0).getElementsByTag("tr");
for (Element trElement : elements) {
trElement2 = trElement.getElementsByTag("tr");
tdElements = trElement.getElementsByTag("td");
File fold = new File("C:\\convertedCSV9.csv");
fold.delete();
File fnew = new File("C:\\convertedCSV9.csv");
FileWriter sb = new FileWriter(fnew, true);
//StringBuilder sb = new StringBuilder(" ");
//String y = "<tr>";
for (Iterator<Element> it = tdElements.iterator(); it.hasNext();) {
//Element tdElement1 = it.next();
//final String content2 = tdElement1.text();
if (it.hasNext()) {
sb.append("\r\n");
}
for (Iterator<Element> it2 = trElement2.iterator(); it.hasNext();) {
Element tdElement2 = it.next();
final String content = tdElement2.text();
//stringjoiner.add(content);
//sb.append(formatData(content));
if (it2.hasNext()) {
sb.append(formatData(content));
sb.append(" , ");
}
if (!it.hasNext()) {
String content1 = content.replaceAll(",$", " ");
sb.append(formatData(content1));
//it2.next();
}
}
System.out.println(sb.toString());
sb.flush();
sb.close();
}
System.out.println(sampleList.add(tdElements));
}
}
}
My analysis is that there is a loop which only checks the tr and td elements. After the first table there is a style sheet in the HTML page, and maybe the loop breaks because of it. I think that's the reason it is not proceeding to the next table.
P.S.: Here is the link to the page I am trying to scrape:
http://www.mufap.com.pk/nav_returns_performance.php?tab=01
What you do just at the beginning of your code will not work:
// loop just once, why
for (int i = 1; i <= 1; i++) {
Elements table = doc.getElementsByTag("table");
if (table.isEmpty()) {
throw new Exception("Table is not found");
}
elements = table.get(0).getElementsByTag("tr");
Here you loop just once, read all table elements and then process all tr elements for the first table you find. So even if you would loop more than once, you would always process the first table.
You will have to iterate all table elements, e.g.
for(Element table : doc.getElementsByTag("table")) {
for (Element trElement : table.getElementsByTag("tr")) {
// process "td"s and so on
}
}
Edit: Since you're having trouble with the code above, here's a more thorough example. Note that I'm using Jsoup to read and parse the HTML (you didn't specify what you are using).
Document doc = Jsoup
.connect("http://www.mufap.com.pk/nav_returns_performance.php?tab=01")
.get();
for (Element table : doc.getElementsByTag("table")) {
for (Element trElement : table.getElementsByTag("tr")) {
// skip header "tr"s and process only data "tr"s
if (trElement.hasClass("tab-data1")) {
StringJoiner tdj = new StringJoiner(",");
for (Element tdElement : trElement.getElementsByTag("td")) {
tdj.add(tdElement.text());
}
System.out.println(tdj);
}
}
}
This will concatenate and print all data cells (those in rows with the class tab-data1). You will still have to modify it to write to your CSV file, though.
Note: in my tests this processes 21 tables, 243 trs and 2634 tds.
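For the remaining CSV step, one possible approach is to turn each row's cell texts into a properly quoted CSV line and write all lines once at the end. The sketch below uses only the JDK; the class and method names are assumptions made for the example, and the quoting follows the usual CSV convention of doubling embedded quotes.

```java
import java.util.List;
import java.util.StringJoiner;

class CsvLines {
    /** Quotes a field if it contains a comma, a double quote or a line break. */
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        // Embedded quotes are doubled, then the whole field is wrapped in quotes
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Joins the cell texts of one table row into one CSV line. */
    static String toCsvLine(List<String> cells) {
        StringJoiner joiner = new StringJoiner(",");
        for (String cell : cells) {
            joiner.add(escape(cell));
        }
        return joiner.toString();
    }
}
```

In the loop above, each tdElement.text() would be collected into a list per row, the resulting lines gathered in a List<String>, and the file written once after the loops, for example with java.nio.file.Files.write(Paths.get("convertedCSV9.csv"), lines), instead of deleting and reopening the file for every row.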
