Cannot get text from table using docx4j

I'm simply trying to output the data found in tables, but so far I have only managed to print out memory locations and other object info. Here I'm using a TableFinder to locate all of the tables in a Word document and then traversing through them, but I'm completely stuck on how to print the data contained in these tables. The document I am working with is Test.docx; a snippet of the code is below. To be clear, I'm not sure whether I should be accessing a table row (Tr), as this snippet does, or the parent Tbl object to print out the text contained within the table. In this case, I just want it to print "I", "Just", "Want"... etc.
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new File("C:\\Users\\1120248\\Test\\Test.docx"));
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
TableFinder finder = new TableFinder();
new TraversalUtil(documentPart.getContent(), finder);
System.out.println("Found " + finder.tblList.size() + " tables");
for (Object o : finder.tblList) {
Object o2 = XmlUtils.unwrap(o);
if (o2 instanceof org.docx4j.wml.Tbl) {
Tbl tbl = (Tbl)o2;
Tr t = (Tr)tbl.getContent().get(0);
// A row's content is a list of JAXBElement<Tc> wrappers, so this prints their default toString()
System.out.println(t.getContent());
System.out.println(t.toString());
// Unwrapping yields the first Tc object itself, not the text inside it
System.out.println(XmlUtils.unwrap(t.getContent().get(0)));
}
}
This is the output produced by this setup:
[javax.xml.bind.JAXBElement@a146b11, javax.xml.bind.JAXBElement@f438904, javax.xml.bind.JAXBElement@4ed5a1b0, javax.xml.bind.JAXBElement@18d003cd, javax.xml.bind.JAXBElement@3135bf25, javax.xml.bind.JAXBElement@22ad1bae]
org.docx4j.wml.Tr@4116f66a
org.docx4j.wml.Tc@59c04bee

This works for me:
TableFinder finder = new TableFinder();
finder.walkJAXBElements(documentPart.getContent());

For anyone who gets stuck on this question: for visibility, the comment by @JasonPlutext is the answer. Walk Tr → Tc → P → R → Text, i.e. from the table row to the table cell, to the paragraph, to the run (R), and then collect the text.
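A minimal sketch of that Tr → Tc → P → R → Text walk follows. It uses docx4j's ClassFinder (instead of the question's TableFinder) to collect the tables, and calls XmlUtils.unwrap at every level because rows, cells and texts are often wrapped in JAXBElement, which is why the question's code printed wrapper objects. The class name PrintTableText and reading the file path from args[0] are assumptions for illustration. It only reads runs that sit directly in a cell's paragraphs, so text inside hyperlinks, content controls or nested tables is skipped.

```java
import java.io.File;

import org.docx4j.TraversalUtil;
import org.docx4j.XmlUtils;
import org.docx4j.finders.ClassFinder;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.wml.P;
import org.docx4j.wml.R;
import org.docx4j.wml.Tbl;
import org.docx4j.wml.Tc;
import org.docx4j.wml.Text;
import org.docx4j.wml.Tr;

public class PrintTableText {

    // Concatenate the w:t values of the runs that sit directly in the cell's paragraphs.
    static String cellText(Tc tc) {
        StringBuilder sb = new StringBuilder();
        for (Object pObj : tc.getContent()) {
            Object p = XmlUtils.unwrap(pObj);
            if (!(p instanceof P)) continue;          // skips nested tables, content controls, ...
            for (Object rObj : ((P) p).getContent()) {
                Object r = XmlUtils.unwrap(rObj);
                if (!(r instanceof R)) continue;      // skips hyperlinks, bookmarks, ...
                for (Object tObj : ((R) r).getContent()) {
                    Object t = XmlUtils.unwrap(tObj); // w:t is usually wrapped in a JAXBElement
                    if (t instanceof Text) sb.append(((Text) t).getValue());
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        WordprocessingMLPackage pkg = WordprocessingMLPackage.load(new File(args[0]));
        MainDocumentPart documentPart = pkg.getMainDocumentPart();

        // Collect every w:tbl in the main document part, in document order.
        ClassFinder finder = new ClassFinder(Tbl.class);
        new TraversalUtil(documentPart.getContent(), finder);
        System.out.println("Found " + finder.results.size() + " tables");

        for (Object o : finder.results) {
            Tbl tbl = (Tbl) XmlUtils.unwrap(o);
            for (Object trObj : tbl.getContent()) {
                Object tr = XmlUtils.unwrap(trObj);
                if (!(tr instanceof Tr)) continue;
                for (Object tcObj : ((Tr) tr).getContent()) {
                    Object tc = XmlUtils.unwrap(tcObj); // JAXBElement<Tc> -> Tc
                    if (tc instanceof Tc) System.out.println(cellText((Tc) tc));
                }
            }
        }
    }
}
```

Run as `java PrintTableText Test.docx`; each cell's text is printed on its own line.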

Related

How can I duplicate a table in word with apache poi? [duplicate]

I have a table in the docx template.
Depending on the number of objects, I have to duplicate the table once per object. The duplicate tables must come after the table from the template.
I have several tables in the template that should behave like this.
The XmlCursor takes the position of the first table from the template and puts the next one there. I want to insert each subsequent table after the one I added previously, but the XmlCursor does not return the table element I added; it returns "STARTDOC".
XmlCursor cursor = docx.getTables().get(pointer).getCTTbl().newCursor();
cursor.toEndToken();
while (cursor.toNextToken() != XmlCursor.TokenType.START) ;
XWPFParagraph newParagraph = docx.insertNewParagraph(cursor);
newParagraph.createRun().setText("", 0);
cursor.toParent();
cursor.toEndToken();
while (cursor.toNextToken() != XmlCursor.TokenType.START) ;
docx.insertNewTbl(cursor);
CTTbl ctTbl = CTTbl.Factory.newInstance();
ctTbl.set(docx.getTables().get(numberTableFromTemplate).getCTTbl());
XWPFTable tableCopy = new XWPFTable(ctTbl, docx);
docx.setTable(index + 1, tableCopy);

It is not clear what you are aiming for with cursor.toParent();. I also cannot reproduce the issue from your small code snippet alone, but a complete working example may help you.
Assume we have a template WordTemplate.docx whose first table has a second row with three cells for the data.
Then the following code:
import java.io.FileOutputStream;
import java.io.FileInputStream;
import org.apache.poi.xwpf.usermodel.*;
import org.apache.xmlbeans.XmlObject;
import org.apache.xmlbeans.XmlCursor;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTbl;
public class WordCopyTableAfterTable {
static XmlCursor setCursorToNextStartToken(XmlObject object) {
XmlCursor cursor = object.newCursor();
cursor.toEndToken(); //Now we are at end of the XmlObject.
//There always must be a next start token.
while(cursor.hasNextToken() && cursor.toNextToken() != org.apache.xmlbeans.XmlCursor.TokenType.START);
//Now we are at the next start token and can insert new things here.
return cursor;
}
static void removeCellValues(XWPFTableCell cell) {
for (XWPFParagraph paragraph : cell.getParagraphs()) {
for (int i = paragraph.getRuns().size()-1; i >= 0; i--) {
paragraph.removeRun(i);
}
}
}
public static void main(String[] args) throws Exception {
//The data. Each row a new table.
String[][] data= new String[][] {
new String[] {"John Doe", "5/23/2019", "1234.56"},
new String[] {"Jane Doe", "12/2/2019", "34.56"},
new String[] {"Marie Template", "9/20/2019", "4.56"},
new String[] {"Hans Template", "10/2/2019", "4567.89"}
};
String value;
XWPFDocument document = new XWPFDocument(new FileInputStream("WordTemplate.docx"));
XWPFTable tableTemplate;
CTTbl cTTblTemplate;
XWPFTable tableCopy;
XWPFTable table;
XWPFTableRow row;
XWPFTableCell cell;
XmlCursor cursor;
XWPFParagraph paragraph;
XWPFRun run;
//get first table (the template)
tableTemplate = document.getTableArray(0);
cTTblTemplate = tableTemplate.getCTTbl();
cursor = setCursorToNextStartToken(cTTblTemplate);
//fill in first data in first table (the template)
for (int c = 0; c < data[0].length; c++) {
value = data[0][c];
row = tableTemplate.getRow(1);
cell = row.getCell(c);
removeCellValues(cell);
cell.setText(value);
}
paragraph = document.insertNewParagraph(cursor); //insert new empty paragraph
cursor = setCursorToNextStartToken(paragraph.getCTP());
//fill in next data, each data row in one table
for (int t = 1; t < data.length; t++) {
table = document.insertNewTbl(cursor); //insert new empty table at position t
cursor = setCursorToNextStartToken(table.getCTTbl());
tableCopy = new XWPFTable((CTTbl)cTTblTemplate.copy(), document); //copy the template table
//fill in data in tableCopy
for (int c = 0; c < data[t].length; c++) {
value = data[t][c];
row = tableCopy.getRow(1);
cell = row.getCell(c);
removeCellValues(cell);
cell.setText(value);
}
document.setTable(t, tableCopy); //set tableCopy at position t instead of table
paragraph = document.insertNewParagraph(cursor); //insert new empty paragraph
cursor = setCursorToNextStartToken(paragraph.getCTP());
}
paragraph = document.insertNewParagraph(cursor);
run = paragraph.createRun();
run.setText("Inserted new text below last table.");
cursor = setCursorToNextStartToken(paragraph.getCTP());
FileOutputStream out = new FileOutputStream("WordResult.docx");
document.write(out);
out.close();
document.close();
}
}
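leads to WordResult.docx, containing one filled copy of the template table per data row, each followed by an empty paragraph, and the text "Inserted new text below last table." after the last table.
Is that about what you wanted to achieve?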
Please note how I insert the additional tables.
Using table = document.insertNewTbl(cursor);, a new empty table is inserted at position t. This table is placed in the document body, so it is the one that must be used for adjusting the cursor.
Then tableCopy = new XWPFTable((CTTbl)cTTblTemplate.copy(), document); copies the template table. This copy is filled with data and then set into the document at position t using document.setTable(t, tableCopy);.
Unfortunately, Apache POI is incomplete here: XWPFDocument.setTable only sets the internal ArrayLists, not the underlying XML, while XWPFDocument.insertNewTbl sets the underlying XML but only with an empty table. So we must do it this ugly, complicated way.

Set items in table java web scraping

OK, so my problem is this: I use web scraping to take some data from a web page, IMDb in this case. The data is movie titles, and printing them to the console works fine. My problem is that I cannot get those titles into my table columns. I have included all the relevant code below, but I can't find where I made a mistake or why the titles won't show in the table columns. In the end, the user needs to pick one title, and that title needs to be stored in a text field. Does anyone have an idea?
I have this table:
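TableColumn<Vozila, String> izborAuta = new TableColumn<>("Izbor auta");                     // "Car choice"
TableColumn<Vozila, String> lokacijaPreuzimanja = new TableColumn<>("Lokacija preuzimanja"); // "Pickup location"
TableColumn<Vozila, String> lokacijaVracanja = new TableColumn<>("Lokacija Vracanja");       // "Return location"
TableColumn<Vozila, String> cena = new TableColumn<>("Cena");                                // "Price"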
I have this code to set up the columns:
izborAuta.setCellValueFactory(new PropertyValueFactory<Vozila, String>("izborAuta"));
lokacijaPreuzimanja.setCellValueFactory(new PropertyValueFactory<Vozila, String>("lokacijaPreuzimanja"));
lokacijaVracanja.setCellValueFactory(new PropertyValueFactory<Vozila, String>("lokacijaVracanja"));
cena.setCellValueFactory(new PropertyValueFactory<Vozila, String>("cena"));
tableView.setItems(Baza.baza.prikazBaze());
tableView.getColumns().addAll(izborAuta, lokacijaPreuzimanja, lokacijaVracanja, cena);
With this code, when I pick one item from the table, that item should be stored in the text fields:
tableView.setOnMouseClicked((e) -> {
Vozila v = (Vozila) tableView.getSelectionModel().getSelectedItem();
if (v == null) return; // click on an empty area: nothing is selected
txIzborAuta.setText(v.getIzborAuta());
txLokacijaPreuzimanja.setText(v.getLokacijaPreuzimanja());
txLokacijaVracanja.setText(v.getLokacijaVracanja());
txCena.setText(v.getCena());
});
And at the end, I use web scraping to save the items for the table:
Document doc = Jsoup.connect("https://www.imdb.com/chart/top?ref_=nv_mv_250").get();
Elements elems = doc.select("table.chart.full-width");
for (Element e : elems) {
String izborAuta = e.select(".titleColumn").text();
String lokacijaPreuzimanja = e.select(".titleColumn").text();
String lokacijaVracanja = e.select(".titleColumn").text();
String cena = e.select(".titleColumn").text();
Vozila v = new Vozila();
v.setIzborAuta(izborAuta);
v.setLokacijaPreuzimanja(lokacijaPreuzimanja);
v.setLokacijaVracanja(lokacijaVracanja);
v.setCena(cena + " " + "RSD");
Baza.insertVozila(v);
}
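One likely cause, offered as a hedged guess since the thread has no answer: doc.select("table.chart.full-width") matches the whole chart table once, so e.select(".titleColumn").text() joins every title into a single string and only one Vozila is created. Below is a sketch that returns one title per row instead, assuming the page still uses the td.titleColumn markup the question's selectors target (IMDb's current markup may differ). The class TitleScraper and the method extractTitles are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TitleScraper {

    // One title per table row, instead of one string holding every title.
    // Assumes each title is the link inside a <td class="titleColumn"> of the chart table.
    static List<String> extractTitles(Document doc) {
        List<String> titles = new ArrayList<>();
        for (Element cell : doc.select("table.chart.full-width td.titleColumn")) {
            Element link = cell.selectFirst("a");           // the title text is the link text
            titles.add(link != null ? link.text() : cell.text());
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        Document doc = Jsoup.connect("https://www.imdb.com/chart/top?ref_=nv_mv_250").get();
        for (String title : extractTitles(doc)) {
            System.out.println(title);                      // one line per movie
        }
    }
}
```

Each returned title can then become its own Vozila in the question's loop. Because tableView.setItems(Baza.baza.prikazBaze()) runs before the scraping, the table may also need to be reloaded, for example by calling it again, after the new rows are inserted.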

Add content that prevents page breaks inside

Using docx4j I'm adding multiple dynamically filled subTemplates to my main template.
I don't want to have page breaks inside those subTemplates (unless even a whole page is too small for one).
In other words: if a subTemplate would break across pages, I want to move the whole subTemplate to the next page.
How do I do this?
My code so far:
//...
WordprocessingMLPackage mainTemplate = getWp();//ignore this method
List<WordprocessingMLPackage> projectTemplates = new ArrayList<>();
List<Project> projects = getProjects();//ignore this method
for (Project project : projects) {
WordprocessingMLPackage template = getWpProject();//ignore this method
//fill template with content from project
//...
projectTemplates.add(template);
}
//Here's the part that will have to be changed I think:
//Since the projectTemplate only consists of tables, I just add all its tables to the main template
for (WordprocessingMLPackage temp : projectTemplates){
//doc.getAllElementFromObject(...) is a helper (not shown) that collects all elements of the given class
List<Object> tables = doc.getAllElementFromObject(temp.getMainDocumentPart(), Tbl.class);
for (Object table : tables) {
mainTemplate.getMainDocumentPart().addObject(table);
}
}
If you can think of a way to change the .docx template in Word to achieve my goal, feel free to suggest it.
And if you have suggestions for improving the code in general, just write a comment.
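One option that changes paragraph formatting instead of counting rows, not taken from this thread and untested here: Word's "Keep with next" setting (w:keepNext in a paragraph's w:pPr) asks Word to keep a paragraph on the same page as the following one. Setting it on every paragraph of a block except the last should make Word move the whole block to the next page when it does not fit (a block taller than a page still has to break). The docx4j sketch below applies it to the tables of one subTemplate, where tables is the List<Object> from the question's loop; the class KeepTogether and the method keepTablesTogether are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.docx4j.TraversalUtil;
import org.docx4j.XmlUtils;
import org.docx4j.finders.ClassFinder;
import org.docx4j.wml.BooleanDefaultTrue;
import org.docx4j.wml.ObjectFactory;
import org.docx4j.wml.P;
import org.docx4j.wml.PPr;
import org.docx4j.wml.Tbl;

public class KeepTogether {

    private static final ObjectFactory FACTORY = new ObjectFactory();

    // Set "keep with next" on every paragraph of the given tables except the very last one.
    static void keepTablesTogether(List<Object> tables) {
        List<P> paragraphs = new ArrayList<>();
        for (Object o : tables) {
            Tbl tbl = (Tbl) XmlUtils.unwrap(o);
            ClassFinder finder = new ClassFinder(P.class); // collects paragraphs in document order
            new TraversalUtil(tbl.getContent(), finder);
            for (Object found : finder.results) {
                paragraphs.add((P) found);
            }
        }
        // The last paragraph is left alone so the block does not stick to whatever follows it.
        for (int i = 0; i < paragraphs.size() - 1; i++) {
            P p = paragraphs.get(i);
            PPr pPr = p.getPPr();
            if (pPr == null) {
                pPr = FACTORY.createPPr();
                p.setPPr(pPr);
            }
            pPr.setKeepNext(new BooleanDefaultTrue());
        }
    }
}
```

In the question's loop it would be called once per subTemplate, before its tables are added to the main template. In Word itself, the same setting is under Paragraph, Line and Page Breaks, Keep with next, applied to all rows of a table except the last.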

I made this "workaround" that works nicely for me:
I count all rows together and also estimate whether the text inside the rows wraps onto extra lines (using an approximate threshold).
Then I add up the rows of each project, and as soon as there are too many rows for one page, I insert a page break before the current project and start counting again.
final int maxRowCountPerPage = 44;
final int maxLettersPerLineInDescr = 55;
int totalRowCount = 0;
WordprocessingMLPackage mainTemplate = getWp();
//Iterate over projects
for (Project project : getProjects()) {
WordprocessingMLPackage template = this.getWpProject();
String projectDescription = project.getDescr();
//Fill template...
//Count the lines
int rowsInProjectDescr = (int) Math.floor((double) projectDescription.length() / maxLettersPerLineInDescr);
int projectRowCount = 0;
List<Object> tables = doc.getAllElementFromObject(template.getMainDocumentPart(), Tbl.class);
for (Object table : tables) {
List<Object> rows = doc.getAllElementFromObject(table, Tr.class);
int tableRowCount = rows.size();
projectRowCount += tableRowCount;
}
projectRowCount += rowsInProjectDescr;
totalRowCount += projectRowCount;
//Break page if too many lines for page
if (totalRowCount > maxRowCountPerPage) {
addPageBreak(mainTemplate); //helper (not shown) that appends a page break to the main template
totalRowCount = projectRowCount;
}
//Add project template to main template
for (Object table : tables) {
mainTemplate.getMainDocumentPart().addObject(table);
}
}
If you notice a way to make the code nicer, let me know in a comment!
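The workaround calls an addPageBreak helper that is not shown. A possible implementation, assuming a paragraph containing a single page break (w:br with w:type="page") appended to the end of the main document part is what is wanted; the class name PageBreaks is hypothetical:

```java
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.wml.Br;
import org.docx4j.wml.ObjectFactory;
import org.docx4j.wml.P;
import org.docx4j.wml.R;
import org.docx4j.wml.STBrType;

public class PageBreaks {

    private static final ObjectFactory FACTORY = new ObjectFactory();

    // Append <w:p><w:r><w:br w:type="page"/></w:r></w:p> to the main document part.
    static void addPageBreak(WordprocessingMLPackage pkg) {
        Br br = FACTORY.createBr();
        br.setType(STBrType.PAGE);
        R run = FACTORY.createR();
        run.getContent().add(br);
        P paragraph = FACTORY.createP();
        paragraph.getContent().add(run);
        pkg.getMainDocumentPart().addObject(paragraph);
    }
}
```

Since it appends at the end, it has to be called before the current project's tables are added, as the loop above does.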

Table Content Extraction Section wise in .docx file

I am using Apache POI 3.9 to extract table contents from a .docx file. This document contains multiple tables under different sections. I can extract all the table contents irrespective of the sections, but I want to extract only the table contents under particular sections. Can anyone help?
.docx outline:
Section 1: ABC
Table 1:
Table 2:
Section 2: CDE
Table 3:
Table 4:
Table Extraction Code:
XWPFDocument documentContent = new XWPFDocument(inputStream);
List<String> tableRowValues = new ArrayList<String>();
Iterator<IBodyElement> bodyElementIterator = documentContent.getBodyElementsIterator();
while(bodyElementIterator.hasNext())
{
IBodyElement element = bodyElementIterator.next();
if(element.getElementType() == BodyElementType.TABLE)
{
//Only this table: element.getBody().getTables() would return every table in the document
List<XWPFTable> tableList = Collections.singletonList((XWPFTable) element);
//Extract the table row name and their corresponding values from the word stream content
tableRowValues.addAll(getTableRowValues(tableList));
}
}
Method:
private static ArrayList<String> getTableRowValues(List<XWPFTable> tableList) {
ArrayList<String> tableValues = new ArrayList<String>();
for (XWPFTable xwpfTable : tableList)
{
List<XWPFTableRow> row = xwpfTable.getRows();
for (XWPFTableRow xwpfTableRow : row)
{
List<XWPFTableCell> cell = xwpfTableRow.getTableCells();
for (XWPFTableCell xwpfTableCell : cell)
{
List<XWPFParagraph> para = xwpfTableCell.getParagraphs();
for (XWPFParagraph xwpfTablePara : para)
{
if(xwpfTablePara!=null)
{
tableValues.add( xwpfTablePara.getText());
}
}
}
}
}
return tableValues;
}

I did something similar.
With this code I extract all the sections with the tables underneath them:
//relevantText: an XWPFRun in the output document (declaration not shown)
Iterator<IBodyElement> iter = xdoc.getBodyElementsIterator();
while (iter.hasNext())
{
IBodyElement elem = iter.next();
if (elem instanceof XWPFParagraph)
{
relevantText.setText(((XWPFParagraph) elem).getText());
relevantText.addBreak();
relevantText.addCarriageReturn();
}
else if (elem instanceof XWPFTable)
{
relevantText.addBreak();
relevantText.setText(((XWPFTable) elem).getText());
relevantText.addCarriageReturn();
}
}
You can add an if-statement before getText() so that text is only extracted when the right conditions are met.
For example, you can check styles, text, etc.:
paragraph.getStyle() //the paragraph's style ID, e.g. "Heading1", to filter on Word styles
paragraph.getNumFmt() //the numbering format, e.g. "bullet", to filter on bulleted text
For more, see the Apache POI documentation:
https://poi.apache.org/
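Putting those checks together, here is a sketch that only collects the tables under one section. It walks the body elements in order, treats a paragraph whose style ID starts with "Heading" as a section title, and keeps table cell text while the current title starts with a given prefix. The heading-style test, the prefix matching, and the names SectionTables and tableValuesInSection are assumptions to adapt to your document.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

public class SectionTables {

    // Cell texts of all tables between the heading starting with sectionPrefix and the next heading.
    static List<String> tableValuesInSection(XWPFDocument doc, String sectionPrefix) {
        List<String> values = new ArrayList<>();
        boolean inSection = false;
        for (IBodyElement elem : doc.getBodyElements()) {
            if (elem instanceof XWPFParagraph) {
                XWPFParagraph p = (XWPFParagraph) elem;
                if (isHeading(p)) {
                    // A new section title starts; are we interested in it?
                    inSection = p.getText().startsWith(sectionPrefix);
                }
            } else if (elem instanceof XWPFTable && inSection) {
                for (XWPFTableRow row : ((XWPFTable) elem).getRows()) {
                    for (XWPFTableCell cell : row.getTableCells()) {
                        values.add(cell.getText());
                    }
                }
            }
        }
        return values;
    }

    // Assumption: section titles use a heading style such as "Heading1".
    static boolean isHeading(XWPFParagraph p) {
        String style = p.getStyle(); // style ID, or null if the paragraph has none
        return style != null && style.startsWith("Heading");
    }
}
```

For the outline in the question, tableValuesInSection(doc, "Section 1") would return the cells of Table 1 and Table 2, provided the section titles are formatted with a Heading style.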

How to create a table in a Word doc at a specific bookmark using docx4j without overwriting the Word doc

I need to create a table at the location of a particular bookmark, i.e. I need to find the bookmark and insert the table there. How can I do this using docx4j?
Thanks in Advance
Sorry Jason, I am new to Stack Overflow, so I couldn't describe my problem clearly. Here is my situation and problem.
I changed that code as you suggested and adapted it to my needs; here is the code:
//rt is presumably a RangeFinder and factory an org.docx4j.wml.ObjectFactory, as in the sample code
//loop through the bookmarks
for (CTBookmark bm : rt.getStarts()) {
// do we have data for this one?
String bmname = bm.getName();
// find the right bookmark (in this case I have only one bookmark, so just check that it is not null)
if (bmname != null) {
String value = "some text for testing run";
//if (value==null) continue;
List<Object> theList = null;
//create bm list
theList = ((ContentAccessor)(bm.getParent())).getContent();
// I set the range as 1 (I assume this start range says where to start creating the table)
int rangeStart = 1;
WordprocessingMLPackage wordPackage = WordprocessingMLPackage.createPackage();
// create the table
Tbl table = factory.createTbl();
//add borders to the table (helper not shown)
addBorders(table);
for (int rows = 0; rows < 1; rows++) {
// create a row
Tr row = factory.createTr();
for (int colm = 0; colm < 1; colm++) {
// create a cell
Tc cell = factory.createTc();
// add the content to cell
cell.getContent().add(wordPackage.getMainDocumentPart().createParagraphOfText("cell" + colm));
// add the cell to row
row.getContent().add(cell);
}
// add the row to table
table.getContent().add(row);
}
// now add a run (to test whether run is working or not)
org.docx4j.wml.R run = factory.createR();
org.docx4j.wml.Text t = factory.createText();
run.getContent().add(t);
t.setValue(value);
//add table to list
theList.add(rangeStart, table);
//add run to list
//theList.add(rangeStart, run);
}
}
I don't need to delete the text in the bookmark, so I removed that part.
I don't know what the problem is: the program compiles, but I cannot open the Word document; Word says "unknown error". When I write the test string "value" instead, it is written into the bookmark perfectly and the document opens, but not in the case of the table. Please help me.
Thanks in advance

You can adapt the sample code BookmarksReplaceWithText.java
In your case:
line 89: the parent won't be a p; it'll be the body or a tc. You could remove the test.
line 128: instead of adding a run, you want to insert a table
You can use TblFactory to create your table, or the docx4j webapp to generate code from a sample docx.
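Following the point about the parent, here is a sketch that finds the body-level paragraph holding a bookmark start and inserts a table created with TblFactory right after it: into the body, not inside the w:p. A table inside a paragraph is invalid WordprocessingML, which would explain why Word refuses to open the file. The class and method names and the assumed 9000-twip total table width are illustrative; bookmarks inside table cells are not handled.

```java
import java.util.List;

import org.docx4j.XmlUtils;
import org.docx4j.model.table.TblFactory;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.wml.CTBookmark;
import org.docx4j.wml.P;
import org.docx4j.wml.Tbl;

public class TableAtBookmark {

    // Insert a rows x cols table directly after the top-level paragraph that contains
    // the bookmark start named bookmarkName. Returns false if no such paragraph exists.
    static boolean insertTableAfterBookmark(WordprocessingMLPackage pkg, String bookmarkName,
                                            int rows, int cols) {
        List<Object> body = pkg.getMainDocumentPart().getContent();
        for (int i = 0; i < body.size(); i++) {
            Object o = XmlUtils.unwrap(body.get(i));
            if (!(o instanceof P)) continue;
            for (Object child : ((P) o).getContent()) {
                Object c = XmlUtils.unwrap(child); // w:bookmarkStart is usually a JAXBElement
                if (c instanceof CTBookmark && bookmarkName.equals(((CTBookmark) c).getName())) {
                    // Assumed total width of 9000 twips, split evenly over the columns.
                    Tbl table = TblFactory.createTable(rows, cols, 9000 / cols);
                    body.add(i + 1, table); // sibling of the paragraph, not its child
                    return true;
                }
            }
        }
        return false;
    }
}
```

Cell text can then be added to the returned table's cells the same way as in the question's loop.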

For some reason, replacing a bookmark with a table didn't work out for me, so I relied on replacing placeholder text with a table instead. For my use case, I created the tables from HTML using the XHTML importer:
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
String xhtml= <your table HTML>;
XHTMLImporterImpl XHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
int ct = 0;
List<Integer> tableIndexes = new ArrayList<>();
List<Object> documentContents = documentPart.getContent();
// Remember the body positions of all paragraphs containing the placeholder text
for (Object o: documentContents) {
if (o.toString().contains("PlaceholderForTable1")) {
tableIndexes.add(ct);
}
ct++;
}
// Replace from the last position backwards, so inserting the converted content
// does not shift the indexes of placeholders that have not been handled yet
for (int k = tableIndexes.size() - 1; k >= 0; k--) {
int i = tableIndexes.get(k);
documentContents.remove(i);
documentContents.addAll(i, XHTMLImporter.convert(xhtml, null));
}
In my input Word document, I put the text 'PlaceholderForTable1' where I want the table to be inserted.
