Find struck-out text in a Word document using Java - java

Is there any way to find out, using Java, whether the text of a table cell in a .docx file is struck out?
I have extracted the tables from a Word document. The tables contain struck-out text too, and I want to know whether it is possible to tell if a given piece of text is struck out or not.

If you have the XWPFTable t, you can use this method to visit every run in the table and check which ones are struck through. A run is a stretch of text that is formatted in the same way.
private void exploreTable(XWPFTable t) {
    // Walk rows -> cells -> paragraphs -> runs
    for (XWPFTableRow row : t.getRows()) {
        for (XWPFTableCell c : row.getTableCells()) {
            for (XWPFParagraph p : c.getParagraphs()) {
                for (XWPFRun run : p.getRuns()) {
                    // true when this run carries strike-through formatting
                    if (run.isStrikeThrough()) {.....}
                }
            }
        }
    }
}
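To see what a check like isStrikeThrough() looks at, it can help to know that in the underlying WordprocessingML a run (w:r) is struck out when its run properties (w:rPr) contain a w:strike element (or w:dstrike for a double line), unless w:val switches it off. Below is a minimal JDK-only sketch, not POI's implementation. It assumes the XML of word/document.xml has already been read into a string, only looks at formatting set directly on the run (not inherited from a style), and the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists the text of runs that carry direct strike-through formatting.
class StrikeScan {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static List<String> struckText(String documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)));
            List<String> result = new ArrayList<>();
            NodeList runs = doc.getElementsByTagNameNS(W, "r");
            for (int i = 0; i < runs.getLength(); i++) {
                Element run = (Element) runs.item(i);
                if (isOn(run, "strike") || isOn(run, "dstrike")) {
                    result.add(textOf(run));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document XML", e);
        }
    }

    // True when <w:rPr> has the given toggle element and w:val does not switch it off.
    static boolean isOn(Element run, String property) {
        Element rPr = child(run, "rPr");
        Element flag = rPr == null ? null : child(rPr, property);
        if (flag == null) return false;
        String val = flag.getAttributeNS(W, "val"); // "" when the attribute is absent
        return !(val.equals("false") || val.equals("0") || val.equals("off"));
    }

    // First direct child element with the given local name in the w: namespace, or null.
    static Element child(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && W.equals(n.getNamespaceURI())
                    && localName.equals(n.getLocalName())) {
                return (Element) n;
            }
        }
        return null;
    }

    // Concatenates the <w:t> text nodes of one run.
    static String textOf(Element run) {
        StringBuilder sb = new StringBuilder();
        NodeList texts = run.getElementsByTagNameNS(W, "t");
        for (int i = 0; i < texts.getLength(); i++) {
            sb.append(texts.item(i).getTextContent());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<w:document xmlns:w=\"" + W + "\"><w:body><w:p>"
                + "<w:r><w:rPr><w:strike/></w:rPr><w:t>old</w:t></w:r>"
                + "<w:r><w:t>new</w:t></w:r>"
                + "</w:p></w:body></w:document>";
        System.out.println(struckText(xml));
    }
}
```

With POI available, the loop in exploreTable above is the shorter route; this sketch is mainly useful for debugging why a particular run is or is not reported.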

Related

Write text and tables in to word, with whitespaces/enters

I'm writing text and text from tables into a word document.
With the following code the tables are placed under the right paragraphs.
Iterator<IBodyElement> iter = xdoc.getBodyElementsIterator();
while (iter.hasNext())
{
    IBodyElement elem = iter.next();
    if (elem instanceof XWPFParagraph)
    {
        relevantText.setText(((XWPFParagraph) elem).getText());
    } else if (elem instanceof XWPFTable)
    {
        tabellen.setText(((XWPFTable) elem).getText());
    }
}
Now when I try to add whitespace or a line break with addBreak() or addCarriageReturn(), the order of my document is wrong: the table text is placed after all the other text.
Does anyone have a solution for this?
I had the same problem a couple of days ago. Did you create two different runs for the paragraphs and the tables?
I did, and when I changed it to a single run it worked for me.
Like this:
XWPFRun text = paragraph.createRun();

In Apache POI, is there a way to access XWPF elements by their id?

I have a Word document (.docx, so XML based). I want to find a table in it and populate it programmatically. I am using the Apache POI XWPF API.
Is there a way to access XWPF elements by their id?
How can I make XWPF elements uniquely identifiable and then alter them using Java?
Thanks
What I have implemented is a find-and-replace feature (from here).
In my template docx file I use "id-like" texts such as __heading1__ and __subjectname__, then replace them using the code below. For tables, @axel-richter's solution may be suitable.
private void findReplace(String a, String b, CustomXWPFDocument document) {
    for (XWPFParagraph p : document.getParagraphs()) {
        List<XWPFRun> runs = p.getRuns();
        if (runs != null) {
            for (XWPFRun r : runs) {
                // getText(0) is the first text segment of the run; it may be null
                String text = r.getText(0);
                if (text != null && text.contains(a)) {
                    text = text.replace(a, b);
                    r.setText(text, 0);
                }
            }
        }
    }
}
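One caveat with a per-run search: Word sometimes splits what looks like a single word into several runs, so a placeholder such as __heading1__ can end up spread over two or more runs and r.getText(0).contains(a) never matches. A common workaround is to join the run texts of a paragraph, replace in the joined string, put the result into the first run and blank the others. The sketch below models that idea with plain strings standing in for the run texts; it is JDK-only, the class and method names are hypothetical, and applying it to real XWPFRun objects (via getText(0) and setText(text, 0)) is left as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: each list element stands for the text of one run in a paragraph.
class RunMerge {
    // Returns the new run texts (same number of runs as the input).
    static List<String> replaceAcrossRuns(List<String> runTexts, String target, String replacement) {
        String joined = String.join("", runTexts);
        if (!joined.contains(target)) {
            return new ArrayList<>(runTexts); // nothing to do, keep runs as they are
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < runTexts.size(); i++) {
            // The first run receives the whole replaced text, later runs are emptied
            result.add(i == 0 ? joined.replace(target, replacement) : "");
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> runs = List.of("__head", "ing1__", " text");
        System.out.println(replaceAcrossRuns(runs, "__heading1__", "Title"));
    }
}
```

The trade-off is that all text in the paragraph takes on the formatting of the first run, so this suits templates where the placeholder paragraph is uniformly formatted.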

Extracting heading and paragraphs from doc and docx files using apache-poi

I am trying to read Microsoft Word documents via apache-poi and found that a couple of convenient methods are provided for scanning a document, such as getText() and getParagraphList(). My use case is slightly different: when scanning a document, I want to receive events/information such as heading, paragraph and table in the same sequence as they appear in the document. That would help me build a document structure like this:
<content>
    <section>
        <heading> ABC </heading>
        <paragraph>xyz </paragraph>
        <paragraph>scanning through APIs</paragraph>
    </section>
    .
    .
    .
</content>
The main intent is to maintain the relationship between headings and paragraphs as in the original document. I am not sure, but could something like this work for me?
Iterator<IBodyElement> itr = doc.getBodyElementsIterator();
while (itr.hasNext()) {
    IBodyElement ele = itr.next();
    System.out.println(ele.getElementType());
}
With this code I was able to get the list of paragraphs, but not the heading information. Just to mention, I am interested in all headings, whether they are explicitly marked as headings by a style or just use a large font size.
Headers aren't stored inline in the main document; they live elsewhere, which is why you're not getting them as body elements. Body elements are things like sections, paragraphs and tables, not headers, so you have to fetch the headers yourself.
If you look at this code in Apache Tika, you'll see an example of how to do so. Assuming you're iterating over the body elements and want the headers / footers of paragraphs, you'll want code something like this (based on the Tika code):
for (IBodyElement element : bodyElement.getBodyElements()) {
    if (element instanceof XWPFParagraph) {
        XWPFParagraph paragraph = (XWPFParagraph) element;
        XWPFHeaderFooterPolicy headerFooterPolicy = null;
        if (paragraph.getCTP().getPPr() != null) {
            CTSectPr ctSectPr = paragraph.getCTP().getPPr().getSectPr();
            if (ctSectPr != null) {
                headerFooterPolicy = new XWPFHeaderFooterPolicy(document, ctSectPr);
                // Handle Header
            }
        }
        // Handle paragraph
        if (headerFooterPolicy != null) {
            // Handle footer
        }
    }
    if (element instanceof XWPFTable) {
        XWPFTable table = (XWPFTable) element;
        // Handle table
    }
    if (element instanceof XWPFSDT) {
        XWPFSDT sdt = (XWPFSDT) element;
        // Handle SDT
    }
}
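If what is wanted is headings (titles styled as Heading 1, Heading 2 and so on) rather than page headers, those do appear among the body elements: they are ordinary paragraphs whose style marks them as headings, and XWPFParagraph.getStyle() returns the style id. The sketch below shows only the grouping step, turning a sequence of (style id, text) pairs in document order into the <content>/<section> structure from the question. It is JDK-only; the pairs stand in for values read from POI, the assumption that heading style ids start with "Heading" depends on the template and language, headings made only by a large font are not detected, and the text is not XML-escaped.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper: groups paragraphs under the heading that precedes them.
class SectionBuilder {
    // Assumption: heading style ids look like "Heading1", "Heading2", ...
    static boolean isHeading(String styleId) {
        return styleId != null && styleId.toLowerCase(Locale.ROOT).startsWith("heading");
    }

    // Each entry is {styleId, text}; styleId may be null for unstyled paragraphs.
    static String build(List<String[]> paragraphs) {
        StringBuilder sb = new StringBuilder("<content>");
        boolean sectionOpen = false;
        for (String[] p : paragraphs) {
            if (isHeading(p[0])) {
                if (sectionOpen) sb.append("</section>"); // a heading starts a new section
                sb.append("<section><heading>").append(p[1]).append("</heading>");
                sectionOpen = true;
            } else {
                if (!sectionOpen) {           // text before the first heading
                    sb.append("<section>");
                    sectionOpen = true;
                }
                sb.append("<paragraph>").append(p[1]).append("</paragraph>");
            }
        }
        if (sectionOpen) sb.append("</section>");
        return sb.append("</content>").toString();
    }

    public static void main(String[] args) {
        System.out.println(build(List.of(
                new String[] {"Heading1", "ABC"},
                new String[] {null, "xyz"},
                new String[] {null, "scanning through APIs"})));
    }
}
```

In a real document the pairs would presumably be collected while iterating getBodyElementsIterator(), with tables handled as their own element type.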

Docx4j - Images in the document

How can we remove an image from a document using docx4j?
Say I have 10 images, and I want to replace 8 of them with my own byte array/binary data and delete the remaining 2.
I am also having trouble locating the images.
Is it somehow possible to replace text placeholders in the document with images?
Refer to this post: http://vixmemon.blogspot.com/2013/04/docx4j-replace-text-placeholders-with.html
for (Object obj : elements) {
    if (obj instanceof Tbl) {
        Tbl table = (Tbl) obj;
        List rows = getAllElementFromObject(table, Tr.class);
        for (Object trObj : rows) {
            Tr tr = (Tr) trObj;
            List cols = getAllElementFromObject(tr, Tc.class);
            for (Object tcObj : cols) {
                Tc tc = (Tc) tcObj;
                List texts = getAllElementFromObject(tc, Text.class);
                for (Object textObj : texts) {
                    Text text = (Text) textObj;
                    if (text.getValue().equalsIgnoreCase("${MY_PLACE_HOLDER}")) {
                        // Swap the cell's first content element for a paragraph holding the image
                        File file = new File("C:\\image.jpeg");
                        P paragraphWithImage = addInlineImageToParagraph(createInlineImage(file));
                        tc.getContent().remove(0);
                        tc.getContent().add(paragraphWithImage);
                    }
                }
                System.out.println("here");
            }
        }
        System.out.println("here");
    }
}
wordMLPackage.save(new java.io.File("C:\\result.docx"));
See docx4j checking checkboxes for the 2 approaches to finding stuff (XPath, or non XPath traversal).
VariableReplace allows you to replace text placeholders, but not with images. I think there may be code floating around (in the docx4j forums?) which extends it to do that.
But I'd suggest you use content control databinding instead. See how to create a new word from template with docx4j
You can use base64 encoded images in your XML data, and docx4j and/or Word will do the rest.

How to get the alias of a cell in MS Excel using Apache POI

I have an Excel file in which some cells have an alias. I want to loop over all the cells in the file and print the ones that have an alias. I am using Apache POI (the Java API for Microsoft Documents) to do this, but I could not find a method to get the alias of a cell. Please see my code below.
for (int i = 0; i < wb.getNumberOfSheets(); ++i) {
    Sheet sheet1 = wb.getSheetAt(i);
    for (Row row : sheet1) {
        for (Cell cell : row) {
            // Check if the Cell has an alias
        }
    }
}
How to add an alias for a cell:
Click a cell in a sheet to select it, then edit the Name box (the one to the left of the formula bar), type an alias for the cell and press Enter. From this point on, you can select the cell by clicking the drop-down arrow at the right of the Name box and choosing its alias. See the picture for details.
Any ideas?
You can do it like this in your code:
1) First, get the number of alias names (defined names) in the whole workbook:
int NameTotalNumber = wb.getNumberOfNames();
2) Then read them in a loop like this:
for (int NameIndex = 0; NameIndex < NameTotalNumber; NameIndex++)
{
    Name nameList = wb.getNameAt(NameIndex);
    System.out.println("AliasName: " + nameList.getNameName());
}
i have deliver this.
for (int NameIndex = 0; NameIndex < NameTotalNumber; NameIndex++)
{
    Name nameList = wb.getNameAt(NameIndex);
    System.out.println(nameList.getNameName());
}
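The loops above list the names, but the question asks which cell each alias belongs to. In POI a Name also exposes the formula it refers to through getRefersToFormula(), typically something like Sheet1!$B$2. One way to answer "does this cell have an alias?" inside the cell loop is to build a lookup from cell address to alias first. The sketch below shows only that lookup, using plain strings so it runs without POI. The class and method names are hypothetical, names that refer to ranges or formulas rather than a single cell are skipped, and how the lookup key is built from a Sheet and Cell is an assumption noted in the comments.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps "Sheet!A1"-style addresses to the alias defined for them.
class AliasIndex {
    // Input: alias -> refers-to formula, e.g. "Total" -> "Sheet1!$B$2".
    static Map<String, String> byCell(Map<String, String> aliasToFormula) {
        Map<String, String> index = new HashMap<>();
        for (Map.Entry<String, String> e : aliasToFormula.entrySet()) {
            // Drop the absolute markers and sheet-name quotes: 'My Sheet'!$A$1 -> My Sheet!A1
            String ref = e.getValue().replace("$", "").replace("'", "");
            // Keep only references to exactly one cell; ranges such as A1:C3 are ignored
            if (ref.matches("[^!]+![A-Za-z]+[0-9]+")) {
                index.put(ref, e.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> names = new HashMap<>();
        names.put("Total", "Sheet1!$B$2");
        names.put("Block", "Sheet1!$A$1:$C$3");
        // Assumed key format in the cell loop: sheet name + "!" + cell address such as "B2"
        System.out.println(byCell(names).get("Sheet1!B2"));
    }
}
```

As far as I know, newer POI versions favour Workbook.getAllNames() over the index-based getNumberOfNames() and getNameAt(), so the input map may be easier to fill from that list.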
