In my Word template file I have some tables, and in some of them the second column is formatted as an enumeration.
Using docx4j I'm filling the template with dynamic content, and if there's only one entry I need to get rid of the enumeration style.
I found a place deep down in the structure that holds the value for the enumeration, but when I set it to null I don't see any change in my template.
//This value is "Listenabsatz" (German) and I want to get rid of it
//Setting this value to "" or setting pStyle to null didn't help
In my actual code this is the place where I'm trying to change it:
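Tr templateRow = (Tr) rows.get(0);
// The second cell (index 1) comes back wrapped in a JAXBElement
Tc cell = (Tc) ((javax.xml.bind.JAXBElement) templateRow.getContent().get(1)).getValue();
P par = (P) (cell.getContent().get(0));
PPr parStyle = par.getPPr();
if (parStyle != null && parStyle.getPStyle() != null && parStyle.getPStyle().getVal() != null) {
    // getVal() returns "Listenabsatz" here; this is where I try to reset the style
}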
How can I remove that enumeration style successfully?
The Aspose code inserts the ViewMaster(Vertical) footer with the default date
placeholder as the text inside it. I want to replace that with some text, as shown in
the image.
I followed the code mentioned in "ViewMaster(vertical) using Aspose"
to generate the ViewMaster(Vertical) in the Word/PDF output. Can someone help me
get the right code to replace the date with text?
The date is set in a structured document tag (SDT). You can use code like this to get and modify the value of this SDT:
// Get structured document tags from footer.
NodeCollection tags = doc.FirstSection.HeadersFooters[HeaderFooterType.FooterPrimary].GetChildNodes(NodeType.StructuredDocumentTag, true);
foreach (StructuredDocumentTag tag in tags)
{
    if (tag.Title.Equals("Date") && tag.SdtType == SdtType.Date)
    {
        tag.IsShowingPlaceholderText = false;
        tag.FullDate = DateTime.Now;
        // By default the SDT is mapped to XML. We can simply remove the mapping to use the value set in the FullDate property.
        tag.XmlMapping.Delete();
    }
}
If you do not need the date but want to insert some custom text, you can remove the tag and insert a simple paragraph with text instead. For example:
// Get structured document tags from footer.
NodeCollection tags = doc.FirstSection.HeadersFooters[HeaderFooterType.FooterPrimary].GetChildNodes(NodeType.StructuredDocumentTag, true);
DocumentBuilder builder = new DocumentBuilder(doc);
foreach (StructuredDocumentTag tag in tags)
{
    if (tag.Title.Equals("Date") && tag.SdtType == SdtType.Date)
    {
        // Put an empty paragraph after the structured document tag
        Paragraph p = new Paragraph(doc);
        tag.ParentNode.InsertAfter(p, tag);
        // Remove tag
        tag.Remove();
        // Move DocumentBuilder to the newly inserted paragraph and insert some text.
        builder.MoveTo(p);
        builder.Write("This is my custom vertical text");
    }
}
I am playing around with Nutch. I am trying to write something that detects specific nodes in the DOM structure and extracts text from around each node, e.g. text from parent nodes, sibling nodes, etc. I researched and read some examples and then tried writing a plugin that does this for an image node. Some of the code:
if("img".equalsIgnoreCase(nodeName) && nodeType == Node.ELEMENT_NODE){
    String imageUrl = "No Url";
    String altText = "No Text";
    String imageName = "No Image Name"; //For the sake of simpler code, default values set to
                                        //avoid NullPointerException in findMatches method
    NamedNodeMap attributes = currentNode.getAttributes();
    List<String> ParentNodesText = getSurroundingText(currentNode);
    //Analyze the attribute values inside the img node. <img src="xxx" alt="myPic">
    for(int i = 0; i < attributes.getLength(); i++){
        Attr attr = (Attr)attributes.item(i);
        if("src".equalsIgnoreCase(attr.getName())){
            imageUrl = getImageUrl(base, attr);
            imageName = getImageName(imageUrl);
        }
        else if("alt".equalsIgnoreCase(attr.getName())){
            altText = attr.getValue().toLowerCase();
        }
    }
}
private List<String> getSurroundingText(Node currentNode){
    List<String> SurroundingText = new ArrayList<String>();
    while(currentNode != null){
        if(currentNode.getNodeType() == Node.TEXT_NODE){
            String text = currentNode.getNodeValue().trim();
            SurroundingText.add(text);
        }
        if(currentNode.getPreviousSibling() != null && currentNode.getPreviousSibling().getNodeType() == Node.TEXT_NODE){
            String text = currentNode.getPreviousSibling().getNodeValue().trim();
            SurroundingText.add(text);
        }
        currentNode = currentNode.getParentNode();
    }
    return SurroundingText;
}
This doesn't seem to work properly. The img tag gets detected, and the image name and URL get retrieved, but nothing more. The getSurroundingText method looks too ugly; I tried but couldn't improve it. I don't have a clear idea of where and how I can extract text that could be related to the image. Any help, please?
You're on the right track. However, take a look at this example HTML:
<div>
    <span>test1</span>
    <img src="" alt="test image" title="awesome title">
    <span>test2</span>
</div>
In your case, I think the problem lies in the sibling nodes of the img node. You're looking at the direct siblings, and you may think that in the example above these would be the span nodes. In fact they are whitespace-only text nodes, so when you ask for the sibling of the img you get an empty node with no actual text.
If we rewrite the previous HTML as: <div><span>test1</span><img src="" alt="test image" title="awesome title"><span>test2</span></div> then the sibling nodes of the img would be the span nodes that you want.
I'm assuming that in the previous example you want to get both "test1" and "test2". In that case you need to keep moving until you find a Node.ELEMENT_NODE and then fetch the text inside that node. A good practice is not to grab everything you find, but to limit your scope to p, span and div to improve accuracy.
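To make the idea concrete, here is a minimal sketch using only the JDK's org.w3c.dom API. It assumes the markup is well-formed XML (so the img is self-closed), which the HTML parser behind Nutch does not require. The class name SiblingText and the helpers nearestElement, surroundingText and parse are made up for illustration and are not part of Nutch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SiblingText {
    // Only these elements are accepted as carriers of related text.
    private static final Set<String> ALLOWED = new HashSet<>(Arrays.asList("p", "span", "div"));

    // Walk in one direction until the first element node is found,
    // skipping whitespace-only text nodes, comments and the like.
    static Node nearestElement(Node start, boolean forward) {
        Node n = forward ? start.getNextSibling() : start.getPreviousSibling();
        while (n != null && n.getNodeType() != Node.ELEMENT_NODE) {
            n = forward ? n.getNextSibling() : n.getPreviousSibling();
        }
        return n;
    }

    // Collect the text of the nearest allowed element before and after the given node.
    static List<String> surroundingText(Node img) {
        List<String> result = new ArrayList<>();
        for (boolean forward : new boolean[] {false, true}) {
            Node sibling = nearestElement(img, forward);
            // HTML parsers may report upper-case names, so compare in lower case.
            if (sibling != null && ALLOWED.contains(sibling.getNodeName().toLowerCase())) {
                String text = sibling.getTextContent().trim();
                if (!text.isEmpty()) {
                    result.add(text);
                }
            }
        }
        return result;
    }

    // Assumption: the input is well-formed XML; real-world HTML needs an HTML parser instead.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        String html = "<div>\n  <span>test1</span>\n  <img src=\"\" alt=\"test image\"/>\n  <span>test2</span>\n</div>";
        Node img = parse(html).getElementsByTagName("img").item(0);
        System.out.println(surroundingText(img)); // prints [test1, test2]
    }
}
```

In a plugin you would call something like surroundingText(currentNode) at the point where the img node is detected. Whether parent nodes should be searched as well when no suitable sibling exists is a design decision this sketch leaves open.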
I am trying to read Microsoft Word documents via Apache POI and found that it provides a couple of convenient methods for scanning a document, such as getText() and getParagraphList(). My use case is slightly different: I want to scan a document so that it gives me events/information such as heading, paragraph and table in the same sequence as they appear in the document. That would help me build a document structure like this:
<heading> ABC </heading>
<paragraph>xyz </paragraph>
<paragraph>scanning through APIs</paragraph>
The main intent is to maintain the relationship between headings and paragraphs as in the original document. I'm not sure, but could something like this work for me?
Iterator<IBodyElement> itr = doc.getBodyElementsIterator();
while(itr.hasNext()) {
    IBodyElement ele = itr.next();
}
I was able to get the paragraph list but not the heading information using this code. Just to mention, I am interested in all headings, whether they are explicitly marked as headings with a style or just use a large font size.
Headers aren't stored inline in the main document. They live elsewhere, which is why you're not getting them as body elements. Body elements are things like sections, paragraphs and tables, not headers, so you have to fetch them yourself.
If you look at this code in Apache Tika, you'll see an example of how to do so. Assuming you're iterating over the body elements, and want headers / footers of paragraphs, you'll want code something like this (based on the Tika code):
for(IBodyElement element : bodyElement.getBodyElements()) {
    if(element instanceof XWPFParagraph) {
        XWPFParagraph paragraph = (XWPFParagraph)element;
        XWPFHeaderFooterPolicy headerFooterPolicy = null;
        if (paragraph.getCTP().getPPr() != null) {
            CTSectPr ctSectPr = paragraph.getCTP().getPPr().getSectPr();
            if(ctSectPr != null) {
                headerFooterPolicy = new XWPFHeaderFooterPolicy(document, ctSectPr);
                // Handle Header
            }
        }
        // Handle paragraph
        if (headerFooterPolicy != null) {
            // Handle footer
        }
    }
    if(element instanceof XWPFTable) {
        XWPFTable table = (XWPFTable)element;
        // Handle table
    }
    if (element instanceof XWPFSDT){
        XWPFSDT sdt = (XWPFSDT) element;
        // Handle SDT
    }
}
I have a form that runs a Java agent on the WebQueryOpen event. The agent pulls data from a DB2 database and puts it into the computed text fields I have placed on the form, which are displayed whenever I open the form in the browser. This works for me. However, when I try to use rich text fields I get a ClassCastException. No document is actually saved; I just open the form in the browser using this Domino URL -
Sample code for a simple text field (displayed without problems):
Document sampledoc = agentContext.getDocumentContext();
String samplestr = "sample data from db2";
sampledoc.replaceItemValue("sampletextfield", samplestr);
When I tried using a rich text field:
Document sampledoc = agentContext.getDocumentContext();
String samplestr = "sample data from db2";
RichTextItem rtsample = (RichTextItem)sampledoc.getFirstItem("samplerichtextfield");
rtsample.appendText(samplestr); // ClassCastException error
Basically, I wanted to use a rich text field so that it can accommodate more characters in case I pull a very long string.
Screenshot of the field (As you can see it's a RichText)
The problem is that you're trying to access a regular Item as a RichTextItem.
RichTextItems are special fields that have to be created with their own method, like this:
RichTextItem rtsample = sampledoc.createRichTextItem("samplerichtextfield");
This is different from regular Items, which can be created with a simple sampledoc.replaceItemValue(etc).
So, if you want to check whether an item is a RichTextItem and create it if it does not exist, you can do this:
RichTextItem rti = null;
Item item = doc.getFirstItem("somefield");
if (item != null) {
    if (item instanceof RichTextItem) {
        rti = (RichTextItem) item;
    } else {
        // The item exists but is not a RichTextItem
    }
} else {
    rti = doc.createRichTextItem("somefield");
}
How can we remove an image from a document using docx4j?
Say I have 10 images, and I want to replace 8 of them with my own byte array/binary data and delete the remaining 2.
I am also having trouble locating the images.
Is it somehow possible to replace text placeholders in the document with images?
Refer to this post :
for(Object obj : elements){
    if(obj instanceof Tbl){
        Tbl table = (Tbl) obj;
        List rows = getAllElementFromObject(table, Tr.class);
        for(Object trObj : rows){
            Tr tr = (Tr) trObj;
            List cols = getAllElementFromObject(tr, Tc.class);
            for(Object tcObj : cols){
                Tc tc = (Tc) tcObj;
                List texts = getAllElementFromObject(tc, Text.class);
                for(Object textObj : texts){
                    Text text = (Text) textObj;
                    File file = new File("C:\\image.jpeg");
                    P paragraphWithImage = addInlineImageToParagraph(createInlineImage(file));
                }
            }
        }
    }
}
See docx4j checking checkboxes for the 2 approaches to finding stuff (XPath, or non XPath traversal).
VariableReplace allows you to replace text placeholders, but not with images. I think there may be code floating around (in the docx4j forums?) which extends it to do that.
But I'd suggest you use content control databinding instead. See how to create a new word from template with docx4j
You can use base64 encoded images in your XML data, and docx4j and/or Word will do the rest.
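As a rough illustration of that last point, the sketch below base64-encodes image bytes with java.util.Base64 and wraps them in an XML element. The element names data and image and the class ImageXmlData are hypothetical. The real structure has to match whatever XPaths your content controls are bound to.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class ImageXmlData {
    // Hypothetical element names; adapt them to the XPaths used by your bound content controls.
    static String toXml(byte[] imageBytes) {
        // The standard Base64 alphabet (A-Z, a-z, 0-9, +, /, =) needs no XML escaping.
        String encoded = Base64.getEncoder().encodeToString(imageBytes);
        return "<data><image>" + encoded + "</image></data>";
    }

    public static void main(String[] args) {
        // Placeholder bytes; in practice you would read the image, e.g. with Files.readAllBytes.
        byte[] placeholder = "not a real image".getBytes(StandardCharsets.UTF_8);
        System.out.println(toXml(placeholder));
    }
}
```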