Write text and tables into Word, with whitespace/line breaks

I'm writing paragraph text and text from tables into a Word document.
With the following code the tables are placed under the right paragraphs:
    Iterator<IBodyElement> iter = xdoc.getBodyElementsIterator();
    while (iter.hasNext()) {
        IBodyElement elem = iter.next();
        if (elem instanceof XWPFParagraph) {
            relevantText.setText(((XWPFParagraph) elem).getText());
        } else if (elem instanceof XWPFTable) {
            tabellen.setText(((XWPFTable) elem).getText());
        }
    }
But when I try to insert whitespace or a line break with addBreak() or addCarriageReturn(), the order of my document is wrong: the table text is placed after all the paragraph text.
Does anyone have a solution for this?

I had the same problem a couple of days ago. Did you create two different runs, one for the paragraphs and one for the tables?
I did, and when I changed it to a single run it worked for me.
Like this:
    XWPFRun text = paragraph.createRun();

Related

Find struck-out text in a Word document using Java

Is there any way to find out whether the text of a table cell in a docx file is struck out, using Java?
I have extracted tables from a Word document. The tables contain struck-out text too, and I want to know whether it is possible to tell if a given piece of text is struck out or not.

If you have each XWPFTable t, you can use this method to visit all the runs in the table and find out which ones are struck through. A run contains text that is all formatted in the same way.
    private void exploreTable(XWPFTable t) {
        for (XWPFTableRow row : t.getRows()) {
            for (XWPFTableCell c : row.getTableCells()) {
                for (XWPFParagraph p : c.getParagraphs()) {
                    for (XWPFRun run : p.getRuns()) {
                        if (run.isStrikeThrough()) {.....}
                    }
                }
            }
        }
    }

In Apache POI, is there a way to access XWPF elements by their id?

I have a Word document (docx, so XML based). I want to find a table in it and populate it programmatically. I am using the Apache POI XWPF API.
Is there a way to access XWPF elements by their id?
How can I make XWPF elements uniquely identifiable and then alter them using Java?
Thanks

What I have implemented is a find-and-replace feature (from here).
In my template docx file I use id-like texts such as __heading1__ and __subjectname__, and then replace them using the code below. For tables, @axel-richter's solution may be suitable.
    private void findReplace(String a, String b, CustomXWPFDocument document) {
        for (XWPFParagraph p : document.getParagraphs()) {
            List<XWPFRun> runs = p.getRuns();
            if (runs != null) {
                for (XWPFRun r : runs) {
                    String text = r.getText(0);
                    if (text != null && text.contains(a)) {
                        text = text.replace(a, b);
                        r.setText(text, 0);
                    }
                }
            }
        }
    }
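
One caveat with the per-run replacement above: Word often splits what looks like a single piece of text into several runs, so a placeholder such as __subjectname__ may never appear whole in any one r.getText(0). A minimal sketch of one possible workaround follows, using plain strings as stand-ins for the run texts so that it runs without POI. The class and method names are hypothetical. The idea is to join the run texts, replace in the joined string, put the result in the first run and empty the others, at the cost of the whole paragraph taking the first run's formatting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: each String stands in for the text of one XWPFRun.
class RunTextReplacer {

    // Returns new run texts with 'placeholder' replaced by 'value',
    // even when the placeholder is split across neighbouring runs.
    static List<String> replaceAcrossRuns(List<String> runTexts, String placeholder, String value) {
        String joined = String.join("", runTexts);
        if (!joined.contains(placeholder)) {
            return new ArrayList<>(runTexts); // nothing to replace, keep the runs as they are
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < runTexts.size(); i++) {
            // The first run carries the whole replaced text; the rest become empty.
            result.add(i == 0 ? joined.replace(placeholder, value) : "");
        }
        return result;
    }

    public static void main(String[] args) {
        // The placeholder is spread over three runs here.
        List<String> runs = Arrays.asList("Dear __sub", "ject", "name__,");
        System.out.println(replaceAcrossRuns(runs, "__subjectname__", "Alice"));
    }
}
```

With real runs, the texts would be read with r.getText(0) and written back with r.setText(text, 0), as in findReplace above.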

How to draw lines between records for plain text output format (.txt)?

My problem is that when I export the JasperPrint to text, the lines/rectangles don't appear in the .txt file, although they do appear in PDF files.
I tried modifying the pen width, but nothing appears.
My code is:
    JRDesignLine ltest = new JRDesignLine();
    ltest.setBackcolor(Color.black);
    ltest.setForecolor(Color.black);
    ltest.setX(10);
    ltest.setY(200);
    ltest.setMode(JRDesignStaticText.MODE_OPAQUE);
    ltest.setWidth(500);
    ltest.setHeight(10);
    ltest.setPen(JRDesignLine.FILL_SOLID);
    bandHeader.addElement(ltest);
Any suggestions, please?

You are asking about the plain text format and it is very simple.
The problem
The JRTextExporter is skipping JRLine elements and does not draw the borders (JRLineBox).
The snippet of JRTextExporter:
    protected void exportElements(List<JRPrintElement> elements) {
        for (int i = 0; i < elements.size(); i++) {
            Object element = elements.get(i);
            if (element instanceof JRPrintText) {
                // Text elements are the only leaf elements that get exported
                exportText((JRPrintText) element);
            } else if (element instanceof JRPrintFrame) {
                // Frames are only descended into; their own borders are not drawn
                JRPrintFrame frame = (JRPrintFrame) element;
                setFrameElementsOffset(frame, false);
                try {
                    exportElements(frame.getElements());
                } finally {
                    restoreElementOffsets();
                }
            }
        }
    }
As you can see, only JRPrintText elements are printed, and the protected void exportText(JRPrintText) method knows nothing about boxes. You can compare it with the source of the JRRtfExporter.exportText(JRPrintText) method.
Solutions
You can try the RTF output format (JRRtfExporter).
Another idea (not elegant, and a little ugly) is to use staticText or textField elements (with or without a condition) to simulate borders.
You can try setting the net.sf.jasperreports.export.text.line.separator property. For example, the value can be \r\n at the start, then a number of underscores, then \r\n at the end of the line. The report properties can look something like this:
    <property name="net.sf.jasperreports.export.text.page.height" value="66"/>
    <property name="net.sf.jasperreports.export.text.page.width" value="94"/>
    <property name="net.sf.jasperreports.export.text.line.separator" value="&#13;&#10;_________________________________________________________&#13;&#10;"/>
The output may look better with JRCsvExporter and the net.sf.jasperreports.csv.field.delimiter and net.sf.jasperreports.csv.record.delimiter properties.
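
To see what such a separator value does to the output, here is a plain-Java sketch that needs no JasperReports at all: it builds the \r\n + underscores + \r\n string described above and joins a few text lines with it. The class name, the method names and the sample records are made up for illustration; only the shape of the separator comes from the answer.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: mimics the effect of a custom line separator on text output.
class SeparatorDemo {

    // Builds CRLF + 'width' underscores + CRLF, the shape suggested for the property value.
    static String buildSeparator(int width) {
        StringBuilder sb = new StringBuilder("\r\n");
        for (int i = 0; i < width; i++) {
            sb.append('_');
        }
        return sb.append("\r\n").toString();
    }

    // Joins the given lines, putting the separator between consecutive lines.
    static String render(List<String> lines, String separator) {
        return String.join(separator, lines);
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("record 1", "record 2", "record 3");
        System.out.print(render(records, buildSeparator(20)));
    }
}
```

Each record ends up followed by a row of underscores, which is about as close to a drawn line as a .txt file gets.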

Extracting heading and paragraphs from doc and docx files using apache-poi

I am trying to read Microsoft Word documents via apache-poi and found that there are a couple of convenient methods for scanning through a document, such as getText() and getParagraphList(). But my use case is slightly different: scanning a document should give us events/information such as heading, paragraph and table in the same sequence as they appear in the document. That would help me prepare a document structure like this:
    <content>
        <section>
            <heading> ABC </heading>
            <paragraph>xyz </paragraph>
            <paragraph>scanning through APIs</paragraph>
        </section>
        .
        .
        .
    </content>
The main intent is to maintain the relationship between headings and paragraphs as in the original document. I'm not sure, but could something like this work for me?
    Iterator<IBodyElement> itr = doc.getBodyElementsIterator();
    while (itr.hasNext()) {
        IBodyElement ele = itr.next();
        System.out.println(ele.getElementType());
    }
With this code I was able to get the paragraph list but not the heading information. Just to mention, I am interested in all headings, whether they are explicitly marked as headings with a style or just set in a large font size.

Headers aren't stored inline in the main document, they live elsewhere, which is why you're not getting them as body elements. Body elements are things like sections, paragraphs and tables, not headers, so you have to fetch them yourself.
If you look at this code in Apache Tika, you'll see an example of how to do so. Assuming you're iterating over the body elements, and want headers / footers of paragraphs, you'll want code something like this (based on the Tika code):
    for (IBodyElement element : bodyElement.getBodyElements()) {
        if (element instanceof XWPFParagraph) {
            XWPFParagraph paragraph = (XWPFParagraph) element;
            XWPFHeaderFooterPolicy headerFooterPolicy = null;
            if (paragraph.getCTP().getPPr() != null) {
                CTSectPr ctSectPr = paragraph.getCTP().getPPr().getSectPr();
                if (ctSectPr != null) {
                    headerFooterPolicy = new XWPFHeaderFooterPolicy(document, ctSectPr);
                    // Handle Header
                }
            }
            // Handle paragraph
            if (headerFooterPolicy != null) {
                // Handle footer
            }
        }
        if (element instanceof XWPFTable) {
            XWPFTable table = (XWPFTable) element;
            // Handle table
        }
        if (element instanceof XWPFSDT) {
            XWPFSDT sdt = (XWPFSDT) element;
            // Handle SDT
        }
    }
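
The code above deals with page headers and footers. If the goal is headings inside the body (paragraphs styled as Heading 1 and so on), one possible approach is to look at each paragraph's style ID, which XWPFParagraph.getStyle() returns, and start a new section whenever it looks like a heading. The sketch below shows only that grouping step, on plain {styleId, text} pairs so that it runs without POI. The class name, the "Heading" prefix test and the sample data are assumptions: style IDs depend on the document and its language, and headings faked with a large font size would need a separate check on the runs.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: groups paragraphs under the heading that precedes them.
class SectionBuilder {

    // Assumption: heading styles have IDs starting with "Heading" (e.g. "Heading1").
    static boolean isHeading(String styleId) {
        return styleId != null && styleId.startsWith("Heading");
    }

    // Each entry is {styleId, text}; styleId may be null for unstyled paragraphs.
    // Note: the text is not XML-escaped in this sketch.
    static String toXml(List<String[]> paragraphs) {
        StringBuilder xml = new StringBuilder("<content>");
        boolean sectionOpen = false;
        for (String[] p : paragraphs) {
            if (isHeading(p[0])) {
                if (sectionOpen) {
                    xml.append("</section>"); // close the previous section
                }
                xml.append("<section><heading>").append(p[1]).append("</heading>");
                sectionOpen = true;
            } else {
                if (!sectionOpen) {
                    xml.append("<section>"); // text that comes before the first heading
                    sectionOpen = true;
                }
                xml.append("<paragraph>").append(p[1]).append("</paragraph>");
            }
        }
        if (sectionOpen) {
            xml.append("</section>");
        }
        return xml.append("</content>").toString();
    }

    public static void main(String[] args) {
        List<String[]> doc = Arrays.asList(
                new String[] {"Heading1", "ABC"},
                new String[] {null, "xyz"},
                new String[] {null, "scanning through APIs"});
        System.out.println(toXml(doc));
    }
}
```

In the body-element loop, the pairs would come from paragraph.getStyle() and paragraph.getText() in the "Handle paragraph" branch.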

Docx4j - Images in the document

How can we remove an image from a document with docx4j?
Say I have 10 images; I want to replace 8 of them with my own byte array/binary data and delete the remaining 2.
I am also having trouble locating the images.
Is it somehow possible to replace text placeholders in the document with images?

Refer to this post: http://vixmemon.blogspot.com/2013/04/docx4j-replace-text-placeholders-with.html
    for (Object obj : elements) {
        if (obj instanceof Tbl) {
            Tbl table = (Tbl) obj;
            List rows = getAllElementFromObject(table, Tr.class);
            for (Object trObj : rows) {
                Tr tr = (Tr) trObj;
                List cols = getAllElementFromObject(tr, Tc.class);
                for (Object tcObj : cols) {
                    Tc tc = (Tc) tcObj;
                    List texts = getAllElementFromObject(tc, Text.class);
                    for (Object textObj : texts) {
                        Text text = (Text) textObj;
                        if (text.getValue().equalsIgnoreCase("${MY_PLACE_HOLDER}")) {
                            // Swap the cell's first content element for a paragraph holding the image
                            File file = new File("C:\\image.jpeg");
                            P paragraphWithImage = addInlineImageToParagraph(createInlineImage(file));
                            tc.getContent().remove(0);
                            tc.getContent().add(paragraphWithImage);
                        }
                    }
                    System.out.println("here");
                }
            }
            System.out.println("here");
        }
    }
    wordMLPackage.save(new java.io.File("C:\\result.docx"));

See "docx4j checking checkboxes" for the two approaches to finding things (XPath, or non-XPath traversal).
VariableReplace allows you to replace text placeholders, but not with images. I think there may be code floating around (in the docx4j forums?) that extends it to do that.
But I'd suggest you use content control data binding instead. See "how to create a new word from template with docx4j".
You can use base64-encoded images in your XML data, and docx4j and/or Word will do the rest.
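
As a rough sketch of that last point, the image bytes can be Base64-encoded with the JDK alone before being placed in the XML data that the content controls are bound to. The element name <photo> and the class and method names below are placeholders, not anything docx4j prescribes, and the binding setup itself is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustration only: prepares image bytes for use in bound XML data.
class ImageXmlData {

    // Encodes raw image bytes as a Base64 string (no line breaks).
    static String encode(byte[] imageBytes) {
        return Base64.getEncoder().encodeToString(imageBytes);
    }

    // Wraps the encoded image in a hypothetical <photo> element.
    static String toXmlElement(byte[] imageBytes) {
        return "<photo>" + encode(imageBytes) + "</photo>";
    }

    public static void main(String[] args) {
        // Stand-in bytes; in practice they could be read with Files.readAllBytes(path).
        byte[] fakeImage = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toXmlElement(fakeImage));
    }
}
```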
