Updating an MSWord document with Apache POI

I'm trying to update a Microsoft Word document using Apache POI. The Word document is a template containing a number of placeholders of the form "${place.holder}", and all I need to do is replace each placeholder with a specific value. What I've got so far is:
private void start() throws FileNotFoundException, IOException {
POIFSFileSystem fsfilesystem = null;
HWPFDocument hwpfdoc = null;
InputStream resourceAsStream = getClass().getResourceAsStream("/path/to/document/templates/RMA FORM.doc");
try {
fsfilesystem = new POIFSFileSystem(resourceAsStream );
hwpfdoc = new HWPFDocument(fsfilesystem);
Range range = hwpfdoc.getRange();
range.replaceText("${rma.number}","08739");
range.replaceText("${customer.name}", "Roger Swann");
FileOutputStream fos = new FileOutputStream(new File("C:\\temp\\updatedTemplate.doc"));
hwpfdoc.write(fos);
fos.flush();
fos.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
The program runs without errors. If I look in the output file with a Hex editor I can see that the placeholders have been replaced by the program. However, when I try to open the document with MSWord, MSWord crashes.
Is there a step (or series of steps) that I'm missing, or am I basically out of luck with this? Do I need to adjust any counters because the replacement text differs in length from the text it replaces?
Regards

Use new FileInputStream() instead of getClass().getResourceAsStream("/path/to/document/templates/RMA FORM.doc").
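For reference, the placeholder substitution described in the question can be sketched on a plain String, independent of POI. This is only an illustration of the "${place.holder}" convention: the class name PlaceholderFiller and the method fill are hypothetical, and the sketch says nothing about why Word rejects the written .doc file.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderFiller {
    // Matches ${name}; the name may contain letters, digits, dots and underscores.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([A-Za-z0-9_.]+)\\}");

    /** Replaces every known ${key} in text; unknown placeholders are left untouched. */
    static String fill(String text, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());          // text before the placeholder
            String value = values.get(m.group(1));
            out.append(value != null ? value : m.group());
            last = m.end();
        }
        out.append(text, last, text.length());          // remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = Map.of(
                "rma.number", "08739",
                "customer.name", "Roger Swann");
        // prints "RMA 08739 for Roger Swann"
        System.out.println(fill("RMA ${rma.number} for ${customer.name}", values));
    }
}
```

With POI, the same map could be iterated and each pair passed to range.replaceText("${" + key + "}", value), as the question's code already does for two keys.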

Related

Convert DOCX to PDF - Java

I generate a docx document at runtime and I want to convert it to PDF without actually saving the file locally:
public byte[] convertToPDF(byte[] docxDocument) {
try {
InputStream doc = new ByteArrayInputStream(docxDocument);
XWPFDocument xwpfDocument = new XWPFDocument(doc);
PdfOptions options = PdfOptions.create();
OutputStream out = new ByteArrayOutputStream();
PdfConverter.getInstance().convert(xwpfDocument, out, options);
//return data; ???
} catch (IOException e) {
LOGGER.error("Could not convert docx to PDF", e);
}
}
PdfConverter's convert() method returns void. How can I get the converted PDF back as a byte array?

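I think you should use out, which holds the converted data. Declare it as ByteArrayOutputStream rather than OutputStream so that toByteArray() is available:
ByteArrayOutputStream out = new ByteArrayOutputStream();
PdfConverter.getInstance().convert(xwpfDocument, out, options);
return out.toByteArray();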
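The general pattern, capturing the output of a void method that writes to an OutputStream, can be sketched with the JDK alone. The names InMemoryOutput, StreamWriter and capture are hypothetical; StreamWriter merely stands in for a call such as the converter in the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class InMemoryOutput {
    /** Stand-in for any API that writes its result to an OutputStream and returns void. */
    interface StreamWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Runs the writer against an in-memory buffer and returns the bytes it produced. */
    static byte[] capture(StreamWriter writer) throws IOException {
        // Declared as ByteArrayOutputStream (not OutputStream) so toByteArray() is visible.
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            writer.writeTo(out);
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = capture(out -> out.write("converted".getBytes(StandardCharsets.UTF_8)));
        System.out.println(data.length + " bytes captured");
    }
}
```

Nothing touches the file system here; the buffer lives only in memory, which is what the question asks for.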

IText html to pdf wrapping line

Hello, I'm creating a JavaFX app with iText. I have an HTML editor for writing text, and I want to create a PDF from it. Everything works, but when I have a really long line that is wrapped in the HTML editor, it isn't wrapped in the PDF and runs off the page. How can I make the text wrap? Here is my code:
PdfWriter writer = null;
try {
writer = new PdfWriter("doc.pdf");
} catch (FileNotFoundException e) {
e.printStackTrace();
}
//Initialize PDF document
PdfDocument pdf = new PdfDocument(writer);
// Initialize document
Document document = new Document(pdf, PageSize.A4);
List<IElement> list = null;
try {
list = HtmlConverter.convertToElements(editor.getHtmlText());
} catch (IOException e) {
e.printStackTrace();
}
// add elements to document
for (IElement p : list) {
document.add((IBlockElement) p);
}
// close document
document.close();
I also want to set the line spacing for this text.
Thank you for your help.

I don't get any errors for the following code:
public class stack_overflow_0008 extends AbstractSupportTicket{
private static String LONG_PIECE_OF_TEXT =
"Once upon a midnight dreary, while I pondered, weak and weary," +
"Over many a quaint and curious volume of forgotten lore—" +
"While I nodded, nearly napping, suddenly there came a tapping," +
"As of some one gently rapping, rapping at my chamber door." +
"Tis some visitor,” I muttered, “tapping at my chamber door—" +
"Only this and nothing more.";
public static void main(String[] args)
{
PdfWriter writer = null;
try {
writer = new PdfWriter(getOutputFile());
} catch (FileNotFoundException e) {
e.printStackTrace();
}
//Initialize PDF document
PdfDocument pdf = new PdfDocument(writer);
// Initialize document
Document document = new Document(pdf, PageSize.A4);
List<IElement> list = null;
try {
list = HtmlConverter.convertToElements("<p>" + LONG_PIECE_OF_TEXT + "</p>");
} catch (IOException e) {
e.printStackTrace();
}
for (IElement p : list) {
document.add((IBlockElement) p);
}
document.close();
}
}
The document is a single (A4) page PDF with one string neatly wrapped.
I think perhaps the content of your string is to blame?
Could you post the HTML you get from this editor object?
Update:
Using the code from this answer on the HTML shared in a new comment to the question, I get the following result:
As you can see, the content is distributed over two lines. No content "falls off the page."

Apache POI Table of contents not updating

I am using the Apache POI XWPF components and Java to extract data from an .xml file into a Word document. So far so good, but I am struggling to create a table of contents.
I have to create a table of contents at the start of the method and then update it at the end to pick up all the new headers. Currently I use doc.createTOC(), where doc is an XWPFDocument, to create the table at the start, and then doc.enforceUpdateFields() to update everything at the end. But when I open the document after running the program, the table of contents is empty, although the navigation pane does include some of the headers I specified.
A comment recommended that I include some code. I start by creating the document from a template:
XWPFDocument doc = new XWPFDocument(new FileInputStream("D://Template.docx"));
I then create a table of contents:
doc.createTOC();
Then throughout the method I add headers to the document:
XWPFParagraph documentControlHeading = doc.createParagraph();
documentControlHeading.setPageBreak(true);
documentControlHeading.setAlignment(ParagraphAlignment.LEFT);
documentControlHeading.setStyle("Tier1Header");
After all the headers are added, I want to update the document so that all the new headers appear in the table of contents. I do this by using the following command:
doc.enforceUpdateFields();

Hmmm... I am looking at the createTOC() method code, and it appears that it looks for styles that look like Heading #. So Tier1Header would not be found. Try creating your text first, and use styles like Heading 1 for your headings. Then add the TOC using createTOC(). It should find all the headings when the TOC is created. I do not know if enforceUpdateFields() affects the TOC.

// Your docx template should contain the following (or similar) text,
// which will be searched for and replaced with a Word TOC:
// ${TOC}
public static void main(String[] args) throws IOException, OpenXML4JException {
XWPFDocument docTemplate = null;
try {
File file = new File(PATH_TO_FILE); //"C:\\Reports\\Template.docx";
FileInputStream fis = new FileInputStream(file);
docTemplate = new XWPFDocument(fis);
generateTOC(docTemplate);
saveDocument(docTemplate);
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
if (docTemplate != null) {
docTemplate.close();
}
}
}
private static void saveDocument(XWPFDocument docTemplate) throws FileNotFoundException, IOException {
FileOutputStream outputFile = null;
try {
outputFile = new FileOutputStream(OUTFILENAME);
docTemplate.write(outputFile);
} finally {
if (outputFile != null) {
outputFile.close();
}
}
}
public static void generateTOC(XWPFDocument document) throws InvalidFormatException, FileNotFoundException, IOException {
String findText = "${TOC}";
String replaceText = "";
for (XWPFParagraph p : document.getParagraphs()) {
for (XWPFRun r : p.getRuns()) {
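// getTextPosition() is the run's vertical offset from the baseline, not a text index,
// so read the run's first text element directly.
String text = r.getText(0);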
if (text != null && text.contains(findText)) {
text = text.replace(findText, replaceText);
r.setText(text, 0);
addField(p, "TOC \\o \"1-3\" \\h \\z \\u");
break;
}
}
}
}
private static void addField(XWPFParagraph paragraph, String fieldName) {
CTSimpleField ctSimpleField = paragraph.getCTP().addNewFldSimple();
// ctSimpleField.setInstr(fieldName + " \\* MERGEFORMAT ");
ctSimpleField.setInstr(fieldName);
ctSimpleField.addNewR().addNewT().setStringValue("<<fieldName>>");
}
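The field instruction passed to addField() above packs several switches into one string. As far as I understand Word's field syntax, \o "1-3" collects paragraphs with heading levels 1 through 3, \h makes the entries hyperlinks, \z hides tab leaders and page numbers in Web Layout view, and \u also picks up paragraphs by their applied outline level. A small sketch of building that string for other level ranges follows; the names TocField and instruction are hypothetical and not part of POI.

```java
class TocField {
    /**
     * Builds a TOC field instruction for the given range of heading levels.
     * The switches after the level range are the same ones used in the answer above.
     */
    static String instruction(int fromLevel, int toLevel) {
        if (fromLevel < 1 || toLevel > 9 || fromLevel > toLevel) {
            throw new IllegalArgumentException("levels must satisfy 1 <= from <= to <= 9");
        }
        // Backslashes are doubled so the Java string contains single ones.
        return "TOC \\o \"" + fromLevel + "-" + toLevel + "\" \\h \\z \\u";
    }

    public static void main(String[] args) {
        // Same value as the literal passed to addField() in the answer.
        System.out.println(instruction(1, 3));
    }
}
```

Word fills in the actual entries and page numbers only when the field is updated, for example when the document is opened and the fields are refreshed.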

This is the code of createTOC(), obtained by inspecting XWPFDocument.class:
public void createTOC() {
CTSdtBlock block = getDocument().getBody().addNewSdt();
TOC toc = new TOC(block);
for (XWPFParagraph par : this.paragraphs) {
String parStyle = par.getStyle();
if ((parStyle != null) && (parStyle.startsWith("Heading"))) try {
int level = Integer.valueOf(parStyle.substring("Heading".length())).intValue();
toc.addRow(level, par.getText(), 1, "112723803");
} catch (NumberFormatException e) {
e.printStackTrace();
}
}
}
As you can see, it adds to the TOC every paragraph whose style is named "HeadingX", where X is a number. Unfortunately, that's not sufficient: the method is buggy and incomplete in its implementation.
The page number passed to addRow() is always 1; it is never calculated.
So in the end you get a TOC with all your headings and the dot leaders at the proper indentation, but every page number is "1".
EDIT
...but, there's a solution here.
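To make the style check in createTOC() concrete, here is a standalone sketch of the same test: the style must start with "Heading" and the remainder must parse as an integer. The names HeadingStyles and headingLevel are hypothetical; only the prefix-and-parse logic mirrors the code quoted above.

```java
import java.util.OptionalInt;

class HeadingStyles {
    private static final String PREFIX = "Heading";

    /** Returns the level for style IDs such as "Heading1"; empty for anything else. */
    static OptionalInt headingLevel(String styleId) {
        if (styleId == null || !styleId.startsWith(PREFIX)) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(styleId.substring(PREFIX.length())));
        } catch (NumberFormatException e) {
            return OptionalInt.empty(); // e.g. "Heading", "HeadingX" or "Heading 1"
        }
    }

    public static void main(String[] args) {
        System.out.println(headingLevel("Heading1"));    // OptionalInt[1]
        System.out.println(headingLevel("Tier1Header")); // OptionalInt.empty
    }
}
```

Note that a value with a space, such as "Heading 1", does not parse either, so the check only matches IDs of the form "Heading1". This is also why a custom style such as Tier1Header from the question is skipped.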

iText continuous PDF editing java

I'm using the iText library to create a PDF and add data to it.
I want to add some text lines and an image to the PDF more than once, until I close the file.
numOfSamples = timeIHitTheButton();
.
.
.
*a loop that calls it the number of times chosen by numOfSamples*
DSM.saveData();
The DataStore class (DSM is a DataStore instance) creates the document doc.pdf correctly, and DSM.addText() and DSM.addPicture() correctly print three text lines and an image to the file, but only if I press the button just once.
I want to write the same strings and an image every time I press the button (if I press it once I get one sample, if twice I get two samples, and so on). If I press it just once and terminate, I get my PDF with the strings and the picture, but if I press it more than once, I get an unreadable, damaged PDF file, and I don't know why. How can I keep writing a picture and the strings until the number of samples is reached?
Here is some code in case it is useful ("newPic1.jpg", "newPic2.jpg", etc. are the stored pictures to add to the PDF together with the text):
public class DataStore{ ....
.
.
.
public DataStore(String str1, String str2, String str3, int numOfSamples)
throws Exception{
document = new Document();
String1 = str1;
String2 = str2;
String3 = str3;
Samples = numOfSamples;
document.open();
}
private void saveData(){
if(!created){
this.createFile();
created=true;
}
this.addText();
this.addPicture();
}
private void createFile(){
try {
OutputStream file = new FileOutputStream(
new File("Doc.pdf"));
PdfWriter.getInstance(document, file);
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (DocumentException e) {
e.printStackTrace();
}
}
private void addText(){
try {
if(Samples > 0)
document.open();
document.add(new Paragraph(Double.toString(String1)));
document.add(new Paragraph(Double.toString(String2)));
document.add(new Paragraph(Double.toString(String3)));
} catch (DocumentException e) {
e.printStackTrace();
}
}
private void addPicture(){
try {
Image img = Image.getInstance("NewPic" + Samples + ".jpg");
document.add(img);
} catch (BadElementException bee) {
bee.printStackTrace();
} catch (MalformedURLException mue) {
mue.printStackTrace();
} catch (IOException ioe) {
ioe.printStackTrace();
} catch (DocumentException dee) {
dee.printStackTrace();
}
if(Samples == 0)
document.close();
else Samples--;
}
}

You use iText commands in the wrong order:
Your DataStore constructor creates a new Document and calls its open method (which is too early as there is no writer yet).
Some time later, in the first saveData call, you call createFile which creates the PdfWriter.
In all saveData calls addText is called which for Samples > 0 opens the document again each time (which is ok at the first time but shall not be done multiple times).
Eventually, in the saveData call with Samples == 0 you close the document.
Thus, in essence you do this:
document = new Document();
document.open();
[...]
PdfWriter.getInstance(document, file);
[...]
[for `Samples` times]
document.open();
[add some paragraphs]
[add an image]
[end for]
document.close();
Compare this to how it should be done:
// step 1
Document document = new Document();
// step 2
PdfWriter.getInstance(document, new FileOutputStream(filename));
// step 3
document.open();
// step 4
[add content to the PDF]
// step 5
document.close();
(copied from the HelloWorld.java sample from iText in Action — 2nd Edition)
Only for Samples == 1 you have it about right (the superfluous document.open() in the constructor being ignored as there is no writer yet); for larger values of Samples, though, you open the document multiple times with a writer present which will likely append a PDF start over and over again to the output stream.
Quite likely you can fix the issue by removing all your current document.open() calls (including the if(Samples > 0) in addText()) and add one in createFile() right after PdfWriter.getInstance(document, file).

Java get plain Text from RTF

I have a database column that holds text in RTF format.
How can I get only the plain text from it, using Java?

RTFEditorKit rtfParser = new RTFEditorKit();
Document document = rtfParser.createDefaultDocument();
rtfParser.read(new ByteArrayInputStream(rtfBytes), document, 0);
String text = document.getText(0, document.getLength());
This should work.
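A self-contained version of the snippet above, using only classes that ship with the JDK (javax.swing.text.rtf), might look like the following. The class name RtfToText, the method toPlainText and the sample RTF string are assumptions made for illustration; the bytes read from the database column would take the place of the sample.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

class RtfToText {
    /** Parses RTF markup and returns only the text content of the resulting document. */
    static String toPlainText(String rtf) throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        // RTF markup is 7-bit ASCII, so a single-byte encoding is enough for the source.
        byte[] bytes = rtf.getBytes(StandardCharsets.ISO_8859_1);
        kit.read(new ByteArrayInputStream(bytes), doc, 0);
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        String sample = "{\\rtf1\\ansi Hello \\b World\\b0 !}";
        // trim() drops any trailing line break the parser may add.
        System.out.println(toPlainText(sample).trim());
    }
}
```

No Swing component is created here; the editor kit is used purely as a parser, so this should also work in server-side code.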

If you can, try "AdvancedRTFEditorKit"; it might be a good fit. See http://java-sl.com/advanced_rtf_editor_kit.html
I have used it to create a complete RTF editor that supports everything MS Word does.

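Apache POI reads Microsoft Word (.doc) files. Note that HWPFDocument expects the binary Word format and, as far as I know, rejects input that is really RTF, so this only helps if the data is actually a .doc file.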
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
public String getRtfText(String fileName) {
File rtfFile = null;
WordExtractor rtfExtractor = null ;
try {
rtfFile = new File(fileName);
//A FileInputStream obtains input bytes from a file.
FileInputStream inStream = new FileInputStream(rtfFile.getAbsolutePath());
//A HWPFDocument used to read document file from FileInputStream
HWPFDocument doc=new HWPFDocument(inStream);
rtfExtractor = new WordExtractor(doc);
}
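catch(Exception ex)
{
System.out.println(ex.getMessage());
// Nothing could be read; return here to avoid a NullPointerException on rtfExtractor below.
return null;
}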
//This Array stores each line from the document file.
String [] rtfArray = rtfExtractor.getParagraphText();
String rtfString = "";
for(int i=0; i < rtfArray.length; i++) rtfString += rtfArray[i];
System.out.println(rtfString);
return rtfString;
}

This works if the RTF text is in a JEditorPane
String s = getPlainText(aJEditorPane.getDocument());
String getPlainText(Document doc) {
try {
return doc.getText(0, doc.getLength());
}
catch (BadLocationException ex) {
System.err.println(ex);
return null;
}
}
