There are many examples for reading and editing or replacing bookmarks in an XWPF Word document.
But I want to create a document and add new bookmarks to it.
Creating the document is no problem:
private void createWordDoc() throws IOException {
XWPFDocument document = new XWPFDocument();
File tempDocFile = new File(pathName+"\\temp.docx");
FileOutputStream out = new FileOutputStream(tempDocFile);
XWPFParagraph paragraph = document.createParagraph();
XWPFRun run = paragraph.createRun();
run.setText("testing string ");
document.write(out);
out.close();
}
How can I put a bookmark on the text "testing string"?
As of this writing, this is not implemented in the high-level classes of Apache POI. Therefore the low-level CTP and CTBookmark classes are needed.
Example:
import java.io.FileOutputStream;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBookmark;
import java.math.BigInteger;
public class CreateWordBookmark {
public static void main(String[] args) throws Exception {
XWPFDocument document = new XWPFDocument();
XWPFParagraph paragraph = document.createParagraph();
//bookmark before the run
CTBookmark bookmark = paragraph.getCTP().addNewBookmarkStart();
bookmark.setName("before_testing_string");
bookmark.setId(BigInteger.valueOf(0));
paragraph.getCTP().addNewBookmarkEnd().setId(BigInteger.valueOf(0));
//bookmark the run
bookmark = paragraph.getCTP().addNewBookmarkStart();
bookmark.setName("testing_string");
bookmark.setId(BigInteger.valueOf(1));
XWPFRun run = paragraph.createRun();
run.setText("testing string ");
paragraph.getCTP().addNewBookmarkEnd().setId(BigInteger.valueOf(1));
//bookmark after the run
bookmark = paragraph.getCTP().addNewBookmarkStart();
bookmark.setName("after_testing_string");
bookmark.setId(BigInteger.valueOf(2));
paragraph.getCTP().addNewBookmarkEnd().setId(BigInteger.valueOf(2));
document.write(new FileOutputStream("CreateWordBookmark.docx"));
document.close();
}
}
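A side note on the bookmark names used above: they use underscores instead of spaces. As far as I know, Word only accepts bookmark names that start with a letter, contain only letters, digits and underscores, and are at most 40 characters long. The following is a minimal sketch of a helper that derives such a name from arbitrary text. The class BookmarkNames, its sanitize method and the "bm_" prefix are hypothetical and not part of Apache POI.

```java
/** Hypothetical helper, not part of Apache POI: derives a Word-friendly bookmark name. */
final class BookmarkNames {

    // Assumed limit for bookmark names in Word.
    static final int MAX_LENGTH = 40;

    static String sanitize(String text) {
        // Replace every run of characters that is not a letter, digit or underscore.
        String cleaned = text.trim().replaceAll("[^\\p{L}\\p{Nd}_]+", "_");
        // The name has to start with a letter, so prepend an arbitrary prefix if needed.
        if (cleaned.isEmpty() || !Character.isLetter(cleaned.charAt(0))) {
            cleaned = "bm_" + cleaned;
        }
        return cleaned.length() > MAX_LENGTH ? cleaned.substring(0, MAX_LENGTH) : cleaned;
    }
}
```

The result could then be passed to bookmark.setName(...). The bookmark ids still have to be unique within the document, as in the example above.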
I'm a rookie, really. I'm building my first project (if I can finish it).
I want to extract PDF text together with its formatting and position, and then write it to a .docx file. I checked the PDFBox API documentation, but I'm not sure how to get the position of the text: should I traverse the lines, or the characters? I studied these three questions carefully:
Text coordinates when stripping from PDFBox
Get font of each line using PDFBox
How to extract font styles of text contents using pdfbox?
And here is my demo:
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;
import java.io.IOException;
import java.util.List;
public class PDFTextExtractor extends PDFTextStripper {
/**
* Instantiate a new PDFTextStripper object.
*
* @throws IOException If there is an error loading the properties.
*/
public PDFTextExtractor() throws IOException {
}
String prevFont = "";
@Override
protected void writeString(String text, List<TextPosition> textPositions) throws IOException {
StringBuilder sb = new StringBuilder();
for (TextPosition position : textPositions){
String font = position.getFont().getName();
float x = position.getX();
float y = position.getY();
float fontSize = position.getFontSizeInPt();
if (font != null && !font.equals(prevFont)){
sb.append("[").append(font.split("-")[0]).append("+").append(font.split("-")[1]).append("+").append(fontSize).append("]");
prevFont = font;
}
sb.append(position.getUnicode());
}
writeString(sb.toString());
}
@Override
public String getText(PDDocument doc) throws IOException {
return super.getText(doc);
}
}
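A side note on the font handling in writeString above: font.split("-")[1] throws an ArrayIndexOutOfBoundsException when the font name contains no hyphen (for example "Helvetica"). Names of embedded subset fonts also usually carry a six-letter tag such as "ABCDEF+" in front. Below is a minimal sketch of a more defensive split. The class FontNameParts is hypothetical and not part of PDFBox, and it assumes that the part after the first hyphen is the style.

```java
/** Hypothetical helper, not part of PDFBox: splits a PDF font name into family and style. */
final class FontNameParts {
    final String family;
    final String style;

    private FontNameParts(String family, String style) {
        this.family = family;
        this.style = style;
    }

    static FontNameParts parse(String fontName) {
        if (fontName == null || fontName.isEmpty()) {
            return new FontNameParts("", "");
        }
        String name = fontName;
        // Drop a subset tag: six uppercase letters followed by '+'.
        if (name.length() > 7 && name.charAt(6) == '+'
                && name.substring(0, 6).chars().allMatch(c -> c >= 'A' && c <= 'Z')) {
            name = name.substring(7);
        }
        // Assumption: everything after the first hyphen describes the style.
        int dash = name.indexOf('-');
        if (dash < 0) {
            return new FontNameParts(name, "");
        }
        return new FontNameParts(name.substring(0, dash), name.substring(dash + 1));
    }
}
```

In writeString, the pair parse(font).family and parse(font).style could replace the two split calls.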
And I am calling it like this:
FileOutputStream outputStream = new FileOutputStream(EXPORT_PATH + file.getName().split("\\.")[0] + ".docx");
try (PDDocument originalPDF = PDDocument.load(file);
XWPFDocument doc = new XWPFDocument()) {
//get All pages
PDPageTree pageList = originalPDF.getDocumentCatalog().getPages();
for (PDPage page : pageList){
//Parse Content
PDFTextStripper stripper = new PDFTextExtractor();
stripper.setSortByPosition(true);
String ss = stripper.getText(originalPDF);
System.out.println(ss);
//Write Content
XWPFParagraph paragraph = doc.createParagraph();
XWPFRun run = paragraph.createRun();
run.setText(ss);
run.addBreak(BreakType.PAGE);
}
doc.write(outputStream);
originalPDF.close();
outputStream.close();
}
The appendExternalHyperlink() method (source) does not work in the footer of an XWPFDocument. In the footer, the result is not recognised as a hyperlink.
I am new to Apache POI and have no experience with the low-level classes. Can someone please explain what the problem is here?
public class FooterProblem {
public static void main(final String[] args) throws Exception {
final XWPFDocument docx = new XWPFDocument();
final XWPFParagraph para = docx.createParagraph();
final XWPFRun paraRun = para.createRun();
paraRun.setText("Email: ");
appendExternalHyperlink("mailto:me@example.com", "me@example.com", para);
final XWPFParagraph footer = docx.createFooter(HeaderFooterType.DEFAULT).createParagraph();
final XWPFRun footerRun = footer.createRun();
footerRun.setText("Email: ");
appendExternalHyperlink("mailto:me@example.com", "me@example.com", footer);
final FileOutputStream out = new FileOutputStream("FooterProblem.docx");
docx.write(out);
out.close();
docx.close();
}
public static void appendExternalHyperlink(final String url, final String text, final XWPFParagraph paragraph) {
// Add the link as External relationship
final String id = paragraph.getDocument().getPackagePart()
.addExternalRelationship(url, XWPFRelation.HYPERLINK.getRelation()).getId();
// Append the link and bind it to the relationship
final CTHyperlink cLink = paragraph.getCTP().addNewHyperlink();
cLink.setId(id);
// Create the linked text
final CTText ctText = CTText.Factory.newInstance();
ctText.setStringValue(text);
final CTR ctr = CTR.Factory.newInstance();
ctr.setTArray(new CTText[] { ctText });
// Insert the linked text into the link
cLink.setRArray(new CTR[] { ctr });
}
}
The footer[n].xml is its own package part and needs its own relationships. But your code always creates the external hyperlink relationship for the document.xml package part, because it always uses paragraph.getDocument(). This is wrong.
The following code provides a method for creating an XWPFHyperlinkRun in a given XWPFParagraph. It uses paragraph.getPart() to get the correct package part to put the relationship on, so the method works for paragraphs in the document body as well as in headers and footers.
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.xwpf.usermodel.*;
import org.apache.poi.wp.usermodel.HeaderFooterType;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTHyperlink;
public class CreateWordHyperlinks {
static XWPFHyperlinkRun createHyperlinkRun(XWPFParagraph paragraph, String uri) throws Exception {
String rId = paragraph.getPart().getPackagePart().addExternalRelationship(
uri,
XWPFRelation.HYPERLINK.getRelation()
).getId();
CTHyperlink cthyperLink=paragraph.getCTP().addNewHyperlink();
cthyperLink.setId(rId);
cthyperLink.addNewR();
return new XWPFHyperlinkRun(
cthyperLink,
cthyperLink.getRArray(0),
paragraph
);
}
public static void main(String[] args) throws Exception {
XWPFDocument document = new XWPFDocument();
XWPFParagraph paragraph = document.createParagraph();
XWPFRun run = paragraph.createRun();
run.setText("This is a text paragraph having a link to Google ");
XWPFHyperlinkRun hyperlinkrun = createHyperlinkRun(paragraph, "https://www.google.de");
hyperlinkrun.setText("https://www.google.de");
hyperlinkrun.setColor("0000FF");
hyperlinkrun.setUnderline(UnderlinePatterns.SINGLE);
run = paragraph.createRun();
run.setText(" in it.");
XWPFFooter footer = document.createFooter(HeaderFooterType.DEFAULT);
paragraph = footer.createParagraph();
run = paragraph.createRun();
run.setText("Email: ");
hyperlinkrun = createHyperlinkRun(paragraph, "mailto:me@example.com");
hyperlinkrun.setText("me@example.com");
hyperlinkrun.setColor("0000FF");
hyperlinkrun.setUnderline(UnderlinePatterns.SINGLE);
FileOutputStream out = new FileOutputStream("CreateWordHyperlinks.docx");
document.write(out);
out.close();
document.close();
}
}
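To check which package part actually received the hyperlink relationship, one can look into the generated .docx, which is a ZIP archive. Relationships of the footer are expected in word/_rels/footer1.xml.rels, those of the body in word/_rels/document.xml.rels. The following is a rough sketch using only the JDK. The class HyperlinkRelsLister and its method names are hypothetical, and the check is a plain substring test on the relationship type, not a real XML parse.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical diagnostic helper: lists the .rels parts that contain a hyperlink relationship. */
final class HyperlinkRelsLister {

    // Suffix of the relationship type used for hyperlinks in Office Open XML.
    private static final String HYPERLINK_TYPE_SUFFIX = "/relationships/hyperlink";

    /** Simple substring check on the content of one .rels part. */
    static boolean hasHyperlinkRelationship(String relsXml) {
        return relsXml != null && relsXml.contains(HYPERLINK_TYPE_SUFFIX);
    }

    /** Returns the names of all .rels entries in the archive that declare a hyperlink. */
    static List<String> partsWithHyperlinkRels(InputStream docx) throws IOException {
        List<String> result = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().endsWith(".rels")) {
                    continue;
                }
                // readAllBytes() reads up to the end of the current entry.
                String xml = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                if (hasHyperlinkRelationship(xml)) {
                    result.add(entry.getName());
                }
            }
        }
        return result;
    }
}
```

Called with a FileInputStream on CreateWordHyperlinks.docx, the list should contain both the document and the footer relationship parts if the relationships were placed as described above.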
I am replacing placeholders in a docx file, and after that I need to convert the file to PDF. All of my attempts end in
fr.opensagres.poi.xwpf.converter.core.XWPFConverterException: java.lang.NullPointerException
at fr.opensagres.poi.xwpf.converter.pdf.PdfConverter.doConvert(PdfConverter.java:71)
at fr.opensagres.poi.xwpf.converter.pdf.PdfConverter.doConvert(PdfConverter.java:39)
at fr.opensagres.poi.xwpf.converter.core.AbstractXWPFConverter.convert(AbstractXWPFConverter.java:46).
I am using these dependencies:
implementation("org.apache.poi:poi-ooxml:3.17")
implementation("fr.opensagres.xdocreport:fr.opensagres.xdocreport.converter.docx.xwpf:2.0.1")
If I convert the unchanged source docx file, everything works as it should, but when I replace the placeholders and save the document, the conversion crashes.
Part of my code:
FileInputStream fis = new FileInputStream(COPIED);
XWPFDocument doc = new XWPFDocument(fis);
doc.createStyles();
for (XWPFParagraph p : doc.getParagraphs()) {
List<XWPFRun> runs = p.getRuns();
if (runs != null) {
for (XWPFRun r : runs) {
String text = r.getText(0);
StringSubstitutor substitutor = new StringSubstitutor(fieldsForReport);
String replacedText = substitutor.replace(text);
r.setText(replacedText, 0);
}
}
}
for (XWPFTable tbl : doc.getTables()) {
for (XWPFTableRow row : tbl.getRows()) {
for (XWPFTableCell cell : row.getTableCells()) {
for (XWPFParagraph p : cell.getParagraphs()) {
for (XWPFRun r : p.getRuns()) {
String text = r.getText(0);
StringSubstitutor substitutor = new StringSubstitutor(fieldsForReport);
String replacedText = substitutor.replace(text);
r.setText(replacedText, 0);
}
}
}
}
}
FileOutputStream fos = new FileOutputStream(COPIED);
doc.write(fos);
doc.close();
FileInputStream fis = new FileInputStream(COPIED);
XWPFDocument document = new XWPFDocument(fis);
PdfOptions options = PdfOptions.create();
PdfConverter converter = (PdfConverter) PdfConverter.getInstance();
converter.convert(document, new FileOutputStream(DEST), options);
document.close();
The following works for me using the newest apache poi version 4.0.1 and the newest version 2.0.2 of fr.opensagres.poi.xwpf.converter.core and its companion libraries.
import java.io.InputStream;
import java.io.OutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.File;
//needed jars: fr.opensagres.poi.xwpf.converter.core-2.0.2.jar,
// fr.opensagres.poi.xwpf.converter.pdf-2.0.2.jar,
// fr.opensagres.xdocreport.itext.extension-2.0.2.jar,
// itext-4.2.1.jar
import fr.opensagres.poi.xwpf.converter.pdf.PdfOptions;
import fr.opensagres.poi.xwpf.converter.pdf.PdfConverter;
//needed jars: apache poi and its dependencies
//inclusive ooxml-schemas-1.4.jar
import org.apache.poi.xwpf.usermodel.*;
public class DOCXToPDFConverterSampleMin {
public static void main(String[] args) throws Exception {
String docPath = "./WordDocument.docx";
String pdfPath = "./WordDocument.pdf";
InputStream in = new FileInputStream(new File(docPath));
XWPFDocument document = new XWPFDocument(in);
for (XWPFParagraph paragraph : document.getParagraphs()) {
for (XWPFRun run : paragraph.getRuns()) {
String text = run.getText(0);
if (text != null && text.contains("$name$")) {
text = text.replace("$name$", "Axel Richter");
run.setText(text, 0);
} else if (text != null && text.contains("$date$")) {
text = text.replace("$date$", "2019-02-28");
run.setText(text, 0);
}
}
}
for (XWPFTable table : document.getTables()) {
for (XWPFTableRow row : table.getRows()) {
for (XWPFTableCell cell : row.getTableCells()) {
for (XWPFParagraph paragraph : cell.getParagraphs()) {
for (XWPFRun run : paragraph.getRuns()) {
String text = run.getText(0);
if (text != null && text.contains("$name$")) {
text = text.replace("$name$", "Axel Richter");
run.setText(text,0);
} else if (text != null && text.contains("$date$")) {
text = text.replace("$date$", "2019-02-28");
run.setText(text, 0);
}
}
}
}
}
}
XWPFParagraph paragraph = document.createParagraph();
XWPFRun run = paragraph.createRun();
run.setText("This is new Text in this document.");
PdfOptions options = PdfOptions.create();
OutputStream out = new FileOutputStream(new File(pdfPath));
PdfConverter.getInstance().convert(document, out, options);
document.close();
out.close();
}
}
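The replacement logic is repeated for body paragraphs and table cells above and could be pulled into one helper. Below is a minimal sketch using only the JDK, assuming placeholders of the form $name$ as in the example. The class PlaceholderReplacer is hypothetical. It returns null for null input, so the caller can skip setText for runs without text, which is what the text != null checks above do.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: replaces $key$ placeholders using a map of values. */
final class PlaceholderReplacer {

    // A placeholder is a word enclosed in dollar signs, for example $name$.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$(\\w+)\\$");

    static String replace(String text, Map<String, String> values) {
        if (text == null) {
            return null; // run.getText(0) may return null; let the caller skip such runs
        }
        Matcher matcher = PLACEHOLDER.matcher(text);
        StringBuilder result = new StringBuilder();
        while (matcher.find()) {
            String value = values.get(matcher.group(1));
            // Unknown placeholders are left unchanged.
            String replacement = (value != null) ? value : matcher.group();
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }
}
```

Inside the loops this would be used as: String replaced = PlaceholderReplacer.replace(run.getText(0), values); if (replaced != null) run.setText(replaced, 0);. Note that this only works when a placeholder sits completely inside one run; Word sometimes splits text across several runs.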
I've been trying to add a .png image to a .docx file header with Apache POI. I didn't find a method that helps me. Does someone know how to do it?
With this code I could add only text.
XWPFDocument docc = new XWPFDocument();
CTP ctpHeader = CTP.Factory.newInstance();
CTR ctrHeader = ctpHeader.addNewR();
CTText ctHeader = ctrHeader.addNewT();
String headerText = "mi encabezado";
ctHeader.setStringValue(headerText);
XWPFParagraph headerParagraph = new XWPFParagraph(ctpHeader, docc);
XWPFParagraph[] parsHeader = new XWPFParagraph[1];
parsHeader[0] = headerParagraph;
header.createHeader(XWPFHeaderFooterPolicy.DEFAULT, parsHeader);
Example for creating a Word document with header and footer and an image in the header:
import java.io.FileOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.util.Units;
import org.apache.poi.xwpf.usermodel.*;
import org.apache.poi.xwpf.model.XWPFHeaderFooterPolicy;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSectPr;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTabStop;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.STTabJc;
import java.math.BigInteger;
public class CreateWordHeaderFooter {
public static void main(String[] args) throws Exception {
XWPFDocument doc= new XWPFDocument();
// the body content
XWPFParagraph paragraph = doc.createParagraph();
XWPFRun run=paragraph.createRun();
run.setText("The Body:");
paragraph = doc.createParagraph();
run=paragraph.createRun();
run.setText("Lorem ipsum....");
// create header start
CTSectPr sectPr = doc.getDocument().getBody().addNewSectPr();
XWPFHeaderFooterPolicy headerFooterPolicy = new XWPFHeaderFooterPolicy(doc, sectPr);
XWPFHeader header = headerFooterPolicy.createHeader(XWPFHeaderFooterPolicy.DEFAULT);
paragraph = header.getParagraphArray(0);
paragraph.setAlignment(ParagraphAlignment.LEFT);
CTTabStop tabStop = paragraph.getCTP().getPPr().addNewTabs().addNewTab();
tabStop.setVal(STTabJc.RIGHT);
int twipsPerInch = 1440;
tabStop.setPos(BigInteger.valueOf(6 * twipsPerInch));
run = paragraph.createRun();
run.setText("The Header:");
run.addTab();
run = paragraph.createRun();
String imgFile="Koala.png";
run.addPicture(new FileInputStream(imgFile), XWPFDocument.PICTURE_TYPE_PNG, imgFile, Units.toEMU(50), Units.toEMU(50));
// create footer start
XWPFFooter footer = headerFooterPolicy.createFooter(XWPFHeaderFooterPolicy.DEFAULT);
paragraph = footer.getParagraphArray(0);
paragraph.setAlignment(ParagraphAlignment.CENTER);
run = paragraph.createRun();
run.setText("The Footer:");
doc.write(new FileOutputStream("test.docx"));
}
}
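The example mixes two measurement units: the tab stop position is given in twips (1440 per inch) and the picture size in EMU, converted from points. The following sketch spells out the arithmetic with plain constants. The class WordUnits is hypothetical; the picture conversion is meant to mirror what Units.toEMU does for a value in points, under the assumption of 12700 EMU per point.

```java
/** Hypothetical helper showing the unit arithmetic used in the example. */
final class WordUnits {
    static final int TWIPS_PER_INCH = 1440;  // 20 twips per point, 72 points per inch
    static final int EMU_PER_POINT = 12700;  // 914400 EMU per inch / 72 points per inch

    /** Tab stop positions and page measures are expressed in twips. */
    static int inchesToTwips(double inches) {
        return (int) Math.round(inches * TWIPS_PER_INCH);
    }

    /** Picture width and height are expressed in EMU. */
    static int pointsToEmu(double points) {
        return (int) Math.round(points * EMU_PER_POINT);
    }
}
```

So the right tab stop above sits at 6 * 1440 = 8640 twips, and a 50 pt picture edge corresponds to 635000 EMU.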
Edit Mar 29 2016:
This worked up to apache poi 3.13. With 3.14 it no longer works. Reason: POI no longer saves the blip reference for images in header paragraphs.
/word/header1.xml:
Code compiled and run with 3.13:
...
<pic:blipFill><a:blip r:embed="rId1"/>
...
Same code compiled and run with 3.14:
...
<pic:blipFill><a:blip r:embed=""/>
...
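To spot this symptom without opening the file in Word, the extracted /word/header1.xml can be scanned for a blip with an empty r:embed attribute. This is only a quick regex check, not a real XML parse, and the class BlipEmbedCheck is hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical diagnostic helper: detects <a:blip> elements with an empty r:embed. */
final class BlipEmbedCheck {

    // Matches an a:blip start tag that carries r:embed="" among its attributes.
    private static final Pattern EMPTY_EMBED =
            Pattern.compile("<a:blip\\b[^>]*\\br:embed=\"\"");

    static boolean hasEmptyEmbed(String partXml) {
        return partXml != null && EMPTY_EMBED.matcher(partXml).find();
    }
}
```

For the two snippets above it returns false for the 3.13 output and true for the 3.14 output.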
Edit Mar 31 2016:
Found the problem. Someone was of the opinion that public final PackageRelationship getPackageRelationship() needs to be deprecated. So in XWPFRun.java the code in public XWPFPicture addPicture(...) was changed
from version 3.13:
...
CTBlipFillProperties blipFill = pic.addNewBlipFill();
CTBlip blip = blipFill.addNewBlip();
blip.setEmbed(picData.getPackageRelationship().getId());
...
to version 3.14:
...
CTBlipFillProperties blipFill = pic.addNewBlipFill();
CTBlip blip = blipFill.addNewBlip();
blip.setEmbed(parent.getDocument().getRelationId(picData));
...
But parent.getDocument() is always the XWPFDocument, while the picData may be related to the XWPFHeaderFooter.
At the beginning of public XWPFPicture addPicture(...) the programmers already knew this.
...
if (parent.getPart() instanceof XWPFHeaderFooter) {
XWPFHeaderFooter headerFooter = (XWPFHeaderFooter)parent.getPart();
relationId = headerFooter.addPictureData(pictureData, pictureType);
picData = (XWPFPictureData) headerFooter.getRelationById(relationId);
} else {
XWPFDocument doc = parent.getDocument();
relationId = doc.addPictureData(pictureData, pictureType);
picData = (XWPFPictureData) doc.getRelationById(relationId);
}
...
So if the deprecation really is to be enforced, this if..else must also be used when setting the blip ID. But why the deprecation at all?
A workaround for apache poi version 3.14:
import java.io.FileOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.util.Units;
import org.apache.poi.xwpf.usermodel.*;
import org.apache.poi.xwpf.model.XWPFHeaderFooterPolicy;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSectPr;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTabStop;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.STTabJc;
import java.math.BigInteger;
public class CreateWordHeaderFooter {
public static void main(String[] args) throws Exception {
XWPFDocument doc= new XWPFDocument();
// the body content
XWPFParagraph paragraph = doc.createParagraph();
XWPFRun run=paragraph.createRun();
run.setText("The Body:");
paragraph = doc.createParagraph();
run=paragraph.createRun();
run.setText("Lorem ipsum....");
// create header start
CTSectPr sectPr = doc.getDocument().getBody().addNewSectPr();
XWPFHeaderFooterPolicy headerFooterPolicy = new XWPFHeaderFooterPolicy(doc, sectPr);
XWPFHeader header = headerFooterPolicy.createHeader(XWPFHeaderFooterPolicy.DEFAULT);
paragraph = header.getParagraphArray(0);
paragraph.setAlignment(ParagraphAlignment.LEFT);
CTTabStop tabStop = paragraph.getCTP().getPPr().addNewTabs().addNewTab();
tabStop.setVal(STTabJc.RIGHT);
int twipsPerInch = 1440;
tabStop.setPos(BigInteger.valueOf(6 * twipsPerInch));
run = paragraph.createRun();
run.setText("The Header:");
run.addTab();
run = paragraph.createRun();
String imgFile="Koala.png";
XWPFPicture picture = run.addPicture(new FileInputStream(imgFile), XWPFDocument.PICTURE_TYPE_PNG, imgFile, Units.toEMU(50), Units.toEMU(50));
System.out.println(picture); //XWPFPicture is added
System.out.println(picture.getPictureData()); //but without access to XWPFPictureData (no blipID)
String blipID = "";
for(XWPFPictureData picturedata : header.getAllPackagePictures()) {
blipID = header.getRelationId(picturedata);
System.out.println(blipID); //the XWPFPictureData are already there
}
picture.getCTPicture().getBlipFill().getBlip().setEmbed(blipID); //now they have a blipID also
System.out.println(picture.getPictureData());
// create footer start
XWPFFooter footer = headerFooterPolicy.createFooter(XWPFHeaderFooterPolicy.DEFAULT);
paragraph = footer.getParagraphArray(0);
paragraph.setAlignment(ParagraphAlignment.CENTER);
run = paragraph.createRun();
run.setText("The Footer:");
doc.write(new FileOutputStream("test.docx"));
}
}
Edit Mar 28 2017:
In apache poi version 3.16 Beta 2 this seems to be fixed since the following code works using apache poi version 3.16 Beta 2:
import java.io.FileOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.util.Units;
import org.apache.poi.xwpf.usermodel.*;
import org.apache.poi.xwpf.model.XWPFHeaderFooterPolicy;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSectPr;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTabStop;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.STTabJc;
import java.math.BigInteger;
public class CreateWordHeaderFooter2 {
public static void main(String[] args) throws Exception {
XWPFDocument doc= new XWPFDocument();
// the body content
XWPFParagraph paragraph = doc.createParagraph();
XWPFRun run=paragraph.createRun();
run.setText("The Body:");
paragraph = doc.createParagraph();
run=paragraph.createRun();
run.setText("Lorem ipsum....");
// create header start
CTSectPr sectPr = doc.getDocument().getBody().addNewSectPr();
XWPFHeaderFooterPolicy headerFooterPolicy = new XWPFHeaderFooterPolicy(doc, sectPr);
XWPFHeader header = headerFooterPolicy.createHeader(XWPFHeaderFooterPolicy.DEFAULT);
paragraph = header.createParagraph();
paragraph.setAlignment(ParagraphAlignment.LEFT);
CTTabStop tabStop = paragraph.getCTP().getPPr().addNewTabs().addNewTab();
tabStop.setVal(STTabJc.RIGHT);
int twipsPerInch = 1440;
tabStop.setPos(BigInteger.valueOf(6 * twipsPerInch));
run = paragraph.createRun();
run.setText("The Header:");
run.addTab();
run = paragraph.createRun();
String imgFile="Koala.png";
run.addPicture(new FileInputStream(imgFile), XWPFDocument.PICTURE_TYPE_PNG, imgFile, Units.toEMU(50), Units.toEMU(50));
// create footer start
XWPFFooter footer = headerFooterPolicy.createFooter(XWPFHeaderFooterPolicy.DEFAULT);
paragraph = footer.createParagraph();
paragraph.setAlignment(ParagraphAlignment.CENTER);
run = paragraph.createRun();
run.setText("The Footer:");
doc.write(new FileOutputStream("test.docx"));
}
}
I am trying to merge two documents, let's say
Document 1: Merger1.doc
Document 2: Merger2.doc
I would like to store it into a new file doc2.docx.
I have used this piece of code to do this, but it throws an error.
Code:
import java.io.*;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Range;
public class MergerFiles {
public static void main (String[] args) throws Exception {
// POI apparently can't create a document from scratch,
// so we need an existing empty dummy document
HWPFDocument doc = new HWPFDocument(new FileInputStream("C:\\Users\\pallavi123\\Desktop\\Merger1.docx"));
Range range = doc.getRange();
//I can get the entire Document and insert it in the tmp.doc
//However any formatting in my word document is lost.
HWPFDocument doc2 = new HWPFDocument(new FileInputStream("C:\\Users\\pallavi123\\Desktop\\Merger2.docx"));
Range range2 = doc2.getRange();
range.insertAfter(range2.text());
//I can get the information (text only) for each character run/paragraph or section.
//Again any formatting in my word document is lost.
HWPFDocument doc3 = new HWPFDocument(new FileInputStream("D:\\doc2.docx"));
Range range3 = doc3.getRange();
for(int i=0;i<range3.numCharacterRuns();i++){
CharacterRun run3 = range3.getCharacterRun(i);
range.insertAfter(run3.text());
}
OutputStream out = new FileOutputStream("D:\\result.doc");
doc.write(out);
out.flush();
out.close();
}
}
Error:
Exception in thread "main" org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:108)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151)
at org.apache.poi.hwpf.HWPFDocument.verifyAndBuildPOIFS(HWPFDocument.java:120)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:133)
at MergerFiles.main(MergerFiles.java:11)
Am I missing a jar file, or is the way I am using the code wrong? I need your valuable suggestions.
Thanks in advance.
I've developed the following class:
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBody;
public class WordMerge {
private final OutputStream result;
private final List<InputStream> inputs;
private XWPFDocument first;
public WordMerge(OutputStream result) {
this.result = result;
inputs = new ArrayList<>();
}
public void add(InputStream stream) throws Exception{
inputs.add(stream);
OPCPackage srcPackage = OPCPackage.open(stream);
XWPFDocument src1Document = new XWPFDocument(srcPackage);
if(inputs.size() == 1){
first = src1Document;
} else {
CTBody srcBody = src1Document.getDocument().getBody();
first.getDocument().addNewBody().set(srcBody);
}
}
public void doMerge() throws Exception{
first.write(result);
}
public void close() throws Exception{
result.flush();
result.close();
for (InputStream input : inputs) {
input.close();
}
}
}
And its use:
public static void main(String[] args) throws Exception {
FileOutputStream faos = new FileOutputStream("/home/victor/result.docx");
WordMerge wm = new WordMerge(faos);
wm.add( new FileInputStream("/home/victor/001.docx") );
wm.add( new FileInputStream("/home/victor/002.docx") );
wm.doMerge();
wm.close();
}
I have a suggestion!
First the main method. The parameters are: test1 = name of the first docx file, test2 = name of the second docx file, dest = name of the destination file. document is a global variable.
public void mergeDocx(String test1, String test2, String dest){
try {
XWPFDocument doc1 = new XWPFDocument(new FileInputStream(new File(test1)));
XWPFDocument doc2 = new XWPFDocument(new FileInputStream(new File(test2)));
document = new XWPFDocument();
passaElementi(doc1);
passaElementi(doc2);
passaStili(doc1,doc2);
OutputStream out = new FileOutputStream(new File(dest));
document.write(out);
out.close();
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
The private method 'passaElementi' copies the body elements from doc1 into the document object. I don't know what an XWPFSDT object is. (Pay attention: I don't copy the whole document, only the body. Headers, sections and footers can be handled similarly.) (The integer variables i and j are global and are 0 at the beginning, obviously.)
private void passaElementi(XWPFDocument doc1){
for(IBodyElement e : doc1.getBodyElements()){
if(e instanceof XWPFParagraph){
XWPFParagraph p = (XWPFParagraph) e;
if(p.getCTP().getPPr()!=null && p.getCTP().getPPr().getSectPr()!=null){
continue;
}else{
document.createParagraph();
document.setParagraph(p, i);
i++;
}
}else if(e instanceof XWPFTable){
XWPFTable t = (XWPFTable)e;
document.createTable();
document.setTable(j, t);
j++;
}else if(e instanceof XWPFSDT){
// boh!
}
}
}
The private method 'passaStili' copies the styles from doc1 and doc2 into the document object:
private void passaStili(XWPFDocument doc1, XWPFDocument doc2){
try {
CTStyles c1 = doc1.getStyle();
CTStyles c2 = doc2.getStyle();
int size1 = c1.getStyleList().size();
int size2 = c2.getStyleList().size();
for(int i = 0; i<size2; i++ ){
c1.addNewStyle();
c1.setStyleArray(size1+i, c2.getStyleList().get(i));
}
document.createStyles().setStyles(c1);
} catch (XmlException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
To keep it short, I don't handle the exceptions properly!
Leave a like if you liked it!
Best regards!
B.M.
You should use XWPFDocument instead of HWPFDocument.
The documentation states:
The partner to HWPF for the new Word 2007 .docx format is XWPF. Whilst HWPF and XWPF provide similar features, there is not a common interface across the two of them at this time.
Change your code to:
XWPFDocument doc = new XWPFDocument(new FileInputStream("..."));
XWPFDocument doc2 = new XWPFDocument(new FileInputStream("..."));
XWPFDocument doc3 = new XWPFDocument(new FileInputStream("..."));
When you use HWPFDocument, you have to use a .doc file (not .docx).
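Since the file extension can be misleading, the first bytes of the file tell more reliably which class fits: binary .doc files are OLE2 containers starting with D0 CF 11 E0 A1 B1 1A E1, while .docx files are ZIP archives starting with "PK" 03 04. Here is a small sketch of such a check using only the JDK. The class WordFormatSniffer is hypothetical, and a ZIP signature only shows that the file is some ZIP-based container, not necessarily a Word document.

```java
/** Hypothetical helper: guesses the container format from the first bytes of a file. */
final class WordFormatSniffer {

    enum Format { OLE2, ZIP, UNKNOWN }

    // Signature of OLE2 compound documents (.doc, .xls, .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // Local file header signature of ZIP archives (.docx, .xlsx, .pptx and others).
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 3, 4 };

    static Format sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Format.OLE2; // candidate for HWPFDocument
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Format.ZIP;  // candidate for XWPFDocument
        }
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The header can be obtained by reading the first eight bytes of the file, for example with InputStream.readNBytes(8), before deciding between HWPFDocument and XWPFDocument.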