Storing MS Word XML data into XWPFDocument (Apache POI)

Can anyone help? I've been trying to load MS Word XML data into an Apache POI XWPFDocument,
but it gives me an error.
Here is the code:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
public class WordMerger {
public static void main(String[] args) {
try {
// Open the first document
XWPFDocument doc1 = new XWPFDocument(new FileInputStream("document1.xml"));
// Open the second document
XWPFDocument doc2 = new XWPFDocument(new FileInputStream("document2.xml"));
// Iterate through the paragraphs of the second document
for (XWPFParagraph p : doc2.getParagraphs()) {
// Create a new paragraph in the first document
XWPFParagraph newParagraph = doc1.createParagraph();
// Iterate through the runs of the current paragraph in the second document
for (XWPFRun r : p.getRuns()) {
// Create a new run in the new paragraph of the first document
XWPFRun newRun = newParagraph.createRun();
// Copy the text and formatting of the current run to the new run
newRun.setText(r.getText(0));
newRun.setBold(r.isBold());
newRun.setItalic(r.isItalic());
newRun.setUnderline(r.getUnderline());
newRun.setColor(r.getColor());
newRun.setFontFamily(r.getFontFamily());
newRun.setFontSize(r.getFontSize());
}
}
// Save the merged document
doc1.write(new FileOutputStream("merged_document.xml"));
// Close the documents
doc1.close();
doc2.close();
System.out.println("Documents merged successfully!");
} catch (IOException e) {
e.printStackTrace();
}
}
}
Is there any way I can read the XML file and store its contents in an XWPFDocument?
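One possible cause, stated as an assumption because the error text is not shown: XWPFDocument reads an OOXML package, which is a ZIP archive with word/document.xml inside it, rather than a bare XML file. The sketch below uses only the JDK to check whether a file starts with the ZIP signature before it is handed to POI. The class name PackageSniffer and the method looksLikeZipPackage are hypothetical names chosen for illustration, not part of POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class PackageSniffer {

    // Returns true if the file starts with the ZIP local-file-header
    // signature "PK\003\004". An empty archive (signature "PK\005\006")
    // is deliberately not accepted, since it could not hold a document.
    static boolean looksLikeZipPackage(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[4];
            int read = 0;
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    return false; // file is shorter than four bytes
                }
                read += n;
            }
            return head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4;
        }
    }

    public static void main(String[] args) throws IOException {
        // "document1.xml" is the file name used in the question.
        Path file = Paths.get(args.length > 0 ? args[0] : "document1.xml");
        if (looksLikeZipPackage(file)) {
            System.out.println(file + " looks like a ZIP package");
        } else {
            System.out.println(file + " is not a ZIP package; XWPFDocument will probably reject it");
        }
    }
}
```

If the check fails, the file is likely a flat Word XML export, and it would need to be saved as .docx (for example from Word itself) before POI can open it.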

Related

PDF Producer properties getting changed when I use merge of Itext7 for FOP Generated PDF Statements

I have PDF statements generated using Apache FOP 2.6. When I open the properties of those statements, the producer is shown as Apache FOP, as in the screenshot below.
Now I want to merge those statements into a single PDF file. For that I used PdfMerger from iText 7. But when I look at the properties of the merged file, the producer has changed to iText 7, as in the screenshot below.
Here's the code I've used:
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.utils.PdfMerger;
public class ApacheFOPExample {
public static void main(String[] args) {
mergePdfs();
}
public static void mergePdfs() {
try {
PdfDocument pdf = new PdfDocument(new PdfWriter(new File("D:\\Workspace\\Itext\\xsl_employee_merged.pdf")));
PdfMerger merger = new PdfMerger(pdf);
//Add pages from the first document
PdfDocument firstSourcePdf = new PdfDocument(new PdfReader(new File("D:\\Workspace\\Itext\\xsl_employee.pdf")));
merger.merge(firstSourcePdf, 1, firstSourcePdf.getNumberOfPages());
//Add pages from the second pdf document
PdfDocument secondSourcePdf = new PdfDocument(new PdfReader(new File("D:\\Workspace\\Itext\\xsl_employee_another.pdf")));
merger.merge(secondSourcePdf, 1, secondSourcePdf.getNumberOfPages());
firstSourcePdf.close();
secondSourcePdf.close();
pdf.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}
I want to merge or copy multiple statements into a single PDF with the properties intact. How can I achieve this? Is there an alternative way to merge that would do it?
Note: I can't use iText 5 classes to do this.
Thank You

How to set margin for whole document and all sections by apache poi

I want to change the page margins of a whole document with Apache POI, and I want the change to apply to all sections. This is my code:
XWPFDocument docx = new XWPFDocument(OPCPackage.open("template.docx"));
CTSectPr sectPr = docx.getDocument().getBody().getSectPr();
CTPageMar pageMar = sectPr.getPgMar();
pageMar.setLeft(BigInteger.valueOf(1200L));
pageMar.setTop(BigInteger.valueOf(500L));
pageMar.setRight(BigInteger.valueOf(800L));
pageMar.setBottom(BigInteger.valueOf(1440L));
docx.write(new FileOutputStream("test2.docx"));
But only the last section is changed, not all sections and not the whole document.
What should I do to change the margins of every section, and so of the whole document?

If the document is divided into sections, then the SectPr elements for all but the last section are stored in the PPr elements of the paragraphs that act as section separators. Only the SectPr for the last section sits directly within the Body. So we need to loop through all paragraphs to collect all SectPrs.
Example:
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSectPr;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTPageMar;
import java.util.List;
import java.util.ArrayList;
import java.math.BigInteger;
public class WordGetAllSectPr {
public static List<CTSectPr> getAllSectPr(XWPFDocument document) {
List<CTSectPr> allSectPr = new ArrayList<>();
for (XWPFParagraph paragraph : document.getParagraphs()) {
if (paragraph.getCTP().getPPr() != null && paragraph.getCTP().getPPr().getSectPr() != null) {
allSectPr.add(paragraph.getCTP().getPPr().getSectPr());
}
}
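// The last section's SectPr sits directly in the body; guard against a body without one
CTSectPr bodySectPr = document.getDocument().getBody().getSectPr();
if (bodySectPr != null) allSectPr.add(bodySectPr);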
return allSectPr;
}
public static void main(String[] args) throws Exception {
XWPFDocument docx = new XWPFDocument(new FileInputStream("template.docx"));
List<CTSectPr> allSectPr = getAllSectPr(docx);
System.out.println(allSectPr.size());
for (CTSectPr sectPr : allSectPr) {
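// pgMar is optional in a SectPr, so create it when the section has none
CTPageMar pageMar = sectPr.getPgMar();
if (pageMar == null) pageMar = sectPr.addNewPgMar();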
pageMar.setLeft(BigInteger.valueOf(1200L));
pageMar.setTop(BigInteger.valueOf(500L));
pageMar.setRight(BigInteger.valueOf(800L));
pageMar.setBottom(BigInteger.valueOf(1440L));
}
docx.write(new FileOutputStream("test2.docx"));
docx.close();
}
}
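The margin values in the example above are given in twentieths of a point, so 1440 corresponds to one inch. As a small sketch of how to derive such values from more familiar units, the helper below converts inches and millimetres to that unit using only the JDK. The class name PageUnits and its methods are hypothetical and not part of POI.

```java
import java.math.BigInteger;

class PageUnits {

    static final int TWIPS_PER_POINT = 20;  // page margins use twentieths of a point
    static final int POINTS_PER_INCH = 72;

    // Converts inches to twentieths of a point, rounded to the nearest unit.
    static BigInteger inchesToTwips(double inches) {
        return BigInteger.valueOf(Math.round(inches * POINTS_PER_INCH * TWIPS_PER_POINT));
    }

    // Converts millimetres to twentieths of a point via inches (25.4 mm per inch).
    static BigInteger millimetresToTwips(double millimetres) {
        return inchesToTwips(millimetres / 25.4);
    }

    public static void main(String[] args) {
        System.out.println("1 inch  = " + inchesToTwips(1.0));
        System.out.println("20 mm   = " + millimetresToTwips(20.0));
    }
}
```

With such a helper, a call like pageMar.setBottom(PageUnits.inchesToTwips(1.0)) would be equivalent to the BigInteger.valueOf(1440L) used above.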

Put image in word document

I am trying to create a Word document with Apache POI that contains a JPEG picture. I found code to do so here on Stack Overflow. However, when I run the code a docx is created, and judging by its size it seems to contain the JPG image, but I can't open it.
My code is the following:
import org.apache.poi.util.Units;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.xwpf.usermodel.BreakType;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
public class SimpleImages {
public static void main(String[] args) throws Exception {
XWPFDocument doc = new XWPFDocument();
XWPFParagraph p = doc.createParagraph();
XWPFRun r = p.createRun();
//for(String imgFile : args) {
String imgFile = "mosaic.jpg";
int format =XWPFDocument.PICTURE_TYPE_JPEG;
r.setText(imgFile);
r.addBreak();
r.addPicture(new FileInputStream(imgFile), format, imgFile, Units.toEMU(200), Units.toEMU(200)); // 200x200 points (Units.toEMU takes points)
r.addBreak(BreakType.PAGE);
//}
FileOutputStream out = new FileOutputStream("images.docx");
doc.write(out);
out.close();
}
}
When I try to open my docx I receive:
"the file file.docx cannot be opened because there are problems with the contents."

I had the same problem, and it is now resolved. I was previously using POI 3.10, and that version was the culprit. After I updated to 3.12 the issue went away.
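A side note on the sizes in the question's addPicture call: Units.toEMU converts points to EMUs (English Metric Units), so toEMU(200) means 200 points, which is roughly 267 pixels at 96 dpi. The sketch below reproduces the arithmetic with plain JDK code so the two units can be compared. The class EmuUnits and its methods are hypothetical names for illustration; the constants (914400 EMU per inch, 12700 per point, 9525 per pixel at 96 dpi) are the standard values.

```java
class EmuUnits {

    static final int EMU_PER_INCH = 914400;
    static final int EMU_PER_POINT = 12700;  // 914400 / 72
    static final int EMU_PER_PIXEL = 9525;   // 914400 / 96, assuming 96 dpi

    // Converts a length in points to EMUs, rounded to the nearest unit.
    static int pointsToEmu(double points) {
        return (int) Math.rint(points * EMU_PER_POINT);
    }

    // Converts a length in pixels (at 96 dpi) to EMUs.
    static int pixelsToEmu(int pixels) {
        return pixels * EMU_PER_PIXEL;
    }

    public static void main(String[] args) {
        System.out.println("200 points in EMU: " + pointsToEmu(200));
        System.out.println("200 pixels in EMU: " + pixelsToEmu(200));
    }
}
```

This only affects how large the picture appears; it is not related to the file failing to open.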

Insert piece of .doc .docx file to another by using the Apache POI HWPF or XWPF

Can somebody help me insert one MS Word document into another?
I can open, edit and save, but only with a single MS Word document.
My simple code only creates, edits and saves a .docx:
import java.io.FileOutputStream;
import org.apache.poi.xwpf.usermodel.*;
public class SimpleDocument {
public void SimpleDocument() throws Exception {
XWPFDocument doc = new XWPFDocument();
XWPFParagraph p1 = doc.createParagraph();
p1.setAlignment(ParagraphAlignment.CENTER);
p1.setAlignment(ParagraphAlignment.LEFT);//setVerticalAlignment(TextAlignment.TOP);
XWPFRun r1 = p1.createRun();
r1.setBold(true);
r1.setText("The quick brown fox");
r1.setFontFamily("Courier");
r1.setUnderline(UnderlinePatterns.DOT_DOT_DASH);
XWPFParagraph p2 = doc.createParagraph();
p2.setAlignment(ParagraphAlignment.RIGHT);
XWPFRun r2 = p2.createRun();
r2.setText("jumped over the lazy dog");
FileOutputStream out = new FileOutputStream("C:/simple.docx");
doc.write(out);
out.close();
}
}
How can I combine two pieces of formatted text (Range, Paragraph)?

Try the following code:
import java.io.*;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.*;
public class test {
public static void main(String[] args) throws Exception {
// POI apparently can't create a document from scratch,
// so we need an existing empty dummy document
HWPFDocument doc = new HWPFDocument(new FileInputStream("D:\\src.doc"));
Range range = doc.getRange();
CharacterRun run = range
.insertAfter("Text After copied file contents!");
run.setBold(true);
OutputStream out = new FileOutputStream("D:\\result.doc");
doc.write(out);
out.flush();
out.close();
}
}

Convert DOC file to DOCX with Java

I need to use DOCX files (actually the XML contained in them) in Java software I'm currently developing, but some people in my company still use the DOC format.
Do you know if there is a way to convert a DOC file to the DOCX format using Java? I know it's possible using C#, but that's not an option.
I googled it, but nothing came up.
Thanks

You may try Aspose.Words for Java. It allows you to load a DOC file and save it as DOCX format. The code is very simple as shown below:
// Open a document.
Document doc = new Document("input.doc");
// Save document.
doc.save("output.docx");
Please see if this helps in your scenario.
Disclosure: I work as developer evangelist at Aspose.

Check out JODConverter to see if it fits the bill. I haven't personally used it.

Use newer versions of the jars, jodconverter-core-4.2.2.jar and jodconverter-local-4.2.2.jar:
String inputFile = "*.doc";
String outputFile = "*.docx";
LocalOfficeManager localOfficeManager = LocalOfficeManager.builder()
.install()
.officeHome(getDefaultOfficeHome()) //your path to openoffice
.build();
try {
localOfficeManager.start();
final DocumentFormat format
= DocumentFormat.builder()
.from(DefaultDocumentFormatRegistry.DOCX)
.build();
LocalConverter
.make()
.convert(new FileInputStream(new File(inputFile)))
.as(DefaultDocumentFormatRegistry.getFormatByMediaType("application/msword"))
.to(new File(outputFile))
.as(format)
.execute();
} catch (OfficeException ex) {
Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
} catch (FileNotFoundException ex) {
Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
} finally {
OfficeUtils.stopQuietly(localOfficeManager);
}

JODConverter calls OpenOffice/LibreOffice via a network protocol. It can therefore "do anything you can do in OpenOffice", which includes converting formats. But it only does as good a job as whatever version of OpenOffice you are running. I have some art in one of my docs, and it doesn't convert it as I hoped.
JODConverter is no longer supported, according to the Google Code website for v3.
To get JODConverter to do the job you need to do something like this:
private static void transformBinaryWordDocToDocX(File in, File out)
{
OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
DocumentFormat docx = converter.getFormatRegistry().getFormatByExtension("docx");
docx.setStoreProperties(DocumentFamily.TEXT,
Collections.singletonMap("FilterName", "MS Word 2007 XML"));
converter.convert(in, out, docx);
}
private static void transformBinaryWordDocToW2003Xml(File in, File out)
{
OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
DocumentFormat w2003xml = new DocumentFormat("Microsoft Word 2003 XML", "xml", "text/xml");
w2003xml.setInputFamily(DocumentFamily.TEXT);
w2003xml.setStoreProperties(DocumentFamily.TEXT, Collections.singletonMap("FilterName", "MS Word 2003 XML"));
converter.convert(in, out, w2003xml);
}
private static OfficeManager officeManager;
@BeforeClass
public static void setupStatic() throws IOException {
/*officeManager = new DefaultOfficeManagerConfiguration()
.setOfficeHome("C:/Program Files/LibreOffice 3.6")
.buildOfficeManager();
*/
officeManager = new ExternalOfficeManagerConfiguration().setConnectOnStart(true).setPortNumber(8100).buildOfficeManager();
officeManager.start();
}
@AfterClass
public static void shutdownStatic() throws IOException {
officeManager.stop();
}
For this to work you need to be running LibreOffice as a networked server. (I could not get the "run on demand" part of JODConverter to work very well under Windows with LO 3.6.)
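As a rough sketch of what running LibreOffice as a networked server can look like, the code below builds the command line for a headless instance that listens on port 8100, the port used in the setup method above. The class name OfficeServerCommand, the executable name "soffice" and the host 127.0.0.1 are assumptions for illustration; adjust them to the local installation. The process launch is left commented out so the sketch runs without LibreOffice installed.

```java
import java.util.Arrays;
import java.util.List;

class OfficeServerCommand {

    // Builds the argument list for a headless office instance that accepts
    // UNO connections on the given local port.
    static List<String> build(String sofficePath, int port) {
        return Arrays.asList(
                sofficePath,
                "--headless",
                "--accept=socket,host=127.0.0.1,port=" + port + ";urp;StarOffice.ServiceManager");
    }

    public static void main(String[] args) throws Exception {
        // "soffice" is assumed to be on the PATH; pass a full path as the first argument otherwise.
        List<String> command = build(args.length > 0 ? args[0] : "soffice", 8100);
        System.out.println(String.join(" ", command));
        // Uncomment to actually start the server process:
        // new ProcessBuilder(command).inheritIO().start();
    }
}
```

Once the server is listening, the ExternalOfficeManagerConfiguration shown above can connect to the same port.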

To convert a DOC file to HTML, look at this question:
(Convert Word doc to HTML programmatically in Java)
Use this: http://poi.apache.org/
Or use this :
XWPFDocument docx = new XWPFDocument(OPCPackage.openOrCreate(new File("hello.docx")));
XWPFWordExtractor wx = new XWPFWordExtractor(docx);
String text = wx.getText();
System.out.println("text = "+text);

I needed the same conversion. After a lot of research I found that JODConverter can do it. You can download the jar from
https://code.google.com/p/jodconverter/downloads/list
Add the jodconverter-core-3.0-beta-4-sources.jar file to your project lib:
// 1) Create the OfficeManager object
OfficeManager officeManager = new DefaultOfficeManagerConfiguration()
.setOfficeHome(new File("/opt/libreoffice4.4"))
.buildOfficeManager();
officeManager.start();
// 2) Create the JODConverter converter
OfficeDocumentConverter converter = new OfficeDocumentConverter(
officeManager);
// 3) Create the DocumentFormat for docx
DocumentFormat docx = converter.getFormatRegistry().getFormatByExtension("docx");
docx.setStoreProperties(DocumentFamily.TEXT,
Collections.singletonMap("FilterName", "MS Word 2007 XML"));
// 4) Call the convert function on the converter object
converter.convert(new File("doc/AdvancedTable.doc"), new File(
"docx/AdvancedTable.docx"), docx);

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
public class TestCon {
/**
* @param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub
POIFSFileSystem fs = null;
Document document = new Document();
try {
System.out.println("Starting the test");
fs = new POIFSFileSystem(new FileInputStream("C:/Users/312845/Desktop/a.doc"));
HWPFDocument doc = new HWPFDocument(fs);
WordExtractor we = new WordExtractor(doc);
OutputStream file = new FileOutputStream(new File("C:/Users/312845/Desktop/test.docx"));
System.out.println("Document testing completed");
} catch (Exception e) {
System.out.println("Exception during test");
e.printStackTrace();
} finally {
// close the document
document.close();
}
}
}
