Apache POI - DOCX To PDF Conversion

Apache POI - DOCX To PDF Conversion - java

I am trying to convert a docx file into pdf file using POI. Getting following error.
Using poi-3.17 ,
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
public class WordToPDF {
public static void main(String[] args) {
WordToPDF cwoWord = new WordToPDF();
System.out.println("Start");
cwoWord.ConvertToPDF("D:\\2067536.docx", "D:\\2067536.pdf");
}
public void ConvertToPDF(String docPath, String pdfPath) {
try {
InputStream doc = new FileInputStream(new File(docPath));
XWPFDocument document = new XWPFDocument(doc);
document.createStyles();
PdfOptions options = PdfOptions.create();
OutputStream out = new FileOutputStream(new File(pdfPath));
PdfConverter.getInstance().convert(document, out, options);
System.out.println("Done");
} catch (FileNotFoundException ex) {
System.out.println(ex.getMessage());
} catch (IOException ex) {
System.out.println(ex.getMessage());
}
}
}
Here is the Error happening
Exception in thread "main" org.apache.poi.xwpf.converter.core.XWPFConverterException: java.lang.NullPointerException
at org.apache.poi.xwpf.converter.pdf.PdfConverter.doConvert(PdfConverter.java:70)
at org.apache.poi.xwpf.converter.pdf.PdfConverter.doConvert(PdfConverter.java:38)
at org.apache.poi.xwpf.converter.core.AbstractXWPFConverter.convert(AbstractXWPFConverter.java:45)
at WordToPDF.ConvertToPDF(WordToPDF.java:27)
at WordToPDF.main(WordToPDF.java:17)
Caused by: java.lang.NullPointerException
at org.apache.poi.xwpf.converter.pdf.internal.PdfMapper.visitHeader(PdfMapper.java:178)
at org.apache.poi.xwpf.converter.pdf.internal.PdfMapper.visitHeader(PdfMapper.java:111)
at org.apache.poi.xwpf.converter.core.XWPFDocumentVisitor.visitHeaderRef(XWPFDocumentVisitor.java:1142)
at org.apache.poi.xwpf.converter.core.MasterPageManager.visitHeadersFooters(MasterPageManager.java:213)
at org.apache.poi.xwpf.converter.core.MasterPageManager.addSection(MasterPageManager.java:180)
at org.apache.poi.xwpf.converter.core.MasterPageManager.compute(MasterPageManager.java:127)
at org.apache.poi.xwpf.converter.core.MasterPageManager.initialize(MasterPageManager.java:90)
at org.apache.poi.xwpf.converter.core.XWPFDocumentVisitor.visitBodyElements(XWPFDocumentVisitor.java:232)
at org.apache.poi.xwpf.converter.core.XWPFDocumentVisitor.start(XWPFDocumentVisitor.java:199)
at org.apache.poi.xwpf.converter.pdf.PdfConverter.doConvert(PdfConverter.java:56)
... 4 more
As this is a null pointer error I am unable to understand what exactly the issue might be, any help is appreciated. Thank you.

Libre Office Saved my life, Simple one liner command for docx to pdf conversion works like a charm.
Detailed answer here
Command `libreoffice --headless --convert-to pdf test.docx --outdir /pdf` is not working

Related

Reading a .docx file with Apache POI

I want to read and print out a whole .docx file into the console for now.
I read that you cannot do it without Apache POI or Docx4J, I tried both and failed twice.
Also I am aware that this question already exists on Stackoverflow but I am afraid it might be outdated.
This is my code with Apache POI right now.
import java.io.File;
import java.io.FileInputStream;
import java.util.Iterator;
import java.util.List;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
public class test {
public static void readDocxFile(String fileName) {
try {
File file = new File(fileName);
FileInputStream fis = new FileInputStream(file.getAbsolutePath());
XWPFDocument document = new XWPFDocument(fis);
List<XWPFParagraph> paragraphs = document.getParagraphs();
for (int i = 0; i < paragraphs.size(); i++) {
System.out.println(paragraphs.get(i).getParagraphText());
}
fis.close();
} catch (Exception e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
readDocxFile("C:\\Basics.docx");
}
}
It was taken from another question on here but, it does not work.
I get following Error message:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/compress/archivers/zip/ZipFile
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:307)
at org.apache.poi.ooxml.util.PackageHelper.open(PackageHelper.java:37)
at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:142)
at test.readDocxFile(test.java:16)
at test.main(test.java:28)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.compress.archivers.zip.ZipFile
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:604)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
... 5 more

It's due to a library that is not directly included in POI.
If you use maven add the following dependency to your project :
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-compress</artifactId>
<version>1.18</version>
</dependency>

i get an error when i run this code

package demo;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.poi.openxml4j.opc.*;
import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
public class DocxToPdf {
public static void main(String[] args){
try
{
String inputFile = "F:\\MY WORK\\CollectionPractice\\WebContent\\APCR1.docx";
String outputFile = "F:\\MY WORK\\CollectionPractice\\WebContent\\APCR1.pdf";
System.out.println("inputFile:" + inputFile + ",outputFile:" + outputFile);
FileInputStream in = new FileInputStream(inputFile);
XWPFDocument document = new XWPFDocument(in);
File outFile = new File(outputFile);
OutputStream out = new FileOutputStream(outFile);
PdfOptions options = null;
PdfConverter.getInstance().convert(document, out, options);
} catch (Exception e) {
e.printStackTrace();
}
}
}
when i run this code an error occur like these and i have used following jar files also.
error:
java.lang.NoSuchMethodError: org.apache.poi.POIXMLDocumentPart.getPackageRelationship()Lorg/apache/poi/openxml4j/opc/PackageRelationship;
jars:
List of jar files

You likely have jar-versions of POI mixed up. The error indicates that the class that was loaded did not have a method that the calling class saw during compilation, so you have a different version of POI in your classpath.
See "Component Map" at https://poi.apache.org/overview.html for the different components that are included and which jars they end up, make sure you only have one of these jars in your classpath, not multiple different versions.

How to read docx file content in java api using poi jar

I have done reading doc file now i'm trying to read docx file content. when i searched for sample code i found many, nothing worked. check the code for reference...
import java.io.*;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.Document;
import com.itextpdf.text.Paragraph;
public class createPdfForDocx {
public static void main(String[] args) {
InputStream fs = null;
Document document = new Document();
XWPFWordExtractor extractor = null ;
try {
fs = new FileInputStream("C:\\DATASTORE\\test.docx");
//XWPFDocument hdoc=new XWPFDocument(fs);
XWPFDocument hdoc=new XWPFDocument(OPCPackage.open(fs));
//XWPFDocument hdoc=new XWPFDocument(fs);
extractor = new XWPFWordExtractor(hdoc);
OutputStream fileOutput = new FileOutputStream(new File("C:/DATASTORE/test.pdf"));
PdfWriter.getInstance(document, fileOutput);
document.open();
String fileData=extractor.getText();
System.out.println(fileData);
document.add(new Paragraph(fileData));
System.out.println(" pdf document created");
} catch(IOException e) {
System.out.println("IO Exception");
e.printStackTrace();
} catch(Exception ex) {
ex.printStackTrace();
}finally {
document.close();
}
}//end of main()
}//end of class
For the above code i'm getting following Exception:
org.apache.poi.POIXMLException: java.lang.reflect.InvocationTargetException
at org.apache.poi.xwpf.usermodel.XWPFFactory.createDocumentPart(XWPFFactory.java:60)
at org.apache.poi.POIXMLDocumentPart.read(POIXMLDocumentPart.java:277)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:186)
at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:107)
at pagecode.createPdfForDocx.main(createPdfForDocx.java:20)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:67)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:521)
at org.apache.poi.xwpf.usermodel.XWPFFactory.createDocumentPart(XWPFFactory.java:58)
... 4 more
Caused by: java.lang.NoSuchMethodError: org/openxmlformats/schemas/wordprocessingml/x2006/main/CTStyles.getStyleList()Ljava/util/List;
at org.apache.poi.xwpf.usermodel.XWPFStyles.onDocumentRead(XWPFStyles.java:78)
at org.apache.poi.xwpf.usermodel.XWPFStyles.<init>(XWPFStyles.java:59)
... 9 more
Please help
Thank you

This is covered in the Apache POI FAQ! The entry you want is I'm using the poi-ooxml-schemas jar, but my code is failing with "java.lang.NoClassDefFoundError: org/openxmlformats/schemas/something"
The short answer is to switch the poi-ooxml-schemas jar for the full ooxml-schemas-1.1 jar. The full answer is given in the FAQ

For reading excels or docx file if you want to solve errors you need to add all jars then you wont get any error.

How to handle this error on kabeja?

I need to generate SVG from given DXF file. I try to archive that by using kabeja package. This is the code that they gave on their web page.
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import org.xml.sax.ContentHandler;
import org.kabeja.dxf.DXFDocument;
import org.kabeja.parser.DXFParseException;
import org.kabeja.parser.Parser;
import org.kabeja.parser.ParserBuilder;
import org.kabeja.svg.SVGGenerator;
import org.kabeja.xml.SAXGenerator;
public class MyClass{
public MyClass(){
...
}
public void parseFile(String sourceFile) {
Parser parser = ParserBuilder.createDefaultParser();
try {
parser.parse(new FileInputStream(sourceFile));
DXFDocument doc = parser.getDocument();
//the SVG will be emitted as SAX-Events
//see org.xml.sax.ContentHandler for more information
ContentHandler myhandler = new ContentHandlerImpl();
//the output - create first a SAXGenerator (SVG here)
SAXGenerator generator = new SVGGenerator();
//setup properties
generator.setProperties(new HashMap());
//start the output
generator.generate(doc,myhandler);
} catch (DXFParseException e) {
e.printStackTrace();
} catch (IOException ioe) {
ioe.printStackTrace();
}
}
}
Hear is the code that provided by the kabeja development group on sourceforge web site. But in above code I noticed that some of the classes are missing on their new package. for example
ContentHandler myhandler = new ContentHandlerImpl();
In this line it create contentHandlerImpl object but with new kabeja package it dosn't have that class.So because of this it doesn't generate SVG file. So could some one explain me how to archive my target by using this package.

Try to read symbol ContentHandlerImpl not found from kabeja's forum

Reading MS Word 2007 using Java

I am trying to read a Microsoft word file through Java. I have included all the .jar files from Apache poi-3.8-beta1 to my classpath. However, when I try running this, I get the following exception:
org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:131)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:104)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:138)
at readingmsword07.Main.main(Main.java:27)
Following is my code:
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.*;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
public class Main {
public static void main(String[] args) {
try {
FileInputStream fis = new FileInputStream("C:\\TrialDoc.docx");
POIFSFileSystem fileSystem = new POIFSFileSystem(fis);
org.apache.poi.xwpf.extractor.XWPFWordExtractor oleTextExtractor =
new XWPFWordExtractor(new XWPFDocument(fis));
System.out.print(oleTextExtractor.getText());
} catch (Exception e) {
e.printStackTrace();
}
}
}
I am using the XWPFWordExtractor since I am trying to read a 2007 word document but for some reason I am unable to figure out the right POI that deals with this.
Any help is much appreciated. Thanks in advance!
~ Woods

remove the line,
POIFSFileSystem fileSystem = new POIFSFileSystem(fis);

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Apache POI - DOCX To PDF Conversion - java

Libre Office Saved my life, Simple one liner command for docx to pdf conversion works like a charm. Detailed answer here Command `libreoffice --headless --convert-to pdf test.docx --outdir /pdf` is not working

Related

Reading a .docx file with Apache POI

i get an error when i run this code

How to read docx file content in java api using poi jar

How to handle this error on kabeja?

Reading MS Word 2007 using Java

Categories

Resources