Issues converting docx to PDF in java

Issues converting docx to PDF in java - java

I am trying to convert a word docx file to a pdf file, all using java, and without any user interaction.
This is my code so far, I am using the docx4j library.
/*
* To change this template, choose Tools | Templates
* and open the template in the editor.
*/
package richard.fileupload;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.docx4j.convert.out.pdf.PdfConversion;
import org.docx4j.convert.out.pdf.viaXSLFO.PdfSettings;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
/**
*
* #author Richard
*/
public class pdfConverter {
public static void main(String[] args) {
createPDF();
createPDF();
}
private static void createPDF() {
try {
long start = System.currentTimeMillis();
// 1) Load DOCX into WordprocessingMLPackage
InputStream is = new FileInputStream(new File(
"D:/HelloWorld.docx"));
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
.load(is);
// 2) Prepare Pdf settings
PdfSettings pdfSettings = new PdfSettings();
// 3) Convert WordprocessingMLPackage to Pdf
OutputStream out = new FileOutputStream(new File(
"D:/HelloWorld.pdf"));
PdfConversion converter = new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(
wordMLPackage);
converter.output(out, pdfSettings);
System.err.println("Generate pdf/HelloWorld.pdf with "
+ (System.currentTimeMillis() - start) + "ms");
} catch (Throwable e) {
e.printStackTrace();
}
}
}
However I am getting this error when I try to run, it compiles just fine
run:
java.lang.NoClassDefFoundError: org/apache/log4j/Logger
at org.docx4j.openpackaging.Base.<clinit>(Base.java:42)
at richard.fileupload.pdfConverter.createPDF(pdfConverter.java:32)
at richard.fileupload.pdfConverter.main(pdfConverter.java:21)
Caused by: java.lang.ClassNotFoundException: org.apache.log4j.Logger
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
... 3 more
java.lang.NoClassDefFoundError: Could not initialize class org.docx4j.openpackaging.packages.WordprocessingMLPackage
at richard.fileupload.pdfConverter.createPDF(pdfConverter.java:32)
at richard.fileupload.pdfConverter.main(pdfConverter.java:22)
BUILD SUCCESSFUL (total time: 0 seconds)
any idea what the error is and what is the cause of this?
EDIT I am now getting this error
java.lang.NoClassDefFoundError: org/apache/fop/apps/FopFactory
at org.docx4j.convert.out.pdf.viaXSLFO.Conversion.output(Conversion.java:231)
at richard.fileupload.pdfConverter.createPDF(pdfConverter.java:43)
at richard.fileupload.pdfConverter.main(pdfConverter.java:22)

Related

Apache POI - DOCX To PDF Conversion

I am trying to convert a docx file into pdf file using POI. Getting following error.
Using poi-3.17 ,
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
public class WordToPDF {
public static void main(String[] args) {
WordToPDF cwoWord = new WordToPDF();
System.out.println("Start");
cwoWord.ConvertToPDF("D:\\2067536.docx", "D:\\2067536.pdf");
}
public void ConvertToPDF(String docPath, String pdfPath) {
try {
InputStream doc = new FileInputStream(new File(docPath));
XWPFDocument document = new XWPFDocument(doc);
document.createStyles();
PdfOptions options = PdfOptions.create();
OutputStream out = new FileOutputStream(new File(pdfPath));
PdfConverter.getInstance().convert(document, out, options);
System.out.println("Done");
} catch (FileNotFoundException ex) {
System.out.println(ex.getMessage());
} catch (IOException ex) {
System.out.println(ex.getMessage());
}
}
}
Here is the Error happening
Exception in thread "main" org.apache.poi.xwpf.converter.core.XWPFConverterException: java.lang.NullPointerException
at org.apache.poi.xwpf.converter.pdf.PdfConverter.doConvert(PdfConverter.java:70)
at org.apache.poi.xwpf.converter.pdf.PdfConverter.doConvert(PdfConverter.java:38)
at org.apache.poi.xwpf.converter.core.AbstractXWPFConverter.convert(AbstractXWPFConverter.java:45)
at WordToPDF.ConvertToPDF(WordToPDF.java:27)
at WordToPDF.main(WordToPDF.java:17)
Caused by: java.lang.NullPointerException
at org.apache.poi.xwpf.converter.pdf.internal.PdfMapper.visitHeader(PdfMapper.java:178)
at org.apache.poi.xwpf.converter.pdf.internal.PdfMapper.visitHeader(PdfMapper.java:111)
at org.apache.poi.xwpf.converter.core.XWPFDocumentVisitor.visitHeaderRef(XWPFDocumentVisitor.java:1142)
at org.apache.poi.xwpf.converter.core.MasterPageManager.visitHeadersFooters(MasterPageManager.java:213)
at org.apache.poi.xwpf.converter.core.MasterPageManager.addSection(MasterPageManager.java:180)
at org.apache.poi.xwpf.converter.core.MasterPageManager.compute(MasterPageManager.java:127)
at org.apache.poi.xwpf.converter.core.MasterPageManager.initialize(MasterPageManager.java:90)
at org.apache.poi.xwpf.converter.core.XWPFDocumentVisitor.visitBodyElements(XWPFDocumentVisitor.java:232)
at org.apache.poi.xwpf.converter.core.XWPFDocumentVisitor.start(XWPFDocumentVisitor.java:199)
at org.apache.poi.xwpf.converter.pdf.PdfConverter.doConvert(PdfConverter.java:56)
... 4 more
As this is a null pointer error I am unable to understand what exactly the issue might be, any help is appreciated. Thank you.

Libre Office Saved my life, Simple one liner command for docx to pdf conversion works like a charm.
Detailed answer here
Command `libreoffice --headless --convert-to pdf test.docx --outdir /pdf` is not working

Reading a .docx file with Apache POI

I want to read and print out a whole .docx file into the console for now.
I read that you cannot do it without Apache POI or Docx4J, I tried both and failed twice.
Also I am aware that this question already exists on Stackoverflow but I am afraid it might be outdated.
This is my code with Apache POI right now.
import java.io.File;
import java.io.FileInputStream;
import java.util.Iterator;
import java.util.List;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
public class test {
public static void readDocxFile(String fileName) {
try {
File file = new File(fileName);
FileInputStream fis = new FileInputStream(file.getAbsolutePath());
XWPFDocument document = new XWPFDocument(fis);
List<XWPFParagraph> paragraphs = document.getParagraphs();
for (int i = 0; i < paragraphs.size(); i++) {
System.out.println(paragraphs.get(i).getParagraphText());
}
fis.close();
} catch (Exception e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
readDocxFile("C:\\Basics.docx");
}
}
It was taken from another question on here but, it does not work.
I get following Error message:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/compress/archivers/zip/ZipFile
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:307)
at org.apache.poi.ooxml.util.PackageHelper.open(PackageHelper.java:37)
at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:142)
at test.readDocxFile(test.java:16)
at test.main(test.java:28)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.compress.archivers.zip.ZipFile
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:604)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
... 5 more

It's due to a library that is not directly included in POI.
If you use maven add the following dependency to your project :
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-compress</artifactId>
<version>1.18</version>
</dependency>

Getting error while trying to copy a picture in doc file

I am getting below error while trying to copy a pic in a do file through selenium.
This is the error which I am getting -
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/xmlbeans/XmlException
at LeadFreeTest.docCapture.main(docCapture.java:17)
Caused by: java.lang.ClassNotFoundException: org.apache.xmlbeans.XmlException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 1 more
Below is code
package LeadFreeTest;
import java.io.*;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.xwpf.usermodel.*;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import java.io.FileInputStream;
import java.io.FileOutputStream;
public class docCapture {
#SuppressWarnings("resource")
public static void main(String[] args) throws IOException, InvalidFormatException
{
XWPFDocument docx = new XWPFDocument();
XWPFParagraph par = docx.createParagraph();
XWPFRun run = par.createRun();
run.setText("Hello, World. This is my first java generated docx-file. Have fun.");
run.setFontSize(13);
InputStream pic = new FileInputStream("C:\\Naveeen\\TestScreenShot\\LoginPage.png");
//byte [] picbytes = IOUtils.toByteArray(pic);
//run.addPicture(picbytes, Document.PICTURE_TYPE_JPEG);
run.addPicture(pic, Document.PICTURE_TYPE_JPEG, "3", 0, 0);
FileOutputStream out = new FileOutputStream("C:\\Naveeen\\TestScreenShot\\LoginPage.doc");
docx.write(out);
out.close();
pic.close();
}
}

You need to add the XML beans dependency to your classpath hense the
ClassNotFoundException: org.apache.xmlbeans.XmlException
The library is usually called xmlbeans-x.x.x.jar
You can find it here.

NoClassDefFoundError error in crawljax

import java.io.File;
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import org.apache.commons.io.FileUtils;
import com.crawljax.browser.EmbeddedBrowser.BrowserType;
import com.crawljax.condition.NotXPathCondition;
import com.crawljax.core.CrawljaxRunner;
import com.crawljax.core.configuration.BrowserConfiguration;
import com.crawljax.core.configuration.CrawljaxConfiguration;
import com.crawljax.core.configuration.CrawljaxConfiguration.CrawljaxConfigurationBuilder;
import com.crawljax.core.configuration.Form;
import com.crawljax.core.configuration.InputSpecification;
/**
* Example of running Crawljax with the CrawlOverview plugin on a single-page web app. The crawl
* will produce output using the {#link CrawlOverview} plugin.
*/
class app {
private static final long WAIT_TIME_AFTER_EVENT = 200;
private static final long WAIT_TIME_AFTER_RELOAD = 20;
private static final String URL = "http://demo.crawljax.com";
/**
* Run this method to start the crawl.
*
* #throws IOException
* when the output folder cannot be created or emptied.
*/
public static void main(String[] args) throws IOException {
CrawljaxConfigurationBuilder builder = CrawljaxConfiguration.builderFor(URL);
builder.crawlRules().insertRandomDataInInputForms(false);
// click these elements
builder.crawlRules().clickDefaultElements();
builder.crawlRules().click("div");
builder.setMaximumStates(10);
builder.setMaximumDepth(3);
builder.crawlRules().clickElementsInRandomOrder(true);
// Set timeouts
builder.crawlRules().waitAfterReloadUrl(WAIT_TIME_AFTER_RELOAD, TimeUnit.MILLISECONDS);
builder.crawlRules().waitAfterEvent(WAIT_TIME_AFTER_EVENT, TimeUnit.MILLISECONDS);
// We want to use two browsers simultaneously.
builder.setBrowserConfig(new BrowserConfiguration(BrowserType.FIREFOX, 1));
CrawljaxRunner crawljax;
crawljax = new CrawljaxRunner(builder.build());
crawljax.call();
}
private static InputSpecification getInputSpecification() {
InputSpecification input = new InputSpecification();
Form contactForm = new Form();
contactForm.field("male").setValues(true, false);
contactForm.field("female").setValues(false, true);
contactForm.field("name").setValues("Bob", "Alice", "John");
contactForm.field("phone").setValues("1234567890", "1234888888", "");
contactForm.field("mobile").setValues("123", "3214321421");
contactForm.field("type").setValues("Student", "Teacher");
contactForm.field("active").setValues(true); input.setValuesInForm(contactForm).beforeClickElement("button").withText("Save");
return input;
}
}
this is a code for crawljax, it give me error at when i run it, error is
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/collect/ImmutableList
at com.crawljax.core.configuration.CrawljaxConfiguration$CrawljaxConfigurationBuilder.<init>(CrawljaxConfiguration.java:28)
at com.crawljax.core.configuration.CrawljaxConfiguration$CrawljaxConfigurationBuilder.<init>(CrawljaxConfiguration.java:26)
at com.crawljax.core.configuration.CrawljaxConfiguration.builderFor(CrawljaxConfiguration.java:217)
at app.main(NewClass.java:34)
Caused by: java.lang.ClassNotFoundException: com.google.common.collect.ImmutableList
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
... 4 more

Download and add Guava library on to your classpath:
Guava Library download

How to read docx file content in java api using poi jar

I have done reading doc file now i'm trying to read docx file content. when i searched for sample code i found many, nothing worked. check the code for reference...
import java.io.*;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.Document;
import com.itextpdf.text.Paragraph;
public class createPdfForDocx {
public static void main(String[] args) {
InputStream fs = null;
Document document = new Document();
XWPFWordExtractor extractor = null ;
try {
fs = new FileInputStream("C:\\DATASTORE\\test.docx");
//XWPFDocument hdoc=new XWPFDocument(fs);
XWPFDocument hdoc=new XWPFDocument(OPCPackage.open(fs));
//XWPFDocument hdoc=new XWPFDocument(fs);
extractor = new XWPFWordExtractor(hdoc);
OutputStream fileOutput = new FileOutputStream(new File("C:/DATASTORE/test.pdf"));
PdfWriter.getInstance(document, fileOutput);
document.open();
String fileData=extractor.getText();
System.out.println(fileData);
document.add(new Paragraph(fileData));
System.out.println(" pdf document created");
} catch(IOException e) {
System.out.println("IO Exception");
e.printStackTrace();
} catch(Exception ex) {
ex.printStackTrace();
}finally {
document.close();
}
}//end of main()
}//end of class
For the above code i'm getting following Exception:
org.apache.poi.POIXMLException: java.lang.reflect.InvocationTargetException
at org.apache.poi.xwpf.usermodel.XWPFFactory.createDocumentPart(XWPFFactory.java:60)
at org.apache.poi.POIXMLDocumentPart.read(POIXMLDocumentPart.java:277)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:186)
at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:107)
at pagecode.createPdfForDocx.main(createPdfForDocx.java:20)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:67)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:521)
at org.apache.poi.xwpf.usermodel.XWPFFactory.createDocumentPart(XWPFFactory.java:58)
... 4 more
Caused by: java.lang.NoSuchMethodError: org/openxmlformats/schemas/wordprocessingml/x2006/main/CTStyles.getStyleList()Ljava/util/List;
at org.apache.poi.xwpf.usermodel.XWPFStyles.onDocumentRead(XWPFStyles.java:78)
at org.apache.poi.xwpf.usermodel.XWPFStyles.<init>(XWPFStyles.java:59)
... 9 more
Please help
Thank you

This is covered in the Apache POI FAQ! The entry you want is I'm using the poi-ooxml-schemas jar, but my code is failing with "java.lang.NoClassDefFoundError: org/openxmlformats/schemas/something"
The short answer is to switch the poi-ooxml-schemas jar for the full ooxml-schemas-1.1 jar. The full answer is given in the FAQ

For reading excels or docx file if you want to solve errors you need to add all jars then you wont get any error.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Issues converting docx to PDF in java - java

Related

Apache POI - DOCX To PDF Conversion

Reading a .docx file with Apache POI

Getting error while trying to copy a picture in doc file

NoClassDefFoundError error in crawljax

How to read docx file content in java api using poi jar

Categories

Resources