Convert docx file into PDF with Java

I'm looking for a stable way to convert DOCX files from MS Word into PDF. Until now I have used OpenOffice installed as a listener, but it often hangs. The problem is that many users may want to convert SXW and DOCX files into PDF at the same time. Is there another possibility? I tried the examples from this site: https://angelozerr.wordpress.com/2012/12/06/how-to-convert-docxodt-to-pdfhtml-with-java/ but the output is not good (the converted documents have errors and the layout is noticeably changed).
Here is the "source" DOCX document:
Here is the document converted with docx4j, with some exception text inside the document. The text in the upper right corner is also missing.
This one is the PDF created with OpenOffice as the DOCX-to-PDF converter. Some text in the upper right corner is missing.
Is there another option for converting DOCX into PDF with Java?

There are many ways to do the conversion.
One commonly used method relies on POI and docx4j; the code below uses docx4j:
InputStream is = new FileInputStream(new File("your Docx PAth"));
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(is);
List sections = wordMLPackage.getDocumentModel().getSections();
for (int i = 0; i < sections.size(); i++) {
    wordMLPackage.getDocumentModel().getSections().get(i).getPageDimensions();
}
Mapper fontMapper = new IdentityPlusMapper();
PhysicalFont font = PhysicalFonts.getPhysicalFonts().get("Comic Sans MS"); // set your desired font
fontMapper.getFontMappings().put("Algerian", font);
wordMLPackage.setFontMapper(fontMapper);
PdfSettings pdfSettings = new PdfSettings();
org.docx4j.convert.out.pdf.PdfConversion conversion =
        new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(wordMLPackage);
// To turn off logger
List<Logger> loggers = Collections.<Logger> list(LogManager.getCurrentLoggers());
loggers.add(LogManager.getRootLogger());
for (Logger logger : loggers) {
    logger.setLevel(Level.OFF);
}
OutputStream out = new FileOutputStream(new File("Your OutPut PDF path"));
conversion.output(out, pdfSettings);
System.out.println("DONE!!");
This works well; I have tried it on multiple DOCX files.

Related

Apache POI convert HTML/XHTML to DOC/DOCX

I need to convert HTML to a DOC file. The HTML is filled with custom information, and the images and CSS change depending on what is requested.
I'm trying to use Apache POI for this, but I'm getting an error:
`
org.apache.poi.xwpf.converter.core.XWPFConverterException: java.lang.IllegalStateException: Expecting one Styles document part, but found 0
at org.apache.poi.xwpf.converter.xhtml.XHTMLConverter.convert(XHTMLConverter.java:72)
at org.apache.poi.xwpf.converter.xhtml.XHTMLConverter.doConvert(XHTMLConverter.java:58)
at org.apache.poi.xwpf.converter.xhtml.XHTMLConverter.doConvert(XHTMLConverter.java:38)
at org.apache.poi.xwpf.converter.core.AbstractXWPFConverter.convert(AbstractXWPFConverter.java:45)
My code is this:
// Load the HTML file
//Doc file
String htmlFile = "pathToHtml/file.html";
//String htmlFile = parseHTMLTemplate(disputeLetterDetails, template, fileExtension);
//new File(htmlFile);
//File file = new FileReader(htmlFile);
Path path = Path.of(htmlFile);
OutputStream in = new FileOutputStream(htmlFile, true);
// Create a new XWPFDocument
XWPFDocument document = new XWPFDocument();
// Set up the XHTML options
XHTMLOptions options = XHTMLOptions.create().URIResolver(new FileURIResolver(new File("./images/")));
options.setExtractor(new FileImageExtractor(new File("./images/")));
// Convert the HTML to XWPFDocument
XHTMLConverter.getInstance().convert(document, in, options);
// Save the document to a .doc file
FileOutputStream out = new FileOutputStream("pathToHtml/OUT_from_XHTML_TEST.docx");
document.write(out);
out.close();
`
I want to get a DOCX file from an HTML file with the same styles, but I'm getting the error shown above.

iText PDF display marathi, hindi and different languages in android

How can I display text in different languages, such as Marathi or Hindi, in a PDF using the iText PDF library on Android and in Java?

You can use any custom font for your language, but I recommend Google's Noto fonts, which support a wide variety of languages and are available here: https://www.google.com/get/noto/.
Here are font links for some languages:
Marathi & Hindi - https://www.google.com/get/noto/#sans-deva
Telugu - https://www.google.com/get/noto/#sans-telu
Arabic - https://www.google.com/get/noto/#sans-arab
You can search for any language and get the .ttf file from there, then put it in your resources folder. For Android, put it in the assets folder and refer to the font file as assets/filename.ttf
Here is sample code that sets a Marathi font:
File pdfFile = new File("marathi.pdf");
try {
    PdfWriter pdfWriter = new PdfWriter(pdfFile);
    PdfDocument pdfDocument = new PdfDocument(pdfWriter);
    Document document = new Document(pdfDocument);
    // font
    final FontSet set = new FontSet();
    set.addFont("assets/NotoSans-Regular.ttf");
    document.setFontProvider(new FontProvider(set));
    document.setProperty(Property.FONT, new String[]{"MyFontFamilyName"});
    Paragraph paragraph = new Paragraph("अंतरिक्ष यान से दूर नीचे पृथ्वी शानदार ढंग से जगमगा रही थी ।");
    document.add(paragraph);
    document.close();
    pdfDocument.close();
    pdfWriter.close();
} catch (IOException e) {
    e.printStackTrace();
}
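On Android, files in the assets folder are usually read through a stream (for example one returned by AssetManager.open) rather than through a file path. A minimal sketch of reading such a stream into a byte array is shown below, assuming the bytes are then handed to a FontSet.addFont overload that accepts font data as byte[]. The class and method names here are hypothetical helpers, not part of iText or Android.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper; not part of iText or the Android SDK. */
final class FontBytes {

    /** Reads the whole stream into memory and returns its bytes. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // read() returns -1 once the end of the stream is reached
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }
}
```

The result could then be passed along the lines of set.addFont(FontBytes.readAll(stream)), where stream is an InputStream opened on the font asset.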

Convert DOC [HWPFDocument] to pdf [with font, Table and images] using java

I am converting a DOC file to PDF.
I am using the following code:
POIFSFileSystem fs = null;
Document Pdfdocument = new Document();
fs = new POIFSFileSystem(new FileInputStream(srcFile));
HWPFDocument doc = new HWPFDocument(fs);
WordExtractor we = new WordExtractor(doc);
PdfWriter writer = PdfWriter.getInstance(Pdfdocument, new FileOutputStream(targetFile));
Pdfdocument.open();
writer.setPageEmpty(true);
Pdfdocument.newPage();
writer.setPageEmpty(true);
String[] paragraphs = we.getParagraphText();
for (int i = 0; i < paragraphs.length; i++) {
    Pdfdocument.add(new Paragraph(paragraphs[i]));
}
This generates a PDF without formatting or images; even the fonts are missing, since WordExtractor extracts only text.
Is there any other way to convert with fonts and images? I need conversion from DOC (HWPFDocument), not DOCX.
I have looked at these SO links:
Convert doc to pdf using Apache POI
https://stackoverflow.com/a/6210694/6032482
how to convert doc,docx files to pdf in java programatically
and many more, but they all use WordExtractor.
Note: I can't use LibreOffice or Aspose.
Can it be done using:
ApachePOI
DOCX4j
itext

How to add text watermark to pdf in Java using Apache PDFBox?

I can't find any tutorial on adding a text watermark to a PDF file. Can you please guide me? I am very new to PDFBox.
This is not a duplicate; the link in the comment didn't help me. I want to add text, not an image, to the PDF.

Here is an example using PDFBox 2.0.2. This will load a PDF and write some text at the page origin (normally the bottom-left corner, since the text matrix is translated to 0,0) in a red, semi-transparent font. If the PDF has multiple pages, the watermark will appear on every page. It might not be production ready, as I am not sure whether there are additional null conditions that need to be checked, but it should get you moving in the right direction.
Keep in mind that this block of code does not modify the original PDF; it creates a new PDF named Tmp_(filename) as the output.
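import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.PDPageContentStream.AppendMode;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState;
import org.apache.pdfbox.util.Matrix;

private static void watermarkPDF(File fileStored) throws IOException {
    File tmpPDF = new File(fileStored.getParent(), "Tmp_" + fileStored.getName());
    // try-with-resources closes the document even if an exception is thrown
    try (PDDocument doc = PDDocument.load(fileStored)) {
        for (PDPage page : doc.getPages()) {
            // Append to the existing page content instead of overwriting it
            try (PDPageContentStream cs = new PDPageContentStream(doc, page, AppendMode.APPEND, true, true)) {
                String ts = "Some sample text";
                PDFont font = PDType1Font.HELVETICA_BOLD;
                float fontSize = 14.0f;
                PDExtendedGraphicsState r0 = new PDExtendedGraphicsState();
                r0.setNonStrokingAlphaConstant(0.5f); // 50% opacity
                cs.setGraphicsStateParameters(r0);
                cs.setNonStrokingColor(255, 0, 0); // Red
                cs.beginText();
                cs.setFont(font, fontSize);
                cs.setTextMatrix(Matrix.getTranslateInstance(0f, 0f));
                cs.showText(ts);
                cs.endText();
            }
        }
        doc.save(tmpPDF);
    }
}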
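The code above translates the text matrix to (0, 0), which is the page origin. As a rough sketch, the translation for a bottom-right placement could be computed as below. It assumes the string width comes from PDFont.getStringWidth, which reports widths in 1/1000 of a text unit, and the page width from page.getMediaBox().getWidth(). The class, its method names and the margin parameter are illustrative and not part of PDFBox.

```java
/** Hypothetical helper; not part of PDFBox. */
final class WatermarkPosition {

    /** Converts a width given in 1/1000 text units into points for the given font size. */
    static float textWidthInPoints(float stringWidthUnits, float fontSize) {
        return stringWidthUnits / 1000f * fontSize;
    }

    /**
     * Returns {x, y} so that the text ends `margin` points from the right edge
     * and its baseline sits `margin` points above the bottom edge.
     * x is clamped to 0 if the text is wider than the page.
     */
    static float[] bottomRight(float pageWidth, float stringWidthUnits, float fontSize, float margin) {
        float x = pageWidth - textWidthInPoints(stringWidthUnits, fontSize) - margin;
        return new float[] { Math.max(x, 0f), margin };
    }
}
```

The two values would then replace the zeros in Matrix.getTranslateInstance(x, y). This ignores page rotation and crop boxes that do not start at the origin.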

Converting a pdf to word document using java

I've successfully converted JPEG to PDF using Java, but I don't know how to convert PDF to Word. The code for converting JPEG to PDF is given below.
Can anyone tell me how to convert PDF to Word (.doc/.docx) using Java?
import java.io.FileOutputStream;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.Document;

public class JpegToPDF {
    public static void main(String[] args) {
        try {
            Document convertJpgToPdf = new Document();
            PdfWriter.getInstance(convertJpgToPdf, new FileOutputStream(
                    "c:\\java\\ConvertImagetoPDF.pdf"));
            convertJpgToPdf.open();
            Image convertJpg = Image.getInstance("c:\\java\\test.jpg");
            convertJpgToPdf.add(convertJpg);
            convertJpgToPdf.close();
            System.out.println("Successfully Converted JPG to PDF in iText");
        } catch (Exception i1) {
            i1.printStackTrace();
        }
    }
}

In fact, you need two libraries, both open source. The first is iText, which is used to extract the text from the PDF file. The second is POI, which is used to create the Word document.
The code is quite simple:
// Create the Word document
XWPFDocument doc = new XWPFDocument();
// Open the PDF file
String pdf = "myfile.pdf";
PdfReader reader = new PdfReader(pdf);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
// Read the PDF page by page
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
    TextExtractionStrategy strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
    // Extract the text
    String text = strategy.getResultantText();
    // Create a new paragraph in the Word document, adding the extracted text
    XWPFParagraph p = doc.createParagraph();
    XWPFRun run = p.createRun();
    run.setText(text);
    // Add a page break
    run.addBreak(BreakType.PAGE);
}
// Write the Word document
FileOutputStream out = new FileOutputStream("myfile.docx");
doc.write(out);
// Close all open files
out.close();
reader.close();
Beware: with the extraction strategy used here, you will lose all formatting. You can address this by plugging in your own, more complex extraction strategy.

You can use the 7-PDF library.
Have a look at this; it may help:
http://www.7-pdf.de/sites/default/files/guide/manuals/library/index.html
PS: iText has some issues when the given file is a non-RGB image, so try this out!

Although it's far from being a pure Java solution, OpenOffice/LibreOffice allows one to connect to it through a TCP port, and it's possible to use that to convert documents. If this looks like an acceptable solution, JODConverter can help you.
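Whichever converter is used, an external office process that sometimes hangs is easier to live with if each conversion runs with a bounded number of workers and a timeout. The following is only a sketch of that idea using java.util.concurrent; BoundedConverter and its parameters are made-up names, and the actual conversion call (JODConverter or anything else) is assumed to be wrapped in the Callable passed in.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical wrapper; not part of JODConverter or OpenOffice. */
final class BoundedConverter implements AutoCloseable {
    private final ExecutorService pool;
    private final long timeoutMillis;

    BoundedConverter(int workers, long timeoutMillis) {
        // At most `workers` conversions run at once; further jobs wait in the queue
        this.pool = Executors.newFixedThreadPool(workers);
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Runs one conversion job and waits up to timeoutMillis for its result.
     * The wait includes any time the job spends queued behind other jobs.
     */
    <T> T convert(Callable<T> job) throws Exception {
        Future<T> future = pool.submit(job);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupt the worker so a stuck job does not occupy it forever
            future.cancel(true);
            throw e;
        }
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }
}
```

Interrupting the thread only helps if the underlying conversion call responds to interruption; a hung external process may still need to be restarted separately.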
