I am looking for a Java library that can convert a Word document into PowerPoint format.
I have looked into some libraries such as documents4j which converts many of the formats but not Word doc into PowerPoint.
Take a look at Apache POI: https://poi.apache.org.
From their mission statement:
The Apache POI Project's mission is to create and maintain Java APIs for manipulating various file formats based upon the Office Open XML standards (OOXML) and Microsoft's OLE 2 Compound Document format (OLE2). In short, you can read and write MS Excel files using Java. In addition, you can read and write MS Word and MS PowerPoint files using Java. Apache POI is your Java Excel solution (for Excel 97-2008). We have a complete API for porting other OOXML and OLE2 formats and welcome others to participate.
OLE2 files include most Microsoft Office files such as XLS, DOC, and PPT as well as MFC serialization API based file formats. The project provides APIs for the OLE2 Filesystem (POIFS) and OLE2 Document Properties (HPSF).
I have used this library extensively with Word and Excel, and it works really well.
Related
I'm learning about data-driven testing using Selenium and Excel. I'm taking an online course that has asked us to add the Apache poi and poi-ooxml dependencies in Maven.
I'm struggling to understand what the differences between the two are. Are both required in order to retrieve data in Excel and pass these to our tests?
Thanks
Excel files have a long history.
Excel 97-2003 workbook:
This is a legacy Excel file that follows a binary file format. The file extension of the format is .xls.
In Apache POI terms, the Excel 97-2003 format is handled by HSSF (Horrible Spreadsheet Format). The file format is complex and has a number of tricky characteristics; the code that handles these files lives in the core poi jar.
Excel 2007+ workbook:
This is the default XML-based file format for Excel 2007 and later versions. It follows the Office Open XML (OOXML) format, which is a zipped, XML-based file format developed by Microsoft for representing office documents. The file extension of the format is .xlsx. ( DOCX,PPTX are other OOXML based examples).
In Apache POI terms, the Excel 2007+ format is handled by XSSF (XML Spreadsheet Format). It is the successor to the format handled by HSSF and has additional features; the code that handles these files lives in the poi-ooxml jar.
More reading
Although .xls is almost dead, some applications still use it, so both dependencies are needed for backward compatibility.
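To make the split between the two formats concrete, here is a minimal sketch that tells the two container types apart by their first bytes and names the POI artifact that handles each. It uses only the JDK; the class and method names (`OfficeContainerSniffer`, `sniff`, `poiArtifactFor`) are made up for this illustration and are not part of POI. Note that the signatures only identify the container (OLE2 or ZIP), not that the file is really a spreadsheet.

```java
import java.io.IOException;
import java.io.InputStream;

/** Guesses the container format of an Office file from its first bytes (illustrative helper, not a POI class). */
class OfficeContainerSniffer {

    enum Container { OLE2, ZIP, UNKNOWN }

    // Signature at the start of every OLE2 compound document (.xls, .doc, .ppt).
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // Local file header signature at the start of a ZIP archive (.xlsx, .docx, .pptx).
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Classifies a buffer holding the leading bytes of a file. */
    static Container sniff(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) return Container.OLE2;
        if (startsWith(head, ZIP_MAGIC)) return Container.ZIP;
        return Container.UNKNOWN;
    }

    /** Convenience overload: reads up to 8 bytes from the stream and classifies them. */
    static Container sniff(InputStream in) throws IOException {
        return sniff(in.readNBytes(OLE2_MAGIC.length)); // readNBytes(int) needs Java 11+
    }

    /** Maven artifactId holding the POI code for the given container. */
    static String poiArtifactFor(Container c) {
        switch (c) {
            case OLE2: return "poi";        // HSSF is in the core jar
            case ZIP:  return "poi-ooxml";  // XSSF is in the OOXML jar
            default:   return "none";
        }
    }

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) return false; // compare as unsigned bytes
        }
        return true;
    }
}
```

Calling `OfficeContainerSniffer.sniff(Files.newInputStream(path))` on a test data file would then tell you which of the two jars is needed to open it.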
Here is what Apache has to say:
Component: HSSF. Application type: Excel XLS. Maven artifactId: poi. Notes: for HSSF only; if common SS is needed, see below.
Component: Common SS. Application type: Excel XLS and XLSX. Maven artifactId: poi-ooxml. Notes: WorkbookFactory and friends all require poi-ooxml, not just core poi.
You can read more on their official website: http://poi.apache.org/components/index.html#components
What ways are there to convert an RTF to PDF that contains a table in the document in Windows or Unix using Java?
The options we have tried so far are:
iText - The table inside the RTF document does not come through properly when converted to PDF. In short, the PDF doesn't contain the table. Here is the code gist: iText for RTF to PDF Java code gist
POI - Does Apache POI support RTF document parsing? I found that it is not supported. POI support for RTF
Tika - Using Tika I am able to read the document, but the table in the RTF is not parsed correctly and I don't know how to convert it to PDF. Tika Java code for reading RTF
We have looked into other options as well. Is it possible to convert RTF to PDF with Java?
Other options we looked into are in this link
Yes, it's possible. Take a look at JasperReports!
http://community.jaspersoft.com/project/jasperreports-library
There is also a good API available from Jaspersoft to code your custom PDF-engine and your custom datasource. Start with iReport (UI-Editor).
Is there any Java library that can be used to convert Microsoft Word files (doc/docx) to OpenDocument Text format (.odt)? A free library would be preferable.
I don't know of any libraries that do it directly, but it should be relatively easy to extract the bits you're interested in from a .docx using POI:
http://poi.apache.org/
and then write them to an ODT format using ODFDOM:
http://incubator.apache.org/odftoolkit/odfdom/index.html
This should be relatively straightforward for simple documents, but if your use case calls for complex documents containing pictures etc., this might become a LOT harder.
Anyway, hope this helps at least some ;)
I believe everything you need is in this post: http://angelozerr.wordpress.com/2012/12/06/how-to-convert-docxodt-to-pdfhtml-with-java/
For instance:
JODConverter : JODConverter automates conversions between office
document formats using OpenOffice.org or LibreOffice. Supported
formats include OpenDocument, PDF, RTF, HTML, Word, Excel, PowerPoint,
and Flash. It can be used as a Java library, a command line tool, or a
web application.
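For a rough idea of what such a conversion boils down to, here is a minimal sketch that calls LibreOffice's own headless command line from Java. This is not JODConverter's API; it only uses `ProcessBuilder` and the LibreOffice `--headless`, `--convert-to` and `--outdir` options. The class name `HeadlessOfficeConvert` is made up, and the sketch assumes a LibreOffice binary (for example `soffice`) is installed and reachable; its name and location depend on your installation.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

/** Illustrative wrapper around the LibreOffice command line (not part of JODConverter). */
class HeadlessOfficeConvert {

    /** Builds the argument list for a headless LibreOffice conversion. */
    static List<String> command(String officeBinary, Path input, String targetFormat, Path outDir) {
        return List.of(
                officeBinary,                  // assumed to be on the PATH, e.g. "soffice"
                "--headless",                  // run without a UI
                "--convert-to", targetFormat,  // target extension, e.g. "odt" or "pdf"
                "--outdir", outDir.toString(), // directory that receives the converted file
                input.toString());
    }

    /** Runs the conversion and returns the process exit code. */
    static int convert(String officeBinary, Path input, String targetFormat, Path outDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(officeBinary, input, targetFormat, outDir))
                .inheritIO() // forward LibreOffice's console output to this process
                .start();
        return process.waitFor();
    }
}
```

For example, `HeadlessOfficeConvert.convert("soffice", Path.of("report.docx"), "odt", Path.of("out"))` would ask LibreOffice to write the converted file into the `out` directory. Starting a new process for every document is slow, which is one reason a library like JODConverter is worth a look for anything beyond occasional conversions.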
I have a Java web application that generates an MS Word document in the WordML format (a single XML file in Word 2003 XML format with a .xml file extension). I would like to automatically convert this into the newer Office Open XML format so that the document could be saved as a .docx file (which in essence is a zip file containing multiple XML files).
This has to be fully automated, and cannot require the user to download the file and convert it manually. Furthermore, the user cannot be assumed to have MS Word installed (they could be using LibreOffice instead).
I have been looking for a Java library I could use to do this, but couldn't find any that converts .xml to .docx. The only converter I could find was JODconverter but it doesn't support conversion from .xml to .docx.
Is there a Java library that could do this sort of conversion? Or should I be looking for a non-Java solution? Maybe a Python module could do this? (For example, a Python script could take the files generated by the Java app and convert them to .docx.)
If you can't modify your app to emit Flat OPC XML, you could write an XSLT to convert from Word 2003 XML format to Flat OPC XML. They are quite similar.
Then, docx4j (disclosure: I maintain this) supports Flat OPC XML to docx.
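The XSLT step can be run with the JDK alone. Below is a minimal sketch of applying a stylesheet with `javax.xml.transform`; the class name `XsltRunner` is made up for illustration, and the stylesheet that actually maps Word 2003 XML onto Flat OPC XML is not shown here and would have to be written by hand. The docx4j step that turns the resulting Flat OPC XML into a .docx is also left out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Illustrative helper that applies an XSLT stylesheet to an XML document using only the JDK. */
class XsltRunner {

    /**
     * Applies the given stylesheet to the input XML and returns the result as a string.
     * Both arguments are complete XML documents held in memory.
     */
    static String transform(String stylesheet, String inputXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            // Leave out the <?xml ...?> declaration so the result is easy to embed or compare.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(inputXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Covers both stylesheet compilation errors and failures during the transformation.
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```

For real documents you would probably swap the in-memory strings for file-based `StreamSource` and `StreamResult` objects so the whole document does not have to sit in a `String`.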
How do I parse a PDF file and write its content to a Word file using Java?
For parsing a PDF file in Java, you can use Apache PDFBox: http://incubator.apache.org/pdfbox/
For reading/writing Word (or other Office) file formats in Java, try POI: http://poi.apache.org/
Both are free.
Try the iText Java library:
iText is an ideal library for developers looking to enhance web- and other applications with dynamic PDF document generation and/or manipulation.
It can be used for your parsing step.
As for generating Word documents, the OpenOffice Java API might be able to generate Word-compatible docs (I have no personal experience with this API).
You might want to try any of these:
http://incubator.apache.org/pdfbox/
https://pdf-renderer.dev.java.net/
Once you have read the contents of the PDF file, you can store them in an ODT file or a text file. For an ODT file, try http://odftoolkit.openoffice.org.
Best!
You could use iText if the source PDF is mostly text. Images and such are quite hard to handle while parsing. If it's text only, it's as easy as 10 lines of code. See the iText manual for examples.
For writing Word files there's only Apache POI. It can be a little tricky to figure out, but for such a simple task it shouldn't be a problem.