To convert docs into HTML - java

We are working on a Java project and need an HTML converter for ordinary office documents. We then need to present the resulting HTML pages in a viewer.
I found many solutions, but some handle only doc and some only docx. I need a single solution that converts doc, docx and other document formats into HTML.

Have a look at Apache POI; it includes a number of converter classes.
Link to the POI

You can do this through the LibreOffice command line:
"C:/Program Files/LibreOffice 4/program/soffice.exe" --headless --convert-to html --outdir converted/ "uploads/filename.doc"
Here C:/Program Files/LibreOffice 4/program/soffice.exe is the path of the executable file.
You can download LibreOffice from this link:
http://www.libreoffice.org/download/libreoffice-fresh/
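Since the question is about a Java project, here is a minimal sketch of launching that same command from Java with ProcessBuilder. The class name, the helper names and the 120-second timeout are my own assumptions, not part of LibreOffice; only the command-line flags come from the command above. Passing each argument as a separate list element avoids any shell quoting problems with the space in "Program Files".

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class; the name and structure are illustrative only.
class SofficeHtmlConverter {

    // Builds the argument list for soffice. Each argument is its own element,
    // so paths containing spaces need no extra quoting.
    static List<String> buildCommand(String sofficePath, String outDir, String inputFile) {
        return Arrays.asList(
                sofficePath, "--headless", "--convert-to", "html",
                "--outdir", outDir, inputFile);
    }

    // Runs the conversion and waits up to timeoutSeconds for it to finish.
    // Returns true only if the process exited with code 0 in time.
    static boolean convert(String sofficePath, String outDir, String inputFile, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(sofficePath, outDir, inputFile))
                .inheritIO() // forward soffice output to this process's console
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly(); // do not leave a hung soffice behind
            return false;
        }
        return process.exitValue() == 0;
    }

    public static void main(String[] args) throws Exception {
        // Paths are the ones from the command above; adjust them to your installation.
        boolean ok = convert(
                "C:/Program Files/LibreOffice 4/program/soffice.exe",
                "converted/", "uploads/filename.doc", 120);
        System.out.println(ok ? "Converted" : "Conversion failed or timed out");
    }
}
```

The timeout is there because a headless conversion can hang without printing anything, so the caller should not wait forever.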

Related

LibreOffice (4.4.3) headless PDF conversion issue for some MS Word documents

I am able to convert most Word documents (doc and docx) to PDF on Windows.
"soffice.exe" --headless --convert-to pdf --outdir "C:\Ok" "C:\Ok\Test_Original.doc"
But a few documents are not getting converted and I see the following intermediate file generated
.~lock.Test_Original.pdf#
There are no errors or warnings, and I am not sure how to get around this issue. Any suggestions, or alternatives to LibreOffice? I intend to convert docs to PDF on the server.

Convert RTF to PDF using Java, preserving the tables in the RTF document

What ways are there to convert an RTF document that contains a table to PDF, on Windows or Unix, using Java?
The options we have tried are:
iText: the table inside the RTF document does not come through properly when converted to PDF. In short, the PDF doesn't contain the table. Here is the code gist. iText RTF to PDF Java code gist
POI: I checked whether Apache POI supports RTF document parsing and found that it does not. POI support for RTF
Tika: using Tika I am able to read the document, but the table in the RTF is not parsed correctly and I don't know how to convert it to PDF. Tika Java code for reading RTF
We have looked into other options as well. Is it possible to convert RTF to PDF with Java?
Other options we looked into are in this link
Yes, it's possible. Take a look at JasperReports!
http://community.jaspersoft.com/project/jasperreports-library
There is also a good API available from Jaspersoft to code your custom PDF-engine and your custom datasource. Start with iReport (UI-Editor).

Open Microsoft Word docx file with Java

How can I open a Microsoft Word docx file in Java? Furthermore, how can I open it if it is password protected?
For instance,
File f = new File("hello.docx");
Please try to avoid responding with things such as "you shouldn't do this." I have a good reason for this, so please stick to the question when you answer. Thanks a lot!
There is the Apache POI project for working with MS Office files. A DOCX file is just a zip file with a series of XML files inside, so you can unzip the file and work with the XML. The XML spec (Open XML) is published.
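To illustrate the "DOCX is just a zip file" point, here is a small sketch that lists the parts inside a docx using only java.util.zip from the JDK. The class and method names are made up for this example; hello.docx is the file name from the question. In a normal Word document the main body is usually found in the word/document.xml entry.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class; the name is illustrative only.
class DocxEntryLister {

    // Reads the stream as a zip archive and returns the entry names in archive order.
    static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Prints every part of the package, one name per line.
        try (InputStream in = Files.newInputStream(Paths.get("hello.docx"))) {
            for (String name : listEntries(in)) {
                System.out.println(name);
            }
        }
    }
}
```

This only shows the container structure. For actually interpreting the XML you will still want POI or docx4j rather than parsing it by hand.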
I haven't personally used it, but it looks like Apache POI will work for you: http://poi.apache.org/
You can use docx4j too. http://www.docx4java.org/trac/docx4j
I have used both the docx4j and Apache POI libraries. If you are working with .docx, I would recommend docx4j; it automated a lot of the process of creating a .docx.
There is a great example here: http://java.dzone.com/articles/create-complex-word-docx
on how to create a .docx using the docx4j package.
If the docx is password protected, it won't be a zip file. It will be a compound file. See Overview of Protected Office Open XML Documents
To read a compound file in Java, use POIFS. POIFS is part of POI (docx4j uses it as well, so if you download the docx4j distribution, you'll be able to use the POIFS API)
Once you have decrypted the encrypted package, you can read it using docx4j or POI.
Edit: OK, now docx4j can handle password-protected docx automatically.
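If you want to check which case you are in before choosing a code path, you can look at the first bytes of the file. A zip package starts with the two bytes "PK", while a compound file starts with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1. The sketch below uses only the JDK; the class name, method name and the returned labels are my own choices for illustration. Keep in mind that a legacy .doc is also a compound file, so this result only suggests password protection when the file is supposed to be a docx.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper class; the name and labels are illustrative only.
class OfficeContainerSniffer {

    // Signature at the start of every OLE2 compound file.
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    // Classifies the leading bytes of a file as "OLE2", "ZIP" or "UNKNOWN".
    static String detect(byte[] header) {
        if (header.length >= OLE2_MAGIC.length) {
            boolean ole2 = true;
            for (int i = 0; i < OLE2_MAGIC.length; i++) {
                if ((header[i] & 0xFF) != OLE2_MAGIC[i]) {
                    ole2 = false;
                    break;
                }
            }
            if (ole2) {
                return "OLE2";
            }
        }
        if (header.length >= 2 && header[0] == 'P' && header[1] == 'K') {
            return "ZIP";
        }
        return "UNKNOWN";
    }

    public static void main(String[] args) throws IOException {
        byte[] header = new byte[8];
        int total = 0;
        // Read up to 8 bytes; a single read() call may return fewer than requested.
        try (InputStream in = Files.newInputStream(Paths.get("hello.docx"))) {
            while (total < header.length) {
                int n = in.read(header, total, header.length - total);
                if (n < 0) {
                    break;
                }
                total += n;
            }
        }
        System.out.println(detect(Arrays.copyOf(header, total)));
    }
}
```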
Have you tried opening it with the OpenOffice API? It can work with a lot of document types.
I have used it with MS Excel files in the old .xls format.
Hope this helps.

How to convert from pptx to html

I have an OOXML file that contains the notes section from a pptx slide (extracted using POI).
Is there a way (framework, software) to convert it to HTML and still keep the original formatting (bold, italic...) of the notes?
Edit:
Never mind, I developed my own pptx-notes-to-HTML parser.
I have used the JODConverter library to convert OpenOffice ppt to HTML before.
http://code.google.com/p/jodconverter/wiki/GettingStarted
There is a library called docx4java that can extract MS XML formatted files (docx, xlsx, pptx). Have a look at it, and at the sample SvgExporter at http://www.docx4java.org/svn/docx4j/trunk/docx4j/src/pptx4j/java/org/pptx4j/convert/out/svginhtml/.
I used this library and it worked well in extracting the DOCX format as HTML or PDF.

APIs for converting a file into PDF

I am looking for APIs and documentation that would let me convert any file into PDF.
The file may be Doc, Excel, ppt, etc.
For example, I have a Doc file and I just want to convert it into PDF using Java.
Any suggestion will be helpful.
I would recommend taking a look at Flying Saucer (formerly xhtmlrenderer), which makes creating PDF files from XML and HTML files extremely easy (internally it uses iText).
HTML/XML can be used as an intermediate format, which makes this quite a flexible solution.
Use
http://pdfbox.apache.org/
and
http://poi.apache.org/
If you want to generate a PDF from an XML document, you can try Apache FOP, which follows the XSL-FO standard.
http://xmlgraphics.apache.org/fop/
So a smart process could be: extract data from your various document formats using POI, odftoolkit (for OpenDocument) or other tools, inject it into an XML container, and then translate that into PDF using FOP.
Apache's POI API is the best for converting any file to PDF.
You can use iText. It is well documented and comes with a ton of examples.
