How to convert an .HTML file to a .DOC file? - java

I am working on a JSF application. In this application, some of my code creates an HTML file that is stored on my server, and now I want to convert that HTML file into a .DOC file. Please help.

There are tools that convert HTML to PDF (see "Convert an HTML file to a PDF with their pictures and styles using Java").
Then use a PDF-to-DOC converter such as: http://pdf-to-doc.software.informer.com/

You could strip it of scripts, comments, etc., then load it into Word and save it as .doc.
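The stripping step could look roughly like this. It is a minimal sketch that assumes the generated HTML is UTF-8 and uses regular expressions rather than a real HTML parser, so it only handles ordinary `<script>` blocks and `<!-- -->` comments; `HtmlCleaner`, `strip`, and `stripFile` are hypothetical names. Opening the cleaned file in Word and saving it as .doc remains the manual step described above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

final class HtmlCleaner {
    // <script ...>...</script> blocks, case-insensitive, spanning multiple lines.
    private static final Pattern SCRIPT = Pattern.compile(
            "<script\\b[^>]*>.*?</script\\s*>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // HTML comments <!-- ... -->, spanning multiple lines.
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    // Removes script blocks first, then comments, and returns the remaining markup.
    static String strip(String html) {
        String withoutScripts = SCRIPT.matcher(html).replaceAll("");
        return COMMENT.matcher(withoutScripts).replaceAll("");
    }

    // Reads the generated HTML file (assumed UTF-8), strips it, and writes a cleaned copy.
    static void stripFile(Path in, Path out) throws IOException {
        String html = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
        Files.write(out, strip(html).getBytes(StandardCharsets.UTF_8));
    }
}
```

For messy real-world markup, a proper HTML parser is more robust than these two patterns.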

One solution: I downloaded the jar file that provides
officetools.OfficeFile
and then wrote some code using it; examples are easy to find on the net.

You can convert HTML to XML (using XSLT) and load it into Word afterwards. I did this when converting an HTML page with tables to XML using an attached XSLT file, and it worked just fine.
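The XSLT step can be run with the JDK's built-in `javax.xml.transform` API. This is a sketch under two assumptions: the input is well-formed XHTML (ordinary HTML with unclosed tags will fail to parse), and it has no default namespace, so `//tr` matches. The stylesheet `TABLE_XSLT` is a hypothetical example that turns table rows into `<row>`/`<cell>` elements, not the one used in the answer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltConverter {
    // Hypothetical example stylesheet: turns every table row into <row>/<cell> elements.
    static final String TABLE_XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/\"><rows><xsl:apply-templates select=\"//tr\"/></rows></xsl:template>"
      + "<xsl:template match=\"tr\"><row><xsl:for-each select=\"td\">"
      + "<cell><xsl:value-of select=\".\"/></cell></xsl:for-each></row></xsl:template>"
      + "</xsl:stylesheet>";

    // Applies an XSLT stylesheet to well-formed (X)HTML and returns the resulting XML text.
    static String transform(String xhtml, String xslt) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
        return out.toString();
    }
}
```

The resulting XML string can be written to a file and opened in Word as the answer describes.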

Related

To convert docs into HTML

We are working on a Java project, and the requirement is an HTML converter for ordinary office documents that converts them to HTML. We then need to present those HTML pages in a viewer.
I found many solutions, but some convert only .doc and some only .docx. I need one single solution that converts .doc, .docx, and other document formats into HTML.
Have a look at Apache POI; it includes a number of converter classes.
Link: Apache POI
You can do it through the LibreOffice command line:
"C:/Program Files/LibreOffice 4/program/soffice.exe" --headless --convert-to html --outdir converted/ uploads/filename.doc
Here, C:/Program Files/LibreOffice 4/program/soffice.exe is the path of the executable file.
You can download LibreOffice from this link:
http://www.libreoffice.org/download/libreoffice-fresh/
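From Java, the same LibreOffice command can be launched with `ProcessBuilder`. This is a sketch, not a tested integration: it assumes LibreOffice is installed at the path you pass in, and `SofficeConverter`, `buildCommand`, and `convert` are hypothetical names. Passing each argument as a separate list element avoids the quoting problems caused by the space in "Program Files".

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class SofficeConverter {
    // Builds the argument list for a headless LibreOffice conversion, one element per argument.
    static List<String> buildCommand(String sofficePath, String format, String outDir, String inputFile) {
        return List.of(sofficePath, "--headless", "--convert-to", format, "--outdir", outDir, inputFile);
    }

    // Runs the conversion and returns soffice's exit code; kills the process if it hangs.
    static int convert(String sofficePath, String format, String outDir, String inputFile)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(sofficePath, format, outDir, inputFile));
        pb.inheritIO(); // show soffice's own output in this process's console
        Process process = pb.start();
        if (!process.waitFor(2, TimeUnit.MINUTES)) { // timeout value is an arbitrary assumption
            process.destroyForcibly();
            throw new IOException("soffice did not finish in time");
        }
        return process.exitValue();
    }
}
```

For example, `SofficeConverter.convert("C:/Program Files/LibreOffice 4/program/soffice.exe", "html", "converted/", "uploads/filename.doc")` mirrors the command line shown above.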

HTML + Images + CSS to PDF using Google Drive API

I know it's possible to convert an HTML file to PDF using Google Drive (HTML2PDF using Google Drive API), but I'd like to know whether this is possible when the HTML has images and CSS files, and if so, how to do it.
You need to convert the HTML to a Docs file and then export it as PDF. During the Docs conversion, most non-trivial styles are trimmed; basic coloring, sizing, and positioning are all you'll get. The exported PDF is the PDF version of the Docs file. Images will be preserved, though.
You can experiment by uploading your HTML files to Google Drive at drive.google.com with conversion turned on and checking the results.
For images, you could try this: Embedding Base64 Images
It worked for me when uploading through the web interface, and it should work with my solution: https://stackoverflow.com/a/21711109/592042
CSS can be written directly into the HTML file.
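Both suggestions (Base64 images and CSS inside the HTML file) amount to producing one self-contained HTML file before uploading it. The sketch below shows that with the JDK's `java.util.Base64`; the class and method names are hypothetical, and whether Drive keeps a given style still depends on the trimming described above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

final class SelfContainedHtml {
    // Turns raw image bytes into a data URI usable as <img src="...">.
    static String dataUri(byte[] imageBytes, String mimeType) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(imageBytes);
    }

    // Same as above, but reads the image from a file first.
    static String dataUri(Path imageFile, String mimeType) throws IOException {
        return dataUri(Files.readAllBytes(imageFile), mimeType);
    }

    // Builds a page whose CSS sits in a <style> element and whose image is embedded,
    // so the uploaded HTML does not reference any external files.
    static String page(String css, String imageDataUri) {
        return "<!DOCTYPE html><html><head><style>" + css + "</style></head>"
             + "<body><img src=\"" + imageDataUri + "\" alt=\"\"></body></html>";
    }
}
```

For example, `SelfContainedHtml.dataUri(new byte[] {1, 2, 3}, "image/png")` yields `data:image/png;base64,AQID`.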

How to extract data from a pdf file using JPedal?

I am trying to extract data from a PDF file, but I couldn't find any examples on the internet. Is it possible to use the JPedal library to open a PDF file and read its data?
You can use Apache PDFBox.
I am not familiar with JPedal, but I write a lot of code that generates and processes PDF files. I use iText and highly recommend it. If you have a specific question about processing a PDF file, let me know.

How to convert from pptx to html

I have an OOXML file that contains the notes section of a pptx slide (extracted using POI).
Is there a way (a framework or software) to convert it to HTML while keeping the original formatting (bold, italic, ...) of the notes?
Edit:
Never mind, I developed my own pptx-notes-to-HTML parser.
I have used the JODConverter library to convert ppt files to HTML via OpenOffice before.
http://code.google.com/p/jodconverter/wiki/GettingStarted
There is a library called docx4j that can process Microsoft's XML-based formats (docx, xlsx, pptx). Check it out, and look at the sample SvgExporter at http://www.docx4java.org/svn/docx4j/trunk/docx4j/src/pptx4j/java/org/pptx4j/convert/out/svginhtml/.
I used this library, and it worked well for exporting DOCX files as HTML or PDF.

How to add metadata to html file

I am using Apache Tika to convert RTF documents to HTML.
In Tika's RTFParser class, I made changes so that it generates the HTML file using HTMLEditorKit, and I'm now able to generate the HTML file.
I want to add metadata tags to the head tag of the generated HTML file.
Can anybody give me an idea of how to proceed?
Check this out:
Add Metadata
I'm not sure this will help, but I think it is worth checking out.
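One way to proceed, offered as an assumption rather than a Tika feature, is to post-process the HTML string after HTMLEditorKit has written it and insert `<meta>` tags right after the opening `<head>` tag. In this sketch the `Map` stands in for whatever metadata you collected (for example, copied from Tika's metadata object), and `HtmlMetadata` and `addMeta` are hypothetical names.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HtmlMetadata {
    // Opening <head> tag with optional attributes; does not match <header>.
    private static final Pattern HEAD_OPEN =
            Pattern.compile("<head(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);

    // Escapes characters that are unsafe inside a double-quoted attribute value.
    static String escapeAttr(String s) {
        return s.replace("&", "&amp;").replace("\"", "&quot;")
                .replace("<", "&lt;").replace(">", "&gt;");
    }

    // Inserts one <meta name="..." content="..."> tag per map entry right after <head>.
    static String addMeta(String html, Map<String, String> metadata) {
        Matcher m = HEAD_OPEN.matcher(html);
        if (!m.find()) {
            throw new IllegalArgumentException("no <head> element found");
        }
        StringBuilder tags = new StringBuilder();
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            tags.append("<meta name=\"").append(escapeAttr(e.getKey()))
                .append("\" content=\"").append(escapeAttr(e.getValue())).append("\">");
        }
        return html.substring(0, m.end()) + tags + html.substring(m.end());
    }
}
```

Use a `LinkedHashMap` if the order of the generated tags matters.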
