Convert docx to doc with Java

I have legacy software that produces an XML file and then, with the help of docx4j, a docx document. I must also create a Microsoft .doc document from the XML file with Java.
How can I do that? I'd really appreciate any help.
Thanks

Look into Apache POI. It's pretty much the de facto standard for modifying Microsoft documents with Java.

docx4j has POI as a dependency, and POI has reasonable support for the legacy binary doc format (hwpf). So you could use that to convert to doc without introducing additional dependencies. Basically, iterate through your content, and emit each paragraph/table/image in doc format. That would be the reverse of convert/in/Doc.java.
However, the devil is in the detail, and it would be a lot of work if your documents contain a variety of features. This assertion stands whether you were doing docx4j to binary doc (hwpf), or POI's own xwpf to hwpf, since POI doesn't have a common interface across the two of them.
So instead of using POI for this, I'd use JODConverter to drive LibreOffice (or OpenOffice, their docx features are a bit different) to convert docx to legacy binary .doc.
The JODConverter approach is definitely the path of least resistance, and will generally give good results. The downside with it is that if you find something which isn't supported properly, you'll have to wait for the LO/OO guys to fix it, which wouldn't be the case if you did decide to build binary doc output for docx4j using POI. If you did build this, we'd happily accept it as a contribution :-)
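To make the LibreOffice route concrete, here is a minimal sketch that skips JODConverter and calls LibreOffice's own command-line converter directly from Java. It assumes the `soffice` executable is on the PATH; the class name and the file paths are made up for illustration. JODConverter manages the office process for you, so treat this only as the simplest possible variant of the same idea.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

/** Sketch: convert a .docx to legacy .doc by running LibreOffice headless. */
final class DocxToDocCli {

    /** Builds the command line; "soffice" is assumed to be on the PATH. */
    static List<String> buildCommand(Path docx, Path outDir) {
        return List.of(
                "soffice",
                "--headless",                    // run without a GUI
                "--convert-to", "doc",           // target format, given as a file extension
                "--outdir", outDir.toString(),   // directory that receives the .doc file
                docx.toString());
    }

    /** Runs the conversion and returns the exit code of the soffice process. */
    static int convert(Path docx, Path outDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(docx, outDir))
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Hypothetical paths; this only prints the command instead of running it.
        List<String> command = buildCommand(Path.of("report.docx"), Path.of("out"));
        System.out.println(String.join(" ", command));
    }
}
```

Calling `convert(...)` instead of printing the command performs the actual conversion, provided LibreOffice is installed on the machine.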

Related

Is there a way (in java) to generate human editable Microsoft Word documents from human readable template?

I am searching for a way for my Java application to generate Word documents using some kind of template (the data for the document will be provided by the application).
Here are the requirements:
- The template should be editable by a non-developer. Creating a Jasper template with the appropriate tool, or editing a Word document that contains some kind of templating language, is acceptable. Asking them to edit the document's XML file is not.
- The results should be easily editable by a human being using Microsoft Word. For example, the documents generated by Jasper or Birt are not acceptable, as the table layout prevents easy editing.
So far I have looked at the following solutions, but none of them meets both requirements:
Jasper. The generated documents are not easily edited.
Birt. Same problem.
Generating the XML using a template engine (Velocity, Freemarker). I cannot ask the final client to edit this kind of XML file...
You can check out Templater. It has a pretty good demo page.
Disclaimer: I'm the author.
LibreOffice
LibreOffice is an open-source implementation of an app suite similar to Microsoft Office. Besides supporting the standardized OpenDocument format, it also reads and writes Microsoft Office formats.
LibreOffice offers a Java API. So you may be able to programmatically create documents from a template.
In the past we’ve done something similar, modifying a document with search-and-replace and document-variables.
Apache POI
Apache POI is an open-source library for reading and writing Microsoft Office compatible documents.
I don't know its details but you might take a look.
JODReports (open source) and Docmosis (commercial) are designed to use normal/human-managed documents as templates (Word, OpenOffice, etc), merge in your data and return editable documents, PDFs etc. Please note I work for Docmosis.
Both JODReports and Docmosis provide a Java API.
If you are interested in automating Open Office or Libre Office directly (as mentioned in Basil's answer), this blog about converting Doc to Pdf will give you a quick-start to:
load a doc file as a template
search and replace
export to file (pdf in the example)
To change the output format to Doc instead of PDF:
propertyValues[1].Value = "writer_pdf_Export";
to
propertyValues[1].Value = "MS Word 97";
I hope that helps.
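If the output format varies, the filter name from the snippet above can be picked from the target file name. The following is only a sketch: the class and method names are invented, and the filter names are the ones I know from OpenOffice/LibreOffice Writer, so check them against your installed version.

```java
import java.util.Locale;
import java.util.Map;

/** Sketch: choose a Writer export filter name from the target file extension. */
final class WriterExportFilters {

    // Filter names as used by OpenOffice/LibreOffice Writer (verify for your version).
    private static final Map<String, String> FILTER_BY_EXTENSION = Map.of(
            "pdf", "writer_pdf_Export",
            "doc", "MS Word 97",
            "docx", "MS Word 2007 XML",
            "odt", "writer8",
            "rtf", "Rich Text Format");

    /** Returns the filter name for a file name such as "letter.doc". */
    static String filterFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String extension = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String filter = FILTER_BY_EXTENSION.get(extension);
        if (filter == null) {
            throw new IllegalArgumentException("No export filter known for: " + fileName);
        }
        return filter;
    }

    public static void main(String[] args) {
        // Hypothetical file name, for illustration only.
        System.out.println(filterFor("letter.doc"));
    }
}
```

The returned string is what would be assigned to `propertyValues[1].Value` in the example above.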
Was searching for this kind of solution as well, and I found XDocReport, including an example of a table. I will give it a try.

Creating Docx, PDF, XSL-FO

[Background Info]
We had a solution in place that used server-side Word automation to convert HTM documents into Docx, PDF or print documents. This solution broke on the latest version of Windows Server 2012. We learned that MS does not intend for Word to work in this manner, and after troubleshooting with MS support engineers we have come to the conclusion that it will never work.
[Currently]
I am currently researching potential technologies and tools that my company can use to regain this functionality. We need to be able to create Docx, PDF and print files to a local printer.
I have looked into a number of tools already and I am currently leaning towards Apache FOP, which seems to handle PDF and printing for us.
However, I'm looking for some advice and suggested tools that we could use to implement a pure Java approach. Currently our application creates HTM files with all the required information. So ideally we would like to take these HTM files and "convert" them into Docx/XSL-FO format.
[Question]
So here is my question, which I'm hoping you will be able to help me with.
What are the best tools that I can use to get from
HTM to Docx
HTM to PDF
Or what would be the best process for achieving this? Has anyone had success finding a solution for this in the past?
Thank You
It depends on the level of control and the complexity of the source HTML. There are HTML to FO stylesheets but you might find them wanting for your specific need.
So you could use the Jericho parser to read the HTML and generate FO. Or you could generate the target format directly using Apache PDFBox and Apache POI.
It all boils down to the level of control you want/need.
docx4j-ImportXHTML will get you from XHTML to docx. From there, you can use docx4j (or some other solution eg LibreOffice/OpenOffice) to do docx to PDF.
docx4j supports docx to XSL FO, and by default uses FOP.

Convert .doc/.docx documents to .odt (Open document text) and vice versa using java.

Is there any Java library which can be used to convert Microsoft Word files (doc/docx) to Open Document Text format (.odt)? A free library would be preferable.
I don't know of any libraries that do it directly, but it should be relatively easy to extract the bits you're interested in from a .docx using POI:
http://poi.apache.org/
and then write them to an ODT format using ODFDOM:
http://incubator.apache.org/odftoolkit/odfdom/index.html
This should be relatively straightforward for simple documents, but if your use case calls for complex documents containing pictures etc., this might become a LOT harder.
Anyway, hope this helps at least some ;)
I believe everything you need is in this post: http://angelozerr.wordpress.com/2012/12/06/how-to-convert-docxodt-to-pdfhtml-with-java/
For instance:
JODConverter : JODConverter automates conversions between office
document formats using OpenOffice.org or LibreOffice. Supported
formats include OpenDocument, PDF, RTF, HTML, Word, Excel, PowerPoint,
and Flash. It can be used as a Java library, a command line tool, or a
web application.

APIs for converting a file into PDF

I'm looking for APIs and documentation that would let me convert any file into PDF.
The file may be a Word, Excel or PowerPoint document, etc.
For example, I have a .doc file and I just want to convert it into PDF using Java.
Any suggestions would be helpful.
I would recommend taking a look at Flying Saucer (formerly xhtmlrenderer), which makes creating PDF files from XML and HTML files extremely easy (internally it uses iText).
HTML/XML can be used as an intermediate format, making this quite a flexible solution.
Use
http://pdfbox.apache.org/
and
http://poi.apache.org/
If you want to generate a PDF from an XML document, you can try Apache FOP, which follows the XSL-FO standard.
http://xmlgraphics.apache.org/fop/
So a smart process could be: extract data from your various document formats using POI, odftoolkit (for OpenDocument) or other tools, inject them into an XML container, and then translate them into PDF using FOP.
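As a rough sketch of the "XML container" step, the class below wraps already extracted paragraphs in a minimal XSL-FO document using only the JDK. The class and method names are made up, the page size and margins are arbitrary, and the extraction itself (POI, odftoolkit) is not shown.

```java
import java.util.List;

/** Sketch: wrap extracted paragraphs in a minimal XSL-FO document for FOP. */
final class MinimalFoDocument {

    /** Escapes the characters that are special in XML text content. */
    static String escapeXml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    /** Builds a single A4 page sequence with one fo:block per paragraph. */
    static String toFo(List<String> paragraphs) {
        StringBuilder fo = new StringBuilder();
        fo.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        fo.append("<fo:root xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">\n");
        fo.append("  <fo:layout-master-set>\n");
        fo.append("    <fo:simple-page-master master-name=\"A4\"");
        fo.append(" page-height=\"29.7cm\" page-width=\"21cm\"");
        fo.append(" margin-top=\"2cm\" margin-bottom=\"2cm\"");
        fo.append(" margin-left=\"2cm\" margin-right=\"2cm\">\n");
        fo.append("      <fo:region-body/>\n");
        fo.append("    </fo:simple-page-master>\n");
        fo.append("  </fo:layout-master-set>\n");
        fo.append("  <fo:page-sequence master-reference=\"A4\">\n");
        fo.append("    <fo:flow flow-name=\"xsl-region-body\">\n");
        // Note: XSL-FO expects at least one block inside the flow.
        for (String paragraph : paragraphs) {
            fo.append("      <fo:block space-after=\"6pt\">")
              .append(escapeXml(paragraph))
              .append("</fo:block>\n");
        }
        fo.append("    </fo:flow>\n");
        fo.append("  </fo:page-sequence>\n");
        fo.append("</fo:root>\n");
        return fo.toString();
    }

    public static void main(String[] args) {
        // Hypothetical content standing in for text extracted from a document.
        System.out.print(toFo(List.of("First paragraph", "Second paragraph")));
    }
}
```

The resulting file can then be handed to FOP, for example with its command-line tool (`fop -fo input.fo -pdf output.pdf`) or through its Java API.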
Apache's POI API is best for converting any file to PDF.
You can use iText. It is well documented and comes with a ton of examples.

Can Java POI write image to word document?

Does anyone know if it is possible?
And is there any sample code for this?
Or is there any other Java API that can do this?
The Office 2007 format is based on XML and so can probably be written to using XML tools. However there is this library which claims to be able to write DocX format word documents.
The only other alternative is to use a Java-COM Bridge and use COM to manipulate word. This is probably not a good idea though - I would suggest finding a simpler way.
For example, Word can easily read RTF documents and you can generate .rtf documents from within Java. You don't have to use the Microsoft Word format!
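To illustrate the RTF suggestion, here is a minimal sketch that writes plain paragraphs as RTF with nothing but the JDK. The class and method names are invented, the font is an arbitrary choice, and it covers text only (no images or tables).

```java
import java.util.List;

/** Sketch: produce a very small RTF document that Word can open. */
final class SimpleRtfWriter {

    /** Escapes RTF control characters and encodes non-ASCII characters. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '\\' || c == '{' || c == '}') {
                out.append('\\').append(c);
            } else if (c > 127) {
                // \uN takes a signed 16-bit decimal; '?' is the fallback character.
                out.append("\\u").append((int) (short) c).append('?');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Wraps each paragraph in the minimal RTF header and \par markers. */
    static String toRtf(List<String> paragraphs) {
        StringBuilder rtf = new StringBuilder(
                "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}\\f0\\fs24 ");
        for (String paragraph : paragraphs) {
            rtf.append(escape(paragraph)).append("\\par ");
        }
        return rtf.append('}').toString();
    }

    public static void main(String[] args) {
        // Hypothetical content; redirect the output to a .rtf file to open it in Word.
        System.out.println(toRtf(List.of("Hello from Java", "Braces {like these} are escaped")));
    }
}
```

Because the output is pure ASCII, it can be written to a `.rtf` file with any encoding.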
As others have said POI isn't going to allow you to do anything really fancy - plus it doesn't support Office 2007+ formats. Treating MS Word as a component that provides this type of functionality via COM is most likely the best approach here (unless you are running on a non-Windows OS or just can't guarantee that Word will be installed on the machine).
If you do go the COM route, I recommend that you look into the JACOB project. You do need to be somewhat familiar with COM (which has a very steep learning curve), but the library works quite well and is easier than trying to do it in native code with a JNI wrapper.
If you are using docx, you could try docx4j.
See the AddImage sample
Surely:
Take a look at this: http://code.google.com/p/java2word
Word 2004+ is XML based. The above framework gets the image, converts it to a Base64 representation and adds it to the XML.
When you open your Word Document, there will be your image.
Simple like this:
IDocument myDoc = new Document2004();
myDoc.getBody().addEle("path/myImage.png");
Java2Word is an API for generating Word documents from Java code. J2W takes care of all the implementation and XML generation behind the scenes.
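The Base64 step described above can be sketched with the JDK alone. This is not Java2Word's actual code: the class and method names are invented, and the `w:binData` element with a `wordml://` name is my assumption of how an embedded picture is carried in Word 2003-style XML.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Sketch: embed image bytes as Base64 text inside a Word XML element. */
final class ImageToWordXml {

    /** Encodes the bytes and wraps them in a w:binData element (assumed layout). */
    static String toBinData(String imageName, byte[] imageBytes) {
        String base64 = Base64.getEncoder().encodeToString(imageBytes);
        return "<w:binData w:name=\"wordml://" + imageName + "\">"
                + base64
                + "</w:binData>";
    }

    public static void main(String[] args) {
        // Stand-in bytes; a real caller would read them from the image file,
        // for example with java.nio.file.Files.readAllBytes.
        byte[] fakeImage = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toBinData("image1.png", fakeImage));
    }
}
```

In a full document this element would sit inside a picture element that also references the same `wordml://` name.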
As far as can be gathered from the project website: no.
POI's HWPF can extract an MS Word document's text and perform simple modifications (basically deleting and inserting text).
AFAIK it can't do much more than that.
Also keep in mind that HWPF works only with the older MS Word (97) format, not the latest ones.
Not sure if Java out of the box can do it directly. But I've read about a component that can do pretty much anything in terms of automating Word document generation without having Word: Aspose.Words.
JasperReports uses this API alternatively to POI, because it supports images:
JExcelAPI
I didn't try it yet and don't know how good/bad it is.
