How to manipulate .doc files - java

I need to create a small desktop app in Java that creates a .doc file and writes a bit of text into it. I found an interesting tool called Aspose, but I saw that it is not free.
Do you know which Java API I can use for this (for free)?
Is it possible to do this with only the Java SE libraries?
What do you think would be the easiest and fastest way to achieve this goal?

I suggest you have a look at the Apache POI framework, specifically the HWPF - Java API to Handle Microsoft Word Files:
HWPF is the name of our port of the Microsoft Word 97(-2007) file format to pure Java. It also provides limited read only support for the older Word 6 and Word 95 file formats.

If you are going with .doc then, as a learning exercise, open a Word document with some content (ideally similar to what you want to create), save it as XML, and review the contents.
You will need to do some basic DOM parsing and manipulation in your code to insert the right content.
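To illustrate the DOM approach described above, here is a minimal sketch that builds such an XML file with only the JDK's own XML classes. It assumes that "save as XML" means the Word 2003 XML format (WordprocessingML) and uses that format's namespace; the class name WordXmlBuilder and the method build are made up for this example, and a file saved by Word itself will contain far more markup than this.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: builds a bare-bones Word 2003 XML document with the JDK DOM API.
class WordXmlBuilder {
    // Assumption: the Word 2003 WordprocessingML namespace.
    static final String W_NS = "http://schemas.microsoft.com/office/word/2003/wordml";

    static String build(String... paragraphs) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().newDocument();

        // Processing instruction that associates the XML file with Word.
        doc.appendChild(doc.createProcessingInstruction(
                "mso-application", "progid=\"Word.Document\""));

        Element root = doc.createElementNS(W_NS, "w:wordDocument");
        // Declare the prefix explicitly so the serialized output is self-describing.
        root.setAttributeNS("http://www.w3.org/2000/xmlns/", "xmlns:w", W_NS);
        doc.appendChild(root);

        Element body = doc.createElementNS(W_NS, "w:body");
        root.appendChild(body);

        // One paragraph (w:p) holding one run (w:r) holding one text node (w:t) per string.
        for (String text : paragraphs) {
            Element p = doc.createElementNS(W_NS, "w:p");
            Element r = doc.createElementNS(W_NS, "w:r");
            Element t = doc.createElementNS(W_NS, "w:t");
            t.setTextContent(text);
            r.appendChild(t);
            p.appendChild(r);
            body.appendChild(p);
        }

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build("Hello from Java", "Second paragraph"));
    }
}
```

Writing the returned string to a file with an .xml extension should give something Word can open. To modify an existing file instead, you would parse it with the same DocumentBuilder and append w:p elements to its w:body.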

By .doc file, I assume you mean Microsoft Office? Reading and writing Office file formats is something of a black art. Does it have to be a .doc format file specifically? A lot easier would be to write out a Rich Text Format file (.rtf) that Word could load.
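As a rough sketch of the RTF route, the following writes a small .rtf file using nothing but Java SE. The class and method names (SimpleRtfWriter, escape, toRtf, write) are invented for this example, the font is an arbitrary choice, and only the most basic RTF features are covered (plain paragraphs, escaping, non-ASCII characters).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: writes plain text as a minimal RTF document.
class SimpleRtfWriter {

    // Escapes RTF control characters, turns newlines into paragraph breaks,
    // and encodes non-ASCII characters as \uN? escapes.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' || c == '{' || c == '}') {
                sb.append('\\').append(c);
            } else if (c == '\n') {
                sb.append("\\par\n");
            } else if (c < 128) {
                sb.append(c);
            } else {
                // RTF expects a signed 16-bit decimal value plus a fallback character.
                sb.append("\\u").append((int) (short) c).append('?');
            }
        }
        return sb.toString();
    }

    // Wraps the escaped text in a minimal RTF header with a single font.
    static String toRtf(String text) {
        return "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\n"
                + "\\f0\\fs24 " + escape(text) + "\\par\n}";
    }

    static void write(Path target, String text) throws IOException {
        // After escaping, the output is pure ASCII.
        Files.write(target, toRtf(text).getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) throws IOException {
        write(Paths.get("hello.rtf"), "Hello from Java SE.\nSecond paragraph.");
    }
}
```

Word should open the resulting hello.rtf directly. Anything beyond simple text (tables, images, styles) needs considerably more RTF markup than this.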

And if you don't need to use .doc specifically, I would suggest you use .odt; see http://www.jopendocument.org/.

Related

Convert .doc/.docx documents to .odt (Open document text) and vice versa using java.

Is there any Java library that can be used to convert Microsoft Word files (doc/docx) to OpenDocument Text format (.odt)? A free library would be preferable.
I don't know of any libraries that do it directly, but it should be relatively easy to extract the bits you're interested in from a .docx using POI:
http://poi.apache.org/
and then write them to an ODT format using ODFDOM:
http://incubator.apache.org/odftoolkit/odfdom/index.html
This should be relatively straightforward for simple documents, but if your use case calls for complex documents containing pictures etc., this might become a LOT harder.
Anyway, hope this helps at least some ;)
I believe everything you need is in this post: http://angelozerr.wordpress.com/2012/12/06/how-to-convert-docxodt-to-pdfhtml-with-java/
For instance:
JODConverter : JODConverter automates conversions between office
document formats using OpenOffice.org or LibreOffice. Supported
formats include OpenDocument, PDF, RTF, HTML, Word, Excel, PowerPoint,
and Flash. It can be used as a Java library, a command line tool, or a
web application.

Java generate PDF from RTF

I want to generate a PDF file from an RTF file.
I have tried the following.
iText
The version that handled RTF is outdated, and the new version doesn't support RTF.
JODConverter
It uses OpenOffice in the background. It works fine, with only one problem: OpenOffice doesn't support drawing objects in RTF.
Are there any other possible and reliable solutions?
Note: I would prefer not to use any commercial software.
Windows can natively convert RTF to PDF from the command line, but the result is limited to a degree: text and images are converted directly, while support for drawn objects depends on the RTF syntax used. WordArt drawing objects need MS Word to print.
The output looks reasonable, but here is the source in MS Word, where the art was clearly not handled by the non-Word printout.
Under Windows you could print to CutePDF Writer. This freeware uses Ghostscript as a back end.
You may try Aspose.Words for Java to convert RTF file to PDF format. You can load a file in RTF format into Aspose.Words for Java and then save it to PDF format. Please note that while loading specify RTF as LoadFormat value and pass PDF as SaveFormat value while saving the document. This doesn't require OpenOffice or any other software to be installed for the conversion to work.
Disclosure: I work as developer evangelist at Aspose.
The best way to do it is to use MS Office, which can save files in PDF format (I think you need to install an add-on).

Java: Placing a Header in MS Word Document

We are converting a C++ project to Java in which we generate reports with a ".doc" extension. The problem is that we don't use any third-party library to generate a real MS Word document; we just write a file with a .doc extension. Everything works fine except that we can't find a way to add a header at the top of every page. Using line numbers is not an option. Is there any other way to do it?
Thank you.
The Apache POI library might be of some help.
It has facilities to read and modify Microsoft proprietary file formats like MS-Word .doc and MS-Excel .xls
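If the generated file can be RTF under the hood (an assumption; the question doesn't say what is actually written into the .doc file), then a page header needs no library at all: RTF has a \header group whose content is repeated at the top of every page. A minimal sketch, with the class name RtfWithHeader and the method build invented for this example, and ASCII-only text assumed:

```java
// Hypothetical helper: builds an RTF document with a header repeated on every page.
class RtfWithHeader {

    // Escapes the three characters that have special meaning in RTF.
    // Assumption: the input is ASCII-only; other characters would need \uN? escapes.
    static String escape(String text) {
        return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}");
    }

    static String build(String headerText, String bodyText) {
        return "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}\n"
                // The \header group holds the content shown at the top of each page;
                // \qc centers the paragraph.
                + "{\\header\\pard\\qc\\f0\\fs20 " + escape(headerText) + "\\par}\n"
                + "\\pard\\f0\\fs24 " + escape(bodyText) + "\\par\n}";
    }

    public static void main(String[] args) {
        System.out.println(build("Monthly Report", "Report body goes here."));
    }
}
```

Word opens RTF content even when the file carries a .doc extension, so the existing naming could stay as it is.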

Parse Pdf File and write content in word file using java

How can I parse a PDF file and write its content to a Word file using Java?
For parsing a PDF file in Java, you can use Apache PDFBox: http://incubator.apache.org/pdfbox/
For reading/writing Word (or other Office) file formats in Java, try POI: http://poi.apache.org/
Both are free.
Try the iText java library:
iText is an ideal library for developers looking to enhance web- and other applications with dynamic PDF document generation and/or manipulation.
It can be used for your parsing step.
As for generating word documents - the OpenOffice Java API might be able to generate Word compatible docs (no personal experience with this API).
You might want to try any of these:
http://incubator.apache.org/pdfbox/
https://pdf-renderer.dev.java.net/
Once you are reading the contents of the PDF file, you can just as well store them in an ODT file or a text file. For an ODT file, try http://odftoolkit.openoffice.org.
Best!
You could use iText if the source PDF is mostly text. Images and such are quite hard to handle while parsing. If it's text only, it's as easy as 10 lines of code. See the iText manual for examples.
For writing word files there's only Apache POI. It can be a little tricky to figure out, but for such a simple task it shouldn't be any problem.

Can Java POI write image to word document?

Does anyone know if this is possible?
Is there any sample code for it?
Or is there any other Java API that can do this?
The Office 2007 format is based on XML and so can probably be written to using XML tools. However there is this library which claims to be able to write DocX format word documents.
The only other alternative is to use a Java-COM Bridge and use COM to manipulate word. This is probably not a good idea though - I would suggest finding a simpler way.
For example, Word can easily read RTF documents and you can generate .rtf documents from within Java. You don't have to use the Microsoft Word format!
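To make the "written using XML tools" remark concrete: a .docx file is a ZIP archive of XML parts, so a text-only document can be produced with java.util.zip alone. The sketch below writes the three parts that, as far as I know, are the minimum Word accepts; it does not handle images, which need extra parts and relationships. The class name MinimalDocx and its methods are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: builds a text-only .docx using only the JDK.
class MinimalDocx {
    private static final String CONTENT_TYPES =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
            + "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">"
            + "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>"
            + "<Default Extension=\"xml\" ContentType=\"application/xml\"/>"
            + "<Override PartName=\"/word/document.xml\" ContentType=\"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml\"/>"
            + "</Types>";

    private static final String RELS =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
            + "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">"
            + "<Relationship Id=\"rId1\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument\" Target=\"word/document.xml\"/>"
            + "</Relationships>";

    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // The main document part: one paragraph containing one run of text.
    static String documentXml(String text) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
                + "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">"
                + "<w:body><w:p><w:r><w:t xml:space=\"preserve\">" + escapeXml(text)
                + "</w:t></w:r></w:p></w:body></w:document>";
    }

    static byte[] create(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            addEntry(zip, "[Content_Types].xml", CONTENT_TYPES);
            addEntry(zip, "_rels/.rels", RELS);
            addEntry(zip, "word/document.xml", documentXml(text));
        }
        return bytes.toByteArray();
    }

    private static void addEntry(ZipOutputStream zip, String name, String content) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(content.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    public static void main(String[] args) throws IOException {
        Files.write(Paths.get("hello.docx"), create("Hello from plain Java."));
    }
}
```

For anything beyond a paragraph or two, a library such as docx4j or POI is likely less work than assembling the XML by hand.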
As others have said POI isn't going to allow you to do anything really fancy - plus it doesn't support Office 2007+ formats. Treating MS Word as a component that provides this type of functionality via COM is most likely the best approach here (unless you are running on a non-Windows OS or just can't guarantee that Word will be installed on the machine).
If you do go the COM route, I recommend that you look into the JACOB project. You do need to be somewhat familiar with COM (which has a very steep learning curve), but the library works quite well and is easier than trying to do it in native code with a JNI wrapper.
If you are using docx, you could try docx4j.
See the AddImage sample
Surely:
Take a look at this: http://code.google.com/p/java2word
Word 2004+ is XML based. The above framework takes the image, converts it to its Base64 representation, and adds it to the XML.
When you open your Word Document, there will be your image.
Simple like this:
IDocument myDoc = new Document2004();
myDoc.getBody().addEle("path/myImage.png");
Java2Word is an API for generating Word documents from Java code. J2W takes care of all the implementation and XML generation behind the scenes.
As far as can be gathered from the project website: no.
POI's HWPF can extract an MS Word document's text and perform simple modifications (basically deleting and inserting text).
AFAIK it can't do much more than that.
Also keep in mind that HWPF works only with the older MS Word (97) format, not the latest ones.
I'm not sure whether Java can do it directly out of the box, but I've read about a component that can do pretty much anything in terms of automating Word document generation without having Word installed: Aspose.Words.
JasperReports uses this API as an alternative to POI, because it supports images:
JExcelAPI
I haven't tried it yet and don't know how good or bad it is.
