GWT document format converter - java

I am searching on ways to make a small app using GWT for converting documents
from one format to other.
Mainly these formats .doc , .pdf , .odt , .rtf.. and maybe a couple
more.
Has anyone tried this before??
I came across the library JODConverter but it needs open office to be
already installed and i don't really know how many people have used it
with gwt in past.
Please give me some starting pointers, or if anyone has experience
with this kind of app, do share.
Thanks and regards,
Rohit

I was looking into implementing something like this a few month ago.
Since GWT compiles your code to JavaScript there is no way for you to do that on the client side, JavaScript can't access the file system.
So you would need to upload the file to the server first and do the conversion on the server side and send the converted file back.
I have never heard of JODConverter before, the the library I wanted to use was Apache POI . Unfortunately I can't tell you anything about it, because I haven't tried it yet.

It sounds like JOD Converter is precisely what you need since you're looking at multi format conversions from Java. You would install OpenOffice on your server and link it up with JOD Converter. When a document is uploaded, your application would call JOD Converter to perform the conversion and stream the converted document back to the caller. Alternatively you can put the file somewhere, and send a link (URL) back to the caller so they can fetch the document. You can also look at JOD Reports or Docmosis if you need to manipulate the documents.

GWT is mostly a client side toolkit. Are you trying to make a tool that does all the conversion on the client side, with no help from the server? In that case, you should be looking for JavaScript libraries that can read/convert all those formats. If you are planning to have the user upload their files to the server, then you can use whatever technology you want on the server, and just use GWT for the UI.

Related

Java Google Docs API - Document Conversion Tool used in a standalone java program

First time asking something, let's see if I don't mess up.
My question is, I believe, a simple one:
Can one you use the Document Conversion Tool present in the Google Docs API to convert a PPTX/PPT to ODP (OpenDocument Presentation) without any intent of uploading it to Google Docs.
Basically I just want to use the conversion tool and have the file to save. Is it possible?
Thank a bunch for your time.
Your contribution will keep the entire World spinning.. well, at least my World :P
No, it is not possible to use the API to convert file into different formats if they are not uploaded to Google Drive.
You can always upload a file to Drive, convert it to a different format and then delete the original file, but I don't why you might want to do that :)

Native Java document parser and converter library / linux based document converters

I'm looking for a Java library which can do the following:
parse emails in *.eml or *.msg format for attachments of type DOC,DOCX,JPEG,PNG,GIF,TXT,XLS,XLSX,PPT,PDF and convert the attachmens to the TIFF format.
It can be either open source or a comercial library. Alternatively I'm looking for command line tools for linux doing this. We already tried open office, but there are too many problems with some document formats.
UPDATE:
What I found out by research up to now:
For parsing emails and extracting attachments, JavaMail (http://www.oracle.com/technetwork/java/javamail/index.html) is a good choice.
For converting documents, JodConverter (http://code.google.com/p/jodconverter/) is a confortable library. However it's only a wrapper for open office, so if there are issues with open office (and I do have often trouble with openoffice) to convert a document, you will have them also with JodConcerter.
In conclusion I had no luck (up to now) to find any document conversion library implemented in native java, which handels all common document formats, neither open source or even commercial. It seems to be a real market gap.
RainbowPDF may fit: its a commercial server based conversion tool with Java API.
If you've got a Windows server, have a look at NEEVIA Document Converter Pro. It has some mail functionality.
Apace POI is an interface to read the content of Microsoft Office documents. You will have to code the image generating and layouting components on your own. Nervertheless it reads Outlook MSG format.
Apache POI - the Java API for Microsoft Documents. However I don't know how to easily convert parsed document to TIFF.
May be a mix of different approaches could be useful? Depending on your requirements, could be possible to use several libraries to convert all the formats you need to manage: Microsoft Office, Adobe PDF, some different image formats and simple text files.
I mean, you can create a process that, depending on the type of the file extracted (using Java Mail), you could recognize what kind of format the file has and continue processing with the right conversion mechanism using the suitable library. Then you will idenfity if a file it's an image to convert, try Java Advanced Imaging, if it's a Microsoft Office file, try Apache POI and so on. For managing PDF files, you can try Apache PDFBox it's another good and opensource solution.
By the way, if you are looking not only for a Java approach, may be this thread may help you.
I don't know if there are better commercial solutions than #ChrisGer commented.
Do not waste your time looking at Apache POI, as it can only parse the content of the Office files but is not suitable for rendering it.
Since there are OpenOffice servers available, I suggest you do this. I also know you can easily use DCOM to talk with Microsoft Office apps, maybe a Java->DCOM bridge is more up to the task. However, this is not even recommended by Microsoft (so I suppose the JodConverter thing is equally unstable).

html to pdf in java plugin [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Converting HTML files to PDF
List of HTML to PDF converters
We want an java plugin to convert html to pdf, that will integrate with our app. is that possible with any java plugins?, we dont want to purchase it every time especially number of users for one app something like that. we will integrate that plugin with our app. we are not going to show that there plugin inside our app.but there is need of printing html on browser and as well need to produce pdf for that HTML in our app. we want plugin for our application. is there any java /java server plugin .
Please let us know that if this suitable to our requirements. any reply that would be great help.
Thanks and Regards
kumar kasimala
You can use itext library for this purpose
Try the PDF Converter services. It supports loads of formats including HTML and works from any Web Services capable environment including Java and .net.
Here are some examples:
Converting using the web services interface
Converting HTML
Note that the articles may mention SharePoint a couple of times. Don't worry, the product is not tied to SharePoint, it just happens to be used a lot in that environment.
Big disclaimer, I worked on the product so I am obviously biased about how fantastic the product is :-)

How to create an excel file in google app engine (java)?

A question that seems to have quite a few options for Python, but none for Java after googling for two days. Really really could use some help all I have found so far is a recommendation to use gaeVFS to build an excel file from the xml components and then zip it all together which sounds like a slap in the face. Oh yes and if you were wondering I am questioning my use of Java rather than python but at 5,000 lines of code it would be insane to turn back now...
Other things you might find useful
Client: GWT
Server: Servlets running
on google app engine storing data
into the google data store
Excel file: mandatory, CSV isn't good
enough, no need to save the file just
to be able to "serve" it to the
client i.e. open a "Save As" box.
Have you checked out this api already: Java Excel API ?
You could also take a look at the Apache POI project. You can read and write MS Excel documents with this library.
Take a look at this post.
It's a step by step tutorial on how to generate excel files on google app engine.
Try this :
http://code.google.com/p/gwt-table-to-excel/
google app engine do not support input/output stream classes, you need to use google app engine virtual file system.

How can I read MS Office files in a server without installing MS Office and without using the Interop Library?

The interop library is slow and needs MS Office installed.
Many times you don't want to install MS Office on servers.
I'd like to use Apache POI, but I'm on .NET.
I need only to extract the text portion of the files, not creating nor "storing information" in Office files.
I need to tell you that I've got a very large document library, and I can't convert it to newer XML files.
I don't want to write a parser for the binaries files.
A library like Apache POI does this for us. Unfortunately, it is only for the Java platform. Maybe I should consider writing this application in Java.
I am still not finding an open source alternative to POI in .NET, I think I'll write my own application in Java.
For all MS Office versions:
You could use the third-party components like TX Text Controls for Word and TMS Flexcel Studio for Excel
For the new Office (2007):
You could do some basic stuff using .net functionality from system.io.packaging. See how at http://msdn.microsoft.com/en-us/library/bb332058.aspx
For the old Office (before 2007):
The old Office formats are now documented: http://www.microsoft.com/interop/docs/officebinaryformats.mspx. If you want to do something really easy you might consider trying it. But be aware that these formats are VERY complex.
Check out the Aspose components. They are designed to mimic the Interop functionality without requiring a full Office install on a server.
As the new docx formats are inherently XML based files, you can create and manipulate them programmatically with standard XML DOM techniques, once you know the structure.
The files are basically zip archives with an alternate file extension. Use the System.IO.Packaging namespace to get access to the internal elements of the file, then open them into a XmlDocument to perform the manipulation.
There are examples available for doing this, and the Office Open XML project on SourceForge may be worth looking at for inspiration.
As for the older binary formats, these were proprietary to MS, and the only way you're likely to get at the content from within is through the Office object model (requires an Office install), or a third party file converter/parser.
Unfortunately there's nothing first party and native to the .NET platform to work with these files.
What do you need to do with those file? If you just want to stream them to the user, then the basic file streams are fine. If you want to create new files (perhaps based on a template) to send to the user that the user can open in Office, there are a variety or work-arounds.
If you're actually keeping data in Office documents for use by your web site, you're doing it wrong. Office documents, even Excel spreadsheets and access databases, are not really an appropriate choice for use with an interactive web site.
If the document is in word 2007 format, you can use the system.io.packaging library to interact with it programatically.
RWendi
In Java world, there is also JExcelApi. It is very clearly written, from what I was able to see, much cleaner then POI. So maybe even a port of that code to .NET is not out of the question, depending of course you have enough of time on your hands.
OpenOffice.
You can program against it and have it do a lot for you, without spending the money on a license for the server, or have the vulnerability associated with it on your server.
Microsoft Excel workbooks can be read using an ODBC driver (or is it an OLE DB driver? can't remember) that makes the workbook look like a database table. But I don't know whether that driver is available without the Office Suite itself.
You can use OpenOffice. It has a command-line conversion tool:
Conversion Howto
In short, you define a macro in OpenOffice and you call that macro with a command-line
argument to OpenOffice. In that argument the name of the local file (the Office file) is
encoded.
It's not a great sollution, but it should be workable.

Categories

Resources