Apache POI Powerpoint Alternative - java

I have an application which currently uses the Apache POI libraries to produce Microsoft PowerPoint documents. I need to move this application into a more restricted environment which doesn't allow the POI libraries.
Are there any alternatives to POI and interfacing with COM for writing information to PowerPoint with Java?
I could learn and work with COM, but I'd rather avoid it at this point.
Thanks

I would suggest that you look at two commercial libraries.
The first is Aspose. It's a library that works a bit like Apache POI but is easier to use. At my company, we use it to include charts in Excel files (something POI does not yet support).
Ezjcom is probably the library you want. It needs a running Office instance and allows you to interact with Excel as if you were writing a macro in Basic, but from Java. Watch out, the documentation is terrible!

Related

Java: Is there a way to create a pdf file in eclipse using just apache poi

All of the methods that I've searched for seem to use the iText libraries, but is there a way to use just the Apache POI libraries?
I don't believe so. POI is strictly for Microsoft documents and PDF was created by Adobe. So the short and quick answer is no.
If you read the docs for Apache's POI project you'll read in the title "Apache POI - the Java API for Microsoft Documents".
Apache POI Project
POI is explicitly made for Microsoft documents. Using iText is the easiest way to create a PDF file in Java.

Native Java document parser and converter library / linux based document converters

I'm looking for a Java library which can do the following:
parse emails in *.eml or *.msg format for attachments of type DOC, DOCX, JPEG, PNG, GIF, TXT, XLS, XLSX, PPT, PDF and convert the attachments to the TIFF format.
It can be either an open source or a commercial library. Alternatively, I'm looking for command-line tools for Linux that do this. We already tried OpenOffice, but there are too many problems with some document formats.
UPDATE:
What I found out by research up to now:
For parsing emails and extracting attachments, JavaMail (http://www.oracle.com/technetwork/java/javamail/index.html) is a good choice.
For converting documents, JodConverter (http://code.google.com/p/jodconverter/) is a convenient library. However, it is only a wrapper for OpenOffice, so if OpenOffice has trouble converting a document (and I often do have trouble with OpenOffice), you will have the same trouble with JodConverter.
In conclusion, I have had no luck (up to now) finding any document conversion library implemented in native Java that handles all common document formats, neither open source nor commercial. It seems to be a real market gap.
RainbowPDF may fit: it's a commercial server-based conversion tool with a Java API.
If you've got a Windows server, have a look at NEEVIA Document Converter Pro. It has some mail functionality.
Apache POI is an interface for reading the content of Microsoft Office documents. You will have to code the image generation and layout components on your own. Nevertheless, it does read the Outlook MSG format.
Apache POI - the Java API for Microsoft Documents. However I don't know how to easily convert parsed document to TIFF.
Maybe a mix of different approaches could be useful? Depending on your requirements, it could be possible to use several libraries to convert all the formats you need to manage: Microsoft Office, Adobe PDF, some different image formats and simple text files.
I mean, you can create a process that recognizes the format of each file extracted (using JavaMail) and continues processing with the conversion mechanism and library suited to that format. If the file is an image, try Java Advanced Imaging; if it's a Microsoft Office file, try Apache POI; and so on. For managing PDF files, you can try Apache PDFBox, which is another good open source solution.
By the way, if you are not looking only for a Java approach, maybe this thread will help you.
I don't know if there are better commercial solutions than the ones @ChrisGer mentioned.
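The dispatch step described above might be sketched like this in Java. The `AttachmentRouter` class, the `Backend` labels, and the extension table are illustrative assumptions, not part of any of the libraries mentioned; a real implementation would probably also check the MIME type reported by JavaMail, since file extensions can be wrong.

```java
import java.util.Locale;

/** Picks a conversion backend for an attachment based on its file extension. */
class AttachmentRouter {

    /** Hypothetical labels; each would wrap one of the libraries named above. */
    enum Backend { IMAGE, OFFICE, PDF, TEXT, UNSUPPORTED }

    static Backend backendFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Backend.UNSUPPORTED; // no extension to go on
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "jpg": case "jpeg": case "png": case "gif":
                return Backend.IMAGE;   // e.g. Java Advanced Imaging
            case "doc": case "docx": case "xls": case "xlsx": case "ppt":
                return Backend.OFFICE;  // e.g. Apache POI for parsing
            case "pdf":
                return Backend.PDF;     // e.g. Apache PDFBox
            case "txt":
                return Backend.TEXT;
            default:
                return Backend.UNSUPPORTED;
        }
    }
}
```

Each `Backend` value would then map to its own converter class, so adding a format means adding one case and one converter rather than touching the mail-parsing code.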
Do not waste your time looking at Apache POI, as it can only parse the content of the Office files but is not suitable for rendering it.
Since there are OpenOffice servers available, I suggest you use one. I also know you can easily use DCOM to talk to Microsoft Office apps, so maybe a Java->DCOM bridge is more up to the task. However, this is not even recommended by Microsoft (so I suppose the JodConverter approach is equally unstable).

Java libraries that work with Microsoft Office documents but do not depend on automation

By "not depend on automation", I mean that it should not require a Microsoft Office installation to work; let alone interact with a live instance of a Microsoft Office component. One such library is Aspose.Total for Java. Are there any more out there?
Another solution I'm considering is to use OpenOffice.org. However, I'm not sure if I'm going to run into the same problems as with Microsoft Office as detailed here.
For Office Documents: http://poi.apache.org/
I have not tried this myself, but Apache usually delivers good libraries.
For just Excel: JExcel API for Java
I use this for one application, and it works quite well. May use a fair bit of RAM for larger documents.
One designed specifically to work with the newer XML formats is docx4j: http://dev.plutext.org/trac/docx4j
There are two further answers to this question, depending on your application.
You can borrow from the OpenOffice library code that deals with opening and saving MS Office files. (See: http://www.artofsolving.com/opensource/jodconverter or jOpenDocument )
You might just use OpenOffice itself by scripting or automating that.
I faced this question a while back with a Ruby app and, because I was in control of the source document, I got the originator to save things in HTML format and used Tidy to filter the junk. Another option is to find a tool to convert the Office files to RTF, which is more generic.
Another to consider ...
LibreOffice looks useful.
jExcelAPI if you just want Excel.
Finally, there are some options on SourceForge; try this search: http://sourceforge.net/search/?q=java+ms+office
You may find spreadsheets BIG unless you use OpenOffice or MS Office because you need to have a fancy shamancy virtual sparse matrix to do what they do well.
ODF Toolkit - http://odftoolkit.org

How to programmatically extract and manipulate images from an Office file?

How to extract some images from PowerPoint and Word documents, in order to manipulate them, and after that, put the images back in the MS Office files?
Apache has a project called "POI" explicitly made for interacting with MS Office formats from Java. Hopefully that does it for you!
http://poi.apache.org/
Apache POI can handle Word documents via its HWPF module, and extract or insert images from these. Although it's not well documented, check out the POI unit tests for image manipulation within Word (the unit tests seem to be the best documentation for this module).
Failing that, the COM interface is accessible via (say) JACOB. That's probably more work, but will make available APIs not exposed via POI.
In terms of C++, Word exposes a COM API to allow you to manipulate its document format, so as long as you have Word installed on the machine, you can do this in C++ quite easily. Word isn't open source, but you probably have the license anyway.
The company I work for, SoftArtisans, has a product called OfficeWriter that allows you to do that, among other things, for Word and Excel (PowerPoint is planned to be added in the future). It is not free or open source, though.
On the other hand, if you are working strictly with 2007 format (XML based) you can probably use OpenXML.

How can I read MS Office files in a server without installing MS Office and without using the Interop Library?

The interop library is slow and needs MS Office installed.
Many times you don't want to install MS Office on servers.
I'd like to use Apache POI, but I'm on .NET.
I need only to extract the text portion of the files, not creating nor "storing information" in Office files.
I need to tell you that I've got a very large document library, and I can't convert it to newer XML files.
I don't want to write a parser for the binaries files.
A library like Apache POI does this for us. Unfortunately, it is only for the Java platform. Maybe I should consider writing this application in Java.
I am still not finding an open source alternative to POI in .NET, I think I'll write my own application in Java.
For all MS Office versions:
You could use third-party components like TX Text Control for Word and TMS FlexCel Studio for Excel.
For the new Office (2007):
You could do some basic stuff using .NET functionality from System.IO.Packaging. See how at http://msdn.microsoft.com/en-us/library/bb332058.aspx
For the old Office (before 2007):
The old Office formats are now documented: http://www.microsoft.com/interop/docs/officebinaryformats.mspx. If you want to do something really easy you might consider trying it. But be aware that these formats are VERY complex.
Check out the Aspose components. They are designed to mimic the Interop functionality without requiring a full Office install on a server.
As the new docx formats are inherently XML based files, you can create and manipulate them programmatically with standard XML DOM techniques, once you know the structure.
The files are basically zip archives with an alternate file extension. Use the System.IO.Packaging namespace to get access to the internal elements of the file, then open them into a XmlDocument to perform the manipulation.
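Since the asker is considering Java, here is a rough Java counterpart of that approach, using `java.util.zip` and the JDK's DOM parser instead of System.IO.Packaging. It is a minimal sketch under assumptions: the `DocxText` class and its method names are made up for illustration, and `sampleDocx` builds a stripped-down archive containing only `word/document.xml`, which is enough to demonstrate the idea but is not a complete .docx file that Word would open.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Reads paragraph text out of a .docx by treating it as a zip of XML parts. */
class DocxText {
    /** Namespace of WordprocessingML elements such as w:p and w:t. */
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of each paragraph, one per line. */
    static String extractText(byte[] docx) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docx))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (!"word/document.xml".equals(e.getName())) {
                    continue; // skip styles, media, relationships, ...
                }
                byte[] xml = zip.readAllBytes(); // reads only the current entry
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                Document dom = factory.newDocumentBuilder()
                                      .parse(new ByteArrayInputStream(xml));
                StringBuilder out = new StringBuilder();
                NodeList paragraphs = dom.getElementsByTagNameNS(W_NS, "p");
                for (int i = 0; i < paragraphs.getLength(); i++) {
                    NodeList runs = ((Element) paragraphs.item(i))
                        .getElementsByTagNameNS(W_NS, "t");
                    for (int j = 0; j < runs.getLength(); j++) {
                        out.append(runs.item(j).getTextContent());
                    }
                    out.append('\n');
                }
                return out.toString();
            }
            throw new IllegalArgumentException("no word/document.xml entry found");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("could not parse document.xml", e);
        }
    }

    /** Demo helper: builds a minimal archive; text must not contain XML markup. */
    static byte[] sampleDocx(String... paragraphs) {
        StringBuilder xml = new StringBuilder(
            "<w:document xmlns:w=\"" + W_NS + "\"><w:body>");
        for (String p : paragraphs) {
            xml.append("<w:p><w:r><w:t>").append(p).append("</w:t></w:r></w:p>");
        }
        xml.append("</w:body></w:document>");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.toString().getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

For a real file, the bytes would come from something like `Files.readAllBytes(path)`. This only works for the XML-based formats; the older binary .doc files are not zip archives and need a parser such as POI.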
There are examples available for doing this, and the Office Open XML project on SourceForge may be worth looking at for inspiration.
As for the older binary formats, these were proprietary to MS, and the only way you're likely to get at the content from within is through the Office object model (requires an Office install), or a third party file converter/parser.
Unfortunately there's nothing first party and native to the .NET platform to work with these files.
What do you need to do with those files? If you just want to stream them to the user, then the basic file streams are fine. If you want to create new files (perhaps based on a template) to send to the user that the user can open in Office, there are a variety of work-arounds.
If you're actually keeping data in Office documents for use by your web site, you're doing it wrong. Office documents, even Excel spreadsheets and access databases, are not really an appropriate choice for use with an interactive web site.
If the document is in Word 2007 format, you can use the System.IO.Packaging library to interact with it programmatically.
RWendi
In the Java world, there is also JExcelApi. It is very clearly written and, from what I was able to see, much cleaner than POI. So maybe even a port of that code to .NET is not out of the question, depending of course on whether you have enough time on your hands.
OpenOffice.
You can program against it and have it do a lot for you, without spending the money on a license for the server, or have the vulnerability associated with it on your server.
Microsoft Excel workbooks can be read using an ODBC driver (or is it an OLE DB driver? can't remember) that makes the workbook look like a database table. But I don't know whether that driver is available without the Office Suite itself.
You can use OpenOffice. It has a command-line conversion tool:
Conversion Howto
In short, you define a macro in OpenOffice and you call that macro with a command-line argument to OpenOffice. The name of the local file (the Office file) is encoded in that argument.
It's not a great solution, but it should be workable.
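From Java, the macro call described above could be assembled roughly as follows. This is only a sketch under assumptions: the `OfficeMacroCommand` class is made up, the macro name (for example `Standard.MyConversions.SaveAsText`) is hypothetical and must match a macro you have actually defined in OpenOffice, and the `-invisible` switch and `macro:///` URL form are what I recall older OpenOffice versions accepting, so check them against your installation.

```java
import java.util.Arrays;
import java.util.List;

/** Builds the command line that asks OpenOffice to run a macro on one file. */
class OfficeMacroCommand {

    /**
     * @param sofficePath path to the soffice executable
     * @param macroName   library.module.macro name (hypothetical, user-defined)
     * @param inputFile   local Office file passed to the macro as its argument
     */
    static List<String> build(String sofficePath, String macroName, String inputFile) {
        // The file name is encoded inside the macro URL, as the answer describes.
        String macroUrl = "macro:///" + macroName + "(" + inputFile + ")";
        return Arrays.asList(sofficePath, "-invisible", macroUrl);
    }
}
```

The resulting list can be handed to `new ProcessBuilder(command).inheritIO().start()`, followed by `waitFor()` to block until the conversion finishes.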
