Upload Apache POI XLSX generated files to CodeIgniter website - java

I have to upload XLSX files to a web application. This web application uses CodeIgniter to check and import those files.
Furthermore, I have to generate these XLSX files from another one. To read and write XLSX files, I use Apache POI. This part is pretty easy and works well.
But here is my problem: when I upload an auto-generated file, CodeIgniter rejects it, saying that this file type is not allowed. It's probably a missing property that the Apache POI library doesn't create, but I didn't manage to find which one.
Another 'fun' fact: when I open an Apache POI auto-generated file with Microsoft Excel and save it without any modification, the file gains something like 3 KB of data and becomes valid for CodeIgniter. This doesn't work with LibreOffice Calc, which apparently adds some data too, but not the same data as Microsoft Excel does.
Do you have any idea which property or data could be missing? Is there any way to resolve my problem?
Edit: after some more investigation, and according to PHP's finfo_file function (used by CodeIgniter), my bad file has the MIME type application/octet-stream, while a legitimate file has the MIME type application/vnd.openxmlformats-officedocument.spreadsheetml.sheet. So I think Apache POI has a bug when generating XLSX files.
Edit 2: Finally, there are two XLSX types (see enclosed screenshot). Only the second one is recognized as application/vnd.openxmlformats-officedocument.spreadsheetml.sheet by finfo_file. Unfortunately, Apache POI generates the signature of the first type, so the file isn't recognized as an XLSX.

This strange behaviour ultimately comes from the ambiguity of the XLSX file format: the extension covers two different file signatures (as you can see in the enclosed picture from https://www.filesignatures.net/). Only the second one is recognized as application/vnd.openxmlformats-officedocument.spreadsheetml.sheet by finfo_file, whereas Apache POI generates files of the first type. Thus, the upload fails, saying the file type is not allowed.
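To see which of the two signatures a given file starts with, the leading bytes can be inspected directly. The following is a minimal sketch, not the check that finfo_file itself performs. The byte values are an assumption based on what filesignatures.net lists for XLSX (the plain ZIP magic `50 4B 03 04` and the longer Office Open XML variant `50 4B 03 04 14 00 06 00`); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class XlsxSignature {
    // Generic ZIP local-file-header magic ("PK", 0x03, 0x04).
    static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};
    // Longer signature listed for Office Open XML files (assumed values, see lead-in).
    static final byte[] OOXML = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00};

    /** Returns "ooxml", "zip" or "other" depending on the first bytes of the file. */
    static String classify(Path file) throws IOException {
        byte[] head = new byte[8];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // Read up to 8 bytes; a single read() call may return fewer.
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
        }
        if (n == 8 && Arrays.equals(head, OOXML)) {
            return "ooxml";
        }
        if (n >= 4 && Arrays.equals(Arrays.copyOf(head, 4), ZIP)) {
            return "zip";
        }
        return "other";
    }
}
```

Calling `XlsxSignature.classify(path)` on a POI-generated file and on the same file re-saved by Excel should show whether the two really differ in their first eight bytes.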

Related

Java library to convert "X-Document-Type: Workbook" to Excel

We have some legacy data in .xls (HSSF) format that we are converting to .xlsx (XSSF) format using Apache POI library. It was all working very well till we started seeing many org.apache.poi.poifs.filesystem.NotOLE2FileException. Upon closer examination we realized that the files that are throwing this exception are not actually Excel files (despite the misleading .xls extension) but Single File Web Page files (web archive X-Document-Type: Workbook).
Question: Is there any open-source Java library that converts "X-Document-Type: Workbook" files to Excel?
Addendum: Clarification, as sought by @kiwiwings
No, the files are not in "XML Workbook" format. They are MIME documents with the X-Document-Type: Workbook declaration. Each part is a standard HTML file, with its own table.
The files are given the .xls extension and Excel is able to open them, albeit after issuing the following warning:
The file you are trying to open, 'blah-blah-blah.xls', is in a different format than specified by the file extension. Verify that the file is not corrupted and is from a trusted source before opening the file. Do you want to open the file now?
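Such files can at least be detected before they reach POI and trigger a NotOLE2FileException. The sketch below does not convert anything; it only looks for the X-Document-Type: Workbook declaration near the start of the file. That the header appears within the first 4096 bytes is an assumption, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

final class MimeWorkbookSniffer {
    /** True if the start of the file contains an "X-Document-Type: Workbook" header. */
    static boolean isMimeWorkbook(Path file) throws IOException {
        byte[] buf = new byte[4096]; // assumed to be enough to cover the MIME headers
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (n < buf.length) {
                int r = in.read(buf, n, buf.length - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
        }
        // ISO-8859-1 maps every byte to a char, so binary .xls content cannot break decoding.
        String head = new String(buf, 0, n, StandardCharsets.ISO_8859_1)
                .toLowerCase(Locale.ROOT);
        return head.contains("x-document-type: workbook");
    }
}
```

Files for which `isMimeWorkbook` returns true could then be routed to an HTML-table-based import path instead of the HSSF converter.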

How do I check that a file is a valid xlsx file in Java without opening it with POI

In Java (JDK 1.6), is there a way to check that a file is a valid xlsx without opening the entire file with POI or another API? Currently we use Apache POI in the project to open the file: basically, we create a new XSSFWorkbook(inputStream), and if that throws an exception, the file is not a valid xlsx. However, we found one 8 MB xlsx file that takes 1 GB of memory to open for some reason, and it actually caused a production outage on our servers. We cannot rely on the file extension, as someone can take a file which is not an xlsx, such as a PHP file, and rename it with an xlsx extension. I'm looking for an option with minimal memory impact, ideally one that doesn't open the file at all.
It's too much of a risk if a single file upload can kill the server, but we still need to validate that the file is in fact an xlsx.
If you don't know what your file is at all, use Apache Tika to do the detection - it can detect a huge number of different file formats for you.
See also: Determine MS Excel file type with Apache POI
Here are some examples: https://www.baeldung.com/apache-tika
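As a cheap pre-check that needs neither Tika nor POI, the ZIP directory of the upload can be inspected with the JDK alone. This is a different technique from the Tika detection suggested above, and only a sketch: it assumes that a typical xlsx contains the parts `[Content_Types].xml` and `xl/workbook.xml` (the conventional workbook location, which a producer is not strictly required to use), and it does not validate the content of those parts. The class name is hypothetical, and the try-with-resources syntax needs Java 7 or later, so on JDK 1.6 the ZipFile would have to be closed in a finally block.

```java
import java.io.File;
import java.io.IOException;
import java.util.zip.ZipFile;

final class XlsxQuickCheck {
    /**
     * Returns true if the file is a readable ZIP archive that contains the
     * parts a typical xlsx package has. No entry is decompressed.
     */
    static boolean looksLikeXlsx(File file) {
        try (ZipFile zip = new ZipFile(file)) {
            return zip.getEntry("[Content_Types].xml") != null
                    && zip.getEntry("xl/workbook.xml") != null;
        } catch (IOException e) {
            // Not a ZIP archive (for example a renamed PHP file) or unreadable.
            return false;
        }
    }
}
```

Because ZipFile only reads the archive's directory, memory use should stay small regardless of how large the sheets inside are. A file that passes this check can still be malformed, so it narrows the risk rather than replacing full parsing.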

Apache POI creates xlsx file which it can't open later. Zip bomb detected

When I create an xlsx file with Apache POI, it sometimes (when the file is big) produces a file that Apache POI itself can't open, while MS Excel and LibreOffice Calc open it without problems.
When I try to open this workbook with Apache POI, it says:
Zip bomb detected
I can open it only if I call ZipSecureFile.setMinInflateRatio(0) or re-save it in LibreOffice (MS Excel doesn't help here).
How can I fix this? Why does POI create a file which it can't open?
Simply do as the error message suggests and set the limits differently via
ZipSecureFile.setMinInflateRatio(0)
You seem to have a rather special use case which produces a file similar to those that malicious users could use to make your server crash, use up CPU, or run out of memory. To guard against this, Apache POI enforces this limit but allows you to change it if needed. So if the file does not come from untrusted users, you can simply adjust the limit to avoid the error message.
Excel or LibreOffice might optimize the file-content more than Apache POI does and thus produce a file that does not reach these limits.
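To get a feel for how close a given file is to the limit, the compression ratio of each entry can be estimated with the JDK alone. This is only an approximation of what POI checks: POI measures the ratio while streaming the data, whereas the sketch below uses the sizes recorded in the ZIP directory. The default threshold of 0.01 mentioned in the comment is stated from memory of POI's ZipSecureFile defaults, and the class name is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class InflateRatio {
    /**
     * Smallest compressed/uncompressed size ratio among the entries,
     * or 1.0 if no entry reports a usable size. POI's default minimum
     * inflate ratio is believed to be 0.01; values below that are the
     * kind of entry that triggers "Zip bomb detected".
     */
    static double minRatio(File file) throws IOException {
        double min = 1.0;
        try (ZipFile zip = new ZipFile(file)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry e = entries.nextElement();
                long size = e.getSize();            // uncompressed size, -1 if unknown
                long csize = e.getCompressedSize(); // compressed size, -1 if unknown
                if (size > 0 && csize >= 0) {
                    min = Math.min(min, (double) csize / size);
                }
            }
        }
        return min;
    }
}
```

Instead of disabling the check with a ratio of 0, a value slightly below the result of `minRatio` for known-good files keeps some protection in place.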

Export Eclipse NAT Table to CSV/Excel

I am currently working on a project that uses a NatTable to display data to a user. I want to add an option to export this NatTable to a CSV file or an Excel document. Is there an easy way to export to Excel, or must I find a way to do it manually? If I must do it "the hard way", can anyone point me somewhere to help me get started on exporting to Excel?
Thanks.
NatTable itself comes with a default exporter to Excel.
The exporter class is ExcelExporter (in package org.eclipse.nebula.widgets.nattable.export.excel).
An easy way to use this would be with ExportCommand (in package org.eclipse.nebula.widgets.nattable.export.command). With the default bindings, this results in a file chooser in which the user can specify a *.xls file.
ExportCommand cmd = new ExportCommand( m_table.getConfigRegistry(), m_table.getShell());
m_table.doCommand( cmd );
With version NatTable 0.9 on my system, when opening the file, Excel shows a warning that the file is in a different format than specified by the file extension.
NatTable comes with an Excel Exporter based on Apache POI already. You need to add the POI extension from the NatTable project to your project for this.
Doing this gives you the opportunity to use the HSSFExcelExporter, which produces a valid Excel file (the default ExcelExporter simply creates an XML format) and comes with additional configuration possibilities.
configRegistry.registerConfigAttribute(ExportConfigAttributes.EXPORTER, new HSSFExcelExporter());
For reading from and writing to Excel you can use Apache POI: http://poi.apache.org/
For getting started with Apache POI, http://viralpatel.net/blogs/java-read-write-excel-file-apache-poi/ might help.

Any Java api similar to open xml sdk 2.0?

Is there any Java API which is similar to Open XML SDK 2.0? I just need to convert an Office XML Excel file to an .xlsx file.
I'm creating the Office XML Excel file using XML and XSLT. I tried Apache POI to read the XML Excel file, but I get an invalid header format exception.
Thanks.
Well, I believe the best API out there to handle *.xlsx files is Apache POI (it has *.xlsx support since 3.7 or so).
Some alternatives:
There was a project called JExcel API, but there hasn't been much activity there in the last 3 or so years (and I'm not sure if it handles the *.xlsx format or only *.xls, but I might be wrong).
I'm not sure, but the OpenOffice UDK might also help you. Unfortunately it is only a binding and requires an installed implementation (i.e., you have to install OpenOffice in order to use it), which is not always an acceptable requirement on the server side if you do not have any X servers there.
Another option is to use Excel itself through Jacob via COM. The pro is that you are able to access all of the data; the con is COM: you need an installed Excel on your machine (and of course, it is a Windows-specific solution).
I believe the best way is to stick to Apache POI; it is usually perfectly sufficient if you just want to read/write cell data.
