Apache POI - Invalid part to process data - java

I access an Excel spreadsheet using Java and Apache POI (HSSF), and I get the following error:
java.lang.RuntimeException: org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
at org.testng.internal.MethodInvocationHelper.invokeDataProvider(MethodInvocationHelper.java:143)
at org.testng.internal.Parameters.handleParameters(Parameters.java:426)
What am I missing?

To open an .xlsx (Office Open XML) file, use XSSFWorkbook instead of HSSFWorkbook, which is meant for .xls (Excel 97-2003) files.
If you are using a POI version older than 3.5, you need to upgrade to at least 3.5 to be able to read .xlsx files. Here's a guide for doing the conversion, but essentially you load the file using WorkbookFactory, which takes care of creating either an XSSFWorkbook or an HSSFWorkbook for you:
Workbook workbook = WorkbookFactory.create(new File("file.xlsx"));
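If you want to check which family a file belongs to before handing it to POI, looking at its first bytes is enough. The following is only a sketch using the plain JDK: SpreadsheetSniffer, Kind, sniff and sniffFile are made-up names and not part of POI. It relies on two known signatures: .xls files are OLE2 compound documents that start with D0 CF 11 E0 A1 B1 1A E1, and .xlsx files are ZIP archives that start with "PK" followed by 03 04. A ZIP signature only suggests OOXML, since any ZIP file starts the same way.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper (not a POI class): guesses the container format
// of a spreadsheet from its leading bytes.
final class SpreadsheetSniffer {
    enum Kind { OLE2, ZIP, UNKNOWN }

    // Signature of OLE2 compound documents (.xls, handled by HSSF).
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // Signature of a ZIP local file header (.xlsx, handled by XSSF).
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Classifies a file from its first bytes (8 bytes are enough).
    static Kind sniff(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) return Kind.OLE2;
        if (startsWith(head, ZIP_MAGIC)) return Kind.ZIP;
        return Kind.UNKNOWN;
    }

    // Reads up to 8 bytes from the file and classifies them.
    static Kind sniffFile(Path file) throws IOException {
        byte[] head = new byte[8];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int r;
            while (n < head.length && (r = in.read(head, n, head.length - n)) > 0) {
                n += r;
            }
        }
        return sniff(Arrays.copyOf(head, n));
    }

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }
}
```

A result of Kind.ZIP for a file that your code opens with HSSFWorkbook is exactly the situation that produces the OfficeXmlFileException from the question.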

Make sure the Excel file is not corrupt by opening it in Excel. If you see any error, save the file as an MS Excel 97-2003 Worksheet.
Also make sure you have specified the file name as "*.xls".

Related

What is the difference between poi and poi-ooxml

I'm learning about data-driven testing using Selenium and Excel. I'm taking an online course that has asked us to add the Apache poi and poi-ooxml dependencies in Maven.
I'm struggling to understand what the differences between the two are. Are both required in order to retrieve data in Excel and pass these to our tests?
Thanks
Excel files have a long history.
Excel 97-2003 workbook:
This is a legacy Excel file that follows a binary file format. The file extension of the format is .xls.
In Apache POI terms, the Excel 97-2003 format is called HSSF (Horrible SpreadSheet Format), because the Excel file format is complex and contains a number of tricky characteristics. The poi jar has the code to handle these files.
Excel 2007+ workbook:
This is the default XML-based file format for Excel 2007 and later versions. It follows the Office Open XML (OOXML) format, which is a zipped, XML-based file format developed by Microsoft for representing office documents. The file extension of the format is .xlsx. (DOCX and PPTX are other OOXML-based examples.)
In Apache POI terms, the Excel 2007+ format is called XSSF (XML SpreadSheet Format). This format is the successor to the one HSSF handles and has additional features. The code to handle these files is in the poi-ooxml jar.
More reading
Although .xls is almost dead, some applications still use it, so for backward compatibility both dependencies are required.
Here is what Apache has to say:
HSSF (Excel XLS), artifact poi: For HSSF only, if common SS is needed see below
Common SS (Excel XLS and XLSX), artifact poi-ooxml: WorkbookFactory and friends all require poi-ooxml, not just core poi
You can read more on their official website: http://poi.apache.org/components/index.html#components
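To see for yourself that an .xlsx file is just a ZIP archive of XML parts, you can list its entries with the plain JDK, without POI. This is only a sketch: PackageLister, listParts and fakePackage are made-up names, and fakePackage builds a tiny stand-in archive in memory so the example runs without a real workbook. The part names used in the demo are the ones commonly found in a real .xlsx file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper (not a POI class): lists the parts inside an OOXML package.
final class PackageLister {

    // Returns the entry names of a ZIP stream in the order they are stored.
    static List<String> listParts(InputStream in) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    // Builds a small in-memory ZIP with dummy XML parts.
    // It stands in for a real .xlsx file and is NOT a valid workbook.
    static byte[] fakePackage(String... partNames) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buf)) {
            for (String name : partNames) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<x/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        byte[] demo = fakePackage("[Content_Types].xml", "xl/workbook.xml", "xl/worksheets/sheet1.xml");
        listParts(new ByteArrayInputStream(demo)).forEach(System.out::println);
    }
}
```

With a real file you would pass a FileInputStream for the .xlsx to listParts. An .xls file cannot be read this way, because it is a binary OLE2 document and not a ZIP archive.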

Convert excel file from XLSX to XLSB in Java

I am using Java and Apache POI to generate an Excel file (xlsx/xls format). Because of the large amount of data, the generated file is big (~50 MB).
So I am trying to convert xlsx/xls to xlsb to reduce the size of the Excel file.
Here is the problem: Apache POI does not support writing xlsb files. So what I am trying is:
Generate an xlsx/xls file using Apache POI.
After that, convert the generated xlsx/xls file to xlsb. For this I am using the SmartXls Java library. But this library is not open source; you need to buy a license for it. I also checked the EasyXls Java library, but it also requires buying a license.
The code below converts xlsx/xls to xlsb using the SmartXls Java library:
WorkBook wb = new WorkBook();
wb.readXLSX(.../xlsxPath);
wb.writeXLSB(new java.io.FileOutputStream(.../xlsbPath));
wb.dispose();
Could someone tell me whether an open source library is available for this, or whether there is any other way to do this conversion in Java?
Thanks in advance.

Java Apache Poi SXSSF with Excel Macros

I have an Excel template that has macros (.xlsm). I want to read it in, add a million rows to it, and write it out.
I know that reading and re-writing files that contain macros with POI will preserve the macros. I need to write out the Excel file using SXSSF (RAM limitations), but SXSSF doesn't read files.
Question: How can I read in an Excel file with macros using XSSF, and then write it out with the macros using SXSSF?
Apache POI supports writing a spreadsheet with a large number of rows via SXSSFWorkbook based on a "template workbook". See the relevant constructor for details.
So you would open the .xlsm via XSSFWorkbook and then create the SXSSFWorkbook with that as template.
This should also keep the macros in place, as far as I can see.

Is there any way to generate Xlsx (2007) using xmlss?

When saving an Excel sheet (xls, 2003) in .xml form, the XMLSS file will contain the respective xsl code. But with xlsx I can't save the file in .xml format.
Also, is there any specific Apache POI API available to convert .xls (2003) to .xlsx (2007) format?
Since POI creates the XMLSS file to create the xlsx, is there any simpler option?
Related links:
http://msdn.microsoft.com/en-us/library/aa140066%28office.10%29.aspx#odc_xmlss_x:header
Thanks in advance.

Memory issues during conversion of large volume of XLSX file to CSV with POI

This is a very challenging task for me. I have been doing a lot of R&D to get rid of the OutOfMemoryError during conversion of XLSX to CSV. My Excel file can have three sheets, each with 60000 rows.
I recently used XSSF and SAX (Event API), since this approach consumes much less memory. However, the Event API triggers events only for things actually stored within the file, and this can cause problems for me.
Before this Event API approach, I used the Workbook class to process the XLSX file, and I eventually ran out of memory during the workbook creation shown below.
Workbook workbook = WorkbookFactory.create(new File("myfile.xlsx"));
So, what is the best way to process a large volume of XLSX data with Apache POI?
Here is an example of reading a large xlsx file using a SAX parser. A SAX parser will help you avoid OOM exceptions.
Error While Reading Large Excel Files (xlsx) Via Apache POI
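On the point that the Event API only reports cells actually stored in the file: each cell element carries its reference (for example r="C1"), so the handler can pad the missing columns itself. The sketch below shows the idea with the JDK's own SAX parser on a simplified sheet XML string. It is not POI code: SheetToCsv, convert and columnIndex are made-up names, only raw values inside <v> are handled (real files usually store strings as indexes into a shared strings table), and no CSV escaping is done.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not a POI class): turns <row>/<c>/<v> events into CSV lines,
// inserting empty fields for cells that are absent from the XML.
final class SheetToCsv extends DefaultHandler {
    // Kept in a list for the demo; a real converter would write each line out immediately.
    private final List<String> lines = new ArrayList<>();
    private final List<String> cells = new ArrayList<>();
    private final StringBuilder value = new StringBuilder();
    private boolean inValue;

    // Converts the column letters of a reference such as "C7" to a zero-based index (C is 2).
    static int columnIndex(String cellRef) {
        int col = 0;
        for (int i = 0; i < cellRef.length() && Character.isLetter(cellRef.charAt(i)); i++) {
            col = col * 26 + (cellRef.charAt(i) - 'A' + 1);
        }
        return col - 1;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (qName.equals("row")) {
            cells.clear();
        } else if (qName.equals("c")) {
            String ref = attrs.getValue("r");
            // Without a reference, assume the cell follows the previous one.
            int col = (ref == null) ? cells.size() : columnIndex(ref);
            while (cells.size() < col) {
                cells.add(""); // pad columns that were not stored in the file
            }
            value.setLength(0);
        } else if (qName.equals("v")) {
            inValue = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inValue) value.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("v")) {
            inValue = false;
        } else if (qName.equals("c")) {
            cells.add(value.toString());
        } else if (qName.equals("row")) {
            lines.add(String.join(",", cells));
        }
    }

    // Parses a sheet XML string and returns one CSV line per row element.
    static List<String> convert(String sheetXml) {
        SheetToCsv handler = new SheetToCsv();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(sheetXml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sheet XML", e);
        }
        return handler.lines;
    }
}
```

Rows that are missing entirely could be padded the same way by comparing the r attribute of each row element with the previous row number. Memory use stays low because the parser never builds the whole sheet in memory, as long as the lines are written out as they are produced.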
