I have been using the Apache POI library to create XLS files, but I find it limiting, both because it follows the Microsoft Excel spec and because it requires the entire workbook to be loaded in memory if you want to append to it.
Are there any other libraries or techniques (short of hand-coding the XLS XML itself) that allow you to stream the contents of an XLS document, row by row, without needing the full workbook loaded in memory?
Related
I have a small Java application that takes two spreadsheets of miscellaneous data and creates one giant workbook with multiple sheets, organized data, equations, etc., all in a nice, formatted workbook. I created this in Eclipse using Apache POI.
I want to store a sheet in the actual program that is the "Main Overview Documentation" - Essentially a guide on how to read the rest of the sheets.
This overview will never change, so instead of writing/coding it in the program, I'd like to save the sheet in my actual source folder (if that's how I'm supposed to do it - I'd like to save in the application itself) - and attach the sheet to the beginning of my final workbook.
I am struggling even to find good search terms that would tell me how to do this.
Edit: Essentially, I want to store a spreadsheet in the program that will not be edited, and have my application add that spreadsheet to the workbook I have created.
My suggestion would be to make a copy of your starter Excel document, then append the new sheets to that document.
This method for editing an existing file may be useful. You'd just need to first clone the template document.
Updating existing Excel file in Java Apache POI
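As a rough sketch of the "clone the template first" step, assuming the overview workbook is bundled with the application as a classpath resource: the class name `TemplateCloner` and the method `cloneTemplate` are hypothetical names chosen for illustration, and only the JDK is used. The untouched template is copied to the output location, and the copy (not the template) is what would then be opened with POI to append the generated sheets.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: copies a read-only template workbook to a working file.
class TemplateCloner {

    /**
     * Copies the template bytes to the given output path and returns the
     * number of bytes written. The template source is never modified.
     */
    static long cloneTemplate(InputStream template, Path output) throws IOException {
        // REPLACE_EXISTING lets the program be re-run without deleting old output first.
        return Files.copy(template, output, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

A call might look like `TemplateCloner.cloneTemplate(MyApp.class.getResourceAsStream("/overview.xlsx"), outputPath)`, where `MyApp` and `/overview.xlsx` are placeholder names for your own class and resource. Note that `getResourceAsStream` returns null if the resource is not found, so that case should be checked before copying.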
I have an Excel template that has macros (.xlsm). I want to read it in, add a million rows to it, and write it out.
I know that reading and re-writing files that contain macros with POI will preserve the macros. I need to write out the Excel file using SXSSF (RAM limitations), but SXSSF doesn't read files.
Question: How can I read in an Excel file with macros using XSSF, and then write it out with macros using SXSSF?
Apache POI supports writing a spreadsheet with a large number of rows via SXSSFWorkbook based on a "template workbook". See the relevant constructor for details.
So you would open the .xlsm via XSSFWorkbook and then create the SXSSFWorkbook with that as template.
This should also keep the macros in place, as far as I can see.
I have two Excel files. There is some common data in both files. I want to produce a new Excel file that contains only non-redundant data. I am reading the data from the Excel files using Apache POI (Java). Any help with an algorithm for removing the redundant data would be appreciated (there are more than 10,000 rows in each Excel file). Thanks.
Copy all the rows from both files into a HashSet, which discards duplicates, and then write its contents to another file.
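A minimal sketch of that idea, assuming each row has already been read into a `List<String>` of cell values (the reading and writing with POI is left out). `RowDeduplicator` and `mergeUnique` are hypothetical names, and a `LinkedHashSet` is swapped in for the plain `HashSet` so that rows keep their original order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: merges the rows of two sheets and drops duplicates.
class RowDeduplicator {

    /** Returns every distinct row from both inputs, in first-seen order. */
    static List<List<String>> mergeUnique(List<List<String>> first, List<List<String>> second) {
        // List.equals/hashCode compare cell by cell, so identical rows collapse to one.
        Set<List<String>> unique = new LinkedHashSet<>(first);
        unique.addAll(second);
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<List<String>> a = Arrays.asList(Arrays.asList("1", "Alice"), Arrays.asList("2", "Bob"));
        List<List<String>> b = Arrays.asList(Arrays.asList("2", "Bob"), Arrays.asList("3", "Carol"));
        System.out.println(mergeUnique(a, b)); // prints [[1, Alice], [2, Bob], [3, Carol]]
    }
}
```

With roughly 10,000 rows per file, the set fits comfortably in memory. Rows count as duplicates only when every cell matches exactly, so values may need trimming or normalizing first.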
This is a very challenging task for me: I have been doing a lot of R&D to get rid of the OutOfMemoryError during conversion of XLSX to CSV. My Excel file can have three sheets, each with 60,000 rows.
I recently used XSSF and SAX (Event API), since this approach consumes very little memory. However, the Event API triggers events only for things actually stored within the file, and this can cause problems for me.
Before trying the Event API approach, I used the Workbook class to process the XLSX file, and I eventually ran out of memory during the workbook creation shown below.
Workbook workbook = WorkbookFactory.create(new File("myfile.xlsx"));
So, what is the best way to process a large volume of XLSX data with Apache POI?
Here is an example of reading a large xls file using a SAX parser. A SAX parser will help you avoid OOM exceptions.
Error While Reading Large Excel Files (xlsx) Via Apache POI
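To illustrate why the event-style approach stays small in memory, here is a rough sketch that is independent of the linked example and does not use POI at all. It relies on the fact that an .xlsx file is a zip archive whose sheets are XML parts, and streams one sheet through the JDK's SAX parser. `SheetToCsv` and `convert` are hypothetical names. The sketch assumes unprefixed element names (`row`, `c`, `v`), writes raw `<v>` values (so shared-string cells come out as indices, not text), does no CSV escaping, and does not pad cells that are missing from the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Writer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: streams one sheet part of an .xlsx file to CSV-like text.
class SheetToCsv {

    /** sheetEntry is the zip entry name of the sheet, e.g. "xl/worksheets/sheet1.xml". */
    static void convert(String xlsxPath, String sheetEntry, Writer out) throws Exception {
        try (ZipFile zip = new ZipFile(xlsxPath)) {
            ZipEntry entry = zip.getEntry(sheetEntry);
            if (entry == null) {
                throw new IOException("No such entry: " + sheetEntry);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                // The parser pushes events as it reads; the sheet is never held in memory.
                SAXParserFactory.newInstance().newSAXParser().parse(in, new RowHandler(out));
            }
        }
    }

    private static final class RowHandler extends DefaultHandler {
        private final Writer out;
        private final StringBuilder value = new StringBuilder();
        private boolean inValue;
        private boolean firstCell;

        RowHandler(Writer out) {
            this.out = out;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            try {
                if ("row".equals(qName)) {
                    firstCell = true;
                } else if ("c".equals(qName)) {
                    if (!firstCell) {
                        out.write(',');   // separator between stored cells
                    }
                    firstCell = false;
                } else if ("v".equals(qName)) {
                    inValue = true;
                    value.setLength(0);
                }
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inValue) {
                value.append(ch, start, length);   // text may arrive in several chunks
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            try {
                if ("v".equals(qName)) {
                    inValue = false;
                    out.write(value.toString());
                } else if ("row".equals(qName)) {
                    out.write('\n');
                }
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }
    }
}
```

Only one cell value is buffered at a time, which is why memory use does not grow with the number of rows. The missing-cell limitation here is the same one mentioned in the question: the parser reports only what is actually stored in the file, so gaps have to be reconstructed from each cell's `r` attribute if column alignment matters.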
Is there a way to read large xls files?
I have used Apache POI to read files but only till some limit.
I have a database that already holds some data, and now I want to upload a large file (as I mentioned). One of its sheets contains slightly more data than the database has. How should I update that data in the Oracle database without setting up the xls file?
For large xls files you should use the streaming extension of XSSF, called SXSSF. It should be able to handle your requirements without memory problems.
As for the database problem: I'd suggest you read the xls files using XSSF and create temporary tables for each file in the database (depending on your needs). Once all the rows are stored in temp tables you can easily merge them with your existing data.