Processing a growing Excel file with Java Spring

I want to process an Excel file with Java Spring, using Apache POI. The Excel file is auto-generated and keeps growing. Example: the file has 20 lines on day 1; on day 2 it has 35 lines. The first 20 lines are the same, but 15 new lines have been added. It is unknown how many lines will be added or when the file will be uploaded.
The data from the excel is mapped to POJOs and saved to the database.
Is there a fast and reliable way to identify which new lines were added and only process those lines?
Edit: I realised that this might not just be an Excel processing problem but (also) a database optimisation problem.
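For what it's worth, the "only process the new lines" part could look something like the sketch below. This is only a sketch: it assumes rows are strictly appended (never edited or reordered) and that the count of already-processed rows is persisted between uploads; loadLastProcessedRow, mapRowToPojo, repository and saveLastProcessedRow are hypothetical names.

    import java.io.File;
    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.ss.usermodel.Sheet;
    import org.apache.poi.ss.usermodel.Workbook;
    import org.apache.poi.ss.usermodel.WorkbookFactory;

    // Sketch: process only rows appended since the last upload.
    void processNewRows(File upload) throws Exception {
        int lastProcessedRow = loadLastProcessedRow();         // hypothetical: count stored in the DB
        try (Workbook wb = WorkbookFactory.create(upload)) {
            Sheet sheet = wb.getSheetAt(0);
            for (int r = lastProcessedRow; r <= sheet.getLastRowNum(); r++) {
                Row row = sheet.getRow(r);
                if (row == null) continue;                     // skip empty rows
                repository.save(mapRowToPojo(row));            // hypothetical mapping and save
            }
            saveLastProcessedRow(sheet.getLastRowNum() + 1);   // hypothetical: persist new count
        }
    }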

You can use the newer Apache POI API, SXSSF, which is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced and heap space is limited. It consumes far less memory because only a limited window of rows is kept on the heap at a time. See the Apache POI SXSSF documentation for details.
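As a quick illustration, a minimal SXSSF write looks like this (file name and values are made up):

    import java.io.FileOutputStream;
    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.ss.usermodel.Sheet;
    import org.apache.poi.xssf.streaming.SXSSFWorkbook;

    void writeLargeFile() throws Exception {
        SXSSFWorkbook wb = new SXSSFWorkbook(100);     // keep at most 100 rows on the heap
        try {
            Sheet sheet = wb.createSheet("data");
            for (int r = 0; r < 1_000_000; r++) {
                Row row = sheet.createRow(r);          // older rows are flushed to a temp file
                row.createCell(0).setCellValue("value " + r);
            }
            try (FileOutputStream out = new FileOutputStream("big.xlsx")) {
                wb.write(out);
            }
        } finally {
            wb.dispose();                              // delete the temp files backing the stream
        }
    }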

Related

What is the best way to export up to 1 million records into an Excel sheet via a REST call?

I have a table that contains millions of rows. Whenever a user wants to export the data to an Excel sheet, it takes a long time because the table holds so much data, which leaves the user waiting. The data is exported using Apache POI in Java.
My question: is there a better way to export all of the data in less time?
Are there any optimisation techniques, such as multi-threading, streaming, or a process-now-and-download-later approach?
Any suggestions are appreciated.
Which workbook are you using for the export, HSSFWorkbook or SXSSFWorkbook?
If not SXSSFWorkbook, try it; a sketch follows below.
You can get more info in the Apache POI SXSSF documentation.
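A rough sketch of such an export with SXSSFWorkbook, streaming rows straight from a JDBC result set; the table, the columns and dataSource are assumptions for illustration:

    import java.io.FileOutputStream;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import javax.sql.DataSource;
    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.ss.usermodel.Sheet;
    import org.apache.poi.xssf.streaming.SXSSFWorkbook;

    void export(DataSource dataSource) throws Exception {
        SXSSFWorkbook wb = new SXSSFWorkbook(100);               // small in-memory row window
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "SELECT id, name FROM big_table")) {        // hypothetical table
            ps.setFetchSize(1000);                               // let the driver stream rows
            Sheet sheet = wb.createSheet("export");
            int r = 0;
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    Row row = sheet.createRow(r++);
                    row.createCell(0).setCellValue(rs.getLong("id"));
                    row.createCell(1).setCellValue(rs.getString("name"));
                }
            }
            try (FileOutputStream out = new FileOutputStream("export.xlsx")) {
                wb.write(out);
            }
        } finally {
            wb.dispose();                                        // remove temp files
        }
    }

With this pattern, neither the database rows nor the sheet rows accumulate on the heap, so memory use stays flat regardless of the record count.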

Writing CSV files causing JVM crash

In an existing project, users can download a report containing 10 million records. The process fetches the data from the database and writes it to CSV using the Super CSV Java API, then emails the file to the user as an attachment. Holding 10 million Java objects in memory while writing them to CSV takes a huge amount of heap space, and because the application has many reports like this, the server crashes and goes down. Is there a better way to handle this? I read the SXSSFWorkbook documentation; it says a specified number of records can be kept in memory while the remaining records are pushed to disk, but that is for creating Excel files. Is there a similar API for creating CSV files, or can SXSSFWorkbook be used to create CSV files?
There are a few Java libraries for reading and writing CSV files. They typically support streaming, so they do not have the problem of needing to hold the source data or the generated CSV in memory.
The Apache Commons CSV library would be a good place to start. Here is the User Guide. It supports various flavors of CSV file, including the CSV formats generated by Microsoft Excel.
However, I would suggest that emailing a CSV file containing 10 million records (say, 1 GB of uncompressed data) is not going to make you popular with the people who run your users' email servers! Files that size should be made available via a web or file transfer service.
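To make that concrete, a minimal streaming write with Commons CSV might look like this; the row source is hypothetical, and any lazy iterator or JDBC cursor works, since each record is written straight to disk:

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import org.apache.commons.csv.CSVFormat;
    import org.apache.commons.csv.CSVPrinter;

    void writeReport(Iterable<String[]> rows) throws IOException {  // rows: hypothetical lazy source
        try (BufferedWriter writer = Files.newBufferedWriter(Paths.get("report.csv"));
             CSVPrinter printer = new CSVPrinter(writer, CSVFormat.EXCEL)) {
            printer.printRecord("id", "name", "amount");            // header row
            for (String[] row : rows) {
                printer.printRecord((Object[]) row);                // written out immediately
            }
        }
    }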

POI XSSF and SXSSF

I am writing an application that uses Apache POI (Java) to read over 65,536 records from an Oracle table and write them to an Excel .XLSX file via the XSSF and SXSSF APIs.
However, an "out of memory" error pops up. We have tried the following approaches, but the problem still occurs.
1. Load all 66,000 records into memory and write them to the .XLSX file
   a. A larger heap size (java.exe -Xms4096m -Xmx4096m -XX:MaxPermSize=256m) is applied.
   Result: it takes about an hour to run and the file cannot be created.
2. Process the 66,000 records by region
   a. Load each region's records for a season from the Oracle table into memory.
   b. Append each region's records from memory to the single .XLSX file.
   c. The same larger heap size (java.exe -Xms4096m -Xmx4096m -XX:MaxPermSize=256m) is applied.
   Result: when about 30,000 records have been processed, the Java out-of-memory error still pops up.
Apart from rewriting the program to generate the underlying XML directly, is there any way to fix the out-of-memory issue with Apache POI XSSF and SXSSF?
Please kindly advise.

Write OutputStream To Excel in chunks

I have a requirement to generate reports of around a million records and later export them to Excel. I am using JasperReports for this purpose. I am able to preview the report, but when I try to export it to Excel I get a heap memory error, even though I can get the entire dataset into the ByteArrayOutputStream object. After a lot of googling I came to the conclusion that this is common when the data is this huge, so I decided to write the data to Excel chunk by chunk. But how do I achieve that? How do I divide the data and write it to Excel part by part so that the final output file is a single consolidated one? Please suggest. Thanks in advance.
Have you tried using POI instead of JasperReports?
POI is an Apache library for Microsoft document formats. Link: http://poi.apache.org/
This library has a class called "SXSSFWorkbook" that uses streaming. Read the Javadoc for more information: http://poi.apache.org/apidocs/index.html
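One way to do the chunk-by-chunk writing with SXSSFWorkbook is to flush rows manually after each chunk. In this sketch, fetchChunk is a hypothetical method returning the next batch of rows, or null when the data is exhausted:

    import java.io.FileOutputStream;
    import java.util.List;
    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.xssf.streaming.SXSSFSheet;
    import org.apache.poi.xssf.streaming.SXSSFWorkbook;

    void exportInChunks() throws Exception {
        SXSSFWorkbook wb = new SXSSFWorkbook(-1);     // -1: flush only when told to
        try {
            SXSSFSheet sheet = wb.createSheet("report");
            int rownum = 0;
            List<String[]> chunk;
            while ((chunk = fetchChunk()) != null) {  // hypothetical chunked data source
                for (String[] values : chunk) {
                    Row row = sheet.createRow(rownum++);
                    for (int c = 0; c < values.length; c++) {
                        row.createCell(c).setCellValue(values[c]);
                    }
                }
                sheet.flushRows();                    // move the finished chunk off the heap
            }
            try (FileOutputStream out = new FileOutputStream("report.xlsx")) {
                wb.write(out);                        // one consolidated output file
            }
        } finally {
            wb.dispose();
        }
    }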

Reading large xls files (7-8 sheets, each with 30-40,000 rows) in Java

Is there a way to read large xls files?
I have used Apache POI to read files, but only up to a certain size.
I have a database that already holds some data, and now I want to upload a large file (as described above). One of this file's sheets contains slightly more data than the database has. How should I update that data in the Oracle database without re-importing the entire xls file?
For writing large files you should use SXSSF, the streaming extension of XSSF. For reading them, note that SXSSF is write-only; POI's event-based (SAX) reading API is the streaming counterpart and should handle your file sizes without memory problems.
As for the database problem: I'd suggest you read the xls files and create temporary staging tables for each file in the database (depending on your needs). Once all the rows are stored in the staging tables, you can easily merge them with your existing data, as sketched below.
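The merge itself can then be a single statement in the database, e.g. Oracle's MERGE; all table and column names here are made up for illustration:

    import java.sql.Connection;
    import java.sql.Statement;
    import javax.sql.DataSource;

    void mergeStagedRows(DataSource dataSource) throws Exception {
        try (Connection con = dataSource.getConnection();
             Statement st = con.createStatement()) {
            // Hypothetical schema: upsert the staged xls rows into the live table.
            st.executeUpdate(
                "MERGE INTO target_table t " +
                "USING staging_table s ON (t.id = s.id) " +
                "WHEN MATCHED THEN UPDATE SET t.name = s.name " +
                "WHEN NOT MATCHED THEN INSERT (id, name) VALUES (s.id, s.name)");
        }
    }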
