I need to write 600-700k records into an xlsx file using Apache POI.
The code I am presently using is:
public void writeRecords(ResultSet rs) {
    try {
        SXSSFWorkbook wb = new SXSSFWorkbook();
        wb.setCompressTempFiles(true);
        SXSSFSheet sh = (SXSSFSheet) wb.createSheet("Sheet 1");
        sh.setRandomAccessWindowSize(100); // keep 100 rows in memory; exceeding rows are flushed to disk
        ResultSetMetaData rsmd = rs.getMetaData();
        int numColumns = rsmd.getColumnCount();
        Row heading = sh.createRow(1);
        for (int x = 0; x < numColumns; x++) {
            Cell cell = heading.createCell(x + 1);
            cell.setCellValue(rsmd.getColumnLabel(x + 1));
        }
        int rowNumber = 2;
        while (rs.next()) {
            Row row = sh.createRow(rowNumber);
            for (int y = 0; y < numColumns; y++) {
                row.createCell(y + 1).setCellValue(rs.getString(y + 1));
            }
            rowNumber++;
        }
        FileOutputStream out = new FileOutputStream("C:/Users/test1.xlsx");
        wb.write(out);
        out.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
It is working fine, but it takes ~50 minutes to write ~65k records.
The ResultSet of 65k records was fetched in 5-6 minutes.
Is there any way to write 600,000-700,000 records in about 10-15 minutes using POI?
We won't be able to export the data in CSV format, as the end users have setups that import xlsx files only.
Check the fetchSize of the PreparedStatement. If it isn't explicitly set, the value may be very small compared with the size of the table, and the speed of queries over medium-to-large amounts of data can be severely affected.
Check this question for more information.
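As a minimal sketch (the connection, table name, and value of 1000 are placeholders; the optimal fetch size depends on row size and network latency):
PreparedStatement ps = connection.prepareStatement("SELECT * FROM my_table");
ps.setFetchSize(1000); // fetch 1000 rows per round trip; some drivers (e.g. Oracle) default to as few as 10
ResultSet rs = ps.executeQuery();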
Also, consider whether setCompressTempFiles, or SXSSFWorkbook at all, is really necessary. If it is needed, the number of rows kept in memory will affect performance in a directly proportional way.
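For example, a larger sliding window (500 here is an arbitrary value) trades heap for fewer flushes to the temp file:
// Keep 500 rows in memory instead of 100: fewer disk flushes, more heap used.
SXSSFWorkbook wb = new SXSSFWorkbook(500);
// Skipping gzip compression of the temp files also saves CPU, if disk space allows.
wb.setCompressTempFiles(false);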
It would be very fast if you are able to write the file output from SQL*Plus.
Create a file as below, mycsv.sql:
SET DEFINE OFF
SET ECHO OFF
SET SERVEROUTPUT OFF
SET TERMOUT OFF
SET VERIFY OFF
SET FEEDBACK OFF
SET PAGESIZE 10000
SET ARRAYSIZE 5000
REM SET HEAD OFF
SET LINE 500
spool /tmp/mycsvfile.csv;
select * from MY_table;
spool off;
exit;
and from the Linux prompt you can run it like:
$> sqlplus username/password @/tmp/mycsv.sql
Related
I am facing an issue while writing data to an Excel file.
I am using the Apache POI 4.1.2 library.
Below is the sample code.
try {
outputStream = new FileOutputStream(EXCEL_FILE_PATH);
} catch (Exception e) {
System.out.println("Exception While writing excel file " + e.getMessage());
}
Workbook workbook = new HSSFWorkbook();
Sheet sheet = workbook.createSheet("FileCompare");
Row row = sheet.createRow(0);
Cell cellfilename = row.createCell(0);
cellfilename.setCellValue("File Name");
Cell cellfilename1 = row.createCell(1);
cellfilename1.setCellValue("Difference in File 1");
Cell cellfilenam2 = row.createCell(2);
cellfilenam2.setCellValue("Difference in File 2");
for (int diffcol = 1; diffcol < 3; diffcol++) {
for (int i = 1; i < 57; i++) {
Row rows = sheet.createRow(i);
// System.out.println("Difference Coln number " + diffcol);
Cell diffcell = rows.createCell(diffcol);
diffcell.setCellValue("abacds");
/*
* Cell diffcell2 = row.createCell(2); diffcell2.setCellValue("abacds");
*/
}
}
try {
workbook.write(outputStream);
} catch (Exception e) {
e.printStackTrace();
} finally {
outputStream.flush();
outputStream.close();
workbook.close();
}
With this, only the last column's cells get saved in the Excel file; the previous cells are left blank.
Kindly help and let me know if I am doing something wrong.
I'm not sure about the actual API, but I think your inner loop should create the columns and your outer one should create the rows, like this:
for (int row = 1; row < 57; row++) {
    Row rows = sheet.createRow(row);
    for (int diffCol = 1; diffCol < 3; diffCol++) {
        Cell diffcell = rows.createCell(diffCol);
        diffcell.setCellValue("abacds");
    }
}
The problem is that inside your loop you're always using sheet.createRow(i) to retrieve the row you need, but as the docs say (docs that are not written very clearly, actually), this method always creates a brand-new, empty row, deleting the existing one if a row was already present at that i-position.
It means that each iteration of your outer loop is actually deleting the previously filled rows and creating brand-new ones: at the end, only the rows created by the last iteration survive!
To solve your problem, use sheet.createRow(i) only once, to create the row at the i-position, and from then on use sheet.getRow(i) to retrieve it.
So replace (in your code) the following wrong line
Row rows = sheet.createRow(i);
with the following code
Row row = sheet.getRow(i);
if (row == null) row = sheet.createRow(i);
where a new row is created only if it does not already exist!
And you're done, it works like a charm!
I am using the Java Apache POI library. I have to write data to an Excel file in chunks. It is within my application's constraints that I cannot write all the data to the Excel file at once, so a batch size is fixed and the data is written in batches (chunks). I am using the following code.
XSSFWorkbook workbook = new XSSFWorkbook();
XSSFSheet sheet = workbook.createSheet("sheet");
int rowNum = startIndex;
Row excelRow = sheet.createRow(rowNum++);
int colNum = 1;
// Placing matrix results in the rest of the rows; also keywords in the first column.
for(int rowIndex = 0; rowIndex < keywords.size(); rowIndex++) {
excelRow = sheet.createRow(rowNum++);
colNum = 0;
Cell cell = excelRow.createCell(colNum);
cell.setCellValue(keywords.get(rowIndex));
colNum++;
for(int colIndex = 0; colIndex < scoreResults[rowIndex].length; colIndex++) {
cell = excelRow.createCell(colNum);
cell.setCellValue(scoreResults[rowIndex][colIndex]);
colNum++;
}
}
FileOutputStream outputStream = new FileOutputStream(outputExcelFileName,true);
workbook.write(outputStream);
outputStream.close();
workbook.close();
This is written in a function and I have to call that function again and again. If my element count is the same as my batch size, there is no issue: the file is created and opened successfully. The problem comes when, let's say, the batch size is 10 and there are 15 elements. Then the 2nd iteration does not happen successfully. I am not getting any error at run time, but when I open the Excel file, MS Excel (2010) reports this error:
Excel found unreadable content in 'file_name'. Do you want to recover the contents of this workbook?
If I click "Yes", it recovers the contents of the first iteration only. If the batch size is 10, then only 10 elements are recovered. So the issue appears after the 1st iteration.
I have spent a lot of time figuring out this issue but am still unable to resolve it. If someone can help, I will be thankful.
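For context, a likely culprit is the append-mode FileOutputStream: each workbook.write produces a complete .xlsx (zip) archive, so appending a second archive to the first corrupts the file. A minimal sketch of one alternative, re-opening the existing workbook each batch and rewriting the whole file (the method name and batch parameter are hypothetical; the sheet name follows the code above):
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.List;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public static void writeBatch(String fileName, List<String> batch) throws Exception {
    File file = new File(fileName);
    // Re-open the existing workbook instead of appending raw bytes to the file.
    Workbook workbook = file.exists()
            ? WorkbookFactory.create(new FileInputStream(file))
            : new XSSFWorkbook();
    Sheet sheet = workbook.getSheet("sheet");
    if (sheet == null) {
        sheet = workbook.createSheet("sheet");
    }
    int rowNum = sheet.getLastRowNum() + 1; // continue after the last written row
    for (String keyword : batch) {
        Row row = sheet.createRow(rowNum++);
        row.createCell(0).setCellValue(keyword);
    }
    // Overwrite (not append): write one complete, updated workbook.
    try (FileOutputStream out = new FileOutputStream(file)) {
        workbook.write(out);
    }
    workbook.close();
}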
What is the fastest and least memory-intensive way to read a portion of a very large xlsx file?
Currently I have this code:
FileInputStream fis = new FileInputStream("D:/verylargefile.xlsx");
XSSFWorkbook workbook = new XSSFWorkbook(fis);
XSSFSheet sheet = workbook.getSheetAt(0);
int r = sheet.getPhysicalNumberOfRows();
int c = sheet.getRow(1).getLastCellNum();
for (int row = 1; row < r; row++) {
    for (int cell = 1; cell < c; cell++) {
        int cellvalue = (int) sheet.getRow(row).getCell(cell).getNumericCellValue();
        // do some simple math op with that cell or several cells
    }
}
So I need to do a very large number of those simple math operations (for example, the average of every 5 cells in every row, or something similar) very fast, on a small part of a very large xlsx file at a time. With the code above, I am getting a heap space error with a 10 MB xlsx file and 1 GB of RAM dedicated to the Java VM (-Xms1000M).
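Since the whole XSSF object model must fit in the heap, one lower-memory option is POI's event (streaming) user model. A minimal sketch, assuming POI 4.x, the file path from above, and purely numeric cells; it prints a per-row average without materializing the workbook:
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.usermodel.XSSFComment;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class StreamingRead {
    public static void main(String[] args) throws Exception {
        try (OPCPackage pkg = OPCPackage.open("D:/verylargefile.xlsx")) {
            XSSFReader reader = new XSSFReader(pkg);
            // Receives SAX events per row/cell; only one row is "live" at a time.
            XSSFSheetXMLHandler.SheetContentsHandler rowMath =
                    new XSSFSheetXMLHandler.SheetContentsHandler() {
                        private double sum;
                        private int count;
                        public void startRow(int rowNum) { sum = 0; count = 0; }
                        public void endRow(int rowNum) {
                            if (count > 0) {
                                System.out.println("row " + rowNum + " avg = " + sum / count);
                            }
                        }
                        public void cell(String ref, String value, XSSFComment comment) {
                            sum += Double.parseDouble(value); // assumes numeric cells
                            count++;
                        }
                        public void headerFooter(String text, boolean isHeader, String tag) { }
                    };
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // the handler matches elements by namespace
            XMLReader parser = factory.newSAXParser().getXMLReader();
            parser.setContentHandler(new XSSFSheetXMLHandler(
                    reader.getStylesTable(), reader.getSharedStringsTable(),
                    rowMath, new DataFormatter(), false));
            try (InputStream firstSheet = reader.getSheetsData().next()) { // first sheet only
                parser.parse(new InputSource(firstSheet));
            }
        }
    }
}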
I need to write a result set of more than 1,000,000 rows to an xls file (Microsoft 97-XP), so I am trying to use the JExcel API, but I get the error OutOfMemoryError: Java heap space. How do I solve this problem, besides increasing the JVM memory or creating different Excel files and then merging them manually? I don't even want to create a CSV file. Please help.
Sample Code:
int sheetNumber = 1;
int maxSheetSize = 65000;
int start = 0;
int end = maxSheetSize;
int totalSize = 1000000;
int completed = 0;
int columnCount = 10;
WritableWorkbook workbook = Workbook.createWorkbook(new File("output.xls"));
while (completed < totalSize) {
    WritableSheet sheet = workbook.createSheet("Sheet " + (sheetNumber++), 0);
    for (int r = start; r < end; r++) {
        int rowInSheet = r - start; // row index relative to the current sheet
        for (int c = 0; c < columnCount; c++) {
            Label label = new Label(c, rowInSheet, "data from resultset");
            sheet.addCell(label);
        }
    }
    completed += (end - start);
    start = end;
    end = Math.min(start + maxSheetSize, totalSize);
}
workbook.write();
workbook.close();
Those old versions of the Excel format only support 65,536 rows by 256 columns.
"All existing Java APIs try to build the whole document in RAM at once. Try to write an XML file which conforms to the new xlsx file format instead. To get you started, I suggest to build a small file in the desired form in Excel and save it. Then open it and examine the structure and replace the parts you want." Aaron Digulla - API to write huge excel files using java
Up until Excel 2003, the maximum number of rows is 65,536.
Even if you use later versions, I would recommend that you re-open the worksheet and append, say, a maximum number of rows at a time (maybe 10,000). Close, open, and repeat, as in the sketch below.
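A minimal sketch of that close/open/append cycle with the JExcel API (the method name, file, and sheet layout are assumptions; note the copy still has to fit in memory, so this only limits how much is built up per pass):
import java.io.File;
import jxl.Workbook;
import jxl.write.Label;
import jxl.write.WritableSheet;
import jxl.write.WritableWorkbook;

public static void appendRows(File file, int firstRow, int rowCount) throws Exception {
    // JExcel cannot modify a file in place: read the existing workbook,
    // create a writable copy, append to the copy, then write it back out.
    Workbook existing = Workbook.getWorkbook(file);
    WritableWorkbook copy = Workbook.createWorkbook(file, existing);
    WritableSheet sheet = copy.getSheet(0);
    for (int r = firstRow; r < firstRow + rowCount; r++) {
        sheet.addCell(new Label(0, r, "data from resultset"));
    }
    copy.write();
    copy.close();
    existing.close();
}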
According to the SXSSF (Streaming Usermodel API) documentation:
SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.
However, in the provided example the flush happens before the workbook is given the file location at which to write the file.
public static void main(String[] args) throws Throwable {
Workbook wb = new SXSSFWorkbook(100); // keep 100 rows in memory, exceeding rows will be flushed to disk
Sheet sh = wb.createSheet();
for(int rownum = 0; rownum < 1000; rownum++){
Row row = sh.createRow(rownum);
for(int cellnum = 0; cellnum < 10; cellnum++){
Cell cell = row.createCell(cellnum);
String address = new CellReference(cell).formatAsString();
cell.setCellValue(address);
}
}
// Rows with rownum < 900 are flushed and not accessible
for(int rownum = 0; rownum < 900; rownum++){
Assert.assertNull(sh.getRow(rownum));
}
// the last 100 rows are still in memory
for(int rownum = 900; rownum < 1000; rownum++){
Assert.assertNotNull(sh.getRow(rownum));
}
FileOutputStream out = new FileOutputStream("/temp/sxssf.xlsx");
wb.write(out);
out.close();
}
So this raises the following questions:
Where on the file system is it storing the data?
Is it just creating a temp file in the default temp directory?
Is this safe for all / most implementations?
The class that does the buffering is SheetDataWriter in org.apache.poi.xssf.streaming.SXSSFSheet
The magic line you're probably interested in is:
_fd = File.createTempFile("poi-sxxsf-sheet", ".xml");
In terms of whether that is safe: probably, but not certainly... It's likely worth opening a bug in the POI Bugzilla and requesting it be switched to using org.apache.poi.util.TempFile, which allows a bit more control. In general, though, as long as you specify a valid property for java.io.tmpdir (or the default is sensible for you), you should be fine.
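For example, a minimal sketch (the directory is a hypothetical placeholder) of redirecting those temp files:
// File.createTempFile consults java.io.tmpdir, so pointing it at a disk with
// enough free space controls where SXSSF buffers its sheet XML.
System.setProperty("java.io.tmpdir", "/data/poi-temp"); // hypothetical path; must already exist
SXSSFWorkbook wb = new SXSSFWorkbook(100); // temp files now land under /data/poi-temp
Since the JVM may cache the temp directory on first use, it is more reliable to pass -Djava.io.tmpdir=/data/poi-temp on the java command line so the setting takes effect before any library code runs.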