I am using the Java Apache POI library. I have to write data to an Excel file in chunks; it is part of my application's scope that I cannot write all the data to the Excel file at once, so a batch size is fixed and the data is written in batches (chunks). I am using the following code.
XSSFWorkbook workbook = new XSSFWorkbook();
XSSFSheet sheet = workbook.createSheet("sheet");
int rowNum = startIndex;
Row excelRow = sheet.createRow(rowNum++);
int colNum = 1;
// Placing matrix results in the rest of the rows; keywords go in the first column.
for (int rowIndex = 0; rowIndex < keywords.size(); rowIndex++) {
    excelRow = sheet.createRow(rowNum++);
    colNum = 0;
    Cell cell = excelRow.createCell(colNum);
    cell.setCellValue(keywords.get(rowIndex));
    colNum++;
    for (int colIndex = 0; colIndex < scoreResults[rowIndex].length; colIndex++) {
        cell = excelRow.createCell(colNum);
        cell.setCellValue(scoreResults[rowIndex][colIndex]);
        colNum++;
    }
}
FileOutputStream outputStream = new FileOutputStream(outputExcelFileName, true);
workbook.write(outputStream);
outputStream.close();
workbook.close();
This code lives in a function that I have to call again and again. If the number of elements equals the batch size, there is no issue: the file is created and opens successfully. The problem comes when, say, the batch size is 10 and there are 15 elements. Then the second iteration does not complete successfully. I don't get any error at run time, but when I open the Excel file, MS Excel (2010) reports this error:
Excel found unreadable content in 'file_name'. Do you want to recover the contents of this workbook?
If I click "Yes", it recovers the contents of the first iteration only. If the batch size is 10, then only 10 elements are recovered. So the issue occurs after the 1st iteration.
I have spent a lot of time trying to figure this out but am still unable to resolve it. Any help would be appreciated.
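The likely culprit is the second argument in new FileOutputStream(outputExcelFileName, true): true opens the stream in append mode, so every batch after the first appends a second complete workbook (a whole ZIP archive) to the end of the file, which is exactly the kind of content Excel flags as unreadable. Below is a minimal, untested sketch of one way around this, reusing the names from the question (it assumes keywords is a List<String> and scoreResults is a double[][]): re-open the existing file into a workbook for each batch, append the rows, and overwrite the file instead of appending to it.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class BatchWriter {
    // Sketch: append one batch, rewriting the whole file each time (no append-mode stream).
    static void writeBatch(String outputExcelFileName, List<String> keywords,
                           double[][] scoreResults) throws IOException {
        File target = new File(outputExcelFileName);
        XSSFWorkbook workbook;
        XSSFSheet sheet;
        if (target.exists()) { // batches after the first: load what is already there
            try (FileInputStream in = new FileInputStream(target)) {
                workbook = new XSSFWorkbook(in);
            }
            sheet = workbook.getSheetAt(0);
        } else {               // first batch: start a fresh workbook
            workbook = new XSSFWorkbook();
            sheet = workbook.createSheet("sheet");
        }
        int rowNum = sheet.getLastRowNum() + 1; // continue below the existing rows
        for (int rowIndex = 0; rowIndex < keywords.size(); rowIndex++) {
            Row excelRow = sheet.createRow(rowNum++);
            excelRow.createCell(0).setCellValue(keywords.get(rowIndex));
            for (int colIndex = 0; colIndex < scoreResults[rowIndex].length; colIndex++) {
                excelRow.createCell(colIndex + 1).setCellValue(scoreResults[rowIndex][colIndex]);
            }
        }
        try (FileOutputStream out = new FileOutputStream(target)) { // overwrite, don't append
            workbook.write(out);
        }
        workbook.close();
    }
}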
Related
What is the fastest and least memory-intensive way to read a portion of a very large xlsx file?
Currently I have this code:
FileInputStream fis = new FileInputStream("D:/verylargefile.xlsx");
XSSFWorkbook workbook = new XSSFWorkbook(fis);
XSSFSheet sheet = workbook.getSheetAt(0);
int r = sheet.getPhysicalNumberOfRows();
int c = sheet.getRow(1).getLastCellNum();
for (int row = 1; row < r; row++) {
    for (int cell = 1; cell < c; cell++) {
        int cellValue = (int) sheet.getRow(row).getCell(cell).getNumericCellValue();
        // do some simple math op with that cell or several cells
    }
}
So I need to do a very large number of those simple math operations (for example, the average of every 5 cells in every row, or something similar), and very fast, with a small part of a very large xlsx file at a time. With the code above, I get a heap space error with a 10 MB xlsx file and 1 GB of RAM dedicated to the Java VM (-Xms1000M).
Thank you
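For what it's worth, the usermodel classes above (XSSFWorkbook/XSSFSheet) build the whole workbook in memory, which is why a 10 MB xlsx (heavily compressed XML) can exhaust a 1 GB heap. POI also ships an event-based reading API that streams the sheet XML instead. A rough sketch follows, using class names from the POI 3.x line; the handler simply averages the numeric cells of each row as a stand-in for the real math:

import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.model.StylesTable;
import org.apache.poi.xssf.usermodel.XSSFComment;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class StreamingRowMath {
    public static void main(String[] args) throws Exception {
        OPCPackage pkg = OPCPackage.open("D:/verylargefile.xlsx", PackageAccess.READ);
        try {
            XSSFReader reader = new XSSFReader(pkg);
            ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(pkg);
            StylesTable styles = reader.getStylesTable();

            // The handler sees one cell at a time, so only the current row's state lives in memory.
            SheetContentsHandler rowMath = new SheetContentsHandler() {
                private double sum;
                private int count;
                public void startRow(int rowNum) { sum = 0; count = 0; }
                public void endRow(int rowNum) {
                    if (count > 0) System.out.println("row " + rowNum + " avg = " + (sum / count));
                }
                public void cell(String cellReference, String formattedValue, XSSFComment comment) {
                    try {
                        sum += Double.parseDouble(formattedValue);
                        count++;
                    } catch (NumberFormatException e) {
                        // not a numeric cell; skip it
                    }
                }
                public void headerFooter(String text, boolean isHeader, String tagName) { }
            };

            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();
            parser.setContentHandler(new XSSFSheetXMLHandler(styles, strings, rowMath, false));
            InputStream sheet = reader.getSheetsData().next(); // first sheet only
            try {
                parser.parse(new InputSource(sheet));
            } finally {
                sheet.close();
            }
        } finally {
            pkg.revert(); // package was opened read-only; discard rather than save
        }
    }
}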
So my Excel file is relatively small in size. It contains 8 sheets. Each sheet has "records" of data which I need to read. Each sheet also has the first row reserved for headers, which I skip; so my data begins on the 2nd row (1st index) of each sheet and ends on the last record.
Below is my code to iterate through the sheets and read each row; however, it fails to read each sheet, and I can't seem to figure out why. Please have a look; any suggestions will be appreciated.
Thanks!
FileInputStream fis = new FileInputStream(new File(filePath));
XSSFWorkbook wb = new XSSFWorkbook(fis);
DataFormatter formatter = new DataFormatter();
// iterate over sheets
for (int i = 0; i < NUM_OF_SHEETS; i++) {
    sheet = wb.getSheetAt(i);
    sheetName = sheet.getSheetName();
    // iterate over rows
    for (int j = 1; j <= lastRow; j++) { // 1st row (0-index) of each sheet is reserved for the headings, which I do not need
        row = sheet.getRow(j);
        if (row != null) {
            data[j-1][0] = sheetName; // 1st column (0th index) of each record in my 2D array is reserved for the sheet's name
            // iterate over cells
            for (int k = 0; k < NUM_OF_COLUMNS; k++) {
                cell = row.getCell(k, XSSFRow.RETURN_BLANK_AS_NULL);
                cellValue = formatter.formatCellValue(cell); // convert cell to type String
                data[j-1][k+1] = cellValue;
            } // end of cell iteration
        }
    } // end of row iteration
} // end of sheet iteration
wb.close();
fis.close();
There is at least one big logical error. Since you are putting the data of all sheets into one array, the array must be dimensioned like:
String[][] data = new String[lastRow*NUM_OF_SHEETS][NUM_OF_COLUMNS+1];
And then the allocations must be like:
...
data[(j-1)+(i*lastRow)][0] = sheetName; //1st column or 0th-index of each record in my 2d array is reserved for the sheet's name.
...
and
...
data[(j-1)+(i*lastRow)][k+1] = cellValue;
...
With your code, the allocations from the second sheet will overwrite the ones from the first sheet, since j starts at 1 for every sheet.
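Put together, the corrected iteration could look like this sketch (it keeps the names from the question and, like the original code, assumes every sheet has the same lastRow):

String[][] data = new String[lastRow * NUM_OF_SHEETS][NUM_OF_COLUMNS + 1];
for (int i = 0; i < NUM_OF_SHEETS; i++) {
    XSSFSheet sheet = wb.getSheetAt(i);
    String sheetName = sheet.getSheetName();
    for (int j = 1; j <= lastRow; j++) {
        XSSFRow row = sheet.getRow(j);
        if (row != null) {
            int flatIndex = (j - 1) + (i * lastRow); // unique row index across all sheets
            data[flatIndex][0] = sheetName;
            for (int k = 0; k < NUM_OF_COLUMNS; k++) {
                Cell cell = row.getCell(k, XSSFRow.RETURN_BLANK_AS_NULL);
                data[flatIndex][k + 1] = formatter.formatCellValue(cell);
            }
        }
    }
}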
I'm developing an app in Java using POI which writes data into an Excel sheet.
I want to write a new row whenever I have new data (in order to help the user follow the data).
I don't want to close and reopen the Excel file each time I have new data to write.
The initialization code is:
FileOutputStream fileOut = new FileOutputStream(new File(excellFileName));
HSSFWorkbook workbook = new HSSFWorkbook();
HSSFSheet sheet = workbook.createSheet("FirstSheet");
rowNum = 0;
HSSFRow rowhead = sheet.createRow((short)rowNum);
createRowColumns(rowhead, rowNum);
workbook.write(fileOut);
rowNum++;
Each time I have new data, I use this code:
HSSFRow rowMsg = sheet.createRow((short)rowNum);
createRowColumns(rowMsg, rowNum);
workbook.write(fileOut);
rowNum++;
(The createRowColumns method sets the data, in separate cells, in the new rowMsg.)
The problem is that I can't see any rows in the Excel file except the first row (rowhead, row #0).
What am I missing?
(Note that I don't want to close and reopen the file each time I have data to write.)
Thanks
To get all rows into the file, you have to write the workbook to the output stream only once, after all rows have been created. So you need to update your code as follows:
for (int index = 0; index < rowNum; index++) {
    HSSFRow rowMsg = sheet.createRow((short) index);
    createRowColumns(rowMsg, index);
}
workbook.write(fileOut);
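If the file really must be updated as each row arrives rather than once at the end, one common workaround (sketched below with the names from the question) is to keep the workbook in memory and rewrite the whole file with a fresh stream after each new row; HSSF cannot append rows to a stream that has already been written:

// Sketch: add one row, then rewrite the entire file with a fresh stream.
HSSFRow rowMsg = sheet.createRow((short) rowNum);
createRowColumns(rowMsg, rowNum);
rowNum++;
try (FileOutputStream fileOut = new FileOutputStream(excellFileName)) {
    workbook.write(fileOut); // full rewrite each time; never reuse the old stream
}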
I need to write a result set of more than 1,000,000 rows to an xls file (Microsoft 97-XP). So I am trying to use the JExcel API, but I get the error OutOfMemoryError: Java heap space. How do I solve this problem, short of increasing the JVM memory or creating several Excel files and then merging them manually? I don't want to create a CSV file either. Please help.
Sample Code:
int sheetNumber = 1;
int maxSheetSize = 65000;
int totalSize = 1000000;
int columnCount = 10;
int start = 0;
WritableWorkbook workbook = Workbook.createWorkbook(new File("output.xls"));
while (start < totalSize) {
    int end = Math.min(start + maxSheetSize, totalSize);
    WritableSheet sheet = workbook.createSheet("Sheet " + (sheetNumber++), 0);
    for (int r = start; r < end; r++) {
        for (int c = 0; c < columnCount; c++) {
            Label label = new Label(c, r - start, "data from resultset");
            sheet.addCell(label);
        }
    }
    start = end;
}
workbook.write();
workbook.close();
Those old versions of the Excel format support only 65,536 rows by 256 columns.
"All existing Java APIs try to build the whole document in RAM at once. Try to write an XML file which conforms to the new xslx file format instead. To get you started, I suggest to build a small file in the desired form in Excel and save it. Then open it and examine the structure and replace the parts you want." Aaron Digulla - API to write huge excel files using java
Up until Excel 2003, the maximum number of rows is 65,536.
Even if you use later versions, I would recommend that you re-open the workbook and append, say, a maximum number of rows at a time (maybe 10,000). Close, open, and repeat, as sketched below.
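A rough sketch of that open-append-close cycle with JExcel (the helper name and batch shape are made up for illustration; note that Workbook.createWorkbook(file, existing) reads the existing contents back into memory, so this bounds only how much new data is buffered per pass, not the size of the file itself):

import java.io.File;
import jxl.Workbook;
import jxl.write.Label;
import jxl.write.WritableSheet;
import jxl.write.WritableWorkbook;

public class AppendChunks {
    // Append one chunk of rows to sheet 0 of an existing .xls file, then close it again.
    static void appendChunk(File file, String[][] chunk, int firstRow) throws Exception {
        Workbook existing = Workbook.getWorkbook(file);                  // open
        WritableWorkbook copy = Workbook.createWorkbook(file, existing); // writable copy
        WritableSheet sheet = copy.getSheet(0);
        for (int r = 0; r < chunk.length; r++) {
            for (int c = 0; c < chunk[r].length; c++) {
                sheet.addCell(new Label(c, firstRow + r, chunk[r][c]));
            }
        }
        copy.write();      // save
        copy.close();      // close
        existing.close();  // repeat with the next chunk
    }
}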
According to the SXSSF (Streaming Usermodel API) documentation:
SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.
However, in the provided example the flush happens before the workbook is given the file location at which to write the file.
public static void main(String[] args) throws Throwable {
    Workbook wb = new SXSSFWorkbook(100); // keep 100 rows in memory; exceeding rows will be flushed to disk
    Sheet sh = wb.createSheet();
    for (int rownum = 0; rownum < 1000; rownum++) {
        Row row = sh.createRow(rownum);
        for (int cellnum = 0; cellnum < 10; cellnum++) {
            Cell cell = row.createCell(cellnum);
            String address = new CellReference(cell).formatAsString();
            cell.setCellValue(address);
        }
    }
    // Rows with rownum < 900 are flushed and not accessible
    for (int rownum = 0; rownum < 900; rownum++) {
        Assert.assertNull(sh.getRow(rownum));
    }
    // the last 100 rows are still in memory
    for (int rownum = 900; rownum < 1000; rownum++) {
        Assert.assertNotNull(sh.getRow(rownum));
    }
    FileOutputStream out = new FileOutputStream("/temp/sxssf.xlsx");
    wb.write(out);
    out.close();
}
So this raises the questions:
Where on the file system is it storing the data?
Is it just creating a temp file in the default temp directory?
Is this safe for all / most implementations?
The class that does the buffering is SheetDataWriter in org.apache.poi.xssf.streaming.SXSSFSheet
The magic line you're probably interested in is:
_fd = File.createTempFile("poi-sxxsf-sheet", ".xml");
In terms of is that safe, probably, but not certainly... It's likely worth opening a bug in the poi bugzilla, and requesting it be switched to using org.apache.poi.util.TempFile which allows a bit more control. In general though, as long as you specify a valid property for java.io.tmpdir (or the default is sensible for you) you should be fine