Maybe "writing" wasn't the correct word since in this function, I am just setting the cells and then writing afterwards.
I have pinpointed one function as the cause of the slowdown. When execution reaches it, the program spends over 10 minutes there before I terminate it.
This is the function that I am passing an output_wb to:
private static void buildRowsByListOfRows(int sheetNumber, ArrayList<Row> sheet, Workbook wb) {
    Sheet worksheet = wb.getSheetAt(sheetNumber);
    int lastRow;
    Row row;
    String cell_value;
    Cell cell;
    int x = 0;
    System.out.println("Size of array list: " + sheet.size());
    for (Row my_row : sheet) {
        try {
            lastRow = worksheet.getLastRowNum();
            row = worksheet.createRow(++lastRow);
            for (int i = 0; i < my_row.getLastCellNum(); i++) {
                cell_value = getCellContentAsString(my_row.getCell(i, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK));
                cell = row.createCell(i);
                cell.setCellValue(cell_value);
                System.out.println("setting row #: " + x + "with value =>" + cell_value);
            }
            x++;
        } catch (Exception e) {
            System.out.println("SOMETHING WENT WRONG");
            System.out.println(e);
        }
    }
}
The size of the ArrayList is 73,835, and each row has 70 columns. It starts off running pretty fast, but at around row 20,000 the print statements in the loop start coming further and further apart.
Is this function really written that poorly or is something else going on?
What can I do to optimize this?
In case it matters, I create the output workbook like this:
// Create output file with the required sheets
createOutputXLSFile(output_filename_path);
XSSFWorkbook output_wb = new XSSFWorkbook(new FileInputStream(output_filename_path));
And the createOutputXLSFile() looks like this:
private static void createOutputXLSFile(String output_filename_path) throws FileNotFoundException {
    try {
        // Directory path where the xls file will be created
        // Create object of FileOutputStream
        FileOutputStream fout = new FileOutputStream(output_filename_path);
        XSSFWorkbook wb = new XSSFWorkbook();
        wb.createSheet("Removed records");
        wb.createSheet("Added records");
        wb.createSheet("Updated records");
        // Build the Excel File
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        wb.write(outputStream);
        outputStream.writeTo(fout);
        outputStream.close();
        fout.close();
        wb.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

private static String getCellContentAsString(Cell cell) {
    DataFormatter fmt = new DataFormatter();
    String data = null;
    if (cell.getCellType() == CellType.STRING) {
        data = String.valueOf(cell.getStringCellValue());
    } else if (cell.getCellType() == CellType.NUMERIC) {
        data = String.valueOf(fmt.formatCellValue(cell));
    } else if (cell.getCellType() == CellType.BOOLEAN) {
        data = String.valueOf(fmt.formatCellValue(cell));
    } else if (cell.getCellType() == CellType.ERROR) {
        data = String.valueOf(cell.getErrorCellValue());
    } else if (cell.getCellType() == CellType.BLANK) {
        data = String.valueOf(cell.getStringCellValue());
    } else if (cell.getCellType() == CellType._NONE) {
        data = String.valueOf(cell.getStringCellValue());
    }
    return data;
}
Update #1 - The slowdown seems to happen here. If I comment out all three lines, the loop finishes:
cell_value = getCellContentAsString(my_row.getCell(i, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK));
cell = row.createCell(i);
cell.setCellValue(cell_value);
Update #2 - If I comment out these two lines, then the loop finishes as expected:
cell = row.createCell(i); // The problem
cell.setCellValue(cell_value);
So now I know the problem is row.createCell(i), but why? How can I optimize this?
I finally managed to resolve this issue. It turns out that writing with XSSF is just too slow when the files are large. SXSSF is POI's streaming variant of XSSF: it keeps only a sliding window of rows in memory and flushes older rows to a temporary file. So I converted the XSSF output workbook to an SXSSFWorkbook by passing my existing XSSFWorkbook to the SXSSFWorkbook constructor, like this:
// Create output file with the required sheets
createOutputXLSFile(output_filename_path);
XSSFWorkbook output_wb_temp = new XSSFWorkbook(new FileInputStream(output_filename_path));
SXSSFWorkbook output_wb = new SXSSFWorkbook(output_wb_temp);
The rest of the code works as is.
I am trying to read in each row that has data in the first cell into an ArrayList of Objects. My problem is that my code doesn't seem to be incrementing my counter past the first row. Am I missing something simple?
Code
try
{
wb = new XSSFWorkbook(new FileInputStream(fileName));
}
catch (FileNotFoundException e)
{
e.printStackTrace();
}
catch (IOException e)
{
e.printStackTrace();
}
XSSFSheet sheet = wb.getSheetAt(2);
ArrayList<Object> obj = new ArrayList<Object>();
int rowIndex = 0;
int cellIndex = 0;
XSSFRow row = sheet.getRow(rowIndex);
Iterator<Cell> rowItr = row.iterator();
while(rowIndex <= sheet.getLastRowNum())
{
if(row.getCell(0) == null)
{
continue;
}
else
{
while(rowItr.hasNext() && rowItr.next() != null)
{
XSSFCell cell = row.getCell(cellIndex);
if(cell == null)
{
continue;
}
else
{
obj.add(row.getCell(cellIndex).toString());
}
cellIndex++;
}
rowIndex++;
cellIndex = 0;
}
System.out.println(obj.toString());
}
rowIndex++;
}
}
Output
[ValuSmart Series 1120 Double Hung]
... I get this output 72 times since there are 72 rows in the sheet
Isolated Loop
ArrayList<Object> obj = new ArrayList<Object>();
int rowCounter = 16;
int x = 0;
while(rowCounter <= 21)
{
XSSFRow row = sheet.getRow(rowCounter);
Iterator<Cell> rowItr = row.iterator();
while(rowItr.hasNext() && rowItr.next() != null)
{
XSSFCell cell = row.getCell(x);
if(cell == null)
{
continue;
}
else
{
obj.add(row.getCell(x).toString());
}
x++;
}
rowCounter++;
x = 0;
}
System.out.println(obj.toString());
You're not selecting the next row anywhere, and your loops are confusing: they switch between index-based and iterator-based lookups. Try a simple enhanced for loop:
for (Row row : sheet) {
    for (Cell cell : row) {
        if (cell != null) {
            // Use the cell handed out by the loop; there is no index variable here
            obj.add(cell.toString());
        }
    }
}
System.out.println(obj.toString());
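If you do need an index-based loop (for example to process only a range of rows, as in the isolated loop), the same two points apply: fetch the row inside the loop, and make sure the counter advances on every path, including the one that hits continue. Here is a minimal sketch of that idea. It uses plain java.util lists as a stand-in for the sheet rather than POI, and the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IndexedRowScan {
    // Collects every non-null cell of each row whose first cell is non-null.
    // `sheet` is a hypothetical stand-in for an XSSFSheet: a null entry models
    // a missing row, a null element inside a row models a missing cell.
    static List<String> collect(List<List<String>> sheet) {
        List<String> out = new ArrayList<>();
        for (int rowIndex = 0; rowIndex < sheet.size(); rowIndex++) {
            List<String> row = sheet.get(rowIndex); // re-fetch the row on every pass
            if (row == null || row.isEmpty() || row.get(0) == null) {
                continue; // safe: the for header still advances rowIndex
            }
            for (String cell : row) {
                if (cell != null) {
                    out.add(cell);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> sheet = Arrays.asList(
                Arrays.asList("A1", "B1"),
                null,
                Arrays.asList(null, "B3"),
                Arrays.asList("A4", null, "C4"));
        System.out.println(collect(sheet)); // prints [A1, B1, A4, C4]
    }
}
```

Putting the increment in the for header is what keeps a continue from turning into an endless loop on the same row.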
I am writing a program to format the contents of an Excel file. Eclipse says that the line Cell cell = cellIterator.next(); is unreachable, and I don't understand why. Where did I go wrong?
private String formatExcel(File excel)
{
this.statusLabel.setText("formatting...");
try
{
FileInputStream file = new FileInputStream(excel);
try
{
this.workbook = WorkbookFactory.create(file);
}
catch (InvalidFormatException ex)
{
file.close();
}
int excelType = 0;
if ((this.workbook instanceof HSSFWorkbook)) {
excelType = 1;
}
int sheetNum = 0;
try
{
sheetNum = Integer.parseInt(this.sheetNumber.getText());
}
catch (NumberFormatException e)
{
file.close();
}
if ((sheetNum < 1) || (sheetNum > this.workbook.getNumberOfSheets()))
{
file.close();
return "Please input a valid sheet number.";
}
Sheet sheet = this.workbook.getSheetAt(sheetNum - 1);
sheet.setZoom(17, 20);
Iterator<Row> rowIterator = sheet.iterator();
int startRow = Integer.MAX_VALUE;
Iterator<Cell> cellIterator;
for (; rowIterator.hasNext(); cellIterator.hasNext())
{
Row row = (Row)rowIterator.next();
cellIterator = row.cellIterator();
continue;
Cell cell = (Cell)cellIterator.next(); // <- this line is unreachable
switch (cell.getCellType())
{
case 4:
break;
case 0:
break;
case 1:
if (cell.getStringCellValue().trim().equalsIgnoreCase("Condition Code")) {
startRow = cell.getRowIndex();
}
if ((cell.getRowIndex() > startRow + 1) && (cell.getColumnIndex() > 0) && (cell.getColumnIndex() < 5)) {
if (excelType == 0) {
cell.setCellValue(formatCellXSSF(
cell.getStringCellValue(),
cell.getColumnIndex()));
} else {
cell.setCellValue(formatCellHSSF(
cell.getStringCellValue(),
cell.getColumnIndex()));
}
}
if (!cell.getStringCellValue().trim().equalsIgnoreCase("<<Test Data>>")) {
if (!cell.getStringCellValue().trim().equalsIgnoreCase("<<Screenshots>>")) {
break;
}
}
break;
}
}
sheet.autoSizeColumn(5);
file.close();
FileOutputStream out = new FileOutputStream(excel);
this.workbook.write(out);
out.close();
return "";
}
catch (FileNotFoundException ex)
{
return "Error. File is open. Please close it first.";
}
catch (IOException ex) {}
return "Cannot format file because it is open. Please close it first.";
}
You have an unconditional continue in your for loop, so the statements after it can never be executed:
for (; rowIterator.hasNext(); cellIterator.hasNext())
{
    Row row = (Row)rowIterator.next();
    cellIterator = row.cellIterator();
    continue; // always jumps to the next iteration
    Cell cell = (Cell)cellIterator.next(); // never reached
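The for header in the question (rowIterator.hasNext() as the condition, cellIterator.hasNext() as the update) looks like a mangled pair of nested loops. Below is a minimal sketch of the shape that was probably intended. It assumes plain java.util lists as stand-ins for the POI sheet and rows, and the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class NestedIteration {
    // Visits every cell of every row. The lists are hypothetical stand-ins for
    // POI's Sheet and Row; only the loop structure is the point here.
    static List<String> visitAll(List<List<String>> sheet) {
        List<String> visited = new ArrayList<>();
        Iterator<List<String>> rowIterator = sheet.iterator();
        while (rowIterator.hasNext()) {
            List<String> row = rowIterator.next();
            Iterator<String> cellIterator = row.iterator();
            // An inner loop takes the place of the stray `continue`
            while (cellIterator.hasNext()) {
                String cell = cellIterator.next(); // reachable now
                visited.add(cell);
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        List<List<String>> sheet = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c"));
        System.out.println(visitAll(sheet)); // prints [a, b, c]
    }
}
```

With POI the same structure would use sheet.iterator() and row.cellIterator(), with the switch on the cell type placed inside the inner loop.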
I have a problem getting data from this function when I call it twice. The function returns an ArrayList of all rows fetched from an Excel sheet. The first time I call it, I get the correct number of rows (all rows except the header row and the row containing exit). The second time I call it, I get 0 rows.
It seems that something goes wrong with the file or the sheets the second time. Here is the code:
private static List<String[]> getDataFromXLS(String excelPath) {
FileInputStream fis;
Workbook workbook; Sheet sheet; XSSFRow row;
Iterator<Row> rows;
XSSFCell cell;
List<String[]> allExcelRows = new ArrayList<String[]>();
String[] xlsRow;
columnNames = new LinkedHashMap<Integer, String>();
paramNames = new LinkedHashMap<String, Integer>();
int totalColumnCount = 0;
int rowNumber = 1;
try {
fis = new FileInputStream(new File(excelPath));
workbook = WorkbookFactory.create(fis);
sheet = workbook.getSheet("TestData");
rows = sheet.rowIterator();
while (rows.hasNext()) {
row = ((XSSFRow) rows.next());
if (rowNumber == 1) {
//based on amount of parameters on first xls row
totalColumnCount = row.getLastCellNum();
}
xlsRow = new String[totalColumnCount];
//check which column is TestType
//iterate through all the columns
for (int columnNumber=0; columnNumber<totalColumnCount; columnNumber++) {
cell = row.getCell(columnNumber, Row.CREATE_NULL_AS_BLANK);
if (getCellValue(cell).trim().toLowerCase().trim().equals("testtype") ){
testTypeColumnIndex = columnNumber; //this is Testtype index
break;
}
}
if (rowNumber != 1) {
for(int columnNumber=0; columnNumber<totalColumnCount; columnNumber++) {
cell = row.getCell(columnNumber, Row.CREATE_NULL_AS_BLANK);
//read only rows before exit
if (columnNumber == testTypeColumnIndex && getCellValue(cell).trim().toLowerCase().trim().equals("exit") ){
reachedExit = true;
break;
}
xlsRow[columnNumber] = getCellValue(cell).trim();
}
//reached exit?
if (reachedExit) {
break;
}
allExcelRows.add(xlsRow);
} else {
//save column names into map
for(int columnNumber=0; columnNumber<totalColumnCount; columnNumber++) {
cell = row.getCell(columnNumber, Row.CREATE_NULL_AS_BLANK);
columnNames.put(columnNumber, getCellValue(cell).trim());
paramNames.put(getCellValue(cell).trim(), columnNumber);
}
}
rowNumber++;
}
} catch (Exception e) {
e.printStackTrace();
}
fis.close();
return allExcelRows;
}
I'm guessing a bit here, but I think the problem is that the class-level boolean reachedExit is not reset at the start of the method. So when you call the method a second time, this block executes right away:
//reached exit?
if (reachedExit) {
break;
}
...which means that nothing gets added to allExcelRows.
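To illustrate the effect without POI, here is a small sketch with a list of strings standing in for the sheet rows. The class, the method names and the data are all hypothetical; only the reachedExit flag mirrors the question:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExitFlagDemo {
    static boolean reachedExit = false; // class-level flag, as in the question

    // Buggy variant: the flag is never reset, so a second call stops at once.
    static List<String> readBuggy(List<String> rows) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            if (row.equals("exit")) {
                reachedExit = true;
            }
            if (reachedExit) {
                break;
            }
            out.add(row);
        }
        return out;
    }

    // Fixed variant: the flag is a local variable, so every call starts fresh.
    static List<String> readFixed(List<String> rows) {
        List<String> out = new ArrayList<>();
        boolean exitSeen = false;
        for (String row : rows) {
            if (row.equals("exit")) {
                exitSeen = true;
            }
            if (exitSeen) {
                break;
            }
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "exit", "c");
        System.out.println(readBuggy(rows).size()); // prints 2
        System.out.println(readBuggy(rows).size()); // prints 0
        System.out.println(readFixed(rows).size()); // prints 2
        System.out.println(readFixed(rows).size()); // prints 2
    }
}
```

In the original method, either setting reachedExit = false at the top of getDataFromXLS or turning it into a local variable should have the same effect.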
I'm using the Apache POI HSSF library to import data into my application. The problem is that the files have some extra/empty rows that need to be removed before parsing.
There is no HSSFSheet.removeRow( int rowNum ) method, only removeRow( HSSFRow row ). The problem with this is that empty rows can't be removed. For example:
sheet.removeRow( sheet.getRow(rowNum) );
gives a NullPointerException on empty rows because getRow() returns null.
Also, as I read on forums, removeRow() only erases the cell contents but the row is still there as an empty row.
Is there a way of removing rows (empty or not) without creating a whole new sheet without the rows that I want to remove?
/**
 * Remove a row by its index.
 * @param sheet an Excel sheet
 * @param rowIndex the 0-based index of the row to remove
 */
public static void removeRow(HSSFSheet sheet, int rowIndex) {
    int lastRowNum = sheet.getLastRowNum();
    if (rowIndex >= 0 && rowIndex < lastRowNum) {
        // Shifting the rows below up by one overwrites the row at rowIndex
        sheet.shiftRows(rowIndex + 1, lastRowNum, -1);
    }
    if (rowIndex == lastRowNum) {
        // Nothing to shift for the last row, so remove it directly if it exists
        HSSFRow removingRow = sheet.getRow(rowIndex);
        if (removingRow != null) {
            sheet.removeRow(removingRow);
        }
    }
}
I know this is a 3-year-old question, but I had to solve the same problem recently, and I had to do it in C#. Here is the function I'm using with NPOI on .NET 4.0:
public static void DeleteRow(this ISheet sheet, IRow row)
{
    sheet.RemoveRow(row); // this only deletes all the cell values
    int rowIndex = row.RowNum;
    int lastRowNum = sheet.LastRowNum;
    if (rowIndex >= 0 && rowIndex < lastRowNum)
    {
        sheet.ShiftRows(rowIndex + 1, lastRowNum, -1);
    }
}
Something along the lines of
int newrownum = 0;
for (int i = 0; i <= sheet.getLastRowNum(); i++) {
    HSSFRow row = sheet.getRow(i);
    // getRow() returns null for rows that do not exist
    if (row != null) row.setRowNum(newrownum++);
}
should do the trick.
HSSFRow has a method called setRowNum(int rowIndex).
When you have to "delete" a row, you put its index in a List. Then, when you reach the next non-empty row, you take an index from that list, assign it to the row by calling setRowNum(), and remove the index from the list. (You can also use a queue.)
My special case (it worked for me):
// Repeat the scan several times so that all rows without units get deleted
for (int j = 0; j < 7; j++) {
    // Walk the rows, deleting those without units (and look for the TOTAL row)
    for (int i = 1; i < sheet.getLastRowNum(); i++) {
        // Start on the 2nd row, ignoring the first one
        row = sheet.getRow(i);
        cell = row.getCell(garMACode);
        if (cell != null) {
            // Ignore empty rows (they have a "." in the first column)
            if (cell.getStringCellValue().compareTo(".") != 0) {
                if (cell.getStringCellValue().compareTo("TOTAL") == 0) {
                    cell = row.getCell(garMAUnits + 1);
                    cell.setCellType(HSSFCell.CELL_TYPE_FORMULA);
                    cell.setCellFormula("SUM(BB1" + ":BB" + (i - 1) + ")");
                } else {
                    cell = row.getCell(garMAUnits);
                    if (cell != null) {
                        int valor = (int) (cell.getNumericCellValue());
                        if (valor == 0) {
                            //sheet.removeRow(row);
                            removeRow(sheet, i);
                        }
                    }
                }
            }
        }
    }
}
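The outer loop above presumably has to repeat because removing row i shifts the following rows up by one, so a forward scan skips whichever row lands at index i. Scanning from the bottom up is a common way to do the same job in a single pass. Here is a minimal sketch of that idea, assuming a plain list as a stand-in for the sheet (index = row number, value = units). The names are hypothetical, and this is not the code from the answer:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BottomUpRemoval {
    // Removes every entry equal to zero in a single pass. The list is a
    // hypothetical stand-in for the sheet: index = row number, value = units.
    static void removeZeroRows(List<Integer> units) {
        for (int i = units.size() - 1; i >= 0; i--) {
            if (units.get(i) == 0) {
                // Only rows that were already visited shift up, so nothing is skipped
                units.remove(i);
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> units = new ArrayList<>(Arrays.asList(5, 0, 0, 3, 0));
        removeZeroRows(units);
        System.out.println(units); // prints [5, 3]
    }
}
```

With POI, the equivalent would be to start the inner loop at sheet.getLastRowNum() and count down, calling removeRow(sheet, i) as before.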
This answer extends AndreAY's answer and gives you a complete function for deleting a row.
public boolean deleteRow(String sheetName, String excelPath, int rowNo) throws IOException {
    XSSFWorkbook workbook = null;
    XSSFSheet sheet = null;
    try {
        FileInputStream file = new FileInputStream(new File(excelPath));
        workbook = new XSSFWorkbook(file);
        sheet = workbook.getSheet(sheetName);
        if (sheet == null) {
            file.close();
            return false;
        }
        int lastRowNum = sheet.getLastRowNum();
        if (rowNo >= 0 && rowNo < lastRowNum) {
            // Shift the rows below up by one, overwriting the row at rowNo
            sheet.shiftRows(rowNo + 1, lastRowNum, -1);
        }
        if (rowNo == lastRowNum) {
            XSSFRow removingRow = sheet.getRow(rowNo);
            if (removingRow != null) {
                sheet.removeRow(removingRow);
            }
        }
        file.close();
        // Write the modified workbook back to the same path
        FileOutputStream outFile = new FileOutputStream(new File(excelPath));
        workbook.write(outFile);
        outFile.close();
    } finally {
        if (workbook != null)
            workbook.close();
    }
    // The sheet was found and the file was rewritten
    return true;
}
I'm trying to reach back into the depths of my brain for my POI-related experience from a year or two ago, but my first question would be: why do the rows need to be removed before parsing? Why don't you just catch the null result from the sheet.getRow(rowNum) call and move on?