I'm using the Apache POI HSSF library to import data into my application. The problem is that the files contain some extra/empty rows that need to be removed before parsing.
There is no HSSFSheet.removeRow(int rowNum) method, only removeRow(HSSFRow row). The problem with this is that empty rows can't be removed. For example:
sheet.removeRow( sheet.getRow(rowNum) );
gives a NullPointerException on empty rows because getRow() returns null.
Also, as I read on forums, removeRow() only erases the cell contents but the row is still there as an empty row.
Is there a way of removing rows (empty or not) without creating a whole new sheet without the rows that I want to remove?
/**
 * Remove a row by its index.
 * @param sheet an Excel sheet
 * @param rowIndex 0-based index of the row to remove
 */
public static void removeRow(HSSFSheet sheet, int rowIndex) {
    int lastRowNum = sheet.getLastRowNum();
    if (rowIndex >= 0 && rowIndex < lastRowNum) {
        // Shift every row below rowIndex up by one.
        sheet.shiftRows(rowIndex + 1, lastRowNum, -1);
    }
    if (rowIndex == lastRowNum) {
        // Nothing below to shift up, so just remove the last row if it exists.
        HSSFRow removingRow = sheet.getRow(rowIndex);
        if (removingRow != null) {
            sheet.removeRow(removingRow);
        }
    }
}
I know this is a three-year-old question, but I had to solve the same problem recently, and I had to do it in C#. Here is the extension method I'm using with NPOI on .NET 4.0:
public static void DeleteRow(this ISheet sheet, IRow row)
{
    sheet.RemoveRow(row); // this only deletes all the cell values
    int rowIndex = row.RowNum;
    int lastRowNum = sheet.LastRowNum;
    if (rowIndex >= 0 && rowIndex < lastRowNum)
    {
        sheet.ShiftRows(rowIndex + 1, lastRowNum, -1);
    }
}
Something along the lines of
int newrownum = 0;
for (int i = 0; i <= sheet.getLastRowNum(); i++) {
    HSSFRow row = sheet.getRow(i);
    if (row != null) row.setRowNum(newrownum++); // getRow() returns null for empty rows
}
should do the trick.
HSSFRow has a method called setRowNum(int rowIndex).
When you have to "delete" a row, put its index in a List. Then, when you reach the next non-empty row, take an index from that list, assign it with setRowNum(), and remove it from the list; the row's old index goes into the list in its place, since that slot is now free. (A queue works as well.)
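The bookkeeping described above can be sketched independently of POI. What follows is a minimal model, not POI code: the sheet is a hypothetical TreeMap from row index to row content, a missing key plays the role of an empty row, and moving an entry to a new key stands in for calling setRowNum() on a real row. The names RowCompactor and compact are made up for the illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical stand-in for a sheet: row index -> row content (missing key = empty row).
class RowCompactor {

    /**
     * Returns a copy of the rows with deleted and empty rows squeezed out.
     * Freed indices are queued; each kept row takes the oldest free index, if any.
     */
    static TreeMap<Integer, String> compact(TreeMap<Integer, String> rows,
                                            Predicate<String> shouldDelete) {
        TreeMap<Integer, String> result = new TreeMap<>();
        if (rows.isEmpty()) {
            return result;
        }
        Deque<Integer> freeIndices = new ArrayDeque<>();
        int last = rows.lastKey();
        for (int i = 0; i <= last; i++) {
            String row = rows.get(i);
            if (row == null || shouldDelete.test(row)) {
                freeIndices.addLast(i);                    // this slot can be reused
            } else if (freeIndices.isEmpty()) {
                result.put(i, row);                        // nothing freed yet: row stays put
            } else {
                result.put(freeIndices.pollFirst(), row);  // move the row up (setRowNum in POI)
                freeIndices.addLast(i);                    // its old slot is now free
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> rows = new TreeMap<>();
        rows.put(0, "header");
        rows.put(2, "a");
        rows.put(3, "DELETE");
        rows.put(5, "b");
        System.out.println(compact(rows, "DELETE"::equals)); // {0=header, 1=a, 2=b}
    }
}
```

Because indices are freed in ascending order and reused oldest first, the kept rows retain their relative order.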
My special case (it worked for me):
// Repeat several times to delete all the rows without units
for (int j = 0; j < 7; j++) {
    // Walk the rows to delete lines without units (and look for the TOTAL row)
    for (int i = 1; i < sheet.getLastRowNum(); i++) {
        // Starting on the 2nd row, ignoring the first one
        row = sheet.getRow(i);
        cell = row.getCell(garMACode);
        if (cell != null) {
            // Ignore empty rows (they have a "." in the first column)
            if (cell.getStringCellValue().compareTo(".") != 0) {
                if (cell.getStringCellValue().compareTo("TOTAL") == 0) {
                    cell = row.getCell(garMAUnits + 1);
                    cell.setCellType(HSSFCell.CELL_TYPE_FORMULA);
                    cell.setCellFormula("SUM(BB1" + ":BB" + (i - 1) + ")");
                } else {
                    cell = row.getCell(garMAUnits);
                    if (cell != null) {
                        int valor = (int) (cell.getNumericCellValue());
                        if (valor == 0) {
                            //sheet.removeRow(row);
                            removeRow(sheet, i);
                        }
                    }
                }
            }
        }
    }
}
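The outer loop above repeats the scan seven times, presumably because deleting row i shifts the following rows up by one, so the forward loop then steps over the row that has just moved into position i. A single pass from the bottom up avoids that. The sketch below is a model rather than POI code: a hypothetical list of unit counts stands in for the sheet, and List.remove(int) stands in for removeRow(sheet, i), since both shift the later entries up by one. The class and method names are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BottomUpDelete {

    // Forward scan: after a removal the next element slides into slot i and is skipped.
    static List<Integer> forwardPass(List<Integer> units) {
        List<Integer> rows = new ArrayList<>(units);
        for (int i = 0; i < rows.size(); i++) {
            if (rows.get(i) == 0) {
                rows.remove(i); // remove(int index): later elements shift up
            }
        }
        return rows;
    }

    // Backward scan: a removal only moves elements that were already visited.
    static List<Integer> backwardPass(List<Integer> units) {
        List<Integer> rows = new ArrayList<>(units);
        for (int i = rows.size() - 1; i >= 0; i--) {
            if (rows.get(i) == 0) {
                rows.remove(i);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Integer> units = Arrays.asList(5, 0, 0, 3, 0, 0, 0, 7);
        System.out.println(forwardPass(units));  // [5, 0, 3, 0, 7]
        System.out.println(backwardPass(units)); // [5, 3, 7]
    }
}
```

Applied to the loop above, this would mean starting i at the last row index and counting down to 1, which should make the repeated outer loop unnecessary.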
This answer extends AndreAY's answer and gives a complete function for deleting a row.
public boolean deleteRow(String sheetName, String excelPath, int rowNo) throws IOException {
    XSSFWorkbook workbook = null;
    try {
        // Load the workbook into memory, then release the input file.
        try (FileInputStream file = new FileInputStream(new File(excelPath))) {
            workbook = new XSSFWorkbook(file);
        }
        XSSFSheet sheet = workbook.getSheet(sheetName);
        if (sheet == null) {
            return false;
        }
        int lastRowNum = sheet.getLastRowNum();
        if (rowNo >= 0 && rowNo < lastRowNum) {
            // Shift every row below rowNo up by one.
            sheet.shiftRows(rowNo + 1, lastRowNum, -1);
        }
        if (rowNo == lastRowNum) {
            XSSFRow removingRow = sheet.getRow(rowNo);
            if (removingRow != null) {
                sheet.removeRow(removingRow);
            }
        }
        // Write the modified workbook back to the same path.
        try (FileOutputStream outFile = new FileOutputStream(new File(excelPath))) {
            workbook.write(outFile);
        }
        return true; // the file was rewritten
    } finally {
        if (workbook != null) {
            workbook.close();
        }
    }
}
I'm trying to reach back into the depths of my brain for my POI-related experience from a year or two ago, but my first question would be: why do the rows need to be removed before parsing? Why don't you just catch the null result from the sheet.getRow(rowNum) call and move on?
Related
I want to create an Excel file with Apache POI, based on the sheet of another Excel file. Only the first two columns and their corresponding rows should be copied to the new Excel sheet.
First, I insert all cells of the first column, then I increment the columnIndex to insert the other cells.
private static void createNewWorkBook(XSSFSheet oldSheet) {
XSSFWorkbook newWorkbook = new XSSFWorkbook();
XSSFSheet newSheet = newWorkbook.createSheet("test-sheet");
for (int columnIndex = 0; columnIndex < 2; columnIndex++) {
int rowIndex = 0;
for (Row oldRow : oldSheet) {
XSSFRow newRow = newSheet.createRow(rowIndex);
XSSFCell newCell = newRow.createCell(columnIndex);
newCell.setCellValue("Hello"); // just for test purposes
// newCell.setCellValue(oldSheet.getRow(rowIndex).getCell(columnIndex).getStringCellValue());
rowIndex++;
}
}
try {
FileOutputStream fos = new FileOutputStream(new File("CreateExcelDemo.xlsx"));
newWorkbook.write(fos);
fos.close();
} catch (IOException e) {
e.printStackTrace();
}
}
Unfortunately it doesn't work. I only get the values of the second column in my newly generated Excel sheet. The first column is just empty.
BUT:
If I replace the columnIndex with 0 or 1, it works! Where is the flaw in my reasoning?
I am using Java 8 and Apache POI 3.17. I have an Excel file and I want to keep only a few rows and delete the others. But my Excel file has 40K rows, and deleting them one by one takes a long time (nearly 30 minutes).
So I tried to change my approach. Now I think it's better to take only the rows I need from the source Excel file and copy them to a new one. But what I have tried so far does not work.
I collect the rows I want to keep in a List, but this does not work and produces a blank Excel file:
public void createExcelFileFromLog (Path logRejetFilePath, Path fichierInterdits) throws IOException {
Map<Integer, Integer> mapLigneColonne = getRowAndColumnInError(logRejetFilePath);
Workbook sourceWorkbook = WorkbookFactory.create(new File(fichierInterdits.toAbsolutePath().toString()));
Sheet sourceSheet = sourceWorkbook.getSheetAt(0);
List<Row> listLignes = new ArrayList<Row>();
// get Rows from source Excel
for (Map.Entry<Integer, Integer> entry : mapLigneColonne.entrySet()) {
listLignes.add(sourceSheet.getRow(entry.getKey()-1));
}
// The new Excel
Workbook workbookToWrite = new XSSFWorkbook();
Sheet sheetToWrite = workbookToWrite.createSheet("Interdits en erreur");
// Copy Rows
Integer i = 0;
for (Row row : listLignes) {
copyRow(sheetToWrite, row, i);
i++;
}
FileOutputStream fos = new FileOutputStream(config.getDossierTemporaire() + "Interdits_en_erreur.xlsx");
workbookToWrite.write(fos);
workbookToWrite.close();
sourceWorkbook.close();
}
private static void copyRow(Sheet newSheet, Row sourceRow, int newRowNum) {
Row newRow = newSheet.createRow(newRowNum);
newRow = sourceRow;
}
EDIT: Changing copyRow as follows works better, but the dates come out in a weird format and blank cells from the original row are lost.
private static void copyRow(Sheet newSheet, Row sourceRow, int newRowNum) {
Row newRow = newSheet.createRow(newRowNum);
Integer i = 0;
for (Cell cell : sourceRow) {
if(cell.getCellTypeEnum() == CellType.NUMERIC) {
newRow.createCell(i).setCellValue(cell.getDateCellValue());
} else {
newRow.createCell(i).setCellValue(cell.getStringCellValue());
}
i++;
}
}
EDIT 2: To keep blank cells:
private static void copyRow(Sheet newSheet, Row sourceRow, Integer newRowNum, Integer cellToColor) {
Row newRow = newSheet.createRow(newRowNum);
//Integer i = 0;
int lastColumn = Math.max(sourceRow.getLastCellNum(), 0);
for(int i = 0; i < lastColumn; i++) {
Cell oldCell = sourceRow.getCell(i, Row.MissingCellPolicy.RETURN_BLANK_AS_NULL);
if(oldCell == null) {
newRow.createCell(i).setCellValue("");
} else if (oldCell.getCellTypeEnum() == CellType.NUMERIC) {
newRow.createCell(i).setCellValue(oldCell.getDateCellValue());
} else {
newRow.createCell(i).setCellValue(oldCell.getStringCellValue());
}
}
}
I need to read a large Excel file (50,000 rows and 20 columns) using the Apache POI library. There is another question that asks exactly the same thing. My attempted approach is as follows:
public static ArrayList<Double> readColumn(String excelFile,String sheetName, int columnNumber)
{
ArrayList<Double> excelData = new ArrayList<>();
XSSFWorkbook workbook = null;
try
{
workbook = new XSSFWorkbook(excelFile);
} catch (IOException e)
{
e.printStackTrace();
}
Sheet sheet = workbook.getSheet(sheetName);
for (int i = 0; i <= sheet.getLastRowNum(); i++)
{
Row row = sheet.getRow(i);
if (row != null) {
Cell cell = row.getCell(columnNumber);
if (cell != null)
{
// Skip cells that are not numeric
if (cell.getCellTypeEnum() == CellType.NUMERIC)
{
excelData.add(cell.getNumericCellValue());
System.out.println(cell.getNumericCellValue());
}
}
}
}
return excelData;
}
Unfortunately, while this method seems to work for a low column index (e.g. columnNumber = 1), I get an OutOfMemoryError for a large columnNumber. The file itself is not large enough to make my computer run out of memory, and I can achieve the same result in Python with very little memory. Is there a better way to solve this? Or is there a Java library that would allow me to do that?
I have a problem getting data from this function when I call it twice. The function returns an ArrayList of all rows fetched from an Excel sheet. When I call the function the first time I get the correct number of rows (all rows except the header row and the row with exit). The second time I call the function I get 0.
It seems that something happens with the file or the sheets created the second time. Here is the code:
private static List<String[]> getDataFromXLS(String excelPath) {
FileInputStream fis;
Workbook workbook; Sheet sheet; XSSFRow row;
Iterator<Row> rows;
XSSFCell cell;
List<String[]> allExcelRows = new ArrayList<String[]>();
String[] xlsRow;
columnNames = new LinkedHashMap<Integer, String>();
paramNames = new LinkedHashMap<String, Integer>();
int totalColumnCount = 0;
int rowNumber = 1;
try {
fis = new FileInputStream(new File(excelPath));
workbook = WorkbookFactory.create(fis);
sheet = workbook.getSheet("TestData");
rows = sheet.rowIterator();
while (rows.hasNext()) {
row = ((XSSFRow) rows.next());
if (rowNumber == 1) {
//based on amount of parameters on first xls row
totalColumnCount = row.getLastCellNum();
}
xlsRow = new String[totalColumnCount];
//check which column is TestType
//iterate through all the columns
for (int columnNumber=0; columnNumber<totalColumnCount; columnNumber++) {
cell = row.getCell(columnNumber, Row.CREATE_NULL_AS_BLANK);
if (getCellValue(cell).trim().toLowerCase().trim().equals("testtype") ){
testTypeColumnIndex = columnNumber; //this is Testtype index
break;
}
}
if (rowNumber != 1) {
for(int columnNumber=0; columnNumber<totalColumnCount; columnNumber++) {
cell = row.getCell(columnNumber, Row.CREATE_NULL_AS_BLANK);
//read only rows before exit
if (columnNumber == testTypeColumnIndex && getCellValue(cell).trim().toLowerCase().trim().equals("exit") ){
reachedExit = true;
break;
}
xlsRow[columnNumber] = getCellValue(cell).trim();
}
//reached exit?
if (reachedExit) {
break;
}
allExcelRows.add(xlsRow);
} else {
//save column names into map
for(int columnNumber=0; columnNumber<totalColumnCount; columnNumber++) {
cell = row.getCell(columnNumber, Row.CREATE_NULL_AS_BLANK);
columnNames.put(columnNumber, getCellValue(cell).trim());
paramNames.put(getCellValue(cell).trim(), columnNumber);
}
}
rowNumber++;
}
} catch (Exception e) {
e.printStackTrace();
}
fis.close();
return allExcelRows;
}
I'm taking a bit of a guess here, but I think the problem is that the class-level boolean reachedExit is not reset at the start of the method. Hence, when you call it the second time, this code block executes on the first data row:
//reached exit?
if (reachedExit) {
break;
}
...meaning that nothing gets added to allExcelRows.
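To illustrate the guess above: a static flag that is set during one call is still set at the start of the next call. The class below is a hypothetical, POI-free reduction of the question's method. The method readUntilExit, its resetFlag parameter, and the sample data are made up for the illustration; only the reachedExit name is taken from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExitFlagDemo {
    private static boolean reachedExit; // class-level state, as in the question

    // Collects rows until "exit"; resetFlag controls whether stale state is cleared first.
    static List<String> readUntilExit(List<String> rows, boolean resetFlag) {
        if (resetFlag) {
            reachedExit = false;
        }
        List<String> result = new ArrayList<>();
        for (String row : rows) {
            if (row.equals("exit")) {
                reachedExit = true;
            }
            if (reachedExit) {
                break; // on a second call without reset, this fires on the very first row
            }
            result.add(row);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "exit", "c");
        System.out.println(readUntilExit(rows, false)); // [a, b]
        System.out.println(readUntilExit(rows, false)); // []
        System.out.println(readUntilExit(rows, true));  // [a, b]
    }
}
```

If that is indeed the cause, resetting the flag at the top of the method, or declaring it as a local variable, should make repeated calls behave the same way.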
I have to compare strings from two xlsx files and, if they are equal, return the name, but my code always returns null. Can someone help me?
try
{
FileInputStream fisCod = new FileInputStream(pathC);
XSSFWorkbook wb = new XSSFWorkbook (fisCod);
XSSFSheet sheet = wb.getSheetAt(0);
int lastRow = sheet.getLastRowNum();
for(int i=0; i<lastRow; i++)
{
Row row = sheet.getRow(i);
Cell cell = row.getCell(jobCod);
String tmp = cell.getRichStringCellValue().getString().toLowerCase();
if (tmp.equals(jobName)) //jobName is a String
{
return tmp;
}
else
{
return null;
}
}
fisCod.close();
}
catch (IOException e)
{
System.out.println(e.getMessage());
}
The first mismatch in the above code will cause null to be returned without checking subsequent row values. More than likely, this is the scenario you are describing.
Check all cell values before resorting to returning null when the attempted match fails.
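// getLastRowNum() is the 0-based index of the last row, so include it
for (int i = 0; i <= lastRow; i++) {
    Row row = sheet.getRow(i);
    if (row == null) continue; // empty rows come back as null
    Cell cell = row.getCell(jobCod);
    if (cell == null) continue; // so do empty cells
    String tmp = cell.getRichStringCellValue().getString().toLowerCase();
    if (tmp.equals(jobName)) {
        return tmp;
    }
}
return null; // no row matched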