I use Groovy scripts to work with Excel files, and I use the Apache POI API to manipulate them. But I could not find any method or object in the documentation that gives me the used range of a sheet. I tried to calculate it myself with methods like getLastRowNum() or getPhysicalNumberOfRows(), but neither works well because they stop counting when they meet an empty row. An Excel file can have empty rows followed by filled rows, but those methods stop as soon as they meet a single empty row, so they don't help me reach my goal.
So I tried another solution. I want to create a named range in the workbook with createName(), whose formula returns the used range of the current sheet. But I don't know how to do it: everything I found uses VBA, which I don't want because a named-range formula can't use VBA. I found a function called GET.WORKBOOK, and I think it could be a good starting point. This function returns the list of worksheet names in the workbook. That result has nothing to do with my problem, but I suspect there are related functions such as GET.WORKSHEET. It's very speculative, but I think there is more than just GET.WORKBOOK. (If you have any information about this, even if it doesn't solve my problem, please put it in the comments; I'm really interested in these GET functions.)
NB: If you find a Groovy-only solution to my problem, I would be very happy too. I didn't mention this kind of solution because I searched a lot in this direction without finding anything helpful.
NB2: I added the java tag because Groovy and Java are very close, and I think someone who can find a solution in Java could do the same in Groovy.
NB3: I want a cell reference like A1:B2 to specify the used range.
NB4: I re-tested getLastRowNum() and it works perfectly; I had made some mistakes in my code, which is why it didn't work before. Now here's my new problem: when I use this method, I cannot access an empty cell with getCell. Here's my code:
import org.apache.poi.ss.usermodel.WorkbookFactory;
wb = WorkbookFactory.create(new File("./webapps/etlserver/data/files/test_ws.xlsx"));
def getUsedRangeByIndex(file_path,ind_ws){
wb = WorkbookFactory.create(new File(file_path));
max_col = 0;
for(int i = 0 ; i < wb.getSheetAt(ind_ws).getLastRowNum() ; i++){
LOG.info(i.toString())
if(wb.getSheetAt(ind_ws).getRow(i) != null && wb.getSheetAt(ind_ws).getRow(i).getLastCellNum() > max_col){
max_col = wb.getSheetAt(ind_ws).getRow(i).getLastCellNum();
}
}
return "A1:" + wb.getSheetAt(ind_ws).getRow(wb.getSheetAt(ind_ws).getLastRowNum()).getCell(max_col, RETURN_NULL_AND_BLANK).getReference()
}
LOG.info(getUsedRangeByIndex("./webapps/etlserver/data/files/test_ws.xlsx",0))
I know I have to improve it with code that calculates the first used cell, but for now I will treat A1 as the first used cell.
If the definition of the used range of a worksheet is as follows: ...
The used range is the cell range from the first used top-left cell to the last used bottom-right cell.
... and the Apache POI version is a current one (I am using Apache POI 5.2.2), then the simplest way to get that used range is to use the following methods:
Sheet.getFirstRowNum and Sheet.getLastRowNum give the first and last used rows of the sheet. If either returns -1, the sheet contains no rows and so has no used range.
Then loop over all rows between the first and last used rows and get Row.getFirstCellNum and Row.getLastCellNum. Note the API doc of Row.getLastCellNum: "Gets the index of the last cell contained in this row PLUS ONE." If a row's first column is lower than any first column found before, it becomes the new first column. If a row's last column is greater than any last column found before, it becomes the new last column.
After that we have the first used row, last used row, leftmost used column, and rightmost used column. That is the used range.
Complete example:
import java.io.FileInputStream;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.util.CellRangeAddress;
class ExcelGetSheetUsedRange {
/**
* Simplest method to get the used range from a sheet.
*
* @param sheet The sheet to get the used range from.
* @return CellRangeAddress representing the used range or null for an empty sheet.
*/
static CellRangeAddress getUsedRange(Sheet sheet) {
int firstRow = sheet.getFirstRowNum();
if (firstRow == -1) return null;
int lastRow = sheet.getLastRowNum();
if (lastRow == -1) return null;
int firstCol = Integer.MAX_VALUE;
int lastCol = -1;
for (int r = firstRow; r <= lastRow; r++) {
Row row = sheet.getRow(r);
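// skip missing rows and rows that exist but contain no cells (getFirstCellNum() returns -1 for those)
if (row != null && row.getFirstCellNum() != -1) {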
int thisRowFirstCol = row.getFirstCellNum();
int thisRowLastCol = row.getLastCellNum()-1; // see API doc Row.getLastCellNum : Gets the index of the last cell contained in this row PLUS ONE.
if (thisRowFirstCol < firstCol) firstCol = thisRowFirstCol;
if (thisRowLastCol > lastCol) lastCol = thisRowLastCol;
}
}
if (firstCol == Integer.MAX_VALUE) return null;
if (lastCol == -1) return null;
return new CellRangeAddress(firstRow, lastRow, firstCol, lastCol);
}
public static void main(String[] args) throws Exception {
//Workbook workbook = WorkbookFactory.create(new FileInputStream("./template.xls"));
Workbook workbook = WorkbookFactory.create(new FileInputStream("./template.xlsx"));
Sheet sheet = workbook.getSheetAt(0);
CellRangeAddress usedRange = getUsedRange(sheet);
System.out.println(usedRange);
}
}
As the API doc of Sheet.getLastRowNum says:
Note: rows which had content before and were set to empty later might
still be counted as rows by Excel and Apache POI...
But that is a problem of Excel, which may also occur when getting the used range via the Worksheet.UsedRange property.
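One way to sidestep that (a sketch of my own, not part of the answer above) is to scan backwards from getLastRowNum() for the last row whose cells actually display something. It assumes Apache POI 4.x or 5.x with poi-ooxml on the classpath; LastContentRow and lastRowWithContent are hypothetical names, and DataFormatter is used so that numbers, dates and strings are all judged by their displayed text:

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

class LastContentRow {

    /**
     * Returns the 0-based index of the last row holding at least one cell whose
     * formatted text is not blank, or -1 if there is none. Rows that exist only
     * as leftovers (blank cells, styles) are skipped.
     */
    static int lastRowWithContent(Sheet sheet) {
        DataFormatter formatter = new DataFormatter();
        for (int r = sheet.getLastRowNum(); r >= 0; r--) {
            Row row = sheet.getRow(r);
            if (row == null) continue; // no row stored at this index
            for (Cell cell : row) {
                // Without an evaluator, formula cells format as their formula text,
                // so they count as content here.
                if (!formatter.formatCellValue(cell).trim().isEmpty()) return r;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        try (Workbook wb = new XSSFWorkbook()) {
            Sheet sheet = wb.createSheet();
            sheet.createRow(0).createCell(0).setCellValue("data");
            sheet.createRow(5).createCell(2); // a blank cell left behind
            System.out.println(sheet.getLastRowNum());     // 5
            System.out.println(lastRowWithContent(sheet)); // 0
        }
    }
}
```

The same check could stand in for the plain getLastRowNum() call inside getUsedRange if leftover rows should not widen the used range.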
Axel Richter's solution is perfect. But here's a ready-made version you can insert directly into a Jedox job to make things work; it's a translation from Java to Groovy. Here's the code:
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.ss.util.CellRangeAddress;
wb = WorkbookFactory.create(new File("./webapps/etlserver/data/files/test_ws.xlsx"));
sheet = wb.getSheetAt(0);
def getUsedRange(sheet) {
def firstRow = sheet.getFirstRowNum();
if (firstRow == -1) return null;
def lastRow = sheet.getLastRowNum();
if (lastRow == -1) return null;
def firstCol = Integer.MAX_VALUE;
def lastCol = -1;
for (int r = firstRow; r <= lastRow; r++) {
def row = sheet.getRow(r);
// skip missing rows and rows that exist but contain no cells (getFirstCellNum() returns -1 for those)
if (row != null && row.getFirstCellNum() != -1) {
def thisRowFirstCol = row.getFirstCellNum();
def thisRowLastCol = row.getLastCellNum() - 1; // getLastCellNum() is the last cell index PLUS ONE
if (thisRowFirstCol < firstCol) firstCol = thisRowFirstCol;
if (thisRowLastCol > lastCol) lastCol = thisRowLastCol;
}
}
if (firstCol == Integer.MAX_VALUE) return null;
if (lastCol == -1) return null;
return (new CellRangeAddress(firstRow, lastRow, firstCol, lastCol)).formatAsString();
}
LOG.info(getUsedRange(sheet));
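The formatAsString() call at the end is what produces the A1:B2 form asked for in NB3. As a dependency-free illustration of that conversion (the class and method names are hypothetical, not POI API), a 0-based column index maps to letters in bijective base 26 and the row index is shifted by one:

```java
/** Hypothetical helper, not part of Apache POI: formats 0-based indices in A1 notation. */
class A1Notation {

    /** Converts a 0-based column index to Excel letters: 0 -> "A", 25 -> "Z", 26 -> "AA". */
    static String columnLetters(int col) {
        if (col < 0) throw new IllegalArgumentException("negative column: " + col);
        StringBuilder sb = new StringBuilder();
        int n = col + 1; // Excel column letters are bijective base 26 (there is no zero digit)
        while (n > 0) {
            int rem = (n - 1) % 26;
            sb.append((char) ('A' + rem));
            n = (n - 1) / 26;
        }
        return sb.reverse().toString();
    }

    /** Formats a 0-based cell position, e.g. (0, 0) -> "A1". */
    static String cellRef(int row, int col) {
        return columnLetters(col) + (row + 1);
    }

    /** Formats a 0-based range as "topLeft:bottomRight", e.g. "A1:B2". */
    static String rangeRef(int firstRow, int lastRow, int firstCol, int lastCol) {
        return cellRef(firstRow, firstCol) + ":" + cellRef(lastRow, lastCol);
    }

    public static void main(String[] args) {
        System.out.println(rangeRef(0, 1, 0, 1)); // A1:B2
        System.out.println(columnLetters(27));    // AB
    }
}
```

This sketch always writes both corners, even when the range is a single cell.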
I am using POI (v4.0.0) to import an Excel document. But when I try to get the next cell, carModelCell, it always returns null. This is what my Java 8 code looks like:
public void verifyCar(Cell cell, int relativeRowIndex, Head head) {
if (cell.getRowIndex() > 0 && head.getFieldName().equals("car")) {
if (StringUtils.isBlank(cell.getStringCellValue())|| cell.getStringCellValue().equals("无车")) { // "无车" means "no car"
return;
}
Cell carModelCell = cell.getRow().getCell(cell.getColumnIndex() + 1);
if (carModelCell == null || StringUtils.isBlank(carModelCell.getStringCellValue())) {
SparkUserParseResult result = new SparkUserParseResult();
result.setSuccess(false);
UploadSparkUserDataListener.parseSuccess.set(result);
return;
}
}
}
I am trying to get the row from the Cell and then the next cell in the same row to do some checks, but carModelCell is always null. I am sure the next cell of the current row has a value. Why does this happen, and what should I do to fix it? This code is in CellStyleWriteHandler, which extends AbstractCellStyleStrategy in EasyExcel (version 2.2.11):
public class CellStyleWriteHandler extends AbstractCellStyleStrategy {
@Override
protected void setContentCellStyle(Cell cell, Head head, Integer relativeRowIndex) {
impl(cell, head, relativeRowIndex);
}
}
I checked: the last index number was 14 and the current column index was 13, while my imported Excel file has 24 columns in total. It seems EasyExcel did not pass the full row. Is it possible to fix this? How can I get the next cell of the current row?
I also use POI to parse Excel files, and I think the problem is in this line:
Cell carModelCell = cell.getRow().getCell(cell.getColumnIndex() + 1);
The code above is wrong, because cells are obtained from a row, like:
Cell cell = row.getCell(columnIndex);
One row can contain many cells, but you cannot get the complete row back from a cell; reading cell values from a row does not work in reverse. I hope this helps.
The same question was asked here a few years ago:
How to remove all formulas from an Excel sheet by Java POI API?
However, it did not receive an answer that works for me.
I have a workbook with several large sheets and want to loop over all cells to replace the cell contents with strings. The problem is that many cells contain formulas, which I have to get rid of first. Neither cell.setCellFormula(null) nor cell.setCellType(CellType.STRING) (nor BLANK) is satisfactory, because the underlying processes for removing array formulas take ages and make the entire job far too slow.
The following works, but leaves a corrupt Excel workbook that can only be opened after a repair step the first time:
Method m = XSSFCell.class.getDeclaredMethod("setBlank");
m.setAccessible(true);
m.invoke(cell);
Is there any other fast and cleaner way to simply set certain cells blank, regardless of any formulas?
The corrupted workbook occurs because there is a calculation chain stored in /xl/calcChain.xml. The normal, slow methods of removing formulas update this calculation chain. But, as you found already, they are also designed for removing single formulas, not all of them, so they must be careful when removing parts of array formulas, which makes them slow.
But if really all formulas are to be removed, this care is unnecessary, and the whole /xl/calcChain.xml can simply be removed.
Example:
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.*;
import org.apache.poi.xssf.model.CalculationChain;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.STCellFormulaType;
import java.io.FileInputStream;
import java.io.FileOutputStream;
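import org.apache.poi.POIXMLDocumentPart; // in POI 4.0 and later this class is org.apache.poi.ooxml.POIXMLDocumentPart
import java.lang.reflect.Method;
class ExcelRemoveFormulasAndCalcChain {
private static void removeCalcChain(XSSFWorkbook workbook) throws Exception {
CalculationChain calcchain = workbook.getCalculationChain();
if (calcchain == null) return; // the workbook has no /xl/calcChain.xml, so there is nothing to remove
// removeRelation is not public, so it is called via reflection
Method removeRelation = POIXMLDocumentPart.class.getDeclaredMethod("removeRelation", POIXMLDocumentPart.class);
removeRelation.setAccessible(true);
removeRelation.invoke(workbook, calcchain);
}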
public static void main(String[] args) throws Exception {
XSSFWorkbook workbook = (XSSFWorkbook)WorkbookFactory.create(new FileInputStream("Test.xlsx"));
for (Sheet sheet : workbook) {
for (Row row : sheet) {
for (Cell cell : row) {
XSSFCell xssfcell = (XSSFCell)cell;
if (xssfcell.getCTCell().isSetF() && xssfcell.getCTCell().getF().getT() != STCellFormulaType.DATA_TABLE) {
xssfcell.getCTCell().unsetF();
}
}
}
}
removeCalcChain(workbook);
workbook.write(new FileOutputStream("Test_1.xlsx"));
workbook.close();
}
}
This should remove all formulas and leave all cells containing only their values and styles.
I think I found a way to remove formulas in a given cell range.
I noticed that if I delete a sheet first, formulas that link to it are removed fast.
If I swap the order (removing the formulas first, then deleting the sheets), it takes a lot of time.
So if we create a sheet, rewrite all formulas to link to it, and then delete that sheet, the formulas are removed fast (setting formulas that link to a non-existent sheet doesn't work).
It takes seconds for 15k+ rows. Here is the experiment:
File fReport = new File(".xlsx");
XSSFWorkbook book = new XSSFWorkbook(new FileInputStream(fReport));
XSSFSheet sheet = book.getSheet("");
XSSFSheet dummy = book.createSheet("dummy");
int lastRow = sheet.getLastRowNum();
for (int i = 8; i <= lastRow; i++) {
XSSFRow rowToClean = sheet.getRow(i);
if (rowToClean == null) continue; // rows that were never created are null
XSSFCell cell = rowToClean.getCell(2);
System.out.println(i);
if (cell != null) {
cell.setCellFormula("'dummy'!A1");
}
}
book.removeSheetAt(book.getSheetIndex(dummy));
for (int i = 8; i <= lastRow; i++) {
XSSFRow rowToClean = sheet.getRow(i);
if (rowToClean == null) continue; // rows that were never created are null
XSSFCell cell = rowToClean.getCell(2);
System.out.println(i);
if (cell != null) {
cell.removeFormula();
}
}
book.write(new FileOutputStream(fReport));
book.close();
I'm getting a weird error while trying to read cell values with Apache POI in Java:
System.out.println(row.getCell(13, Row.CREATE_NULL_AS_BLANK).getStringCellValue());
always prints null, even after specifying the missing-cell policy as Row.CREATE_NULL_AS_BLANK. My logic for writing to the cell is:
public void writeCell( String value, Sheet sheet, int rowNum, int colNum)
{
Row row = sheet.getRow(rowNum);
if (row == null)
{
row = sheet.createRow(rowNum);
}
Cell cell = row.createCell(colNum, Cell.CELL_TYPE_STRING);
if (value == null)
{
return;
}
cell.setCellValue(value);
}
When I write to the cell at colNum = 13, the String value object is null. I'm not able to sort out this issue.
This line doesn't do what you seem to think it does:
System.out.println(row.getCell(13, Row.CREATE_NULL_AS_BLANK).getStringCellValue());
In effect, that's doing
Cell cell = row.getCell(13);
if (cell == null) { cell = row.createCell(13, Cell.CELL_TYPE_BLANK); }
So, if there is nothing in that cell, it creates it as an empty blank one.
Then, you try doing:
cell.getStringCellValue()
This only works for String cells, and in the missing case you've told POI to give you a new Blank cell!
If you really just want a string value of a cell, use DataFormatter.formatCellValue(Cell), which returns a String representation of your cell including formatting. Otherwise, check the type of your cell before trying to fetch the value!
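A minimal sketch of the DataFormatter route, assuming Apache POI 4.x or 5.x with poi-ooxml on the classpath; CellText and textAt are hypothetical names. It also shows that a missing cell can be passed straight to formatCellValue, which returns "" for null or blank cells, so no missing-cell policy is needed and no empty cell gets created:

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

class CellText {
    private static final DataFormatter FORMATTER = new DataFormatter();

    /** Returns the displayed text of a cell; missing and blank cells give "". */
    static String textAt(Row row, int col) {
        // Plain getCell returns null for a missing cell; formatCellValue(null) returns "".
        Cell cell = row.getCell(col);
        return FORMATTER.formatCellValue(cell);
    }

    public static void main(String[] args) throws Exception {
        try (Workbook wb = new XSSFWorkbook()) {
            Sheet sheet = wb.createSheet();
            Row row = sheet.createRow(0);
            row.createCell(0).setCellValue("abc");
            row.createCell(1).setCellValue(42);
            System.out.println(textAt(row, 0));              // abc
            System.out.println(textAt(row, 1));              // 42
            System.out.println("[" + textAt(row, 13) + "]"); // [] (cell 13 was never written)
        }
    }
}
```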
getStringCellValue() on the Cell interface would return "" if your code worked as intended (setting the cell blank).
Is it possible that the value for column index 13 is not null but the string "null"?
I have an Excel file with 3000 rows. I removed 2000 of them (with the MS Excel app), but when I call sheet.getLastRowNum() from code, it gives me 3000 (instead of 1000). How can I remove the blank rows?
I tried the code from here, but it doesn't work....
There are two ways to do it:
1.) Without code:
Copy the content of your Excel file, paste it into a new Excel file, and then rename it as required.
2.) With code (I did not find any built-in function for it, so I wrote my own):
You need to check each cell for any kind of blank, empty string, or null.
Before processing a row (I assume you process row by row; I am using org.apache.poi.xssf.usermodel.XSSFRow), add an if check on this method's return value. If it is true, the row (XSSFRow) has some value; otherwise, move the iterator to the next row.
public boolean containsValue(XSSFRow row, int fcell, int lcell)
{
for (int i = fcell; i < lcell; i++) {
XSSFCell cell = row.getCell(i);
// Check for a missing cell first: String.valueOf(null) would yield the non-blank text "null".
if (cell == null) {
continue;
}
// isBlank covers empty strings and whitespace-only strings.
if (!StringUtils.isBlank(String.valueOf(cell))) {
return true;
}
}
return false;
}
So finally your processing method will look like this:
// ... obtain the sheet and its row iterator "rows" as before
while (rows.hasNext()) {
XSSFRow row = (XSSFRow) rows.next(); // advance the row iterator
int fcell = row.getFirstCellNum(); // first cell index of this row
int lcell = row.getLastCellNum(); // last cell index of this row PLUS ONE
if (containsValue(row, fcell, lcell)) {
// ... processing
}
}
Hope this will help. :)
I haven't found any solution for easily getting the "real" number of rows, but I've found a way to remove such rows, which might be useful to someone tackling a similar issue. See below.
I searched a bit and found this solution.
All it does is delete those empty rows from the bottom, which might be exactly what you want.
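The linked solution is not reproduced here. As a rough sketch of the same idea (my own, assuming Apache POI 4.x or 5.x with poi-ooxml; TrimTrailingRows and its methods are hypothetical names), empty rows can be removed from the bottom with Sheet.removeRow until the first row with visible content is reached:

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

class TrimTrailingRows {

    /** True if no cell in the row has non-blank formatted text. */
    static boolean isEmpty(Row row, DataFormatter formatter) {
        for (Cell cell : row) {
            if (!formatter.formatCellValue(cell).trim().isEmpty()) return false;
        }
        return true;
    }

    /** Removes empty rows from the bottom of the sheet, stopping at the first row with content. */
    static void trimTrailingEmptyRows(Sheet sheet) {
        DataFormatter formatter = new DataFormatter();
        for (int r = sheet.getLastRowNum(); r >= 0; r--) {
            Row row = sheet.getRow(r);
            if (row == null) continue;           // nothing stored at this index
            if (!isEmpty(row, formatter)) break; // reached real content
            sheet.removeRow(row);                // drop the leftover row
        }
    }

    public static void main(String[] args) throws Exception {
        try (Workbook wb = new XSSFWorkbook()) {
            Sheet sheet = wb.createSheet();
            sheet.createRow(0).createCell(0).setCellValue("keep");
            sheet.createRow(3).createCell(1);                      // blank leftover
            sheet.createRow(5).createCell(2).setCellValue("   "); // whitespace only
            System.out.println(sheet.getLastRowNum()); // 5
            trimTrailingEmptyRows(sheet);
            System.out.println(sheet.getLastRowNum()); // 0
        }
    }
}
```

After trimming, getLastRowNum() points at the last row with content, so the workbook can be written back with Workbook.write as usual.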
As I understand it, to delete the rows you must have selected all the cells and pressed the Delete key. If I am right, then you deleted the rows the wrong way: the cells become blank rather than deleted, so the rows still contain cells with blank values, and that is why they are included in the row count.
The correct way is to select the rows by clicking the row numbers to the left of the first column. Clicking a row number selects the complete row; select all the required rows with the help of the Shift key. Then right-click and select Delete.
This may be helpful for you:
remove rows/columns by poi api
transfer xls to csv
transfer csv to xls
Hope this helps.
I'm opening an Excel (xls) file in my Java application with POI.
There are 30 lines in this Excel file.
I need to get the value at column index 9.
My code:
Workbook wb;
wb = WorkbookFactory.create(inp);
Sheet sheet = wb.getSheetAt(0);
for (Row row : sheet) {
if (row.getLastCellNum() >= 6) {
for (Cell cell : row) {
if(cell.getColumnIndex() == 9) {
//do something
}
}
}
}
Every row in the Excel file has values in columns 1-14.
My problem is that only some values are recognized. I wrote the same value into every cell at column index 9 (the 10th column in my Excel sheet), but the problem is still the same.
What could cause this problem?
Make sure you set the same date format for all cells in the column (select the column and set the format explicitly). And I believe using the DateUtil class to get the data is more appropriate than calling cell.getDateCellValue().
POI uses 0-based counting for columns. So, if you want the 9th column, you need to fetch the cell with index 8, not 9. It looks like you're checking for the column with index 9, so you are one column out.
If you're not sure about 0-based indexing, the safest thing is to use the CellReference class to help you. It translates between Excel-style references, e.g. A1, and POI-style 0-based offsets, e.g. 0,0. Use something like:
CellReference ref = new CellReference("I10");
Row r = sheet.getRow(ref.getRow());
if (r == null) {
// That row is empty
} else {
Cell c = r.getCell(ref.getCol());
// c is now the cell at I10
}
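To make the 0-based mapping concrete, here is a dependency-free sketch of the translation CellReference performs for a plain reference such as I10. The class and method names are hypothetical, and unlike CellReference it ignores sheet names and $ markers:

```java
/** Hypothetical helper, not part of Apache POI: parses plain A1-style references. */
class A1Parser {

    /** Returns {rowIndex, colIndex}, both 0-based, for a reference such as "I10". */
    static int[] parse(String ref) {
        int i = 0;
        int col = 0;
        // Leading letters form the column in bijective base 26: A=1 ... Z=26, AA=27.
        while (i < ref.length()) {
            char ch = Character.toUpperCase(ref.charAt(i));
            if (ch < 'A' || ch > 'Z') break;
            col = col * 26 + (ch - 'A' + 1);
            i++;
        }
        if (i == 0 || i == ref.length()) {
            throw new IllegalArgumentException("not a plain A1 reference: " + ref);
        }
        // The remaining digits are the 1-based row number shown by Excel.
        int row = Integer.parseInt(ref.substring(i));
        return new int[] {row - 1, col - 1}; // shift both to POI's 0-based offsets
    }

    public static void main(String[] args) {
        int[] pos = parse("I10");
        System.out.println("row index " + pos[0] + ", column index " + pos[1]); // row index 9, column index 8
    }
}
```

Column I is the 9th letter, so it ends up at index 8, which is the off-by-one described above.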
This seems to be a problem with the Excel document(s).
Converting them to CSV and then back to XLS solves the problem.