I use the following code in order to autosize columns in my spreadsheet:
for (int i = 0; i < columns.size(); i++) {
sheet.autoSizeColumn(i, true);
sheet.setColumnWidth(i, sheet.getColumnWidth(i) + 600);
}
The problem is that it takes more than 10 minutes to autosize each column in large spreadsheets with more than 3000 rows, although it is very fast for small documents. Is there anything that could make autosizing faster?
Solution which worked for me:
It was possible to avoid merged regions, so I could iterate through the other cells and finally autosize to the largest cell like this:
int width = ((int)(maxNumCharacters * 1.14388)) * 256;
sheet.setColumnWidth(i, width);
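where 1.14388 is the maximum character width of the "Serif" font and 256 is the number of width units per character (setColumnWidth takes the width in units of 1/256th of a character width).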
Performance of autosizing improved from 10 minutes to 6 seconds.
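For reference, the calculation above can be tried out without a workbook. This is only a minimal sketch: the class name ColumnWidthEstimate and the method estimateWidth are made up for illustration, and the column values are assumed to be already available as strings. The factor 1.14388 is the one quoted above and may need tuning for other fonts.

```java
import java.util.List;

class ColumnWidthEstimate {

    // Hypothetical helper: finds the longest value in a column and converts
    // its character count to POI width units (1/256th of a character width).
    static int estimateWidth(List<String> columnValues, double charWidthFactor) {
        int maxNumCharacters = 0;
        for (String value : columnValues) {
            if (value != null && value.length() > maxNumCharacters) {
                maxNumCharacters = value.length();
            }
        }
        // Same formula as in the answer: scale, truncate, then multiply by 256.
        return ((int) (maxNumCharacters * charWidthFactor)) * 256;
    }

    public static void main(String[] args) {
        // Longest value has 13 characters: (int) (13 * 1.14388) = 14, 14 * 256 = 3584
        System.out.println(estimateWidth(List.of("id", "customer name", "x"), 1.14388));
    }
}
```

The result would then be passed to sheet.setColumnWidth(i, width) as in the snippet above.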
The autoSizeColumn function itself does not work perfectly, and some column widths do not exactly fit the data inside. So I found a solution that works for me.
To avoid complicated calculations, let the autoSizeColumn() function do the work:
sheet.autoSizeColumn(<columnIndex>);
Now the column has been autosized by the library, but we want to add a little more to the current column width to make the table look good:
// get autosized column width
int currentColumnWidth = sheet.getColumnWidth(<columnIndex>);
// add custom value to the current width and apply it to column
sheet.setColumnWidth(<columnIndex>, (currentColumnWidth + 2500));
The full function could look like this:
public void autoSizeColumns(Workbook workbook) {
int numberOfSheets = workbook.getNumberOfSheets();
for (int i = 0; i < numberOfSheets; i++) {
Sheet sheet = workbook.getSheetAt(i);
if (sheet.getPhysicalNumberOfRows() > 0) {
Row row = sheet.getRow(sheet.getFirstRowNum());
Iterator<Cell> cellIterator = row.cellIterator();
while (cellIterator.hasNext()) {
Cell cell = cellIterator.next();
int columnIndex = cell.getColumnIndex();
sheet.autoSizeColumn(columnIndex);
int currentColumnWidth = sheet.getColumnWidth(columnIndex);
sheet.setColumnWidth(columnIndex, (currentColumnWidth + 2500));
}
}
}
}
P.S. Thanks Ondrej Kvasnovsky for the function https://stackoverflow.com/a/35324693/13087091
The autoSizeColumn() function is very slow and inefficient. Even the authors of Apache POI mention in the docs that:
This process can be relatively slow on large sheets, ...
Calculating and setting the cell's width manually is way faster - in my case I reduced the time from ~25,000ms to ~1-5ms.
This is how to achieve it (based on Vladimir Shcherbukhin's answer):
Workbook workbook = new XSSFWorkbook();
Sheet sheet = workbook.createSheet();
final int[] maxNumCharactersInColumns = new int[headers.length]; // maximum number of characters in columns. Necessary to calculate the cell width in most efficient way. sheet.autoSizeColumn(...) is very slow.
Row headersRow = sheet.createRow(0);
CellStyle headerStyle = createHeadersStyle(workbook); // createHeadersStyle() is my own function. Create headers style if you want
for (int i = 0; i < headers.length; i++) { // create headers
Cell headerCell = headersRow.createCell(i, CELL_TYPE_STRING);
headerCell.setCellValue(headers[i]);
headerCell.setCellStyle(headerStyle);
int length = headers[i].length();
if (maxNumCharactersInColumns[i] < length) { // adjust the columns width
maxNumCharactersInColumns[i] = length + 2; // you can add +2 if you have filtering enabled on your headers
}
}
int rowIndex = 1;
for (List<Object> rowValues : rows) {
Row row = sheet.createRow(rowIndex);
int columnIndex = 0;
for (Object value : rowValues) {
Cell cell = createRowCell(row, value, columnIndex); // createRowCell() is my own function.
int length;
if (cell.getCellType() == Cell.CELL_TYPE_STRING) {
String cellValue = cell.getStringCellValue();
// This part is quite important: some Excel spreadsheets contain values with line breaks, so it is worth handling that case.
String[] arr = cellValue.split("\n"); // if cell contains complex value with line breaks, calculate only the longest line
length = Arrays.stream(arr).map(String::length).max(Integer::compareTo).get();
} else {
length = value != null ? value.toString().length() : 0;
}
if (maxNumCharactersInColumns[columnIndex] < length) { // if the current cell value is the longest one, save it to an array
maxNumCharactersInColumns[columnIndex] = length;
}
columnIndex++;
}
rowIndex++;
}
for (int i = 0; i < headers.length; i++) {
int width = (int) (maxNumCharactersInColumns[i] * 1.45f) * 256; // 1.45f <- you can change this value
sheet.setColumnWidth(i, Math.min(width, MAX_CELL_WIDTH)); // <- set calculated cell width
}
sheet.setAutoFilter(new CellRangeAddress(0, 0, 0, headers.length - 1));
ByteArrayOutputStream output = new ByteArrayOutputStream();
workbook.write(output);
workbook.close();
Unfortunately I don't have enough reputation yet to comment on answers, so here are some notes:
When using Row row = sheet.getRow(sheet.getFirstRowNum()); make sure this row contains a value in the last column. Otherwise the cellIterator will end too early, i.e. if a subsequent row has a value in that column, the column will not be autosized. The problem does not arise if the row contains the headers (the names of the columns). Alternatively, explicitly use a known header row, e.g.
int indexOfHeaderRow = ...;
...
Row row = sheet.getRow(indexOfHeaderRow);
Jakub Słowikowski
sheet.setColumnWidth(i, Math.min(width, MAX_CELL_WIDTH)); // <- set calculated cell width
I'm not sure about this line because nothing is said about the value of MAX_CELL_WIDTH. Perhaps it is the overall maximum? I used this instead:
sheet.setColumnWidth(i, Math.max(width, 2048));
2048 seems to be the default width? This value prevents extremely narrow widths for empty columns.
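The two bounds can also be combined, so that empty columns do not become too narrow and very long values do not exceed what setColumnWidth accepts (it throws an IllegalArgumentException for widths above 255 * 256). A rough sketch follows. Since MAX_CELL_WIDTH is not defined in the original answer, using 255 * 256 for it is my assumption, and the class and method names are made up for illustration.

```java
class ColumnWidthClamp {

    // Lower bound taken from the comment above (prevents very narrow empty columns).
    static final int MIN_WIDTH = 2048;
    // Assumed upper bound: the documented limit of 255 characters, in 1/256th units.
    static final int MAX_WIDTH = 255 * 256;

    // Keeps a calculated width inside [MIN_WIDTH, MAX_WIDTH].
    static int clamp(int width) {
        return Math.max(MIN_WIDTH, Math.min(width, MAX_WIDTH));
    }

    public static void main(String[] args) {
        System.out.println(clamp(0));       // 2048
        System.out.println(clamp(5000));    // 5000
        System.out.println(clamp(100000));  // 65280
    }
}
```

With this helper the call would become sheet.setColumnWidth(i, ColumnWidthClamp.clamp(width)).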
Related
I have a table in one of my ppt slides. I need to read the height of each table row so that I can remove a row if it runs off the slide. The height varies depending on the text in the row.
I tried reading it like this, but for some reason I am not getting accurate output:
int totalRows = table.getNumberOfRows();
double rowHeight = 0;
for (int t = 0; t < totalRows; t++)
{
// accumulate the height of each row, taken from its first cell
rowHeight += table.getRows().get(t).getCells().get(0).getAnchor().getHeight();
// once the rows no longer fit, remove this row and every row after it
if (rowHeight > slideHeight)
{
// remove from the end so the remaining indices stay valid
for (int remove = totalRows - 1; remove >= t; remove--)
{
table.removeRow(remove);
}
break;
}
}
Note: some rows and columns have merged cells as well.
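The cut-off logic itself can be checked separately from the slide API. The sketch below assumes the row heights have already been read into a list; the class RowOverflow and its methods are hypothetical names. It does not address how accurate the heights reported by the library are, which is the actual open question here, especially with merged cells.

```java
import java.util.ArrayList;
import java.util.List;

class RowOverflow {

    // Returns the index of the first row whose bottom edge lies beyond
    // availableHeight, or -1 if all rows fit.
    static int firstOverflowingRow(List<Double> rowHeights, double availableHeight) {
        double total = 0;
        for (int i = 0; i < rowHeights.size(); i++) {
            total += rowHeights.get(i);
            if (total > availableHeight) {
                return i;
            }
        }
        return -1;
    }

    // Removes the first overflowing row and everything after it,
    // going backwards so the remaining indices stay valid.
    static void trimToFit(List<Double> rowHeights, double availableHeight) {
        int first = firstOverflowingRow(rowHeights, availableHeight);
        if (first < 0) {
            return;
        }
        for (int i = rowHeights.size() - 1; i >= first; i--) {
            rowHeights.remove(i);
        }
    }

    public static void main(String[] args) {
        List<Double> heights = new ArrayList<>(List.of(100.0, 150.0, 200.0, 120.0));
        trimToFit(heights, 300.0);
        System.out.println(heights); // [100.0, 150.0]
    }
}
```

Removing from the last index backwards matters: removing forwards while the index keeps increasing skips rows, because the remaining rows shift down after each removal.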
I'm trying to figure out how to style multiple cells using a cell range. My current code is below. Thanks in advance to anyone who can help.
for (int counter = 0; counter < ColumnList.length; counter++) {
SXSSFCell cell = currentRow.createCell(counter);
if (counter == 0) {
cell.setCellValue(String.valueOf(rowNum));
cell.setCellStyle(cellStyle);
} else {
String columnValue = ColumnList[counter];
String cellValue = rs.getString(columnValue);
cell.setCellValue(cellValue);
cell.setCellStyle(cellStyle);
}
}
The docs describe a CellRangeAddress class with a constructor you could use:
CellRangeAddress(int firstRow, int lastRow, int firstCol, int lastCol)
This is done. A better way of applying cell styles to multiple cells is by row.
The docs describe setRowStyle, which applies a style to a whole row; you can loop over the rows until you reach the last one.
I'm using getLastRowNum() and getPhysicalNumberOfCells() to get the number of used rows and columns respectively, but they do not give the correct index of the last row.
int lastRowNum = sheetAt.getLastRowNum();
int lastColNum = sheetAt.getRow(0).getPhysicalNumberOfCells();
Is there any other way to find this out?
Rows and Cells can be missing, and there is no built in function to return the number of cells used in a column. So you have to write the function yourself.
int count = 0;
for (Row row : sheet) {
if (row.getCell(5) != null) {
count += 1;
}
}
This retrieves the number of used cells in column F.
I think the title makes clear what I want to know. I already searched Google and found no answer to my problem.
How can I get the size or length of a specific column in Apache POI (Java)?
I don't think you can get a height for a column, but you can get the height of a specific row. Another way is to use the CellStyle to work out the height of a specific cell (top border + bottom border + font height).
I think there is no direct method for it. You have to iterate over all rows to find the size of a column.
Sample:
for (Cell cell : row) {
++COLUMNCOUNT;
}
Workbook workbook = new XSSFWorkbook(ExcelFile);
Sheet firstSheet = workbook.getSheetAt(0);
Iterator<Row> iterator = firstSheet.iterator();
Row nextRow = iterator.next();
rowCount = firstSheet.getLastRowNum();
columnCount = nextRow.getLastCellNum();
I already found out how to get the size of a column in my own way. Please post other answers if you have any, for future reference.
int columnSize = 0;
for (int x = 0; x < row.getLastCellNum(); x++) {
for (int y = 0; y < row.length; y++) {
columnSize = y;
}
break;
}
I have a method for reading Excel cells using Apache POI, and it works fine. Well... almost fine.
public static ArrayList readXLsXFile() throws FileNotFoundException, IOException {
ArrayList outListaExcel = new ArrayList();
FileInputStream ptxf = new FileInputStream(pathToExcelFile);
XSSFWorkbook workbook = new XSSFWorkbook(ptxf);
XSSFSheet sheetAr = workbook.getSheetAt(0);
Iterator rowsAr = sheetAr.rowIterator();
while (rowsAr.hasNext()) {
XSSFRow row1 = (XSSFRow) rowsAr.next();
Iterator cellsAr = row1.cellIterator();
ArrayList<String> arr;
arr = new ArrayList();
while (cellsAr.hasNext()) {
XSSFCell cell1 = (XSSFCell) cellsAr.next();
arr.add(String.valueOf(cell1));
}
outListaExcel.add(arr);
}
return outListaExcel;
}
If cells are formatted, for example if the whole of column A has borders, then it keeps reading empty cells and gives me empty strings. How can I ignore those empty (formatted) cells?
So readXLsXFile will give me an ArrayList with
[0] -> [1][2]
[1] -> [3][4]
But it will also give ten more nodes with empty strings, because column A is formatted with borders.
Edit after Gagravarr's answer:
I can work around this by checking whether subList is empty and not adding it to mainList in that case. But with very large .xls files, and many of them, that will take too long, and in general I don't think it is good practice.
My question was whether there is something for rows, like there is for cells, that I have overlooked.
ArrayList<ArrayList<String>>mainLista = new ArrayList<ArrayList<String>>();
for (int rowNum = rowStart; rowNum < rowEnd; rowNum++) {
Row r = sheet.getRow(rowNum);
int lastColumn = r.getLastCellNum();
ArrayList<String> subList = new ArrayList<String>();
for (int cn = 0; cn < lastColumn; cn++) {
Cell c = r.getCell(cn, Row.RETURN_BLANK_AS_NULL);
if (c != null) {
subList.add(c.getStringCellValue());
} else {
}
}
if (!subList.isEmpty() ){ // I think it is not good way
mainLista.add(subList);} // to do this, because it still reads
} // an empty rows
As explained in the Apache POI Documentation on Iterate over rows and cells, the iterators only give you the rows and cells which are defined and have/had content.
If you want to fetch cells with full control over blank or empty cells, you need to instead use something like:
// Decide which rows to process
int rowStart = Math.min(15, sheet.getFirstRowNum());
int rowEnd = Math.max(1400, sheet.getLastRowNum());
for (int rowNum = rowStart; rowNum < rowEnd; rowNum++) {
Row r = sheet.getRow(rowNum);
if (r == null) {
// This whole row is empty, so skip it
continue;
}
int lastColumn = Math.max(r.getLastCellNum(), MY_MINIMUM_COLUMN_COUNT);
for (int cn = 0; cn < lastColumn; cn++) {
Cell c = r.getCell(cn, Row.RETURN_BLANK_AS_NULL);
if (c == null) {
// The spreadsheet is empty in this cell
} else {
// Do something useful with the cell's contents
}
}
}
If you want to fetch blank cells (typically those with styling but no values), play with the other Missing Cell Policies, eg RETURN_NULL_AND_BLANK
Set the border for column B. In my case that helped.