How to read empty, but formatted, Excel cells with Apache POI?

I have a method for reading Excel cells using Apache POI, and it works fine. Well... almost fine.
public static ArrayList<ArrayList<String>> readXLsXFile() throws FileNotFoundException, IOException {
    ArrayList<ArrayList<String>> outListaExcel = new ArrayList<>();
    // pathToExcelFile is assumed to be defined elsewhere in the class
    FileInputStream fis = new FileInputStream(pathToExcelFile);
    XSSFWorkbook workbook = new XSSFWorkbook(fis);
    XSSFSheet sheetAr = workbook.getSheetAt(0);
    Iterator<Row> rowsAr = sheetAr.rowIterator();
    while (rowsAr.hasNext()) {
        Row row1 = rowsAr.next();
        Iterator<Cell> cellsAr = row1.cellIterator();
        ArrayList<String> arr = new ArrayList<>();
        while (cellsAr.hasNext()) {
            Cell cell1 = cellsAr.next();
            arr.add(String.valueOf(cell1));
        }
        outListaExcel.add(arr);
    }
    return outListaExcel;
}
If cells are formatted, for example if the whole A column has borders, the method keeps reading the empty cells and gives me empty strings. How can I ignore those empty (formatted) cells?
So readXLsXFile will give me an ArrayList with
[0] -> [1][2]
[1] -> [3][4]
But it will also give ten more nodes with empty strings, because column A is formatted with borders.
Edit after Gagravarr's answer:
I can check whether subList is empty and, if so, not add it to mainLista. But for very large .xls files, and when there are many of them, it will take too long, and generally I think it is not good practice.
My question was whether there is something for rows, like there is for cells, that I have overlooked.
ArrayList<ArrayList<String>> mainLista = new ArrayList<ArrayList<String>>();
for (int rowNum = rowStart; rowNum < rowEnd; rowNum++) {
    Row r = sheet.getRow(rowNum);
    int lastColumn = r.getLastCellNum();
    ArrayList<String> subList = new ArrayList<String>();
    for (int cn = 0; cn < lastColumn; cn++) {
        Cell c = r.getCell(cn, Row.RETURN_BLANK_AS_NULL);
        if (c != null) {
            subList.add(c.getStringCellValue());
        }
    }
    // I think this is not a good way to do it, because it still reads the empty rows
    if (!subList.isEmpty()) {
        mainLista.add(subList);
    }
}

As explained in the Apache POI documentation under "Iterate over rows and cells", the iterators only return the rows and cells that are defined in the file, which includes cells that carry styling but no value.
If you want full control over how blank or missing cells are handled, you need to use something like this instead:
// Decide which rows to process
int rowStart = Math.min(15, sheet.getFirstRowNum());
int rowEnd = Math.max(1400, sheet.getLastRowNum());
for (int rowNum = rowStart; rowNum < rowEnd; rowNum++) {
    Row r = sheet.getRow(rowNum);
    if (r == null) {
        // This whole row is undefined in the file
        continue;
    }
    int lastColumn = Math.max(r.getLastCellNum(), MY_MINIMUM_COLUMN_COUNT);
    for (int cn = 0; cn < lastColumn; cn++) {
        // In newer POI releases this constant is Row.MissingCellPolicy.RETURN_BLANK_AS_NULL
        Cell c = r.getCell(cn, Row.RETURN_BLANK_AS_NULL);
        if (c == null) {
            // The spreadsheet is empty in this cell
        } else {
            // Do something useful with the cell's contents
        }
    }
}
If you want to fetch blank cells (typically those with styling but no values), try the other Missing Cell Policies, e.g. RETURN_NULL_AND_BLANK.
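On the follow-up about rows: as far as I know, the missing-cell policies apply to cells only, so a row-level filter still has to be written by hand. The sketch below shows that filter separately from POI, assuming each row has already been read into a list of strings in which null marks a cell the policy reported as missing or blank. BlankRowFilter and its methods are made-up names, not part of Apache POI.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Apache POI.
class BlankRowFilter {

    // True if every entry is null or contains only whitespace.
    static boolean isBlankRow(List<String> cells) {
        for (String cell : cells) {
            if (cell != null && !cell.trim().isEmpty()) {
                return false;
            }
        }
        return true;
    }

    // Returns a new list that keeps only the rows with at least one value.
    static List<List<String>> dropBlankRows(List<List<String>> rows) {
        List<List<String>> kept = new ArrayList<>();
        for (List<String> row : rows) {
            if (!isBlankRow(row)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<String>> rows = new ArrayList<>();
        rows.add(Arrays.asList("1", "2"));
        rows.add(Arrays.asList(null, ""));   // styled but empty row
        rows.add(Arrays.asList("3", "4"));
        System.out.println(dropBlankRows(rows)); // prints [[1, 2], [3, 4]]
    }
}
```

The check is a single pass over values that are already in memory, so it is unlikely to dominate the cost of parsing the workbook itself.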

Set the border for column B as well; in my case that helped.

Related

How to check if 2 `XSSFSheet` are identical without having to loop over every cell in each sheet

The goal is to check in Java whether 2 sheets are identical, using the Apache POI Maven library.
So let's say we have the following method:
boolean isIdentical(final XSSFSheet firstSheet, final XSSFSheet secondSheet);
I know this can be done by looping over every row and every cell and then checking whether the cell values are the same, like the following:
public boolean isIdentical(final XSSFSheet firstSheet, final XSSFSheet secondSheet) {
    for (int rowIndex = firstSheet.getFirstRowNum(); rowIndex <= firstSheet.getLastRowNum(); rowIndex++) {
        final XSSFRow row1 = firstSheet.getRow(rowIndex);
        final XSSFRow row2 = secondSheet.getRow(rowIndex);
        // getLastCellNum() returns the last cell index plus one, hence "<"
        for (int cellIndex = row1.getFirstCellNum(); cellIndex < row1.getLastCellNum(); cellIndex++) {
            final XSSFCell cell1 = row1.getCell(cellIndex);
            final XSSFCell cell2 = row2.getCell(cellIndex);
            // check if cell has same raw value
            // ...
        }
    }
    return true;
}
But does anyone know if there is a more efficient way to compare the data, rather than going through each cell in the sheet?
Thank you!

How to horizontally merge XWPFTable using POI in Java

I want to horizontally merge cells in a row of an XWPFTable. I tried the answer to this question:
How to merge cells (or apply colspan) using XWPFTable in POI in Java?
and also the answer to this one:
How to merge cells horizontally using apache-poi
They helped me get cells merged vertically, but the horizontal merge is not happening.
I am attaching a sample screenshot of what I want.
Thanks.

There are two ways to merge cells horizontally. The first uses CTHMerge, which works like vertical merging with CTVMerge and does not explicitly need a table grid. The second uses the grid span property. This method needs a table grid, and the cells that are merged into the first one must be removed.
Microsoft Word supports both methods.
LibreOffice Writer supports CTHMerge too, but a table grid must be set for the table to render correctly.
WPS Writer supports only the grid span method.
So this should be the most compatible solution:
import java.io.File;
import java.io.FileOutputStream;
import java.math.BigInteger;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTcPr;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTblWidth;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.STTblWidth;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTVMerge;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.STMerge;

public class CreateWordTableMerge {

    static void mergeCellVertically(XWPFTable table, int col, int fromRow, int toRow) {
        for (int rowIndex = fromRow; rowIndex <= toRow; rowIndex++) {
            XWPFTableCell cell = table.getRow(rowIndex).getCell(col);
            CTVMerge vmerge = CTVMerge.Factory.newInstance();
            if (rowIndex == fromRow) {
                // The first merged cell is set with RESTART merge value
                vmerge.setVal(STMerge.RESTART);
            } else {
                // Cells which join (merge) the first one are set with CONTINUE
                vmerge.setVal(STMerge.CONTINUE);
                // and their content should be removed
                for (int i = cell.getParagraphs().size(); i > 0; i--) {
                    cell.removeParagraph(0);
                }
                cell.addParagraph();
            }
            // Try getting the TcPr instead of simply setting a new one every time
            CTTcPr tcPr = cell.getCTTc().getTcPr();
            if (tcPr == null) tcPr = cell.getCTTc().addNewTcPr();
            tcPr.setVMerge(vmerge);
        }
    }

    // Merging horizontally by setting grid span instead of using CTHMerge
    static void mergeCellHorizontally(XWPFTable table, int row, int fromCol, int toCol) {
        XWPFTableCell cell = table.getRow(row).getCell(fromCol);
        // Try getting the TcPr instead of simply setting a new one every time
        CTTcPr tcPr = cell.getCTTc().getTcPr();
        if (tcPr == null) tcPr = cell.getCTTc().addNewTcPr();
        // The first merged cell has the grid span property set
        if (tcPr.isSetGridSpan()) {
            tcPr.getGridSpan().setVal(BigInteger.valueOf(toCol - fromCol + 1));
        } else {
            tcPr.addNewGridSpan().setVal(BigInteger.valueOf(toCol - fromCol + 1));
        }
        // Cells which join (merge) the first one must be removed
        for (int colIndex = toCol; colIndex > fromCol; colIndex--) {
            // Use only this line for Apache POI versions greater than 3
            table.getRow(row).removeCell(colIndex);
            // For Apache POI versions up to 3, use this line instead:
            //table.getRow(row).getCtRow().removeTc(colIndex);
        }
    }

    public static void main(String[] args) throws Exception {
        XWPFDocument document = new XWPFDocument();
        XWPFParagraph paragraph = document.createParagraph();
        XWPFRun run = paragraph.createRun();
        run.setText("The table:");

        // Create the table
        XWPFTable table = document.createTable(3, 5);
        for (int row = 0; row < 3; row++) {
            for (int col = 0; col < 5; col++) {
                table.getRow(row).getCell(col).setText("row " + row + ", col " + col);
            }
        }

        // Create a CTTblGrid for this table with the widths of the 5 columns.
        // Necessary for LibreOffice/OpenOffice to accept the column widths.
        // Values are in twentieths of a point (1/1440 of an inch).
        // First column = 1 inch wide
        table.getCTTbl().addNewTblGrid().addNewGridCol().setW(BigInteger.valueOf(1 * 1440));
        // Other columns (4 in this case) are also 1 inch wide each
        for (int col = 1; col < 5; col++) {
            table.getCTTbl().getTblGrid().addNewGridCol().setW(BigInteger.valueOf(1 * 1440));
        }

        // Create and set column widths for all columns in all rows.
        // Most examples don't set the type of the CTTblWidth, but this
        // is necessary for it to work in all office versions.
        for (int col = 0; col < 5; col++) {
            CTTblWidth tblWidth = CTTblWidth.Factory.newInstance();
            tblWidth.setW(BigInteger.valueOf(1 * 1440));
            tblWidth.setType(STTblWidth.DXA);
            for (int row = 0; row < 3; row++) {
                CTTcPr tcPr = table.getRow(row).getCell(col).getCTTc().getTcPr();
                if (tcPr != null) {
                    tcPr.setTcW(tblWidth);
                } else {
                    tcPr = CTTcPr.Factory.newInstance();
                    tcPr.setTcW(tblWidth);
                    table.getRow(row).getCell(col).getCTTc().setTcPr(tcPr);
                }
            }
        }

        // Using the merge methods
        mergeCellVertically(table, 0, 0, 1);
        mergeCellHorizontally(table, 1, 2, 3);
        mergeCellHorizontally(table, 2, 1, 4);

        paragraph = document.createParagraph();
        FileOutputStream out = new FileOutputStream("create_table.docx");
        document.write(out);
        out.close();
        System.out.println("create_table.docx written successfully");
    }
}

Export Java HashMap to xlsx

I need to convert HashMaps to xlsx using POI. For sheet data2 I need something like this:
table1:
But I get table2:
Here's my list of HashMaps:
rows=[{kol2=s, kol1=s}, {kol2=bbbb, kol3=bbbb, kol1=aaaa}, {kol2=bbbb, kol3=bbbb, kol1=aaaa}, {kol2=bbbb, kol3=bbbb, kol1=aaaa}, {kol2=s, kol1=s}]}
Here's my code:
XSSFWorkbook workBook = new XSSFWorkbook();
XSSFSheet sheet = workBook.createSheet("data");
XSSFSheet sheet2 = workBook.createSheet("data2");
int rowCount = 0;
int help = 1;
List<HashMap<String, Object>> rows = ((List<HashMap<String, Object>>) x);
int rowCount2 = 0;
int header = 1;
Row header2 = sheet2.createRow(0);
for (int i = 0; i < rows.size(); i++) {
    int li = 0;
    Row row2 = sheet2.createRow(++rowCount2);
    HashMap<String, Object> row = rows.get(i);
    int columnCount2 = 0;
    for (HashMap.Entry<String, Object> subElement : row.entrySet()) {
        if (subElement.getValue() != null) {
            if (i == li) {
                Cell cell = header2.createCell(header);
                cell.setCellValue(subElement.getKey().toString());
                header++;
            }
            li++;
            Cell cell2 = row2.createCell(++columnCount2);
            cell2.setCellValue(subElement.getValue().toString());
        }
    }
}
Can someone help?

Iterating over a HashMap's EntrySet
The first problem is that you are iterating over the entrySet of your HashMap:
for (HashMap.Entry<String, Object> subElement : row.entrySet()) {
    // no guaranteed order
}
Looking at the JavaDoc of the Set#iterator() method you will see this:
Returns an iterator over the elements in this set. The elements are returned in no particular order (unless this set is an instance of some class that provides a guarantee).
There are Sets which are ordered (such as TreeSet), but since you are using a HashMap, your entry set won't be ordered either.
Notice the column order in your sheet is kol2-kol3-kol1. Don't you want it to be kol1-kol2-kol3?
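The ordering point can be checked without POI. A small sketch using only java.util and the keys from the question: LinkedHashMap keeps insertion order and TreeMap keeps keys sorted, while HashMap promises neither. MapOrderDemo is a made-up class name.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class MapOrderDemo {

    // Fills any Map implementation with the same three keys, in the same order.
    static Map<String, Object> fill(Map<String, Object> map) {
        map.put("kol2", "bbbb");
        map.put("kol3", "bbbb");
        map.put("kol1", "aaaa");
        return map;
    }

    public static void main(String[] args) {
        // Iteration order of a HashMap is unspecified and may differ between JVMs.
        System.out.println(fill(new HashMap<>()).keySet());
        // LinkedHashMap iterates in insertion order: [kol2, kol3, kol1]
        System.out.println(fill(new LinkedHashMap<>()).keySet());
        // TreeMap iterates in sorted key order: [kol1, kol2, kol3]
        System.out.println(fill(new TreeMap<>()).keySet());
    }
}
```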
Not creating empty columns
You are forgetting to create empty cells for columns you don't have in your Map.
if (subElement.getValue() != null) {
    // there won't be an empty cell if you e.g. don't have kol2 in your rows Map,
    // since this just skips your current value
}
This is why you end up with something like:
kol2  kol3  kol1
s     s
bbbb  bbbb  aaaa
...
instead of:
kol2  kol3  kol1
s           s
bbbb  bbbb  aaaa
...
Creating the header row inside the loop
By filling the header row inside your loop, you are making your solution more complicated than necessary. It would be much easier to create the header row first and then loop over the entries in the List.
if (i == li) {
    Cell cell = header2.createCell(header);
    cell.setCellValue(subElement.getKey().toString());
    header++;
}
If you do this outside the loop, there is no need for the li and header variables.
Suggested solution
I would (for a start) come up with something like this (I added some extra comments that I normally wouldn't include, to make clearer what the intentions are and which aspects of the solution you need to understand):
XSSFSheet sheet2 = workBook.createSheet("data2");
List<HashMap<String, Object>> rows = ((List<HashMap<String, Object>>) x);
List<String> headers = Arrays.asList("kol1", "kol2", "kol3");
int currentRowNumber = 0;
// create header row
Row header = sheet2.createRow(currentRowNumber);
for (int i = 0; i < headers.size(); i++) {
    Cell headerCell = header.createCell(i);
    headerCell.setCellValue(headers.get(i));
}
// create data rows (we loop over the rows List)
for (int i = 0; i < rows.size(); i++) {
    HashMap<String, Object> row = rows.get(i);
    // we need to increment the row number for the sheet at the beginning of
    // each row. entry 0 in the rows List is in sheetRow 1, entry 1 in sheetRow 2, etc.
    currentRowNumber++;
    Row sheetRow = sheet2.createRow(currentRowNumber);
    // we can now loop over the columns inside the row loop (using the headers List)
    // we create a Cell for each column, but only fill it if there is data for it
    for (int j = 0; j < headers.size(); j++) {
        Cell cell = sheetRow.createCell(j);
        // only fill the cell if the row map has a non-null value for the current column
        String currentColumnName = headers.get(j);
        Object value = row.get(currentColumnName);
        if (value != null) {
            cell.setCellValue(value.toString());
        }
    }
}
If you want a different column order, just change the headers List and you are done (e.g. Arrays.asList("kol2", "kol3", "kol1")).
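If the column names are not known up front, the headers list could be derived from the data instead of hard-coded. A sketch under that assumption, using only java.util: collect the union of all keys in sorted order, then emit one value per header for each row, with an empty string where the key is missing or null. RowAligner and its methods are hypothetical names; writing the aligned values into cells works as in the loop above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class RowAligner {

    // Union of all keys across the rows, in sorted order.
    static List<String> collectHeaders(List<? extends Map<String, Object>> rows) {
        TreeSet<String> keys = new TreeSet<>();
        for (Map<String, Object> row : rows) {
            keys.addAll(row.keySet());
        }
        return new ArrayList<>(keys);
    }

    // One value per header; missing or null entries become empty strings.
    static List<String> align(Map<String, Object> row, List<String> headers) {
        List<String> values = new ArrayList<>();
        for (String header : headers) {
            Object value = row.get(header);
            values.add(value == null ? "" : value.toString());
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Object> first = new HashMap<>();
        first.put("kol2", "s");
        first.put("kol1", "s");
        Map<String, Object> second = new HashMap<>();
        second.put("kol2", "bbbb");
        second.put("kol3", "bbbb");
        second.put("kol1", "aaaa");
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(first);
        rows.add(second);

        List<String> headers = collectHeaders(rows);
        System.out.println(headers);                // [kol1, kol2, kol3]
        System.out.println(align(first, headers));  // [s, s, ]
        System.out.println(align(second, headers)); // [aaaa, bbbb, bbbb]
    }
}
```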

How to get the size or length of column in generated Excel file using POI Apache

I think my title makes clear what I want to know. I have already searched Google and found no answer to my problem.
How can I get the size or length of a specific column with Apache POI in Java?

I don't think you can get a height for a column, but you can get the height of a specific row. Another way is to use the CellStyle to work out the height of a specific cell (top border + bottom border + font height).

I think there is no direct method for it. You have to iterate over all rows to find the size of a column.
Sample:
for (Cell cell : row) {
    ++COLUMNCOUNT;
}

Workbook workbook = new XSSFWorkbook(ExcelFile);
Sheet firstSheet = workbook.getSheetAt(0);
Iterator<Row> iterator = firstSheet.iterator();
Row nextRow = iterator.next();
// getLastRowNum() is the zero-based index of the last row
rowCount = firstSheet.getLastRowNum();
// getLastCellNum() is the index of the last cell plus one
columnCount = nextRow.getLastCellNum();

I have already found my own way to get the size of a column. Please post other answers if you have any, for future reference.
int columnSize = 0;
for (int x = 0; x < row.getLastCellNum(); x++) {
    for (int y = 0; y < row.length; y++) {
        columnSize = y;
    }
    break;
}

How to speed up autosizing columns in apache POI?

I use the following code in order to autosize columns in my spreadsheet:
for (int i = 0; i < columns.size(); i++) {
    sheet.autoSizeColumn(i, true);
    sheet.setColumnWidth(i, sheet.getColumnWidth(i) + 600);
}
The problem is it takes more than 10 minutes to autosize each column in case of large spreadsheets with more than 3000 rows. It goes very fast for small documents though. Is there anything which could help autosizing to work faster?

The solution that worked for me:
I was able to avoid merged regions, so I could iterate over the other cells and finally size each column to its largest cell like this:
int width = ((int)(maxNumCharacters * 1.14388)) * 256;
sheet.setColumnWidth(i, width);
where 1.14388 is the maximum character width of the "Serif" font, and 256 converts characters to POI's column width units (1/256 of a character width).
Autosizing time dropped from 10 minutes to 6 seconds.
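The arithmetic behind that formula, as a POI-free sketch: the longest text length is scaled by a font factor, truncated to whole characters, and multiplied by 256 to get width units. The factor 1.14388 is the value from this answer, not a POI constant, and the class and method names are invented here.

```java
class ColumnWidthEstimate {

    // POI column widths are expressed in 1/256ths of a character width.
    static final int UNITS_PER_CHAR = 256;

    // Scales the longest text length by a font-dependent factor, truncates
    // to whole characters, then converts to width units.
    static int toWidthUnits(int maxNumCharacters, double charFactor) {
        return ((int) (maxNumCharacters * charFactor)) * UNITS_PER_CHAR;
    }

    public static void main(String[] args) {
        // 10 * 1.14388 = 11.4388, truncated to 11, times 256 = 2816
        System.out.println(toWidthUnits(10, 1.14388)); // prints 2816
    }
}
```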

The autoSizeColumn function itself is not perfect, and some column widths do not exactly fit the data inside. So I found a solution that works for me.
To avoid complicated calculations, let autoSizeColumn() do the work:
sheet.autoSizeColumn(<columnIndex>);
Now the column has been autosized by the library, but we want to add a little more to the current column width to make the table look good:
// get autosized column width
int currentColumnWidth = sheet.getColumnWidth(<columnIndex>);
// add custom value to the current width and apply it to column
sheet.setColumnWidth(<columnIndex>, (currentColumnWidth + 2500));
The full function could look like this:
public void autoSizeColumns(Workbook workbook) {
    int numberOfSheets = workbook.getNumberOfSheets();
    for (int i = 0; i < numberOfSheets; i++) {
        Sheet sheet = workbook.getSheetAt(i);
        if (sheet.getPhysicalNumberOfRows() > 0) {
            Row row = sheet.getRow(sheet.getFirstRowNum());
            Iterator<Cell> cellIterator = row.cellIterator();
            while (cellIterator.hasNext()) {
                Cell cell = cellIterator.next();
                int columnIndex = cell.getColumnIndex();
                sheet.autoSizeColumn(columnIndex);
                int currentColumnWidth = sheet.getColumnWidth(columnIndex);
                sheet.setColumnWidth(columnIndex, (currentColumnWidth + 2500));
            }
        }
    }
}
P.S. Thanks to Ondrej Kvasnovsky for the function: https://stackoverflow.com/a/35324693/13087091

The autoSizeColumn() function is very slow and inefficient. Even the authors of Apache POI mention in the docs that:
This process can be relatively slow on large sheets, ...
Calculating and setting the cell's width manually is far faster. In my case I reduced the time from ~25,000ms to ~1-5ms.
This is how to achieve it (I based it on Vladimir Shcherbukhin's answer):
Workbook workbook = new XSSFWorkbook();
Sheet sheet = workbook.createSheet();
// Maximum number of characters per column. Needed to calculate the cell width
// efficiently, because sheet.autoSizeColumn(...) is very slow.
final int[] maxNumCharactersInColumns = new int[headers.length];
Row headersRow = sheet.createRow(0);
// createHeadersStyle() is my own function. Create a headers style if you want one.
CellStyle headerStyle = createHeadersStyle(workbook);
for (int i = 0; i < headers.length; i++) { // create headers
    // CELL_TYPE_STRING is the POI 3.x constant; newer versions use CellType.STRING
    Cell headerCell = headersRow.createCell(i, CELL_TYPE_STRING);
    headerCell.setCellValue(headers[i]);
    headerCell.setCellStyle(headerStyle);
    int length = headers[i].length();
    if (maxNumCharactersInColumns[i] < length) { // adjust the column width
        // you can add +2 if you have filtering enabled on your headers
        maxNumCharactersInColumns[i] = length + 2;
    }
}
int rowIndex = 1;
for (List<Object> rowValues : rows) {
    Row row = sheet.createRow(rowIndex);
    int columnIndex = 0;
    for (Object value : rowValues) {
        Cell cell = createRowCell(row, value, columnIndex); // createRowCell() is my own function.
        int length;
        if (cell.getCellType() == Cell.CELL_TYPE_STRING) {
            String cellValue = cell.getStringCellValue();
            // This part is quite important. Some spreadsheets contain values with
            // line breaks; in that case only the longest line is counted.
            String[] arr = cellValue.split("\n");
            length = Arrays.stream(arr).map(String::length).max(Integer::compareTo).get();
        } else {
            length = value != null ? value.toString().length() : 0;
        }
        // if the current cell value is the longest one so far, save it to the array
        if (maxNumCharactersInColumns[columnIndex] < length) {
            maxNumCharactersInColumns[columnIndex] = length;
        }
        columnIndex++;
    }
    rowIndex++;
}
for (int i = 0; i < headers.length; i++) {
    int width = (int) (maxNumCharactersInColumns[i] * 1.45f) * 256; // 1.45f <- you can change this value
    // MAX_CELL_WIDTH is not defined in this snippet
    sheet.setColumnWidth(i, Math.min(width, MAX_CELL_WIDTH)); // <- set calculated cell width
}
sheet.setAutoFilter(new CellRangeAddress(0, 0, 0, headers.length - 1));
ByteArrayOutputStream output = new ByteArrayOutputStream();
workbook.write(output);
workbook.close();
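The bookkeeping part of this approach does not depend on POI and can be sketched separately: track, per column, the length of the longest line seen so far, counting only the longest line of a multi-line value. MaxLengthTracker is a hypothetical name; in the answer the same logic is inlined in the row loop.

```java
class MaxLengthTracker {

    private final int[] maxLengths;

    MaxLengthTracker(int columnCount) {
        this.maxLengths = new int[columnCount];
    }

    // Length of the longest line in the value's text; 0 for null.
    static int longestLine(Object value) {
        if (value == null) {
            return 0;
        }
        int longest = 0;
        for (String line : value.toString().split("\n")) {
            longest = Math.max(longest, line.length());
        }
        return longest;
    }

    // Records a value for the given column, keeping the running maximum.
    void record(int columnIndex, Object value) {
        maxLengths[columnIndex] = Math.max(maxLengths[columnIndex], longestLine(value));
    }

    int maxLength(int columnIndex) {
        return maxLengths[columnIndex];
    }

    public static void main(String[] args) {
        MaxLengthTracker tracker = new MaxLengthTracker(2);
        tracker.record(0, "short");
        tracker.record(0, "first line\nsecond, longer line");
        tracker.record(1, 12345);
        tracker.record(1, null);
        System.out.println(tracker.maxLength(0)); // prints 19
        System.out.println(tracker.maxLength(1)); // prints 5
    }
}
```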

Unfortunately I don't have enough reputation yet to comment on answers, so here are some notes:
When using Row row = sheet.getRow(sheet.getFirstRowNum()); be sure that this row contains a value in the last column. Otherwise the cellIterator will end too early, i.e. if a subsequent row has a value in this column, the column will not be autosized. The problem does not arise if row contains the headers (the column names). Alternatively, explicitly use a known header row, e.g.
int indexOfHeaderRow = ...;
...
Row row = sheet.getRow(indexOfHeaderRow);
Regarding this line from Jakub Słowikowski's answer:
sheet.setColumnWidth(i, Math.min(width, MAX_CELL_WIDTH)); // <- set calculated cell width
I'm not sure about this line because there is no information about the value of MAX_CELL_WIDTH. Perhaps it is an overall maximum? So I used this instead:
sheet.setColumnWidth(i, Math.max(width, 2048));
2048 seems to be the default width? This value prevents extremely narrow widths for empty columns.
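One way to read MAX_CELL_WIDTH is as the largest width POI accepts: setColumnWidth rejects values above 255 * 256 = 65280 (255 characters). A sketch that applies both bounds, the 2048 floor suggested here and that ceiling. The class, method and constant names are assumptions, not taken from either answer.

```java
class ColumnWidthBounds {

    // 8 characters * 256 units; the floor proposed in the answer above.
    static final int MIN_WIDTH = 2048;
    // 255 characters * 256 units; widths above this are rejected by setColumnWidth.
    static final int MAX_WIDTH = 255 * 256;

    // Keeps the width within [MIN_WIDTH, MAX_WIDTH].
    static int clamp(int width) {
        return Math.max(MIN_WIDTH, Math.min(width, MAX_WIDTH));
    }

    public static void main(String[] args) {
        System.out.println(clamp(0));       // prints 2048
        System.out.println(clamp(5000));    // prints 5000
        System.out.println(clamp(100000));  // prints 65280
    }
}
```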
