I have very little knowledge of Apache POI. Is there an optimized way to delete Excel rows quickly using Apache POI and Java?
Removing 11000 rows takes more than two hours. Is it possible to delete multiple rows at a time?
I have used the following code:
for (int s = rowNum; s < 11000; s++) {
Row r = sheet.getRow(s);
if(r != null) {
sheet.removeRow(r);
}
}
Unfortunately, as of apache-poi-5.0.0 it cannot be done any faster.
The problem of removing a row from an XSSFSheet may look linear but is actually quadratic.
So when you remove multiple XSSFRows from an XSSFSheet, you also remove the XSSFCells associated with them. Thus, the more XSSFCells there are, the slower it gets.
Code from XSSFSheet:
/**
* Remove a row from this sheet. All cells contained in the row are removed as well
*
* @param row the row to remove.
*/
@Override
public void removeRow(Row row) {
if (row.getSheet() != this) {
throw new IllegalArgumentException("Specified row does not belong to this sheet");
}
// collect cells into a temporary array to avoid ConcurrentModificationException
ArrayList<XSSFCell> cellsToDelete = new ArrayList<>();
for (Cell cell : row) {
cellsToDelete.add((XSSFCell)cell);
}
for (XSSFCell cell : cellsToDelete) {
row.removeCell(cell);
}
final int rowNum = row.getRowNum();
// Performance optimization: explicit boxing is slightly faster than auto-unboxing, though may use more memory
//noinspection UnnecessaryBoxing
final Integer rowNumI = Integer.valueOf(rowNum); // NOSONAR
// this is not the physical row number!
final int idx = _rows.headMap(rowNumI).size();
_rows.remove(rowNumI);
worksheet.getSheetData().removeRow(idx);
// also remove any comment located in that row
if(sheetComments != null) {
for (CellAddress ref : getCellComments().keySet()) {
if (ref.getRow() == rowNum) {
sheetComments.removeComment(ref);
}
}
}
}
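To make the idx computation in the snippet above concrete, here is a minimal sketch using only java.util.TreeMap. It assumes, as the snippet suggests, that _rows is a sorted map keyed by row number; the class and method names are hypothetical and not part of POI. It shows that the index passed to removeRow(idx) is the position among the rows that actually exist, not the row number itself.

```java
import java.util.TreeMap;

class RowIndexSketch {
    // Returns how many existing rows have a smaller row number than rowNum.
    // This mirrors _rows.headMap(rowNumI).size() from the snippet above.
    static int positionOf(TreeMap<Integer, String> rows, int rowNum) {
        return rows.headMap(rowNum).size();
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> rows = new TreeMap<>();
        rows.put(0, "first");   // sheet has rows 0, 5 and 9 only
        rows.put(5, "second");
        rows.put(9, "third");
        System.out.println(positionOf(rows, 9));
    }
}
```

As far as I can tell, the size of such a headMap view is computed by walking its entries, so this lookup alone probably costs time proportional to the number of preceding rows on every removal.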
My suggestion would be to copy the XSSFRows you want to keep from the existing XSSFSheet to a new XSSFSheet and then remove the old XSSFSheet. Use this only if the number of XSSFRows to be deleted is greater than the number of XSSFRows to be copied.
private void copyRowsAndDeleteSheet(XSSFWorkbook workbook) {
XSSFSheet srcSheet = workbook.createSheet();
// Inserting dummy data
for (int i = 0; i < 11000; i++) {
XSSFRow row = srcSheet.createRow(i);
row.createCell(0).setCellType(CellType.STRING);
row.getCell(0).setCellValue("Hello" + i);
}
// Sheet to be copied to
XSSFSheet destSheet = workbook.createSheet();
// Defines how the cell should be copied
CellCopyPolicy policy = new CellCopyPolicy().createBuilder().cellStyle(true).build();
for (int i = 0; i < 100; i++) {
// Row to be copied
XSSFRow srcRow = srcSheet.getRow(i);
// Inserting at i + 1 instead of i to avoid the IllegalArgumentException thrown by FormulaShifter
destSheet.createRow(i + 1).copyRowFrom(srcRow, policy);
}
// Shifting the rows up by 1 row to match source sheet
destSheet.shiftRows(1, 100, -1);
// removing the source sheet
workbook.removeSheetAt(workbook.getSheetIndex(srcSheet));
}
Note: copyRowFrom() is still beta so it would be wise to not use it in a production environment.
I have a requirement for Apache POI to act like "pulling down" formatting in Excel: take a sample row, get the "formatting" of each cell and apply it to all the cells below. Formatting, according to the requirement, includes number formats and cell background colors that change depending on the value. So I wrote a class that gets the CellStyle from the example row's cells and applies it accordingly.
public class FormatScheme implements ObjIntConsumer<Sheet> {
private Map<Integer, CellStyle> cellFormats = new LinkedHashMap<>();
public static FormatScheme of(Row row, int xOffset){
FormatScheme scheme = new FormatScheme();
for (int i = xOffset; i < row.getLastCellNum(); i++) {
Cell cell = row.getCell(i);
if(cell==null) continue;
scheme.cellFormats.put(i, cell.getCellStyle());
}
return scheme;
}
@Override
public void accept(Sheet sheet, int rowIndex) {
Row row = sheet.getRow(rowIndex);
if(row==null) row=sheet.createRow(rowIndex);
Row finalRow = row;
cellFormats.entrySet().forEach(entry -> {
Cell cell = finalRow.getCell(entry.getKey());
if(cell==null) cell= finalRow.createCell(entry.getKey());
cell.setCellStyle(entry.getValue());
});
}
private FormatScheme(){}
}
This does seem to work for the number formats but doesn't grab the changing background colors. ~I guess, I'm missing something.~
With help from Alex Richter I understand that I need to use the sheet's SheetConditionalFormatting. How can I get the ConditionalFormatting rules that are currently applied to a cell and expand the range they affect downward?
Your question is not really clear. But I suspect you want to copy formatting from one row to multiple adjacent following rows, and to expand the ranges of the conditional formatting rules too, so that the cells in those following rows also follow the rules. That is what Excel's format painter does if you select one row, click format painter and then select multiple adjacent following rows.
You already know how to copy cell styles. But why make it that complicated? Copying a cell style from one cell to another is a one-liner: targetCell.setCellStyle(sourceCell.getCellStyle());.
Second, we should copy a possible row style too. The following example has a method void copyRowStyle(Row sourceRow, Row targetRow) for this.
To expand the ranges of the conditional formatting rules, we need to get the rules which are applied to the cell. The rules are stored at sheet level, so we need to traverse the SheetConditionalFormatting to get the rules that have the cell in range. The following example has a method List<ConditionalFormatting> getConditionalFormattingsForCell(Cell cell) for this.
Having this, we can expand the ranges of the conditional formatting rules. The following example has a method void expandConditionalFormatting(Cell sourceCell, Cell targetCell) for this. It expands the ranges of the conditional formatting rules that contain sourceCell so that they also cover targetCell.
Complete example which shows the principle:
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.usermodel.ConditionalFormatting;
import org.apache.poi.ss.util.*;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.List;
import java.util.ArrayList;
class ExcelCopyFormatting {
static List<ConditionalFormatting> getConditionalFormattingsForCell(Cell cell) {
List<ConditionalFormatting> conditionalFormattingList = new ArrayList<ConditionalFormatting>();
Sheet sheet = cell.getRow().getSheet();
SheetConditionalFormatting sheetConditionalFormatting = sheet.getSheetConditionalFormatting();
for (int i = 0; i < sheetConditionalFormatting.getNumConditionalFormattings(); i++) {
ConditionalFormatting conditionalFormatting = sheetConditionalFormatting.getConditionalFormattingAt(i);
CellRangeAddress[] cellRangeAddressArray = conditionalFormatting.getFormattingRanges();
for (CellRangeAddress cellRangeAddress : cellRangeAddressArray) {
if (cellRangeAddress.isInRange(cell)) {
conditionalFormattingList.add(conditionalFormatting);
}
}
}
return conditionalFormattingList;
}
static void expandConditionalFormatting(Cell sourceCell, Cell targetCell) {
List<ConditionalFormatting> conditionalFormattingList = getConditionalFormattingsForCell(sourceCell);
for (ConditionalFormatting conditionalFormatting : conditionalFormattingList) {
CellRangeAddress[] cellRangeAddressArray = conditionalFormatting.getFormattingRanges();
for (int i = 0; i < cellRangeAddressArray.length; i++) {
CellRangeAddress cellRangeAddress = cellRangeAddressArray[i];
if (cellRangeAddress.isInRange(sourceCell)) {
if (cellRangeAddress.getFirstRow() > targetCell.getRowIndex()) {
cellRangeAddress.setFirstRow(targetCell.getRowIndex());
}
if (cellRangeAddress.getFirstColumn() > targetCell.getColumnIndex()) {
cellRangeAddress.setFirstColumn(targetCell.getColumnIndex());
}
if (cellRangeAddress.getLastRow() < targetCell.getRowIndex()) {
cellRangeAddress.setLastRow(targetCell.getRowIndex());
}
if (cellRangeAddress.getLastColumn() < targetCell.getColumnIndex()) {
cellRangeAddress.setLastColumn(targetCell.getColumnIndex());
}
cellRangeAddressArray[i] = cellRangeAddress;
}
}
conditionalFormatting.setFormattingRanges(cellRangeAddressArray);
}
}
static void copyRowStyle(Row sourceRow, Row targetRow) {
if (sourceRow.isFormatted()) {
targetRow.setRowStyle(sourceRow.getRowStyle());
}
}
static void copyCellStyle(Cell sourceCell, Cell targetCell) {
targetCell.setCellStyle(sourceCell.getCellStyle());
}
static void copyFormatting(Sheet sheet, int fromRow, int upToRow) {
Row sourceRow = sheet.getRow(fromRow);
for (int r = fromRow + 1; r <= upToRow; r++) {
Row targetRow = sheet.getRow(r);
if (targetRow == null) targetRow = sheet.createRow(r);
copyRowStyle(sourceRow, targetRow);
for (Cell sourceCell : sourceRow) {
Cell targetCell = targetRow.getCell(sourceCell.getColumnIndex());
if (targetCell == null) targetCell = targetRow.createCell(sourceCell.getColumnIndex());
copyCellStyle(sourceCell, targetCell);
if (r == upToRow) {
if (getConditionalFormattingsForCell(sourceCell).size() > 0) {
expandConditionalFormatting(sourceCell, targetCell);
}
}
}
}
}
public static void main(String[] args) throws Exception {
//Workbook workbook = WorkbookFactory.create(new FileInputStream("./Workbook.xls")); String filePath = "./WorkbookNew.xls";
Workbook workbook = WorkbookFactory.create(new FileInputStream("./Workbook.xlsx")); String filePath = "./WorkbookNew.xlsx";
Sheet sheet = workbook.getSheetAt(0);
copyFormatting(sheet, 1, 9); // copy formatting from row 2 up to row 10
FileOutputStream out = new FileOutputStream(filePath);
workbook.write(out);
out.close();
workbook.close();
}
}
In my code I am going through an XLSX file row by row, validating the rows against a database, using Apache POI 4.1.0. If I find an incorrect row, I "mark" it for deletion by adding it to the List<XSSFRow> toRemove. After iterating over every row, this small method is supposed to remove the rows marked for deletion:
ListIterator<XSSFRow> rowIterator = toRemove.listIterator(toRemove.size());
while (rowIterator.hasPrevious()) {
XSSFRow row = rowIterator.previous();
if (row != null && row.getSheet() == sheet) {
int lastRowNum = sheet.getLastRowNum();
int rowIndex = row.getRowNum();
if (rowIndex == lastRowNum) {
sheet.removeRow(row);
} else if (rowIndex >= 0 && rowIndex < lastRowNum) {
sheet.removeRow(row);
} else {
System.out.println("\u001B[31mERROR: Removal failed because row " + rowIndex + " is out of bounds\u001B[0m");
}
System.out.println("Row " + rowIndex + " successfully removed");
} else {
System.out.println("Row skipped in removal because it was null already");
}
}
But for some unknown reason it removes all rows perfectly and then throws an XmlValueDisconnectedException when getting the row index (getRowNum()) of the last (first added) row.
Relevant part of the stack trace:
org.apache.xmlbeans.impl.values.XmlValueDisconnectedException
at org.apache.xmlbeans.impl.values.XmlObjectBase.check_orphaned(XmlObjectBase.java:1258)
at org.openxmlformats.schemas.spreadsheetml.x2006.main.impl.CTRowImpl.getR(Unknown Source)
at org.apache.poi.xssf.usermodel.XSSFRow.getRowNum(XSSFRow.java:400)
at Overview.removeRows(Overview.java:122)
EDIT: I also tried changing the iteration process (see below) but the error stays the same.
for (XSSFRow row : toRemove) {
// same code as above without iterator and while
}
The error occurs if the same row is contained twice in the List toRemove. A List allows duplicate entries, so the same row may be added to it more than once. The iterator then gets the first occurrence of that row, and the row is removed properly from the sheet. But when the same row occurs again later, row.getRowNum() fails that way because the row no longer exists in the sheet.
Here is complete code to reproduce that behavior:
import org.apache.poi.ss.usermodel.*;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.*;
public class ExcelRemoveRows {
public static void main(String[] args) throws Exception {
String filePath = "Excel.xlsx"; // must contain at least 5 filled rows
Workbook workbook = WorkbookFactory.create(new FileInputStream(filePath));
Sheet sheet = workbook.getSheetAt(0);
List<Row> toRemoveList = new ArrayList<Row>();
toRemoveList.add(sheet.getRow(0));
toRemoveList.add(sheet.getRow(2));
toRemoveList.add(sheet.getRow(4));
toRemoveList.add(sheet.getRow(2)); // this produces the error
System.out.println(toRemoveList); // contains row having index 2 (r="3") two times
for (Row row : toRemoveList) {
System.out.println(row.getRowNum()); // XmlValueDisconnectedException on second occurrence of row index 2
sheet.removeRow(row);
}
FileOutputStream out = new FileOutputStream("Changed"+filePath);
workbook.write(out);
out.close();
workbook.close();
}
}
The solution is to make sure the List does not contain the same row multiple times.
I would not collect the rows to remove in a List<XSSFRow> but the row numbers to remove in a Set<Integer>. That avoids duplicates, since a Set does not allow duplicate elements. The row to remove can then simply be fetched via sheet.getRow(rowNum).
Code:
...
Set<Integer> toRemoveSet = new HashSet<Integer>();
toRemoveSet.add(sheet.getRow(0).getRowNum());
toRemoveSet.add(sheet.getRow(2).getRowNum());
toRemoveSet.add(sheet.getRow(4).getRowNum());
toRemoveSet.add(sheet.getRow(2).getRowNum());
System.out.println(toRemoveSet); // does not contain the row index 2 two times
for (Integer rowNum : toRemoveSet) {
Row row = sheet.getRow(rowNum);
System.out.println(row.getRowNum());
sheet.removeRow(row);
}
...
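As a variation on the Set idea above, the row numbers could be collected in a TreeSet with reversed ordering, which removes duplicates and also yields the rows from bottom to top, similar to the reverse iteration in the question. This is only a sketch using plain Java collections; the class and method names are hypothetical, and the actual sheet.removeRow(...) call is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class RowNumberDedupSketch {
    // Returns each marked row number exactly once, highest row number first.
    static List<Integer> removalOrder(List<Integer> markedRowNums) {
        TreeSet<Integer> unique = new TreeSet<>(Comparator.reverseOrder());
        unique.addAll(markedRowNums); // duplicates are silently dropped
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Row 2 is marked twice, as in the reproduction above
        System.out.println(removalOrder(Arrays.asList(0, 2, 4, 2)));
    }
}
```

Each number in the returned list would then be passed to sheet.getRow(rowNum) and the result to sheet.removeRow(row), as in the loop above.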
My objective is to sort shapes into sheets and, within them, shelf objects, depending on the parameters of each randomly generated shape. Each sheet can contain as many shelves as necessary, depending on the sheet parameters. The max width is SHEET_WIDTH and the max height is SHEET_HEIGHT. When the width is too large but there is enough height, a new shelf is generated within the sheet object. Each sheet and shelf is tracked by an index.
My problem is that all shapes are being stored within the same shelf on the first sheet. I've manually gone through each statement and cannot find a resolution.
public List<Sheet> nextFit(List<Shape> shapes) {
List<Sheet> usedSheets = new ArrayList<Sheet>();
int sheetIndex = 0;
int shelfIndex = 0;
Sheet firstSheet = new Sheet();
Shelf firstShelf = new Shelf();
firstSheet.addShelf(firstShelf);
usedSheets.add(firstSheet);
Sheet currentSheet = usedSheets.get(sheetIndex);
Shelf currentShelf = currentSheet.getShelves().get(shelfIndex);
// usedsheets(0)(depending on index)(get shelf
for (Shape s: shapes) {
if (s.getWidth() <= Sheet.SHEET_WIDTH - currentShelf.getWidth()
&& s.getHeight() <= Sheet.SHEET_HEIGHT - currentSheet.allShelvesHeight()) {
currentShelf.place(s);
} else if (s.getWidth() > Sheet.SHEET_WIDTH - currentShelf.getWidth()
&& s.getHeight() <= Sheet.SHEET_HEIGHT - currentSheet.allShelvesHeight()) {
Shelf nextShelf = new Shelf();
currentSheet.addShelf(nextShelf);
shelfIndex++;
currentShelf.place(s);
} else if (s.getHeight() > Sheet.SHEET_HEIGHT - currentSheet.allShelvesHeight()) {
Sheet newSheet = new Sheet();
Shelf newShelf = new Shelf();
newSheet.addShelf(newShelf);
usedSheets.add(newSheet);
sheetIndex++;
shelfIndex = 0;
currentShelf.place(s);
}
}
return usedSheets;
}
You never change currentShelf when you create a new Shelf object:
currentSheet.addShelf(nextShelf);
shelfIndex++;
currentShelf = nextShelf;
currentShelf.place(s);
You have the same problem when you create a new sheet: both currentSheet and currentShelf have to be reassigned there.
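Putting both fixes together, the loop might look like the sketch below. The Shape, Shelf and Sheet classes are not shown in the question, so the minimal versions here are assumptions (shelf width is taken as the sum of shape widths, shelf height as the tallest shape, and the sheet dimensions are arbitrary). The branch conditions are the same as in the question; only the reassignment of currentSheet and currentShelf is new, and the index variables are dropped because the current objects are tracked directly.

```java
import java.util.ArrayList;
import java.util.List;

class NextFitSketch {
    static final int SHEET_WIDTH = 10;  // assumed value
    static final int SHEET_HEIGHT = 10; // assumed value

    static class Shape {
        final int width, height;
        Shape(int width, int height) { this.width = width; this.height = height; }
    }

    static class Shelf {
        final List<Shape> shapes = new ArrayList<>();
        void place(Shape s) { shapes.add(s); }
        int getWidth() { int w = 0; for (Shape s : shapes) w += s.width; return w; }
        int getHeight() { int h = 0; for (Shape s : shapes) h = Math.max(h, s.height); return h; }
    }

    static class Sheet {
        final List<Shelf> shelves = new ArrayList<>();
        void addShelf(Shelf shelf) { shelves.add(shelf); }
        int allShelvesHeight() { int h = 0; for (Shelf s : shelves) h += s.getHeight(); return h; }
    }

    static List<Sheet> nextFit(List<Shape> shapes) {
        List<Sheet> usedSheets = new ArrayList<>();
        Sheet currentSheet = new Sheet();
        Shelf currentShelf = new Shelf();
        currentSheet.addShelf(currentShelf);
        usedSheets.add(currentSheet);

        for (Shape s : shapes) {
            boolean fitsWidth = s.width <= SHEET_WIDTH - currentShelf.getWidth();
            boolean fitsHeight = s.height <= SHEET_HEIGHT - currentSheet.allShelvesHeight();
            if (!fitsHeight) {
                // Not enough height left: start a new sheet AND a new shelf
                currentSheet = new Sheet();
                currentShelf = new Shelf();
                currentSheet.addShelf(currentShelf);
                usedSheets.add(currentSheet);
            } else if (!fitsWidth) {
                // Enough height but not enough width: start a new shelf on the same sheet
                currentShelf = new Shelf();
                currentSheet.addShelf(currentShelf);
            }
            // currentShelf now always refers to the shelf the shape belongs on
            currentShelf.place(s);
        }
        return usedSheets;
    }
}
```

Because the current objects are reassigned before place(s) is called, every branch ends up placing the shape on the shelf that was just selected or created.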
I have code that traverses a table's rows and columns and adds the cell values to a list.
It takes a significant amount of time.
So I added a time measurement, and I noticed that for some reason the time increases from row to row.
I cannot understand why.
Can you advise, please?
private void buildTableDataMap() {
WebElement table = chromeWebDriver.findElement(By.id("table-type-1"));
List<WebElement> rows = table.findElements(By.tagName("tr"));
theMap.getInstance().clear();
String item;
for (WebElement row : rows) {
ArrayList<String> values = new ArrayList<>();
List<WebElement> tds = row.findElements(By.tagName("td"));
if(tds.size() > 0){
WebElement last = tds.get(tds.size() - 1);
long time = System.currentTimeMillis();
values.addAll(tds.stream().map(e->e.getText()).collect(Collectors.toList()));
System.out.println(System.currentTimeMillis() - time);
//remove redundant last entry:
values.remove(tds.size() - 1);
callSomeFunc(values, last);
item = tds.get(TABLE_COLUMNS.NAME_COL.getNumVal()).getText();
item = item.replaceAll("[^.\\- /'&A-Za-z0-9]", "").trim();//remove redundant chars
theMap.getInstance().getMap().put(item, values);
}
}
}
Guys, I continued researching.
First of all, Florent's kind answer did not help me because, at least as I understand it, it returned an array list of strings which I had to parse, and I don't like this kind of solution too much...
So I narrowed the problem down: the e.getText() call takes longer from call to call!
I also tried e.getAttribute("innerText") instead, but there was no change.
I can't understand why. Any idea how to solve this?
WebElement last = null;
for (WebElement e : tds){
last = e;
long tm1 = 0, tm2 = 0;
if(Settings.verboseYN) {
tm1 = System.currentTimeMillis();
}
s = e.getText(); //This action increases in time!!!
if(Settings.verboseYN) {
tm2 = System.currentTimeMillis();
}
values.add(s); //a 0 ms action!!!
if(Settings.verboseYN) {
System.out.println("e.getText() took " + (tm2 - tm1) + " ms...");
}
}
[Figure: graph of the time each getText() call took]
08-May-18
Another source of growing execution time is this one:
void func(WebElement anchorsElement){
List<WebElement> anchors = anchorsElement.findElements(By.tagName("a"));
for (WebElement a : anchors) {
if (a.getAttribute("class").indexOf("a") > 0)
values.add("A");
else if (a.getAttribute("class").indexOf("b") > 0)
values.add("B");
else if (a.getAttribute("class").indexOf("c") > 0)
values.add("C");
}
}
Each call to the function has only 5 iterations, but still each call takes longer than the previous one.
Is there a solution for this one as well?
Calling the driver is an expensive operation. To significantly reduce the execution time, use a JavaScript injection with executeScript to read the whole table in a single call. Then process/filter the data on the client side with Java.
public ArrayList<?> readTable(WebElement table)
{
final String JS_READ_CELLS =
"var table = arguments[0]; " +
"return map(table.querySelectorAll('tr'), readRow); " +
"function readRow(row) { return map(row.querySelectorAll('td'), readCell) }; " +
"function readCell(cell) { return cell.innerText }; " +
"function map(items, fn) { return Array.prototype.map.call(items, fn) }; " ;
WebDriver driver = ((RemoteWebElement)table).getWrappedDriver();
Object result = ((JavascriptExecutor)driver).executeScript(JS_READ_CELLS, table);
return (ArrayList<?>)result;
}
The problem you are facing is a consequence of the way Selenium works by design. Let's look at how a JavaScript call gets executed or an operation is performed:
tds.get(TABLE_COLUMNS.NAME_COL.getNumVal()).getText();
You have a collection of objects. Each object is assigned a unique ID on the browser side by the Selenium driver.
So when you call getText(), this is what happens:
Your code -> HTTP Request -> Browser Driver -> Browser ->
|
<---------------------------------------------
Now if you have a table of 400 rows x 10 columns, that amounts to 4000 HTTP calls. Even if one call takes only 10 ms, we are looking at 40000 ms = 40 s, which is a considerable delay just to read a table.
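The estimate above is plain multiplication; as a small sketch (the class and method names are made up for illustration, and the 10 ms per call is the assumed figure from the text, not a measurement):

```java
class RoundTripEstimate {
    // One driver round trip per cell: rows * columns calls, each costing millisPerCall.
    static long estimateMillis(int rows, int cols, long millisPerCall) {
        return (long) rows * cols * millisPerCall;
    }

    public static void main(String[] args) {
        // 400 rows x 10 columns at an assumed 10 ms per getText() call
        System.out.println(estimateMillis(400, 10, 10) + " ms");
    }
}
```

Reading the table with a single script call replaces the rows * columns factor with one round trip, which is where the saving comes from.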
So what you want to do is get all the data in a single go by executing JavaScript that gives you a 2D array back. It is quite simple; I found the code on the site below:
http://cwestblog.com/2016/08/21/javascript-snippet-convert-html-table-to-2d-array/
function tableToArray(tbl, opt_cellValueGetter) {
opt_cellValueGetter = opt_cellValueGetter || function(td) { return td.textContent || td.innerText; };
var twoD = [];
for (var rowCount = tbl.rows.length, rowIndex = 0; rowIndex < rowCount; rowIndex++) {
twoD.push([]);
}
for (var rowIndex = 0, tr; rowIndex < rowCount; rowIndex++) {
var tr = tbl.rows[rowIndex];
for (var colIndex = 0, colCount = tr.cells.length, offset = 0; colIndex < colCount; colIndex++) {
var td = tr.cells[colIndex], text = opt_cellValueGetter(td, colIndex, rowIndex, tbl);
while (twoD[rowIndex].hasOwnProperty(colIndex + offset)) {
offset++;
}
for (var i = 0, colSpan = parseInt(td.colSpan, 10) || 1; i < colSpan; i++) {
for (var j = 0, rowSpan = parseInt(td.rowSpan, 10) || 1; j < rowSpan; j++) {
twoD[rowIndex + j][colIndex + offset + i] = text;
}
}
}
}
return twoD;
}
Assuming you store the above script in a SCRIPT variable, you can run it like this:
WebDriver driver = ((RemoteWebElement)table).getWrappedDriver();
Object result = ((JavascriptExecutor)driver).executeScript(SCRIPT + "\n return tableToArray(arguments[0]);" , table);
This gets you a 2D array of the data, which you can then process any way you like.
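On the Java side the result arrives as a plain Object. A minimal sketch of turning it into typed rows is shown below; it assumes that Selenium maps JavaScript arrays to (nested) java.util.List instances, which is how executeScript is documented to behave as far as I know. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class TableResultSketch {
    // Converts the Object returned by executeScript into rows of cell texts.
    // Anything that is not a List is treated as empty; null cells become "".
    static List<List<String>> toRows(Object scriptResult) {
        List<List<String>> rows = new ArrayList<>();
        if (!(scriptResult instanceof List)) {
            return rows;
        }
        for (Object row : (List<?>) scriptResult) {
            List<String> cells = new ArrayList<>();
            if (row instanceof List) {
                for (Object cell : (List<?>) row) {
                    cells.add(cell == null ? "" : cell.toString());
                }
            }
            rows.add(cells);
        }
        return rows;
    }
}
```

With this, toRows(result) can be called on the object returned above, and each inner list holds the texts of one table row without any further string parsing.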
I am using an ArrayList to collect the reference IDs of products from an Excel sheet, using POI. The issue is that I don't want to include blank cells in my ArrayList. Here is my code:
public static ArrayList<String> extractExcelContentByColumnIndex()
{
ArrayList<String> columnData=null;
try
{
FileInputStream fis=new FileInputStream(excel_path);
Workbook wb=WorkbookFactory.create(fis);
Sheet sh=wb.getSheet(sheetname);
Iterator<Row> rowIterator=sh.iterator();
columnData=new ArrayList<>();
while(rowIterator.hasNext())
{
Row row=rowIterator.next();
Iterator<Cell> cellIterator=row.cellIterator();
while(cellIterator.hasNext())
{
Cell cell=cellIterator.next();
if((row.getRowNum()>=3) && (row.getRowNum()<=sh.getPhysicalNumberOfRows()))
{
if(cell.getColumnIndex()==3)// column under watch
{
columnData.add(cell.getStringCellValue());
Collections.sort(columnData);
}
}
}
}
fis.close();
}
catch(Exception e)
{
e.printStackTrace();
}
System.err.println("DL BoM = "+columnData);
return columnData;
}
And the output is:
DL BoM = [, , , , , , 03141, 03803, 08002, 08012, 08817, 13124, A9C22712, A9N21024, A9N21027, A9N21480]
The POI documentation provides some useful information on that specific topic.
Iterate over cells, with control of missing / blank cells
In some cases, when iterating, you need full control over how missing
or blank rows and cells are treated, and you need to ensure you visit
every cell and not just those defined in the file. (The CellIterator
will only return the cells defined in the file, which is largely those
with values or stylings, but it depends on Excel).
In cases such as these, you should fetch the first and last column
information for a row, then call getCell(int, MissingCellPolicy) to
fetch the cell. Use a MissingCellPolicy to control how blank or null
cells are handled.
// Decide which rows to process
int rowStart = Math.min(15, sheet.getFirstRowNum());
int rowEnd = Math.max(1400, sheet.getLastRowNum());
for (int rowNum = rowStart; rowNum < rowEnd; rowNum++) {
Row r = sheet.getRow(rowNum);
if (r == null) {
// This whole row is empty
// Handle it as needed
continue;
}
int lastColumn = Math.max(r.getLastCellNum(), MY_MINIMUM_COLUMN_COUNT);
for (int cn = 0; cn < lastColumn; cn++) {
Cell c = r.getCell(cn, Row.MissingCellPolicy.RETURN_BLANK_AS_NULL);
if (c == null) {
// The spreadsheet is empty in this cell
} else {
// Do something useful with the cell's contents
}
}
}
Source: http://poi.apache.org/spreadsheet/quick-guide.html#Iterator
Before adding the cell value, check that it is not empty:
if(cell.getColumnIndex()==3)// column under watch
{
String value=cell.getStringCellValue();
// Java strings use trim() (lower case); compare content with isEmpty(), not with !=
if(!value.trim().isEmpty())
{
columnData.add(value);
}
Collections.sort(columnData);
}
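The same check can also be applied after the values have been collected, which allows sorting once at the end instead of after every cell. The following is a sketch in plain Java with hypothetical names; it is independent of POI and simply takes the raw strings read from the column.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BlankFilterSketch {
    // Keeps only values that contain something other than whitespace, then sorts once.
    static List<String> nonBlankSorted(List<String> rawValues) {
        List<String> result = new ArrayList<>();
        for (String value : rawValues) {
            if (value != null && !value.trim().isEmpty()) {
                result.add(value);
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

For example, passing the list that produced the output in the question would drop the leading empty entries and keep the reference IDs in sorted order.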