Apache POI formula cell duplication very slow - java

I'm generating an Excel file using Apache POI 3.8 and need to replicate some existing rows n times.
This is because I have some complex formulas which I use as a template to create new lines, replacing cell indexes with regexps.
The problem is that performance is awful: it takes 2h to generate some 4000 rows.
I've pinpointed the problem to be not in the regexp part, as I initially thought, but in the duplication of formula cells.
I currently use this to replicate formula cells:
case Cell.CELL_TYPE_FORMULA:
    newCell.setCellType(oldCell.getCellType());
    newCell.setCellFormula(oldCell.getCellFormula());
    break;
If I copy the formula as text like this:
case Cell.CELL_TYPE_FORMULA:
    newCell.setCellType(Cell.CELL_TYPE_STRING);
    newCell.setCellValue("="+oldCell.getCellFormula());
    break;
it's instead pretty fast, even with my regexp in place.
Anyway, this is an imperfect solution, because the formula has English keywords (e.g. IF()), whereas I need to write it in Italian format.
Moreover, cells with formulas inserted like that need to be forcibly re-evaluated in Excel with something like "replace all -> '=' with '='".
The problem seems to lie in setCellFormula(), because of HSSFFormulaParser.parse().
What's strange is that parsing time grows much faster than the number of rows:
100 rows -> 6785ms
200 rows -> 23933ms
300 rows -> 51388ms
400 rows -> 88586ms
It seems that each time I copy a formula, the POI library re-evaluates or re-parses or re-something all preceding rows.
Does anyone know how I can solve this problem?
Thanks in advance.
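As a rough check of the timings listed in the question, the sketch below divides each measurement by the square of its row count and also prints the ratio between consecutive measurements. The class and method names are made up for illustration. A roughly constant ms/rows^2 value together with shrinking step ratios would suggest growth closer to quadratic than exponential, which is what you would expect if each copied row triggers work proportional to the rows already in the sheet.

```java
// Rough check of how the timings quoted in the question scale with row count.
final class GrowthCheck {
    // Row counts and elapsed milliseconds, copied from the question.
    static final int[] ROWS = {100, 200, 300, 400};
    static final long[] MILLIS = {6785, 23933, 51388, 88586};

    // Milliseconds divided by rows squared; a roughly constant value hints at O(n^2).
    static double perRowSquared(int i) {
        return MILLIS[i] / ((double) ROWS[i] * ROWS[i]);
    }

    // Ratio of consecutive timings; exponential growth over equal steps would keep this constant.
    static double stepRatio(int i) {
        return (double) MILLIS[i] / MILLIS[i - 1];
    }

    public static void main(String[] args) {
        for (int i = 0; i < ROWS.length; i++) {
            System.out.printf("%d rows: ms/rows^2 = %.3f%n", ROWS[i], perRowSquared(i));
        }
        for (int i = 1; i < ROWS.length; i++) {
            System.out.printf("%d -> %d rows: x%.2f%n", ROWS[i - 1], ROWS[i], stepRatio(i));
        }
    }
}
```

Four data points are not much to go on, so treat this as a hint about where to look rather than a proof of the complexity.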

Oh my...I think I found it...
Original was:
// If the row exists in the destination, push all rows down by 1; otherwise create a new row
if (newRow != null) {
    worksheet.shiftRows(destinationRowNum, worksheet.getLastRowNum(), 1);
} else {
    newRow = worksheet.createRow(destinationRowNum);
}
I've commented everything out, leaving only
newRow = worksheet.createRow(destinationRowNum);
And now I'm down to 60sec to process all rows!
Probably, there's some dirt in my template which was causing POI to shift everything at each iteration.
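For readers wondering what "replacing cell indexes with regexps" might look like, here is a minimal sketch using only java.util.regex. The class name FormulaRowShifter and the method shiftRowRefs are hypothetical and not the asker's code. It assumes simple A1-style references and will misfire on sheet names, string literals, or function names that end in digits (LOG10, for instance).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: shifts relative row numbers in a template formula.
final class FormulaRowShifter {
    // Column letters (optionally $-anchored) followed directly by a row number.
    // A reference with an absolute row such as A$1 or $D$5 does not match and is left alone.
    private static final Pattern CELL_REF = Pattern.compile("(\\$?[A-Z]{1,3})(\\d+)");

    static String shiftRowRefs(String formula, int offset) {
        Matcher m = CELL_REF.matcher(formula);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int row = Integer.parseInt(m.group(2)) + offset;
            // quoteReplacement keeps a literal '$' in the column part from being read as a group reference
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + row));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(shiftRowRefs("IF(A1>0,SUM(B1:C1),$D$5)", 3));
    }
}
```

The result of shiftRowRefs would then be passed to setCellFormula() on the cell in the newly created row.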

Related

Apache Poi cell not returning the correct value

I have an Excel file with a cell that generates the number 3.69 (based on calculations from preceding numbers).
However, when pulling that number in Java using
if (brightCell.getNumericCellValue() > 0)
{
    double brightness = brightCell.getNumericCellValue();
    return brightness;
}
I've also tried:
if (Double.parseDouble(brightCell.getStringCellValue()) > 0)
{
    double brightness = Double.parseDouble(brightCell.getStringCellValue());
    return brightness;
}
brightCell is instantiated with :
brightCell = spreadsheet.getRow(new CellReference(brightString).getRow()).getCell(new CellReference(brightString).getCol());
brightString is String brightString = "BV29"
But with both solutions, brightness receives the value, 3.2133....
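To make the cell lookup above concrete: getRow() and getCol() on a CellReference return zero-based indexes, so "BV29" points at row index 28 and column index 73. The stand-alone sketch below reproduces that arithmetic with plain Java. It is an illustration with made-up names (A1Reference, columnIndex, rowIndex), not POI's implementation, and it assumes a plain reference without '$' or a sheet name.

```java
// Illustrative A1-style reference parsing; not Apache POI code.
final class A1Reference {
    // Zero-based column index: "A" -> 0, "Z" -> 25, "AA" -> 26, "BV" -> 73.
    static int columnIndex(String ref) {
        int col = 0;
        for (int i = 0; i < ref.length() && Character.isLetter(ref.charAt(i)); i++) {
            col = col * 26 + (Character.toUpperCase(ref.charAt(i)) - 'A' + 1);
        }
        return col - 1;
    }

    // Zero-based row index: the digits after the letters, minus one.
    static int rowIndex(String ref) {
        int i = 0;
        while (i < ref.length() && Character.isLetter(ref.charAt(i))) {
            i++;
        }
        return Integer.parseInt(ref.substring(i)) - 1;
    }

    public static void main(String[] args) {
        System.out.println("BV29 -> row " + rowIndex("BV29") + ", col " + columnIndex("BV29"));
    }
}
```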
So thanks to @Igor I managed to figure it out, but it led to more issues.
The solution was creating an evaluator:
FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();
evaluator.setIgnoreMissingWorkbooks(true); //if you need it
and, when you finish setting the required cells and want to evaluate, calling:
evaluator.evaluateAll();
The problem for me is that I'm doing this multiple times: my first result is correct, but from the second iteration on it becomes skewed, and then more skewed.
What I'm doing is setting various cells (via Java); then, before I retrieve the value of a cell that contains a formula, I run evaluateAll(). Now, I'm not sure whether I should be evaluating after EVERY change or after I have made all my changes to the Excel sheet (via Java).
I can't evaluate one specific cell at a time because there are over 38 sheets with multitudes of formulas, so evaluateAll() is the best option for me.
EDIT 26/10/2018
So the issue was not clearing the cache after making inputs. As the Javadoc for clearAllCachedResultValues() specifies:
Should be called whenever there are changes to input cells in the evaluated workbook.
Failure to call this method after changing cell values will cause incorrect behaviour
of the evaluate~ methods of this class
Therefore, after making an input on a cell you should call evaluator.clearAllCachedResultValues();
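To see why a stale cache skews later results, here is a deliberately simplified toy model in plain Java. It is not POI code and says nothing about POI's internals. The class CachingEvaluatorDemo and its methods are invented for illustration, with clearCache() playing the role that clearAllCachedResultValues() plays above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a caching formula evaluator; NOT Apache POI code.
final class CachingEvaluatorDemo {
    private final Map<String, Double> inputs = new HashMap<>();
    private final Map<String, Double> cache = new HashMap<>();

    void setInput(String cell, double value) {
        inputs.put(cell, value);
    }

    // "Formula" C1 = A1 + B1, computed once and then served from the cache.
    double evaluateC1() {
        return cache.computeIfAbsent("C1", k -> inputs.get("A1") + inputs.get("B1"));
    }

    // Forget cached results so the next evaluation sees the current inputs.
    void clearCache() {
        cache.clear();
    }

    public static void main(String[] args) {
        CachingEvaluatorDemo e = new CachingEvaluatorDemo();
        e.setInput("A1", 1);
        e.setInput("B1", 2);
        System.out.println(e.evaluateC1()); // first evaluation uses fresh inputs
        e.setInput("A1", 10);
        System.out.println(e.evaluateC1()); // still the old result: the cache is stale
        e.clearCache();
        System.out.println(e.evaluateC1()); // recomputed from the new inputs
    }
}
```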

Using POI Set Cell Style Based on Cell Formula Result

I need some help on setting the cell style based on the cell value.
The code used to populate the cell:
String totalvariationweightv1 = "J" + (x+1);
String totalvariationweightv2 = "L" + (x+1);
cell85014.setCellType(Cell.CELL_TYPE_FORMULA);
cell85014.setCellFormula("SUM(((" + totalvariationweightv2 + "-" + totalvariationweightv1 + ")/" + totalvariationweightv1 + ")*100)");
Then I need to color the field if it exceeds a certain value. Right now I just have alternating colors:
cell85014.setCellStyle((x%2)==0?stylefloatGray:stylefloat);
I cannot figure out how to get the cell value. Using getNumericCellValue() returns 0.
Apache POI stores the formula, but it doesn't evaluate it automatically.
The Excel file format (both .xls and .xlsx) stores a "cached" result for every formula along with the formula itself. This means that when the file is opened, it can be quickly displayed, without needing to spend a long time calculating all of the formula results. It also means that when reading a file through Apache POI, the result is quickly available to you too!
After making changes with Apache POI to either Formula Cells themselves, or those that they depend on, you should normally perform a Formula Evaluation to have these "cached" results updated. This is normally done after all changes have been performed, but before you write the file out.
You must tell Apache POI to evaluate the formula separately.
FormulaEvaluator evaluator = workbook.getCreationHelper().createFormulaEvaluator();
// Set your cell formula here
switch (evaluator.evaluateFormulaCell(cell85014)) {
    case Cell.CELL_TYPE_NUMERIC:
        // Named "value" so it cannot clash with the row index x used in the question
        double value = cell85014.getNumericCellValue();
        // Set cell style here, based on numeric value,
        // as you already are doing in your code.
        // Watch out for floating point inaccuracies!
        break;
    default:
        System.err.println("Unexpected result type!");
        break;
}
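On the floating point warning in the comments above: the formula in the question is a percentage variation, and a result that should sit exactly on the threshold can land slightly above it. The sketch below shows one common way to handle that with a small tolerance. The class VariationThreshold, its method names, and the 1e-9 tolerance are assumptions for illustration only.

```java
// Illustrative threshold check with a tolerance; names and epsilon are assumptions.
final class VariationThreshold {
    // Same arithmetic as the spreadsheet formula ((v2 - v1) / v1) * 100.
    static double percentVariation(double v1, double v2) {
        return (v2 - v1) / v1 * 100.0;
    }

    // True only if value is above threshold by more than eps.
    static boolean exceeds(double value, double threshold, double eps) {
        return value > threshold + eps;
    }

    public static void main(String[] args) {
        double variation = percentVariation(1.0, 1.1); // mathematically exactly 10 percent
        System.out.println(variation > 10.0);                // naive comparison
        System.out.println(exceeds(variation, 10.0, 1e-9));  // tolerant comparison
    }
}
```

The boolean returned by exceeds could then pick between the highlighted style and the existing alternating styles before calling setCellStyle.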

POI: setCellType(Cell.CELL_TYPE_FORMULA) fails because of Cell.CELL_TYPE_ERROR

My Java application reads an xls file and presents it on a JTable. So far so good.
When I try to save my worksheet, I iterate over row,col in my JTable and:
String str = (String) Table.getValueAt(row, col);
HSSFRow thisrow = sheet.getRow(row);
HSSFCell thiscell = thisrow.getCell(col);
if (thiscell == null) thiscell = thisrow.createCell(col);
switch (inferType(str)) {
    case "formula":
        thiscell.setCellType(Cell.CELL_TYPE_FORMULA);
        thiscell.setCellFormula(str.substring(1));
        break;
    case "numeric":
        thiscell.setCellType(Cell.CELL_TYPE_NUMERIC);
        thiscell.setCellValue(Double.parseDouble(str));
        break;
    case "text":
        thiscell.setCellType(Cell.CELL_TYPE_STRING);
        thiscell.setCellValue(str);
        break;
}
But when I run over a cell which was originally a formula, say A1/B1, that currently evaluates to #DIV/0!, setCellType fails.
After much investigation I found out that when setCellType is called, it tries to convert the old content to the new type. BUT this didn't seem like a problem to me, since every formula cell in the table was already a formula in the xls. Hence, I am never actually changing types.
Even so, when I call setCellType(Cell.CELL_TYPE_FORMULA) on a cell that is already a formula but evaluates to #DIV/0!, I get a conversion exception.
Exception in thread "AWT-EventQueue-0" java.lang.IllegalStateException: Cannot get a numeric value from a error formula cell
at org.apache.poi.hssf.usermodel.HSSFCell.typeMismatch(HSSFCell.java:648)
at org.apache.poi.hssf.usermodel.HSSFCell.checkFormulaCachedValueType(HSSFCell.java:653)
at org.apache.poi.hssf.usermodel.HSSFCell.getNumericCellValue(HSSFCell.java:678)
at org.apache.poi.hssf.usermodel.HSSFCell.setCellType(HSSFCell.java:317)
at org.apache.poi.hssf.usermodel.HSSFCell.setCellType(HSSFCell.java:283)
Currently my only workaround is, before setCellType:
if (thiscell.getCachedFormulaResultType() == Cell.CELL_TYPE_ERROR)
    thiscell = thisrow.createCell(col);
This IS working, but I lose the original layout of the cell, e.g. its colors.
How can I properly call setCellType if the cell is a formula with an evaluation error?
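The question does not show inferType(str). Since the formula branch strips the first character with str.substring(1), one plausible reading is that a leading '=' marks a formula. The sketch below is a hypothetical reconstruction under that assumption, not the asker's actual helper. Note that Double.parseDouble also accepts inputs such as "NaN" or "1e3", which may or may not be wanted.

```java
// Hypothetical reconstruction of the inferType helper used in the question.
final class CellTextClassifier {
    static String inferType(String text) {
        if (text == null || text.isEmpty()) {
            return "text";
        }
        // Assumption: a leading '=' followed by something marks a formula.
        if (text.startsWith("=") && text.length() > 1) {
            return "formula";
        }
        try {
            Double.parseDouble(text);
            return "numeric";
        } catch (NumberFormatException e) {
            return "text";
        }
    }

    public static void main(String[] args) {
        System.out.println(inferType("=A1/B1"));
        System.out.println(inferType("3.5"));
        System.out.println(inferType("hello"));
    }
}
```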
I found this in the mailing list of poi-apache:
There are two possible scenarios when setting value for a formula cell:
1. Update the pre-calculated value of the formula. If a cell contains formula then cell.setCellValue just updates the pre-calculated (cached) formula value, the formula itself remains and the cell type is not changed
2. Remove the formula and change the cell type to String or Number:
cell.setCellFormula(null); //Remove the formula
then cell.setCellValue("I changed! My type is CELL_TYPE_STRING now");
or cell.setCellValue(200); //NA() is gone, the real value is 200
I think we can improve cell.setCellValue for the case (1). If the new value conflicts with formula type then IllegalArgumentException should be thrown.
Regards, Yegor
Still, it does feel like a workaround to me. But everything is now working.
Calling cell.setCellFormula(null) before any setCellType should prevent the conversion failure, because the former discards the cached content.

Apache POI seeing columns in empty spreadsheet?

I have an empty spreadsheet, but when I'm accessing it with Apache POI (version 3.10), it says it has 1024 columns and 20 physical columns.
I really deleted all the cells, only some formatting remains, but no content.
And if I delete some columns with LibreOffice Calc (version 4.1.3.2), the number of columns only increases! What's going on?
Is there a reliable way to get the real number of columns (or cells in a row)?
import java.net.URL;
import org.apache.poi.ss.usermodel.*;

public class Test {
    public static void main(final String... args) throws Exception {
        final URL url = new URL("http://aditsu.net/empty.xlsx");
        final Workbook w = WorkbookFactory.create(url.openStream());
        final Row r = w.getSheetAt(0).getRow(0);
        System.out.println(r.getLastCellNum());
        System.out.println(r.getPhysicalNumberOfCells());
    }
}
After some more investigation, I think I figured out what's happening.
First, some terminology from POI: there are some cells that don't actually exist at all in the spreadsheet - those are called missing, or undefined/not defined. Then there are some cells that are defined, but have no value - those are called blank cells. Both types of cells appear empty in a spreadsheet program and can't be distinguished visually.
My spreadsheet has some blank cells that LibreOffice added at the end of the row (possibly a bug). When I delete columns, LibreOffice seems to shift the subsequent cells (including the blank ones) to the left, and adds more blank cells at the end (up to 1024).
And now the key part: neither getLastCellNum() nor getPhysicalNumberOfCells() ignore blank cells. getLastCellNum() gives the last defined cell, and getPhysicalNumberOfCells() gives the number of defined cells, both including blank cells. There doesn't seem to be any method available that skips blank cells. The javadoc for getPhysicalNumberOfCells() is somewhat misleading - "if only columns 0,4,5 have values then there would be 3", but it's actually counting blank cells too, which don't really have values.
So the only solution I found is to loop through the cells and check if they are blank.
Side note: getLastRowNum() and getFirstCellNum() are 0-based but getLastCellNum() is 1-based, wtf?
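The missing/blank distinction and the "loop and check" workaround can be modelled without POI. In the toy sketch below (RowModel and its methods are invented names, not POI classes), an absent key stands for a missing cell and an empty string for a defined but blank cell. lastCellNum() follows the documented "last index plus one, or -1 for no cells" convention of getLastCellNum().

```java
import java.util.Map;
import java.util.TreeMap;

// Toy stand-in for a spreadsheet row; NOT Apache POI code.
// Keys are zero-based column indexes; "" models a blank cell, an absent key a missing cell.
final class RowModel {
    private final TreeMap<Integer, String> cells = new TreeMap<>();

    void put(int column, String value) {
        cells.put(column, value);
    }

    // Last defined column index PLUS ONE, or -1 if the row has no cells; blanks count.
    int lastCellNum() {
        return cells.isEmpty() ? -1 : cells.lastKey() + 1;
    }

    // Number of defined cells; blanks count here too.
    int physicalNumberOfCells() {
        return cells.size();
    }

    // Walk from the right and skip blank cells; -1 if nothing has content.
    int lastNonBlankColumn() {
        for (Map.Entry<Integer, String> e : cells.descendingMap().entrySet()) {
            if (!e.getValue().isEmpty()) {
                return e.getKey();
            }
        }
        return -1;
    }
}
```

With real POI rows, the equivalent loop would iterate over the row's cells and test each cell's type for blank instead of checking for an empty string.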
Most likely you have some kind of formatting applied to your row. I have an empty xlsx file created with Excel, and getRow returns null for empty rows.
@aditsu as per https://poi.apache.org/apidocs/dev/org/apache/poi/ss/usermodel/Row.html, getLastCellNum() gets the index of the last cell contained in this row PLUS ONE.
+1 for the LibreOffice struggle! It's a bug, and in my opinion it's very random. I'm getting null randomly, and it often helps if I delete EMPTY rows (below) and EMPTY columns (on the right side).
...

Writing a formula to a cell with OpenXLS

I'm using Java and OpenXLS to write out an Excel spreadsheet. I want to set a formula for a cell but I haven't got a clue how to do it. Can anybody help me, please? :)
(Can't tag this with "openxls" because I'm a new user...)
I don't know about OpenXLS, but it's easy to do with Andy Khan's JExcel. I'd recommend trying it. I think it's far superior to POI; I'm betting that it's better than OpenXLS as well.
OpenXLS supports formulas very well. Look at this example.
I put a value in columns A and B of a sheet named "testSheet". In column C of the same sheet I put a formula that sums A and B for that row. Don't forget to initialise column C first, otherwise you will get a CellNotFoundException.
WorkBookHandle workbook = new WorkBookHandle();
workbook.createWorkSheet("testSheet");
WorkSheetHandle sheet = workbook.getWorkSheet("testSheet");
for (int i = 1; i <= 10; i++)
{
    sheet.add(10*i, "A"+i);
    sheet.add(15*i, "B"+i);
    CellHandle cx = sheet.add(0, "C"+i);
    cx.setFormula("=SUM(A"+i+":B"+i+")");
}
I hope that this example will help other people.
Ultimately it turned out that OpenXLS doesn't support formula cells. They are included in the paid-for version, though...
You can set the formula String directly on the cell in the Worksheet:
CellHandle cell = ws.add( "=SUM(A1:A3)", "A5" );
This adds the SUM(A1:A3) formula in cell A5. Any Cell set with a String value that is prefixed with '=' is considered a Formula.
Updates and maintenance are now happening on github (search for openxls).
