How to remove all formulas from a workbook with Java POI

The same question was asked here a few years ago:
how to remove all formulas from an excel sheet by java POI api?
However, none of the answers given at the time work for me.
I have a workbook with several large sheets and want to loop over all cells to replace the cell contents with strings. The problem is that many cells contain formulas, which I have to get rid of first. Neither cell.setCellFormula(null) nor cell.setCellType(CellType.STRING) (nor BLANK) is satisfactory, as the underlying processes for removing array formulas take ages and make the entire job far too slow.
The following works, but leaves a corrupt Excel workbook that can only be opened after a repair step the first time:
Method m = XSSFCell.class.getDeclaredMethod("setBlank");
m.setAccessible(true);
m.invoke(cell);
Is there any other fast and cleaner way to simply set certain cells blank, regardless of any formulas?

The workbook gets corrupted because there is a calculation chain stored in /xl/calcChain.xml. The normal (slow) methods for removing formulas update this calculation chain. But, as you already found, they are designed to remove single formulas, not all of them at once. So they must be careful when removing parts of array formulas, which makes them slow.
But if really all formulas are to be removed, this care is not necessary, and the whole /xl/calcChain.xml can simply be removed.
Example:
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.*;
import org.apache.poi.xssf.model.CalculationChain;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.STCellFormulaType;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.POIXMLDocumentPart; // POI 3.x; since POI 4.0 this class is org.apache.poi.ooxml.POIXMLDocumentPart
import java.lang.reflect.Method;
class ExcelRemoveFormulasAndCalcChain {
  // Drops the relation to /xl/calcChain.xml so the stale calculation chain is not written out.
  // POIXMLDocumentPart.removeRelation is not public, hence the reflection.
  private static void removeCalcChain(XSSFWorkbook workbook) throws Exception {
    CalculationChain calcchain = workbook.getCalculationChain();
    if (calcchain == null) return; // no calculation chain present, nothing to remove
    Method removeRelation = POIXMLDocumentPart.class.getDeclaredMethod("removeRelation", POIXMLDocumentPart.class);
    removeRelation.setAccessible(true);
    removeRelation.invoke(workbook, calcchain);
  }
  public static void main(String[] args) throws Exception {
    try (FileInputStream in = new FileInputStream("Test.xlsx");
         XSSFWorkbook workbook = (XSSFWorkbook) WorkbookFactory.create(in)) {
      for (Sheet sheet : workbook) {
        for (Row row : sheet) {
          for (Cell cell : row) {
            XSSFCell xssfcell = (XSSFCell) cell;
            // Unset the <f> element directly instead of going through the slow formula-removal logic.
            // Data table formulas are left untouched.
            if (xssfcell.getCTCell().isSetF() && xssfcell.getCTCell().getF().getT() != STCellFormulaType.DATA_TABLE) {
              xssfcell.getCTCell().unsetF();
            }
          }
        }
      }
      removeCalcChain(workbook);
      try (FileOutputStream out = new FileOutputStream("Test_1.xlsx")) {
        workbook.write(out);
      }
    }
  }
}
This should remove all formulas (except data table formulas, which the code deliberately skips) and leave the cells containing only their values and styles.

I think I found a way to remove formulas in a given cell range.
I noticed that if I delete a sheet first, the formulas referencing it are then removed quickly.
If I swap the order (delete the formulas first, then the sheet), it takes a lot of time.
So if we create a sheet, rewrite all formulas to reference it, and then delete that sheet, the formulas are removed fast (setting formulas that reference a non-existing sheet doesn't work).
It takes seconds for 15k+ rows. Here is the experiment:
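File fReport = new File(".xlsx");
XSSFWorkbook book = new XSSFWorkbook(new FileInputStream(fReport));
XSSFSheet sheet = book.getSheet("");
// Temporary sheet that the formulas will be pointed at
XSSFSheet dummy = book.createSheet("dummy");
int lastRow = sheet.getLastRowNum();
// Step 1: point every formula in column C (index 2), from row 9 (index 8) on, at the dummy sheet
for (int i = 8; i <= lastRow; i++) {
  XSSFRow rowToClean = sheet.getRow(i);
  if (rowToClean == null) continue; // getRow returns null for rows that were never created
  XSSFCell cell = rowToClean.getCell(2);
  if (cell != null) {
    cell.setCellFormula("'dummy'!A1");
  }
}
// Step 2: delete the dummy sheet
book.removeSheetAt(book.getSheetIndex(dummy));
// Step 3: remove the formulas, which is now fast
for (int i = 8; i <= lastRow; i++) {
  XSSFRow rowToClean = sheet.getRow(i);
  if (rowToClean == null) continue;
  XSSFCell cell = rowToClean.getCell(2);
  if (cell != null) {
    cell.removeFormula();
  }
}
try (FileOutputStream out = new FileOutputStream(fReport)) {
  book.write(out);
}
book.close();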

How to get the usedrange of sheet without VBA?

I use a Groovy script to work with Excel files, and I use the POI API to manipulate them. But the documentation has no method or object that helps me get the used range of a sheet. I tried to calculate it myself using methods like getLastRowNum() or getPhysicalNumberOfRows(), but neither works well because they stop counting when they meet an empty row. An Excel file can have empty rows followed by filled rows, but those methods just stop at the first empty row, so they will not help me reach my goal.
So I tried another solution. I want to create a named range in the workbook using the method createName(), with a formula that returns the used range of the current sheet. But I don't know how to do it; I searched a lot and everything I found is about VBA, which I don't want to use because VBA can't be used in a named range formula. I found a function called GET.WORKBOOK and I think it could be a good starting point. This function returns the list of worksheet names of the workbook. There is no link between my problem and this result, but I suspect there are more GET functions, such as a GET.WORKSHEET. It's very speculative, but I think there is more than just GET.WORKBOOK. (If you have any information about this, even if it doesn't solve my problem, please put it in a comment; I'm really interested in these GET functions.)
NB: If you find a Groovy-only solution to my problem, I would be very happy too. I didn't mention this type of solution because I searched a lot in this direction but didn't find anything helpful.
NB2: I added the java tag because Groovy and Java are very close, and someone who can find a solution in Java could do the same in Groovy.
NB3: I want a cell reference like A1:B2 to specify the used range.
NB4: I re-tested getLastRowNum() and it works perfectly; I had made some mistakes in my code, which is why it didn't work. Now here's my new problem: when I use this method, I cannot access an empty cell with the getCell method. Here's my code:
import org.apache.poi.ss.usermodel.WorkbookFactory;
wb = WorkbookFactory.create(new File("./webapps/etlserver/data/files/test_ws.xlsx"));
def getUsedRangeByIndex(file_path,ind_ws){
wb = WorkbookFactory.create(new File(file_path));
max_col = 0;
for(int i = 0 ; i < wb.getSheetAt(ind_ws).getLastRowNum() ; i++){
LOG.info(i.toString())
if(wb.getSheetAt(ind_ws).getRow(i) != null && wb.getSheetAt(ind_ws).getRow(i).getLastCellNum() > max_col){
max_col = wb.getSheetAt(ind_ws).getRow(i).getLastCellNum();
}
}
return "A1:" + wb.getSheetAt(ind_ws).getRow(wb.getSheetAt(ind_ws).getLastRowNum()).getCell(max_col, RETURN_NULL_AND_BLANK).getReference()
}
LOG.info(getUsedRangeByIndex("./webapps/etlserver/data/files/test_ws.xlsx",0))
I know I have to improve it with some code that calculates the first used cell, but for now I will consider A1 as the first used cell.
If the definition of the used range of a worksheet is as follows: ...
The used range is the cell range from first used top left cell to last used bottom right cell.
... and the Apache POI version used is a current one (I'm using Apache POI 5.2.2), then the simplest approach to get that used range is to use the following methods:
Sheet.getFirstRowNum and Sheet.getLastRowNum to get the first and last used rows of the sheet. If either returns -1, the sheet does not contain any rows and so has no used range.
Then loop over all rows between the first and last used rows and get Row.getFirstCellNum and Row.getLastCellNum. Note the API doc of Row.getLastCellNum: Gets the index of the last cell contained in this row PLUS ONE. If the first column found in a row is lower than the first column found so far, it becomes the new first column. If the last column found in a row is greater than the last column found so far, it becomes the new last column.
After that we have the first used row, last used row, leftmost used column and rightmost used column. That is the used range.
Complete example:
import java.io.FileInputStream;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.util.CellRangeAddress;
class ExcelGetSheetUsedRange {
/**
* Simplest method to get the used range from a sheet.
*
* @param sheet The sheet to get the used range from.
* @return CellRangeAddress representing the used range or null for an empty sheet.
*/
static CellRangeAddress getUsedRange(Sheet sheet) {
int firstRow = sheet.getFirstRowNum();
if (firstRow == -1) return null;
int lastRow = sheet.getLastRowNum();
if (lastRow == -1) return null;
int firstCol = Integer.MAX_VALUE;
int lastCol = -1;
for (int r = firstRow; r <= lastRow; r++) {
Row row = sheet.getRow(r);
if (row != null) {
int thisRowFirstCol = row.getFirstCellNum();
int thisRowLastCol = row.getLastCellNum()-1; // see API doc Row.getLastCellNum : Gets the index of the last cell contained in this row PLUS ONE.
if (thisRowFirstCol < firstCol) firstCol = thisRowFirstCol;
if (thisRowLastCol > lastCol) lastCol = thisRowLastCol;
}
}
if (firstCol == Integer.MAX_VALUE) return null;
if (lastCol == -1) return null;
return new CellRangeAddress(firstRow, lastRow, firstCol, lastCol);
}
public static void main(String[] args) throws Exception {
//Workbook workbook = WorkbookFactory.create(new FileInputStream("./template.xls"));
Workbook workbook = WorkbookFactory.create(new FileInputStream("./template.xlsx"));
Sheet sheet = workbook.getSheetAt(0);
CellRangeAddress usedRange = getUsedRange(sheet);
System.out.println(usedRange);
}
}
As stated in the API doc of Sheet.getLastRowNum:
Note: rows which had content before and were set to empty later might
still be counted as rows by Excel and Apache POI...
But that is an Excel problem, which may also occur when getting the used range via the Worksheet.UsedRange property.
The solution of Axel Richter is perfect. But here's ready-made code you can insert directly into a Jedox job to make things work. It's a kind of translation from Java to Groovy. Here's the code:
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.ss.util.CellRangeAddress;
wb = WorkbookFactory.create(new File("./webapps/etlserver/data/files/test_ws.xlsx"));
sheet = wb.getSheetAt(0);
def getUsedRange(sheet) {
firstRow = sheet.getFirstRowNum();
if (firstRow == -1) return null;
lastRow = sheet.getLastRowNum();
if (lastRow == -1) return null;
firstCol = Integer.MAX_VALUE;
lastCol = -1;
for (int r = firstRow; r <= lastRow; r++) {
row = sheet.getRow(r);
if (row != null) {
thisRowFirstCol = row.getFirstCellNum();
thisRowLastCol = row.getLastCellNum()-1;
if (thisRowFirstCol < firstCol) firstCol = thisRowFirstCol;
if (thisRowLastCol > lastCol) lastCol = thisRowLastCol;
}
}
if (firstCol == Integer.MAX_VALUE) return null;
if (lastCol == -1) return null;
return (new CellRangeAddress(firstRow, lastRow, firstCol, lastCol)).formatAsString();
}
LOG.info(getUsedRange(sheet));
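The asker wanted the used range as an A1-style string such as A1:B2 (NB3), which CellRangeAddress.formatAsString() already provides in POI. For readers who want to see what the index-to-letter conversion involves, here is a minimal POI-free sketch. The class A1Format and its methods are hypothetical names for illustration; it only handles plain relative references (no $ signs, no sheet names) and expects 0-based indexes, so pass Row.getLastCellNum() - 1 for the last column, as the answers above do.

```java
/**
 * Hypothetical, POI-free sketch: formats 0-based row/column indexes
 * (as returned by getFirstRowNum(), getLastRowNum(), getFirstCellNum()
 * and getLastCellNum() - 1) as an A1-style range such as "A1:B2".
 */
public class A1Format {

    /** Converts a 0-based column index to Excel letters: 0 -> "A", 25 -> "Z", 26 -> "AA". */
    public static String columnLetters(int col) {
        if (col < 0) {
            throw new IllegalArgumentException("column index must be >= 0");
        }
        StringBuilder sb = new StringBuilder();
        int n = col + 1; // Excel column letters are a bijective base-26 system starting at 1
        while (n > 0) {
            int rem = (n - 1) % 26;
            sb.append((char) ('A' + rem));
            n = (n - 1) / 26;
        }
        return sb.reverse().toString();
    }

    /** Formats a 0-based cell position, e.g. row 4 / column 2 -> "C5". */
    public static String cell(int row, int col) {
        return columnLetters(col) + (row + 1);
    }

    /** Formats a 0-based rectangular range as "A1:B2"; a single cell is returned as "A1". */
    public static String range(int firstRow, int lastRow, int firstCol, int lastCol) {
        String from = cell(firstRow, firstCol);
        String to = cell(lastRow, lastCol);
        return from.equals(to) ? from : from + ":" + to;
    }

    public static void main(String[] args) {
        System.out.println(range(0, 1, 0, 1)); // prints A1:B2
    }
}
```

The argument order of range mirrors the CellRangeAddress constructor (firstRow, lastRow, firstCol, lastCol), so the values computed in getUsedRange above can be passed straight through.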

Distance matrix with Apache POI

I'm quite new to programming in Java, especially to creating an Excel file with it. But maybe someone could help me with this problem.
I have created an Excel spreadsheet via Apache POI and Eclipse. It has 3 columns and 40 rows, filled with coordinates (x and y) and their names (in my case, 1 - 40). Now that I have these random numbers, I want to create a distance matrix (with Euclidean distance) between those points.
For example, I want it to look like:
   1  2  3
1  0  1  2
2  1  0  4
3  2  4  0
I'm not sure how to get these generated random numbers back and calculate with them, nor how to implement the formula for the Euclidean distance. It would be awesome if someone could help me! Thanks in advance!
Here is my code so far:
import java.io.File;
import java.io.FileOutputStream;
import org.apache.poi.xssf.usermodel.XSSFCell;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.apache.poi.ss.util.CellRangeAddress;
import java.util.Random;
public class poiexample {
public static void main(String[] args) throws Exception {
XSSFWorkbook Datei = new XSSFWorkbook();
FileOutputStream out = new FileOutputStream (new File ("Dateien.xlsx"));
for(int i=0;i<101;i++)
{ XSSFSheet Blatt = Datei.createSheet("Tabelle" + i);
XSSFRow row1 = Blatt.createRow(0);
row1.createCell(2).setCellValue("x");
row1.createCell(3).setCellValue("y");
for(int j=0; j<25; j++) {
XSSFRow row = Blatt.createRow(j+1);
row.createCell(0).setCellValue("P"+j);
row.createCell(1).setCellValue(j+1);
row.createCell(2).setCellValue(Math.round(Math.random()*10));
row.createCell(3).setCellValue(Math.round(Math.random()*10));
}
for(int j=25; j<40; j++) {
XSSFRow row = Blatt.createRow(j+1);
row.createCell(0).setCellValue("D");
row.createCell(1).setCellValue(j+1);
row.createCell(2).setCellValue(Math.round(Math.random()*10));
row.createCell(3).setCellValue(Math.round(Math.random()*10));
}
}
try {
Datei.write(out);
out.close();
}
catch(Exception e) {
System.out.println(e);
}
System.out.println("Excel file created");
}
}
I see you are generating the coordinates and writing them into an Excel sheet. Then you want to create the Euclidean distance matrix.
I suggest you first put these coordinates into a 2D array and work out the algorithm to calculate the Euclidean distance matrix. Then it is just a matter of writing all the coordinates and the distance matrix into the Excel file.
If you already have the coordinates in an Excel file, just read it, populate the 2D array, and then call your routine to calculate and generate the Euclidean distance matrix.
Of course there may be better solutions; this is what I can think of right now.
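A minimal sketch of the 2D-array approach suggested above, assuming each point is stored as {x, y} in a double[][] (for example the x and y values from columns 2 and 3 of the generated sheet). The class DistanceMatrix and its method compute are hypothetical names, not part of Apache POI.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: builds a Euclidean distance matrix from points
 * held in a 2D array, where each row is {x, y}.
 */
public class DistanceMatrix {

    /** Returns an n x n matrix whose entry [i][j] is the Euclidean distance between point i and point j. */
    public static double[][] compute(double[][] points) {
        int n = points.length;
        double[][] d = new double[n][n]; // Java zero-initializes, so the diagonal stays 0.0
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double dx = points[i][0] - points[j][0];
                double dy = points[i][1] - points[j][1];
                double dist = Math.hypot(dx, dy); // sqrt(dx^2 + dy^2)
                d[i][j] = dist;
                d[j][i] = dist; // the distance is symmetric, so mirror it
            }
        }
        return d;
    }

    public static void main(String[] args) {
        double[][] points = { {0, 0}, {3, 4}, {6, 8} };
        for (double[] row : compute(points)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

For the points (0, 0), (3, 4) and (6, 8), the off-diagonal distances are 5 and 10. Only the upper triangle is computed and then mirrored, since the Euclidean distance is symmetric. Each entry of the result can then be written to a cell with setCellValue(double).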

How to replace a cell value - Apache POI

I am trying to replace a cell value using an existing cell value from other sheets (in the same workbook).
My code:
public static void update_sheet(XSSFWorkbook w)
{
XSSFSheet sheet,sheet_overview;
sheet_overview = w.getSheetAt(0);
int lastRowNum,latest_partition_date;
latest_partition_date = 3;
XSSFRow row_old, row_new;
XSSFCell cell_old, cell_new;
for(int i=1;i<=10;i++)
{
sheet = w.getSheetAt(i);
lastRowNum = sheet.getLastRowNum();
row_old = sheet.getRow(lastRowNum);
cell_old = row_old.getCell(0); //getting cell value from a sheet
row_new = sheet_overview.getRow(latest_partition_date);
cell_new = row_new.getCell(5);
cell_new.setCellValue(cell_old); // trying to overwrite the cell value (this line does not compile)
latest_partition_date++;
}
}
The 'type' values I am trying to copy
7/10/2017
7/11/2017
7/12/2017
7/13/2017
2017-07-14
2017-07-15
Error
Exception in thread "main" java.lang.Error: Unresolved compilation problem:
The method setCellValue(boolean) in the type XSSFCell is not applicable for the arguments (XSSFCell)
at Sample2.update_overview_sheet(Sample2.java:78)
at Sample2.main(Sample2.java:26)
Any help or suggestions are appreciated.
The problem is that getCell() returns a value of type Cell. You're not actually retrieving the value of that cell, but the cell object itself. In order to set the value with setCellValue, you need to provide it a value: a date, boolean, string, RichTextString, etc., i.e. one of the types accepted by the setCellValue overloads listed in the Apache POI documentation for Cell.

Apache POI getStringCellValue() printing null

I'm getting a weird error while trying to read the Cell values through Apache POI in java:
System.out.println(row.getCell(13, Row.CREATE_NULL_AS_BLANK).getStringCellValue())
is always printing null, even after specifying the missing-cell policy as Row.CREATE_NULL_AS_BLANK. My logic for writing to the Cell is:
public void writeCell( String value, Sheet sheet, int rowNum, int colNum)
{
Row row = sheet.getRow(rowNum);
if (row == null)
{
row = sheet.createRow(rowNum);
}
Cell cell = row.createCell(colNum, Cell.CELL_TYPE_STRING);
if (value == null)
{
return;
}
cell.setCellValue(value);
}
When I'm writing to the Cell at colNum = 13, the String value object is null. I'm not able to sort out this issue.
This line doesn't do what you seem to think it does:
System.out.println(row.getCell(13, Row.CREATE_NULL_AS_BLANK).getStringCellValue())
In effect, that's doing
Cell cell = row.getCell(13);
if (cell == null) { cell = row.createCell(13, Cell.CELL_TYPE_BLANK); }
So, if there is nothing in that cell, it creates it as a new blank one.
Then, you try doing:
cell.getStringCellValue()
This only works for String cells, and in the missing case you've told POI to give you a Blank new cell!
If you really just want a string value of a cell, use DataFormatter.formatCellValue(Cell) - that returns a String representation of your cell including formatting. Otherwise, check the type of your cell before trying to fetch the value!
If your code worked as intended (setting the cell blank), getStringCellValue() on the Cell interface would return "", not null.
Is it possible that the value for column index 13 is not null but the string "null"?

POI says Cell is empty but cell has a value

I'm opening an Excel (xls) file in my Java application with POI.
There are 30 rows in this Excel file.
I need to get the value at column index 9.
My code:
Workbook wb;
wb = WorkbookFactory.create(inp);
Sheet sheet = wb.getSheetAt(0);
for (Row row : sheet) {
if (row.getLastCellNum() >= 6) {
for (Cell cell : row) {
if (cell.getColumnIndex() == 9) {
//do something
}
}
}
}
Every row in the Excel file has values in columns 1-14.
My problem is that only some values are recognized. I wrote the same value in every cell at column index 9 (the 10th column in my Excel sheet), but the problem is still the same.
What could cause this problem?
Make sure you set the same date format for all cells in the column (select the column and set the format explicitly). And I believe using the DateUtil class to get the data is more appropriate than calling cell.getDateCellValue().
POI uses 0 based counting for columns. So, if you want the 9th Column, you need to fetch the cell with index 8, not 9. It looks like you're checking for column with index 9, so are one column out.
If you're not sure about 0 based indexing, then the safest thing is to use the CellReference class to help you. This will translate between Excel style references, eg A1, and POI style 0-based offsets eg 0,0. Use something like:
CellReference ref = new CellReference("I10");
Row r = sheet.getRow(ref.getRow());
if (r == null) {
// That row is empty
} else {
Cell c = r.getCell(ref.getCol());
// c is now the cell at I10
}
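To make the 0-based translation concrete without depending on POI, here is a minimal sketch of what converting an A1-style reference to indexes involves; in real code, CellReference does this for you. The class A1Parse is a hypothetical name and handles only plain references such as I10 (no $ and no sheet name). It shows that column I maps to index 8, so the asker's column index 9 corresponds to column J.

```java
import java.util.Locale;

/**
 * Hypothetical, POI-free sketch of the A1-to-index translation that CellReference performs.
 * Handles only plain references like "I10" (no '$', no sheet name).
 */
public class A1Parse {

    /** Returns {rowIndex, columnIndex}, both 0-based, for a reference like "I10". */
    public static int[] toIndexes(String ref) {
        String s = ref.trim().toUpperCase(Locale.ROOT);
        int i = 0;
        int col = 0;
        // Leading letters encode the column in bijective base 26: A=1 ... Z=26, AA=27, ...
        while (i < s.length() && s.charAt(i) >= 'A' && s.charAt(i) <= 'Z') {
            col = col * 26 + (s.charAt(i) - 'A' + 1);
            i++;
        }
        if (i == 0 || i == s.length()) {
            throw new IllegalArgumentException("Not a plain A1 reference: " + ref);
        }
        int row = Integer.parseInt(s.substring(i)); // 1-based row number as shown in Excel
        if (row < 1) {
            throw new IllegalArgumentException("Row number must be >= 1: " + ref);
        }
        return new int[] { row - 1, col - 1 }; // convert both to 0-based
    }

    public static void main(String[] args) {
        int[] rc = toIndexes("I10");
        System.out.println("row=" + rc[0] + ", col=" + rc[1]); // prints row=9, col=8
    }
}
```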
It seems to be a problem with the Excel document(s).
Converting them to CSV and then back to xls solves the problem.