Converting large data into Excel with Apache POI in Java - java

I am working with a large CSV (a text file of about 200 MB) that I would like to convert into an Excel sheet, but the workbook consumes so much memory that, in the middle of the process, Java throws "GC overhead limit exceeded".
I have checked whether my code creates unnecessary references, but I don't think it does.
My guess is that the Apache POI library calls create references that keep the garbage collector busy.
My question is whether I can write the workbook to a file chunk by chunk, the way one appends to a text file, without holding all of it in memory. Is there a solution for that, or am I missing something?
The error is thrown in the following code:
private void updateExcelWorkbook(String input, String fileName, Workbook workbook) {
    try {
        Sheet sheet = workbook.createSheet(fileName);
        // Create a new font and alter it.
        Font font = workbook.createFont();
        font.setFontHeightInPoints((short) 11);
        font.setBold(true);
        // Fonts are set into a style so create a new one to use.
        CellStyle style = workbook.createCellStyle();
        style.setFont(font);
        Row row;
        Cell cell;
        String[] columns;
        String[] lines = input.split("\n");
        int colIndex;
        int rowIndex = 1;
        for (String line : lines) {
            row = sheet.createRow(rowIndex++);
            columns = line.split("\t");
            colIndex = 0;
            for (String column : columns) {
                cell = row.createCell(colIndex++);
                // rowIndex was already incremented by createRow above,
                // so compare the row's own index to style the first line
                if (row.getRowNum() == 1)
                    cell.setCellStyle(style);
                cell.setCellValue(column);
            }
        }
    } catch (Exception ex) {
        System.out.println(ex.getMessage());
    }
}

It seems you are using the POI usermodel, which has a very high memory footprint because it keeps the entire worksheet in memory, similar to how DOM keeps an entire XML document in memory.
You need to use a streaming API. With POI, you can create .xlsx files using the SXSSF buffered streaming API, which keeps only a window of recent rows in memory and flushes older rows to a temporary file, as described here: https://poi.apache.org/spreadsheet/index.html#SXSSF+(Since+POI+3.8+beta3)
The page linked above has an image showing the Spreadsheet API Feature Summary of POI (source: apache.org).
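Streaming the output only helps if the input is streamed too: the method in the question receives the whole file as one String and then splits it into an array, so the 200 MB of text is held in memory more than once. A minimal JDK-only sketch of reading the tab-separated text one line at a time is shown here. The class name TsvRowStreamer and the callback shape are assumptions made for illustration; in a real converter the callback would be the place to create a row and its cells on an SXSSF sheet, which is left out so that the sketch runs without POI on the classpath.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.BiConsumer;

public class TsvRowStreamer {

    // Reads tab-separated text one line at a time and hands each row to the sink,
    // so only the current line is held in memory. Returns the number of rows read.
    public static int streamRows(Reader source, BiConsumer<Integer, String[]> sink)
            throws IOException {
        int rowIndex = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // limit -1 keeps trailing empty columns instead of dropping them
                sink.accept(rowIndex++, line.split("\t", -1));
            }
        }
        return rowIndex;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a FileReader on the real CSV file.
        Reader demo = new StringReader("id\tname\n1\talpha\n2\tbeta\n");
        int rows = streamRows(demo, (index, columns) ->
                System.out.println(index + ": " + String.join(" | ", columns)));
        System.out.println(rows + " rows");
    }
}
```

The row index passed to the callback is zero-based, so it can be used directly as the sheet row number.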

Related

Apache Poi setActiveCell() for multiple cells

I'm trying to use the method sheet.setActiveCell(CellAddress addr) to make a range of multiple cells active at the same time. I've tried multiple versions of the Apache poi-ooxml library and am now using 3.16, which also supports the method sheet.setActiveCell(String addr). (I know 3.16 is old, but the issue is the same with the latest version.)
Following the suggestions on this question: Is it possible to set the active range with Apache POI XSSF?
I've managed to get it to work, both with a custom CellAddress and with a String in the format "A1:B5".
The problem is that every time I open an xlsx in which a range of cells has been set active using Apache POI, Excel shows an error message saying that the file is damaged and needs to be recovered. If I accept, the recovery completes correctly, but the error is annoying since I have to open a great number of these files each day.
Is there a way to avoid this error from Excel (maybe by changing how the xlsx is created or by changing some setting in Excel)?
Only one cell can be the active cell, and Sheet.setActiveCell sets only that one active cell. So sheet.setActiveCell("A1:B5") will work if setActiveCell(String addr) is available, but it leads to a corrupted sheet. That's why it was removed.
Multiple cells can be selected, but there are no methods for setting the selected cells in Apache POI's high-level classes, so the underlying low-level classes need to be used. When doing this, one needs to differentiate between XSSF and HSSF, because different low-level classes are involved.
The following complete example sets the active cell to B2. This also sets the sheet view's selection and active cell to that one cell, B2. It then uses low-level methods of XSSF and HSSF to set the selection to B2:E5.
import java.io.*;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.util.CellAddress;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFSheet;
class CreateExcelSelectMultipleCells {
    public static void main(String[] args) throws Exception {
        try (Workbook workbook = new XSSFWorkbook(); FileOutputStream out = new FileOutputStream("Excel.xlsx") ) {
        //try (Workbook workbook = new HSSFWorkbook(); FileOutputStream out = new FileOutputStream("Excel.xls") ) {
            Sheet sheet = workbook.createSheet();
            Row row;
            Cell cell;
            for (int r = 0; r < 6; r++) {
                row = sheet.createRow(r);
                for (int c = 0; c < 6; c++) {
                    cell = row.createCell(c);
                    cell.setCellValue("R" + (r+1) + "C" + (c+1));
                }
            }
            // set active cell; this also sets sheet view having selection and active cell to one given cell
            sheet.setActiveCell(new CellAddress("B2"));
            // set selected cells
            if (sheet instanceof XSSFSheet) {
                XSSFSheet xssfSheet = (XSSFSheet) sheet;
                xssfSheet.getCTWorksheet().getSheetViews().getSheetViewArray(0).getSelectionArray(0).setSqref(
                    java.util.Arrays.asList("B2:E5"));
            } else if (sheet instanceof HSSFSheet) {
                HSSFSheet hssfSheet = (HSSFSheet) sheet;
                org.apache.poi.hssf.record.SelectionRecord selectionRecord = hssfSheet.getSheet().getSelection();
                java.lang.reflect.Field field_6_refs = org.apache.poi.hssf.record.SelectionRecord.class.getDeclaredField("field_6_refs");
                field_6_refs.setAccessible(true);
                field_6_refs.set(
                    selectionRecord,
                    new org.apache.poi.hssf.util.CellRangeAddress8Bit[] { new org.apache.poi.hssf.util.CellRangeAddress8Bit(1,4,1,4) }
                );
            }
            workbook.write(out);
        }
    }
}
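The HSSF branch above passes (1,4,1,4) to CellRangeAddress8Bit. Those are the zero-based first row, last row, first column and last column of B2:E5. The following JDK-only sketch shows that mapping. The class and method names are made up for illustration, and it handles only plain references without sheet names or $ signs; in real POI code a utility such as CellRangeAddress.valueOf would normally do this conversion.

```java
import java.util.Arrays;

public class RangeIndexes {

    // Converts column letters such as "B" or "AA" to a zero-based column index.
    public static int columnIndex(String letters) {
        int index = 0;
        for (char ch : letters.toUpperCase().toCharArray()) {
            index = index * 26 + (ch - 'A' + 1);
        }
        return index - 1;
    }

    // Converts the digits of a cell reference such as "B2" to a zero-based row index.
    public static int rowIndex(String cellRef) {
        return Integer.parseInt(cellRef.replaceAll("[^0-9]", "")) - 1;
    }

    // Returns {firstRow, lastRow, firstCol, lastCol}, all zero-based,
    // for a range such as "B2:E5" or a single cell such as "C3".
    public static int[] toIndexes(String range) {
        String[] corners = range.split(":");
        String first = corners[0];
        String last = corners.length > 1 ? corners[1] : corners[0];
        return new int[] {
            rowIndex(first), rowIndex(last),
            columnIndex(first.replaceAll("[^A-Za-z]", "")),
            columnIndex(last.replaceAll("[^A-Za-z]", ""))
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toIndexes("B2:E5"))); // prints [1, 4, 1, 4]
    }
}
```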

String cells data is not visible in excel editor when excel is created using SXSSFWorkbook and modified using XSSFWorkbook

Scenario:
1) A CSV file is converted into an Excel file using SXSSFWorkbook.
2) If the data is read from the CSV file again and written to the generated Excel file using XSSFWorkbook, the string data is not visible in LibreOffice, but it is visible when the file is opened in an online Excel viewer (some Excel viewers report that the file is corrupt and that the data can be recovered).
Cell creation using SXSSFWorkbook:
Cell cell = row.createCell(1);
cell.setCellValue("Some Value");
Cell update using XSSFWorkbook:
Cell cell = row.getCell(1);
cell.setCellValue("Some Value");
Observations:
1) When the cell value is updated using XSSFCell, the raw value of the cell and the string value of the cell are different.
2) If the Excel file is generated with SXSSFWorkbook and opened using XSSFWorkbook, the internally maintained STCellType is STCellType.INLINE_STR; if the Excel file is generated using XSSFWorkbook, the internally maintained STCellType is STCellType.S (STCellType is used in the CTCell of XSSFCell).
Apache POI version: 4.1.0
Please suggest a solution.
SXSSFWorkbook uses inline strings by default, while XSSFWorkbook uses the shared strings table by default. And XSSFCell.setCellValueImpl is incomplete for inline strings. It does:
...
if(_cell.getT() == STCellType.INLINE_STR) {
    //set the 'pre-evaluated result
    _cell.setV(str.getString());
}
...
So for inline strings it always sets the v element containing the text. But an inline string may also have an is element with a t element containing the text, or even an is element with several rich text runs. XSSFCell does not take this into account.
But SXSSFWorkbook can be constructed so that it uses the shared strings table too. See the constructor SXSSFWorkbook(XSSFWorkbook workbook, int rowAccessWindowSize, boolean compressTmpFiles, boolean useSharedStringsTable). So if the following constructor is used:
SXSSFWorkbook sxssfWorkbook = new SXSSFWorkbook(new XSSFWorkbook(), 2, true, true);
then no inline strings are used, and later updates using XSSF will not be a problem.
If SXSSFWorkbook uses inline strings rather than the shared strings table, later updates of cells using XSSF run into the incomplete inline string handling of XSSFCell. A possible workaround is to handle the inline string updates in your own code.
Example:
import java.io.FileOutputStream;
import java.io.FileInputStream;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.xssf.usermodel.*;
import org.apache.poi.xssf.streaming.*;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.STCellType;
public class SXSSFTest {
    public static void main(String[] args) throws Exception {
        // first create SXSSFTest.xlsx using SXSSF ============================================
        String[][] data1 = new String[][]{
            new String[]{"A1", "B1", "C1"},
            new String[]{"A2", "B2", "C2"},
            new String[]{"A3", "B3", "C3"},
            new String[]{"A4", "B4", "C4"}
        };
        SXSSFWorkbook sxssfWorkbook = new SXSSFWorkbook();
        //SXSSFWorkbook sxssfWorkbook = new SXSSFWorkbook(new XSSFWorkbook(), 2, true, true);
        SXSSFSheet sxssfSheet = sxssfWorkbook.createSheet();
        int r = 0;
        for (String[] rowValues : data1) {
            SXSSFRow row = sxssfSheet.createRow(r++);
            int c = 0;
            for (String value : rowValues) {
                SXSSFCell cell = row.createCell(c++);
                cell.setCellValue(value);
            }
        }
        FileOutputStream outputStream = new FileOutputStream("SXSSFTest.xlsx");
        sxssfWorkbook.write(outputStream);
        outputStream.close();
        sxssfWorkbook.dispose();
        sxssfWorkbook.close();
        // now reread the SXSSFTest.xlsx and update it using XSSF =============================
        String[][] data2 = new String[][]{
            new String[]{"A2 New", "B2 New", "C2 New"},
            new String[]{"A3 New", "B3 New", "C3 New"}
        };
        XSSFWorkbook xssfWorkbook = (XSSFWorkbook)WorkbookFactory.create(
            new FileInputStream("SXSSFTest.xlsx"));
        XSSFSheet xssfSheet = xssfWorkbook.getSheetAt(0);
        r = 1;
        for (String[] rowValues : data2) {
            // look up and, if needed, create the row at the same index; r advances once per data row
            XSSFRow row = xssfSheet.getRow(r);
            if (row == null) row = xssfSheet.createRow(r);
            r++;
            int c = 0;
            for (String value : rowValues) {
                XSSFCell cell = row.getCell(c);
                if (cell != null) { // cell was already there
                    if (cell.getCTCell().getT() == STCellType.INLINE_STR) { // cell has inline string in it
                        if (cell.getCTCell().isSetIs()) { // inline string has is element
                            cell.getCTCell().getIs().setT(value); // set t element in is element
                        } else {
                            cell.getCTCell().setV(value); // set v element of inline string
                        }
                    } else {
                        cell.setCellValue(value); // set shared string cell value
                    }
                } else {
                    cell = row.createCell(c); // create the missing cell in the same column
                    cell.setCellValue(value);
                }
                c++;
            }
        }
        outputStream = new FileOutputStream("XSSFTest.xlsx");
        xssfWorkbook.write(outputStream);
        outputStream.close();
        xssfWorkbook.close();
    }
}
After that, SXSSFTest.xlsx opens in my LibreOffice Calc with inline strings in all cells, and XSSFTest.xlsx shows all inline strings updated correctly.
LibreOffice
Version: 6.0.7.3
Build ID: 1:6.0.7-0ubuntu0.18.04.5

Can't see styling changes in POI Apache Excel .xls document

I wonder how to auto-size the columns in an Excel document.
When I run this code, it does nothing to the document, and I can't find out what is wrong.
Literally nothing is auto-sized in the document. Very frustrating.
Also, I would be happy to get some feedback on the code: am I practicing bad coding habits?
Thanks!
Here is my code:
try
{
    FileInputStream myxls = new FileInputStream("/Users/xxxxxx/Desktop/tryIt.xls");
    HSSFWorkbook workbook = new HSSFWorkbook(myxls);
    HSSFSheet sheet = workbook.getSheetAt(0);
    int lastRow=sheet.getLastRowNum();
    HSSFCellStyle styleRowHeading = workbook.createCellStyle();
    HSSFCellStyle style = workbook.createCellStyle();
    HSSFFont fontRowHeading = workbook.createFont();
    HSSFFont font = workbook.createFont();
    fontRowHeading.setBold(true);
    fontRowHeading.setFontName(HSSFFont.FONT_ARIAL);
    fontRowHeading.setFontHeightInPoints((short) 14);
    styleRowHeading.setFillForegroundColor(IndexedColors.LIGHT_GREEN.getIndex());
    styleRowHeading.setFillPattern(FillPatternType.SOLID_FOREGROUND);
    styleRowHeading.setBorderTop(BorderStyle.MEDIUM);
    styleRowHeading.setBorderBottom(BorderStyle.MEDIUM);
    styleRowHeading.setBorderLeft(BorderStyle.MEDIUM);
    styleRowHeading.setBorderRight(BorderStyle.MEDIUM);
    styleRowHeading.setFont(fontRowHeading);
    font.setFontName(HSSFFont.FONT_ARIAL);
    font.setFontHeightInPoints((short)12);
    style.setFillForegroundColor(IndexedColors.GREY_25_PERCENT.getIndex());
    style.setFillPattern(FillPatternType.SOLID_FOREGROUND);
    style.setBorderTop(BorderStyle.MEDIUM);
    style.setBorderBottom(BorderStyle.MEDIUM);
    style.setBorderLeft(BorderStyle.MEDIUM);
    style.setBorderRight(BorderStyle.MEDIUM);
    style.setFont(font);
    // Create heading
    if(lastRow <=0){
        Row rowHeading = sheet.createRow(lastRow);
        rowHeading.createCell(0).setCellValue("TEST1");
        rowHeading.createCell(1).setCellValue("TEST2");
        rowHeading.createCell(2).setCellValue("TEST3");
        rowHeading.createCell(3).setCellValue("TEST4");
        for(int i = 0; i < 4; i++){
            rowHeading.getCell(i).setCellStyle(styleRowHeading);
        }
    }
    Row row = sheet.createRow(++lastRow);
    int i = 0;
    org.apache.poi.ss.usermodel.Cell cellId = row.createCell(i);
    org.apache.poi.ss.usermodel.Cell cellId1 = row.createCell(i+=1);
    org.apache.poi.ss.usermodel.Cell cellId2 = row.createCell(i+=1);
    org.apache.poi.ss.usermodel.Cell cellId3 = row.createCell(i+=1);
    cellId.setCellValue(todaysDate);
    cellId1.setCellValue(txt_year.getText());
    cellId2.setCellValue(txt_correct.getText());
    cellId3.setCellValue(txt_errors.getText());
    cellId.setCellStyle(style);
    cellId1.setCellStyle(style);
    cellId2.setCellStyle(style);
    cellId3.setCellStyle(style);
    // Autofit
    for(int w = 0; w < 5; w++){
        sheet.autoSizeColumn(w);
    }
    myxls.close();
    FileOutputStream output_file =new FileOutputStream(new File("/Users/xxxx/Desktop/tryIt.xls"));
    //write changes
    workbook.write(output_file);
    output_file.close();
    System.out.println("SUCCESSSSSSSSS!");
}catch(Exception e){
    System.out.println(e.getMessage());
}
I suspect the HSSF-specific style classes might be causing issues here. Could you switch to the generic CellStyle and Font interfaces and check whether the formatting changes show up then:
// wb is the Workbook and row is the Row being styled (both assumed to exist already)
Font defaultFont = wb.createFont();
defaultFont.setFontHeightInPoints((short)10);
defaultFont.setFontName("Arial");
defaultFont.setColor(IndexedColors.BLACK.getIndex());
defaultFont.setBold(false);
defaultFont.setItalic(false);
Font font = wb.createFont();
font.setFontHeightInPoints((short)10);
font.setFontName("Arial");
font.setColor(IndexedColors.WHITE.getIndex());
font.setBold(true);
font.setItalic(false);
// row.getRowStyle() returns null when the row has no style yet, so create a new style
CellStyle style = wb.createCellStyle();
// a solid fill takes its colour from the foreground colour
style.setFillForegroundColor(IndexedColors.DARK_BLUE.getIndex());
style.setFillPattern(FillPatternType.SOLID_FOREGROUND);
style.setAlignment(HorizontalAlignment.CENTER);
style.setFont(font);
row.setRowStyle(style);
Key things to keep in mind:
Let's look at the basic difference between HSSFWorkbook and XSSFWorkbook.
HSSFWorkbook: this class has methods to read and write Microsoft Excel files in .xls format.
XSSFWorkbook: this class has methods to read and write Microsoft Excel files in the XML-based .xlsx format.
SXSSF: an API-compatible streaming extension of XSSF, to be used when very large spreadsheets have to be produced and heap space is limited.
Workbook: this is the super-interface of all classes that create or maintain Excel workbooks. It belongs to the org.apache.poi.ss.usermodel package, and HSSFWorkbook, XSSFWorkbook and SXSSFWorkbook are all implementations of Workbook.
Hence my suggestion: unless you need a feature that is specific to .xlsx or .xls, just code against the Workbook interface.
Most styling changes are a matter of trial and error. You need to keep digging and iterating until you find what you need.
Suggestions:
If you code only against HSSF via HSSFWorkbook, you can only work with .xls files. I'd suggest using the common interfaces (Workbook, Sheet, Row, Cell) wherever possible.
Your loading code should be something like:
Workbook wb = WorkbookFactory.create(new File("test.xls"));
Sheet s = wb.getSheetAt(0);
....
This auto-detects the type of the file and gives you back a working object for either .xls or .xlsx, based on what it finds. Also, wherever possible, keep the styling and layout code generic and format-independent. That way the same code can be reused for both formats.
If you need a specific feature that requires either XSSF or HSSF and cannot be reached through the Workbook interface, check the type first:
Workbook wb = WorkbookFactory.create(myExcelFile);
Then you can check the exact type:
if (wb instanceof HSSFWorkbook) {
    // do whatever
} else if (wb instanceof SXSSFWorkbook) {
    // only reachable for workbooks you construct yourself; WorkbookFactory does not return SXSSFWorkbook
    // do whatever
} else if (wb instanceof XSSFWorkbook) {
    // do whatever
}

How to read specific worksheet using OPCPackage POI

Suppose I have an xlsx file consisting of three worksheets. Using this code snippet I am able to read the whole xlsx file, i.e. all three worksheets, with each row enclosed in brackets and each cell separated by a comma.
public static List<List<String>> excelProcess(File xlsxFile) throws Exception {
    int minColumns = -1;
    // The package open is instantaneous, as it should be.
    OPCPackage p = OPCPackage.open(xlsxFile.getPath(), PackageAccess.READ);
    XLSXParse xlsx2csv = new XLSXParse(p, System.out, minColumns);
    xlsx2csv.process();
    System.out.println("row list===="+xlsx2csv.getRowList().size());
    return xlsx2csv.getRowList();
}
Here xlsxFile is the xlsx file. But I only want the data of one specific worksheet, say worksheet2, so I would like to pass the worksheet name as well, like below.
public static List<List<String>> excelProcess(File xlsxFile,String sheetName) throws Exception {
Here sheetName is the name of the particular worksheet.
You don't appear to be using any built-in Apache POI code for your parsing, so you'll need to switch to using POI directly!
Once you have, if you look at the methods on Workbook, you'll see there are methods that let you fetch a given Sheet by name or by index.
Your code would then look something like:
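public static List<List<String>> excelProcess(File xlsxFile, String sheetName)
        throws Exception {
    List<List<String>> rows = new ArrayList<>();
    DataFormatter formatter = new DataFormatter();
    // try-with-resources closes the workbook and the underlying file
    try (Workbook wb = WorkbookFactory.create(xlsxFile)) {
        Sheet sheet = wb.getSheet(sheetName); // null if no sheet has that name
        if (sheet == null) {
            throw new IllegalArgumentException("No sheet named " + sheetName);
        }
        // process sheet contents here
        // eg something like
        for (Row r : sheet) {
            List<String> cells = new ArrayList<>();
            for (Cell c : r) {
                cells.add(formatter.formatCellValue(c));
            }
            rows.add(cells);
        }
    }
    return rows;
}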
See the Usermodel documentation and the "iterating over rows and cells" documentation to get started on processing the file with Apache POI.

Create Excel reports by programming from templates

I'm using Apache POI to generate an Excel report. I have a well-designed Excel template, and I want to create the report by filling data into predefined locations of the template. That is to say, I do not want to deal with the formatting of the report in code. Is that possible? Could you give me some instructions?
I got my answer. I can name the cells in Microsoft Excel and then use the following code to locate a cell by its name and write to it.
CellReference[] crefs = this.getCellsByName(wb, cName);
// Locate the cell position
Sheet sheet = wb.getSheet(crefs[0].getSheetName());
Row row = sheet.getRow(crefs[0].getRow());
Cell cell = row.getCell(crefs[0].getCol());
// Write in data
cell.setCellValue(cellRegion.getContent());
"cName" is the cell's name predefined in Microsoft Excel.
You can have a look at jXLS; I think that's what you are looking for.
It takes an Excel file as a template, and you write a Java app to fill in the data:
http://jxls.sourceforge.net/
You can load your template file like any other XLS, make the changes you want to the specific cells, and write the result out to another file.
Some sample code:
Load the file:
InputStream inputStream = new FileInputStream("D:\\book_original.xls");
POIFSFileSystem fileSystem = new POIFSFileSystem(inputStream);
HSSFWorkbook workBook = new HSSFWorkbook(fileSystem);
Do stuff:
HSSFSheet sheet1 = workBook.getSheetAt(0);
Iterator<Row> rows = sheet1.rowIterator();
while (rows.hasNext())
{
    Row row = rows.next();
    // getCell returns null for a cell that was never created, so guard before reading it
    Cell first = row.getCell(0);
    // CELL_TYPE_NUMERIC is the older int constant; newer POI versions use the CellType enum instead
    if (first != null && first.getCellType() == HSSFCell.CELL_TYPE_NUMERIC)
        System.out.println("Row No.: " + row.getRowNum() + " " + first.getNumericCellValue());
    // Row.createCell returns the generic Cell type; this replaces the first cell with a new one
    Cell cell = row.createCell(0);
    cell.setCellValue("100");
}
Write the output to a file:
FileOutputStream fileOut1 = new FileOutputStream("D:\\book_modified.xls");
workBook.write(fileOut1);
fileOut1.close();
You may also take a look at Xylophone, a Java library built on top of Apache POI. It uses spreadsheet templates in XLS(X) format and consumes data in XML format.
