Suppose I have an xlsx file consisting of three worksheets. Using this code snippet I'm able to read the whole xlsx file, i.e. all three worksheets, with each row enclosed in brackets and each cell separated by a comma.
public static List<List<String>> excelProcess(File xlsxFile) throws Exception {
int minColumns = -1;
// The package open is instantaneous, as it should be.
OPCPackage p = OPCPackage.open(xlsxFile.getPath(), PackageAccess.READ);
XLSXParse xlsx2csv = new XLSXParse(p, System.out, minColumns);
xlsx2csv.process();
System.out.println("row list===="+xlsx2csv.getRowList().size());
return xlsx2csv.getRowList();
}
Here xlsxFile is the path of the xlsx file. However, I only want the data of one specific worksheet, say worksheet2, so I would like to pass the worksheet name as well, like below.
public static List<List<String>> excelProcess(File xlsxFile,String sheetName) throws Exception {
Here sheetName is the name of the particular worksheet.
You don't appear to be using any built-in Apache POI code for your parsing, so you'll need to switch to using POI directly.
Once you have, if you look at the methods on Workbook, you'll see there are methods to let you fetch a given Sheet by name (getSheet) or by index (getSheetAt).
Your code would then look something like:
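public static List<List<String>> excelProcess(File xlsxFile, String sheetName)
        throws Exception {
    List<List<String>> rows = new ArrayList<>();
    DataFormatter formatter = new DataFormatter();
    // open read-only; try-with-resources closes the workbook when done
    try (Workbook wb = WorkbookFactory.create(xlsxFile, null, true)) {
        Sheet sheet = wb.getSheet(sheetName); // null if there is no such sheet
        if (sheet == null) {
            throw new IllegalArgumentException("No sheet named " + sheetName);
        }
        // process sheet contents here, eg something like
        for (Row r : sheet) {
            List<String> cells = new ArrayList<>();
            for (Cell c : r) {
                cells.add(formatter.formatCellValue(c));
            }
            rows.add(cells);
        }
    }
    return rows;
}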
See the usermodel documentation and the documentation on iterating over rows and cells to get started on processing the file with Apache POI.
Scenario:
1) A CSV file is converted into an Excel file using SXSSFWorkbook.
2) If data is then read from the CSV file again and written to that generated Excel file using XSSFWorkbook, the string data is not visible in LibreOffice, although it is visible when the file is opened in an online Excel viewer (some Excel viewers report that the file is corrupt and that the data can be recovered).
Cell creation using SXSSFWorkbook:
Cell cell = row.createCell(1);
cell.setCellValue("Some Value");
Cell update using XSSFWorkbook:
Cell cell = row.getCell(1);
cell.setCellValue("Some Value");
Observations:
1) When a cell value is updated using XSSFCell, the raw value of the cell and the string value of the cell are different.
2) If the Excel file is generated with SXSSFWorkbook and opened using XSSFWorkbook, the internally maintained STCellType is STCellType.INLINE_STR; if the Excel file is generated using XSSFWorkbook, the internally maintained STCellType is STCellType.S (STCellType is used in the CTCell of XSSFCell).
Apache POI version: 4.1.0
Please suggest a solution.
SXSSFWorkbook uses inline strings by default, while XSSFWorkbook uses the shared strings table by default. And XSSFCell.setCellValueImpl is incomplete for inline strings. It does:
...
if(_cell.getT() == STCellType.INLINE_STR) {
//set the 'pre-evaluated result
_cell.setV(str.getString());
}
...
So for inline strings it always sets the v element containing the text. But an inline string may instead have an is element with a t element containing the text, or even an is element with several rich text runs. XSSFCell does not take this into account.
But SXSSFWorkbook can be constructed so that it also uses the shared strings table. See the constructor SXSSFWorkbook(XSSFWorkbook workbook, int rowAccessWindowSize, boolean compressTmpFiles, boolean useSharedStringsTable). So if the following constructor is used:
SXSSFWorkbook sxssfWorkbook = new SXSSFWorkbook(new XSSFWorkbook(), 2, true, true);
then no inline strings are used and later updates using XSSF will not be problematic.
If SXSSFWorkbook is not using the shared strings table but inline strings, later updating cells using XSSF is a problem because XSSFCell handles inline strings incompletely. A possible workaround is to handle the inline string updates in your own code.
Example:
import java.io.FileOutputStream;
import java.io.FileInputStream;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.xssf.usermodel.*;
import org.apache.poi.xssf.streaming.*;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.STCellType;
public class SXSSFTest {
public static void main(String[] args) throws Exception {
// first create SXSSFTest.xlsx using SXSSF ============================================
String[][] data1 = new String[][]{
new String[]{"A1", "B1", "C1"},
new String[]{"A2", "B2", "C2"},
new String[]{"A3", "B3", "C3"},
new String[]{"A4", "B4", "C4"}
};
SXSSFWorkbook sxssfWorkbook = new SXSSFWorkbook();
//SXSSFWorkbook sxssfWorkbook = new SXSSFWorkbook(new XSSFWorkbook(), 2, true, true);
SXSSFSheet sxssfSheet = sxssfWorkbook.createSheet();
int r = 0;
for (String[] rowValues : data1) {
SXSSFRow row = sxssfSheet.createRow(r++);
int c = 0;
for (String value : rowValues) {
SXSSFCell cell = row.createCell(c++);
cell.setCellValue(value);
}
}
FileOutputStream outputStream = new FileOutputStream("SXSSFTest.xlsx");
sxssfWorkbook.write(outputStream);
outputStream.close();
sxssfWorkbook.dispose();
sxssfWorkbook.close();
// now reread the SXSSFTest.xlsx and update it using XSSF =============================
String[][] data2 = new String[][]{
new String[]{"A2 New", "B2 New", "C2 New"},
new String[]{"A3 New", "B3 New", "C3 New"}
};
XSSFWorkbook xssfWorkbook = (XSSFWorkbook)WorkbookFactory.create(
new FileInputStream("SXSSFTest.xlsx"));
XSSFSheet xssfSheet = xssfWorkbook.getSheetAt(0);
r = 1;
for (String[] rowValues : data2) {
XSSFRow row = xssfSheet.getRow(r);
if (row == null) row = xssfSheet.createRow(r); // create the missing row at the same index
r++;
for (int c = 0; c < rowValues.length; c++) {
String value = rowValues[c];
XSSFCell cell = row.getCell(c);
if (cell == null) { // no cell yet: create it at the same column index
cell = row.createCell(c);
cell.setCellValue(value);
} else if (cell.getCTCell().getT() == STCellType.INLINE_STR) { // cell has inline string in it
if (cell.getCTCell().isSetIs()) { // inline string has is element
cell.getCTCell().getIs().setT(value); // set t element in is element
} else {
cell.getCTCell().setV(value); // set v element of inline string
}
} else {
cell.setCellValue(value); // set shared string cell value
}
}
}
outputStream = new FileOutputStream("XSSFTest.xlsx");
xssfWorkbook.write(outputStream);
outputStream.close();
xssfWorkbook.close();
}
}
After that, SXSSFTest.xlsx opens in my LibreOffice Calc with inline strings in all cells, and XSSFTest.xlsx opens with all inline strings updated correctly.
Tested with LibreOffice version 6.0.7.3 (Build ID: 1:6.0.7-0ubuntu0.18.04.5).
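To check which string representation a given file actually uses, one option is to open the xlsx as a ZIP archive and look for cells marked t="inlineStr" in the worksheet parts. The following is only a rough sketch using plain java.util.zip: the class and method names are hypothetical, it assumes attributes are written with double quotes, and it loads each sheet's XML into memory, so it is meant for small test files only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical diagnostic helper; not part of Apache POI.
class InlineStringCheck {

    // Counts occurrences of t="inlineStr" in all worksheet parts of an xlsx stream.
    // Assumption: attributes use double quotes and each sheet's XML fits in memory.
    static int countInlineStringCells(InputStream xlsx) {
        int count = 0;
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.startsWith("xl/worksheets/") && name.endsWith(".xml")) {
                    String xml = new String(readAll(zip), StandardCharsets.UTF_8);
                    count += countOccurrences(xml, "t=\"inlineStr\"");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    // Reads the current ZIP entry to its end.
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    private static int countOccurrences(String text, String needle) {
        int count = 0;
        for (int i = text.indexOf(needle); i >= 0; i = text.indexOf(needle, i + needle.length())) {
            count++;
        }
        return count;
    }

    // Builds a tiny ZIP with a single worksheet part, for demonstration only
    // (it is not a complete workbook that Excel could open).
    static byte[] sampleZip(String sheetXml) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("xl/worksheets/sheet1.xml"));
            zip.write(sheetXml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String sheetXml = "<sheetData><row r=\"1\">"
                + "<c r=\"A1\" t=\"inlineStr\"><is><t>A1</t></is></c>"
                + "<c r=\"B1\" t=\"s\"><v>0</v></c>"
                + "</row></sheetData>";
        int inline = countInlineStringCells(new ByteArrayInputStream(sampleZip(sheetXml)));
        System.out.println("inline string cells: " + inline);
    }
}
```

For a real file, pass a FileInputStream for the xlsx instead of the in-memory sample. A non-zero count suggests the file was written with inline strings, which is the case where the workaround above matters.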
I am trying to rename headers of an existing xlsx-file. The idea is to have an excel-file to export data from XML to excel and reimport the XML once some user has made adjustments.
At the moment we have created a "template" xlsx-sheet with Excel which already contains a sortable table (XSSFTable in poi) and a mapping to a XSD-source. Then we import it via POI, map XML data into it and save it. To adjust the sheet to the users we want to translate the headers/column-names of this existing table into different languages. It worked with POI 3.10-FINAL but since an upgrade to 4.0.1 it leads to a corrupt xlsx-file when opening.
I found this question on stackoverflow already
Excel file gets corrupted when i change the value of any cell in the header (Columns Title)
but it is not answered and pretty old. Still, I tried to figure out what the comments might have been about: I tried to flatten the existing XSSFTable, copy the filled data to a new sheet and put a new XSSFTable on the data. Sadly this seems to be pretty complicated, so I am back to correcting the broken header cells.
I also tried to create the whole sheet with POI and step away from using that "template" xlsx, but I cannot figure out how to implement our XSD mapping (in Excel it's Developer Tools -> Source -> Add and then mapping the nodes to some cells in a dynamic table).
The code that worked until the upgrade of poi is basically this:
//Sheet is the current XSSFSheet
//header is a Map with the original header-name from the template mapped to the new translated name
//headerrownumber is the row containing the tableheader to be translated
public static void translateHeaders(Sheet sheet,final Map<String,String> header,int headerrownumber) {
CellRangeAddress address = new CellRangeAddress(headerrownumber,headerrownumber,0,sheet.getRow(headerrownumber).getLastCellNum()); //Cellrange is the header-row
MyCellWalk cellWalk = new MyCellWalk (sheet,address);
cellWalk.traverse(new CellHandler() {
public void onCell(Cell cell, CellWalkContext ctx) {
String val = cell.getStringCellValue();
if (header.containsKey(val)) {
cell.setCellValue(header.get(val));
}
}
});
}
MyCellWalk is a org.apache.poi.ss.util.cellwalk.CellWalk which traverses the cell range from top left to the bottom right cell.
As far as I could figure out, it's not enough to simply change the flat value of the cell, because xlsx keeps references to the column name in some of its maps, but I cannot figure out how to grab them all and rename the header. Maybe there is also another approach to translating the header names?
Well, XSSFTable.updateHeaders should do the trick, if Apache POI did not fail at doing it.
All the following is done using apache poi 4.0.1.
I have downloaded your dummy_template.xlsx and then tried changing the table column headers in the sheet. But even after calling XSSFTable.updateHeaders the column names in the XSSFTable had not changed. So I had a look into XSSFTable.java -> updateHeaders to determine why this does not happen. There we find:
if (row != null && row.getCTRow().validate()) {
//do changing the column names
}
So the column names will only be changed if the corresponding row in the sheet is valid XML according to the Office Open XML namespaces. But in later Excel versions (after 2007) additional namespaces were added. In this case the row's XML looks like:
<row r="4" spans="1:3" x14ac:dyDescent="0.25">
Note the additional x14ac:dyDescent attribute. That's why row.getCTRow().validate() returns false.
The following code takes your dummy_template.xlsx, renames the column headers in the sheet and then calls a disarmed version of updateHeaders, static void updateHeaders(XSSFTable table), which skips the validation check. After that, result.xlsx opens in Excel without errors.
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.util.*;
import org.apache.poi.ss.util.cellwalk.*;
import org.apache.poi.xssf.usermodel.*;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.*;
import java.io.*;
import java.util.*;
class ExcelRenameTableColumns {
static void translateHeaders(Sheet sheet, final Map<String,String> header, int headerrownumber) {
CellRangeAddress address = new CellRangeAddress(
headerrownumber, headerrownumber,
0, sheet.getRow(headerrownumber).getLastCellNum() - 1); // getLastCellNum() is the last cell index plus one
CellWalk cellWalk = new CellWalk (sheet, address);
cellWalk.traverse(new CellHandler() {
public void onCell(Cell cell, CellWalkContext ctx) {
String val = cell.getStringCellValue();
if (header.containsKey(val)) {
cell.setCellValue(header.get(val));
}
}
});
}
static void updateHeaders(XSSFTable table) {
XSSFSheet sheet = (XSSFSheet)table.getParent();
CellReference ref = table.getStartCellReference();
if (ref == null) return;
int headerRow = ref.getRow();
int firstHeaderColumn = ref.getCol();
XSSFRow row = sheet.getRow(headerRow);
DataFormatter formatter = new DataFormatter();
if (row != null /*&& row.getCTRow().validate()*/) {
System.out.println(row.getCTRow().validate()); // false!
int cellnum = firstHeaderColumn;
CTTableColumns ctTableColumns = table.getCTTable().getTableColumns();
if(ctTableColumns != null) {
for (CTTableColumn col : ctTableColumns.getTableColumnList()) {
XSSFCell cell = row.getCell(cellnum);
if (cell != null) {
col.setName(formatter.formatCellValue(cell));
}
cellnum++;
}
}
}
}
public static void main(String[] args) throws Exception {
String templatePath = "dummy_template.xlsx";
String outputPath = "result.xlsx";
FileInputStream inputStream = new FileInputStream(templatePath);
Workbook workbook = WorkbookFactory.create(inputStream);
Sheet sheet = workbook.getSheetAt(0);
Map<String, String> header = new HashMap<String, String>();
header.put("textone", "Spalte eins");
header.put("texttwo", "Spalte zwei");
header.put("textthree", "Spalte drei");
translateHeaders(sheet, header, 3);
XSSFTable table = ((XSSFSheet)sheet).getTables().get(0);
updateHeaders(table);
FileOutputStream outputStream = new FileOutputStream(outputPath);
workbook.write(outputStream);
outputStream.close();
workbook.close();
}
}
If I open the dummy_template.xlsx using Excel 2007 and then save as dummy_template2007.xlsx, the row's XML changes to
<row r="4" spans="1:3">
Now, when using this dummy_template2007.xlsx, no manual call to XSSFTable.updateHeaders is necessary. XSSFTable.writeTo, which is called by XSSFTable.commit, does this automatically.
I need to read data from a row of an Excel sheet through Java code and write its content to another Excel sheet, not in the same column but spread over different columns. Can anyone help me? Thanks a lot in advance :)
Example: Suppose that from the first Excel file I got this from one of the columns: CN=user1,CN=Users,DC=example,DC=com
Now I need to put this data in another Excel sheet but in different columns, i.e. each of the comma-separated values will go to a different column.
You could try exporting to a tab-delimited text document and manipulating that using Java code.
https://www.howtogeek.com/79991/convert-an-excel-spreadsheet-to-a-tab-delimited-text-file/
Just use Java's I/O capabilities from there.
https://docs.oracle.com/javase/tutorial/essential/io/
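As a rough sketch of that approach, the code below reads tab-delimited lines, splits one column on commas, and writes the parts back out as separate tab-delimited columns. The class and method names are made up for illustration, and it assumes that no value contains an escaped comma (which LDAP distinguished names may do).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class DnColumnSplitter {

    // Splits one comma-separated value into trimmed parts.
    // Assumption: no part contains an escaped comma ("\,").
    static List<String> splitToColumns(String value) {
        List<String> parts = new ArrayList<>();
        for (String part : value.split(",")) {
            parts.add(part.trim());
        }
        return parts;
    }

    // Reads tab-delimited lines, expands the column at index dnColumn into one
    // column per comma-separated part, and writes tab-delimited lines.
    static void expandColumn(Reader in, Writer out, int dnColumn) {
        try {
            BufferedReader reader = new BufferedReader(in);
            PrintWriter writer = new PrintWriter(out);
            String line;
            while ((line = reader.readLine()) != null) {
                String[] cells = line.split("\t", -1); // -1 keeps trailing empty cells
                List<String> result = new ArrayList<>();
                for (int i = 0; i < cells.length; i++) {
                    if (i == dnColumn) {
                        result.addAll(splitToColumns(cells[i]));
                    } else {
                        result.add(cells[i]);
                    }
                }
                writer.println(String.join("\t", result));
            }
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        expandColumn(new StringReader("1\tCN=user1,CN=Users,DC=example,DC=com\n"), out, 1);
        System.out.print(out);
    }
}
```

With real files, a FileReader and FileWriter would take the place of the in-memory reader and writer, and the resulting tab-delimited file can be opened in Excel again.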
I've recently used the Apache POI library for parsing excel spreadsheets and found it incredibly useful.
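import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
void parseWorkbook(File file) throws IOException {
// open the .xls file as a stream; try-with-resources closes it when done
try (FileInputStream in = new FileInputStream(file)) {
POIFSFileSystem fs = new POIFSFileSystem(in);
HSSFWorkbook wb = new HSSFWorkbook(fs);
for (int i = 0; i < wb.getNumberOfSheets(); i++) {
parseSheet(wb.getSheetAt(i));
}
}
}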
void parseSheet(HSSFSheet sheet) throws IllegalStateException {
// getLastRowNum() is the index of the last row, so rows after a gap are not skipped
final int lastRow = sheet.getLastRowNum();
HSSFRow row;
for (int r = 0; r <= lastRow; r++) {
row = sheet.getRow(r);
if (row != null) {
parseRow(row);
}
}
}
void parseRow(HSSFRow row) {
row.getCell(0);
....
}
An example of reading and writing to a spreadsheet can be found here
I am working on a large CSV (a text file of about 200 MB) which I would like to convert into an Excel sheet, but the workbook consumes so much memory that in the middle of the process Java throws "GC overhead limit exceeded".
I have checked whether my code keeps unnecessary references, but I don't think it does.
In my opinion the Apache POI library calls might create references that keep the garbage collector busy.
My question is whether I could write the workbook to a file chunk by chunk, like appending to a text file, without holding all of it in memory. Is there a solution for that, or am I missing something here?
The GC error is thrown in the following code:
private void updateExcelWorkbook(String input, String fileName, Workbook workbook) {
try {
Sheet sheet = workbook.createSheet(fileName);
// Create a new font and alter it.
Font font = workbook.createFont();
font.setFontHeightInPoints((short) 11);
font.setBold(true);
// Fonts are set into a style so create a new one to use.
CellStyle style = workbook.createCellStyle();
style.setFont(font);
Row row;
Cell cell;
String[] columns;
String[] lines = input.split("\n");
int colIndex;
int rowIndex = 1;
for (String line : lines) {
row = sheet.createRow(rowIndex++);
columns = line.split("\t");
colIndex = 0;
for (String column: columns) {
cell = row.createCell(colIndex++);
if (rowIndex == 1)
cell.setCellStyle(style);
cell.setCellValue(column);
}
}
} catch (Exception ex) {
System.out.println(ex.getMessage());
}
}
It seems you are using the POI usermodel, which has a very high memory footprint because it keeps the entire worksheet in memory, similar to how DOM keeps an entire XML document in memory.
You need to use a streaming API. Using POI, you can create .xlsx files using the SXSSF Buffered Streaming API, as mentioned here: https://poi.apache.org/spreadsheet/index.html#SXSSF+(Since+POI+3.8+beta3)
The page linked above also has a Spreadsheet API Feature Summary comparing POI's spreadsheet APIs (source: apache.org).
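The input side can be streamed in the same spirit: instead of holding the whole text in one String and calling split("\n") on it, the file can be read one line at a time. The sketch below shows only that part, using plain java.io. LineStreamer, RowSink and streamRows are hypothetical names, and the step that would create a row in a streaming (SXSSF) sheet is left to the callback.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical names throughout; only java.io is used.
class LineStreamer {

    // Callback that receives one parsed row at a time.
    interface RowSink {
        void accept(int rowIndex, String[] columns);
    }

    // Reads the input one line at a time and forwards each row to the sink,
    // so only the current line is held by this method. Returns the row count.
    static int streamRows(Reader input, RowSink sink) {
        int rowIndex = 0;
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                sink.accept(rowIndex++, line.split("\t"));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rowIndex;
    }

    public static void main(String[] args) {
        // In a real converter the sink would create a row and its cells in
        // a streaming sheet instead of printing.
        int rows = streamRows(new StringReader("id\tname\n1\tfoo\n2\tbar\n"),
                (i, cols) -> System.out.println(i + ": " + String.join(" | ", cols)));
        System.out.println(rows + " rows");
    }
}
```

For the 200 MB file, a FileReader (or Files.newBufferedReader) would replace the StringReader, so neither the input text nor the array of all lines has to exist in memory at once.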
I am trying to read this Excel file: Test.xlsx. To do this I used an example I found on the internet (http://howtodoinjava.com/2013/06/19/readingwriting-excel-files-in-java-poi-tutorial/), but it doesn't work.
I copied the path of the file, so there is no error there.
Whenever I run it, it doesn't show any errors, just: []
When I debug it, it shows me that the list size is 0.
What should I change?
ArrayList<String> list = new ArrayList<String>();
@Override
public List<String> getExcel(){
try {
FileInputStream file = new FileInputStream(new File("C:\\Users\\user\\Documents\\Test.xlsx"));
//Create Workbook instance holding reference to .xlsx file
HSSFWorkbook workbook = new HSSFWorkbook(file);
//Get first/desired sheet from the workbook
HSSFSheet sheet = workbook.getSheet("Sheet1");
//Iterate through each rows one by one
Iterator<Row> rowIterator = sheet.iterator();
while (rowIterator.hasNext()) {
Row row = rowIterator.next();
//For each row, iterate through all the columns
if (row.getRowNum() <= 7) {
continue;// skip to read the first 7 row of file
}
Iterator<Cell> cellIterator = row.cellIterator();
while (cellIterator.hasNext()) {
Cell cell = cellIterator.next();
list.add(cell.getStringCellValue());
}
//System.out.println("");
}
file.close();
} catch (Exception e) {
e.printStackTrace();
}
return list;
}
You are using the wrong class for the file you are trying to open (Test.xlsx). Judging by the extension, this is an Excel 2007 or later document. Use HSSFWorkbook for .xls files (Excel 97-2003) and XSSFWorkbook for .xlsx files (Excel 2007 or later). Review the Apache POI documentation that came with the downloaded package; it contains basic tutorials on how to accomplish this.
You will need to replace all of the 'HSSF' classes with their 'XSSF' equivalents. Beware that the methods called to create the parts of the document (i.e. Workbook, Sheet, etc.) are not always the same.
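If the extension cannot be trusted, the file's first bytes tell the two formats apart: an .xls file is an OLE2 compound document, while an .xlsx file is a ZIP container. The sketch below checks those signatures with plain Java. The class and enum names are invented for illustration, and a ZIP signature only shows that the file is some ZIP archive, not necessarily a workbook. POI's WorkbookFactory.create performs a similar detection itself and returns the matching Workbook implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper; names are illustrative only.
class ExcelFormatSniffer {

    enum Format { XLS_OLE2, XLSX_ZIP, UNKNOWN }

    // OLE2 compound document signature (.xls, Excel 97-2003).
    private static final int[] OLE2 = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // ZIP local file header signature (.xlsx is a ZIP container).
    private static final int[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    // Classifies a file by its leading bytes.
    static Format detect(byte[] header) {
        if (startsWith(header, OLE2)) return Format.XLS_OLE2;
        if (startsWith(header, ZIP)) return Format.XLSX_ZIP;
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, int[] signature) {
        if (data.length < signature.length) return false;
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) return false;
        }
        return true;
    }

    // Reads up to the first 8 bytes of the file and classifies them.
    static Format detect(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[8];
            int total = 0;
            int n;
            while (total < header.length
                    && (n = in.read(header, total, header.length - total)) > 0) {
                total += n;
            }
            return detect(Arrays.copyOf(header, total));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 1) {
            System.out.println(detect(Paths.get(args[0])));
        } else {
            System.out.println(detect(new byte[] {0x50, 0x4B, 0x03, 0x04}));
        }
    }
}
```

A result of XLSX_ZIP for Test.xlsx would confirm that the XSSF classes (or WorkbookFactory) are the ones to use.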
Try this link. I created a small demo for a simple tutorial on Apache POI some time back. There is an Excel example you could follow. The location contains source code and a set of slides that you should be able to follow easily.