I am trying to create an Excel file and send it to an SFTP location using Apache Camel. I can create a CSV file from a Java object like this:
Java POJO:
@CsvRecord(separator = ",", skipFirstLine = true, generateHeaderColumns = true)
public class IexPerson implements Serializable {
private static final long serialVersionUID = 1234069326527342909L;
@DataField(pos = 1, columnName = "Person Name")
private String personName;
@DataField(pos = 2, columnName = "Gender")
private String gender;
// other fields ...
and then I convert a list of IexPersons in the route:
DataFormat iexPersonFormat = new BindyCsvDataFormat(IexPerson.class);
from("direct:get-list-of-persons")
.setHeader(Exchange.FILE_NAME).simple("Persons_${date:now:yyyyMMdd-HHmmssSSS}_${random(1000,10000000)}.csv")
.marshal(iexPersonFormat)
.to("file:///tmp?fileName=${header.CamelFileName}");
This is working fine, but now I need to create an Excel file from the same list and send it to another location. I didn't manage to get it to work.
I didn't find anything on the internet that would help me.
Posting here for future users who need help with the same topic.
I was able to convert the Java objects to a working Excel file by using the Apache POI library.
I created a service with a method that converts the objects to a file, and I call that method from my Camel route. This is well explained here.
Here is the code:
// Create a blank Workbook
try (Workbook workbook = new XSSFWorkbook()) {
// Create a blank Sheet
Sheet sheet = workbook.createSheet("IexPersons");
// column names
List <String> columns = new ArrayList<>();
columns.add("Person Name");
columns.add("Gender");
Row headerRow = sheet.createRow(0);
// Create columns/first row in a file
for (int i = 0; i < columns.size(); i++) {
Cell cell = headerRow.createCell(i);
cell.setCellValue(columns.get(i));
}
int rowNum = 1;
// iterate over the list of persons and for each person write its values to the excel row
for (IexPerson iexPerson : getListOfPersons()) {
Row row = sheet.createRow(rowNum++);
// populate file with the values for each column
row.createCell(0).setCellValue(iexPerson.getPersonName());
row.createCell(1).setCellValue(iexPerson.getGender());
}
// create the file and write the workbook to it;
// try-with-resources closes the output stream even if write() fails
try (FileOutputStream out = new FileOutputStream(new File("iexpersons.xlsx"))) {
workbook.write(out);
}
} catch (IOException e) {
e.printStackTrace();
}
I usually use XSL transformations to create Excel files in the Microsoft Office XML format.
You can find an example XML for an Excel spreadsheet on the linked Wikipedia page.
However, I usually create a simple Excel file that represents what I want and then I save it as "XML Spreadsheet 2003".
The result is the XML file structure that I need to generate with an XSL stylesheet.
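To make the XSL approach a bit more concrete, here is a minimal sketch using only the JDK's built-in XSLT support. The input XML (persons/person/name/gender) and the class name are made up for illustration, and the stylesheet only produces the bare minimum of the "XML Spreadsheet 2003" structure; a file saved from Excel will contain more (styles, document properties and so on).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PersonsToSpreadsheetXml {

    // Hypothetical input; the real source XML will look different.
    static final String PERSONS_XML =
        "<persons>"
        + "<person><name>Ann</name><gender>F</gender></person>"
        + "<person><name>Bob</name><gender>M</gender></person>"
        + "</persons>";

    // Stylesheet that emits a header row plus one row per person.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns='urn:schemas-microsoft-com:office:spreadsheet'"
        + " xmlns:ss='urn:schemas-microsoft-com:office:spreadsheet'>"
        + "<xsl:output method='xml' indent='no'/>"
        + "<xsl:template match='/persons'>"
        // tells Windows to open the file with Excel
        + "<xsl:processing-instruction name='mso-application'>"
        + "progid=\"Excel.Sheet\"</xsl:processing-instruction>"
        + "<Workbook><Worksheet ss:Name='Persons'><Table>"
        + "<Row>"
        + "<Cell><Data ss:Type='String'>Person Name</Data></Cell>"
        + "<Cell><Data ss:Type='String'>Gender</Data></Cell>"
        + "</Row>"
        + "<xsl:for-each select='person'>"
        + "<Row>"
        + "<Cell><Data ss:Type='String'><xsl:value-of select='name'/></Data></Cell>"
        + "<Cell><Data ss:Type='String'><xsl:value-of select='gender'/></Data></Cell>"
        + "</Row>"
        + "</xsl:for-each>"
        + "</Table></Worksheet></Workbook>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML and returns the result as a string.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(PERSONS_XML, STYLESHEET));
    }
}
```

In a real setup the stylesheet would live in its own .xsl file and the result would be written to a file with an .xml or .xls extension rather than printed.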
I have an Excel spreadsheet that has the first sheet designated for the raw data. There are 3 more sheets that are coded to transform and format the data from the raw sheet. The fifth sheet has the final output.
How can I use Java to:
load the data from the CSV file into the first sheet of the Excel file?
save the data from the 5th sheet into a new CSV file?
Also, if the original CSV has thousands of rows, I assume the multi-sheet transformations would take some time before the 5th sheet gets all the final data - is there a way to know?
I would follow this approach:
Load the specific .csv file and prepare to read it with Java
Load the .xlsx file and change it according to your requirements and the data that you get from the .csv file. A small example of how an excel file is changed with Apache POI can be seen below:
try
{
// Holds the data from the csv in this form ( 0 -> [ "Column1", "Column2" ]...); fill it from the .csv file first
HashMap<Integer, ArrayList<String>> fileData = new HashMap<>();
// Working with the excel file now
FileInputStream file = new FileInputStream("Data.xlsx");
XSSFWorkbook workbook = new XSSFWorkbook(file); // getting the Workbook
XSSFSheet sheet = workbook.getSheetAt(0);
// counter for the sheet row that receives the next csv row
AtomicInteger row = new AtomicInteger(0);
fileData.forEach((key, csvRow) ->
{
//Retrieve the row (create it if it does not exist yet) and advance the counter
int rowIndex = row.getAndIncrement();
XSSFRow sheetRow = sheet.getRow(rowIndex);
if(sheetRow == null)
{
sheetRow = sheet.createRow(rowIndex);
}
for (int i = 0; i < csvRow.size(); i++)
{
//Update the value of the cell; it is declared here because a lambda
//cannot assign to a local variable of the enclosing method
Cell cell = sheetRow.getCell(i);
if(cell == null){
cell = sheetRow.createCell(i);
}
cell.setCellValue(csvRow.get(i));
}
});
file.close();
FileOutputStream outFile =new FileOutputStream(new File("Data.xlsx"));
workbook.write(outFile);
outFile.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
After saving the .xlsx file, you can create the .csv file by following this question.
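The first step (reading the .csv into the fileData structure) and the last step (writing rows back out as CSV lines) need no library at all. Here is a rough sketch with plain JDK classes. SimpleCsv and its method names are my own, and the reader assumes simple input without quoted fields or embedded commas; for anything more complex a real CSV library is the safer choice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SimpleCsv {

    // Reads comma-separated lines into rowIndex -> cell values.
    // Assumption: no quoted fields, no commas or line breaks inside values.
    static Map<Integer, List<String>> read(Reader source) {
        // LinkedHashMap keeps the rows in file order when iterating
        Map<Integer, List<String>> rows = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            int rowIndex = 0;
            while ((line = reader.readLine()) != null) {
                // limit -1 keeps trailing empty columns
                rows.put(rowIndex++, new ArrayList<>(Arrays.asList(line.split(",", -1))));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    // Joins one row into a CSV line, quoting the values that need it.
    static String toLine(List<String> cells) {
        return cells.stream().map(SimpleCsv::escape).collect(Collectors.joining(","));
    }

    // Wraps a value in quotes and doubles inner quotes when necessary.
    static String escape(String value) {
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        return needsQuotes ? "\"" + value.replace("\"", "\"\"") + "\"" : value;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> fileData =
                read(new StringReader("Column1,Column2\na,b\n"));
        fileData.forEach((row, cells) -> System.out.println(row + " -> " + cells));
        System.out.println(toLine(Arrays.asList("plain", "with,comma")));
    }
}
```

For a file on disk, pass something like Files.newBufferedReader(path) to read(). When exporting the 5th sheet, each row's cell values would be formatted to strings first and then handed to toLine().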
Scenario:
1) A CSV file is converted into an Excel file using SXSSFWorkbook.
2) If the data is then read from the CSV file again and written to the generated Excel file using XSSFWorkbook, the string data is not visible in LibreOffice. The data is visible if the Excel file is opened in an online Excel viewer, although some viewers report that the file is corrupt and that the data can be recovered.
Cell creation Using SXSSFWorkbook:
Cell cell = row.createCell(1);
cell.setCellValue("Some Value");
Cell update using XSSFWorkbook:
Cell cell = row.getCell(1);
cell.setCellValue("Some Value");
Observations:
1) When a cell value is updated using XSSFCell, the raw value of the cell and the string value of the cell are different.
2) If the Excel file is generated with SXSSFWorkbook and opened using XSSFWorkbook, the internally maintained STCellType is STCellType.INLINE_STR. If the Excel file is generated using XSSFWorkbook, the internally maintained STCellType is STCellType.S (STCellType is used in the CTCell of XSSFCell).
Apache POI Version: 4.1.0
Please suggest a solution.
SXSSFWorkbook uses inline strings by default, while XSSFWorkbook uses the shared strings table by default. And XSSFCell.setCellValueImpl is incomplete for inline strings. It does:
...
if(_cell.getT() == STCellType.INLINE_STR) {
//set the 'pre-evaluated result
_cell.setV(str.getString());
}
...
So for inline strings it always sets the v element containing the text. But an inline string may also have an is element with a t element containing the text, or even an is element with several rich text runs. XSSFCell does not take these cases into account.
But SXSSFWorkbook can be constructed so that it also uses the shared strings table. See the constructor SXSSFWorkbook(XSSFWorkbook workbook, int rowAccessWindowSize, boolean compressTmpFiles, boolean useSharedStringsTable). So if the following constructor is used:
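To illustrate the different forms a string cell can take in the sheet XML, here is a small JDK-only sketch that classifies cells in a hand-written sheetData fragment. The fragment and the class are made up for illustration (a real sheet XML carries namespaces and more attributes), and this is not how apache poi reads cells internally.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class InlineStringForms {

    // Hand-written fragment for illustration, not taken from a real file.
    static final String SHEET_DATA =
        "<sheetData>"
        + "<row r='1'>"
        + "<c r='A1' t='s'><v>0</v></c>"
        + "<c r='B1' t='inlineStr'><is><t>inline via is/t</t></is></c>"
        + "<c r='C1' t='inlineStr'><v>inline via v</v></c>"
        + "</row>"
        + "</sheetData>";

    // Parses the fragment and returns one description per c element.
    static List<String> describeCells(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList cells = doc.getElementsByTagName("c");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < cells.getLength(); i++) {
                result.add(describe((Element) cells.item(i)));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sheet XML", e);
        }
    }

    // Says where the text of a single cell is stored.
    static String describe(Element c) {
        String ref = c.getAttribute("r");
        String type = c.getAttribute("t");
        NodeList is = c.getElementsByTagName("is");
        NodeList v = c.getElementsByTagName("v");
        if ("inlineStr".equals(type) && is.getLength() > 0) {
            // getTextContent also concatenates several rich text runs
            return ref + ": inline string, text in is/t = " + is.item(0).getTextContent();
        }
        if ("inlineStr".equals(type) && v.getLength() > 0) {
            return ref + ": inline string, text in v = " + v.item(0).getTextContent();
        }
        if ("s".equals(type) && v.getLength() > 0) {
            // v holds an index into the shared strings table, not the text
            return ref + ": shared string, table index " + v.item(0).getTextContent();
        }
        return ref + ": other";
    }

    public static void main(String[] args) {
        describeCells(SHEET_DATA).forEach(System.out::println);
    }
}
```

The B1 form is the one that causes trouble: a reader that takes the text from is/t will not see a new value that was only written into v.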
SXSSFWorkbook sxssfWorkbook = new SXSSFWorkbook(new XSSFWorkbook(), 2, true, true);
then no inline strings are used, and later updates using XSSF will not be problematic.
If SXSSFWorkbook is using inline strings instead of the shared strings table, later updates of cells using XSSF are a problem because XSSFCell handles inline strings incompletely. A possible workaround is to update the inline strings with your own code.
Example:
import java.io.FileOutputStream;
import java.io.FileInputStream;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.xssf.usermodel.*;
import org.apache.poi.xssf.streaming.*;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.STCellType;
public class SXSSFTest {
public static void main(String[] args) throws Exception {
// first create SXSSFTest.xlsx using SXSSF ============================================
String[][] data1 = new String[][]{
new String[]{"A1", "B1", "C1"},
new String[]{"A2", "B2", "C2"},
new String[]{"A3", "B3", "C3"},
new String[]{"A4", "B4", "C4"}
};
SXSSFWorkbook sxssfWorkbook = new SXSSFWorkbook();
//SXSSFWorkbook sxssfWorkbook = new SXSSFWorkbook(new XSSFWorkbook(), 2, true, true);
SXSSFSheet sxssfSheet = sxssfWorkbook.createSheet();
int r = 0;
for (String[] rowValues : data1) {
SXSSFRow row = sxssfSheet.createRow(r++);
int c = 0;
for (String value : rowValues) {
SXSSFCell cell = row.createCell(c++);
cell.setCellValue(value);
}
}
FileOutputStream outputStream = new FileOutputStream("SXSSFTest.xlsx");
sxssfWorkbook.write(outputStream);
outputStream.close();
sxssfWorkbook.dispose();
sxssfWorkbook.close();
// now reread the SXSSFTest.xlsx and update it using XSSF =============================
String[][] data2 = new String[][]{
new String[]{"A2 New", "B2 New", "C2 New"},
new String[]{"A3 New", "B3 New", "C3 New"}
};
XSSFWorkbook xssfWorkbook = (XSSFWorkbook)WorkbookFactory.create(
new FileInputStream("SXSSFTest.xlsx"));
XSSFSheet xssfSheet = xssfWorkbook.getSheetAt(0);
r = 1;
for (String[] rowValues : data2) {
XSSFRow row = xssfSheet.getRow(r); if (row == null) row = xssfSheet.createRow(r);
r++; // advance exactly once per data row, whether the row existed or was created
int c = 0;
for (String value : rowValues) {
XSSFCell cell = row.getCell(c++);
if (cell != null) { // cell was already there
if (cell.getCTCell().getT() == STCellType.INLINE_STR) { // cell has inline string in it
if (cell.getCTCell().isSetIs()) { // inline string has is element
cell.getCTCell().getIs().setT(value); // set t element in is element
} else {
cell.getCTCell().setV(value); // set v element of inline string
}
} else {
cell.setCellValue(value); // set shared string cell value
}
} else {
cell = row.createCell(c - 1); // c was already advanced by getCell(c++) above
cell.setCellValue(value);
}
}
}
outputStream = new FileOutputStream("XSSFTest.xlsx");
xssfWorkbook.write(outputStream);
outputStream.close();
xssfWorkbook.close();
}
}
After that the SXSSFTest.xlsx looks like so in my LibreOffice Calc:
All cells contain inline strings.
And the XSSFTest.xlsx looks like so:
There all inline strings are now updated correctly.
LibreOffice
Version: 6.0.7.3
Build ID: 1:6.0.7-0ubuntu0.18.04.5
I am trying to rename headers of an existing xlsx-file. The idea is to have an excel-file to export data from XML to excel and reimport the XML once some user has made adjustments.
At the moment we have created a "template" xlsx-sheet with Excel which already contains a sortable table (XSSFTable in poi) and a mapping to a XSD-source. Then we import it via POI, map XML data into it and save it. To adjust the sheet to the users we want to translate the headers/column-names of this existing table into different languages. It worked with POI 3.10-FINAL but since an upgrade to 4.0.1 it leads to a corrupt xlsx-file when opening.
I found this question on stackoverflow already
Excel file gets corrupted when i change the value of any cell in the header (Columns Title)
but it is unanswered and pretty old. I tried to figure out what the comments might have been about: I tried to flatten the existing XSSFTable, copy the filled data to a new sheet and put a new XSSFTable on the data. Sadly this seems to be pretty complicated, so I am back to correcting the broken header cells.
I also tried to create the whole sheet with POI and step away from using that "template" xlsx, but I cannot figure out how to implement our XSD mapping (in Excel it's Developer Tools -> Source -> Add and then mapping the nodes to some cells in a dynamic table).
The code that worked until the upgrade of poi is basically this:
//Sheet is the current XSSFSheet
//header is a Map with the original header-name from the template mapped to the new translated name
//headerrownumber is the row containing the tableheader to be translated
public static void translateHeaders(Sheet sheet,final Map<String,String> header,int headerrownumber) {
CellRangeAddress address = new CellRangeAddress(headerrownumber,headerrownumber,0,sheet.getRow(headerrownumber).getLastCellNum()); //Cellrange is the header-row
MyCellWalk cellWalk = new MyCellWalk (sheet,address);
cellWalk.traverse(new CellHandler() {
public void onCell(Cell cell, CellWalkContext ctx) {
String val = cell.getStringCellValue();
if (header.containsKey(val)) {
cell.setCellValue(header.get(val));
}
}
});
}
MyCellWalk is a org.apache.poi.ss.util.cellwalk.CellWalk which traverses the cell range from top left to the bottom right cell.
As far as I could figure out, it's not enough to simply change the flat value of the cell, because xlsx keeps references to the cell name in some of its maps. But I cannot figure out how to grab them all and rename the header. Maybe there is also another approach to translating the header names?
Well, XSSFTable.updateHeaders should do the trick, if apache poi did not fail doing it.
All the following is done using apache poi 4.0.1.
I have downloaded your dummy_template.xlsx and then tried changing the table column headers in the sheet. But even after calling XSSFTable.updateHeaders the column names in the XSSFTable had not changed. So I had a look into XSSFTable.java -> updateHeaders to determine why this does not happen. There we find:
if (row != null && row.getCTRow().validate()) {
//do changing the column names
}
So the column names are only changed if the corresponding row in the sheet is valid XML according to the Office Open XML namespaces. But later Excel versions (after 2007) added additional namespaces. In this case the row's XML looks like:
<row r="4" spans="1:3" x14ac:dyDescent="0.25">
Note the additional x14ac:dyDescent attribute. That's why row.getCTRow().validate() returns false.
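To see that this attribute really belongs to a different namespace than the row element itself, here is a small JDK-only sketch that lists the namespace-qualified attributes of the row. It does not reproduce the XmlBeans validate() call that apache poi uses; it only shows the foreign attribute. In the real file the xmlns declarations sit on the worksheet root element; I moved them onto the row here so that the snippet stands alone.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ForeignAttributes {

    static final String MAIN_NS =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // The row from above, with the namespace declarations added.
    static final String ROW_XML =
        "<row xmlns='" + MAIN_NS + "'"
        + " xmlns:x14ac='http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac'"
        + " r='4' spans='1:3' x14ac:dyDescent='0.25'/>";

    // Returns the names of all attributes that live in a namespace of their own.
    static List<String> foreignAttributes(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NamedNodeMap attributes = doc.getDocumentElement().getAttributes();
            List<String> result = new ArrayList<>();
            for (int i = 0; i < attributes.getLength(); i++) {
                Node attribute = attributes.item(i);
                String ns = attribute.getNamespaceURI();
                // plain attributes (r, spans) have no namespace;
                // xmlns declarations are not real attributes of the row
                if (ns != null && !XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(ns)) {
                    result.add(attribute.getNodeName());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse row XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(foreignAttributes(ROW_XML));
    }
}
```

The attributes r and spans are unqualified and belong to the row definition of the main schema, while x14ac:dyDescent comes from a Microsoft extension namespace.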
The following code takes your dummy_template.xlsx, renames the column headers in the sheet and then calls a patched copy of updateHeaders, static void updateHeaders(XSSFTable table), in which the validate() check is commented out. After that the result.xlsx opens in Excel without errors.
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.util.*;
import org.apache.poi.ss.util.cellwalk.*;
import org.apache.poi.xssf.usermodel.*;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.*;
import java.io.*;
import java.util.*;
class ExcelRenameTableColumns {
static void translateHeaders(Sheet sheet, final Map<String,String> header, int headerrownumber) {
CellRangeAddress address = new CellRangeAddress(
headerrownumber, headerrownumber,
0, sheet.getRow(headerrownumber).getLastCellNum());
CellWalk cellWalk = new CellWalk (sheet, address);
cellWalk.traverse(new CellHandler() {
public void onCell(Cell cell, CellWalkContext ctx) {
String val = cell.getStringCellValue();
if (header.containsKey(val)) {
cell.setCellValue(header.get(val));
}
}
});
}
static void updateHeaders(XSSFTable table) {
XSSFSheet sheet = (XSSFSheet)table.getParent();
CellReference ref = table.getStartCellReference();
if (ref == null) return;
int headerRow = ref.getRow();
int firstHeaderColumn = ref.getCol();
XSSFRow row = sheet.getRow(headerRow);
DataFormatter formatter = new DataFormatter();
System.out.println(row.getCTRow().validate()); // false!
if (row != null /*&& row.getCTRow().validate()*/) {
int cellnum = firstHeaderColumn;
CTTableColumns ctTableColumns = table.getCTTable().getTableColumns();
if(ctTableColumns != null) {
for (CTTableColumn col : ctTableColumns.getTableColumnList()) {
XSSFCell cell = row.getCell(cellnum);
if (cell != null) {
col.setName(formatter.formatCellValue(cell));
}
cellnum++;
}
}
}
}
public static void main(String[] args) throws Exception {
String templatePath = "dummy_template.xlsx";
String outputPath = "result.xlsx";
FileInputStream inputStream = new FileInputStream(templatePath);
Workbook workbook = WorkbookFactory.create(inputStream);
Sheet sheet = workbook.getSheetAt(0);
Map<String, String> header = new HashMap<String, String>();
header.put("textone", "Spalte eins");
header.put("texttwo", "Spalte zwei");
header.put("textthree", "Spalte drei");
translateHeaders(sheet, header, 3);
XSSFTable table = ((XSSFSheet)sheet).getTables().get(0);
updateHeaders(table);
FileOutputStream outputStream = new FileOutputStream(outputPath);
workbook.write(outputStream);
outputStream.close();
workbook.close();
}
}
If I open the dummy_template.xlsx using Excel 2007 and then save as dummy_template2007.xlsx, the row's XML changes to
<row r="4" spans="1:3">
Now when using this dummy_template2007.xlsx, no manual call of XSSFTable.updateHeaders is necessary. XSSFTable.writeTo, which is called by XSSFTable.commit, does this automatically.
Suppose I have a xlsx file consisting of three worksheets. Using this code snippet I'm able to read the whole xlsx file i.e. all three worksheets in which each row is separated by brackets and each cell separated by comma.
public static List<List<String>> excelProcess(File xlsxFile) throws Exception {
int minColumns = -1;
// The package open is instantaneous, as it should be.
OPCPackage p = OPCPackage.open(xlsxFile.getPath(), PackageAccess.READ);
XLSXParse xlsx2csv = new XLSXParse(p, System.out, minColumns);
xlsx2csv.process();
System.out.println("row list===="+xlsx2csv.getRowList().size());
return xlsx2csv.getRowList();
}
Here xlsxFile is the path of xlsx file. But I only want the data of a specific worksheet, say worksheet2 so I would pass worksheet name also like below.
public static List<List<String>> excelProcess(File xlsxFile,String sheetName) throws Exception {
Here sheetName is particular Worksheet's name.
You don't appear to be using any built-in Apache POI code for your parsing, so you'll need to switch to using POI directly!
Once you have, if you look at the methods on Workbook, you'll see there are methods that let you fetch a given Sheet by name or by index.
Your code would then look something like this:
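public static List<List<String>> excelProcess(File xlsxFile, String sheetName)
throws Exception {
List<List<String>> rows = new ArrayList<>();
// try-with-resources closes the workbook when done
try (Workbook wb = WorkbookFactory.create(xlsxFile)) {
Sheet sheet = wb.getSheet(sheetName); // null if no sheet has that name
if (sheet == null) {
throw new IllegalArgumentException("No such sheet: " + sheetName);
}
// process sheet contents here
// eg something like
DataFormatter formatter = new DataFormatter();
for (Row r : sheet) {
List<String> cells = new ArrayList<>();
for (Cell c : r) {
cells.add(formatter.formatCellValue(c));
}
rows.add(cells);
}
}
return rows;
}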
See the Usermodel documentation and iterating over rows and cells documentation to get started on processing the file with Apache POI
I'm using "Apache POI" to generate Excel report. I've a well designed Excel template. I want to create a report by filling the data into predefined locations of the template. That is to say, I do not want to care about the formats of the report. Is that possible? Could you give me some instructions?
I got my answer. I can use the "Cell Naming" utility in Microsoft Excel and then use the following code to locate the cell and do something.
CellReference[] crefs = this.getCellsByName(wb, cName);
// Locate the cell position
Sheet sheet = wb.getSheet(crefs[0].getSheetName());
Row row = sheet.getRow(crefs[0].getRow());
Cell cell = row.getCell(crefs[0].getCol());
// Write in data
cell.setCellValue(cellRegion.getContent());
"cName" is the cell's name predefined in Microsoft Excel.
You can have a look at jXLS; I think that's what you are looking for.
It takes an Excel file as a template, and you can write a Java app to fill in the data:
http://jxls.sourceforge.net/
You can load your template file like any other XLS, make the changes you want to the specific cells, and write it out into another file.
Some sample code:
Load file
InputStream inputStream = new FileInputStream ("D:\\book_original.xls");
POIFSFileSystem fileSystem = new POIFSFileSystem (inputStream);
HSSFWorkbook workBook = new HSSFWorkbook (fileSystem);
do stuff
HSSFSheet sheet1 = workBook.getSheetAt (0);
Iterator<Row> rows = sheet1.rowIterator ();
while (rows.hasNext ())
{
Row row = rows.next ();
// do stuff
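// guard against a missing cell before reading its type
// (POI 4.x and later use CellType.NUMERIC instead of this int constant)
if (row.getCell(0) != null && row.getCell(0).getCellType() == HSSFCell.CELL_TYPE_NUMERIC)
System.out.println ("Row No.: " + row.getRowNum ()+ " " + row.getCell(0).getNumericCellValue());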
HSSFCell cell = row.createCell(0);
cell.setCellValue("100");
}
Write the output to a file
FileOutputStream fileOut1 = new FileOutputStream("D:\\book_modified.xls");
workBook.write(fileOut1);
fileOut1.close();
You may also take a look at Xylophone. This is a Java library built on top of Apache POI. It uses spreadsheet templates in XLS(X) format and consumes data in XML format.