I would like to load an Excel file into a Java program, parse it, and insert the relevant data into a database every day, but I don't want to load the whole file each time I run the program. I only need the last 90 rows. Is it possible to partially load an Excel (XLSM) file in Java (preferred, but another programming language is fine too) to reduce the loading time?
The whole run takes around 60-70 seconds, of which loading the Excel file takes 35 seconds. The file has 4000 rows, and each row has 900 columns.
try{
workbook = WorkbookFactory.create(new FileInputStream(file));
sheet = workbook.getSheetAt(0);
rowSize=sheet.getLastRowNum();
myWriter = new FileWriter("/Users/mykyusuf/Desktop/filename.txt");
Row malzeme=sheet.getRow(1);
Row kaynak=sheet.getRow(2);
Row endeks=sheet.getRow(3);
myWriter.write("insert all\n");
Row row=sheet.getRow(rowSize-1);
for (int i = 4; i < rowSize-1; i++) {
row = sheet.getRow(i);
for (Cell cell : row) {
if (cell.getColumnIndex()>3) {
myWriter.write("into piyasa_takip (tarih,malzeme,kaynak,endeks,deger) values (to_date(\'" + row.getCell(3).getLocalDateTimeCellValue().toLocalDate() + "\','YYYY-MM-DD'),\'" + malzeme.getCell(cell.getColumnIndex()) + "\',\'" + kaynak.getCell(cell.getColumnIndex()) + "\',\'" + endeks.getCell(cell.getColumnIndex()) + "\',\'" + cell + "\')\n");
}
}
}
row = sheet.getRow(rowSize-1);
for (Cell cell : row) {
if (cell.getColumnIndex()>3 ) {
myWriter.write("into piyasa_takip (tarih,malzeme,kaynak,endeks,deger) values (to_date(\'" + row.getCell(3).getLocalDateTimeCellValue().toLocalDate() + "\','YYYY-MM-DD'),\'" + malzeme.getCell(cell.getColumnIndex()) + "\',\'" + kaynak.getCell(cell.getColumnIndex()) + "\',\'" + endeks.getCell(cell.getColumnIndex()) + "\',\'" + cell + "\')\n");
}
}
myWriter.write(" Select * from DUAL\n");
myWriter.close();
}
I do not know a simple answer to your question, but I want to help you figure it out.
There are two substantially different formats: *.XLS (old) and *.XLSX (new). In general, the new format is more compact because it uses ZIP compression as part of its container.
I don't know a simple way to "cut" the last 90 rows out of an Excel file, especially since Excel has a complicated format with tabs, formulas, and hyperlinks (and scripts :-) ) in the document.
But we can use the "divide and rule" principle. If you have a big Excel file locally and it loads very slowly on a remote host, you can process the file locally (to extract only the new records into another file) and upload only these modifications to the remote host.
Thus, you divide the task into two parts: very simple processing of the large file locally (to isolate the changed part), and the normal, smarter processing on the remote host.
Maybe this will help you?
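As a rough illustration of the "extract only the new records" step, here is a minimal sketch that keeps just the last N items from any row source while holding at most N items in memory. The class and method names are hypothetical, and the sketch assumes the rows arrive through an iterator (for example from a streaming reader); it does not by itself avoid reading the file.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class TailRows {
    /**
     * Returns the last n items produced by the iterator, in their original order.
     * Only n items are held in memory at any time. Null items are not supported
     * because ArrayDeque rejects them.
     */
    public static <T> List<T> lastN(Iterator<T> rows, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        Deque<T> tail = new ArrayDeque<>(n);
        while (rows.hasNext()) {
            if (tail.size() == n) {
                tail.removeFirst(); // drop the oldest row to make room
            }
            tail.addLast(rows.next());
        }
        return new ArrayList<>(tail);
    }
}
```

With a row list already in memory, `TailRows.lastN(allRows.iterator(), 90)` would return the final 90 entries; the saving only matters when the source is read row by row.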
Maybe you can try Free Spire.XLS to solve this.
I selected some data (70 rows and 8 columns), and it took 1-2 seconds to read.
Hope it helps you save some time.
The code is below:
import com.spire.xls.Workbook;
import com.spire.xls.Worksheet;
public class GetCellRange {
public static void main(String[] args) {
//Load the sample document
Workbook workbook = new Workbook();
workbook.loadFromFile("sample.xlsx");
//Get the first worksheet
Worksheet worksheet = workbook.getWorksheets().get(0);
//Choose the output content
for (int row = 1; row <= 70 ; row++) {
for (int col = 1; col <= 8 ; col++) {
// print the cells of one row on a single line, separated by tabs
System.out.print(worksheet.getCellRange(row,col).getValue() + "\t");
}
System.out.println();
}
}
}
Related
I have a small problem that I can't solve. My Java code connects to Smartsheet and reads the list of sheets in a folder (by folder ID). It then copies only the sheets from that folder (leaving reports and dashboards behind), renames them, and pastes them into another folder. I'm using Corretto 17.
How can I change the value of a cell in a sheet from code? I only need to change one cell per run, so it doesn't need an elaborate loop. I want to understand the basics of this.
Below are two classes, showing the part after copying and renaming. Everything works perfectly until I try to change the cell. If it helps: the cell is unlocked, its value type is text_number, and it holds plain text, not a formula (other cells in the sheet do hold formulas or are locked).
I can read the value of a cell but can't set it.
public void projectFolderCreate(String projectID_Value) throws SmartsheetException {
Smartsheet smartsheet = SmartsheetFactory.createDefaultClient(API);
System.out.println("Folder working. Loading folder: " + getFolderID());
Folder workFolder = smartsheet.folderResources().getFolder(getFolderID(), null);
for (var element : workFolder.getSheets()) {
System.out.println(element.getId());
ContainerDestination containerDestination = new ContainerDestination().setDestinationType(DestinationType.FOLDER).setDestinationId(8775808614459268L).setNewName(element.getName().replace("Project_Name", projectID_Value));
Sheet sheet = smartsheet.sheetResources().copySheet(element.getId(),containerDestination,EnumSet.of(SheetCopyInclusion.DATA,SheetCopyInclusion.CELLLINKS));
if (sheet.getName().endsWith("Metadata")) {
Sheet newSheet = smartsheet.sheetResources().getSheet(sheet.getId(),null,null,null,null,null,null,null);
long newSheetId = newSheet.getId();
System.out.println("ID of the New Sheet : " + newSheetId);
Column projIDcolumn = newSheet.getColumns().get(1);
System.out.println(projIDcolumn.getTitle());
Row firstRow = newSheet.getRows().get(2);
firstRow.getCells().get(2).setValue(new CellDataItem().setObjectValue(projectID_Value));
firstRow.getColumnByIndex(projIDcolumn.getIndex());
Cell s = newSheet.getRows().get(2).getCells().get(1).setDisplayValue(projectID_Value);
}System.out.println(sheet.getName() + " was created");
}
}
ProjectFolderWorking(String projectID) throws SQLException, SmartsheetException {
String folderID;
LikeAWall law = new LikeAWall(1);
DataBase db = new DataBase(law.getPickDB());
db.setApi();
SmartIM smartIM = new SmartIM(db.getApi());
folderID = String.valueOf(SmartIM.getFolderID());
System.out.println("FolderID: " + folderID);
setProjectID(projectID);
smartIM.projectFolderCreate(getProjectID());
}
I'm sure the reference to the variable is correct, because the getters work. The setters, however, have no effect: the code compiles and runs without any exceptions or errors, but of course without any result either.
Below are the methods I tried.
smartsheet.sheetResources().getSheet(sheet.getId(),null,null,null,null,null,null,null).getRows().get(2).getCells().set(1,null);
smartsheet.sheetResources().getSheet(sheet.getId(),null,null,null,null,null,null,null).getRows().get(2).getCells().get(1).setDisplayValue(projectID_Value)
smartsheet.sheetResources().getSheet(sheet.getId(),null,null,null,null,null,null,null).getRows().get(2).getCells().get(1).setValue(projectID_Value.toString) / and getBytes()
smartsheet.sheetResources().getSheet(sheet.getId(),null,null,null,null,null,null,null).getRows().get(2).getCells().get(1).setObjectValue()
This didn't make sense, but I was out of ideas. I also tried each of the above split into parts, like:
Sheet sh = lallalalla;
Column s = sh.lalallalala;
Rows w = sh.lalallala;
Cell = w.getCells().get(lalallaa);
I want to add SQL query results to a generated Excel file and then average some columns at the bottom. My question is:
The number of rows changes between queries. How can I place the average row directly after the last data row, whatever the row count is?
example
after writing to excel
I'm currently learning to write PostgreSQL query results into an Excel file created with Apache POI.
Resource resource = resourceLoader.getResource("classpath:/temps/" + querySelected.getTemplateName());
workbook = new XSSFWorkbook(resource.getInputStream());
XSSFSheet sheetTable1 = workbook.getSheet("Table1");
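int rowCounter = 1; // row index 0 holds the header
for (var knz : tname) { // tname: the collection of query results
// create one row per result, then fill its first cell
Row values = sheetTable1.createRow(rowCounter);
rowCounter++;
Cell cell = values.createCell(0, CellType.NUMERIC);
cell.setCellValue(knz.gettablename().doubleValue());
}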
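One possible way to make the average follow the data is to build the formula from the row counter once the loop has finished. The sketch below is an assumption-laden illustration, not the asker's code: the class and method names are hypothetical, and it assumes the data rows are contiguous and that the counter ends up pointing at the first free row.

```java
public class AverageFormula {
    /**
     * Builds an AVERAGE formula covering the data rows of one column.
     *
     * @param columnLetter      Excel column letter, e.g. "A"
     * @param firstDataRowIndex 0-based index of the first data row
     * @param nextFreeRowIndex  0-based index of the first row after the data
     */
    public static String averageFormula(String columnLetter, int firstDataRowIndex, int nextFreeRowIndex) {
        if (nextFreeRowIndex <= firstDataRowIndex) {
            throw new IllegalArgumentException("no data rows to average");
        }
        int firstExcelRow = firstDataRowIndex + 1; // Excel row numbers are 1-based
        int lastExcelRow = nextFreeRowIndex;       // 0-based next free index == 1-based last data row
        return "AVERAGE(" + columnLetter + firstExcelRow + ":" + columnLetter + lastExcelRow + ")";
    }
}
```

With 15 data rows starting at index 1, the counter ends at 16 and the result is `AVERAGE(A2:A16)`. In Apache POI the string would then go into a cell of a row created at index `rowCounter` via `setCellFormula`, which expects the formula without a leading `=`.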
I have code that imports 15,000 rows from Excel with Java Spring. It takes around 10 minutes in the production environment but only around 5 minutes in the development environment. How can I improve the performance? Here's my code.
Code flow:
Check that each Excel row is clean enough to save
Save to the database one by one
Start of the row check:
Cell currentCell = cellsInRow.next();
String uuidAsString = uuid.toString();
Date today = Calendar.getInstance().getTime();
if(cellIndex==0) {
ble.setA(currentCell.getStringCellValue());
} else if(cellIndex==1) {
ble.setB(currentCell.getStringCellValue());
} else if(cellIndex==2) {
ble.setC(currentCell.getDateCellValue());
}
After the check:
blacklistExternalRepository.saveAll(lstBlacklistExternal);
The code posted is not complete, so here is an idea based on the following assumptions:
the variable today only needs to be calculated once per batch import
the Excel document has a regular format, that is, each row has at least three cells at indices 0, 1 and 2
With this in mind, you could do something like:
LocalDate today = LocalDate.now();
List<BLE> bleList = new ArrayList<>(); // a list of ble objects
for (Row r : rows) {
Iterator<Cell> cellsInRow = ... // get the cells in row r
BLE ble = new BLE();
ble.setA(cellsInRow.next().getStringCellValue());
ble.setB(cellsInRow.next().getStringCellValue());
ble.setC(cellsInRow.next().getDateCellValue());
bleList.add(ble);
}
// do whatever you need to with the list of ble objects
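If the list is large, one option (not part of the original suggestion) is to hand it to the repository in fixed-size chunks instead of row by row or all at once. The sketch below only shows the chunking; the class and method names are hypothetical, and whether it actually speeds up the import depends on the persistence setup, for example on JDBC batching being enabled.

```java
import java.util.ArrayList;
import java.util.List;

public class Batches {
    /** Splits a list into consecutive chunks of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // copy the view so each batch is independent of the source list
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Each chunk returned by `Batches.partition(bleList, 500)` could then be passed to `saveAll` in turn; the batch size of 500 is an arbitrary starting point to tune.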
I am writing an application which needs to load a large csv file that is pure data and doesn't contain any headers.
I am using the FastCSV library to parse the file, but the data needs to be stored and specific fields need to be retrieved. Since not all of the data is needed, I am skipping every third line.
Is there a way to set the headers after the file has been parsed and save it in a data structure such as an ArrayList?
Here is the function which loads the file:
public void fastCsv(String filePath) {
File file = new File(filePath);
CsvReader csvReader = new CsvReader();
int linecounter = 1;
long startTime = System.currentTimeMillis(); // used to report the elapsed time below
try (CsvParser csvParser = csvReader.parse(file, StandardCharsets.UTF_8)) {
CsvRow row;
while ((row = csvParser.nextRow()) != null) {
if ((linecounter % 3) > 0 ) {
// System.out.println("Read line: " + row);
//System.out.println("First column of line: " + row.getField(0));
System.out.println(row);
}
linecounter ++;
}
long elapsedTime = System.currentTimeMillis() - startTime;
System.out.println("Execution Time in ms: " + elapsedTime);
// no explicit close() needed: try-with-resources closes the parser
} catch (IOException e) {
e.printStackTrace();
}
}
Any insight would be greatly appreciated.
univocity-parsers supports field selection and can do this very easily. It's also faster than the library you are using.
Here's how you can use it to select columns of interest:
Input
String input = "X, X2, Symbol, Date, Open, High, Low, Close, Volume\n" +
" 5, 9, AAPL, 01-Jan-2015, 110.38, 110.38, 110.38, 110.38, 0\n" +
" 2710, 289, AAPL, 01-Jan-2015, 110.38, 110.38, 110.38, 110.38, 0\n" +
" 5415, 6500, AAPL, 02-Jan-2015, 111.39, 111.44, 107.35, 109.33, 53204600";
Configure
CsvParserSettings settings = new CsvParserSettings(); //many options here, check the tutorial
settings.setHeaderExtractionEnabled(true); //tells the parser to use the first row as the header row
settings.selectFields("X", "X2"); //selects the fields
Parse and print results
CsvParser parser = new CsvParser(settings);
for(String[] row : parser.iterate(new StringReader(input))){
System.out.println(Arrays.toString(row));
}
Output
[5, 9]
[2710, 289]
[5415, 6500]
On the field selection, you can use any sequence of fields, and have rows with different column sizes, and the parser will handle this just fine. No need to write complex logic to handle that.
To process the file in your code, change the example above to do this:
for(String[] row : parser.iterate(new File(filePath))){
... //your logic goes here.
}
If you want a more usable record (with typed values), use this instead:
for(Record record : parser.iterateRecords(new File(filePath))){
... //your logic goes here.
}
Speeding up
The fastest way of processing the file is through a RowProcessor. That's a callback that receives the rows parsed from the input:
settings.setProcessor(new AbstractRowProcessor() {
@Override
public void rowProcessed(String[] row, ParsingContext context) {
System.out.println(Arrays.toString(row));
context.skipLines(3); //use the context object to control the parser
}
});
CsvParser parser = new CsvParser(settings);
//`parse` doesn't return anything. Rows go to the `rowProcessed` method.
parser.parse(new StringReader(input));
You should be able to parse very large files pretty quickly. If things are slowing down, look in your own code: avoid adding values to lists or collections in memory, or at least pre-allocate the collections to a good size, and give the JVM a large amount of memory to work with using the -Xms and -Xmx flags.
As far as I know, this parser is currently the fastest you can find. I made a performance comparison a while ago that you can use for reference.
Hope this helps
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license)
Do you know which fields/columns you want to keep, and what you'd like the "header" values to be? For example, do you want the first and third columns, called "first" and "third"? If so, you could build a HashMap of strings/objects (or another appropriate type, depending on your actual data and needs) and add the HashMap to an ArrayList. This should get you going; just be sure to change the HashMap types as needed.
ArrayList<HashMap<String,String>> arr=new ArrayList<>();
while ((row = csvParser.nextRow()) != null) {
if ((linecounter % 3) > 0 ) {
// System.out.println("Read line: " + row);
//System.out.println("First column of line: " + row.getField(0));
// keep col1 and col3
// create a fresh map per row: reusing a single map would make every list entry point to the same (last) row
HashMap<String,String> hm=new HashMap<>();
hm.put("first",row.getField(0));
hm.put("third",row.getField(2));
arr.add(hm);
}
linecounter ++;
}
If you want to capture all columns, you can use a similar technique but I'd build a mapping data structure so that you can match field indexes to column header names in a loop to add each column to the HashMap that is then stored in the ArrayList
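A minimal sketch of that mapping idea, assuming the header names are supplied as an array and the field values have been copied out of each row into an array as well (for example with the row.getField(i) calls used above). The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HeaderMapper {
    /**
     * Pairs each header name with the field at the same index.
     * Extra headers or extra fields beyond the shorter array are ignored.
     * LinkedHashMap keeps the columns in header order.
     */
    public static Map<String, String> toRecord(String[] headers, String[] fields) {
        Map<String, String> record = new LinkedHashMap<>();
        int count = Math.min(headers.length, fields.length);
        for (int i = 0; i < count; i++) {
            record.put(headers[i], fields[i]);
        }
        return record;
    }
}
```

Each map returned by `toRecord` can be added to the ArrayList directly, since a new map is created per call.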
I have a spreadsheet with a lot of formulas in it and several tabs. One of the tabs is for Input of numbers into 10 fields. Another tab is for viewing the output of calculated formulas.
Using Apache POI, I have opened the spreadsheet and input my numbers. The problem comes when I try to evaluate the spreadsheet.
I've tried
FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();
helper.createFormulaEvaluator();
evaluator.evaluateAll();
And I get an error (that nobody seems to have an answer for): Unexpected arg eval type (org.apache.poi.ss.formula.eval.MissingArgEval)] with root cause
So I've changed to evaluating cells individually so I could find which cell has the error, so my code looks like this:
FormulaEvaluator evaluator = this.workbook.getCreationHelper().createFormulaEvaluator();
for (Sheet sheet : this.workbook) {
System.out.println("Evaluating next sheet");
System.out.println(sheet.getSheetName());
for (Row r : sheet) {
System.out.println("Row Number:");
System.out.println(r.getRowNum());
for (Cell c : r) {
if (c.getCellType() == Cell.CELL_TYPE_FORMULA) {
System.out.println(c.getColumnIndex());
try {
evaluator.evaluateFormulaCell(c);
} catch (Exception e) {
rowArray.add(r.getRowNum());
cellArray.add(c.getColumnIndex());
System.out.println("Skipping failed cell");
}
}
}
}
And I'm getting the same error as when I run evaluateAll.
By putting the little bit of debugging in there, I found that the error is coming from Cell L3, which contains formula: =D5. Since the evaluator goes by row:column, it evaluates everything on row 3 first before getting to 5, so L3 references a field that has not been evaluated yet, and therefore throws an error.
I tried catching the errors and storing the row and cell number in an array, then after everything in a sheet is processed, attempt to reprocess the unprocessed cells, but I still get the same result. I'm a bit perplexed why the retry didn't work.
Retry code:
// try to fix any failed evaluations here
Iterator cellItr = cellArray.iterator();
Iterator rowItr = rowArray.iterator();
while (cellItr.hasNext()) {
Integer cellElement = (int) cellItr.next();
Integer rowElement = (int) rowItr.next();
XSSFRow row = sheet.getRow(rowElement);
XSSFCell cell = row.getCell(cellElement);
System.out.println("Re-evaluating: " + rowElement + " : " + cellElement);
evaluator.evaluateFormulaCell(cell);
}
}
The retry code gave the same result.
I tried changing the original evaluator to use evaluateInCell to change the formula to an actual number, but that didn't seem to help.
----------------- UPDATE ---------------------
I just realized that evaluateFormulaCell is deprecated in favor of evaluateFormulaCellEnum. I put all of the code into a function, ran the function multiple times, and realized it was evaluating all of the cells over and over again. So I switched to evaluateInCell and found that it only evaluates each cell once, but I still can't get past the cells mentioned.
Here is my updated code, which I have inside a function that I run 5 times:
for (Sheet sheet : this.workbook) {
System.out.println("Evaluating next sheet" + sheet.getSheetName());
for (Row r : sheet) {
for (Cell c : r) {
if (c.getCellType() == Cell.CELL_TYPE_FORMULA) {
System.out.println("Cell index: " + r.getRowNum() + " - " + c.getColumnIndex());
try {
evaluator.evaluateInCell(c);
} catch (Exception e) {
try {
evaluator.evaluateFormulaCellEnum(c);
} catch (Exception ee) {
System.out.println("Skipping failed cell after 2 attempts");
}
}
}
}
}
With the debugging I have in place, I was able to see which cells in the spreadsheet were failing, so I saved the formulas from the failing cells in a text document and replaced the formulas with their values, then recompiled the code and the spreadsheet actually evaluated!
Then I went through all of the cells and put their formulas back two by two until it broke again. It turned out to be a case I already knew about, but searching a spreadsheet for it is no piece of cake.
This was the formula with the issue: =ROUNDUP('HW page'!$H$53*'HW page'!$H$54,)
I added a 0 as the last parameter so it looks like this: =ROUNDUP('HW assumptions'!$H$53*'HW assumptions'!$H$54,0), then the evaluator works.
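To make that search less painful, a rough check for formulas with an empty argument slot could look like the sketch below. It is a naive, regex-based illustration with hypothetical names: it will also flag text such as ",)" inside string literals or quoted sheet names, so treat any hit as a candidate to inspect rather than a definite error.

```java
import java.util.regex.Pattern;

public class EmptyArgFinder {
    // An empty slot is a comma followed (ignoring spaces) by another comma or ")",
    // or an opening parenthesis followed directly by a comma.
    // "()" alone is not flagged, since no-argument functions such as NOW() are valid.
    private static final Pattern EMPTY_ARG = Pattern.compile(",\\s*[,)]|\\(\\s*,");

    /** Returns true if the formula text appears to contain an empty argument. */
    public static boolean hasEmptyArgument(String formula) {
        return formula != null && EMPTY_ARG.matcher(formula).find();
    }
}
```

Inside the cell loop, the string to test would come from `c.getCellFormula()` for each formula cell, and the sheet name, row, and column of every hit could be printed for manual review.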