I have a large xlsx file (74 MB) and have found a way to read it in. Here is my source code so far:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Iterator;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
private static void readXLSX(String path) throws IOException {
File myFile = new File(path);
FileInputStream fis = new FileInputStream(myFile);
// Finds the workbook instance for XLSX file
XSSFWorkbook myWorkBook = new XSSFWorkbook (fis);
// Return first sheet from the XLSX workbook
XSSFSheet mySheet = myWorkBook.getSheetAt(0);
// Get iterator to all the rows in current sheet
Iterator<Row> rowIterator = mySheet.iterator();
// Traversing over each row of XLSX file
while (rowIterator.hasNext()) {
Row row = rowIterator.next();
// For each row, iterate through each columns
Iterator<Cell> cellIterator = row.cellIterator();
while (cellIterator.hasNext()) {
Cell cell = cellIterator.next();
switch (cell.getCellType()) {
case Cell.CELL_TYPE_STRING:
System.out.print(cell.getStringCellValue() + "\t");
break;
case Cell.CELL_TYPE_NUMERIC:
System.out.print(cell.getNumericCellValue() + "\t");
break;
case Cell.CELL_TYPE_BOOLEAN:
System.out.print(cell.getBooleanCellValue() + "\t");
break;
default :
}
}
System.out.println("");
}
}
The problem is that my 8 GB of RAM doesn't seem to be sufficient, even with swapping enabled and the JVM heap size increased:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
Do you have any idea why this code is so inefficient? Or do you know how to read this file sequentially and buffer the temporary rows in a less memory-consuming way?
Thanks in advance
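The XSSF usermodel of POI keeps the whole workbook in memory and is known to cause memory issues. You can use a streaming alternative instead; this will ensure you won't run out of memory.
In short, use this alternative:
SXSSFWorkbook instead of XSSFWorkbook when writing; for reading, the streaming counterpart is the SAX-based XSSF event API, because SXSSFWorkbook is write-only.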
API details here
Related
This code is able to read small Excel files, but it fails on large ones. How should I modify the code further?
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.sql.SQLException;
import java.util.Iterator;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
/**
*
* @author Administrator
*/
public class ReadExcelNdArray {
public static void main(String[] args) throws Exception {
long start = System.currentTimeMillis();
System.out.println("Time taken: " + (System.currentTimeMillis() - start) + " ms");
File myFile = new File("D://Raghulpr/Transaction Data.xlsx");
FileInputStream fis = new FileInputStream(myFile);
// Finds the workbook instance for XLSX file
XSSFWorkbook myWorkBook = new XSSFWorkbook (fis);
// Return first sheet from the XLSX workbook
XSSFSheet mySheet = myWorkBook.getSheetAt(0);
// Get iterator to all the rows in current sheet
Iterator<Row> rowIterator = mySheet.iterator();
// Traversing over each row of XLSX file
while (rowIterator.hasNext()) {
Row row = rowIterator.next();
// For each row, iterate through each columns
Iterator<Cell> cellIterator = row.cellIterator();
while (cellIterator.hasNext()) {
Cell cell = cellIterator.next();
switch (cell.getCellType()) {
case Cell.CELL_TYPE_STRING:
System.out.print(cell.getStringCellValue() + "\t");
break;
case Cell.CELL_TYPE_NUMERIC:
System.out.print(cell.getNumericCellValue() + "\t");
break;
case Cell.CELL_TYPE_BOOLEAN:
System.out.print(cell.getBooleanCellValue() + "\t");
break;
default :
}
}
System.out.println("");
}
}
}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.io.ByteArrayOutputStream.<init>(ByteArrayOutputStream.java:77)
at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource$FakeZipEntry.<init>(ZipInputStreamZipEntrySource.java:121)
at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:55)
at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:88)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:272)
at org.apache.poi.util.PackageHelper.open(PackageHelper.java:37)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:254)
at readexcelndarray.ReadExcelNdArray.main(ReadExcelNdArray.java:36)
I don't know if you still need an answer to this, but I was searching for the same thing and was struggling to read a large file. After spending a lot of time searching the internet, I found one solution. You can check out
Excel Streaming Reader
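import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.poi.ss.usermodel.Workbook;
import com.monitorjbl.xlsx.StreamingReader;
InputStream is = new FileInputStream(new File("G:\\Book1.xlsx"));
Workbook workbook = StreamingReader.builder()
.rowCacheSize(100) // number of rows to keep in memory
.bufferSize(4096) // buffer size used when reading the InputStream
.open(is);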
Now you can use workbook to process your file further.
I was able to process an xlsx file with more than 400,000 (4 lakh) records.
Firstly, you need to close all input and output stream objects, such as FileInputStream, in your code. Secondly, you can also increase your JVM heap space as mentioned in this link: Increase heap size in Java
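Both points can be sketched with the JDK alone. This is only an illustration: the class and method names (StreamHygiene, countBytes, maxHeapMiB) are made up for this example and are not part of POI. try-with-resources closes the stream even when reading fails, and Runtime.maxMemory() lets you confirm that a larger heap setting actually reached the JVM.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class StreamHygiene {

    // Reads the file in small chunks and returns the number of bytes seen.
    // try-with-resources closes the stream even if read() throws.
    static long countBytes(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
            return total;
        }
    }

    // Maximum heap the JVM will try to use, in MiB.
    // Handy for checking that an -Xmx setting was really applied.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }
}
```

If the program is started with, for example, java -Xmx4g, maxHeapMiB() should report a value close to 4096.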
There is the jxl API for reading and writing Excel files. Its limitation is that it only handles the old .xls format, so you can read and write at most 65,536 rows (row indices 0 to 65535). But it's really flexible.
Since the number of rows in your case exceeds that, I would suggest you prefer Apache POI, which also reads .xlsx files with up to 1,048,576 rows per sheet.
You need to increase the heap size in order to read large files. I suggest using a 64-bit machine.
I've had the same problem. If you switch to the much lower-level SAX parsing instead, you will save a lot of memory: http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api
I think I reduced memory usage from about 4.5 GB(!) (for a file of about 11 MB with a lot of formulas) down to something more manageable (I don't remember exactly, but it was so low it didn't matter anymore; it was reduced by at least a factor of 10).
It is harder to implement, but worth the time if you need to reduce the memory footprint.
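To illustrate why SAX parsing needs so little memory, here is a minimal sketch that uses only the JDK's SAX parser on worksheet XML. It is not POI's API: SheetSaxSketch, RowHandler and parse are hypothetical names, and the sketch assumes the stream already contains the XML of one sheet. It passes raw cell values through, so string cells would still show up as indexes into the shared-strings table.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class SheetSaxSketch {

    // Gets one callback per XML element; only the current row is held in memory.
    static final class RowHandler extends DefaultHandler {
        private final Consumer<List<String>> rowSink;
        private final List<String> cells = new ArrayList<>();
        private final StringBuilder value = new StringBuilder();
        private boolean inValue;

        RowHandler(Consumer<List<String>> rowSink) {
            this.rowSink = rowSink;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("v".equals(qName)) {          // <v> holds the raw cell value
                inValue = true;
                value.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inValue) {
                value.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("v".equals(qName)) {
                inValue = false;
                cells.add(value.toString());
            } else if ("row".equals(qName)) { // row finished: hand it over and forget it
                rowSink.accept(new ArrayList<>(cells));
                cells.clear();
            }
        }
    }

    // Parses one worksheet XML stream and pushes each row to rowSink as it completes.
    static void parse(InputStream sheetXml, Consumer<List<String>> rowSink) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(sheetXml, new RowHandler(rowSink));
    }
}
```

Because each row is handed to rowSink and then discarded, memory use stays roughly constant regardless of how many rows the sheet has.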
I am trying to convert an xlsx file to a csv file using the code below:
import java.io.*;
import java.util.Iterator;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
public class XLStoCSVConvert {
static void xlsx(File inputFile, File outputFile) {
// For storing data into CSV files
StringBuffer data = new StringBuffer();
try {
FileOutputStream fos = new FileOutputStream(outputFile);
// Get the workbook object for XLSX file
System.out.println("working......1");
XSSFWorkbook wBook = new XSSFWorkbook(new FileInputStream(inputFile));
// Get first sheet from the workbook
System.out.println("working......2");
XSSFSheet sheet = wBook.getSheetAt(0);
Row row;
Cell cell;
// Iterate through each rows from first sheet
Iterator<Row> rowIterator = sheet.iterator();
while (rowIterator.hasNext()) {
row = rowIterator.next();
// For each row, iterate through each columns
Iterator<Cell> cellIterator = row.cellIterator();
while (cellIterator.hasNext()) {
cell = cellIterator.next();
switch (cell.getCellType()) {
case Cell.CELL_TYPE_BOOLEAN:
data.append(cell.getBooleanCellValue() + ",");
break;
case Cell.CELL_TYPE_NUMERIC:
data.append(cell.getNumericCellValue() + ",");
break;
case Cell.CELL_TYPE_STRING:
data.append(cell.getStringCellValue() + ",");
break;
case Cell.CELL_TYPE_BLANK:
data.append("" + ",");
break;
default:
data.append(cell + ",");
}
}
}
fos.write(data.toString().getBytes());
fos.close();
} catch (Exception ioe) {
ioe.printStackTrace();
}
}
public static void main(String[] args) {
File inputFile = new File("/home/raptorjd4/Desktop/ToConsult.xlsx");
//writing excel data to csv
File outputFile = new File("/home/raptorjd4/Desktop/RaptorTrackingSystem/ToConsult.csv");
xlsx(inputFile, outputFile);
}
}
But I am getting this output:
working......1
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/poi/UnsupportedFileFormatException
My jars in the lib folder:
poi-3.5-FINAL.jar
poi-ooxml-3.11.jar
Why am I getting this error when I have put all the needed jar files in the lib folder?
Where am I making a mistake?
For me your code worked just fine with the dependencies I added.
The problem is either with the data in your Excel file or with a missing dependency.
Ensure that you have all the necessary dependencies on your classpath.
Your problem is this part:
poi-3.5-FINAL.jar
poi-ooxml-3.11.jar
As explained in this Apache POI FAQ entry:
Can I mix POI jars from different versions?
No. This is not supported.
All POI jars in use must come from the same version. A combination such as poi-3.11.jar and poi-ooxml-3.9.jar is not supported, and will fail to work in unpredictable ways.
You must ensure that all of your Apache POI jars come from the same version!
Switch your jars to be from the same version, ideally the latest, and you should be good.
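If you are unsure which jar a class is really loaded from, a small diagnostic along these lines can help. It is a sketch using only JDK calls; JarLocator and locationOf are names made up for this example. The two probed classes are ones that appear in the code and stack traces in this thread, one from each POI jar.

```java
import java.net.URL;

final class JarLocator {

    // Returns the URL the class file would be loaded from,
    // or a marker string if it is not on the classpath at all.
    static String locationOf(String className) {
        String resource = className.replace('.', '/') + ".class";
        URL url = ClassLoader.getSystemResource(resource);
        return url == null ? "(not on classpath)" : url.toString();
    }

    public static void main(String[] args) {
        String[] probes = {
            "org.apache.poi.ss.usermodel.Cell",           // lives in the main poi jar
            "org.apache.poi.xssf.usermodel.XSSFWorkbook"  // lives in the poi-ooxml jar
        };
        for (String name : probes) {
            System.out.println(name + " -> " + locationOf(name));
        }
    }
}
```

When the printed jar names carry different version numbers, the classpath mixes POI versions, which is the situation the FAQ entry warns about.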
I solved this problem by adding the jar files below:
poi-3.9.jar
poi-ooxml-3.9.jar
poi-ooxml-schemas-3.9-20121203.jar
xmlbeans-2.3.0.jar
dom4j-1.6.1.jar
I found plenty of solutions for converting an XLSX file to CSV using Java, and all of them use XSSFWorkbook. The problem I am facing is probably that the stream holds too much data. I just don't get why, since the file is only 4 MB.
CODE:
// For storing data into CSV files
StringBuffer data = new StringBuffer();
try {
FileOutputStream fos = new FileOutputStream(outputFile);
System.out.println("Getting input stream.");
// Get the workbook object for XLS file
XSSFWorkbook workbook = new XSSFWorkbook(new FileInputStream(inputFile));
System.out.println(" - Done");
// Get first sheet from the workbook
XSSFSheet sheet = workbook.getSheetAt(0);
Cell cell;
Row row;
// Iterate through each rows from first sheet
Iterator<Row> rowIterator = sheet.iterator();
System.out.println(" - Reading xlsx rows.");
while (rowIterator.hasNext()) {
i++;
row = rowIterator.next();
// For each row, iterate through each columns
Iterator<Cell> cellIterator = row.cellIterator();
while (cellIterator.hasNext()) {
cell = cellIterator.next();
switch (cell.getCellType()) {
case Cell.CELL_TYPE_BOOLEAN:
data.append(cell.getBooleanCellValue() + ";");
break;
case Cell.CELL_TYPE_NUMERIC:
data.append(cell.getNumericCellValue() + ";");
break;
case Cell.CELL_TYPE_STRING:
data.append(cell.getStringCellValue() + ";");
break;
case Cell.CELL_TYPE_BLANK:
data.append("" + ";");
break;
default:
data.append(cell + ";");
}
}
data.append('\n');
int limit = 10000;
if ((i % limit) == 0) {
System.out.println(" - Writing " + limit + " data.");
fos.write(data.toString().getBytes());
fos.flush();
data = null;
data = new StringBuffer();
System.out.println(" - Data written.");
}
}
fos.write(data.toString().getBytes());
fos.flush();
fos.close();
The error points to the line in the switch statement where I append something to data (the StringBuffer), but I reset it every 10,000 rows, so it shouldn't be an issue.
You may not be able to use SXSSFWorkbook (as it's write-only), but you may be able to convert your program to a streaming style using the SAX-based API. Edit: Another thing you may want to try is to create the XSSFWorkbook from a File instead of an InputStream (I remember reading somewhere that the File-based code needs less memory).
(First try was:
Since you are reading data sequentially the SXSSFWorkbook class should be just the thing you need.)
The xlsx format is just a zip archive containing, among other parts, the sheet content XML and a shared-strings XML. Hence a file that is 4 MB compressed may well be very large uncompressed.
Using a zip file system, you could load the shared strings into memory and then read the content XML sequentially, writing the output immediately.
Since only two inner files are involved, you might use Java's zip file system. Tedious but not difficult.
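The approach described above might look roughly like the following sketch, which uses only the JDK (zip file system plus StAX). Several things are assumptions: the class and method names are invented, the first sheet is assumed to live at /xl/worksheets/sheet1.xml (the real name is defined in the workbook relationships), and inline strings, phonetic text, dates and CSV escaping are not handled.

```java
import java.io.InputStream;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

final class XlsxZipSketch {

    private static XMLStreamReader open(InputStream in) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // the xlsx parts need no DTD
        return factory.createXMLStreamReader(in);
    }

    // Loads every <si> entry of the shared-strings part; text runs inside one entry are joined.
    static List<String> readSharedStrings(Path part) throws Exception {
        List<String> strings = new ArrayList<>();
        if (!Files.exists(part)) return strings;   // workbooks without strings have no such part
        try (InputStream in = Files.newInputStream(part)) {
            XMLStreamReader xml = open(in);
            StringBuilder current = new StringBuilder();
            boolean inText = false;
            while (xml.hasNext()) {
                int event = xml.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if ("si".equals(xml.getLocalName())) current.setLength(0);
                    else if ("t".equals(xml.getLocalName())) inText = true;
                } else if (event == XMLStreamConstants.CHARACTERS && inText) {
                    current.append(xml.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if ("t".equals(xml.getLocalName())) inText = false;
                    else if ("si".equals(xml.getLocalName())) strings.add(current.toString());
                }
            }
        }
        return strings;
    }

    // Streams the sheet XML and writes one semicolon-separated line per row, as it is read.
    static void convert(Path xlsx, Appendable out) throws Exception {
        try (FileSystem zip = FileSystems.newFileSystem(xlsx, (ClassLoader) null)) {
            List<String> shared = readSharedStrings(zip.getPath("/xl/sharedStrings.xml"));
            // Assumption: the first sheet is stored under this conventional name.
            try (InputStream in = Files.newInputStream(zip.getPath("/xl/worksheets/sheet1.xml"))) {
                XMLStreamReader xml = open(in);
                String cellType = null;
                boolean inValue = false;
                StringBuilder value = new StringBuilder();
                while (xml.hasNext()) {
                    int event = xml.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        String name = xml.getLocalName();
                        if ("c".equals(name)) cellType = xml.getAttributeValue(null, "t");
                        else if ("v".equals(name)) { inValue = true; value.setLength(0); }
                    } else if (event == XMLStreamConstants.CHARACTERS && inValue) {
                        value.append(xml.getText());
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        String name = xml.getLocalName();
                        if ("v".equals(name)) {
                            inValue = false;
                            String raw = value.toString();
                            // t="s" means the value is an index into the shared strings
                            out.append("s".equals(cellType) ? shared.get(Integer.parseInt(raw)) : raw);
                            out.append(';');
                        } else if ("row".equals(name)) {
                            out.append('\n');
                        }
                    }
                }
            }
        }
    }
}
```

Passing a BufferedWriter as the Appendable keeps only the shared strings in memory, while the sheet itself is never materialized as a whole.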
Try this code; it works perfectly for me, and I hope it works for you too.
package com.converting;
import java.io.FileInputStream;
import java.io.*;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import com.opencsv.CSVWriter;
import java.util.Iterator;
import java.io.FileWriter;
public class XlsxtoCSV {
public static void main(String[] args) throws Exception{
FileInputStream input_document = new FileInputStream(new File("/home/blackpearl/Downloads/aa.xlsx"));
XSSFWorkbook my_xls_workbook = new XSSFWorkbook(input_document);
XSSFSheet my_worksheet = my_xls_workbook.getSheetAt(0);
Iterator<Row> rowIterator = my_worksheet.iterator();
FileWriter my_csv=new FileWriter("/home/blackpearl/Downloads/Newaa.csv");
CSVWriter my_csv_output=new CSVWriter(my_csv);
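// DataFormatter renders any cell type as the text Excel would display
DataFormatter formatter = new DataFormatter();
while(rowIterator.hasNext()) {
Row row = rowIterator.next();
// Size the array from the row itself instead of assuming at most 20 columns
String[] csvdata = new String[Math.max(row.getLastCellNum(), 0)];
Iterator<Cell> cellIterator = row.cellIterator();
while(cellIterator.hasNext()) {
Cell cell = cellIterator.next(); //Fetch CELL
// Use the real column index so that skipped blank cells do not shift later values
csvdata[cell.getColumnIndex()] = formatter.formatCellValue(cell);
}
my_csv_output.writeNext(csvdata);
}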
System.out.println("file imported");
my_csv_output.close(); //close the CSV file
input_document.close(); //close xlsx file
}
}
I am using a servlet and trying to read a user-uploaded Excel file and insert its contents into a database.
My Excel file is in this format:
ID IP1 IP2 USER TKTNO (these are the headings in the Excel file and in the database table as well)
Under those headings I have data in the Excel file, which I have to read and insert into the database.
I desperately need help. Thank you.
I am using docx4j for this purpose; it works well with both docx and xlsx.
http://www.docx4java.org/trac/docx4j
This is how you read an Excel file using the Apache POI library. I guess this is good enough for starters; from here you can collect the cell values in collection objects and store them in the database as required.
package com.Excel;
import java.io.*;
import java.util.Iterator;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
public class ReadExcelFile {
public static void main(String[] args)
{
try {
FileInputStream file = new FileInputStream(new File("C:/Users/hussain.a/Desktop/mar_25/Tradestation_Q4 Dashboard_Week 5_1029-1104.xlsx"));
XSSFWorkbook workbook = new XSSFWorkbook(file);
XSSFSheet sheet = workbook.getSheetAt(0);
Iterator<Row> rowIterator = sheet.iterator();
rowIterator.next();
while(rowIterator.hasNext())
{
Row row = rowIterator.next();
//For each row, iterate through each columns
Iterator<Cell> cellIterator = row.cellIterator();
while(cellIterator.hasNext())
{
Cell cell = cellIterator.next();
switch(cell.getCellType())
{
case Cell.CELL_TYPE_BOOLEAN:
System.out.println("boolean===>>>"+cell.getBooleanCellValue() + "\t");
break;
case Cell.CELL_TYPE_NUMERIC:
System.out.println("numeric===>>>"+cell.getNumericCellValue() + "\t");
break;
case Cell.CELL_TYPE_STRING:
System.out.println("String===>>>"+cell.getStringCellValue() + "\t");
break;
}
}
System.out.println("");
}
file.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}
I tried to run this code in Eclipse, but I get this error: "Selection does not contain a main type".
Does anyone know how to fix it? I am a newbie in Java and I need help!
The program I am trying to write reads an Excel file using POI! :)
import java.io.File;
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
public class sample2 {
private void sample2(test)
FileInputStream file = new FileInputStream(new File("C:\\test.xls"));
//Get the workbook instance for XLS file
HSSFWorkbook workbook = new HSSFWorkbook(test);
//Get first sheet from the workbook
HSSFSheet sheet = workbook.getSheetAt(0);
//Get iterator to all the rows in current sheet
Iterator<Row> rowIterator = sheet.iterator();
//Get iterator to all cells of current row
Iterator<Cell> cellIterator = row.cellIterator();
try {
FileInputStream file = new FileInputStream(new File("C:\\test.xls"));
//Get the workbook instance for XLS file
HSSFWorkbook workbook = new HSSFWorkbook(file);
//Get first sheet from the workbook
HSSFSheet sheet = workbook.getSheetAt(0);
//Iterate through each rows from first sheet
Iterator<Row> rowIterator = sheet.iterator();
while(rowIterator.hasNext()) {
Row row = rowIterator.next();
//For each row, iterate through each columns
Iterator<Cell> cellIterator = row.cellIterator();
while(cellIterator.hasNext()) {
Cell cell = cellIterator.next();
switch(cell.getCellType()) {
case Cell.CELL_TYPE_BOOLEAN:
System.out.print(cell.getBooleanCellValue() + "\t\t");
break;
case Cell.CELL_TYPE_NUMERIC:
System.out.print(cell.getNumericCellValue() + "\t\t");
break;
case Cell.CELL_TYPE_STRING:
System.out.print(cell.getStringCellValue() + "\t\t");
break;
}
}
System.out.println("");
}
file.close();
FileOutputStream out =
new FileOutputStream(new File("C:\\test.xls"));
workbook.write(out);
out.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
You cannot run a Java application without a main method.
You need something like the following:
public static void main(String[] args) {
sample2 s = new sample2();
s.sample2();
}
Also, your code contains a lot of errors:
A main method is missing.
The capitalization is wrong.
The input argument of the sample2 method has no type (String test?).
The code is broken in many ways: the file-reading code is duplicated, the error handling is off, etc.
Reading a good tutorial on Java would help greatly here. A great tutorial on Java and Excel can be found here; pay particular attention to the main method, which is the entry point of your Java application.
Your code will not compile because of the identifier "test" in the sample2 method. Remove it; then, to run the program,
just add the following method:
public static void main(String[] args) {
new sample2().sample2();
}
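Putting the advice from both answers together, the overall shape of a runnable class might look like this sketch. The POI reading loop from the question is left out on purpose; describe is a hypothetical stand-in for it, and the default path is the one used in the question.

```java
final class Sample2 {   // class names conventionally start with a capital letter

    // The parameter now has a type. The POI reading loop from the question
    // would replace this placeholder body.
    String describe(String path) {
        return "would read " + path;
    }

    // JVM entry point. Without a method with this signature, Eclipse reports
    // "Selection does not contain a main type".
    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : "C:\\test.xls";
        System.out.println(new Sample2().describe(path));
    }
}
```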