XLSX to CSV out of memory error

I found plenty of solutions for converting an XLSX file to CSV in Java, and all of them use XSSFWorkbook. The problem I am facing is an out-of-memory error, probably because the stream holds too much data. I just don't get why, since the file is only 4 MB.
Code:
// For storing data into CSV files
StringBuffer data = new StringBuffer();
int i = 0; // row counter, used to flush every 10000 rows
try {
    FileOutputStream fos = new FileOutputStream(outputFile);
    System.out.println("Getting input stream.");
    // Get the workbook object for XLSX file
    XSSFWorkbook workbook = new XSSFWorkbook(new FileInputStream(inputFile));
    System.out.println(" - Done");
    // Get first sheet from the workbook
    XSSFSheet sheet = workbook.getSheetAt(0);
    Cell cell;
    Row row;
    // Iterate through each row of the first sheet
    Iterator<Row> rowIterator = sheet.iterator();
    System.out.println(" - Reading xlsx rows.");
    while (rowIterator.hasNext()) {
        i++;
        row = rowIterator.next();
        // For each row, iterate through each column
        Iterator<Cell> cellIterator = row.cellIterator();
        while (cellIterator.hasNext()) {
            cell = cellIterator.next();
            switch (cell.getCellType()) {
                case Cell.CELL_TYPE_BOOLEAN:
                    data.append(cell.getBooleanCellValue() + ";");
                    break;
                case Cell.CELL_TYPE_NUMERIC:
                    data.append(cell.getNumericCellValue() + ";");
                    break;
                case Cell.CELL_TYPE_STRING:
                    data.append(cell.getStringCellValue() + ";");
                    break;
                case Cell.CELL_TYPE_BLANK:
                    data.append("" + ";");
                    break;
                default:
                    data.append(cell + ";");
            }
        }
        data.append('\n');
        int limit = 10000;
        if ((i % limit) == 0) {
            System.out.println(" - Writing " + limit + " data.");
            fos.write(data.toString().getBytes());
            fos.flush();
            data = null;
            data = new StringBuffer();
            System.out.println(" - Data written.");
        }
    }
    fos.write(data.toString().getBytes());
    fos.flush();
    fos.close();
} catch (IOException e) {
    e.printStackTrace();
}
The error points to a line in the switch statement where I append to data (a StringBuffer), but I reset it every 10000 rows, so that shouldn't be the issue.

You may not be able to use SXSSFWorkbook, since it is write-only, but you may be able to convert your program to a streaming style using POI's SAX-based event API (XSSFReader). Edit: another thing you may want to try is creating the XSSFWorkbook from a File instead of an InputStream (I remember reading somewhere that the File-based code needs less memory).
(My first suggestion was: since you are reading the data sequentially, the SXSSFWorkbook class should be just the thing you need.)

An .xlsx file is just a zip archive containing the sheet content XML and the shared-strings XML. Hence a file that is 4 MB compressed may well be very large uncompressed, and XSSFWorkbook builds in-memory objects for all of it.
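To see how large the parts of a particular file really are, here is a minimal JDK-only sketch (the class name XlsxSizeReport is hypothetical) that sums the compressed and uncompressed sizes of every entry in the archive. It relies only on java.util.zip.ZipFile, which reads the sizes recorded in the zip's central directory.

```java
import java.io.File;
import java.io.IOException;
import java.util.Collections;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class XlsxSizeReport {

    // Returns {total compressed bytes, total uncompressed bytes} over all parts of the .xlsx (zip) file.
    static long[] sizes(File xlsx) throws IOException {
        long compressed = 0;
        long uncompressed = 0;
        try (ZipFile zip = new ZipFile(xlsx)) {
            for (ZipEntry e : Collections.list(zip.entries())) {
                // ZipFile takes these values from the central directory, where they are recorded
                compressed += e.getCompressedSize();
                uncompressed += e.getSize();
            }
        }
        return new long[] {compressed, uncompressed};
    }

    public static void main(String[] args) throws IOException {
        long[] s = sizes(new File(args[0]));
        System.out.printf("compressed: %,d bytes, uncompressed: %,d bytes%n", s[0], s[1]);
    }
}
```

Run it as `java XlsxSizeReport input.xlsx`; the uncompressed total is a lower bound on what an in-memory model of the workbook has to hold.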
Since only those two inner files are concerned, you could use Java's zip file system to load the shared strings into memory and then read the content XML sequentially, writing each row out immediately. Tedious, but not difficult.
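A minimal sketch of that approach using only the JDK. It swaps in java.util.zip.ZipFile for the zip file system and the StAX XMLStreamReader for sequential parsing, so only the shared strings and the current row are held in memory. Assumptions (not from the answer): the sheet is passed as a part name such as xl/worksheets/sheet1.xml (the usual location of the first sheet, but the authoritative mapping lives in xl/workbook.xml and its relationships); empty cells that Excel omitted are not padded; numbers, dates and booleans are written as their raw stored values. The class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class XlsxStreamToCsv {

    // Loads xl/sharedStrings.xml into memory; each <si> holds one <t> or several rich-text runs.
    static List<String> readSharedStrings(ZipFile zip) throws IOException, XMLStreamException {
        List<String> strings = new ArrayList<>();
        ZipEntry entry = zip.getEntry("xl/sharedStrings.xml");
        if (entry == null) {
            return strings; // a workbook without text cells may have no shared-strings part
        }
        try (InputStream in = zip.getInputStream(entry)) {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            StringBuilder current = new StringBuilder();
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT && "si".equals(r.getLocalName())) {
                    current.setLength(0);
                } else if (event == XMLStreamConstants.START_ELEMENT && "t".equals(r.getLocalName())) {
                    current.append(r.getElementText());
                } else if (event == XMLStreamConstants.END_ELEMENT && "si".equals(r.getLocalName())) {
                    strings.add(current.toString());
                }
            }
            r.close();
        }
        return strings;
    }

    // Streams one worksheet part and writes each <row> as a CSV line as soon as the row ends.
    static void convert(File xlsx, String sheetPart, Writer out) throws IOException, XMLStreamException {
        try (ZipFile zip = new ZipFile(xlsx)) {
            List<String> shared = readSharedStrings(zip);
            try (InputStream in = zip.getInputStream(zip.getEntry(sheetPart))) {
                XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
                List<String> row = new ArrayList<>();
                String type = null; // the cell's t attribute: "s" means an index into the shared strings
                String value = "";
                while (r.hasNext()) {
                    int event = r.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        switch (r.getLocalName()) {
                            case "row": row.clear(); break;
                            case "c": type = r.getAttributeValue(null, "t"); value = ""; break;
                            case "v": value = r.getElementText(); break;
                            case "t": value += r.getElementText(); break; // inline string text
                            default: break;
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        if ("c".equals(r.getLocalName())) {
                            row.add("s".equals(type) ? shared.get(Integer.parseInt(value)) : value);
                        } else if ("row".equals(r.getLocalName())) {
                            out.write(toCsvLine(row));
                            out.write('\n');
                        }
                    }
                }
                r.close();
            }
        }
    }

    // Joins fields with commas, quoting any field that contains a comma, quote or line break.
    static String toCsvLine(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) sb.append(',');
            String f = fields.get(i);
            if (f.contains(",") || f.contains("\"") || f.contains("\n") || f.contains("\r")) {
                sb.append('"').append(f.replace("\"", "\"\"")).append('"');
            } else {
                sb.append(f);
            }
        }
        return sb.toString();
    }
}
```

Pass a buffered writer to `convert`, for example one from `Files.newBufferedWriter(path, StandardCharsets.UTF_8)`, so that rows go to disk as they are produced instead of accumulating in a StringBuffer.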

Try this code; it works for me, and I hope it works for you too.
package com.converting;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.util.Iterator;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import com.opencsv.CSVWriter;
public class XlsxtoCSV {
    public static void main(String[] args) throws Exception {
        FileInputStream input_document = new FileInputStream(new File("/home/blackpearl/Downloads/aa.xlsx"));
        XSSFWorkbook my_xls_workbook = new XSSFWorkbook(input_document);
        XSSFSheet my_worksheet = my_xls_workbook.getSheetAt(0);
        Iterator<Row> rowIterator = my_worksheet.iterator();
        FileWriter my_csv = new FileWriter("/home/blackpearl/Downloads/Newaa.csv");
        CSVWriter my_csv_output = new CSVWriter(my_csv);
        // DataFormatter turns string, numeric, date and boolean cells into the text Excel displays
        DataFormatter formatter = new DataFormatter();
        while (rowIterator.hasNext()) {
            Row row = rowIterator.next();
            // One slot per column up to the last used cell (getLastCellNum() is -1 for an empty row)
            String[] csvdata = new String[Math.max(row.getLastCellNum(), 0)];
            Iterator<Cell> cellIterator = row.cellIterator();
            while (cellIterator.hasNext()) {
                Cell cell = cellIterator.next(); // fetch cell
                // getColumnIndex() keeps each value in its own column even when cells are missing
                csvdata[cell.getColumnIndex()] = formatter.formatCellValue(cell);
            }
            my_csv_output.writeNext(csvdata);
        }
        System.out.println("file imported");
        my_csv_output.close(); // close the CSV file
        input_document.close(); // close the xlsx file
    }
}

Related

How to read xlsx files sequentially

I have a large xlsx file (74 MB) and have found a way to read it in. Here is my source code so far:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Iterator;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
private static void readXLSX(String path) throws IOException {
    File myFile = new File(path);
    FileInputStream fis = new FileInputStream(myFile);
    // Finds the workbook instance for XLSX file
    XSSFWorkbook myWorkBook = new XSSFWorkbook(fis);
    // Return first sheet from the XLSX workbook
    XSSFSheet mySheet = myWorkBook.getSheetAt(0);
    // Get iterator to all the rows in current sheet
    Iterator<Row> rowIterator = mySheet.iterator();
    // Traversing over each row of XLSX file
    while (rowIterator.hasNext()) {
        Row row = rowIterator.next();
        // For each row, iterate through each column
        Iterator<Cell> cellIterator = row.cellIterator();
        while (cellIterator.hasNext()) {
            Cell cell = cellIterator.next();
            switch (cell.getCellType()) {
                case Cell.CELL_TYPE_STRING:
                    System.out.print(cell.getStringCellValue() + "\t");
                    break;
                case Cell.CELL_TYPE_NUMERIC:
                    System.out.print(cell.getNumericCellValue() + "\t");
                    break;
                case Cell.CELL_TYPE_BOOLEAN:
                    System.out.print(cell.getBooleanCellValue() + "\t");
                    break;
                default:
                    break;
            }
        }
        System.out.println("");
    }
}
The problem is that my 8 GB of RAM doesn't seem to be sufficient, even with swapping and a larger JVM heap:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
Do you have any idea why this code is so inefficient? Or do you have an idea how to read the file sequentially and buffer the temporary rows in a less memory-consuming way?
Thanks in advance

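Using the XSSF usermodel of POI is known to cause memory issues, because it loads the whole workbook into memory. Use a streaming alternative instead; this will ensure you won't run out of memory.
In short: SXSSFWorkbook is the streaming alternative to XSSFWorkbook for writing; for reading, use the SAX-based event API (XSSFReader).
API details here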

Write time format (hh:mm:ss) without date in Java [duplicate]

This question already has answers here:
how to read exact cell content of excel file in apache POI
(2 answers)
Closed 7 years ago.
I am trying to convert an Excel (.xls) file with multiple worksheets into a .csv. The code works, but I notice that certain columns change from a time type to a double.
Example: if my input is 00:45:20, I get output like 0.006168981481481482. Each worksheet has columns of time type.
Note: my input has no date part, only a time component. I have seen a few related posts and tried their approach, but the Java code then prints only a default date and drops the time part.
I think something in the case statement has to change to handle the time type. I would like a generic program that writes every time value in the same format. The code I used:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Iterator;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
public class exceltst
{
    static void xls(File inputFile, File outputFile, int sheet_num)
    {
        // For storing data into CSV files
        StringBuffer data = new StringBuffer();
        try
        {
            FileOutputStream fos = new FileOutputStream(outputFile);
            // Get the workbook object for XLS file
            HSSFWorkbook workbook = new HSSFWorkbook(new FileInputStream(inputFile));
            // Get the requested sheet (0-based index) from the workbook
            HSSFSheet sheet = workbook.getSheetAt(sheet_num);
            Cell cell;
            Row row;
            // Iterate through each row of the sheet
            Iterator<Row> rowIterator = sheet.iterator();
            while (rowIterator.hasNext())
            {
                row = rowIterator.next();
                // For each row, iterate through each column
                Iterator<Cell> cellIterator = row.cellIterator();
                while (cellIterator.hasNext())
                {
                    cell = cellIterator.next();
                    switch (cell.getCellType())
                    {
                        case Cell.CELL_TYPE_BOOLEAN:
                            data.append(cell.getBooleanCellValue() + ",");
                            break;
                        case Cell.CELL_TYPE_NUMERIC:
                            data.append(cell.getNumericCellValue() + ",");
                            break;
                        case Cell.CELL_TYPE_STRING:
                            data.append(cell.getStringCellValue() + ",");
                            break;
                        case Cell.CELL_TYPE_BLANK:
                            data.append("" + ",");
                            break;
                        default:
                            data.append(cell + ",");
                    }
                }
                data.append('\n');
            }
            fos.write(data.toString().getBytes());
            fos.close();
        }
        catch (FileNotFoundException e)
        {
            e.printStackTrace();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }
    public static void main(String[] args)
    {
        File inputFile = new File("C:\\Call_Center_20150323.xls");
        File outputFile1 = new File("C:\\live_person.csv");
        xls(inputFile, outputFile1, 3);
    }
}
Could you please help me write the time values as hh:mm:ss, without a date, instead of as doubles in the output file?

You should first create a CellStyle, then set this style on your time cell. Note that a CSV file cannot carry a CellStyle; cell styles only apply when you write an Excel file.
For Excel:
CellStyle style = workBook.createCellStyle();
style.setDataFormat(workBook.createDataFormat().getFormat("hh:mm:ss"));
cell.setCellStyle(style);
// Store the time as a number (fraction of a day); a string value would not be affected by the hh:mm:ss format
cell.setCellValue((16 * 3600 + 15 * 60 + 11) / 86400.0);
For a CSV file, write the value as a string:
data.append("16:15:11" + ",");

Try
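if (cell.getCellType() == Cell.CELL_TYPE_NUMERIC) {
    if (DateUtil.isCellDateFormatted(cell)) {
        // Date/time cell: print only the time of day (java.text.SimpleDateFormat)
        System.out.println(new SimpleDateFormat("HH:mm:ss").format(cell.getDateCellValue()));
    } else {
        System.out.println(cell.getNumericCellValue());
    }
}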
For details you can refer here
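The double in the question is how Excel stores a time: a fraction of a 24-hour day (a date-time adds whole days in front of the decimal point). A minimal JDK-only sketch of the conversion, with a hypothetical class name; in the question's NUMERIC case you could call `toHms(cell.getNumericCellValue())` when `DateUtil.isCellDateFormatted(cell)` is true. Alternatively, POI's `DataFormatter.formatCellValue(cell)` returns the value formatted the way Excel displays it.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

public class ExcelTimeFormat {
    private static final DateTimeFormatter HMS = DateTimeFormatter.ofPattern("HH:mm:ss");

    // Converts an Excel serial value to "HH:mm:ss", ignoring any whole-day (date) part.
    static String toHms(double excelValue) {
        double dayFraction = excelValue - Math.floor(excelValue); // drop the date part, keep the time
        long seconds = Math.round(dayFraction * 86_400) % 86_400;  // nearest second; 24:00:00 wraps to 00:00:00
        return LocalTime.ofSecondOfDay(seconds).format(HMS);
    }

    public static void main(String[] args) {
        System.out.println(toHms(2720 / 86_400.0)); // 2720 s = 45 min 20 s, prints 00:45:20
    }
}
```

Rounding to the nearest second matters because values such as 2720 / 86400.0 are not exactly representable as doubles, so truncating could produce 00:45:19.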

Cannot convert xlsx to csv using POI API

I am trying to convert an xlsx file to a csv file using the code below:
import java.io.*;
import java.util.Iterator;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
public class XLStoCSVConvert {
    static void xlsx(File inputFile, File outputFile) {
        // For storing data into CSV files
        StringBuffer data = new StringBuffer();
        try {
            FileOutputStream fos = new FileOutputStream(outputFile);
            // Get the workbook object for XLSX file
            System.out.println("working......1");
            XSSFWorkbook wBook = new XSSFWorkbook(new FileInputStream(inputFile));
            // Get first sheet from the workbook
            System.out.println("working......2");
            XSSFSheet sheet = wBook.getSheetAt(0);
            Row row;
            Cell cell;
            // Iterate through each row of the first sheet
            Iterator<Row> rowIterator = sheet.iterator();
            while (rowIterator.hasNext()) {
                row = rowIterator.next();
                // For each row, iterate through each column
                Iterator<Cell> cellIterator = row.cellIterator();
                while (cellIterator.hasNext()) {
                    cell = cellIterator.next();
                    switch (cell.getCellType()) {
                        case Cell.CELL_TYPE_BOOLEAN:
                            data.append(cell.getBooleanCellValue() + ",");
                            break;
                        case Cell.CELL_TYPE_NUMERIC:
                            data.append(cell.getNumericCellValue() + ",");
                            break;
                        case Cell.CELL_TYPE_STRING:
                            data.append(cell.getStringCellValue() + ",");
                            break;
                        case Cell.CELL_TYPE_BLANK:
                            data.append("" + ",");
                            break;
                        default:
                            data.append(cell + ",");
                    }
                }
                data.append('\n'); // end each CSV record; otherwise all rows end up on one line
            }
            fos.write(data.toString().getBytes());
            fos.close();
        } catch (Exception ioe) {
            ioe.printStackTrace();
        }
    }
    public static void main(String[] args) {
        File inputFile = new File("/home/raptorjd4/Desktop/ToConsult.xlsx");
        // writing excel data to csv
        File outputFile = new File("/home/raptorjd4/Desktop/RaptorTrackingSystem/ToConsult.csv");
        xlsx(inputFile, outputFile);
    }
}
But I get this output:
working......1
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/poi/UnsupportedFileFormatException
The jars in my lib folder are:
poi-3.5-FINAL.jar
poi-ooxml-3.11.jar
Why am I getting this error when I have put all the needed jar files in the lib folder?
Where am I making a mistake?

Your code worked fine for me; below are the dependencies I added:
The problem is either the data in your Excel file or a missing dependency.
Ensure that you do have all the necessary dependencies on your classpath.

Your problem is this part:
poi-3.5-FINAL.jar
poi-ooxml-3.11.jar
As explained in this Apache POI FAQ entry:
Can I mix POI jars from different versions?
No. This is not supported.
All POI jars in use must come from the same version. A combination such as poi-3.11.jar and poi-ooxml-3.9.jar is not supported, and will fail to work in unpredictable ways.
You must ensure that all of your Apache POI jars come from the same version!
Switch all your jars to the same version, ideally the latest, and you should be fine.
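A quick way to spot such a mix is to compare the version numbers in the jar file names. This is a minimal sketch with a hypothetical class name, assuming the usual poi-<module>-<version>.jar naming; it only looks at file names and does not open the jars. For the two jars in the question it would report both 3.5 and 3.11.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PoiJarVersionCheck {
    // Matches names like "poi-3.9.jar", "poi-ooxml-3.11.jar" or "poi-ooxml-schemas-3.9-20121203.jar"
    // and captures the numeric version that follows the module name.
    private static final Pattern POI_JAR = Pattern.compile("poi(?:-[a-z]+)*-(\\d+(?:\\.\\d+)+)");

    // Returns the distinct POI versions found among the given jar file names.
    static Set<String> poiVersions(Collection<String> jarNames) {
        Set<String> versions = new TreeSet<>();
        for (String name : jarNames) {
            Matcher m = POI_JAR.matcher(name);
            if (m.lookingAt()) { // the name must start with "poi", so xmlbeans or dom4j jars are skipped
                versions.add(m.group(1));
            }
        }
        return versions;
    }

    public static void main(String[] args) {
        File libDir = new File(args.length > 0 ? args[0] : "lib");
        String[] names = libDir.list((dir, name) -> name.endsWith(".jar"));
        Collection<String> jars = names == null ? Collections.<String>emptyList() : Arrays.asList(names);
        Set<String> versions = poiVersions(jars);
        if (versions.size() > 1) {
            System.out.println("Mixed Apache POI versions: " + versions);
        } else {
            System.out.println("Apache POI versions found: " + versions);
        }
    }
}
```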

I solved this problem by adding the following jar files:
poi-3.9.jar
poi-ooxml-3.9.jar
poi-ooxml-schemas-3.9-20121203.jar
xmlbeans-2.3.0.jar
dom4j-1.6.1.jar

How to read an Excel file and insert its data into a database using Java and POI or any other library?

I am using a servlet and trying to read a user-uploaded Excel file and insert its contents into a database.
My Excel file is in this format:
ID IP1 IP2 USER TKTNO (these are the headings in both the Excel file and the database table)
Under those headings is the data I have to read and insert into the database.
Please, I desperately need help. Thank you.

I am using Docx4J for this purpose; it works well with docx and xlsx.
http://www.docx4java.org/trac/docx4j

This is how you read an Excel file using the Apache POI library; I guess it is good enough for a start. You can then store the cell values in some collection objects and save those objects to the database as required.
package com.Excel;
import java.io.*;
import java.util.Iterator;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
public class ReadExcelFile {
    public static void main(String[] args)
    {
        try {
            FileInputStream file = new FileInputStream(new File("C:/Users/hussain.a/Desktop/mar_25/Tradestation_Q4 Dashboard_Week 5_1029-1104.xlsx"));
            XSSFWorkbook workbook = new XSSFWorkbook(file);
            XSSFSheet sheet = workbook.getSheetAt(0);
            Iterator<Row> rowIterator = sheet.iterator();
            rowIterator.next(); // skip the first (header) row
            while (rowIterator.hasNext())
            {
                Row row = rowIterator.next();
                // For each row, iterate through each column
                Iterator<Cell> cellIterator = row.cellIterator();
                while (cellIterator.hasNext())
                {
                    Cell cell = cellIterator.next();
                    switch (cell.getCellType())
                    {
                        case Cell.CELL_TYPE_BOOLEAN:
                            System.out.println("boolean===>>>" + cell.getBooleanCellValue() + "\t");
                            break;
                        case Cell.CELL_TYPE_NUMERIC:
                            System.out.println("numeric===>>>" + cell.getNumericCellValue() + "\t");
                            break;
                        case Cell.CELL_TYPE_STRING:
                            System.out.println("String===>>>" + cell.getStringCellValue() + "\t");
                            break;
                    }
                }
                System.out.println("");
            }
            file.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
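For the database step, here is a minimal JDBC sketch, assuming a `java.sql.Connection` obtained elsewhere (for example from the servlet container's DataSource), a hypothetical table name `tickets` with the question's columns, and that each Excel data row has already been collected as a `List<String>` in column order. The column list should be fixed in code rather than read from the uploaded file, because table and column names cannot be bound as `?` parameters. Note also that USER is a reserved word in several SQL dialects, so that column may need quoting or renaming.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;
import java.util.Collections;
import java.util.List;

public class ExcelRowsToDb {

    // Builds "INSERT INTO <table> (c1, c2, ...) VALUES (?, ?, ...)".
    // Names are concatenated as-is, so they must come from trusted code, not from user input.
    static String buildInsertSql(String table, List<String> columns) {
        String cols = String.join(", ", columns);
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + placeholders + ")";
    }

    // Inserts every data row as one JDBC batch inside a single transaction.
    static void insertRows(Connection conn, String table, List<String> columns,
                           List<List<String>> rows) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(buildInsertSql(table, columns))) {
            for (List<String> row : rows) {
                for (int i = 0; i < columns.size(); i++) {
                    String value = i < row.size() ? row.get(i) : null;
                    if (value == null) {
                        ps.setNull(i + 1, Types.VARCHAR); // missing cell becomes SQL NULL
                    } else {
                        ps.setString(i + 1, value);
                    }
                }
                ps.addBatch();
            }
            ps.executeBatch();
            conn.commit();
        } catch (SQLException e) {
            conn.rollback(); // undo a partial import
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

A call would look like `insertRows(conn, "tickets", Arrays.asList("ID", "IP1", "IP2", "USER", "TKTNO"), rows)`. Binding everything with setString keeps the sketch short; typed setters such as setInt are preferable where the table columns are numeric.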

How to run this program that reads Excel using POI

I tried to run this code in Eclipse, but I get this: "Selection does not contain a main type".
Does anyone know how to fix it? I am a Java newbie and I need help!
The program I am trying to write reads an Excel file using POI! :)
import java.io.File;
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
public class sample2 {
private void sample2(test)
FileInputStream file = new FileInputStream(new File("C:\\test.xls"));
//Get the workbook instance for XLS file
HSSFWorkbook workbook = new HSSFWorkbook(test);
//Get first sheet from the workbook
HSSFSheet sheet = workbook.getSheetAt(0);
//Get iterator to all the rows in current sheet
Iterator<Row> rowIterator = sheet.iterator();
//Get iterator to all cells of current row
Iterator<Cell> cellIterator = row.cellIterator();
try {
FileInputStream file = new FileInputStream(new File("C:\\test.xls"));
//Get the workbook instance for XLS file
HSSFWorkbook workbook = new HSSFWorkbook(file);
//Get first sheet from the workbook
HSSFSheet sheet = workbook.getSheetAt(0);
//Iterate through each rows from first sheet
Iterator<Row> rowIterator = sheet.iterator();
while(rowIterator.hasNext()) {
Row row = rowIterator.next();
//For each row, iterate through each columns
Iterator<Cell> cellIterator = row.cellIterator();
while(cellIterator.hasNext()) {
Cell cell = cellIterator.next();
switch(cell.getCellType()) {
case Cell.CELL_TYPE_BOOLEAN:
System.out.print(cell.getBooleanCellValue() + "\t\t");
break;
case Cell.CELL_TYPE_NUMERIC:
System.out.print(cell.getNumericCellValue() + "\t\t");
break;
case Cell.CELL_TYPE_STRING:
System.out.print(cell.getStringCellValue() + "\t\t");
break;
}
}
System.out.println("");
}
file.close();
FileOutputStream out =
new FileOutputStream(new File("C:\\test.xls"));
workbook.write(out);
out.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}

You cannot run a Java application without a main method.
You need something like the following:
public static void main(String[] args) {
    sample2 s = new sample2();
    s.sample2();
}
Also your code contains a lot of errors. You are:
Missing a main method
Using the wrong capitalization (by convention, class names such as sample2 start with an uppercase letter)
Missing the type of the sample2 method's input argument (String test?)
Breaking the code in many other ways: you duplicated the file-reading code, once outside and once inside the try block for error handling, etc.
Reading a good Java tutorial would help greatly here. A great tutorial on Java and Excel can be found here; pay particular attention to the main method, which is the entry point of your Java application.

Your code will not compile because of the untyped identifier "test" in the sample2 method's parameter list. Remove it, and to run the program, just add the following method:
public static void main(String[] args) {
    new sample2().sample2();
}
