Cannot convert xlsx to csv using the POI API

I am trying to convert an xlsx file to a csv file using the code below:
import java.io.*;
import java.util.Iterator;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class XLStoCSVConvert {
    static void xlsx(File inputFile, File outputFile) {
        // For storing data into CSV files
        StringBuffer data = new StringBuffer();
        try {
            FileOutputStream fos = new FileOutputStream(outputFile);
            // Get the workbook object for XLSX file
            System.out.println("working......1");
            XSSFWorkbook wBook = new XSSFWorkbook(new FileInputStream(inputFile));
            // Get first sheet from the workbook
            System.out.println("working......2");
            XSSFSheet sheet = wBook.getSheetAt(0);
            Row row;
            Cell cell;
            // Iterate through each row of the first sheet
            Iterator<Row> rowIterator = sheet.iterator();
            while (rowIterator.hasNext()) {
                row = rowIterator.next();
                // For each row, iterate through each column
                Iterator<Cell> cellIterator = row.cellIterator();
                while (cellIterator.hasNext()) {
                    cell = cellIterator.next();
                    switch (cell.getCellType()) {
                        case Cell.CELL_TYPE_BOOLEAN:
                            data.append(cell.getBooleanCellValue() + ",");
                            break;
                        case Cell.CELL_TYPE_NUMERIC:
                            data.append(cell.getNumericCellValue() + ",");
                            break;
                        case Cell.CELL_TYPE_STRING:
                            data.append(cell.getStringCellValue() + ",");
                            break;
                        case Cell.CELL_TYPE_BLANK:
                            data.append("" + ",");
                            break;
                        default:
                            data.append(cell + ",");
                    }
                }
                // End the CSV record after each row
                data.append('\n');
            }
            fos.write(data.toString().getBytes());
            fos.close();
        } catch (Exception ioe) {
            ioe.printStackTrace();
        }
    }

    public static void main(String[] args) {
        File inputFile = new File("/home/raptorjd4/Desktop/ToConsult.xlsx");
        // Writing excel data to csv
        File outputFile = new File("/home/raptorjd4/Desktop/RaptorTrackingSystem/ToConsult.csv");
        xlsx(inputFile, outputFile);
    }
}
But I am getting this output:
Working......1
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/poi/UnsupportedFileFormatException
The jars in my lib folder:
poi-3.5-FINAL.jar
poi-ooxml-3.11.jar
Why am I getting this error when I have put all the needed jar files in the lib folder?
Where am I making a mistake?

Your code worked just fine for me with the required dependencies added.
The problem is either with the data in your Excel file or with a missing dependency.
Make sure that all the necessary dependencies are on your classpath.

Your problem is this part:
poi-3.5-FINAL.jar
poi-ooxml-3.11.jar
As explained in this Apache POI FAQ entry:
Can I mix POI jars from different versions?
No. This is not supported.
All POI jars in use must come from the same version. A combination such as poi-3.11.jar and poi-ooxml-3.9.jar is not supported, and will fail to work in unpredictable ways.
You must ensure that all of your Apache POI jars come from the same version!
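Here, poi-ooxml-3.11.jar expects the class org.apache.poi.UnsupportedFileFormatException, which the much older poi-3.5-FINAL.jar does not contain, hence the NoClassDefFoundError.
Switch your jars to the same version, ideally the latest, and you should be good.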
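As a rough illustration of the same-version rule, the sketch below scans a list of jar file names and collects the distinct POI versions it finds; more than one version means the jars are mixed. The class name PoiJarVersionCheck, the method poiVersions, and the file-name pattern are assumptions made for this example and are not part of Apache POI.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PoiJarVersionCheck {

    // Assumed naming scheme: poi[-module...]-<version>[...].jar, e.g. poi-ooxml-3.11.jar
    private static final Pattern POI_JAR =
            Pattern.compile("^poi(?:-[a-z]+)*-(\\d+(?:\\.\\d+)+)");

    // Returns the distinct version strings of all POI jars among the given file names.
    static Set<String> poiVersions(List<String> jarNames) {
        Set<String> versions = new TreeSet<>();
        for (String name : jarNames) {
            Matcher m = POI_JAR.matcher(name);
            if (m.find()) {
                versions.add(m.group(1));
            }
        }
        return versions;
    }

    public static void main(String[] args) {
        // The two jars from the question
        List<String> lib = Arrays.asList("poi-3.5-FINAL.jar", "poi-ooxml-3.11.jar");
        Set<String> versions = poiVersions(lib);
        if (versions.size() > 1) {
            System.out.println("Mixed POI versions found: " + versions);
        } else {
            System.out.println("POI versions are consistent: " + versions);
        }
    }
}
```

Non-POI jars such as xmlbeans are ignored by the pattern, since they follow their own version numbering.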

I solved this problem by adding the jar files below:
poi-3.9.jar
poi-ooxml-3.9.jar
poi-ooxml-schemas-3.9-20121203.jar
xmlbeans-2.3.0.jar
dom4j-1.6.1.jar

Related

PackagePropertiesMarshaller$NamespaceImpl not found using Apache poi with Java Servlet

I've been trying to build my first web application using IntelliJ and Tomcat, and one of the tasks is to upload and process an Excel sheet. I looked online and found the Apache POI library, which can help me parse an Excel file. But after I downloaded all the required jars, pasted in some test code, and started the server, the web page showed an error with HTTP status 500, the root cause being: java.lang.ClassNotFoundException: org.apache.poi.openxml4j.opc.internal.marshallers.PackagePropertiesMarshaller$NamespaceImpl.
I've hit the same problem with classes from other jars, and solved each case by putting the corresponding jar inside Tomcat's lib folder. This one is the only exception.
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import java.io.File;
import java.io.FileInputStream;
import java.util.Iterator;

public class ExcelParser {
    private String pathname;

    public ExcelParser(String pathname) {
        this.pathname = pathname;
    }

    public void parse() {
        try {
            FileInputStream file = new FileInputStream(new File("/Users/JohnDoe/Desktop/test.xlsx"));
            // Create Workbook instance holding reference to .xlsx file
            XSSFWorkbook workbook = new XSSFWorkbook(file);
            // Get first/desired sheet from the workbook
            XSSFSheet sheet = workbook.getSheetAt(0);
            // Iterate through the rows one by one
            Iterator<Row> rowIterator = sheet.iterator();
            while (rowIterator.hasNext()) {
                Row row = rowIterator.next();
                // For each row, iterate through all the columns
                Iterator<Cell> cellIterator = row.cellIterator();
                while (cellIterator.hasNext()) {
                    Cell cell = cellIterator.next();
                    // Check the cell type and format accordingly
                    switch (cell.getCellType()) {
                        case NUMERIC:
                            System.out.print(cell.getNumericCellValue() + "\t");
                            break;
                        case STRING:
                            System.out.print(cell.getStringCellValue() + "\t");
                            break;
                    }
                }
                System.out.println();
            }
            file.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
I'm just testing the Excel parsing for now, so don't worry about the unused pathname.
By the way, I can see that this (inner) class is declared in poi-ooxml-4.1.0.jar, which is also in my Tomcat lib folder.
Any ideas on why this is happening and how to fix it would be appreciated.

To use Apache POI, you need the following jar files.
poi-ooxml-4.1.0.jar
poi-ooxml-schemas-4.1.0.jar
xmlbeans-3.1.0.jar
commons-compress-1.18.jar
curvesapi-1.06.jar
poi-4.1.0.jar
commons-codec-1.12.jar
commons-collections4-4.3.jar
commons-math3-3.6.1.jar
You can also refer to my answer on a related question:
Unable to read Excel using Apache POI

I think I missed something when moving the jars to the lib directory: after I removed the original files and reran the cp command, everything works. I'm closing the question with this answer. Thanks for the help!

How to read xlsx files sequentially

I have a large xlsx file (74 MB). I have found a way to read it in. Here is my source code so far.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Iterator;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

private static void readXLSX(String path) throws IOException {
    File myFile = new File(path);
    FileInputStream fis = new FileInputStream(myFile);
    // Finds the workbook instance for XLSX file
    XSSFWorkbook myWorkBook = new XSSFWorkbook(fis);
    // Return first sheet from the XLSX workbook
    XSSFSheet mySheet = myWorkBook.getSheetAt(0);
    // Get iterator to all the rows in current sheet
    Iterator<Row> rowIterator = mySheet.iterator();
    // Traversing over each row of XLSX file
    while (rowIterator.hasNext()) {
        Row row = rowIterator.next();
        // For each row, iterate through each column
        Iterator<Cell> cellIterator = row.cellIterator();
        while (cellIterator.hasNext()) {
            Cell cell = cellIterator.next();
            switch (cell.getCellType()) {
                case Cell.CELL_TYPE_STRING:
                    System.out.print(cell.getStringCellValue() + "\t");
                    break;
                case Cell.CELL_TYPE_NUMERIC:
                    System.out.print(cell.getNumericCellValue() + "\t");
                    break;
                case Cell.CELL_TYPE_BOOLEAN:
                    System.out.print(cell.getBooleanCellValue() + "\t");
                    break;
                default:
            }
        }
        System.out.println("");
    }
}
The problem is that my 8 GB of RAM doesn't seem to be sufficient, even with swapping and an increased JVM heap size.
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
Do you have any idea why this code is so inefficient? Or do you know how to read this file sequentially and buffer the temporary rows in a less memory-consuming way?
Thanks in advance.

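The XSSF usermodel (XSSFWorkbook) loads the whole workbook into memory, which is a known cause of memory problems with large files. Use one of POI's streaming alternatives so that you don't run out of memory.
In short:
SXSSFWorkbook instead of XSSFWorkbook when writing large files (it is write-only)
the XSSF event API (XSSFReader with a SAX handler) when reading them
API details are in the Apache POI documentation.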

Write time format (hh:mm:ss) without date in Java [duplicate]

This question already has answers here:
how to read exact cell content of excel file in apache POI
I am trying to convert an Excel (.xls) file with multiple worksheets into a .csv. The code works fine, but I notice that the datatype of certain columns is changed from a time datatype to a double.
Example: if my input is 00:45:20, I get output like 0.006168981481481482. Each worksheet has columns using the time datatype.
Note: my input does not have a date part; only the time component is there. I have seen a few posts related to this and tried what they suggest, but the Java code prints only a default date and omits the time part.
I feel something has to be modified in the case statement to handle the time datatype. I would like a generic program, so that whenever a cell has the time datatype it is written out in the same format. The code I used:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Iterator;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;

public class exceltst
{
    static void xls(File inputFile, File outputFile, int sheet_num)
    {
        // For storing data into CSV files
        StringBuffer data = new StringBuffer();
        try
        {
            FileOutputStream fos = new FileOutputStream(outputFile);
            // Get the workbook object for XLS file
            HSSFWorkbook workbook = new HSSFWorkbook(new FileInputStream(inputFile));
            // Get first sheet from the workbook
            HSSFSheet sheet = workbook.getSheetAt(sheet_num);
            Cell cell;
            Row row;
            // Iterate through each rows from first sheet
            Iterator<Row> rowIterator = sheet.iterator();
            while (rowIterator.hasNext())
            {
                row = rowIterator.next();
                // For each row, iterate through each columns
                Iterator<Cell> cellIterator = row.cellIterator();
                while (cellIterator.hasNext())
                {
                    cell = cellIterator.next();
                    switch (cell.getCellType())
                    {
                        case Cell.CELL_TYPE_BOOLEAN:
                            data.append(cell.getBooleanCellValue() + ",");
                            break;
                        case Cell.CELL_TYPE_NUMERIC:
                            data.append(cell.getNumericCellValue() + ",");
                            break;
                        case Cell.CELL_TYPE_STRING:
                            data.append(cell.getStringCellValue() + ",");
                            break;
                        case Cell.CELL_TYPE_BLANK:
                            data.append("" + ",");
                            break;
                        default:
                            data.append(cell + ",");
                    }
                }
                data.append('\n');
            }
            fos.write(data.toString().getBytes());
            fos.close();
        }
        catch (FileNotFoundException e)
        {
            e.printStackTrace();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }

    public static void main(String[] args)
    {
        File inputFile = new File("C:\\Call_Center_20150323.xls");
        File outputFile1 = new File("C:\\live_person.csv");
        xls(inputFile, outputFile1, 3);
    }
}
Could you please help me write the time datatype (hh:mm:ss), without a date, to the output file instead of a double?

You should first create a CellStyle and then set this style on your time cell. Note that a csv file has no cell styles; they are only available when you work on an Excel file.
For Excel:
CellStyle style = workBook.createCellStyle();
style.setDataFormat(workBook.createDataFormat().getFormat("hh:mm:ss"));
cell.setCellStyle(style);
cell.setCellValue(DateUtil.convertTime("16:15:11")); // numeric time value; a data format only applies to numeric cells
For a csv file, you should write the value of your cell as a String:
data.append("16:15:11" + ",");

Try
if (cell.getCellType() == Cell.CELL_TYPE_NUMERIC) {
    if (DateUtil.isCellDateFormatted(cell)) {
        System.out.println(cell.getDateCellValue());
    } else {
        System.out.println(cell.getNumericCellValue());
    }
}
For details you can refer here
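Excel stores a time-only value as a fraction of a 24-hour day, which is why getNumericCellValue() returns a double for such cells. A minimal sketch of turning that fraction back into hh:mm:ss with plain JDK code follows; the class ExcelTimeFormat and the method toHms are hypothetical names for this example, and the sketch assumes a non-negative value rounded to whole seconds.

```java
public class ExcelTimeFormat {

    private static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Converts the fractional-day part of an Excel serial value to "HH:mm:ss".
    static String toHms(double excelValue) {
        // Drop any whole days (the date part) and keep only the time of day
        double fraction = excelValue - Math.floor(excelValue);
        long totalSeconds = Math.round(fraction * SECONDS_PER_DAY) % SECONDS_PER_DAY;
        long hours = totalSeconds / 3600;
        long minutes = (totalSeconds % 3600) / 60;
        long seconds = totalSeconds % 60;
        return String.format("%02d:%02d:%02d", hours, minutes, seconds);
    }

    public static void main(String[] args) {
        System.out.println(toHms(0.5));            // half a day
        System.out.println(toHms(2720.0 / 86400)); // 2720 seconds after midnight
    }
}
```

In a converter like the one in the question, such a helper could be called in the numeric branch when DateUtil.isCellDateFormatted(cell) is true, appending the resulting string to the csv data instead of the raw double.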

Trying to read an excel file using poi apache library

I am trying to read an Excel file using the Apache POI library. I have tried several versions of the code, but I get the same error with all of them, and I do not know why.
You can download the Apache POI library from this link:
https://poi.apache.org/download.html
Here is my code to read an excel file:
import java.io.File;
import java.io.FileInputStream;
import java.util.Iterator;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

/**
 *
 * @author Pacer
 */
public class ReadExcelDemo
{
    public static void main(String[] args)
    {
        try
        {
            System.out.println("Working Directory = " + System.getProperty("user.dir"));
            FileInputStream file = new FileInputStream(new File("book.xlsx"));
            // Create Workbook instance holding reference to .xlsx file
            XSSFWorkbook workbook = new XSSFWorkbook(file);
            // Get first/desired sheet from the workbook
            XSSFSheet sheet = workbook.getSheetAt(0);
            // Iterate through the rows one by one
            Iterator<Row> rowIterator = sheet.iterator();
            while (rowIterator.hasNext())
            {
                Row row = rowIterator.next();
                // For each row, iterate through all the columns
                Iterator<Cell> cellIterator = row.cellIterator();
                while (cellIterator.hasNext())
                {
                    Cell cell = cellIterator.next();
                    // Check the cell type and format accordingly
                    switch (cell.getCellType())
                    {
                        case Cell.CELL_TYPE_NUMERIC:
                            System.out.print(cell.getNumericCellValue() + "\t");
                            break;
                        case Cell.CELL_TYPE_STRING:
                            System.out.print(cell.getStringCellValue() + "\t");
                            break;
                    }
                }
                System.out.println("");
            }
            file.close();
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}
And here is the error I am getting:
Working Directory = E:\NetBeansProjects\Project24\CoverageCodetool
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/xmlbeans/XmlException
at coveragecodetool.ReadExcelDemo.main(ReadExcelDemo.java:30)
Caused by: java.lang.ClassNotFoundException: org.apache.xmlbeans.XmlException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 1 more
Java Result: 1
Please help!

You're missing the Apache XMLBeans library on your classpath.
Add xmlbeans to your classpath and everything will work.
The library itself can be downloaded from the Apache XMLBeans site.
The general algorithm for resolving a NoClassDefFoundError is the following:
1. Search for the library that contains the class mentioned in the error (an online class-to-jar search service helps here).
2. Add the library to your classpath.
3. Run the code again and see whether the problem persists.
4. If it does, repeat from step 1.
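The check in that loop can also be done up front. The sketch below asks the class loader whether a given class is visible and, if so, which jar it came from. ClasspathCheck and locate are hypothetical names for this example; the two class names in main are simply the ones that appear in this thread.

```java
import java.security.CodeSource;

public class ClasspathCheck {

    // Returns where the class was loaded from, or null if it cannot be found.
    static String locate(String className) {
        try {
            // initialize=false: only look the class up, do not run its static initializers
            Class<?> type = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource source = type.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "(no code source, typically a JDK class)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.xmlbeans.XmlException",
            "org.apache.poi.xssf.usermodel.XSSFWorkbook"
        };
        for (String name : required) {
            String location = locate(name);
            System.out.println(name + " -> " + (location == null ? "MISSING" : location));
        }
    }
}
```

Printing the jar location is also a quick way to spot a class being picked up from an unexpected or outdated jar.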

XLSX to CSV out of memory error

I found plenty of solutions for converting an XLSX file to CSV in Java, and all of them use XSSFWorkbook. The problem I am facing is that the stream probably holds too much data. I just don't get why, since the file is only 4 MB.
CODE:
// For storing data into CSV files
StringBuffer data = new StringBuffer();
int i = 0; // row counter (declaration was missing from the snippet)
try {
    FileOutputStream fos = new FileOutputStream(outputFile);
    System.out.println("Getting input stream.");
    // Get the workbook object for XLSX file
    XSSFWorkbook workbook = new XSSFWorkbook(new FileInputStream(inputFile));
    System.out.println(" - Done");
    // Get first sheet from the workbook
    XSSFSheet sheet = workbook.getSheetAt(0);
    Cell cell;
    Row row;
    // Iterate through each row of the first sheet
    Iterator<Row> rowIterator = sheet.iterator();
    System.out.println(" - Reading xlsx rows.");
    while (rowIterator.hasNext()) {
        i++;
        row = rowIterator.next();
        // For each row, iterate through each column
        Iterator<Cell> cellIterator = row.cellIterator();
        while (cellIterator.hasNext()) {
            cell = cellIterator.next();
            switch (cell.getCellType()) {
                case Cell.CELL_TYPE_BOOLEAN:
                    data.append(cell.getBooleanCellValue() + ";");
                    break;
                case Cell.CELL_TYPE_NUMERIC:
                    data.append(cell.getNumericCellValue() + ";");
                    break;
                case Cell.CELL_TYPE_STRING:
                    data.append(cell.getStringCellValue() + ";");
                    break;
                case Cell.CELL_TYPE_BLANK:
                    data.append("" + ";");
                    break;
                default:
                    data.append(cell + ";");
            }
        }
        data.append('\n');
        int limit = 10000;
        if ((i % limit) == 0) {
            System.out.println(" - Writing " + limit + " data.");
            fos.write(data.toString().getBytes());
            fos.flush();
            data = null;
            data = new StringBuffer();
            System.out.println(" - Data written.");
        }
    }
    // Write whatever is left after the last full batch
    fos.write(data.toString().getBytes());
    fos.flush();
    fos.close();
} catch (IOException e) {
    e.printStackTrace();
}
The error points to the line in the switch statement where I append to data (the StringBuffer), but I reset it every 10000 rows, so it shouldn't be an issue.

Now you may not be able to use SXSSFWorkbook (as it's write-only), but you may be able to convert your program to streaming-style using the SAX-based API. Edit: Another thing you may want to try is to create the XSSFWorkbook from File instead of InputStream (I remember reading somewhere that the File-based code needs less memory).
(First try was:
Since you are reading data sequentially the SXSSFWorkbook class should be just the thing you need.)

The xlsx format is just a zip archive containing the content XML and a shared-strings XML. Hence a file that is 4 MB compressed may well be very large uncompressed.
Using a zip file system, you could load the shared strings into memory and then read the content XML sequentially, writing the output immediately.
As only two inner files are concerned, you might use Java's zip file system. Tedious but not difficult.
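A rough sketch of that idea, using only the JDK. It swaps in java.util.zip.ZipFile and the StAX pull parser rather than the NIO zip file system, and it assumes the conventional part names xl/sharedStrings.xml and a caller-supplied sheet path such as xl/worksheets/sheet1.xml. The class and method names are hypothetical. It ignores inline strings, dates, rich-text details, csv quoting, and gaps left by empty cells, so treat it as an outline rather than a complete converter.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class XlsxZipToCsv {

    private static XMLStreamReader open(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // xlsx parts need no DTDs
        return factory.createXMLStreamReader(in);
    }

    // Loads xl/sharedStrings.xml into a list; cells of type "s" refer to it by index.
    static List<String> readSharedStrings(ZipFile zip) throws IOException, XMLStreamException {
        List<String> strings = new ArrayList<>();
        if (zip.getEntry("xl/sharedStrings.xml") == null) {
            return strings; // workbook without shared strings
        }
        try (InputStream in = zip.getInputStream(zip.getEntry("xl/sharedStrings.xml"))) {
            XMLStreamReader xml = open(in);
            StringBuilder item = new StringBuilder();
            boolean inText = false;
            while (xml.hasNext()) {
                int event = xml.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (xml.getLocalName().equals("si")) item.setLength(0);
                    else if (xml.getLocalName().equals("t")) inText = true;
                } else if (event == XMLStreamConstants.CHARACTERS && inText) {
                    item.append(xml.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (xml.getLocalName().equals("t")) inText = false;
                    else if (xml.getLocalName().equals("si")) strings.add(item.toString());
                }
            }
        }
        return strings;
    }

    // Streams one sheet part and writes each row as a CSV line as soon as it is read.
    static void sheetToCsv(ZipFile zip, String sheetPath, List<String> shared, Writer out)
            throws IOException, XMLStreamException {
        try (InputStream in = zip.getInputStream(zip.getEntry(sheetPath))) {
            XMLStreamReader xml = open(in);
            StringBuilder value = new StringBuilder();
            String cellType = null;
            boolean inValue = false;
            boolean firstCell = true;
            while (xml.hasNext()) {
                int event = xml.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = xml.getLocalName();
                    if (name.equals("row")) firstCell = true;
                    else if (name.equals("c")) cellType = xml.getAttributeValue(null, "t");
                    else if (name.equals("v")) inValue = true;
                } else if (event == XMLStreamConstants.CHARACTERS && inValue) {
                    value.append(xml.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String name = xml.getLocalName();
                    if (name.equals("v")) {
                        inValue = false;
                    } else if (name.equals("c")) {
                        String text = value.toString();
                        if ("s".equals(cellType) && !text.isEmpty()) {
                            text = shared.get(Integer.parseInt(text)); // resolve shared string
                        }
                        if (!firstCell) out.write(',');
                        out.write(text);
                        firstCell = false;
                        value.setLength(0);
                    } else if (name.equals("row")) {
                        out.write('\n');
                    }
                }
            }
        }
    }
}
```

Only the shared strings are held in memory; the sheet itself is consumed one XML event at a time, so memory use should stay roughly independent of the number of rows. With a ZipFile opened on the workbook and a BufferedWriter on the target file, a call would look like sheetToCsv(zip, "xl/worksheets/sheet1.xml", readSharedStrings(zip), writer).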

Try this code. It works perfectly for me, and I hope it works for you too.
package com.converting;

import java.io.FileInputStream;
import java.io.*;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import com.opencsv.CSVWriter;
import java.util.Iterator;
import java.io.FileWriter;

public class XlsxtoCSV {
    public static void main(String[] args) throws Exception {
        FileInputStream input_document = new FileInputStream(new File("/home/blackpearl/Downloads/aa.xlsx"));
        XSSFWorkbook my_xls_workbook = new XSSFWorkbook(input_document);
        XSSFSheet my_worksheet = my_xls_workbook.getSheetAt(0);
        Iterator<Row> rowIterator = my_worksheet.iterator();
        FileWriter my_csv = new FileWriter("/home/blackpearl/Downloads/Newaa.csv");
        CSVWriter my_csv_output = new CSVWriter(my_csv);
        while (rowIterator.hasNext()) {
            Row row = rowIterator.next();
            int i = 0;
            // String array
            String[] csvdata = new String[20];
            Iterator<Cell> cellIterator = row.cellIterator();
            while (cellIterator.hasNext()) {
                Cell cell = cellIterator.next(); // Fetch CELL
                switch (cell.getCellType()) { // Identify CELL type
                    case Cell.CELL_TYPE_STRING:
                        csvdata[i] = cell.getStringCellValue();
                        break;
                }
                i = i + 1;
            }
            my_csv_output.writeNext(csvdata);
        }
        System.out.println("file imported");
        my_csv_output.close(); // close the CSV file
        input_document.close(); // close xlsx file
    }
}
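The code above leaves quoting to opencsv's CSVWriter. The hand-rolled converters elsewhere on this page append cell + "," directly, which produces broken records as soon as a cell contains a comma, a quote, or a line break. If adding opencsv is not an option, a minimal sketch of the usual quoting rule (RFC 4180 style) might look like this; CsvField, escape, and join are hypothetical names for this example.

```java
public class CsvField {

    // Quotes a field only when needed; embedded double quotes are doubled.
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins already-stringified cell values into one CSV record (without the line break).
    static String join(String... fields) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(escape(fields[i]));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        System.out.println(join("Smith, John", "16:15:11")); // prints "Smith, John",16:15:11
    }
}
```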
