Java CSVReader ignore commas in double quotes

I have a CSV file that I am having trouble parsing. I am using the opencsv library. Here is what my data looks like and what I am trying to achieve.
RPT_PE,CLASS,RPT_MKT,PROV_CTRCT,CENTER_NM,GK_TY,MBR_NM,MBR_PID
"20150801","NULL","33612","00083249P PCP602","JOE SMITH ARNP","NULL","FRANK, LUCAS E","50004655200"
The issue I am having is that the member name ("FRANK, LUCAS E") is being split into two columns when it should be one. Again, I'm using opencsv with a comma as the separator. Is there any way I can ignore the commas inside the double quotes?
public void loadCSV(String csvFile, String tableName,
boolean truncateBeforeLoad) throws Exception {
CSVReader csvReader = null;
if (null == this.connection) {
throw new Exception("Not a valid connection.");
}
try {
csvReader = new CSVReader(new FileReader(csvFile), this.seprator);
} catch (Exception e) {
e.printStackTrace();
throw new Exception("Error occured while executing file. "
+ e.getMessage());
}
String[] headerRow = csvReader.readNext();
if (null == headerRow) {
throw new FileNotFoundException(
"No columns defined in given CSV file."
+ "Please check the CSV file format.");
}
String questionmarks = StringUtils.repeat("?,", headerRow.length);
questionmarks = (String) questionmarks.subSequence(0, questionmarks
.length() - 1);
String query = SQL_INSERT.replaceFirst(TABLE_REGEX, tableName);
System.out.println("Base Query: " + query);
String headerRowMod = Arrays.toString(headerRow).replaceAll(", ]", "]");
String[] strArray = headerRowMod.split(",");
query = query
.replaceFirst(KEYS_REGEX, StringUtils.join(strArray, ","));
System.out.println("Add Headers: " + query);
query = query.replaceFirst(VALUES_REGEX, questionmarks);
System.out.println("Add questionmarks: " + query);
String[] nextLine;
Connection con = null;
PreparedStatement ps = null;
try {
con = this.connection;
con.setAutoCommit(false);
ps = con.prepareStatement(query);
if (truncateBeforeLoad) {
//delete data from table before loading csv
con.createStatement().execute("DELETE FROM " + tableName);
}
final int batchSize = 1000;
int count = 0;
Date date = null;
while ((nextLine = csvReader.readNext()) != null) {
System.out.println("Next Line: " + Arrays.toString(nextLine));
if (null != nextLine) {
int index = 1;
for (String string : nextLine) {
date = DateUtil.convertToDate(string);
if (null != date) {
ps.setDate(index++, new java.sql.Date(date
.getTime()));
} else {
ps.setString(index++, string);
}
}
ps.addBatch();
}
if (++count % batchSize == 0) {
ps.executeBatch();
}
}
ps.executeBatch(); // insert remaining records
con.commit();
} catch (SQLException | IOException e) {
con.rollback();
e.printStackTrace();
throw new Exception(
"Error occured while loading data from file to database."
+ e.getMessage());
} finally {
if (null != ps) {
ps.close();
}
if (null != con) {
con.close();
}
csvReader.close();
}
}
public char getSeprator() {
return seprator;
}
public void setSeprator(char seprator) {
this.seprator = seprator;
}
public char getQuoteChar() {
return quoteChar;
}
public void setQuoteChar(char quoteChar) {
this.quoteChar = quoteChar;
}
}

Did you try the following?
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"), ',');
I wrote the following program and it works for me; I got this result:
[20150801] [NULL] [33612] [00083249P PCP602] [JOE SMITH ARNP] [NULL]
[FRANK, LUCAS E] [50004655200]
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import au.com.bytecode.opencsv.CSVReader;
public class CVSTest {
/**
* @param args
*/
public static void main(String[] args) {
CSVReader reader = null;
try {
reader = new CSVReader(new FileReader(
"C:/Work/Dev/Projects/Pure_Test/Test/src/cvs"), ',');
} catch (FileNotFoundException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
String[] nextLine;
try {
while ((nextLine = reader.readNext()) != null) {
// nextLine[] is an array of values from the line
System.out.println("[" + nextLine[0] + "] [" + nextLine[1]
+ "] [" + nextLine[2] + "] [" + nextLine[3] + "] ["
+ nextLine[4] + "] [" + nextLine[5] + "] ["
+ nextLine[6] + "] [" + nextLine[7] + "]");
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}

According to the documentation, you can supply custom separator and quote characters in the constructor, which should deal with it:
CSVReader(Reader reader, char separator, char quotechar)
Construct your reader with , as separator and " as quotechar.
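For illustration, a minimal sketch of that constructor (the class name and file path are placeholders):

import java.io.FileReader;
import au.com.bytecode.opencsv.CSVReader;

public class QuoteAwareRead {
    public static void main(String[] args) throws Exception {
        // ',' is the separator and '"' the quote character, so commas inside
        // quoted fields such as "FRANK, LUCAS E" stay in a single column.
        CSVReader reader = new CSVReader(new FileReader("members.csv"), ',', '"');
        String[] row;
        while ((row = reader.readNext()) != null) {
            System.out.println(row[6]); // the member-name column from the sample data
        }
        reader.close();
    }
}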

It is simple to load your CSV as an SQL table into HSQLDB, then select rows from the table to insert into another database. HSQLDB handles commas inside quotes. You need to define your text source as "quoted". See this:
http://hsqldb.org/doc/2.0/guide/texttables-chapt.html
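As a rough sketch of that approach (assuming an in-process HSQLDB connection; the source properties fs, quoted and ignore_first come from the text-tables chapter linked above, so verify them against your HSQLDB version):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HsqldbTextTableSketch {
    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection("jdbc:hsqldb:file:csvdb", "SA", "");
        try (Statement st = con.createStatement()) {
            // Text table whose columns mirror the CSV header row.
            st.execute("CREATE TEXT TABLE member_csv ("
                    + "RPT_PE VARCHAR(20), CLASS VARCHAR(20), RPT_MKT VARCHAR(20), "
                    + "PROV_CTRCT VARCHAR(50), CENTER_NM VARCHAR(100), GK_TY VARCHAR(20), "
                    + "MBR_NM VARCHAR(100), MBR_PID VARCHAR(20))");
            // fs = field separator, quoted = honour double quotes, ignore_first = skip the header row.
            st.execute("SET TABLE member_csv SOURCE 'members.csv;fs=,;quoted=true;ignore_first=true'");
            // From here, SELECT from member_csv and insert the rows into the target database.
        }
        con.close();
    }
}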

Your case should be handled out of the box with no special configuration required.
If you can't make it work, then just switch to uniVocity-parsers to do this for you - it's twice as fast in comparison to OpenCSV, requires much less code and is packed with features.
CsvParserSettings settings = new CsvParserSettings(); // you have many configuration options here - check the tutorial.
CsvParser parser = new CsvParser(settings);
List<String[]> allRows = parser.parseAll(new FileReader(new File("C:/Work/Dev/Projects/Pure_Test/Test/src/cvs")));
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
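A slightly fuller sketch of the settings (the option names follow the uniVocity-parsers API; treat them as something to double-check against its tutorial):

import java.io.File;
import java.io.FileReader;
import java.util.List;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

public class UnivocitySketch {
    public static void main(String[] args) throws Exception {
        CsvParserSettings settings = new CsvParserSettings();
        settings.getFormat().setDelimiter(',');    // field separator
        settings.getFormat().setQuote('"');        // keeps "FRANK, LUCAS E" in one column
        settings.setHeaderExtractionEnabled(true); // skip the header row

        CsvParser parser = new CsvParser(settings);
        List<String[]> allRows = parser.parseAll(new FileReader(new File("members.csv")));
        System.out.println(allRows.size() + " data rows parsed");
    }
}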

Related

Android - Importing multiple .CSV Files to multiple tables in SQLite database

On my Android device I have a folder with multiple .CSV files.
I want to import all of them into my SQLite database, but each file must become a different table.
All .CSV files are simple. They have just one column.
Example:
File.CSV
12345
123
00000000
AnotherFile.CSV
XXXXX
ZZZZZZZZZZ
FFFF
Here is my method, and it is not working. I cannot understand why:
@TargetApi(Build.VERSION_CODES.M)
public void importaTabelas() {
//Check the read permission
if (checkSelfPermission(Manifest.permission.READ_EXTERNAL_STORAGE) == PackageManager.PERMISSION_GRANTED) {
try {
//Check if the folder exists
File importDir = new File (Environment.getExternalStoragePublicDirectory(Environment.DIRECTORY_DOWNLOADS) + "/ENEL/IMPORTADOS/");
if (!importDir.exists())
{
importDir.mkdirs();
}
//Read all file names
for (File f : importDir.listFiles()) {
if (f.isFile()) {
//Put the files names into variable nomeArq
nomeArq = f.getName();
//Take off the file extension .csv
if (nomeArq.indexOf(".") > 0)
nomeArq = nomeArq.substring(0, nomeArq.lastIndexOf("."));
SQLiteDatabase db = this.banco.getWritableDatabase();
try {
//Create table with the name of the .csv file
String criaTab = "CREATE TABLE IF NOT EXISTS " + nomeArq + " (id integer PRIMARY KEY AUTOINCREMENT, codigo varchar (50))";
db.execSQL(criaTab);
db.close();
} catch (SQLException e) {
e.printStackTrace();
}
//String for the file location
String fn = importDir + "/" + nomeArq + ".csv";
//Reads the file
FileReader fileReader = new FileReader(fn);
BufferedReader buffer = new BufferedReader(fileReader);
//ContentValues contentValues = new ContentValues();
String line = "";
//db.beginTransaction();
while ((line = buffer.readLine()) != null) {
//String[] colums = line.split("\t");
//String[] colums = line.split(";");
Toast.makeText(this, line, Toast.LENGTH_SHORT).show();
//contentValues.put("codigo", line);
//db.insert(nomeArq, null, contentValues);
db.execSQL("INSERT INTO " + nomeArq + " (codigo) VALUES ('" + line + "')");
}
//db.setTransactionSuccessful();
//db.endTransaction();
}
}
} catch (Exception e) {
Toast.makeText(this, "Catch!", Toast.LENGTH_SHORT).show();
e.printStackTrace();
}
}
else {
requestPermissions(new String[]{Manifest.permission.READ_EXTERNAL_STORAGE}, 1);
}
}
Would you help me to make it work? Thanks!
I refactored your code into a bunch of smaller methods. Each method is responsible for one thing, which is good practice; try to do that whenever you can.
I only made a few changes:
The line creating the FileReader now uses the File object directly (less error prone).
Changed the way you insert by building a single insert query, so you have fewer database accesses (i.e. like this: INSERT INTO table(codigo) VALUES ('XXXX'), ('ZZZZ'), ('FFFF'); )
Changed the text in the Toasts to better identify where your error comes from.
Try it out and see if you can pin down your error more easily.
(I did not try to compile the code, so you might have to tweak it a little, but it should be fine overall.)
main import method:
public void importaTabelas() {
//Check the read permission
if (checkSelfPermission(Manifest.permission.READ_EXTERNAL_STORAGE) == PackageManager.PERMISSION_GRANTED) {
try {
File importDir = makeDirs();
//Read all file names
for (File f : importDir.listFiles()) {
if (f.isFile()) {
importFile(f);
}
}
} catch (IOException e) {
Toast.makeText(this, "Could not import tables! " + e.getMessage(), Toast.LENGTH_SHORT).show();
e.printStackTrace();
}
} else {
requestPermissions(new String[]{Manifest.permission.READ_EXTERNAL_STORAGE}, 1);
}
}
Make the directories
private File makeDirs() {
//Check if the folder exists
File importDir = new File (Environment.getExternalStoragePublicDirectory(Environment.DIRECTORY_DOWNLOADS) + "/ENEL/IMPORTADOS/");
if (!importDir.exists()) {
importDir.mkdirs();
}
return importDir;
}
Import a single file
private void importFile(File f) {
try {
SQLiteDatabase db = this.banco.getWritableDatabase();
//Put the files names into variable nomeArq
String nomeArq = f.getName();
//Take off the file extension .csv
if (nomeArq.indexOf(".") > 0)
nomeArq = nomeArq.substring(0, nomeArq.lastIndexOf("."));
createTable(db, nomeArq);
String insertQuery = buildImportQuery(f, nomeArq);
db.execSQL(insertQuery);
db.close();
} catch (SQLException e) {
Toast.makeText(this, "Could not import file. " + e.getMessage(), Toast.LENGTH_SHORT).show();
e.printStackTrace();
}
}
Builds the insert query for a specific file
private String buildImportQuery(File f, String nomeArq) {
StringBuilder sb = new StringBuilder();
try {
//Reads the file
FileReader fileReader = new FileReader(f);
BufferedReader buffer = new BufferedReader(fileReader);
String line;
sb.append("INSERT INTO " + nomeArq + " (codigo) VALUES ");
boolean addComma = false;
while ((line = buffer.readLine()) != null) {
if(line.length() > 0) {
if(addComma) {
sb.append(",");
}
sb.append("('" + line.trim() + "')");
addComma = true;
}
}
sb.append(";");
} catch (IOException e) {
Toast.makeText(this, "Could not write query. " + e.getMessage(), Toast.LENGTH_SHORT).show();
e.printStackTrace();
}
return sb.toString();
}
Creates a single table
private void createTable(SQLiteDatabase db, String tableName) {
try {
//Create table with the name of the .csv file
String criaTab = "CREATE TABLE IF NOT EXISTS " + tableName + " (id integer PRIMARY KEY AUTOINCREMENT, codigo varchar (50))";
db.execSQL(criaTab);
// Do not close the db here; importFile() still needs it for the insert.
} catch (Exception e) {
Toast.makeText(this, "Could not create table " + tableName + "." + e.getMessage(), Toast.LENGTH_SHORT).show();
e.printStackTrace();
}
}
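As a side note, if a line ever contains a quote character, the string concatenation in buildImportQuery will produce broken SQL. A sketch of the same bulk insert using a transaction and bound arguments (a hypothetical helper, same table layout) could look like this:

private void importFileSafely(SQLiteDatabase db, File f, String tableName) throws IOException {
    BufferedReader buffer = new BufferedReader(new FileReader(f));
    db.beginTransaction();
    try {
        String line;
        while ((line = buffer.readLine()) != null) {
            if (line.trim().length() > 0) {
                // Bound arguments escape quotes in the data for us.
                db.execSQL("INSERT INTO " + tableName + " (codigo) VALUES (?)",
                        new Object[]{line.trim()});
            }
        }
        db.setTransactionSuccessful();
    } finally {
        db.endTransaction();
        buffer.close();
    }
}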

Doesn't insert csv file records in java (jdbc)

I have an error where my CSVLoader does not insert the data into my database; this is what I get in my console:
FieldName (UploadDownloadFileServlet) = fileName
FileName (UploadDownloadFileServlet) = prueba2.csv
ContentType (UploadDownloadFileServlet) = application/vnd.ms-excel
Size in bytes (UploadDownloadFileServlet) = 47
Absolute Path at server (UploadDownloadFileServlet) = C:\Users\SISTEMAS\workspaceEclipse.metadata.plugins\org.eclipse.wst.server.core\tmp0\wtpwebapps\SAC-sin-impresion\uploadedCsvFiles\prueba2.csv
Query: INSERT INTO MATRICULA(C:UsersSISTEMASworkspaceEclipse.metadata.pluginsorg.eclipse.wst.server.coretmp0wtpwebappsSAC-sin-impresionuploadedCsvFilesprueba2.csv) VALUES(?)
[CSVLoader]: 0 records loaded into MATRICULA DB table
THIS IS MY CODE:
public void loadCSV(InputStream csvFile, String tableName,
boolean truncateBeforeLoad) throws Exception {
CSVReader csvReader = null;
if(null == this.connection) {
throw new Exception("Not a valid connection.");
}
try {
/* Modified by rammar.
*
* I was having issues with the CSVReader using the "\" to escape characters.
* A MySQL CSV file contains quote-enclosed fields and non-quote-enclosed NULL
* values written as "\N". The CSVReader was removing the "\". To detect "\N"
* I must remove the escape character, and the only character you can replace
* it with that you are pretty much guaranteed will not be used to escape
* text is '\0'.
* I read this on:
* http://stackoverflow.com/questions/6008395/opencsv-in-java-ignores-backslash-in-a-field-value
* based on:
* http://sourceforge.net/p/opencsv/support-requests/5/
*/
// PREVIOUS VERSION: csvReader = new CSVReader(new FileReader(csvFile), this.seprator);
csvReader = new CSVReader(new InputStreamReader(csvFile), this.seprator, '"', '\0');
} catch (Exception e) {
e.printStackTrace();
throw new Exception("Error occured while executing file. "
+ e.getMessage());
}
String[] headerRow = csvReader.readNext();
if (null == headerRow) {
throw new FileNotFoundException(
"No columns defined in given CSV file." +
"Please check the CSV file format.");
}
String questionmarks = StringUtils.repeat("?,", headerRow.length);
questionmarks = (String) questionmarks.subSequence(0, questionmarks
.length() - 1);
/* NOTE from Ron: Header column names must match SQL table fields */
String query = SQL_INSERT.replaceFirst(TABLE_REGEX, tableName);
query = query
.replaceFirst(KEYS_REGEX, StringUtils.join(headerRow, ","));
query = query.replaceFirst(VALUES_REGEX, questionmarks);
System.out.println("Query: " + query); // Modified by rammar to suppress output
String[] nextLine;
Connection con = null;
PreparedStatement ps = null;
try {
con = this.connection;
con.setAutoCommit(false);
ps = con.prepareStatement(query);
if(truncateBeforeLoad) {
//delete data from table before loading csv
con.createStatement().execute("DELETE FROM " + tableName);
}
final int batchSize = 1000;
int count = 0;
Date date = null;
while ((nextLine = csvReader.readNext()) != null) {
if (null != nextLine) {
int index = 1;
for (String string : nextLine) {
date = DateUtil.convertToDate(string);
if (null != date) {
ps.setDate(index++, new java.sql.Date(date
.getTime()));
} else {
/* Section modified by rammar to allow NULL values
* to be input into the DB. */
if (string.length() > 0 && !string.equals("\\N")) {
ps.setString(index++, string);
} else {
ps.setNull(index++, Types.VARCHAR);
//ps.setString(index++, null); // can use this syntax also - not sure which is better
}
}
}
ps.addBatch();
}
if (++count % batchSize == 0) {
ps.executeBatch();
}
}
ps.executeBatch(); // insert remaining records
System.out.println("[" + this.getClass().getSimpleName() + "]: " +
count + " records loaded into " + tableName + " DB table");
con.commit();
} catch (Exception e) {
con.rollback();
e.printStackTrace();
throw new Exception(
"Error occured while loading data from file to database."
+ e.getMessage());
} finally {
/*if (null != ps)
ps.close();
*/
/*if (null != con)
con.close();*/
csvReader.close();
}
}

OpenCSV + JMS/MDB behavior + performance issue

I have a web application, running under Glassfish 4.1, that contains a couple of features that require JMS/MDB.
In particular, I am having problems with the generation of a report using JMS/MDB, that is, obtaining data from a table and dumping it into a file.
This is what happens: I have a JMS/MDB message that does a couple of tasks in an Oracle database and, after the final result is in a table, I would like to obtain a CSV report from that table (which usually has 30M+ records).
So, inside the JMS/MDB, this is what happens to generate the report:
public boolean handleReportContent() {
Connection conn = null;
try {
System.out.println("Handling report content... " + new Date());
conn = DriverManager.getConnection(data.getUrl(), data.getUsername(), data.getPassword());
int reportLine = 1;
String sql = "SELECT FIELD_NAME, VALUE_A, VALUE_B, DIFFERENCE FROM " + data.getDbTableName() + " WHERE SET_PK IN ( SELECT DISTINCT SET_PK FROM " + data.getDbTableName() + " WHERE IS_VALID=? )";
PreparedStatement ps = conn.prepareStatement(sql);
ps.setBoolean(1, false);
ResultSet rs = ps.executeQuery();
List<ReportLine> lst = new ArrayList<>();
int columns = data.getLstFormats().size();
int size = 0;
int linesDone = 0;
while (rs.next()) {
ReportLine rl = new ReportLine(reportLine, rs.getString("FIELD_NAME"), rs.getString("VALUE_A"), rs.getString("VALUE_B"), rs.getString("DIFFERENCE"));
lst.add(rl);
linesDone = columns * (reportLine - 1);
size++;
if ((size - linesDone) == columns) {
reportLine++;
if (lst.size() > 4000) {
appendReportContentNew(lst);
lst.clear();
}
}
}
if (lst.size() > 0) {
appendReportContentNew(lst);
lst.clear();
}
ps.close();
conn.close();
return true;
} catch (Exception e) {
System.out.println("exception handling report content new: " + e.toString());
return false;
}
}
This is working; I am aware it is slow and inefficient, and most likely there is a better option to perform the same operation.
What this method does is:
collect the data from the ResultSet;
dump it into a List;
for every 4K objects, call the method appendReportContentNew();
dump the data in the List to the file.
public void appendReportContentNew(List<ReportLine> lst) {
File f = new File(data.getJobFilenamePath());
try {
if (!f.exists()) {
f.createNewFile();
}
FileWriter fw = new FileWriter(data.getJobFilenamePath(), true);
BufferedWriter bw = new BufferedWriter(fw);
for (ReportLine rl : lst) {
String rID = "R" + rl.getLine();
String fieldName = rl.getFieldName();
String rline = rID + "," + fieldName + "," + rl.getValue1() + "," + rl.getValue2() + "," + rl.getDifference();
bw.append(rline);
bw.append("\n");
}
bw.close();
} catch (IOException e) {
System.out.println("exception appending report content: " + e.toString());
}
}
With this method, in 20 minutes it wrote 800k lines (a 30 MB file); the file usually goes to 4 GB or more. This is what I want to improve, if possible.
So I decided to try OpenCSV, and I came up with the following method:
public boolean handleReportContentv2() {
Connection conn = null;
try {
FileWriter fw = new FileWriter(data.getJobFilenamePath(), true);
System.out.println("Handling report content v2... " + new Date());
conn = DriverManager.getConnection(data.getUrl(), data.getUsername(), data.getPassword());
String sql = "SELECT NLINE, FIELD_NAME, VALUE_A, VALUE_B, DIFFERENCE FROM " + data.getDbTableName() + " WHERE SET_PK IN ( SELECT DISTINCT SET_PK FROM " + data.getDbTableName() + " WHERE IS_VALID=? )";
PreparedStatement ps = conn.prepareStatement(sql);
ps.setBoolean(1, false);
ps.setFetchSize(500);
ResultSet rs = ps.executeQuery();
BufferedWriter out = new BufferedWriter(fw);
CSVWriter writer = new CSVWriter(out, ',', CSVWriter.NO_QUOTE_CHARACTER);
writer.writeAll(rs, false);
fw.close();
writer.close();
rs.close();
ps.close();
conn.close();
return true;
} catch (Exception e) {
System.out.println("exception handling report content v2: " + e.toString());
return false;
}
}
So I am collecting all the data from the ResultSet and dumping it into the CSVWriter. In the same 20 minutes, this operation only wrote 7k lines.
But if I use the same method outside the JMS/MDB, the difference is incredible: in just the first 4 minutes it wrote 3M rows to the file.
In the same 20 minutes, it generated a file of 500 MB+.
Clearly, using OpenCSV is by far the best option if I want to improve the performance. My question is: why doesn't it perform the same way inside the JMS/MDB?
If that is not possible, is there any other way to improve this task?
I appreciate any feedback and help on this matter; I am trying to understand why the behavior/performance is different inside and outside the JMS/MDB.
EDIT:
@MessageDriven(activationConfig = {
@ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
@ActivationConfigProperty(propertyName = "destinationLookup", propertyValue = "MessageQueue")})
public class JobProcessorBean implements MessageListener {
private static final int TYPE_A_ID = 0;
private static final int TYPE_B_ID = 1;
@Inject
JobDao jobsDao;
@Inject
private AsyncReport generator;
public JobProcessorBean() {
}
@Override
public void onMessage(Message message) {
int jobId = -1;
ObjectMessage msg = (ObjectMessage) message;
try {
boolean valid = true;
JobWrapper jobw = (JobWrapper) msg.getObject();
jobId = jobw.getJob().getJobId().intValue();
switch (jobw.getJob().getJobTypeId().getJobTypeId().intValue()) {
case TYPE_A_ID:
jobsDao.updateJobStatus(jobId, 0);
valid = processTask1(jobw);
if(valid) {
jobsDao.updateJobFileName(jobId, generator.getData().getJobFilename());
System.out.println(":: :: JOBW FileName :: "+generator.getData().getJobFilename());
jobsDao.updateJobStatus(jobId, 0);
}
else {
System.out.println("error...");
jobsDao.updateJobStatus(jobId, 1);
}
boolean validfile = handleReportContentv2();
if(!validfile) {
System.out.println("error file...");
jobsDao.updateJobStatus(jobId, 1);
}
break;
case TYPE_B_ID:
(...)
}
if(valid) {
jobsDao.updateJobStatus(jobw.getJob().getJobId().intValue(), 2); //updated to complete
}
System.out.println("***********---------Finished JOB " + jobId + "-----------****************");
System.out.println();
jobw = null;
} catch (JMSException ex) {
Logger.getLogger(JobProcessorBean.class.getName()).log(Level.SEVERE, null, ex);
jobsDao.updateJobStatus(jobId, 1);
} catch (Exception ex) {
Logger.getLogger(JobProcessorBean.class.getName()).log(Level.SEVERE, null, ex);
jobsDao.updateJobStatus(jobId, 1);
} finally {
msg = null;
}
}
private boolean processTask1(JobWrapper jobw) throws Exception {
boolean valid = true;
jobsDao.updateJobStatus(jobw.getJob().getJobId().intValue(), 0);
generator.setData(jobw.getData());
valid = generator.deployGenerator();
if(!valid) return false;
jobsDao.updateJobParameters(jobw.getJob().getJobId().intValue(),new ReportContent());
Logger.getLogger(JobProcessorBean.class.getName()).log(Level.INFO, null, "Job Finished");
return true;
}
}
So if the same method, handleReportContent(), is executed inside generator.deployGenerator(), it gets those slow results. If I wait for everything inside that method and create the file in this bean, JobProcessorBean, it is much faster. I am just trying to figure out why/how the behavior ends up like this.
Adding the @TransactionAttribute(NOT_SUPPORTED) annotation on the bean might solve the problem (and it did, as your comment indicates).
Why is this so? Because if you don't put any transactional annotation on a message-driven bean, the default becomes @TransactionAttribute(REQUIRED) (so everything the bean does is supervised by a transaction manager). Apparently, this slows things down.
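A minimal sketch of what that looks like on the bean (standard javax.ejb annotations; the rest of the class stays as in the question):

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.ejb.TransactionAttribute;
import javax.ejb.TransactionAttributeType;
import javax.jms.Message;
import javax.jms.MessageListener;

@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destinationLookup", propertyValue = "MessageQueue")})
@TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED) // run onMessage outside a container-managed transaction
public class JobProcessorBean implements MessageListener {
    @Override
    public void onMessage(Message message) {
        // ... same body as in the question ...
    }
}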

Update MySQL table using data from a text file through Java

I have a text file with four lines; each line contains comma-separated values.
My file is:
Raj,raj34@myown.com,123455
kumar,kumar@myown.com,23453
shilpa,shilpa@myown.com,765468
suraj,suraj@myown.com,876567
and I have a MySQL table which contains four fields
firstname   lastname   email              phno
---------   --------   ----------------   -------
Raj         babu       raj34@hisown.com   2343245
kumar       selva      kumar@myown.com    23453
shilpa      murali     shilpa@myown.com   765468
suraj       abd        suraj@myown.com    876567
Now I want to update my table using the data in the above text file through Java.
I have tried using a BufferedReader to read from the file, used the split method with a comma as the delimiter, and stored the result in an array. But it is not working. Any help is appreciated.
This is what I have tried so far
void readingFile()
{
try
{
File f1 = new File("TestFile.txt");
FileReader fr = new FileReader(f1);
BufferedReader br = new BufferedReader(fr);
String strln = null;
strln = br.readLine();
while((strln=br.readLine())!=null)
{
// System.out.println(strln);
arr = strln.split(",");
strfirstname = arr[0];
strlastname = arr[1];
stremail = arr[2];
strphno = arr[3];
System.out.println(strfirstname + " " + strlastname + " " + stremail +" "+ strphno);
}
// for(String i : arr)
// {
// }
br.close();
fr.close();
}
catch(IOException e)
{
System.out.println("Cannot read from File." + e);
}
try
{
st = conn.createStatement();
String query = "update sampledb set email = stremail,phno =strphno where firstname = strfirstname ";
st.executeUpdate(query);
st.close();
System.out.println("sampledb Table successfully updated.");
}
catch(Exception e3)
{
System.out.println("Unable to Update sampledb table. " + e3);
}
}
and the output i got is:
Ganesh Pandiyan ganesh1@myown.com 9591982389
Dass Jeyan jeyandas@myown.com 9689523645
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
Gowtham Selvan gowthams@myown.com 9894189423
at TemporaryPackages.FileReadAndUpdateTable.readingFile(FileReadAndUpdateTable.java:35)
at TemporaryPackages.FileReadAndUpdateTable.main(FileReadAndUpdateTable.java:72)
Java Result: 1
@varadaraj:
This is your code:
String stremail,strphno,strfirstname,strlastname;
// String[] arr;
Connection conn;
Statement st;
void readingFile()
{
try {
BufferedReader bReader= new BufferedReader(new FileReader("TestFile.txt"));
String fileValues;
while ((fileValues = bReader.readLine()) != null)
{
String[] values=fileValues .split(",");
strfirstname = values[0];
// strlastname = values[1];
stremail = values[1];
strphno = values[2];
System.out.println(strfirstname + " " + strlastname + " " + stremail +" "+ strphno);
}
bReader.close();
} catch (IOException e) {
System.out.println("File Read Error");
}
// for(String i : arr)
// {
// }
try
{
st = conn.createStatement();
String query = "update sampledb set email = stremail,phno =strphno where firstname = strfirstname ";
st.executeUpdate(query);
st.close();
System.out.println("sampledb Table successfully updated.");
}
catch(Exception e3)
{
System.out.println("Unable to Update sampledb table. " + e3);
}
}
What you have looks like a CSV file; you may consider libraries like Super CSV to help you read and parse the file.
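For illustration, a small sketch with Super CSV's CsvListReader (the file name is a placeholder and the column order is assumed from your sample data):

import java.io.FileReader;
import java.util.List;
import org.supercsv.io.CsvListReader;
import org.supercsv.io.ICsvListReader;
import org.supercsv.prefs.CsvPreference;

public class SuperCsvSketch {
    public static void main(String[] args) throws Exception {
        ICsvListReader reader = new CsvListReader(new FileReader("TestFile.txt"),
                CsvPreference.STANDARD_PREFERENCE);
        List<String> row;
        while ((row = reader.read()) != null) {
            String firstname = row.get(0);
            String email = row.get(1);
            String phno = row.get(2);
            System.out.println(firstname + " " + email + " " + phno);
        }
        reader.close();
    }
}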
You are getting an ArrayIndexOutOfBoundsException when trying to access index 1, i.e. the lastname field value, so check whether any of the lines in your text file are missing data at that index.
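A small sketch of a defensive length check before indexing into the split result (purely illustrative, matching the loop in the question):

String[] arr = strln.split(",");
if (arr.length < 4) {
    // Skip or log malformed lines instead of letting them throw ArrayIndexOutOfBoundsException.
    System.out.println("Skipping malformed line: " + strln);
} else {
    String strfirstname = arr[0];
    String strlastname = arr[1];
    String stremail = arr[2];
    String strphno = arr[3];
    System.out.println(strfirstname + " " + strlastname + " " + stremail + " " + strphno);
}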
try this
public class FileReaderTesting {
static String stremail;
static String strphno;
static String strfirstname;
static String strlastname;
static Connection conn;
static Statement st;
public static void main(String[] args) {
try {
BufferedReader bReader= new BufferedReader(new FileReader("C:\\fileName.txt"));
String fileValues;
while ((fileValues = bReader.readLine()) != null)
{
String[] values=fileValues .split(",");
strfirstname = values[0];
// strlastname = values[1];
stremail = values[1];
strphno = values[2];
System.out.println(strfirstname + " " + stremail +" "+ strphno);
st = conn.createStatement();
String query = "update sampledb set email = '"+stremail+"',pno = '"+strphno+"' where firstname = '"+strfirstname+"' ";
System.out.println(query);
st.executeUpdate(query);
st.close();
System.out.println("sampledb Table successfully updated.");
}
bReader.close();
} catch (IOException e) {
System.out.println("File Read Error");
}
catch(Exception e3)
{
System.out.println("Unable to Update sampledb table. " + e3);
}
}
}
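As a follow-up on the design: concatenating values into the SQL string breaks as soon as a value contains a quote character. A sketch of the same update with a PreparedStatement (assuming the same conn and the column names from the question) would be:

String query = "UPDATE sampledb SET email = ?, phno = ? WHERE firstname = ?";
try (PreparedStatement ps = conn.prepareStatement(query)) {
    ps.setString(1, stremail);
    ps.setString(2, strphno);
    ps.setString(3, strfirstname);
    int updated = ps.executeUpdate();
    System.out.println(updated + " row(s) updated for " + strfirstname);
}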

How to improve the speed of this code?

I'm trying to import all googlebooks-1gram files into a postgresql database. I wrote the following Java code for that:
public class ToPostgres {
public static void main(String[] args) throws Exception {
String filePath = "./";
List<String> files = new ArrayList<String>();
for (int i =0; i < 10; i++) {
files.add(filePath+"googlebooks-eng-all-1gram-20090715-"+i+".csv");
}
Connection c = null;
try {
c = DriverManager.getConnection("jdbc:postgresql://localhost/googlebooks",
"postgres", "xxxxxx");
} catch (SQLException e) {
e.printStackTrace();
}
if (c != null) {
try {
PreparedStatement wordInsert = c.prepareStatement(
"INSERT INTO words (word) VALUES (?)", Statement.RETURN_GENERATED_KEYS
);
PreparedStatement countInsert = c.prepareStatement(
"INSERT INTO wordcounts (word_id, \"year\", total_count, total_pages, total_books) " +
"VALUES (?,?,?,?,?)"
);
String lastWord = "";
Long lastId = -1L;
for (String filename: files) {
BufferedReader input = new BufferedReader(new FileReader(new File(filename)));
String line = "";
while ((line = input.readLine()) != null) {
String[] data = line.split("\t");
Long id = -1L;
if (lastWord.equals(data[0])) {
id = lastId;
} else {
wordInsert.setString(1, data[0]);
wordInsert.executeUpdate();
ResultSet resultSet = wordInsert.getGeneratedKeys();
if (resultSet != null && resultSet.next())
{
id = resultSet.getLong(1);
}
}
countInsert.setLong(1, id);
countInsert.setInt(2, Integer.parseInt(data[1]));
countInsert.setInt(3, Integer.parseInt(data[2]));
countInsert.setInt(4, Integer.parseInt(data[3]));
countInsert.setInt(5, Integer.parseInt(data[4]));
countInsert.executeUpdate();
lastWord = data[0];
lastId = id;
}
}
} catch (SQLException e) {
e.printStackTrace();
}
}
}
}
However, after running this for ~3 hours it had only placed 1,000,000 entries in the wordcounts table. When I check the number of lines in the entire 1gram dataset, it's 500,000,000 lines. So importing everything would take about 62.5 days. I could accept it importing in about a week, but 2 months? I think I'm doing something seriously wrong here (I do have a server that runs 24/7, so I can actually run it for this long, but faster would be nice XD).
EDIT: This code is how I solved it:
public class ToPostgres {
public static void main(String[] args) throws Exception {
String filePath = "./";
List<String> files = new ArrayList<String>();
for (int i =0; i < 10; i++) {
files.add(filePath+"googlebooks-eng-all-1gram-20090715-"+i+".csv");
}
Connection c = null;
try {
c = DriverManager.getConnection("jdbc:postgresql://localhost/googlebooks",
"postgres", "xxxxxx");
} catch (SQLException e) {
e.printStackTrace();
}
if (c != null) {
c.setAutoCommit(false);
try {
PreparedStatement wordInsert = c.prepareStatement(
"INSERT INTO words (id, word) VALUES (?,?)"
);
PreparedStatement countInsert = c.prepareStatement(
"INSERT INTO wordcounts (word_id, \"year\", total_count, total_pages, total_books) " +
"VALUES (?,?,?,?,?)"
);
String lastWord = "";
Long id = 0L;
for (String filename: files) {
BufferedReader input = new BufferedReader(new FileReader(new File(filename)));
String line = "";
int i = 0;
while ((line = input.readLine()) != null) {
String[] data = line.split("\t");
if (!lastWord.equals(data[0])) {
id++;
wordInsert.setLong(1, id);
wordInsert.setString(2, data[0]);
wordInsert.executeUpdate();
}
countInsert.setLong(1, id);
countInsert.setInt(2, Integer.parseInt(data[1]));
countInsert.setInt(3, Integer.parseInt(data[2]));
countInsert.setInt(4, Integer.parseInt(data[3]));
countInsert.setInt(5, Integer.parseInt(data[4]));
countInsert.executeUpdate();
lastWord = data[0];
if (i % 10000 == 0) {
c.commit();
}
if (i % 100000 == 0) {
System.out.println(i+" mark file "+filename);
}
i++;
}
c.commit();
}
} catch (SQLException e) {
e.printStackTrace();
}
}
}
}
I reached 1.5 million rows in about 15 minutes now. That's fast enough for me, thanks all!
JDBC connections have autocommit enabled by default, which carries a per-statement overhead. Try disabling it:
c.setAutoCommit(false)
then commit in batches, something along the lines of:
long ops = 0;
for(String filename : files) {
// ...
while ((line = input.readLine()) != null) {
// insert some stuff...
ops ++;
if(ops % 1000 == 0) {
c.commit();
}
}
}
c.commit();
If your table has indexes, it might be faster to delete them, insert the data, and recreate the indexes later.
Setting autocommit off and doing a manual commit every 10,000 records or so (look into the documentation for a reasonable value; there is some limit) could speed things up as well.
Generating the index/foreign key yourself and keeping track of it should be faster than wordInsert.getGeneratedKeys(), but I'm not sure whether that is possible with your content.
There is an approach called 'bulk insert'. I don't remember the details, but it's a starting point for a search.
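For reference, a sketch of a bulk load with the PostgreSQL JDBC driver's CopyManager (the staging table raw_1gram is a hypothetical table matching the 1gram file columns; splitting the rows into words/wordcounts is then left to plain SQL):

import java.io.FileReader;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

// 'c' is the existing java.sql.Connection from the question.
CopyManager copyManager = c.unwrap(PGConnection.class).getCopyAPI();
// Text format defaults to tab-delimited input, which matches the 1gram files.
long rows = copyManager.copyIn(
        "COPY raw_1gram FROM STDIN",
        new FileReader("googlebooks-eng-all-1gram-20090715-0.csv"));
System.out.println(rows + " rows bulk-loaded into the staging table");
// Then INSERT ... SELECT from raw_1gram into words and wordcounts.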
Write it to use threading, running 4 threads at the same time, or split it up into sections (read from a config file) and distribute it to X machines and have them fetch the data together.
Use batch statements to execute multiple inserts at the same time, rather than one INSERT at a time.
In addition, I would remove the part of your algorithm which updates the word count after each insert into the words table; instead, just calculate all of the word counts once inserting the words is complete.
Another approach would be to do bulk inserts rather than single inserts. See this question: What's the fastest way to do a bulk insert into Postgres? for more information.
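A sketch of what batching the count inserts could look like, reusing countInsert from the question (word-id handling omitted; the batch size is arbitrary):

final int batchSize = 1000;
int pending = 0;
while ((line = input.readLine()) != null) {
    String[] data = line.split("\t");
    // ... resolve the word id as before ...
    countInsert.setLong(1, id);
    countInsert.setInt(2, Integer.parseInt(data[1]));
    countInsert.setInt(3, Integer.parseInt(data[2]));
    countInsert.setInt(4, Integer.parseInt(data[3]));
    countInsert.setInt(5, Integer.parseInt(data[4]));
    countInsert.addBatch();             // queue the row instead of executing it immediately
    if (++pending % batchSize == 0) {
        countInsert.executeBatch();     // send 1000 inserts in one round trip
    }
}
countInsert.executeBatch();             // flush the remainder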
Create threads
String lastWord = "";
Long lastId = -1L;
PreparedStatement wordInsert;
PreparedStatement countInsert ;
public class ToPostgres {
public void main(String[] args) throws Exception {
String filePath = "./";
List<String> files = new ArrayList<String>();
for (int i =0; i < 10; i++) {
files.add(filePath+"googlebooks-eng-all-1gram-20090715-"+i+".csv");
}
Connection c = null;
try {
c = DriverManager.getConnection("jdbc:postgresql://localhost/googlebooks",
"postgres", "xxxxxx");
} catch (SQLException e) {
e.printStackTrace();
}
if (c != null) {
try {
wordInsert = c.prepareStatement(
"INSERT INTO words (word) VALUES (?)", Statement.RETURN_GENERATED_KEYS
);
countInsert = c.prepareStatement(
"INSERT INTO wordcounts (word_id, \"year\", total_count, total_pages, total_books) " +
"VALUES (?,?,?,?,?)"
);
for (String filename: files) {
new MyThread(filename). start();
}
} catch (SQLException e) {
e.printStackTrace();
}
}
}
}
class MyThread extends Thread{
String file;
public MyThread(String file) {
this.file = file;
}
@Override
public void run() {
try {
super.run();
BufferedReader input = new BufferedReader(new FileReader(new File(file)));
String line = "";
while ((line = input.readLine()) != null) {
String[] data = line.split("\t");
Long id = -1L;
if (lastWord.equals(data[0])) {
id = lastId;
} else {
wordInsert.setString(1, data[0]);
wordInsert.executeUpdate();
ResultSet resultSet = wordInsert.getGeneratedKeys();
if (resultSet != null && resultSet.next())
{
id = resultSet.getLong(1);
}
}
countInsert.setLong(1, id);
countInsert.setInt(2, Integer.parseInt(data[1]));
countInsert.setInt(3, Integer.parseInt(data[2]));
countInsert.setInt(4, Integer.parseInt(data[3]));
countInsert.setInt(5, Integer.parseInt(data[4]));
countInsert.executeUpdate();
lastWord = data[0];
lastId = id;
}
} catch (NumberFormatException e) {
e.printStackTrace();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (SQLException e) {
e.printStackTrace();
}
}
}
