Parse large text files and move the data into a database - java

I have a fairly large text file, around 1.5 GB. I have to parse it line by line and insert the lines into a Derby database. I have read a lot of forum posts about performance and about how to parse the file. I benchmarked each step: reading and parsing a line takes about 1 ms. However, I also have to check whether the line I'm about to insert already exists, and if it does, update it instead. That part takes around 9 ms.
In total that is 10 ms per line, which is far too much given that the file contains around 10 million rows (roughly 28 hours for the whole file).
I'm using a PreparedStatement for the queries.
Is there any way I can speed up the query part of my code?

Did you turn off autocommit?
dbConnection.setAutoCommit(false);
Use batch inserts instead of inserting rows one by one, like this:
Connection dbConnection = null;
PreparedStatement preparedStatement = null;
String insertTableSQL = "INSERT INTO DBUSER"
+ "(USER_ID, USERNAME, CREATED_BY, CREATED_DATE) VALUES"
+ "(?,?,?,?)";
try {
dbConnection = getDBConnection();
preparedStatement = dbConnection.prepareStatement(insertTableSQL);
dbConnection.setAutoCommit(false);
preparedStatement.setInt(1, 101);
preparedStatement.setString(2, "mkyong101");
preparedStatement.setString(3, "system");
preparedStatement.setTimestamp(4, getCurrentTimeStamp());
preparedStatement.addBatch();
preparedStatement.setInt(1, 102);
preparedStatement.setString(2, "mkyong102");
preparedStatement.setString(3, "system");
preparedStatement.setTimestamp(4, getCurrentTimeStamp());
preparedStatement.addBatch();
preparedStatement.setInt(1, 103);
preparedStatement.setString(2, "mkyong103");
preparedStatement.setString(3, "system");
preparedStatement.setTimestamp(4, getCurrentTimeStamp());
preparedStatement.addBatch();
preparedStatement.executeBatch();
dbConnection.commit();
System.out.println("Record is inserted into DBUSER table!");
} catch (SQLException e) {
System.out.println(e.getMessage());
dbConnection.rollback();
} finally {
if (preparedStatement != null) {
preparedStatement.close();
}
if (dbConnection != null) {
dbConnection.close();
}
}
Have a look at the Derby tuning guide: https://builds.apache.org/job/Derby-docs/lastSuccessfulBuild/artifact/trunk/out/tuning/tuningderby.pdf
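For a file with millions of lines, queuing every row in one batch can use a lot of memory, so a common variation is to flush the batch every few thousand rows. The following is a minimal sketch of that idea, not part of the original answer: the class name BatchLoader, the method load, and the assumption that each line maps to a single column are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical helper: streams lines into a JDBC batch, flushing every batchSize rows. */
final class BatchLoader {

    /**
     * Reads lines from the reader, binds each one to parameter 1 of the
     * statement, and sends the queued rows to the database in chunks.
     * Returns the number of lines that were queued.
     */
    static long load(BufferedReader reader, PreparedStatement insert, int batchSize)
            throws IOException, SQLException {
        long rows = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            insert.setString(1, line); // assumption: one column per line
            insert.addBatch();
            rows++;
            if (rows % batchSize == 0) {
                insert.executeBatch(); // flush a full chunk
            }
        }
        if (rows % batchSize != 0) {
            insert.executeBatch(); // flush the remaining partial chunk
        }
        return rows;
    }
}
```

With autocommit turned off, the caller would wrap the file in a BufferedReader, call load(reader, preparedStatement, 1000) or similar, and then call dbConnection.commit(). The best batch size depends on the driver and the data, so it is worth measuring a few values.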

Since you are already using SQLiteStatement, the only other thing I can think of is to make sure you are using BufferedInputStream / BufferedOutputStream for your I/O operations.
Edit: my bad, this answer is for Android development.

Related

batch preparedstatement with different sql queries

I found existing questions similar to this one, but none of them had a clear answer.
A normal batch PreparedStatement with a single SQL query looks something like this:
private static void batchInsertRecordsIntoTable() throws SQLException {
Connection dbConnection = null;
PreparedStatement preparedStatement = null;
String insertTableSQL = "INSERT INTO DBUSER"
+ "(USER_ID, USERNAME, CREATED_BY, CREATED_DATE) VALUES"
+ "(?,?,?,?)";
try {
dbConnection = getDBConnection();
preparedStatement = dbConnection.prepareStatement(insertTableSQL);
dbConnection.setAutoCommit(false);
preparedStatement.setInt(1, 101);
preparedStatement.setString(2, "mkyong101");
preparedStatement.setString(3, "system");
preparedStatement.setTimestamp(4, getCurrentTimeStamp());
preparedStatement.addBatch();
preparedStatement.setInt(1, 102);
preparedStatement.setString(2, "mkyong102");
preparedStatement.setString(3, "system");
preparedStatement.setTimestamp(4, getCurrentTimeStamp());
preparedStatement.addBatch();
preparedStatement.setInt(1, 103);
preparedStatement.setString(2, "mkyong103");
preparedStatement.setString(3, "system");
preparedStatement.setTimestamp(4, getCurrentTimeStamp());
preparedStatement.addBatch();
preparedStatement.executeBatch();
dbConnection.commit();
System.out.println("Record is inserted into DBUSER table!");
} catch (SQLException e) {
System.out.println(e.getMessage());
dbConnection.rollback();
} finally {
if (preparedStatement != null) {
preparedStatement.close();
}
if (dbConnection != null) {
dbConnection.close();
}
}
}
Taken from: http://www.mkyong.com/jdbc/jdbc-preparedstatement-example-batch-update/
However, I'm looking for a way to run batches with different SQL queries, e.g. INSERT INTO TABLE A and INSERT INTO TABLE B, without the risk of SQL injection attacks. I know that prepared statements are the preferred way to avoid such attacks, but I don't know how to batch different SQL queries with them.
For two (2) different SQL queries you will need two (2) different PreparedStatement objects and each one will have its own batch, but you can simply execute each batch when you want to send the queries to the server:
try (
PreparedStatement thisPs = conn.prepareStatement("INSERT INTO thisTable (thisId, thisText) VALUES (?,?)");
PreparedStatement thatPs = conn.prepareStatement("INSERT INTO thatTable (thatId, thatText) VALUES (?,?)")) {
thisPs.setInt(1, 1);
thisPs.setString(2, "thisText1");
thisPs.addBatch();
thatPs.setInt(1, 1);
thatPs.setString(2, "thatText1");
thatPs.addBatch();
thisPs.setInt(1, 2);
thisPs.setString(2, "thisText2");
thisPs.addBatch();
thatPs.setInt(1, 2);
thatPs.setString(2, "thatText2");
thatPs.addBatch();
thisPs.executeBatch();
thatPs.executeBatch();
}
Also, be aware of terminology. Talking about a "batch transaction" is somewhat ambiguous:
addBatch and executeBatch are part of the mechanism to send multiple statements to the server as a single batch (transmission). This affects the way the statements are sent (transmitted) to the database server.
A database transaction is the mechanism whereby a number of statements will be processed as a complete group, i.e., either the whole group will be processed ("committed") or the whole group will be discarded ("rolled back"). The Connection#setAutoCommit(), Connection#commit(), and Connection#rollback() methods control this behaviour. This affects the way the statements are executed by the database server.
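The two mechanisms can be combined: both batches are sent with executeBatch, and a single commit or rollback decides whether the whole group takes effect. The sketch below assumes the thisTable and thatTable statements from the example above; the class TwoTableBatch and the method insertBoth are hypothetical names used only for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical helper: two JDBC batches executed inside one transaction. */
final class TwoTableBatch {

    static void insertBoth(Connection conn, int[] ids, String[] texts) throws SQLException {
        conn.setAutoCommit(false); // group the following statements into one transaction
        try (PreparedStatement thisPs = conn.prepareStatement(
                     "INSERT INTO thisTable (thisId, thisText) VALUES (?,?)");
             PreparedStatement thatPs = conn.prepareStatement(
                     "INSERT INTO thatTable (thatId, thatText) VALUES (?,?)")) {
            for (int i = 0; i < ids.length; i++) {
                thisPs.setInt(1, ids[i]);
                thisPs.setString(2, texts[i]);
                thisPs.addBatch();
                thatPs.setInt(1, ids[i]);
                thatPs.setString(2, texts[i]);
                thatPs.addBatch();
            }
            thisPs.executeBatch(); // transmit the first batch
            thatPs.executeBatch(); // transmit the second batch
            conn.commit();         // make both batches permanent together
        } catch (SQLException e) {
            conn.rollback();       // discard both batches if either one failed
            throw e;
        }
    }
}
```

The sketch leaves autocommit disabled on return; whether to restore the previous setting is left to the caller.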

Verify if the record already exists

I don't have much experience using Java with SQL Server or any other database, so I'm having some trouble at the moment.
I have the following code:
public void insertProjeto(Planejado p){
String verifica="SELECT cd_projeto FROM PROJETO WHERE cd_projeto = ?";
String sqlInsert="INSERT INTO PROJETO (cd_projeto, ds_projeto) VALUES (?, ?)";
String projeto = p.getProjeto();
String nomeProjeto = p.getNomeProj();
PreparedStatement stmt;
try {
stmt = getDBConnection().prepareStatement(verifica);
stmt.setString(1, projeto);
ResultSet rs = stmt.executeQuery();
if (rs.equals("") || rs.equals(null)) {
System.out.println("------------------");
stmt = getDBConnection().prepareStatement(sqlInsert);
stmt.setString(1, projeto);
stmt.setString(2, nomeProjeto);
stmt.executeUpdate();
}
} catch (SQLException e) {
e.printStackTrace();
}
}
My goal is to insert a record without creating duplicates, but for some reason my "if" isn't working.
Can anybody help me find out why?
Thanks in advance
After obtaining a ResultSet, you must call its next() method to advance it to the first row of data, if any. If the ResultSet contains no rows, rs.next() returns false. Comparing the ResultSet object itself with rs.equals("") or rs.equals(null) is always false, which is why your if block never runs.
There are better ways to prevent duplicates in SQL tables, depending on which database server you're using (MS SQL, MySQL, etc.).
If you're using MySQL, you can make cd_projeto a primary key and use REPLACE INTO instead of INSERT INTO. It replaces the record if it already exists (MySQL deletes the old row and inserts the new one) and inserts a new record if it doesn't.
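The rs.next() check described above can be sketched in plain JDBC as follows. The SQL strings are taken from the question; the class ProjetoDao, the method insertIfAbsent, and passing the Connection as a parameter are assumptions made for this example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical helper: inserts a PROJETO row only if its key is not present yet. */
final class ProjetoDao {

    /** Returns true if a row was inserted, false if cd_projeto already existed. */
    static boolean insertIfAbsent(Connection conn, String cdProjeto, String dsProjeto)
            throws SQLException {
        try (PreparedStatement check = conn.prepareStatement(
                "SELECT cd_projeto FROM PROJETO WHERE cd_projeto = ?")) {
            check.setString(1, cdProjeto);
            try (ResultSet rs = check.executeQuery()) {
                if (rs.next()) {
                    return false; // next() returned true, so a matching row exists
                }
            }
        }
        try (PreparedStatement insert = conn.prepareStatement(
                "INSERT INTO PROJETO (cd_projeto, ds_projeto) VALUES (?, ?)")) {
            insert.setString(1, cdProjeto);
            insert.setString(2, dsProjeto);
            insert.executeUpdate();
            return true;
        }
    }
}
```

A check followed by an insert is not atomic, so two concurrent callers could still insert the same key. A primary key or unique constraint on cd_projeto remains the reliable guard.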
Solved.
try {
stmt = getDBConnection().prepareStatement(verifica);
stmt.setString(1, projeto);
ResultSet rs = stmt.executeQuery();
while(!rs.next()) {
System.out.println("------------------");
stmt = getDBConnection().prepareStatement(sqlInsert);
stmt.setString(1, projeto);
stmt.setString(2, nomeProjeto);
stmt.executeUpdate();
break;
}
} catch (SQLException e) {
e.printStackTrace();
}
I don't know if it's the best way to do it, but it works.
Thank you for the tips

Insertion in Excel failed

I tried inserting data into an Excel sheet using Java and ODBC. My URL is correct and the insert query executes, but the values are not inserted into the sheet. I do commit and close the connection. Kindly help!
try {
Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
con = DriverManager.getConnection("jdbc:odbc:excelDsn;readonly=false;");
ps = con.prepareStatement("insert into [Sheet1$](FirstName, LastName) values (?,?)");
ps.setString(1, "AA");
ps.setString(2, "BB");
ps.execute();
con.commit();
con.close();
} catch (Exception e) {
System.out.println(e.getMessage());
e.printStackTrace();
}
For DML statements like INSERT, DELETE and UPDATE, use ps.executeUpdate();
Since you are inserting, replace ps.execute(); with ps.executeUpdate();

Adding to table using JDBC & MS management system

I bumped into this problem and I cannot figure out what is wrong with this code. I use JDBC and MS management system for the database and its connection.
code:
try {
//create user
preparedStatement = conn.prepareStatement("INSERT INTO Users(name, pass, type) VALUES (nick=?,pass=?,type=?)",
ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
preparedStatement.setString(1, user.getNickName());
preparedStatement.setString(2, user.getPassword());
preparedStatement.setInt(3, type);
rs = preparedStatement.executeQuery();
System.out.println(rs.toString());
} catch (Exception e) {
System.out.println("Exception: " + e);
}
error:
Exception: com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near '='.
The way you are using the ? characters is invalid in JDBC:
"INSERT INTO Users(name, pass, type) VALUES (nick=?,pass=?,type=?)"
Each ? stands for a whole bind variable, so it cannot be combined with a column name. Try:
"INSERT INTO Users(name, pass, type) VALUES (?, ?, ?)"
Also, use executeUpdate to execute an insert statement (or update, or delete).
Remove the field names from the value list. These are already in the name list. Also use executeUpdate for database write operations:
preparedStatement =
conn.prepareStatement("INSERT INTO Users(name, pass, type) VALUES (?,?,?)",
ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
preparedStatement.setString(1, user.getNickName());
preparedStatement.setString(2, user.getPassword());
preparedStatement.setInt(3, type);
int rowCount = preparedStatement.executeUpdate();

Consecutive PreparedStatement good practice

I'm executing a few SELECTs in a row and I'm wondering how I should handle the PreparedStatements.
Example code:
//Connection conn is already declared
PreparedStatement pstmt = null;
ResultSet rset = null;
try {
String sql = "SELECT ...";
pstmt = conn.prepareStatement(sql);
pstmt.setString(1, someVar);
rset = pstmt.executeQuery();
// Use ResultSet
// A different query
sql = "SELECT ...";
pstmt = conn.prepareStatement(sql);
pstmt.setString(1, someVar);
rset = pstmt.executeQuery();
// Use ResultSet
} catch (SQLException e) {
// Handle
} finally {
if (rset != null)
rset.close();
if (pstmt != null)
pstmt.close();
if (conn != null)
conn.close();
}
Now the question is, would it be better to close the PreparedStatements after each usage/use different statements or would it make absolutely no difference?
I've found some information about reusing a PreparedStatement that always has the same query but I'm not sure about using different queries.
You're not reusing the same PreparedStatement: the factory method Connection.prepareStatement returns a new instance each time you call it, and PreparedStatement.executeQuery does the same with ResultSet. You are only reusing the variables.
This means you leak resources every time this method is called: the first PreparedStatement and ResultSet are never closed.
My recommendation would be to use Spring's JdbcTemplate, which handles these database resources correctly for you, and to split your code into two methods.
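If adding Spring is not an option, plain JDBC with try-with-resources also closes each statement and result set as soon as its query is finished. This is a minimal sketch under that assumption; the class ConsecutiveQueries and the method firstColumn are hypothetical names, and each consecutive SELECT would be one call to the method.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: runs one parameterized query and releases its resources. */
final class ConsecutiveQueries {

    /** Returns the first column of every row produced by the query. */
    static List<String> firstColumn(Connection conn, String sql, String param)
            throws SQLException {
        List<String> values = new ArrayList<>();
        // Both resources are closed automatically, the ResultSet first,
        // even when an exception is thrown while reading rows.
        try (PreparedStatement pstmt = conn.prepareStatement(sql)) {
            pstmt.setString(1, param);
            try (ResultSet rset = pstmt.executeQuery()) {
                while (rset.next()) {
                    values.add(rset.getString(1));
                }
            }
        }
        return values;
    }
}
```

The connection is deliberately not closed here, so the caller can run several queries on it and close it once at the end.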
