Optimizing database inserts java - java

I am relatively new to Java and databases, so I am asking for your help with optimizing my code. I have around 20 text files with comma-separated values. Each text file has around 10,000 lines. Based on the 3rd value in each line, I insert the data into different tables. Each time, I check the 3rd value and use a different method to save the data. My code is as follows. Could someone please tell me if this is the proper way to do this operation?
Thanks in advance.
public void readSave() throws SQLException
{
File dir = new File("C:\\Users\\log");
String url = Config.DB_URL;
String user= Config.DB_USERNAME;
String password= Config.DB_PASSWORD;
con= DriverManager.getConnection(url, user, password);
con.setAutoCommit(false);
String currentLine;
if (!dir.isDirectory())
throw new IllegalStateException();
for (File file : dir.listFiles()) {
BufferedReader br;
try {
br = new BufferedReader(new FileReader(file));
while ((currentLine = br.readLine()) != null) {
List<String> values = Arrays.asList(currentLine.split(","));
if (values.get(2).contentEquals("0051"))
save0051(values,con);
else if(values.get(2).contentEquals("0049"))
save0049(values,con);
else if(values.get(2).contentEquals("0021"))
save0021(values,con);
else if(values.get(2).contentEquals("0089"))
save0089(values,con);
if(statement!=null)
statement.executeBatch();
}
} catch (FileNotFoundException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
try {
con.commit();
statement.close();
con.close();
}
catch (Exception e) {}
}
private void save0051(List<String> values, Connection connection) throws SQLException {
// TODO Auto-generated method stub
String WRITE_DATA = "INSERT INTO LOCATION_DATA"
+ "(loc_id, timestamp, message_id"
+ ") VALUES (?,?,?)";
try {
statement = connection.prepareStatement(WRITE_DATA);
statement.setString(1, values.get(0));
statement.setLong(2, Long.valueOf(values.get(1)));
statement.setInt(3, Integer.valueOf(values.get(2)));
statement.addBatch();
} catch (SQLException e) {
e.printStackTrace();
System.out.println("Could not save to DB, error: " + e.getMessage());
}
return;
}

Don't create the database connection in the loop. This is an expensive operation and you should create it only once.
Don't create the PreparedStatement in the loop. Create it once and reuse it.
Don't commit after every single INSERT. Read about using batches for inserting. Committing only every 200 INSERTs or so reduces the commit overhead dramatically.
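The three points above can be sketched roughly as follows. This is a minimal illustration, not the asker's final code: the class name `BatchInsertSketch`, the method `insertAll`, and the `batchSize` parameter are hypothetical, while the table and column names are copied from `save0051` in the question. It assumes each row has already been split into its three fields.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class BatchInsertSketch {

    // Table and columns are taken from save0051 in the question.
    static final String INSERT_LOCATION =
            "INSERT INTO LOCATION_DATA (loc_id, timestamp, message_id) VALUES (?, ?, ?)";

    /**
     * Inserts every row through one reused PreparedStatement and commits
     * once per batch. Returns how many batches were sent.
     */
    static int insertAll(Connection con, List<String[]> rows, int batchSize)
            throws SQLException {
        int batchesSent = 0;
        con.setAutoCommit(false);
        // Prepared once, outside the row loop.
        try (PreparedStatement ps = con.prepareStatement(INSERT_LOCATION)) {
            int pending = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setLong(2, Long.parseLong(row[1]));
                ps.setInt(3, Integer.parseInt(row[2]));
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch();
                    con.commit();
                    pending = 0;
                    batchesSent++;
                }
            }
            if (pending > 0) { // flush the final partial batch
                ps.executeBatch();
                con.commit();
                batchesSent++;
            }
        }
        return batchesSent;
    }
}
```

With four target tables, as in the question, one such statement per table would be prepared up front and each line routed to the matching one; the question's code instead calls `prepareStatement` and `executeBatch` once per line, which loses most of the benefit of batching.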

If this is going to be performance critical I'd suggest a few changes.
Move the connection creation out of the loop, you don't want to be doing that thousands of times.
Since each function is repeatedly making the same query, you can cache the PreparedStatements, and repeatedly execute them rather than recreating them with each query. This way the database will only need to optimize the query once, and each query will only transmit the data for the query as opposed to the entire query and the data.

Just spotted that batch insert was already mentioned but here is a nice tutorial page I came across, I think he explains it quite well

Use a JDBC batch INSERT. executeBatch() is faster because the inserts are sent in one shot as a list.
See:
http://javarevisited.blogspot.com/2013/01/jdbc-batch-insert-and-update-example-java-prepared-statement.html
Efficient way to do batch INSERTS with JDBC
http://www.java2s.com/Code/Java/Database-SQL-JDBC/BatchUpdateInsert.htm

Related

When `GC` will be triggered when an `OutOfMemoryException` is caught?

I have a Java package that connects to a database and fetches some data. In some rare cases I get a heap memory error, because the size of the fetched query data exceeds the Java heap space. Increasing the Java heap space is not something the business can consider for now.
The other option is to catch the error and continue the flow without stopping the execution. (I know catching OOME is not a good idea, but here only my local variables are affected.) My code is below:
private boolean stepCollectCustomerData() {
try {
boolean biResult = generateMetricCSV();
} catch (OutOfMemoryError e) {
log.error("OutOfMemoryError while collecting data ");
log.error(e.getMessage());
return false;
}
return true;
}
private boolean generateMetricCSV(){
// Executing the PAC & BI cluster SQL queries.
try (Connection connection = DriverManager.getConnection("connectionURL", "username", "password")) {
connection.setAutoCommit(false);
for (RedshiftQueryDefinition redshiftQueryDefinition: redshiftQueryDefinitions){
File csvFile = new File(dsarConfig.getDsarHomeDirectory() + dsarEntryId, redshiftQueryDefinition.getCsvFileName());
log.info("Running the query for metric: " + redshiftQueryDefinition.getMetricName());
try( PreparedStatement preparedStatement = createPreparedStatement(connection,
redshiftQueryDefinition.getSqlQuery(), redshiftQueryDefinition.getArgumentsList());
ResultSet resultSet = preparedStatement.executeQuery();
CSVWriter writer = new CSVWriter(new FileWriter(csvFile));) {
if (resultSet.next()) {
resultSet.beforeFirst();
log.info("Writing the data to CSV file.");
writer.writeAll(resultSet, true);
log.info("Metric written to csv file: " + csvFile.getAbsolutePath());
filesToZip.put(redshiftQueryDefinition.getCsvFileName(), csvFile);
} else {
log.info("There is no data for the metric " + redshiftQueryDefinition.getCsvFileName());
}
} catch (SQLException | IOException e) {
log.error("Exception while generating the CSV file: " + e);
e.printStackTrace();
return false;
}
}
} catch (SQLException e){
log.error("Exception while creating connection to the Redshift cluster: " + e);
return false;
}
return true;
}
We are getting the exception at the line "ResultSet resultSet = preparedStatement.executeQuery()" in the latter method, and I am catching it in the parent method. Now I need to know: when the exception is caught in the former method, has the GC already been triggered and cleared the memory of the local variables (such as the connection and result set variables)? If not, when will that happen?
I am worried about the Java heap space because this is a continuous flow and I need to keep fetching data for other users.
The code I have provided is only meant to explain the underlying issue and flow, so kindly ignore syntax, etc. I am using JDK 8.
Thanks in advance.

Java-MySQL: How to create a Stored Procedure

I work on a Java 1.7 project with MySQL.
I have a method that inserts a lot of data into a table with a PreparedStatement, but this causes an out-of-memory error in the GlassFish server.
Connection c = null;
String query = "INSERT INTO users.infos(name,phone,email,type,title) "
+ "VALUES (?, ?, ?, ?, ?)";
PreparedStatement statement = null;
try {
c = users.getConnection();
statement = c.prepareStatement(query);
} catch (SQLException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
try{
int i =0;
for(Member member: members){
i++;
statement.setString(1, member.getName());
statement.setString(2, member.getPhone());
statement.setString(3, member.getEmail());
statement.setInt(4, member.getType());
statement.setString(5, member.getTitle());
statement.addBatch();
if (i % 100000 == 0){
statement.executeBatch();
}
}
statement.executeBatch();
}catch (Exception ex){
ex.printStackTrace();
} finally {
if(c != null)
{
try {
c.close();
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
if(statement != null){
try {
statement.close();
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
c = null;
statement = null;
}
I think that I need to create a Stored Procedure to avoid this memory issue, but I don't know where to start and if I should create a procedure or a function, and if I will be able to get some kind of response in a return or something?
What do you think about it?
Replacing your insert statement in your prepared statement with a call to a stored procedure in your prepared statement will not affect the memory consumption in any meaningful way.
You are running out of memory because you are using a very large batch size. You should test the performance of different batch sizes - you will find that the performance improvement for larger batches diminishes quickly with batch sizes greater than dozens to hundreds.
You may be able to achieve greater input rates by using load data infile. You would take your many rows of data, write them to a text file, and load the file. For example, see this question.
You may also consider parallelizing. For example, open multiple connections and insert rows to each connection with separate threads. Likewise, you could try doing parallel loads using load data infile.
You will have to try the various techniques, batch sizes, number of threads (probably not more than one per core), etc. on your hardware setup to see what gives the best performance.
You may also want to look at tuning some of the MySQL parameters, drop (and later recreate) indexes, etc.
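As a rough sketch of the load data infile route mentioned above: the rows are streamed to a tab-separated file one line at a time, and a single LOAD DATA statement is then run against that file. The class `LoadDataSketch` and its method names are hypothetical. The table and column list come from the question's INSERT. The sketch relies on MySQL's default LOAD DATA format (tab-separated fields, newline-terminated lines, backslash escapes) and assumes that LOCAL loading is permitted by both the server and the driver configuration, and that the file path contains no single quote.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class LoadDataSketch {

    /** Escapes one field for LOAD DATA's default format. */
    static String escapeField(String value) {
        if (value == null) {
            return "\\N"; // LOAD DATA reads \N as SQL NULL
        }
        return value.replace("\\", "\\\\")
                    .replace("\t", "\\t")
                    .replace("\n", "\\n");
    }

    /** Joins the fields of one row into a tab-separated line (no trailing newline). */
    static String toLine(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append('\t');
            }
            sb.append(escapeField(fields[i]));
        }
        return sb.toString();
    }

    /** Streams rows to the file one line at a time, so memory use stays flat. */
    static void writeRows(Path file, List<String[]> rows) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String[] row : rows) {
                out.write(toLine(row));
                out.write('\n');
            }
        }
    }

    /** Builds the statement to pass to Statement.execute once the file exists. */
    static String loadStatement(Path file) {
        // Forward slashes avoid backslash escaping inside the SQL string literal.
        String path = file.toAbsolutePath().toString().replace('\\', '/');
        return "LOAD DATA LOCAL INFILE '" + path + "' INTO TABLE users.infos"
                + " (name, phone, email, type, title)";
    }
}
```

If batching is kept instead, the same answer's first point applies: replacing the `i % 100000` threshold in the question with a few hundred rows should bound how many parameter sets the driver holds in memory at once.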

Java ResultSet already closed exception while querying user data

As I've stated in the title, while querying for user data in my Java application I get the following message: "Operation not allowed after ResultSet closed".
I know that this happens if you try to have multiple ResultSets open at the same time.
Here is my current code:
App calls getProject("..."), other 2 methods are there just for help. I'm using 2 classes because there is much more code, this is just one example of exception I get.
Please note that I've translated variable names, etc. for better understanding, I hope I didn't miss anything.
/* Class which reads project data */
public Project getProject(String name) {
ResultSet result = null;
try {
// executing query for project data
// SELECT * FROM Project WHERE name=name
result = statement.executeQuery(generateSelect(tProject.tableName,
"*", tProject.name, name));
// if cursor can't move to first place,
// that means that project was not found
if (!result.first())
return null;
return user.usersInProject(new Project(result.getInt(1), result
.getString(2)));
} catch (SQLException e) {
e.printStackTrace();
return null;
} catch (BadAttributeValueExpException e) {
e.printStackTrace();
return null;
} finally {
// closing the ResultSet
try {
if (result != null)
result.close();
} catch (SQLException e) {
}
}
}
/* End of class */
/* Class which reads user data */
public Project usersInProject(Project p) {
ResultSet result = null;
try {
// executing query for users in project
// SELECT ID_User FROM Project_User WHERE ID_Project=p.getID()
result = statement.executeQuery(generateSelect(
tProject_User.tableName, tProject_User.id_user,
tProject_User.id_project, String.valueOf(p.getID())));
ArrayList<User> alUsers = new ArrayList<User>();
// looping through all results and adding them to array
while (result.next()) { // here java gets ResultSet closed exception
int id = result.getInt(1);
if (id > 0)
alUsers.add(getUser(id));
}
// if no user data was read, project from parameter is returned
// without any new user data
if (alUsers.size() == 0)
return p;
// array of users is added to the object,
// then whole object is returned
p.addUsers(alUsers.toArray(new User[alUsers.size()]));
return p;
} catch (SQLException e) {
e.printStackTrace();
return p;
} finally {
// closing the ResultSet
try {
if (result != null)
result.close();
} catch (SQLException e) {
}
}
}
public User getUser(int id) {
ResultSet result = null;
try {
// executing query for user:
// SELECT * FROM User WHERE ID=id
result = statement.executeQuery(generateSelect(tUser.tableName,
"*", tUser.id, String.valueOf(id)));
if (!result.first())
return null;
// new user is constructed (ID, username, email, password)
User usr = new User(result.getInt(1), result.getString(2),
result.getString(3), result.getString(4));
return usr;
} catch (SQLException e) {
e.printStackTrace();
return null;
} catch (BadAttributeValueExpException e) {
e.printStackTrace();
return null;
} finally {
// closing the ResultSet
try {
if (result != null)
result.close();
} catch (SQLException e) {
}
}
}
/* End of class */
Statements from both classes are added in constructor, calling connection.getStatement() when constructing each of the classes.
tProject and tProject_User are my enums, I'm using it for easier name handling. generateSelect is my method and should work as expected. I'm using this because I've found out about prepared statements after I have written most of my code, so I left it as it is.
I am using latest java MySQL connector (5.1.21).
I don't know what else to try. Any advice will be appreciated.
Quoting from @aroth's answer:
There are many situations in which a ResultSet will be automatically closed for you. To quote the official documentation:
http://docs.oracle.com/javase/6/docs/api/java/sql/ResultSet.html
A ResultSet object is automatically closed when the Statement object that generated
it is closed, re-executed, or used to retrieve the next result from a sequence of
multiple results.
Here in your code, you are creating a new ResultSet in the getUser method using the same Statement object that created the ResultSet in the usersInProject method, which closes the ResultSet in usersInProject.
Solution:
Create another statement object and use it in getUser to create resultset.
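One way to apply this, sketched under assumptions: each query gets its own statement, and the outer query's ids are copied into a list (closing its ResultSet) before any per-user lookup runs. The class and method names here are hypothetical, and the sketch uses PreparedStatement rather than the question's `generateSelect` helper. Table and column names follow the SQL comments in the question, and column 2 of User is assumed to be the username, as the question's constructor comment suggests.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class SeparateStatementsSketch {

    /** Reads all user ids for a project and closes its ResultSet before returning. */
    static List<Integer> userIdsInProject(Connection con, int projectId) throws SQLException {
        String sql = "SELECT ID_User FROM Project_User WHERE ID_Project = ?";
        List<Integer> ids = new ArrayList<>();
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setInt(1, projectId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getInt(1));
                }
            }
        }
        return ids;
    }

    /** Looks up one username with its own statement, independent of the query above. */
    static String userName(Connection con, int userId) throws SQLException {
        String sql = "SELECT * FROM User WHERE ID = ?";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setInt(1, userId);
            try (ResultSet rs = ps.executeQuery()) {
                // Column 2 is assumed to hold the username.
                return rs.next() ? rs.getString(2) : null;
            }
        }
    }

    /** Collects the ids first, then runs the per-user lookups. */
    static List<String> userNamesInProject(Connection con, int projectId) throws SQLException {
        List<String> names = new ArrayList<>();
        for (int id : userIdsInProject(con, projectId)) {
            if (id > 0) {
                names.add(userName(con, id));
            }
        }
        return names;
    }
}
```

Because no ResultSet is still open when the next query executes, re-executing a statement can no longer close a cursor that is being iterated.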
It's not really possible to say definitively what is going wrong without seeing your code. However note that there are many situations in which a ResultSet will be automatically closed for you. To quote the official documentation:
A ResultSet object is automatically closed when the Statement object
that generated it is closed, re-executed, or used to retrieve the next
result from a sequence of multiple results.
Probably you've got one of those things happening. Or you're explicitly closing the ResultSet somewhere before you're actually done with it.
Also, have you considered using an ORM framework like Hibernate? In general something like that is much more pleasant to work with than the low-level JDBC API.

MySql database connection issues in play framework (driver not found)

So I have tried using the stock Play! 2.2 configuration for the MySQL database connection. Unfortunately the guides out there are less than helpful when using the stock database (H2) alongside MySQL. So I coded a separate model to handle the MySQL connection. It works intermittently, and I'm trying to figure out why it doesn't work all of the time.
this is the "connect" function
String sourceSchema = "db";
String databaseHost = "host";
String databaseURLSource = "jdbc:mysql://" + databaseHost + "/" + sourceSchema;
String databaseUserIDSource = "userid";
String databasePWDSource = "password";
try {
Class.forName("com.mysql.jdbc.Driver").newInstance();
conn = DriverManager.getConnection(databaseURLSource,
databaseUserIDSource, databasePWDSource);
return true;
} catch (InstantiationException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IllegalAccessException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (ClassNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (SQLException e) {
Logger.error("SQLException: " + e.getMessage());
}
All of my credentials are correct (here, obviously, they are changed). Next, in my lib folder, I have the
mysql-connector-java-5.1.21-bin.jar
in place.
Next, in my Build.scala, I have this under appDependencies:
"mysql" % "mysql-connector-java" % "5.1.21"
when I try to validate the connection, using:
public boolean isConnected() {
return conn != null;
}
The connection fails (intermittently) and then gives me:
SQLException: Before start of result set
and sometimes:
SQLException: No Suitable driver found for mysql ...
This is how my query is executed:
String qs = String.format("SELECT * FROM community_hub.alert_journal LIMIT("+ from +","+ to +")");
String qscount = String.format("SELECT COUNT(*) AS count FROM community_hub.alert_journal");
try {
if (isConnected()) {
Statement stmt = conn.createStatement();
//obtain count of rows
ResultSet rs1 = stmt.executeQuery(qscount);
//returns the number of pages to draw on index
int numPages = returnPages(rs1.getInt("count"),rpp);
NumPages(numPages);
ResultSet rs = stmt.executeQuery(qs);
while (rs.next())
{
AlertEntry ae = new AlertEntry(
rs.getTimestamp("date"),
rs.getString("service_url"),
rs.getString("type"),
rs.getString("offering_id"),
rs.getString("observed_property"),
rs.getString("detail")
);
list.add(ae);
}
rs.close();
disconnect();
} else {
System.err.println("Connection was null");
}
}
catch (Exception e)
{
e.printStackTrace();
}
Help?
Thanks!
Does the MySQL error tell you anything?
The first error, "SQLException: Before start of result set", looks like it's incomplete. Maybe the error log contains the full message or you can
The second one, "SQLException: No Suitable driver found for mysql", clearly indicates a classpath issue.
Usually connection pools like c3p0 or BoneCP recommend using a validation query to determine whether a connection is valid (something like "select 1" for MySQL). That may help to make sure the connection is OK and not rely on the driver?
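A hand-rolled version of such a validation query might look like the sketch below; `ConnectionCheckSketch`, `isUsable`, and `countRows` are hypothetical names, not part of Play or of any pool. The second method also illustrates one likely source of "Before start of result set" in the question's code: `rs1.getInt("count")` is called without first calling `rs1.next()`, and a fresh ResultSet is positioned before its first row.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class ConnectionCheckSketch {

    /** Runs a trivial validation query; false means the connection is not usable. */
    static boolean isUsable(Connection conn) {
        if (conn == null) {
            return false;
        }
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT 1")) {
            return rs.next();
        } catch (SQLException e) {
            return false;
        }
    }

    /** Reads a single COUNT(*) value, moving the cursor onto the row first. */
    static int countRows(Connection conn, String table) throws SQLException {
        // The table name is concatenated, so it must come from trusted code.
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT COUNT(*) AS count FROM " + table)) {
            // A fresh ResultSet is positioned before the first row.
            if (!rs.next()) {
                throw new SQLException("COUNT(*) returned no row");
            }
            return rs.getInt("count");
        }
    }
}
```

In the question's code, `isConnected()` only checks for null, so a connection that was opened earlier but has since dropped would still pass; a query-based check like `isUsable` catches that case.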

Java SQL Optimization

I am trying to use an SQL database with a Java program. I make a table that is 7 columns wide and 2.5 million rows (My next one I need to build will be about 200 million rows). I have two problems: building the SQL table is too slow (about 2,000 rows/minute) and searching the database is too slow (I need to find over 100 million rows in under a second if possible, it currently takes over a minute). I have tried creating a csv file and importing it, but I can't get it to work.
I am using xampp and phpMyAdmin on my computer (i5 + 6gb ram). I have three methods I am testing: createTable(), writeSQL(), and searchSQL().
createTable:
public static void createTable() {
String driverName = "org.gjt.mm.mysql.Driver";
Connection connection = null;
try {
Class.forName(driverName);
} catch (ClassNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
String serverName = "localhost";
String mydatabase = "PokerRanks4";
String url = "jdbc:mysql://" + serverName + "/" + mydatabase;
String username = "root";
String password = "";
try {
connection = DriverManager.getConnection(url, username, password);
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
///////////////
String table = "CREATE TABLE ranks(deckForm bigint(10) NOT NULL,rank0 int(2) NOT NULL,rank1 int(2) NOT NULL,rank2 int(2) NOT NULL,rank3 int(2) NOT NULL,rank4 int(2) NOT NULL,rank5 int(2) NOT NULL,PRIMARY KEY (deckForm),UNIQUE id (deckForm),KEY id_2 (deckForm))";
try {
Statement st = connection.createStatement();
st.executeUpdate(table);
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
///////////////
try {
connection.close();
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
writeSQL():
public static void writeSQL() {
String driverName = "org.gjt.mm.mysql.Driver";
Connection connection = null;
try {
Class.forName(driverName);
} catch (ClassNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
String serverName = "localhost";
String mydatabase = "PokerRanks4";
String url = "jdbc:mysql://" + serverName + "/" + mydatabase;
String username = "root";
String password = "";
try {
connection = DriverManager.getConnection(url, username, password);
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
/////////////// Prepared Statement with Batch
PreparedStatement statement = null;
String sql = "INSERT INTO ranks VALUES (? ,0, 0, 0, 0, 0, 0)";
long start = System.currentTimeMillis();
try {
statement = connection.prepareStatement(sql);
for (int i = 0; i < 100; i++) {
for (int j = 0; j < 100; j++) {
statement.setLong(1, (i*100 + j));
statement.addBatch();
}
System.out.println(i);
statement.executeBatch();
}
} catch (Exception e) {
e.printStackTrace();
} finally {
if (statement != null) {
try {
statement.close();
} catch (SQLException e) {
} // nothing we can do
}
if (connection != null) {
try {
connection.close();
} catch (SQLException e) {
} // nothing we can do
}
}
System.out.println("Total Time: " + (System.currentTimeMillis() - start) / 1000 );
///////////////
}
searchSQL():
public static void searchSQL() {
String driverName = "org.gjt.mm.mysql.Driver";
Connection connection = null;
try {
Class.forName(driverName);
} catch (ClassNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
String serverName = "localhost";
String mydatabase = "PokerRanks2";
String url = "jdbc:mysql://" + serverName + "/" + mydatabase;
String username = "root";
String password = "";
try {
connection = DriverManager.getConnection(url, username, password);
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
/////////////// Option 1, Prepared Statement
ResultSet rs = null;
PreparedStatement pstmt = null;
String query = "SELECT rank0, rank1, rank2, rank3, rank4, rank5 FROM ranks WHERE deckForm = ?";
long start = System.currentTimeMillis();
try {
pstmt = connection.prepareStatement(query);
for (int i = 0; i < 100000; i++) {
pstmt.setLong(1, 1423354957);
rs = pstmt.executeQuery();
while (rs.next()) {
int[] arr = {rs.getInt(1), rs.getInt(2), rs.getInt(3), rs.getInt(4), rs.getInt(5), rs.getInt(6)};
}
}
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
System.out.println("Total Time: " + (System.currentTimeMillis() - start) / 1000 );
///////////////
/*
/////////////// Option 2
Statement st = null;
long start = System.currentTimeMillis();
try {
st = connection.createStatement();
ResultSet rs = null;
long deckForm = 1012213456;
for (int i = 0; i < 100000; i++) {
rs = st.executeQuery("SELECT rank0, rank1, rank2, rank3, rank4, rank5 FROM ranks WHERE deckForm = " + deckForm);
while (rs.next()) {
int[] arr = {rs.getInt(1), rs.getInt(2), rs.getInt(3), rs.getInt(4), rs.getInt(5), rs.getInt(6)};
}
}
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
System.out.println("Total Time: " + (System.currentTimeMillis() - start) / 1000 );
///////////////
*/
try {
connection.close();
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
Sorry that's so long. I've tried everything I can think of to make this faster but I can't figure it out. Any suggestions?
Well, there's a few improvements you could make:
You are creating a connection each time you want to search, write or create,
you should use a pooled connection and datasources.
Optimize your queries by doing explain plans, and optimize your table relations and indexes.
You can use stored procedures and call them.
Well that's all I can help with, certainly there are more tips.
As to the insert speed, you need to disable all the indexes prior to doing the insert and re-enable them after you're done. Please see Speed of Insert Statements for a lot of detailed information on improving bulk insert speed.
The query speed is probably limited by your CPU and disk speeds. You may have to throw much more hardware at the problem.
building the SQL table is too slow (about 2,000 rows/minute)
For inserting a large number of rows, a heap table is a good choice. This is the basic table type (sometimes described as a persistent page array), usually what you get from a plain CREATE TABLE. Inserts are very efficient because each row goes into the first free position found, or at the end of the table. On the other hand, searching is very inefficient, because the rows are not kept in any guaranteed order, which matches the slow searches you describe.
searching the database is too slow (I need to find over 100 million
rows in under a second if possible, it currently takes over a minute)
For this you should create a table in which searching is efficient. If you are using Oracle, it offers many constructs for the physical implementation, for example index-organized tables, data clustering, and clustered tables (index / hash / sorted hash) ...
For SQL Server I'm not sure, but it also has clustered tables. For MySQL I don't know exactly, and I don't want to tell you something wrong. I'm not saying that MySQL is bad or worse than Oracle, just that it does not offer some of the physical implementation techniques that Oracle does.
So it's quite hard to give concrete recommendations for this approach, but you should seriously study the physical implementation of database systems: look at relational algebra to optimize your statements, and at which types of tables you should create. @duffymo is right that you can have your query execution plan explained with EXPLAIN PLAN FOR and optimize based on the result. Also look at how to use indexes. An index is a powerful database construct, but each index means many more operations for every modification of the database, so think carefully about which attributes you create indexes for.
Via Google you can find many useful articles about data modeling, physical implementation, etc.
Regards man, I wish you the best of luck.
