I work on a Java 1.7 project with MySQL.
I have a method that inserts a lot of data into a table with a PreparedStatement, but this causes an OutOfMemoryError in the GlassFish server.
Connection c = null;
PreparedStatement statement = null;
String query = "INSERT INTO users.infos(name,phone,email,type,title) "
        + "VALUES (?, ?, ?, ?, ?)";
try {
    c = users.getConnection();
    statement = c.prepareStatement(query);
} catch (SQLException e1) {
    // TODO Auto-generated catch block
    e1.printStackTrace();
}
try {
    int i = 0;
    for (Member member : members) {
        i++;
        statement.setString(1, member.getName());
        statement.setString(2, member.getPhone());
        statement.setString(3, member.getEmail());
        statement.setInt(4, member.getType());
        statement.setString(5, member.getTitle());
        statement.addBatch();
        if (i % 100000 == 0) {
            statement.executeBatch();
        }
    }
    statement.executeBatch();
} catch (Exception ex) {
    ex.printStackTrace();
} finally {
    if (statement != null) {
        try {
            statement.close();
        } catch (SQLException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
    if (c != null) {
        try {
            c.close();
        } catch (SQLException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
    statement = null;
    c = null;
}
I think I need to create a stored procedure to avoid this memory issue, but I don't know where to start, whether I should create a procedure or a function, and whether I will be able to get some kind of response back in a return value.
What do you think?
Replacing the INSERT statement in your prepared statement with a call to a stored procedure will not affect memory consumption in any meaningful way.
You are running out of memory because you are using a very large batch size. You should test the performance of different batch sizes; you will find that the performance improvement from larger batches falls off quickly once batches grow beyond a few dozen to a few hundred rows.
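For illustration, here is a minimal sketch of the question's loop with a much smaller batch size (500 is only a starting point for testing, not a recommendation):

// Flush the batch every 500 rows instead of every 100,000.
// executeBatch() clears the statement's batch, so the driver never
// buffers more than BATCH_SIZE rows at a time.
final int BATCH_SIZE = 500;
int i = 0;
for (Member member : members) {
    statement.setString(1, member.getName());
    statement.setString(2, member.getPhone());
    statement.setString(3, member.getEmail());
    statement.setInt(4, member.getType());
    statement.setString(5, member.getTitle());
    statement.addBatch();
    if (++i % BATCH_SIZE == 0) {
        statement.executeBatch();
    }
}
statement.executeBatch(); // flush the remaining rows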
You may be able to achieve greater insert rates by using LOAD DATA INFILE: take your many rows of data, write them to a text file, and load the file. For example, see this question.
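As a hedged sketch of what that could look like (the file path is made up, and LOAD DATA LOCAL INFILE requires the allowLoadLocalInfile=true connection property with Connector/J):

// Write the rows to a CSV file first, then bulk-load it in one statement.
// Table and column names are taken from the question; the path is illustrative.
String load = "LOAD DATA LOCAL INFILE '/tmp/members.csv' "
        + "INTO TABLE users.infos "
        + "FIELDS TERMINATED BY ',' "
        + "(name, phone, email, type, title)";
try (Statement st = c.createStatement()) {
    st.execute(load);
}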
You may also consider parallelizing. For example, open multiple connections and insert rows on each connection from separate threads. Likewise, you could try doing parallel loads with LOAD DATA INFILE.
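A rough sketch of the threading idea (the pool size and the partition() helper are hypothetical; each worker must use its own Connection, since connections are not thread-safe):

// Insert slices of the data in parallel, one connection per thread.
// ExecutorService and Executors are from java.util.concurrent.
ExecutorService pool = Executors.newFixedThreadPool(4); // illustrative size
final String sql = query; // capture the INSERT from the question for the workers
for (final List<Member> slice : partition(members, 4)) { // partition() is a hypothetical helper
    pool.submit(new Runnable() {
        public void run() {
            try (Connection conn = users.getConnection();
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                // ... run the batched insert loop shown above over 'slice'
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    });
}
pool.shutdown();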
You will have to try the various techniques, batch sizes, number of threads (probably no more than one per core), etc. on your hardware setup to see what gives the best performance.
You may also want to look at tuning some of the MySQL parameters, drop (and later recreate) indexes, etc.
Related
I have a Java package which connects to a database and fetches some data. In some rare cases, I get a heap memory exception, because the fetched query data exceeds the Java heap space. Increasing the Java heap space is not something the business can consider for now.
The other option is to catch the exception and continue the flow without stopping execution. (I know catching OOME is not a good idea, but here only my local variables are affected.) My code is below:
private boolean stepCollectCustomerData() {
    try {
        boolean biResult = generateMetricCSV();
    } catch (OutOfMemoryError e) {
        log.error("OutOfMemoryError while collecting data ");
        log.error(e.getMessage());
        return false;
    }
    return true;
}

private boolean generateMetricCSV() {
    // Executing the PAC & BI cluster SQL queries.
    try (Connection connection = DriverManager.getConnection("connectionURL", "username", "password")) {
        connection.setAutoCommit(false);
        for (RedshiftQueryDefinition redshiftQueryDefinition : redshiftQueryDefinitions) {
            File csvFile = new File(dsarConfig.getDsarHomeDirectory() + dsarEntryId,
                    redshiftQueryDefinition.getCsvFileName());
            log.info("Running the query for metric: " + redshiftQueryDefinition.getMetricName());
            try (PreparedStatement preparedStatement = createPreparedStatement(connection,
                         redshiftQueryDefinition.getSqlQuery(), redshiftQueryDefinition.getArgumentsList());
                 ResultSet resultSet = preparedStatement.executeQuery();
                 CSVWriter writer = new CSVWriter(new FileWriter(csvFile))) {
                if (resultSet.next()) {
                    resultSet.beforeFirst();
                    log.info("Writing the data to CSV file.");
                    writer.writeAll(resultSet, true);
                    log.info("Metric written to csv file: " + csvFile.getAbsolutePath());
                    filesToZip.put(redshiftQueryDefinition.getCsvFileName(), csvFile);
                } else {
                    log.info("There is no data for the metric " + redshiftQueryDefinition.getCsvFileName());
                }
            } catch (SQLException | IOException e) {
                log.error("Exception while generating the CSV file: " + e);
                e.printStackTrace();
                return false;
            }
        }
    } catch (SQLException e) {
        log.error("Exception while creating connection to the Redshift cluster: " + e);
        return false;
    }
    return true;
}
We are getting the exception at the line "ResultSet resultSet = preparedStatement.executeQuery()" in the latter method, and I am catching it in the parent method. Now, I need to make sure that when the exception is caught in the parent method, the GC has already been triggered and has cleared the memory of the local variables (such as the connection and result set). If not, when will that happen?
I am worried about the Java heap space because this is a continuous flow and I need to keep fetching data for other users.
The code I have provided is only to explain the underlying issue and flow, so kindly ignore syntax issues, etc. I am using JDK 8.
Thanks in advance.
I am relatively new to Java and databases, and I am therefore asking for your help with code optimization. I have around 20 text files with comma-separated values. Each text file has around 10,000 lines. Based on the 3rd value in each line, I insert the data into different tables: each time I check the 3rd value and use a different method to save the data. My code is as follows. Could someone please tell me if this is the proper way to do this operation?
Thanks in advance.
public void readSave() throws SQLException {
    File dir = new File("C:\\Users\\log");
    String url = Config.DB_URL;
    String user = Config.DB_USERNAME;
    String password = Config.DB_PASSWORD;
    con = DriverManager.getConnection(url, user, password);
    con.setAutoCommit(false);
    String currentLine;
    if (!dir.isDirectory())
        throw new IllegalStateException();
    for (File file : dir.listFiles()) {
        BufferedReader br;
        try {
            br = new BufferedReader(new FileReader(file));
            while ((currentLine = br.readLine()) != null) {
                List<String> values = Arrays.asList(currentLine.split(","));
                if (values.get(2).contentEquals("0051"))
                    save0051(values, con);
                else if (values.get(2).contentEquals("0049"))
                    save0049(values, con);
                else if (values.get(2).contentEquals("0021"))
                    save0021(values, con);
                else if (values.get(2).contentEquals("0089"))
                    save0089(values, con);
                if (statement != null)
                    statement.executeBatch();
            }
        } catch (FileNotFoundException e1) {
            // TODO Auto-generated catch block
            e1.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (SQLException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
    try {
        con.commit();
        statement.close();
        con.close();
    } catch (Exception e) {}
}

private void save0051(List<String> values, Connection connection) throws SQLException {
    // TODO Auto-generated method stub
    String WRITE_DATA = "INSERT INTO LOCATION_DATA"
            + "(loc_id, timestamp, message_id"
            + ") VALUES (?,?,?)";
    try {
        statement = connection.prepareStatement(WRITE_DATA);
        statement.setString(1, values.get(0));
        statement.setLong(2, Long.valueOf(values.get(1)));
        statement.setInt(3, Integer.valueOf(values.get(2)));
        statement.addBatch();
    } catch (SQLException e) {
        e.printStackTrace();
        System.out.println("Could not save to DB, error: " + e.getMessage());
    }
    return;
}
Don't create the database connection in the loop. This is an expensive operation and you should create it only once.
Don't create the PreparedStatement in the loop. Create it once and reuse it.
Don't commit after every single INSERT. Read about using batches for inserting. This reduces the commit overhead dramatically if you commit only every, say, 200 INSERTs.
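A sketch of what that pattern could look like for the question's data (the commit interval of 200 is just the example figure from above, and 'rows' stands in for the parsed CSV lines):

connection.setAutoCommit(false);
PreparedStatement ps = connection.prepareStatement(
        "INSERT INTO LOCATION_DATA(loc_id, timestamp, message_id) VALUES (?,?,?)");
int count = 0;
for (List<String> values : rows) { // 'rows' stands in for the parsed lines
    ps.setString(1, values.get(0));
    ps.setLong(2, Long.valueOf(values.get(1)));
    ps.setInt(3, Integer.valueOf(values.get(2)));
    ps.addBatch();
    if (++count % 200 == 0) {
        ps.executeBatch();
        connection.commit(); // commit every 200 INSERTs, not after each one
    }
}
ps.executeBatch(); // flush and commit the remainder
connection.commit();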
If this is going to be performance critical I'd suggest a few changes.
Move the connection creation out of the loop; you don't want to be doing that thousands of times.
Since each function repeatedly makes the same query, you can cache the PreparedStatements and repeatedly execute them rather than recreating them for each query. This way the database only needs to optimize each query once, and each execution transmits only the parameter data rather than the entire query text plus the data.
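A hedged sketch of that caching (field and key names are illustrative; each statement is prepared once and reused for every matching line):

// Prepare one statement per message type, once, and reuse them.
private final Map<String, PreparedStatement> statements =
        new HashMap<String, PreparedStatement>();

private void prepareStatements(Connection con) throws SQLException {
    statements.put("0051", con.prepareStatement(
            "INSERT INTO LOCATION_DATA(loc_id, timestamp, message_id) VALUES (?,?,?)"));
    // ... one entry each for "0049", "0021" and "0089"
}

// Inside the read loop, look the statement up instead of re-creating it:
PreparedStatement ps = statements.get(values.get(2));
if (ps != null) {
    ps.setString(1, values.get(0));
    ps.setLong(2, Long.valueOf(values.get(1)));
    ps.setInt(3, Integer.valueOf(values.get(2)));
    ps.addBatch();
}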
I just spotted that batch inserts were already mentioned, but here is a nice tutorial page I came across; I think it explains the topic quite well.
Use JDBC batch INSERT; executeBatch() is faster because the inserts are sent to the database in one shot, as a list.
See:
http://javarevisited.blogspot.com/2013/01/jdbc-batch-insert-and-update-example-java-prepared-statement.html
Efficient way to do batch INSERTS with JDBC
http://www.java2s.com/Code/Java/Database-SQL-JDBC/BatchUpdateInsert.htm
I want to ask: is it normal to handle OutOfMemoryError while doing batch inserts?
I am using the following code to batch-insert into MySQL:
Connection con = null;
PreparedStatement ps = null;
try
{
    con = Manager.getInstance().getConnection();
    ps = con.prepareStatement("INSERT INTO" +
            " movie_release_date_pushed_to_subscriber"
            + "(movie_id,cinema_id,msisdn,sent_timestamp)VALUES(?,?,?,?)");
    for (String msisdn : subscriberBatch)
    {
        try
        {
            ps.setInt(1, movieToBeReleased.getMovieId());
            ps.setInt(2, movieToBeReleased.getCinemaId());
            ps.setString(3, msisdn);
            ps.setTimestamp(4, new java.sql.Timestamp(new Date().getTime()));
            ps.addBatch();
        }
        catch (OutOfMemoryError oome)
        {
            ....
            ps.executeBatch();
        }
    }
    ps.executeBatch();
}
catch (Throwable e)
{
    ....
}
finally
{
    try
    {
        Manager.getInstance().close(ps);
        if (con != null)
        {
            con.close();
        }
    }
    catch (Throwable e)
    {
        ....
    }
}
NOTE: Any kind of advice or recommendation is most welcome.
No, it's not normal. And your catch handler is totally ineffective: catching an OOME does not miraculously solve its root cause, the exhaustion of program memory. You get that error after the runtime has made a best effort at reclaiming memory and failed. You should not be trying to execute code at that point; you may not even be able to log messages!
If you feel for whatever reason that your batch statement may cause an OOME, then you should either:
- Break up the batch cycle into smaller "buckets", as in the sketch below
- Make more memory available to the program
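For the first option, a sketch of bucketing the question's loop (BUCKET_SIZE is illustrative, and subscriberBatch is assumed to be a List):

// Split the full list into fixed-size buckets and execute each one,
// so no more than BUCKET_SIZE rows are ever queued in the batch.
final int BUCKET_SIZE = 1000; // illustrative; tune for your heap
for (int from = 0; from < subscriberBatch.size(); from += BUCKET_SIZE) {
    int to = Math.min(from + BUCKET_SIZE, subscriberBatch.size());
    for (String msisdn : subscriberBatch.subList(from, to)) {
        ps.setInt(1, movieToBeReleased.getMovieId());
        ps.setInt(2, movieToBeReleased.getCinemaId());
        ps.setString(3, msisdn);
        ps.setTimestamp(4, new java.sql.Timestamp(new Date().getTime()));
        ps.addBatch();
    }
    ps.executeBatch(); // the batch is cleared after each bucket
}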
It makes no sense to try to execute the batch once you get an OutOfMemoryError. However, you can replace your INSERT query with INSERT IGNORE INTO and, in case of an OOME, ask the user to run the batch again after restarting the JVM.
What INSERT IGNORE INTO does is skip the insert when the primary key already exists in the table, so your batch will resume from where it crashed the app.
However, I have to warn you that this is probably a very dirty way to work around the situation.
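If you go that route, the only change in the question's code would be the query string (this assumes the table has a primary key that repeated rows would violate):

ps = con.prepareStatement("INSERT IGNORE INTO" +
        " movie_release_date_pushed_to_subscriber"
        + "(movie_id,cinema_id,msisdn,sent_timestamp)VALUES(?,?,?,?)");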
As I've stated in the title, while querying for user data in my Java application, I get the following message: "Operation not allowed after ResultSet closed".
I know that this happens if you try to have multiple ResultSets open on the same statement at the same time.
Here is my current code:
The app calls getProject("..."); the other two methods are just helpers. I'm using two classes because there is much more code; this is just one example of the exception I get.
Please note that I've translated variable names, etc. for better understanding; I hope I didn't miss anything.
/* Class which reads project data */
public Project getProject(String name) {
    ResultSet result = null;
    try {
        // executing query for project data
        // SELECT * FROM Project WHERE name=name
        result = statement.executeQuery(generateSelect(tProject.tableName,
                "*", tProject.name, name));
        // if the cursor can't move to the first row,
        // the project was not found
        if (!result.first())
            return null;
        return user.usersInProject(new Project(result.getInt(1),
                result.getString(2)));
    } catch (SQLException e) {
        e.printStackTrace();
        return null;
    } catch (BadAttributeValueExpException e) {
        e.printStackTrace();
        return null;
    } finally {
        // closing the ResultSet
        try {
            if (result != null)
                result.close();
        } catch (SQLException e) {
        }
    }
}
/* End of class */

/* Class which reads user data */
public Project usersInProject(Project p) {
    ResultSet result = null;
    try {
        // executing query for users in project
        // SELECT ID_User FROM Project_User WHERE ID_Project=p.getID()
        result = statement.executeQuery(generateSelect(
                tProject_User.tableName, tProject_User.id_user,
                tProject_User.id_project, String.valueOf(p.getID())));
        ArrayList<User> alUsers = new ArrayList<User>();
        // looping through all results and adding them to the array
        while (result.next()) { // here Java throws the ResultSet closed exception
            int id = result.getInt(1);
            if (id > 0)
                alUsers.add(getUser(id));
        }
        // if no user data was read, the project from the parameter
        // is returned without any new user data
        if (alUsers.size() == 0)
            return p;
        // the array of users is added to the object,
        // then the whole object is returned
        p.addUsers(alUsers.toArray(new User[alUsers.size()]));
        return p;
    } catch (SQLException e) {
        e.printStackTrace();
        return p;
    } finally {
        // closing the ResultSet
        try {
            if (result != null)
                result.close();
        } catch (SQLException e) {
        }
    }
}

public User getUser(int id) {
    ResultSet result = null;
    try {
        // executing query for user:
        // SELECT * FROM User WHERE ID=id
        result = statement.executeQuery(generateSelect(tUser.tableName,
                "*", tUser.id, String.valueOf(id)));
        if (!result.first())
            return null;
        // new user is constructed (ID, username, email, password)
        User usr = new User(result.getInt(1), result.getString(2),
                result.getString(3), result.getString(4));
        return usr;
    } catch (SQLException e) {
        e.printStackTrace();
        return null;
    } catch (BadAttributeValueExpException e) {
        e.printStackTrace();
        return null;
    } finally {
        // closing the ResultSet
        try {
            if (result != null)
                result.close();
        } catch (SQLException e) {
        }
    }
}
/* End of class */
Statements in both classes are initialized in the constructor, each class calling connection.createStatement() when it is constructed.
tProject and tProject_User are my enums; I'm using them for easier name handling. generateSelect is my method and should work as expected. I'm using this approach because I found out about prepared statements only after I had written most of my code, so I left it as it is.
I am using the latest MySQL Connector/J (5.1.21).
I don't know what else to try. Any advice will be appreciated.
Quoting from #aroth's answer:
There are many situations in which a ResultSet will be automatically closed for you. To quote the official documentation (http://docs.oracle.com/javase/6/docs/api/java/sql/ResultSet.html):

A ResultSet object is automatically closed when the Statement object that generated it is closed, re-executed, or used to retrieve the next result from a sequence of multiple results.
Here in your code, you are creating a new ResultSet in the method getUser using the same Statement object that created the result set in the usersInProject method, which closes the ResultSet that usersInProject is still iterating over.
Solution: create another Statement object and use it in getUser to create its ResultSet.
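A sketch of that fix (the class and field names are made up; the point is that each method that keeps a ResultSet open while another query runs needs its own Statement):

// One Statement per concurrently open ResultSet.
private Statement projectUserStatement; // used by usersInProject
private Statement userStatement;        // used by getUser

public UserReader(Connection connection) throws SQLException {
    this.projectUserStatement = connection.createStatement();
    this.userStatement = connection.createStatement();
}

// usersInProject then iterates its ResultSet from projectUserStatement
// while getUser executes its query on userStatement, so neither query
// closes the other's ResultSet.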
It's not really possible to say definitively what is going wrong without seeing your code. However note that there are many situations in which a ResultSet will be automatically closed for you. To quote the official documentation:
A ResultSet object is automatically closed when the Statement object that generated it is closed, re-executed, or used to retrieve the next result from a sequence of multiple results.
Probably you've got one of those things happening. Or you're explicitly closing the ResultSet somewhere before you're actually done with it.
Also, have you considered using an ORM framework like Hibernate? In general something like that is much more pleasant to work with than the low-level JDBC API.
My application has a memory leak resulting from my usage of JDBC. I have verified this by looking at a visual dump of the heap and seeing thousands of instances of ResultSet and associated objects. My question, then, is how do I appropriately manage resources used by JDBC so they can be garbage collected? Do I need to call ".close()" for every statement that is used? Do I need to call ".close()" on the ResultSets themselves?
How would you free the memory used by the following call?
ResultSet rs = connection.createStatement().executeQuery("some sql query");
I see that there are other, very similar, questions. Apologies if this is redundant, but either I don't quite follow the answers or they don't seem to apply universally. I am trying to arrive at an authoritative answer on how to manage memory when using JDBC.
::EDIT:: Adding some code samples
I have a class that is basically a JDBC helper that I use to simplify database interactions; its two main methods are for executing an insert or update, and for executing select statements.
This one for executing insert or update statements:
public int executeCommand(String sqlCommand) throws SQLException {
    if (connection == null || connection.isClosed()) {
        sqlConnect();
    }
    Statement st = connection.createStatement();
    int ret = st.executeUpdate(sqlCommand);
    st.close();
    return ret;
}
And this one for returning ResultSets from a select:
public ResultSet executeSelect(String select) throws SQLException {
    if (connection == null || connection.isClosed()) {
        sqlConnect();
    }
    ResultSet rs = connection.createStatement().executeQuery(select);
    return rs;
}
After using the executeSelect() method, I always call resultset.getStatement().close().
Examining a heap dump with object allocation tracing turned on shows statements still being held onto from both of those methods...
You should close the Statement if you are not going to reuse it. It is usually good form to close the ResultSet first, as some implementations did not close the ResultSet automatically (even though they should).
If you are repeating the same queries, you should probably use a PreparedStatement to reduce parsing overhead. And if you add parameters to your query, you really should use a PreparedStatement to avoid the risk of SQL injection.
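For example, a sketch of both points together, using try-with-resources (Java 7+) so the ResultSet and the PreparedStatement are closed in the right order even when an exception is thrown (the query and parameter are illustrative):

String sql = "SELECT id, name FROM users WHERE type = ?"; // illustrative query
try (PreparedStatement ps = connection.prepareStatement(sql)) {
    ps.setInt(1, 1); // illustrative parameter
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // ... read rs.getInt(1), rs.getString(2), ...
        }
    } // the ResultSet is closed here
} // the PreparedStatement is closed here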
Yes, ResultSets and Statements should always be closed in a finally block. Using JDBC wrappers such as Spring's JdbcTemplate helps make the code less verbose and closes everything for you.
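For instance, a sketch with JdbcTemplate (this assumes a Spring dependency and a configured DataSource; the query and mapping are illustrative):

JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
List<String> names = jdbcTemplate.query(
        "SELECT name FROM users WHERE type = ?",
        new Object[] { 1 }, // illustrative parameter
        new RowMapper<String>() {
            public String mapRow(ResultSet rs, int rowNum) throws SQLException {
                return rs.getString("name");
            }
        });
// No explicit close() calls: JdbcTemplate opens and releases the
// Statement and ResultSet internally, even when an exception is thrown.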
I copied this from a project I have been working on. I am in the process of refactoring it to use Hibernate (from the code it should be clear why!). Using an ORM tool like Hibernate is one way to resolve your issue. Otherwise, here is the way I used plain DAOs to access the data. There is no memory leak in our code, so this may help as a template. Hope it helps; memory leaks are terrible!
@Override
public List<CampaignsDTO> getCampaign(String key) {
    ResultSet resultSet = null;
    PreparedStatement statement = null;
    try {
        statement = connection.prepareStatement(getSQL("CampaignsDAOImpl.getPendingCampaigns"));
        statement.setString(1, key);
        resultSet = statement.executeQuery();
        List<CampaignsDTO> list = new ArrayList<CampaignsDTO>();
        while (resultSet.next()) {
            list.add(new CampaignsDTO(
                    resultSet.getTimestamp(resultSet.findColumn("cmp_name")),
                    ...));
        }
        return list;
    } catch (SQLException e) {
        logger.fatal(LoggerCodes.DATABASE_ERROR, e);
        throw new RuntimeException(e);
    } finally {
        close(statement);
    }
}
The close() method looks like this:
public void close(PreparedStatement statement) {
    try {
        if (statement != null && !statement.isClosed())
            statement.close();
    } catch (SQLException e) {
        logger.debug(LoggerCodes.TRACE, "Warning! PreparedStatement could not be closed.");
    }
}
You should close JDBC statements when you are done. ResultSets are released when their associated statements are closed, but you can close them explicitly if you want.
You also need to make sure that you close all JDBC resources in exception cases.
Use a try-catch-finally block, e.g.:
Connection conn = null;
Statement stmt = null;
ResultSet rs = null;
try {
    conn = dataSource.getConnection();
    stmt = conn.createStatement();
    rs = stmt.executeQuery("select * from sometable");
    // ... use the ResultSet here ...
    rs.close();
    stmt.close();
    conn.close();
} catch (Throwable t) {
    // do error handling
} finally {
    try {
        if (rs != null) {
            rs.close();
        }
        if (stmt != null) {
            stmt.close();
        }
        if (conn != null) {
            conn.close();
        }
    } catch (Exception e) {
    }
}