Investigating slow simple queries in JDBC and MySQL - java

PreparedStatement.executeQuery() is taking ~20x longer to execute than if it were run directly via the shell. I've logged with timers to determine that this method is the culprit.
The query and some DB info (ignoring the Java issue for the moment):
mysql> SELECT username from phpbb_users where user_id = 1; -- lightning fast
Running that same query 1,000 times via mysqlslap is also lightning fast.
mysqlslap --create-schema=mydb --user=root -p --query="select username from phpbb_users where user_id = 1" --number-of-queries=1000 --concurrency=1
Benchmark
Average number of seconds to run all queries: 0.051 seconds
Minimum number of seconds to run all queries: 0.051 seconds
Maximum number of seconds to run all queries: 0.051 seconds
Number of clients running queries: 1
Average number of queries per client: 1000
The Problem: Performing the same query via JDBC slows things down significantly. A for loop calling queryUsername() below 1,000 times (the loop lives in the main method, which isn't shown here) takes around 872 ms. That's ~17x slower! I've tracked down the heavy usage by placing timers in various spots (some omitted for brevity). The primary suspect is stmt.executeQuery(), which accounted for 776 ms of the 872 ms runtime.
public static String queryUsername() {
    String username = "";
    // DBCore.getConnection() returns HikariDataSource.getConnection(), implemented exactly as per https://www.baeldung.com/hikaricp
    try (Connection connection = DBCore.getConnection();
         PreparedStatement stmt = connection.prepareStatement(
                 "SELECT username from phpbb_users where user_id = ?")) {
        stmt.setInt(1, 1); // just looking for user_id 1 for now
        // Guava Stopwatch used to measure how long executeQuery() is taking.
        // Another Stopwatch outside of this method call measures total execution
        // time: approximately 1 second for the loop calling this method 1,000 times.
        Stopwatch s = Stopwatch.createStarted();
        try (ResultSet rs = stmt.executeQuery()) {
            s.stop(); // stop the timer as soon as executeQuery() has returned
            timeElapsed += s.elapsed(TimeUnit.MICROSECONDS); // timeElapsed is a class-level accumulator
            while (rs.next()) {
                username = rs.getString("username"); // the query returns 1 record
            }
        }
    } catch (SQLException e) {
        e.printStackTrace();
    }
    return username;
}
Additional context and things tried:
SHOW OPEN TABLES has several tables open, but all have In_use=0 and Name_locked=0.
SHOW FULL PROCESSLIST looks healthy.
user_id is an indexed primary key
The server is an UpCloud $5/month 1-core, 1 GB RAM instance running Ubuntu 20.04.1 LTS (GNU/Linux 5.4.0-66-generic x86_64). MySQL Ver 8.0.23-0ubuntu0.20.04.1 for Linux on x86_64 ((Ubuntu)).
JDBC Driver is mysql-connector-java_8.0.23.jar, which was obtained from mysql-connector-java_8.0.23-1ubuntu20.04_all via https://dev.mysql.com/downloads/connector/j/

Don't reconnect each time. Open the connection at the start; reuse it until the web page (or program) is finished.

Chances are that you are comparing different realities.
When running mysqlslap you are most likely using Unix domain sockets for the communication between the tool and the MySQL server. Try changing that to TCP and you should observe an immediate performance drop. Connector/J, on the other hand, creates TCP-based connections by default (Unix domain sockets can be used, but only via a third-party library).
Also, in mysqlslap you are running a simple query directly, which is handled by a single COM_QUERY protocol command. In the Java sample you are preparing the query first and then executing it. Depending on how Connector/J is configured, this may result in a single COM_QUERY protocol command or in a pair of commands, namely COM_STMT_PREPARE and COM_STMT_EXECUTE. Connector/J is also affected by how its statement caches (and/or the connection pool's) are configured. However, you are only measuring the executeQuery part, so, theoretically, Connector/J could even be favored here.
Finally, unless you actually come up with a use case where you guarantee that both executions are effectively doing the same work under the same circumstances, you can compare results and point out differences, but you can't draw any conclusions from them. For example, it is not that hard to introduce caches that make those simple iterations skip communicating with the server entirely... which would make things extremely fast.
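Whether Connector/J sends a single COM_QUERY or a COM_STMT_PREPARE/COM_STMT_EXECUTE pair is governed by connection properties such as useServerPrepStmts, and statement caching by cachePrepStmts/prepStmtCacheSize. A minimal HikariCP configuration sketch for experimenting with these settings (the URL, credentials and class name are placeholders, not from the question):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public final class DBCoreSketch {
    // Hypothetical DataSource setup; adjust URL and credentials to your environment.
    public static HikariDataSource buildPool() {
        HikariConfig cfg = new HikariConfig();
        cfg.setJdbcUrl("jdbc:mysql://localhost:3306/mydb");
        cfg.setUsername("root");
        cfg.setPassword("secret");
        // Connector/J properties that influence prepare/execute behaviour:
        cfg.addDataSourceProperty("useServerPrepStmts", "true");   // use COM_STMT_PREPARE + COM_STMT_EXECUTE
        cfg.addDataSourceProperty("cachePrepStmts", "true");       // reuse prepared statements
        cfg.addDataSourceProperty("prepStmtCacheSize", "250");
        cfg.addDataSourceProperty("prepStmtCacheSqlLimit", "2048");
        return new HikariDataSource(cfg);
    }
}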

Move the connection borrowing and the Stopwatch-related code out of the method, then measure like this:
Stopwatch s = Stopwatch.createStarted();
try (Connection con = ....) {
    for (int i = 0; i < 1000; i++) {
        queryUsername(con);
    }
}
s.stop();
System.out.println(s.elapsed(TimeUnit.MICROSECONDS));
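For reference, a minimal sketch of the refactored method that the measurement above assumes: the caller borrows the Connection once and passes it in, so each call pays only for prepare/execute rather than for connection checkout.

// Sketch only; exception handling is moved to the caller.
public static String queryUsername(Connection connection) throws SQLException {
    String username = "";
    try (PreparedStatement stmt = connection.prepareStatement(
            "SELECT username from phpbb_users where user_id = ?")) {
        stmt.setInt(1, 1);
        try (ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                username = rs.getString("username");
            }
        }
    }
    return username;
}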

Related

Oracle UCP performance issue during binding variables

Recently we changed our connection pool to migrate to Oracle UCP. Before the migration, we used the pool embedded in the Oracle JDBC driver (ojdbc6.jar).
Our problem is the elapsed time spent binding variables.
With UCP, the time to bind a variable is greater than with the old pool because it uses introspection.
In a normal case (select or update), the time to bind the variables is very small compared to the time to execute the SQL query.
But when we use a PreparedStatement for a batch execution, we bind a lot of variables and then run the query by calling pst.executeBatch().
As an example, here is a small program illustrating the elapsed time with the two pools.
PreparedStatement ppst = connection.prepareStatement(INSERT_SQL);
...
private long setParam(PreparedStatement prepStmt) throws SQLException {
    long d = 0;
    for (long i = 1; i <= 750_000; i++) {
        int index = 1;
        prepStmt.setString(index++, "1470");
        prepStmt.setTimestamp(index++, new Timestamp(System.currentTimeMillis()));
        prepStmt.setInt(index++, 1);
        prepStmt.setObject(index++, String.valueOf(i));
        prepStmt.addBatch();
    }
    prepStmt.clearBatch();
    return d;
}
With the old pool embedded in the ojdbc6 driver, the elapsed time is 7.653 s.
With the UCP pool, the elapsed time is 10.92 s.
In this example we have 750,000 iterations with 4 bind variables.
In our production batch, we have 50,000,000 iterations, so the time spent binding variables is long and our batch duration has grown.
Technical information:
Old pool : ojdbc6.jar (11.2.0.3.0)
New Pool : ojdbc6.jar (11.2.0.3.0) + ucp.jar (11.2.0.4.0)
We have profiled the variable-binding phase:
With the new UCP pool, each variable is bound through the Java reflection API, which is slower.
With the old pool, each variable is bound directly with the method corresponding to the variable's type.
How can we improve the performance of variable binding with the UCP pool? Is there a way to disable the use of the Java reflection API?
The proxy mechanism in UCP was improved in 12.2.0.1 to use dynamic proxies, and it offers better performance than the Java proxies used in 11.2.0.4. You will need to upgrade both UCP and the JDBC driver (both have to be on the same version) to 12.2.0.1.

Spring JDBC template ROW Mapper is too slow

I have a DB fetch call with Spring JdbcTemplate, and around 1 million rows are to be fetched. Iterating over the result set takes too much time. After debugging the behavior I found that it processes some rows like a batch, then waits for a while, then takes another batch of rows and processes it. Row processing does not seem to be continuous, so the overall time runs into minutes. I have used the default configuration for the data source. Please help.
[Edit]
Here is some sample code
this.prestoJdbcTempate.query(query, new RowMapper<SomeObject>() {
    @Override
    public SomeObject mapRow(final ResultSet rs, final int rowNum) throws SQLException {
        System.out.println(rowNum);
        SomeObject obj = new SomeObject();
        obj.setProp1(rs.getString(1));
        obj.setProp2(rs.getString(2));
        ....
        obj.setProp8(rs.getString(8));
        return obj;
    }
});
As most of the comments tell you, one million records is useless and unrealistic to show in any UI; if this is a real business requirement, you need to educate your customer.
Network traffic between the application and the database server is a key performance factor in scenarios like this. One optional parameter that can really help here, at least to a certain extent, is the fetch size.
Example :
Connection connection = //get your connection
Statement statement = connection.createStatement();
statement.setFetchSize(1000); // configure the fetch size
Most JDBC database drivers use a low fetch size by default, and tuning it can help in this situation. **But beware** of the following:
Make sure your JDBC driver supports fetch size.
Make sure your JVM heap setting (-Xmx) is large enough to handle the objects created as a result.
Finally, select only the columns you need to reduce network overhead.
In Spring, JdbcTemplate lets you set the fetch size, as sketched below.
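A minimal sketch, assuming a plain DataSource and a row type with two string columns (the table, columns, and dataSource variable are illustrative, not from the question):

// Sketch: tune the fetch size so the driver streams rows in larger chunks.
JdbcTemplate jdbc = new JdbcTemplate(dataSource);   // dataSource obtained elsewhere
jdbc.setFetchSize(1000);                            // e.g. 1000 rows per network round trip
List<SomeObject> rows = jdbc.query(
        "SELECT prop1, prop2 FROM some_table",      // hypothetical table/columns
        (rs, rowNum) -> {
            SomeObject obj = new SomeObject();
            obj.setProp1(rs.getString(1));
            obj.setProp2(rs.getString(2));
            return obj;
        });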

Improve JDBC Performance

I am executing the following set of statements in my Java application. It connects to an Oracle database.
stat = connection.createStatement();
stat1 = connection.createStatement();
ResultSet rs = stat.executeQuery(BIGQUERY);
while (rs.next()) {
    obj1.setAttr1(rs.getString(1));
    obj1.setAttr2(rs.getString(2));
    obj1.setAttr3(rs.getString(3));
    obj1.setAttr4(rs.getString(4));

    ResultSet rs1 = stat1.executeQuery(SMALLQ1);
    while (rs1.next()) {
        obj1.setAttr5(rs1.getString(1));
    }

    ResultSet rs2 = stat1.executeQuery(SMALLQ2);
    while (rs2.next()) {
        obj1.setAttr6(rs2.getString(1));
    }
    .
    .
    .
    linkedBlockingQueue.add(obj1);
}
// all statements and connections closed
The BIGQUERY returns around 4.5 million records, and for each record I have to execute the smaller queries, which are 14 in number. Each small query has 3 inner joins.
My multithreaded application can currently process 90,000 records in one hour. But I may have to run the code daily, so I want to process all the records in 20 hours. I am using about 200 threads which execute the above code and store the records in a LinkedBlockingQueue.
Does blindly increasing the thread count help increase the performance, or is there some other way in which I can improve the performance of the result sets?
PS: I am unable to post the query here, but I am assured that all queries are optimized.
To improve JDBC performance for your scenario you can apply several modifications.
As you will see, each of them can significantly speed up your task.
1. Using batch operations.
You can read your big query and store the results in some kind of buffer, and only when the buffer is full run the subquery for all the data collected in it. This significantly reduces the number of SQL statements to execute.
static final int BATCH_SIZE = 1000;

List<MyData> buffer = new ArrayList<>(BATCH_SIZE);
while (rs.next()) {
    MyData record = new MyData(rs.getString(1), ..., rs.getString(4));
    buffer.add(record);
    if (buffer.size() == BATCH_SIZE) {
        processBatch(buffer);
        buffer.clear();
    }
}
processBatch(buffer); // don't forget the last, partially filled batch

void processBatch(List<MyData> buffer) {
    String sql = "select ... where X and id in (" + getIDs(buffer) + ")";
    try (ResultSet rs1 = stat1.executeQuery(sql)) { // one query for all IDs in the buffer
        while (rs1.next()) { ... }
    }
    ...
}
2. Using efficient maps to store content from many selects.
If your records are not too big, you can store them all at once, even for a 4 million row table.
I have used this approach many times for nightly processes (with no normal users).
Such an approach may need a lot of heap memory (i.e. 100 MB - 1 GB), but it is much faster than approach 1).
To do that you need an efficient map implementation, e.g. gnu.trove.map.TIntObjectMap (etc.), which is much better than the Java standard library maps.
final TIntObjectMap<MyData> map = new TIntObjectHashMap<MyData>(10000, 0.8f);

// query 1
while (rs.next()) {
    MyData record = new MyData(rs.getInt(1), rs.getString(2), ..., rs.getString(4));
    map.put(record.getId(), record);
}

// query 2
while (rs.next()) {
    int id = rs.getInt(1); // my data id
    String x = rs.getString(...);
    int y = rs.getInt(...);
    MyData record = map.get(id);
    record.add(new MyDetail(x, y));
}

// query 3
// same pattern as query 2
After this, the map is filled with all the collected data, probably with a lot of memory allocated.
This is why you can use this method only if you have such resources.
Another topic is how to write the MyData and MyDetail classes to be as small as possible. You can use some tricks (see the sketch below):
storing 3 integers (with limited range) in 1 long variable (using a small utility for bit shifting)
storing Date objects as an integer (yymmdd)
calling str.intern() for each string fetched from the DB
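A minimal sketch of the bit-packing trick, assuming each value fits in the stated number of bits (the field widths and class name are illustrative, not from the answer):

// Sketch: pack three small non-negative integers into one long to shrink a data class.
// Assumes a fits in 24 bits, b in 20 bits, c in 20 bits (24 + 20 + 20 = 64).
final class PackedTriple {
    private final long packed;

    PackedTriple(int a, int b, int c) {
        this.packed = ((long) a << 40) | ((long) b << 20) | (long) c;
    }

    int a() { return (int) (packed >>> 40); }              // top 24 bits
    int b() { return (int) ((packed >>> 20) & 0xFFFFF); }  // middle 20 bits
    int c() { return (int) (packed & 0xFFFFF); }           // bottom 20 bits
}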
3. Transactions
If you have to do some updates or inserts, then 4 million records is too much to handle in one transaction. That is too much for most database configurations. Use approach 1) and commit the transaction for each batch.
On each newly inserted record you can keep something like a RUN_ID, and if everything went well you can mark that RUN_ID as successful.
If your queries only read, there is no problem; however, you can mark the transaction as read-only to help your database (see the sketch below).
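A minimal sketch of committing per batch and of a read-only hint, assuming plain JDBC (the batch collection and helper names are illustrative):

// Sketch: commit once per processed batch instead of once for 4 million rows.
connection.setAutoCommit(false);
try {
    for (List<MyData> batch : batches) {   // 'batches' produced as in approach 1)
        processBatch(batch);
        connection.commit();               // one commit per batch
    }
} catch (SQLException e) {
    connection.rollback();                 // undo only the current batch
    throw e;
}

// Sketch: for read-only work, hint the database that no writes will happen.
connection.setReadOnly(true);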
4. JDBC fetch size.
When you load a lot of records from the database it is very, very important to set a proper fetch size on your JDBC connection.
This reduces the number of physical round trips to the database socket and speeds up your process.
Example:
// jdbc
statement.setFetchSize(500);
// spring
JdbcTemplate jdbc = new JdbcTemplate(datasource);
jdbc.setFetchSize(500);
Here you can find some benchmarks and patterns for using fetch size:
http://makejavafaster.blogspot.com/2015/06/jdbc-fetch-size-performance.html
5. PreparedStatement
Use PreparedStatement rather than Statement, as sketched below.
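A minimal sketch of the difference, using an illustrative query (the table, column, and id variable are assumptions, not from the question):

// Statement: the SQL text changes per value, so the database re-parses each time.
try (Statement st = connection.createStatement();
     ResultSet rs = st.executeQuery("SELECT name FROM items WHERE id = " + id)) {
    while (rs.next()) { /* ... */ }
}

// PreparedStatement: parse once, bind the value on each execution.
try (PreparedStatement ps = connection.prepareStatement("SELECT name FROM items WHERE id = ?")) {
    ps.setInt(1, id);
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) { /* ... */ }
    }
}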
6. Number of SQL statements.
Always try to minimize the number of SQL statements you send to the database.
Try this:
resultSet.setFetchSize(100);
while (resultSet.next()) {
    ...
}
The parameter is the number of rows that should be retrieved from the database in each round trip.

Java and MySQL query

I have a problem with Java and MySQL. My code:
Connection connection;
// ...
for (String query : updateAndInsertQuery) {
    Statement stm = connection.createStatement();
    stm.execute(query);
    stm.close();
}

Statement stm2 = connection.createStatement();
System.out.println("Before query");
System.out.flush();
ResultSet Result = stm2.executeQuery(selectQuery);
System.out.println("After query");
System.out.flush();

int vfrom, vto;
while (Result.next()) {
    // ...
}
When I run the program I watch the queries in MySQL and run
show processlist;
selectQuery is visible on the list with status Sending data or Writing to net. The console prints: Before query. Then
show processlist;
returns an empty list, but the application doesn't print After query. Have you seen a similar problem?
-- edit
I resolved my problem.
My reasoning was: once MySQL has returned the data and the query is no longer visible in the MySQL processlist, I should immediately see the message After query on the console. But the console stayed empty while the Java process kept working (CPU usage was 90-100%), so I assumed the mistake was mine; after about an hour the application threw an exception.
Increasing the memory limit resolved my problem.
So my next question is: why did the application throw the exception only after an hour? Was the garbage collector trying to deallocate unused objects?
Executing queries manually usually leads to many different problems, all of which are platform-specific and DB-specific. I think your best answer will be: switch to an ORM.
This framework has proven to be exceptionally good; wrapping all your SQL data into entities and transactions (if required) will resolve most of your problems at the same time. You will only need to annotate your entities and relationships correctly. Database queries can be executed via JPA criteria queries, which are platform-independent AND allow you to avoid a lot of problems as well as making your code readable (see the sketch below).
Tutorial: http://www.vogella.com/tutorials/JavaPersistenceAPI/article.html
SO question: https://stackoverflow.com/questions/743065/what-is-the-fastest-way-to-learn-jpa
With JPA, you won't need to care about statements or queries anymore (well, at least most of the time) and your mentioned problem will disappear. PLUS: it only takes 30-60 minutes to implement.
Additional tip: use Maven & EclipseLink (a JPA 2 implementation); that's a very powerful, portable combination.
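A minimal sketch of what that looks like, assuming a hypothetical User entity (the class, table, and field names are illustrative, not from the question):

import javax.persistence.*;
import javax.persistence.criteria.*;
import java.util.List;

@Entity
@Table(name = "users")
public class User {
    @Id
    @Column(name = "user_id")
    private int userId;

    @Column(name = "username")
    private String username;
    // getters/setters omitted
}

// Somewhere with an EntityManager available (e.g. injected or from an EntityManagerFactory):
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<User> cq = cb.createQuery(User.class);
Root<User> root = cq.from(User.class);
cq.select(root).where(cb.equal(root.get("userId"), 1));
List<User> result = entityManager.createQuery(cq).getResultList();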

JDBC/Resultset error

My MySQL query in Java always stops (i.e. freezes and does not continue) at a certain position, namely 543,858, even though the table contains approx. 2,000,000 entries. I've checked this by logging the result fetching as it happens.
It is reproducible and happens every time at the very same position.
"SELECT abc from bcd WHERE DATEDIFF(CURDATE(), timestamp) <= '"+days+"'");
Addition: it definitely is a Java problem; I've just tried out this statement in Navicat (50 s running time).
The query seems to freeze after the log tells me that it's now adding the result at position 543,858.
try {
    ...
    PreparedStatement stmt = connection.prepareStatement(sql); // prepare statement etc.; sql = the SELECT above
    stmt.setFetchSize(Integer.MIN_VALUE);
    ResultSet res = stmt.executeQuery();
    ...
    System.out.println(res.getStatement());
    ...
    while (res.next()) {
        treeSet.add(res.getString("userid"));
    }
} catch (Exception e) {
    e.printStackTrace();
}
Edit: We were able to figure out the problem. This method is fine, and the returned result (500,000 instead of 2,000,000) is correct as well (I had looked in the wrong DB to verify the count); the problem was that the next method call, which used the result of the one posted above, takes literally forever but had no logging implemented. So I was fooled by the missing console logs.
Thanks anyway!
I think you might be running out of memory after processing half a million records. Try assigning more memory using the command-line option -Xmx, etc. See here for more info about command-line options, for example:
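(A sketch of such a launch command; the heap value and jar name are placeholders, not from the answer.)
java -Xmx2048m -jar yourapp.jar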
In MySQL, to use streaming ResultSets you have to specify more parameters, not only fetchSize.
Try:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
                            java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
and see if that works.
It's documented in the ResultSet section of the Connector/J documentation.
It's strange that it doesn't throw an exception, but this is the only suspect I have. Maybe it starts garbage collection or flushes memory to disk, and that takes so much time it never gets to throw it.
I would try adding " LIMIT 543857" to your query, then " LIMIT 543858", and see what happens.
If the above does not help, use the LIMIT clause combined with ORDER BY.
I suspect that there is an invalid entry in your table, and the way to find it is a binary search (a sketch of the idea follows below).
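A sketch of the bisection idea, assuming plain JDBC; the column used for ORDER BY and the helper method are illustrative, not from the answer:

// Sketch: probe whether the first 'limit' rows (in a stable order) can be read without freezing.
// If probe(543858) hangs but probe(543857) does not, bisect between known-good and known-bad
// limits to home in on the offending row.
static boolean probe(Connection conn, int limit) throws SQLException {
    String sql = "SELECT abc FROM bcd ORDER BY abc LIMIT " + limit; // stable ordering is essential
    try (Statement st = conn.createStatement();
         ResultSet rs = st.executeQuery(sql)) {
        int count = 0;
        while (rs.next()) {
            count++;
        }
        return count == limit; // reached the end without freezing
    }
}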
