Oracle UCP performance issue during binding variables - java

Recently we changed our connection pool to migrate to Oracle UCP. Before the migration, we used the pool embedded in the Oracle JDBC driver (ojdbc6.jar).
Our problem is the elapsed time during the bind-variable phase.
With UCP, the time to bind a variable is greater than with the old pool because it uses introspection.
In a normal case (select or update), the time to bind the variables is very small compared to the execution time of the SQL query.
But when we use a PreparedStatement for a batch execution, we bind a lot of variables and then run the query by calling executeBatch().
As an example, here is a small program to illustrate the elapsed time with the two pools.
PreparedStatement ppst = connection.prepareStatement(INSERT_SQL);
...
private long setParam(PreparedStatement prepStmt) throws SQLException {
    long start = System.currentTimeMillis();
    for (long i = 1; i <= 750_000; i++) {
        int index = 1;
        prepStmt.setString(index++, "1470");
        prepStmt.setTimestamp(index++, new Timestamp(System.currentTimeMillis()));
        prepStmt.setInt(index++, 1);
        prepStmt.setObject(index++, String.valueOf(i));
        prepStmt.addBatch();
    }
    prepStmt.clearBatch(); // only the binding phase is measured here, the batch is not executed
    return System.currentTimeMillis() - start;
}
With the old pool embedded in the ojdbc6 driver, the elapsed time is 7.653 sec.
With the UCP pool, the elapsed time is 10.92 sec.
In this example we have 750,000 iterations with 4 bind variables.
In our production batch, we have 50,000,000 iterations, so the time spent binding variables is long and our batch time has grown.
Technical information:
Old pool: ojdbc6.jar (11.2.0.3.0)
New pool: ojdbc6.jar (11.2.0.3.0) + ucp.jar (11.2.0.4.0)
We have profiled the variable-binding phase:
With the new UCP pool, each variable is bound through the Java Reflection API, which is slower.
With the old pool, each variable is bound directly with the method corresponding to the variable's type.
How can we improve the performance of variable binding with the UCP pool? Do you know a way to disable the usage of the Java Reflection API?

The proxy mechanism in UCP was improved in 12.2.0.1 to use dynamic proxies and offers better performance than the Java proxies that were used in 11.2.0.4. You will need to upgrade both UCP and the JDBC driver (both have to be on the same version) to 12.2.0.1.
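If you want to double-check which driver build the upgraded pool is actually handing out, a quick sanity check is possible through the standard DatabaseMetaData API (a sketch; poolDataSource stands for your UCP PoolDataSource):
// Borrow a connection from the pool and print the JDBC driver name and version.
try (Connection conn = poolDataSource.getConnection()) {
    DatabaseMetaData meta = conn.getMetaData();
    System.out.println(meta.getDriverName() + " " + meta.getDriverVersion());
}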

Related

Investigating slow simple queries in JDBC and MySQL

PreparedStatement.executeQuery() is taking ~20x longer to execute than if it were run directly via the shell. I've logged with timers to determine that this method is the culprit.
The query and some DB info (ignoring the Java issue for the moment):
mysql> SELECT username from users where user_id = 1; // lightning fast
Running that same query 1,000 times via mysqlslap is also lightning fast.
mysqlslap --create-schema=mydb --user=root -p --query="select username from phpbb_users where user_id = 1" --number-of-queries=1000 --concurrency=1
Benchmark
Average number of seconds to run all queries: 0.051 seconds
Minimum number of seconds to run all queries: 0.051 seconds
Maximum number of seconds to run all queries: 0.051 seconds
Number of clients running queries: 1
Average number of queries per client: 1000
The Problem: Performing the same query in JDBC slows things significantly. In a for loop calling the below queryUsername() 1,000 times (this is called in the Main method, which isn't shown here) takes around 872ms. That's ~17x slower! I've tracked down the heavy usage by placing timers in various spots (omitted some for brevity). The primary suspect is stmt.executeQuery() which took 776ms of the 872ms runtime.
public static String queryUsername() {
String username = "";
// DBCore.getConnection() returns HikariDataSource.getConnection() implementation exactly as per https://www.baeldung.com/hikaricp
try (Connection connection = DBCore.getConnection();
PreparedStatement stmt = connection.prepareStatement("SELECT username from phpbb_users where user_id = ?");) {
stmt.setInt(1, 1); // just looking for user_id 1 for now
// Google timer used to measure how long executeQuery() is taking
// Another Timer is used outside of this method call to see how long
// total execution takes.
// Approximately 1 second in for loop calling this method 1000 times
Stopwatch s = Stopwatch.createStarted();
try (ResultSet rs = stmt.executeQuery();) {
s.stop(); // stopping the timer after executeQuery() has been called
timeElapsed += s.elapsed(TimeUnit.MICROSECONDS);
while (rs.next())
{
username = rs.getString("username"); // the query returns 1 record
}
}
} catch (SQLException e) {
e.printStackTrace();
}
return username;
}
Additional context and things tried:
SHOW OPEN TABLES has several tables open, but all have In_use=0 and Name_locked=0.
SHOW FULL PROCESSLIST looks healthy.
user_id is an indexed primary key
The Server is an Upcloud $5/month 1-Core, 1GB RAM running Ubuntu 20.04.1 LTS (GNU/Linux 5.4.0-66-generic x86_64). Mysql Ver 8.0.23-0ubuntu0.20.04.1 for Linux on x86_64 ((Ubuntu))
JDBC Driver is mysql-connector-java_8.0.23.jar, which was obtained from mysql-connector-java_8.0.23-1ubuntu20.04_all via https://dev.mysql.com/downloads/connector/j/
Don't reconnect each time. Open the connection at the start; reuse it until the web page (or program) is finished.
Chances are that you are comparing different realities.
When running mysqlslap you are most likely using Unix Domain Sockets in the communication between the tool and MySQL server. Try changing that to TCP and you should observe an immediate performance drop. Connector/J, on the other hand, creates TCP based connections by default (Unix Domain Sockets can be used but only by using a third party library).
Also, in mysqlslap you are running a simple query directly, which is handled by a COM_QUERY protocol command. In the Java sample you are preparing the query first and then executing it. Depending on how Connector/J is configured, this may result in a single COM_QUERY protocol command or in a pair of commands, namely COM_STMT_PREPARE and COM_STMT_EXECUTE. Connector/J is also affected by how its statement caches are configured (and/or the connection pool's). However, you are only measuring the executeQuery part, so, theoretically, Connector/J could be being favored here.
Finally, unless you actually come up with a use case where you guarantee that both executions are effectively doing the same work under the same circumstances, you can compare results and point out differences, but you can't take any conclusions from it. For example, it's not that hard to introduce caches and make those simple iterations even completely skip communicating to the server... that would make things extremely fast.
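For what it's worth, a sketch of how those Connector/J behaviours are usually toggled through connection properties (host, schema, credentials, and the cache sizes are placeholders):
// useServerPrepStmts=false (the default) sends client-side prepared statements as a single
// COM_QUERY with the values inlined; true switches to COM_STMT_PREPARE / COM_STMT_EXECUTE.
// cachePrepStmts enables the driver-side prepared statement cache.
String url = "jdbc:mysql://localhost:3306/mydb"
        + "?useServerPrepStmts=true"
        + "&cachePrepStmts=true"
        + "&prepStmtCacheSize=250"
        + "&prepStmtCacheSqlLimit=2048";
Connection connection = DriverManager.getConnection(url, "user", "password");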
Move borrowing the connection and the Stopwatch-related code out of the method, then measure like this:
Stopwatch s = Stopwatch.createStarted();
try (Connection con = DBCore.getConnection()) {
    for (int i = 0; i < 1000; i++) {
        queryUsername(con);
    }
}
s.stop();
System.out.println(s.elapsed(TimeUnit.MICROSECONDS));
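For completeness, a sketch of queryUsername reworked to take the shared connection (same table and column as in the question; exception handling is left to the caller):
public static String queryUsername(Connection connection) throws SQLException {
    String username = "";
    try (PreparedStatement stmt = connection.prepareStatement(
            "SELECT username FROM phpbb_users WHERE user_id = ?")) {
        stmt.setInt(1, 1); // still only looking for user_id 1
        try (ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                username = rs.getString("username");
            }
        }
    }
    return username;
}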

Bulk insert/update using Stateless session - Hibernate

I have a requirement to insert/update more than 15000 rows in 3 tables. So that's 45k total inserts.
I used StatelessSession in Hibernate after reading online that it is the best for batch processing, as it doesn't have a context cache.
session = sessionFactory.openStatelessSession();
for (Employee e : emplList) {
    session.insert(e);
}
transaction.commit();
But this code takes more than an hour to complete.
Is there a way to save all the entity objects in one go?
Save the entire collection rather than doing it one by one?
Edit: Is there any other framework that can offer a quick insert?
Cheers!!
You should read this article by Vlad Mihalcea:
How to batch INSERT and UPDATE statements with Hibernate
You need to make sure that you've set the hibernate property:
hibernate.jdbc.batch_size
So that Hibernate can batch these inserts, otherwise they'll be done one at a time.
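For example, a minimal sketch of setting it programmatically (the value 50 is only an illustration; in most projects the property lives in hibernate.cfg.xml or persistence.xml instead):
SessionFactory sessionFactory = new Configuration()
        .configure()                                    // loads hibernate.cfg.xml as usual
        .setProperty("hibernate.jdbc.batch_size", "50") // let Hibernate group 50 inserts per JDBC batch
        .buildSessionFactory();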
There is no way to insert all entities in one go. Even if you could do something like session.save(emplList), internally Hibernate would still save them one by one.
According to the Hibernate User Guide, StatelessSession does not use the batch feature:
The insert(), update(), and delete() operations defined by the StatelessSession interface operate directly on database rows. They cause the corresponding SQL operations to be executed immediately. They have different semantics from the save(), saveOrUpdate(), and delete() operations defined by the Session interface.
Instead, use a normal Session and clear the cache from time to time. Actually, I suggest you measure your code first and then make changes like using hibernate.jdbc.batch_size, so you can see how much each tweak improves your load.
Try to change it like this:
session = sessionFactory.openSession();
int count = 0;
int step = 0;
int stepSize = 1_000;
long start = System.currentTimeMillis();
for (Employee e : emplList) {
    session.save(e);
    count++;
    if (++step == stepSize) {
        long elapsed = Math.max(System.currentTimeMillis() - start, 1);
        long linesPerSecond = stepSize * 1_000L / elapsed;
        StringBuilder msg = new StringBuilder();
        msg.append("Step time: ");
        msg.append(elapsed);
        msg.append(" ms Lines: ");
        msg.append(count);
        msg.append("/");
        msg.append(emplList.size());
        msg.append(" Lines/Seconds: ");
        msg.append(linesPerSecond);
        System.out.println(msg.toString());
        start = System.currentTimeMillis();
        step = 0;
        session.flush(); // push the pending batch to the database
        session.clear(); // then detach the entities to keep the session small
    }
}
transaction.commit();
About hibernate.jdbc.batch_size - you can try different values, including some very large ones, depending on the underlying database and network configuration. For example, I use a value of 10,000 for a 1 Gbps network between the app server and the database server, giving me 20,000 records per second.
Change stepSize to the same value as hibernate.jdbc.batch_size.

Spring JDBC template ROW Mapper is too slow

I have a db fetch call with Spring jdbcTemplate and the number of rows to be fetched is around 1 million. It takes too much time iterating over the result set. After debugging the behaviour I found that it processes some rows like a batch, then waits for some time, and then takes another batch of rows and processes them. It seems the row processing is not continuous, so the overall time runs into minutes. I have used the default configuration for the data source. Please help.
[Edit]
Here is some sample code
this.prestoJdbcTempate.query(query, new RowMapper<SomeObject>() {
    @Override
    public SomeObject mapRow(final ResultSet rs, final int rowNum) throws SQLException {
        System.out.println(rowNum);
        SomeObject obj = new SomeObject();
        obj.setProp1(rs.getString(1));
        obj.setProp2(rs.getString(2));
        ....
        obj.setProp8(rs.getString(8));
        return obj;
    }
});
As most of the comments tell you, one million records is useless and unrealistic to show in any UI - if this is a real business requirement, you need to educate your customer.
Network traffic between the application and the database server is a key factor in performance in scenarios like this. There is one optional parameter that can really help you here: the fetch size - though only to a certain extent.
Example:
Connection connection = //get your connection
Statement statement = connection.createStatement();
statement.setFetchSize(1000); // configure the fetch size
Most JDBC database drivers have a low fetch size by default, and tuning this can help you in this situation. But beware of the following:
Make sure your jdbc driver supports fetch size
Make sure your JVM heap setting ( -Xmx) is wide enough to handle objects created as a result of this.
Finally, select only the columns you need to reduce network overhead.
In Spring, JdbcTemplate lets you set the fetchSize.
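For instance (the fetch size of 1000 is only an illustration):
JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
jdbcTemplate.setFetchSize(1000); // rows fetched per round trip instead of the driver default
List<SomeObject> result = jdbcTemplate.query(query, rowMapper);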

Improve JDBC Performance

I am executing the following set of statements in my Java application. It connects to an Oracle database.
stat = connection.createStatement();
stat1 = connection.createStatement();
ResultSet rs = stat.executeQuery(BIGQUERY);
while(rs.next()) {
obj1.setAttr1(rs.getString(1));
obj1.setAttr2(rs.getString(1));
obj1.setAttr3(rs.getString(1));
obj1.setAttr4(rs.getString(1));
ResultSet rs1 = stat1.executeQuery(SMALLQ1);
while(rs1.next()) {
obj1.setAttr5(rs1.getString(1));
}
ResultSet rs2 = stat1.executeQuery(SMALLQ2);
while(rs2.next()) {
obj1.setAttr6(rs2.getString(1));
}
.
.
.
linkedBlockingQueue.add(obj1);
}
// all statements and connections are closed here
The BIGQUERY returns around 4.5 million records and for each record, I have to execute the smaller queries, which are 14 in number. Each small query has 3 inner join statements.
My multi-threaded application can currently process 90,000 records in one hour. But I may have to run the code daily, so I want to process all the records in 20 hours. I am using about 200 threads which run the above code and store the records in the linked blocking queue.
Does blindly increasing the thread count help increase the performance, or is there some other way in which I can increase the performance of the result sets?
PS : I am unable to post the query here, but I am assured that all queries are optimized.
To improve JDBC performance for your scenario you can apply some modifications.
As you will see, all these modifications can significantly speed up your task.
1. Using batch operations.
You can read your big query and store the results in some kind of buffer, and only when the buffer is full run the subqueries for all the data collected in it.
This significantly reduces the number of SQL statements to execute.
static final int BATCH_SIZE = 1000;

List<MyData> buffer = new ArrayList<>(BATCH_SIZE);
while (rs.next()) {
    MyData record = new MyData( rs.getString(1), ..., rs.getString(4) );
    buffer.add( record );
    if (buffer.size() == BATCH_SIZE) {
        processBatch( buffer );
        buffer.clear();
    }
}
processBatch( buffer ); // process the remaining records

void processBatch( List<MyData> buffer ) {
    String sql = "select ... where X and id in (" + getIDs(buffer) + ")";
    ResultSet rs1 = stat1.executeQuery(sql); // one query for all IDs in the buffer
    while (rs1.next()) { ... }
    ...
}
2. Using efficient maps to store content from many selects.
If your records are not so big you can store them all at once, even for a 4-million-row table.
I used this approach many times for nightly processes (with no normal users).
Such an approach may need a lot of heap memory (i.e. 100 MB - 1 GB) - but it is much faster than approach 1).
To do that you need an efficient map implementation, e.g. gnu.trove.map.TIntObjectMap (etc.),
which is much better than the Java standard library maps.
final TIntObjectMap<MyData> map = new TIntObjectHashMap<MyData>(10000, 0.8f);

// query 1
while (rs.next()) {
    MyData record = new MyData( rs.getInt(1), rs.getString(2), ..., rs.getString(4) );
    map.put(record.getId(), record);
}

// query 2
while (rs.next()) {
    int id = rs.getInt(1); // my data id
    String x = rs.getString(...);
    int y = rs.getInt(...);
    MyData record = map.get(id);
    record.add( new MyDetail(x, y) );
}

// query 3
// same pattern as query 2
After this you have a map filled with all the collected data, probably with a lot of memory allocated.
This is why you can use this method only if you have such resources.
Another topic is how to write the MyData and MyDetail classes to be as small as possible.
You can use some tricks (see the packing sketch after this list):
storing 3 integers (with a limited range) in 1 long variable (using bit shifting)
storing Date objects as an integer (yymmdd)
calling str.intern() for each string fetched from the DB
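A sketch of the first trick, packing three small non-negative ints into one long (the 20-bit field width is only an illustration):
// Packs three non-negative ints, each below 2^20, into a single long.
final class Packed3 {
    private static final int BITS = 20;
    private static final long MASK = (1L << BITS) - 1;

    static long pack(int a, int b, int c) {
        return ((long) a << (2 * BITS)) | ((long) b << BITS) | (long) c;
    }
    static int first(long p)  { return (int) ((p >>> (2 * BITS)) & MASK); }
    static int second(long p) { return (int) ((p >>> BITS) & MASK); }
    static int third(long p)  { return (int) (p & MASK); }
}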
3. Transactions
If you have to do some updates or inserts, then 4 million records is too much to handle in one transaction.
This is too much for most database configurations.
Use approach 1) and commit the transaction for each batch.
On each newly inserted record you can have something like a RUN_ID, and if everything went well you can mark that RUN_ID as successful.
If your queries only read, there is no problem. However, you can mark the transaction as read-only to help your database.
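A minimal sketch of that pattern with plain JDBC (insertStmt, allData, BATCH_SIZE and the bind logic are placeholders):
connection.setAutoCommit(false);
int inBatch = 0;
for (MyData d : allData) {
    insertStmt.setString(1, d.getValue()); // bind your own columns here
    insertStmt.addBatch();
    if (++inBatch == BATCH_SIZE) {
        insertStmt.executeBatch();
        connection.commit();               // one transaction per batch, tagged with its RUN_ID if needed
        inBatch = 0;
    }
}
insertStmt.executeBatch();                 // flush and commit the remainder
connection.commit();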
4. JDBC fetch size.
When you load a lot of records from the database it is very, very important to set a proper fetch size on your JDBC connection.
This reduces the number of physical round trips to the database socket and speeds up your process.
Example:
// jdbc
statement.setFetchSize(500);
// spring
JdbcTemplate jdbc = new JdbcTemplate(datasource);
jdbc.setFetchSize(500);
Here you can find some benchmarks and patterns for using fetch size:
http://makejavafaster.blogspot.com/2015/06/jdbc-fetch-size-performance.html
5. PreparedStatement
Use PreparedStatement rather than Statement.
6. Number of SQL statements.
Always try to minimize the number of SQL statements you send to the database.
Try this
resultSet.setFetchSize(100);
while (resultSet.next()) {
    ...
}
The parameter is the number of rows that should be retrieved from the database in each round trip.

Long running queries MySQL with Java

I'm using MySQL 5.1, Apache Tomcat 7, MyBatis 3.1
I have a method with code like this:
for (Order o : orders) {
    List<Details> list = getDetails(o);
    // Create PDF report ...
}
Where getDetails is a method that executes a stored procedure that takes some time to run (1 to 2 seconds). The problem here is that I have many orders (nearly 4,000) and I need to execute this method to process every order, and when I hit that method, the CPU usage of the MySQL process goes up to 90-100%.
Is that normal? Do I need to use Thread.sleep() after getDetails is executed? Or do I need to make some modifications to my query?
