Currently we are using a 4-CPU Windows box with 8 GB of RAM, with MySQL 5.x installed on the same box. We are using the WebLogic application server for our application. We are targeting 200 concurrent users (obviously not all on the same module/screen). So what is the optimal number of connections to configure in the connection pool (min and max), given that we are using WebLogic's connection pooling mechanism?
Did you really mean 200 concurrent users, or just 200 logged-in users? In most cases, a browser user is not going to be able to do more than one page request per second, so 200 truly concurrent users translates into 200 transactions per second. That is a pretty high number for most applications.
Regardless, as an example, let's go with 200 transactions per second. Say each front-end (browser) transaction takes 0.5 seconds to complete, and of that 0.5 seconds, 0.25 is spent in the database. You would then need 0.5 * 200 = 100 threads in the WebLogic thread pool and 0.25 * 200 = 50 connections in the DB connection pool.
To be safe, I would set the max pool sizes at least 25% larger than you expect, to allow for spikes in load. The minimums can be a small fraction of the max, but the trade-off is that some requests will take longer because a new connection has to be created first. In this case, 50-100 connections is not that many for a DB, so that's probably a good starting number.
Note that to figure out your average transaction response times, along with your average DB query time, you are going to have to run a performance test, because your times under load are probably not going to be the times you see with a single user.
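The arithmetic above is an application of Little's Law (needed concurrency = arrival rate × residence time). A minimal sketch, using the example's assumed rates and times (these are illustrative numbers, not measurements):

```java
// Pool sizing via Little's Law: needed concurrency = arrival rate * hold time.
public class PoolSizing {
    // Connections (or threads) needed to sustain `ratePerSec` requests
    // that each hold the resource for `holdSeconds`.
    static int needed(double ratePerSec, double holdSeconds) {
        return (int) Math.ceil(ratePerSec * holdSeconds);
    }

    public static void main(String[] args) {
        double tps = 200.0;                 // assumed transaction rate
        int threadPool = needed(tps, 0.5);  // 0.5 s end-to-end -> 100 threads
        int dbPool     = needed(tps, 0.25); // 0.25 s in the DB  -> 50 connections
        int dbPoolMax  = (int) Math.ceil(dbPool * 1.25); // +25% spike headroom
        System.out.println(threadPool + " " + dbPool + " " + dbPoolMax); // 100 50 63
    }
}
```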
There is a very simple answer to this question:
The number of connections in the connection pool should be equal to the number of exec threads configured in WebLogic.
The rationale is very simple: if the number of connections is less than the number of threads, some of the threads may be waiting for a connection, making the connection pool a bottleneck. So it should be at least equal to the number of exec threads (the thread pool size).
Sizing a connection pool is not a trivial thing to do. You basically need:
metrics to investigate the connection usage
failover mechanisms for when there is no connection available
FlexyPool aims to aid you in figuring out the right connection pool size.
You should profile the different expected workflows to find out. Ideally, your connection pool will also dynamically adjust the number of live connections based on recent usage, as it's pretty common for load to be a function of the current time of day in your target geographical area.
Start with a small number and try to reach a reasonable number of concurrent users, then crank it up. I think it's likely that you'll find that your connection pooling mechanism is not nearly as instrumental in your scalability as the rest of the software.
The connection pool should be able to grow and shrink based on actual needs. Log the numbers needed to analyze the running system, either through logging statements or through JMX monitoring. Consider setting up alerts for scenarios like "peak detected: more than X new connections had to be allocated in Y seconds" or "connection was out of the pool for more than X seconds", which will let you address performance issues before they become real problems.
It's difficult to get hard data for this. It's also dependent on a number of factors you don't mention:
200 concurrent users, but how much of their activity will generate database queries? 10 queries per page load? 1 query just on login? etc. etc.
The size of the queries and of the database, obviously. Some queries run in milliseconds, some in minutes.
You can monitor mysql to watch the current active queries with "show processlist". This could give you a better sense of how much activity is actually going on in the db under peak load.
This is something that needs to be tested and determined on an individual basis - it's pretty much impossible to give an accurate answer for your circumstances without intimately being familiar with them.
Based on my experience with high-transaction financial systems: if you want to handle, for example, 1K requests per second and you have 32 CPUs, you need 1000/32 open connections in your pool to the database.
Here is my formula:
RPS / CPU_COUNT
In most cases, your database engine will be able to handle your requests even with far fewer connections, but if the number is too low, your requests will sit waiting for a connection.
I think it's pretty important to mention that your database should be able to handle those transactions (based on your disk speed, database configuration and server power).
Good luck.
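As a sketch of this rule of thumb (the figures are the answer's example numbers, not measurements, and the rounding choice is mine):

```java
// Rule-of-thumb pool size: requests per second divided by CPU count.
// 1000 RPS on 32 CPUs works out to roughly 31-32 connections.
public class RpsPerCpu {
    static int poolSize(int requestsPerSecond, int cpuCount) {
        // Round up so a fractional result still gets a whole connection.
        return (requestsPerSecond + cpuCount - 1) / cpuCount;
    }

    public static void main(String[] args) {
        System.out.println(poolSize(1000, 32)); // prints 32
    }
}
```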
I noticed that even when the database is down, so that no connection is actually available in the pool, HikariCP still waits for the connection timeout to expire before throwing an exception to the client.
I agree this is desirable when the database is available, but in my case I would like for the pool not to wait before sending an exception when no connection is available.
The reason is that the database itself answers in less than 2 ms, so I can handle thousands of transactions per second; but when no connection is available, the pool waits much longer (the minimum recommended timeout being 250 ms), so I can no longer handle the throughput. On the other hand, my logic can work without the database for a period of time.
How should I manage this?
EDIT:
This link is almost what I want to achieve, minus the fact that I would prefer HikariCP to do this automatically; I shouldn't have to activate the suspend state.
Perhaps you should introduce a counter somewhere in your application code, and if the number of concurrent requests exceeds that value, don't use the database. It's hard to tell without knowing what you are dealing with, e.g. reads vs. writes.
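One way to sketch that counter is a `Semaphore` guarding entry to the database path, so callers fail immediately instead of blocking on the pool's `connectionTimeout` (the limit of 1 in `main` is an arbitrary assumption for illustration):

```java
import java.util.concurrent.Semaphore;

// Fail-fast guard in front of the pool: if all permits are taken,
// skip the database path immediately instead of blocking on the pool.
public class FailFastGuard {
    private final Semaphore permits;

    FailFastGuard(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Returns true if the caller may borrow a connection right now. */
    boolean tryEnter() {
        return permits.tryAcquire(); // non-blocking
    }

    /** Call after returning the connection to the pool. */
    void exit() {
        permits.release();
    }

    public static void main(String[] args) {
        FailFastGuard guard = new FailFastGuard(1);
        System.out.println(guard.tryEnter()); // true: one permit available
        System.out.println(guard.tryEnter()); // false: fall back to no-DB logic
        guard.exit();
    }
}
```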
As per brettwooldridge's comment on the connectionTimeout property, a lower timeout is unreliable due to thread scheduling, even when there are available connections:
We can certainly consider a lower floor, but 125ms would be the absolute minimum.
Both Windows and Linux have a default scheduler quantum of 20ms. If 16 threads are running on a 4-core CPU, a single thread may have to wait up to 80ms just to be allowed to run. If the pool has a vacancy due to, for example, the retirement of a connection at maxLifetime, this leaves precious little time to establish a connection to fill the slot without returning a spurious failure to the client.
If careful consideration is not taken to ensure the CPU and scheduler are not oversaturated, running at a 125ms timeout puts your application tier at risk of spurious failures even if the pool has available connections. For example, running 32 threads on a 4-core CPU can lead to thread starvation under load lasting as long as 120ms -- very close to the edge.
My application is a Piwik Server that receives incoming tracking data from tracking codes placed on hundreds of websites. The bulk of the workload is small writes to the database hundreds of times per second as these tracking requests come in. I'm using MySQL server with JDBC and Hibernate.
I've recently been increasing the maxPoolSize setting gradually on my application to improve performance. It certainly seems like the higher I set the configuration, the more responsive the application is, and the more stable the disk queue depth.
My current configuration:
jdbc.maxPoolSize=100
jdbc.minPoolSize=100
jdbc.maxStatements=1000
Essentially, my question is: what risks should I watch for when I increase maxPoolSize? Are there specific factors or metrics I should monitor to judge whether I've set it too high? Obviously, if increasing maxPoolSize were a magic bullet for performance problems, everyone would set it as high as possible. Apologies in advance if this is a duplicate, but I couldn't find any answer addressing how to assess whether your connection pool is too large.
I'm running MySQL on an AWS RDS instance. These are my guesses as to what the concerns might be:
Avoid exceeding the maximum number of connections allowed by the RDS instance type
Would an excessively high setting suck up all the memory on the server and impact performance?
Will too many threads cause tables to lock and increase queue time for some of the queries?
Any assistance in understanding what factors to watch for would be greatly appreciated.
I strongly recommend setting up DropWizard metrics and/or JMX monitoring.
In the case of JMX, graph the "Active Connections" over time, if your pool never crosses (or rarely crosses) a given threshold, setting the maximumPoolSize above that is simply wasting resources.
In the case of DropWizard metrics, the "Usage" measurement -- reflecting how long connections are out of the pool -- would give a "comparable" for you to check when playing with the maximumPoolSize.
If connections tend to be out of the pool longer when the maximumPoolSize is 50 (for example) compared to 40, that would indicate that the database is oversaturated, and 40 is closer to ideal.
If there is no difference between a maximumPoolSize of 30 compared to 40 (again, just an example), it could mean that 40 is simply unnecessarily high, or it could mean that the period of time over which those metrics were collected was simply a low period of demand and 40 may still be correct.
Best of all is to combine the above metrics with total web request service times and overlay them on a graph or at least side-by-side.
Metrics are the key to analysis! Find and track as many relevant ones as you can; patterns will emerge.
Lastly, you might try setting the pool to minimumIdle=20 and maximumPoolSize=100 and see where the pool generally settles, ignoring the occasional spike. RDS is unlike typical databases, where you control the hardware the database runs on. With RDS you really don't know how Amazon is spreading the load, so it is just going to require experimentation. Let each experiment run long enough (several hours) to collect sufficient data, and take screenshots of your monitoring for comparison.
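The experiment described above might be expressed as a HikariCP properties fragment like this (property names per HikariCP; the JDBC URL is a placeholder, and `registerMbeans` enables the JMX monitoring mentioned earlier):

```properties
minimumIdle=20
maximumPoolSize=100
registerMbeans=true
jdbcUrl=jdbc:mysql://your-rds-endpoint:3306/yourdb
```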
Avoid exceeding the maximum number of connections allowed by the RDS instance type.
That is plausible.
Would an excessively high setting suck up all the memory on the server and impact performance?
That is possible. Each active connection in the pool will have associated buffers, etcetera. However I would expect the buffers to be bounded.
Will too many threads cause tables to lock and increase queue time for some of the queries?
Possibly. However, if you are mostly doing small writes then I'd not have thought that locking would be a concern for other writes. But if you are doing simultaneous queries that entail a table scan, locking could be a concern.
However, I'd not have thought that increasing the pool size (above 100) is likely to increase throughput. Check the CPU and/or disk I/O load on the database instance, or network traffic between your front end and the DB instance. If the database is where the bottleneck is, then allowing the front end to make more simultaneous requests is likely to make performance worse.
You need to consider what happens if the load (e.g. the request rate) on your system rises above the overall throughput it can sustain. If the pool size is too large, a front-end load spike can turn into a database load spike that leads to a drop in throughput. The problem is that you won't know when the load spike is going to happen, and unless you have load-tested your system beforehand with the tweaked pool size, you won't know what the actual effect of the pool size change will be.
As per my understanding, the execution of Java programs is pretty fast; the things that slow an application down are mainly network and I/O operations.
For example, suppose I have a for loop running 10,000 times which opens a file, processes some data, and saves the data back into the file. If the application is slow, it is not because the loop executes 10,000 times but because of the file opening and closing inside the loop.
I have an MVC application where before I view a page I go through a Controller which in turn calls Services, which finally calls some DAO methods.
The problem is that there are so many queries being fired before the page loads and hence the page load time is 2 mins, which is pathetic.
Since the service calls various DAO methods and each DAO method uses a different connection object, I thought of doing this: "Create only one DAO method that the Service would call and this DAO method would fire all queries on one Connection object."
So this would save the time of connecting and disconnecting to the database.
But the connection object in my application comes from a connection pool, and most connection pools don't close connections; they just return them to the pool. So my solution above would have no effect, as there is no actual opening and closing of connections anyway.
How can I enhance the performance of my application?
Firstly, you should accurately determine where the time is spent, using a tool like a profiler.
Once the root cause is known you can see if the operations can be optimized, i.e remove unnecessary steps. If not then you can see if the result of the operations can be cached and reused.
Without accurate understanding of processing that is taking time, it will be difficult to make any reasonable optimization.
If you reuse connection objects from the pool, this means that the connection/disconnection does not create any performance problem.
I agree with Ashwinee K Jha that a Profiler would give you some clear information of what you could optimize.
Meanwhile some other ideas/suggestions:
Could you maintain some cache of answers? I guess that not all of the 10,000 queries are distinct!
Try tuning the number of Connection objects in the Pool. There should be an optimal number.
Is your query execution already multi-threaded? I guess it is, so try tuning the number of threads. Generally the number of cores is a good number of threads, but for I/O-bound work a much larger number is optimal (the big cost is the I/O, not the CPU).
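A common rule of thumb for I/O-heavy workloads (from Goetz's *Java Concurrency in Practice*) sizes the thread pool as cores × (1 + wait time / compute time). A sketch, with illustrative wait/compute figures (the 90/10 split is an assumption, not a measurement):

```java
// Thread pool sizing for I/O-bound work: cores * (1 + wait / compute).
public class IoThreadPoolSize {
    static int size(int cores, double waitMillis, double computeMillis) {
        return (int) (cores * (1 + waitMillis / computeMillis));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // e.g. 90 ms waiting on I/O for every 10 ms of computation
        System.out.println(size(cores, 90.0, 10.0)); // cores * 10
    }
}
```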
Some suggestions:
Scale your database. Perhaps the database itself is just slow.
Use 'second level caching' or application session caching to potentially speed things up and reduce the need to query the database.
Change your queries, application or schemas to reduce the number of calls made.
You can use Apache DBCP, which provides a connection pool. Database I/O is costly, but it is mostly the opening and closing of DB connections that takes a good chunk of time.
You can also increase maxIdle (the maximum number of connections that can remain idle in the pool).
Also, you can look into an in-memory data grid, e.g. Hazelcast.
I'm building a Java based web app (primarily JSPs deployed with Tomcat). The number of users will never be more than ~30 people. It is a work log, so the users will constantly be updating/accessing the database (SQL Server). There are many methods in the web app that require a connection to the database.
I open a new connection each time one is required (and close it appropriately), but this seems like a lot of opening and closing of connections. Does connection pooling apply to this situation? I'm trying to understand the role of the pool, but I'm confused: would there be a connection pool for each user?
If I'm way off track (which I have a suspicion I am), then is there a better solution to this problem? Is there even a problem?
Thanks!
Yes, I'd size the pool for 30 connections and let it manage them. You'll amortize the cost of opening connections over all requests that way.
There's one pool that many users would access to get connections; one connection per request.
The connection pool is for the application, not per user. The idea of a connection pool is to reuse open connections as much as possible and open a new one only when absolutely needed. Opening a connection to a database is an expensive operation in terms of both CPU cycles and memory; that's why a connection pool is needed. For 30 users, I would recommend using a connection pool.
You can size the pool anywhere between 15 and 30 connections.
take a look at http://commons.apache.org/dbcp/
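As a sketch, a Tomcat-managed DBCP pool for an app like this could be declared in `context.xml` along these lines. The resource name, credentials and URL are placeholders; attribute names follow DBCP 2 (older DBCP releases used `maxActive`/`maxWait` instead of `maxTotal`/`maxWaitMillis`):

```xml
<!-- context.xml: one shared pool for the whole web app, not one per user -->
<Resource name="jdbc/worklog" auth="Container" type="javax.sql.DataSource"
          maxTotal="30" maxIdle="10" minIdle="2" maxWaitMillis="10000"
          username="app" password="secret"
          driverClassName="com.microsoft.sqlserver.jdbc.SQLServerDriver"
          url="jdbc:sqlserver://localhost:1433;databaseName=worklog"/>
```

Code then looks the pool up via JNDI (`java:comp/env/jdbc/worklog`) instead of opening connections directly.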
You certainly could pool connections to the database. Generally you'd use one pool per DB (though there could be reasons that you'd have more).
You're right to ask whether there's even a problem. Connection pooling is going to reduce the number of new connections that have to be negotiated, so it will reduce the time it takes to service a request and also reduce load on the servers. Also it will reduce the number of sockets used, which (for larger applications) can be a factor in system performance.
However: do you have a performance problem that you're trying to solve? Are response times acceptable? Is load acceptable? Balance what you'd gain in perf versus development cost. Pre-built connection pools exist, so it's likely easy to integrate one. But it's not free and optimization should generally be done with specific goals, not "because I should".
The point here is less the number of users than the number of requests that require opening a connection.
If you have
for (int i = 0; i < 1000; i++) {
    Connection c = getConnection();
    try {
        doSomethingWith(c);
    } finally {
        c.close(); // returns the connection even if an exception is thrown
    }
}
You will still benefit from the connection pool, as the c.close() is not really closing the connection, but just putting it back into the pool.
I am working on a JMS application (a standalone multithreaded Java application) which can receive 100 messages at a time; they need to be processed, and database procedures need to be called to insert/update data. The procedures are very heavy, as validations are also performed in them. Each procedure takes about 30 to 50 seconds to execute, and they are capable of running concurrently.
My concern is to execute 100 procedures for all 100 messages and also send a reply from the JMS application within the 90-second time limit.
No application server is to be used (a requirement), and the database is Teradata (an RDBMS).
I am using a connection pool and a thread pool in the Java code and am testing with 90 connections.
Question is :
(1) What should be the limit on number of connections with database at a time?
(2) How many threads at a time are recommended?
Thanks,
Jyoti
90 seems like a lot. My recommendation is to benchmark this. Your criteria are unique, and you need to make sure you get the maximum throughput.
I would make the code configurable in how many concurrent connections it uses, and run it with 10 ... 100 connections, going up 10 at a time. This should not take long. When things start slowing down, you know you have exceeded the benefits of running concurrently.
Do it several times to make sure your results are predictable.
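A rough harness for that stepping experiment might look like the sketch below. The "work" is simulated with a sleep standing in for the stored-procedure call, so the printed timings only illustrate the shape of the test; in the real benchmark the task body would borrow a pooled connection and invoke the procedure:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Step the concurrency from 10 to 100 and measure the elapsed time for a
// fixed batch of simulated "procedure calls" at each level.
public class PoolStepBenchmark {
    static long runBatch(int concurrency, int tasks, long taskMillis) {
        ExecutorService exec = Executors.newFixedThreadPool(concurrency);
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            exec.submit(() -> {
                try {
                    Thread.sleep(taskMillis); // stand-in for the DB procedure
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        exec.shutdown();
        try {
            exec.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return (System.nanoTime() - start) / 1_000_000; // elapsed ms
    }

    public static void main(String[] args) {
        for (int conn = 10; conn <= 100; conn += 10) {
            System.out.println(conn + " connections: "
                    + runBatch(conn, 100, 5) + " ms");
        }
    }
}
```

Run the whole sweep several times; the level at which elapsed time stops improving marks the point of diminishing returns.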
Another concern is your statement that "the procedure is taking about 30 to 50 seconds to run". How much of this time is processing in Java, and how much is waiting for the database to process an SQL statement? Should both times really be added when determining the max number of connections you need?
Generally speaking, you should get a connection, use it, and close it as quickly as possible once your Java logic is done with it. If possible, avoid getting a connection, doing a bunch of Java-side processing, calling the database, doing more Java processing, and only then closing the connection; there is probably no need to hold the connection open that long. One consideration to keep in mind with this approach is which processing (including database access) needs to stay within a single transaction.
If for example, of the 50 seconds to run, only 1 second of database access is necessary, then you probably don't need such a high max number of connections.