Wrapping a JDBC Connection singleton in a ThreadLocal - java

I'm working on a small CRUD application that uses plain JDBC, with a Connection enum-based singleton. After reading the first part of Java Concurrency in Practice, I liked the ThreadLocal approach to writing thread-safe code. My question is:
Is wrapping a global JDBC connection in a ThreadLocal considered good practice?

Is wrapping a global JDBC connection in a ThreadLocal considered good practice?
It depends a lot on the particulars. If there are a large number of threads, each of them is going to open its own connection, which may be prohibitive. You will also end up with connections that stagnate as threads lie dormant.
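For context, the pattern being asked about looks roughly like this (a minimal sketch; the JDBC URL and credentials are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public final class ConnectionHolder {

        // Each thread lazily opens and caches its own Connection.
        private static final ThreadLocal<Connection> CONNECTION =
            ThreadLocal.withInitial(() -> {
                try {
                    return DriverManager.getConnection(
                        "jdbc:h2:mem:test", "sa", ""); // placeholder URL/credentials
                } catch (SQLException e) {
                    throw new IllegalStateException("Could not open connection", e);
                }
            });

        private ConnectionHolder() {}

        public static Connection get() {
            return CONNECTION.get();
        }
    }

Each thread gets its own connection, which is thread-safe, but nothing ever closes those connections unless every thread cleans up after itself: hence the stagnation problem above.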
It would be better to use a connection pool. Then you can reuse connections that are already open but not currently in use, while limiting the total number of connections to the minimum you need to work concurrently. Apache's DBCP is a good example and is well regarded.
To quote from their docs:
Creating a new connection for each user can be time consuming (often requiring multiple seconds of clock time), in order to perform a database transaction that might take milliseconds. Opening a connection per user can be unfeasible in a publicly-hosted Internet application where the number of simultaneous users can be very large. Accordingly, developers often wish to share a "pool" of open connections between all of the application's current users. The number of users actually performing a request at any given time is usually a very small percentage of the total number of active users, and during request processing is the only time that a database connection is required.
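For illustration, a minimal DBCP setup might look like this (a sketch assuming Commons DBCP 2; the JDBC URL, credentials, and pool sizes are placeholders to tune for your workload):

    import java.sql.Connection;
    import org.apache.commons.dbcp2.BasicDataSource;

    public class PoolExample {
        public static void main(String[] args) throws Exception {
            BasicDataSource ds = new BasicDataSource();
            ds.setUrl("jdbc:mysql://localhost:3306/mydb"); // placeholder URL
            ds.setUsername("user");
            ds.setPassword("secret");
            ds.setInitialSize(5);   // connections opened at startup
            ds.setMaxTotal(20);     // hard cap on concurrent connections

            // Borrow a connection; close() returns it to the pool, it does not disconnect.
            try (Connection c = ds.getConnection()) {
                // ... run statements ...
            }
        }
    }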

Related

HikariCP behavior when no connection is available

I noticed that even when the database is down, so no connection is actually available in the pool, HikariCP still waits for the connection timeout to expire before throwing an exception to the client.
I agree this is desirable when the database is available, but in my case I would like the pool not to wait before throwing an exception when no connection is available.
The reason is that the database itself answers in less than 2 ms, so I can handle thousands of transactions per second; but when no connection is available, the pool waits much longer (the minimum acceptable timeout recommended being 250 ms), so I can no longer sustain that throughput. On the other hand, my logic can work without the database for a period of time.
How should I manage this?
EDIT:
This link is almost what I want to achieve, except that I would prefer HikariCP to do this automatically rather than my having to activate the suspend state.
Perhaps you should introduce a counter somewhere in your application code, and if the number of concurrent requests exceeds that value, don't use the database. It's hard to tell without knowing what you are dealing with, e.g. reads vs. writes.
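A minimal sketch of that counter idea, using a Semaphore sized to the pool (the pool size of 10, the exception type, and the DataSource wiring are all assumptions to adapt):

    import java.sql.Connection;
    import java.util.concurrent.Semaphore;
    import javax.sql.DataSource;

    public class FailFastGate {
        private final Semaphore permits = new Semaphore(10); // match maximumPoolSize
        private final DataSource dataSource;

        public FailFastGate(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        public void withConnection(ConnectionTask task) throws Exception {
            // If more than poolSize requests are already in flight (e.g. because the
            // database is down and each is stuck waiting on connectionTimeout),
            // fail fast instead of queuing on the pool.
            if (!permits.tryAcquire()) {
                throw new IllegalStateException("No connection available, failing fast");
            }
            try (Connection c = dataSource.getConnection()) {
                task.run(c);
            } finally {
                permits.release();
            }
        }

        public interface ConnectionTask {
            void run(Connection c) throws Exception;
        }
    }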
As per brettwooldridge's comment regarding the connectionTimeout property, a lower timeout is unreliable due to thread scheduling, even when there are available connections:
We can certainly consider a lower floor, but 125ms would be the absolute minimum.
Both Windows and Linux have a default scheduler quantum of 20ms. If 16 threads are running on a 4-core CPU, a single thread may have to wait up to 80ms just to be allowed to run. If the pool has a vacancy due to, for example, the retirement of a connection at maxLifetime, this leaves precious little time to establish a connection to fill the slot without returning a spurious failure to the client.
If careful consideration is not taken to ensure the CPU and scheduler are not oversaturated, running at a 125ms timeout puts your application tier at risk of spurious failures even if the pool has available connections. For example, running 32 threads on a 4-core CPU can lead to thread starvation under load lasting as long as 120ms -- very close to the edge.
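In configuration terms, that means not pushing connectionTimeout below the documented floor. A minimal HikariCP setup sketch (the JDBC URL and pool size are placeholders):

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;

    public class HikariSetup {
        public static HikariDataSource create() {
            HikariConfig config = new HikariConfig();
            config.setJdbcUrl("jdbc:mysql://localhost:3306/mydb"); // placeholder URL
            config.setMaximumPoolSize(10);
            config.setConnectionTimeout(250); // ms; the floor discussed above
            return new HikariDataSource(config);
        }
    }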

How to enhance the performance of a Java application that makes many calls to the database?

As I understand it, the execution of Java programs is pretty fast; the things that slow an application down are mainly network and I/O operations.
For example, if I have a for loop running 10,000 times which opens a file, processes some data, and saves the data back into the file, and the application is slow, it is not because the loop executes 10,000 times but because of the file opening and closing within the loop.
I have an MVC application where before I view a page I go through a Controller which in turn calls Services, which finally calls some DAO methods.
The problem is that there are so many queries being fired before the page loads and hence the page load time is 2 mins, which is pathetic.
Since the service calls various DAO methods and each DAO method uses a different connection object, I thought of doing this: "Create only one DAO method that the Service would call and this DAO method would fire all queries on one Connection object."
So this would save the time of connecting and disconnecting to the database.
But the connection object in my application comes from a connection pool, and most connection pools don't close connections; they just return them to the pool. So my solution above would have no effect, as there is no real opening and closing of connections anyway.
How can I enhance the performance of my application?
First, you should accurately determine where the time is spent, using a profiler.
Once the root cause is known, you can see whether the operations can be optimized, i.e. whether unnecessary steps can be removed. If not, you can see whether the results of the operations can be cached and reused.
Without an accurate understanding of which processing is taking the time, it will be difficult to make any reasonable optimization.
If you reuse connection objects from the pool, this means that the connection/disconnection does not create any performance problem.
I agree with Ashwinee K Jha that a Profiler would give you some clear information of what you could optimize.
Meanwhile some other ideas/suggestions:
Could you maintain a cache of answers? I'd guess that not all of the 10,000 queries are distinct (see the cache sketch after this list).
Try tuning the number of Connection objects in the pool; there should be an optimal number.
Is your query execution already multi-threaded? If so, try tuning the number of threads. Generally, the number of cores is a good number of threads, BUT in the case of I/O a much larger number is optimal (the big cost is the I/O, not the CPU).
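As a sketch of the caching idea (the loader function stands in for whatever DAO call produces the answer):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    public class QueryCache<K, V> {
        private final Map<K, V> cache = new ConcurrentHashMap<>();
        private final Function<K, V> loader; // e.g. a DAO call

        public QueryCache(Function<K, V> loader) {
            this.loader = loader;
        }

        public V get(K key) {
            // Only distinct keys hit the database; repeats are served from memory.
            return cache.computeIfAbsent(key, loader);
        }
    }

A real implementation would also need eviction and invalidation when the underlying rows change; libraries like Caffeine or Ehcache handle that for you.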
Some suggestions:
Scale your database. Perhaps the database itself is just slow.
Use second-level caching or application session caching to potentially speed things up and reduce the need to query the database.
Change your queries, application, or schemas to reduce the number of calls made.
You can use Apache DBCP, which provides a connection pool. Database I/O is costly, but opening and closing connections usually takes the larger chunk of the time.
You can also increase maxIdle (the maximum number of connections that can remain idle in the pool).
You can also look into an in-memory data grid, e.g. Hazelcast.

Should I use connection pooling in a multi-threaded program?

I am using multiple threads to insert records into different tables. In addition, I am using batch processing for the insertion of records to improve efficiency.
Note: the number of records to be inserted is in the millions.
My question is should I use connection pooling in this multi-threaded environment?
My Concern:
Each thread is going to run for quite some time to perform its database operations. So if the size of my connection pool is 2 and the number of threads is 4, then at any given moment only 2 threads can run. Consequently, the other 2 threads will sit idle for a long time waiting for a connection, as the DB operations for millions of records are time consuming. Moreover, such connection pooling would defeat the purpose of using multiple threads.
Using a connection pool in a batch job is a matter of convenience. It will help you limit the number of open connections, deal with abandoned connections, close connections if you forget to close them, verify that a connection is still open, etc.
Check out the Plain Ol' Java example here
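As a sketch of how the batch-insert workers might use the pool (the table, column, and batch size are assumptions to adapt):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;
    import javax.sql.DataSource;

    public class BatchInserter implements Runnable {
        private static final int BATCH_SIZE = 1_000;
        private final DataSource dataSource; // backed by the pool
        private final List<String> rows;

        public BatchInserter(DataSource dataSource, List<String> rows) {
            this.dataSource = dataSource;
            this.rows = rows;
        }

        @Override
        public void run() {
            String sql = "INSERT INTO my_table (value) VALUES (?)"; // placeholder table
            try (Connection c = dataSource.getConnection();
                 PreparedStatement ps = c.prepareStatement(sql)) {
                c.setAutoCommit(false);
                int count = 0;
                for (String row : rows) {
                    ps.setString(1, row);
                    ps.addBatch();
                    if (++count % BATCH_SIZE == 0) {
                        ps.executeBatch(); // flush a full batch
                    }
                }
                ps.executeBatch(); // flush the remainder
                c.commit();
            } catch (SQLException e) {
                throw new RuntimeException(e);
            }
        }
    }

With 4 such workers and a pool of size 2, two workers will indeed wait; sizing the pool to match the worker count avoids that while still giving you the pool's housekeeping benefits.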

Connection pooling: is it appropriate

I'm building a Java based web app (primarily JSPs deployed with Tomcat). The number of users will never be more than ~30 people. It is a work log, so the users will constantly be updating/accessing the database (SQL Server). There are many methods in the web app that require a connection to the database.
I open a new connection each time one is required (I also close it appropriately), but this seems like a lot of opening/closing of connections. Does connection pooling apply to this situation? I'm trying to understand the role of the pool, but I'm confused; would there be a connection pool for each user?
If I'm way off track (which I have a suspicion I am), then is there a better solution to this problem? Is there even a problem?
Thanks!
Yes, I'd size the pool for 30 connections and let it manage them. You'll amortize the cost of opening connections over all requests that way.
There's one pool that many users would access to get connections; one connection per request.
The connection pool is for the application (not per user). The concept of a connection pool is to reuse open connections as much as possible and open a new one only when absolutely needed. Opening a connection to a database is an expensive operation in terms of both CPU cycles and memory; that's why a connection pool is needed. For 30 users, I would recommend using a connection pool.
You can size the pool anywhere between 15 to 30 connections in the pool.
Take a look at http://commons.apache.org/dbcp/
You certainly could pool connections to the database. Generally you'd use one pool per DB (though there could be reasons that you'd have more).
You're right to ask whether there's even a problem. Connection pooling is going to reduce the number of new connections that have to be negotiated, so it will reduce the time it takes to service a request and also reduce load on the servers. Also it will reduce the number of sockets used, which (for larger applications) can be a factor in system performance.
However: do you have a performance problem that you're trying to solve? Are response times acceptable? Is load acceptable? Balance what you'd gain in perf versus development cost. Pre-built connection pools exist, so it's likely easy to integrate one. But it's not free and optimization should generally be done with specific goals, not "because I should".
The point here is less the number of users, but the number for requests that require opening a connection.
If you have
    for (int i = 0; i < 1000; i++) {
        Connection c = getConnection();
        doSomethingWith(c);
        c.close();
    }
You will still benefit from the connection pool, as c.close() does not really close the connection; it just returns it to the pool. (In modern code you would typically use try-with-resources, which calls close() for you.)

Optimal number of connections in connection pool

Currently we are using a 4-CPU Windows box with 8 GB RAM, with MySQL 5.x installed on the same box. We are using the WebLogic application server for our application. We are targeting 200 concurrent users (obviously not all on the same module/screen). So what is the optimal number of connections (min and max) to configure in the connection pool? (We are using WebLogic AS's connection pooling mechanism.)
Did you really mean 200 concurrent users or just 200 logged in users? In most cases, a browser user is not going to be able to do more than 1 page request per second. So, 200 users translates into 200 transactions per second. That is a pretty high number for most applications.
Regardless, as an example, let's go with 200 transactions per second. Say each front-end (browser) transaction takes 0.5 seconds to complete, and of the 0.5 seconds, 0.25 is spent in the database. Then you would need 0.5 * 200 = 100 threads in the WebLogic thread pool and 0.25 * 200 = 50 connections in the DB connection pool.
To be safe, I would set the max thread pool sizes to at least 25% larger than you expect to allow for spikes in load. The minimums can be a small fraction of the max, but the tradeoff is that it could take longer for some users because a new connection would have to be created. In this case, 50 - 100 connections is not that many for a DB so that's probably a good starting number.
Note, that to figure out what your average transaction response times are, along with your average DB query time, you are going to have to do a performance test because your times at load are probably not going to be the times you see with a single user.
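The arithmetic above is essentially Little's Law: concurrent connections ≈ arrival rate × time each request holds a connection. A small sketch with the example figures (the numbers are the ones assumed above, plus the 25% headroom):

    public class PoolSizing {
        public static void main(String[] args) {
            double requestsPerSecond = 200.0;  // measured or targeted load
            double dbSecondsPerRequest = 0.25; // measured time a request holds a connection
            double headroom = 1.25;            // ~25% margin for spikes

            double connections = requestsPerSecond * dbSecondsPerRequest * headroom;
            System.out.printf("Suggested pool size: %.0f%n", Math.ceil(connections));
            // 200 * 0.25 * 1.25 = 62.5 -> 63 connections
        }
    }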
There is a very simple answer to this question:
The number of connections in the connection pool should equal the number of execute threads configured in WebLogic.
The rationale is simple: if the number of connections is less than the number of threads, some of the threads may be left waiting for a connection, making the connection pool a bottleneck. So it should equal at least the number of execute threads (the thread pool size).
Sizing a connection pool is not a trivial thing to do. You basically need:
metrics to investigate the connection usage
failover mechanisms for when there is no connection available
FlexyPool aims to aid you in figuring out the right connection pool size.
You should profile the different expected workflows to find out. Ideally, your connection pool will also dynamically adjust the number of live connections based on recent usage, as it's pretty common for load to be a function of the current time of day in your target geographical area.
Start with a small number and try to reach a reasonable number of concurrent users, then crank it up. I think it's likely that you'll find that your connection pooling mechanism is not nearly as instrumental in your scalability as the rest of the software.
The connection pool should be able to grow and shrink based on actual needs. Log the numbers needed to analyze the running system, either through logging statements or through JMX surveillance. Consider setting up alerts for scenarios like "peak detected: more than X new connections had to be allocated in Y seconds" or "a connection was out of the pool for more than X seconds", which will let you attend to performance issues before they become real problems.
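As an example of the kind of numbers worth logging, here is a sketch that polls a HikariCP pool's MXBean (assuming a HikariCP pool; with WebLogic's own pool you would read the equivalent JMX attributes instead):

    import com.zaxxer.hikari.HikariDataSource;
    import com.zaxxer.hikari.HikariPoolMXBean;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class PoolMonitor {
        public static void start(HikariDataSource ds) {
            HikariPoolMXBean pool = ds.getHikariPoolMXBean();
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            // Poll every 10 seconds; a real monitor would also handle shutdown.
            scheduler.scheduleAtFixedRate(() -> {
                int waiting = pool.getThreadsAwaitingConnection();
                System.out.printf("active=%d idle=%d waiting=%d total=%d%n",
                    pool.getActiveConnections(), pool.getIdleConnections(),
                    waiting, pool.getTotalConnections());
                if (waiting > 0) {
                    System.out.println("ALERT: threads are waiting for connections");
                }
            }, 0, 10, TimeUnit.SECONDS);
        }
    }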
It's difficult to get hard data for this. It's also dependent on a number of factors you don't mention:
200 concurrent users, but how much of their activity will generate database queries? 10 queries per page load? 1 query just on login? etc.
The size of the queries and of the DB, obviously. Some queries run in milliseconds, some in minutes.
You can monitor MySQL and watch the currently active queries with SHOW PROCESSLIST. This will give you a better sense of how much activity is actually going on in the DB under peak load.
This is something that needs to be tested and determined on an individual basis - it's pretty much impossible to give an accurate answer for your circumstances without intimately being familiar with them.
Based on my experience with high-transaction financial systems: if you want to handle, for example, 1K requests per second and you have 32 CPUs, you need roughly 1000/32 connections open to your database.
Here is my formula:
RPS / CPU_COUNT
For the example above, that works out to about 31 connections. In most cases your database engine will be able to handle your requests even with far fewer connections, but your requests will sit waiting for a connection if the number is too low.
I think it's important to mention that your database must actually be able to handle those transactions (this depends on your disk speed, database configuration, and server power).
Good luck.
