Overhead on using DB connection pool

I have a bunch of Java programs that are started by a script every few minutes and terminate in less than a minute. Most of them are single-threaded, and to access the MySQL DB I use:
DriverManager.getConnection()
They just need to connect once, and execute a query.
Now I'm adding a new program to this group which is multi-threaded, and all its threads need to access the DB concurrently. I'm thinking of using a DB connection pool (c3p0) for this.
My question is: since all these programs share a common DAO for accessing the DB, is there an overhead to using a DB connection pool for the single-threaded programs, even though they need just one connection?
I'm planning to set the initial pool size to 1, the min pool size to 1 and the max pool size to 10.

The main goal of a connection pool is to keep some ready-to-use connections available, rather than opening and closing a connection each time you need one. This saves a significant amount of time if the DB is used often.
Apache DBCP is single-threaded, but even so it significantly increases performance if your application uses DB connections very often.
c3p0 is a good choice, but before choosing a connection pool please check this discussion: Connection pooling options with JDBC: DBCP vs C3P0
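
To make the reuse idea concrete, here is a minimal sketch of a pool, not how c3p0 or DBCP are actually implemented. TinyPool and PoolReuseDemo are hypothetical names, and a plain Object stands in for a database connection. It suggests that a single-threaded program which borrows and returns over and over only ever creates one resource, even with a max size of 10:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Toy pool (hypothetical): reuses idle resources and creates new ones only up to maxSize. */
class TinyPool<T> {
    private final BlockingQueue<T> idle;
    private final Supplier<T> factory;
    private final int maxSize;
    private final AtomicInteger created = new AtomicInteger();

    TinyPool(int maxSize, Supplier<T> factory) {
        this.maxSize = maxSize;
        this.factory = factory;
        this.idle = new ArrayBlockingQueue<>(maxSize);
    }

    T borrow() {
        T resource = idle.poll();              // reuse an idle resource if there is one
        if (resource != null) return resource;
        while (true) {
            int n = created.get();
            if (n >= maxSize) {                // pool is full: wait until something is returned
                try {
                    return idle.take();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
            // reserve a slot first, then create the (expensive) resource
            if (created.compareAndSet(n, n + 1)) return factory.get();
        }
    }

    void giveBack(T resource) { idle.offer(resource); }

    int createdCount() { return created.get(); }
}

class PoolReuseDemo {
    /** Borrows and returns on one thread; reports how many resources had to be created. */
    static int createdAfter(int cycles) {
        TinyPool<Object> pool = new TinyPool<>(10, Object::new); // Object stands in for a connection
        for (int i = 0; i < cycles; i++) {
            Object conn = pool.borrow();
            pool.giveBack(conn);
        }
        return pool.createdCount();
    }

    public static void main(String[] args) {
        System.out.println(createdAfter(1000)); // prints 1
    }
}
```

A real pool also validates, ages out and replaces connections, so its startup cost in a short-lived process is best measured rather than inferred from a sketch like this.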

Related

Should I close my database connections?

I was wondering: Should I close my database connection or leave it open in the following scenario:
My application updates a table in the database every 1-2 seconds. This continues until the application is terminated.
Basically, which would be more optimal and put less stress on the server? Every time this is executed, about 500 rows need to be updated, with at least 11 fields each (at least 5500 fields combined).
I'm currently using the JDBC driver if it matters at all.
EDIT: Also, would it be more efficient to update only certain rows, or to erase the entire table contents and rewrite the updated data? (Some rows may be completely different in the updated data.)

You should use a connection pool for this. Check this answer about connection pooling outside an application server.

You have to consider dropped connections here as well as stress on the server. You would be better off using a connection pool to manage your connections; then you don't have this worry.

Try out HikariCP for connection pooling. Disclaimer: I am one of the authors.

when to use a db connection pool

I am working on a Java GUI and I often need to connect to the database, but I would like to write the connection statements only once, instead of writing the whole thing every time I use it.
I guess I would probably connect around 20 times in my whole system. So I wanted to know: in which situations is it best to use a connection pool?

Typically a connection pool is used when there are multiple threads requiring access to the database at the same time (a web application for example), each would retrieve a connection from the pool and return it when it's finished executing.
Typically GUI applications wouldn't require the amount of concurrent DB access that warrants a connection pool, and a single (static) connection that is initialised when the application starts would normally suffice.
I hope this points you in the right direction; without knowing more about the nature of the application that you're creating it's difficult to make a more informed decision!

java connection pool, how many max connections in a multithreaded batch?

I have a Java batch which does a select with a large result set (I process the elements using a Spring callback handler).
The callback handler puts a task in a fixed thread pool to process the row.
My pool size is fixed at 16 threads.
The result set contains about 100k elements.
All db access code is handled through a JdbcTemplate or through Hibernate/Spring, no manual connection management is present.
I have tried with Atomikos and with Commons DBCP as connection pool.
Now, I would think that 17 max connections in my connection pool would be enough to get this batch to finish: one for the select and 16 for the threads in the thread pool, which update some rows. However, that seems to be too naive, as I have to specify a max pool size an order of magnitude larger (I haven't tried to find an exact value). First I tried 50, which worked on my local Windows machine but doesn't seem to be enough on our Unix test environment. There I have to specify 128 to make it work (again, I didn't even try a value between 50 and 128; I went straight to 128).
Is this normal? Is there some fundamental mechanism in connection pooling I'm missing? I find it hard to debug this as I don't know how to see what happens with the opened connections. I tried various log4j settings but didn't get any satisfactory result.
edit, additional info: when the connection pool size seems to be too low, the batch seems to hang. If I do a jstat on the process I can see all threads are waiting for a new connection. At first I didn't specify the maxWait property on the DBCP connection pool, which causes threads to wait indefinitely for a new connection, and I noticed the batch kept hanging. So no connections were released. However, that only happened after processing about 70k rows, which dismissed my initial hunch of a connection leak.
edit2: I forgot to mention I already rewrote the update part in my tasks. I queue my updates in a ConcurrentLinkedQueue and flush it whenever it reaches 1000 elements, so I actually only do about 100 updates.
edit3: I'm using Oracle and the concurrent utils, so I have an executor configured with a fixed pool size of 16, and I submit my tasks to this executor. I don't use connections manually in my tasks; I use JdbcTemplate, which is thread-safe and obtains its connections from the connection pool. I suppose Spring/DBCP handles the connection/thread issue.

If you are using Linux, you can try MySQL Administrator to monitor your connection status graphically, provided you are using MySQL.
Irrespective of that, even 100 connections is not uncommon for large enterprise applications handling a few thousand requests per minute.
But if the request volume is low, or each request doesn't need its own transaction, then I would recommend tuning the operations inside your threads.
That is, how are you distributing the 100k elements to the 16 threads?
If you try to acquire a connection every time you read a row from the shared location (or buffer), then it is expected to take time.
See whether this helps.
1. getConnection
2. for each element until the buffer size becomes zero
3. process it.
4. if you need to update,
   a. open a transaction
   b. update
   c. commit/rollback the transaction
5. go to step 2
6. release the connection
You can synchronize the buffer by using java.util.concurrent collections.
Don't use one Runnable/Callable for each element. This will degrade performance.
Also, how are you creating threads? Use Executors to run your Runnable/Callable. Also remember that DB connections are NOT expected to be shared across threads, so use 1 connection in 1 thread at a time.
For example, create an Executor and submit 16 Runnables, each having its own connection.
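
The steps above could be sketched roughly as follows. This is an illustration under assumptions, not the asker's actual code: WorkerPerConnectionDemo is a hypothetical name, the connection is only simulated with a counter so the example runs without a database, and "processing" an element just counts it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class WorkerPerConnectionDemo {
    /** Returns { elements processed, connections opened } for the given workload. */
    static int[] run(int elements, int workers) {
        Queue<Integer> buffer = new ConcurrentLinkedQueue<>();   // shared, thread-safe buffer
        for (int i = 0; i < elements; i++) buffer.add(i);

        AtomicInteger processed = new AtomicInteger();
        AtomicInteger connectionsOpened = new AtomicInteger();
        ExecutorService executor = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                // one long-running task per worker, NOT one task per element
                futures.add(executor.submit(() -> {
                    connectionsOpened.incrementAndGet();      // step 1: getConnection (simulated)
                    try {
                        while (buffer.poll() != null) {       // steps 2-5: drain the buffer
                            // a real worker would process the row and, if needed,
                            // open a transaction, update, then commit or roll back
                            processed.incrementAndGet();
                        }
                    } finally {
                        // step 6: release the connection (nothing to do in this simulation)
                    }
                }));
            }
            for (Future<?> f : futures) f.get();              // wait for all workers to finish
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
        return new int[] { processed.get(), connectionsOpened.get() };
    }

    public static void main(String[] args) {
        int[] result = run(100_000, 16);
        System.out.println(result[0] + " elements, " + result[1] + " connections");
        // prints "100000 elements, 16 connections"
    }
}
```

With this shape the number of connections in use is bounded by the number of workers, independent of how many elements are in the buffer.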

I switched to c3p0 instead of DBCP. In c3p0 you can specify a number of helper threads. I notice that if I set that number as high as the number of threads I'm using, the number of connections stays really low (using the handy JMX bean of c3p0 to inspect the active connections). Also, I have several dependencies, each with its own entity manager. Apparently a new connection is needed for each entity manager, so I have about 4 entity managers per thread, which would explain the high number of connections. I think my tasks are all so short-lived that DBCP couldn't keep up with closing/releasing connections. Since c3p0 works more asynchronously and lets you specify the number of helper threads, it is able to release my connections in time.
edit: but the batch keeps hanging when deployed to the test environment. All threads are blocking when releasing the connection; the lock is on the pool. Just the same as with DBCP :(
edit: all my problems disappeared when I switched to BoneCP, and I got a huge performance increase as a bonus too.

Connection Pooling - How much of an overhead is it?

I am running a webapp inside WebSphere Application Server 6.1. This webapp has a rules kind of engine, where every rule obtains its very own connection from the WebSphere data source pool. So, I see that when a use case is run, for 100 records of input, about 400-800 connections are obtained from the pool and released back to the pool. I have a feeling that if this engine goes to production, it might take too much time to complete processing.
Is it bad practice to obtain connections from the pool that frequently? What are the overhead costs involved in obtaining connections from the pool? My guess is that the costs involved should be minimal, as a pool is nothing but a resource cache. Please correct me if I am wrong.

Connection pooling keeps your connection alive in anticipation: if another user connects, the ready connection to the DB is handed over and the database does not have to open a connection all over again.
This is actually a good idea because opening a connection is not just a one-go thing. There are many trips to the server (authentication, retrieval, status, etc.). So if you've got a connection pool on your website, you're serving your customers faster.
Unless nobody visits your website, you can't afford not to have a connection pool working for you.

The pool doesn't seem to be your problem. The real problem lies in the fact that your "rules engine" doesn't release connections back to the pool before completing the entire calculation. The engine doesn't scale well, so it seems. If the number of database connections somehow depends on the number of records being processed, something is almost always very wrong!
If you manage to get your engine to release connections as soon as possible, it may be that you only need a few connections instead of a few hundred. Failing that, you could use a connection wrapper that re-uses the same connection every time the rules engine asks for one, although that somewhat negates the benefits of having a connection pool.
It also introduces many multithreading and transaction isolation issues. If the connections are read-only, though, it might be an option.

A connection pool is all about connection re-use.
If you are holding on to a connection at times when you don't need one, then you are preventing that connection from being re-used somewhere else. And if you have a lot of threads doing this, then you must also run with a larger pool of connections to prevent pool exhaustion. More connections take longer to create and establish, and they take more resources to maintain; there will be more reconnecting as the connections grow old, and your database server will also be impacted by the greater number of connections.
In other words: you want to run with the smallest possible pool without exhausting it. And the way to do that is to hold on to your connections as little as possible.
I have implemented a JDBC connection pool myself and, although many pool implementations out there probably could be faster, you are likely not going to notice because any slack going on in the pool is most likely dwarfed by the time it takes to execute queries on your database.
In short: connection pools just love it when you return their connections. Or they should anyway.

To really check whether your pool is a bottleneck, you should profile your program. If you find the pool is a problem, then you have a tuning problem. A simple pool should be able to handle 100K allocations per second or more, or about 10 microseconds each. However, as soon as you use a connection, it will take between 200 and 2,000 microseconds to do something useful.
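
As a back-of-the-envelope check, the figures quoted above can be combined with the upper figure of 800 checkouts from the question. This is only arithmetic on those rough numbers, not a measurement, and PoolOverheadEstimate is a hypothetical name:

```java
class PoolOverheadEstimate {
    /** Total time in microseconds for a number of operations at a fixed per-operation cost. */
    static long totalMicros(long operations, long microsPerOperation) {
        return operations * microsPerOperation;
    }

    public static void main(String[] args) {
        long checkouts = 800;                                   // upper figure from the question
        long poolMicros = totalMicros(checkouts, 10);           // about 10 microseconds per checkout
        long fastWorkMicros = totalMicros(checkouts, 200);      // 200 microseconds of useful work each
        long slowWorkMicros = totalMicros(checkouts, 2_000);    // 2,000 microseconds of useful work each
        System.out.println("pool checkouts: " + poolMicros / 1_000 + " ms");
        System.out.println("database work:  " + fastWorkMicros / 1_000
                + " to " + slowWorkMicros / 1_000 + " ms");
    }
}
```

Under these assumptions the checkouts add up to about 8 ms, against 160 to 1600 ms spent actually using the connections, which is why profiling tends to point at the queries rather than the pool.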

I think this is a poor design. Sounds like a Rete rules engine run amok.
If you assume 0.5-1.0 MB minimum per thread (e.g. for stack, etc.) you'll be thrashing a lot of memory. Checking the connections in and out of the pool will be the least of your problems.
The best way to know is to do a performance test and measure memory, wall times for each operation, etc. But this doesn't sound like it'll end well.
Sometimes I see people throw all their rules into Blaze or ILOG or JRules or Drools simply because it's "standard" and high tech. It's a terrific resume item, but how many of those solutions would be better served by a simpler table-driven decision tree? Maybe your problem is one of those.
I'd recommend that you get some data, see if there's a problem, and be prepared to redesign if the data tells you it's necessary.

Could you provide more details on what your rules engine does exactly? If each rule "firing" is performing data updates, you may want to verify that the connection is being properly released (put the release in the finally block of your code to ensure that the connections are really being released).
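
A minimal sketch of the finally-block pattern, assuming a Semaphore as a stand-in for the pool (each permit represents one pooled connection) so that it runs without WebSphere or a database. ReleaseInFinallyDemo and fireRule are hypothetical names:

```java
import java.util.concurrent.Semaphore;

class ReleaseInFinallyDemo {
    /** Checks a "connection" out, runs the rule, and always gives the connection back. */
    static void fireRule(Semaphore pool, Runnable rule) {
        pool.acquireUninterruptibly();   // check a connection out of the pool
        try {
            rule.run();                  // the rule's database work would go here
        } finally {
            pool.release();              // runs even when the rule throws
        }
    }

    /** Fires a rule that fails and reports how many connections are available afterwards. */
    static int permitsAfterFailingRule() {
        Semaphore pool = new Semaphore(2);   // simulated pool of 2 connections
        try {
            fireRule(pool, () -> { throw new IllegalStateException("rule failed"); });
        } catch (IllegalStateException expected) {
            // the failure itself is not the point; the release is
        }
        return pool.availablePermits();
    }

    public static void main(String[] args) {
        System.out.println(permitsAfterFailingRule()); // prints 2
    }
}
```

Without the finally block, every failing rule would leave one permit checked out, and the pool would eventually run dry.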
If possible, you may want to consider capturing your data updates to a memory buffer, and write to the database only at the end of the rule session/invocation.
If the database operations are read-only, consider caching the information.
As bad as you think 400-800 connections being created and released to the pool is, I suspect it'll be much much worse if you have to create and close 400-800 unpooled connections.

Reusing a connection while polling a database in JDBC?

I have a use case in which I need to keep a connection to a database open in order to execute queries periodically.
Is it advisable to close the connection after executing the query and then reopen it after the polling interval (10 minutes)? I would guess not, since opening a connection to a database is expensive.
Is connection pooling the alternative, so that I can keep reusing the connections?

You should use connection pooling. Write your application code to request a connection from the pool, use the connection, then return the connection back to the pool. This keeps your code clean. Then you rely on the pool implementation to determine the most efficient way to manage the connections (for example, keeping them open vs closing them).
Generally it is "expensive" to open a connection, typically due to the overhead of setting up a TCP/IP connection, authentication, etc. However, it can also be expensive to keep a connection open "too long", because the database (probably) has reserved resources (like memory) for use by the connection. So keeping a connection open can tie-up those resources.
You don't want to pollute your application code managing these types of efficiency trade-offs, so use a connection pool.

Yes, connection pooling is the alternative. Open the connection each time (as far as your code is concerned) and close it as quickly as you can. The connection pool will handle the physical connection in an appropriately efficient manner (including any keepalives required, occasional "liveness" tests etc).
I don't know what the current state of the art is, but I used c3p0 very successfully for my last Java project involving JDBC (quite a while ago).

The answer here really depends on the application. If there are other connections being used simultaneously for the same database from the same application, then a pool is definitely your answer.
If all your application does is query the db, wait 10 minutes, then query again, then simply connect and reconnect. A connection is considered to be an expensive operation, but all things are relative. It is not expensive if you do it only once every 10 minutes. If the application is this simple, don't introduce unnecessary complexity.
NOTE:
OK, complexity is also relative, so if you are already using something like Spring and already know how to use its pooling mechanism, then apply it for this case. If this is not true, keep it simple.

Connection pooling would be an option for you. You can then leave your code as it is, including opening and closing connections; the connection pool will take care of the connections. If you close a connection obtained from a pool, it is not actually closed but just made available in the pool again. If you then open a connection and there is an open connection in the pool, the pool will return that one. In an application server you can use the built-in connection pools. For simple Java applications, most JDBC drivers also include a pool driver.
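
The "close does not really close" behaviour can be illustrated with a small wrapper. This is a single-threaded sketch under assumptions, not the mechanism of any particular pool library: PhysicalConnection and PooledHandle are hypothetical stand-ins, and no real JDBC connection is involved.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class CloseReturnsToPoolDemo {
    /** Hypothetical stand-in for a physical database connection. */
    static final class PhysicalConnection {
        boolean open = true;
    }

    /** What the application sees: close() hands the physical connection back to the pool. */
    static final class PooledHandle implements AutoCloseable {
        private final PhysicalConnection physical;
        private final Deque<PhysicalConnection> idle;

        PooledHandle(PhysicalConnection physical, Deque<PhysicalConnection> idle) {
            this.physical = physical;
            this.idle = idle;
        }

        PhysicalConnection physical() { return physical; }

        @Override
        public void close() {
            idle.push(physical);   // made available again; physical.open stays true
        }
    }

    /** "Opening" a connection takes an idle one if available, else creates a new one. */
    static PooledHandle open(Deque<PhysicalConnection> idle) {
        PhysicalConnection physical = idle.isEmpty() ? new PhysicalConnection() : idle.pop();
        return new PooledHandle(physical, idle);
    }

    /** Opens, closes, and opens again; true if the same open physical connection came back. */
    static boolean secondOpenReusesFirst() {
        Deque<PhysicalConnection> idle = new ArrayDeque<>();   // not thread-safe: sketch only
        PooledHandle a = open(idle);
        PhysicalConnection first = a.physical();
        a.close();
        PooledHandle b = open(idle);
        boolean reused = b.physical() == first;
        b.close();
        return reused && first.open;
    }

    public static void main(String[] args) {
        System.out.println(secondOpenReusesFirst()); // prints true
    }
}
```

The application code keeps its open/close shape, while the expensive physical connection survives across calls.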

There are many, many tradeoffs in opening and closing connections, keeping them open, making sure that connections that have been "kept alive" are still "valid" when you start to use them again, invalidating connections that get corrupted, etc. These kinds of complex tradeoffs make it difficult (but certainly not impossible) to implement the "best" connection management strategy for your specific case. The "safest" method is to open a connection, use it, and then close it. But, as you already realize, that is not at all the most efficient method. If you manage your own connections, then as you do things to make your strategy more efficient, the complexity will rise very quickly (especially in the presence of any less-than-perfect JDBC drivers, of which there are many.)
There are many connection pooling libraries available out there that can take care of all of this for you in extremely configurable ways (they almost always come pre-configured out-of-the-box for the most typical cases, and until you get up to the point that you're doing high-load activities, you probably don't have to worry about all that configurability - but you will be glad to have it if you scale up!) As is always the case, the libraries themselves may be of variable quality.
I have successfully used both C3P0 and Apache DBCP. If I were choosing again today, I would probably go with DBCP.
