java connection pool, how many max connections in a multithreaded batch?

I have a Java batch job that runs a select with a large result set (I process the rows using a Spring callback handler).
The callback handler puts a task in a fixed thread pool to process each row.
The thread pool is fixed at 16 threads.
The result set contains about 100k elements.
All db access code is handled through a JdbcTemplate or through Hibernate/Spring, no manual connection management is present.
I have tried with Atomikos and with Commons DBCP as connection pool.
Now, I would think that 17 max connections in my connection pool would be enough to get this batch to finish: one for the select and 16 for the threads in the thread pool, which update some rows. However, that seems to be too naive, as I have to specify a max pool size an order of magnitude larger (I haven't tried to find the exact value). First I tried 50, which worked on my local Windows machine but doesn't seem to be enough on our Unix test environment. There I have to specify 128 to make it work (again, I didn't try any value between 50 and 128; I went straight to 128).
Is this normal? Is there some fundamental mechanism in connection pooling I'm missing? I find it hard to debug this as I don't know how to see what happens with the opened connections. I tried various log4j settings but didn't get any satisfactory result.
edit, additional info: when the connection pool size is too low, the batch seems to hang. If I take a thread dump of the process (jstack), I can see that all threads are waiting for a new connection. At first I didn't specify the maxWait property on the DBCP connection pool, which causes threads to wait indefinitely for a new connection, and I noticed the batch kept hanging, so no connections were being released. However, that only happened after processing roughly 70k rows, which made me dismiss my initial hunch of a connection leak.
edit2: I forgot to mention I already rewrote the update part in my tasks. I queue my updates in a ConcurrentLinkedQueue and flush it every 1000 elements, so I actually only do about 100 updates.
edit3: I'm using Oracle and I am using the concurrent utils, so I have an executor configured with a fixed pool size of 16. I submit my tasks to this executor. I don't use connections manually in my tasks; I use JdbcTemplate, which is thread-safe and obtains its connections from the connection pool. I suppose Spring/DBCP handles the connection/thread issue.

If you are on Linux and your database is MySQL, you can try MySQL Administrator to monitor your connection status graphically.
Irrespective of that, even 100 connections is not uncommon for large enterprise applications handling a few thousand requests per minute.
But if the request volume is low, or each request doesn't need its own transaction, then I would recommend tuning the work you do inside the threads.
That is, how are you distributing the 100k elements to 16 threads?
If you try to acquire a connection every time you read a row from the shared location (or buffer), then it is expected to take time.
See whether this helps.
1. getConnection
2. for each element until the buffer size becomes zero
3.   process it.
4.   if you need to update,
5.     open a transaction
6.     update
7.     commit/rollback the transaction
8.   go to step 2
9. release the connection
You can synchronize the buffer by using the java.util.concurrent collections.
Don't use one Runnable/Callable for each element. This will degrade performance.
Also, how are you creating threads? Use Executors to run your Runnable/Callable. Also remember that DB connections are NOT expected to be shared across threads, so use one connection in one thread at a time.
For example, create an Executor and submit 16 Runnables, each having its own connection.
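
The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the class name BatchWorkers and the RowProcessor callback are made up for the example, the buffer is assumed to be completely filled before the workers start (an empty poll() ends a worker), and each row is committed in its own transaction.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.sql.DataSource;

public class BatchWorkers {

    /** Hypothetical per-row work; a real task would run its updates on the given connection. */
    public interface RowProcessor<T> {
        void process(Connection connection, T row) throws SQLException;
    }

    /**
     * Drains the buffer with a fixed number of workers. Each worker borrows ONE
     * connection, keeps it for its whole run and returns it when the buffer is empty.
     * Assumes the buffer is fully populated before this method is called.
     * Returns the total number of rows processed.
     */
    public static <T> int drain(DataSource dataSource, BlockingQueue<T> buffer,
                                int workers, RowProcessor<T> processor) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < workers; i++) {
                results.add(executor.submit(() -> {
                    int handled = 0;
                    // try-with-resources returns the connection to the pool even on failure
                    try (Connection connection = dataSource.getConnection()) {
                        connection.setAutoCommit(false); // explicit transactions
                        T row;
                        while ((row = buffer.poll()) != null) {
                            try {
                                processor.process(connection, row);
                                connection.commit();
                                handled++;
                            } catch (SQLException e) {
                                connection.rollback();
                                throw e;
                            }
                        }
                    }
                    return handled;
                }));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // rethrows a worker failure as ExecutionException
            }
            return total;
        } finally {
            executor.shutdown();
        }
    }
}
```

With this shape the pool never needs more connections than there are workers (plus whatever the reading side holds), regardless of how many rows are in the buffer.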

I switched from DBCP to c3p0. In c3p0 you can specify a number of helper threads. I noticed that if I set that number as high as the number of worker threads I'm using, the number of connections stays really low (I used c3p0's handy JMX bean to inspect the active connections). Also, I have several dependencies, each with its own entity manager. Apparently a new connection is needed for each entity manager, so I have about 4 entity managers per thread, which would explain the high number of connections. I think my tasks are all so short-lived that DBCP couldn't keep up with closing/releasing connections; since c3p0 works more asynchronously and lets you specify the number of helper threads, it is able to release my connections in time.
edit: but the batch keeps hanging when deployed to the test environment. All threads block when releasing the connection, and the lock is on the pool. Just the same as with DBCP :(
edit: all my problems disappeared when I switched to BoneCP, and I got a huge performance increase as a bonus too.
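
To put rough numbers on the entity manager observation, here is a small sizing sketch. It assumes that every worker thread can hold several connections at the same time (about 4 here, per the estimate above) and that one extra connection stays busy with the long-running select. The class and method names are made up; the first formula is the usual pigeonhole bound for avoiding pool deadlock, not something taken from DBCP, c3p0 or BoneCP.

```java
public class PoolSizing {

    /**
     * Smallest pool that cannot deadlock when each of `threads` workers may hold up to
     * `connectionsPerThread` connections at once: with this many connections, at least
     * one worker can always obtain its last connection, finish, and release them all.
     */
    public static int deadlockFreeMinimum(int threads, int connectionsPerThread) {
        return threads * (connectionsPerThread - 1) + 1;
    }

    /** Pool size at which no worker ever has to wait for a connection. */
    public static int noWaitSize(int threads, int connectionsPerThread) {
        return threads * connectionsPerThread;
    }

    public static void main(String[] args) {
        int workers = 16;          // fixed thread pool size from the question
        int perTask = 4;           // estimated entity managers (connections) per thread
        int longRunningSelect = 1; // connection held open by the driving query

        System.out.println(deadlockFreeMinimum(workers, perTask) + longRunningSelect); // prints 50
        System.out.println(noWaitSize(workers, perTask) + longRunningSelect);          // prints 65
    }
}
```

These figures are only as good as the 4-per-thread estimate. If some code path holds more connections at once than assumed, the required pool size grows accordingly.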

Related

Weblogic 12c data source high count connection

We have deployed an application containing heavy business logic that my company uses. After some time, performance became noticeably slower than before. In the WebLogic data source configuration we set the maximum number of connections to only 100, but recently the connection count keeps increasing until it reaches that limit.
We reconfigured the data source to 200, but the count keeps increasing. This is not ideal, because 100 is the maximum number of connections we want to deploy with.
Meanwhile, there were some stuck threads in the server too, but I don't think that is the problem. Does anyone know why this is occurring so suddenly? (After rolling out a newer yet stable version, they said.)

From the attached screenshot I can see that the Active Connection Count is ~80. I can also see that connections are being leaked.
Enable Inactive Connection Timeout by setting it to some value (based on the average time taken to execute a statement).
Make sure that every JDBC connection is closed in your code after use.
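
One way to make sure a connection is always closed is to scope it with try-with-resources (Java 7+). The helper below is a minimal sketch; ConnectionScope and SqlWork are hypothetical names used only for illustration, not part of WebLogic or JDBC.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

public class ConnectionScope {

    /** Hypothetical callback type for a piece of work that needs a connection. */
    public interface SqlWork<T> {
        T run(Connection connection) throws SQLException;
    }

    /**
     * Borrows a connection, runs the work and closes the connection afterwards,
     * whether the work returns normally or throws. For a pooled data source,
     * close() hands the connection back to the pool.
     */
    public static <T> T withConnection(DataSource dataSource, SqlWork<T> work) throws SQLException {
        try (Connection connection = dataSource.getConnection()) {
            return work.run(connection);
        }
    }
}
```

Statements and result sets opened inside the callback should be closed the same way, each in its own try-with-resources block.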

Guideline to Configure max threads in Play framework

We use Play Framework 1.x.
We haven't touched the thread pool size and use the default value, which is (number of processors + 1). Our production server has a 4-core processor, so I assume 5 threads at a time.
For our use we need at least 100 threads serving at a time. Can we increase the thread pool size to 100? Will it cause any issues?

In my project we use a thread pool of about 30 to serve about 100 concurrent requests. Play 1.x is very fast, so threads are released before the next request needs processing.
But you should load test your code. I don't think it's a good idea to increase the thread pool to 100.
By the way, you should use asynchronous jobs to implement your application, as Play recommends: http://www.playframework.com/documentation/1.2.7/asynchronous

Play is built around the idea of handling short requests as fast as possible, and therefore being able to keep the thread pool as small as possible. The main reason for wanting a small pool is to keep resource consumption low.
Play and Java can happily run with a larger thread pool, like 100 or 1000 (although your server might not always support it; some Linux distributions, for example, have a fixed limit on threads per application per user), but it is recommended to analyze your problem and see whether you really need a pool that big.
In most situations, needing a big pool means either that you have too many blocking threads and should look into Play's async features, or that you have an action that tries to do too many things at once and would perform better when chopped into smaller pieces.
If a request results in a long-blocking thread on the server, this usually means it also results in a long-blocked interface on the user's end.

Overhead on using DB connection pool

I have a bunch of Java programs that are started by a script every few minutes and terminate in less than a minute. Most of them are single-threaded, and to access the MySQL DB I use:
DriverManager.getConnection()
They just need to connect once, and execute a query.
Now I'm adding a new program to this group which is multi threaded and all the threads need to access DB concurrently. I'm thinking of using a DB connection pool (c3p0) for this.
My question is, as all these programs share a common DAO for accessing DB, is there an overhead of using a DB connection pool for the single threaded programs even though they just need one connection?
I'm planning to set the initial pool size to 1, the min pool size to 1 and the max pool size to 10.

The main goal of a connection pool is to keep some ready-to-use connections around, rather than opening and closing one each time you need a connection. This saves a good deal of time if the DB is used often.
Apache DBCP is single-threaded, but it still significantly increases performance if your application uses DB connections very often.
c3p0 is a good choice, but to choose a suitable connection pool, please check this discussion: Connection pooling options with JDBC: DBCP vs C3P0

Monitor number of connections a DB has

We are using Hibernate and c3p0 for connections. Sometimes the load on the DB increases and we start facing issues.
How can we monitor the number of connections we make without exceeding the pool limit?
What other monitoring can be put to avoid load on the DB? A few examples are as follows:
a. Thread count.
b. CPU usage.
c. Space left.
d. I don't know whether a huge number of transactions could cause any issue. If it can, how do we get their counts, etc.?

Try JavaMelody - http://code.google.com/p/javamelody/
Among many other useful things it can report on number of connections used.

c3p0 data sources can be administered via JMX. The PooledDataSource exposes a large number of interesting operations through JMX.
More info in Configuring and Managing c3p0 via JMX.
Please note that monitoring does just that: it monitors the execution of the data sources. Monitoring does not prevent load on the DB. It can be used to analyse runtime performance and tweak connection pools where appropriate.
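
If you would rather read those values from code than from a JMX console, the JDK's own javax.management API can list the attributes of any registered MBean. The sketch below is generic and must run inside the same JVM as the pool. The class name JmxDump is made up, and the object name pattern under which c3p0 registers its beans (something like `com.mchange.v2.c3p0:*`) is an assumption you should verify in jconsole first.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxDump {

    /**
     * Reads every readable attribute of every MBean matching the pattern,
     * keyed by the MBean's canonical object name.
     */
    public static Map<String, Map<String, Object>> read(String pattern) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Map<String, Map<String, Object>> result = new TreeMap<>();
        for (ObjectName name : server.queryNames(new ObjectName(pattern), null)) {
            Map<String, Object> attributes = new TreeMap<>();
            for (MBeanAttributeInfo info : server.getMBeanInfo(name).getAttributes()) {
                if (!info.isReadable()) {
                    continue;
                }
                try {
                    attributes.put(info.getName(), server.getAttribute(name, info.getName()));
                } catch (Exception e) {
                    // Some attributes refuse to be read; record that instead of failing.
                    attributes.put(info.getName(), "unreadable: " + e);
                }
            }
            result.put(name.getCanonicalName(), attributes);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Defaults to a JDK bean so the sketch runs without any pool on the classpath.
        String pattern = args.length > 0 ? args[0] : "java.lang:type=Runtime";
        read(pattern).forEach((name, attrs) -> System.out.println(name + " -> " + attrs.keySet()));
    }
}
```

Logging the connection-related attributes periodically gives a simple trend of pool usage over time.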

Connection Pooling - How much of an overhead is it?

I am running a webapp inside WebSphere Application Server 6.1. This webapp has a rules engine of sorts, where every rule obtains its very own connection from the WebSphere data source pool. So I see that when a use case is run, for 100 records of input, about 400-800 connections are obtained from the pool and released back to the pool. I have a feeling that if this engine goes to production, it might take too much time to complete processing.
Is it a bad practice to obtain connections from pool that frequently? What are the overhead costs involved in obtaining connections from pool? My guess is that costs involved should be minimal as pool is nothing but a resource cache. Please correct me if I am wrong.

Connection pooling keeps your connections alive in anticipation: when another user connects, a ready connection to the DB is handed over and the database does not have to open a connection all over again.
This is actually a good idea, because opening a connection is not a single step. There are many round trips to the server (authentication, retrieval, status, etc.). So if you've got a connection pool on your website, you're serving your customers faster.
Unless nobody visits your website, you can't afford not to have a connection pool working for you.

The pool doesn't seem to be your problem. The real problem lies in the fact that your "rules engine" doesn't release connections back to the pool before completing the entire calculation. The engine doesn't scale well, so it seems. If the number of database connections somehow depends on the number of records being processed, something is almost always very wrong!
If you manage to get your engine to release connections as soon as possible, it may be that you only need a few connections instead of a few hundred. Failing that, you could use a connection wrapper that re-uses the same connection every time the rules engine asks for one, though that somewhat negates the benefits of having a connection pool.
It also introduces many multithreading and transaction isolation issues, but if the connections are read-only, it might be an option.

A connection pool is all about connection re-use.
If you are holding on to a connection at times when you don't need it, then you are preventing that connection from being re-used somewhere else. And if you have a lot of threads doing this, then you must also run with a larger pool of connections to prevent pool exhaustion. More connections take longer to create and establish, and they take more resources to maintain; there will be more reconnecting as the connections grow old, and your database server will also be impacted by the greater number of connections.
In other words: you want to run with the smallest possible pool without exhausting it. And the way to do that is to hold on to your connections as little as possible.
I have implemented a JDBC connection pool myself and, although many pool implementations out there probably could be faster, you are likely not going to notice because any slack going on in the pool is most likely dwarfed by the time it takes to execute queries on your database.
In short: connection pools just love it when you return their connections. Or they should anyway.

To really check whether your pool is a bottleneck, you should profile your program. If you find the pool is a problem, then you have a tuning problem. A simple pool should be able to handle 100K allocations per second or more, which is about 10 microseconds each. However, as soon as you use a connection, it will take between 200 and 2,000 microseconds to do something useful.
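
Applying those ballpark figures to the 400-800 checkouts from the question gives a feel for the proportions. This is plain arithmetic on the numbers quoted above, not a measurement; the class name is made up.

```java
public class CheckoutEstimate {

    /** Total time in microseconds for a number of operations at a fixed cost each. */
    public static long totalMicros(long operations, long microsPerOperation) {
        return operations * microsPerOperation;
    }

    public static void main(String[] args) {
        long checkouts = 800; // upper end of the 400-800 range in the question

        // Pool bookkeeping at about 10 microseconds per checkout
        System.out.println(totalMicros(checkouts, 10));   // prints 8000 (8 ms)

        // Actual database work at 200 to 2,000 microseconds per use
        System.out.println(totalMicros(checkouts, 200));  // prints 160000 (160 ms)
        System.out.println(totalMicros(checkouts, 2000)); // prints 1600000 (1.6 s)
    }
}
```

On these assumptions the pool accounts for a few percent of the total at most, which is why profiling the queries themselves is usually the better first step.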

I think this is a poor design. Sounds like a Rete rules engine run amok.
If you assume 0.5-1.0 MB minimum per thread (e.g. for stack, etc.) you'll be thrashing a lot of memory. Checking the connections in and out of the pool will be the least of your problems.
The best way to know is to do a performance test and measure memory, wall times for each operation, etc. But this doesn't sound like it'll end well.
Sometimes I see people throw all their rules into Blaze or ILOG or JRules or Drools simply because it's "standard" and high tech. It's a terrific resume item, but how many of those solutions would be better served by a simpler table-driven decision tree? Maybe your problem is one of those.
I'd recommend that you get some data, see if there's a problem, and be prepared to redesign if the data tells you it's necessary.

Could you provide more details on what your rules engine does exactly? If each rule "firing" performs data updates, you may want to verify that the connection is being properly released (put the release in the finally block of your code to ensure that the connections really are released).
If possible, you may want to consider capturing your data updates to a memory buffer, and write to the database only at the end of the rule session/invocation.
If the database operations are read-only, consider caching the information.
As bad as you think 400-800 connections being created and released to the pool is, I suspect it'll be much much worse if you have to create and close 400-800 unpooled connections.
