We are using hibernate and C3PO for connections. Sometimes, load on the DB increases and we start facing issues.
How can we monitor the number of connections we make without exceeding the pool limit?
What other monitoring can be put to avoid load on the DB? A few examples are as follows:
a. Thread count.
b. CPU usage.
c. Space left.
c. I don't know if huge number of transactions could cause any issue. If they do, how to get their counts. etc.
Try JavaMelody - http://code.google.com/p/javamelody/
Among many other useful things it can report on number of connections used.
c3p0 datasources can be accessed via a JMX administration. The PooledDataSource has a large number of interesting operations that are exposed through JMX.
More info on Configuring and Managing c3p0 via JMX.
Please not that monitoring does what it does , it monitors the execution of the datasources. Monitoring does not avoid load on the DB. It can be used to analyse the runtime performance and tweak connection pools where appropriate.
Related
Application is hosted on multiple Virtual Machines and DB is on single server. All VMs are pointing to single Instance of DB.
In this architecture, I have a table having very few record. But this table is accessed and updated by threads running on VMs very heavily. This is causing a performance bottleneck and sometimes record level exception. Database level locking does not seem to be best option as it is introducing significant delays in request processing.
Please suggest if there is any other technique to solve this problem.
Few questions first!
Is your application using connection pooling? If not, please use it. Creating a JDBC connection is expensive!
Is your application read heavy/write heavy?
What kind of storage engine are you using in your MySQL tables? InnoDB or MyISAM. If your application is write heavy, please use InnoDB based tables as it uses row level locking and will serve concurrent requests better.
One special case - if you are implementing queues on top of database tables, find a database that has a built-in queue operation and use that, or use a reliable messaging service. Building queues on top of databases is typically not efficient. See e.g. http://mikehadlow.blogspot.co.uk/2012/04/database-as-queue-anti-pattern.html
In general, running transactions on databases is slow because at the end of each transaction the database needs to be sure that enough has been saved out to disk that if the system died right now the changes made by the transaction would be safely preserved. If you don't need this you might find it faster to write a single non-database application that does what the database does but doesn't write anything out to disk, or still does database IO but does the minimum possible. Then instead of all of the VMs talking to the database directly they would all talk to this application.
There are 2 different applications running on the same jboss server. I want to connect these 2 applications with the same mysql database through the same data source.
what kind of impact can be occur in running these 2 application -
I am supposing that these issues could be happen .
- Table Locking issues, Slow performance, connectivity issue, ACID properties lost issue.
Are there any drawbacks with this approach?
Nothing of this can happen (Table Locking issues, Slow performance, connectivity issue, ACID properties lost issue).
The database cannot really distinguish if two connections come from the same application.
Of course, two applications still means twice as many requests, so performance may be affected. ACIDity is not affected and you are unlikely to run out of TCP ports either.
The performance of two apps accessing the same database is the same as the performance of one app twice as popular.
There are no drawbacks, as long as there are enough connections for both apps and your transactions are well written its normal to share the same datasource
Perhaps you should consider if there are any advantages?
In my opinion there is have to be compelling reasons to not create separate datasources for separate applications - I sense that this is not the case for you.
In any case, some of the drawbacks you mention may happen when you share a connection pool between applications or when you don't since they are properties of the database not the connections.
Edit: Summarizing the good discussion with Jan Dvorak below, here are some arguments for using one (or more) datasources per application:
It enables one to discard eg. stale or appserver-culled connections (-threads) from one application without affecting others
Having multiple DS'es per application enables one to reset admin connections without affecting production connections.
Cheers,
The main issue would be with pooling. As more applications use the datasource, more connections are open so the default pool may not be enough.
Apart from that, all the other issues are related to the DB engine and your DB design than to the datasource. A DB engine won't stop being ACID compliant no matter how it is accessed, won't be slower than if the accesses were from different datasources, and so on...
I have a Java batch which does a select with a large resulset (I process the elements using a Spring callbackhandler).
The callbackhandler puts a task in a fixed threadpool to process the row.
My poolsize is fixed on 16 threads.
The resulset contains about 100k elements.
All db access code is handled through a JdbcTemplate or through Hibernate/Spring, no manual connection management is present.
I have tried with Atomikos and with Commons DBCP as connection pool.
Now, I would think that 17 max connections in my connectionpool would be enough to get this batch to finish. One for the select and 16 for the threads in the connectionpool which update some rows. However that seems to be too naive, as I have to specify a max pool size a magnitude larger (haven't tried for an exact value), first I tried 50 which worked on my local Windows machine, but doesn't seem to be enough on our Unix test environment. There I have to specify 128 to make it work (again, I didn't even try a value between 50 and 128, I went straight to 128).
Is this normal? Is there some fundamental mechanism in connection pooling I'm missing? I find it hard to debug this as I don't know how to see what happens with the opened connections. I tried various log4j settings but didn't get any satisfactory result.
edit, additional info: when the connectionpool size seems to be too low, the batch seems to hang. If I do a jstat on the process I can see all threads are waiting for a new connection. At first I didn't specify the maxWait property on the dbcp connection pool, which causes threads to wait indefinitely on a new connection, and I noticed the batch kept hanging. So no connections were released. However, that only happened after processing +-70k rows, which dismissed my initial hunch of connection leakage somehow.
edit2: I forgot to mention I already rewrote the update part in my tasks. I qeueu my updates in a ConcurrentLinkedQueue, I empty that on 1000 elements. So i actually only do about 100 updates.
edit3: I'm using Oracle and I am using the concurrent utils. So i have an executor configured with fixed poolsize of 16. I submit my tasks on this executor. I don't use connections manually in my tasks, I use jdbctemplate which is threadsafe and asks it connections from the connectionpool. I suppose Spring/DBCP handles the connection/thread issue.
If you are using linux, you can try MySql administrator to monitor you connection status graphically, provided you are using MySQL.
Irrespective of that, even 100 connections is not uncommon for large enterprise applications, handling a few thousand requests per minute.
But if the requests are low or each request doesnt need unique a transaction, then I would recommend you to tune your operation inside threads.
That is, how are you distributing the 100k elements to 16 threads?
If you try to acquire the connection every time you read a row from the shared location(or buffer), then it is expected to take time.
See whether this helps.
getConnection
for each element until the buffer size becomes zero
process it.
if you need to update,
open a transaction
update
commit/rollback the transaction
go to step 2
release the connection
you can synchronize the buffer by using java.util.concurrent collections
Dont use one Runnable/Callable for each element. This will degrade the performance.
Also how are you creating threads? use Executors to run your runnable/callable. Also remember that DB connections are NOT expected to be shared across threads. So use 1 connection in 1 thread at a time.
For eg. create an Executor and submit 16 runnalbles each having its own connection.
I switched to c3p0 instead of DBCP. In c3p0 you can specify a number of helper threads. I notice if I put that number as high as the number of threads I'm using, the number of connections stays really low (using the handy jmx bean of c3p0 to inspect the active connections). Also, I have several dependencies with each its own entity manager. Apparently a new connection is needed for each entity manager, so I have about 4 entitymanagers/thread, which would explain the high number of connections. I think my tasks are all so short-lived that DBCP couldn't follow with closing/releasing connections, since c3p0 works more asynchronous and you can specify the number of helperthreads, it is able to release my connections in time.
edit: but the batch keeps hanging when deployed to the test environment, all threads are blocking when releasing the connection, the lock is on the pool. Just the same as with DBPC :(
edit: all my problems dissapeared when I switched to BoneCP, and I got a huge performance increase as bonus too
I will deploy a web application that will run on JBoss 4.2.3 on a production environment. I would appreciate if you can give me some information or references about how can I estimate the minimum (<min-pool-size>) and maximum (<max-pool-size>) pool size for the datasource.
It really depends on the load: how many users do access your application in a concurrent manner. And since people really seldom do things really concurrently it will be tough to guess.
Your best strategy might be to set the value pretty high and use a management console to observe connections. As far as I can recall the management console would show peaks, so take this value and set your max to a bigger value.
I would set min-pool-size to an value little less then average concurrent connection count or just leave it on default if you app doesn't show performance problems. Having ready to use connections speeds up the app but if you don't see performance problems why bother.
And of course you should take your database into accounts: how many concurrent connections does it allow, do you pay anything by concurrent connections or not.
Observe performance: does your database server runs on the same machine as JBoss? More possible connections mean more concurrent work for the database server means more CPU usage - this may also affect performance of the app server.
So again the best bet is a management console and a load test.
I am running a webapp inside Webpshere Application Server 6.1. This webapp has a rules kind of engine, where every rule obtains its very own connection from the websphere data source pool. So, I see that when an use case is run, for 100 records of input, about 400-800 connections are obtained from the pool and released back to the pool. I have a feeling that if this engine goes to production, it might take too much time to complete processing.
Is it a bad practice to obtain connections from pool that frequently? What are the overhead costs involved in obtaining connections from pool? My guess is that costs involved should be minimal as pool is nothing but a resource cache. Please correct me if I am wrong.
Connection pooling keeps your connection alive in anticipation, if another user connects the ready connection to the db is handed over and the database does not have to open a connection all over again.
This is actually a good idea because opening a connection is not just a one-go thing. There are many trips to the server (authentication, retrieval, status, etc) So if you've got a connection pool on your website, you're serving your customers faster.
Unless your website is not visited by people you can't afford not to have a connection pool working for you.
The pool doesn't seem to be your problem. The real problem lies in the fact that your "rules engine" doesn't release connections back to the pool before completing the entire calculation. The engine doesn't scale well, so it seems. If the number of database connections somehow depends on the number of records being processed, something is almost always very wrong!
If you manage to get your engine to release connections as soon as possible, it may be that you only need a few connections instead of a few hundred. Failing that, you could use a connection wrapper that re-uses the same connection every time the rules engine asks for one, that somewhat negates the benefits of having a connection pool though...
Not to mention that it introduces many multithreading and transaction isolation issues, if the connections are read-only, it might be an option.
A connection pool is all about connection re-use.
If you are holding on to a connection at times where you don't need a connection, then you are preventing that connection from being re-used somewhere else. And if you have a lot of threads doing this, then you must also run with a larger pool of connections to prevent pool exhaustion. More connections takes longer to create and establish, and they take more resources to maintain; there will be more reconnecting as the connections grow old and your database server will also be impacted by the greater number of connections.
In other words: you want to run with the smallest possible pool without exhausting it. And the way to do that is to hold on to your connections as little as possible.
I have implemented a JDBC connection pool myself and, although many pool implementations out there probably could be faster, you are likely not going to notice because any slack going on in the pool is most likely dwarfed by the time it takes to execute queries on your database.
In short: connection pools just love it when you return their connections. Or they should anyway.
To really check if your pool is a bottle neck you should profile you program. If you find the pool is a problem, then you have tuning problem. A simple pool should be able to handle 100K allocations per second or more or about 10 micro-seconds. However, as soon as you use a connection, it will take between 200 and 2,000 micro-seconds to do something useful.
I think this is a poor design. Sounds like a Rete rules engine run amok.
If you assume 0.5-1.0 MB minimum per thread (e.g. for stack, etc.) you'll be thrashing a lot of memory. Checking the connections in and out of the pool will be the least of your problems.
The best way to know is to do a performance test and measure memory, wall times for each operation, etc. But this doesn't sound like it'll end well.
Sometimes I see people assume that throwing all their rules into Blaze or ILOG or JRules or Drools simply because it's "standard" and high tech. It's a terrific resume item, but how many of those solutions would be better served by a simpler table-driven decision tree? Maybe your problem is one of those.
I'd recommend that you get some data, see if there's a problem, and be prepared to redesign if the data tells you it's necessary.
Could you provide more details on what your rules engine does exactly? If each rule "firing" is performing data updates, you may want to verify that the connection is being properly released (Put this in the finally block of your code to ensure that the connections are really being released).
If possible, you may want to consider capturing your data updates to a memory buffer, and write to the database only at the end of the rule session/invocation.
If the database operations are read-only, consider caching the information.
As bad as you think 400-800 connections being created and released to the pool is, I suspect it'll be much much worse if you have to create and close 400-800 unpooled connections.