How do I know Connection Pooling and Prepared Statements are working?

I have been developing a web application for almost a year, which I started in college. After finishing college, I made some changes to the database code I had written there. In college I used plain JDBC, without connection pooling or prepared statements. In the past month I have realized the potential of, and the need for, connection pooling and prepared statements, because the amount of data being queried and used has grown.
I have implemented connection pooling but have not noticed an increase in performance. If I had seen one, my question would be answered. As it is, how can I verify that connection pooling and prepared statements are being used properly?

I think prepared statements matter more for security than for performance. If you are using them, good; you really should be. As long as 100% of your SQL goes through prepared statements, with every value bound as a parameter, you are "using them right"...
Connection pooling is something you won't notice under light load. With 1, 2, or 10 users you will likely see zero or near-zero difference. You would need to write a load test that simulates hundreds of simultaneous users, then compare performance with and without pooling.
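Here is a minimal sketch of what "using them right" looks like in JDBC. Every user-supplied value goes through a `?` placeholder and a `setXxx` call and is never concatenated into the SQL text. The `users` table, its columns, and the `UserDao` class are hypothetical. The concatenating method is there only to contrast with the safe version.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class UserDao {
    // Hypothetical table and columns, for illustration only.
    static final String RENAME_SQL = "UPDATE users SET name = ? WHERE id = ?";

    /** Renames a user; the values are sent to the driver separately from the SQL text. */
    static int renameUser(Connection conn, long id, String newName) throws SQLException {
        // try-with-resources closes the statement even if executeUpdate throws
        try (PreparedStatement ps = conn.prepareStatement(RENAME_SQL)) {
            ps.setString(1, newName); // bound as data: a quote inside newName cannot alter the query
            ps.setLong(2, id);
            return ps.executeUpdate();
        }
    }

    /** Anti-pattern, shown only for contrast: the value becomes part of the SQL itself. */
    static String concatenated(long id, String newName) {
        return "UPDATE users SET name = '" + newName + "' WHERE id = " + id;
    }
}
```

With the concatenated version, an input such as `x' OR '1'='1` breaks out of the string literal. With the prepared statement, the same input is stored as a plain value.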
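Below is a minimal load-test harness, sketched under some assumptions. Each "simulated user" is just a thread calling a `Runnable`. What you pass in is up to you: for example, a request that borrows from your pool versus one that opens a fresh connection each time. `LoadTest` and `run` are hypothetical names, and dedicated load-testing tools will give more reliable numbers.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class LoadTest {
    /**
     * Runs `requestsPerUser` calls of `request` on each of `users` threads, all released
     * at the same moment, and returns the wall-clock time in milliseconds.
     */
    static long run(int users, int requestsPerUser, Runnable request) throws InterruptedException {
        ExecutorService workers = Executors.newFixedThreadPool(users);
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(users);
        for (int u = 0; u < users; u++) {
            workers.execute(() -> {
                try {
                    start.await(); // hold every simulated user until all are ready
                    for (int i = 0; i < requestsPerUser; i++) request.run();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        long t0 = System.nanoTime();
        start.countDown();   // release all users at once
        done.await();        // wait for the slowest one
        long elapsed = System.nanoTime() - t0;
        workers.shutdown();
        workers.awaitTermination(10, TimeUnit.SECONDS);
        return TimeUnit.NANOSECONDS.toMillis(elapsed);
    }
}
```

Run it once with pooling and once without, using the same `users` count, and compare the two times. Differences usually only appear once `users` is in the hundreds.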

You won't see a performance improvement if opening and closing connections wasn't the bottleneck. Try your database's profiling tools to list open connections. If connections from your process stay open after you have ostensibly closed them, the interface layer is doing connection pooling.
See also this other SO question on forcing a pooled connection to drop.
(Caveat: I use ADO and ADO.NET; JDBC might behave differently.)

You ask the connection pool for statistics to see if it works. If the connection pool library does not provide statistics, find a better library.
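Real libraries expose these numbers. HikariCP does it through `HikariPoolMXBean` (`getActiveConnections()`, `getIdleConnections()`, `getTotalConnections()`, `getThreadsAwaitingConnection()`), and Apache Commons DBCP2's `BasicDataSource` has `getNumActive()` and `getNumIdle()`. The hypothetical `CountingPool` below is not any library's API. It is a minimal sketch showing which counters prove reuse: if pooling works, `created` stays flat while `borrowed` keeps climbing.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical minimal pool that exists only to show the statistics that prove reuse. */
final class CountingPool<T> {
    private final Supplier<T> factory;          // stands in for "open a physical connection"
    private final Deque<T> idle = new ArrayDeque<>();
    private int created, borrowed, active;

    CountingPool(Supplier<T> factory) { this.factory = factory; }

    synchronized T borrow() {
        T item = idle.pollFirst();
        if (item == null) {       // nothing idle: pay the cost of a new physical resource
            item = factory.get();
            created++;
        }
        borrowed++;
        active++;
        return item;
    }

    synchronized void release(T item) {
        active--;
        idle.addFirst(item);      // most recently used first, so warm connections are reused
    }

    synchronized String stats() {
        return "created=" + created + " borrowed=" + borrowed
             + " active=" + active + " idle=" + idle.size();
    }

    synchronized int created() { return created; }
    synchronized int borrowed() { return borrowed; }
}
```

For example, borrowing and releasing 100 times in a row should give `created=1 borrowed=100 active=0 idle=1`. A `created` count that tracks `borrowed` would mean nothing is being reused.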

Related

Using DBCP validationQuery="SELECT 1" affects overall performance and causes BasicDataSource.getConnection() timeouts

I have been using DBCP connection pooling with the following configuration (https://stackoverflow.com/questions/37613362/slowest-component-org-apache-tomcat-dbcp-dbcp-basicdatasource-getconnection-du) but have not found a proper solution yet.
With this configuration, the query SELECT 1 is executed every time the Java code calls Connection con = dataSource.getConnection();.
This guarantees that the connection has been tested before it's handed to the application. However, for applications using connections very frequently for short periods of time, this has a severe impact on performance.
I would like to know whether this is really what causes the slowness I am seeing. If so, is there an alternative way to fix it (for example, increasing maxActive and maxIdle)?
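One common alternative to running a validation query on every borrow is to validate only connections that have been idle for a while. In DBCP this is roughly what `testOnBorrow=false` combined with `testWhileIdle=true` and `timeBetweenEvictionRunsMillis` is for. The JDBC 4 method `Connection.isValid(int timeoutSeconds)` can replace an explicit `SELECT 1`. The sketch below shows the policy in isolation. The class name and the 5-second skip window are hypothetical choices, not DBCP defaults.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.TimeUnit;

final class BorrowValidation {
    // Hypothetical threshold: connections used within the last 5 seconds are trusted without a round trip.
    static final long SKIP_WINDOW_NANOS = TimeUnit.SECONDS.toNanos(5);

    /** Decides whether a connection last used at `lastUsedNanos` needs a liveness check now. */
    static boolean needsCheck(long lastUsedNanos, long nowNanos) {
        return nowNanos - lastUsedNanos > SKIP_WINDOW_NANOS;
    }

    /** Uses Connection.isValid (JDBC 4) only when needed, instead of "SELECT 1" on every borrow. */
    static boolean usable(Connection conn, long lastUsedNanos, long nowNanos) throws SQLException {
        if (!needsCheck(lastUsedNanos, nowNanos)) return true; // recently used: skip the round trip
        return conn.isValid(2); // timeout in seconds
    }
}
```

The trade-off is that a connection dropped by the server inside the skip window will only fail on first use, so the application still needs to handle `SQLException` there.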

Java MySQL JDBC Slow/Taking turns

We're currently trying to make our server software use a connection pool to greatly reduce lag. However, instead of reducing the time queries take to run, it doubles it, making things even slower than before the connection pool.
Are there any reasons for this? Does JDBC only allow a single query at a time or is there another issue?
Also, does anyone have examples of using a connection pool from multiple threads to reduce the time hundreds of queries take? The examples we have found only made it worse.
We've tried both BoneCP and Apache DBCP, with similar results...

A connection pool helps mitigate the overhead/cost of creating new connections to the database by reusing existing ones. This is important if your workload requires many short- to medium-lived connections, e.g. an app that processes concurrent user requests by querying the database. Unfortunately, your example benchmark code does not have such a profile: you are just using 4 connections in parallel, and there is no reuse involved.
What a connection pool cannot do is magically speed up execution times or raise the concurrency level beyond what the database provides. If the benchmark code represents the expected workload, I would advise you to look into batching statements instead of threading. That will massively increase the performance of INSERT/UPDATE operations.
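Here is a minimal sketch of JDBC statement batching, assuming a hypothetical `events` table with a single `name` column. Rows are queued with `addBatch()` and sent in groups with `executeBatch()`, all inside one transaction. With MySQL Connector/J, the `rewriteBatchedStatements=true` connection property is commonly needed to get the full benefit; check your driver's documentation.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchInsert {
    // Hypothetical table, for illustration only.
    static final String SQL = "INSERT INTO events (name) VALUES (?)";

    /** Inserts all names in batches of `batchSize` within one transaction; returns the rows queued. */
    static int insertAll(Connection conn, List<String> names, int batchSize) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);          // one commit at the end instead of one per row
        int queued = 0;
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();
                if (++queued % batchSize == 0) ps.executeBatch(); // send a full batch
            }
            if (queued % batchSize != 0) ps.executeBatch();       // send the remainder
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();                // undo the partial insert
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
        return queued;
    }
}
```

With 5 rows and `batchSize = 2`, this makes 3 round trips (2 + 2 + 1) and a single commit, instead of 5 statements and 5 commits.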
Update:
Using multiple connections in parallel can improve performance. Just keep in mind that there is not necessarily a relation between multiple threads in your Java application and in the database. JDBC is just an interface to the database driver; using multiple connections results in multiple queries being submitted to the database server in parallel. If those queries are suited for it, every modern RDBMS will be able to process them in parallel. But if those queries are very work-intensive, or, even worse, involve table locks or conflicting updates, the DB may not be able to do so. If you experience bad performance, check which queries are lagging and optimize them (Are they efficient? Are proper indexes in place? Denormalizing the schema may help in more extreme cases. Use prepared statements and batch mode for larger updates, etc.). If your DB is overloaded with many similar, small queries, consider caching frequently used data.
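Here is a minimal read-through cache sketch for "caching frequently used data". The `loader` function stands in for whatever query you would otherwise run, and `ReadThroughCache` is a hypothetical class. The sketch has no expiry, so cached data can go stale; dedicated caching libraries add time- or size-based eviction.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical read-through cache: the loader (e.g. a JDBC query) runs at most once per key. */
final class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    ReadThroughCache(Function<K, V> loader) { this.loader = loader; }

    V get(K key) {
        // computeIfAbsent runs the loader only when the key is missing, even under concurrency
        return cache.computeIfAbsent(key, loader);
    }

    /** Call after the underlying row changes so the next get() reloads it. */
    void invalidate(K key) { cache.remove(key); }
}
```

Repeated `get(1)` calls hit the database once. After `invalidate(1)`, the next `get(1)` queries again.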

Connection Pooling - How much of an overhead is it?

I am running a webapp inside WebSphere Application Server 6.1. This webapp has a kind of rules engine, where every rule obtains its very own connection from the WebSphere data source pool. When a use case is run, I see that for 100 records of input, about 400-800 connections are obtained from the pool and released back to it. I have a feeling that if this engine goes to production, it might take too much time to complete processing.
Is it a bad practice to obtain connections from pool that frequently? What are the overhead costs involved in obtaining connections from pool? My guess is that costs involved should be minimal as pool is nothing but a resource cache. Please correct me if I am wrong.

Connection pooling keeps connections alive in anticipation of reuse: when another user needs a connection, a ready one is handed over and the database does not have to open a connection all over again.
This is actually a good idea because opening a connection is not a one-step thing. There are many trips to the server (authentication, retrieval, status, etc.), so if you have a connection pool on your website, you serve your customers faster.
Unless your website gets almost no visitors, you can't afford not to have a connection pool working for you.

The pool doesn't seem to be your problem. The real problem lies in the fact that your "rules engine" doesn't release connections back to the pool before completing the entire calculation. The engine doesn't scale well, so it seems. If the number of database connections somehow depends on the number of records being processed, something is almost always very wrong!
If you manage to get your engine to release connections as soon as possible, you may need only a few connections instead of a few hundred. Failing that, you could use a connection wrapper that reuses the same connection every time the rules engine asks for one, although that somewhat negates the benefits of having a connection pool...
It also introduces many multithreading and transaction-isolation issues, but if the connections are read-only, it might be an option.
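Here is a minimal sketch of the "reuse one connection per engine run" idea, assuming the rules fire one after another on a single thread; the multithreading and isolation caveats above still apply. The type parameter `C` stands in for `java.sql.Connection`, and the `BlockingQueue` stands in for a real pool so the sketch runs without a database. `RuleRunner` is a hypothetical name.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/** Hypothetical runner that borrows once per run instead of once per rule. */
final class RuleRunner<C> {
    private final BlockingQueue<C> pool;

    RuleRunner(BlockingQueue<C> pool) { this.pool = pool; }

    void runAll(List<Consumer<C>> rules) throws InterruptedException {
        C conn = pool.take();                                  // one borrow for the whole run
        try {
            for (Consumer<C> rule : rules) rule.accept(conn);  // every rule shares the same connection
        } finally {
            pool.put(conn);                                    // returned once, even if a rule throws
        }
    }
}
```

Ten rules now cost one borrow instead of ten. The pool size needed then depends on concurrent runs, not on the number of records or rules.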

A connection pool is all about connection re-use.
If you are holding on to a connection at times when you don't need it, you are preventing that connection from being reused somewhere else. If a lot of threads do this, you must also run a larger pool to prevent pool exhaustion. More connections take longer to create and establish, and they take more resources to maintain. There will be more reconnecting as connections grow old, and your database server will also be affected by the greater number of connections.
In other words: you want to run with the smallest possible pool without exhausting it. And the way to do that is to hold on to your connections as little as possible.
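Little's law from queueing theory makes this concrete; it is a standard result, not something stated in the answer. The average number of connections in use, N, is roughly the rate at which requests borrow connections, λ, multiplied by how long each request holds one, W. The rates below are made-up illustrative numbers.

```latex
N \approx \lambda \, W
\qquad
\lambda = 200\ \text{borrows/s},\; W = 0.05\ \text{s} \;\Rightarrow\; N \approx 10
\qquad
W = 0.5\ \text{s} \;\Rightarrow\; N \approx 100
```

At the same traffic, holding each connection ten times longer needs roughly ten times the pool.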
I have implemented a JDBC connection pool myself and, although many pool implementations out there probably could be faster, you are likely not going to notice because any slack going on in the pool is most likely dwarfed by the time it takes to execute queries on your database.
In short: connection pools just love it when you return their connections. Or they should anyway.

To really check whether your pool is a bottleneck, you should profile your program. If you find the pool is a problem, then you have a tuning problem. A simple pool should be able to handle 100K allocations per second or more, i.e. about 10 microseconds each. However, as soon as you use a connection, it will take between 200 and 2,000 microseconds to do something useful.
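The "100K per second, about 10 microseconds each" figure is just 1 s / 100,000. This crude sketch times borrow/return pairs on a queue-backed pool of placeholder objects; `BorrowCost` is a hypothetical name. It runs single-threaded, uncontended, and without JIT warm-up, so treat the result as a rough order of magnitude. A proper benchmark harness such as JMH is the better tool.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class BorrowCost {
    /** Average nanoseconds for one take/put pair on a queue-backed pool of placeholder objects. */
    static double averageBorrowReturnNanos(int poolSize, int iterations) throws InterruptedException {
        BlockingQueue<Object> pool = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) pool.put(new Object()); // placeholders, not real connections
        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            Object conn = pool.take(); // "borrow"
            pool.put(conn);            // "return"
        }
        return (System.nanoTime() - t0) / (double) iterations;
    }
}
```

Compare the result against the 10,000 ns budget above and against the 200,000 to 2,000,000 ns a real query takes. The pool's own overhead is usually far smaller than both.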

I think this is a poor design. Sounds like a Rete rules engine run amok.
If you assume 0.5-1.0 MB minimum per thread (e.g. for stack, etc.) you'll be thrashing a lot of memory. Checking the connections in and out of the pool will be the least of your problems.
The best way to know is to do a performance test and measure memory, wall times for each operation, etc. But this doesn't sound like it'll end well.
Sometimes I see people throw all their rules into Blaze or ILOG or JRules or Drools simply because it's "standard" and high tech. It's a terrific resume item, but how many of those solutions would be better served by a simpler table-driven decision tree? Maybe your problem is one of those.
I'd recommend that you get some data, see if there's a problem, and be prepared to redesign if the data tells you it's necessary.

Could you provide more details on what exactly your rules engine does? If each rule "firing" performs data updates, you may want to verify that the connection is being properly released (put the release in a finally block so that the connection is really released even when a rule fails).
If possible, consider capturing your data updates in a memory buffer and writing to the database only at the end of the rule session/invocation.
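Here is a minimal sketch of releasing in a `finally` block, assuming each rule's database work is passed in as a callback. `RuleStep` and its `SqlWork` interface are hypothetical. With a pooled `javax.sql.DataSource`, calling `close()` on the borrowed connection hands it back to the pool rather than closing the physical connection.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

final class RuleStep {
    /** Hypothetical callback type for the database work one rule performs. */
    interface SqlWork { void run(Connection conn) throws SQLException; }

    /** Borrows a connection for one rule firing and always gives it back, even if the rule throws. */
    static void fire(DataSource ds, SqlWork work) throws SQLException {
        Connection conn = ds.getConnection();
        try {
            work.run(conn);
        } finally {
            conn.close(); // for a pooled DataSource this returns the connection to the pool
        }
    }
}
```

Since Java 7, `try (Connection conn = ds.getConnection()) { ... }` does the same thing more concisely.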
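Here is a minimal sketch of the memory-buffer idea: rules only record what they want to change, and a single write at the end of the session does the database work. That write could be one connection and one JDBC batch. `UpdateBuffer` is hypothetical and not thread-safe, which assumes one buffer per rule session.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical per-session buffer: rules record updates; one write happens at the end. Not thread-safe. */
final class UpdateBuffer<T> {
    private final List<T> pending = new ArrayList<>();

    void record(T update) { pending.add(update); }

    /** Passes every buffered update to `writer` in a single call, then clears the buffer. */
    int flush(Consumer<List<T>> writer) {
        if (pending.isEmpty()) return 0;
        List<T> batch = new ArrayList<>(pending);
        writer.accept(batch); // e.g. borrow one connection and send one JDBC batch
        pending.clear();      // reached only if the write did not throw, so failed updates are kept
        return batch.size();
    }
}
```

This turns hundreds of per-rule connection borrows into one borrow per session, at the cost of the updates not being visible in the database until `flush` runs.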
If the database operations are read-only, consider caching the information.
As bad as you think 400-800 connections being obtained from and released to the pool is, I suspect it would be much, much worse if you had to create and close 400-800 unpooled connections.

Do I have to explicitly disconnect from a database when using Java?

Is it necessary to disconnect from the database in Java after the job is done? If it is not disconnected, will it lead to memory leaks?

You must always close all your Connections, Statements and ResultSets.
If you don't, running out of connections in the pool is a more likely result than a memory leak.

You should provide more details, such as which framework you are using.
Anyway, are you using JDBC? If so you should close the following objects by using their respective close() methods: Statement, ResultSet and Connection.
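Since Java 7, try-with-resources is the usual way to guarantee that all three objects get closed. They are closed in reverse order of declaration, so the `ResultSet` is closed first and the `Connection` last, even if an exception is thrown. `CountRows` is a hypothetical example class, and the SQL passed in is up to the caller.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

final class CountRows {
    /** Counts the rows a query returns; every JDBC object is closed automatically. */
    static int count(DataSource ds, String sql) throws SQLException {
        try (Connection conn = ds.getConnection();
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            int n = 0;
            while (rs.next()) n++;
            return n;
        } // closes rs, then st, then conn, whether or not an exception occurred
    }
}
```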

Assuming you are using JDBC, the answer is yes. If you don't close the connection, the JDBC driver might try to close it in a finalizer, but that could hold the connection open for a very long time, causing resource issues (the number of database connections that can be open at one time is finite). Typically JDBC programming is done with a database pool, and not closing connections means the pool will run out of available connections very quickly.
Some application servers (e.g. JBoss) will detect when a connection wasn't closed and close it for you if it is managing the transactions, but you should not rely on that.
Of course, some JDBC drivers are not pure Java drivers, in which case memory leaks become a very real possibility.

I don't have a source, but I believe (if I remember right, it's been a while since I've touched JDBC) that it depends on the JDBC driver implementation. You should always close your connections and clean up after yourself as not all JDBC drivers do it for you (although some might).
This goes back to a rule that I like to follow - If I create or open something, I'm responsible for deleting or closing it.

yes and yes

Reusing a connection while polling a database in JDBC?

I have a use case where I need to keep a connection to a database open to execute queries periodically.
Is it advisable to close the connection after executing the query and then reopen it after the interval (10 minutes)? I would guess not, since opening a database connection is expensive.
Is connection pooling the alternative, so that I keep reusing the connections?

You should use connection pooling. Write your application code to request a connection from the pool, use the connection, then return the connection back to the pool. This keeps your code clean. Then you rely on the pool implementation to determine the most efficient way to manage the connections (for example, keeping them open vs closing them).
Generally it is "expensive" to open a connection, typically due to the overhead of setting up a TCP/IP connection, authentication, etc. However, it can also be expensive to keep a connection open "too long", because the database (probably) has reserved resources (like memory) for use by the connection. So keeping a connection open can tie-up those resources.
You don't want to pollute your application code managing these types of efficiency trade-offs, so use a connection pool.
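Here is a minimal sketch of a periodic poll that follows this advice. Each run asks the pool for a connection, uses it, and gives it straight back, and the pool decides whether the physical connection stays open between runs. `Poller` and its `SqlJob` interface are hypothetical names. The catch block matters: if a task scheduled with `scheduleAtFixedRate` throws, its later runs are suppressed.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.sql.DataSource;

final class Poller {
    /** Hypothetical callback for the query each poll runs. */
    interface SqlJob { void run(Connection conn) throws SQLException; }

    /** Runs `job` immediately and then every `periodMinutes`; call shutdown() on the result to stop. */
    static ScheduledExecutorService start(DataSource ds, SqlJob job, long periodMinutes) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> pollOnce(ds, job), 0, periodMinutes, TimeUnit.MINUTES);
        return timer;
    }

    static void pollOnce(DataSource ds, SqlJob job) {
        try (Connection conn = ds.getConnection()) { // pooled: close() returns it to the pool
            job.run(conn);
        } catch (SQLException e) {
            // Log and keep polling; an uncaught exception would stop future runs.
            System.err.println("poll failed: " + e.getMessage());
        }
    }
}
```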

Yes, connection pooling is the alternative. Open the connection each time (as far as your code is concerned) and close it as quickly as you can. The connection pool will handle the physical connection in an appropriately efficient manner (including any keepalives required, occasional "liveness" tests etc).
I don't know what the current state of the art is, but I used c3p0 very successfully for my last Java project involving JDBC (quite a while ago).

The answer here really depends on the application. If there are other connections being used simultaneously for the same database from the same application, then a pool is definitely your answer.
If all your application does is query the db, wait 10 minutes, then query again, then simply connect and reconnect. A connection is considered to be an expensive operation, but all things are relative. It is not expensive if you do it only once every 10 minutes. If the application is this simple, don't introduce unnecessary complexity.
NOTE:
OK, complexity is also relative, so if you are already using something like Spring and already know how to use its pooling mechanism, then apply it in this case. If not, keep it simple.

Connection pooling would be an option for you. You can then leave your code as it is, including opening and closing connections; the connection pool will take care of the connections. If you close a pooled connection, it is not actually closed but made available in the pool again. If you open a connection after closing one and there is an open connection in the pool, the pool will return that one. So in an application server you can use the built-in connection pools. For simple Java applications, many JDBC drivers also include a pooling data source.
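Here is a minimal sketch of how "closing" a pooled connection can leave it open. The pool hands out a dynamic proxy whose `close()` puts the physical connection back into an idle queue, and every other call is forwarded. `PooledConnections.wrap` is hypothetical, not any library's implementation. Real pools also block use after close, reset connection state, and validate connections before reuse.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

final class PooledConnections {
    /** Wraps a physical connection so that close() returns it to `idle` instead of closing it. */
    static Connection wrap(Connection physical, BlockingQueue<Connection> idle) {
        AtomicBoolean returned = new AtomicBoolean(false);
        return (Connection) Proxy.newProxyInstance(
            PooledConnections.class.getClassLoader(),
            new Class<?>[] {Connection.class},
            (proxy, method, args) -> {
                String name = method.getName();
                if (name.equals("close") && method.getParameterCount() == 0) {
                    if (returned.compareAndSet(false, true)) idle.offer(physical); // ignore double close
                    return null;                                     // physical connection stays open
                }
                if (name.equals("isClosed") && method.getParameterCount() == 0) {
                    return returned.get();                           // closed from the caller's view
                }
                try {
                    return method.invoke(physical, args);            // forward everything else
                } catch (InvocationTargetException e) {
                    throw e.getCause();                              // rethrow the driver's exception
                }
            });
    }
}
```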

There are many, many tradeoffs in opening and closing connections, keeping them open, making sure that connections that have been "kept alive" are still "valid" when you start to use them again, invalidating connections that get corrupted, etc. These kinds of complex tradeoffs make it difficult (but certainly not impossible) to implement the "best" connection management strategy for your specific case. The "safest" method is to open a connection, use it, and then close it. But, as you already realize, that is not at all the most efficient method. If you manage your own connections, then as you do things to make your strategy more efficient, the complexity will rise very quickly (especially in the presence of any less-than-perfect JDBC drivers, of which there are many.)
There are many connection pooling libraries available out there that can take care of all of this for you in extremely configurable ways (they almost always come pre-configured out-of-the-box for the most typical cases, and until you get up to the point that you're doing high-load activities, you probably don't have to worry about all that configurability - but you will be glad to have it if you scale up!) As is always the case, the libraries themselves may be of variable quality.
I have successfully used both C3P0 and Apache DBCP. If I were choosing again today, I would probably go with DBCP.
