Reusing a connection while polling a database in JDBC?

I have a use case where I need to keep a database connection open in order to execute queries periodically.
Is it advisable to close the connection after executing the query and then reopen it after the polling interval (10 minutes)? I would guess not, since opening a connection to a database is expensive.
Is connection pooling the alternative, so that I keep reusing the connections?

You should use connection pooling. Write your application code to request a connection from the pool, use the connection, then return the connection back to the pool. This keeps your code clean. Then you rely on the pool implementation to determine the most efficient way to manage the connections (for example, keeping them open vs closing them).
Generally it is "expensive" to open a connection, typically due to the overhead of setting up a TCP/IP connection, authentication, etc. However, it can also be expensive to keep a connection open "too long", because the database (probably) has reserved resources (like memory) for use by the connection. So keeping a connection open can tie-up those resources.
You don't want to pollute your application code managing these types of efficiency trade-offs, so use a connection pool.
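The request/use/return cycle described above can be sketched as follows. This is a minimal illustration, assuming the pool is exposed as a standard `javax.sql.DataSource`; the `PooledWork` class and `SqlWork` interface are hypothetical names, not part of any library.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

/** Hypothetical helper: borrow a connection, use it, and hand it back. */
final class PooledWork {

    /** Work to run against a borrowed connection (hypothetical interface). */
    @FunctionalInterface
    interface SqlWork<T> {
        T apply(Connection connection) throws SQLException;
    }

    private PooledWork() {}

    static <T> T withConnection(DataSource pool, SqlWork<T> work) throws SQLException {
        // getConnection() borrows from the pool. With a pooling DataSource,
        // the close() at the end of the try block returns the connection
        // to the pool instead of tearing it down.
        try (Connection connection = pool.getConnection()) {
            return work.apply(connection);
        }
    }
}
```

Application code then never decides whether a physical connection stays open; it only borrows and returns, and that decision is left to the pool configuration.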

Yes, connection pooling is the alternative. Open the connection each time (as far as your code is concerned) and close it as quickly as you can. The connection pool will handle the physical connection in an appropriately efficient manner (including any keepalives required, occasional "liveness" tests etc).
I don't know what the current state of the art is, but I used c3p0 very successfully for my last Java project involving JDBC (quite a while ago).

The answer here really depends on the application. If there are other connections being used simultaneously for the same database from the same application, then a pool is definitely your answer.
If all your application does is query the db, wait 10 minutes, then query again, then simply connect and reconnect. A connection is considered to be an expensive operation, but all things are relative. It is not expensive if you do it only once every 10 minutes. If the application is this simple, don't introduce unnecessary complexity.
NOTE:
OK, complexity is also relative, so if you are already using something like Spring and already know how to use its pooling mechanism, then apply it for this case. If not, keep it simple.
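For the simple case, connect, query and disconnect on a timer might look roughly like this. It is a sketch under assumptions: `SimplePoller` and `PollTask` are hypothetical names, and the way a connection gets opened is passed in as a `Callable` so that no particular driver or URL is implied.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical poller: opens a fresh connection for each run and closes it afterwards. */
final class SimplePoller {

    /** The query logic to run on every poll (hypothetical interface). */
    @FunctionalInterface
    interface PollTask {
        void run(Connection connection) throws SQLException;
    }

    private SimplePoller() {}

    static ScheduledExecutorService start(Callable<Connection> opener, PollTask task,
                                          long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            // Connect, query, disconnect: nothing stays open between runs.
            try (Connection connection = opener.call()) {
                task.run(connection);
            } catch (Exception e) {
                // Report and carry on; an uncaught exception would cancel all future runs.
                System.err.println("Poll failed: " + e);
            }
        }, 0, period, unit);
        return scheduler;
    }
}
```

A caller would pass something like `() -> DriverManager.getConnection(url, user, password)` as the opener (with its own `url`, `user` and `password`), use `10, TimeUnit.MINUTES` as the period, and call `shutdown()` on the returned scheduler when the application exits.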

Connection pooling would be an option for you. You can then leave your code as it is, including opening and closing connections; the connection pool takes care of the physical connections. If you close a connection that came from a pool, it is not really closed but just made available in the pool again. If you then open a connection and there is an idle one in the pool, the pool returns that one. In an application server you can use the built-in connection pools. For simple Java applications, most JDBC drivers also include a pooling driver.
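To make the "close does not really close" behaviour concrete, here is a toy pool. It is only an illustration of the idea, not how any particular library implements it; `TinyPool` is a hypothetical class, and real pools also validate, time out and evict connections.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy pool for illustration only. */
final class TinyPool {
    private final BlockingQueue<Connection> idle;

    /** Takes already-open physical connections; the list must not be empty. */
    TinyPool(List<Connection> physicalConnections) {
        idle = new ArrayBlockingQueue<>(physicalConnections.size(), false, physicalConnections);
    }

    /** Blocks until a physical connection is free, then hands out a wrapper around it. */
    Connection borrow() throws InterruptedException {
        Connection physical = idle.take();
        AtomicBoolean returned = new AtomicBoolean(false);
        return (Connection) Proxy.newProxyInstance(
                Connection.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                (proxy, method, args) -> {
                    if (method.getName().equals("close")) {
                        // "Closing" the wrapper only puts the physical connection back,
                        // and only once even if close() is called repeatedly.
                        if (returned.compareAndSet(false, true)) {
                            idle.offer(physical);
                        }
                        return null;
                    }
                    try {
                        return method.invoke(physical, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // surface the original exception
                    }
                });
    }

    int idleCount() {
        return idle.size();
    }
}
```

Code that calls `borrow()` and later `close()` looks exactly like code that opens and closes a connection, which is why existing application code can usually stay as it is.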

There are many, many tradeoffs in opening and closing connections, keeping them open, making sure that connections that have been "kept alive" are still "valid" when you start to use them again, invalidating connections that get corrupted, etc. These kinds of complex tradeoffs make it difficult (but certainly not impossible) to implement the "best" connection management strategy for your specific case. The "safest" method is to open a connection, use it, and then close it. But, as you already realize, that is not at all the most efficient method. If you manage your own connections, then as you do things to make your strategy more efficient, the complexity will rise very quickly (especially in the presence of any less-than-perfect JDBC drivers, of which there are many.)
There are many connection pooling libraries available out there that can take care of all of this for you in extremely configurable ways (they almost always come pre-configured out-of-the-box for the most typical cases, and until you get up to the point that you're doing high-load activities, you probably don't have to worry about all that configurability - but you will be glad to have it if you scale up!) As is always the case, the libraries themselves may be of variable quality.
I have successfully used both C3P0 and Apache DBCP. If I were choosing again today, I would probably go with DBCP.

Related

Abandoned connection cleanup in mariadb (compared to mysql)?

Switching from mysql-connector to mariadb client library:
What is the equivalent of the mysql class com.mysql.cj.jdbc.AbandonedConnectionCleanupThread.checkedShutdown()?
If there is any at all?
(I'm also using hikari connection pool).

I don't believe there is an equivalent; it looks like this feature was not carried over to MariaDB. It would be more prudent to fix the connection leak in the application instead.
As explained by the HikariCP author in this message, force-closing abandoned connections has a number of problems:
Yes, we have considered it (removing abandoned connections), but ultimately we decided to pass. The problem with closing leaked connections is several fold. Some thread is possibly using that connection, and its going to blow-up (in production) somewhere if we close it. Or nothing is using that connection, and closing it has no negative impact, but now we've just covered up a leak that will cause constant cycling of connections in the pool.
Applications are responsible for cleaning up resources. Java developers tend to get lazy compared to C/C++ programmers. This is leak just like a memory leak, and both can and rightfully should eventually kill your application. How else would you 1) know a problem exist, and 2) be motivated to track it down and fix it.
We do appreciate all input, even if not adopted. In this case, users looking for a library to defensively cover-up coding errors should probably look to tomcat-jdbc.
Note, leak detection can be run in production, and can be enabled at runtime through a JMX console, so there's not a lot of justification for adding proactive connection reclamation.

How to keep jdbc to postgres alive

So I've been tracking a bug for a day or two now which happens on a remote server that I have little control over. In short, I provide a jar file to our UI team, which wraps Postgres and provides storage for data that users import. The import process is very slow for multiple reasons, one of which is that the users are importing unpredictable, large amounts of data (which we can't really cut down on). This has led to a whole plethora of timeout issues.
After some preliminary investigation, I've narrowed it down to the JDBC connection to the Postgres database timing out. I had a lot of trouble replicating this on my local test setup, but have finally managed to by reducing the 'socketTimeout' connection property to 10s (there's more than 10s between each call made on the connection).
My question now is, what is the best way to keep this alive? I've set the 'tcpKeepAlive' to true, but this doesn't seem to have an effect, do I need to poll the connection manually or something? From what I've read, I'm assuming that polling is automatic, and is controlled by the OS. If this is true, I don't really have control of the OS settings in the run environment, what would be the best way to handle this?
I was considering testing the connection each time it is used, and if it has timed out, I will just create a new one. Would this be the correct course of action or is there a better way to keep the connection alive? I've just taken a look at this post where people are suggesting that you should open and close a connection per query:
When my app loses connection, how should I recover it?
In my situation, I have a series of sequential inserts which take place on a single thread, if a single one fails, they all fail. To achieve this I've used transactions:
m_Connection.setAutoCommit(false);
m_TransactionSave = m_Connection.setSavepoint();
// Do something
m_Connection.commit();
m_TransactionSave = null;
m_Connection.setAutoCommit(true);
If I do keep reconnecting, or use a connection pool like PGBouncer (like someone suggested in comments), how do I persist this transaction across them?
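On the transaction part of the question: a JDBC transaction belongs to the `Connection` it was started on, so it cannot span a reconnect or two different pooled connections. A minimal sketch, assuming the whole series of inserts runs on one connection that is held (or borrowed once) until commit or rollback; `Transactions` and `Inserts` are hypothetical names.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Hypothetical helper for all-or-nothing work on a single connection. */
final class Transactions {

    /** The sequential inserts to run inside the transaction (hypothetical interface). */
    @FunctionalInterface
    interface Inserts {
        void run(Connection connection) throws SQLException;
    }

    private Transactions() {}

    /** Runs all inserts on one connection; either every insert is committed or none is. */
    static void inTransaction(Connection connection, Inserts inserts) throws SQLException {
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false);
        try {
            inserts.run(connection);
            connection.commit();
        } catch (SQLException | RuntimeException e) {
            try {
                connection.rollback(); // undo every insert made since setAutoCommit(false)
            } catch (SQLException rollbackFailure) {
                e.addSuppressed(rollbackFailure);
            }
            throw e;
        } finally {
            // Restore the caller's mode, as the snippet above does with setAutoCommit(true).
            connection.setAutoCommit(previousAutoCommit);
        }
    }
}
```

If the connection dies in the middle, the uncommitted inserts are lost with it and the whole unit of work has to be retried on a new connection.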

JDBC connections to PostgreSQL can be configured with a keep-alive setting. An issue was raised against this functionality here: JDBC keep alive issue. Additionally, there's the parameter help page.
From the notes on that, you can add the following to your connection parameters for the JDBC connection:
tcpKeepAlive=true;
Reducing the socketTimeout should make things worse, not better. The socketTimeout is a measure of how long a connection should wait when it expects data to arrive, but it has not. Making that longer, not shorter would be my instinct.
Is it possible that you are using PGBouncer? That process will actively kill connections from the server side if there is no activity.
Finally, if you are running on Linux, you can change the TCP keep alive settings with: keep alive settings. I am sure something similar exists for Windows.
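Put together, the settings discussed above and the "test before use" idea from the question might look like this. It is a sketch under assumptions: `tcpKeepAlive` and `socketTimeout` are the PostgreSQL JDBC driver's connection parameter names, the class and method names are hypothetical, and the timeout value is whatever the caller chooses.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Properties;
import java.util.concurrent.Callable;

/** Hypothetical helper collecting the keep-alive related pieces. */
final class KeepAliveSketch {
    private KeepAliveSketch() {}

    /** Builds connection properties for the PostgreSQL JDBC driver. */
    static Properties connectionProperties(String user, String password, int socketTimeoutSeconds) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        // Ask the OS to send TCP keepalive probes; probe timing is an OS setting.
        props.setProperty("tcpKeepAlive", "true");
        // Read timeout in seconds; 0 disables it.
        props.setProperty("socketTimeout", Integer.toString(socketTimeoutSeconds));
        return props;
    }

    /** Returns the current connection if it still answers, otherwise opens a new one. */
    static Connection ensureValid(Connection current, int timeoutSeconds,
                                  Callable<Connection> opener) throws Exception {
        if (current != null && !current.isClosed() && current.isValid(timeoutSeconds)) {
            return current;
        }
        if (current != null) {
            try {
                current.close(); // best effort; the connection is already considered dead
            } catch (SQLException ignored) {
                // nothing useful to do here
            }
        }
        return opener.call();
    }
}
```

The properties would be passed to `DriverManager.getConnection(url, props)`. Note that swapping in a new connection discards any transaction that was open on the old one, so the check belongs before a unit of work starts, not in the middle of it.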

How do I know Connection Pooling and Prepared Statements are working?

I have been developing a web application for almost a year which I had started in college. After finishing, I made some changes to the database code that I had written during college. I used simple JDBC when I was in college, without the use of connection pooling and prepared statements. In the past month, I have realized the potential, and need to use connection pooling, and prepared statements because of the increase in data being queried and used.
I have implemented the connection pooling, but have not really noticed an increase in performance. If there was an increase in performance, then my question would be answered, but how can I validate that connection pooling and prepared statements are being used properly?

I think prepared statements are more important from a security standpoint than a performance standpoint. If you are using them, good; you really should be. As long as 100% of your SQL goes through prepared statements, you are "using them right"...
Connection pooling is something you won't notice under light load. 1 or 2 or 10 users, you will likely see 0 or near 0 difference. You would need to create some kind of load testing test to simulate 100s of simultaneous users, then compare performance with and without pooling.
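As a reference point for "using them right", a query with a bound parameter might look like this. The `users` table, its columns and the `UserQueries` class are hypothetical; the point is only that user input is bound with `setString` and never concatenated into the SQL text.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical query class; table and column names are made up for illustration. */
final class UserQueries {
    private UserQueries() {}

    static List<String> emailsFor(Connection connection, String userName) throws SQLException {
        // The SQL text is a constant; the user-supplied value travels separately.
        String sql = "SELECT email FROM users WHERE name = ?";
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setString(1, userName); // JDBC parameters are 1-based
            try (ResultSet rows = statement.executeQuery()) {
                List<String> emails = new ArrayList<>();
                while (rows.next()) {
                    emails.add(rows.getString("email"));
                }
                return emails;
            }
        }
    }
}
```

A quick audit for the "100%" rule is to search the code base for SQL strings built with `+` or `String.format` from request data; any hit is a statement that is not being used this way.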

You won't see a performance improvement if opening and closing connections wasn't rate-limiting. Try looking at your database's profiling tools for a list of open connections. If you see open connections from your process when you have ostensibly closed them, the interface layer is doing connection pooling.
See also this other SO question on forcing a pooled connection to drop.
(Caveat: I use ADO and ADO.NET; JDBC might behave differently.)

You ask the connection pool for statistics to see if it works. If the connection pool library does not provide statistics, find a better library.

Connection Pooling - How much of an overhead is it?

I am running a webapp inside WebSphere Application Server 6.1. This webapp has a kind of rules engine, where every rule obtains its very own connection from the WebSphere data source pool. So I see that when a use case is run, for 100 records of input, about 400-800 connections are obtained from the pool and released back to the pool. I have a feeling that if this engine goes to production, it might take too much time to complete processing.
Is it a bad practice to obtain connections from pool that frequently? What are the overhead costs involved in obtaining connections from pool? My guess is that costs involved should be minimal as pool is nothing but a resource cache. Please correct me if I am wrong.

Connection pooling keeps connections alive in anticipation: when another user needs the database, a ready connection is handed over and a new one does not have to be opened from scratch.
This is a good idea because opening a connection is not a single step. It takes several round trips to the server (authentication, retrieval, status, etc.), so with a connection pool on your website you serve your customers faster.
Unless nobody visits your website, you can't afford not to have a connection pool working for you.

The pool doesn't seem to be your problem. The real problem lies in the fact that your "rules engine" doesn't release connections back to the pool before completing the entire calculation. The engine doesn't scale well, so it seems. If the number of database connections somehow depends on the number of records being processed, something is almost always very wrong!
If you manage to get your engine to release connections as soon as possible, you may need only a few connections instead of a few hundred. Failing that, you could use a connection wrapper that reuses the same connection every time the rules engine asks for one, although that somewhat negates the benefits of having a connection pool.
It also introduces many multithreading and transaction isolation issues. If the connections are read-only, it might be an option.

A connection pool is all about connection re-use.
If you are holding on to a connection at times when you don't need one, then you are preventing that connection from being re-used somewhere else. And if you have a lot of threads doing this, then you must also run with a larger pool of connections to prevent pool exhaustion. More connections take longer to create and establish, and they take more resources to maintain; there will be more reconnecting as the connections grow old, and your database server will also be impacted by the greater number of connections.
In other words: you want to run with the smallest possible pool without exhausting it. And the way to do that is to hold on to your connections as little as possible.
I have implemented a JDBC connection pool myself and, although many pool implementations out there probably could be faster, you are likely not going to notice because any slack going on in the pool is most likely dwarfed by the time it takes to execute queries on your database.
In short: connection pools just love it when you return their connections. Or they should anyway.

To really check whether your pool is a bottleneck, you should profile your program. If you find the pool is a problem, then you have a tuning problem. A simple pool should be able to handle 100K allocations per second or more, or about 10 microseconds each. However, as soon as you use a connection, it will take between 200 and 2,000 microseconds to do something useful.

I think this is a poor design. Sounds like a Rete rules engine run amok.
If you assume 0.5-1.0 MB minimum per thread (e.g. for stack, etc.) you'll be thrashing a lot of memory. Checking the connections in and out of the pool will be the least of your problems.
The best way to know is to do a performance test and measure memory, wall times for each operation, etc. But this doesn't sound like it'll end well.
Sometimes I see people throw all their rules into Blaze or ILOG or JRules or Drools simply because it's "standard" and high tech. It's a terrific resume item, but how many of those solutions would be better served by a simpler table-driven decision tree? Maybe your problem is one of those.
I'd recommend that you get some data, see if there's a problem, and be prepared to redesign if the data tells you it's necessary.

Could you provide more details on what your rules engine does exactly? If each rule "firing" is performing data updates, you may want to verify that the connection is being properly released (Put this in the finally block of your code to ensure that the connections are really being released).
If possible, you may want to consider capturing your data updates to a memory buffer, and write to the database only at the end of the rule session/invocation.
If the database operations are read-only, consider caching the information.
As bad as you think 400-800 connections being created and released to the pool is, I suspect it'll be much much worse if you have to create and close 400-800 unpooled connections.
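The "buffer in memory, write at the end" suggestion could be sketched with a JDBC batch. This assumes every buffered update fits the same parameterised INSERT; `UpdateBuffer` is a hypothetical class, and the caller supplies one connection for the single flush at the end of the rule session.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical buffer: rules record their updates in memory, one flush writes them all. */
final class UpdateBuffer {
    private final List<Object[]> pendingRows = new ArrayList<>();

    /** Called by a rule instead of touching the database. */
    void add(Object... columnValues) {
        pendingRows.add(columnValues.clone());
    }

    /** Writes every buffered row through one PreparedStatement batch; returns the row count. */
    int flush(Connection connection, String insertSql) throws SQLException {
        if (pendingRows.isEmpty()) {
            return 0;
        }
        try (PreparedStatement statement = connection.prepareStatement(insertSql)) {
            for (Object[] row : pendingRows) {
                for (int i = 0; i < row.length; i++) {
                    statement.setObject(i + 1, row[i]); // JDBC parameters are 1-based
                }
                statement.addBatch();
            }
            statement.executeBatch();
        }
        int flushed = pendingRows.size();
        pendingRows.clear();
        return flushed;
    }
}
```

With this shape, 100 input records cost one connection checkout for the flush rather than one per rule firing. The buffer is not thread-safe as written, so each rule session would need its own instance.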

Do I have to explicitly disconnect from a database when using Java?

Is it necessary to disconnect from the database after the job is done in Java? If it is not disconnected, will it lead to memory leaks?

You must always close all your Connections, Statements and ResultSets.
If you don't, you are more likely to find that you can't obtain new connections from the pool than to see a memory leak.
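Since Java 7, try-with-resources is the usual way to make sure all three are closed on every path. A minimal sketch, assuming the connections come from a `javax.sql.DataSource`; `CloseEverything` is a hypothetical class name.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

/** Hypothetical example of closing Connection, Statement and ResultSet reliably. */
final class CloseEverything {
    private CloseEverything() {}

    /** Counts the rows a query returns; all three JDBC resources are closed on every path. */
    static int countRows(DataSource dataSource, String sql) throws SQLException {
        // Resources are closed in reverse order of declaration:
        // ResultSet first, then Statement, then Connection.
        try (Connection connection = dataSource.getConnection();
             Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery(sql)) {
            int count = 0;
            while (rows.next()) {
                count++;
            }
            return count;
        }
    }
}
```

The same closing happens when the query throws, which is the case that hand-written `close()` calls outside a `finally` block tend to miss.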

You should provide more details like which framework you are using or something.
Anyway, are you using JDBC? If so you should close the following objects by using their respective close() methods: Statement, ResultSet and Connection.

Assuming you are using JDBC, the answer is yes. If you don't close the connection, then the JDBC driver might try to close it in a finalizer, but that could hold the connection open for a very long time, causing resource issues (the number of database connections allowed to be open at one time is finite). Typically JDBC programming is done with a database pool, and not closing the connection will mean that the pool will run out of available connections very quickly.
Some application servers (e.g. JBoss) will detect when a connection wasn't closed and close it for you if it is managing the transactions, but you should not rely on that.
Of course some JDBC drivers are not pure java drivers, at which point memory leaks become a very real possibility.

I don't have a source, but I believe (if I remember right, it's been a while since I've touched JDBC) that it depends on the JDBC driver implementation. You should always close your connections and clean up after yourself as not all JDBC drivers do it for you (although some might).
This goes back to a rule that I like to follow - If I create or open something, I'm responsible for deleting or closing it.

yes and yes
