I'm pretty new to DBMSs and I'm required to write a Java program using JDBC to interact with an Access database file. I'm wondering whether it's better practice, or even possible, to initialize the Connection in main and pass it to each method as needed (closing it after the program has run), or to open and close a new connection in each individual method.
Sorry if this is a repeat, but none of the questions/answers I've found on this have been conclusive.
Opening a connection takes quite a long time. You should use the same connection throughout your program if there is no special reason to close it.
There is even a dedicated technique called connection pooling, which allows large applications to reuse open connections and improves performance.
I think creating a single connection object is the best way, as you reduce the overhead on the JVM of repeatedly creating and garbage-collecting connection objects.
(Use try-with-resources; it will take care of closing the connection object automatically.)
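For the original question, a minimal sketch of that approach might look like this, assuming the UCanAccess driver for the Access file; the file path, table and column names are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class Main {

    public static void main(String[] args) throws SQLException {
        // One connection for the whole run; try-with-resources closes it at the end of main.
        String url = "jdbc:ucanaccess://C:/data/mydb.accdb"; // placeholder path and driver
        try (Connection conn = DriverManager.getConnection(url)) {
            printCustomerCount(conn);
            // ... pass the same conn to any other methods that need the database
        }
    }

    // Each method receives the already-open connection instead of opening its own.
    private static void printCustomerCount(Connection conn) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement("SELECT COUNT(*) FROM Customers");
             ResultSet rs = ps.executeQuery()) {
            if (rs.next()) {
                System.out.println("Customers: " + rs.getInt(1));
            }
        }
    }
}
```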
Related
So I have a Java process that runs indefinitely as a TCP server (receives messages from another process, and has onMsg handlers).
One of the things I want to do with the message in my Java program is to write it to disk using a database connection to postgres. Right now, I have one single static connection object which I call every time a message comes in. I do NOT close and reopen the connection for each message.
I am still a bit new to Java, so I wanted to know: 1) are there any pitfalls or dangers in keeping one connection object open indefinitely, and 2) are there performance benefits to never closing the connection, as opposed to reopening/closing it every time I want to hit the database?
Thanks for the help!
I do NOT close and reopen the connection for each message.
Yes you do... at least as far as the plain Connection object is concerned. Otherwise, if you ever end up with a broken connection, it'll be broken forever, and if you ever need to perform multiple operations concurrently, you'll have problems.
What you want is a connection pool to manage the "real" connections to the database, and you just ask for a connection from the pool for each operation and close it when you're done with it. Closing the "logical" connection just returns the "real" connection to the pool for another operation. (The pool can handle keeping the connection alive with a heartbeat, retiring connections over time etc.)
There are lots of connection pool technologies available, and it's been a long time since I've used "plain" JDBC so I wouldn't like to say where the state of the art is at the moment - but that's research you can do for yourself :)
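To make the pattern concrete, here is a rough sketch of a per-message write, assuming a pooled javax.sql.DataSource has been configured elsewhere (the table and column names are made up):

```java
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class MessageWriter {

    private final DataSource pool; // backed by a connection pool configured elsewhere

    public MessageWriter(DataSource pool) {
        this.pool = pool;
    }

    // Called from the onMsg handler: borrow a connection, use it, "close" it.
    // Closing the logical connection just hands the real one back to the pool.
    public void onMsg(String payload) throws SQLException {
        try (Connection conn = pool.getConnection();
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO messages (payload) VALUES (?)")) { // hypothetical table
            ps.setString(1, payload);
            ps.executeUpdate();
        }
    }
}
```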
Creating a database connection is always a performance hit. Only a very naive implementation would create and close a connection for each operation. If you only needed to do something once an hour, then it would be acceptable.
However, if you have a program that performs several database accesses per minute (or even per second for larger apps), you don't want to actually close the connection.
So when do you close the connection? Easy answer: let a connection pool handle that for you. You ask the pool for a connection, it'll give you an open connection (that it either has cached, or if it really needs to, a brand new connection). When you're finished with your queries, you close() the connection, but it actually just returns the connection to the pool.
For very simple programs setting up a connection pool might be extra work, but it's not very difficult and it's definitely something you'll want to get the hang of. There are several open source connection pools, such as DBCP from Apache and c3p0.
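For example, a minimal setup with Apache Commons DBCP 2 might look roughly like this; the URL, credentials and query are placeholders, and the property names assume the dbcp2 BasicDataSource API:

```java
import org.apache.commons.dbcp2.BasicDataSource;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class Database {

    private static final BasicDataSource POOL = new BasicDataSource();

    static {
        POOL.setUrl("jdbc:mysql://localhost:3306/mydb"); // placeholder URL
        POOL.setUsername("app");                         // placeholder credentials
        POOL.setPassword("secret");
        POOL.setMaxTotal(10);                            // cap on open connections
    }

    public static int countUsers() throws SQLException {
        // close() on conn returns it to the pool instead of tearing it down.
        try (Connection conn = POOL.getConnection();
             PreparedStatement ps = conn.prepareStatement("SELECT COUNT(*) FROM users");
             ResultSet rs = ps.executeQuery()) {
            return rs.next() ? rs.getInt(1) : 0;
        }
    }
}
```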
I hate asking questions that seem to have a lot of solutions online already, but we really cannot find any valid best-practice solution for our case, and therefore felt we had no choice.
We are building a RESTful server application in which the periods between use may vary from a couple of hours to multiple months.
The server is hosted by Jetty. We are not using any ORM, but the application is layered into three layers (WebService, Business and Data layer). The Data layer consists of one class injected through the Guice framework. The JDBC (MySQL) connection is instantiated within the constructor of this class. At first, we had a lot of trouble with too many connections before we understood that Guice by default creates a new instance on each request (ref). To get rid of this problem, and because our Data layer class is stateful, we made the class injected as a Singleton.
Now we've foreseen that we might run into trouble when our REST application is not used for some time, since the connection will time out, and no new connection will be instantiated, as the constructor will only be called once.
We now have multiple solutions, but we cannot seem to figure out the best way to solve this, as none of them really seems to be that good. Any input or suggestions to other solutions would be well appreciated.
1. Extend the configured MySQL timeout interval
We really do not want this, as we think it's really not best practice. We should of course not have any leaking connection objects, but if we did, they would use up the available connections.
2. Instantiate a new connection at the beginning of each method, and close it at the end
This is, as far as we understand, not best practice at all, as it would cause a lot of overhead, and should be avoided if possible?
3. Change the injections back to "per-request", and close the pool at the end of each method
This would be even worse than #2, as we would not only instantiate a new connection, but also instantiate a new object on each request?
4. Check the status of the connection at the beginning of each method, and instantiate a new connection if it's closed
An example would be to ping (example) MySQL, and instantiate a new connection if it throws an exception (see the sketch after this list). This would work, but it would create some overhead. Any ideas whether this would actually make any difference to performance?
5. Explicitly catch any exceptions being thrown in the methods indicating that the connection is down, and if so - instantiate a new connection
This way, we would get rid of the ping overhead, but it would complicate our code considerably, as we would have to figure out a way to make sure that the methods return what they would have returned if the connection were already alive.
6. Use a connection pool
We are not familiar with connection pools, other than when using an application server (e.g. Glassfish). We're also wondering whether this would actually solve our problem. And if so, any suggestions on a framework providing connection pools? Here they suggest using PLUS with Jetty.
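For reference, option 4 could be sketched roughly like this; Connection.isValid() is standard JDBC 4, and the timeout value and credentials below are arbitrary placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class DataLayer {

    private Connection connection;

    private synchronized Connection getConnection() throws SQLException {
        // Re-open the connection if it was never created, has been closed,
        // or no longer responds within 2 seconds (e.g. after a MySQL timeout).
        if (connection == null || connection.isClosed() || !connection.isValid(2)) {
            connection = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/mydb", "app", "secret"); // placeholders
        }
        return connection;
    }
}
```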
Please ask if there's anything unclear. I might have forgotten to add some vital information. This is to me more of a design question, but I'd be glad to provide any code if anyone thinks that would help.
Thanks in advance!
Connection pools are the way to go.
They have a number of advantages:
They check your connections for you - this deals with timeouts
They control the number of connections
You can simply close the connection when you're done - you don't need to keep references
You should certainly keep connections in some sort of pool, and in fact you will almost certainly end up writing one yourself eventually if you don't bite the bullet.
By the time you have implemented connection checking so that they don't go stale, some sort of connection holder so that you don't need to re-open them each time, some sort of exception handling code...you get my drift.
I have used DBCP and BoneCP and both are very easy to use and configure, and they will save you hours and hours of frustration dealing with JDBC connection issues.
I am not overly familiar with Guice, but I assume it has some way to provide your own factory method for an object, so you can use that to get connections from your pool and then simply call close() when you're done to return them to the pool.
If you're using a webserver you can always use an interceptor or filter to bind connections to the work thread and discard them after processing in which case your connection provider would only need to yank the one tied to the current thread.
Inject a Provider<Connection> instead and have the provider give out connections (EDIT: at the time you need it) from a connection pool which can detect stale entries.
Unreturned connections should be discarded from the pool.
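A minimal sketch of that wiring with Guice, assuming a pooled DataSource is bound elsewhere (the class names here are made up):

```java
import com.google.inject.AbstractModule;
import com.google.inject.Inject;
import com.google.inject.Provider;

import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.SQLException;

// Hands out a fresh connection from the pool every time get() is called.
class PooledConnectionProvider implements Provider<Connection> {

    private final DataSource pool;

    @Inject
    PooledConnectionProvider(DataSource pool) {
        this.pool = pool;
    }

    @Override
    public Connection get() {
        try {
            return pool.getConnection();
        } catch (SQLException e) {
            throw new RuntimeException("Could not obtain a connection from the pool", e);
        }
    }
}

class DatabaseModule extends AbstractModule {
    @Override
    protected void configure() {
        bind(Connection.class).toProvider(PooledConnectionProvider.class);
        // bind(DataSource.class).toInstance(...); // the pooled DataSource, configured elsewhere
    }
}
```

The data layer then injects Provider<Connection>, calls get() inside each method, and closes the connection when done so it goes back to the pool.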
I have gone through a couple of articles on singleton examples, and I can see that developers sometimes make the database connection object or connection manager a singleton implementation. In some of the posts it was even advised to use a database connection pool.
Well, Singleton means creating a single instance, so basically we restrict access, e.g. printer or hardware access, or logger access, where we try to restrict the user's access to one at a time using a singleton. However, what is the purpose of using a singleton for DB connection objects?
If I understand correctly, creating a database connection as a singleton means that the app server will have only one instance. Does this mean only one user can access the database connection and the next user has to wait until the connection is closed?
Please advise.
I think you understand the implication of making the connection itself a singleton correctly. Generally it is not a good idea (although in some very particular cases it could make sense).
Making a connection manager or a connection pool a singleton is completely different. The pool itself handles a collection of connections and it can create new ones as they are needed (up to a limit) or re-use ones that have already been used and released.
Having several connection pools at the same time would lose the advantages of the pool:
It would be harder to control the total number of connections open
One pool could be creating connections while another could have connections available
Hope this helps to clarify the subject. You might want to read more on connection pools.
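If a singleton is used here at all, it is typically the pool (or a manager holding it) that is the singleton, not a Connection. A rough sketch, again assuming the dbcp2 BasicDataSource and placeholder settings:

```java
import org.apache.commons.dbcp2.BasicDataSource;

import java.sql.Connection;
import java.sql.SQLException;

// The pool (not a Connection) is the application-wide singleton; each caller
// still borrows its own Connection and closes it to return it to the pool.
public final class ConnectionManager {

    private static final BasicDataSource POOL = new BasicDataSource();

    static {
        POOL.setUrl("jdbc:mysql://localhost:3306/mydb"); // placeholder
        POOL.setUsername("app");
        POOL.setPassword("secret");
        POOL.setMaxTotal(20); // many users can hold connections at the same time
    }

    private ConnectionManager() { }

    public static Connection getConnection() throws SQLException {
        return POOL.getConnection();
    }
}
```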
Q: "However what is the purpose of using singleton in DB connection objects?" A: There is (almost always) none. So your thinking is correct.
Q: "Does this mean only one user can access the Database connection and next user has to wait until the connection is closed?"
A: It depends (for the first part) and no (for the part after "and"). In a single-threaded application only one user will use the database at a time, and the next will wait until the first user's dispatch ends, not until the connection is closed. Once the connection is closed, you need to create another connection to make use of the database. In a multi-threaded application many threads may be using the same connection instance, and the result really depends on the vendor implementation: it may block dispatching (effectively turning your app into a single-threaded app), throw exceptions, or do something different entirely. However, such a design in a multi-threaded app is, in my opinion, a programmer error.
We all know that we should reuse a JDBC PreparedStatement rather than create a new instance within a loop.
But how to deal with PreparedStatement reuse between different method invocations?
Does the reuse-"rule" still count?
Should I really consider using a field for the PreparedStatement or should I close and re-create the prepared statement in every invocation (keep it local)?
(Of course an instance of such a class would be bound to a Connection which might be a disadvantage in some architectures)
I am aware that the ideal answer might be "it depends".
But I am looking for a best practice for less experienced developers that they will do the right choice in most of the cases.
Of course an instance of such a class would be bound to a Connection which might be a disadvantage
Might be? It would be a huge disadvantage. You'd either need to synchronize access to it, which would kill your multi-user performance stone-dead, or create multiple instances and keep them in a pool. Major pain in the ass.
Statement pooling is the job of the JDBC driver, and most, if not all, of the current crop of drivers do this for you. When you call prepareStatement or prepareCall, the driver will handle re-use of existing resources and pre-compiled statements.
Statement objects are tied to a connection, and connections should be used and returned to the pool as quickly as possible.
In short, the standard practice of obtaining a PreparedStatement at the start of the method, using it repeatedly within a loop, then closing it at the end of the method, is best practice.
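That standard pattern looks something like this (a sketch; the table and the batched values are made up):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class OrderDao {

    // Prepare once per method call, reuse inside the loop, close on exit.
    public void insertOrders(Connection conn, List<String> orderIds) throws SQLException {
        String sql = "INSERT INTO orders (order_id) VALUES (?)"; // hypothetical table
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (String id : orderIds) {
                ps.setString(1, id);
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```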
Many database workloads are CPU-bound, not IO-bound. This means that the database ends up spending more time doing work such as parsing SQL queries and figuring out how to handle them (doing the 'execution plan'), than it spends accessing the disk. This is more true of 'transactional' workloads than 'reporting' workloads, but in both cases the time spent preparing the plan may be more than you expect.
Thus it can be worth caching PreparedStatements 'between method invocations', provided the statement is going to be executed frequently and the hassle of making (correct) arrangements for that caching is worth your developer time. As always with performance, measurement is key, but if you can do it cheaply enough, cache your PreparedStatement out of habit.
Some JDBC drivers and/or connection pools offer transparent 'prepared statement caching', so that you don't have to do it yourself. So long as you understand the behaviour of your particular chosen transparent caching strategy, it's fine to let it keep track of things ... what you really want to avoid is the hit on the database.
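For example, MySQL Connector/J exposes its client-side prepared statement cache through connection properties; the property names below are the Connector/J ones as I recall them, so verify them against your driver version's documentation:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class CachedStatementsExample {

    public static Connection open() throws SQLException {
        // Assumed Connector/J properties for its prepared-statement cache.
        String url = "jdbc:mysql://localhost:3306/mydb"
                + "?cachePrepStmts=true"
                + "&prepStmtCacheSize=250"
                + "&prepStmtCacheSqlLimit=2048";
        return DriverManager.getConnection(url, "app", "secret"); // placeholder credentials
    }
}
```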
Yes, it can be reused, but I believe this only applies if the same Connection object is being used. If you are using a database connection pool (from within a web application, for example), the Connection objects will potentially be different each time.
I always recreate the PreparedStatement before each use within a Web Application for this reason.
If you aren't using a Connection Pool then you are golden!
I don't see the difference: If I execute the same statement repeatedly against the same connection, why not reuse the PreparedStatement in any way? If multiple methods execute the same statement, then maybe that statement needs to be encapsulated in its own method (or even its own class). That way you wouldn't need to pass around a PreparedStatement.
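A small sketch of that encapsulation; the class, table and column names are made up:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Encapsulates one statement so callers never handle the PreparedStatement themselves.
public final class UserLookup {

    private static final String SQL = "SELECT name FROM users WHERE id = ?"; // hypothetical

    public String findName(Connection conn, long userId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setLong(1, userId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}
```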
What is the fastest option to issue stored procedures in a threaded environment in Java? According to http://dev.mysql.com/doc/refman/5.1/en/connector-j-usagenotes-basic.html#connector-j-examples-preparecall Connection.prepareCall() is an expensive method. So what's the alternative to calling it in every thread, when synchronized access to a single CallableStatement is not an option?
Most JDBC drivers use only a single socket per connection. I think MySQL also uses a single socket, so sharing one connection between multiple threads is a bad idea for performance.
If you use multiple connections across different threads then you need a CallableStatement for every connection, i.e. a CallableStatement pool per connection. The simplest way to pool them in this case is to wrap the connection class and delegate all calls to the original class (this can be generated very quickly with Eclipse). In the wrapped prepareCall() method you can add a simple pool. You also need a wrapper class for the CallableStatement whose close() method returns the CallableStatement to the pool.
But first you should check whether the call is really expensive, because many drivers already have such a pool inside. Create a loop of prepareCall() and close() and measure the time.
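A rough micro-benchmark along those lines; the stored procedure name and credentials are placeholders, and this ignores JIT warm-up, so treat the numbers as indicative only:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class PrepareCallBenchmark {

    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb", "app", "secret")) { // placeholders
            long start = System.nanoTime();
            for (int i = 0; i < 10_000; i++) {
                // Prepare and immediately close; if the driver pools statements
                // internally, this loop will be much cheaper than expected.
                try (CallableStatement cs = conn.prepareCall("{call my_proc(?)}")) {
                    cs.setInt(1, i);
                }
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("10,000 prepareCall()/close() cycles took " + elapsedMs + " ms");
        }
    }
}
```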
Connection is not thread safe, so you can't share it across threads.
When you prepareCall, the JDBC driver (may) be telling the RDBMS system to do a lot of work that is stored on the server side. You may be guilty of premature optimization here.
After giving this a little thought it seems that if you are having issues with this infrastructure code then your problems are elsewhere. Most applications do not take an inordinate amount of time doing this stuff.
Make sure you are using a DataSource; most do connection caching and some even cache statements.
Also, for this to be a performance bottleneck it would imply that you are doing many queries one after the other, or that your pool of connections is too small. Maybe you should do some benchmarking on your code to see how much time the stored proc is taking vs how much time the JDBC code is taking.
Of course I would follow the MySQL recommendation of using CallableStatement, I am sure they have benchmarked this. Most apps do not share anything between Threads and it is rarely an issue.