I am working on a Java GUI and I often need to connect to the database,but I would like to only use the connection statements once,instead of writting the whole thing everytime i use it.
I guess I would probably connect around 20 times in my whole system. So I wanted to know in which situation it is best to use Connection Pool?
Typically a connection pool is used when there are multiple threads requiring access to the database at the same time (a web application for example), each would retrieve a connection from the pool and return it when it's finished executing.
Typically GUI applications wouldn't require the amount concurrent DB access that warrants a connection pool, and a single (static) connection that is initialised when the application starts would normally suffice.
I hope this points you in the right direction; without knowing more about the nature of the application that you're creating it's difficult to make a more informed decision!
Related
I have a bunch of java programs that are run every few minutes. These programs are started by a script every few minutes and terminates in less than a minute. Most of them are single threaded and to access MySQL DB I use:
DriverManager.getConnection()
They just need to connect once, and execute a query.
Now I'm adding a new program to this group which is multi threaded and all the threads need to access DB concurrently. I'm thinking of using a DB connection pool (c3p0) for this.
My question is, as all these programs share a common DAO for accessing DB, is there an overhead of using a DB connection pool for the single threaded programs even though they just need one connection?
I'm planning to set initialPool size to 1, min pool size to 1 and max pool size to 10.
The main goal of connection pools is to have some ready-to-use connections, rather then open and close each time you want to get a connection. This approach saves quite enough time in terms if DB is used quite often.
Apache DBCP is single-threaded, but anyway it significantly increases performance, if your application uses DB connection very often.
c3p0 is a good choice, but for choosing proper connection pool please check this discussion: Connection pooling options with JDBC: DBCP vs C3P0
What is the best (highest performance) way in java to manage reading/writing to an SqlLite file?
1. Create one connection object
Create and share a single Connection object on app startup. In this case do the calls need to be synchronised/serialised? Java programmers seem to like always closing connections to prevent bad code leaving a a connection in a bad state or with unclosed statements, etc...
2. Open and close a connection for every transaction
Avoids the above mentioned problem, and would allow for the code to be run in a multi threaded environment if need be in the future? I also read that some mobile versions of java may require this behaviour.
I am hoping someone else already has experience in this area and can share it, otherwise I am going to have to learn the hard way. I am using the xerial jdbc driver if that makes a difference
SQLite supports three different threading modes:
Single-thread. In this mode, all mutexes are disabled and SQLite is
unsafe to use in more than a single thread at once.
Multi-thread. In this mode, SQLite can be safely used by multiple
threads provided that no single database connection is used
simultaneously in two or more threads.
Serialized. In serialized mode, SQLite can be safely used by multiple
threads with no restriction.
For details see http://www.sqlite.org/threadsafe.html
Note one multi-thread problem I've encountered is if you close a connection in one thread while still reading from a ResultSet in another thread. If you use the native library this will cause a fatal access violation error and crash the JVM proces. You would need to detect this condition in your code and not read from the ResultSet or even close the ResultSet.
So, here is the deal.
I'm developing an Android application (although it could just as easily be any other mobile platform) that will occasionally be sending queries to a server (which is written is Java). This server will then search a MySQL database for the query, and send the results back to the Android. Although this sounds fairly generic, here are some specifics:
The Android will make a new TCP connection to the server every time it queries. The server is geographically close, the Android could potentially be moving around a lot, and, since the Android app might run for hours while only sending a few queries, this seemed the best use of resources.
The server could potentially have hundreds (or possibly even thousands) of these queries at once.
Since each query runs in its own Thread, each query will at least need its own Statement (and could have its own Connection).
Right now, the server is set up to make one Connection to the database, and then create a new Statement for each query. My questions for those of you with some database experience (MySQL in particular, since it is a MySQL database) are:
a) Is it thread safe to create one Statement per Thread from a single Connection? From what I understand it is, just looking for confirmation.
b) Is there any thread safe way for multiple threads to use a single PreparedStatement? These queries will all be pretty much identical, and since each thread will execute only one query and then return, this would be ideal.
c) Should I be creating a new Connection for each Thread, or is it better to spawn new Statements from a single Connection? I think a single Connection would be better performance-wise, but I have no idea what the overhead for establishing a DB Connection is.
d) Is it best to use stored SQL procedures for all this?
Any hints / comments / suggestions from your experience in these matters are greatly appreciated.
EDIT:
Just to clarify, the android sends queries over the network to the server, which then queries the database. The android does not directly communicate with the database. I am mainly wondering about best practices for the server-database connection here.
Just because a Connection object is thread safe does not mean its thread efficient. You should use a Connection pool as a best practice to avoid potential blocking issues. But in answer to your question, yes you can share a Connection object between multiple threads.
You do need to create a new Statements/Prepared Statements in each thread that will be accessing the database, they are NOT thread safe. I would highly recommend using Prepared Statements as you will gain efficiency and protection against SQL injection attacks.
Stored procedures will speed up your database queries since the execution plan is compiled already and saved - highly recommended to use if you can.
Have you looked at caching your database data? Take a look at spymemcached if you can, its a great product for reducing number of calls to your data store.
From my experience, you should devote a little time to wrap the database in a web service. This accomplishes two things:
You are forced to examine the data for wider consumption
You make it easier for new consumers to consume the data
A bit more development time, but direct connections to a database via an open network (Internet) is more problematic than specifying what can be accessed through a method.
Use a connection pool such as Commons DBCP. It will handle all the stuff you're worrying about, out of the box.
I am running a webapp inside Webpshere Application Server 6.1. This webapp has a rules kind of engine, where every rule obtains its very own connection from the websphere data source pool. So, I see that when an use case is run, for 100 records of input, about 400-800 connections are obtained from the pool and released back to the pool. I have a feeling that if this engine goes to production, it might take too much time to complete processing.
Is it a bad practice to obtain connections from pool that frequently? What are the overhead costs involved in obtaining connections from pool? My guess is that costs involved should be minimal as pool is nothing but a resource cache. Please correct me if I am wrong.
Connection pooling keeps your connection alive in anticipation, if another user connects the ready connection to the db is handed over and the database does not have to open a connection all over again.
This is actually a good idea because opening a connection is not just a one-go thing. There are many trips to the server (authentication, retrieval, status, etc) So if you've got a connection pool on your website, you're serving your customers faster.
Unless your website is not visited by people you can't afford not to have a connection pool working for you.
The pool doesn't seem to be your problem. The real problem lies in the fact that your "rules engine" doesn't release connections back to the pool before completing the entire calculation. The engine doesn't scale well, so it seems. If the number of database connections somehow depends on the number of records being processed, something is almost always very wrong!
If you manage to get your engine to release connections as soon as possible, it may be that you only need a few connections instead of a few hundred. Failing that, you could use a connection wrapper that re-uses the same connection every time the rules engine asks for one, that somewhat negates the benefits of having a connection pool though...
Not to mention that it introduces many multithreading and transaction isolation issues, if the connections are read-only, it might be an option.
A connection pool is all about connection re-use.
If you are holding on to a connection at times where you don't need a connection, then you are preventing that connection from being re-used somewhere else. And if you have a lot of threads doing this, then you must also run with a larger pool of connections to prevent pool exhaustion. More connections takes longer to create and establish, and they take more resources to maintain; there will be more reconnecting as the connections grow old and your database server will also be impacted by the greater number of connections.
In other words: you want to run with the smallest possible pool without exhausting it. And the way to do that is to hold on to your connections as little as possible.
I have implemented a JDBC connection pool myself and, although many pool implementations out there probably could be faster, you are likely not going to notice because any slack going on in the pool is most likely dwarfed by the time it takes to execute queries on your database.
In short: connection pools just love it when you return their connections. Or they should anyway.
To really check if your pool is a bottle neck you should profile you program. If you find the pool is a problem, then you have tuning problem. A simple pool should be able to handle 100K allocations per second or more or about 10 micro-seconds. However, as soon as you use a connection, it will take between 200 and 2,000 micro-seconds to do something useful.
I think this is a poor design. Sounds like a Rete rules engine run amok.
If you assume 0.5-1.0 MB minimum per thread (e.g. for stack, etc.) you'll be thrashing a lot of memory. Checking the connections in and out of the pool will be the least of your problems.
The best way to know is to do a performance test and measure memory, wall times for each operation, etc. But this doesn't sound like it'll end well.
Sometimes I see people assume that throwing all their rules into Blaze or ILOG or JRules or Drools simply because it's "standard" and high tech. It's a terrific resume item, but how many of those solutions would be better served by a simpler table-driven decision tree? Maybe your problem is one of those.
I'd recommend that you get some data, see if there's a problem, and be prepared to redesign if the data tells you it's necessary.
Could you provide more details on what your rules engine does exactly? If each rule "firing" is performing data updates, you may want to verify that the connection is being properly released (Put this in the finally block of your code to ensure that the connections are really being released).
If possible, you may want to consider capturing your data updates to a memory buffer, and write to the database only at the end of the rule session/invocation.
If the database operations are read-only, consider caching the information.
As bad as you think 400-800 connections being created and released to the pool is, I suspect it'll be much much worse if you have to create and close 400-800 unpooled connections.
Have a use case wherein need to maintain a connection open to a database open to execute queries periodically.
Is it advisable to close connection after executing the query and then reopen it after the period interval (10 minutes). I would guess no since opening a connection to database is expensive.
Is connection pooling the alternative and keep using the connections?
You should use connection pooling. Write your application code to request a connection from the pool, use the connection, then return the connection back to the pool. This keeps your code clean. Then you rely on the pool implementation to determine the most efficient way to manage the connections (for example, keeping them open vs closing them).
Generally it is "expensive" to open a connection, typically due to the overhead of setting up a TCP/IP connection, authentication, etc. However, it can also be expensive to keep a connection open "too long", because the database (probably) has reserved resources (like memory) for use by the connection. So keeping a connection open can tie-up those resources.
You don't want to pollute your application code managing these types of efficiency trade-offs, so use a connection pool.
Yes, connection pooling is the alternative. Open the connection each time (as far as your code is concerned) and close it as quickly as you can. The connection pool will handle the physical connection in an appropriately efficient manner (including any keepalives required, occasional "liveness" tests etc).
I don't know what the current state of the art is, but I used c3p0 very successfully for my last Java project involving JDBC (quite a while ago).
The answer here really depends on the application. If there are other connections being used simultaneously for the same database from the same application, then a pool is definitely your answer.
If all your application does is query the db, wait 10 minutes, then query again, then simply connect and reconnect. A connection is considered to be an expensive operation, but all things are relative. It is not expensive if you do it only once every 10 minutes. If the application is this simple, don't introduce unnecessary complexity.
NOTE:
OK, complexity is also relative, so if are already using something like Spring and already know how to use its pooling mechanism, then apply it for this case. If this is not true, keep it simple.
Connection pooling would be an option for you. You can then leave your code as it is including opening and closing connections. The connection pool will care about the connections. If you close a connection of a pool it will not be closed but just be made available in the pool again. If you open a connection after you closed one if there is a open connection in the pool the pool will return this. So in an application server you can use the build-in connection pools. For simple java applications most of the JDBC drivers also include a pool driver.
There are many, many tradeoffs in opening and closing connections, keeping them open, making sure that connections that have been "kept alive" are still "valid" when you start to use them again, invalidating connections that get corrupted, etc. These kinds of complex tradeoffs make it difficult (but certainly not impossible) to implement the "best" connection management strategy for your specific case. The "safest" method is to open a connection, use it, and then close it. But, as you already realize, that is not at all the most efficient method. If you manage your own connections, then as you do things to make your strategy more efficient, the complexity will rise very quickly (especially in the presence of any less-than-perfect JDBC drivers, of which there are many.)
There are many connection pooling libraries available out there that can take care of all of this for you in extremely configurable ways (they almost always come pre-configured out-of-the-box for the most typical cases, and until you get up to the point that you're doing high-load activities, you probably don't have to worry about all that configurability - but you will be glad to have it if you scale up!) As is always the case, the libraries themselves may be of variable quality.
I have successfully used both C3P0 and Apache DBCP. If I were choosing again today, I would probably go with DBCP.