I've been using Stack Overflow for a long time now and have always found existing answers, but this time I couldn't find any information about what I'm trying to do.
Using Java, I have a process composed of about 10 different tasks that gather distinct data from the database using plain JDBC (no EJB/JPA here). Each task (a Callable) can run concurrently and is responsible for obtaining its own connection, which is what we currently do. However, we randomly run into trouble with the connection pool (accessed via JNDI): sometimes we're blocked because the pool has no available connection.
To solve this, I thought we could change the way we obtain connections. Instead of letting each callable open and close a connection (as many times as there are tasks to execute, spread over the threads of the ThreadPoolExecutor), I would like to create a kind of local connection pool dedicated to this process, so that we're sure nothing will block later. If we can't acquire all the requested connections, we would adapt the number of threads to launch accordingly, with a minimum of 1.
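Roughly, what I have in mind is something like this sketch (the DataSource, the up-front reservation and the fallback rule are just illustrative, not our actual code):

import java.sql.Connection;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.sql.DataSource;

public class DedicatedPoolSketch {
    // Reserve up to 'wanted' connections before starting, then size the executor
    // to the number of connections actually obtained (never less than 1 thread).
    public static ExecutorService prepare(DataSource dataSource, int wanted, List<Connection> reserved) {
        for (int i = 0; i < wanted; i++) {
            try {
                reserved.add(dataSource.getConnection());
            } catch (Exception e) {
                break; // couldn't get them all: adapt instead of blocking later
            }
        }
        int threads = Math.max(1, reserved.size());
        return Executors.newFixedThreadPool(threads);
    }
}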
My colleagues approve of the idea, but what surprises me is that I can't find any similar approaches or discussions on the web (maybe I'm not using the right keywords).
I would like to know what you think about this idea, whether you already tried something similar or if I'm missing something important.
Thank you in advance.
You haven't mentioned which connection pool is used. If it is not HikariCP and you are allowed to switch, I recommend it (I have contributed to it).
HikariCP does seem rather interesting; I'll have to look into it further. But this isn't directly related to the question :)
Just a quick follow-up on my experience: the idea is working, with one caveat. I couldn't get rid of one downcast from a Runnable to my own implementation, on which I call .setConnection() inside the beforeExecute() of my ThreadPoolExecutor. Also, all tasks must be handed to the executor with the execute() method; otherwise the Runnable is automatically wrapped in a FutureTask, with no way to access the inner Runnable. Maybe one of you knows how to do this correctly?
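For reference, the relevant part looks roughly like this (ConnectionAwareTask and borrowConnection() are illustrative names, not my exact code):

import java.sql.Connection;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

interface ConnectionAwareTask extends Runnable {
    void setConnection(Connection connection);
}

class ConnectionAssigningExecutor extends ThreadPoolExecutor {

    ConnectionAssigningExecutor(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    @Override
    protected void beforeExecute(Thread t, Runnable r) {
        super.beforeExecute(t, r);
        // Only works for tasks submitted with execute(): submit() wraps the task
        // in a FutureTask, which hides the original Runnable from this hook.
        if (r instanceof ConnectionAwareTask) {
            ((ConnectionAwareTask) r).setConnection(borrowConnection());
        }
    }

    private Connection borrowConnection() {
        return null; // placeholder: hand out one of the connections reserved for this process
    }
}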
I'm trying to better understand what will happen if multiple threads try to execute different sql queries, using the same JDBC connection, concurrently.
Will the outcome be functionally correct?
What are the performance implications?
Will thread A have to wait for thread B to be completely done with its query?
Or will thread A be able to send its query immediately after thread B has sent its query, after which the database will execute both queries in parallel?
I see that Apache DBCP uses synchronization to ensure that connections obtained from the pool are removed from the pool, and made unavailable, until they are closed. This seems more inconvenient than it needs to be. I'm thinking of building my own "pool" simply by creating a static list of open connections and handing them out in a round-robin manner.
I don't mind the occasional performance degradation, and the convenience of not having to close the connection after every use seems very appealing. Is there any downside to me doing this?
I ran the following set of tests using an AWS RDS Postgres database and Java 11 (a rough sketch of the shared-connection test follows the list):
Create a table with 11M rows, each row containing a single TEXT column, populated with a random 100-char string
Pick a random 5 character string, and search for partial-matches of this string, in the above table
Time how long the above query takes to return results. In my case, it takes ~23 seconds. Because there are very few results returned, we can conclude that the majority of this 23 seconds is spent waiting for the DB to run the full-table-scan, and not in sending the request/response packets
Run multiple queries in parallel (with different keywords), using different connections. In my case, I see that they all complete in ~23 seconds. I.e., the queries are being efficiently parallelized
Run multiple queries on parallel threads, using the same connection. I now see that the first result comes back in ~23 seconds. The second result comes back in ~46 seconds. The third in ~1 minute. etc etc. All the results are functionally correct, in that they match the specific keyword queried by that thread
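A rough sketch of that last, shared-connection test (the table, column and keywords are placeholders, not the real schema):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SharedConnectionTest {
    public static void main(String[] args) throws Exception {
        // Every thread deliberately shares this single connection.
        Connection shared = DriverManager.getConnection(
                "jdbc:postgresql://host:5432/db", "user", "password");

        ExecutorService executor = Executors.newFixedThreadPool(3);
        for (String keyword : List.of("abcde", "fghij", "klmno")) {
            executor.submit(() -> {
                long start = System.currentTimeMillis();
                try (PreparedStatement ps = shared.prepareStatement(
                        "SELECT count(*) FROM test_table WHERE data LIKE ?")) {
                    ps.setString(1, "%" + keyword + "%");
                    try (ResultSet rs = ps.executeQuery()) {
                        rs.next();
                        System.out.printf("%s -> %d matches in %d ms%n",
                                keyword, rs.getLong(1), System.currentTimeMillis() - start);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        executor.shutdown();
    }
}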
To add on to what Joni mentioned earlier, his conclusion matches the behavior I'm seeing on Postgres as well. It appears that all "correctness" is preserved, but all parallelism benefits are lost, if multiple queries are sent on the same connection at the same time.
Since the JDBC spec doesn't give guarantees of concurrent execution, this question can only be answered by testing the drivers you're interested in, or reading their source code.
In the case of MySQL Connector/J, all methods to execute statements lock the connection with a synchronized block. That is, if one thread is running a query, other threads using the connection will be blocked until it finishes.
Doing things the wrong way will have undefined results. If someone runs some tests, maybe they'll answer all your questions exactly, but then a new JVM comes out, or someone tries it on another JDBC driver or database version, hits a different set of race conditions, or tries another platform or JVM implementation, and a different undefined result happens.
If two threads modify the same state at the same time, anything could happen depending on the timing. Maybe the second one overwrites the first's query and both end up running the same query. Maybe the library will detect your error and throw an exception. I don't know and wouldn't bother testing (maybe someone already knows, or maybe it's obvious what would happen), so this isn't "the answer", just some advice: use a connection pool, or use a synchronized block to ensure the problem can't happen.
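To illustrate the last point, a minimal sketch of guarding a shared connection with a synchronized block (the class, table and column names are illustrative):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class SharedConnectionGuard {

    private final Connection connection;

    public SharedConnectionGuard(Connection connection) {
        this.connection = connection;
    }

    public long countMatches(String pattern) throws SQLException {
        // One thread at a time: other callers block here until the query finishes,
        // which is effectively what drivers that lock internally do anyway.
        synchronized (connection) {
            try (PreparedStatement ps = connection.prepareStatement(
                    "SELECT count(*) FROM some_table WHERE some_column LIKE ?")) {
                ps.setString(1, pattern);
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                    return rs.getLong(1);
                }
            }
        }
    }
}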
We had to disable the statement cache on WebSphere, because it was throwing ArrayIndexOutOfBoundsException at the PreparedStatement level.
The issue was that someone thought it was smart to share a connection with multiple threads.
He said it was to save connections, but there is no point multithreading queries, because the DB won't run them in parallel on a single connection.
There was also an issue with Java runnables that were blocking each other because they used the same connection.
So that's just something not to do; there is nothing to gain.
There is an option in WebSphere to detect this multithreaded access.
I implemented my own since we use Jetty in development.
I want to implement a mechanism which will close connections if they are not used for a specific period of time. This timeout is the same for all connections. Opened connections can be used many times, so I need to update the usage time on every use and always compute the difference between the current time and that usage time. I also need to close connections which exceed my timeout.
My opened connections are in a Map (Map<Id, Connection>), where Id is an Integer.
I thought about solving my problem with a DelayQueue, but it is not possible to update the usage time (in this case, the delay) in that type of queue.
I also know that this mechanism should run in a separate thread.
Please give me a tip about the best way to implement this, or an example. What kind of data structure should I use?
I can also use Spring (maybe there is some good mechanism I don't know about).
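To make it concrete, the kind of mechanism I have in mind looks roughly like this (class and field names are just placeholders):

import java.sql.Connection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class IdleConnectionCloser {

    private final Map<Integer, Connection> connections = new ConcurrentHashMap<>();
    private final Map<Integer, Long> lastUsed = new ConcurrentHashMap<>();
    private final long idleTimeoutMillis;
    private final ScheduledExecutorService sweeper = Executors.newSingleThreadScheduledExecutor();

    public IdleConnectionCloser(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
        // The "separate thread": sweep periodically instead of scheduling one delay per connection.
        sweeper.scheduleAtFixedRate(this::closeExpired, idleTimeoutMillis, idleTimeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Call on every use of a connection so the usage time stays current.
    public void touch(int id, Connection connection) {
        connections.put(id, connection);
        lastUsed.put(id, System.currentTimeMillis());
    }

    private void closeExpired() {
        long now = System.currentTimeMillis();
        for (Map.Entry<Integer, Long> entry : lastUsed.entrySet()) {
            if (now - entry.getValue() > idleTimeoutMillis) {
                lastUsed.remove(entry.getKey());
                Connection stale = connections.remove(entry.getKey());
                try {
                    if (stale != null) stale.close();
                } catch (Exception ignored) {
                    // best effort: the connection is being discarded anyway
                }
            }
        }
    }
}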
If you're talking about database connectivity, then just use a connection pooler such as c3p0, HikariCP, BoneCP and so on. Don't reinvent the wheel.
Take a look at the HikariCP code. Specifically, look at:
ConcurrentBag
PoolBagEntry
BaseDataSource.getConnection()
BaseDataSource.releaseConnection()
HouseKeeper inner class
While HikariCP is a database connection pool, you can use ConcurrentBag as is, use the HouseKeeper basically as is, slightly modify PoolBagEntry, and lift the basic gist from getConnection() and releaseConnection(), to create a generic pool.
I hate asking questions that apparently have a lot of solutions online, but we really cannot seem to find any valid best-practice solution for our case, so we felt we had no choice.
We are building a RESTful server application in which the periods between uses may vary from a couple of hours to multiple months.
The server is hosted by Jetty. We are not using any ORM, but the application is layered into three layers (web service, business and data layer). The data layer consists of one class injected through the Guice framework. The JDBC (MySQL) connection is instantiated within the constructor of this class. At first we had a lot of trouble with too many connections, before we understood that Guice by default creates a new instance on each request (ref). To get rid of this problem, and because our data layer class is stateful, we made the class a singleton.
Now we foresee that we might run into trouble when our REST application is not used for some time, since the connection will time out and no new connection will be instantiated, as the constructor is only called once.
We now have multiple solutions, but we cannot seem to figure out the best way to solve this, as none of them really seems to be that good. Any input or suggestions to other solutions would be well appreciated.
1. Extend the configured mysql timeout interval
We really do not want this, as we think it's simply not best practice. Of course we should not have any leaked connection objects, but if we did, they would use up the available connection slots.
2. Instantiate a new connection at the beginning of each method, and close it at the end
This is, as far as we understand, not best practice at all, as it would cause a lot of overhead, and should be avoided if possible?
3. Change the injections back to "per-request", and close the pool at the end of each method
This would be even worse than #2, as we would not only instantiate a new connection, but also instantiate a new object on each request?
4. Check the status of the connection at the beginning of each method, and instantiate a new connection if it's closed
An example would be to ping (example) MySQL and instantiate a new connection if the ping throws an exception. This would work, but it adds some overhead. Any idea whether this would actually make a noticeable difference to performance?
5. Explicitly catch any exceptions being thrown in the methods indicating that the connection is down, and if so - instantiate a new connection
This way we would get rid of the ping overhead, but it would complicate our code considerably, as we would have to figure out a way to make sure the methods return what they would have returned if the connection were still alive.
6. Use a connection pool
We are not familiar with connection pools other than when using an application server (e.g. GlassFish). We're also wondering whether this would actually solve our problem. And if so, any suggestions for a framework that provides connection pools? Here they suggest using PLUS with Jetty.
Please ask if there's anything unclear. I might have forgotten to add some vital information. This is to me more of a design question, but I'd be glad to provide any code if anyone thinks that would help.
Thanks in advance!
Connection pools are the way to go.
They have a number of advantages:
They check your connections for you - this deals with timeouts
They control the number of connections
You can simply close the connection when you're done - you don't need to keep references (a minimal setup sketch follows)
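For illustration, a minimal HikariCP setup covering those points (the JDBC URL and credentials are placeholders):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class PooledAccess {

    private static final HikariDataSource DATA_SOURCE = createDataSource();

    private static HikariDataSource createDataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://localhost:3306/mydb");
        config.setUsername("user");
        config.setPassword("password");
        config.setMaximumPoolSize(10);   // the pool controls the number of connections
        config.setIdleTimeout(600_000);  // and retires idle connections for you
        return new HikariDataSource(config);
    }

    public void doWork() throws Exception {
        // close() just returns the connection to the pool
        try (Connection connection = DATA_SOURCE.getConnection();
             PreparedStatement ps = connection.prepareStatement("SELECT 1")) {
            ps.executeQuery();
        }
    }
}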
You should certainly keep connections in some sort of pool, and in fact you will almost certainly end up writing one yourself eventually if you don't bite the bullet.
By the time you have implemented connection checking so that they don't go stale, some sort of connection holder so that you don't need to re-open them each time, some sort of exception handling code...you get my drift.
I have used DBCP and BoneCP, and both are very easy to use and configure; they will save you hours and hours of frustration dealing with JDBC connection issues.
I am not overly familiar with Guice, but I assume it has some way to let you provide your own factory method for an object, so you can use that to get connections from your pool and then simply call close() when you're done to return them to the pool.
If you're using a web server you can always use an interceptor or filter to bind connections to the worker thread and discard them after processing, in which case your connection provider would only need to pick up the one tied to the current thread.
Inject a Provider<Connection> instead and have the provider give out connections (EDIT: at the time you need it) from a connection pool which can detect stale entries.
Unreturned connections should be discarded from the pool.
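A sketch of what that could look like with Guice and a pooled DataSource (the pool choice and the names here are assumptions, not code from the question):

import com.google.inject.AbstractModule;
import com.google.inject.Inject;
import com.google.inject.Provider;
import com.google.inject.Provides;
import com.google.inject.Singleton;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

public class DatabaseModule extends AbstractModule {

    @Provides
    @Singleton
    DataSource provideDataSource() {
        HikariDataSource ds = new HikariDataSource();
        ds.setJdbcUrl("jdbc:mysql://localhost:3306/mydb");
        ds.setUsername("user");
        ds.setPassword("password");
        return ds; // the pool detects and replaces stale connections internally
    }

    @Provides
    Connection provideConnection(DataSource dataSource) throws SQLException {
        return dataSource.getConnection(); // a fresh pooled connection per call
    }
}

// In the data layer, ask the provider each time instead of caching one Connection:
class DataLayer {

    private final Provider<Connection> connections;

    @Inject
    DataLayer(Provider<Connection> connections) {
        this.connections = connections;
    }

    void someQuery() throws SQLException {
        try (Connection connection = connections.get()) {
            // ... run statements; close() returns the connection to the pool ...
        }
    }
}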
I ran into the exact issue that setNetworkTimeout is supposed to solve according to Oracle. A query got stuck in socket.read() for several minutes.
But I have little idea what the first parameter for this method needs to be. Submitting a null causes an AbstractMethodError exception, so... does the implementation actually need some sort of thread pool just to set a network timeout?
Is there some way to achieve the same effect without running a thread pool just for this one condition?
It seems like the documentation explains this horribly, but without looking at any code behind the class my guess would be that you are expected to pass an Executor instance to the method so that implementations can spawn jobs/threads in order to check on the status of the connection.
Since connection reads will block, in order to implement any sort of timeout logic it's necessary to have another thread besides the reading one which can check on the status of the connection.
It sounds like a design decision was made that, instead of the JDBC driver internally implementing the logic of how/when to spawn threads to handle this, the API asks you, the client, to pass in an Executor that will be used to check on the timeouts. This way you as the client can control things like how often the check runs, and prevent it from spawning more threads in your container than you'd like.
If you don't already have an Executor instance around you can just create a default one:
conn.setNetworkTimeout(Executors.newFixedThreadPool(numThreads), yourTimeout);
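If creating a new pool for every connection feels heavy, a single small executor shared across all connections is usually enough for this purpose (a sketch; the class and method names are assumptions):

import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class NetworkTimeouts {

    // One shared executor, reused for every connection in the application.
    private static final ExecutorService TIMEOUT_EXECUTOR = Executors.newSingleThreadExecutor();

    public static void apply(Connection connection, int timeoutMillis) throws SQLException {
        connection.setNetworkTimeout(TIMEOUT_EXECUTOR, timeoutMillis);
    }
}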
As far as the Postgres JDBC driver is concerned (postgresql-42.2.2.jar), the setNetworkTimeout implementation does not make use of the Executor parameter. It simply sets the specified timeout as the underlying socket's timeout using Socket.setSoTimeout.
It looks like the java.sql.Connection interface is trying not to make any assumptions about the implementation and provides for an executor that may be used if the implementation needs it.
I am using Java to create an interface to connect to a database. Each time I want to make a call to the database I need to create a new connection, which makes calling the database, say, 10 times slower.
To avoid having to create a new connection each time I want to call the database, I have a Java thread running that holds all of the connection information.
To write/read from the database I want to create a thread that uses the connection information stored in the thread that's already running, use it to execute specified read/write functions, and then exit.
However I am having trouble accessing this information from the thread which is already running. What would be the best way to accomplish this?
This is a terrible idea, because java.sql.Connection is not thread-safe.
A better idea would be to use a connection pool. Let each thread check out a connection, use it, and put it back.
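A minimal sketch of that check-out/check-in pattern with a pooled DataSource (the query and names are placeholders):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

public class Worker implements Runnable {

    private final DataSource pool;

    public Worker(DataSource pool) {
        this.pool = pool;
    }

    @Override
    public void run() {
        // Each thread checks out its own connection and returns it when done;
        // with a pooled DataSource, close() is the "put it back" step.
        try (Connection connection = pool.getConnection();
             PreparedStatement ps = connection.prepareStatement(
                     "UPDATE items SET processed = true WHERE id = ?")) {
            ps.setLong(1, 42L);
            ps.executeUpdate();
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }
}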
The best way is not to reinvent the wheel; there are good open source implementations of connection pooling, and I suggest you use them.
If you are already running in a container, then use a DataSource. Look into c3p0 (http://sourceforge.net/projects/c3p0/) and commons-dbcp (http://commons.apache.org/dbcp/).
Why do you need a thread running to keep your connection open? Just store the connection somewhere and execute queries whenever you need it; shouldn't that work?
In any case, if you really want a thread, you should take care to use a synchronized collection (see Collections.synchronizedList) that can be accessed and managed from your thread and others too.
To overcome visibility problems, just declare it as a static final variable, so you won't have any problems accessing it from outside the thread you declared it in.
Another easy solution (since Connection does not seem to be thread-safe) is not to use a thread but just a monitor: you can easily set up a wait()/notify() mechanism in which a thread that wants to execute a query checks whether the connection is free; if it is, it occupies the monitor and does whatever it wants before notifying all waiting threads.
Why are you doing this? There are frameworks, like Spring or equivalents, which will manage your connections for you. Don't reinvent the wheel, man...
I would recommend using a generic object pool instead of building your own solution, and suggest checking out Commons Pool from Apache Commons (it's an API for generic object pooling; this isn't DBCP).
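As a rough illustration of Commons Pool (commons-pool2 here; the pooled type and settings are placeholders):

import org.apache.commons.pool2.BasePooledObjectFactory;
import org.apache.commons.pool2.PooledObject;
import org.apache.commons.pool2.impl.DefaultPooledObject;
import org.apache.commons.pool2.impl.GenericObjectPool;

public class GenericPoolExample {

    // Stands in for whatever expensive object you want to reuse.
    static class ExpensiveResource {
    }

    static class ResourceFactory extends BasePooledObjectFactory<ExpensiveResource> {
        @Override
        public ExpensiveResource create() {
            return new ExpensiveResource();
        }

        @Override
        public PooledObject<ExpensiveResource> wrap(ExpensiveResource resource) {
            return new DefaultPooledObject<>(resource);
        }
    }

    public static void main(String[] args) throws Exception {
        GenericObjectPool<ExpensiveResource> pool = new GenericObjectPool<>(new ResourceFactory());
        pool.setMaxTotal(10); // upper bound on pooled instances

        ExpensiveResource resource = pool.borrowObject();
        try {
            // ... use the resource ...
        } finally {
            pool.returnObject(resource); // always give it back
        }
        pool.close();
    }
}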