I'm trying to better understand what happens when multiple threads concurrently execute different SQL queries using the same JDBC connection.
Will the outcome be functionally correct?
What are the performance implications?
Will thread A have to wait for thread B to be completely done with its query?
Or will thread A be able to send its query immediately after thread B has sent its query, after which the database will execute both queries in parallel?
I see that Apache DBCP uses synchronization to ensure that connections obtained from the pool are removed from it, and made unavailable, until they are closed. This seems more inconvenient than it needs to be. I'm thinking of building my own "pool" simply by creating a static list of open connections and handing them out in round-robin fashion.
I don't mind the occasional performance degradation, and the convenience of not having to close the connection after every use seems very appealing. Is there any downside to me doing this?
I ran the following set of tests using an AWS RDS Postgres database and Java 11:
Create a table with 11M rows, each row containing a single TEXT column, populated with a random 100-char string
Pick a random 5-character string and search the above table for partial matches of it
Time how long the above query takes to return results. In my case it takes ~23 seconds. Because very few rows are returned, we can conclude that most of those 23 seconds are spent waiting for the DB to run the full table scan, not sending request/response packets
Run multiple queries in parallel (with different keywords), using different connections. In my case they all complete in ~23 seconds, i.e., the queries are being efficiently parallelized
Run multiple queries on parallel threads, using the same connection. Now the first result comes back in ~23 seconds, the second in ~46 seconds, the third in ~1 minute, and so on. All the results are functionally correct, in that they match the specific keyword queried by each thread
To add to what Joni mentioned: his conclusion matches the behavior I'm seeing on Postgres as well. Correctness appears to be fully preserved, but all parallelism benefits are lost when multiple queries are sent on the same connection at the same time.
Since the JDBC spec doesn't give guarantees of concurrent execution, this question can only be answered by testing the drivers you're interested in, or reading their source code.
In the case of MySQL Connector/J, all methods to execute statements lock the connection with a synchronized block. That is, if one thread is running a query, other threads using the connection will be blocked until it finishes.
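The stacked timings reported above (~23 s, ~46 s, ...) are what a per-connection lock like this would produce. The following pure-Java simulation, not driver code, illustrates the effect: SimulatedConnection is a hypothetical stand-in whose execute method is synchronized and sleeps for a fixed time, the way a driver thread would hold the connection while waiting on the server. With one shared instance the completion times stack up; with one instance per thread they overlap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a connection whose execute path is guarded by a
// per-connection lock, as with a synchronized block inside a driver.
class SimulatedConnection {
    private final long queryMillis;

    SimulatedConnection(long queryMillis) {
        this.queryMillis = queryMillis;
    }

    // Only one thread at a time can be "waiting on the server" here.
    synchronized String execute(String keyword) throws InterruptedException {
        Thread.sleep(queryMillis); // simulated full table scan
        return "rows matching " + keyword;
    }
}

class SharedConnectionDemo {
    // Runs one query per keyword on its own thread and returns each query's
    // completion time in milliseconds, measured from a common start.
    static List<Long> run(List<SimulatedConnection> conns, List<String> keywords)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(keywords.size());
        long start = System.nanoTime();
        List<Future<Long>> futures = new ArrayList<>();
        for (int i = 0; i < keywords.size(); i++) {
            SimulatedConnection conn = conns.get(i % conns.size());
            String kw = keywords.get(i);
            futures.add(pool.submit(() -> {
                conn.execute(kw);
                return (System.nanoTime() - start) / 1_000_000;
            }));
        }
        List<Long> elapsed = new ArrayList<>();
        for (Future<Long> f : futures) {
            elapsed.add(f.get());
        }
        pool.shutdown();
        return elapsed;
    }

    public static void main(String[] args) throws Exception {
        List<String> kws = List.of("abcde", "fghij", "klmno");
        System.out.println("shared:   " + run(List.of(new SimulatedConnection(200)), kws));
        System.out.println("separate: " + run(List.of(new SimulatedConnection(200),
            new SimulatedConnection(200), new SimulatedConnection(200)), kws));
    }
}
```

The shared case finishes at roughly 200, 400 and 600 ms; the separate case finishes all three at roughly 200 ms, mirroring the shape of the Postgres measurements.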
Doing things the wrong way has undefined results. Someone might run some tests that answer all your questions exactly, but then a new JVM comes out, or someone tries it on another JDBC driver, database version, platform, or JVM implementation, or hits a different set of race conditions, and a different undefined result happens.
If two threads modify the same state at the same time, anything can happen depending on the timing. Maybe the second one overwrites the first one's query, and then both run the same query. Maybe the library detects your error and throws an exception. I don't know and wouldn't bother testing (or maybe someone already knows, or it should be obvious what would happen), so this isn't "the answer", just some advice: use a connection pool, or use a synchronized block to ensure problems don't happen.
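What a pool such as DBCP adds over a round-robin list is exclusive checkout: a connection handed to one thread is not handed to another until it is returned. A minimal generic sketch of that idea, using only java.util.concurrent (ExclusivePool is a hypothetical name; real pools such as DBCP or HikariCP add validation, eviction and leak detection that this omits):

```java
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical minimal pool: each resource is held by at most one borrower.
class ExclusivePool<T> {
    private final BlockingQueue<T> idle;

    ExclusivePool(Collection<T> resources) {
        this.idle = new ArrayBlockingQueue<>(resources.size(), false, resources);
    }

    // Removes a resource from the pool, waiting up to maxWaitMillis for one.
    T borrow(long maxWaitMillis) throws InterruptedException, TimeoutException {
        T r = idle.poll(maxWaitMillis, TimeUnit.MILLISECONDS);
        if (r == null) {
            throw new TimeoutException("no resource available within " + maxWaitMillis + " ms");
        }
        return r;
    }

    // Puts a resource back so another thread can borrow it.
    void release(T r) {
        idle.offer(r);
    }

    int available() {
        return idle.size();
    }
}
```

Callers would pair borrow with release in a finally block, e.g. Connection c = pool.borrow(1000); try { ... } finally { pool.release(c); }. The timeout in borrow plays the same role as a pool's maximum-wait setting.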
We had to disable the statement cache on WebSphere because it was throwing ArrayIndexOutOfBoundsException at the PreparedStatement level.
The issue was that someone thought it was smart to share a connection across multiple threads.
He said it was to save connections, but there is no point in multithreading queries on one connection, because the database won't run them in parallel.
There was also an issue with Java Runnables blocking each other because they used the same connection.
So that's just something not to do; there is nothing to gain.
There is an option in WebSphere to detect this multithreaded access.
I implemented my own, since we use Jetty in development.
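The answer doesn't show its detector, so here is one hypothetical way to build something similar for development: a java.lang.reflect.Proxy around the real Connection that records the first thread to call it and fails fast when any other thread does. ThreadCheckedConnection is an illustrative name, not a WebSphere or Jetty API. It only guards calls on the Connection itself, not on Statements created from it, and since pooled connections legitimately move between threads, you would wrap the connection each time it is borrowed rather than once at creation.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical development-time guard: the first thread that calls the
// wrapped Connection becomes its owner; calls from any other thread fail fast.
final class ThreadCheckedConnection implements InvocationHandler {
    private final Connection target;
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    private ThreadCheckedConnection(Connection target) {
        this.target = target;
    }

    static Connection wrap(Connection target) {
        return (Connection) Proxy.newProxyInstance(
            ThreadCheckedConnection.class.getClassLoader(),
            new Class<?>[] {Connection.class},
            new ThreadCheckedConnection(target));
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        Thread current = Thread.currentThread();
        // Claim ownership on first use; afterwards only the owner may call.
        if (!owner.compareAndSet(null, current) && owner.get() != current) {
            throw new IllegalStateException("Connection owned by " + owner.get().getName()
                + " was used from " + current.getName());
        }
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow the driver's own exception unchanged
        }
    }
}
```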
I have a use case wherein I need to process some queries in batch (stmt.addBatch()) and execute others as soon as they are created (stmt.executeQuery()), because their results are used in the queries going into the batch. Both run against the same database. My question: can I keep two connections open throughout for this use case? Would that be a good idea with consistency and viability in mind?
Edit: What if the queries in question act on different records? Can two simultaneously active connection objects be kept alive, one for each scenario?
P.S. I am relatively new to backend development and databases specifically, so I'd appreciate some explanation or pointers to further reading.
In your case it is basically unpredictable which one will start first, since processors can also reorder commands at the RISC level.
From an RDBMS's point of view, you can have simultaneous access to the same table or row; to understand how that is handled, read up on locking.
If executeBatch() runs as one transaction for the whole batch (for example with auto-commit off) and it starts first, your single command would be blocked if it touches rows the batch has locked, until the lock is released; it then proceeds, unless the lock timeout hits first.
So on a single RDBMS consistency is guaranteed; for a replicated system it gets more complicated.
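Whether executeBatch() runs as a single transaction depends on the auto-commit mode and the driver. If you want the all-or-nothing behavior described above, a common pattern is to disable auto-commit on the batch connection and commit or roll back explicitly, while the immediate executeQuery() calls run on the second connection. A minimal sketch; the audit_log table and message column are hypothetical:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class BatchWriter {
    // Hypothetical SQL; substitute the real batch statement.
    static final String INSERT_SQL = "INSERT INTO audit_log (message) VALUES (?)";

    // Sends all values as one batch inside a single explicit transaction on
    // batchConn. A second connection can keep running queries meanwhile.
    static int[] insertAll(Connection batchConn, List<String> values) throws SQLException {
        boolean previousAutoCommit = batchConn.getAutoCommit();
        batchConn.setAutoCommit(false);
        try (PreparedStatement ps = batchConn.prepareStatement(INSERT_SQL)) {
            for (String v : values) {
                ps.setString(1, v);
                ps.addBatch();
            }
            int[] counts = ps.executeBatch();
            batchConn.commit(); // make the whole batch visible at once
            return counts;
        } catch (SQLException e) {
            batchConn.rollback(); // discard a partially applied batch
            throw e;
        } finally {
            batchConn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Row locks taken by the batch are held until commit, so keeping the batch short also shortens the time the other connection may have to wait on the same rows.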
Oh, I see an edit:
Row-level locks are mentioned in the first link, which should also answer your edit:
I have a production web app where, when a form is submitted, the data gets stored in 3 tables in an Oracle DB through JDBC. Sometimes I see connection timeout errors in the logs while the app is trying to connect to the Oracle DB from Java code. This is intermittent.
Below is the exception:
SQL exception while storing data in table
java.sql.SQLRecoverableException: IO Error: Connection timed out
Most of the time the web app is able to connect to the database and insert the values, but sometimes I get this timeout error and the data isn't inserted. I'm not sure why this intermittent issue occurs. When I checked the connection pool config in my application, I noticed the following:
Pool Size (Maximum number of Connections that this pool can open) : 10
Pool wait (Maximum wait time, in milliseconds, before throwing an Exception if all pooled Connections are in use) : 1000
Since the pool size is just 10, could this connection timeout occur when multiple users are trying to connect to the database at once?
Also, since the data is inserted into 3 tables, we do the whole insertion on a single connection. We are not opening a separate DB connection for each table.
NOTE: This application is deployed on an AEM (content management system) server, and the connection pool config is provided by them.
Update: I tried setting the validation query in the connection pool, but I still get the connection timeout error. I'm not sure whether the pool is actually running the validation query. I have attached the connection pool config above for reference.
I would try two things:
Try setting a validation query so that each time the pool leases a connection, you're sure it's actually usable. select 1 from dual should work on Oracle. With recent JDBC drivers this shouldn't be required, but it's worth a try.
Estimate the concurrency of your form. A 10-connection pool is not necessarily too small; it depends on how complex your DB work is, and since you're saving a form it shouldn't be THAT complex. How many users per day do you expect? At peak time, how many users do you expect to be using the form at the same time? A 10-connection pool usually leases and returns connections quite quickly, so it can handle several transactions per second. If you expect more, increase the size slightly (more than 25-30 connections actually degrades DB performance, as more queries compete for resources there).
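The "several transactions per second" estimate can be made concrete with Little's law: if each transaction holds a connection for an average of t seconds, a pool of N connections can sustain at most about N / t transactions per second before requests start queueing for the pool wait. A small helper makes the arithmetic explicit; the 50 ms hold time in main is an assumed figure, not a measurement from this application.

```java
// Back-of-the-envelope pool capacity from Little's law (L = lambda * W):
// with poolSize connections each busy for avgHoldMillis per transaction,
// the pool saturates at poolSize / (avgHoldMillis / 1000) transactions/s.
class PoolCapacity {
    static double maxTransactionsPerSecond(int poolSize, double avgHoldMillis) {
        if (poolSize <= 0 || avgHoldMillis <= 0) {
            throw new IllegalArgumentException("poolSize and avgHoldMillis must be positive");
        }
        return poolSize * 1000.0 / avgHoldMillis;
    }

    public static void main(String[] args) {
        // Assumed: 50 ms per form submission (three inserts plus commit).
        System.out.println(maxTransactionsPerSecond(10, 50)); // prints 200.0
    }
}
```

If peak demand stays well below that ceiling, the 1000 ms pool wait should rarely expire; if connections are held much longer (slow statements, lock waits, leaks), the capacity drops in proportion.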
If nothing seems to work, it would be good to check what's happening on your DB. If possible, use Enterprise Manager to see if there are latches while doing stuff on those three tables.
I'm answering from a programming point of view. There are multiple possible causes for this problem; I've listed them below with a solution for each. A connection timeout means your new thread is not getting database access within the configured time, which can be due to:
Possibility I: Not closing connections. There may be a connection leak somewhere in your application. Solution:
Check for such leaks and make sure every connection is closed after use.
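One way to rule out leaks is to obtain every connection in a try-with-resources block, so it is closed (returned to the pool) on every path, including exceptions. A sketch for the three-table form insert from the question, assuming the pool is exposed as a javax.sql.DataSource; the form_header, form_detail and form_audit tables and their columns are hypothetical:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

class FormDao {
    private final DataSource ds;

    FormDao(DataSource ds) {
        this.ds = ds;
    }

    // Hypothetical table/column names; one short transaction per submission.
    void saveForm(long formId, String name, String detail) throws SQLException {
        // try-with-resources closes the connection (returns it to the pool)
        // whether the inserts succeed or throw.
        try (Connection c = ds.getConnection()) {
            c.setAutoCommit(false);
            try (PreparedStatement h = c.prepareStatement("INSERT INTO form_header (id, name) VALUES (?, ?)");
                 PreparedStatement d = c.prepareStatement("INSERT INTO form_detail (form_id, detail) VALUES (?, ?)");
                 PreparedStatement a = c.prepareStatement("INSERT INTO form_audit (form_id) VALUES (?)")) {
                h.setLong(1, formId);
                h.setString(2, name);
                h.executeUpdate();
                d.setLong(1, formId);
                d.setString(2, detail);
                d.executeUpdate();
                a.setLong(1, formId);
                a.executeUpdate();
                c.commit();
            } catch (SQLException e) {
                c.rollback(); // keep the three tables consistent
                throw e;
            }
        }
    }
}
```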
Possibility II: Big transaction. Solution:
i. If these insertions are synchronized, use synchronization very carefully: at block level, not method level, and keep the synchronized block as small as possible.
With a big synchronized block, a thread gets a connection but then waits, because the synchronized block takes so long to execute, so other threads' waiting time increases. Suppose we have 100 users, and therefore 100 threads running that operation. The first one is executing and takes a long time while the others wait, so the 80th, 90th, etc. thread may throw a timeout. That's why the issue occurs only for some threads.
So you need to reduce the size of the synchronized block.
ii. Also, if the transaction is big, try to split it into smaller ones if possible.
For example, use one small transaction for the first insertion, another for the second, and so on; together these three small transactions complete the operation.
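Item i can be illustrated in plain Java. In this sketch, slowInsert is a hypothetical placeholder for the JDBC work, which each thread is assumed to do on its own connection; only the update to shared in-memory state actually needs the lock. With saveCoarse every caller queues behind the slowest database call; with saveNarrow callers only contend for the counter increment.

```java
// Hypothetical service contrasting a coarse and a narrow synchronized region.
class SubmissionService {
    private final Object lock = new Object();
    private long submitted; // shared in-memory state, guarded by lock

    // Coarse: the slow database work runs while holding the lock, so every
    // other submitting thread waits for it to finish.
    void saveCoarse(Runnable slowInsert) {
        synchronized (lock) {
            slowInsert.run();
            submitted++;
        }
    }

    // Narrow: the database work runs outside the lock; only the shared
    // counter update is synchronized.
    void saveNarrow(Runnable slowInsert) {
        slowInsert.run();
        synchronized (lock) {
            submitted++;
        }
    }

    long submitted() {
        synchronized (lock) {
            return submitted;
        }
    }
}
```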
Possibility III: The pool size is not enough for heavy application usage. Solution:
Increase the pool size. (This only helps if you properly close all connections after use.)
You can use a Java ExecutorService in this case: one thread, one connection, all asynchronous. Once a transaction completes, release the connection back to the pool. That way you can get rid of this timeout issue.
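A sketch of the "one thread, one connection" pattern, assuming the pool is exposed as a javax.sql.DataSource: each task borrows a connection only for its own work and returns it via try-with-resources. DbWork and runAll are hypothetical names. Keeping the executor's thread count at or below the pool size means tasks wait in the executor's queue rather than timing out on the pool.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.sql.DataSource;

class PerTaskConnections {
    // Hypothetical unit of work that uses, but does not own, a connection.
    interface DbWork {
        void run(Connection c) throws SQLException;
    }

    // Runs each job on the executor with its own short-lived connection.
    static void runAll(DataSource ds, List<DbWork> jobs, int threads) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (DbWork job : jobs) {
                futures.add(executor.submit(() -> {
                    try (Connection c = ds.getConnection()) { // back to the pool on close
                        job.run(c);
                    }
                    return null;
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // surfaces any SQLException wrapped in ExecutionException
            }
        } finally {
            executor.shutdown();
        }
    }
}
```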
If one connection is inserting data into 3 tables while other threads are waiting to get a connection, a timeout is bound to happen.
I've been using Stack Overflow for a long time and have always found existing answers, but this time I couldn't find any information about what I'm trying to do.
Using Java, I have a process composed of about 10 different tasks that gather distinct data from the database using plain JDBC (no EJB/JPA here). Each task (a Callable) can run concurrently and is responsible for obtaining its own connection, which is what we currently do. However, we randomly run into trouble with the connection pool (accessed via JNDI): sometimes we're blocked because the pool has no available connection.
To solve this, I thought we could change how we obtain connections. Instead of letting each Callable open and close a connection (depending on the number of tasks to execute and the number of threads in the ThreadPoolExecutor), I'd like to create a local connection pool dedicated to this process, so that we're sure nothing will block later. (If we can't acquire all the requested connections, we would adapt the number of threads to launch, with a minimum of 1.)
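The reservation step described above might look roughly like this: try to take up to a given number of connections from the shared pool, stop at the first miss, and size the worker pool from what was obtained, with a minimum of 1. LocalReservation and the tryAcquire supplier are hypothetical; with a JNDI DataSource, tryAcquire would typically wrap getConnection() and return Optional.empty() when the pool's wait expires. When nothing at all is reserved, the single remaining thread still needs a connection, so the caller has to decide whether to wait or fail; and the reserved connections must be closed when the process finishes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper: reserves up to `wanted` resources for one process run.
class LocalReservation<T> {
    final List<T> reserved;

    private LocalReservation(List<T> reserved) {
        this.reserved = Collections.unmodifiableList(reserved);
    }

    // tryAcquire should return Optional.empty() when the shared pool has
    // nothing available within its (short) wait time.
    static <T> LocalReservation<T> reserve(int wanted, Supplier<Optional<T>> tryAcquire) {
        List<T> got = new ArrayList<>();
        for (int i = 0; i < wanted; i++) {
            Optional<T> next = tryAcquire.get();
            if (!next.isPresent()) {
                break; // stop at the first miss instead of blocking
            }
            got.add(next.get());
        }
        return new LocalReservation<>(got);
    }

    // One worker thread per reserved resource, but never fewer than one.
    int threadCount() {
        return Math.max(1, reserved.size());
    }
}
```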
My colleagues approve of this idea, but I'm surprised I can't find any similar approaches or discussions on the web (maybe I'm not using the right keywords).
I'd like to know what you think about this idea, whether you've tried something similar, or whether I'm missing something important.
Thanks in advance.
You have not mentioned which connection pool is used. If it is not HikariCP and you are allowed to switch, I recommend it (I have contributed to it).
HikariCP does seem rather interesting; I'll have to look into it further. But this isn't directly related to the question :)
A quick follow-up: my idea works, with one caveat. I couldn't get rid of a downcast from Runnable to my own implementation, on which I call .setConnection() in the before-execute hook of my ExecutorService. Also, all tasks must be given to the executor with the execute() method; otherwise the Runnable is automatically wrapped in a FutureTask, with no way to access the inner Runnable. Does anyone know how to do this properly?
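One way around the FutureTask wrapping, offered as a sketch: ThreadPoolExecutor inherits the protected newTaskFor hooks from AbstractExecutorService, and submit() uses them to build the wrapper that later reaches beforeExecute. Overriding them to return a FutureTask subclass that remembers the original task lets beforeExecute recover it for both execute() and submit(). ResourceAware and InjectingExecutor are hypothetical names, and the resource type is generic, so R could be Connection.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.FutureTask;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RunnableFuture;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical interface for tasks that need a resource injected before running.
interface ResourceAware<R> {
    void setResource(R resource);
}

class InjectingExecutor<R> extends ThreadPoolExecutor {
    private final Supplier<R> resourceSupplier;

    InjectingExecutor(int threads, Supplier<R> resourceSupplier) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        this.resourceSupplier = resourceSupplier;
    }

    // FutureTask that keeps a reference to the task it wraps.
    private static final class TaskHolder<V> extends FutureTask<V> {
        final Object original;

        TaskHolder(Callable<V> c) { super(c); original = c; }
        TaskHolder(Runnable r, V value) { super(r, value); original = r; }
    }

    @Override
    protected <T> RunnableFuture<T> newTaskFor(Callable<T> callable) {
        return new TaskHolder<>(callable);
    }

    @Override
    protected <T> RunnableFuture<T> newTaskFor(Runnable runnable, T value) {
        return new TaskHolder<>(runnable, value);
    }

    @Override
    @SuppressWarnings("unchecked")
    protected void beforeExecute(Thread t, Runnable r) {
        super.beforeExecute(t, r);
        // execute() passes the task itself; submit() passes our TaskHolder.
        Object task = (r instanceof TaskHolder) ? ((TaskHolder<?>) r).original : r;
        if (task instanceof ResourceAware) {
            ((ResourceAware<R>) task).setResource(resourceSupplier.get());
        }
    }
}
```

The supplier is called once per task; releasing what it hands out (for a Connection, closing it) would go in afterExecute, using the same unwrapping, or in the task itself.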
Let's say we have a class that writes a log message to a database. This class is called from different parts of the code and executes the same INSERT statement again and again. It seems like a natural case for a PreparedStatement.
However, I'm wondering what the right way to use it is. Do I still get the benefit (such as the DBMS reusing the same execution plan each time) if I create a new PreparedStatement each time the method is called, or should I keep a PreparedStatement as a class member and never close it, so that I can reuse it?
Now, if the only way to benefit from the PreparedStatement in this scenario is to keep it open as a class member, can the same connection have several PreparedStatements (with different queries) open at the same time? What happens when two of these PreparedStatements are executed at the same time? Does the JDBC driver queue their execution?
Thanks in advance,
Dani.
As far as I know and have experienced, statements don't run in parallel on one connection. And as you correctly observed, PreparedStatements are bound to the Connection they were created on.
Since you probably don't want to synchronize your logging call (one insert at a time plus locking overhead), you'd have to keep connections reserved for this logging statement.
But having a dedicated pool for just one statement seems very wasteful, so you probably don't want that either.
So what options are left?
Prepare the statement for every insert. Since you'll have I/O to send the data to the DB anyway, the overhead of preparing is relatively small.
Prepare the statement inside your pool when creating a new connection, and build a Map<Connection, PreparedStatement> to look it up later. This makes creating new connections a bit slower but allows the statement to be reused.
Use some asynchronous mechanism (e.g. JMS) to queue your log messages, and do the inserts as a batch inside a message-driven bean or similar.
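The second option could look roughly like this: a thread-safe map from each connection to its already-prepared logging statement, created on first use. LogStatementCache and the app_log table are hypothetical. Two caveats: the statement is only safe while one thread at a time holds that connection, so this relies on exclusive checkout; and many pools hand out a fresh wrapper object per checkout, so the map should be keyed on the pool's underlying connection, which is why the answer suggests doing this inside the pool when each connection is created.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class LogStatementCache {
    // Hypothetical logging statement.
    static final String LOG_SQL = "INSERT INTO app_log (message) VALUES (?)";

    private final Map<Connection, PreparedStatement> byConnection = new ConcurrentHashMap<>();

    // Returns the statement prepared on this connection, preparing it once.
    PreparedStatement forConnection(Connection c) throws SQLException {
        PreparedStatement ps = byConnection.get(c);
        if (ps == null) {
            ps = c.prepareStatement(LOG_SQL);
            PreparedStatement raced = byConnection.putIfAbsent(c, ps);
            if (raced != null) { // another thread registered one first
                ps.close();
                ps = raced;
            }
        }
        return ps;
    }

    void log(Connection c, String message) throws SQLException {
        PreparedStatement ps = forConnection(c);
        ps.setString(1, message);
        ps.executeUpdate();
    }

    // Call when the pool physically closes a connection.
    void forget(Connection c) throws SQLException {
        PreparedStatement ps = byConnection.remove(c);
        if (ps != null) {
            ps.close();
        }
    }
}
```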
There are probably more options, but that's all I can think of right now.
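The third option moves the database call off the caller's thread. The answer suggests JMS and a message-driven bean; the sketch below substitutes a plain in-process BlockingQueue drained by one background thread, which keeps the same shape (enqueue now, insert in batches later) without a message broker, but unlike JMS it loses queued messages if the JVM dies. BatchingLogWriter and the batchSink callback are hypothetical; in a real logger the sink would call addBatch() and executeBatch() on a connection used only by the writer thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class BatchingLogWriter implements AutoCloseable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread writer;
    private volatile boolean running = true;

    BatchingLogWriter(int maxBatch, Consumer<List<String>> batchSink) {
        writer = new Thread(() -> {
            List<String> batch = new ArrayList<>();
            try {
                // Keep going until close() is called and the queue is empty.
                while (running || !queue.isEmpty()) {
                    String first = queue.poll(100, TimeUnit.MILLISECONDS);
                    if (first == null) {
                        continue;
                    }
                    batch.add(first);
                    queue.drainTo(batch, maxBatch - 1); // grab whatever else is waiting
                    batchSink.accept(new ArrayList<>(batch)); // e.g. one executeBatch()
                    batch.clear();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "log-writer");
        writer.start();
    }

    // Called from any thread; never touches the database.
    void log(String message) {
        queue.add(message);
    }

    // Lets the writer drain what is already queued, then waits for it to stop.
    @Override
    public void close() throws InterruptedException {
        running = false;
        writer.join();
    }
}
```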
Good luck with that.
What is the fastest way to call stored procedures in a multithreaded environment in Java? According to http://dev.mysql.com/doc/refman/5.1/en/connector-j-usagenotes-basic.html#connector-j-examples-preparecall Connection.prepareCall() is an expensive method. So what's the alternative to calling it in every thread, when synchronized access to a single CallableStatement is not an option?
Most JDBC drivers use only a single socket per connection; I think MySQL does too. That's why sharing one connection between multiple threads is a bad idea for performance.
If you use multiple connections across different threads, you need a CallableStatement for every connection, i.e. a CallableStatement pool per connection. The simplest way to pool them is to wrap the connection class and delegate all calls to the original; Eclipse can generate such a wrapper very quickly. In the wrapped prepareCall() method you can add a simple pool. You also need a wrapper class for the CallableStatement whose close() method returns the statement to the pool.
But first you should check whether the call is really expensive, because many drivers already have such a pool built in. Write a loop of prepareCall() and close() and measure the time.
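The measurement suggested here could be as simple as the following. PrepareCallTimer is a hypothetical helper, the call string is a placeholder, and early iterations include JIT warm-up and driver initialization, so run it a few times before trusting the figure.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

class PrepareCallTimer {
    // Average cost in microseconds of prepareCall() + close() on one connection.
    // Nothing is executed; only statement preparation is measured.
    static double averageMicros(Connection c, String callSql, int iterations) throws SQLException {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            try (CallableStatement cs = c.prepareCall(callSql)) {
                // prepared and immediately closed
            }
        }
        return (System.nanoTime() - start) / 1_000.0 / iterations;
    }
}
```

For example, averageMicros(conn, "{call my_proc(?)}", 1000), where my_proc stands in for your procedure. Comparing that number with the procedure's own execution time shows whether prepareCall() matters at all.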
Connection is not thread safe, so you can't share it across threads.
When you call prepareCall, the JDBC driver may be telling the RDBMS to do a lot of work that is stored on the server side. You may be guilty of premature optimization here.
After giving this some thought, it seems that if you are having issues with this infrastructure code, your problems are elsewhere. Most applications do not spend an inordinate amount of time on this.
Make sure you are using a DataSource; most do connection caching, and some even cache statements.
Also, for this to be a performance bottleneck, you would have to be running many queries one after another, or your connection pool would have to be too small. Maybe you should benchmark your code to see how much time the stored procedure takes versus the JDBC code.
Of course, I would follow the MySQL recommendation of using CallableStatement; I'm sure they have benchmarked it. Most apps do not share anything between threads, and it is rarely an issue.