I have run into several stuck JDBC connections in my code due to poor network health. I am planning to use the java.sql.Connection.setNetworkTimeout method. As per the docs:
Sets the maximum period a Connection or objects created from the Connection will wait for the database to reply to any one request
Now, what exactly is the request here? My query takes a really long time to respond and even longer to process (I am using a JDBC interface to a big data DB). Do I need to keep this timeout larger than the expected query execution time (to prevent a false trigger), or are keep-alive messages exchanged to track the network connection, in which case I would keep it really low?
If your NetworkTimeout is smaller than the QueryTimeout, the query will be terminated on your side: the thread that waits for the DB to reply (notice that setNetworkTimeout has an Executor executor parameter) will be interrupted. Depending on the underlying implementation, NetworkTimeout may cancel the query on the DB side as well.
If NetworkTimeout > QueryTimeout and the query completes within the QueryTimeout, then nothing bad should happen. If the problems you experience occur exactly in this case, you should work on the OS-level settings for keeping TCP connections alive so that no firewall terminates them too soon.
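A minimal sketch of how the two timeouts can be combined so that QueryTimeout stays below NetworkTimeout. The JDBC URL, credentials and query are placeholders, not from the question, and the values would need to reflect how long your big-data queries are actually expected to run:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TimeoutDemo {
    public static void main(String[] args) throws Exception {
        // Executor the driver uses to enforce the network timeout
        ExecutorService executor = Executors.newSingleThreadExecutor();
        // Placeholder URL and credentials
        try (Connection conn = DriverManager.getConnection(
                "jdbc:somedb://host:1234/db", "user", "secret")) {
            // NetworkTimeout (milliseconds): upper bound on waiting for any single
            // reply on the socket. Keep it above the longest expected round trip.
            conn.setNetworkTimeout(executor, 10 * 60 * 1000);   // 10 minutes
            try (PreparedStatement ps =
                         conn.prepareStatement("SELECT col FROM big_table")) { // placeholder query
                // QueryTimeout (seconds): how long one statement may run. Keeping it
                // below the network timeout lets a slow query be cancelled cleanly
                // instead of the whole connection being torn down.
                ps.setQueryTimeout(5 * 60);                     // 5 minutes
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // process rows
                    }
                }
            }
        } finally {
            executor.shutdown();
        }
    }
}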
When it comes to keeping TCP connections alive, it is usually more a matter of OS-level settings than of the application itself. You can read more about it (for Linux) here.
Related
I have a simple Spring Boot based web app that downloads data from several APIs. Some of them don't respond in time, since my connectionTimeout is set to about 4 seconds.
As soon as I get rid of the connectionTimeout setting, I get exceptions after 20 or so seconds.
So, my question is: for how long am I able to try to connect to an API, and what does it depend on? Where do those 20 seconds come from? What if an API responds after 40 minutes and I'm not able to catch that specific moment, so I just lose the data? I don't want that to happen. What are my options?
Here's the code to set the connection. Nothing special.
HttpComponentsClientHttpRequestFactory clientHttpRequestFactory =
        new HttpComponentsClientHttpRequestFactory(HttpClientBuilder.create().build());
clientHttpRequestFactory.setConnectTimeout(4000);
RestTemplate restTemplate = new RestTemplate(clientHttpRequestFactory);
Then I retrieve the values via:
myObject.setJsonString(restTemplate.getForObject(url, String.class));
Try increasing your timeout. 4 seconds is too little.
The call needs to connect, formulate the data, and return. Four seconds is just for connecting; by the time the API attempts to return anything, your application has already disconnected.
Set it to 20 seconds to test it. You can set it much longer to give the API enough time to complete. This does not mean your app will use up all of the timeout; it will finish as soon as a result is returned. Also, APIs are not designed to take long; they will perform the task and return the result as fast as possible.
Connection timeout means that your program couldn't connect to the server at all within the time specified.
The timeout can be configured, as, like you say, some systems may take a longer time to connect to, and if this is known in advance, it can be allowed for. Otherwise the timeout serves as a guard to prevent the application from waiting forever, which in most cases doesn't really give a good user experience.
A separate timeout can normally be configured for reading data (socket timeout). They are not inclusive of each other.
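For reference, a hedged sketch of how both timeouts might be set on the factory used in the question. The values are arbitrary examples, not recommendations, and this assumes a Spring version where setReadTimeout is still available on this factory (as implied by the HttpClientBuilder usage above):

import org.apache.http.impl.client.HttpClientBuilder;
import org.springframework.http.client.HttpComponentsClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;

public class RestClientConfig {

    public static RestTemplate buildRestTemplate() {
        HttpComponentsClientHttpRequestFactory factory =
                new HttpComponentsClientHttpRequestFactory(HttpClientBuilder.create().build());
        factory.setConnectTimeout(10_000); // max time to establish the TCP connection (ms)
        factory.setReadTimeout(60_000);    // max time to wait for data once connected (ms)
        return new RestTemplate(factory);
    }
}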
To solve your problem:
Check that the server is running and accepting incoming connections.
You might want to use curl or, depending on what it is, simply your browser to try to connect.
If one tool can connect but the other can't, check your firewall settings and ensure that outgoing connections from your Java program are permitted. The easiest way to test whether this is the problem is to temporarily disable antivirus and firewall tools. If this allows the connection, you'll either need to leave the firewall off or, better, add a corresponding exception.
Leave the timeout on a higher setting (or try setting it to 0, which is interpreted as infinite) while testing. Once you have it working, you can consider tweaking it to reflect your server spec and usability requirements.
Edit:
I realised that this doesn't necessarily help, as you did ultimately connect. I'll leave the above standing as general info.
for how long am I able to try to connect to an API and what does it depend on?
Most likely the server that the API is hosted on. If it is overloaded, response time may lengthen.
Where do those 20 seconds come from?
Again this depends on the API server. It might be random, or it may be processing each request for a fixed period of time before finding itself in an error state. In this case that might take 20 seconds each time.
What if an API responds after 40 minutes and I'm not able to catch that specific moment, so I just lose the data? I don't want that to happen. What are my options?
Use a more reliable API, possibly paying for a service guarantee.
Tweak your connection and socket timeouts to allow for the capabilities of the server side, if known in advance.
If the response really takes 40 minutes, it is a really poor service, but moving on with that assumption: if the dataset is that large, explore whether the API offers a streaming callback, whereby you pass an OutputStream into the API's library methods, to which it will (asynchronously) write the response when it is ready (see the streaming sketch after this list).
Keep in mind that connection and socket timeout are separate things. Once you have connected, the connection timeout becomes irrelevant (socket is established). As long as you begin to receive and continue to receive data (packet to packet) within the socket timeout, the socket timeout won't be triggered either.
Use infinite timeouts (set to 0), but this could lead to poor usability within your applications, as well as resource leaks if a server is in fact offline and will never respond. In that case you will be left with dangling connections.
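As a rough illustration of the streaming option mentioned above: this sketch uses RestTemplate.execute with a ResponseExtractor to copy the response body straight to disk instead of buffering it in the String returned by getForObject. The URL and file name are placeholders, not from the original question:

import java.io.FileOutputStream;
import java.io.OutputStream;
import org.springframework.http.HttpMethod;
import org.springframework.http.client.ClientHttpResponse;
import org.springframework.util.StreamUtils;
import org.springframework.web.client.RestTemplate;

public class StreamingDownload {

    // Streams the response body to a file as it arrives, so a very large
    // (or very slow) response never has to be held in memory.
    public static void download(RestTemplate restTemplate, String url, String targetFile) {
        restTemplate.execute(url, HttpMethod.GET, null, (ClientHttpResponse response) -> {
            try (OutputStream out = new FileOutputStream(targetFile)) {
                StreamUtils.copy(response.getBody(), out);
            }
            return null;
        });
    }
}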
The default and maximum have nothing to do with the server. They depend on the client platform, but the limit is around a minute. You can decrease it, but not increase it. Four seconds is far too short; it should be measured in tens of seconds in most circumstances.
And absent or longer connection timeouts do not cause server errors of any kind. You are barking up the wrong tree here.
I have a production scenario for a web app where, when a form is submitted, the data gets stored in 3 tables in an Oracle DB through JDBC. Sometimes I see connection timeout errors in the logs while the app is trying to connect to the Oracle DB through Java code. This is intermittent.
Below is the exception:
SQL exception while storing data in table
java.sql.SQLRecoverableException: IO Error: Connection timed out
Most of the time the web app is able to connect to the database and insert values, but sometimes I get this timeout error and am unable to insert data. I am not sure why I am getting this intermittent issue. When I checked the connection pool config in my application, I noticed the following:
Pool Size (Maximum number of Connections that this pool can open) : 10
Pool wait (Maximum wait time, in milliseconds, before throwing an Exception if all pooled Connections are in use) : 1000
Since the pool size is just 10, if there are multiple users trying to connect to the database, could this connection timeout issue occur?
Also, since there are 3 tables where the data insertion occurs, we do the whole insertion on just one connection. We are not opening a separate DB connection for each individual table.
NOTE: This application is deployed on an AEM (content management system) server, and the connection pool config is provided by it.
Update: I tried setting the validation query in the connection pool, but I am still getting the connection timeout error. I am not sure whether the connection pool has actually used the validation query. I have attached the connection pool config above for reference.
I would try two things:
Try setting a validation query so that each time the pool leases a connection, you're sure it's actually available; select 1 from dual should work. On recent JDBC drivers that should not be required, but you might give it a go (see the sketch at the end of this answer).
Estimate the concurrency of your form. A 10-connection pool is not necessarily too small; it depends on the complexity of your work on the DB. It seems you're saving a form, so it should not be THAT complex. How many users per day do you expect? Then, at peak time, how many users do you expect to be using the form at the same time? A 10-connection pool usually leases and returns connections quite fast, so it can handle several transactions per second. If you expect more, increase the size slightly (more than 25-30 connections can actually degrade DB performance as more queries compete for resources).
If nothing seems to work, it would be good to check what's happening on your DB. If possible, use Enterprise Manager to see whether there is latch contention while working on those three tables.
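If you cannot touch the pool configuration (as appears to be the case with the AEM-provided pool), a hedged workaround is to double-check a leased connection yourself before using it. This sketch assumes a JDBC 4.0+ driver where Connection.isValid() performs a real ping; the helper itself is hypothetical:

import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

public class ConnectionHelper {

    // Lease a connection and verify it before handing it to the caller.
    public static Connection leaseCheckedConnection(DataSource dataSource) throws SQLException {
        Connection conn = dataSource.getConnection();
        if (!conn.isValid(2)) {                 // 2-second timeout for the check
            conn.close();                       // give the dead connection back to the pool
            conn = dataSource.getConnection();  // and try once more
        }
        return conn;
    }
}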
I give this answer from a programming point of view. There are multiple possibilities for this problem; they are listed below, each with an appropriate solution. A connection timeout means your new thread did not get database access within the configured time, which can be due to:
Possibility I: Connections are not being closed; there is a connection leak somewhere in your application.
Solution:
You need to check for this leakage and make sure every connection is closed after use.
Possibility II: The transaction is too big.
Solution:
i. Are these insertions synchronized? If so, use synchronization very carefully: use it at block level, not method level, and keep the synchronized block as small as possible.
What happens is that with a big synchronized block, a connection is handed out but sits waiting because the synchronized block takes too long to execute, so the waiting time of the other threads increases. Suppose we have 100 users, each with a thread performing that operation: the 1st is executing and takes too long while the others wait, so the 80th, 90th, etc. thread may time out. That is why only some threads hit this issue.
So you need to reduce the size of the synchronized block.
ii. Also check whether the transaction is big; if so, try to cut it into smaller ones if possible.
For example: one small transaction for the first insertion, another for the second, and so on; the small transactions together complete the operation.
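A rough sketch of that idea, committing each insert separately. The table names, columns and DataSource are invented for illustration, and note that splitting the work this way trades atomicity for shorter transactions:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

public class FormDao {

    // Each insert is committed on its own instead of holding one long transaction open.
    public void saveForm(DataSource dataSource, long formId, String a, String b, String c) throws SQLException {
        try (Connection conn = dataSource.getConnection()) {
            conn.setAutoCommit(false);

            insert(conn, "INSERT INTO table_one (form_id, val) VALUES (?, ?)", formId, a);
            conn.commit(); // small transaction #1

            insert(conn, "INSERT INTO table_two (form_id, val) VALUES (?, ?)", formId, b);
            conn.commit(); // small transaction #2

            insert(conn, "INSERT INTO table_three (form_id, val) VALUES (?, ?)", formId, c);
            conn.commit(); // small transaction #3
        }
    }

    private void insert(Connection conn, String sql, long id, String val) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, id);
            ps.setString(2, val);
            ps.executeUpdate();
        }
    }
}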
Possibility III: The pool size is not enough if the application's usage is too high.
Solution:
Increase the pool size. (This applies only if you properly close all connections after use.)
You can use the Java Executor service in this case: one thread, one connection, all asynchronous. Once the transaction completes, release the connection back to the pool. That way you can get rid of this timeout issue.
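Here is a hedged sketch of that pattern; the DataSource, pool size and insert logic are placeholders:

import java.sql.Connection;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.sql.DataSource;

public class AsyncFormSaver {

    private final ExecutorService executor = Executors.newFixedThreadPool(10); // roughly match the pool size
    private final DataSource dataSource;

    public AsyncFormSaver(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // One task, one connection: each submitted task leases its own connection, and
    // try-with-resources returns it to the pool as soon as the work is done.
    public void submitInsert(long formId) {
        executor.submit(() -> {
            try (Connection conn = dataSource.getConnection()) {
                conn.setAutoCommit(false);
                // ... perform the three inserts for formId on this connection ...
                conn.commit();
            } catch (Exception e) {
                e.printStackTrace(); // replace with real logging/handling
            }
        });
    }
}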
If one connection is inserting the data into 3 tables while other threads trying to get a connection are waiting, a timeout is bound to happen.
I'm trying to better understand what will happen if multiple threads try to execute different SQL queries, using the same JDBC connection, concurrently.
Will the outcome be functionally correct?
What are the performance implications?
Will thread A have to wait for thread B to be completely done with its query?
Or will thread A be able to send its query immediately after thread B has sent its query, after which the database will execute both queries in parallel?
I see that Apache DBCP uses synchronization protocols to ensure that connections obtained from the pool are removed from the pool, and made unavailable, until they are closed. This seems more inconvenient than it needs to be. I'm thinking of building my own "pool" simply by creating a static list of open connections and distributing them in a round-robin manner.
I don't mind the occasional performance degradation, and the convenience of not having to close the connection after every use seems very appealing. Is there any downside to me doing this?
I ran the following set of tests using an AWS RDS Postgres database and Java 11:
Create a table with 11M rows, each row containing a single TEXT column, populated with a random 100-char string
Pick a random 5 character string, and search for partial-matches of this string, in the above table
Time how long the above query takes to return results. In my case, it takes ~23 seconds. Because very few results are returned, we can conclude that the majority of these 23 seconds is spent waiting for the DB to run the full table scan, not on sending the request/response packets.
Run multiple queries in parallel (with different keywords), using different connections. In my case, I see that they all complete in ~23 seconds, i.e. the queries are being efficiently parallelized.
Run multiple queries on parallel threads, using the same connection. I now see that the first result comes back in ~23 seconds, the second in ~46 seconds, the third in ~1 minute, and so on. All the results are functionally correct, in that they match the specific keyword queried by that thread.
To add on to what Joni mentioned earlier, his conclusion matches the behavior I'm seeing on Postgres as well. It appears that all "correctness" is preserved, but all parallelism benefits are lost, if multiple queries are sent on the same connection at the same time.
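For reference, a rough sketch of what such a test can look like. The connection URL, credentials, table and keywords are placeholders; run it once with the shared connection as shown, then again creating a new Connection inside each task, and compare the reported times:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedConnectionTest {

    public static void main(String[] args) throws Exception {
        // One connection deliberately shared by all worker threads
        Connection shared = DriverManager.getConnection(
                "jdbc:postgresql://host:5432/testdb", "user", "secret");
        ExecutorService workers = Executors.newFixedThreadPool(3);

        for (String keyword : new String[] {"abcde", "fghij", "klmno"}) {
            workers.submit(() -> {
                long start = System.currentTimeMillis();
                try (PreparedStatement ps = shared.prepareStatement(
                        "SELECT count(*) FROM test_table WHERE data LIKE ?")) {
                    ps.setString(1, "%" + keyword + "%");
                    try (ResultSet rs = ps.executeQuery()) {
                        rs.next();
                        System.out.printf("%s: %d matches in %d ms%n",
                                keyword, rs.getLong(1), System.currentTimeMillis() - start);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }

        workers.shutdown();
        workers.awaitTermination(10, TimeUnit.MINUTES);
        shared.close();
    }
}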
Since the JDBC spec doesn't give guarantees of concurrent execution, this question can only be answered by testing the drivers you're interested in, or reading their source code.
In the case of MySQL Connector/J, all methods to execute statements lock the connection with a synchronized block. That is, if one thread is running a query, other threads using the connection will be blocked until it finishes.
Doing things the wrong way will have undefined results... if someone runs some tests, maybe they'll answer all your questions exactly, but then a new JVM comes out, or someone tries it on another JDBC driver or database version, or they hit a different set of race conditions, or tries another platform or JVM implementation, and a different undefined result happens.
If two threads modify the same state at the same time, anything could happen depending on the timing. Maybe the 2nd one overwrites the first's query, and then both run the same query. Maybe the library will detect your error and throw an exception. I don't know and wouldn't bother testing... (or maybe someone already knows or it should be obvious what would happen) so this isn't "the answer", but just some advice. Just use a connection pool, or use a synchronized block to ensure problems don't happen.
We had to disable the statement cache on WebSphere because it was throwing ArrayIndexOutOfBoundsException at the PreparedStatement level.
The issue was that some guy thought it was smart to share a connection with multiple threads.
He said it was to save connections, but there is no point in multithreading queries over one connection because the DB won't run them in parallel.
There was also an issue with Java runnables that were blocking each other because they used the same connection.
So that's just something not to do; there is nothing to gain.
There is an option in WebSphere to detect this multithreaded access.
I implemented my own since we use Jetty in development.
I'm having an issue with the JDBC connection pool on GlassFish handing out dead database connections. I'm running GlassFish 3.1.2.2 using jconn3 (com.sybase.jdbc3) to connect to Sybase 12.5. Our organization has a nightly reboot process during which we restart the Sybase server. My issue manifests itself when an attempt is made to use a database connection during the reboot. Here is the order of operations that produces the issue:
Sybase is down for restart.
Connection is requested from the pool.
DB operation fails as expected.
Connection is returned to the pool in a closed state.
Sybase is back up.
Connection is requested from the pool.
DB operation fails due to "Connection is already closed" exception.
Connection is returned to the pool.
I've implemented a database recovery singleton that attempts to recover from this scenario. Any time a database exception occurs, I make a JMX call to pause all queues and execute a flushConnectionPool operation on the JDBC connection pool. If the database is still not up, the process sets up a timer to retry in 10 minutes. While this process works, it's not without flaws.
I realize there's a setting on the pool so that you can require validation of the database connection prior to handing it out, but I've shied away from this for performance reasons. My process performs approximately 5 million database transactions a day.
My question is, does anyone know of a way to avoid returning a dead connection back to the pool in the first place?
You've pretty well summed up your options. We had that problem, the midnight DB going down. For us, we turned on connection validation, but we don't have your transaction volume.
Glassfish offers a custom validation option, with which a class can be specified to do the validation.
By default, all that the classes provided by GlassFish do (you'll see them offered as options in the console) is issue a SQL statement like this:
SELECT 1;
The syntax varies a bit between databases; SQL Server uses '1', whereas Postgres just uses 1. But the intent is the same.
The net is that it will cost you an extra DB hit every time you try to get a connection, but it's a really, really cheap hit. But still, it's a hit.
But you could implement your own version. It could do the check, say, every 10th request, or even less frequently. Roll a random number from 1 to N (N = 10, 20, 100...); if you get a '1', do the select (and fail if it fails), otherwise return "true". At the same time, configure it so that if you do detect an error, you purge the entire pool. Clearly, tweak this so you have a good chance of the check running when your DB goes down at night (I don't know how busy your system is at night) versus during peak processing.
You could even "lower the odds" during peak processing. "if time between 6am and 6pm then odds = 1000 else odds = 100; if (random(odds) == 1) { do select... }"
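A sketch of what that could look like as a custom validation class. This assumes the org.glassfish.api.jdbc.ConnectionValidation SPI available in GlassFish 3.1.x; verify the interface name for your server version, and treat the odds, the probe query and the peak-hours window as placeholders:

import java.sql.Connection;
import java.sql.Statement;
import java.util.Calendar;
import java.util.Random;

import org.glassfish.api.jdbc.ConnectionValidation;

public class SampledConnectionValidation implements ConnectionValidation {

    private static final Random RANDOM = new Random(); // java.util.Random is thread-safe

    @Override
    public boolean isConnectionValid(Connection connection) {
        // Validate only a random sample of leases; check less often during peak hours.
        int odds = isPeakHours() ? 1000 : 100;
        if (RANDOM.nextInt(odds) != 0) {
            return true;                       // skip the round trip most of the time
        }
        try (Statement st = connection.createStatement()) {
            st.execute("SELECT 1");            // adjust the probe syntax for Sybase if needed
            return true;
        } catch (Exception e) {
            return false;                      // signals GlassFish that the connection is dead
        }
    }

    private boolean isPeakHours() {
        int hour = Calendar.getInstance().get(Calendar.HOUR_OF_DAY);
        return hour >= 6 && hour < 18;         // placeholder window: 6am-6pm
    }
}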
A random option removes the need to maintain a thread safe counter.
In the end, it doesn't really matter, you just want a timely note that the DB is down so you can ask GF to abort the pool.
I can definitely see it thrashing a bit at the very beginning as the DB comes up, possibly refreshing the pool more than once, but that should be harmless.
Different ways you could play with that, but that's an avenue to consider.
So I have a Java process that runs indefinitely as a TCP server (receives messages from another process, and has onMsg handlers).
One of the things I want to do with the message in my Java program is to write it to disk using a database connection to Postgres. Right now, I have a single static connection object which I use every time a message comes in. I do NOT close and reopen the connection for each message.
I am still a bit new to Java. I wanted to know 1) whether there are any pitfalls or dangers in keeping one connection object open indefinitely, and 2) whether there are performance benefits to never closing the connection, as opposed to opening/closing it every time I want to hit the database.
Thanks for the help!
I do NOT close and reopen the connection for each message.
Yes you do... at least as far as the plain Connection object is concerned. Otherwise, if you ever end up with a broken connection, it'll be broken forever, and if you ever need to perform multiple operations concurrently, you'll have problems.
What you want is a connection pool to manage the "real" connections to the database, and you just ask for a connection from the pool for each operation and close it when you're done with it. Closing the "logical" connection just returns the "real" connection to the pool for another operation. (The pool can handle keeping the connection alive with a heartbeat, retiring connections over time etc.)
There are lots of connection pool technologies available, and it's been a long time since I've used "plain" JDBC so I wouldn't like to say where the state of the art is at the moment - but that's research you can do for yourself :)
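A minimal sketch of the borrow-use-close pattern with a pool-backed DataSource. The table and column are invented for illustration, and which pool library provides the DataSource is up to you:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

public class MessageWriter {

    private final DataSource dataSource; // backed by a pool (HikariCP, DBCP, etc.)

    public MessageWriter(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Per-message pattern: borrow a connection, use it, close() it.
    // close() on a pooled connection just returns it to the pool.
    public void store(String message) throws SQLException {
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO messages (body) VALUES (?)")) { // hypothetical table
            ps.setString(1, message);
            ps.executeUpdate();
        }
    }
}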
Creating a database connection is always a performance hit. Only a very naive implementation would create and close a connection for each operation. If you only needed to do something once an hour, then it would be acceptable.
However, if you have a program that performs several database accesses per minute (or even per second for larger apps), you don't want to actually close the connection.
So when do you close the connection? Easy answer: let a connection pool handle that for you. You ask the pool for a connection, it'll give you an open connection (that it either has cached, or if it really needs to, a brand new connection). When you're finished with your queries, you close() the connection, but it actually just returns the connection to the pool.
For very simple programs setting up a connection pool might be extra work, but it's not very difficult and definitely something you'll want to get the hang of. There are several open source connection pools, such as DBCP from Apache and C3P0.
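For example, a minimal sketch with Apache Commons DBCP 2; the URL, credentials and sizing are placeholders to adapt to your Postgres setup:

import org.apache.commons.dbcp2.BasicDataSource;

public class PoolSetup {

    public static BasicDataSource createDataSource() {
        BasicDataSource ds = new BasicDataSource();
        ds.setUrl("jdbc:postgresql://localhost:5432/mydb");
        ds.setUsername("user");
        ds.setPassword("secret");
        ds.setInitialSize(2);          // connections opened at startup
        ds.setMaxTotal(10);            // upper bound on open connections
        ds.setTestOnBorrow(true);      // validate before handing a connection out
        ds.setValidationQuery("SELECT 1");
        return ds;
    }
}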