I have a simple Spring Boot based web app that downloads data from several APIs. Some of them don't respond in time, since my connectionTimeout is set to about 4 seconds.
As soon as I remove the connectionTimeout setting, I get an exception after 20 seconds or so.
So, my question is, for how long am I able to try to connect to an API and what does it depend on? Where do those 20 seconds come from? What if an API responds after 40 minutes of time and I won't be able to catch that specific moment and just gonna lose data. I don't want that to happen. What are my options?
Here's the code to set the connection. Nothing special.
HttpComponentsClientHttpRequestFactory clientHttpRequestFactory = new HttpComponentsClientHttpRequestFactory(HttpClientBuilder.create().build());
clientHttpRequestFactory.setConnectTimeout(4000);
RestTemplate restTemplate = new RestTemplate(clientHttpRequestFactory);
Then I retrieve the values via:
myObject.setJsonString(restTemplate.getForObject(url, String.class));
Try increasing your timeout. 4 seconds is too little.
It will need to connect, formulate data and return. So 4 seconds is just for connecting, by the time it attempts to return anything, your application has already disconnected.
Set it to 20 seconds to test it. You can set it much longer to give the API enough time to complete. This does not mean your app will use up the whole timeout; it finishes as soon as a result is returned. Also, APIs are not designed to take long. They perform the task and return the result as fast as possible.
Connection timeout means that your program couldn't connect to the server at all within the time specified.
The timeout can be configured, as, like you say, some systems may take a longer time to connect to, and if this is known in advance, it can be allowed for. Otherwise the timeout serves as a guard to prevent the application from waiting forever, which in most cases doesn't really give a good user experience.
A separate timeout can normally be configured for reading data (socket timeout). They are not inclusive of each other.
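To make the difference concrete, here is a minimal sketch using a plain java.net.Socket rather than RestTemplate. The names TimeoutDemo, probe and demo are made up for illustration. The demo server accepts the TCP connection but never sends anything, so the connect timeout is satisfied and only the read (socket) timeout can fire.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class TimeoutDemo {
    /**
     * Connects to host:port, then waits for a single byte.
     * Returns "connect-timeout", "read-timeout", "data" or "closed".
     */
    static String probe(String host, int port, int connectTimeoutMs, int readTimeoutMs) {
        try (Socket socket = new Socket()) {
            try {
                // Connect timeout: limits only the time spent establishing the connection.
                socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            } catch (SocketTimeoutException e) {
                return "connect-timeout";
            }
            // Read timeout: limits each blocking read, i.e. the longest silent gap.
            // A value of 0 means wait forever.
            socket.setSoTimeout(readTimeoutMs);
            try {
                return socket.getInputStream().read() >= 0 ? "data" : "closed";
            } catch (SocketTimeoutException e) {
                return "read-timeout";
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs probe() against a local server that accepts connections but stays silent. */
    static String demo() {
        try (ServerSocket silent = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            return probe(silent.getInetAddress().getHostAddress(), silent.getLocalPort(), 4000, 500);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Here the connection is established almost immediately, well inside the 4 second connect timeout, and the probe should report read-timeout after roughly half a second of silence.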
To solve your problem:
Check that the server is running and accepting incoming connections.
You might want to use curl or, depending on what it is, simply your browser to try to connect.
If one tool can connect but the other can't, check your firewall settings and ensure that outgoing connections from your Java program are permitted. The easiest way to test whether this is the problem is to disable antivirus and firewall tools temporarily. If this allows the connection, you'll either need to leave the firewall off or, better, add a corresponding exception.
Leave the timeout on a higher setting (or try setting it to 0, which is interpreted as infinite) while testing. Once you have it working, you can consider tweaking it to reflect your server spec and usability requirements.
Edit:
I realised that this doesn't necessarily help, as you did ultimately connect. I'll leave the above standing as general info.
for how long am I able to try to connect to an API and what does it depend on?
Most likely the server that the API is hosted on. If it is overloaded, response time may lengthen.
Where do those 20 seconds come from?
Again this depends on the API server. It might be random, or it may be processing each request for a fixed period of time before finding itself in an error state. In this case that might take 20 seconds each time.
What if an API responds after 40 minutes of time and I won't be able to catch that specific moment and just gonna lose data. I don't want that to happen. What are my options?
Use a more reliable API, possibly paying for a service guarantee.
Tweak your connection and socket timeouts to allow for the capabilities of the server side, if known in advance.
If the response is really 40 minutes, it is a really poor service, but moving on with that assumption - if the dataset is that large, explore whether the API offers a streaming callback, whereby you pass in an OutputStream into the API's library methods, to which it will (asynchronously) write the response when it is ready.
Keep in mind that connection and socket timeout are separate things. Once you have connected, the connection timeout becomes irrelevant (socket is established). As long as you begin to receive and continue to receive data (packet to packet) within the socket timeout, the socket timeout won't be triggered either.
Use infinite timeouts (set to 0), but this could lead to poor usability within your applications, as well as resource leaks if a server is in fact offline and will never respond. In that case you will be left with dangling connections.
The default and maximum connect timeout have nothing to do with the server. They depend on the client platform, but are around a minute. You can decrease the timeout, but not increase it. Four seconds is far too short. It should be measured in tens of seconds in most circumstances.
And absent or longer connection timeouts do not cause server errors of any kind. You are barking up the wrong tree here.
The setup:
We have an https://Main.externaldomain/xmlservlet site, which authenticates, validates, geo-locates and proxies (slightly modified) requests to, for example, http://London04.internaldomain/xmlservlet.
There's no direct access to internaldomain exposed to end-users at all. The communication between the sites gets occasionally interrupted and sometimes the internaldomain nodes become unavailable/dead.
The Main site is using org.apache.http.impl.client.DefaultHttpClient (I know it's deprecated, we're gradually upgrading this legacy code) with readTimeout set to 10,000 milliseconds.
The request and response have xml payload/body of variable length and the Transfer-Encoding: chunked is used, also the Keep-Alive: timeout=15 is used.
The problem:
Sometimes London04 actually needs more than 10 seconds (let's say 2 minutes) to execute. Sometimes it non-gracefully crashes. Sometimes other (networking) issues happen.
Sometimes during those 2 minutes the portions of the response XML arrive gradually enough that there is never a 10-second gap between them, so the readTimeout is never exceeded;
sometimes there's a gap of 10+ seconds and HttpClient times out...
We could try to increase the timeout on Main side, but that would easily bloat/overload the listener pool (just by regular traffic, not even being DDOSed yet).
We need a way to distinguish between internal-site-still-working-on-generating-the-response and the cases where it really crashed/network_lost/etc.
The best approach seems to be some kind of heartbeat (every 5 seconds) during the communication.
We thought Keep-Alive would save us, but it seems to cover only the gaps between requests (not the time during a request), and it does not seem to do any heartbeating during the gap (it just waits for the timeout).
We thought chunked encoding might save us by sending some heartbeat (0-byte chunks) to keep the other side informed, but there seems to be no default implementation supporting a heartbeat this way, and moreover a 0-byte chunk is itself the end-of-data indicator...
Question(s):
If we're correct in our assumption that Keep-Alive/chunked encoding won't help us achieve the keptAlive/heartbeat/fastDetectionOfDeadBackend, then:
1) at which layer should such a heartbeat be implemented? HTTP? TCP?
2) is there any standard framework/library/setting/etc. implementing it already? (if possible: Java, REST)
UPDATE
I've also looked into heartbeat-implementers for WADL/WSDL, though found none for REST, checked out the WebSockets...
I also looked into TCP keepalives, which seem to be the right feature for the task:
https://en.wikipedia.org/wiki/Keepalive
http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html
Socket heartbeat vs keepalive
WebSockets ping/pong, why not TCP keepalive?
BUT according to those I'd have to set up something like:
tcp_keepalive_time=5
tcp_keepalive_intvl=1
tcp_keepalive_probes=3
which seems to be a counter-recommendation (2h is the recommended, 10min already presented as an odd value, is going to 5s sane/safe?? if it is - might be my solution upfront...)
also where should I configure this? on London04 alone or on Main too? (if I set it up on Main - won't it flood client-->Main frontend communication? or might the NATs/etc between sites ruin the keepalive intent/support easily?)
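For reference, this is a sketch of what the per-connection equivalent of those three settings might look like in Java, assuming Java 11+ where jdk.net.ExtendedSocketOptions exists. Not every platform supports these options, hence the check. KeepAliveConfig and its method names are made up for illustration. It only covers sockets that our own code opens, so it doesn't answer the NAT question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketOption;
import java.util.Set;
import jdk.net.ExtendedSocketOptions;

class KeepAliveConfig {
    /**
     * Turns on SO_KEEPALIVE and, where the platform supports it, sets the probe
     * timing for this socket only. Returns true if the timing was applied,
     * false if the OS-wide defaults remain in effect.
     */
    static boolean enableKeepAlive(Socket socket, int idleSeconds, int intervalSeconds, int probes) {
        try {
            socket.setKeepAlive(true);
            Set<SocketOption<?>> supported = socket.supportedOptions();
            if (!supported.contains(ExtendedSocketOptions.TCP_KEEPIDLE)
                    || !supported.contains(ExtendedSocketOptions.TCP_KEEPINTERVAL)
                    || !supported.contains(ExtendedSocketOptions.TCP_KEEPCOUNT)) {
                return false;
            }
            socket.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, idleSeconds);         // like tcp_keepalive_time
            socket.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, intervalSeconds); // like tcp_keepalive_intvl
            socket.setOption(ExtendedSocketOptions.TCP_KEEPCOUNT, probes);             // like tcp_keepalive_probes
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Applies the 5/1/3 values to a loopback connection and reports whether keepalive is on. */
    static boolean demo() {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
             Socket socket = new Socket(server.getInetAddress(), server.getLocalPort())) {
            enableKeepAlive(socket, 5, 1, 3);
            return socket.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("keepalive enabled: " + demo());
    }
}
```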
P.S. any link to an RTFM is welcome - I might just be missing something obvious :)
My advice would be don't use a heartbeat. Have your external-facing API return a 303 See Other with headers that indicates when and where the desired response might be available.
So you might call:
POST https://public.api/my/call
and get back
303 See Other
Location: "https://public.api/my/call/results"
Retry-After: 10
To the extent your server can guess how long a response will take to build, it should factor that into the Retry-After value. If a later GET call is made to the new location and the results are not yet done being built, return a response with an updated Retry-After value. So maybe you try 10, and if that doesn't work, you tell the client to wait another 110, which would be two minutes in total.
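A minimal client-side sketch of that polling loop follows. The HTTP call itself is abstracted behind a function so the example stays self-contained; RetryAfterPoller, Reply and the parameter names are made up for illustration, and a real client would parse the Retry-After header from the actual response.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.function.LongConsumer;

class RetryAfterPoller {
    /** Hypothetical minimal view of a GET reply: Retry-After in seconds, or null once the result is ready. */
    static final class Reply {
        final Long retryAfterSeconds;
        final String body;

        Reply(Long retryAfterSeconds, String body) {
            this.retryAfterSeconds = retryAfterSeconds;
            this.body = body;
        }
    }

    /**
     * Waits for the Retry-After value from the initial 303, then GETs the Location
     * until a reply arrives without Retry-After, giving up after maxAttempts GETs.
     */
    static String poll(long initialRetryAfterSeconds, String location,
                       Function<String, Reply> get, LongConsumer waitSeconds, int maxAttempts) {
        long delay = initialRetryAfterSeconds;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            waitSeconds.accept(delay);
            Reply reply = get.apply(location);
            if (reply.retryAfterSeconds == null) {
                return reply.body; // result is ready
            }
            delay = reply.retryAfterSeconds; // server supplied an updated estimate
        }
        throw new IllegalStateException("result not ready after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        // Simulated server: the first GET says "wait another 110", the second returns the result.
        Iterator<Reply> replies = List.of(new Reply(110L, null), new Reply(null, "<result/>")).iterator();
        long[] waited = {0};
        String body = poll(10, "https://public.api/my/call/results",
                url -> replies.next(), seconds -> waited[0] += seconds, 5);
        System.out.println(body + " after " + waited[0] + " seconds of waiting");
    }
}
```

In production the waitSeconds argument would actually sleep (or schedule the next attempt); here it only records the requested delays, which add up to the 10 + 110 seconds from the example above.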
Alternately, use a protocol that's designed to stay open for long periods of time, such as WebSockets.
Take a look at SSE (Server-Sent Events).
example code:
https://github.com/rsvoboda/resteasy-sse
or vertx event-bus:
https://vertx.io/docs/apidocs/io/vertx/core/eventbus/EventBus.html
I encountered several stuck JDBC connections in my code due to poor network health. I am planning to use the java.sql.Connection.setNetworkTimeout library function. As per the docs:
Sets the maximum period a Connection or objects created from the Connection will wait for the database to reply to any one request
Now, what exactly is the request here? My query takes a really long time to respond and an even longer time to process (I am using a JDBC interface to a big data DB). So do I need to keep this timeout larger than the expected query execution time (to prevent a false trigger), or will keep-alive messages be exchanged to keep track of the network connection, in which case I can keep it really low?
So if your NetworkTimeout is smaller than the QueryTimeout, the query will be terminated on your side - thread that waits for the DB to reply (notice that setNetworkTimeout has Executor executor parameter) will be interrupted. Depending on the underlying implementation NetworkTimeout may cancel the query on the DB side as well.
If NetworkTimeout > QueryTimeout, and query completes within QueryTimeout then nothing bad should happen. If problems you experience are exactly in this case, you should try to work on the OS level settings for keeping TCP connections alive so that no firewall terminates them too soon.
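As a rough sketch of the second case, the network timeout can be derived from the query timeout plus a safety margin. JdbcTimeouts, its method names and the margin are assumptions made for illustration, and some drivers may not support setNetworkTimeout at all. Note the different units: setQueryTimeout takes seconds, while setNetworkTimeout takes milliseconds.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.Executor;

class JdbcTimeouts {
    /** Network timeout in milliseconds that stays above the query timeout by the given margin. */
    static int networkTimeoutMillis(int queryTimeoutSeconds, int marginSeconds) {
        return Math.multiplyExact(Math.addExact(queryTimeoutSeconds, marginSeconds), 1000);
    }

    /**
     * Creates a statement whose query timeout fires first; the network timeout
     * acts only as a backstop for a connection that has gone silent.
     */
    static Statement createStatement(Connection connection, Executor executor,
                                     int queryTimeoutSeconds, int marginSeconds) throws SQLException {
        connection.setNetworkTimeout(executor, networkTimeoutMillis(queryTimeoutSeconds, marginSeconds));
        Statement statement = connection.createStatement();
        statement.setQueryTimeout(queryTimeoutSeconds);
        return statement;
    }

    public static void main(String[] args) {
        // A 10 minute query limit with a 30 second margin.
        System.out.println(networkTimeoutMillis(600, 30) + " ms");
    }
}
```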
When it comes to keeping TCP connections alive it is usually more a matter of the OS level settings than the application itself. You can read more about it (Linux) here.
In my current project, it will receive some messages from upstream systems, and upload them to another storage server by http service concurrently.
Since the system may receive many messages from upstream system at a short time, I use apache HttpClient with a pool, and, set:
If http client can't connect to storage server in 10s, it will throw exception
If http connection can't receive response from storage server in 10s, it will throw exception
If system can't get http connection from pool in 30 seconds, it will throw exception.
But my friend disagrees with the 3rd point. She says that if new messages can't get connections from the pool, we should just let them wait, since they will eventually get a connection and be saved to the storage server. The exceptions are not necessary in this case.
But I'm afraid that if we receive too many messages from upstream, too many threads will be blocked waiting for connections, and this may make the system unstable.
Do you think point 3 is good or bad? Do I need to set a timeout for it?
If system can't get http connection from pool in 30 seconds, it will throw exception.
Do you think point 3 is good or bad? Do I need to set a timeout for it?
This seems very much to be a business decision and not a coding issue. Is it okay for the request to wait for a long time for things? Is it okay for the storage interface to throw an exception if some time expires?
If the storage server is somehow hosed, I'm assuming that all of the requests (that are able to get a connection) are waiting 10 seconds and then throwing. If you have enough requests in the queue then this may cause your persist operations to wait a long time to even get a connection. It seems like a timeout is warranted, but again this is a business decision.
Generally, I would provide a timeout parameter (in seconds or millis) for the persist method to wait to complete. Then the caller can pass in Long.MAX_VALUE if they want otherwise they will get an exception. Or have another method that does not have a timeout parameter that is documented to chain to the other method with Long.MAX_VALUE.
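Something along these lines, as a sketch only: a Semaphore stands in for the HTTP connection pool, and BoundedStore, LeaseTimeoutException and upload are made-up names, not anything from HttpClient.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class BoundedStore {
    /** Thrown when no connection could be leased within the caller's timeout. */
    static final class LeaseTimeoutException extends RuntimeException {
        LeaseTimeoutException(String message) {
            super(message);
        }
    }

    // Stand-in for the connection pool: one permit per pooled connection.
    private final Semaphore connections;

    BoundedStore(int poolSize) {
        this.connections = new Semaphore(poolSize, true);
    }

    /** No-timeout variant: chains to the other method with Long.MAX_VALUE. */
    void persist(String message) {
        persist(message, Long.MAX_VALUE, TimeUnit.MILLISECONDS);
    }

    /** Waits at most the given time for a connection, then uploads the message. */
    void persist(String message, long timeout, TimeUnit unit) {
        boolean leased;
        try {
            leased = connections.tryAcquire(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new LeaseTimeoutException("interrupted while waiting for a connection");
        }
        if (!leased) {
            throw new LeaseTimeoutException("no connection within " + timeout + " " + unit);
        }
        try {
            upload(message);
        } finally {
            connections.release(); // always hand the connection back
        }
    }

    /** Placeholder for the real HTTP upload to the storage server. */
    void upload(String message) {
    }

    public static void main(String[] args) {
        new BoundedStore(1).persist("hello"); // waits as long as needed
        try {
            new BoundedStore(0).persist("hello", 50, TimeUnit.MILLISECONDS);
        } catch (LeaseTimeoutException e) {
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```

The caller then makes the business decision: pass 30 seconds to get the exception from point 3, or call the one-argument method to wait indefinitely.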
In designing my GWT/GAE app, it has become evident to me that my client-side (GWT) will be generating three types of requests:
Synchronous - "answer me right now! I'm important and require a real-time response!!!"
Asynchronous - "answer me when you can; I need to know the answer at some point but it's really not all that urgent."
Command - "I don't need an answer. This isn't really a request, it's just a command to do something or process something on the server-side."
My game plan is to implement my GWT code so that I can specify, for each specific server-side request (note: I've decided to go with RequestFactory over traditional GWT-RPC for reasons outside the scope of this question), which type of request it is:
SynchronousRequest - Synchronous (from above); sends a command and eagerly awaits a response that it then uses to update the client's state somehow
AsynchronousRequest - Asynchronous (from above); makes an initial request and somehow - either through polling or the GAE Channel API, is notified when the response is finally received
CommandRequest - Command (from above); makes a server-side request and does not wait for a response (even if the server fails to, or refuses to, oblige the command)
I guess my intention with SynchronousRequest is not to produce a totally blocking request, however it may block the user's ability to interact with a specific Widget or portion of the screen.
The added kicker here is this: GAE strongly enforces a timeout on all of its frontend instances (60 seconds). Backend instances have much more relaxed constraints for timeouts, threading, etc. So it is obvious to me that AsynchronousRequests and CommandRequests should be routed to backend instances so that GAE timeouts do not become an issue with them.
However, if GAE is behaving badly, or if we're hitting peak traffic, or if my code just plain sucks, I have to account for the scenario where a SynchronousRequest is made (which would have to go through a timeout-regulated frontend instance) and will time out unless my GAE server code does something fancy. I know there is a method in the GAE API that I can call to see how many milliseconds a request has left before it times out; although the name of it escapes me right now, it's what this "fancy" code would be based on. Let's call it public static long GAE.timeLeftOnRequestInMillis() for the sake of this question.
In this scenario, I'd like to detect that a SynchronousRequest is about to time out, and somehow dynamically convert it into an AsynchronousRequest so that it doesn't time out. Perhaps this means sending an AboutToTimeoutResponse back to the client, and forcing the client to decide whether to resend as an AsynchronousRequest or just fail. Or perhaps we can just transform the SynchronousRequest into an AsynchronousRequest and push it to a queue where a backend instance will consume it, process it and return a response. I don't have any preferences when it comes to implementation, so long as the request doesn't fail or time out because the server couldn't handle it fast enough (because of GAE-imposed regulations).
So then, here is what I'm actually asking here:
How can I wrap a RequestFactory call inside SynchronousRequest, AsynchronousRequest and CommandRequest in such a way that the RequestFactory call behaves the way each of them is intended? In other words, so that the call either partially-blocks (synchronous), can be notified/updated at some point down the road (asynchronous), or can just fire-and-forget (command)?
How can I implement my requirement to let a SynchronousRequest bypass GAE's 60-second timeout and still get processed without failing?
Please note: timeout issues are easily circumvented by re-routing things to backend instances, but backends don't/can't scale. I need scalability here as well (that's primarily why I'm on GAE in the first place!) - so I need a solution that deals with scalable frontend instances and their timeouts. Thanks in advance!
If the computation that you want GAE to do is going to take longer than 60 seconds, then don't wait for the results to be computed before sending a response. According to your problem definition, there is no way to get around this. Instead, clients should submit work orders, and wait for a notification from the server when the results are ready. Requests would consist of work orders, which might look something like this:
class ComputeDigitsOfPiWorkOrder {
    // parameters for the computation
    int numberOfDigitsToCompute;
    // Used by the GAE app to contact the requester when results are ready.
    ClientId clientId;
}
This way, your GAE app can respond as soon as the work order is saved (e.g. in Task Queue), and doesn't have to wait until it actually finishes calculating a billion digits of pi before responding. Your GWT client then waits for the result using the Channel API.
In order to give some work orders higher priority, you can use multiple task queues. If you want Task Queue work to scale automatically, you'll want to use push queues. Implementing priority using push queues is a little tricky, but you can configure high priority queues to have faster feed rate.
You could replace Channel API with some other notification solution, but that would probably be the most straightforward.
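To show the shape of that flow without any GAE dependencies, here is an in-process sketch. A single-threaded executor stands in for the task queue and a callback stands in for the Channel API; WorkOrderService, submit and the placeholder computeDigitsOfPi are all made up for illustration and are not GAE APIs.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiConsumer;

class WorkOrderService {
    // Stand-in for a push queue; a daemon thread so the JVM can still exit.
    private final ExecutorService queue = Executors.newSingleThreadExecutor(runnable -> {
        Thread worker = new Thread(runnable, "work-order-worker");
        worker.setDaemon(true);
        return worker;
    });

    // Stand-in for the notification channel: receives (clientId, result).
    private final BiConsumer<String, String> notifier;

    WorkOrderService(BiConsumer<String, String> notifier) {
        this.notifier = notifier;
    }

    /** Saves the work order and returns at once; the computation happens later. */
    void submit(String clientId, int numberOfDigitsToCompute) {
        queue.execute(() -> notifier.accept(clientId, computeDigitsOfPi(numberOfDigitsToCompute)));
    }

    /** Placeholder computation: returns up to 14 decimal digits from a constant. */
    static String computeDigitsOfPi(int digits) {
        String known = "3.14159265358979";
        int clamped = Math.max(1, Math.min(digits, known.length() - 2));
        return known.substring(0, 2 + clamped);
    }

    void shutdown() {
        queue.shutdown();
    }

    public static void main(String[] args) {
        CompletableFuture<String> delivered = new CompletableFuture<>();
        WorkOrderService service = new WorkOrderService(
                (clientId, result) -> delivered.complete(clientId + ":" + result));
        service.submit("client-1", 4);           // returns immediately
        System.out.println("request answered, waiting for the notification");
        System.out.println(delivered.join());    // the client-side wait
        service.shutdown();
    }
}
```

The point is only that submit never blocks on the computation, so the request handler stays well inside the frontend deadline regardless of how long the work takes.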
What is the best workflow to follow when a connection error occurs?
Let's say we have a client that connects to the middle tier.
class MyClient{
    ...
    void callServer(){
        try{
            middletier.saveRecord(123);
        }catch(...){
            // log the error
            // what to do next (notify the user, take the same step again)?
            // reinitialize connection ?
        }
    }
}
What should I do if a connection error occurs (timeout, broken connection, ...)?
Should I notify the user that the connection has a problem and ask them to try again?
Can something be done automatically and transparently for the user?
What I'd like is not to bother the user with errors and, where that isn't possible, to tell the user what to do next.
So what is the best workflow for handling such errors?
I can highly recommend Michael Nygard's "Release It!", which spends quite a bit of time explaining how to make your software more robust.
http://www.pragprog.com/titles/mnee/release-it
It depends... If the action is caused by user interaction, inform the user. The user can decide how often he wants to retry. The code may retry by itself, but if it is a timeout then the user may wait for several minutes (or abort the action without getting any feedback).
If it is a background task, try again after some delay (but not infinitely; eventually abort the action). You may reinitialize the connection to be sure; that depends on the technology used and on whether you use connection pooling.
Of course if you want to invest more time you can handle different errors differently. First of all, differentiate permanent errors (a retry in a few minutes wouldn't help) from intermittent errors (could be OK the next time). For instance, with a broken connection you could retry with a new one (maybe the firewall dropped an open connection because of inactivity). But you probably can't do anything about a time out (maybe a network configuration problem) or "HTTP 404 Not found" (assuming you can't change the URL you use for a HTTP call).
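A small sketch of that idea: retry only errors that the caller classifies as intermittent, with a delay and an upper bound on attempts. The Retry class and its parameters are made up for illustration, and which exceptions count as intermittent is an assumption the caller has to supply for their own stack.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

class Retry {
    /**
     * Runs the operation, retrying after delayMillis while the failure is
     * classified as intermittent and attempts remain. Permanent errors and
     * exhausted retries surface as an IllegalStateException wrapping the cause.
     */
    static <T> T call(Callable<T> operation, Predicate<Exception> isIntermittent,
                      int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                if (!isIntermittent.test(e) || attempt >= maxAttempts) {
                    throw new IllegalStateException("giving up after " + attempt + " attempt(s)", e);
                }
            }
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting to retry", e);
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated middle tier: the first two calls fail with a broken connection.
        String result = Retry.<String>call(() -> {
            if (++calls[0] < 3) {
                throw new ConnectException("connection dropped");
            }
            return "saved";
        }, e -> e instanceof ConnectException, 5, 100);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

For a user-triggered action the attempt limit would typically be small, so the user gets feedback quickly; a background task can afford more attempts and a longer delay.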
You could gather all this knowledge in a "diagnosis & repair" component.
I also recommend reading "Release it!".
This depends entirely on the application requirements. Sometimes it is better to inform the user immediately, and sometimes it is better to retry the request several times before informing the user. You have to discuss this with your customer/analyst.
From the perspective of the caller, MyClient: generally speaking, a failed method invocation should leave MyClient in the state it was in prior to the invocation. That means you should take care to restore the state as it was before middletier.saveRecord(123); was called.