Java UrlConnection triggering "Connection reset" exceptions under high load. Why?

I'm using Java to stream files from Amazon S3, on Linux (Ubuntu 10) 64-bit servers.
I'm using a separate thread for each file, and each file opens an HttpURLConnection which downloads and processes each file concurrently.
Everything works beautifully until I reach a certain number of streams (usually around 200-300 concurrent streams). At irregular points after this, several (say 10) of the threads will start hitting java.net.IOException: Connection reset errors simultaneously.
I am throttling the download speed and am well below the 250 Mbit/s limit of an m1.large instance. There is also negligible load on all other server resources (CPU, load average, and memory usage are all fine).
What could be causing this, or how could I track it down?

It's not trivial to guess what may be happening, but here are a couple of hints; maybe some will apply to your context:
Check your shell (bash, zsh, or any other) to see whether you have raised the standard limits on the number of open file descriptors (which sockets count against); see man ulimit for the bash shell.
Did you close the streams explicitly in your Java code? Not closing streams can induce subtle problems like this (a minimal sketch follows below).
Try searching for Linux TCP kernel tuning to check whether your Ubuntu server's stack is well suited to such a load.
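Here is that sketch: a hypothetical download helper (an illustration, not the asker's actual code) that always closes the stream and releases the connection in a finally block, so each file descriptor is freed even when processing fails.
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

static void downloadAndProcess(String fileUrl) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) new URL(fileUrl).openConnection();
    InputStream in = null;
    try {
        in = conn.getInputStream();
        byte[] buf = new byte[8192];
        while (in.read(buf) != -1) {
            // ... process the downloaded bytes ...
        }
    } finally {
        if (in != null) {
            try { in.close(); } catch (IOException ignored) { }
        }
        conn.disconnect(); // frees or recycles the underlying socket
    }
}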
HTH
Jerome

They might have a spillover problem at their VIPs because the number of concurrent connections reached a limit. You could decrease the number of concurrent streams and see whether that helps.

The problem here is largely in your language. The high load is triggering the error condition, and the error condition results in the exception. Not the other way around.

One relatively common reason for problems like this is that an intermediate proxy (firewall, load balancer) drops what it deems inactive (or too long-lived) HTTP connections.
But beyond this general possibility, EC2 definitely has more kinks as others have suggested.

You are probably running out of ephemeral ports. This happens under load when many short lived connections are opened and closed rapidly. The standard Java HttpURLConnection is not going to get you the flexibility you need to set the proper socket options. I recommend going with the Apache HttpComponents project, and setting options like so...
...
HttpGet httpGet = new HttpGet(uri);
HttpParams params = new BasicHttpParams();
params.setParameter(CoreConnectionPNames.CONNECTION_TIMEOUT, 16 * 1000); // 16 seconds
params.setParameter(CoreConnectionPNames.SO_REUSEADDR, true); // <-- teh MOJO!
DefaultHttpClient httpClient = new DefaultHttpClient(connectionManager, params);
BasicHttpContext httpContext = new BasicHttpContext();
HttpResponse httpResponse = httpClient.execute(httpGet, httpContext);
StatusLine statusLine = httpResponse.getStatusLine();
if (statusLine.getStatusCode() >= HTTP_STATUS_CODE_300)
{
...
I've omitted some code, like the connectionManager setup, but you can grok that from their docs.
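For reference, the connectionManager setup might look along these lines (a sketch based on the HttpClient 4.1 docs, not code from this answer; the pool sizes are placeholders):
import org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager;

// A thread-safe pooling connection manager for use with DefaultHttpClient.
ThreadSafeClientConnManager connectionManager = new ThreadSafeClientConnManager();
connectionManager.setMaxTotal(200);          // placeholder: total connections across all routes
connectionManager.setDefaultMaxPerRoute(50); // placeholder: connections per route/host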
[Update]
You might also add params.setParameter(CoreConnectionPNames.SO_LINGER, 1); to keep ephemeral ports from lingering around before reclamation.

Related

gRPC connection cycling

We are setting up a cluster to handle inferencing (with TensorFlow Serving) over gRPC. We intend to use a layer-7 load balancer (AWS ALB) to distribute the load. For our workload, inferencing will occur many times per minute from each client account. It is my understanding that gRPC holds connection state for each of these channels. As a result, in order for the ALB to do its job, we need to periodically tear down and rebuild the connection on the client instance.
My question: what is the best practice for cycling a connection in Java?
Below is my proposed code, which would be called every couple of minutes on each client channel. I assume that while the first connection is shutting down, we can go ahead and create a new one and immediately issue a request on it; or do we need to wait until the prior channel has shut down first? In our situation, the channel will (very likely) be idle, since the previous request will have been 10 seconds earlier.
if (mChannel != null)
    mChannel.shutdown();
mChannel = ManagedChannelBuilder.forAddress(mHost, mPort).usePlaintext().build();
mStub = PredictionServiceGrpc.newBlockingStub(mChannel);
The best practice is to use Lookaside Load Balancing.
However, you can make a few tweaks to terminate client connections periodically.
var builder = ManagedChannelBuilder.forAddress(mHost, mPort)
        .keepAliveTime(15, TimeUnit.SECONDS)
        .keepAliveTimeout(5, TimeUnit.SECONDS);
The above config will ensure that sticky gRPC connections are terminated, so the AWS ALB can do its job and balance requests uniformly.
There are other options you can try depending on your use case, e.g. retries. See ManagedChannelBuilder.
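Putting the two together, a minimal sketch of a cycle method (an illustration assuming the mHost/mPort/mChannel/mStub fields and the TensorFlow Serving PredictionServiceGrpc stub from the question, not an official recipe):
import java.util.concurrent.TimeUnit;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

// Build the replacement channel first, then drain the old one in the
// background; new requests can use the new channel immediately.
void cycleChannel() {
    ManagedChannel old = mChannel;
    mChannel = ManagedChannelBuilder.forAddress(mHost, mPort)
            .usePlaintext()
            .keepAliveTime(15, TimeUnit.SECONDS)
            .keepAliveTimeout(5, TimeUnit.SECONDS)
            .build();
    mStub = PredictionServiceGrpc.newBlockingStub(mChannel);
    if (old != null) {
        old.shutdown(); // graceful: in-flight RPCs on the old channel complete
    }
}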

How to limit connections for the same host (setMaxPerRoute) using PoolingHttpClientConnectionManager?

I use PoolingHttpClientConnectionManager for sending multiple GET/POST requests in parallel to the following services:
(1) http://localhost:8080/submit
(2) http://localhost:8080/query
Both services are heavily used, but the first service (1) has a higher priority.
I need to set setMaxPerRoute for service (1) so that it can consume 80% of the available connections.
The remaining 20% will be allocated to all other requests, which have longer timeouts (including service (2)). Here is my code:
...
PoolingHttpClientConnectionManager httpClientManager =
        new PoolingHttpClientConnectionManager();
httpClientManager.setMaxTotal(10);
httpClientManager.setDefaultMaxPerRoute(2);
HttpHost httpHost = new HttpHost("http://localhost/submit", 8080);
HttpRoute submitRoute = new HttpRoute(httpHost);
httpClientManager.setMaxPerRoute(submitRoute, 8);
...
The problem is that HttpHost apparently cannot differentiate between the two routes: both URLs have the same host (http://localhost:8080) and differ only in the request path. As a result, both services end up sharing the same limits.
Is there any way to implement such a limitation for the same host?
Thanks for any help.
After suggestions from my co-workers, I've found the solution.
We need to control the maximum number of connections so that requests to URL (1) get a pool with at most 20 connections, whereas all other requests, including those to URL (2), get a pool with at most 2 connections.
This can be solved by creating two different HttpClient objects, each with its own PoolingHttpClientConnectionManager. The first manager is configured with setMaxTotal(20), the second with setMaxTotal(2).
Now each pool has different limits for the same domain; a sketch follows below.
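A minimal sketch of that setup (class names from HttpClient 4.3+; the totals are the ones described above):
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

// Pool dedicated to the high-priority /submit requests.
PoolingHttpClientConnectionManager submitManager = new PoolingHttpClientConnectionManager();
submitManager.setMaxTotal(20);
submitManager.setDefaultMaxPerRoute(20);
CloseableHttpClient submitClient = HttpClients.custom()
        .setConnectionManager(submitManager)
        .build();

// Smaller pool shared by everything else, including /query.
PoolingHttpClientConnectionManager queryManager = new PoolingHttpClientConnectionManager();
queryManager.setMaxTotal(2);
queryManager.setDefaultMaxPerRoute(2);
CloseableHttpClient queryClient = HttpClients.custom()
        .setConnectionManager(queryManager)
        .build();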

Axis2 1.5.1 connections management

HTTP connections were not being used efficiently by our code in an Axis2 1.5.1 project. After setting a limit on max connections per host and stress-testing the application, responsiveness was not as good as I expected given the configured limits, and sometimes connections got stuck indefinitely, so fewer connections were available each time until no request was served by the application at all.
Configuration:
MultiThreadedHttpConnectionManager connManager = new MultiThreadedHttpConnectionManager();
HttpConnectionManagerParams connectionManagerParams = connManager.getParams();
connectionManagerParams.setMaxTotalConnections(httpMaxConnections);
connectionManagerParams.setDefaultMaxConnectionsPerHost(httpMaxConnectionsPerHost);
HttpClient httpClient = new HttpClient(connManager);
ConfigurationContext axisContext;
try {
    axisContext = ConfigurationContextFactory.createDefaultConfigurationContext();
} catch (Exception e) {
    throw new AxisFault(e.getMessage());
}
axisContext.setProperty(HTTPConstants.CACHED_HTTP_CLIENT, httpClient);
service = new MyStub(axisContext, url);
ServiceClient serviceClient = service._getServiceClient();
serviceClient.getOptions().setProperty(HTTPConstants.CONNECTION_TIMEOUT, httpConnectionTimeout);
serviceClient.getOptions().setProperty(HTTPConstants.SO_TIMEOUT, httpReadTimeout);
serviceClient.getOptions().setProperty(HTTPConstants.REUSE_HTTP_CLIENT, Constants.VALUE_TRUE);
So, as you can see, we're defining max. connections and timeouts.
I have a workaround to share, hoping to help somebody in a hurry, as I was. I'll mark my answer as accepted in a few days if no better answer comes from the experts.
1) PoolTimeout, to deal with connections that get stuck (for any reason)
The following line helped us prevent Axis2 from losing connections that get stuck forever:
httpClient.getParams().setParameter(HttpClientParams.CONNECTION_MANAGER_TIMEOUT, 1000L);
Let's call it PoolTimeout in this entry. Make sure the value is a Long, since an Integer (or int) would raise a ClassCastException that will prevent your service from even being triggered outside your client.
The system you're developing, which uses Axis, could in turn be a client of another system, and that other system will surely have its own specific ConnectionTimeout. So I suggest:
PoolTimeout <= ConnectionTimeout
Example:
serviceClient.getOptions().setProperty(HTTPConstants.CONNECTION_TIMEOUT, httpConnectionTimeout);
httpClient.getParams().setParameter(HttpClientParams.CONNECTION_MANAGER_TIMEOUT, Long.valueOf(httpConnectionTimeout));
2) Connection release
I was using Amila's suggestion for connection management, but the connections were not being released as quickly as I expected (I had deliberately set the delay times at which the mocked external system would respond so that they matched the limits in my tuning configuration).
I found that the following lines, in the method org.apache.axis2.client.OperationClient.executeImpl(boolean), helped mark the pooled connection as available as soon as it has been used:
HttpMethod method = (HttpMethod) getOperationContext()
        .getMessageContext(WSDLConstants.MESSAGE_LABEL_OUT_VALUE)
        .getProperty(HTTPConstants.HTTP_METHOD);
method.releaseConnection();
That's what Axis is trying to do when calling serviceClient.cleanupTransport(), but it seems the context is not correct.
Now performance tuning works in a predictable way, so it's in the hands of our integrators to select the tuning configuration that best suits production needs.
A better answer will be highly appreciated.

Timeout for UnknownHostException with established connection but no internet

I have an interesting issue.
I have an application in which I'm trying to account for the condition where the phone is connected to a router, but that router is not connected to the internet.
I've tried multiple methods of establishing the connection, but NONE of the timeouts account for this condition.
I've tried:
HttpParams httpParameters = new BasicHttpParams();
int timeoutSocket = 1000;
HttpConnectionParams.setSoTimeout(httpParameters, timeoutSocket);
HttpConnectionParams.setConnectionTimeout(httpParameters, timeoutSocket);
I've also tried:
HttpURLConnection huc = (HttpURLConnection)serverAddress.openConnection();
huc.setDoOutput(true);
huc.setRequestMethod("PUT"); // For amazon
//huc.setRequestMethod("POST"); // For regular server.
huc.setRequestProperty("Content-Type", "text/plain");
huc.setRequestProperty("Content-Length", String.valueOf(bytes));
huc.setFixedLengthStreamingMode(bytes);
huc.setConnectTimeout(1000); // Establishing connection timeout
huc.setReadTimeout(1000);
But in BOTH cases, when I execute the request / get the output stream, it takes about 20 seconds to receive an UnknownHostException.
I would like that reduced to a maximum of 5 seconds before reaching that conclusion.
Is there any way to do this?
Cheers
Through lots of searching and the help of this link, I've found a solid solution that seems to be working so far.
My understanding of the conclusion is that when I use methods like:
DataOutputStream wr = new DataOutputStream(huc.getOutputStream());
or
InputStream is = ucon.getInputStream();
BufferedInputStream bis = new BufferedInputStream(is);
(uploading or downloading), there are a lot of things happening under the hood, including a DNS lookup. With no connectivity, but while connected to a router, this takes about 20 seconds to finally reach an UnknownHostException.
However, if I add this line of code before the above code is executed:
InetAddress iAddr = InetAddress.getByName("myserverName.com");
Then it gives me the proper SocketTimeoutException and responds exactly how I would hope/expect it to. The above line of code apparently caches the DNS lookup, so the timeouts work as expected.
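If you also want to cap the DNS lookup itself (InetAddress.getByName() takes no timeout parameter), one option, sketched here as my own illustration, is to run the lookup in a Future and bound the wait:
import java.net.InetAddress;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Returns the resolved address, or null if resolution failed or timed out.
static InetAddress resolveWithTimeout(String host, long timeoutSeconds)
        throws InterruptedException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
        Future<InetAddress> lookup = executor.submit(() -> InetAddress.getByName(host));
        return lookup.get(timeoutSeconds, TimeUnit.SECONDS);
    } catch (ExecutionException e) {
        return null; // UnknownHostException or similar
    } catch (TimeoutException e) {
        return null; // treat as "connected but no internet"
    } finally {
        executor.shutdownNow();
    }
}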
Also, something to note: once the failure is cached, executing the code above takes as long to fail as the previous code (I can't tell you exactly what triggers this). But if you connect to the internet again and then re-enter the connected-but-no-connectivity state, the earlier success will be cached and the timeouts will again work properly.
This wasn't particularly easy to find or figure out, so I hope this helps somebody.
Cheers,
You could implement a CountDownTimer with a limit of 5000 ms; see http://dewful.com/?p=3

BindException: address already in use on a client socket?

I've got a client-server tiered architecture with the client making RPC-like requests to the server. I'm using Tomcat to host the servlets, and the Apache HttpClient to make requests to it.
My code goes something like this:
private static final HttpConnectionManager CONN_MGR = new MultiThreadedHttpConnectionManager();
final GetMethod get = new GetMethod();
final HttpClient httpClient = new HttpClient(CONN_MGR);
get.getParams().setCookiePolicy(CookiePolicy.IGNORE_COOKIES);
get.getParams().setParameter(HttpMethodParams.USER_AGENT, USER_AGENT);
get.setQueryString(encodedParams);
int responseCode;
try {
    responseCode = httpClient.executeMethod(get);
} catch (final IOException e) {
    ...
}
if (responseCode != 200)
    throw new Exception(...);
String responseHTML;
try {
    responseHTML = get.getResponseBodyAsString(100 * 1024 * 1024);
} catch (final IOException e) {
    ...
}
return responseHTML;
It works great in a lightly-loaded environment, but when I'm making hundreds of requests per second I start to see this -
Caused by: java.net.BindException: Address already in use
at java.net.PlainSocketImpl.socketBind(Native Method)
at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:336)
at java.net.Socket.bind(Socket.java:588)
at java.net.Socket.<init>(Socket.java:387)
at java.net.Socket.<init>(Socket.java:263)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
Any thoughts on how to fix this? I'm guessing it's something to do with the client trying to reuse the ephemeral client ports, but why is this happening / how can I fix it?
Thanks!
A very good discussion of the problem you are running into can be found here. On the Tomcat side, by default it will use the SO_REUSEADDR option, which will allow the server to reuse sockets which are in TIME_WAIT. Additionally, the Apache http client will by default use keep-alives, and attempt to reuse connections.
Your problem seems to be caused by not calling releaseConnection on the GetMethod. This is required in order for the connection to be reused. Otherwise, the connection will remain open until the garbage collector closes it or the server drops the keep-alive; in either case, it won't be returned to the pool.
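A minimal sketch of the fix, reusing the names from the question (the finally block is the important part):
// Always release the method back to the connection manager, even on failure.
final GetMethod get = new GetMethod();
try {
    get.setQueryString(encodedParams);
    final int responseCode = httpClient.executeMethod(get);
    ...
} finally {
    get.releaseConnection(); // hands the connection back to the MultiThreadedHttpConnectionManager
}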
With hundreds of connections a second, and without knowing how long your connections take to open, do their thing, close, and get recycled, I suspect this is just a problem you're going to have. One thing you can do is catch the BindException in your try block, use it to do whatever you need in the bind-unsuccessful case, and wrap the whole call in a while loop driven by a flag indicating whether the bind succeeded. Off the top of my head:
boolean hasBound = false;
while (!hasBound) {
    try {
        responseCode = httpClient.executeMethod(get);
        hasBound = true; // only flag success once the bind (and request) actually succeeded
    } catch (BindException e) {
        // do anything you want in the bind-unsuccessful case, then loop to retry
    } catch (final IOException e) {
        ...
    }
}
Update with a question: what are the maximum total and per-host connection counts allowed by your MultiThreadedHttpConnectionManager? In your code, that'd be:
CONN_MGR.getParams().getDefaultMaxConnectionsPerHost();
CONN_MGR.getParams().getMaxTotalConnections();
You've fired more requests than there are TCP/IP ports allowed to be opened. I don't use HttpClient, so I can't go into detail about it, but in theory there are three solutions for this particular problem:
Hardware based: add another NIC (network interface card).
Software based: close connections directly after use and/or increase the connection timeout.
Platform based: increase the number of TCP/IP ports that are allowed to be opened. This may be OS-specific and/or NIC-driver-specific. The absolute maximum is 65535, of which several may already be reserved or in use (e.g. port 80).
So it turns out the problem was that one of the other HttpClient instances accidentally wasn't using the MultiThreadedHttpConnectionManager I instantiated, so I effectively had no rate limiting at all. Fixing this problem fixed the exception being thrown.
Thanks for all the suggestions, though!
Even if you invoke HttpClientUtils.closeQuietly(client), when your code reads content from the HttpResponse entity, e.g. InputStream contentStream = httpResponse.getEntity().getContent(), you should also close that input stream; only then is the HttpClient connection closed properly. A sketch follows below.
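A sketch of that pattern for HttpClient 4.x (the client and get variables are illustrative):
import java.io.InputStream;
import org.apache.http.client.methods.CloseableHttpResponse;

// Close the entity's content stream so the connection is released back
// to the pool, then close the response itself.
CloseableHttpResponse response = client.execute(get);
try {
    InputStream contentStream = response.getEntity().getContent();
    try {
        // ... read the content ...
    } finally {
        contentStream.close(); // releases the underlying connection
    }
} finally {
    response.close();
}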
