In my test application I execute consecutive HttpGet requests to the same host with Apache HttpClient, but on each subsequent request it turns out that the previous HttpConnection has been closed and a new HttpConnection is created.
I use the same instance of HttpClient and don't close the responses. From each entity I get an InputStream, read from it with a Scanner, and then close the Scanner. I have tested the KeepAliveStrategy; it returns true. The time between requests doesn't exceed the keepAlive or connectionTimeToLive durations.
Can anyone tell me what could be the reason for such behavior?
Updated
I have found the solution. In order to keep the HttpConnection alive it is necessary to set an HttpClientConnectionManager when building the HttpClient. I have used BasicHttpClientConnectionManager.
ConnectionKeepAliveStrategy keepAliveStrat = new DefaultConnectionKeepAliveStrategy() {
    @Override
    public long getKeepAliveDuration(HttpResponse response, HttpContext context)
    {
        long keepAlive = super.getKeepAliveDuration(response, context);
        if (keepAlive == -1)
            keepAlive = 120000;
        return keepAlive;
    }
};
HttpClientConnectionManager connectionManager = new BasicHttpClientConnectionManager();
try (CloseableHttpClient httpClient = HttpClients.custom()
        .setConnectionManager(connectionManager) // without this setting the connection is not kept alive
        .setDefaultCookieStore(store)
        .setKeepAliveStrategy(keepAliveStrat)
        .setConnectionTimeToLive(120, TimeUnit.SECONDS)
        .setUserAgent(USER_AGENT)
        .build())
{
    HttpClientContext context = new HttpClientContext();
    RequestConfig config = RequestConfig.custom()
            .setCookieSpec(CookieSpecs.DEFAULT)
            .setSocketTimeout(10000)
            .setConnectTimeout(10000)
            .build();
    context.setRequestConfig(config);
    HttpGet httpGet = new HttpGet(uri);
    CloseableHttpResponse response = httpClient.execute(httpGet, context);
    HttpConnection conn = context.getConnection();
    HttpEntity entity = response.getEntity();
    try (Scanner in = new Scanner(entity.getContent(), ENC))
    {
        // do something
    }
    System.out.println("open=" + conn.isOpen()); // now open=true
    HttpGet httpGet2 = new HttpGet(uri2); // on the same host with another path
    // and so on
}
Updated 2
In general, checking connections with conn.isOpen() is not the proper way to check the connection state, because: "Internally HTTP connection managers work with instances of ManagedHttpClientConnection acting as a proxy for a real connection that manages connection state and controls execution of I/O operations. If a managed connection is released or get explicitly closed by its consumer the underlying connection gets detached from its proxy and is returned back to the manager. Even though the service consumer still holds a reference to the proxy instance, it is no longer able to execute any I/O operations or change the state of the real connection either intentionally or unintentionally." (HttpClient Tutorial)
As @oleg pointed out, the proper way to trace connections is by using the logger.
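For example, a minimal sketch that turns on HttpClient's connection-management logging through Commons Logging's SimpleLog (this assumes commons-logging is on the classpath; the properties must be set before the first request is executed):

// Route Commons Logging to SimpleLog and enable DEBUG output for
// connection management, so lease/release/close events become visible.
System.setProperty("org.apache.commons.logging.Log",
        "org.apache.commons.logging.impl.SimpleLog");
System.setProperty("org.apache.commons.logging.simplelog.showdatetime", "true");
System.setProperty(
        "org.apache.commons.logging.simplelog.log.org.apache.http.impl.conn",
        "DEBUG");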
First of all, you need to make sure the remote server you're working with supports keep-alive connections. Simply check whether the remote server returns a Connection: Keep-Alive or Connection: Close header in each and every response. In the Close case there is nothing you can do about it. You can use this online tool to perform such a check.
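As a quick in-code alternative, here is a sketch that inspects the header on a response you already hold (note that under HTTP/1.1 the absence of a Connection header implies keep-alive by default):

// Sketch: inspect the Connection header of a received response
Header connectionHeader = response.getFirstHeader("Connection");
if (connectionHeader != null) {
    System.out.println("Connection: " + connectionHeader.getValue());
}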
Next, you need to implement the ConnectionKeepAliveStrategy as described in paragraph #2.6 of this manual. Note that you can use the existing DefaultConnectionKeepAliveStrategy, available since HttpClient version 4.0, so that your HttpClient will be constructed as follows:
HttpClient client = HttpClients.custom()
        .setKeepAliveStrategy(DefaultConnectionKeepAliveStrategy.INSTANCE)
        .build();
That will ensure your HttpClient instance reuses the same connection via the keep-alive mechanism, provided it is supported by the server.
Your application must close response objects in order to ensure proper deallocation of the underlying connections. Upon response closure, HttpClient keeps valid connections alive and returns them to the connection manager (connection pool).
I suspect your code simply leaks connections, and every request ends up with a newly created connection while all previous connections keep piling up in memory.
From the example at the HttpClient website:
// In order to ensure correct deallocation of system resources
// the user MUST call CloseableHttpResponse#close() from a finally clause.
// Please note that if response content is not fully consumed the underlying
// connection cannot be safely re-used and will be shut down and discarded
// by the connection manager.
So, as @oleg said, you need to close the HttpResponse before checking the connection status.
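Putting it together, a sketch of the recommended pattern (the names httpClient, httpGet, and context are as in the question):

try (CloseableHttpResponse response = httpClient.execute(httpGet, context)) {
    // consume the body fully so the connection can be safely reused
    EntityUtils.consume(response.getEntity());
} // the response is closed here and the connection returns to the manager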
Related
We want to migrate all our apache-httpclient-4.x code to java-http-client code to reduce dependencies. While migrating, I ran into the following issue under Java 11:
How to set the socket timeout in Java HTTP Client?
With apache-httpclient-4.x we can set the connection timeout and the socket timeout like this:
DefaultHttpClient httpClient = new DefaultHttpClient();
int timeout = 5; // seconds
HttpParams httpParams = httpClient.getParams();
httpParams.setParameter(CoreConnectionPNames.CONNECTION_TIMEOUT, timeout * 1000);
httpParams.setParameter(CoreConnectionPNames.SO_TIMEOUT, timeout * 1000);
With java-http-client I can only set the connection timeout, like this:
HttpClient httpClient = HttpClient.newBuilder()
        .connectTimeout(Duration.ofSeconds(5))
        .build();
But I found no way to set the socket timeout. Is there any way, or an open issue to support that in the future?
You can specify it at the HttpRequest.Builder level via the timeout method:
HttpClient httpClient = HttpClient.newBuilder()
        .connectTimeout(Duration.ofSeconds(5))
        .build();
HttpRequest httpRequest = HttpRequest.newBuilder()
        .uri(URI.create("..."))
        .timeout(Duration.ofSeconds(5)) // this
        .build();
httpClient.send(httpRequest, HttpResponse.BodyHandlers.ofString());
If you connected successfully but are not able to receive a response within the desired amount of time, java.net.http.HttpTimeoutException: request timed out will be thrown (in contrast to java.net.http.HttpConnectTimeoutException: HTTP connect timed out, which is thrown if a connection cannot be established in time).
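A sketch of how the two failures can be told apart (the catch order matters, since HttpConnectTimeoutException extends HttpTimeoutException):

try {
    HttpResponse<String> response =
            httpClient.send(httpRequest, HttpResponse.BodyHandlers.ofString());
} catch (HttpConnectTimeoutException e) {
    // the connection could not be established within connectTimeout
} catch (HttpTimeoutException e) {
    // connected, but the response did not arrive within the request timeout
} catch (IOException | InterruptedException e) {
    // any other I/O failure, or the thread was interrupted
}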
There doesn't seem to be a way to specify a timeout on the flow of packets (a socket timeout) in the Java HTTP Client.
I found an enhancement request on OpenJDK which seems to cover this possibility - https://bugs.openjdk.org/browse/JDK-8258397
Content from the link
The HttpClient lets you set a connection timeout (HttpClient.Builder) and a request timeout (HttpRequest.Builder). However the request timeout will be cancelled as soon as the response headers have been read. There is currently no timeout covering the reception of the body.
A possibility for the caller is to make use of the CompletableFuture API (get/join will accept a timeout, or CF::orTimeout can be called).
IIRC, in that case it will still be the responsibility of the caller to cancel the request. We might want to reexamine and possibly change that.
The disadvantage here is that some of our BodyHandlers (ofPublisher, ofInputStream) will return immediately - so the CF API won't help in this case.
This might be a good thing (or not).
Another possibility could be to add a body timeout on HttpRequest.Builder. This would then cover all cases - but do we really want to timeout in the case of ofInputStream or ofPublisher if the caller doesn't read the body fast enough?
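Until such a body timeout exists, a workaround along the lines the issue suggests is to bound the whole exchange with the CompletableFuture API. A sketch (the URL and the 10-second value are placeholders; note the request is not automatically cancelled when the future times out):

HttpClient client = HttpClient.newBuilder()
        .connectTimeout(Duration.ofSeconds(5))
        .build();
HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://example.org/"))
        .build();
client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
        .orTimeout(10, TimeUnit.SECONDS) // covers headers and body together
        .thenApply(HttpResponse::body)
        .whenComplete((body, ex) -> {
            if (ex != null) {
                // completed exceptionally, e.g. with a TimeoutException
                System.err.println(ex);
            }
        });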
I'm trying to work out whether a try-with-resources closes just the CloseableHttpClient, or also closes the response too.
For example,
private static CloseableHttpResponse sendRequest()
        throws IOException
{
    final HttpUriRequest httpUriRequest =
            buildRequest(url, requestMethod, requestParameters, httpHeaders);
    try (CloseableHttpClient client = HttpClientBuilder.create().build())
    {
        return client.execute(httpUriRequest);
    }
}
We all know this will close the CloseableHttpClient as expected. But what about the result of that call? CloseableHttpClient returns a CloseableHttpResponse. Does that also need to be closed, either in the invoking code or somewhere else? Or is it closed at the same time as the CloseableHttpClient by that try-with-resources?
Bonus question: how can I prove to myself that things are actually being closed? I'm looking at the thread pool in IntelliJ but can't quite work out where/when things are closing.
The answer given by Jhilton is perfectly correct (my +1). However, as far as HttpClient-specific resource management is concerned, closing the HttpClient instance results in the closure of all kept-alive and active connections, including those currently associated with HttpResponse instances. That essentially means one does not have to close HttpResponse instances if one is closing the HttpClient instance used to execute the message exchange, but such a pattern is very much discouraged.
try-with-resources will close only the resources declared in the try clause:
try (CloseableHttpClient client = HttpClientBuilder.create().build())
i.e. it will only close the variable "client".
Also, if the response were closed there, it would become a problem to extract the data, so the responsibility of closing it should fall elsewhere.
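If you do want both closed by the same construct, one option is to declare both resources in the try clause. A sketch (httpUriRequest as in the question; the data must be extracted before the block ends, and resources are closed in reverse order of declaration, so the response closes before the client):

try (CloseableHttpClient client = HttpClientBuilder.create().build();
     CloseableHttpResponse response = client.execute(httpUriRequest)) {
    // read the body while the response is still open
    String body = EntityUtils.toString(response.getEntity());
}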
I recently switched from java.net to org.apache.http.client. I have set up a CloseableHttpClient with the HttpClientBuilder. As connection manager I am using the BasicHttpClientConnectionManager.
Now I have the problem that very often, when I make an HTTP request, I get a timeout exception. It seems that the connection manager keeps connections open to reuse them, but if the system is idle for a few minutes the connection times out, and the first thing I get on the next request is a timeout. Repeating the same request once more then usually works without any problem.
Is there a way to configure the BasicHttpClientConnectionManager so that it does not reuse its connections and creates a new connection each time?
There are several ways of dealing with the problem:
Evict idle connections once they are no longer needed. The code below effectively disables connection persistence by closing out persistent connections after each HTTP exchange.
BasicHttpClientConnectionManager cm = new BasicHttpClientConnectionManager();
CloseableHttpClient httpclient = HttpClients.custom().setConnectionManager(cm).build();
...
try (CloseableHttpResponse response = httpclient.execute(new HttpGet("/"))) {
    System.out.println(response.getStatusLine());
    EntityUtils.consume(response.getEntity());
}
cm.closeIdleConnections(0, TimeUnit.MILLISECONDS);
Limit the connection keep-alive time to something relatively small:
BasicHttpClientConnectionManager cm = new BasicHttpClientConnectionManager();
CloseableHttpClient httpclient = HttpClients.custom()
        .setConnectionManager(cm)
        .setKeepAliveStrategy((response, context) -> 1000)
        .build();
try (CloseableHttpResponse response = httpclient.execute(new HttpGet("/"))) {
    System.out.println(response.getStatusLine());
    EntityUtils.consume(response.getEntity());
}
(Recommended) Use a pooling connection manager and set the total connection time-to-live to a finite value. There are no benefits to using the basic connection manager over the pooling one unless your code is expected to run in an EJB container.
CloseableHttpClient httpclient = HttpClients.custom()
        .setConnectionTimeToLive(5, TimeUnit.SECONDS)
        .build();
try (CloseableHttpResponse response = httpclient.execute(new HttpGet("/"))) {
    System.out.println(response.getStatusLine());
    EntityUtils.consume(response.getEntity());
}
I'm building a simple web scraper and I need to fetch the same page a few hundred times. There's an attribute in the page that is dynamic and should change on each request. I've built a multithreaded HttpClient-based class to process the requests, and I'm using an ExecutorService to create a thread pool and run the threads. The problem is that the dynamic attribute sometimes doesn't change on each request, and I end up getting the same value in 3 or 4 subsequent threads. I've read a lot about HttpClient and I really can't find where this problem comes from. Could it be something about caching, or something like it?
Update: here is the code executed in each thread:
HttpContext localContext = new BasicHttpContext();
HttpParams params = new BasicHttpParams();
HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);
HttpProtocolParams.setContentCharset(params, HTTP.DEFAULT_CONTENT_CHARSET);
HttpProtocolParams.setUseExpectContinue(params, true);
ClientConnectionManager connman = new ThreadSafeClientConnManager();
DefaultHttpClient httpclient = new DefaultHttpClient(connman, params);
HttpHost proxy = new HttpHost(inc_proxy, Integer.valueOf(inc_port));
httpclient.getParams().setParameter(ConnRoutePNames.DEFAULT_PROXY, proxy);
HttpGet httpGet = new HttpGet(url);
httpGet.setHeader("User-Agent",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
String iden = null;
int timeoutConnection = 10000;
HttpConnectionParams.setConnectionTimeout(httpGet.getParams(), timeoutConnection);
try {
    HttpResponse response = httpclient.execute(httpGet, localContext);
    HttpEntity entity = response.getEntity();
    if (entity != null) {
        InputStream instream = entity.getContent();
        String result = convertStreamToString(instream);
        // System.out.printf("Result\n %s", result + "\n");
        instream.close();
        iden = StringUtils.substringBetween(result,
                "<input name=\"iden\" value=\"",
                "\" type=\"hidden\"/>");
        System.out.printf("IDEN:%s\n", iden);
        EntityUtils.consume(entity);
    }
} catch (ClientProtocolException e) {
    System.out.println("ClientProtocolException caught");
} catch (IOException e) {
    System.out.println("IOException caught");
}
HttpClient does not use a cache by default (i.e. when you use the plain DefaultHttpClient class). It does so if you use CachingHttpClient, which is an HttpClient interface decorator enabling caching:
HttpClient client = new CachingHttpClient(new DefaultHttpClient(), cacheConfiguration);
Then it analyzes the If-Modified-Since and If-None-Match headers in order to decide whether to perform the request to the remote server, or to return the result from the cache.
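For completeness, a sketch of what the cache configuration can look like, assuming the newer httpclient-cache 4.3 API (CacheConfig.custom() and CachingHttpClients do not exist in earlier 4.x releases, which configure CacheConfig through setters instead):

CacheConfig cacheConfig = CacheConfig.custom()
        .setMaxCacheEntries(1000) // cache at most 1000 responses
        .setMaxObjectSize(8192)   // cache bodies up to 8 KB
        .build();
CloseableHttpClient cachingClient = CachingHttpClients.custom()
        .setCacheConfig(cacheConfig)
        .build();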
I suspect that your issue is caused by a proxy server standing between your application and the remote server.
You can test it easily with curl; execute some number of requests omitting the proxy:
#!/bin/bash
for i in {1..50}
do
    echo "*** Performing request number $i"
    curl -D - http://yourserveraddress.com -o $i -s
done
Then execute diff between all the downloaded files. All of them should have the differences you mentioned. Next, add the -x/--proxy <host[:port]> option to curl, execute the script again, and compare the files once more. If some responses are the same as others, then you can be sure that this is a proxy server issue.
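For example, the request line inside the loop would become (proxyhost:8080 is a placeholder for your actual proxy):

curl -D - -x http://proxyhost:8080 http://yourserveraddress.com -o $i -s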
Generally speaking, in order to test whether or not HTTP requests are being made over the wire, you can use a "sniffing" tool that analyzes network traffic, for example:
Fiddler (http://fiddler2.com/fiddler2/) - I would start with this
Wireshark (http://www.wireshark.org/) - more low-level
I highly doubt HttpClient is performing caching of any sort (this would imply it needs to store pages in memory or on disk, which is not one of its capabilities).
While this is not an answer, it's a point to ponder: is it possible that the server (or some proxy in between) is returning cached content? If you are performing many requests (simultaneously or near-simultaneously) for the same content, the server may be returning cached content because it has decided that the information has not "expired" yet. In fact, the HTTP protocol provides caching directives for exactly this kind of functionality. Here is a site that provides a high-level overview of the different HTTP caching mechanisms:
http://betterexplained.com/articles/how-to-optimize-your-site-with-http-caching/
I hope this gives you a starting point. If you have already considered these avenues then that's great.
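If an intermediary cache is the suspect, one more thing worth trying is sending explicit no-cache hints with the request. A sketch with Apache HttpClient (caches are free to ignore these hints):

HttpGet httpGet = new HttpGet(url);
// ask intermediaries not to serve a stored copy (they may ignore this)
httpGet.setHeader("Cache-Control", "no-cache");
httpGet.setHeader("Pragma", "no-cache");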
You could try appending a unique dummy parameter to the URL on every request to try to defeat any URL-based caching (in the server, or somewhere along the way). It won't work if caching isn't the problem, or if the server is smart enough to reject requests with unknown parameters, or if the server caches but only based on the parameters it cares about, or if your chosen parameter name collides with a parameter the site actually uses.
If this is the URL you're using
http://www.example.org/index.html
try using
http://www.example.org/index.html?dummy=1
Set dummy to a different value for each request.
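A sketch of the idea in Java (the parameter name dummy is arbitrary, and System.nanoTime() is just one way to get a per-request unique value):

// append a unique value so every URL is distinct to any URL-keyed cache
String busted = url + (url.contains("?") ? "&" : "?") + "dummy=" + System.nanoTime();
HttpGet httpGet = new HttpGet(busted);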
Is there any class for reading HTTP pages that returns a java.io.InputStream and whose timeout is reliable?
I tried java.net.URLConnection and it doesn't have a reliable timeout (it takes more time than the timeout that was set). My code is here:
URLConnection con = url.openConnection();
con.setConnectTimeout(2000);
con.setReadTimeout(2000);
InputStream in = con.getInputStream();
I expect that the reason the timeout is not working for you is that you are setting the timeout after the connection has been established, or you are using the wrong setter. It is also possible that you are using a "non-standard" implementation of URLConnection ...
"Some non-standard implementation of this method ignores the specified timeout. To see the read timeout set, please call getReadTimeout()." (or getConnectTimeout())
If you posted the relevant part of your actual code we could give you a better answer ...
Alternatively, use the Apache HttpClient library.
You can use Apache HttpClient to read HTTP pages; it also has an HTTP parser. Check this for further reference about HttpClient. You can get an InputStream object using their API like this:
HttpClient httpclient = new DefaultHttpClient();
// Prepare a request object
HttpGet httpget = new HttpGet("http://www.apache.org/");
// Execute the request
HttpResponse response = httpclient.execute(httpget);
// Examine the response status
System.out.println(response.getStatusLine());
// Get hold of the response entity
HttpEntity entity = response.getEntity();
// If the response does not enclose an entity, there is no need
// to worry about connection release
if (entity != null) {
    InputStream instream = entity.getContent();
    // ... read from instream, then close it to release the connection
}
And coming to the timeout part, it totally depends on the network, and you can't do much about it from your Java code.