These days I'm struggling with a rather weird issue involving Apache HttpClient and threads.
I have an HttpClient shared by all the threads, and they use it to execute an HttpPut request that uploads a small file (approx. 8 KB). With a small number of threads everything is fine and the times are good (200-600 milliseconds), but when we start increasing the number of concurrent threads the times become awful (8 seconds).
We checked the server to make sure the problem wasn't there. Using JMeter with the same load (1000 threads in a second) we got response times in the milliseconds range!
The implementation uses a thread-safe connection manager:
PoolingHttpClientConnectionManager httpConnectionManager = new PoolingHttpClientConnectionManager();
httpConnectionManager.setMaxTotal(5000);
httpConnectionManager.setDefaultMaxPerRoute(5000);
HttpClient httpClient = HttpClientBuilder.create()
.setConnectionManager(httpConnectionManager)
.build();
And the threads run the following code:
HttpPut put = new HttpPut(urlStr);
put.setConfig(RequestConfig.custom()
.setExpectContinueEnabled(true)
.setStaleConnectionCheckEnabled(false)
.setRedirectsEnabled(true).build());
put.setEntity(new FileEntity(new File("image.tif")));
put.setHeader("Content-Type", "image/tiff");
put.setHeader("Connection", "keep-alive");
HttpResponse response = httpClient.execute(put, HttpClientContext.create());
It looks as if there is a shared resource that has a huge impact under high load.
Looking at the source code of Apache JMeter, I don't see any relevant differences from this code.
Any ideas, guys?
You need to turn on debug logging on the client side in order to do the following:
Verify that the pool of 5000 is actually being used to any great depth. The logger will display the changing totals for "available remain in pool" and the number of the current pool entry being used.
Verify that you immediately clean up and RETURN the entry to the pool. Remember to close your resources (streams used to access the response, Entity objects).
CloseableHttpResponse response;
case PUT:
HttpPut httpPut = new HttpPut(url);
httpPut.setProtocolVersion(new ProtocolVersion("HTTP", 1,1));
httpPut.setConfig(this.config);
httpPut.setEntity(new StringEntity(data, ContentType.APPLICATION_JSON));
response = httpClient.execute(httpPut, context);
httpPut.releaseConnection();
I'm using Jetty's HTTP2Client to make synchronous calls to my server. My sample program is as follows:
Client part
Security.addProvider(new OpenSSLProvider());
SslContextFactory sslContextFactory = new SslContextFactory(true);
sslContextFactory.setProvider("Conscrypt");
sslContextFactory.setProtocol("TLSv1.3");
HTTP2Client http2Client = new HTTP2Client();
http2Client.setConnectTimeout(5000);
http2Client.setIdleTimeout(5000);
HttpClient httpClient = new org.eclipse.jetty.client.HttpClient(new HttpClientTransportOverHTTP2(http2Client), sslContextFactory);
httpClient.setMaxConnectionsPerDestination(20);
httpClient.setMaxRequestsQueuedPerDestination(100);
httpClient.setConnectTimeout(5000);
httpClient.addBean(sslContextFactory);
httpClient.start();
Request Part
Request request = httpClient.POST("my url goes here");
request.header(HttpHeader.CONTENT_TYPE, "application/json");
request.content(new StringContentProvider("xmlRequest PayLoad goes here","utf-8"));
ContentResponse response = request.send();
String res = new String(response.getContent());
I need to instrument this to get metrics such as the number of connections per destination, the number of requests per connection, the number of failed transactions, etc.
My application runs on a server where using Wireshark or any other TCP tool is restricted, so I need to get this data from within Java. Enabling Jetty's debug logs is not viable, as it writes GBs of data.
Is there a way to get these metrics, either through some utility or through Java reflection?
Thanks in advance
http2Client.setMaxConcurrentPushedStreams(1000);
This is way too big, it's unlikely the server will push 1000 concurrent streams.
http2Client.setConnectTimeout(30);
http2Client.setIdleTimeout(5);
The timeouts are measured in milliseconds, so these values are way too small.
I also recommend setting the idle timeout to a value larger than 5000 milliseconds; something like 20000-30000 is typically better.
String res = new String(response.getContent());
This is wrong, as you don't take into account the response charset.
Use response.getContentAsString() instead.
As for the metrics, you can use JMX and extract a number of metrics using a JMX console (or via standard JMX APIs).
To set up JMX for HttpClient you can do this:
MBeanServer mbeanServer = ManagementFactory.getPlatformMBeanServer();
MBeanContainer mbeanContainer = new MBeanContainer(mbeanServer);
httpClient.addBean(mbeanContainer);
The code above will export the HttpClient components to JMX and there you can query the various components for the metrics you are interested in.
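For the "standard JMX APIs" route, here is a minimal sketch that reads every readable attribute of the MBeans matching a pattern, using only the JDK's javax.management classes. The class name JmxDump and the method readAttributes are made up for illustration. The pattern you pass for Jetty (for example "org.eclipse.jetty.client:*") is an assumption about how the exported beans are named, so check the actual domain in a JMX console first.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/** Hypothetical helper: dumps readable MBean attributes from the platform MBeanServer. */
final class JmxDump {

    private JmxDump() {}

    /**
     * Returns one "objectName attribute=value" line per readable attribute of
     * every MBean whose name matches the given ObjectName pattern.
     */
    static List<String> readAttributes(String pattern) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName query;
        try {
            query = new ObjectName(pattern);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Bad JMX pattern: " + pattern, e);
        }
        List<String> lines = new ArrayList<>();
        for (ObjectName name : server.queryNames(query, null)) {
            MBeanAttributeInfo[] attributes;
            try {
                attributes = server.getMBeanInfo(name).getAttributes();
            } catch (Exception e) {
                continue; // the bean may have been unregistered in the meantime
            }
            for (MBeanAttributeInfo attribute : attributes) {
                if (!attribute.isReadable()) {
                    continue;
                }
                try {
                    Object value = server.getAttribute(name, attribute.getName());
                    lines.add(name + " " + attribute.getName() + "=" + value);
                } catch (Exception e) {
                    // Some attributes refuse to be read; skip them.
                }
            }
        }
        return lines;
    }
}
```

Calling JmxDump.readAttributes("java.lang:type=Runtime") is a quick way to confirm the helper works before pointing it at the Jetty beans. Which attributes actually carry the connection and request counts depends on the Jetty version, so treat the output as something to explore rather than a fixed schema.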
I'm using Apache HttpClient 4.5.x in my client webapp, which connects to (and logs in to) another (say, main) server webapp.
The relationship between these two webapps is many-to-many, meaning that for some user's request in the client webapp, it has to log in as another user and make a REST call in the server webapp. So some separation of cookie stores is needed, and there's no way (is there?) to get or set the cookie store after creating an HttpClient instance, so each request thread received in the client webapp does something like this (which needs optimizing):
HttpClient client = HttpClientBuilder.create().setDefaultCookieStore(new BasicCookieStore()).build();
//Now POST to login end point and get back JSESSIONID cookie and then make one REST call, and then the client object goes out of scope when the request ends.
I was hoping to ask about the best practice for caching the HttpClient instance, as it's heavy and is supposed to be reused for at least multiple requests, if not for the whole client webapp as a static singleton.
Specifically, I was hoping for advice on which of these approaches (if any) would constitute a best practice:
Use a static ConcurrentHashMap to cache the HttpClient and its associated BasicCookieStore for each "user" in the client webapp, and log in only when the cached cookie is close to its expiry time. I'm not sure about memory usage, and unused or rarely used HttpClient instances would stay in memory without eviction.
Cache only the cookie (somehow), but create a new HttpClient object whenever the need arises to use that cookie for a REST call. This saves the prior login call until the cookie expires, but there is no reuse of the HttpClient.
PooledConnectionManager: I can't find examples easily, and it might require devising an eviction strategy, a max number of threads, etc. (so it can be complex).
Is there any better way of doing this? Thanks.
References:
http://hc.apache.org/httpclient-3.x/performance.html
Generally it is recommended to have a single instance of HttpClient
per communication component or even per application
Similar at java httpclient 4.x performance guide to resolve issue
Using a ConcurrentHashMap would be the simplest way to achieve what you want to do.
Additionally, if you are using Spring, you might want to create a bean for holding the HTTP client.
Why would you want to do all this? One can assign a different CookieStore on a per request basis by using a local HttpContext.
If needed one can maintain a map of CookieStore instances per unique user.
CloseableHttpClient httpclient = HttpClients.createDefault();
CookieStore cookieStore = new BasicCookieStore();
// Create local HTTP context
HttpClientContext localContext = HttpClientContext.create();
// Bind custom cookie store to the local context
localContext.setCookieStore(cookieStore);
HttpGet httpget = new HttpGet("http://httpbin.org/cookies");
System.out.println("Executing request " + httpget.getRequestLine());
// Pass local context as a parameter
CloseableHttpResponse response = httpclient.execute(httpget, localContext);
try {
System.out.println("----------------------------------------");
System.out.println(response.getStatusLine());
List<Cookie> cookies = cookieStore.getCookies();
for (int i = 0; i < cookies.size(); i++) {
System.out.println("Local cookie: " + cookies.get(i));
}
EntityUtils.consume(response.getEntity());
} finally {
response.close();
}
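The "map of CookieStore instances per unique user" idea could look roughly like the sketch below. It is written generically so that it compiles without the HttpClient jar. PerUserStores, forUser and evict are invented names, and when to evict (cookie expiry, logout, idle time) is left open because the original answer does not specify it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

/**
 * Hypothetical registry that hands out one store per user id.
 * With HttpClient 4.5, S would be CookieStore and the factory BasicCookieStore::new.
 */
final class PerUserStores<S> {

    private final ConcurrentMap<String, S> stores = new ConcurrentHashMap<>();
    private final Supplier<S> factory;

    PerUserStores(Supplier<S> factory) {
        this.factory = factory;
    }

    /** Returns the store for this user, creating it atomically on first use. */
    S forUser(String userId) {
        return stores.computeIfAbsent(userId, id -> factory.get());
    }

    /** Drops the store for this user, e.g. once its session cookie has expired. */
    void evict(String userId) {
        stores.remove(userId);
    }

    /** Number of users that currently have a store. */
    int size() {
        return stores.size();
    }
}
```

With the code above, each request thread would keep using the single shared client and only do localContext.setCookieStore(stores.forUser(userId)) before calling execute, so the login request is needed only when the user's store has no valid session cookie yet.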
I am running the following code:
CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet httpGet = new HttpGet("http://10.0.0.22:8086/db/cadvisorDB/series?u=root&p=root&q=select%20max(memory_usage)%20from%20stats%20where%20container_name%20%3D%27execution_container_"+bench_list+"_"+i+"%27%20and%20memory_usage%20%3C%3E%200%20group%20by%20container_name");
//Thread.sleep(10000);
CloseableHttpResponse requestResponse = httpclient.execute(httpGet);
String response=EntityUtils.toString(requestResponse.getEntity());
System.out.println(response);
Output console:
[]
When I wait 30 s before reading the HttpResponse, it works. I get the complete response (JSON with data points):
Thread.sleep(30000);
Is it possible, using the Apache Java client, to tell the client to wait until it gets a value other than "[]"? I mean a non-empty JSON.
Using timeouts does not solve the problem.
Thank you in advance
Then setting the timeout will work.
HttpGet request = new HttpGet(url);
// set timeouts as you like
RequestConfig config = RequestConfig.custom()
.setSocketTimeout(60 * 1000).setConnectTimeout(20 * 1000)
.setConnectionRequestTimeout(20 * 1000).build();
request.setConfig(config);
To be specific, no it is not possible simply using HttpClient "to tell the client to wait until getting a value different than" what it gets when the call is over. You have to program this yourself (in a loop or something.)
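A minimal sketch of such a loop follows. The HTTP call is abstracted as a Supplier<String> so the example stays self-contained. NonEmptyPoller and pollUntilNonEmpty are hypothetical names, and the number of attempts and the delay are values you would have to tune yourself.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical helper: repeats a fetch until the body is something other than "[]". */
final class NonEmptyPoller {

    private NonEmptyPoller() {}

    /**
     * Calls fetch up to maxAttempts times, sleeping for delay between attempts.
     * Returns the first body that is not null, blank, or the empty JSON array,
     * or Optional.empty() if every attempt came back empty.
     */
    static Optional<String> pollUntilNonEmpty(Supplier<String> fetch, int maxAttempts, Duration delay) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String body = fetch.get();
            if (body != null) {
                String trimmed = body.trim();
                if (!trimmed.isEmpty() && !trimmed.equals("[]")) {
                    return Optional.of(body);
                }
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay.toMillis());
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and stop polling.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }
}
```

In the question's code, the supplier would wrap httpclient.execute(httpGet) followed by EntityUtils.toString(...), turning the checked IOException into an unchecked one. Bounding the number of attempts keeps the loop from spinning forever if the data never shows up.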
Does it make a difference if the sleep() is before HttpClients.createDefault() ?
Is it possible that your server at 10.0.0.22:8086 is just not ready when your code is executed? Is this server launched by the same app?
I had the same issue, but in my case the problem was two HTTP calls being made sequentially. I put a Thread.sleep(2000) between them and it worked.
Please confirm whether your code is making two REST calls sequentially.
If so, you could place a Thread.sleep just before the second HTTP call.
We have the following code, which we later replaced with the HttpHead method, as we only need to pull back the header info of our web pages. After the change, we noticed that, on average, HttpHead took longer to return than HttpGet for the same set of web pages. Is this normal? What could be wrong here?
HttpClient httpclient = new DefaultHttpClient();
// the time it takes to open TCP connection.
httpclient.getParams().setParameter(CoreConnectionPNames.CONNECTION_TIMEOUT, this.timeout);
// timeout when server does not send data.
httpclient.getParams().setParameter(CoreConnectionPNames.SO_TIMEOUT, this.timeout);
// the get method
HttpGet httpget = new HttpGet(url);
HttpResponse response = httpclient.execute(httpget);
Is it normal?
It certainly seems a bit peculiar.
What could be wrong here?
It is difficult to say. It would seem that the strange behavior is most likely on the server side. I would check the following:
Write a micro-benchmark that repeatedly GETs and HEADs the same page to make sure that the performance difference is real, and not an artifact of the way you measured it.
Use a packet logger to look at what is actually being sent and received.
Check the server logs.
Profile the server code under load using your micro-benchmark.
One possible explanation is that the HEAD is loading data from a (slow) database or file system. The following GET could then be faster because the data has already been cached. (It could be explicit caching in the server code, the query caching in the back-end database, or file system caching.) You could test for this by seeing if a GET is slower if not preceded by a HEAD.
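The micro-benchmark suggested above could be sketched along these lines. The two requests are passed in as Runnable actions, which stand in for executing the HttpGet and the HttpHead against the same URL. AlternatingBenchmark and averageNanos are hypothetical names. The order of the two actions is swapped on every round so that neither one systematically benefits from the other having warmed a cache first.

```java
/** Hypothetical timing harness for comparing two request actions. */
final class AlternatingBenchmark {

    private AlternatingBenchmark() {}

    /**
     * Runs both actions for `warmup` untimed rounds, then for `rounds` timed rounds,
     * and returns {average nanos of a, average nanos of b}.
     */
    static long[] averageNanos(Runnable a, Runnable b, int warmup, int rounds) {
        if (rounds <= 0) {
            throw new IllegalArgumentException("rounds must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            a.run();
            b.run();
        }
        long totalA = 0;
        long totalB = 0;
        for (int i = 0; i < rounds; i++) {
            if (i % 2 == 0) {
                // Even rounds: a first, then b.
                totalA += time(a);
                totalB += time(b);
            } else {
                // Odd rounds: b first, then a.
                totalB += time(b);
                totalA += time(a);
            }
        }
        return new long[] { totalA / rounds, totalB / rounds };
    }

    /** Measures the wall-clock duration of a single run in nanoseconds. */
    private static long time(Runnable action) {
        long start = System.nanoTime();
        action.run();
        return System.nanoTime() - start;
    }
}
```

This is only a rough harness. Network jitter can easily dominate the numbers, so it is worth running it with a few hundred rounds and against more than one page before drawing conclusions.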
I'm building a simple web scraper and I need to fetch the same page a few hundred times. There's an attribute in the page that is dynamic and should change on each request. I've built a multithreaded HttpClient-based class to process the requests, and I'm using an ExecutorService to create a thread pool and run the threads. The problem is that the dynamic attribute sometimes doesn't change on each request, and I end up getting the same value on 3 or 4 subsequent threads. I've read a lot about HttpClient and I really can't find where this problem comes from. Could it be something to do with caching, or something like that?
Update: here is the code executed in each thread:
HttpContext localContext = new BasicHttpContext();
HttpParams params = new BasicHttpParams();
HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);
HttpProtocolParams.setContentCharset(params,
HTTP.DEFAULT_CONTENT_CHARSET);
HttpProtocolParams.setUseExpectContinue(params, true);
ClientConnectionManager connman = new ThreadSafeClientConnManager();
DefaultHttpClient httpclient = new DefaultHttpClient(connman, params);
HttpHost proxy = new HttpHost(inc_proxy, Integer.valueOf(inc_port));
httpclient.getParams().setParameter(ConnRoutePNames.DEFAULT_PROXY,
proxy);
HttpGet httpGet = new HttpGet(url);
httpGet.setHeader("User-Agent",
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
String iden = null;
int timeoutConnection = 10000;
HttpConnectionParams.setConnectionTimeout(httpGet.getParams(),
timeoutConnection);
try {
HttpResponse response = httpclient.execute(httpGet, localContext);
HttpEntity entity = response.getEntity();
if (entity != null) {
InputStream instream = entity.getContent();
String result = convertStreamToString(instream);
// System.out.printf("Resultado\n %s",result +"\n");
instream.close();
iden = StringUtils
.substringBetween(result,
"<input name=\"iden\" value=\"",
"\" type=\"hidden\"/>");
System.out.printf("IDEN:%s\n", iden);
EntityUtils.consume(entity);
}
}
catch (ClientProtocolException e) {
// TODO Auto-generated catch block
System.out.println("ClientProtocolException");
} catch (IOException e) {
// TODO Auto-generated catch block
System.out.println("IOException");
}
HttpClient does not cache by default (that is, when you use only the DefaultHttpClient class). It does cache if you use CachingHttpClient, an HttpClient interface decorator that enables caching:
HttpClient client = new CachingHttpClient(new DefaultHttpClient(), cacheConfiguration);
It then analyzes the If-Modified-Since and If-None-Match headers to decide whether to send the request to the remote server or to return the result from the cache.
I suspect that your issue is caused by a proxy server sitting between your application and the remote server.
You can test this easily with curl. First execute a number of requests without the proxy:
#!/bin/bash
for i in {1..50}
do
echo "*** Performing request number $i"
curl -D - http://yourserveraddress.com -o $i -s
done
Then run diff across all the downloaded files. All of them should show the differences you mentioned. Next, add the -x/--proxy <host[:port]> option to curl, run the script again, and compare the files again. If some responses are identical to others, you can be sure that this is a proxy server issue.
Generally speaking, in order to test whether or not HTTP requests are being made over the wire, you can use a "sniffing" tool that analyzes network traffic, for example:
Fiddler ( http://fiddler2.com/fiddler2/ ) - I would start with this
Wireshark ( http://www.wireshark.org/ ) - more low level
I highly doubt HttpClient is performing caching of any sort (this would imply it needs to store pages in memory or on disk - not one of its capabilities).
While this is not an answer, it's a point to ponder: is it possible that the server (or some proxy in between) is returning cached content? If you are performing many requests (simultaneously or nearly simultaneously) for the same content, the server may be returning cached content because it has decided that the information has not "expired" yet. In fact, the HTTP protocol provides caching directives for exactly this purpose. Here is a site that provides a high-level overview of the different HTTP caching mechanisms:
http://betterexplained.com/articles/how-to-optimize-your-site-with-http-caching/
I hope this gives you a starting point. If you have already considered these avenues then that's great.
You could try appending some unique dummy parameter to the URL on every request to try to defeat any URL-based caching (in the server, or somewhere along the way). It won't work if caching isn't the problem, or if the server is smart enough to reject requests with unknown parameters, or if the server is caching but only based on parameters it cares about, or if your chosen parameter name collides with a parameter the site actually uses.
If this is the URL you're using
http://www.example.org/index.html
try using
http://www.example.org/index.html?dummy=1
Set dummy to a different value for each request.
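One way to generate a different value for each request is a shared counter, as in this small sketch. CacheBuster and withDummyParam are made-up names. The helper picks "?" or "&" depending on whether the URL already has a query string and keeps any "#fragment" at the end.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: appends a unique dummy query parameter to a URL. */
final class CacheBuster {

    // Shared across threads so that concurrent requests never reuse a value.
    private static final AtomicLong COUNTER = new AtomicLong();

    private CacheBuster() {}

    /** Returns the URL with paramName=<unique number> added to its query string. */
    static String withDummyParam(String url, String paramName) {
        int hash = url.indexOf('#');
        String base = hash >= 0 ? url.substring(0, hash) : url;
        String fragment = hash >= 0 ? url.substring(hash) : "";
        char separator = base.indexOf('?') >= 0 ? '&' : '?';
        return base + separator + paramName + "=" + COUNTER.incrementAndGet() + fragment;
    }
}
```

In the scraper, each thread would build its HttpGet from CacheBuster.withDummyParam(url, "dummy") instead of the raw URL. As noted above, this only helps if the cache in question keys on the full URL.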