How to cache the HttpClient object in Java?

I'm using Apache HttpClient 4.5.x in my client webapp, which connects to (and logs in to) another (say, main) server webapp.
The relationship between these two webapps is many-to-many: for some users' requests in the client webapp, it has to log in as another user and make a REST call in the server webapp. So some separation of cookie stores is needed, and there's no way (is there?) to get or set the cookie store after creating an HttpClient instance, so each request thread in the client webapp does something like this (which needs optimizing):
HttpClient client = HttpClientBuilder.create().setDefaultCookieStore(new BasicCookieStore()).build();
//Now POST to login end point and get back JSESSIONID cookie and then make one REST call, and then the client object goes out of scope when the request ends.
I was hoping to ask about the best practice for caching the HttpClient instance, as it's heavy and is supposed to be reused for at least multiple requests, if not for the whole client webapp as a static singleton.
Specifically, I was hoping for advice on which of these (if any) approaches would constitute a best practice:
Use a static ConcurrentHashMap to cache the HttpClient and its associated BasicCookieStore for each "user" in the client webapp, and log in only when the cached cookie is near its expiry time. I'm not sure about memory usage, since unused or rarely used HttpClient instances would stay in memory without eviction.
Cache only the cookie (somehow), but create a new HttpClient object whenever that cookie is needed for a REST call. This saves the prior login call until the cookie expires, but the HttpClient is not reused.
A pooling connection manager (PoolingHttpClientConnectionManager): I can't find examples easily, and it might require devising an eviction strategy, a maximum number of threads, etc. (so it can be complex).
Is there any better way of doing this? Thanks.
References:
http://hc.apache.org/httpclient-3.x/performance.html
Generally it is recommended to have a single instance of HttpClient
per communication component or even per application
Similar: java httpclient 4.x performance guide to resolve issue

Using a ConcurrentHashMap would be the simplest way to achieve what you want to do.
Additionally, if you are using Spring, you might want to create a bean for holding the HTTP client.
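A minimal sketch of the ConcurrentHashMap idea, assuming the goal is to log in once per user and log in again only when the cached session is close to expiry. The names `SessionCache`, `login`, `lifetime`, and `refreshMargin` are hypothetical and not part of HttpClient. The cached value is generic; with HttpClient 4.5 it could be a BasicCookieStore holding the JSESSIONID cookie. Eviction of idle users is not handled here.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper: caches one session object per user and re-runs
// the login function once the cached entry is close to expiry.
class SessionCache<S> {
    private static final class Entry<S> {
        final S session;
        final Instant expiresAt;

        Entry(S session, Instant expiresAt) {
            this.session = session;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentHashMap<String, Entry<S>> entries = new ConcurrentHashMap<>();
    private final Function<String, S> login;  // assumed: performs the login call for a user
    private final Duration lifetime;          // assumed: how long the server keeps a session
    private final Duration refreshMargin;     // log in again this long before expiry

    SessionCache(Function<String, S> login, Duration lifetime, Duration refreshMargin) {
        this.login = login;
        this.lifetime = lifetime;
        this.refreshMargin = refreshMargin;
    }

    // compute() is atomic per key, so two threads asking for the same user
    // at the same time trigger at most one login.
    S get(String user, Instant now) {
        Entry<S> entry = entries.compute(user, (key, old) -> {
            if (old != null && now.isBefore(old.expiresAt.minus(refreshMargin))) {
                return old; // still fresh enough, reuse it
            }
            return new Entry<>(login.apply(key), now.plus(lifetime));
        });
        return entry.session;
    }

    int size() {
        return entries.size();
    }
}
```

Passing the current time in as a parameter keeps the expiry logic easy to test. Note that the login call runs inside `compute()`, so a slow login blocks other updates to the same map bin while it is in progress.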

Why would you want to do all this? One can assign a different CookieStore on a per request basis by using a local HttpContext.
If needed one can maintain a map of CookieStore instances per unique user.
CloseableHttpClient httpclient = HttpClients.createDefault();
CookieStore cookieStore = new BasicCookieStore();
// Create local HTTP context
HttpClientContext localContext = HttpClientContext.create();
// Bind custom cookie store to the local context
localContext.setCookieStore(cookieStore);
HttpGet httpget = new HttpGet("http://httpbin.org/cookies");
System.out.println("Executing request " + httpget.getRequestLine());
// Pass local context as a parameter
CloseableHttpResponse response = httpclient.execute(httpget, localContext);
try {
    System.out.println("----------------------------------------");
    System.out.println(response.getStatusLine());
    List<Cookie> cookies = cookieStore.getCookies();
    for (int i = 0; i < cookies.size(); i++) {
        System.out.println("Local cookie: " + cookies.get(i));
    }
    EntityUtils.consume(response.getEntity());
} finally {
    response.close();
}
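The "map of CookieStore instances per unique user" mentioned above could look roughly like the following sketch. `PerUserStores` is a hypothetical name, and the class is kept generic so that it compiles without HttpClient on the classpath; with HttpClient 4.5 the type parameter would be CookieStore and the factory `BasicCookieStore::new`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical registry: one store per user id, created lazily on first use.
class PerUserStores<S> {
    private final ConcurrentHashMap<String, S> stores = new ConcurrentHashMap<>();
    private final Supplier<S> factory;

    PerUserStores(Supplier<S> factory) {
        this.factory = factory;
    }

    // computeIfAbsent ensures concurrent requests for the same user share one store.
    S forUser(String userId) {
        return stores.computeIfAbsent(userId, id -> factory.get());
    }

    // Drop a user's store (for example on logout) so the map does not grow without bound.
    void forget(String userId) {
        stores.remove(userId);
    }
}
```

Each request would then call `localContext.setCookieStore(stores.forUser(userId))` on a fresh HttpClientContext and execute against the single shared client, as in the code above.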

Related

ssl handshake on every request in multiThreaded client

The architecture is a mid-tier Liberty server that receives HTTP requests and brokers them to various back ends, some REST, some just JSON. When I configure for SSL (only through envVars, which is quite cool), it appears I get a full handshake with every request. Additionally, the server side uses a different thread for each request (may be related); this is Liberty, so it is multithreaded. The servlet has a static reference to a POJO that does all the Apache HttpClient work. I'm not using HttpClientContext (in this case). The basic flow is at the end (I'm struggling with formatting the post).
EnvVars are:
-Djavax.net.ssl.keyStore=/root/lWasServers/certs/zosConnKey.jks
-Djavax.net.ssl.trustStore=/root/lWasServers/certs/zosConnTrust.jks
-Djavax.net.ssl.keyStorePassword=fredpwd
-Dhttp.maxConnections=40
I've looked at many similar problems, but again, right now this flow does not use a client context. Hoping I'm missing something simple. The code is being appended on the first response as I continue to struggle here with FF in RHEL.
private static PoolingHttpClientConnectionManager cm = null ;
private static CloseableHttpClient httpClient = null ;
// ....
cm = new PoolingHttpClientConnectionManager();
cm.setMaxTotal(512);
cm.setDefaultMaxPerRoute(256) ;
httpClient = HttpClients.custom().setConnectionManager(cm).build() ;
// ...
responseBody = httpClient.execute(httpGet, responseHandler);
If a persistent HTTP connection is stateful and is associated with a particular security context or identity, such as SSL key or NTLM user name, HttpClient tries to make sure this connection cannot be accidentally re-used within a different security context or by a different user. Usually the most straight-forward way of letting HttpClient know that requests are logically related and belong to the same session is by executing those requests with the same HttpContext instance. See HttpClient tutorial for details. One can also disable connection state tracking if HttpClient can only be accessed by a single user or within the same security context. Use with caution.
OK, while I'm not exactly an expert at reading the SSL trace, I do believe I have resolved it. I am on a thread, but that is controlled by the server. I now pass the HttpSession in and keep a reference to the HttpClientConnection that I now create for each session. I pool these HttpClientConnection objects (rudimentary pooling, basically just get/release). So all calls within an HTTP session use the same HttpClientContext. Now it appears that I am NOT handshaking all the time. There may have been a better way to do it, but this does indeed work. I have a few gremlins to look into (socket timeouts in < 1 millisecond?), but I'm confident that I'm no longer handshaking with each request (only each time I end up creating a new context), so this is all good. Thanks,

HttpPut performance with Apache HttpClient in a multithreaded environment

These days I'm struggling with a quite weird issue regarding Apache HttpClient and threads.
The point is that I have an HttpClient shared by all the threads, and they use it to execute an HttpPut request to upload a small file (approx. 8 KB). With a small number of threads everything is all right and the times are good (200-600 milliseconds), but when we start increasing the number of concurrent threads the times are awful (8 seconds).
We checked the server to ensure the problem wasn't there. Using JMeter with the same load (1000 threads in a second) we got response times of milliseconds!!
The implementation uses a thread-safe connection manager:
PoolingHttpClientConnectionManager httpConnectionManager = new PoolingHttpClientConnectionManager();
httpConnectionManager.setMaxTotal(5000);
httpConnectionManager.setDefaultMaxPerRoute(5000);
HttpClient httpClient = HttpClientBuilder.create()
        .setConnectionManager(httpConnectionManager)
        .build();
And the threads run the following code:
HttpPut put = new HttpPut(urlStr);
put.setConfig(RequestConfig.custom()
        .setExpectContinueEnabled(true)
        .setStaleConnectionCheckEnabled(false)
        .setRedirectsEnabled(true).build());
put.setEntity(new FileEntity(new File("image.tif")));
put.setHeader("Content-Type", "image/tiff");
put.setHeader("Connection", "keep-alive");
HttpResponse response = httpClient.execute(put, HttpClientContext.create());
It looks as if there is a shared resource that has a huge impact under high load.
Looking at the source code of Apache JMeter, I don't see relevant differences with respect to this code.
Any ideas, guys?
You need to turn on debugging on the client side in order to do the following:
Verify that the pool of 5000 is actually being used to great depth. The logger will display the changing totals for "available remain in pool" and the number of the current pool entry being used.
Verify that you immediately clean up and RETURN the entry to the pool. Remember to close your resources (streams used to access the response, entity objects).
CloseableHttpResponse response;
// ...
case PUT:
    HttpPut httpPut = new HttpPut(url);
    httpPut.setProtocolVersion(new ProtocolVersion("HTTP", 1, 1));
    httpPut.setConfig(this.config);
    httpPut.setEntity(new StringEntity(data, ContentType.APPLICATION_JSON));
    response = httpClient.execute(httpPut, context);
    // ... read the response here, then hand the connection back to the pool
    httpPut.releaseConnection();
More info

difference of HttpClient singleton instance or not

As the HttpClient documentation suggests, it is generally recommended to have a single instance of HttpClient per communication component or even per application.
I got different behaviours depending on whether HttpClient is a singleton or not.
1) With a singleton, I first created a global static HttpClient instance and sent every request with this instance, using the segment below:
PostMethod post = new PostMethod(url);
int status = httpClient.executeMethod(post);
2) Without a singleton, I send every request by creating a new HttpClient instance:
PostMethod post = new PostMethod(url);
HttpClient httpClient = new HttpClient();
int status = httpClient.executeMethod(post);
The difference is that without the singleton everything is OK: I get the correct result for each of the consecutive requests. But with the singleton, it seems there is some request context, and the second request doesn't return the expected response string because of the first request's parameter (weird!!).
I don't have the service code or the server configuration. Can you help me figure out the possible reason?
Thanks in advance.

HTTPClient sends out two requests when using Basic Auth?

I have been using HTTPClient version 4.1.2 to try to access a REST over HTTP API that requires Basic Authentication. Here is client code:
DefaultHttpClient httpClient = new DefaultHttpClient(new ThreadSafeClientConnManager());
// Enable HTTP Basic Auth
httpClient.getCredentialsProvider().setCredentials(
new AuthScope(AuthScope.ANY_HOST, AuthScope.ANY_PORT),
new UsernamePasswordCredentials(this.username, this.password));
HttpHost proxy = new HttpHost(this.proxyURI.getHost(), this.proxyURI.getPort());
httpClient.getParams().setParameter(ConnRouteParams.DEFAULT_PROXY, proxy);
When I construct a POST request, like this:
HttpPost request = new HttpPost("http://my/url");
request.addHeader(new BasicHeader("Content-type", "application/atom+xml; type=entry")); // required by vendor
request.setEntity(new StringEntity("My content"));
HttpResponse response = httpClient.execute(request);
I see in Charles Proxy that there are two requests being sent. One without the Authorization: Basic ... header and one with it. The first one fails with a 401, as you would expect, but the second goes through just fine with a 201.
Does anyone know why this happens? Thanks!
EDIT:
I should make clear that I have already looked at this question, but as you can see I set the AuthScope the same way and it didn't solve my problem. Also, I am creating a new HttpClient every time I make a request (though I use the same ConnectionManager), but even if I use the same HttpClient for multiple requests, the problem still persists.
EDIT 2:
So it looks like what @LastCoder was suggesting is the way to go. See this answer to another question. The problem stems from my lack of knowledge around the HTTP spec. What I'm looking to do is called "preemptive authentication" and the HttpClient docs mention it here. Thankfully, the answer linked to above is a much shorter and cleaner way to do it.
Rather than using .setCredentials(), why don't you just encode USERNAME:PASSWORD and add the authentication header with .addHeader()?
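As a rough sketch of that suggestion, the header value for HTTP Basic authentication is the word "Basic" followed by the Base64 encoding of `username:password`. The class and method names here (`BasicAuthHeader`, `value`) are made up for illustration, and UTF-8 is assumed for the credentials.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper that builds the value of the Authorization header
// for HTTP Basic authentication.
class BasicAuthHeader {
    static String value(String username, String password) {
        String pair = username + ":" + password;
        // Assumption: credentials are encoded as UTF-8 before Base64 encoding.
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }
}
```

With the request from the question this would be used as `request.addHeader("Authorization", BasicAuthHeader.value(username, password));`, which sends the credentials on the first request instead of waiting for the 401 challenge.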
This means that your server/target endpoint is creating a new session for every client request. This forces every request of yours to go through a handshake: the client first makes the call, realizes that it needs authorization, and then follows up with the authorization. What you need to do is send the authorization preemptively as follows:
// Note: this is the HttpClient 3.x API; the HttpParams returned by getParams() in 4.x has no such method
httpClient.getParams().setAuthenticationPreemptive(true);
Just to understand the process you may log your client request headers, to give you an idea of what your client is sending and receiving:
See if this works.

Java HttpClient seems to be caching content

I'm building a simple web scraper and I need to fetch the same page a few hundred times. There's an attribute in the page that is dynamic and should change on each request. I've built a multithreaded HttpClient-based class to process the requests, and I'm using an ExecutorService to make a thread pool and run the threads. The problem is that the dynamic attribute sometimes doesn't change on each request, and I end up getting the same value on 3 or 4 subsequent threads. I've read a lot about HttpClient and I really can't find where this problem comes from. Could it be something about caching, or something like it?
Update: here is the code executed in each thread:
HttpContext localContext = new BasicHttpContext();
HttpParams params = new BasicHttpParams();
HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);
HttpProtocolParams.setContentCharset(params,
        HTTP.DEFAULT_CONTENT_CHARSET);
HttpProtocolParams.setUseExpectContinue(params, true);
ClientConnectionManager connman = new ThreadSafeClientConnManager();
DefaultHttpClient httpclient = new DefaultHttpClient(connman, params);
HttpHost proxy = new HttpHost(inc_proxy, Integer.valueOf(inc_port));
httpclient.getParams().setParameter(ConnRoutePNames.DEFAULT_PROXY,
        proxy);
HttpGet httpGet = new HttpGet(url);
httpGet.setHeader("User-Agent",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
String iden = null;
int timeoutConnection = 10000;
HttpConnectionParams.setConnectionTimeout(httpGet.getParams(),
        timeoutConnection);
try {
    HttpResponse response = httpclient.execute(httpGet, localContext);
    HttpEntity entity = response.getEntity();
    if (entity != null) {
        InputStream instream = entity.getContent();
        String result = convertStreamToString(instream);
        // System.out.printf("Resultado\n %s",result +"\n");
        instream.close();
        iden = StringUtils
                .substringBetween(result,
                        "<input name=\"iden\" value=\"",
                        "\" type=\"hidden\"/>");
        System.out.printf("IDEN:%s\n", iden);
        EntityUtils.consume(entity);
    }
} catch (ClientProtocolException e) {
    // TODO Auto-generated catch block
    System.out.println("Excepção CP");
} catch (IOException e) {
    // TODO Auto-generated catch block
    System.out.println("Excepção IO");
}
HttpClient does not cache by default (when you use only the DefaultHttpClient class). It does if you use CachingHttpClient, a decorator for the HttpClient interface that enables caching:
HttpClient client = new CachingHttpClient(new DefaultHttpClient(), cacheConfiguration);
It then analyzes the If-Modified-Since and If-None-Match headers to decide whether a request to the remote server is performed or the result is returned from the cache.
I suspect that your issue is caused by a proxy server standing between your application and the remote server.
You can test this easily with curl; execute a number of requests bypassing the proxy:
#!/bin/bash
for i in {1..50}
do
    echo "*** Performing request number $i"
    curl -D - http://yourserveraddress.com -o "$i" -s
done
And then, execute diff between all downloaded files. All of them should have differences you mentioned. Then, add -x/--proxy <host[:port]> option to curl, execute this script and compare files again. If some responses are the same as others, then you can be sure that this is proxy server issue.
Generally speaking, in order to test whether or not HTTP requests are being made over the wire, you can use a "sniffing" tool that analyzes network traffic, for example:
Fiddler ( http://fiddler2.com/fiddler2/ ) - I would start with this
Wireshark ( http://www.wireshark.org/ ) - more low level
I highly doubt HttpClient is performing caching of any sort (this would imply it needs to store pages in memory or on disk - not one of its capabilities).
While this is not an answer, it's a point to ponder: is it possible that the server (or some proxy in between) is returning cached content to you? If you are performing many requests (simultaneously or nearly simultaneously) for the same content, the server may be returning cached content because it has decided that the information has not "expired" yet. In fact, the HTTP protocol provides caching directives for such functionality. Here is a site that provides a high-level overview of the different HTTP caching mechanisms:
http://betterexplained.com/articles/how-to-optimize-your-site-with-http-caching/
I hope this gives you a starting point. If you have already considered these avenues then that's great.
You could try appending some unique dummy parameter to the URL on every request to try to defeat any URL-based caching (in the server, or somewhere along the way). It won't work if caching isn't the problem, or if the server is smart enough to reject requests with unknown parameters, or if the server is caching but only based on parameters it cares about, or if your chosen parameter name collides with a parameter the site actually uses.
If this is the URL you're using
http://www.example.org/index.html
try using
http://www.example.org/index.html?dummy=1
Set dummy to a different value for each request.
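A small sketch of that idea: append a `dummy` parameter with a fresh value to each URL. `CacheBuster` and its methods are hypothetical names; the sketch assumes the value is already URL-safe (a random UUID is) and that the site does not itself use a parameter called `dummy`.

```java
import java.util.UUID;

// Hypothetical helper that appends a unique dummy query parameter to a URL.
class CacheBuster {
    static String withDummy(String url, String value) {
        // Keep any #fragment at the very end of the URL.
        String fragment = "";
        int hash = url.indexOf('#');
        if (hash >= 0) {
            fragment = url.substring(hash);
            url = url.substring(0, hash);
        }
        // Use '&' if the URL already has a query string, '?' otherwise.
        char separator = url.indexOf('?') >= 0 ? '&' : '?';
        return url + separator + "dummy=" + value + fragment;
    }

    // A random UUID gives a different value on every call.
    static String withRandomDummy(String url) {
        return withDummy(url, UUID.randomUUID().toString());
    }
}
```

Each thread would then build its request with something like `new HttpGet(CacheBuster.withRandomDummy(url))`.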
