How can I forcefully cache all my HTTP responses? - java

I'm using the DefaultHTTPClient to make some HTTP GET requests. I'd like to forcefully cache all the HTTP responses for a week. After going through the docs and some SO answers, I've done this:
I installed an HttpResponseCache via the onCreate method of my main activity.
try {
    File httpCacheDir = new File(getApplicationContext().getCacheDir(), "http");
    long httpCacheSize = 10 * 1024 * 1024; // 10 MiB
    HttpResponseCache.install(httpCacheDir, httpCacheSize);
} catch (IOException e) {
    Log.i("dd", "HTTP response cache installation failed:" + e);
}
I added a custom HttpResponseInterceptor for my HTTP client, but I still don't get a cache hit. Here's my response interceptor that decompresses GZIPped content, strips caching headers and adds a custom one:
class Decompressor implements HttpResponseInterceptor {
    public void process(HttpResponse hreResponse, HttpContext hctContext) throws HttpException, IOException {
        hreResponse.removeHeaders("Expires");
        hreResponse.removeHeaders("Pragma");
        hreResponse.removeHeaders("Cache-Control");
        hreResponse.addHeader("Cache-Control", "max-age=604800");
        HttpEntity entity = hreResponse.getEntity();
        if (entity != null) {
            Header ceheader = entity.getContentEncoding();
            if (ceheader != null) {
                HeaderElement[] codecs = ceheader.getElements();
                for (int i = 0; i < codecs.length; i++) {
                    if (codecs[i].getName().equalsIgnoreCase("gzip")) {
                        hreResponse.setEntity(new HttpEntityWrapper(entity) {
                            @Override
                            public InputStream getContent() throws IOException, IllegalStateException {
                                return new GZIPInputStream(wrappedEntity.getContent());
                            }

                            @Override
                            public long getContentLength() {
                                return -1;
                            }
                        });
                        return;
                    }
                }
            }
        }
    }
}
Here's how I make my request:
String strResponse = null;
HttpGet htpGet = new HttpGet(strUrl);
htpGet.addHeader("Accept-Encoding", "gzip");
htpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20100101 Firefox/15.0.1");
DefaultHttpClient dhcClient = new DefaultHttpClient();
dhcClient.addResponseInterceptor(new Decompressor(), 0);
HttpResponse resResponse = dhcClient.execute(htpGet);
Log.d("helpers.network", String.format("Cache hit count: %d", HttpResponseCache.getInstalled().getHitCount()));
strResponse = EntityUtils.toString(resResponse.getEntity());
return strResponse;
I can't seem to pinpoint what I'm doing wrong. Would any of you know?

Not sure if this answers your question, but instead of relying on an HTTP server interpreting your cache control headers, have you thought about simply adding a client-side cache using Android's own cache directories?
What we did in ignition is simply write server responses to disk as byte streams, thus having full control over caching (and expiring caches.)
There's a sample app here, but it would require you to use the library's HTTP API (which, however, is merely a thin wrapper around HttpClient.) Or simply look at how the cache works and go from there.
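For reference, a minimal sketch of that idea (not the actual ignition code): write each response body to a file in the app's cache directory, keyed by a hash of the URL, and treat entries older than a week as expired. The class name, file naming and the TTL are all illustrative:
import android.content.Context;
import java.io.*;

// Minimal sketch of a hand-rolled client-side disk cache (not the ignition implementation).
public class SimpleDiskCache {
    private static final long MAX_AGE_MS = 7L * 24 * 60 * 60 * 1000; // one week
    private final File cacheDir;

    public SimpleDiskCache(Context context) {
        cacheDir = new File(context.getCacheDir(), "http-manual");
        cacheDir.mkdirs();
    }

    private File fileFor(String url) {
        // Hash the URL so it is safe to use as a file name.
        return new File(cacheDir, Integer.toHexString(url.hashCode()));
    }

    /** Returns the cached bytes, or null if the entry is missing or older than a week. */
    public byte[] get(String url) throws IOException {
        File f = fileFor(url);
        if (!f.exists() || System.currentTimeMillis() - f.lastModified() > MAX_AGE_MS) {
            return null;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        InputStream in = new FileInputStream(f);
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } finally {
            in.close();
        }
        return out.toByteArray();
    }

    /** Stores a response body for the given URL. */
    public void put(String url, byte[] body) throws IOException {
        OutputStream out = new FileOutputStream(fileFor(url));
        try {
            out.write(body);
        } finally {
            out.close();
        }
    }
}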

I failed miserably with this. The Android documentation for HttpResponseCache specifically says: "Caches HTTP and HTTPS responses to the filesystem so they may be reused, saving time and bandwidth. This class supports HttpURLConnection and HttpsURLConnection; there is no platform-provided cache for DefaultHttpClient or AndroidHttpClient."
So that was out.
Now, Apache's HTTP client has a CachingHttpClient, and the newer version of HttpClient has been back-ported to Android through this project. Of course, I could use this.
I didn't want to use the hackish back-port of the Apache HttpClient libraries, so one idea was to cannibalise the caching-related bits from HttpClient and roll my own, but it was too much work.
I even considered moving to the recommended HttpURLConnection class, but I ran into other issues. There doesn't seem to be a good cookie-persistence implementation for that class.
Anyway, I skipped all of that and thought: since I'm reducing load time by caching, why not go a step further? I was already using jsoup to scrape records from the page into an ArrayList of a custom structure, so I might as well serialize the whole ArrayList by having my structure implement Serializable. Now I don't have to wait for the page request, nor do I have to wait for jsoup's parsing slowness. Win.
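A minimal sketch of that serialization approach (the Record class and the file handling are illustrative, not the original code):
import java.io.*;
import java.util.ArrayList;

public class RecordCache {

    /** Placeholder for the scraped structure; it only needs to implement Serializable. */
    public static class Record implements Serializable {
        private static final long serialVersionUID = 1L;
        public String title;
        public String value;
    }

    /** Writes the parsed list to disk, e.g. into context.getCacheDir(). */
    public static void save(File file, ArrayList<Record> records) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file));
        try {
            out.writeObject(records);
        } finally {
            out.close();
        }
    }

    /** Reads the list back, skipping both the page request and the jsoup parsing. */
    @SuppressWarnings("unchecked")
    public static ArrayList<Record> load(File file) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new FileInputStream(file));
        try {
            return (ArrayList<Record>) in.readObject();
        } finally {
            in.close();
        }
    }
}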

Related

HTTP request gets authorized and unauthorized on different environments with as it seems same setup

We have a funny situation where a basic GET HTTP request doesn't pass Windows NTLM authorization at the IIS server. At the same time, the same code running on another environment executes successfully.
When we repeat the request via a browser, it succeeds.
It seems that somehow the Java code doesn't send the correct authorization information with the request. We are trying to figure out how this can be. The classes used are from the java.net package.
We tried switching the account under which Tomcat runs to Local System and back, with no success.
Code is as simple as it can be:
public static String sendHttpRequest(String urlString, String method, boolean disconnect) {
    HttpURLConnection urlConnection = null;
    String result = null;
    try {
        URL url = new URL(urlString);
        urlConnection = (HttpURLConnection) url.openConnection();
        urlConnection.addRequestProperty("Content-Length", "0");
        urlConnection.setRequestMethod(method);
        urlConnection.setUseCaches(false);
        StringBuilder sb = new StringBuilder();
        try (InputStream is = urlConnection.getInputStream()) {
            InputStream buffer = new BufferedInputStream(is);
            Reader reader = new InputStreamReader(buffer, "UTF-8");
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        int statusCode = urlConnection.getResponseCode();
        if (statusCode >= 200 && statusCode < 300) {
            result = sb.toString();
        }
    } catch (IOException e) {
        LOGGER.warning(e.getMessage());
    } finally {
        if (disconnect && urlConnection != null) {
            urlConnection.disconnect();
        }
    }
    return result;
}
Explicit questions to answer:
How do I log/trace/debug the information used for authentication purposes on the client side? Any hint would be appreciated :)
Apache HttpClient in its newer versions supports native Windows Negotiate, Kerberos and NTLM via SSPI through JNA when running on a Windows OS. So if you have the option to use a newer version (from 4.4, I believe), this is a non-issue.
For example:
http://hc.apache.org/httpcomponents-client-4.4.x/httpclient-win/examples/org/apache/http/examples/client/win/ClientWinAuth.java
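The gist of that example, sketched here from the httpclient-win module (I have not run this exact code, and the target URL is a placeholder):
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.WinHttpClients;
import org.apache.http.util.EntityUtils;

public class WinAuthSketch {
    public static void main(String[] args) throws Exception {
        if (!WinHttpClients.isWinAuthAvailable()) {
            System.out.println("Integrated Windows authentication is not available on this platform.");
            return;
        }
        // Authenticates with the credentials of the Windows account running the JVM, via SSPI/JNA.
        CloseableHttpClient client = WinHttpClients.createDefault();
        try {
            HttpGet get = new HttpGet("http://iis-server.example.com/protected/"); // placeholder URL
            CloseableHttpResponse response = client.execute(get);
            try {
                System.out.println(response.getStatusLine());
                EntityUtils.consume(response.getEntity());
            } finally {
                response.close();
            }
        } finally {
            client.close();
        }
    }
}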
As far as I am aware, vanilla JDKs themselves do not have built-in NTLM support. That would mean you have to wire up the steps of the protocol yourself.
I am writing this based on my experience: I had to roll a whole multi-round SPNEGO implementation myself, supporting both the Oracle and IBM JREs. It was (not) fun. Although that was only the server side of the fun, I remember stumbling over this feature missing on the Java client side too (because it was SPNEGO and not plain NTLM, and the client was a browser, I could skip that part).
This may have changed with the new HTTP Client in Java 11, I do not have any experience with that yet.
In order to debug the authentication exchange, you can add the command line parameter:
-Djavax.net.debug=ssl,handshake
This way you get step-by-step output of the handshake process on the client.
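Note that javax.net.debug traces the SSL/TLS layer. To see the HTTP-level authentication headers themselves, one option (my addition, not part of the answer above) is to turn on the JDK's own HttpURLConnection logging via java.util.logging; the logger name below is JDK-internal and may change between versions:
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class HttpWireLogging {
    // Keep a strong reference so the logger configuration is not garbage-collected.
    private static final Logger HTTP_LOGGER =
            Logger.getLogger("sun.net.www.protocol.http.HttpURLConnection");

    public static void enable() {
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.ALL);
        HTTP_LOGGER.setLevel(Level.ALL);
        HTTP_LOGGER.addHandler(handler);
        // Request/response headers, including WWW-Authenticate and Authorization,
        // should now appear on the console.
    }
}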

Java LittleProxy (on top of Netty): How to access the POST Body data?

I followed the simple example shown at GitHub:LittleProxy and added the following in the clientToProxyRequest(HttpObject httpObject) method.
public HttpResponse clientToProxyRequest(HttpObject httpObject)
{
    if (httpObject instanceof DefaultHttpRequest)
    {
        DefaultHttpRequest httpRequest = (DefaultHttpRequest) httpObject;
        logger.info(httpRequest.getUri());
        logger.info(httpRequest.toString());
        // How to access the POST body data?
        HttpPostRequestDecoder d = new HttpPostRequestDecoder(httpRequest);
        d.getBodyHttpDatas(); // NotEnoughDataDecoderException
    }
    return null;
}
The logger reports this; IMO only these two headers are relevant here. It's a POST request and there is content:
POST http://www.... HTTP/1.1
Content-Length: 522
Looking into the Netty API documentation, HttpPostRequestDecoder seems promising, but I get a NotEnoughDataDecoderException. The Netty JavaDoc says the following, but I don't know how to offer the data:
This method returns a List of all HttpDatas from the body.
If chunked, all chunks must have been offered using the offer() method. If not, NotEnoughDataDecoderException will be raised.
In fact I'm also unsure if this is the right approach to get the POST data in the proxy.
Try adding this in your HttpFiltersSourceAdapter to avoid the NotEnoughDataDecoderException:
@Override
public int getMaximumRequestBufferSizeInBytes() {
    return 1048576;
}
1048576 here is the maximum length of the aggregated content. See POSTing data to netty with Apache HttpClient.
This enables decompression and aggregation of content; see the source code in org.littleshoot.proxy.impl.ClientToProxyConnection:
// Enable aggregation for filtering if necessary
int numberOfBytesToBuffer = proxyServer.getFiltersSource()
        .getMaximumRequestBufferSizeInBytes();
if (numberOfBytesToBuffer > 0) {
    aggregateContentForFiltering(pipeline, numberOfBytesToBuffer);
}
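With aggregation enabled, the filter should receive a FullHttpRequest (which carries the body) instead of a bare DefaultHttpRequest. A sketch of reading the body at that point, inside the same HttpFilters implementation as in the question (untested, based on the Netty APIs referenced above):
// Assumes: import io.netty.handler.codec.http.FullHttpRequest;
//          import io.netty.handler.codec.http.multipart.HttpPostRequestDecoder;
//          import io.netty.handler.codec.http.multipart.InterfaceHttpData;
//          import java.nio.charset.StandardCharsets;
@Override
public HttpResponse clientToProxyRequest(HttpObject httpObject) {
    if (httpObject instanceof FullHttpRequest) {
        FullHttpRequest fullRequest = (FullHttpRequest) httpObject;
        // Raw body as a string:
        logger.info(fullRequest.content().toString(StandardCharsets.UTF_8));
        // Or, for form-encoded / multipart bodies, the decoder now has all the data it needs:
        HttpPostRequestDecoder decoder = new HttpPostRequestDecoder(fullRequest);
        for (InterfaceHttpData data : decoder.getBodyHttpDatas()) {
            logger.info(data.toString());
        }
        decoder.destroy();
    }
    return null;
}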

How to selectively GZIP encode POST and PUT requests

I'm using Jersey on both the server and client of a web application. On the server I have Interceptors as noted in https://jersey.java.net/documentation/latest/filters-and-interceptors.html to handle GZIP compression going out and coming in. From the server side, it's easy enough to select which resource methods are compressed using the #Compress annotation. However, if I also want to selectively compress entities from the Client to the Server, what's the best way to do that?
I had started adding a Content-Encoding: x-gzip header to the request, but my client side Interceptor does not see that header (presumably because it's not an official client side header).
Before you point to section 10.6 of the Jersey documentation, note that this works for the Server side. Although I could do something similar on the Client, I don't want to restrict it by URL. I'd rather control the compression flag as close to the request as possible (i.e. Header?).
Here's what I have so far, but it does not work since my header is removed:
class GzipWriterClientInterceptor implements WriterInterceptor {
    private static final Set<String> supportedEncodings = new GZipEncoder().getSupportedEncodings(); // supports gzip and x-gzip

    @Override
    public void aroundWriteTo(WriterInterceptorContext context) throws IOException, WebApplicationException {
        if (supportedEncodings.contains(context.getHeaders().getFirst(HttpHeaderConstants.CONTENT_ENCODING_HEADER))) {
            System.out.println("ZIPPING DATA");
            final OutputStream outputStream = context.getOutputStream();
            context.setOutputStream(new GZIPOutputStream(outputStream));
        } else {
            // Remove the header since we won't actually be compressing the data
            context.getHeaders().remove(HttpHeaderConstants.CONTENT_ENCODING_HEADER);
        }
        context.proceed();
    }
}
Sample Request:
Response response = getBaseTarget().path(getBasePath()).path(graphic.uuid.toString())
        .request(DEFAULT_MEDIA_TYPE)
        .header(HttpHeaderConstants.CONTENT_ENCODING_HEADER, MediaTypeConstants.ENCODING_GZIP)
        .put(Entity.entity(graphic, DEFAULT_MEDIA_TYPE));
I also have a logging filter as well that shows all the request headers. I've simplified the above, but all other headers I add are logged.
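For reference, the interceptor only runs if it is registered on the Client; assuming a standard JAX-RS 2.x client setup, that looks roughly like this:
// Sketch: registering the interceptor on the JAX-RS client behind getBaseTarget()
// (javax.ws.rs.client.Client / ClientBuilder).
Client client = ClientBuilder.newClient()
        .register(GzipWriterClientInterceptor.class);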

How can I tell when HttpClient.execute() is finished fetching all content in a large request?

I'm working with this issue on Android but it's not an Android-specific problem.
Using org.apache.http.client.HttpClient, I can make a request to a URL of size 1kb and the entire response is contained within HttpResponse:
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet("http://10.16.83.67/1kb.log");
HttpResponse response = null;
BufferedReader rd = null;
response = client.execute(request);
I can then get the HttpEntity from the response:
HttpEntity entity = response.getEntity();
rd = new BufferedReader(new InputStreamReader(entity.getContent()));
And then, with BufferedReader...
while ((line = rd.readLine()) != null) {
// ... parse or whatever
}
I'm watching WireShark and I see one transmit in each direction for the above code: the request for the log file, and then the entire log is delivered in the response.
However, if I request something larger, say, my 1MB log file, I see something very different. The data is chunked into frames and then streamed over the wire within the rd.readLine() loop.
It seems the first kb or so is included in the initial response. But then as readLine() runs, it makes additional requests to the server and the data is streamed to the socket. If the network connection is interrupted, I get an IO error. For large requests, entity.isStreaming() is true.
This is an asynchronous call (mandatory on Android, since network calls cannot be made on the UI thread) but I don't want to continue until I'm sure that I'm done receiving all of the data from this request. Simply waiting an amount of time and then continuing and hoping for the best is, unfortunately, not an option.
My question is this: do my HttpClient, HttpGet, HttpResponse, or HttpEntity objects ever know when they are done receiving data from this request? Or do I have to rely on BufferedReader to know when the stream is closed?
Are you making use of AsyncTask? If not, you could go for this:
private class DownloadFilesTask extends AsyncTask<URL, Integer, Long> {
    protected Long doInBackground(URL... urls) {
        // do all the HTTP requests in this method
        return null;
    }

    protected void onProgressUpdate(Integer... progress) {
        setProgressPercent(progress[0]);
    }

    protected void onPostExecute(Long result) {
        showDialog("I pop up when all the code in doInBackground is finished");
    }
}
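Tying that back to the question: EntityUtils.toString() (or your readLine() loop hitting null) only returns once the entity's stream has been read to the end, so when doInBackground returns, the full body has been received. A sketch along those lines (the task class name and URL handling are illustrative):
// Sketch: the whole body is in hand by the time onPostExecute runs.
private class FetchLogTask extends AsyncTask<String, Void, String> {
    @Override
    protected String doInBackground(String... urls) {
        try {
            HttpClient client = new DefaultHttpClient();
            HttpResponse response = client.execute(new HttpGet(urls[0]));
            // EntityUtils.toString() reads the stream until EOF (chunked or not),
            // so returning from here means the complete response has arrived.
            return EntityUtils.toString(response.getEntity());
        } catch (IOException e) {
            return null;
        }
    }

    @Override
    protected void onPostExecute(String body) {
        // Runs on the UI thread only after the complete response has been read
        // (body is null if an error occurred).
    }
}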
Which HttpClient are you using? You may want to evaluate that.
There are good HttpClient libraries that allow using stuff like this.
Note this example is from 3.x; I use 4.x and don't have any chunked-response code to post.
If you want to look for new options, you might check here.

Java HttpClient seems to be caching content

I'm building a simple web scraper and I need to fetch the same page a few hundred times. There's an attribute in the page that is dynamic and should change on each request. I've built a multithreaded HttpClient-based class to process the requests and I'm using an ExecutorService to create a thread pool and run the threads. The problem is that the dynamic attribute sometimes doesn't change on each request and I end up getting the same value on 3 or 4 subsequent threads. I've read a lot about HttpClient and I really can't find where this problem comes from. Could it be something about caching, or something like it?
Update: here is the code executed in each thread:
HttpContext localContext = new BasicHttpContext();
HttpParams params = new BasicHttpParams();
HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);
HttpProtocolParams.setContentCharset(params, HTTP.DEFAULT_CONTENT_CHARSET);
HttpProtocolParams.setUseExpectContinue(params, true);

ClientConnectionManager connman = new ThreadSafeClientConnManager();
DefaultHttpClient httpclient = new DefaultHttpClient(connman, params);

HttpHost proxy = new HttpHost(inc_proxy, Integer.valueOf(inc_port));
httpclient.getParams().setParameter(ConnRoutePNames.DEFAULT_PROXY, proxy);

HttpGet httpGet = new HttpGet(url);
httpGet.setHeader("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");

String iden = null;
int timeoutConnection = 10000;
HttpConnectionParams.setConnectionTimeout(httpGet.getParams(), timeoutConnection);

try {
    HttpResponse response = httpclient.execute(httpGet, localContext);
    HttpEntity entity = response.getEntity();
    if (entity != null) {
        InputStream instream = entity.getContent();
        String result = convertStreamToString(instream);
        // System.out.printf("Result\n %s", result + "\n");
        instream.close();
        iden = StringUtils.substringBetween(result,
                "<input name=\"iden\" value=\"",
                "\" type=\"hidden\"/>");
        System.out.printf("IDEN:%s\n", iden);
        EntityUtils.consume(entity);
    }
} catch (ClientProtocolException e) {
    System.out.println("ClientProtocolException");
} catch (IOException e) {
    System.out.println("IOException");
}
HttpClient does not use a cache by default (when you use the plain DefaultHttpClient class). It does so if you use CachingHttpClient, which is an HttpClient interface decorator that enables caching:
HttpClient client = new CachingHttpClient(new DefaultHttpClient(), cacheConfiguration);
It then analyzes the If-Modified-Since and If-None-Match headers to decide whether a request to the remote server is performed, or whether the result is returned from the cache.
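The cacheConfiguration argument is an org.apache.http.impl.client.cache.CacheConfig; a rough sketch of wiring it up (the sizes below are arbitrary examples):
// Sketch: configuring the caching module from the httpclient-cache 4.1/4.2 era (matching CachingHttpClient).
CacheConfig cacheConfiguration = new CacheConfig();
cacheConfiguration.setMaxCacheEntries(1000);
cacheConfiguration.setMaxObjectSizeBytes(512 * 1024); // cache bodies up to 512 KiB

HttpClient client = new CachingHttpClient(new DefaultHttpClient(), cacheConfiguration);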
I suspect that your issue is caused by a proxy server standing between your application and the remote server.
You can test this easily with curl; execute a number of requests omitting the proxy:
#!/bin/bash
for i in {1..50}
do
    echo "*** Performing request number $i"
    curl -D - http://yourserveraddress.com -o $i -s
done
Then run diff between all the downloaded files. All of them should show the differences you mentioned. Next, add the -x/--proxy <host[:port]> option to curl, execute the script again and compare the files once more. If some responses are the same as others, then you can be sure that this is a proxy server issue.
Generally speaking, in order to test whether or not HTTP requests are being made over the wire, you can use a "sniffing" tool that analyzes network traffic, for example:
Fiddler ( http://fiddler2.com/fiddler2/ ) - I would start with this
Wireshark ( http://www.wireshark.org/ ) - more low level
I highly doubt HttpClient is performing caching of any sort (this would imply it needs to store pages in memory or on disk - not one of its capabilities).
While this is not an answer, it's a point to ponder: is it possible that the server (or some proxy in between) is returning cached content? If you are performing many requests (simultaneously or near-simultaneously) for the same content, the server may be returning cached content because it has decided that the information has not "expired" yet. In fact, the HTTP protocol provides caching directives for exactly this. Here is a site that provides a high-level overview of the different HTTP caching mechanisms:
http://betterexplained.com/articles/how-to-optimize-your-site-with-http-caching/
I hope this gives you a starting point. If you have already considered these avenues then that's great.
You could try appending some unique dummy parameter to the URL on every request to try to defeat any URL-based caching (in the server, or somewhere along the way). It won't work if caching isn't the problem, or if the server is smart enough to reject requests with unknown parameters, or if the server is caching but only based on parameters it cares about, or if your chosen parameter name collides with a parameter the site actually uses.
If this is the URL you're using
http://www.example.org/index.html
try using
http://www.example.org/index.html?dummy=1
Set dummy to a different value for each request.
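In Java that could look like the following (the parameter name is arbitrary; it only has to be ignored by the server):
// Sketch: defeat URL-based caching by making every request URL unique.
String url = "http://www.example.org/index.html?dummy=" + System.nanoTime();
HttpGet httpGet = new HttpGet(url);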
