Reading from response body multiple times in Apache HttpClient 4.x - java

I'm using the Apache HttpClient 4.2.3 in my application. We store the response of an HTTP call like so:
HttpResponse httpResponse = httpClient.execute(httpRequest);
The response body is an InputStream in the 4.x API:
InputStream responseStream = httpResponse.getEntity().getContent();
My problem is that I need to read the response body as a string and as a byte[] at various points in the application. But the InputStream used by Apache is an EofSensorInputStream, which means that once I reach the stream's EOF, it gets closed. Is there any way I can get the string and byte[] representations multiple times and not close the stream?
I've already tried wrapping the byte array in a new ByteArrayInputStream and setting that as the response body, but it doesn't work since my response body can reach a few gigs. I've also tried this, but I noticed the original response stream still gets closed.
Any pointers would be welcome.
EDIT: On a related note, it would also be great if I could find the length of the InputStream, either without consuming the stream or by reversing the consumption.

1. I think you have somewhat conflicting requirements:
a)
it doesn't work since my response body can reach a few gigs
b)
Is there any way I can get the string and byte[] representations multiple times and not close the stream
If you do not have enough memory this is not possible.
Btw, another way to get the response as bytes is EntityUtils.toByteArray(HttpEntity entity), which returns a byte[].
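For example:
byte[] body = EntityUtils.toByteArray(httpResponse.getEntity());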
Do you really need an N-gigabyte String? What are you going to do with it?
2.
it would also be great if I could find the length of the InputStream
httpResponse.getEntity().getContentLength()
Note that getContentLength() returns a negative value when the length is unknown, for example with chunked transfer encoding.
3. Since the response does not fit into memory, I would suggest saving it to a file (or a temp file). Then open an InputStream on that file and read it as many times as you need.
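A minimal sketch of that approach, assuming Java 7+ for try-with-resources:
File tmp = File.createTempFile("response", ".bin");
try (InputStream in = httpResponse.getEntity().getContent();
     OutputStream out = new FileOutputStream(tmp)) {
    byte[] buf = new byte[8192];
    int n;
    while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n); // copy the body to disk exactly once
    }
}
// Re-open the file as often as needed:
try (InputStream replay = new FileInputStream(tmp)) {
    // read as a string, a byte[], etc.
}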

Related

Unzip http response

I'm a beginner in Java, trying to decompress an HTTP response in gzip format. Roughly, I have a BufferedReader which allows me to read lines of the HTTP response from a socket. Thanks to that, I parse the HTTP header, and if it specifies that the body is in gzip format then I have to decompress it. Here is the code I use:
DataInputStream response = new DataInputStream(clientSideSocket.getInputStream());
BufferedReader buffer = new BufferedReader(new InputStreamReader(response))
header = parseHTTPHeader(buffer); // return a map<String,String> with header options
StringBuilder SBresponseBody = new StringBuilder();
String responseBody = new String();
String line;
while ((line = buffer.readLine()) != null) // extract the body as if it was a string...
    SBresponseBody.append(line);
responseBody = SBresponseBody.toString();
if (header.get("Content-Encoding").contains("gzip"))
    responseBody = unzip(responseBody); // the function I'm trying to construct
My attempt for the unzip function is as follows:
private String unzip(String body) throws IOException {
    String responseBody = "";
    byte[] readBuffer = new byte[5000];
    GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(body.getBytes()));
    int read = gzip.read(readBuffer, 0, readBuffer.length);
    gzip.close();
    byte[] result = Arrays.copyOf(readBuffer, read);
    responseBody = new String(result, "UTF-8");
    return responseBody;
}
I get an error in the GZIPInputStream: not GZIP format (because gzip header is not found in body).
Here are my thoughts:
• Is body.getBytes() wrong since the body has been read by a BufferedReader as a character string, and therefore converting it back to byte[] makes no sense because it has already been interpreted the wrong way? Or am I reconverting the String body to byte[] in the wrong way?
• Do I have to build a GZIP header myself using the information provided in the HTTP header and add it to the String body?
• Do I need to create another InputStream from my socket.getInputStream() to read the information byte by byte, or is it tricky since there is already a buffer "connected" to this socket?
Roughly, I have a BufferedReader which allows me to read lines of the HTTP response from a socket.
You've hand-rolled an HTTP client.
This is not a good thing; HTTP is considerably more complicated than you think it is. gzip is just one of about 10,000 things you need to think about. There's HTTP/2, SPDY, HTTP/3, chunked transfer encoding, TLS, redirects, MIME packing, and so much more to think about.
So, if you want to write an actual HTTP client, you need about 100x this code and a ton of domain knowledge, because the actual specs of the HTTP protocol, while handy, don't really tell the story. The de-facto protocol you're implementing is 'whatever servers connected to the internet tend to send' and what they tend to send is tightly wound up with 'whatever commonly used browsers tend to get right', which is almost, but not quite, what that spec document says. This is one of those cases where pragmatics and implementations are the 'real spec', and the actual spec is merely attempting to document reality.
That's a long way around to say: your mistake is trying to hand-roll an HTTP client. Don't do that. Use OkHttp or the HttpClient introduced in JDK 11 in the core libraries.
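For reference, a minimal GET with the JDK 11 client looks like this (the URL is a placeholder; uses java.net.http.HttpClient, HttpRequest, and HttpResponse):
HttpClient client = HttpClient.newHttpClient();
HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com/")).build();
HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
System.out.println(response.body()); // send(...) throws IOException and InterruptedException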
But, I know what I want!
Your code is loaded up with bugs, though.
DataInputStream response = new DataInputStream(clientSideSocket.getInputStream());
DataInputStream is useless here. Remove that wrapper.
BufferedReader buffer = new BufferedReader(new InputStreamReader(response))
Missing semicolon. Also, this is broken: it will convert the bytes flowing over the wire to characters using the platform default encoding, which is wrong; you need to look at the Content-Type header.
responseBody = unzip(responseBody)
You cannot do this. Your major misunderstanding is that you appear to think that there is no difference between a bunch of bytes and a sequence of characters.
That's wrong. Once you have stored bytes into chars, you cannot unzip them anymore.
The fix is to check for the gzip header FIRST, then wrap your InputStream in a GZIPInputStream.
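A minimal sketch of that order of operations, reusing response (the raw socket stream) and header from the question, and glossing over the fact that the header lines must be parsed without buffering past the end of the headers. The UTF-8 charset is an assumption; the real one should come from Content-Type:
InputStream body = response; // still raw bytes at this point
String contentEncoding = header.get("Content-Encoding");
if (contentEncoding != null && contentEncoding.contains("gzip")) {
    body = new GZIPInputStream(body); // java.util.zip.GZIPInputStream
}
// Only now convert bytes to characters.
BufferedReader reader = new BufferedReader(new InputStreamReader(body, "UTF-8"));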

HttpUrlConnection gets response body on connect()

Consider the following code.
try {
    httpURLConnection = (HttpURLConnection) new URL(strings[0]).openConnection();
    httpURLConnection.setConnectTimeout(Config.HTTP_CONNECTION_TIMEOUT);
    httpURLConnection.setReadTimeout(Config.HTTP_CONNECTION_TIMEOUT);
    httpURLConnection.connect();
    responseCode = httpURLConnection.getResponseCode();
    httpURLConnection.getHeaderFields();
} finally {
    httpURLConnection.disconnect();
}
The issue is that even when I don't use the InputStream to read the response, I can see the response body in my Internet/WiFi connection logs. What I want is simply to check a field in the header and, based upon that field, decide whether to continue reading the InputStream.
My questions are these:
Is it correct behavior for the connected stream to automatically download all or part of the file even before a BufferedInputStream is created and read from?
If yes, then is it possible to stop the file download until an InputStream is used to read the response?
If not then is there something I am doing wrong or missing?
The response includes both the header and the body; the server does not wait for the client to acknowledge the headers before sending the body.
By the time the client is able to read the response code from the headers, a part of the body has already been sent, the size of which depends on network latency, buffering, and so on.
The current implementation of HttpURLConnection.getResponseCode() even uses getInputStream() to ensure that the connection is in the correct state.
The client can choose to ignore the body, but that's usually not recommended, because it may prevent a persistent connection from being reused.
I am not sure about Android but since Java 6, a background thread is automatically used to read the remaining data.
If If-Modified-Since is not an option, why not use a HEAD request?
The HTTP HEAD method requests the headers that are returned if the specified resource would be requested with an HTTP GET method. Such a request can be done before deciding to download a large resource to save bandwidth, for example.
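A minimal sketch of that approach with HttpURLConnection; the URL, the timeout values, and the X-My-Header name are placeholders:
HttpURLConnection head = (HttpURLConnection) new URL("http://example.com/big-file").openConnection();
head.setRequestMethod("HEAD");
head.setConnectTimeout(5000);
head.setReadTimeout(5000);
int code = head.getResponseCode(); // headers only; a HEAD response carries no body
String marker = head.getHeaderField("X-My-Header"); // hypothetical header to inspect
head.disconnect();
if (code == HttpURLConnection.HTTP_OK /* and the header check passes */) {
    // issue the real GET here
}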

Upload file using java MultipartEntityBuilder throws Content too long. What's an alternative?

I'm trying to make a little utility that will synchronise data between two servers. Most of the calls there are REST calls with JSON, so I decided to use Apache HttpClient for this.
There is however a section where I need to upload a file. I'm trying to do this using multipart form data with the MultipartEntityBuilder, but I encounter a Content too long problem. (I tried to gzip the contents of the file too, but I'm still going over the limit.)
Here's my java code:
HttpPost request = new HttpPost(baseUrl+URL);
MultipartEntityBuilder builder = MultipartEntityBuilder.create();
//create upload file params
builder.addTextBody("scanName", "Test upload");
builder.addBinaryBody("myfile", f);
HttpEntity params= builder.build();
request.setEntity(params);
request.addHeader("content-type","multipart/form-data");
HttpResponse response = httpClient.execute(request);
Are there better alternatives that I should be using for the file upload part? I'm also going to download the files from one of the servers. Will I hit a similar issue when I try to handle those responses?
Is there something I'm doing wrong?
I tried your code and sent a file of about 33 MB, and it was successful. So I think your problem is one of the following:
The HTTP client you created has limitations on request size; in this case you need to change the client's configuration or use another client.
Somewhere in your code you call the HttpEntity.getContent() method. For multipart requests this method has a limitation of 25 kB; in this case you need to use writeTo(OutputStream) instead of getContent(), as sketched below.
In the comments you mentioned Swagger, but I don't understand what that means here. If you use a Swagger-generated API, the problem may occur in its generated code and you would need to fix the generation logic (or something like this; I have never used Swagger).
I hope my answer helps you.
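A minimal sketch of the writeTo(OutputStream) suggestion from point 2, reusing the builder from the question; the output destination is a placeholder:
MultipartEntityBuilder builder = MultipartEntityBuilder.create();
builder.addTextBody("scanName", "Test upload");
builder.addBinaryBody("myfile", f); // f is the File from the question
HttpEntity entity = builder.build();
try (OutputStream out = new FileOutputStream("multipart-dump.bin")) {
    entity.writeTo(out); // streams the parts without materializing the whole body
}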

Java: HttpComponents gets rubbish Response from input Stream from a specific URL

I am currently trying to get HttpComponents to send HttpRequests and retrieve the Response.
On most URLs this works without a problem, but when I try to get the URL of a phpBB forum, namely http://www.forum.animenokami.com, the client takes more time and the response entity contains passages more than once, resulting in a broken HTML file.
For example, the meta tags are contained six times. Since many other URLs work, I can't figure out what I am doing wrong.
The page works correctly in common browsers, so it is not a problem on their side.
Here is the code I use to send and receive.
URI uri1 = new URI("http://www.forum.animenokami.com");
HttpGet get = new HttpGet(uri1);
get.setHeader(new BasicHeader("User-Agent", "Mozilla/5.0 (Windows NT 5.1; rv:6.0) Gecko/20100101 Firefox/6.0"));
HttpClient httpClient = new DefaultHttpClient();
HttpResponse response = httpClient.execute(get);
HttpEntity ent = response.getEntity();
InputStream is = ent.getContent();
BufferedInputStream bis = new BufferedInputStream(is);
byte[] tmp = new byte[2048];
int l;
String ret = "";
while ((l = bis.read(tmp)) != -1) {
    ret += new String(tmp);
}
I hope you can help me.
If you need any more information I will try to provide it as soon as possible.
This code is completely broken:
String ret = "";
while ((l = bis.read(tmp)) != -1) {
    ret += new String(tmp);
}
Three things:
This is converting the whole buffer into a string on each iteration, regardless of how much data has been read. (I suspect this is what's actually going wrong in your case.)
It's using the default platform encoding, which is almost never a good idea.
It's using string concatenation in a loop, which leads to poor performance.
Fortunately you can avoid all of this very easily using EntityUtils:
String text = EntityUtils.toString(ent);
That will use the appropriate character encoding specified in the response, if any, or ISO-8859-1 otherwise. (There's another overload which allows you to specify which character encoding to use if it's not specified.)
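For example, to fall back to UTF-8 when the response doesn't name a charset:
String text = EntityUtils.toString(ent, "UTF-8");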
It's worth understanding what's wrong with your original code though rather than just replacing it with the better code, so that you don't make the same mistakes in other situations.
It works fine but what I don't understand is why I see the same text multiple times only on this URL.
It will be because your client is seeing more incomplete buffers when it reads the socket. That could be:
because there is a network bandwidth bottleneck on the route from the remote site to your client,
because the remote site is doing some unnecessary flushes, or
some other reason.
The point is that your client must pay close attention to the number of bytes read into the buffer by the read call, otherwise it will end up inserting junk. Network streams in particular are prone to not filling the buffer.
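For comparison, a minimal corrected version of the original loop, honouring the count returned by read and decoding once at the end (UTF-8 is an assumption; the real charset should come from the response headers):
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] tmp = new byte[2048];
int l;
while ((l = bis.read(tmp)) != -1) {
    baos.write(tmp, 0, l); // only the l bytes actually read this iteration
}
String ret = baos.toString("UTF-8");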

Java Servlet and HTTP Response object

Question on the HttpResponse object in servlets. Can the contents of an HttpResponse be read only once?
If so, do I need to use a filter and some form of javax.servlet.http.HttpServletResponseWrapper in order to read the content of an HttpResponse object? I need to read its content to retrieve XML/JSON from the response. At the moment I'm getting the below exception when I go to read the HttpResponse object.
Content has been consumed
at org.apache.http.entity.BasicHttpEntity.getContent(BasicHttpEntity.java:84)
Thanks,
John
This is not a problem on the server/servlet side; it's a problem on the client side. The servlet doesn't send the HttpServletResponse object to the client or anything like that; it just sends a byte stream, only once. You just need to read it only once into a reusable object such as a byte[] or String, depending on the actual content, and then reuse/copy exactly this object in the remainder of the code.
InputStream input = httpResponse.getEntity().getContent();
ByteArrayOutputStream output = new ByteArrayOutputStream(); // Or some file?
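// IOUtils.copy comes from Apache Commons IO.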
IOUtils.copy(input, output);
byte[] content = output.toByteArray();
// Now you can reuse content as many times as you want.
Do you want to read the content of the response or the request? Usually we write the content of the response and do not read it, unless you have a special case here.
