Uncompress GZIPed HTTP Response in Java

I'm trying to uncompress a GZIPed HTTP response using GZIPInputStream. However, I always get the same exception when I try to read the stream: java.util.zip.ZipException: invalid bit length repeat
My HTTP Request Header:
GET www.myurl.com HTTP/1.0\r\n
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6\r\n
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n
Accept-Language: fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.7\r\n
Keep-Alive: 115\r\n
Connection: keep-alive\r\n
X-Requested-With: XMLHttpRequest\r\n
Cookie: Some Cookies\r\n\r\n
At the end of the HTTP response header, I get path=/Content-Encoding: gzip, followed by the gzipped response.
I tried two similar pieces of code to uncompress it:
UPDATE: In the following code, tBytes = (the string after 'path=/Content-Encoding: gzip').getBytes ();
GZIPInputStream gzip = new GZIPInputStream (new ByteArrayInputStream (tBytes));
StringBuffer szBuffer = new StringBuffer ();
byte tByte [] = new byte [1024];
while (true)
{
int iLength = gzip.read (tByte, 0, 1024); // <-- Error comes here
if (iLength < 0)
break;
szBuffer.append (new String (tByte, 0, iLength));
}
And this one, which I found on this forum:
InputStream gzipStream = new GZIPInputStream (new ByteArrayInputStream (tBytes));
Reader decoder = new InputStreamReader (gzipStream, "UTF-8");//<- I tried ISO-8859-1 and get the same exception
BufferedReader buffered = new BufferedReader (decoder);
I guess this is an encoding error.
Best regards,
bill0ute

You don't show how you get the tBytes that you use to set up the gzip stream here:
GZIPInputStream gzip = new GZIPInputStream (new ByteArrayInputStream (tBytes));
One explanation is that you are including the entire HTTP response in tBytes. Instead, it should be only the content after the HTTP headers.
Another explanation is that the response is chunked.
edit: You are taking the data after the content-encoding line as the message body. However, according to the HTTP 1.1 specification, the header fields do not come in any particular order, so this is very dangerous.
As explained in this part of the HTTP specification, the message body of a request or response doesn't come after a particular header field but after the first empty line:
Request (section 5) and Response (section 6) messages use the generic message format of RFC 822 [9] for transferring entities (the payload of the message). Both types of message consist of a start-line, zero or more header fields (also known as "headers"), an empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields, and possibly a message-body.
You still haven't shown how exactly you compose tBytes, but at this point I think you're erroneously including the empty line in the data that you try to decompress. The message body starts after the CRLF characters of the empty line.
May I suggest that you use the httpclient library instead to extract the message body?
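If you do want to handle the raw bytes yourself, here is a minimal sketch of the split described above. It assumes the whole response is already in a byte array, that the body is a single gzip stream, and that the response is not chunked; the class and method names (GzipBodySketch, bodyOffset, gunzipBody) are made up for illustration. Note that it never converts the compressed bytes to a String: going through a String and getBytes() can alter binary data, which would also produce errors like the one in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipBodySketch {

    // Returns the index of the first body byte, i.e. the position just after
    // the first CRLF CRLF, or -1 if the header terminator is not present.
    static int bodyOffset(byte[] raw) {
        for (int i = 0; i + 3 < raw.length; i++) {
            if (raw[i] == '\r' && raw[i + 1] == '\n'
                    && raw[i + 2] == '\r' && raw[i + 3] == '\n') {
                return i + 4;
            }
        }
        return -1;
    }

    // Decompresses everything after the headers. Assumes the body is a single
    // gzip stream and is not chunked.
    static byte[] gunzipBody(byte[] raw) throws IOException {
        int start = bodyOffset(raw);
        if (start < 0) {
            throw new IOException("No empty line found after the headers");
        }
        try (GZIPInputStream gzip = new GZIPInputStream(
                new ByteArrayInputStream(raw, start, raw.length - start))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = gzip.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a fake response, for demonstration purposes only.
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(body)) {
            gz.write("hello".getBytes(StandardCharsets.UTF_8));
        }
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        raw.write("HTTP/1.0 200 OK\r\nContent-Encoding: gzip\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII));
        raw.write(body.toByteArray());
        byte[] plain = gunzipBody(raw.toByteArray());
        System.out.println(new String(plain, StandardCharsets.UTF_8));
    }
}
```

The text is decoded only after decompression, using whatever charset the Content-Type header announces (UTF-8 is just the assumption made in main).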

The problem I can see is here:
int iLength = gzip.read (tByte, 0, 1024);
Use the following to fix that:
byte[] buff = new byte[1024];
ByteArrayOutputStream unGzipBytes = new ByteArrayOutputStream();
int byteCount;
while ((byteCount = gzip.read(buff, 0, buff.length)) > 0) {
// only write the buff elements that
// contain data; bytes past byteCount
// are never copied, so buff does not
// need to be cleared between reads
unGzipBytes.write(buff, 0, byteCount);
}
// decode once at the end so that a multi-byte
// UTF-8 character split across two reads is
// not corrupted
String unGzipRes = new String(unGzipBytes.toByteArray(), "utf-8");

Related

How to read http request properly?

How do I read an HTTP request using an InputStream? I used to read it like this:
InputStream in = address.openStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder result = new StringBuilder();
String line;
while((line = reader.readLine()) != null) {
result.append(line);
}
System.out.println(result.toString());
But reader.readLine() can block, because there is no guarantee that the end of the stream (a null line) will ever be reached. Of course, I can read the Content-Length header and then read the request body in a loop:
for (int i = 0; i < contentLength; i++) {
int a = br.read();
body.append((char) a);
}
But if Content-Length is set too large (I guess it could be set that way on purpose), br.read() will block.
I tried to read bytes directly from the InputStream like this:
byte[] bytes = getBytes(is);
public static byte[] getBytes(InputStream is) throws IOException {
int len;
int size = 1024;
byte[] buf;
if (is instanceof ByteArrayInputStream) {
size = is.available();
buf = new byte[size];
len = is.read(buf, 0, size);
} else {
ByteArrayOutputStream bos = new ByteArrayOutputStream();
buf = new byte[size];
while ((len = is.read(buf, 0, size)) != -1)
bos.write(buf, 0, len);
buf = bos.toByteArray();
}
return buf;
}
But it waits forever. What should I do?

If you are implementing an HTTP server, you should detect the end of the request according to the HTTP specification. Wiki - https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol
First of all, read the request line; it is always a single line.
Then read all the request headers. You read them until you reach an empty line (i.e. two line endings in a row, <CR><LF><CR><LF>).
Once you have the request line and headers, decide whether you need to read a body, because not all requests have one (see the summary table).
Then, if you need a body, parse the headers (which you already have) and get Content-Length. If it is present, read exactly as many bytes from the stream as it specifies.
When Content-Length is missing, the length is determined in other ways. Chunked transfer encoding uses a chunk size of 0 to mark the end of the content. Identity encoding without Content-Length reads content until the socket is closed.
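The steps above could be sketched roughly as follows for the Content-Length case. This is only an illustration under stated assumptions: the class and method names (RequestReaderSketch, readLine, readHeaders, readBody) are invented here, header lines are treated as ISO-8859-1, and chunked bodies are not handled.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class RequestReaderSketch {

    // Reads one line a byte at a time, so nothing beyond the line ending is
    // consumed. CR characters are dropped; LF ends the line.
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    // Reads header lines up to the empty line; names are lower-cased for lookup.
    static Map<String, String> readHeaders(InputStream in) throws IOException {
        Map<String, String> headers = new LinkedHashMap<>();
        String line;
        while (!(line = readLine(in)).isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                headers.put(line.substring(0, colon).trim().toLowerCase(Locale.ROOT),
                        line.substring(colon + 1).trim());
            }
        }
        return headers;
    }

    // Reads exactly 'length' bytes, failing if the stream ends early.
    static byte[] readBody(InputStream in, int length) throws IOException {
        byte[] body = new byte[length];
        int off = 0;
        while (off < length) {
            int n = in.read(body, off, length - off);
            if (n == -1) {
                throw new EOFException("Body shorter than Content-Length");
            }
            off += n;
        }
        return body;
    }

    public static void main(String[] args) throws IOException {
        String raw = "POST /form HTTP/1.1\r\nHost: localhost\r\nContent-Length: 5\r\n\r\nhello";
        InputStream in = new ByteArrayInputStream(raw.getBytes(StandardCharsets.ISO_8859_1));
        String requestLine = readLine(in);
        Map<String, String> headers = readHeaders(in);
        int length = Integer.parseInt(headers.getOrDefault("content-length", "0"));
        byte[] body = readBody(in, length);
        System.out.println(requestLine + " -> " + new String(body, StandardCharsets.ISO_8859_1));
    }
}
```

On a real socket, a client that announces more bytes than it sends will still make readBody wait; setting a read timeout with Socket.setSoTimeout is one way to bound that wait.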

Create a request wrapper that extends HttpServletRequestWrapper and overrides getInputStream() to return a ServletInputStream, which has a safe read method. Try that.

Extracting the body from HTTP post request

I have a weird problem when trying to extract the body of a given HTTP POST request.
If I try to extract only the header, it works fine. When I try to extract the body, the method blocks (even though the stream still has data in it).
Here is my code:
private void extractHeader() throws Exception {
StringBuffer buffer = new StringBuffer();
InputStreamReader reader = new InputStreamReader(socket.getInputStream());
BufferedReader bufferedReader = new BufferedReader(reader);
boolean extractBody = false;
int bodyLength = 0;
String line;
while (!(line = bufferedReader.readLine()).equals("")) {
buffer.append(line + "");
if (line.startsWith("POST")) {
extractBody = true;
}
if (line.startsWith("Content-Length:")) {
bodyLength = Integer.valueOf(line.substring(line.indexOf(' ') + 1, line.length()));
}
}
requestHeader = buffer.toString();
if (extractBody) {
char[] body = new char[bodyLength];
reader.read(body, 0, bodyLength);
requestBody = new String(body);
}
}
And this is the request:
POST /params_info.html HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:42.0) Gecko/20100101 Firefox/42.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://localhost:8080/index.html
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 31
firstname=Mickey&lastname=Mouse
From what I understand, the loop will run until it sees the empty string and then stop. At this stage, the reader can read 'Content-Length' bytes, so it should have no problem reading the body and finishing. Instead, it blocks on the line 'reader.read(body, 0, bodyLength);'
(The reason I don't use readLine() is that the body does not end with \n.)
I've tried debugging it in all kinds of ways but got nowhere. Can anyone please help with this?

You're reading the header using the bufferedReader:
while (!(line = bufferedReader.readLine()).equals("")) {
but read the body using reader, which has no data available, as this has been read and buffered by the bufferedReader:
reader.read(body, 0, bodyLength);
Change that line to
bufferedReader.read(body, 0, bodyLength);
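A small self-contained sketch of that fix, reading both the header lines and the body through the same BufferedReader. The class and method names (SameReaderSketch, readBody) are made up for this example. It also loops on read(), since a single call may return fewer characters than requested, and it assumes single-byte characters so that Content-Length (a byte count) equals the number of chars to read.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class SameReaderSketch {

    // Reads the header lines and then the body through the SAME BufferedReader,
    // so characters it has already buffered are not lost.
    static String readBody(BufferedReader bufferedReader) throws IOException {
        int bodyLength = 0;
        String line;
        while ((line = bufferedReader.readLine()) != null && !line.isEmpty()) {
            if (line.toLowerCase(Locale.ROOT).startsWith("content-length:")) {
                bodyLength = Integer.parseInt(line.substring(line.indexOf(':') + 1).trim());
            }
        }
        char[] body = new char[bodyLength];
        int off = 0;
        while (off < bodyLength) {
            int n = bufferedReader.read(body, off, bodyLength - off);
            if (n == -1) {
                break; // stream ended before the announced length
            }
            off += n;
        }
        return new String(body, 0, off);
    }

    public static void main(String[] args) throws IOException {
        String raw = "POST /params_info.html HTTP/1.1\r\n"
                + "Host: localhost:8080\r\n"
                + "Content-Length: 31\r\n"
                + "\r\n"
                + "firstname=Mickey&lastname=Mouse";
        BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(raw.getBytes(StandardCharsets.ISO_8859_1)),
                StandardCharsets.ISO_8859_1));
        System.out.println(readBody(bufferedReader));
    }
}
```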

Handling POST request via Socket in Java

I'm trying to handle a simple POST Request in Java using a Socket.
I can receive the request header and answer the request without any problem, but I can't get the body of the request.
I read somewhere that I'd need to open a second InputStream to achieve this, but that doesn't really make sense to me. Do you have any tips on how to get the request body?
This is what I basically use to get the header:
BufferedReader in = new BufferedReader(new InputStreamReader(
clientSocket.getInputStream()));
char[] inputBuffer = new char[INPUT_BUFFER_LENGTH];
int inputMessageLength = in.read(inputBuffer, 0,
INPUT_BUFFER_LENGTH);
String inputMessage = new String(inputBuffer, 0, inputMessageLength);
So, the message I get is something like:
POST / HTTP/1.1
User-Agent: Java/1.8.0_45
Host: localhost:5555
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
But I can't get the parameters of the POST request.
Edit:
So it turned out I just hadn't set INPUT_BUFFER_LENGTH high enough (I know, shame on me).
Once that worked, I changed my ServerSocket to an SSLServerSocket and tried again to send a request with an HttpsURLConnection from Java. Now I have the same problem again (I already checked the buffer), getting something like this:
POST / HTTP/1.1
User-Agent: Java/1.8.0_45
Host: localhost:5555
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
Content-type: application/x-www-form-urlencoded
Content-Length: 128
*Missing Body*
It turned out I only get this when sending requests with my Java client. Sending requests from Chrome, etc. works fine, so I assume I got something wrong in my code.
This is what I use to send the request:
System.setProperty("javax.net.ssl.trustStore", ...);
System.setProperty("javax.net.ssl.trustStorePassword", ...);
SSLSocketFactory socketFactory = (SSLSocketFactory) SSLSocketFactory
.getDefault();
String url = "https://...";
URL obj = new URL(url);
HttpsURLConnection con = (HttpsURLConnection) obj.openConnection();
HttpsURLConnection.setDefaultSSLSocketFactory(socketFactory);
con.setRequestMethod("POST");
con.setDoOutput(true);
OutputStreamWriter writer = new OutputStreamWriter(con.getOutputStream());
writer.write(*Some String*);
writer.flush();
writer.close();
Any tips on what might be wrong with my code?

The code you have shown is not the correct way to read HTTP requests.
First off, Java has its own HttpServer and HttpsServer classes. You should consider using them.
Otherwise, you have to implement the HTTP protocol manually. You need to read the input line-by-line until you reach an empty line indicating the end of the request headers, then look at the headers you have read, in particular the Transfer-Encoding and Content-Length headers, to know how to read the remaining bytes of the request, per RFC 2616 Section 4.4:
4.4 Message Length
The transfer-length of a message is the length of the message-body as
it appears in the message; that is, after any transfer-codings have
been applied. When a message-body is included with a message, the
transfer-length of that body is determined by one of the following
(in order of precedence):
1. Any response message which "MUST NOT" include a message-body (such
as the 1xx, 204, and 304 responses and any response to a HEAD
request) is always terminated by the first empty line after the
header fields, regardless of the entity-header fields present in
the message.
2. If a Transfer-Encoding header field (section 14.41) is present and
has any value other than "identity", then the transfer-length is
defined by use of the "chunked" transfer-coding (section 3.6),
unless the message is terminated by closing the connection.
3. If a Content-Length header field (section 14.13) is present, its
decimal value in OCTETs represents both the entity-length and the
transfer-length. The Content-Length header field MUST NOT be sent
if these two lengths are different (i.e., if a Transfer-Encoding
header field is present). If a message is received with both a
Transfer-Encoding header field and a Content-Length header field,
the latter MUST be ignored.
4. If the message uses the media type "multipart/byteranges", and the
transfer-length is not otherwise specified, then this self-delimiting
media type defines the transfer-length. This media type
MUST NOT be used unless the sender knows that the recipient can parse
it; the presence in a request of a Range header with multiple byte-range
specifiers from a 1.1 client implies that the client can parse
multipart/byteranges responses.
A range header might be forwarded by a 1.0 proxy that does not
understand multipart/byteranges; in this case the server MUST
delimit the message using methods defined in items 1,3 or 5 of
this section.
5. By the server closing the connection. (Closing the connection
cannot be used to indicate the end of a request body, since that
would leave no possibility for the server to send back a response.)
For compatibility with HTTP/1.0 applications, HTTP/1.1 requests
containing a message-body MUST include a valid Content-Length header
field unless the server is known to be HTTP/1.1 compliant. If a
request contains a message-body and a Content-Length is not given,
the server SHOULD respond with 400 (bad request) if it cannot
determine the length of the message, or with 411 (length required) if
it wishes to insist on receiving a valid Content-Length.
All HTTP/1.1 applications that receive entities MUST accept the
"chunked" transfer-coding (section 3.6), thus allowing this mechanism
to be used for messages when the message length cannot be determined
in advance.
Messages MUST NOT include both a Content-Length header field and a
non-identity transfer-coding. If the message does include a non-
identity transfer-coding, the Content-Length MUST be ignored.
When a Content-Length is given in a message where a message-body is
allowed, its field value MUST exactly match the number of OCTETs in
the message-body. HTTP/1.1 user agents MUST notify the user when an
invalid length is received and detected.
Try something more like this (semi-pseudo code):
String readLine(BufferedInputStream in) throws IOException
{
// HTTP carries both textual and binary elements.
// Reading bytes directly from the stream rather than
// through a Reader or BufferedReader.readLine(), since
// a Reader buffers ahead and would "steal" bytes from
// the BufferedInputStream...
// HTTP itself only allows 7bit ASCII characters
// in headers, but some header values may be
// further encoded using RFC 2231 or 5987 to
// carry Unicode characters ...
StringBuilder sb = new StringBuilder();
int c;
while ((c = in.read()) >= 0) {
if (c == '\n') break;
if (c == '\r') {
c = in.read();
if ((c < 0) || (c == '\n')) break;
sb.append('\r');
}
sb.append((char) c);
}
return sb.toString();
}
...
BufferedInputStream in = new BufferedInputStream(clientSocket.getInputStream());
String request = readLine(in);
// extract method, resource, and version...
String line;
do
{
line = readLine(in);
if (line.isEmpty()) break;
// store line in headers list...
}
while (true);
// parse headers list...
if (request method has a message-body) // POST, etc
{
if ((request version >= 1.1) &&
(Transfer-Encoding header is present) &&
(Transfer-Encoding != "identity"))
{
// read chunks...
do
{
line = readLine(in); // read chunk header
int size = extract value from line;
if (size == 0) break;
// use in.read() to read the specified
// number of bytes into message-body...
readLine(in); // skip trailing line break
}
while (true);
// read trailing headers...
line = readLine(in);
while (!line.isEmpty())
{
// store line in headers list, updating
// any existing header as needed...
line = readLine(in);
}
// parse headers list again ...
}
else if (Content-Length header is present)
{
// use in.read() to read the specified
// number of bytes into message-body...
}
else if (Content-Type is "multipart/...")
{
// use readLine(in) and in.read() as needed
// to read/parse/decode MIME encoded data into
// message-body until terminating MIME boundary
// is reached...
}
else
{
// fail the request...
}
}
// process request and message-body as needed..
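To make the "read chunks" branch above more concrete, here is a small runnable sketch of decoding a chunked body. It is an illustration only: the names (ChunkedSketch, readChunkedBody) are invented, chunk extensions are simply discarded, trailer fields are skipped rather than stored, and malformed input is not handled gracefully (a bad size line makes Integer.parseInt throw).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class ChunkedSketch {

    // Reads one line a byte at a time; CR is dropped and LF ends the line.
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    // Decodes a chunked body: each chunk is a hexadecimal size line, that many
    // bytes of data, and a CRLF. A size of 0 marks the end of the content.
    static byte[] readChunkedBody(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';'); // drop any chunk extensions
            if (semi >= 0) {
                sizeLine = sizeLine.substring(0, semi);
            }
            int size = Integer.parseInt(sizeLine.trim(), 16);
            if (size == 0) {
                break;
            }
            byte[] chunk = new byte[size];
            int off = 0;
            while (off < size) {
                int n = in.read(chunk, off, size - off);
                if (n == -1) {
                    throw new EOFException("Chunk truncated");
                }
                off += n;
            }
            body.write(chunk, 0, size);
            readLine(in); // consume the CRLF that follows the chunk data
        }
        // Skip optional trailer fields up to the final empty line.
        while (!readLine(in).isEmpty()) {
            // a fuller implementation would store these as headers
        }
        return body.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String raw = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n";
        InputStream in = new ByteArrayInputStream(raw.getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(readChunkedBody(in), StandardCharsets.US_ASCII));
    }
}
```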

Decompressing a gzipped http response

Hello fellow java developers. I receive a response with headers and body as below, but when I try to decompress it using the code below, it fails with this exception:
java.io.IOException: Not in GZIP format
Response:
HTTP/1.1 200 OK
Content-Type: text/xml; charset=utf-8
Content-Encoding: gzip
Server: Jetty(6.1.x)
▼ ═UMs¢0►=7┐ép?╙6-C╚$╢gΩ↓╟±╪₧∟zS╨╓╓♦$FÆ╒÷▀G┬╚╞8N≤╤Cf°►╦█╖╗o↨æJÄ+`:↓2
♣»└√S▬L&?∙┬_)U╔|♣%ûíyk_à\,æ] hⁿ?▀xΓ∟o╜4♫ù\#MAHG?┤(Q¶╞⌡▌Ç?▼ô[7Fí¼↔φ☻I%╓╣Z♂?¿↨F;x|♦o/A╬♣╘≡∞─≤╝╘U∙♥0☺æ?|J%à{(éUmHµ %σl┴▼Ç9♣┌Ç?♫╡5╠yë~├╜♦íi♫╥╧
╬û?▓ε?╞┼→RtGqè₧ójWë♫╩∞j05├╞┘|>┘º∙↑j╪2┐|= ÷²
eY\╛P?#5wÑqc╙τ♦▓½Θt£6q∩?┌4┼t♠↕=7æƒ╙?╟|♂;║)∩÷≈═^╛{v⌂┌∞◄>6ä╝|
Code:
byte[] b= IOUtils.toByteArray(sock.getInputStream());
ByteArrayInputStream bais = new ByteArrayInputStream(b);
GZIPInputStream gzis = new GZIPInputStream(bais);
InputStreamReader reader = new InputStreamReader(gzis);
BufferedReader in = new BufferedReader(reader);
String readed;
while ((readed = in.readLine()) != null) {
System.out.println("read: "+readed);
}
Please advise.
Thanks,
Pradeep

The MIME header is NOT in the GZIP format, it's in plain text. You have to read that first before you can decompress the stream.
Also, why not just use this:
InputStream in = sock.getInputStream();
readHeader(in);
InputStream zin = new GZIPInputStream(in);
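The readHeader call above is not a JDK method, so you would have to write it yourself. One possible shape for it, as a sketch: consume bytes one at a time up to and including the first CRLF CRLF, without any buffering wrapper, so that the stream is positioned exactly at the first body byte. The class name ReadHeaderSketch is made up, and the sketch assumes an unchunked gzip body.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ReadHeaderSketch {

    // Hypothetical readHeader: consumes bytes up to and including the first
    // CRLF CRLF and returns the header block as text.
    static String readHeader(InputStream in) throws IOException {
        StringBuilder header = new StringBuilder();
        int matched = 0; // how many bytes of "\r\n\r\n" have been seen in a row
        int c;
        while (matched < 4 && (c = in.read()) != -1) {
            header.append((char) c);
            if (c == (matched % 2 == 0 ? '\r' : '\n')) {
                matched++;
            } else {
                matched = (c == '\r') ? 1 : 0;
            }
        }
        if (matched < 4) {
            throw new EOFException("Stream ended inside the headers");
        }
        return header.toString();
    }

    public static void main(String[] args) throws IOException {
        // Build a fake gzipped response, for demonstration purposes only.
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        raw.write("HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII));
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write("<xml/>".getBytes(StandardCharsets.UTF_8));
        }
        raw.write(compressed.toByteArray());

        InputStream in = new ByteArrayInputStream(raw.toByteArray());
        readHeader(in);
        InputStream zin = new GZIPInputStream(in);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n;
        while ((n = zin.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```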

There are libraries for all of this. You can use, for example, Apache HTTP Components, or you can read its open source to see what it does. At the very least, read the relevant specification.

I second bmarguiles' answer.
Only the body (response-body in the RFC) is compressed, so you only need to decompress the part that is after the \r\n\r\n.
Generally speaking, you can cut the response in half by that double CRLF, and only decompress the second half.

how to decompress http response?

Can any of you solve this problem?
Problem Description:
I have received a content-encoding: gzip header from an HTTP web server.
Now I want to decode the content, but when I use the GZIP classes from JDK 1.6.12, it gives null.
Does it mean that the contents are not in gzip format? Or are there other classes for decompressing HTTP response content?
Sample Code:
System.out.println("Reading InputStream");
InputStream in = httpuc.getInputStream(); // httpuc is an HttpURLConnection object
System.out.println("Before reading GZIP inputstream");
System.out.println(in);
GZIPInputStream gin = new GZIPInputStream(in);
System.out.println("After reading GZIP inputstream");
Output:
Reading InputStream
Before reading GZIP inputstream
sun.net.www.protocol.http.HttpURLConnection$HttpInputStream@8acf6e
null
I have found one error in the code, but I am not able to understand it properly. What does it indicate?
Error ! java.io.EOFException
Thanks

I think you should have a look at HTTPClient, which will handle a lot of the HTTP issues for you. In particular, it allows access to the response body, which may be gzipped, and then you simply feed that through a GZIPInputStream.
e.g.
Header hce = postMethod.getResponseHeader("Content-Encoding");
InputStream in = null;
if(null != hce)
{
if(hce.getValue().equals(GZIP)) {
in = new GZIPInputStream(postMethod.getResponseBodyAsStream());
}
// etc...

I second Brian's suggestion. Whenever you need to get or post something via HTTP, don't bother with low-level access; use the Apache HTTP client.

InputStream is = con.getInputStream();
InputStream bodyStream = new GZIPInputStream(is);
ByteArrayOutputStream outStream = new ByteArrayOutputStream();
byte[] buffer = new byte[4096];
int length;
while ((length = bodyStream.read(buffer)) > 0) {
outStream.write(buffer, 0, length);
}
String body = new String(outStream.toByteArray(), "UTF-8");
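The snippet above wraps the stream unconditionally. A slightly more defensive variant, sketched here with made-up names (ContentEncodingSketch, decode, readAll), only wraps the body when the Content-Encoding value is gzip. This matters because the GZIPInputStream constructor reads the gzip header straight away: on a body that is not gzipped it throws an IOException, and on an empty body an EOFException, which may be what happened in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ContentEncodingSketch {

    // Wraps the body in a GZIPInputStream only when the header says gzip;
    // otherwise returns the stream unchanged.
    static InputStream decode(String contentEncoding, InputStream body) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(body);
        }
        return body;
    }

    // Drains a stream and decodes the bytes once, as UTF-8 (an assumption;
    // the real charset comes from the Content-Type header).
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int length;
        while ((length = in.read(buffer)) != -1) {
            out.write(buffer, 0, length);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write("compressed body".getBytes(StandardCharsets.UTF_8));
        }
        InputStream gzipped = new ByteArrayInputStream(compressed.toByteArray());
        InputStream plain = new ByteArrayInputStream("plain body".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(decode("gzip", gzipped)));
        System.out.println(readAll(decode(null, plain)));
    }
}
```

With an HttpURLConnection named con, as in the snippet above, the call would be decode(con.getContentEncoding(), con.getInputStream()).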
