Decompressing a gzipped http response

Hello fellow Java developers. I receive a response with the headers and body shown below, but when I try to decompress it with the code that follows, it fails with this exception:
java.io.IOException: Not in GZIP format
Response:
HTTP/1.1 200 OK
Content-Type: text/xml; charset=utf-8
Content-Encoding: gzip
Server: Jetty(6.1.x)
▼ ═UMs¢0►=7┐ép?╙6-C╚$╢gΩ↓╟±╪₧∟zS╨╓╓♦$FÆ╒÷▀G┬╚╞8N≤╤Cf°►╦█╖╗o↨æJÄ+`:↓2
♣»└√S▬L&?∙┬_)U╔|♣%ûíyk_à\,æ] hⁿ?▀xΓ∟o╜4♫ù\#MAHG?┤(Q¶╞⌡▌Ç?▼ô[7Fí¼↔φ☻I%╓╣Z♂?¿↨F;x|♦o/A╬♣╘≡∞─≤╝╘U∙♥0☺æ?|J%à{(éUmHµ %σl┴▼Ç9♣┌Ç?♫╡5╠yë~├╜♦íi♫╥╧
╬û?▓ε?╞┼→RtGqè₧ójWë♫╩∞j05├╞┘|>┘º∙↑j╪2┐|= ÷²
eY\╛P?#5wÑqc╙τ♦▓½Θt£6q∩?┌4┼t♠↕=7æƒ╙?╟|♂;║)∩÷≈═^╛{v⌂┌∞◄>6ä╝|
Code:
byte[] b= IOUtils.toByteArray(sock.getInputStream());
ByteArrayInputStream bais = new ByteArrayInputStream(b);
GZIPInputStream gzis = new GZIPInputStream(bais);
InputStreamReader reader = new InputStreamReader(gzis);
BufferedReader in = new BufferedReader(reader);
String readed;
while ((readed = in.readLine()) != null) {
    System.out.println("read: "+readed);
}
Please advise.
Thanks,
Pradeep

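The header block is NOT in the GZIP format; it is plain text. You have to read past it before you can decompress the stream.
Also, instead of buffering everything into a byte array, why not just wrap the socket stream directly (readHeader here stands for a helper of your own that consumes the header block):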
InputStream in = sock.getInputStream();
readHeader(in);
InputStream zin = new GZIPInputStream(in);
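The readHeader call above is not a JDK method; it stands for a helper you would write yourself. A minimal sketch of such a helper, assuming the headers end at the first empty line and reading one byte at a time so that nothing from the compressed body is consumed. The class name HeaderReader and the sample response built in main are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of any library.
class HeaderReader {

    // Consumes bytes up to and including the first CRLF CRLF and returns them as text.
    static String readHeader(InputStream in) throws IOException {
        ByteArrayOutputStream header = new ByteArrayOutputStream();
        int matched = 0; // number of bytes of "\r\n\r\n" matched so far
        int b;
        while (matched < 4 && (b = in.read()) != -1) {
            header.write(b);
            char expected = (matched % 2 == 0) ? '\r' : '\n';
            if (b == expected) {
                matched++;
            } else {
                matched = (b == '\r') ? 1 : 0;
            }
        }
        if (matched < 4) {
            throw new EOFException("Stream ended before the end of the headers");
        }
        return new String(header.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        // Build a small in-memory response: plain-text headers, gzipped body.
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        raw.write("HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII));
        try (GZIPOutputStream gz = new GZIPOutputStream(raw)) {
            gz.write("<ok/>".getBytes(StandardCharsets.UTF_8));
        }

        InputStream in = new ByteArrayInputStream(raw.toByteArray());
        System.out.print(readHeader(in)); // the plain-text headers
        try (InputStream zin = new GZIPInputStream(in)) {
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            int b;
            while ((b = zin.read()) != -1) {
                body.write(b);
            }
            System.out.println(body.toString("UTF-8"));
        }
    }
}
```

Reading the headers through a BufferedReader instead would be risky here, because a buffered reader may read ahead and swallow the first bytes of the gzip body.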

There are libraries for all of this. You can use, for example, Apache HTTP Components, or you can read its open source code to see what it does. At the very least, read the relevant specification.

I second bmarguiles' answer.
Only the body (response-body in the RFC) is compressed, so you only need to decompress the part that comes after the first \r\n\r\n.
Generally speaking, you can split the response in two at that double CRLF and decompress only the second part.
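Since the question already buffers the whole response into a byte array, the split can be done on that array. A rough sketch, assuming the array holds exactly one response and the body is not sent with chunked transfer encoding. The class and method names are invented for this example:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of any library.
class GzipBodySplitter {

    // Index of the first body byte (just past the first CRLF CRLF), or -1 if none.
    static int bodyOffset(byte[] response) {
        for (int i = 0; i + 3 < response.length; i++) {
            if (response[i] == '\r' && response[i + 1] == '\n'
                    && response[i + 2] == '\r' && response[i + 3] == '\n') {
                return i + 4;
            }
        }
        return -1;
    }

    // Skips the headers and gunzips everything after them.
    static byte[] gunzipBody(byte[] response) throws IOException {
        int offset = bodyOffset(response);
        if (offset < 0) {
            throw new IOException("No header/body separator found");
        }
        try (GZIPInputStream gz = new GZIPInputStream(
                new ByteArrayInputStream(response, offset, response.length - offset))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    // Demo only: builds a response with plain-text headers and a gzipped body.
    static byte[] sampleResponse(String body) throws IOException {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        raw.write("HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII));
        try (GZIPOutputStream gz = new GZIPOutputStream(raw)) {
            gz.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return raw.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] response = sampleResponse("<ok/>");
        System.out.println(new String(gunzipBody(response), StandardCharsets.UTF_8));
    }
}
```

In the question's code, b would take the place of response, and the GZIPInputStream would be built over the part of b that starts at bodyOffset(b) rather than over the whole array.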

Related

Filter HTTPEntity output to include new lines

I am using org.apache.http.HttpEntity to do a multipart/form-data POST through HttpURLConnection to upload a file.
Here is the code that I am using.
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("POST");
String part1 = "\n{\"name\":\"test.txt\",\"creationTime\":1527023510389,\"fileUri\":\"/storage/test.txt\"}";
File file = new File("/storage/test.txt");
HttpEntity entity = MultipartEntityBuilder.create()
        .setMode(HttpMultipartMode.BROWSER_COMPATIBLE)
        .addBinaryBody("data", part1.getBytes(), ContentType.APPLICATION_JSON, "data.txt")
        .addBinaryBody("file", file, ContentType.TEXT_PLAIN, filename)
        .setBoundary(boundaryString)
        .build();
OutputStream os = conn.getOutputStream();
entity.writeTo(os);
I see that the body is being posted as the following.
--BOUNDARY
Content-Disposition: form-data; name="metadata"; filename="metadata.txt"
Content-Type: application/json
{"name":"test.txt","creationTime":1527023510389,"fileUri":"/storage/test.txt"}
--BOUNDARY
Content-Disposition: form-data; name="file"; filename="test.txt"
Content-Type: text/plain; charset=ISO-8859-1
test file contents
--BOUNDARY--
The problem is that the server requires a blank line between the Content-Type and the contents of the first part. I've tried adding an extra "\n" to the beginning of the contents (as seen in part1 above), but it gets erased when using HttpEntity.writeTo().
The output that I want is the following:
--BOUNDARY
Content-Disposition: form-data; name="metadata"; filename="metadata.txt"
Content-Type: application/json

{"name":"test.txt","creationTime":1527023510389,"fileUri":"/storage/test.txt"}
--BOUNDARY
Content-Disposition: form-data; name="file"; filename="test.txt"
Content-Type: text/plain; charset=ISO-8859-1

test file contents
--BOUNDARY--
I tried rewriting the output through a temporary file, but I'm not sure this is the best way to do it. The files I will be working with can be up to 20 MB, if that makes any difference.
entity.writeTo(new FileOutputStream("file.tmp"));
BufferedReader reader = new BufferedReader(new FileReader("file.tmp"));
OutputStream os = conn.getOutputStream();
PrintWriter writer = new PrintWriter(new BufferedOutputStream(os));
String str;
while ((str = reader.readLine()) != null) {
    writer.println(str);
    if (str.contains("Content-Type: ")) {
        writer.println("\n");
    }
}
writer.close();
reader.close();
os.close();
conn.connect();
if (conn.getResponseCode() == HttpURLConnection.HTTP_OK) {
    // It's failing when accessing the above method
}
I tried running the above code and I get the following error:
java.lang.IllegalStateException: state: 2
at com.android.okhttp.internal.http.HttpConnection.readResponse(HttpConnection.java:234)
at com.android.okhttp.internal.http.HttpTransport.readResponseHeaders(HttpTransport.java:104)
at com.android.okhttp.internal.http.HttpEngine.readNetworkResponse(HttpEngine.java:1156)
at com.android.okhttp.internal.http.HttpEngine.readResponse(HttpEngine.java:976)
at com.android.okhttp.internal.huc.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:509)
at com.android.okhttp.internal.huc.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:438)
at com.android.okhttp.internal.huc.HttpURLConnectionImpl.getResponseCode(HttpURLConnectionImpl.java:567)
at com.android.okhttp.internal.huc.DelegatingHttpsURLConnection.getResponseCode(DelegatingHttpsURLConnection.java:105)
at com.android.okhttp.internal.huc.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java)

It turns out that the HttpEntity.writeTo method does write the necessary new lines, but when I printed the output to System.out, Android Studio's Logcat did not show the blank lines. I confirmed this by opening the file.tmp I was creating above, which had the proper new lines in it. So there must be some other error with the request, since the body is valid for the server.
EDIT: Found the error in my request. I wasn't setting the Content-Type (I think I erased it while deleting some other code). I ended up using this to set the content type.
conn.addRequestProperty(entity.getContentType().getName(), entity.getContentType().getValue());

How to upload binary file using URLConnection

In order to upload a binary file to a URL, I have been advised to use this guide. However, the file is not in a directory but is stored in a BLOB field in a MySQL database. The BLOB field is mapped as a byte[] property in JPA:
byte[] binaryFile;
I have slightly modified the code taken from the guide, in this way:
HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
// set some connection properties
OutputStream output = connection.getOutputStream();
PrintWriter writer = new PrintWriter(new OutputStreamWriter(output, CHARSET), true);
// set some headers with writer
InputStream file = new ByteArrayInputStream(myEntity.getBinaryFile());
System.out.println("Size: " + file.available());
try {
    byte[] buffer = new byte[4096];
    int length;
    while ((length = file.read(buffer)) > 0) {
        output.write(buffer, 0, length);
    }
    output.flush();
    writer.append(CRLF).flush();
    writer.append("--" + boundary + "--").append(CRLF).flush();
}
// catch and close streams
I am not using chunked streaming. The headers used are:
username and password
Content-Disposition: form-data; name=\"file\"; filename=\"myFileName\"\r\nContent-Type: application/octet-stream"
Content-Transfer-Encoding: binary
All the headers are received correctly by the host. It also receives the uploaded file, but unfortunately it complains that the file is not readable and reports that the received file is 37 bytes larger than the size printed by my code.
My knowledge of streams, connections and byte[] is too limited to see how to fix this. Any hints appreciated.
EDIT
As suggested by the commenter, I have also tried writing the byte[] directly, without using the ByteArrayInputStream:
output.write(myEntity.getBinaryFile());
Unfortunately the host gives exactly the same answer as the other way.

My code was correct.
The host was giving an error because it didn't expect the Content-Transfer-Encoding header. After removing it, everything went fine.

Using HttpClient 4.1 to decode chunked data

I am using HttpClient to send a request to a server which is supposed to return XML data. This data is returned as chunked data. I am then trying to write the received XML data to a file. The code I use is shown below:
HttpEntity entity = response.getEntity();
InputStream instream = entity.getContent();
try {
    // do something useful
    InputStreamReader isr = new InputStreamReader(instream);
    FileWriter pw;
    pw = new FileWriter(filename, append);
    OutputStreamWriter outWriter = new OutputStreamWriter(new FileOutputStream(filename, append), "UTF-8");
    BufferedReader rd = new BufferedReader(isr);
    String line = "";
    while ((line = rd.readLine()) != null) {
        // pw.write(line);
        outWriter.write(line);
    }
    isr.close();
    pw.close();
} finally {
    instream.close();
}
This results in unreadable, binary-looking data being written to the file (the original post showed a picture of it here).
This code works for non-chunked data. How do I properly handle chunked data responses using HttpClient? Any help is greatly appreciated.

I don't think that your problem is the chunking of the data.
XML data is plain text. Chunking it only means that it is split into several parts that are transferred one after another, so each chunk should still contain readable plain-text XML, which is obviously not the case in the picture of the data.
Maybe the content is compressed with gzip, or it is not plain-text XML but binary-encoded XML (e.g. WBXML).
You can tell which it is from the response headers the server sent: the MIME type is in Content-Type, and gzip compression would be announced in Content-Encoding.
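If the headers are not at hand, a complementary check (a different technique from reading the headers) is to peek at the first two bytes of the body: a gzip stream starts with the bytes 0x1f 0x8b. A small sketch using only the JDK. The class name GzipSniffer and its helper methods are invented for this example:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of HttpClient or the JDK.
class GzipSniffer {

    // Wraps the stream in a GZIPInputStream only if it starts with the gzip magic bytes.
    static InputStream maybeGunzip(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 2);
        byte[] sig = new byte[2];
        int n = 0;
        while (n < 2) {
            int r = pb.read(sig, n, 2 - n);
            if (r < 0) {
                break; // stream shorter than two bytes
            }
            n += r;
        }
        if (n > 0) {
            pb.unread(sig, 0, n); // put the peeked bytes back
        }
        boolean gzip = n == 2 && (sig[0] & 0xff) == 0x1f && (sig[1] & 0xff) == 0x8b;
        return gzip ? new GZIPInputStream(pb) : pb;
    }

    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Demo only: compresses a byte array with gzip.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<xml/>".getBytes(StandardCharsets.UTF_8);
        // Both calls yield the same text, whether or not the input was compressed.
        System.out.println(new String(
                readAll(maybeGunzip(new ByteArrayInputStream(xml))), StandardCharsets.UTF_8));
        System.out.println(new String(
                readAll(maybeGunzip(new ByteArrayInputStream(gzip(xml)))), StandardCharsets.UTF_8));
    }
}
```

In the question's code, instream would be passed to maybeGunzip before being wrapped in the InputStreamReader.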

Uncompress GZIPed HTTP Response in Java

I'm trying to uncompress a GZIPed HTTP response by using GZIPInputStream. However, I always get the same exception when I try to read the stream: java.util.zip.ZipException: invalid bit length repeat
My HTTP Request Header:
GET www.myurl.com HTTP/1.0\r\n
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6\r\n
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n
Accept-Language: fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.7\r\n
Keep-Alive: 115\r\n
Connection: keep-alive\r\n
X-Requested-With: XMLHttpRequest\r\n
Cookie: Some Cookies\r\n\r\n
At the end of the HTTP response header, I get path=/Content-Encoding: gzip, followed by the gzipped response.
I tried two similar pieces of code to uncompress it:
UPDATE: In the following code, tBytes = (the string after 'path=/Content-Encoding: gzip').getBytes ();
GZIPInputStream gzip = new GZIPInputStream (new ByteArrayInputStream (tBytes));
StringBuffer szBuffer = new StringBuffer ();
byte tByte [] = new byte [1024];
while (true)
{
    int iLength = gzip.read (tByte, 0, 1024); // <-- Error comes here
    if (iLength < 0)
        break;
    szBuffer.append (new String (tByte, 0, iLength));
}
And this one, which I found on this forum:
InputStream gzipStream = new GZIPInputStream (new ByteArrayInputStream (tBytes));
Reader decoder = new InputStreamReader (gzipStream, "UTF-8");//<- I tried ISO-8859-1 and get the same exception
BufferedReader buffered = new BufferedReader (decoder);
I guess this is an encoding error.
Best regards,
bill0ute

You don't show how you get the tBytes that you use to set up the gzip stream here:
GZIPInputStream gzip = new GZIPInputStream (new ByteArrayInputStream (tBytes));
One explanation is that you are including the entire HTTP response in tBytes. Instead, it should be only the content after the HTTP headers.
Another explanation is that the response is chunked.
edit: You are taking the data after the content-encoding line as the message body. However, according to the HTTP 1.1 specification the header fields do not come in any particular order, so this is very dangerous.
As explained in this part of the HTTP specification, the message body of a request or response doesn't come after a particular header field but after the first empty line:
Request (section 5) and Response (section 6) messages use the generic message format of RFC 822 [9] for transferring entities (the payload of the message). Both types of message consist of a start-line, zero or more header fields (also known as "headers"), an empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields, and possibly a message-body.
You still haven't shown how exactly you compose tBytes, but at this point I think you're erroneously including the empty line in the data that you try to decompress. The message body starts after the CRLF characters of the empty line.
May I suggest that you use the httpclient library instead to extract the message body?
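For the chunked case mentioned above, the body has to be de-chunked before it is handed to GZIPInputStream. A minimal sketch of chunked transfer decoding, assuming the stream is positioned at the first chunk-size line (just after the empty line that ends the headers). ChunkedDecoder is a made-up name, and a library such as httpclient does this for you:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class ChunkedDecoder {

    // Reassembles the chunks into the plain message body.
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';'); // drop optional chunk extensions
            if (semi >= 0) {
                sizeLine = sizeLine.substring(0, semi);
            }
            int size = Integer.parseInt(sizeLine.trim(), 16); // chunk size is hexadecimal
            if (size == 0) {
                break; // last chunk
            }
            byte[] chunk = new byte[size];
            int off = 0;
            while (off < size) {
                int n = in.read(chunk, off, size - off);
                if (n < 0) {
                    throw new EOFException("Truncated chunk");
                }
                off += n;
            }
            body.write(chunk, 0, size);
            readLine(in); // CRLF that follows the chunk data
        }
        // Skip optional trailer fields up to the final empty line.
        while (!readLine(in).isEmpty()) {
            // ignore trailers
        }
        return body.toByteArray();
    }

    // Reads one line terminated by LF, dropping CR characters.
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') {
                sb.append((char) b);
            }
        }
        if (b == -1 && sb.length() == 0) {
            throw new EOFException("Unexpected end of chunked body");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] chunked = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] body = decode(new ByteArrayInputStream(chunked));
        System.out.println(new String(body, StandardCharsets.US_ASCII)); // prints "Wikipedia"
    }
}
```

The decoder works on raw bytes throughout. Building tBytes from a String with getBytes(), as in the question's update, can by itself alter binary gzip data, because not every byte sequence survives a round trip through a character encoding.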

Here is the problem I can see:
int iLength = gzip.read (tByte, 0, 1024);
Use the following to fix it:
byte[] buff = new byte[1024];
ByteArrayOutputStream unGzipBytes = new ByteArrayOutputStream();
int byteCount;
// read() returns -1 at the end of the stream
while ((byteCount = gzip.read(buff, 0, buff.length)) != -1) {
    // only copy the part of buff that was actually filled,
    // so buff never needs to be cleared between reads
    unGzipBytes.write(buff, 0, byteCount);
}
// decode once at the end, so that a multi-byte UTF-8 character
// split across two reads is not corrupted
String unGzipRes = new String(unGzipBytes.toByteArray(), "utf-8");

how to decompress http response?

Can anyone help me solve this problem?
Problem Description:
I have received a Content-Encoding: gzip header from an HTTP web server.
Now I want to decode the content, but when I use the GZIP classes from JDK 1.6.12, I get null.
Does this mean that the content is not in gzip format? Or are there other classes for decompressing HTTP response content?
Sample Code:
System.out.println("Reading InputStream");
InputStream in = httpuc.getInputStream();// httpuc is an object of httpurlconnection
System.out.println("Before reading GZIP inputstream");
System.out.println(in);
GZIPInputStream gin = new GZIPInputStream(in);
System.out.println("After reading GZIP inputstream");
Output:
Reading InputStream
Before reading GZIP inputstream
sun.net.www.protocol.http.HttpURLConnection$HttpInputStream@8acf6e
null
I have found one error from the code, but I am not able to understand it properly. What does it indicate?
Error ! java.io.EOFException
Thanks

I think you should have a look at HTTPClient, which will handle a lot of the HTTP issues for you. In particular, it allows access to the response body, which may be gzipped, and then you simply feed that through a GZIPInputStream,
e.g.
Header hce = postMethod.getResponseHeader("Content-Encoding");
InputStream in = null;
if(null != hce)
{
    if(hce.getValue().equals(GZIP)) {
        in = new GZIPInputStream(postMethod.getResponseBodyAsStream());
    }
    // etc...

I second Brian's suggestion. Whenever you need to get or post something via HTTP, don't bother with low-level access; use the Apache HTTP client.

InputStream is = con.getInputStream();
InputStream bodyStream = new GZIPInputStream(is);
ByteArrayOutputStream outStream = new ByteArrayOutputStream();
byte[] buffer = new byte[4096];
int length;
while ((length = bodyStream.read(buffer)) > 0) {
    outStream.write(buffer, 0, length);
}
String body = new String(outStream.toByteArray(), "UTF-8");
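The snippet above always wraps the stream in a GZIPInputStream, which would fail for a response that is not compressed. A hedged variant that only does so when the Content-Encoding value says gzip. The ContentDecoding class and its method names are invented for this sketch:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of any library.
class ContentDecoding {

    // Returns a stream of decoded body bytes; contentEncoding may be null.
    static InputStream decodeBody(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null) {
            String enc = contentEncoding.trim().toLowerCase(Locale.ROOT);
            if (enc.equals("gzip") || enc.equals("x-gzip")) {
                return new GZIPInputStream(raw);
            }
        }
        return raw; // no encoding, or one this sketch does not handle
    }

    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int length;
        while ((length = in.read(buffer)) != -1) {
            out.write(buffer, 0, length);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a gzipped response body.
        ByteArrayOutputStream zipped = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(zipped)) {
            gz.write("hello".getBytes(StandardCharsets.UTF_8));
        }
        InputStream body = decodeBody(new ByteArrayInputStream(zipped.toByteArray()), "gzip");
        System.out.println(readAll(body));
    }
}
```

With an HttpURLConnection named con, as in the snippet above, the call would look like decodeBody(con.getInputStream(), con.getContentEncoding()).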
