Encoded Http Request/Response body - java

I've built an Android proxy server that passes HTTP requests and responses using Java sockets.
The proxy is working: all browser content passes through it. However, although I can read the requests/responses, their bodies seem to be encoded:
GET http://m.onet.pl/ HTTP/1.1
Host: m.onet.pl
Proxy-Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Linux; Android 4.4.4; XT1039 Build/KXB21.14-L1.56) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.76 Mobile Safari/537.36
DNT: 1
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-GB,en;q=0.8,en-US;q=0.6,pl;q=0.4
Cookie: onet_ubi=201509221839473724130028; onetzuo_ticket=9AEDF08D278EC7965FF6A20BABD36EF0010012ED90FDD127C16068426F8B65A5D81A000000000000000050521881000000; onet_cid=dd6df83b3a8c33cd497d1ec3fcdea91b; __gfp_64b=2Mp2U1jvfJ3L9f.y6CbKfJ0oVfA7pVdBYfT58G1nf7T.p7; ea_uuid=201509221839478728300022; onet_cinf=1; __utma=86187972.1288403231.1442939988.1444999380.1445243557.40; __utmb=86187972.13.10.1445243557; __utmc=86187972; __utmz=86187972.1442939988.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
����������������������������������������... (several hundred more replacement characters)
So in both the request and the response a lot of "���" occurs. I didn't find any info about HTTP encoding. What is it? How can I properly read the body?
Assuming it might be a gzipped message, I tried:
while ((count = externalServerInputReader.read(buf, 0, buf.length)) != -1)
{
    String stream = new String(buf, 0, count);
    proxyOutputStream.write(buf, 0, count);
    if (stream.contains("content-encoding: gzip")) {
        ByteArrayInputStream bais = new ByteArrayInputStream(buf);
        GZIPInputStream gzis = new GZIPInputStream(bais);
        InputStreamReader reader = new InputStreamReader(gzis);
        BufferedReader in = new BufferedReader(reader);
        String readed;
        while ((readed = in.readLine()) != null) {
            Log.d("Hello", "UnGzip: " + readed);
        }
    }
}
proxyOutputStream.flush();
However, I get an error on the ungzip attempt:
unknown format (magic number 5448)

I tried your sample request by saving it to "/tmp/req" and replaying it using cat /tmp/req | nc m.onet.pl 80. The server sent back a gzip encoded response, which I could tell from the response header content-encoding: gzip. In the case where the response is gzip encoded, you could decompress it in Java using java.util.zip.GZIPInputStream. Note that the user agent in your example also advertises support for "deflate" and "sdch", so you may also get responses with those encodings. The "deflate" encoding can be decompressed using java.util.zip.InflaterInputStream. I'm not aware of any built-in support for sdch, so you would need to find or write a library to decompress that; see this other Stack Overflow question for a possible starting point: "Java SDCH compressor/decompressor".
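A minimal sketch of choosing a decoder from the Content-Encoding header value, assuming the headers have already been separated from the body and the body is not chunked. ContentDecoding, decodeBody and readAll are illustrative names, not an existing API; sdch is rejected because, as noted above, java.util.zip has no decoder for it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

public class ContentDecoding {

    /** Wraps a body stream in the decoder named by its Content-Encoding header value. */
    static InputStream decodeBody(InputStream body, String contentEncoding) throws IOException {
        if (contentEncoding == null) {
            return body; // no Content-Encoding header: the bytes are used as-is
        }
        switch (contentEncoding.trim().toLowerCase(Locale.ROOT)) {
            case "identity":
                return body;
            case "gzip":
            case "x-gzip":
                return new GZIPInputStream(body);
            case "deflate":
                // zlib-wrapped deflate; a server sending raw deflate would need new Inflater(true)
                return new InflaterInputStream(body);
            default:
                // e.g. "sdch": nothing in java.util.zip decodes it
                throw new IOException("Unsupported Content-Encoding: " + contentEncoding);
        }
    }

    /** Reads a stream to its end. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] html = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);

        // Build a gzip body and a zlib/deflate body locally to exercise both branches.
        ByteArrayOutputStream gz = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(gz)) {
            out.write(html);
        }
        ByteArrayOutputStream zl = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(zl)) {
            out.write(html);
        }

        byte[] fromGzip = readAll(decodeBody(new ByteArrayInputStream(gz.toByteArray()), "gzip"));
        byte[] fromDeflate = readAll(decodeBody(new ByteArrayInputStream(zl.toByteArray()), "deflate"));
        System.out.println(new String(fromGzip, StandardCharsets.UTF_8));    // <html>hello</html>
        System.out.println(new String(fromDeflate, StandardCharsets.UTF_8)); // <html>hello</html>
    }
}
```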
To address the updated part of your question, where you added a stab at using GZIPInputStream: the most immediate issue is that you should only gunzip the stream after the HTTP response headers have ended. The simplest thing to do would be to wait for "\r\n\r\n" to come across the underlying InputStream (not a Reader) and then run the data, starting with the next byte, through a single GZIPInputStream. That should probably work for the example you gave; I successfully decoded the replayed response I got using gunzip -c.
For thoroughness, there are some other issues that will keep this from working as a general solution for arbitrary websites, but I think it will be enough to get you started. Some examples:
1) You might miss a "content-encoding" header because you are splitting the response into chunks of length buf.length.
2) Responses that use chunked transfer encoding would need to be de-chunked.
3) Keep-alive responses would require you to track where the response ends rather than waiting for end of stream.
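The magic number in the error also points at the headers: GZIPInputStream reads the first two bytes as a little-endian value, and 0x5448 is the byte 'H' (0x48) followed by 'T' (0x54), i.e. the start of "HTTP/1.1", rather than the gzip magic bytes 0x1f 0x8b. A minimal sketch of the headers-then-gunzip approach follows, assuming the response is not chunked, ends at end of stream (no keep-alive), and has a UTF-8 text body; GunzipAfterHeaders and its methods are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GunzipAfterHeaders {

    private static final byte[] END_OF_HEADERS = {'\r', '\n', '\r', '\n'};

    /** Reads raw bytes up to and including the first "\r\n\r\n" and returns them as text. */
    static String readHeaderBlock(InputStream in) throws IOException {
        ByteArrayOutputStream header = new ByteArrayOutputStream();
        int matched = 0; // length of the "\r\n\r\n" prefix seen so far
        int b;
        while ((b = in.read()) != -1) {
            header.write(b);
            if (b == END_OF_HEADERS[matched]) {
                if (++matched == END_OF_HEADERS.length) {
                    return new String(header.toByteArray(), StandardCharsets.ISO_8859_1);
                }
            } else {
                matched = (b == '\r') ? 1 : 0; // a '\r' can start a new match
            }
        }
        throw new IOException("Stream ended before the end of the headers");
    }

    /** Returns the value of the named header (case-insensitive), or null if absent. */
    static String headerValue(String headerBlock, String name) {
        for (String line : headerBlock.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase(name)) {
                return line.substring(colon + 1).trim();
            }
        }
        return null;
    }

    /** Skips the headers, then reads the rest of the stream, gunzipping it if needed. */
    static String readBody(InputStream in) throws IOException {
        String headers = readHeaderBlock(in);
        InputStream body = in;
        if ("gzip".equalsIgnoreCase(headerValue(headers, "Content-Encoding"))) {
            body = new GZIPInputStream(in); // starts exactly at the first body byte
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = body.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8); // assumes UTF-8 text
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream gz = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(gz)) {
            out.write("<p>witaj</p>".getBytes(StandardCharsets.UTF_8));
        }
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        response.write("HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n".getBytes(StandardCharsets.ISO_8859_1));
        response.write(gz.toByteArray());
        System.out.println(readBody(new ByteArrayInputStream(response.toByteArray()))); // <p>witaj</p>
    }
}
```

In the proxy itself, the original bytes would still be forwarded to the browser unchanged; decoding a separate copy like this is only for inspecting or logging the content.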

Related

JAX-RS client sending an array of bytes in a PUT request

I've been banging my head against the wall with coding up a JAX-RS client for uploading a file to a Red Hat Satellite Six server. The REST API for the server specifies what appears to be a somewhat unusual method of uploading the file. The calling pattern specifies that I create an upload request and then use a PUT to a URL incorporating the id for the upload request (as part of the URL) and two parameters in the body: bytes from the file and an offset for where in the file those bytes belong. The intention is that callers would read in a file and send chunks of data to the server, which it will then re-assemble when a final call is invoked to commit the upload.
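The upload pattern described above (PUT each chunk of the file together with its starting offset) can be sketched as follows. This is a minimal illustration assuming the whole file is already in a byte[]; ChunkSender is a hypothetical stand-in for the actual JAX-RS PUT, and the 4096-byte chunk size only mirrors the 4K reads the question attributes to the Ruby client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkedUpload {

    /** Hypothetical callback standing in for the PUT to .../content_uploads/{id}. */
    interface ChunkSender {
        void send(int offset, byte[] chunk) throws Exception;
    }

    /** Splits data into chunks of at most chunkSize bytes and sends each with its starting offset. */
    static void uploadInChunks(byte[] data, int chunkSize, ChunkSender sender) throws Exception {
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            int end = Math.min(offset + chunkSize, data.length);
            sender.send(offset, Arrays.copyOfRange(data, offset, end));
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] file = new byte[10000];
        List<String> calls = new ArrayList<>();
        uploadInChunks(file, 4096, (offset, chunk) -> calls.add(offset + ":" + chunk.length));
        System.out.println(calls); // [0:4096, 4096:4096, 8192:1808]
    }
}
```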
I've found a working Ruby client that implements the basic algorithm and I've confirmed it manipulates the API properly to get the file uploaded. The sequence of steps is basically what I have in my Java code: issue an API call to get an upload request id, then enter a loop to read in some bytes and PUT them to the upload URL. I've tcpdumped the client and see the following request (slightly truncated and cleaned up for readability):
PUT /katello/api/repositories/90/content_uploads/ee9028cf-bed6-40fb8561f91a86b95bdc HTTP/1.1
Accept: application/json;version=2
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded
Accept-Language: en
Multipart: true
Content-Length: 8643
User-Agent: Ruby
Authorization: Basic XXXXXXXX==
Host: mysathost.example.com
offset=0&content=%ED%AB%EE%DB%03%00%00%00%00%FFhelloworld-0.0.1-SNAPSHOT20150319153318%00%00%00%00% ...snip...
My JAX-RS Client's request looks like this:
PUT /katello/api/v2/repositories/90/content_uploads/ff4a4273-0b3f-49f0-8e61-223259e09f01 HTTP/1.1
Accept: application/json
Content-Type: application/x-www-form-urlencoded
Authorization: Basic XXXXXXXXX==
User-Agent: Jersey/2.13 (HttpUrlConnection 1.8.0_40)
Host: mysathost.example.com
Connection: keep-alive
Content-Length: 13567
offset=0&content=%EF%BF%BD%EF%BF%BD%EF%BF%BD%03%00%00%00%00%EF%BF%BDhelloworld-0.0.1-SNAPSHOT20150319153318%00%00%00%00 ...snip...
There are some obvious differences between the two. The most obvious is that the functioning Ruby client has a "Multipart: true" header that my client doesn't have. The Content-Length is different, possibly because the Ruby client is gzipping the request. Finally, it is obvious that the bytes for my file are being displayed differently between the two even though I'm using the same test file.
I have tried using the Multipart form and provider for Jersey but the raw requests don't seem like Multipart requests and I can't seem to find an Entity that has a way to natively send byte arrays (e.g. with a method that has byte[] in its signature) but still keep the right Content-Type.
As for the byte array encoding, the Ruby code appears to do a file.read of 4K worth of data at a time and dump the results into a variable and then passes that variable into its REST client machinery where it gets digested in a way that I can't trace. I thought it might be Base64 encoding the bytes (with URL escaping), but when I tried that with Commons Codec, the output of my tcpdumps didn't look remotely like the Ruby client. On the assumption that it is just treating the bytes as a Unicode string, I tried to do the same thing in my Java code. That looks closer to what the Ruby client does, but obviously the bytes don't seem to match exactly in the output and the Satellite Server complains that the file is corrupted when I commit the request. Currently my JAX-RS call looks like this:
WebTarget contentUpload = satServer.path("repositories").path(repoId).path("content_uploads").path(uploadRequestId);
Form uploadForm = new Form();
uploadForm.param("offset", Integer.toString(offset));
// data is a byte[]
uploadForm.param("content", new String(data));
Response response = contentUpload.request(MediaType.APPLICATION_JSON).put(Entity.form(uploadForm));
if (response.getStatus() != Response.Status.OK.getStatusCode()) {
    StatusType statusInfo = response.getStatusInfo();
    response.close();
    throw new SatelliteException(
            "Encountered error while uploading offset: " + offset
            + " for " + uploadRequestId + " : "
            + statusInfo.getStatusCode() + ": "
            + statusInfo.getReasonPhrase());
}
I have tried new String(data,"UTF-8") as well with no luck. I have also tried Commons Codec Base64 encoding with URL safety enabled. I have also tried a multipart/form, but the working request doesn't really seem to follow that pattern.
I'm looking for some idea of how to encode my bytes to match the working client or maybe some sort of Jersey media processing that can handle sending a byte array in a regular form. Or another suggestion about what I should be looking at. I'm not afraid to do some more digging if I can be pointed in a direction, but I feel I've exhausted what I can learn from the implementation of the Ruby client and server for the moment. It appears that receiving server is using Apipie and Ruby for its implementation, if that helps.
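No answer is included here, but the Jersey trace itself offers a clue: %EF%BF%BD is the UTF-8 encoding of U+FFFD, the Unicode replacement character, which is what new String(data) substitutes for byte sequences that are not valid in the charset used to decode them (apparently UTF-8 here). The original bytes are therefore lost before Form ever sees them. A minimal sketch, assuming the server simply wants every raw byte percent-encoded as in the Ruby trace (%ED%AB%EE...), is to encode the byte[] directly. formEncode is a hypothetical helper that follows java.net.URLEncoder's rules for which characters stay unescaped (letters, digits, '.', '-', '*', '_', and space as '+').

```java
public class FormBytes {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /**
     * Percent-encodes raw bytes for an application/x-www-form-urlencoded body,
     * one byte at a time, without ever decoding them into a String first.
     */
    static String formEncode(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 3);
        for (byte b : data) {
            int v = b & 0xFF; // treat the byte as unsigned 0..255
            if ((v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z') || (v >= '0' && v <= '9')
                    || v == '.' || v == '-' || v == '*' || v == '_') {
                sb.append((char) v);
            } else if (v == ' ') {
                sb.append('+');
            } else {
                sb.append('%').append(HEX[v >> 4]).append(HEX[v & 0x0F]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] start = {(byte) 0xED, (byte) 0xAB, (byte) 0xEE, (byte) 0xDB, 0x03, 0x00, (byte) 0xFF};
        System.out.println(formEncode(start)); // %ED%AB%EE%DB%03%00%FF
    }
}
```

The body would then be "offset=" + offset + "&content=" + formEncode(chunk). Because Form would URL-encode an already-encoded value a second time, this string would presumably have to be sent as-is, e.g. via Entity.entity(body, MediaType.APPLICATION_FORM_URLENCODED_TYPE); how Jersey handles that is an untested assumption.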

jquery ajax and java server, lost data

I have this ajax function that looks like this:
$.ajax({
    type: "POST",
    url: "http://localhost:55556",
    data: "lots and lots of pie",
    cache: false,
    success: function(result)
    {
        alert("sent");
    },
    error: function() // jQuery's callback is "error"; a "failure" option is ignored
    {
        alert('An error has occurred, please try again.');
    }
});
and a server that looks like this:
clientSocket = AcceptConnection();
inp = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));
String requestString = inp.readLine();
BufferedReader ed = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));
while(true){
    String tmp = inp.readLine();
    System.out.println(tmp);
}
Now the odd thing is that when I send my ajax request, my server prints this (via System.out):
Host: localhost:55556
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Content-Length: 20
Origin: null
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
The question is: where is the data that I sent? Where is "lots of pie"?
The data should come after a blank line after the header lines, but I think the problem is that the data does not end with a newline character, and therefore, you cannot read it with the .readLine() method.
While looping through the header lines, you could look for the "Content-Length" line and get the length of the data. When you have reached the blank line, stop using .readLine(). Instead switch to reading one character at a time, reading the number of characters specified by the "Content-Length" header. I think you can find example code for this in this answer.
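A minimal sketch of that approach, assuming the request arrives on a BufferedReader as in the question (ReadPostBody and readBody are illustrative names). Note that Content-Length counts bytes, so reading that many chars is only exact when the body is ASCII, as "lots and lots of pie" is; for arbitrary bodies you would read bytes from the InputStream instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReadPostBody {

    /** Reads the request line and headers with readLine(), then exactly Content-Length chars of body. */
    static String readBody(BufferedReader in) throws IOException {
        String requestLine = in.readLine(); // e.g. "POST / HTTP/1.1"
        if (requestLine == null) {
            throw new IOException("Connection closed before the request line");
        }
        int contentLength = 0;
        String line;
        // The headers end at the first empty line.
        while ((line = in.readLine()) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                contentLength = Integer.parseInt(line.substring(colon + 1).trim());
            }
        }
        // Stop using readLine(): the body has no trailing newline, so read a fixed count instead.
        char[] body = new char[contentLength];
        int read = 0;
        while (read < contentLength) {
            int n = in.read(body, read, contentLength - read);
            if (n == -1) {
                throw new IOException("Connection closed after " + read + " of " + contentLength + " chars");
            }
            read += n;
        }
        return new String(body);
    }

    public static void main(String[] args) throws IOException {
        String raw = "POST / HTTP/1.1\r\n"
                + "Host: localhost:55556\r\n"
                + "Content-Type: application/x-www-form-urlencoded; charset=UTF-8\r\n"
                + "Content-Length: 20\r\n"
                + "\r\n"
                + "lots and lots of pie"; // no trailing newline, as with the ajax call
        System.out.println(readBody(new BufferedReader(new StringReader(raw)))); // lots and lots of pie
    }
}
```

With a real socket you would pass new BufferedReader(new InputStreamReader(clientSocket.getInputStream(), StandardCharsets.UTF_8)) and create that reader only once: a second BufferedReader on the same stream, like ed in the question, can miss bytes the first one has already buffered.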
If you can, I suggest you use a library to help with this. I think the Apache HTTP Core library can help with this. See this answer.

Neglecting HTTP Post request headers when read using BufferedReader

I am currently listening on a port using a BufferedReader, like this:
ServerSocket ss = new ServerSocket(2346);
Socket s = ss.accept();
BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
while(true){
    inputLine = in.readLine();
    if(inputLine==null)
        break;
}
Now I am getting all the headers and everything, like this:
POST /record HTTP/1.1
Accept-Encoding: gzip,deflate
Content-Type: text/xml;charset=UTF-8
SOAPAction: ""
Content-Length: 1969
Host: localhost:2346
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1.1 (java 1.5)
<S:Envelope xmlns:S="http://www.w3.org/2003/05/soap-envelope">...
The problem is that I need just the content of the POST request (the last line above). Is there a Java parser that can do this? Also, when I send a request to the socket I have to add an extra empty line for it to be read properly. Is there a solution for this?
Thanks
The message body is always separated from the headers by one blank line. You can either write your own parser or use a library like HttpCore http://hc.apache.org/httpcomponents-core-ga/
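If you write your own parser, a minimal sketch could read the stream byte by byte up to the blank line, then read exactly Content-Length bytes with DataInputStream.readFully. Because it never waits for a line terminator or end of stream after the body, the client does not need to send an extra empty line, and a Keep-Alive connection does not stall it. MiniRequestParser and its methods are illustrative names; chunked request bodies are not handled, and the caller is assumed to pass a case-insensitive map.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

public class MiniRequestParser {

    /** Reads one LF-terminated line as ISO-8859-1, dropping CR characters; null at end of stream. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') {
                line.write(b);
            }
        }
        if (b == -1 && line.size() == 0) {
            return null;
        }
        return new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    /** Fills headersOut (should be case-insensitive) and returns the Content-Length body bytes. */
    static byte[] readBody(InputStream raw, Map<String, String> headersOut) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        readLine(in); // request line, e.g. "POST /record HTTP/1.1"
        String line;
        while ((line = readLine(in)) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                headersOut.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
            }
        }
        String length = headersOut.get("Content-Length");
        byte[] body = new byte[length == null ? 0 : Integer.parseInt(length)];
        in.readFully(body); // blocks until exactly body.length bytes have arrived
        return body;
    }

    public static void main(String[] args) throws IOException {
        String xml = "<S:Envelope xmlns:S=\"http://www.w3.org/2003/05/soap-envelope\"/>";
        String raw = "POST /record HTTP/1.1\r\n"
                + "Content-Type: text/xml;charset=UTF-8\r\n"
                + "Content-Length: " + xml.getBytes(StandardCharsets.UTF_8).length + "\r\n"
                + "\r\n" + xml;
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        byte[] body = readBody(new ByteArrayInputStream(raw.getBytes(StandardCharsets.UTF_8)), headers);
        System.out.println(headers.get("content-type")); // text/xml;charset=UTF-8
        System.out.println(new String(body, StandardCharsets.UTF_8));
    }
}
```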

HTTP gzip encoding of html

For a project of mine, I have to write my own lightweight web server.
At the moment it does what I want it to do, but it's kind of... slow. At least too slow for me.
Therefore I was thinking about implementing gzip compression to speed things up.
Here's how I did it:
public static String encodeToGZip(String data) {
    ByteArrayOutputStream bout = null;
    try {
        bout = new ByteArrayOutputStream();
        GZIPOutputStream output = new GZIPOutputStream(bout);
        output.write(data.getBytes());
        output.flush();
        output.close();
        bout.close();
    } catch (IOException ex) {
        ex.printStackTrace();
    }
    try {
        return new String(bout.toByteArray(), "UTF-8");
    } catch (UnsupportedEncodingException ex) {
        return null;
    }
}
The problem is that the browser can't decode the data I've sent, even though it states that it accepts gzip encoding, so I must be sending some corrupt data.
This is the result.
Wireshark sniff:
GET /login.html HTTP/1.1
Host: localhost:9090
Connection: keep-alive
Cache-Control: no-cache
Pragma: no-cache
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.79 Safari/535.11
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,/;q=0.8
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
HTTP/1.1 200 OK
Connection: close
Server: My Lite Server v0
Content-Encoding: gzip
Content-Type: text/html
(gzip-compressed body: several lines of binary data, shown by Wireshark as dots and stray characters)
return new String(bout.toByteArray(), "UTF-8");
This line in your method will produce garbage strings.
The above constructor performs a transcoding operation from the given encoding to UTF-16. You take a bunch of arbitrary bytes and try to decode them as UTF-8. You can only decode UTF-8 encoded character data as UTF-8. Java does not have binary-safe strings (all strings are UTF-16); you must use byte arrays instead.
Just write the compressed bytes to your OutputStream.
Avoid using data.getBytes() as it uses the default system encoding. This will produce non-portable code as the default system encoding is system and configuration dependent. Prefer always setting an encoding explicitly.
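Following that advice, a minimal sketch might keep the compressed data as a byte[] from start to finish and write the headers and body to the socket's OutputStream separately. GzipResponse and writeGzipResponse are illustrative names; the Content-Length header and the charset=UTF-8 parameter are additions not present in the original server's response, chosen to match the bytes actually written.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipResponse {

    /** Compresses text using an explicit charset and returns the raw gzip bytes (not a String). */
    static byte[] encodeToGZip(String data) throws IOException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bout)) {
            gzip.write(data.getBytes(StandardCharsets.UTF_8));
        } // close() writes the gzip trailer before bout is read below
        return bout.toByteArray();
    }

    /** Writes the headers as ASCII text, then the compressed body as bytes. */
    static void writeGzipResponse(OutputStream out, String html) throws IOException {
        byte[] body = encodeToGZip(html);
        String headers = "HTTP/1.1 200 OK\r\n"
                + "Connection: close\r\n"
                + "Content-Encoding: gzip\r\n"
                + "Content-Type: text/html; charset=UTF-8\r\n"
                + "Content-Length: " + body.length + "\r\n"
                + "\r\n";
        out.write(headers.getBytes(StandardCharsets.US_ASCII));
        out.write(body);
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = encodeToGZip("<html><body>login</body></html>");
        System.out.println(String.format("%02x %02x", body[0] & 0xFF, body[1] & 0xFF)); // 1f 8b

        ByteArrayOutputStream socketStandIn = new ByteArrayOutputStream(); // stands in for socket.getOutputStream()
        writeGzipResponse(socketStandIn, "<html><body>login</body></html>");
        System.out.println(socketStandIn.size() + " bytes written");
    }
}
```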

Http Post not posting data

I'm trying to post some data from a Java client using sockets. It talks to localhost, running PHP code that simply spits out the POST params sent to it.
Here is the Java client:
public static void main(String[] args) throws Exception {
    Socket socket = new Socket("localhost", 8888);
    String reqStr = "testString";
    String urlParameters = URLEncoder.encode("myparam="+reqStr, "UTF-8");
    System.out.println("Params: " + urlParameters);
    try {
        Writer out = new OutputStreamWriter(socket.getOutputStream(), "UTF-8");
        out.write("POST /post3.php HTTP/1.1\r\n");
        out.write("Host: localhost:8888\r\n");
        out.write("Content-Length: " + Integer.toString(urlParameters.getBytes().length) + "\r\n");
        out.write("Content-Type: text/html\r\n\n");
        out.write(urlParameters);
        out.write("\r\n");
        out.flush();
        InputStream inputstream = socket.getInputStream();
        InputStreamReader inputstreamreader = new InputStreamReader(inputstream);
        BufferedReader bufferedreader = new BufferedReader(inputstreamreader);
        String string = null;
        while ((string = bufferedreader.readLine()) != null) {
            System.out.println("Received " + string);
        }
    } catch(Exception e) {
        e.printStackTrace();
    } finally {
        socket.close();
    }
}
This is what post3.php looks like:
<?php
$post = $_REQUEST;
echo print_r($post, true);
?>
I expect to see an array (myparam => "testString") as the response, but the POST args are not being passed to the server.
Here is the output:
Received HTTP/1.1 200 OK
Received Date: Thu, 25 Aug 2011 20:25:56 GMT
Received Server: Apache/2.2.17 (Unix) mod_ssl/2.2.17 OpenSSL/0.9.8r DAV/2 PHP/5.3.6
Received X-Powered-By: PHP/5.3.6
Received Content-Length: 10
Received Content-Type: text/html
Received
Received Array
Received (
Received )
Just FYI, this setup works for GET requests.
Any idea what's going on here?
As Jochen and chesles rightly point out, you are using the wrong Content-Type: header - it should indeed be application/x-www-form-urlencoded. However there are several other issues as well...
The headers must be separated from the body by a blank line, and that blank line must be a complete CRLF (\r\n); in your code it is just a newline (\n). This is an outright protocol violation, and I'm a little surprised you haven't just got a 400 Bad Request back from the server, although Apache can be quite forgiving in this respect.
You should specify Connection: close to ensure that you are not left hanging around with open sockets; the server will then close the connection as soon as the request is complete.
The final CRLF sequence is not required. PHP is intelligent enough to sort this out by itself, but other server languages and implementations may not be...
If you are working with any standardised protocol in its raw state, you should always start by at least scanning over the RFC.
Also, please learn to secure your Apache installs...
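Putting those points together with the Content-Type fix, a minimal sketch of building the request bytes could look like this. RawFormPost and buildRequest are illustrative names. One more detail not mentioned above: URLEncoder.encode("myparam=" + reqStr, "UTF-8") also escapes the '=' as %3D, so this sketch encodes the name and the value separately.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class RawFormPost {

    /** Builds a form POST with CRLF line endings, a CRLF blank line, and an exact byte Content-Length. */
    static byte[] buildRequest(String host, int port, String path, String name, String value)
            throws UnsupportedEncodingException {
        // Encode name and value separately so the '=' between them stays literal.
        String body = URLEncoder.encode(name, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
        byte[] bodyBytes = body.getBytes(StandardCharsets.US_ASCII); // URL-encoded text is pure ASCII
        String head = "POST " + path + " HTTP/1.1\r\n"
                + "Host: " + host + ":" + port + "\r\n"
                + "Content-Type: application/x-www-form-urlencoded\r\n"
                + "Content-Length: " + bodyBytes.length + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        byte[] request = new byte[headBytes.length + bodyBytes.length];
        System.arraycopy(headBytes, 0, request, 0, headBytes.length);
        System.arraycopy(bodyBytes, 0, request, headBytes.length, bodyBytes.length);
        return request; // no trailing CRLF after the body
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] request = buildRequest("localhost", 8888, "/post3.php", "myparam", "testString");
        System.out.print(new String(request, StandardCharsets.US_ASCII));
    }
}
```

To send it, write the array to socket.getOutputStream(), flush, and read the response as before; with Connection: close, the readLine() loop ends when the server closes the socket.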
It looks like you are trying to send data in application/x-www-form-urlencoded format, but you are setting the Content-Type to text/html.
Use
out.write("Content-Type: application/x-www-form-urlencoded\r\n\r\n");
instead. As this page states:
The Content-Length and Content-Type headers are critical because they tell the web server how many bytes of data to expect, and what kind, identified by a MIME type.
For sending form data, i.e. data in the format key=value&key2=value2 use application/x-www-form-urlencoded. It doesn't matter if the value contains HTML, XML, or other data; the server will interpret it for you and you'll be able to retrieve the data as usual in the $_POST or $_REQUEST arrays on the PHP end.
Alternatively, you can send your data as raw HTML, XML, etc. using the appropriate Content-Type header, but you then have to retrieve the data manually in PHP by reading the special file php://input:
<?php
echo file_get_contents("php://input");
?>
As an aside, if you're using this for anything sufficiently complex, I would strongly recommend the use of an HTTP client library like HTTPClient.
