HttpUrlConnection reading chunked response - java

I'm working on a project where I have to use HttpURLConnection (Android) to read the input stream.
It turns out that when I read the input stream, the data is malformed and larger than the original content sent by the server. The server's response headers contain both "Content-Length" and "Transfer-Encoding: chunked", which, as far as I know, is a problem because the two shouldn't coexist.
Aside from that, the input stream received from HttpURLConnection contains the entire body, including the chunk-size information.
I have two questions:
Shouldn't HttpURLConnection handle chunked data?
How do I get the data from the input stream without the chunk information?

You're correct: HttpURLConnection should be handling chunked data. The fact that you are seeing this information at all means the response is probably being malformed somewhere: something has already sent a blank line (either \n\n or \r\n\r\n), so HttpURLConnection treats what follows as part of the actual body.
If you want the raw data, open a socket and connect to the host on the correct port (probably 80, or 443 for SSL).
EDIT: the java.net.URLConnection documentation lists this step after the connect() method:
Interact with the resource; query header fields and contents.
This shows that a URLConnection queries the header information before anything is read from any reader it provides. Pardon me for not including this the first time.
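If you do end up reading the raw bytes yourself (for example through a socket), the chunk framing has to be removed by hand. The following is a minimal sketch of that step, not what HttpURLConnection does internally. The class and method names (ChunkedDecoder, decode) are made up for illustration, and it assumes a well-formed chunked body, ignoring chunk extensions and trailers.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ChunkedDecoder {

    /** Reads one line up to '\n' and returns it without the line ending. */
    private static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                break;
            }
            if (b != '\r') {
                line.write(b);
            }
        }
        return new String(line.toByteArray(), StandardCharsets.US_ASCII);
    }

    /** Strips the chunk-size lines from a chunked body and returns the payload. */
    public static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            // Drop any chunk extension after ';' before parsing the hex size.
            int semicolon = sizeLine.indexOf(';');
            if (semicolon >= 0) {
                sizeLine = sizeLine.substring(0, semicolon);
            }
            sizeLine = sizeLine.trim();
            if (sizeLine.isEmpty()) {
                throw new IOException("Missing chunk size");
            }
            int size = Integer.parseInt(sizeLine, 16);
            if (size == 0) {
                break; // last chunk; trailers, if any, are ignored in this sketch
            }
            byte[] chunk = new byte[size];
            int read = 0;
            while (read < size) {
                int n = in.read(chunk, read, size - read);
                if (n == -1) {
                    throw new IOException("Stream ended inside a chunk");
                }
                read += n;
            }
            body.write(chunk, 0, size);
            readLine(in); // consume the CRLF that terminates the chunk data
        }
        return body.toByteArray();
    }
}
```

With a normal HttpURLConnection none of this should be necessary; getInputStream() is expected to return the already-decoded body.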

Related

can I send same HTTP request with same HttpURLConnection multiple times?

To send string data to the server once, I do the following:
Create an HttpURLConnection to my URL and open it
Set the required headers
Call setDoOutput(true) on the connection
Create a DataOutputStream from the connection and write my string data to it.
HttpURLConnection myConn = (HttpURLConnection) myUrl.openConnection();
myConn.setRequestProperty("Accept", "application/json, text/plain, */*");
myConn.setDoOutput(true);
DataOutputStream my_output = new DataOutputStream(myConn.getOutputStream());
my_output.write(myData.getBytes("UTF-8"));
But what should I do if I want to send exactly the same data, with the same URL and headers, multiple times?
Can I write to it multiple times? (That is, can I run the last line of code several times?) Or should I repeat the steps above with a new connection?
And if so, should I wait a few seconds or milliseconds before sending the next one?
I also looked at alternatives such as the HttpClient API and synchronous HTTP requests, which, as far as I understand, would let me set the headers only once.
At the end, I appreciate your help and support and any other alternatives would be welcome.
Thanks a million.
I understand that the question has been answered in the comments, but I am leaving this here so that future viewers can see it.
An HTTP request contains 3 main parts:
Request Line: Method, Path, Protocol
Headers: Key-Value Pairs
Body: Data
Calling my_output.write() only adds bytes to the body of the current request; calling it again does not send a second request. By default HttpURLConnection buffers the body, and the request actually goes out when the stream is flushed in a streaming mode or, more commonly, when the response is requested (getResponseCode() or getInputStream()).
Because an HTTP exchange is complete once the request has been sent and the response received, whether you create a new connection or just add on to the body depends on your intentions. Typically, clients create a new connection object for each request, because each response should be handled individually, rather than sending one request with a repeated body. This varies, though, because some servers choose to hold a connection open (such as WebSockets).
If you are open to external libraries, AsyncHttpClient would be a good fit for your intentions.
Alternatively, you can use cURL by running a terminal command with Runtime.getRuntime().exec(). More information about using cURL with POST Requests can be found here. While cURL is efficient, you have to depend on the fact that your OS supports the command (though usually all devices that can run Java have this command).
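To make the "new connection for each request" option concrete, here is a minimal sketch that sends the same body several times, each time on a fresh HttpURLConnection. The class and method names (RepeatedPost, postOnce, postRepeatedly) are hypothetical; the Accept header is copied from the question. No delay between requests is assumed to be necessary, since each call waits for the response before returning.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class RepeatedPost {

    /** Sends one POST on a fresh HttpURLConnection and returns the status code. */
    public static int postOnce(URL url, String data) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Accept", "application/json, text/plain, */*");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(data.getBytes(StandardCharsets.UTF_8));
        }
        // Asking for the status code completes the request/response exchange.
        int status = conn.getResponseCode();
        // Drain the response so the underlying socket can be reused (keep-alive).
        InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
        if (in != null) {
            try (InputStream body = in) {
                byte[] buf = new byte[4096];
                while (body.read(buf) != -1) {
                    // discard
                }
            }
        }
        return status;
    }

    /** Repeats the same request the given number of times, one connection object each. */
    public static int[] postRepeatedly(URL url, String data, int times) throws IOException {
        int[] statuses = new int[times];
        for (int i = 0; i < times; i++) {
            statuses[i] = postOnce(url, data);
        }
        return statuses;
    }
}
```

Creating a new HttpURLConnection object per request does not necessarily mean a new TCP connection each time; when the response is fully read, the JDK may reuse the underlying socket.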

HttpUrlConnection gets response body on connect()

Consider the following code.
try {
httpURLConnection = (HttpURLConnection) new URL(strings[0]).openConnection();
httpURLConnection.setConnectTimeout(Config.HTTP_CONNECTION_TIMEOUT);
httpURLConnection.setReadTimeout(Config.HTTP_CONNECTION_TIMEOUT);
httpURLConnection.connect();
responseCode = httpURLConnection.getResponseCode();
httpURLConnection.getHeaderFields();
}
finally {
httpURLConnection.disconnect();
}
The issue is even when I don't use the InputStream to read the response, in my Internet/Wifi connection logs I can see the response-body. What I want is simply to check a field in the header and based upon that field I will continue reading the InputStream.
My questions are these:
Is it correct behavior for the connected stream to automatically download all/partial file even before a BufferedInputStream is created and read from?
If yes, then is it possible to stop the file download until an InputStream is used to read the response?
If not then is there something I am doing wrong or missing?
The response includes both the headers and the body; the server does not wait for the client to acknowledge the headers before sending the body.
By the time the client can read the response code from the headers, part of the body has already been sent; how much depends on network latency, buffering, and so on.
The current implementation of HttpURLConnection.getResponseCode() even uses getInputStream() to ensure that the connection is in the correct state.
The client can choose to ignore the body, but that is usually not recommended, because it may prevent a persistent connection from being reused.
I am not sure about Android, but since Java 6 a background thread is automatically used to read the remaining data.
If If-Modified-Since is not an option, why not use a HEAD request?
The HTTP HEAD method requests the headers that are returned if the
specified resource would be requested with an HTTP GET method. Such a
request can be done before deciding to download a large resource to
save bandwidth, for example.
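A HEAD request of this kind could be sketched as follows. The class and method names (HeadCheck, headerViaHead) are hypothetical, and the sketch assumes the server answers HEAD with the same headers it would send for GET, which not every server does.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class HeadCheck {

    /** Issues a HEAD request and returns the named response header, or null if absent. */
    public static String headerViaHead(URL url, String headerName) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod("HEAD");
            // Triggers the request; a HEAD response carries headers but no body.
            conn.getResponseCode();
            return conn.getHeaderField(headerName);
        } finally {
            conn.disconnect();
        }
    }
}
```

Based on the returned value, the caller can then decide whether to open a second connection with GET and read its InputStream.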

Does HttpURLConnection support compression/decompression out of the box?

I'm using HttpURLConnection to make some GET requests and fetch pages. I'd like to request gzipped responses, but I haven't found any information on whether HttpURLConnection supports gzip.
Do I simply need to add the header Accept-Encoding: gzip to the request, or is there something else I need to do in order to handle gzipped responses?
No, the HttpURLConnection does not "handle" compression out of the box. It simply streams the request and response using HTTP. You will need to handle the response compression if it is utilized, which you can find out by checking the response header, for example
Content-Encoding: gzip
The encoding type may be something other than gzip, too. Like you mentioned, you need to set your request header, stating you support compression.
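Handling the response yourself could look roughly like the sketch below: announce gzip support in the request, then wrap the stream in a GZIPInputStream only when the response says it is gzip-encoded. The class and method names (GzipAware, decode, open) are hypothetical, and only gzip is handled; other encodings are passed through untouched. This is written against the standard JDK class; Android's implementation documents transparent gzip handling when you do not set Accept-Encoding yourself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

public class GzipAware {

    /** Wraps the stream in a GZIPInputStream when Content-Encoding is gzip. */
    public static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        // Not compressed (or an encoding this sketch does not handle).
        return raw;
    }

    /** Requests gzip and returns a stream of the decompressed body. */
    public static InputStream open(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");
        return decode(conn.getInputStream(), conn.getContentEncoding());
    }
}
```

The server is free to ignore Accept-Encoding, which is why the check on the response header is needed rather than assuming the body is compressed.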

how to remove a header from URLConnection

I am talking to a file upload service that accepts raw POST data, not form data. By default, Java's HttpURLConnection sets the Content-Type header to application/x-www-form-urlencoded. This is obviously wrong if I'm posting pure data.
I (the client) don't know the content type. I don't want the Content-Type header set at all. The service has a feature where it will guess the content type (based on the file name, reading some data from the file, etc.).
How do I unset a header? There's no method to remove a header; setting it to null doesn't change the value, and setting it to the empty string results in the header being sent with no value.
I haven't tested this approach but you can try this:
Extend HttpURLConnection and try overriding its getContentHandler() and setContentHandler(...) methods. This will most probably work, as you will see if you take a look at the code of getContentHandler().
Use Apache HttpClient instead of URLConnection
Use fluent Request to generate your request
Use removeHeader()
What do you mean "i don't want the Content-Type header to set at all"?
The browser (or other http client) sends your post request to the server, so it has to inform the server which way it encoded the parameters.
If the Content-Type header is not set, on the server side you (= your server) won't be able to understand how to parse the received data.
If you don't set Content-Type, the default value will be used.
Your browser (or other HTTP client) MUST do two things:
Send key/value pairs.
Inform the server how the key/value pairs were encoded.
So, it is impossible to completely get rid of this header.
I just accomplished this by setting the header to null.
connection.setRequestProperty(MY_HEADER, null);

How can I screen a URL for files / responses of a certain type?

I have a web page with links pointing to downloadable files. For example:
http://www.mysite.com/download.php?FILE=downloads/programming/various/ebook.pdf
But it can also have navigation links as follows:
http://www.mysite.com/index.php
http://www.mysite.com/index.php?category=programming
http://www.mysite.com/index.php?section=programming&category=various
How can I determine if a URL is pointing to a file as in the first link ? Or inversely, filter out URLs which don't fit ?
Going with your edited question: if you want to filter out files, screen the Content-Type header.
Here is an informal list of common mime-types
You can inspect the response headers to determine whether the response conforms, e.g. to application/pdf. But you cannot make this determination from the URL / URI itself.
In fact, I could construct a web application that would respond to the URL http://myapp.com/test.pdf with the header Content-Type: image/jpeg and the data of a JPG.
Also, I could really break things by sending the header Content-Type: image/jpeg and the data of a PDF.
Presuming that it isn't intentionally broken (as I mentioned above), you can rely on the response.
Be aware that if the content itself deviates from the Content-Type header, you could have an exploit happen. This is how the iPhone was jailbroken: through acting on malformed PDF data.
Look for a file name-like parameter?
Any URL could respond with a file when requested.
You have no way of knowing what a URL will respond with until you request it.
In HTTP, URLs don't point to files, ever; they identify resources, for which you get a representation when you "dereference" that URL (i.e. make a GET request).
Whether the user-agent chooses to store that representation as a file is its own choice. What to do with a representation is guided by the content-type.
You may obtain the content-type using a HEAD request. PDF documents should be using application/pdf but there are a number of other types. Most browsers tend to save application/octet-stream as files, by default. (There are also subtleties about content-type negotiation.)
In Java, you could make a HEAD request using something like this:
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("HEAD");
// Check connection.getContentType();
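The value returned by getContentType() may carry parameters (for example "text/html; charset=UTF-8") and may be null, so the check itself needs a little care. A minimal sketch of that check follows; the class and method names (ContentTypes, mediaType, looksLikeFile) are hypothetical, and the rule "anything that is not HTML counts as a file" is only an assumption for the example site in the question.

```java
import java.util.Locale;

public class ContentTypes {

    /** Returns the bare media type, e.g. "application/pdf" for "application/pdf; charset=binary". */
    public static String mediaType(String contentTypeHeader) {
        if (contentTypeHeader == null) {
            return null;
        }
        int semicolon = contentTypeHeader.indexOf(';');
        String type = semicolon >= 0
                ? contentTypeHeader.substring(0, semicolon)
                : contentTypeHeader;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    /** Assumed rule for illustration: anything that is not an HTML page is treated as a file. */
    public static boolean looksLikeFile(String contentTypeHeader) {
        String type = mediaType(contentTypeHeader);
        return type != null
                && !type.equals("text/html")
                && !type.equals("application/xhtml+xml");
    }
}
```

It would be called as ContentTypes.looksLikeFile(connection.getContentType()) after the HEAD request above.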
