Does HttpURLConnection support compression/decompression out of the box? - java

I'm using HttpURLConnection to make some GET requests and fetch pages. I'd like to request gzipped responses, but I haven't found any information on whether HttpURLConnection supports gzip.
Do I simply need to add the header Accept-Encoding: gzip to the request, or is there something else I need to do in order to handle gzipped responses?

No, HttpURLConnection does not "handle" compression out of the box. It simply streams the request and response over HTTP. You will need to decompress the response yourself if compression was used, which you can find out by checking the response headers, for example
Content-Encoding: gzip
The encoding may be something other than gzip, too. As you mentioned, you also need to set the Accept-Encoding request header to state that you support compression.
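As a rough illustration of the answer above, the sketch below sends Accept-Encoding: gzip and then picks a decompressing stream based on the Content-Encoding response header. The class name GzipAwareFetch and its helper methods are made up for this example, and it assumes the body is UTF-8 text; treat it as a starting point rather than a complete client.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class, not part of any library.
class GzipAwareFetch {

    /** Wraps the raw body stream according to the Content-Encoding response header. */
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding == null) {
            return raw; // header absent: the body was not compressed
        }
        switch (contentEncoding.trim().toLowerCase(Locale.ROOT)) {
            case "gzip":
            case "x-gzip":
                return new GZIPInputStream(raw);
            case "deflate":
                return new InflaterInputStream(raw);
            case "identity":
            case "":
                return raw;
            default:
                throw new IOException("Unsupported Content-Encoding: " + contentEncoding);
        }
    }

    /** Requests a gzipped response and returns the decoded body (assumed to be UTF-8). */
    static String fetch(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");
        try (InputStream in = decode(conn.getInputStream(), conn.getContentEncoding())) {
            return new String(readAll(in), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    /** Reads a stream to the end. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Offline round trip: compress a string, then decode it as if it were a gzipped response.
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write("hello".getBytes(StandardCharsets.UTF_8));
        }
        InputStream decoded = decode(new ByteArrayInputStream(compressed.toByteArray()), "gzip");
        System.out.println(new String(readAll(decoded), StandardCharsets.UTF_8));
    }
}
```

The server is free to ignore Accept-Encoding, which is why the code decides from the response header instead of assuming the body is compressed.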

Related

HttpUrlConnection reading chunked response

I'm working on a project where I have to use HttpUrlConnection (on Android) to read an input stream.
It turns out that when I read the input stream, the data is malformed and larger than the original content sent by the server. The server response contains both the "Content-Length" and "Transfer-Encoding: chunked" headers, which as far as I know is a problem, since the two shouldn't coexist.
Aside from that, the input stream received from HttpUrlConnection contains the whole body, including the chunk-size information.
I have two questions:
Shouldn't the HttpUrlConnection handle chunked data?
How can I get the data from the input stream without the chunk information?
The HttpUrlConnection should be handling chunked data, you're correct. The fact that you're seeing these headers at all means they're probably being malformed somewhere, and something has already sent either a \n\n or \r\n\r\n, so the HttpUrlConnection views it as part of the actual transmission.
If you want the raw data, open a socket to the host on the correct port (probably 80, or 443 for SSL).
EDIT: java.net.URLConnection states under the connect() method
Interact with the resource; query header fields and contents.
This shows that a URLConnection, prior to reading anything in from any sort of provided reader, queries the header information. Pardon me for not including this the first time.
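If the stream handed to you really does still contain chunk framing, one workaround is to strip it by hand. The sketch below is a minimal decoder for the chunked transfer coding (a hexadecimal size line, the chunk bytes, a CRLF, repeated until a zero-size chunk). The class name ChunkedBodyDecoder is invented for this example, and trailer headers after the last chunk are simply ignored.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class ChunkedBodyDecoder {

    /** Reads a chunk-framed body and returns the payload without the framing. */
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            // A size line may carry extensions after ';', which this sketch discards.
            int semi = sizeLine.indexOf(';');
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size;
            try {
                size = Integer.parseInt(hex, 16);
            } catch (NumberFormatException e) {
                throw new IOException("Invalid chunk size line: " + sizeLine, e);
            }
            if (size == 0) {
                break; // last chunk; any trailer headers are ignored
            }
            byte[] chunk = new byte[size];
            int offset = 0;
            while (offset < size) {
                int n = in.read(chunk, offset, size - offset);
                if (n < 0) {
                    throw new EOFException("Stream ended inside a chunk");
                }
                offset += n;
            }
            body.write(chunk, 0, size);
            readLine(in); // consume the CRLF that terminates the chunk data
        }
        return body.toByteArray();
    }

    /** Reads bytes up to the next LF and returns them without the line terminator. */
    private static String readLine(InputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                return line.toString();
            }
            if (c != '\r') {
                line.append((char) c);
            }
        }
        if (line.length() == 0) {
            throw new EOFException("Stream ended before a complete line was read");
        }
        return line.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] framed = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] payload = decode(new ByteArrayInputStream(framed));
        System.out.println(new String(payload, StandardCharsets.US_ASCII)); // prints "Wikipedia"
    }
}
```

This only makes sense on a stream that has not already been de-chunked; on a correctly handled response the body would be misread as framing.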

how to remove a header from URLConnection

I am talking to a file upload service that accepts POST data, not form data. By default, Java's HttpURLConnection sets the Content-Type header to application/x-www-form-urlencoded. This is obviously wrong if I'm posting raw data.
I (the client) don't know the content type, and I don't want the Content-Type header set at all. The service has a feature where it will guess the content type (based on the file name, reading some data from the file, etc.).
How do I unset a header? There's no method to remove a header, setting it to null doesn't change the value, and setting it to the empty string results in the header being sent with no value.
I haven't tested this approach, but you can try this:
Extend HttpURLConnection and override its getContentHandler() and setContentHandler(...) methods. This should most probably work, as you will see if you take a look at the code of getContentHandler().
Use Apache HttpClient instead of URLConnection
Use fluent Request to generate your request
Use removeHeader()
What do you mean by "I don't want the Content-Type header set at all"?
The browser (or other HTTP client) sends your POST request to the server, so it has to inform the server how it encoded the parameters.
If the Content-Type header is not set, your server won't be able to work out how to parse the received data.
If you don't set Content-Type, the default value will be used.
Your browser (or other HTTP client) MUST do two things:
Send key/value pairs.
Inform the server how the key/value pairs were encoded.
So, it is impossible to completely get rid of this header.
I just accomplished this by setting the header to null.
connection.setRequestProperty(MY_HEADER, null);

How can I screen a URL for files / responses of a certain type?

I have a web page with links pointing to downloadable files. For example:
http://www.mysite.com/download.php?FILE=downloads/programming/various/ebook.pdf
But it can also have navigation links as follows:
http://www.mysite.com/index.php
http://www.mysite.com/index.php?category=programming
http://www.mysite.com/index.php?section=programming&category=various
How can I determine whether a URL points to a file, as in the first link? Or, inversely, filter out URLs that don't fit?
Going with your edited question: if you want to filter out files,
screen the Content-Type header.
Here is an informal list of common mime-types
You can inspect the response headers to determine whether the response will conform, e.g. to application/pdf. But you cannot make this determination from the URL / URI alone.
In fact, I could construct a web application that responds to the URL http://myapp.com/test.pdf with the header Content-Type: image/jpeg and the data of a JPG.
Also, I could really break things by sending the header Content-Type: image/jpeg with the data of a PDF.
Presuming the server isn't intentionally broken (as described above), you can rely on the response.
Be aware that if the content itself deviates from the Content-Type header, you could be exposed to an exploit. This is how the iPhone was jailbroken: through acting on malformed PDF data.
Look for a file name-like parameter?
Any URL could respond with a file when requested.
You have no way of knowing what a URL will respond with until you request it.
In HTTP, URLs don't point to files, ever; they identify resources, for which you get a representation when you "dereference" that URL (i.e. make a GET request).
Whether the user-agent chooses to store that representation as a file is its own choice. What to do with a representation is guided by the content-type.
You may obtain the content-type using a HEAD request. PDF documents should be using application/pdf but there are a number of other types. Most browsers tend to save application/octet-stream as files, by default. (There are also subtleties about content-type negotiation.)
In Java, you could make a HEAD request using something like this:
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("HEAD");
String contentType = connection.getContentType(); // connects implicitly and reads the response headers; may be null
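A slightly fuller version of the HEAD request above might look like the following sketch. The class name HeadContentType and the helpers mediaType and isPdf are invented for illustration. The header value can carry parameters such as "; charset=UTF-8", so the sketch strips those before comparing.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Locale;

// Hypothetical helper class, not part of any library.
class HeadContentType {

    /** Issues a HEAD request and returns the raw Content-Type header, or null if absent. */
    static String fetchContentType(URL url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("HEAD");
        try {
            return connection.getContentType(); // connects implicitly and reads the headers
        } finally {
            connection.disconnect();
        }
    }

    /** Strips parameters such as "; charset=UTF-8" and lower-cases the media type. */
    static String mediaType(String contentTypeHeader) {
        if (contentTypeHeader == null) {
            return null;
        }
        int semi = contentTypeHeader.indexOf(';');
        String type = semi >= 0 ? contentTypeHeader.substring(0, semi) : contentTypeHeader;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    /** True when the header names a PDF document. */
    static boolean isPdf(String contentTypeHeader) {
        return "application/pdf".equals(mediaType(contentTypeHeader));
    }

    public static void main(String[] args) {
        // Offline demonstration of the header parsing only; no request is sent here.
        System.out.println(mediaType("text/html; charset=UTF-8"));
        System.out.println(isPdf("application/pdf"));
    }
}
```

Some servers answer HEAD differently from GET, or not at all, so a fallback to GET may be needed in practice.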

send back compressed JSON in Restlet/GAE

I am writing a Restlet application on GAE similar to the one described here:
First Application
I am sending back a JSON representation of an entity, and this works. But so far I have been unsuccessful in sending the response compressed.
I tried adding an Accept-Encoding header with the value "gzip" to the request, but that didn't help. Here is how I tested it:
URL url = new URL(address);
URLConnection urlConn = url.openConnection();
urlConn.setRequestProperty("Accept-Encoding", "gzip");
InputStream openStream = urlConn.getInputStream();
Any ideas would be very much appreciated!
I believe you also need to specify the User-Agent header to force the compression. From the docs:
https://developers.google.com/appengine/docs/python/runtime#Responses
If the client sends HTTP headers with the request indicating that the
client can accept compressed (gzipped) content, App Engine compresses
the response data automatically and attaches the appropriate response
headers. It uses both the Accept-Encoding and User-Agent request
headers to determine if the client can reliably receive compressed
responses. Custom clients can force content to be compressed by
specifying both Accept-Encoding and User-Agent headers with a value of
"gzip".
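Applied to the test code in the question, the quoted advice might look like the sketch below. The class name ForceGzipRequest and its methods are made up for this example, and the User-Agent value "gzip" is taken literally from the quoted documentation. The response is only wrapped in a GZIPInputStream when the server actually reports Content-Encoding: gzip.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, not part of any library.
class ForceGzipRequest {

    /** Prepares a connection that advertises gzip support in both headers. Does not connect yet. */
    static URLConnection open(String address) throws IOException {
        URLConnection urlConn = URI.create(address).toURL().openConnection();
        urlConn.setRequestProperty("Accept-Encoding", "gzip");
        urlConn.setRequestProperty("User-Agent", "gzip");
        return urlConn;
    }

    /** Sends the request and returns the body, decompressed if the server gzipped it. */
    static InputStream body(URLConnection urlConn) throws IOException {
        InputStream raw = urlConn.getInputStream();
        return "gzip".equalsIgnoreCase(urlConn.getContentEncoding())
                ? new GZIPInputStream(raw)
                : raw;
    }

    public static void main(String[] args) throws IOException {
        // Only inspects the prepared request headers; nothing is sent over the network.
        URLConnection urlConn = open("http://localhost/");
        System.out.println(urlConn.getRequestProperty("Accept-Encoding"));
        System.out.println(urlConn.getRequestProperty("User-Agent"));
    }
}
```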

Check if a URL's mimetype is not a web page

I want to check if a URL's mimetype is not a webpage. Can I do this in Java? I want to check if the file is a rar or mp3 or mp4 or mpeg or whatever, just not a webpage.
You can issue an HTTP HEAD request and check for Content-Type response headers. You can use the HttpURLConnection.setRequestMethod("HEAD") before you issue the request. Then issue the request with URLConnection.connect() and then use URLConnection.getContentType() which reads the HTTP headers.
The bonus of using a HEAD request is that the actual resource is never transmitted/generated. You can also use a GET request and inspect the resulting stream using URLConnection.guessContentTypeFromStream() which will inspect the actual bytes and try to guess what the stream represents. I think that it looks for magic numbers or other patterns in the stream.
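To give a feel for URLConnection.guessContentTypeFromStream(), the sketch below runs it on in-memory bytes instead of a live response. The class name SniffContentType is made up for this example. The method needs a stream that supports mark/reset, so the sketch wraps other streams in a BufferedInputStream; it returns null when nothing is recognised.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class SniffContentType {

    /** Returns the type guessed from the first bytes of the stream, or null if unknown. */
    static String sniff(InputStream in) throws IOException {
        // guessContentTypeFromStream peeks at the start of the stream using mark/reset.
        InputStream markable = in.markSupported() ? in : new BufferedInputStream(in);
        return URLConnection.guessContentTypeFromStream(markable);
    }

    public static void main(String[] args) throws IOException {
        // "GIF89a" signature followed by zero padding, standing in for a real image.
        byte[] gifLike = new byte[32];
        byte[] signature = "GIF89a".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(signature, 0, gifLike, 0, signature.length);
        System.out.println(sniff(new ByteArrayInputStream(gifLike)));

        byte[] htmlLike = "<html><body>hi</body></html>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(new ByteArrayInputStream(htmlLike)));
    }
}
```

If the stream had to be wrapped, keep reading from the wrapper afterwards, since the buffered bytes are no longer available from the original stream. The set of recognised formats is limited, so a null result does not mean the content is not a file.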
There's nothing inherent in a URL which will tell you what you will receive when you request it. You have to actually request the resource, and then inspect the content-type header. At that point, it's still not clear what you should do - some content types will (almost) always be handled by the browser, e.g. text/html. Some types should be handled by a browser, e.g. application/xhtml+xml. Some types may be handled by the browser, e.g. application/pdf.
Which, if any, of these you consider to be "webpage" is still not clear - you'll need to decide for yourself.
You can inspect the Content-Type header once you've requested the resource, using, for example, the HttpURLConnection class.
A Content-Type of text/html represents a web page.
