HttpURLConnection: What's the deal with having to read the whole response?

My current problem is very similar to this one.
I have a downloadFile(URL) function that creates a new HttpURLConnection, opens it, reads it, and returns the results. When I call this function on the same URL multiple times, the second call almost always returns a response code of -1 (but throws no exception!).
The top answer in that question is very helpful, but there are a few things I'm trying to understand.
So, if setting http.keepAlive to false solves the problem, what exactly does that indicate? That the server is responding in a way that violates the HTTP protocol? Or, more likely, that my code is violating the protocol in some way? What will the trace tell me? What should I look for?
And what's the deal with this:
You need to read everything from error
stream. Otherwise, it's going to
confuse next connection and that's the
cause of -1.
Does this mean if the response is some type of error (which would be what response code(s)?), the stream HAS to be fully read? Also, every time I am attempting an http request I am basically creating a new connection, and then disconnect()ing it at the end.
However, in my case I'm not getting a 401 or whatever. It's always a 200. But my second connection almost always fails. Does this mean there's some other data I should be reading that I'm not (in a similar manner that the error stream must be fully read)?
Please help shed some light on this? I feel like there's some fundamental http protocol understanding I'm missing.
PS If I were just using the Apache HttpClient, would I not have to deal with all these protocol details? Does it take care of everything for me?

The support for keep-alive in the default HTTP URL handler is very buggy. We always turn it off.
Use Apache HttpClient with a pooled connection manager if you want keep-alive. If you don't want to change your code, you can use another handler such as this one:
http://www.innovation.ch/java/HTTPClient/
If your second connection always fails, that means your server doesn't support keep-alive. With keep-alive, the HTTP handler simply leaves the connection open (even if you call disconnect()). If keep-alive is not supported, the server closes the connection, but the handler doesn't find out until you make the next request on that connection, so the second request fails.
As for reading the error stream, that only applies if you get a non-200 response.
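As a rough illustration of the advice quoted in the question (read everything, including the error stream), here is a minimal sketch of a downloadFile-style helper. The class name DrainingDownloader and the helper drain are made up for this example; only the HttpURLConnection calls are standard API. It assumes the whole body fits in memory.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not taken from the original posts.
final class DrainingDownloader {

    /** Reads a stream to the end, closes it, and returns the bytes read. */
    static byte[] drain(InputStream in) throws IOException {
        if (in == null) {
            return new byte[0]; // getErrorStream() may return null
        }
        try (InputStream stream = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = stream.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** Downloads a URL, always consuming whichever body the server sent. */
    static byte[] downloadFile(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            int code = conn.getResponseCode();
            if (code >= 400) {
                // Error responses carry their body on the error stream.
                byte[] errorBody = drain(conn.getErrorStream());
                throw new IOException("HTTP " + code + " with an error body of "
                        + errorBody.length + " bytes");
            }
            return drain(conn.getInputStream());
        } finally {
            conn.disconnect();
        }
    }
}
```

If keep-alive still causes trouble, the workaround mentioned in the question is to set the http.keepAlive system property to false, for example with System.setProperty("http.keepAlive", "false") before the first request is made.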

I think you're probably talking about this HttpURLConnection bug, fixed in Froyo:
http://code.google.com/p/android/issues/detail?id=2939
See that bug for other workarounds. If this isn't the bug you've hit, please raise a bug with a repeatable test case at http://code.google.com/p/android/issues/entry.

Related

limit size of requests with com.sun.net.httpserver.HttpExchange

Abusive users may attempt to send really large requests to my HTTP server. A "maxRequestSize" setting would have been useful, but I have yet to find a way of dealing with this. I also thought there might be a timeout option somewhere, but I couldn't find anything like that either.
httpExchange.getRequestBody() returns an InputStream, but based on my research there's no way to determine the length of an InputStream without first reading it.
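One possible way to approach this, sketched under assumptions: reject early when the client declares a Content-Length above the limit, and otherwise count bytes while reading so that an oversized body is abandoned as soon as it crosses the limit. The names BoundedBodyHandler, MAX_REQUEST_BYTES, BodyTooLargeException and readAtMost are hypothetical, and the 1 MiB limit is an arbitrary placeholder. The sketch does not address timeouts.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical handler; the limit and the status codes are choices made for this sketch.
final class BoundedBodyHandler implements HttpHandler {
    static final int MAX_REQUEST_BYTES = 1024 * 1024; // assumed limit

    static final class BodyTooLargeException extends IOException {
        BodyTooLargeException(int limit) {
            super("Request body exceeds " + limit + " bytes");
        }
    }

    /** Reads the stream but gives up as soon as more than maxBytes have arrived. */
    static byte[] readAtMost(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
            if (total > maxBytes) {
                throw new BodyTooLargeException(maxBytes);
            }
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    @Override
    public void handle(HttpExchange exchange) throws IOException {
        try {
            // Cheap early check: the declared length, if the client sent one.
            String declared = exchange.getRequestHeaders().getFirst("Content-Length");
            if (declared != null && Long.parseLong(declared.trim()) > MAX_REQUEST_BYTES) {
                throw new BodyTooLargeException(MAX_REQUEST_BYTES);
            }
            byte[] body = readAtMost(exchange.getRequestBody(), MAX_REQUEST_BYTES);
            // ... application-specific processing of body would go here ...
            exchange.sendResponseHeaders(204, -1);
        } catch (BodyTooLargeException e) {
            exchange.sendResponseHeaders(413, -1); // request entity too large
        } catch (NumberFormatException e) {
            exchange.sendResponseHeaders(400, -1); // malformed Content-Length
        } finally {
            exchange.close();
        }
    }
}
```

The counting read is what actually enforces the limit; the header check only avoids reading anything when the client is honest about the size.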

Append Custom ErrorCode in Mulesoft

For most errors in Mulesoft there is no error code defined. If it doesn't know the error, Mule flatly prints MULE_ERROR--2. Instead, I want to use my own error code, fetched from a DB, and include it in the exception payload. After this, the exception payload should be sent to a handler flow for re-submission based on the error code. Hence I need more than one component in the error-handling part of the flow.
I tried a Custom Exception Strategy, a Catch Exception Strategy, a Java component, and flow-refs, but none of them worked.
Also, I built a dummy example for this (without fetching the error code) to set my own custom error message, and I noticed that it throws the same error twice: once by default and again when I set my error message and throw the error. To suppress this I put
<AsyncLogger name="org.mule.exception.CatchMessagingExceptionStrategy" level="FATAL"/>
in log4j2.xml.
Will this cause any issues?

You can always define your own errors and customise them. See the link below for further information:
http://blogs.mulesoft.com/dev/api-dev/api-best-practices-response-handling/
The content of that article is reproduced below:
Use HTTP Status Codes
One of the most commonly misused HTTP Status Codes is 200 – ok or the request was successful. Surprisingly, you’ll find that a lot of APIs use 200 when creating an object (status code 201), or even when the response fails:
[Image: invalid200, an example response that returns status 200 even though the request failed]
In the above case, if the developer is solely relying on the status code to see if the request was successful, the program will continue on not realizing that the request failed, and that it did something wrong. This is especially important if there are dependencies within the program on that record existing. Instead, the correct status code to use would have been 400 to indicate a “Bad Request.”
By using the correct status codes, developers can quickly see what is happening with the application and do a “quick check” for errors without having to rely on the body’s response.
You can find a full list of status codes in the HTTP/1.1 RFC, but just for a quick reference, here are some of the most commonly used Status Codes for RESTful APIs:
200 OK
201 Created
304 Not Modified
400 Bad Request
401 Unauthorized
403 Forbidden
404 Page/ Resource Not Found
405 Method Not Allowed
415 Unsupported Media Type
500 Internal Server Error
Of course, if you feel like being really creative, you can always take advantage of status code:
418 I’m a Teapot
It’s important to note that Twitter’s famed 420 status code – Enhance Your Calm, is not really a standardized response, and you should probably just stick to status code 429 for too many requests instead.
Use Descriptive Error Messages
Again, status codes help developers quickly identify the result of their call, allowing for quick success and failure checks. But in the event of a failure, it’s also important to make sure the developer understands WHY the call failed. This is especially crucial to the initial integration of your API (remember, the easier your API is to integrate, the more likely people are to use it), as well as general maintenance when bugs or other issues come up.
You’ll want your error body to be well formed, and descriptive. This means telling the developer what happened, why it happened, and most importantly – how to fix it. You should avoid using generic or non-descriptive error messages such as:
Your request could not be completed
An error occurred
Invalid request
Generic error messages are one of the biggest hindrances to API integration, as developers may struggle for hours trying to figure out why the call is failing, even misinterpreting the intent of the error message altogether. And eventually, if they can’t figure it out, they may stop trying altogether.
For example, I struggled for about 30 minutes with one API trying to figure out why I was getting a “This call is not allowed” error response. After repeatedly reformatting my request and trying different approaches, I finally called support (in an extremely frustrated mood) only to find out it was referring to my access token, which just so happened to be one letter off due to my inability to copy and paste such things.
Just the same, an “Invalid Access Token” response would have saved me a ton of hassle, and from feeling like a complete idiot while on the line with support. It would have also saved them valuable time working on real bugs, instead of trying to troubleshoot the most basic of steps (btw – whenever I get an error the key and token are the first things I check now).
Here are some more examples of descriptive error messages:
Your API Key is Invalid, Generate a Valid API Key at http://…
A User ID is required for this action. Read more at http://…
Your JSON was not properly formed. See example JSON here: http://…
But you can go even further. Remember: you’ll want to tell the developer what happened, why it happened, and how to fix it. One of the best ways to do that is by responding with a standardized error format that returns a code (for support reference), the description of what happened, and a link to the appropriate documentation so that they can learn more or fix it:
{
  "error" : {
    "code" : "e3526",
    "message" : "Missing UserID",
    "description" : "A UserID is required to edit a user.",
    "link" : "http://docs.mysite.com/errors/e3526/"
  }
}
On a support and development side, by doing this you can also track the hits to these pages to see what areas tend to be more troublesome for your users – allowing you to provide even better documentation/ build a better API.
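For illustration, the error format above could be produced on the server side roughly like this. ApiError and its methods are hypothetical names invented for this sketch, and the hand-rolled JSON escaping exists only to keep the example free of dependencies; a real service would more likely use a JSON library.

```java
// Hypothetical value class for the standardized error body described above.
final class ApiError {
    private final String code;        // support reference, e.g. "e3526"
    private final String message;     // short summary of what happened
    private final String description; // why it happened and what is required
    private final String link;        // documentation explaining the fix

    ApiError(String code, String message, String description, String link) {
        this.code = code;
        this.message = message;
        this.description = description;
        this.link = link;
    }

    /** Wraps a string in double quotes, escaping quotes, backslashes and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Serializes the error in the nested "error" object format shown above. */
    String toJson() {
        return "{\"error\":{"
                + "\"code\":" + quote(code) + ","
                + "\"message\":" + quote(message) + ","
                + "\"description\":" + quote(description) + ","
                + "\"link\":" + quote(link)
                + "}}";
    }
}
```

Such a body would normally be sent together with a matching status code (400 for the missing UserID case) rather than with 200.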

what does cache means in POST and GET

I have seen that one of the main differences between POST and GET is that POST is not cached but GET is.
Could you explain what "cache" means here?
Also, whether I use POST or GET, the server sends me a response. Is there any difference? In both cases I have request data and a response, don't I?
Thanks

To Cache (in the context of HTTP) means to store a page/response either on the client or some intermediate host - perhaps in a content distribution network. When the client requests a page, then the page can be served from the client's cache (if the client requested it before) or the intermediate host. This is faster and requires fewer resources than getting the page from the server that generated it.
One downside is that if the request changes some state on the server, that change won't happen if the page is served from a cache. This is why POST requests are usually not served from a cache.
Another downside to caching is that the cached copy may be out of date. The HTTP caching mechanisms try to prevent this.

The basic idea behind the GET and POST methods is that a GET message only retrieves information but never changes the state of the server. (Hence the name). As a result, just about any caching system will assume that you can remember the last GET response returned, and that the next one will look the same.
A POST, on the other hand, is a request that sends new information to the server. So not only can these not be cached (because there's no guarantee that the next POST won't modify things even more; think of +1 or like buttons, for example), but they actually have to invalidate parts of the cache because they might modify pages.
As a result, your browser for example will warn you when you try to refresh a page to which you POSTed information, because you might make changes you did not want made by doing so. When GETting a page, it will not do so because you cannot change anything on the site by doing so.
(Or rather; it's your job as a programmer to make sure that nothing changes when GETting a page.)
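The behaviour described above can be pictured with a toy model. This is only a sketch: ToyHttpCache is a made-up class, and real HTTP caches also take headers such as Cache-Control and expiry times into account, which are ignored here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Toy model: GET responses are remembered per URL, anything else goes to the origin.
final class ToyHttpCache {
    private final Map<String, String> responses = new HashMap<>();
    int originCalls = 0; // how often the "server" was actually contacted

    String request(String method, String url, Supplier<String> origin) {
        if ("GET".equals(method)) {
            String cached = responses.get(url);
            if (cached != null) {
                return cached; // served from the cache, the origin is not contacted
            }
            originCalls++;
            String fresh = origin.get();
            responses.put(url, fresh);
            return fresh;
        }
        // POST (and every other method in this toy model) may change state on
        // the server, so it always reaches the origin and the cached copy of
        // that URL is thrown away.
        originCalls++;
        responses.remove(url);
        return origin.get();
    }
}
```

Two GETs for the same URL contact the origin once; a POST in between forces the next GET to fetch a fresh copy.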

GET is supposed to return the same result from the server without changing anything on the server side, and is hence idempotent.
POST, on the other hand, can modify something on the server (make an entry in a DB, delete something, etc.) and is hence not idempotent.
As for caching, it has been addressed nicely here:
http://www.ebaytechblog.com/2012/08/20/caching-http-post-requests-and-responses/#.VGy9ovmUeeQ

Most efficient java way to test 300,000+ URLs [duplicate]

This question already has answers here:
Preferred Java way to ping an HTTP URL for availability
(6 answers)
Closed 9 years ago.
I'm trying to find the most efficient way to test 300,000+ URLs in a database to basically check if the URLs are still valid.
Having looked around the site I've found many excellent answers and am now using something along the lines of:
Read URL from file....
Test URL:
final URL url = new URL("http://" + address);
final HttpURLConnection urlConn = (HttpURLConnection) url.openConnection();
urlConn.setConnectTimeout(1000 * 10);
urlConn.connect();
urlConn.getResponseCode(); // Do something with the code
urlConn.disconnect();
Write details back to file....
So a couple of questions:
1) Is there a more efficient way to test URLs and get response codes?
2) Initially I am able to test about 50 URLs per minute, but after 5 or so minutes things really slow down. I imagine there are some resources I'm not releasing, but I'm not sure which.
3) Certain URLs (e.g. www.bhs.org.au) cause the above to hang for minutes (not good when I have so many URLs to test) even with the connect timeout set. Is there any way I can tighten this up?
Thanks in advance for any help. It's been quite a few years since I've written any code and I'm starting again from scratch :-)

By far the fastest way to do this would be to use java.nio to open a regular TCP connection to your target host on port 80. Then, simply send it a minimal HTTP request and process the result yourself.
The main advantage of this is that you can have a pool of 10 or 100 or even 1000 connections open and loading at the same time rather than having to do them one after the other. With this, for example, it won't matter much if one server (www.bhs.org.au) takes several minutes to respond. It'll simply hog one of your many connections in the pool, but others will keep running.
You could also achieve that same thing with a little more overhead but a lot less complex coding by using a Thread Pool to run many HttpURLConnections (the way you are doing it now) in parallel in multiple threads.
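A minimal sketch of the thread-pool variant follows. ParallelUrlChecker, status and checkAll are hypothetical names, and the 10-second timeouts and the -1 marker for failures are assumptions made for this example. The read timeout is an addition to the question's code: the connect timeout only limits connection setup, so a server that accepts the connection but never answers can still block without it.

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper for checking many URLs in parallel.
final class ParallelUrlChecker {

    /** Returns the HTTP status for an address, or -1 if the request failed. */
    static int status(String address) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL("http://" + address).openConnection();
            conn.setConnectTimeout(10_000); // limit for establishing the connection
            conn.setReadTimeout(10_000);    // limit for waiting on the response
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (Exception e) {
            return -1;
        }
    }

    /** Runs the checker for every address on a fixed-size thread pool. */
    static Map<String, Integer> checkAll(List<String> addresses,
                                         Function<String, Integer> checker,
                                         int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String address : addresses) {
                Callable<Integer> task = () -> checker.apply(address);
                futures.add(pool.submit(task));
            }
            Map<String, Integer> results = new LinkedHashMap<>();
            for (int i = 0; i < addresses.size(); i++) {
                try {
                    results.put(addresses.get(i), futures.get(i).get());
                } catch (ExecutionException e) {
                    results.put(addresses.get(i), -1); // checker threw an exception
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as ParallelUrlChecker.checkAll(addresses, ParallelUrlChecker::status, 50) would then keep up to 50 requests in flight, so a single slow host only occupies one worker thread.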

This may or may not help, but you might want to change your request method to HEAD instead of using the default, which is GET:
urlConn.setRequestMethod("HEAD");
This tells the server that you do not need a response body, only the status line and headers (which include the response code).
The article What Is a HTTP HEAD Request Good for describes some uses for HEAD, including link verification:
[Head] asks for the response identical to the one that would correspond to a GET request, but without the response body. This is useful for retrieving meta-information written in response headers, without having to transport the entire content.... This can be used for example for creating a faster link verification service.

Apache get file date without downloading

I'm currently writing some update polling code. I'm trying to avoid writing even a simple REST interface for this (we use REST a lot, but I'm not sure it is necessary here. Why write an interface for functionality that is already there?)
My idea was to open an HttpURLConnection and check the headers for the file's last-modified date. Apache evidently sends the "Last-Modified" date in UTC. After checking the header I'd close the connection without actually retrieving the body. My only fear is that this might cause errors in the Apache log, which would be quite inconvenient. I just wanted to ask for your opinion. Do you think this might work? Any better ideas?
(I need system proxy support, so my only option seems to be HttpURLConnection.)
Regards,
Stev

If you look at the HTTP protocol, you'll see that it has a HEAD request which does just what you need. The default for HTTP requests in the Java runtime is GET; with HttpURLConnection you can change it by calling setRequestMethod("HEAD") before connecting.
Have a look at HttpClient for a framework which allows you to send any kind of request.

You are almost right, but your task is even simpler than what you are describing. There is a special HTTP method named HEAD. You just have to create the same request you would use to retrieve your data, but use HEAD instead of GET.
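A minimal sketch of that idea with HttpURLConnection: LastModifiedProbe, remoteLastModified and needsUpdate are hypothetical names, the 10-second timeouts are arbitrary, and treating only status 200 as success is an assumption of this example.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper for update polling via HEAD requests.
final class LastModifiedProbe {

    /**
     * Sends a HEAD request and returns the Last-Modified header as
     * milliseconds since the epoch, or 0 if the server did not send one.
     */
    static long remoteLastModified(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod("HEAD"); // headers only, no response body
            conn.setConnectTimeout(10_000);
            conn.setReadTimeout(10_000);
            int code = conn.getResponseCode();
            if (code != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + code);
            }
            return conn.getLastModified();
        } finally {
            conn.disconnect();
        }
    }

    /** True if the remote file is known to be newer than the local copy. */
    static boolean needsUpdate(long remoteLastModified, long localTimestamp) {
        return remoteLastModified != 0 && remoteLastModified > localTimestamp;
    }
}
```

Because the server never sends a body for HEAD, the client is not cutting a transfer short when it disconnects.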

This sounds pretty much like what the HEAD method in HTTP is for.
Citing from Wikipedia:
HEAD
Asks for the response identical to the one that would correspond to a GET request, but without the response body. This is useful for retrieving meta-information written in response headers, without having to transport the entire content.

Try just sending an http HEAD request. See here: http://blog.mostof.it/what-is-a-http-head-request-good-for-some-uses/
http://www.grumet.net/weblog/archives/http-head-example.html
