Sync HTTP request (HttpURLConnection) with high concurrency?


I am using HttpURLConnection in my Java (Spring) app to send HTTP requests to external third-party servers. I need about 1000 http requests per second.
However, as far as I know HttpURLConnection is synchronous: one thread can do only one HTTP request at a time, and only after that request finishes can the thread start the next one. This seems inefficient, and I suspect it cannot even handle the load (please correct me if I am wrong, e.g. if this is actually very efficient).
I wonder whether there is a good way to handle this? My plan is to use a thread pool (Executor) containing, say, 100 threads.
P.S. I cannot use any other libraries such as HttpClient since that SDK package is provided by third party :/
Thanks very much!

1. You are right about one request per thread. It is mentioned in the HttpURLConnection documentation:
* Each HttpURLConnection instance is used to make a single request
* but the underlying network connection to the HTTP server may be
* transparently shared by other instances. Calling the close() methods
* on the InputStream or OutputStream of an HttpURLConnection
* after a request may free network resources associated with this
* instance but has no effect on any shared persistent connection.
* Calling the disconnect() method may close the underlying socket
* if a persistent connection is otherwise idle at that time.
*
That means that you can use openConnection to get a new HttpURLConnection instance, do the request, and close it. The underlying network connection to the HTTP server may be transparently shared by other instances.
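A minimal sketch of the thread-pool approach from the question, assuming one fresh HttpURLConnection per request as described above. The class name PooledFetcher, the method names, and the timeout values are illustrative and not from the original posts. Note the arithmetic: a pool of 100 threads sustains 1000 requests per second only if each request completes in about 100 ms on average, so the pool size has to be chosen from the measured latency of the third-party server.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PooledFetcher {

    // Performs one blocking GET on the calling thread and returns the status code.
    static int fetchStatus(String url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create(url).toURL().openConnection();
        con.setRequestMethod("GET");
        con.setConnectTimeout(2000); // assumed values; tune for the real server
        con.setReadTimeout(5000);
        int status = con.getResponseCode();
        // Drain and close the body so the underlying socket can be reused (keep-alive).
        InputStream body = status >= 400 ? con.getErrorStream() : con.getInputStream();
        if (body != null) {
            try (InputStream in = body) {
                byte[] buf = new byte[8192];
                while (in.read(buf) != -1) {
                    // discard the payload; a real caller would process it here
                }
            }
        }
        return status;
    }

    // Runs all requests on a fixed-size pool and returns the status codes in input order.
    static List<Integer> fetchAll(List<String> urls, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String u : urls) {
                tasks.add(() -> fetchStatus(u));
            }
            List<Integer> result = new ArrayList<>();
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                result.add(f.get()); // rethrows a failed request as ExecutionException
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a long-running Spring app the pool would normally be created once and shared rather than built per batch as in this sketch.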

Socket and SocketChannels might be a good option, although you'll have to 'roll your own' HTTP, which will be Very Not Easy if you have to deal with HTTPS. They are part of the standard JRE and can also be used asynchronously. You get into some hairy code when you go async because the Selector API is a bit difficult to work with but it would definitely be fast and have low overhead.
You might be able to use a custom SSLSocketFactory to jockey the socket so that you have direct access to it and can get a SocketChannel from it.

Related

HTTP requests without waiting for response

Is it possible to send an HTTP request without waiting for a response?
I'm working on an IoT project that requires logging of data from sensors. In every setup, there are many sensors, and one central coordinator (Will mostly be implemented with Raspberry Pi) which gathers data from the sensors and sends the data to the server via the internet.
This logging happens every second. Therefore, the sending of data should happen quickly so that the queue does not become too large. If the request doesn't wait for a response (like UDP), it would be much faster.
It is okay if few packets are dropped every now and then.
Also, please do tell me the best way to implement this. Preferably in Java.
The server side is implemented using PHP.
Thanks in advance!
EDIT:
The sensors are wireless, but the tech they use has very little (or no) latency in sending to the coordinator. This coordinator has to send the data over the internet. But, just assume the internet connection is bad. As this is going to be implemented in a remote part of India.

You are looking for an asynchronous HTTP library such as OkHttp. It lets you specify a Callback that is executed asynchronously (on a second thread).
Your main thread therefore continues execution without waiting for the response.

You can set the TCP timeout for a GET request to less than a second, and keep retriggering the access in a thread. Use more threads for more devices.
Something like:
HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
con.setRequestMethod("GET");
con.setConnectTimeout(1000); // give up connecting after 1 second
con.setReadTimeout(1000); // give up waiting for the response after 1 second
if (con.getResponseCode() == HttpURLConnection.HTTP_OK) {
...
}
Sleep the thread for the remainder of the second if the request takes less than a second. You can consume the results on another thread if you add them to a thread-safe queue. Make sure to handle exceptions.
You can't use UDP here: HttpURLConnection speaks HTTP over TCP only.
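The loop described above can be sketched as follows. This is only one way to do it: instead of sleeping for the remainder of each second, it uses a ScheduledExecutorService to fire at a fixed rate, and it hands each status code to a thread-safe queue. The class name PeriodicSender, the FAILED marker, and the parameters are assumptions made for the illustration.

```java
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicSender {

    // Hypothetical marker recorded when a request fails or times out.
    static final int FAILED = -1;

    // Sends one GET every periodMillis and offers each status code to the queue.
    static ScheduledExecutorService start(String url, long periodMillis,
                                          int timeoutMillis, BlockingQueue<Integer> results) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            int status = FAILED;
            HttpURLConnection con = null;
            try {
                con = (HttpURLConnection) URI.create(url).toURL().openConnection();
                con.setRequestMethod("GET");
                con.setConnectTimeout(timeoutMillis);
                con.setReadTimeout(timeoutMillis);
                status = con.getResponseCode();
            } catch (Exception e) {
                // Swallow the failure: an exception escaping this task would cancel all future runs.
            } finally {
                if (con != null) {
                    con.disconnect();
                }
            }
            results.offer(status); // a consumer thread can take() from the same queue
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        return timer;
    }
}
```

With a single scheduler thread, a slow request delays the next run rather than overlapping with it, which matches the "one thread per device" idea in the answer.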

How to stop a URL connection upon thread interruption Java

I have a multithreaded program that visits URLs. The threads are run through an executor service, and when the user chooses to quit through the GUI, the program attempts to interrupt the threads by calling executor.shutdownNow(). However, it takes a very long time for the program to shut down because many of the threads are blocked in a url.openStream() call, and since this does not throw InterruptedException, so far I've been forced to just check before and after the call for Thread.currentThread().isInterrupted().
I'm wondering if there is a better way to interrupt a URL connection upon thread interruption? Otherwise, what would be the best approach to let the program shutdown as quickly as possible?
Note that I would prefer not to set a timeout on the connections because I would like all URLs to be visited while the program is still running.

If you look at the Javadocs for URLConnection, they give you a hint: if you call getInputStream or getOutputStream on the URLConnection and then close either of these streams, it will close the connection. If you are stuck waiting in the getInputStream/getOutputStream call itself, then I don't think anything can be done. But if you have the stream, close it (it'll throw an IOException and release any threads waiting on the stream) and the connection is finished.
FYI: I have found that this method of closing an InputStream either when you want it to time out or when you just want to stop processing is highly effective, much more so than using interrupt()

Use Apache HttpClient
The best thing is to use the Apache HttpClient instead of URLConnection. That's a way more advanced client than URLConnection, it's easier to use, and it's interruptible. The downside is that it's not included in all Java environments by default, so it can cause an additional dependency that you need to ship with your software. If that's of no concern, then go for it.
If you have to stick with URLConnection
Nowadays usually HTTP or HTTPS is used for URLs on the network, so your URLConnection might actually be of type HttpURLConnection or HttpsURLConnection respectively. If that's the case, you are lucky. The latter ones have a key advantage over URLConnection (see below). First, you need to cast to HttpURLConnection. You can do this safely when using HTTP or HTTPS in your URL.
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
HttpURLConnection has an additional method, disconnect(), which you can now use. Calling this method causes any thread blocked in getInputStream(), or in a read() on such a stream, to throw a SocketException: Socket closed:
conn.disconnect();
Caveat:
The javadoc gives a contrary idea for disconnect():
Calling the disconnect() method may close the underlying socket if a
persistent connection is otherwise idle at that time.
However, I've tested it in JDK 1.8 under Ubuntu and on Android, and it works nicely and in exactly the same way in both environments. So in my tests this sentence in the documentation did not hold, and maybe it should not, because this is the only way to interrupt a URLConnection. It even works when both the client and the server indicate that they can keep the connection alive, and even when the connection is not "otherwise idle".
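A small sketch of how the disconnect() trick could be wired into a shutdown path. It relies on the behavior reported in this answer (a blocked call fails with an IOException once another thread calls disconnect()), which the Javadoc does not guarantee, so treat it as implementation-dependent. The class CancellableRequest and its methods are hypothetical names for this illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.concurrent.atomic.AtomicReference;

public class CancellableRequest {

    // The connection currently in flight, published so another thread can abort it.
    private final AtomicReference<HttpURLConnection> current = new AtomicReference<>();

    // Blocking GET; returns the status code, or throws IOException if the request fails or is cancelled.
    public int get(String url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create(url).toURL().openConnection();
        current.set(con);
        try {
            return con.getResponseCode(); // may block until the server answers
        } finally {
            current.set(null);
        }
    }

    // Call from another thread, for example right after executor.shutdownNow().
    public void cancel() {
        HttpURLConnection con = current.get();
        if (con != null) {
            con.disconnect(); // assumed to close the socket and wake the blocked thread
        }
    }
}
```

Each worker would own one CancellableRequest, and the shutdown code would call cancel() on all of them in addition to interrupting the threads.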

I would use JAX-RS http client. It supports asynchronous requests. Forgive me using Kotlin, but you do it in Java the same way.
//compile 'org.glassfish.jersey.core:jersey-client:2.0.1'
import javax.ws.rs.client.ClientBuilder
val th = Thread {
val urlString = "http://long.running.page"
val cli = ClientBuilder.newClient()
val fut = cli.target(urlString).request().async().get()
println(fut.get())
}
th.start()
println("wait")
Thread.sleep(3000)
println("interrupt")
th.interrupt()
th.join()
After commenting on @DanielS's answer, I also found examples of asynchronous clients in the Apache library, using HttpAsyncClients. So it seems the Apache library supports interruption, but not with the HttpClient class.

The easy answer is to use System.exit(0), which will force all threads to die.

How do Jetty and other containers leverage NIO while sticking to the Servlet specification?

I'm new to NIO, and I am trying to figure out how Jetty leverages NIO.
My understanding of how traditional servlet containers that use Blocking IO service a request is as follows:
A request arrives
A thread is allocated to process the request and the servlet method (doGet etc) is invoked
Servlet method is handed an InputStream and OutputStream
The servlet method reads from the InputStream and writes to the OutputStream
The InputStream and OutputStream are basically tied to the respective streams of the underlying Socket
What is different when an NIO connector is used? My guess is along the following lines:
A request arrives
Jetty uses NIO connector and buffers the entire request asynchronously
Once request has been read completely wrap the buffer in an InputStream
Create an empty response buffer (wrapped in an OutputStream)
Allocate a thread and invoke the servlet method (doGet etc) handing the above wrapper streams
Servlet method writes to the wrapped (buffered) response stream and returns from the servlet method
Jetty uses NIO to write the response buffer contents to the underlying SocketChannel
From the Jetty documentation, I found the following:
SelectChannelConnector - This connector uses efficient NIO buffers with a non-blocking threading model. Jetty uses Direct NIO buffers, and allocates threads only to connections with requests. Synchronization simulates blocking for the servlet API, and any unflushed content at the end of request handling is written asynchronously.
I'm not sure I understand what Synchronization simulates blocking for the servlet API means?

You don't have it exactly correct. When jetty uses an NIO connector (and 9 only supports NIO) it works as follows:
In the idle state, a few threads (1-4 depending on # cores) call the selector, looking for IO activity. This has been scaled to over 1,000,000 connections on Jetty.
When selector sees IO activity, it calls a handle method on the connection, which either:
something else has registered that it is blocked waiting for IO for this connection, so in that case the selector just wakes up whoever was blocked.
otherwise a thread is dispatched to handle the connection.
If a thread is dispatched, then it will attempt to read the connection and parse it. What happens now depends on if the connection is http, spdy, http2 or websocket.
for http, if the request headers are complete, the thread goes on to call the handling of the request (eventually this gets to the servlet) without waiting for any content.
for http2/spdy another dispatch is required, but see the discussion about Eat-What-You-Kill strategy on the list: http://dev.eclipse.org/mhonarc/lists/jetty-dev/msg02166.html
for websocket the message handling is called.
Once a thread is dispatched to a servlet, it looks to it like the servlet IO is blocking, but underneath the level of HttpInputStream and HttpOutputStream all the IO is async with callbacks. The blocking API uses a special blocking callback to achieve blocking. This means that if the servlet chooses to use async IO, then it is just bypassing the blocking callback and using the async API more or less directly.
A servlet can suspend using request.startAsync, in which case the dispatched thread is returned to the thread pool, but the associated connection is not marked as interested in IO. Async IO can be performed, but an AsyncContext event is needed to either reallocate a thread or re-enroll the connection for IO activity once the async cycle is complete.
This view is slightly complicated by http2 and spdy, which are multiplexed, so they can involve an extra dispatch.
Any HTTP framework that does not dispatch can go really really fast in benchmark code, but when faced with a real application that can do silly things like block on databases, files system, REST services etc... then lack of dispatch just means that one connection can hold up all the other connections on the system.
For some more info on how jetty handles async and dispatch see:
https://webtide.com/asynchronous-callbacks/
https://webtide.com/jetty-9-goes-fast-with-mechanical-sympathy/
https://webtide.com/avoiding-parallel-slowdown-in-jetty-9/
https://webtide.com/lies-damned-lies-and-benchmarks-2/
https://webtide.com/jetty-in-techempower-benchmarks/
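The "blocking callback" idea mentioned in this answer can be illustrated in plain Java. This is not Jetty's code: the Callback interface, asyncWrite, and blockingWrite below are hypothetical stand-ins that only show the general pattern of a blocking API layered on an asynchronous one, where the caller waits until the callback completes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

public class BlockingOverAsync {

    // Hypothetical completion callback for an asynchronous IO operation.
    interface Callback {
        void succeeded(int bytes);
        void failed(Throwable cause);
    }

    // Hypothetical async primitive: returns immediately, completes the callback on an IO thread.
    static void asyncWrite(byte[] data, ExecutorService io, Callback cb) {
        io.execute(() -> {
            try {
                cb.succeeded(data.length); // pretend all bytes were written
            } catch (Throwable t) {
                cb.failed(t);
            }
        });
    }

    // Blocking facade: same operation, but the calling thread waits for the callback.
    static int blockingWrite(byte[] data, ExecutorService io) throws Exception {
        CompletableFuture<Integer> done = new CompletableFuture<>();
        asyncWrite(data, io, new Callback() {
            @Override
            public void succeeded(int bytes) {
                done.complete(bytes);
            }

            @Override
            public void failed(Throwable cause) {
                done.completeExceptionally(cause);
            }
        });
        return done.get(); // the "blocking callback": park here until the async side finishes
    }
}
```

Code that wants async IO would call asyncWrite with its own callback and skip the waiting step, which mirrors the answer's point that async servlets bypass the blocking callback.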

Why are there streams in the HttpURLConnection API?

From what I understand about HTTP, it works like this: The client assembles a message, consisting of some header fields and (possibly) a body and sends it to the server. The server processes it, assembles its own response message and sends it back to the client.
And so I come to the question:
Why are there all of a sudden streams in HttpURLConnection?
This makes no sense to me. It makes it look like there is a continuous open channel. At what point does the message actually get sent to the server? On connect? On getInputStream? When trying to read from the stream? What if I have payload, does it get sent at a different time then? Can I do write-read-write-read with just a single connection?
I'm sure I just haven't understood it right yet, but right now it just seems like a bad API to me.
I would have more expected to see something like this:
HttpURLConnection http = url.openConnection();
HttpMessage req = new HttpMessage();
req.addHeader(...);
req.setBody(...);
http.post(req);
// Block until response is available (Future pattern)
HttpMessage res = http.getResponse();

IMHO HttpURLConnection has indeed a bad API. But handling the input and output message as streams is a way to deal efficiently with large amounts of data. I think all other answers (at this moment 5!) are correct. There are some questions open:
At what point does the message actually get sent to the server? On connect? On getInputStream? When trying to read from the stream?
There are some triggers that cause all collected data (e.g. headers, timeout options, ...) to actually be transmitted to the server. In most cases you don't have to call connect; this is done implicitly, e.g. when calling getResponseCode() or getInputStream(). (BTW I recommend calling getResponseCode() before getInputStream(), because if you get an error code (e.g. 404), getInputStream will throw an exception and you should call getErrorStream() instead.)
What if I have payload, does it get sent at a different time then?
You have to call getOutputStream() and then send the payload. This should be done (obviously) after you added the headers. After closing the stream you can expect a response from the server.
Can I do write-read-write-read with just a single connection?
No. Technically this would be possible when using keep-alive. But HttpURLConnection handles this under the cover and you can only do one request-response roundtrip with an instance of this class.
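The sequence described in these answers (headers first, then the payload via getOutputStream(), then the status code, then the matching response stream) could look like the sketch below. The class name SimplePost and the choice to turn error statuses into an IOException are assumptions for the example, not part of the HttpURLConnection API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class SimplePost {

    // Sends one POST and returns the response body as text; one instance, one round trip.
    static String post(String url, String contentType, String payload) throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create(url).toURL().openConnection();
        // 1. Configure the request. Nothing has been sent yet.
        con.setRequestMethod("POST");
        con.setRequestProperty("Content-Type", contentType);
        con.setDoOutput(true);
        // 2. Write the payload. getOutputStream() connects implicitly; in the default
        //    (non-streaming) mode the bytes are buffered until the response is requested.
        try (OutputStream out = con.getOutputStream()) {
            out.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        // 3. Ask for the status first, then pick the stream that matches it.
        int status = con.getResponseCode();
        InputStream raw = status >= 400 ? con.getErrorStream() : con.getInputStream();
        String text = raw == null ? "" : readAll(raw);
        if (status >= 400) {
            throw new IOException("HTTP " + status + ": " + text);
        }
        return text;
    }

    // Reads the whole stream as UTF-8 text and closes it.
    private static String readAll(InputStream raw) throws IOException {
        try (InputStream in = raw) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```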
Making life easier
If you don't want to fight with the horrible API of HttpURLConnection, you could have a look at some abstraction APIs listed on DavidWebb. When using DavidWebb, a typical request looks like this:
Webb webb = Webb.create();
String result = webb.post("http://my-server/path/resource")
.header("auth-token", myAuthToken)
.body(myBody)
.ensureSuccess()
.asString()
.getBody();

While the underlying transport does take place using individual packets, there's no guarantee that what you think of as a single HTTP request/response will "fit" in a single HTTP "packet". In turn, there's also no guarantee that a single HTTP "packet" will fit in a single TCP packet, and so on.
Imagine downloading a 20MB image using HTTP. It's a single HTTP "response", but I guarantee there will be multiple packets going back and forth between the browser and the website serving it up.
Every block is made up of possibly multiple smaller blocks, at each level. Since you might start processing the response before all the different bits of it have arrived, and you really don't want to concern yourself with how many of them there are, a stream is the common abstraction over this.

HTTP runs over a connection-oriented TCP connection. So internally, the API creates a TCP connection, sends the HTTP request on it, receives the response back, and then drops the TCP connection. That is why there are two different streams.

Because streams are the generic way to push data between two places in Java, and that's what the HTTP connection does. HTTP works over TCP, which is a streamed connection so this API mimics that.
As for why it isn't abstracted further - consider that there is no size limits in HTTP requests. For example a file upload can be many MB or even GB in size.
Using a streamed API you can read data from a file or other source and stream it out over the connection at the same time without needing to load all that data into memory at once.
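As a rough illustration of that last point, the sketch below copies an arbitrary InputStream to the request body in fixed-size pieces, using chunked streaming mode so that HttpURLConnection does not buffer the whole body in memory. The class name StreamingUpload, the buffer size, and the content type are assumptions chosen for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;

public class StreamingUpload {

    // Streams everything from source to the server and returns the HTTP status code.
    static int upload(String url, InputStream source) throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create(url).toURL().openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "application/octet-stream");
        // A value <= 0 selects the default chunk size; the body is sent as it is written.
        con.setChunkedStreamingMode(0);
        byte[] buf = new byte[8192]; // only this much of the payload is held in memory at once
        try (OutputStream out = con.getOutputStream()) {
            int n;
            while ((n = source.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        try {
            return con.getResponseCode();
        } finally {
            con.disconnect();
        }
    }
}
```

Passing a FileInputStream as the source would upload a file of any size with the same constant memory footprint; if the length is known up front, setFixedLengthStreamingMode is the alternative to chunking.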

TCP is a byte stream. The body of an HTTP request or response is an arbitrary byte stream. Not sure what kind of API you were expecting, but when you have byte stream data you get a byte stream API.

A streaming response can be consumed on the fly without holding all the data in local memory, which is better from a memory point of view. For instance, if you have to parse a huge JSON file, you can parse it from the stream and discard the raw data once it has been consumed. In theory, parsing can begin as soon as the first byte has arrived.
It is getInputStream that does the send/receive part, as well as initiating the creation of the underlying socket.

Java Check if HttpResponse is Still Alive

Is there a way from a Java servlet to check if the HTTP response is still "alive"? For instance, in my situation I send an AJAX request from the browser over to a servlet. In this case it's a polling request, so it may poll for up to 5 minutes. When the servlet is ready to respond with data, I'd like to check whether the user has closed the browser window, moved to another page, etc. In other words, I want to check whether sending the data to the response will actually do anything.

Generally, this problem can be solved by sending a dummy payload before the actual message.
If the socket was severed, an IOException or a SocketException or something similar is thrown (depending on the library). Technically, browsers are supposed to sever a connection whenever you navigate away from a page or close the browser (or anything similar), but I've found out that the implementation details can vary. Older versions of FF, for example, appropriately close a connection when navigating away from a page, but newer versions (especially when using AJAX) tend to leave connections open.
That's the main reason you may use a dummy packet before the actual message. Another important consideration is the timeout. I've done polling before and you either need to implement some sort of heartbeat to keep a connection alive or increase the server timeout (although keep in mind that some browsers may have timeouts as well - timeouts that you have no control over).
Instead of polling or pushing over AJAX, I strongly suggest trying to support (at least in part) a Websocket solution.

Java Servlet Response doesn't have any such method as it is based on request and response behavior. If you wish to check the status, then probably you need to work at lower level e.g. TCP/IP Sockets, which has several status check methods as below:
boolean isBound()
Returns the binding state of the socket.
boolean isClosed()
Returns the closed state of the socket.
boolean isConnected()
Returns the connection state of the socket.
boolean isInputShutdown()
Returns whether the read-half of the socket connection is closed.
boolean isOutputShutdown()
Returns whether the write-half of the socket connection is closed.
