How to stop a URL connection upon thread interruption in Java

I have a multithreaded program that visits URLs. The threads are run through an executor service, and when the user chooses to quit through the GUI, the program attempts to interrupt the threads by calling executor.shutdownNow(). However, it takes a very long time for the program to shut down because many of the threads are blocked in a url.openStream() call, and since this call does not throw InterruptedException, so far I've been forced to just check Thread.currentThread().isInterrupted() before and after it.
I'm wondering if there is a better way to interrupt a URL connection upon thread interruption? Otherwise, what would be the best approach to let the program shutdown as quickly as possible?
Note that I would prefer not to set a timeout on the connections because I would like all URLs to be visited while the program is still running.

If you look at the Javadocs for URLConnection, they give you a hint on this: if you call getInputStream or getOutputStream on the URLConnection and then close either of these streams, it will close the connection. If you are stuck waiting in the getInputStream/getOutputStream call itself, then I don't think anything can be done. But if you already have the stream, close it (any threads waiting on the stream get an IOException and are released) and the connection is finished.
FYI: I have found that this method of closing an InputStream, either when you want it to time out or when you just want to stop processing, is highly effective, much more so than using interrupt().
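For example, a worker could keep its open stream somewhere the shutdown code can reach and close it from there. The helper class below is purely illustrative (not part of any library), just a minimal sketch of the idea:
import java.io.IOException;
import java.io.InputStream;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry: workers register the stream returned by url.openStream(),
// and the shutdown path closes everything to unblock threads stuck in read().
class OpenStreams {
    private static final Set<InputStream> STREAMS = ConcurrentHashMap.newKeySet();

    static InputStream track(InputStream in) {
        STREAMS.add(in);
        return in;
    }

    static void closeAll() {
        for (InputStream in : STREAMS) {
            try {
                in.close();   // blocked readers get an IOException and can exit
            } catch (IOException ignored) {
            }
        }
        STREAMS.clear();
    }
}
A worker would wrap its stream with OpenStreams.track(url.openStream()), and the GUI quit handler would call OpenStreams.closeAll() right after executor.shutdownNow().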

Use Apache HttpClient
The best thing is to use the Apache HttpClient instead of URLConnection. That's a way more advanced client than URLConnection, it's easier to use, and it's interruptible. The downside is that it's not included in all Java environments by default, so it can cause an additional dependency that you need to ship with your software. If that's of no concern, then go for it.
If you have to stick with URLConnection
Nowadays HTTP or HTTPS is usually used for URLs on the network, so your URLConnection is most likely actually an HttpURLConnection or HttpsURLConnection, respectively. If that's the case, you are lucky: these have a key advantage over plain URLConnection (see below). First, you need to cast to HttpURLConnection. You can do this safely when your URL uses HTTP or HTTPS.
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
HttpURLConnection has an additional method, disconnect(), which you can now use. Calling this method causes getInputStream(), and any threads blocked in read() calls on a stream obtained from it, to throw a SocketException: Socket closed:
conn.disconnect();
Caveat:
The javadoc gives a contrary idea for disconnect():
Calling the disconnect() method may close the underlying socket if a
persistent connection is otherwise idle at that time.
However, I've tested it on JDK 1.8 under Ubuntu and on Android, and it works nicely and in exactly the same way in both environments, so this sentence in the documentation is not accurate in practice, and maybe it shouldn't be, because this is the only way to interrupt a URLConnection. It even works when both the client and the server indicate that they can keep the connection alive, and even when there is no "otherwise idle" connection.
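A minimal sketch of how this could be wired into a shutdown path; the registry of in-flight connections below is my own illustration, not something the JDK provides:
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class UrlVisitor {
    // Connections currently in flight; the shutdown code calls disconnect() on them.
    private final Set<HttpURLConnection> active = ConcurrentHashMap.newKeySet();

    void visit(String address) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        active.add(conn);
        try (InputStream in = conn.getInputStream()) {
            // Read the response; a concurrent disconnect() makes read() fail fast.
            while (in.read() != -1) { /* consume */ }
        } finally {
            active.remove(conn);
            conn.disconnect();
        }
    }

    void abortAll() {
        // Called from the GUI thread after executor.shutdownNow().
        for (HttpURLConnection conn : active) {
            conn.disconnect();   // blocked readers see a SocketException
        }
    }
}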

I would use the JAX-RS HTTP client. It supports asynchronous requests. Forgive me for using Kotlin; you do it in Java the same way.
//compile 'org.glassfish.jersey.core:jersey-client:2.0.1'
import javax.ws.rs.client.ClientBuilder
val th = Thread {
    val urlString = "http://long.running.page"
    val cli = ClientBuilder.newClient()
    val fut = cli.target(urlString).request().async().get()
    println(fut.get())
}
th.start()
println("wait")
Thread.sleep(3000)
println("interrupt")
th.interrupt()
th.join()
After putting a comment on DanielS's answer, I also found examples of asynchronous clients in the Apache library, using HttpAsyncClients. So it seems Apache supports interruption, but not with the HttpClient class.
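For reference, a rough Java sketch of that approach, assuming the separate httpasyncclient 4.x module is on the classpath (the URL is the same placeholder as above):
import java.util.concurrent.Future;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.nio.client.CloseableHttpAsyncClient;
import org.apache.http.impl.nio.client.HttpAsyncClients;

public class AsyncGetDemo {
    public static void main(String[] args) throws Exception {
        CloseableHttpAsyncClient client = HttpAsyncClients.createDefault();
        client.start();
        try {
            Future<HttpResponse> future =
                    client.execute(new HttpGet("http://long.running.page"), null);
            Thread.sleep(3000);
            future.cancel(true);   // abandon the request instead of waiting for it
            System.out.println("cancelled: " + future.isCancelled());
        } finally {
            client.close();
        }
    }
}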

The easy answer is to use System.exit(0), which will force all threads to die.

Related

Sync HTTP request (HttpURLConnection) with high concurrency?

I am using HttpURLConnection in my Java (Spring) app to send HTTP requests to external third-party servers. I need about 1000 http requests per second.
However, HttpURLConnection is synchronous, so one thread can only perform one HTTP request at a time, and only after that request finishes can the thread issue the next one. This seems inefficient, and I suspect the load cannot even be handled this way (please correct me if I am wrong and this is actually efficient).
I wonder whether there is a good way to handle this? My plan is to use a thread pool (Executor) containing, say, 100 threads.
P.S. I cannot use any other libraries such as HttpClient, since that SDK package is provided by a third party :/
Thanks very much!
1. You are right about doing the request in one thread. It is mentioned in the HttpURLConnection documentation:
Each HttpURLConnection instance is used to make a single request but the underlying network connection to the HTTP server may be transparently shared by other instances. Calling the close() methods on the InputStream or OutputStream of an HttpURLConnection after a request may free network resources associated with this instance but has no effect on any shared persistent connection. Calling the disconnect() method may close the underlying socket if a persistent connection is otherwise idle at that time.
That means you can use openConnection to get a new HttpURLConnection instance, perform the request, and then close it. The underlying network connection to the HTTP server may be transparently shared by other instances.
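For example, a fixed pool where each task opens its own HttpURLConnection, reads the response, and closes the stream when done (the URL, pool size, and request count are placeholders):
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Poller {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(100);
        for (int i = 0; i < 1000; i++) {
            pool.submit(() -> {
                try {
                    HttpURLConnection conn =
                            (HttpURLConnection) new URL("http://example.com/api").openConnection();
                    // Closing the stream frees this instance's resources while still
                    // allowing the underlying connection to be reused (keep-alive).
                    try (InputStream in = conn.getInputStream()) {
                        while (in.read() != -1) { /* consume response */ }
                    }
                } catch (Exception e) {
                    // log and move on; one failed request should not kill the worker
                }
            });
        }
        pool.shutdown();
    }
}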
Socket and SocketChannels might be a good option, although you'll have to 'roll your own' HTTP, which will be Very Not Easy if you have to deal with HTTPS. They are part of the standard JRE and can also be used asynchronously. You get into some hairy code when you go async because the Selector API is a bit difficult to work with but it would definitely be fast and have low overhead.
You might be able to use a custom SSLSocketFactory to jockey the socket so that you have direct access to it and can get a SocketChannel from it.

how to cancel XmlRpcClient.execute before timeout (java)

I'm programming in Java and am using XML-RPC to submit data from a client to a server. My problem is that when I call the XmlRpcClient.execute code and there is a connection error, the application gets stuck until I eventually get a timeout exception (which I do want). I placed this whole process in a new thread and wanted the ability to stop/cancel the process if I didn't want to wait for the timeout.
I learned how to stop threads, but I don't know if I can interrupt the XmlRpcClient.execute code.
Any ideas?
The default execute method is, by nature, synchronous, that is, blocking.
If you are using Jakarta Commons HttpClient, you could set the socket timeout to a shorter value (the default is 0 meaning no timeout) with the transport's setConnectionTimeout method.
I believe, though, that the proper handling would be to use the executeAsync method and provide a callback to it in order to continue.
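A rough sketch of what that might look like with Apache XML-RPC 3.x; the server URL, method name, and timeout values are placeholders, and the exact timeout support depends on the transport you configure:
import java.net.URL;
import org.apache.xmlrpc.XmlRpcRequest;
import org.apache.xmlrpc.client.AsyncCallback;
import org.apache.xmlrpc.client.XmlRpcClient;
import org.apache.xmlrpc.client.XmlRpcClientConfigImpl;

public class AsyncRpcDemo {
    public static void main(String[] args) throws Exception {
        XmlRpcClientConfigImpl config = new XmlRpcClientConfigImpl();
        config.setServerURL(new URL("http://example.com/xmlrpc"));
        config.setConnectionTimeout(5000);   // fail faster on connection errors
        config.setReplyTimeout(10000);

        XmlRpcClient client = new XmlRpcClient();
        client.setConfig(config);

        // executeAsync returns immediately; the callback fires later,
        // so the calling thread is never stuck waiting for the timeout.
        client.executeAsync("some.remoteMethod", new Object[] { "data" },
                new AsyncCallback() {
                    public void handleResult(XmlRpcRequest request, Object result) {
                        System.out.println("result: " + result);
                    }
                    public void handleError(XmlRpcRequest request, Throwable error) {
                        System.out.println("failed: " + error);
                    }
                });
    }
}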

Apache HttpComponents code causing thread to block

I am currently running a program that will download the source code from a website using Apache HttpComponents. I will be downloading a lot (10,000s) and so am using multiple threads to do this.
Sometimes all threads die (join) and sometimes they don't. Through debugging I have determined that the line
CloseableHttpResponse response = httpClient.execute(httpget,context);
is the problem. Does anybody know how I can set a timeout for this line, or why this line is blocking thread execution?
There can be various reasons for threads getting stuck in an I/O operation, incorrect timeout settings being the most likely cause. One can set the desired timeout values using the RequestConfig class. However, if all threads get blocked at once inside the #execute method, a connection leak (connection pool depletion) is more likely. Make sure that you always close CloseableHttpResponse instances, even if you do not care about the response or its content. You can find out more details about request execution by turning on wire/context logging as described in the logging guide.
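For instance, with HttpClient 4.3+ the timeouts can be set through RequestConfig and the response closed in a try-with-resources block; the URL and timeout values below are only examples:
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class Downloader {
    public static void main(String[] args) throws Exception {
        RequestConfig config = RequestConfig.custom()
                .setConnectTimeout(10000)            // establishing the connection
                .setSocketTimeout(30000)             // waiting for data
                .setConnectionRequestTimeout(5000)   // leasing from the pool
                .build();
        try (CloseableHttpClient httpClient = HttpClients.custom()
                .setDefaultRequestConfig(config).build()) {
            HttpGet httpget = new HttpGet("http://example.com/page");
            // Always close the response so the connection returns to the pool.
            try (CloseableHttpResponse response = httpClient.execute(httpget)) {
                String body = EntityUtils.toString(response.getEntity());
                System.out.println(body.length());
            }
        }
    }
}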
I use the following timeout settings in HttpConnectionParams in my code (HttpParams are given to the HttpClient constructor):
org.apache.http.params.HttpConnectionParams.setConnectionTimeout(HttpParams, int)
org.apache.http.params.HttpConnectionParams.setSoTimeout(HttpParams, int)
A problem I discovered when connecting to the same host with multiple threads is that blocking/timeouts occur when the maxPerRoute setting is lower than the number of threads. Have a look at PoolingClientConnectionManager:
org.apache.http.impl.conn.PoolingClientConnectionManager.setDefaultMaxPerRoute(int)
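Combining both suggestions, a sketch using the (now deprecated) 4.2-era API this answer refers to; the timeout values and pool sizes are placeholders:
import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.conn.PoolingClientConnectionManager;
import org.apache.http.params.BasicHttpParams;
import org.apache.http.params.HttpConnectionParams;
import org.apache.http.params.HttpParams;

public class PooledClientFactory {
    static HttpClient create(int threads) {
        HttpParams params = new BasicHttpParams();
        HttpConnectionParams.setConnectionTimeout(params, 10000);
        HttpConnectionParams.setSoTimeout(params, 30000);

        PoolingClientConnectionManager cm = new PoolingClientConnectionManager();
        cm.setMaxTotal(threads);              // at least as many as worker threads
        cm.setDefaultMaxPerRoute(threads);    // otherwise threads queue per host

        return new DefaultHttpClient(cm, params);
    }
}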

Java Check if HttpResponse is Still Alive

Is there a way from a Java servlet to check if the HTTP response is still "alive"? For instance, in my situation I send an AJAX request from the browser over to a servlet. In this case it's a polling request, so it may poll for up to 5 minutes; when the servlet is ready to respond with data I'd like to check if the user has closed the browser window, moved to another page, etc. In other words, check to see if sending the data to the response will actually do anything.
Generally, this problem can be solved by sending a dummy payload before the actual message.
If the socket was severed, an IOException or a SocketException or something similar is thrown (depending on the library). Technically, browsers are supposed to sever a connection whenever you navigate away from a page or close the browser (or anything similar), but I've found out that the implementation details can vary. Older versions of FF, for example, appropriately close a connection when navigating away from a page, but newer versions (especially when using AJAX) tend to leave connections open.
That's the main reason you may use a dummy packet before the actual message. Another important consideration is the timeout. I've done polling before and you either need to implement some sort of heartbeat to keep a connection alive or increase the server timeout (although keep in mind that some browsers may have timeouts as well - timeouts that you have no control over).
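A sketch of how the dummy-payload idea could look in a servlet; the helper name is made up, and whether the failure surfaces on the first flush depends on the container and the OS socket buffers:
import java.io.IOException;
import javax.servlet.ServletOutputStream;
import javax.servlet.http.HttpServletResponse;

// Hypothetical helper: try a tiny write before the real payload.
// If the browser has gone away, the flush typically fails with an IOException.
class ClientProbe {
    static boolean clientStillThere(HttpServletResponse response) {
        try {
            ServletOutputStream out = response.getOutputStream();
            out.write(' ');   // dummy byte the client will ignore
            out.flush();
            return true;
        } catch (IOException gone) {
            return false;
        }
    }
}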
Instead of polling or pushing over AJAX, I strongly suggest trying to support (at least in part) a Websocket solution.
The Java Servlet response doesn't have any such method, as it is based on request/response behavior. If you wish to check the status, then you probably need to work at a lower level, e.g. with TCP/IP sockets, which have several status-check methods:
boolean isBound()
Returns the binding state of the socket.
boolean isClosed()
Returns the closed state of the socket.
boolean isConnected()
Returns the connection state of the socket.
boolean isInputShutdown()
Returns whether the read-half of the socket connection is closed.
boolean isOutputShutdown()
Returns whether the write-half of the socket connection is closed.

HttpURLConnections ignore timeouts and never return

We are getting some unexpected results randomly from some servers when trying to open an InputStream from an HttpURLConnection. It seems like those servers would accept the connection and reply with a "stay-alive" header which will keep the Socket open but doesn't allow data to be sent back to the stream.
That scenario makes an attempt at a multi-threaded crawler a little "complicated", because if some connection gets stuck, the thread running it never returns, denying the completion of its pool, which results in the controller thinking that some threads are still working.
Is there some way to read the connection response header to identify that "stay-alive" answer and avoid trying to open the stream?
I'm not sure what I'm missing here but it seems to me you simply need getHeaderField()?
Did you try setting a read timeout, in addition to a connect timeout?
See http://java.sun.com/j2se/1.5.0/docs/api/java/net/URLConnection.html#setReadTimeout%28int%29
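Something along these lines; the URL and timeout values are arbitrary:
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutFetch {
    public static void main(String[] args) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL("http://example.com/").openConnection();
        conn.setConnectTimeout(10000);  // give up if the TCP connection cannot be established
        conn.setReadTimeout(15000);     // give up if the server accepts but never sends data
        try (InputStream in = conn.getInputStream()) {
            System.out.println("first byte: " + in.read());
        } finally {
            conn.disconnect();
        }
    }
}
With both timeouts in place, a server that accepts the connection but never sends data results in a SocketTimeoutException instead of a permanently stuck crawler thread.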
