Apache HttpComponents code causing thread to block - java

I am currently running a program that downloads the source code of web pages using Apache HttpComponents. I will be downloading a lot of pages (tens of thousands), so I am using multiple threads to do this.
Sometimes all threads die (join) and sometimes they don't. Through debugging I have determined that the line
CloseableHttpResponse response = httpClient.execute(httpget,context);
is the problem. Does anybody know how I can set a timeout for this line, or why this line is blocking thread execution?

There can be various reasons for threads getting stuck in an I/O operation, incorrect timeout settings being the most likely cause. One can set the desired timeout values using the RequestConfig class. However, if all threads get blocked at once inside the #execute method, a connection leak (connection pool depletion) would be more likely. Make sure that you always close CloseableHttpResponse instances, even if you do not care about the response or its content. You can find out more details about request execution by turning on wire / context logging as described in the logging guide.

I use the following timeout settings in HttpConnectionParams in my code (HttpParams are given to the HttpClient constructor):
org.apache.http.params.HttpConnectionParams.setConnectionTimeout(HttpParams, int)
org.apache.http.params.HttpConnectionParams.setSoTimeout(HttpParams, int)
A problem I discovered when connecting to the same host with multiple threads is that blocking and timeouts occur when the maxPerRoute setting is lower than the number of threads. Have a look at PoolingClientConnectionManager:
org.apache.http.impl.conn.PoolingClientConnectionManager.setDefaultMaxPerRoute(int)
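The depletion scenario described in these answers can be reproduced with the JDK alone. The sketch below is only a model: a Semaphore stands in for the per-route connection limit, and the names PoolDepletionDemo, run and closeResponses are made up for illustration. It is not how HttpClient is implemented; it just shows why a lease that is never returned (an unclosed response) starves later requests, and why a bounded wait turns a hang into a visible failure.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Stdlib-only model of a per-route connection limit. The Semaphore stands in
// for the pool; this is an analogy, not HttpClient's implementation.
class PoolDepletionDemo {

    /**
     * Issues `requests` sequential "requests" against a pool of `maxPerRoute`
     * leases and returns how many obtained a lease within `waitMillis`.
     */
    static int run(int maxPerRoute, int requests, boolean closeResponses, long waitMillis) {
        Semaphore leases = new Semaphore(maxPerRoute);
        int served = 0;
        for (int i = 0; i < requests; i++) {
            boolean leased;
            try {
                // Bounded wait: give up instead of waiting forever for a free lease.
                leased = leases.tryAcquire(waitMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            if (!leased) {
                continue; // pool exhausted; an unbounded wait would hang here
            }
            try {
                served++; // the "request" itself
            } finally {
                if (closeResponses) {
                    leases.release(); // closing the response returns the lease
                }
                // otherwise the lease leaks, as with a response that is never closed
            }
        }
        return served;
    }

    public static void main(String[] args) {
        System.out.println("closing responses: " + run(2, 10, true, 50));
        System.out.println("leaking responses: " + run(2, 10, false, 50));
    }
}
```

With two leases and ten requests, all ten are served when every lease is returned, but only the first two are served once leases leak.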

Related

Using ThreadPoolExecutor and connection pool without random blocking method call

I've been using Stack Overflow for a long time now and have always found existing answers, but this time I couldn't find any information about what I'm trying to do.
Using Java, I have a process composed of about 10 different tasks that gather distinct data from the database using pure JDBC (no EJB/JPA here). Each task (a Callable) can be run concurrently and is responsible for obtaining its own connection. However, we randomly run into trouble with the connection pool (accessed via JNDI): sometimes we're blocked because the pool doesn't have any available connection.
To solve this problem, I thought we could change the way we obtain connections. Instead of letting each callable open and close a connection (depending on the number of tasks to execute and the number of threads used in the ThreadPoolExecutor), I would like to create some kind of local connection pool dedicated to this process, so that we're sure nothing will block later. If we can't acquire all the requested connections, we would adapt the number of threads to launch, with a minimum of 1.
My colleagues approve of this idea, but what surprises me is that I can't find any similar approaches or discussions on the web (maybe I'm not using the right keywords).
I would like to know what you think about this idea, whether you already tried something similar or if I'm missing something important.
In advance, thank you.
You have not mentioned which connection pool is used. If it is not HikariCP and you are allowed to switch, having contributed there I recommend it.
HikariCP seems rather interesting after all; I'll have to look into it further. But this isn't directly related to the question :)
Just a little feedback from experience: my idea is working, with one caveat. I couldn't get rid of one downcast from a Runnable to my implementation, on which I call .setConnection() during the before() of my ExecutorService. Also, all tasks must be given to the executor with the execute() method; otherwise the runnable is automatically wrapped in a FutureTask, with no way to access the inner runnable. Maybe one of you knows how to do this correctly?
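One possible way around the downcast, offered as a sketch rather than as what the poster ended up doing: acquire the resources up front, keep them in a queue, and let each task borrow one through a closure instead of a setConnection() call. Because nothing has to reach inside the task, submit() works as well as execute(). LocalPool, lend and LocalPoolDemo are hypothetical names, and plain strings stand in for java.sql.Connection objects so the example runs without a database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical helper: holds resources acquired up front and lends one per task.
// R would be java.sql.Connection in the scenario above; a String stands in here.
class LocalPool<R> {
    private final BlockingQueue<R> idle;

    LocalPool(List<R> preAcquired) {
        this.idle = new LinkedBlockingQueue<>(preAcquired);
    }

    /** Wraps work so that it borrows a resource and always hands it back. */
    <T> Callable<T> lend(Function<R, T> work) {
        return () -> {
            R resource = idle.take(); // waits only on this process's own pool
            try {
                return work.apply(resource);
            } finally {
                idle.offer(resource); // return it even if the work failed
            }
        };
    }
}

class LocalPoolDemo {
    static List<String> runAll(int resources, int tasks) throws Exception {
        List<String> connections = new ArrayList<>();
        for (int i = 0; i < resources; i++) {
            connections.add("conn-" + i); // placeholder for a real connection
        }
        LocalPool<String> pool = new LocalPool<>(connections);
        // The thread count follows the number of resources actually obtained.
        ExecutorService executor = Executors.newFixedThreadPool(resources);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                final int id = t;
                Callable<String> task = pool.lend(c -> "task-" + id + " used " + c);
                futures.add(executor.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        runAll(2, 5).forEach(System.out::println);
    }
}
```

Sizing the executor to the number of resources obtained means a worker never waits long on take(), which matches the idea of adapting the thread count to the connections that could be acquired.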

How to stop a URL connection upon thread interruption Java

I have a multithreaded program that visits URLs. The threads are run through an executor service, and when the user chooses to quit through the GUI, the program attempts to interrupt the threads by calling executor.shutdownNow(). However, it takes a very long time for the program to shut down because many of the threads are blocked in a url.openStream() call, and since this does not throw InterruptedException, so far I've been forced to just check before and after the call for Thread.currentThread().isInterrupted().
I'm wondering if there is a better way to interrupt a URL connection upon thread interruption? Otherwise, what would be the best approach to let the program shutdown as quickly as possible?
Note that I would prefer not to set a timeout on the connections because I would like all URLs to be visited while the program is still running.
If you look at the Javadocs for URLConnection, it gives you a hint on this: if you call getInputStream or getOutputStream on the URLConnection and then close either of these streams, it will close the connection. If you are stuck waiting in the getInputStream/getOutputStream call itself, then I don't think anything can be done. But if you have the stream, close it (it'll throw an IOException and release any threads waiting on the stream) and the connection is finished.
FYI: I have found that this method of closing an InputStream either when you want it to time out or when you just want to stop processing is highly effective, much more so than using interrupt()
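The effect of closing a stream from another thread can be seen in isolation with a raw socket. This is a minimal sketch under assumptions: a plain java.net.Socket stands in for the stream obtained from a URLConnection, the local server never sends anything, and CloseToUnblockDemo is a made-up name. It shows that a reader blocked in read() is released with an exception as soon as another thread closes the stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicReference;

class CloseToUnblockDemo {

    /** Returns the exception seen by the blocked reader, or null if it saw none. */
    static IOException run() throws Exception {
        InetAddress loopback = InetAddress.getByName("127.0.0.1");
        // Local server that accepts the connection and never sends a byte.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {

            InputStream in = client.getInputStream();
            AtomicReference<IOException> seen = new AtomicReference<>();

            Thread reader = new Thread(() -> {
                try {
                    in.read(); // blocks: no data will ever arrive
                } catch (IOException e) {
                    seen.set(e); // released by the close() below
                }
            });
            reader.start();

            Thread.sleep(200); // give the reader time to block in read()
            in.close();        // closing the stream also closes the socket
            reader.join(5000);
            return seen.get();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("reader was released with: " + run());
    }
}
```

Unlike Thread.interrupt(), which a blocking socket read ignores, closing the stream gives the blocked thread an exception it can handle and exit on.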
Use Apache HttpClient
The best thing is to use the Apache HttpClient instead of URLConnection. That's a way more advanced client than URLConnection, it's easier to use, and it's interruptible. The downside is that it's not included in all Java environments by default, so it can cause an additional dependency that you need to ship with your software. If that's of no concern, then go for it.
If you have to stick with URLConnection
Nowadays usually HTTP or HTTPS is used for URLs on the network, so your URLConnection might actually be of type HttpURLConnection or HttpsURLConnection respectively. If that's the case, you are lucky. The latter ones have a key advantage over URLConnection (see below). First, you need to cast to HttpURLConnection. You can do this safely when using HTTP or HTTPS in your URL.
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
HttpURLConnection has an additional method, disconnect(), which you can now use. Calling this method causes any thread blocked in getInputStream(), or in a read() invocation on such a stream, to throw a SocketException: Socket closed:
conn.disconnect();
Caveat:
The javadoc gives a contrary idea for disconnect():
Calling the disconnect() method may close the underlying socket if a
persistent connection is otherwise idle at that time.
However, I've tested it in JDK 1.8 under Ubuntu and on Android, and it works nicely and in exactly the same way in both environments. So this sentence in the documentation is actually not true in practice, and maybe it should not be true, because this is the only way to interrupt a URLConnection. It even works when both the client and the server indicate that they can keep the connection alive, and even when there is no "otherwise idle" connection.
I would use JAX-RS http client. It supports asynchronous requests. Forgive me using Kotlin, but you do it in Java the same way.
//compile 'org.glassfish.jersey.core:jersey-client:2.0.1'
import javax.ws.rs.client.ClientBuilder
val th = Thread {
    val urlString = "http://long.running.page"
    val cli = ClientBuilder.newClient()
    val fut = cli.target(urlString).request().async().get()
    println(fut.get())
}
th.start()
println("wait")
Thread.sleep(3000)
println("interrupt")
th.interrupt()
th.join()
After commenting on @DanielS's answer, I also found examples of asynchronous clients in the Apache library, using HttpAsyncClients. So it seems that it supports interruption, but not with the HttpClient class.
The easy answer is to use System.exit(0), which will force all threads to die.

how to cancel XmlRpcClient.execute before timeout (java)

I'm programming in Java and am using XML-RPC to submit data from a client to a server. My problem is that whenever I call XmlRpcClient.execute and there is a connection error, the application gets stuck until I eventually get a timeout exception (which I want). I placed this whole process in a new thread and want the ability to stop/cancel the process if I don't want to wait for the timeout.
I learned how to stop threads, but I don't know if I can interrupt the XmlRpcClient.execute call.
Any ideas?
The default execute method is, by nature, synchronous, that is, blocking.
If you are using Jakarta Commons HttpClient, you could set the socket timeout to a shorter value (the default is 0 meaning no timeout) with the transport's setConnectionTimeout method.
I believe, though, that the proper approach would be to use the executeAsync method and provide a callback to it in order to continue.

Tomcat websocket and java

Hi guys, I am getting the following error while using WebSocket with Tomcat 8.
java.lang.IllegalStateException: The remote endpoint was in state [TEXT_FULL_WRITING] which is an invalid state for called method
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$StateMachine.checkState(WsRemoteEndpointImplBase.java:1092)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$StateMachine.textStart(WsRemoteEndpointImplBase.java:1055)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendString(WsRemoteEndpointImplBase.java:186)
at org.apache.tomcat.websocket.WsRemoteEndpointBasic.sendText(WsRemoteEndpointBasic.java:37)
at com.iri.monitor.webSocket.IRIMonitorSocketServlet.broadcastData(IRIMonitorSocketServlet.java:369)
at com.iri.monitor.webSocket.IRIMonitorSocketServlet.access$0(IRIMonitorSocketServlet.java:356)
at com.iri.monitor.webSocket.IRIMonitorSocketServlet$5.run(IRIMonitorSocketServlet.java:279)
You are trying to write to a websocket that is not in a ready state. The websocket is currently in writing mode and you are trying to write another message to it, which raises an error. Using an async write or, though it is not good practice, a sleep can prevent this from happening. This error is also normally raised when a websocket program is not thread-safe.
Neither async nor sleep can help.
The key problem is the send-method can not be called concurrently.
So it's just about concurrency, you can use locks or some other thing. Here is how I handle it.
In fact, I wrote an actor to wrap the socket session. It produces an event when the send method is called. Each actor is registered in a Looper, which contains a worker thread and an event queue. Meanwhile, the worker thread keeps sending messages.
So I use the sync send method inside, and the actor model takes care of the concurrency.
The key problem now is the number of Loopers. You can't create too many or too few threads, but you can still estimate a number from your business cases and keep adjusting it.
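The looper idea from this answer can be sketched with a single-threaded executor. This is an illustration under assumptions, not the poster's code: SerializedSender is a made-up class, and the Consumer<String> passed to it stands in for a blocking send such as session.getBasicRemote().sendText(msg) (which also throws IOException, something a real wrapper would have to handle). Any thread may call send(); the one worker thread performs the sends one at a time, so two writes never overlap.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Funnels every send through one worker thread so that sends never overlap.
// `sink` is a stand-in for a blocking call on the real websocket session.
class SerializedSender {
    private final ExecutorService looper = Executors.newSingleThreadExecutor();
    private final Consumer<String> sink;

    SerializedSender(Consumer<String> sink) {
        this.sink = sink;
    }

    /** Safe to call from any thread; the message is queued and sent in order. */
    void send(String message) {
        looper.execute(() -> sink.accept(message));
    }

    /** Stops accepting messages and waits for the queued ones to be sent. */
    boolean close(long timeoutMillis) throws InterruptedException {
        looper.shutdown();
        return looper.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}

class SerializedSenderDemo {

    /** Returns the highest number of sends that were ever in flight at once. */
    static int maxOverlap(int producers, int messagesEach) throws InterruptedException {
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        SerializedSender sender = new SerializedSender(msg -> {
            int now = inFlight.incrementAndGet();
            maxSeen.accumulateAndGet(now, Math::max);
            inFlight.decrementAndGet();
        });

        Thread[] threads = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            final int id = p;
            threads[p] = new Thread(() -> {
                for (int m = 0; m < messagesEach; m++) {
                    sender.send("p" + id + "-m" + m);
                }
            });
            threads[p].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        sender.close(5000);
        return maxSeen.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("max concurrent sends: " + maxOverlap(8, 100));
    }
}
```

One executor per session is the simplest variant; sharing a few executors across many sessions, as the answer suggests, trades some latency for fewer threads.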
It is actually not a concurrency issue; you will get the same error in a single-threaded environment. It is about asynchronous calls that must not overlap.
You should use session.getBasicRemote().sendText() instead of session.getAsyncRemote().sendText() to avoid this problem. This should not be an issue as long as the amount of data you are writing stays reasonably small.

HttpURLConnections ignore timeouts and never return

We are getting some unexpected results randomly from some servers when trying to open an InputStream from an HttpURLConnection. It seems like those servers would accept the connection and reply with a "stay-alive" header which will keep the Socket open but doesn't allow data to be sent back to the stream.
That scenario makes writing a multi-threaded crawler a little "complicated", because if a connection gets stuck, the thread running it never returns. Its pool then never completes, which results in the controller thinking that some threads are still working.
Is there some way to read the connection's response header to identify that "stay-alive" answer and avoid trying to open the stream?
I'm not sure what I'm missing here but it seems to me you simply need getHeaderField()?
Did you try setting "read time out", in addition to "connect time out"?
See http://java.sun.com/j2se/1.5.0/docs/api/java/net/URLConnection.html#setReadTimeout%28int%29
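To illustrate the read timeout suggested here, the sketch below points an HttpURLConnection at a local server socket that completes the TCP handshake but never replies, which roughly imitates the stalled servers from the question. The class name ReadTimeoutDemo and the timeout values are arbitrary choices for the example. The connect timeout covers establishing the connection; the read timeout covers waiting for data once connected.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URL;

class ReadTimeoutDemo {

    /** Returns true if the stalled request was abandoned with a SocketTimeoutException. */
    static boolean timesOut(int readTimeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getByName("127.0.0.1");
        // The OS completes the TCP handshake for this socket, but nothing is ever sent back.
        try (ServerSocket silent = new ServerSocket(0, 1, loopback)) {
            URL url = URI.create("http://127.0.0.1:" + silent.getLocalPort() + "/").toURL();
            HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            conn.setConnectTimeout(2000);           // limit for establishing the connection
            conn.setReadTimeout(readTimeoutMillis); // limit for each blocking read
            try (InputStream in = conn.getInputStream()) {
                in.read();
                return false; // the server answered after all
            } catch (SocketTimeoutException expected) {
                return true;  // the thread is released instead of hanging
            } finally {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + timesOut(500));
    }
}
```

In a crawler, catching SocketTimeoutException in the worker lets the thread record the failure and return to the pool, so the controller sees every task finish.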
