Slow TCP connection causes my app to stall - Java

I found a class that handles TCP connections and I am using it to communicate with a gaming service. All was working fine until I realized that my application stalls when the connection is slow. I have a thread polling the service, let's say every 30 seconds.
I got the TCPClient class I use from this thread Java TCP sending first message, then infinite wait
The service requires two steps to verify a request. You first send a hash and receive an acknowledgement. Then you send the actual request and receive the response.
public byte[] getResponse(byte[] hash, byte[] request) throws Exception {
    // Reconnect if the cached connection is unusable
    if (client == null || client.socket.isClosed() || !client.socket.isConnected()
            || client.socket.isInputShutdown() || client.socket.isOutputShutdown()) {
        client = new TCPClient(this.host, this.port);
    }
    client.SendToServer(hash);
    byte[] ack = client.ReceiveFromServer();
    if (checkAck(ack, getAckForRequest(request))) {
        client.SendToServer(request);
        return client.ReceiveFromServer();
    }
    return null; // acknowledgement did not match
}
My code looks something like this. I simplified it a bit to make it more readable.
I am using this function inside a try/catch block and when it throws an exception I store the request in a MySQL database.
Is there a way to avoid blocking my main thread if the connection is slow and do the same stuff?

Is there a way to avoid blocking my main thread if the connection is slow and do the same stuff?
Yes. One can call setSoTimeout() on a Socket.
The Oracle documentation:
https://docs.oracle.com/javase/8/docs/api/java/net/Socket.html#setSoTimeout-int-
Enable/disable SO_TIMEOUT with the specified timeout, in milliseconds. With this option set to a non-zero timeout, a read() call on the InputStream associated with this Socket will block for only this amount of time. If the timeout expires, a java.net.SocketTimeoutException is raised, though the Socket is still valid. The option must be enabled prior to entering the blocking operation to have effect. The timeout must be > 0. A timeout of zero is interpreted as an infinite timeout.
If you just want to close the connection and give up, this works well. If you want to resume the operation later, you have to keep track of the bytes already read, which is why simply using more threads is usually the easier option.
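For illustration, here is a minimal sketch of the timeout approach; the host, port, and 5-second value are placeholders, not taken from the question:
import java.io.IOException;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class TimeoutExample {
    public static void main(String[] args) throws IOException {
        Socket socket = new Socket("example.com", 9000); // placeholder host/port
        socket.setSoTimeout(5000); // reads now fail after 5 s instead of blocking forever
        try {
            int first = socket.getInputStream().read(); // blocks for at most 5 s
            System.out.println("Read byte: " + first);
        } catch (SocketTimeoutException e) {
            // Timed out: give up on this attempt, e.g. queue the request in MySQL and retry later
            socket.close();
        }
    }
}
If the polling thread must never block at all, the same call can be wrapped in a task submitted to an ExecutorService so that only a worker thread ever waits.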

Related

Why am I seeing lots of sockets in CLOSE_WAIT status when the web service stops working?

My Java web service running on Jetty falls over after a few hours, and investigation indicates many sockets in CLOSE_WAIT status. While it is working OK there seem to be no sockets in CLOSE_WAIT status, but when it goes wrong there are loads.
I found this definition
CLOSE-WAIT: The local end-point has received a connection termination request and acknowledged it e.g. a passive close has been performed and the local end-point needs to perform an active close to leave this state.
With netstat on my server I see a list of TCP sockets in CLOSE_WAIT status; the local address is my server and the foreign address is my load balancer machine. So I assume this means the client (the load balancer) has terminated the connection at its end in some improper way, and my server has not properly closed the connection at its end.
But how do I do that, given that my Java code doesn't deal with low-level sockets?
Or is the load balancer terminating the connection because of an earlier problem caused by something my server is doing wrong in the code?
Sounds like a bug in Jetty or the JVM; maybe this workaround will work for you:
http://www.tux.hk/index.php?entry=entry090521-111844
Add the following lines to /etc/sysctl.conf
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_intvl = 2
net.ipv4.tcp_keepalive_probes = 2
net.ipv4.tcp_keepalive_time = 1800
And then execute
sysctl -p
or do a reboot
I suspect the cause is something producing a long or infinite loop/wait in your server code, so Jetty simply never gets a chance to close the connection (unless there's some sort of timeout that forcibly closes the socket after a certain period). Consider the following example:
import java.io.IOException;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class TestSocketClosedWaitState
{
    private static class SocketResponder implements Runnable
    {
        private final Socket socket;
        //Using a static variable to control the infinite/waiting loop for testing purposes; with while(true) Eclipse would complain of dead code at the writer.close() line
        private static boolean infinite = true;

        public SocketResponder(Socket socket)
        {
            this.socket = socket;
        }

        @Override
        public void run()
        {
            try
            {
                PrintWriter writer = new PrintWriter(socket.getOutputStream());
                writer.write("Hello");
                //Simulating a slow response/getting stuck in an infinite loop/waiting for something that never happens etc.
                do
                {
                    Thread.sleep(5000);
                }
                while(infinite);
                writer.close(); //The socket stays in CLOSE_WAIT on the server side until this line is reached
            }
            catch(Exception e)
            {
                e.printStackTrace();
            }
            System.out.println("DONE");
        }
    }

    public static void main(String[] args) throws IOException
    {
        ServerSocket serverSocket = new ServerSocket(12345);
        while(true)
        {
            Socket socket = serverSocket.accept();
            Thread t = new Thread(new SocketResponder(socket));
            t.start();
        }
    }
}
With the infinite variable set to true, the PrintWriter (and the underlying socket) never gets closed because of the infinite loop. If I run this, connect to the socket with telnet, and then quit the telnet client, netstat will show the server-side socket still in the CLOSE_WAIT state (I could also see the client-side socket in FIN_WAIT2 for a while, but it'll disappear):
~$ netstat -anp | grep 12345
tcp6 0 0 :::12345 :::* LISTEN 6460/java
tcp6 1 0 ::1:12345 ::1:34606 CLOSE_WAIT 6460/java
The accepted server-side socket gets stuck in the CLOSE_WAIT state. If I check the thread stacks for the process, I can see the thread waiting inside the do...while loop:
~$ jstack 6460
<OTHER THREADS>
"Thread-0" prio=10 tid=0x00007f424013d800 nid=0x194f waiting on condition [0x00007f423c50e000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at TestSocketClosedWaitState$SocketResponder.run(TestSocketClosedWaitState.java:32)
at java.lang.Thread.run(Thread.java:701)
<OTHER THREADS...>
If I set the infinite variable to false and do the same (connect a client and disconnect), the socket shows in the CLOSE_WAIT state until the writer is closed (closing the underlying socket), and then disappears. If the writer or socket is never closed, the server-side socket again gets stuck in CLOSE_WAIT, even if the thread terminates (I don't think this should occur in Jetty; if your method returns at some point, Jetty should take care of closing the socket).
So, the steps I'd suggest to find the culprit are:
Add logging to your methods to see where they are going/what they are doing
Check your code: are there any places where the execution could get stuck in an infinite loop or take a really long while, preventing the underlying socket from being closed?
If it still occurs, take a thread dump of the running Jetty process with jstack the next time the problem occurs and try to identify any "stuck" threads
Is there a chance something might throw something (an OutOfMemoryError or such) that might not get caught by the underlying Jetty architecture calling your method? I've never peeked inside Jetty's internals; it could very well be catching Throwables, so this is probably not the issue, but maybe worth checking if all else fails
You could also name the threads when they enter and exit your methods with something like
String originalName = Thread.currentThread().getName();
Thread.currentThread().setName("myMethod");
try {
    // Your code...
} finally {
    Thread.currentThread().setName(originalName); // restore the name even if the method throws
}
to spot them more easily if there are a lot of threads running.
We have the same problem in our project. I'm not sure that this is your case, but maybe it will be helpful.
The reason was that a huge number of requests were handled by business logic inside a synchronized block. So when the client sent packets to drop the connection, the thread bound to that socket was busy waiting for the monitor.
The logs show exceptions for org.eclipse.jetty.io.WriteFlusher at write method:
DEBUG org.eclipse.jetty.io.WriteFlusher - write - write exception
org.eclipse.jetty.io.EofException: null
at org.eclipse.jetty.io.ChannelEndPoint.flush
(ChannelEndPoint.java:192) ~[jetty-io-9.2.10.v20150310.jar:9.2.10.v20150310]
and for org.eclipse.jetty.server.HttpOutput at the close method. I think the exception at the close step is the reason for the sockets' CLOSE_WAIT state:
DEBUG org.eclipse.jetty.server.HttpOutput - close -
org.eclipse.jetty.io.EofException: null
at org.eclipse.jetty.server.HttpConnection$SendCallback.reset
(HttpConnection.java:622) ~[jetty-server-9.2.10.v20150310.jar:9.2.10.v20150310]
The quick fix in our case was to increase idleTimeout. The right solution (again, in our case) is code refactoring.
So my advice is to read Jetty's logs carefully at DEBUG level to find exceptions, and to analyze application performance with VisualVM. Maybe the reason is a performance bottleneck (synchronized blocks?).
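For reference, a minimal sketch of raising the idle timeout, assuming Jetty 9's embedded ServerConnector API (the port and 60-second value are illustrative):
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;

public class IdleTimeoutExample {
    public static void main(String[] args) throws Exception {
        Server server = new Server();
        ServerConnector connector = new ServerConnector(server);
        connector.setPort(8080);          // illustrative port
        connector.setIdleTimeout(60000);  // ms of inactivity before Jetty closes the connection
        server.addConnector(connector);
        server.start();
        server.join();
    }
}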
I faced a similar problem. While the culprit code may differ, the symptoms were:
1) The server (Jetty) was running yet not processing requests
2) There was no extraordinary load and no exceptions
3) There were too many connections in CLOSE_WAIT.
These suggested that all the worker threads in the server were stuck somewhere. A jstack thread dump showed that all our worker threads were stuck in an Apache HttpClient object (because of unclosed response objects), and since all the threads were waiting indefinitely, none were available to process incoming requests.
Is the load balancer still up? Try stopping the load balancer to see whether it, rather than the server, is the issue.
This probably means you're not cleaning up your incoming connections. Make sure sockets are closed at the end of each transaction (best done in a finally block near the top of your server code, so that connections get closed even if server-side exceptions occur).
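As a rough sketch of that advice (the handle method is hypothetical), try-with-resources acts as the finally block that guarantees the accepted socket is closed:
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class CleanCloseServer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket serverSocket = new ServerSocket(12345)) {
            while (true) {
                try (Socket client = serverSocket.accept()) {
                    handle(client); // hypothetical per-request handler; may throw
                } catch (IOException e) {
                    e.printStackTrace(); // the client socket is closed regardless
                }
            }
        }
    }

    private static void handle(Socket client) throws IOException {
        client.getOutputStream().write("hello\n".getBytes());
    }
}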

ObjectInputStream's readObject() freezes after Client Socket connection is killed

I have the following socket-server code that reads a stream from a connected Socket.
try
{
    ObjectInputStream in = new ObjectInputStream(client.getInputStream());
    int count = 10;
    while(count > 0)
    {
        String msg = in.readObject().toString(); // Gets stuck here if this client is lost
        System.out.println("Client Says : " + msg);
        count--;
    }
    in.close();
    client.close();
}
catch(Exception ex)
{
    ex.printStackTrace();
}
And I have a client program that connects to this server and sends a string every second, ten times, while the server reads from the socket ten times and prints each message. But if I kill the client program partway through, the server freezes instead of throwing an exception or anything.
How can I detect this freeze condition, and make the loop iterate indefinitely, printing whatever the client sends for as long as the connection is active and stable?
The problem is that the server side of the socket has no way of knowing that the client connection closed because the client code terminates without calling .close() on the client side of the socket, and therefore never sends the TCP FIN signal.
One possible way of fixing this would be to create a new Watcher thread that periodically inspects the socket to see if it is still active. The problem with that approach is that isConnected() on the Socket will not work, for the same reason stated above, so the only real way to inspect the connection is to attempt to write to it. However, this may cause random garbage to be sent to a potentially listening client.
Other options would be to implement some type of keep-alive protocol that the client agrees to (i.e., sending keep-alive bits every so often so the Watcher has something to look for). You could also just move to the java.nio approach, which I believe does a better job of dealing with these conditions.
This thread is old, but provides more detail: http://www.velocityreviews.com/forums/t541628-sockets-checking-for-dropped-connections-and-close.html.
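As a sketch of the keep-alive idea above, a read timeout on the server side turns the silent freeze into a catchable exception; the 30-second window assumes the client sends something at least that often (both values are assumptions, not from the question):
import java.io.ObjectInputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class WatchedReader {
    public static void readLoop(Socket client) throws Exception {
        client.setSoTimeout(30000); // readObject() now throws instead of blocking forever
        ObjectInputStream in = new ObjectInputStream(client.getInputStream());
        while (true) {
            try {
                Object msg = in.readObject();
                System.out.println("Client Says : " + msg);
            } catch (SocketTimeoutException e) {
                System.out.println("No data for 30 s; assuming the client is gone");
                client.close();
                return;
            }
        }
    }
}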

Client thread hang emulation blocks server from accepting any I/O for the time client is set to wait

As the topic suggests I have a server and some clients.
The server accepts I/O connections concurrently (no queueing of socket connections), but I have this troubling issue and I do not know how to get around it!
If I force a client to throw an IOException, the server detects it and terminates the client thread correctly (verified from Task Manager (Windows) and System Monitor (Ubuntu)). But if I emulate an I/O that is "hanging", e.g. Thread.sleep(60*1000); or
private static Object lock = new Object();

synchronized(lock) {
    while (true) {
        try {
            lock.wait();
        } catch (InterruptedException e) {
            /* Foo */
        }
    }
}
then all subsequent I/O operations (connection & data transfer) seem to block or wait until the "hanging" client is terminated. The application makes use of an ExecutorService, so if the "hanging" client does not complete its operations within the suggested time limit, the task times out and the client is forced to exit. The subsequent "blocked" I/Os then resume, but I wonder why the server doesn't accept any I/O connections or perform any I/O operations while a client "hangs"?
NOTE: The client threading takes place in the server's main loop like this:
while (true) {
    accept client connection;
    submit client task;
    // "submit client task" expands to an ExecutorService call of the form:
    // spService.submit(new Callable<Tuple<String[], BigDecimal[]>>() {
    //     ... code ... }}).get(taskTimeout, taskTimeUnit);
    check task result & perform cleanup if result is null;
    otherwise continue;
}
The problem:
This may very well indicate that your server ACCEPTS client connections concurrently, but handles those connections synchronously. That means that even if a million clients connect successfully, if any one of them takes a long time (or hangs), it will hold up the others.
The test:
To verify this, I would vary the amount of time a client takes to connect by adding Thread.sleep(1000) statements in your clients.
Expected result:
I believe you will see that even a single Thread.sleep(1000) statement in your client delays all other connecting clients by 1000 ms.
I think I have found the source of my problems!
I do use a one-thread-per-client model, but I run my tests locally, i.e. on the same machine, which means all the clients have the same IP! So each client is assigned the same IP as the server! I guess this leaves server and clients differing only in port number, but since each client is mapped to a different local port for each server connection, the server shouldn't block. I have confirmed that each client and the server use different I/O objects (compared references) and that I wrap their sockets' <Input/Output>Streams in BufferedReaders and PrintWriters, but still, when a client hangs, all other clients hang too (so maybe the I/O channels are indeed the same???). I will test this on another machine and check the results back with you! :)
EDIT: Confirmed the erratic behaviour. It seems that even with remote clients, if one hangs, the other clients hang too! :/
I don't know why, but I am determined to fix this. It's just pretty weird, since I am fairly sure I use one thread per client (the I/O objects differ, the client sockets differ, the IPs seem not to be the problem, and I even map each client in the server to a local port of my choice...).
Maybe I'll switch to NIO if I don't find a solution soon enough.
SOLUTION: Solved the problem! It turned out that the ExecutorService had to be run in a separate thread; otherwise, if an I/O operation in one client blocked, all I/Os would block! That's strange, given that I tried both Executors.newFixedThreadPool(<nThreads>); and Executors.newCachedThreadPool(); and the client actions (i.e. the I/Os) were supposed to take place in a new thread for each client.
In any case, I used a method that wrapped the calls so each client instance uses a final ExecutorService baseWorker = Executors.newSingleThreadExecutor(); and I created a new Thread explicitly each time using <Thread instance>.start(); so each one runs in the background :)
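A rough sketch of the fix described above (handleClient and the 30-second timeout are illustrative assumptions): the accept loop only hands the socket off, and the per-client timeout is enforced on a worker thread, so a hung client can no longer stall accept():
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class NonBlockingAcceptLoop {
    public static void main(String[] args) throws Exception {
        ExecutorService workers = Executors.newCachedThreadPool();
        try (ServerSocket serverSocket = new ServerSocket(12345)) {
            while (true) {
                Socket client = serverSocket.accept();
                workers.submit(() -> {                  // returns immediately; accept() never waits
                    ExecutorService single = Executors.newSingleThreadExecutor();
                    Future<?> task = single.submit(() -> handleClient(client));
                    try {
                        task.get(30, TimeUnit.SECONDS); // timeout applies to this client only
                    } catch (TimeoutException e) {
                        task.cancel(true);              // force the hung client out
                    } catch (Exception ignored) {
                    } finally {
                        single.shutdownNow();
                        try { client.close(); } catch (Exception ignored2) { }
                    }
                });
            }
        }
    }

    private static void handleClient(Socket client) {
        // hypothetical per-client I/O
    }
}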

Java Sockets and Dropped Connections

What's the most appropriate way to detect if a socket has been dropped or not? Or whether a packet did actually get sent?
I have a library for sending Apple Push Notifications to iPhones through the Apple gateways (available on GitHub). Clients need to open a socket and send a binary representation of each message, but unfortunately Apple doesn't return any acknowledgement whatsoever. The connection can be reused to send multiple messages as well. I'm using plain Java Socket connections. The relevant code is:
Socket socket = socket(); // returns a reused open socket, or a new one
socket.getOutputStream().write(m.marshall());
socket.getOutputStream().flush();
logger.debug("Message \"{}\" sent", m);
In some cases, if a connection is dropped while a message is being sent, or right before, Socket.getOutputStream().write() still finishes successfully. I expect this is because the TCP window isn't exhausted yet.
Is there a way I can tell for sure whether a packet actually got onto the network or not? I experimented with the following two solutions:
Insert an additional socket.getInputStream().read() operation with a 250 ms timeout. This forces a read operation that fails when the connection has been dropped, but otherwise blocks for 250 ms.
Set the TCP send buffer size (e.g. with Socket.setSendBufferSize()) to the message's binary size.
Both methods work, but they significantly degrade the quality of service; throughput goes from 100 messages/second to about 10 messages/second at most.
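For reference, the first workaround looks roughly like this (a sketch, with the 250 ms figure taken from the description above):
import java.io.IOException;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ConnectionProbe {
    // Throws if the peer has closed the connection; returns after at most 250 ms otherwise.
    public static void probe(Socket socket) throws IOException {
        socket.setSoTimeout(250);
        try {
            if (socket.getInputStream().read() == -1) {
                throw new IOException("connection closed by peer");
            }
        } catch (SocketTimeoutException e) {
            // No data within 250 ms: connection presumed alive (this wait is the throughput cost)
        }
    }
}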
Any suggestions?
UPDATE:
Challenged by multiple answers questioning the possibility of the described behaviour, I constructed "unit" tests of what I'm describing. Check out the unit cases at Gist 273786.
Both unit tests have two threads, a server and a client. The server closes while the client is sending data, without an IOException being thrown anyway. Here is the main method:
public static void main(String[] args) throws Throwable {
    final int PORT = 8005;
    final int FIRST_BUF_SIZE = 5;
    final Throwable[] errors = new Throwable[1];
    final Semaphore serverClosing = new Semaphore(0);
    final Semaphore messageFlushed = new Semaphore(0);

    class ServerThread extends Thread {
        public void run() {
            try {
                ServerSocket ssocket = new ServerSocket(PORT);
                Socket socket = ssocket.accept();
                InputStream s = socket.getInputStream();
                s.read(new byte[FIRST_BUF_SIZE]);
                messageFlushed.acquire();
                socket.close();
                ssocket.close();
                System.out.println("Closed socket");
                serverClosing.release();
            } catch (Throwable e) {
                errors[0] = e;
            }
        }
    }

    class ClientThread extends Thread {
        public void run() {
            try {
                Socket socket = new Socket("localhost", PORT);
                OutputStream st = socket.getOutputStream();
                st.write(new byte[FIRST_BUF_SIZE]);
                st.flush();
                messageFlushed.release();
                serverClosing.acquire(1);
                System.out.println("writing new packets");
                // sending more packets while the server has already
                // closed the connection
                st.write(32);
                st.flush();
                st.close();
                System.out.println("Finished writing");
            } catch (Throwable e) {
                errors[0] = e;
            }
        }
    }

    Thread thread1 = new ServerThread();
    Thread thread2 = new ClientThread();
    thread1.start();
    thread2.start();
    thread1.join();
    thread2.join();
    if (errors[0] != null)
        throw errors[0];
    System.out.println("Run without any errors");
}
[Incidentally, I also have a concurrency testing library that makes the setup a bit better and clearer. Check out the sample at the gist as well.]
When run I get the following output:
Closed socket
writing new packets
Finished writing
Run without any errors
This may not be of much help to you, but technically both of your proposed solutions are incorrect. OutputStream.flush() and whatever other API calls you can think of are not going to do what you need.
The only portable and reliable way to determine whether a packet has been received by the peer is to wait for a confirmation from the peer. This confirmation can either be an actual response or a graceful socket shutdown. End of story: there really is no other way, and this is not Java-specific; it is fundamental network programming.
If this is not a persistent connection - that is, if you just send something and then close the connection - the way you do it is to catch all IOExceptions (any of them indicates an error) and perform a graceful socket shutdown:
1. socket.shutdownOutput();
2. wait for inputStream.read() to return -1, indicating the peer has also shutdown its socket
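A minimal sketch of that two-step shutdown:
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;

public class GracefulClose {
    public static void closeGracefully(Socket socket) throws IOException {
        socket.shutdownOutput();                 // step 1: send FIN, "no more data from me"
        InputStream in = socket.getInputStream();
        byte[] buf = new byte[512];
        while (in.read(buf) != -1) {
            // step 2: drain until read() returns -1, i.e. the peer has also shut down
        }
        socket.close();
    }
}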
After much trouble with dropped connections, I moved my code to use the enhanced notification format, which pretty much means changing your packet to the enhanced layout (command byte 1, plus an identifier and an expiry before the token and payload).
This way Apple will not simply drop the connection if an error happens, but will write a feedback code to the socket first.
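From memory of Apple's old binary interface, the enhanced packet looks roughly like this (a sketch; verify the layout against Apple's documentation before relying on it):
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class EnhancedPacket {
    public static byte[] build(int identifier, int expiry, byte[] token, byte[] payload)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(1);               // command 1 = enhanced notification
        out.writeInt(identifier);       // echoed back in Apple's error response
        out.writeInt(expiry);           // UNIX timestamp after which APNs discards the message
        out.writeShort(token.length);   // device token length (32 bytes)
        out.write(token);
        out.writeShort(payload.length); // JSON payload length
        out.write(payload);
        out.flush();
        return bytes.toByteArray();
    }
}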
If you're sending information to Apple using the TCP/IP protocol, you have to be receiving acknowledgements. However, you stated:
Apple doesn't return any acknowledgement whatsoever
What do you mean by this? TCP/IP guarantees delivery, therefore the receiver MUST acknowledge receipt. It does not guarantee when the delivery will take place, however.
If you send a notification to Apple and you break your connection before receiving the ACK, there is no way to tell whether you were successful or not, so you simply must send it again. If pushing the same information twice is a problem, or is not handled properly by the device, then the solution is to fix the device's handling of duplicate push notifications: there's nothing you can do on the pushing side.
Comment Clarification/Question
OK. The first part of what you understand is your answer to the second part. Only the packets that have received ACKs have been sent and received properly. I'm sure we could think of some very complicated scheme for keeping track of each individual packet ourselves, but TCP is supposed to abstract this layer away and handle it for you. On your end you simply have to deal with the multitude of failures that could occur (in Java, if any of these occur, an exception is raised). If there is no exception, the data you just tried to send is guaranteed to have been sent by the TCP/IP protocol.
Is there a situation where data is seemingly "sent" but not guaranteed to be received, with no exception raised? The answer should be no.
Examples
Nice examples; this clarifies things quite a bit. I would have thought an error would be thrown. In the example posted, an error is thrown on the second write, but not the first. This is interesting behavior... and I wasn't able to find much information explaining why it behaves like this. It does, however, explain why we must develop our own application-level protocols to verify delivery.
It looks like you are correct that without a protocol for confirmation there is no guarantee the Apple device will receive the notification. Apple also only queues the last message. Looking a little at the service, I was able to determine that it is more of a convenience for the customer, cannot be used to guarantee delivery, and must be combined with other methods. I read this in the following source.
http://blog.boxedice.com/2009/07/10/how-to-build-an-apple-push-notification-provider-server-tutorial/
So it seems the answer is no, you cannot tell for sure. You may be able to use a packet sniffer like Wireshark to tell whether it was sent, but this still won't guarantee it was received and delivered to the device, due to the nature of the service.

java/groovy socket write timeout

I have a simple, badly behaved server (written in Groovy):
ServerSocket ss = new ServerSocket(8889);
Socket s = ss.accept()
Thread.sleep(1000000)
And a client that I want to have time out (since the server is not consuming its input):
Socket s = new Socket("192.168.0.106", 8889)
s.setSoTimeout(100);
s.getOutputStream().write( new byte[1000000] );
However, this client blocks forever. How do I get the client to time out?
THANKS!!
You could spawn the client in its own thread and spin-lock/wait(long timeout) on it to return, possibly using a Future object to get the return value if the socket write is successful.
I do believe that the SO_TIMEOUT setting for a Socket only affects the read(..) calls on the socket, not the writes.
You might try using a SocketChannel (rather than a Stream) and spawn another thread that also has a handle to that channel. The other thread can asynchronously close the channel after a certain timeout if it is blocked.
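A sketch of that idea (the address and 100 ms timeout mirror the question; the watchdog mechanics are an assumption): closing the channel from the timer thread makes the blocked write fail with an AsynchronousCloseException instead of hanging forever:
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.util.Timer;
import java.util.TimerTask;

public class WriteWatchdog {
    public static void main(String[] args) throws Exception {
        SocketChannel channel = SocketChannel.open(new InetSocketAddress("192.168.0.106", 8889));
        Timer watchdog = new Timer(true);
        watchdog.schedule(new TimerTask() {
            @Override
            public void run() {
                try { channel.close(); } catch (Exception ignored) { } // unblocks the stuck writer
            }
        }, 100); // timeout in milliseconds
        channel.write(ByteBuffer.wrap(new byte[1000000])); // fails with AsynchronousCloseException on timeout
        watchdog.cancel(); // write finished in time; disarm the watchdog
    }
}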
The socket timeout is at the TCP level, not at the application level. The source machine's TCP is buffering the data to be sent and the target machine's network stack is acknowledging the data received, so there's no timeout. Also, different TCP/IP implementations handle these timeouts differently. Take a look at what's going on on the wire with tcpdump (or Wireshark if you are so unfortunate :). What you need is an application-level ACK, i.e. you need to define the protocol between the client and the server. I can't comment on Java packages (you probably want to look at nio), but a receive timeout on that ACK would usually be handled with poll/select.
There is no way to get a write timeout directly, but you can always spawn a thread that closes the connection if the write hasn't finished in time.
