What's the most appropriate way to detect if a socket has been dropped or not? Or whether a packet did actually get sent?
I have a library for sending Apple Push Notifications to iPhones through the Apple gateways (available on GitHub). Clients need to open a socket and send a binary representation of each message, but unfortunately Apple doesn't return any acknowledgement whatsoever. The connection can also be reused to send multiple messages. I'm using plain Java Socket connections. The relevant code is:
Socket socket = socket(); // returns a reused open socket, or a new one
socket.getOutputStream().write(m.marshall());
socket.getOutputStream().flush();
logger.debug("Message \"{}\" sent", m);
In some cases, if the connection is dropped while a message is being sent, or right before, Socket.getOutputStream().write() still finishes successfully. I expect it's because the TCP window isn't exhausted yet.
Is there a way I can tell for sure whether a packet actually got onto the network or not? I experimented with the following two solutions:
Insert an additional socket.getInputStream().read() operation with a 250 ms timeout. This forces a read that fails when the connection has been dropped, but otherwise blocks for 250 ms.
Set the TCP send buffer size (e.g. via Socket.setSendBufferSize()) to the binary size of the message.
Both methods work, but they significantly degrade quality of service; throughput drops from about 100 messages/second to at most 10 messages/second.
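For reference, a minimal sketch of the first probe; the helper name is mine, and the 250 ms window matches the description above:

import java.io.IOException;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical probe: try a short read to see whether the peer has
// closed the connection. Note that a successful read consumes a byte.
static boolean connectionLooksAlive(Socket socket) throws IOException {
    int oldTimeout = socket.getSoTimeout();
    try {
        socket.setSoTimeout(250);               // probe window of 250 ms
        int b = socket.getInputStream().read(); // -1 means the peer closed the stream
        return b != -1;
    } catch (SocketTimeoutException e) {
        return true;                            // no data within 250 ms: likely still open
    } finally {
        socket.setSoTimeout(oldTimeout);        // restore the previous timeout
    }
}

The sketch also makes the throughput cost obvious: every message pays for up to a full 250 ms probe window.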
Any suggestions?
UPDATE:
Challenged by multiple answers questioning whether the described behaviour is possible, I constructed "unit" tests that reproduce it. Check out the unit cases at Gist 273786.
Both unit tests have two threads, a server and a client. The server closes its socket while the client is sending data, and no IOException is thrown. Here is the main method:
public static void main(String[] args) throws Throwable {
    final int PORT = 8005;
    final int FIRST_BUF_SIZE = 5;
    final Throwable[] errors = new Throwable[1];
    final Semaphore serverClosing = new Semaphore(0);
    final Semaphore messageFlushed = new Semaphore(0);

    class ServerThread extends Thread {
        public void run() {
            try {
                ServerSocket ssocket = new ServerSocket(PORT);
                Socket socket = ssocket.accept();
                InputStream s = socket.getInputStream();
                s.read(new byte[FIRST_BUF_SIZE]);
                messageFlushed.acquire();
                socket.close();
                ssocket.close();
                System.out.println("Closed socket");
                serverClosing.release();
            } catch (Throwable e) {
                errors[0] = e;
            }
        }
    }

    class ClientThread extends Thread {
        public void run() {
            try {
                Socket socket = new Socket("localhost", PORT);
                OutputStream st = socket.getOutputStream();
                st.write(new byte[FIRST_BUF_SIZE]);
                st.flush();
                messageFlushed.release();
                serverClosing.acquire(1);
                System.out.println("writing new packets");
                // sending more packets while the server has already closed the connection
                st.write(32);
                st.flush();
                st.close();
                System.out.println("Finished writing");
            } catch (Throwable e) {
                errors[0] = e;
            }
        }
    }

    Thread thread1 = new ServerThread();
    Thread thread2 = new ClientThread();
    thread1.start();
    thread2.start();
    thread1.join();
    thread2.join();
    if (errors[0] != null)
        throw errors[0];
    System.out.println("Run without any errors");
}
[Incidentally, I also have a concurrency testing library that makes the setup a bit better and clearer. Check out the sample at the gist as well.]
When run I get the following output:
Closed socket
writing new packets
Finished writing
Run without any errors
This may not be of much help to you, but technically both of your proposed solutions are incorrect. OutputStream.flush() and whatever other API calls you can think of are not going to do what you need.
The only portable and reliable way to determine whether a packet has been received by the peer is to wait for a confirmation from the peer. This confirmation can either be an actual response or a graceful socket shutdown. End of story: there really is no other way, and this is not Java-specific; it is fundamental network programming.
If this is not a persistent connection, that is, if you just send something and then close the connection, the way you do it is to catch all IOExceptions (any of them indicates an error) and perform a graceful socket shutdown:
1. socket.shutdownOutput();
2. wait for inputStream.read() to return -1, indicating that the peer has also shut down its socket
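A minimal sketch of that shutdown sequence, assuming a connected java.net.Socket (the helper name is mine):

import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;

// Half-close our side, then drain until the peer closes theirs.
// Any IOException along the way indicates an error.
static void gracefulClose(Socket socket) throws IOException {
    socket.shutdownOutput();                 // send FIN; we will write no more
    InputStream in = socket.getInputStream();
    byte[] buf = new byte[512];
    while (in.read(buf) != -1) {
        // discard remaining data until read() returns -1,
        // i.e. the peer has shut down its side too
    }
    socket.close();
}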
After much trouble with dropped connections, I moved my code to use the enhanced notification format, which pretty much means changing your packet to the enhanced layout.
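A sketch of that enhanced (command byte 1) frame built with java.nio.ByteBuffer; the field widths follow Apple's legacy binary interface, and the method name is mine:

import java.nio.ByteBuffer;

// Legacy "enhanced" notification frame: command byte 1, plus an
// identifier and expiry that the plain (command 0) format lacks.
static byte[] enhancedFrame(int identifier, int expiryEpochSeconds,
                            byte[] deviceToken, byte[] payload) {
    ByteBuffer buf = ByteBuffer.allocate(
            1 + 4 + 4 + 2 + deviceToken.length + 2 + payload.length);
    buf.put((byte) 1);                        // command: 1 = enhanced format
    buf.putInt(identifier);                   // echoed back in error responses
    buf.putInt(expiryEpochSeconds);           // expiry, seconds since the epoch
    buf.putShort((short) deviceToken.length); // token length (32)
    buf.put(deviceToken);
    buf.putShort((short) payload.length);     // payload length
    buf.put(payload);
    return buf.array();
}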
This way Apple will not drop a connection if an error happens, but will write a feedback code to the socket.
If you're sending information to Apple using the TCP/IP protocol, you have to be receiving acknowledgements. However, you stated:
Apple doesn't return any acknowledgement whatsoever
What do you mean by this? TCP/IP guarantees delivery, therefore the receiver MUST acknowledge receipt. It does not guarantee when the delivery will take place, however.
If you send a notification to Apple and you break your connection before receiving the ACK, there is no way to tell whether you were successful or not, so you simply must send it again. If pushing the same information twice is a problem, or is not handled properly by the device, then there is a problem. The solution is to fix the device's handling of duplicate push notifications; there is nothing you can do on the pushing side.
Comment Clarification/Question
OK. The first part of what you understand is your answer to the second part. Only packets that have been ACKed have been sent and received properly. I'm sure we could think of some very complicated scheme for keeping track of each individual packet ourselves, but TCP is supposed to abstract this layer away and handle it for you. On your end you simply have to deal with the multitude of failures that can occur (in Java, if any of these occur, an exception is raised). If there is no exception, delivery of the data you just tried to send is guaranteed by the TCP/IP protocol.
Is there a situation where data is seemingly "sent" but not guaranteed to be received, and yet no exception is raised? The answer should be no.
Examples
Nice examples; this clarifies things quite a bit. I would have thought an error would be thrown. In the example posted, an error is thrown on the second write, but not the first. This is interesting behaviour, and I wasn't able to find much information explaining why it behaves like this. It does, however, explain why we must develop our own application-level protocols to verify delivery.
It looks like you are correct: without a protocol for confirmation, there is no guarantee the Apple device will receive the notification. Apple also only queues the last message. Looking a little at the service, I was able to determine that it is more of a convenience for the customer; it cannot be relied on to guarantee delivery and must be combined with other methods. I read this in the following source:
http://blog.boxedice.com/2009/07/10/how-to-build-an-apple-push-notification-provider-server-tutorial/
It seems the answer is no: you cannot tell for sure. You may be able to use a packet sniffer like Wireshark to tell whether a packet was sent, but this still won't guarantee it was received and forwarded to the device, given the nature of the service.
Related
I am using java.net.DatagramSocket to send UDP packets to a statsd server from a Google App Engine servlet. This generally works; however, we periodically see the following exception:
IOException - Socket is closed: Unknown socket_descriptor..
When these IOExceptions occur, calling DatagramSocket.isClosed() returns false.
This issue happens frequently enough that it is concerning, and although I've put some workarounds in place (allocate a new socket and use a DeferredTask queue to retry), it would be good to understand the underlying reason for these errors.
The Google docs mention, "Sockets may be reclaimed after 2 minutes of inactivity; any socket operation keeps the socket alive for a further 2 minutes." It is unclear to me how this would play into UDP datagrams; however, one suspicion I have is that this is related to GAE instance lifecycle in some way.
My code (sanitized and extracted) looks like:
DatagramSocket _socket;

void init() {
    _socket = new DatagramSocket();
}

void send() {
    DatagramPacket packet = new DatagramPacket(<BYTES>, <LENGTH>, <HOST>, <PORT>);
    _socket.send(packet);
}
Appreciate any feedback on this!
The approach taken to work around this issue was simply to manage a single static DatagramSocket instance via a couple of helper methods: releaseSocket() discards a socket after it throws an IOException, and getSocket() allocates a new one on the next access. Not shown in this code is the retry logic for the failed socket.send(). Under load testing, this seems to work reliably.
try {
    DatagramPacket packet = new DatagramPacket(<BYTES>, <LENGTH>, <HOST>, <PORT>);
    getSocket().send(packet);
} catch (IOException ioe) {
    releaseSocket();
}
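The helpers themselves are not shown above; a minimal sketch of what hypothetical getSocket()/releaseSocket() implementations might look like:

import java.net.DatagramSocket;
import java.net.SocketException;

// Illustrative helpers managing one shared socket: release it after an
// IOException, lazily reallocate on the next access. Synchronization
// keeps allocation and release consistent across servlet threads.
private static DatagramSocket socket;

static synchronized DatagramSocket getSocket() throws SocketException {
    if (socket == null) {
        socket = new DatagramSocket(); // reallocate after a release
    }
    return socket;
}

static synchronized void releaseSocket() {
    if (socket != null) {
        socket.close(); // discard the socket that threw
        socket = null;  // force reallocation on next getSocket()
    }
}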
I'm using this program to test a PULL socket with a ROUTER. I create and bind a ROUTER, then connect a PULL socket with an identity to it; the ROUTER then sends a message addressed specifically to that client using its identity (basic ZeroMQ enveloping).
Test Program
public static void main(String[] o) {
    ZContext routerCtx = new ZContext();
    Socket rtr = routerCtx.createSocket(ZMQ.ROUTER);
    rtr.setRouterMandatory(true);
    rtr.bind("tcp://*:5500");

    ZContext clientCtx = new ZContext();
    Socket client1 = clientCtx.createSocket(ZMQ.PULL);
    client1.setIdentity("client1".getBytes());
    client1.connect("tcp://localhost:5500");

    try {
        //Thread.currentThread().sleep(2000);
        rtr.sendMore("client1");
        rtr.sendMore("");
        rtr.send("Hello!");
        System.out.println(client1.recvStr());
        System.out.println("Client Received: " + client1.recvStr());
    } catch (Exception e1) {
        System.out.println("Could not send to client1: " + e1.getMessage());
    }
    routerCtx.destroy();
    clientCtx.destroy();
}
Results
The expected result is to print "Client Received: Hello!", but instead the ROUTER throws an exception consistent with an unaddressable message. I'm using setRouterMandatory(true) to throw that exception under such circumstances; however, the client explicitly sets an identity and the server sends to that identity, so I don't understand why the exception is raised.
Temporary Fix
If I add a slight delay by uncommenting Thread.currentThread().sleep(2000);, the message is delivered successfully. But I despise using sleeps and waits; they create messy and brittle code, and more importantly, they don't answer the "why?"
Questions
Why is this happening? It was my understanding that "late joining" applied only to PUB/SUB sockets.
Is PULL with ROUTER an invalid socket combination? I'm using it for a chat program, and aside from this issue, it works great.
Why is this happening?
You have a race condition. The client1.connect call starts the connection process, but there is no guarantee the connection has actually been established by the time you call rtr.sendMore("client1"). Your sleep() workaround pretty much proves this.
Changing PULL to DEALER is a step in the right direction, because a DEALER can both send and receive. To avoid the need for sleeps and waits, you have to change your protocol. A simple change to the code above would be to have the DEALER connect and then immediately send a "HELLO" message to the ROUTER (it could be just an empty message). The router code must be redesigned so that it does nothing until it receives a HELLO message from the DEALER. Once you have received the HELLO message, you know the connection is successfully established and you can safely send your chat messages, as in the sketch below.
Also, this protocol eliminates the need for your router to know the client ID in advance; instead you can extract it from the HELLO message. A message from a DEALER to a ROUTER is guaranteed to be a multi-part message whose first part is the client ID.
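A minimal sketch of that handshake, reusing the jeromq-style API from the question (the "HELLO" framing is illustrative):

import org.zeromq.ZContext;
import org.zeromq.ZMQ;
import org.zeromq.ZMQ.Socket;

public class HelloHandshake {
    public static void main(String[] o) {
        ZContext ctx = new ZContext();
        Socket rtr = ctx.createSocket(ZMQ.ROUTER);
        rtr.bind("tcp://*:5500");

        Socket client1 = ctx.createSocket(ZMQ.DEALER); // DEALER can send as well as receive
        client1.setIdentity("client1".getBytes());
        client1.connect("tcp://localhost:5500");
        client1.send("HELLO");          // announce ourselves; queued until connected

        String id = rtr.recvStr();      // ROUTER receives [identity, body]
        rtr.recvStr();                  // discard the "HELLO" body
        rtr.sendMore(id);               // safe now: the connection provably exists
        rtr.send("Hello!");

        System.out.println("Client Received: " + client1.recvStr());
        ctx.destroy();
    }
}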
I have the following socket server code that reads a stream from a connected Socket.
try {
    ObjectInputStream in = new ObjectInputStream(client.getInputStream());
    int count = 10;
    while (count > 0) {
        String msg = in.readObject().toString(); // blocks here forever if this client is lost
        System.out.println("Client Says : " + msg);
        count--;
    }
    in.close();
    client.close();
} catch (Exception ex) {
    ex.printStackTrace();
}
And I have a client program that connects to this server and sends some string every second, ten times; the server reads from the socket ten times and prints each message. But if I kill the client program partway through, the server freezes instead of throwing an exception or anything.
How can I detect this condition, and make the loop iterate indefinitely, printing whatever the client sends for as long as the connection is active and stable?
The problem is that the server side of the socket has no way of knowing that the client connection closed because the client code terminates without calling .close() on the client side of the socket, and therefore never sends the TCP FIN signal.
One possible way of fixing this would be to create a watcher thread that periodically inspects the socket to see whether it is still active. The problem with that approach is that isConnected() on the Socket will not work, for the same reason stated above, so the only real way to inspect the connection is to attempt to write to it. However, this may cause random garbage to be sent to a potentially listening client.
Another option would be to implement some kind of keep-alive protocol that the client agrees to (i.e., send keep-alive bits every so often so the watcher has something to look for). You could also move to the java.nio approach, which I believe does a better job of dealing with these conditions.
This thread is old, but provides more detail: http://www.velocityreviews.com/forums/t541628-sockets-checking-for-dropped-connections-and-close.html.
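One concrete variant of the keep-alive idea (not spelled out in the answer above): agree that the client sends something at least every few seconds, set a read timeout on the server, and treat a timeout as a dead peer. A sketch under those assumptions:

import java.io.ObjectInputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Sketch: if the client has agreed to send at least every 5 s,
// a 10 s read timeout lets the server detect a silently dead peer
// instead of blocking in readObject() forever.
static void readLoop(Socket client) throws Exception {
    client.setSoTimeout(10_000); // reads now throw instead of blocking forever
    ObjectInputStream in = new ObjectInputStream(client.getInputStream());
    while (true) {
        try {
            System.out.println("Client Says : " + in.readObject());
        } catch (SocketTimeoutException e) {
            System.out.println("No data for 10 s - assuming the client is gone");
            client.close();
            return;
        }
    }
}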
I have encountered a problem with socket communication on a Linux system. The communication process is as follows: the client sends a message asking the server to perform a compute task, then waits for the result message from the server after the task completes.
But the client hangs waiting for the result message if the task takes a long time, such as about 40 minutes, even though on the server side the result message has been written to the socket in response. The client receives the result normally if the task takes little time, such as one minute. Additionally, this problem only happens in the customer environment; communication behaves normally in our testing environment.
I suspected the cause was that the default socket timeout values differ between the customer environment and the testing environment, but the following values are identical in both environments, on both client and server:
getSoTimeout:0
getReceiveBufferSize:43690
getSendBufferSize:8192
getSoLinger:-1
getTrafficClass:0
getKeepAlive:false
getTcpNoDelay:false
The code on the client looks like this:
Message msg = null;
ObjectInputStream in = client.getClient().getInputStream();
// if there is no message, readObject() will hang here
while (true) {
    try {
        Object recObject = in.readObject();
        System.out.println("Client received msg.");
        msg = (Message) recObject;
        return msg;
    } catch (Exception e) {
        e.printStackTrace();
        return null;
    }
}
The code on the server looks like this:
ObjectOutputStream socketOutStream = getSocketOutputStream();
try {
    MessageJobComplete msgJobComplete = new MessageJobComplete(reportFile, outputFile);
    socketOutStream.writeObject(msgJobComplete);
} catch (Exception e) {
    e.printStackTrace();
}
To solve this problem, I added flush() and reset() calls, but the problem still exists:
ObjectOutputStream socketOutStream = getSocketOutputStream();
try {
    MessageJobComplete msgJobComplete = new MessageJobComplete(reportFile, outputFile);
    socketOutStream.flush();
    logger.debug("AbstractJob#reply to the socket");
    socketOutStream.writeObject(msgJobComplete);
    socketOutStream.reset();
    socketOutStream.flush();
    logger.debug("AbstractJob#after Flush Reply");
} catch (Exception e) {
    e.printStackTrace();
    logger.error("Exception when sending MessageJobComplete." + e.getMessage());
}
So does anyone know what next steps I should take to solve this problem?
I guess the cause is an environment setting, but I do not know which environment factors would affect the socket communication.
The sockets communicate over TCP/IP, and the problem is related to long-running tasks, so which TCP parameters could affect the timeout of socket communication?
After analyzing the logs, I found that after the messages were written to the socket, no exceptions were thrown or caught. But always after 15 minutes, exceptions appear in the ObjectInputStream.readObject() code on the server side, which is used to accept requests from the client. However, the socket.getSoTimeout value is 0, so it is very strange that a timed-out exception was thrown.
{2012-01-09 17:44:13,908} ERROR java.net.SocketException: Connection timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:146)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:312)
at sun.security.ssl.InputRecord.read(InputRecord.java:350)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:809)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:766)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:94)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:69)
at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2265)
at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2558)
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2568)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1314)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
So why are the "Connection timed out" exceptions thrown?
This problem is solved. Using tcpdump to capture the message flows, I found that while at the application level the ObjectOutputStream.writeObject() method was being invoked, at the TCP level many [TCP Retransmission] frames appeared.
So I concluded that the connection was probably dead, even though the netstat -an command still showed the TCP connection state as ESTABLISHED.
So I wrote a test application that periodically sends test messages as heartbeats from the server. The problem then disappeared.
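For illustration, a minimal sketch of such a server-side heartbeat, assuming the server already holds a shared ObjectOutputStream; the names and the one-minute interval are mine:

import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Write a small heartbeat message periodically so that a dead connection
// surfaces as an IOException instead of silent TCP retransmissions.
static void startHeartbeat(ObjectOutputStream out) {
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
    ses.scheduleAtFixedRate(() -> {
        try {
            synchronized (out) {            // don't interleave with result messages
                out.writeObject("HEARTBEAT");
                out.flush();
            }
        } catch (IOException e) {
            System.err.println("Peer appears dead: " + e.getMessage());
            ses.shutdown();
        }
    }, 1, 1, TimeUnit.MINUTES);
}

The client side would correspondingly read and discard these heartbeat messages while waiting for the real result.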
The read() methods of java.io.InputStream are blocking calls, which means they wait "forever" if they are called when there is no data in the stream to read.
If the server does not respond, this is completely expected behaviour, per the published contract in the Javadoc.
If you want a non-blocking read, use the java.nio.* classes.
Is there a way to have reliable communication (where the sender is informed that the message it sent has actually been received by the receiver) using the Java TCP/IP library in java.net.*? I understand that one of the advantages of TCP over UDP is its reliability. Yet I couldn't get that assurance in the experiment below:
I created two classes:
1) an echo server => always sends back the data it receives.
2) a client => periodically sends a "Hello world" message to the echo server.
They were run on different computers (and worked perfectly). In the middle of the execution, I disconnected the network (unplugged the LAN cable). After the disconnection, the server kept waiting for data for a few seconds (then eventually raised an exception). Similarly, the client kept sending data for a few seconds (then an exception was raised).
The problem is that objectOutputStream.writeObject(message) doesn't report the delivery status of the message (I expected it to block the thread and keep resending the data until delivered), or at least inform me which messages went missing.
Server Code:
import java.net.*;
import java.io.*;
import java.io.Serializable;
public class SimpleServer {
public static void main(String args[]) {
try {
ServerSocket serverSocket = new ServerSocket(2002);
Socket socket = new Socket();
socket = serverSocket.accept();
InputStream inputStream = socket.getInputStream();
ObjectInputStream objectInputStream = new ObjectInputStream(
inputStream);
while (true) {
try {
String message = (String) objectInputStream.readObject();
System.out.println(message);
Thread.sleep(1000);
} catch (Exception ex) {
ex.printStackTrace();
}
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
Client code:
import java.net.*;
import java.io.*;

public class SimpleClient {
    public static void main(String args[]) {
        try {
            String serverIpAddress = "localhost"; // change this
            Socket socket = new Socket(serverIpAddress, 2002);
            OutputStream outputStream = socket.getOutputStream();
            ObjectOutputStream objectOutputStream = new ObjectOutputStream(outputStream);
            while (true) {
                String message = "Hello world!";
                objectOutputStream.writeObject(message);
                System.out.println(message);
                Thread.sleep(1000);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
If you need to know which messages have arrived in the peer application, the peer application has to send acknowledgements.
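For illustration, a minimal sketch of such an acknowledgement round-trip fitted to the code above; the "ACK:<n>" convention and the method name are mine, not part of java.net:

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Send a numbered message, then block until the peer confirms it.
// Assumes the peer replies "ACK:<n>" after reading message n.
static void sendReliably(ObjectOutputStream out, ObjectInputStream in,
                         int seq, String message) throws Exception {
    out.writeObject(seq + ":" + message);
    out.flush();
    String ack = (String) in.readObject();  // blocks until the peer confirms
    if (!("ACK:" + seq).equals(ack)) {
        throw new IOException("Message " + seq + " not acknowledged: " + ack);
    }
}

The echo server would correspondingly write back "ACK:" + seq after each successful readObject().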
If you want this level of guarantee, it sounds like you really want JMS. It can ensure not only that messages have been delivered but also that they have been processed correctly; i.e., there is no point in having very reliable delivery if the message can be discarded due to a bug.
You can monitor which messages are waiting and which consumers are falling behind. You can watch a producer to see what messages it is sending, and have messages saved while a consumer is down, available again when it restarts; i.e., reliable delivery even if the consumer is restarted.
TCP is always reliable. You don't need confirmations. However, to check that a client is up, you might also want to use a UDP stream with confirmations, like a PING/PONG system. There may also be TCP settings you can adjust.
Your base assumption (and understanding of TCP) here is wrong. If you unplug and then re-plug, the message most likely will not be lost.
It boils down to how long you want the sender to wait. One hour, one day? If you made the timeout one day, you could unplug for two days and still say "it does not work".
So the delivery guarantee is "either the data is delivered, or you get informed". In the second case you need to handle it at the application level.
You could consider using the SO_KEEPALIVE socket option, which will cause the connection to be closed if no data is transmitted over the socket for two hours (the typical OS default). However, in many cases this obviously doesn't offer the level of control applications typically need.
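Enabling it in java.net is a one-liner (host and port here are placeholders); note that the idle interval itself is an OS-level setting, not something Java exposes:

import java.net.Socket;

Socket socket = new Socket(host, port);
socket.setKeepAlive(true); // enable OS-level TCP keep-alive probes on this connection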
A second problem is that some TCP/IP stack implementations are poor and can leave your server with dangling open connections in the event of a network outage.
Therefore, I'd advise adding application-level heartbeating between your client and server to ensure that both parties are still alive. This also offers the advantage of severing the connection if, for example, a third-party client remains alive but becomes unresponsive and hence stops sending heartbeats.