I just upgraded projectreactor.io from reactor OLD: [core: 3.0.1.RELEASE, netty: 0.5.2.RELEASE] to reactor NEW [core: 3.0.4.RELEASE, netty: 0.6.0.RELEASE].
I open a TcpClient connection and want to close it later.
In the OLD version I used
tcpClient.shutdown();
to disconnect my client from the server.
Is there an equivalent call in the NEW version? I could not find one!
I tried the following on both the NettyInbound and NettyOutbound that I get while creating my TcpClient with tcpClient.newHandler(...):
.context().dispose()
.context().channel().disconnect()
.context().channel().close()
TcpResources.reset()
None of them seem to do the job correctly.
I noticed that the respective .context().onClose(...) callback is invoked.
However, when the server side checks the connections after some additional waiting, the results differ between the OLD and NEW clients.
The server side is plain NIO2, not reactor/netty; while the client was upgraded, the server side remained unchanged.
With the OLD client I got .isOpen() == false for every channel on the server side.
With the NEW client I get .isOpen() == true for every channel on the server side. Most of the time I can even write to the channel, and some channels only switch to .isOpen() == false after a few bytes have been written.
I think this deserves an issue, especially if channel().close() and reset() didn't work.
Otherwise it might be due to the default pooling; TcpClient.create(opts -> opts.disablePool()) might help. Let us know, and if you have a chance to post an issue on http://github.com/reactor/reactor-netty you would be a hero :D
This is linked to the open issue https://github.com/reactor/reactor-netty/issues/15; we will review the dispose API.
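For what it's worth, a minimal sketch of that suggestion, assuming the 0.6.x ClientOptions API (the handler here is only a placeholder; host/port configuration is omitted):
// Sketch only: create the client without the shared connection pool, keep the
// NettyContext returned by newHandler(...), and dispose it to disconnect.
TcpClient tcpClient = TcpClient.create(opts -> opts.disablePool());

NettyContext ctx = tcpClient
        .newHandler((in, out) -> out.sendString(Mono.just("ping")))  // placeholder handler
        .block();

// ... later, when the connection is no longer needed ...
ctx.dispose();   // with pooling disabled this should actually close the channel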
The following code somehow destroys the channel, but not completely.
ChannelFuture f = nettyContext.channel().close();  // ask Netty to close the channel
f.sync();                                          // wait for the close to complete
nettyContext.dispose();                            // then dispose the NettyContext
The problem is that the channel still seems to be open on the server side.
For a NIO2-based server, there is no point in testing the channel with isOpen(): it always returns true.
As a dirty workaround, the server can write to the channel twice; if it catches an ExecutionException on the second write, the channel was already closed by the Netty TcpClient.
try {
    channel.write(ByteBuffer.wrap("hello".getBytes())).get();
    // The first write succeeds even when the client has already disconnected;
    // only the second write fails with an ExecutionException.
    channel.write(ByteBuffer.wrap("bye".getBytes())).get();
} catch (ExecutionException e) {
    LOG.log(Level.SEVERE, "ExecutionException on writing from server into channel", e);
}
With reactor-core 3.1.0.M3 and reactor-netty 0.7.0.M1 the client API was improved and works more reliably.
After blockingNettyContext.shutdown() I still need the following workaround on the server side to make sure the channel was closed:
I write into the channel and close it on exception:
// channel.isOpen() == true
try {
    channel.write(ByteBuffer.wrap("__test__".getBytes())).get();
} catch (ExecutionException e) {
    channel.close();   // the write failed, so the client has already disconnected
}
// channel.isOpen() == false
Related
I'm having a problem with one of my servers; on Friday morning I got the following IOException:
11/Sep/2015 01:51:39,524 [ERROR] [Thread-1] - ServerRunnable: IOException:
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) ~[?:1.7.0_75]
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:241) ~[?:1.7.0_75]
at com.watersprint.deviceapi.server.ServerRunnable.acceptConnection(ServerRunnable.java:162) [rsrc:./:?]
at com.watersprint.deviceapi.server.ServerRunnable.run(ServerRunnable.java:121) [rsrc:./:?]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_75]
Line 162 of the ServerRunnable class is in the method below; it's the ssc.accept() call.
private void acceptConnection(Selector selector, SelectionKey key) {
try {
ServerSocketChannel ssc = (ServerSocketChannel) key.channel();
SocketChannel sc = ssc.accept();
socketConnectionCount++;
/*
* Test to force device error, for debugging purposes
*/
if (brokenSocket
&& (socketConnectionCount % brokenSocketNumber == 0)) {
sc.close();
} else {
sc.configureBlocking(false);
log.debug("*************************************************");
log.debug("Selector Thread: Client accepted from "
+ sc.getRemoteAddress());
SelectionKey newKey = sc.register(selector,
SelectionKey.OP_READ);
ClientStateMachine clientState = new ClientStateMachine();
clientState.setIpAddress(sc.getRemoteAddress().toString());
clientState.attachSelector(selector);
clientState.attachSocketChannel(sc);
newKey.attach(clientState);
}
} catch (ClosedChannelException e) {
log.error("ClosedChannelException: ", e);
ClientStateMachine clientState = (ClientStateMachine)key.attachment();
database.insertFailedCommunication(clientState.getDeviceId(),
clientState.getIpAddress(),
clientState.getReceivedString(), e.toString());
key.cancel();
} catch (IOException e) {
log.error("IOException: ", e);
}
}
How should I handle this? Reading up on the error, it appears to be caused by a Linux OS setting that limits the number of open files a process can have.
Judging from that, and from this question here, it appears that I am not closing sockets correctly (the server is currently serving around 50 clients). Is this a situation where I need a timer to monitor open sockets and time them out after an extended period?
I have some cases where a client can connect and then not send any data once the connection is established. I thought I had handled those cases properly.
It's my understanding that a non-blocking NIO server has very long timeouts; is it possible that if I've missed cases like this they might accumulate and result in this error?
This server has been running for three months without any issues.
After I go through my code and check for badly handled / missing cases, what's the best way to handle this particular error? Are there other things I should consider that might contribute to this?
Also (maybe this should be another question), I have log4j2 configured to send emails for log levels of error and higher, yet I didn't get an email for this error. Are there any reasons why that might be? It usually works; the error was logged to the log file as expected, but I never got an email about it. I should have gotten plenty, as the error occurred every time a connection was established.
You fix your socket leaks. When you get EOS, or any IOException other than SocketTimeoutException, on a socket you must close it. In the case of SocketChannels, that means closing the channel. Merely cancelling the key, or ignoring the issue and hoping it will go away, isn't sufficient. The connection has already gone away.
The fact that you find it necessary to count broken socket connections, and catch ClosedChannelException, already indicates major logic problems in your application. You shouldn't need this. And cancelling the key of a closed channel doesn't provide any kind of a solution.
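A minimal sketch of that rule in a selector-based read handler (the names here are illustrative, not taken from your code): on end-of-stream or any other IOException, close the channel, which releases the file descriptor and cancels its keys.
void readFromClient(SelectionKey key) {
    SocketChannel sc = (SocketChannel) key.channel();
    ByteBuffer buf = ByteBuffer.allocate(1024);
    try {
        int n = sc.read(buf);
        if (n == -1) {           // end of stream: the peer has closed
            sc.close();          // closing the channel also cancels its keys
            return;
        }
        // ... process buf ...
    } catch (IOException e) {
        try {
            sc.close();          // any other IOException: the connection is gone
        } catch (IOException ignored) {
        }
    }
}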
It's my understanding that a non-blocking NIO server has very long timeouts
The only timeout a non-blocking NIO server has is the timeout you specify to select(). All the timeouts built into the TCP stack are unaffected by whether you are using NIO or non-blocking mode.
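In other words, if you want idle clients to be timed out, you implement it yourself around select(), roughly like this (a sketch, with a hypothetical 30-second budget):
int ready = selector.select(30_000);   // block at most 30 s waiting for ready channels
if (ready == 0) {
    // Nothing became ready in time: walk your ClientStateMachine attachments,
    // close the channels that have been idle too long, and let their keys be cancelled.
}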
I am using java.net.DatagramSocket to send UDP packets to a statsd server from a Google App Engine servlet. This generally works; however, we periodically see the following exception:
IOException - Socket is closed: Unknown socket_descriptor..
When these IOExceptions occur, calling DatagramSocket.isClosed() returns false.
This issue happens frequently enough that it is concerning, and although I've put some workarounds in place (allocate a new socket and use a DeferredTask queue to retry), it would be good to understand the underlying reason for these errors.
The Google docs mention, "Sockets may be reclaimed after 2 minutes of inactivity; any socket operation keeps the socket alive for a further 2 minutes." It is unclear to me how this would play into UDP datagrams; however, one suspicion I have is that this is related to GAE instance lifecycle in some way.
My code (sanitized and extracted) looks like:
DatagramSocket _socket;
void init() {
_socket = new DatagramSocket();
}
void send() {
DatagramPacket packet = new DatagramPacket(<BYTES>, <LENGTH>, <HOST>, <PORT>);
_socket.send(packet);
}
Appreciate any feedback on this!
The approach taken to work around this issue was simply to manage a single static DatagramSocket instance with a couple of helper methods, getSocket() and releaseSocket(): a socket that throws an IOException is discarded via releaseSocket(), and a fresh one is allocated on the next call to getSocket(). Not shown in this code is the retry logic that retries the failed socket.send(). Under load testing, this seems to work reliably.
try {
DatagramPacket packet = new DatagramPacket(<BYTES>, <LENGTH>, <HOST>, <PORT>);
getSocket().send(packet);
} catch (IOException ioe) {
releaseSocket();
}
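The helper methods themselves weren't shown above; a rough sketch of what they might look like (only the getSocket()/releaseSocket() names come from the description, the rest is assumed):
private static DatagramSocket socket;

static synchronized DatagramSocket getSocket() throws SocketException {
    if (socket == null || socket.isClosed()) {
        socket = new DatagramSocket();   // lazily allocate a fresh socket after a release
    }
    return socket;
}

static synchronized void releaseSocket() {
    if (socket != null) {
        socket.close();                  // discard the socket that threw the IOException
        socket = null;                   // the next getSocket() call will allocate a new one
    }
}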
In my software I need to send messages between client and server through an ObjectOutputStream.
The core of the sender method is the following:
....
try {
objWriter.writeUnshared(bean);
objWriter.flush();
} catch (Exception e) {
....
}
...
Running my application on Windows XP, writeUnshared throws an exception when the network cable is removed.
Now I'm running my application on Ubuntu 12.10, and the method doesn't throw anything if I remove the cable!
Any hint?
Whether and when you get the exception depends on:
how large the socket send buffer is at your end
how large the socket receive buffer is at the peer
how much unacknowledged data you have already written
how long it is since you wrote that, and
the internal timers of your TCP stack.
The only part of that you can control from Java is your own socket send buffer. It is therefore entirely unpredictable when, and whether, the exception will be delivered, so you must not write your application to depend on a specific behaviour.
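For completeness, the one knob you do control is set like this (a sketch with a hypothetical endpoint; a deliberately small buffer makes unacknowledged data back up sooner, but still gives no guarantee about when the exception surfaces):
Socket socket = new Socket();
socket.setSendBufferSize(8 * 1024);                            // shrink the send buffer (a hint to the OS)
socket.connect(new InetSocketAddress("example.org", 9000));    // hypothetical endpoint
ObjectOutputStream objWriter = new ObjectOutputStream(socket.getOutputStream());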
Yes, but following the methods called by writeUnshared and flush, I see that write(OutputStream.class:106) is called, and this method must generate an exception if the stream is closed... So I use that information to check whether my channel is active. The problem is that on Ubuntu the channel seems to be open even if I remove the cable.
I'm creating a small server using java.nio, but when trying to stress test it I keep getting messages about the connection being reset on the server side, or more specifically:
apr_socket_recv: An established connection was aborted by the software in your host machine
I've tried to narrow it down to the simplest of loops, but still no luck. I can get the error after a hundred or so connections, or maybe after just 1 or 2.
Here's the server loop:
byte[] response = ("HTTP/1.1 200 OK\r\n"
+ "Server: TestServer\r\n"
+ "Content-Type: text/html\r\n"
+ "\r\n"
+ "<html><b>Hello</b></html>").getBytes();
SocketChannel newChannel = null;
while (active) {
try {
//get a new connection and delegate it.
System.out.print("Waiting for connection..");
newChannel = serverSocketChannel.accept();
System.out.println("ok");
newChannel.configureBlocking(true);
newChannel.write(ByteBuffer.wrap(response));
}
catch (IOException e) {
e.printStackTrace();
}
finally {
try {
newChannel.close();
} catch (IOException ex) {
Logger.getLogger(Server.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
I've tried checking if the write didn't write all requested byte, but it seemingly does. Interestingly enough, calling System.gc() after each newChannel.close() makes the problem disappear (but in return, it's horribly slow). So either I'm not releasing all resources I should release, or the application just needs a pause..
I'm losing all of my best years on this. Oh, and by the way.. if I ignore writing to the channel and just close after I accept the connection, the problem still doesn't go away.
Well I found it out, so I might as well share it.
So my app did, in a sense, need a pause: it was simply going too fast, closing the connection before the client had written all of its request data. The fix is to keep reading until the entire HTTP request has been received. D'oh... lesson learned.
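Roughly, the fix looks like this (a sketch, not the exact code), inserted before the write in the loop above:
// Read until the blank line that terminates the HTTP request headers
// (the channel is in blocking mode here, as in the loop above).
ByteBuffer inBuf = ByteBuffer.allocate(8192);
StringBuilder request = new StringBuilder();
while (request.indexOf("\r\n\r\n") == -1) {
    inBuf.clear();
    if (newChannel.read(inBuf) == -1) {
        break;                                   // client closed before finishing the request
    }
    inBuf.flip();
    request.append(StandardCharsets.US_ASCII.decode(inBuf));
}
newChannel.write(ByteBuffer.wrap(response));     // only now send the response and close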
From the docs for SocketChannel#write (emphasis mine):
An attempt is made to write up to r bytes to the channel, where r is
the number of bytes remaining in the buffer, that is, src.remaining(),
at the moment this method is invoked.
[...]
Returns: The number of bytes written, possibly zero.
It's up to you to check the return value from the write call (which you're not doing presently), and issue successive write calls until the whole of the buffer has been sent. Something like this, I guess:
ByteBuffer toWrite = ByteBuffer.wrap(response);
while (toWrite.remaining() > 0) {
newChannel.write(toWrite);
}
You'll obviously get aborts if you don't write all of your response data and then just close the socket.
What's the most appropriate way to detect if a socket has been dropped or not? Or whether a packet did actually get sent?
I have a library for sending Apple Push Notifications to iPhones through the Apple gateways (available on GitHub). Clients need to open a socket and send a binary representation of each message, but unfortunately Apple doesn't return any acknowledgement whatsoever. The connection can be reused to send multiple messages as well. I'm using plain Java Socket connections. The relevant code is:
Socket socket = socket(); // returns an reused open socket, or a new one
socket.getOutputStream().write(m.marshall());
socket.getOutputStream().flush();
logger.debug("Message \"{}\" sent", m);
In some cases, if a connection is dropped while a message is being sent, or right before, Socket.getOutputStream().write() still finishes successfully. I expect that's because the TCP window isn't exhausted yet.
Is there a way that I can tell for sure whether a packet actually got in the network or not? I experimented with the following two solutions:
Insert an additional socket.getInputStream().read() operation with a 250 ms timeout. This forces a read operation that fails when the connection has been dropped, but otherwise blocks for 250 ms.
Set the TCP send buffer size (via Socket.setSendBufferSize()) to the binary size of the message.
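For reference, the first probe looked roughly like this (a sketch, not the library's actual code):
socket.setSoTimeout(250);                        // 250 ms read timeout
try {
    int b = socket.getInputStream().read();      // normally blocks for the full 250 ms
    if (b == -1) {
        // the gateway closed the connection gracefully
    }
} catch (SocketTimeoutException probablyStillConnected) {
    // the common (and slow) path: no data received and no failure detected
} catch (IOException e) {
    // the connection was dropped
}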
Both of the methods work, but they significantly degrade the quality of service: throughput drops from about 100 messages/second to at most 10 messages/second.
Any suggestions?
UPDATE:
Challenged by multiple answers questioning whether what I describe is possible, I constructed "unit" tests of the behavior. Check out the test cases at Gist 273786.
Both unit tests have two threads, a server and a client. The server closes its socket while the client is sending data, yet no IOException is thrown. Here is the main method:
public static void main(String[] args) throws Throwable {
final int PORT = 8005;
final int FIRST_BUF_SIZE = 5;
final Throwable[] errors = new Throwable[1];
final Semaphore serverClosing = new Semaphore(0);
final Semaphore messageFlushed = new Semaphore(0);
class ServerThread extends Thread {
public void run() {
try {
ServerSocket ssocket = new ServerSocket(PORT);
Socket socket = ssocket.accept();
InputStream s = socket.getInputStream();
s.read(new byte[FIRST_BUF_SIZE]);
messageFlushed.acquire();
socket.close();
ssocket.close();
System.out.println("Closed socket");
serverClosing.release();
} catch (Throwable e) {
errors[0] = e;
}
}
}
class ClientThread extends Thread {
public void run() {
try {
Socket socket = new Socket("localhost", PORT);
OutputStream st = socket.getOutputStream();
st.write(new byte[FIRST_BUF_SIZE]);
st.flush();
messageFlushed.release();
serverClosing.acquire(1);
System.out.println("writing new packets");
// sending more packets while server already
// closed connection
st.write(32);
st.flush();
st.close();
System.out.println("Sent");
} catch (Throwable e) {
errors[0] = e;
}
}
}
Thread thread1 = new ServerThread();
Thread thread2 = new ClientThread();
thread1.start();
thread2.start();
thread1.join();
thread2.join();
if (errors[0] != null)
throw errors[0];
System.out.println("Run without any errors");
}
[Incidentally, I also have a concurrency testing library that makes the setup a bit better and clearer. Check out the sample at the gist as well.]
When run I get the following output:
Closed socket
writing new packets
Finished writing
Run without any errors
This may not be of much help to you, but technically both of your proposed solutions are incorrect. OutputStream.flush() and whatever other API calls you can think of are not going to do what you need.
The only portable and reliable way to determine whether a packet has been received by the peer is to wait for a confirmation from the peer. This confirmation can either be an actual response or a graceful socket shutdown. End of story: there really is no other way, and this is not Java-specific; it is fundamental network programming.
If this is not a persistent connection, that is, if you just send something and then close the connection, the way you do it is to catch all IOExceptions (any of them indicates an error) and perform a graceful socket shutdown:
1. socket.shutdownOutput();
2. wait for inputStream.read() to return -1, indicating the peer has also shut down its socket
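A minimal sketch of that sequence, assuming a plain, non-persistent Socket like the one in the question:
void sendAndConfirm(Socket socket, byte[] payload) throws IOException {
    OutputStream out = socket.getOutputStream();
    out.write(payload);
    out.flush();
    socket.shutdownOutput();                 // half-close: tell the peer we are done sending
    InputStream in = socket.getInputStream();
    while (in.read() != -1) {
        // drain anything the peer sends; -1 means it has shut down its side too
    }
    socket.close();                          // per the reasoning above, getting here without
                                             // an IOException is the confirmation
}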
After much trouble with dropped connections, I moved my code to use the enhanced format, which pretty much means you change your packet to look like this:
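Sketched from the public APNs binary format (not necessarily exactly what my code does), the enhanced, command-1 frame is written field by field, roughly like this:
void writeEnhancedNotification(DataOutputStream out, int identifier, int expiry,
                               byte[] deviceToken, byte[] payload) throws IOException {
    out.writeByte(1);                     // command: 1 = enhanced format
    out.writeInt(identifier);             // identifier, echoed back in error responses
    out.writeInt(expiry);                 // expiry as a UNIX timestamp
    out.writeShort(deviceToken.length);   // token length (32)
    out.write(deviceToken);               // device token
    out.writeShort(payload.length);       // payload length
    out.write(payload);                   // JSON payload bytes
    out.flush();
}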
This way Apple will not drop a connection if an error happens, but will write a feedback code to the socket.
If you're sending information to Apple using the TCP/IP protocol, you have to be receiving acknowledgements. However, you stated:
Apple doesn't return any acknowledgement whatsoever
What do you mean by this? TCP/IP guarantees delivery, therefore the receiver MUST acknowledge receipt. It does not guarantee when the delivery will take place, however.
If you send a notification to Apple and you break your connection before receiving the ACK, there is no way to tell whether you were successful, so you simply must send it again. If pushing the same information twice is a problem, or is not handled properly by the device, then there is a problem; the solution is to fix the device's handling of the duplicate push notification, as there's nothing you can do on the pushing side.
#Comment Clarification/Question
OK. The first part of what you understand is your answer to the second part. Only the packets that have received ACKs have been sent and received properly. I'm sure we could think of some very complicated scheme for keeping track of each individual packet ourselves, but TCP is supposed to abstract this layer away and handle it for you. On your end you simply have to deal with the multitude of failures that could occur (in Java, if any of these occur an exception is raised). If there is no exception, the data you just tried to send is guaranteed to be sent by the TCP/IP protocol.
Is there a situation where data is seemingly "sent" but not guaranteed to be received where no exception is raised? The answer should be no.
#Examples
Nice examples; this clarifies things quite a bit. I would have thought an error would be thrown. In the example posted, an error is thrown on the second write, but not the first. This is interesting behavior, and I wasn't able to find much information explaining why it behaves like this. It does, however, explain why we must develop our own application-level protocols to verify delivery.
Looks like you are correct that without a protocol for confirmation there is no guarantee the Apple device will receive the notification. Apple also only queues the last message. Looking a little at the service, I was able to determine that it is more of a convenience for the customer, cannot be used to guarantee delivery, and must be combined with other methods. I read this in the following source:
http://blog.boxedice.com/2009/07/10/how-to-build-an-apple-push-notification-provider-server-tutorial/
So it seems the answer is no: you cannot tell for sure. You may be able to use a packet sniffer like Wireshark to tell whether it was sent, but even that won't guarantee it was received and delivered to the device, due to the nature of the service.