Play 2.5 WebSocket Connection Build - java

I have an AWS server (medium instance) running in EU West with roughly 250 devices connected. The devices frequently reconnect due to internet connectivity issues, and for some reason the number of TCP connections to the server grows until it reaches around 4300, at which point no new connections are accepted. I have confirmed that the issue is isolated to WebSocket requests and not regular HTTP requests.
WebSocket connections are kept per device in a Map keyed by device UUID. It sometimes happens that a device sends a request for a new WS connection even though the server already holds a connection for that device. In that case, the current connection is closed and an error is returned so that the device can retry the connection request.
Below is the code snippet from the Controller handling the connections using LegacyWebSocket. Connections are closed using out.close(), as per https://www.playframework.com/documentation/2.5.x/JavaWebSockets#handling-websockets-using-callbacks
public LegacyWebSocket<String> create(String uuid) {
    logger.debug("NEW WebSocket request from {}, creating new socket...", uuid);
    if (webSocketMap.containsKey(uuid)) {
        logger.debug("WebSocket already exists for {}, closing existing connection", uuid);
        webSocketMap.get(uuid).close();
        logger.debug("Responding forbidden to force WS restart from device {}", uuid);
        return WebSocket.reject(forbidden());
    }
    LegacyWebSocket<String> ws = WebSocket.whenReady((in, out) -> {
        logger.debug("Adding downstream connection to webSocketMap -> {} webSocketMap.size() = {}", uuid, webSocketMap.size());
        webSocketMap.put(uuid, out);
        // For each event received on the socket,
        in.onMessage(message -> {
            if (message.equals("ping")) {
                logger.debug("PING received from {} {}", uuid, message);
                out.write("pong");
            }
        });
        // When the socket is closed.
        in.onClose(() -> {
            logger.debug("onClose, removing for {}", uuid);
            webSocketMap.remove(uuid);
        });
    });
    return ws;
}
How can I ensure that Play Framework closes the TCP connection for closed WS connections?
The command I use to check the number of TCP connections is netstat -n -t | wc -l

This looks like a TCP keep-alive issue: the TCP connections go stale because of connectivity problems on the client side, and the server does not detect or clean up the stale connections before the limit is reached.
This link will help you configure TCP keep-alive on your server so that the stale connections are cleaned up in time.
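Alongside the OS-level keep-alive tuning, an application-level reaper can reclaim stale entries: since the devices already send "ping" messages, you can record the last activity per UUID and close sockets that have gone quiet. A minimal sketch (the class name and thresholds are hypothetical; it assumes webSocketMap is the controller's map and devices ping roughly every 30 seconds):
import play.mvc.WebSocket;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical reaper, not part of the original controller.
class StaleSocketReaper {
    private final Map<String, WebSocket.Out<String>> webSocketMap;
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    StaleSocketReaper(Map<String, WebSocket.Out<String>> webSocketMap) {
        this.webSocketMap = webSocketMap;
        // Sweep every 30 s; anything silent for three ping intervals is stale.
        scheduler.scheduleAtFixedRate(this::sweep, 30, 30, TimeUnit.SECONDS);
    }

    // Call this from in.onMessage(...) so activity refreshes the timestamp.
    void touch(String uuid) {
        lastSeen.put(uuid, System.currentTimeMillis());
    }

    private void sweep() {
        long cutoff = System.currentTimeMillis() - 90_000;
        lastSeen.entrySet().removeIf(entry -> {
            if (entry.getValue() >= cutoff) {
                return false;
            }
            WebSocket.Out<String> out = webSocketMap.remove(entry.getKey());
            if (out != null) {
                out.close(); // releases the underlying TCP socket
            }
            return true;
        });
    }
}
This does not replace kernel keep-alive settings, but it bounds the map (and the sockets it holds) to the set of devices that pinged recently.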

Related

Is there a way to simulate Socket and Connection timeout?

I have a certain piece of code that integrates with a third party using an HTTP connection, which handles socket timeout and connection timeout differently.
I have been trying to simulate and test all the scenarios that could arise from the third party. I was able to test the connection timeout by connecting to a port that is blocked by the server's firewall, e.g. port 81.
However, I'm unable to simulate a socket timeout. If my understanding is correct, a socket timeout is associated with continuous packet flow and the connection dropping in between. Is there a way I can simulate this?
So we are talking about two kinds of timeouts here: one is the timeout for connecting to the server (connect timeout), the other happens when no data is sent or received via the socket for a while (idle timeout).
Node sockets have a single socket timeout that can be used to synthesize both the connect and the idle timeout: set the socket timeout to the connect timeout, and then, once connected, set it to the idle timeout.
Example:
const request = http.request(url, {
    timeout: connectTimeout,
});
request.setTimeout(idleTimeout);
This works because the timeout in the options is set immediately when the socket is created, while the setTimeout function runs on the socket once it is connected!
Anyway, the question was about how to test the connect timeout. OK, so let's first park the idle timeout: we can simply test that by not sending any data for some time, which will cause the timeout. Check!
The connect timeout is a bit harder to test. The first thing that comes to mind is that we need a place to connect to that will not error, but also not connect; that would cause a timeout. But how the hell do we simulate that in Node?
If we think a little bit outside the box, we might figure out that this timeout is about the time it takes to connect. It does not matter why the connection takes as long as it does; we simply need to delay the time it takes to connect. This is not necessarily a server thing; we could also do it on the client. After all, that is the part doing the connecting, and if we can delay it there, we can test the timeout.
So how could we delay the connection on the client side? Well, we can use the DNS lookup for that. Before the connection is made, a DNS lookup is done. If we simply delay that by 5 seconds or so, we can test the connect timeout very easily.
This is what the code could look like:
import * as dns from "dns";
import * as http from "http";

const url = new URL("http://localhost:8080");
const request = http.request(url, {
    timeout: 3 * 1000, // connect timeout
    lookup(hostname, options, callback) {
        setTimeout(
            () => dns.lookup(hostname, options, callback),
            5 * 1000,
        );
    },
});
request.setTimeout(10 * 1000); // idle timeout
request.addListener("timeout", () => {
    const message = !request.socket || request.socket.connecting ?
        `connect timeout while connecting to ${url.href}` :
        `idle timeout while connected to ${url.href}`;
    request.destroy(new Error(message));
});
In my projects I usually use an agent that I inject. The agent then has the delayed lookup. Like this:
import * as dns from "dns";
import * as http from "http";

const url = new URL("http://localhost:8080");
const agent = new http.Agent({
    lookup(hostname, options, callback) {
        setTimeout(
            () => dns.lookup(hostname, options, callback),
            5 * 1000,
        );
    },
});
const request = http.request(url, {
    timeout: 3 * 1000, // connect timeout
    agent,
});
request.setTimeout(10 * 1000); // idle timeout
request.addListener("timeout", () => {
    const message = !request.socket || request.socket.connecting ?
        `connect timeout while connecting to ${url.href}` :
        `idle timeout while connected to ${url.href}`;
    request.destroy(new Error(message));
});
Happy coding!
"Connection timeout" determines how long it may take for a TCP connection to be established and this all happens before any HTTP related data is send over the line. By connecting to a blocked port, you have only partially tested the connection timeout since no connection was being made. Typically, a TCP connection on your local network is created (established) very fast. But when connecting to a server on the other side of the world, establishing a TCP connection can take seconds.
"Socket timeout" is a somewhat misleading name - it just determines how long you (the client) will wait for an answer (data) from the server. In other words, how long the Socket.read() function will block while waiting for data.
Properly testing these functions involves creating a server socket or an (HTTP) web server that you can make artificially slow. Describing how to create and use a server socket for connection timeout testing (if that is even possible) is too much to answer here, but socket timeout testing is a common question; see for example here (I just googled "mock web server for testing timeouts"), which leads to a tool like MockWebServer. MockWebServer might have an option for testing connection timeouts as well (I have not used it), but if not, another tool might.
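If you'd rather not pull in a tool, a hand-rolled "slow" server for socket-timeout testing is only a few lines in Java: accept the connection but never answer. A sketch (the port is arbitrary):
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class SilentServer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                // accept() means the client's connect() succeeds immediately,
                // but we never write back, so the client's read() blocks
                // until its socket timeout fires.
                Socket client = server.accept();
            }
        }
    }
}
Note this only exercises the socket (read) timeout; for the connect timeout you still need something like the delayed-lookup trick shown above.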
On a final note: it is good that you are testing your usage of the third-party HTTP library with respect to timeout settings, even if this takes some effort. The worst that can happen is that a socket timeout setting in your code is somehow not used by the library, leaving the default socket timeout of "wait forever" in place. That can result in your application doing absolutely nothing ("hanging") for no apparent reason.

reactor-netty: using keep-alive HTTP client

I use reactor-netty to request a set of URLs. The majority of the URLs belong to the same hosts, but reactor-netty seems to make a brand-new TCP connection for every URL, even when a connection to the host is already established from the previous URL. Some servers drop new connections or start to respond slowly when hundreds of simultaneous connections are established.
Sample of the code:
Flux.just(...)
    .groupBy(link -> {
        String host = "";
        try {
            host = new URL(link).getHost();
        } catch (MalformedURLException e) {
            LOGGER.warn("Cannot determine host {}", link, e);
        }
        return host;
    })
    .flatMap(group -> {
        HttpClient client = HttpClient.create()
            .keepAlive(true)
            .tcpConfiguration(tcp -> tcp.host(group.key()));
        return group.flatMap(link -> client.get()
            .uri(link)
            .response((resp, cont) -> resp.status().code() == 200 ? cont.aggregate().asString() : Mono.empty())
            .doOnSubscribe(s -> LOGGER.debug("Requesting {}", link))
            .timeout(Duration.ofMinutes(1))
            .doOnError(e -> LOGGER.warn("Cannot get response from {}", link, e))
            .onErrorResume(e -> Flux.empty())
            .collect(Collectors.joining())
            .filter(s -> StringUtils.isNotBlank(s)));
    })
    .blockLast();
In the log I see that the local ports are different for the same remote host, and the sum of active and inactive connections is much higher than the number of distinct hosts. That's why I think reactor-netty is not reusing the already established connections.
DEBUG [2019-04-29 08:15:18,711] reactor-http-nio-10 r.n.r.PooledConnectionProvider: [id: 0xaed18e87, L:/192.168.1.183:56832 - R:capcp2.naad-adna.pelmorex.com/52.242.33.4:80] Releasing channel
DEBUG [2019-04-29 08:15:18,711] reactor-http-nio-10 r.n.r.PooledConnectionProvider: [id: 0xaed18e87, L:/192.168.1.183:56832 - R:capcp2.naad-adna.pelmorex.com/52.242.33.4:80] Channel cleaned, now 1 active connections and 239 inactive connections
...
DEBUG [2019-04-29 08:15:20,158] reactor-http-nio-10 r.n.r.PooledConnectionProvider: [id: 0xd6c6c5db, L:/192.168.1.183:56965 - R:capcp2.naad-adna.pelmorex.com/52.242.33.4:80] Releasing channel
DEBUG [2019-04-29 08:15:20,158] reactor-http-nio-10 r.n.r.PooledConnectionProvider: [id: 0xd6c6c5db, L:/192.168.1.183:56965 - R:capcp2.naad-adna.pelmorex.com/52.242.33.4:80] Channel cleaned, now 0 active connections and 240 inactive connections
Is it possible to request several URLs on the same host through the same TCP connection using a keep-alive HTTP client? If not, how do I restrict the number of simultaneous connections to the same host, or perform the requests to the same host sequentially (the next request only after receiving the response to the previous one)?
I use the Californium-SR6 release train.
Yes, reactor-netty supports keep-alive, connection reuse, and connection pooling.
Note that .flatMap is an asynchronous operation that processes the inner streams in parallel. Therefore, when you call group.flatMap(...), the inner requests are executed in parallel, and since they run in parallel, multiple connections have to be established.
If you want to execute requests to the same host sequentially, change your example to use group.concatMap instead of .flatMap.
If you still want to execute them in parallel but limit the number of active requests to an individual host, change your example to use one of the overloaded versions of .flatMap that takes a concurrency parameter.
Also, since you are using HttpClient.create(), your example uses the default global HTTP connection pool. If you want more control over connection pooling, you can specify a different ConnectionProvider via HttpClient.create(ConnectionProvider).
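A minimal sketch of both suggestions combined, assuming the Californium-era API (links is a hypothetical Iterable<String> of URLs; per-host grouping omitted for brevity): ConnectionProvider.fixed caps the pool, and concatMap serializes the requests.
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

// At most 2 pooled connections instead of one per in-flight request.
ConnectionProvider provider = ConnectionProvider.fixed("bounded", 2);
HttpClient client = HttpClient.create(provider).keepAlive(true);

Flux.fromIterable(links)
    .concatMap(link -> client.get()      // concatMap: the next request starts
        .uri(link)                       // only after the previous one completes
        .responseContent()
        .aggregate()
        .asString()
        .onErrorResume(e -> Mono.empty()))
    .blockLast();
With requests serialized and the pool bounded, the pooled connection to each host is reused instead of a new socket being opened per URL.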

netty client takes very long before broken network is detected

I am using netty.io (4.0.4) in a Java application to implement a TCP client that communicates with an external hardware driver. One of the requirements of this hardware is that the client send a KEEP_ALIVE (heart-beat) message every 30 seconds; the hardware, however, does not respond to this heart-beat.
My problem is that when the connection is abruptly broken (e.g. network cable unplugged), the client is completely unaware of this and keeps sending the KEEP_ALIVE message for much longer (around 5-10 minutes) before it gets an operation timeout exception.
In other words, from the client side there is no way to tell if it's still connected.
Below is a snippet of my bootstrap setup, if it helps:
// bootstrap setup
bootstrap = new Bootstrap().group(group)
        .channel(NioSocketChannel.class)
        .option(ChannelOption.SO_KEEPALIVE, true)
        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 3000)
        .remoteAddress(ip, port)
        .handler(tcpChannelInitializer);

// part of the pipeline responsible for keep-alive messages
pipeline.addLast("idleStateHandler", new IdleStateHandler(0, 0, 30, TimeUnit.SECONDS));
pipeline.addLast("keepAliveHandler", keepAliveMessageHandler);
I would expect that since the client is sending keep-alive messages that are not received at the other end, the missing acknowledgement would indicate a problem with the connection much earlier?
EDIT
Code from the KeepAliveMessageHandler
public class KeepAliveMessageHandler extends ChannelDuplexHandler
{
    private static final Logger LOGGER = getLogger(KeepAliveMessageHandler.class);
    private static final String KEEP_ALIVE_MESSAGE = "";

    @Override
    public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception
    {
        if (!(evt instanceof IdleStateEvent)) {
            return;
        }
        IdleStateEvent e = (IdleStateEvent) evt;
        Channel channel = ctx.channel();
        if (e.state() == IdleState.ALL_IDLE) {
            LOGGER.info("Sending KEEP_ALIVE_MESSAGE");
            channel.writeAndFlush(KEEP_ALIVE_MESSAGE);
        }
    }
}
EDIT 2
I tried to explicitly ensure the keep-alive message was delivered using the code below:
@Override
public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception
{
    if (!(evt instanceof IdleStateEvent)) {
        return;
    }
    IdleStateEvent e = (IdleStateEvent) evt;
    Channel channel = ctx.channel();
    if (e.state() == IdleState.ALL_IDLE) {
        LOGGER.info("Sending KEEP_ALIVE_MESSAGE");
        channel.writeAndFlush(KEEP_ALIVE_MESSAGE).addListener(future -> {
            if (!future.isSuccess()) {
                LOGGER.error("KEEP_ALIVE message write error");
                channel.close();
            }
        });
    }
}
This also does not work. :( According to this answer, that behavior makes sense, but I am still hoping there is some way to figure out whether the write was a "real" success. (Having the hardware ack the heart-beat is not possible.)
You have enabled TCP keepalive:
.option(ChannelOption.SO_KEEPALIVE, true)
But in your code I can't see anything that ensures a keepalive is actually sent at a 30-second rate.
If a connection has been terminated due to a TCP keepalive time-out and the other host eventually sends a packet for the old connection, the host that terminated the connection will send a packet with the RST flag set to signal the other host that the old connection is no longer active. This forces the other host to terminate its end of the connection so a new connection can be established.
Typically, TCP keepalives are sent every 45 or 60 seconds on an idle TCP connection, and the connection is dropped after 3 sequential ACKs are missed. This varies by host; e.g. by default Windows PCs send the first TCP keepalive packet after 7,200,000 ms (2 hours), then send 5 keepalives at 1000 ms intervals, dropping the connection if there is no response to any of the keepalive packets.
(taken from http://ltxfaq.custhelp.com/app/answers/detail/a_id/1512/~/tcp-keepalives-explained)
I do understand now that
pipeline.addLast("idleStateHandler", new IdleStateHandler(0, 0, 30, TimeUnit.SECONDS));
pipeline.addLast("keepAliveHandler", keepAliveMessageHandler);
will trigger an idle event every 30 seconds of mutual inactivity, and keepAliveMessageHandler will send a packet to the remote side in that case.
Unfortunately,
ChannelFuture future = channel.writeAndFlush(KEEP_ALIVE_MESSAGE);
is considered a success as soon as the message is written to the OS buffers.
It seems that under your conditions you have only two options:
1. Send a command that elicits some response from the external device (something that will not cause disruption). But I would assume that this is impossible in your case.
2. Modify the underlying TCP driver settings.
The default OS settings for TCP keepalive are geared towards conserving system resources and supporting a large number of applications and connections. Provided that you have a dedicated system, you may set a more aggressive TCP-check configuration.
Here is the link on how to make the adjustments to the Linux kernel: http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html
The solution should work on plain installations as well as in VMs and Docker containers.
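If system-wide sysctls are too blunt an instrument, netty's native epoll transport exposes the same three knobs as per-channel options. A sketch, assuming Linux with netty-transport-native-epoll on the classpath (these options require a newer netty 4 release than 4.0.4, so an upgrade may be needed):
import io.netty.bootstrap.Bootstrap;
import io.netty.channel.ChannelOption;
import io.netty.channel.epoll.EpollChannelOption;
import io.netty.channel.epoll.EpollEventLoopGroup;
import io.netty.channel.epoll.EpollSocketChannel;

Bootstrap bootstrap = new Bootstrap()
        .group(new EpollEventLoopGroup())
        .channel(EpollSocketChannel.class)
        .option(ChannelOption.SO_KEEPALIVE, true)
        .option(EpollChannelOption.TCP_KEEPIDLE, 30)  // first probe after 30 s idle
        .option(EpollChannelOption.TCP_KEEPINTVL, 10) // re-probe every 10 s
        .option(EpollChannelOption.TCP_KEEPCNT, 3);   // give up after 3 missed probes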
General information on the topic: https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html

java.net.SocketException Connection timed out error

I am getting the error below when trying to connect to a TCP server. My program tries to open around 300-400 connections using different threads, and this happens around the 250th thread. Each thread uses its own connection to send and receive data.
java.net.SocketException: Connection timed out:could be due to invalid address
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:372)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:233)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:220)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:385)
Here is the code a thread uses to get a socket:
socket = new Socket(my_hostName, my_port);
Is there any default limit on the number of connections that a TCP server can have at one time? If not, how do I solve this type of problem?
You could be getting a connection timeout if the server has a ServerSocket bound to the port you are connecting to but is not accepting the connection.
If it always happens with the 250th connection, maybe the server is set up to accept only 250 connections; someone has to disconnect so you can connect. Or you can increase the timeout: instead of creating the socket like that, create the socket with the empty constructor and then use the connect() method:
Socket s = new Socket();
s.connect(new InetSocketAddress(my_hostName, my_port), 90000);
The default connection timeout is 30 seconds; the code above waits 90 seconds to connect and then throws the exception if the connection cannot be established.
You could also set a lower connection timeout and do something else when you catch that exception...
Why all the connections? Is this a test program? In that case, be aware that opening large numbers of connections from a single client stresses the client in ways that aren't exercised by real systems with large numbers of different client hosts, so test results from that kind of client aren't all that valid. You could be running out of client ports, or some other client resource.
If it isn't a test program, same question: why all the connections? You'd be better off running a connection pool and reusing a much smaller number of connections serially, as sketched below. The network only has so much bandwidth, after all; dividing it by 400 isn't very useful.
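A minimal sketch of that idea (all names are hypothetical): a small blocking pool that caps the client at a fixed number of sockets and reuses them instead of opening 300-400 at once.
import java.io.IOException;
import java.net.Socket;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class SocketPool {
    private final BlockingQueue<Socket> pool;

    SocketPool(String host, int port, int size) throws IOException {
        pool = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            pool.add(new Socket(host, port));
        }
    }

    Socket borrow() throws InterruptedException {
        return pool.take(); // blocks until a socket is free, instead of opening a new one
    }

    void release(Socket socket) {
        pool.offer(socket);
    }
}
Each worker thread then wraps its send/receive in borrow()/release() instead of calling new Socket(...) itself.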

PDA loses TCP connection to ServerSocket in Suspend Mode

I'm implementing a Java TCP/IP server using ServerSocket to accept messages from clients via network sockets.
It works fine, except for clients on PDAs (a WiFi barcode scanner).
If I have a connection between the server and the PDA, and the PDA goes into suspend (standby) after some idle time, then there are problems with the connection.
When the PDA wakes up again, I can observe in a TCP monitor that a second connection with a different port is established, while the old one remains established too:
localhost:2000 remotehost:4899 ESTABLISHED (first connection)
localhost:2000 remotehost:4890 ESTABLISHED (connection after wakeup)
And now communication doesn't work: the client uses the new connection, but the server still listens on the old one, so the server doesn't receive the messages. Only when the server sends a message to the client does it notice the problem (it receives a SocketException: Connection reset). The server then switches to the new connection, and all the messages sent by the client in the meantime arrive in one go!
So I only notice the network problem when the server tries to send a message; in the meantime there are no exceptions or anything. How can I properly react to this problem, so that the new connection is used as soon as it is established (and the old one is closed)?
From your description I guess that the server is structured like this:
while (true) {
    Socket clientSocket = serverSocket.accept();
    talkToClientUntilConnectionCloses(clientSocket);
}
I'd change it to process incoming connections and established connections in parallel. The simplest approach (from the implementation point of view) is to start a new thread for each client. It is not a good approach in general (it scales poorly), but if you don't expect a lot of clients and can afford it, just change the server like this:
while (true) {
    Socket clientSocket = serverSocket.accept();
    new Thread(() -> talkToClientUntilConnectionCloses(clientSocket)).start();
}
As a bonus, you get the ability to handle multiple clients simultaneously (with all the troubles attached, too).
It sounds like the major issue is that you want the server to detect and drop the old connections as they become stale.
Have you considered setting a timeout on the server-side connection socket (the connection Socket, not the ServerSocket) so you can close/drop it after a certain period? Perhaps after SO_TIMEOUT expires on the Socket, you could test it with an echo/keepalive command to verify that the connection is still good.
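A sketch of that suggestion (the threshold and probe payload are hypothetical): set SO_TIMEOUT on the accepted socket, and when a read times out, probe the client. A dead connection eventually surfaces as an IOException, at which point the socket is dropped.
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Runs inside the per-client thread.
void handleClient(Socket client) throws IOException {
    client.setSoTimeout(60_000); // read() throws after 60 s of silence
    try (InputStream in = client.getInputStream();
         OutputStream out = client.getOutputStream()) {
        while (true) {
            try {
                int b = in.read();
                if (b == -1) {
                    break; // client closed the connection cleanly
                }
                // ... handle the message bytes ...
            } catch (SocketTimeoutException idle) {
                // Probe the peer: on a stale connection this eventually fails
                // with "Connection reset", and the socket gets dropped below.
                out.write('\n');
                out.flush();
            }
        }
    } // closing the streams also closes the socket, stale or not
}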
