I have a NATS Streaming cluster with 3 nodes set up. It seems that messages published by my Java application during server downtime are lost (i.e. they are not republished when the servers come back up).
A more detailed description:
NATS cluster online. Publisher and Subscriber applications come online. Publisher begins to publish a message every second. Subscriber receives messages.
NATS servers are shut off. Publisher continues to publish messages (let's call these messages 'offline messages'). Subscriber stops receiving anything.
NATS servers come back online. Subscriber begins to receive messages again, but 'offline messages' are never received.
Both my publisher and subscriber applications are configured to keep attempting reconnection to the NATS servers and never time out. I do not get any exceptions at any point.
NATS connection:
Options options = new Options.Builder().servers(serverList).maxReconnects(-1).build();
Connection nc = Nats.connect(options);
StreamingConnectionFactory cf = new StreamingConnectionFactory(natsProperties.getClusterId(), natsProperties.getClientId());
cf.setNatsConnection(nc);
streamingConnection = cf.createConnection();
Publisher:
// subject and message String variables are passed in
streamingConnection.publish(subject, message.getBytes());
Subscriber:
streamingConnection.subscribe(subject, new MessageHandler() {
    public void onMessage(Message m) {
        System.out.printf("Received msg: %s\n", new String(m.getData()));
    }
}, new SubscriptionOptions.Builder().durableName(durableName).build());
From the docs, the Java NATS client seems to have a reconnect buffer built in. I tried increasing the buffer by a factor of 10, but to no avail (and my messages are only 2-digit numbers, so buffer size should not be the limit). How do I get these 'offline messages' resent?
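For reference, the buffer increase was along these lines (a sketch only; it assumes the jnats Options.Builder.reconnectBufferSize setting and its documented 8 MB default, and this buffer only holds raw outgoing writes during a reconnect rather than acting as a streaming-level replay mechanism):
// Sketch: enlarge the core NATS reconnect buffer tenfold (assumes the 8 MB default).
// This only buffers outgoing protocol writes while the client is reconnecting;
// it does not guarantee that NATS Streaming publishes are acknowledged or replayed.
Options options = new Options.Builder()
        .servers(serverList)
        .maxReconnects(-1)
        .reconnectBufferSize(10L * 8 * 1024 * 1024)
        .build();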
I have the same problem. The only workaround I can see is to use a different subscription method and keep track of the message sequence numbers myself, but I don't think that is the best solution:
// Receive messages starting at a specific sequence number
sc.subscribe("foo", new MessageHandler() {
public void onMessage(Message m) {
logger.info("Sequence message " + m.getSequence());
System.out.printf("Received a message: %s\n", m.getData());
}
}, new SubscriptionOptions.Builder().startAtSequence(22).build());
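One way to make this workable (a sketch, not the original code): persist the last sequence you processed and resubscribe from the next one after a reconnect. The loadLastSequence/saveLastSequence helpers below are hypothetical placeholders for whatever durable store you use.
// Hypothetical helpers: loadLastSequence()/saveLastSequence() read and write
// the last processed sequence number in a file or database.
final long lastSeq = loadLastSequence();
sc.subscribe("foo", new MessageHandler() {
    public void onMessage(Message m) {
        System.out.printf("Received a message: %s\n", new String(m.getData()));
        saveLastSequence(m.getSequence());   // record progress as messages arrive
    }
}, new SubscriptionOptions.Builder().startAtSequence(lastSeq + 1).build());
Note that this only lets the subscriber catch up on messages the cluster actually stored; publishes that failed while the servers were down never reach the channel in the first place.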
I am using Netty (4.0.4) in a Java application to implement a TCP client that communicates with an external hardware driver. One requirement of this hardware is that the client sends a KEEP_ALIVE (heartbeat) message every 30 seconds; the hardware, however, does not respond to this heartbeat.
My problem is that when the connection is abruptly broken (e.g. network cable unplugged), the client is completely unaware of this and keeps sending the KEEP_ALIVE message for much longer (around 5-10 minutes) before it gets an operation timeout exception.
In other words, from the client side there is no way to tell if it is still connected.
Below is a snippet of my bootstrap setup, in case it helps.
// bootstrap setup
bootstrap = new Bootstrap().group(group)
.channel(NioSocketChannel.class)
.option(ChannelOption.SO_KEEPALIVE, true)
.option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 3000)
.remoteAddress(ip, port)
.handler(tcpChannelInitializer);
// part of the pipeline responsible for keep alive messages
pipeline.addLast("idleStateHandler", new IdleStateHandler(0, 0, 30, TimeUnit.SECONDS));
pipeline.addLast("keepAliveHandler", keepAliveMessageHandler);
Since the client is sending keep-alive messages and those messages are not received at the other end, I would expect the missing acknowledgements to reveal a problem with the connection much earlier. Why doesn't that happen?
EDIT
Code from the KeepAliveMessageHandler
public class KeepAliveMessageHandler extends ChannelDuplexHandler
{
    private static final Logger LOGGER = getLogger(KeepAliveMessageHandler.class);
    private static final String KEEP_ALIVE_MESSAGE = "";

    @Override
    public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception
    {
        if (!(evt instanceof IdleStateEvent)) {
            return;
        }
        IdleStateEvent e = (IdleStateEvent) evt;
        Channel channel = ctx.channel();
        if (e.state() == IdleState.ALL_IDLE) {
            LOGGER.info("Sending KEEP_ALIVE_MESSAGE");
            channel.writeAndFlush(KEEP_ALIVE_MESSAGE);
        }
    }
}
EDIT 2
I tried to explicitly ensure that the keep-alive message was delivered, using the code below:
@Override
public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception
{
    if (!(evt instanceof IdleStateEvent)) {
        return;
    }
    IdleStateEvent e = (IdleStateEvent) evt;
    Channel channel = ctx.channel();
    if (e.state() == IdleState.ALL_IDLE) {
        LOGGER.info("Sending KEEP_ALIVE_MESSAGE");
        channel.writeAndFlush(KEEP_ALIVE_MESSAGE).addListener(future -> {
            if (!future.isSuccess()) {
                LOGGER.error("KEEP_ALIVE message write error");
                channel.close();
            }
        });
    }
}
This also does not work. :( According to this answer, that behaviour makes sense, but I am still hoping there is some way to figure out whether the write was a "real" success. (Having the hardware ack the heartbeat is not possible.)
You have enabled TCP keepalive:
.option(ChannelOption.SO_KEEPALIVE, true)
but I can't see anything in your code that makes those keepalive probes go out at a 30-second rate; their timing is controlled by the operating system, not by this option alone.
If a connection has been terminated due to a TCP Keepalive time-out and the other host eventually sends a packet for the old connection, the host that terminated the connection will send a packet with the RST flag set to signal the other host that the old connection is no longer active. This will force the other host to terminate its end of the connection so a new connection can be established.
Typically TCP keepalives are sent every 45 or 60 seconds on an idle TCP connection, and the connection is dropped after 3 sequential ACKs are missed. This varies by host; e.g. by default Windows PCs send the first TCP keepalive packet after 7,200,000 ms (2 hours), then send 5 keepalives at 1000 ms intervals, dropping the connection if there is no response to any of the keepalive packets.
(taken from http://ltxfaq.custhelp.com/app/answers/detail/a_id/1512/~/tcp-keepalives-explained)
I do understand now that
pipeline.addLast("idleStateHandler", new IdleStateHandler(0, 0, 30, TimeUnit.SECONDS));
pipeline.addLast("keepAliveHandler", keepAliveMessageHandler);
will trigger an idle event every 30 seconds of mutual inactivity, and keepAliveMessageHandler will send a packet to the remote side in that case.
Unfortunately
ChannelFuture future = channel.writeAndFlush(KEEP_ALIVE_MESSAGE);
is considered a success as soon as the message has been written to the OS buffers.
It seems that under your conditions you have only 2 options:
1. Sending a command that gets some response from the external device (something that will not cause disruption). But I would assume that this is impossible in your case.
2. Modifying the underlying TCP driver settings.
The default OS settings for TCP keepalive are geared toward conserving system resources and supporting a large number of applications and connections. If you have a dedicated system, you can set a more aggressive TCP keepalive configuration.
Here is a link on how to adjust the Linux kernel settings: http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html
This approach should work on plain installations as well as in VMs and Docker containers.
General information on the topic: https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
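If the client runs on Linux, a per-socket alternative (a sketch, assuming a Netty 4 version that ships the netty-transport-native-epoll module, and reusing ip, port and tcpChannelInitializer from the question) is to tune the keepalive timers on the bootstrap instead of system-wide:
// Sketch: Linux-only, requires the native epoll transport (netty-transport-native-epoll).
// With SO_KEEPALIVE enabled, these options make the kernel probe an idle connection
// after 30 s, retry every 5 s, and drop it after 3 failed probes.
EventLoopGroup group = new EpollEventLoopGroup();
Bootstrap bootstrap = new Bootstrap()
        .group(group)
        .channel(EpollSocketChannel.class)
        .option(ChannelOption.SO_KEEPALIVE, true)
        .option(EpollChannelOption.TCP_KEEPIDLE, 30)
        .option(EpollChannelOption.TCP_KEEPINTVL, 5)
        .option(EpollChannelOption.TCP_KEEPCNT, 3)
        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 3000)
        .remoteAddress(ip, port)
        .handler(tcpChannelInitializer);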
I have a Spring application that consumes messages on a specific port (say 9001), restructures them, and then forwards them to a RabbitMQ server. The code segment is:
private void send(String routingKey, String message) throws Exception {
    String exchange = applicationConfiguration.getAMQPExchange();
    String exchangeType = applicationConfiguration.getAMQPExchangeType();
    Connection connection = myConnection.getConnection();
    Channel channel = connection.createChannel();
    channel.exchangeDeclare(exchange, exchangeType);
    channel.basicPublish(exchange, routingKey, null, message.getBytes());
    log.debug(" [CORE: AMQP] Sent message with key {} : {}", routingKey, message);
}
If the RabbitMQ server fails (crashes, runs out of RAM, is turned off, etc.), the code above blocks, preventing the upstream service from receiving messages (a bad thing). I am looking for a way to prevent this behaviour without losing messages, so that they can be resent at some point in the future.
I am not sure how best to address this. One option might be to queue the messages to a disk file and then use a separate thread to read them and forward them to the RabbitMQ server?
If I understand correctly, the issue you are describing is a known JDK socket behaviour when the connection is lost mid-write. See this mailing list thread: http://markmail.org/thread/3vw6qshxsmu7fv6n.
Note that if RabbitMQ is shut down, the TCP connection should be closed in a way that's quickly observable by the client. However, it is true that stale TCP connections can take a while to be detected, which is why RabbitMQ's core protocol has heartbeats. Set the heartbeat interval to a low value (say, 6-8 seconds) and the client itself will notice an unresponsive peer within that amount of time.
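A minimal sketch of that, assuming your connection is built with the RabbitMQ Java client's ConnectionFactory (the host name below is hypothetical):
// Sketch: a low requested heartbeat (in seconds) so a dead or unresponsive broker
// connection is detected within a few seconds rather than minutes.
ConnectionFactory factory = new ConnectionFactory();
factory.setHost("rabbitmq.example.org");   // hypothetical host
factory.setRequestedHeartbeat(6);
Connection connection = factory.newConnection();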
You need to use publisher confirms [1], but also account for the fact that the app itself can go down right before sending a message. As you rightly point out, a disk-based WAL (write-ahead log) is a common solution for this problem. Note that it is both quite tricky to get right and still leaves a time window in which your app process shutting down can result in an unpublished and unlogged message.
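A sketch of publisher confirms applied to the send() method from the question, reusing its connection, exchange, exchangeType, routingKey and message variables; the 5-second timeout is an arbitrary choice:
// Sketch: with confirms enabled, basicPublish is followed by a wait for the
// broker's acknowledgement; waitForConfirmsOrDie throws if it does not arrive in time.
Channel channel = connection.createChannel();
channel.confirmSelect();
channel.exchangeDeclare(exchange, exchangeType);
channel.basicPublish(exchange, routingKey, null, message.getBytes());
channel.waitForConfirmsOrDie(5000);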
No promises on the time frame, but the idea of adding a WAL to the Java client has been discussed.
[1] http://www.rabbitmq.com/confirms.html
I'm beginning to use MQTT and I'm having a hard time handling an unreliable network.
I'm using the Paho Java client (from Groovy) to publish messages to a remote Mosquitto broker.
Is there a way, when the broker is unreachable, to have the Paho client persist the message, automatically reconnect to the broker, and publish the locally stored messages? Or do I have to handle everything myself, using for example a local broker?
Here is my client building code:
String persistenceDir = config['persistence-dir'] ?: System.getProperty('java.io.tmpdir')
def persistence = new MqttDefaultFilePersistence(persistenceDir)
client = new MqttAsyncClient(uri, clientId, persistence)
client.setCallback(this)
options = new MqttConnectOptions()
if (config.password) {
options.setPassword(config.password as char[])
options.setUserName(config.user)
}
options.setCleanSession(false)
client.connect(options)
And my publish code:
def message = new MqttMessage(Json.encode(outgoingMessage).getBytes())
try {
    client?.connect(options)
    def topic = client.getTopic('processMsg')
    message.setQos(1)
    def token = topic.publish(message)
    if (client) {
        client.disconnect()
    }
} catch (MqttException e) {
    // handle connect/publish failures
    e.printStackTrace()
}
Thanks
The Paho client will only persist in-flight messages when it is connected to the broker.
Typically, when connectivity issues start to appear, you'll see message timeouts popping up:
Timed out waiting for a response from the server (32000)
At that point the message will still be persisted.
However, when the connection is lost and you start seeing this:
Client is not connected (32104)
You should assume that the message has not been persisted by Paho.
You can debug this in org.eclipse.paho.client.mqttv3.internal.ClientComms:
/**
 * Sends a message to the broker if in connected state, but only waits for the message to be
 * stored, before returning.
 */
public void sendNoWait(MqttWireMessage message, MqttToken token) throws MqttException {
    final String methodName = "sendNoWait";
    if (isConnected() ||
            (!isConnected() && message instanceof MqttConnect) ||
            (isDisconnecting() && message instanceof MqttDisconnect)) {
        this.internalSend(message, token);
    } else {
        //#TRACE 208=failed: not connected
        log.fine(className, methodName, "208");
        throw ExceptionHelper.createMqttException(MqttException.REASON_CODE_CLIENT_NOT_CONNECTED);
    }
}
The internalSend will persist the message, but only if it is connected to the broker.
Also take into account that there is a maximum number of in-flight messages that Paho can process. If that limit is exceeded, it will also decide not to persist the message.
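So if delivery matters, the application has to notice these failures itself. Below is a sketch of one way to do that, reusing the client and message variables from the question; the pendingMessages queue is a hypothetical application-side buffer, not part of Paho, and recent Paho versions also let you raise the in-flight limit via MqttConnectOptions.setMaxInflight(int).
// Hypothetical application-side buffer for messages Paho refused to accept.
Queue<MqttMessage> pendingMessages = new ConcurrentLinkedQueue<>();
try {
    client.publish("processMsg", message);
} catch (MqttException e) {
    if (e.getReasonCode() == MqttException.REASON_CODE_CLIENT_NOT_CONNECTED) {
        pendingMessages.add(message);   // replay these after the next successful connect
    } else {
        throw e;
    }
}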
You could just set up a local broker and bridge it with the remote broker. That way you can queue up all your messages locally, and when the remote broker comes back online they can all be delivered.
Yes... after you get an exception saying the message can't be delivered, the message has to be either persisted by your application or regenerated.
If you plan to use a local broker you can look at Really Small Message Broker (https://www.ibm.com/developerworks/community/groups/service/html/communityview?communityUuid=d5bedadd-e46f-4c97-af89-22d65ffee070)
I have some doubts regarding QoS=2 settings.
I am using QoS=2 for my MQTT publisher and subscribers. To my knowledge, setting QoS=2 avoids duplicate delivery of a message among subscribers. I have set QoS=2 in the publisher, and I have two subscribers listening on the same topic. My code runs correctly, but both subscribers receive the same message.
By setting QoS=2, only one subscriber should get the message, right?
How can I solve this issue?
public class PubSync {
    public static void main(String[] args) {
        try {
            MqttClient client = new MqttClient(TCPAddress, MqttClient.generateClientId());
            MqttTopic topic = client.getTopic(MYTOPIC);
            MqttMessage message = new MqttMessage(msg.getBytes());
            message.setQos(2);
            client.connect();
            MqttDeliveryToken token = topic.publish(message);
            token.waitForCompletion();
            client.disconnect();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
QoS 2 means that each subscriber will receive exactly one copy of any given message.
This differs from QoS 1, where a subscriber may receive multiple copies of the same message while the broker ensures the message is delivered.
The QoS levels do not change in any way how many subscribers will see a message.
Depending on the MQTT messaging provider you are using, you should be able to share a subscription to a topic across multiple subscribers so that only one subscriber receives each message. In this case the messaging provider handles distributing the workload evenly across all the subscribers.
This is known as shared subscriptions and you can read more about how it works in IBM's MessageSight product here: http://pic.dhe.ibm.com/infocenter/ism/v1r0m0/topic/com.ibm.ism.doc/Overview/ov30010.html
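For example, with a broker that supports shared subscriptions through the MQTT v5-style $share/<group>/<topic> filter (an assumption about your broker; IBM MessageSight uses a $SharedSubscription/<name>/<topic> form instead), every subscriber joins the same group and the broker delivers each message to only one of them. This sketch reuses TCPAddress and MYTOPIC from your code, assumes a Paho version whose subscribe overload accepts an IMqttMessageListener, and the group name "workers" is hypothetical.
// Sketch: both subscriber instances use the same shared group ("workers" is a
// hypothetical group name), so each message is delivered to only one of them.
MqttClient subscriber = new MqttClient(TCPAddress, MqttClient.generateClientId());
subscriber.connect();
subscriber.subscribe("$share/workers/" + MYTOPIC, 2,
        (topic, msg) -> System.out.printf("Received: %s%n", new String(msg.getPayload())));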
I'm on the dev team for a socket server which uses Netty. When a client sends a request, and the server sends a single response, the round trip time is quite fast. (GOOD) We recently noticed that if the request from the client triggers two messages from the server, even though the server writes both messages to the client at about the same time, there is a delay of more than 200ms between the first and second message arriving on the remote client. When using a local client the two messages arrive at the same time. If the remote client sends another request before the second message from the server arrives, that second message is sent immediately, but then the two messages from the new request are both sent with the delay of over 200ms.
Since it was noticed while using Netty 3.3.1, I tried upgrading to Netty 3.6.5 but I still see the same behavior. We are using NIO, not OIO, because we need to be able to support large numbers of concurrent clients.
Is there a setting that we need to configure that will reduce that 200+ ms delay?
Editing to add a code snippet. I hope these are the most relevant parts.
@Override
public boolean openListener(final Protocol protocol,
        InetSocketAddress inetSocketAddress) throws Exception {
    ChannelFactory factory = new NioServerSocketChannelFactory(
            Executors.newCachedThreadPool(),
            Executors.newCachedThreadPool(),
            threadingConfiguration.getProcessorThreadCount());
    ServerBootstrap bootstrap = new ServerBootstrap(factory);
    final ChannelGroup channelGroup = new DefaultChannelGroup();
    bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
        .... lots of pipeline setup snipped ......
    });
    Channel channel = bootstrap.bind(inetSocketAddress);
    channelGroup.add(channel);
    channelGroups.add(channelGroup);
    bootstraps.add(bootstrap);
    return true;
}
The writer factory uses ChannelBuffers.dynamicBuffer(defaultMessageSize) for the buffer, and when we write a message it's Channels.write(channel, msg).
What else would be useful? The developer who migrated the code to Netty is not currently available, and I'm trying to fill in.
200 ms strikes me as the magic number of Nagle's algorithm. Try setting tcpNoDelay to true on both sides.
This is how you set the option for the server side.
serverBootstrap.setOption("child.tcpNoDelay", true);
This is for the client side.
clientBootStrap.setOption("tcpNoDelay", true);
Further reading: http://www.stuartcheshire.org/papers/NagleDelayedAck/