Lettuce fail over in cluster

I am trying to get Lettuce to connect to the newly promoted master (the former slave) after the old master fails, but all writes stop. They resume only once the failed host reconnects, now as a slave, and from then on they go to the new master.
I tried enabling periodic topology refresh as well as adaptive refresh on all triggers, but it didn't help. Is there another setting I have to use?
This is how I configured the client:
final List<RedisURI> redisURIs = buildRedisURIs(redisServerSettings.getNodes());
final RedisClusterClient client = RedisClusterClient.create(clientResources, redisURIs);
final ClusterTopologyRefreshOptions refreshOptions =
ClusterTopologyRefreshOptions.builder()
.enableAllAdaptiveRefreshTriggers()
.adaptiveRefreshTriggersTimeout(Duration.ofMinutes(2))
.refreshTriggersReconnectAttempts(2)
.enablePeriodicRefresh(Duration.ofMinutes(10))
.build();
client.setOptions(ClusterClientOptions.builder().topologyRefreshOptions(refreshOptions).build());

I solved the problem.
Lettuce was not applying a command timeout in my setup, so it waited indefinitely for a response from the failed server. After I set a timeout, some transactions failed, but after those failures the reads and writes continued.
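To see why a timeout unblocks the client, here is a minimal JDK-only sketch. It is not Lettuce code, and BoundedWait and awaitReply are hypothetical names. A reply that never arrives makes an unbounded wait hang, while a bounded wait fails that one call and lets the caller move on. In Lettuce itself the timeout is typically configured on the RedisURI or through the client options; check the documentation for your version.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BoundedWait {
    /** Waits for a reply for at most the given timeout instead of blocking forever. */
    static String awaitReply(CompletableFuture<String> reply, Duration timeout) {
        try {
            return reply.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Give up on this one command so that later commands can proceed.
            reply.cancel(true);
            return "TIMEOUT";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "INTERRUPTED";
        } catch (ExecutionException e) {
            return "ERROR: " + e.getCause();
        }
    }

    public static void main(String[] args) {
        // Stands in for a command sent to a node that has just died: never completed.
        CompletableFuture<String> lost = new CompletableFuture<>();
        System.out.println(awaitReply(lost, Duration.ofMillis(100)));

        // A command answered by a healthy node returns normally.
        CompletableFuture<String> answered = CompletableFuture.completedFuture("PONG");
        System.out.println(awaitReply(answered, Duration.ofMillis(100)));
    }
}
```

Calling reply.get() without a timeout on the first future would never return, which matches the stalled writes described in the question.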

Related

gRPC connection cycling

We are setting up a cluster to handle inferencing (with TensorFlow Serving) over gRPC. We intend to use a layer-7 load balancer (AWS ALB) to distribute the load. For our workload, inferencing will occur many times per minute from each client account. It is my understanding that gRPC holds connection state for each of these channels. As a result, for the ALB to do its job, we need to periodically tear down and rebuild the connection on the client instance.
My question: what is the best practice for cycling a connection in Java?
Below is my proposed code, which would be called every couple of minutes on each client channel. I assume that while the first connection is being shut down, we can create a new one and immediately issue a request on it. Or do we need to wait until the prior channel has shut down? In our situation, the channel will very likely be idle, since the previous request will have been issued about 10 seconds earlier.
if (mChannel != null)
mChannel.shutdown();
mChannel = ManagedChannelBuilder.forAddress(mHost, mPort).usePlaintext().build();
mStub = PredictionServiceGrpc.newBlockingStub(mChannel);

The best practice is to use lookaside load balancing.
However, you can make a few tweaks so that client connections get terminated.
var builder = ManagedChannelBuilder.forAddress(mHost, mPort)
.keepAliveTime(15, TimeUnit.SECONDS)
.keepAliveTimeout(5, TimeUnit.SECONDS);
This configuration is meant to terminate sticky gRPC connections so that the AWS ALB can balance requests evenly.
There are other options you can try depending on your use case, e.g. retries. See ManagedChannelBuilder.
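On the ordering asked about in the question: ManagedChannel.shutdown() is an orderly shutdown (calls already started may finish, new calls on that channel are rejected), so the new channel can be built and put into use before the old one is shut down. A generic sketch of that swap follows. ChannelCycler and its parameters are hypothetical names, and the type parameter stands in for ManagedChannel so that the example runs on the JDK alone.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class ChannelCycler<C> {
    private final Supplier<C> factory;   // builds a fresh channel
    private final Consumer<C> closer;    // starts shutdown of an old channel
    private final AtomicReference<C> current;

    ChannelCycler(Supplier<C> factory, Consumer<C> closer) {
        this.factory = factory;
        this.closer = closer;
        this.current = new AtomicReference<>(factory.get());
    }

    /** Returns the channel that new requests should use. */
    C get() {
        return current.get();
    }

    /** Builds the replacement first, swaps it in, then shuts the old channel down. */
    void cycle() {
        C fresh = factory.get();
        C old = current.getAndSet(fresh);
        closer.accept(old);
    }
}
```

With gRPC this would presumably be constructed as new ChannelCycler<>(() -> ManagedChannelBuilder.forAddress(mHost, mPort).usePlaintext().build(), ManagedChannel::shutdown), with stubs created from get() after each cycle.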

Slow message consumption using AmazonSQSClient

I use Spring JMS with a concurrency of 50-100 and allow up to 200 max connections. Everything works as expected until I try to consume a large backlog: with 100k messages on my SQS queue, I read them through the usual Spring JMS listener approach.
@JmsListener // destination omitted in the original post
public void process(String message) {
    count++;
    System.out.println(count);
    // code
}
I see all the logs in my console, but after around 17k messages it starts throwing exceptions, something like: AWS SDK exception: port already in use.
Why do I see this exception, and how do I get rid of it?
I tried looking for it on the internet but couldn't find anything.
My settings:
Concurrency: 50-100
Max messages per task: 50
Acknowledge mode: client acknowledge
timestamp=10:27:57.183, level=WARN , logger=c.a.s.j.SQSMessageConsumerPrefetch, message={ConsumerPrefetchThread-30} Encountered exception during receive in ConsumerPrefetch thread,
javax.jms.JMSException: AmazonClientException: receiveMessage.
at com.amazon.sqs.javamessaging.AmazonSQSMessagingClientWrapper.handleException(AmazonSQSMessagingClientWrapper.java:422)
at com.amazon.sqs.javamessaging.AmazonSQSMessagingClientWrapper.receiveMessage(AmazonSQSMessagingClientWrapper.java:339)
at com.amazon.sqs.javamessaging.SQSMessageConsumerPrefetch.getMessages(SQSMessageConsumerPrefetch.java:248)
at com.amazon.sqs.javamessaging.SQSMessageConsumerPrefetch.run(SQSMessageConsumerPrefetch.java:207)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: Address already in use: connect
Update: I looked into the problem, and it seems that new sockets keep being created until all available sockets are exhausted.
My Spring JMS version is 4.3.10.
To reproduce the problem, use the configuration above with max connections set to 200 and concurrency set to 50-100, and push some 40k messages to the SQS queue. You can use https://github.com/adamw/elasticmq as a local server that emulates Amazon SQS. Comment out the @JmsListener annotation so that nothing consumes from the queue, then use a SoapUI load test to call the send-message endpoint and fire many messages. Once you have sent 40k messages, stop, uncomment @JmsListener, and restart the server.
Update :
DefaultJmsListenerContainerFactory factory =
new DefaultJmsListenerContainerFactory();
factory.setConnectionFactory(connectionFactory);
factory.setDestinationResolver(new DynamicDestinationResolver());
factory.setErrorHandler(Throwable::printStackTrace);
factory.setConcurrency("50-100");
factory.setSessionAcknowledgeMode(Session.CLIENT_ACKNOWLEDGE);
return factory;
Update :
SQSConnectionFactory connectionFactory = new SQSConnectionFactory( new ProviderConfiguration(), amazonSQSclient);
Update :
Client configuration details :
Protocol : HTTP
Max connections : 200
Update :
I used the CachingConnectionFactory class. I then read on Stack Overflow and in the official documentation that CachingConnectionFactory should not be used together with DefaultJmsListenerContainerFactory.
https://stackoverflow.com/a/21989895/5871514
It gives the same error that I got before, though.
Update:
My goal is to reach 500 TPS, i.e. I should be able to consume that many messages per second. With this method I can reach 100-200, but not more than that. On top of that, this error is a blocker at high concurrency. If you have a better way to achieve this, I am all ears.
Update:
I am using AmazonSQSClient.

Starvation on the Consumer
One optimization that JMS clients tend to implement is a message consumption buffer, or "prefetch". This buffer is sometimes tunable by number of messages or by buffer size in bytes.
The intention is to keep the consumer from going to the server every single time it receives a message; instead, it pulls multiple messages in a batch.
In an environment with many "fast consumers" (which is the opinionated view these libraries may take), this prefetch is set to a fairly high default in order to minimize round trips.
However, in an environment with slow message consumers, this prefetch can be a problem. A slow consumer holds on to its prefetched messages, keeping them away from faster consumers that could have processed them. In a highly concurrent environment, this can cause starvation quickly.
For that reason, SQSConnectionFactory has a property for this:
SQSConnectionFactory sqsConnectionFactory = new SQSConnectionFactory( new ProviderConfiguration(), amazonSQSclient);
sqsConnectionFactory.setNumberOfMessagesToPrefetch(0);
Starvation on the Producer (i.e. via JmsTemplate)
It's very common for JMS implementations to expect to be interfaced with the broker through some intermediary. These intermediaries cache and reuse connections or use a pooling mechanism. In the Java EE world, this is usually taken care of by a JCA adapter or another mechanism on the Java EE server.
Because of the way Spring JMS works, it expects an intermediary delegate for the ConnectionFactory to do this caching or pooling. Otherwise, Spring JMS will attempt to open a new connection and session (!) every time you want to do something with the broker.
To solve this, Spring provides a few options. The simplest is CachingConnectionFactory, which caches a single Connection and allows many Sessions to be opened on that Connection. A simple way to add this to your @Configuration above would be something like:
@Bean
public ConnectionFactory connectionFactory(AmazonSQSClient amazonSQSclient) {
    SQSConnectionFactory sqsConnectionFactory = new SQSConnectionFactory(new ProviderConfiguration(), amazonSQSclient);
    // Doing the following is key!
    CachingConnectionFactory cachingConnectionFactory = new CachingConnectionFactory();
    cachingConnectionFactory.setTargetConnectionFactory(sqsConnectionFactory);
    // Set the cachingConnectionFactory properties to your liking here...
    // Return the caching wrapper, not the raw SQS factory.
    return cachingConnectionFactory;
}
If you want something more fancy as a JMS pooling solution (which will pool Connections and MessageProducers for you in addition to multiple Sessions), you can use the reasonably new PooledJMS project's JmsPoolConnectionFactory, or the like, from their library.
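The caching idea itself is small. The following JDK-only sketch (CachedConnection is a hypothetical name, not a Spring or AWS class) creates the underlying resource once and hands the same instance to every caller, which is roughly what CachingConnectionFactory does for the JMS Connection. Without such a wrapper, each operation would pay for a fresh connection and socket.

```java
import java.util.function.Supplier;

final class CachedConnection<C> implements Supplier<C> {
    private final Supplier<C> target; // the expensive factory being wrapped
    private C cached;                 // the single shared instance, created lazily

    CachedConnection(Supplier<C> target) {
        this.target = target;
    }

    /** Returns the shared instance, creating it on first use only. */
    @Override
    public synchronized C get() {
        if (cached == null) {
            cached = target.get();
        }
        return cached;
    }

    /** Drops the shared instance, e.g. after a connection failure, so the next get() reconnects. */
    synchronized void reset() {
        cached = null;
    }
}
```

A thousand calls to get() on one CachedConnection invoke the wrapped factory exactly once, whereas calling the factory directly would create a thousand connections.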

Automatically reconnect Storm Topology to Redis Cluster on Redis restart

I have created a Storm topology which connects to a Redis Cluster using the Jedis library. The Storm component always expects Redis to be up and running; only then does it connect to Redis and subscribe to events. Currently we use Redis pub/sub.
Below is a code sample that shows my Jedis connectivity to Redis inside Storm.
try {
jedis.psubscribe(listener, pattern);
} catch(Exception ex) {
//catch statement here.
} finally {
pool.returnResource(jedis);
}
....
pool = new JedisPool(new JedisPoolConfig(), host, port); //redis host port
ListenerThread listener = new ListenerThread(queue, pool, pattern);
listener.start();
EXPECTED BEHAVIOUR
Once Redis dies and comes back online, Storm should detect that Redis is available again. It should not need a restart when Redis goes down and comes back.
ACTUAL BEHAVIOUR
Whenever Redis restarts for any reason, I have to restart the Storm topology as well; only then does it start listening to Redis again.
QUESTION
How can I make Storm listen and reconnect to Redis again after Redis is restarted? Any guidance (docs, forum answers) would be appreciated.

Catch the exception raised when the connection is lost and set the pool to null.
(Assuming you are doing this in a Spout.) Use an if-else statement: if pool is null, create a new JedisPool instance and assign it to pool, as in your code:
pool = new JedisPool(new JedisPoolConfig(), host, port); //redis host port
If pool is not null (meaning it is connected), continue your work.

This is a common issue with Apache Storm: the connection thread stays alive in a stale state even though the source you are consuming from is down or has been restarted. Ideally it should retry and create a new connection thread instead of reusing the existing one. Hence the idea is to automate this by detecting the exception (e.g. JMSConnectionError in the case of JMS).
Refer to this Failover Consumer Example, which will give you a brief idea of what to do in such cases. (P.S. That example is for JMS; in your case it would be Redis.)
The steps would be something like this:
1. Catch the exception on an error or a lost connection.
2. Re-initialize the connection from the catch block (unless the program closed it voluntarily).
3. If an exception occurs again, go to step 1.
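A minimal sketch of those steps in plain Java, with hypothetical names throughout (Reconnector, subscribeOnce, the backoff value): the blocking subscribe call is wrapped in a loop, and every failure leads to a pause followed by a fresh connection attempt. In the Jedis case the action would create a new pool or connection and call psubscribe, which blocks until the connection drops.

```java
import java.util.concurrent.Callable;

final class Reconnector {
    /**
     * Runs subscribeOnce until it returns normally or has failed maxAttempts times.
     * Returns the number of failures seen.
     */
    static int runWithReconnect(Callable<Void> subscribeOnce, int maxAttempts, long backoffMillis) {
        int failures = 0;
        while (failures < maxAttempts) {
            try {
                // Steps 1 and 2: (re)connect and block inside the subscription.
                subscribeOnce.call();
                return failures; // the subscription ended without an error
            } catch (Exception e) {
                if (e instanceof InterruptedException) {
                    Thread.currentThread().interrupt();
                    break; // shutdown was requested, stop retrying
                }
                failures++;
                try {
                    // Step 3: wait briefly, then loop back and connect again.
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return failures;
    }
}
```

The attempt limit is an assumption that keeps the sketch finite; a long-running listener thread would more likely retry indefinitely with a capped backoff.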

Gremlin server withRemote connection closed - how to reconnect automatically?

I am using withRemote to connect my Java application to a Gremlin Server running in AWS with a DynamoDB storage backend. I get a connection timeout after a few seconds (~3.3 seconds):
org.apache.tinkerpop.gremlin.process.remote.RemoteConnectionException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.nio.channels.ClosedChannelException
I need to figure out how to reconnect, which means detecting that the connection is closed, and I am not sure how to detect that. I get the above exception when I use the graph traversal. Is there a way to discover the closed connection beforehand and reconnect, or is there a configuration option that reconnects automatically (for example, by creating a new connection before this one closes) so that my application is always connected?
In case it helps, this is how I create the connection; currently it is a singleton created when the application starts:
this.graph = EmptyGraph.instance();
GryoMessageSerializerV1d0 gryoMessageSerializerV1d0 = new GryoMessageSerializerV1d0(
GryoMapper.build().addRegistry(JanusGraphIoRegistry.getInstance()));
this.cluster = Cluster.build().serializer(gryoMessageSerializerV1d0)
.addContactPoint(configuration.getString("graphDb.host", "localhost"))
.port(configuration.getInt("graphDb.port", 8182)).create();
this.graphTraversalSource = this.graph.traversal().withRemote(DriverRemoteConnection.using(cluster));

I feel like this problem is already solved by the connection.keepAlive configuration option. It defaults to 180 seconds, which is longer than the 60-second timeout in your load balancer, and that is why it gives up.
That said, the driver should be reconnecting on its own. It's constantly trying to do that given the connectionPool.reconnectInterval but perhaps there is a condition where you're quickly exhausting all the connections to the point of getting that error....not sure. Either way, hopefully the

How to set the timeout for a MQTT client?

I'm using the IA92 Java implementation for MQTT, which allows me to connect to a MQTT broker. In order to establish the connection, I'm doing something like this:
// Create connection spec
String mqttConnSpec = "tcp://the_server@the_port";
// Create the client and connect
mqttClient = MqttClient.createMqttClient(mqttConnSpec, null);
mqttClient.connect("the_id", true, 666);
The problem is that sometimes the server takes too much time to send a response, and it throws a timeout exception:
org.apache.harmony.luni.platform.OSNetworkSystem.connectStreamWithTimeoutSocket(OSNetworkSystem.java:130)
at org.apache.harmony.luni.net.PlainSocketImpl.connect(PlainSocketImpl.java:246)
at org.apache.harmony.luni.net.PlainSocketImpl.connect(PlainSocketImpl.java:533)
at java.net.Socket.connect(Socket.java:1055)
at com.ibm.mqtt.j2se.MqttJava14NetSocket.<init>((null):-1)
at com.ibm.mqtt.j2se.MqttJavaNetSocket.setConnection((null):-1)
at com.ibm.mqtt.Mqtt.tcpipConnect((null):-1)
at com.ibm.mqtt.MqttBaseClient.doConnect((null):-1)
at com.ibm.mqtt.MqttBaseClient.connect((null):-1)
at com.ibm.mqtt.MqttClient.connect((null):-1)
at com.ibm.mqtt.MqttClient.connect((null):-1)
What I need is to set the timeout manually instead of letting the MQTT client decide it. The documentation says: There are also methods for setting attributes of the MQ Telemetry Transport connection, such as timeouts and retries.
But, honestly, I haven't found anything about it. I have looked through the whole Javadoc reference and found no sign of a timeout setting. I can't read the source code since it's not open source.
So how can I set the timeout for the MQTT connection?

If anything is unclear, see MqttConnectOptions for details.
String userName="Ohelig";
String password="Pojke";
MqttClient client = new MqttClient("tcp://192.168.1.4:1883","Sending");
MqttConnectOptions authen = new MqttConnectOptions();
authen.setUserName(userName);
authen.setPassword(password.toCharArray());
authen.setKeepAliveInterval(30);
authen.setConnectionTimeout(300);
client.connect(authen);

I don't know anything about ia92, but I'd imagine that the 666 in the connect() call is what you're trying to set the timeout to?
The timeout the documentation is referring to is probably the keepalive timeout. This is the maximum number of seconds (chosen by the client) that can elapse without communication between the server and client. I think this is what you're most interested in.
Retries on the other hand are most likely to refer to the retrying of messages that seem to have gone astray when sending messages with QoS>0. This will be something handled by the client library code though, rather than the broker. This is something that comes into play only after you've connected though, so I very much doubt it's your problem.
To be sure that the keepalive timeout is being set correctly, I'd try pointing your client at a modified mosquitto broker. You can modify mqtt3_handle_connect() in src/read_handle_server.c to print out the keepalive value when you connect. This will ensure it's doing what you think, but won't help with the actual problem I'm afraid!

Which broker do you use? Really Small Message Broker V1.1 Alpha, Mosquitto, or the broker that comes with IBM WebSphere? You need to set this timeout value in your server configuration, because that is how the system works: a keep-alive interval is in effect on the broker, and the client sends a ping before that interval expires so that the broker does not close the client-server connection; then the interval starts over. In fact, even if the interval expires, the server will not close the connection until the 'grace period' ends. See http://public.dhe.ibm.com/software/dw/webservices/ws-mqtt/mqtt-v3r1.html#connect
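As a worked example of the grace period: the MQTT v3.1 specification linked above says the server disconnects a client if it has heard nothing from it within one and a half times the keep-alive period. A small sketch of that arithmetic follows (KeepAliveDeadline is a hypothetical helper, not part of any MQTT library).

```java
import java.time.Duration;
import java.util.Optional;

final class KeepAliveDeadline {
    /**
     * Time of silence after which a v3.1 broker may drop the client:
     * the keep-alive period plus a grace of half a period.
     * A keep-alive of 0 disables the timer, so there is no deadline.
     */
    static Optional<Duration> disconnectAfter(int keepAliveSeconds) {
        if (keepAliveSeconds < 0) {
            throw new IllegalArgumentException("keep-alive must not be negative");
        }
        if (keepAliveSeconds == 0) {
            return Optional.empty();
        }
        return Optional.of(Duration.ofMillis(keepAliveSeconds * 1500L));
    }

    public static void main(String[] args) {
        // With a 30 second keep-alive the broker waits 45 seconds in total.
        System.out.println(disconnectAfter(30).get().getSeconds()); // prints 45
    }
}
```

This concerns silence on an established connection only; it does not bound how long the initial TCP connect may take, which is the timeout the question is about.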
