I have my Cassandra sink configured as shown below:
ClusterBuilder secureCassandraSinkClusterBuilder = new ClusterBuilder() {
    @Override
    protected Cluster buildCluster(Cluster.Builder builder) {
        return builder.addContactPoints(props.getCassandraClusterUrlAll().split(","))
                .withPort(props.getCassandraPort())
                .withAuthProvider(new DseGSSAPIAuthProvider("HTTP"))
                .build();
    }
};

CassandraSink
        .addSink(cassandraObjectStream)
        .setClusterBuilder(secureCassandraSinkClusterBuilder)
        .build()
        .name("Cassandra-Sink");
Now, when the connection to Cassandra is not configured properly I get a NoHostAvailableException; when the connection unexpectedly drops I get a ConnectionTimeOutException, or sometimes a WriteTimeoutException. This ultimately triggers a JobExecutionException and the whole Flink job terminates.
Where do I catch these Cassandra exceptions? Where are they thrown? I tried putting a try-catch block around the CassandraSink, but that doesn't do it. I want to catch these exceptions and retry connecting to Cassandra in case of a connection time-out, or retry writing to Cassandra in case of a write time-out.
AFAIK, you cannot catch these exceptions through CassandraSink.
One way to catch exceptions like TimeoutException is to implement your own sink for Cassandra, but that may take a lot of time...
Another way, if you run a streaming job, is to set the restart strategy to more than one attempt through StreamExecutionEnvironment.setRestartStrategy and enable checkpointing, so that the streaming job can continue from the last checkpoint. CassandraSink supports a WAL, so EXACTLY_ONCE can be achieved with checkpointing enabled.
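For illustration, a minimal sketch of that setup (the restart count, delay, and checkpoint interval are arbitrary values, not recommendations):

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Restart the job up to 3 times, waiting 10 seconds between attempts,
// instead of failing permanently on the first Cassandra exception.
env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, Time.seconds(10)));

// Checkpoint every 60 seconds so a restart resumes from the last checkpoint.
env.enableCheckpointing(60_000);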
I am running Kafka transactions at a large scale; below is the code snippet.
producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(new ProducerRecord<>(producerTopic, element));
    producer.commitTransaction();
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
    producer.close();
    canSendNext = false;
} catch (KafkaException e) {
    producer.abortTransaction();
}
properties used:
ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, STRING_SERIALIZER
ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, BYTE_ARRAY_SERIALIZER
ProducerConfig.TRANSACTIONAL_ID_CONFIG, UUID.randomUUID().toString()
bootstrap.servers=localhost:9092
acks=all
retries=1
partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner
While commitTransaction is timing out, the KafkaException catch block runs and tries to abort the transaction, which fails with the error:
**Cannot attempt operation abortTransaction because the previous call to commitTransaction timed out and must be retried**
How should I handle the commit-transaction timeout scenario? I was expecting the code to work.
As per the documentation:
Note that this method will raise TimeoutException if the transaction
cannot be committed before expiration of max.block.ms. Additionally,
it will raise InterruptException if interrupted. It is safe to retry
in either case, but it is not possible to attempt a different
operation (such as abortTransaction) since the commit may already be
in the progress of completing. If not retrying, the only option is to
close the producer.
The problem here is that the Kafka producer has timed out and does not know whether the Kafka broker will complete the transaction or not. So it cannot offer you abortTransaction, because there is a chance the transaction was committed by the broker even after the producer timed out.
You should configure your Kafka producer with a sufficiently large max.block.ms and a reasonable number of retries to handle such scenarios (I am not sure why you have configured retries to 1). Ideally you should time out very rarely, e.g. when there is a network issue or an actual problem in the Kafka brokers.
In such scenarios it is not possible for you to know whether your last transaction actually succeeded. You cannot do anything but close your Kafka producer and create a new one.
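For illustration, a hedged sketch of that handling around the snippet from the question, splitting the exceptions according to the javadoc quoted above (the imports and the retry comment are assumptions, not tested code):

import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.AuthorizationException;
import org.apache.kafka.common.errors.InterruptException;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.common.errors.TimeoutException;

try {
    producer.beginTransaction();
    producer.send(new ProducerRecord<>(producerTopic, element));
    producer.commitTransaction();
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
    // Fatal: the producer must be closed and recreated.
    producer.close();
} catch (TimeoutException | InterruptException e) {
    // Per the javadoc, it is safe to retry the same commit, but NOT to abort.
    // A real implementation would bound the retries and close the producer
    // if the retried commit also fails.
    producer.commitTransaction();
} catch (KafkaException e) {
    // Other errors: abort so the transaction can be retried from the start.
    producer.abortTransaction();
}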
I'm trying to build an application based on GCP Firestore, and therefore I am using the google-cloud-firestore library. I want to stream my query results via the streaming API of Query, but I get a StatusRuntimeException after ~60 s; it seems the operation timed out. Where can I increase this timeout?
What I ultimately want to do is build a stream with Flux which streams a huge amount of data out of Firestore based on a query.
I tried to find a way to increase the timeout via FirestoreOptions but didn't find a working solution. What I can see is that an rpcTimeout of 60 s is set somewhere during initialization of the stream, but I'm not sure that this is the right one, and I couldn't find where to set it.
Method I'm using:
com.google.cloud.firestore.Query
Query.stream(@Nonnull final ApiStreamObserver<DocumentSnapshot> responseObserver)
Exception after ~60s:
com.google.api.gax.rpc.UnavailableException: io.grpc.StatusRuntimeException: UNAVAILABLE: The datastore operation timed out, or the data was temporarily unavailable.
at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:69)
at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72)
at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60)
...
Edited
I just found out that it is possible in general to override the default timeout settings, but not for streaming methods, as the respective method in FirestoreSettings states:
/**
* Applies the given settings updater function to all of the unary API methods in this service.
*
* <p>Note: This method does not support applying settings to streaming methods.
*/
That is really annoying, as I had no intention of creating a stream based on paging via cursors.
The only way I found to control the timeout was like this:
public List<QueryDocumentSnapshot> getAllDocs() {
    try {
        ApiFuture<QuerySnapshot> querySnapshot = db().collection("MyDocs").get();
        return querySnapshot.get(5, TimeUnit.MINUTES).getDocuments();
    } catch (InterruptedException | ExecutionException | TimeoutException e) {
        e.printStackTrace();
    }
    return null;
}
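If the streaming timeout cannot be raised, a hedged sketch of the cursor-based paging mentioned above is shown below (the collection name, page size, ordering field, and the db() helper are assumptions):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import com.google.cloud.firestore.FieldPath;
import com.google.cloud.firestore.Query;
import com.google.cloud.firestore.QueryDocumentSnapshot;

public List<QueryDocumentSnapshot> getAllDocsPaged() throws Exception {
    List<QueryDocumentSnapshot> all = new ArrayList<>();
    // Order by document ID so startAfter() gives a stable cursor; 500 docs per page.
    Query base = db().collection("MyDocs").orderBy(FieldPath.documentId()).limit(500);
    Query page = base;
    while (true) {
        List<QueryDocumentSnapshot> docs =
                page.get().get(5, TimeUnit.MINUTES).getDocuments();
        if (docs.isEmpty()) {
            break;
        }
        all.addAll(docs);
        // Continue after the last document of this page.
        page = base.startAfter(docs.get(docs.size() - 1));
    }
    return all;
}

Each page is fetched with a unary get(), so the 60 s streaming timeout never applies; the cost is the extra round trips per page.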
I'm writing an application with Spring Boot so to write to Kafka I do:
@Autowired
private KafkaTemplate<String, String> kafkaTemplate;
and then inside my method:
kafkaTemplate.send(topic, data)
But I feel like I'm just relying on this to work. How can I know whether it has worked? If it's asynchronous, is it good practice to return a 200 code and hope it worked? I'm confused. If Kafka isn't available, won't this fail? Shouldn't I be prompted to catch an exception?
Along with what @mjuarez has mentioned, you can try playing with two Kafka producer properties. One is ProducerConfig.ACKS_CONFIG, which lets you set the level of acknowledgement that you think is safe for your use case. This knob has three possible values. From the Kafka docs:
acks=0: Producer doesn't care about acknowledgement from server, and considers it as sent.
acks=1: This will mean the leader will write the record to its local log but will respond without awaiting full acknowledgement from all followers.
acks=all: This means the leader will wait for the full set of in-sync replicas to acknowledge the record.
The other property is ProducerConfig.RETRIES_CONFIG. Setting a value greater than zero will cause the client to resend any record whose send fails with a potentially transient error.
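For illustration, a minimal sketch of setting these two properties on a plain producer config (the bootstrap address, serializers, and retry count are assumptions; with Spring Boot the equivalent properties are spring.kafka.producer.acks and spring.kafka.producer.retries):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.ACKS_CONFIG, "all"); // wait for all in-sync replicas
props.put(ProducerConfig.RETRIES_CONFIG, 3);  // retry transient send failures
KafkaProducer<String, String> producer = new KafkaProducer<>(props);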
Yes, if Kafka is not available, that .send() call will fail, but if you send it asynchronously, nothing will notify you. You can specify a callback to be executed when the future finally finishes. Full interface spec here: https://kafka.apache.org/20/javadoc/org/apache/kafka/clients/producer/Callback.html
From the official Kafka javadoc here: https://kafka.apache.org/20/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html
Fully non-blocking usage can make use of the Callback parameter to
provide a callback that will be invoked when the request is complete.
ProducerRecord<byte[], byte[]> record = new ProducerRecord<byte[], byte[]>("the-topic", key, value);
producer.send(record,
    new Callback() {
        public void onCompletion(RecordMetadata metadata, Exception e) {
            if (e != null) {
                e.printStackTrace();
            } else {
                System.out.println("The offset of the record we just sent is: " + metadata.offset());
            }
        }
    });
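Since the question uses Spring's KafkaTemplate rather than a raw producer, a hedged equivalent (assuming spring-kafka before 3.0, where send() returns a ListenableFuture; newer versions return a CompletableFuture instead) would be:

// Attach success/failure callbacks to the future returned by KafkaTemplate.send();
// "topic" and "data" are the variables from the question.
kafkaTemplate.send(topic, data).addCallback(
        result -> System.out.println(
                "Sent to offset " + result.getRecordMetadata().offset()),
        ex -> ex.printStackTrace());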
You can run the console consumer below while sending messages to Kafka:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic topic-name
While the above command is running, run your code; if sending the messages succeeds, they must be printed on the console.
Furthermore, as with any connection to any resource, if the connection cannot be established, any operation will raise some exception.
I have created a Storm topology which connects to a Redis cluster using the Jedis library. The Storm component always expects Redis to be up and running; only then does it connect to Redis and subscribe to events. Currently we use the pub-sub strategy of Redis.
Below is a code sample that shows my Jedis connectivity to Redis inside Storm.
try {
    jedis.psubscribe(listener, pattern);
} catch (Exception ex) {
    // catch statement here.
} finally {
    pool.returnResource(jedis);
}
....
pool = new JedisPool(new JedisPoolConfig(), host, port); // redis host, port
ListenerThread listener = new ListenerThread(queue, pool, pattern);
listener.start();
EXPECTED BEHAVIOUR
Once Redis dies and comes back online, Storm is expected to detect that Redis is available again. It should not need a restart when Redis dies and comes back online.
ACTUAL BEHAVIOUR
Whenever Redis restarts for any reason, I always have to restart the Storm topology as well, and only then does it start listening to Redis again.
QUESTION
How can I make Storm listen and reconnect to Redis after Redis is restarted? Any guidance (docs, forum answers) would be appreciated.
Catch the exception for the connection-lost error and set the pool to null.
(Assuming you are doing this in a Spout) use an if-else statement to check whether pool is null; if it is, create a new JedisPool instance and assign it to pool, as in your code:
pool = new JedisPool(new JedisPoolConfig(), host, port); // redis host, port
If pool is not null (meaning you are connected), continue your work; a sketch of this loop is shown below.
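A minimal sketch of that reconnect loop (the host, port, pattern, and JedisPubSub listener are assumed parameters, not the exact types from the question; the subscribe call would normally live in the ListenerThread):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;
import redis.clients.jedis.JedisPubSub;
import redis.clients.jedis.exceptions.JedisConnectionException;

void subscribeWithReconnect(String host, int port, JedisPubSub listener, String pattern) {
    JedisPool pool = null;
    while (!Thread.currentThread().isInterrupted()) {
        try {
            if (pool == null) {
                // (Re)create the pool once Redis is reachable again.
                pool = new JedisPool(new JedisPoolConfig(), host, port);
            }
            Jedis jedis = pool.getResource();
            try {
                jedis.psubscribe(listener, pattern); // blocks while subscribed
            } finally {
                pool.returnResource(jedis);
            }
        } catch (JedisConnectionException ex) {
            // Redis is down or restarting: drop the stale pool and retry after a pause.
            if (pool != null) {
                pool.destroy();
                pool = null;
            }
            try {
                Thread.sleep(5000);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }
    }
}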
This is a common issue with apache-storm, where the connection thread stays alive in a stale state although the source you are consuming from is down or has been restarted. Ideally it should retry to create a new connection thread instead of reusing the existing one. Hence the idea is to automate this by detecting the exception (e.g. JMSConnectionError in the case of JMS).
Refer to this Failover Consumer Example, which will give you a brief idea of what to do in such cases. (P.S. the example is JMS; in your case it would be Redis instead of JMS.)
The steps would be something like this:
Catch the exception on error or connection loss.
Re-initialize the connection (if it was not voluntarily closed by the program) from the catch block.
On another exception, go back to step 1.
Having declared a method like this using Spring AMQP:
@RabbitListener(..)
public void myMethod(@Header(AmqpHeaders.CHANNEL) Channel channel, @Header(AmqpHeaders.DELIVERY_TAG) Long tag, ...)
and using manual acknowledge mode, how should one properly deal with the IOException that may be thrown when doing ACK:
try {
    channel.basicAck(tag, false);
} catch (IOException e) {
    // What to do here?
}
Should the exception be rethrown? Should the "basicAck" operation be retried? What's the proper way to handle it?
The standard way of doing this is to use a retry mechanism and to bail out if none of the retries succeeds.
However, based on my experience, if the channel throws an exception it more or less means the channel is useless and you might have to redo the whole thing. I normally log the error along with the required details so that I can track which message's processing failed, and verify later whether it was processed or whether I need to do anything about it.
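A hedged sketch of that retry-then-log approach (the logger, retry count, messageId parameter, and the MyListener class name are assumptions):

import java.io.IOException;

import com.rabbitmq.client.Channel;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

private static final Logger log = LoggerFactory.getLogger(MyListener.class);

void ackWithRetry(Channel channel, long tag, String messageId) {
    final int attempts = 3;
    for (int i = 1; i <= attempts; i++) {
        try {
            channel.basicAck(tag, false);
            return; // ack succeeded
        } catch (IOException e) {
            log.warn("basicAck failed (attempt {}/{}) for message {}", i, attempts, messageId, e);
        }
    }
    // The channel is most likely unusable at this point; log enough detail to
    // check later whether the message was processed, instead of rethrowing.
    log.error("Giving up acking message {} (delivery tag {})", messageId, tag);
}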