Is there a way we can detect via code that the hazelcast cluster is rebalancing
Use case: Multiple producer, multiple consumer
Shut down consumer1 abruptly and see how long it takes for the other members to resume producing or consuming. In other words, I am trying to measure the time it takes for the cluster to rebalance.
AFAIK there is no direct API for it.
You could write your own SPI service which receives repartitioning events.
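If a full SPI service is more than you need, a lighter option is to listen for partition migration events on the PartitionService. A minimal sketch, assuming the Hazelcast 3.x MigrationListener API (in 4.x/5.x the listener methods take MigrationState/ReplicaMigrationEvent instead), which you could use to timestamp when rebalancing starts and finishes:
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.MigrationEvent;
import com.hazelcast.core.MigrationListener;

public class RebalanceTimer {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        hz.getPartitionService().addMigrationListener(new MigrationListener() {
            @Override
            public void migrationStarted(MigrationEvent event) {
                // a partition is being moved, i.e. the cluster has started rebalancing
                System.out.println(System.currentTimeMillis() + " migration started: " + event);
            }
            @Override
            public void migrationCompleted(MigrationEvent event) {
                System.out.println(System.currentTimeMillis() + " migration completed: " + event);
            }
            @Override
            public void migrationFailed(MigrationEvent event) {
                System.out.println(System.currentTimeMillis() + " migration failed: " + event);
            }
        });
    }
}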
But can you tell a bit more about your producer consumer problem? Do you want to use an IQueue for that?
Related
To get the details of a produced record, we have two options to choose from:
onCompletion() - callback function
get() method
Could someone please explain the difference between them and how to use them in detail? (Java)
NOTE: The producer properties I'm using are mostly defaults (e.g. batch.size, acks, max.block.ms, ...).
onCompletion() is the async way of producing data to Kafka, while looping with get() on each send is the sync way of writing data to Kafka.
A Kafka producer can write data to a topic at very high throughput. If you use the sync get() call in the producer code, after every write the producer needs to wait for the ack from Kafka, which throttles producer throughput. The producer has to wait for Kafka to store the data, replicate it (depending on how it is configured) and then return the ack on a successful write.
The alternative is onCompletion(): here the producer keeps producing data without waiting for the ack from Kafka. Kafka calls the onCompletion() callback when the write succeeds. The producer needs to keep track of these onCompletion() calls, and if a write fails it needs to retry it.
What producers generally do is send a batch of N records to Kafka, wait for all the completion events, and then send the next N records. This is something like the TCP sliding-window flow-control paradigm.
It is difficult to suggest what you should do. The downside of working with onCompletion() and retrying from there is that it jeopardizes the ordering of the records in Kafka.
The producer may have sent records 1..65 successfully, Kafka may have missed 66..72, and Kafka may then have written 73..99. Once Kafka has finished writing record 99, the producer may get the onCompletion() callbacks for 66 and 67 (since it is an async callback, it can arrive at any time) and retry those records. This essentially jumbles up the record ordering.
In those cases, the consumer needs to understand that all the writes may not be ordered.
My suggestion would be to use onCompletion for a batch of records. Generally, applications don't have very strict ordering requirements. So you could leverage the async nature of the call and improve throughput.
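To make the batch-of-N pattern concrete, here is a minimal sketch using the Kafka Java client's send(record, callback); the bootstrap server, topic name, batch size and error handling are illustrative assumptions:
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BatchedAsyncProducer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumption
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        int batchSize = 100;   // the "N" in the answer above
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int batch = 0; batch < 10; batch++) {
                CountDownLatch latch = new CountDownLatch(batchSize);
                for (int i = 0; i < batchSize; i++) {
                    ProducerRecord<String, String> record =
                            new ProducerRecord<>("my-topic", "value-" + (batch * batchSize + i));
                    // onCompletion is invoked asynchronously when the write succeeds or fails
                    producer.send(record, (metadata, exception) -> {
                        if (exception != null) {
                            // track the failure; note that retrying here can reorder records
                            exception.printStackTrace();
                        }
                        latch.countDown();
                    });
                }
                // wait for all completion events before sending the next N records
                latch.await();
            }
        }
    }
}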
onCompletion() is an asynchronous callback method defined in the Java Kafka client.
get(), on the other hand, is the standard method of java.util.concurrent.Future. When you're using the Java Kafka client, you can call get() on the returned future for synchronous writes, as in the example from the Confluent documentation below:
Future<RecordMetadata> future = producer.send(record);
RecordMetadata metadata = future.get();
I'm using the Quarkus Kafka consumer, and I need to know which partitions my consumer has been assigned by the Kafka broker.
Is there any listener I can use, just like the one the Kafka client provides?
Otherwise, how can I assign a specific partition to each of the nodes of my cluster?
Regards
From the Quarkus docs, I think you can use a rebalance listener.
It should be called, because the initial assignment of partitions to your client (from no partitions to some partitions) can be considered a rebalance too.
https://quarkus.io/guides/kafka#consumer-rebalance-listener
The listener is invoked every time the consumer topic/partition assignment changes. For example, when the application starts, it invokes the partitionsAssigned callback with the initial set of topics/partitions associated with the consumer. If, later, this set changes, it calls the partitionsRevoked and partitionsAssigned callbacks again, so you can implement custom logic.
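A minimal sketch of such a listener, based on the interface shown in that guide; the channel name in @Identifier is an assumption and must match your incoming channel, and the imports assume a recent Quarkus (jakarta namespace, @Identifier rather than the older @Named):
import java.util.Collection;
import jakarta.enterprise.context.ApplicationScoped;
import io.smallrye.common.annotation.Identifier;
import io.smallrye.reactive.messaging.kafka.KafkaConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.TopicPartition;

@ApplicationScoped
@Identifier("my-channel")   // assumption: must match the mp.messaging.incoming channel name
public class MyRebalanceListener implements KafkaConsumerRebalanceListener {

    @Override
    public void onPartitionsAssigned(Consumer<?, ?> consumer, Collection<TopicPartition> partitions) {
        // called on startup with the initial assignment and again after every rebalance
        System.out.println("Assigned partitions: " + partitions);
    }

    @Override
    public void onPartitionsRevoked(Consumer<?, ?> consumer, Collection<TopicPartition> partitions) {
        System.out.println("Revoked partitions: " + partitions);
    }
}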
I have a Spring Boot application where we are using Spring Kafka. On the consumer side I have set enable.auto.commit to false and set my listener ack-mode to manual_immediate.
I have concurrent consumers, so after consuming a record I call acknowledgment.acknowledge(), but I still face a duplicate-message problem: whenever a rebalance happens, another consumer starts consuming a message that has already been consumed by one consumer. Any idea what is happening behind the scenes?
Does anyone know whether manual_immediate commits the message with commitSync or commitAsync? Is there a way we can change the behaviour to avoid reading duplicate records? Is there a way we can use a hybrid model in Spring Kafka?
In Spring Boot Kafka, is there a way to log whenever a rebalance happens?
How can we trigger a rebalance if we want to do so for testing purposes?
As long as you call acknowledge on the listener thread, it will use commitSync() by default; use the syncCommits container property to use async commits.
If you call it on a different thread, the commit is queued to be processed by the consumer thread as soon as possible.
Duplicates cannot be avoided if a forced rebalance takes place because your listener took too long to process the records received by the poll().
You can increase max.poll.interval.ms and/or reduce max.poll.records to ensure you can process the records in time.
You can add a ConsumerRebalanceListener to the container properties to log rebalances.
Reduce max.poll.interval.ms to a small value to reproduce in a test.
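As a sketch of the logging part, assuming you build your own ConcurrentKafkaListenerContainerFactory bean (the logging calls are illustrative):
import java.util.Collection;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;

@Configuration
public class KafkaConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // Log every rebalance via the standard Kafka ConsumerRebalanceListener hook
        factory.getContainerProperties().setConsumerRebalanceListener(new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                System.out.println("Rebalance: partitions revoked " + partitions);
            }
            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                System.out.println("Rebalance: partitions assigned " + partitions);
            }
        });
        return factory;
    }
}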
Firstly, irrespective of ack-mode, it is never guaranteed that a message is consumed just once. For instance, a rebalance can happen between the time a message is consumed and the time its offset is committed, resulting in Kafka delivering the message again to the newly assigned consumer. It is the application's responsibility to be idempotent with respect to duplicated messages.
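One common way to make the listener idempotent is to deduplicate on a stable record identity. A toy sketch using topic-partition-offset as the key and an in-memory set; the topic name is an assumption, and a real application would use a persistent store shared by all instances:
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

@Component
public class DedupListener {

    // toy in-memory store; in production use a database or cache shared by all instances
    private final Set<String> processed = ConcurrentHashMap.newKeySet();

    @KafkaListener(topics = "my-topic")   // assumption: topic name
    public void listen(ConsumerRecord<String, String> record, Acknowledgment ack) {
        String id = record.topic() + "-" + record.partition() + "-" + record.offset();
        if (processed.add(id)) {
            // first time we see this record: do the real work
            System.out.println("Processing " + record.value());
        } // else: duplicate delivery after a rebalance, skip it
        ack.acknowledge();
    }
}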
In order to listen for rebalance events, an implementation of ConsumerRebalanceListener is needed. You can plug this implementation into Spring's auto-configured ConcurrentKafkaListenerContainerFactory instance. A more detailed description of how this can be done has already been answered here.
If you wish to force a rebalance for testing, you can do so by killing one of (hopefully more than one) existing consumers. If using spring-kafka, you can do this with an @Autowired instance of KafkaListenerEndpointRegistry and stop/(re)start any consumer. Something like this should do:
@Autowired
KafkaListenerEndpointRegistry registry;

public void myTest() {
    // Stopping any one container makes its consumer leave the group and triggers a rebalance
    Collection<MessageListenerContainer> containers = registry.getAllListenerContainers();
    containers.iterator().next().stop();
}
I use Spark 2.0.0 with Kafka 0.10.2.
I have an application that is processing messages from Kafka and is a long running job.
From time to time I see the following message in the logs. I understand how I can increase the timeout, but what I wanted to know is: given that I do get this error, how can I recover from it?
ERROR ConsumerCoordinator: Offset commit failed.
org.apache.kafka.clients.consumer.CommitFailedException:
Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member.
This means that the time between subsequent calls to poll() was longer than the configured session.timeout.ms, which typically implies that the poll loop is spending too much time message processing.
You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
This is not about how I avoid this error, but how to handle it once it occurs.
Background: In normal situations I will not see commit errors, but if I do get one I should be able to recover from it. I am using an AT_LEAST_ONCE setup, so I am completely happy with reprocessing a few messages.
I am running Java and using the direct Kafka stream with manual commits.
Creating the stream:
JavaInputDStream<ConsumerRecord<String, String>> directKafkaStream =
KafkaUtils.createDirectStream(
jssc,
LocationStrategies.PreferConsistent(),
ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));
Committing the offsets:
((CanCommitOffsets) directKafkaStream.inputDStream()).commitAsync(offsetRanges);
My understanding of the situation is that you use the Kafka Direct Stream integration (using spark-streaming-kafka-0-10_2.11 module as described in Spark Streaming + Kafka Integration Guide (Kafka broker version 0.10.0 or higher)).
As said in the error message:
Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member.
Kafka manages which topic partitions a consumer consumes, so the Direct Stream will create a pool of consumers (inside a single consumer group).
As with any consumer group, you should expect rebalancing (quoting Chapter 4, "Kafka Consumers - Reading Data from Kafka", of Kafka: The Definitive Guide):
consumers in a consumer group share ownership of the partitions in the topics they subscribe to. When we add a new consumer to the group it starts consuming messages from partitions which were previously consumed by another consumer. The same thing happens when a consumer shuts down or crashes, it leaves the group, and the partitions it used to consume will be consumed by one of the remaining consumers. Reassignment of partitions to consumers also happen when the topics the consumer group is consuming are modified, for example if an administrator adds new partitions.
There are quite a few cases in which rebalancing can occur and should be expected, and you are seeing one of them.
You asked:
how can I recover from it? This is not about how I avoid this error, but how to handle it once it occurs.
My answer would be to use the other method of CanCommitOffsets:
def commitAsync(offsetRanges: Array[OffsetRange], callback: OffsetCommitCallback): Unit
that gives you access to Kafka's OffsetCommitCallback:
OffsetCommitCallback is a callback interface that the user can implement to trigger custom actions when a commit request completes. The callback may be executed in any thread calling poll().
I think onComplete gives you a handle on how the async commit finished, so you can act accordingly.
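A sketch in Java of the callback variant, building on the directKafkaStream and offsetRanges from the question (the logging is illustrative):
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.consumer.OffsetCommitCallback;
import org.apache.kafka.common.TopicPartition;
import org.apache.spark.streaming.kafka010.CanCommitOffsets;

((CanCommitOffsets) directKafkaStream.inputDStream()).commitAsync(offsetRanges,
    new OffsetCommitCallback() {
        @Override
        public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
            if (exception != null) {
                // commit failed (e.g. the group rebalanced); with at-least-once semantics
                // the uncommitted ranges will simply be re-processed after recovery
                System.err.println("Offset commit failed for " + offsets + ": " + exception);
            } else {
                System.out.println("Committed offsets: " + offsets);
            }
        }
    });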
Something I can't help you with much is how to revert changes in a Spark Streaming application when some offsets could not be committed. I think that requires tracking offsets, accepting that some offsets may not get committed, and allowing those records to be re-processed.
I'm mostly using Kafka for traditional messaging but I'd also like the ability to consume small topics in a batch fashion, i.e. connect to a topic, consume all the messages and immediately disconnect (not block waiting for new messages). All my topics have a single partition (though they are replicated across a cluster) and I'd like to use the high-level consumer if possible. It's not clear from the docs how I could accomplish such a thing in Scala (or Java). Any advice gratefully received.
With the consumer.timeout.ms setting, the consumer will throw a timeout exception if no message is consumed within the specified time, and this is the only option you have with the high-level consumer AFAIK. Using this, you could set it to something like 1 second and disconnect once the exception is thrown, if that's an acceptable solution.
If not, you'd have to use the simple consumer and check message offsets.
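A sketch of the consumer.timeout.ms approach with the old (0.8-style) high-level consumer in Java; the ZooKeeper address, group id and topic name are assumptions:
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.ConsumerTimeoutException;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class DrainTopic {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");   // assumption
        props.put("group.id", "batch-reader");               // assumption
        props.put("auto.offset.reset", "smallest");          // start from the beginning
        props.put("consumer.timeout.ms", "1000");            // give up after 1s without messages

        ConsumerConnector connector = Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(Collections.singletonMap("my-topic", 1));
        ConsumerIterator<byte[], byte[]> it = streams.get("my-topic").get(0).iterator();
        try {
            while (it.hasNext()) {
                byte[] message = it.next().message();
                // process the message
                System.out.println(new String(message));
            }
        } catch (ConsumerTimeoutException e) {
            // no message for consumer.timeout.ms -> treat the topic as drained
        } finally {
            connector.shutdown();
        }
    }
}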