Get partition assigned by Kafka - java

I’m using the Quarkus Kafka consumer, and I need to know which partitions my consumer has been assigned by the Kafka broker.
Is there any listener I can use, like the one the plain Kafka client provides?
Otherwise, how can I assign a specific partition to each of the nodes of my cluster?
Regards

From the Quarkus docs, I think you can use the consumer rebalance listener.
It should be called, because the initial assignment of partitions to your client (from no partitions to some partitions) counts as a rebalance too.
https://quarkus.io/guides/kafka#consumer-rebalance-listener
The listener is invoked every time the consumer topic/partition assignment changes. For example, when the application starts, it invokes the partitionsAssigned callback with the initial set of topics/partitions associated with the consumer. If, later, this set changes, it calls the partitionsRevoked and partitionsAssigned callbacks again, so you can implement custom logic.
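For illustration, here is a minimal sketch of such a listener, assuming a recent Quarkus / SmallRye Reactive Messaging API (older versions use javax imports and @Named instead of @Identifier); the name my-group and the log messages are placeholders, so check the guide for the exact wiring in your version:

import java.util.Collection;

import jakarta.enterprise.context.ApplicationScoped;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.TopicPartition;

import io.smallrye.common.annotation.Identifier;
import io.smallrye.reactive.messaging.kafka.KafkaConsumerRebalanceListener;

// The bean name should match the consumer's group.id, or be referenced via the
// mp.messaging.incoming.<channel>.consumer-rebalance-listener.name property.
@ApplicationScoped
@Identifier("my-group")
public class LoggingRebalanceListener implements KafkaConsumerRebalanceListener {

    @Override
    public void onPartitionsAssigned(Consumer<?, ?> consumer, Collection<TopicPartition> partitions) {
        // called on the initial assignment and after every rebalance
        System.out.println("Assigned partitions: " + partitions);
    }

    @Override
    public void onPartitionsRevoked(Consumer<?, ?> consumer, Collection<TopicPartition> partitions) {
        System.out.println("Revoked partitions: " + partitions);
    }
}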

Related

Spring Kafka acknowledgment settings

I have a Spring Boot application where we use Spring Kafka. On the consumer side I have set enable.auto.commit to false and set my listener ack-mode to manual_immediate.
I have concurrent consumers, and after consuming a record I call acknowledgment.acknowledge(). However, I still see duplicates: whenever a rebalance happens, another consumer starts consuming a message that has already been consumed by the first one. Any idea what is happening behind the scenes?
Does manual_immediate commit the message with commitSync or commitAsync? Is there a way to change the behaviour to avoid reading duplicate records? Is there a way to use a hybrid model in Spring Kafka?
In Spring Boot Kafka, is there a way to log whenever a rebalance happens?
How can I trigger a rebalance for testing purposes?
As long as you call acknowledge on the listener thread, it will use commitSync() by default; use the syncCommits container property to use async commits.
If you call it on a different thread, the commit is queued to be processed by the consumer thread as soon as possible.
Duplicates cannot be avoided if a forced rebalance takes place because your listener took too long to process the records received by the poll().
You can increase max.poll.interval.ms and/or reduce max.poll.records to ensure you can process the records in time.
You can add a ConsumerRebalanceListener to the container properties to log rebalances.
Reduce max.poll.interval.ms to a small value to reproduce in a test.
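As a rough sketch (not the poster's actual configuration, and the AckMode location differs slightly between spring-kafka versions), the syncCommits flag and a logging ConsumerRebalanceListener can be set on the container factory roughly like this:

import java.util.Collection;

import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;

@Configuration
public class KafkaConsumerConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);
        // true (the default) -> commitSync; false -> commitAsync
        factory.getContainerProperties().setSyncCommits(true);
        // log every rebalance
        factory.getContainerProperties().setConsumerRebalanceListener(new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                System.out.println("Partitions revoked: " + partitions);
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                System.out.println("Partitions assigned: " + partitions);
            }
        });
        return factory;
    }
}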
Firstly, irrespective of ack-mode it is never guaranteed that a message is consumed just once. For instance, a rebalance can happen between the time a message is consumed and the time the offset is committed, resulting in Kafka delivering the message again to the newly assigned consumer. It is the application's responsibility to be idempotent to duplicated messages.
In order to listen for rebalance events, an implementation of ConsumerRebalanceListener is needed. You can plug this implementation into Spring's auto-configured ConcurrentKafkaListenerContainerFactory instance. A more detailed description of how this can be done has already been answered here.
If you wish to force a rebalance for testing, you can do so by killing one of the (hopefully more than one) existing consumers. If you are using spring-kafka, you can do this with an @Autowired instance of KafkaListenerEndpointRegistry and stop/(re)start any consumer. Something along these lines should do:
@Autowired
KafkaListenerEndpointRegistry registry;

public void myTest() {
    // stopping one container forces a rebalance among the remaining consumers
    Collection<MessageListenerContainer> containers = registry.getAllListenerContainers();
    containers.iterator().next().stop();
}

What is the use of the property spring.cloud.stream.bindings.<channelName>.consumer.partitioned?

What will happen if partitionCount (spring.cloud.stream.bindings.<channelName>.producer.partitionCount) is greater than 1 and consumer.partitioned (spring.cloud.stream.bindings.<channelName>.consumer.partitioned) is false (using Kafka)?
In the case of the Kafka binder, the property spring.cloud.stream.bindings.<channelName>.consumer.partitioned is not relevant, and you can skip setting it on the consumer side. This property's default value is false. Since Kafka has built-in partitioning support, the binder simply delegates to the Kafka broker, and Kafka decides which partitions each consumer reads from. If you only have one consumer, it will consume from all the partitions. If you have more than one consumer, Kafka will do a rebalance and split the partitions across the available consumers (assuming the autoRebalanceEnabled property remains at its default value of true).
You can set spring.cloud.stream.bindings.<channelName>.consumer.partitioned to true if you want to set an instance index on the consumers (e.g. if you want to run the app on certain platforms or achieve static partitioning). In this case, you need to provide the instance index (or list of instance indexes) to the consumer. However, I think this is irrelevant for your use case.
The upshot here is that you can safely ignore spring.cloud.stream.bindings.<channelName>.consumer.partitioned on the consumer side if you are using the Kafka binder and auto-rebalance is enabled.
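For example, a plausible application.properties for the scenario in the question might look like the sketch below; the binding names input and output are placeholders, and the commented-out block shows the static-partitioning alternative:

# producer side: write to 3 partitions
spring.cloud.stream.bindings.output.producer.partition-count=3

# consumer side with the Kafka binder: partitioned can stay at its default (false);
# Kafka's group management spreads the partitions over the running consumers
spring.cloud.stream.bindings.input.consumer.partitioned=false
spring.cloud.stream.kafka.bindings.input.consumer.auto-rebalance-enabled=true

# static partitioning instead (what partitioned=true is for):
#spring.cloud.stream.kafka.bindings.input.consumer.auto-rebalance-enabled=false
#spring.cloud.stream.bindings.input.consumer.partitioned=true
#spring.cloud.stream.instance-count=3
#spring.cloud.stream.instance-index=0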
We have some basic partitioning samples here that you may want to take a look at.

KafkaConsumer Java API subscribe() vs assign()

I am new to the Kafka Java API and I am working on consuming records from a particular Kafka topic.
I understand that I can use the method subscribe() to start polling records from the topic. Kafka also provides the method assign() if I want to start polling records from selected partitions of the topics.
I want to understand whether this is the only difference between the two.
Yes, subscribe needs group.id because each consumer in a group is dynamically assigned partitions of the topics provided in the subscribe method, and each partition can be consumed by only one consumer thread in that group. This is achieved by balancing the partitions between all members in the consumer group so that each partition is assigned to exactly one consumer in the group.
assign will manually assign a list of partitions to the consumer, and this method does not use the consumer's group management functionality (so no group.id is needed).
The main difference is that with assign(Collection) you lose control over dynamic partition assignment and consumer group coordination:
It is also possible for the consumer to manually assign specific partitions (similar to the older "simple" consumer) using assign(Collection). In this case, dynamic partition assignment and consumer group coordination will be disabled.
subscribe
public void subscribe(java.util.Collection<java.lang.String> topics)
The subscribe method subscribes to the given list of topics to get dynamically assigned partitions. If the given list of topics is empty, it is treated the same as unsubscribe().
As part of group management, the consumer will keep track of the list of consumers that belong to a particular group and will trigger a rebalance operation if one of the following events trigger -
Number of partitions change for any of the subscribed list of topics
Topic is created or deleted
An existing member of the consumer group dies
A new member is added to an existing consumer group via the join API
assign
public void assign(java.util.Collection<TopicPartition> partitions)
The assign method manually assigns a list of partitions to this consumer. If the given list of topic partitions is empty, it is treated the same as unsubscribe().
Manual topic assignment through this method does not use the consumer's group management functionality. As such, there will be no rebalance operation triggered when group membership or cluster and topic metadata change.
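To make the contrast concrete, here is a minimal sketch; the topic name orders, the group id my-group and the broker address are placeholders:

import java.time.Duration;
import java.util.Arrays;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SubscribeVsAssign {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // subscribe(): needs group.id; the group coordinator assigns partitions and rebalances them
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            consumer.poll(Duration.ofSeconds(5)); // join the group and receive an assignment
            System.out.println("Dynamically assigned: " + consumer.assignment());
        }

        // assign(): no group management; you choose the partitions yourself
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Arrays.asList(
                    new TopicPartition("orders", 0),
                    new TopicPartition("orders", 1)));
            System.out.println("Manually assigned: " + consumer.assignment());
        }
    }
}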
I'd like to add some useful information specifically about a consumer without a group.id. There is no default for this property (given no framework shenanigans - KafkaClient lib + Java). It's not official, but such consumers are typically called free consumers. A free consumer doesn't subscribe to topics, so it is required to assign topic partitions.
As noted above, the concepts of automatic partition assignment, rebalancing, offset persistence, partition exclusivity, consumer heartbeating and failure detection / liveness (all the things that come for free with a consumer group) are thrown out the window with these free consumers. As such, it's up to the client (you) to keep track of any state the app has in relation to Kafka, and that includes keeping track of offsets (in a Map, for instance). This is because a free consumer doesn't commit its offsets to Kafka; usually your own storage mechanism is used.
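As an illustration only (not from the answer above), a free consumer that keeps its offsets in a plain in-memory Map could look roughly like this; in practice the map would be replaced by a database or file, and the topic orders is a placeholder:

import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class FreeConsumer {

    public static void main(String[] args) {
        // our own offset "store" - a free consumer never commits offsets to Kafka
        Map<TopicPartition, Long> offsets = new HashMap<>();

        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // note: no group.id at all

        TopicPartition tp = new TopicPartition("orders", 0);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(tp));
            consumer.seek(tp, offsets.getOrDefault(tp, 0L)); // resume from our own store
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    System.out.println("Processing " + record.value());
                    offsets.put(tp, record.offset() + 1); // next offset to read after a restart
                }
            }
        }
    }
}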

Kafka consumer returns empty on second call

I insert a record with a Kafka producer and then call the consumer, which returns the inserted element (and previously inserted elements). Then I call the consumer again (without inserting a new record with the producer), and the consumer does not return any records.
As far as I know the record should remain in the topic. I have no idea how to set acknowledge to false in the properties. Is this issue related to acknowledgment?
The consumer will only get "new" messages. If you have previously read until the end of the topic and there is nothing new, you won't get anything.
If you want to read from the beginning of the topic again, you have to "rewind" the consumer (or create a new one).
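As a minimal sketch, assuming an existing KafkaConsumer<String, String> named consumer that has already received its partition assignment, rewinding looks like this (for a brand-new consumer you could instead set auto.offset.reset=earliest with a new group.id):

// jump back to the first offset of every partition currently assigned to this consumer
consumer.seekToBeginning(consumer.assignment());
// the next poll() re-delivers the topic from the start
ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));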
I have no idea how to set acknowledge to false in properties (is this issue related to acknowledgment?)
If you use consumer groups, the offsets for the consumer (how far it has read each topic partition) will be stored within Kafka (or ZooKeeper for older versions). You can control this by acknowledging (or not) the receipt of messages. However, this only has an effect when the consumer is restarted, not for an already running instance.
If you don't use consumer groups, this offset tracking is purely done within the consumer instance itself.

Detect the rebalancing of cluster via code

Is there a way we can detect via code that the Hazelcast cluster is rebalancing?
Use case: Multiple producer, multiple consumer
Shut down consumer1 abruptly, and see how long it takes for the other members to resume producing or consuming. In other words, I am trying to measure the time it takes for the cluster to rebalance.
AFAIK there is no direct API for it.
You could write your own SPI service which receives repartitioning events.
But can you tell a bit more about your producer consumer problem? Do you want to use an IQueue for that?
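One way to time the repartitioning from code, instead of writing a custom SPI service, is to register a MigrationListener on the PartitionService. This is only a sketch against the Hazelcast 3.x API (the listener methods were renamed in 4.x), so treat it as illustrative:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.MigrationEvent;
import com.hazelcast.core.MigrationListener;

public class RebalanceTimer {

    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        hz.getPartitionService().addMigrationListener(new MigrationListener() {
            @Override
            public void migrationStarted(MigrationEvent event) {
                System.out.println(System.currentTimeMillis() + " migration started: " + event);
            }

            @Override
            public void migrationCompleted(MigrationEvent event) {
                System.out.println(System.currentTimeMillis() + " migration completed: " + event);
            }

            @Override
            public void migrationFailed(MigrationEvent event) {
                System.out.println(System.currentTimeMillis() + " migration failed: " + event);
            }
        });
    }
}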
