I have two topics, one with 3 partitions and one with 48.
Initially I used the default assignor, but I ran into problems when a consumer (a pod in Kubernetes) crashed.
What happened was that when the pod came up again it was reassigned the partition from the topic with 3 partitions and none from the topic with 48.
The two pods that did not crash got assigned 16 and 32 partitions from the topic with 48 partitions.
I've fixed this by using the round-robin partition assignor, but now I don't feel confident in how the partitions are distributed, since I'm using kstream-kstream joins, and for those we need to guarantee that each consumer is assigned the same partition numbers across both topics, e.g. C1: (t1:p0, t2:p0), C2: (t1:p1, t2:p1), etc.
One thing I thought of was that I could re-key the incoming events so they get repartitioned, and then I might be able to guarantee this?
Or maybe I don't understand how the default partitioning works. I'm confused.
Kafka Streams does not allow you to use a custom partition assignor. If you set one yourself, it will be overwritten by the StreamsPartitionAssignor [1]. This is needed to ensure that -- if possible -- partitions are reassigned to the same consumers (a.k.a. stickiness) during rebalancing. Stickiness is important for Kafka Streams to be able to reuse state stores on the consumer side as much as possible. If a partition is not reassigned to the same consumer, the state stores used within this consumer need to be recreated from scratch after rebalancing.
[1] https://github.com/apache/kafka/blob/9bd0d6aa93b901be97adb53f290b262c7cf1f175/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L989
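On the re-keying idea from the question: a selectKey() before the join marks both streams for repartitioning, so Kafka Streams should write them through internal repartition topics with matching partition counts, which restores the co-partitioning the join needs. A minimal sketch, not a definitive implementation; the application id, broker address, and topic names are made up:

    import java.time.Duration;
    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.JoinWindows;
    import org.apache.kafka.streams.kstream.KStream;

    public class CoPartitionedJoin {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "co-partitioned-join");   // hypothetical app id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> small = builder.stream("topic-with-3-partitions");   // hypothetical topics
            KStream<String, String> large = builder.stream("topic-with-48-partitions");

            // selectKey() forces a repartition of each side before the join, so records
            // with the same key end up in the same partition number on both sides.
            small.selectKey((key, value) -> value)
                 .join(large.selectKey((key, value) -> value),
                       (left, right) -> left + "|" + right,
                       JoinWindows.of(Duration.ofMinutes(5)))
                 .to("joined-output");

            new KafkaStreams(builder.build(), props).start();
        }
    }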
Related
I have 4 brokers and 4 partitions, but when I push 4 messages with a null key they are not saved round-robin. I was expecting each message to be saved to a different partition.
I am using kafka-clients 5.5.x (Confluent versioning) to get the KafkaProducer, and it looks like for versions after 5.4.x:
If a key is not provided, behavior is Confluent Platform version-dependent:
In Confluent Platform versions 5.4.x and later, the partition is assigned with awareness to batching. If a batch of records is not full and has not yet been sent to the broker, it will select the same partition as a prior record. Partitions for newly created batches are assigned randomly. For more information, see KIP-480: Sticky Partitioner and the related Confluent blog post.
In Confluent Platform versions prior to 5.4.x, the partition is assigned in a round robin method, starting at a random partition.
https://docs.confluent.io/platform/current/clients/producer.html
Is my understanding correct or not?
A new partitioner (the sticky partitioner) was introduced in Kafka version 2.4 to improve the way the producer sends data to partitions.
Basically, it now sticks to one partition until a batch is full (or sent) and then switches to a new partition, instead of doing round robin for each and every record.
For more details, you can refer to the link below. It explains everything in detail.
https://www.confluent.io/blog/5-things-every-kafka-developer-should-know/#tip-2-new-sticky-partitioner
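A hedged sketch of what the question describes (topic name and broker address are made up): with kafka-clients 2.4+ / Confluent 5.4+ this will usually print the same partition for all four sends, because all four records fit in one batch.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class NullKeyProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.LINGER_MS_CONFIG, 50); // give the batch time to fill

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                for (int i = 0; i < 4; i++) {
                    // Null key: the sticky partitioner keeps all records of the current
                    // batch on one partition, so these four messages will most likely
                    // land on the SAME partition, not one per partition.
                    producer.send(new ProducerRecord<>("my-4-partition-topic", null, "message-" + i),
                            (metadata, e) -> {
                                if (e == null) {
                                    System.out.println("sent to partition " + metadata.partition());
                                }
                            });
                }
                producer.flush();
            }
        }
    }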
First of all, sorry if my terminology is not precise; I am very new to Kafka and I have read as much as I could.
We have a service which uses Kafka Streams, Kafka version 2.3.1.
The stream app has a topology which reads from "topicA", performs a conversion, and publishes into another topic "topicB", which is then consumed by another stream of the topology and aggregated using a KTable (local store). A listener publishes the KTable changes into another topic.
The topics have 24 partitions.
We have 2 instances of this service in different machines with 4 stream threads each.
The problem is that the partitions that use the local store are all assigned to the same instance.
Hence disk usage, rebalancing, and performance are awful.
Also, something unexpected to me: if I check the group assignments on the Kafka broker I see:
(Removed other partitions for readability)
GROUP CONSUMER-ID HOST CLIENT-ID #PARTITIONS ASSIGNMENT
fj.TheAggregation.TST.V1.PERF fj.TheAggregation.TST.V1.PERF-6898e899-7722-421a-8841-f8e45b074981-StreamThread-3-consumer-c089baaa-343b-484f-add6-aca12572e2a5 10.11.200.115/10.11.200.115 fj.TheAggregation.TST.V1.PERF-6898e899-7722-421a-8841-f8e45b074981-StreamThread-3-consumer 54 fj.TheAggregationDocument.TST.V1.PERF(4,8,12,16,20)
fj.TheAggregation.TST.V1.PERF fj.TheAggregation.TST.V1.PERF-6898e899-7722-421a-8841-f8e45b074981-StreamThread-2-consumer-f5e2d4e3-feee-4778-8ab8-ec4dd770541a 10.11.200.115/10.11.200.115 fj.TheAggregation.TST.V1.PERF-6898e899-7722-421a-8841-f8e45b074981-StreamThread-2-consumer 54 fj.TheAggregationDocument.TST.V1.PERF(5,9,13,17,21)
fj.TheAggregation.TST.V1.PERF fj.TheAggregation.TST.V1.PERF-0733344b-bd8d-40d6-ad07-4fc93de76cf2-StreamThread-4-consumer-63371f35-118a-44e0-bc9b-d403fb59384d 10.11.200.114/10.11.200.114 fj.TheAggregation.TST.V1.PERF-0733344b-bd8d-40d6-ad07-4fc93de76cf2-StreamThread-4-consumer 54 fj.TheAggregationDocument.TST.V1.PERF(2)
fj.TheAggregation.TST.V1.PERF fj.TheAggregation.TST.V1.PERF-0733344b-bd8d-40d6-ad07-4fc93de76cf2-StreamThread-1-consumer-714f0fee-b001-4b16-8b5b-6ab8935becfd 10.11.200.114/10.11.200.114 fj.TheAggregation.TST.V1.PERF-0733344b-bd8d-40d6-ad07-4fc93de76cf2-StreamThread-1-consumer 54 fj.TheAggregationDocument.TST.V1.PERF(0)
fj.TheAggregation.TST.V1.PERF fj.TheAggregation.TST.V1.PERF-0733344b-bd8d-40d6-ad07-4fc93de76cf2-StreamThread-2-consumer-d14e2e20-9aad-4a20-a295-83621a76b099 10.11.200.114/10.11.200.114 fj.TheAggregation.TST.V1.PERF-0733344b-bd8d-40d6-ad07-4fc93de76cf2-StreamThread-2-consumer 54 fj.TheAggregationDocument.TST.V1.PERF(1)
fj.TheAggregation.TST.V1.PERF fj.TheAggregation.TST.V1.PERF-6898e899-7722-421a-8841-f8e45b074981-StreamThread-4-consumer-14f390d9-f4f4-4e70-8e8d-62a79427c4e6 10.11.200.115/10.11.200.115 fj.TheAggregation.TST.V1.PERF-6898e899-7722-421a-8841-f8e45b074981-StreamThread-4-consumer 54 fj.TheAggregationDocument.TST.V1.PERF(7,11,15,19,23)
fj.TheAggregation.TST.V1.PERF fj.TheAggregation.TST.V1.PERF-6898e899-7722-421a-8841-f8e45b074981-StreamThread-1-consumer-57d2f85b-50f8-4649-8080-bbaaa6ea500f 10.11.200.115/10.11.200.115 fj.TheAggregation.TST.V1.PERF-6898e899-7722-421a-8841-f8e45b074981-StreamThread-1-consumer 54 fj.TheAggregationDocument.TST.V1.PERF(6,10,14,18,22)
fj.TheAggregation.TST.V1.PERF fj.TheAggregation.TST.V1.PERF-0733344b-bd8d-40d6-ad07-4fc93de76cf2-StreamThread-3-consumer-184f3a99-1159-44d7-84c6-e7aa70c484c0 10.11.200.114/10.11.200.114 fj.TheAggregation.TST.V1.PERF-0733344b-bd8d-40d6-ad07-4fc93de76cf2-StreamThread-3-consumer 54 fj.TheAggregationDocument.TST.V1.PERF(3)
So each stream service has 54 partitions assigned in total; however, they are not evenly assigned. Also, if I check the local store on each instance, I see that the stream KTables are all on the same node, even though the broker states that some of the partitions are assigned to the other instance. So the data provided by the broker does not seem to match the stream app's state.
Is there a way to ensure that GroupLeader assigns partitions evenly?
I would expect to have some way to specify that, or to assign some kind of "weight" to each stream, so the GroupLeader is able to distribute resource-intensive streams evenly among the service instances, or at least not so unbalanced.
By the way, is there a recommended Kafka users group to ask these kinds of things?
Thanks
There were a lot of improvements to the Streams assignor in 2.6; you can read about them in KIP-441 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-441%3A+Smooth+Scaling+Out+for+Kafka+Streams).
I don't know if they will fix your problem, but they should help. The new assignor does treat stateful tasks like KTables differently and should balance their load better.
If you cannot upgrade from 2.3.1, you might try different names; you might just be getting unlucky hashes.
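Not a fix, but since the broker view does not seem to match the stream app's state, it can help to dump which tasks each instance actually owns. A minimal sketch against the 2.3 API (the wrapper class is made up; localThreadsMetadata() is the real call), to be run on each instance:

    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.processor.ThreadMetadata;

    public class TaskPlacementDump {
        // Prints the active and standby tasks owned by this instance; running it on
        // both machines shows whether the stateful (KTable) tasks are spread out.
        public static void dump(KafkaStreams streams) {
            for (ThreadMetadata thread : streams.localThreadsMetadata()) {
                System.out.println(thread.threadName()
                        + " active=" + thread.activeTasks()
                        + " standby=" + thread.standbyTasks());
            }
        }
    }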
tl;dr: I am trying to understand how a single consumer that is assigned multiple partitions handles consuming records for each partition.
For example:
Process a single partition completely before moving to the next.
Process a chunk of available records from each partition every time.
Process a batch of N records from first available partitions
Process a batch of N records from partitions in round-robin rotation
I found the partition.assignment.strategy configuration for the Range or RoundRobin assignors, but this only determines how consumers are assigned partitions, not how a consumer consumes from the partitions it is assigned to.
I started digging into the KafkaConsumer source:
#poll() led me to #pollForFetches()
#pollForFetches() then led me to fetcher#fetchedRecords() and fetcher#sendFetches()
This just led me to try to follow the entire Fetcher class altogether, and maybe it is just late or maybe I just didn't dig in far enough, but I am having trouble untangling exactly how a consumer will process multiple assigned partitions.
Background
Working on a data pipeline backed by Kafka Streams.
At several stages in this pipeline, as records are processed by different Kafka Streams applications, the stream is joined to compacted topics fed by external data sources that provide the data used to augment the records before they continue to the next stage of processing.
Along the way there are several dead letter topics for records that could not be matched to the external data sources that would have augmented them. This could be because the data is just not available yet (the Event or Campaign is not live yet) or because it is bad data and will never match.
The goal is to republish records from the dead letter topic whenever new augmenting data is published, so that we can match previously unmatched records from the dead letter topic, update them, and send them downstream for additional processing.
Records have potentially failed to match on several attempts and could have multiple copies in the dead letter topic, so we only want to reprocess existing records (before the latest offset at the time the application starts) as well as records that were sent to the dead letter topic since the last time the application ran (after the previously saved consumer group offsets).
It works well as my consumer filters out any records arriving after the application has started, and my producer is managing my consumer group offsets by committing the offsets as part of the publishing transaction.
But I want to make sure that I will eventually consume from all partitions, as I have run into an odd edge case where unmatched records get reprocessed and land in the same partition as before in the dead letter topic, only to get filtered out by the consumer. And though it is not getting new batches of records to process, there are partitions that have not been reprocessed yet either.
Any help understanding how a single consumer processes multiple assigned partitions would be greatly appreciated.
You were on the right track looking at Fetcher, as most of the logic is there.
First as the Consumer Javadoc mentions:
If a consumer is assigned multiple partitions to fetch data from, it
will try to consume from all of them at the same time, effectively
giving these partitions the same priority for consumption.
As you can imagine, in practice, there are a few things to take into account.
Each time the consumer is trying to fetch new records, it will exclude partitions for which it already has records awaiting (from a previous fetch). Partitions that already have a fetch request in-flight are also excluded.
When fetching records, the consumer specifies fetch.max.bytes and max.partition.fetch.bytes in the fetch request. These are used by the brokers to respectively determine how much data to return in total and per partition. This is equally applied to all partitions.
Using these 2 approaches, by default, the Consumer tries to consume from all partitions fairly. If that's not the case, changing fetch.max.bytes or max.partition.fetch.bytes usually helps.
In case you want to prioritize some partitions over others, you need to use pause() and resume() to manually control the consumption flow, as in the sketch below.
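A minimal sketch of that idea; the topic, group id, and the choice of partition 0 as the "hot" partition are made up, while pause()/resume() and max.partition.fetch.bytes are the real knobs mentioned above:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class PrioritizedConsumption {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "dead-letter-reprocessor");   // hypothetical group
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            // Lowering the per-partition cap keeps one busy partition from dominating a single fetch.
            props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 262144);

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("dead-letter-topic"));  // hypothetical topic
                TopicPartition hot = new TopicPartition("dead-letter-topic", 0);

                int round = 0;
                while (true) {
                    // Every other round, skip the "hot" partition so the remaining
                    // partitions get a larger share of each fetch.
                    boolean skipHot = (round++ % 2 == 0) && consumer.assignment().contains(hot);
                    if (skipHot) {
                        consumer.pause(Collections.singleton(hot));
                    }
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    records.forEach(r -> System.out.printf("partition %d, offset %d%n", r.partition(), r.offset()));
                    if (skipHot) {
                        consumer.resume(Collections.singleton(hot));
                    }
                }
            }
        }
    }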
This may be a stupid question, but I am curious whether it is possible to disable consumer rebalancing in Spring Kafka. Imagine I have a topic with 3 partitions and 3 different consumers running using @KafkaListener on different TopicPartitions. Is it possible to not let rebalancing happen if one of the consumers is down? (I want to do manual offset management, and when the consumer is back up I want to start from where I left off.)
You can have three different consumers, one per partition, with auto offset commit set to false, and commit the offsets manually. So whenever a consumer stops and starts again it will read from the previous offset, and since each consumer is assigned to a specific partition, no rebalancing across the other partitions will happen; see the sketch below.
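For example, a minimal Spring Kafka sketch of that setup (topic name, listener id, and group id are made up; it assumes the container factory is configured with enable.auto.commit=false and AckMode.MANUAL):

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.kafka.annotation.TopicPartition;
    import org.springframework.kafka.support.Acknowledgment;
    import org.springframework.stereotype.Component;

    @Component
    public class PartitionZeroListener {

        // Manually assigned partition: the container uses assign() instead of
        // subscribe(), so there is no group rebalance when another instance dies.
        // On restart it resumes from the offset last committed for this group.
        @KafkaListener(
                id = "my-topic-p0",
                groupId = "my-group",
                topicPartitions = @TopicPartition(topic = "my-topic", partitions = "0"))
        public void listen(ConsumerRecord<String, String> record, Acknowledgment ack) {
            // ... process the record ...
            ack.acknowledge(); // commit the offset manually
        }
    }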
I am testing how Kafka works with multiple consumers using the high-level Java APIs.
I created 1 topic with 5 partitions, 1 producer, and 2 consumers (C1, C2). Each consumer has only one thread, and partition.assignment.strategy is set to range.
C1 starts and claims all the partitions. Then C2 starts and ZooKeeper triggers a rebalance. After that, C1 claims (0, 1, 2) and C2 claims (3, 4). It works well until now.
Then I check the messages received by C1, expecting that messages will come only from partitions (0, 1, 2). But in my log file I can find messages from all the partitions, and the same happens in C2. It is just as if partition.assignment.strategy were set to roundrobin. Is this how Kafka dispatches messages, or must there be some mistake?
First of all, just to correct your approach: it is usually better to have the same number of consumers as partitions for a topic. That way each consumer claims exactly one partition and sticks to it, so it receives data only from that partition, in order, and not from the others.
Now to answer your question: you are getting data from almost all the partitions in both consumers because you have fewer consumers than partitions, so each consumer thread ends up claiming more than one partition.
Note also that if you have more consumers than partitions for a topic, some of the consumers may never get any data.
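The question uses the old ZooKeeper-based high-level consumer, but with the modern Java consumer the claim is easy to verify: each instance should only ever see records from the partitions in its own assignment. A hedged sketch (topic and group names are made up):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class AssignmentCheck {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "range-test");                 // hypothetical group
            props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                    "org.apache.kafka.clients.consumer.RangeAssignor");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("five-partition-topic")); // hypothetical topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    // With two instances of this consumer in the same group, each record's
                    // partition should only ever come from this instance's assignment.
                    System.out.println("assigned: " + consumer.assignment());
                    records.forEach(r -> System.out.println("got record from partition " + r.partition()));
                }
            }
        }
    }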