I have 4 brokers and 4 partitions, but when I push 4 messages with a null key they are not distributed round robin. I was expecting each message to be saved to a different partition, one per partition.
I am using kafka-clients 5.5.* for the KafkaProducer, and the docs say that behavior changed after 5.4.*:
If a key is not provided, behavior is Confluent Platform version-dependent:
In Confluent Platform versions 5.4.x and later, the partition is assigned with awareness to batching. If a batch of records is not full and has not yet been sent to the broker, it will select the same partition as a prior record. Partitions for newly created batches are assigned randomly. For more information, see KIP-480: Sticky Partitioner and the related Confluent blog post.
In Confluent Platform versions prior to 5.4.x, the partition is assigned in a round robin method, starting at a random partition.
https://docs.confluent.io/platform/current/clients/producer.html
Is my understanding correct?
A new partitioner (StickyPartitioner) was introduced in Kafka version 2.4 for improving the way in which data is sent to the partitions by the producer.
Basically, it now fills a batch of records for a single partition first and then moves on to another partition, instead of round-robining every single record.
For more details, you can refer to the link below; it explains everything in detail.
https://www.confluent.io/blog/5-things-every-kafka-developer-should-know/#tip-2-new-sticky-partitioner
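If you want to see this in action, here's a minimal sketch (broker address and topic name are placeholders) that sends a few null-key records and prints the partition each one lands in. With the sticky partitioner, records produced in quick succession typically report the same partition because they ride in the same batch:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class StickyPartitionerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 4; i++) {
                // Null key: the partitioner alone decides the partition.
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("test-topic", null, "message-" + i);
                producer.send(record, (metadata, exception) -> {
                    if (exception == null) {
                        System.out.println("partition " + metadata.partition());
                    }
                });
            }
            producer.flush(); // drive all batches to completion before closing
        }
    }
}

The exact output varies, but you should see one partition repeated rather than all four used once.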
Is there a more efficient/simpler way of getting the size / latest offsets of a topic/partitions using the newest Kafka client 2.4 APIs in Java?
And then calculate the lag for a consumer group by comparing that group's offsets with the size of the topic...
I know this question has been asked for older Kafka versions and there is also a way to get this info from JMX metrics exposed by Kafka, but I am stuck with a legacy app that needs to do it in Java but with latest 2.4 Kafka libs.
The usual way of getting this info, as far as I understand, is:
The easiest part: get offsets for a topic/partitions for a consumer groupID using an API call on KafkaAdminClient like
public ListConsumerGroupOffsetsResult listConsumerGroupOffsets(String groupId, ListConsumerGroupOffsetsOptions options)
The hardest part: Determine the size of the topic for each partition:
create a new consumer and subscribe to the topic
advance the consumer to the latest offset using consumer.seekToEnd(...)
get the position of the consumer for all partitions using consumer.position(...)
finally, do [size - current offset] to determine the lag of the consumer group for each partition
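For reference, a minimal sketch of that combination (broker address, group ID, and the use of KafkaConsumer.endOffsets() as a shortcut for the seekToEnd()/position() dance are my own assumptions, not from the original post):

import java.util.Map;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class GroupLag {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        try (AdminClient admin = AdminClient.create(props);
             KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {

            // The easy part: the group's committed offsets per partition.
            Map<TopicPartition, OffsetAndMetadata> committed =
                    admin.listConsumerGroupOffsets("my-group") // placeholder group ID
                         .partitionsToOffsetAndMetadata().get();

            // The "dummy consumer" part: latest offsets per partition.
            Set<TopicPartition> partitions = committed.keySet();
            Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);

            // Lag = end offset - committed offset, per partition.
            committed.forEach((tp, oam) ->
                    System.out.printf("%s lag=%d%n", tp, endOffsets.get(tp) - oam.offset()));
        }
    }
}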
Thus, determining the last offset is a pretty heavy operation...
So my question is: is there a more efficient way of getting the last offsets for a topic without using the dummy consumer, maybe in the latest 2.4 APIs? The topic/partition size info is really independent of any consumers, so it seems logical to be able to get it without the use of consumers...
Thank you!
Marina
Externally to the Kafka consuming application, you are correct: your options are to compare partition end offsets against the latest checkpointed positions of the consumer group (assuming the consumers in question even use Kafka to store offsets).
There are tools that will monitor this for you, such as Burrow.
However, if you have access to the consuming application itself, there is a more accurate way. Here's a list of all consumer sensors (exposed via the API, or JMX by default): https://kafka.apache.org/documentation/#consumer_fetch_monitoring
There is a per-partition records-lag metric. It's updated every time poll() is called, so it is more accurate and lower latency than committed offsets. The only complication is that you'd need to sum the values of these sensors across all partitions the consumer is assigned.
Here's how to get at it via KafkaConsumer.metrics():
import java.util.Map;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

private long calcTotalLag(Map<MetricName, ? extends Metric> metrics) {
    long totalLag = 0;
    for (Map.Entry<MetricName, ? extends Metric> entry : metrics.entrySet()) {
        MetricName metricName = entry.getKey();
        Metric metric = entry.getValue();
        Map<String, String> tags = metricName.tags();
        // Sum only the per-partition "records-lag" sensors; the "partition"
        // tag distinguishes them from the consumer-level aggregate metrics.
        if (metricName.name().equals("records-lag") && tags.containsKey("partition")) {
            totalLag += ((Number) metric.metricValue()).longValue();
        }
    }
    return totalLag;
}
I have two topics, one with 3 partitions and one with 48.
Initially I used the default assignor, but I ran into problems when a consumer (a pod in Kubernetes) crashed.
What happened was that when the pod came up again it was assigned partitions only from the topic with 3 partitions and none from the topic with 48.
The two pods that did not crash were assigned 16 and 32 partitions, respectively, from the topic with 48 partitions.
I've fixed this by using the round robin partition assignor, but now I don't feel confident about how the partitions are distributed, since I'm using KStream-KStream joins, and for those we need to guarantee that each consumer gets the same partition numbers across both topics, e.g. C1: (t1:p0, t2:p0), C2: (t1:p1, t2:p1), etc.
One thing I thought of was that I could rekey the incoming events so they get repartitioned, and then I might be able to guarantee this (see the sketch below)?
Or maybe I don't understand how the default partitioning works... I'm confused.
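Roughly what I had in mind (topic names and the key extractor here are placeholders, not my real code):

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class RekeySketch {
    static void addRekey(StreamsBuilder builder) {
        KStream<String, String> events = builder.stream("t1");
        KStream<String, String> rekeyed = events
                .selectKey((oldKey, value) -> value)  // hypothetical join-key extractor
                .through("t1-rekeyed");               // explicit repartition topic; must be
                                                      // pre-created with the same partition
                                                      // count as the other join side
        // rekeyed is now partitioned by the new key and could feed the join
    }
}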
Kafka Streams does not allow you to use a custom partition assignor. If you set one yourself, it will be overwritten with the StreamsPartitionAssignor [1]. This is needed to ensure that -- if possible -- partitions are re-assigned to the same consumers (a.k.a. stickiness) during rebalancing. Stickiness is important for Kafka Streams to be able to reuse state stores at the consumer side as much as possible. If a partition is not reassigned to the same consumer, the state stores used within that consumer need to be recreated from scratch after rebalancing.
[1] https://github.com/apache/kafka/blob/9bd0d6aa93b901be97adb53f290b262c7cf1f175/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L989
tl;dr: I am trying to understand how a single consumer that is assigned multiple partitions consumes records from each partition.
For example:
Completely process a single partition before moving to the next.
Process a chunk of available records from each partition every time.
Process a batch of N records from first available partitions
Process a batch of N records from partitions in round-robin rotation
I found the partition.assignment.strategy configuration for the Range or RoundRobin assignors, but this only determines how partitions are assigned to consumers, not how a consumer consumes from the partitions it is assigned.
I started digging into the KafkaConsumer source:
#poll() led me to #pollForFetches()
#pollForFetches() then led me to fetcher#fetchedRecords() and fetcher#sendFetches()
This just led me to try to follow along the entire Fetcher class all together, and maybe it is just late or maybe I just didn't dig in far enough, but I am having trouble untangling exactly how a consumer will process multiple assigned partitions.
Background
Working on a data pipeline backed by Kafka Streams.
At several stages in this pipeline, as records are processed by different Kafka Streams applications, the stream is joined against compacted topics fed by external data sources; these provide the data used to augment the records before they continue to the next stage in processing.
Along the way there are several dead letter topics for records that could not be matched to the external data sources that would have augmented them. This could be because the data is just not available yet (the Event or Campaign is not live yet) or because it is bad data that will never match.
The goal is to republish records from the dead letter topic whenever new augmentation data is published, so that we can match previously unmatched records from the dead letter topic, update them, and send them downstream for additional processing.
Records may have failed to match on several attempts and could have multiple copies in the dead letter topic, so we only want to reprocess existing records (those before the latest offset at the time the application starts) as well as records that were sent to the dead letter topic since the last time the application ran (after the previously saved consumer group offsets).
It works well as my consumer filters out any records arriving after the application has started, and my producer is managing my consumer group offsets by committing the offsets as part of the publishing transaction.
But I want to make sure that I will eventually consume from all partitions, as I have run into an odd edge case where unmatched records get reprocessed and land in the same partition as before in the dead letter topic, only to get filtered out by the consumer. And even though the application is not getting new batches of records to process, there are partitions that have not been reprocessed yet either.
Any help understanding how a single consumer processes multiple assigned partitions would be greatly appreciated.
You were on the right track looking at Fetcher, as most of the logic is there.
First as the Consumer Javadoc mentions:
If a consumer is assigned multiple partitions to fetch data from, it
will try to consume from all of them at the same time, effectively
giving these partitions the same priority for consumption.
As you can imagine, in practice, there are a few things to take into account.
Each time the consumer is trying to fetch new records, it will exclude partitions for which it already has records awaiting (from a previous fetch). Partitions that already have a fetch request in-flight are also excluded.
When fetching records, the consumer specifies fetch.max.bytes and max.partition.fetch.bytes in the fetch request. These are used by the brokers to respectively determine how much data to return in total and per partition. This is equally applied to all partitions.
Using these 2 approaches, by default, the Consumer tries to consume from all partitions fairly. If that's not the case, changing fetch.max.bytes or max.partition.fetch.bytes usually helps.
In case you want to prioritize some partitions over others, you need to use pause() and resume() to manually control the consumption flow.
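A rough sketch of that pattern (how you choose the priority set is entirely up to you; the method and names here are illustrative, not a standard API):

import java.time.Duration;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

// Drain a chosen set of partitions first, then return to fair consumption.
static ConsumerRecords<String, String> pollPrioritized(
        KafkaConsumer<String, String> consumer, Set<TopicPartition> priority) {
    consumer.pause(consumer.assignment());   // stop fetching everywhere...
    consumer.resume(priority);               // ...except the priority partitions
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    consumer.resume(consumer.assignment());  // restore normal consumption
    return records;
}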
I have two consumers with different client IDs and group IDs. Aside from retention hours and max partitions, my Kafka installation uses the default configuration. I've looked around to see if anyone else has had the same issue but can't pull up any results.
So the scenario goes like this:
Consumer A:
Connects to Kafka, consumes about 3 million pending messages, and then sits idle waiting for more messages.
Consumer B:
Has a different client/group ID, connects to the same Kafka topic, and this causes consumer A to get a repeat of the 3 million messages while consumer B consumes them as well.
The two consumers are two completely different Java applications with different client and group ID's running on the same computer. The Kafka server is on another computer.
Is this a normal behavior in Kafka? I am at a complete loss.
Here is my consumer config:
bootstrap.servers=192.168.110.109:9092
acks=all
max.block.ms=2000
retries=0
batch.size=16384
auto.commit.interval.ms=1000
linger.ms=0
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
block.on.buffer.full=true
enable.auto.commit=false
auto.offset.reset=none
session.timeout.ms=30000
zookeeper.session.timeout=100000
rebalance.backoff.ms=8000
group.id=consumerGroupA
zookeeper.connect=192.168.110.109:2181
poll.interval=100
And the obvious difference in my consumer B is the group.id=consumerGroupB
This is correct behavior: based on your configs, your consumers never commit the offsets of the records they have read!
When a consumer reads a record, it must commit the offset. You can have consumers commit offsets automatically by setting enable.auto.commit=true, or commit manually after each record (or batch). In your case I think auto commit is fine.
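If you stick with enable.auto.commit=false instead, the poll loop needs an explicit commit; a minimal sketch (processing logic omitted, method name is illustrative):

import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

static void consumeLoop(KafkaConsumer<String, String> consumer) {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, String> record : records) {
            // ... process the record ...
        }
        // With enable.auto.commit=false this line is what saves the group's
        // progress; without it the group has no committed offsets at all.
        consumer.commitSync();
    }
}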
I'm using KafkaSpout to read from all (6) partitions of a Kafka topic. The first bolt in the topology has to convert the byte stream into a struct (via an IDL definition), look up a value in a DB, and pass these values to a second bolt, which writes it all into Cassandra.
There are several issues occurring:
Many fails from the Kafka spout.
The first bolt reports a "capacity" of > 2.0 in the Storm UI.
I've tried to increase the parallelism, but it appears that Storm will only accept 1:1 from the KafkaSpout to the first bolt. I'm guessing that #1 is a result of timeouts in the first bolt.
What I want to do: have the KafkaSpouts (limited to 1 per Kafka partition) send their bits to a random first-bolt instance, so that I can run many more of those than the number of spouts. The first and second bolts would be 1:1, but spout to first bolt should be 1:many.
Currently I'm using LocalOrShuffleGrouping to connect spout->bolt->bolt, wired roughly as in the sketch below.
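(Class names and parallelism hints in this sketch are placeholders for my actual components, not the real code.)

import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.TopologyBuilder;

static StormTopology build(IRichSpout kafkaSpout, IRichBolt decodeBolt, IRichBolt cassandraBolt) {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("kafka-spout", kafkaSpout, 6);        // one executor per partition
    builder.setBolt("decode-and-lookup", decodeBolt, 30)   // want 1:many fan-out here
           .localOrShuffleGrouping("kafka-spout");
    builder.setBolt("cassandra-writer", cassandraBolt, 30)
           .localOrShuffleGrouping("decode-and-lookup");
    return builder.createTopology();
}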
Edit:
(Re)reading the Storm docs, I see this passage:
Shuffle grouping: Tuples are randomly distributed across the bolt's tasks in a way such that each bolt is guaranteed to get an equal number of tuples.
Yet when I look at the load on the executors for my first bolt, I see everything concentrated on 6 of them, seemingly ignoring the other 24.
I'm missing some large clue here.