Storm topology: one to many (random) - Java

I'm using the KafkaSpout to read from all six partitions of a Kafka topic. The first bolt in the topology has to convert the byte stream into a struct (via an IDL definition), look up a value in a DB, and pass these values to a second bolt, which writes it all into Cassandra.
There are several issues occurring:
1. Many fail(s) from the Kafka spout.
2. The first bolt reports a "capacity" of > 2.0 in the Storm UI.
I've tried to increase the parallelism, but it appears that Storm will only accept 1:1 from the KafkaSpout to the first bolt. I'm guessing that #1 is a result of timeouts from the first bolt.
What I want to do: have the KafkaSpouts (limited to one per Kafka partition) send their tuples to a random first bolt, so that I can run many more first bolts than spouts. The first and second bolts would be 1:1, but the spout to first bolt should be 1:many.
Currently I'm using LocalOrShuffleGrouping for both connections (spout->bolt->bolt).
Edit:
(Re)reading the Storm docs, I see this passage:
Shuffle grouping: Tuples are randomly distributed across the bolt's tasks in a way such that each bolt is guaranteed to get an equal number of tuples.
Yet when I look at the load on the executors for my first bolt I see everything concentrated on 6 of them - seemingly ignoring the other 24.
I'm missing some large clue here.

Related

Kafka RoundRobin partitioner not distributing messages to 4 partitions

I have 4 brokers and 4 partitions, but when I push 4 messages with a null key they are not distributed round robin. I was expecting one message to be saved in each partition.
I'm using kafka-clients 5.5* for the KafkaProducer, and it looks like the behavior changed in versions after 5.4*. According to the Confluent documentation:
If a key is not provided, behavior is Confluent Platform version-dependent:
In Confluent Platform versions 5.4.x and later, the partition is assigned with awareness to batching. If a batch of records is not full and has not yet been sent to the broker, it will select the same partition as a prior record. Partitions for newly created batches are assigned randomly. For more information, see KIP-480: Sticky Partitioner and the related Confluent blog post.
In Confluent Platform versions prior to 5.4.x, the partition is assigned in a round robin method, starting at a random partition.
https://docs.confluent.io/platform/current/clients/producer.html
Is my understanding correct?
A new partitioner (StickyPartitioner) was introduced in Kafka version 2.4 for improving the way in which data is sent to the partitions by the producer.
Basically, the producer now fills a batch for one partition and only then moves on to another partition, instead of switching partitions for each and every record.
For more details, you can refer to the link below, which explains everything in detail.
https://www.confluent.io/blog/5-things-every-kafka-developer-should-know/#tip-2-new-sticky-partitioner
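The difference between the two behaviours can be sketched with a toy model in plain Java. This is not the Kafka client's implementation: the class and method names are made up for illustration, a batch is modelled as a fixed record count (the real producer also sends a batch when linger.ms expires), and the random choice of the next partition is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Toy model of key-less partition selection; NOT the Kafka client implementation. */
final class PartitionerSketch {

    /** Round robin: every record moves on to the next partition. */
    static List<Integer> roundRobin(int records, int partitions, int start) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < records; i++) {
            out.add((start + i) % partitions);
        }
        return out;
    }

    /**
     * Sticky: records stay on the current partition until the batch is "full"
     * (assumption: modelled as a fixed record count), then a partition is
     * picked at random for the next batch.
     */
    static List<Integer> sticky(int records, int partitions, int batchRecords, Random rnd) {
        List<Integer> out = new ArrayList<>();
        int current = rnd.nextInt(partitions);
        for (int i = 0; i < records; i++) {
            if (i > 0 && i % batchRecords == 0) {
                current = rnd.nextInt(partitions); // new batch, new random partition
            }
            out.add(current);
        }
        return out;
    }

    public static void main(String[] args) {
        // 4 records, 4 partitions: round robin touches every partition once.
        System.out.println("round robin: " + roundRobin(4, 4, 0)); // [0, 1, 2, 3]
        // The same 4 records fit in one batch, so sticky keeps them on one partition.
        System.out.println("sticky:      " + sticky(4, 4, 100, new Random()));
    }
}
```

In this model, four records sent in quick succession all land on the same partition, which matches what the question describes.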

Refactoring a Spring Batch Job to use Apache Kafka (Decoupling readers and writers)

I currently have a Spring Batch job with a single step that reads data from Oracle, passes the data through multiple Spring Batch processors (CompositeItemProcessor), and writes the data to different destinations such as Oracle and files (CompositeItemWriter):
<batch:step id="dataTransformationJob">
    <batch:tasklet transaction-manager="transactionManager" task-executor="taskExecutor" throttle-limit="30">
        <batch:chunk reader="dataReader" processor="compositeDataProcessor" writer="compositeItemWriter" commit-interval="100"></batch:chunk>
    </batch:tasklet>
</batch:step>
In the above step, the compositeItemWriter is configured with 2 writers that run one after another and write 100 million records to Oracle as well as a file. Also, the dataReader has a synchronized read method to ensure that multiple threads don't read the same data from Oracle. This job takes 1 hour 30 mins to complete as of today.
I am planning to break down the above job into two parts such that the reader/processors produce data on 2 Kafka topics (one for data to be written to Oracle and the other for data to be written to a file). On the other side of the equation, I will have a job with two parallel flows that read data from each topic and write the data to Oracle and file respectively.
With the above architecture in mind, I wanted to understand how I can refactor a Spring Batch job to use Kafka. I believe the following areas are what I would need to address:
1. In the existing job that doesn't use Kafka, my throttle-limit is 30; however, when I use Kafka in the middle, how does one decide the right throttle-limit?
2. In the existing job I have a commit-interval of 100. This means that the CompositeItemWriter will be called for every 100 records, and each writer will unpack the chunk and call the write method on it. Does this mean that when I write to Kafka, there will be 100 publish calls to Kafka?
3. Is there a way to club multiple rows into one single message in Kafka to avoid multiple network calls?
4. On the consumer side, I want to have a Spring Batch multi-threaded step that is able to read each partition of a topic in parallel. Does Spring Batch have built-in classes to support this already?
5. The consumer will use the standard JdbcBatchItemWriter or FlatFileItemWriter to write the data that was read from Kafka, so I believe this should be standard Spring Batch in Action.
Note: I am aware of Kafka Connect but don't want to use it, because it requires setting up a Connect cluster and I don't have the infrastructure available to support that.
Answers to your questions:
1. No throttling is needed in your Kafka producer; data should be available in Kafka for consumption as soon as possible. Your consumers could be throttled (if needed), depending on the implementation.
2. The Kafka producer is configurable. 100 messages do not necessarily mean 100 network calls. You could write 100 messages to the Kafka producer (which may or may not buffer them, depending on its configuration) and then flush the buffer to force a network call. This would lead to (almost) the same behaviour as the existing job.
3. Multiple rows can be clubbed into a single message, since the payload of a Kafka message is entirely up to you. But your reasoning, "multiple rows into one single message in Kafka to avoid multiple network calls", is invalid, since multiple messages (rows) can be produced/consumed in a single network call. For your first draft, I would suggest keeping it simple by having a single row correspond to a single message.
4. Not as far as I know (but I could be wrong on this one).
5. Yes, I believe they should work just fine.
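As a rough illustration of the point about producer buffering, the sketch below builds a set of producer properties using only java.util.Properties. The keys linger.ms and batch.size are standard Kafka producer settings; the values, the broker address localhost:9092, the String serializers, and the class name are assumptions chosen for illustration, not recommendations.

```java
import java.util.Properties;

/** Sketch of producer settings that allow many records to share one request. */
final class BatchingProducerConfigSketch {

    static Properties batchingProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Wait up to 20 ms for more records before sending a batch (illustrative value).
        props.setProperty("linger.ms", "20");
        // Upper bound, in bytes, for a single per-partition batch (illustrative value).
        props.setProperty("batch.size", "65536");
        return props;
    }

    public static void main(String[] args) {
        Properties props = batchingProps("localhost:9092"); // hypothetical broker address
        props.forEach((k, v) -> System.out.println(k + "=" + v));
        // With kafka-clients on the classpath, these properties would be passed to
        // new KafkaProducer<String, String>(props). A chunk-oriented writer could then
        // call producer.send(...) once per item and producer.flush() once per chunk.
    }
}
```

With settings along these lines, the 100 items of a chunk become 100 send calls on the client, but the number of network requests depends on how the records are batched per partition.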

Understanding kafka streams partition assignor

I have two topics, one with 3 partitions and one with 48.
Initially I used the default assignor, but I ran into problems when a consumer (a pod in Kubernetes) crashed.
When the pod came up again, it was assigned the partition from the topic with 3 partitions and none from the topic with 48.
The two pods that did not crash were assigned 16 and 32 partitions from the topic with 48 partitions.
I've fixed this by using a round-robin partition assignor, but now I don't feel confident about how the partitions are distributed: I'm using KStream-KStream joins, and for those we need to guarantee that each consumer is assigned the same partition of every topic, e.g. C1:(t1:p0, t2:p0), C2:(t1:p1, t2:p1), etc.
One thing I thought of was that I could rekey the incoming events so that they are repartitioned, and then I might be able to guarantee this?
Or maybe I don't understand how the default partitioning works. I'm confused.
Kafka Streams does not allow you to use a custom partition assignor. If you set one yourself, it will be overwritten with the StreamsPartitionAssignor [1]. This is needed to ensure that -- if possible -- partitions are reassigned to the same consumers (a.k.a. stickiness) during rebalancing. Stickiness is important for Kafka Streams to be able to reuse state stores on the consumer side as much as possible. If a partition is not reassigned to the same consumer, the state stores used by that consumer need to be recreated from scratch after rebalancing.
[1] https://github.com/apache/kafka/blob/9bd0d6aa93b901be97adb53f290b262c7cf1f175/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L989

How does Kafka Consumer Consume from Multiple assigned Partition

tl;dr: I am trying to understand how a single consumer that is assigned multiple partitions handles consuming records from each partition.
For example:
Completely processes a single partition before moving to the next.
Process a chunk of available records from each partition every time.
Process a batch of N records from first available partitions
Process a batch of N records from partitions in round-robin rotation
I found the partition.assignment.strategy configuration for the Range or RoundRobin assignors, but this only determines how consumers are assigned partitions, not how a consumer consumes from the partitions it is assigned.
I started digging into the KafkaConsumer source:
#poll() led me to #pollForFetches()
#pollForFetches() then led me to fetcher#fetchedRecords() and fetcher#sendFetches()
This just led me to try to follow the entire Fetcher class, and maybe it is just late or maybe I didn't dig in far enough, but I am having trouble untangling exactly how a consumer will process multiple assigned partitions.
Background
Working on a data pipeline backed by Kafka Streams.
At several stages in this pipeline, as records are processed by different Kafka Streams applications, the stream is joined to compacted topics fed by external data sources that provide the data needed to augment the records before they continue to the next stage of processing.
Along the way there are several dead letter topics for records that could not be matched to the external data sources that would have augmented them. This could be because the data is just not available yet (the Event or Campaign is not live yet) or because it is bad data and will never match.
The goal is to republish records from the dead letter topic whenever new augmenting data is published, so that we can match previously unmatched records from the dead letter topic, update them, and send them downstream for additional processing.
Records have potentially failed to match on several attempts and could have multiple copies in the dead letter topic so we only want to reprocess existing records (before latest offset at the time the application starts) as well as records that were sent to the dead letter topic since the last time the application ran (after the previously saved consumer group offsets).
It works well as my consumer filters out any records arriving after the application has started, and my producer is managing my consumer group offsets by committing the offsets as part of the publishing transaction.
But I want to make sure that I will eventually consume from all partitions, as I have run into an odd edge case where unmatched records get reprocessed and land in the same partition of the dead letter topic as before, only to get filtered out by the consumer. And although the consumer is not getting new batches of records to process, there are partitions that have not been reprocessed yet either.
Any help understanding how a single consumer processes multiple assigned partitions would be greatly appreciated.
You were on the right track looking at Fetcher, as most of the logic is there.
First, as the Consumer Javadoc mentions:
If a consumer is assigned multiple partitions to fetch data from, it
will try to consume from all of them at the same time, effectively
giving these partitions the same priority for consumption.
As you can imagine, in practice, there are a few things to take into account.
Each time the consumer is trying to fetch new records, it will exclude partitions for which it already has records awaiting (from a previous fetch). Partitions that already have a fetch request in-flight are also excluded.
When fetching records, the consumer specifies fetch.max.bytes and max.partition.fetch.bytes in the fetch request. These are used by the brokers to respectively determine how much data to return in total and per partition. This is equally applied to all partitions.
Using these 2 approaches, by default, the Consumer tries to consume from all partitions fairly. If that's not the case, changing fetch.max.bytes or max.partition.fetch.bytes usually helps.
If you want to prioritize some partitions over others, you need to use pause() and resume() to manually control the consumption flow.
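The exclusion rules described above can be modelled in a few lines of plain Java. This is a toy model, not the Fetcher code: partitions are represented by made-up string names, and the class and method names are hypothetical. It only shows which assigned partitions would be candidates for the next fetch request once buffered, in-flight, and paused partitions are taken out.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of which partitions a consumer would include in its next fetch. */
final class FetchEligibilitySketch {

    static Set<String> fetchable(List<String> assigned,
                                 Set<String> buffered,
                                 Set<String> inFlight,
                                 Set<String> paused) {
        Set<String> result = new LinkedHashSet<>(assigned); // keeps assignment order
        result.removeAll(buffered);  // records from a previous fetch still waiting
        result.removeAll(inFlight);  // a fetch request is already outstanding
        result.removeAll(paused);    // paused by the application via pause()
        return result;
    }

    public static void main(String[] args) {
        List<String> assigned = List.of("dlq-0", "dlq-1", "dlq-2", "dlq-3");
        Set<String> next = fetchable(assigned,
                Set.of("dlq-0"),   // still has buffered records
                Set.of("dlq-1"),   // fetch in flight
                Set.of("dlq-3"));  // paused
        System.out.println(next);  // prints [dlq-2]
    }
}
```

In this model, pausing a partition that keeps returning already-seen records leaves the remaining partitions as the only fetch candidates, which is one way to make sure they are eventually read.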

In Storm is there a way to count the number of tuples that failed due to timeout?

I'm trying to develop some reporting around our Storm topologies and one metric we would like to report on is the number of tuples that fail due to timing out.
From what I understand Storm will automatically fail a tuple when it fails to complete before the timeout length, but this seems to happen "behind the scenes" and I don't see a way to distinguish between timeout failures vs other types of failures.
Is there any way to expose or capture this information?
If you look at Storm's web UI, there is a count of failed tuples for each bolt. Those counts cover only the tuples that the bolt failed manually (i.e., via OutputCollector.fail(...)) and do not include tuples that ran into a timeout. The spout has an overall counter of failed tuples. Thus, you can simply sum up the number of manually failed tuples over all bolts and subtract it from the global spout count to get the number of tuples that timed out.
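The arithmetic can be written down as a small helper. This is only a sketch of the subtraction described above: the class name, the bolt names, and the counts are made up, and how the counters are read from Storm is left out.

```java
import java.util.Map;

/** Sketch: timed-out tuples = spout failures minus failures raised explicitly by bolts. */
final class TimeoutCountSketch {

    static long timedOut(long spoutFailed, Map<String, Long> failedPerBolt) {
        long explicitlyFailed = 0;
        for (long count : failedPerBolt.values()) {
            explicitlyFailed += count;
        }
        return spoutFailed - explicitlyFailed;
    }

    public static void main(String[] args) {
        // Hypothetical counters, all taken over the same time window.
        Map<String, Long> failedPerBolt = Map.of("parse-bolt", 30L, "cassandra-bolt", 15L);
        System.out.println(timedOut(120L, failedPerBolt)); // prints 75
    }
}
```

The result is only meaningful if the spout and bolt counters cover the same time window.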
