Seeking to a Kafka Offset with Spring Cloud Stream - java

I have an event-sourced service that listens to a Kafka topic and saves state in a relational DB.
Considering a suitable restoration strategy for this service (i.e. how to restore the DB in a disaster recovery scenario), one option would be to save the current offset in the DB, take snapshots, and restore from a snapshot. In this scenario the service would need to seek to the offset when started in 'restoration mode'.
I am using Spring Cloud Stream, and was wondering if the framework provides any mechanism for seeking to an offset?
I realise another option for restoration would be to simply play all the events from scratch, but that's not an ideal option for some of my microservices.

If you're talking about a disaster, what makes you think you can write anything to the DB?
In other words, you may end up having to de-duplicate at least one event (or at least account for that possibility), so de-duplication is something you have to deal with either way.
I understand your concern with replay (you simply don't want to replay from the beginning), but you can store periodic snapshots, which would ensure you have a relatively fixed number of events that may need to be reprocessed/de-duplicated.
That said, Kafka maintains the current offset per consumer group, so you can rely on Kafka's own offset management to ensure that the next time you start your microservice it will begin from the last successfully processed offset.
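Note that resuming from the last committed offset only works for durable (named) consumer groups; anonymous consumers start from the latest offset on each restart. A minimal sketch of the relevant configuration, where the binding name input and the group name are assumptions:

spring.cloud.stream.bindings.input.destination=events
# With a group set, offsets are committed to Kafka and the binding
# resumes from the last committed offset after a restart.
spring.cloud.stream.bindings.input.group=event-sourcing-service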

There is a KafkaBindingRebalanceListener interface that you can use:
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.TopicPartition;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.cloud.stream.binder.kafka.KafkaBindingRebalanceListener;
import org.springframework.stereotype.Component;

@Slf4j
@Component
public class KafkaRebalanceListener implements KafkaBindingRebalanceListener {

    // Default to a real null (not the literal string "null") when the property is absent.
    @Value("${config.kafka.topics.offsets:#{null}}")
    private String topicOffsets;

    @Override
    public void onPartitionsAssigned(String bindingName, Consumer<?, ?> consumer, Collection<TopicPartition> partitions, boolean initial) {
        if (topicOffsets != null && initial) {
            final Optional<Map<TopicPartition, Long>> offsetsOptional = parseOffset(topicOffsets);
            if (offsetsOptional.isPresent()) {
                final Map<TopicPartition, Long> offsetsMap = offsetsOptional.get();
                partitions.forEach(tp -> {
                    if (offsetsMap.containsKey(tp)) {
                        final Long offset = offsetsMap.get(tp);
                        try {
                            log.info("Seek topic {} partition {} to offset {}", tp.topic(), tp.partition(), offset);
                            consumer.seek(tp, offset);
                        } catch (Exception e) {
                            log.error("Unable to set offset {} for topic {} and partition {}", offset, tp.topic(), tp.partition(), e);
                        }
                    }
                });
            }
        }
    }

    // Parses e.g. "topicA|0:10|1:20,topicB|0:5" into a TopicPartition -> offset map.
    private Optional<Map<TopicPartition, Long>> parseOffset(String offsetParam) {
        if (offsetParam == null || offsetParam.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(Arrays.stream(offsetParam.split(","))
                .flatMap(slice -> {
                    String[] items = slice.split("\\|");
                    String topic = items[0];
                    return Arrays.stream(Arrays.copyOfRange(items, 1, items.length))
                            .map(r -> {
                                String[] record = r.split(":");
                                int partition = Integer.parseInt(record[0]);
                                long offset = Long.parseLong(record[1]);
                                return new AbstractMap.SimpleEntry<>(new TopicPartition(topic, partition), offset);
                            });
                }).collect(Collectors.toMap(AbstractMap.SimpleEntry::getKey, AbstractMap.SimpleEntry::getValue)));
    }
}
The config.kafka.topics.offsets property looks like this, but you can use any format:
String topicOffsets = "topic2|1:100|2:120|3:140,topic3|1:1000|2:1200|3:1400";

Related

Why does Spring Cloud Stream with Kafka binder hash keys differently than a standard Kafka Producer?

I've run into a problem where I need to repartition an existing topic (source) to a new topic (target) with a higher number of partitions (a multiple of the number of previous partitions).
The source topic was written to using Spring Cloud Stream using the Kafka Binder. The target topic is being written to using a KStreams application.
The records in the source topic were being partitioned based on a header, with key=null. I tried to explicitly extract this header and set a message key for records in the target topic, and noticed that records with the same partition key were landing in completely different partitions.
After some investigation, I've found the culprit to be the following:
org.springframework.cloud.stream.binder.PartitionHandler.DefaultPartitionSelector
private static class DefaultPartitionSelector implements PartitionSelectorStrategy {

    @Override
    public int selectPartition(Object key, int partitionCount) {
        int hashCode = key.hashCode();
        if (hashCode == Integer.MIN_VALUE) {
            hashCode = 0;
        }
        return Math.abs(hashCode);
    }
}
org.springframework.cloud.stream.binder.PartitionHandler
public int determinePartition(Message<?> message) {
    // ... non-relevant code omitted
    partition = this.partitionSelectorStrategy.selectPartition(key,
            this.partitionCount);
    // protection in case a user selector returns a negative.
    return Math.abs(partition % this.partitionCount);
}
While the default Kafka partitioning strategy does:
org.apache.kafka.clients.producer.internals.DefaultPartitioner
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster,
        int numPartitions) {
    if (keyBytes == null) {
        return stickyPartitionCache.partition(topic, cluster);
    }
    // hash the keyBytes to choose a partition
    return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
}
In essence, a topic written to using Spring Cloud Stream's partition handling can never be co-partitioned with a topic written to by a non-Spring-Cloud-Stream app, unless a custom Partitioner is used (not too difficult to do).
It should be noted, however, that the above DefaultPartitionSelector is not located in the Kafka Binder module, but in the higher-level spring-cloud-stream module.
What is the reasoning for this design choice? I imagine the default partitioner applies to all binders, not just Kafka, but why does the Kafka Binder not provide its own Partitioner that allows out-of-the-box co-partitioning with non-Spring-Cloud-Stream apps by default?
As I said in my comment
Partitioning at the binder level is intended for infrastructure that doesn't support partitioning natively; just don't use it and let Kafka do the partitioning itself.
That said, it's not entirely clear what you mean; the spring partitioner was written long ago and predates the sticky cache introduced by KIP 480. But, even that partitioner will change the partition if the number of partitions changes when the app is restarted - if there is a key, it is modded by the number of partitions; if there is no key, a random (sticky) partition is selected.
Run this with 10, then 20, partitions and you will see that.
@SpringBootApplication
public class So73207602Application {

    public static void main(String[] args) {
        SpringApplication.run(So73207602Application.class, args).close();
    }

    @Bean
    ApplicationRunner runner(KafkaTemplate<String, String> template, NewTopic topic, KafkaAdmin admin) {
        return args -> {
            System.out.println(template.send("topic1", "foo", "bar").get().getRecordMetadata());
        };
    }

    @Bean
    public NewTopic topic() {
        return TopicBuilder.name("topic1").partitions(10).replicas(1).build();
    }
}
With a null key you will get a different (random) partition each time.
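As an aside, if you do need a Spring Cloud Stream producer to line up with Kafka's default partitioner, one option (a sketch, not from the original answer; it assumes String keys serialized as UTF-8 by the default StringSerializer) is a custom PartitionSelectorStrategy that reproduces the murmur2 hashing:

import java.nio.charset.StandardCharsets;

import org.apache.kafka.common.utils.Utils;
import org.springframework.cloud.stream.binder.PartitionSelectorStrategy;

public class Murmur2PartitionSelector implements PartitionSelectorStrategy {

    @Override
    public int selectPartition(Object key, int partitionCount) {
        // Serialize the key the way the default StringSerializer would, then
        // apply Kafka's own murmur2 hash so the result matches the
        // DefaultPartitioner of a plain Kafka producer for keyed records.
        byte[] keyBytes = key.toString().getBytes(StandardCharsets.UTF_8);
        return Utils.toPositive(Utils.murmur2(keyBytes)) % partitionCount;
    }
}

Depending on your Spring Cloud Stream version, the selector is registered via the producer's partitionSelectorClass or partitionSelectorName property.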

How to perform throttling based on user defined argument?

I am writing to an in-memory distributed database in user-defined batch sizes, in a multithreaded environment, but I want to limit the write rate to e.g. 1000 rows/sec. The reason for this requirement is that my producer is writing too fast and the consumer is running into an out-of-memory error on the leaf nodes. Is there any standard practice for throttling while batch-processing records?
dataStream.map(line => readJsonFromString(line)).grouped(memsqlBatchSize).foreach { recordSet =>
  val dbRecords = recordSet.map(m => (m, Events.transform(m)))
  dbRecords.map { record =>
    try {
      Events.setValues(eventInsert, record._2)
      eventInsert.addBatch
    } catch {
      case e: Exception =>
        logger.error(s"error adding batch: ${e.getMessage}")
        val error_event = Events.jm.writeValueAsString(mapAsJavaMap(record._1.asInstanceOf[Map[String, Object]]))
        logger.error(s"event: $error_event")
    }
  }

  // Bulk Commit Records
  try {
    eventInsert.executeBatch
  } catch {
    case e: java.sql.BatchUpdateException =>
      val updates = e.getUpdateCounts
      logger.error(s"failed commit: ${updates.toString}")
      updates.zipWithIndex.filter { case (v, i) => v == Statement.EXECUTE_FAILED }.foreach { case (v, i) =>
        val error = Events.jm.writeValueAsString(mapAsJavaMap(dbRecords(i)._1.asInstanceOf[Map[String, Object]]))
        logger.error(s"insert error: $error")
        logger.error(e.getMessage)
      }
  } finally {
    connection.commit
    eventInsert.clearBatch
    logger.debug(s"committed: ${dbRecords.length.toString}")
  }
}
I was hoping I could pass a user-defined argument as a throttleMax, and if the total records written by each thread reaches throttleMax, Thread.sleep() would be called for 1 sec. But this is going to make the entire process very slow. Is there any other effective method that can be used to throttle the loading of the data to 1000 rows/sec?
As others have suggested (see the comments on the question), you have better options available to you than throttling here. However, you can throttle an operation in Java with some simple code like the following:
/**
 * Given an Iterator `inner`, returns a new Iterator which will emit items upon
 * request, but throttled to at most one item every `minDelayMs` milliseconds.
 */
public static <T> Iterator<T> throttledIterator(Iterator<T> inner, int minDelayMs) {
    return new Iterator<T>() {
        private long lastEmittedMillis = System.currentTimeMillis() - minDelayMs;

        @Override
        public boolean hasNext() {
            return inner.hasNext();
        }

        @Override
        public T next() {
            long now = System.currentTimeMillis();
            // Wait out whatever remains of the minimum delay since the last emission.
            long requiredDelayMs = minDelayMs - (now - lastEmittedMillis);
            if (requiredDelayMs > 0) {
                try {
                    Thread.sleep(requiredDelayMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore interrupt status and resume
                }
            }
            lastEmittedMillis = System.currentTimeMillis();
            return inner.next();
        }
    };
}
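For illustration, here is a minimal (hypothetical) usage that caps output at roughly 1000 rows per second by enforcing at least 1 ms between emissions; the println is a stand-in for the actual DB write:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ThrottleDemo {

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 5000; i++) {
            rows.add("row-" + i);
        }
        // At most one element per millisecond, i.e. at most ~1000 rows/sec.
        Iterator<String> throttled = throttledIterator(rows.iterator(), 1);
        while (throttled.hasNext()) {
            System.out.println(throttled.next()); // stand-in for the batched DB write
        }
    }

    // throttledIterator(...) as defined above
}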
The above code uses Thread.sleep, so is not suitable for use in a Reactive system. In that case, you would want to use the Throttle implementation provided by that system, e.g. throttle in Akka.

Neo4j, SDN4, ActiveMQ multiple consumers and data syncronization

In order to speed up data consumption in my application (Spring Boot, Neo4j database, Spring Data Neo4j 4) I have introduced Apache ActiveMQ and configured 10 concurrent consumers.
Right after that I ran into an issue with counter updates.
I execute the following create method from my Apache ActiveMQ consumer:
@Override
public Decision create(String name, String description, String url, String imageUrl, boolean multiVotesAllowed, Long parentDecisionId, User user) {
    Decision parentDecision = null;
    if (parentDecisionId != null) {
        parentDecision = ofNullable(findById(parentDecisionId)).orElseThrow(() -> new EntityNotFoundException("Parent decision with a given id not found"));
    }
    Decision decision = decisionRepository.save(new Decision(name, description, url, imageUrl, multiVotesAllowed, parentDecision, user), user);
    if (parentDecision != null) {
        updateTotalChildDecisions(parentDecision, 1);
    }
    return decision;
}
Inside the create method I do some logic and then update the parentDecision.totalChildDecisions counter:
@Override
public Decision updateTotalChildDecisions(Decision decision, Integer increment) {
    decision.setTotalChildDecisions(decision.getTotalChildDecisions() + increment);
    return decisionRepository.save(decision);
}
When executed concurrently, this counter doesn't match the actual data in the database, but in a single-threaded environment (1 ActiveMQ consumer) everything works fine.
I think the main issue is that during the totalChildDecisions update, parentDecision refers to a stale SDN 4 object with an out-of-date totalChildDecisions value. How do I correctly update parentDecision.totalChildDecisions?
How do I properly synchronize my code so the counters work correctly with concurrent ActiveMQ consumers?
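For what it's worth, the read-modify-write in updateTotalChildDecisions is a classic lost-update race: two consumers can load the same stale counter and both write back the same incremented value. One common fix (a sketch under SDN 4 assumptions; the repository method and query are illustrative, not from the original code) is to make the increment atomic on the database side:

public interface DecisionRepository extends GraphRepository<Decision> {

    // The increment happens inside Neo4j, so concurrent consumers
    // cannot overwrite each other's updates.
    @Query("MATCH (d:Decision) WHERE id(d) = {0} "
         + "SET d.totalChildDecisions = d.totalChildDecisions + {1}")
    void incrementTotalChildDecisions(Long decisionId, int increment);
}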

Akka stream - limiting Flow rate without introducing delay

I'm working with Akka (version 2.4.17) to build an observation Flow in Java (let's say of elements of type <T> to stay generic).
My requirement is that this Flow should be customizable to deliver a maximum number of observations per unit of time as soon as they arrive. For instance, it should be able to deliver at most 2 observations per minute (the first that arrive, the rest can be dropped).
I looked very closely to the Akka documentation, and in particular this page which details the built-in stages and their semantics.
So far, I tried the following approaches.
With throttle and shaping() mode (to not close the stream when the limit is exceeded):
Flow.of(T.class)
    .throttle(2,
        new FiniteDuration(1, TimeUnit.MINUTES),
        0,
        ThrottleMode.shaping())
With groupedWithin and an intermediary custom method:
final int nbObsMax = 2;
Flow.of(T.class)
    .groupedWithin(Integer.MAX_VALUE, new FiniteDuration(1, TimeUnit.MINUTES))
    .map(list -> {
        List<T> listToTransfer = new ArrayList<>();
        for (int i = list.size() - nbObsMax; i > 0 && i < list.size(); i++) {
            listToTransfer.add(new T(list.get(i)));
        }
        return listToTransfer;
    })
    .mapConcat(elem -> elem) // Splitting List<T> into a Flow of T objects
Previous approaches give me the correct number of observations per unit of time but these observations are retained and only delivered at the end of the time window (and therefore there is an additional delay).
To give a more concrete example, if the following observations arrive in my Flow:
[Obs1 t=0s] [Obs2 t=45s] [Obs3 t=47s] [Obs4 t=121s] [Obs5 t=122s]
It should only output the following ones as soon as they arrive (processing time can be neglected here):
Window 1: [Obs1 t~0s] [Obs2 t~45s]
Window 2: [Obs4 t~121s] [Obs5 t~122s]
Any help will be appreciated, thanks for reading my first StackOverflow post ;)
I cannot think of a solution out of the box that does what you want. Throttle will emit in a steady stream because of how it is implemented with the bucket model, rather than having a permitted lease at the start of every time period.
To get the exact behavior you are after you would have to create your own custom rate-limit stage (which might not be that hard). You can find the docs on how to create custom stages here: http://doc.akka.io/docs/akka/2.5.0/java/stream/stream-customize.html#custom-linear-processing-stages-using-graphstage
One design that could work is keeping an allowance counter saying how many elements can be emitted, which you reset every interval; for every incoming element you subtract one from the counter and emit, and when the allowance is used up you keep pulling upstream but discard the elements rather than emit them. Using TimerGraphStageLogic for the GraphStageLogic allows you to set a timed callback that can reset the allowance.
I think this is exactly what you need: http://doc.akka.io/docs/akka/2.5.0/java/stream/stream-cookbook.html#Globally_limiting_the_rate_of_a_set_of_streams
Thanks to the answer of @johanandren, I've successfully implemented a custom time-based GraphStage that meets my requirements.
I post the code below, if anyone is interested:
import akka.stream.Attributes;
import akka.stream.FlowShape;
import akka.stream.Inlet;
import akka.stream.Outlet;
import akka.stream.stage.*;
import scala.concurrent.duration.FiniteDuration;

public class CustomThrottleGraphStage<A> extends GraphStage<FlowShape<A, A>> {

    private final FiniteDuration silencePeriod;
    private int nbElemsMax;

    public CustomThrottleGraphStage(int nbElemsMax, FiniteDuration silencePeriod) {
        this.silencePeriod = silencePeriod;
        this.nbElemsMax = nbElemsMax;
    }

    public final Inlet<A> in = Inlet.create("TimedGate.in");
    public final Outlet<A> out = Outlet.create("TimedGate.out");

    private final FlowShape<A, A> shape = FlowShape.of(in, out);

    @Override
    public FlowShape<A, A> shape() {
        return shape;
    }

    @Override
    public GraphStageLogic createLogic(Attributes inheritedAttributes) {
        return new TimerGraphStageLogic(shape) {

            private boolean open = false;
            private int countElements = 0;

            {
                setHandler(in, new AbstractInHandler() {
                    @Override
                    public void onPush() throws Exception {
                        A elem = grab(in);
                        if (open || countElements >= nbElemsMax) {
                            pull(in); // we drop all incoming observations since the rate limit has been reached
                        } else {
                            if (countElements == 0) { // we schedule the next instant to reset the observation counter
                                scheduleOnce("resetCounter", silencePeriod);
                            }
                            push(out, elem); // we forward the incoming observation
                            countElements += 1; // we increment the counter
                        }
                    }
                });
                setHandler(out, new AbstractOutHandler() {
                    @Override
                    public void onPull() throws Exception {
                        pull(in);
                    }
                });
            }

            @Override
            public void onTimer(Object key) {
                if (key.equals("resetCounter")) {
                    open = false;
                    countElements = 0;
                }
            }
        };
    }
}
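A hypothetical way to plug the stage into a stream (the element type and the limits are placeholders):

import java.util.concurrent.TimeUnit;

import akka.NotUsed;
import akka.stream.javadsl.Flow;
import scala.concurrent.duration.FiniteDuration;

// At most 2 observations per minute; excess elements are dropped as they arrive.
Flow<String, String, NotUsed> limited = Flow.of(String.class)
        .via(new CustomThrottleGraphStage<>(2, new FiniteDuration(1, TimeUnit.MINUTES)));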

Datastore queries in Dataflow DoFn slow down pipeline when run in the cloud

I am trying to enhance data in a pipeline by querying Datastore in a DoFn step.
A field from an object from the Class CustomClass is used to do a query against a Datastore table and the returned values are used to enhance the object.
The code looks like this:
public class EnhanceWithDataStore extends DoFn<CustomClass, CustomClass> {

    private static Datastore datastore = DatastoreOptions.defaultInstance().service();
    private static KeyFactory articleKeyFactory = datastore.newKeyFactory().kind("article");

    @Override
    public void processElement(ProcessContext c) throws Exception {
        CustomClass event = c.element();
        Entity article = datastore.get(articleKeyFactory.newKey(event.getArticleId()));
        String articleName = "";
        try {
            articleName = article.getString("articleName");
        } catch (Exception e) {}
        CustomClass enhanced = new CustomClass(event);
        enhanced.setArticleName(articleName);
        c.output(enhanced);
    }
}
When it is run locally, this is fast, but when it is run in the cloud, this step slows down the pipeline significantly. What's causing this? Is there any workaround or better way to do this?
A picture of the pipeline can be found here (the last step is the enhancing step):
pipeline architecture
What you are doing here is a join between your input PCollection<CustomClass> and the enhancements in Datastore.
For each partition of your PCollection, the calls to Datastore are going to be single-threaded, hence incurring a lot of latency. I would expect this to be slow in the DirectPipelineRunner and InProcessPipelineRunner as well. With autoscaling and dynamic work rebalancing, you should see parallelism when running on the Dataflow service unless something about the structure of your pipeline causes us to optimize it poorly, so you can try increasing --maxNumWorkers. But you still won't benefit from bulk operations.
It is probably better to express this join within your pipeline, using DatastoreIO.readFrom(...) followed by a CoGroupByKey transform. In this way, Dataflow will do a bulk parallel read of all the enhancements and use the efficient GroupByKey machinery to line them up with the events.
// Here are the two collections you want to join
PCollection<CustomClass> events = ...;
PCollection<Entity> articles = DatastoreIO.readFrom(...);

// Key them both by the common id
PCollection<KV<Long, CustomClass>> keyedEvents =
    events.apply(WithKeys.of(event -> event.getArticleId()));
PCollection<KV<Long, Entity>> keyedArticles =
    articles.apply(WithKeys.of(article -> article.getKey().getId()));

// Set up the join by giving tags to each collection
TupleTag<CustomClass> eventTag = new TupleTag<CustomClass>() {};
TupleTag<Entity> articleTag = new TupleTag<Entity>() {};
KeyedPCollectionTuple<Long> coGbkInput =
    KeyedPCollectionTuple
        .of(eventTag, keyedEvents)
        .and(articleTag, keyedArticles);

PCollection<CustomClass> enhancedEvents = coGbkInput
    .apply(CoGroupByKey.create())
    .apply(MapElements.via((CoGbkResult joinResult) -> {
        for (CustomClass event : joinResult.getAll(eventTag)) {
            String articleName;
            try {
                articleName = joinResult.getOnly(articleTag).getString("articleName");
            } catch (Exception e) {
                articleName = "";
            }
            CustomClass enhanced = new CustomClass(event);
            enhanced.setArticleName(articleName);
            return enhanced;
        }
    }));
Another possibility, if there are few enough articles to store the lookup in memory, is to use DatastoreIO.readFrom(...) and then read them all as a map side input via View.asMap() and look them up in a local table.
// Here are the two collections you want to join
PCollection<CustomClass> events = ...;
PCollection<Entity> articles = DatastoreIO.readFrom(...);

// Key the articles and create a map view
PCollectionView<Map<Long, Entity>> articleView = articles
    .apply(WithKeys.of(article -> article.getKey().getId()))
    .apply(View.asMap());

// Do a lookup join by side input to a ParDo
PCollection<CustomClass> enhanced = events
    .apply(ParDo.withSideInputs(articleView).of(new DoFn<CustomClass, CustomClass>() {
        @Override
        public void processElement(ProcessContext c) {
            CustomClass event = c.element();
            Map<Long, Entity> articleLookup = c.sideInput(articleView);
            String articleName;
            try {
                articleName =
                    articleLookup.get(event.getArticleId()).getString("articleName");
            } catch (Exception e) {
                articleName = "";
            }
            CustomClass enhanced = new CustomClass(event);
            enhanced.setArticleName(articleName);
            c.output(enhanced);
        }
    }));
Depending on your data, either of these may be a better choice.
After some checking I managed to pinpoint the problem: the project is located in the EU (and as such, the Datastore is located in the EU zone, same as the App Engine zone), while the Dataflow jobs themselves (and thus the workers) are hosted in the US by default (when not overriding the zone option).
The difference in performance is 25-30 fold: ~40 elements/s compared to ~1200 elements/s for 15 workers.
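For reference, with the 1.x Dataflow SDK the workers can be pinned to a zone co-located with the Datastore through the zone pipeline option; a sketch (the zone name is an example):

import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;

DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
// Keep the workers in the same region as the EU Datastore.
options.setZone("europe-west1-b");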
