Unable to open store for Kafka Streams because of invalid state - Java

I am trying to work with Kafka Streams and I have created the following topology:
KStream<String, HistoryEvent> eventStream = builder.stream(applicationTopicName, Consumed.with(Serdes.String(),
historyEventSerde));
eventStream.selectKey((key, value) -> new HistoryEventKey(key, value.getIdentifier()))
.groupByKey()
.reduce((e1, e2) -> e2, Materialized.as(streamByKeyStoreName));
I later start the streams like this:
private void startKafkaStreams(KafkaStreams streams) {
CompletableFuture<KafkaStreams.State> stateFuture = new CompletableFuture<>();
streams.setStateListener((newState, oldState) -> {
if(stateFuture.isDone()) {
return;
}
if(newState == KafkaStreams.State.RUNNING || newState == KafkaStreams.State.ERROR) {
stateFuture.complete(newState);
}
});
streams.start();
try {
KafkaStreams.State finalState = stateFuture.get();
if(finalState != KafkaStreams.State.RUNNING) {
// ...
}
} catch (InterruptedException ex) {
// ...
} catch(ExecutionException ex) {
// ...
}
}
My streams start without an error and eventually reach the RUNNING state, at which point the future is completed. Later I try to access the store that I created in my topology for the KTable:
public KafkaFlowHistory createFlowHistory(String flowId) {
ReadOnlyKeyValueStore<HistoryEventKey, HistoryEvent> store = streams.store(streamByKeyStoreName,
QueryableStoreTypes.keyValueStore());
return new KafkaFlowHistory(flowId, store, event -> topicProducer.send(new ProducerRecord<>(applicationTopicName, flowId, event)));
}
I have verified that createFlowHistory is called after the initialization future has completed in the RUNNING state; however, the query consistently fails and KafkaStreams reports the following error:
Exception in thread "main"
org.apache.kafka.streams.errors.InvalidStateStoreException: Cannot get
state store flow-event-stream-file-service-test-instance-by-key
because the stream thread is PARTITIONS_ASSIGNED, not RUNNING
Apparently the state of the thread has changed. Do I need to handle this manually when querying a store and wait for Kafka's internal thread to reach the right state?

Older Versions (before 2.2.0)
On startup, Kafka Streams does the following state transitions:
CREATED -> RUNNING -> REBALANCING -> RUNNING
You need to wait for the second RUNNING state before you can query.
Newer Versions (as of 2.2.0)
The state transition behavior on startup was changed (via https://issues.apache.org/jira/browse/KAFKA-7657) to:
CREATED -> REBALANCING -> RUNNING
Hence, you should not hit this issue any longer.
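If you are stuck on a version before 2.2.0, a common workaround is to retry the store lookup until the rebalance has finished. This is only a sketch (not part of the original fix); it uses the pre-2.5 store(String, QueryableStoreType) signature:
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.errors.InvalidStateStoreException;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

// InvalidStateStoreException is expected while the stream thread is still
// in PARTITIONS_ASSIGNED/REBALANCING; retry until the store becomes queryable.
static <K, V> ReadOnlyKeyValueStore<K, V> waitUntilStoreIsQueryable(
        KafkaStreams streams, String storeName) throws InterruptedException {
    while (true) {
        try {
            return streams.store(storeName, QueryableStoreTypes.<K, V>keyValueStore());
        } catch (InvalidStateStoreException notReadyYet) {
            Thread.sleep(100); // store not ready yet, retry shortly
        }
    }
}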

Related

How to verify that a Kafka topic is not empty, i.e. has at least 1 message?

I am writing a little Kafka metrics exporter (yes, there are plenty available, like Prometheus etc., but I want a lightweight custom one; kindly excuse me on this).
As part of this I would like to know as soon as the first message is received in a Kafka topic (or the topic has messages). I am using Spring Boot and Kafka.
I have the code below, which gives the name of the topic and the number of partitions. I want to know whether the topic has messages. Kindly let me know how I can get this stat. Any lead is much appreciated!
@ReadOperation
public List<TopicManifest> kafkaTopic() throws ExecutionException, InterruptedException {
ListTopicsOptions listTopicsOptions = new ListTopicsOptions();
listTopicsOptions.listInternal(true);
ListTopicsResult listTopicsResult = adminClient.listTopics(listTopicsOptions);
Set<String> topics = listTopicsResult.names().get().stream().filter(topic -> !topic.startsWith("_")).collect(Collectors.toSet());
System.out.println(topics);
DescribeTopicsResult describeTopicsResult = adminClient.describeTopics(topics);
Map<String, KafkaFuture<TopicDescription>> topicNameValues = describeTopicsResult.topicNameValues();
List<TopicManifest> topicManifests = topicNameValues.entrySet().stream().map(entry -> {
try {
TopicDescription topicDescription = entry.getValue().get();
return TopicManifest.builder().name(entry.getKey())
.noOfPartitions(topicDescription.partitions().size())
.build();
} catch (InterruptedException e) {
e.printStackTrace();
} catch (ExecutionException e) {
e.printStackTrace();
}
return null;
}).collect(Collectors.toList());
return topicManifests;
}
Create a KafkaConsumer and call endOffsets (the consumer does not need to be subscribed to the topic(s)).
@Bean
ApplicationRunner runner1(ConsumerFactory<String, String> cf) {
return args -> {
try (Consumer<String, String> consumer = cf.createConsumer()) {
System.out.println(consumer.endOffsets(List.of(new TopicPartition("ktest29", 0),
new TopicPartition("ktest29", 1),
new TopicPartition("ktest29", 2))));
}
};
}
Offsets stored in the topic never decrease. Getting the end offset doesn't guarantee you have a non-empty topic (the start and end offsets of the topic partitions could be the same).
Instead, you will still create a consumer but set
auto.offset.reset=earliest
group.id=UUID.randomUUID()
Then subscribe and run
ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(2));
boolean empty = records.count() == 0;
By setting auto.offset.reset=earliest with a random group ID, you are guaranteed to start at the earliest offset and seek to the first available record, if one exists; at that point you can poll for any number of records to see whether any are returned within the specified timeout.
This should work for regular and compacted topics without needing to check committed offsets.
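Putting the pieces together, a minimal self-contained sketch (assuming a broker at localhost:9092; the topic name ktest29 is just the example used above):
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.UUID;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

static boolean topicHasMessages(String bootstrapServers, String topic) {
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString()); // random group: no committed offsets
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); // start from the first available record
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
        consumer.subscribe(List.of(topic));
        // a non-empty poll within the timeout means the topic has at least one message
        return !consumer.poll(Duration.ofSeconds(2)).isEmpty();
    }
}
// usage: boolean empty = !topicHasMessages("localhost:9092", "ktest29");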

Spring cloud kafka stream - Waiting for stream to initialize

I have two Kafka streams, and one of the streams depends on the other. For example:
Stream1 -> Aggregates and create a state store.
Stream2 -> Uses interactive query api to access the state store from Stream 1.
Problem: when Stream1 takes a long time to initialize and Stream2 tries to access the state store, I get the error "org.apache.kafka.streams.errors.InvalidStateStoreException: Cannot get state store term-error-events because the stream thread is PARTITIONS_ASSIGNED, not RUNNING".
Solution tried so far: I have added the retry properties below for the state store, but they did not resolve the issue.
spring.cloud.stream.kafka.streams.binder.state-store-retry.max-attempts=10
spring.cloud.stream.kafka.streams.binder.state-store-retry.backoff-period=10000
I want to find out whether I should change the approach, whether there is a way to wait for Stream1 to be ready before starting Stream2, or whether there is any better way to do this.
My Code:
// Stream1
@Bean
public Consumer<KStream<String, Fallout>> aggregateErrors() {
return input -> input
.filter((s, fallout) -> s != null && fallout.getAction() != null && fallout.getAction().equals(Action.ERROR))
.map((s, fallout) -> {
if (StringUtils.isEmpty(fallout.getMessage()) || StringUtils.isEmpty(fallout.getFalloutReasonCode())) {
fallout.setFalloutReasonCode("unknown");
fallout.setMessage("unknown");
return new KeyValue<>(s, fallout);
}
return new KeyValue<>(s, fallout);
})
.groupByKey()
.aggregate(String::new, (s, fallout, error) ->
!StringUtils.isAllEmpty(error)
? error.concat("|").concat(fallout.getMessage())
: error.concat(fallout.getMessage()),
Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("term-error-events")
.withKeySerde(Serdes.String())
.withValueSerde(Serdes.String())
);
}
// Stream 2
@Bean
public BiFunction<KStream<String, PracLeftGroupInputRequest>, KStream<String, RequestComplete>, KStream<String, RequestComplete>> joinTermsComplete() {
return (termsInputRequest, termsRequestComplete) ->
termsInputRequest
.join(termsRequestComplete, this::termsCompleteJoiner,
JoinWindows.of(Duration.ofMinutes(JOIN_WINDOW_MINUTES)), StreamJoined.with(Serdes.String(), new TermInputRequestSerde(), new TermRequestCompleteSerde()));
}
// Where Stream2 queries the store
private RequestComplete termsCompleteJoiner(PracLeftGroupInputRequest termsInputRequest, RequestComplete termsRequestComplete) {
if (termsRequestComplete != null && !StringUtils.isEmpty(termsRequestComplete.getStatus())
&& termsRequestComplete.getStatus().equalsIgnoreCase("Complete")) {
final String errorEvent =
interactiveQueryService.getQueryableStore("term-error-events", QueryableStoreTypes.<String, String>keyValueStore()).get(termsInputRequest.computeKafkaKey());
termsRequestComplete.setErrors(ErrorUtil.getTerminationErrorList(errorEvent));
if (termsRequestComplete.getErrors() != null && !termsRequestComplete.getErrors().isEmpty()) {
termsRequestComplete.setStatus(CompleteStatus.FAILURE.toString());
} else {
termsRequestComplete.setStatus(CompleteStatus.SUCCESS.toString());
}
return termsRequestComplete;
} else {
return null;
}
}
Spring Cloud Version: 2020.0.2
spring-cloud-stream-binder-kafka-streams Version: 3.1.2
Thanks.

How to move an error message to an Azure dead letter queue (Topics - Subscription) using Java?

I need to send my messages to the dead letter queue of an Azure topic subscription in case of any error while reading and processing a message from the topic. So I tried testing by pushing a message directly to the DLQ.
My sample code looks like this:
static void sendMessage()
{
// create a Service Bus Sender client for the queue
ServiceBusSenderClient senderClient = new ServiceBusClientBuilder()
.connectionString(connectionString)
.sender()
.topicName(topicName)
.buildClient();
// send one message to the topic
senderClient.sendMessage(new ServiceBusMessage("Hello, World!"));
}
static void receiveAsync() {
ServiceBusReceiverAsyncClient receiver = new ServiceBusClientBuilder()
.connectionString(connectionString)
.receiver()
.topicName(topicName)
.subscriptionName(subName)
.buildAsyncClient();
// receive() operation continuously fetches messages until the subscription is disposed.
// The stream is infinite, and completes when the subscription or receiver is closed.
Disposable subscription = receiver.receiveMessages().subscribe(message -> {
System.out.printf("Id: %s%n", message.getMessageId());
System.out.printf("Contents: %s%n", message.getBody().toString());
}, error -> {
System.err.println("Error occurred while receiving messages: " + error);
}, () -> {
System.out.println("Finished receiving messages.");
});
// Continue application processing. When you are finished receiving messages, dispose of the subscription.
subscription.dispose();
// When you are done using the receiver, dispose of it.
receiver.close();
}
I tried getting the dead letter queue path:
String dlq = EntityNameHelper.formatDeadLetterPath(topicName);
This gave me a dead letter queue path like "mytopic/$deadletterqueue".
But it does not work when I pass that path as the topic name; it throws an entity-not-found exception for the topic.
Can anyone please advise me on this?
References:
How to move error message to Azure dead letter queue using Java?
https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-dead-letter-queues#moving-messages-to-the-dlq
How to push the failure messages to Azure service bus Dead Letter Queue in Spring Boot Java?
https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-java-how-to-use-topics-subscriptions-legacy#receive-messages-from-a-subscription
You probably know that a message will be automatically moved to the dead letter queue if you throw exceptions during processing and the maximum delivery count is exceeded. If you want to explicitly move the message to the DLQ, you can do so as well. A common case for this is when you know the message can never succeed because of its contents.
You cannot send new messages directly to the DLQ, because then you would have two messages in the system. You need to call a special operation on the parent entity. Also, <topic path>/$deadletterqueue does not work, because that would be the DLQ of all subscriptions. The correct entity path is built like this:
<queue path>/$deadletterqueue
<topic path>/Subscriptions/<subscription path>/$deadletterqueue
https://github.com/Azure/azure-service-bus/blob/master/samples/Java/azure-servicebus/DeadletterQueue/src/main/java/com/microsoft/azure/servicebus/samples/deadletterqueue/DeadletterQueue.java
This sample code is for queues, but you should be able to adapt it to topics quite easily:
// register the RegisterMessageHandler callback
receiver.registerMessageHandler(
new IMessageHandler() {
// callback invoked when the message handler loop has obtained a message
public CompletableFuture<Void> onMessageAsync(IMessage message) {
// the received message is passed to the callback
if (message.getLabel() != null &&
message.getContentType() != null &&
message.getLabel().contentEquals("Scientist") &&
message.getContentType().contentEquals("application/json")) {
// ...
} else {
return receiver.deadLetterAsync(message.getLockToken());
}
return receiver.completeAsync(message.getLockToken());
}
// callback invoked when the message handler has an exception to report
public void notifyException(Throwable throwable, ExceptionPhase exceptionPhase) {
System.out.printf(exceptionPhase + "-" + throwable.getMessage());
}
},
// 1 concurrent call, messages are auto-completed, auto-renew duration
new MessageHandlerOptions(1, false, Duration.ofMinutes(1)),
executorService);
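If you are on the newer azure-messaging-servicebus SDK (the one used in the question), a hedged sketch of the same two operations, reusing the question's connectionString/topicName/subName variables: explicitly dead-letter a message you cannot process, and read the subscription's DLQ via the dead letter sub-queue instead of hand-building the entity path:
import com.azure.messaging.servicebus.ServiceBusClientBuilder;
import com.azure.messaging.servicebus.ServiceBusReceiverClient;
import com.azure.messaging.servicebus.models.SubQueue;

// explicitly dead-letter a received message (PEEK_LOCK is the default receive mode)
ServiceBusReceiverClient receiver = new ServiceBusClientBuilder()
    .connectionString(connectionString)
    .receiver()
    .topicName(topicName)
    .subscriptionName(subName)
    .buildClient();
receiver.receiveMessages(1).forEach(message -> receiver.deadLetter(message));

// read the subscription's DLQ by selecting the dead letter sub-queue
ServiceBusReceiverClient dlqReceiver = new ServiceBusClientBuilder()
    .connectionString(connectionString)
    .receiver()
    .topicName(topicName)
    .subscriptionName(subName)
    .subQueue(SubQueue.DEAD_LETTER_QUEUE)
    .buildClient();
dlqReceiver.receiveMessages(10).forEach(message ->
    System.out.println("DLQ message: " + message.getBody()));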

How to parallelize webservice requests with CompletableFuture?

I have a servlet request that basically requests data given by an input date. As I have multiple dates, I have to send multiple requests, and then aggregate the results. For example:
List<Result> results = new ArrayList<>();
for (LocalDate date : dates) {
ServletReq req = new ServletReq(date);
try {
ServletRsp rsp = webservice.send(req);
results.addAll(rsp.getResults());
} catch (SpecificException e) {
//just ignore this result and continue
}
}
Question: how can I parallelize the code above? That means: send multiple ServletReq asynchronously, collect the results into the list, wait for all requests to finish (maybe with a timeout), and ignore any SpecificException.
I started as follows, but I neither know whether this is the right direction nor have I succeeded in fully translating the code above, especially regarding the exception to be ignored.
ExecutorService service = Executors.newCachedThreadPool();
List<CompletableFuture<ServletRsp>> futures = new ArrayList<>();
for (LocalDate date : dates) {
ServletReq req = new ServletReq(date);
CompletableFuture<ServletRsp> future = CompletableFuture.supplyAsync(() -> webservice.send(req), service);
futures.add(future);
}
CompletableFuture.allOf(futures.toArray(new CompletableFuture[futures.size()])).join();
So far so good, but how can I call rsp.getResults() on the async results and put everything into the list? And how can I ignore the SpecificException during the async execution? (I cannot modify the webservice.send() method!)
Catch them within the supplier and return e.g. null. Only do that if you would really do nothing with the exception anyway. To get the results at future.get(), you have to deal with null and ExecutionExceptions.
E.g.:
CompletableFuture<ServletRsp> future = CompletableFuture.supplyAsync(() -> {
try {
return webservice.send(new ServletReq(date));
} catch (SpecificException e) {
return null;
}
});
Alternatively, rethrow them as a (custom?) RuntimeException so you don't lose them. Then you deal with just the exceptions at the end, but some are double-wrapped.
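A sketch of that variant; when you later call future.get(), the SpecificException arrives wrapped as ExecutionException -> RuntimeException -> SpecificException:
CompletableFuture<ServletRsp> future = CompletableFuture.supplyAsync(() -> {
    try {
        return webservice.send(new ServletReq(date));
    } catch (SpecificException e) {
        throw new RuntimeException(e); // rethrow so the failure isn't silently lost
    }
}, service);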
Manually complete the future.
E.g.
CompletableFuture<ServletRsp> future = new CompletableFuture<>();
service.execute(() -> {
try {
future.complete(webservice.send(new ServletReq(date)));
} catch (SpecificException e) {
future.completeExceptionally(e);
}
});
futures.add(future);
No more wrapping besides the ExecutionException. CompletableFuture.supplyAsync does roughly the same internally, but has no code path for checked exceptions.
Or just use the good old ExecutorService#submit(Callable<T> callable) method, which accepts code that throws (here via invokeAll):
e.g.
List<Callable<ServletRsp>> tasks = dates.stream()
.map(d -> (Callable<ServletRsp>) () -> webservice.send(new ServletReq(d)))
.collect(Collectors.toList());
List<Future<ServletRsp>> completed = service.invokeAll(tasks);
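To aggregate the results from invokeAll, you then unwrap each Future, skipping the ones that failed with the SpecificException (a sketch reusing the names from the question):
List<Result> results = new ArrayList<>();
for (Future<ServletRsp> f : completed) {
    try {
        results.addAll(f.get().getResults());
    } catch (ExecutionException e) {
        if (!(e.getCause() instanceof SpecificException)) {
            throw new RuntimeException(e.getCause()); // unexpected failure: propagate
        }
        // SpecificException: ignore this result and continue, as in the original loop
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the interrupt flag and stop
        break;
    }
}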
I think you're on a good path there.
The issue is that there is no mechanism to nicely collect the results, except doing it yourself:
ExecutorService service = Executors.newCachedThreadPool();
List<CompletableFuture<Void>> futures = new ArrayList<>(); // these are only references to tell you when the request finishes
Queue<ServletRsp> results = new ConcurrentLinkedQueue<>(); // this has to be thread-safe
for (LocalDate date : dates) {
ServletReq req = new ServletReq(date);
CompletableFuture<Void> future = CompletableFuture
.supplyAsync(() -> webservice.send(req), service)
.thenAcceptAsync(results::add);
futures.add(future);
}
CompletableFuture.allOf(futures.toArray(new CompletableFuture[futures.size()])).join();
// do stuff with results
I've tried to keep most of the code as you've written it. Maybe it's a bit cleaner with streams:
List<CompletableFuture<Void>> collect = dates.stream()
.map(date -> CompletableFuture
.supplyAsync(() -> webservice.send(new ServletReq(date)), service)
.thenAcceptAsync(results::add))
.collect(Collectors.toList());
// wait for all requests to finish
CompletableFuture.allOf(collect.toArray(new CompletableFuture[collect.size()])).thenAcceptAsync(ignored -> {
//you can also handle the response async.
});

What's a good, robust, ASYNC way to check if URLs exist from a Play controller?

I originally tried this:
private static boolean checkUrlsAreReachable(String... urls) {
checkArgument(urls.length > 0);
List<F.Promise<WS.HttpResponse>> promises = newArrayList();
for (String url : urls) {
promises.add(WS.url(url).followRedirects(true).timeout("30s").getAsync());
}
List<WS.HttpResponse> results = await(F.Promise.waitAll(promises));
for (WS.HttpResponse response : results) {
if (!response.success()) {
logger.debug("Failed accessing one of " + Joiner.on(", ").join(urls));
return false;
}
}
return true;
}
But I found several caveats:
I'm getting an exception on WS.url(url) if the URL in question does not resolve well (e.g. http://a.com/).
At least when debugging, it seems the call to getAsync() blocks ... is it really async in production? I know Play has fewer threads in Dev mode, but I thought the call wouldn't even start executing at this point.
If one of the URLs is not reachable, I'm not sure how to log which one failed (how do I access the URL from the WS.HttpResponse object?).
So I turned to sync HTTP instead of async. The following implementation seems to work:
private static boolean checkUrlsAreReachable(String... urls) {
checkArgument(urls.length > 0);
List<F.Promise<Boolean>> promises = newArrayList();
for (final String url : urls) {
promises.add(new Job<Boolean>(){
@Override
public Boolean doJobWithResult() throws Exception {
try {
WS.HttpResponse result = WS.url(url).followRedirects(true)
.timeout("30s").get();
return result.success();
} catch (Exception e) {
return false;
}
}
}.now());
}
F.Promise<List<Boolean>> allResults = F.Promise.waitAll(promises);
List<Boolean> booleans = await(allResults);
return Booleans3.and(booleans);
}
Is there a way to make the async implementation work?
Set the job pool size in application.conf:
# Jobs executor
# ~~~~~~
# Size of the Jobs pool
play.jobs.pool=20
# Execution pool
# ~~~~~
# Default to 1 thread in DEV mode or (nb processors + 1) threads in PROD mode.
# Try to keep as low as possible. 1 thread will serialize all requests (very useful for debugging purposes)
play.pool=5
Then just put the checking part in a job, such as CheckingJob, and start it using
new CheckingJob().now()
and it will be async. A sketch of such a job follows.
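This is a sketch only, reusing the Play 1.x Job API already shown in the question; the class name CheckingJob and its url field are illustrative:
public class CheckingJob extends Job<Boolean> {
    private final String url;

    public CheckingJob(String url) {
        this.url = url;
    }

    @Override
    public Boolean doJobWithResult() {
        try {
            // same check as in the synchronous version above
            return WS.url(url).followRedirects(true).timeout("30s").get().success();
        } catch (Exception e) {
            return false; // unresolvable or unreachable URLs count as failures
        }
    }
}
// usage: F.Promise<Boolean> promise = new CheckingJob(url).now();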
