I have 2 Kafka streams, and one of them depends on the other. For example:
Stream1 -> Aggregates events and creates a state store.
Stream2 -> Uses the interactive query API to access the state store from Stream1.
Problem: When Stream1 takes longer to initialize and Stream2 tries to access the state store, I get the error "org.apache.kafka.streams.errors.InvalidStateStoreException: Cannot get state store term-error-events because the stream thread is PARTITIONS_ASSIGNED, not RUNNING".
Solution tried so far: I have added the retry properties below for the state store, but they did not resolve the issue.
spring.cloud.stream.kafka.streams.binder.state-store-retry.max-attempts=10
spring.cloud.stream.kafka.streams.binder.state-store-retry.backoff-period=10000
I want to find out whether I should change my approach, whether there is a way to wait for Stream1 to be ready before starting Stream2, or whether there is a better way to do this.
My Code:
// Stream1
@Bean
public Consumer<KStream<String, Fallout>> aggregateErrors() {
    return input -> input
            .filter((s, fallout) -> s != null && fallout.getAction() != null
                    && fallout.getAction().equals(Action.ERROR))
            .map((s, fallout) -> {
                if (StringUtils.isEmpty(fallout.getMessage()) || StringUtils.isEmpty(fallout.getFalloutReasonCode())) {
                    fallout.setFalloutReasonCode("unknown");
                    fallout.setMessage("unknown");
                }
                return new KeyValue<>(s, fallout);
            })
            .groupByKey()
            .aggregate(String::new,
                    (s, fallout, error) -> !StringUtils.isAllEmpty(error)
                            ? error.concat("|").concat(fallout.getMessage())
                            : error.concat(fallout.getMessage()),
                    Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("term-error-events")
                            .withKeySerde(Serdes.String())
                            .withValueSerde(Serdes.String()));
}
// Stream 2
@Bean
public BiFunction<KStream<String, PracLeftGroupInputRequest>, KStream<String, RequestComplete>, KStream<String, RequestComplete>> joinTermsComplete() {
    return (termsInputRequest, termsRequestComplete) -> termsInputRequest
            .join(termsRequestComplete,
                    this::termsCompleteJoiner,
                    JoinWindows.of(Duration.ofMinutes(JOIN_WINDOW_MINUTES)),
                    StreamJoined.with(Serdes.String(), new TermInputRequestSerde(), new TermRequestCompleteSerde()));
}
// Where Stream2 queries the store
private RequestComplete termsCompleteJoiner(PracLeftGroupInputRequest termsInputRequest, RequestComplete termsRequestComplete) {
    if (termsRequestComplete != null && !StringUtils.isEmpty(termsRequestComplete.getStatus())
            && termsRequestComplete.getStatus().equalsIgnoreCase("Complete")) {
        final String errorEvent = interactiveQueryService
                .getQueryableStore("term-error-events", QueryableStoreTypes.<String, String>keyValueStore())
                .get(termsInputRequest.computeKafkaKey());
        termsRequestComplete.setErrors(ErrorUtil.getTerminationErrorList(errorEvent));
        if (termsRequestComplete.getErrors() != null && !termsRequestComplete.getErrors().isEmpty()) {
            termsRequestComplete.setStatus(CompleteStatus.FAILURE.toString());
        } else {
            termsRequestComplete.setStatus(CompleteStatus.SUCCESS.toString());
        }
        return termsRequestComplete;
    } else {
        return null;
    }
}
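For illustration only, one way to tolerate the store not being ready yet is to retry the lookup until the stream threads reach RUNNING. The sketch below is an assumption on my part (the helper name, attempt count, and sleep interval are made up; it is not the binder's built-in retry):

// Hypothetical helper: retry the store lookup until the Streams threads are
// RUNNING and the store becomes queryable, instead of failing on the first attempt.
private ReadOnlyKeyValueStore<String, String> waitForStore(String storeName) {
    for (int attempt = 0; attempt < 10; attempt++) {
        try {
            return interactiveQueryService.getQueryableStore(
                    storeName, QueryableStoreTypes.<String, String>keyValueStore());
        } catch (Exception e) {
            // typically InvalidStateStoreException while the thread is still PARTITIONS_ASSIGNED
            try {
                Thread.sleep(1000L);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(ie);
            }
        }
    }
    throw new IllegalStateException("State store " + storeName + " is not available");
}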
Spring Cloud Version: 2020.0.2
spring-cloud-stream-binder-kafka-streams Version: 3.1.2
Thanks.
Related
I am writing a little Kafka metrics exporter (yes, there are plenty available, like the Prometheus exporters, but I want a lightweight custom one; kindly excuse me on this).
As part of this I would like to know as soon as the first message is received in a Kafka topic (or whether the topic has messages). I am using Spring Boot and Kafka.
I have the code below, which gives the name of the topic and the number of partitions. I want to know whether the topic has messages. Kindly let me know how I can get this stat. Any lead is much appreciated!
@ReadOperation
public List<TopicManifest> kafkaTopic() throws ExecutionException, InterruptedException {
    ListTopicsOptions listTopicsOptions = new ListTopicsOptions();
    listTopicsOptions.listInternal(true);
    ListTopicsResult listTopicsResult = adminClient.listTopics(listTopicsOptions);
    Set<String> topics = listTopicsResult.names().get().stream()
            .filter(topic -> !topic.startsWith("_"))
            .collect(Collectors.toSet());
    System.out.println(topics);
    DescribeTopicsResult describeTopicsResult = adminClient.describeTopics(topics);
    Map<String, KafkaFuture<TopicDescription>> topicNameValues = describeTopicsResult.topicNameValues();
    List<TopicManifest> topicManifests = topicNameValues.entrySet().stream().map(entry -> {
        try {
            TopicDescription topicDescription = entry.getValue().get();
            return TopicManifest.builder()
                    .name(entry.getKey())
                    .noOfPartitions(topicDescription.partitions().size())
                    .build();
        } catch (InterruptedException | ExecutionException e) {
            e.printStackTrace();
        }
        return null;
    }).collect(Collectors.toList());
    return topicManifests;
}
Create a KafkaConsumer and call endOffsets (the consumer does not need to be subscribed to the topic(s)).
@Bean
ApplicationRunner runner1(ConsumerFactory cf) {
    return args -> {
        try (Consumer consumer = cf.createConsumer()) {
            System.out.println(consumer.endOffsets(List.of(
                    new TopicPartition("ktest29", 0),
                    new TopicPartition("ktest29", 1),
                    new TopicPartition("ktest29", 2))));
        }
    };
}
Offsets stored in the topic never decrease. Getting the end offset doesn't guarantee you have a non-empty topic (the start and end offsets for the topic partitions could be the same).
Instead, you will still create a consumer but set
auto.offset.reset=earliest
group.id=UUID.randomUUID()
Then subscribe and run
ConsumerRecords records = consumer.poll(Duration.ofSeconds(2));
boolean empty = records.count() == 0;
By setting auto.offset.reset=earliest with a random group, you are guaranteed to start at the earliest offset and seek to the first available record, if one exists; at that point you can poll for any number of records and see whether any are returned within the specified timeout.
This should work for regular and compacted topics without needing to check committed offsets.
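Putting that together, a minimal sketch of such an emptiness check with a throwaway consumer (the bootstrap servers, topic name, and deserializers are placeholders, not from the original post):

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.UUID;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

// Throwaway consumer: random group id + earliest reset, used only to check
// whether the topic currently has at least one readable record.
static boolean topicHasMessages(String bootstrapServers, String topic) {
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString());
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
        consumer.subscribe(List.of(topic));
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(2));
        return records.count() > 0; // an empty poll within the timeout means the topic looks empty
    }
}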
I am trying to download a file (or multiple files) based on the result of a previous web request. After downloading the file, I need to send the previous Mono result (dossier and obj) and the file to another system. So far I have been working with flatMaps and Monos, but when reading large files I cannot use a Mono for the file download, because the buffer is too small.
Simplified, the code looks something like this:
var filePath = Paths.get("test.pdf");
this.dmsService.search()
        .flatMap(result -> {
            var dossier = result.getObjects().get(0).getProperties();
            var objectId = dossier.getReferencedObjectId();
            return Mono.zip(this.dmsService.getById(objectId), Mono.just(dossier));
        })
        .flatMap(tuple -> {
            var obj = tuple.getT1();
            var dossier = tuple.getT2();
            var objectId = dossier.getReferencedObjectId();
            var media = this.dmsService.getDocument(objectId);
            var writeMono = DataBufferUtils.write(media, filePath);
            return Mono.zip(Mono.just(obj), Mono.just(dossier), writeMono);
        })
        .flatMap(tuple -> {
            var obj = tuple.getT1();
            var dossier = tuple.getT2();
            var objectId = dossier.getReferencedObjectId();
            var zip = zipService.createZip(objectId, obj, dossier);
            return zipService.uploadZip(Flux.just(zip));
        })
        .flatMap(newWorkItemId -> updateMetadata(newWorkItemId))
        .subscribe(result -> finishItem());
dmsService.search(), this.dmsService.getById(objectId), and zipService.uploadZip() all return a Mono of a specific type.
dmsService.getDocument(objectId) returns a Flux, due to support for large files. With a DataBuffer Mono it worked for small files if I simply used Files.copy:
...
var contentMono = this.dmsService.getDocument(objectId);
return contentMono;
})
.flatMap(content -> {
Files.copy(content.asInputStream(), Path.of("test.pdf"));
...
}
I have tried different approaches but always ran into problems.
Based on https://www.amitph.com/spring-webclient-large-file-download/#Downloading_a_Large_File_with_WebClient
DataBufferUtils.write(dataBuffer, destination).share().block();
When I try this, nothing after .block() is ever executed and no download is made.
Without the .share() I get an exception saying that I may not use block:
java.lang.IllegalStateException: block()/blockFirst()/blockLast() are blocking, which is not supported in thread reactor-http-nio-5
Since DataBufferUtils.write returns a Mono, my next assumption was that, instead of calling block, I could Mono.zip() this together with my other values, but this never returns either.
var media = this.dmsService.getDocument(objectId);
var writeMono = DataBufferUtils.write(media, filePath);
return Mono.zip(Mono.just(obj), Mono.just(dossier), writeMono);
Any inputs on how to achieve this are greatly appreciated.
I finally figured out that if I use a WritableByteChannel, so that DataBufferUtils.write returns a Flux<DataBuffer> instead of a Mono<Void>, I can map the return value through DataBufferUtils::release to release the buffers, which seems to do the trick. I found the inspiration for this solution here: DataBuffer doesn't write to file
var media = this.dmsService.getDocument(objectId);
var file = Files.createTempFile(objectId, ".tmp");
WritableByteChannel filechannel = Files.newByteChannel(file, StandardOpenOption.WRITE);
var writeMono = DataBufferUtils.write(media, filechannel)
.map(DataBufferUtils::release)
.then(Mono.just(file));
return Mono.zip(Mono.just(obj), Mono.just(dossier), writeMono);
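One detail the snippet above leaves out (my own addition, so treat it as an assumption rather than part of the linked solution): the channel should eventually be closed once the write pipeline terminates, for example with a doFinally:

// Same write pipeline as above, additionally closing the channel when the
// sequence terminates (success, error, or cancellation).
var writeMono = DataBufferUtils.write(media, filechannel)
        .map(DataBufferUtils::release)
        .then(Mono.just(file))
        .doFinally(signalType -> {
            try {
                filechannel.close();
            } catch (IOException e) {
                // the data has already been written (or the write failed); just log it
                e.printStackTrace();
            }
        });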
I have a requirement where I am connecting one microservice to another via a Vert.x client. In the code, if the other microservice is down, the failure handler creates a JsonObject with solrError as the key and the failure message as the value. If there is a solrError (i.e. the other microservice, which calls Solr via load balancing, is down), an error response should be returned. But the Vert.x client takes some time to detect the failure, so by the time the condition is checked there is no solrError in the JsonObject yet; the condition fails and resp is null. To avoid this, what can be done so that the Vert.x client failure is known before the solrError check runs and an Internal Server Error response is returned?
Below is the code:
solrQueryService.executeQuery(query).subscribe().with(jsonObject -> {
    ObjectMapper objMapper = new ObjectMapper();
    SolrOutput solrOutput = new SolrOutput();
    List<Doc> docs = new ArrayList<>();
    try {
        if (null != jsonObject.getMap().get("solrError")) {
            resp = Response.status(Response.Status.INTERNAL_SERVER_ERROR)
                    .entity(new BaseException(
                            exceptionService.processSolrDownError(request.header.referenceId))
                            .getResponse())
                    .build();
        }
        solrOutput = objMapper.readValue(jsonObject.toString(), SolrOutput.class);
        if (null != solrOutput.getResponse()
                && CollectionUtils.isNotEmpty(solrOutput.getResponse().getDocs())) {
            docs.addAll(solrOutput.getResponse().getDocs());
            uniDocList = Uni.createFrom().item(docs);
        }
    } catch (JsonProcessingException e) {
        e.printStackTrace();
    }
});

if (null != resp && resp.getStatus() != 200) {
    return resp;
}
SolrQueryService prepares the query and sends the URL and query to the Vert.x web client as below:
public Uni<JsonObject> search(URL url, SolrQuery query, Integer timeout) {
    int port = url.getPort();
    if (port == -1 && "https".equals(url.getProtocol())) {
        port = 443;
    }
    if (port == -1 && "http".equals(url.getProtocol())) {
        port = 80;
    }
    HttpRequest<Buffer> request = client.post(port, url.getHost(), url.getPath()).timeout(timeout);
    return request.sendJson(query)
            .map(resp -> resp.bodyAsJsonObject())
            .onFailure().recoverWithUni(f ->
                    Uni.createFrom().item(new JsonObject().put("solrError", f.getMessage())));
}
I have not used the Vert.x client, but I assume it is reactive and non-blocking. Assuming this is the case, your code is mixing imperative and reactive constructs. The subscribe in the first line is reactive, and the lambda you provide will be called when the server responds to the client request. However, after the subscribe you have imperative code which runs before the lambda even has a chance to be called, so your checks and your access to the resp object will never reflect what happened in the lambda itself.
You need to move all of that code into the lambda, or at least chain the subsequent code onto the result of the subscription.
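To make that concrete, here is a rough sketch, not a definitive implementation: the method name is made up, and the surrounding fields (request, exceptionService, solrQueryService) are assumed from the question. It builds the Response inside the reactive pipeline instead of subscribing and checking resp afterwards:

// Return a Uni<Response> and let the pipeline decide the outcome,
// instead of subscribing and inspecting a shared field later.
public Uni<Response> querySolr(SolrQuery query) {
    return solrQueryService.executeQuery(query)
            .onItem().transform(jsonObject -> {
                // search() puts "solrError" into the JSON when the downstream call failed
                if (jsonObject.getMap().get("solrError") != null) {
                    return Response.status(Response.Status.INTERNAL_SERVER_ERROR)
                            .entity(new BaseException(
                                    exceptionService.processSolrDownError(request.header.referenceId))
                                    .getResponse())
                            .build();
                }
                try {
                    SolrOutput solrOutput = new ObjectMapper()
                            .readValue(jsonObject.toString(), SolrOutput.class);
                    return Response.ok(solrOutput.getResponse().getDocs()).build();
                } catch (JsonProcessingException e) {
                    return Response.serverError().build();
                }
            });
}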
I am trying to work with Kafka Streams and I have created the following Topology:
KStream<String, HistoryEvent> eventStream = builder.stream(applicationTopicName,
        Consumed.with(Serdes.String(), historyEventSerde));

eventStream.selectKey((key, value) -> new HistoryEventKey(key, value.getIdentifier()))
        .groupByKey()
        .reduce((e1, e2) -> e2, Materialized.as(streamByKeyStoreName));
I later start the streams like this:
private void startKafkaStreams(KafkaStreams streams) {
    CompletableFuture<KafkaStreams.State> stateFuture = new CompletableFuture<>();
    streams.setStateListener((newState, oldState) -> {
        if (stateFuture.isDone()) {
            return;
        }
        if (newState == KafkaStreams.State.RUNNING || newState == KafkaStreams.State.ERROR) {
            stateFuture.complete(newState);
        }
    });
    streams.start();
    try {
        KafkaStreams.State finalState = stateFuture.get();
        if (finalState != KafkaStreams.State.RUNNING) {
            // ...
        }
    } catch (InterruptedException ex) {
        // ...
    } catch (ExecutionException ex) {
        // ...
    }
}
My streams start without an error and eventually reach the RUNNING state, at which point the future completes. Later I try to access the store that I created in my topology for the KTable:
public KafkaFlowHistory createFlowHistory(String flowId) {
    ReadOnlyKeyValueStore<HistoryEventKey, HistoryEvent> store =
            streams.store(streamByKeyStoreName, QueryableStoreTypes.keyValueStore());
    return new KafkaFlowHistory(flowId, store,
            event -> topicProducer.send(new ProducerRecord<>(applicationTopicName, flowId, event)));
}
I have verified that createFlowHistory is called after the initialization future has completed in the RUNNING state; however, I am consistently unable to do this, and KafkaStreams reports the following error:
Exception in thread "main" org.apache.kafka.streams.errors.InvalidStateStoreException: Cannot get state store flow-event-stream-file-service-test-instance-by-key because the stream thread is PARTITIONS_ASSIGNED, not RUNNING
Apparently the state of the thread has changed. Do I need to take care of this manually when trying to query a store, and wait for the internal Kafka Streams thread to get into the right state?
Older Versions (before 2.2.0)
On startup, Kafka Streams does the following state transitions:
CREATED -> RUNNING -> REBALANCING -> RUNNING
You need to wait for the second RUNNING state before you can query.
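A minimal sketch of waiting for that second RUNNING state (the helper name is mine, not from the original answer; the same check also works on newer versions, where RUNNING simply follows the first REBALANCING):

// Register this listener before calling streams.start(). The future completes on a
// REBALANCING -> RUNNING transition, i.e. the point at which stores become queryable,
// or on ERROR.
static CompletableFuture<KafkaStreams.State> whenQueryable(KafkaStreams streams) {
    CompletableFuture<KafkaStreams.State> ready = new CompletableFuture<>();
    streams.setStateListener((newState, oldState) -> {
        if (newState == KafkaStreams.State.ERROR
                || (oldState == KafkaStreams.State.REBALANCING
                        && newState == KafkaStreams.State.RUNNING)) {
            ready.complete(newState);
        }
    });
    return ready;
}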
New Version: as of 2.2.0
The state transition behavior on startup was changed (via https://issues.apache.org/jira/browse/KAFKA-7657) to:
CREATED -> REBALANCING -> RUNNING
Hence, you should not hit this issue any longer.
I am developing a REST API using Spring WebFlux, but I have problems when uploading files. They are stored, but I don't get the expected return value.
This is what I do:
Receive a Flux<Part>.
Cast Part to FilePart.
Save the parts with transferTo() (this returns a Mono<Void>).
Map the Mono<Void> to Mono<String>, using the file name.
Return Flux<String> to the client.
I expect the file name to be returned, but the client gets an empty string.
Controller code
@PostMapping(value = "/muscles/{id}/image")
public Flux<String> updateImage(@PathVariable("id") String id, @RequestBody Flux<Part> file) {
    log.info("REST request to update image to Muscle");
    return storageService.saveFiles(file);
}
StorageService
public Flux<String> saveFiles(Flux<Part> parts) {
    log.info("StorageService.saveFiles({})", parts);
    return parts
            .filter(p -> p instanceof FilePart)
            .cast(FilePart.class)
            .flatMap(file -> saveFile(file));
}
private Mono<String> saveFile(FilePart filePart) {
    log.info("StorageService.saveFile({})", filePart);
    String filename = DigestUtils.sha256Hex(filePart.filename() + new Date());
    Path target = rootLocation.resolve(filename);
    try {
        Files.deleteIfExists(target);
        File file = Files.createFile(target).toFile();
        return filePart.transferTo(file)
                .map(r -> filename);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
FilePart.transferTo() returns Mono<Void>, which signals when the operation is done - this means the reactive Publisher will only publish an onComplete/onError signal and will never publish a value before that.
This means that the map operation was never executed, because it's only given elements published by the source.
You can return the name of the file and still chain reactive operators, like this:
return part.transferTo(file).thenReturn(part.filename());
It is forbidden to use the block operator within a reactive pipeline and it even throws an exception at runtime as of Reactor 3.2.
Using subscribe as an alternative is not good either, because subscribe will decouple the transferring process from your request processing, making those happen in different execution sequences. This means that your server could be done processing the request and close the HTTP connection while the other part is still trying to read the file part to copy it on disk. This is likely to fail in subtle ways at runtime.
FilePart.transferTo() returns a Mono<Void>, which is always empty, so the map after it was never executed. I solved it by doing this:
private Mono<String> saveFile(FilePart filePart) {
    log.info("StorageService.saveFile({})", filePart);
    String filename = DigestUtils.sha256Hex(filePart.filename() + new Date());
    Path target = rootLocation.resolve(filename);
    try {
        Files.deleteIfExists(target);
        File file = Files.createFile(target).toFile();
        return filePart
                .transferTo(file)
                .doOnSuccess(data -> log.info("do something..."))
                .thenReturn(filename);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}