I have a stream that comes in with:
Key: { "Symbol": "xxx" }
Value: { "Date": "2019-01-01", ... }
So I want to group by Symbol and then by Value.Date in 5-day blocks, i.e. 01-01 -> 01-05.
KStream<Key, Value> stream = kStreamBuilder.stream(...);
stream.groupBy((key, value) -> key.getSymbol())
So I've got the stream set up, and as the first step I group by Key.Symbol. I'm not really sure where to go from here. Any pointers would be appreciated.
You could use a custom timestamp extractor that returns the timestamp from the value, i.e., implement the TimestampExtractor interface and specify your class via the default.timestamp.extractor configuration parameter (cf. https://docs.confluent.io/current/streams/developer-guide/config-streams.html#default-timestamp-extractor).
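A minimal sketch of such an extractor, assuming the value class is the Value type from your question and exposes a getDate() accessor for the "Date" field (both names are assumptions):

import java.time.LocalDate;
import java.time.ZoneOffset;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

public class ValueDateTimestampExtractor implements TimestampExtractor {
    @Override
    public long extract(ConsumerRecord<Object, Object> record, long partitionTime) {
        // cast to the question's value type and parse its "Date" field ("2019-01-01")
        Value value = (Value) record.value();
        return LocalDate.parse(value.getDate())
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }
}

// registered via:
// props.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, ValueDateTimestampExtractor.class);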
This allows you to use tumbling time windows based on the extracted timestamp via:
groupBy(...).windowedBy(TimeWindows.of(Duration.ofDays(5))).aggregate(...)
See the docs for more details: https://docs.confluent.io/current/streams/developer-guide/dsl-api.html#tumbling-time-windows
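Putting it together, a rough sketch of the whole chain could look like this (the value serde is an assumption; also note that tumbling windows are aligned to the Unix epoch, not to calendar dates, so a window will not necessarily start exactly on 01-01):

KTable<Windowed<String>, Long> countsPerSymbol = stream
        .groupBy((key, value) -> key.getSymbol(),
                 Grouped.with(Serdes.String(), valueSerde))   // valueSerde is an assumption
        .windowedBy(TimeWindows.of(Duration.ofDays(5)))
        .count();                                             // or aggregate(...) as needed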
Related
I have a Kafka Streams application and Avro schemas for each of the topics, and also for the key. The key schema is the same for all topics.
Now, there is a KafkaStream (KStream) object with the known key object as the key and a value object (derived from the Avro schema) which extends org.apache.avro.specific.SpecificRecordBase, but it could be any of my Avro schemas for the topic content.
KStream<CustomKey, ? extends SpecificRecordBase> myStream = ...
What I want to achieve is to run min and max functions on this stream. The problem is that I don't know what the ? type is, and as there are 30+ topics (and the number will increase in the future), I don't want to write a switch-case. So I have the following:
public KStream<CustomKey, ? extends SpecificRecordBase> max(
final KStream<CustomKey, ? extends SpecificRecordBase> myStream,
final String attributeName) {
SpecificRecordBase maxValue = ...;
myStream.foreach((key, value) -> {
value.get(attributeName) // I want to find the max value for this attribute,
// but at this point we don't know its type,
// and we can't assign maxValue = value, because this is a lambda
// function.
});
// find and return the max value
}
My question is, how can I calculate the max value for the myStream on the attributeName attribute?
it could be any of my avro schemas for the topic content
Then all your record types would need to extend a common class that exposes those fields (something like ClassWithMinMaxFields). Otherwise, you will be unable to extract them from a generic SpecificRecordBase object.
Also, your method returns a stream. You cannot return the min/max from it. If that is your objective, you need a plain consumer that scans the whole topic, from the beginning to its (eventual) end.
To do this (correctly) with the Streams API, you would either
1.) need to build a KTable for every value, grouped by key, then do a table scan for the min/max when you need them, or
2.) create a new topic using the aggregate DSL function, initialized with {"min": +Inf, "max": -Inf}; for every new record you compare it against the old min/max, and if you have a new min and/or max you set them and return the updated record. You then still need an external consumer to fetch the most recent min/max events (a rough sketch of this option follows below).
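A minimal sketch of option 2.), assuming the stream is typed as KStream<CustomKey, SpecificRecordBase>, the attribute (the attributeName parameter from your method) is numeric, and MinMax, the serdes and the output topic are all placeholders you would have to supply:

// hypothetical aggregate holding the running min/max for one key
public class MinMax {
    public double min = Double.POSITIVE_INFINITY;
    public double max = Double.NEGATIVE_INFINITY;
}

// inside your topology-building method:
stream.groupByKey()
      .aggregate(
          MinMax::new,                                          // {"min": +Inf, "max": -Inf}
          (key, value, agg) -> {
              double v = ((Number) value.get(attributeName)).doubleValue();
              agg.min = Math.min(agg.min, v);
              agg.max = Math.max(agg.max, v);
              return agg;
          },
          Materialized.with(customKeySerde, minMaxSerde))       // placeholder serdes
      .toStream()
      .to("min-max-events", Produced.with(customKeySerde, minMaxSerde));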
If you had a consistent Avro type, you could use ksqlDB functions
I'm not able to get my head around SpEL for Message payloads. I want to extract data from certain fields of my message payload, which is essentially the following JSON converted to a Map<String, Object> and passed to a @Transformer:
{
"expand":"renderedFields,names,schema,transitions,operations,editmeta,changelog,versionedRepresentations",
"id":"14730",
"self":"https://jira.foo.com/rest/api/2/issue/14730",
"key":"SDP-145",
"fields":{
"issuetype":{
"self":"https://jira.foo.com/rest/api/2/issuetype/10200",
"id":"10200",
"description":"gh.issue.epic.desc",
"iconUrl":"https://jira.foo.com/ghanghor/viewkaka?size=xsmall&kakaId=10501&kakaType=issuetype",
"name":"Epic",
"subtask":false,
"kakaId":10501
},
"priority":{
"self":"https://jira.foo.com/rest/api/2/priority/3",
"iconUrl":"https://jira.foo.com/images/icons/priorities/major.svg",
"name":"Major",
"id":"3"
},
"labels":[
"Lizzy",
"kanban",
"rughani"
],
"updated":"2021-01-21T10:33:38.000+0000",
"status":{
"self":"https://jira.foo.com/rest/api/2/status/1",
"description":"The issue is open and ready for the assignee to start work on it.",
"iconUrl":"https://jira.foo.com/images/icons/statuses/open.png",
"name":"Open",
"id":"1",
"statusCategory":{
"self":"https://jira.foo.com/rest/api/2/statuscategory/2",
"id":2,
"key":"new",
"colorName":"blue-gray",
"name":"To Do"
}
},
"summary":"new epic for Tazzy",
"creator":{
"self":"https://jira.foo.com/rest/api/2/user?username=skadmin",
"name":"skadmin",
"key":"skadmin",
"emailAddress":"Lizzy.t#foo.com",
"displayName":"Lizzy Rughani",
"active":true,
"timeZone":"Asia/Kolkata"
},
"subtasks":[
]
}
}
I'm interested in three nested values here, which I'm trying to fetch via the following expressions:
issueDataMap = {LinkedHashMap#4867} size = 3
"name" -> "#payload['fields']['summary']"
"description" -> "#payload['description']"
"text3" -> "#payload['key']"
I get this error when the expression is applied
org.springframework.expression.spel.SpelEvaluationException: EL1012E: Cannot index into a null value
Here's how the payload arrives as an argument to my transformer:
@Transformer
public Map<String, Object> generateCardData(Map<String, Object> payload,
#Header("X-UPSTREAM-WEBHOOK-SOURCE") String projectId) {
followed by
StandardEvaluationContext evaluationContext = evaluationContextFactory.getObject();
and here's how I evaluate it
new SpelExpressionParser().parseExpression(issueDataMap.get(key)).getValue(
evaluationContext, payload, String.class)));
I have the app annotated with @SpringBootApplication and @EnableIntegration, and I autowire an instance of IntegrationEvaluationContextFactoryBean to get the StandardEvaluationContext.
I also tried the variant
issueDataMap = {LinkedHashMap#4867} size = 3
"name" -> "payload['fields']['summary']"
"description" -> "payload['description']"
"text3" -> "payload['key']"
but then I get
EL1008E: Property or field 'payload' cannot be found on object of type 'java.util.LinkedHashMap' - maybe not public or not valid?
First of all, it is not clear why one would use SpEL manually in the code when you have full access to the object. Plus, you should keep in mind that creating a StandardEvaluationContext, parsing an expression and evaluating it on every single call is quite a performance overhead. You probably just need to change your generateCardData() signature to accept the result of the expression instead of the whole map. See the @Payload expression attribute.
Anyway, this is not what you would like to hear for your problem. And it is here:
getValue(evaluationContext, payload, String.class). The root evaluation object is your payload - a Map. So, in your expression definitions you just need to assume that you have access to that root object. Therefore the expressions must look like this: fields.summary, description, key.
You typically see payload (or headers) as the first token of an expression in the docs and samples. That is just because Spring Integration uses the Message as the root object for the expressions it evaluates.
Now, in regards to performance: even if your logic selects an expression by some key at runtime (issueDataMap.get(key)), you could still parse each expression only once and reuse it.
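For example, a sketch under those assumptions (the field names come from the JSON above, and evaluationContext is the one obtained from the IntegrationEvaluationContextFactoryBean, which knows how to treat map keys as properties):

import org.springframework.expression.Expression;
import org.springframework.expression.spel.standard.SpelExpressionParser;

// parse once (e.g. as fields or in the constructor) and reuse
private final SpelExpressionParser parser = new SpelExpressionParser();
private final Expression summaryExpr = parser.parseExpression("fields.summary");
private final Expression keyExpr = parser.parseExpression("key");

// inside generateCardData(): the Map payload itself is the root object, so no 'payload' prefix
String summary = summaryExpr.getValue(evaluationContext, payload, String.class);
String issueKey = keyExpr.getValue(evaluationContext, payload, String.class);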
I am taking a JSON file as input for a class and parsing the values using Gson through the respective data classes.
I want to call a function that takes a String value as an argument.
The allowed string value is determined by the values parsed from the JSON file. Can I somehow check the string value passed to the function at compile time and give an error at compile time?
Or can I allow only certain values for the function argument, based on the values from the JSON?
Detailed Explanation of use case:
I am building an SDK in which the person using the SDK inputs a JSON string. The JSON is standardised and is parsed in my code.
{
"name": "Test",
"objects": [
{
"name": "object1",
"type": "object1"
}
]
}
Here the name and other values may vary based on the input from the developer using it, but the keys remain the same. We need to call a function using the value of the name parameter inside objects.
fun testMethod(objectName:String)
So the developer calls the test method as testMethod(object1).
I need to validate the object1 parameter based on the JSON, but is there any way to restrict the testMethod parameter to object1 only and give a compile-time error if the developer calls testMethod(obj1)?
Right now I parse the JSON and have checks inside testMethod().
Sure, it's possible, but in a somewhat different way than you described. First of all, as you already mentioned, the runtime behaviour is easy to get: for that purpose we have Objects.requireNonNull() or Guava's Preconditions. In the same way you can define your own check, but it will only work at runtime.
To do it at compile time, you need to create an annotation processor, the same way other libraries do. One of them is Lombok, with its NotNull and Nullable. The Android annotations just provide a marker and bounds for IDE warnings, but in Lombok's case a NotNull check is woven in at compile time and throws an exception for every annotated usage.
It's not an easy way, but it's what you are looking for.
No, it's impossible to check this at compile time. It's string handling, just like a numeric calculation; it only happens at runtime.
In my app, I convert a string to JSON and JSON to a string, passing the class descriptor. My aim is to record the JSON string in a text file to load into an SQLite database. I've run this code on my desktop computer, not on Android.
data class CalcDescr (
...
)
val calc = CalcDescr(...)
// toJson: internal Kotlin data to JSON
val content = Gson().toJson(calc)
//==================
// Testing validity
// ================
// fromJson: JSON to internal Kotlin data.
// It needs to be passed the class descriptor. Uses a *Java* class token, but it's *Kotlin*
var testModel = Gson().fromJson(content, CalcDescr::class.java)
// toJson: internal Kotlin data to JSON again
var contentAgain = Gson().toJson(testModel)
// should be equal!
if (content == contentAgain) println("***ok***")
Finally, I write the variable content to a file.
My Kafka Streams application is consuming from a Kafka topic that is using the following key-value layout:
String.class -> HistoryEvent.class
When printing my current topic this can be confirmed:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic flow-event-stream-file-service-test-instance --property print.key=true --property key.separator=" -- " --from-beginning
flow1 -- SUCCESS #C:\Daten\file-service\in\crypto.p12
"flow1" is the String key and the part after -- is the serialized value.
My flow is set up like this:
KStream<String, HistoryEvent> eventStream = builder.stream(applicationTopicName, Consumed.with(Serdes.String(),
historyEventSerde));
eventStream.selectKey((key, value) -> new HistoryEventKey(key, value.getIdentifier()))
.groupByKey()
.reduce((e1, e2) -> e2,
Materialized.<HistoryEventKey, HistoryEvent, KeyValueStore<Bytes, byte[]>>as(streamByKeyStoreName)
.withKeySerde(new HistoryEventKeySerde()));
So, as far as I know, I am telling it to consume the topic using the String and HistoryEvent serdes, as this is what is in the topic. I then 'rekey' it to use a combined key, which should be stored locally using the provided serde for HistoryEventKey.class. As far as I understand, this will cause an additional topic to be created (it can be seen with the topic list in the Kafka container) with the new key. This is fine.
Now the problem is that the application is unable to start up, even from a clean environment with just that one document in the topic:
org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_0, processor=KSTREAM-SOURCE-0000000000, topic=flow-event-stream-file-service-test-instance, partition=0, offset=0
Caused by: org.apache.kafka.streams.errors.StreamsException: A serializer (key: org.apache.kafka.common.serialization.StringSerializer / value: HistoryEventSerializer) is not compatible to the actual key or value type (key type: HistoryEventKey / value type: HistoryEvent). Change the default Serdes in StreamConfig or provide correct Serdes via method parameters.
It is kind of hard to tell from the message where exactly the issue is. It says my base topic, but that is not possible, as the key there is not of type HistoryEventKey. Since I have provided a serde for HistoryEventKey in the reduce, it also cannot be the local store.
The only thing that makes sense to me is that it is related to the selectKey operation, which causes a repartitioning and a new topic. However, I am not able to figure out how I can provide the serde to that operation. I do not want to set it as a default, because it is not the default key serde.
After doing some more debugging of the execution, I was able to figure out that the new topic is created in the groupByKey step. You can provide a Grouped instance that offers the possibility to specify the serdes used for the key and value:
eventStream.selectKey((key, value) -> new HistoryEventKey(key, value.getIdentifier()))
.groupByKey(Grouped.<HistoryEventKey, HistoryEvent>as(null)
.withKeySerde(new HistoryEventKeySerde())
.withValueSerde(new HistoryEventSerde())
)
.reduce((e1, e2) -> e2,
Materialized.<HistoryEventKey, HistoryEvent, KeyValueStore<Bytes, byte[]>>as(streamByKeyStoreName)
.withKeySerde(new HistoryEventKeySerde()));
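Side note: since the example above passes null as the name anyway, the shorter Grouped.with(new HistoryEventKeySerde(), new HistoryEventSerde()) factory should be an equivalent way to pass both serdes without naming the repartition topic.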
I've encountered a very similar error message, yet I had no groupBys but joins instead. I'm posting here for the next person who googles around.
org.apache.kafka.streams.errors.StreamsException: ClassCastException while producing data to topic my-processor-KSTREAM-MAP-0000000023-repartition. A serializer (key: org.apache.kafka.common.serialization.StringSerializer / value: org.apache.kafka.common.serialization.StringSerializer) is not compatible to the actual key or value type (key type: java.lang.String / value type: com.mycorp.mySession). Change the default Serdes in StreamConfig or provide correct Serdes via method parameters (for example if using the DSL, `#to(String topic, Produced<K, V> produced)` with `Produced.keySerde(WindowedSerdes.timeWindowedSerdeFrom(String.class))`).
Clearly, same as in the original question, I did not want to change the default serdes.
So in my case the solution was to pass a Joined instance in the join, which allows you to pass in the serdes. Note that the error message points to a repartition-MAP-... topic, which is a bit of a red herring, because the fix goes somewhere else.
How I fixed it (a Joined example):
//...omitted ...
KStream<String,MySession> mySessions = myStream
.map((k,v) ->{
MySession s = new MySession(v);
k = s.makeKey();
return new KeyValue<>(k, s);
});
// ^ the mapping causes the repartition; you cannot, however, specify a serde in there.
// but in the join right below, we can pass a Joined instance and fix it.
return enrichedSessions
.leftJoin(
myTable,
(session, info) -> {
session.infos = info;
return session; },
Joined.with(Serdes.String(),        // key serde
            new MySessionSerde(),   // value serde
            null)                   // table-side value serde (null falls back to the default)
);
I have a simple workflow:
[start_workflow] -> [user_task] ->
-> [exclusive_gateway] -> (two routes see below) -> [end_workflow]
The [exclusive_gateway] has two outgoing routes:
1.) ${if user_task output parameter == null} -> [NULL_service_task] -> [end_workflow]
2.) ${if user_task output parameter != null} -> [NOT_null_service_task] -> [end_workflow]
In Camunda Modeler, I've added an output parameter (named out) to the [user_task].
Q:
How do I set that output parameter through the Java API before completing the task via:
taskService.complete(taskId);
On the [exclusive_gateway] arrows, I've set this:
Condition type = expression
Expression = ${out != null}
But there's more:
If I delete the output parameter of the [user_task] and set a runtimeService variable before completing the task:
runtimeService.setVariable(processInstanceId, "out", name);
The [exclusive_gateway] does handle the parameter, and routes the flow as expected.
Without deleting the output parameter of the [user_task] it seems like:
1. it is never set (so == null)
2. this null value overwrites the value set by
runtimeService.setVariable(processInstanceId, "out", name);
So can I set a task's output parameter via the Java API, or can I only use process variables?
I guess you are looking for
taskService.complete(<taskId>, Variables.putValue("out", <name>));
The communication between the task and the gateway (forwarding of the value) happens by setting the process variable "out" on complete.
For more info, check the javadoc.
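A small concrete sketch of that call (taskId and name stand for whatever your code already has):

import org.camunda.bpm.engine.variable.Variables;

// complete the user task and hand "out" to the process in one call;
// the gateway condition ${out != null} can then evaluate it
taskService.complete(taskId, Variables.putValue("out", name));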