Processing multiple patterns in parallel with Flink CEP on one data stream - Java

I have the following use case.
There is one machine that sends event streams to Kafka. These are received by a CEP engine, which generates warnings when conditions on the stream data are satisfied.
FlinkKafkaConsumer011<Event> kafkaSource = new FlinkKafkaConsumer011<Event>(kafkaInputTopic, new EventDeserializationSchema(), properties);
DataStream<Event> eventStream = env.addSource(kafkaSource);
Event POJO contains id, name, time, ip.
The machine sends a large volume of data to Kafka. There are 35 unique event names (name1, name2, ..., name35), and I want to detect a pattern for each combination of event names (name1 co-occurring with name2, name1 co-occurring with name3, etc.). That gives 1225 combinations in total.
Rule POJO contains e1Name and e2Name.
List<Rule> ruleList -> It contains 1225 rules.
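For scale: 1225 is 35 × 35, so the rule list appears to cover every ordered (e1Name, e2Name) pair, including a name paired with itself. A minimal sketch of generating such a list, assuming a Rule that holds only the two names (written as a record here, so its accessors are e1Name() and e2Name() rather than getE1Name() and getE2Name()). With the question's loop, each of these rules becomes its own CEP.pattern(...) call over the full stream.

```java
import java.util.ArrayList;
import java.util.List;

public class RuleGenerator {
    // Assumed minimal Rule holding only the two event names
    record Rule(String e1Name, String e2Name) {}

    // Builds every ordered (e1Name, e2Name) pair, including a name paired with itself
    static List<Rule> allRules(List<String> names) {
        List<Rule> rules = new ArrayList<>();
        for (String first : names) {
            for (String second : names) {
                rules.add(new Rule(first, second));
            }
        }
        return rules;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= 35; i++) {
            names.add("name" + i);
        }
        System.out.println(allRules(names).size()); // 1225
    }
}
```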
for (Rule rule : ruleList) {
    Pattern<Event, ?> warningPattern = Pattern.<Event>begin("start").where(new SimpleCondition<Event>() {
        @Override
        public boolean filter(Event value) throws Exception {
            // first event of the pair
            return value.getName().equals(rule.getE1Name());
        }
    }).followedBy("next").where(new SimpleCondition<Event>() {
        @Override
        public boolean filter(Event value) throws Exception {
            // second event of the pair, matched within 30 seconds of the first
            return value.getName().equals(rule.getE2Name());
        }
    }).within(Time.seconds(30));
    // one separate CEP operator per rule, each reading the whole stream
    PatternStream<Event> patternStream = CEP.pattern(eventStream, warningPattern);
}
Is this the correct way to execute multiple patterns on one data stream, or is there a more optimized way to achieve this? With the approach above, we get PartitionNotFoundException, UnknownTaskExecutorException and memory issues.

IMO you don't need CEP patterns to achieve your goal. You can apply a stateful map function to the source that maps the event names into pairs (the latest two names). After that, window the result into 30-second windows and count each pair, as in the simple WordCount example.
The stateful map function can be something like this (it accepts only the event name, so you need to adapt it to your input, e.g. extract the event name from Event):
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class TupleMap implements FlatMapFunction<String, Tuple2<String, Integer>> {
    // Previous event name seen by this parallel instance (a plain field, so it is not checkpointed)
    private String previousName;

    @Override
    public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
        if (previousName != null) {
            // Emit "previous->current" with a count of 1; the very first event has no predecessor
            out.collect(new Tuple2<>(previousName + "->" + value, 1));
        }
        previousName = value;
    }
}
The result, event-name pairs with their occurrence counts as tuples, can then be obtained like this (and written to a Kafka sink, for example):
DataStream<Tuple2<String, Integer>> source = stream.flatMap(new TupleMap());
SingleOutputStreamOperator<Tuple2<String, Integer>> sum = source.keyBy(0).timeWindow(Time.seconds(30)).sum(1);
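To see what the stateful map plus 30-second window computes, here is a plain-Java model of it that runs without Flink. It is only an illustration under stated assumptions: the hypothetical Event record carries just a name and an event-time timestamp in milliseconds, events are assumed to arrive in time order, and each pair is assigned to the tumbling window of its second event. Like the map function, it only pairs consecutive events, so it counts "name1 directly followed by name2" rather than every co-occurrence within 30 seconds.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PairWindowCount {
    // Hypothetical minimal event: a name and an event time in milliseconds
    record Event(String name, long timeMillis) {}

    // Tumbling window size, matching the 30 seconds used in the question
    static final long WINDOW_MILLIS = 30_000L;

    // Returns windowStart -> (pair -> count); assumes events are already in time order
    static Map<Long, Map<String, Integer>> countPairs(List<Event> events) {
        Map<Long, Map<String, Integer>> result = new TreeMap<>();
        String previous = null; // plays the role of the stateful map's field
        for (Event e : events) {
            if (previous != null) {
                // A pair belongs to the window that contains its second event
                long windowStart = e.timeMillis() - (e.timeMillis() % WINDOW_MILLIS);
                result.computeIfAbsent(windowStart, k -> new TreeMap<>())
                        .merge(previous + "->" + e.name(), 1, Integer::sum);
            }
            previous = e.name();
        }
        return result;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("name1", 1_000), new Event("name2", 5_000),
                new Event("name1", 10_000), new Event("name2", 20_000),
                new Event("name3", 31_000));
        // prints {0={name1->name2=2, name2->name1=1}, 30000={name2->name3=1}}
        System.out.println(countPairs(events));
    }
}
```

In Flink itself, a plain field like this is held separately by each parallel instance of the function and is not checkpointed, so with parallelism greater than 1 each instance only pairs up the events it happens to receive; keyed state would likely be needed if pairs must be formed per machine or per key.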


Can this code be reduced using Java 8 Streams?

I want to use Java 8 lambdas and streams to reduce the amount of code in the following method, which produces an Optional. Is that possible?
My code:
protected Optional<String> getMediaName(Participant participant) {
for (ParticipantDevice device : participant.getDevices()) {
if (device.getMedia() != null && StringUtils.isNotEmpty(device.getMedia().getMediaType())) {
String mediaType = device.getMedia().getMediaType().toUpperCase();
Map<String, String> mediaToNameMap = config.getMediaMap();
if (mediaToNameMap.containsKey(mediaType)) {
return Optional.of(mediaToNameMap.get(mediaType));
}
}
}
return Optional.empty();
}
Yes, assuming the following class hierarchy (I used records here, with component names chosen so that the accessors match your getters):
record Media(String getMediaType) {
}
record ParticipantDevice(Media getMedia) {
}
record Participant(List<ParticipantDevice> getDevices) {
}
It is pretty self-explanatory. Unless you have an empty string as a key, you don't need (IMO) to check for empty strings in your search. The main difference here is that once the matching key is found, Optional.map is used to return the value instead of the key.
I also checked this out against your loop version and it works the same.
protected Optional<String> getMediaName(Participant participant) {
    Map<String, String> mediaToNameMap = config.getMediaMap();
    return participant.getDevices().stream()
            .map(ParticipantDevice::getMedia).filter(Objects::nonNull)
            // skip null media types, since toUpperCase() would throw on them
            .map(Media::getMediaType).filter(Objects::nonNull)
            .map(String::toUpperCase)
            .filter(mediaToNameMap::containsKey)
            .findFirst()
            .map(mediaToNameMap::get);
}
Firstly, since the Map of media types returned by config.getMediaMap() doesn't depend on a particular device, it makes sense to build it before processing the collection of devices. That is, regardless of the approach (imperative or declarative), do it outside the loop, or before creating the stream, to avoid generating the same Map multiple times.
To implement this method with streams, use the filter() operation, which expects a Predicate, to apply the conditional logic, and map() to transform the stream elements.
To get the first element that matches the conditions, apply findFirst() as the terminal operation; it produces an Optional result.
protected Optional<String> getMediaName(Participant participant) {
Map<String, String> mediaToNameMap = config.getMediaMap();
return participant.getDevices().stream()
.filter(device -> device.getMedia() != null
&& StringUtils.isNotEmpty(device.getMedia().getMediaType())
)
.map(device -> device.getMedia().getMediaType().toUpperCase())
.filter(mediaToNameMap::containsKey)
.map(mediaToNameMap::get)
.findFirst();
}
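A self-contained sketch for checking that the loop and the stream pipeline agree. Assumptions: the records mirror the hierarchy from the first answer, a fixed Map with made-up contents stands in for config.getMediaMap(), and StringUtils.isNotEmpty from Apache Commons Lang is replaced by an explicit null/empty check so the example needs only the JDK.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

public class MediaNameDemo {
    record Media(String getMediaType) {}
    record ParticipantDevice(Media getMedia) {}
    record Participant(List<ParticipantDevice> getDevices) {}

    // Stands in for config.getMediaMap(); the contents are assumed
    static final Map<String, String> MEDIA_MAP = Map.of("VIDEO", "Video call", "AUDIO", "Audio call");

    // The question's loop, with isNotEmpty written out as a null/empty check
    static Optional<String> loopVersion(Participant p) {
        for (ParticipantDevice d : p.getDevices()) {
            Media m = d.getMedia();
            if (m != null && m.getMediaType() != null && !m.getMediaType().isEmpty()) {
                String type = m.getMediaType().toUpperCase();
                if (MEDIA_MAP.containsKey(type)) {
                    return Optional.of(MEDIA_MAP.get(type));
                }
            }
        }
        return Optional.empty();
    }

    // The same logic as a stream pipeline
    static Optional<String> streamVersion(Participant p) {
        return p.getDevices().stream()
                .map(ParticipantDevice::getMedia)
                .filter(Objects::nonNull)
                .map(Media::getMediaType)
                .filter(t -> t != null && !t.isEmpty())
                .map(String::toUpperCase)
                .filter(MEDIA_MAP::containsKey)
                .map(MEDIA_MAP::get)
                .findFirst();
    }

    public static void main(String[] args) {
        Participant p = new Participant(List.of(
                new ParticipantDevice(null),
                new ParticipantDevice(new Media("chat")),
                new ParticipantDevice(new Media("video"))));
        // prints Optional[Video call] Optional[Video call]
        System.out.println(loopVersion(p) + " " + streamVersion(p));
    }
}
```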

How do I get the keys from a Map into a List and return them from a GET REST request in Java

I want to make a GET request that returns all the jobs whose description contains "teacher", in this case the Map values with keys 2 and 3.
When I execute the code below, I only get the value for key 2. Is a List the best way to do this? And how would I do it?
I have this method.
@GET
@Path("jobs")
public Job getJobsId(@QueryParam("id") int id, @QueryParam("description") String description) {
for (final Map.Entry<Integer, Job> entry : jobs.entrySet()) {
if (entry.getValue().getDescription().toLowerCase().contains(description.toLowerCase())) {
System.out.println("I DID IT");
System.out.println(entry.getKey());
return berufe.get(entry.getKey());
}
}
return berufe.get(id);
}
and this Map:
jobs.put(1, new Job(1, 1337, "student"));
jobs.put(2, new Job(2, 420, "teacher"));
jobs.put(3, new Job(3, 69, "schoolteacher"));
---------------------------------EDIT----------------------------------
If I do this:
@GET
@Path("jobs")
public Collection<Job> getJobsId(@QueryParam("id") int id, @QueryParam("description") String description) {
final Set<Beruf> result = new HashSet<>();
for (final Map.Entry<Integer, Job> entry : jobs.entrySet()) {
if (entry.getValue().getDescription().toLowerCase().contains(description.toLowerCase()) == true) {
result.add(jobs.get(entry.getKey()));
} else {
return jobs.values();
}
}
return result;
}
With a description I get all the Map values back, and without one I get an error.
What do I do wrong here?
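Your method getJobsId() returns on the first item it finds (inside the if statement); that's why you only get one result. In your edited version, the else branch returns jobs.values() as soon as the loop meets an entry that does not match, which is why you get all the values back; and when no description is passed, description is null, so description.toLowerCase() most likely throws a NullPointerException.
The usual pattern for collecting several results is to instantiate a suitable Collection before the for loop, add each item you find to it, and return the collection after the loop. I don't understand your code completely, but it would be something similar to the code below (I'm sure it won't work if you just copy and paste it, so read it and understand what is going on ;-) ):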
Set<Beruf> result = new HashSet<>();
for (final Map.Entry<Integer, Job> entry : jobs.entrySet()) {
if (entry.getValue().getDescription().toLowerCase().contains(description.toLowerCase())) {
result.add(berufe.get(entry.getKey()));
}
}
return result;
Since Java 8, it is usually more concise (and readable) to use the Stream API: call .filter (and maybe .map) on the stream, then .collect it into a suitable collection and return that. Something similar to:
return jobs.entrySet().stream()
.filter(entry -> (entry.getValue().getDescription().toLowerCase().contains(description.toLowerCase())))
.map(entry -> berufe.get(entry.getKey()))
.collect(Collectors.toSet());
A method can return only one value. If there is more than one hit, you have to return an object that can hold multiple elements. As you already mentioned, a List (as one kind of collection) can do this job. For your use case, a Map is a better option:
@GET
@Path("jobs")
public Map<Integer, Job> getJobsId(@QueryParam("id") int id, @QueryParam("description") String description)
{
    // Declare an object that can hold multiple elements
    Map<Integer, Job> jobList = new HashMap<>();
    for (final Map.Entry<Integer, Job> entry : jobs.entrySet())
    {
        if (entry.getValue().getDescription().toLowerCase().contains(description.toLowerCase()))
        {
            System.out.println("I DID IT");
            System.out.println(entry.getKey());
            jobList.put(entry.getKey(), entry.getValue());
        }
    }
    // Return every match instead of a single Job
    return jobList;
}
First of all, you should design the API correctly, for example:
@GET
@Path("jobs")
public List<Integer> searchJobsIdByDescription(@QueryParam("description") String description) {
...
}
Important: give the method a suitable name (e.g. searchJobsIdByDescription). Your method should do only one thing, so declare only the parameters needed to do that thing (e.g. @QueryParam("description") String description) and return the expected type (List<Integer>).
Pay attention to the @Path("jobs") annotation: make sure it does not clash with the paths of your existing methods.
Next, you should carefully read @frIV's answer.
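A plain-Java sketch of the search method suggested above, without the JAX-RS annotations so it can run on its own. The Job record is an assumption built from the jobs.put(...) calls in the question; the meaning of the second constructor argument is unknown, so it is called code here. Returning an empty list when no description is given is a choice made for this sketch, and it avoids calling toLowerCase() on null.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class JobSearch {
    // Assumed shape of Job; "code" is a placeholder name for the unknown second field
    record Job(int id, int code, String description) {}

    // Returns the ids of all jobs whose description contains the search text (case-insensitive)
    static List<Integer> searchJobsIdByDescription(Map<Integer, Job> jobs, String description) {
        if (description == null) {
            return List.of(); // no search text: nothing to match (a design choice for this sketch)
        }
        String needle = description.toLowerCase();
        return jobs.entrySet().stream()
                .filter(e -> e.getValue().description().toLowerCase().contains(needle))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<Integer, Job> jobs = new HashMap<>();
        jobs.put(1, new Job(1, 1337, "student"));
        jobs.put(2, new Job(2, 420, "teacher"));
        jobs.put(3, new Job(3, 69, "schoolteacher"));
        System.out.println(searchJobsIdByDescription(jobs, "teacher")); // [2, 3]
    }
}
```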

Dynamic grouping and aggregation on List<Map<String, Object>> - Java 8

@Test
public void testAggregation() {
List<Map<String, Object>> joinedList = new ArrayList<>();
Map<String, Object> Myrecord = new HashMap<> ();
Map<String, Object> Myrecord2 = new HashMap<> ();
Map<String, Object> Myrecord3 = new HashMap<> ();
Myrecord.put("ad_id", 8710);
Myrecord.put("medium_type", 2);
Myrecord.put("impressions", 36);
joinedList.add(Myrecord);
Myrecord2.put("ad_id", 8710);
Myrecord2.put("medium_type", 2);
Myrecord2.put("impressions", 1034);
joinedList.add(Myrecord2);
Myrecord3.put("ad_id", 9000);
Myrecord3.put("medium_type", 2);
Myrecord3.put("impressions", 10);
joinedList.add(Myrecord3);
System.out.println("Myrecord:" + joinedList);
//joinedList: [{ad_id=8710, impressions=36, medium_type=2}, {ad_id=8710, impressions=1034, medium_type=2}, {ad_id=9000, impressions=10, medium_type=2}]
}
I have a use case wherein I need to extract the same schema from two tables and aggregate the data from both tables. My idea is to query the tables separately, keep each result in a List<Map<String, Object>>, and merge them. Once merged, the sample output looks like this:
//joinedList: [{ad_id=8710, medium_type=2, impressions=36}, {ad_id=8710, medium_type=2, impressions=1034}, {ad_id=9000, medium_type=2, impressions=10}]
I want to group by the dimensions (ad_id and medium_type here, but they are dynamic and depend on user input) and aggregate the metrics (which are also dynamic and depend on user input). In the example above, grouping by ad_id and medium_type and aggregating the impressions metric should eventually give the following result:
//final output: [{ad_id=8710, medium_type=2, impressions=1070}, {ad_id=9000, medium_type=2, impressions=10}]
NOTE: the group-by fields (ad_id and medium_type above) are dynamic and driven by user input; they are not limited to ad_id and medium_type. The same goes for the metrics: the user might be interested in impressions, clicks, metric3, metric4, and so on.
I'm posting this as an answer since it wouldn't fit in a comment. I believe your actual problem boils down to making the operations to be performed dynamic. But, as also pointed out in the comments, choosing an approach requires a well-defined set of supported operations.
With the current description, you could, for example, do something along the following lines (note that the comments in the code may point at the actual problem you're asking about):
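import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;
import lombok.AllArgsConstructor;

@AllArgsConstructor // Lombok generates Record(Integer adId, Integer mediumType, Long impressions)
static class Record {
    Integer adId;
    Integer mediumType;
    Long impressions;
    // note: these are identity values only for the sum operation
    static Record identity() {
        return new Record(0, 0, 0L);
    }
    static Function<Record, List<Object>> classifierToGroupBy() {
        return r -> Arrays.asList(r.adId, r.mediumType); // make this dynamic
    }
    static BinaryOperator<Record> mergeOperationInDownstream() {
        // Take the group keys from b: reducing() starts from identity(), so a may be the identity record
        return (a, b) -> new Record(b.adId, b.mediumType,
                a.impressions + b.impressions); // dynamic field and operation selection
    }
}
public List<Record> processData(List<Record> records) {
    return new ArrayList<>(records.stream()
            .collect(Collectors.groupingBy(Record.classifierToGroupBy(),
                    Collectors.reducing(Record.identity(), Record.mergeOperationInDownstream())))
            .values());
}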
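If the dimensions and metrics really are only known at runtime, one option is to skip the fixed Record class and group the List<Map<String, Object>> rows directly. The sketch below is one way to do that, assuming every metric value is a Number and that summing is the only aggregation needed; the group key is the list of dimension values, so any set of column names can be passed in.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DynamicAggregation {
    // Groups rows by the given dimension columns and sums the given metric columns (as Long)
    static List<Map<String, Object>> aggregate(List<Map<String, Object>> rows,
                                               List<String> dimensions,
                                               List<String> metrics) {
        // LinkedHashMap keeps the groups in first-seen order
        Map<List<Object>, Map<String, Object>> groups = new LinkedHashMap<>();
        for (Map<String, Object> row : rows) {
            // The group key is the list of dimension values, e.g. [8710, 2]
            List<Object> key = dimensions.stream().map(row::get).collect(Collectors.toList());
            Map<String, Object> acc = groups.computeIfAbsent(key, k -> {
                Map<String, Object> init = new LinkedHashMap<>();
                for (int i = 0; i < dimensions.size(); i++) {
                    init.put(dimensions.get(i), k.get(i));
                }
                metrics.forEach(m -> init.put(m, 0L));
                return init;
            });
            for (String m : metrics) {
                Object value = row.get(m);
                long add = value == null ? 0L : ((Number) value).longValue();
                acc.put(m, (Long) acc.get(m) + add);
            }
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
                Map.of("ad_id", 8710, "medium_type", 2, "impressions", 36),
                Map.of("ad_id", 8710, "medium_type", 2, "impressions", 1034),
                Map.of("ad_id", 9000, "medium_type", 2, "impressions", 10));
        // prints [{ad_id=8710, medium_type=2, impressions=1070}, {ad_id=9000, medium_type=2, impressions=10}]
        System.out.println(aggregate(rows, List.of("ad_id", "medium_type"), List.of("impressions")));
    }
}
```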

Kafka streams, branched output to multiple topics

In my DSL-based transformation I branch a stream, and I want the branched output redirected to multiple topics.
The current to() method on a branch accepts only a single topic name (a String).
Is there a simple option with stream.branch() to route the result to multiple topics? With a consumer, I can subscribe to multiple topics by passing an array of topic names.
My problem requires me to take multiple actions when a particular predicate is satisfied.
I tried stream.branch[index].to(string), but this is not sufficient for my requirement. I am looking for something like stream.branch[index].to(string array of topics) or stream.branch[index].to(string).
I would expect a branch.to() method that accepts multiple topics; otherwise, is there an alternative way to achieve the same with Kafka Streams?
Adding sample code (actual variable names removed).
My predicates:
Predicate<String, MyDomainObject> Predicate1 = new Predicate<String, MyDomainObject>() {
    @Override
    public boolean test(String key, MyDomainObject domObj) {
        boolean result = false;
        // if <condition on domObj> then result = true
        return result;
    }
};
Predicate<String, MyDomainObject> Predicate2 = new Predicate<String, MyDomainObject>() {
    @Override
    public boolean test(String key, MyDomainObject domObj) {
        boolean result = false;
        // if <condition on domObj> then result = true
        return result;
    }
};
KStream <String, MyDomainObject>[] branches= myStream.branch(
Predicate1, Predicate2
);
// here I need your suggestions.
// this is my current implementation
branches[0].to(singleTopic,
        Produced.with(Serdes.String(), Serdes.serdeFrom(inSer, deSer)));
// I want to send notifications to multiple topics, something like below
// (hypothetical: KStream.to() has no overload that takes a list of topics)
branches[0].to(topicList,
        Produced.with(Serdes.String(), Serdes.serdeFrom(inSer, deSer)));
If you know in advance which topics you want to send the data to, you can call to() once per topic:
branches[0].to("first-topic");
branches[0].to("second-topic");
// etc.
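Since each to() call adds its own sink, the same idea covers a list of topics: loop over topicList and call branches[0].to(topic, Produced.with(...)) for each entry. Kafka Streams can't run in a self-contained snippet, so the sketch below is only a plain-Java model of that fan-out, with made-up predicates and topic names. It also mirrors how branch() treats records: each record goes to the first branch whose predicate matches, and records that match no predicate are dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

public class FanOutModel {
    // One branch: a predicate plus every topic that branch should be written to
    record Route(Predicate<String> predicate, List<String> topics) {}

    // Returns topic -> values written to it
    static Map<String, List<String>> route(List<String> values, List<Route> routes) {
        Map<String, List<String>> sinks = new TreeMap<>();
        for (String value : values) {
            for (Route r : routes) {
                if (r.predicate().test(value)) {
                    // Models calling to(topic) once per topic on this branch
                    for (String topic : r.topics()) {
                        sinks.computeIfAbsent(topic, t -> new ArrayList<>()).add(value);
                    }
                    break; // like branch(): a value goes only to the first matching branch
                }
            }
        }
        return sinks;
    }

    public static void main(String[] args) {
        List<Route> routes = List.of(
                new Route(s -> s.startsWith("alert"), List.of("alerts", "audit")),
                new Route(s -> s.startsWith("info"), List.of("info")));
        // "debug-1" matches no predicate, so it is dropped
        // prints {alerts=[alert-1], audit=[alert-1], info=[info-1]}
        System.out.println(route(List.of("alert-1", "info-1", "debug-1"), routes));
    }
}
```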

Remove Element From Map Using Filter

I have a java.util.Map inside an rx.Observable and I want to filter the map (remove an element based on a given key).
My current code is a mix of imperative and functional styles; I want to accomplish this goal without the call to isItemInDataThenRemove.
public static Observable<Map<String, Object>> filter(Map<String, Object> data, String removeKey) {
return Observable.just(data).filter(map -> isItemInDataThenRemove(map, removeKey));
}
private static boolean isItemInDataThenRemove(Map<String, Object> data, String removeKey) {
for (Map.Entry<String,Object> entry : data.entrySet()) {
if(entry.getKey().equalsIgnoreCase(removeKey)) {
System.out.printf("Found element %s, removing.", removeKey);
data.remove(removeKey);
return true;
}
}
return false;
}
The code you have proposed has a general problem: it modifies the underlying collection while operating on it. This conflicts with the non-interference requirement for streams, and in practice it often means you get a ConcurrentModificationException when a stream pipeline removes objects from its own source container.
In any case (as I learned yesterday), there is a newer default method on the Collection interface, removeIf (added in Java 8), that does pretty much exactly what you want:
private static boolean isItemInDataThenRemove(Map<String, Object> data, String removeKey) {
return data.entrySet().removeIf(entry -> entry.getKey().equalsIgnoreCase(removeKey));
}
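A small runnable check of the removeIf approach on a plain HashMap (the method name removeKeyIgnoreCase is just for this sketch). Because removeIf removes entries through the entry set itself rather than behind a running stream, there is no ConcurrentModificationException, and the boolean result says whether any key matched.

```java
import java.util.HashMap;
import java.util.Map;

public class RemoveIfDemo {
    // Removes every entry whose key equals removeKey, ignoring case; true if anything was removed
    static boolean removeKeyIgnoreCase(Map<String, Object> data, String removeKey) {
        return data.entrySet().removeIf(entry -> entry.getKey().equalsIgnoreCase(removeKey));
    }

    public static void main(String[] args) {
        Map<String, Object> data = new HashMap<>(Map.of("Name", "x", "age", 3));
        System.out.println(removeKeyIgnoreCase(data, "name")); // true
        System.out.println(data);                              // {age=3}
        System.out.println(removeKeyIgnoreCase(data, "name")); // false
    }
}
```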
WORKING CODE:
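private static boolean isItemInDataThenRemove(Map<String, Object> data, String removeKey) {
    // Collect the matching keys first; removing from data while streaming over it can throw ConcurrentModificationException
    var matchingKeys = data.keySet().stream()
            .filter(key -> key.equalsIgnoreCase(removeKey))
            .toList();
    matchingKeys.forEach(data::remove);
    return !matchingKeys.isEmpty();
}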
