Stateful Complex event processing with apache flink

Stateful Complex event processing with apache flink - java

I want to detect if two events happen in a defined timeframe based on two events that have the same identifier. For example a DoorEvent looks like this:
<doorevent>
<door>
<id>1</id>
<status>open</status>
</door>
<timestamp>12345679</timestamp>
</doorevent>
<doorevent>
<door>
<id>1</id>
<status>close</status>
</door>
<timestamp>23456790</timestamp>
</doorevent>
My DoorEvent java class in the example below has the same structure.
I want to detect that door with id 1 closes within 5 minutes of opening. I try to use the Apache flink CEP library for this purpose. The incoming stream contains all open and close messages from lets say 20 doors.
Pattern<String, ?> pattern = Pattern.<String>begin("door_open").where(
new SimpleCondition<String>() {
private static final long serialVersionUID = 1L;
public boolean filter(String doorevent) {
DoorEvent event = new DoorEvent().parseInstance(doorevent, DataType.XML);
if (event.getDoor().getStatus().equals("open")){
// save state of door as open
return true;
}
return false;
}
}
)
.followedByAny("door_close").where(
new SimpleCondition<String>() {
private static final long serialVersionUID = 1L;
public boolean filter(String doorevent) throws JsonParseException, JsonMappingException, IOException {
DoorEvent event = new DoorEvent().parseInstance(doorevent, DataType.XML);
if (event.getDoor().getStatus().equals("close")){
// check if close is of previously opened door
return true;
}
return false;
}
}
)
.within(Time.minutes(5));
How do I save the state of door 1 as open in the door_open so that in the door_close step I know that door 1 is the one being closed and it is not some other door?

If you have Flink 1.3.0 and above its really straightforard what you want to do
Your pattern would look something like this:
Pattern.<DoorEvent>begin("first")
.where(new SimpleCondition<DoorEvent>() {
private static final long serialVersionUID = 1390448281048961616L;
#Override
public boolean filter(DoorEvent event) throws Exception {
return event.getDoor().getStatus().equals("open");
}
})
.followedBy("second")
.where(new IterativeCondition<DoorEvent>() {
private static final long serialVersionUID = -9216505110246259082L;
#Override
public boolean filter(DoorEvent secondEvent, Context<DoorEvent> ctx) throws Exception {
if (!secondEvent.getDoor().getStatus().equals("close")) {
return false;
}
for (DoorEvent firstEvent : ctx.getEventsForPattern("first")) {
if (secondEvent.getDoor().getEventID().equals(firstEvent.getDoor().getEventId())) {
return true;
}
}
return false;
}
})
.within(Time.minutes(5));
So basically you can use IterativeConditions and get the context for the first patterns which are matched and iterate over that list while comparing for the one you need and proceed as you want.
IterativeConditions are expensive and should be handled accordingly
For more information on conditions check here at Flink - Conditions

Related

LastModifiedFileListFilter for Sftp inbound adapter

I am trying to implement LastModifiedFileListFilter as it looks like there is no similar filter for spring-integration-sftp yet for 5.3.2 release, I tried to copy the LastModifiedFileListFilter from spring-integration-file but the discard callback isn't working. Here is my implementation:
#Slf4j
#Data
public class LastModifiedLsEntryFileListFilter implements DiscardAwareFileListFilter<ChannelSftp.LsEntry> {
private static final long ONE_SECOND = 1000;
private static final long DEFAULT_AGE = 30;
private volatile long age = DEFAULT_AGE;
#Nullable
private Consumer<ChannelSftp.LsEntry> discardCallback;
public LastModifiedLsEntryFileListFilter(final long age) {
this.age = age;
}
#Override
public List<ChannelSftp.LsEntry> filterFiles(final ChannelSftp.LsEntry[] files) {
final List<ChannelSftp.LsEntry> list = new ArrayList<>();
final long now = System.currentTimeMillis() / ONE_SECOND;
for (final ChannelSftp.LsEntry file : files) {
if (this.fileIsAged(file, now)) {
log.info("File [{}] is aged...", file.getFilename());
list.add(file);
} else if (this.discardCallback != null) {
log.info("File [{}] is still being uploaded...", file.getFilename());
this.discardCallback.accept(file);
}
}
return list;
}
#Override
public boolean accept(final ChannelSftp.LsEntry file) {
if (this.fileIsAged(file, System.currentTimeMillis() / ONE_SECOND)) {
return true;
}
else if (this.discardCallback != null) {
this.discardCallback.accept(file);
}
return false;
}
private boolean fileIsAged(final ChannelSftp.LsEntry file, final long now) {
return file.getAttrs().getMTime() + this.age <= now;
}
#Override
public void addDiscardCallback(#Nullable final Consumer<ChannelSftp.LsEntry> discardCallbackToSet) {
this.discardCallback = discardCallbackToSet;
}
}
The filter is able to correctly identify the age of file and discards it but that file is not retried which I believe is part of discard callback.
I guess my question is how to set discard callback to keep retrying the discarded file until files ages. Thanks

not retried which I believe is part of discard callback.
I wonder what makes you think that way...
The fact that FileReadingMessageSource with its WatchService option has the logic like this:
if (filter instanceof DiscardAwareFileListFilter) {
((DiscardAwareFileListFilter<File>) filter).addDiscardCallback(this.filesToPoll::add);
}
doesn't mean that SFTP implementation is similar.
The retry is there anyway: on the next poll not accepted file will be checked again.
You probably don't show other filters you use, and your file is filtered out before this LastModifiedLsEntryFileListFilter, e.g. with the SftpPersistentAcceptOnceFileListFilter. You need to consider to have your "last-modified" as a first one in the chain.
If you are not going to support discard callback from the outside, you probably don't need to implement that DiscardAwareFileListFilter at all.

How to implement a Flink Event Time Trigger that emits after no events recieved for X minutes

I'm struggling a bit understanding how Flink Triggers work. My datastream contains events with a sessionId that I aggregated based on that sessionId. Each session will contain a Started and a Ended event however some times the Ended event will be lost.
In order to handle this I've set up a Trigger that will emit the aggregated session whenever the ended event is processed. But in the case that no events arrive from that session for 2 minutes I want to emit whatever we have aggregated so far (our apps that send the events send heartbeats every minute so if we don't get any events the session is considered lost).
I've set up the following trigger function:
public class EventTimeProcessingTimeTrigger extends Trigger<HashMap, TimeWindow> {
private final long sessionTimeout;
private long lastSetTimer;
// Max session length set to 1 day
public static final long MAX_SESSION_LENGTH = 1000l * 86400l;
// End session events
private static ImmutableSet<String> endSession = ImmutableSet.<String>builder()
.add("Playback.Aborted")
.add("Playback.Completed")
.add("Playback.Error")
.add("Playback.StartAirplay")
.add("Playback.StartCasting")
.build();
public EventTimeProcessingTimeTrigger(long sessionTimeout) {
this.sessionTimeout = sessionTimeout;
}
#Override
public TriggerResult onElement(HashMap element, long timestamp, TimeWindow window, TriggerContext ctx) throws Exception {
lastSetTimer = ctx.getCurrentProcessingTime() + sessionTimeout;
ctx.registerProcessingTimeTimer(lastSetTimer);
if(endSession.contains(element.get(Field.EVENT_TYPE))) {
return TriggerResult.FIRE_AND_PURGE;
}
return TriggerResult.CONTINUE;
}
#Override
public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) throws Exception {
return TriggerResult.FIRE_AND_PURGE;
}
#Override
public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) throws Exception {
return time == window.maxTimestamp() ?
TriggerResult.FIRE_AND_PURGE :
TriggerResult.CONTINUE;
}
#Override
public void clear(TimeWindow window, TriggerContext ctx) throws Exception {
ctx.deleteProcessingTimeTimer(lastSetTimer);
}
#Override
public boolean canMerge() {
return true;
}
#Override
public void onMerge(TimeWindow window,
OnMergeContext ctx) {
ctx.registerProcessingTimeTimer(ctx.getCurrentProcessingTime() + sessionTimeout);
}
}
In order to set watermarks for the events I use the watermarks set by the apps since appEventTime might not be the same as wallClock on the server. I extract watermarks like this:
DataStream<HashMap> playerEvents = env
.addSource(kafkaConsumerEvents, "playerEvents(Kafka)")
.name("Read player events from Kafka")
.uid("Read player events from Kafka")
.map(json -> DECODER.decode(json, TypeToken.of(HashMap.class))).returns(HashMap.class)
.name("Map Json to HashMap")
.uid("Map Json to HashMap")
.assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<HashMap>(org.apache.flink.streaming.api.windowing.time.Time.seconds(30))
{
#Override
public long extractTimestamp(HashMap element)
{
long timestamp = 0L;
Object timestampAsObject = (Object) element.get("CanonicalTime");
timestamp = (long)timestampAsObject;
return timestamp;
}
})
.name("Add CanonicalTime as timestamp")
.uid("Add CanonicalTime as timestamp");
Now what I find strange is that when I run the code in debug and set a breakpoint in the clear function of the Trigger it constantly gets called. Even when no FIRE_AND_PURGE point is reached in the Trigger. So it feels like I've completely misunderstood how the Trigger is supposed to work. And that my implementation is not at all doing what I think it's doing.
I guess my question is, when should clear be called by the Trigger? And is this the correct way to implement a combined EventTimeTrigger and ProcessingTimeTrigger?
Thankful for all the help I can get.
UPDATE 1: (2020-05-29)
In order to provide some more information about how things are setup.
I set up my environment as follows:
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setRestartStrategy(RestartStrategies.failureRateRestart(60, Time.of(60, TimeUnit.MINUTES), Time.of(60, TimeUnit.SECONDS)));
env.enableCheckpointing(5000);
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(2000);
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
So I use EventTime for the entire stream.
I then create the windows like this:
DataStream<PlayerSession> playerSessions = sideEvents
.keyBy((KeySelector<HashMap, String>) event -> (String) event.get(Field.SESSION_ID))
.window(ProcessingTimeSessionWindows.withGap(org.apache.flink.streaming.api.windowing.time.Time.minutes(5)))
.trigger(new EventTimeProcessingTimeTrigger(SESSION_TIMEOUT))
.aggregate(new SessionAggregator())
.name("Aggregate events into sessions")
.uid("Aggregate events into sessions");

This situation is complex. I hesitate to predict exactly what this code will do, but I can explain some of what’s going on.
Point 1: you have set the time characteristic to event time, arranged for timestamps and watermarks, and implemented an onEventTime callback in your Trigger. But nowhere are you creating an event time timer. Unless I've missed something, nothing is actually using event time or watermarks. You haven't implemented an event time trigger, and I would not expect that onEventTime will ever be called.
Point 2: Your trigger doesn't need to call clear. Flink takes care of calling clear on triggers as part of purging windows.
Point 3: Your trigger is trying to fire and purge the window repeatedly, which doesn't seem right. I say this because you are creating a new processing time timer for every element, and when each timer fires, you are firing and purging the window. You can fire the window as often as you like, but you can only purge the window once, after which it is gone.
Point 4: Session windows are a special kind of window, known as merging windows. When sessions merge (which happens all the time, as events arrive), their triggers are merged, and one of them gets cleared. This is why you see clear being called so frequently.
Suggestion: since you have once-a-minute keepalives, and intend to close sessions after 2 minutes of inactivity, it seems like you could set the session gap to be 2 minutes, and that would avoid a fair bit of what's making things so complex. Let the session windows do what they're designed to do.
Assuming that will work, then you could simple extend Flink's ProcessingTimeTrigger and override its onElement method to do this:
#Override
public TriggerResult onElement(HashMap element, long timestamp, TimeWindow window, TriggerContext ctx) throws Exception {
if (endSession.contains(element.get(Field.EVENT_TYPE))) {
return TriggerResult.FIRE_AND_PURGE;
}
return super(element, timestamp, window, ctx);
}
In this fashion the window will be triggered after two minutes of inactivity, or by an explicit session-ending event.
You should be able to simply inherit the rest of ProcessingTimeTrigger's behavior.
If you want to use event time, then use EventTimeTrigger as the superclass, and you'll have to find a way to make sure that your watermarks make progress even when the stream becomes idle. See this answer for how to handle that.

same problem
I have set the time characteristic to proccessing time and the trigger :
//the trigger
.trigger(PurgingTrigger.of(TimerTrigger.of(Time.seconds(winSec))))
the following trigger function:
//override the ProcessingTimeTrigger behavior
public class TimerTrigger<W extends Window> extends Trigger<Object, W> {
private static final long serialVersionUID = 1L;
private final long interval;
private final ReducingStateDescriptor<Long> stateDesc;
private TimerTrigger(long winInterValMills) { //window
this.stateDesc = new ReducingStateDescriptor("fire-time", new TimerTrigger.Min(), LongSerializer.INSTANCE);
this.interval = winInterValMills;
}
public TriggerResult onElement(Object element, long timestamp, W window, TriggerContext ctx) throws Exception {
if (window.maxTimestamp() <= ctx.getCurrentWatermark()) {
// if the watermark is already past the window fire immediately
return TriggerResult.FIRE;
}
long now = System.currentTimeMillis();
ReducingState<Long> fireTimestamp = (ReducingState) ctx.getPartitionedState(this.stateDesc);
if (fireTimestamp.get() == null) {
long time = Math.max(timestamp, window.maxTimestamp()) + interval;
if (now-window.maxTimestamp()>interval){ // fire late
time = (now-now%1000) + interval-1;
}
ctx.registerProcessingTimeTimer(time);
fireTimestamp.add(time);
return TriggerResult.CONTINUE;
} else {
return TriggerResult.CONTINUE;
}
}
public TriggerResult onEventTime(long time, W window, TriggerContext ctx) throws Exception {
if (time == window.maxTimestamp()){
return TriggerResult.FIRE;
}
return TriggerResult.CONTINUE;
}
public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) throws Exception {
ReducingState<Long> fireTimestamp = (ReducingState) ctx.getPartitionedState(this.stateDesc);
if (((Long) fireTimestamp.get()).equals(time)) {
fireTimestamp.clear();
long maxTimestamp = Math.max(window.maxTimestamp(), time); //maybe useless
if (maxTimestamp == time) {
maxTimestamp = time + this.interval;
}
fireTimestamp.add(maxTimestamp);
ctx.registerProcessingTimeTimer(maxTimestamp);
return TriggerResult.FIRE;
} else {
return TriggerResult.CONTINUE;
}
}
public void clear(W window, TriggerContext ctx) throws Exception {
ReducingState<Long> fireTimestamp = (ReducingState) ctx.getPartitionedState(this.stateDesc);
long timestamp = (Long) fireTimestamp.get();
ctx.deleteProcessingTimeTimer(timestamp);
fireTimestamp.clear();
}
public boolean canMerge() {
return true;
}
public void onMerge(W window, OnMergeContext ctx) {
ctx.mergePartitionedState(this.stateDesc);
}
#VisibleForTesting
public long getInterval() {
return this.interval;
}
public String toString() {
return "TimerTrigger(" + this.interval + ")";
}
public static <W extends Window> TimerTrigger<W> of(Time interval) {
return new TimerTrigger(interval.toMilliseconds());
}
private static class Min implements ReduceFunction<Long> {
private static final long serialVersionUID = 1L;
private Min() {
}
public Long reduce(Long value1, Long value2) throws Exception {
return Math.min(value1, value2);
}
}
}

Netty: How to limit websocket channel messages per second?

I need to limit messages received on websocket channel per second for netty server.
Could'n find any ideas how to do that.
Any ideas would be appreciated
Thank you

You need to add simple ChannelInboundHandlerAdapter handler to your pipeline and add the simple counter to channelRead(ChannelHandlerContext ctx, Object msg) method. I would recommend you to use some of CodaHale Metrics Class for that purpose.
Pseudo code:
private final QuotaLimitChecker limitChecker;
public MessageDecoder() {
this.limitChecker = new QuotaLimitChecker();
}
#Override
public void channelRead(ChannelHandlerContext ctx, Object msg) {
if (limitChecker.quotaReached(100)) { //assume limit is 100 req per sec
return;
}
}
Where QuotaLimitChecker is a class that increments counter and checks if the limit is reached.
public class QuotaLimitChecker {
private final static Logger log = LogManager.getLogger(QuotaLimitChecker.class);
private final int userQuotaLimit;
//here is specific implementation of Meter for your needs
private final InstanceLoadMeter quotaMeter;
public QuotaLimitChecker(int userQuotaLimit) {
this.userQuotaLimit = userQuotaLimit;
this.quotaMeter = new InstanceLoadMeter();
}
public boolean quotaReached() {
if (quotaMeter.getOneMinuteRate() > userQuotaLimit) {
log.debug("User has exceeded message quota limit.");
return true;
}
quotaMeter.mark();
return false;
}
}
Here is my implementation of QuotaLimitChecker that uses the simplified version Meter class of CodaHale Metrics library.

PropertyChangeListener usage check

Good evening SO,
I wrote two pretty simple classes for one of my projects. This is the first time I met such a problem, so I would like to ask you if I managed to tackle the problem the right way / with good implementation.
The background is quite simple: You have a Channel that might be either busy or not. If the Channel is busy, it means it was taken by a ServiceRequest. Once the request is processed the Channel should be open again.
I googled a little bit and found ideas to use the PropertyChangedListener. Code down below. Please give me all the comments you can regarding the code quality / problem solving here. Thank you!
Test:
#Unroll
def "when request is processed and finished channel is free again"() {
given:
def channel = new Channel()
def request = new ServiceRequest()
request.addPropertyChangeListener(channel)
when:
channel.setRequest(request)
request.finish();
then:
assert !channel.isBusy() && channel.request == null
}
Channel class:
public class Channel implements PropertyChangeListener{
private boolean busy;
private ServiceRequest request;
public Channel() {
this.busy = false;
}
public boolean isBusy() {
return busy;
}
public ServiceRequest getRequest() {
return request;
}
public void setRequest(final ServiceRequest request) {
this.request = request;
busy = true;
}
#Override
public void propertyChange(PropertyChangeEvent evt) {
request = null;
busy = false;
}
ServiceRequest class:
public class ServiceRequest {
PropertyChangeSupport support = new PropertyChangeSupport(this);
private String id;
public ServiceRequest() {
id = "randomlygeneratedid";
}
void addPropertyChangeListener(final PropertyChangeListener l) {
support.addPropertyChangeListener(l);
}
public void finish() {
id = "";
support.firePropertyChange("id", id, "");
support.firePropertyChange("request", null, null);
support.firePropertyChange("busy", null, false);
}
public String getId() {
return id;
}

Json Serialization accepts numeric input for boolean variable

I am developing a REST API using spring MVC.
One of the service is used to fetch the train details with some request object .
Request object :
TrainRequest implements Serializable {
private static final long serialVersionUID = 6280494678832642677L;
private String travelMonth;
private boolean departOnly;
}
I have used below request to test the service.
{
"travelMonth": "DEC2016",
"departOnly":0
}
I have seen that 0 is serialized and assigned to departOnly as false .Also I tested with other than 0 and got departOnly as true.
But I dont want to have numeric input for boolean variables.
Help me how to restrict in Spring validation or Java to have only true/false in defined boolean varaible.

TrainRequest implements Serializable {
private static final long serialVersionUID = 6280494678832642677L;
private String travelMonth;
private int departOnlyInt; // Serialized instead of departOnly
private transient boolean departOnly; // Not serialized
............................
private void writeObject(ObjectOutputStream o) throws IOException {
departOnlyInt = departOnly ? 1 : 0;
o.defaultWriteObject();
}
private void readObject(ObjectInputStream o) throws IOException,
ClassNotFoundException {
o.defaultReadObject();
switch(departOnlyInt) {
case 0:
departOnly = false; break;
case 1:
departOnly = true; break;
default:
throw new IOException ("Invalid boolean: " + departOnlyInt);
}
}
..................
}

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Stateful Complex event processing with apache flink - java

Related

LastModifiedFileListFilter for Sftp inbound adapter

How to implement a Flink Event Time Trigger that emits after no events recieved for X minutes

Netty: How to limit websocket channel messages per second?

PropertyChangeListener usage check

Json Serialization accepts numeric input for boolean variable

Categories

Resources