Flink Statefun Bootstrap and State expiration - java

According to this page we have the ability to set TTL for state when using Flink Statefun v2.1.0.
We also have the ability to bootstrap state, according to this page.
The first question: the bootstrap documentation does not mention state expiration at all. What is the correct way to bootstrap state that has a TTL? Can someone point me to an example?
The second question: what happens if I set some state to expire one day after writing and then bootstrap that state with six months' worth of data?
Is the whole bootstrapped state going to expire after literally one day?
If so, what can I do to have it expire one day's worth of data as each day passes?

Yes, if that data hasn't been modified since it was loaded, it will all be deleted after one day.
To expire one day's worth of data every day: After bootstrapping the state, you could send yourself a delayed message, set to be delivered one day later. When it arrives, delete the oldest data and send another delayed message.
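Below is a rough sketch of that pattern with the embedded Java SDK. It assumes Statefun 2.1's `PersistedValue`/`Expiration` state API and `Context.sendAfter`; the `PruneOldData` control message and the state layout are invented for illustration, not taken from the docs.

```java
import java.time.Duration;

import org.apache.flink.statefun.sdk.Context;
import org.apache.flink.statefun.sdk.StatefulFunction;
import org.apache.flink.statefun.sdk.annotations.Persisted;
import org.apache.flink.statefun.sdk.state.Expiration;
import org.apache.flink.statefun.sdk.state.PersistedValue;

public class OrderFn implements StatefulFunction {

  // State declared with a one-day write TTL. The bootstrap job should register
  // state under the same name so the loaded entries pick up the same expiration.
  @Persisted
  private final PersistedValue<String> latestOrder =
      PersistedValue.of(
          "latest-order", String.class, Expiration.expireAfterWriting(Duration.ofDays(1)));

  @Override
  public void invoke(Context context, Object input) {
    if (input instanceof PruneOldData) {
      // Drop the oldest day's worth of data here, then re-arm the daily timer.
      context.sendAfter(Duration.ofDays(1), context.self(), new PruneOldData());
      return;
    }
    latestOrder.set(String.valueOf(input));
  }

  // Hypothetical control message used to drive the daily cleanup cycle.
  public static final class PruneOldData {}
}
```

The first `PruneOldData` message would be sent once after bootstrapping, for example from the function's first regular invocation; the function then keeps re-arming itself every day.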

Related

Quartz trigger creation in large numbers

Our project is in the airline domain, and our system manages the flights for an airline. The system gets the flight details from an external source as messages, and we will always have flights up to 90 days in the future in our DB.
We have a requirement to send a message to an external system X minutes before the departure time of a flight. For example, 90 minutes before the departure of a flight, a message needs to be sent to the external system. This needs to happen for all the flights for a day.
We are planning to implement the solution as follows: when a flight message comes into our system, we will create a Quartz trigger for that flight to send the message 90 minutes before its departure time.
But the problem we are facing is that there will be more than 300 flights in a day. That means at least 300 triggers are created in the system for a single day, and we think this may lead to performance bottlenecks in the scheduler.
Please suggest whether there is a better alternative to this solution - for example, whether we can achieve it with just one trigger that queries the database at frequent intervals and handles sending the flight message for all flights that satisfy the condition.
I found a way to solve the issue and am posting it here so that somebody else can benefit from it.
Creating a large number of Quartz triggers is not a good idea at all, and we resolved the issue with just two jobs.
The first job runs once daily at midnight; it finds all the flights for that day and calculates each flight's message-sending time. This information is written to a table.
Another job runs every five minutes, reads the table, and checks whether any message needs to be sent at the current time, as well as any that failed in previous attempts. It then sends the message and updates the status accordingly.
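A sketch of how those two jobs might be wired up with Quartz 2.x; the `PlanDepartureMessagesJob` and `SendDueMessagesJob` classes, and the table they share, are placeholders for the logic described above.

```java
import org.quartz.CronScheduleBuilder;
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.SimpleScheduleBuilder;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class FlightMessageScheduling {

  // Runs once a day: find today's flights and write each flight's
  // message-sending time (departure - 90 minutes) into a table.
  public static class PlanDepartureMessagesJob implements Job {
    @Override
    public void execute(JobExecutionContext ctx) { /* query flights, insert rows */ }
  }

  // Runs every five minutes: read the table, send messages that are due
  // (or previously failed), then update their status.
  public static class SendDueMessagesJob implements Job {
    @Override
    public void execute(JobExecutionContext ctx) { /* select due rows, send, update */ }
  }

  public static void main(String[] args) throws Exception {
    Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();

    JobDetail planJob = JobBuilder.newJob(PlanDepartureMessagesJob.class)
        .withIdentity("planDepartureMessages").build();
    Trigger atMidnight = TriggerBuilder.newTrigger()
        .withSchedule(CronScheduleBuilder.dailyAtHourAndMinute(0, 0)).build();

    JobDetail sendJob = JobBuilder.newJob(SendDueMessagesJob.class)
        .withIdentity("sendDueMessages").build();
    Trigger everyFiveMinutes = TriggerBuilder.newTrigger()
        .startNow()
        .withSchedule(SimpleScheduleBuilder.simpleSchedule().withIntervalInMinutes(5).repeatForever())
        .build();

    scheduler.scheduleJob(planJob, atMidnight);
    scheduler.scheduleJob(sendJob, everyFiveMinutes);
    scheduler.start();
  }
}
```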

Creating Amazon SNS messages to be processed in the future

For the last few years we have used our own RM Application to process events related to our applications. This works by polling a database table every few minutes, looking for any rows that have a due date before now, and have not been processed yet.
We are currently making the transition to SNS, with SQS Worker tiers processing them. The problem with this approach is that we can't future date our messages. Our applications sometimes have events that we don't want to process until a week later.
Are there any design approaches, alternative services, or clever tricks we could employ that would allow us to achieve this?
One solution would be to keep our existing application running, at a simplified level, so all it does is send the SNS notifications when they are due, but the aim of this project is to try and do away with our existing app.
The database approach would be the wisest, being careful that each row is only processed once.
Amazon Simple Notification Service (SNS) is designed to send notifications immediately. There is no functionality for a delayed send (although some notification types are retried if they fail).
Amazon Simple Queue Service (SQS) does have a delay feature, but only up to 15 minutes -- this is useful if you need to do some work before the message is processed, such as copying related data to Amazon S3.
Given that your requirement is to wait until some future arbitrary time (effectively like a scheduling system), you could either start a process and tell it to sleep for a certain amount of time (a bad idea in case systems are restarted), or continue your approach of polling from a database.
If all jobs are scheduled for a distant future (eg at least one hour away), you theoretically only need to poll the database once an hour to retrieve the earliest scheduled time.
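One way to sketch that "poll only as often as needed" idea with a plain `ScheduledExecutorService`; `findEarliestDueTime()` and `processDueRows()` are placeholders for your own database access.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DelayedEventPoller {

  private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

  public void start() {
    scheduleNextCheck();
  }

  private void scheduleNextCheck() {
    // Placeholder: SELECT MIN(due_at) FROM events WHERE processed = false
    Optional<Instant> earliest = findEarliestDueTime();

    // Sleep until the earliest due time (or one hour, whichever is shorter),
    // instead of hammering the table every few minutes.
    long delayMs = earliest
        .map(t -> Math.max(0, Duration.between(Instant.now(), t).toMillis()))
        .orElse(Duration.ofHours(1).toMillis());
    delayMs = Math.min(delayMs, Duration.ofHours(1).toMillis());

    scheduler.schedule(() -> {
      processDueRows();       // placeholder: handle rows with due_at <= now
      scheduleNextCheck();    // re-arm based on whatever is due next
    }, delayMs, TimeUnit.MILLISECONDS);
  }

  private Optional<Instant> findEarliestDueTime() { return Optional.empty(); } // stub
  private void processDueRows() {}                                             // stub
}
```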
A week might be too long, as SQS message retention is capped (the maximum retention period is 14 days). If you are okay with that maximum, one idea is to keep changing the visibility of a message every time you receive it, until it is ready for processing. The maximum allowed visibility timeout is 12 hours. More on visibility timeouts and the APIs for changing them:
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_ChangeMessageVisibility.html
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/AboutVT.html
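For illustration, a rough sketch of that visibility-timeout trick with the AWS SDK for Java v1; the `processAt` message attribute is something the producer would have to set itself, and the queue URL is a placeholder.

```java
import java.time.Duration;
import java.time.Instant;

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

public class DelayedSqsConsumer {

  private static final String QUEUE_URL =
      "https://sqs.us-east-1.amazonaws.com/123456789012/delayed-events"; // placeholder

  public static void main(String[] args) {
    AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();

    ReceiveMessageRequest request = new ReceiveMessageRequest(QUEUE_URL)
        .withMessageAttributeNames("processAt")   // custom attribute set by the producer (ISO-8601 timestamp)
        .withWaitTimeSeconds(20);

    for (Message message : sqs.receiveMessage(request).getMessages()) {
      Instant processAt = Instant.parse(message.getMessageAttributes().get("processAt").getStringValue());

      if (Instant.now().isBefore(processAt)) {
        // Not due yet: hide the message again for up to 12 hours (the SQS maximum)
        // so it comes back later instead of being redelivered immediately.
        long remaining = Duration.between(Instant.now(), processAt).getSeconds();
        int timeout = (int) Math.min(remaining, 12 * 60 * 60);
        sqs.changeMessageVisibility(QUEUE_URL, message.getReceiptHandle(), timeout);
      } else {
        // Due: process it, then delete it from the queue.
        sqs.deleteMessage(QUEUE_URL, message.getReceiptHandle());
      }
    }
  }
}
```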
I found this approach: https://github.com/alestic/aws-sns-delayed. Basically, you can use a Step Functions state machine with a Wait state in there.

How to design a Real Time Alerting System?

I have a requirement to send alerts when a record in the DB is not updated/changed for a specified interval. For example, if a received purchase order isn't processed within one hour, a reminder should be sent to the delivery manager.
The reminder/alert should be sent exactly at the interval (to the second). If the last modified time is 13:55:45, the alert should be triggered at 14:55:45. There could be millions of rows that need to be tracked.
A simple approach would be to implement a custom scheduler with which all the records are registered. But it would have to poll the database for changes every second, which would lead to performance problems.
UPDATE:
Another basic approach would be to create a thread for each record and put it to sleep for one hour, or to use some queuing concept that has a timeout. But that still has performance problems.
Any thoughts on a better approach to implement this?
Probably using an internal JMS queue would be a better solution - for example, you may want to use HornetQ's scheduled message feature: http://docs.jboss.org/hornetq/2.2.2.Final/user-manual/en/html/examples.html#examples.scheduled-message
You can ask the broker to deliver the alert message after exactly one hour. On the other hand, during processing of the activity you can manually delete this message, meaning that the activity has been processed without errors.
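A sketch of scheduling such a delayed alert over JMS against HornetQ, using its documented `_HQ_SCHED_DELIVERY` scheduled-delivery property; how the `ConnectionFactory` and `Queue` are obtained depends on your setup.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

public class DelayedAlertSender {

  // connectionFactory and alertQueue would typically come from JNDI or broker-specific setup.
  public void scheduleAlert(ConnectionFactory connectionFactory, Queue alertQueue, String orderId)
      throws Exception {
    Connection connection = connectionFactory.createConnection();
    try {
      Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
      MessageProducer producer = session.createProducer(alertQueue);

      TextMessage message = session.createTextMessage(orderId);
      // HornetQ's scheduled-delivery property: the broker holds the message
      // and only delivers it once this timestamp (ms since epoch) is reached.
      message.setLongProperty("_HQ_SCHED_DELIVERY", System.currentTimeMillis() + 60 * 60 * 1000L);

      producer.send(message);
    } finally {
      connection.close();
    }
  }
}
```

Cancelling the alert when the order is handled in time would then need either a broker-side removal of the scheduled message or a consumer-side check before sending the reminder.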
Use a Timer for each reminder, i.e. if the last modified time is 17:49:45, the alert should be triggered at 18:49:45. Simply create a dynamic timer schedule for each task so that it fires exactly one hour later.
It is not possible in Java if you really insist on hard real-time behaviour. In Java you may encounter the garbage collector's stop-the-world phases, so you can never guarantee exact timing.
If an approximate time is permissible, then use some kind of scheduled queue as proposed in other answers; if not, then use real-time Java or some native call.
If we can assume that the orders are entered in increasing time order, then:
You can use a Queue with elements that have the properties time-of-order and order-id.
Each new entry that is added to the DB is also enqueued to this Queue.
You can check the element at the start of the Queue each minute.
When checking the element at the start of the Queue, if an hour has passed from the time-of-order, then search for the entry with order-id in the DB.
If it is found and was not updated, send a notification; otherwise, dequeue it from the Queue.
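A plain-Java sketch of that queue idea; `orderWasUpdated` and `notifyDeliveryManager` are placeholders for the DB lookup and the alerting code.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class OrderAlertTracker {

  record PendingOrder(String orderId, Instant timeOfOrder) {}

  private final Queue<PendingOrder> pending = new ConcurrentLinkedQueue<>();
  private final ScheduledExecutorService checker = Executors.newSingleThreadScheduledExecutor();

  public void start() {
    // Check the head of the queue once a minute, as suggested above.
    checker.scheduleAtFixedRate(this::checkHead, 1, 1, TimeUnit.MINUTES);
  }

  // Called whenever a new order row is inserted into the DB.
  public void track(String orderId) {
    pending.add(new PendingOrder(orderId, Instant.now()));
  }

  private void checkHead() {
    PendingOrder head = pending.peek();
    // Because orders arrive in increasing time order, only the head can be "old enough".
    while (head != null && Duration.between(head.timeOfOrder(), Instant.now()).toHours() >= 1) {
      pending.poll();
      // Placeholder: re-read the row; if it still hasn't been updated, alert.
      if (!orderWasUpdated(head.orderId())) {
        notifyDeliveryManager(head.orderId());
      }
      head = pending.peek();
    }
  }

  private boolean orderWasUpdated(String orderId) { return false; } // stub
  private void notifyDeliveryManager(String orderId) {}             // stub
}
```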

How do I cache diffs in data for arbitrary time differences (java web service)

I have a Java web service that queries a DB to return data to users. DB queries are expensive, so I have a cron job that runs every 60 seconds to cache the current data in memcached.
Data elements 'close' after a time, meaning they aren't returned by "get current data" requests. So those requests can use the cached data.
Clients use a feature called 'since' to get all the data that has changed since a particular timestamp (the last request's timestamp). This should also return any closed data if that data closed after that timestamp.
How can I effectively store the diffs/since data? Hitting the DB for every 'since' request is too slow (and won't scale well), but because clients could request any 'since' time, it is difficult to build an all-purpose cache.
I tried having the cron job also build a 'since' cache. It would issue 'since' requests to capture everything that changed since the last update, and I attempted to force clients to request timestamps that matched the cron job's 'since' requests. But the cron run time varies, and neither the client nor the cron job runs exactly every 60 seconds, so the small differences add up. This eventually results in some data closing that either the cache or the client misses.
I'm not even sure what to search for to solve this.
I'd be tempted to stick a time-expiring cache (e.g. ehcache with timeToLive set) in front of the database, and have whatever process updates the database also put the data directly into the cache (resetting or removing an existing matching element). The web service then just hits the cache (which is incredibly fast) for everything except its initial connection, filtering out the few elements that are too old and sending the rest on to the client. Gradually the old data gets dropped from the cache as its time-to-live passes. Then just make sure the cache gets pre-populated when the service starts up.
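A minimal sketch of that setup, assuming Ehcache 2.x and a hypothetical `recentData` cache shared by the updater process and the web service.

```java
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.config.CacheConfiguration;

public class RecentDataCache {

  private final Cache cache;

  public RecentDataCache() {
    CacheManager manager = CacheManager.create();
    // Keep up to 100k elements, each dropped automatically 10 minutes after being written.
    Cache recent = new Cache(new CacheConfiguration("recentData", 100_000).timeToLiveSeconds(600));
    manager.addCache(recent);
    this.cache = manager.getCache("recentData");
  }

  // Called by whatever process updates the database, right after the DB write.
  public void put(String id, Object dataElement) {
    cache.put(new Element(id, dataElement));
  }

  // Called by the web service; returns null if the element has expired or was never cached.
  public Object get(String id) {
    Element element = cache.get(id);
    return element == null ? null : element.getObjectValue();
  }
}
```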
Does your data have any timestamping? We were having similar caching issues at my company, and timestamping resolved them. You can use a "valid-until" timestamp with your data, so that your cache and clients know until when the data is valid.

Implementing notification in webapp

We are developing a web application that allows users to register for certain events. What the application is supposed to do is send them a few notifications to remind them that they have registered. There will be more than 1k users, who can register for many events over a very wide time range. We have to send notifications 3 months, 1 month, 1 week and one day before each event.
The first thing is that I have to determine whether I need to send a notification to a specific user.
I'm thinking about a thread that iterates over registrations and determines whether a notification needs to be sent. If a notification is required, should I send it right away, or put all the objects that need one into some kind of cache and then send them (from another thread)?
The second thing is: if I build that thread, is it better to run it alongside the application, or to embed it in the application and, for example, start it in the context listener?
How would you solve this? Maybe there are better approaches?
I would not spawn my own threads for that, I would use a scheduler like Quartz and run daily or hourly jobs (I don't know what granularity you need) that would:
find upcoming events in 1 day, 1 week, 1 month, 3 months and users that should get notified about them.
create the notifications and send them
I would probably implement that using separate jobs (sending notifications is a different concern) and queue the results of the first part; this will give you more flexibility. The first part could be done by a single job scheduled with different time-frame parameters (1 day, 1 week, 1 month, 3 months).
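For example, the "find and queue" part might look roughly like this; `findEventsStartingOn`, `findRegisteredUsers` and `enqueueNotification` are placeholders for your own persistence and messaging.

```java
import java.time.LocalDate;
import java.util.List;

import org.quartz.Job;
import org.quartz.JobExecutionContext;

// Daily Quartz job: for each reminder offset, find the matching events and
// queue a notification per registered user; a separate job/consumer sends them.
public class QueueEventRemindersJob implements Job {

  private static final int[] OFFSET_DAYS = {1, 7, 30, 90}; // 1 day, 1 week, ~1 month, ~3 months

  @Override
  public void execute(JobExecutionContext context) {
    LocalDate today = LocalDate.now();
    for (int offset : OFFSET_DAYS) {
      for (String eventId : findEventsStartingOn(today.plusDays(offset))) {
        for (String userId : findRegisteredUsers(eventId)) {
          enqueueNotification(userId, eventId, offset); // picked up later by the sending job
        }
      }
    }
  }

  private List<String> findEventsStartingOn(LocalDate date) { return List.of(); }    // stub
  private List<String> findRegisteredUsers(String eventId) { return List.of(); }     // stub
  private void enqueueNotification(String userId, String eventId, int daysBefore) {} // stub
}
```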
Tabling the question about how to schedule the notifications once they're identified, I'd recommend looping over upcoming events, instead of over all users. It seems very likely that you'll have many more users than events (especially if you limit your scan to events that happen exactly 1 week, 1 month and 3 months in the future).
As for the notifications, I think marking notifications to be sent first and then processing all the marked notifications will allow for more optimization than sending them out as part of your scan. If you have a queue of notifications to be sent out, you can then send each affected user a single email covering multiple events at the same time.
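Batching the queued notifications into one email per user could then be a simple group-by before sending; `Notification` and `sendDigestEmail` here are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class NotificationDigests {

  record Notification(String userId, String eventName) {}

  // Groups the pending notifications by user so each user gets a single
  // email listing all of their upcoming events, instead of one email per event.
  public void sendDigests(List<Notification> pending) {
    Map<String, List<Notification>> byUser =
        pending.stream().collect(Collectors.groupingBy(Notification::userId));

    byUser.forEach(this::sendDigestEmail);
  }

  private void sendDigestEmail(String userId, List<Notification> notifications) {
    // stub: render one email covering all events and hand it to the mailer
  }
}
```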
