I'm trying to figure out the best way to trigger all actors of a certain type in a sharded cluster, based on a time schedule (e.g. at 8am, 9am, etc - in cron-like fashion).
My plan was to have a single "timer" actor in the cluster that would send out a broadcast message to the other actors in the cluster on schedule. However, I'm not sure whether this is workable or optimal. The Akka Scheduler doesn't provide cron-like configuration, and akka-quartz-scheduler doesn't seem to be suited to an Akka cluster.
Is it possible at all to trigger schedule-based actions from inside a sharded Akka cluster, perhaps using some other framework's capabilities such as Spring scheduling? Or is it better to deploy a scheduling service outside of the sharded Akka cluster, and use it to send periodic triggering events to the cluster?
Also, is it possible to broadcast a message to all actors of a certain type in a sharded Akka cluster?
Is it possible at all to trigger schedule-based actions from inside sharded Akka cluster?
In general, yes, but it depends on the required precision of the scheduled events and the type of schedule. Let's assume a simple timetable schedule, e.g. every day at 08:00, with no need for high precision. You can create an actor that uses a TimerScheduler; let's name this actor TimerSchedulerActor. Given a specific schedule, e.g. every day at 08:00, TimerSchedulerActor calculates the next time the alarm needs to go off, e.g. 01.09.2020 08:00, and then computes the duration it needs to wait from the current time (java.lang.System.currentTimeMillis) until 01.09.2020 08:00. When the timer goes off, TimerSchedulerActor sends the message and calculates the next alarm timestamp for this schedule.
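The "next alarm" calculation described above can be sketched in plain Java with java.time. This is only an illustration: the names NextAlarm and delayUntilNext are made up for this example, and the resulting Duration would still have to be handed to whatever timer you use (for instance a single timer started on the actor's TimerScheduler).

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class NextAlarm {

    /**
     * Returns how long to wait from `now` until the next occurrence of
     * `alarmTime` (in the time zone of `now`). If `now` is exactly the
     * alarm time, the next occurrence is taken to be tomorrow.
     */
    public static Duration delayUntilNext(ZonedDateTime now, LocalTime alarmTime) {
        // Same calendar day, with the time-of-day replaced by the alarm time.
        ZonedDateTime next = now.with(alarmTime);
        if (!next.isAfter(now)) {
            // Today's alarm is already due or past: schedule tomorrow's.
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    public static void main(String[] args) {
        ZonedDateTime now = ZonedDateTime.now(ZoneId.systemDefault());
        Duration delay = delayUntilNext(now, LocalTime.of(8, 0));
        System.out.println("Next 08:00 alarm in " + delay);
    }
}
```

After each alarm fires, the actor would call the same method again with the current time to obtain the delay for the following day.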
If one TimerSchedulerActor is responsible for all messages, you must make sure that only one TimerSchedulerActor is running (a singleton actor), as multiple TimerSchedulerActors would send multiple messages for each scheduled event. You could also have different TimerSchedulerActors notify different groups of actors or be responsible for different events.
Or it's better to deploy a scheduling service outside of sharded Akka cluster, and use it to send periodic events to Akka cluster?
A scheduling service inside the Akka cluster could be easier to maintain, deploy, and debug. However, the answer depends on the author's skills and experience. Someone familiar with a scheduling system outside Akka (e.g. cron) may find a faster solution by thinking outside Akka. I would advocate for a solution inside the Akka cluster, as it offers more flexibility. For example, if you later need to change the schedule in cron (or another external system) from inside the Java cluster, the complexity increases.
Also, is it possible to broadcast a message to all actors of certain type in sharded Akka cluster?
It depends on what the actor type is. The Cluster Receptionist can be used to look up specific actors. If you control the message format and can include the desired actor type in the messages, and the actors know their own type, a good strategy could be to send the message to all potential actors and let them decide based on their type: actors of the correct type act on the message, and actors of any other type simply ignore it.
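A minimal plain-Java sketch of that "send to everyone, filter on type" idea. It deliberately uses no Akka API: Receiver stands in for an actor, and Trigger, targetType, and broadcast are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TypedBroadcast {

    /** Hypothetical broadcast message that names the receiver type it is meant for. */
    public static final class Trigger {
        public final String targetType;
        public final String payload;

        public Trigger(String targetType, String payload) {
            this.targetType = targetType;
            this.payload = payload;
        }
    }

    /** Stand-in for an actor: it only handles triggers addressed to its own type. */
    public static final class Receiver {
        public final String type;
        public final List<String> handled = new ArrayList<>();

        public Receiver(String type) {
            this.type = type;
        }

        public void onMessage(Trigger msg) {
            if (!type.equals(msg.targetType)) {
                return; // addressed to another type: ignore
            }
            handled.add(msg.payload);
        }
    }

    /** Delivers the same message to every receiver, regardless of type. */
    public static void broadcast(List<Receiver> all, Trigger msg) {
        for (Receiver r : all) {
            r.onMessage(msg);
        }
    }

    public static void main(String[] args) {
        Receiver report = new Receiver("report");
        Receiver billing = new Receiver("billing");
        broadcast(Arrays.asList(report, billing), new Trigger("report", "08:00 tick"));
        System.out.println("report handled: " + report.handled);
        System.out.println("billing handled: " + billing.handled);
    }
}
```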
You may either:
Expose an HTTP endpoint that triggers the tasks, and let Airflow or simply a cron job hit the endpoint
Use Quartz, either akka-quartz-scheduler or raw Quartz. I think it fits an Akka cluster without a problem. What do you think is the blocker? A viable practice may be to run a cluster singleton actor inside which you run the schedule-based actions with Quartz.
We are now in the process of refactoring our messaging application written in Vert.x. The application processes incoming messages from users. Initially, it was implemented so that there is a single verticle instance that listens to a single queue in the event bus and processes all the incoming messages.
What we are thinking of doing is refactoring it so that it works somewhat like the actor model: we deploy a verticle instance for each active user and make it listen to a user-specific queue. This way the verticle instance can maintain user-specific state, and parallelizing the message processing becomes much easier.
The issue, however, is that this would lead to a huge number of deployed verticles (30k - 50k in parallel) and a huge number of queues in the event bus. We would also need to manage the verticles manually (undeploy unused verticles and deploy new ones when a message arrives from a new user).
The question is: is this actor-style architecture a good fit for Vert.x, and can it handle a large number of deployed verticles and event bus queues at the same time?
There's one major correction to be made here: the EventBus is a single queue. So you won't have a "huge number of queues". There will be only one. You'll have a huge number of addresses on a single queue.
But is this number so huge? Well, can a HashMap of 50K elements be considered huge? Probably not, at least in terms of keys. Note that this applies only to Vert.x in non-clustered mode. Clustered Vert.x is different (it should still work, though).
Now having those verticles is another matter. Each verticle is a separate object, and if you plan to store some data in it, it will be even larger. But if you can afford machines with some decent RAM (16GB+), it should work just fine.
What does concern me in this solution, though, is that you plan to deploy verticles on demand and then undeploy them. That incurs delays, so your users will experience degraded performance for the first message they send.
What you call "actor-style" does not mean that you have to create a new verticle instance per user. If you do so, you are going to get a system with 98% redundancy.
It's absolutely enough to register an event-bus address for each user and use some sort of persistent storage to keep track of them. Such storage can be any DB for long-term persistence, a cluster-wide SharedMap for short-term, or a combination of both.
Perhaps you don't even need an address-per-user scheme. Such a scheme is nice when the users are constantly connected to your system via some sort of EventBusBridge. If this is not the case, you can register a single event-bus address for all users and process messages based on their payload.
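To illustrate the last option, here is a minimal plain-Java sketch of a single handler that serves all users and looks up per-user state from an id carried in the payload. It does not use the Vert.x API: SingleAddressDispatcher, UserMessage, and the per-user message counter are assumptions made for this example, and the plain HashMap assumes the handler is only ever called from one thread.

```java
import java.util.HashMap;
import java.util.Map;

public class SingleAddressDispatcher {

    /** Hypothetical payload: the user id travels inside the message itself. */
    public static final class UserMessage {
        public final String userId;
        public final String text;

        public UserMessage(String userId, String text) {
            this.userId = userId;
            this.text = text;
        }
    }

    // Per-user state kept in one map instead of one verticle instance per user.
    private final Map<String, Integer> messageCountByUser = new HashMap<>();

    /**
     * Single entry point for all users. Returns how many messages
     * have been seen so far for the sender of `msg`.
     */
    public int handle(UserMessage msg) {
        return messageCountByUser.merge(msg.userId, 1, Integer::sum);
    }

    public static void main(String[] args) {
        SingleAddressDispatcher dispatcher = new SingleAddressDispatcher();
        dispatcher.handle(new UserMessage("alice", "hello"));
        int count = dispatcher.handle(new UserMessage("alice", "hello again"));
        System.out.println("messages from alice: " + count);
    }
}
```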
I’m working on an application that often queries a very large number of actors and hence sends / receives a very large number of messages. When the application is run on a single machine this is not an issue, because the messages are sent within the boundaries of a single JVM, which is quite fast. However, when I run the application on multiple nodes (using akka-cluster), each node hosts part of these actors and the messages go over the network, which becomes extremely slow.
One solution that I came up with is to have a ManagerActor on each node where the application is run. This will greatly reduce the number of messages exchanged (i.e. instead of sending thousands of messages to each of the actors, if we run the application on 3 nodes we send 3 messages, one for each ManagerActor, which then sends messages within its own JVM to the other (thousands of) actors, which is very fast). However, I’m fairly new to Akka and I’m not quite sure that such a solution makes sense. Do you see any drawbacks to it? Are there any other options which are better / more native to Akka?
You could use Akka's Distributed Publish-Subscribe to achieve that. That way you simply start a manager actor on each node the usual way, have them subscribe to a topic, and then publish messages to them using that topic. There is a simple example of this in the Akka documentation for Distributed Publish-Subscribe.
I am new to Akka and want to achieve the following:
I want to deploy a few stateful actors on fixed machines (which will always be on) and stateless actors (processing actors/workers) on Amazon EC2 Spot instances.
To handle failover of the stateful actors, I have decided to use Akka persistence.
To distribute jobs to the stateless workers, I have decided to use a RoundRobinPool with remotely deployed routees, and I want messages to be passed to the least utilized machine (CPU and memory). I am using a Pool so that I can use withSupervisorStrategy() to handle actor failure.
I am going through the example for remotely deployed routees and referring to this code: http://www.typesafe.com/activator/template/akka-sample-cluster-java and https://github.com/akka/akka/blob/cb05725c1ec8a09e9bfd57dd093911dd41c7b288/akka-samples/akka-sample-cluster-java/src/main/java/sample/cluster/stats/StatsSampleOneMasterMain.java.
StatsSampleClient picks a node at random and passes the message to it. I want to pass it to the least utilized machine, as mentioned above. I want to know whether Akka supports this or whether I will have to write code to find out the utilization and send the message to that machine accordingly.
Kindly suggest a better approach, if there is one, for what I have described above.
Thanks!
-Devendra
Did you have a look at the Adaptive Load Balancing Router?
It load-balances messages across cluster nodes based on cluster metrics data, and it is set up through configuration.
Hope it helps.
I have a module which runs standalone in a JVM (no containers) and communicates with other modules via JMS.
My module is both a producer in one queue and a consumer in a different queue.
I now need to cluster this module, both for HA reasons and for workload reasons, and I'm probably going to go with Terracotta+Hibernate for clustering my entities.
Currently, when my app starts, it launches a thread (via Executors.newSingleThreadExecutor()) which serves as the consumer (I can attach an actual code sample if relevant and necessary).
What I understood from reading questions here is that if I just start up my module on N different JVMs then N different subscribers will be created and each message in the queue will arrive to N subscribers.
What I'd like to do is have only one of them (let's currently say that which one is not important) process that message and so in actuality enable me to process N messages at a time.
How can/should this be done? Am I way off the track?
BTW, I'm using OpenMQ as my implementation but I don't know if that's relevant.
Thanks for any help
A classic case of message handling in clustered environment. This is what I would do.
Use a broadcast message (channel-based) in place of a queue. A queue, being meant for point-to-point communication, is not very effective here. Set the validity of the message so that it lasts only until it is consumed by one of the consumers. This way, the other consumers won't even see the message, and only one consumer will consume it.
Take a look at JGroups. You may consider implementing your module/subscribers using JGroups for the kind of synchronization you need. JGroups provides reliable multicast communication.
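Whatever the transport, the goal stated in the question is that each message is handled by exactly one of N consumers. As a purely in-process illustration of that behaviour (not JMS, OpenMQ, or JGroups code), here is a sketch using a java.util.concurrent queue; CompetingConsumers and run are hypothetical names, and the messages are assumed to be unique strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class CompetingConsumers {

    /**
     * Starts `consumers` worker threads that all drain the same queue and
     * returns, for every message, how many times it was processed.
     * Messages are assumed to be unique.
     */
    public static Map<String, Integer> run(List<String> messages, int consumers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(messages);
        Map<String, Integer> timesProcessed = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);

        for (int i = 0; i < consumers; i++) {
            pool.submit(() -> {
                String msg;
                // poll() removes the head atomically, so a message can only
                // ever be taken by one of the competing consumers.
                while ((msg = queue.poll()) != null) {
                    timesProcessed.merge(msg, 1, Integer::sum);
                }
            });
        }

        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for consumers", e);
        }
        return timesProcessed;
    }

    public static void main(String[] args) {
        Map<String, Integer> result = run(Arrays.asList("m1", "m2", "m3", "m4"), 3);
        System.out.println(result);
    }
}
```

With three consumers, up to three messages are in flight at a time, which is the "process N messages at a time" effect asked for.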
I'm looking for an existing system to replace our current slow and complicated self-written job management mechanism.
The existing system:
1 MySQL DB with a long massive table of jobs - the queue
Multiple servers (written in java) all extracting jobs from the queue and processing them
a job might NOT be deleted from the queue after processing it, to rerun it later
a job might create other jobs and insert them to the queue
The limitations:
As more and more jobs are created and inserted into the queue, it takes longer to extract jobs from it (the jobs are chosen by priority and type), which creates a bottleneck.
I'm looking for an existing system that can replace this one and improve its performance.
Any suggestions?
Thanks
I don't generally recommend JMS, but it sounds like it really is what you need here. Distributed, transactional, persistent job queue management is what JMS is all about.
Popular open-source implementations include HornetQ and ActiveMQ.
You could:
submit your jobs to Amazon's Simple Queue Service (maybe JAXB marshalled)
dynamically start some EC2 instances according to your queue's length and probably
submit the results (or availability notice for some files on S3) to Simple Notification Service (again JAXB marshalled).
That's exactly what we do, using EC2 Spot instances to minimize costs. And that's what I call serious cloud computing ;)