Sending message to least utilised machine in Akka cluster - java

I am new to Akka and want to achieve the following.
I want to deploy a few stateful actors on fixed machines (which will always be on) and stateless actors (processing actors/workers) on Amazon EC2 Spot instances.
To handle failover of the stateful actors, I have decided to use Akka Persistence.
To distribute jobs to the stateless workers, I have decided to use RoundRobinPool with remotely deployed routees. I also want messages to be passed to the least utilized machine (CPU and memory). I am using a Pool so that I can use withSupervisorStrategy() to handle actor failure.
I am going through the example for remotely deployed routees and referring to this code: http://www.typesafe.com/activator/template/akka-sample-cluster-java and https://github.com/akka/akka/blob/cb05725c1ec8a09e9bfd57dd093911dd41c7b288/akka-samples/akka-sample-cluster-java/src/main/java/sample/cluster/stats/StatsSampleOneMasterMain.java.
In StatsSampleClient, a node is picked at random and the message is passed to it. I want to pass it to the least utilized machine instead, as mentioned above. I want to know whether Akka supports this or whether I will have to write code to find out the utilization and send the message to that machine accordingly.
Kindly suggest a better approach if there is one for what I have described above.
Thanks!
-Devendra

Did you have a look at the Adaptive Load Balancing Router?
It load-balances messages across cluster nodes based on the cluster metrics data, and it is set up through configuration.
Hope it helps.
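To make the idea concrete: the Akka documentation describes the adaptive router as picking routees at random, with probabilities derived from each node's remaining capacity. The plain-Java sketch below illustrates that selection rule only. It is not Akka's API; the class CapacityWeightedSelector, its methods, and the utilisation map (0.0 = idle, 1.0 = saturated) are hypothetical names made up for this illustration.

```java
import java.util.Map;
import java.util.Random;

// Hypothetical illustration of capacity-weighted selection; not Akka's API.
class CapacityWeightedSelector {
    private final Random random;

    CapacityWeightedSelector(Random random) {
        this.random = random;
    }

    // Remaining capacity is 1 - utilisation, clamped to [0, 1].
    static double capacity(double utilisation) {
        return Math.max(0.0, Math.min(1.0, 1.0 - utilisation));
    }

    // Picks a node with probability proportional to its remaining capacity.
    String select(Map<String, Double> utilisationByNode) {
        if (utilisationByNode.isEmpty()) {
            throw new IllegalArgumentException("no nodes to choose from");
        }
        double total = 0.0;
        for (double u : utilisationByNode.values()) {
            total += capacity(u);
        }
        if (total == 0.0) {
            // Every node is saturated: fall back to the first node in iteration order.
            return utilisationByNode.keySet().iterator().next();
        }
        double point = random.nextDouble() * total;
        String last = null;
        for (Map.Entry<String, Double> e : utilisationByNode.entrySet()) {
            last = e.getKey();
            point -= capacity(e.getValue());
            if (point < 0.0) {
                return last;
            }
        }
        return last; // only reached if floating-point rounding leaves point at 0
    }
}
```

With this rule a saturated node receives no new messages while lightly loaded nodes receive most of them, which is closer to "least utilized" than round-robin or purely random selection.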

Related

Akka clustering - one Manager Actor per node

I’m working on an application that often queries a very large number of actors and hence sends and receives a very large number of messages. When the application is run on a single machine this is not an issue, because the messages are sent within the boundaries of a single JVM, which is quite fast. However, when I run the application on multiple nodes (using akka-cluster), each node hosts part of these actors and the messages go over the network, which becomes extremely slow.
One solution that I came up with is to have a ManagerActor on each node where the application is run. This will greatly reduce the number of messages exchanged: instead of sending thousands of messages to each of the actors, if we run the application on 3 nodes we send 3 messages, one for each ManagerActor, which then sends messages within its own JVM to the other (thousands of) actors, which is very fast. However, I’m fairly new to Akka and I’m not sure that such a solution makes sense. Do you see any drawbacks to it? Are there any other options that are better or more native to Akka?
You could use Akka's Distributed Publish-Subscribe to achieve that. That way you simply start a manager actor on each node the usual way, have them subscribe to a topic, and then publish messages to them using that topic. There is a simple example of this in the docs linked above.
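The message-count argument can be sketched in plain Java. This is an in-memory illustration only, not Akka's Distributed Publish-Subscribe API: NodeManager and InMemoryTopic are hypothetical stand-ins for the per-node manager actor and the topic, and a Consumer<String> stands in for a local worker actor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the per-node manager actor; not Akka's API.
class NodeManager {
    private final List<Consumer<String>> localWorkers = new ArrayList<>();

    void register(Consumer<String> worker) {
        localWorkers.add(worker);
    }

    // One incoming message is forwarded to every worker on this node (in-JVM calls).
    void onMessage(String message) {
        for (Consumer<String> worker : localWorkers) {
            worker.accept(message);
        }
    }
}

// Hypothetical in-memory topic that the managers subscribe to.
class InMemoryTopic {
    private final List<NodeManager> subscribers = new ArrayList<>();

    void subscribe(NodeManager manager) {
        subscribers.add(manager);
    }

    // Delivers the message once per subscribed manager and returns that count,
    // i.e. the number of messages that would cross node boundaries.
    int publish(String message) {
        for (NodeManager manager : subscribers) {
            manager.onMessage(message);
        }
        return subscribers.size();
    }
}
```

With 3 managers that each own 1000 workers, one publish reaches 3000 workers while only 3 messages would travel between nodes.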

Adding new node to a scalable system with zero downtime

I am working as a developer on a batch processing solution. It works by splitting a big file and processing it across JVMs. We have 4 processor JVMs, each of which takes a chunk of the file and processes it, and 1 gateway JVM. The job of the gateway JVM is to split the file into as many chunks as there are processor JVMs (i.e. 4) and send a REST request that is consumed by the processor JVMs. The REST request has all the details: the location the file has to be picked up from, and some other details.
Now, if I want to add another processor JVM without any downtime, is there any way we can do it? Currently we maintain the URLs for the 4 JVMs in a property file. Is there a better way to do it, one that gives me the ability to add more JVMs without restarting any component?
You can consider setting up a load balancer and putting your JVM(s) behind it. The load balancer would be responsible for distributing the incoming requests to the JVMs.
This way you can scale your JVMs up or down depending on the workload. Also, if one of the JVMs is not working, the other parts of your system need not care about it anymore.
I am not sure what your use case is or which tech stack you are following, but it seems that you need a distributed system with auto-scaling and dynamic provisioning capabilities. Have you considered Hadoop or Spark clusters, or Akka?
If you cannot use any of these, then the solution is to maintain the list of JVMs in some datastore (let's say in a table). It is dynamic data, meaning one can add, remove, or update JVMs. Then you need a resource manager that can decide whether to spin up a new JVM based on load or any other conditional logic. This resource manager needs to monitor the entire system. Also, whenever you create a task, chunk, or slice of data, distribute it using message queues such as ApacheMQ, ActiveMQ. You can also consider Kafka for complex use cases. Nowadays, application servers such as WebSphere (Liberty profile) and WebLogic also provide auto-scaling capability. So, if you are already using such an application server, you can think of making use of that capability. I hope this helps.
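As a rough sketch of the "dynamic list of JVMs" idea: if the gateway reads the current processor list each time it splits a file, a new processor can be registered at runtime and is picked up by the next split without a restart. Everything below is hypothetical and in-memory (ProcessorRegistry, ChunkPlanner, the example URLs); in a real deployment the registry would be backed by the datastore mentioned above, and the ranges are record indices rather than bytes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical in-memory registry; a real one would sit on top of a shared datastore.
class ProcessorRegistry {
    private final CopyOnWriteArrayList<String> urls = new CopyOnWriteArrayList<>();

    void add(String url) {
        urls.addIfAbsent(url);
    }

    void remove(String url) {
        urls.remove(url);
    }

    // Copy of the list as it is right now; later changes do not affect it.
    List<String> snapshot() {
        return new ArrayList<>(urls);
    }
}

// Hypothetical planner that assigns one contiguous record range per processor.
class ChunkPlanner {
    // Returns processor URL -> {startInclusive, endExclusive}.
    static Map<String, long[]> plan(long recordCount, List<String> processors) {
        int n = processors.size();
        if (n == 0) {
            throw new IllegalStateException("no processors registered");
        }
        Map<String, long[]> plan = new LinkedHashMap<>();
        long base = recordCount / n;
        long extra = recordCount % n; // the first `extra` processors get one more record
        long start = 0;
        for (int i = 0; i < n; i++) {
            long length = base + (i < extra ? 1 : 0);
            plan.put(processors.get(i), new long[] {start, start + length});
            start += length;
        }
        return plan;
    }
}
```

The gateway would call registry.snapshot() once per incoming file, so the number of chunks always follows the number of processors registered at that moment.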

How to "link" distributed Akka actor systems?

I see that Akka Actor Systems can be distributed across multiple JVMs that might not even be running on the same piece of hardware. If I understand this correctly, then it seems that you could have a distributed Actor system where 1 group of actors is on myapp01, another group is on myapp02 (say, 2 vSphere VMs running on your local data center), and yet a 3rd group of actors running on AWS. So first, if anything about what I just said isn't true/accurate, please start by correcting me!
If everything I've stated up until this point is more or less accurate, then I'm wondering how to actually "glue" all these distributed actors "groups" (not sure what the right term is: JVM, Actor System, Actor Pool, Actor Cluster, etc.) together such that work can be farmed out to any of them, and a FizzActor living on the AWS node can then send a message to a BuzzActor living on myapp02, etc.
For instance, sticking with the example above (2 vSphere VMs and an AWS machine) how could I deploy an actor group/system/pool/cluster to each of these such that they all know about each other and distribute the work between them?
My guess is that Akka allows you to configure the hosts/ports of all the different "nodes" in the Actor System;
My next guess is that this configuration is limited in the sense that you have to update each node's configuration every time you add/remove/modify another node in the cluster (otherwise how could the Akka nodes "know" about a new one, or "know" that we just shut down the AWS machine?);
My final guess is that this limitation can be averted by bringing something like Apache ZooKeeper into the mix, somehow treating each node as a separate peer in the distributed system, and then using ZooKeeper to coordinate/connect/link/load balance between all the peers/nodes.
Am I on track or way off base?

Looking for simple persistent message buffer in Java

I am looking for a simple persistent buffer as temporary storage for JSON messages in a Java application. Memory usage should be relatively constant and not depend on the number of messages in the buffer. It would be nice to be able to replay messages from a point in the past. Deletion of old messages should be efficient. It needs to be able to handle 1m messages/h.
Currently my application uses a local RabbitMQ broker which shovels messages to a remote RabbitMQ broker. When the remote broker is down or not accepting messages the local RabbitMQ broker's memory usage rises with the queue length and eventually it stops accepting messages. I want to swap this out for a local disk based buffer and a thread copying messages to the remote RabbitMQ broker.
Anyone have any ideas? I have looked at Kafka but it seems like overkill for my use-case. MongoDB is a possibility but I am worried about its memory usage.
Memory usage is always an issue in any system. I am using MongoDB in production, and when I compare it with similar solutions (CouchDB, CouchBase, redis.io), MongoDB is really good in memory management and ease of implementation. But I should admit that I never had a chance to test Riak in more detail.
I am storing 5,000,000 user records with 4 index fields, plus all user sessions, behind a REST/web service API that uses a messaging service behind it.
My messaging service uses another DB instance on the same server.
My user records have at least 20 fields and session records have just 5 fields.
My Ubuntu servers have never used more than 10 GB of RAM, even under heavy load.
I hope this helps you figure it out.
PS: it all depends on the data model and how you implement your infrastructure.
Regards,
EDIT:
I think this is a good slideshow about using MongoDB for messaging.
and a nice article about MongoDB and messaging.
You can use the test code and see whether the results are OK for your solution.
Please don't forget to share your results if you test.
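For comparison, the local disk-based buffer described in the question can be sketched in plain Java as an append-only file of length-prefixed records that is replayed from a byte offset. This is a minimal sketch under stated assumptions, not a production design: FileMessageLog is a hypothetical class, it does not force writes to disk, and it does not delete old messages (rolling over to a new file at intervals and deleting whole files would be one way to make deletion cheap).

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical append-only message log; memory use does not grow with the number of messages.
class FileMessageLog implements AutoCloseable {
    private final RandomAccessFile file;

    FileMessageLog(String path) {
        try {
            this.file = new RandomAccessFile(path, "rw");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Appends one message as [4-byte length][UTF-8 bytes]; returns the offset where it starts.
    synchronized long append(String json) {
        try {
            byte[] body = json.getBytes(StandardCharsets.UTF_8);
            long offset = file.length();
            file.seek(offset);
            file.writeInt(body.length);
            file.write(body);
            return offset;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Streams every message stored at or after the given offset to the handler, one at a time.
    synchronized void replayFrom(long offset, Consumer<String> handler) {
        try {
            file.seek(offset);
            while (file.getFilePointer() < file.length()) {
                byte[] body = new byte[file.readInt()];
                file.readFully(body);
                handler.accept(new String(body, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public synchronized void close() {
        try {
            file.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A forwarding thread could remember the offset of the last message the remote broker accepted and call replayFrom with it after an outage; the offset must be one previously returned by append.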

Module clustering and JMS

I have a module which runs standalone in a JVM (no containers) and communicates with other modules via JMS.
My module is both a producer in one queue and a consumer in a different queue.
I now need to cluster this module, both for HA reasons and for workload reasons, and I'm probably going to go with Terracotta+Hibernate for clustering my entities.
Currently, when my app starts, it launches a thread (via Executors.newSingleThreadExecutor()) which serves as the consumer (I can attach an actual code sample if relevant and necessary).
What I understood from reading questions here is that if I just start up my module on N different JVMs then N different subscribers will be created and each message in the queue will arrive to N subscribers.
What I'd like to do is have only one of them (let's currently say that which one is not important) process that message and so in actuality enable me to process N messages at a time.
How can/should this be done? Am I way off the track?
BTW, I'm using OpenMQ as my implementation but I don't know if that's relevant.
Thanks for any help
This is a classic case of message handling in a clustered environment. This is what I would do.
Use a broadcast message (channel based) in place of a queue. A queue, being meant for point-to-point communication, is not very effective here. Set the validity of the message so that it lasts only until it is consumed by one of the consumers. This way, the other consumers won't even see the message and only one consumer will consume it.
Take a look at JGroups. You may consider implementing your module/subscribers to use JGroups for the kind of synchronization you need. JGroups provides reliable multicast communication.
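The behaviour asked for in the question (each message handled by exactly one of N consumers, so that N messages can be in progress at once) is often called the competing-consumers pattern. The sketch below shows it in plain Java with an in-process BlockingQueue standing in for the shared queue. It uses no JMS or JGroups API, and CompetingConsumers and drain are hypothetical names chosen for this illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration: N consumer threads compete for messages on one shared queue.
class CompetingConsumers {
    // Returns message -> number of times it was handled (expected to be 1 for every message).
    static Map<String, Integer> drain(List<String> messages, int consumers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(messages);
        Map<String, Integer> handledCount = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int i = 0; i < consumers; i++) {
            pool.execute(() -> {
                String message;
                // poll() removes the head atomically, so no two consumers get the same message.
                while ((message = queue.poll()) != null) {
                    handledCount.merge(message, 1, Integer::sum);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("consumers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handledCount;
    }
}
```

Whatever messaging mechanism is chosen, the property to check is the one this sketch exercises: every message is handled once, by whichever consumer takes it first.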
