I am trying to figure out how Akka works and what the best practices are for using the Actor Model.
I have a couple of questions regarding this.
Questions:
What are the deciding factors to keep in mind when configuring the total number of actors and threads for the scenarios mentioned below?
Scenarios:
a. Only tell is invoked on the actor (fire and forget).
b. ask is invoked (Futures and Promises).
What are the advantages/disadvantages of using a router, e.g. RoundRobinRouter(X), over manual actor creation?
How does the dispatcher orchestrate mailboxes, actors, and threads for message processing?
Futures and Promises can be used independently of actors and routers. The Alvin Alexander link below also does a great job comparing Futures/Promises to threads (which are the same as in Java).
The type of routing you should use will depend on your specific application's needs. In general, choose a routing technique that mirrors the real-world problem you are trying to solve: e.g., is it more like a mailbox, a bus/broadcast, or a round-robin?
If you don't use the built-in routers offered by Akka, you might be tempted to write your own. However, it may be hard to improve on the Akka library: in the akka.io docs below, they explain that some of the routing work is delegated to the actors by the library, to deal with the fact that the router is single-threaded.
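To make the round-robin idea concrete, here is a toy sketch in plain Java. It is illustrative only and is not Akka's actual router implementation; the class and method names are invented for this example.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Toy round-robin router: distributes work over a fixed list of routees.
// Not Akka's implementation -- just the core selection logic.
public class RoundRobinRouter<T> {
    private final List<T> routees;
    private final AtomicLong counter = new AtomicLong();

    public RoundRobinRouter(List<T> routees) {
        this.routees = List.copyOf(routees);
    }

    // Pick the next routee. The atomic counter keeps selection thread-safe
    // even though the router itself holds no lock.
    public T route() {
        int index = (int) (counter.getAndIncrement() % routees.size());
        return routees.get(index);
    }

    public static void main(String[] args) {
        RoundRobinRouter<String> router =
            new RoundRobinRouter<>(List.of("worker-1", "worker-2", "worker-3"));
        for (int i = 0; i < 6; i++) {
            System.out.println(router.route()); // cycles worker-1, worker-2, worker-3, ...
        }
    }
}
```

A real router also has to deal with routee failure and dynamic resizing, which is part of why reusing the library version is usually the better choice.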
A typical computer will let you launch thousands of threads or actors if you have several gigabytes of RAM. However at any one moment, the number of threads actually running won't be more than the number of cores you have in your CPU.
Here are some articles that might help you decide which techniques to use and how many threads and actors are appropriate:
http://doc.akka.io/docs/akka/2.4.10/scala/routing.html
Akka messaging mechanisms by example
How many actors can be launched in scala?
http://alvinalexander.com/scala/differences-java-thread-vs-scala-future
How many threads can run on a CPU at a time
Related
In a project of mine, I decided to use Vert.x for the HTTP APIs, given its proven performance record. Then, because the application uses event queues internally, I started wondering if I should use the Vert.x event bus and verticles instead of my usual ArrayBlockingQueue. I am still quite new to Vert.x, so I don't know how suitable it would be. I have experience with Akka and actors, and those would fit the bill very well, but I'm not sure whether the Vert.x event bus is designed to scale to 100k events per second.
I have worked with Vert.x since version 3 and have done some projects with it (it has been my main stack for a couple of years). I have never run into a situation where the event bus was the limiting factor. The event bus is designed to handle such a volume of events and much more. As @injecteer mentioned, the limiting factor is basically the hardware, and how many events can be processed depends on what you do with them and how you scale your code.
Vert.x consistently follows a non-blocking programming model, and you should follow it as well ... never block. Vert.x favors loose coupling, achieved by partitioning code into "verticles" (https://vertx.io/docs/vertx-core/java/#_verticles). You can deploy/start multiple instances of those verticles (your pieces of code). A further core concept is event loop threads (by default, number of cores * 2).
Each deployed verticle instance runs on a specific event loop thread, and ALL registered handlers (event bus, HTTP server, etc.) are called on this specific event loop thread at any time. This way you are able to scale your code in a "per thread" fashion, according to your needs. Events on the event bus are distributed round-robin between the verticle instances (and the handlers within the verticles) ... by the way, handlers of HTTP requests are also distributed round-robin.
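The round-robin delivery described above can be sketched with plain JDK classes. This toy event bus is illustrative only and is not Vert.x's API or implementation; a single-threaded executor stands in for one event loop thread.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Consumer;

// Toy event bus with round-robin delivery to consumers on an address.
// Not Vert.x -- just a sketch of the delivery model.
public class TinyEventBus {
    private final Map<String, List<Consumer<Object>>> consumers = new ConcurrentHashMap<>();
    private final Map<String, Integer> next = new ConcurrentHashMap<>();
    // A single-threaded executor plays the role of one event loop thread:
    // every handler submitted here runs on that thread, never concurrently.
    private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();

    public void consumer(String address, Consumer<Object> handler) {
        consumers.computeIfAbsent(address, a -> new CopyOnWriteArrayList<>()).add(handler);
    }

    // Delivers to ONE consumer, chosen round-robin among those registered
    // on the address (analogous to point-to-point send).
    public void send(String address, Object message) {
        List<Consumer<Object>> handlers = consumers.getOrDefault(address, List.of());
        if (handlers.isEmpty()) return;
        int i = next.compute(address, (a, n) -> n == null ? 0 : (n + 1) % handlers.size());
        eventLoop.submit(() -> handlers.get(i).accept(message));
    }

    public void close() {
        eventLoop.shutdown();
        try {
            eventLoop.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        TinyEventBus bus = new TinyEventBus();
        bus.consumer("greetings", msg -> System.out.println("got: " + msg));
        bus.send("greetings", "hello");
        bus.close();
    }
}
```

With two consumers registered on the same address, messages alternate between them, which is the behavior described for verticle instances above.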
Clustered mode is a bit different. How you (de)serialize DTOs (JSON, Protobuf, etc.) can make a significant difference in terms of performance. A clustered event bus has TCP sockets between all nodes, meaning events are sent point-to-point. The cluster manager (Hazelcast is the default), on the other hand, decides which node an event should be sent to (round-robin at the cluster level), but events are NOT sent through the cluster manager. E.g., the cluster manager knows which node has consumers registered on the event bus (and at which address).
Since Vert.x 4 milestone 5, the cluster manager SPI provides an entry point where you can implement your own alternative to round-robin, e.g. load-based distribution.
There are some basic concepts like event loop threads, non-blocking programming, and verticles (which are not mandatory but recommended). Once those concepts are clear, you have a very flexible base for nearly any kind of application. I personally love it, and I have never seen any other framework/technology that comes close in performance (with proper scaling to fit the load).
I benchmarked the Vert.x event bus (using pure Vert.x for pub and sub) and found it maxes out at around 100K msg/s per CPU (using a high-end Xeon CPU). Interestingly, the performance was comparable to Vert.x's WebSockets implementation, so I agree it's not the bottleneck if you do:
WS -> Event Bus
But if you do 10 hops on the event bus, then it could be the bottleneck.
I observed the performance of the LMAX Disruptor to be much higher, but once you introduce I/O, the I/O becomes the bottleneck with the Disruptor. The problem with the Disruptor is that you can't use it with Netty.
From my understanding all libraries running in a single JVM would have comparable performance levels and are limited by your hardware and settings.
So, local event-bus would perform as good as any other local tech.
Things start getting interesting if you scale your system across different JVMs and/or different machines. This is where the Vert.x EB shines, as you don't have to change the code of your verticles!
You replace the local EB with a clustered one, which is a matter of adding dependencies and configuring the cluster; no original event-bus code has to be changed. The other way around also works just fine, if you want to squeeze several verticles into the same JVM.
Clustering the EB of course has its price, but its performance depends more on the underlying clustering technology, such as Hazelcast (the default) or Infinispan, than on Vert.x itself.
I am looking to optimize a microservice architecture that currently uses HTTP/REST for internal node-to-node communication.
One option is implementing backpressure capability into the services, e.g. by integrating something like Quasar into the stack. This would no doubt improve things, but I see a couple of challenges. One is that the async client threads are transient (in memory), so on client failure (crash) these retry threads will be lost. The second is that, in theory, if a target server is down for some time, the client could eventually hit OOM attempting retries, because threads are ultimately limited, even Quasar fibers.
I know it's a little paranoid, but I'm wondering if a queue-based alternative would be more advantageous at very large scale.
It would still work asynchronously like Quasar/fibers, except a) the queue is centrally managed and off the client JVM, and b) the queue can be durable, so that in the event client and or target servers go down, no in flight messages are lost.
The downside to a queue, of course, is that there are more hops and it slows down the system. But I'm thinking there is probably a sweet spot where Quasar's ROI peaks and a centralized, durable queue becomes more critical for scale and HA.
My question is:
Has this tradeoff been discussed? Are there any papers on using a centralized external queue/router approach for intraservice communication?
TL;DR: I just realized I could probably phrase this question as: "When is it appropriate to use message-bus-based intraservice communication, as opposed to direct HTTP, within a microservice architecture?"
I've seen three general protocol design patterns with microservices architectures, when running at scale:
Message bus architecture, using a central broker such as ActiveMQ or Apache Qpid.
"Resilient" HTTP, where some additional logic is built on HTTP to make it more resilient. Typical approaches here are Hystrix (Java), or SmartStack/Baker St (smart proxy).
Point-to-point asynchronous messaging using something like NSQ, ZMQ, or Qpid Proton.
By far the most common design pattern is #2, with a little bit of #1 mixed in when a queue is desirable.
In theory, #3 offers the best of both worlds (resiliency AND scale AND performance) but the technologies are all somewhat immature. It turns out that with #2 you can get really very far (e.g., Netflix uses Hystrix everywhere).
To answer your question directly, I'd say that #1 is very rarely used as an exclusive design pattern because it creates a single bottleneck for your entire system. #1 is common for a subset of the system. For most people, I'd recommend #2 today.
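To make #2 concrete, here is a minimal, hypothetical circuit-breaker sketch of the kind of logic Hystrix-style libraries layer on top of HTTP. The class and thresholds are invented for illustration; real implementations add timeouts, half-open probing, and metrics.

```java
import java.util.function.Supplier;

// Minimal circuit breaker: after `failureThreshold` consecutive failures,
// the circuit opens and calls fail fast to the fallback for `openMillis`.
public class CircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private int consecutiveFailures = 0;
    private long openedAt = -1;

    public CircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    public synchronized <T> T call(Supplier<T> request, Supplier<T> fallback) {
        if (openedAt >= 0 && System.currentTimeMillis() - openedAt < openMillis) {
            return fallback.get();              // circuit open: fail fast, don't call out
        }
        try {
            T result = request.get();
            consecutiveFailures = 0;            // success resets the breaker
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            if (++consecutiveFailures >= failureThreshold) {
                openedAt = System.currentTimeMillis();  // trip the breaker
            }
            return fallback.get();
        }
    }
}
```

The point is resilience without a central broker: a failing downstream service stops consuming the caller's threads, which is the property that makes "resilient HTTP" viable at scale.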
I was just comparing the performance of Scala actors vs. Java threads.
I was amazed by the difference: on my system I was able to spawn at most ~2,000 live threads at a time, but on the same system I was able to spawn ~500,000 Scala actors.
Both programs used around 81 MB of JVM heap memory.
Can you explain why Java threads are so much more heavyweight than Scala/Akka actors?
What is the key factor that makes Scala actors so lightweight?
If I want to achieve the best scalability, should I go for an actor-based web server instead of a traditional Java web/app server like JBoss or Tomcat?
Thanks.
Scala actors (including the Akka variety) use Java threads. There's no magic: more than a few thousand threads running simultaneously is a problem for most desktop machines.
The Actor model allows for awake-on-demand actors which do not occupy a thread unless they have work to do. Some problems can be modeled effectively as lots of sleeping agents waiting to get some work, who will do it relatively quickly and then go back to sleep. In that case, actors are a very efficient way to use Java threading to get your work done, especially if you have a library like Akka where performance has been a high priority.
The Akka docs explain the basics pretty well.
All reasonably scalable web servers have to solve this sort of problem one way or another; you probably ought not to base your web server decision primarily on whether actors are used under the hood, and regardless of what you use, you can always add actors yourself.
An Akka actor is not equivalent to a thread. It is more like a Callable that is executed on a threadpool.
When a message is dispatched to an actor, that actor is placed on a threadpool to process the message. When it is done, the pooled thread can be used to execute other actors.
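That "Callable on a thread pool" idea can be sketched in plain Java. This is a simplified illustration of why actors are cheap, not Akka's real dispatcher; the class name and structure are invented for the example.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Sketch of an actor: a mailbox plus a behavior. The actor occupies a pool
// thread only while draining its mailbox, so thousands of actors can share
// a handful of threads.
public class MiniActor<M> {
    private final Queue<M> mailbox = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean scheduled = new AtomicBoolean(false);
    private final ExecutorService dispatcher;
    private final Consumer<M> behavior;

    public MiniActor(ExecutorService dispatcher, Consumer<M> behavior) {
        this.dispatcher = dispatcher;
        this.behavior = behavior;
    }

    public void tell(M message) {
        mailbox.add(message);
        // Schedule the actor on the pool only if it isn't already running,
        // which also guarantees messages are processed one at a time.
        if (scheduled.compareAndSet(false, true)) {
            dispatcher.submit(this::drain);
        }
    }

    private void drain() {
        M m;
        while ((m = mailbox.poll()) != null) {
            behavior.accept(m);
        }
        scheduled.set(false);
        // Re-check: a message may have arrived between the last poll and unset.
        if (!mailbox.isEmpty() && scheduled.compareAndSet(false, true)) {
            dispatcher.submit(this::drain);
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        MiniActor<String> printer =
            new MiniActor<>(pool, msg -> System.out.println("processed: " + msg));
        printer.tell("hello");
        printer.tell("world");
        pool.shutdown();
    }
}
```

An idle MiniActor is just a small object with an empty queue, which is why half a million of them fit where only a few thousand threads would.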
I'm new to enterprise Java development, although I'm sure this question equally applies to any language or platform, such as .NET.
For the first time ever I'm now dealing with message queues, and I'm very intrigued by them (specifically, we're using ActiveMQ). My tech lead wants ActiveMQ queues to sit in front of all of our databases and internal web services; thus, instead of a database query being fired off from the client and going directly to the database, it gets queued up first.
My question is this: are queues the way to go with every major processing component? Do best practices dictate putting them in front of system components that usually get hit with large amounts of requests? Are there situations where queues should not be used?
Thanks for any insight here!
Here are some examples where a message queue might be useful.
Limited resources
Let's say you have a large number of users making requests to a service. If the service can only handle a small number of requests concurrently, then you might use a queue as a buffer.
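A minimal sketch of this buffering pattern using a JDK `ArrayBlockingQueue` (illustrative only; with a broker like ActiveMQ, the queue would sit between separate processes rather than threads):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// A bounded queue absorbs bursts of requests so a slow service can process
// them at its own pace; when the buffer fills, producers block (backpressure)
// instead of overloading the service.
public class BufferedService {
    // Push `requests` messages through the buffer; returns how many the
    // worker actually handled.
    public static int process(int requests) throws InterruptedException {
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(4); // small on purpose
        AtomicInteger handled = new AtomicInteger();

        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String request = buffer.take();   // waits when the queue is empty
                    if (request.equals("STOP")) return;
                    handled.incrementAndGet();        // ... handle the request here ...
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        // A burst of requests: put() blocks while the buffer is full,
        // throttling the producer.
        for (int i = 0; i < requests; i++) buffer.put("request-" + i);
        buffer.put("STOP");
        worker.join();
        return handled.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("handled " + process(10) + " requests");
    }
}
```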
Service decoupling
A key enterprise integration concept is the decoupling of systems, e.g. in a workflow. Instead of having systems talk directly to each other, they asynchronously post messages to queues. The integration component then routes and delivers each message to the appropriate system.
Message replay
In the above example, queues also provide reliable delivery and processing of requests. If one component of the workflow breaks, the others are unaffected: they can still operate and post messages to the broken component. When the broken component recovers, it can process all the queued-up messages.
The key concepts here are load throttling, loose coupling, reliability, and async operation.
As to whether they are the way to go for every major component, I would say no; this is not an automatic choice. You must consider each component individually.
Queues are indeed a very powerful and useful tool, but like every tool, you should only use it for the job it is intended for.
IMO they are not the way to go for every major processing component.
As a general rule I would use a queue where the requesting resource does not require an immediate, synchronous response. I would not use a queue where the timeliness and order of processing is vital.
Where asynchronous processing is allowable and you wish to regulate the amount of traffic to a service then a queue may be the way to go.
See @Qwerky's answer too; he (or she) makes some good points.
Please check out this:
http://code.google.com/p/disruptor/
Queues are not the only things out there for solving these kinds of problems.
Answering your question: queues in this case will introduce asynchronous behavior in access to your databases, so it is more a question of whether you can afford such a great impact on your legacy systems. It might simply be too big a change to push everything through queues. Please describe the general purpose of your systems; then it will be easier to answer your question fully.
Message queues are fundamentally an asynchronous communication system. In this case, it means that aside from the queue that links the sender and receiver, both sender and receiver operate independently; a receiver of a message does not (and should not) require interaction with the sender. Similarly, a sender of a message does not (and should not) require interaction with receiver.
If the sender needs to wait for the result of processing a message, then a message queue may not be a good solution, as this would force an asynchronous system to be synchronous, against the core design. It might be possible to construct a synchronous communication system on top of a message queue, but the fundamental asynchronous nature of a message queue would make this conversion awkward.
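That awkwardness shows up directly in a sketch: building a blocking request-reply call on top of queues forces the sender to invent correlation IDs, a pending-reply table, and a blocking wait. All class and method names here are illustrative, not any real messaging API.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Synchronous request-reply forced on top of an asynchronous queue.
public class SyncOverQueue {
    private final BlockingQueue<String[]> requests = new LinkedBlockingQueue<>();
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    public SyncOverQueue() {
        // The "receiver": consumes requests and completes replies asynchronously.
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String[] req = requests.take();   // req = [correlationId, body]
                    pending.remove(req[0]).complete("echo:" + req[1]);
                }
            } catch (InterruptedException ignored) {
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // The sender tags the message with a correlation ID and then blocks on a
    // future -- extra machinery that point-to-point synchronous calls get for free.
    public String call(String body) throws Exception {
        String id = UUID.randomUUID().toString();
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(id, reply);
        requests.put(new String[]{id, body});
        return reply.get(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(new SyncOverQueue().call("hello"));
    }
}
```

Every piece of this (correlation IDs, timeouts, the pending map) is accidental complexity introduced by fighting the queue's asynchronous design.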
I've worked in embedded systems and systems programming for hardware interfaces to date. For fun and personal knowledge, I've recently been trying to learn more about server programming after getting my hands wet with Erlang. I've been going back and thinking about servers from a C++/Java perspective, and now I wonder how scalable systems can be built with technology like C++ or Java.
I've read that due to context switching and limited memory, a per-client thread handler isn't realistic. Usually a thread pool is created, and a mix of worker threads and asynchronous I/O is used to handle requests. I wonder, first of all, how does one determine the thread pool size? Does one simply have to measure and find the optimal balance? Eventually, as the system scales, perhaps more than one server is needed to handle requests. How are requests managed across multiple servers handling a large client base?
I am just looking for some direction into where I might be able to read more and find answers to my questions. What area of computer science would I look into for more information in this area? Are there any design patterns for this area of computing?
Your question is too general to have a nice answer. The answer depends greatly on the context: on how much processing any one thread does, on how rapidly requests arrive, on the CPU family being used, on the web container being used, and on many other factors.
For C++ I've used boost::asio; it's very modern C++ and quite pleasant to work with. Also, the C++0x network libraries will be based on ASIO's implementation, so it's valuable knowledge.
As for designs, one thread per client doesn't work, as you've already learned. For high-performance multithreading the best number of threads seems to be cores x 2, but for servers there is a lot of I/O per request, which means a lot of idle waiting. From experience, looking at Apache, MySQL, and Oracle, the number of threads is about cores x 10 for database servers and cores x 40 for web servers. I'm not saying these are the ideals, but they seem to be patterns of successful systems, so if your system can be balanced to work optimally with similar numbers, at least you'll know your design isn't completely lousy.
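Those rules of thumb can be captured in a trivial helper. The multipliers are the observations quoted above, not ideals; they are starting points to be validated by measurement on the real workload.

```java
// Rule-of-thumb pool sizing: cores x 2 for CPU-bound work, larger multiples
// for I/O-heavy servers where threads spend most of their time waiting.
public class PoolSizing {
    public static int cpuBoundThreads(int cores)  { return cores * 2;  }
    public static int dbServerThreads(int cores)  { return cores * 10; }
    public static int webServerThreads(int cores) { return cores * 40; }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores = " + cores);
        System.out.println("CPU-bound pool  ~ " + cpuBoundThreads(cores));
        System.out.println("DB-server pool  ~ " + dbServerThreads(cores));
        System.out.println("web-server pool ~ " + webServerThreads(cores));
    }
}
```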
C++ Network Programming: Mastering Complexity Using ACE and Patterns and C++ Network Programming: Systematic Reuse with ACE and Frameworks are very good books that describe many design patterns and their use with the highly portable ACE library.
Like Lothar, we use the ACE library which contains reactor and proactor patterns for handling asynchronous events and asynchronous I/O with C++ code. We use sizable worker thread pools that grow as needed (to a configurable maximum) and shrink over time.
One of the tricks with C++ is how you are going to propagate exceptions and error situations across network boundaries (which isn't handled by the language). I know that there are ways with .NET to throw exceptions across these network boundaries.
One thing you may consider is looking into SOA (Service-Oriented Architecture) for dealing with higher-level distributed-system issues. ACE is really for running at the bare metal of the machine.