Akka Typed slower than Classic - java

With a basic understanding of Akka Classic, I moved to Typed and noticed that the typed version of my code is significantly slower than the classic one.
The task is to aggregate "ticks" (containing an instrument name, a timestamp and a price) per instrument.
In the classic code, I dynamically create one actor for each instrument and keep a Map<Instrument, ActorRef> outside the actor system to delegate the incoming ticks to.
In the typed code a "parent" was required, so I moved the routing logic with the Map into this parent actor, ending up with two actor classes (the actual tick actor and the routing parent actor).
Otherwise, the code is pretty much the same, just once implemented via the classic api and once typed.
When testing both implementations (crudely), I found that the classic version took a bit less than 1.5 seconds to process 1,000,000 ticks, while the typed one needed a bit more than 3.5 seconds.
The obvious first step was to move the guardian parent (which is also the router) to its own PinnedDispatcher so it could run on its own thread, with all the other actors using the default thread pool. This improved performance a good bit, to around 2.1 seconds for 1,000,000 ticks.
My question is: Does anyone have an idea where the remaining performance (0.6 seconds) might be lost?

Typed runs on top of classic (a typed Behavior<T> is effectively wrapped in a function which casts messages to T; once wrapped, it can then be treated as basically a classic Receive), so it introduces some per-message overhead.
I'm guessing from the improvement in putting the routing parent on a pinned dispatcher that the typed implementation sent every tick through the parent, so note that you're incurring that overhead twice. Depending on how many Instruments you have relative to the number of ticks, the typed code can be made much more like the classic code by using something like the SpawnProtocol for the parent, so the code outside the ActorSystem would, at a high-level:
1. Check a local Map<Instrument, ActorRef<Tick>> (or whatever).
2. If there's an ActorRef for the instrument in question, send the tick to that ActorRef.
3. Otherwise, ask the parent actor for an ActorRef<Tick> corresponding to the instrument in question; then save the resulting ActorRef in the local Map and send the tick to that ActorRef.
This is more like the situation in classic: the number of messages (ignoring internal system messages) is now 1 million plus 2x the number of Instruments, vs. 2 million.
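A minimal sketch of that client-side routing, written in plain Java so it runs without Akka on the classpath. TickRouter, askParentForRef and tell are hypothetical names: in real code R would be ActorRef<Tick>, the tell hook would be ref.tell(tick), and the ask hook would presumably wrap AskPattern.ask against a parent running SpawnProtocol (not shown here). The demo counts the messages needed for 1,000,000 ticks spread over three instruments.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.function.BiConsumer;
import java.util.function.Function;

// K = instrument, R = stand-in for ActorRef<Tick>, M = stand-in for Tick.
class TickRouter<K, R, M> {
    private final Map<K, R> refs = new HashMap<>();
    // Assumed hook: in Akka Typed this would ask the parent for the instrument's ActorRef.
    private final Function<K, CompletionStage<R>> askParentForRef;
    // Assumed hook: in Akka Typed this would be ref.tell(tick).
    private final BiConsumer<R, M> tell;

    TickRouter(Function<K, CompletionStage<R>> askParentForRef, BiConsumer<R, M> tell) {
        this.askParentForRef = askParentForRef;
        this.tell = tell;
    }

    // Not thread-safe: meant to be called from the single thread that feeds the ticks.
    void route(K key, M msg) {
        R ref = refs.get(key);
        if (ref == null) {
            // Cache miss: one request to the parent plus one reply, then remember the ref.
            // join() blocks the feeding thread (outside the actor system) so tick order is kept.
            ref = askParentForRef.apply(key).toCompletableFuture().join();
            refs.put(key, ref);
        }
        tell.accept(ref, msg);
    }

    public static void main(String[] args) {
        long[] parentMessages = {0};
        long[] tickMessages = {0};
        TickRouter<String, String, Double> router = new TickRouter<>(
                instrument -> {
                    parentMessages[0] += 2; // request + reply
                    return CompletableFuture.completedFuture("ref-" + instrument);
                },
                (ref, price) -> tickMessages[0]++);
        String[] instruments = {"AAPL", "MSFT", "GOOG"};
        for (int i = 0; i < 1_000_000; i++) {
            router.route(instruments[i % instruments.length], 100.0 + i);
        }
        System.out.println(tickMessages[0] + parentMessages[0]); // prints 1000006
    }
}
```

The parent sees only two messages per instrument, which is where the "1 million plus 2x the number of Instruments" figure comes from. Blocking on the first ask per instrument is a deliberate simplification; a fully asynchronous variant would have to buffer ticks that arrive while the ask is still pending.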

Related

Ask vs Tell or forward for Actors using Akka Streams

Hi, I am working with Akka Streams along with akka-stream-kafka. I am setting up a stream with the following setup:
Source (Kafka) --> | Akka Actor Flow | --> Sink (MongoDB)
The Actor Flow basically consists of actors that process the data; below is the hierarchy:
System
  Master Actor
    URLTypeHandler
      Type1Handler
      Type2Handler
    SerializedTypeHandler
      SomeOtherHandler
Kafka holds the messages; I wrote the consumer, run it in an atMostOnceSource configuration and use:
Consumer.Control control =
Consumer.atMostOnceSource(consumerSettings, Subscriptions.topics(TOPIC))
.mapAsyncUnordered(10, record -> processAccessLog(rootHandler, record.value()))
.to(Sink.foreach(it -> System.out.println("FinalReturnedString--> " + it)))
.run(materializer);
I've used a print as the sink initially, just to get the flow running, and processAccessLog is defined as:
private static CompletionStage<String> processAccessLog(ActorRef handler, byte[] value) {
handler.tell(value, ActorRef.noSender());
return CompletableFuture.completedFuture("");
}
By definition, ask should be used when an actor is expected to respond, which makes sense in this case since I want to return values to be written to the sink.
But everyone (including the docs) recommends avoiding ask and using tell and forward instead; there is an excellent blog post on this, Don't Ask, Tell.
The blog post says that, in the case of nested actors, you should use tell for the first message, then use forward until the message reaches its destination, and after processing send the result directly back to the root actor.
Now here is the problem:
How do I send the message from D back to A such that I can still use the sink?
Is it good practice to have open-ended streams, e.g. streams where the Sink doesn't matter because the actors have already done the job? (I don't think this is recommended; it seems flawed.)
ask is Still the Right Pattern
From the linked blog article, one "drawback" of ask is: "blocking an actor itself, which cannot pick any new messages until the response arrives and processing finishes."
However, in akka-stream this is exactly the feature we are looking for, a.k.a. "back-pressure". If the Flow or Sink is taking a long time to process data, then we want the Source to slow down.
As a side note, I think the claim in the blog post that the additional listener Actor results in an implementation that is "dozens times heavier" is an exaggeration. Obviously an intermediate Actor adds some latency overhead but not 12x more.
Elimination of Back-Pressure
Any implementation of what you are looking for would effectively eliminate back-pressure. An intermediate Flow that only used tell would continuously propagate demand back to the Source regardless of whether or not your processing logic, within the handler Actors, was completing its calculations at the same speed that the Source is generating data.
Consider an extreme example: what if your Source could produce 1 million messages per second but the Actor receiving those messages via tell could only process 1 message per second. What would happen to that Actor's mailbox?
By using the ask pattern in an intermediate Flow you are purposefully linking the speed of the handlers and the speed with which your Source produces data.
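To make the mailbox question concrete, here is a toy per-second model (plain Java, not Akka) of the extreme case above: a source producing 1,000,000 messages per second and a handler processing 1 per second. With tell the backlog simply accumulates; with ask inside mapAsyncUnordered(10, ...), at most 10 requests are outstanding, so the source is held back. The parallelism of 10 mirrors the question's code; real stream stages also have small internal buffers, which this model ignores.

```java
// Toy model, one step per simulated second.
class BacklogModel {
    // tell: the producer never waits, so the mailbox grows by (produced - consumed) each second.
    static long tellBacklog(long producePerSec, long consumePerSec, int seconds) {
        long backlog = 0;
        for (int s = 0; s < seconds; s++) {
            backlog += producePerSec;
            backlog -= Math.min(backlog, consumePerSec);
        }
        return backlog;
    }

    // ask with bounded parallelism: the source only emits while fewer than
    // `parallelism` replies are outstanding, so the backlog can never exceed it.
    static long askInFlight(long producePerSec, long consumePerSec, int parallelism, int seconds) {
        long inFlight = 0;
        for (int s = 0; s < seconds; s++) {
            inFlight += Math.min(producePerSec, parallelism - inFlight);
            inFlight -= Math.min(inFlight, consumePerSec);
        }
        return inFlight;
    }

    public static void main(String[] args) {
        // prints "tell backlog after 10 s: 9999990"
        System.out.println("tell backlog after 10 s: " + tellBacklog(1_000_000, 1, 10));
        // prints "ask in flight after 10 s: 9"
        System.out.println("ask in flight after 10 s: " + askInFlight(1_000_000, 1, 10, 10));
    }
}
```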
If you are willing to remove back-pressure signaling, from the Sink to the Source, then you might as well not use akka-stream in the first place. You can have either back-pressure or non-blocking messaging, but not both.
Ramon J Romero y Vigil is right, but I will try to extend the response.
1) I think that the "Don't ask, tell" dogma applies mostly to actor-system architecture. Here you need to return a Future so the stream can resolve the processed result, and you have two options:
Use ask
Create an actor per event and pass it a Promise, so that a Future will be completed when this actor receives the data (you can use the getSender method so D can send the response to A). You should not send a Promise or Future in a message (they are not serializable, so this breaks as soon as the actors are remote), so the creation of these short-lived actors cannot be avoided.
At the end you are doing mostly the same...
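The reason forward helps with the "D back to A" question is that it keeps the original sender, so the leaf handler can reply directly to whoever asked. The synchronous plain-Java model below (no actors, and a hypothetical routing rule) shows only that shape: each envelope carries its reply destination, intermediate handlers pass it along unchanged, and only the leaf completes it. In Akka, the reply destination would be the temporary actor created by ask, reached via getSender(), not a CompletableFuture placed inside the message.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.function.Consumer;

class ForwardModel {
    // A message plus the destination its final answer must reach ("A").
    static final class Envelope {
        final byte[] payload;
        final CompletableFuture<String> replyTo;

        Envelope(byte[] payload, CompletableFuture<String> replyTo) {
            this.payload = payload;
            this.replyTo = replyTo;
        }
    }

    // Leaf handlers ("D"): do the work and answer the original asker directly.
    static final Consumer<Envelope> TYPE1_HANDLER =
            e -> e.replyTo.complete("type1:" + new String(e.payload, StandardCharsets.UTF_8));
    static final Consumer<Envelope> SOME_OTHER_HANDLER =
            e -> e.replyTo.complete("serialized:" + e.payload.length + " bytes");

    // Intermediate handlers only pass the envelope on unchanged (the "forward" step).
    static final Consumer<Envelope> URL_TYPE_HANDLER = e -> TYPE1_HANDLER.accept(e);
    static final Consumer<Envelope> SERIALIZED_TYPE_HANDLER = e -> SOME_OTHER_HANDLER.accept(e);

    // Hypothetical routing rule for the master: payloads starting with '/' are URLs.
    static final Consumer<Envelope> MASTER = e -> {
        if (e.payload.length > 0 && e.payload[0] == '/') {
            URL_TYPE_HANDLER.accept(e);
        } else {
            SERIALIZED_TYPE_HANDLER.accept(e);
        }
    };

    // Shaped like processAccessLog, but the stage completes only when a leaf replies,
    // which is what lets mapAsyncUnordered apply back-pressure.
    static CompletionStage<String> processAccessLog(byte[] value) {
        CompletableFuture<String> reply = new CompletableFuture<>();
        MASTER.accept(new Envelope(value, reply));
        return reply;
    }

    public static void main(String[] args) {
        System.out.println(processAccessLog("/index.html".getBytes(StandardCharsets.UTF_8))
                .toCompletableFuture().join()); // prints "type1:/index.html"
    }
}
```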
2) It's perfectly fine to use an empty Sink to finalise the stream (indeed akka provides the Sink.ignore() method to do so).
It seems like you are missing the reason why you are using streams: they are a nice abstraction that provides composability, concurrency and back pressure. On the other hand, actors cannot be composed, and back pressure is hard to handle with them. If you don't need these features and your actors can get the work done easily, you shouldn't use akka-streams in the first place.

Long delay between Akka actors

I'm consistently seeing very long delays (60+ seconds) between two actors, from the time at which the first actor sends a message for the second, and when the second actor's onReceive method is actually called with the message. What kinds of things can I look for to debug this problem?
Details
Each instance of ActorA sends one message to ActorB with ActorRef.tell(Object, ActorRef). I take a millisecond timestamp (with System.currentTimeMillis()) right after calling tell in ActorA, and another one at the start of ActorB's onReceive(Object). The interval between these timestamps is consistently 60 seconds or more. Specifically, when plotted over time, this interval follows a rough sawtooth pattern that ranges from more than 60 seconds to almost 120 seconds, as shown in the graph below.
These actors are early in the data flow of the system; several other actors follow after ActorB. This large gap only occurs between these two specific actors; the gaps between other pairs of adjacent actors are typically less than a millisecond, occasionally a few tens of milliseconds. Additionally, the actual time spent inside any given actor is never more than a second.
Generally, each actor in the system only passes a single message to another actor. One of the actors (subsequent to ActorB) sends a single message to each of a few different actors, and a small percentage (less than 0.1%) of the time, certain actors send multiple messages to the same subsequent actor (i.e., multiple instances of the subsequent actor are needed). When this occurs, the number of messages is typically a dozen or fewer.
Can this be explained by the normal reactive nature of Akka? Does it indicate a problem with the way work is distributed or the way the actors are configured? Is there something that can explicitly block a particular actor from spinning up? What other information should I collect or look at to understand the source of this, or to understand whether it is actually a problem?
You have a limited thread pool. If your Actors block, they still take up space in the thread pool. New threads will not be created if your thread pool is saturated.
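You may want to configure core-pool-size-factor, core-pool-size-min and core-pool-size-max. These are thread-pool-executor settings; if your dispatcher uses the default fork-join-executor, the equivalents are parallelism-factor, parallelism-min and parallelism-max.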
If you expect certain actions to block, you can instead wrap them in Future { blocking { ... } } and register a callback. But it's better to use asynchronous, non-blocking calls.
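A Java counterpart of the Future { blocking { ... } } advice, as a sketch: the blocking call runs on a small dedicated pool (a size of 4 is an arbitrary assumption to tune), so the threads that run actors stay free, and a callback handles the result. slowLookup is a hypothetical stand-in for whatever blocks. Inside an actor, the callback should not touch actor state directly; the usual approach is to send the result back to the actor as a message (for example with akka.pattern.Patterns.pipe).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

class BlockingOffload {
    // Dedicated pool for blocking calls; daemon threads so it never keeps the JVM alive.
    static final ExecutorService BLOCKING_POOL = Executors.newFixedThreadPool(4, runnable -> {
        Thread t = new Thread(runnable, "blocking-io");
        t.setDaemon(true);
        return t;
    });

    // Hypothetical blocking call (e.g. JDBC or a synchronous HTTP client).
    static String slowLookup(String key) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "value-for-" + key;
    }

    // Returns at once; the callback runs once the lookup has finished.
    static CompletableFuture<Void> lookupAsync(String key, Consumer<String> callback) {
        return CompletableFuture.supplyAsync(() -> slowLookup(key), BLOCKING_POOL)
                .thenAccept(callback);
    }

    public static void main(String[] args) {
        lookupAsync("ticker", result -> System.out.println("got " + result))
                .join(); // only this demo waits; an actor would not block here
    }
}
```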

java application multi-threading design and optimization

I designed a Java application. A friend suggested using multi-threading; he claims that running my application as several threads will decrease the run time significantly.
In my main class, I carry out several operations (outside the scope of this question) that fill global static variables and hash maps used across the whole lifetime of the process. Then I run the core of the application on the entries of an ArrayList:
for(int customerID : customers){
ConsumerPrinter consumerPrinter = new ConsumerPrinter();
consumerPrinter.runPE(docsPath,outputPath,customerID);
System.out.println("Customer with CustomerID:"+customerID+" Done");
}
For each iteration of this loop, the XML files of the given customer are fetched from the machine and parsed, and calculations are performed on the parsed data. Later, the processed results are written to a text file (fetched and written data can reach several gigabytes at most and 50 MB on average). More than one iteration can write to the same file.
Should I make this piece of code multi-threaded so each group of customers is handled in an independent thread?
How can I know the most optimal number of threads to run?
What are the best practices to take into consideration when implementing multi-threading?
Should I make this piece of code multi-threaded so each group of customers is handled in an independent thread?
Yes, multi-threading can save processing time. While iterating over your list, you can spawn a new thread for each iteration and do the customer processing in it. But you need proper synchronization: if processing two customers requires operating on the same resource, you must synchronize that operation to avoid race conditions or memory-inconsistency issues.
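A sketch of the loop using a fixed-size pool instead of one new thread per iteration, which bounds the number of threads when there are many customers. processCustomer is a hypothetical stand-in for ConsumerPrinter.runPE, and a StringWriter stands in for the shared output file; synchronizing on the writer keeps two customers' lines from interleaving, although their order is not deterministic.

```java
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelCustomers {
    // Hypothetical stand-in for ConsumerPrinter.runPE(docsPath, outputPath, customerID).
    static String processCustomer(int customerID) {
        return "customer " + customerID + " total=" + (customerID * 10);
    }

    static void runAll(List<Integer> customers, Writer sharedOut, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int customerID : customers) {
                pending.add(pool.submit(() -> {
                    String result = processCustomer(customerID); // runs concurrently
                    synchronized (sharedOut) {                   // one writer at a time on the shared file
                        sharedOut.write(result + System.lineSeparator());
                    }
                    System.out.println("Customer with CustomerID:" + customerID + " Done");
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // rethrows (wrapped) any exception from a worker
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        runAll(List.of(101, 102, 103, 104), out, Runtime.getRuntime().availableProcessors());
        System.out.print(out); // line order depends on thread scheduling
    }
}
```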
How can I know the most optimal number of threads to run?
You can't really know without measuring the processing time for n customers with different numbers of threads. It will depend on the number of cores your processor has and on what processing actually takes place for each customer.
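Since the answer is to measure, here is a rough harness that runs the same batch of tasks at several pool sizes and prints the wall-clock time of each. The CPU-bound loop is a placeholder workload (replace it with one real customer's processing); the first run also pays for JIT warm-up, so for trustworthy numbers repeat the runs or use a benchmarking tool such as JMH. As a rule of thumb, CPU-bound work tends to stop improving around the number of cores, while I/O-heavy work can benefit from more threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

class ThreadCountBenchmark {
    // Runs `tasks` calls of `work` on a pool of `threads` threads; returns elapsed milliseconds.
    static long timeWithThreads(int threads, int tasks, IntConsumer work) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            final int id = i;
            pool.execute(() -> work.accept(id));
        }
        pool.shutdown();                          // accept no new tasks; queued ones still run
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for the whole batch
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        // Placeholder CPU-bound workload; replace with the processing of one real customer.
        IntConsumer work = id -> {
            double acc = 0;
            for (int k = 0; k < 2_000_000; k++) {
                acc += Math.sqrt(k + id);
            }
            if (acc < 0) {
                System.out.println(acc); // never true; keeps the JIT from dropping the loop
            }
        };
        for (int threads : new int[] {1, Math.max(1, cores / 2), cores, cores * 2, cores * 4}) {
            System.out.println(threads + " threads: " + timeWithThreads(threads, 64, work) + " ms");
        }
    }
}
```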
What are the best practices to take into consideration when implementing multi-threading?
First and foremost, you must have multiple cores and your OS must support multi-threading. Almost every system does nowadays, but it is a good criterion to check. Secondly, you must analyze all the possible scenarios that may lead to race conditions. Every resource that you know will be shared among multiple threads must be thread-safe. You must also look out for possible memory-inconsistency issues (for example, by declaring shared variables volatile). Finally, there are some things you cannot predict or analyze until you actually run test cases, such as deadlocks (analyze a thread dump) or memory leaks (analyze a heap dump).
The idea of multi-threading is to move some heavy processing onto another thread of execution.
Any UI updates have to be done on the main/default thread, like printing messages or inflating a view. You can ask the app to draw a bitmap, download images from the internet, or run a heavy validation/loop block on a separate thread; imagine that you are creating a second, short-lived app to handle those tasks for you.
Remember, you can ask the app to download/draw an image on another thread, but you have to display this image on the screen on the main thread.
This is commonly used to load a large bitmap on a separate thread, do the math to resize this large image and then, on the main thread, inflate/print/paint/show the smaller version of that image to the user.
In your case, I don't know how heavy the runPE() method is or what it does; you could try to create another thread for it, but the rest should stay on the main thread, since it is the main process of your UI.
You could optimize your loop by placing "ConsumerPrinter consumerPrinter = new ConsumerPrinter();" before the "for(...)": since it does not change dynamically, you can take it out of the loop to avoid creating the same object each time the loop restarts : )
While straight java multi-threading can be used (java.util.concurrent) as other answers have discussed, consider also alternate programming approaches to multi-threading, such as the actor model. The actor model still uses threads underneath, but much complexity is handled by the actor framework rather than directly by you the programmer. In addition, there is less (or no) need to reason about synchronizing on shared state between threads because of the way programs using the actor model are created.
See Which Actor model library/framework for Java? for a discussion of popular actor model libraries.
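To illustrate the point about not synchronizing on shared state, here is a toy actor in plain Java (not any actor library's API): all access to its state goes through a single-threaded mailbox, so many threads can send to it without locks. CounterActor is a hypothetical name; real actor frameworks add supervision, scheduling of many actors over a shared pool, and much more.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CounterActor {
    // The "mailbox": one thread processes messages one at a time, in arrival order.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "counter-actor");
        t.setDaemon(true);
        return t;
    });
    private long total; // only ever read or written on the mailbox thread

    // "tell": enqueue and return immediately, no lock needed by the caller.
    void add(long amount) {
        mailbox.execute(() -> total += amount);
    }

    // "ask": the reply is computed on the mailbox thread, after all earlier messages.
    CompletableFuture<Long> get() {
        return CompletableFuture.supplyAsync(() -> total, mailbox);
    }

    public static void main(String[] args) throws InterruptedException {
        CounterActor counter = new CounterActor();
        Thread[] senders = new Thread[4];
        for (int i = 0; i < senders.length; i++) {
            senders[i] = new Thread(() -> {
                for (int k = 0; k < 1000; k++) {
                    counter.add(1);
                }
            });
            senders[i].start();
        }
        for (Thread t : senders) {
            t.join();
        }
        System.out.println(counter.get().join()); // prints 4000
    }
}
```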

Java simple Analytics/Event Stream Processing with front end

My application takes a lot of measurements of its internal processes. For example, I time certain methods and external web service calls, and I also have variables whose values change, and processes which have a 'state' (e.g. PAUSED, WAITING, etc.).
The application uses 100 to 200 threads, and each bit of data would be associated with a particular thread.
I am looking for some software that I can channel all this information into that would produce useful metrics and graphs of the data (ideally in real time or close to real time), let me set thresholds to trigger warnings, would allow me to filter the data by thread or thread group, etc etc.
The application is performing time critical tasks so the software/api would need to be very fast and never block.
The application is written in java, and ideally the software/api would be in java as well. I think what I'm looking for is called Event Stream Processing, but I'm really not sure what language to use to describe it.
All I've found so far are Esper and ERMA. Can anyone give me a recommendation? I'm the only one working on this project so I'm hoping for something that is pretty easy to set up and use, and has a workable front end.
In the end I found Graphite which was pretty close to being exactly what I wanted. Not the simplest to set up and configure however, but I got it working in the end.
http://graphite.wikidot.com/
In my case I send data directly from my application to StatsD (via UDP), which collects the data and does some pre-processing before it ends up in the Whisper back end; there is a simple example of a Java interface here https://github.com/etsy/statsd/commit/2253223f3c19d2149d65ec5bc802198ff93da4cb
Alternatively you could send your data directly to graphite, example here http://neopatel.blogspot.co.uk/2011/04/logging-to-graphite-monitoring-tool.html
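For the StatsD route, a minimal fire-and-forget sender needs nothing beyond java.net. This sketch assumes StatsD's plain-text line format name:value|type (ms for timings, c for counters, g for gauges) and its usual default UDP port 8125; StatsdSender and the metric names, including putting the thread name into the metric so data can be filtered per thread, are hypothetical.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

class StatsdSender implements AutoCloseable {
    private final InetAddress host;
    private final int port;
    private final DatagramSocket socket;

    StatsdSender(String host, int port) throws IOException {
        this.host = InetAddress.getByName(host);
        this.port = port;
        this.socket = new DatagramSocket();
    }

    // StatsD plain-text line: <name>:<value>|<type>
    static String format(String name, long value, String type) {
        return name + ":" + value + "|" + type;
    }

    // Fire-and-forget: UDP waits for no acknowledgement, and failures are swallowed
    // so that metrics can never break the code path being measured.
    void send(String name, long value, String type) {
        byte[] data = format(name, value, type).getBytes(StandardCharsets.UTF_8);
        try {
            socket.send(new DatagramPacket(data, data.length, host, port));
        } catch (IOException e) {
            // Best effort: drop the sample.
        }
    }

    void timing(String name, long millis) { send(name, millis, "ms"); }
    void increment(String name) { send(name, 1, "c"); }
    void gauge(String name, long value) { send(name, value, "g"); }

    @Override
    public void close() {
        socket.close();
    }

    public static void main(String[] args) throws IOException {
        try (StatsdSender statsd = new StatsdSender("localhost", 8125)) {
            long start = System.nanoTime();
            // ... the method or web service call being timed ...
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            statsd.timing("app." + Thread.currentThread().getName() + ".webservice", elapsedMs);
            statsd.increment("app.webservice.calls");
        }
    }
}
```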

Need an elegant way to invoke arbitrary code on a specified interval

Ok, I have a game server running in Java/Hibernate/Spring/Quartz. The game clock ticks with a Quartz timer, and that works just fine.
However, I have many other things that need to happen at specific, tweakable intervals (in game time, not real time).
For instance, every 24 hours of game time (~47 minutes real time, depending on the server's clock multiplier) a bunch of different once-a-day game actions happen, like resupply, or what have you.
Now, the current system is pretty rough, but works: I have a table in the database that's essentially a cron, with a string key, the execution time of the next event, and then the hours, minutes, seconds and days until the next one after that. The time ticker checks that table, fires off a message with that code (the event's string key) in it to a queue, and adds the days, minutes and seconds to the current time, setting that as the next execution time.
The message listener is the grody part - it switches on the key and hits one of its methods.
Now I understand that this can work just fine, but it really doesn't sit well with me. What would your solution be to this, to have each piece of code in its own little class? What design pattern covers this? (I'm sure there is one). I have a few ideas, but I'd like to hear some opinions.
Rather than switching on a set of codes, you could use the code as a key into a map whose values are objects that implement a handler interface. This makes it much more flexible to add new event types.
The pattern looks something like this:
private final Map<String, Handler> handlers = new TreeMap<String, Handler>();
public void register(String event, Handler handler) {
handlers.put(event, handler);
}
public void handle(String event) {
Handler handler = handlers.get(event);
if (handler == null) {
/* Log or throw an exception for unknown event type. */
}
else {
handler.execute();
}
}
Rather than explicitly registering handlers, you could use something like Java 6's ServiceLoader to add new behaviors just by dropping JARs into the class path.
I would use a variant of the Command pattern, extending it into an IIntervalCommand interface. It would have an interval property and a read-only CanExecute property in addition to the Execute method.
Then you create a CommandList class that holds a list of IIntervalCommands. It would have a method called CheckToExecute to which you pass the current game time. CheckToExecute would traverse the list, calling CanExecute for each command. CanExecute returns true if the interval has elapsed. If CanExecute returns true, CheckToExecute calls the Execute method of the object implementing IIntervalCommand.
Adding additional game events is then a matter of creating a new class implementing IIntervalCommand, instantiating it, and adding it to the CommandList.
If processing the event is time-consuming, the command could spawn the processing as a separate thread. Its CanExecute property would then return false until the thread finishes, even if the interval has passed again. Or you could have it spawn another thread if the interval passes again.
You avoid the giant case statement. You could eliminate the database and setup the parameters when you instantiate the objects. Or keep it and use it as part of a factory that creates all your IIntervalCommands.
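A Java sketch of this Command variant, assuming game time is counted in game seconds and that a command's next run is scheduled relative to when it actually ran. Java naming replaces the C#-style CanExecute/Execute/CheckToExecute with canExecute/execute/checkToExecute, AbstractIntervalCommand is a hypothetical helper, and the threading variant from the previous paragraph is left out.

```java
import java.util.ArrayList;
import java.util.List;

interface IIntervalCommand {
    boolean canExecute(long gameTime);
    void execute(long gameTime);
}

// Hypothetical base class: tracks the next due time so subclasses only implement run().
abstract class AbstractIntervalCommand implements IIntervalCommand {
    private final long interval;
    private long nextRun;

    AbstractIntervalCommand(long interval, long firstRun) {
        this.interval = interval;
        this.nextRun = firstRun;
    }

    @Override
    public boolean canExecute(long gameTime) {
        return gameTime >= nextRun;
    }

    @Override
    public final void execute(long gameTime) {
        nextRun = gameTime + interval; // schedule relative to when it actually ran
        run(gameTime);
    }

    protected abstract void run(long gameTime);
}

class CommandList {
    private final List<IIntervalCommand> commands = new ArrayList<>();

    void add(IIntervalCommand command) {
        commands.add(command);
    }

    // Called from the game-clock tick with the current game time.
    void checkToExecute(long gameTime) {
        for (IIntervalCommand command : commands) {
            if (command.canExecute(gameTime)) {
                command.execute(gameTime);
            }
        }
    }
}
```

A new once-a-day event is then just another subclass, e.g. an anonymous AbstractIntervalCommand(86_400, 86_400) whose run() performs the resupply, added to the CommandList.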
Instead of switching on the key you can use a hashtable to dispatch these events. This way your timer events don't need to know about each other.
It should be possible to have something like:
timerQueue.registerHandler("key",new TimerHandler(){
// do something timer related
});
This way you can restart java code handling events without losing your persisted queue of events.
Priority queues (http://en.wikipedia.org/wiki/Priority_queue) are worth looking at if you have not already.
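As a sketch of how a priority queue fits here: keeping scheduled events ordered by their next game time means each tick only has to look at the head of the queue, and handlers are looked up by key as in the hashtable suggestion. GameTimerQueue, game time counted in hours, and the "resupply" key are assumptions; persisting the queue across restarts is not shown.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class GameTimerQueue {
    // One scheduled occurrence of an event; re-created for the next run.
    private static final class Scheduled {
        final String key;
        final long interval;
        final long nextRun;

        Scheduled(String key, long interval, long nextRun) {
            this.key = key;
            this.interval = interval;
            this.nextRun = nextRun;
        }
    }

    // Earliest nextRun sits at the head of the queue.
    private final PriorityQueue<Scheduled> queue =
            new PriorityQueue<>(Comparator.comparingLong((Scheduled s) -> s.nextRun));
    private final Map<String, Runnable> handlers = new HashMap<>();

    void registerHandler(String key, long interval, long firstRun, Runnable handler) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        handlers.put(key, handler);
        queue.add(new Scheduled(key, interval, firstRun));
    }

    // Call on every game-clock tick: runs everything that is due, then reschedules it.
    void tick(long gameTime) {
        while (!queue.isEmpty() && queue.peek().nextRun <= gameTime) {
            Scheduled due = queue.poll();
            handlers.get(due.key).run();
            queue.add(new Scheduled(due.key, due.interval, due.nextRun + due.interval));
        }
    }

    public static void main(String[] args) {
        GameTimerQueue timers = new GameTimerQueue();
        timers.registerHandler("resupply", 24, 24, () -> System.out.println("resupply"));
        for (long hour = 0; hour <= 72; hour++) {
            timers.tick(hour); // prints "resupply" at hours 24, 48 and 72
        }
    }
}
```

Because the next run is computed from the previous due time rather than from the tick time, a tick that skips ahead runs each missed occurrence once; whether catch-up or skipping is wanted is a game-design choice.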
I personally wouldn't put this in the database but would rather keep a separate service running in the background. My web service or web application would then communicate with this service through interprocess communication. I don't know how this translates to the Java world, though.
Conceptually, I think you're doing two things:
Firstly, you have a scaled version of time. As long as the relationship between this time and wall-clock time remains constant, I'm fairly sure I'd just delegate this scaling behavior to a single class with signatures like
DateTime getFutureTime( VirtualTimeSpan timespan)
I'd use this to map virtual time spans to real-time instants. Thereafter you can operate in real time, which probably simplifies things a little, since you can then use standard scheduling features.
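A sketch of that single scaling class using java.time, assuming a constant multiplier of game seconds per real second. With the question's figure of 24 game hours in roughly 47 real minutes, the multiplier is 1440 / 47 ≈ 30.6. Duration stands in for the answer's VirtualTimeSpan and Instant for DateTime; injecting a java.time.Clock is a design choice that makes the mapping easy to test.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

class GameClock {
    private final Clock realClock;   // wall clock; injectable for tests
    private final double multiplier; // game seconds per real second

    GameClock(Clock realClock, double multiplier) {
        if (multiplier <= 0) {
            throw new IllegalArgumentException("multiplier must be positive");
        }
        this.realClock = realClock;
        this.multiplier = multiplier;
    }

    // Real-time length of a span of game time.
    Duration toRealDuration(Duration gameSpan) {
        return Duration.ofNanos(Math.round(gameSpan.toNanos() / multiplier));
    }

    // Wall-clock instant at which `gameSpan` of game time will have passed, starting now.
    Instant getFutureTime(Duration gameSpan) {
        return realClock.instant().plus(toRealDuration(gameSpan));
    }

    public static void main(String[] args) {
        // 24 game hours in about 47 real minutes.
        GameClock clock = new GameClock(Clock.systemUTC(), 1440.0 / 47);
        System.out.println(clock.toRealDuration(Duration.ofHours(24)).toMinutes()); // prints 47
        System.out.println(clock.getFutureTime(Duration.ofHours(24)));
    }
}
```

The real delay can then be handed to standard scheduling, e.g. ScheduledExecutorService.schedule(task, clock.toRealDuration(span).toMillis(), TimeUnit.MILLISECONDS).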
The second part regards scheduling work for a future worker process. There are a number of core technologies for this; conceptually, I think JMS is the Java granddad of many of them. It defines concepts much like the ones you're using and the ones you need, so taking a look at JMS is worthwhile for seeing concepts you may find interesting: it uses selectors to send tasks to specific workers, much like the ones you describe.
Alas, JMS never seemed to fit the bill for most people. Many found it too heavyweight or the implementations too buggy, so people usually ended up with home-made queue technologies. But the concepts are all there. Can't you just use Quartz?
