I'm consistently seeing very long delays (60+ seconds) between two actors, from the time the first actor sends a message to the second to when the second actor's onReceive method is actually called with that message. What kinds of things can I look for to debug this problem?
Details
Each instance of ActorA sends one message to ActorB with ActorRef.tell(Object, ActorRef). I collect a millisecond timestamp (with System.currentTimeMillis()) right after calling tell in ActorA, and another one at the start of ActorB's onReceive(Object). The interval between these timestamps is consistently 60 seconds or more. Specifically, when plotted over time, the interval follows a rough sawtooth pattern that ranges from just over 60 seconds to almost 120 seconds.
These actors are early in the data flow of the system; several other actors follow after ActorB. The large gap only occurs between these two specific actors; the gaps between other pairs of adjacent actors are typically less than a millisecond, occasionally a few tens of milliseconds. Additionally, the actual time spent inside any given actor is never more than a second.
Generally, each actor in the system passes only a single message to another actor. One of the actors (subsequent to ActorB) sends a single message to each of a few different actors, and a small percentage of the time (less than 0.1%), certain actors will send multiple messages to the same subsequent actor (i.e., multiple instances of the subsequent actor will be demanded). When this occurs, the number of messages is typically a dozen or fewer.
Can this be explained by the normal reactive nature of Akka? Does it indicate a problem with the way work is distributed or the way the actors are configured? Is there something that can explicitly block a particular actor from spinning up? What other information should I collect to understand the source of this, or to determine whether it is actually a problem?
You have a limited thread pool. If your Actors block, they still take up space in the thread pool. New threads will not be created if your thread pool is saturated.
You may want to configure core-pool-size-factor, core-pool-size-min, and core-pool-size-max.
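As an illustration, such a dispatcher might look like the following in application.conf (a minimal sketch assuming an Akka thread-pool-executor dispatcher; the dispatcher name and the sizes are made-up values to tune for your workload):

    actorb-dispatcher {
      type = Dispatcher
      executor = "thread-pool-executor"
      thread-pool-executor {
        # lower bound on the number of threads
        core-pool-size-min = 8
        # pool size is ceil(available processors * factor),
        # bounded by core-pool-size-min and core-pool-size-max
        core-pool-size-factor = 3.0
        # upper bound on the number of threads
        core-pool-size-max = 64
      }
    }

The actor in question would then be deployed with this dispatcher (for example via Props.withDispatcher("actorb-dispatcher")).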
If you expect certain actions to block, you can instead wrap them in Future { blocking { ... } } and register a callback. But it's better to use asynchronous, non-blocking calls.
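In Java (the question uses onReceive), a rough equivalent is to offload the blocking call to a dedicated executor and deliver the result back to the actor as a message. A minimal sketch, where blockingExecutor is an assumed, separately configured pool and blockingCall is a placeholder for the slow operation:

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.Executor;

    // Inside the actor: run the blocking work off the dispatcher thread,
    // then send the result back to this actor. tell() is thread-safe,
    // so it is fine to call it from the callback.
    void handleBlockingWork(Executor blockingExecutor) {
        CompletableFuture
                .supplyAsync(this::blockingCall, blockingExecutor)
                .thenAccept(result -> getSelf().tell(result, getSelf()));
    }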
Related
Sometimes, due to some external problem, I need to requeue a message with basic.reject and requeue = true.
But I don't need to consume it immediately, because it will probably fail again within a short time. If I continuously requeue it, this may result in an infinite requeue loop. So:
1. I need to consume it later, say one minute later.
2. I need to know how many times the message has been requeued, so that I can stop requeuing it and instead reject it, marking it as failed to consume.
PS: I am using the Java client.
There are multiple solutions to point 1.
The first one is the approach chosen by Celery (a Python producer/consumer library that can use RabbitMQ as its broker). Inside your message, add the timestamp at which the task should be executed. When your consumer gets the message, do not ack it immediately; check its timestamp, and execute the task only once that timestamp is reached. (Note that the worker can continue working on other tasks instead of waiting.)
This technique has some drawbacks: you have to increase the QoS per channel to an arbitrary value, and if your worker is already working on a long-running task, the delayed task won't be executed until the first task has finished.
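A minimal sketch of this approach with the RabbitMQ Java client, assuming the producer put an execute-at epoch-millis timestamp in a header named run-at (the header name, queue name, and execute method are illustrative):

    // Consume with manual acks; defer execution until the embedded
    // timestamp is reached, and ack only after the task actually runs.
    ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);

    DeliverCallback onDelivery = (consumerTag, delivery) -> {
        long runAt = ((Number) delivery.getProperties()
                .getHeaders().get("run-at")).longValue();
        long delayMs = Math.max(0, runAt - System.currentTimeMillis());
        scheduler.schedule(() -> {
            try {
                execute(delivery.getBody());   // the actual task
                channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
            } catch (IOException e) {
                // log and decide whether to reject/requeue (omitted)
            }
        }, delayMs, TimeUnit.MILLISECONDS);
    };
    channel.basicConsume("task-queue", false, onDelivery, consumerTag -> {});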
A second technique is RabbitMQ-only and much more elegant. It takes advantage of dead-letter exchanges and per-message TTLs. You create a new queue that nobody consumes from. This queue has a dead-letter exchange that forwards messages to the consumer queue. When you want to defer a message, ack it (or reject it without requeue) from the consumer queue and copy it into the dead-lettered queue with a TTL equal to the delay you want (say, one minute). At (roughly) the end of the TTL, the deferred message will magically land in the consumer queue again, ready to be consumed. The RabbitMQ team has also made the Delayed Message Plugin (the plugin is marked as experimental, yet fairly stable and potentially suitable for production use as long as the user is aware of its limitations; it has serious limitations in terms of scalability and reliability in case of failover). So you might decide whether you really want to use it in production, or whether you prefer to stick to the manual way, limited to one TTL per queue.
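A minimal sketch of this manual dead-letter arrangement with the RabbitMQ Java client (queue names are illustrative; channel setup is omitted):

    // The delay queue has no consumers; when a message's TTL expires it
    // is dead-lettered to the default exchange and routed by the given
    // routing key back to the real consumer queue.
    Map<String, Object> args = new HashMap<>();
    args.put("x-dead-letter-exchange", "");
    args.put("x-dead-letter-routing-key", "consumer-queue");
    channel.queueDeclare("delay-queue", true, false, false, args);
    channel.queueDeclare("consumer-queue", true, false, false, null);

    // Defer one message for 60 seconds: publish it to the delay queue
    // with a per-message TTL (expiration is in milliseconds, as a string).
    AMQP.BasicProperties props = new AMQP.BasicProperties.Builder()
            .expiration("60000")
            .build();
    channel.basicPublish("", "delay-queue", props, body);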
Point 2 just requires putting a counter in your message and handling it inside your app. You can choose to put this counter in a header or directly in the body.
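For instance, keeping the counter in a header might look like this (the x-retry-count header name and the limit are assumptions; this is meant to run inside the consumer callback from above):

    final int MAX_RETRIES = 5;

    Map<String, Object> headers = delivery.getProperties().getHeaders();
    int attempts = (headers == null || headers.get("x-retry-count") == null)
            ? 0 : ((Number) headers.get("x-retry-count")).intValue();

    if (attempts >= MAX_RETRIES) {
        // Give up: reject without requeue to mark the message as failed.
        channel.basicReject(delivery.getEnvelope().getDeliveryTag(), false);
    } else {
        // Re-publish into the delay queue with the counter incremented,
        // then ack the original delivery.
        Map<String, Object> retryHeaders = new HashMap<>();
        retryHeaders.put("x-retry-count", attempts + 1);
        AMQP.BasicProperties retryProps = new AMQP.BasicProperties.Builder()
                .headers(retryHeaders)
                .expiration("60000")
                .build();
        channel.basicPublish("", "delay-queue", retryProps, delivery.getBody());
        channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
    }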
I need a way to limit the message count on my IRC bot to avoid a global ban from Twitch for chat flooding. (They allow 100 messages per 30 seconds.)
There are two ways I considered doing this, both involving a message queue.
1. Each message starts a thread which acquires a counting semaphore, blocks for 30 seconds, and releases it after that time. This would be a very clean solution, as the queue would be entirely managed by the OS, which means less work for me. However, it may result in creating hundreds of threads. These threads would be sleeping for most of their lifetime, but I'm not sure it is considered okay to launch hundreds of threads that effectively do nothing. They won't take up time slices from the scheduler while asleep, but they would consume a lot of memory, and there would be a lot of overhead in creating them.
2. Store a queue of timestamps; every time a message needs to be sent, remove any timestamp that is more than 30 seconds old. Have a thread running that checks the oldest timestamp every 10-50 ms, removes it if it is more than 30 seconds old, and then sends the oldest unsent message from the queue, if one exists. When a message comes in to be sent, it is sent immediately if there are <# messages in the queue.
Option 1 has the downside of creating many threads that do nothing.
Option 2 has the downside of needing one thread to poll the message list constantly.
Option 2 could be improved by calculating the time to wait until the oldest message in the queue is 30 seconds old and sending the message then, but I feel as if I am overcomplicating the problem at that stage.
Any thoughts on which would be the better approach?
Create a sentMessage list with a date for each entry.
Check the list before posting a new message.
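A minimal sketch of that idea (class and constant names are illustrative):

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Sliding-window limiter: remember when each message was sent and
    // allow a new send only while fewer than 100 fall within the last 30 s.
    class SentMessageWindow {
        private static final int LIMIT = 100;
        private static final long WINDOW_MS = 30_000;
        private final Deque<Long> sentTimes = new ArrayDeque<>();

        synchronized boolean trySend() {
            long now = System.currentTimeMillis();
            while (!sentTimes.isEmpty()
                    && now - sentTimes.peekFirst() >= WINDOW_MS) {
                sentTimes.pollFirst();   // drop entries older than the window
            }
            if (sentTimes.size() < LIMIT) {
                sentTimes.addLast(now);
                return true;             // caller may post the message now
            }
            return false;                // over the limit: queue or delay it
        }
    }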
I need a blocking queue that has a size of 1, and every time put is applied it removes the last value and adds the next one. The consumers would be a thread pool in which each thread needs to read the message as it gets put on the queue and decide what to do with it, but they shouldn't be able to take from the queue since all of them need to read from it.
I was considering just taking and putting every time the producer sends out a new message, but having only peek in the run method of the consumers will result in them constantly peeking, won't it? Ideally the message would disappear as soon as the peeking stops, but I don't want to use a timed poll, as it's not guaranteed that every consumer will peek the message in time.
My other option at the moment is to iterate over the collection of consumers and call a public method on each with the message, but I really don't want to do that, since the system relies on real-time updates, and a large collection will take a while to iterate through completely.
After some consideration, I think you're best off with each consumer having its own queue and the producer putting its messages on all queues.
If there are few consumers, then putting the messages on those few queues will not take too long (except when the producer blocks because a consumer can't keep up).
If there are many consumers, this will be highly preferable to a situation where many consumers are in contention with each other.
At the very least this would be a good measure to compare alternate solutions against.
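A minimal sketch of the per-consumer-queue broadcast (the type parameter and wiring are placeholders):

    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.CopyOnWriteArrayList;
    import java.util.concurrent.LinkedBlockingQueue;

    class Broadcaster<M> {
        private final List<BlockingQueue<M>> queues =
                new CopyOnWriteArrayList<>();

        // Each consumer registers its own queue and drains only that
        // queue, so consumers never contend with one another.
        BlockingQueue<M> register() {
            BlockingQueue<M> q = new LinkedBlockingQueue<>();
            queues.add(q);
            return q;
        }

        // The producer puts every message on every queue; put() may
        // block if a bounded queue is used and a consumer falls behind.
        void publish(M message) throws InterruptedException {
            for (BlockingQueue<M> q : queues) {
                q.put(message);
            }
        }
    }

Each consumer thread then simply loops on take() against its own queue and handles each message as it arrives.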
What is both faster and "better practice": a polling system or an event-based timer?
I'm currently having a discussion with a more senior coworker regarding how to implement some mission critical logic. Here is the situation:
A message giving an execution time is received.
When that execution time is reached, some logic must be executed.
Multiple messages can be received, each giving a different execution time, and the logic must be executed at each of those times.
I think the best way to implement this would be to create a timer that triggers the logic at the time given in the message, but my coworker believes I would be better off polling a list of the messages to see whether an execution time has been reached.
His argument is that the polling system is safer, as it is less complicated and thus less likely to be screwed up by the programmer. My argument is that by implementing it my way, we reduce the computational load and thus are more likely to execute the logic when we actually want it to execute. How should I implement it, and why?
Requested Information
The only time my logic would ever be utilized would almost certainly be at a time of the highest load.
The requirements do not specify how reliable the connection will be, but everyone I've talked to has said that they have never heard of a message being dropped.
The scheduling is based on an absolute system. So the message will have an execution time specifying when an algorithm should be executed. Since there is time synchronization, I have been instructed to assume that the time will be uniform among all machines.
The algorithm that gets executed uses some inputs which initially are volatile but soon stabilize. By postponing the processing, I hope to use the most stable information available.
The java.util.Timer effectively does what your colleague suggests (truth be told, in the end, there really aren't that many ways to do this).
It maintains a collection of TimerTasks and waits for new activity on it, or until the time has come to execute the next task. It doesn't poll the collection; it "knows" that the next task will fire in N seconds and waits until that happens or anything else does (such as a TimerTask being added or cancelled). This is better overall than polling, since it spends most of its time sleeping.
So, in the end, you're both right -- you should use a Timer for this, because it basically does what your coworker wants to do.
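A minimal sketch of scheduling at an absolute time with java.util.Timer (executionTime and runLogic stand in for the poster's own message field and logic):

    import java.util.Date;
    import java.util.Timer;
    import java.util.TimerTask;

    Timer timer = new Timer("execution-timer", /* isDaemon */ true);

    // Called for every message that arrives. Timer sleeps until the
    // earliest scheduled task is due, so there is no polling loop.
    void onMessage(Date executionTime) {
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                runLogic();
            }
        }, executionTime);
    }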
I have a requirement to send alerts when a record in the database has not been updated/changed for a specified interval. For example, if a received purchase order isn't processed within one hour, a reminder should be sent to the delivery manager.
The reminder/alert should be sent exactly at the interval (to the second). If the last modified time is 13:55:45, the alert should be triggered at 14:55:45. There could be millions of rows that need to be tracked.
A simple approach would be to implement a custom scheduler with which all records are registered. But it would have to poll the database for changes every second, which would lead to performance problems.
UPDATE:
Another basic approach would be to create a thread for each record and put it to sleep for one hour, or to use some queuing concept that has a timeout. But that still has performance problems.
Any thoughts on a better approach to implement this?
Probably using an internal JMS queue would be a better solution - for example, you may want to use the scheduled message feature (http://docs.jboss.org/hornetq/2.2.2.Final/user-manual/en/html/examples.html#examples.scheduled-message) with HornetQ.
You can ask the broker to publish the alert message after exactly one hour. On the other hand, while the purchase order is being processed, you can manually delete this message, meaning that the order was processed without errors.
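A minimal sketch over plain JMS (session and producer setup omitted; _HQ_SCHED_DELIVERY is HornetQ's scheduled-delivery property, and the message body is illustrative):

    // Ask HornetQ to deliver the reminder exactly one hour from now;
    // consumers will not see the message before that timestamp.
    TextMessage reminder = session.createTextMessage("order-reminder:" + orderId);
    reminder.setLongProperty("_HQ_SCHED_DELIVERY",
            System.currentTimeMillis() + 60 * 60 * 1000);
    producer.send(reminder);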
Use a Timer for each reminder. That is, if the last modified time is 17:49:45, the alert should be triggered at 18:49:45; simply create a timer for each task, scheduled to fire exactly one hour later.
It is not possible in Java if you really insist on the real-timeness. In Java you may encounter the garbage collector's stop-the-world phases, and you can never guarantee the exact time.
If an approximate time is also permissible, then use some kind of scheduled queue as proposed in the other answers; if not, then use real-time Java or some native call.
If we can assume that the orders are entered with increasing time, then (see the sketch after this list):
You can use a Queue with elements that have the properties time-of-order and order-id.
Each new entry that is added to the DB is also enqueued to this Queue.
You can check the element at the start of the Queue each minute.
When checking the element at the start of the Queue, if an hour has passed from the time-of-order, then search for the entry with order-id in the DB.
If it is found and was not updated, send a notification; otherwise, dequeue it from the Queue.
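A minimal sketch of this queue-scan approach (OrderRef, orderDao.wasUpdatedSince, and sendNotification are assumed helpers):

    import java.time.Duration;
    import java.time.Instant;
    import java.util.ArrayDeque;
    import java.util.Queue;

    record OrderRef(long orderId, Instant timeOfOrder) {}

    Queue<OrderRef> pending = new ArrayDeque<>();

    // Called whenever a new order row is inserted into the DB.
    void onNewOrder(long orderId) {
        pending.add(new OrderRef(orderId, Instant.now()));
    }

    // Run once a minute, e.g. from a ScheduledExecutorService.
    void checkHead() {
        OrderRef head = pending.peek();
        while (head != null && Duration.between(head.timeOfOrder(),
                Instant.now()).toHours() >= 1) {
            pending.remove();   // an hour has passed for this entry
            if (!orderDao.wasUpdatedSince(head.orderId(), head.timeOfOrder())) {
                sendNotification(head.orderId());
            }
            head = pending.peek();
        }
    }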