Requirement: In a poller / worker scenario, pollers should stop polling the remote service until a worker is available to take the task.
Background:
Due to throttling constraints on the number of requests to the remote service, I am trying to segregate polling from consuming. The constraint is that once a task is picked up by a poller, it will time out if not processed within a certain time, and our consumers can be arbitrarily long-running (10 seconds to 10 minutes).
Looking at the direct solutions, SynchronousQueue seems to be the most straightforward approach.
The problem is that if I have 2 pollers and 4 consumers, then while the 4 consumers are processing, the pollers will first pick/poll two tasks from the remote service and then wait for consumers to become available. This will cause those 2 tasks to time out.
Is there a decent workaround for this? Or should I be going for a lock-based mechanism (like a Semaphore)?
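For what it's worth, a minimal sketch of the Semaphore variant (Task, pollRemote() and process() are hypothetical placeholders): each worker releases a permit just before blocking on the handoff, so a poller only fetches a task when a worker is guaranteed to take it immediately.

    Semaphore freeWorkers = new Semaphore(0);                  // one permit per idle worker
    SynchronousQueue<Task> handoff = new SynchronousQueue<>(); // direct poller-to-worker handoff

    Runnable poller = () -> {
        try {
            while (true) {
                freeWorkers.acquire();      // block until some worker is idle
                Task task = pollRemote();   // hypothetical remote call; fetch only now
                handoff.put(task);          // a worker is guaranteed to be waiting
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    };

    Runnable worker = () -> {
        try {
            while (true) {
                freeWorkers.release();      // announce "I'm idle" before blocking on take
                Task task = handoff.take();
                process(task);              // may run 10 seconds to 10 minutes
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    };

With 2 pollers and 4 busy consumers, no permits are available, so neither poller touches the remote service and nothing can time out on the poller side.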
Related
Imagine you have a task structure of:
Task1
Task2: 1 million separate independent Subtask[i] that can run concurrently
Task3: must run once after ALL Task2 subtasks have completed
And all of Task1, Subtask[i] and Task3 are represented by MQ messages.
How can this be solved on ActiveMQ? Especially the triggering of a Task3 message once all subtasks are complete.
I know, it's not a queueing problem, it's a fork-join problem. Let's say the environment dictates you must use ActiveMQ for it.
Using ActiveMQ features, dynamic queues and consumers, stuff like that, is allowed. Using external counters, like a database row representing Task2's progress, is not allowed.
Hidden in this fork-join problem is a state management and observability challenge. Since the database is ruled out, you have to rely on something in-memory or on-queue.
Create a unique id for the task run: something short, but with enough space not to collide, like an airline locator code, e.g. 34FDSX.
Send all messages for the task to a queue://TASK.34FDSX.DATA
Send a control message to queue://TASK.34FDSX.CONTROL that contains the task id and expected total # of messages (including each messageId would be helpful too)
When consumers from queue://TASK.34FDSX.DATA complete their work, they should send a 'done' message to queue://TASK.34FDSX.DONE with their messageId or some identifier.
The consumers for the .CONTROL queue and the .DONE queue should be the same process, so it can track the expected and completed task counts. Once everything is completed, it can fire the event to trigger Task #3.
This approach keeps everything 'online', and you can also time out the .CONTROL and .DONE reader if too much time passes before the task completes.
Queue deletion can be done using ActiveMQ destination GC, or as a clean-up step in the .CONTROL/.DONE reader in the cases where everything completes successfully.
Advantages:
No infinite blocking consumers
No infinite open transactions
State of the TASK is online and observable via the presence of queues and queue metrics-- queue size, enqueue count, dequeue count
The entire solution can be multi-threaded; the only requirement is that, for a given task, the .CONTROL/.DONE listener is the same consumer, but multiple tasks can have individual .CONTROL/.DONE listeners to scale.
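As a rough sketch, the .CONTROL/.DONE tracker could look like this in plain JMS (the connectionFactory, the 'expectedCount' property name and the Task 3 trigger queue are illustrative assumptions, not a fixed API):

    // One tracker per task run (e.g. 34FDSX): read the expected total from the
    // .CONTROL queue, count 'done' messages on the .DONE queue, then fire Task 3.
    // JMSException handling omitted for brevity.
    Connection conn = connectionFactory.createConnection();
    conn.start();
    Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
    String taskId = "34FDSX";
    MessageConsumer control = session.createConsumer(session.createQueue("TASK." + taskId + ".CONTROL"));
    MessageConsumer done = session.createConsumer(session.createQueue("TASK." + taskId + ".DONE"));

    Message ctl = control.receive();                     // control message with the expected total
    int expected = ctl.getIntProperty("expectedCount");  // illustrative property name

    int completed = 0;
    while (completed < expected) {
        Message d = done.receive(60_000);                // timeout so a stalled task is noticed
        if (d == null) break;                            // too long without progress: escalate
        completed++;
    }
    if (completed == expected) {
        MessageProducer trigger = session.createProducer(session.createQueue("TASK3"));
        trigger.send(session.createTextMessage(taskId)); // trigger Task #3
    }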
The question here is a bit vague so my answer will have to be a bit vague as well.
Each of the million independent subtasks for "Task 2" can be represented by a single message. All these messages can be in the same queue. You can spin up as many consumers as you want and process all these messages (i.e. perform all the subtasks). Just ensure that these consumers either use client-acknowledge mode or a transacted session so that the message is not removed from the queue until they are done processing the message. Once there are no more messages in the queue then you know "Task 2" is done.
To detect when the queue is empty you can have a "special" consumer on the queue that periodically opens a transacted session and tries to consume a message from the queue. If the consumer receives a message then you can rollback the transacted session to put the message back on the queue and you know that the queue is not empty (i.e. "Task 2" is not done). If the consumer doesn't receive a message then you know the queue is empty and you can send another message indicating this. You could launch this special consumer as part of "Task 2" after all the messages for the subtasks have been sent to avoid detecting an empty queue prematurely.
To be clear, this is a simple solution. You could certainly add more complexity depending on your requirements, but your question just outlined the basic problem so it's unclear what other requirements you have (if any).
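A minimal sketch of that "special" consumer, assuming an existing JMS connection conn and an illustrative queue name:

    // Probe whether the Task 2 queue is empty. If a message comes back, roll it
    // back onto the queue: Task 2 is not done yet.
    Session txSession = conn.createSession(true, Session.SESSION_TRANSACTED);
    MessageConsumer probe = txSession.createConsumer(txSession.createQueue("TASK2.DATA"));

    Message m = probe.receive(1000);   // short receive; null means the queue looks empty
    if (m != null) {
        txSession.rollback();          // put the message back; Task 2 still in progress
    } else {
        txSession.commit();
        // queue is empty: send the "Task 2 complete" message here
    }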
I'm implementing a daily job which gets data from a MongoDB (around 300K documents) and, for each of them, publishes a message on a RabbitMQ queue.
On the other side I have some consumers on the same queue, which ideally should work in parallel.
Everything is working, but not as well as I would like, especially regarding consumer performance.
This is how I declare the queue:
rabbitMQ.getChannel().queueDeclare(QUEUE_NAME, true, false, false, null); // durable=true, exclusive=false, autoDelete=false, no extra arguments
This is how the publishing is done:
rabbitMQ.getChannel().basicPublish("", QUEUE_NAME, null, body.getBytes()); // default exchange, routing key = queue name
So the channel used to declare the queue is used to publish all the messages.
And this is how the consumers are instantiated in a for loop (10 in total, but it can be any number):
Channel channel = rabbitMQ.getConnection().createChannel(); // a dedicated channel per consumer
MyConsumer consumer = new MyConsumer(customMapper, channel, subscriptionUpdater);
channel.basicQos(1); // also tried with 0, 10, 100, ...
channel.basicConsume(QUEUE_NAME, false, consumer); // autoAck=false: manual acknowledgements
So for each consumer I create a new channel and this is confirmed by logs:
...
com.rabbitmq.client.impl.recovery.AutorecoveringChannel#bdd2027
com.rabbitmq.client.impl.recovery.AutorecoveringChannel#5d1b9c3d
com.rabbitmq.client.impl.recovery.AutorecoveringChannel#49a26d19
...
As far as I've understood from my very short RabbitMQ experience, this should guarantee that all the consumers are called.
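For reference, a manual-ack consumer like MyConsumer typically has this shape (a hypothetical sketch; the real class also takes the mapper and updater shown above, and process() stands in for the actual work):

    public class MyConsumer extends DefaultConsumer {
        public MyConsumer(Channel channel) {
            super(channel);
        }

        @Override
        public void handleDelivery(String consumerTag, Envelope envelope,
                                   AMQP.BasicProperties properties, byte[] body) throws IOException {
            try {
                process(new String(body, StandardCharsets.UTF_8));       // 0.5 to 1.2 s of work
                getChannel().basicAck(envelope.getDeliveryTag(), false); // ack this delivery only
            } catch (Exception e) {
                getChannel().basicNack(envelope.getDeliveryTag(), false, true); // requeue on failure
            }
        }
    }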
By the way, consumers need between 0.5 and 1.2 seconds to complete their task; I have spotted only a very few that take around 3 seconds.
I have two separate queues, and I repeat the setup above for each of them (using the same RabbitMQ connection).
So, I have tested publishing 100 messages for each queue. Both of them have 10 consumers with qos=1.
I didn't expect a delivery/consume rate of exactly 10/s, but instead I noticed:
the actual rates are only around 0.4 and 1.0 messages/s.
at least all the consumers bound to the queue have received a message, but it doesn't look like "fair dispatching".
it took about 3 minutes 30 seconds to consume all the messages on both queues.
Am I missing the main concept of threading within RabbitMQ? Or is there some specific configuration that might still be at its default value?
I've only been at this for a few days, so that may well be the case.
Please notice that I'm in the fortunate position where I can control both publishing and consuming parts :)
I'm using RabbitMQ 3.7.3 locally, so network latency shouldn't be an issue.
Thanks for your help!
The setup of RabbitMQ channels and consumers was correct in the end: one channel for each consumer.
The problem was that the consumers were calling a synchronized method to find and update a MongoDB document.
This was delaying the execution of some consumers; even worse, the more consumers I added (thinking it would speed up processing), the lower the message rate/s I got.
I have moved the MongoDB part to the publishing side, where I don't have to care about synchronization because it's done in sequence by a single publisher. The delivery rate/s decreased slightly, but now with just 5 consumers I easily reach an ack rate of 50-60/s.
Lessons learnt:
create a separate channel for the publisher.
create a separate channel for each consumer.
let RabbitMQ manage threading for the consumers (--> you can instantiate them on the main thread).
(if possible) back off publishing to let the queues spend 100% of their time serving consumers.
set a qos > 1 for each consumer channel. But this really depends on your scenario and architecture: you must do some performance tests.
As a general rule:
(1) calculate/estimate delivery time.
(2) calculate/estimate ack time.
(3) calculate/estimate consumer time.
qos = ((1) + (2) + (3)) / (3)
This will give you an initial qos value to test and tweak based on your scenario. The final goal is to have 100% utilization for all the available consumers.
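For example, assuming (hypothetically) a delivery time of 50 ms, an ack time of 50 ms and a consumer time of 100 ms: qos = (50 + 50 + 100) / 100 = 2, i.e. each consumer channel should prefetch 2 messages so the next message is already local by the time the current one is acked.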
We are seeing unexpected rebalances in Java Kafka consumers, described below. Do these problems sound familiar to anybody? Any tips on APIs or debug techniques to figure out rebalance causes?
Two processes are reading a topic. Sometimes all partitions on the topic get rebalanced to a single reader process. After restarting both processes, partitions get evenly balanced.
Two processes are reading a topic. Sometimes a long sequence of rebalances bounces partitions from reader to reader. We call pause/resume on consumers for backpressure, which should prevent this.
Two processes are reading a topic. Sometimes a rebalance happens when it looks like both processes are reading ok. Afterwards, reading works ok, but it's a hiccup in processing.
We expect partitions would not rebalance without also seeing some cause or failure.
Sometimes poll() gets stuck (exceeds the timeout) and we use wakeup() and close(), then create new consumers. Sometimes coordinator heartbeat threads keep running after consumers are closed (we've seen thousands). The timing seems unrelated to rebalances, so rebalances seem like a separate problem, but maybe heartbeats are hitting an unlogged network problem.
We use a ConsumerRebalanceListener to log and process certain rebalances, but Kafka APIs don't seem to expose data about the cause of rebalances.
The rebalances are intermittent and hard to reproduce. They have happened at message rates anywhere from 10,000 to 80,000 per second. We see no obvious errors in the logs.
Our read loop is trivial - basically "while running, poll with timeout and error handling, then enqueue received messages".
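For reference, the loop has roughly this shape (a simplified sketch, not our exact code; the enqueue step is illustrative):

    while (running) {
        try {
            ConsumerRecords<byte[], byte[]> records = consumer.poll(1000); // 1000 ms timeout
            for (ConsumerRecord<byte[], byte[]> record : records) {
                enqueue(record);   // hand off; processing happens on a separate thread
            }
        } catch (WakeupException e) {
            break;                 // wakeup() is called when poll() is stuck; close and recreate
        }
    }
    consumer.close();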
People have asked good related questions, but the answers didn't help us:
Conditions in which Kafka Consumer (Group) triggers a rebalance
What exactly IS Kafka Rebalancing?
Continuous consumer group rebalancing with more consumers than partitions
Configuration:
Kafka 0.10.1.0 (We've started trying 1.0.0 & don't have test results yet)
Java 8 brokers and clients
2 brokers, 1 zookeeper, stable running processes & no additions
5 topics, with 2 somewhat busy topics. The rebalances happen on a busy one (topic "A").
Topic A has 16 partitions and replication 2, and is created before consumers start.
One process writes to topic A; two processes read from topic A.
Each reader process runs 16 consumers. Some consumers are idle when 16 partitions evenly balance.
The consumer threads do little work between polls. Message processing happens asynchronously, on a separate thread from the consumer.
All the consumers for topic A are in the same consumer group.
The timeout for KafkaConsumer.poll() is 1000 milliseconds.
The configuration that affects rebalance is:
max.poll.interval.ms=50000
max.poll.records=100
request.timeout.ms=40000
session.timeout.ms=20000
We use defaults for these:
heartbeat.interval.ms=3000
(broker) group.max.session.timeout.ms=300000
(broker) group.min.session.timeout.ms=6000
Check the GC log, and make sure there are no frequent full GCs, which would prevent the heartbeat thread from working.
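On Java 8 (as used here), a GC log can be enabled with the standard HotSpot flags, for example:

    -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log

A stop-the-world full GC also pauses the heartbeat thread, so pauses approaching session.timeout.ms (20 seconds here) will cause the coordinator to evict the consumer and trigger a rebalance.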
Previously, when I used the single-producer mode of the Disruptor, e.g.
new Disruptor<ValueEvent>(ValueEvent.EVENT_FACTORY,
2048, moranContext.getThreadPoolExecutor(), ProducerType.Single,
new BlockingWaitStrategy())
the performance was good. Now I am in a situation where multiple threads write to a single ring buffer, and I found that ProducerType.Multi makes the code several times slower than single-producer mode. That poor performance is not acceptable to me. So should I use single-producer mode while multiple threads invoke the same event-publish method under a lock; is that OK? Thanks.
I'm somewhat new to the Disruptor, but after extensive testing and experimenting, I can say that ProducerType.MULTI is more accurate and faster for 2 or more producer threads.
With 14 producer threads on a MacBook, ProducerType.SINGLE shows more events published than consumed, even though my test code waits for all producers to end (which they do after a 10 s run) and then waits for the disruptor to end. Not very accurate: where do those additional published events go?
Driver start: PID=38619 Processors=8 RingBufferSize=1024 Connections=Reuse Publishers=14[SINGLE] Handlers=1[BLOCK] HandlerType=EventHandler<Event>
Done: elpased=10s eventsPublished=6956894 eventsProcessed=4954645
Stats: events/sec=494883.36 sec/event=0.0000 CPU=82.4%
Using ProducerType.MULTI, fewer events are published than with SINGLE, but more events are actually consumed in the same 10 seconds. And with MULTI, all of the published events are consumed, just what I would expect given the careful way the driver shuts itself down after the elapsed time expires:
Driver start: PID=38625 Processors=8 RingBufferSize=1024 Connections=Reuse Publishers=14[MULTI] Handlers=1[BLOCK] HandlerType=EventHandler<Event>
Done: elpased=10s eventsPublished=6397109 eventsProcessed=6397109
Stats: events/sec=638906.33 sec/event=0.0000 CPU=30.1%
Again: 2 or more producers: Use ProducerType.MULTI.
By the way, each Producer publishes directly to the ring buffer by getting the next slot, updating the event, and then publishing the slot. And the handler gets the event whenever its onEvent method is called. No extra queues. Very simple.
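That claim/update/publish sequence is the standard RingBuffer protocol, roughly as follows (the event's set method is illustrative):

    long seq = ringBuffer.next();                // claim the next slot (blocks if the buffer is full)
    try {
        ValueEvent event = ringBuffer.get(seq);  // the preallocated event in that slot
        event.set(value);                        // update it in place (illustrative setter)
    } finally {
        ringBuffer.publish(seq);                 // make the slot visible to the handler
    }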
IMHO, a single producer accessed by multiple threads with a lock won't resolve your problem, because it simply shifts the locking from the Disruptor side into your own program.
The solution to your problem depends on the type of event model you need, i.e. whether you need the events consumed chronologically, merged, or with some other special requirement. Since you are dealing with a disruptor and multiple producers, that sounds to me very much like an FX trading system :-) Anyway, based on my experience, assuming you need chronological order per producer but don't care about mixing events between producers, I would recommend a queue-merging thread. The structure is:
Each producer produces data and puts it into its own named queue.
A worker thread constantly examines the queues; from each queue it removes one or several items and hands them to the single producer of your single-producer disruptor (see the sketch below).
Note that in the above scenario,
Each producer queue is a single-producer, single-consumer queue.
The disruptor is a single-producer, multi-consumer disruptor.
Depending on your needs, to avoid a forever-running thread: if the merging thread finds all queues empty for, say, 100 consecutive runs, it can set some flag and wait(), and the event producers can notify() it when they see it is waiting.
I think this resolves your problem. If not, please post your required event-processing pattern and let's see.
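A rough sketch of that merging thread, under the assumptions above (producerQueues, Event, copyFrom and awaitWork are illustrative, and running would be a volatile flag):

    List<Queue<Event>> producerQueues = new ArrayList<>(); // one SPSC queue per producer

    Runnable merger = () -> {
        int emptyRuns = 0;
        while (running) {
            boolean sawWork = false;
            for (Queue<Event> q : producerQueues) {
                Event ev;
                while ((ev = q.poll()) != null) {      // drain this producer's queue in order
                    long seq = ringBuffer.next();      // safe: only this thread publishes
                    ringBuffer.get(seq).copyFrom(ev);  // copy payload into the preallocated slot
                    ringBuffer.publish(seq);
                    sawWork = true;
                }
            }
            if (sawWork) {
                emptyRuns = 0;
            } else if (++emptyRuns >= 100) {           // all queues empty for 100 runs
                awaitWork();                           // wait(); producers notify() on new data
                emptyRuns = 0;
            }
        }
    };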
I'm currently adding JMS support to an application-server-like framework. The JMS will be implemented by HornetQ (stand-alone broker, HornetQ jars on the server's classpath), but there is neither JBoss nor Spring nor anything else that would provide MDBs.
The next step is to add a message listener to an XA queue that would allow parallel processing of incoming messages. Some messages would initiate long-running tasks, so the basic idea is to spawn worker threads from the onMessage method.
On my long journey through the internet I came across a discussion where one of the participants mentioned that he would not do that, but would instead use an extra internal queue for the tasks: the (single-threaded) message listener would simply grab the messages from the inbound queue and create new messages on an internal queue, where, at the other end of that internal queue, some worker threads fight over the incoming messages. Inbound messages would then be acknowledged once they're "copied" to the internal queue (which is OK for me).
Unfortunately they don't say why it would be better not to spawn worker threads from the onMessage method; maybe because the listener would block while all threads from the pool are busy. So I'm looking for the pros and cons of the two design decisions:
Start worker threads from the onMessage method of the message listener
Use an internal queue to "send messages to the worker threads"
Transaction limits aside, whether or not to have multiple threads (or processes) reading from a queue simply comes down to whether or not the message order is important. Obviously, if the order is important, then a single thread naturally maintains that order, while multiple threads provide no such guarantee.
What you will normally find is that order is important, but only across a subset of all the messages. In this scenario, if a single thread isn't performant enough, you need to get those messages off the queue and re-queued in as short a time as possible, because to preserve the order you'll have to use a single thread reading from the initial queue; hence the use of one or more internal queues. The problem this incurs is that the transaction will be closed before the messages are fully processed, so you need some sort of temporary storage to ensure messages don't get dropped if the process falls over before the processing has taken place.
If, as your question suggests, you're not too worried about dropping messages, then java.util.concurrent.BlockingQueue sounds like what you need for the internal queues, with a single thread servicing each.
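A minimal sketch of that arrangement, assuming the session uses AUTO_ACKNOWLEDGE and process() stands in for the long-running task:

    // Single-threaded listener copies inbound payloads to a bounded internal
    // queue; one worker thread services it, preserving order for that subset.
    BlockingQueue<String> internal = new LinkedBlockingQueue<>(1000); // bounded: backpressure

    MessageListener listener = message -> {
        try {
            internal.put(((TextMessage) message).getText()); // copy the payload out
            // returning normally acknowledges the inbound message (AUTO_ACKNOWLEDGE)
        } catch (JMSException | InterruptedException e) {
            throw new RuntimeException(e); // let the provider deal with redelivery
        }
    };

    Runnable worker = () -> {
        try {
            while (true) {
                process(internal.take());  // the potentially long-running task
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    };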