I have a route as shown below. The route polls a directory at a regular interval and reads a large .csv file. It then splits the file into chunks of 1000 lines and sends them to a SEDA queue (firstQueue). I have 15 concurrent consumers on this SEDA queue.
route.split().tokenize("\n", 1000).streaming().to("seda:firstQueue?concurrentConsumers=15").process(myProcessor).to("seda:secondQueue?concurrentConsumers=15").process(anotherMyProcessor);
1) What do 15 concurrent consumers mean - do 15 threads read data from the SEDA queue and pass it to one instance of myProcessor? Or are 15 separate instances of myProcessor created, each acting on its own copy of the data? Note that myProcessor is a singleton; what will happen if I change it to prototype?
2) Is it possible that two or more threads pick up the same data and pass it to myProcessor? Or is it guaranteed that no two threads will receive the same data?
Appreciate a quick response. Thanks!
My Camel is a bit rusty, but I'm pretty sure that:
There are 15 threads running. Each reads a message from the queue and calls myProcessor. There is only one instance of myProcessor, so you need to make sure that it is thread safe. I've never tried it, but I don't believe changing the scope to prototype will make any difference.
Two threads should not pick up the same message from the queue. In normal running, each message gets processed just once. However, there are error conditions that may result in the same message being processed twice, the most obvious one being that you restart the app partway through processing the file.
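The consumption model above can be sketched with plain `java.util.concurrent` primitives. This is not Camel's actual SEDA implementation, just a minimal stdlib analogy: 15 worker threads compete for chunks on one queue, and all of them call the same shared processor instance, which is why that instance must be thread safe. `SharedProcessor` and the chunk names are hypothetical stand-ins.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the singleton myProcessor: one shared instance,
// so any mutable state it holds must be thread safe (here an AtomicInteger).
class SharedProcessor {
    final AtomicInteger processed = new AtomicInteger();
    void process(String chunk) { processed.incrementAndGet(); }
}

public class SedaAnalogy {
    // Returns how many chunks the single shared processor instance handled.
    static int run(int consumers, int chunks) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < chunks; i++) queue.put("chunk-" + i);

        SharedProcessor processor = new SharedProcessor(); // one instance, like a singleton bean
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int t = 0; t < consumers; t++) {
            pool.submit(() -> {
                String chunk;
                // poll() hands each chunk to exactly one thread; no duplicates
                while ((chunk = queue.poll()) != null) processor.process(chunk);
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return processor.processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(SedaAnalogy.run(15, 1000)); // each chunk processed exactly once
    }
}
```

Every chunk is delivered to exactly one of the 15 threads, and the total processed count equals the number of chunks enqueued - no duplication in normal running, matching the answer above.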
I'm trying to see difference between DirectMessageListener and SimpleMessageListener. I have this drawing just to ask if it is correct.
Let me try to describe how I understood it and maybe you tell me if it is correct.
In front of spring-rabbit there is the rabbit-client Java library, which connects to the RabbitMQ server and delivers messages to the spring-rabbit library. This client has a ThreadPoolExecutor (which in this case has, I think, 16 threads). So it does not matter how many queues there are in RabbitMQ - if there is a single connection, I get 16 threads. These same threads are reused if I use DirectMessageListener, and the handler method listen is executed on all of these 16 threads when messages arrive. So if I do something complex in the handler, the rabbit-client must wait for a thread to become free in order to fetch the next message. Also, if I increase setConsumersPerQueue to, let's say, 20, it will create 20 consumers per queue, but not threads. Will these 20*5 consumers in my case all reuse the 16 threads offered by the ThreadPoolExecutor?
SimpleMessageListener, on the other hand, has its own threads. If concurrentConsumers == 1 (I guess the default, as in my case) it has only one thread. Whenever there is a message on any of the secondUseCase* queues, the rabbit-client library will use one of its 16 threads to forward the message to the single internal thread that I have in SimpleMessageListener. As soon as it is forwarded, the rabbit-client thread is freed and can go back to fetching more messages from the server.
Your understanding is correct.
The main difference is that, with the DMLC, all listeners in all listener containers are called on the shared thread pool in the amqp-client (you can increase the 16 if needed). You need to ensure the pool is large enough to handle your expected concurrency across all containers; otherwise you will get starvation.
It's more efficient because threads are shared.
With the SMLC, you don't have to worry about that, but at the expense of having a thread per concurrency. In that case, a small pool in the amqp-client will generally be sufficient.
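The starvation risk mentioned above can be demonstrated with a plain stdlib sketch (no Spring AMQP involved). Assume a shared 2-thread pool standing in for the amqp-client executor: while two long-running listeners occupy both threads, a third container's listener cannot run at all until one of them finishes.

```java
import java.util.concurrent.*;

public class SharedPoolStarvation {
    // Returns true if a quick listener task was starved while two slow
    // listeners occupied both threads of the shared pool.
    static boolean demo() throws Exception {
        ExecutorService shared = Executors.newFixedThreadPool(2); // the "amqp-client" pool
        CountDownLatch slowStarted = new CountDownLatch(2);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 2; i++) {
            shared.submit(() -> {                  // two long-running listeners
                slowStarted.countDown();
                release.await();                   // hold the thread until released
                return null;
            });
        }
        slowStarted.await();                       // both pool threads are now busy

        Future<?> quick = shared.submit(() -> {}); // a third container's listener
        boolean starved;
        try {
            quick.get(200, TimeUnit.MILLISECONDS);
            starved = false;
        } catch (TimeoutException e) {
            starved = true;                        // no free thread: the listener waits
        }
        release.countDown();                       // free the pool
        quick.get();                               // now it runs immediately
        shared.shutdown();
        return starved;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // true: the pool was too small for the concurrency
    }
}
```

With the SMLC each container owns its threads, so the equivalent situation cannot arise - at the cost of one thread per unit of concurrency.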
Sometimes, due to external problems, I need to requeue a message with basic.reject and requeue = true.
But I don't need to consume it immediately, because it will likely fail again in a short time. If I continuously requeue it, this may result in an infinite requeue loop.
So I need to consume it later, say one minute later.
And I also need to know how many times the message has been requeued, so that I can stop requeuing it and instead reject it to mark it as failed.
PS: I am using Java client.
There are multiple solutions to point 1.
The first is the one chosen by Celery (a Python producer/consumer library that can use RabbitMQ as a broker). Inside your message, add a timestamp at which the task should be executed. When your consumer gets the message, don't ack it; check its timestamp. Once the timestamp is reached, the worker can execute the task. (Note that the worker can continue working on other tasks instead of waiting.)
This technique has some drawbacks: you have to increase the QoS per channel to an arbitrary value, and if your worker is already working on a long-running task, the delayed task won't be executed until the first task has finished.
A second technique is RabbitMQ-only and much more elegant. It takes advantage of dead-letter exchanges and message TTLs. You create a new queue which isn't consumed by anybody. This queue has a dead-letter exchange that forwards messages to the consumer queue. When you want to defer a message, ack it (or reject it without requeue) on the consumer queue and copy it into the dead-lettered queue with a TTL equal to the delay you want (say, one minute). At (roughly) the end of the TTL, the deferred message will magically land in the consumer queue again, ready to be consumed. The RabbitMQ team has also made the Delayed Message Plugin (marked as experimental yet fairly stable, and potentially suitable for production use as long as you are aware of its limitations; it has serious limitations in terms of scalability and reliability in case of failover). You might decide whether you really want to use it in production, or whether you prefer to stick to the manual way, which is limited to one TTL per queue.
Point 2 just requires putting a counter in your message and handling it inside your app. You can put this counter in a header or directly in the body.
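Both points can be sketched together with stdlib types instead of the RabbitMQ Java client. This is only a minimal in-process analogy, assuming a hypothetical `MAX_RETRIES` limit: a `DelayQueue` plays the role of the TTL'd dead-letter queue, and the retry counter travels with the message, just as a header would.

```java
import java.util.concurrent.*;

public class RetryWithDelay {
    static final int MAX_RETRIES = 3; // hypothetical retry budget

    // A message carrying its own retry counter and a "TTL" (ready-at time).
    static class DelayedMsg implements Delayed {
        final String body; final int attempts; final long readyAt;
        DelayedMsg(String body, int attempts, long delayMs) {
            this.body = body; this.attempts = attempts;
            this.readyAt = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMs);
        }
        public long getDelay(TimeUnit unit) {
            return unit.convert(readyAt - System.nanoTime(), TimeUnit.NANOSECONDS);
        }
        public int compareTo(Delayed o) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS), o.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    // Simulates consuming a message that always fails; returns total delivery attempts.
    static int consumeUntilGiveUp(long delayMs) throws InterruptedException {
        DelayQueue<DelayedMsg> queue = new DelayQueue<>(); // the "delay" queue
        queue.put(new DelayedMsg("payload", 0, 0));
        int deliveries = 0;
        DelayedMsg msg;
        while ((msg = queue.poll(1, TimeUnit.SECONDS)) != null) {
            deliveries++;
            boolean failed = true; // pretend the external dependency is still down
            if (failed) {
                if (msg.attempts + 1 >= MAX_RETRIES) {
                    break; // counter exhausted: reject without requeue (dead-letter it)
                }
                // "requeue" one delay period later, carrying the attempt counter
                queue.put(new DelayedMsg(msg.body, msg.attempts + 1, delayMs));
            }
        }
        return deliveries;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(consumeUntilGiveUp(50)); // 3 attempts, then give up
    }
}
```

In the real broker setup, the `DelayQueue` is replaced by the TTL'd queue with a dead-letter exchange, and the counter would be a message header incremented on each republish.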
I'm using Spring Integration and I need to group messages in batches of 10k. I don't want to store them in a List, since later 10k could become much bigger, and persistent storage is also not an option for me. I just want several threads to send messages into a single thread where I can count them and write them to disk in files of 10k lines each. After the counter reaches 10k, I create a new file, set the counter to zero, and so on. It would work fine with a direct channel, but how do I tell several threads (I'm using
<int:dispatcher task-executor="executor" />
) to send messages into a single thread? Thanks
You can achieve this with a QueueChannel. Any number of threads can send messages to it concurrently. On the other side you just configure a PollingConsumer with a fixed-delay poller - single-threaded, as you requested. That is, a poller with fixed-delay, plus everything downstream wired with DirectChannels, will run in a single thread. Your count-and-rollover logic can therefore live there.
There's nothing more to show, because the configuration is straightforward: different services send messages to the same QueueChannel, and the fixed-delay poller ensures single-threaded reading for you.
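The same many-producers, one-consumer shape can be sketched with a plain `BlockingQueue` standing in for the QueueChannel (this is a stdlib analogy, not Spring Integration API; the batch size and producer counts are arbitrary):

```java
import java.util.*;
import java.util.concurrent.*;

public class SingleWriterRollover {
    // Drains messages on one thread and "rolls a file" every batchSize lines;
    // returns the sizes of the batches that would have been written.
    static List<Integer> run(int producers, int perProducer, int batchSize) throws Exception {
        BlockingQueue<String> channel = new LinkedBlockingQueue<>(); // like a QueueChannel
        ExecutorService senders = Executors.newFixedThreadPool(producers);
        for (int p = 0; p < producers; p++) {
            senders.submit(() -> {
                for (int i = 0; i < perProducer; i++) channel.add("line"); // concurrent sends
            });
        }
        senders.shutdown();
        senders.awaitTermination(10, TimeUnit.SECONDS);

        List<Integer> batches = new ArrayList<>();
        int count = 0;
        // Single consumer thread: only this loop touches the counter, so no locking.
        while (channel.poll() != null) {
            count++;
            if (count == batchSize) { batches.add(count); count = 0; } // roll to a new file
        }
        if (count > 0) batches.add(count); // flush the final partial batch
        return batches;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(4, 2500, 1000)); // ten full batches of 1000
    }
}
```

Because only the polling side reads the counter, the rollover logic needs no synchronization - exactly the property the fixed-delay poller gives you in Spring Integration.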
I have a Java program that puts messages onto a queue. On the other side of the queue I have 10-15 consumers, any ONE of which should read each message and process it. Whenever any of the 10-15 consumers becomes free, it picks up the next message from the queue.
Basically, a consumer can pick up a message from the queue whenever it is free, and only one consumer must pick it up (without any synchronization blocks or the like).
Also, on the sender's side, can I pause sending messages into the queue if the queue becomes full (or reaches a certain threshold)?
I am really new to the JMS API. Apologies if this is a newbie question.
Thanks!!
I have to send messages into a queue and I have 20 threads running as
consumers, who can pick up the data from the queue (once they are
free). So when each thread gets free it goes to the queue, checks if
data is there, picks it up, and so on. Is this doable?
Yes, it's doable - that's the standard way of working with JMS queues. An alternative would be topics, but with topics every listener processes every message, not just one, so queues are what you want. That said, you usually don't consume with raw threads (I'm not even sure what that means) but with message-driven beans; you might consider using them. MDBs run in their own threads anyway.
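Both halves of the question - competing consumers and pausing the sender when the queue is full - can be sketched without a JMS broker using a bounded stdlib queue. This is an analogy, not JMS API: `ArrayBlockingQueue.put()` blocks when capacity is reached, which is the back-pressure behaviour asked about (real JMS providers expose this through provider-specific flow-control settings instead).

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class CompetingConsumers {
    // N consumers compete for messages on a bounded queue; exactly one
    // consumer receives each message, with no explicit synchronization.
    static int run(int consumers, int messages, int capacity) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        AtomicInteger processed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(messages);
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int c = 0; c < consumers; c++) {
            pool.submit(() -> {
                while (true) {
                    Integer msg = queue.take(); // exactly one consumer gets each message
                    if (msg < 0) break;         // poison pill shuts this consumer down
                    processed.incrementAndGet();
                    done.countDown();
                }
                return null;
            });
        }
        // put() blocks whenever the queue is at capacity: the sender "pauses".
        for (int m = 0; m < messages; m++) queue.put(m);
        done.await();
        for (int c = 0; c < consumers; c++) queue.put(-1); // one pill per consumer
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(10, 100, 8)); // all 100 messages, each consumed once
    }
}
```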
Previously, when I use single-producer mode of disruptor, e.g.
new Disruptor<ValueEvent>(ValueEvent.EVENT_FACTORY,
2048, moranContext.getThreadPoolExecutor(), ProducerType.Single,
new BlockingWaitStrategy())
the performance is good. Now I am in a situation where multiple threads write to a single ring buffer, and I have found that ProducerType.Multi makes the code several times slower than single-producer mode. That performance is not acceptable to me. So should I keep single-producer mode and have multiple threads invoke the same event-publish method under a lock? Is that OK? Thanks.
I'm somewhat new to the Disruptor, but after extensive testing and experimenting, I can say that ProducerType.MULTI is more accurate and faster for 2 or more producer threads.
With 14 producer threads on a MacBook, ProducerType.SINGLE shows more events published than consumed, even though my test code waits for all producers to end (which they do after a 10s run) and then waits for the disruptor to end. Not very accurate: where do those additional published events go?
Driver start: PID=38619 Processors=8 RingBufferSize=1024 Connections=Reuse Publishers=14[SINGLE] Handlers=1[BLOCK] HandlerType=EventHandler<Event>
Done: elpased=10s eventsPublished=6956894 eventsProcessed=4954645
Stats: events/sec=494883.36 sec/event=0.0000 CPU=82.4%
Using ProducerType.MULTI, fewer events are published than with SINGLE, but more events are actually consumed in the same 10 seconds. And with MULTI, all of the published events are consumed - just what I would expect, given the careful way the driver shuts itself down after the elapsed time expires:
Driver start: PID=38625 Processors=8 RingBufferSize=1024 Connections=Reuse Publishers=14[MULTI] Handlers=1[BLOCK] HandlerType=EventHandler<Event>
Done: elpased=10s eventsPublished=6397109 eventsProcessed=6397109
Stats: events/sec=638906.33 sec/event=0.0000 CPU=30.1%
Again: 2 or more producers: Use ProducerType.MULTI.
By the way, each producer publishes directly to the ring buffer by claiming the next slot, updating the event, and then publishing the slot. The handler gets the event whenever its onEvent method is called. No extra queues. Very simple.
IMHO, a single producer accessed by multiple threads with a lock won't solve your problem, because it simply shifts the locking from the disruptor to your own program.
The right solution varies with the type of event model you need, i.e. whether events must be consumed chronologically, merged, or with some other special requirement. Since you are dealing with a disruptor and multiple producers, that sounds to me very much like an FX trading system :-) Anyway, based on my experience, assuming you need chronological order per producer but don't care about interleaving events between producers, I would recommend a queue-merging thread. The structure is:
Each producer produces data and puts it into its own named queue.
A worker thread constantly examines the queues. From each queue it removes one or several items and hands them to the single producer of your single-producer disruptor.
Note that in the above scenario:
Each producer queue is a single-producer, single-consumer queue.
The disruptor is a single-producer, multi-consumer disruptor.
Depending on your needs, to avoid a forever-running thread: if the worker finds all queues empty for, say, 100 consecutive passes, it can set a flag and wait(), and the event producers can notify() it when they see it is waiting.
I think this solves your problem. If not, please post the event-processing pattern you need and let's see.
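The queue-merging structure above can be sketched with stdlib queues (no Disruptor dependency here; `ConcurrentLinkedQueue` stands in for the per-producer SPSC queues, and a plain list stands in for the single-producer side of the ring buffer):

```java
import java.util.*;
import java.util.concurrent.*;

public class QueueMerger {
    // Each producer gets its own queue; one merger thread drains them in turn
    // into a single ordered sink (standing in for the single-producer disruptor).
    static List<String> run(int producers, int perProducer) throws InterruptedException {
        List<Queue<String>> queues = new ArrayList<>();
        for (int p = 0; p < producers; p++) queues.add(new ConcurrentLinkedQueue<>());

        ExecutorService pool = Executors.newFixedThreadPool(producers);
        for (int p = 0; p < producers; p++) {
            final int id = p;
            pool.submit(() -> { // each producer writes only to its own queue
                for (int i = 0; i < perProducer; i++) queues.get(id).add("p" + id + "-" + i);
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        // The single merger "thread": only this loop feeds the sink, so the
        // disruptor side sees exactly one producer.
        List<String> merged = new ArrayList<>();
        int expected = producers * perProducer;
        while (merged.size() < expected) {
            for (Queue<String> q : queues) {     // examine each queue in turn
                String e = q.poll();
                if (e != null) merged.add(e);    // per-queue FIFO order is preserved
            }
        }
        return merged;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> out = run(3, 4);
        System.out.println(out.size()); // 12 events, chronological per producer
    }
}
```

Events from different producers interleave arbitrarily, but each producer's own events stay in order - the chronological-per-producer guarantee described above.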