I want to process messages from a RabbitMQ queue in parallel. The queue is configured with autoAck=false. I am using the camel-rabbitmq component for Camel endpoints, which supports a threadPoolSize parameter, but this does not have the desired effect: messages are still processed serially off the queue, even with threadPoolSize=20.
From debugging through the code I can see that the threadPoolSize parameter is used to create an ExecutorService that is passed to the Rabbit ConnectionFactory as described here. This all looks good until you get into the Rabbit ConsumerWorkService. Here messages are processed in blocks of at most 16 messages. Each message in a block is processed serially, and then, if there is more work to do, the executor service is invoked with the next block. A code snippet of this is below. From this use of the executor service I can't see how the messages can be processed in parallel. The executor service only ever has one piece of work to perform at a time.
What am I missing?
private final class WorkPoolRunnable implements Runnable {
    public void run() {
        int size = MAX_RUNNABLE_BLOCK_SIZE;
        List<Runnable> block = new ArrayList<Runnable>(size);
        try {
            Channel key = ConsumerWorkService.this.workPool.nextWorkBlock(block, size);
            if (key == null) return; // nothing ready to run
            try {
                for (Runnable runnable : block) {
                    runnable.run();
                }
            } finally {
                if (ConsumerWorkService.this.workPool.finishWorkBlock(key)) {
                    ConsumerWorkService.this.executor.execute(new WorkPoolRunnable());
                }
            }
        } catch (RuntimeException e) {
            Thread.currentThread().interrupt();
        }
    }
}
RabbitMQ's documentation is not very clear about this but, even though the ConsumerWorkService is using a thread pool, this pool doesn't seem to be used in a way to process messages in parallel:
Each Channel has its own dispatch thread. For the most common use case of one Consumer per Channel, this means Consumers do not hold up other Consumers. If you have multiple Consumers per Channel be aware that a long-running Consumer may hold up dispatch of callbacks to other Consumers on that Channel.
(http://www.rabbitmq.com/api-guide.html)
This documentation suggests using one Channel per thread and, in fact, if you simply create as many Channels as the required level of concurrency, messages will be dispatched between the consumers linked to these channels.
I've tested with 2 channels and consumers: when 2 messages are in the queue, each consumer only picks one message at a time. The blocks of 16 messages you mentioned don't seem to interfere, which is a good thing.
As a matter of fact, Spring AMQP also creates several channels to process messages concurrently. This is done by:
setting SimpleMessageListenerContainer.setConcurrentConsumers(...): http://docs.spring.io/spring-amqp/docs/1.3.6.RELEASE/api/
and setting CachingConnectionFactory.setChannelCacheSize(...) accordingly: http://docs.spring.io/spring-amqp/docs/1.3.6.RELEASE/api/
I've also tested this to be working as expected.
If you have a single Channel instance, it's going to invoke its registered consumers serially as you correctly found out by examining ConsumerWorkService. There are 2 ways to overcome that:
Use multiple channels instead of one.
Use a single channel but implement the consumers in a special way: they should only pick up each incoming message from the queue and submit it as a task to an internal thread pool.
You can find more details in this post.
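The second option can be sketched without the RabbitMQ client at all. This is a minimal model under stated assumptions: HandOffConsumer, onMessage and runDemo are hypothetical names, onMessage stands in for the consumer callback that the channel invokes serially, and the counter stands in for the real processing and ack.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a consumer on a single channel: onMessage() is
// called serially by one dispatch thread and only hands the work to a pool.
final class HandOffConsumer {
    private final ExecutorService workers;
    private final AtomicInteger processed = new AtomicInteger();

    HandOffConsumer(ExecutorService workers) {
        this.workers = workers;
    }

    // Plays the role of the delivery callback: return quickly, process elsewhere.
    void onMessage(String body, CountDownLatch done) {
        workers.execute(() -> {
            try {
                processed.incrementAndGet(); // real processing of body and the ack would go here
            } finally {
                done.countDown();
            }
        });
    }

    // Pushes all messages through one serial "dispatch" loop, waits for the
    // pool to finish, and returns how many were processed (-1 on timeout).
    static int runDemo(List<String> messages, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        HandOffConsumer consumer = new HandOffConsumer(pool);
        CountDownLatch done = new CountDownLatch(messages.size());
        try {
            for (String m : messages) {
                consumer.onMessage(m, done);
            }
            return done.await(10, TimeUnit.SECONDS) ? consumer.processed.get() : -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        } finally {
            pool.shutdown();
        }
    }
}
```

The callback itself stays cheap, so the serial dispatch no longer limits throughput; the pool size becomes the effective level of concurrency.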
I have a Spring Boot application in which I have a single Kafka consumer.
I am using a DefaultKafkaConsumerFactory with the default consumer configuration. I have a ConcurrentKafkaListenerContainerFactory with concurrency set to 1, and I have a method annotated with @KafkaListener.
I am listening to a topic with 3 partitions and I have 3 such consumers deployed, each in a different application. Hence, each consumer is listening to one partition.
Let's say poll on the consumer is called under the hood and 40 records are fetched. Is each record then provided to the method annotated with @KafkaListener serially, i.e. record 1 is provided, the container waits until the method finishes processing, record 2 is provided, the container waits again, and so on?
Does the above happen, or is a separate thread created for every record obtained and the method invoked on that thread, so that the main thread does not block and can poll for records more quickly?
I would also like more clarity on what a message listener container is and what the eventual message listener is.
Thank you in advance.
In 1.3 and above there is a single thread per consumer; the next poll() is performed after the last message from the previous poll has been processed by the listener.
In earlier versions, there were two threads and a second (and possibly third) poll was performed while the listener thread is processing the first batch. This was required to avoid a rebalance due to a slow listener. The threading model was very complicated and we had to pause/resume the consumer when necessary. KIP-62 fixed the rebalance problem so we were able to use the much simpler threading model in use today.
Well, that is exactly Apache Kafka's position: it guarantees in-order processing of records from the same partition by handling them in the same thread. Therefore, when you distribute your topic with 3 partitions between 3 instances, each of them gets its own partition and does the polling in a single thread.
The KafkaMessageListenerContainer is an event-driven, self-controlling wrapper around KafkaConsumer. It calls poll() in a while (isRunning()) { loop, which is scheduled in a TaskExecutor:
this.listenerConsumerFuture = containerProperties
.getConsumerTaskExecutor()
.submitListenable(this.listenerConsumer);
It then passes the ConsumerRecords to the listener:
private void invokeListener(final ConsumerRecords<K, V> records) {
    if (this.isBatchListener) {
        invokeBatchListener(records);
    }
    else {
        invokeRecordListener(records);
    }
}
I am using a Kafka producer-consumer model in my framework. The record consumed at the consumer end is later indexed into Elasticsearch. I have a use case where, if ES is down, I have to pause the Kafka consumer until ES is up again. Once it is up, I need to resume the consumer and continue consuming from where I left off.
I don't think this can be achieved with @KafkaListener. Can anyone please give me a solution for this? I figured out that I need to write my own KafkaListenerContainer for this, but I am not able to implement it correctly. Any help would be much appreciated.
There are several possible solutions; one simple way would be to use the KafkaConsumer API. The KafkaConsumer implementation keeps track of the position in the topic that will be retrieved with the next call to poll(...). Your problem is that after you get the record from Kafka, you may be unable to insert it into Elasticsearch. In this case, you have to write a routine to reset the position of the consumer, which in your case will be consumer.seek(partition, consumer.position(partition)-1). This resets the position to the earlier one. At this point a good approach would be to pause the partition (this will enable the server to do some resource clean-up) and then poll ES (by whatever mechanism you desire). Once ES is available, call resume on the consumer and continue with your usual poll-insert cycle.
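The core of that cycle is "advance the position only after a successful insert". Here is a minimal in-memory sketch of just that idea; it does not use the Kafka client. RewindOnFailure and drain are hypothetical names, the List plays the role of a partition, the Predicate plays the role of the ES insert (returning false while ES is down), and maxAttempts is an assumed safety bound. With the real client, leaving the position unchanged corresponds to the seek call mentioned above, and a real loop would pause the partition and wait between attempts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// In-memory model of a poll-insert cycle that never skips a record.
final class RewindOnFailure {
    // Returns the records accepted by the sink, in log order.
    static List<String> drain(List<String> log, Predicate<String> sink, int maxAttempts) {
        List<String> indexed = new ArrayList<>();
        int position = 0;   // next record to read, like the consumer position
        int attempts = 0;
        while (position < log.size() && attempts < maxAttempts) {
            attempts++;
            String record = log.get(position);
            if (sink.test(record)) {
                indexed.add(record);
                position++;  // advance only after the insert succeeded
            }
            // On failure the position is left unchanged, so the same record
            // is read again on the next iteration (the "seek back" step).
        }
        return indexed;
    }
}
```

Because the position only moves on success, an outage of the sink delays records but never drops or reorders them.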
EDITED AFTER DISCUSSION
Create a Spring bean with the lifecycle methods specified. In the initialization method of the bean, instantiate your KafkaConsumer (retrieve the consumer configuration from any source). From that method, start a thread to interact with the consumer and update ES; the rest of the design is as described above. This is a single-thread model. For higher throughput, consider keeping the data retrieved from Kafka in a small in-memory queue, with a dispatcher thread that takes each message and gives it to a pooled thread for updating ES.
Rather than pausing the consumer, I would suggest retrying the same message again and again and committing the offset only once the message has been consumed successfully.
For example:
Annotate your method with @Retryable.
Wrap the method body in try/catch and throw a new exception in the catch block.
For the listener factory configuration, add these properties:
factory.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
factory.getContainerProperties().setAckOnError(false);
There are a couple of ways you can achieve this.
Method #1
Create your KafkaConsumer object inside a Thread and run an infinite while loop to consume events.
Once you have this set up, you can interrupt the thread and, in the while loop, check whether Thread.currentThread().isInterrupted() is true. If yes, break out of the loop and close the consumer.
Once you are done with your recovery activity, recreate the consumer with the same group ID. Do note, this may rebalance the consumer.
If you are using Python, the same thing can be achieved with a threading stop_event.
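A minimal sketch of the loop shape in Method #1, assuming a BlockingQueue as a stand-in for the Kafka consumer (InterruptibleLoop and consumeUntilInterrupted are hypothetical names). The worker polls until its interrupt flag is set; where the comment says so, a real implementation would close the consumer.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class InterruptibleLoop {
    // Starts a worker that consumes until interrupted, interrupts it once the
    // source is drained, and returns the number of events consumed.
    static int consumeUntilInterrupted(BlockingQueue<String> source) {
        AtomicInteger consumed = new AtomicInteger();
        Thread worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    String event = source.poll(50, TimeUnit.MILLISECONDS);
                    if (event != null) {
                        consumed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    // poll() clears the flag when it throws; restore it so the loop exits
                    Thread.currentThread().interrupt();
                }
            }
            // a real consumer would be closed here
        });
        worker.start();
        try {
            while (!source.isEmpty()) {
                Thread.sleep(10);
            }
            worker.interrupt();
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed.get();
    }
}
```

Restoring the interrupt flag in the catch block matters: a blocking call that throws InterruptedException clears the flag, and without the restore the loop condition would never become false.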
Method #2
Use the KafkaConsumer API's pause(partitions_list) function. It accepts Kafka partitions as input, so extract all the partitions assigned to the consumer and pass them to pause(partitions_list). The consumer will stop pulling data from these partitions.
After a certain time, you can use the resume(partitions_list) function to resume the consumer. This method will not rebalance consumers.
Note: If you are using the Spring Kafka client, this becomes a lot easier: you can start/stop the message listener container.
You can find a detailed explanation here.
@Autowired
private KafkaListenerEndpointRegistry registry;

@KafkaListener(id = "dltGroup", topics = "actualTopicNAme.DLT", autoStartup = "false")
public void dltListen(String in) {
    logger.info("Received from DLT: " + in);
}

public void startKafka() {
    // TODO if not running
    registry.getListenerContainer("dltGroup").start();
}

public void resumeKafka() {
    if (registry.getListenerContainer("dltGroup").isContainerPaused() ||
            registry.getListenerContainer("dltGroup").isPauseRequested()) {
        registry.getListenerContainer("dltGroup").resume();
    }
}

public void pauseKafka() {
    if (registry.getListenerContainer("dltGroup").isRunning()) {
        registry.getListenerContainer("dltGroup").pause();
    }
}
In this guide, https://www.rabbitmq.com/api-guide.html, the RabbitMQ team states:
Channels and Concurrency Considerations (Thread Safety)
Channel instances must not be shared between threads. Applications should prefer using a Channel per thread instead of sharing the same Channel across multiple threads. While some operations on channels are safe to invoke concurrently, some are not and will result in incorrect frame interleaving on the wire. Sharing channels between threads will also interfere with Publisher Confirms.
Thread safety is very important so I tried to be as diligent as possible, but here's the problem:
I have this application that receives messages from Rabbit. When a message is received, it processes it and then acks when it's done. The application can process just 2 items at the same time in a fixed thread pool with 2 threads. The QOS prefetch for Rabbit is set to 2, because I don't want to feed the app with more than it can handle in a time frame.
Now, my consumer's handleDelivery does the following:
Task run = new Task(JSON.parse(message));
service.execute(new TestWrapperThread(getChannel(),run,envelope.getDeliveryTag()));
At this point, you have already figured out that TestWrapperThread makes the channel.basicAck(deliveryTag, false); call as its last operation.
By my understanding of the documentation, this is incorrect and potentially harmful, because the channel is not thread safe and this behavior could break things. But what am I supposed to do instead? I have a few ideas, but they would definitely make everything more complex, and I'd like to find out whether that is really necessary.
Thanks in advance
I suppose you are using the Channel only for your consumer and not for other operations such as publishing.
In your case the only potential problem is here:
channel.basicAck(deliveryTag, false);
because you call it from two threads. That said, this operation is safe, as you can see in the Java code:
The class ChannelN.java calls:
public void basicAck(long deliveryTag, boolean multiple)
throws IOException
{
transmit(new Basic.Ack(deliveryTag, multiple));
}
see github code for ChannelN.java
the transmit method inside AMQChannel uses:
public void transmit(Method m) throws IOException {
synchronized (_channelMutex) {
transmit(new AMQCommand(m));
}
}
_channelMutex is declared as protected final Object _channelMutex = new Object(); and is created together with the class instance.
see github code for AMQChannel.java
EDIT
As you can read in the official documentation, "some" operations are thread-safe, but it is not clear which ones.
I studied the code, and I think there is no problem with calling the ACK from multiple threads.
Hope it helps.
EDIT2
I am also adding Nicolas's comment:
Note that consuming (basicConsume) and acking from more than one thread is a common rabbitmq pattern that is already used by the java client.
So you can use it safely.
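To illustrate why a mutex around transmit is enough for concurrent acks, here is a small stand-alone model; it is not the client's code. LockedChannel, its two-part "frame" format and framesIntact are assumptions made up for this sketch. Each ack writes two parts to a shared list while holding the lock, and the check verifies that no other thread's write ever landed between them.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a channel whose writes are serialized by a single mutex.
final class LockedChannel {
    private final Object channelMutex = new Object();
    private final List<String> wire = new ArrayList<>();

    // Each ack is written as two parts; the lock keeps them adjacent on the "wire".
    void basicAck(long deliveryTag) {
        synchronized (channelMutex) {
            wire.add("ack-header:" + deliveryTag);
            wire.add("ack-end:" + deliveryTag);
        }
    }

    // Acks from several threads at once, then checks that every header is
    // immediately followed by its own end marker.
    static boolean framesIntact(int threads, int acksPerThread) {
        LockedChannel channel = new LockedChannel();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final long base = (long) t * acksPerThread;
            Thread w = new Thread(() -> {
                for (int i = 0; i < acksPerThread; i++) {
                    channel.basicAck(base + i);
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        List<String> wire = channel.wire; // safe to read: all writers have been joined
        if (wire.size() != threads * acksPerThread * 2) {
            return false;
        }
        for (int i = 0; i < wire.size(); i += 2) {
            String header = wire.get(i);
            if (!header.startsWith("ack-header:")) {
                return false;
            }
            String tag = header.substring("ack-header:".length());
            if (!wire.get(i + 1).equals("ack-end:" + tag)) {
                return false;
            }
        }
        return true;
    }
}
```

Removing the synchronized block makes the check fail intermittently, which is the "incorrect frame interleaving" the documentation warns about for operations that are not guarded this way.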
I just read RabbitMQ's Java API docs, and found it very informative and straight-forward. The example for how to set up a simple Channel for publishing/consuming is very easy to follow and understand. But it's a very simple/basic example, and it left me with an important question: How can I set up 1+ Channels to publish/consume to and from multiple queues?
Let's say I have a RabbitMQ server with 3 queues on it: logging, security_events and customer_orders. So we'd either need a single Channel to have the ability to publish/consume to all 3 queues, or more likely, have 3 separate Channels, each dedicated to a single queue.
On top of this, RabbitMQ's best practices dictate that we set up 1 Channel per consumer thread. For this example, let's say security_events is fine with only 1 consumer thread, but logging and customer_order both need 5 threads to handle the volume. So, if I understand correctly, does that mean we need:
1 Channel and 1 consumer thread for publishing/consuming to and from security_events; and
5 Channels and 5 consumer threads for publishing/consuming to and from logging; and
5 Channels and 5 consumer threads for publishing/consuming to and from customer_orders?
If my understanding is misguided here, please begin by correcting me. Either way, could some battle-weary RabbitMQ veteran help me "connect the dots" with a decent code example for setting up publishers/consumers that meet my requirements here?
I think you have several issues with initial understanding. Frankly, I'm a bit surprised to see the following: both need 5 threads to handle the volume. How did you identify you need that exact number? Do you have any guarantees 5 threads will be enough?
RabbitMQ is tuned and time tested, so it is all about proper design
and efficient message processing.
Let's try to review the problem and find a proper solution. BTW, a message queue by itself does not guarantee that you have a really good solution. You have to understand what you are doing and also do some additional testing.
As you definitely know there are many layouts possible:
I will use layout B as the simplest way to illustrate the 1 producer, N consumers problem, since you are so worried about the throughput. BTW, as you might expect, RabbitMQ behaves quite well (source). Pay attention to prefetchCount; I'll address it later:
So the message processing logic is likely the right place to make sure you have enough throughput. Naturally, you can spawn a new thread every time you need to process a message, but eventually such an approach will kill your system. Basically, the more threads you have, the bigger the latency you'll get (you can check Amdahl's law if you want).
(see Amdahl’s law illustrated)
Tip #1: Be careful with threads, use ThreadPools (details)
A thread pool can be described as a collection of Runnable objects
(work queue) and a collection of running threads. These threads are
constantly running and are checking the work queue for new work. If
there is new work to be done they execute this Runnable. The thread
pool itself provides a method, e.g. execute(Runnable r), to add a new
Runnable object to the work queue.
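import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Main {
    private static final int NTHREDS = 10;

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(NTHREDS);
        for (int i = 0; i < 500; i++) {
            // MyRunnable is the task class from the quoted tutorial (not shown here)
            Runnable worker = new MyRunnable(10000000L + i);
            executor.execute(worker);
        }
        // Stop accepting new tasks; tasks already queued still run to completion
        executor.shutdown();
        // Wait until all queued tasks have finished (the timeout value is arbitrary)
        executor.awaitTermination(1, TimeUnit.HOURS);
        System.out.println("Finished all threads");
    }
}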
Tip #2: Be careful with message processing overhead
I would say this is an obvious optimization technique. It is likely you'll send small and easy-to-process messages. The whole approach is about smaller messages being continuously sent and processed. Big messages will eventually cause trouble, so it is better to avoid them.
So it is better to send tiny pieces of information, but what about processing? There is an overhead every time you submit a job. Batch processing can be very helpful in case of a high incoming message rate.
For example, let's say we have simple message processing logic and we do not want to pay thread-specific overhead every time a message is processed. To optimize that, a very simple CompositeRunnable can be introduced:
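import java.util.LinkedList;
import java.util.Queue;

class CompositeRunnable implements Runnable {
    protected Queue<Runnable> queue = new LinkedList<>();

    public void add(Runnable a) {
        queue.add(a);
    }

    @Override
    public void run() {
        // Run every collected task back to back on the current thread
        for (Runnable r : queue) {
            r.run();
        }
    }
}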
Or do the same in a slightly different way, by collecting messages to be processed:
import java.util.LinkedList;
import java.util.Queue;

class CompositeMessageWorker<T> implements Runnable {
    protected Queue<T> queue = new LinkedList<>();

    public void add(T message) {
        queue.add(message);
    }

    @Override
    public void run() {
        // Handle the whole batch in a single task submission
        for (T message : queue) {
            // process a message
        }
    }
}
In such a way you can process messages more effectively.
Tip #3: Optimize message processing
Despite the fact that you now can process messages in parallel (Tip #1) and reduce processing overhead (Tip #2), you still have to do everything fast. Redundant processing steps, heavy loops and so on might affect performance a lot. Please see this interesting case study:
Improving Message Queue Throughput tenfold by choosing the right XML Parser
Tip #4: Connection and Channel Management
Starting a new channel on an existing connection involves one network
round trip - starting a new connection takes several.
Each connection uses a file descriptor on the server. Channels don't.
Publishing a large message on one channel will block a connection
while it goes out. Other than that, the multiplexing is fairly transparent.
Connections which are publishing can get blocked if the server is
overloaded - it's a good idea to separate publishing and consuming
connections
Be prepared to handle message bursts
(source)
Please note that all the tips work perfectly together. Feel free to let me know if you need additional details.
Complete consumer example (source)
Please note the following:
channel.basicQos(prefetch) - As you saw earlier prefetchCount might be very useful:
This command allows a consumer to choose a prefetch window that
specifies the amount of unacknowledged messages it is prepared to
receive. By setting the prefetch count to a non-zero value, the broker
will not deliver any messages to the consumer that would breach that
limit. To move the window forwards, the consumer has to acknowledge
the receipt of a message (or a group of messages).
ExecutorService threadExecutor - you can specify properly configured executor service.
Example:
static class Worker extends DefaultConsumer {
    String name;
    Channel channel;
    String queue;
    int processed;
    ExecutorService executorService;

    public Worker(int prefetch, ExecutorService threadExecutor,
                  Channel c, String q) throws Exception {
        super(c);
        channel = c;
        queue = q;
        // Set the executor before subscribing so no delivery can see it as null
        executorService = threadExecutor;
        channel.basicQos(prefetch);
        channel.basicConsume(queue, false, this);
    }

    @Override
    public void handleDelivery(String consumerTag,
                               Envelope envelope,
                               AMQP.BasicProperties properties,
                               byte[] body) throws IOException {
        // Hand the delivery to the pool; the task acks when it is done
        Runnable task = new VariableLengthTask(this,
                                               envelope.getDeliveryTag(),
                                               channel);
        executorService.submit(task);
    }
}
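The prefetch window quoted above behaves much like a counting semaphore: a delivery takes a permit and an ack returns it. A minimal model of that behaviour, assuming hypothetical names (PrefetchWindow, tryDeliver, ack) that are not part of the RabbitMQ client:

```java
import java.util.concurrent.Semaphore;

// Toy model of a broker-side prefetch window for one consumer.
final class PrefetchWindow {
    private final Semaphore window;

    PrefetchWindow(int prefetchCount) {
        this.window = new Semaphore(prefetchCount);
    }

    // Returns true if another message may be pushed, i.e. fewer than
    // prefetchCount messages are currently unacknowledged.
    boolean tryDeliver() {
        return window.tryAcquire();
    }

    // Acknowledging one message moves the window forward by one.
    void ack() {
        window.release();
    }
}
```

With a prefetch of 2 and a 2-thread pool, as in the setup discussed earlier, the consumer never holds more messages than it has threads to work on them.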
You can also check the following:
Solution Architecting Using Queues?
Some queuing theory: throughput, latency and bandwidth
A quick message queue benchmark: ActiveMQ, RabbitMQ, HornetQ, QPID, Apollo…
How can I set up 1+ Channels to publish/consume to and from multiple queues?
You can implement this using threads and channels. All you need is a way to
categorize things, i.e. all the queue items from login, all the
queue items from security_events, etc. The categorization can be
achieved using a routingKey.
I.e.: every time you add an item to the queue you specify the routing
key. It will be attached as a property. This way you can get
the values for a particular event type, say logging.
The following code sample explains how to do this on the client side.
E.g.:
The routing key is used to identify the type of a message and to retrieve messages of that type.
For example, if you need to get all the messages of the type login,
then you must specify the routing key as login or some other keyword
that identifies it.
Connection connection = factory.newConnection();
Channel channel = connection.createChannel();
channel.exchangeDeclare(EXCHANGE_NAME, "direct");
String routingKey = "login";
channel.basicPublish(EXCHANGE_NAME, routingKey, null, message.getBytes());
You can look here for more details about the categorization.
Threads Part
Once the publishing part is done, you can move on to the threads part.
In this part you receive the published data by category, i.e. by routing key, which in your case is logging, security_events, customer_orders, etc.
Look at the example to see how to retrieve the data in threads.
E.g.:
ConnectionFactory factory = new ConnectionFactory();
factory.setHost("localhost");
Connection connection = factory.newConnection();
Channel channel = connection.createChannel();
//**The threads part is as follows**
channel.exchangeDeclare(EXCHANGE_NAME, "direct");
String queueName = channel.queueDeclare().getQueue();
// This part binds the queue once per severity (login, for example)
for (String severity : argv) {
    channel.queueBind(queueName, EXCHANGE_NAME, severity);
}
boolean autoAck = false;
channel.basicConsume(queueName, autoAck, "myConsumerTag",
    new DefaultConsumer(channel) {
        @Override
        public void handleDelivery(String consumerTag,
                                   Envelope envelope,
                                   AMQP.BasicProperties properties,
                                   byte[] body)
                throws IOException {
            String routingKey = envelope.getRoutingKey();
            String contentType = properties.getContentType();
            long deliveryTag = envelope.getDeliveryTag();
            // (process the message components here ...)
            channel.basicAck(deliveryTag, false);
        }
    });
Now a thread that processes the data in the queue of the
type login (routing key) has been created. In this way you can create multiple threads,
each serving a different purpose.
Look here for more details about the threads part.
Straight answer
For your particular situation (logging and customer_order both need 5 threads) I would create 1 Channel with 1 Consumer for logging and 1 Channel with 1 Consumer for customer_order. I would also create 2 thread pools (5 threads each): one to be used by logging Consumer and the other by customer_order Consumer.
See Consumption below for why should it work.
PS: do not create the thread pool inside the Consumer; be also aware that Channel.basicConsume(...) is not blocking
Publish
According to Channels and Concurrency Considerations (Thread Safety):
Concurrent publishing on a shared channel is best avoided entirely,
e.g. by using a channel per thread. ... Consuming in one thread and publishing in another thread on a shared channel can be safe.
pretty clear ...
Consumption
The Channel might (I say might because of this) run all its Consumer(s) in the same thread; this idea is almost explicitly conveyed by Receiving Messages by Subscription ("Push API"):
Each Channel has its own dispatch thread. For the most common use case
of one Consumer per Channel, this means Consumers do not hold up other
Consumers. If you have multiple Consumers per Channel be aware that a
long-running Consumer may hold up dispatch of callbacks to other
Consumers on that Channel.
This means that in certain conditions many Consumers pertaining to the same Channel would run on the same thread, such that the first one would hold up dispatch of callbacks for the next ones. The word dispatch is very confusing because it sometimes refers to "thread work dispatching", while here it refers mainly to calling Consumer.handleDelivery (see this again).
But what is this own dispatch thread? It is one from the thread pool used with (see Channels and Concurrency Considerations (Thread Safety)):
Server-pushed deliveries ... uses a
java.util.concurrent.ExecutorService, one per connection.
Conclusion
If one has 1 Channel with 1 Consumer but wants to process the incoming messages in parallel, then it is better to create (outside the Consumer) and use (inside the Consumer) a dedicated thread pool; each message received by the Consumer will then be processed on the user's thread pool instead of on the Channel's own dispatch thread.
Is this approach (user's thread pool used from the Consumer) even possible/valid/acceptable at all? It is; see Channels and Concurrency Considerations (Thread Safety):
thread that received the delivery (e.g. Consumer#handleDelivery
delegated delivery handling to a different thread) ...
I initially asked this question here, but I've realized that my question is not about a while-true loop. What I want to know is, what's the proper way to do high-performance asynchronous message-passing in Java?
What I'm trying to do...
I have ~10,000 consumers, each consuming messages from their private queues. I have one thread that's producing messages one by one and putting them in the correct consumer's queue. Each consumer loops indefinitely, checking for a message to appear in its queue and processing it.
I believe the term is "single-producer/single-consumer", since there's one producer, and each consumer only works on their private queue (multiple consumers never read from the same queue).
Inside Consumer.java:
@Override
public void run() {
    while (true) {
        Message msg = messageQueue.poll();
        if (msg != null) {
            ... // do something with the message
        }
    }
}
The Producer is putting messages inside Consumer message queues at a rapid pace (several million messages per second). Consumers should process these messages as fast as possible!
Note: the while (true) { ... } is terminated by a KILL message sent by the Producer as its last message.
However, my question is about the proper way to design this message-passing. What kind of queue should I use for messageQueue? Should it be synchronous or asynchronous? How should Message be designed? Should I use a while-true loop? Should Consumer be a thread, or something else? Will 10,000 threads slow down to a crawl? What's the alternative to threads?
So, what's the proper way to do high-performance message-passing in Java?
I would say that the context-switching overhead of 10,000 threads is going to be very high, not to mention the memory overhead. By default, on 32-bit platforms, each thread uses a stack size of 256 KB, so that's about 2.5 GB just for your stacks. Obviously you're talking 64-bit, but even so, that is quite a large amount of memory. Because of the amount of memory used, the cache is going to thrash a lot, and the CPU will be throttled by the memory bandwidth.
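The stack estimate is plain multiplication. A tiny sketch, assuming the 256 KB per-thread figure quoted above rather than a measured value (StackBudget and totalStackBytes are made-up names):

```java
// Back-of-the-envelope memory reserved for thread stacks.
final class StackBudget {
    // threads * stack size in KiB * 1024 bytes per KiB
    static long totalStackBytes(int threads, long stackKiB) {
        return threads * stackKiB * 1024L;
    }
}
```

For 10,000 threads at 256 KiB each this gives 2,621,440,000 bytes, which is the roughly 2.5 GB mentioned above. The actual default stack size depends on the JVM and platform.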
I would look for a design that avoids using so many threads to avoid allocating large amounts of stack and context switching overhead. You cannot process 10,000 threads concurrently. Current hardware has typically less than 100 cores.
I would create one queue per hardware thread and dispatch messages in a round-robin fashion. If the processing times vary considerably, there is the danger that some threads finish processing their queue before they are given more work, while other threads never get through their allotted work. This can be avoided by using work stealing, as implemented in the JSR-166 ForkJoin framework.
Since communication is one way from the publisher to the subscribers, then Message does not need any special design, assuming the subscriber doesn't change the message once it has been published.
EDIT: Reading the comments, if you have 10,000 symbols, then create a handful of generic subscriber threads (one subscriber thread per core) that asynchronously receive messages from the publisher (e.g. via their message queue). The subscriber pulls the message from the queue, retrieves the symbol from the message, looks this up in a Map of message handlers, retrieves the handler, and invokes the handler to synchronously handle the message. Once done, it repeats, fetching the next message from the queue. If messages for the same symbol have to be processed in order (which is why I'm guessing you wanted 10,000 queues), you need to map symbols to subscribers. E.g. if there are 10 subscribers, then symbols 0-999 go to subscriber 0, 1000-1999 to subscriber 1, etc. A more refined scheme is to map symbols according to their frequency distribution, so that each subscriber gets roughly the same load. For example, if 10% of the traffic is symbol 0, then subscriber 0 will deal with just that one symbol and the other symbols will be distributed amongst the other subscribers.
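The range-based mapping from the EDIT can be sketched as follows. SymbolRouter and its methods are hypothetical names, symbols are assumed to be numbered from 0, and plain lists stand in for the per-subscriber queues; a real version would use a concurrent queue per subscriber.

```java
import java.util.ArrayList;
import java.util.List;

// Routes each symbol to a fixed subscriber so per-symbol order is preserved.
final class SymbolRouter {
    private final int symbolsPerSubscriber;
    private final List<List<Integer>> inboxes = new ArrayList<>();

    SymbolRouter(int subscribers, int totalSymbols) {
        // Round up so that every symbol id maps to a valid subscriber index
        this.symbolsPerSubscriber = (totalSymbols + subscribers - 1) / subscribers;
        for (int i = 0; i < subscribers; i++) {
            inboxes.add(new ArrayList<>());
        }
    }

    // With 10 subscribers and 10,000 symbols: 0-999 -> 0, 1000-1999 -> 1, ...
    int subscriberFor(int symbolId) {
        return symbolId / symbolsPerSubscriber;
    }

    // Appends the message to the inbox of the subscriber owning the symbol.
    void publish(int symbolId) {
        inboxes.get(subscriberFor(symbolId)).add(symbolId);
    }

    List<Integer> inbox(int subscriber) {
        return inboxes.get(subscriber);
    }
}
```

Since a given symbol always lands in the same inbox and each inbox is drained by one thread, messages for that symbol are handled in publish order. A frequency-aware scheme would replace subscriberFor with a lookup table.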
You could use this (credit goes to Which ThreadPool in Java should I use?):
class Main {
    static final ExecutorService threadPool = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors() * 2);

    public static void main(String[] args) {
        // getConsumers(...) is assumed to build the consumers, each with its own queue
        Set<Consumer> consumers = getConsumers(threadPool);
        for (Consumer consumer : consumers) {
            threadPool.execute(consumer);
        }
    }
}
and
class Consumer implements Runnable {
    private final ExecutorService tp;
    private final MessageQueue messageQueue;

    Consumer(ExecutorService tp, MessageQueue queue) {
        this.tp = tp;
        this.messageQueue = queue;
    }

    @Override
    public void run() {
        Message msg = messageQueue.poll();
        if (msg != null) {
            try {
                ... // do something with the message
            } finally {
                // Re-queue this consumer so it gets another turn on the pool.
                // As written, this only happens after a message was handled.
                this.tp.execute(this);
            }
        }
    }
}
This way, you can have okay scheduling with very little hassle.
First of all, there's no single correct answer unless you either put a complete design doc or you try different approaches for yourself.
I'm assuming your processing is not going to be computationally intensive otherwise you wouldn't be thinking of processing 10000 queues at the same time. One possible solution is to minimise context switching by having one-two threads per CPU. Unless your system is going to be processing data in strict real time that may possibly give you bigger delays on each queue but overall better throughput.
For example -- have your producer thread run on its own CPU and put batches of messages to consumer threads. Each consumer thread would then distribute messages to its N private queues, perform the processing step, receive new data batch and so on. Again, depends on your delay tolerance so the processing step may mean either processing all the queues, a fixed number of queues, as many queues it can unless a time threshold is reached. Being able to easily tell which queue belongs to which consumer thread (e.g. if queues are numbered sequentially: int consumerThreadNum = queueNum & 0x03) would be beneficial as looking them up in a hash table each time may be slow.
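The bit-mask lookup in that example only works when the number of consumer threads is a power of two (0x03 corresponds to 4 threads). A small sketch that makes the assumption explicit; QueueToThread and consumerThreadFor are names invented here:

```java
// Maps a sequentially numbered queue to a consumer thread with a bit mask.
final class QueueToThread {
    static int consumerThreadFor(int queueNum, int threadCount) {
        // A mask of (threadCount - 1) is only valid for powers of two
        if (threadCount <= 0 || Integer.bitCount(threadCount) != 1) {
            throw new IllegalArgumentException("threadCount must be a power of two");
        }
        return queueNum & (threadCount - 1);
    }
}
```

For other thread counts, queueNum % threadCount gives the same kind of mapping for non-negative queue numbers at the cost of a division.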
To minimise memory thrashing it may not be such a good idea to create/destroy queues all the time so you may want to pre-allocate a (max number of queues/number of cores) queue objects per thread. When a queue is finished instead of being destroyed it can be cleared and reused. You don't want gc to get in your way too often and for too long.
Another unknown is if your producer produces complete sets of data for each queue or will send data in chunks until the KILL command is received. If your producer sends complete data sets you may do away with the queue concept completely and just process the data as it arrives to a consumer thread.
Have a pool of consumer threads relative to the hardware and os capacity. These consumer threads could poll your message queue.
I would either have the Messages know how to process themselves or register processors with the consumer thread classes when they are initialized.
In the absence of more detail about the constraints of processing the symbols, it's hard to give very specific advice.
You should take a look at this slashdot article:
http://developers.slashdot.org/story/10/07/27/1925209/Java-IO-Faster-Than-NIO
It has quite a bit of discussions and actual measured data about the many thread vs. single select vs. thread pool arguments.