RabbitMQ by Example: Multiple Threads, Channels and Queues

I just read RabbitMQ's Java API docs and found them very informative and straightforward. The example of how to set up a simple Channel for publishing/consuming is very easy to follow and understand. But it's a very simple/basic example, and it left me with an important question: How can I set up 1+ Channels to publish/consume to and from multiple queues?
Let's say I have a RabbitMQ server with 3 queues on it: logging, security_events and customer_orders. So we'd either need a single Channel to have the ability to publish/consume to all 3 queues, or more likely, have 3 separate Channels, each dedicated to a single queue.
On top of this, RabbitMQ's best practices dictate that we set up 1 Channel per consumer thread. For this example, let's say security_events is fine with only 1 consumer thread, but logging and customer_orders both need 5 threads to handle the volume. So, if I understand correctly, does that mean we need:
1 Channel and 1 consumer thread for publishing/consuming to and from security_events; and
5 Channels and 5 consumer threads for publishing/consuming to and from logging; and
5 Channels and 5 consumer threads for publishing/consuming to and from customer_orders?
If my understanding is misguided here, please begin by correcting me. Either way, could some battle-weary RabbitMQ veteran help me "connect the dots" with a decent code example for setting up publishers/consumers that meet my requirements here?

I think you have several issues with initial understanding. Frankly, I'm a bit surprised to see the following: both need 5 threads to handle the volume. How did you identify you need that exact number? Do you have any guarantees 5 threads will be enough?
RabbitMQ is tuned and time tested, so it is all about proper design
and efficient message processing.
Let's review the problem and find a proper solution. BTW, a message queue by itself does not guarantee that you have a good solution. You have to understand what you are doing and also do some additional testing.
As you definitely know there are many layouts possible:
I will use layout B as the simplest way to illustrate the 1 producer, N consumers problem, since you are so worried about throughput. BTW, as you might expect, RabbitMQ behaves quite well (source). Pay attention to prefetchCount; I'll address it later:
So the message processing logic is likely the right place to make sure you have enough throughput. Naturally, you can spawn a new thread every time you need to process a message, but eventually that approach will kill your system. Basically, the more threads you have, the bigger the latency you'll get (you can check Amdahl's law if you want).
(see Amdahl’s law illustrated)
Tip #1: Be careful with threads, use ThreadPools (details)
A thread pool can be described as a collection of Runnable objects
(work queue) and a collection of running threads. These threads are
constantly running and are checking the work queue for new work. If
there is new work to be done they execute this Runnable. The Executor
interface provides a method, execute(Runnable r), to add a new
Runnable object to the work queue.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Main {
    private static final int NTHREADS = 10;

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(NTHREADS);
        for (int i = 0; i < 500; i++) {
            // MyRunnable is a Runnable implementation that is not shown here
            Runnable worker = new MyRunnable(10000000L + i);
            executor.execute(worker);
        }
        // Stop accepting new tasks; tasks already submitted still run
        executor.shutdown();
        // Wait until all tasks have finished (the one-hour limit is an arbitrary choice)
        executor.awaitTermination(1, TimeUnit.HOURS);
        System.out.println("Finished all threads");
    }
}
Tip #2: Be careful with message processing overhead
I would say this is an obvious optimization technique. It is likely you'll send small, easy-to-process messages. The whole approach is about small messages being continuously sent and processed. Big messages will eventually play a bad joke on you, so it is better to avoid them.
So it is better to send tiny pieces of information, but what about processing? There is an overhead every time you submit a job. Batch processing can be very helpful when the incoming message rate is high.
For example, let's say we have simple message processing logic and we do not want to pay the thread-specific overhead every time a message is processed. To optimize that, a very simple CompositeRunnable can be introduced:
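import java.util.LinkedList;
import java.util.Queue;

// Runs several small tasks back to back as one unit of work
class CompositeRunnable implements Runnable {
    protected Queue<Runnable> queue = new LinkedList<>();

    public void add(Runnable a) {
        queue.add(a);
    }

    @Override
    public void run() {
        for (Runnable r : queue) {
            r.run();
        }
    }
}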
Or do the same in a slightly different way, by collecting messages to be processed:
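import java.util.LinkedList;
import java.util.Queue;

// Collects messages and processes them together in a single run() call
class CompositeMessageWorker<T> implements Runnable {
    protected Queue<T> queue = new LinkedList<>();

    public void add(T message) {
        queue.add(message);
    }

    @Override
    public void run() {
        for (T message : queue) {
            // process a message
        }
    }
}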
In such a way you can process messages more effectively.
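As a rough illustration of the batching idea, here is a minimal sketch that uses only java.util.concurrent. The Batcher class, its batch size and the String message type are assumptions made up for this example; they are not part of the RabbitMQ client. Messages are collected into a list and one pool task is submitted per full batch instead of one task per message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: groups messages so one pool task handles a whole batch.
class Batcher {
    private final int batchSize;
    private final ExecutorService pool;
    private final AtomicInteger tasksSubmitted = new AtomicInteger();
    private final AtomicInteger messagesProcessed = new AtomicInteger();
    private List<String> current = new ArrayList<>();

    Batcher(int batchSize, ExecutorService pool) {
        this.batchSize = batchSize;
        this.pool = pool;
    }

    // Meant to be called from a single thread (e.g. the one receiving deliveries).
    void add(String message) {
        current.add(message);
        if (current.size() >= batchSize) {
            flush();
        }
    }

    // Submits whatever has been collected so far as one task.
    void flush() {
        if (current.isEmpty()) {
            return;
        }
        final List<String> batch = current;
        current = new ArrayList<>();
        tasksSubmitted.incrementAndGet();
        pool.execute(() -> {
            for (String m : batch) {
                // process a message (placeholder: just count it)
                messagesProcessed.incrementAndGet();
            }
        });
    }

    // Demo: pushes n messages through a batcher and returns
    // {number of tasks submitted, number of messages processed}.
    static int[] demo(int n, int batchSize) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Batcher b = new Batcher(batchSize, pool);
        for (int i = 0; i < n; i++) {
            b.add("msg-" + i);
        }
        b.flush(); // do not forget the last, partially filled batch
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { b.tasksSubmitted.get(), b.messagesProcessed.get() };
    }
}
```

With 25 messages and a batch size of 10, Batcher.demo(25, 10) submits three tasks (10, 10 and 5 messages). In a real consumer you would probably also flush on a timer so that a slow trickle of messages does not wait indefinitely for a batch to fill.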
Tip #3: Optimize message processing
Even though you can now process messages in parallel (Tip #1) and reduce processing overhead (Tip #2), you still have to do everything fast. Redundant processing steps, heavy loops and so on can hurt performance a lot. Please see this interesting case study:
Improving Message Queue Throughput tenfold by choosing the right XML Parser
Tip #4: Connection and Channel Management
Starting a new channel on an existing connection involves one network
round trip - starting a new connection takes several.
Each connection uses a file descriptor on the server. Channels don't.
Publishing a large message on one channel will block a connection
while it goes out. Other than that, the multiplexing is fairly transparent.
Connections which are publishing can get blocked if the server is
overloaded - it's a good idea to separate publishing and consuming
connections
Be prepared to handle message bursts
(source)
Please note that all the tips work well together. Feel free to let me know if you need additional details.
Complete consumer example (source)
Please note the following:
channel.basicQos(prefetch) - As you saw earlier prefetchCount might be very useful:
This command allows a consumer to choose a prefetch window that
specifies the amount of unacknowledged messages it is prepared to
receive. By setting the prefetch count to a non-zero value, the broker
will not deliver any messages to the consumer that would breach that
limit. To move the window forwards, the consumer has to acknowledge
the receipt of a message (or a group of messages).
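To make the quoted behaviour concrete, here is a toy model of a prefetch window built on a Semaphore. It only models the rule "no more than N unacknowledged deliveries"; it is not how the broker is implemented, and the PrefetchWindow class and its method names are invented for this sketch.

```java
import java.util.concurrent.Semaphore;

// Toy model of a prefetch window: at most `prefetch` unacknowledged deliveries.
class PrefetchWindow {
    private final Semaphore slots;

    PrefetchWindow(int prefetch) {
        this.slots = new Semaphore(prefetch);
    }

    // "Broker" side: returns true if one more delivery fits into the window.
    boolean tryDeliver() {
        return slots.tryAcquire();
    }

    // Consumer side: acknowledging a message moves the window forward by one.
    void ack() {
        slots.release();
    }

    // Number of deliveries that could still be pushed right now.
    int freeSlots() {
        return slots.availablePermits();
    }
}
```

With a window of 2, two calls to tryDeliver() succeed and a third fails until ack() is called. This is why a prefetch value that roughly matches the size of your worker pool keeps the workers busy without piling up messages in the client.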
ExecutorService threadExecutor - you can specify properly configured executor service.
Example:
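static class Worker extends DefaultConsumer {
    String name;
    Channel channel;
    String queue;
    int processed;
    ExecutorService executorService;

    public Worker(int prefetch, ExecutorService threadExecutor,
                  Channel c, String q) throws Exception {
        super(c);
        channel = c;
        queue = q;
        // Set the executor before consuming starts, so no delivery can arrive without it
        executorService = threadExecutor;
        // Limit the number of unacknowledged messages delivered on this channel
        channel.basicQos(prefetch);
        // autoAck = false: each message has to be acknowledged explicitly
        channel.basicConsume(queue, false, this);
    }

    @Override
    public void handleDelivery(String consumerTag,
                               Envelope envelope,
                               AMQP.BasicProperties properties,
                               byte[] body) throws IOException {
        // Hand the work to the pool so this callback returns quickly
        // (VariableLengthTask is defined in the linked source, not shown here)
        Runnable task = new VariableLengthTask(this,
                                               envelope.getDeliveryTag(),
                                               channel);
        executorService.submit(task);
    }
}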
You can also check the following:
Solution Architecting Using Queues?
Some queuing theory: throughput, latency and bandwidth
A quick message queue benchmark: ActiveMQ, RabbitMQ, HornetQ, QPID, Apollo…

How can I set up 1+ Channels to publish/consume to and from multiple queues?
You can implement this using threads and channels. All you need is a way to
categorize things, i.e. all the queue items from login, all the
queue items from security_events, etc. The categorization can be
achieved using a routingKey.
That is, every time you add an item to the queue, you specify the routing
key. It is attached to the message as a property. This way you can get
the messages for a particular category, say logging.
The following code sample shows how to do this on the client side.
E.g.:
The routing key is used to identify the type of a message and to retrieve messages of that type.
For example, if you need to get all the messages of the type login,
then you must specify the routing key as login or some other keyword
that identifies it.
Connection connection = factory.newConnection();
Channel channel = connection.createChannel();
channel.exchangeDeclare(EXCHANGE_NAME, "direct");
String routingKey = "login";
channel.basicPublish(EXCHANGE_NAME, routingKey, null, message.getBytes());
You can look here for more details about the categorization.
Threads Part
Once the publishing part is done, you can set up the thread part.
In this part you receive the published data by category, i.e. by routing key, which in your case is logging, security_events, customer_orders, etc.
See the example below to learn how to retrieve the data in threads.
E.g.:
ConnectionFactory factory = new ConnectionFactory();
factory.setHost("localhost");
Connection connection = factory.newConnection();
Channel channel = connection.createChannel();
// The threads part is as follows
channel.exchangeDeclare(EXCHANGE_NAME, "direct");
String queueName = channel.queueDeclare().getQueue();
// Bind the queue to each severity (routing key) passed in argv, e.g. login
for (String severity : argv) {
    channel.queueBind(queueName, EXCHANGE_NAME, severity);
}
boolean autoAck = false;
channel.basicConsume(queueName, autoAck, "myConsumerTag",
    new DefaultConsumer(channel) {
        @Override
        public void handleDelivery(String consumerTag,
                                   Envelope envelope,
                                   AMQP.BasicProperties properties,
                                   byte[] body)
                throws IOException {
            String routingKey = envelope.getRoutingKey();
            String contentType = properties.getContentType();
            long deliveryTag = envelope.getDeliveryTag();
            // (process the message components here ...)
            channel.basicAck(deliveryTag, false);
        }
    });
Now a consumer that processes the data in the queue for the
type login (the routing key) has been created. In this way you can create multiple threads,
each serving a different purpose.
Look here for more details about the threads part.

Straight answer
For your particular situation (logging and customer_orders both need 5 threads) I would create 1 Channel with 1 Consumer for logging and 1 Channel with 1 Consumer for customer_orders. I would also create 2 thread pools (5 threads each): one to be used by the logging Consumer and the other by the customer_orders Consumer.
See Consumption below for why this should work.
PS: do not create the thread pool inside the Consumer; be also aware that Channel.basicConsume(...) is not blocking
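The layout described above can be sketched with plain java.util.concurrent. The QueuePools class and its onDelivery method are hypothetical stand-ins: in a real application, onDelivery would be called from each Consumer's handleDelivery, and the task would also acknowledge the message. The pool sizes (1, 5, 5) are taken from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder: one worker pool per queue, created outside the consumers.
class QueuePools {
    private final Map<String, ExecutorService> pools = new LinkedHashMap<>();
    private final Map<String, AtomicInteger> handled = new LinkedHashMap<>();

    QueuePools() {
        register("security_events", 1);
        register("logging", 5);
        register("customer_orders", 5);
    }

    private void register(String queue, int threads) {
        pools.put(queue, Executors.newFixedThreadPool(threads));
        handled.put(queue, new AtomicInteger());
    }

    // Stand-in for what a consumer's delivery callback would do:
    // hand the message to the queue's pool and return immediately.
    void onDelivery(String queue, byte[] body) {
        AtomicInteger counter = handled.get(queue);
        pools.get(queue).execute(() -> {
            // process body here, then acknowledge the message (omitted)
            counter.incrementAndGet();
        });
    }

    // Stops all pools and waits for the tasks already submitted.
    void shutdownAndWait() {
        for (ExecutorService pool : pools.values()) {
            pool.shutdown();
        }
        try {
            for (ExecutorService pool : pools.values()) {
                pool.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    int handledCount(String queue) {
        return handled.get(queue).get();
    }
}
```

Keeping the pools in one place, outside the consumers, makes it easy to size and shut them down independently of channel setup.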
Publish
According to Channels and Concurrency Considerations (Thread Safety):
Concurrent publishing on a shared channel is best avoided entirely,
e.g. by using a channel per thread. ... Consuming in one thread and publishing in another thread on a shared channel can be safe.
pretty clear ...
Consumption
The Channel might (I say might because of this) run all its Consumer(s) in the same thread; this idea is almost explicitly conveyed by Receiving Messages by Subscription ("Push API"):
Each Channel has its own dispatch thread. For the most common use case
of one Consumer per Channel, this means Consumers do not hold up other
Consumers. If you have multiple Consumers per Channel be aware that a
long-running Consumer may hold up dispatch of callbacks to other
Consumers on that Channel.
This means that under certain conditions many Consumers belonging to the same Channel would run on the same thread, so that the first one would hold up dispatch of callbacks for the next ones. The word dispatch is confusing because it sometimes refers to "thread work dispatching", while here it mainly refers to calling Consumer.handleDelivery (see this again).
But what is the own dispatch thread about? It is one thread from the thread pool described in Channels and Concurrency Considerations (Thread Safety):
Server-pushed deliveries ... uses a
java.util.concurrent.ExecutorService, one per connection.
Conclusion
If one has 1 Channel with 1 Consumer but wants to process the incoming messages in parallel, then it is better to create (outside the Consumer) and use (inside the Consumer) one's own thread pool; each message received by the Consumer will then be processed on the user's thread pool instead of on the Channel's own dispatch thread.
Is this approach (user's thread pool used from the Consumer) even possible/valid/acceptable at all? It is; see Channels and Concurrency Considerations (Thread Safety):
thread that received the delivery (e.g. Consumer#handleDelivery
delegated delivery handling to a different thread) ...

Related

Pattern to continuously listen to AWS SQS messages

I have a simple class named QueueService with some methods that wrap the methods from the AWS SQS SDK for Java. For example:
public ArrayList<Hashtable<String, String>> receiveMessages(String queueURL) {
List<Message> messages = this.sqsClient.receiveMessage(queueURL).getMessages();
ArrayList<Hashtable<String, String>> resultList = new ArrayList<Hashtable<String, String>>();
for(Message message : messages) {
Hashtable<String, String> resultItem = new Hashtable<String, String>();
resultItem.put("MessageId", message.getMessageId());
resultItem.put("ReceiptHandle", message.getReceiptHandle());
resultItem.put("Body", message.getBody());
resultList.add(resultItem);
}
return resultList;
}
I have another class named App that has a main and creates an instance of the QueueService.
I am looking for a "pattern" to make the main in App listen for new messages in the queue. Right now I have a while(true) loop where I call the receiveMessages method:
while(true) {
messages = queueService.receiveMessages(queueURL);
for(Hashtable<String, String> message: messages) {
String receiptHandle = message.get("ReceiptHandle");
String messageBody = message.get("Body"); // key must match the one used in receiveMessages
System.out.println(messageBody);
queueService.deleteMessage(queueURL, receiptHandle);
}
}
Is this the correct way? Should I use the async message receive method in SQS SDK?

To my knowledge, there is no way in Amazon SQS to support an active listener model where Amazon SQS would "push" messages to your listener, or would invoke your message listener when there are messages.
So, you would always have to poll for messages. There are two polling mechanisms supported for polling - Short Polling and Long Polling. Each has its own pros and cons, but Long Polling is the one you would typically end up using in most cases, although the default one is Short Polling. Long Polling mechanism is definitely more efficient in terms of network traffic, is more cost efficient (because Amazon charges you by the number of requests made), and is also the preferred mechanism when you want your messages to be processed in a time sensitive manner (~= process as soon as possible).
There are more intricacies around Long Polling and Short Polling that are worth knowing, and it's somewhat difficult to paraphrase all of that here, but if you like, you can read a lot more about this in the following blog post. It has a few code examples as well that should be helpful.
http://pragmaticnotes.com/2017/11/20/amazon-sqs-long-polling-versus-short-polling/
In terms of a while(true) loop, I would say it depends.
If you are using Long Polling, you can set the wait time to (at most) 20 seconds, so that you do not poll SQS more often than every 20 seconds when there are no messages. If there are messages, you can decide whether to poll frequently (to process messages as soon as they arrive) or to always process them at time intervals (say every n seconds).
Another point to note is that you can read up to 10 messages in a single receive request, which also reduces the number of calls you make to SQS, thereby reducing costs. And as the above blog post explains in detail, you may request 10 messages, but it may not return 10 even if there are that many messages in the queue.
In general though, I would say you need to build appropriate hooks and exception handling to turn off the polling if you wish to at runtime, in case you are using a while(true) kind of a structure.
Another aspect to consider is whether you would like to poll SQS in your main application thread or you would like to spawn another thread. So another option could be to create a ScheduledThreadPoolExecutor with a single thread in the main to schedule a thread to poll the SQS periodically (every few seconds), and you may not need a while(true) structure.
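A minimal sketch of that last option, assuming nothing about the SQS SDK itself: the receive call is passed in as a Supplier (a stand-in for your QueueService.receiveMessages or the SDK call), and the QueuePoller class and its method names are invented for illustration. A single-threaded ScheduledExecutorService polls with a fixed delay and can be stopped at runtime.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical poller: calls `receive` repeatedly on a background thread.
class QueuePoller {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // receive: stand-in for the SQS receive call; handler processes one message body.
    void start(Supplier<List<String>> receive, Consumer<String> handler, long delayMillis) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                for (String body : receive.get()) {
                    handler.accept(body);
                }
            } catch (RuntimeException e) {
                // Log and keep going: an uncaught exception would cancel the schedule.
                System.err.println("poll failed: " + e);
            }
        }, 0, delayMillis, TimeUnit.MILLISECONDS);
    }

    // Hook to turn polling off at runtime.
    void stop() {
        scheduler.shutdown();
        try {
            scheduler.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Demo with an in-memory queue instead of SQS; returns the bodies seen, in order.
    static List<String> demo() {
        Queue<String> pending = new ConcurrentLinkedQueue<>(Arrays.asList("a", "b", "c"));
        List<String> seen = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(3);
        QueuePoller poller = new QueuePoller();
        poller.start(() -> {
            List<String> batch = new ArrayList<>();
            String next = pending.poll();
            if (next != null) {
                batch.add(next);
            }
            return batch;
        }, body -> {
            seen.add(body);
            done.countDown();
        }, 10);
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        poller.stop();
        return seen;
    }
}
```

With long polling enabled, the receive call itself blocks for up to the wait time, so the delay between polls can be kept short.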

There are a few things that you're missing:
Use receiveMessage(ReceiveMessageRequest) and set a wait time to enable long polling.
Wrap your AWS calls in try/catch blocks. In particular, pay attention to OverLimitException, which can be thrown from receiveMessage() if you have too many in-flight messages.
Wrap the entire body of the while loop in its own try/catch block, logging any exceptions that are caught (there shouldn't be -- this is here to ensure that your application doesn't crash because AWS changed their API or you neglected to handle an expected exception).
See doc for more information about long polling and possible exceptions.
As for using the async client: do you have any particular reason to use it? If not, then don't: a single receiver thread is much easier to manage.

If you want to use SQS and then Lambda to process the request, you can follow the steps given in the link, or you can use Lambda instead of SQS and invoke a Lambda function for every request.

As of 2019 SQS can trigger lambdas:
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html

I found one solution for actively listening to the queue.
For Node.js, I used the following package and it resolved my issue:
sqs-consumer
https://www.npmjs.com/package/sqs-consumer

RabbitMQ and channels Java thread safety

in this guide https://www.rabbitmq.com/api-guide.html RabbitMQ guys state:
Channels and Concurrency Considerations (Thread Safety)
Channel instances must not be shared between threads. Applications should prefer using a Channel per thread instead of sharing the same Channel across multiple threads. While some operations on channels are safe to invoke concurrently, some are not and will result in incorrect frame interleaving on the wire. Sharing channels between threads will also interfere with Publisher Confirms.
Thread safety is very important so I tried to be as diligent as possible, but here's the problem:
I have this application that receives messages from Rabbit. When a message is received, it processes it and then acks when it's done. The application can process just 2 items at the same time in a fixed thread pool with 2 threads. The QOS prefetch for Rabbit is set to 2, because I don't want to feed the app with more than it can handle in a time frame.
Now, my consumer's handleDelivery does the following:
Task run = new Task(JSON.parse(message));
service.execute(new TestWrapperThread(getChannel(),run,envelope.getDeliveryTag()));
At this point, you already figured out that TestWrapperThread does the channel.basicAck(deliveryTag, false); call as last operation.
By my understanding of the documentation, this is incorrect and potentially harmful because the channel is not thread safe and this behavior could screw things up. But what am I supposed to do then? I mean, I have a few ideas, but they would definitely make everything more complex, and I'd like to figure out whether that's really necessary or not.
Thanks in advance

I suppose you are using the Channel only for your consumer and not for other operations like publishing.
In your case the only potential problem is here:
channel.basicAck(deliveryTag, false);
because you call it from two threads. However, this operation is safe, as you can see in the Java code:
the class ChannelN.java calls:
public void basicAck(long deliveryTag, boolean multiple)
throws IOException
{
transmit(new Basic.Ack(deliveryTag, multiple));
}
see github code for ChannelN.java
the transmit method inside AMQChannel uses:
public void transmit(Method m) throws IOException {
synchronized (_channelMutex) {
transmit(new AMQCommand(m));
}
}
_channelMutex is a protected final Object _channelMutex = new Object();
created with the class.
see github code for AMQChannel.java
EDIT
As you can read in the official documentation, "some" operations are thread-safe, but it is not clear which ones.
I studied the code, and I think there is no problem with calling basicAck from multiple threads.
Hope it helps.
EDIT2
I add also Nicolas's comment:
Note that consuming (basicConsume) and acking from more than one thread is a common rabbitmq pattern that is already used by the java client.
So you can use it safely.

Rabbit Mq java client parallel consumption

I want to process messages from a rabbitMq queue in parallel. The queue is configured to be autoAck =false. I am using the camel-rabbitMQ support for camel endpoints, which has support for a threadPoolSize parameter, but this does not have the desired effect. Messages are still processed serially off the queue, even when threadpoolsize=20.
From debugging through the code I can see that the threadPoolSize parameter is used to create an ExecutorService that is passed to the rabbit ConnectionFactory as described here. This all looks good until you get into the rabbit ConsumerWorkService. Here messages are processed in blocks of at most 16 messages. Each message in a block is processed serially, and then, if there is more work to do, the executor service is invoked with the next block. A code snippet of this is below. From this use of the executor service I can't see how the messages can be processed in parallel. The executor service only ever has one piece of work to perform at a time.
What am I Missing?
private final class WorkPoolRunnable implements Runnable {
public void run() {
int size = MAX_RUNNABLE_BLOCK_SIZE;
List<Runnable> block = new ArrayList<Runnable>(size);
try {
Channel key = ConsumerWorkService.this.workPool.nextWorkBlock(block, size);
if (key == null) return; // nothing ready to run
try {
for (Runnable runnable : block) {
runnable.run();
}
} finally {
if (ConsumerWorkService.this.workPool.finishWorkBlock(key)) {
ConsumerWorkService.this.executor.execute(new WorkPoolRunnable());
}
}
} catch (RuntimeException e) {
Thread.currentThread().interrupt();
}
}
}

RabbitMQ's documentation is not very clear about this but, even though the ConsumerWorkService is using a thread pool, this pool doesn't seem to be used in a way to process messages in parallel:
Each Channel has its own dispatch thread. For the most common use case of one Consumer per Channel, this means Consumers do not hold up other Consumers. If you have multiple Consumers per Channel be aware that a long-running Consumer may hold up dispatch of callbacks to other Consumers on that Channel.
(http://www.rabbitmq.com/api-guide.html)
This documentation suggests using one Channel per thread and, in fact, if you simply create as many Channels as the required level of concurrency, messages will be dispatched between the consumers linked to these channels.
I've tested with 2 channels and consumers: when 2 messages are in the queue, each consumer only picks one message at a time. The blocks of 16 messages you mentioned don't seem to interfere, which is a good thing.
As a matter of fact, Spring AMQP also creates several channels to process messages concurrently. This is done by:
setting SimpleMessageListenerContainer.setConcurrentConsumers(...): http://docs.spring.io/spring-amqp/docs/1.3.6.RELEASE/api/
and setting CachingConnectionFactory.setChannelCacheSize(...) accordingly: http://docs.spring.io/spring-amqp/docs/1.3.6.RELEASE/api/
I've also tested this to be working as expected.

If you have a single Channel instance, it's going to invoke its registered consumers serially as you correctly found out by examining ConsumerWorkService. There are 2 ways to overcome that:
Use multiple channels instead of one.
Use single channel but implement consumers in a special way. They should just pick incoming message from queue and put it as a task into an internal thread pool.
You can find more details in this post.

Is it possible to have Java listen to multiple RabbitMQ queues?

Update: I thought my explanation may be too verbose, so a simpler way of thinking about what I'm trying to do is: I have multiple worker nodes that I want to use for different tasks (and those different tasks come from different queues). Currently I only know how to make them listen to a single queue, which is associated with a single type of work, but I want them to listen to different queues so that, as different work comes up, the same cluster of nodes can handle it. Hope that's more clear.
Hi Everyone,
I suspect this is possible but I can't seem to figure out exactly how to do it. I went through the tutorials on rabbitmq's site and they were really helpful and did what I wanted except it didn't show how to listen to multiple queues in the same program.
My program structure is basically a few phases. For example, phase 1 gathers a lot of data, then phase 2 processes it and loads it into a database, phase 3 analyzes it against other data, etc. Each phase cannot start until the previous phase is done, and I wanted to use a queue system so that multiple machines can finish each phase quicker (so all consumers work on phase 1, then once all of them are done, they go and work on phase 2 together, etc.).
I think I can't just do each phase once, because the queue could be empty and the computer would move to the next queue, and I have no way of knowing if it's empty because all the work is done or because we haven't started putting work in the queue yet. So I thought (correct me if I'm wrong) a better way was to listen to all queues associated with all phases: as work gets put into phase1Queue it works on it, and if work gets put into phase2Queue it works on it right after (I have another process, outside of the process I'm describing, that monitors when each phase is done and sets up for the next phase). Hope that makes sense.
The code in the queue sample is helpful for a consumer listening to one queue, but how can I make it listen to multiple queues (and call different programs depending on the queue)? If there's a function for this already then awesome, but I'm mainly looking for the logic I can use to implement this in Java (worst case, I have thought of running 5 separate programs, each listening to one queue, but I'm trying to find out if there's a better way; having one application with all my work in it would make managing distribution easier).
Thanks!
p.s. if it helps here's the consumer code that works for rabbitmq(but as you can see it only defines one queue):
import java.io.IOException;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.QueueingConsumer;
public class Worker {
private static final String TASK_QUEUE_NAME = "task_queue";
public static void main(String[] argv)
throws java.io.IOException,
java.lang.InterruptedException {
ConnectionFactory factory = new ConnectionFactory();
factory.setHost("localhost");
Connection connection = factory.newConnection();
Channel channel = connection.createChannel();
channel.queueDeclare(TASK_QUEUE_NAME, true, false, false, null);
System.out.println(" [*] Waiting for messages. To exit press CTRL+C");
channel.basicQos(1);
QueueingConsumer consumer = new QueueingConsumer(channel);
channel.basicConsume(TASK_QUEUE_NAME, false, consumer);
while (true) {
QueueingConsumer.Delivery delivery = consumer.nextDelivery();
String message = new String(delivery.getBody());
System.out.println(" [x] Received '" + message + "'");
doWork(message);
System.out.println(" [x] Done" );
channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
}
}
//...
}
Update: As per the comment, the coordinator process is super simple. It's just a program that monitors Phase1Queue, and once that queue is empty it simply starts a process to fill up Phase2Queue, etc.

Another approach would be to separate the different phases in to different modules/services/processes and have all of them running concurrently.
Each phase's consumer listens to a different queue. Each phase is responsible for producing messages for the next phase's queue.
This way you have reduced complexity by having single responsibilities. Each process consumes from a particular queue and produces for another queue. You can then, if need be, scale these separate processes up and/or out.
This is a pattern that we use. We have an initial job from which up to 100 times more sub jobs are ultimately created. The initial jobs are very small and quick to run whereas the sub jobs are potentially long-running requests which we service with a small army of cloud instances. These jobs then return their results via another queue, the results of which are then collated and added to a database.
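The shape of this pattern can be sketched in a single JVM with BlockingQueues standing in for the broker queues. Everything here (the PhasePipeline class, the STOP sentinel, the string transformations) is made up for illustration; with RabbitMQ each phase would be its own consumer process reading one queue and publishing to the next.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// In-memory sketch: each phase consumes from one queue and produces for the next.
class PhasePipeline {
    static final String STOP = "__STOP__"; // sentinel used only by this demo

    // Starts a worker that reads from `in`, applies `work`, and writes to `out`.
    static Thread startPhase(BlockingQueue<String> in,
                             BlockingQueue<String> out,
                             Function<String, String> work) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    String msg = in.take();
                    if (msg.equals(STOP)) {
                        out.put(STOP); // pass the shutdown signal downstream
                        return;
                    }
                    out.put(work.apply(msg));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Runs two jobs through two phases and returns the final results in order.
    static List<String> demo() {
        BlockingQueue<String> phase1Queue = new LinkedBlockingQueue<>();
        BlockingQueue<String> phase2Queue = new LinkedBlockingQueue<>();
        BlockingQueue<String> results = new LinkedBlockingQueue<>();
        Thread p1 = startPhase(phase1Queue, phase2Queue, s -> s + ":gathered");
        Thread p2 = startPhase(phase2Queue, results, s -> s + ":loaded");
        phase1Queue.add("job1");
        phase1Queue.add("job2");
        phase1Queue.add(STOP);
        try {
            p1.join();
            p2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        List<String> out = new ArrayList<>();
        for (String s : results) {
            if (!s.equals(STOP)) {
                out.add(s);
            }
        }
        return out;
    }
}
```

Because every phase only knows its input and output queue, you can start more workers for a slow phase without touching the others.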

Java: High-performance message-passing (single-producer/single-consumer)

I initially asked this question here, but I've realized that my question is not about a while-true loop. What I want to know is, what's the proper way to do high-performance asynchronous message-passing in Java?
What I'm trying to do...
I have ~10,000 consumers, each consuming messages from their private queues. I have one thread that's producing messages one by one and putting them in the correct consumer's queue. Each consumer loops indefinitely, checking for a message to appear in its queue and processing it.
I believe the term is "single-producer/single-consumer", since there's one producer, and each consumer only works on their private queue (multiple consumers never read from the same queue).
Inside Consumer.java:
@Override
public void run() {
while (true) {
Message msg = messageQueue.poll();
if (msg != null) {
... // do something with the message
}
}
}
The Producer is putting messages inside Consumer message queues at a rapid pace (several million messages per second). Consumers should process these messages as fast as possible!
Note: the while (true) { ... } is terminated by a KILL message sent by the Producer as its last message.
However, my question is about the proper way to design this message-passing. What kind of queue should I use for messageQueue? Should it be synchronous or asynchronous? How should Message be designed? Should I use a while-true loop? Should Consumer be a thread, or something else? Will 10,000 threads slow down to a crawl? What's the alternative to threads?
So, what's the proper way to do high-performance message-passing in Java?

I would say that the context switching overhead of 10,000 threads is going to be very high, not to mention the memory overhead. By default, on 32-bit platforms, each thread uses a stack size of 256 KB, so that's 2.5 GB just for your stacks. Obviously you're talking 64-bit, but even so, that's quite a large amount of memory. Due to the amount of memory used, the cache is going to be thrashing a lot, and the CPU will be throttled by the memory bandwidth.
I would look for a design that avoids using so many threads, to avoid allocating large amounts of stack and the context switching overhead. You cannot run 10,000 threads concurrently. Current hardware typically has fewer than 100 cores.
I would create one queue per hardware thread and dispatch messages in a round-robin fashion. If the processing times vary considerably, there is the danger that some threads finish processing their queue before they are given more work, while other threads never get through their allotted work. This can be avoided by using work stealing, as implemented in the JSR-166 ForkJoin framework.
Since communication is one way, from the publisher to the subscribers, Message does not need any special design, assuming the subscriber doesn't change the message once it has been published.
EDIT: Reading the comments, if you have 10,000 symbols, then create a handful of generic subscriber threads (one subscriber thread per core) that asynchronously receive messages from the publisher (e.g. via their message queue). The subscriber pulls the message from the queue, retrieves the symbol from the message, looks this up in a Map of message handlers, retrieves the handler, and invokes the handler to synchronously handle the message. Once done, it repeats, fetching the next message from the queue. If messages for the same symbol have to be processed in order (which is why I'm guessing you wanted 10,000 queues), you need to map symbols to subscribers. E.g. if there are 10 subscribers, then symbols 0-999 go to subscriber 0, 1000-1999 to subscriber 1, etc. A more refined scheme is to map symbols according to their frequency distribution, so that each subscriber gets roughly the same load. For example, if 10% of the traffic is symbol 0, then subscriber 0 will deal with just that one symbol and the other symbols will be distributed amongst the other subscribers.
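To make the symbol-to-subscriber mapping a bit more concrete, here is a minimal sketch. SymbolMessage, MessageHandler and SymbolRouter are names I made up, and it assumes symbols are plain integers numbered from 0. It only shows the simple range partitioning, not the frequency-based scheme:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical message type: a symbol id plus a payload.
final class SymbolMessage {
    final int symbol;
    final String payload;

    SymbolMessage(int symbol, String payload) {
        this.symbol = symbol;
        this.payload = payload;
    }
}

// Hypothetical handler contract, one implementation per symbol.
interface MessageHandler {
    void handle(SymbolMessage message);
}

class SymbolRouter {
    private final int symbolsPerSubscriber;
    private final Map<Integer, MessageHandler> handlers = new ConcurrentHashMap<>();

    SymbolRouter(int symbolCount, int subscriberCount) {
        // Ceiling division so every symbol maps to a valid subscriber index.
        this.symbolsPerSubscriber = (symbolCount + subscriberCount - 1) / subscriberCount;
    }

    // Range partitioning: with 10,000 symbols and 10 subscribers,
    // symbols 0-999 go to subscriber 0, 1000-1999 to subscriber 1, and so on.
    int subscriberFor(int symbol) {
        return symbol / symbolsPerSubscriber;
    }

    void register(int symbol, MessageHandler handler) {
        handlers.put(symbol, handler);
    }

    // Called on the subscriber thread after it takes a message from its queue.
    void handle(SymbolMessage message) {
        MessageHandler handler = handlers.get(message.symbol);
        if (handler != null) {
            // Synchronous call: the next message is not touched until this returns.
            handler.handle(message);
        }
    }
}
```

Because all messages for one symbol land on the same subscriber and are handled synchronously, they are processed in the order they were queued.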

You could use this (credit goes to Which ThreadPool in Java should I use?):
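import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class Main {
    // Two worker threads per available processor; static so main can use it.
    static final ExecutorService threadPool = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors() * 2);

    public static void main(String[] args) {
        // getConsumers is not shown here; it builds the set of consumers.
        Set<Consumer> consumers = getConsumers(threadPool);
        for (Consumer consumer : consumers) {
            threadPool.execute(consumer);
        }
    }
}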
and
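import java.util.concurrent.ExecutorService;

class Consumer implements Runnable {
    private final ExecutorService tp;
    private final MessageQueue messageQueue;

    Consumer(ExecutorService tp, MessageQueue queue) {
        this.tp = tp;
        this.messageQueue = queue;
    }

    @Override
    public void run() {
        // Handle at most one message per run so other consumers get a turn.
        Message msg = messageQueue.poll();
        if (msg != null) {
            try {
                ... // do something with the message
            } finally {
                // Re-submit this consumer so it picks up its next message later.
                this.tp.execute(this);
            }
        }
        // Note: if the queue was empty, this consumer is not re-submitted.
    }
}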
This way, you can have okay scheduling with very little hassle.

First of all, there's no single correct answer unless you either post a complete design doc or try different approaches for yourself.
I'm assuming your processing is not going to be computationally intensive; otherwise you wouldn't be thinking of processing 10,000 queues at the same time. One possible solution is to minimise context switching by having one or two threads per CPU. Unless your system has to process data in strict real time, this may give you bigger delays on each queue but better overall throughput.
For example -- have your producer thread run on its own CPU and hand batches of messages to consumer threads. Each consumer thread would then distribute messages to its N private queues, perform the processing step, receive a new data batch and so on. Again, it depends on your delay tolerance, so the processing step may mean processing all the queues, a fixed number of queues, or as many queues as it can until a time threshold is reached. Being able to easily tell which queue belongs to which consumer thread (e.g. if queues are numbered sequentially and there are four consumer threads: int consumerThreadNum = queueNum & 0x03) would be beneficial, as looking them up in a hash table each time may be slow.
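The mask trick generalises to any power-of-two number of consumer threads. A small sketch, assuming sequentially numbered queues (QueueOwnership is a made-up helper name):

```java
// Hypothetical helper: maps sequentially numbered queues to consumer threads.
class QueueOwnership {
    private final int mask;

    // threadCount must be a power of two so that (threadCount - 1) is a bit mask.
    QueueOwnership(int threadCount) {
        if (threadCount <= 0 || Integer.bitCount(threadCount) != 1) {
            throw new IllegalArgumentException("threadCount must be a power of two");
        }
        this.mask = threadCount - 1;
    }

    // With 4 threads this is exactly queueNum & 0x03.
    int consumerThreadFor(int queueNum) {
        return queueNum & mask;
    }
}
```

With four threads, queues 0, 4, 8, ... belong to thread 0, queues 1, 5, 9, ... to thread 1, and so on, with no lookup table involved.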
To minimise memory thrashing it may not be such a good idea to create/destroy queues all the time so you may want to pre-allocate a (max number of queues/number of cores) queue objects per thread. When a queue is finished instead of being destroyed it can be cleared and reused. You don't want gc to get in your way too often and for too long.
Another unknown is if your producer produces complete sets of data for each queue or will send data in chunks until the KILL command is received. If your producer sends complete data sets you may do away with the queue concept completely and just process the data as it arrives to a consumer thread.

Have a pool of consumer threads sized to the hardware and OS capacity. These consumer threads could poll your message queue.
I would either have the Messages know how to process themselves or register processors with the consumer thread classes when they are initialized.
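A minimal sketch of the first option, where each message knows how to process itself. SelfProcessingMessage and PollingWorker are hypothetical names, and the KILL sentinel stands in for the KILL message from the question:

```java
import java.util.concurrent.BlockingQueue;

// Hypothetical message contract: each message carries its own processing logic.
interface SelfProcessingMessage {
    void process();
}

class PollingWorker implements Runnable {
    // Sentinel that tells a worker to stop (assumed stand-in for the KILL message).
    static final SelfProcessingMessage KILL = () -> { };

    private final BlockingQueue<SelfProcessingMessage> queue;

    PollingWorker(BlockingQueue<SelfProcessingMessage> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        try {
            while (true) {
                // take() blocks until a message is available, so there is no busy-waiting.
                SelfProcessingMessage message = queue.take();
                if (message == KILL) {
                    return;
                }
                message.process();
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and let the thread exit.
            Thread.currentThread().interrupt();
        }
    }
}
```

You would start a small, fixed number of these workers on a shared queue. Each worker stops after taking one KILL, so the producer has to enqueue one KILL per worker.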

In the absence of more detail about the constraints of processing the symbols, it's hard to give very specific advice.
You should take a look at this slashdot article:
http://developers.slashdot.org/story/10/07/27/1925209/Java-IO-Faster-Than-NIO
It has quite a bit of discussion and actual measured data about the many-threads vs. single-select vs. thread-pool arguments.
