Message reducer/aggregator based on timeout or quantity exceeding

I'm working on a java SE (+netty) based system that receives messages of different types from clients, aggregates them and pushes aggregated results into storage.
I need to pre-accumulate messages before aggregation until one of two conditions is met - timeout exceeded or quantity exceeded. Timeouts and quantities are pre-configured for each type and may differ greatly. After that, I aggregate/reduce messages of same type and sender and push result into storage. Aggregation may look like calculating average value among messages. Or it may be much more complex. Post-aggregation in storage is not acceptable in my case.
The task seems easy, but I'm stuck on the implementation. Obviously I need to collect messages in some data structure and check the timeout and quantity rules on each element. I thought about a DelayQueue<Delayed<List<MyMessages>>> (where List<MyMessages> is an aggregatable list of messages).
DelayQueue handles timeouts nicely, but it's not clear how to check maximum quantities and add new messages to the lists efficiently. I don't want to scan all the lists on every new message, searching for the right one. And adding data to Delayed<List> elements does not look thread-safe.
What data structures/architecture is suitable for the system I'm trying to create? I guess such problem has a proper academic name and solution, what should I google?

Ignoring existing data structures that might help here, the problem can fundamentally be solved in two ways: Either the thread(s) accepting messages performs the checks and notifies the aggregation thread, or the aggregation thread needs to poll. The first approach makes the limit check easy, the second approach makes the timeout easy.
I would suggest combining both: receiving threads keep track of how many items have accumulated and notify the aggregating thread once the threshold has been reached, while the aggregating thread keeps track of the time.
A simplistic version could look something like this:
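// Needs java.util.ArrayList, java.util.List and java.util.concurrent.{ArrayBlockingQueue, BlockingQueue}
final long maxWait = 1000; // milliseconds
final int maxMessages = 10;
// The capacity is an arbitrary choice; put() blocks once the queue is full
final BlockingQueue<Message> queue = new ArrayBlockingQueue<>( 10 * maxMessages );
// wait() and notify() must be called while holding the monitor of the same object
final Object lock = new Object();
final Thread aggregator = new Thread()
{
    @Override
    public void run() {
        try {
            List<Message> messages = new ArrayList<>();
            while ( true ) {
                messages.clear();
                queue.drainTo( messages );
                // Store messages
                synchronized ( lock ) {
                    // Returns on notify() or after maxWait, whichever comes first
                    lock.wait( maxWait );
                }
            }
        }
        catch ( InterruptedException e ) {
            // Handle this..
            Thread.currentThread().interrupt();
        }
    }
};
final Thread receiver = new Thread()
{
    @Override
    public void run() {
        try {
            Message message = readMessage(); // Placeholder: get this from network
            queue.put( message );
            if ( queue.size() >= maxMessages ) {
                synchronized ( lock ) {
                    lock.notify();
                }
            }
        }
        catch ( InterruptedException e ) {
            Thread.currentThread().interrupt();
        }
    }
};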
This does not handle your message grouping, but I'm sure you can see how this can be extrapolated to handle multiple queues of different message types. To make the aggregator only consider some specific message type when it's notified, you could use some more elaborate messaging mechanism instead of the wait/notify, for instance have it wait on a queue instead, where receiving threads in turn can put queues as "messages" about queues that need to be aggregated and stored.
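To make the grouping part more concrete, here is a minimal sketch of one possible structure, not a definitive implementation. It assumes a single size limit and a single maximum age for all keys (per-type limits would need a lookup), and the names BatchingBuffer, add, flushExpired and sink are hypothetical. Receiving threads call add, which flushes a batch as soon as it is full; a timer thread calls flushExpired periodically to flush batches that have waited too long. The current time is passed in explicitly so that the behaviour is easy to test.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical helper: buffers values per key and flushes on size or age. */
class BatchingBuffer<K, V> {
    private static final class Batch<T> {
        final long createdAt;                     // time the first item arrived
        final List<T> items = new ArrayList<>();
        Batch(long createdAt) { this.createdAt = createdAt; }
    }

    private final int maxSize;
    private final long maxAgeMillis;
    private final BiConsumer<K, List<V>> sink;    // receives each completed batch
    private final Map<K, Batch<V>> open = new LinkedHashMap<>();

    BatchingBuffer(int maxSize, long maxAgeMillis, BiConsumer<K, List<V>> sink) {
        this.maxSize = maxSize;
        this.maxAgeMillis = maxAgeMillis;
        this.sink = sink;
    }

    /** Called by receiving threads; flushes at once when the size limit is hit. */
    synchronized void add(K key, V value, long nowMillis) {
        Batch<V> batch = open.computeIfAbsent(key, k -> new Batch<>(nowMillis));
        batch.items.add(value);
        if (batch.items.size() >= maxSize) {
            open.remove(key);
            sink.accept(key, batch.items);
        }
    }

    /** Called periodically by a timer thread; flushes batches that are too old. */
    synchronized void flushExpired(long nowMillis) {
        Iterator<Map.Entry<K, Batch<V>>> it = open.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<K, Batch<V>> entry = it.next();
            K key = entry.getKey();
            Batch<V> batch = entry.getValue();
            if (nowMillis - batch.createdAt >= maxAgeMillis) {
                it.remove();
                sink.accept(key, batch.items);
            }
        }
    }
}
```

The key would typically combine message type and sender, and the sink is where the reduction (for example an average) and the write to storage would happen. Note that the sink runs while the buffer's lock is held, so a slow sink would be better handed off to another thread.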

Related

Difference between record-send-total and manually counted number of sent messages

I have a producer that sends messages for 15 seconds.
I wanted to investigate the total number of messages sent to the broker.
The first method I employed involved counting messages "manually", i.e.:
// ...
private int sentMessages = 0;
@Override
public void run() {
sendMessage(msg);
sentMessages++;
}
The second method I used involved analysing the producer's metrics.
I compared the number of produced messages, and the significantly different results baffled me: sentMessages was equal to 65243, whereas the producer's record-send-total was equal to 47883.
What might be the reason behind such a great difference between them?

I believe sendMessage(msg) is processed asynchronously, so some messages might have failed to send. Try updating the sentMessages count only on a successful response.
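A minimal sketch of that idea, using only the JDK: the asynchronous send is modelled as a function that returns a CompletableFuture, and a message is counted as acknowledged only when that future completes without an error. The class SendCounter and the asyncSend parameter are hypothetical stand-ins for the asker's sendMessage; with the Kafka client the same counting would go into the callback passed to the producer's send method.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper that separates attempted sends from confirmed ones. */
class SendCounter {
    final AtomicInteger attempted = new AtomicInteger();
    final AtomicInteger acknowledged = new AtomicInteger();
    final AtomicInteger failed = new AtomicInteger();

    /** Wraps an asynchronous send and counts the outcome once the future completes. */
    <M> CompletableFuture<Void> send(M msg, Function<M, CompletableFuture<Void>> asyncSend) {
        attempted.incrementAndGet();
        return asyncSend.apply(msg).whenComplete((ignored, error) -> {
            if (error == null) {
                acknowledged.incrementAndGet();
            } else {
                failed.incrementAndGet();
            }
        });
    }
}
```

Comparing attempted with acknowledged plus failed at the end of the run also shows how many sends were still in flight when the numbers were read.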

Pattern to continuously listen to AWS SQS messages

I have a simple class named QueueService with some methods that wrap the methods from the AWS SQS SDK for Java. For example:
public ArrayList<Hashtable<String, String>> receiveMessages(String queueURL) {
List<Message> messages = this.sqsClient.receiveMessage(queueURL).getMessages();
ArrayList<Hashtable<String, String>> resultList = new ArrayList<Hashtable<String, String>>();
for(Message message : messages) {
Hashtable<String, String> resultItem = new Hashtable<String, String>();
resultItem.put("MessageId", message.getMessageId());
resultItem.put("ReceiptHandle", message.getReceiptHandle());
resultItem.put("Body", message.getBody());
resultList.add(resultItem);
}
return resultList;
}
I have another class named App that has a main method and creates an instance of QueueService.
I'm looking for a "pattern" to make the main in App listen for new messages in the queue. Right now I have a while(true) loop where I call the receiveMessages method:
while(true) {
messages = queueService.receiveMessages(queueURL);
for(Hashtable<String, String> message: messages) {
String receiptHandle = message.get("ReceiptHandle");
String messageBody = message.get("Body"); // key must match the one used in receiveMessages
System.out.println(messageBody);
queueService.deleteMessage(queueURL, receiptHandle);
}
}
Is this the correct way? Should I use the async message receive method in SQS SDK?

To my knowledge, there is no way in Amazon SQS to support an active listener model where Amazon SQS would "push" messages to your listener, or would invoke your message listener when there are messages.
So, you would always have to poll for messages. Two polling mechanisms are supported: Short Polling and Long Polling. Each has its own pros and cons, but Long Polling is the one you would typically end up using in most cases, although Short Polling is the default. Long Polling is definitely more efficient in terms of network traffic, is more cost-efficient (because Amazon charges you by the number of requests made), and is also the preferred mechanism when you want your messages to be processed in a time-sensitive manner (that is, as soon as possible).
There are more intricacies around Long Polling and Short Polling that are worth knowing, and it's somewhat difficult to paraphrase all of that here, but if you like, you can read a lot more about this in the following blog post. It has a few code examples as well that should be helpful.
http://pragmaticnotes.com/2017/11/20/amazon-sqs-long-polling-versus-short-polling/
In terms of a while(true) loop, I would say it depends.
If you are using Long Polling, you can set the wait time to the maximum of 20 seconds; that way you do not poll SQS more often than once every 20 seconds when there are no messages. If there are messages, you can decide whether to poll frequently (to process messages as soon as they arrive) or whether to always process them at time intervals (say every n seconds).
Another point to note is that you can read up to 10 messages in a single receive request, which also reduces the number of calls you make to SQS, thereby reducing costs. As the above blog explains in detail, you may request 10 messages, but fewer may be returned even if there are that many messages in the queue.
In general though, I would say you need to build appropriate hooks and exception handling to turn off the polling if you wish to at runtime, in case you are using a while(true) kind of a structure.
Another aspect to consider is whether you would like to poll SQS in your main application thread or you would like to spawn another thread. So another option could be to create a ScheduledThreadPoolExecutor with a single thread in the main to schedule a thread to poll the SQS periodically (every few seconds), and you may not need a while(true) structure.
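The scheduled-executor variant with a stop hook could be sketched roughly as follows. This is an illustration under assumptions, not SQS-specific code: the receive supplier stands in for whatever call fetches a batch of message bodies (for example the asker's QueueService), and the names QueuePoller, pollOnce, start and stop are hypothetical. The try/catch inside pollOnce matters because a scheduled task that throws is not run again by the executor.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical poller: fetches batches on a background thread until stopped. */
class QueuePoller {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final Supplier<List<String>> receive;  // stands in for the queue receive call
    private final Consumer<String> handle;         // processes one message body

    QueuePoller(Supplier<List<String>> receive, Consumer<String> handle) {
        this.receive = receive;
        this.handle = handle;
    }

    /** One poll: fetch a batch and handle each message; exceptions are logged, not rethrown. */
    void pollOnce() {
        try {
            for (String body : receive.get()) {
                handle.accept(body);
            }
        } catch (RuntimeException e) {
            System.err.println("Poll failed, will retry on the next run: " + e);
        }
    }

    /** Starts polling; the next poll begins delayMillis after the previous one ends. */
    void start(long delayMillis) {
        scheduler.scheduleWithFixedDelay(this::pollOnce, 0, delayMillis, TimeUnit.MILLISECONDS);
    }

    /** The runtime hook for turning polling off. */
    void stop() {
        scheduler.shutdown();
    }
}
```

With long polling enabled in the receive call, the delay between polls can be kept very small, because the receive call itself blocks while the queue is empty.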

There are a few things that you're missing:
Use receiveMessage(ReceiveMessageRequest) and set a wait time to enable long polling.
Wrap your AWS calls in try/catch blocks. In particular, pay attention to OverLimitException, which can be thrown from receiveMessage() if you have too many in-flight messages.
Wrap the entire body of the while loop in its own try/catch block, logging any exceptions that are caught (there shouldn't be -- this is here to ensure that your application doesn't crash because AWS changed their API or you neglected to handle an expected exception).
See doc for more information about long polling and possible exceptions.
As for using the async client: do you have any particular reason to use it? If not, then don't: a single receiver thread is much easier to manage.

If you want to use SQS and then Lambda to process the request, you can follow the steps given in the link, or you can always use Lambda instead of SQS and invoke a Lambda function for every request.

As of 2019 SQS can trigger lambdas:
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html

I found one solution for actively listening to the queue.
For Node.js, I used the following package, which resolved my issue:
sqs-consumer
Link
https://www.npmjs.com/package/sqs-consumer

RabbitMQ by Example: Multiple Threads, Channels and Queues

I just read RabbitMQ's Java API docs, and found it very informative and straight-forward. The example for how to set up a simple Channel for publishing/consuming is very easy to follow and understand. But it's a very simple/basic example, and it left me with an important question: How can I set up 1+ Channels to publish/consume to and from multiple queues?
Let's say I have a RabbitMQ server with 3 queues on it: logging, security_events and customer_orders. So we'd either need a single Channel to have the ability to publish/consume to all 3 queues, or more likely, have 3 separate Channels, each dedicated to a single queue.
On top of this, RabbitMQ's best practices dictate that we set up 1 Channel per consumer thread. For this example, let's say security_events is fine with only 1 consumer thread, but logging and customer_order both need 5 threads to handle the volume. So, if I understand correctly, does that mean we need:
1 Channel and 1 consumer thread for publishing/consuming to and from security_events; and
5 Channels and 5 consumer threads for publishing/consuming to and from logging; and
5 Channels and 5 consumer threads for publishing/consuming to and from customer_orders?
If my understanding is misguided here, please begin by correcting me. Either way, could some battle-weary RabbitMQ veteran help me "connect the dots" with a decent code example for setting up publishers/consumers that meet my requirements here?

I think you have several issues with initial understanding. Frankly, I'm a bit surprised to see the following: both need 5 threads to handle the volume. How did you identify you need that exact number? Do you have any guarantees 5 threads will be enough?
RabbitMQ is tuned and time tested, so it is all about proper design
and efficient message processing.
Let's try to review the problem and find a proper solution. BTW, a message queue by itself will not guarantee that you have a really good solution. You have to understand what you are doing and also do some additional testing.
As you definitely know there are many layouts possible:
I will use layout B as the simplest way to illustrate the 1 producer, N consumers problem, since you are so worried about throughput. BTW, as you might expect, RabbitMQ behaves quite well (source). Pay attention to prefetchCount; I'll address it later:
So it is likely that the message processing logic is the right place to make sure you'll have enough throughput. Naturally you can spawn a new thread every time you need to process a message, but eventually such an approach will kill your system. Basically, the more threads you have, the bigger the latency you'll get (you can check Amdahl's law if you want).
(see Amdahl’s law illustrated)
Tip #1: Be careful with threads, use ThreadPools (details)
A thread pool can be described as a collection of Runnable objects
(work queue) and a collection of running threads. These threads are
constantly running and are checking the work queue for new work. If
there is new work to be done they execute this Runnable. The thread
pool itself provides a method, e.g. execute(Runnable r), to add a new
Runnable object to the work queue.
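import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Main {
    private static final int NTHREADS = 10;
    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(NTHREADS);
        for (int i = 0; i < 500; i++) {
            // MyRunnable is the task class from the quoted tutorial (not shown here)
            Runnable worker = new MyRunnable(10000000L + i);
            executor.execute(worker);
        }
        // This will make the executor accept no new tasks
        // and finish all tasks already in the queue
        executor.shutdown();
        // Wait until all tasks have finished (or the timeout expires)
        executor.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("Finished all threads");
    }
}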
Tip #2: Be careful with message processing overhead
I would say this is an obvious optimization technique. It is likely you'll send small and easy-to-process messages. The whole approach is about smaller messages being continuously sent and processed. Big messages will eventually cause trouble, so it is better to avoid them.
So it is better to send tiny pieces of information, but what about processing? There is an overhead every time you submit a job. Batch processing can be very helpful in case of high incoming message rate.
For example, let's say we have simple message processing logic and we do not want to pay thread-specific overhead every time a message is processed. To optimize that, a very simple CompositeRunnable can be introduced:
class CompositeRunnable implements Runnable {
protected Queue<Runnable> queue = new LinkedList<>();
public void add(Runnable a) {
queue.add(a);
}
@Override
public void run() {
for(Runnable r: queue) {
r.run();
}
}
}
Or do the same in a slightly different way, by collecting messages to be processed:
class CompositeMessageWorker<T> implements Runnable {
protected Queue<T> queue = new LinkedList<>();
public void add(T message) {
queue.add(message);
}
@Override
public void run() {
for(T message: queue) {
// process a message
}
}
}
In such a way you can process messages more effectively.
Tip #3: Optimize message processing
Despite the fact that you can now process messages in parallel (Tip #1) and reduce processing overhead (Tip #2), you still have to do everything fast. Redundant processing steps, heavy loops and so on might affect performance a lot. Please see this interesting case study:
Improving Message Queue Throughput tenfold by choosing the right XML Parser
Tip #4: Connection and Channel Management
Starting a new channel on an existing connection involves one network
round trip - starting a new connection takes several.
Each connection uses a file descriptor on the server. Channels don't.
Publishing a large message on one channel will block a connection
while it goes out. Other than that, the multiplexing is fairly transparent.
Connections which are publishing can get blocked if the server is
overloaded - it's a good idea to separate publishing and consuming
connections
Be prepared to handle message bursts
(source)
Please note that all the tips work well together. Feel free to let me know if you need additional details.
Complete consumer example (source)
Please note the following:
channel.basicQos(prefetch) - As you saw earlier prefetchCount might be very useful:
This command allows a consumer to choose a prefetch window that
specifies the amount of unacknowledged messages it is prepared to
receive. By setting the prefetch count to a non-zero value, the broker
will not deliver any messages to the consumer that would breach that
limit. To move the window forwards, the consumer has to acknowledge
the receipt of a message (or a group of messages).
ExecutorService threadExecutor - you can specify properly configured executor service.
Example:
static class Worker extends DefaultConsumer {
String name;
Channel channel;
String queue;
int processed;
ExecutorService executorService;
public Worker(int prefetch, ExecutorService threadExecutor,
Channel c, String q) throws Exception {
super(c);
channel = c;
queue = q;
channel.basicQos(prefetch);
channel.basicConsume(queue, false, this);
executorService = threadExecutor;
}
@Override
public void handleDelivery(String consumerTag,
Envelope envelope,
AMQP.BasicProperties properties,
byte[] body) throws IOException {
Runnable task = new VariableLengthTask(this,
envelope.getDeliveryTag(),
channel);
executorService.submit(task);
}
}
You can also check the following:
Solution Architecting Using Queues?
Some queuing theory: throughput, latency and bandwidth
A quick message queue benchmark: ActiveMQ, RabbitMQ, HornetQ, QPID, Apollo…

How can I set up 1+ Channels to publish/consume to and from multiple queues?
You can implement this using threads and channels. All you need is a way to
categorize things, i.e. all the queue items from login, all the
queue items from security_events, etc. The categorization can be
achieved using a routingKey.
That is, every time you add an item to the queue, you specify the routing
key. It will be appended as a property element. That way you can get
the values for a particular event, say logging.
The following code sample shows how to do this on the client side.
E.g.:
The routing key is used to identify the type of the channel and retrieve the types.
For example, if you need to get all the channels of the type login,
then you must specify the routing key as login or some other keyword
to identify that.
Connection connection = factory.newConnection();
Channel channel = connection.createChannel();
channel.exchangeDeclare(EXCHANGE_NAME, "direct");
String routingKey = "login";
channel.basicPublish(EXCHANGE_NAME, routingKey, null, message.getBytes());
You can look here for more details about the categorization.
Threads Part
Once the publishing part is done, you can run the thread part.
In this part you can get the published data on the basis of category, i.e. the routing key, which in your case is logging, security_events, customer_orders, etc.
Look at the example to see how to retrieve the data in threads.
E.g.:
ConnectionFactory factory = new ConnectionFactory();
factory.setHost("localhost");
Connection connection = factory.newConnection();
Channel channel = connection.createChannel();
//**The threads part is as follows**
channel.exchangeDeclare(EXCHANGE_NAME, "direct");
String queueName = channel.queueDeclare().getQueue();
// This part will bind the queue to each severity (login, for example)
for(String severity : argv){
channel.queueBind(queueName, EXCHANGE_NAME, severity);
}
boolean autoAck = false;
channel.basicConsume(queueName, autoAck, "myConsumerTag",
new DefaultConsumer(channel) {
@Override
public void handleDelivery(String consumerTag,
Envelope envelope,
AMQP.BasicProperties properties,
byte[] body)
throws IOException
{
String routingKey = envelope.getRoutingKey();
String contentType = properties.getContentType();
long deliveryTag = envelope.getDeliveryTag();
// (process the message components here ...)
channel.basicAck(deliveryTag, false);
}
});
This creates a thread that processes the data in the queue of
type login (the routing key). In this way you can create multiple threads,
each serving a different purpose.
Look here for more details about the threads part.

Straight answer
For your particular situation (logging and customer_order both need 5 threads) I would create 1 Channel with 1 Consumer for logging and 1 Channel with 1 Consumer for customer_order. I would also create 2 thread pools (5 threads each): one to be used by logging Consumer and the other by customer_order Consumer.
See Consumption below for why this should work.
PS: do not create the thread pool inside the Consumer; be also aware that Channel.basicConsume(...) is not blocking
Publish
According to Channels and Concurrency Considerations (Thread Safety):
Concurrent publishing on a shared channel is best avoided entirely,
e.g. by using a channel per thread. ... Consuming in one thread and publishing in another thread on a shared channel can be safe.
pretty clear ...
Consumption
The Channel might (I say might because of this) run all its Consumer(s) in the same thread; this idea is almost explicitly conveyed by Receiving Messages by Subscription ("Push API"):
Each Channel has its own dispatch thread. For the most common use case
of one Consumer per Channel, this means Consumers do not hold up other
Consumers. If you have multiple Consumers per Channel be aware that a
long-running Consumer may hold up dispatch of callbacks to other
Consumers on that Channel.
This means that under certain conditions many Consumers belonging to the same Channel would run on the same thread, such that the first one would hold up dispatch of callbacks for the next ones. The word dispatch is very confusing because it sometimes refers to "thread work dispatching", while here it refers mainly to calling Consumer.handleDelivery (see this again).
But what is the Channel's own dispatch thread? It is one from the thread pool used for server-pushed deliveries (see Channels and Concurrency Considerations (Thread Safety)):
Server-pushed deliveries ... uses a
java.util.concurrent.ExecutorService, one per connection.
Conclusion
If one has 1 Channel with 1 Consumer but wants to process the incoming messages in parallel, then it is better to create (outside the Consumer) and use (inside the Consumer) a separate thread pool; each message received by the Consumer will then be processed on the user's thread pool instead of on the Channel's own dispatch thread.
Is this approach (user's thread pool used from Consumer) even possible/valid/acceptable at all? It is; see Channels and Concurrency Considerations (Thread Safety):
thread that received the delivery (e.g. Consumer#handleDelivery
delegated delivery handling to a different thread) ...

How do I get Java to write the contents of an ArrayList to a file once every minute?

Just a quick question about the above subject. Basically, I'm writing a piece of software which captures data from the network and writes it to an external file for further processing.
What I want to know is what code would be the best to use in order to get this desired effect.
Thanks for your time
David

I'd probably implement it using a TimerTask. E.g.,
int minute = 1000*60; // period in milliseconds
int delay = 0;
Timer t = new Timer();
t.scheduleAtFixedRate(new TimerTask() {
@Override
public void run() {
// Write to disk ...
}
}, delay, minute);
Otherwise Quartz is a powerful Java scheduler which is capable of handling more advanced scheduling needs.

You can use the Executor Framework, here is a sample implementation:
final List<String> myData = new ArrayList<String>();
final File f = new File("some/file.txt");
final Runnable saveListToFileJob = new Runnable(){
@Override
public void run(){ /* this uses Guava */
try{
Files.write(
Joiner.on('\n').join(myData),
f, Charsets.UTF_8);
} catch(final IOException e){
throw new IllegalStateException(e);
}
}
};
Executors
.newScheduledThreadPool(1) /* one thread should be enough */
.scheduleAtFixedRate(saveListToFileJob,
1 /* 1 minute initial delay; same unit as the period */,
1, TimeUnit.MINUTES /* run once every minute */);

The question probably needs a few more details. What part of the solution is causing you trouble?
The scheduling?
The reading from the network?
The persistence in a memory data structure?
The writing to a file?
You should also describe the problem domain a little bit more in detail since it might significantly affect any solution that might be offered. For example:
What is the nature of the data going over the network?
Does it have to be per minute? What if the network isn't done sending and the minute is up and you start reading?
What exactly does the ArrayList contain?
Can you describe the file output? (text file? serialized object? etc...)
Just an initial hunch on my part --some thoughts I would have when designing a solution for the problem you described would involve some kind of Producer-Consumer approach.
A Runnable/Thread object whose sole responsibility is to continuously read from the network, assemble the data and place it in a synchronized queue.
A Runnable/Thread object in a wait() state observing the synchronized queue. When signaled by the queue, it will start reading the contents of the queue for persistence to a file(s).
When there are items in the queue (or when a certain threshold is reached) it will notify() the waiting queue reader to start consuming the objects from the queue for persistence.
I may be completely off base, but the fact that you're reading from the network implies some sort of unpredictability and unreliability. So instead of relying on timers, I would rely on the producer of the data (the object reading from the network) to signal the consumer (the object that will use the data read from the network) to do something with the data.
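As a rough sketch of the queue handoff described above, assuming the captured data arrives as lines of text: a BlockingQueue from java.util.concurrent takes care of the synchronization that wait() and notify() would otherwise need. The class and method names (CaptureBuffer, onData, drain, appendTo) are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical buffer between the network reader and the file writer. */
class CaptureBuffer {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    /** Producer side: called by the thread that reads from the network. */
    void onData(String line) {
        queue.add(line);
    }

    /** Consumer side: removes and returns everything queued so far. */
    List<String> drain() {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch);
        return batch;
    }

    /** Appends the drained lines to the file, creating it if necessary. */
    void appendTo(Path file) {
        try {
            Files.write(file, drain(), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The consumer thread could call appendTo from a scheduled task once a minute, or block on the queue and write whenever data arrives; which of the two fits better depends on the answers to the questions listed above.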

Java: High-performance message-passing (single-producer/single-consumer)

I initially asked this question here, but I've realized that my question is not about a while-true loop. What I want to know is, what's the proper way to do high-performance asynchronous message-passing in Java?
What I'm trying to do...
I have ~10,000 consumers, each consuming messages from their private queues. I have one thread that's producing messages one by one and putting them in the correct consumer's queue. Each consumer loops indefinitely, checking for a message to appear in its queue and processing it.
I believe the term is "single-producer/single-consumer", since there's one producer, and each consumer only works on their private queue (multiple consumers never read from the same queue).
Inside Consumer.java:
@Override
public void run() {
while (true) {
Message msg = messageQueue.poll();
if (msg != null) {
... // do something with the message
}
}
}
The Producer is putting messages inside Consumer message queues at a rapid pace (several million messages per second). Consumers should process these messages as fast as possible!
Note: the while (true) { ... } is terminated by a KILL message sent by the Producer as its last message.
However, my question is about the proper way to design this message-passing. What kind of queue should I use for messageQueue? Should it be synchronous or asynchronous? How should Message be designed? Should I use a while-true loop? Should Consumer be a thread, or something else? Will 10,000 threads slow down to a crawl? What's the alternative to threads?
So, what's the proper way to do high-performance message-passing in Java?

I would say that the context-switching overhead of 10,000 threads is going to be very high, not to mention the memory overhead. By default, on 32-bit platforms, each thread uses a default stack size of 256 KB, so that's about 2.5 GB just for your stacks. Obviously you're talking 64-bit, but even so, that's quite a large amount of memory. Due to the amount of memory used, the cache is going to thrash a lot, and the CPU will be throttled by the memory bandwidth.
I would look for a design that avoids using so many threads to avoid allocating large amounts of stack and context switching overhead. You cannot process 10,000 threads concurrently. Current hardware has typically less than 100 cores.
I would create one queue per hardware thread and dispatch messages in a round-robin fashion. If the processing times vary considerably, there is the danger that some threads finish processing their queue before they are given more work, while other threads never get through their allotted work. This can be avoided by using work stealing, as implemented in the JSR-166 ForkJoin framework.
Since communication is one way from the publisher to the subscribers, then Message does not need any special design, assuming the subscriber doesn't change the message once it has been published.
EDIT: Reading the comments, if you have 10,000 symbols, then create a handful of generic subscriber threads (one subscriber thread per core) that asynchronously receive messages from the publisher (e.g. via their message queue). The subscriber pulls the message from the queue, retrieves the symbol from the message, looks this up in a Map of message handlers, retrieves the handler, and invokes the handler to synchronously handle the message. Once done, it repeats, fetching the next message from the queue. If messages for the same symbol have to be processed in order (which is why I'm guessing you wanted 10,000 queues), you need to map symbols to subscribers. E.g. if there are 10 subscribers, then symbols 0-999 go to subscriber 0, 1000-1999 to subscriber 1, etc. A more refined scheme is to map symbols according to their frequency distribution, so that each subscriber gets roughly the same load. For example, if 10% of the traffic is symbol 0, then subscriber 0 will deal with just that one symbol and the other symbols will be distributed amongst the other subscribers.
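The range-based mapping from the EDIT can be sketched like this, assuming symbols are identified by integer ids from 0 up to a known count. Each subscriber is modelled as a single-threaded executor, so tasks for the same symbol run in submission order. The class SymbolDispatcher and its methods are hypothetical names, and this covers only the simple equal-range scheme, not the frequency-based refinement.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical dispatcher: a fixed range of symbol ids per subscriber thread. */
class SymbolDispatcher {
    private final ExecutorService[] subscribers;
    private final int symbolsPerSubscriber;

    SymbolDispatcher(int symbolCount, int subscriberCount) {
        // Round up so that every id below symbolCount maps to a valid subscriber
        this.symbolsPerSubscriber = (symbolCount + subscriberCount - 1) / subscriberCount;
        this.subscribers = new ExecutorService[subscriberCount];
        for (int i = 0; i < subscriberCount; i++) {
            subscribers[i] = Executors.newSingleThreadExecutor();
        }
    }

    /** The same symbol always maps to the same subscriber. */
    int subscriberFor(int symbolId) {
        return symbolId / symbolsPerSubscriber;
    }

    /** Queues the handler on the subscriber that owns the symbol. */
    void dispatch(int symbolId, Runnable handler) {
        subscribers[subscriberFor(symbolId)].execute(handler);
    }

    void shutdown() {
        for (ExecutorService s : subscribers) {
            s.shutdown();
        }
    }
}
```

In practice the subscriber count would be derived from the number of cores rather than fixed at 10.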

You could use this (credit goes to Which ThreadPool in Java should I use?):
class Main {
    static final ExecutorService threadPool = Executors.newFixedThreadPool(
        Runtime.getRuntime().availableProcessors()*2);
    public static void main(String[] args){
        // getConsumers(...) is assumed to build the consumers (not shown)
        Set<Consumer> consumers = getConsumers(threadPool);
        for(Consumer consumer : consumers){
            threadPool.execute(consumer);
        }
    }
}
and
class Consumer implements Runnable {
    private final ExecutorService tp;
    private final MessageQueue messageQueue;
    Consumer(ExecutorService tp, MessageQueue queue){
        this.tp = tp;
        this.messageQueue = queue;
    }
    @Override
    public void run(){
        try {
            Message msg = messageQueue.poll();
            if (msg != null) {
                ... // do something with the message
            }
        } finally {
            // Resubmit so that other consumers get a turn on the pool between polls
            this.tp.execute(this);
        }
    }
}
This way, you can have okay scheduling with very little hassle.

First of all, there's no single correct answer unless you either put a complete design doc or you try different approaches for yourself.
I'm assuming your processing is not going to be computationally intensive otherwise you wouldn't be thinking of processing 10000 queues at the same time. One possible solution is to minimise context switching by having one-two threads per CPU. Unless your system is going to be processing data in strict real time that may possibly give you bigger delays on each queue but overall better throughput.
For example, have your producer thread run on its own CPU and put batches of messages to consumer threads. Each consumer thread would then distribute messages to its N private queues, perform the processing step, receive a new data batch and so on. Again, it depends on your delay tolerance, so the processing step may mean processing all the queues, a fixed number of queues, or as many queues as it can until a time threshold is reached. Being able to easily tell which queue belongs to which consumer thread (e.g. if queues are numbered sequentially: int consumerThreadNum = queueNum & 0x03) would be beneficial, as looking them up in a hash table each time may be slow.
To minimise memory thrashing it may not be such a good idea to create/destroy queues all the time so you may want to pre-allocate a (max number of queues/number of cores) queue objects per thread. When a queue is finished instead of being destroyed it can be cleared and reused. You don't want gc to get in your way too often and for too long.
Another unknown is if your producer produces complete sets of data for each queue or will send data in chunks until the KILL command is received. If your producer sends complete data sets you may do away with the queue concept completely and just process the data as it arrives to a consumer thread.

Have a pool of consumer threads relative to the hardware and os capacity. These consumer threads could poll your message queue.
I would either have the Messages know how to process themselves or register processors with the consumer thread classes when they are initialized.

In the absence of more detail about the constraints of processing the symbols, it's hard to give very specific advice.
You should take a look at this slashdot article:
http://developers.slashdot.org/story/10/07/27/1925209/Java-IO-Faster-Than-NIO
It has quite a bit of discussions and actual measured data about the many thread vs. single select vs. thread pool arguments.
