Apache Camel RabbitMQ consumers leave extra threads running - java

I have built an app which starts multiple RabbitMQ consumers. When I start the app in debug mode in Eclipse, I can see the desired number of threads spawned in the Debug window:
The app deals with several RabbitMQ queues plus some seda queues. The app continues executing by processing and moving messages from one queue to another.
There are at least 7 routes starting from a RabbitMQ consumer. These routes roughly look like this:
from("rabbitmq://url")
    .process(new Processor1())   // Camel's process() takes a Processor instance, not a Class
    .process(new Processor2());
There is one specific start queue. Depending on the message published, the message flows through a different sequence of queues, so I was testing the different flows by publishing different messages to the start queue. After testing a few flows this way, I realized the app had spawned many new threads. Even after a flow finishes (that is, the message leaves the final queue and the final processor in the Camel route completes), the thread is left in a running state. Many such threads had accumulated after I tested multiple flows. Five of them can be seen in the screenshot below.
Those are just five extra threads, but the count climbs quickly as I test more complex flows; I have seen it reach 44 threads. So I am wondering what I am doing wrong. Do I have to explicitly stop the route threads in some way? Did I miss/forget some configuration that I must set on the Camel route? Why is this happening? Is it normal?
PS: My machine is very short on RAM, just 4 GB. It runs two lightweight DB servers, two web apps, Eclipse and my main (above) app. Most of the time, 3.7 GB is in use. Sometimes it takes a while for a breakpoint (inside a Camel processor) to be hit after I publish a message to the queue. Could such a machine be the reason for the erratic leftover threads? (Though I primarily think it's me missing some setting on the routes.)
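Independent of Camel, the JDK's ThreadMXBean can list exactly which threads are left behind, which helps tell a leaked consumer thread from, say, a heartbeat thread. In this sketch the "rabbitmq" name fragment is only an illustrative guess at the leaked threads' naming; substitute whatever prefix the Eclipse Debug view shows:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class ThreadLeakProbe {

    /** Returns the names of all live threads whose name contains the given substring. */
    public static List<String> findThreads(String nameFragment) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> matches = new ArrayList<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info.getThreadName().contains(nameFragment)) {
                matches.add(info.getThreadName());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // "rabbitmq" is a guess at the consumer thread-name prefix; adjust it to
        // whatever the lingering threads are actually called in the Debug view.
        for (String name : findThreads("rabbitmq")) {
            System.out.println("still running: " + name);
        }
    }
}
```

Running this periodically while exercising the flows shows whether the count really grows monotonically or whether some threads are eventually reclaimed.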

Related

Akka clustering - one Manager Actor per node

I’m working on an application that often queries a very large number of actors and hence sends / receives a very large number of messages. When the application is run on a single machine this is not an issue, because the messages are sent within the boundaries of a single JVM, which is quite fast. However, when I run the application on multiple nodes (using akka-cluster), each node hosts part of these actors and the messages go over the network, which becomes extremely slow.
One solution I came up with is to have a ManagerActor on each node where the application is run. This would greatly reduce the number of messages exchanged (i.e. instead of sending thousands of messages to each of the actors, if we run the application on 3 nodes we send 3 messages - one to each ManagerActor - which then sends messages within the current JVM to the other (thousands of) actors, which is very fast). However, I’m fairly new to Akka and I’m not quite sure such a solution makes sense. Do you see any drawbacks? Any other options which are better / more native to Akka?
You could use Akka's Distributed Publish-Subscribe to achieve that. That way you simply start a manager actor on each node the usual way, have them subscribe to a topic, and then publish messages to them using that topic. There is a simple example of this in the docs linked above.
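Akka aside, the arithmetic behind the manager-per-node idea can be sketched in plain Java: only one message crosses the network per node, and the manager fans it out in-JVM. The Node/broadcast names here are made up for illustration, not Akka API:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ManagerFanOut {

    /** A stand-in for one cluster node hosting many local worker actors. */
    static class Node {
        final int localWorkers;
        Node(int localWorkers) { this.localWorkers = localWorkers; }

        /** The "manager actor": one remote message in, local fan-out inside the JVM. */
        void deliver(String msg, AtomicInteger localDeliveries) {
            for (int i = 0; i < localWorkers; i++) {
                localDeliveries.incrementAndGet(); // cheap in-JVM hand-off
            }
        }
    }

    /** Broadcast one message to every worker via the per-node managers. */
    public static int broadcast(List<Node> nodes, String msg, AtomicInteger localDeliveries) {
        int remoteMessages = 0;
        for (Node node : nodes) {
            remoteMessages++;           // only one message crosses the network per node
            node.deliver(msg, localDeliveries);
        }
        return remoteMessages;
    }

    public static void main(String[] args) {
        AtomicInteger local = new AtomicInteger();
        List<Node> cluster = List.of(new Node(1000), new Node(1000), new Node(1000));
        int remote = broadcast(cluster, "work", local);
        System.out.println(remote + " remote messages, " + local.get() + " local deliveries");
        // → 3 remote messages, 3000 local deliveries
    }
}
```

The trade-off is that each manager becomes a serialization point on its node, which is usually acceptable when the per-message work is done by the downstream actors.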

ActiveMQ - Controlling how many messages are consumed at a time.

Apologies for the wording of my question.
I am using TomEE.
I have an ActiveMQ queue set up and receiving messages from a Producer (the TomEE-provided example).
It is persisted in MySQL (in case that matters)
My scenario is this...
A message comes into the queue
A Consumer/Monitor reads the message and starts a thread to run a process (backup, copying, processing etc...) that could take some time to complete.
At any one time I could have 5 messages to process or 500+ (and anything in between)
Ideally, I would like some Java/Apache library that is designed to monitor the queue and read 10 messages (for example) and then start the threads and then wait for one to finish before starting any more. For all intents and purposes I am trying to create a 'thread pool' or 'work queue' that prevents too many processes from starting up at any one time.
OR
Does this need to be thread pooled outside of ActiveMQ ?
I'm new to JMS and am beginning to understand it but still a long way to go.
Any help is appreciated.
Trevor
What you are looking to do sounds like something that could easily be solved using Apache Camel. Take a look at the Camel documentation for the competing consumers EIP which sounds like an ideal fit for your case.
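Whether or not Camel is used, the "at most N jobs in flight" behaviour the question describes is what a fixed-size pool from java.util.concurrent provides out of the box. A minimal sketch, where the limit of 10 mirrors the example in the question:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedWorkers {

    static final int MAX_CONCURRENT = 10; // mirrors the "10 messages" in the question

    /** Submits all messages, but never runs more than MAX_CONCURRENT at once. */
    public static int process(int totalMessages) {
        ExecutorService pool = Executors.newFixedThreadPool(MAX_CONCURRENT);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        for (int i = 0; i < totalMessages; i++) {
            pool.submit(() -> {
                int now = inFlight.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // record peak concurrency
                try {
                    Thread.sleep(10); // stand-in for backup/copy/processing work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inFlight.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get(); // never exceeds MAX_CONCURRENT
    }

    public static void main(String[] args) {
        System.out.println("peak concurrency: " + process(100));
    }
}
```

In a JMS listener you would submit each received message's work to such a pool; Camel's JMS-based components expose a similar knob via their concurrentConsumers option.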

HornetQ: consuming distributed queue sequentially

For a current project, I'm trying to set up the following scenario with JBoss 7.1 and HornetQ (JMS), which I think is a fairly common use case: there are three application servers. A number of MDBs should each process a broken-down fragment of a lengthy calculation process, with the tasks distributed among the three servers. When one fragment is finished and a corresponding result is ready, the result should be sent to a distributed queue, from where it is consumed and the total result is assembled. In order to avoid race conditions during total-result assembly, the "result" queue must be processed sequentially, although it may be distributed among several servers. No message in the result queue may be processed while another message is still in progress.
An administrative constraint is that the consumers (MDBs or session beans) consuming messages from the result queue can be deployed on all of the cluster nodes, i.e. the EARs deployed on the cluster nodes are identical. In that case, the same consumer code will be deployed on each of the nodes. Is there still a way to synchronize access to the queue?
I don't fully understand your use case, but it sounds like you need message grouping.
http://docs.jboss.org/hornetq/2.4.0.beta1/docs/user-manual/html/message-grouping.html
If you edit your question into something simpler that I can understand without digging into your test case, I may be able to add more information to this answer.
You also talked about a lengthy process, which sounds like you may have client-side buffering. Take a look at treating slow consumers by setting consumer-window-size=0 on the ServerLocator.
This HornetQ example shows how that can be achieved:
http://docs.jboss.org/hornetq/2.4.0.beta1/docs/user-manual/html/examples.html#examples.no-consumer-buffering
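The sequential-per-group semantics that message grouping provides can be sketched without HornetQ: messages with the same group id are pinned to one single-threaded consumer, so a group is always processed strictly in order while different groups still run in parallel. This is a toy model of the idea, not the HornetQ implementation:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GroupedDispatcher {

    private final ExecutorService[] lanes;

    public GroupedDispatcher(int consumers) {
        lanes = new ExecutorService[consumers];
        for (int i = 0; i < consumers; i++) {
            lanes[i] = Executors.newSingleThreadExecutor(); // one lane = one sequential consumer
        }
    }

    /** Messages with the same groupId always land on the same single-threaded lane. */
    public void dispatch(String groupId, Runnable work) {
        int lane = Math.floorMod(groupId.hashCode(), lanes.length);
        lanes[lane].submit(work);
    }

    public void shutdown() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
            try {
                lane.awaitTermination(1, TimeUnit.MINUTES);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static List<Integer> demo() {
        GroupedDispatcher d = new GroupedDispatcher(3);
        List<Integer> order = new CopyOnWriteArrayList<>();
        for (int i = 0; i < 5; i++) {
            final int seq = i;
            d.dispatch("result-group", () -> order.add(seq)); // one group id => strict FIFO
        }
        d.shutdown();
        return order;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // → [0, 1, 2, 3, 4]
    }
}
```

Note that using a single group id for the whole result queue, as the question requires, serializes all result processing onto one consumer, which is exactly the intended behaviour but also the throughput ceiling.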

Profiling Netty Performance

I'm writing a Netty application. The application is running on a 64-bit, eight-core Linux box.
The Netty application is a simple router that accepts requests (incoming pipeline) reads some metadata from the request and forwards the data to a remote service (outgoing pipeline).
This remote service will return one or more responses to the outgoing pipeline. The Netty application will route the responses back to the originating client (the incoming pipeline)
There will be thousands of clients. There will be thousands of remote services.
I'm doing some small scale testing (ten clients, ten remote services) and I don't see the sub-10-millisecond performance I'm expecting at the 99.9 percentile. I'm measuring latency from both the client side and the server side.
I'm using a fully async protocol that is similar to SPDY. I capture the time (I just use System.nanoTime()) when we process the first byte in the FrameDecoder. I stop the timer just before we call channel.write(). I am measuring sub-millisecond time (99.9 percentile) from the incoming pipeline to the outgoing pipeline and vice versa.
I also measured the time from the first byte in the FrameDecoder to when a ChannelFutureListener callback was invoked on the (above) message.write(). The time was a high tens of milliseconds (99.9 percentile) but I had trouble convincing myself that this was useful data.
My initial thought was that we had some slow clients. I watched channel.isWritable() and logged when it returned false. This method did not return false under normal conditions.
Some facts:
We are using the NIO factories. We have not customized the worker size
We have disabled Nagle's algorithm (tcpNoDelay=true)
We have enabled keep alive (keepAlive=true)
CPU is idle 90+% of the time
Network is idle
The GC (CMS) is being invoked every 100 seconds or so for a very short amount of time
Is there a debugging technique that I could follow to determine why my Netty application is not running as fast as I believe it should?
It feels like channel.write() adds the message to a queue and we (application developers using Netty) don't have transparency into this queue. I don't know whether the queue is a Netty queue, an OS queue, a network card queue or something else. Anyway, I'm reviewing examples of existing applications and I don't see any anti-patterns I'm following.
Thanks for any help/insight
Netty creates Runtime.getRuntime().availableProcessors() * 2 workers by default - 16 in your case. That means you can handle up to 16 channels simultaneously; other channels will wait until you return from the ChannelUpstreamHandler.handleUpstream/SimpleChannelHandler.messageReceived handlers, so don't do heavy operations in these (IO) threads, otherwise you can stall the other channels.
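The usual way to follow this advice is to hand anything heavy off to a separate application executor so the IO thread returns immediately. A stdlib sketch of that hand-off pattern (the class and method names here are illustrative, not Netty API):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class IoOffload {

    // Separate pool for slow work, so the (few) IO threads stay free for other channels.
    static final ExecutorService APP_POOL = Executors.newFixedThreadPool(8);

    /** Called from an IO thread: must return quickly, so heavy work is handed off. */
    public static CompletableFuture<String> onMessage(String payload) {
        return CompletableFuture.supplyAsync(() -> {
            // stand-in for an expensive operation (DB call, crypto, a big parse...)
            return payload.toUpperCase();
        }, APP_POOL);
    }

    public static void main(String[] args) throws Exception {
        String result = onMessage("hello").get(5, TimeUnit.SECONDS);
        System.out.println(result); // → HELLO
        APP_POOL.shutdown();
    }
}
```

When the future completes, the result can be written back to the channel; the key property is that the IO thread never blocks on the slow work itself.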
You haven't specified your Netty version, but it sounds like Netty 3.
Netty 4 is now stable, and I would advise that you update to it as soon as possible.
You have specified that you want ultra-low latency as well as tens of thousands of clients and services. These goals don't really mix well: NIO inherently has somewhat higher latency than OIO. The pitfall, however, is that OIO probably won't be able to reach the number of clients you are hoping for. Nonetheless, I would try an OIO event loop / factory and see how it goes.
I myself have a TCP server which takes around 30ms on localhost to send, receive and process a few TCP packets (measured from the time the client opens a socket until the server closes it). If you really do require such low latencies, I suggest you switch away from TCP, because the SYN/ACK exchange required to open a connection is going to use a large part of your 10ms budget.
Measuring time in a multi-threaded environment is very difficult if you are using simple things like System.nanoTime(). Imagine the following on a 1 core system:
Thread A is woken up and begins processing the incoming request.
Thread B is woken up and begins processing the incoming request. But since we are working on a 1 core machine, this ultimately requires that Thread A is put on pause.
Thread B is done and performed perfectly fast.
Thread A resumes and finishes, but took twice as long as Thread B. Because you actually measured the time it took to finish for Thread A + Thread B.
There are two approaches on how to measure correctly in this case:
You can enforce that only one thread is used at all times.
This allows you to measure the exact performance of the operation, provided the OS does not interfere (in the above example, Thread B could be outside your program as well). A common approach is to take the median over many runs to factor out the interference, which gives you an estimate of the speed of your code. You can, however, assume that on an otherwise idle multi-core system there will be another core to process background tasks, so your measurement will usually not be interrupted. Setting this thread to high priority helps as well.
You use a more sophisticated tool that plugs into the JVM to actually measure the atomic executions and time it took for those, which will effectively remove outside interference almost completely. One tool would be VisualVM, which is already integrated in NetBeans and available as a plugin for Eclipse.
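A cheap middle ground between raw System.nanoTime() and a full profiler is the JVM's per-thread CPU clock, which excludes the time a thread spends descheduled - exactly the interference described above. A sketch (note that getCurrentThreadCpuTime() returns -1 on JVMs without thread CPU timing support):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuVsWallTime {

    /** Returns {wallNanos, cpuNanos} for the given task run on the current thread. */
    public static long[] measure(Runnable task) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long wallStart = System.nanoTime();
        long cpuStart = mx.getCurrentThreadCpuTime(); // excludes time spent descheduled
        task.run();
        long cpuNanos = mx.getCurrentThreadCpuTime() - cpuStart;
        long wallNanos = System.nanoTime() - wallStart;
        return new long[] { wallNanos, cpuNanos };
    }

    public static void main(String[] args) {
        long[] t = measure(() -> {
            long sum = 0;
            for (int i = 0; i < 10_000_000; i++) sum += i; // busy work
        });
        // Wall time includes scheduler interference; CPU time does not.
        System.out.printf("wall=%dns cpu=%dns%n", t[0], t[1]);
    }
}
```

A large gap between the two numbers for the same code path is a strong hint that the thread was parked or preempted rather than actually slow.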
As a general advice: it is not a good idea to use more threads than cores, unless you know that those threads will be blocked by some operation frequently. This is not the case when using non-blocking NIO for IO-operations as there is no blocking.
Therefore, in your special case, you would actually reduce performance for clients, as explained above, because communication would be put on hold up to 50% of the time under high load. In the worst case, that could even cause a client to run into a timeout, as there is no guarantee when a thread is actually resumed (unless you explicitly request fair scheduling).

Module clustering and JMS

I have a module which runs standalone in a JVM (no containers) and communicates with other modules via JMS.
My module is both a producer in one queue and a consumer in a different queue.
I have then need to cluster this module, both for HA reasons and for workload reasons, and I'm probably going to go with Terracotta+Hibernate for clustering my entities.
Currently when my app starts it launches a thread (via Executors.newSingleThreadExecutor()) which serves as the consumer (I can attach an actual code sample if relevant and necessary).
What I understood from reading questions here is that if I just start up my module on N different JVMs then N different subscribers will be created and each message in the queue will arrive to N subscribers.
What I'd like is for only one of them (let's say for now that which one is not important) to process each message, and so in actuality enable me to process N messages at a time.
How can/should this be done? Am I way off the track?
BTW, I'm using OpenMQ as my implementation but I don't know if that's relevant.
Thanks for any help
A classic case of message handling in clustered environment. This is what I would do.
Use a broadcast message (channel-based) in place of a queue; a queue, being meant for point-to-point communication, is not very effective here. Set the validity of a message until the time it is consumed by one of the consumers. This way, the other consumers won't even see the message and only one consumer will consume it.
Take a look at JGroups. You may consider implementing your module/subscribers with JGroups for the kind of synchronization you need. JGroups provides reliable multicast communication.
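For reference, the deliver-each-message-to-exactly-one-consumer behaviour the question asks about is the standard point-to-point queue semantic, sketched here with a stdlib BlockingQueue shared by competing consumers (a toy model, not OpenMQ):

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class CompetingConsumers {

    /** N consumers share one queue; each message is delivered to exactly one of them. */
    public static int consumeAll(int messages, int consumers) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < messages; i++) queue.add(i);

        Set<Integer> seen = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int c = 0; c < consumers; c++) {
            pool.submit(() -> {
                Integer msg;
                while ((msg = queue.poll()) != null) {
                    seen.add(msg); // each message processed by exactly one consumer
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.size(); // == messages: nothing delivered twice, nothing lost
    }

    public static void main(String[] args) {
        System.out.println(consumeAll(1000, 4)); // → 1000
    }
}
```

JMS queues (as opposed to topics) give this semantic natively, so simply starting the module on N JVMs against a queue should already spread the messages rather than duplicate them.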
