How to design non-EJB load balanced applications? - java

I have a java class Processor that is listening to a jms topic and it is struggling to keep up with the speed in which messages are arriving so we've decided to go concurrent:
A single class listening to the topic who's job is to hand the messages out to a pool of worker threads, effectively being a load balancer. It also has to prevent 2 workers processing messages for the same customer.
I expected there to be quite a lot of information on the internet about this but everything seems to suggest the use of EJBs where the app server manages the pool and does the balancing. I'm sure this must be a really common problem, but can't seem to find any libraries or design patterns to assist. Am I making more out of it than it is and should just delve in and write my own code?

Why don't you just use a queue instead of a topic and have several instances of the same application handle messages from this queue ?

This is an easy problem to solve with a pool of listeners. That's what the app server would be doing for you.
I'd get a good app server and use its MDBs to solve this quickly. Size the pool to keep up and you'll be fine.
If you insist on writing your own code, get a good open source pool implementation and use it.
If it must be non-EJB, consider Spring. It has message driven POJOs that could be just what you need.

Related

Websphere Message Queue Multithreaded

We are backend processor and doing programming with JMS MQ. We have 2 queues. one is used to get message and another one is used to send message. All the banking users will put messages to Q1 through their IB, MB etc. We receive messages from Q1 and process it and we send message to Q2.
Currently we do not use multi threading for this. Can we able to use multi threading for this or single thread is enough to do this. because we are getting messages from Q1 one by one and process it.
Kindly revert back to me if question is not understandable. Please someone help me.
Yes, JMS allows for multiple readers on the same queue. You can do this by multi-threading, multiple application instances, or a dispatch layer that fetches messages and then passes them to a handler through a callback or other mechanism.
The application must support that, however. For example, if two messages are related and must be processed in order, the order is not preserved if there is more than one listener on the queue. This is one reason why async messaging patterns strongly prefer messages not to have order dependencies or affinities.
If you use multi-threading, it is important to make sure to preserve transactionality. If multiple threads use the same connection and one issues a COMMIT then that commits all outstanding messages across all the threads sharing that connection.

How many connections to maintain in RabbitMQ?

I am using the RabbitMQ java client.
My app has multiple exchanges and queues. Adopting something similar to the Pub/Sub model.
What is the best practice regarding connections?
Shall I have one connection per app?
I understand the channel model, and the thread (un)safety model. Just not sure if I should have multiple connections or not.
One connection per app is correct.
Within that connection, you will have many channels - where the actual work is done.
You can have hundreds or thousands of message producers and consumers (each on their on channel) inside a single connection.
If you start to see slowdown in your RMQ setup because you're dong too much work, look at clustering RMQ and/or standing up multiple instances of your app.
But you would still maintain 1 connection per app instance.
It will depends on the volumetry of messages you will have. If it really is huge, maybe 2 or 3 connections could do it, but one per application seems to be the best choice

Module clustering and JMS

I have a module which runs standalone in a JVM (no containers) and communicates with other modules via JMS.
My module is both a producer in one queue and a consumer in a different queue.
I have then need to cluster this module, both for HA reasons and for workload reasons, and I'm probably going to go with Terracotta+Hibernate for clustering my entities.
Currently when my app starts it launches a thread (via Executors.newSingleThreadExecutor()) which serves as the consumer (I can attach actual code sample if relevant and neccessary).
What I understood from reading questions here is that if I just start up my module on N different JVMs then N different subscribers will be created and each message in the queue will arrive to N subscribers.
What I'd like to do is have only one of them (let's currently say that which one is not important) process that message and so in actuality enable me to process N messages at a time.
How can/should this be done? Am I way off the track?
BTW, I'm using OpenMQ as my implementation but I don't know if that's relevant.
Thanks for any help
A classic case of message handling in clustered environment. This is what I would do.
Use Broadcast message (Channel based) in place of Queue. Queue being useful for point to point communication is not very effective. Set validity of message till the time it is consumed by one of the consumer. This way, other consumers wont even see the message and only one consumer will consume it.
Take a look at JGroups. You may consider implementing your module/subscribers to use jgroups for the kind of synchronization you need. JGroups provide Reliable Multicast Communication.

What ways exist to distribute asynchronous batch tasks?

I am currently investigating what Java compatible solutions exist to address my requirements as follows:
Timer based / Schedulable tasks to batch process
Distributed, and by that providing the ability to scale horizontally
Resilience, no SPFs please
The nature of these tasks (heavy XML generation, and the delivery to web based receiving nodes) means running them on a single server using something like Quartz isn't viable.
I have heard of technologies like Hadoop and JavaSpaces which have addressed the scaling and resilience end of the problem effectively. Not knowing whether these are quite suited to my requirements, its hard to know what other technologies might fit well.
I was wondering really what people in this space felt were options available, and how each plays its strengths, or suits certain problems better than others.
NB: Its worth noting that schedule-ability is perhaps a hangover from how we do things presently. Yes there are tasks which ought to go at certain times. It has also been used to throttle throughput at times when no mandate for set times exists.
Asynchronous always brings JMS to mind for me. Send the request message to a queue; a MessageListener is plucked out of the pool to handle it.
This can scale, because the queue and listener can be on a remote server. The size of the listener thread pool can be configured. You can have different listeners for different tasks.
UPDATE: You can avoid having a single point of failure by clustering and load balancing.
You can get JMS without cost using ActiveMQ (open source), JBOSS (open source version available), or any Java EE app server, so budget isn't a consideration.
And no lock-in, because you're using JMS, besides the fact that you're using Java.
I'd recommend doing it with Spring message driven POJOs. The community edition is open source, of course.
If that doesn't do it for you, have a look at Spring Batch and Spring Integration. Both of those might be useful, and the community editions are open source.
Have you looked into GridGain? I am pretty sure it won't solve the scheduling problem, but you can scale it and it happens like "magic", the code to be executed is sent to a node and it is executed in there. It works fine when you don't have a database connection to be sent (or anything that is not serializable).

Potential pitfalls in using a JMS queue?

I've been asked to design and implement a system for receiving a high volume of automated sensor data from a large number of devices. This data will be produced at regular intervals and sent to the server as xml in an http post. The devices will keep resending the same data if they don't receive a specific acknowledgment from the server. Some potentially heavy duty processing of this data will need to occur before it's inserted to a number of tables in the main database via a transaction, and additionally some data points will need to be enqueued to be re-directed to other external urls.
I'm planning on using a Java application server (leaning towards GlassFish) with a servlet to receive the incoming data. I'd like to implement some kind of queuing mechanism to store the data temporarily so that the response back to the sensor isn't dependent on all the intermediate processing. Separate independent queues are also a requirement for the data re-direction piece. After doing some research the two main options seem to be:
1) Install a database on the app server and use tables for the various queues. The queues would be processed by a Java application, either running in the app server or standalone as it's own service.
2) Use a database backed JMS solution to implement the queuing.
I'm not that familiar with JMS but from what I've read it seems to be the better solution in this case. The primary requirement is that no sensor data ever be lost or dropped from the queue before being processed and that it be processed more or less sequentially. We'd also like to make it easy to halt the processing of some of the queues at certain times but still have them accumulate data and for these messages to never automatically expire.
With strategy 1 it's obvious to me how to meet these requirements but it may be less robust and scalable, and more complex to develop than strategy 2, since I'll need to write my own multi-threaded code to handle the various independent queues. I'm wondering what the potential pitfalls could be in using JMS queues for this purpose since I've never worked with them before.
Data integrity is a big issue so I need to make sure JMS can guarantee no data loss in the event of a server reboot, power outage, or if the queue gets very large for some reason. For instance could a problem completing transactions to the main database for a period of time potentially cause the JVM to run out of memory, crash, and lose all accumulated data? (This would be the nightmare scenario).
Also, I was wondering if there would be any way to pause the JMS queue processing via an app server admin tool or to easily see what's in the queue (I would be enqueuing an object which would be the message xml plus some other data, including timestamp received, etc.) I've read a few posts on here that deal with related issues but wanted to get some direct feedback. Basically I'd like to know of instances (if any) where JMS is not an appropriate queuing solution and if this is one of those cases. Any advice is greatly appreciated.
Kaleb's answer talks about the benefits of JMS quite eloquently, but since you're asking about pitfalls, here's what I can think of.
Not all JMS implementations are equal. In theory you can use whatever implementation suits your needs, but unless you're prepared to do some serious load testing and failure condition testing, you can't know that a particular implementation isn't going to fail under your particular use case.
Most JMS use a transactional datastore like a relational database as their back end. That means that rather than writing directly to whatever datastore you're familiar with, you have to rely on the JMS implementation's extra layer between you and that stored messages.
While swapping JMS implementations to find the one that perfectly fits your needs may seem like a simple endeavor because of the homogeneous JMS API, the critical features for failure handling, JMS server monitoring, and all the other cool stuff that exists above and beyond messaging is going to be a hassle to deal with if you do change your implementation.
That said, I think you'd be crazy to write to the DB yourself instead of going with JMS. On the first point, ActiveMQ is a venerable JMS server used in many enterprise environments. On the second point, the fact is you'd just end up writing that extra layer yourself in order to implement messaging, and your code won't have the benefit of thousands of eyes (or a set of paid developers who's sole job it is to respond to customers and make sure the JMS implementation is solid). On the third point, well the same ends up being true of your backend datastore. Use JMS, you'll save yourself trouble in the long run.
If you want to go the JMS route, a standalone JMS-compatible message broker (separate from your app server) would be a good choice. Message brokers range from free open-source (like ActiveMQ at http://activemq.apache.org/ or OpenMQ at https://mq.dev.java.net/), to large-scale commercial solutions (IBM's WebSphere MQ at http://www-01.ibm.com/software/integration/wmq/ is one of the largest).
Message brokers offer guaranteed delivery (provided the server's up and listening), and you can do quite a bit to ensure that the system is fail-safe including integrated backup broker servers and instant power backup. Broker queues can eventually run out of room if your app server isn't picking up the messages, but you can assign huge queue depth (100's of GB) and have the server send alerts if the messages aren't getting processed and the queue reaches a certain percentage.
Your Java app would then run on a different server entirely, and would connect to the broker and pull messages off of the queue as fast as possible. If the app server crashes or stops picking up messages for any other reason, the broker would just keep all messages in that queue until the app server begins picking them up again.
You will be wanting to implement a poison message queue in your implementation - this is the place that messages unable to be processed after some number of retries will arrive.
You will probably need to write some code that can examine the messages in that queue and re-send them to the appropriate destination after fixing whatever is causing them to fail.
If sequence of message processing is important, a message ending up in the poison queue could mean all processing is halted until that message is corrected.
As far as fault tolerance goes, you can have multiple instances of the consuming services subscribe to the same queue or topic, providing an ability to continue processing even if one or more instances goes down.
Finally, have a watchdog process that pings the various consumers on your message queue, and if one doesn't respond, have it send a message that results in a new instance being started. In this way, your message processing environment can be somewhat self regulating.

Categories

Resources