I code all my micro-services in Java. I want to use multiple consumers with Amazon SQS, but each consumer has multiple instances on AWS behind a load balancer.
I use SNS for the input stream.
I use an SQS standard queue after SNS.
I found the same question on Stack Overflow (Using Amazon SQS with multiple consumers).
The sample I am following is:
https://aws.amazon.com/fr/blogs/aws/queues-and-notifications-now-best-friends/
When I read the SQS standard queue documentation, I see that occasionally more than one copy of a message is delivered:
Each message has a message_id. How can I make sure that multiple instances of the same micro-service do not each process a message that was delivered more than once? My idea is to register the message_id in a DynamoDB table, but if several instances of the same micro-service do that, how do I take a lock on the read (a bit like a SELECT FOR UPDATE)?
For example, multiple instances of the same micro-service, "Scan Metadata".
As you have mentioned, standard SQS queues can sometimes deliver the same message more than once. This is due to the distributed nature of the SQS service. Each message is stored on multiple servers for redundancy, and there is a chance that one of those servers is down when you call sqs:DeleteMessage; in that case the message is not deleted from all of the servers, and once the failed server comes back online it doesn't know that you have deleted the message, so the message will be delivered and processed again.
The easiest way to solve the issue of duplicate messages is to switch to a FIFO queue, which provides exactly-once processing. You can choose deduplication based on either the content or a unique ID generated by the sender. If you choose content-based deduplication and the queue receives two messages with the same content within the 5-minute deduplication interval, the second message is discarded.
If two messages can have the same content yet need to be treated as different messages, you can instead deduplicate based on an ID that you pass to the sqs:SendMessage or sqs:SendMessageBatch calls via the MessageDeduplicationId argument.
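For example, with the AWS SDK for Java v2, a send to a FIFO queue might look like this; the queue URL, group ID, and deduplication ID below are made-up placeholders:

```java
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

public class FifoSender {
    public static void main(String[] args) {
        // Placeholder URL; FIFO queue names must end in ".fifo".
        String queueUrl = "https://sqs.eu-west-1.amazonaws.com/123456789012/scan-metadata.fifo";

        try (SqsClient sqs = SqsClient.create()) {
            SendMessageRequest request = SendMessageRequest.builder()
                    .queueUrl(queueUrl)
                    .messageBody("{\"documentId\":\"42\"}")
                    // Messages with the same group id are delivered in order.
                    .messageGroupId("scan-metadata")
                    // Two sends with the same deduplication id within the
                    // 5-minute interval are treated as one message.
                    .messageDeduplicationId("registration-42")
                    .build();
            sqs.sendMessage(request);
        }
    }
}
```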
I would definitely look at FIFO queues before thinking about using DynamoDB to store the state of message processing. It will be cheaper, and the deduplication functionality is provided for you by default, without your having to implement any complex logic.
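That said, if you do stay on a standard queue and go the DynamoDB route from the question, a conditional write gives you the atomic check-and-claim you are asking about (the rough equivalent of a SELECT FOR UPDATE). A minimal sketch, assuming a hypothetical processed-messages table keyed on message_id:

```java
import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.ConditionalCheckFailedException;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

public class MessageDeduplicator {
    private final DynamoDbClient dynamo = DynamoDbClient.create();

    /** Returns true if this instance won the right to process the message. */
    public boolean tryClaim(String messageId) {
        try {
            dynamo.putItem(PutItemRequest.builder()
                    .tableName("processed-messages") // hypothetical table
                    .item(Map.of("message_id", AttributeValue.builder().s(messageId).build()))
                    // Atomic: the put fails if another instance already wrote this id.
                    .conditionExpression("attribute_not_exists(message_id)")
                    .build());
            return true; // we inserted the id first, safe to process
        } catch (ConditionalCheckFailedException e) {
            return false; // another instance already claimed this message
        }
    }
}
```

Only the instance whose put succeeds processes the message; every other instance gets ConditionalCheckFailedException and skips it. You would also want a TTL attribute on the table so old message IDs expire instead of accumulating forever.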
I have an application that uses Java on the backend and Angular on the frontend, and I'm trying to use STOMP messaging between the two to exchange state data.
What I would like to do is have my services, on startup, publish their states and have that data stay in the queue for any client that later connects to the server.
(edit)
For clarification, I don't mean I want messages to survive a server reboot. What I want is for certain message queues to retain all messages until the server reboots.
How do I tell Spring Boot's STOMP implementation to not delete the contents of a /queue?
You can configure ActiveMQ Artemis as an "external broker" and use a "non-destructive" queue. When a STOMP client receives and acknowledges a message from a non-destructive queue the broker will not remove it. You can define a special "initialization" queue which all clients connect to initially to receive the state data which you care about and then they can connect to whatever other queues they need to complete their normal work.
In this kind of use-case the queue is typically configured as non-destructive and as a "last value" queue. This way each client can use its own "last value" and can keep their state data up-to-date without the complication of stale state data on the queue.
I realize your question was asking about how to do this with Spring's built-in broker, but all my research indicates that Spring's simple in-memory broker supports neither last-value queue semantics, nor non-destructive queue semantics, nor even persistent messages. From what I understand, Spring's broker is only meant for the most basic use-cases, which is why they enable integration with 3rd-party brokers that can support more advanced use-cases (like yours).
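For reference, pointing Spring's STOMP support at an external broker is a small configuration change; the non-destructive and last-value settings themselves live in the broker's own configuration (broker.xml address settings for Artemis), not in Spring. A minimal sketch, assuming Artemis runs locally with its default STOMP acceptor on port 61613:

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;
import org.springframework.web.socket.config.annotation.WebSocketMessageBrokerConfigurer;

@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig implements WebSocketMessageBrokerConfigurer {

    @Override
    public void configureMessageBroker(MessageBrokerRegistry registry) {
        // Relay /queue and /topic destinations to the external Artemis broker
        // instead of using Spring's simple in-memory broker. Host and port
        // are assumptions for a local default Artemis install.
        registry.enableStompBrokerRelay("/queue", "/topic")
                .setRelayHost("localhost")
                .setRelayPort(61613);
        registry.setApplicationDestinationPrefixes("/app");
    }

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        registry.addEndpoint("/ws"); // hypothetical WebSocket endpoint
    }
}
```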
Is it possible to develop a bi-directional messaging system using Apache Kafka?
I need to subscribe to a topic from my consumer, and I also need to send messages from that same consumer.
You could do it one of two ways: either set up a prefix system for the message keys, or put content inside the message that lets the consumer skip messages it has produced itself.
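As a sketch of the second approach, the producer can attach an "origin" header to each record (via ProducerRecord headers) and the consumer can skip records whose origin matches its own ID. The topic, group, and header names below are assumptions, not Kafka conventions:

```java
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.header.Header;

public class SelfFilteringConsumer {
    private static final String MY_ID = "service-b"; // hypothetical instance id

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "service-b-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("shared-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Skip messages this service produced itself, based on an
                    // "origin" header set by the producer side.
                    Header origin = record.headers().lastHeader("origin");
                    if (origin != null && MY_ID.equals(new String(origin.value(), StandardCharsets.UTF_8))) {
                        continue;
                    }
                    System.out.println("processing: " + record.value());
                }
            }
        }
    }
}
```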
Now as to whether you should design it like this, that depends on your message traffic. If you're not slamming it with events, it might be better to consider something like Thrift as a way to have your message components do bidirectional communication. Where Kafka really excels relative to its complexity is when you need to produce and consume massive volumes of data. That might not be the case for you.
For example, one common use case with Kafka is to hook it up to a service like Storm, Apex or Samza for doing distributed processing of hundreds of GB or even TB of data. If your system has a high throughput requirement, that architecture would be a good one to consider as a starting point with Kafka for handling messages. With Storm, if you need to send messages back for reprocessing, you can always use the Kafka bolt to republish a message into Kafka to ensure it gets completely reprocessed.
I am going to integrate some applications using RabbitMQ, and I am facing a design issue. Right now I have one application producing messages and one application consuming them (more are possible in the future). Both applications have access to a database. Application A is a kind of registration application: when it receives a registration request, it sends a message on RabbitMQ. Application B receives this message, and its task is to load the registration data into an Elasticsearch server. I have a few options:
1. The consumer reads the message and ID from the queue, loads the data, and sends it to the Elasticsearch server. This gives the fastest throughput, because everything moves asynchronously: another process, possibly running on a separate server, loads the data and sends it to Elasticsearch.
2. The consumer reads the message and ID from the queue and then calls a REST service to load the company data. Each request takes longer to process because of the network overhead; it saves the time of loading the data but adds network delay, and it also bypasses the ESB (message broker). (I personally think that if I am using an ESB in my application, it is not necessary to use it for every single method call.)
3. Send all the registration data in the message; the consumer receives it and simply uploads it to the Elasticsearch server.
Which approach should I follow?
There are many components to your application setup, which makes it hard to take everything into account and suggest a straightforward answer. I would suggest you look at each design and identify the I/O points, the calls over the network, and the data volume exchanged over the network. Then, depending on the load you expect and the volume of data you expect to store over time, rank these bottlenecks, giving a higher score to the more severe ones. Pick the solution with the lowest score and go with that.
I would also suggest benchmarking the difference between sending only the ID and sending the whole object. I would expect the difference to be negligible.
One suggestion: make your objects immutable. It is not directly relevant to what you are describing, but in situations like yours, where components operate "blindly", knowing that an object has not changed state is a big assurance.
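For example, a Java record (Java 16+) gives you immutability with almost no code; the field names here are made up for illustration:

```java
// A minimal immutable registration message as a Java record.
// Field names are illustrative, not from the original question.
public record Registration(String id, String companyName, String email) {
    // Records generate final fields, accessors, equals/hashCode and toString;
    // once constructed, the instance cannot change state.
}
```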
Can someone point me to a tutorial or similar code where JMS is used by a web app to execute a long-running background process (instead of using threads)? I'm fairly familiar with the concepts of JMS messaging, but I have never used any JMS API or broker (I'm looking at learning Apache ActiveMQ).
I'd like to be able to:
submit a message to the queue to run a process
check the status (progress) of that process at arbitrary times
Thanks!
The real point of using JMS in your context is to start tasks asynchronously. This is called fire and forget in middleware lingo. JMS has guaranteed delivery semantics, meaning that once the message has been put on the queue it is guaranteed to get there ... eventually.
The idea is that you do whatever work you must do now, and for any task in the process that can be done at a later time, you put a message on a queue and it executes later. This lets you significantly cut down the processing done while somebody is waiting for a response.
Another benefit of JMS is that the different parts of the system do not need to be running at the same time. The part that consumes messages can be down for maintenance while your front end still works.
The previous post is accurate in terms of a model for putting orders or requests into a queue asynchronously and having them picked up later. However, it doesn't really address the question of long-running processes.
In terms of queues and topics, the benefit of persistent queues is that if there are no consumers on the queue then messages will be waiting for consumption until there is a subscriber. In a topic, you need to create a durable subscription in order to make sure a consumer that is not connected will receive messages that are sent in its absence once it reconnects.
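As a quick illustration of the durable-subscription point, with the plain JMS API it looks roughly like this; the client ID, topic, and subscription names are placeholders:

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import javax.jms.Topic;

public class DurableSubscriberExample {
    public void subscribe(ConnectionFactory factory) throws Exception {
        Connection connection = factory.createConnection();
        // The client id identifies this subscriber to the broker across restarts.
        connection.setClientID("status-dashboard");
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = session.createTopic("process.status");
        // A durable subscription: messages published while this consumer is
        // offline are retained and delivered when it reconnects.
        MessageConsumer consumer = session.createDurableSubscriber(topic, "status-sub");
        connection.start();
        System.out.println(consumer.receive(5000)); // wait up to 5s for a message
    }
}
```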
So, how are you defining a long-running process? For a multi-step process you would typically use something like a workflow engine. There are options like a BPM tool or something like "OS Workflow". You can also do a home-grown solution that could look like the example below:
1) There would need to be some sort of workflow definition that defines the steps in the process. This could be a properties file or an XML file.
2) Web App puts a message on a queue or topic (pub/sub) with an indication of the process to be executed (or you can have specific destinations for different processes)
3) A Dispatcher MDB picks the 'order' up off the queue with a status of 'NEW' and starts processing the first step.
4) Once the step is complete, the MDB puts a new message on the queue indicating the process being executed and either the next step to be executed, or the last step that was executed (depending on how deterministic you want the process to be)
5) The MDB picks up the message and sees that the process is 'IN_PROGRESS'. It either determines the next step to be executed or reads the step to be executed next from the message (either a JMS header value or within the message, perhaps in an XML format)
6) Steps 4 & 5 are repeated until the process instance is complete
In this case you will need an external representation of the order and process instance information. This will allow you to check the status of a request from your WebApp. Your order would need to be read and persisted with an updated status after each step in the process such that the WebApp could access the status information.
The key component of this architecture is the dispatcher MDB that listens for messages and executes the next step of the process. When I worked with OS Workflow, that was one key piece that was missing. In this manner, you can control the number of threads executing process steps by controlling the number of MDBs in the pool and consumers on the queue. In this architecture I would recommend a queue over a topic for the workflow steps. However, after each process step you could publish a message to a topic for subscribers to get updated status information.
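A minimal sketch of such a dispatcher as a plain JMS MessageListener; the property names ("status", "step") and the step logic are assumptions for illustration, not a standard:

```java
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

public class WorkflowDispatcher implements MessageListener {
    private final Session session;
    private final Queue workflowQueue;

    public WorkflowDispatcher(Session session, Queue workflowQueue) {
        this.session = session;
        this.workflowQueue = workflowQueue;
    }

    @Override
    public void onMessage(Message message) {
        try {
            String status = message.getStringProperty("status");
            int step = "NEW".equals(status) ? 0 : message.getIntProperty("step");

            executeStep(step); // run the actual work for this step

            if (step < lastStep()) {
                // Re-enqueue the order with the next step to be executed.
                // Assumes the payload is a TextMessage (e.g. XML order data).
                TextMessage next = session.createTextMessage(((TextMessage) message).getText());
                next.setStringProperty("status", "IN_PROGRESS");
                next.setIntProperty("step", step + 1);
                session.createProducer(workflowQueue).send(next);
            }
            // Persist the order's updated status here so the web app can query it.
        } catch (JMSException e) {
            throw new RuntimeException(e);
        }
    }

    private void executeStep(int step) { /* step implementation */ }
    private int lastStep() { return 5; }
}
```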
With the Java EE 6 technologies, including JPA, you could easily create an XSD, generate domain data-model POJOs with JAXB, and use JPA for persistence. We did a webcast earlier this year that covered the Java EE 6 technologies currently supported in WebLogic. Here are the replays: http://www.oracle.com/technetwork/middleware/weblogic/learnmore/weblogic-javaee6-webcasts-358613.html.
I'm also still interested to speak with you about your JBoss migration :) jeffrey.west#oracle.com
I'm having a hard time figuring out how to architect the final piece of my system. Currently I'm running a Tomcat server that has a servlet that responds to client requests. Each request in turn adds a processing message to an asynchronous queue (I'll probably be using JMS via Spring or more likely Amazon SQS).
The sequence of events is this:
Sending side:
1. Take a client request
2. Add some data into a DB related to this request with a unique ID
3. Add a message object representing this request to the message queue
Receiving side:
1. Pull a new message object from the queue
2. Unwrap the object and grab some information from a web site based on information contained in the msg object.
3. Send an email alert
4. Update my DB row (same unique ID) to record that the operation was completed for this request.
I'm having a hard time figuring out how to properly deal with the receiving side. On one hand, I could probably create a simple Java program that I kick off from the command line, which picks each item off the queue and processes it. Is that safe? Does it make more sense to have that program running as another thread inside the Tomcat container? I don't want to do this serially; the receiving end should be able to process several objects at a time, using multiple threads. I want this to be always running, 24 hours a day.
What are some options for building the receiving side?
"On one hand I can probably create a simple java program that I kick off from the command line that picks each item in the queue and processes it. Is that safe?"
What's unsafe about it? It works great.
"Does it make more sense to have that program running as another thread inside the Tomcat container?"
Only if Tomcat has a lot of free time to handle background processing. Often, this is the case -- you have free time to do this kind of processing.
However, threads aren't optimal. Threads share common I/O resources, and your background thread may slow down the front-end.
Better is to have a JMS queue between the "port 80" front-end, and a separate backend process. The back-end process starts, connects to the queue, fetches and executes the requests. The backend process can (if necessary) be multi-threaded.
If you are using JMS, why are you placing the tasks into a DB?
You can use a durable Queue in JMS. This would keep tasks, even if the JMS broker dies, until they have been acknowledged. You can have redundant brokers so that if one broker dies, the second automatically takes over. This could be more reliable than using a single DB.
If you are already using Spring, check out DefaultMessageListenerContainer. It allows you to create a POJO message driven bean. This can be used from within an existing application container (your WAR file) or as a separate process.
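A minimal sketch of that container set up programmatically; the queue name and the concurrency value are placeholders:

```java
import javax.jms.ConnectionFactory;
import javax.jms.MessageListener;
import org.springframework.jms.listener.DefaultMessageListenerContainer;

public class ReceiverSetup {
    public static DefaultMessageListenerContainer buildContainer(ConnectionFactory connectionFactory) {
        DefaultMessageListenerContainer container = new DefaultMessageListenerContainer();
        container.setConnectionFactory(connectionFactory);
        container.setDestinationName("request.queue"); // hypothetical queue name
        // Process several messages in parallel, as the question asks.
        container.setConcurrentConsumers(5);
        container.setMessageListener((MessageListener) message -> {
            // unwrap the message, fetch the web-site data, send the email,
            // then update the DB row for this request's unique ID
        });
        return container;
    }
}
```

Declared as a Spring bean (or started manually via afterPropertiesSet() and start()), this runs inside your existing WAR, or in a standalone process, and handles up to five messages concurrently.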
I've done this sort of thing by hosting the receiver in an app server, WebLogic in my case, but Tomcat works fine too. Don't poll the queue; use an event-based model. This could be hand-coded or it could be a message-driven web service. If the database update is idempotent, you can update the database and send the email, then issue the commit on the queue. It's not a problem to have several threads all reading from the same queue.
I've used various JMS solutions, including TIBCO, ActiveMQ (before Apache subsumed it), and JORAM. JORAM was the more reliable open-source solution, but that may have changed now that it's part of Apache.