MDB is inserting duplicate records - java

I am using a JBoss 5.1.2 MDB to consume entity messages placed on a queue. The message producer can produce multiple messages for the same entity; these entities are identified by an entity number (msg_entity_no). I must insert a record into an existing entity table only if the entity does not exist; otherwise I must update the existing record.
The existing entity table is not unique on this entity number, as it has another internal key, i.e. it can contain duplicate msg_entity_no values.
The problem I am experiencing is that when multiple messages are produced, multiple instances of the MDB query the entity table for existence at the same time. At that moment the entity does not exist for either instance, so both messages result in an insert, instead of one insert for the non-existent entity followed by updates for subsequent messages.
I want to get away from using the annotation @ActivationConfigProperty(propertyName = "maxSession", propertyValue = "1") and deploying to the deploy-hasingleton folder, which only allows one instance of the MDB, as this is not scalable.

The condition you are seeing is caused by either DUPLICATE or IDENTICAL data contained within messages that are placed on the queue in quick succession. There are a couple of solutions for this.
1) DEQUEUE on one JBoss instance with only one MDB. This means you will have ONE MDB running on one JBoss server in the cluster, and messages will essentially be processed in sequence.
2) Create a locking mechanism, whereby you create a table and write the message contents to it with a PRIMARY KEY. You then filter out duplicates, or create an ordering of the data to be processed, based on the contents. This will slow down execution, but you will have better auditing for your data. You will in essence have two ASYNC jobs: one to populate your data from the QUEUE and another to PROCESS the data, which could run, say, one minute later. (See the sketch after this list.)
3) Some QUEUE implementations, such as Oracle AQ, have a dequeue condition that can be set. Messages in the queue are evaluated against the condition, and messages that satisfy it are returned. TIBCO has locking strategies that protect the thread of execution when multiple threads run in an agent.
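For option 2, a minimal sketch of the staging step, assuming a hypothetical msg_staging table whose auto-increment PRIMARY KEY gives the later processing job a total order:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class MessageStager {
        // The MDB only records the message; a separate job later processes one
        // row per msg_entity_no in PRIMARY KEY order. Table and column names
        // are hypothetical.
        public void stage(Connection con, long msgEntityNo, String payload)
                throws SQLException {
            String sql = "INSERT INTO msg_staging (msg_entity_no, payload) VALUES (?, ?)";
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setLong(1, msgEntityNo);
                ps.setString(2, payload);
                ps.executeUpdate();
            }
        }
    }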
References
http://tech.chococloud.com/archives/2152
Not understanding the business process, I would suggest you try to prevent the "DUPLICATE" messages at the source.

Related

Java Kafka Consumer store state in memory?

I have a use case where I need to "batch process" event data for customers.
Every piece of event data has a customerId.
In my application layer (Java), I need to batch up all the events per customer id and then apply my business logic. My business logic needs all the events for a customer to be available. Basically, I'm grouping by customerId before I can do anything with it.
Approach:
Ingest all the events into a Kafka topic with "customerId" as the partition key. The events belonging to a specific customer therefore always go to the same partition, and hence the same consumer. In the consumer, I can gather the events in memory (perhaps using a simple expiry map or so) and do a batch process. In this approach, my entire batch is transient and stored in application memory.
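A minimal sketch of that consumer-side grouping, assuming String keys and values; the topic name and the isEndEvent/processBatch helpers are placeholders:

    import java.time.Duration;
    import java.util.*;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class CustomerBatcher {
        private final Map<String, List<String>> batches = new HashMap<>();

        public void run(KafkaConsumer<String, String> consumer) {
            consumer.subscribe(Collections.singletonList("customer-events")); // assumed topic
            while (true) {
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(500))) {
                    // Group in memory by customerId (the record key).
                    batches.computeIfAbsent(rec.key(), k -> new ArrayList<>()).add(rec.value());
                    if (isEndEvent(rec.value())) {                              // hypothetical end marker
                        processBatch(rec.key(), batches.remove(rec.key()));     // hypothetical business logic
                    }
                }
                consumer.commitSync();
            }
        }

        private boolean isEndEvent(String event) { return event.endsWith("END"); } // placeholder
        private void processBatch(String customerId, List<String> events) { /* ... */ }
    }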
Caveats:
When a Kafka partition rebalance happens (for whatever reason) and partitions are re-assigned to different consumers, the data becomes inconsistent. Not sure if there's any way to overcome that.
I'm wondering what a practical approach to such "batch" use cases is. Is Kafka Streams the right candidate for this? But this is not an infinite stream: the batch data set clearly has a start and an end, and the end event is used as the trigger to perform the business logic.
The events will be ordered per customerId, but without a StickyAssignor on the consumer instances they will not keep being consumed by the same consumer, especially across rebalances in a distributed environment.
If you have some data in a compacted topic that acts as your raw events, and consuming them all into some cache builds up your materialized view, then yes, that's what Kafka Streams does with changelog topics. You can also build this logic on your own with a plain consumer, like the Confluent Schema Registry does with its _schemas topic and multiple internal HashMaps.
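A rough sketch of the Kafka Streams version, aggregating events per customerId into a changelog-backed store (topic name, store name, and String serdes are all assumptions):

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.common.utils.Bytes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.state.KeyValueStore;

    public class CustomerBatchTopology {
        public static StreamsBuilder build() {
            StreamsBuilder builder = new StreamsBuilder();
            builder.stream("customer-events", Consumed.with(Serdes.String(), Serdes.String()))
                   .groupByKey()
                   // The aggregate's state store is backed by a changelog topic,
                   // so the per-customer batch survives restarts and rebalances.
                   .aggregate(
                       () -> "",
                       (customerId, event, batch) -> batch.isEmpty() ? event : batch + "\n" + event,
                       Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("customer-batches"));
            return builder;
        }
    }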

Producer - consumer using MySQL DB

My requirement is as follows:
Maintain a pool of records in a table (MySQL DB).
A job acts as a producer and fills up this pool if the number of entries goes below a certain threshold. The job runs every 15 mins.
There can be multiple consumers with each consumer picking up just one record each. Two consumers coming in at the same time should get two different records.
The producer should not block the consumer. So while the producer job is running consumers should be able to pick up any available rows.
The producer / consumer is part of the application code, which in turn is a JBoss application.
In order to ensure that each consumer picks a distinct record (in case of concurrency), we do the following (a sketch follows the steps):
We use an integer column as an index.
A consumer will first update the record with the lowest index value with its own name.
It will then select that record and proceed with it.
This approach ensures that two consumers do not end up with the same record.
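A sketch of that claim-then-read step over JDBC, with hypothetical table and column names; MySQL's ORDER BY ... LIMIT on UPDATE makes the claim a single atomic statement:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class PoolConsumer {
        // Claim the lowest-index unclaimed row, then read it back. The row
        // should be deleted or flagged once processed. record_pool, idx,
        // consumer and payload are hypothetical names.
        public String pickOne(Connection con, String consumerName) throws SQLException {
            String claim = "UPDATE record_pool SET consumer = ? "
                         + "WHERE consumer IS NULL ORDER BY idx LIMIT 1";
            String fetch = "SELECT payload FROM record_pool "
                         + "WHERE consumer = ? ORDER BY idx LIMIT 1";
            try (PreparedStatement upd = con.prepareStatement(claim)) {
                upd.setString(1, consumerName);
                if (upd.executeUpdate() == 0) {
                    return null; // pool is empty
                }
            }
            try (PreparedStatement sel = con.prepareStatement(fetch)) {
                sel.setString(1, consumerName);
                try (ResultSet rs = sel.executeQuery()) {
                    return rs.next() ? rs.getString("payload") : null;
                }
            }
        }
    }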
One problem we are seeing is that when the producer is filling up the pool, consumers get blocked. Since the producer can take some time to complete, all consumers during that period are blocked, as the consumer's UPDATE waits for the producer's INSERT to complete.
Is there any way to resolve this scenario? Any other approach to design this is also welcome.
Is it a hard requirement that you use a relational database as a queue? This seems like a bad approach to the problem, especially since the problem has already been addressed by message queues. You could use MySQL to persist the state of your queue, but it won't make a good queue itself.
Take a look at ActiveMQ or JBoss Messaging (given that you are using JBoss)

JMS Multiple messages locks database table

I have a web service that receives multiple XML files at a time, each containing students' data.
I need to process those files and store the values in a database.
For that I have used a JMS queue: I create an object message and push it to the queue.
But while one message from the queue is being processed, other messages are waiting to be processed, and due to that my database table gets locked.
Consider that I have one list that contains 5000 values, and in a for loop I iterate over the list and process JMS messages.
That is exactly my scenario. The problem is that while one message is being processed my table gets locked and the rest of the files remain in the queue.
Please suggest a solution.
Make sure you use the right lock strategy (see table-level locking and row-level locking).
See if you can treat your messages one at a time (JMS consumer configuration); this way, the first message will release the lock for the second one, and so on.
EDIT: Typo and links
If I understand you correctly, the database handling is in the listener that's taking messages off the queue.
You have to worry about database isolation and table/row locking, because each listener runs in its own thread.
You'll either have to lock rows or set the isolation level on your database connection to SERIALIZABLE to guarantee that only one thread at a time will INSERT or UPDATE the table.
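A minimal JDBC sketch of the SERIALIZABLE option, assuming a DataSource is available to the listener:

    import java.sql.Connection;
    import java.sql.SQLException;
    import javax.sql.DataSource;

    public class StudentWriter {
        private final DataSource dataSource;

        public StudentWriter(DataSource dataSource) { this.dataSource = dataSource; }

        public void persist(/* parsed student data */) throws SQLException {
            try (Connection con = dataSource.getConnection()) {
                // SERIALIZABLE guarantees the listener threads cannot interleave
                // their INSERTs/UPDATEs on the student table.
                con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
                con.setAutoCommit(false);
                try {
                    // ... INSERT/UPDATE student rows here ...
                    con.commit();
                } catch (SQLException e) {
                    con.rollback();
                    throw e;
                }
            }
        }
    }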

Message Driven Bean and message consumption order

I have an MDB subscribed to a topic; the topic's messages carry content that is eventually persisted to a DB.
I know MDBs are pooled, and therefore the container is able to handle more than one incoming message in parallel. In my case, the order in which those messages are consumed (and then persisted) is important. I don't want the MDB instance pool to consume and then persist messages in a different order from the one in which they were published to the JMS topic.
Can this be an issue? If so, is there a way of telling the container to follow strict incoming order when consuming messages?
Copied from there:
To ensure that receipt order matches the order in which the client sent the message, you must do the following:
Set max-beans-in-free-pool to 1 for the MDB. This ensures that the MDB is the sole consumer of the message.
If your MDBs are deployed on a cluster, deploy them to a single node in the cluster, [...].
To ensure message ordering in the event of transaction rollback and recovery, configure a custom connection factory with MessagesMaximum set to 1, and ensure that no redelivery delay is configured. For more information see [...].
You should be able to limit the size of the MDB pool to 1, thus ensuring that the messages are processed in the correct order.
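On JBoss, for instance, that could look like the sketch below, using the maxSession activation config property mentioned in the first question (the destination name is hypothetical):

    import javax.ejb.ActivationConfigProperty;
    import javax.ejb.MessageDriven;
    import javax.jms.Message;
    import javax.jms.MessageListener;

    // A single-session MDB: one consumer, so messages are processed strictly
    // in the order they arrive on the topic.
    @MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Topic"),
        @ActivationConfigProperty(propertyName = "destination", propertyValue = "topic/entityEvents"),
        @ActivationConfigProperty(propertyName = "maxSession", propertyValue = "1")
    })
    public class OrderedPersistenceBean implements MessageListener {
        @Override
        public void onMessage(Message message) {
            // persist the message content here, in receipt order
        }
    }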
Of course, if you still want some parallelism in there, then you have a couple of options, but that really depends on the data.
If certain messages have something in common and the order of processing only matters within that group of messages that share a common value, then you may need to have multiple topics, or use Queues and a threadpool.
If, on the other hand, certain parts of the logic associated with the arrival of a message can take place in parallel while other bits cannot, then you will need to split the logic into the parallel-ok and parallel-not-ok parts, and process those bits accordingly.

Oracle AQ dequeue order

A trigger in an Oracle 10g database generates upsert and delete messages for a subset of rows in a regular table. These messages consist of two fields:
A unique row id.
A non-unique id.
When consuming these messages I want to impose an order on the dequeue process that respects the following constraints:
Messages must be dequeued in insertion order.
Messages belonging to the same id must be dequeued in such a fashion that no other dequeuing process can dequeue a potential successor message (or messages) with this id. Since the messages are generated by a trigger, I cannot use message groups for this purpose.
I am using the Oracle Java interface for AQ. Any pointers on how that could be achieved?
The default dequeue order, I believe, is first in, first out, so they will be dequeued in the same order they were enqueued.
For your second point, are you saying that you want to serialize dequeuing on the non-unique id? I.e., you basically have many queues within your queue, and you only want one job to consume messages from each queue at any one time?
I.e., you have messages:
1 | a
2 | a
3 | b
4 | a
Here you have two types of record (a and b), and you want one job to consume all the a's and another to consume all the b's. If that is the case, consider creating multiple queues perhaps?
Failing multiple queues, have a look at the dequeue_options_t type that you pass to the dequeue procedure, most notably the dequeue condition; this allows you to select only specific messages, so you could start one job for all the a's and another for all the b's, etc.
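A hedged sketch of setting that condition from Java via an anonymous PL/SQL block over JDBC; the queue name, payload type, and condition are all hypothetical, and note the field on dequeue_options_t is deq_condition:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.SQLException;

    public class ConditionalDequeuer {
        // Dequeue only messages whose payload matches the condition;
        // 'tab.user_data' is how AQ exposes the payload inside a dequeue condition.
        public void dequeueMatching(Connection con) throws SQLException {
            String plsql =
                "DECLARE "
              + "  opts  DBMS_AQ.DEQUEUE_OPTIONS_T; "
              + "  props DBMS_AQ.MESSAGE_PROPERTIES_T; "
              + "  msgid RAW(16); "
              + "  msg   my_msg_type; "                                  // hypothetical payload type
              + "BEGIN "
              + "  opts.deq_condition := 'tab.user_data.non_unique_id = ''a'''; "
              + "  DBMS_AQ.DEQUEUE('my_queue', opts, props, msg, msgid); "
              + "  COMMIT; "
              + "END;";
            try (CallableStatement cs = con.prepareCall(plsql)) {
                cs.execute();
            }
        }
    }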
