How to ensure no duplicates are created in the DB on connection loss - java

Евгений Кравцов:
I'm developing a small service with an HTTP backend and an Android app, and I recently realized I lack knowledge about this kind of system.
Case:
- The client places an order in the app and sends a request to the server.
- The server successfully receives the data and creates a database row for the order.
- After the database work completes, the backend tries to respond to the app with a 200 success code.
- The app user has internet connection problems and cannot receive the server response. The app gets a timeout exception and notifies the user that the order was not successful.
- After some time, the internet connection on the user's device is restored and they send another request with the same order.
- The backend receives this again and creates a duplicate of the previous order.
So I got 2 orders in the database, but the user wanted to create only one.
Question: How can I prevent such behavior?

You should add a key to your table in the DB, for example defining a compound key of (user_id, type), so a specific user cannot make two orders with the same order type.
This will cause the second DB insert to fail.
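A minimal sketch of that idea with JPA, assuming a Spring stack; the entity, table and column names are illustrative, not from the question:

import javax.persistence.*;

// Hypothetical order entity: the compound unique constraint makes a retried
// insert for the same (user_id, order_type) pair fail at the database level.
@Entity
@Table(name = "orders",
       uniqueConstraints = @UniqueConstraint(columnNames = {"user_id", "order_type"}))
public class Order {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column(name = "user_id", nullable = false)
    private Long userId;

    @Column(name = "order_type", nullable = false)
    private String orderType;
}

On the retried request the insert then throws a constraint-violation exception (DataIntegrityViolationException in Spring), and the backend can respond with the already-created order instead of writing a duplicate.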

Related

Microservices that share a database

I have an application that can be reduced/simplified to this flow:
- the user sends a request to app A
- app A inserts info about the request and the user into the DB (marked as "B in progress" and "C in progress")
- app A pushes the data onto a queue and returns to the user
- app B retrieves the data from the queue and processes it
- app B finishes processing the data and marks the record in the DB as "B done"
- app C retrieves the data from the queue and processes it
- app C finishes processing the data and marks the record in the DB as "C done"
In other words: the user sends a request to the app, the app saves a record to the database and sends it to a queue, apps B and C take the request from the queue and process it (each app does a different thing but requires the data from the request), and when they are done I want to mark the request in the DB as done for both apps.
This can be achieved if all apps share the DB. However, sharing a DB like this between microservices is considered an anti-pattern.
What are some design patterns to solve this? Am I really left with only one option: make app A expose a REST API and call that endpoint from apps B and C to update the row in the DB?
Thanks for the help!
This sounds more like choreography, i.e. an event-driven process. Instead of the DB, did you consider using Kafka, where the status gets enriched at each publish?
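A rough sketch of that choreography under stated assumptions (the topic name, event shape and bootstrap address are invented; only app A would keep writing to the DB):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// App B finishes its work and publishes an enriched status event instead of
// updating a shared database; the DB owner consumes these events.
public class AppBStatusPublisher {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String requestId = "req-42"; // taken from the consumed request event
            producer.send(new ProducerRecord<>("request-status", requestId, "B_DONE"));
        }
    }
}

With this shape, apps B and C never touch the database; they emit "B done"/"C done" events, and whichever service owns the DB row consumes them and updates the status.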

How to save a message into a database and send a response to a topic in an eventually consistent way?

I have the following RabbitMQ consumer:

import java.io.IOException;
import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Consumer;
import com.rabbitmq.client.DefaultConsumer;
import com.rabbitmq.client.Envelope;

Consumer consumer = new DefaultConsumer(channel) {
    @Override
    public void handleDelivery(String consumerTag, Envelope envelope,
                               AMQP.BasicProperties properties, byte[] body) throws IOException {
        String message = new String(body, "UTF-8");
        sendNotificationIntoTopic(message);
        saveIntoDatabase(message);
    }
};
The following situation can occur:
- The message was sent to the topic successfully.
- The connection to the database was lost, so the database insert failed.
As a result we have data inconsistency. The expected result is that either both actions are successfully executed or neither is executed at all.
Any solutions for how I can achieve this?
P.S.
Currently I have the following idea (please comment on it).
We can suppose that the broker doesn't lose any messages, and that we are subscribed to the topic we want to send to.
1. Save the entry into the database with a status field set to 'pending'.
2. Attempt to send the data to the topic. If the send was successful, update the status field to 'success'.
3. Have a scheduled job that checks rows with 'pending' status. At that moment 2 cases are possible:
3.1 The notification wasn't sent at all.
3.2 The notification was sent but the save into the database failed (the probability is very low, but it is possible).
So we have to distinguish those 2 cases somehow: we may store the messages from the topic in a collection, and the job can check whether a given message was accepted or not. If the job finds a message that corresponds to the database row, we update the status to 'success'; otherwise we remove the entry from the database.
I think my idea has some weaknesses (for example, in a multi-node application we would have to store the messages in Hazelcast (or an analog), but that is an additional point of hypothetical failure).
Here is an example of the Try Cancel Confirm pattern (https://servicecomb.apache.org/docs/distributed_saga_3/) that should be capable of dealing with your problem. You should tolerate some chance of double submission of the data via the queue. Here is an example:
1. Define an abstraction Operation, and assign an ID plus a timestamp to the operation.
2. Write status Pending to the database (you can do this in the same step as 1).
3. Write a listener that polls the database for all operations with status Pending that are older than the "timeout".
4. For each pending operation, send the data via the queue with the assigned ID.
5. The recipient side should be aware of the ID, and if the ID has already been processed, nothing should happen.
6A. If you need to be 100% sure that the operation has completed, you need a second queue where the recipient side posts a message "ID - DONE". If such consistency is not necessary, skip this step. Alternatively it can post "ID - Failed" with the reason for the failure.
6B. The submitting side either waits for a message from 6A or completes the operation by writing status Done to the database.
7. Once a certain timeout has passed, or a certain retry limit has been reached, you write status Fail to the operation.
8. You can potentially send the recipient side a message "operation with ID - rollback".
Notice that none of these steps involve a technical transaction. You can do this with a non-transactional database.
What I have written is a variation of the Try Cancel Confirm pattern, where each recipient of a message should be aware of how to manage its own data. A sketch of the submitting side follows below.
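The sketch covers steps 1-4 under stated assumptions (plain JDBC, an invented operations table, and a stubbed queue send; none of these names come from the answer itself):

import java.sql.*;
import java.time.Instant;
import java.util.UUID;

// Illustrative poller for the pattern above.
public class PendingOperationPoller {

    private static final long TIMEOUT_MS = 30_000;

    // Steps 1 + 2: create the operation with an ID, a timestamp and status PENDING.
    static String createOperation(Connection db, String payload) throws SQLException {
        String id = UUID.randomUUID().toString();
        try (PreparedStatement ps = db.prepareStatement(
                "INSERT INTO operations (id, payload, status, created_at) VALUES (?, ?, 'PENDING', ?)")) {
            ps.setString(1, id);
            ps.setString(2, payload);
            ps.setTimestamp(3, Timestamp.from(Instant.now()));
            ps.executeUpdate();
        }
        return id;
    }

    // Steps 3 + 4: poll for PENDING operations older than the timeout and (re)send them.
    static void pollAndSend(Connection db) throws SQLException {
        try (PreparedStatement ps = db.prepareStatement(
                "SELECT id, payload FROM operations WHERE status = 'PENDING' AND created_at < ?")) {
            ps.setTimestamp(1, Timestamp.from(Instant.now().minusMillis(TIMEOUT_MS)));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // Resending after a crash is safe because the recipient
                    // deduplicates on the ID (step 5).
                    sendToQueue(rs.getString("id"), rs.getString("payload"));
                }
            }
        }
    }

    static void sendToQueue(String id, String payload) {
        // stub: publish (id, payload) to the broker of your choice
    }
}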
1. In the listener, save the database row with the field status='pending'.
2. Another job (a separate thread) will obtain all pending rows from the DB and, for each row:
2.1 send the data to the topic
2.2 save it into the database
- If we failed on step 1, everything is ok: the data is in a consistent state because the job won't know anything about it.
- If we failed on step 2.1, no problem: the next job invocation will attempt to handle it.
- If we failed on step 2.2, the next job invocation will handle the same data again. At first glance you might think that is a problem, but your consumer has to be idempotent: it has to recognize that the message was already processed and skip the processing. This requirement is a consequence of the fact that all message brokers guarantee that a message will be delivered AT LEAST ONCE, so our consumers have to be ready for duplicated messages anyway. No problem again.
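One common way to make the consumer idempotent, sketched under the assumption of a processed_messages table with a unique message_id column (the names are invented):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Idempotent consumer sketch: record every processed message ID and skip
// redeliveries. In production you would check the vendor's unique-violation
// error code (e.g. SQLState 23505 in Postgres) rather than any SQLException.
public class IdempotentConsumer {

    static void handle(Connection db, String messageId, String body) throws SQLException {
        try (PreparedStatement ps = db.prepareStatement(
                "INSERT INTO processed_messages (message_id) VALUES (?)")) {
            ps.setString(1, messageId);
            ps.executeUpdate();
        } catch (SQLException duplicate) {
            return; // the unique key rejected a repeat ID: already processed, skip
        }
        process(body);
    }

    static void process(String body) {
        // actual business logic
    }
}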
Here's the pseudocode for how I'd do it (assuming the DAO layer has transactional capability and your messaging layer doesn't):

// Start a transaction
try {
    String message = new String(body, "UTF-8");
    // Ordering is important here, as I'm assuming the database has commit and
    // rollback capabilities but the messaging system doesn't.
    saveIntoDatabase(message);
    sendNotificationIntoTopic(message);
} catch (MessageDeliveryException e) {
    // Roll back the transaction
    // Throw a domain-specific exception
}
// Commit the transaction
Scenarios:
1. If the database fails, the message won't be sent, since the exception breaks the code flow.
2. If the database call succeeds and the messaging system fails to deliver, catch the exception and roll back the database changes.
All the actions necessary for logging and replaying the failures can live outside this method.
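In Spring, the same ordering can be expressed declaratively; a hedged sketch reusing the question's method names (MessageDeliveryException is assumed to be a domain exception):

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class MessageHandler {

    // If sendNotificationIntoTopic throws, the transaction rolls back and the
    // database save is undone; the commit happens only after a successful send.
    @Transactional(rollbackFor = MessageDeliveryException.class)
    public void handle(byte[] body) throws MessageDeliveryException {
        String message = new String(body, java.nio.charset.StandardCharsets.UTF_8);
        saveIntoDatabase(message);
        sendNotificationIntoTopic(message);
    }

    private void saveIntoDatabase(String message) { /* DAO call */ }

    private void sendNotificationIntoTopic(String message) throws MessageDeliveryException { /* broker call */ }
}

Note that this still leaves a small window where the send succeeds but the final DB commit fails, which is why the other answers insist on idempotent consumers.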
If there is enough time to modify the design, it is recommended to use JTA-like APIs to manage two-phase commit. Even WebLogic and WebSphere support XA resources for two-phase commit.
If the timeline is short, it is suggested to perform the steps below to reduce the failure gap:
1. Send the data to the topic (no commit) (in case the topic is down, retry at an interval)
2. Write the data into the DB
3. Commit the DB
4. Commit the topic
Here a failure matters only when step 4 fails, which results in the same message being sent again, so the receiving system will receive a duplicate message. Each message has a unique messageID and CorrelationID in the JMS 2.0 message structure, so finding duplicates is fairly straightforward (but this has to be handled at the receiving system; a sketch follows below).
Both cases will work in a clustered environment as well.
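A sketch of that receiving-side duplicate check using the JMS message ID; the in-memory set is purely for illustration (a real system would persist and expire the seen IDs so the check survives restarts):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;

// Skips redeliveries caused by the step-4 failure case above.
public class DeduplicatingListener implements MessageListener {

    private final Set<String> seenIds = ConcurrentHashMap.newKeySet();

    @Override
    public void onMessage(Message message) {
        try {
            String id = message.getJMSMessageID();
            if (!seenIds.add(id)) {
                return; // duplicate delivery: already handled, skip
            }
            // process the message
        } catch (JMSException e) {
            throw new RuntimeException(e);
        }
    }
}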
Strictly for your case, the steps below might help you overcome the issue:
Subscribe a listener, listener-1, to your topic.
Process-1:
1. Add a DB entry with status 'to be sent' for message msg-1.
2. Send message msg-1 to the topic. Retry sending in case of any topic failure.
3. If step 2 failed after a certain number of retries, process-1 has to resend msg-1 before sending any new messages, OR step 1 has to be rolled back.
Listener-1:
Using the subscribed listener, read the reference (messageID/correlationID) from the topic, update the DB status to SENT, and read/remove the message from the topic. In case the reference read succeeds and the DB update fails, the topic still has the message, so the next read will update the DB. In case the DB update succeeds and the message removal fails, the listener will read it again and try to update a message that is already done, so it can be ignored after validation.
In case the listener itself is down, the topic will keep the messages until the listener reads them. Until then, sent messages will remain in status 'to be sent'.

Entity & Entity Properties. Database design for effective searching

For the last two days I've been searching for a suitable solution to the problem described below.
In my standalone notification-service module I have an abstract Message entity. Message has 'to', 'from', 'sentAt', 'receivedAt' and other attributes. The responsibility of the notification-service is to:
- send new messages using different registered message providers (SMS, EMAIL, Skype, etc.)
- receive new messages from registered message providers
- update the status of already sent messages.
The notification-service module is developed as a standalone module that is available over the SOAP protocol. Many clients can use this module to send messages or search through already received ones.
Clients want to attach some properties (~ something like tags) while sending messages, for later searching of messages by these properties. These properties make sense only in the client's environment.
For example, client A might want to send a message and save the following custom properties:
1. The internal system ID of the user to whom the system sends the message
2. A distinguishing flag (whether the ID relates to users/admins or clients)
3. A notification flag (notification/alert/...)
Client B might want to send a message and save another set of custom properties:
1. The internal system operator ID (who sends the SMS)
2. The template ID that was used to send the message
Custom properties can be used by the clients to search already sent messages.
For example:
- Client A could find SMS messages sent to administrator users in the period [Date 1; Date 2] that have 'alert' status.
- Client B could find all notifications sent using a specified template.
Of course, the data should be fetched page by page.
At first I created the following database model:
[Database schema image]
To find all messages with the specified properties I tried this query:

SELECT * FROM (SELECT message_id FROM custom_message_properties
               WHERE CONCAT(CONCAT(key, ':'), value) IN ('property1:value1', 'property2:value2')
               GROUP BY message_id HAVING count(*) = 2) AS cmp
JOIN message m ON cmp.message_id = m.id ORDER BY id LIMIT 100 OFFSET 0
The query worked fine (although it seems not very good to me) on a database with a small amount of data. I decided to check the results against roughly the expected real data volume.
So I generated 10 000 000 messages with 40 000 000 custom properties and checked the result. The execution time was ~2 minutes. The most time-consuming operation was the following sub-select:

SELECT message_id FROM custom_message_properties
WHERE CONCAT(CONCAT(key, ':'), value) IN ('property1:value1', 'property2:value2')

I understand that the string comparison is very slow because no database index is used. So I decided to change the database structure and merge the 'key' and 'value' columns into a single one. I updated my database schema:
[Updated database schema image]
I checked the result again. Now the execution time was ~20 seconds. That's much better, but still not suitable for production use.
So now I have no idea how to improve performance without significant changes to the application architecture.
The only thought I have is to create a separate table for each client with that client's required properties:

client(i)_custom_properties (
    mid bigint,  -- foreign key references message (id)
    p1  type1,
    p2  type2,
    ...
    pn  type(n)
)
I have spent a lot of time trying to find any useful information. I have also analyzed the 'stackoverflow' database, since it seemed to me that it should be quite similar. But stackoverflow has only ~50 000 distinct tags, not as many as my database could have.
Any help is appreciated. Thanks in advance!
The project environment that I use:
- Postgres database (9.6)
- Java 1.8
- Spring modules (spring-boot, spring-data-jpa + hibernate, spring-ws, etc.)
I have not found any suitable solution except creating an additional table with the client's properties for each client.
I know that solution is not very flexible, but now the search query time is less than 1 second.
In the future, I will try to solve the same problem using a NoSQL data store.
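A sketch of what one such per-client table could look like as a JPA mapping; the entity, table and column names are invented from client A's example properties, with each property as a typed, indexable column instead of a generic key/value row:

import javax.persistence.*;

// Hypothetical per-client properties table (here: client A).
@Entity
@Table(name = "client_a_custom_properties",
       indexes = {
           @Index(columnList = "target_user_id"),
           @Index(columnList = "notification_flag")
       })
public class ClientACustomProperties {

    @Id
    @Column(name = "mid") // foreign key referencing message(id)
    private Long messageId;

    @Column(name = "target_user_id")
    private Long targetUserId;

    @Column(name = "admin_flag")
    private Boolean adminFlag;

    @Column(name = "notification_flag")
    private String notificationFlag; // notification/alert/...
}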

Mongo URI for Spring Boot GET request

I have configured the Mongo URI in the property file as below:

spring.data.mongodb.uri=mongodb://db1.dev.com,db2.dev.com,db3.dev.com
spring.data.mongodb.database=mydb

I use mongoowl as a monitoring tool.
When I do a GET request, it shows hits on every MongoDB host, which ideally should show up on only one DB, right?
No. You are actually opening a replica-set cluster connection; with this connection type Spring connects to all 3 hosts to handle failover or to fulfill the "read from secondary" option (hence you see hits on all 3 of them). However, reads and writes happen only on the primary unless you have explicitly configured reads from a secondary.
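For example, reads can be routed to secondaries via connection-string options; a hypothetical variant of the URI above (the replica-set name rs0 is an assumption):

spring.data.mongodb.uri=mongodb://db1.dev.com,db2.dev.com,db3.dev.com/mydb?replicaSet=rs0&readPreference=secondaryPreferred

Without such an option, reads and writes still go to the primary only.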

Sending mail asynchronously

I have four DB servers with the same DB structure but different data.
Currently, when new data is inserted into the database, my application gets this data, creates a template and sends an email.
I would like to separate sending email from my applications.
For example, some thread that starts once per 10 minutes: it selects data from my four DB servers, connects to the mail server and sends email to the users.
Is this possible using JMS or something similar?
Thanks for the replies!
I did the same by creating a mail table (likely one per DB) and saving the template and data (or subject/body) in it. A separate process, which could be Quartz or your own polling thread, reads that table, connects to the mail server, sends the email and updates the email status.
This way you can check the status of any email at any given time, and you can even resend any email. The table needs to be purged/archived after some time, maybe after 1 day or 1 week, depending on the table size.
