I have a JPA transaction like the following (using controller advice to catch exceptions):
@Transactional
public void save(MyObj myObj) {
    // Attempt to save the object
    this.myRepo.save(myObj);
    // After it saves, call my audit log service to record the change
    this.myAuditLogService.logChange(myObj);
}
This works fine, but the problem is that if the save fails and throws an exception, the audit log service is still called and the exception is only thrown afterwards, causing erroneous audit entries to be created.
Expected Flow
Call save function
Save fails
Transaction stops and rolls back
Controller advice catches the exception
Actual Flow
Call save function
Save fails
Audit log service is called
Transaction rolls back
Controller advice catches the exception
This is a common problem in distributed systems.
Basically, what you want to achieve is an atomic operation across multiple systems.
Your transaction spans only your local (or first) database, and that's all.
When the REST call to the second system is initiated and succeeds but the first save crashes, you want a rollback on the first system (the first save) and a rollback on the second system as well. There are multiple problems with that, and it's really hard to achieve atomic-like consistency across multiple systems.
You could use database-supported technologies for such cases.
What you probably need is 2PC / 3PC, or to change the processing of your request somehow.
The trade-off, of course, will be that you'll have to sacrifice immediate results in order to get eventual consistency.
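For the 2PC route, a minimal sketch of what it could look like with plain JTA, assuming both resources are XA-capable and a JTA transaction manager is available in your environment (the DataSource wiring and table names below are made up for illustration):

import java.sql.Connection;
import java.sql.PreparedStatement;

import javax.sql.DataSource;
import javax.transaction.UserTransaction;

public class TwoPhaseCommitSketch {

    private final UserTransaction userTransaction;   // provided by the JTA container
    private final DataSource firstSystemDataSource;  // XA-capable DataSource (assumption)
    private final DataSource auditSystemDataSource;  // XA-capable DataSource (assumption)

    public TwoPhaseCommitSketch(UserTransaction userTransaction,
                                DataSource firstSystemDataSource,
                                DataSource auditSystemDataSource) {
        this.userTransaction = userTransaction;
        this.firstSystemDataSource = firstSystemDataSource;
        this.auditSystemDataSource = auditSystemDataSource;
    }

    public void saveWithAudit(long objId) throws Exception {
        userTransaction.begin();
        try {
            try (Connection first = firstSystemDataSource.getConnection();
                 PreparedStatement save = first.prepareStatement(
                         "insert into my_obj (id) values (?)")) {
                save.setLong(1, objId);
                save.executeUpdate();
            }
            try (Connection audit = auditSystemDataSource.getConnection();
                 PreparedStatement log = audit.prepareStatement(
                         "insert into audit_log (obj_id) values (?)")) {
                log.setLong(1, objId);
                log.executeUpdate();
            }
            // Both resources commit (or roll back) together via the XA two-phase commit protocol.
            userTransaction.commit();
        } catch (Exception e) {
            userTransaction.rollback();
            throw e;
        }
    }
}

The transaction manager drives the prepare/commit phases across both resources; the sketch only shows the application-facing side.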
You could use eventual consistency.
For example, send a message to some storage for later processing, and have both systems read that message:
System 1 reads the message from storage and saves myObj
System 2 reads the message from storage and logs the change
This will of course happen "eventually" - there will never be a guarantee that either system is up at the time of processing, or even later on (e.g. somebody killed the server, or deployed code with a bug and the server restarts indefinitely).
Moreover, you'll sacrifice read-after-write consistency.
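A minimal sketch of that message-based flow, assuming some durable queue abstraction with per-consumer-group delivery (the MessageQueue interface below is a hypothetical placeholder, not a concrete broker API):

import java.util.UUID;

// Hypothetical durable queue; in practice this would be Kafka, RabbitMQ, a database table, etc.
// Each consumer group receives its own copy of every published message.
interface MessageQueue {
    void publish(String topic, String payload);
    String poll(String topic, String consumerGroup); // returns null when there is nothing to process
}

// The caller gets an immediate acknowledgement; the actual work happens later, asynchronously.
class MyObjChangePublisher {
    private final MessageQueue queue;

    MyObjChangePublisher(MessageQueue queue) {
        this.queue = queue;
    }

    String submitChange(String myObjAsJson) {
        String messageId = UUID.randomUUID().toString();
        queue.publish("myobj-changes", messageId + ":" + myObjAsJson);
        return messageId;
    }
}

// System 1: consumes the message and saves myObj to its own database.
class SaveConsumer {
    private final MessageQueue queue;

    SaveConsumer(MessageQueue queue) {
        this.queue = queue;
    }

    void processNext() {
        String message = queue.poll("myobj-changes", "system1-save");
        if (message != null) {
            // deserialize and save myObj here
        }
    }
}

// System 2: consumes the same message and writes the audit log entry.
class AuditConsumer {
    private final MessageQueue queue;

    AuditConsumer(MessageQueue queue) {
        this.queue = queue;
    }

    void processNext() {
        String message = queue.poll("myobj-changes", "system2-audit");
        if (message != null) {
            // write the audit log entry here
        }
    }
}

Each consumer processes its own copy of the message independently, which is exactly where the "eventually" and the loss of read-after-write consistency come from.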
In case of a failure, you could use a compensating transaction.
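A rough sketch of the compensating-transaction idea, assuming the audit service exposes some way to undo an entry (the deleteChange method and the id returned by logChange are assumptions, not part of the original code):

// Minimal stand-ins for the question's types, just so the sketch is self-contained.
class MyObj { }
interface MyRepo { void save(MyObj myObj); }
interface MyAuditLogService {
    String logChange(MyObj myObj);          // assumed to return an id for the created entry
    void deleteChange(String auditEntryId); // assumed compensation endpoint
}

public class SaveWithCompensation {

    private final MyRepo myRepo;
    private final MyAuditLogService myAuditLogService;

    public SaveWithCompensation(MyRepo myRepo, MyAuditLogService myAuditLogService) {
        this.myRepo = myRepo;
        this.myAuditLogService = myAuditLogService;
    }

    public void save(MyObj myObj) {
        // Call the remote system first and remember what was done.
        String auditEntryId = myAuditLogService.logChange(myObj);
        try {
            // Do the local, transactional work.
            myRepo.save(myObj);
        } catch (RuntimeException e) {
            // The local save failed, so undo the remote side effect.
            myAuditLogService.deleteChange(auditEntryId);
            throw e;
        }
    }
}

The compensation itself can fail, so production implementations usually record the pending compensation somewhere durable and retry it.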
I recommend reading more on the topic of Distributed Systems in:
[Fallacies of distributed computing](https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing)
[Designing Data-Intensive Applications](https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321)
[CAP theorem](https://en.wikipedia.org/wiki/CAP_theorem)
[Consistency models](https://en.wikipedia.org/wiki/Consistency_model)
Related
I'm developing a system that contains multiple microservices operating on the same database, which isn't ideal but is done for the sake of legacy support. As we refactored the old system, we split previously transactional functionality into separate microservices, which leaves us with distributed transactions that must be handled in some way. We're using Spring Boot.
As it usually goes, we have microservice A calling microservices B and then C, and we need a way to roll back transaction B if transaction C throws an error.
I've read about the 2PC and Saga approaches, and I was wondering if there is a way to implement a somewhat simpler variation of the 2PC approach. It would need to support the following functionality:
1) A microservice endpoint is called; it calls an @Transactional service which creates a transaction. A special parameter is passed that tells the TransactionManager to keep the transaction "hanging" for some time (say, 5 seconds).
2) Within those 5 seconds, another call can be made that either confirms or rolls back the transaction.
3) If the time elapses (times out), the default handling for the transaction is applied (either rollback or commit).
4) If the special parameter isn't passed in, transactions behave as if they were not distributed.
5) API calls are still asynchronous.
Simple example:
1) Service A endpoint "createResource" is called with parameter "hangTransaction=true"
2) Service A returns status 200
3) Service B endpoint "registerResource" is called
4) Service B returns status 500
5) Service A endpoint "createResource" is called with parameter "transactionValid=false"
6) Service A rolls the transaction back and returns status 200
7) There are no changes to the DB after this
There would of course be additional parameters sent (transaction ID for example), the example is simplified.
Is there some way to control the TransactionManager in a way that allows the transaction to persist between API calls?
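For what it's worth, a very rough sketch of what "keeping the transaction hanging" could look like with Spring's PlatformTransactionManager, using a dedicated worker thread per pending transaction so the transaction stays bound to one thread (the holder class, the rollback-on-timeout default and the wiring are all hypothetical, and every pending transaction pins a thread and a database connection, which is a real cost of this approach):

import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.TransactionStatus;
import org.springframework.transaction.support.DefaultTransactionDefinition;

// Hypothetical holder that keeps a transaction "hanging" until a second call decides its fate.
public class HangingTransactionHolder {

    private final PlatformTransactionManager transactionManager;
    private final ExecutorService workers = Executors.newCachedThreadPool();
    private final Map<String, CompletableFuture<Boolean>> decisions = new ConcurrentHashMap<>();

    public HangingTransactionHolder(PlatformTransactionManager transactionManager) {
        this.transactionManager = transactionManager;
    }

    // Runs the work in a transaction on a dedicated thread, then waits up to timeoutSeconds
    // for a commit/rollback decision; when nothing arrives in time it defaults to rollback.
    public void begin(String transactionId, Runnable work, long timeoutSeconds) {
        CompletableFuture<Boolean> decision = new CompletableFuture<>();
        decisions.put(transactionId, decision);
        workers.submit(() -> {
            TransactionStatus status =
                    transactionManager.getTransaction(new DefaultTransactionDefinition());
            boolean commit = false;
            try {
                work.run();
                commit = decision.get(timeoutSeconds, TimeUnit.SECONDS); // wait for the second call
            } catch (Exception e) {
                commit = false; // work failed or the decision timed out: roll back
            } finally {
                decisions.remove(transactionId);
                if (commit) {
                    transactionManager.commit(status);
                } else {
                    transactionManager.rollback(status);
                }
            }
        });
    }

    // Second call: transactionValid=true commits, transactionValid=false rolls back.
    public void complete(String transactionId, boolean transactionValid) {
        CompletableFuture<Boolean> decision = decisions.get(transactionId);
        if (decision != null) {
            decision.complete(transactionValid);
        }
    }
}

This only illustrates the mechanics asked about; whether holding transactions open across API calls is a good idea is a separate question (sagas with compensation usually scale better).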
I have set the maximum number of retries to 3. I have added only RemoteAccessException as a retryable exception. What I want to do is change the state of some of the entities to error and persist them to the database after all retries are exhausted. All of this I am doing in the writer step. I have implemented ItemWriteListener, and when RemoteAccessException occurs it does go to the onWriteError method, where I have written this state-changing logic. But when I check the database after all execution is done, I see that the state is not changed at all.
My question is: what exactly is happening in this case? After 3 retries, does the entire step roll back, since the exception is still there, and so nothing is changed in the database? Also, I do need to change the states to error. Is there some way to achieve that?
I have found the answer to this. In my case, what was happening was that RetryExhaustedException was being thrown after 3 retries. As stated in the Spring Retry docs, any enclosing transaction will be rolled back in this case.
From the Spring docs (https://docs.spring.io/spring-batch/trunk/reference/html/retry.html):
After a callback fails the RetryTemplate has to make a call to the RetryPolicy to ask it to update its state (which will be stored in the RetryContext), and then it asks the policy if another attempt can be made. If another attempt cannot be made (e.g. a limit is reached or a timeout is detected) then the policy is also responsible for handling the exhausted state. Simple implementations will just throw RetryExhaustedException which will cause any enclosing transaction to be rolled back. More sophisticated implementations might attempt to take some recovery action, in which case the transaction can remain intact.
For the case where I need to change the states to error, I have asked a similar question and found the answer -
Is there any way to persist some data in database after an exception occurs in ItemWriter in spring batch?
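To illustrate the "recovery action" the docs mention, here is a small sketch using Spring Retry's RetryTemplate with a RecoveryCallback directly (this is a different wiring from a fault-tolerant Spring Batch step, and the error-state change below is just a placeholder for whatever your logic is):

import org.springframework.retry.policy.SimpleRetryPolicy;
import org.springframework.retry.support.RetryTemplate;

// Minimal stand-ins so the sketch is self-contained; the real types come from your application.
class MyEntity { void setState(String state) { } }
interface MyEntityRepository { void save(MyEntity entity); }
interface RemoteService { void push(MyEntity entity); } // may throw RemoteAccessException

public class WriterWithRecovery {

    private final RetryTemplate retryTemplate;

    public WriterWithRecovery() {
        this.retryTemplate = new RetryTemplate();
        this.retryTemplate.setRetryPolicy(new SimpleRetryPolicy(3));
    }

    public void write(MyEntity entity, MyEntityRepository repository, RemoteService remoteService) {
        retryTemplate.execute(
                // RetryCallback: the work that may fail and be retried
                context -> {
                    remoteService.push(entity);
                    return null;
                },
                // RecoveryCallback: runs once the retries are exhausted, instead of letting
                // a RetryExhaustedException roll back the enclosing transaction
                context -> {
                    entity.setState("ERROR"); // placeholder state-changing logic
                    repository.save(entity);
                    return null;
                });
    }
}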
Consider a user cart and checkout: a customer can perform the addItemToCart action, which will be handled by the main DB instance. However, the getUserCartItems action might be performed against a read replica, and it might not contain the result of the first action yet due to replica lag. Even if we try to minimize this lag, it's still possible to hit this case, so I'm wondering what solutions you have tried in production?
According to @Henrik's answer, we have 3 options:
1. Wait at user till consistent.
This means we need to perform polling (regular or long polling) on the client and wait until the replica receives the update. However, I assume the replica lag shouldn't be longer than 1-5 seconds. Also, the smaller the replica lag, the bigger the performance hit we will take.
2. Ensure consistency through 2PC.
If I understood correctly, we need to combine both the addItemToCart insert and the getUserCartItems select into one aggregate operation on the backend and return getUserCartItems as the addItemToCart response. However, the next request might still not get the updated info due to lag. Yes, it returns immediate confirmation that the operation succeeded and the application can continue, but proceeding to checkout requires the user's cart items in order to show the price correctly, so we are not fixing the problem anyway.
3. Fool the client.
The application stores/caches all successfully sent data and uses it for display. Yes, this is a solution, but it definitely requires additional business logic to be implemented:
Perform addItemToCart request;
if (addItemToCart returned success)
    Store addItemToCart in local storage;
else
    Show error and retry;
Perform getUserCartItems request;
if (getUserCartItems contains addItemToCart ID)
    Update local storage / cache and proceed with it;
else
    Use existing data from local storage;
How do you deal with eventual inconsistency?
The correct answer is to NOT send SELECT queries to a read slave if the data needs to be immediately available.
You should structure your application such that all real-time requests hit your master, and all other requests hit one of your read slaves.
For things where you don't need real-time results, you can fool the user quite well using something like AJAX requests or websockets (websockets are going to make your application a lot more resource-friendly, as you won't be hammering your backend servers with multiple AJAX requests).
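One common way to implement the master/slave split in a Spring application is a routing DataSource; a minimal sketch, assuming a ThreadLocal flag that request-handling code sets for real-time work (the flag and the wiring method are illustrative, not a complete configuration):

import java.util.HashMap;
import java.util.Map;

import javax.sql.DataSource;

import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

// Routes queries to the master for "real-time" work and to a read slave otherwise.
public class MasterSlaveRoutingDataSource extends AbstractRoutingDataSource {

    // Illustrative flag; code that needs read-after-write consistency sets it before querying.
    private static final ThreadLocal<Boolean> USE_MASTER = ThreadLocal.withInitial(() -> false);

    public static void useMaster(boolean value) {
        USE_MASTER.set(value);
    }

    @Override
    protected Object determineCurrentLookupKey() {
        return USE_MASTER.get() ? "master" : "slave";
    }

    // Wiring sketch: map the lookup keys to the actual master and slave DataSources.
    public static MasterSlaveRoutingDataSource build(DataSource master, DataSource slave) {
        MasterSlaveRoutingDataSource routing = new MasterSlaveRoutingDataSource();
        Map<Object, Object> targets = new HashMap<>();
        targets.put("master", master);
        targets.put("slave", slave);
        routing.setTargetDataSources(targets);
        routing.setDefaultTargetDataSource(master);
        routing.afterPropertiesSet();
        return routing;
    }
}

Request handlers that need the data to be immediately readable call useMaster(true) up front; everything else falls through to the slave.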
This kind of thing has been done a million times I'm sure, but my search foo appears weak today, and I'd like to get opinions on what is generally considered the best way to accomplish this goal.
My application keeps track of sessions for online users in a system. Each session corresponds to a single record in a database. A session can be ended in one of two ways. Either a "stop" message is received, or the session can timeout. The former case is easy, it is handled in the message processing thread and everything is fine. The latter case is where the concern comes from.
In order to process timeouts, each record has an ending time column that is updated each time a message is received for that session. To make timeouts work, I have a thread that returns all records from the database whose endtime < NOW() (has an end time in the past), and goes through the processing to close those sessions. The problem here is that it's possible that I might receive a message for a session while the timeout thread is going through processing for the same session. I end up with a race between the timeout thread and message processing thread.
I could use a semaphore or the like and just prevent the message thread from processing while timeout is taking place as it only needs to run every 30 seconds or a minute. However, as the user table gets large this is going to run into some serious performance issues. What I think I would like is a way to know in the message thread that this record is currently being processed by the timeout thread. If I could achieve that I could either discard the message or wait for timeout thread to end but only in the case of conflicts now instead of always.
Currently my application uses JDBC directly. Would there be an easier/standard method for solving this issue if I used a framework such as Hibernate?
This is a great opportunity for all kinds of crazy bugs to occur, and some of the cures can cause performance issues.
The classic solution would be to use transactions (http://dev.mysql.com/doc/refman/5.0/en/commit.html). This allows you to guarantee the consistency of your data - but a long-running transaction on the database turns it into a huge bottleneck; if your "find timed-out sessions" code runs for a minute, the transaction may run for that entire period, effectively locking write access to the affected table(s). Most systems would not deal well with this.
My favoured solution for this kind of situation is to have a "state machine" for status; I like to implement this as a history table, but that does tend to lead to a rapidly growing database.
You define the states of a session as "initiated", "running", "timed-out - closing", "timed-out - closed", and "stopped by user" (for example).
You implement code which honours the state transition logic in whatever data access logic you've got. The pseudo code for your "clean-up" script might then be:
update all records whose endtime < now() and whose status is "running", set status = "timed-out - closing"
for each record whose status is "timed-out - closing"
    do whatever other stuff you need to do
    update that record to set status = "timed-out - closed" where status = "timed-out - closing"
next record
All other attempts to modify the current state of the session record must check that the current status is valid for the attempted change.
For instance, the "manual" stop code should be something like this:
update sessions
set status = 'stopped by user'
where session_id = xxxxx
and status = 'running'
If the auto-close routine has kicked in between showing the user interface and executing the database code, the where clause won't match any records, so the rest of the code simply doesn't run.
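With plain JDBC, the "didn't match any records" case shows up as the update count; a small sketch of the manual stop along those lines, assuming a numeric session_id and the table/columns from the statement above:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class SessionStopper {

    // Returns true if this call actually stopped the session, false if the
    // timeout thread (or another writer) got there first.
    public boolean stopSession(Connection connection, long sessionId) throws SQLException {
        String sql = "update sessions set status = 'stopped by user' "
                   + "where session_id = ? and status = 'running'";
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setLong(1, sessionId);
            int updated = statement.executeUpdate();
            // Zero rows updated means the precondition failed: the session was no
            // longer 'running', so the caller can skip the rest of the stop processing.
            return updated == 1;
        }
    }
}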
For this to work, all code that modifies the session status must check its pre-conditions; the most maintainable way is to encode status and allowed transitions into a separate database table.
You could also write triggers to enforce this logic, though I'm normally not a fan of triggers - only do this if you have to.
I don't think this adds significant performance worries - but test and optimize. The majority of the extra work on the database is by adding extra "where" clauses to your update statements; assuming you have an index on status, it's unlikely to have a measurable impact.
I have an EJB RESTEasy controller with CMT.
One critical method, which creates some entities in the DB, works fine and quickly on a single invocation.
But when it is invoked simultaneously by 10 users, it works very slowly.
I've tracked timings in the logs, and the biggest difference versus a single invocation is
the lag between exiting the RESTEasy controller and entering MainFilter.
This lag grows from 0-1 ms for a single invocation to 8 seconds for 10 simultaneous invocations!
I need ideas about what could be the reason and how I can speed it up.
My immediate reaction is that it's a database locking problem. Can you tell if the lag occurs as the flow of control passes across the transaction boundary? Try the old technique of littering your code with print statements to see when things stop.
Lock contention over RESTEasy threads? The database? It is very difficult to predict where the bottleneck is.
As per some of the comments above, it does sound like it could be a database locking problem. From what you said, the "lag" occurs between the controller and the filter invoking the controller. Presumably that is where the transaction commit is occurring.
You say, however, that the code creates some entities in the database, but you don't say whether the code does any updates or selects. Just doing inserts wouldn't normally create a locking problem with most databases, unless there are associated updates or selects (e.g. select for update in Oracle).
Check and see if there are any resources, like a table of keys or a parent record, that are being updated and might be causing the problem.
Also check your JDBC documentation. Most JDBC drivers have logging levels that should allow you to see the operations being performed on the database. While this may generate a sizeable log, if you include a thread identifier in the log, you should be able to see where problems are occurring.