My service is DService and I am fourth link in the chain of services i.e. the call flow is
Online User -> AService -> BService -> CService -> DService -> EService.
When I invoke EService from DService, it can throw a retriable exception like HttpTimeoutException. I typically retry 2-3 times and throw an exception back if it still fails after those retries.
My question is, should the exception which I throw to CService be retriable or non-retriable? Please find below my evaluation of the pros and cons of both options.
Cons of Throwing Retriable exception from DService
- If DService throws a retriable exception, then following the same convention CService might also retry DService 2-3 times, and on each C-D call, D will again retry the EService call 2-3 times. The calls ultimately reaching EService therefore increase exponentially as we go up the call chain. So if EService's network was indeed down for a long time, we are talking about a large number of unnecessary calls. This can be mitigated by having timeouts for each call in the chain, but I'm still not sure that's enough mitigation against the number of unnecessary calls.
Pros of Throwing Retriable exception from DService
- CService will retry after some time, since a subsequent retry might return the correct value (within the time limits)
- Especially if the clients are backend jobs, they can retry with exponential backoff for a long time before giving up. Throwing a non-retriable exception would rule out this option
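To make the amplification described in the cons concrete: if every service in the chain independently makes up to N attempts of its downstream call, the worst-case number of calls reaching the last service is N raised to the number of retrying hops. A minimal sketch (the class and method names are mine, purely for illustration):

```java
// Worst-case call amplification when every hop in a service chain
// retries its downstream call independently.
public class RetryMath {

    /**
     * @param attemptsPerHop tries (1 initial + retries) each service makes
     * @param hops           number of retrying links above the final service
     * @return worst-case number of calls that reach the final service
     */
    public static long worstCaseCalls(int attemptsPerHop, int hops) {
        long calls = 1;
        for (int i = 0; i < hops; i++) {
            calls *= attemptsPerHop; // each level multiplies the total
        }
        return calls;
    }

    public static void main(String[] args) {
        // A -> B -> C -> D each making 3 attempts before E:
        System.out.println(worstCaseCalls(3, 4)); // prints 81
    }
}
```

With 3 attempts at each of the four hops above EService, a single user request can translate into 81 calls to EService in the worst case.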
Please provide your views and suggestions on this
Thanks,
Harish
Without knowing what the services do, I cannot say for sure whether DService or CService should retry. However, my philosophy is that the service being called should never be the one to retry. In this case, EService would simply throw the exception without any handling whatsoever. The reason is that the end of the chain should be stateless and should not make decisions on behalf of the caller.
The caller can dictate, to a certain extent and within the confines of what is acceptable, whether the error should be reattempted. In other words, if EService connects to a database and DService is performing a lookup service, then it may be within DService's scope to check another table when certain information isn't found in the first. However, EService's failure to connect to the database flies over the head of DService, whose scope is simply to return the information requested by CService.
CService, having made the call to retrieve certain information, may then receive the database connection error and, depending on what it does, retry a number of times after a delay because it is performing batch work on that data and will continue to retry until the database is back online. Or, if it is retrieving information to show to the user on a webpage, it must fail fast and deal with the database connection error by presenting a nice error message to the user instead.
It entirely depends on what your services do and where their responsibilities lie. Likewise, whether an exception is retriable or not should again depend on the caller's needs, not on the service itself. It is perfectly viable to present a retriable exception to a caller that will only ever attempt the call once.
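As the answer notes, whether to act on retriability belongs to the caller. One way to present the hint without forcing a policy is to carry it as a flag on the exception type; standard Java Exceptions have no such property, so the ServiceException below is a hypothetical sketch:

```java
// A hypothetical exception type that carries a retriability *hint*;
// whether to act on it remains entirely the caller's decision.
public class ServiceException extends Exception {
    private final boolean retriable;

    public ServiceException(String message, boolean retriable) {
        super(message);
        this.retriable = retriable;
    }

    /** A hint only: the caller may still choose to attempt the call once. */
    public boolean isRetriable() {
        return retriable;
    }
}
```

A batch-style caller might loop on isRetriable() with a delay, while a user-facing caller is free to ignore the hint and fail fast.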
Hope that helps!
I think throwing retriable exceptions is a viable approach if you define exponentially increasing retry periods up the chain.
I'd say you shouldn't retry in DService in the first place because, as you say, if each service did that you could be facing trouble. Hence, let the exception bubble up the call stack and have it handled at the outermost service possible; that could even be the user.
Rationale: Why would it be on DService to decide if CService, BService or AService would want to retry or not?
However, I think it also depends on the frequency of the exception and the success rate of retries. An exception that occurs frequently but usually succeeds upon the first or second retry is a different situation from one that happens once a day and/or where retrying is futile most of the time.
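Exponentially increasing retry periods, as suggested above, could be sketched like this; the helper name and parameters are illustrative rather than from any particular library:

```java
import java.util.concurrent.Callable;

public class Backoff {

    /** Retries the task with delays of baseDelayMs, 2x, 4x, ... between attempts. */
    public static <T> T withExponentialBackoff(Callable<T> task,
                                               int maxAttempts,
                                               long baseDelayMs) throws Exception {
        long delay = baseDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e;        // exhausted: let it bubble up the chain
                }
                Thread.sleep(delay);
                delay *= 2;         // exponential growth of the retry period
            }
        }
    }
}
```

Each layer up the chain would use a larger baseDelayMs, so the outermost callers wait longest before hammering a struggling downstream service again.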
What you throw at your invokers, and whether what you throw at them will also carry a suggestion "but you could retry this", should be determined by the intended semantics of your service exclusively.
(Besides, I have never heard of Java Exception objects formally carrying any such property, but that might be because I'm lagging behind a bit.)
EDIT.
Whether you "retry" an operation that failed is for you (and you alone) to decide. However, if you do decide to retry, it is also your responsibility to decide after how many failures you are going to stop retrying and call it a day, and at that point it is most certainly unwise to throw an exception to your caller that suggests they can "retry" as well.
Related
Developing in Java an asynchronous method with a CompletableFuture return type, we expect the resulting CF to complete normally or exceptionally depending on whether that method succeeds or fails.
Yet, consider for instance that my method writes to an AsynchronousChannel and gets an exception opening that channel. It has not even started writing. So, in this case I am tempted to just let the exception flow to the caller. Is that correct?
Although then the caller will have to deal with two failure scenarios: 1) a thrown exception, or 2) a rejected promise.
Or alternatively, should my method catch that exception and return a rejected promise instead?
IMO, option 1) makes the API harder to use because there will be two different paths for communicating errors:
"Synchronous" exceptions, where the method ends with an exception being thrown.
"Asynchronous" exceptions, where the method returns a CF, which completes with an exception. Note that it is impossible to avoid this case, because there will always be situations where the errors are only found after the asynchronous path has started (e.g. timeouts).
The programmer now has to ensure both these two paths are correctly handled, instead of just one.
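The two error paths can be seen side by side in a small sketch (the method names are mine; both report the same failure, one synchronously and one through the future):

```java
import java.util.concurrent.CompletableFuture;

public class TwoErrorPaths {

    // Path 1: fail before the async work starts -> synchronous exception.
    static CompletableFuture<String> throwsSynchronously() {
        throw new IllegalStateException("could not open channel");
    }

    // Path 2: the same failure reported through the returned future.
    static CompletableFuture<String> rejectsTheFuture() {
        CompletableFuture<String> cf = new CompletableFuture<>();
        cf.completeExceptionally(new IllegalStateException("could not open channel"));
        return cf;
    }

    public static void main(String[] args) {
        try {
            throwsSynchronously();        // must be guarded with try/catch
        } catch (IllegalStateException e) {
            System.out.println("sync: " + e.getMessage());
        }
        rejectsTheFuture()                // handled via the CF API instead
                .exceptionally(e -> "async: " + e.getMessage())
                .thenAccept(System.out::println);
    }
}
```

A caller of the first variant needs both a try/catch and future-based error handling; a caller of the second only ever deals with the future.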
It is also interesting to observe that both C# and JavaScript always report exceptions thrown inside the body of an async function via the returned Task/Promise, even for exceptions thrown before the first await, and never by ending the async function call with an exception.
The same is also true for Kotlin's coroutines, even when using the Unconfined dispatcher:
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.GlobalScope
import kotlinx.coroutines.launch
import org.junit.Test
import org.slf4j.LoggerFactory
import kotlin.test.assertTrue

class SynchronousExceptionExamples {
    private val log = LoggerFactory.getLogger(SynchronousExceptionExamples::class.java)

    @Test
    fun example() {
        log.info("before launch")
        val job = GlobalScope.launch(Dispatchers.Unconfined) {
            log.info("before throw")
            throw Exception("an-error")
        }
        log.info("after launch")
        Thread.sleep(1000)
        assertTrue(job.isCancelled)
    }
}
will produce
6 [main] INFO SynchronousExceptionExamples - before launch
73 [main #coroutine#1] INFO SynchronousExceptionExamples - before throw
(...)
90 [main] INFO SynchronousExceptionExamples - after launch
Note that the exception occurs in the main thread, yet launch still ends with a proper Job.
I think both are valid designs. Datastax actually started with the first approach, where borrowing a connection was blocking, and later switched to a fully async model (https://docs.datastax.com/en/developer/java-driver/3.5/upgrade_guide/#3-0-4).
As a user of the Datastax Java driver I was very happy with the fix, as it changed the API to be truly non-blocking (even opening a channel, as in your example, has a cost).
But I don't think there is a right and a wrong here...
It doesn't make a big difference from the caller's point of view. In either case the cause of the exception will be visible, whether it is thrown from the method or from calling get() on the CompletableFuture.
I would perhaps argue that an exception thrown by the completable future should be an exception from the async computation and not failing to start that computation.
I have set the maximum number of retries to 3, and I have added only RemoteAccessException as a retryable exception. What I want to do is change the state of some of the entities to error and persist them to the database after all retries are exhausted. All of this I am doing in the writer step. I have implemented ItemWriteListener, and when RemoteAccessException occurs, execution does reach the onWriteError method, where I have written this state-changing logic. But when I check the database after execution is done, I see that the state is not changed at all.
My question is, what exactly is happening in this case? After 3 retries, does the entire step roll back, since the exception is still there, so that nothing is changed in the database? Also, I do need to change the states to error. Is there some way to achieve that?
I have found the answer to this. In my case, RetryExhaustedException was being thrown after 3 retries. As stated in the Spring Retry docs, any enclosing transaction will be rolled back in this case.
From the Spring docs (https://docs.spring.io/spring-batch/trunk/reference/html/retry.html):
After a callback fails the RetryTemplate has to make a call to the RetryPolicy to ask it to update its state (which will be stored in the RetryContext), and then it asks the policy if another attempt can be made. If another attempt cannot be made (e.g. a limit is reached or a timeout is detected) then the policy is also responsible for handling the exhausted state. Simple implementations will just throw RetryExhaustedException which will cause any enclosing transaction to be rolled back. More sophisticated implementations might attempt to take some recovery action, in which case the transaction can remain intact.
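The "more sophisticated implementations" in the quoted passage amount to running a recovery action once retries are exhausted, so that no exception escapes to roll back the transaction. Stripped of Spring Retry's API (where this role is played by a RecoveryCallback passed to RetryTemplate.execute), the control flow is roughly:

```java
import java.util.concurrent.Callable;
import java.util.function.Function;

public class RetryWithRecovery {

    /**
     * Runs the task up to maxAttempts times; if every attempt fails, the
     * recoverer receives the last exception and produces a fallback result
     * (e.g. marking entities as errored) instead of rethrowing.
     */
    public static <T> T execute(Callable<T> task,
                                int maxAttempts,
                                Function<Exception, T> recoverer) {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
            }
        }
        return recoverer.apply(last); // recovery path: no exception escapes
    }
}
```

Because the recovery path swallows the exhaustion instead of throwing, an enclosing transaction can remain intact and the error states can be committed.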
For the case where I need to change the states to error, I have asked a similar question and found the answer:
Is there any way to persist some data in database after an exception occurs in ItemWriter in spring batch?
I just read the Hystrix guide and am trying to wrap my head around how the default circuit breaker and recovery period operate, and then how to customize their behavior.
Obviously, if the circuit is tripped, Hystrix will automatically call the command's getFallBack() method; this much I understand. But what criteria trip the circuit in the first place? Ideally, I'd like to try hitting a backing service several times (say, a maximum of 3 attempts) before we consider the service to be offline/unhealthy and trip the circuit breaker. How could I implement this, and where?
But I imagine that if I override the default circuit breaker, I must also override whatever mechanism handles the default recovery period. If a backing service goes down, it could be for any one of several reasons:
There is a network outage between the client and server
The service was deployed with a bug that makes it incapable of returning valid responses to the client
The client was deployed with a bug that makes it incapable of sending valid requests to the server
Some weird, momentary service hiccup (perhaps the service is doing a major garbage collection, etc.)
etc.
In most of these cases, it is not sufficient to have a recovery period that merely waits N seconds and then tries again. If the service has a bug in it, or if someone pulled some network cables in the data center, we will always get failures from this service. Only in a small number of cases will the client-service automagically heal itself without any human interaction.
So I guess my next question is partially "How do I customize the default recovery period strategy?", but I guess it is mainly: "How do I use Hystrix to notify devops when a service is down and requires manual intervention?"
There are basically four reasons for Hystrix to call the fallback method: an exception, a timeout, too many parallel requests, or too many exceptions in previous calls.
You might want to do a retry in your run() method if the return code or the exception you receive from your service indicate that a retry makes sense.
In the fallback method of your command you might retry when there was a timeout; when there were too many parallel requests or too many exceptions, it usually makes no sense to call the same service again.
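The decision rule in that answer, retry on timeout but not on overload, can be sketched independently of Hystrix's API; the enum and method below are hypothetical:

```java
// Hypothetical classification of why a fallback was invoked, used to
// decide whether calling the same service again makes any sense.
public class FallbackPolicy {

    public enum FailureKind { EXCEPTION, TIMEOUT, TOO_MANY_PARALLEL, CIRCUIT_OPEN }

    /**
     * Retrying only plausibly helps for a transient timeout; with too many
     * parallel requests or an open circuit, more calls just add load.
     */
    public static boolean shouldRetry(FailureKind kind) {
        return kind == FailureKind.TIMEOUT;
    }
}
```

A fallback implementation could consult such a classification before deciding between a single retry and returning a degraded default.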
As you also asked how to notify devops: you should connect a monitoring system to Hystrix that polls the status of the circuit breaker and the ratio of successful and unsuccessful calls. You can use the metrics publishers provided, JMX, or write your own adapter using Hystrix's API. I've written two adapters for Riemann and Zabbix in a tutorial I prepared; you'll need very few lines of code for that.
The tutorial also has an example application and a load driver to try out some scenarios.
Br,
Alexander.
I'm working on a Java project and I've come upon an interesting design issue. It's not exactly a problem, but it is a bit ugly with the obvious solution.
I have a class implementing Callable, although with the current implementation it could just as easily be a Runnable as I'm not interested in the outcome, at least not as far as the calling thread is concerned. The calling thread will drop a number of these into a thread pool. Each of these Callables will have a batch of data that was obtained from an external service. The Callables will perform a number of actions, many of which involve making further calls to external services. As a result, there are a good number of places where various Exceptions could be thrown.
The issue I find is that depending on where the Exception occurs, I may need to take different actions. If it happens at point A, then delete the data on the external service. If it happens at point B, move the data to a different location on the server. If it happens at point C, just log it and do nothing further, etc. Any number of Exception types could be thrown at multiple points, although I don't think I'll need to do much filtering on the type itself, but more that one occurred.
The Callable itself isn't terribly huge, so there's not a great deal of code to mess with. However, I am hesitant to kludge it up with a ton of try/catch blocks to handle every possible point/Exception that may need different handling. I realize that this may really be the only viable solution. I don't really have control over most of the Exceptions that will be thrown (maybe a handful) without catching an existing one and rethrowing my own, which seems a bit redundant. I'm wondering if there's a good pattern or method to handle this sort of thing.
I've considered an exception handling class, but I'd still need to catch each Exception somehow and pass it to the handler, as the point at which the Exception was thrown is important. I could break the Callable down into more atomic classes, each with their own small block and handling, but that would be trading one kludge for another. Catching everything in the call() method outright, or grabbing the Exception from the Future in the calling thread, really isn't an option, as this would lose the information about where it occurred unless I parse the stack trace, which isn't exactly viable.
Can anyone shed some light? Maybe I'm just quibbling over the try/catch blocks and should just go ahead with it, but I feel like there must be a better way...
Hmmm, it does occur to me that annotations on methods might help here. I could break down all methods until there's only one possible exception-throwing piece of code in each, then annotate each of these with a custom annotation that dictates what is done when that method throws an exception. I'm not sure if it is possible (an exception would somehow need to be caught right there, as it may happen within a loop going over each piece of data where only one piece is problematic, or the piece would at least need to be marked for processing further up the chain), but perhaps this could mitigate the need for lots of try/catch blocks and instead handle the behavior with a single annotation and a handler class to deal with the exceptions. I don't believe it's possible to dictate behavior this way with an annotation, but I'd be happy to be wrong on that.
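Short of annotations, one way to keep a single catch block while still recording where the failure happened is to track the current phase in a local variable and dispatch on it. A sketch with hypothetical phase and method names:

```java
import java.util.concurrent.Callable;

public class BatchTask implements Callable<Void> {

    enum Phase { FETCH, TRANSFORM, UPLOAD }

    // Records where the last failure happened; null if the run succeeded.
    Phase lastFailedPhase;

    @Override
    public Void call() {
        Phase phase = Phase.FETCH;
        try {
            fetch();
            phase = Phase.TRANSFORM;
            transform();
            phase = Phase.UPLOAD;
            upload();
        } catch (Exception e) {
            handle(phase, e); // single catch; the phase decides the cleanup
        }
        return null;
    }

    void handle(Phase phase, Exception e) {
        lastFailedPhase = phase;
        switch (phase) {
            case FETCH:     break; // e.g. delete the data on the external service
            case TRANSFORM: break; // e.g. move the data elsewhere on the server
            case UPLOAD:    break; // e.g. just log and do nothing further
        }
    }

    // Hypothetical stand-ins for the real calls to external services.
    void fetch()     { }
    void transform() { }
    void upload()    { }
}
```

This trades the many try/catch blocks for one enum and one handler, at the cost of updating the phase variable between steps.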
Say I have a 3-tier app: frontend, domain, and data access. I have read that it is a good idea to catch exceptions high in the call stack... so if I get a data-access exception, the domain layer merely does a finally, like so
try {
    // ...
} finally {
    // cleans up
}
and lets the data-access exception percolate to the frontend layer. Doesn't this break layering by making the frontend layer deal with the innards? I think that each layer should either handle an exception or wrap and throw what it cannot handle to its calling layer...
Any thoughts?
Lots of good feedback so far, I'll give you my take.
Rule #1. ONLY catch exceptions you are going to actually handle. By handle, I mean handle in such a way that the client's request can continue. You may catch things long enough to log information (don't abuse this; usually the stack is enough information) or to convert to a different error that propagates more easily (à la Runtime-based). But if you can't handle it, don't bother catching it. That's just extra code that is useless and confusing. Even when you log or convert, you end up rethrowing.
Realize that most of the time, you can NOT handle an exception. Truly. Many fail to grasp this. But the reality is, if you get an IOException reading or writing to the disk, game over. That request cannot be completed for the user. If your network is flaky and you can't talk to the database, same thing.
Rule #2. When you do get an exception that you cannot handle, the only thing you can do is try to fail in such a way that it is helpful to the user. This means, log it for later analysis (including original stack/cause), and then report something as helpful as possible to the user. Clean up whatever you must, so that the system remains in a consistent state.
Given that this communication with the end user happens at a very high level, you usually have to catch at that level. Most of the time, I find there is very little value in any exception handling between an exception's inception point and the top level where you catch it for logging and reporting to the user. I often convert to a form of RuntimeException, but that's done only to ease propagation through the layers.
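The conversion mentioned above, done only to ease propagation, is typically a thin wrapper that preserves the original cause; for IOException the JDK even ships one. The surrounding method names here are mine:

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class DataAccess {

    // Hypothetical low-level read that can fail with a checked exception.
    static String readRecord(boolean available) throws IOException {
        if (!available) throw new IOException("disk unavailable");
        return "record";
    }

    /**
     * Converts the checked exception once, near its origin; the original
     * stack and cause survive for the top-level handler to log. No
     * "handling" happens here, the error simply propagates more easily.
     */
    public static String readRecordUnchecked(boolean available) {
        try {
            return readRecord(available);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Intermediate layers then need no throws clauses or catch blocks, and the top level still sees the full cause chain.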
The biggest and most important thing is to realize that you usually can't handle exceptions, so any code you write for them should be as simple as possible.
I don't think layering is such a pure idea that this breaks it.
Wrapping and rethrowing doesn't add much value either.
What's wrong with having the service layer handle exceptions? That ought to be the end of the line, the last line of defense. This design lets the service log the exception - once and for all - and send a user friendly message to the UI for display.
You generally want to catch exceptions higher in the call stack, but only to the point that it makes sense. If the data level can handle and log the exception and just pass a message back to the front end, that will keep things simple and more flexible.
Personally, if I need a try and a finally, then I would like to also catch and do something about the situation there rather than pass it up to the caller. Just keep in mind there are always exceptions to good design rules (usually another rule, like KISS).
There are three interlocking problems here.
First, constantly re-wrapping exceptions can be done, but what value does it provide? You are just creating more layers around the original exception. I only wrap an exception when I can provide additional information about it or when the first exception causes another.
Second, the idea of an exception is to signal that a function cannot complete normally. You should catch the exception at the place where it makes the most sense to deal with the problem. If the code has "another alternative", the exception should be trapped at that point. Otherwise, log it for the user or developer to work out.
Third, the try/finally block. These are useful when an exception would leave resources hanging in an open or allocated state. I always use try/finally to clean up resources that might be left open (my favorite is the Statement/ResultSet pair from java.sql, a huge memory hog). A really good programmer has a lot of this in their code as a way to recover gracefully without creating huge memory leaks or resource constraints.
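In modern Java the Statement/ResultSet cleanup described above is usually written with try-with-resources, which generates the finally logic for you. The sketch below uses a stand-in AutoCloseable instead of a real JDBC connection so it stays self-contained:

```java
// A stand-in for Statement/ResultSet-style resources; try-with-resources
// closes them in reverse declaration order even when the body throws.
public class ResourceDemo {

    static final StringBuilder closed = new StringBuilder();

    static class Resource implements AutoCloseable {
        final String name;
        Resource(String name) { this.name = name; }
        @Override public void close() { closed.append(name); }
    }

    public static void run() {
        try (Resource stmt = new Resource("S"); Resource rs = new Resource("R")) {
            throw new IllegalStateException("query failed");
        } catch (IllegalStateException e) {
            // both resources are already closed here, so no leak despite the error
        }
    }
}
```

With real java.sql types the shape is identical: declare the Statement and ResultSet in the try header and both are released even when the query throws.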