Looking to get opinions on whether it would be a good idea to hand persistence operations off to a task queue. For example, a user submits a new 'order', I use bean validation to verify that everything is OK, and then hand the processing/persisting of the order over to a task queue so I can respond to the user faster.
My hesitation is that the persistence could fail, but once I've validated the bean the chances are low. Are task queues usually used to handle tasks that are relatively trivial? My main concern is what happens if a task in the queue fails: since it runs asynchronously, how can I notify the user?
Tasks will retry automatically. If the failure is caused by the infrastructure, the task will be completed on a subsequent try. So you need to worry only about cases where a failure was caused by your code (a code bug) or data (a validation bug). If you iron out the bugs, you can use tasks without hesitation and not worry about notifications.
In either case, if processing an order takes a couple of seconds, I probably wouldn't bother with task queues. From a user experience perspective, users want to feel that the app did some work with their order, so a 1-2 second delay in the response is acceptable and even expected.
We have implemented a huge logistics app, and some of our processes take 2-3 minutes to read a lot of data from BigQuery, do the work, and send an e-mail with attachments.
To notify the user you can use the Channel API and/or send an e-mail.
You'll have to pass the user id, e-mail address or something like that in the task parameters, because the task is run by the system.
You can't ask App Engine for the currently logged-in user inside a task; it will be null every time.
As Andrei said:
you need to worry only about cases where a failure was caused by your
code (code bug) or data (validation bug).
Don't let an exception escape the task handler, otherwise the entire task will be run again.
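For example, here is a minimal sketch of that setup, assuming a hypothetical /tasks/process-order handler and illustrative parameter names and helpers (persistOrder, notifyFailure):

import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;
import javax.servlet.http.*;

// Front-end code that accepted and validated the order:
QueueFactory.getDefaultQueue().add(TaskOptions.Builder.withUrl("/tasks/process-order")
    .param("orderId", orderId)
    .param("userEmail", userEmail));   // the task runs as the system, so pass the user along

// Worker servlet mapped to /tasks/process-order:
public class ProcessOrderServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) {
        try {
            persistOrder(req.getParameter("orderId"));        // the actual work (hypothetical helper)
        } catch (Exception e) {
            // Code/validation bug: don't rethrow (that would make the queue retry the task);
            // notify the user instead, e.g. by e-mail or over the Channel API.
            notifyFailure(req.getParameter("userEmail"), e);  // hypothetical helper
        }
    }
}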
I have a microservice that should create a user, but since user creation is complex it uses a queue: the user is actually created by the consumer, and the endpoint only accepts the request and returns ok or fail.
How do I create acceptance test for this acceptance criteria:
Given: User who wants to register
When: api is requested for user creation
Then: create user AND set hosting environment_id on new user
For this I have to wait while the environment is actually set up, which takes up to 30 seconds. And if I implement a sleep inside my test, then I hit the "wait and see" anti-pattern. How do I properly test this without violating best practices?
The most proper approach might be to return a response instantly, say "setup process started" (with a setup process id), and then have another API method which will "obtain setup status" (for that setup process id) - and then proceed once "setup has completed".
That way nothing gets stuck for 30 s, neither in tests nor in production - and one could display a progress bar to the user which indicates the current status, so that they have an estimate of how long it will take, instead of getting the impression that something is stuck or not working.
One can barely test asynchronously while the setup process itself isn't asynchronous; and long-running tasks without any kind of status indicator are barely acceptable for delivery - they only appear fine when you know what is going on in the background, not when you don't.
Whenever testing hits an anti-pattern, it's an indicator that the solution might be sub-optimal.
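A minimal sketch of that shape with JAX-RS (the paths, the setupService helper and the status values are assumptions, not an existing API):

import javax.ws.rs.*;
import javax.ws.rs.core.Response;

@POST
@Path("/users")
public Response createUser(UserRequest request) {
    String setupId = setupService.startAsyncSetup(request);     // hands the work to the queue/consumer
    return Response.status(Response.Status.ACCEPTED)
                   .entity(setupId)                              // "setup process started" + process id
                   .build();
}

@GET
@Path("/setup-status/{setupId}")
public Response setupStatus(@PathParam("setupId") String setupId) {
    return Response.ok(setupService.statusOf(setupId)).build(); // e.g. PENDING / COMPLETED / FAILED
}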
I don't presume to tell you exactly how to code your acceptance tests without more detail regarding language or testing stack, but the simplest solution is to implement a dynamic wait that continuously polls the state of the system for a desired result before moving forward, breaking the loop (presuming you would use some form of loop, but that’s up to you) when the expected/desired response has been received.
This "polling" can take many forms such as:
a) querying for an expected update to a database (perhaps a value within a table is updated when the user is created)
b) pinging the dependent service until you receive the proper "signal" you are expecting to indicate user creation. For example, perhaps a GET request to another service (or another endpoint of the same service) returns a status of “created” for the given user, signifying that the user has been created.
Without further technical information I can’t give you exact instructions, but dynamic polling is the solution I use every day to test our asynchronous microservice architecture.
Keep in mind, this dynamic polling solution operates on the assumption that you have access to the service(s) and/or database(s) that contain the indicator for which you are "polling" when it is time to move forward with your test. Again, I'm assuming the signal to move forward is something transparent, such as a status change for the newly created user, the user's existence in a database/table either external or internal to the microservice, etc.
Some other assumptions in this scenario are:
a) sufficient non-functional performance of the System Under Test; poor non-functional performance there would be a constraint.
b) a lack of resource constraints, since resources are consumed somewhat heavily during the period of "polling" (think Azure dynamic resource flexing, which can be costly over time).
Note: Be careful of infinite loops. You should insert some sort of constraint that exits the polling loop (and likely results in a failed test) after a reasonable period of time or number of attempts, at your discretion.
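A minimal hand-rolled version of such a dynamic wait might look like this (userStatus(userId) is a hypothetical call against whatever endpoint or table you can observe):

// Poll every 500 ms, but give up after 30 s so the loop cannot run forever.
static void waitForUserCreated(String userId) throws InterruptedException {
    long deadline = System.currentTimeMillis() + 30_000;
    while (System.currentTimeMillis() < deadline) {
        if ("created".equals(userStatus(userId))) {   // hypothetical status lookup
            return;                                   // desired state reached; continue the test
        }
        Thread.sleep(500);                            // short poll interval, not a blind 30 s sleep
    }
    throw new AssertionError("User " + userId + " was not created within 30s");
}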
Create a query service that, given the user attributes (id, name, etc.), will return the status of the user.
The acceptance criteria will then have two parts:
create-user service returns 200
get-status service returns 200 (you can call it in a loop in your test).
This service will be helpful in the long run for various reasons:
Check how long the async process is taking to complete.
At any time you can get the status of any user, including validating whether a user is truly deleted/inactivated, etc.
You can mock this service's results in your end-to-end integration testing.
For the last few years we have used our own RM Application to process events related to our applications. This works by polling a database table every few minutes, looking for any rows that have a due date before now, and have not been processed yet.
We are currently making the transition to SNS, with SQS Worker tiers processing them. The problem with this approach is that we can't future date our messages. Our applications sometimes have events that we don't want to process until a week later.
Are there any design approaches, alternative services, or clever tricks we could employ that would allow us to achieve this?
One solution would be to keep our existing application running, at a simplified level, so all it does is send the SNS notifications when they are due, but the aim of this project is to try and do away with our existing app.
The database approach would be the wisest, being careful that each row is only processed once.
Amazon Simple Notification Service (SNS) is designed to send notifications immediately. There is no functionality for a delayed send (although some notification types are retried if they fail).
Amazon Simple Queue Service (SQS) does have a delay feature, but only up to 15 minutes -- this is useful if you need to do some work before the message is processed, such as copying related data to Amazon S3.
Given that your requirement is to wait until some future arbitrary time (effectively like a scheduling system), you could either start a process and tell it to sleep for a certain amount of time (a bad idea in case systems are restarted), or continue your approach of polling from a database.
If all jobs are scheduled for a distant future (e.g. at least one hour away), you theoretically only need to poll the database once an hour to retrieve the earliest scheduled time.
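A rough sketch of that polling loop, assuming a hypothetical scheduled_events table, a configured dataSource, and illustrative publishToSns/markProcessed helpers:

import java.sql.*;
import java.util.concurrent.*;

ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleAtFixedRate(() -> {
    try (Connection c = dataSource.getConnection();
         PreparedStatement ps = c.prepareStatement(
             "SELECT id, payload FROM scheduled_events WHERE due_at <= now() AND processed = false");
         ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            publishToSns(rs.getString("payload"));   // send the SNS notification now that it is due
            markProcessed(c, rs.getLong("id"));      // flag the row so it is only processed once
        }
    } catch (SQLException e) {
        // log and let the next run pick the rows up again
    }
}, 0, 1, TimeUnit.HOURS);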
A week might be too long, as SQS message retention itself is capped at 14 days. If you are okay with a maximum retention of 14 days, one idea is to keep changing the visibility of a message every time you receive it until it is ready for processing. The maximum allowed visibility timeout is 12 hours. More on visibility timeouts and the APIs for changing them:
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_ChangeMessageVisibility.html
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/AboutVT.html
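A sketch of that idea with the AWS SDK for Java; the dueAt message attribute is an assumption about how the scheduled time is carried on the message:

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
for (Message m : sqs.receiveMessage(
        new ReceiveMessageRequest(queueUrl).withMessageAttributeNames("dueAt")).getMessages()) {
    long dueAt = Long.parseLong(m.getMessageAttributes().get("dueAt").getStringValue());
    if (System.currentTimeMillis() < dueAt) {
        // Not due yet: hide the message again (up to the 12-hour maximum) instead of deleting it.
        int hideForSeconds = (int) Math.min(43_200, (dueAt - System.currentTimeMillis()) / 1000);
        sqs.changeMessageVisibility(queueUrl, m.getReceiptHandle(), hideForSeconds);
    } else {
        process(m);                                              // hypothetical: do the real work
        sqs.deleteMessage(queueUrl, m.getReceiptHandle());
    }
}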
I found this approach: https://github.com/alestic/aws-sns-delayed. Basically, you can use a Step Function with a Wait state in there.
In designing my GWT/GAE app, it has become evident to me that my client-side (GWT) will be generating three types of requests:
Synchronous - "answer me right now! I'm important and require a real-time response!!!"
Asynchronous - "answer me when you can; I need to know the answer at some point but it's really not all that urgent."
Command - "I don't need an answer. This isn't really a request, it's just a command to do something or process something on the server-side."
My game plan is to implement my GWT code so that I can specify, for each specific server-side request (note: I've decided to go with RequestFactory over traditional GWT-RPC for reasons outside the scope of this question), which type of request it is:
SynchronousRequest - Synchronous (from above); sends a command and eagerly awaits a response that it then uses to update the client's state somehow
AsynchronousRequest - Asynchronous (from above); makes an initial request and somehow - either through polling or the GAE Channel API, is notified when the response is finally received
CommandRequest - Command (from above); makes a server-side request and does not wait for a response (even if the server fails to, or refuses to, oblige the command)
I guess my intention with SynchronousRequest is not to produce a totally blocking request; however, it may block the user's ability to interact with a specific Widget or portion of the screen.
The added kicker here is this: GAE strongly enforces a timeout on all of its frontend instances (60 seconds). Backend instances have much more relaxed constraints for timeouts, threading, etc. So it is obvious to me that AsynchronousRequests and CommandRequests should be routed to backend instances so that GAE timeouts do not become an issue with them.
However, if GAE is behaving badly, or if we're hitting peak traffic, or if my code just plain sucks, I have to account for the scenario where a SynchronousRequest is made (which would have to go through a timeout-regulated frontend instance) and will time out unless my GAE server code does something fancy. I know there is a method in the GAE API that I can call to see how many milliseconds a request has before it's about to time out; although the name of it escapes me right now, it's what this "fancy" code would be based on. Let's call it public static long GAE.timeLeftOnRequestInMillis() for the sake of this question.
In this scenario, I'd like to detect that a SynchronousRequest is about to timeout, and somehow dynamically convert it into an AsynchronousRequest so that it doesn't time out. Perhaps this means sending an AboutToTimeoutResponse back to the client, and force the client to decide about whether to resend as an AsynchronousRequest or just fail. Or perhaps we can just transform the SynchronousRequest into an AsynchronousRequest and push it to a queue where a backend instance will consume it, process it and return a response. I don't have any preferences when it comes to implementation, so long as the request doesn't fail or timeout because the server couldn't handle it fast enough (because of GAE-imposed regulations).
So then, here is what I'm actually asking here:
How can I wrap a RequestFactory call inside SynchronousRequest, AsynchronousRequest and CommandRequest in such a way that the RequestFactory call behaves the way each of them is intended? In other words, so that the call either partially-blocks (synchronous), can be notified/updated at some point down the road (asynchronous), or can just fire-and-forget (command)?
How can I implement my requirement to let a SynchronousRequest bypass GAE's 60-second timeout and still get processed without failing?
Please note: timeout issues are easily circumvented by re-routing things to backend instances, but backends don't/can't scale. I need scalability here as well (that's primarily why I'm on GAE in the first place!) - so I need a solution that deals with scalable frontend instances and their timeouts. Thanks in advance!
If the computation that you want GAE to do is going to take longer than 60 seconds, then don't wait for the results to be computed before sending a response. According to your problem definition, there is no way to get around this. Instead, clients should submit work orders, and wait for a notification from the server when the results are ready. Requests would consist of work orders, which might look something like this:
class ComputeDigitsOfPiWorkOrder {
    // parameters for the computation
    int numberOfDigitsToCompute;

    // Used by the GAE app to contact the requester when results are ready.
    ClientId clientId;
}
This way, your GAE app can respond as soon as the work order is saved (e.g. in Task Queue), and doesn't have to wait until it actually finishes calculating a billion digits of pi before responding. Your GWT client then waits for the result using the Channel API.
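A minimal sketch of that flow; the task URL, parameter names and resultAsJson payload are assumptions:

import com.google.appengine.api.channel.ChannelMessage;
import com.google.appengine.api.channel.ChannelService;
import com.google.appengine.api.channel.ChannelServiceFactory;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

// When the work order arrives, enqueue it and respond right away:
QueueFactory.getDefaultQueue().add(TaskOptions.Builder.withUrl("/tasks/compute-pi")
    .param("digits", String.valueOf(order.numberOfDigitsToCompute))
    .param("clientId", clientId));                      // the channel the GWT client is listening on

// Later, inside the task handler, once the computation has finished:
ChannelService channels = ChannelServiceFactory.getChannelService();
channels.sendMessage(new ChannelMessage(clientId, resultAsJson));   // push the result to the client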
In order to give some work orders higher priority, you can use multiple task queues. If you want Task Queue work to scale automatically, you'll want to use push queues. Implementing priority using push queues is a little tricky, but you can configure high-priority queues to have a faster processing rate.
You could replace the Channel API with some other notification solution, but it would probably be the most straightforward.
In my Java app, sometimes my users do some work that requires a datastore write, but I don't want to keep the user waiting while the datastore is writing. I want to immediately return a response to the user while the data is stored in the background.
It seems fairly clear that I could do this by using GAE task queues, enqueueing a task to store the data. But I also see that there's an Async datastore API, which seems like it would be much easier than dealing with task queues.
Can I just call AsyncDatastoreService.put() and then return from my servlet? Will that API store my data without keeping my users waiting?
I think you are right that the Async calls seem easier. However, the docs for AsyncDatastore mention one caveat that you should consider:
Note: Exceptions are not thrown until you call the get() method. Calling this method allows you to verify that the asynchronous operation succeeded.
The "get" in that note is being called on the Future object returned by the async call. If you just return from your servlet without ever calling get on the Future object, you might not know for sure whether your put() worked.
With a queued task, you can handle the error cases more explicitly, or just rely on the automatic retries. If all you want to queue is datastore puts, you should be able to create (or find) a utility class that does most of the work for you.
Unfortunately, there aren't any really good solutions here. You can enqueue a task, but there's several big problems with that:
Task payloads are limited in size, and that size is smaller than the entity size limit.
Writing a record to the datastore is actually pretty fast, in wall-clock time. A significant part of the cost, too, is serializing the data, which you have to do to add it to the task queue anyway.
By using the task queue, you're creating more eventual consistency - the user may come back and not see their changes applied, because the task has not yet executed. You may also be introducing transaction issues - how do you handle concurrent updates?
If something fails, it could take an arbitrarily long time to apply the user's updates. In such situations, it probably would have been better to simply return an error to the user.
My recommendation would be to use the async API where possible, but to always write to the datastore directly. Note that you need to wait on all your outstanding API calls, as Peter points out, or you won't know if they failed - and if you don't wait on them, the app server will, before returning a response to the user.
If all you need is for the user to have a responsive interface while stuff churns away in the back end on the db, all you have to do is make an asynchronous call at the client level, i.e. do some Ajax that sends the db write request, immediately updates the user's display, and then, in the Ajax callback, updates the view with whatever it is you wish.
You can easily add GWT support to your GAE project (either via the Eclipse plugin or the Maven GAE plugin) and have the time of your life doing asynchronous stuff.
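For instance, a minimal GWT-RPC sketch; orderService, saveOrder and showOrderAsSaved are illustrative names for your own service and view code:

import com.google.gwt.user.client.rpc.AsyncCallback;

orderService.saveOrder(order, new AsyncCallback<Void>() {
    @Override
    public void onSuccess(Void result) {
        // server confirmed the write; reconcile the view if needed
    }
    @Override
    public void onFailure(Throwable caught) {
        // surface the error, e.g. roll back the optimistic UI update
    }
});
showOrderAsSaved(order);   // update the display immediately, without waiting for the callback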
I'm building a web service with a RESTful interface (let's call it MY_API). This service relies on another RESTful web service to handle certain aspects (call it OTHER_API). I'd like to determine what best practices I should consider using to handle failures of OTHER_API.
Scenario
My UI is a single page javascript application. There are some fairly complex actions a user can take, which can easily take the user a minute or two to complete. When they are done, they click the SAVE button and MY_API is called to save the data.
MY_API has everything it needs to persist the information submitted by the user. However, there is an action that must take place that is handled by OTHER_API. For instance, OTHER_API might handle sending out emails. Or perhaps it handles adding line items to my user's billing statement. In both cases, these are critical things that must be completed, but they don't have to happen right now, they just need to happen eventually.
If OTHER_API fails, I don't want to simply tell the user their action has failed, as they spent a lot of time doing it and this will make the experience less than optimal.
Questions
So should I create some sort of Message or Event Queue that can save these failed REST requests to OTHER_API and process them later?
Any advice or suggestions on techniques to go about saving REST requests for delayed processing?
Is there a recommended open source message queue solution that would work for this type of scenario with JSON-based REST web services? Java is preferred as my backend is written in it.
Are there other techniques I should consider?
Rather than approach this by focusing on the failure state, it'd be faster and more robust to recognize that these actions should be performed asynchronously and out-of-band from the request by the UI. You should indeed use a message/event/job queue, and just pop those jobs right onto that queue as quickly as possible, and respond to the original request as quickly as possible. Once you've done that, the asynchronous job can be performed independently of the original request, and at its own pace — including with retries as needed.
If you want your API to indicate that there are aspects of the request which have not completed, you can use the HTTP response Status Code 202 (Accepted).
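Putting the two together, a hedged JAX-RS sketch; jobQueue, OtherApiJob, orderRepository and the path are illustrative names rather than a specific library:

import javax.ws.rs.*;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@POST
@Path("/orders")
@Consumes(MediaType.APPLICATION_JSON)
public Response save(Order order) {
    orderRepository.save(order);                        // MY_API owns this data, so persist it now
    jobQueue.enqueue(new OtherApiJob(order.getId()));   // the e-mail/billing call runs out-of-band, with retries
    return Response.status(Response.Status.ACCEPTED)    // 202: accepted, but not yet fully processed
                   .entity(order.getId())
                   .build();
}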