I have a use case where I need to process a request as follows
1. Authenticate the request
2. Authorize the request
3. Validate the message (reads the database for the existing record and validates it)
4. Perform some asynchronous operations
5. Update the records in the database and notify the customer
The problem is that I need to read the same record from step 3 again in steps 4 and 5.
Since this looked like a workflow, I thought I could use the Chain of Responsibility (COR) design pattern.
However, I do not want to read the database record again in steps 4 and 5; I want to pass it along from step 3.
What is an elegant design pattern I can use for this workflow?
Can you help me by giving some class/interface structure for this?
It's not really a question about design patterns: simply put your data in an in-memory cache, such as Google Guava's.
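If you go the Guava route, a minimal sketch might look like the following (Record, RecordDao and the cache sizing are hypothetical placeholders, not a definitive implementation):

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

public class RecordCache {

    // Hypothetical domain type and DAO; replace with your own classes.
    private final RecordDao recordDao;
    private final LoadingCache<String, Record> cache;

    public RecordCache(final RecordDao recordDao) {
        this.recordDao = recordDao;
        this.cache = CacheBuilder.newBuilder()
                .maximumSize(10_000)
                .expireAfterWrite(10, TimeUnit.MINUTES)
                // The loader runs only on a cache miss, i.e. one DB read per record.
                .build(new CacheLoader<String, Record>() {
                    @Override
                    public Record load(String recordId) {
                        return recordDao.findById(recordId);
                    }
                });
    }

    // Step 3 triggers the load; steps 4 and 5 call this again and hit the cache.
    public Record get(String recordId) {
        return cache.getUnchecked(recordId);
    }
}

The cache lets all the steps share the record without another database round trip; pick size and expiry settings that match how long a request can stay in flight.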
I need to automate a workflow after an event occurs. I have experience with CRUD applications but not with workflow/batch processing, and I need help designing the system.
Requirement
The workflow involves 5 steps. Each step is a REST call and depends on the previous step.
Example steps: (VerifyIfUserInSystem, CreateUserIfNeeded, EnrollInOpt1, EnrollInOpt2, ...)
My thought process is to maintain 2 DB Tables
WORKFLOW_STATUS Table which contains columns like
(foreign key(referring to primary table), Workflow Status: (NEW, INPROGRESS, FINISHED, FAILED), Completed Step: (STEP1, STEP2,..), Processed Time,..)
EVENT_LOG Table to maintain the track of Events/Exceptions for a particular record
(foreign key, STEP, ExceptionLog)
Question
#1. Is this a correct approach to orchestrating the system (which is not that complex)?
#2. As the steps involve REST calls, I might have to stop the process when a service is not available and resume it at a later point in time. I am not sure how many retry attempts should be made or how to track the number of attempts made before marking the record as FAILED. (My guess: create another column in the WORKFLOW_STATUS table called RETRY_ATTEMPT and set some limit before marking it FAILED; a rough sketch is at the end of this question.)
#3. Is the EVENT_LOG table a correct design, and what datatype (CLOB or VARCHAR(2048)) should I use for the exception log? Every step/retry attempt will be inserted as a new record in this table.
#4. How do I reset/restart a FAILED entry after a dependent service is back up?
Please direct me to any blogs/videos/resources if available.
Thanks in advance.
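To make my guess concrete, a rough sketch of the RETRY_ATTEMPT idea (#2) and the reset (#4) in plain JDBC; STATUS, RETRY_ATTEMPT and RECORD_ID are hypothetical column names on the WORKFLOW_STATUS table described above:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class WorkflowStatusDao {

    // Illustrative limit before a record is marked FAILED.
    private static final int MAX_RETRIES = 3;

    // Question #2: bump the attempt counter and mark FAILED once the limit is hit.
    public void recordFailedAttempt(Connection conn, long recordId) throws SQLException {
        String sql = "UPDATE WORKFLOW_STATUS "
                + "SET RETRY_ATTEMPT = RETRY_ATTEMPT + 1, "
                + "    STATUS = CASE WHEN RETRY_ATTEMPT + 1 >= ? THEN 'FAILED' ELSE 'INPROGRESS' END "
                + "WHERE RECORD_ID = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, MAX_RETRIES);
            ps.setLong(2, recordId);
            ps.executeUpdate();
        }
    }

    // Question #4: reset a FAILED entry so a poller/scheduler can pick it up again.
    public void resetFailed(Connection conn, long recordId) throws SQLException {
        String sql = "UPDATE WORKFLOW_STATUS SET STATUS = 'NEW', RETRY_ATTEMPT = 0 "
                + "WHERE RECORD_ID = ? AND STATUS = 'FAILED'";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, recordId);
            ps.executeUpdate();
        }
    }
}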
Have you considered using a workflow orchestration engine like Netflix's Conductor? See the docs and GitHub.
Conductor comes with a lot of the features you are looking for built in.
Here's an example workflow that uses two sequential HTTP requests (where the 2nd requires a response from the first):
The input supplies an IP address (and an Accuweather API key):
{
"ipaddress": "98.11.11.125"
}
HTTP request 1 locates the zipCode of the IP address.
HTTP request 2 uses the ZipCode (and the apikey) to report the weather.
The output from this workflow is:
{
"zipcode": "04043",
"forecast": "rain"
}
Your questions:
I'd use an orchestration tool like Conductor.
Each of these tasks (defined in Conductor) has retry logic built in. How you implement it will vary based on expected timings, etc. Since the two APIs I'm calling here are public (and relatively fast), I don't wait very long between retries:
"retryCount": 3,
"retryLogic": "FIXED",
"retryDelaySeconds": 5,
Inside the connection, there are more parameters you can tweak:
"connectionTimeOut": 1600,
"readTimeOut": 1600
There is also exponential retry logic if desired.
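As a rough sketch, registering a task definition with exponential retries through Conductor's Java classes might look like this (assuming the TaskDef class and its RetryLogic/TimeoutPolicy enums from the conductor-common module; field names can differ between Conductor versions):

import com.netflix.conductor.common.metadata.tasks.TaskDef;

public class WeatherTaskDefinition {

    public static TaskDef build() {
        TaskDef def = new TaskDef();
        def.setName("get_weather");
        // Retry up to 3 times; with EXPONENTIAL_BACKOFF the delay between
        // attempts grows from the base retryDelaySeconds instead of staying fixed.
        def.setRetryCount(3);
        def.setRetryLogic(TaskDef.RetryLogic.EXPONENTIAL_BACKOFF);
        def.setRetryDelaySeconds(5);
        def.setTimeoutSeconds(60);
        def.setTimeoutPolicy(TaskDef.TimeoutPolicy.RETRY);
        return def;
    }
}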
The event log is stored in ElasticSearch.
You can build error pathways for all your workflows.
I have this workflow up and running in the Conductor Playground; it is called "Stack_overflow_sequential_http". Create a free account, run the workflow (click "Run Workflow", select "Stack_overflow_sequential_http"), and use the JSON above to see it in action.
The get_weather connection is a very slow API, so it may fail a few times before succeeding. Copy the workflow and play with the timeout values to improve the success rate.
You describe an Enterprise Integration Pattern with enrichments/transformations from REST calls and stateful aggregation of the results over time (which means many such flows may be in progress at any one time). Apache Camel was designed for exactly these scenarios.
See What exactly is Apache Camel?
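To give a feel for it, a minimal Camel route for the enrollment-style workflow above could look roughly like this (the endpoint URIs, route names and error handling are placeholders, not a definitive design):

import org.apache.camel.builder.RouteBuilder;

public class EnrollmentRoute extends RouteBuilder {

    @Override
    public void configure() {
        // Redeliver a failed step a few times, then park the message so it
        // can be inspected and replayed once the downstream service is back.
        errorHandler(deadLetterChannel("direct:failedEnrollments")
                .maximumRedeliveries(3)
                .redeliveryDelay(5000));

        from("direct:enrollmentRequested")
                .to("http://user-service/api/verify")        // VerifyIfUserInSystem
                .to("http://user-service/api/users")         // CreateUserIfNeeded
                .to("http://enrollment-service/api/opt1")    // EnrollInOpt1
                .to("http://enrollment-service/api/opt2")    // EnrollInOpt2
                .to("direct:recordWorkflowStatus");          // persist status/event log
    }
}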
My question of the day is about combining operations when building microservices.
Let us use a fictional scenario: I want to build a dashboard. The dashboard is composed of a bunch of people and their info (history, reviews, purchases, last products searched).
Reading about spring-cloud and spring-reactor, I would like a non-blocking solution that calls multiple microservices: user service, review service, search engine service, ...
My first guess was to do something like
load the users,
for each one load its reviews then
load its history then
combine all the data
In pseudo-code, something like loadUsers().flatMap(u -> loadReviews(u))....reduce(). It's very approximate here, as you can see.
When loading 1 user, we can estimate that we need 4 more HTTP calls. For 100 users, 400 additional calls, and so on. The Big-O doesn't seem linear.
In the worst case, where a microservice itself delegates data loading to an XYZ microservice, we get: for 1 user -> N calls, including 1 review call -> 1 XYZ call. Sorry, I didn't calculate the Big-O (quadratic?).
To avoid that, we could perhaps load all the users, extract their ids, and call each microservice with a batch of ids. Each microservice can load all its data at once (a list of reviews mapped by id, perhaps) and the original caller will merge all these lists (a kind of zip function).
Summary: I just read this question about Observable composition. My question can be summarized as: "Do you use the same strategy when you don't have a single user at the start of the chain but hundreds of users?" (Performance can be a problem, no?)
You will likely want to use batching to reduce the number of downstream calls. Instead of sending a single user through the observable, send a batch of users.
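A minimal sketch with Reactor, assuming hypothetical non-blocking clients whose batch endpoints take a list of ids (none of these types are real APIs from your services):

import java.util.List;
import java.util.Map;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class DashboardLoader {

    // Hypothetical clients: the review service exposes a batch endpoint.
    interface UserClient   { Flux<String> loadUserIds(); }
    interface ReviewClient { Mono<Map<String, List<String>>> loadReviews(List<String> userIds); }

    private final UserClient userClient;
    private final ReviewClient reviewClient;

    public DashboardLoader(UserClient userClient, ReviewClient reviewClient) {
        this.userClient = userClient;
        this.reviewClient = reviewClient;
    }

    public Flux<Map<String, List<String>>> loadReviewsInBatches() {
        return userClient.loadUserIds()
                .buffer(100)                         // group ids into batches of 100
                .flatMap(reviewClient::loadReviews); // one review-service call per batch
    }
}

With batching, 100 users cost one review-service call instead of 100, and the same buffer/flatMap pattern repeats for the history and purchase services.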
I am looking for a good design pattern for sharding a list in Google App Engine. I have read about and implemented sharded counters as described in the Google Docs here but I am now trying to apply the same principle to a list. Below is my problem and possible solution - please can I get your input?
Problem:
A user on my system could receive many messages, kind of like an online chat system. I'd like the server to record all incoming messages (they will contain several fields: from, to, etc.). However, I know from the docs that updating the same entity group often can result in an exception caused by datastore contention. This could happen when one user receives many messages in a short time, thus causing his entity to be written to many times. So what about abstracting out the sharded counter example above:
Define say five entities/entity groups
for each message to be added, pick one entity at random and append the message to it, writing it back to the store,
To get list of messages, read all entities in and merge...
Ok some questions on the above:
Most importantly, is this the best way to go about things or is there a more elegant/more efficient design pattern?
What would be an efficient way to filter the list of messages by one of the fields, say everything after a certain date?
What if I require a sharded set instead? Should I read in all entities and check if the new item already exists on every write? Or just add it as above and then remove duplicates whenever the next request comes in to read?
Why would you want to put all messages in one entity group?
If you don't specify an ancestor, you won't need sharding, but the end user might see some lag when querying the messages due to eventual consistency.
It depends on whether that is an acceptable tradeoff.
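A minimal sketch of that suggestion with the low-level Datastore API (the kind and property names are made up; queries are eventually consistent because no ancestor is set):

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.PreparedQuery;
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.Query.FilterOperator;
import com.google.appengine.api.datastore.Query.FilterPredicate;
import java.util.Date;

public class MessageStore {

    private final DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

    // No parent key: every message is its own entity group, so there is no
    // single group being hammered by writes (no contention, no sharding).
    public void saveMessage(String to, String from, String text) {
        Entity message = new Entity("Message");
        message.setProperty("to", to);
        message.setProperty("from", from);
        message.setProperty("text", text);
        message.setProperty("sentAt", new Date());
        datastore.put(message);
    }

    // Non-ancestor query: eventually consistent, so a just-written message may
    // take a moment to show up. Also covers the "after a certain date" filter.
    public PreparedQuery messagesAfter(String to, Date since) {
        Query q = new Query("Message")
                .setFilter(Query.CompositeFilterOperator.and(
                        new FilterPredicate("to", FilterOperator.EQUAL, to),
                        new FilterPredicate("sentAt", FilterOperator.GREATER_THAN, since)))
                .addSort("sentAt", Query.SortDirection.DESCENDING);
        return datastore.prepare(q);
    }
}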
I’m struggling with how to design a Spring Batch job. The overall goal is to retrieve ~20 million records and save them to a SQL database.
I’m doing it in two parts. First I retrieve the 20 million ids of the records I want to retrieve and save those to a file (or DB). This is a relatively fast operation. Second, I loop through my file of Ids, taking batches of 2,000, and retrieve their related records from an external service. I then repeat this, 2,000 Ids at a time, until I’ve retrieved all of the records. For each batch of 2,000 records I retrieve, I save them to a database.
Some may be asking why I’m doing this in two steps. I eventually plan to make the second step run in parallel so that I can retrieve batches of 2,000 records in parallel and hopefully greatly speed up the download. Having the Ids allows me to partition the job into batches. For now, let’s not worry about parallelism and just focus on how to design a simpler sequential job.
Imagine I already have solved the first problem of saving all of the Ids locally. They are in a file, one Id per line. How do I design the steps for the second part?
Here’s what I’m thinking…
Read 2,000 Ids using a flat file reader. I’ll need an aggregator since I only want to do one query to my external service for each batch of 2K Ids. This is where I’m struggling. Do I nest a series of readers? Or can I do ‘reading’ in the processor or writer?
Essentially, my problem is that I want to read lines from a file, aggregate those lines, and then immediately do another ‘read’ to retrieve the respective records. I almost want to chain readers together.
Finally, once I’ve retrieved the records from the external service, I’ll have a List of records. Which means when they arrive at the Writer, I’ll have a list of lists. I want a list of objects so that I can use the JdbcItemWriter out of the box.
Thoughts? Hopefully that makes sense.
Andrew
This is a matter of design and is subjective, but based on the Spring Batch example I found (from SpringSource) and my personal experience, the pattern of doing additional reading in the processor step is a good solution to this problem. You can also chain together multiple processors/readers in the 'processor' step. So, while the names don't exactly match, I find myself doing more and more 'reading' in my processors.
http://docs.spring.io/spring-batch/trunk/reference/html/patterns.html#drivingQueryBasedItemReaders
Given that you want to call your external service just once per chunk of 2,000 records, you'll actually want to do this service call in an ItemWriter. That is the standard recommended way to do chunk-level processing.
You can create a custom ItemWriter<Long> implementation. It will receive the list of 2,000 IDs as input and call the external service. The result from the external service should allow you to create a List<Item>. Your writer can then simply forward this List<Item> to your JdbcItemWriter<Item> delegate.
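A sketch of that writer, assuming the pre-Spring Batch 5 ItemWriter signature (RecordServiceClient and Item are hypothetical stand-ins for your external-service client and record type):

import java.util.ArrayList;
import java.util.List;
import org.springframework.batch.item.ItemWriter;

public class EnrichingItemWriter implements ItemWriter<Long> {

    // Hypothetical client for the external service; fetches records for a batch of ids.
    private final RecordServiceClient recordService;

    // The out-of-the-box JDBC writer, configured elsewhere and injected as a delegate.
    private final ItemWriter<Item> delegate;

    public EnrichingItemWriter(RecordServiceClient recordService, ItemWriter<Item> delegate) {
        this.recordService = recordService;
        this.delegate = delegate;
    }

    @Override
    public void write(List<? extends Long> ids) throws Exception {
        // One call to the external service per chunk (commit-interval = 2,000).
        List<Item> items = recordService.fetchRecords(new ArrayList<>(ids));
        // Forward the flattened list straight to the JDBC delegate.
        delegate.write(items);
    }
}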
I am doing a geo-query over 300 user entities with a result range of 10.
I've made the query 120 times. For each query I got 10 user entity objects.
After this, my App Engine read operations reached 52% (26,000 operations).
My user entity has 12 single-value properties and 3 multi-value properties (List type).
The user entity has 2 indexes on single-value properties and 2 indexes on list-type properties.
Can anyone please help me understand how Google App Engine counts datastore read operations?
As a start, use appstats. It'll show you where your costs are coming from in your app:
https://developers.google.com/appengine/docs/java/tools/appstats
To keep your application fast, you need to know:
Is your application making unnecessary RPC calls? Should it cache data instead of making repeated RPC calls to get the same data? Will your application perform better if multiple requests are executed in parallel rather than serially? The Appstats library helps you answer these questions and verify that your application is using RPC calls in the most efficient way by allowing you to profile your RPC calls. Appstats allows you to trace all RPC calls for a given request and reports on the time and cost of each call.
Once you understand where your costs are coming from you can optimise.
If you just want to know what the prices are, they are here:
https://developers.google.com/appengine/docs/billing
You can analyse what is going on under the hood with appstats: https://developers.google.com/appengine/docs/java/tools/appstats