What is the best way to "roll back" changes? - java

Alright, so I have a Spring application that takes in a Network Representation and boots up virtual machines to represent the network that was passed in.
It uses a low level API to bring up the VMs, there is no database involved.
I need to handle the situation where a user submits a 10-node (or any size) network model and the application starts building the network (starting VMs): if a node fails to start up, I want to be able to react to that. I would like to be able to roll back my changes (i.e., destroy all nodes that were created).
I've been told that I need to look into "Transactions" but I am unsure whether or not that applies to this scenario when I'm not using a database.
As a side note, I do have logic to take down nodes if a user sends in that request.
My question is -- how do I handle this?
Also, is this the best stack overflow for this question?

It does seem that you are looking for transactional behavior, and specifically, for atomicity ("all or nothing"). But usually "transaction" connotes certain guarantees (particularly around ACID properties) that will be difficult or impossible to achieve where human-level timescales on the order of minutes are involved.
Probably "workflow with compensation for errors" is more what you would be looking for here.
I would implement this manually, perhaps with tool support (e.g. workflow engines). Kick off a process to spawn your network, and keep track of the current progress, such as VMs created, VMs in progress, etc. If there are errors that demand a rollback, then have another process that performs a cleanup. The behavior of the cleanup process itself could fail, so it might retry its various steps a couple times before generating a report that says "this cleanup step failed".
If there are shared resources involved then you would need to implement some kind of isolation mechanism as well. Sometimes this is easy enough--e.g., DHCP helps you avoid duplicate IPs. If you're updating a DNS zone file then you'd want to synchronize access to that to avoid concurrent writes. Etc.
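The "workflow with compensation" idea above can be sketched in plain Java. This is a minimal sketch, not a definitive implementation: `VmApi`, `NetworkBuilder`, and the retry count are hypothetical names introduced for illustration, standing in for whatever low-level VM API the application actually uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the low-level VM API; not a real library. */
interface VmApi {
    String start(String nodeName) throws Exception; // returns a VM id
    void destroy(String vmId) throws Exception;
}

final class NetworkBuilder {
    private final VmApi api;
    private final int cleanupAttempts;

    NetworkBuilder(VmApi api, int cleanupAttempts) {
        this.api = api;
        this.cleanupAttempts = cleanupAttempts;
    }

    /**
     * Starts every node in order. If one fails, destroys the VMs already
     * started (most recent first) and rethrows the original failure.
     * Ids of VMs that could not be destroyed are added to cleanupFailures
     * so they can be reported.
     */
    List<String> build(List<String> nodes, List<String> cleanupFailures) throws Exception {
        List<String> started = new ArrayList<>();
        try {
            for (String node : nodes) {
                started.add(api.start(node)); // track progress as we go
            }
            return started;
        } catch (Exception startFailure) {
            for (int i = started.size() - 1; i >= 0; i--) {
                String vmId = started.get(i);
                if (!destroyWithRetry(vmId)) {
                    cleanupFailures.add(vmId);
                }
            }
            throw startFailure;
        }
    }

    /** Tries the compensation step a few times before giving up. */
    private boolean destroyWithRetry(String vmId) {
        for (int attempt = 1; attempt <= cleanupAttempts; attempt++) {
            try {
                api.destroy(vmId);
                return true;
            } catch (Exception e) {
                // swallow and retry; the caller reports the final failure
            }
        }
        return false;
    }
}
```

The existing "take down nodes" logic mentioned in the question could plausibly serve as the `destroy` step here.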

Related

Request should get response in 1sec in Microservice Architecture

Recently, I faced a Java interview question. It goes like this: "There are 3 microservices (the flow goes from the 1st to the 2nd to the 3rd) which take a minimum round trip of 0.5 sec to provide the response. But the web request should get a response within 1 sec. How do you achieve this?"
Is there any architecture design, pattern, or setting that is needed?
It's a very vague question and there's no easy direct answer; it is more about identifying which direction you would take and what analysis options you would suggest. This is reliability engineering (SRE), which includes many tricks and approaches.
I would start by analyzing and clarifying what business process is implemented by this sequence of requests, and look for needless work (developers do not always write correct code, hence some unneeded calls to services, the DB, etc.).
Monitor network latency and identify where the focus area is. If the network takes significant time, then it makes sense to improve network hardware or software and look for problems such as bad packets that force the client to resend data, a "packet storm" issue, etc. If the network is fine, focus on the services.
Then consider caching data from downstream services (in-process or distributed cache, depending on architecture and data type). This step should be done carefully with a full understanding of the nature of the data, e.g., can it be cached, for how long, and how should it be refreshed or evicted?
Pay attention to the possibility of optimising the code that is executed. Developers sometimes don't keep performance in mind during implementation and so create functionality with unneeded operations (for example sorting, filtering, synchronization with locks, etc.).
As part of the code optimisation, parallelize everything that can be parallelized inside the code execution (no guarantee it helps) and get rid of locks. For example, there can be dependencies on the DB or other sources before or after calls to a downstream service, which may lead to unpredictable blocking; in such a situation it makes sense to execute the tasks in parallel threads so they do not block each other.
If no low-level tricks help, it may be a signal to revisit the architecture of the services, e.g., if the 1 s SLO is very important, then maybe it makes sense to merge microservices 1+2 or 2+3 into a bigger service to reduce data transformation and transfer between services (the gain needs to be calculated beforehand).
There are many more things to consider, depending on how deep you would like to go.
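To illustrate the parallelization point, here is a minimal sketch that runs two independent lookups concurrently and enforces a deadline on the combined result. `ParallelCalls` and `fetchBoth` are hypothetical names; the sketch assumes Java 9 or later (for `orTimeout`) and assumes the two calls really are independent, which is not the case for the strictly sequential 1st to 2nd to 3rd hop itself.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class ParallelCalls {
    /**
     * Runs two independent lookups concurrently and combines the results.
     * The returned future fails with a TimeoutException if the combined
     * result is not ready within timeoutMillis.
     */
    static CompletableFuture<String> fetchBoth(Supplier<String> first,
                                               Supplier<String> second,
                                               long timeoutMillis) {
        CompletableFuture<String> a = CompletableFuture.supplyAsync(first);
        CompletableFuture<String> b = CompletableFuture.supplyAsync(second);
        return a.thenCombine(b, (x, y) -> x + "+" + y)
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

With both suppliers taking roughly the same time, the combined latency approaches that of the slower call rather than the sum of the two.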

Spring State Machine - How many should I create?

When I receive a request on my API, I want to do a series of steps, each being a check or an enrichment. Each step could either succeed or fail. On Success, the next step should be carried out. On Failure, an end-step should be executed, and the flow is done. For that I have considered Spring State Machine, as it seems to fit the bill.
I have read up on the documentation and played around with it, but some things elude me:
Should there be a 1-to-1 relationship between a request and a State Machine, meaning that for every request, I make a new State Machine instance? Or should I somehow reuse a completed State Machine by resetting the machine for the next request?
What about cleanup of completed State Machines? There doesn't seem to be a way to destroy and clean a State Machine instance. If I create 1 per request, I've effectively introduced a memory leak, unless the framework somehow handles resources.
There is no absolutely correct answer to your question, so I will just leave some comments here. A state machine as a concept is so loose that it gives you many different ways to do things.
The whole concept of steps one after another relates to how the tasks recipe was implemented. It executes a DAG of tasks, and if a parent task fails the machine enters an error state, giving the user a chance to fix things and request the machine to continue. See statemachine-recipes-tasks and statemachine-examples-tasks. This kind of use case might be a good candidate for a new recipe, as it is pretty generic.
The framework should clean things up after a machine has been stopped, and eventually the JVM should collect the garbage. If you find something abnormal, please file a GitHub issue and we'll fix things.
We have a sample, statemachine-examples-eventservice, which reuses machines, but I'm currently re-implementing that sample (it works but should be implemented better), as I was told by our head chef that what I did there is dumb (SPR-15042). Machines cannot be used with a session scope, and things go south if a rich object (which SSM is) is serialised.
It is relatively easy to build a combination of states and choices which would do your step flow. The only question is how much you want this to be reusable (thus a generic recipe would be a good thing, PRs welcome :) ).
As for error handling, what I presented in a statechart in gh-240 is also something to consider.
There have been some questions about whether SSM could work as a more generic flow engine, but it's probably never going to be one, as that would be a completely new project. Though most flows could be handled as separate recipes.
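For comparison, the step flow from the question (each step succeeds or fails, failure jumps to an end state) can be sketched in plain Java with one small object per request. This deliberately does not use the Spring Statemachine API; `StepFlow` and its states are hypothetical names for illustration only.

```java
import java.util.List;
import java.util.function.Predicate;

/** One instance per request; it holds no shared state and is simply garbage collected afterwards. */
final class StepFlow<T> {
    enum State { RUNNING, COMPLETED, FAILED }

    private final List<Predicate<T>> steps;
    private State state = State.RUNNING;
    private int failedStep = -1;

    StepFlow(List<Predicate<T>> steps) {
        this.steps = steps;
    }

    /** Runs the steps in order and stops at the first one that returns false. */
    State run(T request) {
        for (int i = 0; i < steps.size() && state == State.RUNNING; i++) {
            if (!steps.get(i).test(request)) {
                failedStep = i;
                state = State.FAILED; // the "end-step" would be triggered from here
            }
        }
        if (state == State.RUNNING) {
            state = State.COMPLETED;
        }
        return state;
    }

    /** Index of the step that failed, or -1 if none did. */
    int failedStep() {
        return failedStep;
    }
}
```

The list of steps can be shared between requests, while the mutable state lives in the short-lived per-request instance.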

Optimizing performance by saving data in local database

We are creating a financial transaction system. There is a blacklist service (SOAP) exposed by an external system. We have to call this service in each transaction to check whether the sender or receiver exists in the blacklist. If they do, we should not let the transaction through.
The blacklist size is a few thousand entries.
To optimize the system, we are thinking of keeping a copy of this list in our database and checking it from there; whenever there is an update to the blacklist, the external system will inform us.
From an architecture point of view, is this a good approach? Should we use caching libraries instead of doing this manually?
The application is being developed in Java with an Oracle database.
From my point of view you shouldn't reinvent the wheel: prefer well-known libraries to writing your own code, since they are probably better optimized and maintained.
If you are able to save the data in your local DB and be informed about changes in the external system, that can be a better approach.
Just be sure to synchronize your actions, since you would probably want the action that checks whether someone is blacklisted to wait for the action that synchronizes the data from the blacklist system, in order to have the most up-to-date data.
I think it may or may not be a good approach.
1) SOAP in general means a web service, and that in general means latency. Do you want every step in your process to wait for that call to return?
2) Maybe the list doesn't change fast? Then it will not hurt to keep it in a "cache". You can tune how often your system checks and updates the cache.
3) A SOAP call can (in general) also be made asynchronously, so you can keep your system running while the "cache" is updated.
If your system is hot, processing a lot and needing to check that list many times per second, and the list doesn't change by even 0.1% per day, you will be fine.
On the other hand, if your system just runs a batch a few times a day and the list changes a lot every second, it will not be the best approach.
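As a rough illustration of the local-copy idea, the sketch below keeps an immutable snapshot of the blacklist in memory and swaps it atomically on refresh, so checks never see a half-updated list. `BlacklistCache` and the `loader` supplier are hypothetical; the supplier stands in for the SOAP call (or a database read), and the sketch assumes Java 10 or later for `Set.copyOf`.

```java
import java.util.Collection;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

final class BlacklistCache {
    // Hypothetical loader: wraps the external blacklist call or a DB query.
    private final Supplier<Collection<String>> loader;
    private final AtomicReference<Set<String>> snapshot = new AtomicReference<>(Set.of());

    BlacklistCache(Supplier<Collection<String>> loader) {
        this.loader = loader;
        refresh();
    }

    /** Reloads the full list and swaps it in atomically. */
    void refresh() {
        snapshot.set(Set.copyOf(loader.get()));
    }

    boolean isBlacklisted(String partyId) {
        return snapshot.get().contains(partyId);
    }

    /** A transaction is allowed only if neither party is on the list. */
    boolean allows(String sender, String receiver) {
        Set<String> current = snapshot.get(); // one consistent view for both checks
        return !current.contains(sender) && !current.contains(receiver);
    }
}
```

`refresh()` could be called from the update notification sent by the external system, from a scheduled task, or both; with a few thousand entries a full reload is cheap.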

How to properly throttle web requests to external systems?

My Java web application pulls some data from external systems (JSON over HTTP) both live whenever the users of my application request it and batch (nightly updates for cases where no user has requested it). The data changes so caching options are likely exhausted.
The external systems have some throttling in place, the exact parameters of which I don't know, and which likely change depending on system load (e.g., peak times 10 requests per second from one IP address, off-peak times 100 requests per second from one IP address). If the requests are too frequent, they time out or return HTTP 503.
Right now I am attempting the request 5 times with 2000ms delay between each, giving up if an error is received each time. This is not optimal as sometimes at peak-times nearly all requests fail; I could avoid making these requests and perhaps get at least some to succeed instead.
My goals are to have a somewhat simple, reliable design, and enough flexibility so that I could both pull some metrics from the throttler to understand how well the external systems are responding (and thus adjust how often they are invoked), and to auto-adjust the interval with which I call them (individually per system) so that it is optimal both on off-peak and peak hours.
My infrastructure is Java with RabbitMQ over MongoDB over Linux.
I'm thinking of three main options:
1. Since I already have RabbitMQ used for batch processing, I could just introduce a queue to which the web processes would send the requests they have for external systems, then worker processes would read from that queue, throttle themselves as needed, and return the results. This would allow running multiple parallel worker processes on more servers if needed. My main concerns are that it isn't a very simple solution, and how to manage peak-hour throughput being low and thus the web processes waiting for a long while. Also this converts my RabbitMQ into a critical single point of failure; if it dies the whole system stops (as opposed to the nightly batch processes just not running any more, which is less critical). I suppose RPC is the correct pattern of RabbitMQ usage, but I'm not sure. Edit - I've posted a related question, "How to properly implement RabbitMQ RPC from Java servlet web container?", on how to implement this.
2. Introduce nginx (e.g. ngx_http_limit_req_module), HAProxy or other proxy software to the mix (as reverse proxies?), and have them take care of the throttling through some configuration magic. The pro is that I don't have to make code changes. The con is that it is more technology used, and one I've not used before, so chances of misconfiguring something are quite high. It would also likely not be easy to do dynamic throttling depending on external server load, or to prioritize live requests over batch requests, or to get statistics of how the throttling is doing. Also, most documentation and examples will likely be on throttling incoming requests, not outgoing.
3. Do a pure-Java solution (e.g., leaky bucket implementation). It would be simple in the sense that it is "just code", but the devil is in the details; debugging all the deadlocks, starvations and race conditions isn't always fun.
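For a sense of how small the pure-Java option can be, here is a minimal sketch of a token bucket, a close relative of the leaky bucket mentioned above, with an adjustable rate. `TokenBucket` and its parameters are hypothetical; the clock is injected so the class can be tested without sleeping, and production code would pass `System::nanoTime`.

```java
import java.util.function.LongSupplier;

/** One instance per external system; all methods are thread-safe via synchronized. */
final class TokenBucket {
    private final LongSupplier nanoClock;
    private final double capacity;
    private double refillPerSecond;
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(double capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Returns true if a request may be sent now; never blocks. */
    synchronized boolean tryAcquire() {
        refill();
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    /** Lets the caller slow down after 503s or timeouts and speed up again off-peak. */
    synchronized void setRefillPerSecond(double newRate) {
        refill(); // settle tokens earned at the old rate first
        this.refillPerSecond = newRate;
    }

    private void refill() {
        long now = nanoClock.getAsLong();
        double earned = (now - lastRefillNanos) / 1e9 * refillPerSecond;
        tokens = Math.min(capacity, tokens + earned);
        lastRefillNanos = now;
    }
}
```

Live requests and batch requests could use separate buckets (or batch could simply back off whenever `tryAcquire()` fails), which is one way to prioritize the former.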
What am I missing here?
Which is the best solution in this case?
P.S. Somewhat related question - what's the proper approach to log all the external system invocations, so that statistics are collected as to how often I invoke them, and what the success rate is?
E.g., after every invocation I'd invoke something like .logExternalSystemInvocation(externalSystemName, wasSuccessful, elapsedTimeMills), and then get some aggregate data out of it whenever needed.
Is there a standard library/tool to use, or do I have to roll my own?
If I use option 1 with RabbitMQ, is there a way to organize the flow so that I get this out of the box from the RabbitMQ console? I wouldn't want to send all failed messages to a poison queue, though: it would fill up too quickly, and in most cases there is no need to re-process these failed requests as the user has already sadly moved on.
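If rolling your own turns out to be acceptable, the aggregation behind a call like `logExternalSystemInvocation(...)` can be quite small. The following is a minimal in-memory sketch; `InvocationStats` and its query methods are hypothetical names, and a metrics library would be the more usual choice in production.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

final class InvocationStats {
    private static final class Counters {
        final LongAdder calls = new LongAdder();
        final LongAdder failures = new LongAdder();
        final LongAdder totalMillis = new LongAdder();
    }

    private final ConcurrentHashMap<String, Counters> bySystem = new ConcurrentHashMap<>();

    /** Records one call; safe to invoke from many threads. */
    void logExternalSystemInvocation(String externalSystemName, boolean wasSuccessful, long elapsedTimeMillis) {
        Counters c = bySystem.computeIfAbsent(externalSystemName, k -> new Counters());
        c.calls.increment();
        if (!wasSuccessful) {
            c.failures.increment();
        }
        c.totalMillis.add(elapsedTimeMillis);
    }

    /** Fraction of successful calls, or 1.0 if nothing was recorded yet. */
    double successRate(String externalSystemName) {
        Counters c = bySystem.get(externalSystemName);
        if (c == null || c.calls.sum() == 0) {
            return 1.0;
        }
        return 1.0 - (double) c.failures.sum() / c.calls.sum();
    }

    /** Mean elapsed time per call in milliseconds, or 0.0 if nothing was recorded yet. */
    double averageMillis(String externalSystemName) {
        Counters c = bySystem.get(externalSystemName);
        if (c == null || c.calls.sum() == 0) {
            return 0.0;
        }
        return (double) c.totalMillis.sum() / c.calls.sum();
    }
}
```

The success rate per system is also the natural input for auto-adjusting the throttling interval.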
Perhaps this open source system can help you a little: http://code.google.com/p/valogato/

Critically efficient server

I am developing a client-server application for financial alerts, where the client can set a value as the alert for a chosen financial instrument, and when this value is reached the monitoring server will somehow alert the client (email, SMS; not important). The server will monitor updates that come from a data generator program. Now, the server has to be very efficient, as it has to handle many clients (possibly over 50,000-100,000 alerts, with updates coming every 1-2 seconds). I've written servers before, but never with such performance requirements, and I'm simply afraid that a basic approach (like before) will just not do. So how should I design the server? What kind of data structures are best suited? What about multithreading? In general, what should I do (and what should I not do) to squeeze every drop of performance out of it?
Thanks.
I've worked on servers like this before. They were all written in C (or fairly simple C++). But they were even higher performance -- handling 20K updates per second (all updates from most major stock exchanges).
We would focus on not copying memory around. We were very careful in what STL classes we used. As far as updates, each financial instrument would be an object, and any clients that wanted to hear about that instrument would subscribe to it (i.e., get added to a list).
The server was multi-threaded, but not heavily so -- maybe a thread handling incoming updates, one handling outgoing client updates, one handling client subscribe/release notifications (don't remember that part -- just remember it had fewer threads than I would have expected, but not just one).
EDIT: Oh, and before I forget, the number of financial transactions happening is growing at an exponential rate. That 20K/sec server was just barely keeping up and the architects were getting stressed about what to do next year. I hear all major financial firms are facing similar problems.
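The per-instrument subscription list described above was implemented in C/C++ in that answer; a comparable structure in Java might look like the sketch below. `InstrumentRegistry` and its methods are hypothetical names, and the choice of `CopyOnWriteArrayList` assumes subscriptions change far less often than prices do.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.DoubleConsumer;

final class InstrumentRegistry {
    // instrument symbol -> listeners interested in that instrument only
    private final Map<String, List<DoubleConsumer>> subscribers = new ConcurrentHashMap<>();

    void subscribe(String instrument, DoubleConsumer listener) {
        subscribers.computeIfAbsent(instrument, k -> new CopyOnWriteArrayList<>()).add(listener);
    }

    void unsubscribe(String instrument, DoubleConsumer listener) {
        List<DoubleConsumer> list = subscribers.get(instrument);
        if (list != null) {
            list.remove(listener);
        }
    }

    /** Delivers a price only to that instrument's listeners; returns how many were notified. */
    int publish(String instrument, double price) {
        int notified = 0;
        for (DoubleConsumer listener : subscribers.getOrDefault(instrument, List.of())) {
            listener.accept(price);
            notified++;
        }
        return notified;
    }
}
```

An update for one instrument never touches the subscribers of another, so the cost of each update is proportional to the interest in that instrument rather than to the total number of clients.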
You might want to look into using a proven message queue system, as it sounds like this is basically what you are doing in your application.
Projects like Apache's ActiveMQ or RabbitMQ are already widely used and highly tuned, and should be able to support the type of load you are talking about out of the box.
I would think that squeezing every drop of performance out of it is not what you want to do, as you really never want that server to be under load significant enough to take it out of a real-time response scenario.
Instead, I would use a separate machine to handle messaging clients, and let that main, critical server focus directly on processing input data in "real time" to watch for alert criteria.
Best advice is to design your server so that it scales horizontally.
This means distributing your input events to one or more servers (on the same or different machines), that individually decide whether they need to handle a particular message.
Will you be supporting 50,000 clients on day 1? Then that should be your focus: how easily can you define a single client's needs, and how many clients can you support on a single server?
Second-best advice is not to artificially constrain yourself. If you say "we can't afford to have more than one machine," then you've already set yourself up for failure.
Beware of any architecture that needs clustered application servers to get a reasonable degree of performance. London Stock Exchange had just such a problem recently when they pulled an existing Tandem-based system and replaced it with clustered .Net servers.
You will have a lot of trouble getting this type of performance from a single Java or .Net server - really you need to consider C or C++. A clustered architecture is much more error prone to build and deploy and harder to guarantee uptime from.
For really high volumes you need to think in terms of using asynchronous I/O for networking (i.e. poll(), select() and asynchronous writes or their Windows equivalents), possibly with a pool of worker threads. Read up about the C10K problem for some more insight into this.
There is a very mature C++ framework called ACE (Adaptive Communications Environment) which was designed for high volume server applications in telecommunications. It may be a good foundation for your product - it has support for quite a variety of concurrency models and deals with most of the nuts and bolts of synchronisation within the framework. You might find that the time spent learning how to drive this framework pays you back in less development and easier implementation and testing.
One thread receives instrument updates, processes each update, and puts it in a BlockingQueue.
One thread takes the update from the BlockingQueue and hands it off to the process that handles that instrument, or set of instruments. This process will need to serialize the events for an instrument so the customer will not receive notices out of order.
This process (thread) will need to iterate through the list of customers registered to receive notifications and create a list of customers who should be notified based on their criteria. The process should then hand off the list to another process that will notify the customers of the change.
The notification process should iterate through the list and send each notification event to another process that handles how the customer wants to be notified (email, etc.).
One of the problems, with 100,000 customers, will be synchronizing access to the list of customers and their criteria to be monitored.
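The first two stages of the pipeline above (a receiver feeding a `BlockingQueue`, and a single consumer that preserves order) can be sketched as follows. `UpdatePipeline`, `Update`, and the poison-pill shutdown are illustrative assumptions rather than part of the answer, and the `handler` stands in for the per-instrument matching stage.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

final class UpdatePipeline {
    static final class Update {
        final String instrument;
        final double price;

        Update(String instrument, double price) {
            this.instrument = instrument;
            this.price = price;
        }
    }

    // Sentinel object used to tell the dispatcher thread to stop.
    private static final Update STOP = new Update("", 0.0);

    private final BlockingQueue<Update> queue = new LinkedBlockingQueue<>();
    private final Thread dispatcher;

    UpdatePipeline(Consumer<Update> handler) {
        dispatcher = new Thread(() -> {
            try {
                // A single consumer thread keeps updates in arrival order.
                for (Update u = queue.take(); u != STOP; u = queue.take()) {
                    handler.accept(u);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "update-dispatcher");
        dispatcher.start();
    }

    /** Called by the receiving thread; returns immediately. */
    void onUpdate(String instrument, double price) {
        queue.add(new Update(instrument, price));
    }

    /** Drains what is already queued, then stops the dispatcher. */
    void shutdown() throws InterruptedException {
        queue.add(STOP);
        dispatcher.join();
    }
}
```

To scale out while keeping per-instrument ordering, one such pipeline could be created per set of instruments, with the receiver choosing the pipeline by hashing the instrument symbol.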
You should try to find a way to organize the alerts as a tree so that you can quickly decide which alerts can be triggered by an update.
For example, let's assume that the alert is the level of a certain indicator. Said indicator can have a range of 0 to n. I would group the clients who want to be notified of the level of the said indicator in a sort of binary tree. That way you can scale it properly (you can actually implement a subtree as a process on a different machine), and the number of comparisons required to find the proper subset of clients will always be logarithmic.
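One way to get this tree-shaped lookup in Java without writing the tree by hand is a `TreeMap` keyed by alert level. The sketch below handles only "notify me when the indicator reaches at least this level" alerts for a single indicator; `ThresholdAlerts` and its methods are hypothetical names, and the class is not thread-safe as written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class ThresholdAlerts {
    // alert level -> clients waiting for the indicator to reach that level
    private final TreeMap<Double, List<String>> byLevel = new TreeMap<>();

    void addAlert(String clientId, double level) {
        byLevel.computeIfAbsent(level, k -> new ArrayList<>()).add(clientId);
    }

    /**
     * Returns every client whose level is at or below the new value and
     * removes those alerts so each fires once. Finding the boundary is
     * logarithmic in the number of distinct levels.
     */
    List<String> onUpdate(double value) {
        NavigableMap<Double, List<String>> triggered = byLevel.headMap(value, true);
        List<String> clients = new ArrayList<>();
        for (List<String> waiting : triggered.values()) {
            clients.addAll(waiting);
        }
        triggered.clear(); // the view is backed by byLevel, so this removes the fired alerts
        return clients;
    }
}
```

A mirror-image map using `tailMap` would cover alerts that fire when the indicator drops to a level or below.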
The Apache MINA network application framework, as well as Apache Camel for message routing, are probably good starting points. The Kilim message-passing framework also looks very promising.
