Request should get a response in 1 sec in a microservice architecture - Java

Recently, I was asked a Java interview question. It goes like this: "There are 3 microservices (the flow goes from the 1st to the 2nd to the 3rd), each of which takes a minimum round trip of 0.5 sec to provide its response. But the web request should get its response within 1 sec. How can this be achieved?"
Is there any architecture design, pattern, or setting that would make this possible?

It's a very vague question with no easy, direct answer; it's more about identifying which direction you would take and what analysis options you would suggest. This is reliability engineering (SRE) territory, which includes many tricks and approaches.
I would start by analyzing and clarifying which business process is implemented by this sequence of requests, and by looking for needless work (Developers do not always write correct code, so there may be unneeded calls to services, the DB, etc.). Then:

1. Monitor network latency and identify where the focus area is. If the network takes significant time, it makes sense to improve the network hardware or software and look for problems: bad packets that force the client to resend data, "packet storm" issues, etc. If the network is fine, focus on the services.
2. Consider caching data from downstream services (an in-process or a distributed cache, depending on the architecture and the data type). This step should be done carefully, with a full understanding of the data's nature: can it be cached at all, for which period, and which way should it be refreshed/evicted? (See the sketch after this list.)
3. Pay attention to possible optimization of the code being executed. It happens that Developers don't keep performance in mind during implementation and thus create functionality with unneeded operations (for example some sorting, filtering, synchronization (with locks), etc.).
4. As part of 3., parallelize everything that can run concurrently inside the code (no guarantee it helps) and get rid of locks. For example, there can be dependencies on the DB or other sources before/after calls to a downstream service, which may lead to unpredictable blocking; in such a situation it makes sense to execute tasks in parallel threads that don't block each other. (Also shown in the sketch below.)
5. If no low-level tricks help, that can ring a bell to revisit the architecture of the services; e.g. if the 1 s SLO is very important, it may make sense to merge microservices 1+2 or 2+3 into a bigger service to reduce data transformation and transfer between services (calculate the expected gain beforehand).
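As a rough illustration of points 2 and 4, here is a minimal Java sketch. It is only a sketch under my own assumptions: the service URLs are placeholders, and it presumes the two downstream calls are independent enough to run in parallel, which the interview question does not actually state.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.time.Duration;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.TimeUnit;

    public class AggregatorService {

        private final HttpClient http = HttpClient.newBuilder()
                .connectTimeout(Duration.ofMillis(200))
                .build();

        // Naive in-process cache; a real system would want TTL-aware eviction
        // (e.g. Caffeine) or a distributed cache, depending on the data.
        private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();

        public String handleRequest(String key) throws Exception {
            // Serve from cache when possible, skipping the downstream calls entirely.
            String cached = cache.get(key);
            if (cached != null) {
                return cached;
            }

            // Fire the downstream calls in parallel instead of sequentially,
            // so the total latency is max(t2, t3) rather than t2 + t3.
            CompletableFuture<String> s2 = fetch("http://service-2/api/" + key);
            CompletableFuture<String> s3 = fetch("http://service-3/api/" + key);

            // Enforce the 1 s budget: fail fast rather than pile up requests.
            String result = s2.thenCombine(s3, (a, b) -> a + b)
                    .get(1, TimeUnit.SECONDS);

            cache.put(key, result);
            return result;
        }

        private CompletableFuture<String> fetch(String url) {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .timeout(Duration.ofMillis(500))
                    .build();
            return http.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                    .thenApply(HttpResponse::body);
        }
    }

Whether any of this applies depends entirely on the data: if service 3 genuinely needs the output of service 2, parallelizing is off the table, and caching or merging services becomes the more interesting lever.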
There are many more things to consider; it depends on how deep you would like to go.

Related

What is the best way to "roll back" changes?

Alright, so I have a Spring application that takes in a Network Representation and boots up virtual machines to represent the network that was passed in.
It uses a low-level API to bring up the VMs; there is no database involved.
What I need to figure out is how to handle the situation where a user submits a 10-node (or any number) network model and the application goes through and builds up the network (starting VMs). If a node fails to start up, I want to be able to react to that: I would like to roll back my changes (i.e. destroy all nodes that were created).
I've been told that I need to look into "Transactions" but I am unsure whether or not that applies to this scenario when I'm not using a database.
As a side note, I do have logic to take down nodes if a user sends in that request.
My question is -- how do I handle this?
Also, is this the best stack overflow for this question?
It does seem that you are looking for transactional behavior, and specifically, for atomicity ("all or nothing"). But usually "transaction" connotes certain guarantees (particularly around ACID properties) that will be difficult or impossible to achieve where human-level timescales on the order of minutes are involved.
Probably "workflow with compensation for errors" is more what you would be looking for here.
I would implement this manually, perhaps with tool support (e.g. workflow engines). Kick off a process to spawn your network, and keep track of the current progress: VMs created, VMs in progress, etc. If an error demands a rollback, have another process perform the cleanup. The cleanup's individual steps could themselves fail, so it might retry each of them a couple of times before generating a report that says "this cleanup step failed".
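A minimal sketch of that "track progress, compensate on failure" idea; VmApi and NodeSpec are hypothetical stand-ins for your real low-level API and node model:

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.List;

    public class NetworkBuilder {

        private final VmApi vmApi;

        public NetworkBuilder(VmApi vmApi) {
            this.vmApi = vmApi;
        }

        public void buildNetwork(List<NodeSpec> nodes) {
            // Record every VM actually created, so it can be compensated later.
            Deque<String> createdVmIds = new ArrayDeque<>();
            try {
                for (NodeSpec node : nodes) {
                    createdVmIds.push(vmApi.startVm(node));
                }
            } catch (RuntimeException e) {
                rollback(createdVmIds);
                throw e;
            }
        }

        private void rollback(Deque<String> createdVmIds) {
            while (!createdVmIds.isEmpty()) {
                String vmId = createdVmIds.pop();
                try {
                    vmApi.destroyVm(vmId); // could retry a few times here
                } catch (RuntimeException cleanupFailure) {
                    // Don't let one failed cleanup abort the rest; report it instead.
                    System.err.println("Cleanup failed for VM " + vmId + ": " + cleanupFailure);
                }
            }
        }

        // Hypothetical interfaces standing in for the real VM API and node model.
        interface VmApi {
            String startVm(NodeSpec spec);
            void destroyVm(String vmId);
        }

        interface NodeSpec { }
    }

In a real system you would persist the progress log, so the cleanup can still run even if the orchestrating process itself dies mid-build.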
If there are shared resources involved then you would need to implement some kind of isolation mechanism as well. Sometimes this is easy enough--e.g., DHCP helps you avoid duplicate IPs. If you're updating a DNS zone file then you'd want to synchronize access to that to avoid concurrent writes. Etc.

Decision to go for distributed application?

I have a legacy product in the financial domain, running on Tomcat 6. We get millions of requests, about 10k requests per hour. I am wondering at a high level whether
I should go for a distributed application where my MVC component is on one system and the service/DAO layer on another box (possibly using Spring Remoting or EJB).
The reason I am planning to go in this direction is to distribute the load and get better performance; it also becomes scalable that way.
I only see the positive side of it, but somehow I am not able to figure out its negative aspects.
If some expert can help:
what are the criteria I should consider for going to a distributed model, and what are its pros and cons? I also tried googling for stats
on how much load a given web server (Tomcat in my case) can handle efficiently with given hardware (16 GB RAM, Windows 7, processor).
Yes, I am going to do a POC where I will measure performance with the distributed model vs. without, but even high-level input would be highly appreciated.
It is impossible to answer this question without more details: how long does it take to reply to one request on the current server? How many resources are allocated to one request?
Having 10k requests per hour means ~3 requests per second. If performing the necessary operations and replying to a request takes ~300 ms on one CPU, one simple machine is totally fine. This is simple math and doesn't always hold; I would guess you still have peaks within those 10k requests per hour and that they aren't evenly distributed.
If we assume one reply can take up to 1 second, then you can handle as many replies per second as your system has CPUs (given that the CPU is the bottleneck). If the CPU isn't the bottleneck for your application server, there's probably something wrong. You should set up the database(s) on a different machine and perform only computation tasks on the application server machine.
Especially in the financial sector with legacy software, I wouldn't try splitting a running product. How old is the current server? I believe a new server should be cheaper than rewriting the application. Unless you expect 50-100k requests per hour very soon, I don't think splitting out such small parts makes sense.
Instead, run it on up-to-date server hardware, split the application server from the data storage, and you should be fine.
I am wondering at a high level whether I should go for a distributed application where my MVC component is on one system and the service/DAO layer on another box (possibly using Spring Remoting or EJB).
I'm not sure what you mean by "system" in this context, but if it means that you are planning to run your application on two servers,
one dedicated to the presentation layer and the other to the business layer, keep in mind that a simpler approach (and probably a more suitable one for your app)
is to build a co-located architecture.
Basically, the idea is to replicate your app on several servers (at least two) and put a load balancer in front of them that routes incoming requests among the available servers.
All servers share the same database instance. This gives you horizontal scalability and also improves the availability of your system.
I only see the positive side of it, but somehow I am not able to figure out its negative aspects.
Distributing your business logic will probably involve refactoring your application code, and if the system is working well now, you will surely introduce some bugs.
The necessary remote calls will add latency, and the fact that you execute your business logic on several servers doesn't solve the performance problems of the presentation tier.
In Expert One-on-One J2EE Development Without EJB (p. 65), you can find a good discussion of why not to distribute your business logic.

In HTTP mode, does node.js have a substantial performance advantage over Java?

I have been coding in node.js for a little while now. Here is one of my questions about it:
In HTTP apps, given the request-response model, the single app thread is blocked until all the back-end tasks are done and the response is returned to the client, so the performance improvement seems limited to fine-tuning back-end things such as parallelizing IO requests. (This improvement matters when many heavy, independent IO operations are involved, but that condition usually also implies that by redesigning the data structures you could eliminate a large number of the IO requests and possibly end up with even better performance than just issuing them in parallel.)
If that is true, how could it deliver performance superior to frameworks based on Java (or PHP, Python, etc.)?
I also referred to an article Understanding the node.js event loop, which also explains that situation:
It really is a single thread running: you can’t do any parallel code
execution; doing a “sleep” for example will block the server for one
second:
while(new Date().getTime() < now + 1000) {
// do nothing
}
…however, everything runs in parallel except your code.
I personally verified that by putting exactly that "sleep" code into an IO callback closure, submitting a request that triggers the callback, and then submitting another one. Each request logs to the console when it is processed, and my observation was that the later request was blocked until the former had returned its response.
So, does it imply that only in socket mode, where both sides can emit events and push messages to each other at any time, would the full power of its asynchronous processing capability be utilized?
I'm a little confused about that. Any comment or advice is welcome. Thanks!
update
I ask this question because some performance evaluation cases have been reported, for instance Node.js is taking over the Enterprise – whether you like it or not, and LinkedIn Moved from Rails to Node: 27 Servers Cut and Up to 20x Faster.
Some radical opinions claim that J2EE will be totally replaced: J2EE is Dead: Long-live Javascript Backed by JSON Services.
NodeJS uses libuv, so IO operations are non-blocking. Yes, your Node app uses one thread; however, all IO requests are pushed onto an event queue. When a request is made, its response obviously cannot be read from the socket, file, etc. at time zero. So whatever is ready in the queue is popped and handled; in the meantime, more of your requests get answered, and chunks or complete data become ready to read, simply waiting in the queue to be processed. This goes on until no events remain and the open sockets are closed; only then does the NodeJS process finally end its execution.
As you can see, NodeJS is quite unlike other frameworks. If you have a long-running, non-IO operation that blocks, such as matrix operations or image and video processing, you can spawn other processes and assign them the job, using message passing in whatever form you like: TCP, IPC, etc.
The main point of NodeJS is to remove unnecessary context switches, which bring significant overhead when used improperly. In NodeJS, why would you want context switches? All the jobs are pushed onto the event queue, and they are probably small in computation, since all they do is issue multiple IOs (read from the DB, update the DB, write to the client, write to a bare TCP socket, read from a cache); it is not logical to stop them in the middle and switch to another job. So, with the help of libuv, whichever IO is ready can be processed right away.
For reference please look at libuv documentation: http://nikhilm.github.io/uvbook/basics.html#event-loops
I have also noticed a lot of radical opinions regarding Node.js performance when compared to Java. From a queuing theory perspective, I was skeptical that a single thread with no blocking could outperform multiple threads that block, so I decided to conduct my own investigation into just how well Node.js performs against a more established and mature technology.
I evaluated Node.js by writing a functionally identical, multiple-datasource micro-service both in Node.js and in DropWizard/Java, then subjected both implementations to the same load test. I collected performance measurements from both tests and analyzed the data.
At one fifth the code size, Node.js had comparable latency and 16% lower throughput than DropWizard.
I can see how Node.js has caught on with early stage start-up companies. It is easier to write micro-services very quickly in Node.js and get them running than it is with Java. As companies mature, their focus tends to shift from finding product / market fit to improving economies of scale. This might explain why more established companies prefer Java with its higher scalability.
As far as my (admittedly brief) experience with node.js goes, I agree that the performance of a node.js server cannot be compared with that of other web servers like Tomcat, as stated somewhere in the node.js docs:
It really is a single thread running: you can’t do any parallel code
execution; doing a “sleep” for example will block the server for one
second:
So we used it not as an alternative to a full-fledged web server like Tomcat, but just to take some load off Tomcat where the single-thread model was acceptable. So it has to be a trade-off somewhere.
Also see http://www.sitepoint.com/node-js-is-the-new-black/. That's a beautiful article about node.js.

Prevent client from overloading server?

I have a Java servlet that's getting overloaded by client requests during peak hours. Some clients spawn concurrent requests. Sometimes the number of requests per second is just too great.
Should I implement application logic to restrict the number of requests a client can send per second? Does this need to be done at the application level?
The two most common ways of handling this are to turn away requests when the server is too busy, or to handle each request more slowly.
Turning away requests is easy; just run a fixed number of instances. The OS may or may not queue up a few connection requests, but in general the users will simply fail to connect. A more graceful way of doing it is to have the service return an error code indicating the client should try again later.
Handling requests more slowly is a bit more work, because it requires separating the servlet that handles the requests from the class that does the work, running in a different thread. You can have a larger number of servlets than worker bees. When a request comes in, the servlet accepts it, waits for a worker bee, grabs it and uses it, frees it, then returns the results.
The two can communicate through one of the classes in java.util.concurrent, like LinkedBlockingQueue or ThreadPoolExecutor. If you want to get really fancy, you can use something like a PriorityBlockingQueue to serve some customers before others.
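A minimal sketch of that servlet/worker-bee split using a bounded ThreadPoolExecutor; the pool and queue sizes here are placeholders you would tune:

    import java.util.concurrent.Callable;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.RejectedExecutionException;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class ThrottledExecutor {

        // 10 worker bees, and at most 50 requests waiting in line.
        private final ThreadPoolExecutor workers = new ThreadPoolExecutor(
                10, 10,
                0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(50),
                new ThreadPoolExecutor.AbortPolicy());

        public String handle(Callable<String> task) throws Exception {
            try {
                // The servlet thread blocks here until a worker finishes the job.
                return workers.submit(task).get();
            } catch (RejectedExecutionException tooBusy) {
                // Queue is full: tell the client to come back later (e.g. send HTTP 503).
                throw new IllegalStateException("Server busy, retry later", tooBusy);
            }
        }
    }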
Me, I would throw more hardware at it like Anon said ;)
Some solid answers here. I think more hardware is the way to go. Having too many clients or traffic is usually a good problem to have.
However, if you absolutely must throttle clients, there are some options.
The most scalable solutions that I've seen revolve around a distributed caching system, like Memcached, and using integers to keep counts.
Figure out a rate at which your system can handle traffic. Either overall, or per client. Then put a count into memcached that represents that rate. Each time you get a request, decrement the value. Periodically increment the counter to allow more traffic through.
For example, if you can handle 10 requests/second, put in a count of 50 every 5 seconds, up to a maximum of 50. That way you aren't refilling it all the time, but you can also handle a bit of bursting, limited to a window. You will need to experiment to find a good refresh rate. The key for this counter can either be global or based on user ID if you need to restrict per user.
The nice thing about this system is that it works across an entire cluster AND the mechanism that refills the counters need not be in one of your current servers. You can dedicate a separate process for it. The loaded servers only need to check it and decrement it.
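Here is a single-process sketch of that counter scheme, using the numbers from the 10 requests/second example above; the AtomicLong merely stands in for the distributed memcached counter:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    public class CounterThrottle {

        private static final long MAX_TOKENS = 50;      // burst ceiling
        private static final long REFILL = 50;          // tokens added per refill
        private static final long REFILL_PERIOD_S = 5;  // 50 per 5 s = 10 req/s

        // Stand-in for the distributed counter; in a cluster this would be
        // memcached's atomic incr/decr on a shared key.
        private final AtomicLong tokens = new AtomicLong(MAX_TOKENS);

        public CounterThrottle() {
            ScheduledExecutorService refiller = Executors.newSingleThreadScheduledExecutor();
            refiller.scheduleAtFixedRate(
                    () -> tokens.updateAndGet(t -> Math.min(MAX_TOKENS, t + REFILL)),
                    REFILL_PERIOD_S, REFILL_PERIOD_S, TimeUnit.SECONDS);
        }

        /** Returns true if the request may proceed, false if it should be throttled. */
        public boolean tryAcquire() {
            if (tokens.decrementAndGet() < 0) {
                tokens.incrementAndGet(); // undo; no token was available
                return false;
            }
            return true;
        }
    }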
All that being said, I'd investigate other options first. Throttling your customers is usually a good way to annoy them. Most probably NOT the best idea. :)
I'm assuming you're not in a position to increase capacity (either via hardware or software), and you really just need to limit the externally-imposed load on your server.
Dealing with this from within your application should be avoided unless you have very special needs that are not met by the existing solutions out there, which operate at HTTP server level. A lot of thought has gone into this problem, so it's worth looking at existing solutions rather than implementing one yourself.
If you're using Tomcat, you can configure the maximum number of simultaneous requests allowed via the maxThreads and acceptCount settings. Read the introduction at http://tomcat.apache.org/tomcat-6.0-doc/config/http.html for more info on these.
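For example, the HTTP connector in conf/server.xml can be tuned like this (the values shown are only illustrative):

    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               maxThreads="200"
               acceptCount="100" />

maxThreads caps the number of requests processed concurrently, and acceptCount is the length of the queue for connections waiting beyond that before new ones are refused.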
For more advanced controls (like per-user restrictions), if you're proxying through Apache, you can use a variety of modules to help deal with the situation. A few modules to google for are limitipconn, mod_bw, and mod_cband. These are quite a bit harder to set up and understand than the basic controls that are probably offered by your appserver, so you may just want to stick with those.

Critically efficient server

I am developing a client-server application for financial alerts, where the client can set a value as the alert threshold for a chosen financial instrument, and when this value is reached the monitoring server will somehow alert the client (email, SMS... not important). The server will monitor updates that come from a data generator program. Now, the server has to be very efficient, as it has to handle many clients (possibly over 50,000-100,000 alerts, with updates coming every 1-2 seconds). I've written servers before, but never with such demanding performance requirements, and I'm simply afraid that a basic approach (like before) just won't do it. So how should I design the server? What kinds of data structures are best suited? What about multithreading? In general, what should I do (and what should I not do) to squeeze every drop of performance out of it?
Thanks.
I've worked on servers like this before. They were all written in C (or fairly simple C++), but they delivered even higher performance, handling 20K updates per second (all the updates from most major stock exchanges).
We focused on not copying memory around and were very careful about which STL classes we used. As for updates, each financial instrument was an object, and any client that wanted to hear about that instrument would subscribe to it (i.e. get added to a list).
The server was multi-threaded, but not heavily so: maybe one thread handling incoming updates, one handling outgoing client updates, and one handling client subscribe/release notifications (I don't remember that part exactly; I just remember it had fewer threads than I would have expected, but more than one).
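The original servers were C/C++, but the subscription idea translates directly; here is a rough Java sketch (Client and Update are hypothetical types):

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    public class Instrument {

        private final String symbol;
        // A concurrent set, so subscribes/unsubscribes don't block the update path.
        private final Set<Client> subscribers = ConcurrentHashMap.newKeySet();

        public Instrument(String symbol) {
            this.symbol = symbol;
        }

        public void subscribe(Client client)   { subscribers.add(client); }
        public void unsubscribe(Client client) { subscribers.remove(client); }

        /** Called by the thread that handles incoming market updates. */
        public void onUpdate(Update update) {
            // Fan the update out only to clients subscribed to this instrument.
            for (Client client : subscribers) {
                client.send(symbol, update);
            }
        }

        interface Client { void send(String symbol, Update update); }
        interface Update { }
    }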
EDIT: Oh, and before I forget, the number of financial transactions happening is growing at an exponential rate. That 20K/sec server was just barely keeping up and the architects were getting stressed about what to do next year. I hear all major financial firms are facing similar problems.
You might want to look into using a proven message queue system, as it sounds like this is basically what you are doing in your application.
Projects like Apache's ActiveMQ or RabbitMQ are already widely used and highly tuned, and should be able to support the type of load you are talking about out of the box.
I would think that squeezing every drop of performance out of it is not what you want to do, as you really never want that server to be under load significant enough to take it out of a real-time response scenario.
Instead, I would use a separate machine to handle messaging clients, and let that main, critical server focus directly on processing input data in "real time" to watch for alert criteria.
Best advice is to design your server so that it scales horizontally.
This means distributing your input events to one or more servers (on the same or different machines), that individually decide whether they need to handle a particular message.
Will you be supporting 50,000 clients on day 1? Then that should be your focus: how easily can you define a single client's needs, and how many clients can you support on a single server?
Second-best advice is not to artificially constrain yourself. If you say "we can't afford to have more than one machine," then you've already set yourself up for failure.
Beware of any architecture that needs clustered application servers to get a reasonable degree of performance. The London Stock Exchange had just such a problem recently when they pulled an existing Tandem-based system and replaced it with clustered .Net servers.
You will have a lot of trouble getting this type of performance from a single Java or .Net server; really, you need to consider C or C++. A clustered architecture is much more error-prone to build and deploy, and it is harder to guarantee uptime.
For really high volumes you need to think in terms of asynchronous I/O for networking (i.e. poll(), select(), and asynchronous writes, or their Windows equivalents), possibly with a pool of worker threads. Read up on the C10K problem for more insight into this.
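In Java, the poll()/select() style maps to java.nio.channels.Selector. A bare-bones skeleton, with echo behavior and no error handling just to keep the sketch short:

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;

    public class AsyncServer {
        public static void main(String[] args) throws IOException {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(9000));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            ByteBuffer buffer = ByteBuffer.allocate(4096);
            while (true) {
                selector.select(); // blocks until at least one channel is ready
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buffer.clear();
                        if (client.read(buffer) < 0) {
                            client.close(); // peer disconnected
                        } else {
                            buffer.flip();
                            client.write(buffer); // real code would hand off to workers
                        }
                    }
                }
                selector.selectedKeys().clear();
            }
        }
    }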
There is a very mature C++ framework called ACE (Adaptive Communications Environment) which was designed for high volume server applications in telecommunications. It may be a good foundation for your product - it has support for quite a variety of concurrency models and deals with most of the nuts and bolts of synchronisation within the framework. You might find that the time spent learning how to drive this framework pays you back in less development and easier implementation and testing.
One thread receives the instrument updates, processes each update, and puts it on a BlockingQueue.
One thread takes each update from the BlockingQueue and hands it off to the process that handles that instrument or set of instruments. That process needs to serialize the events for an instrument so the customer never receives notices out of order.
The same process (thread) then needs to iterate through the list of customers registered to receive notifications, build the list of customers who should be notified based on their criteria, and hand that list off to another process that notifies the customers of the change.
The notification process should iterate through the list and send each notification event to yet another process that handles how the customer wants to be notified (email, etc.).
One of the problems, with 100,000 customers, will be synchronizing access to the list of customers and their monitored criteria. (A skeleton of the first two stages is sketched below.)
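A skeleton of the first two stages described above (InstrumentUpdate, the feed reader, and the per-instrument routing are hypothetical placeholders):

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.function.Consumer;

    public class UpdatePipeline {

        private final BlockingQueue<InstrumentUpdate> queue = new LinkedBlockingQueue<>();

        public void start() {
            // Stage 1: receive instrument updates and enqueue them.
            Thread receiver = new Thread(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    queue.offer(readNextUpdateFromFeed());
                }
            }, "receiver");

            // Stage 2: drain the queue and hand each update to the handler for
            // that instrument, preserving per-instrument ordering.
            Thread dispatcher = new Thread(() -> {
                try {
                    while (true) {
                        InstrumentUpdate update = queue.take(); // blocks while idle
                        handlerFor(update).accept(update);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "dispatcher");

            receiver.start();
            dispatcher.start();
        }

        private InstrumentUpdate readNextUpdateFromFeed() {
            // Placeholder: a real implementation would block on the market-data feed.
            try {
                Thread.sleep(1_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return new InstrumentUpdate() { };
        }

        private Consumer<InstrumentUpdate> handlerFor(InstrumentUpdate update) {
            return u -> { }; // placeholder: route to the per-instrument handler
        }

        interface InstrumentUpdate { }
    }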
You should try to find a way to organize the alerts as a tree so you can quickly decide which alerts can be triggered by an update.
For example, let's assume the alert is on the level of a certain indicator, which can have a range of 0..n. I would group the clients who want to be notified about the level of that indicator in a sort of binary tree. That way you can scale it properly (you could actually implement a subtree as a process on a different machine), and the number of comparisons required to find the proper subset of clients will always be logarithmic.
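As a sketch of that idea, an ordered map keyed by threshold gives the same logarithmic behavior (a ConcurrentSkipListMap rather than a literal binary tree; Alert is a hypothetical type):

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentSkipListMap;
    import java.util.concurrent.CopyOnWriteArrayList;

    public class ThresholdIndex {

        // threshold -> alerts that fire when the indicator reaches that level
        private final ConcurrentSkipListMap<Double, List<Alert>> byThreshold =
                new ConcurrentSkipListMap<>();

        public void register(double threshold, Alert alert) {
            byThreshold.computeIfAbsent(threshold, t -> new CopyOnWriteArrayList<>())
                       .add(alert);
        }

        /** Fires every alert whose threshold lies inside the move from oldValue to newValue. */
        public void onUpdate(double oldValue, double newValue) {
            double lo = Math.min(oldValue, newValue);
            double hi = Math.max(oldValue, newValue);
            // O(log n + matches): only crossed thresholds are visited,
            // never all 100,000 registered alerts.
            for (Map.Entry<Double, List<Alert>> entry :
                    byThreshold.subMap(lo, false, hi, true).entrySet()) {
                entry.getValue().forEach(Alert::fire);
            }
        }

        interface Alert { void fire(); }
    }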
The Apache MINA network application framework, together with Apache Camel for message routing, is probably a good starting point. The Kilim message-passing framework also looks very promising.
