Handling multiple requests efficiently in a REST API - Java

I've built a REST API using Spring Boot that accepts two images via POST and performs image comparison on them. The API is invoked synchronously. I'm not using an external application server to host the service; rather, I package it as a jar and run it.
@RequestMapping(method = RequestMethod.POST, value = "/arraytest")
public String compareTest(@RequestParam("query") MultipartFile queryFile,
                          @RequestParam("test") MultipartFile testFile,
                          RedirectAttributes redirectAttributes, Model model) throws IOException {
    CoreDriver driver = new CoreDriver();
    boolean imageResult = driver.initProcess(queryFile, testFile);
    model.addAttribute("result", imageResult);
    return "resultpage";
}
The service could be invoked in parallel from multiple machines, and I need it to perform efficiently, so I'm trying to understand how parallel calls to a REST service are handled.
When requests are sent to the service, is a single object of the service created and shared across multiple threads to handle the requests?
A follow-up question: is it possible to improve the performance of a service from the perspective of request handling, rather than by improving the performance of the service's functionality itself?

Spring controllers (and most Spring beans) are singletons, i.e. there is a single instance in your application and it handles all requests.
Assuming this is not WebSockets (and if you don't know what that means, it's probably not), servlet containers typically maintain a thread pool, and will take a currently unused thread from the pool and use it to handle the request.
You can tune this by, for example, changing some aspects of the thread pool (initial threads, max threads, etc...). This is servlet container configuration (i.e. configuring Tomcat/Jetty/whatever you're using), not Spring per se.
You can also tune other HTTP aspects such as compression. This can usually be done via the container, but if I recall correctly Spring offers a servlet filter that will do this.
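As a concrete illustration, with Spring Boot's embedded Tomcat both the thread pool and compression can be tuned from application.properties. The property names below are the Spring Boot 2.3+ forms (older versions used e.g. server.tomcat.max-threads); the values are placeholders you would need to load-test for your workload:

```properties
# Embedded Tomcat worker pool
server.tomcat.threads.min-spare=20
server.tomcat.threads.max=200
# Connections queued when all worker threads are busy
server.tomcat.accept-count=100

# HTTP response compression
server.compression.enabled=true
server.compression.mime-types=application/json,text/html
server.compression.min-response-size=2048
```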
The image library and image operations you perform will also matter. Many libraries decode the image into a raw bitmap in memory in order to perform operations on it. This means a 3 MB JPEG can take upwards of 100 MB of heap space. The implication is that you may need some kind of semaphore to limit concurrent image processing.
The best approach here is to experiment with different libraries and see what works best for your use case. Hope this helps.

The controller will be a singleton, but there are ways to make the processing asynchronous, such as a thread pool or JMS. As long as you return a key and provide a service the clients can poll to fetch the result later, you can scale out the back-end processing. You can also have multiple nodes.
Besides, you can cluster your app so there are more nodes available to process requests. Also, cache results if possible: if 30% or more of the requests share the same input (and therefore the same output), caching pays off.
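A minimal sketch of that return-a-key-and-poll pattern using only java.util.concurrent. The class and method names (JobStore, submit, poll) are illustrative, not from the original post:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.*;

public class JobStore {
    private static final ExecutorService pool = Executors.newFixedThreadPool(4);
    private static final Map<String, Future<Boolean>> jobs = new ConcurrentHashMap<>();

    // Submit the comparison and immediately hand back a key the client can poll with.
    public static String submit(Callable<Boolean> comparison) {
        String key = UUID.randomUUID().toString();
        jobs.put(key, pool.submit(comparison));
        return key;
    }

    // Poll endpoint: null while still running, otherwise the computed result.
    public static Boolean poll(String key) throws Exception {
        Future<Boolean> f = jobs.get(key);
        return (f != null && f.isDone()) ? f.get() : null;
    }
}
```

The controller would call submit() in the POST handler and expose poll() behind a GET endpoint keyed by the returned id.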

Related

Limit memory or process or thread pool resources to a spring controller

I have a Spring application with 3 controllers, each supporting some set of use cases, say:
SmallController - supports 1 use case
MediumController - supports 3 use cases
LargeController - supports 20 use cases
The problem is: if I end up getting an extensive number of requests to the SmallController, say 1000 TPS, consuming 50-60% of my resources, will it end up starving my remaining 23 use cases?
If so, is there a way to configure my Spring application such that a surge in requests to the SmallController does not allocate it resources like memory/threads beyond a certain predefined value, so that the MediumController and LargeController don't begin to starve?
Basically, if I have 100 MB of memory and, let's say, a 100-thread pool limit, is it possible to prevent the SmallController from exceeding 50 MB of memory and, say, 40 threads at most, while the remaining resources are guaranteed for the MediumController and LargeController?
And if no existing tool offers such controlled use of resources, can someone suggest an approach to explore for building one?
My suggestion is to introduce throttling (rate limiting).
You can roll your own implementation or use something like Bucket4j (see https://www.baeldung.com/spring-bucket4j for an example).
If you don't want to pollute the source code of your controllers, you can do this at the MVC interceptor level (a cleaner solution: handle it in the interceptor's preHandle method).
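For the roll-your-own route, a minimal token-bucket limiter can look like the sketch below (Bucket4j provides a hardened version of the same idea, plus distributed back-ends). An interceptor's preHandle would call tryAcquire() and return a 429 when it fails:

```java
// Minimal token-bucket rate limiter; capacity and refill rate are per-controller knobs.
public class RateLimiter {
    private final int capacity;
    private final double refillPerMillis;
    private double tokens;
    private long lastRefill;

    public RateLimiter(int capacity, int tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerMillis = tokensPerSecond / 1000.0;
        this.tokens = capacity;
        this.lastRefill = System.currentTimeMillis();
    }

    // Returns false when the bucket is empty -> caller responds 429 Too Many Requests.
    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerMillis);
        lastRefill = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }
}
```

Giving the SmallController its own RateLimiter instance caps its request rate without touching the other controllers.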
Java has no concept of amount-of-memory owned by a specific thread. The memory is owned by the whole process. Running a heap analyzer on a heap dump may allow you to attribute allocations to a specific thread or thread pool but that's an offline analysis that can't be cheaply performed at runtime.
So if you want to partition resources you should start up multiple applications and set resource limits for each.
Memory is managed at the runtime level, not the controller level. Let's back up: you want to manage resources so that the app stays responsive.
Doing this at the app level is going to give you headaches, even where it's possible. It really sounds like you want to implement an API usage plan, so you can throttle or reject requests that would overload your system and keep it responsive. Hopefully you have this available in one flavour or another (AWS API Gateway, Kong, etc.).
Otherwise you may want to consider deploying your app with different profiles so that controllers run on different boxes to isolate failures and keep the app responsive, or breaking it up into separate microservices altogether. This should yield better performance and give you the ability to scale out the separate parts of the app.
I know these answers assume that you have those options available; hopefully you do.
If you want to limit the number of concurrent requests in your SmallController, you can try using a Semaphore. Please note this is just a suggestion; you can explore other options if it doesn't fit the requirement.
@RestController
public class SmallController {

    private final Semaphore semaphore = new Semaphore(10); // no. of concurrent calls allowed

    @GetMapping("/someaction")
    public ResponseEntity<String> action() throws InterruptedException {
        semaphore.acquire(); // blocks the request until a permit is available
        try {
            // resource-intensive operation
            return ResponseEntity.ok("done");
        } finally {
            semaphore.release();
        }
    }
}
You can use semaphore.tryAcquire() if you want a non-blocking semaphore: check whether it returns false and, if so, respond with 'resource busy'.

How does a Spring MVC controller handle multiple long HTTP requests?

As I found, controllers in Spring are singletons (see "Are Spring MVC Controllers Singletons?").
The question is: how does Spring handle multiple time-consuming requests to the same mapping? For example, when we want to return a model which requires long calculations or a connection to another server, and there are a lot of users sending requests to the same URL?
I assume async threads are not a solution, because the method needs to end before the next request can be serviced? Or not..?
Requests are handled using a thread pool (container-managed), so each request has an independent context; it does not matter whether the controller is a singleton or not.
One important thing is that singleton instances MUST NOT share state between requests, to avoid unexpected behaviour and race conditions.
The thread pool capacity defines the number of requests the server can handle in the sync model.
If you want an async approach you could use many options, like:
1. Having an independent thread pool that processes tasks handed off from container threads, or
2. Using a queue to push tasks onto and a scheduler to process them, or
3. Using WebSockets to make requests, using (1) or (2) for the processing, and receiving a notification when done.
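The independent-thread-pool option can be sketched with plain java.util.concurrent: the container thread hands the slow work to a separate pool and is free immediately. In Spring MVC the controller would typically return this through a Callable or DeferredResult; the pool size here is an arbitrary example:

```java
import java.util.concurrent.*;

public class AsyncHandoff {
    // Independent pool, sized separately from the container's request threads.
    private static final ExecutorService workers = Executors.newFixedThreadPool(8);

    public static CompletableFuture<String> handle(String input) {
        return CompletableFuture.supplyAsync(() -> {
            // long calculation or remote call goes here
            return "processed:" + input;
        }, workers);
    }
}
```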

How to optimize Tomcat for Feed pull

We have a mobile app which presents a feed to users. The feed REST API is implemented on Tomcat, which makes parallel calls to different data sources, such as Couchbase and MySQL, to assemble the content. Simplified code is given below:
Future<List<CardDTO>> pnrFuture = null;
Future<List<CardDTO>> newsFuture = null;
ExecutionContext ec = ExecutionContexts.fromExecutorService(executor);
final List<CardDTO> combinedDTOs = new ArrayList<CardDTO>();

// Array list of futures
List<Future<List<CardDTO>>> futures = new ArrayList<Future<List<CardDTO>>>();
futures.add(future(new PNRFuture(pnrService, userId), ec));
futures.add(future(new NewsFuture(newsService, userId), ec));
futures.add(future(new SettingsFuture(userPreferenceManager, userId), ec));

Future<Iterable<List<CardDTO>>> futuresSequence = sequence(futures, ec);

// combine the cards
Future<List<CardDTO>> futureSum = futuresSequence.map(
    new Mapper<Iterable<List<CardDTO>>, List<CardDTO>>() {
        @Override
        public List<CardDTO> apply(Iterable<List<CardDTO>> allDTOs) {
            for (List<CardDTO> cardDTOs : allDTOs) {
                if (cardDTOs != null) {
                    combinedDTOs.addAll(cardDTOs);
                }
            }
            Collections.sort(combinedDTOs);
            return combinedDTOs;
        }
    }
);

Await.result(futureSum, Duration.Inf());
return combinedDTOs;
Right now we have around 4-5 parallel tasks per request, but this is expected to grow to almost 20-25 parallel tasks as we introduce new kinds of items into the feed.
My question is: how can I improve this design, and what kind of tuning is required in Tomcat to make sure 20-25 such parallel calls can be served optimally under heavy load?
I understand this is a broad topic, but any suggestions would be very helpful.
Tomcat just manages the incoming HTTP connections and pushes the bytes back and forth. There is no Tomcat optimization that can make your application run any better.
If you need 25 parallel processes to run for each incoming HTTP request, then yes, that is crazy, and you need to re-think how your application works.
No Tomcat configuration will help with what you've presented in your question.
I understand you are calling this from a mobile app and the number of feeds could go up.
Based on the amount of data being returned, would it be possible to return the results of several feeds in the same call?
That way the server does the work: you are in control of the server, while you are not in control of the user's device and their connection speed.
As nickebbit suggested, things like DeferredResult are really easy to implement.
Is it possible that the data from these feeds is not updated very quickly? If so, you should investigate the use of EHCache and the @Cacheable annotation.
You could come up with a solution where the user always pulls a cached version of your content from your Tomcat server, while the server constantly updates that cache in the background.
It's an extra piece of work, but at the end of the day, if the user experience is not fast, users will not want to use the app.
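The cache-plus-background-refresh pattern can be sketched with the standard library alone (with Spring you would instead combine @Cacheable with a @Scheduled refresher). The class name and refresh period here are illustrative:

```java
import java.util.concurrent.*;
import java.util.function.Supplier;

// Clients always read a pre-computed copy; a scheduler rebuilds it in the background.
public class RefreshingCache<T> {
    private volatile T value;
    private final Supplier<T> loader;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public RefreshingCache(Supplier<T> loader, long periodSeconds) {
        this.loader = loader;
        this.value = loader.get(); // warm the cache up front
        scheduler.scheduleAtFixedRate(this::refresh, periodSeconds, periodSeconds,
                TimeUnit.SECONDS);
    }

    public void refresh() {
        value = loader.get(); // the only place the expensive feed assembly runs
    }

    public T get() {
        return value; // always fast, never blocks on the loader
    }
}
```

The request path then serves get() directly, so a slow data source delays the next refresh rather than every user request.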
It looks like you're using Akka but not really embracing the actor model; doing so will likely increase the parallelism, and therefore the scalability, of your app.
If it was me I'd hand requests off from my REST API to a single coordinating actor (or a pool of them) that processes each request asynchronously. Using Spring's RestController this can be done with a Callable or DeferredResult, but there will obviously be an equivalent in whatever framework you are using.
The coordinating actor would then in turn hand off processing to other actors (i.e. workers) that take care of the I/O-bound tasks (preferably using their own dispatcher, to ensure CPU-bound threads do not get blocked) and respond to the coordinator with their results.
Once all workers have fetched their data and replied to the coordinator, the original request can be completed with the full result set.

REST client multi threaded application

I am working on a Java application which takes SOAP requests on one end, with 1 to 50 unique IDs per request. I use the unique IDs from the request to make a REST call, process the response, and send the processed data back as a SOAP response. Performance takes a hit when I get all 50 unique IDs, since I am then calling the REST service 50 times sequentially.
My questions are:
Will I get performance benefits if I make my application multi-threaded, spawning new threads to make the REST calls when I get a higher number of unique IDs?
If so, how should I design the multi-threading: use multiple threads only for making the REST calls, or also process the REST response data in multiple threads and merge the data after it is processed?
I searched for a multithreaded implementation of the Apache REST client but could not find one. Can anyone point me in the right direction?
I'm using Apache HttpClient.
Thanks in advance.
It's most likely worth doing. Assuming you're getting multiple concurrent SOAP requests, your throughput won't improve, but your latency will.
You probably want a thread pool, so you have control over how many threads/REST calls you're making at the same time: create a ThreadPoolExecutor (you can use Executors.newFixedThreadPool or Executors.newCachedThreadPool), create a Callable task for constructing/processing each REST call, and then call ThreadPoolExecutor.invokeAll() with the list of tasks. Then iterate over the returned list of futures and construct the SOAP response from them.
See prior discussions on using Apache HTTP Client with multiple threads.
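A sketch of that thread-pool approach is below. The fetch() body is a placeholder for the real Apache HttpClient call (HttpClient is thread-safe when you share one instance backed by a pooling connection manager); pool size and names are illustrative:

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelRestCalls {
    private static final ExecutorService pool = Executors.newFixedThreadPool(10);

    // Stand-in for httpClient.execute(...) plus response processing.
    static String fetch(String id) {
        return "result-" + id;
    }

    public static List<String> fetchAll(List<String> ids)
            throws InterruptedException, ExecutionException {
        List<Callable<String>> tasks = new ArrayList<>();
        for (String id : ids) {
            tasks.add(() -> fetch(id));
        }
        List<String> results = new ArrayList<>();
        // invokeAll blocks until every task completes, preserving input order.
        for (Future<String> f : pool.invokeAll(tasks)) {
            results.add(f.get());
        }
        return results; // build the SOAP response from this list
    }
}
```

With 50 IDs and a pool of 10, the calls run 10 at a time instead of strictly one after another, cutting latency roughly fivefold when the REST service is the bottleneck.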

Queuing / Worker Thread architecture for a single java process

I have the following problem to solve.
I need to write a Java program that:
reads JSON objects j1, j2, ..., jn from a web service,
does some number crunching on each object to come up with j1', j2', ..., jn', and
sends the objects j1', j2', ..., jn' to a web service.
The computational and space requirements for steps 1, 2, and 3 can vary at any given time.
For example:
The time it takes to process a JSON object in step 2 can vary depending on the contents of the object.
The rate of objects being produced by the web service in step 1 can go up or down with time.
The consuming web service in step 3 can get backlogged.
To address the above design concerns, I want to implement the following architecture:
Read JSON objects from the external web service and place them on a queue.
An automatically size-adjusting worker thread pool consumes JSON objects from the queue and processes them, placing the resulting objects on a second queue.
An automatically size-adjusting worker thread pool consumes JSON objects from the second queue and sends them to the consuming web service.
Question:
I am curious if there is framework which I can use to solve this problem?
Notes:
I could solve this using a range of components such as custom queues and thread pools from the concurrency package; however, I'm looking for a framework that makes writing such a solution simpler.
This is not going to live inside a container. It will be a Java process whose entry point is public static void main(String[] args). However, if there is a container suited to this paradigm, I would like to learn about it.
I could split this into multiple processes, but I'd like to keep it very simple and in a single process.
Thanks.
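The two-queue architecture above can be sketched with plain java.util.concurrent primitives. Bounded queues give natural back-pressure (stage 2 blocks when stage 3 is backlogged); the pool sizes here are fixed placeholders, though ThreadPoolExecutor's setCorePoolSize would let you build the auto-sizing yourself:

```java
import java.util.concurrent.*;

public class Pipeline {
    public static final BlockingQueue<String> inbound = new LinkedBlockingQueue<>(1000);
    public static final BlockingQueue<String> outbound = new LinkedBlockingQueue<>(1000);

    // Stand-in for the real number crunching of step 2.
    static String crunch(String json) {
        return json + "'";
    }

    // Stage 2: workers drain the first queue and feed the second.
    public static void startCrunchers(int threads) {
        ExecutorService crunchers = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            crunchers.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    String j = inbound.take();   // blocks while the first queue is empty
                    outbound.put(crunch(j));     // blocks while the second queue is full
                }
                return null; // Callable form, so take()/put() may throw InterruptedException
            });
        }
    }
}
```

Stage 1 would put() fetched objects onto inbound, and a symmetric pool would drain outbound toward the consuming web service.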
Try Apache Camel or Spring Integration to wire things up. These are integration frameworks and will ease your interaction with web services. What you need to do is define a route from web service 1 -> number cruncher -> web service 2; the routing and conversion required in between can be handled by the framework itself.
You'd implement your cruncher as a Camel processor.
Parallelizing your cruncher can be achieved via SEDA; Camel has a component for this pattern. Another alternative would be AsyncProcessor.
I'd say you first take a look at the principles behind frameworks like Camel. The abstractions they create are very relevant to the problem at hand.
I'm not exactly sure what the end question of your post is, but you have a reasonable design concept. One question I have for you: what environment are you in? A Java EE container, or just a simple standalone application?
If you are in a container, it would make more sense to have message-driven beans processing off the JMS queues than a pool of worker threads.
If you are standalone, it makes more sense to manage the thread pool yourself. With that said, I would also consider having separate applications pull the work off the queues, which leads to a better-scaling architecture: if the need ever came up, you could add more machines with more workers pointing at the one queue.
