So if AsyncContext::complete closes the response and I need to write the response within the asynchronous context, how do I implement a multi-step response in which some steps are blocking with non-blocking sections in-between them?
You seem to be operating under a misapprehension about the nature of an AsyncContext and the semantics of ServletRequest::startAsync. This method (re)initializes an AsyncContext for the request and associated response, creating one first if necessary, and associates it with the request / response pair. This puts the request into asynchronous mode, which, at its core, means nothing more than that the container will not consider request processing complete until the provided context's complete() method is invoked.
In particular, creating an async context does not create any threads or assign the associated request to a different thread, and the methods of an AsyncContext run on the thread that invokes them (though that's kinda a technicality for AsyncContext::start). The context is primarily an object for whatever asynchronous code you provide to use for interacting with the container, which it otherwise could not safely do. To actually perform processing on some other thread, you need to arrange for that thread to exist, and for the work to be assigned to it. AsyncContext::start is a convenient way to do that, but not the only way.
With respect specifically to
how do I implement a multi-step response in which some steps are blocking with non-blocking sections in-between them?
the basic answer is "however you want". The AsyncContext neither hinders nor particularly helps you, because it's about communication with the container, not about workflow. In particular, I see no need or special use for nested AsyncContexts.
I think you're describing a processing pipeline with certain, limited parallelization. You might implement that, say, by running the overall workflow -- all the "blocking" steps, I guess -- in a thread launched via AsyncContext::start, and dispatching the other work to a thread pool, in whatever units make sense. Do be aware, however, that the request and response objects are not thread-safe. Ideally, then, the primary thread will extract all the needed data from the request, and perform all needed writes to the response.
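To make that concrete, here is a minimal, hypothetical sketch of the approach just described, assuming the javax.servlet API (jakarta.servlet on newer containers); the servlet, the helper methods, and the pool sizing are illustrative only, not a prescription:

import java.io.PrintWriter;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(urlPatterns = "/pipeline", asyncSupported = true)
public class PipelineServlet extends HttpServlet {
    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
        AsyncContext ctx = req.startAsync();
        // Extract what is needed from the request before leaving the container thread,
        // since the request and response objects are not thread-safe.
        String query = req.getParameter("q");

        ctx.start(() -> {                       // primary thread for the whole workflow
            try {
                // "blocking" step 1, performed directly on the primary thread
                String step1 = expensiveLookup(query);

                // non-blocking section: fan work out to the pool and keep going
                Future<String> side = pool.submit(() -> transform(step1));

                // "blocking" step 2 on the primary thread
                String step2 = anotherExpensiveStep(step1);

                // only the primary thread touches the response
                PrintWriter out = ctx.getResponse().getWriter();
                out.println(step2);
                out.println(side.get());        // join the parallel piece
            } catch (Exception e) {
                // real code would log and set an error status here
            } finally {
                ctx.complete();                 // tell the container we are done
            }
        });
    }

    private String expensiveLookup(String q)      { return "lookup:" + q; }
    private String transform(String s)            { return "transformed:" + s; }
    private String anotherExpensiveStep(String s) { return "step2:" + s; }
}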
Alternatively, maybe you use the regular request processing thread for the main workflow, dispatch pieces of work to a thread pool as appropriate, and skip the AsyncContext bit altogether. It is not necessary in any absolute sense to use an AsyncContext to perform asynchronous computations in a web application -- its purpose and the processing models it is designed to support are rather a lot more specific.
I have an API (GET, REST) developed using Java & Quarkus. I want to understand the default mechanism for how this API handles multiple requests to the same endpoint. Is there any queuing mechanism used by default? Is there any multithreading used by default?
Please help me understand this.
Quarkus became popular for its optimization of resources and its benchmarks on heavily loaded systems. By default it uses two different kinds of threads:
I/O threads, also called event-loop threads
Worker threads
I/O threads, also called event-loop threads. These threads are responsible, among other things, for reading bytes from the HTTP request and writing bytes back to the HTTP response. The important point here is that these threads are usually not blocked at all.
The number of these I/O threads is described in the documentation as follows:

The number of IO threads used to perform IO. This will be automatically set to a reasonable value based on the number of CPU cores if it is not provided. If this is set to a higher value than the number of Vert.x event loops then it will be capped at the number of event loops. In general this should be controlled by setting quarkus.vertx.event-loops-pool-size, this setting should only be used if you want to limit the number of HTTP io threads to a smaller number than the total number of IO threads.
Worker threads. Here again a pool of threads is maintained by the system, and the system assigns a worker thread to execute some scheduled work for a request. That thread can then be used by another request to execute some other task. These threads normally take over long-running tasks or blocking code.
The default number of threads of this type is 20, if not otherwise configured, as indicated by the documentation.
So to sum up, a request in Quarkus will be executed either by some I/O thread or by some worker thread, and those threads are shared among other requests too. An I/O thread will normally take over non-blocking tasks that do not take long to execute. A worker thread will normally take over blocking tasks and long-running processes.
Taking the above into consideration, it makes sense that Quarkus configures many more worker threads in the worker thread pool than I/O threads in the I/O thread pool.
What is very important to take from the above information is the following:
A worker thread will serve a specific request (e.g. request1), and if during this serving it gets blocked on some I/O operation, it will keep waiting for that I/O in order to complete the request it serves. Only when this request is finished can the thread move on and serve some other request (e.g. request2).
An I/O thread or event-loop thread will serve a specific request (e.g. request1), and if during this serving it would get blocked on some I/O operation needed for this request, it will instead pause this request and continue to serve another request (e.g. request2). When the I/O of the first request is completed, the thread will, according to some scheduling algorithm, return to request1 and continue from where it left off.
Now someone may ask: since usually every request requires some type of I/O operation, how can using an I/O thread result in better performance? In that case the programmer has two choices when declaring a Quarkus controller method to run on an I/O thread:
Manually spawn, inside the controller method that is declared to be I/O, some other thread to do the blocking work, while the outer thread that serves the request remains an I/O thread (reading HTTP request data, writing the HTTP response). The manually spawned thread can be a worker thread inside some service layer. This is a somewhat complicated approach.
Use some external library for I/O operations that works with the same approach that I/O threads use in Quarkus. For example, for database operations the I/O could be handled by the hibernate-reactive library. This way the full benefits of the I/O approach can be achieved.
Some side notes
Considering that we are in the Java ecosystem, it is useful to also mention that the above architecture and efficiency of resources is similar (though not exactly the same) to Spring Reactive (WebFlux).
But Quarkus is based on JAX-RS and will by default provide this architecture of efficient resource use, regardless of whether you write reactive code or not. With Spring Boot, however, in order to have efficiency similar to Quarkus you have to use Spring Reactive (WebFlux).
If you use basic Spring Boot Web, the architecture is a single thread per incoming request. A specific thread in this case is not able to switch between different requests: it needs to complete one request before it can handle the next.
Also, in Quarkus, making a controller method execute on an I/O thread is as simple as placing the @NonBlocking annotation on that method. The same goes for an endpoint method that needs to be executed on a worker thread, using @Blocking. A rough sketch of this is shown below.
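As an illustrative sketch only (assuming a RESTEasy Reactive resource on a recent Quarkus, where the annotations live in io.smallrye.common.annotation and the JAX-RS types come from jakarta.ws.rs; older versions use javax.ws.rs):

import io.smallrye.common.annotation.Blocking;
import io.smallrye.common.annotation.NonBlocking;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;

@Path("/example")
public class ExampleResource {

    @GET
    @Path("/fast")
    @NonBlocking            // runs on an I/O (event-loop) thread; must not block
    public String fast() {
        return "quick, non-blocking work";
    }

    @GET
    @Path("/slow")
    @Blocking               // runs on a worker thread; blocking calls are fine here
    public String slow() throws InterruptedException {
        Thread.sleep(2000); // stands in for a blocking database call, etc.
        return "long-running, blocking work";
    }
}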
In Spring Boot, however, switching between those two types of threads may mean switching from spring-boot-web to spring-boot-webflux and vice versa. Spring Boot Web does now have some support, with Servlet 3, to optimize its approach (there are articles describing such optimizations), but this requires some programming effort and is not out-of-the-box functionality.
To give some context here, I have been following Project Loom for some time now. I have read The state of Loom. I have done asynchronous programming.
Asynchronous programming (as provided by Java NIO) returns the thread to the thread pool when the task waits, and it goes to great lengths to not block threads. This gives a large performance gain; we can now handle many more requests, as they are not directly bound by the number of OS threads. But what we lose here is the context. The same task is now NOT associated with just one thread. All the context is lost once we dissociate tasks from threads. Exception traces do not provide very useful information and debugging is difficult.
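For a concrete picture of the callback style being described, here is a small sketch using the standard java.nio.channels.AsynchronousFileChannel API; the file name is just an example:

import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class NioReadExample {
    public static void main(String[] args) throws Exception {
        AsynchronousFileChannel channel =
                AsynchronousFileChannel.open(Path.of("data.txt"), StandardOpenOption.READ);
        ByteBuffer buffer = ByteBuffer.allocate(4096);

        // The calling thread is not blocked; completion is delivered later on a thread
        // from the channel's group, so the "task" is no longer tied to one thread and
        // the stack trace seen in failed() says little about who started the read.
        channel.read(buffer, 0, buffer, new CompletionHandler<Integer, ByteBuffer>() {
            @Override
            public void completed(Integer bytesRead, ByteBuffer buf) {
                System.out.println("read " + bytesRead + " bytes");
            }

            @Override
            public void failed(Throwable exc, ByteBuffer buf) {
                exc.printStackTrace();
            }
        });

        Thread.sleep(1000); // keep the JVM alive long enough for the callback to fire
    }
}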
In comes Project Loom with virtual threads that become the single unit of concurrency. And now you can perform a single task on a single virtual thread.
It's all fine until now, but the article goes on to state, with Project Loom:
A simple, synchronous web server will be able to handle many more requests without requiring more hardware.
I don't understand how we get performance benefits with Project Loom over asynchronous APIs. The asynchronous APIs make sure not to keep any thread idle. So, what does Project Loom do to make it more efficient and performant than asynchronous APIs?
EDIT
Let me re-phrase the question. Let's say we have an http server that takes in requests and does some crud operations with a backing persistent database. Say, this http server handles a lot of requests - 100K RPM. Two ways of implementing this:
The HTTP server has a dedicated pool of threads. When a request comes in, a thread carries the task up until it reaches the DB, wherein the task has to wait for the response from DB. At this point, the thread is returned to the thread pool and goes on to do the other tasks. When DB responds, it is again handled by some thread from the thread pool and it returns an HTTP response.
The HTTP server just spawns virtual threads for every request. If there is an IO, the virtual thread just waits for the task to complete. And then returns the HTTP Response. Basically, there is no pooling business going on for the virtual threads.
Given that the hardware and the throughput remain the same, would any one solution fare better than the other in terms of response times or handling more throughput?
My guess is that there would not be any difference w.r.t performance.
We don't get a benefit over asynchronous APIs. What we potentially get is performance similar to asynchronous code, but written synchronously.
The answer by @talex puts it crisply. Adding further to it.
Loom is more about a native concurrency abstraction, which additionally helps one write asynchronous code. Given that it's a VM-level abstraction, rather than just a code-level one (like what we have been doing until now with CompletableFuture etc.), it lets one implement asynchronous behavior with reduced boilerplate.
With Loom, a more powerful abstraction is the savior. We have seen repeatedly how abstraction with syntactic sugar lets one write programs effectively, whether it was functional interfaces in JDK 8 or for-comprehensions in Scala.
With Loom, there isn't a need to chain multiple CompletableFutures (to save on resources). One can write the code synchronously, and with each blocking operation encountered (ReentrantLock, I/O, JDBC calls) the virtual thread gets parked. Because these are lightweight threads, the context switch is much cheaper, which distinguishes them from kernel threads.
When blocked, the actual carrier thread (the one that was running the run-body of the virtual thread) gets engaged in executing some other virtual thread's run. So effectively, the carrier thread is not sitting idle but executing some other work, and it comes back to continue the execution of the original virtual thread whenever it is unparked. Just like how a thread pool would work. But here you have a single carrier thread in a way executing the body of multiple virtual threads, switching from one to another when blocked.
We get the same behavior (and hence performance) as manually written asynchronous code, while avoiding the boilerplate needed to do the same thing.
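As a rough sketch of the contrast (assuming JDK 21+, where virtual threads are final; fetchFromDb and callWebService are hypothetical stand-ins for blocking I/O):

import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LoomVsFutures {

    // Asynchronous style: chain stages so that no platform thread blocks between them.
    static CompletableFuture<String> handleAsync(String id) {
        return CompletableFuture.supplyAsync(() -> fetchFromDb(id))
                                .thenApply(String::toUpperCase)
                                .thenApply(LoomVsFutures::callWebService);
    }

    // Loom style: plain synchronous code; the virtual thread parks at each blocking
    // call while its carrier thread is freed to run other virtual threads.
    static String handleSync(String id) {
        String row = fetchFromDb(id);     // parks the virtual thread, not the carrier
        return callWebService(row.toUpperCase());
    }

    public static void main(String[] args) throws Exception {
        try (ExecutorService vts = Executors.newVirtualThreadPerTaskExecutor()) {
            Callable<String> task = () -> handleSync("42");
            Future<String> result = vts.submit(task);
            System.out.println(result.get());
        }
        System.out.println(handleAsync("42").join());
    }

    private static String fetchFromDb(String id)   { return "row-" + id; }      // pretend JDBC call
    private static String callWebService(String s) { return "svc(" + s + ")"; } // pretend HTTP call
}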
Consider the case of a web framework where there is a separate thread pool to handle I/O and another for the execution of HTTP requests. For simple HTTP requests, one might serve the request from the HTTP-pool thread itself. But if there are any blocking or high-CPU operations, we let this activity happen asynchronously on a separate thread.
This thread would collect the information from an incoming request, spawn a CompletableFuture, and chain it with a pipeline (read from the database as one stage, followed by computation on it, followed by another stage to write back to the database, web service calls, etc.). Each one is a stage, and the resultant CompletableFuture is returned to the web framework.
When the resultant future is complete, the web framework uses the result to relay it back to the client. This is how Play Framework and others have been dealing with it, providing isolation between the HTTP thread-handling pool and the execution of each request. But if we dive deeper into this, why is it that we do this?
One core reason is to use the resources effectively, particularly around blocking calls. Hence we chain with thenApply etc. so that no thread is blocked on any activity, and we do more with a smaller number of threads.
This works great, but it is quite verbose. And debugging is indeed painful; if one of the intermediary stages results in an exception, the control flow goes haywire, resulting in further code to handle it.
With Loom, we write synchronous code, and let someone else decide what to do when blocked. Rather than sleep and do nothing.
The http server has a dedicated pool of threads ....
How big of a pool? (Number of CPUs) * N + C? With N > 1 one can fall back to anti-scaling, as lock contention extends latency; whereas N = 1 can under-utilize available bandwidth. There is a good analysis here.
The http server just spawns...
That would be a very naive implementation of this concept. A more realistic one would strive to draw from a dynamic pool which keeps one real thread for every blocked system call plus one for every real CPU. At least that is what the folks behind Go came up with.
The crux is to keep the {handlers, callbacks, completions, virtual threads, goroutines: all PEAs in a pod} from fighting over internal resources, so that they do not lean on system-based blocking mechanisms until absolutely necessary. This falls under the banner of lock avoidance, and might be accomplished with various queuing strategies (see libdispatch), etc. Note that this leaves the PEA divorced from the underlying system thread, because the PEAs are internally multiplexed between the system threads. This is your concern about divorcing the concepts. In practice, you pass around your favourite language's abstraction of a context pointer.
As 1 indicates, there are tangible results that can be directly linked to this approach, and a few intangibles. Locking is easy -- you just make one big lock around your transactions and you are good to go. That doesn't scale; but fine-grained locking is hard: hard to get working, hard to choose the fineness of the grain. When to use { locks, CVs, semaphores, barriers, ... } is obvious in textbook examples, a little less so in deeply nested logic. Lock avoidance makes that, for the most part, go away, limited to contended leaf components like malloc().
I maintain some skepticism, as the research typically shows a poorly scaled system which is transformed into a lock-avoidance model and then shown to be better. I have yet to see one which unleashes some experienced developers to analyze the synchronization behavior of the system, transform it for scalability, and then measure the result. But even if that were a win, experienced developers are a rare(ish) and expensive commodity; the heart of scalability is really financial.
I have a scenario where I’m attempting to serve data in a non-blocking fashion which is sourced by a RxJava Observable (also non-blocking). I’m using the WriteListener callback provided by ServletOutputStream. I’m running into an issue where the write is throwing an IllegalStateException (java.lang.IllegalStateException: UT010035: Stream in async mode was not ready for IO operation) immediately after a successful isReady() check on the ServletOutputStream.
While looking deeper, I noticed this comment in the Undertow implementation of ServletOutputStream:
Once the write listener has been set operations must only be invoked on this stream from the write listener callback. Attempting to invoke from a different thread will result in an IllegalStateException.
Given that my data source is asynchronous, there are scenarios where the onWritePossible() callback will reach a state where there is no data immediately available and I would need to wait for more to be received from the source. In these cases, I would need to interact with the stream from the callback of my data source, which is going to be on a different thread. The only other option would be to suspend the thread used to call onWritePossible() and wait for more data to arrive, but that would be a blocking operation, which defeats the whole purpose.
Is there another approach that I'm missing? The single-thread requirement of Undertow doesn't seem to be required by the Servlet 3.1 spec. From what I've read, other implementations appear to tolerate the multi-threaded approach, provided the application coordinates the synchronization of stream access.
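For reference, a minimal sketch of the kind of interaction being described, assuming the javax.servlet 3.1 non-blocking I/O API; the ChunkWriter class and its onData method are hypothetical stand-ins for the Observable's onNext delivery, not an endorsed solution:

import java.io.IOException;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import javax.servlet.AsyncContext;
import javax.servlet.ServletOutputStream;
import javax.servlet.WriteListener;

class ChunkWriter implements WriteListener {
    private final AsyncContext ctx;
    private final ServletOutputStream out;
    private final Queue<byte[]> pending = new ConcurrentLinkedQueue<>();

    ChunkWriter(AsyncContext ctx) throws IOException {
        this.ctx = ctx;
        this.out = ctx.getResponse().getOutputStream();
        this.out.setWriteListener(this);   // container calls onWritePossible() when ready
    }

    // Called on the source's thread (e.g. the Observable's onNext). Under Undertow's
    // interpretation, writing to the stream here directly is what triggers UT010035,
    // so the chunk is only queued; it is drained from onWritePossible().
    void onData(byte[] chunk) {
        pending.add(chunk);
        // Some way of nudging the container to re-enter onWritePossible() is still
        // needed here, which is exactly the gap the question is about.
    }

    @Override
    public void onWritePossible() throws IOException {
        byte[] chunk;
        while (out.isReady() && (chunk = pending.poll()) != null) {
            out.write(chunk);   // safe: we are on the container's write-listener path
        }
    }

    @Override
    public void onError(Throwable t) {
        ctx.complete();
    }
}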
Here are two links which seem to be contradicting each other. I'd sooner trust the docs:
Link 1
Request processing on the server works by default in a synchronous processing mode
Link 2
It already is multithreaded.
My question:
Which is correct? Can it be both synchronous and multithreaded?
Why do the docs say the following?:
in cases where a resource method execution is known to take a long time to compute the result, server-side asynchronous processing model should be used
If the docs are correct, why is the default action synchronous? All requests in client-side JavaScript are asynchronous by default for the sake of user experience; it would make sense, then, for the default on the server side to be asynchronous too.
If the client does not need requests served in a specific order, then who cares how "EXPENSIVE" the operation is? Shouldn't all operations simply be asynchronous?
Request processing on the server works by default in a synchronous processing mode
Each request is processed on a separate thread. The request is considered synchronous because that request holds up the thread until the request is finished processing.
It already is multithreaded.
Yes, the server (container) is multi-threaded. For each request that comes in, a thread is taken from the thread pool, and the request is tied to that particular thread.
in cases where a resource method execution is known to take a long time to compute the result, server-side asynchronous processing model should be used
Yes, so that we don't hold up the container thread. There are only so many threads in the container thread pool to handle requests. If we hold them all up with long-running requests, then the container may run out of threads, blocking other requests from coming in. In asynchronous processing, Jersey hands the thread back to the container and handles the request processing itself in its own thread pool until the processing is complete, then sends the response up to the container, which can send it back to the client.
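A bare-bones sketch of that model, using the standard JAX-RS AsyncResponse API (the executor and the longRunningQuery helper are illustrative assumptions, not part of Jersey itself):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.container.AsyncResponse;
import javax.ws.rs.container.Suspended;

@Path("/report")
public class ReportResource {
    private static final ExecutorService workers = Executors.newFixedThreadPool(10);

    @GET
    public void get(@Suspended final AsyncResponse asyncResponse) {
        // The container thread returns to its pool as soon as this method exits;
        // the suspended response is resumed later from our own worker thread.
        workers.submit(() -> {
            String result = longRunningQuery();
            asyncResponse.resume(result);
        });
    }

    private String longRunningQuery() {
        try { Thread.sleep(5000); } catch (InterruptedException ignored) { }
        return "expensive result";
    }
}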
If the client does not need to serve requests in a specific order, then who cares how "EXPENSIVE" the operation is.
Not really sure what the client has to do with anything here. Or at least in the context of how you're asking the question. Sorry.
Shouldn't all operations simply be asynchronous?
Not necessarily, if all the requests are quick. Though you could make an argument for it, that would require performance testing, and numbers you can put up against each other to make a decision from. Every system is different.
Even after reading http://krondo.com/?p=1209 or "Does an asynchronous call always create/call a new thread?", I am still confused about how asynchronous calls can be provided on an inherently single-threaded system. I will explain my understanding so far and point out my doubts.
One of the examples I read described a TCP server providing asynchronous processing of requests: a user would call a method, e.g. get(Callback c), and the callback would be invoked some time later. Now, my first issue here: we already have two systems, one server and one client. This is not what I mean, because in fact we have at least two threads - one on the server and one on the client side.
The other example I read was JavaScript, as this is the most prominent example of a single-threaded async system, with Node.js. What I cannot get through my head, maybe thinking in Java terms, is this: if I execute the code below (apologies for the incorrect, probably atrocious syntax):
function foo() {
    read_file(fileLocation, callback); // asynchronous call, does not block
    // do many more things here, potentially for hours
}
the call to read the file executes (something) and returns, allowing the rest of my function to execute. Since there is only one thread, i.e. the one that is executing my function, how on earth will that same thread (the one and only one executing my stuff) ever get around to reading in the bytes from disk?
Basically, it seems to me I am missing some underlying mechanism acting as a round-robin scheduler of some sort, which is inherently single-threaded and might split the tasks into smaller ones, or call into multithreaded components that would spawn a thread and read the file in.
Thanks in advance for all comments and pointing out my mistakes on the way.
Update: Thanks for all responses. Further good sources that helped me out with this are here:
http://www.html5rocks.com/en/tutorials/async/deferred/
http://lostechies.com/johnteague/2012/11/30/node-js-must-know-concepts-asynchrounous/
http://www.interact-sw.co.uk/iangblog/2004/09/23/threadless (.NET)
http://ejohn.org/blog/how-javascript-timers-work/ (intrinsics of timers)
http://www.mobl-lang.org/283/reducing-the-pain-synchronous-asynchronous-programming/
The real answer is that it depends on what you mean by "single thread".
There are two approaches to multitasking: cooperative and interrupt-driven. Cooperative, which is what the other StackOverflow item you cited describes, requires that routines explicitly relinquish ownership of the processor so it can do other things. Event-driven systems are often designed this way. The advantage is that it's a lot easier to administer and avoids most of the risks of conflicting access to data, since only one chunk of your code is ever executing at any one time. The disadvantage is that, because only one thing is being done at a time, everything has to either be designed to execute fairly quickly or be broken up into chunks that do so (via explicit pauses like a yield() call), or the system will appear to freeze until that event has been fully processed.
The other approach -- threads or processes -- actively takes the processor away from running chunks of code, pausing them while something else is done. This is much more complicated to implement, and requires more care in coding since you now have the risk of simultaneous access to shared data structures, but is much more powerful and -- done right -- much more robust and responsive.
Yes, there is indeed a scheduler involved in either case. In the former version the scheduler is just spinning until an event arrives (delivered from the operating system and/or runtime environment, which is implicitly another thread or process) and dispatches that event before handling the next to arrive.
The way I think of it in JavaScript is that there is a queue which holds events. In the old Java producer/consumer parlance, there is a single consumer thread pulling items off this queue and executing every function registered to receive the current event. Events such as asynchronous calls (AJAX requests completing), timeouts, or mouse events get pushed onto the queue as soon as they happen. The single "consumer" thread pulls them off the queue, locates any interested functions, and then executes them; it cannot get to the next event until it has finished invoking all the functions registered for the current one. Thus if you have a handler that never completes, the queue just fills up - it is said to be "blocked".
The system has more than one thread (it has at least one producer and a consumer), since something generates the events that go on the queue, but as the author of the event handlers you need to be aware that events are processed on a single thread: if you go into a tight loop, you will lock up the only consumer thread and make the system unresponsive.
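In Java terms, the description above amounts to something like the following toy sketch (everything here - the Event record, the queue, the handler map - is an illustrative assumption about the model, not how any real JavaScript engine is implemented; assumes a recent JDK for the record syntax):

import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

class ToyEventLoop {
    record Event(String type, Object payload) {}

    private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
    private final Map<String, List<Consumer<Object>>> handlers = new ConcurrentHashMap<>();

    // Producers (timers, I/O completion, mouse events...) call this from other threads.
    void push(Event e) { queue.add(e); }

    void on(String type, Consumer<Object> handler) {
        handlers.computeIfAbsent(type, k -> new CopyOnWriteArrayList<>()).add(handler);
    }

    // The single consumer thread: it cannot start the next event until every handler
    // registered for the current one has returned, so a long-running handler "blocks"
    // the whole loop.
    void run() throws InterruptedException {
        while (true) {
            Event e = queue.take();
            for (Consumer<Object> h : handlers.getOrDefault(e.type(), List.of())) {
                h.accept(e.payload());
            }
        }
    }
}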
So in your example:

function foo() {
    read_file(location, function(fileContents) {
        // called with the fileContents when the file has been read
    });
    // do many more things here, potentially for hours
}
If you do as your comment says and execute for potentially hours, the callback which handles fileContents will not fire for hours, even though the file has been read. As soon as you hit the last } of foo(), the consumer thread is done with this event and can process the next one, where it will execute the registered callback with the file contents.
HTH