Java Virtual Threads
In Java 19 were introduced Virtual Threads JEP-425 as a preview feature
After some investigation of concept of Java Virtual Threads (Project Loom), called sometimes lightweight threads (or sometimes fibers or green threads), I'm pretty interested in potential usage of them with reactive libraries, like, for example, with Spring WebFlux based on Project Reactor (reactive streams implementation) and Netty, for making blocking calls efficiently.
Most JVM implementations today implement Java threads as thin direct wrappers around operating system threads, called sometimes heavyweight, OS-managed threads platform threads.
While a platform thread can only execute a single thread at a time, the virtual threads have the ability to switch to executing a different virtual thread when the currently executed virtual thread makes a blocking call (e.g. network, file system, database call).
How do we deal with blocking calls in Reactor?
So, when dealing with blocking calls in Reactor we use the following construct:
Mono.fromCallable(() -> {
return blockingOperation();
}).subscribeOn(Schedulers.boundedElastic());
In subcribeOn() we provide a Scheduler that creates a dedicated thread for executing that blocking operation. However, it means that thread will eventually be blocked, so, as we are still on the old-fashioned threading model, we'll actually block the platform thread, which is still not really efficient way of dealing with CPU resources.
Here is the question:
So, the question is, could we use the virtual threads with reactive frameworks directly for making blocking calls like this, using, for example, Executors.newVirtualThreadPerTaskExecutor() :
Creates an Executor that starts a new virtual Thread for each task.
The number of threads created by the Executor is unbounded.
Mono.fromCallable(() -> {
return blockingOperation();
}).subscribeOn(Schedulers.fromExecutorService(Executors.newVirtualThreadPerTaskExecutor()));
Would it work out of the box? Will we actually get the benefits from this approach, in terms of dealing with our CPU resources more efficiently and increase performance of our application? Does it mean that we can easily integrate reactive library with any blocking library/framework, for example, Spring Data JPA (which is based on JDBC) and millions of others and magically turn them to non-blocking?
Related
Heavy CPU bound task could block the thread and delay other tasks waiting execution. That's because JVM can't interrupt running thread and require help from programmer and manual interruption.
So writing CPU bound tasks in Java/Kotlin requires manual intervention to make things run smoothly, like using Sequence in Kotlin in code below.
fun simple(): Sequence<Int> = sequence { // sequence builder
for (i in 1..3) {
Thread.sleep(100) // pretend we are computing it
yield(i) // yield next value
}
}
fun main() {
simple().forEach { value -> println(value) }
}
As far as I understood the reason is that having preemptive scheduler with the ability to interrupt running threads have performance overhead.
But wouldn't it be better to have a switch, so you can choose? If you would like to run JVM with faster non-preemptive scheduler. Or with slower pre-emtpive (interrupting and switching the tread after N instructions) one but able to run things smoothly and don't require manual labor to do that?
I wonder why Java/Kotlin doesn't have such JVM switch that would allow to choose what mode you would like.
When you program using Kotlin coroutines or Java virtual threads (after Loom), you get preemptive scheduling from the OS.
Following usual practices, tasks that are not blocked (i.e., they need CPU) are multiplexed over real OS threads in the Kotlin default dispatcher or Java ForkJoinPool. Those OS threads are scheduled preemptively by the OS.
Unlike old-style multithreading, however, tasks are not assigned to a thread when they are blocked waiting for I/O. This makes no difference in terms of preemption, since a task that is waiting for I/O couldn't possibly preempt another running task anyway.
What you don't get when programming with coroutines, is preemptive scheduling over a large number of tasks simultaneously. If you have many tasks that require the CPU, then the first N will be assigned to a real thread and the OS will time slice them. The remaining ones will wait in the queue until those ones are done.
But in real life, when you have 10000 tasks that need to be simultaneously interactive, they are I/O bound tasks. On average, there aren't many that require the CPU at any one time, so the number of real threads you get from the default dispatcher or ForkJoinPool is plenty. In normal operation, the queue of tasks waiting for threads is almost always empty.
If you really had a situation where 10000 CPU-bound tasks needed to be simultaneously interactive, well, then you would be sad anyway, because time slicing would not provide a very smooth experience.
This question is based on a false premise: In the JVM, a preemptive scheduler is your only choice. No modern JVM uses co-operative multitasking.
No modern JVM implements user space threads or a scheduler of its own. JVMs use native operating system threads instead. Native threads are scheduled by the operating system, and operating system schedulers are preemptive.
The fact that JVM threads map 1-to-1 to native operating system threads is a problem for applications that need a high level of concurrency. Threads are relatively scarce and expensive. To address this, Project Loom is investigating adding "virtual threads" that may allow using native threads more sparingly, especially for I/O bound tasks.
Project Loom is under active development and there is no set schedule of when it will become a part of standard Java. Regarding how Project Loom schedules "virtual threads", the latest (May 2020) update from Project Loom claims "virtual threads are preemptive, not cooperative" but then goes on to say "none of the schedulers in the JDK currently employs time-slice-based preemption of virtual threads". It sounds like in its current state the "virtual thread" scheduler in Project Loom is somewhere between fully co-operative and fully pre-emptive. It will be interesting to see how the project develops and what we will get when it is integrated into main stream Java.
In a July 28 Q&A the Loom project lead Ron Pressler mentioned that you will be able to plug in a scheduler of your own for virtual thread, but did not go into details of how much control you get over the scheduling algorithm.
I am trying to understand the core principles of non-blocking programming (and frameworks like project reactor). The main idea is to have "thread pool" with determined number of threads (executors) and tasks which are executed there. We should not have any blocked threads. In "user code" we just run something to execute and leave callback (what to do with the result). Out "user" thread is not blocked, right. But what if my task depends on some jdbc query. My task will request this query and then will be blocked waiting for the result, right? So, this thread is blocked.
But we avoid thread creating (which is expensive). Is it the core benefit of this style?
If my thread pool consists of 2 executors and both are blocked waiting for something, other tasks will not be executed, right? How to avoid it? Create more than 2 threads?
Threads are relatively costly system resources. For example, each thread needs memory for the call stack. How much this is depends on the operating system, but typically it's something like 1 or 2 MB. This means it's not a good idea to start thousands of threads - you'd waste 1 or 2 GB memory just on the call stacks of 1000 threads.
So, to do things more efficiently you want to limit the number of threads, for example using a thread pool to handle work. The thread pool makes it possible to manage the number of threads that are being used.
However, imagine that you'd have a thread pool with 10 threads, and then 10 requests come in. Each of your threads will be reserved to handle a request. While they are busy, you can't handle request #11 because there is no thread free. When you are using blocking I/O, then, even though all your 10 threads are doing nothing (waiting for I/O to complete), request #11 cannot be handled...
When you use non-blocking I/O, threads will never need to wait for I/O - so when the handling request #3 is suspended because it needs the result of an I/O operation, the thread that was handling it can temporarily switch to handling other requests.
So, with non-blocking I/O, you never have waiting threads and you are using system resources more efficiently.
This will only work if you are using non-blocking I/O from the front to the back of your system. If at the back-end you are using JDBC, which is a blocking API, then you'll loose the full benefit of non-blocking I/O.
Therefore, if you have a database at the back-end, this works best if you have a DB which supports non-blocking I/O. Some NoSQL databases like MongoDB support this, and for some relational databases there are special drivers / APIs available that support this. You won't be using JDBC in that case, because JDBC is an inherently blocking API.
Oracle is working on a new API for relational databases tentatively called
ADBA which will allow you to do non-blocking / async I/O with relational databases but it's not ready yet.
Project Reactor is an implementation of Reactive Streams specification. The specification overview can be found at ReactiveManifest. It's not just creating a set of threads and letting them do their jobs, It's the framework or the runtime (in this case ProjectReactor) that will organize your code in such a way that it'll presumably behave as nonblocking. Also, the whole system implementation has to be in this fashion otherwise you won't be benefited from the reactive streams.
If my thread pool consists of 2 executors and both are blocked waiting for something, other tasks will not be executed, right? How to avoid it? Create more than 2 threads?
The answer to this will be yes, and no. The framework may are may not create threads. Since the code will be interleaved among the threads, Since the non-blocking system are event-driven including the low-level operations (ex, libuv I/O), It's not necessary for a thread to wait for the completion of an I/O operation. Meanwhile, the thread will be executing something meaningful. The completion of the task will be notified and the dependent code can be executed by any of the available thread. The goal of such a system is to utilize the CPU to the fullest with limited resources(threads).
Taken from http://www.reactive-streams.org.
The main goal of Reactive Streams is to govern the exchange of stream data across an asynchronous boundary—think passing elements on to another thread or thread-pool—while ensuring that the receiving side is not forced to buffer arbitrary amounts of data. In other words, back pressure is an integral part of this model in order to allow the queues which mediate between threads to be bounded. The benefits of asynchronous processing would be negated if the communication of back pressure were synchronous (see also the Reactive Manifesto), therefore care has to be taken to mandate fully non-blocking and asynchronous behavior of all aspects of a Reactive Streams implementation.
It's the Reactor framework that enforces and help you in building a completely non-blocking system from the ground up.
What I understood from Vert.x documentation (and a little bit of coding in it) is that Vert.x is single threaded and executes events in the event pool. It doesn't wait for I/O or any network operation(s) rather than giving time to another event (which was not before in any Java multi-threaded framework).
But I couldn't understand following:
How single thread is better than multi-threaded? What if there are millions of incoming HTTP requests? Won't it be slower than other multi-threaded frameworks?
Verticles depend on CPU cores. As many CPU cores you have, you can have that many verticles running in parallel. How come a language that works on a virtual machine can make use of CPU as needed? As far as I know, the Java VM (JVM) is an application that uses just another OS process for (here my understanding is less about OS and JVM hence my question might be naive).
If a single threaded, non-blocking concept is so effective then why can't we have the same non-blocking concept in a multi-threaded environemnt? Won't it be faster? Or again, is it because CPU can execute one thread at a time?
What I understood from Vert.x documentation (and a little bit of coding in it) is that Vert.x is single threaded and executes events in the event pool.
It is event-driven, callback-based. It isn't single-threaded:
Instead of a single event loop, each Vertx instance maintains several event loops. By default we choose the number based on the number of available cores on the machine, but this can be overridden.
It doesn't wait for I/O or any network operation(s)
It uses non-blocking or asynchronous I/O, it isn't clear which. Use of the Reactor pattern suggests non-blocking, but it may not be.
rather than giving time to another event (which was not before in any Java multi-threaded framework).
This is meaningless.
How single thread is better than multi-threaded?
It isn't.
What if there are millions of incoming HTTP requests? Won't it be slower than other multi-threaded frameworks?
Yes.
Verticles depend on CPU cores. As many CPU cores you have, you can have that many verticles running in parallel. How come a language that works on a virtual machine can make use of CPU as needed? As far as I know, the Java VM (JVM) is an application that uses just another OS process for (here my understanding is less about OS and JVM hence my question might be naive).
It uses a thread per core, as per the quotation above, or whatever you choose by overriding that.
If a single threaded, non-blocking concept is so effective then why can't we have the same non-blocking concept in a multi-threaded environemnt?
You can.
Won't it be faster?
Yes.
Or again, is it because CPU can execute one thread at a time?
A multi-core CPU can execute more than one thread at a time. I don't know what 'it' in 'is it because' refers to.
First of all, Vertx isn't single threaded by any means. It just doesn't spawn more threads that it needs.
Second, and this is not related to Vertx at all, JVM maps threads to native OS threads.
Third, we can have non-blocking behavior in multithreaded environment. It's not one thread per CPU, but one thread per core.
But then the question is: "what are those threads doing?". Because usually, to be useful, they need other resources. Network, DB, filesystem, memory. And here it becomes tricky. When you're single threaded, you don't have race conditions. The only one accessing the memory at any point of time is you. But if you're multi threaded, you need to concern yourself with mutexes, or any other way to keep you data consistent.
Q:
How single thread is better than multi-threaded? What if there are millions of incoming HTTP requests? Won't it be slower than other multi-threaded frameworks?
A:
Vert.x isn't a single threaded framework, it does make sure that a "verticle" which is something you deploy within you application and register with vert.x is mostly single threaded.
The reason for this is that concurrency with multiple threads over complicates concurrency with locks synchronisation and other concept that need to be taken care of with multi threaded communication.
While verticles are single threaded the do use something called an event loop which is the true power behind this paradigm called the reactor pattern or multi reactor pattern in Vert.x's case. Multiple verticles can be registered within one application, communication between these verticles run through an eventbus which empowers verticles to use an event based transfer protocol internally but this can also be distributed using some other technology to manage the clustering.
event loops handle events coming in on one thread but everything is async so computation gets handled by the loop and when it's done a signal notifies that a result can be used.
So all computation is either callback based or uses something like Reactive.X / fibers / coroutines / channels and the lot.
Due to the simpler communication model for concurrency and other nice features of Vert.x it can actually be faster than a lot of the Blocking and pure multi threaded models out there.
the numbers
Q:
If a single threaded, non-blocking concept is so effective then why can't we have the same non-blocking concept in a multi-threaded environemnt? Won't it be faster? Or again, is it because CPU can execute one thread at a time?
A:
Like a said with the first question it's not really single threaded. Actually when you know something is blocking you'll have to register computation with a method called executeBlocking which wil make it run multithreaded on an ExecutorService managed by Vert.x
The reason why Vert.x's model is mostly faster is also here because event loops make better use of cpu computation features and constraints. This is mostly powered by the Netty project.
the overhead of multi threading with it's locks and syncs imposes to much strain to outdo Vert.x with it's multi reactor pattern.
I was going through the twitter finagle library which is an asynchronous service framework in scala and I have some question regarding asynchronous libraries in general.
So as I understand, the advantage of an synchronous library using a callback is that the application thread gets free and the library calls the callback as soon as the request is completed over the network. And in general the application threads might not have a 1:1 mapping with the library thread.
The service call in the library thread is blocking right?
If that's the case then we are just making the blocking call in some other thread. This makes the application thread free but some other thread is doing the same work. Can't we just increase the number of application threads to have that advantage?
It's possible that I mis-understand how the asynchronous libraries are implemented in Java/Scala or JVM in general. Can anyone help me understand how does this work?
Async approach is useful abstraction: your CPU-intensive thread offloads long-running IO operation to dedicated (maybe, belonging to a library) thread. When IO is done, some other thread will receive IO result.
Using blocking approach, you'll miss CPU ticks for your threads which are doing blocking IO call. And adding some more threads to ensure there's always free thread to do some CPU work means wasting OS/JVM resources for re-scheduling.
Blocking IO is used because it's simpler to program (no need to synchronize caller and callback).
Actually, IO is only one possible use-case where async style is useful. In general, whenever you feel that task at hand will benefit from splitting it into several activities, which can be run concurrently and would communicate with each other, this is the case for async style. Examples not connected to IO:
GUI: GUI event loop thread passes user input to background threads, and they do necessary processing;
Utilizing modern multi-core CPUs: if your task can be split in several subtasks, you can run these in separate threads, utilizing all available cores. Naturally, you'll need to gather results of subtasks, and you'll need async style here.
As for as I know, Java threads can communicate using some thread APIs. But I want to know how the Java threads and the OS threads are communicting with each other. For example a Java thread needs to wait for some OS thread finishes its execution and returns some results to this Java thread and it process the same.
Many mix up threads and processes here, the jvm is a process which may spawn more threads. Threads are lighter processes which share memory within their process. A process on the other hand lives in his own address space, which makes the context switch more expensive. You can communicate between different processes via the IPC mechanisms provided by your OS and you can communicate between different threads within the same process due to shared memory and other techniques. What you can't is communicate from ThreadA(ProcessA) to ThreadA(ProcessB) without going through plain old IPC: ThreadA(ProcessA) -> ProcessA -> IPC(OS) -> ProcessB -> ThreadA(ProcessB)).
You can use RMI to communicate between two java processes, if you want to "talk" to native OS processes, you have to go JNI to call the IPC mechanisms your OS of choice provides imo.
Feel free to correct me here :)
Sidenote:
You cant see the threads of your JVM with a process manager (as long as your JVM does not map threads to native processes, which would be stupid but possible), you need to use jps and jstack to do that.
Every Instance of JVM is essentially an OS process.
Java threads usually but don't necessarily run on native threads and Java concurrency classes could but don't necessarily map onto native equivalents.
If you had to sync between a native thread and a Java thread, you will most likely have to consider writing a JNI method that your Java thread calls. This JNI method would do whatever native synchronization operation it needs to do and then return. Every platform is going to do this differently but I assume this wouldn't be too much of an issue if you need to inspect native threads in the first place.