Why doesn't Kotlin/Java have an option for a preemptive scheduler?

A heavy CPU-bound task can block its thread and delay other tasks waiting for execution. That's because the JVM can't interrupt a running thread on its own; it requires help from the programmer in the form of manual interruption.
So writing CPU-bound tasks in Java/Kotlin requires manual intervention to make things run smoothly, like using a Sequence in the Kotlin code below.
fun simple(): Sequence<Int> = sequence { // sequence builder
    for (i in 1..3) {
        Thread.sleep(100) // pretend we are computing it
        yield(i) // yield next value
    }
}
fun main() {
    simple().forEach { value -> println(value) }
}
As far as I understand, the reason is that a preemptive scheduler with the ability to interrupt running threads has a performance overhead.
But wouldn't it be better to have a switch, so you can choose? You could run the JVM with the faster non-preemptive scheduler, or with a slower preemptive one (interrupting and switching the thread after N instructions) that runs things smoothly and doesn't require manual labor to do so.
I wonder why Java/Kotlin doesn't have such a JVM switch that would let you choose which mode you would like.

When you program using Kotlin coroutines or Java virtual threads (after Loom), you get preemptive scheduling from the OS.
Following usual practices, tasks that are not blocked (i.e., they need CPU) are multiplexed over real OS threads in the Kotlin default dispatcher or Java ForkJoinPool. Those OS threads are scheduled preemptively by the OS.
Unlike old-style multithreading, however, tasks are not assigned to a thread when they are blocked waiting for I/O. This makes no difference in terms of preemption, since a task that is waiting for I/O couldn't possibly preempt another running task anyway.
What you don't get when programming with coroutines is preemptive scheduling over a large number of tasks simultaneously. If you have many tasks that require the CPU, then the first N will be assigned to a real thread and the OS will time-slice them. The remaining ones will wait in the queue until those are done.
But in real life, when you have 10000 tasks that need to be simultaneously interactive, they are I/O bound tasks. On average, there aren't many that require the CPU at any one time, so the number of real threads you get from the default dispatcher or ForkJoinPool is plenty. In normal operation, the queue of tasks waiting for threads is almost always empty.
If you really had a situation where 10000 CPU-bound tasks needed to be simultaneously interactive, well, then you would be sad anyway, because time slicing would not provide a very smooth experience.
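To make that concrete, here is a minimal Java sketch (the class name and task counts are made up for illustration): it submits more CPU-bound tasks than the pool has threads, so the first n are time-sliced preemptively by the OS while the rest sit in the executor's queue.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CpuBoundQueueDemo {
    public static void main(String[] args) {
        // A fixed pool sized to the core count, similar in spirit to
        // Kotlin's Dispatchers.Default or Java's ForkJoinPool.
        int n = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(n);

        for (int task = 0; task < 4 * n; task++) {
            final int id = task;
            pool.submit(() -> {
                long sum = 0;
                for (long i = 0; i < 2_000_000_000L; i++) sum += i; // pure CPU work, no suspension points
                System.out.println("task " + id + " done: " + sum);
            });
        }
        // The first n tasks are preemptively time-sliced by the OS; the
        // remaining ones wait in the queue until a pool thread frees up.
        pool.shutdown();
    }
}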

This question is based on a false premise: In the JVM, a preemptive scheduler is your only choice. No modern JVM uses co-operative multitasking.
No modern JVM implements user space threads or a scheduler of its own. JVMs use native operating system threads instead. Native threads are scheduled by the operating system, and operating system schedulers are preemptive.
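A quick way to observe this (a sketch; exact counts will vary by machine): start more busy-spinning threads than you have cores and note that all of them make progress, even though none of them ever yields or blocks. That progress is the OS preempting the native threads.

import java.util.concurrent.atomic.AtomicLongArray;

public class PreemptionDemo {
    public static void main(String[] args) throws InterruptedException {
        int threads = Runtime.getRuntime().availableProcessors() * 2;
        AtomicLongArray progress = new AtomicLongArray(threads);

        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                while (true) progress.incrementAndGet(id); // never yields, never blocks
            });
            worker.setDaemon(true);
            worker.start();
        }

        Thread.sleep(2_000);
        // Every counter advances even though no thread ever cooperates:
        // the OS scheduler preempts the native threads underneath.
        for (int t = 0; t < threads; t++) {
            System.out.println("thread " + t + ": " + progress.get(t));
        }
    }
}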
The fact that JVM threads map 1-to-1 to native operating system threads is a problem for applications that need a high level of concurrency. Threads are relatively scarce and expensive. To address this, Project Loom is investigating adding "virtual threads" that may allow using native threads more sparingly, especially for I/O bound tasks.
Project Loom is under active development and there is no set schedule for when it will become a part of standard Java. Regarding how Project Loom schedules "virtual threads", the latest (May 2020) update from Project Loom claims "virtual threads are preemptive, not cooperative" but then goes on to say "none of the schedulers in the JDK currently employs time-slice-based preemption of virtual threads". It sounds like in its current state the "virtual thread" scheduler in Project Loom is somewhere between fully cooperative and fully preemptive. It will be interesting to see how the project develops and what we will get when it is integrated into mainstream Java.
In a July 28 Q&A, the Loom project lead Ron Pressler mentioned that you will be able to plug in a scheduler of your own for virtual threads, but did not go into details of how much control you get over the scheduling algorithm.

Related

Can a blocked thread be rescheduled to do other work?

If I have a thread that is blocked waiting for a lock, can the OS reschedule that thread to do other work until the lock becomes available?
From my understanding, it cannot be rescheduled, it just sits idle until it can acquire the lock. But it just seems inefficient. If we have 100 tasks submitted to an ExecutorService, and 10 threads in the pool: if one of the threads holds a lock and the other 9 threads are waiting for that lock, then only the thread holding the lock can make progress. I would have imagined that the blocked threads could be temporarily rescheduled to run some of the other submitted tasks.
You said:
I would have imagined that the blocked threads could be temporarily rescheduled to run some of the other submitted tasks.
Project Loom
You happen to be describing the virtual threads (fibers) being developed as part of Project Loom for future versions of Java.
Currently the OpenJDK implementation of Java uses threads from the host OS as Java threads. So the scheduling of those threads is actually controlled by the OS rather than the JVM. And yes, as you describe, on all common OSes when Java code blocks, the code’s thread sits idle.
Project Loom layers virtual threads on top of the "real" platform/kernel threads. Many virtual threads may be mapped to each real thread. Running millions of threads is possible on common hardware.
With Loom technology, the JVM detects blocking code. That blocked code's virtual thread is "parked", set aside, and another virtual thread is assigned to that real thread so it gets some execution time while the parked thread awaits a response. This parking-and-switching is quite rapid, with little overhead. Blocking becomes extremely "cheap" under Loom technology.
Blocking is quite common in most run-of-the-mill business-oriented apps. Blocking occurs with file I/O, network I/O, database access, logging, console interactions, GUIs, and so on. Such apps using virtual threads are seeing huge performance gains with the experimental builds of Project Loom. These builds are available now, based on early-access Java 17. The Project Loom team seeks feedback.
Using virtual threads is utterly simple: Switch your choice of executor service.
ExecutorService executorService = Executors.newVirtualThreadExecutor();
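(Note: newVirtualThreadExecutor is the factory method from the early-access Loom builds this answer describes; in the API as finalized in Java 21, the method is Executors.newVirtualThreadPerTaskExecutor. A minimal sketch against the finalized API:)

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadExample {
    public static void main(String[] args) {
        // Java 21+: each submitted task runs on its own new virtual thread.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(() -> {
                    Thread.sleep(100); // blocking parks the virtual thread cheaply
                    return null;       // Callable, so the checked InterruptedException is allowed
                });
            }
        } // close() implicitly waits for submitted tasks to finish
    }
}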
Caveat: As commented by Michael, the virtual threads managed by the JVM depend on the platform/kernel threads managed by the host OS. Ultimately, execution is scheduled by the OS even under Loom. Virtual threads are useful for those times when a blocked Java thread would otherwise be sitting idle on a CPU core. If the host computer were overburdened, the Java threads might see little execution time, with or without virtual threads.
Virtual threads are not appropriate for tasks that rarely block, that are truly CPU-bound. For example, encoding video. Such tasks should continue using conventional threads.
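With the thread builders that eventually shipped (Java 21+), the choice is explicit either way; a small sketch contrasting the two (the workloads are stand-ins):

public class ThreadChoice {
    public static void main(String[] args) throws InterruptedException {
        // CPU-bound work: stay on a conventional platform thread.
        Thread encoder = Thread.ofPlatform().name("encoder").start(() -> {
            long acc = 0;
            for (long i = 0; i < 100_000_000L; i++) acc += i; // stand-in for video encoding
            System.out.println("encoded " + acc);
        });

        // Blocking-heavy work: a virtual thread is the better fit.
        Thread fetcher = Thread.ofVirtual().name("fetcher").start(() -> {
            try {
                Thread.sleep(100); // stand-in for network I/O
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            System.out.println("fetched");
        });

        encoder.join();
        fetcher.join();
    }
}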
For more info, see the enlightening presentations and interviews with Ron Pressler of Oracle, or with other members of the Loom team. Look for the most recent, as Loom has evolved.
I would have imagined that the blocked threads could be temporarily rescheduled to run some of the other submitted tasks.
That's what the other threads are for. If you create X threads and Y are blocked, you have the remaining X-Y threads to do other submitted tasks. Presumably, the number X was chosen specifically to get the number of concurrent tasks that the implementation and/or programmer thought was best.
You are asking why the implementation doesn't ignore this decision. The answer is because it makes more sense to choose the number of threads reasonably than have the implementation ignore that choice.
You are partially right.
In the executor service scenario you described, all nine waiting threads will be blocked and only one thread will make progress. True.
The part where you are not quite right is in your expectations of how the OS and Java behave in combination. The concept of a thread exists at both the OS level and the Java level, but they are two different things: there are Java-Threads and there are OS-Threads, and Java-Threads are implemented through OS-Threads.
Imagine it this way: the JVM has (say) 10 Java-Threads in it, some running, some not. Java borrows OS-Threads to implement the running Java-Threads. Now when a Java-Thread gets blocked (for whatever reason), what we know for sure is that the Java-Thread has been blocked; we cannot easily observe what happened to the underlying OS-Thread.
The OS could reclaim the OS-Thread and use it for something else, or it could stay blocked - it depends. But even if the OS-Thread is reused, the Java-Thread will remain blocked. In your thread pool scenario, the nine Java-Threads will still be blocked, and only one Java-Thread will be working.
If I have a thread that is blocked waiting for a lock, can the OS reschedule that thread to do other work until the lock becomes available? From my understanding, it cannot be rescheduled, it just sits idle until it can acquire the lock. But it just seems inefficient.
I think you are thinking about this entirely wrong. Just because 10 of your 20 threads are "idle" doesn't mean that the operating system (or the JVM) is somehow consuming resources trying to manage these idle threads. Although in general we work on our applications to make sure that our threads are as unblocked as possible so we can achieve the highest throughput, there are tons of times that we write threads where we expect them to be idle most of the time.
If we have 100 tasks submitted to an ExecutorService, and 10 threads in the pool: if one of the threads holds a lock and the other 9 threads are waiting for that lock, then only the thread holding the lock can make progress. I would have imagined that the blocked threads could be temporarily rescheduled to run some of the other submitted tasks.
It is not the threads that are rescheduled, it is the CPU resources of the system. If 9 of your 10 threads are blocked in your thread pool, then other threads in your application (e.g., the garbage collector) or other processes can be given more CPU resources on your server. This switching between jobs is what modern operating systems are really good at, and it happens many, many times a second. This is all quite normal.
Now, if your question is really "how do I improve the throughput of my application" then you are asking the right question. First off, you should make sure that your locks are as fine grained as possible to make sure that the thread holding the lock does so for a minimal amount of time. If the blocking happens too often then you should consider increasing the number of threads in the pool so that there is a higher likelihood that some of the jobs will run concurrently. This optimizing of the number of threads in a thread-pool is very application specific. See my post here for some more details: Concept behind putting wait(),notify() methods in Object class
Another thing you might consider is breaking your jobs up into pieces to separate the pieces that can run concurrently from the ones that need to be synchronized. You could have a pool of 10 threads doing the concurrent work and then a single thread doing the operations that require the locks. This is why the ExecutorCompletionService was written so that something downstream can take the results of a thread-pool and act on them as they complete. This will make your program more complicated and you'll need to worry about your queues if you are talking about some large number of jobs or large number of results but you can dramatically improve throughput if you do this right.
A good example of such refactoring is a situation where you have a processing job that has to write the results to a database. If at the end of each job, each thread in the pool needs to get a lock on the database connection then there will be a lot of contention for the lock and less concurrency. If, instead, the processing was done in a thread-pool and there was a single database update thread, it could turn off auto-commit and make updates from multiple jobs in a row between commits which could dramatically increase throughput. Then again, using multiple database connections managed by a connection pool might be a fine solution.
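A minimal sketch of that shape (the writeToDatabase method is a hypothetical stand-in for the batched update): a pool does the parallel processing, and a single consumer drains results in completion order, so only it ever touches the database connection.

import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PipelineSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        CompletionService<String> completion = new ExecutorCompletionService<>(pool);

        int jobs = 100;
        for (int i = 0; i < jobs; i++) {
            final int id = i;
            // Concurrent part: processing work, no shared lock needed.
            completion.submit(() -> "result-" + id);
        }

        // Synchronized part: one consumer takes results as they complete,
        // so only this thread ever performs database writes.
        for (int i = 0; i < jobs; i++) {
            String result = completion.take().get(); // blocks until the next result is ready
            writeToDatabase(result);
        }
        pool.shutdown();
    }

    private static void writeToDatabase(String result) {
        // stand-in for the single-threaded, batched database update
        System.out.println("stored " + result);
    }
}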

Cooperative Scheduling vs Preemptive Scheduling?

In the book Core Java: Volume 1 - Fundamentals, in the chapter on multithreading, the author writes:
"All modern desktop and server operating systems use preemptive
scheduling. However, smaller devices such as cell phones may use
cooperative scheduling...."
I am aware of the definitions/workings of both types of scheduling, but I want to understand the reasons why cooperative scheduling is preferred over preemptive scheduling on smaller devices.
Can anyone explain the reasons why ?
Preemptive scheduling has to solve a hard problem -- getting all kinds of software from all kinds of places to efficiently share a CPU.
Cooperative scheduling solves a much simpler problem -- allowing CPU sharing among programs that are designed to work together.
So cooperative scheduling is cheaper and easier when you can get away with it. The key thing about small devices that allows cooperative scheduling to work is that all the software comes from one vendor and all the programs can be designed to work together.
The big benefit of cooperative scheduling over preemptive is that cooperative scheduling minimizes the cost of context switching. Context switching involves storing and restoring the state of an application (or thread), which is costly; a cooperative task switches only at well-defined yield points, so far less state needs to be saved and restored.
The reason why smaller devices are able to get away with cooperative scheduling for now has to do with the fact that there is only one user on a small device. The problem with cooperative scheduling is that one application can hog the CPU. In preemptive scheduling, every application will eventually be given an opportunity to use the CPU for a few cycles. For bigger systems, where multiple daemons or users are involved, cooperative scheduling may cause issues.
Reducing context switching is kind of a big thing in modern programming. You see it in Node.js, Nginx, epoll, ReactiveX and many other places.
First, you have to understand the meaning of the word preemption:
Preemption is the act of temporarily interrupting a task being carried out by a computer system, without requiring its cooperation, and with the intention of resuming the task at a later time. Such changes of the executed task are known as context switches.(https://en.wikipedia.org/wiki/Preemption_(computing))
Therefore, the difference is:
In a preemptive model, the operating system's thread scheduler is allowed to step in and hand control from one thread to another at any time (tasks can be forcibly suspended).
In a cooperative model, once a thread is given control it continues to run until it explicitly yields control (hands the CPU over to the next task) or until it blocks.
Both models have their advantages and disadvantages. Preemptive scheduling works better when the CPU has to run all kinds of software that are not related to each other, and cooperative scheduling works better when running programs that are designed to work together.
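To make the cooperative model concrete, here is a toy round-robin loop in Java (purely illustrative; this is not how any real scheduler is implemented): each task runs until it voluntarily returns, and nothing can interrupt it in between.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.BooleanSupplier;

public class CooperativeLoop {
    public static void main(String[] args) {
        // Each task does one small step per call and returns true while it has more work.
        Deque<BooleanSupplier> tasks = new ArrayDeque<>();
        for (int id = 0; id < 3; id++) {
            final int taskId = id;
            final int[] step = {0};
            tasks.add(() -> {
                System.out.println("task " + taskId + " step " + step[0]);
                return ++step[0] < 3; // "yields" by returning; re-queued if unfinished
            });
        }

        // Round-robin: control moves only when a task voluntarily returns.
        while (!tasks.isEmpty()) {
            BooleanSupplier task = tasks.poll();
            if (task.getAsBoolean()) tasks.add(task);
        }
    }
}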
Examples of cooperatively scheduled threads:
Windows fibers (https://learn.microsoft.com/en-gb/windows/win32/procthread/fibers?redirectedfrom=MSDN)
Sony’s PlayStation 4 SDK (http://twvideo01.ubm-us.net/o1/vault/gdc2015/presentations/Gyrling_Christian_Parallelizing_The_Naughty.pdf)
If you want to learn about the underlying implementation of these cooperatively scheduled fibers, refer to this book (https://www.gameenginebook.com/).
Your book states "smaller devices such as cell phones"; maybe the author is referring to cell phones from several years back. They had only a few programs to run, all provided by the phone manufacturer, so we can assume those programs were designed to work together.
Cooperative scheduling has fewer synchronization problems.
Cooperative scheduling can have better performance in some, mostly contrived, scenarios.
Cooperative scheduling introduces constraints upon design and implementation of threads.
Cooperative scheduling is basically useless for most real purposes because of dire I/O performance, which is why almost nobody uses it.
Even small devices will prefer to use preemptive scheduling if they can possibly get away with it. Smartphone apps, streaming (esp. video), and other apps that require good I/O are essentially not possible with cooperative systems.
What you are left with are trivial embedded toaster-controllers and the like.
Hard real-time control applications often demand that at least one thread/task not be preemptively interrupted while other threads are more forgiving. Additionally, the highest priority task may require that it be executed on a rigid schedule rather than being left to the mercy of a scheduler that will eventually provide a time-slot. For these applications, cooperative multitasking seems much closer to what is needed than preemptive multitasking but it still isn't an exact fit since some tasks may need immediate on-demand interrupt response while other tasks are less sensitive to the multi-tasking scheme.
Cooperative Scheduling
A task will give up the CPU at a point called a synchronization point. In POSIX it can use something like:
sched_yield()
(Note there is no task-ID argument; the non-standard pthread_yield() is a variant of the same call. Java's closest analogue is Thread.yield().)
Preemptive Scheduling
The main difference here is that in preemptive scheduling, the task may be forced to relinquish the CPU by the scheduler, for instance when two tasks have the same priority and the running one's time slice ends.

How to schedule Java Threads

I have read that Java threads are user-level threads, and one of the differences between user-level threads and kernel-level threads is that kernel-level threads are scheduled by the kernel (we cannot change it), whereas for user-level threads we can define our own scheduling algorithm.
So how do we schedule threads in Java? At any given time, when multiple threads are ready to be executed, the runtime system chooses the Runnable thread with the highest priority for execution. If two threads of the same priority are waiting for the CPU, the scheduler chooses one of them to run in a round-robin fashion. What if I don't want RR? Is there a way I can change it, or am I missing something here?
You cannot change the scheduling algorithm; for the JVM this is out of scope. The JVM uses the native threads provided by the underlying OS.
So from the Java perspective you cannot change the scheduling algorithm. The scheduling is done automatically.
The only thing in Java you can do is set the priority of the thread. But how this affects the scheduling algorithm is not defined.
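For example (a one-liner sketch; whether MAX_PRIORITY maps to an OS priority at all is implementation-defined):

public class PriorityHint {
    public static void main(String[] args) {
        Thread worker = new Thread(() -> System.out.println("running"));
        worker.setPriority(Thread.MAX_PRIORITY); // a hint to the scheduler, not a guarantee
        worker.start();
    }
}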
You can try to change the scheduling algorithm of the OS your VM is running on, but this is highly dependent on the OS used.
For the last 10 years or so, JVM threads have been system-level threads, not user-level ('green') threads. And even with user-level threads, you don't get to manage them (the JVM does).
The JVM Spec does not state how threads are supposed to be scheduled by an implementation. The Hotspot VM (and most likely almost every other implementation as well) use the OS scheduling mechanisms (as Uwe stated). See also What is the JVM Scheduling algorithm?.
A simple, yet most likely not very efficient way to influence scheduling of your application threads would be to have only n runnable threads for the OS to schedule (n being the number of threads you'd actually like to run in parallel).
That could e.g. be your own implementation of ExecutorService, which makes all threads you don't want to be scheduled by the OS wait until you think they should run.
Of course this way you don't have any influence on other VM threads, let alone other applications or the OS.
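A Semaphore is one simple way to approximate that gating (a sketch; a real custom ExecutorService would be more involved): only n tasks hold a permit at once, so the OS never sees more than n of them as runnable.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class GatedPoolSketch {
    public static void main(String[] args) {
        int n = 2;                              // tasks allowed to be runnable at once
        Semaphore gate = new Semaphore(n);
        ExecutorService pool = Executors.newFixedThreadPool(8);

        for (int i = 0; i < 20; i++) {
            final int id = i;
            pool.submit(() -> {
                gate.acquireUninterruptibly(); // excess tasks block here (WAITING, not runnable)
                try {
                    System.out.println("task " + id + " running");
                } finally {
                    gate.release();            // hand the slot to the next waiting task
                }
            });
        }
        pool.shutdown();
    }
}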
A lot more involved (and not platform independent) would be to change the OS scheduler itself to something more tailored to the needs of a JVM. A quick Google search found this abstract, and I guess there's more work being done in that field.
In Effective Java, 2nd Ed., Joshua Bloch devotes an item to the discussion of thread scheduling. He goes on at length about how trying to tweak thread scheduling usually only leads to solutions that are JVM implementation dependent, non-portable, and fragile.
If there's a particular scheduling problem that you have, then for new code you should not deal with low-level thread calls anyway. Java has higher level concurrency libraries that simplify many of these tasks. Rather than defining the solution to your problem with threads, you ought to be thinking of Executors and Tasks. There are also higher level facilities that simplify inter-thread communication, such as CountDownLatch.
Low level thread calls such as wait, notify, and notifyAll can be difficult to do properly.
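For instance, a CountDownLatch can replace a hand-rolled wait/notify rendezvous in a few lines (a minimal sketch):

import java.util.concurrent.CountDownLatch;

public class LatchExample {
    public static void main(String[] args) throws InterruptedException {
        int workers = 3;
        CountDownLatch done = new CountDownLatch(workers);

        for (int i = 0; i < workers; i++) {
            final int id = i;
            new Thread(() -> {
                System.out.println("worker " + id + " finished");
                done.countDown();       // no wait/notify or shared lock needed
            }).start();
        }

        done.await();                   // blocks until all workers have counted down
        System.out.println("all workers done");
    }
}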
You could write your own thread scheduler, analogous to the Quartz job scheduler for batch jobs.
This would allow you to execute threads at various times of the day during the run of your application.
If all you want is to determine the order of your thread execution, execute the code from one master thread.

Java Performance Processes vs Threads

I am implementing a worker pool in Java.
This is essentially a whole load of objects which will pick up chunks of data, process the data and then store the result. Because of IO latency there will be significantly more workers than processor cores.
The server is dedicated to this task and I want to wring the maximum performance out of the hardware (but no I don't want to implement it in C++).
The simplest implementation would be to have a single Java process which creates and monitors a number of worker threads. An alternative would be to run a Java process for each worker.
Assuming for arguments sake a quadcore Linux server which of these solutions would you anticipate being more performant and why?
You can assume the workers never need to communicate with one another.
One process, multiple threads - for a few reasons.
When context-switching between jobs, it's cheaper on some processors to switch between threads than between processes. This is especially important in this kind of I/O-bound case with more workers than cores. The more work you do between getting I/O blocked, the less important this is. Good buffering will pay for threads or processes, though.
When switching between threads in the same JVM, at least some Linux implementations (x86, in particular) don't need to flush the cache. See Tsuna's blog. Cache pollution between threads will be minimized, since they can share the program cache, are performing the same task, and are sharing the same copy of the code. We're talking savings on the order of hundreds of nanoseconds to several microseconds per switch. If that's small potatoes for you, then read on...
Depending on the design, the I/O data path may be shorter for one process.
The startup and warmup time for a thread is generally much shorter. The OS doesn't have to start a process, Java doesn't have to start another JVM, classloading is only done once, JIT-compilation is only done once, and HotSpot optimizations are done once, and sooner.
Usually, when discussing multiprocessing (with one thread per process) versus multithreading in the same process, the theoretical overhead is bigger in the first case than in the latter (and thus multiprocessing is theoretically slower than multithreading), but in reality on most modern OSes this is not such a big issue. However, when discussing it in the Java context, starting a new process is a lot more costly than starting a new thread. Starting a new process means starting up a new instance of the JVM, which is very costly, especially in terms of memory. I recommend that you start multiple threads in the same JVM.
Moreover, if you say inter-thread communication is not an issue, you can use Java's ExecutorService to get a fixed thread pool of size 2x (number of available CPUs). The number of available CPUs can be autodetected at runtime via Java's Runtime class. This way you get quick, simple multithreading going without any boilerplate code.
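Concretely, that sizing rule looks like this (a minimal sketch; the class name and workload are made up):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WorkerPool {
    public static void main(String[] args) {
        // 2x the detected core count, per the rule of thumb above for I/O-heavy workers
        int poolSize = 2 * Runtime.getRuntime().availableProcessors();
        ExecutorService workers = Executors.newFixedThreadPool(poolSize);

        for (int i = 0; i < 100; i++) {
            final int chunk = i;
            workers.submit(() -> System.out.println("processing chunk " + chunk));
        }
        workers.shutdown();
    }
}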
Actually, if you do this with large-scale tasks, using multiple JVM processes is way faster than one JVM with multiple threads. At least we never got one JVM running as fast as multiple JVMs.
We do some calculations where each task uses around 2-3 GB of RAM and does some heavy number crunching. If we spawn 30 JVMs and run 30 tasks, they perform around 15-20% better than spawning 30 threads in one JVM. We tried tuning the GC and the various memory sections and never caught up with the first variant.
We did this on various machines: 14 tasks on a 16-core server, 34 tasks on a 36-core server, etc. Multithreading in Java always performed worse than multiple JVM processes.
It may not make any difference on simple tasks, but on heavy calculations the JVM seems to perform worse with threads.

Understanding java's native threads and the jvm

I understand that the JVM is itself an application that turns the bytecode of the Java executable into native machine code, but when using native threads I have some questions that I just cannot seem to answer.
Does every thread create their own instance of the JVM to handle their particular execution?
If not, does the JVM have some way to schedule which thread it will handle next? If so, wouldn't this render the multi-threaded nature of Java useless, since only one thread can be run at a time?
Does every thread create their own instance of the JVM to handle their particular execution?
No. They execute in the same JVM so that (for example) they can share objects and values of static fields.
If not, does the JVM have some way to schedule which thread it will handle next?
There are two kinds of thread implementation in Java. Native threads are mapped onto a thread abstraction which is implemented by the host OS. The OS takes care of native thread scheduling, and time slicing.
The second kind of thread is "green threads". These are implemented and managed by the JVM itself, with the JVM implementing thread scheduling. Java green thread implementations have not been supported by Sun / Oracle JVMs since Java 1.2. (See Green Threads vs Non Green Threads)
If so, wouldn't this render the multi-threaded nature of Java useless, since only one thread can be run at a time?
We are talking about green threads now, and this is of historic interest (only) from the Java perspective.
Green threads have the advantage that scheduling and context switching are faster in the non-I/O case. (Based on measurements made with Java on Linux 2.2; http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.8.9238)
With pure green threads, N programming language threads are mapped to a single native thread. In this model you don't get true parallel execution, as you noted.
In a hybrid thread implementation, N programming language threads are mapped onto M native threads (where N > M). In this model, the in-process thread scheduler is responsible for the green thread to native thread scheduling AND you get true parallel execution (if M > 1); see https://stackoverflow.com/a/16965741/139985.
But even with pure green threads, you still get concurrency. Control is switched to another thread when a thread blocks on an I/O operation, while acquiring a lock, and so on. Furthermore, the JVM's runtime could implement periodic thread preemption so that a CPU-intensive thread doesn't monopolize the (single) core to the exclusion of other threads.
Does every thread create their own instance of the JVM to handle their particular execution?
No, your application running in the JVM can have many threads that all exist within that instance of the JVM.
If not, does the JVM have some way to schedule which thread it will handle next...
Yes, the JVM has a thread scheduler. There are many different algorithms for thread scheduling, and which one is used is JVM-vendor dependent. (Scheduling in general is an interesting topic.)
...if so, wouldn't this render the multi-threaded nature of Java useless, since only one thread can be run at a time?
I'm not sure I understand this part of your question. This is kind of the point of threading. You typically have more threads than CPUs, and you want to run more than one thing at a time. Threading allows you to take full(er) advantage of your CPU by making sure it's busy processing one thread while another is waiting on I/O, or is for some other reason not busy.
A Java thread may be mapped one-to-one to a kernel thread. But this need not be so. There could be n kernel threads running m Java threads, where m may be much larger than n, and n should be larger than the number of processors. The JVM itself starts the n kernel threads, and each one of them picks a Java thread and runs it for a while, then switches to some other Java thread. The operating system picks kernel threads and assigns them to a CPU. So there may be thread scheduling on several levels.
You may be interested in looking at the Go programming language, where thousands of so-called "goroutines" are run by dozens of threads.
Java threads are mapped to native OS threads. They have little to do with the JVM itself.
