I'm using ScheduledThreadPoolExecutor to schedule a large number of tasks to run evenly over an hour.
There will be tens of thousands of tasks, and this may grow during surges in demand to hundreds of thousands or millions.
Normally, with a ThreadPoolExecutor, I can set the corePoolSize to a reasonable figure, such as 10 threads, and then allow the pool to grow as necessary.
However, the documentation for ScheduledThreadPoolExecutor says that:
because it acts as a fixed-sized pool using corePoolSize threads and
an unbounded queue, adjustments to maximumPoolSize have no useful
effect
This means that if I set the corePoolSize to 10, the pool is capped at 10 threads. And if I set the corePoolSize to 1000, the pool will grow to and hold on to 1000 threads as tasks come in, even though it may never need that many active at once.
Is there an alternative to ScheduledThreadPoolExecutor that will let me set a high maximum thread count, without allocating all of those threads instantly?
FYI, the reason for the need for a huge active thread count is that the threads are I/O bound, and not CPU bound.
Is there an alternative to ScheduledThreadPoolExecutor that will let me set a high maximum thread count
Yes!
Well… maybe. Perhaps in the near future.
Project Loom
Project Loom is an effort, several years in the making, to bring new capabilities to the concurrency facilities in Java.
Virtual Threads are being previewed now in Java 19. Each conventional thread in Java maps one-to-one to a thread provided and managed by the host operating system. OS threads are expensive in terms of scheduling, CPU utilization, and memory. Virtual threads, in contrast, are multiplexed many-to-few over a small pool of host OS threads. The JVM parks a virtual thread and switches to another whenever it detects Java code that blocks. As a result you can have many more threads, millions even, rather than the dozens or hundreds typical of a conventional thread pool.
Structured Concurrency is being incubated in Java 19. Basically this provides an easy way to submit and track a bunch of tasks across a bunch of threads. This “treats multiple tasks running in different threads as a single unit of work”, to quote JEP 428.
Virtual threads promise to be generally appropriate to most tasks in most apps commonly implemented in Java. There are two caveats: (a) Virtual threads are contra-indicated for tasks that are CPU-bound, continually computing with no blocking (no user-interface interaction, no logging, no disk access, no network access, no database access, etc.). An example would be video-encoding; for such work use conventional threads in Java. (b) Tasks that utilize constrained resources will need gate-keeping or throttling.
In your scenario, I expect you will not need to concern yourself with setting thread pool size. Virtual threads handle that automatically in common cases. And you’ll not need to distribute the tasks over time to avoid overloading the machine — again, virtual threads manage that automatically in common cases.
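For your scenario, a minimal sketch (assuming Java 19+ with preview features enabled, or Java 21+ where virtual threads are final; the task body is just a stand-in for your I/O-bound work):

import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadSketch {
    public static void main(String[] args) {
        // One virtual thread per task; the carrier OS thread is freed whenever a task blocks.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 100_000; i++) {
                executor.submit(() -> {
                    Thread.sleep(Duration.ofSeconds(1));   // stand-in for blocking I/O
                    return null;
                });
            }
        }   // close() waits for all submitted tasks to finish
    }
}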
For more info on Project Loom, see the many articles available on the Web, and the many talks and interviews with the Loom team members including Ron Pressler and Alan Bateman. And I recommend multiple recent videos on the YouTube channel JEP Café.
The answer was simple:
I replaced
scheduledThreadPoolExecutor.schedule(task, delay, timeUnit);
with
scheduledThreadPoolExecutor.schedule(()->executorService.submit(task), delay, timeUnit);
where executorService is a ThreadPoolExecutor whose corePoolSize is allowed to differ from its maximumPoolSize.
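Sketching the pattern more fully (the pool sizes, delay, and task body are illustrative, not the actual configuration):

import java.util.concurrent.*;

// A small pool only fires the triggers; the worker pool actually runs the tasks.
ScheduledExecutorService scheduledThreadPoolExecutor = Executors.newScheduledThreadPool(2);

ThreadPoolExecutor executorService = new ThreadPoolExecutor(
        10,                         // corePoolSize: threads kept alive when idle
        1000,                       // maximumPoolSize: upper bound during surges
        60L, TimeUnit.SECONDS,      // idle threads above core are reclaimed
        new SynchronousQueue<>());  // direct hand-off: the pool grows instead of queueing

Runnable task = () -> { /* I/O-bound work */ };

scheduledThreadPoolExecutor.schedule(() -> executorService.submit(task), 500, TimeUnit.MILLISECONDS);

// Note: with a SynchronousQueue, submissions are rejected once all 1000 threads are busy;
// pick the queue and RejectedExecutionHandler to suit the surge behaviour you want.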
Related
Java's ForkJoin pools have been compared to the other "classic" thread pool implementations in Java many times. The question I have is slightly different though:
Can I use a single, shared ForkJoin pool for an application that has BOTH types of thread usage - long running, socket-handling, transactional threads, AND short running tasks (CompletableFuture)? Or do I have to go through the pain of maintaining 2 separate pools for each type of need?
... In other words, is there a significant (performance?) penalty if ForkJoin is used in places where other Java thread pool implementations would suffice?
According to the documentation, it depends:
(3) Unless the ForkJoinPool.ManagedBlocker API is used, or the number of possibly blocked tasks is known to be less than the pool's ForkJoinPool.getParallelism() level, the pool cannot guarantee that enough threads will be available to ensure progress or good performance.
The parallelism is coupled to the number of available CPU cores. So given enough CPU cores and not too many blocking I/O tasks, you could use the commonPool. That does not mean you should, though. For one thing, ForkJoinPool is explicitly not designed for long running (blocking) tasks. For another thing, you probably want to do something with long running (blocking) tasks during shutdown.
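If you do run blocking work in a ForkJoinPool, the ManagedBlocker hook lets the pool compensate by activating a spare worker. A minimal sketch (the blocking queue is just an illustrative blocking resource):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ForkJoinPool;

// Wraps a blocking take() so the pool knows the worker is about to block.
class QueueTaker<T> implements ForkJoinPool.ManagedBlocker {
    private final BlockingQueue<T> queue;
    private volatile T item;

    QueueTaker(BlockingQueue<T> queue) { this.queue = queue; }

    public boolean block() throws InterruptedException {
        if (item == null) {
            item = queue.take();                   // the actual blocking call
        }
        return true;                               // no further blocking needed
    }

    public boolean isReleasable() {
        return item != null || (item = queue.poll()) != null;
    }

    T item() { return item; }
}

// Inside a ForkJoin task:
//   QueueTaker<String> taker = new QueueTaker<>(queue);
//   ForkJoinPool.managedBlock(taker);             // may activate a spare worker while we block
//   String value = taker.item();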
The spec for this method: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Executors.html#newCachedThreadPool()
Creates a thread pool that creates new threads as needed, but will reuse previously constructed threads when they are available. These pools will typically improve the performance of programs that execute many short-lived asynchronous tasks. Calls to execute will reuse previously constructed threads if available. If no existing thread is available, a new thread will be created and added to the pool. Threads that have not been used for sixty seconds are terminated and removed from the cache. Thus, a pool that remains idle for long enough will not consume any resources. Note that pools with similar properties but different details (for example, timeout parameters) may be created using ThreadPoolExecutor constructors.
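For reference, the javadoc gives this as the equivalent ThreadPoolExecutor construction (a sketch of the documented equivalence, not my actual code):

// Zero core threads, an effectively unbounded maximum, a 60-second idle timeout,
// and a direct hand-off queue (tasks are never buffered).
ExecutorService cached = new ThreadPoolExecutor(0, Integer.MAX_VALUE,
        60L, TimeUnit.SECONDS,
        new SynchronousQueue<Runnable>());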
It's not clear to me from this description - is it safe to have several of these pools in a single program? Or would I potentially run into a situation where one pool stalls on many threads and freezes up other pools?
I don't think there is a clear yes / no answer on this.
On the one hand, there is no fixed limit on the number of threads that ThreadPoolExecutor instances can consume; the JVM architecture itself doesn't limit the number of threads.
On the second hand, the OS / environment may place some limits:
The OS may have hard limits on the total number of native threads it will support.
The OS may restrict the number of native threads that a given process (in this case the JVM) can create. This could be done using ulimit or cgroup limits, and potentially other ways.
A Java thread stack has a size of 1MB (by default) on a typical 64 bit JVM. If you attempt to start() too many threads, you may run out of memory and get an OOME. (The sketch after this list shows the usual knobs for shrinking the stack size.)
If there are a large enough number of threads and/or too much thread context switching, the thread scheduler (in the OS) may struggle.
(Context switching typically happens when a thread does a blocking syscall or has to wait on a lock or a notification. Each time you switch context there are hardware related overheads: saving and restoring registers, switching virtual memory contexts, flushing memory caches, etc.)
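On the stack-size point above, a sketch of the two usual knobs (the sizes are illustrative, and the JVM/OS may round or ignore the per-thread hint):

// Per-thread: request a smaller stack via the Thread constructor that takes a stackSize.
Runnable task = () -> { /* ... */ };
Thread worker = new Thread(null, task, "small-stack-worker", 256 * 1024);
worker.start();
// Process-wide: set the default stack size at launch, e.g.  java -Xss256k MyApp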
On the third hand, there are other things than just the number and size of thread pools that could cause problems. For example, if the thread tasks interact with each other, you could experience problems due to:
deadlocking when locking shared objects,
too much contention on shared locks leading to resource starvation,
too much work leading to timeouts, or
priority inversion problems ... if you try to use priorities to "manage" the workload.
So ...
Is it safe to have several of these pools in a single program?
Or would I potentially run into a situation where one pool stalls on many threads and freezes up other pools.
It is unlikely you would get a "stall" ... unless the tasks are interacting in some way.
But if you have too many runnable threads competing for CPU, each one will get (on average) a smaller share of the finite number of cores available. And lock contention or too much context switching can slow things down further.
I am using ThreadPoolExecutor, having replaced our legacy use of raw Thread objects with it.
I have created executor as below:
pool = new ThreadPoolExecutor(coreSize, size, 0L, TimeUnit.MILLISECONDS,
new LinkedBlockingQueue<Runnable>(coreSize),
new CustomThreadFactory(name),
new CustomRejectionExecutionHandler());
pool.prestartAllCoreThreads();
Here coreSize is maxPoolSize/5. I have pre-started all of the core threads on application startup, roughly 160 threads.
In legacy design we were creating and starting around 670 threads.
But the point is that even after switching to the Executor and replacing the legacy design, we are not getting much better results.
For memory measurements we are using the top command to see memory usage.
For timing we have placed loggers using System.currentTimeMillis() to measure elapsed time.
Please tell how to optimize this design. Thanks.
But the point is that even after switching to the Executor and replacing the legacy design, we are not getting much better results.
I am assuming that you are looking at the overall throughput of your application, and that you are not seeing better performance compared with running each task in its own thread -- i.e. not with a pool?
This sounds like your threads were not being held back by context switching in the first place. Maybe your application is IO bound or otherwise waiting on some other system resource. 670 threads sounds like a lot, and you would have been using a lot of thread stack memory, but otherwise that may not have been holding back the performance of your application.
Typically we use the ExecutorService classes not necessarily because they are faster than raw threads but because the code is easier to manage. The concurrent classes take a lot of the locking, queueing, etc. out of your hands.
A couple of code comments:
I'm not sure you want the LinkedBlockingQueue to be limited by core-size. Those are two different numbers. core-size is the minimum number of threads in the pool. The size of the BlockingQueue is how many jobs can be queued up waiting for a free thread.
As an aside, the ThreadPoolExecutor will never allocate a thread past the core thread number unless the BlockingQueue is full. In your case, a new thread is forked only when all of the core threads are busy and the queue already holds its core-size limit of queued tasks (see the sketch after these comments).
I've never had to use pool.prestartAllCoreThreads();. The core threads will be started once tasks are submitted to the pool so I don't think it buys you much -- at least not with a long running application.
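To make the growth rule concrete, a sketch with illustrative sizes:

// Threads beyond corePoolSize are created only once the queue is full.
ThreadPoolExecutor pool = new ThreadPoolExecutor(
        4,                                  // corePoolSize
        16,                                 // maximumPoolSize
        60L, TimeUnit.SECONDS,
        new LinkedBlockingQueue<>(100));    // up to 100 tasks queue before the pool grows past 4

// With all 4 core threads busy, tasks queue until the 100-slot queue is full; only then are
// extra threads forked, up to 16. Beyond that, submissions go to the rejection handler.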
For timing we have placed loggers using System.currentTimeMillis() to measure elapsed time.
Be careful with this. Too much logging could affect the performance of your application more than re-architecting it does. But I assume you added the loggers after you didn't see a performance improvement.
The executor merely wraps the creation/usage of Threads, so it's not doing anything magical.
It sounds like you have a bottleneck elsewhere. Are you locking on a single object? Do you have a single-threaded resource that every thread hits? In such a case you wouldn't see any change in behaviour.
Is your process CPU-bound? If so, your threads should (very roughly speaking) match the number of processing cores available. Note that each thread you create consumes memory for its stack, and if you're memory-bound then creating more threads won't help.
Many times I've heard that it is better to maintain the number of threads in a thread pool below the number of cores in that system. Having twice or more threads than the number of cores is not only a waste, but also could cause performance degradation.
Are those true? If not, what are the fundamental principles that debunk those claims (specifically relating to java)?
Many times I've heard that it is better to maintain the number of threads in a thread pool below the number of cores in that system. Having twice or more threads than the number of cores is not only a waste, but also could cause performance degradation.
The claims are not true as a general statement. That is to say, sometimes they are true (or true-ish) and other times they are patently false.
A couple things are indisputably true:
More threads means more memory usage. Each thread requires a thread stack. For recent HotSpot JVMs, the minimum thread stack size is 64Kb, and the default can be as much as 1Mb. That can be significant. In addition, any thread that is alive is likely to own or share objects in the heap whether or not it is currently runnable. Therefore it is reasonable to expect that more threads means a larger memory working set.
A JVM cannot have more threads actually running than there are cores (or hyperthread cores or whatever) on the execution hardware. A car won't run without an engine, and a thread won't run without a core.
Beyond that, things get less clear cut. The "problem" is that a live thread can be in a variety of "states". For instance:
A live thread can be running; i.e. actively executing instructions.
A live thread can be runnable; i.e. waiting for a core so that it can be run.
A live thread can be synchronizing; i.e. waiting for a signal from another thread, or waiting for a lock to be released.
A live thread can be waiting on an external event; e.g. waiting for some external server / service to respond to a request.
The "one thread per core" heuristic assumes that threads are either running or runnable (according to the above). But for a lot of multi-threaded applications, the heuristic is wrong ... because it doesn't take account of threads in the other states.
Now "too many" threads clearly can cause significant performance degradation, simple by using too much memory. (Imagine that you have 4Gb of physical memory and you create 8,000 threads with 1Mb stacks. That is a recipe for virtual memory thrashing.)
But what about other things? Can having too many threads cause excessive context switching?
I don't think so. If you have lots of threads, your application's use of those threads can result in excessive context switches, and that is bad for performance. However, I posit that the root cause of the context switching is not the sheer number of threads. The root of the performance problems is more likely that the application is:
synchronizing in a particularly wasteful way; e.g. using Object.notifyAll() when Object.notify() would be better, OR
synchronizing on a highly contended data structure, OR
doing too much synchronization relative to the amount of useful work that each thread is doing, OR
trying to do too much I/O in parallel.
(In the last case, the bottleneck is likely to be the I/O system rather than context switches ... unless the I/O is IPC with services / programs on the same machine.)
The other point is that in the absence of the confounding factors above, having more threads is not going to increase context switches. If your application has N runnable threads competing for M processors, and the threads are purely computational and contention free, then the OS'es thread scheduler is going to attempt to time-slice between them. But the length of a timeslice is likely to be measured in tenths of a second (or more), so that the context switch overhead is negligible compared with the work that a CPU-bound thread actually performs during its slice. And if we assume that the length of a time slice is constant, then the context switch overhead will be constant too. Adding more runnable threads (increasing N) won't change the ratio of work to overhead significantly.
In summary, it is true that "too many threads" is harmful for performance. However, there is no reliable universal "rule of thumb" for how many is "too many". And (fortunately) you generally have considerable leeway before the performance problems of "too many" become significant.
Having fewer threads than cores generally means you can't take advantage of all available cores.
The usual question is how many more threads than cores you want. That, however, varies, depending on the amount of time (overall) that your threads spend doing things like I/O vs. the amount of time they spend doing computation. If they're all doing pure computation, then you'd normally want about the same number of threads as cores. If they're doing a fair amount of I/O, you'd typically want quite a few more threads than cores.
Looking at it from the other direction for a moment, you want enough threads running to ensure that whenever one thread blocks for some reason (typically waiting on I/O) you have another thread (that's not blocked) available to run on that core. The exact number that takes depends on how much of its time each thread spends blocked.
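A common sizing heuristic (from Java Concurrency in Practice, for example) makes that explicit: threads ≈ cores × (1 + wait time / compute time). A sketch, where the wait/compute ratio is something you would measure for your workload rather than a given:

// Assumed measurement: tasks spend ~90% of their time blocked (wait : compute ≈ 9 : 1).
int cores = Runtime.getRuntime().availableProcessors();
double waitComputeRatio = 9.0;
int poolSize = (int) (cores * (1 + waitComputeRatio));   // e.g. 8 cores -> 80 threads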
That's not true, unless the number of threads is vastly more than the number of cores. The usual reasoning is that additional threads will mean additional context switches. But that isn't so, because an operating system will only make unforced context switches if those switches are beneficial, and additional threads don't force additional context switches.
If you create an absurd number of threads, that wastes resources. But none of this is anything compared to how bad creating too few threads is. If you create too few threads, an unexpected block (such as a page fault) can result in CPUs sitting idle, and that swamps any possible harm from a few extra context switches.
Not exactly true; this depends on the overall software architecture. There's a reason to keep more threads than available cores: some of the threads may be suspended by the OS because they're waiting for an I/O to complete. This may be an explicit I/O invocation (such as synchronous reading from a file), as well as implicit, such as system paging.
Actually I've read in one book that keeping the number of threads at twice the number of CPU cores is good practice.
For REST API calls, or I/O-bound operations generally, having more threads than the number of cores can potentially improve performance by allowing multiple API requests to be processed in parallel. However, the optimal number of threads depends on various factors such as the API request frequency, the complexity of the request processing, and the resources available on the server.
If the API request processing is CPU-bound and requires a lot of computation, having too many threads may cause resource contention and lead to reduced performance. In such cases, the number of threads should be limited to the number of cores available.
On the other hand, if the API request processing is I/O-bound and involves a lot of waiting for responses from external resources such as databases, having more threads may improve performance by allowing multiple requests to be processed in parallel.
In any case, it is recommended to perform performance testing to determine the optimal number of threads for your specific use case and monitor the system performance using metrics such as response time, resource utilization, and error rate.
I'm developing a small multi-threaded application (in Java) to help me understand it. As I researched, I learned that the ideal number of threads is the number supported by the processor (i.e. 4 on an Intel i3, 8 on an Intel i7, I think). But Swing alone already has 3 threads, plus 1 thread (the main one, in this case). Does that mean I won't see any significant improvement on a processor which supports 4 threads? Will the Swing threads just consume all the processor threads so that everything else runs on the same core? Is it worth multi-threading it (performance-wise) even with those Swing threads?
OBS: A maybe important observation that needs to be made is that I will be using a JFrame and doing active-rendering. That's probably as far as I will go with swing.
I learned that the ideal number of threads is the number supported by the processor
That statement is only true if your Threads are occupying the whole CPU. For example the Swing thread (Event Dispatch Thread) is most of the time just waiting for user input.
The number one thing that most threads do is waiting. They are just there so they are ready to go at the instant the system needs their services.
The comment about the ideal thread count goes for the number of threads at 100% workload.
Swing's threads spend a lot of time being idle. The ideal thread count refers to threads executing at or near 100% processor time.
You may still not see significant improvements due to other factors, but the threads inherent in swing shouldn't be a concern.
The ideal number of threads is not necessarily governed by the number of CPUs (cores) you have. There's a lot of tuning involved based on the actual code you are executing.
For example, let's take a Runnable which executes some database queries (it doesn't matter what). Most of the thread's time will be spent blocked waiting for a response from the database. So suppose you have 4 cores and execute 4 threads: the odds are that at any given time many of them are blocked on DB calls. You can easily spawn more threads with no ill effect on your CPU. In this case you're limited not by the specs of your machine, but by the degree of concurrency which the DB will handle.
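In that situation the useful bound is the downstream resource rather than the CPU; for instance, a sketch where the pool is sized to an assumed database connection limit:

// Assumption: the database (or its connection pool) allows ~50 concurrent connections,
// so more worker threads than that would only queue up behind the connections.
int maxDbConnections = 50;
ExecutorService dbWorkers = Executors.newFixedThreadPool(maxDbConnections);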
Another example would be file I/O, which spends most of its time waiting for the I/O subsystem to respond with data.
The only real way to tune is to evaluate your multi-threaded code, with some trial and error, in the target environment.
Yes, they do but only minimally. There are other threads such as GC and finalizer threads as well that are running in the background. These are all necessary for the JVM to operate just as the Swing threads are necessary for Swing to work.
You shouldn't have to worry about them unless you are on a dramatically small system with little resources or CPU capacity.
With modern systems, many of which have multiple processors and/or multiple cores, the JVM and the OS will run these other threads on other processors and still give your user threads all of the processor power that you will need.
Also, most of the background Swing threads are in wait loops waiting to handle events and make display changes. Unless you do something wrong, they should make up a small amount of your application processor requirements.
I learned that the ideal number of threads is the number supported by the processor
As @Robin mentioned, this is only necessary when you are trying to optimize a program that has a number of CPU-bound operations. For example, our application typically has thousands of threads on 8 processors and is still very responsive, since the threads are all waiting for IO or events. The only time you need to worry about the number of CPUs is when you are doing processor-intensive operations and you are trying to maximize your throughput.