I have a large Android project with 50 distinct code instances of the newSingleThreadExecutor() that are triggered by various unpredictable events including UI button presses. After researching a bit, I think that a single thread is created for each of the 50 distinct code instances and will maintain sequential order of execution of calls to the executor.
How many threads are created and is there a problem in maintaining sequential execution in order of calls to the executor?
Note: the code is not using submit, which is why I am confused about the order of execution and the number of threads created. For example, what happens if a user presses the UI button a hundred times very fast? Will I get a hundred threads finishing at different times based on when the database futures complete?
Listed below is how the code is formatted in all 50 instances.
Executors.newSingleThreadExecutor().execute(new Runnable() {
    @Override
    public void run()
    {
        // Executes multiple database calls using futures and
        // Uses get() which waits if necessary for the computation to complete,
        // and then retrieves its result.
    }
});
How many threads are created and is there a problem in maintaining sequential execution in order of calls to the executor?
I'm not sure I'm understanding whether or not you are asking about the pool threads or the threads forked each time through the loop.
So if you are creating a new ExecutorService each of the 50 times, then each executor starts its own thread, and you cannot tell in advance how many of those threads are busy at once. If all of the 49 previous jobs are still running, then the 50th will run on yet another thread alongside them. If each job finishes before the next one is executed, then only one thread will be doing work at any moment. This doesn't take into account the number of futures and other stuff done by each thread. If it is already a background thread, do you really need multiple futures, etc.?
Without more details, it seems to me that the right thing to do here is to use a single ExecutorService for all 50 jobs. Then you can determine if you want a fixed number of threads or make it dynamic.
I think that a single thread is created for each of the 50 distinct code instances and will maintain sequential order of execution of calls to the executor.
There will be no "sequential order" because you are creating a new executor service each time. If you were using a single one then the only way to guarantee sequential order of execution is to only have one thread in the pool.
For example, what happens if a user presses the UI button a hundred times very fast? Will I get a hundred threads finishing at different times based on when the database futures complete?
Yes, you will get 100 different threads. If you want to control the maximum number of threads forked, then again, use a single thread pool for all of the tasks.
Btw, you must always shut down an executor service after you submit the last job:
ExecutorService threadPool = Executors.newSingleThreadExecutor();
threadPool.execute(new Runnable() { ... });
// this must be done to properly quit the threads
threadPool.shutdown();
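To make the "single ExecutorService" suggestion concrete, here is a minimal sketch. The class and method names (SharedSingleThreadDemo, runInOrder) are made up for illustration, and the trivial task body stands in for the database work in the question. It assumes one single-thread executor shared by every call site, in which case the tasks run one at a time in the order they were handed to execute().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SharedSingleThreadDemo {

    // Hands taskCount tasks to ONE shared single-thread executor and
    // returns the ids in the order the tasks actually ran.
    static List<Integer> runInOrder(int taskCount) {
        ExecutorService shared = Executors.newSingleThreadExecutor();
        List<Integer> completed = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < taskCount; i++) {
            final int id = i;
            // Placeholder for the real work (e.g. the database calls).
            shared.execute(() -> completed.add(id));
        }
        // No new tasks are accepted after this, but queued ones still run.
        shared.shutdown();
        try {
            if (!shared.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(completed);
    }

    public static void main(String[] args) {
        System.out.println(runInOrder(5)); // prints [0, 1, 2, 3, 4]
    }
}
```

In an Android app the shared executor would more likely live in a long-lived holder (an application-scoped singleton, for example) and be shut down when that holder is torn down, rather than being created and shut down inside one method as in this sketch.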
I have a service which schedules async tasks using ScheduledExecutorService for the user. Each user will trigger the service to schedule two tasks. (The 1st Task schedule the 2nd task with a fixed delay, such as 10 seconds interval)
Pseudocode illustration:
task1Future = threadPoolTaskScheduler.schedule(task1);
for (int i = 0; i < 10000; ++i) {
    task2Future = threadPoolTaskScheduler.schedule(task2);
    task2Future.get(); // Takes a long time
    Thread.sleep(10);
}
task1Future.get();
Suppose I have a potential of 10000 users using the service at the same time, we can have two kinds of ScheduledExecutorService configuration for my service:
A single ScheduledExecutorService for all the users.
Create a ScheduledExecutorService for each user.
What I can think about the first method:
Pros:
Easy to control the number of threads in the thread pool.
Avoid creating new threads for scheduled tasks.
Cons:
Always keeping a number of threads available could waste computer resources.
May cause the service to hang because of a lack of available threads. (For example, the thread pool size is set to 10 and 100 people use the service at the same time. After entering the 1st task, it tries to schedule the 2nd task and finds that there is no thread available for scheduling it.)
What I can think about the second method:
Pros:
Avoids always keeping many threads available when the number of users is small.
Can always provide threads for a large number of simultaneous users.
Cons:
Creating new threads adds overhead.
I don't know how to control the maximum number of threads for the service. It may cause the service to run out of RAM.
Any ideas about which way is better?
Single ScheduledExecutorService drives many tasks
The entire point of a ScheduledExecutorService is to maintain a collection of tasks to be executed after a certain amount of time elapses.
So given the scenario you describe, you need only a single ScheduledExecutorService object. Submit your 10,000 tasks to that one object. Each task will be executed approximately when its designated delay elapses. Simple, and easy.
Thread pool size
The real issue is deciding how many threads to assign to the ScheduledExecutorService.
Threads, as currently implemented in the OpenJDK project, are mapped directly to host OS threads. This makes them relatively heavyweight in terms of CPU and memory usage. In other words, currently Java threads are “expensive”.
There is no simple answer to calculating thread pool size. The optimal number is the smallest number of threads that can keep up with the workload without overburdening the host machine’s limited number of cores and limited memory. If you search Stack Overflow, you’ll find many discussions on the topic of deciding how many threads to use in a pool.
Project Loom
And keep tabs on the progress of Project Loom and its promise to bring virtual threads to Java. That technology has the potential to radically alter the calculus of deciding thread pool size. Virtual threads will be more efficient with CPU and with memory. In other words, virtual threads will be quite “cheap”.
How executor service works
You said:
entering the 1st task and it tries to schedule the 2nd task, then finding out there is no thread available for scheduling the 2nd task
That is not how the scheduled executor service (SES) works.
If a task being currently executed by a SES needs to schedule itself or some other task to later execution, that submitted task is added to the queue maintained internally by the SES. There is no need to have a thread immediately available. Nothing happens immediately except that queue addition. Later, when the added task’s specified delay has elapsed, the SES looks for an available thread in its thread-pool to execute that task that was queued a while back in time.
You seem to feel a need to manage the time of each task’s execution on certain threads. But that is the job of the scheduled executor service. The SES tracks the tasks submitted for execution, notices when their specified delay elapses, and schedules their execution on a thread from its managed pool of threads. You don’t need to manage any of that. Your only challenge is to assign an appropriate number of threads to the pool.
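As a small illustration of the point above, the following sketch uses a scheduled executor with a single thread. The names (RescheduleDemo, first, second) and the delays are arbitrary choices for the example, not something taken from the question. The first task schedules the second from inside its own run, and the second still executes later on the same thread, because scheduling only adds to the internal queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class RescheduleDemo {

    // Returns the order in which the two tasks ran.
    static List<String> run() {
        // Deliberately only ONE thread in the pool.
        ScheduledExecutorService ses = Executors.newScheduledThreadPool(1);
        List<String> log = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(1);

        Runnable second = () -> {
            log.add("second");
            done.countDown();
        };
        Runnable first = () -> {
            log.add("first");
            // Only enqueues the task; no free thread is needed right now.
            ses.schedule(second, 50, TimeUnit.MILLISECONDS);
        };

        ses.schedule(first, 10, TimeUnit.MILLISECONDS);
        try {
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            ses.shutdown();
        }
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [first, second]
    }
}
```

Note that the first task returns right after scheduling. If it instead blocked on get() of the future it had just scheduled, it would hold the only thread while waiting, so blocking inside a task is what to watch for, not the scheduling call itself.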
Multiple executor services
You commented:
why don't use multiple ScheduledExecutorService instances
Because in your scenario, there is no benefit. Your Question implies that you have many tasks all similar with none being prioritized. In such a case, just use one executor service. One scheduled executor service with 12 threads will get the same amount of work accomplished as 3 services with 4 threads each.
As for excess threads, they are not a burden. Any thread without a task to execute uses virtually no CPU time. A pool may or may not choose to close some unused threads after a while. But such a policy is up to the implementation of the thread pool of the executor service, and is transparent to us as calling programmers.
If the scenario were different, where some of the tasks block for long periods of time, or where you need to prioritize certain tasks, then you may want to segregate those into a separate executor service.
In today's Java (before Project Loom with virtual threads), when code in a thread blocks, that thread sits there doing nothing but waiting to unblock. Blocking means your code is performing an operation that awaits a response. For example, making network calls to a socket or web service blocks, writing to storage blocks, and accessing an external database blocks. Ideally, you would not write code that blocks for long periods of time. But sometimes you must.
In such a case where some tasks run long, or conversely you have some tasks that must be prioritized for fast execution, then yes, use multiple executor services.
For example, say you have a 16-core machine with not much else running except your Java app. You might have one executor service with a thread pool size of 4 maximum for long-running tasks, one executor service with a thread pool with a size of 7 maximum for many run-of-the-mill tasks, and a third executor service with a thread pool maximum size of 2 for very few tasks that run short but must run quickly. (The numbers here are arbitrary examples, not a recommendation.)
Other approaches
As commented, there are other frameworks for managing concurrency. The ScheduledExecutorService discussed here is general purpose.
For example, Swing, JavaFX, Spring, and Jakarta EE each have their own concurrency management. Consider using those where appropriate to your particular project.
When trying to determine how the tasks should break down in a data processing server in Java, I need to know how many Futures is too many for ExecutorService.
To my understanding, an ExecutorService with a pool of heavyweight threads handles Futures like they are green threads, meaning the cost to perform a context switch between Futures is very small. Is this true?
Should I submit millions of Futures to ExecutorService (using fixed number of threads in the pool)?
Can I expect to submit many very-short-lived Futures (10 ms) into Executor service without seeing severe performance degradation?
You're conflating a Future, which represents the possible result of an asynchronous operation, with a Thread, which represents the ability to perform processing on a Callable (in the case of an Executor at least).
There's nothing to stop you calling submit on a thread pool millions of times and get a huge list of Future objects for you to wait on. You don't even need to wait for them to finish if the application will continue running and you have no need to process the result.
But.
If you create all these jobs, they are going to require memory to hold their state. If that memory is somehow part of the input to the job, or the result of executing the job, then you will commit heap space to all these tasks. You can't do this forever. Essentially, you need to think of some sort of throttling, if you're going to pull huge amount of work into a process to run in the background.
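One possible way to throttle, sketched under assumptions: a ThreadPoolExecutor with a bounded queue and CallerRunsPolicy, so that when the queue is full the submitting thread runs the task itself and is thereby slowed down. The pool size of 4 and queue capacity of 100 are arbitrary example values, and ThrottledSubmitDemo is a made-up name.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ThrottledSubmitDemo {

    // Runs taskCount trivial tasks and returns how many completed.
    static int runAll(int taskCount) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4,                              // fixed number of worker threads
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(100),     // at most 100 tasks waiting
                new ThreadPoolExecutor.CallerRunsPolicy()); // full queue: caller runs the task

        AtomicInteger finished = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            // Placeholder job; real jobs would carry their own input and state.
            pool.execute(() -> finished.incrementAndGet());
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return finished.get();
    }

    public static void main(String[] args) {
        System.out.println(runAll(100_000)); // prints 100000
    }
}
```

With this setup the number of task objects held in memory at any moment stays bounded by the queue capacity plus the tasks in progress, no matter how many are submitted in total.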
To my understanding, an ExecutorService with a pool of heavyweight threads handles Futures like they are green threads
That's not correct. If we ignore the bells and whistles, an ExecutorService consists of a collection of worker threads, and a blocking queue of tasks. Each task in the queue is a wrapper containing one of your tasks, and a Future.
Each worker thread loops forever,
Picks a task from the queue,
Calls your Runnable or Callable object's run() or call() method,
Completes the Future with the value returned by your method or with an exception that was thrown by your method,
Goes back to wait for another task.
The only threads are the "heavyweight" worker threads. Once one of the worker threads starts to work on a task, it won't do anything else until the task is complete. Tasks that haven't yet been started are just objects in a queue, and the Executor forgets about each task and Future object as soon as the Future is completed. Those won't continue to exist after your own code has discarded the references to them.
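The worker loop described above can be sketched in a few lines. This is a toy illustration only, not how ThreadPoolExecutor is actually written. TinyPool and its methods are hypothetical names, and FutureTask is used as the "wrapper containing your task and a Future".

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.LinkedBlockingQueue;

final class TinyPool {
    // Tasks that have not started yet are just objects in this queue.
    private final BlockingQueue<FutureTask<?>> queue = new LinkedBlockingQueue<>();
    private final Thread[] workers;

    TinyPool(int threads) {
        workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(this::workerLoop, "tiny-worker-" + i);
            workers[i].setDaemon(true);
            workers[i].start();
        }
    }

    private void workerLoop() {
        try {
            while (true) {
                FutureTask<?> task = queue.take(); // wait for and pick a task
                task.run(); // calls call() and completes the Future (value or exception)
            }
        } catch (InterruptedException e) {
            // Interrupted while waiting for work: let the worker thread end.
        }
    }

    <T> Future<T> submit(Callable<T> job) {
        FutureTask<T> task = new FutureTask<>(job);
        queue.add(task);
        return task;
    }

    void stop() {
        for (Thread w : workers) {
            w.interrupt();
        }
    }

    // Small usage example: one job, result read through the Future.
    static int demo() {
        TinyPool pool = new TinyPool(2);
        try {
            return pool.submit(() -> 6 * 7).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.stop();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 42
    }
}
```

The only threads here are the ones created in the constructor. A Future is just the handle through which the caller reads what a worker produced.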
Should I submit millions of Futures to ExecutorService?
You can, but you should evaluate the possible time overhead. The overhead of handling a separate Future object is small, but greater than zero. So the fewer tasks, the better. On the other hand, when the number of tasks becomes less than the number of processors (that is, the number of processor cores, taking hyperthreading into account), the level of parallelism is reduced and the overall execution time increases.
Suppose you have 1 million 10-ms tasks and your computer has 8 cores. Then the overall execution time of 1250 sec is increased by (10*8/2) = 40 ms because of decreasing parallelism at the end, plus 125 ms for task switching (I estimate it at as little as 1 µs per task switch). If you have 100000 100-ms tasks, then the execution time is still expected to be 1250 sec, plus 400 ms for the tail and 12.5 ms for switching. Either way, the time overhead is negligible, but it can increase if your tasks are significantly shorter or longer than the 10 to 100 ms range.
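The figures above can be reproduced with a few lines of integer arithmetic. This sketch just encodes the rough formulas used in this answer (work divided evenly over the cores, a tail of task time * cores / 2, and an assumed 1 µs per task switch). It is an estimate, not a measurement, and OverheadEstimate is a made-up name. All values are in microseconds to avoid floating-point rounding.

```java
final class OverheadEstimate {

    // Ideal run time: total work divided evenly over the cores.
    static long idealMicros(long tasks, long taskMicros, int cores) {
        return tasks * taskMicros / cores;
    }

    // Tail estimate as used in the text: taskTime * cores / 2.
    static long tailMicros(long taskMicros, int cores) {
        return taskMicros * cores / 2;
    }

    // Switching overhead: one switch per task, spread over the cores.
    static long switchMicros(long tasks, long switchCostMicros, int cores) {
        return tasks * switchCostMicros / cores;
    }

    public static void main(String[] args) {
        int cores = 8;
        long switchCost = 1; // assumed cost of one task switch, in microseconds

        // 1 million tasks of 10 ms each
        System.out.println(idealMicros(1_000_000, 10_000, cores));  // prints 1250000000 (1250 s)
        System.out.println(tailMicros(10_000, cores));              // prints 40000 (40 ms)
        System.out.println(switchMicros(1_000_000, switchCost, cores)); // prints 125000 (125 ms)

        // 100000 tasks of 100 ms each
        System.out.println(idealMicros(100_000, 100_000, cores));   // prints 1250000000 (1250 s)
        System.out.println(tailMicros(100_000, cores));             // prints 400000 (400 ms)
        System.out.println(switchMicros(100_000, switchCost, cores));   // prints 12500 (12.5 ms)
    }
}
```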
Please note that I usually ask a question only after googling the issue more than 20 times. But I still can't understand it. So I need your help.
Basically, I don't understand the exact usage of newFixedThreadPool
Does newFixedThreadPool(10) mean having ten different threads? Or does it mean it can have 10 of the same threads? or the both?
I executed with submit() methods more than 20 times and it's working.
Does submit() print a value? Or are you putting threads in the ExecutorService?
Briefly, tasks are small units of code that could be executed in parallel (code sections). The threads (in a thread pool) are what execute them. You can think of the threads like workers and the tasks like jobs. Jobs can be done in parallel, and workers can work in parallel. Workers work on jobs.
So, to answer your questions:
newFixedThreadPool(int nThreads) creates a thread pool of nThreads threads that operate on the same input queue. nThreads is the maximum number of threads that can be running at any given time. Each thread can run a different task. With your example, you can be running up to 10 tasks at the same time. (The documentation can be found here with credit to @hovercraft-full-of-eels)
submit() pushes the given task into a task queue that is shared by the threads in the thread pool. Once a thread is available, it will take a task from the front of the queue and execute it. It shouldn't print anything, unless the Runnable you pass it has a print statement in it. However, the print statement may not be printed right when you submit the task! It will print once a thread is executing that particular task. (The documentation can be found here)
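To see both points in action, here is a small sketch (FixedPoolDemo and distinctThreads are illustrative names). It submits more tasks than there are threads and records which pool thread ran each one. The number of distinct thread names never exceeds the pool size, and the extra tasks simply wait in the queue.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class FixedPoolDemo {

    // Submits taskCount tasks and returns how many different threads ran them.
    static int distinctThreads(int poolSize, int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Set<String> names = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < taskCount; i++) {
            // Each task records the name of the worker thread executing it.
            pool.submit(() -> {
                names.add(Thread.currentThread().getName());
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return names.size();
    }

    public static void main(String[] args) {
        // 20 tasks on a pool of 10: never more than 10 distinct threads.
        System.out.println(distinctThreads(10, 20));
    }
}
```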
Just refer to the Javadoc or the Java API description rather than googling it.
My comments on your questions are below.
Question 1 ->
ExecutorService executorService = Executors.newFixedThreadPool(10);
First an ExecutorService is created using the Executors newFixedThreadPool() factory method. This creates a thread pool with 10 threads executing tasks.
The Executors.newFixedThreadPool API creates a thread pool that reuses a fixed number of threads, and these threads work on a shared unbounded queue.
At any point, at most nThreads threads will be active processing tasks.
If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available.
If any thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks. The threads in the pool will exist until it is explicitly SHUTDOWN.
That is why it worked with this thread pool even after submitting 20 tasks.
Internally it calls the lines of code below.
public static ExecutorService newFixedThreadPool(int nThreads) {
    return new ThreadPoolExecutor(nThreads, nThreads,
                                  0L, TimeUnit.MILLISECONDS,
                                  new LinkedBlockingQueue<Runnable>());
}
Question 2 -> submit() places a Runnable task in the queue for execution and returns a Future object representing that task. We can use the Future's get() method to check whether the submitted task has completed successfully, because for a Runnable it returns null upon successful completion.
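A short sketch of what get() returns in each case (SubmitResultDemo and its methods are names chosen for this example): null for a Runnable that finished normally, and the computed value for a Callable.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SubmitResultDemo {

    // submit(Runnable): get() returns null once the task has completed.
    static Object runnableResult() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Runnable job = () -> { /* work with no return value */ };
            Future<?> future = pool.submit(job);
            return future.get(); // blocks until the task is done
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // submit(Callable): get() returns the value produced by call().
    static int callableResult() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Callable<Integer> job = () -> 21 * 2;
            Future<Integer> future = pool.submit(job);
            return future.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runnableResult()); // prints null
        System.out.println(callableResult()); // prints 42
    }
}
```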
So, I have a loop where I create thousands of threads which process my data.
I checked and storing a Thread slows down my app.
It's from my loop:
Record r = new Record(id, data, outPath, debug);
//r.start();
threads.add(r);
//id is 4 digits
//data is something like 500 chars long
It stops my for loop for a while (it takes a second or more for one run, which is too much!).
Only init > duration: 0:00:06.369
With adding thread to ArrayList > duration: 0:00:07.348
Questions:
what is the best way of storing Threads?
how to make Threads faster?
should I create Threads and run them with special executor, means for example 10 at once, then next 10 etc.? (if yes, then how?)
Consider that having a very high number of threads is not very useful.
At most, the number of threads that can actually execute at the same time equals the number of cores of your CPU.
The best approach is to reuse existing threads. To do that you can use the Executor framework.
For example, to create an Executor that internally handles at most 10 threads, you can do the following:
List<Record> records = ...;
ExecutorService executor = Executors.newFixedThreadPool(10);
for (Record r : records) {
executor.submit(r);
}
// At the end stop the executor
executor.shutdown();
With code similar to this you can submit many thousands of commands (Runnable implementations), but no more than 10 threads will be created.
I'm guessing that it is not the .add method that is really slowing you down. My guess is that the hundreds of Threads running in parallel is what really is the problem. Of course a simple command like "add" will be queued in the pipeline and can take long to be executed, even if the execution itself is fast. Also it is possible that your data-structure has an add method that is in O(n).
Possible solutions for this:
* Find a real wait-free solution for this. E.g. prioritising threads.
* Add them all to your data-structure before executing them
While it is possible to work like this, it is strongly discouraged to create more than a few Threads for stuff like this. You should use the thread Executor as David Lorenzo already pointed out.
I have a loop where I create thousands of threads...
That's a bad sign right there. Creating threads is expensive.
Presumably your program creates thousands of threads because it has thousands of tasks to perform. The trick is to decouple the threads from the tasks. Create just a few threads, and reuse them.
That's what a thread pool does for you.
Learn about the java.util.concurrent.ThreadPoolExecutor class and related classes (e.g., Future). It implements a thread pool, and chances are very likely that it provides all of the features that you need.
If your needs are simple enough, you can use one of the static methods in java.util.concurrent.Executors to create and configure a thread pool. (e.g., Executors.newFixedThreadPool(N) will create a new thread pool with exactly N threads.)
If your tasks are all compute bound, then there's no reason to have any more threads than the number of CPUs in the machine. If your tasks spend time waiting for something (e.g., waiting for commands from a network client), then the decision of how many threads to create becomes more complicated: It depends on how much of what resources those threads use. You may need to experiment to find the right number.
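For the compute-bound case, a minimal sketch could size the pool from Runtime.getRuntime().availableProcessors() and hand over all tasks at once with invokeAll. The class name CpuBoundPoolDemo and the toy workload (squaring numbers) are placeholders. Real tasks would do the actual per-record processing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class CpuBoundPoolDemo {

    // Computes 1^2 + 2^2 + ... + n^2 with one small task per number.
    static long sumOfSquares(int n) {
        // One thread per core: more would not help purely compute-bound work.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Callable<Long>> jobs = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final long v = i;
                jobs.add(() -> v * v); // thousands of tasks, only a few threads
            }
            long total = 0;
            // invokeAll blocks until every task has completed.
            for (Future<Long> f : pool.invokeAll(jobs)) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10)); // prints 385
    }
}
```

If the tasks wait on I/O instead, the core count is only a starting point and the right size has to be found by experiment, as said above.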
I want to process some data in parallel worker threads. But instead of a parent thread that checks if one worker thread has finished and then assigns a new task, I want the threads to load the data themselves and to restart themselves again.
Now this is what I came up with:
public class MainApp {
    ExecutorService executor;

    public synchronized void runNewWorkerThread() {
        // load the data to be processed in the threads from a file
        executor.submit(() -> {
            try {
                // process data (unstable)
            } catch (Exception e) {
                // catch and log exception
            } finally {
                runNewWorkerThread();
            }
        });
    }
}
Now this recursively restarts the worker threads. Is this an acceptable design, or should I rather keep the worker threads alive by doing some kind of loop inside the runnable?
If this is an acceptable design, which ExecutorService would you recommend I use, and why?
Thanks a lot,
Flo
Edit: The number of Threads started is fixed, because in the threads a fixed number of real devices is automated. However there is one single list the threads need to load their data from,sequentially.
I think your code is fine. Also, you should not run into a StackOverflowError, since you do not call the method runNewWorkerThread directly. You just submit the code that calls runNewWorkerThread to the ExecutorService, and the submit call will return pretty much instantly (depending on the implementation).
Be sure to start the worker properly. If you want e.g. five threads to run in parallel, you need to call the runNewWorkerThread method five times, because every call to runNewWorkerThread will start only exactly one new runNewWorkerThread after it is finished. Also, you should only have one MainApp object, to ensure the synchronized keyword really synchronizes all load operations.
Update
If you use e.g. the newFixedThreadPool, you can be sure not to run into a StackOverflowError, because this ExecutorService only runs a fixed number of threads at a time. That means that it will only execute another submitted task after one of the other tasks is finished. Because the other task is finished, it must have left the runNewWorkerThread method. I hope this is clear enough?
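Putting the pieces together, a runnable sketch of the design might look like the following. Everything beyond runNewWorkerThread is an assumption made for the example: the in-memory queue standing in for the file, the upper-casing standing in for the device automation, and the latch used to know when every chain of workers has run out of data.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SelfResubmittingWorkers {
    private final int workers;
    private final ExecutorService executor;
    private final Queue<String> pending; // the single shared list, read sequentially
    private final List<String> processed = Collections.synchronizedList(new ArrayList<>());
    private final CountDownLatch chainsDone;

    SelfResubmittingWorkers(int workers, List<String> items) {
        this.workers = workers;
        this.executor = Executors.newFixedThreadPool(workers);
        this.pending = new ArrayDeque<>(items);
        this.chainsDone = new CountDownLatch(workers);
    }

    // synchronized: only one thread at a time loads the next item.
    synchronized void runNewWorkerThread() {
        String item = pending.poll();
        if (item == null) {        // nothing left: this chain of tasks ends
            chainsDone.countDown();
            return;
        }
        executor.submit(() -> {
            try {
                processed.add(item.toUpperCase()); // placeholder for the unstable work
            } catch (RuntimeException e) {
                // catch and log exception
            } finally {
                runNewWorkerThread(); // queue the next task, then return
            }
        });
    }

    List<String> processAll() {
        for (int i = 0; i < workers; i++) {
            runNewWorkerThread(); // one initial call per parallel worker
        }
        try {
            if (!chainsDone.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
        synchronized (processed) {
            return new ArrayList<>(processed);
        }
    }

    public static void main(String[] args) {
        List<String> in = List.of("a", "b", "c", "d", "e");
        // Order of the results may vary between runs.
        System.out.println(new SelfResubmittingWorkers(3, in).processAll());
    }
}
```

Each submitted task returns right after queuing its successor, so the call stack never grows. The "recursion" is only a chain of separate tasks.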