I've been curious about Quasar and its lightweight fibers as a replacement for threads. After consulting its API docs, I have not been able to figure out how to convert a typical ThreadPoolExecutor into a pool of fibers.
int maxThreadPoolSize = 10;
ThreadPoolExecutor executor = new ThreadPoolExecutor(
maxThreadPoolSize,
maxThreadPoolSize,
10, TimeUnit.MINUTES,
new ArrayBlockingQueue<Runnable>(maxThreadPoolSize),
Executors.defaultThreadFactory(),
new ThreadPoolExecutor.CallerRunsPolicy()
);
for (int i = 0; i < 100; i++) {
executor.execute(new Runnable() {
@Override
public void run() {
// run some code
}
});
}
The above code creates a pool with 10 threads, a queue in front of the pool that can hold 10 elements, and a rejection policy (applied when the queue is full) that makes the main thread execute the rejected Runnable itself. As the for loop creates 100 runnables, 10 of them execute at a time in the pool and 10 wait in the queue; when the next one is rejected, the main thread runs that Runnable itself and, once it finishes, goes back to adding Runnables to the executor.
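The CallerRunsPolicy behaviour described above can be checked in plain Java with a scaled-down pool (one thread and a one-slot queue instead of 10 and 10). This is a minimal sketch; `CallerRunsDemo` and its methods are illustrative names, not part of any library:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class CallerRunsDemo {
    // Returns true if the task rejected by the saturated pool ran on the submitting thread.
    static boolean overflowRunsOnCaller() {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 10, TimeUnit.MINUTES,
                new ArrayBlockingQueue<Runnable>(1),
                Executors.defaultThreadFactory(),
                new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch release = new CountDownLatch(1);
        Thread caller = Thread.currentThread();
        boolean[] ranOnCaller = {false};
        try {
            executor.execute(() -> awaitQuietly(release)); // occupies the only worker thread
            executor.execute(() -> { });                   // fills the one-slot queue
            // Pool and queue are both full, so CallerRunsPolicy runs this task on the caller.
            executor.execute(() -> ranOnCaller[0] = (Thread.currentThread() == caller));
        } finally {
            release.countDown();
            executor.shutdown();
        }
        return ranOnCaller[0];
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```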
How would you do this with Quasar's fibers? Are they meant to be used this way in the first place?
EDIT: My original question was poorly phrased. Essentially, I was trying to find a mechanism to limit how many fibers can run concurrently. For example, do not launch more fibers if there are already 200 fibers running; if the maximum number of fibers are running, wait until one finishes before launching a new one.
Fibers are very cheap, so you shouldn't need pooling (and its async job-dispatching model) at all: just fire up a fiber and let it run regular sequential code every time you need a new sequential process to run concurrently with others.
Each fiber is scheduled by a FiberScheduler; when you create a Fiber without a scheduler, the default FiberForkJoinScheduler is assigned to it.
In short, if you want to manage your fibers in a thread pool, use FiberExecutorScheduler (see Quasar's documentation about scheduling fibers).
Your code could look like this:
int maxThreadPoolSize = 10;
ThreadPoolExecutor executor = new ThreadPoolExecutor(
maxThreadPoolSize,
maxThreadPoolSize,
10, TimeUnit.MINUTES,
new ArrayBlockingQueue<Runnable>(maxThreadPoolSize),
Executors.defaultThreadFactory(),
new ThreadPoolExecutor.CallerRunsPolicy()
);
FiberExecutorScheduler scheduler = new FiberExecutorScheduler("FibersInAPool", executor);
for (int i = 0; i < 100; i++) {
Fiber<Void> fiber = new Fiber<Void>(scheduler, new SuspendableCallable<Void>() {
@Override
public Void run() throws SuspendExecution, InterruptedException {
// run some code
return null;
}
});
fiber.start();
}
java.util.concurrent.Semaphore ended up working well in my particular setup.
General gist of my solution:
create Semaphore with desired max number of permits (aka max concurrent Fibers)
main thread is in charge of picking up tasks to process from a queue
main thread calls semaphore.acquire():
if a permit is available, then launch a new Fiber to process the task (the Fiber releases its permit when it finishes)
if all permits are taken, then the semaphore blocks the main thread until a permit becomes available
once the Fiber is launched, the main thread repeats its logic: it picks up a new task from the queue and attempts to launch a new Fiber.
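The steps above can be sketched with plain JDK classes. This is a hedged sketch, not the poster's code: a cached thread pool stands in for launching Quasar fibers (starting a fiber would go where `pool.execute` is called), the queue holds plain integers as hypothetical tasks, and the method reports the peak concurrency it observed so the limit can be checked:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class SemaphoreLauncher {
    // Main-thread loop: take a task, acquire a permit (blocking if none), launch the task.
    // Returns the highest number of tasks observed running at the same time.
    static int processAll(int taskCount, int maxConcurrent) {
        BlockingQueue<Integer> tasks = new LinkedBlockingQueue<>();
        for (int i = 0; i < taskCount; i++) {
            tasks.add(i);
        }
        Semaphore permits = new Semaphore(maxConcurrent);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newCachedThreadPool(); // stand-in for starting fibers
        try {
            Integer task;
            while ((task = tasks.poll()) != null) {
                final int id = task;
                permits.acquireUninterruptibly(); // blocks the main thread while all permits are taken
                pool.execute(() -> {
                    try {
                        peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                        sleepQuietly(1 + id % 3);     // simulated work on task `id`
                    } finally {
                        running.decrementAndGet();
                        permits.release();            // the finished task hands its permit back
                    }
                });
            }
            permits.acquireUninterruptibly(maxConcurrent); // wait until the last tasks are done
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Releasing the permit in a `finally` block matters: a task that throws would otherwise leak its permit and the main thread would eventually block forever.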
Bonus: Java's standard Semaphore has a fixed number of permits, and its public API offers no way to shrink it (reducePermits() is protected). To make it dynamic, this link came in handy: http://blog.teamlazerbeez.com/2009/04/20/javas-semaphore-resizing/
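One way to get a resizable semaphore, not necessarily the linked article's, is to subclass Semaphore: growing uses `release(n)`, and shrinking uses the protected `reducePermits(n)` that Semaphore exposes to subclasses. This is a hedged sketch; `ResizableSemaphore` and `setMaxPermits` are hypothetical names:

```java
import java.util.concurrent.Semaphore;

class ResizableSemaphore extends Semaphore {
    private int maxPermits;

    ResizableSemaphore(int permits) {
        super(permits, true); // fair, so waiting launchers are served in order
        this.maxPermits = permits;
    }

    // Grows or shrinks the permit limit. Shrinking below the number of permits currently
    // held leaves availablePermits() negative until enough holders release.
    synchronized void setMaxPermits(int newMax) {
        if (newMax < 1) {
            throw new IllegalArgumentException("newMax must be >= 1");
        }
        int delta = newMax - maxPermits;
        if (delta > 0) {
            release(delta);        // extra permits become available immediately
        } else if (delta < 0) {
            reducePermits(-delta); // protected in Semaphore, reachable from a subclass
        }
        maxPermits = newMax;
    }

    synchronized int getMaxPermits() {
        return maxPermits;
    }
}
```

Shrinking never interrupts fibers that already hold permits; it only makes later `acquire()` calls wait until enough permits have been released.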
We just did a pre-release of Kilim 2.0. It provides a fiber and actor implementation (similar to Quasar) and is backed by ThreadPoolExecutor.
The most efficient way to limit the number of concurrent tasks would be to have one task serve as a controller that listens to a mailbox (I think Quasar calls these channels) and maintains a count of running tasks. When each task finishes, it messages the mailbox.
Generally, for CPU-bound work it doesn't make sense to use more threads than there are cores.
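The controller-and-mailbox idea can be approximated with only the JDK. In this hedged sketch a LinkedBlockingQueue plays the mailbox (a Kilim mailbox or Quasar channel would take its place), the calling thread is the controller, and a cached thread pool stands in for fibers; `MailboxController` and `runAll` are illustrative names:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class MailboxController {
    private static final Object DONE = new Object(); // "a task finished" message

    // The calling thread is the controller: it keeps its own count of running tasks, launches a
    // new task only while the count is below maxRunning, and decrements it for every DONE message.
    // Returns the peak number of tasks observed running at once (measured inside the tasks).
    static int runAll(int taskCount, int maxRunning) {
        BlockingQueue<Object> mailbox = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newCachedThreadPool(); // stand-in for fibers/actors
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        int running = 0;
        try {
            for (int i = 0; i < taskCount; i++) {
                while (running >= maxRunning) {
                    mailbox.take();               // block until some task reports completion
                    running--;
                }
                running++;
                pool.execute(() -> {
                    try {
                        peak.accumulateAndGet(active.incrementAndGet(), Math::max);
                        Thread.sleep(2);          // simulated work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        active.decrementAndGet();
                        mailbox.add(DONE);        // message the controller
                    }
                });
            }
            while (running > 0) {                 // collect the last completion messages
                mailbox.take();
                running--;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }
}
```

Only the controller touches the `running` count, so it needs no synchronization; the tasks communicate solely through the mailbox.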
I have a scenario where I insert millions of records into the back end, and I currently use the executor framework to load them. I will explain my problem in simpler terms.
In the case below, I have 10 runnables and three threads to execute them. Consider that my runnable does an insert operation and takes time to complete. When I checked, I understood that if all the threads are busy, the other tasks go to the queue, and once the threads complete their tasks, they fetch the remaining tasks from the queue and complete them.
So in this case, SampleRunnable objects 4 to 10 will be created and will wait in the queue.
Problem: Since I need to load millions of tasks, I cannot put all the records in the queue, which could lead to memory issues. So my question is: instead of queuing all the tasks, is it possible to make the main thread wait until one of the executor's worker threads becomes available?
I tried the following approaches as a workaround instead of queuing this many tasks:
Approach 1: Used an ArrayBlockingQueue for the executor and gave it a size of 5 (for example).
So in this case, when the 9th task comes, the executor throws a RejectedExecutionException; in the catch clause I sleep for 1 minute and recursively try again. The task gets picked up on one of the retries once a thread is available.
Approach 2: Used shutdown and awaitTermination, i.e. when the task count reaches 5, I call shutdown and awaitTermination. Inside the if block for awaitTermination (executor.awaitTermination(60000, TimeUnit.SECONDS)), I instantiate the thread pool again.
public class SampleMain {
public static void main(String[] args) {
ExecutorService executor = Executors.newFixedThreadPool(3);
for (int i=0;i<10;i++){
executor.execute(new SampleRunnable(i));
}
executor.shutdown();
}
}
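Sounds like the problem is that you want to throttle the main thread so that it does not get ahead of the workers. If that's the case, consider explicitly constructing a ThreadPoolExecutor instance instead of calling Executors.newFixedThreadPool().
That class has several constructors, and most of them let you supply your own blocking queue. If you create an ArrayBlockingQueue with a limited size, the number of waiting tasks (and so the memory they use) stays bounded. Note, however, that execute() only offers tasks to the queue and never waits for space: once the queue is full and all maximumPoolSize threads are busy, the pool's RejectedExecutionHandler is invoked, and the default AbortPolicy throws a RejectedExecutionException. To make the main thread slow down instead, also pass a ThreadPoolExecutor.CallerRunsPolicy, which makes the main thread run the rejected task itself, so it cannot get ahead of the workers.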
final int workQueueSize = 30;
BlockingQueue<Runnable> workQueue = new ArrayBlockingQueue<>(workQueueSize);
ExecutorService executor = new ThreadPoolExecutor(..., workQueue, new ThreadPoolExecutor.CallerRunsPolicy());
for (int i=0;i<10;i++){
executor.execute(new SampleRunnable(i));
}
...
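If the main thread should simply block until a worker is free, another option is to put a Semaphore in front of the executor, in the spirit of the BoundedExecutor example in Java Concurrency in Practice. This is a minimal sketch: `BoundedExecutor`, `submitTask` and `runDemo` are illustrative names, and the sleeping task stands in for the asker's SampleRunnable insert:

```java
import java.util.concurrent.Executor;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedExecutor {
    private final Executor exec;
    private final Semaphore slots; // one permit per task that is queued or running

    BoundedExecutor(Executor exec, int bound) {
        this.exec = exec;
        this.slots = new Semaphore(bound);
    }

    // Blocks the calling (main) thread while `bound` tasks are already queued or running.
    void submitTask(Runnable command) throws InterruptedException {
        slots.acquire();
        try {
            exec.execute(() -> {
                try {
                    command.run();
                } finally {
                    slots.release(); // the task finished, so a new one may be submitted
                }
            });
        } catch (RejectedExecutionException e) {
            slots.release();         // the task was never accepted; give the slot back
            throw e;
        }
    }

    // Demo: feeds taskCount short tasks to 3 worker threads with at most `bound` in flight.
    // Returns {tasks completed, largest queue length seen by the submitter}.
    static int[] runDemo(int taskCount, int bound) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(3, 3, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>()); // unbounded; the semaphore does the bounding
        BoundedExecutor bounded = new BoundedExecutor(pool, bound);
        AtomicInteger completed = new AtomicInteger();
        int maxQueued = 0;
        try {
            for (int i = 0; i < taskCount; i++) {
                bounded.submitTask(() -> {
                    try {
                        Thread.sleep(1);   // stands in for one slow insert
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    completed.incrementAndGet();
                });
                maxQueued = Math.max(maxQueued, pool.getQueue().size());
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return new int[] {completed.get(), maxQueued};
    }
}
```

Unlike CallerRunsPolicy, the main thread never runs a task itself; it only waits, so every insert still happens on a pool thread.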
When my application launches, an executor service object (created with Executors.newFixedThreadPool(maxThreadNum) from java.util.concurrent) is created. When requests come in, the executor service creates threads to handle them.
Because creating threads at run time takes time, I want threads to be available as soon as the application launches, so that requests take less time to process.
What I did is the following:
executorService = Executors.newFixedThreadPool(200);
for (int i=0; i<200; i++) {
executorService.execute(new Runnable() {
@Override
public void run() {
System.out.println("Start thread in pool " );
}
});
}
It creates 200 threads in the executorService pool when the application launches.
I just wonder: is this a correct way of creating threads when the application starts?
Or is there a better way of doing it?
You are missing shutdown(). It is very important to shut down the executor service once the work is completed, so use a try/catch/finally block:
try{
executorService.execute(...);
} catch (Exception e) {
...
}finally{
executorService.shutdown(); //Mandatory
}
If you can use a ThreadPoolExecutor directly rather than an ExecutorService from Executors¹, then there's perhaps a more standard/supported way to start all the core threads immediately.
int nThreads = 200;
ThreadPoolExecutor executor = new ThreadPoolExecutor(nThreads, nThreads,
0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
executor.prestartAllCoreThreads();
The above uses prestartAllCoreThreads().
Note that, currently, the implementation of Executors.newFixedThreadPool(int) creates a ThreadPoolExecutor in the exact same manner as above. This means you could technically cast the ExecutorService returned by the factory method to a ThreadPoolExecutor. There's nothing in the documentation that guarantees it will be a ThreadPoolExecutor, however.
1. ThreadPoolExecutor implements ExecutorService but provides more functionality. Also, many of the factory methods in Executors either return a ThreadPoolExecutor directly or a wrapper that delegates to one. Some, like newWorkStealingPool, use a ForkJoinPool. Again, the return types of these factory methods are implementation details, so don't rely too much on them.
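A quick way to confirm that prestartAllCoreThreads() really creates the threads up front is to check getPoolSize() before any task has been submitted. A small sketch (`PrestartDemo` is an illustrative name):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PrestartDemo {
    // Builds the pool from the answer and returns how many threads exist before any task is submitted.
    static int threadsAfterPrestart(int nThreads) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(nThreads, nThreads,
                0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        try {
            int started = executor.prestartAllCoreThreads(); // returns the number of threads started
            if (started != nThreads) {
                throw new IllegalStateException("expected " + nThreads + " but started " + started);
            }
            return executor.getPoolSize();                   // threads currently in the pool
        } finally {
            executor.shutdown();
        }
    }
}
```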
The number of threads that can run in parallel depends on your processor cores. Unless you have 200 cores, a thread pool of 200 would be pretty useless for CPU-bound work.
A great way to find out how many processor cores you have is:
int cores = Runtime.getRuntime().availableProcessors();
Moreover, the overhead of creating and starting a new thread is unavoidable, so unless the task is computationally heavy, it would not be worth creating a new thread for it.
But after all, your code is totally fine so far.
Your code is totally fine if it works for your scenario. Since we don't know your use case, only you can answer your question, with enough tests and benchmarks.
However, do take note that a ThreadPoolExecutor can reclaim idle threads after their keep-alive time: threads beyond the core pool size always, and core threads too if allowCoreThreadTimeOut(true) is set. That may bite you if you don't pay attention to it.
Just wonder is this a correct way of creating threads when application starts?
Yes. That's a correct way of creating threads.
Or is there a better way of doing it?
Maybe. Under some workloads you might want a thread pool with a variable number of threads (unlike the one created by newFixedThreadPool), one that removes threads from the pool once they have been idle for some time.
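One way to get a pool that is warmed up at start-up but still gives idle threads back is a ThreadPoolExecutor with allowCoreThreadTimeOut(true). This is a hedged sketch; the keep-alive value and the `ShrinkingPoolDemo` name are illustrative:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ShrinkingPoolDemo {
    // Returns {threads right after warm-up, threads after staying idle well past keep-alive}.
    static int[] poolSizes(int nThreads, long keepAliveMillis) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(nThreads, nThreads,
                keepAliveMillis, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        executor.allowCoreThreadTimeOut(true); // idle core threads may now time out too
        try {
            executor.prestartAllCoreThreads();  // warm the pool at start-up, as in the question
            int warm = executor.getPoolSize();
            Thread.sleep(keepAliveMillis * 8);  // stay idle well past the keep-alive time
            return new int[] {warm, executor.getPoolSize()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new int[] {-1, -1};
        } finally {
            executor.shutdown();
        }
    }
}
```

When requests arrive again later, execute() simply starts new threads up to the core size, so the pool grows back on demand.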
I'm new to concurrent programming in Java and came up with the following scenarios, where I'm confused about which to use when.
Scenario 1: In the following code, I run threads by calling .start() on the GPSService class, which is a Runnable implementation.
int clientNumber = 0;
ServerSocket listener = new ServerSocket(port);
while (true) {
new GPSService(listener.accept(), clientNumber++, serverUrl).start();
}
Scenario 2: In the following code, I run threads using an ExecutorService, as shown:
int clientNumber = 0;
ServerSocket listener = new ServerSocket(port);
while(true) {
ExecutorService executor = Executors.newSingleThreadExecutor();
executor.execute(new GPSService(listener.accept(), clientNumber++, serverUrl));
executor.shutdown();
while (!executor.awaitTermination(1, TimeUnit.SECONDS)) {
// Threads are still running
System.out.println("Thread is still running");
}
// All threads are completed
System.out.println("\nThread completed its execution and terminated successfully\n");
}
My Questions are
Which is the best practice for starting a thread in concurrent programming?
What results (troubles) will I end up with if I use the first or the second?
Note: I've been facing an issue with the first scenario where the program hangs every few days. Is that issue related to, or expected with, the first method?
Any good/helpful answer will be appreciated :) Thank you
There is no big difference between the two scenarios you posted, apart from managing thread termination in Scenario 2: you always create a new thread for each incoming request. If you want to use a thread pool, my advice is not to create one for every request but to create one for the server and reuse its threads. Something like:
public class YourClass {
//in init method or constructor
ExecutorService executor = Executors....;// choose from newCachedThreadPool() or newFixedThreadPool(int nThreads) or some custom option
int clientNumber = 0;
ServerSocket listener = new ServerSocket(port);
while(true) {
executor.execute(new GPSService(listener.accept(), clientNumber++, serverUrl));
}
}
This lets you use a thread pool and control how many threads your server uses. If you want to use an Executor, this is the preferred way to go.
With a server pool you need to decide how many threads the pool has. You have different choices: you can start either with a fixed number of threads or with a pool that tries to use a non-busy thread and, if all threads are busy, creates a new one (newCachedThreadPool()). The number of threads to allocate depends on many factors: the number of concurrent requests and their duration. The longer your server-side code takes, the more threads you need. If your server-side code is very fast, there is a very high chance that the pool can recycle threads already allocated (since the requests do not all arrive at exactly the same instant).
Say, for example, that you have 10 requests during a second and each request lasts 0.2 seconds. If the requests arrive at 0, 0.1, 0.2, 0.3, 0.4, 0.5, ... of the second (for example 23/06/2015 7:16:00.0, 23/06/2015 7:16:00.1, 23/06/2015 7:16:00.2), you need only three threads, since the request arriving at 0.3 can be served by the thread that served the first request (the one at 0), and so on (the request at 0.4 can reuse the thread used for the request that arrived at 0.1). Ten requests are handled by three threads.
I recommend (if you have not done so already) reading Java Concurrency in Practice (Task Execution is chapter 6), an excellent book on how to build concurrent applications in Java.
From the Oracle documentation for Executors:
public static ExecutorService newCachedThreadPool()
Creates a thread pool that creates new threads as needed, but will reuse previously constructed threads when they are available. These pools will typically improve the performance of programs that execute many short-lived asynchronous tasks.
Calls to execute will reuse previously constructed threads if available. If no existing thread is available, a new thread will be created and added to the pool. Threads that have not been used for sixty seconds are terminated and removed from the cache.
Thus, a pool that remains idle for long enough will not consume any resources. Note that pools with similar properties but different details (for example, timeout parameters) may be created using ThreadPoolExecutor constructors.
public static ExecutorService newFixedThreadPool(int nThreads)
Creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue. At any point, at most nThreads threads will be active processing tasks. If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available.
If any thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks. The threads in the pool will exist until it is explicitly shutdown.
@Giovanni is saying that you don't have to provide a number of threads to newCachedThreadPool(), unlike newFixedThreadPool(), where you have to pass a maximum cap on the number of threads in the pool.
But between these two, newFixedThreadPool() is preferred. newCachedThreadPool() can keep creating threads without limit, so under load you may exhaust the maximum number of threads available. Some people consider it evil.
Have a look at related SE question:
Why is an ExecutorService created via newCachedThreadPool evil?
I have a set of roughly 20 threads and I want to schedule them so they run in a set order.
Is there a way to do this? I have tried using priorities, setting the priority from 1 to 10, but the scheduler still seems to execute the threads in its own order. By the way, I'm working in Java.
Is there a way to run threads in a set order ?
Thanks
regards
Mike
What you need is an ExecutorService that runs your jobs one at a time, namely newSingleThreadExecutor:
ExecutorService pool = Executors.newSingleThreadExecutor();
pool.submit(job1);
pool.submit(job2);
pool.submit(job3);
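A small runnable check of that ordering guarantee: jobs handed to newSingleThreadExecutor() run one after another in submission order. `OrderedJobs` and the recorded list are illustrative:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class OrderedJobs {
    // Submits `count` jobs to a single-thread executor and returns the order in which they ran.
    static List<Integer> runInOrder(int count) {
        List<Integer> executionOrder = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newSingleThreadExecutor();
        for (int i = 0; i < count; i++) {
            final int job = i;
            pool.execute(() -> executionOrder.add(job)); // each job records its own number
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS); // wait for the queued jobs to drain
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(executionOrder);
    }
}
```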
Why do you have multiple threads if you want synchronous behaviour in the first place?
If you've acquired multiple Thread objects from "something else" then you can use thread.run() to execute them in the current thread, which will, of course allow you to control the order.
You don't have to use a single-threaded executor if the jobs can be executed in parallel. Below is an example where you use eight threads to run your 20 jobs:
public static void main(String[] args) throws InterruptedException {
final ExecutorService executorService = Executors.newFixedThreadPool(8);
final Queue<Integer> workItems = new ConcurrentLinkedQueue<Integer>();
for (int i = 0; i < 20; i++) {
workItems.add(i);
}
for (int i = 0; i < 20; i++) {
executorService.submit(new Runnable() {
@Override
public void run() {
final Integer workItem = workItems.poll();
// process work item
}
});
}
// wait for all submitted jobs to finish
executorService.shutdown();
executorService.awaitTermination(1, TimeUnit.MINUTES);
}
The idea is that you use an auxiliary queue to hold the items to be processed and rely on the queue's FIFO ordering, so the items are picked up in order while being processed in parallel (their completion order is not guaranteed).
If the threads depend on each other, then one option would be to schedule only the first thread and have it spawn its dependent threads, which can in turn spawn their own dependent threads, and so on.
You need to understand, however, that even though you may be launching threads in a particular order, as soon as they start they are off of your hands and they will be fighting for resources and the OS will time-slice their executions, which means some may get "ahead" of threads that were launched before. So if you truly need to keep the order, then I would suggest you use only one thread and let it orchestrate the tasks in a synchronized manner.
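If the jobs really do form a dependency graph (as in the "schedule the first thread and let it spawn its dependents" suggestion), CompletableFuture is one standard-library way to express it; this is a swapped-in technique, not what the answers above use. In this hedged sketch the job names A to D are hypothetical: B and C wait for A, and D waits for both:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class DependentJobs {
    // Hypothetical jobs: B and C depend on A; D depends on both B and C.
    // Returns the order in which the jobs ran.
    static List<String> run() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            CompletableFuture<Void> a = CompletableFuture.runAsync(() -> log.add("A"), pool);
            CompletableFuture<Void> b = a.thenRunAsync(() -> log.add("B"), pool); // starts after A
            CompletableFuture<Void> c = a.thenRunAsync(() -> log.add("C"), pool); // may run alongside B
            CompletableFuture.allOf(b, c)
                    .thenRunAsync(() -> log.add("D"), pool) // starts after both B and C
                    .join();                                // wait for the whole graph
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(log);
    }
}
```

Only the dependency edges are ordered: B and C may appear in either order, which is exactly the freedom a thread pool needs to run them in parallel.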
I am wondering if this is the best way to do this. I have about 500 threads that run indefinitely, but each one calls Thread.sleep for a minute after finishing one cycle of processing.
ExecutorService es = Executors.newFixedThreadPool(list.size()+1);
for (int i = 0; i < list.size(); i++) {
es.execute(coreAppVector.elementAt(i)); //coreAppVector is a vector of objects that extend Thread
}
The code that is executing is really simple and basically just this:
class aThread extends Thread {
public void run(){
while(true){
try {
Thread.sleep(ONE_MINUTE);
} catch (InterruptedException e) {
return; // stop looping if the thread is interrupted
}
//Lots of computation every minute
}
}
}
I do need a separate thread for each running task, so changing the architecture isn't an option. I tried making my thread pool size equal to Runtime.getRuntime().availableProcessors(), which attempted to run all 500 threads but only let 8 of them (4 cores with hyperthreading) execute. The other threads wouldn't surrender and let other threads have their turn. I tried putting in a wait() and notify(), but still no luck. If anyone has a simple example or some tips, I would be grateful!
Well, the design is arguably flawed. The threads implement Genetic Programming (GP), a type of learning algorithm. Each thread analyzes advanced trends and makes predictions. If a thread ever completes, its learning is lost. That said, I was hoping that sleep() would allow me to share some of the resources while one thread isn't "learning".
So the actual requirements are: how can I schedule tasks that maintain state and run every 2 minutes, but control how many execute at one time?
If your threads are not terminating, this is the fault of the code within the thread, not the thread pool. For more detailed help you will need to post the code that is being executed.
Also, why do you put each Thread to sleep when it is done; wouldn't it be better just to let it complete?
Additionally, I think you are misusing the thread pool by having a number of threads equal to the number of tasks you wish to execute. The point of a thread pool is to put a constraint on the number of resources used; this approach is no better than not using a thread pool at all.
Finally, you don't need to pass instances of Thread to your ExecutorService, just instances of Runnable. ExecutorService maintains its own pool of threads which loop indefinitely, pulling work off of an internal queue (the work being the Runnables you submit).
Why not use a ScheduledExecutorService to schedule each task to run once per minute, instead of leaving all these threads idle for a full minute?
ScheduledExecutorService workers =
Executors.newScheduledThreadPool(Runtime.getRuntime().availableProcessors());
for (Runnable task : list) {
workers.scheduleWithFixedDelay(task, 0, 1, TimeUnit.MINUTES);
}
What do you mean by, "changing the architecture isn't an option"? If you mean that you can't modify your task at all (specifically, the tasks have to loop, instead of running once, and the call to Thread.sleep() can't be removed), then "good performance isn't an option," either.
I'm not sure your code uses the thread pool in a semantically correct way. ExecutorService creates and manages threads internally; a client should just supply an instance of Runnable, whose run() method will be executed in the context of one of the pooled threads. You can check my example. Also note that each running thread reserves memory for its stack (1 MB by default on 64-bit Linux HotSpot, configurable with -Xss), and on Linux Java threads map 1-to-1 onto native threads.
Instead of putting a thread to sleep, you should let it return and use a ThreadPoolExecutor to execute the work posted to your work queue every minute.
To answer your question, what type of thread pool?
I posted my comments, but this really should address your issue. You have a computation that can take 2 seconds to complete. You have many tasks (500) that you want completed as fast as possible. The fastest possible throughput you can achieve, assuming there is no I/O or network traffic, is with Runtime.getRuntime().availableProcessors() threads.
If you increase the number to 500 threads, each task will execute on its own thread, but the OS will regularly switch threads out to give other threads a turn. That's roughly 125 runnable threads per core (500 threads on 4 cores), and each context switch increases the time every task takes to run.
The big picture here is that adding more threads does NOT mean greater throughput once you are way over the number of processors.
Edit: A quick update. You don't need to sleep here. When you execute the 500 tasks with 8 processors, each task completes in about 2 seconds, and the thread it ran on then takes the next task and completes that one.
For CPU-bound work, 8 threads is the most your system can usefully handle; any more and you are slowing yourself down with context switching.
Look at this article: http://www.informit.com/articles/article.aspx?p=1339471&seqNum=4 It gives an overview of how to do it.
This should do what you want, but not what you asked for :-) You have to take out the Thread.sleep().
ScheduledRunnable.java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
public class ScheduledRunnable
{
public static void main(final String[] args)
{
final int numTasks = 10;
final ScheduledExecutorService ses = Executors.newScheduledThreadPool(Runtime.getRuntime().availableProcessors());
for (int i = 0; i < numTasks; i++)
{
ses.scheduleAtFixedRate(new MyRunnable(i), 0, 10, TimeUnit.SECONDS);
}
}
private static class MyRunnable implements Runnable
{
private int id;
private int numRuns;
private MyRunnable(final int id)
{
this.id = id;
this.numRuns = 0;
}
@Override
public void run()
{
this.numRuns += 1;
System.out.format("%d - %d\n", this.id, this.numRuns);
}
}
}
This schedules the Runnables every 10 seconds to show the behavior.
If you really need to wait a fixed amount of time AFTER processing is complete, pick the scheduling method accordingly: scheduleAtFixedRate starts a run every N time units regardless of how long each run takes (runs never overlap; a run that overruns just delays the next one), whereas scheduleWithFixedDelay waits N time units after each run completes before starting the next.
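To see the fixed-delay behaviour concretely, this hedged sketch measures the gap between consecutive starts under scheduleWithFixedDelay: each gap should be at least the simulated work time plus the delay. The timings and the `FixedDelayDemo` name are illustrative, not from the answer:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class FixedDelayDemo {
    // Schedules a task that "works" for workMillis with scheduleWithFixedDelay(delayMillis)
    // and returns the smallest gap, in ms, between the starts of consecutive runs.
    static long minGapBetweenStarts(long workMillis, long delayMillis, int runs) {
        List<Long> starts = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch finished = new CountDownLatch(runs);
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleWithFixedDelay(() -> {
            starts.add(System.nanoTime());
            try {
                Thread.sleep(workMillis);            // simulated processing
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            finished.countDown();
        }, 0, delayMillis, TimeUnit.MILLISECONDS);
        try {
            finished.await();                        // wait for `runs` complete executions
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            ses.shutdownNow();
        }
        long minGap = Long.MAX_VALUE;
        for (int i = 1; i < runs; i++) {
            long gapNanos = starts.get(i) - starts.get(i - 1);
            minGap = Math.min(minGap, TimeUnit.NANOSECONDS.toMillis(gapNanos));
        }
        return minGap;
    }
}
```

With scheduleAtFixedRate and the same numbers, the starts would instead be about `delayMillis` apart whenever the work fits inside the period.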
I do need a separate threads for each running task, so changing the architecture isn't an option.
If that is true (for example, you make a call to an external blocking function), then create separate threads for them and start them. You can't use a thread pool with a limited number of threads, since a blocking function in one of the threads prevents any other runnable from running on that thread, and you don't gain much by creating a thread pool with one thread per task.
I tried making my threadPool size equal to Runtime.getRuntime().availableProcessors() which attempted to run all 500 threads, but only let 8 (4xhyperthreading) of them execute.
When you pass the Thread objects you are creating to the thread pool, it only sees that they implement Runnable. Therefore it runs each Runnable to completion: any loop that stops the run() method from returning will not allow the next enqueued task to run, e.g.:
public static void main (String...args) {
ExecutorService executor = Executors.newFixedThreadPool(2);
for (int i = 0; i < 10; ++i) {
final int task = i;
executor.execute(new Runnable () {
private long lastRunTime = 0;
@Override
public void run () {
for (int iteration = 0; iteration < 4; )
{
if (System.currentTimeMillis() - this.lastRunTime > TIME_OUT)
{
// do your work here
++iteration;
System.out.printf("Task {%d} iteration {%d} thread {%s}.\n", task, iteration, Thread.currentThread());
this.lastRunTime = System.currentTimeMillis();
}
else
{
Thread.yield(); // otherwise, let other threads run
}
}
}
});
}
executor.shutdown();
}
prints out:
Task {0} iteration {1} thread {Thread[pool-1-thread-1,5,main]}.
Task {1} iteration {1} thread {Thread[pool-1-thread-2,5,main]}.
Task {0} iteration {2} thread {Thread[pool-1-thread-1,5,main]}.
Task {1} iteration {2} thread {Thread[pool-1-thread-2,5,main]}.
Task {0} iteration {3} thread {Thread[pool-1-thread-1,5,main]}.
Task {1} iteration {3} thread {Thread[pool-1-thread-2,5,main]}.
Task {0} iteration {4} thread {Thread[pool-1-thread-1,5,main]}.
Task {2} iteration {1} thread {Thread[pool-1-thread-1,5,main]}.
Task {1} iteration {4} thread {Thread[pool-1-thread-2,5,main]}.
Task {3} iteration {1} thread {Thread[pool-1-thread-2,5,main]}.
Task {2} iteration {2} thread {Thread[pool-1-thread-1,5,main]}.
Task {3} iteration {2} thread {Thread[pool-1-thread-2,5,main]}.
Task {2} iteration {3} thread {Thread[pool-1-thread-1,5,main]}.
Task {3} iteration {3} thread {Thread[pool-1-thread-2,5,main]}.
Task {2} iteration {4} thread {Thread[pool-1-thread-1,5,main]}.
...
showing that the first (thread pool size) tasks run to completion before the next tasks get scheduled.
What you need to do is create tasks that run for a while and then let other tasks run. Exactly how you structure these depends on what you want to achieve:
whether you want all the tasks to run at the same time, then all wait for a minute, then all run at the same time again, or whether the tasks are not synchronised with each other
whether you really wanted each task to run at a one-minute interval
whether your tasks are potentially blocking or not, and so really require separate threads
what behaviour is expected if a task blocks longer than the expected window for running
what behaviour is expected if a task blocks longer than the repeat rate (blocks for more than one minute)
Depending on the answers to these, some combination of ScheduledExecutorService, semaphores or mutexes can be used to co-ordinate the tasks. The simplest case is the non-blocking, non-synchronous tasks, in which case use a ScheduledExecutorService directly to run your runnables once every minute.
Can you rewrite your project for using some agent-based concurrency framework, like Akka?
You can certainly find some improvement in throughput by reducing the number of threads to what the system can realistically handle. Are you open to changing the design of the thread a bit? It'll unburden the scheduler to put the sleeping ones in a queue instead of actually having hundreds of sleeping threads.
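import java.util.concurrent.ExecutorService;
class RepeatingWorker implements Runnable {
private static final long ONE_MINUTE = 60_000L; // milliseconds
private final ExecutorService executor;
private long lastRan = 0L; // System.currentTimeMillis() of the last run; 0 = never ran
RepeatingWorker(ExecutorService executor) { // constructor takes your executor
this.executor = executor;
}
@Override
public void run() {
try {
long now = System.currentTimeMillis();
if (now > lastRan + ONE_MINUTE) {
//do job
lastRan = now;
}
// otherwise it is too early: do nothing this time round
} finally {
executor.submit(this); // re-queue the worker so it is checked again later
}
}
}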
This preserves your core semantics of "the job repeats indefinitely, but waits at least one minute between executions", but now you can tune the thread pool to something the machine can handle, and the workers that aren't running sit in a queue instead of loitering in the scheduler as sleeping threads. There is some busy-wait behavior if nobody's actually doing anything, but I am assuming from your post that the entire purpose of the application is to run these threads and that it's currently saturating your processors. You may need to tune around that if room has to be made for other things :)
You need a semaphore.
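import java.util.concurrent.Semaphore;
class AThread extends Thread {
private final Semaphore sem;
AThread(Semaphore sem) {
this.sem = sem;
}
@Override
public void run(){
try {
while(true){
Thread.sleep(ONE_MINUTE);
sem.acquire(); // blocks while MAX_AVAILABLE threads are already computing
try {
//Lots of computation every minute
} finally {
sem.release(); // always return the permit
}
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt(); // restore the flag and let the thread end
}
}
}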
When instantiating the AThreads you need to pass the same semaphore instance:
Semaphore sem = new Semaphore(MAX_AVAILABLE, true);
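A scaled-down, runnable check of this approach: short sleeps stand in for ONE_MINUTE and for the computation, and the method reports how many workers were ever computing at once, which should never exceed the number of permits. The `SemaphoreThrottleDemo` name and the timings are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class SemaphoreThrottleDemo {
    // Starts `threads` long-lived workers sharing one fair Semaphore with `permits` permits.
    // Each worker repeats `cycles` times: sleep, acquire, compute, release.
    // Returns the peak number of workers observed computing at the same time.
    static int peakConcurrentComputations(int threads, int permits, int cycles) {
        Semaphore sem = new Semaphore(permits, true);
        AtomicInteger computing = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                try {
                    for (int c = 0; c < cycles; c++) {
                        Thread.sleep(2);          // stands in for Thread.sleep(ONE_MINUTE)
                        sem.acquire();            // wait for one of the permits
                        try {
                            peak.accumulateAndGet(computing.incrementAndGet(), Math::max);
                            Thread.sleep(3);      // stands in for the heavy computation
                        } finally {
                            computing.decrementAndGet();
                            sem.release();        // always hand the permit back
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }
}
```

Note that this keeps one platform thread per task, as the asker requires; the semaphore only limits how many of them compute at the same time.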
Edit: Could whoever downvoted this please explain why? Is there something wrong with my solution?