500 Worker Threads, what kind of thread pool? - java

I am wondering if this is the best way to do this. I have about 500 threads that run indefinitely, but each one calls Thread.sleep for a minute after finishing one cycle of processing.
ExecutorService es = Executors.newFixedThreadPool(list.size()+1);
for (int i = 0; i < list.size(); i++) {
es.execute(coreAppVector.elementAt(i)); //coreAppVector is a vector of extends thread objects
}
The code that is executing is really simple and basically just this
class aThread extends Thread {
public void run(){
while(true){
Thread.sleep(ONE_MINUTE);
//Lots of computation every minute
}
}
}
I do need a separate thread for each running task, so changing the architecture isn't an option. I tried making my thread pool size equal to Runtime.getRuntime().availableProcessors(), which attempted to run all 500 threads but only let 8 (4 cores with hyperthreading) of them execute. The other threads wouldn't surrender and let other threads have their turn. I tried putting in a wait() and notify(), but still no luck. If anyone has a simple example or some tips, I would be grateful!
Well, the design is arguably flawed. The threads implement genetic programming (GP), a type of learning algorithm. Each thread analyzes advanced trends and makes predictions. If the thread ever completes, the learning is lost. That said, I was hoping that sleep() would let me share some of the resources while a thread isn't "learning".
So the actual requirements are
how can I schedule tasks that maintain
state and run every 2 minutes, but
control how many execute at one time.

If your threads are not terminating, this is the fault of the code within the thread, not the thread pool. For more detailed help you will need to post the code that is being executed.
Also, why do you put each Thread to sleep when it is done; wouldn't it be better just to let it complete?
Additionally, I think you are misusing the thread pool by having a number of threads equal to the number of tasks you wish to execute. The point of a thread pool is to put a constraint on the number of resources used; this approach is no better than not using a thread pool at all.
Finally, you don't need to pass instances of Thread to your ExecutorService, just instances of Runnable. ExecutorService maintains its own pool of threads which loop indefinitely, pulling work off of an internal queue (the work being the Runnables you submit).

Why not use a ScheduledExecutorService to schedule each task to run once per minute, instead of leaving all these threads idle for a full minute?
ScheduledExecutorService workers =
Executors.newScheduledThreadPool(Runtime.getRuntime().availableProcessors());
for (Runnable task : list) {
workers.scheduleWithFixedDelay(task, 0, 1, TimeUnit.MINUTES);
}
What do you mean by, "changing the architecture isn't an option"? If you mean that you can't modify your task at all (specifically, the tasks have to loop, instead of running once, and the call to Thread.sleep() can't be removed), then "good performance isn't an option," either.
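A minimal sketch of how state can survive between scheduled runs without dedicating a thread to each task, assuming the learning state can live in fields of the task object. LearningTask, its generation counter, and the short delay are hypothetical stand-ins for the real GP code and the one-minute interval.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical task: its state lives in fields, so it is kept between runs
// even though run() returns after every cycle.
class LearningTask implements Runnable {
    private final AtomicInteger generation = new AtomicInteger();
    private final CountDownLatch done;
    private final int targetRuns;

    LearningTask(int targetRuns, CountDownLatch done) {
        this.targetRuns = targetRuns;
        this.done = done;
    }

    @Override
    public void run() {
        // One cycle of computation would go here.
        if (generation.incrementAndGet() == targetRuns) {
            done.countDown();
        }
    }

    int generation() {
        return generation.get();
    }
}

class ScheduledLearningDemo {
    // Runs taskCount tasks on poolSize threads until each has completed 'runs' cycles.
    static List<LearningTask> runAll(int taskCount, int poolSize, int runs, long delayMillis) {
        ScheduledExecutorService workers = Executors.newScheduledThreadPool(poolSize);
        CountDownLatch done = new CountDownLatch(taskCount);
        List<LearningTask> tasks = new ArrayList<>();
        try {
            for (int i = 0; i < taskCount; i++) {
                LearningTask task = new LearningTask(runs, done);
                tasks.add(task);
                workers.scheduleWithFixedDelay(task, 0, delayMillis, TimeUnit.MILLISECONDS);
            }
            done.await(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            workers.shutdownNow();
        }
        return tasks;
    }

    public static void main(String[] args) {
        for (LearningTask t : runAll(20, 2, 3, 10)) {
            System.out.println("generation = " + t.generation());
        }
    }
}
```

Here 20 tasks share 2 pool threads, so at most 2 of them compute at any moment, and every task still keeps its own counter from one run to the next.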

I'm not sure your code is semantically correct in how it uses a thread pool. ExecutorService creates and manages threads internally; a client should just supply an instance of Runnable, whose run() method will be executed in the context of one of the pooled threads. You can check my example. Also note that each running thread reserves memory for its stack (typically around 1 MB by default on 64-bit HotSpot, configurable with -Xss), and on Linux the mapping of Java threads to native threads is 1-to-1.

Instead of putting a thread to sleep, you should let it return and use a ThreadPoolExecutor to execute work posted every minute to your work queue.

To answer your question, what type of thread pool?
I posted my comments, but this really should address your issue. You have a computation that can take 2 seconds to complete. You have many tasks (500) that you want completed as fast as possible. The fastest possible throughput you can achieve, assuming there is no I/O or network traffic, is with Runtime.getRuntime().availableProcessors() threads.
If you increase the number to 500 threads, each task will execute on its own thread, but the OS will periodically schedule a thread out to give its processor to another thread. That's 125 context switches at any given point. Each context switch increases the time each task takes to run.
The big picture here is that adding more threads does NOT equal greater throughput when you are way over the number of processors.
Edit: A quick update. You don't need to sleep here. When you execute the 500 tasks with 8 processors, each task will complete in its 2 seconds and finish, and the thread it was running on will then take the next task and complete that one.

8 threads is the most your system can run in parallel; any more and you are slowing yourself down with context switching.
Look at this article http://www.informit.com/articles/article.aspx?p=1339471&seqNum=4 It will give you an overview of how to do it.

This should do what you desire, but not what you asked for :-) You have to take out the Thread.sleep()
ScheduledRunnable.java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
public class ScheduledRunnable
{
public static void main(final String[] args)
{
final int numTasks = 10;
final ScheduledExecutorService ses = Executors.newScheduledThreadPool(Runtime.getRuntime().availableProcessors());
for (int i = 0; i < numTasks; i++)
{
ses.scheduleAtFixedRate(new MyRunnable(i), 0, 10, TimeUnit.SECONDS);
}
}
private static class MyRunnable implements Runnable
{
private int id;
private int numRuns;
private MyRunnable(final int id)
{
this.id = id;
this.numRuns = 0;
}
@Override
public void run()
{
this.numRuns += 1;
System.out.format("%d - %d\n", this.id, this.numRuns);
}
}
}
This schedules the Runnables every 10 SECONDS to show the behavior.
If you really need to wait a fixed amount of time AFTER processing is complete, you need to pick the right .scheduleXXX method. scheduleAtFixedRate runs the task every N units of time regardless of how long each execution takes, while scheduleWithFixedDelay waits N units of time after each execution completes before starting the next one.
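To make the difference between the two scheduling methods concrete, here is a small sketch that counts how often a task runs under each. RateVersusDelay, countRuns, and the timing values are illustrative assumptions, and the exact counts depend on machine load.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class RateVersusDelay {
    // Counts how many times a task that takes workMillis starts within windowMillis.
    static int countRuns(boolean fixedRate, long workMillis, long intervalMillis, long windowMillis) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        Runnable task = () -> {
            runs.incrementAndGet();
            try {
                Thread.sleep(workMillis); // stands in for real work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            if (fixedRate) {
                // Next start = previous start + interval.
                ses.scheduleAtFixedRate(task, 0, intervalMillis, TimeUnit.MILLISECONDS);
            } else {
                // Next start = previous end + interval.
                ses.scheduleWithFixedDelay(task, 0, intervalMillis, TimeUnit.MILLISECONDS);
            }
            Thread.sleep(windowMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            ses.shutdownNow();
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("fixed rate:  " + countRuns(true, 50, 100, 1500));
        System.out.println("fixed delay: " + countRuns(false, 50, 100, 1500));
    }
}
```

With 50 ms of work and a 100 ms interval, the fixed-rate schedule starts roughly every 100 ms while the fixed-delay schedule starts roughly every 150 ms, so the first count should come out higher.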

I do need a separate thread for each running task, so changing the architecture isn't an option.
If that is true (for example, because each task calls an external blocking function), then create separate threads for them and start them. You can't use a thread pool with a limited number of threads, as a blocking function in one of the threads will prevent any other runnable from being assigned to it, and you don't gain much by creating a thread pool with one thread per task.
I tried making my threadPool size equal to Runtime.getRuntime().availableProcessors() which attempted to run all 500 threads, but only let 8 (4xhyperthreading) of them execute.
When you pass the Thread objects you are creating to thread pool, it only sees that they implement Runnable. Therefore it will run each Runnable to completion. Any loop which stops the run() method returning will not allow the next enqueued task to run; eg:
public static void main (String...args) {
ExecutorService executor = Executors.newFixedThreadPool(2);
for (int i = 0; i < 10; ++i) {
final int task = i;
executor.execute(new Runnable () {
private long lastRunTime = 0;
@Override
public void run () {
for (int iteration = 0; iteration < 4; )
{
if (System.currentTimeMillis() - this.lastRunTime > TIME_OUT)
{
// do your work here
++iteration;
System.out.printf("Task {%d} iteration {%d} thread {%s}.\n", task, iteration, Thread.currentThread());
this.lastRunTime = System.currentTimeMillis();
}
else
{
Thread.yield(); // otherwise, let other threads run
}
}
}
});
}
executor.shutdown();
}
prints out:
Task {0} iteration {1} thread {Thread[pool-1-thread-1,5,main]}.
Task {1} iteration {1} thread {Thread[pool-1-thread-2,5,main]}.
Task {0} iteration {2} thread {Thread[pool-1-thread-1,5,main]}.
Task {1} iteration {2} thread {Thread[pool-1-thread-2,5,main]}.
Task {0} iteration {3} thread {Thread[pool-1-thread-1,5,main]}.
Task {1} iteration {3} thread {Thread[pool-1-thread-2,5,main]}.
Task {0} iteration {4} thread {Thread[pool-1-thread-1,5,main]}.
Task {2} iteration {1} thread {Thread[pool-1-thread-1,5,main]}.
Task {1} iteration {4} thread {Thread[pool-1-thread-2,5,main]}.
Task {3} iteration {1} thread {Thread[pool-1-thread-2,5,main]}.
Task {2} iteration {2} thread {Thread[pool-1-thread-1,5,main]}.
Task {3} iteration {2} thread {Thread[pool-1-thread-2,5,main]}.
Task {2} iteration {3} thread {Thread[pool-1-thread-1,5,main]}.
Task {3} iteration {3} thread {Thread[pool-1-thread-2,5,main]}.
Task {2} iteration {4} thread {Thread[pool-1-thread-1,5,main]}.
...
showing that the first (thread pool size) tasks run to completion before the next tasks get scheduled.
What you need to do is create tasks which run for a while, then let other tasks run. Quite how you structure these depends on what you want to achieve:
- whether you want all the tasks to run at the same time, then all wait for a minute, then all run at the same time again, or whether the tasks are not synchronised with each other
- whether you really want each task to run at a one-minute interval
- whether your tasks are potentially blocking or not, and so really require separate threads
- what behaviour is expected if a task blocks longer than the expected window for running
- what behaviour is expected if a task blocks longer than the repeat rate (blocks for more than one minute)
Depending on the answers to these, some combination of ScheduledExecutorService, semaphores or mutexes can be used to co-ordinate the tasks. The simplest case is the non-blocking, non-synchronous tasks, in which case use a ScheduledExecutorService directly to run your runnables once every minute.

Can you rewrite your project to use an agent-based concurrency framework, like Akka?

You can certainly find some improvement in throughput by reducing the number of threads to what the system can realistically handle. Are you open to changing the design of the thread a bit? It'll unburden the scheduler to put the sleeping ones in a queue instead of actually having hundreds of sleeping threads.
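class RepeatingWorker implements Runnable {
    private final ExecutorService executor;
    private long lastRan = 0; // time of the last run, in epoch milliseconds

    // constructor takes your executor
    RepeatingWorker(ExecutorService executor) {
        this.executor = executor;
    }

    @Override
    public void run() {
        try {
            long now = System.currentTimeMillis();
            if (now > lastRan + ONE_MINUTE) {
                //do job
                lastRan = now;
            }
            // otherwise it is too early: fall through and get re-queued
        } finally {
            executor.submit(this); // re-queue so the job repeats indefinitely
        }
    }
}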
This preserves your core semantic of 'job repeats indefinitely, but waits at least one minute between executions', but now you can tune the thread pool to something the machine can handle, and the jobs that aren't working sit in a queue instead of loitering about in the scheduler as sleeping threads. There is some busy-wait behavior if nobody's actually doing anything, but I am assuming from your post that the entire purpose of the application is to run these threads and it's currently railing your processors. You may need to tune around that if room has to be made for other things :)

You need a semaphore.
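class AThread extends Thread {
    private final Semaphore sem;

    AThread(Semaphore sem) {
        this.sem = sem;
    }

    @Override
    public void run() {
        try {
            while (true) {
                Thread.sleep(ONE_MINUTE);
                sem.acquire(); // waits while MAX_AVAILABLE other threads are computing
                try {
                    //Lots of computation every minute
                } finally {
                    sem.release();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // leave the loop when interrupted
        }
    }
}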
When instantiating the AThreads you need to pass the same semaphore instance:
Semaphore sem = new Semaphore(MAX_AVAILABLE, true);
Edit: Could whoever voted this down please explain why? Is there something wrong with my solution?
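A small sketch of what the shared semaphore buys you: many threads exist, but only as many as there are permits are inside the computation at once. SemaphoreLimitDemo and its counters are hypothetical names added only to observe the effect, and the short sleep stands in for the real computation.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class SemaphoreLimitDemo {
    // Starts threadCount threads and returns the largest number that were
    // inside the guarded section at the same time.
    static int maxObservedConcurrency(int threadCount, int permits) {
        Semaphore sem = new Semaphore(permits, true);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger maxActive = new AtomicInteger();
        CountDownLatch finished = new CountDownLatch(threadCount);
        for (int i = 0; i < threadCount; i++) {
            new Thread(() -> {
                try {
                    sem.acquire();
                    try {
                        int now = active.incrementAndGet();
                        maxActive.accumulateAndGet(now, Math::max);
                        Thread.sleep(5); // stands in for the computation
                    } finally {
                        active.decrementAndGet();
                        sem.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    finished.countDown();
                }
            }).start();
        }
        try {
            finished.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return maxActive.get();
    }

    public static void main(String[] args) {
        System.out.println("max concurrent = " + maxObservedConcurrency(20, 3));
    }
}
```

Note that this limits how many threads compute at once; it does not reduce the number of threads that exist, so the per-thread memory cost remains.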

Related

Is it possible to wait the main thread while all the threads of executor service are processing tasks

I have a scenario that involves inserting millions of records into the back end, and I currently use the executor framework to load them. I will explain my problem in simpler terms.
In the case below, I have 10 runnables and three threads to execute them. Assume each runnable performs an insert operation that takes some time to complete. From what I have observed, if all the threads are busy, the remaining tasks go to the queue, and once a thread completes its task it fetches the next one from the queue and runs it.
So in this case, SampleRunnable objects 4 to 10 will be created and will wait in the queue.
Problem: Since I need to load millions of tasks, I cannot put all the records in the queue, as that can lead to memory issues. So my question is: instead of putting all tasks in the queue, is it possible to make the main thread wait until one of the executor's worker threads becomes available?
I tried the following approaches as a workaround instead of queuing this many tasks:
Approach 1: Used an ArrayBlockingQueue for the executor and gave it a size of 5 (for example).
In this case, when the 9th task arrives, execute() throws RejectedExecutionException; in the catch clause I sleep for 1 minute and then retry recursively. The task gets picked up on whichever retry finds a thread available.
Approach 2: Used shutdown and awaitTermination. That is, once the task count reaches 5, I call shutdown and awaitTermination. Inside the awaitTermination 'if' block (executor.awaitTermination(60000,TimeUnit.SECONDS)), I instantiate the thread pool again.
public class SampleMain {
    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(3);
        for (int i = 0; i < 10; i++) {
            executor.execute(new SampleRunnable(i));
        }
        executor.shutdown();
    }
}
Sounds like the problem is, you want to throttle the main thread, so that it does not get ahead of the workers. If that's the case, then consider explicitly constructing a ThreadPoolExecutor instance instead of calling Executors.newFixedThreadPool().
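That class has several different constructors, and most of them allow you to supply your own blocking queue. If you create an ArrayBlockingQueue with a limited size, the queue can never grow beyond that bound. Note that execute() does not block on a full queue by default: the default AbortPolicy throws RejectedExecutionException, which is what Approach 1 ran into. To make the main thread wait, also pass a RejectedExecutionHandler such as ThreadPoolExecutor.CallerRunsPolicy, which runs the rejected task on the submitting thread and so holds it back until the workers catch up.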
final int work_queue_size = 30;
BlockingQueue<Runnable> work_queue = new ArrayBlockingQueue<>(work_queue_size);
ExecutorService executor = new ThreadPoolExecutor(..., work_queue);
for (int i = 0; i < 10; i++) {
    executor.execute(new SampleRunnable(i));
}
...
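A runnable sketch of a bounded executor that throttles the submitting thread. It assumes ThreadPoolExecutor.CallerRunsPolicy as the rejection handler, which is one option and not something the original snippet specifies; BoundedSubmitDemo and the pool and queue sizes are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedSubmitDemo {
    // Returns {tasksCompleted, largestQueueSizeSeen}.
    static int[] submitAll(int taskCount, int threads, int queueCapacity) {
        BlockingQueue<Runnable> workQueue = new ArrayBlockingQueue<>(queueCapacity);
        // When the queue is full, CallerRunsPolicy runs the task on the
        // submitting thread, which keeps it from racing ahead of the workers.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS, workQueue,
                new ThreadPoolExecutor.CallerRunsPolicy());
        AtomicInteger completed = new AtomicInteger();
        int largestQueue = 0;
        try {
            for (int i = 0; i < taskCount; i++) {
                executor.execute(() -> {
                    try {
                        Thread.sleep(1); // stands in for the insert operation
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    completed.incrementAndGet();
                });
                largestQueue = Math.max(largestQueue, workQueue.size());
            }
            executor.shutdown();
            executor.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return new int[] {completed.get(), largestQueue};
    }

    public static void main(String[] args) {
        int[] result = submitAll(200, 3, 5);
        System.out.println("completed=" + result[0] + ", largest queue=" + result[1]);
    }
}
```

If the main thread must never execute tasks itself, a custom RejectedExecutionHandler that calls put() on the queue is another commonly used variant.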

How not to start ScheduledExecutorService task if previous one is not finished

My problem is that we have to give it a fixed schedule time to start the task. Let's say I give it 10 seconds and my task has an average finish time of 10-15 seconds. After some time, the threads waiting in the queue cause huge memory consumption. If I use synchronized for the method, the above problem will occur. If I don't use synchronized, then I am wasting resources (CPU), because I don't need to run the task if the previous run has not finished. So I thought of a solution that calls the task recursively, but I believe recursive threads will add more memory problems. What should I do? In short, I just want to be able to start the task again once it has finished, not at a fixed time.
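public void myScheduledTask() {
    doJob(); // use a CountDownLatch to control waiting if necessary
    try {
        TimeUnit.SECONDS.sleep(x);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
    }
    // option 1: start a new thread for the next run
    new Thread(this::myScheduledTask).start();
    // option 2 (instead of option 1): hand the next run to an executor
    executor.execute(this::myScheduledTask);
}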
The option that sounds like what you're trying to accomplish:
ScheduledExecutorService executor = Executors.newScheduledThreadPool(count);
ScheduledFuture<?> future = executor.scheduleWithFixedDelay(
task,
delay,
delay,
TimeUnit.MILLISECONDS
);
This would run your task first after delay milliseconds and then again delay milliseconds after each completion. count should be the number of threads you want to use; 1 is acceptable. This also lets you stop the task using the future.
The problems with your example: a) You are sleeping on an executor thread. Don't do this; let the executor handle it. If you were using a thread pool of 1, the executor couldn't do any other work while you're waiting. b) Starting a new thread takes control away from the executor. Just use the executor; then you have some control over the execution.
If you really wanted to stick with the form you have.
class RecurringTask implements Runnable{
@Override
public void run(){
doJob();
executor.schedule(this, delay, TimeUnit.MILLISECONDS);
}
}
Now you will be creating Futures that you never use, so it will be harder to control the execution of the task.
Create a static Lock member in your task class.
In doJob, skip the job if the lock is already acquired:
if (lock.tryLock()) {
try {
// do the job
} finally {
lock.unlock();
}
} else {
// log the fact you skipped the job
return;
}
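The skip-if-busy idea can be shown deterministically with two latches that force a second call to arrive while the first is still working. SkipIfBusyJob, its counters, and the demo() helper are hypothetical and exist only for this illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class SkipIfBusyJob {
    private static final ReentrantLock lock = new ReentrantLock();
    final AtomicInteger executed = new AtomicInteger();
    final AtomicInteger skipped = new AtomicInteger();

    void doJob(Runnable body) {
        if (lock.tryLock()) {
            try {
                body.run();
                executed.incrementAndGet();
            } finally {
                lock.unlock();
            }
        } else {
            skipped.incrementAndGet(); // a previous run is still in progress
        }
    }

    // Forces an overlap: the calling thread tries to run the job while
    // another thread is still holding the lock.
    static SkipIfBusyJob demo() {
        SkipIfBusyJob job = new SkipIfBusyJob();
        CountDownLatch firstStarted = new CountDownLatch(1);
        CountDownLatch overlapDone = new CountDownLatch(1);
        Thread first = new Thread(() -> job.doJob(() -> {
            firstStarted.countDown();
            awaitQuietly(overlapDone); // keep the lock until the second call happened
        }));
        first.start();
        awaitQuietly(firstStarted);
        job.doJob(() -> { }); // lock is held by 'first', so this call is skipped
        overlapDone.countDown();
        try {
            first.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return job;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        SkipIfBusyJob job = demo();
        System.out.println("executed=" + job.executed.get() + ", skipped=" + job.skipped.get());
    }
}
```

Because ReentrantLock is reentrant, the skip only happens when a different thread holds the lock, which is the case when overlapping runs come from different pool threads.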

Java Executor with throttling/throughput control

I'm looking for a Java Executor that allows me to specify throttling/throughput/pacing limitations, for example, no more than say 100 tasks can be processed in a second -- if more tasks get submitted they should get queued and executed later. The main purpose of this is to avoid running into limits when hitting foreign APIs or servers.
I'm wondering whether either base Java (which I doubt, because I checked) or somewhere else reliable (e.g. Apache Commons) provides this, or if I have to write my own. Preferably something lightweight. I don't mind writing it myself, but if there's a "standard" version out there somewhere I'd at least like to look at it first.
Take a look at Guava's RateLimiter:
A rate limiter. Conceptually, a rate limiter distributes permits at a
configurable rate. Each acquire() blocks if necessary until a permit
is available, and then takes it. Once acquired, permits need not be
released. Rate limiters are often used to restrict the rate at which
some physical or logical resource is accessed. This is in contrast to
Semaphore which restricts the number of concurrent accesses instead of
the rate (note though that concurrency and rate are closely related,
e.g. see Little's Law).
It's thread-safe, but still @Beta. Might be worth a try anyway.
You would have to wrap each call to the Executor with respect to the rate limiter. For a cleaner solution you could create some kind of wrapper for the ExecutorService.
From the javadoc:
final RateLimiter rateLimiter = RateLimiter.create(2.0); // rate is "2 permits per second"
void submitTasks(List<Runnable> tasks, Executor executor) {
for (Runnable task : tasks) {
rateLimiter.acquire(); // may wait
executor.execute(task);
}
}
The Java Executor doesn't offer such a limitation, only a limit on the number of threads, which is not what you are looking for.
In general the Executor is the wrong place to limit such actions anyway; the limit should be applied at the point where the thread tries to call the outside server. You can do this, for example, by having a limiting Semaphore that threads wait on before they submit their requests.
Calling Thread:
public void run() {
// ...
requestLimiter.acquire();
connection.send();
// ...
}
At the same time, you schedule a (single) secondary thread to periodically (say, every 60 seconds) release the acquired permits:
public void run() {
// ...
requestLimiter.drainPermits(); // make sure not more than max are released by draining the Semaphore empty
requestLimiter.release(MAX_NUM_REQUESTS);
// ...
}
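The two fragments above can be combined into one small class. This is only a sketch of the approach: WindowLimiter, its method names, and the refill period are hypothetical, and the drain-then-release step is not atomic, so a request arriving between the two calls simply waits for the release.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class WindowLimiter {
    private final int maxPerWindow;
    private final Semaphore permits;

    WindowLimiter(int maxPerWindow) {
        this.maxPerWindow = maxPerWindow;
        this.permits = new Semaphore(maxPerWindow);
    }

    // Blocks the requesting thread until a permit for this window is free.
    void acquire() throws InterruptedException {
        permits.acquire();
    }

    // Non-blocking variant: false means the window's budget is used up.
    boolean tryAcquire() {
        return permits.tryAcquire();
    }

    // Resets the budget: drain leftovers first so that never more than
    // maxPerWindow permits are available.
    void refill() {
        permits.drainPermits();
        permits.release(maxPerWindow);
    }

    int available() {
        return permits.availablePermits();
    }

    // Starts the single secondary thread that refills once per period.
    ScheduledExecutorService startRefilling(long period, TimeUnit unit) {
        ScheduledExecutorService refiller = Executors.newSingleThreadScheduledExecutor();
        refiller.scheduleAtFixedRate(this::refill, period, period, unit);
        return refiller;
    }

    public static void main(String[] args) throws InterruptedException {
        WindowLimiter limiter = new WindowLimiter(3);
        ScheduledExecutorService refiller = limiter.startRefilling(200, TimeUnit.MILLISECONDS);
        for (int i = 0; i < 7; i++) {
            limiter.acquire(); // pauses whenever the current window is exhausted
            System.out.println("request " + i + " sent");
        }
        refiller.shutdownNow();
    }
}
```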
no more than say 100 tasks can be processed in a second -- if more
tasks get submitted they should get queued and executed later
You need to look into Executors.newFixedThreadPool(int limit). This will allow you to limit the number of threads that can be executed simultaneously. If you submit more tasks than there are threads, the extra tasks will be queued and executed later.
ExecutorService threadPool = Executors.newFixedThreadPool(100);
Future<?> result1 = threadPool.submit(runnable1);
Future<?> result2 = threadPool.submit(runnable2);
Future<SomeClass> result3 = threadPool.submit(callable1);
...
Snippet above shows how you would work with an ExecutorService that allows no more than 100 threads to be executed simultaneously.
Update:
After going over the comments, here is what I have come up with (kinda stupid). How about manually keeping track of the tasks that are to be executed? How about storing them first in an ArrayList and then submitting them to the Executor based on how many tasks have already been executed in the last second?
So, let's say 200 tasks have been submitted into our maintained ArrayList. We can iterate and add 100 to the Executor. When a second passes, we can add a few more based on how many have completed in the Executor, and so on.
Depending on the scenario, and as suggested in one of the previous responses, the basic functionalities of a ThreadPoolExecutor may do the trick.
But if the threadpool is shared by multiple clients and you want to throttle, to restrict the usage of each one of them, making sure that one client won't use all the threads, then a BoundedExecutor will do the work.
More details can be found in the following example:
http://jcip.net/listings/BoundedExecutor.java
Personally I found this scenario quite interesting. In my case, I wanted to stress that the interesting side to throttle is the consuming side, as in classical producer/consumer concurrency theory. That's the opposite of some of the answers suggested before. That is, we don't want to block the submitting thread, but block the consuming threads based on a rate (tasks/second) policy. So, even if there are tasks ready in the queue, the executing/consuming threads may block while waiting to meet the throttle policy.
That said, I think a good candidate would be Executors.newScheduledThreadPool(int corePoolSize). This way you would need a simple queue in front of the executor (a simple LinkedBlockingQueue would suit), and then schedule a periodic task to pick actual tasks from the queue (ScheduledExecutorService.scheduleAtFixedRate). So, it is not a straightforward solution, but it should perform well enough if you try to throttle the consumers as discussed before.
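A minimal sketch of this consumer-side throttle, assuming a fixed budget of tasks per scheduler tick. PacedDispatcher, tasksPerTick, and drainOnce are hypothetical names; the answer itself only names the queue and scheduleAtFixedRate.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class PacedDispatcher {
    private final BlockingQueue<Runnable> pending = new LinkedBlockingQueue<>();
    private final int tasksPerTick;

    PacedDispatcher(int tasksPerTick) {
        this.tasksPerTick = tasksPerTick;
    }

    // Producers never block: they only enqueue.
    void submit(Runnable task) {
        pending.add(task);
    }

    // Runs at most tasksPerTick queued tasks and returns how many ran.
    int drainOnce() {
        int ran = 0;
        Runnable next;
        while (ran < tasksPerTick && (next = pending.poll()) != null) {
            try {
                next.run();
            } catch (RuntimeException e) {
                // Swallowed on purpose: an escaping exception would cancel
                // all further runs of a scheduleAtFixedRate task.
            }
            ran++;
        }
        return ran;
    }

    int pendingCount() {
        return pending.size();
    }

    // One tick per period gives a rate of tasksPerTick tasks per period.
    ScheduledExecutorService start(long period, TimeUnit unit) {
        ScheduledExecutorService ticker = Executors.newScheduledThreadPool(1);
        ticker.scheduleAtFixedRate(this::drainOnce, period, period, unit);
        return ticker;
    }

    public static void main(String[] args) throws InterruptedException {
        PacedDispatcher dispatcher = new PacedDispatcher(2);
        for (int i = 0; i < 6; i++) {
            final int id = i;
            dispatcher.submit(() -> System.out.println("task " + id));
        }
        ScheduledExecutorService ticker = dispatcher.start(100, TimeUnit.MILLISECONDS);
        Thread.sleep(500);
        ticker.shutdownNow();
    }
}
```

This version runs the tasks on the ticker thread itself; handing them to a separate worker pool inside drainOnce would keep slow tasks from delaying the next tick.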
You can limit it inside the Runnable:
public static Runnable throttle (Runnable realRunner, long delay) {
Runnable throttleRunner = new Runnable() {
// whether is waiting to run
private boolean _isWaiting = false;
// target time to run realRunner
private long _timeToRun;
// specified delay time to wait
private long _delay = delay;
// Runnable that has the real task to run
private Runnable _realRunner = realRunner;
@Override
public void run() {
// current time
long now;
synchronized (this) {
// another thread is waiting, skip
if (_isWaiting) return;
now = System.currentTimeMillis();
// update time to run
// do not update it each time since
// you do not want to postpone it unlimited
_timeToRun = now+_delay;
// set waiting status
_isWaiting = true;
}
try {
Thread.sleep(_timeToRun-now);
} catch (InterruptedException e) {
e.printStackTrace();
} finally {
// clear waiting status before run
_isWaiting = false;
// do the real task
_realRunner.run();
}
}};
return throttleRunner;
}
Taken from JAVA Thread Debounce and Throttle

Thread Scheduling - Run threads in set order

I have a set of roughly 20 threads and I want to schedule them so they run in a set order.
Is there a way to do this? I have tried setting priorities from 1 to 10, but the scheduler still seems to execute the threads in its own order. By the way, I'm working in Java.
Is there a way to run threads in a set order?
Thanks
regards
Mike
What you need is an ExecutorService that will run your threads one at a time, namely newSingleThreadExecutor.
ExecutorService pool = Executors.newSingleThreadExecutor();
pool.submit(job1);
pool.submit(job2);
pool.submit(job3);
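A short sketch that records the execution order, to illustrate that a single-thread executor runs jobs sequentially in the order they were submitted. OrderedJobsDemo and the recorded job ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class OrderedJobsDemo {
    // Submits jobs 0..jobCount-1 and returns the order in which they ran.
    static List<Integer> runInOrder(int jobCount) {
        List<Integer> order = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            for (int i = 0; i < jobCount; i++) {
                final int id = i;
                pool.execute(() -> order.add(id)); // each job records its own id
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(runInOrder(20));
    }
}
```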
Why do you have multiple threads if you want synchronous behaviour in the first place?
If you've acquired multiple Thread objects from "something else" then you can use thread.run() to execute them in the current thread, which will, of course allow you to control the order.
You don't have to run a single threaded version if the jobs can be executed in parallel. Below is an example where you can use eight threads to run your 20 jobs:
public static void main(String[] args) {
final ExecutorService executorService = Executors.newFixedThreadPool(8);
final Queue<Integer> workItems = new ConcurrentLinkedQueue<Integer>();
for (int i = 0; i < 20; i++) {
workItems.add(i);
}
for (int i = 0; i < 20; i++) {
executorService.submit(new Runnable() {
@Override
public void run() {
final Integer workItem = workItems.poll();
// process work item
}
});
}
// await termination of the exec service using shutdown() and awaitTermination()
}
The idea is that you use an auxiliary queue to maintain the items to be processed and rely on the FIFO ordering of the queue to process the items in order and in parallel.
If the threads depend on each other, then one option would be to schedule only the first thread and have it spawn its dependent threads, which can in turn spawn their own dependent threads, and so on.
You need to understand, however, that even though you may be launching threads in a particular order, as soon as they start they are off of your hands and they will be fighting for resources and the OS will time-slice their executions, which means some may get "ahead" of threads that were launched before. So if you truly need to keep the order, then I would suggest you use only one thread and let it orchestrate the tasks in a synchronized manner.

ScheduledExecutorService multiple threads in parallel

I'm interested in using ScheduledExecutorService to spawn multiple threads for tasks if task before did not yet finish. For example I need to process a file every 0.5s. First task starts processing file, after 0.5s if first thread is not finished second thread is spawned and starts processing second file and so on. This can be done with something like this:
ScheduledExecutorService executor = Executors.newScheduledThreadPool(4);
while (!executor.isShutdown()) {
executor.execute(task);
try {
Thread.sleep(500);
} catch (InterruptedException e) {
// handle
}
}
Now my question: why can't I do it with executor.scheduleAtFixedRate?
What I get is that if the first task takes longer, the second task is started as soon as the first has finished, but no new thread is started even though the executor has a pool of threads. executor.scheduleWithFixedDelay is clear: it executes tasks with the same time span between them, no matter how long each task takes to complete. So I probably misunderstood the purpose of ScheduledExecutorService.
Maybe I should look at another kind of executor? Or just use code which I posted here? Any thoughts?
I've solved the problem by launching a nested anonymous runnable in each scheduled execution:
final ScheduledExecutorService service = Executors.newScheduledThreadPool(POOL_SIZE);
final Runnable command = new SlowRunnable();
service.scheduleAtFixedRate(
new Runnable() {
@Override
public void run() {
service.execute(command);
}
}, 0, 1, TimeUnit.SECONDS);
With this example, 1 thread executes a fast instruction at every interval, so it will surely have finished by the time the next interval expires. The remaining POOL_SIZE-1 threads execute the SlowRunnable's run() in parallel, which may take longer than a single interval.
Please note that while I like this solution because it minimizes the code and reuses the same ScheduledExecutorService, the pool must be sized correctly and the approach may not be usable in every context: if the SlowRunnable is so slow that up to POOL_SIZE jobs execute at the same time, there will be no thread left to run the scheduled task on time.
Also, if you set the interval to 1 TimeUnit.NANOSECONDS, the execution of the main runnable will probably become too slow as well.
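scheduleAtFixedRate on its own will not do this. According to its Javadoc, if an execution takes longer than the period, subsequent executions may start late but will not execute concurrently, which matches what you observed. To start a new run at every interval even while earlier runs are still in progress, keep the scheduled task itself short and have it hand the real work to a pool, as in the answer above. If you're running out of threads to do the processing, adjust the pool size constraints as detailed in the ThreadPoolExecutor docs.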
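The following sketch measures how many runs overlap in each setup. It relies on the documented behaviour that executions of one scheduleAtFixedRate task never run concurrently; OverlappingScheduleDemo, the separate worker pool, and the timing values are hypothetical choices for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class OverlappingScheduleDemo {
    // Returns the largest number of slow runs that were active at the same time.
    static int maxConcurrent(boolean dispatchToWorkers, long workMillis,
                             long periodMillis, long windowMillis) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(4);
        ExecutorService workers = Executors.newFixedThreadPool(4);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger maxActive = new AtomicInteger();
        Runnable slow = () -> {
            int now = active.incrementAndGet();
            maxActive.accumulateAndGet(now, Math::max);
            try {
                Thread.sleep(workMillis); // stands in for processing one file
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                active.decrementAndGet();
            }
        };
        Runnable scheduled;
        if (dispatchToWorkers) {
            scheduled = () -> workers.execute(slow); // fast hand-off, returns at once
        } else {
            scheduled = slow; // the slow work runs inside the scheduled task itself
        }
        try {
            scheduler.scheduleAtFixedRate(scheduled, 0, periodMillis, TimeUnit.MILLISECONDS);
            Thread.sleep(windowMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
            workers.shutdownNow();
        }
        return maxActive.get();
    }

    public static void main(String[] args) {
        System.out.println("scheduled directly: " + maxConcurrent(false, 150, 50, 600));
        System.out.println("handed to workers:  " + maxConcurrent(true, 150, 50, 600));
    }
}
```

Scheduled directly, the slow task never overlaps with itself even though the pool has 4 threads; handed to the worker pool, several runs are in flight at once.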
