ThreadPoolExecutor not shrinking at low load - java

In my program, tasks are only rarely submitted to the executor most of the time, yet they never cease completely. There are periodic bursts when many tasks are submitted at once.
Even though allowCoreThreadTimeOut is set and a single thread would be enough most of the time, the redundant executor threads don't stop.
This is because of the fairness of the executor's blocking queue: when multiple threads wait on it, all have an equal chance to get a task, so no single thread's idle time grows enough to hit the timeout.
Is there a workaround? For example, a queue that, when multiple threads are waiting, hands the task to the thread with the lowest id?
public class ShrinkTPE {
public static void main(final String[] args) throws Exception {
final ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors
.newFixedThreadPool(NTHREADS);
executor.setKeepAliveTime(ALIVE_TIME, TimeUnit.SECONDS);
executor.allowCoreThreadTimeOut(true);
// thread alive time is 10s
// load all threads with tasks at start and every 12s
// also submit one task each second
for (int i = 0;; i++) {
int j = 0;
do {
if (false && !mostThreadsUnused(i))
break;
final int i2 = i, j2 = j;
executor.submit(new Callable<Void>() {
@Override
public Void call() throws Exception {
System.out.println(""
+ Thread.currentThread().getName() + " " + i2
+ " " + j2);
Thread.sleep(300);
return null;
}
});
} while (mostThreadsUnused(i) && ++j < NTHREADS);
Thread.sleep(1000);
System.out.println();
}
}
private static boolean mostThreadsUnused(final int i) {
return i % (ALIVE_TIME + 2) == 0;
}
private static final int NTHREADS = 5;
private static final int ALIVE_TIME = 10;
}

final ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(N_THREAD);
You are using a fixed thread pool, which means the pool keeps N_THREAD threads alive constantly; allowCoreThreadTimeOut is effectively neglected here.
Use a different thread pool, perhaps a cached thread pool? It will reuse existing threads, but it will spin up additional threads if you submit a new task and no thread is idle.
Idle threads die after a configurable amount of time (60 seconds of idleness by default).

The official JDK implementation of newCachedThreadPool is as follows. You can simply call that constructor directly if you want to set a maximum thread pool size, customize the keepAliveTime, or use a different queue.
public static ExecutorService newCachedThreadPool() {
return new ThreadPoolExecutor(0, Integer.MAX_VALUE,
60L, TimeUnit.SECONDS,
new SynchronousQueue<Runnable>());
}
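For example, a bounded variant could look like the sketch below. The pool size, keep-alive time, and rejection policy are illustrative choices, not part of the JDK source above; with a SynchronousQueue and a capped maximum, tasks arriving while all threads are busy would otherwise be rejected, hence the CallerRunsPolicy fallback.
ExecutorService bounded = new ThreadPoolExecutor(
        0, 8,                                    // corePoolSize, maximumPoolSize (cap, arbitrary here)
        10L, TimeUnit.SECONDS,                   // reclaim idle threads after 10 seconds
        new SynchronousQueue<Runnable>(),
        new ThreadPoolExecutor.CallerRunsPolicy());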

Related

ExecutorService and AtomicInteger : RejectedExecutionException

I want atomicInteger to reach a value of 100, and then the program should terminate.
public static void main(String[] args) throws InterruptedException {
ExecutorService executor = Executors.newSingleThreadExecutor();
AtomicInteger atomicInteger = new AtomicInteger(0);
do {
executor.submit(() -> {
System.out.println(atomicInteger.getAndAdd(10));
if (atomicInteger.get() == 100) {
//executor.shutdownNow();
}
});
} while (true);
}
I have error
Exception in thread "main" java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@1d8d10a rejected from java.util.concurrent.ThreadPoolExecutor@9e54c2[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 10]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1374)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
at java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:678)
How should I implement it?
There is no need to use AtomicInteger here, since your Runnable lambda invocations are guaranteed to execute sequentially (by newSingleThreadExecutor). Also, if your Runnable lambda were to take any time at all to execute (e.g. 2 ms), your main loop would queue up far more than the 10 tasks needed to hit your limit. You can see this happen if you add a 2 ms sleep inside your Runnable lambda, add a counter to your do/while loop, and print the counter at the end to see how many Runnables you queued up.
Assuming that you wish to test this code with concurrent threads, you would need to replace the call to newSingleThreadExecutor with newFixedThreadPool. The approach your code takes is problematic when concurrent threads are used. In the following code, I've switched to newFixedThreadPool, added a counter so we can see how many tasks are queued, and added two short pauses in your Runnable lambda, just to represent a small amount of work. When I executed this program, atomicInteger grew beyond 13000 and the program crashed with java.lang.OutOfMemoryError: GC overhead limit exceeded. That is because your Runnable always adds 10 to atomicInteger regardless of its current value, and the loop queues up far more tasks than it needs. Here's the code with these small changes that illustrate the problem.
public static void main(String[] args) {
ExecutorService executor = Executors.newFixedThreadPool(3);
AtomicInteger atomicInteger = new AtomicInteger(0);
int i=0;
do {
executor.submit(() -> {
pause(2); // simulates some small amount of work.
System.out.println("atomicInt="+atomicInteger.getAndAdd(10));
pause(2); // simulates some small amount of work.
if (atomicInteger.get() == 100) {
System.out.println("executor.shutdownNow()");
System.out.flush();
executor.shutdownNow();
}
});
i++; // count how many tasks have been queued
if (atomicInteger.get() == 100) {
break;
}
} while (true);
System.out.println("final atomicInt="+atomicInteger.get());
System.out.println("final tasks queued="+i);
}
public static void pause(long millis) {
try {
Thread.sleep(millis);
} catch (InterruptedException ex) {
}
}
Here is a version that fixes the concurrency problems and moves the executor management out of the worker threads where it doesn't really belong:
private static int LIMIT = 100;
private static int INCREMENT = 10;
public static void main(String[] args) {
ExecutorService executor = Executors.newFixedThreadPool(2);
AtomicInteger atomicInteger = new AtomicInteger(0);
for (int i=0; i < LIMIT/INCREMENT; i++) {
executor.submit(() -> {
pause(2);
System.out.println("atomicInt=" + atomicInteger.getAndAdd(INCREMENT));
System.out.flush();
pause(2);
});
}
executor.shutdown();
while (!executor.isTerminated()) {
System.out.println("Executor not yet terminated");
System.out.flush();
pause(4);
}
System.out.println("final atomicInt=" + atomicInteger.get());
}
public static void pause(long millis) {
try {
Thread.sleep(millis);
} catch (InterruptedException ex) {
}
}
You should just change your while loop to check for the condition that you need, and shut down the executor after that.
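A rough sketch of that suggestion (not the answerer's exact code): bound the loop so it stops once enough increments have been submitted to reach 100, then shut the executor down and wait for it to drain before reading the final value.
public static void main(String[] args) throws InterruptedException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    AtomicInteger atomicInteger = new AtomicInteger(0);
    int submitted = 0;
    while (submitted < 100 / 10) {               // the loop condition decides when to stop submitting
        executor.submit(() -> System.out.println(atomicInteger.getAndAdd(10)));
        submitted++;
    }
    executor.shutdown();                         // no new tasks accepted
    executor.awaitTermination(1, TimeUnit.MINUTES); // wait for the queued tasks to finish
    System.out.println("final value = " + atomicInteger.get()); // 100
}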

Do threads of ThreadPoolExecutor not run concurrently when used with PriorityBlockingQueue?

I am using Java's ThreadPoolExecutor to run tasks concurrently. I used an ArrayBlockingQueue to keep tasks in the queue. But now the requirement has changed: I need to add tasks at run time (with no size limit) and they should be prioritized.
So I decided to use a PriorityBlockingQueue instead of the ArrayBlockingQueue, with some comparison logic.
After switching to the PriorityBlockingQueue, tasks run sequentially, one after another, not concurrently. Only one thread runs at a time, regardless of what the active thread count should be.
Please let me know if anybody has suggestions to resolve this issue and achieve my requirement (tasks should be added to the pool at run time and execution should be based on priority).
My demo code:
//RejectedExecutionHandler implementation
RejectedExecutionHandlerImpl rejectionHandler = new RejectedExecutionHandlerImpl();
//Get the ThreadFactory implementation to use
BlockingQueue<Runnable> queue = new PriorityBlockingQueue<Runnable>(50, ThreadComparator.getComparator());
ThreadPoolExecutor executorPool = new ThreadPoolExecutor(1, activeThread, 10, TimeUnit.SECONDS, queue, threadFactory, rejectionHandler);
//start the monitoring thread
MyMonitorThread monitor = new MyMonitorThread(executorPool, 20, "Demo");
Thread monitorThread = new Thread(monitor);
monitorThread.start();
for (int i = 0; i < totalThead; i++) {
int prio = i % 3 == 0 ? 3 : 5;
executorPool.execute(new MyThread("Thread-" + i, prio));
}
// Inserting more threads in between concurrent execution.
try {
Thread.sleep(40000);
for (int j = 101; j < 110; j++) {
executorPool.execute(new MyThread("Thread-" + j, 2));
}
} catch (InterruptedException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
while(executorPool.getActiveCount() != 0) {
try {
Thread.sleep(10000);
} catch (InterruptedException e) {
System.out.println("Error while thread sleeping: " + e);
}
}
//shut down the pool
executorPool.shutdown();
//shut down the monitor thread
try {
Thread.sleep(5000);
} catch (InterruptedException e) {
System.out.println("Error while thread sleeping: " + e);
}
monitor.shutdown();
public abstract class ThreadComparator implements Comparator<Runnable>{
public static Comparator<Runnable> getComparator() {
return new Comparator<Runnable>() {
@Override
public int compare(Runnable t1, Runnable t2) {
CompareToBuilder compare = new CompareToBuilder();
MyThread mt1 = (MyThread) t1;
MyThread mt2 = (MyThread) t2;
compare.append(mt1.getPriority(), mt2.getPriority());
return compare.toComparison();
}
};
}
}
This is the expected behaviour of ThreadPoolExecutor with an unbounded work queue.
To cite the ThreadPoolExecutor JavaDoc:
Core and maximum pool sizes
A ThreadPoolExecutor will automatically adjust the pool size [..].
When a new task is submitted in method execute(Runnable), and fewer
than corePoolSize threads are running, a new thread is created to
handle the request, even if other worker threads are idle. If there
are more than corePoolSize but less than maximumPoolSize threads
running, a new thread will be created only if the queue is full. [...]
Since you define corePoolSize as 1 and a PriorityBlockingQueue is essentially an unbounded queue (that can never become full), you will never have more than one thread.
The fix is to adjust the corePoolSize to the required number of threads.
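A sketch of that fix, reusing the variables from the question (activeThread, queue, threadFactory, rejectionHandler): set corePoolSize to the concurrency you actually want, since the unbounded PriorityBlockingQueue never reports itself as full and therefore never triggers growth towards maximumPoolSize.
ThreadPoolExecutor executorPool = new ThreadPoolExecutor(
        activeThread,            // corePoolSize: the number of threads you want running
        activeThread,            // maximumPoolSize: effectively unused with an unbounded queue
        10, TimeUnit.SECONDS,
        queue, threadFactory, rejectionHandler);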

ThreadPoolExecutor mysteriously rejecting runnables

Given the following unit test, can somebody explain to me why at some point the ThreadPoolExecutor rejects a task?
@Test
public void testRejectionBehavior() throws Exception {
final AtomicLong count = new AtomicLong(0);
final AtomicInteger activeThreads = new AtomicInteger(0);
for (;;) {
ThreadPoolExecutor pool = new ThreadPoolExecutor(20, 20,
0L, TimeUnit.MILLISECONDS,
new SynchronousQueue<Runnable>(), new ThreadPoolExecutor.CallerRunsPolicy());
int prestarted = pool.prestartAllCoreThreads();
pool.allowCoreThreadTimeOut(false);
System.out.println("Prestarted #" + prestarted);
for (int i = 0; i < 100; i++) {
final int thisTasksActive = activeThreads.incrementAndGet();
pool.execute(new Runnable() {
@Override
public void run() {
long value = count.incrementAndGet();
if (value % 50 == 0) {
System.out.println("Execution #" + value + " / active: " + thisTasksActive);
}
if (Thread.currentThread().getName().equals("main")) {
throw new IllegalStateException("Execution #" + value + " / active: " + thisTasksActive);
}
activeThreads.decrementAndGet();
}
});
Thread.sleep(5);
}
}
}
The output for me looks like this:
....
Execution #200 / active: 1
Prestarted #20
java.lang.IllegalStateException: Execution #201 / active: 1 / pool stats: java.util.concurrent.ThreadPoolExecutor@156643d4[Running, pool size = 20, active threads = 20, queued tasks = 0, completed tasks = 0]
As you can see, it does some 200 executions and then suddenly rejects the first task of a new iteration.
OK, after a lot of digging into ThreadPoolExecutor, it turns out that with the given construction parameters the pool is not immediately able to execute tasks.
There is actually a race condition even if you invoke pool.prestartAllCoreThreads(). You see, prestartAllCoreThreads() creates new ThreadPoolExecutor.Worker instances, which implement the Runnable interface. On instantiation they set their internal state to -1, making them appear as "active threads" in the toString() output of the ThreadPoolExecutor. Also in their constructor, the Worker instances create a new Thread and set themselves as the Runnable for that Thread. It is not until their run() method is actually called by the newly started thread that they set their state to be available for taking on tasks and subsequently call workQueue.take().
In short, when you have a ThreadPoolExecutor with a SynchronousQueue and prestart all core threads, it may take a while for those threads to really start up and block in queue.take(). It is not until then that you can submit tasks without getting a rejected execution.
You haven't provided a proper queue for the executor to store the tasks in. A SynchronousQueue has no capacity, not even 1. You fill up the thread pool, and then your next task has to run on the main thread, which is the normal behaviour of CallerRunsPolicy in this case.
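A sketch of that suggestion: give the pool a queue with real capacity so bursts can wait instead of falling back onto the caller. The capacity of 100 below is an arbitrary illustration.
ThreadPoolExecutor pool = new ThreadPoolExecutor(20, 20,
        0L, TimeUnit.MILLISECONDS,
        new LinkedBlockingQueue<Runnable>(100),   // bounded queue instead of SynchronousQueue
        new ThreadPoolExecutor.CallerRunsPolicy());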
SynchronousQueue is a weird beast, and the only time I've seen it used in code on SO is with executors, in questions like "why does my code act weird". How did you come up with using a SynchronousQueue here?

Learning about Threads

I have written a simple program that is intended to start a few threads. Each thread should pick an integer n from an integer array, wait n seconds, and write the time t it actually waited back into an array for the results.
If a thread finishes its task, it should pick the next one that has not yet been assigned to another thread.
Of course: the order in the arrays has to be maintained, so that integers and results match.
My code runs smoothly as far as I can see.
However, there is one line of code I find particularly unsatisfying, and I hope there is a good way to fix it without changing too much:
while(Thread.activeCount() != 1); // first evil line
I basically abuse this line to make sure all my threads have finished all the tasks before I access the array with the results. I want to do that to prevent ill values, like 0.0, a NullPointerException, etc. (in short, anything that would make an application with an actual use crash).
I am also not sure whether my code still runs correctly for very long arrays of tasks, for example whether the results would still match the order of the integers.
Any constructive help is appreciated.
First class:
public class ThreadArrayWriterTest {
int[] repitions;
int len = 0;
double[] timeConsumed;
public boolean finished() {
synchronized (repitions) {
return len <= 0;
}
}
public ThreadArrayWriterTest(int[] repitions) {
this.repitions = repitions;
this.len = repitions.length;
timeConsumed = new double[this.len];
}
public double[] returnTimes(int[] repititions, int numOfThreads, TimeConsumer timeConsumer) {
for (int i = 0; i < numOfThreads; i++) {
new Thread() {
public void run() {
while (!finished()) {
len--;
timeConsumed[len] = timeConsumer.returnTimeConsumed(repititions[len]);
}
}
}.start();
}
while (Thread.activeCount() != 1) // first evil line
;
return timeConsumed;
}
public static void main(String[] args) {
long begin = System.currentTimeMillis();
int[] repitions = { 3, 1, 3, 1, 2, 1, 3, 3, 3 };
int numberOfThreads = 10;
ThreadArrayWriterTest t = new ThreadArrayWriterTest(repitions);
double[] times = t.returnTimes(repitions, numberOfThreads, new TimeConsumer());
for (double d : times) {
System.out.println(d);
}
long end = System.currentTimeMillis();
System.out.println("Total time of execution: " + (end - begin));
}
}
Second class:
public class TimeConsumer {
double returnTimeConsumed(int repitions) {
long before = System.currentTimeMillis();
for (int i = 0; i < repitions; i++) {
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
long after = System.currentTimeMillis();
double ret = after - before;
System.out.println("It takes: " + ret + "ms" + " for " + repitions + " runs through the for-loop");
return ret;
}
}
The easiest way to wait for all threads to complete is to keep a Collection of them and then call Thread.join() on each one in turn.
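Applied to the question's returnTimes method, that could look roughly like the sketch below (the worker body is elided; only the waiting changes, from spinning on Thread.activeCount() to joining each worker).
List<Thread> workers = new ArrayList<>();
for (int i = 0; i < numOfThreads; i++) {
    Thread t = new Thread(() -> {
        while (!finished()) {
            // ... take the next repetition and record its time, as before ...
        }
    });
    workers.add(t);
    t.start();
}
for (Thread t : workers) {
    try {
        t.join();                        // blocks until that worker has finished
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}
return timeConsumed;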
In addition to .join(), you can use an ExecutorService to manage pools of threads:
An Executor that provides methods to manage termination and methods
that can produce a Future for tracking progress of one or more
asynchronous tasks.
An ExecutorService can be shut down, which will cause it to reject new
tasks. Two different methods are provided for shutting down an
ExecutorService. The shutdown() method will allow previously submitted
tasks to execute before terminating, while the shutdownNow() method
prevents waiting tasks from starting and attempts to stop currently
executing tasks. Upon termination, an executor has no tasks actively
executing, no tasks awaiting execution, and no new tasks can be
submitted. An unused ExecutorService should be shut down to allow
reclamation of its resources.
Method submit extends base method Executor.execute(Runnable) by
creating and returning a Future that can be used to cancel execution
and/or wait for completion. Methods invokeAny and invokeAll perform
the most commonly useful forms of bulk execution, executing a
collection of tasks and then waiting for at least one, or all, to
complete.
ExecutorService executorService = Executors.newFixedThreadPool(maximumNumberOfThreads);
CompletionService completionService = new ExecutorCompletionService(executorService);
for (int i = 0; i < numberOfTasks; ++i) {
    completionService.submit(task);   // submit each task (task being your Callable)
}
for (int i = 0; i < numberOfTasks; ++i) {
    completionService.take();         // blocks until the next submitted task completes
}
executorService.shutdown();
Plus take a look at ThreadPoolExecutor
Since Java provides a more advanced threading API in the java.util.concurrent package, you should have a look at ExecutorService, which simplifies the thread management mechanism.
A simple solution to your problem:
Use Executors API to create thread pool
static ExecutorService newFixedThreadPool(int nThreads)
Creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue.
Use invokeAll to wait for all tasks to complete.
Sample code:
ExecutorService service = Executors.newFixedThreadPool(10);
List<MyCallable> futureList = new ArrayList<MyCallable>();
for ( int i=0; i<12; i++){
MyCallable myCallable = new MyCallable((long)i);
futureList.add(myCallable);
}
System.out.println("Start");
try{
List<Future<Long>> futures = service.invokeAll(futureList);
for(Future<Long> future : futures){
try{
System.out.println("future.isDone = " + future.isDone());
System.out.println("future: call ="+future.get());
}
catch(Exception err1){
err1.printStackTrace();
}
}
}catch(Exception err){
err.printStackTrace();
}
service.shutdown();
Refer to this related SE question for more details on achieving the same:
wait until all threads finish their work in java

Is it possible to use multithreading without creating Threads over and over again?

First and once more, thanks to all that already answered my question. I am not a very experienced programmer and it is my first experience with multithreading.
I put together an example that behaves much like my problem. I hope it makes the case easier to discuss.
public class ThreadMeasuring {
private static final int TASK_TIME = 1; // milliseconds
private static class Batch implements Runnable {
CountDownLatch countDown;
public Batch(CountDownLatch countDown) {
this.countDown = countDown;
}
@Override
public void run() {
long t0 =System.nanoTime();
long t = 0;
while(t<TASK_TIME*1e6){ t = System.nanoTime() - t0; }
if(countDown!=null) countDown.countDown();
}
}
public static void main(String[] args) {
ThreadFactory threadFactory = new ThreadFactory() {
int counter = 1;
@Override
public Thread newThread(Runnable r) {
Thread t = new Thread(r, "Executor thread " + (counter++));
return t;
}
};
// the total duty to be divided in tasks is fixed (problem dependent).
// Increase ntasks will mean decrease the task time proportionally.
// 4 Is an arbitrary example.
// This tasks will be executed thousands of times, inside a loop alternating
// with serial processing that needs their result and prepare the next ones.
int ntasks = 4;
int nthreads = 2;
int ncores = Runtime.getRuntime().availableProcessors();
if (nthreads<ncores) ncores = nthreads;
Batch serial = new Batch(null);
long serialTime = System.nanoTime();
serial.run();
serialTime = System.nanoTime() - serialTime;
ExecutorService executor = Executors.newFixedThreadPool( nthreads, threadFactory );
CountDownLatch countDown = new CountDownLatch(ntasks);
ArrayList<Batch> batches = new ArrayList<Batch>();
for (int i = 0; i < ntasks; i++) {
batches.add(new Batch(countDown));
}
long start = System.nanoTime();
for (Batch r : batches){
executor.execute(r);
}
// wait for all threads to finish their task
try {
countDown.await();
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
long tmeasured = (System.nanoTime() - start);
System.out.println("Task time= " + TASK_TIME + " ms");
System.out.println("Number of tasks= " + ntasks);
System.out.println("Number of threads= " + nthreads);
System.out.println("Number of cores= " + ncores);
System.out.println("Measured time= " + tmeasured);
System.out.println("Theoretical serial time= " + TASK_TIME*1000000*ntasks);
System.out.println("Theoretical parallel time= " + (TASK_TIME*1000000*ntasks)/ncores);
System.out.println("Speedup= " + (serialTime*ntasks)/(double)tmeasured);
executor.shutdown();
}
}
Instead of doing the calculations, each batch just waits for a given time. The program calculates the speedup, which would always be 2 in theory but can drop below 1 (an actual slowdown) if TASK_TIME is small.
My calculations take at most 1 ms and are usually faster. For 1 ms I see a small speedup of around 30%, but in practice, with my real program, I notice a slowdown.
The structure of this code is very similar to my program, so if you could help me to optimise the thread handling I would be very grateful.
Kind regards.
Below, the original question:
Hi.
I would like to use multithreading on my program, since it could increase its efficiency considerably, I believe. Most of its running time is due to independent calculations.
My program has thousands of independent calculations (several linear systems to solve), but they only occur together in small groups of a dozen or so. Each of these groups takes a few milliseconds to run. After one of these groups of calculations, the program has to run sequentially for a little while, and then I have to solve the linear systems again.
Actually, the independent linear systems to solve sit inside a loop that iterates thousands of times, alternating with sequential calculations that depend on the previous results. My idea to speed up the program is to compute these independent calculations in parallel threads, by dividing each group into as many batches of independent calculation as I have processors available. So, in principle, there is no queuing at all.
I tried using a FixedThreadPool and a CachedThreadPool and it got even slower than serial processing. It seems to take too much time creating new Threads each time I need to solve the batches.
Is there a better way to handle this problem? The pools I've used seem to be suited for cases where each task takes longer, rather than thousands of smaller tasks...
Thanks!
Best Regards!
Thread pools don't create new threads over and over. That's why they're pools.
How many threads were you using and how many CPUs/cores do you have? What is the system load like (normally, when you execute them serially, and when you execute with the pool)? Is synchronization or any kind of locking involved?
Is the algorithm for parallel execution exactly the same as the serial one? (Your description seems to suggest that the serial version was reusing some results from the previous iteration.)
From what I've read: "thousands of independent calculations... happen at the same time... would take some milliseconds to run", it seems to me that your problem is a perfect fit for GPU programming.
And I think that answers your question. GPU programming is becoming more and more popular. There are Java bindings for CUDA and OpenCL. If it is possible for you to use it, I say go for it.
I'm not sure how you perform the calculations, but if you're breaking them up into small groups, then your application might be ripe for the Producer/Consumer pattern.
Additionally, you might be interested in using a BlockingQueue. The calculation consumers will block until there is something in the queue and the block occurs on the take() call.
private static class Batch implements Runnable {
CountDownLatch countDown;
public Batch(CountDownLatch countDown) {
this.countDown = countDown;
}
CountDownLatch getLatch(){
return countDown;
}
@Override
public void run() {
long t0 =System.nanoTime();
long t = 0;
while(t<TASK_TIME*1e6){ t = System.nanoTime() - t0; }
if(countDown!=null) countDown.countDown();
}
}
class CalcProducer implements Runnable {
private final BlockingQueue queue;
CalcProducer(BlockingQueue q) { queue = q; }
public void run() {
try {
while(true) {
CountDownLatch latch = new CountDownLatch(ntasks);
for(int i = 0; i < ntasks; i++) {
queue.put(produce(latch));
}
// don't need to wait for the latch, only consumers wait
}
} catch (InterruptedException ex) { ... handle ...}
}
Batch produce(CountDownLatch latch) {
return new Batch(latch);
}
}
class CalcConsumer implements Runnable {
private final BlockingQueue queue;
CalcConsumer(BlockingQueue q) { queue = q; }
public void run() {
try {
while(true) { consume(queue.take()); }
} catch (InterruptedException ex) { ... handle ...}
}
void consume(Batch batch) throws InterruptedException {
batch.run();
batch.getLatch().await();
}
}
class Setup {
public static void main(String[] args) {
BlockingQueue<Batch> q = new LinkedBlockingQueue<Batch>();
int numConsumers = 4;
CalcProducer p = new CalcProducer(q);
Thread producerThread = new Thread(p);
producerThread.start();
Thread[] consumerThreads = new Thread[numConsumers];
for(int i = 0; i < numConsumers; i++)
{
consumerThreads[i] = new Thread(new CalcConsumer(q));
consumerThreads[i].start();
}
}
}
Sorry if there are any syntax errors, I've been chomping away at C# code and sometimes I forget the proper java syntax, but the general idea is there.
If you have a problem which does not scale to multiple cores, you need to change your program or you have a problem which is not as parallel as you think. I suspect you have some other type of bug, but cannot say based on the information given.
This test code might help.
Time per million tasks 765 ms
code
ExecutorService es = Executors.newFixedThreadPool(4);
Runnable task = new Runnable() {
@Override
public void run() {
// do nothing.
}
};
long start = System.nanoTime();
for(int i=0;i<1000*1000;i++) {
es.submit(task);
}
es.shutdown();
es.awaitTermination(10, TimeUnit.SECONDS);
long time = System.nanoTime() - start;
System.out.println("Time per million tasks "+time/1000/1000+" ms");
EDIT: Say you have a loop which serially does this.
for(int i=0;i<1000*1000;i++)
doWork(i);
You might assume that changing to a loop like this would be faster, but the problem is that the overhead could be greater than the gain.
for(int i=0;i<1000*1000;i++) {
final int i2 = i;
ex.execute(new Runnable() {
public void run() {
doWork(i2);
}
});
}
So you need to create batches of work (at least one per thread) so there are enough tasks to keep all the threads busy, but not so many tasks that your threads are spending time in overhead.
final int batchSize = 10*1000;
for(int i=0;i<1000*1000;i+=batchSize) {
final int i2 = i;
ex.execute(new Runnable() {
public void run() {
for(int i3=i2;i3<i2+batchSize;i3++)
doWork(i3);
}
});
}
EDIT2: Running a test which copies data between threads.
for (int i = 0; i < 20; i++) {
ExecutorService es = Executors.newFixedThreadPool(1);
final double[] d = new double[4 * 1024];
Arrays.fill(d, 1);
final double[] d2 = new double[4 * 1024];
es.submit(new Runnable() {
@Override
public void run() {
// nothing.
}
}).get();
long start = System.nanoTime();
es.submit(new Runnable() {
@Override
public void run() {
synchronized (d) {
System.arraycopy(d, 0, d2, 0, d.length);
}
}
});
es.shutdown();
es.awaitTermination(10, TimeUnit.SECONDS);
// get at the values in d2.
for (double x : d2) ;
long time = System.nanoTime() - start;
System.out.printf("Time to pass %,d doubles to another thread and back was %,d ns.%n", d.length, time);
}
It starts badly but warms up to ~50 µs.
Time to pass 4,096 doubles to another thread and back was 1,098,045 ns.
Time to pass 4,096 doubles to another thread and back was 171,949 ns.
... deleted ...
Time to pass 4,096 doubles to another thread and back was 50,566 ns.
Time to pass 4,096 doubles to another thread and back was 49,937 ns.
Hmm, CachedThreadPool seems to be made just for your case. It does not recreate threads if you reuse them soon enough, and if you wait a whole minute before you need a new thread, the overhead of thread creation is comparatively negligible.
But you can't expect parallel execution to speed up your calculations unless you can also access the data in parallel. If you employ extensive locking, many synchronized methods, etc., you'll spend more on overhead than you gain from parallel processing. Check that your data can be efficiently processed in parallel and that you don't have non-obvious synchronization lurking in the code.
Also, CPUs process data efficiently if it fully fits into cache. If the data set of each thread is bigger than half the cache, two threads will compete for the cache and issue many RAM reads, while a single thread, employing only one core, may perform better because it avoids RAM reads in the tight loop it executes. Check this, too.
Here's a pseudo-outline of what I'm thinking:
class WorkerThread extends Thread {
Queue<Calculation> calcs;
MainCalculator mainCalc;
public void run() {
while(true) {
while(calcs.isEmpty()) sleep(500); // busy waiting? Context switching probably won't be so bad.
Calculation calc = calcs.pop(); // is it pop to get and remove? you'll have to look
CalculationResult result = calc.calc();
mainCalc.returnResultFor(calc,result);
}
}
}
Another option, if you're calling external programs: don't put them in a loop that runs them one at a time, or they won't run in parallel. You can put them in a loop that PROCESSES them one at a time, but not one that execs them one at a time.
Process calc1 = Runtime.getRuntime().exec("myCalc paramA1 paramA2 paramA3");
Process calc2 = Runtime.getRuntime().exec("myCalc paramB1 paramB2 paramB3");
Process calc3 = Runtime.getRuntime().exec("myCalc paramC1 paramC2 paramC3");
Process calc4 = Runtime.getRuntime().exec("myCalc paramD1 paramD2 paramD3");
calc1.waitFor();
calc2.waitFor();
calc3.waitFor();
calc4.waitFor();
InputStream is1 = calc1.getInputStream();
InputStreamReader isr1 = new InputStreamReader(is1);
BufferedReader br1 = new BufferedReader(isr1);
String resultStr1 = br1.readLine();
InputStream is2 = calc2.getInputStream();
InputStreamReader isr2 = new InputStreamReader(is2);
BufferedReader br2 = new BufferedReader(isr2);
String resultStr2 = br2.readLine();
InputStream is3 = calc3.getInputStream();
InputStreamReader isr3 = new InputStreamReader(is3);
BufferedReader br3 = new BufferedReader(isr3);
String resultStr3 = br3.readLine();
InputStream is4 = calc4.getInputStream();
InputStreamReader isr4 = new InputStreamReader(is4);
BufferedReader br4 = new BufferedReader(isr4);
String resultStr4 = br4.readLine();
