I have started working on a project in which I pass in the number of threads to use and then measure how long a SELECT SQL statement takes to execute. For that I record a timestamp just before the line preparedStatement.executeQuery(); and another one right after it.
Below is a snippet of my code:
public class TestPool {
public static void main(String[] args) throws InterruptedException {
final int no_of_threads = 10;
// create thread pool with given size
ExecutorService service = Executors.newFixedThreadPool(no_of_threads);
// queue some tasks
for(int i = 0; i < 3 * no_of_threads; i++) {
service.submit(new ThreadTask());
}
// wait for termination
service.shutdown();
service.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
// Now print the select histogram here
System.out.println(ThreadTask.selectHistogram);
}
}
Below is my ThreadTask class, which implements the Runnable interface:
class ThreadTask implements Runnable {
private PreparedStatement preparedStatement = null;
private ResultSet rs = null;
public static ConcurrentHashMap<Long, AtomicLong> selectHistogram = new ConcurrentHashMap<Long, AtomicLong>();
public ThreadTask() {
}
@Override
public void run() {
...........
long start = System.nanoTime();
rs = preparedStatement.executeQuery();
long end = System.nanoTime() - start;
final AtomicLong before = selectHistogram.putIfAbsent(end / 1000000L, new AtomicLong(1L));
if (before != null) {
before.incrementAndGet();
}
..............
}
}
Problem Statement:-
Today I had a design meeting in which most of the folks said not to start measuring the time as soon as the program starts running: have some warm-up period and only then start measuring. That makes sense to me, and now I am wondering how I should incorporate that change into my code. On what basis should I do this? Can anyone suggest something?
One of the simplest ways to warm up is to run the same test, untimed, in the same JVM before you run the actual (timed) test. You should run thousands of iterations to give the JIT a chance to identify and optimise hot spots. There is a lot more detail about this (and other things you need to consider) in the answer to How do I write a correct micro-benchmark in Java?
You can run it for about 15 minutes, creating new thread tasks. After that, clear the selectHistogram and re-run with the required number of threads to print the metrics.
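Putting both suggestions together, a rough sketch of main() might look like this (the separate warm-up pool and the clear() call are my additions, not code from the question):
public static void main(String[] args) throws InterruptedException {
    final int no_of_threads = 10;
    // Warm-up phase: run the same tasks but throw the numbers away,
    // giving the JIT time to compile and optimise the hot paths.
    ExecutorService warmup = Executors.newFixedThreadPool(no_of_threads);
    for (int i = 0; i < 3 * no_of_threads; i++) {
        warmup.submit(new ThreadTask());
    }
    warmup.shutdown();
    warmup.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
    // Discard everything recorded during warm-up.
    ThreadTask.selectHistogram.clear();
    // Measured phase: identical work, but this time the histogram is kept.
    ExecutorService service = Executors.newFixedThreadPool(no_of_threads);
    for (int i = 0; i < 3 * no_of_threads; i++) {
        service.submit(new ThreadTask());
    }
    service.shutdown();
    service.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
    System.out.println(ThreadTask.selectHistogram);
}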
Related
Good day,
I am writing a program where a method is called for each line read from a text file. As each call of this method is independent of any other line read, I can call them in parallel. To maximize CPU usage I use an ExecutorService and submit each run() call. As the text file has 15 million lines, I need to stagger the submissions so as not to create too many jobs at once (OutOfMemoryError). I also want to keep track of how long each submitted task has been running, as I have seen that some never finish. The problem is that when I try to use the Future.get method with a timeout, the timeout refers to the time since the task got into the ExecutorService queue, not since it started running, if it ever started. I would like to get the time since it started running, not since it was queued.
The code looks like this:
ExecutorService executorService= Executors.newFixedThreadPool(ncpu);
line = reader.readLine();
long start = System.currentTimeMillis();
HashMap<MyFut,String> runs = new HashMap<MyFut, String>();
HashMap<Future, MyFut> tasks = new HashMap<Future, MyFut>();
while ( (line = reader.readLine()) != null ) {
String s = line.split("\t")[1];
final String m = line.split("\t")[0];
MyFut f = new MyFut(s, m);
tasks.put(executorService.submit(f), f);
runs.put(f, line);
while (tasks.size()>ncpu*100){
try {
Thread.sleep(100);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Iterator<Future> i = tasks.keySet().iterator();
while(i.hasNext()){
Future task = i.next();
if (task.isDone()){
i.remove();
} else {
MyFut fut = tasks.get(task);
if (fut.elapsed()>10000){
System.out.println(line);
task.cancel(true);
i.remove();
}
}
}
}
}
private static class MyFut implements Runnable{
private long start;
String copy;
String id2;
public MyFut(String m, String id){
super();
copy=m;
id2 = id;
}
public long elapsed(){
return System.currentTimeMillis()-start;
}
@Override
public void run() {
start = System.currentTimeMillis();
// do something...
}
}
As you can see, I try to keep track of how many jobs I have sent and, if a threshold is passed, I wait a bit until some have finished. I also try to check whether any of the jobs is taking too long so I can cancel it, remember which one failed, and continue execution. This is not working as I hoped: 10 seconds of execution for one task is much more than needed (I get 1000 lines done in 70 to 130 s, depending on the machine and the number of CPUs).
What am I doing wrong? Shouldn't the run method in my Runnable class be called only when some thread in the ExecutorService is free and starts working on it? I get a lot of results that take more than 10 seconds. Is there a better way to achieve what I am trying to do?
Thanks.
If you are using Future, I would recommend changing Runnable to Callable and returning the total execution time of the task as the result. Below is sample code:
import java.util.concurrent.Callable;
public class MyFut implements Callable<Long> {
String copy;
String id2;
public MyFut(String m, String id) {
super();
copy = m;
id2 = id;
}
@Override
public Long call() throws Exception {
long start = System.currentTimeMillis();
//do something...
long end = System.currentTimeMillis();
return (end - start);
}
}
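For completeness, a rough sketch of how the caller could read the duration back from the Future (this wiring is my assumption, not part of the original answer):
ExecutorService executorService = Executors.newFixedThreadPool(ncpu);
Future<Long> future = executorService.submit(new MyFut(s, m));
// ... later, once you are ready to collect results ...
Long elapsedMillis = future.get(); // blocks until this task has finished
System.out.println("Task ran for " + elapsedMillis + " ms");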
You are making your work harder than it needs to be. Java's framework provides everything you want; you only have to use it.
Limiting the number of pending work items works by using a bounded queue, but the ExecutorService returned by Executors.newFixedThreadPool() uses an unbounded queue. The policy of waiting once the bounded queue is full can be implemented via a RejectedExecutionHandler. The entire thing looks like this:
static class WaitingRejectionHandler implements RejectedExecutionHandler {
public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
try {
executor.getQueue().put(r);// block until capacity available
} catch(InterruptedException ex) {
throw new RejectedExecutionException(ex);
}
}
}
public static void main(String[] args)
{
final int nCPU=Runtime.getRuntime().availableProcessors();
final int maxPendingJobs=100;
ExecutorService executorService=new ThreadPoolExecutor(nCPU, nCPU, 1, TimeUnit.MINUTES,
new ArrayBlockingQueue<Runnable>(maxPendingJobs), new WaitingRejectionHandler());
// start flooding the `executorService` with jobs here
}
That’s all.
Measuring the elapsed time within a job is quite easy as it has nothing to do with multi-threading:
long startTime=System.nanoTime();
// do your work here
long elapsedTimeSoFar = System.nanoTime()-startTime;
But maybe you don’t need it anymore once you are using the bounded queue.
By the way, the Future.get method with a timeout does not refer to the time since the task got into the ExecutorService queue; it refers to the time of invoking the get method itself. In other words, it tells how long the get method is allowed to wait, nothing more.
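As a small illustration (the future and someRunnable names are placeholders), the timeout below only bounds how long this particular get call is allowed to block; it says nothing about when the task started or how long it ran:
Future<?> future = executorService.submit(someRunnable);
try {
    future.get(10, TimeUnit.SECONDS); // waits at most 10 seconds from this call
} catch (TimeoutException e) {
    // the task may still be queued or still running; get() simply gave up waiting
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
} catch (ExecutionException e) {
    e.printStackTrace();
}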
I am working on a project in which I will be spawning multiple threads from multithreaded code.
Suppose I am spawning 10 threads; then I need to make sure each thread runs for a particular duration of time.
For example, if I want each thread to run for 30 minutes, then in the config.properties file I will have TOTAL_RUNNING_TIME=30
So I came up with the design below to make sure each thread runs for 30 minutes.
private static long durationOfRun;
private static long sleepTime;
public static void main(String[] args) {
// create thread pool with given size
ExecutorService service = Executors.newFixedThreadPool(threads);
try {
readPropertyFile();
for (int i = 0; i < threads; i++) {
service.submit(new ReadTask(durationOfRun, sleepTime));
}
} catch (IOException e) {
    e.printStackTrace();
}
}
private static void readPropertyFile() throws IOException {
prop.load(Read.class.getClassLoader().getResourceAsStream("config.properties"));
threads = Integer.parseInt(prop.getProperty("NUMBER_OF_THREADS"));
durationOfRun = Long.parseLong(prop.getProperty("TOTAL_RUNNING_TIME"));
sleepTime = Long.parseLong(prop.getProperty("SLEEP_TIME"));
}
Below is my ReadTask class.
class ReadTask implements Runnable {
private long durationOfRun;
private long sleepTime;
public ReadTask(long durationOfRun, long sleepTime) {
this.durationOfRun = durationOfRun;
this.sleepTime = sleepTime;
}
@Override
public void run() {
long startTime = System.currentTimeMillis();
long endTime = startTime + (durationOfRun*60*1000);
//Each thread is running less than endTime
while(System.currentTimeMillis() <= endTime) {
//Do whatever you want to do
}
try {
    Thread.sleep(sleepTime);
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}
}
}
If you take a look at my run method, I have a while loop which checks the time. Is this approach of making each thread run for a particular duration of time correct, or is there a better way? Please excuse my ignorance if there is a better approach; will this also serve the purpose?
Also, let me know if there are any thread-safety issues here.
What I am looking for is that each thread should run for 30 minutes, and once its time has finished, it should complete the task it is currently running and not take anything else after that, just like shutdown works for ExecutorService. If there is a better approach or better design than this, please provide me with an example so that I can learn that stuff, purely from a knowledge point of view. Thanks for the help.
UPDATE:-
If you take a look at the while loop in my run method, inside that loop I will be making a SELECT call to the database. So what I am looking for is something like this: as soon as the time for a thread is finished, it should not make any other SELECT calls to the database and should finish whatever it was already doing, just like shutdown works for ExecutorService.
And I don't want the scenario where, as soon as the time for a thread is finished, the thread is timed out, because it might be in the middle of a SELECT to the database at that moment.
One of the concerns I have with this design is: what if the amount of work being done within the while loop takes longer than the timeout?
For example
while(System.currentTimeMillis() <= endTime) {
calaculateTheMeaningOfLife();
}
What happens now? What's stopping the thread, or encouraging it to check the timeout? The same thing will occur if you have a blocking operation, such as a file or socket read/write.
You could try to interrupt the thread, but there is no guarantee that this will help.
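A minimal sketch of a run method that checks both the deadline and the interrupt flag between units of work (doSelect() is a placeholder for one complete unit of work, e.g. one SELECT call):
@Override
public void run() {
    long endTime = System.currentTimeMillis() + (durationOfRun * 60 * 1000);
    while (System.currentTimeMillis() <= endTime
            && !Thread.currentThread().isInterrupted()) {
        doSelect(); // placeholder: one complete SELECT, never cut off mid-call
    }
    // the loop only exits between units of work, so whatever was in
    // progress is allowed to finish before the thread stops
}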
You can set a time limit on the executor. You need to cast the executor returned by Executors and set the keep-alive time:
ThreadPoolExecutor service = (ThreadPoolExecutor) Executors.newFixedThreadPool(4);
service.setKeepAliveTime(1800, TimeUnit.SECONDS);
Taken from "Java Concurrency in Practice" by Brian Goetz.
I am in the process of measuring the performance of our service. I have a URL that makes the call to our service, so before making the call I note the time, and after the response comes back I measure the response time. I wrote a program that makes the call to my service and measures the performance by putting the numbers in a HashMap:
while (runs > 0) {
long start_time = System.currentTimeMillis();
result = restTemplate.getForObject("Some URL", String.class);
long difference = (System.currentTimeMillis() - start_time);
Long count = histogram.get(difference);
if (count != null) {
count++;
histogram.put(Long.valueOf(difference), count);
} else {
histogram.put(Long.valueOf(difference), Long.valueOf(1L));
}
runs--;
}
So the output I get from the histogram map is of the form: X calls came back in Y ms.
Now what I was thinking is: instead of making a single call at a time, why shouldn't I parallelize the calls to our service? In my previous program I am hitting the service one by one. So I wrote the multithreaded program below, which makes calls to my service simultaneously. Will the program below be able to measure the time differences accurately?
Like, one thread is taking this much time, a second thread is taking this much time, a third thread is taking this much time, and so on? Is it possible to do this?
If yes, can anyone tell me how to do it, in case my program below doesn't work very well?
public static void main(String[] args) {
ExecutorService service = Executors.newFixedThreadPool(10);
for (int i = 0; i < 1 * 2; i++) {
service.submit(new ThreadTask(i));
}
service.shutdown();
try {
service.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
} catch (InterruptedException e) {
}
}
class ThreadTask implements Runnable {
private int id;
private RestTemplate restTemplate = new RestTemplate();
private String result;
private HashMap<Long, Long> histogram = new HashMap<Long, Long>();
public ThreadTask(int id) {
this.id = id;
}
@Override
public void run() {
long start_time = System.currentTimeMillis();
result = restTemplate.getForObject("Some URL",String.class);
long difference = (System.currentTimeMillis() - start_time);
Long count = histogram.get(difference);
if (count != null) {
count++;
histogram.put(Long.valueOf(difference), count);
} else {
histogram.put(Long.valueOf(difference), Long.valueOf(1L));
}
System.out.println(histogram);
}
}
Because whenever I run the program, the numbers I am getting from this multithreaded program look very weird.
Output I got from the non-multithreaded program:
168=1
41=1
43=3
1 call came back in 168 ms and so on...
And the output I got from the multithreaded program:
{119=1}
{179=1}
{150=1}
{202=1}
{149=1}
1 call came back in 119 ms and so on...
So in the multithreaded program it is taking a lot more time, I guess?
I do not understand what you mean by getting weird numbers. My wild guess is that it is because output from different threads is getting interspersed.
One way to solve it is to not print the histogram from the run method at all. It is already an instance variable (though it currently does not need to be) so you can:
Instead of submitting unnamed instances of ThreadTask, store them in a list/array.
Create a method ThreadTask.report that prints the histogram.
After all the threads have completed, call ThreadTask.report on each in sequence, as in the sketch below.
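A rough sketch of that idea (the report() method and the bookkeeping list are my assumptions about how you might wire it up):
List<ThreadTask> tasks = new ArrayList<ThreadTask>();
for (int i = 0; i < 2; i++) {
    ThreadTask task = new ThreadTask(i);
    tasks.add(task);
    service.submit(task);
}
service.shutdown();
service.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
// only now, from the main thread, print each task's histogram
for (ThreadTask task : tasks) {
    task.report(); // report() would just do System.out.println(histogram);
}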
I think you are accounting for the same time multiple times. If your threads do not execute this part of the code within a synchronized block:
long start_time = System.currentTimeMillis();
result = restTemplate.getForObject("Some URL",String.class);
long difference = (System.currentTimeMillis() - start_time);
then it is possible for this to happen:
Thread1: long start_time ...
Thread1: result = ...
Thread2: long start_time ...
Thread2: result = ...
Thread2: long difference ...
Thread1: long difference ...
So your accounting will not work properly. You could use synchronized blocks, or look into Java's java.lang.management package (e.g., ThreadMXBean and ThreadInfo) for timing functionality in threaded environments.
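For example, a minimal sketch with ThreadMXBean (note that this measures CPU time consumed by the current thread, which is not the same thing as the wall-clock latency of an HTTP call, so treat it as a complementary number rather than a replacement):
ThreadMXBean threadBean = ManagementFactory.getThreadMXBean(); // from java.lang.management
long cpuStart = threadBean.getCurrentThreadCpuTime(); // CPU time in nanoseconds
result = restTemplate.getForObject("Some URL", String.class);
long cpuUsed = threadBean.getCurrentThreadCpuTime() - cpuStart;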
Update:
Also see the answer to this related SO question for more details on the problem and how to work around it.
I am trying to execute a simple calculation (it calls Math.random() 10,000,000 times). Surprisingly, running it in a simple method performs much faster than using an ExecutorService.
I have read another thread, ExecutorService's surprising performance break-even point --- rules of thumb?, and tried to follow the answer by executing the Callable in batches, but the performance is still bad.
How do I improve the performance based on my current code?
import java.util.*;
import java.util.concurrent.*;
public class MainTest {
public static void main(String[]args) throws Exception {
new MainTest().start();
}
final List<Worker> workermulti = new ArrayList<Worker>();
final List<Worker> workersingle = new ArrayList<Worker>();
final int count=10000000;
public void start() throws Exception {
int n=2;
workersingle.add(new Worker(1));
for (int i=0;i<n;i++) {
// worker will only do count/n job
workermulti.add(new Worker(n));
}
ExecutorService serviceSingle = Executors.newSingleThreadExecutor();
ExecutorService serviceMulti = Executors.newFixedThreadPool(n);
long s,e;
int tests=10;
List<Long> simple = new ArrayList<Long>();
List<Long> single = new ArrayList<Long>();
List<Long> multi = new ArrayList<Long>();
for (int i=0;i<tests;i++) {
// simple
s = System.currentTimeMillis();
simple();
e = System.currentTimeMillis();
simple.add(e-s);
// single thread
s = System.currentTimeMillis();
serviceSingle.invokeAll(workersingle); // single thread
e = System.currentTimeMillis();
single.add(e-s);
// multi thread
s = System.currentTimeMillis();
serviceMulti.invokeAll(workermulti);
e = System.currentTimeMillis();
multi.add(e-s);
}
long avgSimple=sum(simple)/tests;
long avgSingle=sum(single)/tests;
long avgMulti=sum(multi)/tests;
System.out.println("Average simple: "+avgSimple+" ms");
System.out.println("Average single thread: "+avgSingle+" ms");
System.out.println("Average multi thread: "+avgMulti+" ms");
serviceSingle.shutdown();
serviceMulti.shutdown();
}
long sum(List<Long> list) {
long sum=0;
for (long l : list) {
sum+=l;
}
return sum;
}
private void simple() {
for (int i=0;i<count;i++){
Math.random();
}
}
class Worker implements Callable<Void> {
int n;
public Worker(int n) {
this.n=n;
}
@Override
public Void call() throws Exception {
// divide count with n to perform batch execution
for (int i=0;i<(count/n);i++) {
Math.random();
}
return null;
}
}
}
The output for this code
Average simple: 920 ms
Average single thread: 1034 ms
Average multi thread: 1393 ms
EDIT: performance suffers due to Math.random() being a synchronized method. After changing Math.random() to a new Random object for each thread, the performance improved.
The output for the new code (after replacing Math.random() with a Random for each thread):
Average simple: 928 ms
Average single thread: 1046 ms
Average multi thread: 642 ms
Math.random() is synchronized. Kind of the whole point of synchronized is to slow things down so they don't collide. Use something that isn't synchronized and/or give each thread its own object to work with, like a new Random.
You'd do well to read the contents of the other thread. There's plenty of good tips in there.
Perhaps the most significant issue with your benchmark is that according to the Math.random() contract, "This method is properly synchronized to allow correct use by more than one thread. However, if many threads need to generate pseudorandom numbers at a great rate, it may reduce contention for each thread to have its own pseudorandom-number generator"
Read this as: the method is synchronized, so only one thread is likely to be able to usefully use it at a time. So you do a bunch of overhead to distribute the tasks, only to force them to run serially again.
When you use multiple threads, you need to be aware of the overhead of using additional threads. You also need to determine whether your algorithm has work which can be performed in parallel or not. So you need work which can be run concurrently and which is large enough to exceed the overhead of using multiple threads.
In this case, the simplest workaround is to use a separate Random in each thread. The problem you have is that, as a micro-benchmark, your loop doesn't actually do anything, and the JIT is very good at discarding code which doesn't do anything. A workaround for this is to sum the random results and return the sum from call(), as this is usually enough to prevent the JIT from discarding the code.
Lastly, if you want to sum lots of numbers, you don't need to save them and sum them later. You can sum them as you go.
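A sketch of the Worker rewritten along those lines (its own Random instance and a returned sum so the JIT cannot discard the loop; count and n are the fields from the question, and switching the type parameter from Void to Double is part of the change):
class Worker implements Callable<Double> {
    private final int n;
    private final Random random = new Random(); // one generator per worker, no shared lock

    public Worker(int n) {
        this.n = n;
    }

    @Override
    public Double call() throws Exception {
        double sum = 0;
        // each worker performs its share of the total count
        for (int i = 0; i < (count / n); i++) {
            sum += random.nextDouble();
        }
        return sum; // returning the sum keeps the JIT from eliminating the loop
    }
}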
First, and once more, thanks to all who have already answered my question. I am not a very experienced programmer and this is my first experience with multithreading.
I put together an example that behaves quite like my problem. I hope it makes our case here easier.
public class ThreadMeasuring {
private static final int TASK_TIME = 1; // milliseconds
private static class Batch implements Runnable {
CountDownLatch countDown;
public Batch(CountDownLatch countDown) {
this.countDown = countDown;
}
@Override
public void run() {
long t0 =System.nanoTime();
long t = 0;
while(t<TASK_TIME*1e6){ t = System.nanoTime() - t0; }
if(countDown!=null) countDown.countDown();
}
}
public static void main(String[] args) {
ThreadFactory threadFactory = new ThreadFactory() {
int counter = 1;
@Override
public Thread newThread(Runnable r) {
Thread t = new Thread(r, "Executor thread " + (counter++));
return t;
}
};
// the total duty to be divided in tasks is fixed (problem dependent).
// Increase ntasks will mean decrease the task time proportionally.
// 4 Is an arbitrary example.
// This tasks will be executed thousands of times, inside a loop alternating
// with serial processing that needs their result and prepare the next ones.
int ntasks = 4;
int nthreads = 2;
int ncores = Runtime.getRuntime().availableProcessors();
if (nthreads<ncores) ncores = nthreads;
Batch serial = new Batch(null);
long serialTime = System.nanoTime();
serial.run();
serialTime = System.nanoTime() - serialTime;
ExecutorService executor = Executors.newFixedThreadPool( nthreads, threadFactory );
CountDownLatch countDown = new CountDownLatch(ntasks);
ArrayList<Batch> batches = new ArrayList<Batch>();
for (int i = 0; i < ntasks; i++) {
batches.add(new Batch(countDown));
}
long start = System.nanoTime();
for (Batch r : batches){
executor.execute(r);
}
// wait for all threads to finish their task
try {
countDown.await();
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
long tmeasured = (System.nanoTime() - start);
System.out.println("Task time= " + TASK_TIME + " ms");
System.out.println("Number of tasks= " + ntasks);
System.out.println("Number of threads= " + nthreads);
System.out.println("Number of cores= " + ncores);
System.out.println("Measured time= " + tmeasured);
System.out.println("Theoretical serial time= " + TASK_TIME*1000000*ntasks);
System.out.println("Theoretical parallel time= " + (TASK_TIME*1000000*ntasks)/ncores);
System.out.println("Speedup= " + (serialTime*ntasks)/(double)tmeasured);
executor.shutdown();
}
}
Instead of doing the calculations, each batch just waits for a given time. The program calculates the speedup, which in theory should always be 2, but it can get less than 1 (actually a slowdown) if TASK_TIME is small.
My calculations take at most 1 ms and are commonly faster. For 1 ms I find a small speedup of around 30%, but in practice, with my program, I notice a slowdown.
The structure of this code is very similar to my program, so if you could help me to optimise the thread handling I would be very grateful.
Kind regards.
Below, the original question:
Hi.
I would like to use multithreading in my program, since I believe it could increase its efficiency considerably. Most of its running time is due to independent calculations.
My program has thousands of independent calculations (several linear systems to solve), but they only happen at the same time in small groups of a dozen or so. Each of these groups takes a few milliseconds to run. After one of these groups of calculations, the program has to run sequentially for a little while, and then I have to solve the linear systems again.
Actually, it can be seen like this: the independent linear systems to solve are inside a loop that iterates thousands of times, alternating with sequential calculations that depend on the previous results. My idea to speed up the program is to compute these independent calculations in parallel threads, by dividing each group into (the number of processors I have available) batches of independent calculations. So, in principle, there is no queuing at all.
I tried using the FixedThreadPool and CachedThreadPool and it got even slower than serial processing. It seems to take too much time creating new Threads each time I need to solve the batches.
Is there a better way to handle this problem? The pools I've used seem to be designed for cases where each thread does a larger chunk of work, rather than thousands of small tasks...
Thanks!
Best Regards!
Thread pools don't create new threads over and over. That's why they're pools.
How many threads were you using and how many CPUs/cores do you have? What is the system load like (normally, when you execute them serially, and when you execute with the pool)? Is synchronization or any kind of locking involved?
Is the algorithm for parallel execution exactly the same as the serial one? (Your description seems to suggest that the serial version was reusing some results from the previous iteration.)
From what I've read, "thousands of independent calculations... happen at the same time... would take some milliseconds to run", it seems to me that your problem is perfect for GPU programming.
And I think that answers your question. GPU programming is becoming more and more popular. There are Java bindings for CUDA and OpenCL. If it is possible for you to use it, I say go for it.
I'm not sure how you perform the calculations, but if you're breaking them up into small groups, then your application might be ripe for the Producer/Consumer pattern.
Additionally, you might be interested in using a BlockingQueue. The calculation consumers will block until there is something in the queue and the block occurs on the take() call.
private static class Batch implements Runnable {
CountDownLatch countDown;
public Batch(CountDownLatch countDown) {
this.countDown = countDown;
}
CountDownLatch getLatch(){
return countDown;
}
@Override
public void run() {
long t0 =System.nanoTime();
long t = 0;
while(t<TASK_TIME*1e6){ t = System.nanoTime() - t0; }
if(countDown!=null) countDown.countDown();
}
}
class CalcProducer implements Runnable {
private final BlockingQueue<Batch> queue;
CalcProducer(BlockingQueue<Batch> q) { queue = q; }
public void run() {
try {
while(true) {
CountDownLatch latch = new CountDownLatch(ntasks);
for(int i = 0; i < ntasks; i++) {
queue.put(produce(latch));
}
// don't need to wait for the latch, only consumers wait
}
} catch (InterruptedException ex) { ... handle ...}
}
Batch produce(CountDownLatch latch) {
return new Batch(latch);
}
}
class CalcConsumer implements Runnable {
private final BlockingQueue<Batch> queue;
CalcConsumer(BlockingQueue<Batch> q) { queue = q; }
public void run() {
try {
while(true) { consume(queue.take()); }
} catch (InterruptedException ex) { ... handle ...}
}
void consume(Batch batch) throws InterruptedException {
batch.run();
batch.getLatch().await();
}
}
class Setup {
public static void main(String[] args) {
BlockingQueue<Batch> q = new LinkedBlockingQueue<Batch>();
int numConsumers = 4;
CalcProducer p = new CalcProducer(q);
Thread producerThread = new Thread(p);
producerThread.start();
Thread[] consumerThreads = new Thread[numConsumers];
for(int i = 0; i < numConsumers; i++)
{
consumerThreads[i] = new Thread(new CalcConsumer(q));
consumerThreads[i].start();
}
}
}
Sorry if there are any syntax errors, I've been chomping away at C# code and sometimes I forget the proper Java syntax, but the general idea is there.
If you have a problem which does not scale to multiple cores, you need to change your program or you have a problem which is not as parallel as you think. I suspect you have some other type of bug, but cannot say based on the information given.
This test code might help.
Time per million tasks 765 ms
code
ExecutorService es = Executors.newFixedThreadPool(4);
Runnable task = new Runnable() {
@Override
public void run() {
// do nothing.
}
};
long start = System.nanoTime();
for(int i=0;i<1000*1000;i++) {
es.submit(task);
}
es.shutdown();
es.awaitTermination(10, TimeUnit.SECONDS);
long time = System.nanoTime() - start;
System.out.println("Time per million tasks "+time/1000/1000+" ms");
EDIT: Say you have a loop which serially does this.
for(int i=0;i<1000*1000;i++)
doWork(i);
You might assume that changing to a loop like this would be faster, but the problem is that the overhead could be greater than the gain.
for(int i=0;i<1000*1000;i++) {
final int i2 = i;
ex.execute(new Runnable() {
public void run() {
doWork(i2);
}
});
}
So you need to create batches of work (at least one per thread) so there are enough tasks to keep all the threads busy, but not so many tasks that your threads are spending time in overhead.
final int batchSize = 10*1000;
for(int i=0;i<1000*1000;i+=batchSize) {
final int i2 = i;
ex.execute(new Runnable() {
public void run() {
for(int i3=i2;i3<i2+batchSize;i3++)
doWork(i3);
}
});
}
EDIT2: Running a test which copies data between threads.
for (int i = 0; i < 20; i++) {
ExecutorService es = Executors.newFixedThreadPool(1);
final double[] d = new double[4 * 1024];
Arrays.fill(d, 1);
final double[] d2 = new double[4 * 1024];
es.submit(new Runnable() {
@Override
public void run() {
// nothing.
}
}).get();
long start = System.nanoTime();
es.submit(new Runnable() {
@Override
public void run() {
synchronized (d) {
System.arraycopy(d, 0, d2, 0, d.length);
}
}
});
es.shutdown();
es.awaitTermination(10, TimeUnit.SECONDS);
// get a the values in d2.
for (double x : d2) ;
long time = System.nanoTime() - start;
System.out.printf("Time to pass %,d doubles to another thread and back was %,d ns.%n", d.length, time);
}
It starts badly but warms up to ~50 µs.
Time to pass 4,096 doubles to another thread and back was 1,098,045 ns.
Time to pass 4,096 doubles to another thread and back was 171,949 ns.
... deleted ...
Time to pass 4,096 doubles to another thread and back was 50,566 ns.
Time to pass 4,096 doubles to another thread and back was 49,937 ns.
Hmm, CachedThreadPool seems to be created just for your case. It does not recreate threads if you reuse them soon enough, and if you wait a whole minute before you use a new thread, the overhead of thread creation is comparatively negligible.
But you can't expect parallel execution to speed up your calculations unless you can also access the data in parallel. If you employ extensive locking, many synchronized methods, etc., you'll spend more on overhead than you gain from parallel processing. Check that your data can be efficiently processed in parallel and that you don't have non-obvious synchronizations lurking in the code.
Also, CPUs process data efficiently if it fully fits into the cache. If the data set of each thread is bigger than half the cache, two threads will compete for the cache and issue many RAM reads, while a single thread, if it only uses one core, may perform better because it avoids RAM reads in the tight loop it executes. Check this, too.
Here's a pseudo outline of what I'm thinking:
class WorkerThread extends Thread {
Queue<Calculation> calcs;
MainCalculator mainCalc;
public void run() {
while(true) {
while(calcs.isEmpty()) sleep(500); // busy waiting? Context switching probably won't be so bad.
Calculation calc = calcs.pop(); // is it pop to get and remove? you'll have to look
CalculationResult result = calc.calc();
mainCalc.returnResultFor(calc,result);
}
}
}
Another option, if you're calling external programs: don't put them in a loop that runs them one at a time, or they won't run in parallel. You can put them in a loop that PROCESSES them one at a time, but not one that execs them one at a time.
Process calc1 = Runtime.getRuntime().exec("myCalc paramA1 paramA2 paramA3");
Process calc2 = Runtime.getRuntime().exec("myCalc paramB1 paramB2 paramB3");
Process calc3 = Runtime.getRuntime().exec("myCalc paramC1 paramC2 paramC3");
Process calc4 = Runtime.getRuntime().exec("myCalc paramD1 paramD2 paramD3");
calc1.waitFor();
calc2.waitFor();
calc3.waitFor();
calc4.waitFor();
InputStream is1 = calc1.getInputStream();
InputStreamReader isr1 = new InputStreamReader(is1);
BufferedReader br1 = new BufferedReader(isr1);
String resultStr1 = br1.readLine();
InputStream is2 = calc2.getInputStream();
InputStreamReader isr2 = new InputStreamReader(is2);
BufferedReader br2 = new BufferedReader(isr2);
String resultStr2 = br2.readLine();
InputStream is3 = calc3.getInputStream();
InputStreamReader isr3 = new InputStreamReader(is3);
BufferedReader br3 = new BufferedReader(isr3);
String resultStr3 = br3.readLine();
InputStream is4 = calc4.getInputStream();
InputStreamReader isr4 = new InputStreamReader(is4);
BufferedReader br4 = new BufferedReader(isr4);
String resultStr4 = br4.readLine();