Good day,
I am writing a program where a method is called for each line read from a text file. As each call of this method is independent of any other line read, I can run the calls in parallel. To maximize CPU usage I use an ExecutorService to which I submit each run() call. As the text file has 15 million lines, I need to stagger the submissions to the ExecutorService so as not to create too many jobs at once (OutOfMemory exception). I also want to keep track of how long each submitted run has been running, as I have seen that some are not finishing. The problem is that when I try to use the Future.get method with a timeout, the timeout refers to the time since the task got into the queue of the ExecutorService, not since it started running, if it even started. I would like to get the time since it started running, not since it got into the queue.
The code looks like this:
ExecutorService executorService= Executors.newFixedThreadPool(ncpu);
line = reader.readLine();
long start = System.currentTimeMillis();
HashMap<MyFut,String> runs = new HashMap<MyFut, String>();
HashMap<Future, MyFut> tasks = new HashMap<Future, MyFut>();
while ( (line = reader.readLine()) != null ) {
String s = line.split("\t")[1];
final String m = line.split("\t")[0];
MyFut f = new MyFut(s, m);
tasks.put(executorService.submit(f), f);
runs.put(f, line);
while (tasks.size()>ncpu*100){
try {
Thread.sleep(100);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Iterator<Future> i = tasks.keySet().iterator();
while(i.hasNext()){
Future task = i.next();
if (task.isDone()){
i.remove();
} else {
MyFut fut = tasks.get(task);
if (fut.elapsed()>10000){
System.out.println(line);
task.cancel(true);
i.remove();
}
}
}
}
}
private static class MyFut implements Runnable{
private long start;
String copy;
String id2;
public MyFut(String m, String id){
super();
copy=m;
id2 = id;
}
public long elapsed(){
return System.currentTimeMillis()-start;
}
@Override
public void run() {
start = System.currentTimeMillis();
do something...
}
}
As you can see, I try to keep track of how many jobs I have sent, and if a threshold is passed I wait a bit until some have finished. I also try to check whether any of the jobs is taking too long so that I can cancel it, keep track of which ones failed, and continue execution. This is not working as I hoped. 10 seconds of execution for one task is much more than needed (I get 1000 lines done in 70 to 130s depending on machine and number of CPUs).
What am I doing wrong? Shouldn't the run method in my Runnable class be called only when some Thread in the ExecutorService is free and starts working on it? I get a lot of results that take more than 10 seconds. Is there a better way to achieve what I am trying?
Thanks.
If you are using Future, I would recommend changing Runnable to Callable and returning the total execution time of the task as the result. Below is sample code:
import java.util.concurrent.Callable;
public class MyFut implements Callable<Long> {
String copy;
String id2;
public MyFut(String m, String id) {
super();
copy = m;
id2 = id;
}
@Override
public Long call() throws Exception {
long start = System.currentTimeMillis();
//do something...
long end = System.currentTimeMillis();
return (end - start);
}
}
You are making your work harder than it needs to be. Java’s framework provides everything you want; you only have to use it.
Limiting the number of pending work items works by using a bounded queue, but the ExecutorService returned by Executors.newFixedThreadPool() uses an unbounded queue. The policy to wait once the bounded queue is full can be implemented via a RejectedExecutionHandler. The entire thing looks like this:
static class WaitingRejectionHandler implements RejectedExecutionHandler {
public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
try {
executor.getQueue().put(r);// block until capacity available
} catch(InterruptedException ex) {
throw new RejectedExecutionException(ex);
}
}
}
public static void main(String[] args)
{
final int nCPU=Runtime.getRuntime().availableProcessors();
final int maxPendingJobs=100;
ExecutorService executorService=new ThreadPoolExecutor(nCPU, nCPU, 1, TimeUnit.MINUTES,
new ArrayBlockingQueue<Runnable>(maxPendingJobs), new WaitingRejectionHandler());
// start flooding the `executorService` with jobs here
That’s all.
Measuring the elapsed time within a job is quite easy as it has nothing to do with multi-threading:
long startTime=System.nanoTime();
// do your work here
long elapsedTimeSoFar = System.nanoTime()-startTime;
But maybe you don’t need it anymore once you are using the bounded queue.
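For the bookkeeping in the question, a minimal sketch of a task that only reports elapsed time once it has actually started might look like the following. In the question's MyFut, start is still 0 while the task sits in the queue, so elapsed() returns the full System.currentTimeMillis() value and every queued task looks as if it had exceeded 10 seconds. The class name TimedTask and the -1 "not started" convention are assumptions made for illustration.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper (not from the question): records when run() actually begins.
class TimedTask implements Runnable {
    private final Runnable work;
    private volatile long startNanos;
    private volatile boolean started; // written after startNanos, checked before reading it

    TimedTask(Runnable work) {
        this.work = work;
    }

    /** Milliseconds since run() began, or -1 while the task is still waiting in the queue. */
    long elapsedMillis() {
        if (!started) {
            return -1;
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }

    @Override
    public void run() {
        startNanos = System.nanoTime();
        started = true;
        work.run();
    }
}
```

The value keeps growing after the task has finished, so a monitoring loop would still check Future.isDone() first, as the loop in the question does.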
By the way the Future.get method with timeout does not refer to the time since it got into the queue of the ExecutorService, it refers to the time of invoking the get method itself. In other words, it tells how long the get method is allowed to wait, nothing more.
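A small sketch can make this visible. It assumes a task of roughly 500 ms and a caller that waits 400 ms before calling get with a 300 ms timeout; the class and method names are made up for the example. If the timeout were counted from submission, get would throw a TimeoutException; because it is counted from the call to get, it returns normally.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class GetTimeoutDemo {
    /**
     * Submits a task taking about 500 ms, waits 400 ms, then calls get with a 300 ms timeout.
     * Returns the milliseconds between submission and the successful return of get.
     */
    static long submitThenGetLate() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            long submitted = System.nanoTime();
            Future<String> future = pool.submit(() -> {
                Thread.sleep(500); // stand-in for real work
                return "done";
            });
            Thread.sleep(400);                      // the caller does something else first
            future.get(300, TimeUnit.MILLISECONDS); // the 300 ms start counting here
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - submitted);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The returned duration is longer than the 300 ms timeout, yet no TimeoutException is thrown.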
I am looking for ways to process list entries in parallel, a task that is quite long (say 24 hours: I stream data from huge DBs, and each row then takes about 1-2 seconds to process). I have an application that has 2 methods, each processing a list of data. My initial idea was to use ForkJoin, which works, but not quite. The simplified dummy code mimicking my app's behaviour is as follows:
@Service
@Slf4j
public class ListProcessing implements Runnable {
@Async
private void processingList() {
// can change to be a 100 or 1000 to speed up the processing,
// but the point is to see the behaviour after the task runs for a long time
// so just using 12.
ForkJoinPool newPool = new ForkJoinPool(12);
newPool.execute(() -> {
List<Integer> testInt = IntStream.rangeClosed(0, 50000)
.boxed().toList();
long start = System.currentTimeMillis();
Map<Integer,DummyModel> output = testInt.stream().parallel()
.map(item -> {
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
log.info("I slept at item {} for map",item);
return new DummyModel(UUID.randomUUID(), item); // a model class with 2 fields and no logic save for getters/setters
}).collect(Collectors.toConcurrentMap(DummyModel::getNum, item -> item));
long end = System.currentTimeMillis();
log.info("Processing time {}",(end-start));
log.info("Size is {}",output.size());
});
newPool.shutdown();
}
// method is identical to the one above for simplicity & demo purposes
@Async
private void processingList2() {
ForkJoinPool newPool = new ForkJoinPool(12);
newPool.execute(() -> {
List<Integer> testInt = IntStream.rangeClosed(0, 50000)
.boxed().toList();
long start = System.currentTimeMillis();
Map<Integer,DummyModel> output = testInt.stream().parallel()
.map(item -> {
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
log.info("I slept at item {} for map2",item);
return new DummyModel(UUID.randomUUID(), item);
}).collect(Collectors.toConcurrentMap(DummyModel::getNum, item -> item));
long end = System.currentTimeMillis();
log.info("Processing time {}",(end-start));
log.info("Size is {}",output.size());
});
newPool.shutdown();
}
@Override
public void run() {
processingList();
processingList2();
}
}
The class is then being called by my controller which is as follows:
@PostMapping
public void startTest() {
Thread startRun = new Thread(new ListProcessing());
startRun.start();
}
This works perfectly: both methods are executed asynchronously and I can see that they are using separate pools with 12 worker threads each. However, about an hour into running this app I can see that the number of threads used by each method starts dropping. After some research, I learnt that parallel streams might be the problem (according to this discussion).
Now, I can change my ForkJoinPools to have more worker threads (which will shorten the execution time, solving the problem, but that sounds like a temporary fix, with the problem still there if execution exceeds the 1-hour mark). So I decided to try something else, although I would really like to make ForkJoin work.
Another solution that seems to be able to do what I want is using CompletableFuture with Custom Executor as described here. So I removed Runnable & ForkJoin and implemented CompletableFuture as described in the article. The only difference being that I have a separate pool for each method and both methods are being called by controller which looks like so now:
@Autowired
private ListProcessing listProcessing;
@PostMapping
public void startTest() {
listProcessing.processingList();
listProcessing.processingList2();
}
However, the custom Executors never get used and each testInt gets executed synchronously, one by one. I tried to make it work with only 1 method but that also didn't work: the custom executor seems to just be ignored. The method looked like so:
private CompletableFuture<List<DummyModel>> processingList() {
List<Integer> testInt = IntStream.rangeClosed(0, 50000)
.boxed().toList();
long start = System.currentTimeMillis();
List<CompletableFuture<DummyModel>> myDummyies = new ArrayList<>();
testInt.forEach(item -> {
myDummyies.add(createDummy(item));
log.info("I slept at item {} for list", item);
});
// waiting for all CompletableFutures to complete and collect them into a list
CompletableFuture<List<DummyModel>> output = CompletableFuture.allOf(myDummyies.toArray(new CompletableFuture[0]))
.thenApply(item -> myDummyies.stream()
.map(CompletableFuture::join)
.collect(Collectors.toList()));
long end = System.currentTimeMillis();
log.info("Processing time {} \n", (end - start));
return output;
}
@Async("myPool")
private CompletableFuture<DummyModel> createDummy(Integer item) {
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
return CompletableFuture.completedFuture(new DummyModel(UUID.randomUUID(), item));
}
So my questions are as follows:
Can I somehow set up ForkJoin to replace blocked worker threads with fresh ones, so that the number of worker threads remains the same all the time? Or maybe, after some time, ask for the pool to be replaced by a newly created one and continue the work? Or is it all just a limitation of the ForkJoin framework and I should look elsewhere?
If ForkJoin cannot be made to work, how can I make CompletableFuture work? Where did I go wrong with what I have implemented?
Is there any other way to process a long-running task with a custom number of worker threads which run in parallel? What would be the best way to process a lot of data for a prolonged period of time in parallel?
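One possible direction for the second question, sketched without Spring: pass the executor to CompletableFuture.supplyAsync explicitly instead of relying on @Async. By default Spring applies @Async through a proxy, so a private method, or a method called from within the same class as createDummy is here, is not intercepted and runs on the calling thread. The names AsyncListDemo, processAll and slowStep are made up for this sketch, and the 50 ms sleep stands in for the real per-row work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncListDemo {
    /** Processes every item on a dedicated pool and returns the results in input order. */
    static List<String> processAll(List<Integer> items, int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (Integer item : items) {
                // the second argument makes the task run on our pool, not the common pool
                futures.add(CompletableFuture.supplyAsync(() -> slowStep(item), pool));
            }
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> future : futures) {
                results.add(future.join()); // wait for each result
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical stand-in for the per-row work.
    static String slowStep(int item) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "item-" + item;
    }
}
```

With 50001 items, one future per item is a lot of objects, so in a real run it may be worth submitting in chunks; that is left out here to keep the sketch short.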
I have a problem in Java where I want to spawn multiple concurrent threads simultaneously. I want to use the result of whichever thread/task finishes first, and abandon/ignore the results of the other threads/tasks. I found a similar question for just cancelling slower threads but thought that this new question was different enough to warrant an entirely new question.
Note that I have included an answer below based on what I considered to be the best answer from this similar question, but changed to best fit this new (albeit similar) problem. I wanted to share the knowledge and see if there is a better way of solving this problem, hence the question and self-answer below.
You can use ExecutorService.invokeAny. From its documentation:
Executes the given tasks, returning the result of one that has completed successfully …. Upon normal or exceptional return, tasks that have not completed are cancelled.
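A minimal sketch of how invokeAny could be used for this, assuming each task simply sleeps for a given delay and returns its index (the class name, method name and delays are illustrative only):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class FirstResultDemo {
    /** Starts one task per delay and returns the index of a task that completed successfully. */
    static int fastest(List<Long> delaysMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(delaysMillis.size());
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 0; i < delaysMillis.size(); i++) {
                final int id = i;
                final long delay = delaysMillis.get(i);
                tasks.add(() -> {
                    Thread.sleep(delay); // stand-in for real work
                    return id;
                });
            }
            // blocks until one task succeeds; the remaining tasks are cancelled
            return pool.invokeAny(tasks);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Cancellation is done by interrupting the remaining tasks, so it only takes effect promptly if the tasks respond to interruption, as Thread.sleep does here.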
This answer is based off @lreeder's answer to the question "Java threads - close other threads when first thread completes".
Basically, the difference between my answer and his answer is that he closes the threads via a Semaphore and I just record the result of the fastest thread via an AtomicReference. Note that in my code, I do something a little weird. Namely, I use an instance of AtomicReference<Integer> instead of the simpler AtomicInteger. I do this so that I can compare and set the value to a null integer; I can't use null integers with AtomicInteger. This allows me to set any integer, not just a set of integers, excluding some sentinel value. Also, there are a few less important details like the use of an ExecutorService instead of explicit threads, and the changing of how Worker.completed is set, because previously it was possible that more than one thread could finish first.
public class ThreadController {
public static void main(String[] args) throws Exception {
new ThreadController().threadController();
}
public void threadController() throws Exception {
int numWorkers = 100;
List<Worker> workerList = new ArrayList<>(numWorkers);
CountDownLatch startSignal = new CountDownLatch(1);
CountDownLatch doneSignal = new CountDownLatch(1);
//AtomicReference records the id of the first worker to finish;
//it stays null until a worker sets it
AtomicReference<Integer> firstInt = new AtomicReference<Integer>();
ExecutorService execSvc = Executors.newFixedThreadPool(numWorkers);
for (int i = 0; i < numWorkers; i++) {
Worker worker = new Worker(i, startSignal, doneSignal, firstInt);
execSvc.submit(worker);
workerList.add(worker);
}
//tell workers they can start
startSignal.countDown();
//wait for one thread to complete.
doneSignal.await();
//Look at all workers and find which one is done
for (int i = 0; i < numWorkers; i++) {
if (workerList.get(i).isCompleted()) {
System.out.printf("Thread %d finished first, firstInt=%d\n", i, firstInt.get());
}
}
}
}
class Worker implements Runnable {
private final CountDownLatch startSignal;
private final CountDownLatch doneSignal;
// null when not yet set, not so for AtomicInteger
private final AtomicReference<Integer> singleResult;
private final int id;
private boolean completed = false;
public Worker(int id, CountDownLatch startSignal, CountDownLatch doneSignal, AtomicReference<Integer> singleResult) {
this.id = id;
this.startSignal = startSignal;
this.doneSignal = doneSignal;
this.singleResult = singleResult;
}
public boolean isCompleted() {
return completed;
}
@Override
public void run() {
try {
//block until controller counts down the latch
startSignal.await();
//simulate real work
Thread.sleep((long) (Math.random() * 1000));
//try to record this worker's id as the single result.
//compareAndSet succeeds only while the reference is still null,
//so only the first worker to finish sets it.
boolean finishedFirst = singleResult.compareAndSet(null, id);
// only set this if the result was successfully set
if (finishedFirst) {
//Use a completed flag instead of Thread.isAlive because
//even though countDown is the last thing in the run method,
//the run method may not have returned by the time the
//controlling thread can check isAlive status
completed = true;
}
}
catch (InterruptedException e) {
//don't care about this
}
//tell controller we are finished, if already there, do nothing
doneSignal.countDown();
}
}
I am trying to test the performance (in terms of execution time) for my webcrawler but I am having trouble timing it due to multi-threading taking place.
My main class:
class WebCrawlerTest {
//methods and variables etc
WebCrawlerTest(List<String> websites){
//
}
if(!started){
startTime = System.currentTimeMillis();
executor = Executors.newFixedThreadPool(32); //this is the value I'm tweaking
started=true;
}
for(String site : websites){
executor.submit(webProcessor = new AllWebsiteProcessorTest(site, deepSearch));
}
executor.shutdown();
//tried grabbing end time here with no luck
AllWebsiteProcessorTest class:
class AllWebsiteProcessorTest implements Runnable{
//methods and var etc
AllWebsiteProcessorTest(String site, boolean deepSearch) {
}
public void run() {
scanSingleWebsite(websites);
for(String email:emails){
System.out.print(email + ", ");
}
private void scanSingleWebsite(String website){
try {
String url = website;
Document document = Jsoup.connect(url).get();
grabEmails(document.toString());
}catch (Exception e) {}
With another class (with a main method), I create an instance of WebCrawlerTest and then pass in an array of websites. The crawler works fine but I can't seem to figure out how to time it.
I can get the start time (System.getCurrentTime...();), but the problem is the end time. I've tried adding the end time like this:
//another class
public static void main(.....){
long start = getCurrent....();
WebCrawlerTest w = new WebCrawlerTest(listOfSites, true);
long end = getCurrent....();
}
Which doesn't work. I also tried adding the end after executor.shutdown(), which again doesn't work (instantly triggered). How do I grab the time for the final completed thread?
After shutting down your executors pool
executor.shutdown();
//tried grabbing end time here with no luck
You can simply
executor.awaitTermination(timeout, timeUnit)
This call will block until all tasks have completed (or until the timeout elapses or the waiting thread is interrupted). Take the time, subtract T0 from it and voila, we have the execution time.
The shutdown() method just ensures that no new tasks will be accepted into the execution queue. Tasks already in the queue will still be performed (shutdownNow() drops pending tasks). To wait for all submitted tasks to complete, you have to call awaitTermination().
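Put together, the timing could be sketched as follows. The tasks here only sleep as a stand-in for crawling a site, and the names PoolTimer and timeAllMillis as well as the one-minute limit are assumptions for the example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PoolTimer {
    /** Runs nTasks sleeping tasks on nThreads threads and returns the total wall-clock time in ms. */
    static long timeAllMillis(int nThreads, int nTasks, long taskMillis) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(nThreads);
        long start = System.nanoTime();
        for (int i = 0; i < nTasks; i++) {
            executor.submit(() -> {
                try {
                    Thread.sleep(taskMillis); // stand-in for real work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        executor.shutdown(); // no new tasks; already submitted ones still run
        if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
            executor.shutdownNow();
            throw new IllegalStateException("tasks did not finish within the timeout");
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

awaitTermination returns false when the timeout expires first, so the return value is worth checking before trusting the measured time.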
I am in the process of measuring the performance of our service. I have a URL that makes the call to our service. Before making the call I note the time, and after the response comes back I measure the response time. I wrote a program that calls my service and records the measurements in a HashMap:
while (runs > 0) {
long start_time = System.currentTimeMillis();
result = restTemplate.getForObject("Some URL", String.class);
long difference = (System.currentTimeMillis() - start_time);
Long count = histogram.get(difference);
if (count != null) {
count++;
histogram.put(Long.valueOf(difference), count);
} else {
histogram.put(Long.valueOf(difference), Long.valueOf(1L));
}
runs--;
}
So the output I get from the histogram map is: X calls came back in Y ms.
Now I was thinking: instead of making a single call at a time, why not parallelize the calls to our service? My previous program hits the service one call at a time, so I wrote the multithreaded program below, which makes calls to my service simultaneously. Will the program below measure the time differences accurately, i.e. one thread takes this much time, a second thread takes this much time, a third thread takes this much time, and so on? Is it possible to do this?
If yes, can anyone tell me how to do it if my program below doesn't work very well?
public static void main(String[] args) {
ExecutorService service = Executors.newFixedThreadPool(10);
for (int i = 0; i < 1 * 2; i++) {
service.submit(new ThreadTask(i));
}
service.shutdown();
try {
service.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
} catch (InterruptedException e) {
}
}
class ThreadTask implements Runnable {
private int id;
private RestTemplate restTemplate = new RestTemplate();
private String result;
private HashMap<Long, Long> histogram;
public ThreadTask(int id) {
this.id = id;
}
@Override
public void run() {
long start_time = System.currentTimeMillis();
result = restTemplate.getForObject("Some URL",String.class);
long difference = (System.currentTimeMillis() - start_time);
Long count = histogram.get(difference);
if (count != null) {
count++;
histogram.put(Long.valueOf(difference), count);
} else {
histogram.put(Long.valueOf(difference), Long.valueOf(1L));
}
System.out.println(histogram);
}
}
Because whenever I run the program, the numbers I get from this multithreaded program look very weird.
Output I got from Non Multithreading Program
168=1
41=1
43=3
1 call came back in 168 ms and so on...
And output I got from Multithreading program
{119=1}
{179=1}
{150=1}
{202=1}
{149=1}
1 call came back in 119 ms and so on...
So in the multithreaded program, it is taking lot more time I guess?
I do not understand what you mean by getting weird numbers. My wild guess is that it is because output from different threads is getting interspersed.
One way to solve it is to not print the histogram from the run method at all. It is already an instance variable (though it currently does not need to be) so you can:
Instead of submitting unnamed instances of ThreadTask store them in a list/array.
Create a method ThreadTask.report that prints the histogram
After all the threads have completed, call ThreadTask.report on each in sequence.
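A variation on these steps, sketched under the assumption that each task may return its own measurement instead of storing it: the tasks are Callables that return the elapsed milliseconds, and the histogram is built on the calling thread after the results are in, so nothing is printed or shared while the tasks run. The names TimedCalls and histogram are made up, and the Runnable parameter stands in for the REST call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class TimedCalls {
    /** Runs call the given number of times on nThreads threads; returns elapsed ms -> number of calls. */
    static Map<Long, Long> histogram(Runnable call, int runs, int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            Callable<Long> timed = () -> {
                long start = System.nanoTime(); // local variables are private to each task
                call.run();
                return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            };
            List<Future<Long>> timings = new ArrayList<>();
            for (int i = 0; i < runs; i++) {
                timings.add(pool.submit(timed));
            }
            Map<Long, Long> histogram = new TreeMap<>();
            for (Future<Long> timing : timings) {
                histogram.merge(timing.get(), 1L, Long::sum); // only this thread touches the map
            }
            return histogram;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the map is only touched by one thread, a plain TreeMap is enough and the output is a single histogram rather than one per task.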
I think you are accounting for the same time multiple times. If your threads do not execute this part of the code within a synchronized block:
long start_time = System.currentTimeMillis();
result = restTemplate.getForObject("Some URL",String.class);
long difference = (System.currentTimeMillis() - start_time);
then it is possible for this to happen:
Thread1: long start_time ...
Thread1: result = ...
Thread2: long start_time ...
Thread2: result = ...
Thread2: long difference ...
Thread1: long difference ...
So your accounting will not work properly. You could use synchronized blocks or look into Java's java.lang.management (e.g., ThreadMXBean and ThreadInfo), for timing functionality in threaded environments.
Update:
Also see the answer to this related SO question for more details on the problem and how to work around it.
First and once more, thanks to all that already answered my question. I am not a very experienced programmer and it is my first experience with multithreading.
I got an example that is working quite like my problem. I hope it could ease our case here.
public class ThreadMeasuring {
private static final int TASK_TIME = 1; //milliseconds
private static class Batch implements Runnable {
CountDownLatch countDown;
public Batch(CountDownLatch countDown) {
this.countDown = countDown;
}
@Override
public void run() {
long t0 =System.nanoTime();
long t = 0;
while(t<TASK_TIME*1e6){ t = System.nanoTime() - t0; }
if(countDown!=null) countDown.countDown();
}
}
public static void main(String[] args) {
ThreadFactory threadFactory = new ThreadFactory() {
int counter = 1;
@Override
public Thread newThread(Runnable r) {
Thread t = new Thread(r, "Executor thread " + (counter++));
return t;
}
};
// the total duty to be divided in tasks is fixed (problem dependent).
// Increase ntasks will mean decrease the task time proportionally.
// 4 Is an arbitrary example.
// This tasks will be executed thousands of times, inside a loop alternating
// with serial processing that needs their result and prepare the next ones.
int ntasks = 4;
int nthreads = 2;
int ncores = Runtime.getRuntime().availableProcessors();
if (nthreads<ncores) ncores = nthreads;
Batch serial = new Batch(null);
long serialTime = System.nanoTime();
serial.run();
serialTime = System.nanoTime() - serialTime;
ExecutorService executor = Executors.newFixedThreadPool( nthreads, threadFactory );
CountDownLatch countDown = new CountDownLatch(ntasks);
ArrayList<Batch> batches = new ArrayList<Batch>();
for (int i = 0; i < ntasks; i++) {
batches.add(new Batch(countDown));
}
long start = System.nanoTime();
for (Batch r : batches){
executor.execute(r);
}
// wait for all threads to finish their task
try {
countDown.await();
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
long tmeasured = (System.nanoTime() - start);
System.out.println("Task time= " + TASK_TIME + " ms");
System.out.println("Number of tasks= " + ntasks);
System.out.println("Number of threads= " + nthreads);
System.out.println("Number of cores= " + ncores);
System.out.println("Measured time= " + tmeasured);
System.out.println("Theoretical serial time= " + TASK_TIME*1000000*ntasks);
System.out.println("Theoretical parallel time= " + (TASK_TIME*1000000*ntasks)/ncores);
System.out.println("Speedup= " + (serialTime*ntasks)/(double)tmeasured);
executor.shutdown();
}
}
Instead of doing the calculations, each batch just waits for some given time. The program calculates the speedup, which would always be 2 in theory but can be less than 1 (actually a slowdown) if TASK_TIME is small.
My calculations take at most 1 ms and are commonly faster. For 1 ms I find a small speedup of around 30%, but in practice, with my program, I notice a slowdown.
The structure of this code is very similar to my program, so if you could help me to optimise the thread handling I would be very grateful.
Kind regards.
Below, the original question:
Hi.
I would like to use multithreading on my program, since it could increase its efficiency considerably, I believe. Most of its running time is due to independent calculations.
My program has thousands of independent calculations (several linear systems to solve), but they only happen at the same time in small groups of a few dozen or so. Each of these groups takes a few milliseconds to run. After one of these groups of calculations, the program has to run sequentially for a little while, and then I have to solve the linear systems again.
In effect, these independent linear systems are solved inside a loop that iterates thousands of times, alternating with sequential calculations that depend on the previous results. My idea to speed up the program is to compute these independent calculations in parallel threads, by dividing each group into as many batches of independent calculations as I have processors available. So, in principle, there is no queuing at all.
I tried using the FixedThreadPool and CachedThreadPool, and it got even slower than serial processing. It seems to take too much time creating new threads each time I need to solve the batches.
Is there a better way to handle this problem? These pools I've used seem to be proper for cases when each thread takes more time instead of thousands of smaller threads...
Thanks!
Best Regards!
Thread pools don't create new threads over and over. That's why they're pools.
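A quick way to check this is to count how many distinct worker threads actually run the tasks. The sketch below is only an illustration; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PoolReuseDemo {
    /** Submits nTasks tiny tasks and returns how many distinct threads executed them. */
    static int distinctWorkerThreads(int nThreads, int nTasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set of thread names
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < nTasks; i++) {
                futures.add(pool.submit(() -> {
                    names.add(Thread.currentThread().getName());
                }));
            }
            for (Future<?> future : futures) {
                future.get(); // wait for every task
            }
        } finally {
            pool.shutdown();
        }
        return names.size();
    }
}
```

However many tasks are submitted, the count never exceeds the pool size, since the same threads pick up one task after another.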
How many threads were you using and how many CPUs/cores do you have? What is the system load like (normally, when you execute them serially, and when you execute with the pool)? Is synchronization or any kind of locking involved?
Is the algorithm for parallel execution exactly the same as the serial one? (Your description seems to suggest that the serial version was reusing some results from the previous iteration.)
From what I've read ("thousands of independent calculations... happen at the same time... would take some milliseconds to run"), it seems to me that your problem is perfect for GPU programming.
And I think that answers your question. GPU programming is becoming more and more popular. There are Java bindings for CUDA and OpenCL. If it is possible for you to use them, I say go for it.
I'm not sure how you perform the calculations, but if you're breaking them up into small groups, then your application might be ripe for the Producer/Consumer pattern.
Additionally, you might be interested in using a BlockingQueue. The calculation consumers will block until there is something in the queue and the block occurs on the take() call.
private static class Batch implements Runnable {
CountDownLatch countDown;
public Batch(CountDownLatch countDown) {
this.countDown = countDown;
}
CountDownLatch getLatch(){
return countDown;
}
@Override
public void run() {
long t0 =System.nanoTime();
long t = 0;
while(t<TASK_TIME*1e6){ t = System.nanoTime() - t0; }
if(countDown!=null) countDown.countDown();
}
}
class CalcProducer implements Runnable {
private final BlockingQueue<Batch> queue;
CalcProducer(BlockingQueue<Batch> q) { queue = q; }
public void run() {
try {
while(true) {
// ntasks: number of batches per group, as in the question's code
CountDownLatch latch = new CountDownLatch(ntasks);
for(int i = 0; i < ntasks; i++) {
queue.put(produce(latch));
}
// don't need to wait for the latch, only consumers wait
}
} catch (InterruptedException ex) { /* ... handle ... */ }
}
Batch produce(CountDownLatch latch) {
return new Batch(latch);
}
}
class CalcConsumer implements Runnable {
private final BlockingQueue<Batch> queue;
CalcConsumer(BlockingQueue<Batch> q) { queue = q; }
public void run() {
try {
while(true) { consume(queue.take()); }
} catch (InterruptedException ex) { /* ... handle ... */ }
}
void consume(Batch batch) throws InterruptedException {
batch.run(); // run() counts the latch down when the batch is done
batch.getLatch().await(); // wait until every batch of the group is done
}
}
class Setup {
void main() {
BlockingQueue<Batch> q = new LinkedBlockingQueue<Batch>();
int numConsumers = 4;
CalcProducer p = new CalcProducer(q);
Thread producerThread = new Thread(p);
producerThread.start();
Thread[] consumerThreads = new Thread[numConsumers];
for(int i = 0; i < numConsumers; i++)
{
consumerThreads[i] = new Thread(new CalcConsumer(q));
consumerThreads[i].start();
}
}
}
Sorry if there are any syntax errors, I've been chomping away at C# code and sometimes I forget the proper java syntax, but the general idea is there.
If you have a problem which does not scale to multiple cores, you need to change your program or you have a problem which is not as parallel as you think. I suspect you have some other type of bug, but cannot say based on the information given.
This test code might help.
Time per million tasks 765 ms
code
ExecutorService es = Executors.newFixedThreadPool(4);
Runnable task = new Runnable() {
@Override
public void run() {
// do nothing.
}
};
long start = System.nanoTime();
for(int i=0;i<1000*1000;i++) {
es.submit(task);
}
es.shutdown();
es.awaitTermination(10, TimeUnit.SECONDS);
long time = System.nanoTime() - start;
System.out.println("Time per million tasks "+time/1000/1000+" ms");
EDIT: Say you have a loop which serially does this.
for(int i=0;i<1000*1000;i++)
doWork(i);
You might assume that changing the loop like this would make it faster, but the problem is that the overhead could be greater than the gain.
for(int i=0;i<1000*1000;i++) {
final int i2 = i;
ex.execute(new Runnable() {
public void run() {
doWork(i2);
}
});
}
So you need to create batches of work (at least one per thread) so there are enough tasks to keep all the threads busy, but not so many tasks that your threads are spending time in overhead.
final int batchSize = 10*1000;
for(int i=0;i<1000*1000;i+=batchSize) {
final int i2 = i;
ex.execute(new Runnable() {
public void run() {
for(int i3=i2;i3<i2+batchSize;i3++)
doWork(i3);
}
});
}
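As a self-contained illustration of the batching idea, here is a sketch in which doWork is replaced by a trivial computation (summing squares) so that the result can be checked. The names BatchingDemo and sumOfSquares and the one-minute limit are assumptions for the example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class BatchingDemo {
    /** Sums i*i for i in [0, n) using one task per batch of batchSize indices. */
    static long sumOfSquares(int n, int batchSize, int nThreads) throws InterruptedException {
        ExecutorService es = Executors.newFixedThreadPool(nThreads);
        AtomicLong total = new AtomicLong();
        for (int from = 0; from < n; from += batchSize) {
            final int lo = from;
            final int hi = Math.min(n, from + batchSize); // the last batch may be shorter
            es.execute(() -> {
                long local = 0; // accumulate locally, publish once per batch
                for (int i = lo; i < hi; i++) {
                    local += (long) i * i;
                }
                total.addAndGet(local);
            });
        }
        es.shutdown();
        if (!es.awaitTermination(1, TimeUnit.MINUTES)) {
            es.shutdownNow();
            throw new IllegalStateException("batches did not finish within the timeout");
        }
        return total.get();
    }
}
```

Each task touches the shared counter once per batch rather than once per item, which keeps the per-item overhead low.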
EDIT2: Running a test which copies data between threads.
for (int i = 0; i < 20; i++) {
ExecutorService es = Executors.newFixedThreadPool(1);
final double[] d = new double[4 * 1024];
Arrays.fill(d, 1);
final double[] d2 = new double[4 * 1024];
es.submit(new Runnable() {
@Override
public void run() {
// nothing.
}
}).get();
long start = System.nanoTime();
es.submit(new Runnable() {
@Override
public void run() {
synchronized (d) {
System.arraycopy(d, 0, d2, 0, d.length);
}
}
});
es.shutdown();
es.awaitTermination(10, TimeUnit.SECONDS);
// read all the values in d2.
for (double x : d2) ;
long time = System.nanoTime() - start;
System.out.printf("Time to pass %,d doubles to another thread and back was %,d ns.%n", d.length, time);
}
It starts badly but warms up to ~50 us.
Time to pass 4,096 doubles to another thread and back was 1,098,045 ns.
Time to pass 4,096 doubles to another thread and back was 171,949 ns.
... deleted ...
Time to pass 4,096 doubles to another thread and back was 50,566 ns.
Time to pass 4,096 doubles to another thread and back was 49,937 ns.
Hmm, CachedThreadPool seems to be created just for your case. It does not recreate threads if you reuse them soon enough, and if a whole minute passes before you need a new thread, the overhead of thread creation is comparatively negligible.
But you can't expect parallel execution to speed up your calculations unless you can also access data in parallel. If you employ extensive locking, many synchronized methods, etc., you'll spend more on overhead than you gain from parallel processing. Check that your data can be efficiently processed in parallel and that you don't have non-obvious synchronization lurking in the code.
Also, CPUs process data efficiently if the data fits fully into cache. If the data set of each thread is bigger than half the cache, two threads will compete for cache and issue many RAM reads, while one thread, if only employing one core, may perform better because it avoids RAM reads in the tight loop it executes. Check this, too.
Here's a pseudo outline of what I'm thinking
class WorkerThread extends Thread {
BlockingQueue<Calculation> calcs;
MainCalculator mainCalc;
public void run() {
try {
while(true) {
Calculation calc = calcs.take(); // blocks until a calculation is available, then removes it
CalculationResult result = calc.calc();
mainCalc.returnResultFor(calc,result);
}
} catch (InterruptedException e) {
// interrupted while waiting: stop this worker
}
}
}
Another option, if you're calling external programs: don't put them in a loop that runs them one at a time, or they won't run in parallel. You can put them in a loop that PROCESSES them one at a time, but not one that execs them one at a time.
Process calc1 = Runtime.getRuntime().exec("myCalc paramA1 paramA2 paramA3");
Process calc2 = Runtime.getRuntime().exec("myCalc paramB1 paramB2 paramB3");
Process calc3 = Runtime.getRuntime().exec("myCalc paramC1 paramC2 paramC3");
Process calc4 = Runtime.getRuntime().exec("myCalc paramD1 paramD2 paramD3");
calc1.waitFor();
calc2.waitFor();
calc3.waitFor();
calc4.waitFor();
InputStream is1 = calc1.getInputStream();
InputStreamReader isr1 = new InputStreamReader(is1);
BufferedReader br1 = new BufferedReader(isr1);
String resultStr1 = br1.readLine();
InputStream is2 = calc2.getInputStream();
InputStreamReader isr2 = new InputStreamReader(is2);
BufferedReader br2 = new BufferedReader(isr2);
String resultStr2 = br2.readLine();
InputStream is3 = calc3.getInputStream();
InputStreamReader isr3 = new InputStreamReader(is3);
BufferedReader br3 = new BufferedReader(isr3);
String resultStr3 = br3.readLine();
InputStream is4 = calc4.getInputStream();
InputStreamReader isr4 = new InputStreamReader(is4);
BufferedReader br4 = new BufferedReader(isr4);
String resultStr4 = br4.readLine();