Single Threaded Program vs Multithreaded Program (measuing time elapsed)

Single Threaded Program vs Multithreaded Program (measuing time elapsed) - java

I want to know if I need to measure time elapsed then Single Threaded Program is good approach or Multithreading Program is a good approach for that.
Below is my single threaded program that is measuring the time of our service-
private static void serviceCall() {
histogram = new HashMap<Long, Long>();
keys = histogram.keySet();
long total = 5;
long runs = total;
while (runs > 0) {
long start_time = System.currentTimeMillis();
result = restTemplate.getForObject("SOME URL",String.class);
long difference = (System.currentTimeMillis() - start_time);
Long count = histogram.get(difference);
if (count != null) {
count++;
histogram.put(Long.valueOf(difference), count);
} else {
histogram.put(Long.valueOf(difference), Long.valueOf(1L));
}
runs--;
}
for (Long key : keys) {
Long value = histogram.get(key);
System.out.println("MEASUREMENT " + key + ":" + value);
}
}
Output I get from this Single Threaded Program is- Total call was 5
MEASUREMENT 163:1
MEASUREMENT 42:3
MEASUREMENT 47:1
which means 1 call came back in 163 ms. 3 calls came back in 42 ms and so on.
And also I did tried using Multithreaded program as well to measure the time elapsed. Meaning hitting the service parallely with few threads and then measuring how much each thread is taking.
Below is the code for that as well-
//create thread pool with given size
ExecutorService service = Executors.newFixedThreadPool(10);
// queue some tasks
for (int i = 0; i < 1 * 5; i++) {
service.submit(new ThreadTask(i, histogram));
}
public ThreadTask(int id, HashMap<Long, Long> histogram) {
this.id = id;
this.hg = histogram;
}
#Override
public void run() {
long start_time = System.currentTimeMillis();
result = restTemplate.getForObject("", String.class);
long difference = (System.currentTimeMillis() - start_time);
Long count = hg.get(difference);
if (count != null) {
count++;
hg.put(Long.valueOf(difference), count);
} else {
hg.put(Long.valueOf(difference), Long.valueOf(1L));
}
}
And below is the result I get from the above program-
{176=1, 213=1, 182=1, 136=1, 155=1}
One call came back in 176 ms, and so on
So my question is why Multithreading program is taking a lot more time as compared to above Single threaded program? If there is some loop hole in my Multithreading program, can anyone help me to improve it?

Your multi-threaded program likely makes all the requests at the same time which puts more strain on the server which will cause it to respond slower to all request.
As an aside, the way you are doing the update isn't threadsafe, so your count will likely be off in the multithreaded scenario given enough trials.
For instance, Thread A and B both return in 100 ms at the same time. The count in histogram for 100 is 3. A gets 3. B gets 3. A updates 3 to 4. B updates 3 to 4. A puts the value 4 in the histogram. B puts the value 4 in the histogram. You've now had 2 threads believe they incremented the count but the count in the histogram only reflects being incremented once.

Related

Application with multitheard looks to work with randomly speed

I wrote an Java code just for testing how my CPU will run when have to may operation to do so I wrote loop that will add 1 to var in 100000000000 iterations:
public class NoThread {
public static void main(String[] args) {
long s = System.currentTimeMillis();
int sum = 0;
for (int i=0;i<=1000000;i++){
for (int j=0;j<=10000;j++){
for (int k = 0;k<=10;k++){
sum++;
}
}
}
long k = System.currentTimeMillis();
System.out.println("Time" + (k-s)+ " " + sum );
}
}
Code finish working after 30 - 40 sec.
Next I decide to split this operation into 10 threads to make my cpu more cry and say my prog to write time when each thread end:
public class WithThread {
public static void main(String[] args) {
Runnable[] run = new Runnable[10];
Thread[]thread = new Thread[10];
for (int i = 0; i<=9;i++){
run[i] = new Counter(i);
thread[i] = new Thread(run[i]);
thread[i].start();
}
}
}
and
public class Counter implements Runnable {
private int inc;
private int incc;
private int sum = 0;
private int id;
public Counter(int a){
id = a;
inc = a * 100000;
incc = (a+1)*100000;
}
#Override
public void run(){
long s = System.currentTimeMillis();
for (int i = inc;i<=incc;i++){
for (int j=0;j<=10000;j++){
for (int k = 0;k<=10;k++){
sum++;
}
}
}
long k = System.currentTimeMillis();
System.out.println("Time" + (k-s)+ " " + sum + " in thread " + id);
}
}
In the result whole code end in 18 - 20 second - so two times faster but when I look at time in each Thread end it works i found something interesting. Each thread had same job to do but 4 threads end work in very short time ( 0,8 second ) and rest of threads ( 6 ) end in 18 to 20 second. I start it again and now i had 6 thread with fast time and 4 with slow. Run it again 7 fast and 3 slow. Amount of fast and slow thread looks randomly. So my question is why there is so big difference between fast and slow threads. Why amount of fast and slow threads is so random, and is this Language specific (Java) or maybe operating system, CPU or something else ?

Before moving into the working process of Threads and Processors, I'll explain it in more understandable way.
Scenario
Location A ------------------------------ Location B
| |_____________________________|
| |
| 200 Metres
|
| Have to be carried to
400 Bags of Sand -------------------------- Location B
(In Location A)
So, the worker will have to carry each Sand Bag from Location A to Location B until all the Sandbags are moved to location B.
Lets just pretend that the worker will be instantly Teleported back (for argument sake) to Location A (but not the other way around) once he arrives at Location B.
Case 1
Number of Workforce = 1 (No.of Mens)
Time taken = 2 mins (Time for Moving 1 SandBag from Location A to Location B)
Total time taken to carry 400 Sandbags from Location A to Location B will be
Totaltime Taken = 2 x 400 = 800 mins
Case 2
Number of Workforce = 4 (No.of Mens)
Time taken = 2 mins (Time for Moving 1 SandBag from Location A to Location B)
So now we're going to split the job equally among the available workforce.
Assigned Sandbag for Each worker = 400 / 4 = 100
Lets say everyone is starting their job at the same time.
Total time taken for carrying 100 Sandbags from Location A to Location B for an individual workforce
TimeTaken for Individual Workforce = 2 x 100 = 200 mins
Since everyone had started their job at the same time, all the 400 Sandbags will be carried from Location A to Location B in 200 mins
Case 3
Number of Workforce = 4 (No.of Mens)
Here, lets say that every men has to carry 4 sandbags from Location A to Location B in a single transfer.
Total Sandbags in Single transfer for every worker = 4 bags
Time taken = 12 mins (Time for Moving 4 SandBags from Location A to Location B in a single transfer)
Since everyone is forced to carry 4 sandbags with them instead of 1, this is greatly reduce their speed.
Even consider this,
1) I ordered you to carry 1 sandbag from A to B, you'll take 2 mins.
2) I ordered you to carry 2 sandbags from A to B at one transfer, you'll take 5 mins instead of theoritical 4 mins, because this is due to our body conditions and the weight we're carrying.
3) I ordered you to carry 4 sandbags from A to B at one transfer, you'll take 12 mins instead of (Theoritical 8 mins in Point 1, Theoritical 10 mins in Point 2), which is also because of human nature.
So now we're going to split the job equally among the available workforce.
Assigned Sandbag for Each worker = 400 / 4 = 100
Total transfers for Each worker = 100 / 4 = 25 Transfers
Calculating the time taken for single worker to complete his full job
Total time for Single worker = 12 mins x 25 tranfers = 300
So, they've taken an additional 100 min instead of theoritical 200 mins (Case 2)
Case 4
Total Sandbags in Single transfer for every worker = 100 bags
Since this is impossible to do by anyone, so he'll just quit.
xx--------------------------------------------------------------------------------------xx
This is the same kind of working principle in Threads and Processors
Here
Workforce = No. of Processors
Total Sandbags = No.of Threads
Sandbags in a Single transfer = No.of threads a (1) processor is going to handle simultaneously
Assume
Available Processors = 4
Runtime.getRuntime().availableProcessors() // -> Syntax to get the no of available processors
Note: Link every Case with the Realtime Case explained above
Case 1
for (int i=0;i<=1000000;i++){
for (int j=0;j<=10000;j++){
for (int k = 0;k<=10;k++){
sum++;
}
}
}
Whole operation is series process, so it'll take execution time what it's suppose to.
Case 2
for( int n = 1; n <= 4; n++ ){
Thread t = new Thread(new Runnable(){
void run(){
for (int i=0;i<=250000;i++){ // 1000000 / 4 = 250000
for (int j=0;j<=10000;j++){
for (int k = 0;k<=10;k++){
sum++;
}
}
}
}
});
t.start();
}
Here each processor will going to handle 1 thread. So it'll take 1/4th of the actual time.
Case 3
for( int n = 1; n <= 16; n++ ){
Thread t = new Thread(new Runnable(){
void run(){
for (int i=0;i<=62500;i++){ // 1000000 / 16 = 62500
for (int j=0;j<=10000;j++){
for (int k = 0;k<=10;k++){
sum++;
}
}
}
}
});
t.start();
}
Totally 16 threads will be created and each processor will have to handle 4 threads simultaneously. So practically, it'll increase the processor load to its max, thus it'll reduce the efficiency of the processor resulting in increase in the execution time of each processor.
Totally it'll take 1/4th of(1/4th of actual time) + performace degrade time(will definitely be higher than than the 1/4th of actual time)
Case 4
for( int n = 1; n <= 100000; n++ ){ // 100000 - Just for argument sake
Thread t = new Thread(new Runnable(){
void run(){
for (int i=0;i<=1000000;i++){
for (int j=0;j<=10000;j++){
for (int k = 0;k<=10;k++){
sum++;
}
}
}
}
});
t.start();
}
At this stage, creating and starting a thread is more expensive (if the processor already have more threads in it) than the time taken for creating and starting previous threads.As the number of simultaneous threads increases, it'll hugely increase the processor load until the processor reaches its capacity, thus lead to System Crash.
The reason why your threads created in the first were having less execution time is because there wont be any performance degrade in processor during the intital stage. But as the for loop continues, no of threads have to be handled by each processor increases beyond the fair ratio (1:1), so you'll start to experience lag when the threads counts were increased in processor.

Java FutureTask<> without using an ExecutorService?

Recently a use case came up where I had to kick off several blocking IO tasks at the same time and use them in sequence. I did not want to change the order of operation on the consumption side and since this was a web app and these were short-lived tasks in the request path, I didn't want to bottleneck on a fixed threadpool and was looking to mirror the .Net async/await coding style. The FutureTask<> seemed ideal for this but required an ExecutorService. This is an attempt to remove the need for one.
Order of operation:
Kick off tasks
Do some stuff
Consume Task 1
Do some other stuff
Consume Task 2
Finish up
...
I wanted to spawn a new thread for each FutureTask<> but simplify the thread management. After run() completed, the calling thread could be joined.
The solution I came up with was:
package com.staples.search.util;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
public class FutureWrapper<T> extends FutureTask<T> implements Future<T> {
private Thread myThread;
public FutureWrapper(Callable<T> callable) {
super(callable);
myThread = new Thread(this);
myThread.start();
}
#Override
public T get() {
T val = null;
try {
val = super.get();
myThread.join();
}
catch (Exception ex)
{
this.setException(ex);
}
return val;
}
}
Here are a couple of JUnit tests I created to compare FutureWrapper to CachedThreadPool.
#Test
public void testFutureWrapper() throws InterruptedException, ExecutionException {
long startTime = System.currentTimeMillis();
int numThreads = 2000;
List<FutureWrapper<ValueHolder>> taskList = new ArrayList<FutureWrapper<ValueHolder>>();
System.out.printf("FutureWrapper: Creating %d tasks\n", numThreads);
for (int i = 0; i < numThreads; i++) {
taskList.add(new FutureWrapper<ValueHolder>(new Callable<ValueHolder>() {
public ValueHolder call() throws InterruptedException {
int value = 500;
Thread.sleep(value);
return new ValueHolder(value);
}
}));
}
for (int i = 0; i < numThreads; i++)
{
FutureWrapper<ValueHolder> wrapper = taskList.get(i);
ValueHolder v = wrapper.get();
}
System.out.printf("Test took %d ms\n", System.currentTimeMillis() - startTime);
Assert.assertTrue(true);
}
#Test
public void testCachedThreadPool() throws InterruptedException, ExecutionException {
long startTime = System.currentTimeMillis();
int numThreads = 2000;
List<Future<ValueHolder>> taskList = new ArrayList<Future<ValueHolder>>();
ExecutorService esvc = Executors.newCachedThreadPool();
System.out.printf("CachedThreadPool: Creating %d tasks\n", numThreads);
for (int i = 0; i < numThreads; i++) {
taskList.add(esvc.submit(new Callable<ValueHolder>() {
public ValueHolder call() throws InterruptedException {
int value = 500;
Thread.sleep(value);
return new ValueHolder(value);
}
}));
}
for (int i = 0; i < numThreads; i++)
{
Future<ValueHolder> wrapper = taskList.get(i);
ValueHolder v = wrapper.get();
}
System.out.printf("Test took %d ms\n", System.currentTimeMillis() - startTime);
Assert.assertTrue(true);
}
class ValueHolder {
private int value;
public ValueHolder(int val) { value = val; }
public int getValue() { return value; }
public void setValue(int val) { value = val; }
}
Repeated runs puts the FutureWrapper at ~925ms vs. ~935ms for the CachedThreadPool. Both tests bump into OS thread limits.
Things seem to work and the thread spawning is pretty fast (10k threads with random sleeps in ~4s). Does anyone see something wrong with this implementation?

Creating and starting thousands of threads is usually a very bad idea, because creating threads is expensive, and having more threads than processors will bring no performance gain but cause thread-context-switches that consume CPU-cycles instead. (See notes very below)
So in my opinion, your test-code contains a big error in reasoning: You are simulating CPU load by calling Thread.sleep(500). But in fact, this does not really cause the CPU to do anything. It is possible to have many sleeping threads in parallel - no matter how many processors you have, but it is not possible to run more CPU consuming tasks than processors in (real) parallel.
If you simulate real CPU load, you'll see, that more threads will just increase the overhead due to thread-management, but not decrease the total processing time.
So let's compare different ways to run CPU consuming tasks in parallel!
First, let's assume we've got some CPU consuming task that always takes the same amount of time:
public Integer task() throws Exception {
// do some computations here (e.g. fibonacchi, primes, cipher, ...)
return 1;
}
Our goal is to run this task NUM_TASKS times using different execution strategies. For our tests, we set NUM_TASKS = 2000.
(1) Using a thread-per-task strategy
This strategy is very comparable to your approach, with the difference, that it is not necessary to subclass FutureTask and fiddle around with threads. Instead, you can use FutureTask directly as it is both, a Runnable and a Future:
#Test
public void testFutureTask() throws InterruptedException, ExecutionException {
List<RunnableFuture<Integer>> taskList = new ArrayList<RunnableFuture<Integer>>();
// run NUM_TASKS FutureTasks in NUM_TASKS threads
for (int i = 0; i < NUM_TASKS; i++) {
RunnableFuture<Integer> rf = new FutureTask<Integer>(this::task);
taskList.add(rf);
new Thread(rf).start();
}
// now wait for all tasks
int sum = 0;
for (Future<Integer> future : taskList) {
sum += future.get();
}
Assert.assertEquals(NUM_TASKS, sum);
}
Running this test with JUnitBenchmarks (10 test iterations + 5 warmup iterations) yields the following result:
ThreadPerformanceTest.testFutureTask: [measured 10 out of 15 rounds, threads: 1 (sequential)]
round: 0.66 [+- 0.01], round.block: 0.00 [+-
0.00], round.gc: 0.00 [+- 0.00], GC.calls: 66, GC.time: 0.06, time.total: 10.59, time.warmup: 4.02, time.bench: 6.57
So one round (execution time of method task()) is about 0.66 seconds.
(2) Using a thread-per-cpu strategy
This strategy uses a fixed number of threads to execute all tasks. Therefore, we create an ExecutorService via Executors.newFixedThreadPool(...). The number of threads should be equal to the number of CPUs (Runtime.getRuntime().availableProcessors()), which is 8 in my case.
To be able to track the results, we simply use a CompletionService. It automatically takes care of the results - no matter in which order they arrive.
#Test
public void testFixedThreadPool() throws InterruptedException, ExecutionException {
ExecutorService exec = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
CompletionService<Integer> ecs = new ExecutorCompletionService<Integer>(exec);
// submit NUM_TASKS tasks
for (int i = 0; i < NUM_TASKS; i++) {
ecs.submit(this::task);
}
// now wait for all tasks
int sum = 0;
for (int i = 0; i < NUM_TASKS; i++) {
sum += ecs.take().get();
}
Assert.assertEquals(NUM_TASKS, sum);
}
Again we run this test with JUnitBenchmarks with the same settings. The results are:
ThreadPerformanceTest.testFixedThreadPool: [measured 10 out of 15 rounds, threads: 1 (sequential)]
round: 0.41 [+- 0.01], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 22, GC.time: 0.04, time.total: 6.59, time.warmup: 2.53, time.bench: 4.05
Now one round is only 0.41 seconds (almost 40% runtime reduction)! Also not the fewer GC calls.
(3) Sequential execution
For comparison we should also measure the non-parallelized execution:
#Test
public void testSequential() throws Exception {
int sum = 0;
for (int i = 0; i < NUM_TASKS; i++) {
sum += this.task();
}
Assert.assertEquals(NUM_TASKS, sum);
}
The results:
ThreadPerformanceTest.testSequential: [measured 10 out of 15 rounds, threads: 1 (sequential)]
round: 1.50 [+- 0.01], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+-0.00], GC.calls: 244, GC.time: 0.15, time.total: 22.81, time.warmup: 7.77, time.bench: 15.04
Note that 1.5 seconds is for 2000 executions, so a single execution of task() takes 0.75 ms.
Interpretation
According to Amdahl's law, the time T(n) to execute an algorithm on n processors, is:
B is the fraction of the algorithm that cannot be parallelized and must run sequentially. For pure sequential algorithms, B is 1, for pure parallel algorithms it would be 0 (but this is not possible as there is always some sequential overhead).
T(1) can be taken from our sequential execution: T(1) = 1.5 s
If we had no overhead (B = 0), on 8 CPUs we'd got: T(8) = 1.5 / 8 = 0.1875 s.
But we do have overhead! So let's compute B for our two strategies:
B(thread-per-task) = 0.36
B(thread-per-cpu) = 0.17
In other words: The thread-per-task strategy has twice the overhead!
Finally, let's compute the speedup S(n). That's the number of times, an algorithm runs faster on n CPUs compared to sequential execution (S(1) = 1):
Applied to our two strategies, we get:
thread-per-task: S(8) = 2.27
thread-per-cpu: S(8) = 3.66
So the thread-per-cpu strategy has about 60% more speedup than thread-per-task.
TODO
We should also measure and compare memory consumption.
Note: This all is only true for CPU consuming tasks. If instead, your tasks perform lots of I/O related stuff, you might benefit from having more threads than CPUs as waiting for I/O will put a thread in idle mode, so the CPU can execute another thread meanwhile. But even in this case, there is a reasonable upper limit which is usually far below 2000 on a PC.

Why this piece of Java code is not running concurrently

I have written Sieve of Eratosthenes which is supposed to work in parallel, but it's not. When I increase number of threads, time of computing is not getting lower. Any ideas why?
Main class
import java.util.Date;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class ConcurrentTest {
public static void main(String[] args) throws InterruptedException {
Sieve task = new Sieve();
int x = 1000000;
int threads = 4;
task.setArray(x);
Long beg = new Date().getTime();
ExecutorService exec = Executors.newCachedThreadPool();
for (int i = 0; i < threads; i++) {
exec.execute(task);
}
exec.shutdown();
Long time = 0L;
// Main thread is waiting until all threads are terminated
// ( it means that computing is done)
while (true)
if (exec.isTerminated()) {
time = new Date().getTime() - beg;
break;
}
System.out.println("Time is " + time);
}
}
Sieve class
import java.util.concurrent.ConcurrentHashMap;
public class Sieve implements Runnable {
private ConcurrentHashMap<Integer, Boolean> array =
new ConcurrentHashMap<Integer, Boolean>();
private int x;
public void run() {
while(true){
// I am getting synchronized number to check if it's prime
int n = getCounter();
// If no more numbers to check, stop loop
if( n == -1)
break;
// If HashMap contains number, we can further
if(!array.containsKey(n))continue;
for (int i = 2 * n; i <= x; i += n) {
// Compound numbers are removed from HashMap, Eg. 6, 12 and much more.
array.remove(i);
}
}
}
private synchronized int getCounter(){
if( counter < x)
return counter++;
else return -1;
}
public void setArray(int x) {
this.x = x;
for (int i = 2; i <= x; i++)
array.put(i, false);
}
}
I made some tests with different number of threads. These are results:
Nr of threads 1 Time is 1850, 1795, 1825
Nr of threads 2 Time is 1845, 1836, 1814
Nr of threads 3 Time is 1767, 1820, 1756
Nr of threads 4 Time is 1732, 1840, 2083
Nr of threads 5 Time is 1791, 1795, 1803
Nr of threads 6 Time is 1825, 1728, 1707
Nr of threads 7 Time is 1754, 1729, 1686
Nr of threads 8 Time is 1760, 1717, 1817
Nr of threads 9 Time is 1721, 1699, 1673
Nr of threads 10 Time is 1661, 1722, 1718

When I increase number of threads, time of computing is not getting
lower
tl;dr: your problem size is too small. If you increase x to 10000000, the differences will become more obvious. They won't be what you're expecting, though.
I tried your code on an eight core machine with two slight modifications:
For timing, I used System.nanoTime() instead of getTime() on a Date.
I used the awaitTermination method of ExecutorService rather than a spinloop to check for the end of run.
I tried launching your Sieve tasks using a fixed thread pool, a cached thread pool and a fork join pool and comparing the results of different values for your thread variable.
I see the following results (in milliseconds) on my machine with x=10000000:
Thread count = 1 2 4 8 16
Fixed thread pool = 5451 3866 3639 3227 3120
Cached thread pool= 5434 3763 3709 3258 3078
Fork-join pool = 6732 3670 3735 3190 3102
What these results show us is a clear benefit of changing from a single thread of execution to two threads. However, the benefit of additional threads drops off rapidly. There's an interesting plateau going from two to four threads and marginal benefits up to 16.
In addition, you can also see that the different threading mechanisms have different initial overhead: I didn't expect the Fork-Join pool to cost that much more to start than the other mechanisms.
So, as written, you shouldn't really expect a benefit past two threads for small but non-trivial problem sets.
If you'd like to increase the benefit of additional threads, you're going to need to look at your current implementation. For example, when I switched from your synchronized getCounter() to an AtomicInteger using incrementAndGet(), I eliminated the overhead of the synchronized method. The result is that all of my four thread numbers dropped on the order of 1000 milliseconds.

Total time taken and Average time taken by all the threads

I am working on a project in which I need to measure Total Time taken by program and average time taken by program. And that program is a Multithreaded program.
In that program, each thread is working in a particular range. Input parameters is Number of Threads and Number of Task.
If number of threads is 2 and number of tasks is 10 then each thread will be performing 10 tasks. So that means 2 thread will be doing 20 tasks.
So that means-
First thread should be using id between 1 and 10 and second thread should be using id between 11 and 20.
I got the above scenario working. Now I want to measure what is the total time and average time taken by all the threads. So I got the below setup in my program.
Problem Statement:-
Can anyone tell me the way I am trying to measure the Total time and Average time taken by all the threads is correct or not in my below program?
//create thread pool with given size
ExecutorService service = Executors.newFixedThreadPool(noOfThreads);
long startTime = 0L;
try {
readPropertyFiles();
startTime = System.nanoTime();
// queue some tasks
for (int i = 0, nextId = startRange; i < noOfThreads; i++, nextId += noOfTasks) {
service.submit(new XMPTask(nextId, noOfTasks, tableList));
}
service.shutdown();
service.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
} finally {
long estimatedTime = System.nanoTime() - startTime;
logTimingInfo(estimatedTime, noOfTasks, noOfThreads);
}
private static void logTimingInfo(long elapsedTime, int noOfTasks, int noOfThreads) {
long timeInMilliseconds = elapsedTime / 1000000L;
float avg = (float) (timeInMilliseconds) / noOfTasks * noOfThreads;
LOG.info(CNAME + "::" + "Total Time taken " + timeInMilliseconds + " ms. And Total Average Time taken " + avg + " ms");
}

service.submit is getting executed only noOfThreads times. XMPTask object is created the same number of times.

The time you measure is not the consumed time but the elapsed time.
If the program tested (the JVM) is the only one on the computer, it may be relatively accurate but in a real world a lot of process runs concurrently.
I have already done this job by using a native call to the OS, on Windows (I'll complete this post Monday at my office) and Linux (/proc).

I think you would need to measure the time within the task class itself (XMPTask). Within that task you should be able to extract the id of the thread that is executing it and log that. Using this approach will require reading the logs and doing some calculations on them.
Another approach would be to keep running totals and averages as time progresses. To do this you could write a simple class that is passed into each task that has some static (per jvm) variables for tracking what each thread is doing. Then you could have a single thread outside the Threadpool that did the calculations. So if you wanted to report the average cpu time for each thread every second, this calculation thread could sleep for a second, then calculate and log all the average times, then sleep for a second....
EDIT: After re-reading the requirements, you don't need a background thread, but not sure if we are tracking the average time per thread or average time per task. I have assumed total time and average time per thread and fleshed out the idea in code below. It has not been tested or debugged but should give you a good idea of how to start:
public class Runner
{
public void startRunning()
{
// Create your thread pool
ExecutorService service = Executors.newFixedThreadPool(noOfThreads);
readPropertyFiles();
MeasureTime measure = new MeasureTime();
// queue some tasks
for (int i = 0, nextId = startRange; i < noOfThreads; i++, nextId += noOfTasks)
{
service.submit(new XMPTask(nextId, noOfTasks, tableList, measure));
}
service.shutdown();
service.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
measure.printTotalsAndAverages();
}
}
public class MeasureTime
{
HashMap<Long, Long> threadIdToTotalCPUTimeNanos = new HashMap<Long, Long>();
HashMap<Long, Long> threadIdToStartTimeMillis = new HashMap<Long, Long>();
HashMap<Long, Long> threadIdToStartTimeNanos = new HashMap<Long, Long>();
private void addThread(Long threadId)
{
threadIdToTotalCPUTimeNanos.put(threadId, 0L);
threadIdToStartTimeMillis.put(threadId, 0L);
}
public void startTimeCount(Long threadId)
{
synchronized (threadIdToStartTimeNanos)
{
if (!threadIdToStartTimeNanos.containsKey(threadId))
{
addThread(threadId);
}
long nanos = System.nanoTime();
threadIdToStartTimeNanos.put(threadId, nanos);
}
}
public void endTimeCount(long threadId)
{
synchronized (threadIdToStartTimeNanos)
{
long endNanos = System.nanoTime();
long startNanos = threadIdToStartTimeNanos.get(threadId);
long nanos = threadIdToTotalCPUTimeNanos.get(threadId);
nanos = nanos + (endNanos - startNanos);
threadIdToTotalCPUTimeNanos.put(threadId, nanos);
}
}
public void printTotalsAndAverages()
{
long totalForAllThreadsNanos = 0L;
int numThreads = 0;
long totalWallTimeMillis = 0;
synchronized (threadIdToStartTimeNanos)
{
numThreads = threadIdToStartTimeMillis.size();
for (Long threadId: threadIdToStartTimeNanos.keySet())
{
totalWallTimeMillis += System.currentTimeMillis() - threadIdToStartTimeMillis.get(threadId);
long totalCPUTimeNanos = threadIdToTotalCPUTimeNanos.get(threadId);
totalForAllThreadsNanos += totalCPUTimeNanos;
}
}
long totalCPUMillis = (totalForAllThreadsNanos)/1000000;
System.out.println("Total milli-seconds for all threads: " + totalCPUMillis);
double averageMillis = totalCPUMillis/numThreads;
System.out.println("Average milli-seconds for all threads: " + averageMillis);
double averageCPUUtilisation = totalCPUMillis/totalWallTimeMillis;
System.out.println("Average CPU utilisation for all threads: " + averageCPUUtilisation);
}
}
public class XMPTask implements Callable<String>
{
private final MeasureTime measure;
public XMPTask(// your parameters first
MeasureTime measure)
{
// Save your things first
this.measure = measure;
}
#Override
public String call() throws Exception
{
measure.startTimeCount(Thread.currentThread().getId());
try
{
// do whatever work here that burns some CPU.
}
finally
{
measure.endTimeCount(Thread.currentThread().getId());
}
return "Your return thing";
}
}
After writing all this, there is one thing that seems a bit strange in that the XMPTask seems to know too much about the list of tasks, when, I think you should just create an XMPTask for every task that you have, give it enough information to do the job, and submit them to the service as you create them.

Java strange performance inconsistency

I have a simple recursive method, a depth first search. On each call, it checks if it's in a leaf, otherwise it expands the current node and calls itself on the children.
I'm trying to make it parallel, but I notice the following strange (for me) problem.
I measure execution time with System.currentTimeMillis().
When I break the search into a number of subsearches and add the total execution time, I get a bigger number than the sequential search. I only measure execution time, no communication or sync, etc. I would expect to get the same time when I add the times of the subtasks. This happens even if I just run one task after the other, so without threads. If I just break the search into some subtasks and run the subtasks one after the other, I get a bigger time.
If I add the number of method calls for the subtasks, I get the same number as the sequential search. So, basically, in both cases I do the same number of method calls, but I get different times.
I'm guessing there's some overhead on initial method calls or something else caused by a JVM mechanism. Any ideas what could it be?
For example, one sequential search takes around 3300 ms. If I break it into 13 tasks, it takes a total time of 3500ms.
My method looks like this:
private static final int dfs(State state) {
method_calls++;
if(state.isLeaf()){
return 1;
}
State[] children = state.expand();
int result = 0;
for (int i = 0; i < children.length; i++) {
result += dfs(children[i]);
}
return result;
}
Whenever I call it, I do it like this:
for(int i = 0; i < num_tasks; i++){
long start = System.currentTimeMillis();
dfs(tasks[i]);
totalTime += (System.currentTimeMillis() - start);
}
Problem is totalTime increases with num_tasks and I would expect to stay the same because the method_calls variable stays the same.

You should average out the numbers over longer runs. Secondly the precision of currentTimeMillis may not be sufficient, you can try using System.nanoTime().

As in all the programming languages, whenever you call a procedure or a method, you have to push the environment, initialize the new one, execute the programs instructions, return the value on the stack and finally reset the previous environment. It cost a bit! Create a thread cost also more!
I suppose that if you enlarge the researching tree you will have benefit by the parallelization.

Adding system clock time for several threads seems a weird idea. Either you are interested in the time until processing is complete, in which case adding doesn't make sense, or in cpu usage, in which case you should only count when the thread is actually scheduled to execute.
What probably happens is that at least part of the time, more threads are ready to execute than the system has cpu cores, and the scheduler puts one of your threads to sleep, which causes it to take longer to complete. It makes sense that this effect is exacerbated the more threads you use. (Even if your program uses less threads than you have cores, other programs (such as your development environment, ...) might).
If you are interested in CPU usage, you might wish to query ThreadMXBean.getCurrentThreadCpuTime

I'd expect to see Threads used. Something like this:
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class Puzzle {
static volatile long totalTime = 0;
private static int method_calls = 0;
/**
* #param args
*/
public static void main(String[] args) {
final int num_tasks = 13;
final State[] tasks = new State[num_tasks];
ExecutorService threadPool = Executors.newFixedThreadPool(5);
for(int i = 0; i < num_tasks; i++){
threadPool.submit(new DfsRunner(tasks[i]));
}
try {
threadPool.shutdown();
threadPool.awaitTermination(1, TimeUnit.SECONDS);
} catch (InterruptedException e) {
System.out.println("Interrupted");
}
System.out.println(method_calls + " Methods in " + totalTime + "msecs");
}
static final int dfs(State state) {
method_calls++;
if(state.isLeaf()){
return 1;
}
State[] children = state.expand();
int result = 0;
for (int i = 0; i < children.length; i++) {
result += dfs(children[i]);
}
return result;
}
}
With the runnable bit like this:
public class DfsRunner implements Runnable {
private State state;
public DfsRunner(State state) {
super();
this.state = state;
}
#Override
public void run() {
long start = System.currentTimeMillis();
Puzzle.dfs(state);
Puzzle.totalTime += (System.currentTimeMillis() - start);
}
}

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.