I need help or some ideas on how to get the loop in this code to stop executing once the speed-up factor settles to a particular value. The idea of this method is to continually run an ever-increasing number of threads and derive a speed-up factor from the results. The rounded speed-up factor is the number of cores present on the machine: a 4-threaded task will have the same speed-up factor as a 16-threaded task on a 4-core machine. I want to avoid having to set the number of threads manually. When the speed-up factor settles to a value, I want the program to terminate; there is no need to run a test for 8, 16, or 32 threads if the speed-up factor has already settled at, say, 2.
Example output for a 4 core machine:
Number of threads tested: 1
Speed up factor: 1.0
Number of threads tested: 2
Speed up factor: 1.8473736372646188
Number of threads tested: 4
Speed up factor: 3.9416666666666669
Number of threads tested: 8
Speed up factor: 3.9750993377483446
Number of threads tested: 16
Speed up factor: 4.026086956521739
THIS MACHINE HAS: 4 CORES
THE APPLICATION HAS COMPLETED EXECUTION. THANK YOU
private static void multiCoreTest() {
    // A runnable for the threads
    Counter task = new Counter(1500000000L);
    // A variable to store the number of threads to run
    int threadMultiplier = 1;
    // A variable to hold the time it takes for a single thread to execute
    double singleThreadTime = ThreadTest.runTime(1, task);
    // Calculating the speedup factor for a single-thread task
    double speedUp = (singleThreadTime * threadMultiplier) / singleThreadTime;
    // Printing the speedup factor of a single thread
    System.out.println("Number of threads tested: " + threadMultiplier);
    System.out.println("Speed up factor: " + speedUp);
    // Testing multiple threads
    while (threadMultiplier < 16) {
        // Increasing the number of threads by a factor of two
        threadMultiplier *= 2;
        // A variable to hold the time it takes for multiple threads to execute
        double multiThreadTime = ThreadTest.runTime(threadMultiplier, task);
        // Calculating the speedup factor for multi-thread tests
        speedUp = (singleThreadTime * threadMultiplier) / multiThreadTime;
        // Message to the user
        System.out.println("\n" + "Number of threads tested: " + threadMultiplier);
        System.out.println("Speed up factor: " + speedUp);
    }
    // Print the number of cores
    System.out.println("\n" + "THIS MACHINE HAS: " + Math.round(speedUp) + " CORES");
    System.out.println("\n" + "THE APPLICATION HAS COMPLETED EXECUTION. THANK YOU");
    // Exiting the system
    System.exit(0);
}
}
Test if the new speedup is the same as the old one:
double oldSpeedUp = 0;
boolean found = false;
while (!found && threadMultiplier < 16) {
    // ...
    found = Math.round(speedUp) == Math.round(oldSpeedUp);
    oldSpeedUp = speedUp;
}
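Dropped into the question's multiCoreTest(), a sketch of the combined loop could look like this (same variables as above; the 16-thread cap is kept as a safety limit):

double oldSpeedUp = 0;
boolean found = false;
while (!found && threadMultiplier < 16) {
    threadMultiplier *= 2;
    double multiThreadTime = ThreadTest.runTime(threadMultiplier, task);
    speedUp = (singleThreadTime * threadMultiplier) / multiThreadTime;
    System.out.println("\n" + "Number of threads tested: " + threadMultiplier);
    System.out.println("Speed up factor: " + speedUp);
    // Stop once the rounded speed-up matches the previous iteration's value
    found = Math.round(speedUp) == Math.round(oldSpeedUp);
    oldSpeedUp = speedUp;
}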
As a side note, if you want the number of cores, you can call:
int cores = Runtime.getRuntime().availableProcessors();
Related
I have a simple Java application that basically increments a counter 600M times. I have separated this task into several threads, each incrementing its own counter, and finally summing these counters.
Oddly, having more threads than cores achieves better performance:
Example (on Intel I7-9850H with six cores) for average calculation time:
Having 6 threads, each with 100M increments yields 97 ms
Having 60 threads, each with 10M increments yields 61 ms
AFAIK Java maps each thread to a real system thread.
Any ideas why this is happening?
EDIT:
Is it possible that the reason is that my computer has many other running processes and threads, so 60 threads competing against all the other inhabitants does better than only 6 threads competing for CPU resources?
The code of the increment method:
private static void incrementWithLockFree(long increments, int threads) throws InterruptedException {
    final long[] numbers = new long[threads];
    ExecutorService threadPool = Executors.newFixedThreadPool(threads);
    for (int task = 0; task < threads; task++) {
        int finalTask = task;
        threadPool.submit(() -> {
            for (long increment = 0; increment < increments; increment++) {
                numbers[finalTask]++;
            }
        });
    }
    threadPool.shutdown();
    threadPool.awaitTermination(1, TimeUnit.DAYS);
    long number = 0;
    for (long num : numbers) {
        number += num;
    }
    System.out.println(number);
}
On my system, fewer threads perform better.
Is it possible that the reason is that my computer has many other running processes and threads, so 60 threads competing against all the other inhabitants does better than only 6 threads competing for CPU resources?
Yes.
In case this is a real use-case, take a look at LongAdder, which is optimized for multithreaded counter scenarios.
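For illustration, here is a minimal sketch (not the answerer's code, and assuming the same structure as the question's method, so it could drop in next to incrementWithLockFree) of the counter built on a shared LongAdder instead of the per-thread long array:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

private static void incrementWithLongAdder(long increments, int threads) throws InterruptedException {
    final LongAdder counter = new LongAdder();
    ExecutorService threadPool = Executors.newFixedThreadPool(threads);
    for (int task = 0; task < threads; task++) {
        threadPool.submit(() -> {
            for (long increment = 0; increment < increments; increment++) {
                counter.increment(); // internally spreads updates over several cells to reduce contention
            }
        });
    }
    threadPool.shutdown();
    threadPool.awaitTermination(1, TimeUnit.DAYS);
    System.out.println(counter.sum()); // sums the internal cells
}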
I wrote some Java code just to test how my CPU behaves when it has many operations to do, so I wrote a loop that adds 1 to a variable over 100000000000 iterations:
public class NoThread {
    public static void main(String[] args) {
        long s = System.currentTimeMillis();
        int sum = 0;
        for (int i = 0; i <= 1000000; i++) {
            for (int j = 0; j <= 10000; j++) {
                for (int k = 0; k <= 10; k++) {
                    sum++;
                }
            }
        }
        long k = System.currentTimeMillis();
        System.out.println("Time" + (k - s) + " " + sum);
    }
}
The code finishes after 30-40 seconds.
Next I decided to split this work across 10 threads to push my CPU harder, and had the program print the time when each thread finishes:
public class WithThread {
    public static void main(String[] args) {
        Runnable[] run = new Runnable[10];
        Thread[] thread = new Thread[10];
        for (int i = 0; i <= 9; i++) {
            run[i] = new Counter(i);
            thread[i] = new Thread(run[i]);
            thread[i].start();
        }
    }
}
and
public class Counter implements Runnable {
    private int inc;
    private int incc;
    private int sum = 0;
    private int id;

    public Counter(int a) {
        id = a;
        inc = a * 100000;
        incc = (a + 1) * 100000;
    }

    @Override
    public void run() {
        long s = System.currentTimeMillis();
        for (int i = inc; i <= incc; i++) {
            for (int j = 0; j <= 10000; j++) {
                for (int k = 0; k <= 10; k++) {
                    sum++;
                }
            }
        }
        long k = System.currentTimeMillis();
        System.out.println("Time" + (k - s) + " " + sum + " in thread " + id);
    }
}
As a result, the whole program finishes in 18-20 seconds, so about twice as fast. But when I looked at the time at which each thread finished, I found something interesting. Each thread had the same amount of work to do, yet 4 threads finished very quickly (about 0.8 seconds) and the remaining 6 finished in 18 to 20 seconds. I ran it again and got 6 fast threads and 4 slow ones; again, and it was 7 fast and 3 slow. The split between fast and slow threads looks random. So my question is: why is there such a big difference between the fast and slow threads, why is the number of fast and slow threads so random, and is this specific to the language (Java), or is it the operating system, the CPU, or something else?
Before getting into how threads and processors actually work, I'll explain it in a more understandable way.
Scenario
Location A ------------------------------ Location B
| |_____________________________|
| |
| 200 Metres
|
| Have to be carried to
400 Bags of Sand -------------------------- Location B
(In Location A)
So the worker will have to carry sandbags from Location A to Location B until all the sandbags have been moved to Location B.
Let's just pretend (for argument's sake) that the worker is instantly teleported back to Location A, but not the other way around, once he arrives at Location B.
Case 1
Number of workers = 1
Time taken = 2 mins (time to move 1 sandbag from Location A to Location B)
Total time taken to carry 400 sandbags from Location A to Location B:
Total time taken = 2 x 400 = 800 mins
Case 2
Number of workers = 4
Time taken = 2 mins (time to move 1 sandbag from Location A to Location B)
So now we're going to split the job equally among the available workforce.
Sandbags assigned to each worker = 400 / 4 = 100
Let's say everyone starts their job at the same time.
Total time for an individual worker to carry his 100 sandbags from Location A to Location B:
Time taken per worker = 2 x 100 = 200 mins
Since everyone started at the same time, all 400 sandbags will be carried from Location A to Location B in 200 mins.
Case 3
Number of workers = 4
Here, let's say that every man has to carry 4 sandbags from Location A to Location B in a single transfer.
Sandbags per transfer for every worker = 4 bags
Time taken = 12 mins (time to move 4 sandbags from Location A to Location B in a single transfer)
Since everyone is forced to carry 4 sandbags instead of 1, this greatly reduces their speed.
Consider this:
1) If I order you to carry 1 sandbag from A to B, you'll take 2 mins.
2) If I order you to carry 2 sandbags from A to B in one transfer, you'll take 5 mins instead of the theoretical 4 mins, because of your physical condition and the weight you're carrying.
3) If I order you to carry 4 sandbags from A to B in one transfer, you'll take 12 mins instead of the theoretical 8 mins (extrapolating point 1) or 10 mins (extrapolating point 2), for the same reason.
So now we're going to split the job equally among the available workforce.
Sandbags assigned to each worker = 400 / 4 = 100
Total transfers for each worker = 100 / 4 = 25 transfers
Calculating the time taken for a single worker to complete his full job:
Total time per worker = 12 mins x 25 transfers = 300 mins
So they've taken an additional 100 mins compared to the theoretical 200 mins of Case 2.
Case 4
Sandbags per transfer for every worker = 100 bags
Since this is impossible for anyone to do, he'll simply quit.
xx--------------------------------------------------------------------------------------xx
The same working principle applies to threads and processors.
Here
Workforce = number of processors
Total sandbags = number of threads
Sandbags per transfer = number of threads one (1) processor handles simultaneously
Assume
Available Processors = 4
Runtime.getRuntime().availableProcessors() // returns the number of available processors
Note: relate each case below to the corresponding case in the analogy above.
Case 1
for (int i = 0; i <= 1000000; i++) {
    for (int j = 0; j <= 10000; j++) {
        for (int k = 0; k <= 10; k++) {
            sum++;
        }
    }
}
The whole operation is a serial process, so it will take the full execution time it is supposed to.
Case 2
for (int n = 1; n <= 4; n++) {
    Thread t = new Thread(new Runnable() {
        public void run() {
            for (int i = 0; i <= 250000; i++) { // 1000000 / 4 = 250000
                for (int j = 0; j <= 10000; j++) {
                    for (int k = 0; k <= 10; k++) {
                        sum++;
                    }
                }
            }
        }
    });
    t.start();
}
Here each processor handles one thread, so it will take roughly 1/4th of the original time.
Case 3
for (int n = 1; n <= 16; n++) {
    Thread t = new Thread(new Runnable() {
        public void run() {
            for (int i = 0; i <= 62500; i++) { // 1000000 / 16 = 62500
                for (int j = 0; j <= 10000; j++) {
                    for (int k = 0; k <= 10; k++) {
                        sum++;
                    }
                }
            }
        }
    });
    t.start();
}
In total 16 threads will be created, and each processor will have to handle 4 threads simultaneously. In practice this pushes the processor load to its maximum, which reduces the processors' efficiency and increases the execution time of each thread.
In total it will take 1/4th of (1/4th of the actual time) plus the performance-degradation time, which will definitely be higher than 1/4th of the actual time.
Case 4
for (int n = 1; n <= 100000; n++) { // 100000 - just for argument's sake
    Thread t = new Thread(new Runnable() {
        public void run() {
            for (int i = 0; i <= 1000000; i++) {
                for (int j = 0; j <= 10000; j++) {
                    for (int k = 0; k <= 10; k++) {
                        sum++;
                    }
                }
            }
        }
    });
    t.start();
}
At this stage, creating and starting a thread becomes more expensive (when the processors already have many threads to run) than creating and starting the earlier threads was. As the number of simultaneous threads increases, the processor load keeps growing until the processors reach their capacity, which can ultimately crash the system.
The reason the threads created first had shorter execution times is that there is no performance degradation on the processors during the initial stage. But as the loop continues, the number of threads each processor has to handle grows beyond the fair 1:1 ratio, so you start to see lag as the per-processor thread count increases.
I have a program that applies a median filter to an array of over 2 million values.
I'm trying to compare run times for sequential vs parallel on the same dataset. So when I execute the program, it does 20 runs, every run is timed, and an average of the 20 times is outputted to the console.
ArrayList<Double> times = new ArrayList<>(20); //to calculate average run time
for (int run = 1; run < 21; run++) //algorithm will run 20 times
{
    long startTime = System.nanoTime();
    switch (method)
    {
        case 1: //Sequential
            filt.seqFilter();
            break;
        case 2: //ForkJoin Framework
            pool.invoke(filt); //pool is a ForkJoinPool
            break;
    }
    Double timeElapsed = (System.nanoTime() - startTime) / 1000000.0;
    times.add(run - 1, timeElapsed);
    System.out.println("Run " + run + ": " + timeElapsed + " milliseconds.");
}
times.remove(Collections.max(times)); //there's always a slow outlier
double timesSum = 0;
for (Double e : times)
{
    timesSum += e;
}
double average = timesSum / 19;
System.out.println("Runtime: " + average);
filt is of type FilterObject which extends RecursiveAction. My overridden compute() method in FilterObject looks like this:
public void compute()
{
    if (hi - lo <= SEQUENTIAL_THRESHOLD)
    {
        seqFilter();
    }
    else
    {
        FilterObject left = new FilterObject(lo, (hi + lo) / 2);
        FilterObject right = new FilterObject((hi + lo) / 2, hi);
        left.fork();
        right.compute();
        left.join();
    }
}
seqFilter() processes the values between the lo and hi indices in the starting array and adds the processed values to a final array in the same positions. That's why there is no merging of arrays after left.join().
My run times for this are insanely fast for parallel - so fast that I think there must be something wrong with my timer OR with my left.join() statement. I'm getting average times of around 170 milliseconds for sequential with a filtering window of size 3 and 0.004 milliseconds for parallel. Why am I getting these values? I'm especially concerned that my join() is in the wrong place.
If you'd like to see my entire code, with all the classes and some input files, follow this link.
After some testing of your code I found the reason. It turned out that the ForkJoinPool runs one task instance only once. Subsequent invoke() calls with the same task instance will return immediately. So you have to reinstantiate the task for every run.
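A sketch of that fix in the timing loop (the constructor arguments are assumed to be the same bounds originally used to construct filt):

case 2: //ForkJoin Framework
    // A RecursiveAction that has already completed is not executed again,
    // so build a fresh task for every run instead of reusing filt
    FilterObject parallelTask = new FilterObject(lo, hi); // same bounds as filt (assumed)
    pool.invoke(parallelTask);
    break;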
Another problem is with the parallel (standard threads) run. You are starting the threads but never waiting for them to finish before measuring the time. I think you could use a CyclicBarrier here.
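A hypothetical helper showing that idea (workSlice stands in for one thread's share of the filtering work; none of these names come from the question's code):

import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

static double timeParallelRun(int nThreads, Runnable workSlice)
        throws InterruptedException, BrokenBarrierException {
    // One party per worker plus one for the main (timing) thread
    final CyclicBarrier done = new CyclicBarrier(nThreads + 1);
    long startTime = System.nanoTime();
    for (int i = 0; i < nThreads; i++) {
        new Thread(() -> {
            workSlice.run();          // this thread's share of the filtering
            try {
                done.await();         // signal that this worker has finished
            } catch (InterruptedException | BrokenBarrierException e) {
                Thread.currentThread().interrupt();
            }
        }).start();
    }
    done.await();                     // the timing thread blocks until every worker is done
    return (System.nanoTime() - startTime) / 1000000.0; // elapsed milliseconds
}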
With the mentioned fixes I get roughly the same time for ForkJoin and standard threads. And it's three times faster than sequential. Seems reasonable.
P.S. You are doing a micro-benchmark. It may be useful to read answers to that question to improve your benchmark accuracy: How do I write a correct micro-benchmark in Java?
Am I correctly implementing these Java threads? The goal is to have ten concurrent threads, each of which computes a sum from 1 to (upper bound 22 + i). I'm trying to identify each thread's name and print it when the thread runs, then print the result when the thread exits. Currently, all of the results print at the same time in a random order, and I'm not sure I'm correctly capturing the information when each thread begins and ends.
public class threads {
    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            final int iCopy = i;
            new Thread("" + i) {
                public void run() {
                    int sum = 0;
                    int upperBound = 22;
                    int lowerBound = 1;
                    long threadID = Thread.currentThread().getId();
                    for (int number = lowerBound; number <= upperBound; number++) {
                        sum = sum + number + iCopy;
                    }
                    System.out.println(threadID + " thread is running now, and I will compute the sum from 1 to " + (upperBound + iCopy) + ". The i is: " + iCopy);
                    System.out.println("Thread id #" + threadID + ", the " + sum + " is done by the thread.");
                }
            }.start();
        }
    }
}
I have executed your code and observed that all threads (10 in this case) are running properly. The threads are scheduled in an arbitrary order, which is why you see this behaviour, but I am sure all the threads run fine and execute the functionality you require.
Anyhow, in the output I saw that the loop value, which should go from 0 to 9, also appears in a random order; this may be because some threads are paused while executing, giving way to other threads.
Hope this helps.
Thanks.
The order the threads run in will depend entirely on the JVM being used and underlying resources.
If you have several cores (cpus) available, your code may run completely differently to a single core.
Essentially, your main loop runs to the end in a single thread, firing off 10 new threads and queueing their start requests for the scheduler. Other cores may start running those threads. Each extra thread adds a different amount of load, so they run slightly differently (performance-wise) on each processor, meaning they run faster or slower and finish at different times.
Your code demonstrates this very well.
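To make that concrete, here is a minimal sketch (not the asker's exact code) in which main keeps the Thread references and join()s them, so the program only moves on once every sum has been printed:

public class Threads {
    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[10];
        for (int i = 0; i < 10; i++) {
            final int iCopy = i;
            workers[i] = new Thread("" + i) {
                public void run() {
                    int sum = 0;
                    int upperBound = 22;
                    for (int number = 1; number <= upperBound; number++) {
                        sum = sum + number + iCopy;
                    }
                    System.out.println("Thread " + getName() + " computed " + sum);
                }
            };
            workers[i].start();
        }
        // join() makes the main thread wait for each worker in turn,
        // so nothing after this loop runs until all sums have been printed
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println("All threads finished");
    }
}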
I have written a Sieve of Eratosthenes that is supposed to work in parallel, but it doesn't: when I increase the number of threads, the computation time does not go down. Any ideas why?
Main class
import java.util.Date;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConcurrentTest {
    public static void main(String[] args) throws InterruptedException {
        Sieve task = new Sieve();
        int x = 1000000;
        int threads = 4;
        task.setArray(x);
        Long beg = new Date().getTime();
        ExecutorService exec = Executors.newCachedThreadPool();
        for (int i = 0; i < threads; i++) {
            exec.execute(task);
        }
        exec.shutdown();
        Long time = 0L;
        // Main thread is waiting until all threads are terminated
        // (it means that computing is done)
        while (true)
            if (exec.isTerminated()) {
                time = new Date().getTime() - beg;
                break;
            }
        System.out.println("Time is " + time);
    }
}
Sieve class
import java.util.concurrent.ConcurrentHashMap;

public class Sieve implements Runnable {
    private ConcurrentHashMap<Integer, Boolean> array =
            new ConcurrentHashMap<Integer, Boolean>();
    private int x;
    // Shared counter of the next number to check (this field is assumed;
    // it was missing from the posted snippet but is needed to compile)
    private int counter = 2;

    public void run() {
        while (true) {
            // Get the next number to check, handed out under synchronization
            int n = getCounter();
            // If there are no more numbers to check, stop the loop
            if (n == -1)
                break;
            // If the number has already been removed (it is composite), skip it
            if (!array.containsKey(n)) continue;
            for (int i = 2 * n; i <= x; i += n) {
                // Remove the composite multiples of n from the map, e.g. 6, 12, ...
                array.remove(i);
            }
        }
    }

    private synchronized int getCounter() {
        if (counter < x)
            return counter++;
        else return -1;
    }

    public void setArray(int x) {
        this.x = x;
        for (int i = 2; i <= x; i++)
            array.put(i, false);
    }
}
I ran some tests with different numbers of threads. These are the results (times in milliseconds):
Nr of threads 1 Time is 1850, 1795, 1825
Nr of threads 2 Time is 1845, 1836, 1814
Nr of threads 3 Time is 1767, 1820, 1756
Nr of threads 4 Time is 1732, 1840, 2083
Nr of threads 5 Time is 1791, 1795, 1803
Nr of threads 6 Time is 1825, 1728, 1707
Nr of threads 7 Time is 1754, 1729, 1686
Nr of threads 8 Time is 1760, 1717, 1817
Nr of threads 9 Time is 1721, 1699, 1673
Nr of threads 10 Time is 1661, 1722, 1718
When I increase the number of threads, the computation time does not get lower.
tl;dr: your problem size is too small. If you increase x to 10000000, the differences will become more obvious. They won't be what you're expecting, though.
I tried your code on an eight core machine with two slight modifications:
For timing, I used System.nanoTime() instead of getTime() on a Date.
I used the awaitTermination method of ExecutorService rather than a spinloop to check for the end of run.
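Combined, those two changes to the question's main method might look roughly like this (variable names from ConcurrentTest; the one-minute timeout is an arbitrary choice and needs an import of java.util.concurrent.TimeUnit):

long beg = System.nanoTime();            // monotonic clock instead of Date
ExecutorService exec = Executors.newCachedThreadPool();
for (int i = 0; i < threads; i++) {
    exec.execute(task);
}
exec.shutdown();
// Block until all Sieve tasks have finished (or the timeout expires),
// instead of spinning on exec.isTerminated()
exec.awaitTermination(1, TimeUnit.MINUTES);
long time = (System.nanoTime() - beg) / 1000000; // elapsed milliseconds
System.out.println("Time is " + time);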
I tried launching your Sieve tasks using a fixed thread pool, a cached thread pool and a fork join pool and comparing the results of different values for your thread variable.
I see the following results (in milliseconds) on my machine with x=10000000:
Thread count = 1 2 4 8 16
Fixed thread pool = 5451 3866 3639 3227 3120
Cached thread pool= 5434 3763 3709 3258 3078
Fork-join pool = 6732 3670 3735 3190 3102
What these results show us is a clear benefit of changing from a single thread of execution to two threads. However, the benefit of additional threads drops off rapidly. There's an interesting plateau going from two to four threads and marginal benefits up to 16.
In addition, you can also see that the different threading mechanisms have different initial overhead: I didn't expect the Fork-Join pool to cost that much more to start than the other mechanisms.
So, as written, you shouldn't really expect a benefit past two threads for small but non-trivial problem sets.
If you'd like to increase the benefit of additional threads, you're going to need to look at your current implementation. For example, when I switched from your synchronized getCounter() to an AtomicInteger using incrementAndGet(), I eliminated the overhead of the synchronized method. The result is that all of my four thread numbers dropped on the order of 1000 milliseconds.
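That change inside the Sieve class could look like this (a sketch, not the answerer's exact code; getAndIncrement() mirrors the original counter++ post-increment):

import java.util.concurrent.atomic.AtomicInteger;

// Lock-free replacement for the synchronized counter
private final AtomicInteger counter = new AtomicInteger(2);

private int getCounter() {
    int n = counter.getAndIncrement(); // atomic post-increment, no monitor acquisition
    return n < x ? n : -1;             // still signals "no more numbers" with -1
}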