I'm new to parallel programming. I wrote a method for practice, but every time I run it the normal version takes less time to execute than the parallel version. Is something wrong with my implementation?
public class normalExecutor {
public normalExecutor() {
}
public int[][] matriz = new int[3000][3000];
public void search() {
long startTime = System.currentTimeMillis();
int biggest = 0;
matriz[800][800] = 9;
for (int i = 0 ; i < 3000; i++) {
for (int j = 0; j < 3000; j++) {
if(matriz[i][j] == 9) {
long stopTime = System.currentTimeMillis();
long elapsedTime = stopTime - startTime;
System.out.println("NOW normal "+ i + "|" + j + ": " + elapsedTime);
}
}
}
}
}
And this was my attempt at the parallel version:
public class ParallelExecutor {
final ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
final List<Future<?>> futures = new ArrayList<>();
public int[][] matriz = new int[3000][3000];
public ParallelExecutor() {
}
public void parallelSearch() {
long startTime = System.currentTimeMillis();
matriz[800][800] = 9;
for (int i = 0 ; i < 3000; i++) {
for (int j = 0; j < 3000; j++) {
int x = i;
int z = j;
Future<?> future = executor.submit(() -> {
if(matriz[x][z] == 9) {
long stopTime = System.currentTimeMillis();
long elapsedTime = stopTime - startTime;
System.out.println("NOW parallel "+ x + "|" + z+ ": " + elapsedTime);
}
});
}
}
}
}
Even though the parallel version sometimes prints first, the output always looks like this:
NOW parallel 800|800: 3089
NOW normal 800|800: 21
Thanks
You are submitting a very small, fast piece of work to the executor 9 million times. The time needed to create a Runnable for each cell, queue it, wait for a free thread, and run it is far greater than the cost of the comparison itself.
The right approach is to split the iteration over the 3000x3000 matrix into a few large chunks. For example, give each task 500 rows to process. This way you will have 6 tasks processing independent data in parallel, on as many threads as the pool provides.
I changed your code to show how much faster parallel processing can be when each row takes even 2 milliseconds to process.
I had to make two changes.
First, I moved the cell containing 9 to the middle of the matrix, so that the normal search cannot find it quickly.
Second, I added Thread.sleep to simulate long-running work, in order to justify parallel processing.
final ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
final List<Future<?>> futures = new ArrayList<>();
public int[][] matriz = new int[3000][3000];
public void parallelSearch() {
long startTime = System.currentTimeMillis();
matriz[1580][1] = 9;
executor.submit( () -> search( 0, 500, startTime) );
executor.submit( () -> search( 500, 1000, startTime) );
executor.submit( () -> search( 1000, 1500, startTime) );
executor.submit( () -> search( 1500, 2000, startTime) );
executor.submit( () -> search( 2000, 2500, startTime) );
executor.submit( () -> search( 2500, 3000, startTime) );
}
public void search(int startRow, int endRow, long startTime){
for (int i = startRow ; i < endRow; i++) {
//add some execution time to justify parallel processing
try {
Thread.sleep(2);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
for (int j = 0; j < 3000; j++) {
int x = i;
int z = j;
if(matriz[x][z] == 9) {
long stopTime = System.currentTimeMillis();
long elapsedTime = stopTime - startTime;
System.out.println("NOW parallel "+ x + "|" + z+ ": " + elapsedTime);
}
}
}
}
public void search() {
long startTime = System.currentTimeMillis();
int biggest = 0;
for (int i = 0 ; i < 3000; i++) {
try {
Thread.sleep(2);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
for (int j = 0; j < 3000; j++) {
if( matriz[i][j] == 9 ) {
long stopTime = System.currentTimeMillis();
long elapsedTime = stopTime - startTime;
System.out.println("NOW normal "+ i + "|" + j + ": " + elapsedTime);
}
}
}
}
With the code above and using Thread.sleep you will have this result:
NOW parallel 1580|1: 206
NOW normal 1580|1: 3162
Without Thread.sleep (the threading overhead is much greater than the cost of the search):
NOW parallel 1580|1: 46
NOW normal 1580|1: 9
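The six hard-coded submit calls above can be generalized. Here is a minimal sketch, assuming you want one task per contiguous block of rows and want the caller to wait for the result. The class name ChunkedSearch and the methods searchRows and parallelSearch are made up for this illustration; invokeAll is the standard ExecutorService call that blocks until every task has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ChunkedSearch {

    /** Scans rows [startRow, endRow) and returns {row, col} of the first match, or null. */
    static int[] searchRows(int[][] matrix, int target, int startRow, int endRow) {
        for (int i = startRow; i < endRow; i++) {
            for (int j = 0; j < matrix[i].length; j++) {
                if (matrix[i][j] == target) {
                    return new int[] {i, j};
                }
            }
        }
        return null;
    }

    /** Splits the rows into contiguous chunks, one task per chunk, and waits for all of them. */
    static int[] parallelSearch(int[][] matrix, int target, int chunks) {
        ExecutorService executor = Executors.newFixedThreadPool(chunks);
        try {
            int rowsPerChunk = (matrix.length + chunks - 1) / chunks; // ceiling division
            List<Callable<int[]>> tasks = new ArrayList<>();
            for (int start = 0; start < matrix.length; start += rowsPerChunk) {
                final int from = start;
                final int to = Math.min(start + rowsPerChunk, matrix.length);
                tasks.add(() -> searchRows(matrix, target, from, to));
            }
            // invokeAll blocks until every chunk is done; futures come back in task order
            for (Future<int[]> future : executor.invokeAll(tasks)) {
                int[] hit = future.get();
                if (hit != null) {
                    return hit;
                }
            }
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        int[][] matriz = new int[3000][3000];
        matriz[1580][1] = 9;
        long startTime = System.nanoTime();
        int[] hit = parallelSearch(matriz, 9, 6);
        long elapsedMs = (System.nanoTime() - startTime) / 1_000_000;
        System.out.println("Found at " + hit[0] + "|" + hit[1] + " in " + elapsedMs + " ms");
    }
}
```

Because the caller waits on the futures, the elapsed time here covers the whole search rather than only the moment one task happens to print.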
I would like to measure the program's runtime in the code (only the time spent searching for min and max). My problem is that the search for min and max happens inside the loop, so the runtime is printed once per iteration. How can I measure the time and display only ONE result on the screen?
long realTime = System.currentTimeMillis();
for (int i = 0; i < PARTITIONS; i++) {
final int partition = i;
threads[i] = new Thread(new Runnable() {
@Override
public void run() {
// find min,max
int from = arrayToSearch.length * partition / PARTITIONS;
int to = arrayToSearch.length * (partition + 1) / PARTITIONS;
int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
for (int j = from; j < to; j++) {
min = Math.min(min, arrayToSearch[j]);
max = Math.max(max, arrayToSearch[j]);
}
partitionMin[partition] = min;
partitionMax[partition] = max;
long execcutionTime = System.currentTimeMillis() - realTime;
System.out.println("exec time: " + execcutionTime + "ms");
}
});
In this program, PARTITIONS = 4, so the execution time is printed four times.
Well, if you don't care which result to show, then the simplest way is to print execution time only if partition is 0, like so:
if (partition == 0) {
System.out.println("exec time: " + execcutionTime + "ms");
}
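Just move these two lines of your code out of the for loop (and make sure all the threads have been joined first, otherwise you only measure the time needed to start them):
long execcutionTime = System.currentTimeMillis() - realTime;
System.out.println("exec time: " + execcutionTime + "ms");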
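To make the "measure once, outside the loop" idea concrete, here is a rough sketch that starts every thread, joins them all, and only then takes a single timestamp. It reuses the names PARTITIONS, partitionMin and partitionMax from the question; the class name MinMaxTimer and the method parallelMinMax are made up for this example.

```java
final class MinMaxTimer {
    static final int PARTITIONS = 4;

    /** Returns {min, max} of the array, computed by PARTITIONS threads. */
    static int[] parallelMinMax(int[] arrayToSearch) {
        int[] partitionMin = new int[PARTITIONS];
        int[] partitionMax = new int[PARTITIONS];
        Thread[] threads = new Thread[PARTITIONS];
        for (int i = 0; i < PARTITIONS; i++) {
            final int partition = i;
            threads[i] = new Thread(() -> {
                // long arithmetic avoids int overflow for very large arrays
                int from = (int) ((long) arrayToSearch.length * partition / PARTITIONS);
                int to = (int) ((long) arrayToSearch.length * (partition + 1) / PARTITIONS);
                int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
                for (int j = from; j < to; j++) {
                    min = Math.min(min, arrayToSearch[j]);
                    max = Math.max(max, arrayToSearch[j]);
                }
                partitionMin[partition] = min;
                partitionMax[partition] = max;
            });
            threads[i].start();
        }
        // wait for every partition before combining the partial results
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
        for (int p = 0; p < PARTITIONS; p++) {
            min = Math.min(min, partitionMin[p]);
            max = Math.max(max, partitionMax[p]);
        }
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        int[] data = new java.util.Random(42).ints(5_000_000).toArray();
        long realTime = System.currentTimeMillis();
        int[] result = parallelMinMax(data);
        // single measurement, taken after all threads have finished
        long executionTime = System.currentTimeMillis() - realTime;
        System.out.println("min=" + result[0] + " max=" + result[1]);
        System.out.println("exec time: " + executionTime + "ms");
    }
}
```

This prints one "exec time" line no matter how many partitions there are, and the time covers the slowest partition rather than whichever one finishes first.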
public class Runtime {
public static void main(String[] args) {
int[] n = {1,100,1000,10000};
for (int i=0; i<4; i++) {
StringRepeater s = new StringRepeater();
long start = System.nanoTime();
s.repeatString("hello", n[i]);
long stop = System.nanoTime();
long runtime = stop - start;
System.out.println("T(" + n[i] + ") = " + runtime/1000000000.0 + " seconds");
}
for (int i=0; i<4; i++) {
long start = 0;
long stop = 0;
long runtime100 = 0;
for (int j=0; j<100; j++) {
StringRepeater s = new StringRepeater();
start = System.nanoTime();
s.repeatString("hello", n[i]);
stop = System.nanoTime();
runtime100 = runtime100 + (stop - start);
}
System.out.println("T(" + n[i] + ") = " + runtime100/100000000000.0 + " seconds");
}
}
}
So I've got this code, which measures the runtime of repeatString:
public class StringRepeater {
public String repeatString(String s, int n){
String result = "";
for(int i=0; i<n; i++) {
result = result + s;
}
return result;
}
}
The top part, with one for loop, measures the runtime of a single run. The bottom part, with two nested for loops, measures the average over 100 runs. But for some reason the average runtime in the second part is much lower, especially for small n. For n=1 it is even about 100 times faster.
T(1) = 2.3405E-5 seconds
T(100) = 1.47748E-4 seconds
T(1000) = 0.00358515 seconds
T(10000) = 0.173254266 seconds
T(1) = 1.9015E-7 seconds
T(100) = 3.035997E-5 seconds
T(1000) = 0.00168481277 seconds
T(10000) = 0.10354477848 seconds
This is roughly the typical output. Is my code wrong, or is there something else going on? TL;DR: why is the average runtime so much lower than the runtime of a single run? You would expect them to be fairly similar, right?
There are many things that require attention:
1 - It's better to avoid division when reporting execution time, because you can run into precision problems. So, a first suggestion: keep the measured time in nanoseconds.
2 - The performance difference is probably due to just-in-time compilation: the first time the JVM executes the code it is interpreted, and compiling the bytecode on the fly takes some time. To demonstrate this, simply swap the two loops in your code. I did it for you:
public class Runtime {
public static void main(String[] args) {
int[] n = { 1, 100, 1000, 10000 };
for (int i = 0; i < 4; i++) {
long start = 0;
long stop = 0;
long runtime100 = 0;
for (int j = 0; j < 100; j++) {
StringRepeater s = new StringRepeater();
start = System.nanoTime();
s.repeatString("hello", n[i]);
stop = System.nanoTime();
runtime100 = runtime100 + (stop - start);
}
System.out.println("T(" + n[i] + ") = " + runtime100 / 100.0 + " seconds");
}
for (int i = 0; i < 4; i++) {
StringRepeater s = new StringRepeater();
long start = System.nanoTime();
s.repeatString("hello", n[i]);
long stop = System.nanoTime();
long runtime = stop - start;
//System.out.println("T(" + n[i] + ") = " + runtime / 1000000000.0 + " seconds");
System.out.println("T(" + n[i] + ") = " + runtime + " seconds");
}
}
public static class StringRepeater {
public String repeatString(String s, int n) {
String result = "";
for (int i = 0; i < n; i++) {
result = result + s;
}
return result;
}
}
}
When I run this code on my machine I obtain the following results (the values are in nanoseconds, even though the label still says "seconds"):
T(1) = 985.31 seconds
T(100) = 109439.19 seconds
T(1000) = 2604811.29 seconds
T(10000) = 1.1787790449E8 seconds
T(1) = 821 seconds
T(100) = 18886 seconds
T(1000) = 1099442 seconds
T(10000) = 121750683 seconds
You can see that the 100-run loop is now slower than the single-run loop. This is because it now runs first.
3 - If you look at the results above, you will probably notice that the situation is now the opposite of the initial one. Why? In my opinion, this is due to the garbage collector. In the larger loop the garbage collector has more work to do, simply because there are many temporary objects to collect.
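One common way to reduce the effect of just-in-time compilation on a hand-written benchmark is to run the code a number of times before starting the clock. Below is a minimal sketch of that idea, not a rigorous benchmark: the class name WarmupBenchmark, the method averageNanos and the warm-up count of 1000 are arbitrary choices for this illustration. For serious measurements a harness such as JMH is generally the safer choice.

```java
final class WarmupBenchmark {
    // results are accumulated here so the JIT cannot discard the calls as dead code
    static long sink = 0;

    static String repeatString(String s, int n) {
        String result = "";
        for (int i = 0; i < n; i++) {
            result = result + s;
        }
        return result;
    }

    /** Average nanoseconds per call over `runs` timed calls, after `warmupRuns` untimed calls. */
    static double averageNanos(int n, int warmupRuns, int runs) {
        // untimed warm-up: gives the JIT a chance to compile repeatString
        for (int i = 0; i < warmupRuns; i++) {
            sink += repeatString("hello", n).length();
        }
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            String result = repeatString("hello", n);
            total += System.nanoTime() - start;
            sink += result.length();
        }
        return (double) total / runs;
    }

    public static void main(String[] args) {
        int[] n = {1, 100, 1000, 10000};
        for (int size : n) {
            System.out.println("T(" + size + ") = " + averageNanos(size, 1000, 100) + " ns");
        }
    }
}
```

With the warm-up in place, swapping the order of the measurements should matter much less, although garbage collection pauses can still show up in individual runs.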
I hope it helps you.
I am running a very big for loop with 10 million iterations. When I run it in one go it takes 14 seconds, but when I break it into 20 chunks of 500k iterations each it takes only 6 seconds. I don't understand this behavior. Is there a problem with my code? Thanks!
Code
public class Benchmark {
static int max = 10000000;
static int start = 0;
static int end = 0;
static boolean dnc = false;
public static void main(String[] args) {
TimeIt timer = new TimeIt();
timer.printTime("bmTimer dnc false", () -> bmTimer());
dnc = true;
sum = 0;
timer.printTime("bmTimer dnc true", () -> bmTimer());
}
private static void bmTimer() {
if (dnc) {
int factor = 500000;
for (int i = 0; i < max; i += factor) {
end = start + factor;
bm(start, end);
start = end + 1;
}
} else {
bm(0, max);
}
}
static int sum = 0;
private static void bm(int start, int end) {
try {
ExecutorService executor = Executors.newFixedThreadPool(4);
List<Future<String>> futures = new ArrayList<>();
for (int j = start; j < end; j++) {
futures.add(executor.submit(new Callable<String>() {
@Override
public String call() throws Exception {
int i = 10;
int j = 9;
return (i - j) + "";
}
}));
}
for (Future<String> future : futures) {
sum += Integer.parseInt(future.get());
}
System.out.println(sum);
executor.shutdown();
executor.awaitTermination(1, TimeUnit.DAYS);
} catch (Exception e) {
e.printStackTrace();
}
}
}
Output
10000000
Method bmTimer dnc false took : 14.39s
500000
1000000
1500000
2000000
2500000
3000000
3500000
4000000
4500000
5000000
5500000
6000000
6500000
7000000
7500000
8000000
8500000
9000000
9500000
10000000
Method bmTimer dnc true took : 5.856s
Can someone help me with this, please? I'm trying to do a matrix multiplication, using threads. This is what I have so far:
//updated
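import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class Multiplication {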
public static final int NUM_OF_THREADS = 8;
public static final int MATRIX_SIZE = 1000;
public static void main(String args[]) throws InterruptedException {
long startTime = System.currentTimeMillis();
int MatrixA[][] = matrixGenerator();
int MatrixB[][] = matrixGenerator();
int m1rows = MatrixA.length;
int m1cols = MatrixA[0].length;
int m2cols = MatrixB[0].length;
int MatrixC[][] = new int[m1rows][m2cols];
ExecutorService pool = Executors.newFixedThreadPool(NUM_OF_THREADS);
for (int row1 = 0; row1 < m1rows; row1++) {
for (int col1 = 0; col1 < m2cols; col1++) { // the result matrix has m2cols columns
pool.submit(new MultiplicationThreading(row1, col1, MatrixA, MatrixB, MatrixC));
}
}
pool.shutdown();
pool.awaitTermination(1, TimeUnit.DAYS);
long endTime = System.currentTimeMillis();
System.out.println("Calculated in "
+ (endTime - startTime) + " milliseconds");
}
public static int[][] matrixGenerator() {
int matrix[][] = new int[MATRIX_SIZE][MATRIX_SIZE];
Random r = new Random();
for (int i = 0; i < matrix.length; i++) {
for (int j = 0; j < matrix[i].length; j++) {
matrix[i][j] = r.nextInt(10000);
}
}
return matrix;
}
}
//I have updated the code
I get better timings now. With 2 threads it takes about 1.5k milliseconds, and with 8 threads about 1.3k milliseconds.
You initialize the thrd array with NUM_THREADS == 9 elements. If m1rows*m1cols exceeds that value, you will get this problem, since you attempt to create more than 9 threads and assign them to elements of the array. (You are attempting to create 50 threads).
Two solutions:
Initialize thrd = new Thread[m1rows*m1cols]
Use a List<Thread>.
Note that you won't execute the threads in parallel, because you are calling Thread.join() immediately after calling Thread.start(). This just blocks the current thread until thrd[threadcount] finishes.
Move the Thread.join() calls into a separate loop, so the threads are all started before you call join on any of them.
for (row = 0; row < m1rows; row++) {
for (col = 0; col < m1cols; col++) {
// creating thread for multiplications
thrd[threadcount] = new Thread(new MultiplicationThreading(row, col, MatrixA, MatrixB, MatrixC));
thrd[threadcount].start(); //thread start
threadcount++;
}
}
for (Thread thread : thrd) {
thread.join();
}
I have the following pieces of code:
long start = System.currentTimeMillis();
for(int i = 0; i < keys.length; ++i) {
obj.getElement(keys[i]);
}
long total = System.currentTimeMillis() - start;
System.out.println(total/1000d + " seconds");
And the following:
long start = System.currentTimeMillis();
for(int i = 0; i < keys.length; ++i) {
obj.hasElement(keys[i]);
}
long total = System.currentTimeMillis() - start;
System.out.println(total/1000d + " seconds");
The implementations of these methods are:
public T getElement(int key) {
int idx = findIndexOfElement(key);
return idx >= 0? ITEMS[idx]:null;
}
public boolean hasElement(int key) {
return findIndexOfElement(key) >= 0;
}
Pretty straightforward. The only difference between the 2 methods is the conditional access to the table.
Problem: When actually measuring the performance of these snippets, getElement takes more than twice as long as hasElement.
So for a series of tests I get ~2.5seconds for the first loop of getElement and ~0.8 secs for the second loop of hasElement.
How is it possible to have such a big difference? I understand that the conditional statement is a branch and jump but still seems to me too big.
Is there a way to improve this?
Update:
The way I measure is:
long min = Long.MAX_VALUE;
long max = Long.MIN_VALUE;
long run = 0;
for(int i = 0; i < 10; ++i) {
long start = System.currentTimeMillis();
for(int j = 0; j < keys.length; ++j) {
obj.getElement(keys[j]);
}
long total = System.currentTimeMillis() - start;
System.out.println(total/1000d + " seconds");
if(total < min) {
min = total;
}
if(total > max) {
max = total;
}
run += total;
for(int k = 0; k < 50; ++k) {
System.gc();
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
System.out.println("min=" + min + " max=" + max);
System.out.println("avg = " + (double)run/1000/keys.length);
Is ITEMS definitely an array, and implemented as an array? If it is somehow implemented as a linked list, that would cause O(n) time instead of O(1) time on the get.
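To illustrate why the backing structure matters, here is a small sketch comparing an indexed read on an array with the same read on a hand-rolled singly linked list. Everything in it (IndexedAccessDemo, Node, linkedGet, arrayGet) is hypothetical and only meant to show the O(1) versus O(n) difference; it says nothing about how ITEMS is actually implemented.

```java
final class IndexedAccessDemo {

    /** Hypothetical singly linked node, only for illustration. */
    static final class Node {
        final int value;
        Node next;

        Node(int value) {
            this.value = value;
        }
    }

    /** Builds a linked list holding 0, 1, ..., n - 1 and returns its head. */
    static Node buildList(int n) {
        Node head = null;
        for (int v = n - 1; v >= 0; v--) {
            Node node = new Node(v);
            node.next = head;
            head = node;
        }
        return head;
    }

    /** O(n): has to follow idx links before it can read the value. */
    static int linkedGet(Node head, int idx) {
        Node current = head;
        for (int i = 0; i < idx; i++) {
            current = current.next;
        }
        return current.value;
    }

    /** O(1): a single indexed read. */
    static int arrayGet(int[] items, int idx) {
        return items[idx];
    }

    public static void main(String[] args) {
        int n = 10_000;
        int[] items = new int[n];
        for (int i = 0; i < n; i++) {
            items[i] = i;
        }
        Node head = buildList(n);

        long sum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sum += arrayGet(items, i);
        }
        long arrayNanos = System.nanoTime() - start;

        start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sum += linkedGet(head, i);
        }
        long linkedNanos = System.nanoTime() - start;

        System.out.println("array: " + arrayNanos + " ns, linked: " + linkedNanos + " ns (sum=" + sum + ")");
    }
}
```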
Your branches are probably the limiting factor in the short code posted. Both methods call findIndexOfElement, but hasElement only compares the returned index with zero, while getElement uses that comparison as a conditional and then also reads ITEMS[idx].
So in summary, getElement does additional work per call (the conditional plus the bounds-checked array read), so it is plausible that it runs noticeably slower.