I've been writing the following code for my OS course and I'm getting some weird results. The code creates x threads and runs them concurrently to multiply two square matrices. Each thread multiplies Number_of_rows/Number_of_threads rows of the input matrices.
When I run it on 1024x1024 matrices with 1 to 8 threads, the fastest multiplication happens with only one thread. I would expect a MacBook Pro with an i5 processor (2 cores) to use both cores and therefore be faster with two threads.
Running time is about 9.2 seconds with one thread, and rises from about 9.6 seconds to 27 seconds as I go up to 8 threads.
Any idea why this is happening?
BTW, a few things about the code:
a. Assume that both matrices have identical dimensions and are square.
b. Assume that number of threads <= number of rows/columns.
import java.util.Random;
public class MatrixMultThread implements Runnable {
final static int MATRIX_SIZE = 1024;
final static int MAX_THREADS = MATRIX_SIZE;
private float[][] a;
private float[][] b;
private float[][] res;
private int startIndex;
private int endIndex;
public MatrixMultThread(float[][] a, float[][]b, float[][] res, int startIndex, int endIndex) {
this.a = a;
this.b = b;
this.res = res;
this.startIndex = startIndex;
this.endIndex = endIndex;
}
public void run() {
float value = 0;
for (int k = startIndex; k < endIndex; k++) {
for (int i = 0; i < a.length; i++) {
for (int j = 0; j < a.length; j++) {
value += a[k][j]*b[j][i];
}
res[k][i] = value;
value = 0;
}
}
}
public static float[][] mult(float[][] a, float[][] b, int threadCount){
// Get number of rows per each thread.
int rowsPerThread = (int) Math.ceil(MATRIX_SIZE / threadCount);
float[][] res = new float[MATRIX_SIZE][MATRIX_SIZE];
// Create thread array
Thread[] threadsArray = new Thread[threadCount];
int rowCounter = 0;
for (int i = 0; i < threadCount; i++) {
threadsArray[i] = new Thread(new MatrixMultThread(a,b,res,rowCounter, Math.max(rowCounter + rowsPerThread, MATRIX_SIZE)));
threadsArray[i].start();
rowCounter += rowsPerThread;
}
// Wait for all threads to end before finishing execution.
for (int i = 0; i < threadCount; i++) {
try {
threadsArray[i].join();
} catch (InterruptedException e) {
System.out.println("join failed");
}
}
return res;
}
public static void main(String args[]) {
// Create matrices and random generator
Random randomGenerator = new Random();
float[][] a = new float[MATRIX_SIZE][MATRIX_SIZE];
float[][] b = new float[MATRIX_SIZE][MATRIX_SIZE];
// Initialize two matrices with initial values from 1 to 10.
for (int i = 0; i < a.length; i++) {
for (int j = 0; j < a.length; j++) {
a[i][j] = randomGenerator.nextFloat() * randomGenerator.nextInt(100);
b[i][j] = randomGenerator.nextFloat() * randomGenerator.nextInt(100);
}
}
long startTime;
for (int i = 1; i <= 8; i++) {
startTime = System.currentTimeMillis();
mult(a,b,i);
System.out.println("Total running time is: " + (System.currentTimeMillis() - startTime) + " ms");
}
}
}
First, a bit of logging helps. I added logging to your code and found a bug in your logic.
Here is the log:
Starting execution for thread count: 1
Start index: 0
End index: 1024
Starting execution: MatrixMultiplier: 0
Ending executionMatrixMultiplier: 0
Total running time is: 6593 ms
Starting execution for thread count: 2
Start index: 0
End index: 1024 <------ This is the problem area
Start index: 512
End index: 1024
Starting execution: MatrixMultiplier: 1
Starting execution: MatrixMultiplier: 0
In every iteration, your first thread performs the whole multiplication. That's why you are not seeing any speedup. Figure out the bug.
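For reference, the end index shown in the log comes from the Math.max call in mult, which pushes every range's end up to MATRIX_SIZE. Below is a minimal sketch of one possible fix, assuming a hypothetical helper named rowRanges (not part of the original code). It clamps with Math.min and uses integer ceiling division, because Math.ceil(MATRIX_SIZE / threadCount) truncates before it rounds.

```java
import java.util.Arrays;

class RowPartition {
    // Returns {start, end} pairs (end exclusive) that split `size` rows across `threadCount` workers.
    // rowRanges is a hypothetical helper name used only for this sketch.
    static int[][] rowRanges(int size, int threadCount) {
        // Integer ceiling division; Math.ceil(size / threadCount) would truncate before rounding.
        int rowsPerThread = (size + threadCount - 1) / threadCount;
        int[][] ranges = new int[threadCount][2];
        int start = 0;
        for (int i = 0; i < threadCount; i++) {
            // Math.min clamps the end to the matrix size; Math.max made every range end at `size`.
            int end = Math.min(start + rowsPerThread, size);
            ranges[i][0] = start;
            ranges[i][1] = end;
            start = end;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (int[] range : rowRanges(1024, 3)) {
            System.out.println(Arrays.toString(range));
        }
    }
}
```

Each pair could then be passed as startIndex and endIndex to a MatrixMultThread, so that no row is computed twice.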
I have the following Class with some sorting algorithms like MaxSort, BubbleSort, etc.:
class ArrayUtility {
public static int returnPosMax(int[] A, int i, int j) {
int max = A[i]; // start from the first value in the range, not from the index i
int position = i;
for(int c = 0; c <= j; c++){
if(c >= i){
if(A[c] > max){
max = A[c];
position = c;
}
}
}
return position;
}
public static int returnMax(int[] A, int i, int j) {
return A[returnPosMax(A, i, j)];
}
public static void swap(int[] A, int i, int j) {
int b = A[i];
A[i] = A[j];
A[j] = b;
}
public static void MaxSort(int[] A) {
int posMax;
for(int i = A.length - 1; i >= 0; i--){
posMax = returnPosMax(A, 0, i);
swap(A, posMax, i);
}
}
public static void BubbleSort(int[] A) {
boolean flag = true;
while (flag != false){
flag = false;
for(int i = 1; i <= A.length - 1; i++){
if(A[i-1]>A[i]){
swap(A, i-1, i);
flag = true;
}
}
if(!flag) { // compare, don't assign: "flag = false" never takes this branch
break;
}
for(int i = A.length - 1; i >= 1; i--){
if(A[i-1]>A[i]){
swap(A, i - 1, i);
flag = true;
}
}
}
}
public static void BubbleSortX(int[] A) {
boolean flag = true;
while (flag != false){
flag = false;
for(int i = 1; i <= A.length - 1; i++){
if(A[i-1]>A[i]){
swap(A, i-1, i);
flag = true;
}
}
}
}
}
Now I have to create a test class to evaluate the different sorting algorithms on randomly generated arrays of different lengths:
import java.util.Random;
import java.util.Arrays;
public class TestSorting{
public static void main(String[] args){
int[] lengthArray = {100, 1000, 10000, 100000};
for(int i = 0; i <= lengthArray.length - 1; i++){
int[] arr = new int[i];
for(int j = 0; j < i; j++){
Random rd = new Random();
int randInt = rd.nextInt();
arr[j] = randInt;
}
/* long startTime = System.nanoTime();
ArrayUtility.MaxSort(arr);
long cpuTime = System.nanoTime() - startTime;
System.out.println("Time: " + cpuTime + " - Array with Length: " + lengthArray[i] + " Using MaxSort"); */
/* long startTime = System.nanoTime();
ArrayUtility.BubbleSortX(arr);
long cpuTime = System.nanoTime() - startTime;
System.out.println("Time: " + cpuTime + " - Array with Length: " + lengthArray[i] + " Using BubbleSortX"); */
long startTime = System.nanoTime();
ArrayUtility.BubbleSort(arr);
long cpuTime = System.nanoTime() - startTime;
System.out.println("Time: " + cpuTime + " - Array with Length: " + lengthArray[i] + " Using BubbleSort");
/* long startTime = System.nanoTime();
Arrays.sort(arr);
long cpuTime = System.nanoTime() - startTime;
System.out.println("Time: " + cpuTime + " - Array with Length: " + lengthArray[i] + " Using Arrays.sort"); */
}
}
}
Now when I run a certain sorting algorithm (with the others commented out for the time being), I get weird results, for example:
Time: 1049500 - Array with Length: 100 Using BubbleSort
Time: 2200 - Array with Length: 1000 Using BubbleSort
Time: 13300 - Array with Length: 10000 Using BubbleSort
Time: 3900 - Array with Length: 100000 Using BubbleSort
Every time I run the test I get different results, and arrays with 10 times the length sometimes take less time to sort. I also don't understand why the array with 100 integers takes so long.
TL;DR: your benchmark is wrong.
Explanation
To write a good benchmark, you need to do a lot of research. A good starting point is this article and this talk by Aleksey Shipilev, the author of the micro-benchmark toolkit JMH.
Main rules for benchmarking:
Warm up! Do a bunch (like, thousands) of warmup rounds before you actually measure anything. This lets the JIT compiler do its job and apply its optimizations.
Monitor your GC closely. A GC event can skew the results drastically.
To even out such effects, repeat the benchmark many (hundreds of thousands of) times and take the average.
All this can be done in JMH.
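To make the rules above concrete, here is a minimal hand-rolled sketch of warmup plus averaging. It is not JMH and is far less rigorous than JMH. The class name SimpleBench, the method averageNanos, and the round counts are assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.Consumer;

class SimpleBench {
    // Runs `sorter` on fresh copies of `input`: first `warmups` untimed rounds, then `rounds` timed ones.
    // Returns the average time of a timed round in nanoseconds.
    static long averageNanos(Consumer<int[]> sorter, int[] input, int warmups, int rounds) {
        for (int i = 0; i < warmups; i++) {
            sorter.accept(input.clone()); // untimed: gives the JIT a chance to compile the hot code
        }
        long total = 0;
        for (int i = 0; i < rounds; i++) {
            int[] copy = input.clone(); // copied outside the timed region, so each round sorts unsorted data
            long start = System.nanoTime();
            sorter.accept(copy);
            total += System.nanoTime() - start;
        }
        return total / rounds;
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(10_000).toArray();
        long avg = averageNanos(Arrays::sort, data, 1_000, 1_000);
        System.out.println("Arrays.sort, 10000 elements: " + avg + " ns on average");
    }
}
```

Sorting a copy in every round matters here: otherwise only the first round would sort random data, and every later round would measure an already sorted array.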
I took a snippet out of your code to show you where it is buggy.
public static void main(String[] args){
    int[] lengthArray = {100, 1000, 10000, 100000};
    for(int i = 0; i <= lengthArray.length - 1; i++) { // this loop goes from 0 to 3
        int[] arr = new int[i]; // that's why this array will have a size of 0 to 3
        // correct line would be:
        // int[] arr = new int[lengthArray[i]];
        for(int j = 0; j < i; j++) {
            // correct line would be:
            // for (int j = 0; j < arr.length; j++) {
...
Additionally, Dmitry's advice on benchmarking is important to keep in mind.
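Put together, the corrected array setup could look like the sketch below. The class name RandomArrays and the helper randomArray are hypothetical, and the sketch also reuses a single Random instead of creating one per element.

```java
import java.util.Random;

class RandomArrays {
    // Builds an array of exactly `length` random ints, reusing one Random instance.
    // randomArray is a hypothetical helper name used only for this sketch.
    static int[] randomArray(int length, Random rd) {
        int[] arr = new int[length];
        for (int j = 0; j < arr.length; j++) {
            arr[j] = rd.nextInt();
        }
        return arr;
    }

    public static void main(String[] args) {
        int[] lengthArray = {100, 1000, 10000, 100000};
        Random rd = new Random();
        for (int length : lengthArray) {
            int[] arr = randomArray(length, rd); // sized by the value, not by the loop index
            System.out.println("Created array with length " + arr.length);
        }
    }
}
```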
Can someone help me with this, please? I'm trying to do a matrix multiplication, using threads. This is what I have so far:
//updated
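import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class Multiplication {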
public static final int NUM_OF_THREADS = 8;
public static final int MATRIX_SIZE = 1000;
public static void main(String args[]) throws InterruptedException {
long startTime = System.currentTimeMillis();
int MatrixA[][] = matrixGenerator();
int MatrixB[][] = matrixGenerator();
int m1rows = MatrixA.length;
int m1cols = MatrixA[0].length;
int m2cols = MatrixB[0].length;
int MatrixC[][] = new int[m1rows][m2cols];
ExecutorService pool = Executors.newFixedThreadPool(NUM_OF_THREADS);
for (int row1 = 0; row1 < m1rows; row1++) {
for (int col1 = 0; col1 < m1cols; col1++) {
pool.submit(new MultiplicationThreading(row1, col1, MatrixA, MatrixB, MatrixC));
}
}
pool.shutdown();
pool.awaitTermination(1, TimeUnit.DAYS);
long endTime = System.currentTimeMillis();
System.out.println("Calculated in "
+ (endTime - startTime) + " milliseconds");
}
public static int[][] matrixGenerator() {
int matrix[][] = new int[MATRIX_SIZE][MATRIX_SIZE];
Random r = new Random();
for (int i = 0; i < matrix.length; i++) {
for (int j = 0; j < matrix[i].length; j++) {
matrix[i][j] = r.nextInt(10000);
}
}
return matrix;
}
}
//I have updated the code
I get better timings now: 1.5k milliseconds with 2 threads and 1.3k milliseconds with 8 threads.
You initialize the thrd array with NUM_THREADS == 9 elements. If m1rows*m1cols exceeds that value, you will get this problem, since you attempt to create more than 9 threads and assign them to elements of the array. (You are attempting to create 50 threads).
Two solutions:
Initialize thrd = new Thread[m1rows*m1cols]
Use a List<Thread>.
Note that you won't execute the threads in parallel, because you are calling Thread.join() immediately after calling Thread.start(). This just blocks the current thread until thrd[threadcount] finishes.
Move the Thread.join() calls into a separate loop, so the threads are all started before you call join on any of them.
for (row = 0; row < m1rows; row++) {
    for (col = 0; col < m1cols; col++) {
        // create and start a thread for each multiplication
        thrd[threadcount] = new Thread(new MultiplicationThreading(row, col, MatrixA, MatrixB, MatrixC));
        thrd[threadcount].start();
        threadcount++;
    }
}
// join only after every thread has been started
for (Thread thread : thrd) {
    thread.join();
}
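Since the snippet above depends on classes that are not shown, here is a self-contained sketch of the same start-everything-then-join pattern. It departs from the original in one way, which is an assumption of this sketch: it uses one thread per result row rather than one per cell, to keep the thread count small. The class name StartThenJoin is hypothetical.

```java
import java.util.Arrays;

class StartThenJoin {
    // Multiplies a (m x n) by b (n x k) using one thread per row of the result.
    static int[][] multiply(int[][] a, int[][] b) {
        int rows = a.length, inner = b.length, cols = b[0].length;
        int[][] c = new int[rows][cols];
        Thread[] workers = new Thread[rows];
        for (int row = 0; row < rows; row++) {
            final int r = row;
            workers[row] = new Thread(() -> {
                for (int col = 0; col < cols; col++) {
                    int sum = 0;
                    for (int i = 0; i < inner; i++) {
                        sum += a[r][i] * b[i][col];
                    }
                    c[r][col] = sum; // each thread writes only its own row
                }
            });
            workers[row].start(); // start without joining, so the threads can overlap
        }
        for (Thread worker : workers) { // join in a second loop, after all threads are running
            try {
                worker.join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(ex);
            }
        }
        return c;
    }

    public static void main(String[] args) {
        int[][] a = {{1, 2}, {3, 4}};
        int[][] b = {{5, 6}, {7, 8}};
        System.out.println(Arrays.deepToString(multiply(a, b))); // prints [[19, 22], [43, 50]]
    }
}
```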
I'm writing a program which is supposed to find the top 25 numbers in a large array using threads. My algorithm seems to work fine; however, when I compare the result to an Arrays.sort-ed version of the original array, my top-25 list seems to miss some of the numbers. I really hate posting this much code in a question, but I'm completely stuck on this, and have been for a couple of hours now. I'd love some help figuring out what's wrong. Here are my classes:
Main.java
import java.util.Arrays;
import java.util.Random;
public class Main {
public static void main(String[] args) {
final int NUM_THRS = 4;
int[] numbers = new int[500];
Random generator = new Random(500);
for(int i = 0; i < numbers.length; i++) {
numbers[i] = Math.abs(generator.nextInt());
}
Thread[] thrs = new Thread[NUM_THRS];
NumberThread[] nthrs = new NumberThread[NUM_THRS];
long startTime = System.currentTimeMillis();
for(int i = 0; i < thrs.length; i++) {
int start = getStart(i, thrs.length, numbers.length);
int stop = getStop(i, thrs.length, numbers.length);
nthrs[i] = new NumberThread(numbers, start, stop);
thrs[i] = new Thread(nthrs[i]);
thrs[i].start();
}
for (int i = 0; i < thrs.length; i++) {
try {
thrs[i].join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
int[] top = new int[25];
int[] indices = new int[NUM_THRS];
for (int i = 0; i < indices.length; i++) {
indices[i] = 24;
}
for(int i = 0; i < top.length; i++) {
top[i] = getMax(nthrs, indices);
}
for (int i = 0; i < top.length; i++) {
System.out.println(top[i]);
}
}
public static int getMax(NumberThread[] thrs, int[] indices) {
int maxNum = 0;
int maxIdx = 0;
for(int i = 0; i < indices.length; i++) {
if(indices[i] >= 0) {
if(thrs[i].topNums[indices[i]] > maxNum) {
maxNum = thrs[i].topNums[indices[i]];
maxIdx = i;
}
}
}
System.out.println("iterate");
indices[maxIdx] = indices[maxIdx]-1;
return maxNum;
}
public static int getStart(int i, int total, int len) {
return i*len/total;
}
public static int getStop(int i, int total, int len) {
if(i != total-1) {
return (i+1)*len/total;
}
return len-1;
}
}
NumberThread.java
public class NumberThread implements Runnable {
int start, stop;
int[] numbers;
int[] topNums;
public NumberThread(int[] numbers, int start, int stop) {
this.numbers = numbers;
this.start = start;
this.stop = stop;
this.topNums = new int[25];
System.out.println(start + " " + stop);
}
@Override
public void run() {
for (int i = start; i <= stop; i++) {
inner: for (int j = topNums.length-1; j > 0; j--) {
if(numbers[i] > topNums[j]) {
topNums[j] = numbers[i];
break inner;
}
}
}
}
}
The numbers printed after main are not the same as the top numbers when I Arrays.sort the numbers-array and print the top 25. Some numbers seem to be missing.
Thanks a lot in advance.
I think that your NumberThread class's run method isn't doing what it should. It needs to find the 25 largest numbers in the partition you assign to it (if the array you are searching were already sorted, for example, the 25 largest numbers could all be in one partition). What it actually does is overwrite the first slot it finds that holds a smaller value than the current number, so you end up with fewer than 25 numbers, and they might not be the largest.
For example, consider the sequence 98 99 1 2 3...: 98 would get written to topNums[24] (the last slot) but then be overwritten by 99.
I'm also not sure about the getMax function. It seems to be trying to merge the different topNums arrays together; however, the arrays aren't sorted, so I don't see how it can work.
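One way to keep the k largest values of a partition without overwriting good candidates is a min-heap. This is a swapped-in technique, not the original author's approach. The sketch below is a minimal version under that assumption, and the names TopK and topK are hypothetical. Each NumberThread could call something like it on its own range, and the per-thread results could then be merged by running it once more over their concatenation.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

class TopK {
    // Returns the k largest values of numbers[start..stop) in descending order.
    static int[] topK(int[] numbers, int start, int stop, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // min-heap: the head is the smallest kept value
        for (int i = start; i < stop; i++) {
            if (heap.size() < k) {
                heap.add(numbers[i]);
            } else if (numbers[i] > heap.peek()) {
                heap.poll(); // evict the smallest kept value, never a larger one
                heap.add(numbers[i]);
            }
        }
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll(); // smallest comes out first, so fill from the back
        }
        return result;
    }

    public static void main(String[] args) {
        int[] numbers = {5, 1, 9, 3, 7, 8, 2, 6};
        int[] left = topK(numbers, 0, 4, 3);  // top 3 of the first partition
        int[] right = topK(numbers, 4, 8, 3); // top 3 of the second partition
        // Merge step: the overall top 3 must be among the partition winners.
        int[] candidates = new int[left.length + right.length];
        System.arraycopy(left, 0, candidates, 0, left.length);
        System.arraycopy(right, 0, candidates, left.length, right.length);
        System.out.println(Arrays.toString(topK(candidates, 0, candidates.length, 3))); // prints [9, 8, 7]
    }
}
```

Note that the range here is half-open (stop is exclusive), which also avoids the overlap between neighbouring partitions that an inclusive stop would cause.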
Trying to get a multi-threaded matrix multiplication to work in Java. It is given a (m x n) matrix, a (n x k) matrix and 't' threads to perform the operation on.
My program works when the matrices are square and t == n. When running with t < n, the other threads do not pick up the additional operations, and it returns a partially completed matrix. When the matrices are not square, the additional threads return array out of bounds errors and do not run. I would really appreciate any advice. Here are the relevant code snippets
Beginning threads. multipliers is an array of MatrixMultiplier, a class defined later.
Multiply multiply = new Multiply(cols_mat, rows_mat2);
for (int i = 0; i < threads; i++) {
multipliers[i] = new MatrixMultiplier(multiply);
}
for (int i = 0; i < threads; i++) {
my_threads[i] = new Thread(multipliers[i]);
}
for (int i = 0; i < threads; i++) {
my_threads[i].start();
}
for (int i = 0; i < threads; i++) {
my_threads[i].join();
}
Multiply class which defines the matrix multiplication
class Multiply extends MatrixMultiplication {
private int i;
private int j;
private int chance;
public Multiply(int i, int j) {
this.i = i;
this.j = j;
chance = 0;
}
public synchronized void multiplyMatrix() {
int sum = 0;
int a = 0;
for (a = 0; a < i; a++) {
sum = 0;
for (int b = 0; b < j; b++) {
sum = sum + mat[chance][b] * mat2[b][a];
}
result[chance][a] = sum;
}
if (chance >= i)
return;
chance++;
}
}
And the matrix multiplier
class MatrixMultiplier implements Runnable {
private final Multiply mul;
public MatrixMultiplier(Multiply mul) {
this.mul = mul;
}
@Override
public void run() {
mul.multiplyMatrix();
}
}
Where I personally think the issue lies is with if (chance >= i) return; but I have not found a way to incorporate a thread's column responsibilities with the program still working. Again, any advice pointing me in the right direction would be greatly appreciated.
There are several issues with your code.
The t threads assume that only t multiplications are required to produce your result matrix. This is not the case when m != k or t != m or t != k. The threads are worker threads that should simply process whatever requests they are given. I would consider giving each MatrixMultiplier access to the m×n, n×k, and m×k matrices and to a container of row/column entries.
class MatrixMultiplier {
    private double a[][], b[][], results[][];
    private Queue<..> entries;
    ....
}
The run method will then use the entries container to calculate the sum for a given <row,column> entry of the resulting mxk matrix. The run method could become:
public void run() {
    for (RowColumnEntry entry = entries.poll(); entry != null; entry = entries.poll()) {
        int row = entry.row;
        int col = entry.col;
        double sum = 0.0;
        for (int i = 0; i < a[row].length; i++) {
            sum += a[row][i] * b[i][col];
        }
        results[row][col] = sum;
    }
}
There are three things to note here that are different from what you have:
you are not using a synchronized block
each entry calculates the answer for a unique row/column of the result matrix
the Multiply class is no longer required
You can then create t threads that process each entry in the entries container and will exit when the entries container is empty.
Note that the entries container should be one of the concurrent Queue containers available in the java.util.concurrent package.
The remaining task is to create the row/column entries container. Here is some code that you could use:
Queue<..> entries = new Concurrent...<..>();
int rowSize = a.length;
int colSize = b[0].length;
for (int row = 0; row < rowSize; row++) {
    for (int col = 0; col < colSize; col++) {
        entries.add(new RowColumnEntry(row, col));
    }
}
Note that a and b are the m×n and n×k matrices.
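The pieces above can be assembled into one runnable sketch. The choice of ConcurrentLinkedQueue is an assumption, since the snippets leave the concrete queue type elided. The names QueueMultiply and Cell are hypothetical stand-ins for the multiplier and entry classes.

```java
import java.util.Arrays;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class QueueMultiply {
    // One cell of the result matrix that still has to be computed (hypothetical entry type).
    static final class Cell {
        final int row, col;
        Cell(int row, int col) { this.row = row; this.col = col; }
    }

    // Multiplies a (m x n) by b (n x k) with `threads` workers draining a shared queue of cells.
    static double[][] multiply(double[][] a, double[][] b, int threads) {
        double[][] results = new double[a.length][b[0].length];
        Queue<Cell> entries = new ConcurrentLinkedQueue<>(); // assumed queue type; thread-safe poll()
        for (int row = 0; row < a.length; row++) {
            for (int col = 0; col < b[0].length; col++) {
                entries.add(new Cell(row, col));
            }
        }
        Runnable worker = () -> {
            // poll() returns null once the queue is empty, which ends the worker
            for (Cell cell = entries.poll(); cell != null; cell = entries.poll()) {
                double sum = 0.0;
                for (int i = 0; i < b.length; i++) {
                    sum += a[cell.row][i] * b[i][cell.col];
                }
                results[cell.row][cell.col] = sum; // each cell is written by exactly one worker
            }
        };
        Thread[] pool = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            pool[t] = new Thread(worker);
            pool[t].start();
        }
        for (Thread thread : pool) {
            try {
                thread.join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(ex);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}, {4, 5, 6}};
        double[][] b = {{7, 8}, {9, 10}, {11, 12}};
        System.out.println(Arrays.deepToString(multiply(a, b, 3)));
    }
}
```

Because the work is split per cell and not per thread index, the thread count no longer has to match any matrix dimension, and non-square inputs work the same way as square ones.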
Hope this helps.
I would like to simulate a situation that is mentioned in books about concurrency: without proper synchronization, one thread can see a stale value of a variable that has already been modified by a different thread. This can happen because of, for example, a CPU cache.
To do this I have written the following program. The idea is that there are 4 threads, each of which initializes a different part of a shared array. The 5th thread (the main, parent thread) waits until all 4 threads are done, then iterates over the shared array and sums its values (each is always 1 or, if I'm lucky, null, which would mean a stale value).
package p1;
class ArrFill implements Runnable {
int l, r;
Integer[] arr;
ArrFill(int l, int r, Integer[] arr) {
this.l = l;
this.r = r;
this.arr = arr;
}
@Override
public void run() {
for(int i = l; i < r; i++)
arr[i] = new Integer(1);
}
}
public class Main {
final static int MAX = 10000000;
final static int tnum = 4;
public static void main(String[] args) throws InterruptedException {
int cores = Runtime.getRuntime().availableProcessors();
System.out.println(cores);
Integer[] arr = new Integer[MAX];
Thread[] t = new Thread[tnum];
if(MAX % tnum != 0)
throw new IllegalStateException();
int step = MAX / tnum;
int l = 0, r = 0;
for(int i = 0; i < tnum; i++) {
l = r;
r += step;
t[i] = new Thread(new ArrFill(l, r, arr));
t[i].start();
}
for(int i = 0; i < tnum; i++)
t[i].join();
int res = 0;
for(int i = 0; i < MAX; i++)
if(arr[i] != null)
res += arr[i];
System.out.println(res == MAX);
}
}
I have run this program many times, but I have never seen a stale value (null). I have 2 cores. Do you have any idea how this program could be improved to actually exhibit the stale cached value phenomenon? Or maybe you have a completely different approach?
Thanks!
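One observation that may explain the result: under the Java memory model, a successful return from Thread.join() happens-after everything the joined thread did, so the main thread in the program above is guaranteed to see the array writes. A commonly used way to provoke a stale read is a reader that spins on a plain (non-volatile) field. The sketch below is one such attempt. The class name StaleFlagDemo, the 100 ms delay, and the timeouts are assumptions, and whether the plain variant actually hangs depends on the JVM and its JIT, so no particular outcome is promised for it.

```java
class StaleFlagDemo {
    static boolean plainStop;             // plain field: the reader is not guaranteed to ever see the write
    static volatile boolean volatileStop; // volatile field: the write is guaranteed to become visible

    // Starts a spinning reader, sets the flag from the calling thread, and reports
    // whether the reader noticed the change within `timeoutMillis`.
    static boolean readerSawUpdate(boolean useVolatile, long timeoutMillis) {
        plainStop = false;
        volatileStop = false;
        Thread reader = new Thread(() -> {
            if (useVolatile) {
                while (!volatileStop) { /* spin */ }
            } else {
                while (!plainStop) { /* spin; the JIT may hoist this read out of the loop */ }
            }
        });
        reader.setDaemon(true); // lets the JVM exit even if the reader spins forever
        reader.start();
        try {
            Thread.sleep(100); // assumed delay that gives the JIT time to compile the loop
            if (useVolatile) {
                volatileStop = true;
            } else {
                plainStop = true;
            }
            reader.join(timeoutMillis); // bounded wait, so a stuck reader cannot hang the caller
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return !reader.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("volatile flag seen: " + readerSawUpdate(true, 2000));
        System.out.println("plain flag seen:    " + readerSawUpdate(false, 2000));
    }
}
```

If the second line reports false on a given machine, the reader never observed the write to the plain field, which is the kind of stale value the question is after.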