I am running a very large for loop with 10 million iterations. When I run it in one go it takes about 14 seconds, but when I break it into 20 batches of 500k iterations each it takes only about 6 seconds. I don't understand why it behaves this way. Is there a problem with my code? Thanks!
Code
public class Benchmark {
static int max = 10000000;
static int start = 0;
static int end = 0;
static boolean dnc = false;
public static void main(String[] args) {
TimeIt timer = new TimeIt();
timer.printTime("bmTimer dnc false", () -> bmTimer());
dnc = true;
sum = 0;
timer.printTime("bmTimer dnc true", () -> bmTimer());
}
private static void bmTimer() {
if (dnc) {
int factor = 500000;
for (int i = 0; i < max; i += factor) {
end = start + factor;
bm(start, end);
start = end + 1;
}
} else {
bm(0, max);
}
}
static int sum = 0;
private static void bm(int start, int end) {
try {
ExecutorService executor = Executors.newFixedThreadPool(4);
List<Future<String>> futures = new ArrayList<>();
for (int j = start; j < end; j++) {
futures.add(executor.submit(new Callable<String>() {
@Override
public String call() throws Exception {
int i = 10;
int j = 9;
return (i - j) + "";
}
}));
}
for (Future<String> future : futures) {
sum += Integer.parseInt(future.get());
}
System.out.println(sum);
executor.shutdown();
executor.awaitTermination(1, TimeUnit.DAYS);
} catch (Exception e) {
e.printStackTrace();
}
}
Output
10000000
Method bmTimer dnc false took : 14.39s
500000
1000000
1500000
2000000
2500000
3000000
3500000
4000000
4500000
5000000
5500000
6000000
6500000
7000000
7500000
8000000
8500000
9000000
9500000
10000000
Method bmTimer dnc true took : 5.856s
Related
I'm new to parallel programming. I was trying to write a method for practice, but every time the normal (sequential) version takes less time to execute than the parallel version. Is something wrong with my implementation?
public class normalExecutor {
public normalExecutor() {
}
public int[][] matriz = new int[3000][3000];
public void search() {
long startTime = System.currentTimeMillis();
int biggest = 0;
matriz[800][800] = 9;
for (int i = 0 ; i < 3000; i++) {
for (int j = 0; j < 3000; j++) {
if(matriz[i][j] == 9) {
long stopTime = System.currentTimeMillis();
long elapsedTime = stopTime - startTime;
System.out.println("NOW normal "+ i + "|" + j + ": " + elapsedTime);
}
}
}
}
}
And this was my attempt at the parallel version:
public class ParallelExecutor {
final ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
final List<Future<?>> futures = new ArrayList<>();
public int[][] matriz = new int[3000][3000];
public ParallelExecutor() {
}
public void parallelSearch() {
long startTime = System.currentTimeMillis();
matriz[800][800] = 9;
for (int i = 0 ; i < 3000; i++) {
for (int j = 0; j < 3000; j++) {
int x = i;
int z = j;
Future<?> future = executor.submit(() -> {
if(matriz[x][z] == 9) {
long stopTime = System.currentTimeMillis();
long elapsedTime = stopTime - startTime;
System.out.println("NOW parallel "+ x + "|" + z+ ": " + elapsedTime);
}
});
}
}
}
}
Even though the parallel one sometimes prints first, the output always looks like this:
NOW parallel 800|800: 3089
NOW normal 800|800: 21
Thanks
You are submitting a very small, fast piece of work to the executor 9 million times (3000 x 3000 cells). The time it takes to create a Runnable for each cell, queue it, wait for an available thread, and run it is far greater than the time needed to compare one cell.
The right approach is to split the iteration over the 3000x3000 matrix between threads in larger chunks. For example, give each task 500 rows to process. This way you will have 6 tasks processing independent data in parallel.
I changed your code to show how much faster parallel processing can be when each row takes even 2 milliseconds to process.
I had to make two changes, though.
First, I moved the cell containing 9 to the middle of the matrix so that the normal search cannot find it almost immediately.
Second, I added Thread.sleep to simulate long-running work in order to justify parallel processing.
final ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
final List<Future<?>> futures = new ArrayList<>();
public int[][] matriz = new int[3000][3000];
public void parallelSearch() {
long startTime = System.currentTimeMillis();
matriz[1580][1] = 9;
executor.submit( () -> search( 0, 500, startTime) );
executor.submit( () -> search( 500, 1000, startTime) );
executor.submit( () -> search( 1000, 1500, startTime) );
executor.submit( () -> search( 1500, 2000, startTime) );
executor.submit( () -> search( 2000, 2500, startTime) );
executor.submit( () -> search( 2500, 3000, startTime) );
}
public void search(int startRow, int endRow, long startTime){
for (int i = startRow ; i < endRow; i++) {
//add some execution time to justify parallel processing
try {
Thread.sleep(2);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
for (int j = 0; j < 3000; j++) {
int x = i;
int z = j;
if(matriz[x][z] == 9) {
long stopTime = System.currentTimeMillis();
long elapsedTime = stopTime - startTime;
System.out.println("NOW parallel "+ x + "|" + z+ ": " + elapsedTime);
}
}
}
}
public void search() {
long startTime = System.currentTimeMillis();
int biggest = 0;
for (int i = 0 ; i < 3000; i++) {
try {
Thread.sleep(2);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
for (int j = 0; j < 3000; j++) {
if( matriz[i][j] == 9 ) {
long stopTime = System.currentTimeMillis();
long elapsedTime = stopTime - startTime;
System.out.println("NOW normal "+ i + "|" + j + ": " + elapsedTime);
}
}
}
}
With the code above and using Thread.sleep you will have this result:
NOW parallel 1580|1: 206
NOW normal 1580|1: 3162
Without Thread.sleep (threading overhead is much greater than the search itself):
NOW parallel 1580|1: 46
NOW normal 1580|1: 9
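The six hard-coded submit calls above can also be generated from the row count. The following is only a sketch of that idea, not the answer's code: the class name RowChunkSearch and the methods countInRows and parallelCount are made up for illustration, and it counts matching cells instead of printing timings so that the result can be collected through the returned Futures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of the original answer.
class RowChunkSearch {

    /** Scans rows [startRow, endRow) and counts the cells equal to target. */
    static int countInRows(int[][] matrix, int startRow, int endRow, int target) {
        int count = 0;
        for (int i = startRow; i < endRow; i++) {
            for (int j = 0; j < matrix[i].length; j++) {
                if (matrix[i][j] == target) {
                    count++;
                }
            }
        }
        return count;
    }

    /** Submits one task per contiguous block of rows and adds up the per-block counts. */
    static int parallelCount(int[][] matrix, int target, int workers) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(workers);
        try {
            int rows = matrix.length;
            int chunk = (rows + workers - 1) / workers; // ceiling division
            List<Future<Integer>> futures = new ArrayList<>();
            for (int start = 0; start < rows; start += chunk) {
                final int from = start;
                final int to = Math.min(rows, start + chunk);
                futures.add(executor.submit(() -> countInRows(matrix, from, to, target)));
            }
            int total = 0;
            for (Future<Integer> future : futures) {
                total += future.get(); // blocks until that block of rows is done
            }
            return total;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[][] matriz = new int[3000][3000];
        matriz[1580][1] = 9;
        int workers = Runtime.getRuntime().availableProcessors();
        System.out.println("cells equal to 9: " + parallelCount(matriz, 9, workers));
    }
}
```

With this shape the number of submitted tasks stays at most the number of workers, so the per-task overhead is paid a handful of times rather than once per cell.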
I need to create a program that calculates an approximation of the constant PI using Java multithreading.
I intend to use the Gregory-Leibniz series to calculate PI / 4, and then multiply by 4 to get the approximation of PI.
But I have some concerns about the program:
How can I separate the calculation so that I can use multiple threads? The formula is a single total sum, so I don't know how to split it into parts and then collect them all at the end.
The series is infinite, so the user will need some way to configure the execution to determine when it should stop and return a result. Is that possible, and how can I do it?
This is the most I have managed so far.
public class PICalculate {
public static void main(String[] args) {
System.out.println(calculatePI(5000000) * 4);
}
static double calculatePI(int n) {
double result = 0.0;
if (n < 0) {
return 0.0;
}
for (int i = 0; i <= n; i++) {
result += Math.pow(-1, i) / ((2 * i) + 1);
}
return result;
}
}
The most straightforward, but not the most optimal, approach is to distribute the sequence elements between the threads you have. I.e., if you have 4 threads, thread one will work with the elements where n % 4 == 0, thread two with the elements where n % 4 == 1, and so on.
public static void main(String ... args) throws InterruptedException {
int threadCount = 4;
int N = 100_000;
PiThread[] threads = new PiThread[threadCount];
for (int i = 0; i < threadCount; i++) {
threads[i] = new PiThread(threadCount, i, N);
threads[i].start();
}
for (int i = 0; i < threadCount; i++) {
threads[i].join();
}
double pi = 0;
for (int i = 0; i < threadCount; i++) {
pi += threads[i].getSum();
}
System.out.print("PI/4 = " + pi);
}
static class PiThread extends Thread {
private final int threadCount;
private final int threadRemainder;
private final int N;
private double sum = 0;
public PiThread(int threadCount, int threadRemainder, int n) {
this.threadCount = threadCount;
this.threadRemainder = threadRemainder;
N = n;
}
@Override
public void run() {
for (int i = 0; i <= N; i++) {
if (i % threadCount == threadRemainder) {
sum += Math.pow(-1, i) / (2 * i + 1);
}
}
}
public double getSum() {
return sum;
}
}
PiThread is more efficient, though arguably harder to read, if the loop steps directly through the thread's own indices:
public void run() {
for (int i = threadRemainder; i <= N; i += threadCount) {
sum += Math.pow(-1, i) / (2 * i + 1);
}
}
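As an alternative to the modulo split, the sum can also be cut into contiguous blocks of indices, one per task, with the partial sums collected through Futures. This is only a sketch under that assumption; the class PiPartialSums and its methods are invented for illustration and are not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class, not from the original answer.
class PiPartialSums {

    /** Sums the Gregory-Leibniz terms with index in [from, to). */
    static double partialSum(long from, long to) {
        double sum = 0.0;
        for (long i = from; i < to; i++) {
            double sign = (i % 2 == 0) ? 1.0 : -1.0; // same value as Math.pow(-1, i)
            sum += sign / (2.0 * i + 1.0);
        }
        return sum;
    }

    /** Approximates PI from the first `terms` terms, split into at most `tasks` blocks. */
    static double approximatePi(long terms, int tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        try {
            long block = (terms + tasks - 1) / tasks; // ceiling division
            List<Future<Double>> parts = new ArrayList<>();
            for (long start = 0; start < terms; start += block) {
                final long from = start;
                final long to = Math.min(terms, start + block);
                parts.add(pool.submit(() -> partialSum(from, to)));
            }
            double quarterPi = 0.0;
            for (Future<Double> part : parts) {
                quarterPi += part.get(); // collect each partial sum
            }
            return 4.0 * quarterPi;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(approximatePi(5_000_000L, 4));
    }
}
```

Because each task owns its own range and its own local sum, no synchronization is needed until the partial results are added together.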
If you want to limit the calculation by time rather than by the number of elements in the sequence, you may follow the approach below. Note that it is still limited by Long.MAX_VALUE, and you would have to use BigInteger, BigDecimal or some other reasonable approach to improve it.
public static volatile boolean running = true;
public static void main(String ... args) throws InterruptedException {
int threadCount = 4;
long timeoutMs = 5_000;
final AtomicLong counter = new AtomicLong(0);
PiThread[] threads = new PiThread[threadCount];
for (int i = 0; i < threadCount; i++) {
threads[i] = new PiThread(counter);
threads[i].start();
}
Thread.sleep(timeoutMs);
running = false;
for (int i = 0; i < threadCount; i++) {
threads[i].join();
}
double sum = 0;
for (int i = 0; i < threadCount; i++) {
sum += threads[i].getSum();
}
System.out.println("counter = " + counter.get());
System.out.println("PI = " + 4*sum);
}
static class PiThread extends Thread {
private AtomicLong counter;
private double sum = 0;
public PiThread(AtomicLong counter) {
this.counter = counter;
}
@Override
public void run() {
long i;
while (running && isValidCounter(i = counter.getAndAdd(1))) {
sum += Math.pow(-1, i) / (2 * i + 1);
}
}
private boolean isValidCounter(long value) {
return value >= 0 && value < Long.MAX_VALUE;
}
public double getSum() {
return sum;
}
}
Can someone help me with this, please? I'm trying to do a matrix multiplication, using threads. This is what I have so far:
//updated
public class Multiplication {
public static final int NUM_OF_THREADS = 8;
public static final int MATRIX_SIZE = 1000;
public static void main(String args[]) throws InterruptedException {
long startTime = System.currentTimeMillis();
int MatrixA[][] = matrixGenerator();
int MatrixB[][] = matrixGenerator();
int m1rows = MatrixA.length;
int m1cols = MatrixA[0].length;
int m2cols = MatrixB[0].length;
int MatrixC[][] = new int[m1rows][m2cols];
ExecutorService pool = Executors.newFixedThreadPool(NUM_OF_THREADS);
for (int row1 = 0; row1 < m1rows; row1++) {
for (int col1 = 0; col1 < m1cols; col1++) {
pool.submit(new MultiplicationThreading(row1, col1, MatrixA, MatrixB, MatrixC));
}
}
pool.shutdown();
pool.awaitTermination(1, TimeUnit.DAYS);
long endTime = System.currentTimeMillis();
System.out.println("Calculated in "
+ (endTime - startTime) + " milliseconds");
}
public static int[][] matrixGenerator() {
int matrix[][] = new int[MATRIX_SIZE][MATRIX_SIZE];
Random r = new Random();
for (int i = 0; i < matrix.length; i++) {
for (int j = 0; j < matrix[i].length; j++) {
matrix[i][j] = r.nextInt(10000);
}
}
return matrix;
}
}
//I have updated the code
I get better timings now: about 1.5k milliseconds with 2 threads and about 1.3k milliseconds with 8 threads.
You initialize the thrd array with NUM_THREADS == 9 elements. If m1rows*m1cols exceeds that value, you will get this problem, since you attempt to create more than 9 threads and assign them to elements of the array. (You are attempting to create 50 threads).
Two solutions:
Initialize thrd = new Thread[m1rows*m1cols]
Use a List<Thread>.
Note that you won't execute the threads in parallel, because you are calling Thread.join() immediately after calling Thread.start(). This just blocks the current thread until thrd[threadcount] finishes.
Move the Thread.join() calls into a separate loop, so the threads are all started before you call join on any of them.
for (row = 0; row < m1rows; row++) {
for (col = 0; col < m1cols; col++) {
// creating thread for multiplications
thrd[threadcount] = new Thread(new MultiplicationThreading(row, col, MatrixA, MatrixB, MatrixC));
thrd[threadcount].start(); //thread start
threadcount++;
}
}
for (Thread thread : thrd) {
thread.join();
}
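Starting one thread per result cell still creates a very large number of tiny tasks. A coarser split, one task per result row running on a fixed pool, might look like the sketch below. MultiplicationThreading is not shown in the question, so this does not reproduce it; RowParallelMultiply and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class; the question's MultiplicationThreading is not shown.
class RowParallelMultiply {

    /** Computes row `row` of C = A * B. Each task writes only to its own row of C. */
    static void multiplyRow(int[][] a, int[][] b, int[][] c, int row) {
        int inner = b.length;
        int cols = b[0].length;
        for (int col = 0; col < cols; col++) {
            int sum = 0; // int arithmetic as in the question; large entries can overflow
            for (int k = 0; k < inner; k++) {
                sum += a[row][k] * b[k][col];
            }
            c[row][col] = sum;
        }
    }

    /** Multiplies A by B using one pool task per row of the result. */
    static int[][] multiply(int[][] a, int[][] b, int threads) throws Exception {
        int[][] c = new int[a.length][b[0].length];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int row = 0; row < a.length; row++) {
                final int r = row;
                pending.add(pool.submit(() -> multiplyRow(a, b, c, r)));
            }
            for (Future<?> task : pending) {
                task.get(); // wait for every row; also surfaces exceptions from tasks
            }
        } finally {
            pool.shutdown();
        }
        return c;
    }

    public static void main(String[] args) throws Exception {
        int[][] a = {{1, 2}, {3, 4}};
        int[][] b = {{5, 6}, {7, 8}};
        int[][] c = multiply(a, b, 2);
        System.out.println(java.util.Arrays.deepToString(c));
    }
}
```

All tasks are submitted before any of them is waited on, which is the same ordering the answer recommends for start() and join().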
I know this may be a stupid question, maybe the most stupid question today, but I have to ask it: Have I invented this sorting algorithm?
Yesterday, I had a little inspiration about an exchange-based sorting algorithm. Today, I implemented it, and it worked.
It probably already exists, since there are many not-so-popular sorting algorithms out there that have little or no information available about them, and almost no implementations of them exist.
Description: Basically, this algorithm takes an item, then a pair, then an item again... until the end of the list. For each item/pair, it compares EVERY two items at the same distance (radius) from the item or from the gap inside the pair, until a border of the array is reached, and exchanges those items if needed. This is repeated for each pair/item of the list.
An English-based pseudo-code:
FOR i index to last index of Array (starting from 0)
L index is i - 1
R index is i + 1
//Odd case, where i is the center
WHILE (L is in array range and R is in array range)
IF item Array[L] is greater than Array[R]
EXCHANGE item Array[L] with Array[R]
END-IF
ADD 1 to R
SUBTRACT 1 from L
END-WHILE
//Even case, where i is not the center
L index is now i
R index is now i + 1
WHILE (L is in array range and R is in array range)
IF item Array[L] is greater than Array[R]
EXCHANGE Array[L] with Array[R]
END-IF
ADD 1 to R
SUBTRACT 1 from L
END-WHILE
END FOR
This is the implementation in Java:
//package sorting;
public class OrbitSort {
public static void main(String[] args) {
int[] numbers ={ 15, 8, 6, 3, 11, 1, 2, 0, 14, 13, 7, 9, 4, 10, 5, 12 };
System.out.println("Original list:");
display(numbers);
sort(numbers);
System.out.println("\nSorted list:");
display(numbers);
}
//Sorting algorithm
public static void sort(int[] array) {
for(int i = 0; i < array.length; i++){
int L = i - 1;
int R = i + 1;
//Odd case (with a central item)
while(L >= 0 && R < array.length){
if(array[L] > array[R])
swap(array, L, R);
L--;
R++;
}
//Even case (with no central item)
L = i;
R = i + 1;
while(L >= 0 && R < array.length) {
if(array[L] > array[R])
swap(array, L, R);
L--;
R++;
}
}
}
//Swap two items in array.
public static void swap(int[] array, int x, int y) {
int temp = array[x];
array[x] = array[y];
array[y] = temp;
}
//Display items
public static void display(int[] numbers){
for(int i: numbers)
System.out.print(" " + i);
System.out.println();
}
}
I know it can be shorter, but it's just an early implementation.
It probably runs in O(n^2), but I'm not sure.
So, what do you think? Does it already exist?
To me, it looks like a modified bubble sort algorithm, which may perform better for certain arrangements of input elements.
Although not necessarily fair, I did a benchmark with warm-up cycles using your input array, for comparison of:
java.util.Arrays.sort(), which for int[] is a dual-pivot quicksort implementation
BubbleSort.sort(), a java implementation of the bubble sort algo
OrbitSort.sort(), your algo
Results:
input size: 8192
warmup iterations: 32
Arrays.sort()
iterations : 10000
total time : 4940.0ms
avg time : 0.494ms
BubbleSort.sort()
iterations : 100
total time : 8360.0ms
avg time : 83.6ms
OrbitSort.sort()
iterations : 100
total time : 8820.0ms
avg time : 88.2ms
Of course, the performance depends on input size and arrangement
Straightforward code:
package com.sam.tests;
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.Callable;
public class SortBenchmark {
public static class OrbitSort {
// Sorting algorithm
public static void sort(int[] array) {
for (int i = 0; i < array.length; i++) {
int L = i - 1;
int R = i + 1;
// Odd case (with a central item)
while (L >= 0 && R < array.length) {
if (array[L] > array[R])
swap(array, L, R);
L--;
R++;
}
// Even case (with no central item)
L = i;
R = i + 1;
while (L >= 0 && R < array.length) {
if (array[L] > array[R])
swap(array, L, R);
L--;
R++;
}
}
}
// Swap two items in array.
public static void swap(int[] array, int x, int y) {
int temp = array[x];
array[x] = array[y];
array[y] = temp;
}
}
public static class BubbleSort {
public static void sort(int[] numbers) {
boolean swapped = true;
for (int i = numbers.length - 1; i > 0 && swapped; i--) {
swapped = false;
for (int j = 0; j < i; j++) {
if (numbers[j] > numbers[j + 1]) {
int temp = numbers[j];
numbers[j] = numbers[j + 1];
numbers[j + 1] = temp;
swapped = true;
}
}
}
}
}
public static class TestDataFactory {
public static enum ElementOrder {
Ascending, Descending, Random
}
public static int[] createIntArray(final int size, final ElementOrder elementOrder) {
int[] array = new int[size];
switch (elementOrder) {
case Ascending:
for (int i = 0; i < size; ++i)
array[i] = i;
break;
case Descending:
for (int i = 0; i < size; ++i)
array[i] = size - i - 1;
break;
case Random:
default:
Random rg = new Random(System.nanoTime());
for (int i = 0; i < size; ++i)
array[i] = rg.nextInt(size);
break;
}
return array;
}
}
public static class Benchmark {
// misc constants
public static final int NANOS_PER_MSEC = 1000000;
// config constants
public static final int BIGDECIMAL_PRECISION = 6;
// constant defaults
public static final long AUTOTUNING_MIN_ITERATIONS_DEFAULT = 1;
public static final long AUTOTUNING_MIN_DURATION_DEFAULT = 125;
public static final long BENCHMARK_MIN_ITERATIONS_DEFAULT = 1;
public static final long BENCHMARK_MAX_ITERATIONS_DEFAULT = Integer.MAX_VALUE;
public static final long BENCHMARK_TARGET_DURATION_DEFAULT = 125;
// private static final ThreadMXBean threadBean =
// ManagementFactory.getThreadMXBean();
public static final long getNanoTime() {
// return threadBean.getCurrentThreadCpuTime();// not good, runs at
// some time slice resolution
return System.nanoTime();
}
public static class Result {
public String name;
public long iterations;
public long totalTime; // nanoseconds
public Result(String name, long iterations, long startTime, long endTime) {
this.name = name;
this.iterations = iterations;
this.totalTime = endTime - startTime;
}
@Override
public String toString() {
final double totalTimeMSecs = ((double) totalTime) / NANOS_PER_MSEC;
final BigDecimal avgTimeMsecs = new BigDecimal(this.totalTime).divide(new BigDecimal(this.iterations).multiply(new BigDecimal(NANOS_PER_MSEC)),
BIGDECIMAL_PRECISION, RoundingMode.HALF_UP);
final String newLine = System.getProperty("line.separator");
StringBuilder sb = new StringBuilder();
sb.append(name).append(newLine);
sb.append(" ").append("iterations : ").append(iterations).append(newLine);
sb.append(" ").append("total time : ").append(totalTimeMSecs).append(" ms").append(newLine);
sb.append(" ").append("avg time : ").append(avgTimeMsecs).append(" ms").append(newLine);
return sb.toString();
}
}
public static <T> Result executionTime(final String name, final long iterations, final long warmupIterations, final Callable<T> test) throws Exception {
// vars
@SuppressWarnings("unused")
T ret;
long startTime;
long endTime;
// warmup
for (long i = 0; i < warmupIterations; ++i)
ret = test.call();
// actual benchmark iterations
{
startTime = getNanoTime();
for (long i = 0; i < iterations; ++i)
ret = test.call();
endTime = getNanoTime();
}
// return result
return new Result(name, iterations, startTime, endTime);
}
/**
* Auto tuned execution time measurement for test callbacks with steady
* execution time
*
* @param name
* @param test
* @return
* @throws Exception
*/
public static <T> Result executionTimeAutotuned(final String name, final Callable<T> test) throws Exception {
final long autoTuningMinIterations = AUTOTUNING_MIN_ITERATIONS_DEFAULT;
final long autoTuningMinDuration = AUTOTUNING_MIN_DURATION_DEFAULT;
final long benchmarkTargetDuration = BENCHMARK_TARGET_DURATION_DEFAULT;
final long benchmarkMinIterations = BENCHMARK_MIN_ITERATIONS_DEFAULT;
final long benchmarkMaxIterations = BENCHMARK_MAX_ITERATIONS_DEFAULT;
// vars
@SuppressWarnings("unused")
T ret;
final int prevThreadPriority;
long warmupIterations = 0;
long autoTuningDuration = 0;
long iterations = benchmarkMinIterations;
long startTime;
long endTime;
// store current thread priority and set it to max
prevThreadPriority = Thread.currentThread().getPriority();
Thread.currentThread().setPriority(Thread.MAX_PRIORITY);
// warmup and iteration count tuning
{
final long autoTuningMinTimeNanos = autoTuningMinDuration * NANOS_PER_MSEC;
long autoTuningConsecutiveLoops = 1;
double avgExecutionTime = 0;
do {
{
startTime = getNanoTime();
for (long i = 0; i < autoTuningConsecutiveLoops; ++i, ++warmupIterations) {
ret = test.call();
}
endTime = getNanoTime();
autoTuningDuration += (endTime - startTime);
}
avgExecutionTime = ((double) autoTuningDuration) / ((double) (warmupIterations));
if ((autoTuningDuration >= autoTuningMinTimeNanos) && (warmupIterations >= autoTuningMinIterations)) {
break;
} else {
final double remainingAutotuningIterations = ((double) (autoTuningMinTimeNanos - autoTuningDuration)) / avgExecutionTime;
autoTuningConsecutiveLoops = Math.max(1, Math.min(Integer.MAX_VALUE, (long) Math.ceil(remainingAutotuningIterations)));
}
} while (warmupIterations < Integer.MAX_VALUE);
final double requiredIterations = ((double) benchmarkTargetDuration * NANOS_PER_MSEC) / avgExecutionTime;
iterations = Math.max(1, Math.min(benchmarkMaxIterations, (long) Math.ceil(requiredIterations)));
}
// actual benchmark iterations
{
startTime = getNanoTime();
for (long i = 0; i < iterations; ++i)
ret = test.call();
endTime = getNanoTime();
}
// restore previous thread priority
Thread.currentThread().setPriority(prevThreadPriority);
// return result
return new Result(name, iterations, startTime, endTime);
}
}
public static void executeBenchmark(int inputSize, ArrayList<Benchmark.Result> results) {
// final int[] inputArray = { 15, 8, 6, 3, 11, 1, 2, 0, 14, 13, 7, 9, 4,
// 10, 5, 12 };
final int[] inputArray = TestDataFactory.createIntArray(inputSize, TestDataFactory.ElementOrder.Random);
try {
// compare against Arrays.sort()
{
final int[] ref = inputArray.clone();
Arrays.sort(ref);
{
int[] temp = inputArray.clone();
BubbleSort.sort(temp);
if (!Arrays.equals(temp, ref))
throw new Exception("BubbleSort.sort() failed");
}
{
int[] temp = inputArray.clone();
OrbitSort.sort(temp);
if (!Arrays.equals(temp, ref))
throw new Exception("OrbitSort.sort() failed");
}
}
results.add(Benchmark.executionTimeAutotuned("Arrays.sort()", new Callable<Void>() {
@Override
public Void call() throws Exception {
int[] temp = Arrays.copyOf(inputArray, inputArray.length);
Arrays.sort(temp);
return null;
}
}));
results.add(Benchmark.executionTimeAutotuned("BubbleSort.sort()", new Callable<Void>() {
@Override
public Void call() throws Exception {
int[] temp = Arrays.copyOf(inputArray, inputArray.length);
BubbleSort.sort(temp);
return null;
}
}));
results.add(Benchmark.executionTimeAutotuned("OrbitSort.sort()", new Callable<Void>() {
@Override
public Void call() throws Exception {
int[] temp = Arrays.copyOf(inputArray, inputArray.length);
OrbitSort.sort(temp);
return null;
}
}));
} catch (Exception e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
ArrayList<Benchmark.Result> results = new ArrayList<Benchmark.Result>();
for (int i = 16; i <= 16384; i <<= 1) {
results.clear();
executeBenchmark(i, results);
System.out.println("input size : " + i);
System.out.println("");
for (Benchmark.Result result : results) {
System.out.print(result.toString());
}
System.out.println("----------------------------------------------------");
}
}
}
It is O(n^2) (assuming it works; I am not sure about that). As to whether it already exists: maybe. It is not really original, as it can be considered a variation of a trivial sorting implementation, but I doubt there is any published algorithm that is exactly the same as this one, specifically one with two consecutive inner loops.
I am not saying it is without merit; there may be a use case for which its behavior is uniquely efficient (maybe where reading is much faster than writing, and cache behavior benefits from its access pattern).
To see why it is O(n^2): the outer loop runs n times, and for a given i each inner loop runs at most min(i + 1, n - 1 - i) steps, which is O(n). For the middle half of the outer iterations (i between n/4 and 3n/4) each inner loop runs roughly n/4 steps or more, so the total work is also at least proportional to n^2.
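The quadratic bound can also be checked by counting. Every pair of positions (L, R) with L < R has exactly one centre, either an item or a gap between two items, so the two inner loops together visit each pair once, which gives n(n-1)/2 comparisons regardless of the input. The sketch below walks the same loop bounds as OrbitSort.sort() but only counts; the class name OrbitSortComparisons is made up for this illustration.

```java
// Hypothetical helper that mirrors the loop structure of OrbitSort.sort().
class OrbitSortComparisons {

    /** Counts the comparisons the two inner loops perform for an array of length n. */
    static long countComparisons(int n) {
        long comparisons = 0;
        for (int i = 0; i < n; i++) {
            // Odd case: pairs centred on item i
            for (int l = i - 1, r = i + 1; l >= 0 && r < n; l--, r++) {
                comparisons++;
            }
            // Even case: pairs centred on the gap between i and i + 1
            for (int l = i, r = i + 1; l >= 0 && r < n; l--, r++) {
                comparisons++;
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        for (int n = 16; n <= 8192; n <<= 1) {
            long pairs = (long) n * (n - 1) / 2;
            System.out.println("n=" + n
                    + " comparisons=" + countComparisons(n)
                    + " n(n-1)/2=" + pairs);
        }
    }
}
```

This only measures the number of comparisons, not swaps or wall-clock time, so it says nothing about whether the algorithm sorts correctly or how it compares with bubble sort in practice.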
I have implemented serial and parallel algorithms for solving linear systems using the Jacobi method. Both implementations converge and give correct solutions.
I am having trouble understanding:
How can the parallel implementation converge after so few iterations compared to the serial one (the same method is used in both)? Am I facing some concurrency issue that I am not aware of?
How can the number of iterations vary from run to run in the parallel implementation (6 or 7)?
Thanks!
Program output:
Mathematica solution: {{-1.12756}, {4.70371}, {-1.89272}, {1.56218}}
Serial: iterations=7194 , error=false, solution=[-1.1270591, 4.7042074, -1.8922218, 1.5626835]
Parallel: iterations=6 , error=false, solution=[-1.1274619, 4.7035804, -1.8927546, 1.5621948]
Code:
Main
import java.util.Arrays;
public class Main {
public static void main(String[] args) {
Serial s = new Serial();
Parallel p = new Parallel(2);
s.solve();
p.solve();
System.out.println("Mathematica solution: {{-1.12756}, {4.70371}, {-1.89272}, {1.56218}}");
System.out.println(String.format("Serial: iterations=%d , error=%s, solution=%s", s.iter, s.errorFlag, Arrays.toString(s.data.solution)));
System.out.println(String.format("Parallel: iterations=%d , error=%s, solution=%s", p.iter, p.errorFlag, Arrays.toString(p.data.solution)));
}
}
Data
public class Data {
public float A[][] = {{2.886139567217389f, 0.9778259187352214f, 0.9432146432722157f, 0.9622157488990459f}
,{0.3023479007910952f,0.7503803506938734f,0.06163831478699766f,0.3856445043958068f}
,{0.4298384105199724f, 0.7787439716945019f, 1.838686110345417f, 0.6282668788698587f}
,{0.27798718418255075f, 0.09021764079496353f, 0.8765867330141233f, 1.246036349549629f}};
public float b[] = {1.0630309381779384f,3.674438173599066f,0.6796639099285651f,0.39831385324794155f};
public int size = A.length;
public float x[] = new float[size];
public float solution[] = new float[size];
}
Parallel
import java.util.Arrays;
public class Parallel {
private final int workers;
private float[] globalNorm;
public int iter;
public int maxIter = 1000000;
public double epsilon = 1.0e-3;
public boolean errorFlag = false;
public Data data = new Data();
public Parallel(int workers) {
this.workers = workers;
this.globalNorm = new float[workers];
Arrays.fill(globalNorm, 0);
}
public void solve() {
JacobiWorker[] threads = new JacobiWorker[workers];
int batchSize = data.size / workers;
float norm;
do {
for(int i=0;i<workers;i++) {
threads[i] = new JacobiWorker(i,batchSize);
threads[i].start();
}
for(int i=0;i<workers;i++)
try {
threads[i].join();
} catch (InterruptedException e) {
e.printStackTrace();
}
// At this point all worker calculations are done!
norm = 0;
for (float d : globalNorm) if (d > norm) norm = d;
if (norm < epsilon)
errorFlag = false; // Converged
else
errorFlag = true; // No desired convergence
} while (norm >= epsilon && ++iter <= maxIter);
}
class JacobiWorker extends Thread {
private final int idx;
private final int batchSize;
JacobiWorker(int idx, int batchSize) {
this.idx = idx;
this.batchSize = batchSize;
}
@Override
public void run() {
int upper = idx == workers - 1 ? data.size : (idx + 1) * batchSize;
float localNorm = 0, diff = 0;
for (int j = idx * batchSize; j < upper; j++) { // For every
// equation in batch
float s = 0;
for (int i = 0; i < data.size; i++) { // For every variable in
// equation
if (i != j)
s += data.A[j][i] * data.x[i];
data.solution[j] = (data.b[j] - s) / data.A[j][j];
}
diff = Math.abs(data.solution[j] - data.x[j]);
if (diff > localNorm) localNorm = diff;
data.x[j] = data.solution[j];
}
globalNorm[idx] = localNorm;
}
}
}
Serial
public class Serial {
public int iter;
public int maxIter = 1000000;
public double epsilon = 1.0e-3;
public boolean errorFlag = false;
public Data data = new Data();
public void solve() {
float norm,diff=0;
do {
for(int i=0;i<data.size;i++) {
float s=0;
for (int j = 0; j < data.size; j++) {
if (i != j)
s += data.A[i][j] * data.x[j];
data.solution[i] = (data.b[i] - s) / data.A[i][i];
}
}
norm = 0;
for (int i=0;i<data.size;i++) {
diff = Math.abs(data.solution[i]-data.x[i]); // Calculate convergence
if (diff > norm) norm = diff;
data.x[i] = data.solution[i];
}
if (norm < epsilon)
errorFlag = false; // Converged
else
errorFlag = true; // No desired convergence
} while (norm >= epsilon && ++iter <= maxIter);
}
}
I think it's a matter of implementation and not parallelization. Look at what happens with Parallel p = new Parallel(1):
Mathematica solution: {{-1.12756}, {4.70371}, {-1.89272}, {1.56218}}
Serial: iterations=7194 , error=false, solution=[-1.1270591, 4.7042074, -1.8922218, 1.5626835]
Parallel: iterations=6 , error=false, solution=[-1.1274619, 4.7035804, -1.8927546, 1.5621948]
As it turns out, your second implementation is not doing exactly the same thing as your first one.
I added this to your parallel version and it ran in the same number of iterations as the serial one.
for (int i = idx * batchSize; i < upper; i++) {
diff = Math.abs(data.solution[i] - data.x[i]); // Calculate
// convergence
if (diff > localNorm)
localNorm = diff;
data.x[i] = data.solution[i];
}
}
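The difference between the two versions in the question is when x is overwritten: Serial computes every row from the old x and copies the new values afterwards, while the Parallel worker assigns data.x[j] inside the row loop, so later rows already read updated values (the scheme usually called Gauss-Seidel). The sketch below runs both update orders on a small made-up 2x2 system to show that they need different numbers of iterations. The class JacobiVsInPlace, the system and the tolerance are assumptions chosen for illustration, not the question's data.

```java
// Hypothetical comparison of the two update orders; not the question's code.
class JacobiVsInPlace {

    /**
     * Iterates until the largest change in any component is below eps.
     * Returns the number of iterations used, or -1 if maxIter was reached.
     * inPlace = false: every row reads the previous iterate (Jacobi).
     * inPlace = true:  x[i] is overwritten immediately (Gauss-Seidel style).
     */
    static int solve(double[][] a, double[] b, double[] x,
                     boolean inPlace, double eps, int maxIter) {
        int n = b.length;
        for (int iter = 1; iter <= maxIter; iter++) {
            double[] next = new double[n];
            double norm = 0.0;
            for (int i = 0; i < n; i++) {
                double s = 0.0;
                for (int j = 0; j < n; j++) {
                    if (j != i) {
                        s += a[i][j] * x[j];
                    }
                }
                next[i] = (b[i] - s) / a[i][i];
                norm = Math.max(norm, Math.abs(next[i] - x[i]));
                if (inPlace) {
                    x[i] = next[i]; // later rows already see the new value
                }
            }
            if (!inPlace) {
                System.arraycopy(next, 0, x, 0, n); // publish all values at once
            }
            if (norm < eps) {
                return iter;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up diagonally dominant system with exact solution x = 1, y = 2.
        double[][] a = {{4, 1}, {1, 3}};
        double[] b = {6, 7};
        double[] xJacobi = new double[2];
        double[] xInPlace = new double[2];
        System.out.println("Jacobi iterations:   " + solve(a, b, xJacobi, false, 1e-9, 1000));
        System.out.println("In-place iterations: " + solve(a, b, xInPlace, true, 1e-9, 1000));
    }
}
```

With more than one worker thread the in-place variant additionally depends on which thread writes its part of x first, which would be consistent with the iteration count changing between runs.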