I'm trying to compute Pi, but what I really want to achieve is efficiency when using more than one thread. The algorithm is simple: I randomly generate points in the unit square and after that count how many of them are in the circle inscribed within the square. (more here: http://math.fullerton.edu/mathews/n2003/montecarlopimod.html)
My idea is to split the square horizontally and to run a different thread for each part of it.
But instead of a speedup, all I get is a slowdown. Any ideas why? Here is the code:
public class TaskManager {
public static void main(String[] args) {
int threadsCount = 3;
int size = 10000000;
boolean isQuiet = false;
PiCalculator pi = new PiCalculator(size);
Thread tr[] = new Thread[threadsCount];
long time = -System.currentTimeMillis();
int i;
double s = 1.0/threadsCount;
int p = size/threadsCount;
for(i = 0; i < threadsCount; i++) {
PiRunnable r = new PiRunnable(pi, s*i, s*(1.0+i), p, isQuiet);
tr[i] = new Thread(r);
}
for(i = 0; i < threadsCount; i++) {
tr[i].start();
}
for(i = 0; i < threadsCount; i++) {
try {
tr[i].join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
double myPi = 4.0*pi.getPointsInCircle()/pi.getPointsInSquare();
System.out.println(myPi + " time = " + (System.currentTimeMillis()+time));
}
}
public class PiRunnable implements Runnable {
PiCalculator pi;
private double minX;
private double maxX;
private int pointsToSpread;
public PiRunnable(PiCalculator pi, double minX, double maxX, int pointsToSpread, boolean isQuiet) {
super();
this.pi = pi;
this.minX = minX;
this.maxX = maxX;
this.pointsToSpread = pointsToSpread;
}
@Override
public void run() {
int n = countPointsInAreaInCircle(minX, maxX, pointsToSpread);
pi.addToPointsInCircle(n);
}
public int countPointsInAreaInCircle (double minX, double maxX, int pointsCount) {
double x;
double y;
int inCircle = 0;
for (int i = 0; i < pointsCount; i++) {
x = Math.random() * (maxX - minX) + minX;
y = Math.random();
if (x*x + y*y <= 1) {
inCircle++;
}
}
return inCircle;
}
}
public class PiCalculator {
private int pointsInSquare;
private int pointsInCircle;
public PiCalculator(int pointsInSquare) {
super();
this.pointsInSquare = pointsInSquare;
}
public synchronized void addToPointsInCircle (int pointsCount) {
this.pointsInCircle += pointsCount;
}
public synchronized int getPointsInCircle () {
return this.pointsInCircle;
}
public synchronized void setPointsInSquare (int pointsInSquare) {
this.pointsInSquare = pointsInSquare;
}
public synchronized int getPointsInSquare () {
return this.pointsInSquare;
}
}
Some results:
-for 3 threads: "3.1424696 time = 2803"
-for 1 thread: "3.1416192 time = 2337"
Your threads could be contending on Math.random(), which is synchronized; you should create an instance of java.util.Random for each thread. Also, in this case a speedup with multiple threads will only happen if you have more than one core/CPU.
From the javadoc of Math.random():
This method is properly synchronized to allow correct use by more than one thread. However, if many threads need to generate pseudorandom numbers at a great rate, it may reduce contention for each thread to have its own pseudorandom-number generator.
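For illustration, here is a minimal sketch of what that looks like inside countPointsInAreaInCircle, using ThreadLocalRandom (Java 7+) so each thread automatically gets its own generator. This is an alternative to holding a Random field per Runnable, not the original poster's code:
import java.util.concurrent.ThreadLocalRandom;

public int countPointsInAreaInCircle(double minX, double maxX, int pointsCount) {
    ThreadLocalRandom rnd = ThreadLocalRandom.current(); // per-thread generator, no shared lock
    int inCircle = 0;
    for (int i = 0; i < pointsCount; i++) {
        double x = rnd.nextDouble() * (maxX - minX) + minX;
        double y = rnd.nextDouble();
        if (x * x + y * y <= 1) {
            inCircle++;
        }
    }
    return inCircle;
}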
Here is an alternate main method that uses the java.util.concurrent package instead of manually managing the threads and waiting for them to finish.
public static void main(final String[] args) throws InterruptedException
{
final int threadsCount = Runtime.getRuntime().availableProcessors();
final int size = 10000000;
boolean isQuiet = false;
final PiCalculator pi = new PiCalculator(size);
final ExecutorService es = Executors.newFixedThreadPool(threadsCount);
long time = -System.currentTimeMillis();
int i;
double s = 1.0 / threadsCount;
int p = size / threadsCount;
for (i = 0; i < threadsCount; i++)
{
es.submit(new PiRunnable(pi, s * i, s * (1.0 + i), p, isQuiet));
}
es.shutdown();
while (!es.isTerminated()) { /* do nothing waiting for threads to complete */ }
double myPi = 4.0 * pi.getPointsInCircle() / pi.getPointsInSquare();
System.out.println(myPi + " time = " + (System.currentTimeMillis() + time));
}
I also changed the Math.random() to use local instances of Random for each Runnable.
final private Random rnd;
...
x = this.rnd.nextDouble() * (maxX - minX) + minX;
y = this.rnd.nextDouble();
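Presumably rnd is initialized once per Runnable, e.g. in the constructor:
this.rnd = new Random(); // one generator per Runnable, so no contention on the shared Math.random() generator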
this is the new output I get ...
3.1419284 time = 235
I think you could probably drop the time some more by using Futures and not having to synchronize so much on the PiCalculator.
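For example, here is a rough sketch of that idea (assuming Java 8 for lambdas; the class and variable names are mine, not from the original code). Each task is a Callable&lt;Integer&gt; that samples the unit square and returns its own count, so no shared, synchronized accumulator is needed at all:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;

public class PiWithFutures {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        final int threads = Runtime.getRuntime().availableProcessors();
        final int size = 10_000_000;
        final int perTask = size / threads;
        ExecutorService es = Executors.newFixedThreadPool(threads);
        List<Future<Integer>> results = new ArrayList<>();
        long start = System.currentTimeMillis();
        for (int t = 0; t < threads; t++) {
            results.add(es.submit(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current(); // per-thread generator
                int inCircle = 0;
                for (int i = 0; i < perTask; i++) {
                    double x = rnd.nextDouble();
                    double y = rnd.nextDouble();
                    if (x * x + y * y <= 1) {
                        inCircle++;
                    }
                }
                return inCircle; // each task returns its own count
            }));
        }
        long inCircleTotal = 0;
        for (Future<Integer> f : results) {
            inCircleTotal += f.get(); // blocks until that task is done
        }
        es.shutdown();
        double myPi = 4.0 * inCircleTotal / ((long) perTask * threads);
        System.out.println(myPi + " time = " + (System.currentTimeMillis() - start));
    }
}
Note that each task here samples the whole unit square rather than a vertical slice of it; the estimator 4 * (points in circle) / (total points) is the same either way.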
The code below is intended to show a simple use of recursive fork/join (find max). I know the Java JIT can do this faster in a simple single-threaded loop; it's just for demonstration.
I initially implemented a find max using the ForkJoin framework which worked fine for large arrays of doubles (1024*1024).
I feel I should be able to achieve the same without the ForkJoin framework, using only Executors.newWorkStealingPool() and Callables/Futures.
Is this possible?
My attempt below:
class MaxTask implements Callable<Double> {
private double[] array;
private ExecutorService executorService;
public MaxTask(double[] array, ExecutorService es){
this.array = array;
this.executorService = es;
}
@Override
public Double call() throws Exception {
if (this.array.length!=2){
double[] a = new double[(this.array.length/2)];
double[] b = new double[(this.array.length/2)];
for (int i=0;i<(this.array.length/2);i++){
a[i] = array[i];
b[i] = array[i+(this.array.length/2)];
}
Future<Double> f1 = this.executorService.submit(new MaxTask(a,this.executorService));
Future<Double> f2 = this.executorService.submit(new MaxTask(b,this.executorService));
return Math.max(f1.get(), f2.get());
} else {
return Math.max(this.array[0], this.array[1]);
}
}
}
ExecutorService es = Executors.newWorkStealingPool();
double[] x = new double[1024*1024];
for (int i=0;i<x.length;i++){
x[i] = Math.random();
}
MaxTask mt = new MaxTask(x,es);
es.submit(mt).get();
It seems it is possible to write a "fork/join"-style computation without the ForkJoin framework (see the use of Callable below).
The ForkJoin framework itself seems to make no performance difference here, though it is perhaps a bit tidier to code; I prefer just using Callables.
I also fixed the original attempt.
It looks like the threshold was too small in the original attempt, which is why it was slow; I think it needs to be at least as large as the number of cores.
I'm not sure whether using a ForkJoinPool will be faster for this use case; I would need to gather more stats. I'm thinking not, as it does not have any operations that block for a long time.
public class Main {
static class FindMaxTask extends RecursiveTask<Double> {
private int threshold;
private double[] data;
private int startIndex;
private int endIndex;
public FindMaxTask(double[] data, int startIndex, int endIndex, int threshold) {
super();
this.data = data;
this.startIndex = startIndex;
this.endIndex = endIndex;
this.threshold = threshold;
}
@Override
protected Double compute() {
int diff = (endIndex-startIndex+1);
if (diff!=(this.data.length/threshold)){
int aStartIndex = startIndex;
int aEndIndex = startIndex + (diff/2) - 1;
int bStartIndex = startIndex + (diff/2);
int bEndIndex = endIndex;
FindMaxTask f1 = new FindMaxTask(this.data,aStartIndex,aEndIndex,threshold);
f1.fork();
FindMaxTask f2 = new FindMaxTask(this.data,bStartIndex,bEndIndex,threshold);
return Math.max(f1.join(),f2.compute());
} else {
double max = Double.MIN_VALUE;
for (int i = startIndex; i <= endIndex; i++) {
double n = data[i];
if (n > max) {
max = n;
}
}
return max;
}
}
}
static class FindMax implements Callable<Double> {
private double[] data;
private int startIndex;
private int endIndex;
private int threshold;
private ExecutorService executorService;
public FindMax(double[] data, int startIndex, int endIndex, int threshold, ExecutorService executorService) {
super();
this.data = data;
this.startIndex = startIndex;
this.endIndex = endIndex;
this.executorService = executorService;
this.threshold = threshold;
}
@Override
public Double call() throws Exception {
int diff = (endIndex-startIndex+1);
if (diff!=(this.data.length/this.threshold)){
int aStartIndex = startIndex;
int aEndIndex = startIndex + (diff/2) - 1;
int bStartIndex = startIndex + (diff/2);
int bEndIndex = endIndex;
Future<Double> f1 = this.executorService.submit(new FindMax(this.data,aStartIndex,aEndIndex,this.threshold,this.executorService));
Future<Double> f2 = this.executorService.submit(new FindMax(this.data,bStartIndex,bEndIndex,this.threshold,this.executorService));
return Math.max(f1.get(), f2.get());
} else {
double max = Double.MIN_VALUE;
for (int i = startIndex; i <= endIndex; i++) {
double n = data[i];
if (n > max) {
max = n;
}
}
return max;
}
}
}
public static void main(String[] args) throws InterruptedException, ExecutionException {
double[] data = new double[1024*1024*64];
for (int i=0;i<data.length;i++){
data[i] = Math.random();
}
int p = Runtime.getRuntime().availableProcessors();
int threshold = p;
int threads = p;
Instant start = null;
Instant end = null;
ExecutorService es = null;
es = Executors.newFixedThreadPool(threads);
System.out.println("1. started..");
start = Instant.now();
System.out.println("max = "+es.submit(new FindMax(data,0,data.length-1,threshold,es)).get());
end = Instant.now();
System.out.println("Callable (recrusive), with fixed pool, Find Max took ms = "+ Duration.between(start, end).toMillis());
es = new ForkJoinPool();
System.out.println("2. started..");
start = Instant.now();
System.out.println("max = "+es.submit(new FindMax(data,0,data.length-1,threshold,es)).get());
end = Instant.now();
System.out.println("Callable (recursive), with fork join pool, Find Max took ms = "+ Duration.between(start, end).toMillis());
ForkJoinPool fj = new ForkJoinPool(threads);
System.out.println("3. started..");
start = Instant.now();
System.out.println("max = "+fj.invoke(new FindMaxTask(data,0,data.length-1,threshold)));
end = Instant.now();
System.out.println("RecursiveTask (fork/join framework),with fork join pool, Find Max took ms = "+ Duration.between(start, end).toMillis());
}
}
I need to create a program that can calculate approximation to the constant PI, using Java multi-thread.
I intend to use the Gregory-Leibniz series to calculate the result for PI / 4, and then multiply by 4 to get the PI approximation.
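For reference, the series is PI/4 = 1 - 1/3 + 1/5 - 1/7 + ..., i.e. the sum of (-1)^i / (2*i + 1) for i = 0, 1, 2, ...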
But I have some concern about the program:
How can I separate the calculation so that I can implement multi-threaded processing for the program? Because the formula is a single total sum, I don't know how to split it into parts and then collect them all at the end.
Since the program would otherwise run the formula forever, the user will need some way to configure when the execution should stop and return a result. Is that possible, and how can I do it?
This is as far as I have gotten so far.
public class PICalculate {
public static void main(String[] args) {
System.out.println(calculatePI(5000000) * 4);
}
static double calculatePI(int n) {
double result = 0.0;
if (n < 0) {
return 0.0;
}
for (int i = 0; i <= n; i++) {
result += Math.pow(-1, i) / ((2 * i) + 1);
}
return result;
}
}
The most straightforward, though not the most optimal, approach is to distribute the sequence elements among the threads you have. I.e., with 4 threads, thread 1 works on the elements whose index satisfies i % 4 == 0, thread 2 on those with i % 4 == 1, and so on:
public static void main(String ... args) throws InterruptedException {
int threadCount = 4;
int N = 100_000;
PiThread[] threads = new PiThread[threadCount];
for (int i = 0; i < threadCount; i++) {
threads[i] = new PiThread(threadCount, i, N);
threads[i].start();
}
for (int i = 0; i < threadCount; i++) {
threads[i].join();
}
double pi = 0;
for (int i = 0; i < threadCount; i++) {
pi += threads[i].getSum();
}
System.out.print("PI/4 = " + pi);
}
static class PiThread extends Thread {
private final int threadCount;
private final int threadRemainder;
private final int N;
private double sum = 0;
public PiThread(int threadCount, int threadRemainder, int n) {
this.threadCount = threadCount;
this.threadRemainder = threadRemainder;
N = n;
}
@Override
public void run() {
for (int i = 0; i <= N; i++) {
if (i % threadCount == threadRemainder) {
sum += Math.pow(-1, i) / (2 * i + 1);
}
}
}
public double getSum() {
return sum;
}
}
PiThread is more efficient, though arguably harder to read, if the loop is written like this:
public void run() {
for (int i = threadRemainder; i <= N; i += threadCount) {
sum += Math.pow(-1, i) / (2 * i + 1);
}
}
If you don't want to limit yourself by the number of elements in the sequence but only by time, you can follow the approach below. Note that it is still limited by Long.MAX_VALUE; you would have to use BigInteger, BigDecimal, or some other reasonable approach to go beyond that.
public static volatile boolean running = true;
public static void main(String ... args) throws InterruptedException {
int threadCount = 4;
long timeoutMs = 5_000;
final AtomicLong counter = new AtomicLong(0);
PiThread[] threads = new PiThread[threadCount];
for (int i = 0; i < threadCount; i++) {
threads[i] = new PiThread(counter);
threads[i].start();
}
Thread.sleep(timeoutMs);
running = false;
for (int i = 0; i < threadCount; i++) {
threads[i].join();
}
double sum = 0;
for (int i = 0; i < threadCount; i++) {
sum += threads[i].getSum();
}
System.out.print("counter = " + counter.get());
System.out.print("PI = " + 4*sum);
}
static class PiThread extends Thread {
private AtomicLong counter;
private double sum = 0;
public PiThread(AtomicLong counter) {
this.counter = counter;
}
@Override
public void run() {
long i;
while (running && isValidCounter(i = counter.getAndAdd(1))) {
sum += Math.pow(-1, i) / (2 * i + 1);
}
}
private boolean isValidCounter(long value) {
return value >= 0 && value < Long.MAX_VALUE;
}
public double getSum() {
return sum;
}
}
I am currently working with a gaming code library, and one part of it uses a genetic algorithm for some of the AI. The concept is rather new to me, and although I have looked up and read about how it works, I am finding it hard to link precisely what is being evolved, mutated, etc. to the actual code.
The actions are basically the possible moves the AI can make. So it is trying to evolve a state and find out which is the best action to take. Can anyone help me understand it more clearly than that?
private static double GAMMA = 0.90;
private static long BREAK_MS = 35;
private static int SIMULATION_DEPTH = 7;
private static int POPULATION_SIZE = 5;
private static double RECPROB = 0.1;
private double MUT = (1.0 / SIMULATION_DEPTH);
private final int N_ACTIONS;
private ElapsedCpuTimer timer;
private int genome[][][];
private final HashMap<Integer, Types.ACTIONS> action_mapping;
private final HashMap<Types.ACTIONS, Integer> r_action_mapping;
protected Random randomGenerator;
private int numSimulations;
/**
* Public constructor with state observation and time due.
*
* @param stateObs state observation of the current game.
* @param elapsedTimer Timer for the controller creation.
*/
public Agent(StateObservation stateObs, ElapsedCpuTimer elapsedTimer) {
randomGenerator = new Random();
action_mapping = new HashMap<Integer, Types.ACTIONS>();
r_action_mapping = new HashMap<Types.ACTIONS, Integer>();
int i = 0;
for (Types.ACTIONS action : stateObs.getAvailableActions()) {
action_mapping.put(i, action);
r_action_mapping.put(action, i);
i++;
}
N_ACTIONS = stateObs.getAvailableActions().size();
initGenome(stateObs);
}
double microbial_tournament(int[][] actionGenome, StateObservation stateObs, StateHeuristic heuristic) throws TimeoutException {
int a, b, c, W, L;
int i;
a = (int) ((POPULATION_SIZE - 1) * randomGenerator.nextDouble());
do {
b = (int) ((POPULATION_SIZE - 1) * randomGenerator.nextDouble());
} while (a == b);
double score_a = simulate(stateObs, heuristic, actionGenome[a]);
double score_b = simulate(stateObs, heuristic, actionGenome[b]);
if (score_a > score_b) {
W = a;
L = b;
} else {
W = b;
L = a;
}
int LEN = actionGenome[0].length;
for (i = 0; i < LEN; i++) {
if (randomGenerator.nextDouble() < RECPROB) {
actionGenome[L][i] = actionGenome[W][i];
}
}
for (i = 0; i < LEN; i++) {
if (randomGenerator.nextDouble() < MUT) actionGenome[L][i] = randomGenerator.nextInt(N_ACTIONS);
}
return Math.max(score_a, score_b);
}
private void initGenome(StateObservation stateObs) {
genome = new int[N_ACTIONS][POPULATION_SIZE][SIMULATION_DEPTH];
// Randomize initial genome
for (int i = 0; i < genome.length; i++) {
for (int j = 0; j < genome[i].length; j++) {
for (int k = 0; k < genome[i][j].length; k++) {
genome[i][j][k] = randomGenerator.nextInt(N_ACTIONS);
}
}
}
}
private double simulate(StateObservation stateObs, StateHeuristic heuristic, int[] policy) throws TimeoutException {
//System.out.println("depth" + depth);
long remaining = timer.remainingTimeMillis();
if (remaining < BREAK_MS) {
//System.out.println(remaining);
throw new TimeoutException("Timeout");
}
int depth = 0;
stateObs = stateObs.copy();
for (; depth < policy.length; depth++) {
Types.ACTIONS action = action_mapping.get(policy[depth]);
stateObs.advance(action);
if (stateObs.isGameOver()) {
break;
}
}
numSimulations++;
double score = Math.pow(GAMMA, depth) * heuristic.evaluateState(stateObs);
return score;
}
private Types.ACTIONS microbial(StateObservation stateObs, int maxdepth, StateHeuristic heuristic, int iterations) {
double[] maxScores = new double[stateObs.getAvailableActions().size()];
for (int i = 0; i < maxScores.length; i++) {
maxScores[i] = Double.NEGATIVE_INFINITY;
}
outerloop:
for (int i = 0; i < iterations; i++) {
for (Types.ACTIONS action : stateObs.getAvailableActions()) {
StateObservation stCopy = stateObs.copy();
stCopy.advance(action);
double score = 0;
try {
score = microbial_tournament(genome[r_action_mapping.get(action)], stCopy, heuristic) + randomGenerator.nextDouble()*0.00001;
} catch (TimeoutException e) {
break outerloop;
}
int int_act = this.r_action_mapping.get(action);
if (score > maxScores[int_act]) {
maxScores[int_act] = score;
}
}
}
Types.ACTIONS maxAction = this.action_mapping.get(Utils.argmax(maxScores));
return maxAction;
}
/**
* Picks an action. This function is called every game step to request an
* action from the player.
*
* @param stateObs Observation of the current state.
* @param elapsedTimer Timer when the action returned is due.
* @return An action for the current state
*/
public Types.ACTIONS act(StateObservation stateObs, ElapsedCpuTimer elapsedTimer) {
this.timer = elapsedTimer;
numSimulations = 0;
Types.ACTIONS lastGoodAction = microbial(stateObs, SIMULATION_DEPTH, new WinScoreHeuristic(stateObs), 100);
return lastGoodAction;
}
@Override
public void draw(Graphics2D g)
{
//g.drawString("Num Simulations: " + numSimulations, 10, 20);
}
}
genome is the encoding of the solution (the genotype), which simulate translates into the actual problem space (the phenotype); a fitness score is returned as part of that evaluation. The other methods initialise or perturb the genotype to obtain different solutions.
Please ask more specific questions if you need more, rather than dumping a whole lot of code and asking "please explain"!
I have tried many ways to get the scenario below to work, and the result is infinite printing of fork fork fork... I tried to debug, but it always waits in task.join() for a long time with no result. I understand the concept of fork/join well; I can use it when a task can be divided into sub-parts, such as Fibonacci or finding the maximum of an array. The scenario here is different in the sense that I have to iterate in compute(), which isn't recursive. Can anyone help?
CompositePoolTest
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
public class CompositePoolTest {
Random random = new Random(123);
int done = 0;
int rest= 0;
int tt = 4;
ForkJoinPool pool = new ForkJoinPool(tt);
int M= 1000;
int N = 1000;
public static void main(String[] args) {
new CompositePoolTest().compute();
}
private void compute() {
double[][] original_matrix = new double[M][N];
original_matrix = radom_intialization();
double[][] temp_matrix = new double[M][N];
done= 0;
rest= (M * N - done) / (tt- 0);
DynamicCompositeFinder dynamicFinder = new DynamicCompositeFinder(done,rest,original_matrix,temp_matrix);
new ForkJoinPool().invoke(dynamicFinder);
}
private double[][] radom_intialization() {
double [][] grid_matrix = new double[M][N];
for (int i = 0; i < M; i++)
for (int j = 0; j < N; j++) {
grid_matrix[i][j] = random.nextDouble()+0.10;
}
return grid_matrix;
}
}
DynamicCompositeFinder
package test;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
public class DynamicCompositeFinder extends RecursiveAction {
int done = 0;
int rest = 0;
int pp = 4;
ForkJoinPool pool = new ForkJoinPool(pp);
// Matrix dimensions
int M = 1000;
int N = 1000;
int x = 0;
int y = 0;
int niteration = 150;
double[][] original_matrix = new double[M][N];
double[][] temp_matrix = new double[M][N];
public DynamicCompositeFinder(int done, int rest, double[][] original_matrix, double[][] temp_matrix) {
this.done = done;
this.rest = rest;
this.original_matrix = original_matrix;
this.temp_matrix = temp_matrix;
int limit = done + rest;
for (int i = done; i < limit; i++) {
x = i / M;
y = i % M;
temp_matrix[x][y] = fun_calculation(x, y, original_matrix);
}
}
private double fun_calculation(int x2, int y2, double[][] original_matrix2) {
double temp = 2 * (original_matrix2[x][y] );
return temp;
}
@Override
protected void compute() {
for (int i = 0; i < niteration; i++) {
done = 0;
List<RecursiveAction> forks = new LinkedList<RecursiveAction>();
for (int p = 0; p < pp; p++) // n is predefined n = 9
{
rest = (M * N - done) / (pp - p);
DynamicCompositeFinder finder = new DynamicCompositeFinder(done, rest, original_matrix, temp_matrix);
p++;
forks.add((RecursiveAction) finder.fork());
System.out.println("Fork-" + Thread.currentThread().getName()
+ " State: " + Thread.currentThread().getState());
}
for (RecursiveAction task : forks) {
task.join();
System.out.println("Join-" + Thread.currentThread().getName()
+ " State: " + Thread.currentThread().getState());
}
original_matrix = copy_matrix(temp_matrix);
}
}
public double[][] copy_matrix(double [][] matrix)
{
double [][] out= new double [matrix.length][matrix[0].length];
for(int i=0;i<matrix.length;i++)
{
out[i]= matrix[i].clone();
}
return out;
}}
Output
Fork-ForkJoinPool-1-worker-1 State: RUNNABLE
Fork-ForkJoinPool-1-worker-1 State: RUNNABLE
Fork-ForkJoinPool-1-worker-2 State: RUNNABLE
Fork-ForkJoinPool-1-worker-2 State: RUNNABLE
Fork-ForkJoinPool-1-worker-3 State: RUNNABLE
Fork-ForkJoinPool-1-worker-4 State: RUNNABLE
Fork-ForkJoinPool-1-worker-4 State: RUNNABLE
Fork-ForkJoinPool-1-worker-3 State: RUNNABLE
Fork-ForkJoinPool-1-worker-5 State: RUNNABLE
Fork-ForkJoinPool-1-worker-5 State: RUNNABLE
Fork-ForkJoinPool-1-worker-6 State: RUNNABLE
Fork-ForkJoinPool-1-worker-6 State: RUNNABLE
Fork-ForkJoinPool-1-worker-7 State: RUNNABLE
Fork-ForkJoinPool-1-worker-7 State: RUNNABLE
Fork-ForkJoinPool-1-worker-8 State: RUNNABLE
Fork-ForkJoinPool-1-worker-8 State: RUNNABLE
Fork-ForkJoinPool-1-worker-9 State: RUNNABLE
Fork-ForkJoinPool-1-worker-9 State: RUNNABLE
Fork-ForkJoinPool-1-worker-10 State: RUNNABLE
Fork-ForkJoinPool-1-worker-10 State: RUNNABLE
Fork-ForkJoinPool-1-worker-11 State: RUNNABLE
Fork-ForkJoinPool-1-worker-11 State: RUNNABLE
Fork-ForkJoinPool-1-worker-12 State: RUNNABLE
Fork-ForkJoinPool-1-worker-12 State: RUNNABLE
Fork-ForkJoinPool-1-worker-13 State: RUNNABLE
.....
......
The major problem is that you fork() forever. There is no stopper code, such as:
if (computed < limiter) return;
Therefore you add tasks to the deque, and each thread picks up a task and forks more tasks, forever. I added a stopper to your code and ran it in Java 7. The join() gets called, but the outer iteration keeps going forever, so you have a logic problem there as well.
The second problem is that you misunderstand the F/J framework. This framework is not a general-purpose parallel engine. It is academic code specifically designed to recursively walk down the leaves of a balanced tree (a DAG). Since you do not have a balanced tree, you cannot process according to the example given in the JavaDoc:
split left, right;
left.fork();
right.compute();
left.join();
And you are not doing recursive decomposition. Your code would be more appropriate for Java 8's CountedCompleter.
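To illustrate the stopper idea, here is a minimal sketch (mine, not the poster's code) of a RecursiveAction over an index range that stops forking below a threshold and follows the fork/compute/join pattern from the JavaDoc; the leaf work mirrors the poster's fun_calculation (doubling each cell):
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class DoubleCells extends RecursiveAction {
    private static final int THRESHOLD = 10_000; // stopper: ranges this small are computed directly
    private final double[] src;
    private final double[] dst;
    private final int from;
    private final int to; // half-open range [from, to)

    DoubleCells(double[] src, double[] dst, int from, int to) {
        this.src = src;
        this.dst = dst;
        this.from = from;
        this.to = to;
    }

    @Override
    protected void compute() {
        if (to - from <= THRESHOLD) { // the stopper: no further forking below the threshold
            for (int i = from; i < to; i++) {
                dst[i] = 2 * src[i]; // leaf work, mirroring fun_calculation
            }
            return;
        }
        int mid = (from + to) >>> 1;
        DoubleCells left = new DoubleCells(src, dst, from, mid);
        DoubleCells right = new DoubleCells(src, dst, mid, to);
        left.fork();     // schedule the left half asynchronously
        right.compute(); // process the right half in this worker
        left.join();     // wait for the forked half
    }
}

// usage (illustrative): new ForkJoinPool().invoke(new DoubleCells(in, out, 0, in.length));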
I have implemented serial and parallel algorithms for solving linear systems using the Jacobi method. Both implementations converge and give correct solutions.
I am having trouble understanding:
How can the parallel implementation converge after so few iterations compared to the serial one (the same method is used in both)? Am I facing some concurrency issue that I am not aware of?
How can the number of iterations vary from run to run in the parallel implementation (6, 7)?
Thanks!
Program output:
Mathematica solution: {{-1.12756}, {4.70371}, {-1.89272}, {1.56218}}
Serial: iterations=7194 , error=false, solution=[-1.1270591, 4.7042074, -1.8922218, 1.5626835]
Parallel: iterations=6 , error=false, solution=[-1.1274619, 4.7035804, -1.8927546, 1.5621948]
Code:
Main
import java.util.Arrays;
public class Main {
public static void main(String[] args) {
Serial s = new Serial();
Parallel p = new Parallel(2);
s.solve();
p.solve();
System.out.println("Mathematica solution: {{-1.12756}, {4.70371}, {-1.89272}, {1.56218}}");
System.out.println(String.format("Serial: iterations=%d , error=%s, solution=%s", s.iter, s.errorFlag, Arrays.toString(s.data.solution)));
System.out.println(String.format("Parallel: iterations=%d , error=%s, solution=%s", p.iter, p.errorFlag, Arrays.toString(p.data.solution)));
}
}
Data
public class Data {
public float A[][] = {{2.886139567217389f, 0.9778259187352214f, 0.9432146432722157f, 0.9622157488990459f}
,{0.3023479007910952f,0.7503803506938734f,0.06163831478699766f,0.3856445043958068f}
,{0.4298384105199724f, 0.7787439716945019f, 1.838686110345417f, 0.6282668788698587f}
,{0.27798718418255075f, 0.09021764079496353f, 0.8765867330141233f, 1.246036349549629f}};
public float b[] = {1.0630309381779384f,3.674438173599066f,0.6796639099285651f,0.39831385324794155f};
public int size = A.length;
public float x[] = new float[size];
public float solution[] = new float[size];
}
Parallel
import java.util.Arrays;
public class Parallel {
private final int workers;
private float[] globalNorm;
public int iter;
public int maxIter = 1000000;
public double epsilon = 1.0e-3;
public boolean errorFlag = false;
public Data data = new Data();
public Parallel(int workers) {
this.workers = workers;
this.globalNorm = new float[workers];
Arrays.fill(globalNorm, 0);
}
public void solve() {
JacobiWorker[] threads = new JacobiWorker[workers];
int batchSize = data.size / workers;
float norm;
do {
for(int i=0;i<workers;i++) {
threads[i] = new JacobiWorker(i,batchSize);
threads[i].start();
}
for(int i=0;i<workers;i++)
try {
threads[i].join();
} catch (InterruptedException e) {
e.printStackTrace();
}
// At this point all worker calculations are done!
norm = 0;
for (float d : globalNorm) if (d > norm) norm = d;
if (norm < epsilon)
errorFlag = false; // Converged
else
errorFlag = true; // No desired convergence
} while (norm >= epsilon && ++iter <= maxIter);
}
class JacobiWorker extends Thread {
private final int idx;
private final int batchSize;
JacobiWorker(int idx, int batchSize) {
this.idx = idx;
this.batchSize = batchSize;
}
@Override
public void run() {
int upper = idx == workers - 1 ? data.size : (idx + 1) * batchSize;
float localNorm = 0, diff = 0;
for (int j = idx * batchSize; j < upper; j++) { // For every
// equation in batch
float s = 0;
for (int i = 0; i < data.size; i++) { // For every variable in
// equation
if (i != j)
s += data.A[j][i] * data.x[i];
data.solution[j] = (data.b[j] - s) / data.A[j][j];
}
diff = Math.abs(data.solution[j] - data.x[j]);
if (diff > localNorm) localNorm = diff;
data.x[j] = data.solution[j];
}
globalNorm[idx] = localNorm;
}
}
}
Serial
public class Serial {
public int iter;
public int maxIter = 1000000;
public double epsilon = 1.0e-3;
public boolean errorFlag = false;
public Data data = new Data();
public void solve() {
float norm,diff=0;
do {
for(int i=0;i<data.size;i++) {
float s=0;
for (int j = 0; j < data.size; j++) {
if (i != j)
s += data.A[i][j] * data.x[j];
data.solution[i] = (data.b[i] - s) / data.A[i][i];
}
}
norm = 0;
for (int i=0;i<data.size;i++) {
diff = Math.abs(data.solution[i]-data.x[i]); // Calculate convergence
if (diff > norm) norm = diff;
data.x[i] = data.solution[i];
}
if (norm < epsilon)
errorFlag = false; // Converged
else
errorFlag = true; // No desired convergence
} while (norm >= epsilon && ++iter <= maxIter);
}
}
I think it's a matter of implementation, not parallelization. Look at what happens with Parallel p = new Parallel(1):
Mathematica solution: {{-1.12756}, {4.70371}, {-1.89272}, {1.56218}}
Serial: iterations=7194 , error=false, solution=[-1.1270591, 4.7042074, -1.8922218, 1.5626835]
Parallel: iterations=6 , error=false, solution=[-1.1274619, 4.7035804, -1.8927546, 1.5621948]
As it turns out, your second implementation is not doing exactly the same thing as your first one.
I added this to your parallel version and it ran in the same number of iterations:
for (int i = idx * batchSize; i < upper; i++) {
    diff = Math.abs(data.solution[i] - data.x[i]); // Calculate convergence
    if (diff > localNorm)
        localNorm = diff;
    data.x[i] = data.solution[i];
}
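For context, here is roughly how the worker's run() looks with that change folded in, as a sketch (it mirrors the serial code: compute the whole batch against the old x values first, then update x and the norm in a second pass):
@Override
public void run() {
    int lower = idx * batchSize;
    int upper = idx == workers - 1 ? data.size : (idx + 1) * batchSize;
    float localNorm = 0;

    // Phase 1: compute the new value of every equation in this batch,
    // reading only the old values in data.x.
    for (int j = lower; j < upper; j++) {
        float s = 0;
        for (int i = 0; i < data.size; i++) {
            if (i != j) s += data.A[j][i] * data.x[i];
        }
        data.solution[j] = (data.b[j] - s) / data.A[j][j];
    }

    // Phase 2: once the whole batch is computed, update x and the local norm.
    for (int j = lower; j < upper; j++) {
        float diff = Math.abs(data.solution[j] - data.x[j]);
        if (diff > localNorm) localNorm = diff;
        data.x[j] = data.solution[j];
    }

    globalNorm[idx] = localNorm;
}
Note that one worker's update pass can still overlap another worker's compute pass; a fully deterministic Jacobi step would defer the x update until after all workers have been joined.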