I have implemented serial and parallel algorithms for solving linear systems using the Jacobi method. Both implementations converge and give correct solutions.
I am having trouble understanding:
How can the parallel implementation converge after so few iterations compared to the serial one (the same method is used in both)? Am I facing some concurrency issue that I am not aware of?
How can the number of iterations vary from run to run in the parallel implementation (6 or 7)?
Thanks!
Program output:
Mathematica solution: {{-1.12756}, {4.70371}, {-1.89272}, {1.56218}}
Serial: iterations=7194 , error=false, solution=[-1.1270591, 4.7042074, -1.8922218, 1.5626835]
Parallel: iterations=6 , error=false, solution=[-1.1274619, 4.7035804, -1.8927546, 1.5621948]
Code:
Main
import java.util.Arrays;
public class Main {
public static void main(String[] args) {
Serial s = new Serial();
Parallel p = new Parallel(2);
s.solve();
p.solve();
System.out.println("Mathematica solution: {{-1.12756}, {4.70371}, {-1.89272}, {1.56218}}");
System.out.println(String.format("Serial: iterations=%d , error=%s, solution=%s", s.iter, s.errorFlag, Arrays.toString(s.data.solution)));
System.out.println(String.format("Parallel: iterations=%d , error=%s, solution=%s", p.iter, p.errorFlag, Arrays.toString(p.data.solution)));
}
}
Data
public class Data {
public float A[][] = {{2.886139567217389f, 0.9778259187352214f, 0.9432146432722157f, 0.9622157488990459f}
,{0.3023479007910952f,0.7503803506938734f,0.06163831478699766f,0.3856445043958068f}
,{0.4298384105199724f, 0.7787439716945019f, 1.838686110345417f, 0.6282668788698587f}
,{0.27798718418255075f, 0.09021764079496353f, 0.8765867330141233f, 1.246036349549629f}};
public float b[] = {1.0630309381779384f,3.674438173599066f,0.6796639099285651f,0.39831385324794155f};
public int size = A.length;
public float x[] = new float[size];
public float solution[] = new float[size];
}
Parallel
import java.util.Arrays;
public class Parallel {
private final int workers;
private float[] globalNorm;
public int iter;
public int maxIter = 1000000;
public double epsilon = 1.0e-3;
public boolean errorFlag = false;
public Data data = new Data();
public Parallel(int workers) {
this.workers = workers;
this.globalNorm = new float[workers];
Arrays.fill(globalNorm, 0);
}
public void solve() {
JacobiWorker[] threads = new JacobiWorker[workers];
int batchSize = data.size / workers;
float norm;
do {
for(int i=0;i<workers;i++) {
threads[i] = new JacobiWorker(i,batchSize);
threads[i].start();
}
for(int i=0;i<workers;i++)
try {
threads[i].join();
} catch (InterruptedException e) {
e.printStackTrace();
}
// At this point all worker calculations are done!
norm = 0;
for (float d : globalNorm) if (d > norm) norm = d;
if (norm < epsilon)
errorFlag = false; // Converged
else
errorFlag = true; // No desired convergence
} while (norm >= epsilon && ++iter <= maxIter);
}
class JacobiWorker extends Thread {
private final int idx;
private final int batchSize;
JacobiWorker(int idx, int batchSize) {
this.idx = idx;
this.batchSize = batchSize;
}
@Override
public void run() {
int upper = idx == workers - 1 ? data.size : (idx + 1) * batchSize;
float localNorm = 0, diff = 0;
for (int j = idx * batchSize; j < upper; j++) { // For every
// equation in batch
float s = 0;
for (int i = 0; i < data.size; i++) { // For every variable in
// equation
if (i != j)
s += data.A[j][i] * data.x[i];
data.solution[j] = (data.b[j] - s) / data.A[j][j];
}
diff = Math.abs(data.solution[j] - data.x[j]);
if (diff > localNorm) localNorm = diff;
data.x[j] = data.solution[j];
}
globalNorm[idx] = localNorm;
}
}
}
Serial
public class Serial {
public int iter;
public int maxIter = 1000000;
public double epsilon = 1.0e-3;
public boolean errorFlag = false;
public Data data = new Data();
public void solve() {
float norm,diff=0;
do {
for(int i=0;i<data.size;i++) {
float s=0;
for (int j = 0; j < data.size; j++) {
if (i != j)
s += data.A[i][j] * data.x[j];
data.solution[i] = (data.b[i] - s) / data.A[i][i];
}
}
norm = 0;
for (int i=0;i<data.size;i++) {
diff = Math.abs(data.solution[i]-data.x[i]); // Calculate convergence
if (diff > norm) norm = diff;
data.x[i] = data.solution[i];
}
if (norm < epsilon)
errorFlag = false; // Converged
else
errorFlag = true; // No desired convergence
} while (norm >= epsilon && ++iter <= maxIter);
}
}
I think it's a matter of implementation and not parallelization. Look at what happens with Parallel p = new Parallel(1);
Mathematica solution: {{-1.12756}, {4.70371}, {-1.89272}, {1.56218}}
Serial: iterations=7194 , error=false, solution=[-1.1270591, 4.7042074, -1.8922218, 1.5626835]
Parallel: iterations=6 , error=false, solution=[-1.1274619, 4.7035804, -1.8927546, 1.5621948]
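As it turns out, your second implementation is not doing exactly the same thing as your first one: the worker writes data.x[j] as soon as row j has been computed, so later rows in the same sweep already read the updated values (effectively Gauss-Seidel rather than Jacobi).
I added this to your parallel version, so that data.x is only updated after all rows of the batch have been computed, and it ran in the same number of iterations.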
for (int i = idx * batchSize; i < upper; i++) {
diff = Math.abs(data.solution[i] - data.x[i]); // Calculate
// convergence
if (diff > localNorm)
localNorm = diff;
data.x[i] = data.solution[i];
}
}
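To make the difference concrete, here is a minimal single-threaded sketch that runs both update rules on a small 2x2 system (not the system from the question). The class name SweepComparison and the test system are assumptions chosen for illustration. The only difference between the two methods is when x is overwritten.

```java
// Sketch only: SweepComparison and the 2x2 system are illustrative assumptions.
final class SweepComparison {

    /** Jacobi: every row reads the previous iterate; x is replaced after the full sweep. */
    static int jacobi(double[][] a, double[] b, double[] x, double eps, int maxIter) {
        int n = b.length;
        double[] next = new double[n];
        for (int iter = 1; iter <= maxIter; iter++) {
            double norm = 0;
            for (int i = 0; i < n; i++) {
                double s = 0;
                for (int j = 0; j < n; j++) {
                    if (j != i) s += a[i][j] * x[j];
                }
                next[i] = (b[i] - s) / a[i][i];
                norm = Math.max(norm, Math.abs(next[i] - x[i]));
            }
            System.arraycopy(next, 0, x, 0, n); // publish the new iterate in one step
            if (norm < eps) return iter;
        }
        return maxIter;
    }

    /** Gauss-Seidel: x[i] is overwritten immediately, so later rows see the new value. */
    static int gaussSeidel(double[][] a, double[] b, double[] x, double eps, int maxIter) {
        int n = b.length;
        for (int iter = 1; iter <= maxIter; iter++) {
            double norm = 0;
            for (int i = 0; i < n; i++) {
                double s = 0;
                for (int j = 0; j < n; j++) {
                    if (j != i) s += a[i][j] * x[j];
                }
                double updated = (b[i] - s) / a[i][i];
                norm = Math.max(norm, Math.abs(updated - x[i]));
                x[i] = updated; // in-place update, as in the question's JacobiWorker
            }
            if (norm < eps) return iter;
        }
        return maxIter;
    }

    public static void main(String[] args) {
        double[][] a = {{4, 1}, {2, 3}};
        double[] b = {1, 2};
        System.out.println("Jacobi sweeps:       " + jacobi(a, b, new double[2], 1e-9, 10_000));
        System.out.println("Gauss-Seidel sweeps: " + gaussSeidel(a, b, new double[2], 1e-9, 10_000));
    }
}
```

On this system the in-place variant needs fewer sweeps. With several threads writing into the shared x array while others read it, which values a row sees depends on timing, which would be one plausible explanation for the iteration count changing between runs.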
I need to create a program that calculates an approximation to the constant PI using Java multithreading.
I intend to use the Gregory-Leibniz series to calculate PI / 4, and then multiply by 4 to get the PI approximation.
But I have some concerns about the program:
How can I separate the calculation so that I can use multithreaded processing? Because the formula is for the total sum, I don't know how to split it into parts and then collect them all at the end.
The series runs to infinity, so the user will need some way to configure the execution to determine when it should stop and return a result. Is that possible, and how can I do it?
This is as far as I have got.
public class PICalculate {
public static void main(String[] args) {
System.out.println(calculatePI(5000000) * 4);
}
static double calculatePI(int n) {
double result = 0.0;
if (n < 0) {
return 0.0;
}
for (int i = 0; i <= n; i++) {
result += Math.pow(-1, i) / ((2 * i) + 1);
}
return result;
}
}
The most straightforward, but not the most optimal, approach is to distribute the sequence elements among the threads you have. I.e., if you have 4 threads, thread 1 works on the elements with n%4 == 0, thread 2 on those with n%4 == 1, and so on:
public static void main(String ... args) throws InterruptedException {
int threadCount = 4;
int N = 100_000;
PiThread[] threads = new PiThread[threadCount];
for (int i = 0; i < threadCount; i++) {
threads[i] = new PiThread(threadCount, i, N);
threads[i].start();
}
for (int i = 0; i < threadCount; i++) {
threads[i].join();
}
double pi = 0;
for (int i = 0; i < threadCount; i++) {
pi += threads[i].getSum();
}
System.out.print("PI/4 = " + pi);
}
static class PiThread extends Thread {
private final int threadCount;
private final int threadRemainder;
private final int N;
private double sum = 0;
public PiThread(int threadCount, int threadRemainder, int n) {
this.threadCount = threadCount;
this.threadRemainder = threadRemainder;
N = n;
}
@Override
public void run() {
for (int i = 0; i <= N; i++) {
if (i % threadCount == threadRemainder) {
sum += Math.pow(-1, i) / (2 * i + 1);
}
}
}
public double getSum() {
return sum;
}
}
PiThread is more efficient, though arguably harder to read, if the loop steps directly over this thread's indices instead of testing every index:
public void run() {
for (int i = threadRemainder; i <= N; i += threadCount) {
sum += Math.pow(-1, i) / (2 * i + 1);
}
}
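Another way to split the sum, shown here only as a sketch, is to give each thread one contiguous block of indices and collect the partial sums through futures. The class name BlockedLeibniz and its method names are hypothetical. The sign is computed from the parity of i instead of Math.pow, which is an assumption-free simplification of the same term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: BlockedLeibniz is an illustrative name, not part of the answer above.
final class BlockedLeibniz {

    /** Sum of (-1)^i / (2i + 1) for i in [from, to). */
    static double partialSum(long from, long to) {
        double sum = 0;
        for (long i = from; i < to; i++) {
            double sign = (i % 2 == 0) ? 1.0 : -1.0;
            sum += sign / (2 * i + 1);
        }
        return sum;
    }

    /** Approximates pi from the first `terms` terms, one contiguous block per thread. */
    static double approximatePi(long terms, int threadCount) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            long blockSize = (terms + threadCount - 1) / threadCount; // ceiling division
            List<Future<Double>> parts = new ArrayList<>();
            for (int t = 0; t < threadCount; t++) {
                final long from = Math.min(terms, t * blockSize);
                final long to = Math.min(terms, from + blockSize);
                Callable<Double> task = () -> partialSum(from, to);
                parts.add(pool.submit(task));
            }
            double quarterPi = 0;
            for (Future<Double> part : parts) {
                quarterPi += part.get(); // waits until that block has finished
            }
            return 4 * quarterPi;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("partial sum failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(approximatePi(5_000_000L, 4));
    }
}
```

Each block is a plain loop over consecutive indices, so the per-thread work stays as simple as the serial version.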
In case you don't want to limit yourself by the number of elements in the sequence but only by time, you can follow the approach below. Note that it is still limited by Long.MAX_VALUE, and you would have to use BigInteger, BigDecimal or some other reasonable approach to go beyond that:
public static volatile boolean running = true;
public static void main(String ... args) throws InterruptedException {
int threadCount = 4;
long timeoutMs = 5_000;
final AtomicLong counter = new AtomicLong(0);
PiThread[] threads = new PiThread[threadCount];
for (int i = 0; i < threadCount; i++) {
threads[i] = new PiThread(counter);
threads[i].start();
}
Thread.sleep(timeoutMs);
running = false;
for (int i = 0; i < threadCount; i++) {
threads[i].join();
}
double sum = 0;
for (int i = 0; i < threadCount; i++) {
sum += threads[i].getSum();
}
System.out.print("counter = " + counter.get());
System.out.print("PI = " + 4*sum);
}
static class PiThread extends Thread {
private AtomicLong counter;
private double sum = 0;
public PiThread(AtomicLong counter) {
this.counter = counter;
}
@Override
public void run() {
long i;
while (running && isValidCounter(i = counter.getAndAdd(1))) {
sum += Math.pow(-1, i) / (2 * i + 1);
}
}
private boolean isValidCounter(long value) {
return value >= 0 && value < Long.MAX_VALUE;
}
public double getSum() {
return sum;
}
}
I am currently working with a gaming code library, and one of its parts uses a genetic algorithm for some of the AI. This concept is rather new to me, and although I've looked up and read about how it works, I am finding it hard to link precisely what is being evolved, mutated, etc. to the actual code.
The actions are basically each of the possible options that the AI has to move. So it is trying to evolve a state and find out which is the best action to take. Can anyone help me understand it a bit clearer than that?
private static double GAMMA = 0.90;
private static long BREAK_MS = 35;
private static int SIMULATION_DEPTH = 7;
private static int POPULATION_SIZE = 5;
private static double RECPROB = 0.1;
private double MUT = (1.0 / SIMULATION_DEPTH);
private final int N_ACTIONS;
private ElapsedCpuTimer timer;
private int genome[][][];
private final HashMap<Integer, Types.ACTIONS> action_mapping;
private final HashMap<Types.ACTIONS, Integer> r_action_mapping;
protected Random randomGenerator;
private int numSimulations;
/**
* Public constructor with state observation and time due.
*
* @param stateObs state observation of the current game.
* @param elapsedTimer Timer for the controller creation.
*/
public Agent(StateObservation stateObs, ElapsedCpuTimer elapsedTimer) {
randomGenerator = new Random();
action_mapping = new HashMap<Integer, Types.ACTIONS>();
r_action_mapping = new HashMap<Types.ACTIONS, Integer>();
int i = 0;
for (Types.ACTIONS action : stateObs.getAvailableActions()) {
action_mapping.put(i, action);
r_action_mapping.put(action, i);
i++;
}
N_ACTIONS = stateObs.getAvailableActions().size();
initGenome(stateObs);
}
double microbial_tournament(int[][] actionGenome, StateObservation stateObs, StateHeuristic heuristic) throws TimeoutException {
int a, b, c, W, L;
int i;
a = (int) ((POPULATION_SIZE - 1) * randomGenerator.nextDouble());
do {
b = (int) ((POPULATION_SIZE - 1) * randomGenerator.nextDouble());
} while (a == b);
double score_a = simulate(stateObs, heuristic, actionGenome[a]);
double score_b = simulate(stateObs, heuristic, actionGenome[b]);
if (score_a > score_b) {
W = a;
L = b;
} else {
W = b;
L = a;
}
int LEN = actionGenome[0].length;
for (i = 0; i < LEN; i++) {
if (randomGenerator.nextDouble() < RECPROB) {
actionGenome[L][i] = actionGenome[W][i];
}
}
for (i = 0; i < LEN; i++) {
if (randomGenerator.nextDouble() < MUT) actionGenome[L][i] = randomGenerator.nextInt(N_ACTIONS);
}
return Math.max(score_a, score_b);
}
private void initGenome(StateObservation stateObs) {
genome = new int[N_ACTIONS][POPULATION_SIZE][SIMULATION_DEPTH];
// Randomize initial genome
for (int i = 0; i < genome.length; i++) {
for (int j = 0; j < genome[i].length; j++) {
for (int k = 0; k < genome[i][j].length; k++) {
genome[i][j][k] = randomGenerator.nextInt(N_ACTIONS);
}
}
}
}
private double simulate(StateObservation stateObs, StateHeuristic heuristic, int[] policy) throws TimeoutException {
//System.out.println("depth" + depth);
long remaining = timer.remainingTimeMillis();
if (remaining < BREAK_MS) {
//System.out.println(remaining);
throw new TimeoutException("Timeout");
}
int depth = 0;
stateObs = stateObs.copy();
for (; depth < policy.length; depth++) {
Types.ACTIONS action = action_mapping.get(policy[depth]);
stateObs.advance(action);
if (stateObs.isGameOver()) {
break;
}
}
numSimulations++;
double score = Math.pow(GAMMA, depth) * heuristic.evaluateState(stateObs);
return score;
}
private Types.ACTIONS microbial(StateObservation stateObs, int maxdepth, StateHeuristic heuristic, int iterations) {
double[] maxScores = new double[stateObs.getAvailableActions().size()];
for (int i = 0; i < maxScores.length; i++) {
maxScores[i] = Double.NEGATIVE_INFINITY;
}
outerloop:
for (int i = 0; i < iterations; i++) {
for (Types.ACTIONS action : stateObs.getAvailableActions()) {
StateObservation stCopy = stateObs.copy();
stCopy.advance(action);
double score = 0;
try {
score = microbial_tournament(genome[r_action_mapping.get(action)], stCopy, heuristic) + randomGenerator.nextDouble()*0.00001;
} catch (TimeoutException e) {
break outerloop;
}
int int_act = this.r_action_mapping.get(action);
if (score > maxScores[int_act]) {
maxScores[int_act] = score;
}
}
}
Types.ACTIONS maxAction = this.action_mapping.get(Utils.argmax(maxScores));
return maxAction;
}
/**
* Picks an action. This function is called every game step to request an
* action from the player.
*
* @param stateObs Observation of the current state.
* @param elapsedTimer Timer when the action returned is due.
* @return An action for the current state
*/
public Types.ACTIONS act(StateObservation stateObs, ElapsedCpuTimer elapsedTimer) {
this.timer = elapsedTimer;
numSimulations = 0;
Types.ACTIONS lastGoodAction = microbial(stateObs, SIMULATION_DEPTH, new WinScoreHeuristic(stateObs), 100);
return lastGoodAction;
}
@Override
public void draw(Graphics2D g)
{
//g.drawString("Num Simulations: " + numSimulations, 10, 20);
}
}
genome is the encoding of the solution (genotype), which simulate translates into the actual problem space (phenotype). In addition, a fitness score is returned as part of the evaluation. The other methods initialise or perturb the genotype to obtain a different solution.
Please ask more specific questions if you need more, rather than dumping a whole lot of code and asking 'please explain'!
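To connect these terms to code, here is a toy sketch of one microbial tournament. It is not the library's agent: the fitness function, TARGET and the class name MicrobialSketch are assumptions made up for illustration, standing in for simulate and the game heuristic. Each row of the population is one genotype (a plan of action indices), fitness plays the role of evaluating the phenotype, and only the loser of a tournament is changed.

```java
import java.util.Random;

// Sketch only: a toy stand-in for the agent's microbial_tournament.
final class MicrobialSketch {
    static final int ACTIONS = 4;       // hypothetical number of available moves
    static final double RECPROB = 0.1;  // chance a loser gene is copied from the winner
    static final int[] TARGET = {2, 0, 3, 1, 1, 2, 0}; // made-up "ideal" plan

    /** Toy fitness: how many positions of the plan match TARGET. */
    static int fitness(int[] plan) {
        int score = 0;
        for (int i = 0; i < plan.length; i++) {
            if (plan[i] == TARGET[i]) score++;
        }
        return score;
    }

    /** One tournament: two distinct individuals compete; only the loser is modified. */
    static int tournament(int[][] population, Random rnd) {
        int a = rnd.nextInt(population.length);
        int b = (a + 1 + rnd.nextInt(population.length - 1)) % population.length; // b != a
        int fa = fitness(population[a]);
        int fb = fitness(population[b]);
        int winner = fa > fb ? a : b;
        int loser = fa > fb ? b : a;
        double mut = 1.0 / population[loser].length;
        for (int i = 0; i < population[loser].length; i++) {
            if (rnd.nextDouble() < RECPROB) population[loser][i] = population[winner][i]; // recombination
            if (rnd.nextDouble() < mut) population[loser][i] = rnd.nextInt(ACTIONS);      // mutation
        }
        return Math.max(fa, fb);
    }

    /** Best fitness currently present in the population. */
    static int best(int[][] population) {
        int best = 0;
        for (int[] plan : population) best = Math.max(best, fitness(plan));
        return best;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        int[][] population = new int[5][TARGET.length];
        for (int[] plan : population) {
            for (int i = 0; i < plan.length; i++) plan[i] = rnd.nextInt(ACTIONS);
        }
        for (int t = 0; t < 500; t++) tournament(population, rnd);
        System.out.println("best fitness after 500 tournaments: " + best(population));
    }
}
```

Because the winner is never modified, the best fitness in the population cannot decrease from one tournament to the next.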
I know this may be a stupid question, maybe the most stupid question today, but I have to ask it: Have I invented this sorting algorithm?
Yesterday, I had a little inspiration about an exchange-based sorting algorithm. Today, I implemented it, and it worked.
It probably already exists, since there are many not-so-popular sorting algorithms out there about which there is little or no information, and for which almost no implementations exist.
Description: Basically, this algorithm takes an item, then a pair, then an item again... until the end of the list. For each item/pair, it compares every two items at the same distance (radius) from that item or from the gap inside the pair, until a border of the array is reached, and exchanges those items if needed. This is repeated for each pair/item of the list.
An English-based pseudo-code:
FOR i index to last index of Array (starting from 0)
L index is i - 1
R index is i + 1
//Odd case, where i is the center
WHILE (L is in array range and R is in array range)
IF item Array[L] is greater than Array[R]
EXCHANGE item Array[L] with Array[R]
END-IF
ADD 1 to R
SUBTRACT 1 from L
END-WHILE
//Even case, where i is not the center
L index is now i
R index is now i + 1
WHILE (L is in array range and R is in array range)
IF item Array[L] is greater than Array[R]
EXCHANGE Array[L] with Array[R]
END-IF
ADD 1 to R
SUBTRACT 1 from L
END-WHILE
END FOR
This is the implementation in Java:
//package sorting;
public class OrbitSort {
public static void main(String[] args) {
int[] numbers ={ 15, 8, 6, 3, 11, 1, 2, 0, 14, 13, 7, 9, 4, 10, 5, 12 };
System.out.println("Original list:");
display(numbers);
sort(numbers);
System.out.println("\nSorted list:");
display(numbers);
}
//Sorting algorithm
public static void sort(int[] array) {
for(int i = 0; i < array.length; i++){
int L = i - 1;
int R = i + 1;
//Odd case (with a central item)
while(L >= 0 && R < array.length){
if(array[L] > array[R])
swap(array, L, R);
L--;
R++;
}
//Even case (with no central item)
L = i;
R = i + 1;
while(L >= 0 && R < array.length) {
if(array[L] > array[R])
swap(array, L, R);
L--;
R++;
}
}
}
//Swap two items in array.
public static void swap(int[] array, int x, int y) {
int temp = array[x];
array[x] = array[y];
array[y] = temp;
}
//Display items
public static void display(int[] numbers){
for(int i: numbers)
System.out.print(" " + i);
System.out.println();
}
}
I know it can be shorter, but it's just an early implementation.
It probably runs in O(n^2), but I'm not sure.
So, what do you think? Does it already exist?
To me, it looks like a modified bubble sort algorithm, which may perform better for certain arrangements of input elements.
Although not necessarily fair, I did a benchmark with warmup cycles, comparing:
java.util.Arrays.sort(), which for int[] is a dual-pivot quicksort implementation (in Java 7 and later)
BubbleSort.sort(), a java implementation of the bubble sort algo
OrbitSort.sort(), your algo
Results:
input size: 8192
warmup iterations: 32
Arrays.sort()
iterations : 10000
total time : 4940.0ms
avg time : 0.494ms
BubbleSort.sort()
iterations : 100
total time : 8360.0ms
avg time : 83.6ms
OrbitSort.sort()
iterations : 100
total time : 8820.0ms
avg time : 88.2ms
Of course, the performance depends on input size and arrangement
Straightforward code:
package com.sam.tests;
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.Callable;
public class SortBenchmark {
public static class OrbitSort {
// Sorting algorithm
public static void sort(int[] array) {
for (int i = 0; i < array.length; i++) {
int L = i - 1;
int R = i + 1;
// Odd case (with a central item)
while (L >= 0 && R < array.length) {
if (array[L] > array[R])
swap(array, L, R);
L--;
R++;
}
// Even case (with no central item)
L = i;
R = i + 1;
while (L >= 0 && R < array.length) {
if (array[L] > array[R])
swap(array, L, R);
L--;
R++;
}
}
}
// Swap two items in array.
public static void swap(int[] array, int x, int y) {
int temp = array[x];
array[x] = array[y];
array[y] = temp;
}
}
public static class BubbleSort {
public static void sort(int[] numbers) {
boolean swapped = true;
for (int i = numbers.length - 1; i > 0 && swapped; i--) {
swapped = false;
for (int j = 0; j < i; j++) {
if (numbers[j] > numbers[j + 1]) {
int temp = numbers[j];
numbers[j] = numbers[j + 1];
numbers[j + 1] = temp;
swapped = true;
}
}
}
}
}
public static class TestDataFactory {
public static enum ElementOrder {
Ascending, Descending, Random
}
public static int[] createIntArray(final int size, final ElementOrder elementOrder) {
int[] array = new int[size];
switch (elementOrder) {
case Ascending:
for (int i = 0; i < size; ++i)
array[i] = i;
break;
case Descending:
for (int i = 0; i < size; ++i)
array[i] = size - i - 1;
break;
case Random:
default:
Random rg = new Random(System.nanoTime());
for (int i = 0; i < size; ++i)
array[i] = rg.nextInt(size);
break;
}
return array;
}
}
public static class Benchmark {
// misc constants
public static final int NANOS_PER_MSEC = 1000000;
// config constants
public static final int BIGDECIMAL_PRECISION = 6;
// constant defaults
public static final long AUTOTUNING_MIN_ITERATIONS_DEFAULT = 1;
public static final long AUTOTUNING_MIN_DURATION_DEFAULT = 125;
public static final long BENCHMARK_MIN_ITERATIONS_DEFAULT = 1;
public static final long BENCHMARK_MAX_ITERATIONS_DEFAULT = Integer.MAX_VALUE;
public static final long BENCHMARK_TARGET_DURATION_DEFAULT = 125;
// private static final ThreadMXBean threadBean =
// ManagementFactory.getThreadMXBean();
public static final long getNanoTime() {
// return threadBean.getCurrentThreadCpuTime();// not good, runs at
// some time slice resolution
return System.nanoTime();
}
public static class Result {
public String name;
public long iterations;
public long totalTime; // nanoseconds
public Result(String name, long iterations, long startTime, long endTime) {
this.name = name;
this.iterations = iterations;
this.totalTime = endTime - startTime;
}
@Override
public String toString() {
final double totalTimeMSecs = ((double) totalTime) / NANOS_PER_MSEC;
final BigDecimal avgTimeMsecs = new BigDecimal(this.totalTime).divide(new BigDecimal(this.iterations).multiply(new BigDecimal(NANOS_PER_MSEC)),
BIGDECIMAL_PRECISION, RoundingMode.HALF_UP);
final String newLine = System.getProperty("line.separator");
StringBuilder sb = new StringBuilder();
sb.append(name).append(newLine);
sb.append(" ").append("iterations : ").append(iterations).append(newLine);
sb.append(" ").append("total time : ").append(totalTimeMSecs).append(" ms").append(newLine);
sb.append(" ").append("avg time : ").append(avgTimeMsecs).append(" ms").append(newLine);
return sb.toString();
}
}
public static <T> Result executionTime(final String name, final long iterations, final long warmupIterations, final Callable<T> test) throws Exception {
// vars
@SuppressWarnings("unused")
T ret;
long startTime;
long endTime;
// warmup
for (long i = 0; i < warmupIterations; ++i)
ret = test.call();
// actual benchmark iterations
{
startTime = getNanoTime();
for (long i = 0; i < iterations; ++i)
ret = test.call();
endTime = getNanoTime();
}
// return result
return new Result(name, iterations, startTime, endTime);
}
/**
* Auto tuned execution time measurement for test callbacks with steady
* execution time
*
* @param name
* @param test
* @return
* @throws Exception
*/
public static <T> Result executionTimeAutotuned(final String name, final Callable<T> test) throws Exception {
final long autoTuningMinIterations = AUTOTUNING_MIN_ITERATIONS_DEFAULT;
final long autoTuningMinDuration = AUTOTUNING_MIN_DURATION_DEFAULT;
final long benchmarkTargetDuration = BENCHMARK_TARGET_DURATION_DEFAULT;
final long benchmarkMinIterations = BENCHMARK_MIN_ITERATIONS_DEFAULT;
final long benchmarkMaxIterations = BENCHMARK_MAX_ITERATIONS_DEFAULT;
// vars
@SuppressWarnings("unused")
T ret;
final int prevThreadPriority;
long warmupIterations = 0;
long autoTuningDuration = 0;
long iterations = benchmarkMinIterations;
long startTime;
long endTime;
// store current thread priority and set it to max
prevThreadPriority = Thread.currentThread().getPriority();
Thread.currentThread().setPriority(Thread.MAX_PRIORITY);
// warmup and iteration count tuning
{
final long autoTuningMinTimeNanos = autoTuningMinDuration * NANOS_PER_MSEC;
long autoTuningConsecutiveLoops = 1;
double avgExecutionTime = 0;
do {
{
startTime = getNanoTime();
for (long i = 0; i < autoTuningConsecutiveLoops; ++i, ++warmupIterations) {
ret = test.call();
}
endTime = getNanoTime();
autoTuningDuration += (endTime - startTime);
}
avgExecutionTime = ((double) autoTuningDuration) / ((double) (warmupIterations));
if ((autoTuningDuration >= autoTuningMinTimeNanos) && (warmupIterations >= autoTuningMinIterations)) {
break;
} else {
final double remainingAutotuningIterations = ((double) (autoTuningMinTimeNanos - autoTuningDuration)) / avgExecutionTime;
autoTuningConsecutiveLoops = Math.max(1, Math.min(Integer.MAX_VALUE, (long) Math.ceil(remainingAutotuningIterations)));
}
} while (warmupIterations < Integer.MAX_VALUE);
final double requiredIterations = ((double) benchmarkTargetDuration * NANOS_PER_MSEC) / avgExecutionTime;
iterations = Math.max(1, Math.min(benchmarkMaxIterations, (long) Math.ceil(requiredIterations)));
}
// actual benchmark iterations
{
startTime = getNanoTime();
for (long i = 0; i < iterations; ++i)
ret = test.call();
endTime = getNanoTime();
}
// restore previous thread priority
Thread.currentThread().setPriority(prevThreadPriority);
// return result
return new Result(name, iterations, startTime, endTime);
}
}
public static void executeBenchmark(int inputSize, ArrayList<Benchmark.Result> results) {
// final int[] inputArray = { 15, 8, 6, 3, 11, 1, 2, 0, 14, 13, 7, 9, 4,
// 10, 5, 12 };
final int[] inputArray = TestDataFactory.createIntArray(inputSize, TestDataFactory.ElementOrder.Random);
try {
// compare against Arrays.sort()
{
final int[] ref = inputArray.clone();
Arrays.sort(ref);
{
int[] temp = inputArray.clone();
BubbleSort.sort(temp);
if (!Arrays.equals(temp, ref))
throw new Exception("BubbleSort.sort() failed");
}
{
int[] temp = inputArray.clone();
OrbitSort.sort(temp);
if (!Arrays.equals(temp, ref))
throw new Exception("OrbitSort.sort() failed");
}
}
results.add(Benchmark.executionTimeAutotuned("Arrays.sort()", new Callable<Void>() {
@Override
public Void call() throws Exception {
int[] temp = Arrays.copyOf(inputArray, inputArray.length);
Arrays.sort(temp);
return null;
}
}));
results.add(Benchmark.executionTimeAutotuned("BubbleSort.sort()", new Callable<Void>() {
@Override
public Void call() throws Exception {
int[] temp = Arrays.copyOf(inputArray, inputArray.length);
BubbleSort.sort(temp);
return null;
}
}));
results.add(Benchmark.executionTimeAutotuned("OrbitSort.sort()", new Callable<Void>() {
@Override
public Void call() throws Exception {
int[] temp = Arrays.copyOf(inputArray, inputArray.length);
OrbitSort.sort(temp);
return null;
}
}));
} catch (Exception e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
ArrayList<Benchmark.Result> results = new ArrayList<Benchmark.Result>();
for (int i = 16; i <= 16384; i <<= 1) {
results.clear();
executeBenchmark(i, results);
System.out.println("input size : " + i);
System.out.println("");
for (Benchmark.Result result : results) {
System.out.print(result.toString());
}
System.out.println("----------------------------------------------------");
}
}
}
It is O(n^2) (assuming it works, which I am not sure about). As to whether it already exists: maybe. It is not really original, as it can be considered a variation of a trivial sorting implementation, but I doubt there is any published algorithm which is exactly the same as this one, specifically one with two consecutive inner loops.
I am not saying it is without merit; there can be a use case for which its behavior is uniquely efficient (maybe where reading is much faster than writing, and cache behavior benefits from its access pattern).
To see why it is O(n^2), think about the outer-loop iterations where i lies in the middle third of the array: there are O(n) of them, and for each one the inner loops run over O(n) elements.
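The quadratic growth can also be checked by counting, as in the sketch below (the class name OrbitSortCount is an assumption for illustration). The loop bounds are copied from the question's sort method with the compare-and-swap replaced by a counter. The loop bounds do not depend on the data, and every index pair L < R is reached exactly once (from center (L+R)/2 in the odd case, or from i = (L+R-1)/2 in the even case), so the count is n(n-1)/2.

```java
// Sketch only: counts the comparisons made by the two inner loops of OrbitSort.
final class OrbitSortCount {

    /** Number of array[L] > array[R] comparisons for an input of length n. */
    static long comparisons(int n) {
        long count = 0;
        for (int i = 0; i < n; i++) {
            // Odd case: pairs (i-1, i+1), (i-2, i+2), ...
            for (int l = i - 1, r = i + 1; l >= 0 && r < n; l--, r++) count++;
            // Even case: pairs (i, i+1), (i-1, i+2), ...
            for (int l = i, r = i + 1; l >= 0 && r < n; l--, r++) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n = 1000; n <= 4000; n *= 2) {
            System.out.println("n=" + n + " comparisons=" + comparisons(n)
                    + " n(n-1)/2=" + ((long) n * (n - 1) / 2));
        }
    }
}
```

Doubling n roughly quadruples the number of comparisons, which matches the benchmark timings being in the same range as bubble sort.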
I'm trying to solve the problem of positioning N queens on an NxN board without row, column or diagonal conflicts. I use a min-conflicts algorithm. First, a queen is placed on a random row in each column. Then one of the conflicting queens is chosen at random, and the number of conflicts is calculated for each possible position in its column. The queen then moves to the position with the fewest conflicts. It works, but it runs extremely slowly. My goal is to make it run fast for 10000 queens. Could you please suggest some improvements, or point out any mistakes in my logic?
Here is my code:
public class Queen {
int column;
int row;
int d1;
int d2;
public Queen(int column, int row, int d1, int d2) {
super();
this.column = column;
this.row = row;
this.d1 = d1;
this.d2 = d2;
}
@Override
public String toString() {
return "Queen [column=" + column + ", row=" + row + ", d1=" + d1
+ ", d2=" + d2 + "]";
}
@Override
public boolean equals(Object obj) {
return ((Queen)obj).column == this.column && ((Queen)obj).row == this.row;
}
}
And:
import java.util.HashSet;
import java.util.Random;
public class SolveQueens {
public static boolean printBoard = false;
public static int N = 100;
public static int maxSteps = 2000000;
public static int[] queens = new int[N];
public static Random random = new Random();
public static HashSet<Queen> q = new HashSet<Queen>();
public static HashSet rowConfl[] = new HashSet[N];
public static HashSet d1Confl[] = new HashSet[2*N - 1];
public static HashSet d2Confl[] = new HashSet[2*N - 1];
public static void init () {
int r;
rowConfl = new HashSet[N];
d1Confl = new HashSet[2*N - 1];
d2Confl = new HashSet[2*N - 1];
for (int i = 0; i < N; i++) {
r = random.nextInt(N);
queens[i] = r;
Queen k = new Queen(i, r, i + r, N - 1 + i - r);
q.add(k);
if (rowConfl[k.row] == null) {
rowConfl[k.row] = new HashSet<Queen>();
}
if (d1Confl[k.d1] == null) {
d1Confl[k.d1] = new HashSet<Queen>();
}
if (d2Confl[k.d2] == null) {
d2Confl[k.d2] = new HashSet<Queen>();
}
((HashSet<Queen>)rowConfl[k.row]).add(k);
((HashSet<Queen>)d1Confl[k.d1]).add(k);
((HashSet<Queen>)d2Confl[k.d2]).add(k);
}
}
public static void print () {
for (int i = 0; i < N; i++) {
for (int j = 0; j < N; j++) {
System.out.print(queens[i] == j ? "♕ " : "◻◻◻ ");
}
System.out.println();
}
System.out.println();
}
public static boolean checkItLinear() {
Queen r = choseConflictQueen();
if (r == null) {
return true;
}
Queen newQ = findNewBestPosition(r);
q.remove(r);
q.add(newQ);
rowConfl[r.row].remove(r);
d1Confl[r.d1].remove(r);
d2Confl[r.d2].remove(r);
if (rowConfl[newQ.row] == null) {
rowConfl[newQ.row] = new HashSet<Queen>();
}
if (d1Confl[newQ.d1] == null) {
d1Confl[newQ.d1] = new HashSet<Queen>();
}
if (d2Confl[newQ.d2] == null) {
d2Confl[newQ.d2] = new HashSet<Queen>();
}
((HashSet<Queen>)rowConfl[newQ.row]).add(newQ);
((HashSet<Queen>)d1Confl[newQ.d1]).add(newQ);
((HashSet<Queen>)d2Confl[newQ.d2]).add(newQ);
queens[r.column] = newQ.row;
return false;
}
public static Queen choseConflictQueen () {
HashSet<Queen> conflictSet = new HashSet<Queen>();
boolean hasConflicts = false;
for (int i = 0; i < 2*N - 1; i++) {
if (i < N && rowConfl[i] != null) {
hasConflicts = hasConflicts || rowConfl[i].size() > 1;
conflictSet.addAll(rowConfl[i]);
}
if (d1Confl[i] != null) {
hasConflicts = hasConflicts || d1Confl[i].size() > 1;
conflictSet.addAll(d1Confl[i]);
}
if (d2Confl[i] != null) {
hasConflicts = hasConflicts || d2Confl[i].size() > 1;
conflictSet.addAll(d2Confl[i]);
}
}
if (hasConflicts) {
int c = random.nextInt(conflictSet.size());
return (Queen) conflictSet.toArray()[c];
}
return null;
}
public static Queen findNewBestPosition(Queen old) {
int[] row = new int[N];
int min = Integer.MAX_VALUE;
int minInd = old.row;
for (int i = 0; i < N; i++) {
if (rowConfl[i] != null) {
row[i] = rowConfl[i].size();
}
if (d1Confl[old.column + i] != null) {
row[i] += d1Confl[old.column + i].size();
}
if (d2Confl[N - 1 + old.column - i] != null) {
row[i] += d2Confl[N - 1 + old.column - i].size();
}
if (i == old.row) {
row[i] = row[i] - 3;
}
if (row[i] <= min && i != minInd) {
min = row[i];
minInd = i;
}
}
return new Queen(old.column, minInd, old.column + minInd, N - 1 + old.column - minInd);
}
public static void main(String[] args) {
long startTime = System.currentTimeMillis();
init();
int steps = 0;
while(!checkItLinear()) {
if (++steps > maxSteps) {
init();
steps = 0;
}
}
long endTime = System.currentTimeMillis();
System.out.println("Done for " + (endTime - startTime) + "ms\n");
if(printBoard){
print();
}
}
}
Edit:
Here is my slightly optimized solution, which removes some unused objects and places the queens on the diagonal when initializing.
import java.util.Random;
import java.util.Vector;
public class SolveQueens {
public static boolean PRINT_BOARD = true;
public static int N = 10;
public static int MAX_STEPS = 5000;
public static int[] queens = new int[N];
public static Random random = new Random();
public static int[] rowConfl = new int[N];
public static int[] d1Confl = new int[2*N - 1];
public static int[] d2Confl = new int[2*N - 1];
public static Vector<Integer> conflicts = new Vector<Integer>();
public static void init () {
random = new Random();
for (int i = 0; i < N; i++) {
queens[i] = i;
}
}
public static int getD1Pos (int col, int row) {
return col + row;
}
public static int getD2Pos (int col, int row) {
return N - 1 + col - row;
}
public static void print () {
for (int i = 0; i < N; i++) {
for (int j = 0; j < N; j++) {
System.out.print(queens[i] == j ? "Q " : "* ");
}
System.out.println();
}
System.out.println();
}
public static boolean hasConflicts() {
generateConflicts();
if (conflicts.isEmpty()) {
return false;
}
int r = random.nextInt(conflicts.size());
int conflQueenCol = conflicts.get(r);
int currentRow = queens[conflQueenCol];
int bestRow = currentRow;
int minConfl = getConflicts(conflQueenCol, queens[conflQueenCol]) - 3;
int tempConflCount;
for (int i = 0; i < N ; i++) {
tempConflCount = getConflicts(conflQueenCol, i);
if (i != currentRow && tempConflCount <= minConfl) {
minConfl = tempConflCount;
bestRow = i;
}
}
queens[conflQueenCol] = bestRow;
return true;
}
public static void generateConflicts () {
conflicts = new Vector<Integer>();
rowConfl = new int[N];
d1Confl = new int[2*N - 1];
d2Confl = new int[2*N - 1];
for (int i = 0; i < N; i++) {
int r = queens[i];
rowConfl[r]++;
d1Confl[getD1Pos(i, r)]++;
d2Confl[getD2Pos(i, r)]++;
}
for (int i = 0; i < N; i++) {
int conflictsCount = getConflicts(i, queens[i]) - 3;
if (conflictsCount > 0) {
conflicts.add(i);
}
}
}
public static int getConflicts(int col, int row) {
return rowConfl[row] + d1Confl[getD1Pos(col, row)] + d2Confl[getD2Pos(col, row)];
}
public static void main(String[] args) {
long startTime = System.currentTimeMillis();
init();
int steps = 0;
while(hasConflicts()) {
if (++steps > MAX_STEPS) {
init();
steps = 0;
}
}
long endTime = System.currentTimeMillis();
System.out.println("Done for " + (endTime - startTime) + "ms\n");
if(PRINT_BOARD){
print();
}
}
}
Comments would have been helpful :)
Rather than recreating your conflict set and your "worst conflict" queen every time, could you create them once and then just update the changed rows/columns?
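A minimal sketch of that incremental update, assuming the int-array counters (rowConfl, d1Confl, d2Confl) from the optimized version in the question. The class name and the moveQueen, attackers and solved helpers are hypothetical, not part of the original code.

```java
// Sketch: keep the conflict counters up to date instead of rebuilding them.
class IncrementalConflicts {
    final int n;
    final int[] queens;    // queens[col] = row of the queen in that column
    final int[] rowConfl;  // number of queens in each row
    final int[] d1Confl;   // number of queens on each "col + row" diagonal
    final int[] d2Confl;   // number of queens on each "n - 1 + col - row" diagonal

    IncrementalConflicts(int[] initialRows) {
        n = initialRows.length;
        queens = initialRows.clone();
        rowConfl = new int[n];
        d1Confl = new int[2 * n - 1];
        d2Confl = new int[2 * n - 1];
        // Build the counters once, in O(n).
        for (int col = 0; col < n; col++) {
            add(col, queens[col], +1);
        }
    }

    // Adds delta to the three counters covering square (col, row).
    private void add(int col, int row, int delta) {
        rowConfl[row] += delta;
        d1Confl[col + row] += delta;
        d2Confl[n - 1 + col - row] += delta;
    }

    // Moves the queen in col to newRow, touching only six counter entries.
    void moveQueen(int col, int newRow) {
        add(col, queens[col], -1);
        queens[col] = newRow;
        add(col, newRow, +1);
    }

    // Number of other queens attacking the queen in col
    // (the queen itself is counted once in each of the three counters).
    int attackers(int col) {
        int row = queens[col];
        return rowConfl[row] + d1Confl[col + row] + d2Confl[n - 1 + col - row] - 3;
    }

    // True when no queen is attacked.
    boolean solved() {
        for (int col = 0; col < n; col++) {
            if (attackers(col) > 0) return false;
        }
        return true;
    }
}
```

With this layout each step of the local search costs O(1) for the move itself; only the scan for a conflicting queen remains O(N).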
EDIT 0:
I tried playing around with your code a bit. Since the code is randomized, it's hard to find out if a change is good or not, since you might start with a good initial state or a crappy one. I tried making 10 runs with 10 queens, and got wildly different answers, but results are below.
I pseudo-profiled to see which statements were being executed the most, and it turns out the inner loop statements in chooseConflictQueen are executed the most. I tried inserting a break to pull the first conflict queen if found, but it didn't seem to help much.
Grouping only runs that took more than a second:
I realize I only have 10 runs, which is not really enough to be statistically valid, but hey.
So adding breaks didn't seem to help. I think a constructive solution will likely be faster, but randomness will again make it harder to check.
Your approach is good: a local search algorithm with the minimum-conflicts heuristic. I would suggest improving your initial state. Instead of randomly placing all queens, one per column, try to place them so that you minimize the number of conflicts. An example would be to place your next queen based on the position of the previous one, or maybe the previous two. Then your local search will have fewer problematic columns to deal with.
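One way to build such a starting position, sketched under the assumption that "minimize conflicts" means picking, column by column, the row attacked by the fewest queens already placed. The class GreedyInit and both method names are hypothetical.

```java
import java.util.Random;

// Sketch: greedy initial placement for min-conflicts local search.
class GreedyInit {
    // Places one queen per column, each in the row with the fewest conflicts
    // against the queens already placed; ties are broken at random.
    static int[] place(int n, Random random) {
        int[] queens = new int[n];
        int[] rowConfl = new int[n];
        int[] d1Confl = new int[2 * n - 1];
        int[] d2Confl = new int[2 * n - 1];
        for (int col = 0; col < n; col++) {
            int bestRow = 0;
            int best = Integer.MAX_VALUE;
            int ties = 0;
            for (int row = 0; row < n; row++) {
                int c = rowConfl[row] + d1Confl[col + row] + d2Confl[n - 1 + col - row];
                if (c < best) {
                    best = c;
                    bestRow = row;
                    ties = 1;
                } else if (c == best && random.nextInt(++ties) == 0) {
                    bestRow = row; // uniform random choice among equally good rows
                }
            }
            queens[col] = bestRow;
            rowConfl[bestRow]++;
            d1Confl[col + bestRow]++;
            d2Confl[n - 1 + col - bestRow]++;
        }
        return queens;
    }

    // Counts attacking pairs, useful for comparing starting positions.
    static int attackingPairs(int[] queens) {
        int pairs = 0;
        for (int a = 0; a < queens.length; a++) {
            for (int b = a + 1; b < queens.length; b++) {
                if (queens[a] == queens[b] || Math.abs(queens[a] - queens[b]) == b - a) {
                    pairs++;
                }
            }
        }
        return pairs;
    }
}
```

The array returned by place can be copied into queens in init(); attackingPairs is only there to compare this start against the all-on-one-diagonal start, where every pair of queens attacks each other.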
If you randomly select, you could be selecting the same state as a previous state. Theoretically, you might never find a solution even if there is one.
I think you would be better off iterating through the states systematically.
Also, are you sure every board size is solvable?
By inspection, 2x2 is not and 3x3 is not. 4x4 is (for example rows 2, 4, 1, 3 by column), and solutions exist for every N >= 4.
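Iterating through the states systematically could look like the following backtracking sketch (the class and method names are hypothetical). It returns null when a board size has no solution, which gives a direct way to check solvability for small N.

```java
// Sketch: systematic search, one column at a time, undoing dead ends.
class BacktrackingQueens {
    // Returns queens[col] = row for the first solution found, or null if none exists.
    // Assumes n >= 1.
    static int[] solve(int n) {
        int[] queens = new int[n];
        boolean[] rowUsed = new boolean[n];
        boolean[] d1Used = new boolean[2 * n - 1];
        boolean[] d2Used = new boolean[2 * n - 1];
        return place(0, n, queens, rowUsed, d1Used, d2Used) ? queens : null;
    }

    private static boolean place(int col, int n, int[] queens,
                                 boolean[] rowUsed, boolean[] d1Used, boolean[] d2Used) {
        if (col == n) {
            return true; // every column has a queen
        }
        for (int row = 0; row < n; row++) {
            int d1 = col + row;
            int d2 = n - 1 + col - row;
            if (rowUsed[row] || d1Used[d1] || d2Used[d2]) {
                continue; // square is attacked by an earlier queen
            }
            queens[col] = row;
            rowUsed[row] = d1Used[d1] = d2Used[d2] = true;
            if (place(col + 1, n, queens, rowUsed, d1Used, d2Used)) {
                return true;
            }
            // Undo the placement and try the next row.
            rowUsed[row] = d1Used[d1] = d2Used[d2] = false;
        }
        return false;
    }
}
```

Unlike the randomized search, this never revisits a state, but its running time grows quickly with N, so it is mainly useful for small boards.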
I'm trying to compute Pi, but what I really want to achieve is efficiency when using more than one thread. The algorithm is simple: I randomly generate points in the unit square and after that count how many of them are in the circle inscribed within the square. (more here: http://math.fullerton.edu/mathews/n2003/montecarlopimod.html)
My idea is to split the square horizontally and to run a different thread for each part of it.
But instead of speed up, all I get is a delay. Any ideas why? Here is the code:
public class TaskManager {
public static void main(String[] args) {
int threadsCount = 3;
int size = 10000000;
boolean isQuiet = false;
PiCalculator pi = new PiCalculator(size);
Thread tr[] = new Thread[threadsCount];
long time = -System.currentTimeMillis();
int i;
double s = 1.0/threadsCount;
int p = size/threadsCount;
for(i = 0; i < threadsCount; i++) {
PiRunnable r = new PiRunnable(pi, s*i, s*(1.0+i), p, isQuiet);
tr[i] = new Thread(r);
}
for(i = 0; i < threadsCount; i++) {
tr[i].start();
}
for(i = 0; i < threadsCount; i++) {
try {
tr[i].join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
double myPi = 4.0*pi.getPointsInCircle()/pi.getPointsInSquare();
System.out.println(myPi + " time = " + (System.currentTimeMillis()+time));
}
}
public class PiRunnable implements Runnable {
PiCalculator pi;
private double minX;
private double maxX;
private int pointsToSpread;
public PiRunnable(PiCalculator pi, double minX, double maxX, int pointsToSpread, boolean isQuiet) {
super();
this.pi = pi;
this.minX = minX;
this.maxX = maxX;
this.pointsToSpread = pointsToSpread;
}
@Override
public void run() {
int n = countPointsInAreaInCircle(minX, maxX, pointsToSpread);
pi.addToPointsInCircle(n);
}
public int countPointsInAreaInCircle (double minX, double maxX, int pointsCount) {
double x;
double y;
int inCircle = 0;
for (int i = 0; i < pointsCount; i++) {
x = Math.random() * (maxX - minX) + minX;
y = Math.random();
if (x*x + y*y <= 1) {
inCircle++;
}
}
return inCircle;
}
}
public class PiCalculator {
private int pointsInSquare;
private int pointsInCircle;
public PiCalculator(int pointsInSquare) {
super();
this.pointsInSquare = pointsInSquare;
}
public synchronized void addToPointsInCircle (int pointsCount) {
this.pointsInCircle += pointsCount;
}
public synchronized int getPointsInCircle () {
return this.pointsInCircle;
}
public synchronized void setPointsInSquare (int pointsInSquare) {
this.pointsInSquare = pointsInSquare;
}
public synchronized int getPointsInSquare () {
return this.pointsInSquare;
}
}
Some results:
-for 3 threads: "3.1424696 time = 2803"
-for 1 thread: "3.1416192 time = 2337"
Your threads could be contending for Math.random(), which is synchronized. You should create an instance of java.util.Random for each thread. Also, in this case a speedup with multiple threads will only happen if you have more than one core/CPU.
From the javadoc of Math.random():
This method is properly synchronized
to allow correct use by more than one
thread. However, if many threads need
to generate pseudorandom numbers at a
great rate, it may reduce contention
for each thread to have its own
pseudorandom-number generator.
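A sketch of the counting loop with a per-thread generator. It uses ThreadLocalRandom (available in java.util.concurrent since Java 7) as a stand-in for holding a separate java.util.Random in each Runnable; the class name LocalRandomCounter is hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch: same loop as countPointsInAreaInCircle, without the shared Math.random().
class LocalRandomCounter {
    static int countPointsInAreaInCircle(double minX, double maxX, int pointsCount) {
        // Each calling thread gets its own generator, so there is no contention.
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        int inCircle = 0;
        for (int i = 0; i < pointsCount; i++) {
            double x = rnd.nextDouble() * (maxX - minX) + minX;
            double y = rnd.nextDouble();
            if (x * x + y * y <= 1) {
                inCircle++;
            }
        }
        return inCircle;
    }
}
```

ThreadLocalRandom.current() has to be called on the thread that uses it, which is why it is obtained inside the method rather than stored in a field set by the constructor.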
Here is an alternate main method that uses the java.util.concurrent package instead of manually managing the threads and waiting for them to finish.
public static void main(final String[] args) throws InterruptedException
{
final int threadsCount = Runtime.getRuntime().availableProcessors();
final int size = 10000000;
boolean isQuiet = false;
final PiCalculator pi = new PiCalculator(size);
final ExecutorService es = Executors.newFixedThreadPool(threadsCount);
long time = -System.currentTimeMillis();
int i;
double s = 1.0 / threadsCount;
int p = size / threadsCount;
for (i = 0; i < threadsCount; i++)
{
es.submit(new PiRunnable(pi, s * i, s * (1.0 + i), p, isQuiet));
}
es.shutdown();
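// Block until all submitted tasks have finished instead of spinning on isTerminated()
// (needs java.util.concurrent.TimeUnit).
es.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);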
double myPi = 4.0 * pi.getPointsInCircle() / pi.getPointsInSquare();
System.out.println(myPi + " time = " + (System.currentTimeMillis() + time));
}
I also replaced Math.random() with a local Random instance in each Runnable.
final private Random rnd;
...
x = this.rnd.nextDouble() * (maxX - minX) + minX;
y = this.rnd.nextDouble();
This is the new output I get:
3.1419284 time = 235
I think you could probably drop the time some more by using Futures, so you don't have to synchronize so much on the PiCalculator.
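A rough sketch of the Futures variant, assuming Java 8 or later for the lambda. Each task returns its own count and the caller sums the results, so no shared, synchronized PiCalculator is needed. The class FuturePi and the method estimate are hypothetical, and each task samples the whole unit square rather than a horizontal strip, which is a simplification of the original partitioning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;

// Sketch: Monte Carlo estimate of pi with one Callable per worker thread.
class FuturePi {
    static double estimate(int totalPoints, int threads) {
        ExecutorService es = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                // Spread the remainder so exactly totalPoints points are generated.
                final int points = totalPoints / threads + (t < totalPoints % threads ? 1 : 0);
                Callable<Long> task = () -> {
                    ThreadLocalRandom rnd = ThreadLocalRandom.current();
                    long inCircle = 0;
                    for (int i = 0; i < points; i++) {
                        double x = rnd.nextDouble();
                        double y = rnd.nextDouble();
                        if (x * x + y * y <= 1) {
                            inCircle++;
                        }
                    }
                    return inCircle;
                };
                parts.add(es.submit(task));
            }
            long inCircle = 0;
            for (Future<Long> part : parts) {
                inCircle += part.get(); // blocks until that task is done
            }
            return 4.0 * inCircle / totalPoints;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a counting task failed", e.getCause());
        } finally {
            es.shutdown();
        }
    }
}
```

A call such as FuturePi.estimate(10000000, Runtime.getRuntime().availableProcessors()) mirrors the sizes used in the question; the only synchronization left is the implicit hand-off inside Future.get().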