Empirical analysis for binary search not matching Theoretical Analysis - java

I'm currently testing the average case of binary search. All I do is generate a random value and then search for it in arrays of different sizes using binary search. Below is the code I used:
public static void main(String[] args)
{
//This array keeps track of the times of the binary search
long[] ArrayTimeTaken = new long[18];
//Values of the array lengths that we test for
int[] ArrayAssignValues = new int[18];
ArrayAssignValues[0] = 1000000;
ArrayAssignValues[1] = 10000000;
ArrayAssignValues[2] = 20000000;
ArrayAssignValues[3] = 30000000;
ArrayAssignValues[4] = 40000000;
ArrayAssignValues[5] = 50000000;
ArrayAssignValues[6] = 60000000;
ArrayAssignValues[7] = 70000000;
ArrayAssignValues[8] = 80000000;
ArrayAssignValues[9] = 90000000;
ArrayAssignValues[10] = 100000000;
ArrayAssignValues[11] = 110000000;
ArrayAssignValues[12] = 120000000;
ArrayAssignValues[13] = 130000000;
ArrayAssignValues[14] = 140000000;
ArrayAssignValues[15] = 150000000;
ArrayAssignValues[16] = 160000000;
ArrayAssignValues[17] = 170000000;
//Code that runs the binary search
for (int i = 0; i < ArrayAssignValues.length; i++)
{
float[] arrayExperimentTest = new float[ ArrayAssignValues[i]];
//We fill the array with ascending numbers
for (int j = 0; j < arrayExperimentTest.length; j++)
{
arrayExperimentTest[j] = j;
}
Random Generator = new Random();
int ValuetoSearchfor = (int) Generator.nextInt(ArrayAssignValues[i]);
System.out.println(ValuetoSearchfor);
ValuetoSearchfor = (int) arrayExperimentTest[ValuetoSearchfor];
//Here we perform the binary search
ArrayTimeTaken[i] = BinarySearch(arrayExperimentTest,ValuetoSearchfor);
}
ChartCreate(ArrayTimeTaken);
System.out.println("Done");
}
Here is my code for the binary search:
static long BinarySearch(float[] ArraySearch,int ValueFind)
{
System.gc();
long startTime = System.nanoTime();
int low = 0;
int high = ArraySearch.length-1;
int mid = Math.round((low+high)/2);
while (ArraySearch[mid] != ValueFind )
{
if (ValueFind <ArraySearch[mid])
{
high = mid-1;
}
else
{
low = mid+1;
}
mid = (low+high)/2;
}
long TimeTaken = System.nanoTime() - startTime;
return TimeTaken;
}
Now the problem is that the results aren't making sense. Below is a graph:
Can someone explain why the 1st array takes so much time? I've run the code a good few times and it's basically the same graph every time. Does Java somehow cache results? Can anyone explain why the 1st binary search takes so long relative to the others, even though the array size is tiny compared to the rest?

It looks like you're doing these searches one after another, starting with the smallest array. If that's true, then the first search will run much slower than the rest, because the JIT compiler won't have had a chance to warm up yet. Generally, for benchmarking like this, you want to run through all the relevant code first, to give the JIT compiler time to compile and optimize it before you do the real testing.
For more information on the JIT compiler, read this
You should also see this question to learn more about benchmarking.
Another possible cause of the slowness is that the JVM could still be in the process of starting up and running its own background code while you're timing the search, which slows it down.

Benchmarking is not done this way. You should run at least 1000 cycles as a "warmup" and only then start measuring. Benchmarking can be more complicated than it seems: it should be carefully constructed so that it is not affected by other programs running in memory at the same time, and so on. Here and here you can find some good tips.
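As a rough illustration of the warm-up advice in these answers, here is a minimal sketch. It assumes int arrays instead of the question's float arrays, and the class name, the helper names (ascending, averageNanos), the seed, and the round counts are all made up for the example. It averages over many random keys instead of timing a single search, and it runs the search code a number of times before any measurement is reported.

```java
import java.util.Random;

class BinarySearchBenchmark {
    // Written to by the benchmark so the JIT cannot discard the searches as unused work.
    static volatile long sink;

    // Standard iterative binary search on a sorted array; returns the index or -1.
    static int binarySearch(int[] sorted, int key) {
        int low = 0;
        int high = sorted.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1; // unsigned shift avoids int overflow
            if (sorted[mid] < key) {
                low = mid + 1;
            } else if (sorted[mid] > key) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    // Array holding 0, 1, ..., size - 1, as in the question.
    static int[] ascending(int size) {
        int[] data = new int[size];
        for (int i = 0; i < size; i++) {
            data[i] = i;
        }
        return data;
    }

    // Mean time per search in nanoseconds, averaged over many random keys.
    static double averageNanos(int[] sorted, int searches, Random rng) {
        int[] keys = new int[searches];
        for (int i = 0; i < searches; i++) {
            keys[i] = rng.nextInt(sorted.length);
        }
        long found = 0;
        long start = System.nanoTime();
        for (int key : keys) {
            found += binarySearch(sorted, key);
        }
        long elapsed = System.nanoTime() - start;
        sink = found; // publish the result so the loop above has an observable effect
        return (double) elapsed / searches;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        // Warm-up: exercise the search code before any reported measurement.
        int[] warmUpData = ascending(1_000_000);
        for (int round = 0; round < 50; round++) {
            averageNanos(warmUpData, 10_000, rng);
        }
        // Measured runs on arrays of increasing size.
        for (int size = 1_000_000; size <= 16_000_000; size *= 2) {
            int[] data = ascending(size);
            System.out.printf("%d elements: %.1f ns per search%n",
                    size, averageNanos(data, 100_000, rng));
        }
    }
}
```

This is still a hand-rolled benchmark, so the numbers remain noisy; a dedicated benchmarking harness is the more reliable route.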

Related

During Performance Test (Jmeter) of Spring Boot application the tested Method has some extremely short runtimes

I am currently working on my Bachelor's thesis on the advantages/disadvantages of GraalVM Native Image compared to a JAR running on the JVM.
During one of the tests I am calling a method which allocates and populates an array of size 10^6. After that, the function loops over the array and performs arithmetic operations (it is a variant of the Ackley function). The usual runtime of this method was between 3 and 4 seconds, but sometimes the method would complete after just 50 ms (when running either as the Native Image or as the JAR file on the JVM).
Since the array is populated using Math.random(), I don't think it is due to caching, and the Native Image rules out JIT compilation as the source of these outliers.
The endpoint looks like this, where dtno is the Data Transfer Object containing the "range" variable:
@PostMapping(path="/ackley")
public static @ResponseBody long calculateackley (@RequestBody dtno d) {
long start = System.nanoTime();
double res = ackley(d.range);
long end = System.nanoTime();
System.out.println("Ackley function took: "+res);
return end - start;
}
The ackley function looks like this:
public static long ackley(int range){
long start = System.nanoTime();
if(range!=0){
double[] a = new double[range];
int counter = 0;
for(int i=-range/2;i<range/2;i++){
a[counter++] = Math.random()*range*i;
}
double sum1 = 0.0;
double sum2 = 0.0;
for (int i = 0 ; i < a.length ; i ++) {
sum1 += a[i]*a[i];
sum2 += (Math.cos(2*Math.PI*a[i]));
}
double result = -20.0*Math.exp(-0.2*Math.sqrt(sum1 / ((double )a.length))) + 20
- Math.exp(sum2 /((double)a.length)) + Math.exp(1.0);
}
long end = System.nanoTime();
return end - start;
}
As already mentioned, the range variable in the test was 10^6. What I also suspect is that, since the result and the sums are never actually used to calculate the return value, the program decides to skip everything between the for loop and the declaration of "end".
In the graph from the JMeter test you can see that these fast execution times were all during ramp-up and at the very end of the test run.
Test Results Performance Test
In the summary Report you can see the huge deviation from the average runtime.
Performance Test Summary Report
If anyone could give me a hint, or a good source where I could find one, as to what is going on, I would be very thankful.
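One way to test the dead-code suspicion described above is to make the computed value observable, so that neither a JIT nor an ahead-of-time compiler is free to drop the loops. The sketch below is only an illustration under that assumption: the Timed holder and the timeAckley name are hypothetical, and unlike the original loop this version fills every slot of the array even for odd ranges. It does not establish that dead-code elimination is what causes the 50 ms runs.

```java
class AckleyTiming {
    // Hypothetical holder: carries the computed value together with the elapsed time.
    static final class Timed {
        final double value;
        final long nanos;

        Timed(double value, long nanos) {
            this.value = value;
            this.nanos = nanos;
        }
    }

    // Same arithmetic as in the question, but the result is returned instead of discarded.
    static double ackley(int range) {
        if (range == 0) {
            return 0.0;
        }
        double[] a = new double[range];
        int counter = 0;
        for (int i = -range / 2; counter < range; i++) {
            a[counter++] = Math.random() * range * i;
        }
        double sum1 = 0.0;
        double sum2 = 0.0;
        for (double x : a) {
            sum1 += x * x;
            sum2 += Math.cos(2 * Math.PI * x);
        }
        return -20.0 * Math.exp(-0.2 * Math.sqrt(sum1 / a.length)) + 20
                - Math.exp(sum2 / a.length) + Math.exp(1.0);
    }

    // Times one call and keeps the value alive by returning it with the timing.
    static Timed timeAckley(int range) {
        long start = System.nanoTime();
        double value = ackley(range);
        long end = System.nanoTime();
        return new Timed(value, end - start);
    }

    public static void main(String[] args) {
        Timed timed = timeAckley(1_000_000);
        System.out.println("value = " + timed.value + ", took " + timed.nanos + " ns");
    }
}
```

If the short runs disappear once the value is part of the response, the unused result was a likely factor; if they persist, the cause lies elsewhere.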

Compartmentalizing loops over a large iteration

The goal of my question is to improve the performance of my algorithm by splitting the range of my loop iterations over a large array list.
For example: I have an array list with about 10 billion long values. What I am trying to achieve is to run the loop over entries 0 to 100 million and output the result of whatever calculations happen inside the loop for those 100 million entries; then do the same for 100 million to 200 million and output that result, then 200-300 million, 300-400 million, and so on.
After I get all the 100 billion/100 million results, I can sum them up outside of the loop, collecting the loop outputs in parallel.
I have tried to achieve something similar with a dynamic range-shift approach, but I can't seem to get the logic fully implemented the way I would like.
public static void tt4() {
long essir2 = 0;
long essir3 = 0;
List cc = new ArrayList<>();
List<Long> range = new ArrayList<>();
// break point is a method that returns list values, it was converted to
// string because of some concatenations and would be converted back to long here
for (String ari1 : Breakpoint()) {
cc.add(Long.valueOf(ari1));
}
// the size of the List is huge about 1 trillion entries at the minimum
long hy = cc.size() - 1;
for (long k = 0; k < hy; k++) {
long t1 = (long) cc.get((int) k);
long t2 = (long) cc.get((int) (k + 1));
// My main question: I am trying to iterate the entire list in a dynamic way
// which would exclude repeated endpoints on each iteration.
range = LongStream.rangeClosed(t1 + 1, t2)
.boxed()
.collect(Collectors.toList());
for (long i : range) {
// Hard is another method call on the iteration
// complexcalc is a method as well
essir2 = complexcalc((int) i, (int) Hard(i));
essir3 += essir2;
}
}
System.out.println("\n" + essir3);
}
I don't have any errors; I am just looking for a way to improve performance and run time. I can do a million entries in under a second directly, but when I use the size I require it runs forever. The sizes I'm giving are abstract, just to illustrate the orders of magnitude, so I don't want opinions like "100 billion is not much". If I can do a million in under a second, then I'm talking about massively huge numbers that I need to iterate over while doing complex tasks and calls. I just need help with the logic I'm trying to achieve, if possible.
One thing I would suggest right off the bat is to store your Breakpoint return value in a simple array rather than a List. This should improve your execution time:
List<Long> cc = new ArrayList<>();
for (String ari1 : Breakpoint()) {
cc.add(Long.valueOf(ari1));
}
Long[] ccArray = cc.toArray(new Long[0]);
I believe what you're looking for is to split your tasks across multiple threads. You can do this with ExecutorService "which simplifies the execution of tasks in asynchronous mode".
Note that I am not overly familiar with this whole concept but have experimented with it a bit recently and give you a quick draft of how you could implement this.
I welcome those more experienced with multi-threading to either correct this post or provide additional information in the comments to help improve this answer.
Runnable Task class
public class CompartmentalizationTask implements Runnable {
private final ArrayList<Long> cc;
private final long index;
public CompartmentalizationTask(ArrayList<Long> list, long index) {
this.cc = list;
this.index = index;
}
@Override
public void run() {
Main.compartmentalize(cc, index);
}
}
Main class
// Requires java.util.*, java.util.concurrent.* and java.util.concurrent.atomic.AtomicLong
private static final ExecutorService exeService = Executors.newCachedThreadPool();
private static final List<Future<?>> futureTasks = new ArrayList<>();
// Shared running total; AtomicLong keeps additions from different worker threads safe
private static final AtomicLong essir3 = new AtomicLong();
public static void tt4() throws ExecutionException, InterruptedException
{
ArrayList<Long> cc = new ArrayList<>();
// break point is a method that returns list values, it was converted to
// string because of some concatenations and would be converted back to long here
for (String ari1 : Breakpoint()) {
cc.add(Long.valueOf(ari1));
}
// the size of the List is huge about 1 trillion entries at the minimum
long hy = cc.size() - 1;
// Submit one task per pair of neighbouring breakpoints
for (long k = 0; k < hy; k++) {
futureTasks.add(exeService.submit(new CompartmentalizationTask(cc, k)));
}
// Wait for every task to finish before reading the total
for (Future<?> task : futureTasks) {
task.get();
}
exeService.shutdown();
System.out.println("\n" + essir3.get());
}
public static void compartmentalize(ArrayList<Long> cc, long index)
{
long t1 = cc.get((int) index);
long t2 = cc.get((int) (index + 1));
// Walk the range (t1, t2] directly, so repeated endpoints are excluded
// without boxing the whole range into a List
long partial = 0;
for (long i = t1 + 1; i <= t2; i++) {
// Hard is another method call on the iteration
// complexcalc is a method as well
partial += complexcalc((int) i, (int) Hard(i));
}
// Add this task's partial sum to the shared total
essir3.addAndGet(partial);
}
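The chunking idea from the question (process a fixed-size slice, keep its partial result, add the partial results at the end) can also be sketched without any shared mutable state by letting each task return its own partial sum. This is only a sketch: work is a hypothetical stand-in for complexcalc(i, Hard(i)), and the chunk size and thread count are arbitrary assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkedSum {
    // Hypothetical stand-in for the per-element calculation in the question.
    static long work(long i) {
        return i % 7;
    }

    // Sums work(i) for from <= i < to, one task per chunk of chunkSize indices.
    static long sumInChunks(long from, long to, long chunkSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (long start = from; start < to; start += chunkSize) {
                final long lo = start;
                final long hi = Math.min(start + chunkSize, to);
                Callable<Long> task = () -> {
                    long partial = 0;
                    for (long i = lo; i < hi; i++) {
                        partial += work(i);
                    }
                    return partial; // each task owns its partial sum
                };
                parts.add(pool.submit(task));
            }
            // Combine the partial results outside the worker threads.
            long total = 0;
            for (Future<Long> part : parts) {
                try {
                    total += part.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumInChunks(0, 1_000_000, 100_000, 4));
    }
}
```

A fixed-size pool is used here so that the number of threads stays bounded no matter how many chunks are submitted. Note also that a single Java List cannot hold billions of entries, so at the sizes described the indices would have to be generated or streamed rather than stored.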

Time how long a function runs (short duration)

I'm relatively new to Java programming, and I'm running into an issue calculating the amount of time it takes for a function to run.
First some background - I've got a lot of experience with Python, and I'm trying to recreate the functionality of the Jupyter Notebook/Lab %%timeit function, if you're familiar with that. Here's a pic of it in action (sorry, not enough karma to embed yet):
Snip of Jupyter %%timeit
What it does is run the contents of the cell (in this case a recursive function) either 1k, 10k, or 100k times, and give you the average run time of the function, and the standard deviation.
My first implementation (using the same recursive function) used System.nanoTime():
public static void main(String[] args) {
long t1, t2, diff;
long[] times = new long[1000];
int t;
for (int i=0; i< 1000; i++) {
t1 = System.nanoTime();
t = triangle(20);
t2 = System.nanoTime();
diff = t2-t1;
System.out.println(diff);
times[i] = diff;
}
long total = 0;
for (int j=0; j<times.length; j++) {
total += times[j];
}
System.out.println("Mean = " + total/1000.0);
}
But the mean is wildly thrown off -- for some reason, the first iteration of the function (on many runs) takes upwards of a million nanoseconds:
Pic of initial terminal output
Every iteration after the first dozen or so takes either 395 nanos or 0 -- so there could be a problem there too... not sure what's going on!
Also -- the code of the recursive function I'm timing:
static int triangle(int n) {
if (n == 1) {
return n;
} else {
return n + triangle(n -1);
}
}
Initially I had the line n = Math.abs(n) on the first line of the function, but then I removed it because... meh. I'm the only one using this.
I tried a number of different suggestions brought up in this SO post, but they each have their own problems... which I can go into if you need.
Anyway, thank you in advance for your help and expertise!
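A rough Java approximation of the %%timeit behaviour described above could look like the sketch below. It is an assumption-laden illustration, not a replacement for a real benchmarking harness: the timeIt helper, its parameters, and the warm-up count are invented for the example. It times whole batches of calls rather than single calls, because one call of triangle(20) is close to the resolution of System.nanoTime(), and it discards a warm-up phase so the first slow, not yet compiled iterations do not skew the mean.

```java
import java.util.function.IntSupplier;

class TimeIt {
    // Results are written here so the JIT cannot treat the timed calls as dead code.
    static volatile int sink;

    // The function from the question; n <= 1 is used as the base case
    // so that zero or negative input cannot recurse forever.
    static int triangle(int n) {
        if (n <= 1) {
            return n;
        }
        return n + triangle(n - 1);
    }

    // Returns {mean, standard deviation} of the per-call time in nanoseconds,
    // measured over `runs` batches of `loopsPerRun` calls each.
    static double[] timeIt(IntSupplier task, int warmUpCalls, int runs, int loopsPerRun) {
        for (int i = 0; i < warmUpCalls; i++) {
            sink = task.getAsInt(); // warm-up, not measured
        }
        double[] perCall = new double[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            for (int i = 0; i < loopsPerRun; i++) {
                sink = task.getAsInt();
            }
            perCall[r] = (System.nanoTime() - start) / (double) loopsPerRun;
        }
        double mean = 0.0;
        for (double t : perCall) {
            mean += t;
        }
        mean /= runs;
        double variance = 0.0;
        for (double t : perCall) {
            variance += (t - mean) * (t - mean);
        }
        variance /= runs; // population variance over the batches
        return new double[] {mean, Math.sqrt(variance)};
    }

    public static void main(String[] args) {
        double[] stats = timeIt(() -> triangle(20), 100_000, 7, 100_000);
        System.out.printf("%.1f ns +/- %.1f ns per call (7 runs, 100000 loops each)%n",
                stats[0], stats[1]);
    }
}
```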

Genetic Algorithm for Process Allocation

I have the following uni assignment that's been puzzling me. I have to implement a genetic algorithm that allocates processes to processors. More specifically, the problem is the following:
"You have a program that is computed on a parallel processor system. The program is made up of N processes that need to be allocated to n processors (where n is much smaller than N). Communication between processes during execution can be quite time-consuming, so the best practice would be to assign processes that need to communicate with one another to the same processor.
In order to reduce the communication time between processes, you could allocate all of these processes to the same processor, but this would negate the idea of parallel processing, namely that every processor needs to contribute to the whole computation.
Consider the following: let Cij be the total amount of communication between process i and process j. Assume that every process needs the same amount of computing power, so that the processing constraint can be handled by assigning the same number of processes to each processor. Use a genetic algorithm to assign N processes to n processors."
The above is a rough translation of the problem description. Now I have the following questions that puzzle me:
1) What would be the best viable solution for the genetic algorithm to run against? I know the theory behind genetic algorithms, and I have deduced that you need a best possible solution in order to check each generation of the produced population.
2) How can I properly design the whole problem so that it can be handled by a program?
I am planning to implement this in Java but any other recommendations for other programming languages would be welcome.
The Dude abides. Or El Duderino if you're not into the whole brevity thing.
What you're asking is really a two part question, but the Genetic Algorithm part can be easily illustrated in concept. I find that giving a basic start can be helpful, but this problem as a "whole" is too complicated to address here.
Genetic Algorithms (GA) can be used as an optimizer, as you note. In order to apply a GA to a process execution plan, you need to be able to score an execution plan, then clone and mutate the best plans. A GA works by running several plans, cloning the best, and then mutating some of them slightly to see if the offspring (cloned) plans are improved or worsened.
In this example, I created a list of 100 random Integers. Each Integer is a "process" to be run and the value of the Integer is the "cost" of running that individual process.
List<Integer> processes = new ArrayList<Integer>();
The processes are then added to an ExecutionPlan, which is a List<List<Integer>>. This List of List of Integers will be used to represent 4 processors doing 25 rounds of processing:
class ExecutionPlan implements Comparable<ExecutionPlan> {
List<List<Integer>> plan;
int cost;
The total cost of an execution plan will be computed by taking the highest process cost per round (the greatest Integer) and summing the costs of all the rounds. Thus, the goal of the optimizer is to distribute the initial 100 integers (processes) into 25 rounds of "processing" on 4 "processors" such that total cost is as low as possible.
// make 10 execution plans of 25 execution rounds running on 4 processors;
List<ExecutionPlan> executionPlans = createAndIntializePlans(processes);
// Loop on generationCount
for ( int generation = 0; generation < GENERATIONCOUNT; ++generation) {
computeCostOfPlans(executionPlans);
// sort plans by cost
Collections.sort(executionPlans);
// print execution plan costs
System.out.println(generation + " = " + executionPlans);
// clone 5 better plans over 5 worse plans
// i.e., kill off the least fit and reproduce the best fit.
cloneBetterPlansOverWorsePlans(executionPlans);
// mutate 5 cloned plans
mutateClones(executionPlans);
}
When the program is run, the cost is initially randomly determined, but with each generation it improves. If you run it for 1000 generations and plot the results, a typical run will look like this:
The purpose of the GA is to optimize, that is, to attempt to determine the best possible solution. The reason it can be applied to your problem is that your ExecutionPlan can be scored, cloned and mutated. The path to success, therefore, is to separate the problems in your mind. First, figure out how to make an execution plan that can be scored by what it would cost to actually run it on an assumed set of hardware. Add routines to clone and mutate an ExecutionPlan. Once you have that, plug it into this GA example. Good luck and stay cool, dude.
public class Optimize {
private static int GENERATIONCOUNT = 1000;
private static int PROCESSCOUNT = 100;
private static int MUTATIONCOUNT = PROCESSCOUNT/10;
public static void main(String...strings) {
new Optimize().run();
}
// define an execution plan as 25 runs on 4 processors
class ExecutionPlan implements Comparable<ExecutionPlan> {
List<List<Integer>> plan;
int cost;
public ExecutionPlan(List<List<Integer>> plan) {
this.plan = plan;
}
@Override
public int compareTo(ExecutionPlan o) {
return cost-o.cost;
}
@Override
public String toString() {
return Integer.toString(cost);
}
}
private void run() {
// make 100 processes to be completed
List<Integer> processes = new ArrayList<Integer>();
// assign them a random cost between 1 and 100 (inclusive)
for ( int index = 0; index < PROCESSCOUNT; ++index) {
processes.add((int)(Math.random() * 100.0) + 1);
}
// make 10 execution plans of 25 execution rounds running on 4 processors;
List<ExecutionPlan> executionPlans = createAndIntializePlans(processes);
// Loop on generationCount
for ( int generation = 0; generation < GENERATIONCOUNT; ++generation) {
computeCostOfPlans(executionPlans);
// sort plans by cost
Collections.sort(executionPlans);
// print execution plan costs
System.out.println(generation + " = " + executionPlans);
// clone 5 better plans over 5 worse plans
cloneBetterPlansOverWorsePlans(executionPlans);
// mutate 5 cloned plans
mutateClones(executionPlans);
}
}
private void mutateClones(List<ExecutionPlan> executionPlans) {
for ( int index = 0; index < executionPlans.size()/2; ++index) {
ExecutionPlan execution = executionPlans.get(index);
// mutate 10 different location swaps, maybe the plan improves
for ( int mutationCount = 0; mutationCount < MUTATIONCOUNT ; ++mutationCount) {
int location1 = (int)(Math.random() * 100.0);
int location2 = (int)(Math.random() * 100.0);
// swap two locations
Integer processCostTemp = execution.plan.get(location1/4).get(location1%4);
execution.plan.get(location1/4).set(location1%4, execution.plan.get(location2/4).get(location2%4));
execution.plan.get(location2/4).set(location2%4, processCostTemp);
}
}
}
private void cloneBetterPlansOverWorsePlans(List<ExecutionPlan> executionPlans) {
for ( int index = 0; index < executionPlans.size()/2; ++index) {
ExecutionPlan execution = executionPlans.get(index);
List<List<Integer>> clonePlan = new ArrayList<List<Integer>>();
for ( int roundNumber = 0; roundNumber < 25; ++roundNumber) {
clonePlan.add( new ArrayList<Integer>(execution.plan.get(roundNumber)) );
}
executionPlans.set( index + executionPlans.size()/2, new ExecutionPlan(clonePlan) );
}
}
private void computeCostOfPlans(List<ExecutionPlan> executionPlans) {
for ( ExecutionPlan execution: executionPlans) {
execution.cost = 0;
for ( int roundNumber = 0; roundNumber < 25; ++roundNumber) {
// cost of a round is greatest "communication time".
List<Integer> round = execution.plan.get(roundNumber);
int roundCost = Math.max(round.get(0), round.get(1));
roundCost = Math.max(roundCost, round.get(2));
roundCost = Math.max(roundCost, round.get(3));
// add all the round costs' to determine total plan cost
execution.cost += roundCost;
}
}
}
private List<ExecutionPlan> createAndIntializePlans(List<Integer> processes) {
List<ExecutionPlan> executionPlans = new ArrayList<ExecutionPlan>();
for ( int planNumber = 0; planNumber < 10; ++planNumber) {
// randomize the processes for this plan
Collections.shuffle(processes);
// and make the plan
List<List<Integer>> currentPlan = new ArrayList<List<Integer>>();
for ( int roundNumber = 0; roundNumber < 25; ++roundNumber) {
List<Integer> round = new ArrayList<Integer>();
round.add(processes.get(4*roundNumber+0));
round.add(processes.get(4*roundNumber+1));
round.add(processes.get(4*roundNumber+2));
round.add(processes.get(4*roundNumber+3));
currentPlan.add(round);
}
executionPlans.add(new ExecutionPlan(currentPlan));
}
return executionPlans;
}
}
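For the assignment as stated in the question, the "score" of a plan could be the total communication that crosses processor boundaries. The sketch below is one possible encoding and not part of the example above: assignment[p] holds the processor of process p, c[i][j] is the communication amount Cij (assumed symmetric and counted once per pair), and the class and method names are hypothetical. The swap mutation exchanges the processors of two processes, which keeps the number of processes per processor unchanged.

```java
import java.util.Random;

class AllocationCost {
    // Sum of c[i][j] over all pairs of processes placed on different processors.
    static long communicationCost(int[] assignment, int[][] c) {
        long cost = 0;
        for (int i = 0; i < assignment.length; i++) {
            for (int j = i + 1; j < assignment.length; j++) {
                if (assignment[i] != assignment[j]) {
                    cost += c[i][j];
                }
            }
        }
        return cost;
    }

    // True if the process counts of any two processors differ by at most one.
    static boolean isBalanced(int[] assignment, int processors) {
        int[] load = new int[processors];
        for (int processor : assignment) {
            load[processor]++;
        }
        int min = Integer.MAX_VALUE;
        int max = 0;
        for (int l : load) {
            min = Math.min(min, l);
            max = Math.max(max, l);
        }
        return max - min <= 1;
    }

    // Mutation that preserves balance: two processes trade processors.
    static void swapMutation(int[] assignment, Random rng) {
        int a = rng.nextInt(assignment.length);
        int b = rng.nextInt(assignment.length);
        int tmp = assignment[a];
        assignment[a] = assignment[b];
        assignment[b] = tmp;
    }

    public static void main(String[] args) {
        int[][] c = {
            {0, 5, 1, 0},
            {5, 0, 0, 2},
            {1, 0, 0, 4},
            {0, 2, 4, 0}
        };
        int[] together = {0, 0, 1, 1}; // heavy communicators share a processor
        int[] apart = {0, 1, 0, 1};    // heavy communicators are split up
        System.out.println("together: " + communicationCost(together, c));
        System.out.println("apart:    " + communicationCost(apart, c));
    }
}
```

With a scoring function like this, a lower cost means a fitter plan, so the sort-clone-mutate loop shown earlier can be reused unchanged.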

Time Complexity Theory not matching up with practical experiment for Linear Search

I'm working with the linear search algorithm, and in theory its time complexity is O(n). Now I have to demonstrate this using actual code and create a graph showing that the algorithm is in fact O(n). But the practical results don't show this at all.
Here is my coding methodology used:
I have a loop that creates an array dynamically based on the loop number.
I then fill this array randomly with numbers.
I then run the linear search algorithm. Before the algorithm runs I take note of the time, and once the value I am searching for is found I stop the clock and save the time. At the end of the loop I write the values to a text file.
I then import the text file into Excel and create a graph. But the results don't corroborate the fact that the algorithm is O(n).
Here is my code in java:
public static void main(String[] args)
{
long[] ArrayTimeTaken = new long[10000];
String Display = "";
//Code that runs the linear search
for (int i = 2; i < 10000; i++)
{
int[] arrayExperimentTest = new int[i];
arrayExperimentTest = FillArray(i);
int ValuetoSearchfor = Math.round(((arrayExperimentTest.length)/2));
System.out.println(ValuetoSearchfor);
ArrayTimeTaken[i] = LinearSearch(arrayExperimentTest,ValuetoSearchfor);
Display = Display+ System.getProperty("line.separator") + ArrayTimeTaken[i];
}
PrintWriter writer;
try {
writer = new PrintWriter("C:/Users/Roberto/Desktop/testing.txt", "UTF-8");
writer.println(Display);
writer.close();
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (UnsupportedEncodingException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
//ChartCreate(ArrayTimeTaken);
System.out.println("Done");
}
Here is the code for the filling of the array with random numbers and the Linear Search:
//This code simply populates our array with numbers in each of the array positions
static int[] FillArray(int number)
{
int[] ArrayofValues = new int[number];
for (int i = 0; i < number; i++)
{
Random randomGenerator = new Random();
boolean flag = true;
while (flag)
{
int index = randomGenerator.nextInt(number);
if (ArrayofValues[index] == 0)
{
ArrayofValues[index] = i;
flag = false;
}
}
}
return ArrayofValues;
}
//This function does a linear search on an array with integers
static long LinearSearch(int[] ArraySearch,int ValueFind)
{
long TimeTaken = 0;
long startTime = System.currentTimeMillis();
System.gc();
for (int i = 0; i < ArraySearch.length; i++)
{
if (ArraySearch[i] == ValueFind)
{
TimeTaken = System.currentTimeMillis() - startTime;
break;
}
}
return TimeTaken;
}
Here is a graph of the results. Shouldn't I be getting a straight line graph?
You cannot infer results from a single run. Use something like Google Caliper to do microbenchmarking properly; it will generate all sorts of useful metrics for you (standard deviation etc.) and take care of many important technical details (such as warming up the JVM so that the bytecode is likely to be optimized).
In addition to assylias's answer - doing IO may have a huge impact on your results. It is also good practice to run on OS without graphical UI and with minimal number of services running.
Read the Caliper wiki for best practice microbenchmarking and look at the source for examples.
(Edit: for version 1.0-beta-1 check this branch for examples, the API is changing and master doesn't match the documentation)
There are several issues with your testing methodology, but mostly:
- the resolution of System.currentTimeMillis() is not great, so considering your results are all below 12 ms, I would not trust them
- you run a full GC during your measurement, which makes no sense: the GC itself is probably going to take a few milliseconds
So:
- make your arrays significantly larger: you could run a test with sizes from 10,000 to 10,000,000 in increments of 10,000, for example
- move the GC to before long startTime = System.currentTimeMillis();
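Putting these suggestions together, a hand-rolled version might look like the sketch below. It is only an illustration under a few assumptions: it uses System.nanoTime() instead of System.currentTimeMillis(), searches for the last element so that every run scans the whole array, takes the median of several repetitions, and calls System.gc() only outside the timed region. The class name, helper names, and sizes are made up for the example.

```java
import java.util.Arrays;

class LinearSearchBenchmark {
    // Receives the search result so the JIT cannot remove the timed loop.
    static volatile int sink;

    // Plain linear search; returns the index of value or -1.
    static int linearSearch(int[] data, int value) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == value) {
                return i;
            }
        }
        return -1;
    }

    // Median time in nanoseconds over several repetitions of the same search.
    static long medianNanos(int[] data, int value, int repetitions) {
        long[] times = new long[repetitions];
        for (int r = 0; r < repetitions; r++) {
            long start = System.nanoTime();
            sink = linearSearch(data, value);
            times[r] = System.nanoTime() - start;
        }
        Arrays.sort(times);
        return times[repetitions / 2];
    }

    // Array holding 0, 1, ..., size - 1.
    static int[] ascending(int size) {
        int[] data = new int[size];
        for (int i = 0; i < size; i++) {
            data[i] = i;
        }
        return data;
    }

    public static void main(String[] args) {
        // Warm-up so the search is compiled before anything is reported.
        int[] warmUp = ascending(100_000);
        for (int i = 0; i < 2_000; i++) {
            sink = linearSearch(warmUp, warmUp.length - 1);
        }
        for (int size = 1_000_000; size <= 10_000_000; size += 1_000_000) {
            int[] data = ascending(size);
            System.gc(); // requested before timing starts, never inside it
            long nanos = medianNanos(data, size - 1, 15);
            System.out.println(size + "\t" + nanos);
        }
    }
}
```

The tab-separated output can be pasted into a spreadsheet; with arrays this large the measured times should grow roughly in proportion to the size.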
