Weighted-random probability in Java

I have a list of fitness values (percentages), which are ordered in descending order:
List<Double> fitnesses = new ArrayList<Double>();
I would like to choose one of these Doubles, with an extreme likelihood of it being the first one, then a decreasing likelihood for each item, until the chance of choosing the final item is close to 0%.
How do I go about achieving this?
Thanks for any advice.

If you want to select "one of these Doubles, with an extreme likelihood of it being the first one, then decreasing likelihood for each item, until the final one is close to 0% chance of it being the final item in the list" then it seems like you want an exponential probability function (p = x^2).
However, you will only know whether you have chosen the right function once you have coded a solution and tried it, and if it does not suit your needs then you will need to choose some other probability function, like a sinusoidal (p = sin( x * PI/2 )) or an inverse ratio (p = 1/x).
So, the important thing is to code an algorithm for selecting an item based on a probability function, so that you can then try any probability function you like.
So, here is one way to do it.
Note the following:
I am seeding the random number generator with 10 in order to always produce the same results. Remove the seeding to get different results at each run.
I am using a list of Integer for your "percentages" in order to avoid confusion. Feel free to replace with a list of Double once you have understood how things work.
I am providing a few sample probability functions. Try them to see what distributions they yield.
Have fun!
import java.util.*;

public final class Scratch3
{
    private Scratch3()
    {
    }

    interface ProbabilityFunction
    {
        double getProbability( double x );
    }

    private static double exponential2( double x )
    {
        assert x >= 0.0 && x <= 1.0;
        return StrictMath.pow( x, 2 );
    }

    private static double exponential3( double x )
    {
        assert x >= 0.0 && x <= 1.0;
        return StrictMath.pow( x, 3 );
    }

    private static double inverse( double x )
    {
        assert x >= 0.0 && x <= 1.0;
        return 1 / x; // note: this exceeds 1.0 for x < 1.0, so select() must clamp the index
    }

    private static double identity( double x )
    {
        assert x >= 0.0 && x <= 1.0;
        return x;
    }

    @SuppressWarnings( { "UnsecureRandomNumberGeneration", "ConstantNamingConvention" } )
    private static final Random randomNumberGenerator = new Random( 10 );

    private static <T> T select( List<T> values, ProbabilityFunction probabilityFunction )
    {
        double x = randomNumberGenerator.nextDouble();
        double p = probabilityFunction.getProbability( x );
        // clamp so that p == 1.0 (or a misbehaving function) cannot index past the end
        int i = Math.min( (int)( p * values.size() ), values.size() - 1 );
        return values.get( i );
    }

    public static void main( String[] args )
    {
        List<Integer> values = Arrays.asList( 10, 11, 12, 13, 14, 15 );
        Map<Integer,Integer> counts = new HashMap<>();
        for( int i = 0; i < 10000; i++ )
        {
            int value = select( values, Scratch3::exponential3 );
            counts.merge( value, 1, ( a, b ) -> a + b );
        }
        for( int value : values )
            System.out.println( value + ": " + counts.get( value ) );
    }
}

Here's another way of doing it that gives you the ability to approximate an arbitrary weight distribution.
The array passed to WeightedIndexPicker indicates the number of "buckets" (>0) that should be allocated to each index. In your case these would be descending, but they don't have to be. When you need an index, pick a random number between 0 and the total number of buckets and return the index associated with that bucket.
I've used an int weight array as it's easier to visualize and it avoids rounding errors associated with floating point.
import java.util.Random;

public class WeightedIndexPicker
{
    private int total;
    private int[] counts;
    private Random rand;

    public WeightedIndexPicker(int[] weights)
    {
        rand = new Random();
        counts = weights.clone();
        for (int i = 1; i < counts.length; i++)
        {
            counts[i] += counts[i - 1];
        }
        total = counts[counts.length - 1];
    }

    public int nextIndex()
    {
        int idx = 0;
        int pick = rand.nextInt(total);
        while (pick >= counts[idx])
            idx++;
        return idx;
    }

    public static void main(String[] args)
    {
        int[] dist = {1000, 100, 10, 1};
        WeightedIndexPicker wip = new WeightedIndexPicker(dist);
        int idx = wip.nextIndex();
        System.out.println(idx);
    }
}
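As a side note: because counts is a cumulative (sorted) array, the linear scan in nextIndex could be replaced with a binary search when the number of weights is large. A minimal sketch of such a variant (the method name is made up; it assumes the same counts and total fields as above):

    public int nextIndexBinarySearch()
    {
        int pick = rand.nextInt(total);
        // counts is strictly increasing because every weight is > 0, so look for
        // the first index whose cumulative count is greater than pick
        int idx = java.util.Arrays.binarySearch(counts, pick + 1);
        // when the key is absent, binarySearch returns -(insertionPoint) - 1
        return idx < 0 ? -idx - 1 : idx;
    }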

I don't think you need all this code to answer your question, since your question seems to be much more about math than code. For example, using the Apache commons math library, getting a distribution is easy:
import java.util.TreeMap;
import java.util.stream.DoubleStream;
import org.apache.commons.math3.distribution.ExponentialDistribution;
import static java.util.function.Function.identity;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

ExponentialDistribution dist = new ExponentialDistribution(1);
// getting a sample (aka index into the list) is easy
dist.sample();

// lots of extra code to display the distribution
int NUM_SAMPLES = 1000000;
DoubleStream.of(dist.sample(NUM_SAMPLES))
        .map(Math::floor) // bucket each sample by its integer part
        .boxed()
        .collect(groupingBy(identity(), TreeMap::new, counting()))
        .forEach((k, v) -> System.out.println(k.longValue() + " -> " + v));
However, as you said, there are so many possible distributions in the math library. If you are writing code for a specific purpose then the end user will probably want you to explain why you chose a specific distribution and why you set the parameters for that distribution the way you did. That's a math question and should be asked in the mathematics forum.
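Since the question is really about weighting list indices, it is also worth noting that commons-math provides EnumeratedDistribution, which attaches an explicit weight to each item directly. A minimal sketch, with the items and weights invented for illustration:

import java.util.Arrays;
import org.apache.commons.math3.distribution.EnumeratedDistribution;
import org.apache.commons.math3.util.Pair;

// each Pair holds an item and its relative weight; the weights need not
// sum to 1.0 because they are normalized internally
EnumeratedDistribution<Integer> dist = new EnumeratedDistribution<>(Arrays.asList(
        new Pair<>(0, 8.0),   // index 0: most likely
        new Pair<>(1, 4.0),
        new Pair<>(2, 2.0),
        new Pair<>(3, 1.0))); // index 3: least likely
int index = dist.sample();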

Related

Java functional streams: generate a set of random points between distance A and B from each other

While working on a toy project I was faced with the problem of generating a set of N 2d points where every point was between distance A and B from every other point in the set (and also within certain absolute bounds).
I prefer working with java streams and lambdas for practice, because of their elegance and the possibility for easy parallelization, so I'm not asking how to solve this problem in an imperative manner!
The solution that first came to mind was:
seed the set (or list) with a random vector
until the set reaches size N:
    create a random vector with length between A and B and add it to a random "parent" vector
    if it's outside the bounds or closer than A to any vector in the set, discard it, otherwise add it to the set
    repeat
This would be trivial for me with imperative programming (loops), but I was stumped when doing this the functional way because the newly generated elements in the stream depend on previously generated elements in the same stream.
Here's what I came up with - notice the icky loop at the beginning.
while (pointList.size() < size) {
    // find a suitable position, not too close and not too far from another one
    Vec point =
            // generate a stream of random vectors
            Stream.generate(vecGen::generate)
                    // elongate the vector and add it to the position of one randomly chosen existing vector
                    .map(v -> listSelector.getRandom(pointList).add(v.mul(random.nextDouble() * (maxDistance - minDistance) + minDistance)))
                    // remove those that are outside the borders
                    .filter(v -> v.length < diameter)
                    // remove those that are too close to another one
                    .filter(v -> pointList.stream().allMatch(p -> Vec.distance(p, v) > minDistance))
                    // take the first one
                    .findAny().get();
    pointList.add(point);
}
I know that this loop might never terminate, depending on the parameters - the real code has additional checks.
One working functional solution that comes to mind is to generate completely random sets of N vectors until one of the sets satisfies the condition, but the performance would be abysmal. Also, this would circumvent the problem I'm facing: is it possible to work with the already generated elements of a stream while adding new elements to the stream? (I'm pretty sure that would violate some fundamental principle, so I guess the answer is NO.)
Is there a way to do this in a functional - and not too wasteful - way?
A simple solution is shown below. The Pair class can be found in the Apache commons lang3.
public List<Pair<Double, Double>> generate(int N, double A, double B) {
    Random ySrc = new Random();
    return new Random()
            .doubles(N, A, B)
            .boxed()
            .map(x -> Pair.of(x, (ySrc.nextDouble() * (B - A)) + A))
            .collect(Collectors.toList());
}
My original solution (above) missed the point that A and B represent the minimum and maximum distance between any two points. So I would instead propose a different (way more complicated) solution that relies on generating points on a unit circle. I scale (multiply) the unit vector representing the point by a random distance with a minimum of -1/2 B and a maximum of 1/2 B. This approach distributes points in an area bounded by a circle of radius 1/2 B, which addresses the maximum-distance constraint. Given a sufficient difference between A and B, where A < B, and an N that is not too large, the minimum-distance constraint will probably also be satisfied. Satisfying the maximum-distance constraint can be accomplished with purely functional code (i.e., no side effects).
Ensuring that the minimum constraint is satisfied requires some imperative code (i.e., side effects). For this purpose, I use a predicate with side effects. The predicate accumulates points that meet the minimum-distance criterion and returns true when N points have been accumulated.
Note the running time is unknown because points are randomly generated. With N = 100, A = 1.0, and B = 30.0, the test code runs quickly. I tried values of 10 and 20 for B and didn't wait for it to end. If you want a tighter cluster of points you will probably need to speed up this code or start looking at linear solvers.
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;
import org.apache.commons.lang3.tuple.Pair;

public class RandomPoints {

    /**
     * The stop rule is a predicate implementation with side effects. Not sure
     * about the wisdom of this approach. The class does not support concurrent
     * modification.
     *
     * @author jgmorris
     */
    private class StopRule implements Predicate<Pair<Double, Double>> {
        private final int N;
        private final List<Pair<Double, Double>> points;

        public StopRule(int N, List<Pair<Double, Double>> points) {
            this.N = N;
            this.points = points;
        }

        @Override
        public boolean test(Pair<Double, Double> t) {
            // Brute force test. A hash based test would work a lot better.
            for (int i = 0; i < points.size(); ++i) {
                if (distance(t, points.get(i)) < dL) {
                    // List size unchanged, continue
                    return false;
                }
            }
            points.add(t);
            return points.size() >= N;
        }
    }

    private final double dL;
    private final double dH;
    private final double maxRadius;
    private final Random r;

    public RandomPoints(double dL, double dH) {
        this.dL = dL;
        this.dH = dH;
        this.maxRadius = dH / 2;
        r = new Random();
    }

    public List<Pair<Double, Double>> generate(int N) {
        List<Pair<Double, Double>> points = new ArrayList<>();
        StopRule pred = new StopRule(N, points);
        new Random()
                // Generate a uniform distribution of doubles between 0.0 and 1.0
                .doubles()
                // Transform primitive double into a Double
                .boxed()
                // Transform to a number between 0.0 and 2π
                .map(u -> u * 2 * Math.PI)
                // Generate a random point
                .map(theta -> randomPoint(theta))
                // Add point to points if it meets minimum distance criteria.
                // Stop when enough points are gathered.
                .anyMatch(pred);
        return points;
    }

    private final Pair<Double, Double> randomPoint(double theta) {
        double x = Math.cos(theta);
        double y = Math.sin(theta);
        double radius = randRadius();
        return Pair.of(radius * x, radius * y);
    }

    private double randRadius() {
        return maxRadius * (r.nextDouble() - 0.5);
    }

    public static void main(String[] args) {
        RandomPoints rp = new RandomPoints(1.0, 30.0);
        List<Pair<Double, Double>> points = rp.generate(100);
        // compare every distinct pair of points against the distance bounds
        for (int i = 0; i < points.size(); ++i) {
            for (int j = 0; j < points.size(); ++j) {
                if (i == j) {
                    continue;
                }
                double distance = distance(points.get(i), points.get(j));
                if (distance < 1.0 || distance > 30.0) {
                    System.out.println("oops");
                }
            }
        }
    }

    private static double distance(Pair<Double, Double> p1, Pair<Double, Double> p2) {
        return Math.sqrt(Math.pow(p1.getLeft() - p2.getLeft(), 2.0) + Math.pow(p1.getRight() - p2.getRight(), 2.0));
    }
}

Is there random nextInt with weights?

I am using the Random class to generate random numbers between 1 and 5, like myrandom.nextInt(5) + 1, and it is working fine. However, I would like to know if there is a way to give a specific number a weight to increase its probability of appearing. Let's say that instead of a 20% probability I want the number 4 to have a 40% probability, while the other numbers 1, 2, 3 and 5 share the remaining 60% equally. Is there a way to do this?
Thanks in advance.
I use an array. E.g.
int[] arr = {4, 4, 4, 5, 5, 6};
int a = arr[random.nextInt(arr.length)];
For a more dynamic solution, try this. The weights do not have to add up to any particular value.
public static <T> T choice(Map<? extends T, Double> map) {
    if (map == null || map.size() == 0)
        throw new IllegalArgumentException();
    double sum = 0;
    for (double w : map.values()) {
        if (Double.compare(w, 0) <= 0 || Double.isInfinite(w) || Double.isNaN(w))
            throw new IllegalArgumentException();
        sum += w;
    }
    double rand = sum * Math.random();
    sum = 0;
    T t = null;
    for (Map.Entry<? extends T, Double> entry : map.entrySet()) {
        t = entry.getKey();
        if ((sum += entry.getValue()) >= rand)
            return t;
    }
    return t;
}
You can easily add / remove / change entries from the map whenever you like. Here is an example of how you use this method.
Map<Integer, Double> map = new HashMap<>();
map.put(1, 40.0);
map.put(2, 50.0);
map.put(3, 10.0);
for (int i = 0; i < 10; i++)
    System.out.println(choice(map));
pbadcefp's answer is probably the easiest. Since you stated in the comments that you need it to be "dynamic", here's an alternate approach. Note that the weights basically specify how often the number appears in the array to pick from.
public int weightedRandom( Random random, int max, Map<Integer, Integer> weights ) {
    int totalWeight = max;
    for( int weight : weights.values() ) {
        totalWeight += weight - 1;
    }
    int choice = random.nextInt( totalWeight );
    int current = 0;
    for( int i = 0; i < max; i++ ) {
        current += weights.containsKey( i ) ? weights.get( i ) : 1;
        if( choice < current ) {
            return i;
        }
    }
    throw new IllegalStateException();
}
Example usage:
Map<Integer, Integer> weights = new HashMap<>();
weights.put( 1, 0 ); // make choosing '1' impossible
weights.put( 4, 3 ); // '4' appears 3 times rather than once
int result = weightedRandom( new Random(), 5, weights );
Basically, this is equivalent to pbadcefp's solution applied on the array { 0, 2, 3, 4, 4, 4 }
You will have to adapt this if you want to use percentages; just calculate the weights accordingly. Also, I didn't test corner cases, so you might want to test this a little more extensively.
This is by no means a complete solution, but I prefer giving something to work on over complete solutions since you should do some of the work yourself.
I'll also go on record and say that IMHO this is over-engineered; but you wanted something like this.
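For the exact percentages in the question (40% for 4, 15% each for 1, 2, 3 and 5), the weights can also be scaled by 20 into whole counts and fed straight into pbadcefp's array trick. A small sketch:

// 3 entries each for 1, 2, 3, 5 and 8 entries for 4: 3/20 = 15%, 8/20 = 40%
int[] arr = {1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5};
int value = arr[random.nextInt(arr.length)];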
You'll have to generate a larger range of numbers (like from 1 to 100) and use ranges to return the numbers you really want. Eg: (in pseudocode)
r = randint(1..100)
if (r >= 1 && r <= 20) // 20% chance
    return 1
else if (r >= 21 && r <= 60) // 40% chance
    return 2
Etc.
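A direct Java translation of that pseudocode for the split in the question (40% for 4, 15% each for the rest) might look like this; the boundaries are just the running totals of the percentages:

int r = random.nextInt(100); // uniform draw in [0, 100)
int result;
if (r < 40)      result = 4; // 40% chance
else if (r < 55) result = 1; // 15% chance
else if (r < 70) result = 2; // 15% chance
else if (r < 85) result = 3; // 15% chance
else             result = 5; // 15% chance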

How to calculate the median of an array?

I'm trying to calculate the total, mean and median of an array that's populated by input received from a textfield. I've managed to work out the total and the mean; I just can't get the median to work. I think the array needs to be sorted before I can do this, but I'm not sure how. Is this the problem, or is there another one that I didn't find? Here is my code:
import java.applet.Applet;
import java.awt.Graphics;
import java.awt.*;
import java.awt.event.*;

public class whileloopq extends Applet implements ActionListener
{
    Label label;
    TextField input;
    int num;
    int index;
    int[] numArray = new int[20];
    int sum;
    int total;
    double avg;
    int median;

    public void init()
    {
        label = new Label("Enter numbers");
        input = new TextField(5);
        add(label);
        add(input);
        input.addActionListener(this);
        index = 0;
    }

    public void actionPerformed(ActionEvent ev)
    {
        int num = Integer.parseInt(input.getText());
        numArray[index] = num;
        index++;
        if (index == 20)
            input.setEnabled(false);
        input.setText("");
        sum = 0;
        for (int i = 0; i < numArray.length; i++)
        {
            sum += numArray[i];
        }
        total = sum;
        avg = total / index;
        median = numArray[numArray.length / 2];
        repaint();
    }

    public void paint(Graphics graf)
    {
        graf.drawString("Total = " + Integer.toString(total), 25, 85);
        graf.drawString("Average = " + Double.toString(avg), 25, 100);
        graf.drawString("Median = " + Integer.toString(median), 25, 115);
    }
}
The Arrays class in Java has a static sort function, which you can invoke with Arrays.sort(numArray).
Arrays.sort(numArray);
double median;
if (numArray.length % 2 == 0)
    median = ((double) numArray[numArray.length / 2] + (double) numArray[numArray.length / 2 - 1]) / 2;
else
    median = (double) numArray[numArray.length / 2];
Sorting the array is unnecessary and inefficient. There's a variation of the QuickSort (QuickSelect) algorithm which has an average run time of O(n); if you sort first, you're looking at O(n log n). It actually finds the nth smallest item in a list; for a median, you just use n = half the list length. Let's call it quickNth(list, n).
The concept is that to find the nth smallest, choose a 'pivot' value. (Exactly how you choose it isn't critical; if you know the data will be thoroughly random, you can take the first item on the list.)
Split the original list into three smaller lists:
One with values smaller than the pivot.
One with values equal to the pivot.
And one with values greater than the pivot.
You then have three cases:
The "smaller" list has >= n items. In that case, you know that the nth smallest is in that list. Return quickNth(smaller, n).
The "smaller" list has < n items, but the sum of the lengths of the smaller and equal lists is >= n. In this case, the nth smallest is equal to any item in the "equal" list; you're done.
n is greater than the sum of the lengths of the smaller and equal lists. In that case, you can essentially skip over those two, and adjust n accordingly. Return quickNth(greater, n - length(smaller) - length(equal)).
Done.
If you're not sure that the data is thoroughly random, you need to be more sophisticated about choosing the pivot. Taking the median of the first value in the list, the last value in the list, and the one midway between the two works pretty well.
If you're very unlucky with your choice of pivots, and you always choose the smallest or highest value as your pivot, this takes O(n^2) time; that's bad. But, it's also very unlikely if you choose your pivot with a decent algorithm.
Sample code:
import java.util.*;

public class Utility {
    /****************
     * @param coll an ArrayList of Comparable objects
     * @return the median of coll
     *****************/
    public static <T extends Number> double median(ArrayList<T> coll, Comparator<T> comp) {
        double result;
        int n = coll.size() / 2;
        if (coll.size() % 2 == 0) // even number of items; find the middle two and average them
            result = (nth(coll, n - 1, comp).doubleValue() + nth(coll, n, comp).doubleValue()) / 2.0;
        else // odd number of items; return the one in the middle
            result = nth(coll, n, comp).doubleValue();
        return result;
    } // median(coll)

    /*****************
     * @param coll a collection of Comparable objects
     * @param n the position of the desired object, using the ordering defined on the list elements
     * @param comp the Comparator defining that ordering
     * @return the nth smallest object
     *******************/
    public static <T> T nth(ArrayList<T> coll, int n, Comparator<T> comp) {
        T result, pivot;
        ArrayList<T> underPivot = new ArrayList<>(), overPivot = new ArrayList<>(), equalPivot = new ArrayList<>();
        // choosing a pivot is a whole topic in itself.
        // this implementation uses the simple strategy of grabbing something from the middle of the ArrayList.
        pivot = coll.get(n / 2);
        // split coll into 3 lists based on comparison with the pivot
        for (T obj : coll) {
            int order = comp.compare(obj, pivot);
            if (order < 0) // obj < pivot
                underPivot.add(obj);
            else if (order > 0) // obj > pivot
                overPivot.add(obj);
            else // obj = pivot
                equalPivot.add(obj);
        } // for each obj in coll
        // recurse on the appropriate list
        if (n < underPivot.size())
            result = nth(underPivot, n, comp);
        else if (n < underPivot.size() + equalPivot.size()) // equal to pivot; just return it
            result = pivot;
        else // everything in underPivot and equalPivot is too small. Adjust n accordingly in the recursion.
            result = nth(overPivot, n - underPivot.size() - equalPivot.size(), comp);
        return result;
    } // nth(coll, n)

    public static void main(String[] args) {
        Comparator<Integer> comp = Comparator.naturalOrder();
        Random rnd = new Random();
        for (int size = 1; size <= 10; size++) {
            ArrayList<Integer> coll = new ArrayList<>(size);
            for (int i = 0; i < size; i++)
                coll.add(rnd.nextInt(100));
            System.out.println("Median of " + coll.toString() + " is " + median(coll, comp));
        } // for a range of possible input sizes
    } // main(args)
} // Utility
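As a side note on the pivot discussion above: if the input may be partially sorted rather than thoroughly random, the coll.get(n / 2) line could be swapped for a median-of-three choice. A minimal sketch (the helper name is invented for illustration):

    private static <T> T medianOfThreePivot(ArrayList<T> coll, Comparator<T> comp) {
        T a = coll.get(0);
        T b = coll.get(coll.size() / 2);
        T c = coll.get(coll.size() - 1);
        // three compare-and-swaps leave the median of the three values in b
        if (comp.compare(a, b) > 0) { T t = a; a = b; b = t; }
        if (comp.compare(b, c) > 0) { T t = b; b = c; c = t; }
        if (comp.compare(a, b) > 0) { T t = a; a = b; b = t; }
        return b;
    }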
If you want to use an external library, you can calculate the median with the Median class from the Apache commons math library.
For more methods and usage, take a look at the API documentation.
import org.apache.commons.math3.stat.descriptive.rank.Median;
.....
......
........
// calculate median
public double getMedian(double[] values) {
    Median median = new Median();
    double medianValue = median.evaluate(values);
    return medianValue;
}
.......
For more on evaluate method AbstractUnivariateStatistic#evaluate
Update
Calculate in program
Generally, median is calculated using the following two formulas given here
If n is odd then Median (M) = value of ((n + 1)/2)th item term.
If n is even then Median (M) = value of [((n)/2)th item term + ((n)/2 + 1)th item term ]/2
In your program you have numArray; first you need to sort the array using Arrays#sort:

Arrays.sort(numArray);
int middle = numArray.length / 2;
int medianValue = 0; // declare variable
if (numArray.length % 2 == 1)
    medianValue = numArray[middle];
else
    medianValue = (numArray[middle - 1] + numArray[middle]) / 2;
Arrays.sort(numArray);
return (numArray[size/2] + numArray[(size-1)/2]) / 2;
Arrays.sort(numArray);
int middle = numArray.length / 2;
if (numArray.length % 2 == 0) {
    int medianA = numArray[middle];
    int medianB = numArray[middle - 1];
    median = (medianA + medianB) / 2;
} else {
    median = numArray[middle]; // middle element of the sorted odd-length array
}
EDIT: I initially had medianB set to middle + 1 in even-length arrays; this was wrong due to arrays starting their count at 0. It now uses middle - 1, which is correct for an even length, and the odd-length branch takes numArray[middle] directly.
You can find a good explanation at https://www.youtube.com/watch?time_continue=23&v=VmogG01IjYc
The idea is to use two heaps, viz. a max-heap and a min-heap.
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.Queue;

class Heap {
    private Queue<Integer> low = new PriorityQueue<>(Comparator.reverseOrder()); // max-heap for the lower half
    private Queue<Integer> high = new PriorityQueue<>(); // min-heap for the upper half

    public void add(int number) {
        Queue<Integer> target = low.size() <= high.size() ? low : high;
        target.add(number);
        balance();
    }

    private void balance() {
        while (!low.isEmpty() && !high.isEmpty() && low.peek() > high.peek()) {
            Integer lowHead = low.poll();
            Integer highHead = high.poll();
            low.add(highHead);
            high.add(lowHead);
        }
    }

    public double median() {
        if (low.isEmpty() && high.isEmpty()) {
            throw new IllegalStateException("Heap is empty");
        } else {
            return low.size() == high.size() ? (low.peek() + high.peek()) / 2.0 : low.peek();
        }
    }
}
Try sorting the array first. Then, after it's sorted, if the array has an even number of elements the mean of the middle two is the median; if it has an odd number, the middle element is the median.
Use Arrays.sort and then take the middle element (in case the number n of elements in the array is odd) or take the average of the two middle elements (in case n is even).
public static long median(long[] l)
{
    Arrays.sort(l);
    int middle = l.length / 2;
    if (l.length % 2 == 0)
    {
        long left = l[middle - 1];
        long right = l[middle];
        return (left + right) / 2;
    }
    else
    {
        return l[middle];
    }
}
Here are some examples:
@Test
public void evenTest()
{
    long[] l = {
        5, 6, 1, 3, 2, 4
    };
    Assert.assertEquals((3 + 4) / 2, median(l));
}

@Test
public void oddTest()
{
    long[] l = {
        5, 1, 3, 2, 4
    };
    Assert.assertEquals(3, median(l));
}
And in case your input is a Collection, you might use Google Guava to do something like this:
public static long median(Collection<Long> numbers)
{
    return median(Longs.toArray(numbers)); // requires import com.google.common.primitives.Longs;
}
I was looking at the same statistics problems. The approach you are thinking of is good and it will work (an answer covering the sorting approach has already been given).
But in case you are interested in algorithm performance, there are a couple of algorithms that perform better than just sorting the array. One (QuickSelect) is indicated by @bruce-feist's answer and is very well explained.
[Java implementation: https://discuss.leetcode.com/topic/14611/java-quick-select ]
But there is a variation of this algorithm named median of medians; you can find a good explanation of it at this link:
http://austinrochford.com/posts/2013-10-28-median-of-medians.html
Java implementation of this:
- https://stackoverflow.com/a/27719796/957979
I faced a similar problem yesterday.
I wrote a method using Java generics that calculates the median value of any collection of Numbers; you can apply it to collections of Doubles, Integers or Floats, and it returns a double. Please note that my method creates another collection in order not to alter the original one.
I also provide a test; have fun. ;-)
public static <T extends Number & Comparable<T>> double median(Collection<T> numbers) {
    if (numbers.isEmpty()) {
        throw new IllegalArgumentException("Cannot compute median on empty collection of numbers");
    }
    List<T> numbersList = new ArrayList<>(numbers);
    Collections.sort(numbersList);
    int middle = numbersList.size() / 2;
    if (numbersList.size() % 2 == 0) {
        return 0.5 * (numbersList.get(middle).doubleValue() + numbersList.get(middle - 1).doubleValue());
    } else {
        return numbersList.get(middle).doubleValue();
    }
}
JUnit test code snippet:
/**
 * Test of median method, of class Utils.
 */
@Test
public void testMedian() {
    System.out.println("median");
    Double expResult = 3.0;
    Double result = Utils.median(Arrays.asList(3.0, 2.0, 1.0, 9.0, 13.0));
    assertEquals(expResult, result);
    expResult = 3.5;
    result = Utils.median(Arrays.asList(3.0, 2.0, 1.0, 9.0, 4.0, 13.0));
    assertEquals(expResult, result);
}
Usage example (consider the class name is Utils):
List<Integer> intValues = ... //omitted init
Set<Float> floatValues = ... //omitted init
.....
double intListMedian = Utils.median(intValues);
double floatSetMedian = Utils.median(floatValues);
Note: my method works on collections; you can convert arrays of numbers to a list of numbers as pointed out here.
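For example, one way to do that conversion with streams (assuming the usual java.util imports and the Utils class above):

int[] arr = {3, 2, 1, 9, 13};
List<Integer> list = Arrays.stream(arr).boxed().collect(Collectors.toList());
double m = Utils.median(list); // 3.0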
Note the case where the list contains only one element (list.size == 1): implementations that index around the middle without checking can crash with an index-out-of-bounds exception, because integer division returns zero (1 / 2 = 0). An answer that handles this case explicitly (in Kotlin):
MEDIAN("MEDIAN") {
    override fun calculate(values: List<BigDecimal>): BigDecimal? {
        if (values.size == 1) {
            return values.first()
        }
        if (values.size > 1) {
            val valuesSorted = values.sorted()
            val mid = valuesSorted.size / 2
            return if (valuesSorted.size % 2 != 0) {
                valuesSorted[mid]
            } else {
                AVERAGE.calculate(listOf(valuesSorted[mid - 1], valuesSorted[mid]))
            }
        }
        return null
    }
},
As @Bruce-Feist mentions, for a large number of elements I'd avoid any solution involving sorting if performance is something you are concerned about. A different approach from those suggested in the other answers is Hoare's algorithm for finding the k-th smallest of n items. This algorithm runs in O(n) on average.
public int findKthSmallest(int[] array, int k)
{
    if (array.length < 10)
    {
        Arrays.sort(array);
        return array[k];
    }
    int start = 0;
    int end = array.length - 1;
    int x, temp;
    int i, j;
    while (start < end)
    {
        x = array[k];
        i = start;
        j = end;
        do
        {
            while (array[i] < x)
                i++;
            while (x < array[j])
                j--;
            if (i <= j)
            {
                temp = array[i];
                array[i] = array[j];
                array[j] = temp;
                i++;
                j--;
            }
        } while (i <= j);
        if (j < k)
            start = i;
        if (k < i)
            end = j;
    }
    return array[k];
}
And to find the median:
public int median(int[] array)
{
    int length = array.length;
    if ((length & 1) == 0) // even: average the two middle elements (0-based indices length/2 - 1 and length/2)
        return (findKthSmallest(array, length / 2 - 1) + findKthSmallest(array, length / 2)) / 2;
    else // odd
        return findKthSmallest(array, length / 2);
}
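A quick sanity check with arbitrary values (note that findKthSmallest reorders the array in place):

int[] data = {7, 1, 5, 3, 9, 4};
// sorted order is 1 3 4 5 7 9, so the middle pair is 4 and 5
System.out.println(median(data)); // prints 4 because of integer division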
public static int median(int[] arr) {
    java.util.Arrays.sort(arr);
    if (arr.length % 2 == 1) {
        return arr[arr.length / 2];
    } else {
        return (arr[arr.length / 2] + arr[(arr.length / 2) - 1]) / 2;
    }
}
Check out the Arrays.sort methods:
http://docs.oracle.com/javase/6/docs/api/java/util/Arrays.html
You should also really abstract finding the median into its own method, and just return the value to the calling method. This will make testing your code much easier.
public int[] data = {31, 29, 47, 48, 23, 30, 21, 40, 23, 39, 47, 47, 42, 44, 23, 26, 44, 32, 20, 40};

public double median()
{
    Arrays.sort(this.data);
    double result = 0;
    int size = this.data.length;
    if (size % 2 == 1)
    {
        result = data[size / 2]; // middle element of the sorted odd-length array
        System.out.println(" uneven size : " + result);
    }
    else
    {
        int middle_pair_first_index = (size - 1) / 2;
        result = (data[middle_pair_first_index + 1] + data[middle_pair_first_index]) / 2.0; // average the middle pair
        System.out.println(" Even size : " + result);
    }
    return result;
}
package arrays;

public class Arraymidleelement {
    static public double middleArrayElement(int[] arr)
    {
        double mid;
        if (arr.length % 2 == 0)
        {
            mid = ((double) arr[arr.length / 2] + (double) arr[arr.length / 2 - 1]) / 2;
            return mid;
        }
        return arr[arr.length / 2];
    }

    public static void main(String[] args) {
        int arr[] = {1, 2, 3, 4, 5, 6};
        System.out.println(middleArrayElement(arr));
    }
}

Finding the mode of a randomly generated array of 10 numbers

I'm attempting to make this program
public class Statistics {
    public static void main(String[] args) {
        final int SIZE = 10;
        int sum = 0;
        int[] numArray = new int[SIZE];
        for (int c = 0; c < SIZE; c++)
        {
            numArray[c] = (int) (Math.random() * 6 + 1);
            System.out.print(numArray[c] + " ");
            sum += numArray[c];
        }
        System.out.println("\nSum of all numbers is " + sum);
        System.out.println("\nMean of numbers is " + (double) sum / SIZE);
    }
}
Calculate the mode of the randomly generated array.
I've seen source code posted where a separate method called computeMode is used, but I don't know where to place this second method within my code. I'm sorry, I am very, very green when it comes to programming; I'm being taught Java as my first language and so far it's overwhelming.
If someone could post the syntax with detailed instruction/explanation I'd be so grateful.
The mode is quite easy to compute. One way, assuming your inputs are bounded, is to simply have an array that tracks the number of occurrences of each number:
int[] data; // your data, bounded below by 0 and above by MAX_VALUE
int[] occurrences = new int[MAX_VALUE + 1];
for (int datum : data) {
    occurrences[datum]++;
}
Then figure out the index(es) in occurrences that hold the highest value:

int maxOccurrences = Integer.MIN_VALUE;
int mode = -1;
for (int i = 0; i < occurrences.length; i++) {
    if (occurrences[i] > maxOccurrences) {
        maxOccurrences = occurrences[i];
        mode = i;
    }
}
You would have to adjust this to handle multiple modes.
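One hedged way to make that adjustment is to keep a list of every value whose count ties the current maximum (variable names follow the snippets above):

List<Integer> modes = new ArrayList<>();
int maxOccurrences = 0;
for (int i = 0; i < occurrences.length; i++) {
    if (occurrences[i] > maxOccurrences) {
        maxOccurrences = occurrences[i]; // new best count: restart the list
        modes.clear();
        modes.add(i);
    } else if (maxOccurrences > 0 && occurrences[i] == maxOccurrences) {
        modes.add(i); // tie: this value is also a mode
    }
}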

how to Compute the average probe length for success and failure - Linear probe (Hash Tables) [closed]

I'm doing an assignment for my Data Structures class. We were asked to study linear probing with load factors of .1, .2, .3, ..., and .9. The formulas for testing are:
The average probe length using linear probing is roughly
Success --> (1 + 1/(1 - L))/2
or
Failure --> (1 + 1/(1 - L)^2)/2
We are required to find the theoretical values using the formulas above, which I did (just plug the load factor into the formula); then we have to calculate the empirical values, which I am not quite sure how to do. Here is the rest of the requirements:
For each load factor, 10,000 randomly generated positive ints between 1 and 50000 (inclusive) will be inserted into a table of the "right" size, where "right" is strictly based upon the load factor you are testing. Repeats are allowed. Be sure that your formula for randomly generated ints is correct. There is a class called Random in java.util. USE it! After a table of the right (based upon L) size is loaded with 10,000 ints, do 100 searches of newly generated random ints from the range of 1 to 50000. Compute the average probe length for each of the two formulas and indicate the denominators used in each calculation. So, for example, each test for a .5 load would have a table of size approximately 20,000 (adjusted to be prime) and similarly each test for a .9 load would have a table of approximate size 10,000/.9 (again adjusted to be prime).
The program should run displaying the various load factors tested, the average probe for each search (the two denominators used to compute the averages will add to 100), and the theoretical answers using the formula above.
How do I calculate the empirical success?
Here is my code so far:
import java.util.Random;

/**
 * @author Johnny
 */
class DataItem
{
    private int iData;

    public DataItem(int it)
    { iData = it; }

    public int getKey()
    {
        return iData;
    }
}

class HashTable
{
    private DataItem[] hashArray;
    private int arraySize;

    public HashTable(int size)
    {
        arraySize = size;
        hashArray = new DataItem[arraySize];
    }

    public void displayTable()
    {
        int sp = 0;
        System.out.print("Table: ");
        for (int j = 0; j < arraySize; j++)
        {
            if (sp > 50) { System.out.println(""); sp = 0; }
            if (hashArray[j] != null)
            { System.out.print(hashArray[j].getKey() + " "); sp++; }
            else
            { System.out.print("** "); sp++; }
        }
        System.out.println("");
    }

    public int hashFunc(int key)
    {
        return key % arraySize;
    }

    public void insert(DataItem item)
    {
        int key = item.getKey();
        int hashVal = hashFunc(key);
        while (hashArray[hashVal] != null &&
               hashArray[hashVal].getKey() != -1)
        {
            ++hashVal;            // linear probing: step to the next cell
            hashVal %= arraySize; // wrap around at the end of the table
        }
        hashArray[hashVal] = item;
    }

    public DataItem find(int key) // find item with key (assumes table not full)
    {
        int hashVal = hashFunc(key);       // hash the key
        while (hashArray[hashVal] != null) // until empty cell,
        {
            if (hashArray[hashVal].getKey() == key)
                return hashArray[hashVal]; // yes, return item
            ++hashVal;            // linear probing: must use the same step as insert,
            hashVal %= arraySize; // otherwise inserted keys cannot be found reliably
        }
        return null; // can't find item
    }
}

public class n00645805 {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        double b = 1;
        double L;
        double[] tf = new double[9];
        double[] ts = new double[9];
        double d = 0.1;
        HashTable h1Table = new HashTable(100003); // L=.1
        HashTable h2Table = new HashTable(50051);  // L=.2
        HashTable h3Table = new HashTable(33343);  // L=.3
        HashTable h4Table = new HashTable(25013);  // L=.4
        HashTable h5Table = new HashTable(20011);  // L=.5
        HashTable h6Table = new HashTable(16673);  // L=.6
        HashTable h7Table = new HashTable(14243);  // L=.7
        HashTable h8Table = new HashTable(12503);  // L=.8
        HashTable h9Table = new HashTable(11113);  // L=.9
        fillht(h1Table);
        fillht(h2Table);
        fillht(h3Table);
        fillht(h4Table);
        fillht(h5Table);
        fillht(h6Table);
        fillht(h7Table);
        fillht(h8Table);
        fillht(h9Table);
        pm(h1Table);
        pm(h2Table);
        pm(h3Table);
        pm(h4Table);
        pm(h5Table);
        pm(h6Table);
        pm(h7Table);
        pm(h8Table);
        pm(h9Table);
        for (int j = 1; j < 10; j++)
        {
            L = Math.round((b - d) * 100.0) / 100.0;
            System.out.println(L);
            System.out.println("ts " + (1 + (1 / (1 - L))) / 2);
            System.out.println("tf " + (1 + (1 / ((1 - L) * (1 - L)))) / 2);
            ts[j - 1] = (1 + (1 / (1 - L))) / 2;             // theoretical success
            tf[j - 1] = (1 + (1 / ((1 - L) * (1 - L)))) / 2; // theoretical failure
            d = d + .1;
        }
        display(ts, tf);
    }

    public static void fillht(HashTable a)
    {
        Random r = new Random();
        for (int j = 0; j < 10000; j++)
        {
            int aKey = 1 + r.nextInt(50000);
            a.insert(new DataItem(aKey));
        }
    }

    public static void pm(HashTable a)
    {
        DataItem X;
        int numsuc = 0;
        int numfail = 0;
        Random r = new Random();
        for (int j = 0; j < 100; j++)
        {
            int aKey = 1 + r.nextInt(50000);
            X = a.find(aKey);
            if (X != null)
            {
                numsuc++;
            }
            else
            {
                numfail++;
            }
        }
        System.out.println("# of succ is " + numsuc + " # of failures is " + numfail);
    }

    public static void display(double[] s, double[] f)
    {
    }
}
You should take into account that Java's Hashtable uses closed addressing (separate chaining, no probing), so you have separate buckets in which many items can be placed. This is not what you are looking for in your benchmarks. (HashMap uses separate chaining as well.)
So forget about the JDK classes. Since you want to calculate empirical values, you should write your own version of a hash table that uses open addressing with linear probing, and you should take care of counting the probe length whenever you try to get a value from the table.
For example, you can write your hash map and then give it something like:

class YourHashMap
{
    int empiricalGet(K key)
    {
        // search for the key but store the probe length of this get operation
        return probeLength;
    }
}
Then you can easily benchmark it by searching for as many keys as you want and calculating the average probe length.
Otherwise you can just give the hash map the ability to store the total probe length and the count of gets requested, and retrieve them after the benchmark run to calculate the average value.
This kind of exercise must show that the empirical value agrees with the theoretical one. So take into account that you may need many benchmark runs; average over them all, making sure that the variance is not too high.
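To make the probe-length bookkeeping concrete, here is a minimal sketch of an instrumented lookup for the linear-probing table in the question; the method name and the sign convention for failures are made up for illustration:

// returns the number of slots examined; positive for a successful search,
// negative for a failed one, so the caller can split the two averages
public int findWithProbeCount(int key)
{
    int probes = 1;
    int hashVal = hashFunc(key);
    while (hashArray[hashVal] != null)
    {
        if (hashArray[hashVal].getKey() == key)
            return probes; // success after this many probes
        hashVal = (hashVal + 1) % arraySize; // same linear step as insert()
        probes++;
    }
    return -probes; // empty slot reached: unsuccessful search
}

Summing the absolute values separately for successful and failed searches over the 100 lookups, then dividing by the respective counts (the two denominators that add to 100), gives the empirical averages to compare against the theoretical formulas.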
