I'm trying to think of some code that will allow me to search through my ArrayList and detect any values outside the common range of "good values."
Example:
100
105
102
13
104
22
101
How would I be able to write the code to detect that (in this case) 13 and 22 don't fall within the "good values" of around 100?
There are several criteria for detecting outliers. The simplest ones, like Chauvenet's criterion, use the mean and standard deviation calculated from the sample to determine a "normal" range for values. Any value outside of this range is deemed an outlier.
Other criteria include Grubbs' test and Dixon's Q test, which may give better results than Chauvenet's, for example when the sample comes from a skewed distribution.
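To make one of these criteria concrete, here is a minimal sketch of Dixon's Q test for the smallest and largest value of a small sample. The class and method names (`DixonQSketch`, `qLow`, `qHigh`, `critical`) are made up for illustration, and the critical values are the 90% confidence values for N = 3..7 quoted in a later answer in this thread; treat them as an assumption and check them against a statistics reference before relying on them.

```java
import java.util.Arrays;

public class DixonQSketch {
    // Critical Q values for a 90% confidence level, indexed by N - 3 (N = 3..7).
    // Assumption: copied from the table quoted further down in this thread.
    private static final double[] Q90 = {0.941, 0.765, 0.642, 0.56, 0.507};

    /** Q statistic for the smallest value: gap to its neighbour divided by the range. */
    static double qLow(double[] sorted) {
        return (sorted[1] - sorted[0]) / (sorted[sorted.length - 1] - sorted[0]);
    }

    /** Q statistic for the largest value: gap to its neighbour divided by the range. */
    static double qHigh(double[] sorted) {
        int n = sorted.length;
        return (sorted[n - 1] - sorted[n - 2]) / (sorted[n - 1] - sorted[0]);
    }

    /** Critical value for a sample of n values (this sketch only covers 3..7). */
    static double critical(int n) {
        if (n < 3 || n > 7) {
            throw new IllegalArgumentException("sketch supports 3..7 values");
        }
        return Q90[n - 3];
    }

    public static void main(String[] args) {
        double[] data = {100, 105, 102, 13, 104, 101};
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        double qc = critical(sorted.length);
        // A suspect value is flagged when its Q statistic exceeds the critical value.
        System.out.println("lowest is outlier:  " + (qLow(sorted) > qc));   // true
        System.out.println("highest is outlier: " + (qHigh(sorted) > qc));  // false
    }
}
```

Dixon's test looks at one suspect value at a time. With two low values close together, as with 13 and 22 in the question, the gap between them is small compared to the range, so the test would not flag 13.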
package test;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class Main {
public static void main(String[] args) {
List<Double> data = new ArrayList<Double>();
data.add((double) 20);
data.add((double) 65);
data.add((double) 72);
data.add((double) 75);
data.add((double) 77);
data.add((double) 78);
data.add((double) 80);
data.add((double) 81);
data.add((double) 82);
data.add((double) 83);
Collections.sort(data);
System.out.println(getOutliers(data));
}
public static List<Double> getOutliers(List<Double> input) {
List<Double> output = new ArrayList<Double>();
List<Double> data1 = new ArrayList<Double>();
List<Double> data2 = new ArrayList<Double>();
if (input.size() % 2 == 0) {
data1 = input.subList(0, input.size() / 2);
data2 = input.subList(input.size() / 2, input.size());
} else {
data1 = input.subList(0, input.size() / 2);
data2 = input.subList(input.size() / 2 + 1, input.size());
}
double q1 = getMedian(data1);
double q3 = getMedian(data2);
double iqr = q3 - q1;
double lowerFence = q1 - 1.5 * iqr;
double upperFence = q3 + 1.5 * iqr;
for (int i = 0; i < input.size(); i++) {
if (input.get(i) < lowerFence || input.get(i) > upperFence)
output.add(input.get(i));
}
return output;
}
private static double getMedian(List<Double> data) {
if (data.size() % 2 == 0)
return (data.get(data.size() / 2) + data.get(data.size() / 2 - 1)) / 2;
else
return data.get(data.size() / 2);
}
}
Output:
[20.0]
Explanation:
Sort the list of values from low to high.
Split the sorted list in the middle into 2 halves (call them "left" and "right"); for an odd-sized list the middle element is left out of both halves.
Find the middle number (median) of each half.
Q1 is the median of the left half, and Q3 is the median of the right half.
Applying mathematical formula:
IQR = Q3 - Q1
LowerFence = Q1 - 1.5*IQR
UpperFence = Q3 + 1.5*IQR
More info about this formula: http://www.mathwords.com/o/outlier.htm
Loop through all of the original elements; any element lower than the lower fence or higher than the upper fence is added to the "output" ArrayList.
This new "output" ArrayList contains the outliers.
An implementation of Grubbs' test can be found at MathUtil.java. It finds a single outlier, which you can remove from your list; repeat until you've removed all outliers.
Depends on commons-math, so if you're using Gradle:
dependencies {
compile 'org.apache.commons:commons-math:2.2'
}
Find the mean value of your list.
Create a Map that maps each number to its distance from the mean.
Sort the values by their distance from the mean.
Then separate out the last n numbers (those farthest from the mean), making sure the cut-off does not split values that lie at a similar distance.
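The steps above can be sketched roughly as follows. This is only an illustration: the class and method names are hypothetical, and it sorts a copy of the list with a comparator instead of building an explicit Map, which also keeps duplicate values that a Map keyed by number would collapse.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class DistanceFromMeanSketch {
    /** Returns the n values farthest from the mean, farthest first. */
    static List<Integer> farthestFromMean(List<Integer> values, int n) {
        // Mean of all values (0.0 for an empty list).
        double mean = values.stream().mapToInt(Integer::intValue).average().orElse(0.0);
        List<Integer> sorted = new ArrayList<>(values);
        // Sort by absolute distance from the mean, largest distance first.
        sorted.sort(Comparator.comparingDouble((Integer v) -> Math.abs(v - mean)).reversed());
        // Keep the first n entries, or fewer if the list is shorter than n.
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(100, 105, 102, 13, 104, 22, 101);
        System.out.println(farthestFromMean(values, 2)); // [13, 22]
    }
}
```

The caller still has to choose n; the sketch does not decide how many values count as outliers.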
Use this algorithm (shown here in C#). It uses the average and the standard deviation; the cut-off of 2 standard deviations (2 * standardDeviation) is an adjustable choice.
public static List<int> StatisticalOutLierAnalysis(List<int> allNumbers)
{
if (allNumbers.Count == 0)
return null;
List<int> normalNumbers = new List<int>();
List<int> outLierNumbers = new List<int>();
double avg = allNumbers.Average();
double standardDeviation = Math.Sqrt(allNumbers.Average(v => Math.Pow(v - avg, 2)));
foreach (int number in allNumbers)
{
if ((Math.Abs(number - avg)) > (2 * standardDeviation))
outLierNumbers.Add(number);
else
normalNumbers.Add(number);
}
return normalNumbers;
}
As Joni already pointed out, you can eliminate outliers with the help of the standard deviation and the mean. Here is my code, which you can use for your purposes.
public static void main(String[] args) {
List<Integer> values = new ArrayList<>();
values.add(100);
values.add(105);
values.add(102);
values.add(13);
values.add(104);
values.add(22);
values.add(101);
System.out.println("Before: " + values);
System.out.println("After: " + eliminateOutliers(values,1.5f));
}
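protected static double getMean(List<Integer> values) {
// Accumulate in a double so the division below is not integer division
double sum = 0;
for (int value : values) {
sum += value;
}
return sum / values.size();
}
public static double getVariance(List<Integer> values) {
double mean = getMean(values);
// Sum of squared deviations, kept as a double to avoid truncation
double temp = 0;
for (int a : values) {
temp += (a - mean) * (a - mean);
}
// Sample variance (divides by n - 1)
return temp / (values.size() - 1);
}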
public static double getStdDev(List<Integer> values) {
return Math.sqrt(getVariance(values));
}
public static List<Integer> eliminateOutliers(List<Integer> values, float scaleOfElimination) {
double mean = getMean(values);
double stdDev = getStdDev(values);
final List<Integer> newList = new ArrayList<>();
for (int value : values) {
boolean isLessThanLowerBound = value < mean - stdDev * scaleOfElimination;
boolean isGreaterThanUpperBound = value > mean + stdDev * scaleOfElimination;
boolean isOutOfBounds = isLessThanLowerBound || isGreaterThanUpperBound;
if (!isOutOfBounds) {
newList.add(value);
}
}
int countOfOutliers = values.size() - newList.size();
if (countOfOutliers == 0) {
return values;
}
return eliminateOutliers(newList,scaleOfElimination);
}
The eliminateOutliers() method does all the work.
It is a recursive method: each call builds a new list without the current outliers and recurses on it until nothing more is removed.
The scaleOfElimination parameter, which you pass to the method, defines how aggressively outliers are removed. Normally I go with 1.5f-2f; the greater the value, the fewer outliers will be removed.
The output of the code:
Before: [100, 105, 102, 13, 104, 22, 101]
After: [100, 105, 102, 104, 101]
I'm very glad, and thanks to Valiyev: his solution helped me a lot. I want to share my small SRP (single responsibility principle) refactoring of his work.
Please note that I use List.of() to store Dixon's critical values, so Java 9 or later is required.
public class DixonTest {
protected List<Double> criticalValues =
List.of(0.941, 0.765, 0.642, 0.56, 0.507, 0.468, 0.437);
private double scaleOfElimination;
private double mean;
private double stdDev;
private double getMean(final List<Double> input) {
double sum = input.stream()
.mapToDouble(value -> value)
.sum();
return (sum / input.size());
}
private double getVariance(List<Double> input) {
double mean = getMean(input);
double temp = input.stream()
.mapToDouble(a -> a)
.map(a -> (a - mean) * (a - mean))
.sum();
return temp / (input.size() - 1);
}
private double getStdDev(List<Double> input) {
return Math.sqrt(getVariance(input));
}
protected List<Double> eliminateOutliers(List<Double> input) {
int N = input.size() - 3;
scaleOfElimination = criticalValues.get(N).floatValue();
mean = getMean(input);
stdDev = getStdDev(input);
return input.stream()
.filter(this::isOutOfBounds)
.collect(Collectors.toList());
}
private boolean isOutOfBounds(Double value) {
return !(isLessThanLowerBound(value)
|| isGreaterThanUpperBound(value));
}
private boolean isGreaterThanUpperBound(Double value) {
return value > mean + stdDev * scaleOfElimination;
}
private boolean isLessThanLowerBound(Double value) {
return value < mean - stdDev * scaleOfElimination;
}
}
I hope it will help someone else.
Best regards
Thanks to @Emil_Wozniak for posting the complete code. I struggled with it for a while not realizing that eliminateOutliers() actually returns the outliers, not the list with them eliminated. The isOutOfBounds() method also was confusing because it actually returns TRUE when the value is IN bounds. Below is my update with some (IMHO) improvements:
The eliminateOutliers() method returns the input list with outliers removed
Added getOutliers() method to get just the list of outliers
Removed confusing isOutOfBounds() method in favor of a simple filtering expression
Expanded N list to support up to 30 input values
Protect against out of bounds errors when input list is too big or too small
Made stats methods (mean, stddev, variance) static utility methods
Calculate upper/lower bounds only once instead of on every comparison
Supply input list on ctor and store as an instance variable
Refactor to avoid using the same variable name as instance and local variables
Code:
/**
* Implements an outlier removal algorithm based on https://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/dixon.htm#:~:text=It%20can%20be%20used%20to,but%20one%20or%20two%20observations).
* Original Java code by Emil Wozniak at https://stackoverflow.com/questions/18805178/how-to-detect-outliers-in-an-arraylist
*
* Reorganized, made more robust, and clarified many of the methods.
*/
import java.util.List;
import java.util.stream.Collectors;
public class DixonTest {
protected List<Double> criticalValues =
List.of( // Taken from https://sebastianraschka.com/Articles/2014_dixon_test.html#2-calculate-q
// Alpha level of 0.1 (90% confidence)
0.941, // N=3
0.765, // N=4
0.642, // ...
0.56,
0.507,
0.468,
0.437,
0.412,
0.392,
0.376,
0.361,
0.349,
0.338,
0.329,
0.32,
0.313,
0.306,
0.3,
0.295,
0.29,
0.285,
0.281,
0.277,
0.273,
0.269,
0.266,
0.263,
0.26 // N=30
);
// Stats calculated on original input data (including outliers)
private double scaleOfElimination;
private double mean;
private double stdDev;
private double UB;
private double LB;
private List<Double> input;
/**
* Ctor taking a list of values to be analyzed.
* @param input the values to analyze
*/
public DixonTest(List<Double> input) {
this.input = input;
// Create statistics on the original input data
calcStats();
}
/**
* Utility method returns the mean of a list of values.
* @param valueList the values to average
* @return the arithmetic mean of the values
*/
public static double getMean(final List<Double> valueList) {
double sum = valueList.stream()
.mapToDouble(value -> value)
.sum();
return (sum / valueList.size());
}
/**
* Utility method returns the variance of a list of values.
* @param valueList the values to analyze
* @return the sample variance (divides by n - 1)
*/
public static double getVariance(List<Double> valueList) {
double listMean = getMean(valueList);
double temp = valueList.stream()
.mapToDouble(a -> a)
.map(a -> (a - listMean) * (a - listMean))
.sum();
return temp / (valueList.size() - 1);
}
/**
* Utility method returns the std deviation of a list of values.
* @param valueList the values to analyze
* @return the sample standard deviation
*/
public static double getStdDev(List<Double> valueList) {
return Math.sqrt(getVariance(valueList));
}
/**
* Calculate statistics and bounds from the input values and store
* them in instance fields.
*/
private void calcStats() {
int N = Math.min(Math.max(0, input.size() - 3), criticalValues.size()-1); // Changed to protect against too-small or too-large lists
scaleOfElimination = criticalValues.get(N).floatValue();
mean = getMean(input);
stdDev = getStdDev(input);
UB = mean + stdDev * scaleOfElimination;
LB = mean - stdDev * scaleOfElimination;
}
/**
* Returns the input values with outliers removed.
* @return a new list containing only the values within the bounds
*/
public List<Double> eliminateOutliers() {
return input.stream()
.filter(value -> value>=LB && value <=UB)
.collect(Collectors.toList());
}
/**
* Returns the outliers found in the input list.
* @return a new list containing only the values outside the bounds
*/
public List<Double> getOutliers() {
return input.stream()
.filter(value -> value<LB || value>UB)
.collect(Collectors.toList());
}
/**
* Test and sample usage
* @param args command-line arguments (unused)
*/
public static void main(String[] args) {
List<Double> testValues = List.of(1200.0,1205.0,1220.0,1194.0,1212.0);
DixonTest outlierDetector = new DixonTest(testValues);
List<Double> goodValues = outlierDetector.eliminateOutliers();
List<Double> badValues = outlierDetector.getOutliers();
System.out.println(goodValues.size()+ " good values:");
for (double v: goodValues) {
System.out.println(v);
}
System.out.println(badValues.size()+" outliers detected:");
for (double v: badValues) {
System.out.println(v);
}
// Get stats on remaining (good) values
System.out.println("\nMean of good values is "+DixonTest.getMean(goodValues));
}
}
This is a very simple implementation that collects the numbers that are not in range:
List<Integer> notInRangeNumbers = new ArrayList<Integer>();
for (Integer number : numbers) {
// call with a predefined factor value, here example value = 5
if (!isInRange(number, 5)) {
notInRangeNumbers.add(number);
}
}
Additionally, inside the isInRange method you have to define what you mean by 'good values'. Below is an example implementation.
private boolean isInRange(Integer number, int aroundFactor) {
//TODO the implementation of the 'in range condition'
// here the example implementation
return number <= 100 + aroundFactor && number >= 100 - aroundFactor;
}
Related
I want to choose a random item from a set, but the chance of choosing any item should be proportional to the associated weight
Example inputs:
item weight
---- ------
sword of misery 10
shield of happy 5
potion of dying 6
triple-edged sword 1
So, if I have 4 possible items, the chance of getting any one item without weights would be 1 in 4.
In this case, a user should be 10 times more likely to get the sword of misery than the triple-edged sword.
How do I make a weighted random selection in Java?
I would use a NavigableMap
public class RandomCollection<E> {
private final NavigableMap<Double, E> map = new TreeMap<Double, E>();
private final Random random;
private double total = 0;
public RandomCollection() {
this(new Random());
}
public RandomCollection(Random random) {
this.random = random;
}
public RandomCollection<E> add(double weight, E result) {
if (weight <= 0) return this;
total += weight;
map.put(total, result);
return this;
}
public E next() {
double value = random.nextDouble() * total;
return map.higherEntry(value).getValue();
}
}
Say I have a list of animals dog, cat, horse with probabilities as 40%, 35%, 25% respectively
RandomCollection<String> rc = new RandomCollection<>()
.add(40, "dog").add(35, "cat").add(25, "horse");
for (int i = 0; i < 10; i++) {
System.out.println(rc.next());
}
There is now a class for this in Apache Commons: EnumeratedDistribution
Item selectedItem = new EnumeratedDistribution<>(itemWeights).sample();
where itemWeights is a List<Pair<Item, Double>>, like (assuming Item interface in Arne's answer):
final List<Pair<Item, Double>> itemWeights = new ArrayList<>();
for (Item i : itemSet) {
itemWeights.add(new Pair<>(i, i.getWeight()));
}
or in Java 8:
List<Pair<Item, Double>> itemWeights = itemSet.stream().map(i -> new Pair<Item, Double>(i, i.getWeight())).collect(toList());
Note: Pair here needs to be org.apache.commons.math3.util.Pair, not org.apache.commons.lang3.tuple.Pair.
You will not find a framework for this kind of problem, as the requested functionality is nothing more than a simple function. Do something like this:
interface Item {
double getWeight();
}
class RandomItemChooser {
public Item chooseOnWeight(List<Item> items) {
double completeWeight = 0.0;
for (Item item : items)
completeWeight += item.getWeight();
double r = Math.random() * completeWeight;
double countWeight = 0.0;
for (Item item : items) {
countWeight += item.getWeight();
if (countWeight >= r)
return item;
}
throw new RuntimeException("Should never be shown.");
}
}
There is a straightforward algorithm for picking an item at random, where items have individual weights:
calculate the sum of all the weights
pick a random number that is 0 or greater and is less than the sum of the weights
go through the items one at a time, subtracting their weight from your random number until you get the item where the random number is less than that item's weight
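The three steps above can be sketched like this. It is a minimal illustration, not a tuned implementation: the class name `WeightedPickSketch` and the use of a `Map<T, Double>` for the weights are assumptions made for the example, and the item names are the ones from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

public class WeightedPickSketch {
    /** Picks a key with probability proportional to its (non-negative) weight. */
    static <T> T pick(Map<T, Double> weights, Random random) {
        // Step 1: calculate the sum of all the weights.
        double total = 0.0;
        for (double w : weights.values()) {
            total += w;
        }
        if (!(total > 0.0)) {
            throw new IllegalArgumentException("total weight must be positive");
        }
        // Step 2: pick a random number that is >= 0 and < total.
        double r = random.nextDouble() * total;
        // Step 3: subtract weights until r is less than the current item's weight.
        T last = null;
        for (Map.Entry<T, Double> e : weights.entrySet()) {
            if (r < e.getValue()) {
                return e.getKey();
            }
            r -= e.getValue();
            last = e.getKey();
        }
        return last; // only reachable through floating-point rounding
    }

    public static void main(String[] args) {
        Map<String, Double> items = new LinkedHashMap<>();
        items.put("sword of misery", 10.0);
        items.put("shield of happy", 5.0);
        items.put("potion of dying", 6.0);
        items.put("triple-edged sword", 1.0);
        System.out.println(pick(items, new Random()));
    }
}
```

Each call walks the items once, so a pick costs O(n); that is usually fine for short lists.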
Use an alias method
If you're going to roll many times (as in a game), you should use an alias method.
The code below is a rather long implementation of such an alias method, but that is because of the initialization part. Retrieval of elements is very fast (see the next and applyAsInt methods: they don't loop).
Usage
Set<Item> items = ... ;
ToDoubleFunction<Item> weighter = ... ;
Random random = new Random();
RandomSelector<Item> selector = RandomSelector.weighted(items, weighter);
Item drop = selector.next(random);
Implementation
This implementation:
uses Java 8;
is designed to be as fast as possible (well, at least, I tried to do so using micro-benchmarking);
is totally thread-safe (keep one Random in each thread for maximum performance, use ThreadLocalRandom?);
fetches elements in O(1), unlike what you mostly find on the internet or on StackOverflow, where naive implementations run in O(n) or O(log(n));
keeps the items independent from their weight, so an item can be assigned various weights in different contexts.
Anyways, here's the code. (Note that I maintain an up to date version of this class.)
import static java.util.Objects.requireNonNull;
import java.util.*;
import java.util.function.*;
public final class RandomSelector<T> {
public static <T> RandomSelector<T> weighted(Set<T> elements, ToDoubleFunction<? super T> weighter)
throws IllegalArgumentException {
requireNonNull(elements, "elements must not be null");
requireNonNull(weighter, "weighter must not be null");
if (elements.isEmpty()) { throw new IllegalArgumentException("elements must not be empty"); }
// Array is faster than anything. Use that.
int size = elements.size();
T[] elementArray = elements.toArray((T[]) new Object[size]);
double totalWeight = 0d;
double[] discreteProbabilities = new double[size];
// Retrieve the probabilities
for (int i = 0; i < size; i++) {
double weight = weighter.applyAsDouble(elementArray[i]);
if (weight < 0.0d) { throw new IllegalArgumentException("weighter may not return a negative number"); }
discreteProbabilities[i] = weight;
totalWeight += weight;
}
if (totalWeight == 0.0d) { throw new IllegalArgumentException("the total weight of elements must be greater than 0"); }
// Normalize the probabilities
for (int i = 0; i < size; i++) {
discreteProbabilities[i] /= totalWeight;
}
return new RandomSelector<>(elementArray, new RandomWeightedSelection(discreteProbabilities));
}
private final T[] elements;
private final ToIntFunction<Random> selection;
private RandomSelector(T[] elements, ToIntFunction<Random> selection) {
this.elements = elements;
this.selection = selection;
}
public T next(Random random) {
return elements[selection.applyAsInt(random)];
}
private static class RandomWeightedSelection implements ToIntFunction<Random> {
// Alias method implementation O(1)
// using Vose's algorithm to initialize O(n)
private final double[] probabilities;
private final int[] alias;
RandomWeightedSelection(double[] probabilities) {
int size = probabilities.length;
// Initialize the final fields; the parameter and this.probabilities are the same array
this.probabilities = probabilities;
this.alias = new int[size];
double average = 1.0d / size;
int[] small = new int[size];
int smallSize = 0;
int[] large = new int[size];
int largeSize = 0;
// Describe a column as either small (below average) or large (above average).
for (int i = 0; i < size; i++) {
if (probabilities[i] < average) {
small[smallSize++] = i;
} else {
large[largeSize++] = i;
}
}
// For each column, saturate a small probability to average with a large probability.
while (largeSize != 0 && smallSize != 0) {
int less = small[--smallSize];
int more = large[--largeSize];
// Take what "less" is missing from "more" before rescaling, so both terms are on the same scale
probabilities[more] += probabilities[less] - average;
// Rescale to the chance of keeping the column itself rather than its alias
probabilities[less] = probabilities[less] * size;
alias[less] = more;
if (probabilities[more] < average) {
small[smallSize++] = more;
} else {
large[largeSize++] = more;
}
}
// Flush unused columns.
while (smallSize != 0) {
probabilities[small[--smallSize]] = 1.0d;
}
while (largeSize != 0) {
probabilities[large[--largeSize]] = 1.0d;
}
}
@Override public int applyAsInt(Random random) {
// Call random once to decide which column will be used.
int column = random.nextInt(probabilities.length);
// Call random a second time to decide which will be used: the column or the alias.
if (random.nextDouble() < probabilities[column]) {
return column;
} else {
return alias[column];
}
}
}
}
public class RandomCollection<E> {
private final NavigableMap<Double, E> map = new TreeMap<Double, E>();
private double total = 0;
public void add(double weight, E result) {
if (weight <= 0 || map.containsValue(result))
return;
total += weight;
map.put(total, result);
}
public E next() {
double value = ThreadLocalRandom.current().nextDouble() * total;
return map.ceilingEntry(value).getValue();
}
}
A simple (even naive?), but (as I believe) straightforward method:
/**
* Draws an integer from a given range (excluding the upper limit).
* <p>
* Behaves like Python's random.randrange(min, max).
*
* @param min the smallest value that can be drawn.
* @param max the exclusive upper limit of the values that can be drawn.
* @return The value drawn.
*/
public static int randomInt(int min, int max)
{return min + (int) (Math.random()*(max - min));}
/**
* Tests whether all inner vectors of a given matrix
* have the expected length.
* @param matrix the matrix whose vectors will be measured.
* @param expectedLenght the length each vector should have.
* @return false if at least one vector has a different length.
*/
public static boolean haveAllVectorsEqualLength(int[][] matrix, int expectedLenght){
for(int[] vector: matrix){if (vector.length != expectedLenght) {return false;}}
return true;
}
/**
* Draws an integer from a given set of values,
* weighted by the given weights.
*
* @param ticketBlock matrix with values and weights for the drawing. All its
* vectors should have length two: the value to draw and its weight. The weights, instead of
* percentages, should be positive integers, according to how rarely each value should be drawn,
* the rarest receiving the smallest weight.
* @return The value drawn.
*/
public static int weightedRandomInt(int[][] ticketBlock) throws RuntimeException {
boolean theVectorsHaventAllLengthTwo = !(haveAllVectorsEqualLength(ticketBlock, 2));
if (theVectorsHaventAllLengthTwo)
{throw new RuntimeException("The given matrix has, at least, one vector with length lower or higher than two.");}
// Need to test for duplicates or null values in ticketBlock!
// Raffle urn building:
int raffleUrnSize = 0, urnIndex = 0, blockIndex = 0, repetitionCount = 0;
for(int[] ticket: ticketBlock){raffleUrnSize += ticket[1];}
int[] raffleUrn = new int[raffleUrnSize];
// Raffle urn filling:
while (urnIndex < raffleUrn.length){
do {
raffleUrn[urnIndex] = ticketBlock[blockIndex][0];
urnIndex++; repetitionCount++;
} while (repetitionCount < ticketBlock[blockIndex][1]);
repetitionCount = 0; blockIndex++;
}
return raffleUrn[randomInt(0, raffleUrn.length)];
}
I have a list of fitness values (percentages), which are ordered in descending order:
List<Double> fitnesses = new ArrayList<Double>();
I would like to choose one of these Doubles, with an extreme likelihood of it being the first one, then decreasing likelihood for each item, until the final one is close to 0% chance of it being the final item in the list.
How do I go about achieving this?
Thanks for any advice.
If you want to select "one of these Doubles, with an extreme likelihood of it being the first one, then decreasing likelihood for each item, until the final one is close to 0% chance of it being the final item in the list" then it seems like you want an exponential probability function. (p = x^2).
However, you will only know whether you have chosen the right function once you have coded a solution and tried it, and if it does not suit your needs then you will need to choose some other probability function, like a sinusoidal (p = sin( x * PI/2 )) or an inverse ratio (p = 1/x).
So, the important thing is to code an algorithm for selecting an item based on a probability function, so that you can then try any probability function you like.
So, here is one way to do it.
Note the following:
I am seeding the random number generator with 10 in order to always produce the same results. Remove the seeding to get different results at each run.
I am using a list of Integer for your "percentages" in order to avoid confusion. Feel free to replace with a list of Double once you have understood how things work.
I am providing a few sample probability functions. Try them to see what distributions they yield.
Have fun!
import java.util.*;
public final class Scratch3
{
private Scratch3()
{
}
interface ProbabilityFunction<T>
{
double getProbability( double x );
}
private static double exponential2( double x )
{
assert x >= 0.0 && x <= 1.0;
return StrictMath.pow( x, 2 );
}
private static double exponential3( double x )
{
assert x >= 0.0 && x <= 1.0;
return StrictMath.pow( x, 3 );
}
private static double inverse( double x )
{
assert x >= 0.0 && x <= 1.0;
return 1/x;
}
private static double identity( double x )
{
assert x >= 0.0 && x <= 1.0;
return x;
}
@SuppressWarnings( { "UnsecureRandomNumberGeneration", "ConstantNamingConvention" } )
private static final Random randomNumberGenerator = new Random( 10 );
private static <T> T select( List<T> values, ProbabilityFunction<T> probabilityFunction )
{
double x = randomNumberGenerator.nextDouble();
double p = probabilityFunction.getProbability( x );
int i = (int)( p * values.size() );
return values.get( i );
}
public static void main( String[] args )
{
List<Integer> values = Arrays.asList( 10, 11, 12, 13, 14, 15 );
Map<Integer,Integer> counts = new HashMap<>();
for( int i = 0; i < 10000; i++ )
{
int value = select( values, Scratch3::exponential3 );
counts.merge( value, 1, ( a, b ) -> a + b );
}
for( int value : values )
System.out.println( value + ": " + counts.get( value ) );
}
}
Here's another way of doing it that gives you the ability to approximate an arbitrary weight distribution.
The array passed to WeightedIndexPicker indicates the number of "buckets" (>0) that should be allocated to each index. In your case these would be descending, but they don't have to be. When you need an index, pick a random number between 0 and the total number of buckets and return the index associated with that bucket.
I've used an int weight array as it's easier to visualize and it avoids rounding errors associated with floating point.
import java.util.Random;
public class WeightedIndexPicker
{
private int total;
private int[] counts;
private Random rand;
public WeightedIndexPicker(int[] weights)
{
rand = new Random();
counts = weights.clone();
for(int i=1; i<counts.length; i++)
{
counts[i] += counts[i-1];
}
total = counts[counts.length-1];
}
public int nextIndex()
{
int idx = 0;
int pick = rand.nextInt(total);
while(pick >= counts[idx]) idx++;
return idx;
}
public static void main(String[] args)
{
int[] dist = {1000, 100, 10, 1};
WeightedIndexPicker wip = new WeightedIndexPicker(dist);
int idx = wip.nextIndex();
System.out.println(idx);
}
}
I don't think you need all this code to answer your question since your question seems to be much more about math than code. For example, using the apache commons maths library getting a distribution is easy:
ExponentialDistribution dist = new ExponentialDistribution(1);
// getting a sample (aka index into the list) is easy
dist.sample();
// lots of extra code to display the distribution.
int NUM_BUCKETS = 100;
int NUM_SAMPLES = 1000000;
DoubleStream.of(dist.sample(NUM_SAMPLES))
.map(s->((long)s*NUM_BUCKETS)/NUM_BUCKETS)
.boxed()
.collect(groupingBy(identity(), TreeMap::new, counting()))
.forEach((k,v)->System.out.println(k.longValue() + " -> " + v));
However, as you said, there are so many possible distributions in the math library. If you are writing code for a specific purpose then the end user will probably want you to explain why you chose a specific distribution and why you set the parameters for that distribution the way you did. That's a math question and should be asked in the mathematics forum.
How do I go about calculating the weighted mean of a Map<Double, Integer>, where the Integer value is the weight for the Double value to be averaged?
E.g. the Map has the following elements:
(0.7, 100) // value is 0.7 and weight is 100
(0.5, 200)
(0.3, 300)
(0.0, 400)
I am looking to apply the following formula using Java 8 streams, but unsure how to calculate the numerator and denominator together and preserve it at the same time. How to use reduction here?
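The formula itself is missing from the text at this point. It is presumably the standard weighted arithmetic mean, which is an assumption here; applied to the four example entries, with $x_i$ the values (map keys) and $w_i$ the weights (map values), it works out as:

```latex
\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}
          = \frac{0.7 \cdot 100 + 0.5 \cdot 200 + 0.3 \cdot 300 + 0.0 \cdot 400}{100 + 200 + 300 + 400}
          = \frac{260}{1000} = 0.26
```

The numerator and the denominator are two separate sums over the same entries, which is why a single reduction has to carry both along.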
You can create your own collector for this task:
static <T> Collector<T,?,Double> averagingWeighted(ToDoubleFunction<T> valueFunction, ToIntFunction<T> weightFunction) {
class Box {
double num = 0;
long denom = 0;
}
return Collector.of(
Box::new,
(b, e) -> {
b.num += valueFunction.applyAsDouble(e) * weightFunction.applyAsInt(e);
b.denom += weightFunction.applyAsInt(e);
},
(b1, b2) -> { b1.num += b2.num; b1.denom += b2.denom; return b1; },
b -> b.num / b.denom
);
}
This custom collector takes two functions as parameter: one is a function returning the value to use for a given stream element (as a ToDoubleFunction), and the other returns the weight (as a ToIntFunction). It uses a helper local class storing the numerator and denominator during the collecting process. Each time an entry is accepted, the numerator is increased with the result of multiplying the value with its weight, and the denominator is increased with the weight. The finisher then returns the division of the two as a Double.
A sample usage would be:
Map<Double,Integer> map = new HashMap<>();
map.put(0.7, 100);
map.put(0.5, 200);
double weightedAverage =
map.entrySet().stream().collect(averagingWeighted(Map.Entry::getKey, Map.Entry::getValue));
You can use this procedure to calculate the weighted average of a map. Note that the key of the map entry should contain the value and the value of the map entry should contain the weight.
/**
* Calculates the weighted average of a map.
*
* @throws ArithmeticException If divide by zero happens
* @param map A map of values and weights
* @return The weighted average of the map
*/
static Double calculateWeightedAverage(Map<Double, Integer> map) throws ArithmeticException {
double num = 0;
double denom = 0;
for (Map.Entry<Double, Integer> entry : map.entrySet()) {
num += entry.getKey() * entry.getValue();
denom += entry.getValue();
}
return num / denom;
}
You can look at its unit test to see a usecase.
/**
* Tests our method to calculate the weighted average.
*/
@Test
public void testAveragingWeighted() {
Map<Double, Integer> map = new HashMap<>();
map.put(0.7, 100);
map.put(0.5, 200);
Double weightedAverage = calculateWeightedAverage(map);
Assert.assertTrue(weightedAverage.equals(0.5666666666666667));
}
You need these imports for the unit tests:
import org.junit.Assert;
import org.junit.Test;
You need these imports for the code:
import java.util.HashMap;
import java.util.Map;
I hope it helps.
// Keys are the values to average; map values are their weights
public static double weightedAvg(Collection<Map.Entry<? extends Number, ? extends Number>> data) {
var sumWeights = data.stream()
.mapToDouble(e -> e.getValue().doubleValue())
.sum();
var sumData = data.stream()
.mapToDouble(e -> e.getKey().doubleValue() * e.getValue().doubleValue())
.sum();
return sumData / sumWeights;
}
static float weightedMean(List<Double> value, List<Integer> weighted, int n) {
int sum = 0;
double numWeight = 0;
for (int i = 0; i < n; i++) {
numWeight = numWeight + value.get(i).doubleValue() * weighted.get(i).intValue();
sum = sum + weighted.get(i).intValue();
}
return (float) (numWeight) / sum;
}
I need to determine the minimum value after removing the first value.
For instance, if these are the numbers: 0.5 70 80 90 10
I need to remove 0.5, then determine the minimum value in the remaining numbers. calcWeightedAvg is my focus ...
The final output should be “The weighted average of the numbers is 40, when using the data 0.5 70 80 90 10, where 0.5 is the weight, and the average is computed after dropping the lowest of the rest of the values.”
EDIT: Everything seems to be working, EXCEPT the final output: "The weighted average of the numbers is 40.0, when using the data 70.0, 80.0, 90.0, 10.0, where 70.0 (should be 0.5) is the weight, and the average is computed after dropping the lowest of the rest of the values."
So the math is right, the output is not.
EDIT: Using a class-level static double weight = 0.5; to establish the weight would not work if the user were to change the values in the input file. How can I change the class?
/*
*
*/
package calcweightedavg;
import java.util.Scanner;
import java.util.ArrayList;
import java.io.File;
import java.io.PrintWriter;
import java.io.FileNotFoundException;
import java.io.IOException;
public class CalcWeightedAvg {
/**
* @param args the command line arguments
*/
public static void main(String[] args) throws IOException {
//System.out.println(System.getProperty("user.dir"));
ArrayList<Double> inputValues = getData(); // User entered integers.
double weightedAvg = calcWeightedAvg(inputValues); // User entered weight.
printResults(inputValues, weightedAvg); //Weighted average of integers.
}
public static ArrayList<Double> getData() throws FileNotFoundException {
// Get input file name.
Scanner console = new Scanner(System.in);
System.out.print("Input File: ");
String inputFileName = console.next();
File inputFile = new File(inputFileName);
//
Scanner in = new Scanner(inputFile);
String inputString = in.nextLine();
//
String[] strArray = inputString.split("\\s+"); //LEFT OFF HERE
// Create arraylist with integers.
ArrayList<Double> doubleArrayList = new ArrayList<>();
for (String strElement : strArray) {
doubleArrayList.add(Double.parseDouble(strElement));
}
in.close();
return doubleArrayList;
}
public static double calcWeightedAvg(ArrayList<Double> inputValues){
//Get and remove weight.
Double weight = inputValues.get(0);
inputValues.remove(0);
//Sum and find min.
double min = Double.MAX_VALUE;
double sum = 0;
for (Double d : inputValues) {
if (d < min) min = d;
sum += d;
}
// Calculate weighted average.
return (sum-min)/(inputValues.size()-1) * weight;
}
public static void printResults(ArrayList<Double> inputValues, double weightedAvg) throws IOException {
Scanner console = new Scanner(System.in);
System.out.print("Output File: ");
String outputFileName = console.next();
PrintWriter out = new PrintWriter(outputFileName);
System.out.println("Your output is in the file " + outputFileName);
out.print("The weighted average of the numbers is " + weightedAvg + ", ");
out.print("when using the data ");
for (int i=0; i<inputValues.size(); i++) {
out.print(inputValues.get(i) + ", ");
}
out.print("\n where " + inputValues.get(0) + " is the weight, ");
out.print("and the average is computed after dropping the lowest of the rest of the values.\n");
out.close();
}
}
Doing this task in O(n) isn't hard.
You can use ArrayList's .get(0) to save the weight in a temporary variable, then use .remove(0), which removes the first value (in this case 0.5).
Then use a for-each loop, for (Double d : list), to sum the values AND find the minimum.
Afterwards, subtract the minimum from the sum and apply the weight to the result (in this case you'll end up with 240*0.5 = 120; 120/3 = 40).
Finally, you can use ArrayList's .size()-1 to determine the divisor.
The problem in your code:
In your implementation you removed the weight item from the list, then multiplied by the first item in the list even though it's no longer the weight:
return (sum-min)/(inputValues.size()-1) * inputValues.get(0);
Your calculation then was: ((70+80+90+10)-10)/(4-1) * (70) = 5600
if(inputValues.size() <= 1){
inputValues.remove(0);
}
This size safeguard will not remove the weight from the list; perhaps you meant to use >= 1.
Even if that was your intention, it will not give a correct result in the edge cases where the size is 0, 1 or 2, so I would recommend that you rethink this.
The full steps that need to be taken, in abstract code:
ArrayList<Double> list = new ArrayList<>();
// get and remove weight
Double weight = list.get(0);
list.remove(0);
// sum and find min
double min=Double.MAX_VALUE;
double sum=0;
for (Double d : list) {
if (d<min) min = d;
sum+=d;
}
// subtract min value from sum
sum-=min;
// apply weight
sum*=weight;
// calc weighted avg
// parentheses matter here: divide by (size - 1), because the minimum was dropped
double avg = sum/(list.size()-1);
// voilà!
Note that you can now safely add the weight back into the array list after its use via ArrayList's .add(int index, E element) method. Also, the code is very abstract, and safeguards regarding size should be implemented.
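The size safeguards mentioned above could look roughly like the sketch below. The class name GuardedWeightedAvg and the choice to throw an IllegalArgumentException are my own assumptions, not something from the question. It also skips the weight by starting the loop at index 1, so the list is never modified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class GuardedWeightedAvg {

    // Hypothetical helper: element 0 is the weight, the rest are the data values.
    // At least two data values are required, because dropping the minimum
    // must still leave something to average.
    static double calcWeightedAvg(List<Double> input) {
        if (input == null || input.size() < 3) {
            throw new IllegalArgumentException(
                "Need a weight followed by at least two values");
        }
        double weight = input.get(0);
        double min = Double.MAX_VALUE;
        double sum = 0;
        // Start at index 1 so the weight is skipped and the list stays untouched.
        for (int i = 1; i < input.size(); i++) {
            double d = input.get(i);
            if (d < min) min = d;
            sum += d;
        }
        int count = input.size() - 2; // minus the weight, minus the dropped minimum
        return (sum - min) / count * weight;
    }

    public static void main(String[] args) {
        List<Double> data = new ArrayList<>(Arrays.asList(0.5, 70.0, 80.0, 90.0, 10.0));
        System.out.println(calcWeightedAvg(data)); // 40.0
        System.out.println(data.get(0));           // 0.5, the weight is still in place
    }
}
```

Because the list is left as it was, inputValues.get(0) is still the weight when the results are printed.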
Regarding your Edit:
It appears you're outputting the wrong variable.
out.print("\n where " + inputValues.get(0) + " is the weight, ");
The weight was already removed from the list at this stage, so the first item in the list is indeed 70. Either add the weight back into the list after you've computed the result, or save it in a class variable and print that directly.
Implementations of both solutions follow. You should use only one of them, not both.
1) Add the weight back into the list:
Change this function to add the weight back to the list:
public static double calcWeightedAvg(ArrayList<Double> inputValues){
//Get and remove weight.
Double weight = inputValues.get(0);
inputValues.remove(0);
//Sum and find min.
double min = Double.MAX_VALUE;
double sum = 0;
for (Double d : inputValues) {
if (d < min) min = d;
sum += d;
}
// Calculate weighted average.
double returnVal = (sum-min)/(inputValues.size()-1) * weight;
// add weight back to list
inputValues.add(0,weight);
return returnVal;
}
2) class variable solution:
change for class:
public class CalcWeightedAvg {
static double weight=0;
//...
}
change for function:
public static double calcWeightedAvg(ArrayList<Double> inputValues){
//Get and remove weight.
weight = inputValues.get(0); // changed to class variable
//...
}
change for output:
out.print("\n where " + weight + " is the weight, ");
Since you're using an ArrayList, this should be a piece of cake.
To remove a value from an ArrayList, just find the index of the value and call
myList.remove(index);
If 0.5 is the first element in the list, remove it with
inputValues.remove(0);
If you want to find the minimum value in an ArrayList of doubles, just use this algorithm to find both the minimum value and its index:
double minVal = Double.MAX_VALUE;
int minIndex = -1;
for(int i = 0; i < myList.size(); i++) {
if(myList.get(i) < minVal) {
minVal = myList.get(i);
minIndex = i;
}
}
Hope this helps!
If you want to remove the first element from ArrayList and calculate the minimum in the remaining you should do:
if(inputValues.size() <= 1) //no point in calculation of one element
return;
inputValues.remove(0);
double min = inputValues.get(0);
for (int i = 1; i < inputValues.size(); i++) {
if (inputValues.get(i) < min)
min = inputValues.get(i);
}
I am a little unclear about your goal here. If you are required to make frequent calls to check the minimum value, a min heap would be a very good choice.
A min heap has the property that it offers constant time access to the minimum value. This [implementation] uses an ArrayList. So, you can add to the ArrayList using the add() method, and minValue() gives constant time access to the minimum value of the list since it ensures that the minimum value is always at index 0. The list is modified accordingly when the least value is removed, or a new value is added (called heapify).
I am not adding any code here since the link should make that part clear. If you would like some clarification, I would be more than glad to be of help.
Edit.
import java.util.ArrayList;
import java.util.NoSuchElementException;
public class HelloWorld {
private static ArrayList<Double> values;
private static Double sum = 0.0D;
/**
* Identifies the minimum value stored in the heap
* @return the minimum value
*/
public static Double minValue() {
if (values.size() == 0) {
throw new NoSuchElementException();
}
return values.get(0);
}
/**
* Adds a new value to the heap.
* @param newValue the value to be added
*/
public static void add(Double newValue) {
values.add(newValue);
int pos = values.size()-1;
while (pos > 0) {
if (newValue.compareTo(values.get((pos-1)/2)) < 0) {
values.set(pos, values.get((pos-1)/2));
pos = (pos-1)/2;
}
else {
break;
}
}
values.set(pos, newValue);
// update global sum
sum += newValue;
}
/**
* Removes the minimum value from the heap.
*/
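public static void remove() {
if (values.isEmpty()) {
throw new NoSuchElementException();
}
// remember the minimum so the running sum can be corrected afterwards
Double removed = values.get(0);
// take the last leaf and sift it down from the root
Double newValue = values.remove(values.size()-1);
int pos = 0;
if (values.size() > 0) {
while (2*pos+1 < values.size()) {
int minChild = 2*pos+1;
if (2*pos+2 < values.size() &&
values.get(2*pos+2).compareTo(values.get(2*pos+1)) < 0) {
minChild = 2*pos+2;
}
if (newValue.compareTo(values.get(minChild)) > 0) {
values.set(pos, values.get(minChild));
pos = minChild;
}
else {
break;
}
}
values.set(pos, newValue);
}
// update global sum with the value that actually left the heap
sum -= removed;
}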
/**
* NEEDS EDIT Multiplies the sum of the non-minimum values by the minimum value (treated as the weight).
* @return the weighted sum; note that it is not yet divided by the number of values
*/
public static double calcWeightedAvg() {
double minValue = minValue();
// the running total of the sum took this into account
// so, we have to remove this from the sum to get the effective sum
double effectiveSum = (sum - minValue);
return effectiveSum * minValue;
}
public static void main(String []args) {
values = new ArrayList<Double>();
// add values to the arraylist -> order is intentionally ruined
double[] arr = new double[]{10,70,90,80,0.5};
for(double val: arr)
add(val);
System.out.println("Present minimum in the list: " + minValue()); // 0.5
System.out.println("CalcWeightedAvg: " + calcWeightedAvg()); // 125.0
}
}
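If you would rather not maintain a heap by hand, the same idea can be sketched with java.util.PriorityQueue, which is a min-heap under natural ordering. The class and method names below are hypothetical, and passing the weight as a separate argument is my own assumption. Unlike the "NEEDS EDIT" version above, this one also divides by the number of remaining values.

```java
import java.util.PriorityQueue;

final class PriorityQueueSketch {

    // Hypothetical helper: the weight is passed separately from the data values.
    static double weightedAvgDroppingMin(double weight, double... values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("Need at least two values");
        }
        PriorityQueue<Double> heap = new PriorityQueue<>(); // natural order: min-heap
        double sum = 0;
        for (double v : values) {
            heap.add(v); // O(log n) insert
            sum += v;
        }
        double min = heap.poll(); // removes and returns the smallest value
        // heap.size() is now the number of values left after dropping the minimum
        return (sum - min) / heap.size() * weight;
    }

    public static void main(String[] args) {
        System.out.println(weightedAvgDroppingMin(0.5, 70, 80, 90, 10)); // 40.0
    }
}
```

For a single pass over a short list, a plain loop that tracks the minimum is simpler. The heap only pays off if you keep adding values and asking for the minimum repeatedly.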
I'm trying to calculate the total, mean and median of an array that's populated by input received from a text field. I've managed to work out the total and the mean, but I can't get the median to work. I think the array needs to be sorted before I can do this, but I'm not sure how to do that. Is this the problem, or is there another one that I didn't find? Here is my code:
import java.applet.Applet;
import java.awt.Graphics;
import java.awt.*;
import java.awt.event.*;
public class whileloopq extends Applet implements ActionListener
{
Label label;
TextField input;
int num;
int index;
int[] numArray = new int[20];
int sum;
int total;
double avg;
int median;
public void init ()
{
label = new Label("Enter numbers");
input = new TextField(5);
add(label);
add(input);
input.addActionListener(this);
index = 0;
}
public void actionPerformed (ActionEvent ev)
{
int num = Integer.parseInt(input.getText());
numArray[index] = num;
index++;
if (index == 20)
input.setEnabled(false);
input.setText("");
sum = 0;
for (int i = 0; i < numArray.length; i++)
{
sum += numArray[i];
}
total = sum;
avg = total / index;
median = numArray[numArray.length/2];
repaint();
}
public void paint (Graphics graf)
{
graf.drawString("Total = " + Integer.toString(total), 25, 85);
graf.drawString("Average = " + Double.toString(avg), 25, 100);
graf.drawString("Median = " + Integer.toString(median), 25, 115);
}
}
The Arrays class in Java has a static sort function, which you can invoke with Arrays.sort(numArray).
Arrays.sort(numArray);
double median;
if (numArray.length % 2 == 0)
median = ((double)numArray[numArray.length/2] + (double)numArray[numArray.length/2 - 1])/2;
else
median = (double) numArray[numArray.length/2];
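One caveat with the applet in the question: numArray always has 20 slots, but only the first index of them hold entered values. Sorting the whole array moves the unused zeros to the front and skews the median. A minimal sketch that sorts only the filled part follows. The class name PrefixMedian and the count parameter are my own, and count is assumed to be the question's index field.

```java
import java.util.Arrays;

final class PrefixMedian {

    // count is the number of slots actually filled (index in the question's applet).
    static double median(int[] numArray, int count) {
        if (count <= 0 || count > numArray.length) {
            throw new IllegalArgumentException("count must be between 1 and numArray.length");
        }
        // Copy only the entered values, so the original input order is preserved too.
        int[] filled = Arrays.copyOf(numArray, count);
        Arrays.sort(filled);
        int mid = count / 2;
        if (count % 2 == 0) {
            // even count: average of the two middle values, computed in double
            return ((double) filled[mid - 1] + filled[mid]) / 2;
        }
        return filled[mid];
    }

    public static void main(String[] args) {
        int[] numArray = new int[20];
        numArray[0] = 7;
        numArray[1] = 3;
        numArray[2] = 9;
        System.out.println(median(numArray, 3)); // 7.0
    }
}
```

With this, the median field in the applet would need to be a double rather than an int.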
Sorting the array is unnecessary and inefficient. There's a variation of the QuickSort (QuickSelect) algorithm which has an average run time of O(n); if you sort first, you're down to O(n log n). It actually finds the nth smallest item in a list; for a median, you just use n = half the list length. Let's call it quickNth (list, n).
The concept is that to find the nth smallest, choose a 'pivot' value. (Exactly how you choose it isn't critical; if you know the data will be thoroughly random, you can take the first item on the list.)
Split the original list into three smaller lists:
One with values smaller than the pivot.
One with values equal to the pivot.
And one with values greater than the pivot.
You then have three cases:
The "smaller" list has >= n items. In that case, you know that the nth smallest is in that list. Return quickNth(smaller, n).
The smaller list has < n items, but the sum of the lengths of the smaller and equal lists have >= n items. In this case, the nth is equal to any item in the "equal" list; you're done.
n is greater than the sum of the lengths of the smaller and equal lists. In that case, you can essentially skip over those two, and adjust n accordingly. Return quickNth(greater, n - length(smaller) - length(equal)).
Done.
If you're not sure that the data is thoroughly random, you need to be more sophisticated about choosing the pivot. Taking the median of the first value in the list, the last value in the list, and the one midway between the two works pretty well.
If you're very unlucky with your choice of pivots, and you always choose the smallest or highest value as your pivot, this takes O(n^2) time; that's bad. But, it's also very unlikely if you choose your pivot with a decent algorithm.
Sample code:
import java.util.*;
public class Utility {
/****************
* @param coll an ArrayList of Comparable objects
* @return the median of coll
*****************/
public static <T extends Number> double median(ArrayList<T> coll, Comparator<T> comp) {
double result;
int n = coll.size()/2;
if (coll.size() % 2 == 0) // even number of items; find the middle two and average them
result = (nth(coll, n-1, comp).doubleValue() + nth(coll, n, comp).doubleValue()) / 2.0;
else // odd number of items; return the one in the middle
result = nth(coll, n, comp).doubleValue();
return result;
} // median(coll)
/*****************
* @param coll a collection of Comparable objects
* @param n the position of the desired object, using the ordering defined on the list elements
* @return the nth smallest object
*******************/
public static <T> T nth(ArrayList<T> coll, int n, Comparator<T> comp) {
T result, pivot;
ArrayList<T> underPivot = new ArrayList<>(), overPivot = new ArrayList<>(), equalPivot = new ArrayList<>();
// choosing a pivot is a whole topic in itself.
// this implementation uses the simple strategy of grabbing something from the middle of the ArrayList.
pivot = coll.get(n/2);
// split coll into 3 lists based on comparison with the pivot
for (T obj : coll) {
int order = comp.compare(obj, pivot);
if (order < 0) // obj < pivot
underPivot.add(obj);
else if (order > 0) // obj > pivot
overPivot.add(obj);
else // obj = pivot
equalPivot.add(obj);
} // for each obj in coll
// recurse on the appropriate list
if (n < underPivot.size())
result = nth(underPivot, n, comp);
else if (n < underPivot.size() + equalPivot.size()) // equal to pivot; just return it
result = pivot;
else // everything in underPivot and equalPivot is too small. Adjust n accordingly in the recursion.
result = nth(overPivot, n - underPivot.size() - equalPivot.size(), comp);
return result;
} // nth(coll, n)
public static void main (String[] args) {
Comparator<Integer> comp = Comparator.naturalOrder();
Random rnd = new Random();
for (int size = 1; size <= 10; size++) {
ArrayList<Integer> coll = new ArrayList<>(size);
for (int i = 0; i < size; i++)
coll.add(rnd.nextInt(100));
System.out.println("Median of " + coll.toString() + " is " + median(coll, comp));
} // for a range of possible input sizes
} // main(args)
} // Utility
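The sample code picks its pivot with the simple "grab something from the middle" strategy. The median-of-three choice described earlier (first, middle and last element) could be sketched like this. The helper name medianOfThree is hypothetical, and it assumes a non-empty list.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class PivotSketch {

    // Hypothetical helper: returns the median of the first, middle and last elements.
    // Assumes coll is not empty.
    static <T> T medianOfThree(List<T> coll, Comparator<T> comp) {
        T a = coll.get(0);
        T b = coll.get(coll.size() / 2);
        T c = coll.get(coll.size() - 1);
        if (comp.compare(a, b) > 0) { T t = a; a = b; b = t; } // now a <= b
        if (comp.compare(b, c) > 0) { T t = b; b = c; c = t; } // now c is the largest
        if (comp.compare(a, b) > 0) { T t = a; a = b; b = t; } // now a <= b <= c
        return b;
    }

    public static void main(String[] args) {
        Comparator<Integer> comp = Comparator.naturalOrder();
        List<Integer> sorted = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);
        // On already sorted input, taking the first element would be the worst
        // possible pivot; median-of-three picks the true middle instead.
        System.out.println(medianOfThree(sorted, comp)); // 5
    }
}
```

To try it in the sample code, the line pivot = coll.get(n/2); in nth would become pivot = medianOfThree(coll, comp);.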
If you are willing to use an external library, the Apache Commons Math library can calculate the median for you.
For more methods and usage, take a look at the API documentation.
import org.apache.commons.math3.stat.descriptive.rank.Median;
.....
......
........
//calculate median
public double getMedian(double[] values){
Median median = new Median();
double medianValue = median.evaluate(values);
return medianValue;
}
.......
For more on evaluate method AbstractUnivariateStatistic#evaluate
Update
Calculate in program
Generally, median is calculated using the following two formulas given here
If n is odd then Median (M) = value of ((n + 1)/2)th item term.
If n is even then Median (M) = value of [((n)/2)th item term + ((n)/2 + 1)th item term ]/2
In your program you have numArray, first you need to sort array using Arrays#sort
Arrays.sort(numArray);
int middle = numArray.length/2;
int medianValue = 0; //declare variable
if (numArray.length%2 == 1)
medianValue = numArray[middle];
else
medianValue = (numArray[middle-1] + numArray[middle]) / 2;
Arrays.sort(numArray);
return (numArray[numArray.length/2] + numArray[(numArray.length-1)/2]) / 2;
Arrays.sort(numArray);
int middle = ((numArray.length) / 2);
if(numArray.length % 2 == 0){
int medianA = numArray[middle];
int medianB = numArray[middle-1];
median = (medianA + medianB) / 2;
} else{
median = numArray[middle]; // odd length: the middle element sits at index length/2
}
EDIT: I initially had medianB setting to middle+1 in the even length arrays, this was wrong due to arrays starting count at 0. I have updated it to use middle-1 which is correct and should work properly for an array with an even length.
You can find good explanation at https://www.youtube.com/watch?time_continue=23&v=VmogG01IjYc
The idea is to use two heaps: one max-heap and one min-heap.
class Heap {
private Queue<Integer> low = new PriorityQueue<>(Comparator.reverseOrder());
private Queue<Integer> high = new PriorityQueue<>();
public void add(int number) {
Queue<Integer> target = low.size() <= high.size() ? low : high;
target.add(number);
balance();
}
private void balance() {
while(!low.isEmpty() && !high.isEmpty() && low.peek() > high.peek()) {
Integer lowHead= low.poll();
Integer highHead = high.poll();
low.add(highHead);
high.add(lowHead);
}
}
public double median() {
if(low.isEmpty() && high.isEmpty()) {
throw new IllegalStateException("Heap is empty");
} else {
return low.size() == high.size() ? (low.peek() + high.peek()) / 2.0 : low.peek();
}
}
}
Try sorting the array first. Once it's sorted, if the array has an even number of elements, the mean of the middle two is the median; if it has an odd number, the middle element is the median.
Use Arrays.sort and then take the middle element (in case the number n of elements in the array is odd) or take the average of the two middle elements (in case n is even).
public static long median(long[] l)
{
Arrays.sort(l);
int middle = l.length / 2;
if (l.length % 2 == 0)
{
long left = l[middle - 1];
long right = l[middle];
return (left + right) / 2;
}
else
{
return l[middle];
}
}
Here are some examples:
@Test
public void evenTest()
{
long[] l = {
5, 6, 1, 3, 2, 4
};
Assert.assertEquals((3 + 4) / 2, median(l));
}
@Test
public void oddTest()
{
long[] l = {
5, 1, 3, 2, 4
};
Assert.assertEquals(3, median(l));
}
And in case your input is a Collection, you might use Google Guava to do something like this:
public static long median(Collection<Long> numbers)
{
return median(Longs.toArray(numbers)); // requires import com.google.common.primitives.Longs;
}
I was looking at the same statistics problem. The approach you are thinking of is good and it will work. (The answer to the sorting question has already been given.)
But in case you are interested in algorithm performance, there are a couple of algorithms that perform better than just sorting the array. One (QuickSelect) is indicated by @bruce-feist's answer and is very well explained.
[Java implementation: https://discuss.leetcode.com/topic/14611/java-quick-select ]
But there is a variation of this algorithm named median of medians, you can find a good explanation on this link:
http://austinrochford.com/posts/2013-10-28-median-of-medians.html
Java implementation of this:
- https://stackoverflow.com/a/27719796/957979
I faced a similar problem yesterday.
I wrote a method with Java generics to calculate the median of any collection of Numbers; you can apply it to collections of Doubles, Integers or Floats, and it returns a double. Note that the method creates another collection so that the original one is not altered.
I provide also a test, have fun. ;-)
public static <T extends Number & Comparable<T>> double median(Collection<T> numbers){
if(numbers.isEmpty()){
throw new IllegalArgumentException("Cannot compute median on empty collection of numbers");
}
List<T> numbersList = new ArrayList<>(numbers);
Collections.sort(numbersList);
int middle = numbersList.size()/2;
if(numbersList.size() % 2 == 0){
return 0.5 * (numbersList.get(middle).doubleValue() + numbersList.get(middle-1).doubleValue());
} else {
return numbersList.get(middle).doubleValue();
}
}
JUnit test code snippet:
/**
* Test of median method, of class Utils.
*/
@Test
public void testMedian() {
System.out.println("median");
Double expResult = 3.0;
Double result = Utils.median(Arrays.asList(3.0,2.0,1.0,9.0,13.0));
assertEquals(expResult, result);
expResult = 3.5;
result = Utils.median(Arrays.asList(3.0,2.0,1.0,9.0,4.0,13.0));
assertEquals(expResult, result);
}
Usage example (consider the class name is Utils):
List<Integer> intValues = ... //omitted init
Set<Float> floatValues = ... //omitted init
.....
double intListMedian = Utils.median(intValues);
double floatSetMedian = Utils.median(floatValues);
Note: my method works on collections, you can convert arrays of numbers to list of numbers as pointed here
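For primitive arrays, one common way to do that conversion is to box the values through a stream. This is only a sketch with made-up names (ArrayToList, toList), and it is not necessarily what the linked answer used.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class ArrayToList {

    // Boxes each int into an Integer and collects the result into a new list.
    static List<Integer> toList(int[] values) {
        return Arrays.stream(values).boxed().collect(Collectors.toList());
    }

    // Same idea for double[]; Arrays.stream has an overload for it.
    static List<Double> toList(double[] values) {
        return Arrays.stream(values).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> ints = toList(new int[] {3, 1, 2});
        List<Double> doubles = toList(new double[] {3.0, 2.0, 1.0, 9.0});
        System.out.println(ints);    // [3, 1, 2]
        System.out.println(doubles); // [3.0, 2.0, 1.0, 9.0]
        // The resulting lists can then be passed to Utils.median(...) from above.
    }
}
```

Arrays.asList does not help here: called on an int[] it produces a List<int[]> with a single element, not a List<Integer>.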
Few of the answers pay attention to the case where the list contains only one element (list.size == 1). Integer division gives 1 / 2 = 0, so any answer that reads index mid - 1 or mid + 1 without checking the parity first will fail with an index-out-of-bounds exception. A version that handles this case explicitly (in Kotlin):
MEDIAN("MEDIAN") {
override fun calculate(values: List<BigDecimal>): BigDecimal? {
if (values.size == 1) {
return values.first()
}
if (values.size > 1) {
val valuesSorted = values.sorted()
val mid = valuesSorted.size / 2
return if (valuesSorted.size % 2 != 0) {
valuesSorted[mid]
} else {
AVERAGE.calculate(listOf(valuesSorted[mid - 1], valuesSorted[mid]))
}
}
return null
}
},
As @Bruce-Feist mentions, for a large number of elements I'd avoid any solution involving a sort if performance is a concern. A different approach from those suggested in the other answers is Hoare's algorithm for finding the k-th smallest of n items. It runs in O(n) time on average. In the code, k is a zero-based index.
public int findKthSmallest(int[] array, int k)
{
if (array.length < 10)
{
Arrays.sort(array);
return array[k];
}
int start = 0;
int end = array.length - 1;
int x, temp;
int i, j;
while (start < end)
{
x = array[k];
i = start;
j = end;
do
{
while (array[i] < x)
i++;
while (x < array[j])
j--;
if (i <= j)
{
temp = array[i];
array[i] = array[j];
array[j] = temp;
i++;
j--;
}
} while (i <= j);
if (j < k)
start = i;
if (k < i)
end = j;
}
return array[k];
}
And to find the median:
public int median(int[] array)
{
int length = array.length;
if ((length & 1) == 0) // even
return (findKthSmallest(array, array.length / 2 - 1) + findKthSmallest(array, array.length / 2)) / 2; // k is zero-based
else // odd
return findKthSmallest(array, array.length / 2);
}
public static int median(int[] arr) {
if (arr.length == 0) {
return 0; // nothing to take a median of
}
java.util.Arrays.sort(arr);
if (arr.length % 2 == 1) {
// odd length: the middle element
return arr[arr.length/2];
}
// even length: average of the two middle elements (integer division)
return (arr[(arr.length/2)] + arr[(arr.length/2)-1])/2;
}
Check out the Arrays.sort methods:
http://docs.oracle.com/javase/6/docs/api/java/util/Arrays.html
You should also really abstract finding the median into its own method, and just return the value to the calling method. This will make testing your code much easier.
public int[] data={31, 29, 47, 48, 23, 30, 21
, 40, 23, 39, 47, 47, 42, 44, 23, 26, 44, 32, 20, 40};
public double median()
{
Arrays.sort(this.data);
double result=0;
int size=this.data.length;
if(size%2==1)
{
// odd size: the middle element is at index (size-1)/2
result=data[(size-1)/2];
System.out.println(" uneven size : "+result);
}
else
{
int middle_pair_first_index =(size-1)/2;
// divide by 2.0 so the average of the middle pair is not truncated
result=(data[middle_pair_first_index+1]+data[middle_pair_first_index])/2.0;
System.out.println(" Even size : "+result);
}
return result;
}
package arrays;
public class Arraymidleelement {
static public double middleArrayElement(int [] arr)
{
double mid;
if(arr.length%2==0)
{
mid=((double)arr[arr.length/2]+(double)arr[arr.length/2-1])/2;
return mid;
}
return arr[arr.length/2];
}
public static void main(String[] args) {
int arr[]= {1,2,3,4,5,6};
System.out.println( middleArrayElement(arr));
}
}