I need to determine the minimum value after removing the first value.
For instance, if these are the numbers: 0.5 70 80 90 10
I need to remove 0.5, then determine the minimum value in the remaining numbers. calcWeightedAvg is my focus ...
The final output should be “The weighted average of the numbers is 40, when using the data 0.5 70 80 90 10, where 0.5 is the weight, and the average is computed after dropping the lowest of the rest of the values.”
EDIT: Everything seems to be working, EXCEPT the final output: "The weighted average of the numbers is 40.0, when using the data 70.0, 80.0, 90.0, 10.0, where 70.0 (should be 0.5) is the weight, and the average is computed after dropping the lowest of the rest of the values."
So the math is right, but the output is not.
EDIT: Using a class variable, static double weight=0.5;, to establish the weight would not work if the user were to change the values in the input file. How can I change the class?
/*
*
*/
package calcweightedavg;
import java.util.Scanner;
import java.util.ArrayList;
import java.io.File;
import java.io.PrintWriter;
import java.io.FileNotFoundException;
import java.io.IOException;
public class CalcWeightedAvg {
/**
* @param args the command line arguments
*/
public static void main(String[] args) throws IOException {
//System.out.println(System.getProperty("user.dir"));
ArrayList<Double> inputValues = getData(); // User entered integers.
double weightedAvg = calcWeightedAvg(inputValues); // User entered weight.
printResults(inputValues, weightedAvg); //Weighted average of integers.
}
public static ArrayList<Double> getData() throws FileNotFoundException {
// Get input file name.
Scanner console = new Scanner(System.in);
System.out.print("Input File: ");
String inputFileName = console.next();
File inputFile = new File(inputFileName);
//
Scanner in = new Scanner(inputFile);
String inputString = in.nextLine();
//
String[] strArray = inputString.split("\\s+"); //LEFT OFF HERE
// Create arraylist with integers.
ArrayList<Double> doubleArrayList = new ArrayList<>();
for (String strElement : strArray) {
doubleArrayList.add(Double.parseDouble(strElement));
}
in.close();
return doubleArrayList;
}
public static double calcWeightedAvg(ArrayList<Double> inputValues){
//Get and remove weight.
Double weight = inputValues.get(0);
inputValues.remove(0);
//Sum and find min.
double min = Double.MAX_VALUE;
double sum = 0;
for (Double d : inputValues) {
if (d < min) min = d;
sum += d;
}
// Calculate weighted average.
return (sum-min)/(inputValues.size()-1) * weight;
}
public static void printResults(ArrayList<Double> inputValues, double weightedAvg) throws IOException {
Scanner console = new Scanner(System.in);
System.out.print("Output File: ");
String outputFileName = console.next();
PrintWriter out = new PrintWriter(outputFileName);
System.out.println("Your output is in the file " + outputFileName);
out.print("The weighted average of the numbers is " + weightedAvg + ", ");
out.print("when using the data ");
for (int i=0; i<inputValues.size(); i++) {
out.print(inputValues.get(i) + ", ");
}
out.print("\n where " + inputValues.get(0) + " is the weight, ");
out.print("and the average is computed after dropping the lowest of the rest of the values.\n");
out.close();
}
}
Doing this task in O(n) isn't hard.
You can use ArrayList's .get(0) to save the weight in a temporary variable, then call .remove(0), which removes the first value (0.5 in this case).
Then use a for-each loop, for (Double d : list), to sum the values AND find the minimum.
Afterwards, subtract the minimum from the sum and apply the weight to the result (in this case you end up with 240*0.5 = 120; 120/3 = 40).
Finally, use ArrayList's .size()-1 to determine the divisor.
The problem in your code:
In your implementation you removed the weight item from the list, then multiplied by the first item in the list even though it is no longer the weight:
return (sum-min)/(inputValues.size()-1) * inputValues.get(0);
Your calculation then was: ((70+80+90+10)-10)/(4-1) * (70) = 5600
if(inputValues.size() <= 1){
inputValues.remove(0);
}
This size safeguard will not remove the weight from the list unless the list has at most one element. Perhaps you meant to use >= 1?
Even if that was your intention, it will not result in a correct computation in the edge cases where size is 0, 1, or 2. I would recommend that you rethink this.
the full steps that need to be taken in abstract code:
ArrayList<Double> list = new ArrayList<>();
// get and remove weight
Double weight = list.get(0);
list.remove(0);
// sum and find min
double min=Double.MAX_VALUE;
double sum=0;
for (Double d : list) {
if (d<min) min = d;
sum+=d;
}
// subtract min value from sum
sum-=min;
// apply weight
sum*=weight;
// calc weighted avg
double avg = sum/(list.size()-1); // parentheses matter: divide by the remaining count
// voilà!
Note that you can now safely add the weight back into the ArrayList after its use via ArrayList's .add(int index, E element) method. Also, the code is very abstract, and safeguards regarding size should be implemented.
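The size safeguards mentioned above could look roughly like the following. This is only a sketch: the class name WeightedAvgSketch and the choice to throw IllegalArgumentException are assumptions, not part of the original code. It reads the weight by index instead of removing it, so the list stays intact for printing.

```java
import java.util.Arrays;
import java.util.List;

class WeightedAvgSketch {
    /**
     * Treats the first element as the weight, drops the lowest of the
     * remaining values, averages what is left, and applies the weight.
     * The input list is not modified.
     */
    static double weightedAverage(List<Double> data) {
        // We need the weight plus at least two values; otherwise the
        // divisor (number of values minus the dropped minimum) would be zero.
        if (data == null || data.size() < 3) {
            throw new IllegalArgumentException("Need a weight and at least two values");
        }
        double weight = data.get(0);
        double min = Double.MAX_VALUE;
        double sum = 0;
        for (int i = 1; i < data.size(); i++) { // start at 1 to skip the weight
            double d = data.get(i);
            if (d < min) min = d;
            sum += d;
        }
        int remaining = data.size() - 2; // minus the weight, minus the dropped minimum
        return (sum - min) / remaining * weight;
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(0.5, 70.0, 80.0, 90.0, 10.0);
        System.out.println(weightedAverage(data)); // prints 40.0
    }
}
```

Because the list is never mutated, data.get(0) is still the weight when the results are printed.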
Regarding your Edit:
it appears you're outputting the wrong variable.
out.print("\n where " + inputValues.get(0) + " is the weight, ");
The weight was already removed from the list at this stage, so the first item in the list is indeed 70. Either add the weight back into the list after you've computed the result, or save it in a class variable and print it directly.
Implementations of both solutions follow. Use only one of them, not both.
1) Add the weight back into the list:
Change this function to add the weight back to the list:
public static double calcWeightedAvg(ArrayList<Double> inputValues){
//Get and remove weight.
Double weight = inputValues.get(0);
inputValues.remove(0);
//Sum and find min.
double min = Double.MAX_VALUE;
double sum = 0;
for (Double d : inputValues) {
if (d < min) min = d;
sum += d;
}
// Calculate weighted average.
double returnVal = (sum-min)/(inputValues.size()-1) * weight;
// add weight back to list
inputValues.add(0,weight);
return returnVal;
}
2) class variable solution:
change for class:
public class CalcWeightedAvg {
static double weight=0;
//...
}
change for function:
public static double calcWeightedAvg(ArrayList<Double> inputValues){
//Get and remove weight.
weight = inputValues.get(0); // changed to class variable
//...
}
change for output:
out.print("\n where " + weight + " is the weight, ");
Since you're using an ArrayList, this should be a piece of cake.
To remove a value from an ArrayList, just find the index of the value and call
myList.remove(index);
If 0.5 is the first element in the list, remove it with
inputValues.remove(0);
If you want to find the minimum value in an ArrayList of doubles, just use this algorithm to find both the minimum value and its index:
double minVal = Double.MAX_VALUE;
int minIndex = -1;
for(int i = 0; i < myList.size(); i++) {
if(myList.get(i) < minVal) {
minVal = myList.get(i);
minIndex = i;
}
}
Hope this helps!
If you want to remove the first element from ArrayList and calculate the minimum in the remaining you should do:
if(inputValues.size() <= 1) //no point in calculation of one element
return;
inputValues.remove(0);
double min = inputValues.get(0);
for (int i = 1; i < inputValues.size(); i++) {
if (inputValues.get(i) < min)
min = inputValues.get(i);
}
I am a little unclear about your goal here. If you are required to make frequent calls to check the minimum value, a min heap would be a very good choice.
A min heap has the property that it offers constant time access to the minimum value. This [implementation] uses an ArrayList. So, you can add to the ArrayList using the add() method, and minValue() gives constant time access to the minimum value of the list since it ensures that the minimum value is always at index 0. The list is modified accordingly when the least value is removed, or a new value is added (called heapify).
I am not adding any code here since the link should make that part clear. If you would like some clarification, I would be more than glad to be of help.
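If writing the heap by hand is not a requirement, the standard library's java.util.PriorityQueue already behaves as a min heap for Double values (natural ordering). The sketch below is an illustration only; the class name MinHeapSketch and the helper heapOf are made up for the example.

```java
import java.util.PriorityQueue;

class MinHeapSketch {
    /** Builds a min heap from the given values. */
    static PriorityQueue<Double> heapOf(double... values) {
        // With natural ordering, the head of the queue is the smallest element.
        PriorityQueue<Double> heap = new PriorityQueue<>();
        for (double v : values) {
            heap.add(v);
        }
        return heap;
    }

    public static void main(String[] args) {
        PriorityQueue<Double> heap = heapOf(70, 80, 90, 10);
        System.out.println(heap.peek()); // prints 10.0 (peek does not remove)
        heap.poll();                     // removes the current minimum
        System.out.println(heap.peek()); // prints 70.0
    }
}
```

peek() reads the minimum without removing it, while poll() removes it and restores the heap order.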
Edit.
import java.util.ArrayList;
import java.util.NoSuchElementException;
public class HelloWorld {
private static ArrayList<Double> values;
private static Double sum = 0.0D;
/**
* Identifies the minimum value stored in the heap
* @return the minimum value
*/
public static Double minValue() {
if (values.size() == 0) {
throw new NoSuchElementException();
}
return values.get(0);
}
/**
* Adds a new value to the heap.
* @param newValue the value to be added
*/
public static void add(Double newValue) {
values.add(newValue);
int pos = values.size()-1;
while (pos > 0) {
if (newValue.compareTo(values.get((pos-1)/2)) < 0) {
values.set(pos, values.get((pos-1)/2));
pos = (pos-1)/2;
}
else {
break;
}
}
values.set(pos, newValue);
// update global sum
sum += newValue;
}
/**
* Removes the minimum value from the heap.
*/
public static void remove() {
Double newValue = values.remove(values.size()-1);
int pos = 0;
if (values.size() > 0) {
while (2*pos+1 < values.size()) {
int minChild = 2*pos+1;
if (2*pos+2 < values.size() &&
values.get(2*pos+2).compareTo(values.get(2*pos+1)) < 0) {
minChild = 2*pos+2;
}
if (newValue.compareTo(values.get(minChild)) > 0) {
values.set(pos, values.get(minChild));
pos = minChild;
}
else {
break;
}
}
values.set(pos, newValue);
}
// update global sum
sum -= newValue;
}
/**
* NEEDS EDIT Computes the average of the list, leaving out the minimum value.
* @return the sum without the minimum, multiplied by the minimum (which doubles as the weight here)
*/
public static double calcWeightedAvg() {
double minValue = minValue();
// the running total of the sum took this into account
// so, we have to remove this from the sum to get the effective sum
double effectiveSum = (sum - minValue);
return effectiveSum * minValue;
}
public static void main(String []args) {
values = new ArrayList<Double>();
// add values to the arraylist -> order is intentionally ruined
double[] arr = new double[]{10,70,90,80,0.5};
for(double val: arr)
add(val);
System.out.println("Present minimum in the list: " + minValue()); // 0.5
System.out.println("CalcWeightedAvg: " + calcWeightedAvg()); // 125.0
}
}
Related
I'm working on a project that prompts the user to create and fill an array with integers, then displays the mean, mode, median, and standard deviation of that array. It starts by asking the user what the size of the array will be, to which the number entered will declare and initialize the array. The program will then iterate several times asking the user to declare an integer value, and each value will be stored into the array until the array is filled. The program will then print the contents of the array, as well as the mean, mode, median, and standard deviation.
I have a code that seems to meet all these requirements. However, one thing I am struggling on is the mode. While it does print out the most repeated number in the array, it doesn't take into account multiple modes with the same number of repetitions, nor does it take into account what will happen if there is no mode.
Right now, if two numbers are entered twice each, the mode displayed is the first number to be repeated more than once. For example, if I have an array size of 10 integers, and the integers I enter are 1, 2, 2, 3, 3, 4, 5, 6, 7, 8, it will print out "2.0" for the mode instead of printing both "2.0" and "3.0." If there is no mode, it simply enters the number first entered, rather than saying "None."
What would be the best course of action to go about accomplishing this?
Here is my code:
import java.util.*;
public class ArrayStatistics {
public static void main(String[] args) {
double total = 0;
Scanner input = new Scanner(System.in);
System.out.print("Enter the size of your array >> ");
int size = input.nextInt();
double[] myArray = new double[size];
System.out.print("Enter the integer values >> ");
for (int i=0; i<size; i++) {
myArray[i] = input.nextInt();
}
System.out.println("\nIntegers:");
for (int i=0; i<size; i++) {
System.out.println(myArray[i]);
}
double mean = calculateMean(myArray);
System.out.println("\nMean: " + mean);
double mode = calculateMode(myArray);
System.out.println("Mode: " + mode);
double median = calculateMedian(myArray);
System.out.println("Median: " + median);
double SD = calculateSD(myArray);
System.out.format("Standard Deviation: %.6f", SD);
}
public static double calculateMean(double myArray[]) {
int sum = 0;
for(int i = 0; i<myArray.length; i++) {
sum = (int) (sum + myArray[i]);
}
double mean = ((double) sum) / (double)myArray.length;
return mean;
}
public static double calculateMode(double myArray[]) {
int modeCount = 0;
int mode = 0;
int currCount = 0;
for(double candidateMode : myArray) {
currCount = 0;
for(double element : myArray) {
if(candidateMode == element) {
currCount++;
}
}
if(currCount > modeCount) {
modeCount = currCount;
mode = (int) candidateMode;
}
}
return mode;
}
public static double calculateMedian(double myArray[]) {
Arrays.sort(myArray);
int val = myArray.length/2;
double median = ((myArray[val]+myArray[val-1])/2.0);
return median;
}
public static double calculateSD(double myArray[]) {
double sum = 0.0;
double standardDeviation = 0.0;
int length = myArray.length;
for(double num : myArray) {
sum += num;
}
double mean = sum/length;
for(double num : myArray) {
standardDeviation += Math.pow(num - mean, 2);
}
return Math.sqrt(standardDeviation/length);
}
}
First the code, then the explanations.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.stream.Collectors;
public class ArrayStatistics {
public static void main(String[] args) {
int total = 0;
Scanner input = new Scanner(System.in);
System.out.print("Enter the size of your array >> ");
int size = input.nextInt();
int[] myArray = new int[size];
Map<Integer, Integer> frequencies = new HashMap<>();
System.out.print("Enter the integer values >> ");
for (int i = 0; i < size; i++) {
myArray[i] = input.nextInt();
if (frequencies.containsKey(myArray[i])) {
int frequency = frequencies.get(myArray[i]);
frequencies.put(myArray[i], frequency + 1);
}
else {
frequencies.put(myArray[i], 1);
}
total += myArray[i];
}
System.out.println("\nIntegers:");
for (int i = 0; i < size; i++) {
System.out.println(myArray[i]);
}
double mean = calculateMean(size, total);
System.out.println("\nMean: " + mean);
List<Integer> mode = calculateMode(frequencies);
System.out.println("Mode: " + mode);
double median = calculateMedian(myArray);
System.out.println("Median: " + median);
double stdDev = calculateSD(mean, total, size, myArray);
System.out.format("Standard Deviation: %.6f", stdDev);
}
public static double calculateMean(int count, int total) {
double mean = ((double) total) / count;
return mean;
}
public static List<Integer> calculateMode(Map<Integer, Integer> frequencies) {
Map<Integer, Integer> sorted = frequencies.entrySet()
.stream()
.sorted((e1, e2) -> e2.getValue() - e1.getValue())
.collect(Collectors.toMap(e -> e.getKey(),
e -> e.getValue(),
(i1, i2) -> i1,
LinkedHashMap::new));
Iterator<Integer> iterator = sorted.keySet().iterator();
Integer first = iterator.next();
Integer val = sorted.get(first);
List<Integer> modes = new ArrayList<>();
if (val > 1) {
modes.add(first);
while (iterator.hasNext()) {
Integer next = iterator.next();
Integer nextVal = sorted.get(next);
if (nextVal.equals(val)) {
modes.add(next);
}
else {
break;
}
}
}
return modes;
}
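public static double calculateMedian(int myArray[]) {
Arrays.sort(myArray);
int val = myArray.length / 2;
// Even length: average the two middle values. Odd length: take the middle one.
double median = (myArray.length % 2 == 0)
? (myArray[val] + myArray[val - 1]) / 2.0
: myArray[val];
return median;
}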
public static double calculateSD(double mean, int sum, int length, int[] myArray) {
double standardDeviation = 0.0;
for (double num : myArray) {
standardDeviation += Math.pow(num - mean, 2);
}
return Math.sqrt(standardDeviation / length);
}
}
In order to determine the mode(s), you need to keep track of the occurrences of integers entered into your array. I use a Map to do this. I also calculate the total while entering the integers. I use this total in methods that require it, for example calculateMean. Seems like extra work to recalculate the total each time you need it.
You are dealing with integers, so why declare myArray as array of double? Hence I changed it to array of int.
Your question was how to determine the mode(s). Consequently I refactored method calculateMode. In order to determine the mode(s), you need to interrogate the frequencies, hence the method parameter. Since you say that there can be zero, one or more than one mode, the method returns a List. First I sort the Map entries according to the value, i.e. the number of occurrences of a particular integer in myArray. I sort the entries in descending order. Then I collect all the sorted entries into a LinkedHashMap, since that is a map that stores its entries in the order in which they were added. Hence the first entry in the LinkedHashMap will be the integer with the most occurrences. If the number of occurrences of the first map entry is 1 (one), that means there are no modes (according to this definition that I found):
If no number in the list is repeated, then there is no mode for the list.
In the case of no modes, method calculateMode returns an empty List.
If the number of occurrences of the first entry is more than one, I add the integer to the List. Then I iterate through the remaining map entries and add the integer to the List if its occurrences equals that of the first map entry. As soon as the number of occurrences in an entry does not equal that of the first entry, I exit the while loop. Now List contains all the integers in myArray with the highest number of occurrences.
Here is a sample run (using example data from your question):
Enter the size of your array >> 10
Enter the integer values >> 1 2 2 3 3 4 5 6 7 8
Integers:
1
2
2
3
3
4
5
6
7
8
Mean: 4.1
Mode: [2, 3]
Median: 3.5
Standard Deviation: 2.211334
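The mode logic explained above can also be sketched without sorting the map: find the highest frequency, then collect every value that reaches it. This is an alternative illustration, not the code from the answer; the class name ModeSketch and the method modes are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ModeSketch {
    /** Returns all modes in ascending order, or an empty list if nothing repeats. */
    static List<Integer> modes(int[] values) {
        // TreeMap keeps the keys sorted, so the modes come out in ascending order.
        Map<Integer, Integer> frequencies = new TreeMap<>();
        for (int v : values) {
            frequencies.merge(v, 1, Integer::sum); // count occurrences
        }
        List<Integer> result = new ArrayList<>();
        if (frequencies.isEmpty()) {
            return result; // no data, no mode
        }
        int highest = Collections.max(frequencies.values());
        if (highest > 1) { // a count of 1 means no value is repeated
            for (Map.Entry<Integer, Integer> e : frequencies.entrySet()) {
                if (e.getValue() == highest) {
                    result.add(e.getKey());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(modes(new int[] {1, 2, 2, 3, 3, 4, 5, 6, 7, 8})); // prints [2, 3]
        System.out.println(modes(new int[] {1, 2, 3}));                      // prints []
    }
}
```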
I'm trying to think of some code that will allow me to search through my ArrayList and detect any values outside the common range of "good values."
Example:
100
105
102
13
104
22
101
How would I be able to write the code to detect that (in this case) 13 and 22 don't fall within the "good values" of around 100?
There are several criteria for detecting outliers. The simplest ones, like Chauvenet's criterion, use the mean and standard deviation calculated from the sample to determine a "normal" range for values. Any value outside of this range is deemed an outlier.
Other criteria include Grubbs' test and Dixon's Q test; they may give better results than Chauvenet's, for example if the sample comes from a skewed distribution.
package test;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class Main {
public static void main(String[] args) {
List<Double> data = new ArrayList<Double>();
data.add((double) 20);
data.add((double) 65);
data.add((double) 72);
data.add((double) 75);
data.add((double) 77);
data.add((double) 78);
data.add((double) 80);
data.add((double) 81);
data.add((double) 82);
data.add((double) 83);
Collections.sort(data);
System.out.println(getOutliers(data));
}
public static List<Double> getOutliers(List<Double> input) {
List<Double> output = new ArrayList<Double>();
List<Double> data1 = new ArrayList<Double>();
List<Double> data2 = new ArrayList<Double>();
if (input.size() % 2 == 0) {
data1 = input.subList(0, input.size() / 2);
data2 = input.subList(input.size() / 2, input.size());
} else {
data1 = input.subList(0, input.size() / 2);
data2 = input.subList(input.size() / 2 + 1, input.size());
}
double q1 = getMedian(data1);
double q3 = getMedian(data2);
double iqr = q3 - q1;
double lowerFence = q1 - 1.5 * iqr;
double upperFence = q3 + 1.5 * iqr;
for (int i = 0; i < input.size(); i++) {
if (input.get(i) < lowerFence || input.get(i) > upperFence)
output.add(input.get(i));
}
return output;
}
private static double getMedian(List<Double> data) {
if (data.size() % 2 == 0)
return (data.get(data.size() / 2) + data.get(data.size() / 2 - 1)) / 2;
else
return data.get(data.size() / 2);
}
}
Output:
[20.0]
Explanation:
Sort a list of integers, from low to high
Split a list of integers into 2 parts (by a middle) and put them into 2 new separate ArrayLists (call them "left" and "right")
Find a middle number (median) in both of those new ArrayLists
Q1 is a median from left side, and Q3 is the median from the right side
Applying mathematical formula:
IQR = Q3 - Q1
LowerFence = Q1 - 1.5*IQR
UpperFence = Q3 + 1.5*IQR
More info about this formula: http://www.mathwords.com/o/outlier.htm
Loop through all of my original elements, and if any of them are lower than the lower fence, or higher than the upper fence, add them to the "output" ArrayList
This new "output" ArrayList contains the outliers
An implementation of Grubbs' test can be found at MathUtil.java. It finds a single outlier, which you can remove from your list; repeat until you've removed all outliers.
Depends on commons-math, so if you're using Gradle:
dependencies {
compile 'org.apache.commons:commons-math:2.2'
}
Find the mean value of your list.
Create a Map that maps each number to its distance from the mean.
Sort the values by their distance from the mean.
Then separate out the last n numbers (the farthest ones), making sure the cut does not split values that are equally far from the mean.
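The steps above can be sketched as follows. This is one possible reading of them: the class name DistanceFromMeanSketch, the method farthestFromMean, and the idea that the caller picks n are all assumptions. The sketch sorts with the largest distance first instead of building an explicit Map, and it does not handle ties at the cut.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class DistanceFromMeanSketch {
    /** Returns the n values that lie farthest from the mean, farthest first. */
    static List<Double> farthestFromMean(List<Double> values, int n) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        final double mean = sum / values.size();

        // Copy the list so the caller's data is left untouched, then sort by
        // absolute distance from the mean, largest distance first.
        List<Double> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparingDouble((Double v) -> Math.abs(v - mean)).reversed());

        int count = Math.max(0, Math.min(n, sorted.size()));
        return new ArrayList<>(sorted.subList(0, count));
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(100.0, 105.0, 102.0, 13.0, 104.0, 22.0, 101.0);
        System.out.println(farthestFromMean(data, 2)); // prints [13.0, 22.0]
    }
}
```

Choosing n is the hard part; the sketch leaves that decision to the caller.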
Use this algorithm. It uses the average and the standard deviation; the factor of 2 in (2 * standardDeviation) is an optional, adjustable value.
public static List<int> StatisticalOutLierAnalysis(List<int> allNumbers)
{
if (allNumbers.Count == 0)
return null;
List<int> normalNumbers = new List<int>();
List<int> outLierNumbers = new List<int>();
double avg = allNumbers.Average();
double standardDeviation = Math.Sqrt(allNumbers.Average(v => Math.Pow(v - avg, 2)));
foreach (int number in allNumbers)
{
if ((Math.Abs(number - avg)) > (2 * standardDeviation))
outLierNumbers.Add(number);
else
normalNumbers.Add(number);
}
return normalNumbers;
}
As Joni already pointed out , you can eliminate outliers with the help of Standard Deviation and Mean. Here is my code, that you can use for your purposes.
public static void main(String[] args) {
List<Integer> values = new ArrayList<>();
values.add(100);
values.add(105);
values.add(102);
values.add(13);
values.add(104);
values.add(22);
values.add(101);
System.out.println("Before: " + values);
System.out.println("After: " + eliminateOutliers(values,1.5f));
}
protected static double getMean(List<Integer> values) {
int sum = 0;
for (int value : values) {
sum += value;
}
// Cast to double so integer division does not truncate the mean.
return ((double) sum / values.size());
}
public static double getVariance(List<Integer> values) {
double mean = getMean(values);
double temp = 0; // double, so the squared deviations are not truncated
for (int a : values) {
temp += (a - mean) * (a - mean);
}
return temp / (values.size() - 1);
}
public static double getStdDev(List<Integer> values) {
return Math.sqrt(getVariance(values));
}
public static List<Integer> eliminateOutliers(List<Integer> values, float scaleOfElimination) {
double mean = getMean(values);
double stdDev = getStdDev(values);
final List<Integer> newList = new ArrayList<>();
for (int value : values) {
boolean isLessThanLowerBound = value < mean - stdDev * scaleOfElimination;
boolean isGreaterThanUpperBound = value > mean + stdDev * scaleOfElimination;
boolean isOutOfBounds = isLessThanLowerBound || isGreaterThanUpperBound;
if (!isOutOfBounds) {
newList.add(value);
}
}
int countOfOutliers = values.size() - newList.size();
if (countOfOutliers == 0) {
return values;
}
return eliminateOutliers(newList,scaleOfElimination);
}
The eliminateOutliers() method does all the work.
It is a recursive method that builds a narrowed-down list with every recursive call.
The scaleOfElimination variable, which you pass to the method, defines at what scale you want to remove outliers. Normally I go with 1.5f-2f; the greater the value, the fewer outliers will be removed.
The output of the code:
Before: [100, 105, 102, 13, 104, 22, 101]
After: [100, 105, 102, 104, 101]
I'm very glad, and thanks to Valiyev; his solution helped me a lot. I want to share my little SRP take on his work.
Please note that I use List.of() to store Dixon's critical values, so Java 9 or higher is required.
public class DixonTest {
protected List<Double> criticalValues =
List.of(0.941, 0.765, 0.642, 0.56, 0.507, 0.468, 0.437);
private double scaleOfElimination;
private double mean;
private double stdDev;
private double getMean(final List<Double> input) {
double sum = input.stream()
.mapToDouble(value -> value)
.sum();
return (sum / input.size());
}
private double getVariance(List<Double> input) {
double mean = getMean(input);
double temp = input.stream()
.mapToDouble(a -> a)
.map(a -> (a - mean) * (a - mean))
.sum();
return temp / (input.size() - 1);
}
private double getStdDev(List<Double> input) {
return Math.sqrt(getVariance(input));
}
protected List<Double> eliminateOutliers(List<Double> input) {
int N = input.size() - 3;
scaleOfElimination = criticalValues.get(N).floatValue();
mean = getMean(input);
stdDev = getStdDev(input);
return input.stream()
.filter(this::isOutOfBounds)
.collect(Collectors.toList());
}
private boolean isOutOfBounds(Double value) {
return !(isLessThanLowerBound(value)
|| isGreaterThanUpperBound(value));
}
private boolean isGreaterThanUpperBound(Double value) {
return value > mean + stdDev * scaleOfElimination;
}
private boolean isLessThanLowerBound(Double value) {
return value < mean - stdDev * scaleOfElimination;
}
}
I hope it will help someone else.
Best regards
Thanks to @Emil_Wozniak for posting the complete code. I struggled with it for a while not realizing that eliminateOutliers() actually returns the outliers, not the list with them eliminated. The isOutOfBounds() method also was confusing because it actually returns TRUE when the value is IN bounds. Below is my update with some (IMHO) improvements:
The eliminateOutliers() method returns the input list with outliers removed
Added getOutliers() method to get just the list of outliers
Removed confusing isOutOfBounds() method in favor of a simple filtering expression
Expanded N list to support up to 30 input values
Protect against out of bounds errors when input list is too big or too small
Made stats methods (mean, stddev, variance) static utility methods
Calculate upper/lower bounds only once instead of on every comparison
Supply input list on ctor and store as an instance variable
Refactor to avoid using the same variable name as instance and local variables
Code:
/**
* Implements an outlier removal algorithm based on https://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/dixon.htm#:~:text=It%20can%20be%20used%20to,but%20one%20or%20two%20observations).
* Original Java code by Emil Wozniak at https://stackoverflow.com/questions/18805178/how-to-detect-outliers-in-an-arraylist
*
* Reorganized, made more robust, and clarified many of the methods.
*/
import java.util.List;
import java.util.stream.Collectors;
public class DixonTest {
protected List<Double> criticalValues =
List.of( // Taken from https://sebastianraschka.com/Articles/2014_dixon_test.html#2-calculate-q
// Alfa level of 0.1 (90% confidence)
0.941, // N=3
0.765, // N=4
0.642, // ...
0.56,
0.507,
0.468,
0.437,
0.412,
0.392,
0.376,
0.361,
0.349,
0.338,
0.329,
0.32,
0.313,
0.306,
0.3,
0.295,
0.29,
0.285,
0.281,
0.277,
0.273,
0.269,
0.266,
0.263,
0.26 // N=30
);
// Stats calculated on original input data (including outliers)
private double scaleOfElimination;
private double mean;
private double stdDev;
private double UB;
private double LB;
private List<Double> input;
/**
* Ctor taking a list of values to be analyzed.
* @param input the values to analyze
*/
public DixonTest(List<Double> input) {
this.input = input;
// Create statistics on the original input data
calcStats();
}
/**
* Utility method returns the mean of a list of values.
* @param valueList the values to average
* @return the arithmetic mean
*/
public static double getMean(final List<Double> valueList) {
double sum = valueList.stream()
.mapToDouble(value -> value)
.sum();
return (sum / valueList.size());
}
/**
* Utility method returns the variance of a list of values.
* @param valueList the values to analyze
* @return the sample variance (divides by n - 1)
*/
public static double getVariance(List<Double> valueList) {
double listMean = getMean(valueList);
double temp = valueList.stream()
.mapToDouble(a -> a)
.map(a -> (a - listMean) * (a - listMean))
.sum();
return temp / (valueList.size() - 1);
}
/**
* Utility method returns the std deviation of a list of values.
* @param valueList the values to analyze
* @return the sample standard deviation
*/
public static double getStdDev(List<Double> valueList) {
return Math.sqrt(getVariance(valueList));
}
/**
* Calculate statistics and bounds from the instance's input values and
* store them in instance variables.
*/
private void calcStats() {
int N = Math.min(Math.max(0, input.size() - 3), criticalValues.size()-1); // Changed to protect against too-small or too-large lists
scaleOfElimination = criticalValues.get(N).floatValue();
mean = getMean(input);
stdDev = getStdDev(input);
UB = mean + stdDev * scaleOfElimination;
LB = mean - stdDev * scaleOfElimination;
}
/**
* Returns the input values with outliers removed.
* @return the input values that lie within the bounds
*/
public List<Double> eliminateOutliers() {
return input.stream()
.filter(value -> value>=LB && value <=UB)
.collect(Collectors.toList());
}
/**
* Returns the outliers found in the input list.
* @return the input values that lie outside the bounds
*/
public List<Double> getOutliers() {
return input.stream()
.filter(value -> value<LB || value>UB)
.collect(Collectors.toList());
}
/**
* Test and sample usage
* @param args unused
*/
public static void main(String[] args) {
List<Double> testValues = List.of(1200.0,1205.0,1220.0,1194.0,1212.0);
DixonTest outlierDetector = new DixonTest(testValues);
List<Double> goodValues = outlierDetector.eliminateOutliers();
List<Double> badValues = outlierDetector.getOutliers();
System.out.println(goodValues.size()+ " good values:");
for (double v: goodValues) {
System.out.println(v);
}
System.out.println(badValues.size()+" outliers detected:");
for (double v: badValues) {
System.out.println(v);
}
// Get stats on remaining (good) values
System.out.println("\nMean of good values is "+DixonTest.getMean(goodValues));
}
}
Here is a very simple implementation that collects the numbers that are not in the range:
List<Integer> notInRangeNumbers = new ArrayList<Integer>();
for (Integer number : numbers) {
// call with a predefined factor value, here example value = 5
if (!isInRange(number, 5)) {
notInRangeNumbers.add(number);
}
}
Additionally, inside the isInRange method you have to define what you mean by 'good values'. Below you will find an example implementation.
private boolean isInRange(Integer number, int aroundFactor) {
//TODO the implementation of the 'in range condition'
// here the example implementation
return number <= 100 + aroundFactor && number >= 100 - aroundFactor;
}
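Put together, the two fragments above could form a small runnable example like this one. It is a sketch only: the class name RangeCheckSketch and the extra center parameter (replacing the hard-coded 100) are assumptions added for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RangeCheckSketch {
    /** True if number lies within center plus or minus aroundFactor (inclusive). */
    static boolean isInRange(int number, int center, int aroundFactor) {
        return number <= center + aroundFactor && number >= center - aroundFactor;
    }

    /** Collects every number that falls outside the accepted range. */
    static List<Integer> notInRange(List<Integer> numbers, int center, int aroundFactor) {
        List<Integer> result = new ArrayList<>();
        for (int number : numbers) {
            if (!isInRange(number, center, aroundFactor)) {
                result.add(number);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(100, 105, 102, 13, 104, 22, 101);
        System.out.println(notInRange(numbers, 100, 5)); // prints [13, 22]
    }
}
```

This only works when the expected center is known in advance; the statistical approaches in the other answers estimate it from the data instead.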
I thought I had this figured out, and I was pretty confident it was going to work. Unfortunately, it did not.
The code below is supposed to return the average and the number of values entered, and then calculate the standard deviation. I get it to return the average and count, but the standard deviation result is off. Using the values 5, 6, 8 and 9, I am supposed to get a standard deviation of 1.83, but I am getting something way off: 7.17.
I know my error is in the way I am calculating the standard deviation, but I was pretty sure I was doing it right.
Here is my code:
/**
This class is used to calculate the average and standard deviation
of a data set.
*/
public class DataSet{
private double sum;
private double sumSquare;
private int counter;
/**Constructs a DataSet object to hold the
* total number of inputs, sum and square
*/
public DataSet(){
sum = 0;
sumSquare = 0;
counter = 0;
}
/**Adds a value to this data set
* @param x the input value
*/
public void add(double x){
sum = sum + x;
sumSquare = sumSquare + x * x;
counter++;
}
/**Calculate average of dataset
* @return average, the average of the set
*/
public double getAverage(){
double avg = sum / counter;
return avg;
}
/**Get the total inputs values
* @return n, the total number of inputs
*/
public int getCount(){
return counter;
}
public double getStandardDeviation(){
double sqr = sumSquare / counter;
double stdDev = Math.sqrt(sqr);
return stdDev;
}
}
Here is my runner program:
import java.util.Scanner;
class DataSetRunner
{
public static void main(String[] args)
{
Scanner input = new Scanner(System.in);
DataSet data = new DataSet();
boolean done = false;
while (!done)
{
System.out.println("Enter value, Q to quit: ");
String userInput = input.next();
if (userInput.equalsIgnoreCase("Q"))
done = true;
else
{
double x = Double.parseDouble(userInput);
data.add(x);
}
}
System.out.println("Average = " + data.getAverage());
System.out.println("Count = " + data.getCount());
System.out.println("The Standard Deviation is = " + data.getStandardDeviation());
}
}
Your calculation is incorrect.
Standard deviation is based on the sum of the squares of the difference to the mean.
You are simply summing the squares of the data values.
You must first calculate the mean (ie the average), then once you know that you can calculate the standard deviation using this value.
The correct procedure is (quoting from Wikipedia):
To calculate the population standard deviation, first compute the difference of each data point from the mean, and square the result of each. Next, compute the average of these values, and take the square root.
Basically, the sum of the squared values alone is not enough: you also need the mean, either from a second pass over the data or by combining the running sum and the running sum of squares once all values are in.
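As a rough sketch of that procedure: the expected result of 1.83 for 5, 6, 8 and 9 matches the sample standard deviation (dividing by n - 1), whereas the population formula quoted above (dividing by n) would give about 1.58. The class and method names below are made up for illustration. The second method reuses the sum and sumSquare values that the DataSet class already tracks, using the identity sum of (x - mean)^2 = sumSquare - sum^2 / n.

```java
class StdDevSketch {
    // Two-pass version: compute the mean first, then the squared differences from it.
    static double sampleStdDev(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / values.length;

        double squaredDiffs = 0;
        for (double v : values) {
            squaredDiffs += (v - mean) * (v - mean);
        }
        // Divide by n - 1 for the sample standard deviation (use n for the population one).
        return Math.sqrt(squaredDiffs / (values.length - 1));
    }

    // Running-sums version: needs only the totals kept by add(), no stored values.
    static double sampleStdDevFromSums(double sum, double sumSquare, int n) {
        return Math.sqrt((sumSquare - sum * sum / n) / (n - 1));
    }

    public static void main(String[] args) {
        double[] values = {5, 6, 8, 9};
        double twoPass = sampleStdDev(values);
        double fromSums = sampleStdDevFromSums(28, 206, 4); // sum and sum of squares of 5, 6, 8, 9
        System.out.println(Math.round(twoPass * 100) / 100.0);  // prints 1.83
        System.out.println(Math.round(fromSums * 100) / 100.0); // prints 1.83
    }
}
```

The running-sums form is the smaller change to the original class, but it can lose precision when the values are large compared with their spread; the two-pass form avoids that at the cost of keeping the values around.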
I'm creating a program which takes a user's input and outputs the min, max, average, sum, and a count of how many values were entered. I'm really struggling to figure out how to create a default constructor for 100 items and a second constructor that takes the array size, which the user is supposed to define.
Create a new DataSet object. The client creating the object specifies the maximum number
of items that can be added to the set. (Write a constructor with one int parameter.)
Also write a default constructor which creates a DataSet capable of handling 100 items.
Add an integer data item to a DataSet. If the maximum number of items have already been added to the set, the item is simply ignored.
Here is my code
import javax.swing.*;
import java.util.*;
public class DataSet {
private int count; // Number of numbers that have been entered.
private double sum; // The sum of all the items that have been entered.
private double min;
private double max;
//Adds numbers to dataset.
public void addDatum(double num) {
count++;
sum += num;
if (count == 1){
min = num;
max = num;
} else if (num < min){
min = num;
} else if (num > max){
max = num;
}
}
public boolean isEmpty()
{
if(count == 0)
{
return true;
}
else
{
return false;
}
}
//Return number of items entered into the dataset.
public int getCount() {
return count;
}
//Return the sum of all the numbers that have been entered.
public double getSum() {
return sum;
}
//Return the average of all the numbers that have been entered.
public double getAvg() {
return sum / count;
}
//return Maximum value of data entered.
public double getMax(){
return max;
}
//return Minimum value of data entered.
public double getMin(){
return min;
}
public static void main (String[] args){
Scanner scanner = new Scanner(System.in);
DataSet calc = new DataSet();
double nextnumber = 0;
while (true){
System.out.print("Enter the next number(0 to exit): ");
nextnumber = scanner.nextDouble();
if (nextnumber == 0)
break;
calc.addDatum(nextnumber);
}
System.out.println("Min = "+calc.getMin());
System.out.println("Max = "+calc.getMax());
System.out.println("Mean = "+calc.getAvg());
System.out.println("Count = "+calc.getCount());
System.out.println("Sum = "+calc.getSum());
}
} //end class DataSet
The syntax for declaring an array is type[] name; (there are variants, but this is the most common)
So an int array is declared like this:
int[] someIntegers;
Creating a new array can be done several ways. The normal way is to create an empty array with all elements initialised to their default value (zero or false for primitive datatypes, and null for object arrays). The syntax is:
someIntegers = new int[4]; // ie. [0, 0, 0, 0]
// or
int n = ...; // initialise n somehow
someIntegers = new int[n];
// this way we can get different length arrays at runtime
You have to add a field to hold the maximum number of items. Your class already uses max for the largest value, so give it a different name:
private int maxItems;
Then you need the two constructors:
DataSet() {
maxItems = 100;
}
DataSet(int maxItems) {
this.maxItems = maxItems;
}
Then, when you get the input, check whether the limit has been reached before adding anything. Keep the exit test outside that check, otherwise entering 0 would no longer end the loop once the set is full:
System.out.print("Enter the next number(0 to exit): ");
nextnumber = scanner.nextDouble();
if (nextnumber == 0) {
break;
}
if (calc.getCount() < calc.maxItems) {
calc.addDatum(nextnumber);
}
Your code above does not contain any constructors, so only the default DataSet() constructor is available. In your DataSet class, you need to define both constructors to meet your requirements. In addition you will need to create a collection type (ie an array of ints) for storing the numbers added to the dataset (this seems to be part of your requirements). With the code below, when you create an instance of the DataSet class in your main method, you can create it with the default 100 elements by saying
DataSet myDataSet = new DataSet();
or you can create it with a user specified number of elements like
DataSet myDataSet = new DataSet(30); //for thirty elements in the array
public class DataSet {
int[] myArray;
public DataSet() //Zero parameters constructor
{
//initialize your array to 100 elements here
myArray = new int[100]; //the array can hold 100 elements
}
public DataSet(int max) //One parameter constructor
{
//initialize your array to 'max' elements here
myArray = new int[max]; //the array can hold max number of elements
}
public void AddNum(int num)
{
//logic to add number to the array here :P
}
}
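One way the skeleton above might be filled in is sketched below. The class name BoundedDataSet and the accessor names are assumptions chosen to avoid clashing with the DataSet class from the question, and getMin, getMax and getAverage assume at least one item has been added.

```java
class BoundedDataSet {
    private final int[] items; // storage sized once, in the constructor
    private int count;         // how many slots are actually filled

    // Default constructor: capacity of 100 items, as the assignment asks.
    BoundedDataSet() {
        this(100);
    }

    // The client specifies the maximum number of items.
    BoundedDataSet(int maxItems) {
        items = new int[maxItems];
    }

    // Adds an item; once the set is full, further items are simply ignored.
    void add(int value) {
        if (count == items.length) {
            return;
        }
        items[count++] = value;
    }

    int getCount() { return count; }

    int getSum() {
        int sum = 0;
        for (int i = 0; i < count; i++) sum += items[i];
        return sum;
    }

    int getMin() {
        int min = items[0];
        for (int i = 1; i < count; i++) if (items[i] < min) min = items[i];
        return min;
    }

    int getMax() {
        int max = items[0];
        for (int i = 1; i < count; i++) if (items[i] > max) max = items[i];
        return max;
    }

    double getAverage() { return (double) getSum() / count; }

    public static void main(String[] args) {
        BoundedDataSet set = new BoundedDataSet(3);
        set.add(4);
        set.add(9);
        set.add(2);
        set.add(7); // ignored: the set already holds 3 items
        System.out.println(set.getCount() + " items, min " + set.getMin()
                + ", max " + set.getMax() + ", sum " + set.getSum()); // prints 3 items, min 2, max 9, sum 15
    }
}
```

Putting the capacity check inside add keeps the "ignore when full" rule in one place, so callers do not have to repeat it around every input loop.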
How do I create a loop to generate the min, max and avg for two interdependent arrays? So far I have only generated the min, max and avg (with sum) for a single array.
These are the 2 arrays User[] & Withdrawals[]:
User, Withdrawals
1 , 90.00
2 , 85.00
4 , 75.00
5 , 65.00
2 , 40.00
1 , 80.00
3 , 50.00
5 , 85.00
4 , 80.00
1 , 70.00
size = 10
This is what I have tried, as I have no clue how to handle two interdependent arrays:
double min = 0.0;
double max = 0.0;
double sum = 0.0;
double avg = 0.0;
for(int i = 0; i <size; i++){
.
.
for(int j = 0; j < Withdrawals.length; j++){
if(Withdrawals[User[i]] > max){
max = Withdrawals[j];
}
if(Withdrawals[User[i]] < min){
min = Withdrawals[j];
}
}
sum += Withdrawals[j];
avg = sum/size;
}
How do I print the min, max and avg of the withdrawals per user?
I have already counted the number of withdrawals per user.
Conditions are: create everything from scratch instead of using available library features of Java.
Divide and conquer :)
Yes, I know that is a term used for an algorithm technique, in this case what I mean is... work with small parts.
First, get the min, max and avg for a simple array:
double[] values = {2,3,4,5,6,7};
double min = values[0];
double max = values[0];
double sum = 0;
for (double value : values) {
min = Math.min(value, min);
max = Math.max(value, max);
sum += value;
}
double avg = sum / values.length;
System.out.println("Min: " + min);
System.out.println("Max: " + max);
System.out.println("Avg: " + avg);
Note: Since you can't use Java libraries for your assignment, it is easy to write your own versions of the min/max functions (read the Math JavaDoc).
Now you can encapsulate this code in a function, you can start by returning another array:
static double[] minMaxAvg(double[] values) {
double min = values[0];
double max = values[0];
double sum = 0;
for (double value : values) {
min = Math.min(value, min);
max = Math.max(value, max);
sum += value;
}
double avg = sum / values.length;
return new double[] {min, max, avg};
}
public static void main(String[] args) {
double[] values = {2,3,4,5,6,7};
double[] info = minMaxAvg(values);
System.out.println("Min: " + info[0]);
System.out.println("Max: " + info[1]);
System.out.println("Avg: " + info[2]);
}
Using an array is a little bit ugly to read, so it is better to create a class to hold the min, max and avg. So let's refactor the code a little bit:
class ValueSummary {
final double min;
final double max;
final double avg;
static ValueSummary createFor(double[] values) {
double min = values[0];
double max = values[0];
double sum = 0;
for (double value : values) {
min = Math.min(value, min);
max = Math.max(value, max);
sum += value;
}
double avg = sum / values.length;
return new ValueSummary(min, max, avg);
}
ValueSummary(double min, double max, double avg) {
this.min = min;
this.max = max;
this.avg = avg;
}
public String toString() {
return "Min: " + min + "\nMax: " + max +"\nAvg: " + avg;
}
}
public static void main(String[] args) {
double[] values = {2,3,4,5,6,7};
ValueSummary info = ValueSummary.createFor(values);
System.out.println(info);
}
You don't specify it in your question, but I assume that you have an array for each user (maybe each user's withdrawals are another array).
Now that you have the bottom parts, we can switch to a top-down thinking.
So your code could be something like this:
for (User aUser : users) {
System.out.println("User: " + aUser);
System.out.println(ValueSummary.createFor(withdrawalsOf(aUser)));
}
OK, but this is just the idea; you still have the problem of relating aUser to its withdrawals. You have several options here:
1. Make a "table" User -> Withdrawals, which is what you are trying to do with the two arrays. The User index in the array acts like a "user id". When you learn about Map you will see that you can use a better representation for the index.
2. Having a Map or array is just an optimization of the relationship User -> Withdrawals, but you can also represent that relationship with an object (i.e. UserWithdrawls).
Option 1:
static class User {
final String name;
public User(String s) { name = s; }
}
public static void main(String[] args) {
User[] users = { new User("John"), new User("Doe")};
double[][] withdrawals = {
new double[] { 1, 2, 3}, new double[] { 10,22, 30}
};
for (int i = 0; i < users.length; i++) {
System.out.println("User: " + users[i].name);
System.out.println(ValueSummary.createFor(withdrawals[i]));
}
}
Option 2:
static class User {
final String name;
public User(String s) { name = s; }
}
static class UserWithdrawls {
final User user;
final double[] withdrawals;
final ValueSummary summary;
UserWithdrawls(User user, double[] withdrawals) {
this.user = user;
this.withdrawals = withdrawals;
this.summary = ValueSummary.createFor(withdrawals);
}
}
public static void main(String[] args) {
UserWithdrawls[] userWithdrawls = {
new UserWithdrawls(new User("John"), new double[] { 1, 2, 3}),
new UserWithdrawls(new User("Doe"), new double[] { 10, 22, 30})
};
for (UserWithdrawls uw : userWithdrawls) {
System.out.println("User: " + uw.user.name);
System.out.println(uw.summary);
}
}
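For the flat User[] and Withdrawals[] arrays from the question, where the same index in both arrays describes one withdrawal, a single pass can also do the job. The sketch below is only one possible approach; it assumes user ids are small positive integers so they can be used directly as array indexes, and the names PerUserSummary, summarize and maxUserId are invented for the example. The comparisons are written by hand, so nothing from the Math class is needed.

```java
class PerUserSummary {
    // Returns one row per user id: {min, max, avg}, or null when that id has no withdrawals.
    // maxUserId is an assumption: ids are taken to lie in 1..maxUserId.
    static double[][] summarize(int[] user, double[] withdrawals, int maxUserId) {
        double[] min = new double[maxUserId + 1];
        double[] max = new double[maxUserId + 1];
        double[] sum = new double[maxUserId + 1];
        int[] count = new int[maxUserId + 1];

        for (int i = 0; i < user.length; i++) {
            int id = user[i];               // the i-th withdrawal belongs to this user
            double amount = withdrawals[i];
            if (count[id] == 0 || amount < min[id]) min[id] = amount;
            if (count[id] == 0 || amount > max[id]) max[id] = amount;
            sum[id] += amount;
            count[id]++;
        }

        double[][] result = new double[maxUserId + 1][];
        for (int id = 1; id <= maxUserId; id++) {
            if (count[id] > 0) {
                result[id] = new double[] {min[id], max[id], sum[id] / count[id]};
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] user = {1, 2, 4, 5, 2, 1, 3, 5, 4, 1};
        double[] withdrawals = {90, 85, 75, 65, 40, 80, 50, 85, 80, 70};
        double[][] summary = summarize(user, withdrawals, 5);
        for (int id = 1; id < summary.length; id++) {
            if (summary[id] != null) {
                System.out.println("User " + id + ": min " + summary[id][0]
                        + ", max " + summary[id][1] + ", avg " + summary[id][2]);
            }
        }
    }
}
```

With the data from the question, user 1 has the withdrawals 90, 80 and 70, so its row holds a min of 70, a max of 90 and an average of 80.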
Additional notes: If you are studying Computer Science, you'll learn in the future that the loop to calculate max, min and avg has a complexity of O(n). If the values array is fully loaded in memory, doing the max/min/avg in three different functions (thus reading the array 3 times) is still an O(n) algorithm, just with a bigger constant. With the power of today's computers the constant is so small that most of the time you'll not get any gain from calculating min/max/avg in the same loop. In contrast, you can gain code readability; for example, in Groovy the minMaxAvg code could be written like this:
def values = [2,3,4,5,6,7];
println values.min()
println values.max()
println values.sum() / values.size()
Quick n Dirty: Use a second for loop for the second array, but do not reinitialize the min, max etc again.
Cleaner would be to make a class to hold the min, max etc., and a method that is passed this result object and an array. The method then scans the array and updates the result object's min, max etc. Call the method for each array.
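A minimal sketch of that cleaner variant might look like this. The name RunningStats and its methods are assumptions, and an instance method addAll stands in for "a method that is passed the result object and an array".

```java
class RunningStats {
    private double min;
    private double max;
    private double sum;
    private int count;

    // Scans one array and folds its values into the totals gathered so far.
    void addAll(double[] values) {
        for (double v : values) {
            if (count == 0 || v < min) min = v;
            if (count == 0 || v > max) max = v;
            sum += v;
            count++;
        }
    }

    double getMin() { return min; }
    double getMax() { return max; }
    int getCount() { return count; }

    // Average over everything seen so far; assumes at least one value was added.
    double getAverage() { return sum / count; }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        stats.addAll(new double[] {3, 1, 4});
        stats.addAll(new double[] {1, 5, 9, 2}); // second array, same result object
        System.out.println("Min: " + stats.getMin());     // prints Min: 1.0
        System.out.println("Max: " + stats.getMax());     // prints Max: 9.0
        System.out.println("Count: " + stats.getCount()); // prints Count: 7
    }
}
```

Because the object keeps its state between calls, the min, max and average cover all arrays passed in so far, which is exactly what the quick-and-dirty version achieves by not reinitialising its variables.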
Why don't you try looking at the code of DescriptiveStatistics in the Commons Math library? Or better, use it instead of reinventing the wheel?
DescriptiveStatistics de = new DescriptiveStatistics();
de.addValue(..) // Your values
// Add more values
Double max = de.getMax();
Double min = de.getMin();
Double avg = de.getSum() / de.getN(); // or de.getMean();
And use an instance of DescriptiveStatistics for every array.
I think it would be better if you stored the details for each user in a separate data structure, like the following class named UserWithdrawals.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedList;
public class Program1{
public static class UserWithdrawals{
private LinkedList<Double> withdrawals=new LinkedList<>();
public void add(Double amt){
this.withdrawals.add(amt);
}
public Double getMinimum(){
Double min=this.withdrawals.get(0);
for(Double amt:this.withdrawals)
if(amt.compareTo(min)<0) min=amt;
return min;
}
public Double getMaximum(){
Double max=this.withdrawals.get(0);
for(Double amt:this.withdrawals)
if(amt.compareTo(max)>0) max=amt;
return max;
}
public Double getAverage(){
double sum=0; // primitive accumulator: avoids the deprecated Double constructor and repeated boxing
for(Double amt:this.withdrawals)
sum+=amt;
return sum/this.withdrawals.size();
//this method will fail if the withdrawals list is updated during the iteration
}
/*You can also combine the three into a single method and return an array of Double object coz the iteration is same.*/
}
/*now you iterate over your two array lists (This wont work if the two array lists - 'Users' and 'Withdrawals' are of different size) and store the withdrawal data associated with a user in the corresponding map value - Maps or Associative arrays are a very basic data structure so your professor should not have any problems with this*/
private HashMap<Integer,UserWithdrawals> withdrawals_map=new HashMap<>();
public Program1(ArrayList<Integer> Users, ArrayList<Double> Withdrawals){
for(int i=0;i<Users.size();i++){
Integer user_no=Users.get(i);
Double withdrawal_amt=Withdrawals.get(i);
if(this.withdrawals_map.get(user_no)==null){
this.withdrawals_map.put(user_no,new UserWithdrawals());
}
this.withdrawals_map.get(user_no).add(withdrawal_amt);
}
}
public UserWithdrawals getUserWithdrawalsData(Integer user_no){
return this.withdrawals_map.get(user_no);
}
}
Sort the 2D array in O(n log(n)) based on the 1st column, using the C++ STL sort function.
Traverse in O(n) to calculate the average and update MaxAverage.
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;
// Comparator used to sort the 2D vector
// on the basis of its first column
bool sortcol( const vector<int>& v1, const vector<int>& v2 ) {
return v1[0] < v2[0];
}
void sortMatrix()
{
// Initializing 2D vector "vect" with
// values S_ID,MARKS
vector< vector<int> > vect{{1,85}, {2,90}, {1,87}, {1,99}, {3,70}};
// Number of rows
int m = vect.size();
// Number of columns
int n = vect[0].size();
// Use of "sort()" for sorting on basis
// of 1st column
sort(vect.begin(), vect.end(),sortcol);
float maxAverage=-1;
int id=1; // assuming it starts from 1.
float sum=0;
int s=0; // size of marks per student to calculate average
for( int i=0; i<m; i++ )
{
sum+=vect[i][1];
s=s+1;
if( i+1 == m || vect[i+1][0] != vect[i][0] ){// gotten all the marks of this student
if( sum/s > maxAverage ){// remember the student only when the average improves
maxAverage = sum/s;
id = vect[i][0];
}
s=0;
sum=0;
}
}
cout<<"ID: "<<id<<"\tValue: "<<maxAverage<<endl;
}
Output:
ID: 1 Value: 90.3333