Finding min and max point in a series - java

I got a series like this:
20 22 25 27 30 31 30 25 22 19 21 25 28 30 28 27...
As soon as the numbers reach near 30, they start moving negatively, and as soon as they reach near 20, they start moving positively.
I need to find these 2 points using some sort of algo. I'm totally lost.
I can't just do a sort because then I get 31 as max and 19 as min.
In real implementation, the numbers can change, and can be Float as well, instead of just int. It can be something like this:
55.20 57.35 54.30 59.25 61.00 58.20 55.40 53.50 58.75 60.10 55.15 53.40 50.00 51.10 52.00
In this case 53 and 60 are the points, and additionally, a third lower point 50.00.
How would I go ahead on this?

import java.util.List;
import java.util.ArrayList;

public class GetExtrema {
    public static <T extends Comparable<T>> List<T> getExtrema(T[] series) {
        List<T> extrema = new ArrayList<T>();
        extrema.add(series[0]);
        // true while the series is rising, false while it is falling
        boolean upElseDown = series[1].compareTo(series[0]) > 0;
        for (int i = 2; i < series.length; ++i) {
            // a change of direction means the previous element was a turning point
            if (series[i].compareTo(series[i-1]) > 0 != upElseDown) {
                extrema.add(series[i-1]);
                upElseDown = !upElseDown;
            } // end if
        } // end for
        extrema.add(series[series.length-1]);
        return extrema;
    } // end getExtrema()

    public static void main(String[] args) {
        Integer[] s1 = {20,22,25,27,30,31,30,25,22,19,21,25,28,30,28,27};
        List<Integer> extrema = getExtrema(s1);
        System.out.println(extrema);
        Double[] s2 = {55.2,57.3,54.3,59.2,61.,58.2,55.4,53.5,58.7,60.1,55.1,53.4,50.,51.1,52.};
        List<Double> extrema2 = getExtrema(s2);
        System.out.println(extrema2);
        System.exit(0);
    } // end main()
} // end class GetExtrema
Compilation and execution:
javac GetExtrema.java;
CLASSPATH=. java GetExtrema;
## [20, 31, 19, 30, 27]
## [55.2, 57.3, 54.3, 61.0, 53.5, 60.1, 50.0, 52.0]

If this is not a homework assignment and you can use outside libraries, consider something like Apache Commons StatUtils min function. There is a corresponding max function also. This particular library expects doubles, there should be something similar for floats.
If this is a homework assignment and the task at hand is to learn how to develop an algorithm, avoid sorting. It is not necessary. A simple loop and two tracking variables for min and max will do. For each iteration over the series, check the value:
if the current value is less than min, set min to the current value
if the current value is greater than max, set max to the current value
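The loop described above could be sketched as follows. The class and method names (MinMax, minMax) are made up for illustration, and the sketch assumes a non-empty array of doubles. Note that it yields the overall extremes of the series (19 and 31 for the first series in the question), not the turning points.

```java
final class MinMax {
    /** Returns {min, max} of a non-empty series in a single pass, without sorting. */
    static double[] minMax(double[] series) {
        double min = series[0];
        double max = series[0];
        for (double value : series) {
            if (value < min) {
                min = value; // new smallest value seen so far
            }
            if (value > max) {
                max = value; // new largest value seen so far
            }
        }
        return new double[] {min, max};
    }
}
```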

Related

How to cheaply deal with multiple ranges (finding a maximum)

I have a number of ranges, each with a weight. Every point on the total range is scored by the sum of the weights of all the ranges the point falls into. I'd like to be able to cheaply find the total value of a point, and would like to be able to find a maximum. Ideally, it would also be able to find the maximum for a set of (equidistantly) spaced points.
Unfortunately, I'm heavily limited by performance, and am struggling to find a good algorithm for this.
The only two decent solutions I could find are:
- Bruteforce it by sampling a bunch of points. For each: check every range whether it fits, find the total value, then check if it's better than the best so far. Decent point samples can be found by taking the boundaries of the ranges.
- Create a set of buckets. Iterate through all the ranges, adding a value to all the buckets that fit within the range. Then iterate through all the buckets to find the best one
Neither is fast enough for my liking (both have been tested), and the latter isn't continuous, so it has accuracy problems.
I'd be okay with getting a slightly inaccurate response as long as the performance is way better.
What adds a bit of extra complexity to my particular case is that I'm actually dealing with angles, so the environment is modular. The ranges can't be ordered, and I need to ensure that a range going from 340 degrees to 20 degrees contains both a point at 350 and at 10 degrees.
The angle ranges I'm dealing with can't exceed 180 degrees and only very rarely are above 90.
The number of ranges generally isn't very high (1-30), but I need to do this calculation a lot.
The language is Java if it matters.
Make a list (array) of angle intervals. If an interval's finish value is less than its start value (20 < 340), add 360 to the finish: (340, 380).
Make a list of pairs (angle, +weight for a start point or -weight for a finish point).
Concatenate the list with a copy of itself (with 360 added to each angle in the copy) to handle circular intersections. (It is possible to copy only part of the list.)
Sort the pairs by angle (use +/- as a secondary key in case of a tie: - before +).
Set CurrWeight = 0.
Walk through the list, adding the +/-weight field to CurrWeight. Check for the max value.
(This approach works for linear lists. I tried to modify it for circular ones, so I may have missed some caveats.)
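A minimal Java sketch of the steps above, under a few assumptions that are not in the original: each range is given as {start, finish, weight} in degrees with start and finish in [0, 360), weights are positive, and the names CircularSweep and maxWeight are hypothetical. It returns only the maximum total weight, not the angle where it occurs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class CircularSweep {
    /** ranges[k] = {startDegrees, finishDegrees, weight}; returns the largest summed weight. */
    static double maxWeight(double[][] ranges) {
        List<double[]> events = new ArrayList<>(); // each event is {angle, weightChange}
        for (double[] r : ranges) {
            double start = r[0];
            double finish = r[1];
            double weight = r[2];
            if (finish < start) {
                finish += 360; // unwrap a range that crosses 0 degrees
            }
            // original events plus a copy shifted by one full turn
            for (int turn = 0; turn < 2; turn++) {
                events.add(new double[] {start + 360 * turn, +weight});
                events.add(new double[] {finish + 360 * turn, -weight});
            }
        }
        // sort by angle; on a tie the negative change (a finish) comes before a start
        events.sort(Comparator.<double[]>comparingDouble(e -> e[0])
                .thenComparingDouble(e -> e[1]));
        double current = 0;
        double best = 0;
        for (double[] e : events) {
            current += e[1];
            best = Math.max(best, current);
        }
        return best;
    }
}
```

Sorting dominates the cost, so for n ranges this is O(n log n) per query, which should be cheap for the 1-30 ranges mentioned in the question.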
Here, instead of the term 'edges', I should have used the term 'boundaries', because it refers to interval boundaries.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.SortedSet;
import java.util.TreeSet;
public class Main {
ArrayList<Interval> intervals;
public static void main(String args[]) {
Main main = new Main();
main.intervals = new ArrayList<Interval>();
Interval i1 = new Interval(10, 30, 1);
Interval i2= new Interval(20, 40, 1);
Interval i3= new Interval(50, 60, 1);
Interval i4= new Interval(0, 70, 1);
main.intervals.add(i1);
main.intervals.add(i2);
main.intervals.add(i3);
main.intervals.add(i4);
Interval winningInterval = main.processIntervals(main.intervals);
System.out.println("winning interval="+winningInterval);
}
public Interval processIntervals(ArrayList<Interval> intervals)
{
SortedSet<Integer> intervalEdges = new TreeSet<Integer>();
for(int i = 0;i<intervals.size();i++)
{
Interval currentInterval = intervals.get(i);
intervalEdges.add(currentInterval.a);
intervalEdges.add(currentInterval.b);
}
System.out.println(intervalEdges);
//edges stores the same data as intervalEdges, but for convenience, it is a list
ArrayList<Integer> edges = new ArrayList<Integer>(intervalEdges);
ArrayList<Interval> intersectionIntervals = new ArrayList<Interval>();
for(int i=0; i<edges.size()-1;i++)
{
Interval newInterval = new Interval(edges.get(i), edges.get(i+1), 0);
int score = 0; //the sum of the values of the overlapping intervals
for(int j=0; j<intervals.size();j++)
{
if(newInterval.isIncludedInInterval(intervals.get(j)))
score = score+ intervals.get(j).val;
}
newInterval.val = score;
intersectionIntervals.add(newInterval);
}
System.out.println(intersectionIntervals);
int maxValue=0; //the maximum value of an interval
Interval x = new Interval(-1,-1,0);//that interval with the maximum value
for(int i=0; i<intersectionIntervals.size();i++)
{
if(intersectionIntervals.get(i).val > maxValue)
{
maxValue=intersectionIntervals.get(i).val;
x=intersectionIntervals.get(i);
}
}
return x;
}
}
class Interval
{
public int a, b, val;
public Interval(int a, int b, int val) {
super();
this.a = a;
this.b = b;
this.val = val;
}
@Override
public String toString() {
return "Interval [a=" + a + ", b=" + b + ", val=" + val + "]";
}
boolean isIncludedInInterval(Interval y)
{
//returns true if current interval is included in interval y
return this.a>=y.a && this.b<= y.b;
}
}
gives the output
[0, 10, 20, 30, 40, 50, 60, 70]
[Interval [a=0, b=10, val=1], Interval [a=10, b=20, val=2], Interval [a=20, b=30, val=3], Interval [a=30, b=40, val=2], Interval [a=40, b=50, val=1], Interval [a=50, b=60, val=2], Interval [a=60, b=70, val=1]]
winning interval=Interval [a=20, b=30, val=3]
This solves the case when the intervals are straight line intervals, and not angular intervals. I will come back with modifications to take into account the fact that x=x+360.

Getting the inverse of a function that uses summation in Java

I have a program with one class, which looks like this.
public class Functions {
public static void main(String[] args) {
System.out.println(summationFunction(1)); //Prints 13
System.out.println(summationFunction(2)); //Prints 29
System.out.println(summationFunction(3)); //Prints 48
System.out.println(summationFunction(4)); //Prints 70
}
public static int summationFunction(int input) {
int summedNumber = 0;
int i = input;
while (i > 0) {
summedNumber += i * 3;
i--;
}
return 10 * input + (summedNumber);
}
}
So, this program will take in a given number x and apply this function to it: f(x) = 10*x + sum(3*i, i=1..x)
And this all works well (I have run the class Functions and everything prints just as it's supposed to.) BUT, I need to find the inverse of this function, and I need to be able to translate it to code; I do not know how to do this.
I basically need a function that will return values like this:
public static void main(String[] args) {
System.out.println(summationFunction(13)); //Prints 1
System.out.println(summationFunction(29)); //Prints 2
System.out.println(summationFunction(48)); //Prints 3
System.out.println(summationFunction(70)); //Prints 4
}
which, (as you can tell) is the opposite of the original function.
So to sum everything up, I need a function that will return the inverse of my original function (summationFunction), and I would like to know how I would model this or if there is a quick solution, in code.
One more thing: I know that I can have the method take an input and search for the most similar output of the original method, but I would like to see if there is a simpler way to do this which does not involve searching, thus giving a quicker output speed. And if you wish you can safely assume that the input of the inversed function will always be a number which will give an integer output, like 13, 29, 48, 70, etc...
By the way, if you are going to downvote the question, will you at least give a reason somewhere? The comments perhaps? I can not see any reason that this question is eligible for being downvoted, and a reason would help.
Wolfram Alpha to the rescue !
It tells you that this function can be written as :
1/24*(6*x+23)^2-529/24
So if you want to solve f(x)=a, you have :
x = 1/6*(sqrt(24*a+529)-23)
a = 70
# => x = 4
Note : Using Wolfram shouldn't prevent you from finding the answer on your own.
sum(something*i) is equal to something*sum(i) because something (3 in this case ) doesn't depend on i.
sum(i,i=1..n) is equal to n*(n+1)/2, and it's easy to prove (see Wikipedia)
So your function becomes 10*x+3*x*(x+1)/2
Expanded, it is :
(3 x^2)/2+(23 x)/2
You need to solve (3 x^2)/2+(23 x)/2 = 70, in other words :
(3 x^2)/2+(23 x)/2 - 70 = 0
It is a quadratic equation, with a=3/2, b=23/2 and c=-70 or c=-29 or c=....
Your sum can be written as 3*x*(x+1)/2, so you need to solve the equation 10*x + 3*x*(x+1)/2 = y.
Wolfram Alpha says that the result is 1/6.0 * (-23.0+sqrt(529.0+24.0 * y))
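Translated to Java, the closed form above might look like the sketch below. The class name InverseSummation and the method name inverse are made up for illustration. It assumes, as the question allows, that the input is always a value the original function can produce, and rounds to absorb floating-point error.

```java
final class InverseSummation {
    /** The original function in closed form: 10*x + 3*x*(x+1)/2. */
    static int summationFunction(int x) {
        return 10 * x + 3 * x * (x + 1) / 2;
    }

    /** Solves 10*x + 3*x*(x+1)/2 = y for x using the quadratic formula, with no searching. */
    static int inverse(int y) {
        double x = (Math.sqrt(24.0 * y + 529.0) - 23.0) / 6.0;
        return (int) Math.round(x);
    }

    public static void main(String[] args) {
        System.out.println(inverse(13));
        System.out.println(inverse(29));
        System.out.println(inverse(48));
        System.out.println(inverse(70));
    }
}
```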

How can I improve my 2 sum algorithm for a range of numbers using a hash table?

I have developed an algorithm to solve the 2 sum problem using a hash table although its performance is dreadful for huge inputs.
My goal is to find all distinct numbers x,y where -10000 <= x+y <= 10000. By the way, is the performance of my code O(n*m), where n is the size of the input and m is the number of keys in the map?
Here is my code:
import com.google.common.base.Stopwatch;
import java.util.Scanner;
import java.util.HashMap;
import java.util.ArrayList;
import static com.google.common.collect.Lists.newArrayList;
public class TwoSum {
private HashMap<Long, Long> map;
private ArrayList<Long> Ts;
private long result = 0L;
public TwoSum() {
Ts = newArrayList();
for(long i = -10000; i < 10001; i++){
Ts.add(i);
}
Scanner scan = new Scanner(System.in);
map = new HashMap<>();
while (scan.hasNextLong()) {
long a = scan.nextLong();
if (!map.containsKey(a)) {
map.put(a, a);
}
}
}
private long count(){
//long c = 0L;
for (Long T : Ts) {
long t = T;
for (Long x : map.values()) {
long y = t - x;
if (map.containsValue(y) && y != x) {
result++;
}
//System.out.println(c++);
}
}
return result / 2;
}
public static void main(String [] args) {
TwoSum s = new TwoSum();
Stopwatch stopwatch = Stopwatch.createStarted();
System.out.println(s.count());
stopwatch.stop();
System.out.println("time:" + stopwatch);
}
}
sample input:
-7590801
-3823598
-5316263
-2616332
-7575597
-621530
-7469475
1084712
-7780489
-5425286
3971489
-57444
1371995
-5401074
2383653
1752912
7455615
3060706
613097
-1073084
7759843
7267574
-7483155
-2935176
-5128057
-7881398
-637647
-2607636
-3214997
-8253218
2980789
168608
3759759
-5639246
555129
-4489068
44019
2275782
-3506307
-8031288
-213609
-4524262
-1502015
-1040324
3258235
32686
1047621
-3376656
7601567
-7051390
6633993
-6245148
4994051
-4259178
856589
6047000
1785511
4449514
-1177519
4972172
8274315
7725694
-4923179
5076288
-876369
-7663790
1613721
4472116
-4587501
3194726
6195357
-3364248
-113737
6260410
1974241
3174620
3510171
7289166
4532581
-6650736
-3782721
7007010
6007081
-7661180
-1372125
-5967818
516909
-7625800
-2700089
-7676790
-2991247
2283308
1614251
-4619234
2741749
567264
4190927
5307122
-5810503
-6665772
output: 6
The gist of your algorithm can be rewritten in pseudocode as:
for all integers t from -10k to 10k,
for all map keys x,
if t - x in map, and t is not 2*x,
count ++
return count / 2
You can easily improve this a bit:
for all integers t from -10k to 10k,
for the lower half of keys x in ascending order such that t is not 2*x
if t - x in map,
count ++
This makes it go twice as fast (you no longer double-count). However, you need to sort your inputs to ensure map keys in ascending order. You can add them into a TreeSet and then move it into a LinkedHashSet. Using Sets is better than Maps if you do not care about the values, and all the information is in the keys.
Running time is still O(inputs * range), since you have two nested loops, one with range iterations and the other with half your input. This is a fundamental shortcoming of the algorithm, and no amount of optimization will fix it.
The question is an assignment from Algorithms: Design and Analysis
- an online course offered by Stanford University and taught by Prof. Tim Roughgarden. I happen to be taking the same course.
The usual solution for looking up t - i in a hash table is O(n) for a single t, but doing that 20001 * 1000000 times results in roughly 20 billion lookups!
A better solution is to create a sorted set xs from the input file, and ∀i ∈ xs, find all numbers from xs in the range [-10000 - i, 10000 - i]. Since a sorted set, by definition, doesn't have duplicates, we don't need to worry about any number in the range being equal to i. There's one gotcha though, which is really unclear in the problem statement. It is not sufficient to find unique pairs (x, y) ∀ x, y ∈ xs; their sums must be unique too. Obviously, two different pairs of numbers may produce equal sums (e.g. 2 + 4 = 1 + 5 = 6). Thus, we need to keep track of the sums too.
Lastly, we can stop once we go past 5000, since there can't be any more numbers to the right that add up to less than 10000.
Here's a Scala solution:
def twoSumCount(xs: SortedSet[Long]): Int = {
xs
.foldLeft(collection.mutable.Set.empty[Long]) { (sums, i) =>
if (i < TenThou / 2) {
xs
// using from makes it slower
.range(-TenThou - i, TenThou - i + 1)
.map(_ + i)
// using diff makes it slower
.withFilter(y => !sums.contains(y))
// adding individual elements is faster than using
// diff/filter/filterNot and adding all using ++=
.foreach(sums.add)
}
sums
}
.size
}
Benchmark:
cores: 8
hostname: ***
name: OpenJDK 64-Bit Server VM
osArch: x86_64
osName: Mac OS X
vendor: Azul Systems, Inc.
version: 11.0.1+13-LTS
Parameters(file -> 2sum): 116.069441 ms
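Since the question is in Java, here is a rough Java sketch of the same sorted-set idea using TreeSet.subSet. It is not a port of the Scala code: the names TwoSumRange and countSums are hypothetical, it skips the early-exit optimisation, and it explicitly excludes pairing a number with itself, on the assumption that x and y must be distinct as the question states.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

final class TwoSumRange {
    /** Counts distinct sums x + y in [-limit, limit] over distinct x, y taken from xs. */
    static int countSums(TreeSet<Long> xs, long limit) {
        Set<Long> sums = new HashSet<>();
        for (long x : xs) {
            // only the partners y with -limit <= x + y <= limit are visited
            for (long y : xs.subSet(-limit - x, true, limit - x, true)) {
                if (y != x) {
                    sums.add(x + y); // the set keeps each sum once
                }
            }
        }
        return sums.size();
    }
}
```

Each subSet call is a logarithmic lookup that returns a view, so the work is proportional to the number of in-range pairs rather than to inputs * range.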

binary search guessing number recursively

I am coding a binary search algorithm and I want to count the minimum number of guesses it takes to find the number that I provide. Suppose the number I provide is 33; then it should count 7 steps.
Step no   number guessed   result     range of possible values
0                                     1-100
1         50               too high   1-49
2         25               too low    26-49
3         37               too high   26-36
4         31               too low    32-36
5         34               too high   32-33
6         32               too low    33-33
7         33               correct
so this is my code for this
package binarySearch;
public class Binary {
int gussedNo;
public static int count =0;
void search(int lowerBound,int upperBound,int num){
gussedNo=upperBound+lowerBound/2;
count();
if(gussedNo==num){
System.out.println(count);}
else if(gussedNo>num){
upperBound=gussedNo-1;
search(lowerBound,upperBound,num);
}
if(gussedNo<num){
lowerBound=gussedNo+1;
search(lowerBound,upperBound,num);
}
}
int count(){
count=count+1;
return count;
}
}
I created a separate method for counting. Here is my main class:
package binarySearch;
public class MainClass {
public static void main (String[] args){
Binary search= new Binary();
search.search(1, 100,33 );
}
}
Here I have given the lower bound as 1 and the upper bound as 100, and the number I want to count guesses for is 33.
But when I execute the code I get a count of 68, while it should be 7 according to binary search.
Take a look at the line where you create the next guess:
gussedNo=upperBound+lowerBound/2;
Due to mathematical operators precedence in Java, this line is the same as having:
gussedNo=upperBound+(lowerBound/2);
Which is clearly not performing a binary search, and thus, not what you wanted. You can solve this by explicitly adding the brackets:
gussedNo = (upperBound + lowerBound) / 2;
Here is your problem:
gussedNo=upperBound+lowerBound/2;
You forgot about operator precedence. It should be:
gussedNo=(upperBound+lowerBound)/2;
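With the midpoint fixed, the counting can also be written without a shared static counter. The following is only a sketch: the class name GuessCounter and the method name countGuesses are made up, and it assumes the target lies between the bounds.

```java
final class GuessCounter {
    /** Number of guesses a binary search needs to find target within [lowerBound, upperBound]. */
    static int countGuesses(int lowerBound, int upperBound, int target) {
        // same value as (lowerBound + upperBound) / 2 for these bounds, without overflow risk
        int guess = lowerBound + (upperBound - lowerBound) / 2;
        if (guess == target) {
            return 1; // this guess is the last one
        }
        if (guess > target) {
            return 1 + countGuesses(lowerBound, guess - 1, target); // too high
        }
        return 1 + countGuesses(guess + 1, upperBound, target); // too low
    }

    public static void main(String[] args) {
        System.out.println(countGuesses(1, 100, 33)); // prints 7
    }
}
```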

Efficient algorithm for detecting different elements in a collection

Imagine you have a set of five elements (A-E) with some numeric values of a measured property (several observations for each element, for example "heart rate"):
A = {100, 110, 120, 130}
B = {110, 100, 110, 120, 90}
C = { 90, 110, 120, 100}
D = {120, 100, 120, 110, 110, 120}
E = {110, 120, 120, 110, 120}
First, I have to detect if there are significant differences on the average levels. So I run a one way ANOVA using the Statistical package provided by Apache Commons Math. No problems so far, I obtain a boolean that tells me whether differences are found or not.
Second, if differences are found, I need to know the element (or elements) that is different from the rest. I plan to use unpaired t-tests, comparing each pair of elements (A with B, A with C .... D with E), to know if an element is different than the other. So, at this point I have the information of the list of elements that present significant differences with others, for example:
C is different than B
C is different than D
But I need a generic algorithm to efficiently determine, with that information, what element is different than the others (C in the example, but could be more than one).
Leaving statistical issues aside, the question could be (in general terms): "Given the information about equality/inequality of each one of the pairs of elements in a collection, how could you determine the element/s that is/are different from the others?"
Seems to be a problem where graph theory could be applied. I am using Java language for the implementation, if that is useful.
Edit: Elements are people and measured values are times needed to complete a task. I need to detect who is taking too much or too little time to complete the task in some kind of fraud detection system.
Just in case anyone is interested in the final code, using Apache Commons Math to make statistical operations, and Trove to work with collections of primitive types.
It looks for the element(s) with the highest degree (the idea is based on comments made by @Pace and @Aniko, thanks).
I think the final algorithm is O(n^2); suggestions are welcome. It should work for any problem involving one qualitative vs. one quantitative variable, assuming normality of the observations.
import gnu.trove.iterator.TIntIntIterator;
import gnu.trove.map.TIntIntMap;
import gnu.trove.map.hash.TIntIntHashMap;
import gnu.trove.procedure.TIntIntProcedure;
import gnu.trove.set.TIntSet;
import gnu.trove.set.hash.TIntHashSet;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.math.MathException;
import org.apache.commons.math.stat.inference.OneWayAnova;
import org.apache.commons.math.stat.inference.OneWayAnovaImpl;
import org.apache.commons.math.stat.inference.TestUtils;
public class TestMath {
private static final double SIGNIFICANCE_LEVEL = 0.001; // 99.9%
public static void main(String[] args) throws MathException {
double[][] observations = {
{150.0, 200.0, 180.0, 230.0, 220.0, 250.0, 230.0, 300.0, 190.0 },
{200.0, 240.0, 220.0, 250.0, 210.0, 190.0, 240.0, 250.0, 190.0 },
{100.0, 130.0, 150.0, 180.0, 140.0, 200.0, 110.0, 120.0, 150.0 },
{200.0, 230.0, 150.0, 230.0, 240.0, 200.0, 210.0, 220.0, 210.0 },
{200.0, 230.0, 150.0, 180.0, 140.0, 200.0, 110.0, 120.0, 150.0 }
};
final List<double[]> classes = new ArrayList<double[]>();
for (int i=0; i<observations.length; i++) {
classes.add(observations[i]);
}
OneWayAnova anova = new OneWayAnovaImpl();
// double fStatistic = anova.anovaFValue(classes); // F-value
// double pValue = anova.anovaPValue(classes); // P-value
boolean rejectNullHypothesis = anova.anovaTest(classes, SIGNIFICANCE_LEVEL);
System.out.println("reject null hypothesis " + (100 - SIGNIFICANCE_LEVEL * 100) + "% = " + rejectNullHypothesis);
// differences are found, so make t-tests
if (rejectNullHypothesis) {
TIntSet aux = new TIntHashSet();
TIntIntMap fraud = new TIntIntHashMap();
// i vs j unpaired t-tests - O(n^2)
for (int i=0; i<observations.length; i++) {
for (int j=i+1; j<observations.length; j++) {
boolean different = TestUtils.tTest(observations[i], observations[j], SIGNIFICANCE_LEVEL);
if (different) {
if (!aux.add(i)) {
if (fraud.increment(i) == false) {
fraud.put(i, 1);
}
}
if (!aux.add(j)) {
if (fraud.increment(j) == false) {
fraud.put(j, 1);
}
}
}
}
}
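// A TIntIntMap is not ordered, so scan its values for the highest degree
int highest = 0;
for (int degree : fraud.values()) {
highest = Math.max(highest, degree);
}
final int max = highest;
// Keep only those with the highest degree
fraud.retainEntries(new TIntIntProcedure() {
@Override
public boolean execute(int a, int b) {
return b == max;
}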
});
// If more than half of the elements are different
// then they are not really different (?)
if (fraud.size() > observations.length / 2) {
fraud.clear();
}
// output
TIntIntIterator it = fraud.iterator();
while (it.hasNext()) {
it.advance();
System.out.println("Element " + it.key() + " has significant differences");
}
}
}
}
Your edit gives good details; thanks.
Based on that I would presume a fairly well-behaved distribution of times (normal, or possibly gamma; depends on how close to zero your times get) for typical responses. Rejecting a sample from this distribution could be as simple as computing a standard deviation and seeing which samples lie more than n stdevs from the mean, or as complex as taking subsets which exclude outliers until your data settles down into a nice heap (e.g. the mean stops moving around 'much').
Now, you have an added wrinkle if you assume that a person who monkeys with one trial will monkey with another. So you're really trying to discriminate between a person who just happens to be fast (or slow) vs. one who is 'cheating'. You could do something like compute the stdev rank of each score (I forget the proper name for this: if a value is two stdevs above the mean, the score is '2'), and use that as your statistic.
Then, given this new statistic, there are some hypotheses you'll need to test. E.g., my suspicion is that the stdev of this statistic will be higher for cheaters than for someone who is just uniformly faster than other people--but you'd need data to verify that.
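The "stdev rank" described above is commonly called a z-score (standard score). A minimal sketch of computing it for a set of times follows. The names StdevScore and scores are hypothetical, and the choice of the sample standard deviation (dividing by n - 1) is an assumption, not something the answer specifies.

```java
final class StdevScore {
    /** Returns, for each time, how many sample standard deviations it lies from the mean. */
    static double[] scores(double[] times) {
        int n = times.length;
        double sum = 0;
        for (double t : times) {
            sum += t;
        }
        double mean = sum / n;

        double squaredDeviations = 0;
        for (double t : times) {
            squaredDeviations += (t - mean) * (t - mean);
        }
        double stdev = Math.sqrt(squaredDeviations / (n - 1)); // needs at least two values

        double[] result = new double[n];
        for (int i = 0; i < n; i++) {
            result[i] = (times[i] - mean) / stdev;
        }
        return result;
    }
}
```

A time whose score has an absolute value above some chosen cutoff (for example 2) would then be flagged for a closer look. The cutoff itself is a judgment call that depends on the data.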
Good luck with it!
You would have to run the paired t-test (or whatever pairwise test you want to implement) and then increment the counts in a hash where the key is the Person and the count is the number of times it was different.
I guess you could also have an arrayList that contains people objects. The people object could store their ID and the counts of time they were different. Implement comparable and then you could sort the arraylist by count.
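The counting idea can be sketched with a plain HashMap. Everything here is illustrative: the names DifferenceDegree and mostDifferent are made up, elements are identified by strings, and the input is assumed to be the list of pairs that the pairwise test reported as different.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DifferenceDegree {
    /** Returns the element(s) that appear in the most "is different from" pairs. */
    static List<String> mostDifferent(String[][] differentPairs) {
        Map<String, Integer> degree = new HashMap<>();
        for (String[] pair : differentPairs) {
            degree.merge(pair[0], 1, Integer::sum); // one more difference for each side
            degree.merge(pair[1], 1, Integer::sum);
        }
        int max = 0;
        for (int d : degree.values()) {
            max = Math.max(max, d);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : degree.entrySet()) {
            if (e.getValue() == max) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result); // stable output order
        return result;
    }
}
```

For the example in the question (C differs from B, C differs from D), C has degree 2 while B and D have degree 1, so only C is returned.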
If the items in the list were sorted in numerical order, you can walk two lists simultaneously, and any differences can easily be recognized as insertions or deletions. For example
List A List B
1 1 // Match, increment both pointers
3 3 // Match, increment both pointers
5 4 // '4' missing in list A. Increment B pointer only.
List A List B
1 1 // Match, increment both pointers
3 3 // Match, increment both pointers
4 5 // '4' missing in list B (or added to A). Incr. A pointer only.
