Efficient algorithm for detecting different elements in a collection - java

Imagine you have a set of five elements (A-E) with some numeric values of a measured property (several observations for each element, for example "heart rate"):
A = {100, 110, 120, 130}
B = {110, 100, 110, 120, 90}
C = { 90, 110, 120, 100}
D = {120, 100, 120, 110, 110, 120}
E = {110, 120, 120, 110, 120}
First, I have to detect whether there are significant differences in the average levels. So I run a one-way ANOVA using the statistics package provided by Apache Commons Math. No problems so far; I obtain a boolean that tells me whether differences were found or not.
Second, if differences are found, I need to know which element (or elements) is different from the rest. I plan to use unpaired t-tests, comparing each pair of elements (A with B, A with C, ..., D with E), to know whether one element is different from the other. So, at this point I have the list of elements that present significant differences with others, for example:
C is different than B
C is different than D
But I need a generic algorithm to efficiently determine, from that information, which element is different from the others (C in the example, but there could be more than one).
Leaving statistical issues aside, the question could be stated in general terms: "Given information about the equality/inequality of each pair of elements in a collection, how can you determine the element(s) that differ from the others?"
This seems to be a problem where graph theory could be applied. I am using Java for the implementation, if that is useful.
Edit: The elements are people and the measured values are the times needed to complete a task. I need to detect who is taking too much or too little time to complete the task, in some kind of fraud detection system.

Just in case anyone is interested, here is the final code, using Apache Commons Math for the statistical operations and Trove for collections of primitive types.
It looks for the element(s) with the highest degree (the idea is based on comments made by @Pace and @Aniko, thanks).
I think the final algorithm is O(n^2); suggestions are welcome. It should work for any problem involving one qualitative vs. one quantitative variable, assuming normality of the observations.
import gnu.trove.iterator.TIntIntIterator;
import gnu.trove.map.TIntIntMap;
import gnu.trove.map.hash.TIntIntHashMap;
import gnu.trove.procedure.TIntIntProcedure;
import gnu.trove.set.TIntSet;
import gnu.trove.set.hash.TIntHashSet;

import java.util.ArrayList;
import java.util.List;

import org.apache.commons.math.MathException;
import org.apache.commons.math.stat.inference.OneWayAnova;
import org.apache.commons.math.stat.inference.OneWayAnovaImpl;
import org.apache.commons.math.stat.inference.TestUtils;

public class TestMath {

    private static final double SIGNIFICANCE_LEVEL = 0.001; // 99.9%

    public static void main(String[] args) throws MathException {
        double[][] observations = {
            {150.0, 200.0, 180.0, 230.0, 220.0, 250.0, 230.0, 300.0, 190.0},
            {200.0, 240.0, 220.0, 250.0, 210.0, 190.0, 240.0, 250.0, 190.0},
            {100.0, 130.0, 150.0, 180.0, 140.0, 200.0, 110.0, 120.0, 150.0},
            {200.0, 230.0, 150.0, 230.0, 240.0, 200.0, 210.0, 220.0, 210.0},
            {200.0, 230.0, 150.0, 180.0, 140.0, 200.0, 110.0, 120.0, 150.0}
        };

        final List<double[]> classes = new ArrayList<double[]>();
        for (int i = 0; i < observations.length; i++) {
            classes.add(observations[i]);
        }

        OneWayAnova anova = new OneWayAnovaImpl();
        // double fStatistic = anova.anovaFValue(classes); // F-value
        // double pValue = anova.anovaPValue(classes);     // P-value
        boolean rejectNullHypothesis = anova.anovaTest(classes, SIGNIFICANCE_LEVEL);
        System.out.println("reject null hypothesis "
            + (100 - SIGNIFICANCE_LEVEL * 100) + "% = " + rejectNullHypothesis);

        // differences are found, so make t-tests
        if (rejectNullHypothesis) {
            TIntSet aux = new TIntHashSet();
            TIntIntMap fraud = new TIntIntHashMap();

            // i vs j unpaired t-tests - O(n^2)
            for (int i = 0; i < observations.length; i++) {
                for (int j = i + 1; j < observations.length; j++) {
                    boolean different = TestUtils.tTest(observations[i], observations[j], SIGNIFICANCE_LEVEL);
                    if (different) {
                        if (!aux.add(i) && !fraud.increment(i)) {
                            fraud.put(i, 1);
                        }
                        if (!aux.add(j) && !fraud.increment(j)) {
                            fraud.put(j, 1);
                        }
                    }
                }
            }

            // TIntIntHashMap is not sorted, so find the maximum degree explicitly
            int maxDegree = 0;
            for (int degree : fraud.values()) {
                maxDegree = Math.max(maxDegree, degree);
            }
            final int max = maxDegree;

            // Keep only those with the highest degree
            fraud.retainEntries(new TIntIntProcedure() {
                @Override
                public boolean execute(int key, int degree) {
                    return degree == max;
                }
            });

            // If more than half of the elements are different
            // then they are not really different (?)
            if (fraud.size() > observations.length / 2) {
                fraud.clear();
            }

            // output
            TIntIntIterator it = fraud.iterator();
            while (it.hasNext()) {
                it.advance();
                System.out.println("Element " + it.key() + " has significant differences");
            }
        }
    }
}

Your edit gives good details; thanks.
Based on that, I would presume a fairly well-behaved distribution of times (normal, or possibly gamma; it depends on how close to zero your times get) for typical responses. Rejecting a sample from this distribution could be as simple as computing a standard deviation and seeing which samples lie more than n stdevs from the mean, or as complex as taking subsets which exclude outliers until your data settles down into a nice heap (e.g. the mean stops moving around 'much').
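For illustration, a minimal Java sketch of that first idea (the n-stdev cutoff; the method name and threshold are assumptions for the example, not anything from the question):

import java.util.ArrayList;
import java.util.List;

public class OutlierSketch {
    // Flags indices of samples lying more than nStdevs standard deviations from the mean.
    static List<Integer> flagOutliers(double[] samples, double nStdevs) {
        double mean = 0;
        for (double s : samples) mean += s;
        mean /= samples.length;
        double variance = 0;
        for (double s : samples) variance += (s - mean) * (s - mean);
        double stdev = Math.sqrt(variance / samples.length);
        List<Integer> flagged = new ArrayList<>();
        for (int i = 0; i < samples.length; i++) {
            if (Math.abs(samples[i] - mean) > nStdevs * stdev) {
                flagged.add(i);
            }
        }
        return flagged;
    }
}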
Now, you have an added wrinkle if you assume that a person who monkeys with one trial will monkey with another. So you're really trying to discriminate between a person who just happens to be fast (or slow) and one who is 'cheating'. You could do something like compute the stdev rank of each score (I forget the proper name for this: if a value is two stdevs above the mean, the score is '2'), and use that as your statistic.
Then, given this new statistic, there are some hypotheses you'll need to test. E.g., my suspicion is that the stdev of this statistic will be higher for cheaters than for someone who is just uniformly faster than other people, but you'd need data to verify that.
Good luck with it!

You would have to run the pairwise t-test (or whatever pairwise test you want to implement) and then increment the count in a hash map whose key is the person and whose value is the number of times that person was different.
I guess you could also have an ArrayList that contains Person objects. Each Person object could store its ID and the count of times it was different. Implement Comparable and then you could sort the ArrayList by count.
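Something like this minimal sketch (the Person class and its field names are hypothetical, not from the question):

// Hypothetical holder; sorts descending by how often the person differed
class Person implements Comparable<Person> {
    final int id;
    int differenceCount;

    Person(int id) {
        this.id = id;
    }

    @Override
    public int compareTo(Person other) {
        return Integer.compare(other.differenceCount, this.differenceCount);
    }
}
// usage: Collections.sort(people); // where people is a List<Person>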

If the items in the lists are sorted in numerical order, you can walk both lists simultaneously, and any differences can easily be recognized as insertions or deletions. For example:
List A List B
1 1 // Match, increment both pointers
3 3 // Match, increment both pointers
5 4 // '4' missing in list A. Increment B pointer only.
List A List B
1 1 // Match, increment both pointers
3 3 // Match, increment both pointers
4 5 // '4' missing in list B (or added to A). Incr. A pointer only.
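A small Java sketch of that two-pointer walk, assuming plain sorted int arrays (the method name is made up for the example):

// Reports elements of each sorted array that are missing from the other
static void diffSorted(int[] a, int[] b) {
    int i = 0, j = 0;
    while (i < a.length && j < b.length) {
        if (a[i] == b[j]) {           // match: increment both pointers
            i++;
            j++;
        } else if (a[i] > b[j]) {     // b[j] missing in list A: increment B pointer only
            System.out.println(b[j++] + " missing in list A");
        } else {                      // a[i] missing in list B: increment A pointer only
            System.out.println(a[i++] + " missing in list B");
        }
    }
    while (i < a.length) System.out.println(a[i++] + " missing in list B");
    while (j < b.length) System.out.println(b[j++] + " missing in list A");
}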

Using Java Stream API, finding highest value of variable, with the stream of the changes made to the variable

Context/Scenario
Let's say we have an immutable object called Transaction, where transaction.getAction() returns a TransactionAction enum which can be DEPOSIT or WITHDRAW, and transaction.getAmount() returns an int specifying the amount of money being deposited or withdrawn.
enum TransactionAction {
    WITHDRAW,
    DEPOSIT
}

public class Transaction {

    private final TransactionAction action;
    private final int amount;

    public Transaction(TransactionAction action, int amount) {
        this.action = action;
        this.amount = amount;
    }

    public TransactionAction getAction() {
        return action;
    }

    public int getAmount() {
        return amount;
    }
}
Question
We now have a Stream<Transaction> which is a stream filled with Transaction that can either be DEPOSIT or WITHDRAW. We can imagine this Stream<Transaction> as a history of transactions of one particular bank account.
What I am trying to achieve is to get the highest balance the account has ever reached, in the most efficient manner (thus using the Stream API).
Example
Bob's transaction history is:
// balance start at 0
[DEPOSIT] 1200 // balance: 1200
[DEPOSIT] 500 // balance: 1700
[WITHDRAW] 700 // balance: 1000
[DEPOSIT] 300 // balance: 1300
[WITHDRAW] 800 // balance: 500
[WITHDRAW] 500 // balance: 0
Bob's highest balance is 1700.
What you need is to find the maximum value of a cumulative sum. In pseudo-code, this would be something like:
transactions = [1200, 500, -700, 300, -800, -500]
csum = cumulativeSum(transactions) // should be [1200,1700,1000,1300,500,0]
max(csum) // should be 1700
The imperative way:
The traditional for-loop is well suited for such cases. It should be fairly easy to write and is probably the most efficient alternative both in time and space. It does not require multiple iterations and it does not require extra lists.
int max = 0;
int csum = 0;
for (Transaction t : transactions) {
    int amount = (t.getAction() == TransactionAction.WITHDRAW ? -1 : 1) * t.getAmount();
    csum += amount;
    if (csum > max) max = csum;
}
Diving into functional:
Streams are a functional programming concept and, as such, they are free of side-effects and well suited for stateless operations. Keeping the cumulative state is considered a side-effect, and then we would have to talk about Monads to bring those side-effects under control and... we don't want to go that way.
Java, not being a functional language (although allowing for functional style), cares less about purity. You could simply have a control variable outside the stream to keep track of that external state within the current map or reduce operations. But that would also be giving up everything Streams are meant for.
So let's see how Java's more functional fellows handle this. In pure Haskell, the cumulative sum can be achieved with a scan-left operation:
λ> scanl1 (+) [1200, 500, -700, 300, -800, -500]
[1200,1700,1000,1300,500,0]
Finding the maximum of this would be as simple as:
λ> maximum ( scanl1 (+) [1200, 500, -700, 300, -800, -500] )
1700
A Java Streams solution:
Java does not have such an idiomatic way of expressing a scan left, but you may achieve a similar result with collect.
transactions.stream()
    .map(t -> (t.getAction() == TransactionAction.WITHDRAW ? -1 : 1) * t.getAmount())
    .collect(ArrayList<Integer>::new,
             (csum, amount) -> csum.add(csum.size() > 0 ? csum.get(csum.size() - 1) + amount : amount),
             ArrayList::addAll)
    .stream()
    .max(Integer::compareTo);
// returns Optional[1700]
EDIT: As correctly pointed out in the comments, this accumulator function is not associative and problems would appear if trying to use parallelStream instead of stream.
This can be further simplified. For example, if you enrich your TransactionAction enum with a multiplier (-1 for WITHDRAW and 1 for DEPOSIT), then map could be replaced with:
.map(t -> t.getAction().getMultiplier() * t.getAmount())
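The enriched enum could look like this (getMultiplier being the assumed addition; the rest matches the enum from the question):

enum TransactionAction {
    WITHDRAW(-1),
    DEPOSIT(1);

    private final int multiplier;

    TransactionAction(int multiplier) {
        this.multiplier = multiplier;
    }

    public int getMultiplier() {
        return multiplier;
    }
}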
EDIT: Yet another approach: Parallel Prefix Sum
Since Java 8, arrays offer a parallelPrefix operation that could be used like:
Integer[] amounts = transactions.stream()
    .map(t -> (t.getAction() == TransactionAction.WITHDRAW ? -1 : 1) * t.getAmount())
    .toArray(Integer[]::new);
Arrays.parallelPrefix(amounts, Integer::sum);
Arrays.stream(amounts).max(Integer::compareTo);
// returns Optional[1700]
Like Stream's collect, it also requires an associative function, and Integer::sum satisfies that property. The downside is that it requires an array and can't be used with lists. And although parallelPrefix is very efficient, setting up the array to work with it might not pay off.
Wrapping up:
Again, it's possible to achieve this with Java Streams although it won't be as efficient as a traditional loop both in time and space. But you benefit from the compositionality of streams. As always, it's a trade-off.
A stream would not help here. Use a list and a for-loop:
List<Transaction> transactions = ...;
int balance = 0;
int max = 0;
for (Transaction transaction : transactions) {
    balance += (transaction.getAction() == TransactionAction.DEPOSIT ? 1 : -1)
        * transaction.getAmount();
    max = Math.max(max, balance);
}
The problem is that you need to keep track of some state while processing transactions, and you wouldn't be able to do this with streams without introducing complicated or mutable data structures that would make this code bug-prone.
Here is another Stream solution:
AtomicInteger balance = new AtomicInteger(0);
int highestBalance = transactions
    .stream()
    .mapToInt(transaction -> {
        int amount = transaction.getAmount();
        if (transaction.getAction() == TransactionAction.WITHDRAW) {
            amount = -amount;
        }
        return balance.accumulateAndGet(amount, Integer::sum);
    })
    .max()
    .orElse(0);
The cumulative sum at each position can be computed like this:
List<Integer> integers = Arrays.asList(1200, 500, -700, 300, -800, -500);
Stream<Integer[]> cumulativeSum = Stream.iterate(
        new Integer[]{0, integers.get(0)},
        p -> new Integer[]{p[0] + 1, p[1] + integers.get(p[0] + 1)})
    .limit(integers.size());
With this you can get the max balance in this way:
Integer[] max = cumulativeSum
    .max(Comparator.comparing(p -> p[1]))
    .get();
System.out.println("Position: " + max[0]);
System.out.println("Value: " + max[1]);
Or with an iterator, but there is a problem: the last sum won't be computed:
Stream<Integer> integerStream = Arrays.stream(new Integer[]{
        1200, 500, -700, 300, -800, -500});
Iterator<Integer> iterator = integerStream.iterator();
Integer maxCumulativeSum = Stream.iterate(iterator.next(), p -> p + iterator.next())
    .takeWhile(p -> iterator.hasNext())
    .max(Integer::compareTo).get();
System.out.println(maxCumulativeSum);
The problem is with takeWhile; it may be solved with takeWhileInclusive (from an external library).
A wrong solution
// Deposit is positive, withdrawal is negative.
final Stream<Integer> theOriginalDepositWithdrawals = Stream.of(1200, 500, -700, 300, -800, -500);
final Stream<Integer> sequentialDepositWithdrawals = theOriginalDepositWithdrawals.sequential();
final CurrentBalanceMaximumBalance currentMaximumBalance =
    sequentialDepositWithdrawals.<CurrentBalanceMaximumBalance>reduce(
        // Identity.
        new CurrentBalanceMaximumBalance(0, Integer.MIN_VALUE),
        // Accumulator.
        (currentAccumulation, elementDepositWithdrawal) -> {
            final int newCurrentBalance =
                currentAccumulation.currentBalance + elementDepositWithdrawal;
            final int newMaximumBalance =
                Math.max(currentAccumulation.maximumBalance, newCurrentBalance);
            return new CurrentBalanceMaximumBalance(
                newCurrentBalance, newMaximumBalance);
        },
        // Combiner.
        (res1, res2) -> {
            final int newCurrentBalance =
                res1.currentBalance + res2.currentBalance;
            final int newMaximumBalance =
                Math.max(res1.maximumBalance, res2.maximumBalance);
            return new CurrentBalanceMaximumBalance(
                newCurrentBalance, newMaximumBalance);
        }
    );
System.out.println("Maximum is: " + currentMaximumBalance.maximumBalance);
Helper class:
class CurrentBalanceMaximumBalance {
    public final int currentBalance;
    public final int maximumBalance;

    public CurrentBalanceMaximumBalance(int currentBalance, int maximumBalance) {
        this.currentBalance = currentBalance;
        this.maximumBalance = maximumBalance;
    }
}
This is a wrong solution. It might happen to work, but there is no guarantee that it will.
It breaks the contract of reduce. The properties that are broken are associativity, for both the accumulator function and the combiner function; in addition, reduce does not promise to process the stream in the order of the original transactions.
This makes it dangerous to use, and it might well give wrong results depending on what the implementation of reduce happens to be, as well as on whether the stream respects the original order of the deposits and withdrawals or not.
Using sequential() here is not sufficient, since sequential() is about sequential vs. parallel execution, not about ordering. An example of a stream that executes sequentially but does not have an encounter order is a stream created from a HashSet and then having sequential() called on it.
A correct solution
The problem uses the concept of a "current balance", and that is only meaningful when computed from the first transaction onward, in order, to the end. For instance, if you have the list [-1000, 10, 10, -1000], you cannot start in the middle and then say that the "current balance" was 20 at some point. You must apply the operations regarding the "current balance" in the order of the original transactions.
So, one straightforward solution is to:
Require that the stream respects the original order of transactions, with a defined "encounter order".
Apply forEachOrdered(), as in the sketch below.
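A minimal sketch of that approach, using a two-element array as the mutable state (safe here because forEachOrdered processes the elements one at a time, in encounter order):

int[] state = {0, 0}; // state[0] = current balance, state[1] = highest balance
transactions.stream().forEachOrdered(t -> {
    state[0] += (t.getAction() == TransactionAction.DEPOSIT ? 1 : -1) * t.getAmount();
    state[1] = Math.max(state[1], state[0]);
});
int highestBalance = state[1];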

How to code these conditional statements in more elegant & scalable manner

In my software, I need to decide the version of a feature based on 2 parameters. E.g.:
Render version 1 -> if (param1 && param2) == true;
Render version 2 -> if (!param1 && !param2) == true;
Render version 3 -> if only param1 == true;
Render version 4 -> if only param2 == true;
So, to meet this requirement, I wrote code which looks like this:
if (param1 && param2) {          // both are true
    version = 1;
} else if (!param1 && !param2) { // both are false
    version = 2;
} else if (!param2) {            // means param1 is true
    version = 3;
} else {                         // means param2 is true
    version = 4;
}
There are definitely multiple ways to code this, but I finalised this approach after trying out different ones, because it was the most readable code I could come up with.
But this piece of code is definitely not scalable, because:
Let's say tomorrow we want to introduce a new param called param3. Then the number of checks will increase because of the multiple possible combinations.
For this software, I am pretty sure that we will have to accommodate new parameters in the future.
Can there be any scalable and readable way to code these requirements?
EDIT:
For a scalable solution define the versions for each parameter combination through a Map:
Map<List<Boolean>, Integer> paramsToVersion = Map.of(
        List.of(true, true), 1,
        List.of(false, false), 2,
        List.of(true, false), 3,
        List.of(false, true), 4);
Now finding the right version is a simple map lookup:
version = paramsToVersion.get(List.of(param1, param2));
The way I initialized the map works since Java 9. In older Java versions it’s a little more wordy, but probably still worth doing. Even in Java 9 you need to use Map.ofEntries if you have 4 or more parameters (for 16 combinations), which is a little more wordy too.
Original answer:
My taste would be for nested if/else statements and only testing each parameter once:
if (param1) {
    if (param2) {
        version = 1;
    } else {
        version = 3;
    }
} else {
    if (param2) {
        version = 4;
    } else {
        version = 2;
    }
}
But it scales poorly to many parameters.
If you have to enumerate all the possible combinations of Booleans, it's often simplest to convert them into a number:
// param1: F T F T
// param2: F F T T
static final int[] VERSIONS = new int[]{2, 3, 4, 1};
...
version = VERSIONS[(param1 ? 1 : 0) + (param2 ? 2 : 0)];
I doubt that there is a way that would be more compact, readable and scalable at the same time.
You express the conditions as minimized expressions, which are compact and may carry meaning (in particular, the irrelevant variables don't clutter them). But there is no systematic structure that you could exploit.
A quite systematic alternative could be truth tables, i.e. the explicit expansion of all combinations and the associated truth value (or version number), which can be very efficient in terms of running time. But these have a size exponential in the number of variables and are not especially readable.
I am afraid there is no free lunch. Your current solution is excellent.
If you are after efficiency (i.e. avoiding the need to evaluate all expressions sequentially), then you can think of the truth table approach, but in the following way:
declare an array of version numbers, with 2^n entries;
use the code just like you wrote to initialize all table entries; to achieve that, enumerate all integers in [0, 2^n) and use their binary representation;
now for a query, form an integer index from the n input booleans and lookup the array.
Using the answer by Olevv, the table would be [2, 4, 3, 1]. A lookup would be like (false, true) => T[01b] = 4.
What matters is that the original set of expressions is still there in the code, for human reading. You can use it in an initialization function that fills the array at run-time, and you can also use it to hard-code the table (and leave the code in comments; even better, leave in the code that generates the hard-coded table).
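As a sketch of that idea for the two-parameter case (the bit layout, with param1 as the high bit, is an arbitrary choice for the example):

public class VersionTable {
    // 2 parameters -> 2^2 entries, filled once from the original readable conditions
    static final int[] VERSIONS = buildTable();

    static int[] buildTable() {
        int[] table = new int[1 << 2];
        for (int bits = 0; bits < table.length; bits++) {
            boolean param1 = (bits & 0b10) != 0;
            boolean param2 = (bits & 0b01) != 0;
            if (param1 && param2)        table[bits] = 1;
            else if (!param1 && !param2) table[bits] = 2;
            else if (!param2)            table[bits] = 3;
            else                         table[bits] = 4;
        }
        return table;
    }

    // a query is a simple array lookup
    static int version(boolean param1, boolean param2) {
        return VERSIONS[(param1 ? 0b10 : 0) | (param2 ? 0b01 : 0)];
    }
}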
Your combination of parameters is nothing more than a binary number (like 01100), where 0 indicates false and 1 indicates true.
So your version can easily be calculated by using all the combinations of ones and zeroes. The possible combinations with 2 input parameters are:
11 -> both are true
10 -> first is true, second is false
01 -> first is false, second is true
00 -> both are false
So with this knowledge I've come up with a quite scalable solution using a "bit mask" (nothing more than a number) and "bit operations":
public static int getVersion(boolean... params) {
    int length = params.length;
    int mask = (1 << length) - 1;
    for (int i = 0; i < length; i++) {
        if (!params[i]) {
            mask &= ~(1 << length - i - 1);
        }
    }
    return mask + 1;
}
The most interesting line is probably this:
mask &= ~(1 << length - i - 1);
It does many things at once, so I'll split it up. The part length - i - 1 calculates the position of the "bit" inside the bit mask, counted from the right (0-based, like array indices).
The next part, 1 << (length - i - 1), shifts the number 1 that many positions to the left. So let's say we have a position of 3; then the result of the operation 1 << 2 (2 is the third position, 0-based) would be the binary number 100.
The ~ sign is a binary inverse, so all the bits are inverted: all 0s are turned to 1 and all 1s are turned to 0. With the previous example, the inverse of 100 would be 011.
The last part, mask &= n, is the same as mask = mask & n, where n is the previously computed value 011. This is nothing more than a binary AND, so all bits that are set both in mask and in n are kept, while all others are discarded.
All in all, this single line does nothing more than remove the "bit" at a given position of the mask if the input parameter is false.
If the version numbers are not sequential from 1 to 4, then a version lookup table like this one may help you.
The whole code would need just a single adjustment in the last line:
return VERSIONS[mask];
Here the VERSIONS array consists of all the versions in order, but reversed (index 0 of VERSIONS is where both parameters are false).
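For the question's version numbering, that table would be (a small usage sketch, assuming the return VERSIONS[mask]; adjustment above):

// index 0 = both false, 1 = only param2 true, 2 = only param1 true, 3 = both true
static final int[] VERSIONS = {2, 4, 3, 1};
// e.g. getVersion(true, false) computes mask 0b10 = 2, so VERSIONS[2] == 3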
I would have just gone with:
if (param1) {
    if (param2) {
    } else {
    }
} else {
    if (param2) {
    } else {
    }
}
Kind of repetitive, but each condition is evaluated only once, and you can easily find the code that executes for any particular combination. Adding a 3rd parameter will, of course, double the code. But if there are any invalid combinations, you can leave those out, which shortens the code. Or, if you want to throw an exception for them, it becomes fairly easy to see which combination you have missed. When the IFs become too long, you can move the actual code out into methods:
if (param1) {
    if (param2) {
        method_12();
    } else {
        method_1();
    }
} else {
    if (param2) {
        method_2();
    } else {
        method_none();
    }
}
Thus your whole switching logic becomes a method of its own, and the actual code for any combination lives in another method. When you need to work with the code for a particular combination, you just look up the appropriate method. The big IF maze is then rarely looked at, and when it is, it contains only the IFs themselves and nothing else potentially distracting.

How to cheaply deal with multiple ranges (finding a maximum)

I have a number of ranges, each with a weight. Every point on the total range is scored by the sum of the weights of all the ranges the point falls into. I'd like to be able to cheaply find the total value at a point, and I would like to be able to find a maximum. Ideally, it would also be possible to find the maximum for a set of (equidistantly) spaced points.
Unfortunately, I'm heavily limited by performance, and am struggling to find a good algorithm for this.
The only two decent solutions I could find are:
- Brute-force it by sampling a bunch of points. For each point: check every range for whether it fits, find the total value, then check whether it's better than the best so far. Decent point samples can be found by taking the boundaries of the ranges.
- Create a set of buckets. Iterate through all the ranges, adding a value to all the buckets that fit within the range. Then iterate through all the buckets to find the best one.
Neither is fast enough for my liking (they have been tested), and the latter isn't continuous, so it has accuracy problems.
I'd be okay with getting a slightly inaccurate response as long as the performance is way better.
What adds a bit of extra complexity to my particular case is that I'm actually dealing with angles, so the environment is modular. The ranges can't be ordered, and I need to ensure that a range going from 340 degrees to 20 degrees contains both a point at 350 and one at 10 degrees.
The angle ranges I'm dealing with can't exceed 180 degrees, and are only very rarely above 90.
The amount of ranges generally isn't very high (1-30), but I need to do this calculation a lot.
The language is Java if it matters.
Make a list (array) of angle intervals. If an interval's finish value is less than its start value (20 < 340), add 360 to the finish: (340, 380).
Make a list of pairs (angle, +weight for a start point or -weight for a finish point).
Concatenate the list with its copy shifted by 360 to provide for circular intersection. (It is possible to copy only part of the list.)
Sort them by angle (use +/- as a secondary key in case of a tie: - before +).
Set CurrWeight = 0.
Walk through the list, adding the +/-weight field to CurrWeight. Check for the max value.
(Such an approach works for linear lists; I tried to modify it for circular ones, but I might have missed some caveats.)
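A Java sketch of that sweep, assuming each range is given as a {startDegrees, endDegrees, weight} triple (the class and method names are made up for the example):

import java.util.ArrayList;
import java.util.List;

public class AngleSweep {
    // Returns the best total weight reached at any angle.
    static double bestWeight(double[][] ranges) {
        List<double[]> events = new ArrayList<>(); // {angle, +weight or -weight}
        for (double[] r : ranges) {
            double start = r[0], end = r[1], w = r[2];
            if (end < start) end += 360;               // e.g. (340, 20) -> (340, 380)
            events.add(new double[]{start, +w});
            events.add(new double[]{end, -w});
            events.add(new double[]{start + 360, +w}); // circular copy
            events.add(new double[]{end + 360, -w});
        }
        // sort by angle; on ties, process finish points (-) before start points (+)
        events.sort((x, y) -> x[0] != y[0]
                ? Double.compare(x[0], y[0])
                : Double.compare(x[1], y[1]));
        double curr = 0, best = 0;
        for (double[] e : events) {
            curr += e[1];
            best = Math.max(best, curr);
        }
        return best;
    }
}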
Here, instead of the term "edges", I should rather have used the term "boundaries", because it refers to interval boundaries.
import java.util.ArrayList;
import java.util.SortedSet;
import java.util.TreeSet;

public class Main {

    ArrayList<Interval> intervals;

    public static void main(String args[]) {
        Main main = new Main();
        main.intervals = new ArrayList<Interval>();
        Interval i1 = new Interval(10, 30, 1);
        Interval i2 = new Interval(20, 40, 1);
        Interval i3 = new Interval(50, 60, 1);
        Interval i4 = new Interval(0, 70, 1);
        main.intervals.add(i1);
        main.intervals.add(i2);
        main.intervals.add(i3);
        main.intervals.add(i4);
        Interval winningInterval = main.processIntervals(main.intervals);
        System.out.println("winning interval=" + winningInterval);
    }

    public Interval processIntervals(ArrayList<Interval> intervals) {
        SortedSet<Integer> intervalEdges = new TreeSet<Integer>();
        for (int i = 0; i < intervals.size(); i++) {
            Interval currentInterval = intervals.get(i);
            intervalEdges.add(currentInterval.a);
            intervalEdges.add(currentInterval.b);
        }
        System.out.println(intervalEdges);

        // edges stores the same data as intervalEdges, but as a list, for convenience
        ArrayList<Integer> edges = new ArrayList<Integer>(intervalEdges);

        ArrayList<Interval> intersectionIntervals = new ArrayList<Interval>();
        for (int i = 0; i < edges.size() - 1; i++) {
            Interval newInterval = new Interval(edges.get(i), edges.get(i + 1), 0);
            int score = 0; // the sum of the values of the overlapping intervals
            for (int j = 0; j < intervals.size(); j++) {
                if (newInterval.isIncludedInInterval(intervals.get(j)))
                    score = score + intervals.get(j).val;
            }
            newInterval.val = score;
            intersectionIntervals.add(newInterval);
        }
        System.out.println(intersectionIntervals);

        int maxValue = 0;                     // the maximum value of an interval
        Interval x = new Interval(-1, -1, 0); // the interval with the maximum value
        for (int i = 0; i < intersectionIntervals.size(); i++) {
            if (intersectionIntervals.get(i).val > maxValue) {
                maxValue = intersectionIntervals.get(i).val;
                x = intersectionIntervals.get(i);
            }
        }
        return x;
    }
}

class Interval {
    public int a, b, val;

    public Interval(int a, int b, int val) {
        super();
        this.a = a;
        this.b = b;
        this.val = val;
    }

    @Override
    public String toString() {
        return "Interval [a=" + a + ", b=" + b + ", val=" + val + "]";
    }

    boolean isIncludedInInterval(Interval y) {
        // returns true if the current interval is included in interval y
        return this.a >= y.a && this.b <= y.b;
    }
}
This gives the output:
[0, 10, 20, 30, 40, 50, 60, 70]
[Interval [a=0, b=10, val=1], Interval [a=10, b=20, val=2], Interval [a=20, b=30, val=3], Interval [a=30, b=40, val=2], Interval [a=40, b=50, val=1], Interval [a=50, b=60, val=2], Interval [a=60, b=70, val=1]]
winning interval=Interval [a=20, b=30, val=3]
This solves the case where the intervals are straight-line intervals, not angular ones. I will come back with modifications that take into account the wrap-around (the fact that x = x + 360).

How can I improve my 2 sum algorithm for a range of numbers using a hash table?

I have developed an algorithm to solve the 2-sum problem using a hash table, although its performance is dreadful for huge inputs.
My goal is to find all distinct numbers x, y where -10000 <= x + y <= 10000. By the way, is the performance of my code O(n*m), where n is the size of the input and m is the number of keys in the map?
Here is my code:
import com.google.common.base.Stopwatch;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Scanner;

import static com.google.common.collect.Lists.newArrayList;

public class TwoSum {

    private HashMap<Long, Long> map;
    private ArrayList<Long> Ts;
    private long result = 0L;

    public TwoSum() {
        Ts = newArrayList();
        for (long i = -10000; i < 10001; i++) {
            Ts.add(i);
        }
        Scanner scan = new Scanner(System.in);
        map = new HashMap<>();
        while (scan.hasNextLong()) {
            long a = scan.nextLong();
            if (!map.containsKey(a)) {
                map.put(a, a);
            }
        }
    }

    private long count() {
        // long c = 0L;
        for (Long T : Ts) {
            long t = T;
            for (Long x : map.values()) {
                long y = t - x;
                if (map.containsValue(y) && y != x) {
                    result++;
                }
                // System.out.println(c++);
            }
        }
        return result / 2;
    }

    public static void main(String[] args) {
        TwoSum s = new TwoSum();
        Stopwatch stopwatch = Stopwatch.createStarted();
        System.out.println(s.count());
        stopwatch.stop();
        System.out.println("time:" + stopwatch);
    }
}
sample input:
-7590801
-3823598
-5316263
-2616332
-7575597
-621530
-7469475
1084712
-7780489
-5425286
3971489
-57444
1371995
-5401074
2383653
1752912
7455615
3060706
613097
-1073084
7759843
7267574
-7483155
-2935176
-5128057
-7881398
-637647
-2607636
-3214997
-8253218
2980789
168608
3759759
-5639246
555129
-4489068
44019
2275782
-3506307
-8031288
-213609
-4524262
-1502015
-1040324
3258235
32686
1047621
-3376656
7601567
-7051390
6633993
-6245148
4994051
-4259178
856589
6047000
1785511
4449514
-1177519
4972172
8274315
7725694
-4923179
5076288
-876369
-7663790
1613721
4472116
-4587501
3194726
6195357
-3364248
-113737
6260410
1974241
3174620
3510171
7289166
4532581
-6650736
-3782721
7007010
6007081
-7661180
-1372125
-5967818
516909
-7625800
-2700089
-7676790
-2991247
2283308
1614251
-4619234
2741749
567264
4190927
5307122
-5810503
-6665772
output: 6
The gist of your algorithm can be rewritten in pseudocode as:
for all integers t from -10k to 10k,
for all map keys x,
if t - x in map, and t is not 2*x,
count ++
return count / 2
You can easily improve this a bit:
for all integers t from -10k to 10k,
for the lower half of keys x in ascending order such that t is not 2*x
if t - x in map,
count ++
This makes it go twice as fast (you no longer double-count). However, you need to sort your inputs to ensure the map keys are in ascending order. You can add them to a TreeSet and then move them into a LinkedHashSet. Using sets is better than maps if you do not care about the values; all the information is in the keys.
Running time is still O(inputs * range), since you have two nested loops, one with range iterations and the other with half your input. This is a fundamental shortcoming of the algorithm, and no amount of optimization will fix it.
The question is an assignment from Algorithms: Design and Analysis, an online course offered by Stanford University and taught by Prof. Tim Roughgarden. I happen to be taking the same course.
The usual solution, looking up t - i in a hash table for every input i, is O(n) for a single t, but doing that 20001 * 1000000 times results in roughly 20 billion lookups!
A better solution is to create a sorted set xs from the input file and, ∀i ∈ xs, find all numbers from xs in the range [-10000 - i, 10000 - i]. Since a sorted set, by definition, doesn't have duplicates, we don't need to worry about any number in the range being equal to i. There's one gotcha, though, which is really unclear in the problem statement. It is not sufficient to find unique pairs (x, y) ∀ x, y ∈ xs; their sums must also be unique. Obviously, two unique pairs may produce equal sums (e.g. 2 + 4 = 1 + 5 = 6). Thus, we need to keep track of the sums too.
Lastly, we can stop once we go past 5000, since there can't be any more numbers to the right that add up to less than 10000.
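A Java sketch of the same idea (TreeSet.subSet gives the [-10000 - x, 10000 - x] window directly; the class and method names are made up), with the Scala version following below:

import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class TwoSumRange {
    static int countDistinctSums(TreeSet<Long> xs) {
        Set<Long> sums = new HashSet<>();
        for (long x : xs) {
            if (x >= 5000) break; // iteration is ascending, so we can stop here
            for (long y : xs.subSet(-10000 - x, true, 10000 - x, true)) {
                if (y != x) {
                    sums.add(x + y); // count distinct sums, not distinct pairs
                }
            }
        }
        return sums.size();
    }
}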
Here's a Scala solution:
def twoSumCount(xs: SortedSet[Long]): Int = {
xs
.foldLeft(collection.mutable.Set.empty[Long]) { (sums, i) =>
if (i < TenThou / 2) {
xs
// using from makes it slower
.range(-TenThou - i, TenThou - i + 1)
.map(_ + i)
// using diff makes it slower
.withFilter(y => !sums.contains(y))
// adding individual elements is faster than using
// diff/filter/filterNot and adding all using ++=
.foreach(sums.add)
}
sums
}
.size
}
Benchmark:
cores: 8
hostname: ***
name: OpenJDK 64-Bit Server VM
osArch: x86_64
osName: Mac OS X
vendor: Azul Systems, Inc.
version: 11.0.1+13-LTS
Parameters(file -> 2sum): 116.069441 ms

Compute the different ways to make (money) change from $167.37?

This was an interview question:
Given an amount, say $167.37, find all the possible ways of generating the change for this amount using the denominations available in the currency.
If anyone can think of a space- and time-efficient algorithm and supporting code, please share.
Here is the code that I wrote (working). I am trying to find the running time of this; any help is appreciated.
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;

public class change_generation {

    public static void generatechange(float amount, LinkedList<Float> denominations,
                                      HashMap<Float, Integer> useddenominations) {
        if (amount < 0)
            return;
        if (amount == 0) {
            Iterator<Float> it = useddenominations.keySet().iterator();
            while (it.hasNext()) {
                Float val = it.next();
                System.out.println(val + " :: " + useddenominations.get(val));
            }
            System.out.println("**************************************");
            return;
        }
        for (Float denom : denominations) {
            if (amount - denom < 0)
                continue;
            if (useddenominations.get(denom) == null)
                useddenominations.put(denom, 0);
            useddenominations.put(denom, useddenominations.get(denom) + 1);
            generatechange(amount - denom, denominations, useddenominations);
            useddenominations.put(denom, useddenominations.get(denom) - 1);
        }
    }

    public static void main(String[] args) {
        float amount = 2.0f;
        float halfDollar = 0.5f;
        float dollar = 1.0f;
        float ddollar = 2.0f;

        LinkedList<Float> denominations = new LinkedList<Float>();
        denominations.add(ddollar);
        denominations.add(dollar);
        denominations.add(halfDollar);

        HashMap<Float, Integer> useddenominations = new HashMap<Float, Integer>();
        generatechange(amount, denominations, useddenominations);
    }
}
EDIT
This is a specific example of the combination / subset problem, answered here:
Finding all possible combinations of numbers to reach a given sum
--- I am retaining my answer below (as it was useful to someone); however, admittedly, it is not a direct answer to this question ---
ORIGINAL ANSWER
The most common solution is dynamic programming:
First, you find the simplest way to make change for 1, then you use that solution to make change for 2, 3, 4, 5, 6, etc. At each iteration, you "check" whether you can go "backwards" and decrease the number of coins in your answer. For example, up to "4" you must add pennies. But once you get to "5", you can remove all the pennies, and your solution requires only one coin: the nickel. But then, until 9, you again must add pennies, etc.
However, the dynamic programming methodology is not guaranteed to be fast.
Alternatively, you can use a greedy method, where you continually pick the largest coin possible. This is extremely fast, but it doesn't always give you an optimal solution. However, if your coins are 1, 5, 10 and 25, greedy works perfectly, and it is much faster than the dynamic programming method.
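A minimal Java sketch of the greedy method, assuming an amount in cents and coin values listed in descending order (the class and method names are made up):

import java.util.LinkedHashMap;
import java.util.Map;

public class GreedyChange {
    // Returns a coin -> count map; optimal for canonical systems like 1, 5, 10, 25.
    static Map<Integer, Integer> greedyChange(int amountInCents, int[] coinsDescending) {
        Map<Integer, Integer> used = new LinkedHashMap<>();
        for (int coin : coinsDescending) {
            int count = amountInCents / coin;
            if (count > 0) used.put(coin, count);
            amountInCents %= coin;
        }
        return used;
    }
}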
Memoization (kind of) is your friend here. A simple implementation in C:
unsigned int findRes(int n)
{
    /* coins[] and number_of_coins are assumed to be defined elsewhere;
       results[j] holds the number of ways to make amount j. */
    unsigned int results[n + 1];
    for (int j = 0; j <= n; j++)
        results[j] = 0;
    /* Only one way to make zero... no coins. */
    results[0] = 1;
    for (int i = 0; i < number_of_coins; i++)
    {
        for (int j = coins[i]; j <= n; j++)
        {
            results[j] += results[j - coins[i]];
        }
    }
    return results[n];
}
So, what we're really doing here is saying:
1) Our only possible way to make 0 is to use no coins (this is our base case).
2) If we are trying to calculate value m, then let's check each coin k. As long as k <= m, we can use that coin k in a solution.
3) Well, if we can use k in a solution, then couldn't we just take the count for (m - k) and add it to our current total?
I'd try to model this in real life.
If you were at the till and you knew you had to find $167.37, you would probably initially consider $200 as the "simplest" tender, being just two notes. Then, if I had it, I might consider $170, i.e. $100, $50 and $20 (three notes). See where I am going?
More formally, try to over-tender with the minimum number of notes/coins. This would be much easier to enumerate than the full set of possibilities.
Don't use floats; even the tiniest inaccuracies will destroy your algorithm.
Go from the biggest to the smallest coin/banknote. For every possible count of the current coin, call the function recursively. When there are no more coins left, pay the rest in ones and print the solution. This is how it looks in pseudo-C:
#define N 14

int coinValue[N] = {20000, 10000, 5000, 2000, 1000, 500, 200, 100, 50, 20, 10, 5, 2, 1};
int coinCount[N];

void f(int toSpend, int i)
{
    if (coinValue[i] > 1)
    {
        for (coinCount[i] = 0; coinCount[i] * coinValue[i] <= toSpend; coinCount[i]++)
        {
            f(toSpend - coinCount[i] * coinValue[i], i + 1);
        }
    }
    else
    {
        coinCount[i] = toSpend;
        print(coinCount);
    }
}
