Multiobject Comparable/Comparator interface - java

Is there any standard interface or approach usable in collections/streams (max, sort) for the situation where one might need to compare on multiple sides/objects at once?
The signature could be something like
compare(T... toCompare)
instead of
compare(T object1, T object2)
What I would like is an implementation that works with the comparison operations in the Java APIs. But from what I have seen, I think I am forced to stick to unitary (one-against-one) comparisons.
UPDATE: Practical example: I'd like to have a Comparator implementation interpreted by Collections/Stream.max() that allowed me to make multiside comparisons not unitary comparisons (i.e, that accepts multiple T in the compare method). The max function returns the element so that element is the winner of a comparison mechanism, custom implemented, of it against ALL the others, not the winner of n battles 1 vs 1.
UPDATE2: More specific example:
I have (Pineapple, Pizza, Yogurt), and max returns the item for which my custom 1 -> n comparison returns the biggest quotient. This quotient could be something like degreeOfYumie. So Pineapple is yummier than Pizza+Yogurt, Pizza is as yummy as Pineapple+Yogurt, and Yogurt is as yummy as Pizza+Pineapple. So the winner is Pineapple. If I did that unitary, all the ingredients would be equally yummy. Is there any mechanism for implementing a comparator/comparable like that? Perhaps a "sortable" interface that works on collections, streams and queues?

There is no need for a specialized interface. If you have a Comparator that conforms to the specification, it will be transitive and allow comparing multiple objects. To get the maximum out of three or more elements, simply use, e.g.
Stream.of(42, 8, 17).max(Comparator.naturalOrder())
.ifPresent(System.out::println);
// or
Stream.of("foo", "BAR", "Baz").max(String::compareToIgnoreCase)
.ifPresent(System.out::println);
If you are interested in the index of the max element, you can do it like this:
List<String> list=Arrays.asList("foo", "BAR", "z", "Baz");
int index=IntStream.range(0, list.size()).boxed()
.max(Comparator.comparing(list::get, String.CASE_INSENSITIVE_ORDER))
.orElseThrow(()->new IllegalStateException("empty list"));
Regarding your updated question…
You said you want to establish an ordering based on the quotient of an element’s property and the remaining elements. Let’s think this through.
Suppose we have the positive numerical values a, b and c and want to establish an ordering based on a/(b+c), b/(a+c) and c/(a+b).
Then we can transform the term by extending the quotients to have a common denominator:
  a(a+c)(a+b)        b(b+c)(b+a)        c(c+b)(c+a)
---------------    ---------------    ---------------
(a+b)(b+c)(a+c)    (a+b)(b+c)(a+c)    (a+b)(b+c)(a+c)
Since common denominators have no effect on the ordering we can elide them and after expanding the products we get the terms:
a³+a²b+a²c+abc b³+b²a+b²c+abc c³+c²a+c²b+abc
Here we can elide the common summand abc as it has no effect on the ordering.
a³+a²b+a²c b³+b²a+b²c c³+c²a+c²b
then factor out again
a²(a+b+c) b²(a+b+c) c²(a+b+c)
to see that we have a common factor which we can elide as it doesn’t affect the ordering so we finally get
a² b² c²
What does this result tell us? Simply that the quotients have the same ordering as the squares a², b² and c² and, since all values are positive, the same ordering as the values a, b and c themselves. So there is no need to implement a quotient based comparator when we can prove it to have the same outcome as a simple comparator based on the original values a, b and c.
(The picture would be different if negative values were allowed, but since allowing negative values would create the possibility of getting zero as denominator, they are off this use case anyway)
It should be emphasized that any other result for a particular comparator would prove that that comparator is unusable for standard Comparator use cases. If the combined values of all other elements had an effect on the resulting order, in other words, adding another element to the relation would change the ordering, how should an operation like adding an element to a TreeSet or inserting it at the right position of a sorted list work?
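The equivalence derived above can be checked numerically. The following is only a minimal sketch, assuming strictly positive values and at least two elements; the class and method names (QuotientOrderCheck, quotient, maxByQuotient) are made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class QuotientOrderCheck {
    // Quotient of one value against the sum of all the others: v / (total - v).
    static double quotient(double value, double total) {
        return value / (total - value);
    }

    // Picks the element with the largest quotient.
    // Assumption: at least two elements, all strictly positive.
    static double maxByQuotient(List<Double> values) {
        double total = values.stream().mapToDouble(Double::doubleValue).sum();
        return values.stream()
                .max(Comparator.comparingDouble(v -> quotient(v, total)))
                .orElseThrow(() -> new IllegalStateException("empty list"));
    }

    public static void main(String[] args) {
        List<Double> values = Arrays.asList(3.0, 7.5, 1.25);
        double byQuotient = maxByQuotient(values);
        double byValue = values.stream().max(Comparator.naturalOrder()).get();
        System.out.println(byQuotient == byValue); // prints true
    }
}
```

Both comparators select the same element, so the plain value comparator is enough.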

The problem with comparing multiple objects at once is what to return.
A Java comparator returns a negative integer if the first object is "smaller" than the second one, 0 if they are equal, and a positive integer if the first one is the "bigger" one.
If you compare more than two objects, an integer wouldn't suffice to describe the difference between said objects.

If you have a normal Comparable<T> you can combine it any way you want. From being able to compare two things you can build anything (see different sorting algorithms, which usually only need a < implementation).
For example, here's a naive one for "you could say if it's bigger, equal or smaller than ANY of the objects":
<T extends Comparable<T>> int compare(T... toCompare) {
    if (toCompare.length < 2) throw new IllegalArgumentException("Nothing to compare"); // or return something
    T first = toCompare[0];
    // local variables must be initialized before they can be incremented
    int smallerCount = 0;
    int equalCount = 0;
    int biggerCount = 0;
    for (int i = 1, n = toCompare.length; i < n; ++i) {
        int compare = first.compareTo(toCompare[i]);
        if (compare == 0) {
            equalCount++;
        } else if (compare < 0) {
            smallerCount++;
        } else {
            biggerCount++;
        }
    }
    // someCombinationOf is a placeholder: how to merge the three counts is left open
    return someCombinationOf(smallerCount, equalCount, biggerCount);
}
However, I couldn't figure out a proper way of combining them. Consider the sequence (3, 5, 3, 1), where the first 3 is smaller than 5, equal to 3 and bigger than 1, so all counts are 1. Here all your "it's bigger, equal or smaller than ANY" conditions are true at the same time. You could, however, return the counts as an object if it helps to defer the combination of the counts to a later point in time.
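Returning the counts as an object could look roughly like this. It is only a sketch: the Counts holder and the compareToRest method are hypothetical names, not part of any Java API.

```java
import java.util.Arrays;
import java.util.List;

class MultiCompare {
    // Hypothetical holder for the three tallies.
    static final class Counts {
        final int smaller; // how many elements 'first' is smaller than
        final int equal;   // how many elements 'first' is equal to
        final int bigger;  // how many elements 'first' is bigger than

        Counts(int smaller, int equal, int bigger) {
            this.smaller = smaller;
            this.equal = equal;
            this.bigger = bigger;
        }
    }

    // Compares 'first' against every element of 'rest' and tallies the outcomes.
    static <T extends Comparable<? super T>> Counts compareToRest(T first, List<T> rest) {
        int smaller = 0, equal = 0, bigger = 0;
        for (T other : rest) {
            int c = first.compareTo(other);
            if (c == 0) {
                equal++;
            } else if (c < 0) {
                smaller++;
            } else {
                bigger++;
            }
        }
        return new Counts(smaller, equal, bigger);
    }

    public static void main(String[] args) {
        Counts c = compareToRest(3, Arrays.asList(5, 3, 1));
        System.out.println(c.smaller + " " + c.equal + " " + c.bigger); // prints 1 1 1
    }
}
```

The caller can then decide later how the three numbers should be combined, if at all.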

Related

Java sorted data structure that allows for logarithmic time removal of values within a range

I was wondering if there's an interface in the Java built in libraries that implements a data structure which is ordered and supports removal over a range. For example, if we call the data structure S (let's say of Integers), I'd like to be able to search for and remove the subset Q of S such that Q consists of all elements in S in the range [start, end] in O(|Q| log |S|) time.
I know in C++, there is an erase method to the Set interface, but it doesn't seem like Java's TreeSet has something similar. Can anyone help? Thanks!
SortedSet.subSet returns a view, which you can then clear().
For example:
TreeSet<String> set = new TreeSet<>(Arrays.asList("A", "B", "C", "D", "E"));
System.out.println(set); // [A, B, C, D, E]
set.subSet("B", "D").clear(); // Inclusive of B, exclusive of D.
System.out.println(set); // [A, D, E]
(The documentation of SortedSet describes how to modify the bounds of subSet to be exclusive and inclusive, respectively, at least for String).
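For a range that is inclusive on both ends, TreeSet also offers the NavigableSet overload subSet(from, fromInclusive, to, toInclusive). A minimal sketch (RangeRemoval and removeRange are illustrative names):

```java
import java.util.Arrays;
import java.util.TreeSet;

class RangeRemoval {
    // Removes every element in the closed range [start, end] through a live view.
    // Note: subSet throws IllegalArgumentException if start > end.
    static void removeRange(TreeSet<Integer> set, int start, int end) {
        set.subSet(start, true, end, true).clear();
    }

    public static void main(String[] args) {
        TreeSet<Integer> set = new TreeSet<>(Arrays.asList(1, 3, 5, 7, 9));
        removeRange(set, 3, 7);
        System.out.println(set); // prints [1, 9]
    }
}
```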
I don't know of any interfaces/libraries, but you could try using a histogram like structure...
For example, let's say we know our structure will only hold integers between min and max inclusive. Then we can simply make an array in the following manner...
int[] hist = new int[max - min + 1];
If we want to add a number i to the histogram, we can simply do...
hist[i - min]++;
Each index in the array represents a number in the predefined range. Each value in the array represents the number of occurrences of a number in the range.
This structure is extremely powerful since it allows for constant time addition, removal, and search.
Let's say we want to remove all elements in the inclusive range Q = [s, e]. We can run the following linear loop...
for (int i = s; i <= e; i++) {
hist[i - min] = 0;
}
This runs in O(|Q|).
Lastly, if we wanted to make the above structure an ordered set instead of an ordered sequence, we could change our array to hold booleans instead of integers.
For more info check out https://en.wikipedia.org/wiki/Counting_sort.
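The boolean variant described above could be wrapped up as follows. This is a sketch under the same assumption as before (all values lie between min and max inclusive); BoundedIntSet and its method names are made up for illustration.

```java
class BoundedIntSet {
    private final int min;
    private final boolean[] present; // present[v - min] is true when v is in the set

    BoundedIntSet(int min, int max) {
        this.min = min;
        this.present = new boolean[max - min + 1];
    }

    void add(int value) {
        present[value - min] = true;
    }

    boolean contains(int value) {
        return present[value - min];
    }

    // Clears every value in the inclusive range [s, e]; one step per value in the range.
    void removeRange(int s, int e) {
        for (int i = s; i <= e; i++) {
            present[i - min] = false;
        }
    }

    public static void main(String[] args) {
        BoundedIntSet set = new BoundedIntSet(0, 10);
        set.add(3);
        set.add(5);
        set.add(9);
        set.removeRange(4, 9);
        System.out.println(set.contains(3) + " " + set.contains(5)); // prints true false
    }
}
```

Note that the cost of removeRange depends on the width of the range, not on how many stored elements fall inside it.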

Data structure used to perform the union operation on two disjoint sets

What basic data structure would be best to use for the union operation on two disjoint sets?
Are there any algorithms that would run in O(1) time?
I'm thinking some variety of Hash Table, but I'm kind of stuck.
This is for a study guide in Algorithms and Data Structures.
The full question:
The set operation UNION takes two disjoint sets S1 and S2 as input, and returns a
set S = S1 ∪ S2 consisting of all the elements of S1 and S2 (the sets S1 and S2 are
usually destroyed by this operation). Explain how you can support UNION operation
in O(1) time using a suitable data structure. Discuss what data structure you would
use and describe the algorithm for the UNION operation.
If the sets are disjoint, a linked list (with a head and tail) will be enough. The union in this case is only a concatenation of the lists. In C++:
struct LL {
Value *val;
LL *next;
};
struct LList{
LL *head;
LL *tail;
};
and the union operation will be:
void unify(LList* list1, LList* list2) {
// assuming you take care of edge cases
list1->tail->next = list2->head;
list1->tail = list2->tail;
return;
}
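Since most of this page is Java, here is roughly the same idea in Java with the edge cases (empty lists, merging a list with itself) handled. It is only a sketch; ConcatList and unionWith are hypothetical names.

```java
class ConcatList<T> {
    private static final class Node<T> {
        final T value;
        Node<T> next;

        Node(T value) {
            this.value = value;
        }
    }

    private Node<T> head;
    private Node<T> tail;
    private int size;

    void add(T value) {
        Node<T> node = new Node<>(value);
        if (head == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
        size++;
    }

    // O(1) union of two disjoint sets: splice the other list after our tail.
    // The other list is emptied, matching the "sets are destroyed" wording.
    void unionWith(ConcatList<T> other) {
        if (other == this || other.head == null) {
            return;
        }
        if (head == null) {
            head = other.head;
        } else {
            tail.next = other.head;
        }
        tail = other.tail;
        size += other.size;
        other.head = null;
        other.tail = null;
        other.size = 0;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        ConcatList<Integer> a = new ConcatList<>();
        ConcatList<Integer> b = new ConcatList<>();
        a.add(1);
        a.add(2);
        b.add(3);
        a.unionWith(b);
        System.out.println(a.size() + " " + b.size()); // prints 3 0
    }
}
```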
An interesting technique that sometimes applies to that problem (not always though, as you will see), is to use an array of "cycles", each cycle storing a set. The cycles are stored as a bunch of "next element" links, so next[i] will give an integer that represents the next item. In the end the links loop back, so the sets are necessarily disjoint.
The nice thing there is that you can union two sets together by swapping two items. If you have indexes s1 and s2, then the sets they are in (s1 and s2 are not special representatives, you can refer to a set by any of its elements) can be unioned by swapping those positions:
int temp = next[s1];
next[s1] = next[s2];
next[s2] = temp;
Or however you can swap in your language. Java doesn't have a nice equivalent of std::swap(next[s1], next[s2]) as far as I know.
This is obviously related to cyclic linked lists, but more compact. The downside is that you have to prepare your "universe" in advance. With linked lists you can arbitrarily add items. Also if your items are not the integers 0 to n then you will have an array on the side to do the mapping, but that's not really a pure downside or upside, it depends on what you need to do with it.
A bonus upside is that because you can refer to an item by index, it goes together more easily with other data structures. For example, it likes to cooperate with the Union Find structure (which is also an array of integers, well, two of them), inheriting the O(1) Union that both structures offer, keeping the amortized O(α(n)) Find of Union Find, and also (from the cycles structure) keeping the O(m) set enumeration for a set of size m. So you mostly get the best of both worlds.
In case it wasn't obvious, you can initialize the "universe" with "all singletons" like this:
for (int i = 0; i < next.length; i++)
next[i] = i;
The same as in Union Find.
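Putting the pieces together, a small sketch of the cycles structure might look like this. CycleSets, union and members are illustrative names, and the sketch assumes union is only ever called on elements of two different sets (swapping the links of two elements of the same cycle would split that cycle in two).

```java
import java.util.ArrayList;
import java.util.List;

class CycleSets {
    private final int[] next; // next[i] is the successor of i in its cycle

    // Starts with n singleton sets: every element points to itself.
    CycleSets(int n) {
        next = new int[n];
        for (int i = 0; i < n; i++) {
            next[i] = i;
        }
    }

    // O(1) union by swapping the successor links of s1 and s2.
    // Assumption: s1 and s2 currently belong to different sets.
    void union(int s1, int s2) {
        int temp = next[s1];
        next[s1] = next[s2];
        next[s2] = temp;
    }

    // Enumerates the set containing 'start' by walking its cycle once, O(m).
    List<Integer> members(int start) {
        List<Integer> result = new ArrayList<>();
        int i = start;
        do {
            result.add(i);
            i = next[i];
        } while (i != start);
        return result;
    }

    public static void main(String[] args) {
        CycleSets sets = new CycleSets(5);
        sets.union(0, 1);
        sets.union(1, 3);
        System.out.println(sets.members(0)); // prints [0, 1, 3]
    }
}
```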

Stable sort - do we really need it?

I do not understand the underlying problem that the stable sorting algorithm tries to solve.
Arrays.sort(Object[]) Javadoc states:
This sort is guaranteed to be stable: equal elements will
not be reordered as a result of the sort.
But if elements are equal, they are not distinguishable from each other! If you swap two equal elements, this should not affect anything. This is the definition of equality.
So, why do we need stability at all?
UPDATE: My question is about the Collections.sort(List<T>) / Arrays.sort(Object[]) methods, not Collections.sort(List<T>, Comparator<T>) / Arrays.sort(T[], Comparator<T>). The latter ones are a bit different. But there is still no need for stability for them: if you want predictable compound sorts, then you create appropriate compound comparators.
Let's say you have two columns. Column name and column date. Then you start ordering your list by date first, afterwards you sort them by name. If your sort is stable what it will produce is that you get the name ordered correctly and if the names are equal you get them sorted by date since your order is stable. But if your order is not stable you won't have any relative ordering between the equal keys.
// needs java.util.ArrayList, java.util.Comparator and java.util.List
public static void main(String[] args) {
    List<FirstNameLastName> list = new ArrayList<>();
    list.add(new FirstNameLastName("A", "B"));
    list.add(new FirstNameLastName("D", "B"));
    list.add(new FirstNameLastName("F", "B"));
    list.add(new FirstNameLastName("C", "C"));
    list.add(new FirstNameLastName("E", "C"));
    list.add(new FirstNameLastName("B", "C"));
    // first pass: order by first name
    list.sort(Comparator.comparing((FirstNameLastName p) -> p.firstName));
    // A-B, B-C, C-C, D-B, E-C, F-B
    // second pass: stable sort by last name
    list.sort(Comparator.comparing((FirstNameLastName p) -> p.lastName));
    // A-B, D-B, F-B, B-C, C-C, E-C
    // So as you see here, inside the last names B and C each first name
    // is sorted as well.
    // However, if you had just sorted the original list on last name only:
    // A-B, D-B, F-B, C-C, E-C, B-C
}
// static, so that it can be instantiated from the static main method
private static class FirstNameLastName {
    String firstName;
    String lastName;
    public FirstNameLastName(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }
}
Consider the example
[{1, 'c'}, {2, 'a'}, {3, 'a'}]
It is sorted by number field, but not by character. After a stable sort by character:
[{2, 'a'}, {3, 'a'}, {1, 'c'}]
After an unstable sort the following order is possible:
[{3, 'a'}, {2, 'a'}, {1, 'c'}]
You can notice that {3, 'a'} and {2, 'a'} were reordered.
Java 8 example (Java API has only stable sort for objects):
List<Point> list = Arrays.asList(new Point(1,1), new Point(1,0), new Point(2,1));
list.sort((a,b) -> Integer.compare(a.x,b.x));
System.out.println(list);
list.sort((a,b) -> Integer.compare(a.y,b.y));
System.out.println(list);
Unstable sort can suck for UIs. Imagine you are using file explorer in Windows, with a details view of songs. Can you imagine if you kept clicking filetype column to sort by filetype, and it randomly reordered everything within each type-group every time you clicked it?
Stable sort in UI's allows me (the user) to create my own compound sorts. I can chain multiple sorts together, like "sort by name", then "sort by artist", then "sort by type". The final resulting sort prioritizes type, then artist, and then finally, by name. This is because a stable sort actually preserves the previous sort, allowing me to "build" my own sorting from a series of elementary sorts! Whereas unstable sorts nuke the previous sort type's results.
Of course, in code, you'd just make one big, fully defined ordering, and then sort once, rather than a chained "compound" sort like I described above. This is why you tend not to need stable sorts for most application-internal sorting. But when the user drives the click-by-click, stable sorting is the best/simplest.
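To illustrate the difference between the two styles, here is a small sketch that sorts the same songs once with a compound comparator and once with three chained stable sorts. The Song class and the method names are hypothetical; the only API assumed is Comparator.comparing / thenComparing and List.sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class CompoundSortDemo {
    static final class Song {
        final String type, artist, name;

        Song(String type, String artist, String name) {
            this.type = type;
            this.artist = artist;
            this.name = name;
        }

        @Override
        public String toString() {
            return type + "/" + artist + "/" + name;
        }
    }

    // One fully defined ordering: type first, then artist, then name.
    static final Comparator<Song> BY_TYPE_ARTIST_NAME =
            Comparator.comparing((Song s) -> s.type)
                    .thenComparing((Song s) -> s.artist)
                    .thenComparing((Song s) -> s.name);

    static List<Song> sortOnce(List<Song> songs) {
        List<Song> copy = new ArrayList<>(songs);
        copy.sort(BY_TYPE_ARTIST_NAME);
        return copy;
    }

    // Chained stable sorts, lowest priority first, like clicking column headers.
    static List<Song> sortChained(List<Song> songs) {
        List<Song> copy = new ArrayList<>(songs);
        copy.sort(Comparator.comparing((Song s) -> s.name));
        copy.sort(Comparator.comparing((Song s) -> s.artist));
        copy.sort(Comparator.comparing((Song s) -> s.type));
        return copy;
    }

    public static void main(String[] args) {
        List<Song> songs = Arrays.asList(
                new Song("mp3", "B", "y"),
                new Song("flac", "A", "z"),
                new Song("mp3", "A", "x"),
                new Song("mp3", "B", "a"));
        // Both approaches end up with the same order because List.sort is stable.
        System.out.println(sortOnce(songs).toString().equals(sortChained(songs).toString())); // prints true
    }
}
```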
EDIT:
"My question is about the Collections.sort(List<T>) / Arrays.sort(Object[]) methods"
These are generic sorts that work on anything that defines Comparable. Your class's Comparable implementation might return 0 even when the objects are not technically "equal", because you might be trying for a particular ordering. In other words, these methods are every bit as open to custom orderings as Collections.sort(List<T>, Comparator<T>). And you might want stable sort, or you might not.
You may (and almost always do) have elements for which it is true that:
a.compareTo(b) == 0 and
a.equals(b) == false
Take, for example, a List<Product>, where product has a number of properties:
- id
- price
You could see several use cases where you would want to sort Product by id or price but not by other values.
The big benefit that stable sorting brings to the table is that if you sort by price and then by id, you get a List that is sorted by id and in which products with equal ids keep their price order.
If your sorting algorithm is unstable, then the second sort by id might break the order of the initial sort by price.
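A well-known JDK case of a.compareTo(b) == 0 while a.equals(b) is false is BigDecimal, where equals also takes the scale into account. A tiny sketch (the helper name sameOrderButNotEqual is made up):

```java
import java.math.BigDecimal;

class CompareVsEquals {
    // True when the two values sort as equal but are not equal according to equals().
    static boolean sameOrderButNotEqual(BigDecimal a, BigDecimal b) {
        return a.compareTo(b) == 0 && !a.equals(b);
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("1.0");
        BigDecimal b = new BigDecimal("1.00");
        System.out.println(a.compareTo(b) == 0); // prints true: same numeric value
        System.out.println(a.equals(b));         // prints false: different scale
    }
}
```

With such elements, a stable sort guarantees that 1.0 and 1.00 keep their original relative order.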

No concept of "size" in Guava's RangeSet / Range?

I'm eager to use Guava's RangeSets in my program. Besides the features of adding and merging ranges, I'm also interested in the "size" of my ranges.
Some remarks:
no ranges I'm interested in are infinite!
all ranges I'm using are of the bound-type "closedOpen"
the underlying use-case is a discrete time-space (size = summed up time-ticks)
This seems to be something which is not built in (or I didn't see it), and I'm wondering whether there is a clear conceptual reason against it (which would mean I should not implement some getSize() function myself) or not.
Let's have a look at my use-case:
RangeSet<Integer> usageTicks = TreeRangeSet.create();
usageTicks.add(Range.closedOpen(3, 7));
usageTicks.add(Range.closedOpen(12,18));
usageTicks.add(Range.closedOpen(18, 23));
int size = usageTicks.hypotheticalGetSizeFunction(); // size = 15
Is there any reason against the following:
Set<Range<Integer>> setOfRanges = usageTicks.asRanges();
int sum = 0;
for(Range<Integer> range : setOfRanges)
sum += (range.upperEndpoint() - range.lowerEndpoint());
Guava's Range only requires one thing of its enclosed types: that they implement Comparable.
But not all which implement Comparable have a notion of distance. How would you measure the distance between two Strings, for instance?
This is why Guava also has DiscreteDomain and ContiguousSet; with the former you have methods such as next(), prev() and distance(), which is what you are interested in here. Guava's site has an article on it.

Best way to write this program

I have a general programming question, that I have happened to use Java to answer. This is the question:
Given an array of ints write a program to find out how many numbers that are not unique are in the array. (e.g. in {2,3,2,5,6,1,3} 2 numbers (2 and 3) are not unique). How many operations does your program perform (in O notation)?
This is my solution.
int counter = 0;
for(int i=0;i<theArray.length-1;i++){
for(int j=i+1;j<theArray.length;j++){
if(theArray[i]==theArray[j]){
counter++;
break; //go to next i since we know it isn't unique we dont need to keep comparing it.
}
}
}
return counter;
Now, In my code every element is being compared with every other element so there are about n(n-1)/2 operations. Giving O(n^2). Please tell me if you think my code is incorrect/inefficient or my O expression is wrong.
Why not use a Map as in the following example:
// needs java.util.ArrayList, java.util.HashMap, java.util.List and java.util.Map
// NOTE: Maps can't take primitives as keys or values, only objects.
// If theArray is an int[], autoboxing converts each int to an Integer for you.
Map<Integer, Integer> elementCount = new HashMap<Integer, Integer>();
for (int i = 0; i < theArray.length; i++) {
    if (elementCount.containsKey(theArray[i])) {
        // seen before: increment its count
        elementCount.put(theArray[i], elementCount.get(theArray[i]) + 1);
    } else {
        // first occurrence
        elementCount.put(theArray[i], 1);
    }
}
List<Integer> moreThanOne = new ArrayList<Integer>();
for (Integer key : elementCount.keySet()) {
    if (elementCount.get(key) > 1) {
        moreThanOne.add(key);
    }
}
// do whatever you want with the moreThanOne list
Notice that this method requires iterating through the list twice (I'm sure there's a way to do it iterating once). It iterates once through theArray, and then implicitly again as it iterates through the key set of elementCount, which if no two elements are the same, will be exactly as large. However, iterating through the same list twice serially is still O(n) instead of O(n^2), and thus has much better asymptotic running time.
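One possible way to do it with a single iteration is to keep two sets, one for values seen so far and one for values seen more than once. This is only a sketch; NonUniqueCounter and countNonUnique are illustrative names.

```java
import java.util.HashSet;
import java.util.Set;

class NonUniqueCounter {
    // Counts how many distinct values occur more than once, in one pass over the array.
    static int countNonUnique(int[] values) {
        Set<Integer> seen = new HashSet<>();
        Set<Integer> repeated = new HashSet<>();
        for (int v : values) {
            // Set.add returns false when the value was already present
            if (!seen.add(v)) {
                repeated.add(v);
            }
        }
        return repeated.size();
    }

    public static void main(String[] args) {
        System.out.println(countNonUnique(new int[] {2, 3, 2, 5, 6, 1, 3})); // prints 2
    }
}
```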
Your code doesn't do what you want. If you run it using the array {2, 2, 2, 2}, you'll find that it returns 3 instead of 1. You'll have to find a way to make sure that the counting is never repeated.
However, your Big O expression is correct as a worst-case analysis, since every element might be compared with every other element.
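One way to avoid the repeated counting while staying with the nested-loop approach is to count a value only at its first occurrence. A sketch, still O(n^2) in the worst case (BruteForceNonUnique is a made-up name):

```java
class BruteForceNonUnique {
    static int countNonUnique(int[] a) {
        int counter = 0;
        for (int i = 0; i < a.length; i++) {
            // Skip a[i] if the same value already appeared earlier: it was handled there.
            boolean seenBefore = false;
            for (int k = 0; k < i && !seenBefore; k++) {
                seenBefore = a[k] == a[i];
            }
            if (seenBefore) {
                continue;
            }
            // First occurrence: count it once if the value shows up again later.
            for (int j = i + 1; j < a.length; j++) {
                if (a[i] == a[j]) {
                    counter++;
                    break;
                }
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(countNonUnique(new int[] {2, 2, 2, 2})); // prints 1
    }
}
```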
Your analysis is correct, but you could easily get it down to O(n) time. Try using a HashMap<Integer,Integer> to store previously-seen values as you iterate through the array (key is the number that you've seen, value is the number of times you've seen it). Each time you try to add an integer into the hashmap, check to see if it's already there. If it is, just increment that integer's counter. Then, at the end, loop through the map and count the number of times you see a key with a corresponding value higher than 1.
First, your approach is what I would call "brute force", and it is indeed O(n^2) in the worst case. It's also incorrectly implemented, since numbers that repeat n times are counted n-1 times.
Setting that aside, there are a number of ways to approach the problem. The first (that a number of answers have suggested) is to iterate the array, using a map to keep track of how many times the given element has been seen. Assuming the map uses a hash table for the underlying storage, the average-case complexity should be O(n), since gets and inserts from the map should be O(1) on average, and you only need to iterate the list and map once each. Note that this is still O(n^2) in the worst case, since there's no guarantee that the hashing will produce constant-time results.
Another approach would be to simply sort the array first, and then iterate the sorted array looking for duplicates. This approach is entirely dependent on the sort method chosen, and can be anywhere from O(n^2) (for a naive bubble sort) to O(n log n) worst case (for a merge sort) to O(n log n) average-though-likely case (for a quicksort.)
That's the best you can do with the sorting approach assuming arbitrary objects in the array. Since your example involves integers, though, you can do much better by using radix sort, which will have worst-case complexity of O(dn), where d is essentially constant (since it maxes out at 10 decimal digits for 32-bit integers.)
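The sort-then-scan approach described above could be sketched like this, using the JDK's Arrays.sort on a copy of the input (SortedNonUnique is an illustrative name):

```java
import java.util.Arrays;

class SortedNonUnique {
    static int countNonUnique(int[] values) {
        int[] sorted = values.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int count = 0;
        for (int i = 1; i < sorted.length; i++) {
            // Count each run of equal values once, when its second element is reached.
            boolean equalsPrevious = sorted[i] == sorted[i - 1];
            boolean runAlreadyCounted = i >= 2 && sorted[i] == sorted[i - 2];
            if (equalsPrevious && !runAlreadyCounted) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countNonUnique(new int[] {2, 3, 2, 5, 6, 1, 3})); // prints 2
    }
}
```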
Finally, if you know that the elements are integers, and that their magnitude isn't too large, you can improve the map-based solution by using an array of size ElementMax, which would guarantee O(n) worst-case complexity, with the trade-off of requiring 4*ElementMax additional bytes of memory.
I think your time complexity of O(n^2) is correct.
If space complexity is not the issue, then you can use an array of 256 counters (one per ASCII code, for example) and fill it with the counts. For example:
// Java zero-initializes int arrays, so no explicit initialization is needed.
// Assumes every value in theArray lies in the range 0..255.
// This runs in O(n+m), where n is the length of theArray and m is the length of array.
int[] array = new int[256];
for (int i = 0; i < theArray.length; i++)
    array[theArray[i]] = array[theArray[i]] + 1;
for (int i = 0; i < array.length; i++)
    if (array[i] > 1)
        System.out.print(i);
As others have said, an O(n) solution is quite possible using a hash. In Perl:
my @data = (2,3,2,5,6,1,3);
my %count;
$count{$_}++ for @data;
my $n = grep $_ > 1, values %count;
print "$n numbers are not unique\n";
OUTPUT
2 numbers are not unique
