I have a general method that needs to sort and search for generic objects,
Old Version:
public <T> int isIn(T[] list, T t) {
Arrays.sort(list, Comparator.comparingInt(Object::hashCode));
return Arrays.binarySearch(values,updatedObject.hashCode())
}
New Version:
public <T> int isIn(T[] list, T t) {
Arrays.sort(list, Comparator.comparingInt(Object::hashCode));
return Arrays.binarySearch(values,updatedObject.hashCode(),Comparator.comparingInt(Object::hashCode))
}
Assuming that hashcode() is implemented properly, I can't think of any case where this could fail or give any error.
What are the cases, if any, that this could give us an error !
NOTE: Code is edited, I added Comparator to the binary search
There are quite a few issues with your code.
hashCode is used to infer object inequality. Meaning that two objects may virtually have the same hashCode and yet an equals invocation may return false (see also here). In this case, the outcome of binarySearch may yield unexpected results (see also here).
binarySearch invocations without passing a Comparator assume that your objects are sorted based on their natural ordering, which implies they are Comparable (which we cannot know with your code). If your objects were not Comparable, you would get a ClassCastException when invoking binarySearch as you do. You would at the very least need to use the Comparator employed when sorting the array, and pass it to your binarySerach invocation (see here for overload API).
Note: OP added Comparator and edited question - only valid with previous version of the question.
As a general OO guideline, you may want to rethink what objects you are going to use your isIn method with, and maybe bind the generic type to a broad type that yields enough information in terms of properties to have a natural order, or at least to be sortable using a Comparator based on properties that it makes sense to sort with. TL;DR the hash code's purpose is not to sort objects.
Following up on point 3, you'd expect an isIn method to return a boolean type, not an integer type. If your goal is simply to infer whether your array contains a given value, override equals in the objects passed, wrap your array in a suitable Collection and invoke contains.
This should fail; because, when two objects contain the same hashcode the order will be unspecified. The contract on hash codes is that all objects returning equals(...) as true have the same hashcode, not all equal hashcodes have equals(...) return true.
Hashcodes are not unique, that's why we have the hashcode act as the first means of determining cheap equality, but always have to follow up with the equals(...) method afterwards.
Now, the equals(...) method doesn't provide comparision (ordering, so you'll have to back it up with a Comparator for the object anyway. Since you initially ordered on hashcode, your Comparator will have to provide orderings that don't violate the hascode-first approach.
// pseudocode
unless hashcodes are equal, return the value of Comparator.comparingInt()
when hashcodes are equal, pick some stable ordering.
Note that this would work well within it's bubble of your code; but, if you wanted to assure that someone didn't break this in the future, you might need to also add a Comparable interface to your object, and have it return the "hashcode, then sub-ordering" with the same logic as your Comparator.
Assuming that hashcode() is implemented properly
This is a big assumption as Java's library classes (e.g. String) themselves have collision in hashCode() values, e.g.
System.out.println("Aa".hashCode() + "," + "BB".hashCode());
This will print the same hash code and hence, comparator will consider both the objects equal.
Your code will produce an index of an object with the same hash code, but since objects are allowed to have the same hash code without being equal, you need to perform some additional work before returning the result.
Walk the list of items with the same hash code until you find equality, find a different hash code, or go off the end of the array:
public <T> int isIn(T[] list, T t) {
Comparator<T> cmp = Comparator.comparingInt(Object::hashCode);
Arrays.sort(list, cmp);
int pos = Arrays.binarySearch(list, t, cmp);
if (pos < 0) {
return -1;
}
// At this point pos is a valid index, but it may not be of the same object
// Continue with a linear search of the equal range of hash codes from here:
int hash = t.hashCode();
while (pos != list.length()) {
if (t.equals(list[pos]) {
return pos;
}
if (list[pos].hashCode() != hash) {
return -1;
}
pos++;
}
return -1;
}
Note: Although this is consistent with the way the hashCode/equals are supposed to be used in Java, the approach is less efficient than using a HashSet<T>, because it requires O(n*log n) sorting step.
Related
Suppose I have a class not implementing the Comparable interface like
class Dummy {
}
and a collection of instances of this class plus some function external to the class that allows comparing these instances partially (a map will be used for this purpose below):
Collection<Dummy> col = new ArrayList<>();
Map<Dummy, Integer> map = new HashMap<>();
for (int i = 0; i < 12; i++) {
Dummy d = new Dummy();
col.add(d);
map.put(d, i % 4);
}
Now I want to sort this collection using the TreeSet class with a custom comparator:
TreeSet<Dummy> sorted = new TreeSet<>(new Comparator<Dummy>() {
#Override
public int compare(Dummy o1, Dummy o2) {
return map.get(o1) - map.get(o2);
}
});
sorted.addAll(col);
The result is obviously unsatisfactory (contains less elements than the initial collection). This is because such a comparator is not consistent with equals, i.e. sometimes returns 0 for non-equal elements. My next attempt was to change the compare method of the comparator to
#Override
public int compare(Dummy o1, Dummy o2) {
int d = map.get(o1) - map.get(o2);
if (d != 0)
return d;
if (o1.equals(o2))
return 0;
return 1; // is this acceptable?
}
It seemingly gives the desired result for this simple demonstrational example but I'm still in doubt: is it correct to always return 1 for unequal (but undistinguishable by the map) objects? Such a relation still violates the general contact for the Comparator.compare() method because sgn(compare(x, y)) == -sgn(compare(y, x)) is, generally, wrong. Do I really need to implement a correct total ordering for TreeSet to work correctly or the above is enough? How to do this when an instance has no fields to compare?
For more real-life example imagine that, instead of Dummy, you have a type parameter T of some generic class. T may have some fields and implement the equals() method through them, but you don't know these fields and yet need to sort instances of this class according to some external function. Is this possible with the help of TreeSet?
Edit
Using System.identityHashCode() is a great idea but there is (not so small) chance of collision.
Besides possibility of such a collision, there is one more pitfall. Suppose you have 3 objects: a, b, c such that map.get(a) = map.get(b) = map.get(c) (here = isn't assignment but the mathematical equality), identityHashCode(a) < identityHashCode(b) < identityHashCode(c), a.equals(c) is true, but a.equals(b) (and hence c.equals(b)) is false. After adding these 3 elements to a TreeSet in this order: a, b, c you can get into a situation when all of them have been added to the set, that contradicts the prescribed behaviour of the Set interface - it should not contain equal elements. How to deal with that?
In addition, it would be great if someone familiar with TreeSet mechanics explained to me what does the term "well-defined" in the phrase "The behavior of a set is well-defined even if its ordering is inconsistent with equals" from TreeSet javadoc mean.
Unless you have an absolutely huge amount of Dummy objects and really bad luck, you can use System.identityHashCode()to break ties:
Comparator.<Dummy>comparingInt(d -> map.get(d))
.thenComparingInt(System::identityHashCode)
Your comparator is not acceptable, since it violates the contract: you have d1 > d2 and d2 > d1 at the same time if they're not equal and don't share the same value in the map.
This answer covers just the first example in the question. The remainder of the question, and the various edits, are I think better answered as part of separate, focused questions.
The first example sets up 12 instances of Dummy, creates a map that maps each instance to an Integer in the range [0, 3], and then adds the 12 Dummy instances to a TreeSet. That TreeSet is provided with a comparator that uses the Dummy-to-Integer map. The result is that the TreeSet contains only four of the Dummy instances. The example concludes with the following statement:
The result is obviously unsatisfactory (contains less elements than the initial collection). This is because such a comparator is not consistent with equals, i.e. sometimes returns 0 for non-equal elements.
This last sentence is incorrect. The result contains fewer elements than were inserted because the comparator considers many of the instances to be duplicates and therefore they aren't inserted into the set. The equals method doesn't enter the discussion at all. Therefore, the concept of "consistent with equals" isn't relevant to this discussion. TreeSet never calls equals. The comparator is the only thing that determines membership in the TreeSet.
This seems like an unsatisfactory result, but only because we happen "know" that there are 12 distinct Dummy instances. However, the TreeSet doesn't "know" that they are distinct. It only knows how to compare the Dummy instances using the comparator. When it does so, it finds that several are duplicates. That is, the comparator returns 0 sometimes even though it's being called with Dummy instances that we believe to be distinct. That's why only four Dummy instances end up in the TreeSet.
I'm not entirely sure what the desired outcome is, but it seems like the result TreeSet should contain all 12 instances ordered by values in the Dummy-to-Integer map. My suggestion was to use Guava's Ordering.arbitrary() which provides a comparator that distinguishes between distinct-but-otherwise-equal elements, but does so in a way that satisfies the general contract of Comparator. If you create the TreeSet like this:
SortedSet<Dummy> sorted = new TreeSet<>(Comparator.<Dummy>comparingInt(map::get)
.thenComparing(Ordering.arbitrary()));
the result will be that the TreeSet contains all 12 Dummy instances, sorted by Integer value in the map, and with Dummy instances that map to the same value ordered arbitrarily.
In the comments, you stated that the Ordering.arbitrary doc "unequivocally cautions against using it in SortedSet". That's not quite right; that doc says,
Because the ordering is identity-based, it is not "consistent with Object.equals(Object)" as defined by Comparator. Use caution when building a SortedSet or SortedMap from it, as the resulting collection will not behave exactly according to spec.
The phrase "not behave exactly according to spec" really means that it will behave "strangely" as described in the class doc of Comparator:
The ordering imposed by a comparator c on a set of elements S is said to be consistent with equals if and only if c.compare(e1, e2)==0 has the same boolean value as e1.equals(e2) for every e1 and e2 in S.
Caution should be exercised when using a comparator capable of imposing an ordering inconsistent with equals to order a sorted set (or sorted map). Suppose a sorted set (or sorted map) with an explicit comparator c is used with elements (or keys) drawn from a set S. If the ordering imposed by c on S is inconsistent with equals, the sorted set (or sorted map) will behave "strangely." In particular the sorted set (or sorted map) will violate the general contract for set (or map), which is defined in terms of equals.
For example, suppose one adds two elements a and b such that (a.equals(b) && c.compare(a, b) != 0) to an empty TreeSet with comparator c. The second add operation will return true (and the size of the tree set will increase) because a and b are not equivalent from the tree set's perspective, even though this is contrary to the specification of the Set.add method.
You seemed to indicate that this "strange" behavior was unacceptable, in that Dummy elements that are equals shouldn't appear in the TreeSet. But the Dummy class doesn't override equals, so it seems like there's an additional requirement lurking behind here.
There are some additional questions added in later edits to the question, but as I mentioned above, I think these are better handled as separate question(s).
UPDATE 2018-12-22
After rereading the question edits and comments, I think I've finally figured out what you're looking for. You want a comparator over any object that provides a primary ordering based on some int-valued function that may result in duplicate values for unequal objects (as determined by the objects' equals method). Therefore, a secondary ordering is required that provides a total ordering over all unequal objects, but which returns zero for objects that are equals. This implies that the comparator should be consistent with equals.
Guava's Ordering.arbitrary comes close in that it provides an arbitrary total ordering over any objects, but it only returns zero for objects that are identical (that is, ==) but not for objects that are equals. It's thus inconsistent with equals.
It sounds, then, that you want a comparator that provides an arbitrary ordering over unequal objects. Here's a function that creates one:
static Comparator<Object> arbitraryUnequal() {
Map<Object, Integer> map = new HashMap<>();
return (o1, o2) -> Integer.compare(map.computeIfAbsent(o1, x -> map.size()),
map.computeIfAbsent(o2, x -> map.size()));
}
Essentially, this assigns a sequence number to every newly seen unequal object and keeps these numbers in a map held by the comparator. It uses the map's size as the counter. Since objects are never removed from this map, the size and thus the sequence number always increases.
(If you intend for this comparator to be used concurrently, e.g., in a parallel sort, the HashMap should be replaced with a ConcurrentHashMap and the size trick should be modified to use an AtomicInteger that's incremented when new entries are added.)
Note that the map in this comparator builds up entries for every unequal object that it's ever seen. If this is attached to a TreeSet, objects will persist in the comparator's map even after they've been removed from the TreeSet. This is necessary so that if objects are added or removed, they'll retain consistent ordering over time. Guava's Ordering.arbitrary uses weak references to allow objects to be collected if they're no longer used. We can't do that, because we need to preserve the ordering of non-identical but equal objects.
You'd use it like this:
SortedSet<Dummy> sorted = new TreeSet<>(Comparator.<Dummy>comparingInt(map::get)
.thenComparing(arbitraryUnequal()));
You had also asked what "well-defined" means in the following:
The behavior of a set is well-defined even if its ordering is inconsistent with equals
Suppose you were to use a TreeSet using a comparator that's inconsistent with equals, such as the one using Guava's Ordering.arbitrary shown above. The TreeSet will still work as expected, consistent with itself. That is, it will maintain objects in a total ordering, it will not contain any two objects for which the comparator returns zero, and all its methods will work as specified. However, it is possible for there to be an object for which contains returns true (since that's computed using the comparator) but for which equals is false if called with the object actually in the set.
For example, BigDecimal is Comparable but its comparison method is inconsistent with equals:
> BigDecimal z = new BigDecimal("0.0")
> BigDecimal zz = new BigDecimal("0.00")
> z.compareTo(zz)
0
> z.equals(zz)
false
> TreeSet<BigDecimal> ts = new TreeSet<>()
> ts.add(z)
> HashSet<BigDecimal> hs = new HashSet<>(ts)
> hs.equals(ts)
true
> ts.contains(zz)
true
> hs.contains(zz)
false
This is what the spec means when it says things can behave "strangely". We have two sets that are equal. Yet they report different results for contains of the same object, and the TreeSet reports that it contains an object even though that object is unequal to an object in the set.
Here's the comparator I ended up with. It is both reliable and memory efficient.
public static <T> Comparator<T> uniqualizer() {
return new Comparator<T>() {
private final Map<T, Integer> extraId = new HashMap<>();
private int id;
#Override
public int compare(T o1, T o2) {
int d = Integer.compare(o1.hashCode(), o2.hashCode());
if (d != 0)
return d;
if (o1.equals(o2))
return 0;
d = extraId.computeIfAbsent(o1, key -> id++)
- extraId.computeIfAbsent(o2, key -> id++);
assert id > 0 : "ID overflow";
assert d != 0 : "Concurrent modification";
return d;
}
};
}
It creates total ordering on all objects of the given class T and thus allows to distinguish objects not distinguishable by a given comparator via attaching to it like this:
Comparator<T> partial = ...
Comparator<T> total = partial.thenComparing(uniqualizer());
In the example given at the question, T is Dummy and
partial = Comparator.<Dummy>comparingInt(map::get);
Note that you don't need to specify the type T when calling uniqualizer(), complier automatically determines it via type inference. You only have to make sure that hashCode() in T is consistent with equals(), as described in the general contract of hashCode(). Then uniqualizer() will give you the comparator (total) consistent with equals() and you can use it in any code that requires comparing objects of type T, e.g. when creating a TreeSet:
TreeSet<T> sorted = new TreeSet<>(total);
or sorting a list:
List<T> list = ...
Collections.sort(list, total);
There is a java bean Car that might contains two values: model and price.
Now suppose I override equals() and hashcode() checking only for model in that way:
public boolean equals(Object o) {
return this.model.equals(o.model);
}
public int hashCode() {
return model.hashCode();
}
This permit me to check if an arraylist already contain an item Car of the same model (and doesn't matter the price), in that way:
List<Car> car = new ArrayList<Car>();
car.add(new Car("carA",100f));
car.add(new Car("carB",101f));
car.add(new Car("carC",110f));
System.out.println(a.contains(new Car("carB",111f)));
It returns TRUE. That's fine, because the car already exist!
But now I decide that the ArrayList is not good, because I want to maintain the items ordered, so I substitute it with a TreeSet in this way:
Set<Car> car = new TreeSet<Car>(new Comparator<Car>() {
#Override
public int compare(Car car1, Car car2) {
int compPrice = - Float.compare(car1.getPrice(), car2.getPrice());
if (compPrice > 0 || compPrice < 0)
return compPrice;
else
return car1.getModel().compareTo(car2.getModel());
}});
car.add(new Car("carA",100f));
car.add(new Car("carB",101f));
car.add(new Car("carC",110f));
System.out.println(a.contains(new Car("carB",111f)));
But now there is a problem, it return FALSE... why?
It seems that when I invoke contains() using an arrayList the method equals() is invoked.
But it seems that when I invoke contains() using a TreeSet with a comparator, the comparator is used instead.
Why does that happen?
TreeSet forms a binary tree keeping elements according to natural (or not) orders, so in order to search quickly one specific element is the collection, TreeSet uses Comparable or Comparator instead of equals().
As TreeSet JavaDoc precises:
Note that the ordering maintained by a set (whether or not an explicit
comparator is provided) must be consistent with equals if it is to
correctly implement the Set interface. (See Comparable or Comparator
for a precise definition of consistent with equals.) This is so
because the Set interface is defined in terms of the equals operation,
but a TreeSet instance performs all element comparisons using its
compareTo (or compare) method, so two elements that are deemed equal
by this method are, from the standpoint of the set, equal. The
behavior of a set is well-defined even if its ordering is inconsistent
with equals; it just fails to obey the general contract of the Set
interface.
We can find a similarity with the HashCode/Equals contract:
If equals() returns true, hashcode() has to return true too in order to be found during search.
Likewise with TreeSet:
If contains() (using Comparator or Comparable) returns true, equals() has to return true too in order to be consistent with equals().
THEREFORE: Fields used within TreeSet.equals() method have to be exactly the same (no more, no less) than within your Comparator implementation.
A TreeSet is implicitly sorted, and it uses a Comparator for this sorting. The equals() method can only tell you if two objects are the same or different, not how they should be ordered for sorting. Only a Comparator can do that.
More to the point, a TreeSet also uses comparisons for searching. This is sort of the whole point of tree-based map/set. When the contains() method is called, a binary search is performed and the target is either found or not found, based on how the comparator is defined. The comparator defines not only logical order but also logical identity. If you are relying on logical identity defined by an inconsistent equals() implementation, then confusion will probably ensue.
The reason for the different behaviour is, that you consider the price member in the compare method, but ignore it in equals.
new Car("carB",101f) // what you add to the list
new Car("carB",111f) // what you are looking for
Both instances are "equals" (sorry...) since their model members are equal (and the implementation stops after that test). They "compare" as different, though, because that implementation also checks the price member.
I may be wrong but for me, we can override equals for an object so that you consider them has being meaningfully equals.
All the entry in a map have distinct keys, and all the entries in set have distinct values (not meaningfully equals)
But when using a TreeMap or a TreeSet, you can provide a comparator.
I noticed that when a comparator is provided, the object's equals method is bypassed, and two objets are considered equals when the comparator returns 0.
Thus, we have 2 objects but inside of a map keyset, or a set, only one is kept.
I'd like to know if it is possible, using a sorted collection, to make a distinction for two different instances.
Here's an easy sample:
public static void main(String[] args) {
TreeSet<String> set = new TreeSet<String>();
String s1 = new String("toto");
String s2 = new String("toto");
System.out.println(s1 == s2);
set.add(s1);
set.add(s2);
System.out.println(set.size());
}
Notice that using new String("xxx") bypass the use of the String pool, thus s1 != s2.
I'd like to know how to implement a comparator so that the set size is 2 and not 1.
The main question is: for two distinct instances of the same String value, how can i return something != 0 in my comparator?
Note that i'd like to have that comparator respect the rules:
Compares its two arguments for order. Returns a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second. The implementor must ensure that sgn(compare(x, y)) == -sgn(compare(y, x)) for all x and y. (This implies that compare(x, y) must throw an exception if and only if compare(y, x) throws an exception.)
The implementor must also ensure that the relation is transitive: ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0.
Finally, the implementer must ensure that compare(x, y)==0 implies that sgn(compare(x, z))==sgn(compare(y, z)) for all z.
It is generally the case, but not strictly required that (compare(x, y)==0) == (x.equals(y)). Generally speaking, any comparator that violates this condition should clearly indicate this fact. The recommended language is "Note: this comparator imposes orderings that are inconsistent with equals."
I can use a trick like:
public int compare(String s1,String s2) {
if s1.equals(s2) { return -1 }
...
}
It seems to work fine but the rules are not respected since compare(s1,s2) != -compare(s2,s1)
So is there any elegant solution to this problem?
Edit: for those wondering why i ask such a thing. It's more by curiosity than any real life problem.
But i've already been in a situation like that and though about a solution to this problem:
Imagine you have:
class Label {
String label;
}
For each label you have an associated String value.
Now what if you want to have a map, label->value.
But now what if you want to be able to have twice the same label as a map key?
Ex
"label" (ref1) -> value1
"label" (ref2) -> value2
You can implement equals so that two distinct Label instances are not equals -> i think it works for HashMap.
But what if you want to be able to sort these Label objects by alphabetical order?
You need to provide a comparator or implement comparable.
But how can we make an order distinction between 2 Labels having the same label?
We must!
compare(ref1,ref2) must not return 0. But should it return -1 or 1 ?
We could compare the memory address or something like that to take such a decision but i think it's not possible in Java...
If you're using Guava, you can make use of Ordering.arbitrary(), which will impose an additional order on elements which remains consistent for the life of the VM. You can use this to break ties in your Comparator in a consistent way.
However you could be using the wrong data structure. Have you considered using a Multiset (e.g. TreeMultiset), which allows multiple instances to be added?
Try using the following Comparator (for your example):
Comparator<String> comp = Ordering.natural().compound(Ordering.arbitrary());
This will sort things according to their natural Comparable ordering, but when the natural ordering is equal it will fall back to an arbitrary ordering so that distinct objects remain distinct.
I'm not sure it's a great idea to do that. From the javadoc for Comparator:
Caution should be exercised when using a comparator capable of
imposing an ordering inconsistent with equals to order a sorted set
(or sorted map). Suppose a sorted set (or sorted map) with an explicit
comparator c is used with elements (or keys) drawn from a set S. If
the ordering imposed by c on S is inconsistent with equals, the sorted
set (or sorted map) will behave "strangely." In particular the sorted
set (or sorted map) will violate the general contract for set (or
map), which is defined in terms of equals.
if you want a sorted collection with equal objects, you can put all your objects in a List and use Collections.sort().
You might want to use a SortedSet<Collection<String>> or similar, since - as you mentioned - a sorted doesn't allow you to add multiple equal entries.
Altenatively you can use Guava's MultiSet.
From the JavaDoc on SortedSet:
Note that the ordering maintained by a sorted set (whether or not an explicit comparator is provided) must be consistent with equals if the sorted set is to correctly implement the Set interface.
However, one question still remains: why do you want to have two distinct instances that are logically equal (that's what equals() actually means).
The comparator should actualy return 0 only when the two references refer to the same object, like this:
public int compare(String s1,String s2) {
if (s1!=s2) {
int result = s1.compareTo(s2);
if (result == 0) {
return -1;
} else {
return result;
}
} else {
return 0;
}
}
TreeSet has a constructor that takes a comparator, meaning even if the objects you store aren't Comparable objects by themselves, you can provide a custom comparator.
Is there an analogous implementation of a nonordered set? (e.g. an alternative to HashSet<T> that takes a "hasher" object that calculates equals() and hashCode() for objects T that may be different from the objects' own implementations?)
C++ std::hash_set gives you this, just wondering if there's something for Java.
Edit: #Max brings up a good technical point about equals() -- fair enough; and it's true for TreeMap and HashMap keys via Map.containsKey(). But are there other well-known data structures out there that allow organization by custom hashers?
No, having a "hasher" object is not supported by the Collections specifications. You can certainly implement your own collection that supports this but another way to do this is to consider the Hasher to be a wrapping object that you store in your HashSet instead.
Set<HasherWrapper<Foo>> set = new HashSet<HasherWrapper<Foo>>();
set.add(new HasherWrapper(foo));
...
The wrapper class would then look something like:
private class HasherWrapper<T> {
T wrappedObject;
public HasherWrapper(T wrappedObject) {
this.wrappedObject = wrappedObject;
}
#Override
public int hashCode() {
// special hash code calculations go here
}
#Override
public boolean equals(Object obj) {
// special equals code calculations go here
}
}
There is no such implementation in the standard library, but it doesn't prevent you from rolling your own. This is something i've often wanted to have myself.
See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4771660 for the reason:
We wanted to avoid the complexity. We seriously entertained this notion
at the time the collections framework was designed, but rejected it. The
power-to-weight ration seemed to low. We felt that equals was what you
wanted 95% of the time; ==, 4%; and something else 1%. Writing sensible
contracts for bulk operations when is very tricky when equality predicates
differ.
No, there is not and there can not be by specification. Moreover, you misunderstood the way TreeSet uses it's Comparator.
From TreeSet Javadoc:
Note that the ordering maintained by a set (whether or not an explicit
comparator is provided) must be consistent with equals if it is to
correctly implement the Set interface. (See Comparable or Comparator
for a precise definition of consistent with equals.) This is so
because the Set interface is defined in terms of the equals operation,
but a TreeSet instance performs all element comparisons using its
compareTo (or compare) method, so two elements that are deemed equal
by this method are, from the standpoint of the set, equal. The
behavior of a set is well-defined even if its ordering is inconsistent
with equals; it just fails to obey the general contract of the Set
interface.
From Comparable javadoc:
The natural ordering for a class C is said to be consistent with
equals if and only if e1.compareTo(e2) == 0 has the same boolean value
as e1.equals(e2) for every e1 and e2 of class C. Note that null is not
an instance of any class, and e.compareTo(null) should throw a
NullPointerException even though e.equals(null) returns false.
From Collection javadoc:
boolean contains(Object o)
Returns true if this collection contains
the specified element. More formally, returns true if and only if this
collection contains at least one element e such that (o==null ?
e==null : o.equals(e)).
Therefore, by specification there can not be any kind of class that implements Collection<E> interface and fully depend on some external Comparator-style object to insert objects. All Collections should use equals method of an Object class to verify if the object is already inserted.
There is definitely nothing like that, hashcode() and equals() are defining attributes of an object and should not be changed. They define what makes an object equal to each other and this shouldn't be different from one set to another. The only way to do what you are talking about is to subclass the object and write a new hashcode() and equals() and this would only really make sense to do if the subclass had a defining variable that should be added in addition to the super class' hashcode() and equals(). I know this may not be what you are aiming for but I hope this helps. If you explain your reasoning more for wanting this then it might help to find a better solution if there exists one.
I'd like to have a SortedSet of Collections (Sets themselves, in this case, but not necessarily in general), that sorts by Collection size. This seems to violate the proscription to have the Comparator be consistent with equals() - i.e., two collections may be unequal (by having different elements), but compare to the same value (because they have the same number of elements).
Notionally, I could also put into the Comparator ways to sort sets of equal sizes, but the use of the sort wouldn't take advantage of that, and there's not really a useful + intuitive way to compare the Collections of equal size (at least, in my particular case), so that seems like a waste.
Does this case of inconsistency seem like a problem?
SortedSet interface extends core Set and thus should conform to the contract outlined in Set specification.
The only possible way to achieve that is to have your element's equal() method behavior be consistent with your Comparator - the reason for that is that core Set operates based on equality whereas SortedSet operates based on comparison.
For example, add() method defined in core Set interface specifies that you can't add an element to the set if there already is an element whose equal() method would return true with this new element as argument. Well, SortedSet doesn't use equal(), it uses compareTo(). So if your compareTo() returns false your element WILL be added even if equals() were to return true, thus breaking the Set contract.
None of this is a practical problem per se, however. SortedSet behavior is always consistent, even if compare() vs equals() are not.
As ChssPly76 wrote in a comment, you can use hashCode to decide the compareTo call in the case where two Collections have the same size but are not equal. This works fine, except in the rare case where you have two Collections with the same size, are not equal, but have the same hashCode. Admittedly, the chances of that happening are pretty small, but it is conceivable. If you want to be really careful, instead of hashCode, use System.identityHashCode instead. This should give you a unique number for each Collection, and you shouldn't get collisions.
At the end of the day, this gives you the functionality of having the Collections in the Set sorted by size, with arbitrary ordering in the case of two Collections with matching size. If this is all you need, it's not much slower than the usual comparison would be. If you need the ordering to be consistent between different JVM instances, this won't work and you'll have to do it some other way.
pseudocode:
if (a.equals(b)) {
return 0;
} else if (a.size() > b.size()) {
return 1;
} else if (b.size() > a.size()) {
return -1;
} else {
return System.identityHashCode(a) > System.identityHashCode(b) ? 1 : -1;
}
This seems to violate the proscription
to have the Comparator be consistent
with equals() - i.e., two collections
may be unequal (by having different
elements), but compare to the same
value (because they have the same
number of elements).
There is no requirement, either stated (in the Javadoc) or implied, that a Comparator be consistent with an object's implementation of boolean equals(Object).
Note that Comparable and Comparator are distinct interfaces with different purposes. Comparable is used to define a 'natural' order for a class. In that context, it would be a bad idea for equals and compateTo to be inconsistent. By contrast, a Comparator is used when you want to use a different order to the natural order of a class.
EDIT: Here's the complete paragraph from the Javadoc for SortedSet.
Note that the ordering maintained by a
sorted set (whether or not an explicit
comparator is provided) must be
consistent with equals if the sorted
set is to correctly implement the Set
interface. (See the Comparable
interface or Comparator interface for
a precise definition of consistent
with equals.) This is so because the
Set interface is defined in terms of
the equals operation, but a sorted
set performs all element comparisons
using its compareTo (or compare)
method, so two elements that are
deemed equal by this method are, from
the standpoint of the sorted set,
equal. The behavior of a sorted set is
well-defined even if its ordering is
inconsistent with equals; it just
fails to obey the general contract of
the Set interface.
I have highlighted the final sentence. The point is that such a SortedSet will work as you would most likely expect, but the behavior of some operations won't exactly match the Set specification ... because the specification defines their behavior in terms of the equals method.
So in fact, there is a stated requirement for consistency (my mistake), but the consequences of ignoring it are not as bad as you might think. Of course, it is up to decide if you should do that. In my estimation, it should be OK, provided that you comment the code thoroughly and make sure that the SortedSet does not 'leak'.
However, it is not clear to me that a Comparator for collections that only looks at an collections "size" is going to work ... from a semantic perspective. I mean, do you really want to say that all collections with (say) 2 elements are equal? This will mean that your set can only ever contain one collection of any given size ...
There is no reason why a Comparator should return the same results as equals(). In fact, the Comparator API was introduced because equals() just isn't enough: If you want to sort a collection, you must know whether two elements are lesser or greater.
It's a little bit odd that SortedSet as a part of the standard API breaks the contract defined in the Set interface and uses the Comparator to define equality instead of the equals method, but that's how it is.
If your actual problem is to sort a collection of collections according to the containted collections' size, you are better of with a List, which you can sort using Collections.sort(List, Comparator>);