Comparator suitable for TreeSet when there is no distinguishing field - java

Suppose I have a class not implementing the Comparable interface like
class Dummy {
}
and a collection of instances of this class plus some function external to the class that allows comparing these instances partially (a map will be used for this purpose below):
Collection<Dummy> col = new ArrayList<>();
Map<Dummy, Integer> map = new HashMap<>();
for (int i = 0; i < 12; i++) {
Dummy d = new Dummy();
col.add(d);
map.put(d, i % 4);
}
Now I want to sort this collection using the TreeSet class with a custom comparator:
TreeSet<Dummy> sorted = new TreeSet<>(new Comparator<Dummy>() {
@Override
public int compare(Dummy o1, Dummy o2) {
return map.get(o1) - map.get(o2);
}
});
sorted.addAll(col);
The result is obviously unsatisfactory (contains fewer elements than the initial collection). This is because such a comparator is not consistent with equals, i.e. sometimes returns 0 for non-equal elements. My next attempt was to change the compare method of the comparator to
@Override
public int compare(Dummy o1, Dummy o2) {
int d = map.get(o1) - map.get(o2);
if (d != 0)
return d;
if (o1.equals(o2))
return 0;
return 1; // is this acceptable?
}
It seemingly gives the desired result for this simple demonstration example, but I'm still in doubt: is it correct to always return 1 for unequal (but indistinguishable by the map) objects? Such a relation still violates the general contract for the Comparator.compare() method because sgn(compare(x, y)) == -sgn(compare(y, x)) is, in general, false. Do I really need to implement a correct total ordering for TreeSet to work correctly, or is the above enough? How can I do this when an instance has no fields to compare?
For a more real-life example, imagine that, instead of Dummy, you have a type parameter T of some generic class. T may have some fields and implement the equals() method through them, but you don't know these fields and yet need to sort instances of this class according to some external function. Is this possible with the help of TreeSet?
Edit
Using System.identityHashCode() is a great idea, but there is a (not so small) chance of collision.
Besides the possibility of such a collision, there is one more pitfall. Suppose you have 3 objects a, b, c such that map.get(a) = map.get(b) = map.get(c) (here = isn't assignment but mathematical equality), identityHashCode(a) < identityHashCode(b) < identityHashCode(c), a.equals(c) is true, but a.equals(b) (and hence c.equals(b)) is false. After adding these 3 elements to a TreeSet in the order a, b, c, you can end up in a situation where all of them have been added to the set, which contradicts the prescribed behaviour of the Set interface: it should not contain equal elements. How to deal with that?
In addition, it would be great if someone familiar with TreeSet mechanics explained to me what the term "well-defined" means in the phrase "The behavior of a set is well-defined even if its ordering is inconsistent with equals" from the TreeSet javadoc.

Unless you have an absolutely huge number of Dummy objects and really bad luck, you can use System.identityHashCode() to break ties:
Comparator.<Dummy>comparingInt(d -> map.get(d))
.thenComparingInt(System::identityHashCode)
Your comparator is not acceptable, since it violates the contract: for two objects d1 and d2 that are not equal but share the same value in the map, compare(d1, d2) and compare(d2, d1) both return 1, so you have d1 > d2 and d2 > d1 at the same time.
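As a rough sketch of how the tie-breaking comparator above behaves on the question's 12-element example (the class and method names here are made up for illustration, and the result relies on the identity hash codes of the 12 objects not colliding, which is overwhelmingly likely but not guaranteed):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class IdentityTieBreakDemo {
    static class Dummy {
    }

    // Sorts by the external map value first, then breaks ties by identity hash code.
    static TreeSet<Dummy> sortWithTieBreak(List<Dummy> col, Map<Dummy, Integer> map) {
        Comparator<Dummy> cmp = Comparator.<Dummy>comparingInt(map::get)
                .thenComparingInt(System::identityHashCode);
        TreeSet<Dummy> sorted = new TreeSet<>(cmp);
        sorted.addAll(col);
        return sorted;
    }

    // Rebuilds the example from the question and returns the size of the sorted set.
    static int demoSize() {
        List<Dummy> col = new ArrayList<>();
        Map<Dummy, Integer> map = new HashMap<>();
        for (int i = 0; i < 12; i++) {
            Dummy d = new Dummy();
            col.add(d);
            map.put(d, i % 4);
        }
        return sortWithTieBreak(col, map).size();
    }

    public static void main(String[] args) {
        // Expected to report 12 unless two identity hash codes happen to collide.
        System.out.println("elements kept: " + demoSize());
    }
}
```

If a collision did occur between two objects with the same map value, one of them would silently be dropped, which is exactly the risk the question's edit worries about.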

This answer covers just the first example in the question. The remainder of the question, and the various edits, are I think better answered as part of separate, focused questions.
The first example sets up 12 instances of Dummy, creates a map that maps each instance to an Integer in the range [0, 3], and then adds the 12 Dummy instances to a TreeSet. That TreeSet is provided with a comparator that uses the Dummy-to-Integer map. The result is that the TreeSet contains only four of the Dummy instances. The example concludes with the following statement:
The result is obviously unsatisfactory (contains less elements than the initial collection). This is because such a comparator is not consistent with equals, i.e. sometimes returns 0 for non-equal elements.
This last sentence is incorrect. The result contains fewer elements than were inserted because the comparator considers many of the instances to be duplicates and therefore they aren't inserted into the set. The equals method doesn't enter the discussion at all. Therefore, the concept of "consistent with equals" isn't relevant to this discussion. TreeSet never calls equals. The comparator is the only thing that determines membership in the TreeSet.
This seems like an unsatisfactory result, but only because we happen to "know" that there are 12 distinct Dummy instances. However, the TreeSet doesn't "know" that they are distinct. It only knows how to compare the Dummy instances using the comparator. When it does so, it finds that several are duplicates. That is, the comparator returns 0 sometimes even though it's being called with Dummy instances that we believe to be distinct. That's why only four Dummy instances end up in the TreeSet.
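One way to check the claim that only the comparator decides membership is to count calls to equals while the set is filled. The following is a small sketch; the counting Dummy class and the use of an IdentityHashMap (chosen so the lookup map itself never calls equals) are assumptions added for the experiment, not part of the original example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class EqualsNeverCalledDemo {
    // Counts every call to Dummy.equals so we can see whether TreeSet uses it.
    static int equalsCalls = 0;

    static class Dummy {
        @Override
        public boolean equals(Object other) {
            equalsCalls++;
            return this == other;
        }

        @Override
        public int hashCode() {
            return System.identityHashCode(this);
        }
    }

    // Returns {size of the TreeSet, number of equals calls made while filling it}.
    static int[] run() {
        List<Dummy> col = new ArrayList<>();
        // IdentityHashMap looks keys up with ==, so the map never calls equals.
        Map<Dummy, Integer> rank = new IdentityHashMap<>();
        for (int i = 0; i < 12; i++) {
            Dummy d = new Dummy();
            col.add(d);
            rank.put(d, i % 4);
        }
        equalsCalls = 0;
        TreeSet<Dummy> sorted = new TreeSet<>(Comparator.<Dummy>comparingInt(rank::get));
        sorted.addAll(col);
        return new int[] {sorted.size(), equalsCalls};
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println("size=" + result[0] + ", equals calls=" + result[1]);
    }
}
```

The set should end up with four elements (one per distinct map value) and the counter should stay at zero.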
I'm not entirely sure what the desired outcome is, but it seems like the result TreeSet should contain all 12 instances ordered by values in the Dummy-to-Integer map. My suggestion was to use Guava's Ordering.arbitrary() which provides a comparator that distinguishes between distinct-but-otherwise-equal elements, but does so in a way that satisfies the general contract of Comparator. If you create the TreeSet like this:
SortedSet<Dummy> sorted = new TreeSet<>(Comparator.<Dummy>comparingInt(map::get)
.thenComparing(Ordering.arbitrary()));
the result will be that the TreeSet contains all 12 Dummy instances, sorted by Integer value in the map, and with Dummy instances that map to the same value ordered arbitrarily.
In the comments, you stated that the Ordering.arbitrary doc "unequivocally cautions against using it in SortedSet". That's not quite right; that doc says,
Because the ordering is identity-based, it is not "consistent with Object.equals(Object)" as defined by Comparator. Use caution when building a SortedSet or SortedMap from it, as the resulting collection will not behave exactly according to spec.
The phrase "not behave exactly according to spec" really means that it will behave "strangely" as described in the class doc of Comparator:
The ordering imposed by a comparator c on a set of elements S is said to be consistent with equals if and only if c.compare(e1, e2)==0 has the same boolean value as e1.equals(e2) for every e1 and e2 in S.
Caution should be exercised when using a comparator capable of imposing an ordering inconsistent with equals to order a sorted set (or sorted map). Suppose a sorted set (or sorted map) with an explicit comparator c is used with elements (or keys) drawn from a set S. If the ordering imposed by c on S is inconsistent with equals, the sorted set (or sorted map) will behave "strangely." In particular the sorted set (or sorted map) will violate the general contract for set (or map), which is defined in terms of equals.
For example, suppose one adds two elements a and b such that (a.equals(b) && c.compare(a, b) != 0) to an empty TreeSet with comparator c. The second add operation will return true (and the size of the tree set will increase) because a and b are not equivalent from the tree set's perspective, even though this is contrary to the specification of the Set.add method.
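The situation described in the quoted Javadoc can be reproduced with a small sketch. The Tag class below is hypothetical: its equals looks only at the name, while the comparator also looks at a serial number, so two equal objects compare as different.

```java
import java.util.Comparator;
import java.util.TreeSet;

class InconsistentAddDemo {
    // Hypothetical element type: equality is based on name only.
    static final class Tag {
        final String name;
        final int serial;

        Tag(String name, int serial) {
            this.name = name;
            this.serial = serial;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Tag && name.equals(((Tag) o).name);
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }
    }

    // Comparator that is inconsistent with equals: it also distinguishes by serial.
    static final Comparator<Tag> BY_NAME_THEN_SERIAL =
            Comparator.comparing((Tag t) -> t.name).thenComparingInt(t -> t.serial);

    // Adds two equal tags and reports what the second add returned.
    static boolean secondAddResult() {
        TreeSet<Tag> set = new TreeSet<>(BY_NAME_THEN_SERIAL);
        Tag a = new Tag("x", 1);
        Tag b = new Tag("x", 2);
        set.add(a);
        return set.add(b); // a.equals(b) is true, yet the set accepts b
    }

    public static void main(String[] args) {
        System.out.println("second add returned " + secondAddResult());
    }
}
```

The second add returns true and the set holds two elements that are equal to each other, which is the "strange" behaviour the Comparator documentation warns about.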
You seemed to indicate that this "strange" behavior was unacceptable, in that Dummy elements that are equals shouldn't appear in the TreeSet. But the Dummy class doesn't override equals, so it seems like there's an additional requirement lurking behind here.
There are some additional questions added in later edits to the question, but as I mentioned above, I think these are better handled as separate question(s).
UPDATE 2018-12-22
After rereading the question edits and comments, I think I've finally figured out what you're looking for. You want a comparator over any object that provides a primary ordering based on some int-valued function that may result in duplicate values for unequal objects (as determined by the objects' equals method). Therefore, a secondary ordering is required that provides a total ordering over all unequal objects, but which returns zero for objects that are equals. This implies that the comparator should be consistent with equals.
Guava's Ordering.arbitrary comes close in that it provides an arbitrary total ordering over any objects, but it only returns zero for objects that are identical (that is, ==) but not for objects that are equals. It's thus inconsistent with equals.
It sounds, then, that you want a comparator that provides an arbitrary ordering over unequal objects. Here's a function that creates one:
static Comparator<Object> arbitraryUnequal() {
Map<Object, Integer> map = new HashMap<>();
return (o1, o2) -> Integer.compare(map.computeIfAbsent(o1, x -> map.size()),
map.computeIfAbsent(o2, x -> map.size()));
}
Essentially, this assigns a sequence number to every newly seen unequal object and keeps these numbers in a map held by the comparator. It uses the map's size as the counter. Since objects are never removed from this map, the size and thus the sequence number always increases.
(If you intend for this comparator to be used concurrently, e.g., in a parallel sort, the HashMap should be replaced with a ConcurrentHashMap and the size trick should be modified to use an AtomicInteger that's incremented when new entries are added.)
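A possible shape for the concurrent variant mentioned above, sketched with ConcurrentHashMap and AtomicInteger (the method name arbitraryUnequalConcurrent is made up here, and this has not been stress-tested under contention):

```java
import java.util.Comparator;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class ConcurrentArbitraryUnequal {
    // Assigns each newly seen unequal object the next sequence number.
    // computeIfAbsent on ConcurrentHashMap runs the mapping function at most once per key.
    static Comparator<Object> arbitraryUnequalConcurrent() {
        ConcurrentHashMap<Object, Integer> ids = new ConcurrentHashMap<>();
        AtomicInteger next = new AtomicInteger();
        return (o1, o2) -> Integer.compare(
                ids.computeIfAbsent(o1, k -> next.getAndIncrement()),
                ids.computeIfAbsent(o2, k -> next.getAndIncrement()));
    }

    public static void main(String[] args) {
        Comparator<Object> cmp = arbitraryUnequalConcurrent();
        // Equal but non-identical strings share one sequence number.
        System.out.println(cmp.compare(new String("a"), new String("a")));
        // Unequal strings get different sequence numbers.
        System.out.println(cmp.compare("a", "b"));
    }
}
```

As with the single-threaded version, the map keeps a strong reference to every object it has ever seen, so the same memory caveat applies.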
Note that the map in this comparator builds up entries for every unequal object that it's ever seen. If this is attached to a TreeSet, objects will persist in the comparator's map even after they've been removed from the TreeSet. This is necessary so that if objects are added or removed, they'll retain consistent ordering over time. Guava's Ordering.arbitrary uses weak references to allow objects to be collected if they're no longer used. We can't do that, because we need to preserve the ordering of non-identical but equal objects.
You'd use it like this:
SortedSet<Dummy> sorted = new TreeSet<>(Comparator.<Dummy>comparingInt(map::get)
.thenComparing(arbitraryUnequal()));
You had also asked what "well-defined" means in the following:
The behavior of a set is well-defined even if its ordering is inconsistent with equals
Suppose you were to use a TreeSet using a comparator that's inconsistent with equals, such as the one using Guava's Ordering.arbitrary shown above. The TreeSet will still work as expected, consistent with itself. That is, it will maintain objects in a total ordering, it will not contain any two objects for which the comparator returns zero, and all its methods will work as specified. However, it is possible for there to be an object for which contains returns true (since that's computed using the comparator) but for which equals is false if called with the object actually in the set.
For example, BigDecimal is Comparable but its comparison method is inconsistent with equals:
> BigDecimal z = new BigDecimal("0.0")
> BigDecimal zz = new BigDecimal("0.00")
> z.compareTo(zz)
0
> z.equals(zz)
false
> TreeSet<BigDecimal> ts = new TreeSet<>()
> ts.add(z)
> HashSet<BigDecimal> hs = new HashSet<>(ts)
> hs.equals(ts)
true
> ts.contains(zz)
true
> hs.contains(zz)
false
This is what the spec means when it says things can behave "strangely". We have two sets that are equal. Yet they report different results for contains of the same object, and the TreeSet reports that it contains an object even though that object is unequal to an object in the set.

Here's the comparator I ended up with. It is both reliable and memory efficient.
public static <T> Comparator<T> uniqualizer() {
return new Comparator<T>() {
private final Map<T, Integer> extraId = new HashMap<>();
private int id;
@Override
public int compare(T o1, T o2) {
int d = Integer.compare(o1.hashCode(), o2.hashCode());
if (d != 0)
return d;
if (o1.equals(o2))
return 0;
d = extraId.computeIfAbsent(o1, key -> id++)
- extraId.computeIfAbsent(o2, key -> id++);
assert id > 0 : "ID overflow";
assert d != 0 : "Concurrent modification";
return d;
}
};
}
It creates a total ordering on all objects of the given class T, so you can attach it to a given comparator to distinguish objects that the comparator alone cannot tell apart:
Comparator<T> partial = ...
Comparator<T> total = partial.thenComparing(uniqualizer());
In the example given in the question, T is Dummy and
partial = Comparator.<Dummy>comparingInt(map::get);
Note that you don't need to specify the type T when calling uniqualizer(); the compiler determines it automatically via type inference. You only have to make sure that hashCode() in T is consistent with equals(), as described in the general contract of hashCode(). Then uniqualizer() will give you a comparator (total) that is consistent with equals(), and you can use it in any code that requires comparing objects of type T, e.g. when creating a TreeSet:
TreeSet<T> sorted = new TreeSet<>(total);
or sorting a list:
List<T> list = ...
Collections.sort(list, total);

Related

Implement a comparable using objects hash codes

I have a general method that needs to sort and search for generic objects,
Old Version:
public <T> int isIn(T[] list, T t) {
Arrays.sort(list, Comparator.comparingInt(Object::hashCode));
return Arrays.binarySearch(list, t.hashCode());
}
New Version:
public <T> int isIn(T[] list, T t) {
Arrays.sort(list, Comparator.comparingInt(Object::hashCode));
return Arrays.binarySearch(list, t.hashCode(), Comparator.comparingInt(Object::hashCode));
}
Assuming that hashCode() is implemented properly, I can't think of any case where this could fail or give any error.
What are the cases, if any, in which this could give us an error?
NOTE: Code is edited, I added Comparator to the binary search
There are quite a few issues with your code.
hashCode can only be used to infer object inequality: two objects may have the same hashCode and yet an equals invocation may return false. In this case, binarySearch may yield unexpected results.
binarySearch invocations without a Comparator assume that your objects are sorted based on their natural ordering, which implies they are Comparable (which we cannot know from your code). If your objects were not Comparable, you would get a ClassCastException when invoking binarySearch as you do. You would at the very least need to take the Comparator employed when sorting the array and pass it to your binarySearch invocation (there is an overload that accepts one).
Note: OP added Comparator and edited question - only valid with previous version of the question.
As a general OO guideline, you may want to rethink what objects you are going to use your isIn method with, and maybe bind the generic type to a broad type that yields enough information in terms of properties to have a natural order, or at least to be sortable using a Comparator based on properties that it makes sense to sort with. TL;DR the hash code's purpose is not to sort objects.
Following up on point 3, you'd expect an isIn method to return a boolean type, not an integer type. If your goal is simply to infer whether your array contains a given value, override equals in the objects passed, wrap your array in a suitable Collection and invoke contains.
This should fail because, when two objects have the same hash code, their order will be unspecified. The contract on hash codes is that all objects for which equals(...) returns true have the same hash code, not that all objects with equal hash codes have equals(...) return true.
Hash codes are not unique; that's why the hash code acts as the first, cheap means of determining equality, but always has to be followed up with the equals(...) method afterwards.
Now, the equals(...) method doesn't provide comparison (ordering), so you'll have to back it up with a Comparator for the object anyway. Since you initially ordered on hash code, your Comparator will have to provide orderings that don't violate the hash-code-first approach.
// pseudocode
unless hashcodes are equal, return the value of Comparator.comparingInt()
when hashcodes are equal, pick some stable ordering.
Note that this would work well within the bubble of your own code; but if you wanted to ensure that someone doesn't break this in the future, you might also need to add a Comparable interface to your object, and have it return the "hash code, then sub-ordering" result with the same logic as your Comparator.
Assuming that hashcode() is implemented properly
This is a big assumption, as even Java's library classes (e.g. String) have collisions in their hashCode() values, e.g.
System.out.println("Aa".hashCode() + "," + "BB".hashCode());
This will print the same hash code twice, and hence the comparator will consider the two objects equal.
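To make the consequence concrete, here is a small sketch (class and method names are illustrative) in which a hash-code-based binary search reports a hit for a string that is not in the array at all:

```java
import java.util.Arrays;
import java.util.Comparator;

class HashCollisionDemo {
    // Searches an array that is already sorted by hash code, using the same ordering.
    static int searchByHash(String[] sortedByHash, String key) {
        Comparator<String> byHash = Comparator.comparingInt(String::hashCode);
        return Arrays.binarySearch(sortedByHash, key, byHash);
    }

    public static void main(String[] args) {
        String[] values = {"Aa"};
        // "Aa" and "BB" share the hash code 2112, so the search "finds" "BB" at index 0.
        System.out.println(searchByHash(values, "BB"));
        System.out.println("Aa".equals("BB"));
    }
}
```

The returned index is non-negative even though "BB" was never stored, so a caller treating any non-negative result as "present" would be wrong.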
Your code will produce an index of an object with the same hash code, but since objects are allowed to have the same hash code without being equal, you need to perform some additional work before returning the result.
Walk the list of items with the same hash code until you find equality, find a different hash code, or go off the end of the array:
import java.util.Arrays;
import java.util.Comparator;

public <T> int isIn(T[] list, T t) {
    Comparator<T> cmp = Comparator.comparingInt(Object::hashCode);
    Arrays.sort(list, cmp);
    int pos = Arrays.binarySearch(list, t, cmp);
    if (pos < 0) {
        return -1;
    }
    // At this point pos is a valid index, but it may not be of the same object,
    // and it may point anywhere inside the run of elements with this hash code.
    int hash = t.hashCode();
    // Step back to the first element of that run.
    while (pos > 0 && list[pos - 1].hashCode() == hash) {
        pos--;
    }
    // Continue with a linear search of the equal range of hash codes from here:
    while (pos != list.length) {
        if (t.equals(list[pos])) {
            return pos;
        }
        if (list[pos].hashCode() != hash) {
            return -1;
        }
        pos++;
    }
    return -1;
}
Note: Although this is consistent with the way hashCode/equals are supposed to be used in Java, the approach is less efficient than using a HashSet<T>, because it requires an O(n log n) sorting step.
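For comparison, a minimal sketch of the HashSet-based membership test mentioned in the note. It returns a boolean rather than an index, and the class name is made up for illustration:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class HashSetLookup {
    // Membership test that relies on hashCode and equals together,
    // so colliding hash codes alone do not produce a false positive.
    static <T> boolean isIn(T[] list, T t) {
        Set<T> set = new HashSet<>(Arrays.asList(list));
        return set.contains(t);
    }

    public static void main(String[] args) {
        String[] values = {"Aa", "x"};
        System.out.println(isIn(values, "Aa")); // present
        System.out.println(isIn(values, "BB")); // same hash code as "Aa", but not present
    }
}
```

Building the set costs O(n) per call in this sketch; if the same array is queried repeatedly, it would make more sense to build the set once and reuse it.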

Case-insensitive Comparator breaks my TreeMap

A Comparator I used in my TreeMap broke the behavior I intended for that TreeMap. Look at the following code:
TreeMap<String, String> treeMap = new TreeMap<>(new Comparator<String>() {
public int compare(String o1, String o2) {
return o1.toLowerCase().compareTo(o2.toLowerCase());
}
});
treeMap.put("abc", "Element1");
treeMap.put("ABC", "Element2");
What I think I have done is that I have created a map that is sorted by its keys, case-insensitive. The two distinct elements have non-equal keys (abc and ABC) whose comparison will return 0. I expected just a random ordering of the two elements. Yet, the command:
System.out.println("treeMap: " + treeMap);
resulted in:
treeMap: {abc=Element2}
The key abc has been re-assigned the value Element2!
Can anyone explain how this could happen and whether it's valid, documented behavior of TreeMap?
It happens because TreeMap considers elements equal if a.compareTo(b) == 0. It's documented in the JavaDoc for TreeMap (emphasis mine):
Note that the ordering maintained by a tree map, like any sorted map, and whether or not an explicit comparator is provided, must be consistent with equals if this sorted map is to correctly implement the Map interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Map interface is defined in terms of the equals operation, but a sorted map performs all key comparisons using its compareTo (or compare) method, so two keys that are deemed equal by this method are, from the standpoint of the sorted map, equal. The behavior of a sorted map is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Map interface.
Your comparator isn't consistent with equals.
If you want to keep not-equal-but-equal-ignoring-case elements, put a second level of checking into your comparator, to use case-sensitive ordering:
public int compare(String o1, String o2) {
int cmp = o1.compareToIgnoreCase(o2);
if (cmp != 0) return cmp;
return o1.compareTo(o2);
}
The Comparator you pass to a TreeMap determines not just the ordering of the keys inside the Map, it also determines whether two keys are considered identical (they are considered identical when compare() returns 0).
Therefore, in your TreeMap, "abc" and "ABC" are considered identical keys. Maps don't allow identical keys, so the second value Element2 overwrites the first value Element1.
You need to ensure that the equality of that map's elements is consistent with the comparator. Quoting from the class comment:
Note that the ordering maintained by a tree map, like any sorted map,
and whether or not an explicit comparator is provided, must be
consistent with equals if this sorted map is to correctly
implement the Map interface.
The accepted answer is technically correct, but misses the idiomatic solution to the problem.
You should be using the provided static String.CASE_INSENSITIVE_ORDER comparator, or at least String.compareToIgnoreCase() inside your own comparator, to decide which keys count as the same.
For locale-sensitive comparisons, you should use a java.text.Collator.
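A short sketch contrasting the two options discussed in this thread: String.CASE_INSENSITIVE_ORDER on its own (keys differing only by case collapse into one entry) and the same comparator with a case-sensitive tie-breaker (both keys are kept). The class and helper names are illustrative.

```java
import java.util.Comparator;
import java.util.TreeMap;

class CaseInsensitiveMapDemo {
    // Fills a TreeMap with the two entries from the question using the given key ordering.
    static TreeMap<String, String> fill(Comparator<String> keyOrder) {
        TreeMap<String, String> map = new TreeMap<>(keyOrder);
        map.put("abc", "Element1");
        map.put("ABC", "Element2");
        return map;
    }

    public static void main(String[] args) {
        // Keys that differ only by case are the same key: the value is overwritten.
        System.out.println(fill(String.CASE_INSENSITIVE_ORDER));

        // Case-insensitive first, case-sensitive as a tie-breaker: both keys survive.
        Comparator<String> tieBroken =
                String.CASE_INSENSITIVE_ORDER.thenComparing(Comparator.naturalOrder());
        System.out.println(fill(tieBroken));
    }
}
```

Which of the two is right depends on whether "abc" and "ABC" are meant to be the same key; the first is the usual choice for a genuinely case-insensitive map.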

TreeSet not adding all elements and HashSet

I have two sets in my code, and I add the same collection of elements to both. The problem is that the TreeSet doesn't add all the elements, and I am getting a bit confused.
I have been having this problem for a while now, and I am struggling to find out why my TreeSet won't add all of the elements in the Collection I pass to addAll.
It is a TreeSet built with a comparator, for items that have an equals method like:
public final boolean equals(Object o) {
return this==o;
}
@Override
public int hashCode() {
int hash = 3;
hash = 67 * hash + Objects.hashCode(this.grauDeAdaptacao);
hash = 67 * hash + Objects.hashCode(this.idade);
return hash;
}
Just to test I did the following:
HashSet<Item> test1 = new HashSet<>(items);
TreeSet<Item> test2 = new TreeSet<>(getComparator());
test2.addAll(items);
if (test1.size() < 50 || test2.size()<50 ) {
throw new IllegalStateException();
}
And the comparator uses:
private int compare(S ser1, S ser2) {
return ser1.getGrau().compareTo(ser2.getGrau());
}
But what is awkward is that the HashSet seems fine, while the TreeSet doesn't have all 50 elements.
I need two elements to be equal only when they are the same instance, in all subclasses; that is why I made the final method like that.
A HashSet uses equals to test two objects for equality.
A HashSet guarantees that it is never the case that for any two distinct objects, a and b, in the Set a.equals(b) == true.
A TreeSet uses compareTo to test two objects for equality.
A TreeSet guarantees that it is never the case that for any two distinct objects, a and b, in the Set a.compareTo(b) == 0.
Under the assumption that a.compareTo(b) == 0 iff a.equals(b), this behaviour is the same. Under this condition it can be said that the compareTo method is "consistent with equals", as defined in the documentation for Comparable.
The same documentation also states that:
It is strongly recommended (though not required) that natural
orderings be consistent with equals. This is so because sorted sets
(and sorted maps) without explicit comparators behave "strangely" when
they are used with elements (or keys) whose natural ordering is
inconsistent with equals. In particular, such a sorted set (or sorted
map) violates the general contract for set (or map), which is defined
in terms of the equals method.
This is an example of 'behav[ing] "strangely"'.
You have some objects for which a.equals(b) == false but a.compareTo(b) == 0.
It should further be noted that for the implementation of hashCode it is required that if a.equals(b) == true then a.hashCode() == b.hashCode(). This is not the case in your implementation. Your implementation of hashCode() is invalid given your implementation of equals.
The converse is not required, i.e. it can (and will) be the case that a.hashCode() == b.hashCode() and a.equals(b) == false.
So, in summary.
Your hashCode and equals are wrong. They need to be consistent, as described in the documentation for equals and hashCode.
Your compareTo is wrong; it should be "consistent with equals" as described in the documentation for Comparable.
Solved, with your help of course.
I needed a data structure from the Java SE API that is efficient to insert into and search while always staying sorted, and TreeSet was just too simple not to use. So I adjusted the comparator and the items to use IDs, and just like that the tree got happy: it ordered my elements by the property I intended, and when that property was the same, the ID decided the order. The comparator became consistent with equals, and everybody got happy.
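The fix described above might look roughly like the following sketch. The Item class, its grau field, and the static counter used to hand out ids are assumptions made for illustration; the original code was not shown.

```java
import java.util.Comparator;
import java.util.TreeSet;

class IdTieBreakDemo {
    // Hypothetical item: identity-based equals (inherited from Object) plus a unique id.
    static final class Item {
        private static int nextId = 0; // simple, non-thread-safe id source for the sketch
        final int id = nextId++;
        final int grau;

        Item(int grau) {
            this.grau = grau;
        }
    }

    // Orders by the intended property first; the unique id only breaks ties,
    // so compare returns 0 only for the very same instance.
    static final Comparator<Item> BY_GRAU_THEN_ID =
            Comparator.comparingInt((Item i) -> i.grau).thenComparingInt(i -> i.id);

    // Adds n items with heavily repeated grau values and reports how many were kept.
    static int countRetained(int n) {
        TreeSet<Item> set = new TreeSet<>(BY_GRAU_THEN_ID);
        for (int i = 0; i < n; i++) {
            set.add(new Item(i % 5));
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(countRetained(50));
    }
}
```

Because every instance has its own id, the comparator is consistent with the identity-based equals, and no element is dropped.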

Why is the comparator used instead of the equals() in Collections?

There is a Java bean Car that contains two values: model and price.
Now suppose I override equals() and hashCode() so that they check only the model, like this:
public boolean equals(Object o) {
return o instanceof Car && this.model.equals(((Car) o).model);
}
public int hashCode() {
return model.hashCode();
}
This permits me to check whether an ArrayList already contains a Car of the same model (regardless of the price), like this:
List<Car> car = new ArrayList<Car>();
car.add(new Car("carA",100f));
car.add(new Car("carB",101f));
car.add(new Car("carC",110f));
System.out.println(car.contains(new Car("carB",111f)));
It returns TRUE. That's fine, because the car already exists!
But now I decide that the ArrayList is not good, because I want to keep the items ordered, so I replace it with a TreeSet in this way:
Set<Car> car = new TreeSet<Car>(new Comparator<Car>() {
@Override
public int compare(Car car1, Car car2) {
int compPrice = - Float.compare(car1.getPrice(), car2.getPrice());
if (compPrice > 0 || compPrice < 0)
return compPrice;
else
return car1.getModel().compareTo(car2.getModel());
}});
car.add(new Car("carA",100f));
car.add(new Car("carB",101f));
car.add(new Car("carC",110f));
System.out.println(car.contains(new Car("carB",111f)));
But now there is a problem: it returns FALSE... why?
It seems that when I invoke contains() on an ArrayList, the equals() method is invoked.
But it seems that when I invoke contains() on a TreeSet with a comparator, the comparator is used instead.
Why does that happen?
TreeSet forms a binary tree, keeping elements ordered according to their natural ordering or a supplied comparator, so in order to quickly find one specific element in the collection, TreeSet uses Comparable or Comparator instead of equals().
As the TreeSet JavaDoc states:
Note that the ordering maintained by a set (whether or not an explicit
comparator is provided) must be consistent with equals if it is to
correctly implement the Set interface. (See Comparable or Comparator
for a precise definition of consistent with equals.) This is so
because the Set interface is defined in terms of the equals operation,
but a TreeSet instance performs all element comparisons using its
compareTo (or compare) method, so two elements that are deemed equal
by this method are, from the standpoint of the set, equal. The
behavior of a set is well-defined even if its ordering is inconsistent
with equals; it just fails to obey the general contract of the Set
interface.
We can find a similarity with the HashCode/Equals contract:
If equals() returns true, hashCode() has to return the same value for both objects too, in order for the element to be found during a search.
Likewise with TreeSet:
If the comparison (using Comparator or Comparable) returns 0, equals() has to return true too, in order to be consistent with equals().
THEREFORE: the fields used within the element's equals() method have to be exactly the same (no more, no less) as those used within your Comparator implementation.
A TreeSet is implicitly sorted, and it uses a Comparator for this sorting. The equals() method can only tell you if two objects are the same or different, not how they should be ordered for sorting. Only a Comparator can do that.
More to the point, a TreeSet also uses comparisons for searching. This is sort of the whole point of tree-based map/set. When the contains() method is called, a binary search is performed and the target is either found or not found, based on how the comparator is defined. The comparator defines not only logical order but also logical identity. If you are relying on logical identity defined by an inconsistent equals() implementation, then confusion will probably ensue.
The reason for the different behaviour is that you consider the price member in the compare method but ignore it in equals.
new Car("carB",101f) // what you add to the list
new Car("carB",111f) // what you are looking for
Both instances are "equals" (sorry...) since their model members are equal (and the implementation stops after that test). They "compare" as different, though, because that implementation also checks the price member.
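The difference can be seen side by side in a small sketch. The Car class below is a guess at the bean from the question (equals on model only), and the second comparator, which compares by model alone, is added here purely to show what a comparator consistent with that equals would do.

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

class CarContainsDemo {
    // Assumed shape of the bean from the question: equality looks at the model only.
    static final class Car {
        final String model;
        final float price;

        Car(String model, float price) {
            this.model = model;
            this.price = price;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Car && model.equals(((Car) o).model);
        }

        @Override
        public int hashCode() {
            return model.hashCode();
        }
    }

    // Fills a TreeSet ordered by cmp and looks for a carB with a different price.
    static boolean containsCarB(Comparator<Car> cmp) {
        Set<Car> cars = new TreeSet<>(cmp);
        cars.add(new Car("carA", 100f));
        cars.add(new Car("carB", 101f));
        cars.add(new Car("carC", 110f));
        return cars.contains(new Car("carB", 111f));
    }

    public static void main(String[] args) {
        // The question's ordering: price descending, then model.
        Comparator<Car> priceDescThenModel = (a, b) -> {
            int byPrice = Float.compare(b.price, a.price);
            return byPrice != 0 ? byPrice : a.model.compareTo(b.model);
        };
        // An ordering that uses exactly the field equals uses.
        Comparator<Car> byModel = Comparator.comparing(c -> c.model);

        System.out.println(containsCarB(priceDescThenModel)); // price differs, so not found
        System.out.println(containsCarB(byModel));            // same model, so found
    }
}
```

Ordering by model alone gives up the price ordering, of course; if both are needed, a separate lookup structure keyed by model may be the simpler design.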

How to make a distinction between two equals objects in a Sorted collection?

I may be wrong, but as I understand it, we can override equals for an object so that two instances are considered meaningfully equal.
All the entries in a map have distinct keys, and all the entries in a set have distinct values (not meaningfully equal).
But when using a TreeMap or a TreeSet, you can provide a comparator.
I noticed that when a comparator is provided, the object's equals method is bypassed, and two objects are considered equal when the comparator returns 0.
Thus, we have 2 objects, but inside a map's key set, or a set, only one is kept.
I'd like to know if it is possible, using a sorted collection, to distinguish between two different instances.
Here's an easy sample:
public static void main(String[] args) {
TreeSet<String> set = new TreeSet<String>();
String s1 = new String("toto");
String s2 = new String("toto");
System.out.println(s1 == s2);
set.add(s1);
set.add(s2);
System.out.println(set.size());
}
Notice that using new String("xxx") bypasses the String pool, thus s1 != s2.
I'd like to know how to implement a comparator so that the set size is 2 and not 1.
The main question is: for two distinct instances of the same String value, how can I return something != 0 in my comparator?
Note that I'd like that comparator to respect the rules:
Compares its two arguments for order. Returns a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second. The implementor must ensure that sgn(compare(x, y)) == -sgn(compare(y, x)) for all x and y. (This implies that compare(x, y) must throw an exception if and only if compare(y, x) throws an exception.)
The implementor must also ensure that the relation is transitive: ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0.
Finally, the implementer must ensure that compare(x, y)==0 implies that sgn(compare(x, z))==sgn(compare(y, z)) for all z.
It is generally the case, but not strictly required that (compare(x, y)==0) == (x.equals(y)). Generally speaking, any comparator that violates this condition should clearly indicate this fact. The recommended language is "Note: this comparator imposes orderings that are inconsistent with equals."
I can use a trick like:
public int compare(String s1,String s2) {
if (s1.equals(s2)) { return -1; }
...
}
It seems to work fine, but the rules are not respected, since compare(s1,s2) != -compare(s2,s1).
So is there any elegant solution to this problem?
Edit: for those wondering why I ask such a thing, it's more out of curiosity than because of any real-life problem.
But I've already been in a situation like that and thought about a solution to this problem:
Imagine you have:
class Label {
String label;
}
For each label you have an associated String value.
Now what if you want to have a map, label->value.
But now what if you want to be able to have twice the same label as a map key?
Ex
"label" (ref1) -> value1
"label" (ref2) -> value2
You can implement equals so that two distinct Label instances are not equal; I think this works for HashMap.
But what if you want to be able to sort these Label objects by alphabetical order?
You need to provide a comparator or implement comparable.
But how can we make an order distinction between 2 Labels having the same label?
We must!
compare(ref1,ref2) must not return 0. But should it return -1 or 1 ?
We could compare the memory addresses or something like that to make such a decision, but I think that's not possible in Java...
If you're using Guava, you can make use of Ordering.arbitrary(), which will impose an additional order on elements which remains consistent for the life of the VM. You can use this to break ties in your Comparator in a consistent way.
However you could be using the wrong data structure. Have you considered using a Multiset (e.g. TreeMultiset), which allows multiple instances to be added?
Try using the following Comparator (for your example):
Comparator<String> comp = Ordering.natural().compound(Ordering.arbitrary());
This will sort things according to their natural Comparable ordering, but when the natural ordering is equal it will fall back to an arbitrary ordering so that distinct objects remain distinct.
I'm not sure it's a great idea to do that. From the javadoc for Comparator:
Caution should be exercised when using a comparator capable of
imposing an ordering inconsistent with equals to order a sorted set
(or sorted map). Suppose a sorted set (or sorted map) with an explicit
comparator c is used with elements (or keys) drawn from a set S. If
the ordering imposed by c on S is inconsistent with equals, the sorted
set (or sorted map) will behave "strangely." In particular the sorted
set (or sorted map) will violate the general contract for set (or
map), which is defined in terms of equals.
If you want a sorted collection with equal objects, you can put all your objects in a List and use Collections.sort().
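A minimal sketch of that suggestion, using the two "toto" strings from the question (the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedListDemo {
    // Returns a sorted copy; unlike a TreeSet, a List keeps equal elements.
    static List<String> sortedCopy(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> values = new ArrayList<>();
        values.add(new String("toto"));
        values.add("abc");
        values.add(new String("toto"));
        // Both "toto" instances survive the sort.
        System.out.println(sortedCopy(values));
    }
}
```

The trade-off is that the list is only sorted at the moment Collections.sort() runs; later insertions need either a re-sort or an insertion at the right position.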
You might want to use a SortedSet<Collection<String>> or similar, since, as you mentioned, a sorted set doesn't allow you to add multiple equal entries.
Alternatively, you can use Guava's Multiset.
From the JavaDoc on SortedSet:
Note that the ordering maintained by a sorted set (whether or not an explicit comparator is provided) must be consistent with equals if the sorted set is to correctly implement the Set interface.
However, one question still remains: why do you want to have two distinct instances that are logically equal (that's what equals() actually means)?
The comparator should actually return 0 only when the two references refer to the same object, like this:
public int compare(String s1,String s2) {
if (s1!=s2) {
int result = s1.compareTo(s2);
if (result == 0) {
return -1;
} else {
return result;
}
} else {
return 0;
}
}
