A Comparator I used in my TreeMap broke the behavior I intended for that TreeMap. Look at the following code:
TreeMap<String, String> treeMap = new TreeMap<>(new Comparator<String>() {
public int compare(String o1, String o2) {
return o1.toLowerCase().compareTo(o2.toLowerCase());
}
});
treeMap.put("abc", "Element1");
treeMap.put("ABC", "Element2");
What I think I have done is that I have created a map that is sorted by its keys, case-insensitive. The two distinct elements have non-equal keys (abc and ABC) whose comparison will return 0. I expected just a random ordering of the two elements. Yet, the command:
System.out.println("treeMap: " + treeMap);
resulted in:
treeMap: {abc=Element2}
The key abc has been re-assigned the value Element2!
Can anyone explain how this could happen and whether it is valid, documented behavior of TreeMap?
It happens because TreeMap considers keys equal if their comparison returns 0, that is, a.compareTo(b) == 0 or, when a comparator is supplied, comparator.compare(a, b) == 0. It's documented in the JavaDoc for TreeMap (emphasis mine):
Note that the ordering maintained by a tree map, like any sorted map, and whether or not an explicit comparator is provided, must be consistent with equals if this sorted map is to correctly implement the Map interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Map interface is defined in terms of the equals operation, but a sorted map performs all key comparisons using its compareTo (or compare) method, so two keys that are deemed equal by this method are, from the standpoint of the sorted map, equal. The behavior of a sorted map is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Map interface.
Your comparator isn't consistent with equals.
If you want to keep not-equal-but-equal-ignoring-case elements, put a second level of checking into your comparator, to use case-sensitive ordering:
public int compare(String o1, String o2) {
int cmp = o1.compareToIgnoreCase(o2);
if (cmp != 0) return cmp;
return o1.compareTo(o2);
}
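As a minimal runnable sketch of the difference, the class below fills one TreeMap with a plain case-insensitive comparator and another with the two-level comparator. The class name, the `fill` helper and the constant name are made up for illustration, and `String::compareToIgnoreCase` stands in for the `toLowerCase()` comparator from the question.

```java
import java.util.Comparator;
import java.util.TreeMap;

class TwoLevelComparatorDemo {

    // Case-insensitive first; the case-sensitive comparison only breaks ties,
    // so compare() returns 0 exactly when the two strings are equal.
    static final Comparator<String> CASE_INSENSITIVE_THEN_EXACT = (o1, o2) -> {
        int cmp = o1.compareToIgnoreCase(o2);
        return cmp != 0 ? cmp : o1.compareTo(o2);
    };

    // Hypothetical helper: inserts the two keys from the question.
    static TreeMap<String, String> fill(Comparator<String> comparator) {
        TreeMap<String, String> treeMap = new TreeMap<>(comparator);
        treeMap.put("abc", "Element1");
        treeMap.put("ABC", "Element2");
        return treeMap;
    }

    public static void main(String[] args) {
        // Case-insensitive only: "abc" and "ABC" are the same key.
        System.out.println(fill(String::compareToIgnoreCase));  // {abc=Element2}
        // With the tie-break both entries survive.
        System.out.println(fill(CASE_INSENSITIVE_THEN_EXACT));  // {ABC=Element2, abc=Element1}
    }
}
```

With the tie-break, keys that differ only in case stay next to each other, and the upper-case variant sorts first because `String.compareTo` compares character codes.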
The Comparator you pass to a TreeMap determines not just the ordering of the keys inside the Map, it also determines whether two keys are considered identical (they are considered identical when compare() returns 0).
Therefore, in your TreeMap, "abc" and "ABC" are considered identical keys. Maps don't allow identical keys, so the second value Element2 overwrites the first value Element1.
You need to ensure that the equality of that map's elements is consistent with the comparator. Quoting from the class comment:
Note that the ordering maintained by a tree map, like any sorted map,
and whether or not an explicit comparator is provided, must be
consistent with equals if this sorted map is to correctly
implement the Map interface.
The accepted answer is technically correct, but misses the idiomatic solution to the problem.
You should use the static String.CASE_INSENSITIVE_ORDER comparator that the JDK provides, or at least call String.compareToIgnoreCase() inside your own comparator to decide which keys count as equal.
For locale-sensitive comparisons, use a java.text.Collator.
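A rough sketch of these built-in options follows. The class and method names are hypothetical. Note that `String.CASE_INSENSITIVE_ORDER` on its own still treats "abc" and "ABC" as one key; chaining `Comparator.naturalOrder()` is one way to keep both. What a `Collator` considers equal depends on the locale and its strength setting, so no output is claimed for that part.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.Locale;
import java.util.TreeMap;

class BuiltInOrderingsDemo {

    // Hypothetical helper: inserts the two keys from the question.
    static TreeMap<String, String> fill(TreeMap<String, String> map) {
        map.put("abc", "Element1");
        map.put("ABC", "Element2");
        return map;
    }

    // Case-insensitive keys: "abc" and "ABC" collapse into one entry.
    static TreeMap<String, String> caseInsensitive() {
        return fill(new TreeMap<>(String.CASE_INSENSITIVE_ORDER));
    }

    // Case-insensitive order, with the natural (case-sensitive) order as tie-break.
    static TreeMap<String, String> caseInsensitiveKeepingBoth() {
        Comparator<String> order =
                String.CASE_INSENSITIVE_ORDER.thenComparing(Comparator.naturalOrder());
        return fill(new TreeMap<>(order));
    }

    // Locale-sensitive order; Collator implements Comparator<Object>.
    static TreeMap<String, String> localeSensitive(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        return fill(new TreeMap<>(collator));
    }

    public static void main(String[] args) {
        System.out.println(caseInsensitive());             // {abc=Element2}
        System.out.println(caseInsensitiveKeepingBoth());  // {ABC=Element2, abc=Element1}
        System.out.println(localeSensitive(Locale.ENGLISH));
    }
}
```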
Suppose I have a class not implementing the Comparable interface like
class Dummy {
}
and a collection of instances of this class plus some function external to the class that allows comparing these instances partially (a map will be used for this purpose below):
Collection<Dummy> col = new ArrayList<>();
Map<Dummy, Integer> map = new HashMap<>();
for (int i = 0; i < 12; i++) {
Dummy d = new Dummy();
col.add(d);
map.put(d, i % 4);
}
Now I want to sort this collection using the TreeSet class with a custom comparator:
TreeSet<Dummy> sorted = new TreeSet<>(new Comparator<Dummy>() {
@Override
public int compare(Dummy o1, Dummy o2) {
return map.get(o1) - map.get(o2);
}
});
sorted.addAll(col);
The result is obviously unsatisfactory (contains less elements than the initial collection). This is because such a comparator is not consistent with equals, i.e. sometimes returns 0 for non-equal elements. My next attempt was to change the compare method of the comparator to
@Override
public int compare(Dummy o1, Dummy o2) {
int d = map.get(o1) - map.get(o2);
if (d != 0)
return d;
if (o1.equals(o2))
return 0;
return 1; // is this acceptable?
}
It seemingly gives the desired result for this simple demonstration, but I'm still in doubt: is it correct to always return 1 for unequal objects that the map cannot distinguish? Such a relation still violates the general contract of the Comparator.compare() method, because sgn(compare(x, y)) == -sgn(compare(y, x)) does not hold in general. Do I really need to implement a correct total ordering for TreeSet to work correctly, or is the above enough? And how can I do this when an instance has no fields to compare?
For more real-life example imagine that, instead of Dummy, you have a type parameter T of some generic class. T may have some fields and implement the equals() method through them, but you don't know these fields and yet need to sort instances of this class according to some external function. Is this possible with the help of TreeSet?
Edit
Using System.identityHashCode() is a great idea, but there is a (not so small) chance of collision.
Besides the possibility of such a collision, there is one more pitfall. Suppose you have 3 objects a, b, c such that map.get(a) = map.get(b) = map.get(c) (here = is mathematical equality, not assignment), identityHashCode(a) < identityHashCode(b) < identityHashCode(c), a.equals(c) is true, but a.equals(b) (and hence c.equals(b)) is false. After adding these 3 elements to a TreeSet in the order a, b, c, you can end up with all of them in the set, which contradicts the prescribed behaviour of the Set interface: it should not contain equal elements. How to deal with that?
In addition, it would be great if someone familiar with TreeSet mechanics explained what the term "well-defined" means in the phrase "The behavior of a set is well-defined even if its ordering is inconsistent with equals" from the TreeSet javadoc.
Unless you have an absolutely huge number of Dummy objects and really bad luck, you can use System.identityHashCode() to break ties:
Comparator.<Dummy>comparingInt(d -> map.get(d))
.thenComparingInt(System::identityHashCode)
Your comparator is not acceptable, since it violates the contract: you have d1 > d2 and d2 > d1 at the same time if they're not equal but share the same value in the map.
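The contract violation can be checked directly. The sketch below rebuilds the "return 1" comparator from the question (the class and method names are invented for the demo) and compares two unequal Dummy instances that map to the same value, in both directions.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

class AlwaysOneViolationDemo {

    static class Dummy { }

    // The comparator from the question: primary order from the map, then "return 1".
    static Comparator<Dummy> alwaysOne(Map<Dummy, Integer> map) {
        return (o1, o2) -> {
            int d = Integer.compare(map.get(o1), map.get(o2));
            if (d != 0) return d;
            if (o1.equals(o2)) return 0;
            return 1;
        };
    }

    // True when the comparator claims both a > b and b > a.
    static boolean contradictsItself() {
        Map<Dummy, Integer> map = new HashMap<>();
        Dummy a = new Dummy();
        Dummy b = new Dummy();
        map.put(a, 0);
        map.put(b, 0);
        Comparator<Dummy> c = alwaysOne(map);
        return c.compare(a, b) > 0 && c.compare(b, a) > 0;
    }

    public static void main(String[] args) {
        System.out.println(contradictsItself());  // true
    }
}
```

A TreeSet built on such a comparator may appear to work for insertion, but lookups and removals depend on which side of the comparison the tree happens to put the argument.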
This answer covers just the first example in the question. The remainder of the question, and the various edits, are I think better answered as part of separate, focused questions.
The first example sets up 12 instances of Dummy, creates a map that maps each instance to an Integer in the range [0, 3], and then adds the 12 Dummy instances to a TreeSet. That TreeSet is provided with a comparator that uses the Dummy-to-Integer map. The result is that the TreeSet contains only four of the Dummy instances. The example concludes with the following statement:
The result is obviously unsatisfactory (contains less elements than the initial collection). This is because such a comparator is not consistent with equals, i.e. sometimes returns 0 for non-equal elements.
This last sentence is incorrect. The result contains fewer elements than were inserted because the comparator considers many of the instances to be duplicates and therefore they aren't inserted into the set. The equals method doesn't enter the discussion at all. Therefore, the concept of "consistent with equals" isn't relevant to this discussion. TreeSet never calls equals. The comparator is the only thing that determines membership in the TreeSet.
This seems like an unsatisfactory result, but only because we happen to "know" that there are 12 distinct Dummy instances. However, the TreeSet doesn't "know" that they are distinct. It only knows how to compare the Dummy instances using the comparator. When it does so, it finds that several are duplicates. That is, the comparator returns 0 sometimes even though it's being called with Dummy instances that we believe to be distinct. That's why only four Dummy instances end up in the TreeSet.
I'm not entirely sure what the desired outcome is, but it seems like the result TreeSet should contain all 12 instances ordered by values in the Dummy-to-Integer map. My suggestion was to use Guava's Ordering.arbitrary() which provides a comparator that distinguishes between distinct-but-otherwise-equal elements, but does so in a way that satisfies the general contract of Comparator. If you create the TreeSet like this:
SortedSet<Dummy> sorted = new TreeSet<>(Comparator.<Dummy>comparingInt(map::get)
.thenComparing(Ordering.arbitrary()));
the result will be that the TreeSet contains all 12 Dummy instances, sorted by Integer value in the map, and with Dummy instances that map to the same value ordered arbitrarily.
In the comments, you stated that the Ordering.arbitrary doc "unequivocally cautions against using it in SortedSet". That's not quite right; that doc says,
Because the ordering is identity-based, it is not "consistent with Object.equals(Object)" as defined by Comparator. Use caution when building a SortedSet or SortedMap from it, as the resulting collection will not behave exactly according to spec.
The phrase "not behave exactly according to spec" really means that it will behave "strangely" as described in the class doc of Comparator:
The ordering imposed by a comparator c on a set of elements S is said to be consistent with equals if and only if c.compare(e1, e2)==0 has the same boolean value as e1.equals(e2) for every e1 and e2 in S.
Caution should be exercised when using a comparator capable of imposing an ordering inconsistent with equals to order a sorted set (or sorted map). Suppose a sorted set (or sorted map) with an explicit comparator c is used with elements (or keys) drawn from a set S. If the ordering imposed by c on S is inconsistent with equals, the sorted set (or sorted map) will behave "strangely." In particular the sorted set (or sorted map) will violate the general contract for set (or map), which is defined in terms of equals.
For example, suppose one adds two elements a and b such that (a.equals(b) && c.compare(a, b) != 0) to an empty TreeSet with comparator c. The second add operation will return true (and the size of the tree set will increase) because a and b are not equivalent from the tree set's perspective, even though this is contrary to the specification of the Set.add method.
You seemed to indicate that this "strange" behavior was unacceptable, in that Dummy elements that are equals shouldn't appear in the TreeSet. But the Dummy class doesn't override equals, so it seems like there's an additional requirement lurking behind here.
There are some additional questions added in later edits to the question, but as I mentioned above, I think these are better handled as separate question(s).
UPDATE 2018-12-22
After rereading the question edits and comments, I think I've finally figured out what you're looking for. You want a comparator over any object that provides a primary ordering based on some int-valued function that may result in duplicate values for unequal objects (as determined by the objects' equals method). Therefore, a secondary ordering is required that provides a total ordering over all unequal objects, but which returns zero for objects that are equals. This implies that the comparator should be consistent with equals.
Guava's Ordering.arbitrary comes close in that it provides an arbitrary total ordering over any objects, but it only returns zero for objects that are identical (that is, ==) but not for objects that are equals. It's thus inconsistent with equals.
It sounds, then, that you want a comparator that provides an arbitrary ordering over unequal objects. Here's a function that creates one:
static Comparator<Object> arbitraryUnequal() {
Map<Object, Integer> map = new HashMap<>();
return (o1, o2) -> Integer.compare(map.computeIfAbsent(o1, x -> map.size()),
map.computeIfAbsent(o2, x -> map.size()));
}
Essentially, this assigns a sequence number to every newly seen unequal object and keeps these numbers in a map held by the comparator. It uses the map's size as the counter. Since objects are never removed from this map, the size and thus the sequence number always increases.
(If you intend for this comparator to be used concurrently, e.g., in a parallel sort, the HashMap should be replaced with a ConcurrentHashMap and the size trick should be modified to use an AtomicInteger that's incremented when new entries are added.)
Note that the map in this comparator builds up entries for every unequal object that it's ever seen. If this is attached to a TreeSet, objects will persist in the comparator's map even after they've been removed from the TreeSet. This is necessary so that if objects are added or removed, they'll retain consistent ordering over time. Guava's Ordering.arbitrary uses weak references to allow objects to be collected if they're no longer used. We can't do that, because we need to preserve the ordering of non-identical but equal objects.
You'd use it like this:
SortedSet<Dummy> sorted = new TreeSet<>(Comparator.<Dummy>comparingInt(map::get)
.thenComparing(arbitraryUnequal()));
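Put together with the setup from the question, a self-contained sketch might look like this. The wrapper class and the `sortedValues` helper are added only for the demo, and the comparator's internal map is renamed to `seen` to avoid clashing with the Dummy-to-Integer map.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class ArbitraryUnequalDemo {

    static class Dummy { }

    // Assigns an increasing sequence number to every unequal object it sees.
    static Comparator<Object> arbitraryUnequal() {
        Map<Object, Integer> seen = new HashMap<>();
        return (o1, o2) -> Integer.compare(
                seen.computeIfAbsent(o1, x -> seen.size()),
                seen.computeIfAbsent(o2, x -> seen.size()));
    }

    // Builds the 12 instances from the question and returns their map values
    // in the iteration order of the TreeSet.
    static List<Integer> sortedValues() {
        List<Dummy> col = new ArrayList<>();
        Map<Dummy, Integer> map = new HashMap<>();
        for (int i = 0; i < 12; i++) {
            Dummy d = new Dummy();
            col.add(d);
            map.put(d, i % 4);
        }
        SortedSet<Dummy> sorted = new TreeSet<>(
                Comparator.<Dummy>comparingInt(map::get).thenComparing(arbitraryUnequal()));
        sorted.addAll(col);

        List<Integer> values = new ArrayList<>();
        for (Dummy d : sorted) {
            values.add(map.get(d));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(sortedValues());  // [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
    }
}
```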
You had also asked what "well-defined" means in the following:
The behavior of a set is well-defined even if its ordering is inconsistent with equals
Suppose you were to use a TreeSet using a comparator that's inconsistent with equals, such as the one using Guava's Ordering.arbitrary shown above. The TreeSet will still work as expected, consistent with itself. That is, it will maintain objects in a total ordering, it will not contain any two objects for which the comparator returns zero, and all its methods will work as specified. However, it is possible for there to be an object for which contains returns true (since that's computed using the comparator) but for which equals is false if called with the object actually in the set.
For example, BigDecimal is Comparable but its comparison method is inconsistent with equals:
> BigDecimal z = new BigDecimal("0.0")
> BigDecimal zz = new BigDecimal("0.00")
> z.compareTo(zz)
0
> z.equals(zz)
false
> TreeSet<BigDecimal> ts = new TreeSet<>()
> ts.add(z)
> HashSet<BigDecimal> hs = new HashSet<>(ts)
> hs.equals(ts)
true
> ts.contains(zz)
true
> hs.contains(zz)
false
This is what the spec means when it says things can behave "strangely". We have two sets that are equal. Yet they report different results for contains of the same object, and the TreeSet reports that it contains an object even though that object is unequal to an object in the set.
Here's the comparator I ended up with. It is both reliable and memory efficient.
public static <T> Comparator<T> uniqualizer() {
return new Comparator<T>() {
private final Map<T, Integer> extraId = new HashMap<>();
private int id;
@Override
public int compare(T o1, T o2) {
int d = Integer.compare(o1.hashCode(), o2.hashCode());
if (d != 0)
return d;
if (o1.equals(o2))
return 0;
d = extraId.computeIfAbsent(o1, key -> id++)
- extraId.computeIfAbsent(o2, key -> id++);
assert id > 0 : "ID overflow";
assert d != 0 : "Concurrent modification";
return d;
}
};
}
It creates a total ordering on all objects of the given class T and thus makes it possible to distinguish objects that a given comparator cannot tell apart, by attaching it like this:
Comparator<T> partial = ...
Comparator<T> total = partial.thenComparing(uniqualizer());
In the example given in the question, T is Dummy and
partial = Comparator.<Dummy>comparingInt(map::get);
Note that you don't need to specify the type T when calling uniqualizer(); the compiler determines it automatically via type inference. You only have to make sure that hashCode() in T is consistent with equals(), as described in the general contract of hashCode(). Then uniqualizer() will give you the comparator (total) consistent with equals() and you can use it in any code that requires comparing objects of type T, e.g. when creating a TreeSet:
TreeSet<T> sorted = new TreeSet<>(total);
or sorting a list:
List<T> list = ...
Collections.sort(list, total);
I have a nested TreeMap based on the following structure, which of course continues from "2":{ with the same structure:
http://pastebin.com/uKwAVz5L
As you can see, it is already sorted by the "c13" sub-item (the episode number). But when I use the TreeMap in my application, it shows up like this:
http://i50.tinypic.com/15o9vno.png
The entries are not even remotely sorted, and I can't see why.
The same problem occurs when I use it in my Android app.
Cheers
Here is some valuable information on TreeMap:
Red-Black tree based implementation of the SortedMap interface. This
class guarantees that the map will be in ascending key order, sorted
according to the natural order for the key's class (see Comparable),
or by the comparator provided at creation time, depending on which
constructor is used.
Note that the ordering maintained by a sorted map (whether or not an
explicit comparator is provided) must be consistent with equals if
this sorted map is to correctly implement the Map interface. (See
Comparable or Comparator for a precise definition of consistent with
equals.) This is so because the Map interface is defined in terms of
the equals operation, but a map performs all key comparisons using its
compareTo (or compare) method, so two keys that are deemed equal by
this method are, from the standpoint of the sorted map, equal. The
behavior of a sorted map is well-defined even if its ordering is
inconsistent with equals; it just fails to obey the general contract
of the Map interface.
Have you correctly implemented the methods mentioned above?
There are also different implementations of the Collections framework (An overview is here). If TreeMap doesn't provide the functionality you want you can implement another one and modify it to your needs.
Try using a Comparator:
TreeMap<Obj1, Obj2> map = new TreeMap<>(new ObjComparator());

private static class ObjComparator implements Comparator<Obj1> {
    // Comparator declares compare(), not compareTo().
    @Override
    public int compare(Obj1 o1, Obj1 o2) {
        return o1.compareTo(o2); // do your logic here
    }
}
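One possible cause, which is purely an assumption here since the question does not show how the map is built, is that the keys are numeric strings such as "1", "2", "10". A TreeMap orders String keys lexicographically unless it is given a comparator. The sketch below uses made-up keys and values to contrast the two orderings.

```java
import java.util.Comparator;
import java.util.TreeMap;

class NumericKeyOrderDemo {

    // Assumption: every key parses as an int. Non-numeric keys would throw
    // NumberFormatException inside the comparator.
    static final Comparator<String> NUMERIC = Comparator.comparingInt(Integer::parseInt);

    // Hypothetical data: three episodes keyed by their number as a string.
    static TreeMap<String, String> episodes(Comparator<String> order) {
        TreeMap<String, String> map = new TreeMap<>(order);
        map.put("10", "Episode ten");
        map.put("2", "Episode two");
        map.put("1", "Episode one");
        return map;
    }

    public static void main(String[] args) {
        // Natural String order is lexicographic.
        System.out.println(episodes(Comparator.naturalOrder()).keySet());  // [1, 10, 2]
        // Numeric comparator.
        System.out.println(episodes(NUMERIC).keySet());                    // [1, 2, 10]
    }
}
```

If the keys are already Integer values, the natural ordering is numeric and no comparator is needed.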
There is a Java bean Car that contains two values: model and price.
Now suppose I override equals() and hashCode() so that they check only the model, like this:
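@Override
public boolean equals(Object o) {
    // Compare by model only; the cast is needed because o is declared as Object.
    return o instanceof Car && this.model.equals(((Car) o).model);
}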
public int hashCode() {
return model.hashCode();
}
This lets me check whether an ArrayList already contains a Car of the same model (regardless of the price), like this:
List<Car> car = new ArrayList<Car>();
car.add(new Car("carA",100f));
car.add(new Car("carB",101f));
car.add(new Car("carC",110f));
System.out.println(car.contains(new Car("carB",111f)));
It returns TRUE. That's fine, because the car already exists!
But now I decide that the ArrayList is not good enough, because I want to keep the items ordered, so I replace it with a TreeSet like this:
Set<Car> car = new TreeSet<Car>(new Comparator<Car>() {
@Override
public int compare(Car car1, Car car2) {
int compPrice = - Float.compare(car1.getPrice(), car2.getPrice());
if (compPrice > 0 || compPrice < 0)
return compPrice;
else
return car1.getModel().compareTo(car2.getModel());
}});
car.add(new Car("carA",100f));
car.add(new Car("carB",101f));
car.add(new Car("carC",110f));
System.out.println(car.contains(new Car("carB",111f)));
But now there is a problem: it returns FALSE... why?
It seems that when I invoke contains() on an ArrayList, the equals() method is invoked.
But it seems that when I invoke contains() on a TreeSet with a comparator, the comparator is used instead.
Why does that happen?
TreeSet stores its elements in a binary tree, ordered by their natural ordering or by a supplied comparator. To find a specific element in the collection quickly, TreeSet uses Comparable or Comparator instead of equals().
As the TreeSet JavaDoc states:
Note that the ordering maintained by a set (whether or not an explicit
comparator is provided) must be consistent with equals if it is to
correctly implement the Set interface. (See Comparable or Comparator
for a precise definition of consistent with equals.) This is so
because the Set interface is defined in terms of the equals operation,
but a TreeSet instance performs all element comparisons using its
compareTo (or compare) method, so two elements that are deemed equal
by this method are, from the standpoint of the set, equal. The
behavior of a set is well-defined even if its ordering is inconsistent
with equals; it just fails to obey the general contract of the Set
interface.
We can find a similarity with the hashCode/equals contract:
If equals() returns true, hashCode() has to return the same value for both objects so that the element can be found during a search.
Likewise with TreeSet:
If compare() (or compareTo()) returns 0, equals() has to return true too, and vice versa, for the ordering to be consistent with equals.
THEREFORE: the fields used in the element's equals() method have to be exactly the same (no more, no less) as those used in your Comparator implementation.
A TreeSet is implicitly sorted, and it uses a Comparator for this sorting. The equals() method can only tell you if two objects are the same or different, not how they should be ordered for sorting. Only a Comparator can do that.
More to the point, a TreeSet also uses comparisons for searching. This is sort of the whole point of tree-based map/set. When the contains() method is called, a binary search is performed and the target is either found or not found, based on how the comparator is defined. The comparator defines not only logical order but also logical identity. If you are relying on logical identity defined by an inconsistent equals() implementation, then confusion will probably ensue.
The reason for the different behaviour is that you consider the price member in the compare method but ignore it in equals.
new Car("carB",101f) // what you add to the list
new Car("carB",111f) // what you are looking for
Both instances are "equals" (sorry...) since their model members are equal (and the implementation stops after that test). They "compare" as different, though, because that implementation also checks the price member.
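To make the difference concrete, here is a small sketch with a minimal Car class reconstructed from the question (its exact shape is an assumption). It runs the same contains() check against a TreeSet ordered by price then model, and against one ordered by model only, which matches the equals() implementation.

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

class CarContainsDemo {

    // Minimal stand-in for the Car bean; equals/hashCode look at the model only.
    static final class Car {
        private final String model;
        private final float price;

        Car(String model, float price) {
            this.model = model;
            this.price = price;
        }

        String getModel() { return model; }
        float getPrice() { return price; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Car && model.equals(((Car) o).model);
        }

        @Override
        public int hashCode() {
            return model.hashCode();
        }
    }

    // The ordering from the question: price descending, then model.
    static final Comparator<Car> BY_PRICE_THEN_MODEL = (c1, c2) -> {
        int byPrice = Float.compare(c2.getPrice(), c1.getPrice());
        return byPrice != 0 ? byPrice : c1.getModel().compareTo(c2.getModel());
    };

    // An ordering that uses the same field as equals().
    static final Comparator<Car> BY_MODEL = Comparator.comparing(Car::getModel);

    static boolean containsCarB(Comparator<Car> order) {
        Set<Car> cars = new TreeSet<>(order);
        cars.add(new Car("carA", 100f));
        cars.add(new Car("carB", 101f));
        cars.add(new Car("carC", 110f));
        return cars.contains(new Car("carB", 111f));
    }

    public static void main(String[] args) {
        System.out.println(containsCarB(BY_PRICE_THEN_MODEL));  // false
        System.out.println(containsCarB(BY_MODEL));             // true
    }
}
```

The model-only ordering gives up sorting by price, so it only helps if the set does not need to be ordered by price. Keeping a price-ordered collection alongside a separate model lookup is another option.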
I may be wrong, but as I understand it, we can override equals for a class so that two objects are considered meaningfully equal.
All the entries in a map have distinct keys, and all the entries in a set have distinct values (that is, no two are meaningfully equal).
But when using a TreeMap or a TreeSet, you can provide a comparator.
I noticed that when a comparator is provided, the object's equals method is bypassed, and two objects are considered equal when the comparator returns 0.
Thus, we have 2 objects, but only one is kept in a map's key set or in a set.
I'd like to know if it is possible, using a sorted collection, to distinguish between two different instances.
Here's an easy sample:
public static void main(String[] args) {
TreeSet<String> set = new TreeSet<String>();
String s1 = new String("toto");
String s2 = new String("toto");
System.out.println(s1 == s2);
set.add(s1);
set.add(s2);
System.out.println(set.size());
}
Notice that using new String("xxx") bypasses the String pool, thus s1 != s2.
I'd like to know how to implement a comparator so that the set size is 2 and not 1.
The main question is: for two distinct instances of the same String value, how can I return something != 0 in my comparator?
Note that I'd like that comparator to respect the rules:
Compares its two arguments for order. Returns a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second. The implementor must ensure that sgn(compare(x, y)) == -sgn(compare(y, x)) for all x and y. (This implies that compare(x, y) must throw an exception if and only if compare(y, x) throws an exception.)
The implementor must also ensure that the relation is transitive: ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0.
Finally, the implementer must ensure that compare(x, y)==0 implies that sgn(compare(x, z))==sgn(compare(y, z)) for all z.
It is generally the case, but not strictly required that (compare(x, y)==0) == (x.equals(y)). Generally speaking, any comparator that violates this condition should clearly indicate this fact. The recommended language is "Note: this comparator imposes orderings that are inconsistent with equals."
I can use a trick like:
public int compare(String s1, String s2) {
    if (s1.equals(s2)) { return -1; }
    ...
}
It seems to work fine but the rules are not respected since compare(s1,s2) != -compare(s2,s1)
So is there any elegant solution to this problem?
Edit: for those wondering why I ask such a thing: it's more out of curiosity than because of any real-life problem.
But I have already been in a situation like that and thought about a solution to this problem:
Imagine you have:
class Label {
String label;
}
For each label you have an associated String value.
Now what if you want to have a map, label->value.
But now what if you want to be able to have twice the same label as a map key?
Ex
"label" (ref1) -> value1
"label" (ref2) -> value2
You can implement equals so that two distinct Label instances are not equal; I think that works for HashMap.
But what if you want to be able to sort these Label objects by alphabetical order?
You need to provide a comparator or implement comparable.
But how can we make an order distinction between 2 Labels having the same label?
We must!
compare(ref1,ref2) must not return 0. But should it return -1 or 1?
We could compare the memory address or something like that to make such a decision, but I think that's not possible in Java...
If you're using Guava, you can make use of Ordering.arbitrary(), which will impose an additional order on elements which remains consistent for the life of the VM. You can use this to break ties in your Comparator in a consistent way.
However you could be using the wrong data structure. Have you considered using a Multiset (e.g. TreeMultiset), which allows multiple instances to be added?
Try using the following Comparator (for your example):
Comparator<String> comp = Ordering.natural().compound(Ordering.arbitrary());
This will sort things according to their natural Comparable ordering, but when the natural ordering is equal it will fall back to an arbitrary ordering so that distinct objects remain distinct.
I'm not sure it's a great idea to do that. From the javadoc for Comparator:
Caution should be exercised when using a comparator capable of
imposing an ordering inconsistent with equals to order a sorted set
(or sorted map). Suppose a sorted set (or sorted map) with an explicit
comparator c is used with elements (or keys) drawn from a set S. If
the ordering imposed by c on S is inconsistent with equals, the sorted
set (or sorted map) will behave "strangely." In particular the sorted
set (or sorted map) will violate the general contract for set (or
map), which is defined in terms of equals.
If you want a sorted collection with equal objects, you can put all your objects in a List and use Collections.sort().
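A minimal sketch of that approach, using the "toto" strings from the question plus one extra made-up value. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedListDemo {

    // A List keeps every instance, including ones that are equal to each other.
    static List<String> sortedWithDuplicates() {
        List<String> list = new ArrayList<>();
        list.add(new String("toto"));
        list.add(new String("titi"));
        list.add(new String("toto"));
        Collections.sort(list);  // natural String order
        return list;
    }

    // The two "toto" entries are equal but remain distinct instances.
    static boolean keepsDistinctInstances() {
        List<String> list = sortedWithDuplicates();
        return list.get(1).equals(list.get(2)) && list.get(1) != list.get(2);
    }

    public static void main(String[] args) {
        System.out.println(sortedWithDuplicates());    // [titi, toto, toto]
        System.out.println(keepsDistinctInstances());  // true
    }
}
```

The trade-off is that the list has to be re-sorted (or elements inserted at the right position) whenever it changes, whereas a TreeSet keeps its order on every insertion.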
You might want to use a SortedSet<Collection<String>> or similar, since, as you mentioned, a sorted set doesn't allow you to add multiple equal entries.
Alternatively, you can use Guava's Multiset.
From the JavaDoc on SortedSet:
Note that the ordering maintained by a sorted set (whether or not an explicit comparator is provided) must be consistent with equals if the sorted set is to correctly implement the Set interface.
However, one question still remains: why do you want to have two distinct instances that are logically equal (that's what equals() actually means).
The comparator should actually return 0 only when the two references refer to the same object, like this:
public int compare(String s1,String s2) {
if (s1!=s2) {
int result = s1.compareTo(s2);
if (result == 0) {
return -1;
} else {
return result;
}
} else {
return 0;
}
}
For an object to be inserted into a HashMap, it should implement the equals() and hashCode() methods (though this is not strictly necessary).
Are there any special conditions for an object to be inserted into a TreeMap?
Unless a Comparator which mutually compares the keys is provided in the TreeMap's constructor, the keys must implement Comparable.
See the javadocs on TreeMap constructors for more information: http://download.oracle.com/javase/6/docs/api/java/util/TreeMap.html
EDIT: As @MeBigFatGuy points out, it is highly recommended for keys to override equals() as well, in such a way that the implementation is consistent with the comparison. From the TreeMap javadoc:
Note that the ordering maintained by a sorted map (whether or not an explicit comparator is provided) must be consistent with equals if this sorted map is to correctly implement the Map interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Map interface is defined in terms of the equals operation, but a map performs all key comparisons using its compareTo (or compare) method, so two keys that are deemed equal by this method are, from the standpoint of the sorted map, equal. The behavior of a sorted map is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Map interface.
The type/class should (again, not necessarily) implement the Comparable interface (and override the compareTo method), so as to decide the ordering within the TreeMap.
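The two cases can be sketched as follows. The Point class is a made-up key type that implements neither Comparable nor equals(). On current JDKs the first put() into a comparator-less TreeMap already fails with a ClassCastException; very old releases may only fail on a later insertion.

```java
import java.util.Comparator;
import java.util.TreeMap;

class TreeMapKeyRequirementsDemo {

    // Hypothetical key type: not Comparable, no equals()/hashCode() override.
    static final class Point {
        final int x;

        Point(int x) {
            this.x = x;
        }
    }

    // Without a comparator, TreeMap tries to cast the key to Comparable.
    static boolean putWithoutComparatorFails() {
        TreeMap<Point, String> map = new TreeMap<>();
        try {
            map.put(new Point(1), "one");
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    // With a comparator, any key type works; keys that compare as 0 share one entry.
    static int sizeWithComparator() {
        TreeMap<Point, String> map = new TreeMap<>(Comparator.comparingInt((Point p) -> p.x));
        map.put(new Point(2), "two");
        map.put(new Point(1), "one");
        map.put(new Point(1), "uno");  // compares as 0 with the previous key, replaces its value
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(putWithoutComparatorFails());  // true
        System.out.println(sizeWithComparator());         // 2
    }
}
```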