TreeSet Comparator

I used a TreeSet with a self-written Comparator. When I add elements to the TreeSet and the Comparator's compare method returns 0, the TreeSet seems to keep only one of the objects with equal ranking.
I didn't see this behaviour documented in the Javadoc. Maybe I missed something. Can you confirm this behaviour?
I edited the Comparator so that it never returns 0, and now the TreeSet contains all the objects with equal ranking.
Is that the way it has to be if I want to have multiple objects with equal ranking?

That's the way it has to be, as a set is defined as containing equal objects only once.
When your Comparator returns 0, the two objects are considered equal, so only one of all equal objects is kept in the set: the first one added, because add leaves the set unchanged when an equal element is already present.
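A minimal sketch of that behaviour, assuming a hypothetical Player class with a name and a rank (neither is from the question). With a rank-only comparator the second add is dropped; adding a tie-breaker on the name keeps both elements without making the comparator "never return 0" for genuinely identical objects.

```java
import java.util.Comparator;
import java.util.TreeSet;

public class RankDemo {
    // Hypothetical element type: a name plus a rank used for ordering.
    static final class Player {
        final String name;
        final int rank;
        Player(String name, int rank) { this.name = name; this.rank = rank; }
    }

    // Compares by rank only, so two players with equal rank compare as 0.
    static final Comparator<Player> BY_RANK =
            Comparator.comparingInt((Player p) -> p.rank);

    // Tie-breaker on name: 0 is returned only when rank and name both match.
    static final Comparator<Player> BY_RANK_THEN_NAME =
            BY_RANK.thenComparing((Player p) -> p.name);

    // Adds two players with the same rank and reports how many the set kept.
    static int sizeAfterAddingTwoEqualRanks(Comparator<Player> order) {
        TreeSet<Player> set = new TreeSet<>(order);
        set.add(new Player("ann", 1));
        set.add(new Player("bob", 1));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAddingTwoEqualRanks(BY_RANK));           // 1
        System.out.println(sizeAfterAddingTwoEqualRanks(BY_RANK_THEN_NAME)); // 2
    }
}
```

A tie-breaker on a real property is usually safer than a comparator that never returns 0, because contains and remove can only find an element when the comparator returns 0 for it.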

Yes, this is documented in the JavaDoc for TreeSet:
Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface. (Emphasis mine)

If you want a sorted collection that can hold multiple objects which are equal to each other, then the TreeMultiset from Google Collections would probably do the trick.

Related

Comparator Data Structure ordering - how to maintain insertion order whilst ordered on another field?

I have a data structure which takes an optional Comparator to customise ordering (in this case it is a TreeSet but really it doesn't matter, I can swap out for a PriorityQueue without breaking my code). At present, it is ordered by a field price on the object which the structure is storing.
When 2 objects have the same price, I want timestamp to be the tie-breaker, timestamp being System.CurrentTime. To specify this in the comparator I have to use:
if (object1.getPrice() == object2.getPrice() && object1.timestamp > object2.timestamp) return 1;
The problem is that this breaks the equals case when I do TreeSet.floor() or TreeSet.ceiling() - the method no longer recognises that 2 objects are of equal price but will still recognise if the price is higher/lower. How do I mitigate this?

With the Comparator you decide the order of the set and, in effect, the "uniqueness" of its elements as well.
So you have to decide when two objects count as equal:
if the price is the same
if the price and the timestamp are the same
Using different data structures for these different usages might be an option, or a single different one that covers both, depending on the other requirements and constraints of your code.
Take a look here: https://docs.oracle.com/javase/7/docs/api/java/util/TreeSet.html
A NavigableSet implementation based on a TreeMap. The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
So you provide the Comparator.
Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal.
So if the set already contains an element that is equal according to the comparator, the new element won't be added, because it would occupy the very same position in the set's order (if you don't like that, don't use a Set). The comparator and equals should be implemented consistently (even though the compiler does not enforce it). So instead of calling equals directly, the TreeSet uses the comparator.
Internally the comparator could use the equals method of the objects (inside an if statement, like if(obj1.equals(obj2)){return 0;}).
But how to implement equals? Internally it would rely on some property comparison (like you do now in the comparator).
If you don't like that for your data structure, maybe don't use a (Tree)Set.
Depending on the needs a Map or a List might be fine.
Choosing the right Collection:
http://www.javapractices.com/topic/TopicAction.do?Id=65
For the difference between == and equals:
== compares references (the pointer to the container, so to speak); if true, both are the same instance.
equals compares the objects (the content of the container, so to speak); if true, they hold the same value(s) in their properties.
But there are probably millions of exhaustive articles out there on that.
Note: equals of an Object should be implemented consistent with the hashCode() method. And for a (Tree)Set the comparator should be implemented consistent with equals (maybe use something like if(obj1.equals(obj2)){return 0;}). Other Set implementations, such as HashSet, use hashCode() and equals for that instead.
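One possible way to keep the timestamp tie-breaker and still look up "same price" entries with floor() and ceiling() is to query with probe objects. The sketch below is only an illustration: the Order class, its field names, and the helper methods are assumptions, and it presumes no real order carries a timestamp of Long.MIN_VALUE or Long.MAX_VALUE.

```java
import java.util.Comparator;
import java.util.TreeSet;

public class OrderBookDemo {
    // Hypothetical element type; the field names are assumptions.
    static final class Order {
        final long price;
        final long timestamp;
        Order(long price, long timestamp) { this.price = price; this.timestamp = timestamp; }
        @Override public String toString() { return price + "@" + timestamp; }
    }

    // Price first, timestamp as tie-breaker; 0 only when both fields match.
    static final Comparator<Order> BY_PRICE_THEN_TIME =
            Comparator.comparingLong((Order o) -> o.price)
                      .thenComparingLong(o -> o.timestamp);

    // Earliest order at exactly this price, or null if there is none.
    static Order earliestAtPrice(TreeSet<Order> orders, long price) {
        // The probe sorts before every real order with the same price.
        Order hit = orders.ceiling(new Order(price, Long.MIN_VALUE));
        return (hit != null && hit.price == price) ? hit : null;
    }

    // Latest order at exactly this price, or null if there is none.
    static Order latestAtPrice(TreeSet<Order> orders, long price) {
        // The probe sorts after every real order with the same price.
        Order hit = orders.floor(new Order(price, Long.MAX_VALUE));
        return (hit != null && hit.price == price) ? hit : null;
    }

    static TreeSet<Order> sample() {
        TreeSet<Order> orders = new TreeSet<>(BY_PRICE_THEN_TIME);
        orders.add(new Order(100, 5));
        orders.add(new Order(100, 9));
        orders.add(new Order(101, 7));
        return orders;
    }

    public static void main(String[] args) {
        TreeSet<Order> orders = sample();
        System.out.println(orders.size());               // 3, both 100-priced orders are kept
        System.out.println(earliestAtPrice(orders, 100)); // 100@5
        System.out.println(latestAtPrice(orders, 100));   // 100@9
        System.out.println(earliestAtPrice(orders, 102)); // null
    }
}
```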

when to implement comparable and when to implement equals in Java

In Java, when should I implement Comparable<Something> versus implementing the equals method? I understand every time I implement equals I also have to implement hash code.
EDIT
Based on answers I am getting below:
Is it safe to say that if I implement Comparable then I don't need to implement equals and hashCode? As in: is whatever I can accomplish with equals already included in compareTo? For example, I want to be able to compare two BSTs for equality. Implementing a hashCode for that seems daunting, so would Comparable be sufficient?

If you only ever need to compare them for equality (or put them in a HashMap or HashSet, which is effectively the same), you only need to implement equals and hashCode.
If your objects have an implicit order and you intend to sort them (or put them in a TreeMap or TreeSet, which is effectively sorting), then you must implement Comparable or provide a Comparator.
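As a rough illustration of implementing both, here is a sketch with a hypothetical Version class (not from the question) whose compareTo is consistent with equals, so hash-based and tree-based sets agree on what counts as a duplicate.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

public class VersionDemo {
    // Hypothetical value type with both equality and a natural order.
    static final class Version implements Comparable<Version> {
        final int major;
        final int minor;
        Version(int major, int minor) { this.major = major; this.minor = minor; }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Version)) return false;
            Version other = (Version) o;
            return major == other.major && minor == other.minor;
        }

        // Must agree with equals: equal objects produce equal hash codes.
        @Override public int hashCode() { return Objects.hash(major, minor); }

        // Consistent with equals: returns 0 exactly when both fields match.
        @Override public int compareTo(Version other) {
            int byMajor = Integer.compare(major, other.major);
            return byMajor != 0 ? byMajor : Integer.compare(minor, other.minor);
        }

        @Override public String toString() { return major + "." + minor; }
    }

    static List<Version> sample() {
        return Arrays.asList(new Version(1, 2), new Version(1, 0), new Version(1, 2));
    }

    public static void main(String[] args) {
        Set<Version> hashed = new HashSet<>(sample());     // uses equals and hashCode
        TreeSet<Version> sorted = new TreeSet<>(sample()); // uses compareTo
        System.out.println(hashed.size()); // 2
        System.out.println(sorted);        // [1.0, 1.2]
    }
}
```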

Comparable is typically used for ordering items (like sorting) and equals is used for checking if two items are equal. You could use compareTo to check for equality, but the two are not required to agree. From the docs, "It is strongly recommended (though not required) that natural orderings be consistent with equals."

equals (and hashCode) are used for equality tests. The Comparable interface can be used for equality checks too, but it is in practice used for sorting elements or comparing their order.
That is, equals deals only with equality (similar to == and !=), while compareTo allows you to check many different types of inequality (similar to <, <=, ==, >=, > and !=). Therefore, if your class has a partial or total ordering of some kind, you may want to implement Comparable.
The relationship between compareTo and equals is briefly mentioned in the javadocs for Comparable:
It is strongly recommended, but not strictly required that (x.compareTo(y)==0) == (x.equals(y)). Generally speaking, any class that implements the Comparable interface and violates this condition should clearly indicate this fact. The recommended language is "Note: this class has a natural ordering that is inconsistent with equals."
Comparable is used by TreeSet to compare and sort anything you insert into it, and it is used by List#sort(null) to sort a list. It will be used in other places too, but those are the first that come to mind.

You should consider when you will use the object.
If, for example, you will use it in a TreeMap or TreeSet, then you'll need Comparable (or a Comparator supplied to the collection). The same goes for any case where you have to sort elements, especially in sorted collections. Overriding equals and hashCode will be needed in a very large number of cases, in fact most of them.

As per the javadocs:
This interface imposes a total ordering on the objects of each class that implements it. This ordering is referred to as the class's natural ordering, and the class's compareTo method is referred to as its natural comparison method.
Lists (and arrays) of objects that implement this interface can be sorted automatically by Collections.sort (and Arrays.sort). Objects that implement this interface can be used as keys in a sorted map or as elements in a sorted set, without the need to specify a comparator.
The natural ordering for a class C is said to be consistent with equals if and only if e1.compareTo(e2) == 0 has the same boolean value as e1.equals(e2) for every e1 and e2 of class C. Note that null is not an instance of any class, and e.compareTo(null) should throw a NullPointerException even though e.equals(null) returns false.
In short, equals() and hashCode() are for comparing equality, whereas Comparable is for sorting.

If you update the equals method, you should always update hashCode, or you will be setting a booby trap for yourself or others when they try to use the class in a HashMap, HashSet, or similar collections (which are very commonly used).
Comparable is required only when the interfaces you are working with require it. You can implement it for use with sorting and sorted collections, but it's not actually required in most cases. An alternative approach is to pass a Comparator to the sort method or the collection constructor. For example, the String class has a case-insensitive Comparator that is very useful and eliminates the need to create your own String class (which would not work anyway, since String cannot be extended).
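The case-insensitive comparator mentioned above is String.CASE_INSENSITIVE_ORDER. A short sketch of passing it to a sort and to a TreeSet constructor follows; the helper method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class CaseInsensitiveDemo {
    // Returns a sorted copy; List.sort is stable, so "Apple" stays ahead of "apple".
    static List<String> sortedIgnoringCase(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    // The comparator also decides uniqueness: "apple" counts as a duplicate of "Apple".
    static TreeSet<String> distinctIgnoringCase(List<String> words) {
        TreeSet<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(words);
        return set;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "Apple", "apple", "Banana");
        System.out.println(sortedIgnoringCase(words));   // [Apple, apple, Banana, pear]
        System.out.println(distinctIgnoringCase(words)); // [Apple, Banana, pear]
    }
}
```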

Equals and Comparable with Sets

I posted some code here which correctly solved a problem the poster had. OP wanted to remove duplicates and bring certain special items to the top of a list. I used a TreeSet with a special Comparable class which wrapped the Locale they were working with to achieve what they wanted.
I then got to thinking ... as you do ... that I was eliminating duplicates by returning 0 from the compareTo method, not by returning true from an equals implementation as one would need to do to correctly indicate a duplicate in a Set (from the definition of a Set).
I have no objection to using this technique but am I using what might be considered an undocumented feature? Am I safe to assume that doing this kind of thing going forward will continue to work?

It seems like this is pretty well documented in JavaDoc of TreeSet (bold mine):
Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
Here is an example of the only (?) JDK class that implements Comparable but is not consistent with equals():
Set<BigDecimal> decimals = new HashSet<BigDecimal>();
decimals.add(new BigDecimal("42"));
decimals.add(new BigDecimal("42.0"));
decimals.add(new BigDecimal("42.00"));
System.out.println(decimals);
decimals at the end have three values because 42, 42.0 and 42.00 are not equal as far as equals() is concerned. But if you replace HashSet with TreeSet, the resulting set contains only 1 item (42 - that happened to be the first one added) as all of them are considered equal when compared using BigDecimal.compareTo().
This shows that TreeSet is in a way "broken" when used with types whose ordering is not consistent with equals(). It still works properly and all operations are well-defined; it just doesn't obey the contract of the Set interface, under which two objects that are not equals() are not considered duplicates.
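Both halves of that comparison can be run side by side. This is just a self-contained restatement of the snippet above with the TreeSet variant added; the class name is arbitrary.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class BigDecimalSetDemo {
    // Same numeric value, three different scales.
    static final List<BigDecimal> VALUES = Arrays.asList(
            new BigDecimal("42"), new BigDecimal("42.0"), new BigDecimal("42.00"));

    public static void main(String[] args) {
        // HashSet relies on equals()/hashCode(), which take the scale into account.
        Set<BigDecimal> hashed = new HashSet<>(VALUES);
        System.out.println(hashed.size()); // 3

        // TreeSet relies on compareTo(), which ignores the scale.
        Set<BigDecimal> sorted = new TreeSet<>(VALUES);
        System.out.println(sorted); // [42]
    }
}
```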
See also
What does comparison being consistent with equals mean ? What can possibly happen if my class doesn't follow this principle?

java.lang.Comparable and equals

If I implement java.lang.Comparable for a class, do I still have to override the equals() method? Or will the Comparable work for equals as well?
If the answer is no, then what if some discrepancy arises? Let's say the way I term two objects as equal within the equals() method is different from the way I term two objects of the same class as equal within the compareTo() of the Comparable.
Moreover, if I implement Comparable, do I also have to override equals()?

While it is recommended (and pretty sensible) that a.compareTo(b) == 0 imply a.equals(b) (and vice versa), it is not required. Comparable is intended to be used when performing an ordering on a series of objects, whereas equals() just tests for straight equality.
This link has some good information on implementing compareTo properly.

From Javadoc of java.lang.Comparable:
It is strongly recommended (though not required) that natural orderings be consistent with equals.

While it is recommended, it is not required that .equals() and .compareTo() have the same behaviour.
Just look at the BigDecimal Docs for equals() method:
Unlike compareTo, this method considers two BigDecimal objects equal only if they are equal in value and scale (thus 2.0 is not equal to 2.00 when compared by this method).
BigDecimal is a core class of java that has different behaviour for equals() and compareTo() and serves as a good example of the difference between 2 objects that are comparable vs truly equal.

Let's say the way I term two objects as equal within the equals() method is different from the way I term two objects of the same class as equal within the compareTo() of the Comparable?
If you do this, and you put those objects into a sorted set, the set will misbehave. From the docs on SortedSet:
Note that the ordering maintained by a sorted set (whether or not an explicit comparator is provided) must be consistent with equals if the sorted set is to correctly implement the Set interface.
For example, a TreeSet may (erroneously) contain two objects where
a.compareTo(b) != 0
even though
a.equals(b) == true
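To make that concrete, here is a small sketch with a hypothetical Ticket class whose equals() looks only at an id while compareTo() looks only at a priority. The class and its fields are invented for illustration.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class InconsistentOrderDemo {
    // Hypothetical type whose natural ordering is inconsistent with equals.
    static final class Ticket implements Comparable<Ticket> {
        final String id;
        final int priority;
        Ticket(String id, int priority) { this.id = id; this.priority = priority; }

        // Equality is based on the id only.
        @Override public boolean equals(Object o) {
            return o instanceof Ticket && ((Ticket) o).id.equals(id);
        }
        @Override public int hashCode() { return id.hashCode(); }

        // Ordering is based on the priority only.
        @Override public int compareTo(Ticket other) {
            return Integer.compare(priority, other.priority);
        }
    }

    // Adds two tickets that are equals() but have different priorities.
    static int sizeAfterAddingEqualTickets(Set<Ticket> set) {
        set.add(new Ticket("A", 1));
        set.add(new Ticket("A", 2));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAddingEqualTickets(new HashSet<>())); // 1
        System.out.println(sizeAfterAddingEqualTickets(new TreeSet<>())); // 2
    }
}
```

The TreeSet ends up holding two elements that are equal to each other, which is exactly what the Set contract forbids.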

Java SortedSet + Comparator, consistency with equals() question

I'd like to have a SortedSet of Collections (Sets themselves, in this case, but not necessarily in general), that sorts by Collection size. This seems to violate the proscription to have the Comparator be consistent with equals() - i.e., two collections may be unequal (by having different elements), but compare to the same value (because they have the same number of elements).
Notionally, I could also put into the Comparator ways to sort sets of equal sizes, but the use of the sort wouldn't take advantage of that, and there's not really a useful + intuitive way to compare the Collections of equal size (at least, in my particular case), so that seems like a waste.
Does this case of inconsistency seem like a problem?

The SortedSet interface extends the core Set interface and thus should conform to the contract outlined in the Set specification.
The only possible way to achieve that is to have your elements' equals() method behave consistently with your Comparator. The reason is that the core Set operates based on equality, whereas a SortedSet operates based on comparison.
For example, the add() method defined in the core Set interface specifies that you can't add an element to the set if there already is an element whose equals() method would return true with the new element as argument. A SortedSet doesn't use equals(), though; it uses compareTo() (or compare()). So if your comparison returns a non-zero value, your element WILL be added even if equals() were to return true, thus breaking the Set contract.
None of this is a practical problem per se, however. SortedSet behavior is always consistent, even if compare() and equals() are not.

As ChssPly76 wrote in a comment, you can use hashCode to decide the compareTo call in the case where two Collections have the same size but are not equal. This works fine, except in the rare case where you have two Collections that have the same size and are not equal, but have the same hashCode. Admittedly, the chances of that happening are pretty small, but it is conceivable. If you want to be more careful, use System.identityHashCode instead of hashCode. It does not depend on the contents, so two distinct Collections are very unlikely to share a value, although uniqueness is not guaranteed and collisions remain possible.
At the end of the day, this gives you the functionality of having the Collections in the Set sorted by size, with arbitrary ordering in the case of two Collections with matching size. If this is all you need, it's not much slower than the usual comparison would be. If you need the ordering to be consistent between different JVM instances, this won't work and you'll have to do it some other way.
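Sketch of the compare(a, b) method body:
if (a.equals(b)) {
    return 0;
} else if (a.size() != b.size()) {
    // Primary order: collection size.
    return Integer.compare(a.size(), b.size());
} else {
    // Arbitrary tie-break; only stable within one JVM run, and identity hash codes can collide.
    return Integer.compare(System.identityHashCode(a), System.identityHashCode(b));
}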

This seems to violate the proscription to have the Comparator be consistent with equals() - i.e., two collections may be unequal (by having different elements), but compare to the same value (because they have the same number of elements).
There is no requirement, either stated (in the Javadoc) or implied, that a Comparator be consistent with an object's implementation of boolean equals(Object).
Note that Comparable and Comparator are distinct interfaces with different purposes. Comparable is used to define a 'natural' order for a class. In that context, it would be a bad idea for equals and compareTo to be inconsistent. By contrast, a Comparator is used when you want to use a different order from the natural order of a class.
EDIT: Here's the complete paragraph from the Javadoc for SortedSet.
Note that the ordering maintained by a sorted set (whether or not an explicit comparator is provided) must be consistent with equals if the sorted set is to correctly implement the Set interface. (See the Comparable interface or Comparator interface for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a sorted set performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the sorted set, equal. The behavior of a sorted set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
I have highlighted the final sentence. The point is that such a SortedSet will work as you would most likely expect, but the behavior of some operations won't exactly match the Set specification ... because the specification defines their behavior in terms of the equals method.
So in fact, there is a stated requirement for consistency (my mistake), but the consequences of ignoring it are not as bad as you might think. Of course, it is up to you to decide if you should do that. In my estimation, it should be OK, provided that you comment the code thoroughly and make sure that the SortedSet does not 'leak'.
However, it is not clear to me that a Comparator for collections that only looks at a collection's size is going to work from a semantic perspective. I mean, do you really want to say that all collections with (say) 2 elements are equal? This will mean that your set can only ever contain one collection of any given size ...

There is no reason why a Comparator should return the same results as equals(). In fact, the Comparator API was introduced because equals() just isn't enough: If you want to sort a collection, you must know whether two elements are lesser or greater.

It's a little bit odd that SortedSet as a part of the standard API breaks the contract defined in the Set interface and uses the Comparator to define equality instead of the equals method, but that's how it is.
If your actual problem is to sort a collection of collections according to the contained collections' size, you are better off with a List, which you can sort using Collections.sort(List, Comparator).
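A minimal sketch of the List approach, assuming the elements are sets of strings (the element type and the method name are illustrative only). Because a List keeps duplicates and the sort is stable, collections of equal size all survive and keep their relative order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SortBySizeDemo {
    // Returns a copy of the list ordered by the size of each contained set.
    static List<Set<String>> sortedBySize(List<Set<String>> sets) {
        List<Set<String>> copy = new ArrayList<>(sets);
        Collections.sort(copy, Comparator.comparingInt((Set<String> s) -> s.size()));
        return copy;
    }

    // Small helper so the sample sets print in a predictable order.
    static Set<String> setOf(String... items) {
        return new TreeSet<>(Arrays.asList(items));
    }

    public static void main(String[] args) {
        List<Set<String>> sets = Arrays.asList(
                setOf("a", "b", "c"), setOf("x"), setOf("y"), setOf("p", "q"));
        // Both one-element sets are kept, in their original relative order.
        System.out.println(sortedBySize(sets)); // [[x], [y], [p, q], [a, b, c]]
    }
}
```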
