I have a token class that uses object identity (as in, equals just returns tokenA == tokenB). I'd like to use it in a TreeSet. This means that I need to implement a comparison between two tokens that is compatible with reference equality. I don't care about the specific implementation, so long as it is consistent with equals and fulfills the contract (as per the TreeSet documentation: "Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface.")
Note: these tokens are created on multiple threads, and may be compared on different threads than they were created on.
What would be the best method to go about doing so?
Ideas I've tried:
Using System.identityHashCode - the problem with this is that it is not guaranteed that two different objects will always have a different hashcode. And due to the birthday paradox you only need about 77k tokens before two will collide (assuming that System.identityHashCode is uniformly distributed over the entire 32-bit range, which may not be true...)
Using a comparator over the default Object.toString method for each token. Unfortunately, under the hood this just uses the hash code (same thing as above).
Using an int or long unique value (read: static counter copied into an instance variable). This bloats the size of each token and makes multithreading a pain (not to mention making object creation effectively single-threaded). An AtomicInteger / AtomicLong for the static counter helps somewhat, but it's the size bloat that's more annoying here.
Using System.identityHashCode and a static disambiguation map for any collisions. This works, but is rather complex. Also, Java by default doesn't have a ConcurrentWeakValueHashMultiMap (isn't that a mouthful), which means that I've got to pull in an external dependency (or write my own - probably using something similar to this) to do so, or suffer a (slow) memory leak, or use finalizers (ugh). (And I don't know if anyone implements such a thing...)
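For reference, the third idea (a unique long per token) can be sketched as follows, accepting the per-instance size cost noted above. Token here is a hypothetical stand-in for the real token class; compareTo returns 0 only for the same instance, so the ordering is consistent with identity equals:

```java
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicLong;

// Each token draws a unique long from a shared AtomicLong, giving a
// total order consistent with reference equality.
final class Token implements Comparable<Token> {
    private static final AtomicLong NEXT_ID = new AtomicLong();
    private final long id = NEXT_ID.getAndIncrement();

    @Override
    public int compareTo(Token other) {
        // Long.compare avoids the subtraction-overflow pitfall
        return Long.compare(this.id, other.id);
    }
}

public class TokenDemo {
    public static void main(String[] args) {
        Token a = new Token();
        Token b = new Token();
        TreeSet<Token> set = new TreeSet<>();
        set.add(a);
        set.add(a); // same instance: rejected as a duplicate
        set.add(b);
        System.out.println(set.size()); // prints 2
    }
}
```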
By the way, I can't simply punt the problem and assume unique objects have unique hash codes. That's what I was doing, but the assertion fired in the comparator, and so I dug into it, and, lo and behold, on my machine the following:
import java.util.HashMap;
import java.util.Map;

public class IdentityHashCollision {
    public static void main(String[] args) {
        Map<Integer, Integer> soFar = new HashMap<>();
        for (int i = 1; i <= 1_000_000; i++) {
            TokenA t = new TokenA();
            int ihc = System.identityHashCode(t);
            if (soFar.containsKey(ihc)) {
                System.out.println("Collision: " + ihc + " # object #" + soFar.get(ihc) + " & " + i);
                break;
            }
            soFar.put(ihc, i);
        }
    }
}

class TokenA {
}
prints
Collision: 2134400190 # object #62355 & 105842
So collisions definitely do exist.
So, any suggestions?
There is no magic:
Here is the problem: tokenA == tokenB compares identity; tokenA.equals(tokenB) compares whatever is defined in .equals() for that class, regardless of identity.
So two objects can have .equals() return true without being the same object instance; they don't even have to be the same type or share a supertype.
There are no shortcuts:
compareTo() compares whatever attributes of the objects you choose; you just have to write the code and make it do what you want. But compareTo() is probably not what you want here. compareTo() is for ordering: if two things are not meaningfully < or > each other, then Comparable and Comparator<T> are not what you want.
An equals that is identity is simple:
@Override
public boolean equals(Object o)
{
    return this == o;
}
I took a look at the IntelliJ default hashCode() implementation and was wondering why they implemented it the way they did. I'm quite new to the hash concept and found some contradictory statements that need clarification:
@Override
public int hashCode() {
    // creationDate is of type Date
    int result = this.creationDate != null ? this.creationDate.hashCode() : 0;
    // id is of type Long (wrapper class)
    result = 31 * result + (this.id != null ? this.id.hashCode() : 0);
    // code is of type String
    result = 31 * result + (this.code != null ? this.code.hashCode() : 0);
    // revision is of type int
    result = 31 * result + this.revision;
    return result;
}
Imo, the best source on this topic seemed to be this JavaWorld article, because I found their arguments most convincing. So I was wondering:
Among other arguments, the above source states that multiplication is one of the slower operations. So wouldn't it be better to skip the multiplication by a prime number whenever I call the hashCode() method of a reference type? Because most of the time that already includes such a multiplication.
JavaWorld states that bitwise XOR ^ also improves the computation, for reasons they don't mention :( What exactly might be the advantage in comparison to regular addition?
Wouldn't it be better to return different values when the respective class field is null? It would make the result more distinguishable, wouldn't it? Are there any huge disadvantages to using non-zero values?
Their example code looks more appealing to my eye, tbh:
public int hashCode() {
    return (name == null ? 17 : name.hashCode()) ^
           (birth == null ? 31 : birth.hashCode());
}
But I'm not sure if that's objectively true. I'm also a little bit suspicious of IntelliJ, because their default code for equals(Object) compares by instanceof instead of comparing the instance classes directly, and I agree with the JavaWorld article that this doesn't seem to fulfill the contract correctly.
As for hashCode(), I would consider it more important to minimize collisions (two different objects having the same hashCode()) than the speed of the hashCode() computation. Yes, hashCode() should be fast (constant-time if possible), but for huge data structures using hashCode() (maps, sets etc.) the collisions are the more important factor.
If your hashCode() function performs in constant time (independent of data and input size) and produces a good hashing function (few collisions), asymptotically the operations (get, contains, put) on the map will perform in constant time.
If your hashCode() function produces a lot of collisions, the performance will suffer. In the extreme case, you can always return 0 from hashCode() - the function itself will be super-fast, but the map operations will perform in linear time (i.e. growing with map size).
Multiplying the hashCode() before adding another field's sub-hashCode usually produces fewer collisions - this is a heuristic based on the observation that fields often contain similar data / small numbers.
Consider an example of class Person:
class Person {
int age;
int heightCm;
int weightKg;
}
If you just added the numbers together to compute the hashCode, the result would be somewhere between 60 and 500 for all persons. If you multiply it the way IDEA does, you will get hashCodes between 2000 and more than 100000 - a much bigger space and therefore a lower chance of collisions.
Using XOR is not a very good idea, for example if you have class Rectangle with fields height and width, all squares would have the same hashCode - 0.
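The square collision can be checked directly. This is a minimal sketch of the Rectangle described above, with a deliberately bad XOR-based hashCode:

```java
// For any square, height ^ width is 0, so all squares collide on hashCode 0.
class Rectangle {
    final int height;
    final int width;

    Rectangle(int height, int width) {
        this.height = height;
        this.width = width;
    }

    @Override
    public int hashCode() {
        return height ^ width; // deliberately bad: XOR cancels equal fields
    }
}

public class XorDemo {
    public static void main(String[] args) {
        System.out.println(new Rectangle(3, 3).hashCode()); // prints 0
        System.out.println(new Rectangle(7, 7).hashCode()); // prints 0
        System.out.println(new Rectangle(3, 4).hashCode()); // prints 7
    }
}
```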
As for equals() using instanceof vs. getClass().equals(), I've never seen a conclusive debate on this. Both have their advantages and disadvantages, and both ways can cause troubles if you're not careful:
If you use instanceof, any subclass that overrides your equals() will likely break the symmetry requirement
If you use getClass().equals(), this will not work well with some frameworks like Hibernate that produce their own subclasses of your classes to store their own technical information
I was reading Effective Java Item 9 and decided to run the example code myself. But it works slightly differently depending on how I insert a new object, and I don't understand what exactly is going on inside. The PhoneNumber class looks like:
public class PhoneNumber {
    private final short areaCode;
    private final short prefix;
    private final short lineNumber;

    public PhoneNumber(int areaCode, int prefix, int lineNumber) {
        this.areaCode = (short) areaCode;
        this.prefix = (short) prefix;
        this.lineNumber = (short) lineNumber;
    }

    @Override
    public boolean equals(Object o) {
        if (o == this) return true;
        if (!(o instanceof PhoneNumber)) return false;
        PhoneNumber pn = (PhoneNumber) o;
        return pn.lineNumber == lineNumber && pn.prefix == prefix && pn.areaCode == areaCode;
    }
}
Then, according to the book and as I observed when I tried it:
public static void main(String[] args) {
    HashMap<PhoneNumber, String> phoneBook = new HashMap<PhoneNumber, String>();
    phoneBook.put(new PhoneNumber(707, 867, 5309), "Jenny");
    System.out.println(phoneBook.get(new PhoneNumber(707, 867, 5309)));
}
This prints "null" and it's explained in the book because HashMap has an optimization that caches the hash code associated with each entry and doesn't check for object equality if the hash codes don't match. It makes sense to me. But when I do this:
public static void main(String[] args) {
    PhoneNumber p1 = new PhoneNumber(707, 867, 5309);
    phoneBook.put(p1, "Jenny");
    System.out.println(phoneBook.get(new PhoneNumber(707, 867, 5309)));
}
Now it returns "Jenny". Can you explain why it didn't fail in the second case?
The observed behaviour might depend on the Java version and vendor used to run the application: since the general contract of Object.hashCode() is violated, the result is implementation dependent.
A possible explanation (taking one possible implementation of HashMap):
The HashMap class in its internal implementation puts objects (keys) in different buckets based on their hash code. When you query an element, or check whether a key is contained in the map, first the proper bucket is looked up based on the hash code of the queried key. Inside the bucket objects are checked in a sequential way, and inside a bucket only the equals() method is used to compare elements.
So if you do not override Object.hashCode(), it is nondeterministic whether 2 different objects produce default hash codes that determine the same bucket. If by any chance they "point" to the same bucket, you will still be able to find the key if the equals() method says they are equal. If by any chance they "point" to 2 different buckets, you will not find the key even if the equals() method says they are equal.
hashCode() must be overridden to be consistent with your overridden equals() method. Only in that case is the proper, expected and consistent working of HashMap guaranteed.
Read the javadoc of Object.hashCode() for the contract that you must not violate. The main point is that if equals() returns true for another object, then hashCode() must return the same value for both of these objects.
Can you explain why it didn't fail in the second case?
In a nutshell, it is not guaranteed to fail. The two objects in the second example could end up having the same hash code (purely by coincidence or, more likely, due to compiler optimizations or due to how the default hashCode() works in your JVM). This would lead to the behaviour you describe.
For what it's worth, I cannot reproduce this behaviour with my compiler/JVM.
In your case, by coincidence, the JVM produced the same hashCode for both objects. When I ran your code, my JVM gave null in both cases. So your problem is caused by the JVM, not the code.
It is better to override hashCode() each and every time you override equals().
I haven't read Effective Java; I read SCJP by Kathy Sierra. So if you need more details, you can read that book. It's nice.
Your last code snippet does not compile because you haven't declared phoneBook.
Both main methods should behave exactly the same. There is roughly a 1 in 16 chance that it will print Jenny, because a newly created HashMap has a default capacity of 16. In detail, that means only the lower 4 bits of the hashCode are checked; if those are equal, the equals method is used.
Eclipse source menu has a "generate hashCode / equals method" which generates functions like the one below.
String name;

@Override
public int hashCode()
{
    final int prime = 31;
    int result = 1;
    result = prime * result + ((name == null) ? 0 : name.hashCode());
    return result;
}
@Override
public boolean equals(Object obj)
{
    if (this == obj)
        return true;
    if (obj == null)
        return false;
    if (getClass() != obj.getClass())
        return false;
    CompanyRole other = (CompanyRole) obj;
    if (name == null)
    {
        if (other.name != null)
            return false;
    } else if (!name.equals(other.name))
        return false;
    return true;
}
If I select multiple fields when generating hashCode() and equals() Eclipse uses the same pattern shown above.
I am not an expert on hash functions and I would like to know how "good" the generated hash function is? What are situations where it will break down and cause too many collisions?
You can see the implementation of hashCode function in java.util.ArrayList as
public int hashCode() {
    int hashCode = 1;
    Iterator<E> i = iterator();
    while (i.hasNext()) {
        E obj = i.next();
        hashCode = 31 * hashCode + (obj == null ? 0 : obj.hashCode());
    }
    return hashCode;
}
It is one such example, and your Eclipse-generated code follows a similar way of implementing it. But if you feel that you have to implement your hashCode on your own, there are some good guidelines given by Joshua Bloch in his famous book Effective Java. I will post the important points from Item 9 of that book:
1. Store some constant nonzero value, say, 17, in an int variable called result.
2. For each significant field f in your object (each field taken into account by the equals method, that is), do the following:
a. Compute an int hash code c for the field:
i. If the field is a boolean, compute (f ? 1 : 0).
ii. If the field is a byte, char, short, or int, compute (int) f.
iii. If the field is a long, compute (int) (f ^ (f >>> 32)).
iv. If the field is a float, compute Float.floatToIntBits(f).
v. If the field is a double, compute Double.doubleToLongBits(f), and then hash the resulting long as in step 2.a.iii.
vi. If the field is an object reference and this class’s equals method compares the field by recursively invoking equals, recursively invoke hashCode on the field. If a more complex comparison is required, compute a “canonical representation” for this field and invoke hashCode on the canonical representation. If the value of the field is null, return 0 (or some other constant, but 0 is traditional).
vii. If the field is an array, treat it as if each element were a separate field. That is, compute a hash code for each significant element by applying these rules recursively, and combine these values per step 2.b. If every element in an array field is significant, you can use one of the Arrays.hashCode methods added in release 1.5.
b. Combine the hash code c computed in step 2.a into result as follows: result = 31 * result + c;
3. Return result.
4. When you are finished writing the hashCode method, ask yourself whether equal instances have equal hash codes. Write unit tests to verify your intuition! If equal instances have unequal hash codes, figure out why and fix the problem.
Java language designers and Eclipse seem to follow similar guidelines I suppose. Happy coding. Cheers.
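As a sketch, here is the recipe above applied to a hypothetical class with a long, a double, and an array field (the class and field names are invented for illustration, covering steps 2.a.iii, 2.a.v, 2.a.vii, and 2.b):

```java
import java.util.Arrays;

// Hypothetical class demonstrating Bloch's hashCode recipe.
class Measurement {
    final long timestamp;
    final double value;
    final int[] samples;

    Measurement(long timestamp, double value, int[] samples) {
        this.timestamp = timestamp;
        this.value = value;
        this.samples = samples;
    }

    @Override
    public int hashCode() {
        int result = 17;                                               // step 1
        result = 31 * result + (int) (timestamp ^ (timestamp >>> 32)); // long field (2.a.iii)
        long bits = Double.doubleToLongBits(value);                    // double field (2.a.v)
        result = 31 * result + (int) (bits ^ (bits >>> 32));
        result = 31 * result + Arrays.hashCode(samples);               // array field (2.a.vii)
        return result;
    }
}
```

Two instances built from equal values produce equal hash codes, which is the property the recipe guarantees.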
Since Java 7 you can use java.util.Objects to write short and elegant methods:
import java.util.Objects;

class Foo {
    private String name;
    private String id;

    @Override
    public int hashCode() {
        return Objects.hash(name, id);
    }

    @Override
    public boolean equals(Object obj) {
        if (obj instanceof Foo) {
            Foo right = (Foo) obj;
            return Objects.equals(name, right.name) && Objects.equals(id, right.id);
        }
        return false;
    }
}
Generally it is good, but:
Guava does it somewhat better; I prefer it. [EDIT: It seems that as of JDK7 Java provides a similar hash function].
Some frameworks can cause problems when accessing fields directly instead of using setters/getters, Hibernate for example. For lazily loaded fields, Hibernate creates a proxy, not the real object; only calling the getter will make Hibernate fetch the real value from the database.
Yes, it is perfect :) You will see this approach almost everywhere in the Java source code.
It's a standard way of writing hash functions. However, you can improve/simplify it if you have some knowledge about the fields. E.g. you can omit the null check if your class guarantees that the field is never null (this applies to equals() as well). Or you can delegate to the field's hash code if only one field is used.
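A minimal sketch of that last simplification - delegating to a single field's hashCode - assuming the class guarantees the field is never null (class and field names are made up):

```java
// Single non-null field: equals and hashCode simply delegate to it.
class Id {
    final String value;

    Id(String value) {
        this.value = value; // assumed never null
    }

    @Override
    public int hashCode() {
        return value.hashCode(); // no 31 * ... combining needed for one field
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Id && value.equals(((Id) o).value);
    }
}
```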
I would also like to add a reference to Item 9, in Effective Java 2nd Edition by Joshua Bloch.
Here is a recipe from Item 9 : ALWAYS OVERRIDE HASHCODE WHEN YOU OVERRIDE EQUALS
1. Store some constant nonzero value, say, 17, in an int variable called result.
2. For each significant field f in your object (each field taken into account by the equals method, that is), do the following:
a. Compute an int hash code c for the field:
i. If the field is a boolean, compute (f ? 1 : 0).
ii. If the field is a byte, char, short, or int, compute (int) f.
iii. If the field is a long, compute (int) (f ^ (f >>> 32)).
iv. If the field is a float, compute Float.floatToIntBits(f).
v. If the field is a double, compute Double.doubleToLongBits(f), and then hash the resulting long as in step 2.a.iii.
vi. If the field is an object reference and this class’s equals method compares the field by recursively invoking equals, recursively invoke hashCode on the field. If a more complex comparison is required, compute a “canonical representation” for this field and invoke hashCode on the canonical representation. If the value of the field is null, return 0 (or some other constant, but 0 is traditional).
vii. If the field is an array, treat it as if each element were a separate field. That is, compute a hash code for each significant element by applying these rules recursively, and combine these values per step 2.b. If every element in an array field is significant, you can use one of the Arrays.hashCode methods added in release 1.5.
b. Combine the hash code c computed in step 2.a into result as follows: result = 31 * result + c;
3. Return result.
4. When you are finished writing the hashCode method, ask yourself whether equal instances have equal hash codes. Write unit tests to verify your intuition! If equal instances have unequal hash codes, figure out why and fix the problem.
If you are using the Apache commons-lang library, the classes below will help you generate hashCode/equals/toString methods using reflection. You don't need to worry about regenerating these methods when you add or remove instance variables.
EqualsBuilder - This class provides methods to build a good equals method for any class. It follows rules laid out in Effective Java , by Joshua Bloch. In particular the rule for comparing doubles, floats, and arrays can be tricky. Also, making sure that equals() and hashCode() are consistent can be difficult.
HashCodeBuilder - This class enables a good hashCode method to be built for any class. It follows the rules laid out in the book Effective Java by Joshua Bloch. Writing a good hashCode method is actually quite difficult. This class aims to simplify the process.
ReflectionToStringBuilder - This class uses reflection to determine the fields to append. Because these fields are usually private, the class uses AccessibleObject.setAccessible(java.lang.reflect.AccessibleObject[], boolean) to change the visibility of the fields. This will fail under a security manager, unless the appropriate permissions are set up correctly.
Maven Dependency:
<dependency>
    <groupId>commons-lang</groupId>
    <artifactId>commons-lang</artifactId>
    <version>${commons.lang.version}</version>
</dependency>
Sample Code:
import org.apache.commons.lang.builder.EqualsBuilder;
import org.apache.commons.lang.builder.HashCodeBuilder;
import org.apache.commons.lang.builder.ReflectionToStringBuilder;

public class Test {

    // instance variables...
    // getter/setter methods...

    @Override
    public String toString() {
        return ReflectionToStringBuilder.toString(this);
    }

    @Override
    public int hashCode() {
        return HashCodeBuilder.reflectionHashCode(this);
    }

    @Override
    public boolean equals(Object obj) {
        return EqualsBuilder.reflectionEquals(this, obj);
    }
}
One potential drawback is that all objects with null fields will have a hash code of 31, thus there could be many potential collisions between objects that only contain null fields. This would make for slower lookups in Maps.
This can occur when you have a Map whose key type has multiple subclasses. For example, if you had a HashMap<Object, Object>, you could have many key values whose hash code was 31. Admittedly, this won't occur that often. If you like, you could randomly change the values of the prime to something besides 31, and lessen the probability of collisions.
I want to do nested sorting. I have a course object which has a set of applications. Applications have attributes like time and priority. Now I want to sort them according to priority first, and within the same priority, by time.
For example, given this class (public fields only for brevity):
public class Job {
    public int prio;
    public int timeElapsed;
}
you might implement sorting by time using the static sort(List, Comparator) method in the java.util.Collections class. Here, an anonymous inner class is created to implement the Comparator for "Job". This is sometimes referred to as an alternative to function pointers (since Java does not have those).
public void sortByTime() {
    List<Job> list = new ArrayList<Job>();
    // add some items
    Collections.sort(list, new Comparator<Job>() {
        public int compare(Job j1, Job j2) {
            return j1.timeElapsed - j2.timeElapsed;
        }
    });
}
Mind the contract model of the compare() method: http://java.sun.com/javase/6/docs/api/java/util/Comparator.html#compare(T,%20T)
Take a look at the Google Collections Ordering class at http://google-collections.googlecode.com/svn/trunk/javadoc/index.html?com/google/common/collect/Ordering.html. It should have everything you need plus more. In particular you should take a look at the compound method to get your second ordering.
To sort on multiple criteria, here are a couple of common approaches using the Comparable interface:
write your compareTo() method so that it compares one field, and then goes on to compare the other if it can't return an ordering based on the first;
if you're careful then again, in your compareTo() method, you can translate a combination of both criteria into a single integer that you can then compare.
The first of these approaches is usually preferable and more likely to be correct (even though the code ends up looking a bit more cumbersome).
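The first approach can be sketched for the original question's scenario (priority first, time as tie-break). The Application class and its field names are assumptions for illustration:

```java
// Compare by priority; only fall back to time when priorities tie.
class Application implements Comparable<Application> {
    final int priority;
    final long time;

    Application(int priority, long time) {
        this.priority = priority;
        this.time = time;
    }

    @Override
    public int compareTo(Application other) {
        int byPriority = Integer.compare(this.priority, other.priority);
        if (byPriority != 0) {
            return byPriority;           // priorities differ: that decides
        }
        return Long.compare(this.time, other.time); // tie-break on time
    }
}
```

Using Integer.compare and Long.compare instead of subtraction sidesteps overflow issues.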
See the example on my web site of making Java objects sortable, which shows an example of sorting playing cards by suit and number within the suits.
You've already asked this question elsewhere. Write an implementation of java.util.Comparator.
Subtracting the two numbers as in the example above is not always a good idea.
Consider what would happen if you compared -2,147,483,644 with 2,147,483,645. Subtracting them causes an integer overflow and thus a positive number. A positive result would cause the comparator to claim that -2,147,483,644 is larger than 2,147,483,645.
-5 - 6 = -11
-2,147,483,644 - 2,147,483,645 = 7 (after overflow)
Subtracting to find the compare value is even more dangerous when you consider comparing longs or doubles, since those have to be cast back to an int, providing another opportunity for an overflow. For example, never do this:
class ZardozComparator implements Comparator<Zardoz> {
    public int compare(Zardoz z1, Zardoz z2) {
        Long z1long = Long.valueOf(z1.getName());
        Long z2long = Long.valueOf(z2.getName());
        return (int) (z1long - z2long); // dangerous: subtraction can overflow
    }
}
Instead use the compare method of the object you are comparing. That way you can avoid overflows and if needed you can override the compare method.
class ZardozComparator implements Comparator<Zardoz> {
    public int compare(Zardoz z1, Zardoz z2) {
        Long z1long = Long.valueOf(z1.getName());
        Long z2long = Long.valueOf(z2.getName());
        return z1long.compareTo(z2long); // no overflow risk
    }
}
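The wrap-around described above is easy to verify directly. A quick check of the overflowing subtraction against the safe Integer.compare:

```java
// Subtracting a large positive int from a large negative int wraps around
// and yields a positive result, giving the comparator the wrong sign.
public class OverflowDemo {
    public static void main(String[] args) {
        int small = -2_147_483_644;
        int large = 2_147_483_645;
        System.out.println(small - large);                 // prints 7 (wrong sign!)
        System.out.println(Integer.compare(small, large)); // prints -1 (correct)
    }
}
```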
Here is my take on this 7-year-old question, which still gets visits sometimes:
Make a static method in your class like this (useful if you use other libraries to autogenerate getters and setters):
public static String getNameFrom(Order order) {
    return order.name;
}
Then try to use something like this:
Collections.sort(orders, Comparator.comparing(Order::getNameFrom));
For a more elegant approach, I always prefer not to change the entity, but to use more advanced coding with lambdas. For example:
Collections.sort(orders, (order1, order2) ->
        order1.name.compareTo(order2.name));
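Since Java 8, the two-level sort from the original question (priority first, then time) can also be expressed by chaining comparators, with no hand-written compare at all. App and its fields are assumptions standing in for the question's application objects:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the question's application objects.
class App {
    final int priority;
    final long time;

    App(int priority, long time) {
        this.priority = priority;
        this.time = time;
    }
}

public class NestedSortDemo {
    public static void main(String[] args) {
        List<App> apps = new ArrayList<>(Arrays.asList(
                new App(2, 10), new App(1, 30), new App(1, 20)));
        // Sort by priority first; ties are broken by time.
        apps.sort(Comparator.comparingInt((App a) -> a.priority)
                            .thenComparingLong(a -> a.time));
        for (App a : apps) {
            System.out.println(a.priority + " " + a.time);
        }
        // prints: 1 20, then 1 30, then 2 10
    }
}
```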