How do we decide on the best implementation of the hashCode() method for a collection (assuming that the equals method has been overridden correctly)?
The best implementation? That is a hard question because it depends on the usage pattern.
An implementation that is reasonably good for nearly all cases was proposed in Josh Bloch's Effective Java, in Item 9 (second edition). The best thing is to look it up there, because the author explains why the approach is good.
A short version
Create an int result and assign a non-zero value.
For every field f tested in the equals() method, calculate a hash code c by:
If the field f is a boolean: calculate (f ? 1 : 0);
If the field f is a byte, char, short or int: calculate (int)f;
If the field f is a long: calculate (int)(f ^ (f >>> 32));
If the field f is a float: calculate Float.floatToIntBits(f);
If the field f is a double: calculate Double.doubleToLongBits(f) and handle the return value like every long value;
If the field f is an object: Use the result of the hashCode() method or 0 if f == null;
If the field f is an array: treat every element as a separate field, calculate the hash values in a recursive fashion, and combine the values as described next.
Combine the hash value c with result:
result = 37 * result + c
Return result
This should result in a proper distribution of hash values for most use situations.
If you're happy with the Effective Java implementation recommended by dmeister, you can use a library call instead of rolling your own:
@Override
public int hashCode() {
return Objects.hash(this.firstName, this.lastName);
}
This requires either Guava (com.google.common.base.Objects.hashCode) or the standard library in Java 7 (java.util.Objects.hash) but works the same way.
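To make "works the same way" a bit more concrete, here is a minimal sketch comparing java.util.Objects.hash with the recipe written out by hand. It relies on the documented behaviour that Objects.hash delegates to Arrays.hashCode, which starts at 1 and multiplies by 31; the class name ObjectsHashDemo and the helper byHand are made up for the illustration.

```java
import java.util.Objects;

final class ObjectsHashDemo {
    // Hand-written equivalent of Objects.hash(first, second):
    // start at 1, then fold in each field with the multiplier 31.
    static int byHand(String first, String second) {
        int result = 1;
        result = 31 * result + (first == null ? 0 : first.hashCode());
        result = 31 * result + (second == null ? 0 : second.hashCode());
        return result;
    }

    public static void main(String[] args) {
        // Both comparisons print true: the library call and the manual
        // recipe produce the same value, including for a null field.
        System.out.println(Objects.hash("John", "Doe") == byHand("John", "Doe"));
        System.out.println(Objects.hash(null, "Doe") == byHand(null, "Doe"));
    }
}
```

Note that the library call boxes primitives and allocates a varargs array, so the hand-written form may still be preferable in very hot code paths.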
Although this is linked to the Android documentation (Wayback Machine) and my own code on GitHub, it will work for Java in general. My answer is an extension of dmeister's answer, with code that is much easier to read and understand.
@Override
public int hashCode() {
// Start with a non-zero constant. Prime is preferred
int result = 17;
// Include a hash for each field.
// Primitives
result = 31 * result + (booleanField ? 1 : 0); // 1 bit » 32-bit
result = 31 * result + byteField; // 8 bits » 32-bit
result = 31 * result + charField; // 16 bits » 32-bit
result = 31 * result + shortField; // 16 bits » 32-bit
result = 31 * result + intField; // 32 bits » 32-bit
result = 31 * result + (int)(longField ^ (longField >>> 32)); // 64 bits » 32-bit
result = 31 * result + Float.floatToIntBits(floatField); // 32 bits » 32-bit
long doubleFieldBits = Double.doubleToLongBits(doubleField); // 64 bits (double) » 64-bit (long) » 32-bit (int)
result = 31 * result + (int)(doubleFieldBits ^ (doubleFieldBits >>> 32));
// Objects
result = 31 * result + Arrays.hashCode(arrayField); // var bits » 32-bit
result = 31 * result + referenceField.hashCode(); // var bits » 32-bit (non-nullable)
result = 31 * result + // var bits » 32-bit (nullable)
(nullableReferenceField == null
? 0
: nullableReferenceField.hashCode());
return result;
}
EDIT
Typically, when you override hashCode(...), you also want to override equals(...). So for those who will implement or have already implemented equals, here is a good reference from my GitHub...
@Override
public boolean equals(Object o) {
// Optimization (not required).
if (this == o) {
return true;
}
// Return false if the other object has the wrong type, interface, or is null.
if (!(o instanceof MyType)) {
return false;
}
MyType other = (MyType) o;
// Primitive fields
return booleanField == other.booleanField
&& byteField == other.byteField
&& charField == other.charField
&& shortField == other.shortField
&& intField == other.intField
&& longField == other.longField
// compare() agrees with floatToIntBits/doubleToLongBits for NaN and -0.0,
// which keeps equals() consistent with the hashCode() above
&& Float.compare(floatField, other.floatField) == 0
&& Double.compare(doubleField, other.doubleField) == 0
// Arrays
&& Arrays.equals(arrayField, other.arrayField)
// Objects
&& referenceField.equals(other.referenceField)
&& (nullableReferenceField == null
? other.nullableReferenceField == null
: nullableReferenceField.equals(other.nullableReferenceField));
}
It is better to use the functionality provided by Eclipse which does a pretty good job and you can put your efforts and energy in developing the business logic.
First make sure that equals is implemented correctly. From an IBM DeveloperWorks article:
Symmetry: For two references, a and b, a.equals(b) if and only if b.equals(a)
Reflexivity: For all non-null references, a.equals(a)
Transitivity: If a.equals(b) and b.equals(c), then a.equals(c)
Then make sure that their relation with hashCode respects the contract (from the same article):
Consistency with hashCode(): Two equal objects must have the same hashCode() value
Finally a good hash function should strive to approach the ideal hash function.
about8.blogspot.com, you said
if equals() returns true for two objects, then hashCode() should return the same value. If equals() returns false, then hashCode() should return different values
I cannot agree with you. If two objects have the same hashcode it doesn't have to mean that they are equal.
If A equals B, then A.hashCode() must be equal to B.hashCode()
but
if A.hashCode() equals B.hashCode(), it does not mean that A must equal B
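A small illustration of that second point, using two well-known colliding strings from the JDK's own String.hashCode(). The class name EqualHashDemo and the helper sameHash are made up for this sketch.

```java
final class EqualHashDemo {
    // True when both objects report the same hash code.
    static boolean sameHash(Object x, Object y) {
        return x.hashCode() == y.hashCode();
    }

    public static void main(String[] args) {
        String a = "Aa";
        String b = "BB";
        // 'A' * 31 + 'a' and 'B' * 31 + 'B' both evaluate to 2112.
        System.out.println(sameHash(a, b)); // true
        System.out.println(a.equals(b));    // false
    }
}
```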
If you use eclipse, you can generate equals() and hashCode() using:
Source -> Generate hashCode() and equals().
Using this function you can decide which fields you want to use for equality and hash code calculation, and Eclipse generates the corresponding methods.
There's a good implementation of Effective Java's hashCode() and equals() logic in Apache Commons Lang. Check out HashCodeBuilder and EqualsBuilder.
Just a quick note to complement the other, more detailed answers (in terms of code):
If I consider the question how-do-i-create-a-hash-table-in-java and especially the jGuru FAQ entry, I believe some other criteria upon which a hash code could be judged are:
synchronization (does the algo support concurrent access or not) ?
fail safe iteration (does the algo detect a collection which changes during iteration)
null value (does the hash code support null value in the collection)
If I understand your question correctly, you have a custom collection class (i.e. a new class that extends from the Collection interface) and you want to implement the hashCode() method.
If your collection class extends AbstractList, then you don't have to worry about it: there is already an implementation of equals() and hashCode() that works by iterating through all the elements and combining their hash codes.
public int hashCode() {
int hashCode = 1;
Iterator i = iterator();
while (i.hasNext()) {
Object obj = i.next();
hashCode = 31*hashCode + (obj==null ? 0 : obj.hashCode());
}
return hashCode;
}
Now if what you want is the best way to calculate the hash code for a specific class, I normally use the ^ (bitwise exclusive or) operator to process all fields that I use in the equals method:
public int hashCode(){
return intMember ^ (stringField != null ? stringField.hashCode() : 0);
}
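One property of plain XOR worth knowing before adopting it: it ignores field order, and equal values cancel out. The sketch below assumes a hypothetical class with two int fields and compares XOR with a multiplier-based combination; the names are invented for the example.

```java
final class XorHashDemo {
    // XOR-based combination of two fields.
    static int xorHash(int x, int y) {
        return x ^ y;
    }

    // Multiplier-based combination, for comparison.
    static int mulHash(int x, int y) {
        return 31 * x + y;
    }

    public static void main(String[] args) {
        // Swapping the fields does not change the XOR result.
        System.out.println(xorHash(1, 2) + " " + xorHash(2, 1)); // 3 3
        // Two equal fields always XOR to zero.
        System.out.println(xorHash(5, 5));                        // 0
        // The multiplier makes the order matter.
        System.out.println(mulHash(1, 2) + " " + mulHash(2, 1)); // 33 63
    }
}
```

Whether this matters depends on the data: for fields of unrelated types the symmetry is rarely a problem, for pairs of similar values (coordinates, ranges) it can be.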
@about8: there is a pretty serious bug there.
Zam obj1 = new Zam("foo", "bar", "baz");
Zam obj2 = new Zam("fo", "obar", "baz");
same hashcode
you probably want something like
public int hashCode() {
return 31 * getFoo().hashCode() + getBar().hashCode();
}
(An int has no toString() method in Java, so there is no need to go through a string at all: combine the two hash codes directly, with a multiplier so that the order of the fields matters. Add null checks if the fields can be null.)
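The collision is easy to reproduce. The sketch below is standalone (it does not use the Zam class itself; the helper names are made up) and contrasts hashing the concatenated string with hashing the fields separately via java.util.Objects.hash.

```java
import java.util.Objects;

final class ConcatCollisionDemo {
    // The approach from the question: hash the concatenation.
    static int concatHash(String foo, String bar) {
        return (foo + bar).hashCode();
    }

    // Hash each field on its own and combine the results.
    static int fieldHash(String foo, String bar) {
        return Objects.hash(foo, bar);
    }

    public static void main(String[] args) {
        // "foo" + "bar" and "fo" + "obar" are the same string "foobar".
        System.out.println(concatHash("foo", "bar") == concatHash("fo", "obar")); // true
        System.out.println(fieldHash("foo", "bar") == fieldHash("fo", "obar"));   // false
    }
}
```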
As you specifically asked for collections, I'd like to add an aspect that the other answers haven't mentioned yet: a HashMap doesn't expect its keys to change their hash code once they are added to the collection. That would defeat the whole purpose...
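A minimal sketch of what goes wrong, assuming java.util.HashMap as shipped with OpenJDK and a hypothetical Key class whose hash code depends on a mutable field:

```java
import java.util.HashMap;
import java.util.Map;

final class MutableKeyDemo {
    // Hypothetical key type: hashCode() depends on a field that can change.
    static final class Key {
        int id;

        Key(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }

        @Override
        public int hashCode() {
            return id;
        }
    }

    // Returns whether the map still finds the key after it was mutated.
    static boolean foundAfterMutation() {
        Map<Key, String> map = new HashMap<>();
        Key key = new Key(1);
        map.put(key, "value");
        key.id = 2; // changes hashCode() while the key sits in the map
        return map.containsKey(key);
    }

    public static void main(String[] args) {
        // The entry is still filed under the old hash, so the lookup misses it.
        System.out.println(foundAfterMutation()); // false
    }
}
```

The entry is not gone, it is just unreachable through normal lookups, which is why immutable keys are the safer choice.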
Use the reflection methods on Apache Commons EqualsBuilder and HashCodeBuilder.
I use a tiny wrapper around Arrays.deepHashCode(...) because it handles arrays supplied as parameters correctly
public static int hash(final Object... objects) {
return Arrays.deepHashCode(objects);
}
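To show what "handles arrays correctly" means here, a short sketch using the same wrapper (the class name DeepHashDemo is made up). With Arrays.deepHashCode, a nested array contributes a hash of its contents rather than its identity hash, so two calls with equal array contents agree.

```java
import java.util.Arrays;

final class DeepHashDemo {
    // Same wrapper as above: nested arrays are hashed by content.
    static int hash(final Object... objects) {
        return Arrays.deepHashCode(objects);
    }

    public static void main(String[] args) {
        int[] first = {1, 2};
        int[] second = {1, 2}; // distinct array object, same contents
        System.out.println(hash(first, "x") == hash(second, "x")); // true
    }
}
```

With Objects.hash(first, "x") the array would contribute its identity hash instead, so the two calls would normally disagree.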
Any hashing method that evenly distributes the hash value over the possible range is a good implementation. See Effective Java ( http://books.google.com.au/books?id=ZZOiqZQIbRMC&dq=effective+java&pg=PP1&ots=UZMZ2siN25&sig=kR0n73DHJOn-D77qGj0wOxAxiZw&hl=en&sa=X&oi=book_result&resnum=1&ct=result ); there is a good tip in there for hashCode implementation (item 9, I think...).
I prefer using the utility methods from the Objects class of the Google Collections library, which help me keep my code clean. Very often equals and hashCode methods are generated from an IDE's template, so they are not clean to read.
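Here is another JDK 1.7+ approach that also takes superclass logic into account. I find it pretty convenient: the Object class hashCode() is accounted for, the only dependency is the JDK, and there is no extra manual work. Please note that Objects.hash() is null tolerant.
I have not included any equals() implementation, but in reality you will of course need one. Keep in mind that Object.hashCode() is identity-based, so including it in class A only fits an equals() that is identity-based as well.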
import java.util.Objects;
public class Demo {
public static class A {
private final String param1;
public A(final String param1) {
this.param1 = param1;
}
@Override
public int hashCode() {
return Objects.hash(
super.hashCode(),
this.param1);
}
}
public static class B extends A {
private final String param2;
private final String param3;
public B(
final String param1,
final String param2,
final String param3) {
super(param1);
this.param2 = param2;
this.param3 = param3;
}
@Override
public final int hashCode() {
return Objects.hash(
super.hashCode(),
this.param2,
this.param3);
}
}
public static void main(String [] args) {
A a = new A("A");
B b = new B("A", "B", "C");
System.out.println("A: " + a.hashCode());
System.out.println("B: " + b.hashCode());
}
}
The standard implementation is weak and using it leads to unnecessary collisions. Imagine a
class ListPair {
List<Integer> first;
List<Integer> second;
ListPair(List<Integer> first, List<Integer> second) {
this.first = first;
this.second = second;
}
public int hashCode() {
return Objects.hash(first, second);
}
...
}
Now,
new ListPair(List.of(a), List.of(b, c))
and
new ListPair(List.of(b), List.of(a, c))
have the same hashCode, namely 31*(a+b) + c plus a constant, as the multiplier used for List.hashCode gets reused here. Obviously, collisions are unavoidable, but producing needless collisions is just... needless.
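This can be checked directly. The sketch below assumes Java 9+ for List.of and picks a = 1, b = 2, c = 3; it calls java.util.Objects.hash on the two lists instead of going through the ListPair class, and the class name is made up.

```java
import java.util.List;
import java.util.Objects;

final class ListPairCollisionDemo {
    // Combines two lists the way the ListPair example above does.
    static int pairHash(List<Integer> first, List<Integer> second) {
        return Objects.hash(first, second);
    }

    public static void main(String[] args) {
        int a = 1, b = 2, c = 3;
        int left = pairHash(List.of(a), List.of(b, c));
        int right = pairHash(List.of(b), List.of(a, c));
        // a and b swap places between the two lists, yet the hashes agree.
        System.out.println(left == right); // true
    }
}
```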
There's nothing substantially smart about using 31. The multiplier must be odd in order to avoid losing information (any even multiplier loses at least the most significant bit, multiples of four lose two, etc.). Any odd multiplier is usable. Small multipliers may lead to faster computation (the JIT can use shifts and additions), but given that multiplication has a latency of only three cycles on modern Intel/AMD, this hardly matters. Small multipliers also lead to more collisions for small inputs, which may be a problem sometimes.
Using a prime is pointless as primes have no meaning in the ring Z/(2**32).
So, I'd recommend using a randomly chosen big odd number (feel free to take a prime). As x86/amd64 CPUs can use a shorter instruction for operands fitting in a single signed byte, there is a tiny speed advantage for multipliers like 109. For minimizing collisions, take something like 0x58a54cf5.
Using different multipliers in different places is helpful, but probably not enough to justify the additional work.
When combining hash values, I usually use the combining method that's used in the boost c++ library, namely:
seed ^= hasher(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);
This does a fairly good job of ensuring an even distribution. For some discussion of how this formula works, see the StackOverflow post: Magic number in boost::hash_combine
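For Java readers, here is one way the formula could be carried over. This is my own rendering, not something boost provides: it works on 32-bit int, and >>> stands in for the unsigned >> of the C++ original. The names combine and hashAll are invented for this sketch.

```java
import java.util.Objects;

final class HashCombineDemo {
    // Boost-style combine step on 32-bit ints; arithmetic wraps on overflow.
    static int combine(int seed, int valueHash) {
        return seed ^ (valueHash + 0x9e3779b9 + (seed << 6) + (seed >>> 2));
    }

    // Folds the hash codes of all values into one seed (null counts as 0).
    static int hashAll(Object... values) {
        int seed = 0;
        for (Object value : values) {
            seed = combine(seed, Objects.hashCode(value));
        }
        return seed;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(combine(0, 0))); // 9e3779b9
        System.out.println(hashAll("foo", "bar", 42));
    }
}
```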
There's a good discussion of different hash functions at: http://burtleburtle.net/bob/hash/doobs.html
For a simple class it is often easiest to implement hashCode() based on the class fields which are checked by the equals() implementation.
public class Zam {
private String foo;
private String bar;
private String somethingElse;
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
Zam otherObj = (Zam)obj;
if ((getFoo() == null && otherObj.getFoo() == null) || (getFoo() != null && getFoo().equals(otherObj.getFoo()))) {
if ((getBar() == null && otherObj.getBar() == null) || (getBar() != null && getBar().equals(otherObj.getBar()))) {
return true;
}
}
return false;
}
public int hashCode() {
return (getFoo() + getBar()).hashCode();
}
public String getFoo() {
return foo;
}
public String getBar() {
return bar;
}
}
The most important thing is to keep hashCode() and equals() consistent: if equals() returns true for two objects, then hashCode() should return the same value. If equals() returns false, then hashCode() should return different values.
Related
I have made a class called Coordinates which simply holds some x and y integers. I want to use this as a key for a HashMap.
However, I noticed that when you create two different instances of Coordinates with the same x and y values, they are used as different keys by the hash map. That is, you can put two entries even though both of them have the same coordinates.
I have overridden equals():
public boolean equals(Object obj) {
if (!(obj instanceof Coord)) {
return false;
}else if (obj == this) {
return true;
}
Coord other = (Coord)obj;
return (x == other.x && y == other.y);
}
But the HashMap still uses the two instances as if they were different keys. What do I do?
And I know I could use an integer array of two elements instead. But I want to use this class.
You need to override hashCode. Java 7 provides a utility method for this.
@Override
public int hashCode() {
return Objects.hash(x, y);
}
You should also override hashCode() so that two equal instances have the same hashCode(). E.g.:
@Override
public int hashCode() {
int result = x;
result = 31 * result + y;
return result;
}
Note that it is not strictly required for two instances that are not equal to have different hash codes, but the fewer collisions you have, the better performance you'll get from your HashMap.
A hash map uses the hashCode method of objects to determine which bucket to put the object into.
If your object doesn't implement hashCode, it inherits the default implementation from Object. From the docs:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
As such, each object will appear to be distinct.
Note that different objects may return the same hashCode. That's called a collision. When that happens, then in addition to the hashCode, the hash map implementation will use the equals method to determine if two objects are equal.
Note that most IDEs offer to generate the equals and hashCode methods from the fields defined in your class. In fact, IntelliJ encourages you to define these two methods at the same time, for good reason: the two methods are intimately related, and whenever you change, implement, or override one of them, you must review (and most probably change) the other one too.
The methods in this class are 100% generated code (by IntelliJ):
class Coord {
private int x;
private int y;
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Coord coord = (Coord) o;
if (x != coord.x) return false;
if (y != coord.y) return false;
return true;
}
@Override
public int hashCode() {
int result = x;
result = 31 * result + y;
return result;
}
}
You probably did not override the hashCode method. Why is that required? To answer this, you must understand how a hash table works.
A hash table is basically an array of linked lists. Each bucket in the array corresponds to a particular value of hashCode % numberOfBuckets. All the objects with the same hashCode % numberOfBuckets are stored in a linked list in the associated bucket and are recognized (during a lookup, for instance) based on their equals method. Therefore, the exact specification is a.hashCode() != b.hashCode() => !a.equals(b), which is equivalent to a.equals(b) => a.hashCode() == b.hashCode().
If you use the default implementation of hashCode, which is based on the reference, then two objects that are equal but have a different reference (and so, most probably, a different hashCode) will be stored in a different bucket, resulting in a duplicate key.
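Here is a small sketch of the fixed situation, assuming a Coord class like the one in the question with both methods overridden (the wrapper class and the distinctKeys helper are made up for the demo):

```java
import java.util.HashMap;
import java.util.Map;

final class CoordKeyDemo {
    static final class Coord {
        final int x;
        final int y;

        Coord(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Coord)) return false;
            Coord other = (Coord) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            return 31 * x + y; // built from the same fields equals() compares
        }
    }

    // Puts two equal-but-distinct instances and reports the number of keys.
    static int distinctKeys() {
        Map<Coord, String> map = new HashMap<>();
        map.put(new Coord(1, 2), "first");
        map.put(new Coord(1, 2), "second"); // same bucket, equals() matches
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctKeys()); // 1
    }
}
```

Remove the hashCode() override and the two instances will, in practice, be filed as two separate keys.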
In chapter 3, item 8:
public final class CaseInsensitiveString {
private final String s;
public CaseInsensitiveString(String s) {
if (s == null)
throw new NullPointerException();
this.s = s;
}
@Override public boolean equals(Object o) {
return o instanceof CaseInsensitiveString &&
((CaseInsensitiveString) o).s.equalsIgnoreCase(s);
}
// remainder omitted
}
After describing issues surrounding the equals() method, he goes on to talk about this class in the context of comparing fields.
For some classes, such as CaseInsensitiveString above, field comparisons are more complex than simple equality tests. If this is the case, you may want to store a canonical form of the field, so the equals() method can do cheap exact comparisons on these canonical forms rather than more costly inexact comparisons. This technique is most appropriate for immutable classes; if the object can change, you must keep the canonical form up-to-date.
So my question (and I double-checked what 'canonical' means): what is Bloch talking about? What would the canonical form be? I'm ready to be told that the answer is very simple (presumably otherwise his editor would have told him to add more) but I want to see other people say so.
He also mentions the same thing for hashCode() in the next item 9.
To give it in context, he also discusses a bad version of the equals() method for CaseInsensitiveString:
// Broken - violates symmetry
@Override public boolean equals(Object o) {
if (o instanceof CaseInsensitiveString)
return s.equalsIgnoreCase(
((CaseInsensitiveString) o).s);
if (o instanceof String) // one-way interoperability!
return s.equalsIgnoreCase((String) o);
return false;
}
You should add another final field and store the value s.toUpperCase() in it.
This new field will be the canonical representation of the s field. The new implementation of the equals() method (see the code below) will be cheaper. This approach works only for immutable classes.
Another point: do not forget to override hashCode() if you override equals().
public final class CaseInsensitiveString {
private final String s;
private final String sForEquals; // field added to simplify the equals method
public CaseInsensitiveString(String s) {
if (s == null) {
throw new IllegalArgumentException(); //NullPointerException() - bad practice
}
this.s = s;
this.sForEquals = s.toUpperCase();
}
@Override
public boolean equals(Object o) {
return o instanceof CaseInsensitiveString &&
((CaseInsensitiveString) o).sForEquals.equals(this.sForEquals);
}
@Override
public int hashCode(){
return sForEquals.hashCode();
}
// remainder omitted
}
The term canonical has some different usages. It refers to values that have several representations (or maybe several varying values that are equal). Then often one specific representation (or value) is chosen as canonical one.
Example: sets of integers, with { 2, 3, 5 } as the canonical form: { 2, 3, 5 } = { 3, 5, 2 } = { 2, 2, 5, 3 } = ...
For the plain Java String there is an issue too. The same text in Unicode can be represented differently: ĉ either as one code point "\u0109" (SMALL-LETTER-C-WITH-CIRCUMFLEX), or as two code points, c (SMALL-LETTER-C) and a zero-width ^ (COMBINED-DIACRITICAL-MARK-CIRCUMFLEX), "\u0063\u0302".
So even a plain String should be canonicalized in some cases:
String s = "...";
String s1 = Normalizer.normalize(s, Normalizer.Form.NFKD);
This uses Normalizer to decompose a string. This has the advantage that when sorting, "c" and "ĉ" stay together. One could also remove the combining diacritical marks with a regex to get an ASCII version.
In fact, different operating systems handle Unicode names differently, and version control systems do not always apply a cross-platform canonicalisation.
Only after a Normalizer.normalize call does a comparison with String.equals indeed indicate Unicode text equality.
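A short sketch of that last point with the ĉ example, using java.text.Normalizer and the same NFKD form as above. The class name and the sameText helper are made up for the illustration.

```java
import java.text.Normalizer;

final class NormalizeDemo {
    // Compares two strings after bringing both into the same normal form.
    static boolean sameText(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFKD)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFKD));
    }

    public static void main(String[] args) {
        String composed = "\u0109";    // one code point
        String decomposed = "c\u0302"; // letter c plus combining circumflex
        System.out.println(composed.equals(decomposed));   // false
        System.out.println(sameText(composed, decomposed)); // true
    }
}
```

If such strings are used as hash keys, the hash code has to be computed from the normalized form as well, otherwise equal texts end up in different buckets.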
Your question had two parts:
Canonical form means "standardised form": in this case a lowercase version of the field, used for comparison. Every time the value changes, the lowercase copy would have to be updated, so there's an overhead to this design choice. Further, this idea is an optimization for performance only, and frankly is not recommended as it's "premature optimisation".
Non symmetry of equals allows code such that a.equals(b) but not b.equals(a), thus violating the equals contract. In your example, it's possible for a String to be equal to an instance of your class, because its equals() method allows that, but the implementation of equals() in the String class does not allow for an instance of your class to be considered as equal to a String.
Maybe there is a reason I don't know, but I see hashCode() being called on Integer fields in my code to calculate the hash code of a complex object.
Does it provide anything compared to putting the Integer itself there (I hope not), or is it just for better clarity?
class SomeClass {
private Integer myIntegerField1;
private Integer myIntegerField2;
...
public int hashCode() {
final int prime = 31;
int result =1;
result = prime * result + ((myIntegerField1 == null) ? 0 : myIntegerField1.hashCode());
result = prime * result + ....
...
return result;
}
}
The javadoc of Integer.hashCode() says:
Returns: a hash code value for this object, equal to the
primitive int value represented by this
Integer object.
So using Integer.hashCode() or Integer.intValue(), or using auto-unboxing leads to exactly the same value.
Your posted code was auto-generated by an IDE. The code generator has no special cases to handle Integer or other primitive type wrappers, and there isn't a really good reason for it to have one: the way it is implemented now is 100% by the book and on a general level of consideration is the right thing to do.
If you replaced myIntegerField1.hashCode() with just myIntegerField1, the real effect would be a change from a hashCode() call to an intValue() call, and if you check out the source code, you'll find that these two methods are exactly the same.
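A quick check of that equivalence, as a sketch with made-up helper names. The one practical difference is the null case: unboxing a null Integer throws, which is why the generated code keeps the explicit null test (java.util.Objects.hashCode does the same in one call).

```java
import java.util.Objects;

final class IntegerHashDemo {
    // The form used by the generated code: explicit null check.
    static int viaHashCode(Integer value) {
        return value == null ? 0 : value.hashCode();
    }

    // Library shortcut with the same null handling.
    static int viaObjects(Integer value) {
        return Objects.hashCode(value);
    }

    public static void main(String[] args) {
        Integer boxed = 42;
        System.out.println(viaHashCode(boxed) == boxed.intValue()); // true
        System.out.println(viaObjects(null));                       // 0
    }
}
```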
Composite objects can use combined hashes of their internal state to calculate their own hash code. Example:
public class Person
{
private Integer id;
private String name;
@Override
public int hashCode()
{
int hash = getClass().getName().hashCode();
if (id != null)
{
hash ^= id.hashCode();
}
if (name != null)
{
hash ^= name.hashCode();
}
return hash;
}
}
Don't make hashes overly complicated, and base hashes only on some values which don't change, or are otherwise likely to be stable. Hash codes, by their very nature, are not required to be unique or collision-free.
The hash code is just a quick and dirty fingerprint that allows for a quick determination of whether two instances are NOT equal (if they were equal, they would have to have the same hash code), so the actual equals() check has to be executed only for instances whose hashes are equal (again, the same hash does NOT imply that they are equal).
There is no reason to explicitly use the hashcode of an Integer. The source code just returns the value of the Integer:
public int hashCode(){
return value;
}
So use the value of the Integer rather than the hash code.
What is the reason why this method is included in the source? What would happen if you had an Object that points to an Integer? Explicitly including the method in the source code ensures proper results.
Here you are trying to find the hashcode of SomeClass type objects.
public int hashCode() {
final int prime = 31;
int result =1;
result = prime * result + ((myIntegerField1 == null) ? 0 : myIntegerField1.hashCode());
result = prime * result + ....
...
return result;
}
In
result = prime * result + ((myIntegerField1 == null) ? 0 : myIntegerField1.hashCode());
the expression checks whether myIntegerField1 is null: if so, it contributes 0, otherwise it contributes the hash code of the Integer myIntegerField1.
Remember: myIntegerField1.hashCode() and myIntegerField1.intValue() return the same value.
How do you come up with a hash function for a generic object? There is the constraint that two objects need to have the same hash value if they are "equal" as defined by the user. How does Java accomplish this?
I just found the answer to my own question. The way Java does it is that it defines a hashCode for every object and by default the hashCode for two objects are the same iff the two objects are the same in memory. So when the client of the hashtable overrides the equals() method for an object, he should also override the method that computes hashcode such that if a.equals(b) is true, then a.hashCode() must also equal b.hashCode(). This way, it is assured that equal objects have the same hashcode.
First, basically you define the hash function of a class by overriding the hashCode() method. The Javadoc states:
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
So the more important question is: What makes two of your objects equal? Or vice versa: What properties make your objects unique? If you have an answer to that, create an equals() method that compares all of the properties and returns true if they're all the same and false otherwise.
The hashCode() method is a bit more involved, I would suggest that you do not create it yourself but let your IDE do it. In Eclipse, you can select Source and then Generate hashCode() and equals() from the menu. This also guarantees that the requirements from above hold.
Here is a small (and simplified) example where the two methods have been generated using Eclipse. Notice that I chose not to include the city property since the zipCode already uniquely identifies the city within a country.
public class Address {
private String streetAndNumber;
private String zipCode;
private String city;
private String country;
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((country == null) ? 0 : country.hashCode());
result = prime * result
+ ((streetAndNumber == null) ? 0 : streetAndNumber.hashCode());
result = prime * result + ((zipCode == null) ? 0 : zipCode.hashCode());
return result;
}
@Override
public boolean equals(final Object obj) {
if(this == obj)
return true;
if(obj == null)
return false;
if(!(obj instanceof Address))
return false;
final Address other = (Address) obj;
if(country == null) {
if(other.country != null)
return false;
}
else if(!country.equals(other.country))
return false;
if(streetAndNumber == null) {
if(other.streetAndNumber != null)
return false;
}
else if(!streetAndNumber.equals(other.streetAndNumber))
return false;
if(zipCode == null) {
if(other.zipCode != null)
return false;
}
else if(!zipCode.equals(other.zipCode))
return false;
return true;
}
}
Java doesn't do that. If the hashCode() and equals() are not explicitly implemented, JVM will generate different hashCodes for meaningfully equal instances. You can check Effective Java by Joshua Bloch. It's really helpful.
Several options:
read Effective Java, by Joshua Bloch. It contains a good algorithm for hash codes
let your IDE generate the hashCode method
Java SE 7 and greater: use Objects.hash
The class java.lang.Object cheats. It defines equality (as is determined by equals) as being object identity (as can be determined by ==). So, unless you override equals in your subclass, two instances of your class are "equal", if they happen to be the same object.
The associated hash code for this is implemented by the system function System.identityHashCode (which is no longer really based on object addresses -- was it ever? -- but can be thought of as being implemented this way).
If you override equals, then this implementation of hashCode no longer makes sense.
Consider the following example:
class Identifier {
private final int lower;
private final int upper;
public boolean equals(Object any) {
if (any == this) return true;
else if (!(any instanceof Identifier)) return false;
else {
final Identifier id = (Identifier)any;
return lower == id.lower && upper == id.upper;
}
}
}
Two instances of this class are considered equal, if their "lower" and "upper" members have the same values. Since equality is now determined by object members, we need to define hashCode in a compatible way.
public int hashCode() {
return lower * 31 + upper; // possible implementation, maybe not too sophisticated though
}
As you can see, we use the same fields in hashCode which we also use when we determine equality. It is generally a good idea to base the hash code on all members, which are also considered when comparing for equality.
Consider this example instead:
class EmailAddress {
private final String mailbox;
private final String displayName;
public boolean equals(Object any) {
if (any == this) return true;
else if (!(any instanceof EmailAddress)) return false;
else {
final EmailAddress id = (EmailAddress)any;
return mailbox.equals(id.mailbox);
}
}
}
Since here, equality is only determined by the mailbox member, the hash code should also only be based on that member:
public int hashCode() {
return mailbox.hashCode();
}
The hashing of an object is defined by its hashCode() method, which the developer can override.
Many JDK classes, such as String and the List implementations, use the prime 31 as a multiplier in their hashCode() calculation.
If the equals() and hashCode() methods aren't implemented, the JVM supplies an identity-based hash code for the object (the serialVersionUID generated for Serializable classes is a separate mechanism).
For a class whose fields are solely primitive, ex.:
class Foo
{
int a;
String b;
boolean c;
long d;
public boolean equals(Object o)
{
if (this == o) return true;
if (!(o instanceof Foo)) return false;
Foo other = (Foo) o;
return a == other.a && b.equals(other.b) && c == other.c && d == other.d;
}
}
Is this a reasonably "good enough" way to write hashCode()?
public int hashCode()
{
return (b + a + c + d).hashCode();
}
That is, I construct a String out of the same fields that equals() uses, and then just use String#hashCode().
Edit: I've updated my question to include a long field. How should a long be handled in hashCode()? Just let it overflow int?
Your hash code does satisfy the property that if two objects are equal, then their hash codes need to be equal. So, in that way it is 'good enough'. However, it is fairly simple to create collisions in the hash codes which will degrade the performance of hash based data structures.
I would implement it slightly differently though:
public int hashCode() {
return a * 13 + b.hashCode() * 23 + (c? 31: 7);
}
You should check out the documentation for the hashCode() method of Object. It lays out the things that the hash code must satisfy.
It totally depends on what your data will look like. Under most circumstances, this would be a good approach. If you'll often have b end with a number, then you'll get some duplicate codes for unequal objects, as JacobM's answer shows. If you know ahead of time that b will pretty much never have a number value at the end, then this is a reasonable hashing algorithm.
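To make the "b ends with a number" case concrete, and to cover the long field from the edit, here is a sketch with standalone helper methods (not the Foo class itself; the names are invented). The field-wise variant uses Long.hashCode, available since Java 8, which folds the long as (int) (d ^ (d >>> 32)).

```java
import java.util.Objects;

final class FooHashDemo {
    // The approach from the question: hash the concatenated fields.
    static int concatHash(int a, String b, boolean c, long d) {
        return (b + a + c + d).hashCode();
    }

    // Field-by-field alternative with a multiplier between fields.
    static int fieldHash(int a, String b, boolean c, long d) {
        int result = 17;
        result = 31 * result + a;
        result = 31 * result + Objects.hashCode(b);
        result = 31 * result + (c ? 1 : 0);
        result = 31 * result + Long.hashCode(d); // folds 64 bits into 32
        return result;
    }

    public static void main(String[] args) {
        // "x1" + 1 and "x" + 11 both start the string with "x11".
        System.out.println(concatHash(1, "x1", true, 0L) == concatHash(11, "x", true, 0L)); // true
        System.out.println(fieldHash(1, "x1", true, 0L) == fieldHash(11, "x", true, 0L));   // false
    }
}
```

The field-wise version also avoids building a temporary String on every call.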