Comparing string in Java

We know that the hashCode() method of an object gives a hash code based on the memory address of the instance. So when we have two objects of the same class with the same data, they will still give different hash codes, as they are stored in different memory locations.
Now, when we create two String objects using new String("Some_Name"), we have two objects stored at different addresses. When we look at the hash codes of these two objects, we should get different values because they are stored in different memory locations. But we end up getting the same hash code.
Employee empObject = new Employee("Some_Name");
Employee empObject1 = new Employee("Some_Name");
String stringObject = new String("Some_Name");
String stringObject1 = new String("Some_Name");
//Output
System.out.println(empObject.hashCode()); //1252169911
System.out.println(empObject1.hashCode()); //2101973421
System.out.println(stringObject.hashCode()); //1418906358
System.out.println(stringObject1.hashCode()); //1418906358
Does this mean that String has overridden the hashCode() method from Object? If so, would the overridden method have to search the heap for other String objects with the same data and give them all the same hash code? Please correct me if my basic understanding is wrong.
Note: This is not about String literals but about String objects. Literals are stored in the String constant pool, whereas a String object created with new is allocated outside the pool, on the heap.

This is what I found with a simple Google search:
An object’s hash code allows algorithms and data structures to put objects into compartments, just like letter types in a printer’s type case. The printer puts all A types into the compartment for A, and he looks for an A only in this one compartment. This simple system lets him find types much faster than searching in an unsorted drawer. That’s also the idea of hash-based collections, such as HashMap and HashSet.
The contract is explained in the hashCode method’s JavaDoc. It can be roughly summarized with this statement:
Objects that are equal must have the same hash code within a running process
Unequal objects must have different hash codes – WRONG!
Objects with the same hash code must be equal – WRONG!
The contract allows unequal objects to share the same hash code. In math terms, the mapping from objects to hash codes does not have to be injective (let alone bijective). This is obvious because the number of possible distinct objects is usually bigger than the number of possible hash codes (2^32).
Hope it helps
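The following small sketch shows a legal collision in practice. The class name `HashCollisionDemo` is made up. String#hashCode combines characters as 31 * h + c, so "Aa" and "BB" end up with the same value even though they are not equal:

```java
// Two unequal strings that share a hash code: legal under the hashCode contract.
class HashCollisionDemo {
    public static void main(String[] args) {
        String a = "Aa"; // 'A' = 65, 'a' = 97 -> 65 * 31 + 97 = 2112
        String b = "BB"; // 'B' = 66          -> 66 * 31 + 66 = 2112
        System.out.println(a.hashCode()); // 2112
        System.out.println(b.hashCode()); // 2112
        System.out.println(a.equals(b));  // false
    }
}
```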

Explanation
Your first paragraph describes the default behavior of hashCode. But usually classes override it and create a content-based solution (same as for equals). This especially applies to the String class.
Default hashCode
The default implementation is not written in Java but implemented directly in the JVM; the method is declared native. You can always get hold of the original hash code by using System#identityHashCode. See its documentation:
Returns the same hash code for the given object as would be returned by the default method hashCode(), whether or not the given object's class overrides hashCode(). The hash code for the null reference is zero.
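You can see both behaviors side by side by comparing String#hashCode with System#identityHashCode for two separately constructed strings. This is a minimal sketch and the class name is only illustrative. The identity values printed will vary between runs and JVMs, and the JVM does not promise they differ:

```java
// Two distinct String instances with the same characters.
class IdentityVsContentHash {
    public static void main(String[] args) {
        String a = new String("Some_Name");
        String b = new String("Some_Name");

        System.out.println(a == b);                        // false: two different objects
        System.out.println(a.equals(b));                   // true: same characters
        System.out.println(a.hashCode() == b.hashCode());  // true: content-based override

        // The identity hash ignores String's override; for distinct objects the
        // values usually differ, but that is not guaranteed.
        System.out.println(System.identityHashCode(a));
        System.out.println(System.identityHashCode(b));
    }
}
```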
Note that the default implementation of hashCode is not necessarily based on the memory location. It often is related, but you can by no means rely on that (see How is hashCode() calculated in Java). Here is the documentation of Object#hashCode:
Returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by HashMap.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
The relevant parts are the second and third requirements: hashCode must be consistent with equals, and hash collisions are okay (but not optimal).
And Object#equals is typically used to create custom content-based comparisons (see documentation).
String hashCode
Now let us take a look at the implementation of String#hashCode. As said, the class overrides the method and implements a content-based solution. So the hash for "hello" will always be the same as for "hello". Even if you force new instances using the constructor:
// Will have the same hash
new String("hello").hashCode()
new String("hello").hashCode()
It works exactly as equals, which would output true here as well:
new String("hello").equals(new String("hello")) // true
as required by the contract of the hashCode method (see documentation).
Here is the implementation of the method (JDK 10):
/**
 * Returns a hash code for this string. The hash code for a
 * {@code String} object is computed as
 * <blockquote><pre>
 * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
 * </pre></blockquote>
 * using {@code int} arithmetic, where {@code s[i]} is the
 * <i>i</i>th character of the string, {@code n} is the length of
 * the string, and {@code ^} indicates exponentiation.
 * (The hash value of the empty string is zero.)
 *
 * @return  a hash code value for this object.
 */
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        hash = h = isLatin1() ? StringLatin1.hashCode(value)
                              : StringUTF16.hashCode(value);
    }
    return h;
}
It just forwards to either StringLatin1 or StringUTF16, so let us see what they do:
// StringLatin1
public static int hashCode(byte[] value) {
    int h = 0;
    for (byte v : value) {
        h = 31 * h + (v & 0xff);
    }
    return h;
}
// StringUTF16
public static int hashCode(byte[] value) {
    int h = 0;
    int length = value.length >> 1;
    for (int i = 0; i < length; i++) {
        h = 31 * h + getChar(value, i);
    }
    return h;
}
As you can see, both of them just do simple arithmetic on the individual characters of the string. The hash is therefore completely content-based and always gives the same result for the same characters, no matter which instance holds them.
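The following sketch checks that the documented formula is all there is to it. Class and method names are hypothetical. It recomputes the hash with the same 31 * h + c step over the string's chars and compares the result with String#hashCode. It assumes, as the two loops above show, that a Latin-1 byte masked with 0xff and a UTF-16 char both reduce to the same char value:

```java
// Recomputes String#hashCode from its documented formula
//   s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]   (int arithmetic, overflow wraps)
// using Horner's rule over the string's UTF-16 chars.
class ManualStringHash {
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // same step as StringLatin1/StringUTF16
        }
        return h;
    }

    public static void main(String[] args) {
        // "\u00b5\u20ac" is "µ€": one Latin-1 char plus one that forces UTF-16 storage.
        String[] samples = {"", "hello", "Some_Name", "\u00b5\u20ac"};
        for (String s : samples) {
            // Prints the same value twice for every sample.
            System.out.println(hash(s) + " " + s.hashCode());
        }
    }
}
```

For "hello" both values are 99162322, and the empty string gives 0, as the Javadoc states.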

Related

How hashCode() is generated for a String object in Java?


HashCode for mutable object

Let's say I have an object.
It contains width and height, x and y coordinates, and x and y representing velocity.
I have overridden equals method and I compare by comparing width, height, x and y and velocity.
What should I do with hash code?
I am confused because it is a moving object, and I am not sure what to use to calculate the hash code. The values will be constantly changing, and really the only thing that stays fixed is the size.
According to Object.hashCode() there is a clause that can help with your decision:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
Since your equals() compares width, height, x, y and velocity, your hashCode() does not have to return the same hash once any of these values change. It only has to stay the same while they stay the same.
Sample hashCode() for you:
@Override
public int hashCode() {
    int hash = 1;
    hash = hash * 17 + width;
    hash = hash * 31 + height;
    hash = hash * 13 + x;
    hash = hash * 13 + y;
    hash = hash * 13 + velocity.hashCode();
    return hash;
}
You can go further by caching the hash code in a private field and using a dirty flag to know when to recalculate it after any parameter changes. Since you are not doing anything expensive within hashCode(), I don't think you would need to do that.
If you want your hash code to reflect equality, then using the changing values is fine, because the hash is (or should be) calculated on the fly when you need it. So if you have all these "moving" things and want to know whether they are equal (same location, velocity or whatever), it will be accurate.
If you want to use the objects in a hash table, then don't override equals() or hashCode() at all and just use the identity-based defaults (the default hash code is often, but not necessarily, derived from the memory address). Or, if you want more control, keep a static counter and use it to assign IDs to these objects.
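Here is a sketch of such a class, with equals() and hashCode() built from the same fields. The class name `MovingBox` and the split of velocity into two int fields `vx`/`vy` are assumptions for illustration. It uses java.util.Objects#hash instead of hand-picked multipliers:

```java
import java.util.Objects;

// Hypothetical moving object: size, position and velocity are all mutable ints.
final class MovingBox {
    int width, height, x, y, vx, vy;

    MovingBox(int width, int height, int x, int y, int vx, int vy) {
        this.width = width; this.height = height;
        this.x = x; this.y = y; this.vx = vx; this.vy = vy;
    }

    void step() { x += vx; y += vy; } // moving changes fields used by equals

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MovingBox)) return false;
        MovingBox b = (MovingBox) o;
        return width == b.width && height == b.height && x == b.x && y == b.y
                && vx == b.vx && vy == b.vy;
    }

    @Override
    public int hashCode() {
        // Same fields as equals(), so equal boxes always hash alike.
        return Objects.hash(width, height, x, y, vx, vy);
    }

    public static void main(String[] args) {
        MovingBox a = new MovingBox(10, 20, 0, 0, 1, 2);
        MovingBox b = new MovingBox(10, 20, 0, 0, 1, 2);
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // true
        int before = a.hashCode();
        a.step();
        System.out.println(a.hashCode() == before); // false: state changed, so did the hash
    }
}
```

The hash changes as the box moves, which the contract allows. It does mean such an object should not be mutated while it sits in a HashSet or serves as a HashMap key.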
Just make sure equals() and hashCode() are consistent.
That means, whatever fields are used to determine equality should be used to compute the hash code so that equal objects (with a certain state) will have the same hash code (for that same state).
But you may want to consider whether your equals() implementation is correct in the first place, or whether you even need to override the default implementation.
To be honest, the only reason I wanted to implement hashCode is that the Java rules state that if equals is overridden, hashCode should be too. However, none of the objects will be used in a map.
In that case, you could implement it as follows:
public int hashCode() {
throw new UnsupportedOperationException("hashCode is not implemented");
}
This has the advantage that if you accidentally use the object as a hash key you will get an immediate exception rather than unexpected behavior if the object is mutated at the wrong time.
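Here is a minimal sketch of that fail-fast approach, using a hypothetical `NotHashable` class. Adding an instance to a HashSet triggers the exception immediately, because HashSet calls hashCode() on insertion:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class that overrides equals but deliberately refuses to be hashed.
final class NotHashable {
    final int value;
    NotHashable(int value) { this.value = value; }

    @Override
    public boolean equals(Object o) {
        return o instanceof NotHashable && ((NotHashable) o).value == value;
    }

    @Override
    public int hashCode() {
        throw new UnsupportedOperationException("hashCode is not implemented");
    }

    public static void main(String[] args) {
        Set<NotHashable> set = new HashSet<>();
        try {
            set.add(new NotHashable(1)); // HashSet calls hashCode() right away
        } catch (UnsupportedOperationException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```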
Typically, I would assign an ID to the object as a member variable and return the ID as the hashcode value.
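Here is a sketch of the ID approach, assuming a hypothetical `Sprite` class and an AtomicInteger counter. equals() then has to compare IDs as well, otherwise the contract breaks. In effect the objects get identity semantics, and moving them never changes their hash code:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sprite whose hash code comes from an immutable, per-instance ID.
final class Sprite {
    private static final AtomicInteger NEXT_ID = new AtomicInteger();

    private final int id = NEXT_ID.getAndIncrement();
    int x, y; // mutable state, deliberately excluded from equals/hashCode

    void moveBy(int dx, int dy) { x += dx; y += dy; }

    @Override
    public boolean equals(Object o) {
        // Equality must use the same basis as hashCode: the ID only.
        return o instanceof Sprite && ((Sprite) o).id == id;
    }

    @Override
    public int hashCode() {
        return id;
    }

    public static void main(String[] args) {
        Set<Sprite> sprites = new HashSet<>();
        Sprite s = new Sprite();
        sprites.add(s);
        s.moveBy(5, 7);
        System.out.println(sprites.contains(s)); // true: the hash did not change
    }
}
```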

How come this equals & hashCode override does NOT cause an exception or error?

When I compile and run the code below, I get the following results:
o1==o2 ? true
Hash codes: 0 | 0
o1==o2 ? true
Hash codes: 1 | 8
o1==o2 ? true
Hash codes: 7 | 3
o1==o2 ? true
Hash codes: 68 | 10
o1==o2 ? true
Hash codes: 5 | 4
From what I've read, if two objects are equal, their hashCodes must also be equal. So, how does this code not cause an exception or error?
import java.io.*;
import java.lang.*;
public class EqualsAndHashCode {
private int num1;
private int num2;
public EqualsAndHashCode(int num1, int num2) {
this.num1 = num1;
this.num2 = num2;
}
public static void main(String[] args) {
for (int x=0; x < 5; x++) {
EqualsAndHashCode o1 = new EqualsAndHashCode(x, x);
EqualsAndHashCode o2 = new EqualsAndHashCode(x, x);
System.out.println("o1==o2 ? " + o1.equals(o2));
System.out.println("Hash codes: " + o1.hashCode() + " | " + o2.hashCode());
}
}
public boolean equals(Object o) {
return (this.getNum1() == ((EqualsAndHashCode)o).getNum1()) && (this.getNum2() == ((EqualsAndHashCode)o).getNum2());
}
public int hashCode() {
return (int)(this.getNum1() / Math.random());
}
public int getNum1() { return num1; }
public int getNum2() { return num2; }
}
EDIT I
The premise behind my question was the wording surrounding the hashCode contract (http://docs.oracle.com/javase/6/docs/api/java/lang/Object.html#hashCode()):
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
I assumed that this rule would have been enforced by the JVM at compile or run time and I would have seen errors or exceptions right away when the contract was violated...
Because the JVM does not check or validate that the method contract holds true. They're just methods, and they can return whatever they want.
However, any code that depends on them honoring the method contract may fail. You will not be able to use your EqualsAndHashCode objects as keys in a HashMap, for example. Lookups will typically not find the entries you put in, and no exception is thrown to tell you why.
It is the same with compareTo() and TreeMap. compareTo() can return any int it wants, but if it doesn't define a consistent ordering as required by the contract of the Comparable interface, your TreeMap will behave incorrectly. Some algorithms do detect such inconsistencies: the sort behind Collections.sort may throw an IllegalArgumentException ("Comparison method violates its general contract!").
So, how does this code not cause an exception or error?
Well, breaking the contract of equals and hashCode never throws an exception or error by itself. You just see weird behavior when you use objects of that class in hash-based collections like HashSet or HashMap.
For example, if you use your objects as keys in a HashMap, you might not be able to find a key again when you try to fetch it. Even if two keys are equal, their hash codes might be different, and HashMap searches for keys first by hash code and only then with equals.
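The following sketch reproduces that failure deterministically. The hypothetical `BrokenKey` class compares by value in equals() but returns a different hash code for each instance. It uses a counter instead of Math.random(), so the outcome does not depend on luck:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key: value-based equals(), but a per-instance hashCode() that
// breaks the contract (equal keys get different hash codes).
final class BrokenKey {
    private static int counter = 0;
    private final int value;
    private final int hash = counter++; // different for every instance, even equal ones

    BrokenKey(int value) { this.value = value; }

    @Override
    public boolean equals(Object o) {
        return o instanceof BrokenKey && ((BrokenKey) o).value == value;
    }

    @Override
    public int hashCode() { return hash; }

    public static void main(String[] args) {
        Map<BrokenKey, String> map = new HashMap<>();
        map.put(new BrokenKey(1), "one");
        BrokenKey probe = new BrokenKey(1);
        System.out.println(probe.equals(map.keySet().iterator().next())); // true
        System.out.println(map.get(probe)); // null: HashMap compares hashes before equals
    }
}
```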
if two objects are equal, their hashCodes must also be equal
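The above is part of the hashCode contract, but the JVM does not enforce it.
The idea behind it is that hashed collections such as HashMap first use the hash code to pick a bucket and only then call equals. If equal objects had different hash codes, lookups would search the wrong bucket.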
A very good article on the need of hashcode, rules for equals and hashcode, etc:
http://www.ibm.com/developerworks/java/library/j-jtp05273/index.html
How could they possibly be the same given the fact that you are dividing by a Random number?
The typical approach is to use the hashCode values of the individual fields to build the hashCode of the object (if they aren't primitive, in this case they are). You also typically multiply by several prime numbers.
// adapted from Effective Java
public int hashCode() {
int p = 17, q = 37;
p = q * p + num1;
p = q * p + num2;
return p;
}
Use this for hashCode:
public int hashCode() {
int result = num1;
result = 31 * result + num2;
return result;
}
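For reference, here is the question's class rewritten with that hashCode. The name `FixedEqualsAndHashCode` just keeps it apart from the original. equals() also gets an instanceof check, so it returns false instead of throwing for null or for objects of another type:

```java
import java.util.HashSet;
import java.util.Set;

// The question's class with a deterministic hashCode derived from the same
// fields that equals() compares (31 * num1 + num2).
final class FixedEqualsAndHashCode {
    private final int num1;
    private final int num2;

    FixedEqualsAndHashCode(int num1, int num2) { this.num1 = num1; this.num2 = num2; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FixedEqualsAndHashCode)) return false; // also handles null
        FixedEqualsAndHashCode other = (FixedEqualsAndHashCode) o;
        return num1 == other.num1 && num2 == other.num2;
    }

    @Override
    public int hashCode() {
        return 31 * num1 + num2;
    }

    public static void main(String[] args) {
        for (int x = 0; x < 5; x++) {
            FixedEqualsAndHashCode o1 = new FixedEqualsAndHashCode(x, x);
            FixedEqualsAndHashCode o2 = new FixedEqualsAndHashCode(x, x);
            System.out.println("o1==o2 ? " + o1.equals(o2));
            System.out.println("Hash codes: " + o1.hashCode() + " | " + o2.hashCode());
        }
        Set<FixedEqualsAndHashCode> set = new HashSet<>();
        set.add(new FixedEqualsAndHashCode(1, 1));
        set.add(new FixedEqualsAndHashCode(1, 1));
        System.out.println(set.size()); // 1: the duplicate is recognized
    }
}
```

Each equal pair now prints identical hash codes (0, 32, 64, 96, 128), and the HashSet keeps only one of the two equal instances.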
Every class you define inherits the default implementations of hashCode() and equals() from the Object class. For your code to behave correctly, especially when it is used in data structures such as HashMap, you should override the defaults in a way that ensures "If two instances of your class are equal, then they return the same value when hashCode() is called".
Definition of equality of two objects depends on domain concept their classes represent, and hence, only the author of the class is best suited to implement "equals" and "hashcode" methods. For example, two Employee objects are considered equal if they have same value for "employeeId" attribute. These two may be different instances, but in the realm of domain (say, Human Resources System), they are equal due to equality of their employee IDs. Now, the author of the Employee class should implement "equals" method that compares "employeeId" attributes and returns true if they are same. Similarly, the author should ensure that hashCode() of two Employee instances are same if their employee IDs are same.
And if you are worried about how to write hashCode that meets the above Java recommendation, then, you can generate the hashCode and equals using Eclipse.
Even though the JVM does not enforce "if two objects are equal, their hashCodes must also be equal", your code can start misbehaving if objects of your class are used in a Set, Map, etc. and your "equals" and "hashCode" methods don't comply with it. The only time you might ignore this rule is when you are sure that your class will never be tested for equality. An example of such a class is a DAO or Service class, which is typically instantiated and used as a singleton, and no one (in normal scenarios) compares two DAO or Service instances.
The purpose of method contracts is, in most cases, to allow other code to assume that certain conditions will hold true. In particular, the purpose of the hashCode and equals contracts is to allow a collection to assume the following: if an object Foo has a particular hash code (e.g. 24601), and a collection of objects Bar is known not to contain any objects with that hash code, then Bar doesn't contain Foo. As a bonus, suppose a collection contains a variety of hash codes including that of Foo, and the hash codes of all its objects have been precalculated. Then one can check the hash code of each object against that of Foo before looking at the object itself. Comparing two objects' already-computed hash values will be fast, no matter how complicated the objects are.
For all this to work, it is imperative that an object which will report itself equal to another object must always have reported the same hash code as that other object. Because it is always possible to obey this rule, there is seldom any good reason to disobey it. Even if the only immutable characteristic of an object which would be used in determining equality is its type, it's still possible to obey the rule by having all objects of that type return the same hash value. Having objects which will always be different report different hash values may improve performance by several orders of magnitude, but given a choice between behavior which is slow but correct, and behavior which is fast but wrong, the former should generally be preferred.
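The "all objects of that type return the same hash value" option can be checked directly. In this sketch, every instance of the hypothetical class `ConstantHash` returns 42. A HashSet still finds elements correctly; it just loses the speed advantage, because every entry lands in the same bucket:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class that obeys the contract in the laziest legal way: every
// instance reports the same hash code, so equal objects trivially share it.
final class ConstantHash {
    final String label;
    ConstantHash(String label) { this.label = label; }

    @Override
    public boolean equals(Object o) {
        return o instanceof ConstantHash && ((ConstantHash) o).label.equals(label);
    }

    @Override
    public int hashCode() {
        return 42; // correct but slow: all entries collide in one bucket
    }

    public static void main(String[] args) {
        Set<ConstantHash> set = new HashSet<>();
        for (int i = 0; i < 1000; i++) set.add(new ConstantHash("item" + i));
        System.out.println(set.contains(new ConstantHash("item500"))); // true
        System.out.println(set.contains(new ConstantHash("missing"))); // false
    }
}
```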

Java: overriding equals method doesn't do the trick when looking for a key of hashtable?

I have a hashtable looking like this:
Hashtable<Mapping, Integer> mappingCount = new Hashtable<Mapping, Integer>();
I want to use this code:
if (mappingCount.get(currentMapping) != null)
mappingCount.put(currentMapping, mappingCount.get(currentMapping) + 1);
else
mappingCount.put(currentMapping, 1);
In order to be able to get the value from the hashtable, for the class Mapping I did the following:
@Override
public boolean equals(Object obj) {
    return ((Mapping)obj).mappingXML.equals(this.mappingXML);
}
However, this doesn't do the trick since mappingCount.get(currentMapping) always results in null. To be sure that something's not wrong, I did the following:
if (aaa.contains(currentMapping.getMappingXML()))
System.out.println("found it!");
else
aaa.add(currentMapping.getMappingXML());
where aaa is List<String> aaa = new ArrayList<String>(). Of course, "found it!" is printed many times. What am I doing wrong?
You also need to override the hashCode() method.
From the JavaDocs:
To successfully store and retrieve objects from a hashtable, the objects used as keys must implement the hashCode method and the equals method.
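The reason for this is that Hashtable uses hashCode as a preliminary test to see whether two objects could be equal. If the hash codes match, it then uses equals to tell genuine matches apart from collisions.
The default implementation of hashCode() is identity-based (often derived from the object's memory address). As a result, two distinct objects that are equal according to your equals() will usually get different hash codes, even though the contract requires equal objects to have equal hash codes.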
Also look at the general contract for hashCode().
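Here is a sketch of the Mapping class with both methods overridden. It assumes mappingXML is a String, as storing getMappingXML() in a List<String> suggests, and it makes the field final. The counting loop uses Map#merge, which replaces the question's get/put sequence with a single call:

```java
import java.util.Hashtable;
import java.util.Map;
import java.util.Objects;

// Sketch of the question's Mapping class: equality and hashing both depend only
// on mappingXML.
final class Mapping {
    private final String mappingXML;

    Mapping(String mappingXML) { this.mappingXML = mappingXML; }

    String getMappingXML() { return mappingXML; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Mapping)) return false; // avoids ClassCastException
        return Objects.equals(mappingXML, ((Mapping) obj).mappingXML);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(mappingXML); // consistent with equals()
    }

    public static void main(String[] args) {
        Map<Mapping, Integer> mappingCount = new Hashtable<>();
        String[] xmls = {"<a/>", "<b/>", "<a/>"};
        for (String xml : xmls) {
            Mapping currentMapping = new Mapping(xml);
            // Same effect as the question's get/put logic: start at 1, else add 1.
            mappingCount.merge(currentMapping, 1, Integer::sum);
        }
        System.out.println(mappingCount.get(new Mapping("<a/>"))); // 2
    }
}
```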
All of the recommendations to override equals and hash code correctly are spot on; Joshua Bloch tells you how to do it properly.
But an equally important requirement is that keys in maps must be immutable. If your class can change its values, then the equals and hash code can change after you add it to the map; disaster ensues.
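Here is a small sketch of what "disaster ensues" looks like. The hypothetical `MutableKey` has equals() and hashCode() that are both correct, but both depend on a field that changes after the key has been stored:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical mutable key: correct equals/hashCode over a mutable field.
final class MutableKey {
    int value;
    MutableKey(int value) { this.value = value; }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && ((MutableKey) o).value == value;
    }

    @Override
    public int hashCode() { return value; }

    public static void main(String[] args) {
        Set<MutableKey> set = new HashSet<>();
        MutableKey key = new MutableKey(1);
        set.add(key);
        key.value = 2;                                       // mutate while inside the set
        System.out.println(set.contains(key));               // false: searched the wrong bucket
        System.out.println(set.size());                      // 1: the entry is still in there
        System.out.println(set.contains(new MutableKey(1))); // false: hash matches, equals fails
    }
}
```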
Whenever you override equals, you must override hashCode as well.
You need to override hashCode as well.
From the Object#hashCode doc:
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
You have to implement hashCode() as well!
Example:
public class Employee {
    int employeeId;
    String name;
    Department dept;
    // other methods would be in here
    @Override
    public int hashCode() {
        int hash = 1;
        hash = hash * 17 + employeeId;
        hash = hash * 31 + (name == null ? 0 : name.hashCode());
        hash = hash * 13 + (dept == null ? 0 : dept.hashCode());
        return hash;
    }
}

How to ensure hashCode() is consistent with equals()?

When overriding the equals() function of java.lang.Object, the javadocs suggest that,
it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.
The hashCode() method must return a unique integer for each object. (This is easy to do when comparing objects based on memory location: simply return the unique integer address of the object.)
How should a hashCode() method be overridden so that it returns a unique integer for each object based only on that object's properties?
public class People{
public String name;
public int age;
public int hashCode(){
// How to get a unique integer based on name and age?
}
}
/*******************************/
public class App{
public static void main( String args[] ){
People mike = new People();
People melissa = new People();
mike.name = "mike";
mike.age = 23;
melissa.name = "melissa";
melissa.age = 24;
System.out.println( mike.hashCode() ); // output?
System.out.println( melissa.hashCode() ); // output?
}
}
It doesn't say the hashcode for an object has to be completely unique, only that the hashcode for two equal objects returns the same hashcode. It's entirely legal to have two non-equal objects return the same hashcode. However, the more unique a hashcode distribution is over a set of objects, the better performance you'll get out of HashMaps and other operations that use the hashCode.
IDEs such as IntelliJ Idea have built-in generators for equals and hashCode that generally do a pretty good job at coming up with "good enough" code for most objects (and probably better than some hand-crafted overly-clever hash functions).
For example, here's a hashCode function that Idea generates for your People class:
public int hashCode() {
int result = name != null ? name.hashCode() : 0;
result = 31 * result + age;
return result;
}
I won't go into the details of hashCode uniqueness, as Marc has already addressed it. For your People class, you first need to decide what equality of a person means. Maybe equality is based solely on their name, maybe it's based on name and age. It will be domain-specific. Let's say equality is based on name and age. Your overridden equals would look like:
public boolean equals(Object obj) {
    if (this == obj) return true;
    if (obj == null) return false;
    if (!getClass().equals(obj.getClass())) return false;
    People other = (People) obj;
    return (name == null ? other.name == null : name.equals(other.name)) &&
           age == other.age;
}
Any time you override equals, you must override hashCode. Furthermore, hashCode can't use any more fields in its computation than equals did. Most of the time you combine the hash codes of the various fields by adding or exclusive-or-ing them (hashCode should be fast to compute). So a valid hashCode method might look like:
public int hashCode() {
return (name==null ? 17 : name.hashCode()) ^ age;
}
Note that the following is not valid (assuming People also had a height field), because it uses a field that equals didn't (height). In this case two "equal" objects could have different hash codes.
public int hashCode() {
return (name==null ? 17 : name.hashCode()) ^ age ^ height;
}
Also, it's perfectly valid for two non-equals objects to have the same hash code:
public int hashCode() {
return age;
}
In this case Jane age 30 is not equal to Bob age 30, yet both their hash codes are 30. While valid this is undesirable for performance in hash-based collections.
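Here is a runnable sketch of the People class that combines the equals() and the XOR-based hashCode() above. Unlike the question's version, it uses a constructor and final fields, which is an adaptation for the example. It also uses java.util.Objects#equals for the null-safe name comparison:

```java
import java.util.Objects;

// equals() uses name and age; hashCode() uses exactly those fields (never more).
final class People {
    final String name;
    final int age;

    People(String name, int age) { this.name = name; this.age = age; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        People other = (People) obj;
        return Objects.equals(name, other.name) && age == other.age;
    }

    @Override
    public int hashCode() {
        return (name == null ? 17 : name.hashCode()) ^ age; // XOR variant from above
    }

    public static void main(String[] args) {
        People mike = new People("mike", 23);
        People mike2 = new People("mike", 23);
        People melissa = new People("melissa", 24);
        System.out.println(mike.equals(mike2) && mike.hashCode() == mike2.hashCode()); // true
        System.out.println(mike.equals(melissa));            // false
        System.out.println(new People(null, 30).hashCode()); // 17 ^ 30 = 15
    }
}
```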
Another question asks if there are some basic low-level things that all programmers should know, and I think hash lookups are one of those. So here goes.
A hash table (note that I'm not using an actual classname) is basically an array of linked lists. To find something in the table, you first compute the hashcode of that something, then mod it by the size of the table. This is an index into the array, and you get a linked list at that index. You then traverse the list until you find your object.
Since array retrieval is O(1), and linked list traversal is O(n), you want a hash function that creates as random a distribution as possible, so that objects will be hashed to different lists. Every object could return the value 0 as its hashcode, and a hash table would still work, but it would essentially be a long linked-list at element 0 of the array.
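The array-of-lists structure described above fits in a few lines. The `ChainedTable` here is hypothetical and deliberately minimal (fixed size, no resizing or removal); it is not how java.util.HashMap is actually written. The hash code picks the bucket, and equals() is only applied within that one chain:

```java
import java.util.LinkedList;
import java.util.Objects;

// Minimal separate-chaining hash table (illustrative only).
final class ChainedTable<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedTable(int size) {
        buckets = (LinkedList<Entry<K, V>>[]) new LinkedList[size];
        for (int i = 0; i < size; i++) buckets[i] = new LinkedList<>();
    }

    // hashCode picks the bucket; floorMod keeps the index non-negative.
    private LinkedList<Entry<K, V>> bucketFor(Object key) {
        return buckets[Math.floorMod(Objects.hashCode(key), buckets.length)];
    }

    void put(K key, V value) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (Objects.equals(e.key, key)) { e.value = value; return; } // overwrite
        }
        bucketFor(key).add(new Entry<>(key, value));
    }

    V get(Object key) {
        for (Entry<K, V> e : bucketFor(key)) { // linear walk of one chain only
            if (Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    public static void main(String[] args) {
        ChainedTable<String, Integer> table = new ChainedTable<>(7); // prime size
        table.put("Aa", 1);
        table.put("BB", 2); // same hash code as "Aa": lands in the same chain
        table.put("Aa", 3); // replaces the value for "Aa"
        System.out.println(table.get("Aa") + " " + table.get("BB") + " " + table.get("CC"));
    }
}
```

Running it prints `3 2 null`.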
You also generally want the array to be large, which increases the chances that the object will be in a list of length 1. The Java HashMap, for example, increases the size of the array when the number of entries in the map is > 75% of the size of the array. There's a tradeoff here: you can have a huge array with very few entries and waste memory, or a smaller array where each element in the array is a list with > 1 entries, and waste time traversing. A perfect hash would assign each object to a unique location in the array, with no wasted space.
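The growth rule can be modelled numerically. This is a simplified model, not a view into java.util.HashMap's internals. It assumes the documented defaults (initial capacity 16, load factor 0.75) and assumes the array doubles each time the entry count exceeds capacity * loadFactor.

```java
// Hypothetical model class: tracks only the numbers, it stores no entries.
class ResizeModel {
    static int capacityAfter(int entries, int initialCapacity, double loadFactor) {
        int capacity = initialCapacity;
        int threshold = (int) (capacity * loadFactor);
        for (int size = 1; size <= entries; size++) {
            if (size > threshold) { // "more than 75% full" with the default load factor
                capacity *= 2;
                threshold = (int) (capacity * loadFactor);
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(capacityAfter(12, 16, 0.75));  // 16: 12 entries is exactly 75%
        System.out.println(capacityAfter(13, 16, 0.75));  // 32
        System.out.println(capacityAfter(100, 16, 0.75)); // 256
    }
}
```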
The term "perfect hash" is a real term, and in some cases you can create a hash function that provides a unique number for each object. This is only possible when you know the set of all possible values. In the general case, you can't achieve this, and some values will return the same hashcode. This is simple mathematics: if your strings can be more than 4 bytes long, there are more possible strings than 4-byte hashcodes, so they can't all get unique ones.
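Java's own String.hashCode shows this pigeonhole effect with strings of only two characters. Its Javadoc documents it as s[0]*31^(n-1) + ... + s[n-1]. That makes "Aa" and "BB" collide, and any concatenation of those two blocks collides as well. It also answers the original question: the result depends only on the characters, not on where the String object lives.

```java
class StringCollision {
    public static void main(String[] args) {
        // Two characters hash to 31*c0 + c1:
        // 'A'(65)*31 + 'a'(97) = 2112 and 'B'(66)*31 + 'B'(66) = 2112.
        System.out.println("Aa".hashCode());   // 2112
        System.out.println("BB".hashCode());   // 2112
        System.out.println("Aa".equals("BB")); // false: same hash, different strings

        // Blocks with equal hashes can be combined into ever more colliding strings.
        System.out.println("AaAa".hashCode() == "BBBB".hashCode()); // true
        System.out.println("AaBB".hashCode() == "BBAa".hashCode()); // true
    }
}
```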
One interesting tidbit: hash arrays are generally sized based on prime numbers, to give the best chance for random allocation when you mod the results, regardless of how random the hashcodes really are.
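This small sketch shows why the table size matters when hash codes are not random. It assumes a hypothetical hash that only ever produces multiples of 4. With 16 buckets those keys can reach only 4 of them. A prime size such as 17 shares no factor with 4 and spreads them over 16 buckets.

```java
import java.util.HashSet;
import java.util.Set;

class PrimeSizing {
    // Distinct buckets hit when the hash codes are i * stride for i = 0 .. count-1.
    static int bucketsUsed(int count, int stride, int tableSize) {
        Set<Integer> used = new HashSet<>();
        for (int i = 0; i < count; i++) {
            used.add(Math.floorMod(i * stride, tableSize));
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println(bucketsUsed(16, 4, 16)); // 4: only buckets 0, 4, 8 and 12
        System.out.println(bucketsUsed(16, 4, 17)); // 16: every key gets its own bucket
    }
}
```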
Edit based on comments:
1) A linked list is not the only way to represent the objects that share a hashcode, although it is the method used by the JDK 1.5 HashMap. It is less memory-efficient than a simple array, but it arguably creates less churn when rehashing (because the entries can be unlinked from one bucket and relinked to another).
2) As of JDK 1.4, the HashMap class uses an array sized as a power of 2. Before that it used sizes of the form 2^N+1, which are odd but not always prime (2^3+1 = 9, for example). A power-of-2 size does not speed up array indexing per se, but it does allow the array index to be computed with a bitwise AND rather than a division, as noted by Neil Coffey. Personally, I'd question this as premature optimization, but given the list of authors on HashMap, I'll assume there is some real benefit.
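The bitwise-AND trick works because, for a power-of-2 size n, n - 1 is a mask of the low bits. So hash & (n - 1) gives the same index as the non-negative remainder Math.floorMod(hash, n). Plain % would go negative for negative hash codes. This is a minimal sketch, and the method names are illustrative. JDK 8 and later HashMap also mixes the high bits into the low ones before masking, which this sketch omits.

```java
class PowerOfTwoIndex {
    // Valid only when n is a power of two: n - 1 is then a mask of the low bits.
    static int indexByAnd(int hash, int n) {
        return hash & (n - 1);
    }

    // Division-based index; floorMod (not %) keeps the result non-negative for negative hashes.
    static int indexByMod(int hash, int n) {
        return Math.floorMod(hash, n);
    }

    public static void main(String[] args) {
        int n = 16;
        int[] hashes = {0, 5, 17, 31, 12345, -1, -12345, Integer.MIN_VALUE};
        for (int h : hashes) {
            // Both columns print the same index for every hash.
            System.out.println(h + " -> " + indexByMod(h, n) + " / " + indexByAnd(h, n));
        }
    }
}
```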
In general the hash code cannot be unique, as there are more values than possible hash codes (integers).
A good hash code distributes the values well over the integers.
A bad one could always return the same value and still be logically correct; it would just lead to unacceptably inefficient hash tables.
Equal values must have the same hash value for hash tables to work correctly.
Otherwise you could add a key to a hash table, then try to look it up via an equal value with a different hash code and not find it.
Or you could put an equal value with a different hash code and have two equal values at different places in the hash table.
In practice you usually select a subset of the fields to be taken into account in both the hashCode() and the equals() method.
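Both failure modes are easy to reproduce. This sketch uses a hypothetical BrokenPerson whose equals ignores height but whose hashCode includes it, which is the invalid pattern from the first answer. Two equal objects then hash differently. A lookup with one of them misses the other, and both end up stored in the same set.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class BrokenPerson {
    final String name;
    final int age;
    final int height;

    BrokenPerson(String name, int age, int height) {
        this.name = name;
        this.age = age;
        this.height = height;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        BrokenPerson other = (BrokenPerson) obj;
        return Objects.equals(name, other.name) && age == other.age; // height ignored
    }

    // Breaks the contract: height is used here but not in equals.
    @Override
    public int hashCode() {
        return (name == null ? 17 : name.hashCode()) ^ age ^ height;
    }

    public static void main(String[] args) {
        BrokenPerson a = new BrokenPerson("Jane", 30, 170);
        BrokenPerson b = new BrokenPerson("Jane", 30, 165);
        Set<BrokenPerson> set = new HashSet<>();
        set.add(a);
        System.out.println(a.equals(b));     // true
        System.out.println(set.contains(b)); // false: an equal key is not found
        set.add(b);
        System.out.println(set.size());      // 2: two equal values stored
    }
}
```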
I think you misunderstood it. The hashcode does not have to be unique to each object (after all, it is a hash code) though you obviously don't want it to be identical for all objects. You do, however, need it to be identical to all objects that are equal, otherwise things like the standard collections would not work (e.g., you'd look up something in the hash set but would not find it).
For straightforward fields, some IDEs can generate hashCode (and equals) for you.
If you don't use an IDE, consider the HashCodeBuilder class from Apache Commons Lang.
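HashCodeBuilder lives in Apache Commons Lang (org.apache.commons.lang3.builder). If adding a dependency is not an option, java.util.Objects.hash and Objects.equals have been in the JDK since Java 7, and they cover the straightforward case. The sketch below uses those JDK methods rather than HashCodeBuilder. It is written for a hypothetical Employee class with a name and an id.

```java
import java.util.Objects;

// final, so the instanceof check in equals cannot be broken by a subclass.
final class Employee {
    private final String name;
    private final int id;

    Employee(String name, int id) {
        this.name = name;
        this.id = id;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Employee)) return false;
        Employee other = (Employee) obj;
        return id == other.id && Objects.equals(name, other.name); // null-safe comparison
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, id); // exactly the fields equals compares; null-safe
    }

    public static void main(String[] args) {
        Employee a = new Employee("Some_Name", 1);
        Employee b = new Employee("Some_Name", 1);
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // true
    }
}
```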
The only contractual obligation for hashCode is for it to be consistent. The fields used in creating the hashCode value must be the same or a subset of the fields used in the equals method. This means returning 0 for all values is valid, although not efficient.
One can check whether hashCode is consistent via a unit test. I have written an abstract class called EqualityTestCase, which does a handful of hashCode checks. One simply has to extend the test case and implement two or three factory methods. The test does a very crude job of testing whether the hashCode is efficient.
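EqualityTestCase itself is not shown here, so the following is only a hypothetical helper in the same spirit, not that class. It takes two separately built objects that should be equal and one that should differ. It checks the parts of the equals/hashCode contract that can be tested mechanically. When distinct objects share a hash code it only warns, because that is legal.

```java
final class HashContract {
    private HashContract() {}

    // a and b: separately built instances that should be equal; c: one that should differ.
    static void check(Object a, Object b, Object c) {
        require(a.equals(a), "reflexive");
        require(a.equals(b) && b.equals(a), "symmetric for the equal pair");
        require(a.hashCode() == b.hashCode(), "equal objects must have equal hash codes");
        require(a.hashCode() == a.hashCode(), "hashCode must be stable across calls");
        require(!a.equals(null), "equals(null) must be false");
        require(!a.equals(c) && !c.equals(a), "the distinct object must not be equal");
        // Crude efficiency check: distinct objects should ideally hash differently.
        if (a.hashCode() == c.hashCode()) {
            System.out.println("warning: distinct objects share hash code " + a.hashCode());
        }
    }

    private static void require(boolean ok, String what) {
        if (!ok) {
            throw new AssertionError("contract violated: " + what);
        }
    }

    public static void main(String[] args) {
        check(new String("Some_Name"), new String("Some_Name"), "Other_Name");
        System.out.println("String passes");
    }
}
```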
This is what the Javadoc says about the hashCode method:
"Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application."
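The condition that no information used in equals comparisons is modified matters in practice. This minimal sketch uses a mutable ArrayList as a HashSet key. Once the list changes, its hash code changes too, and the set searches for it under the new hash. The entry is still stored under the old one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MutatedKey {
    public static void main(String[] args) {
        List<Integer> key = new ArrayList<>(List.of(1, 2));
        Set<List<Integer>> set = new HashSet<>();
        set.add(key);
        int before = key.hashCode();

        key.add(3); // changes data used by equals, so hashCode legitimately changes
        int after = key.hashCode();

        System.out.println(before != after);   // true
        System.out.println(set.contains(key)); // false: looked up under the new hash
        System.out.println(set.size());        // 1: the entry is still there, just unreachable
    }
}
```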
There is the notion of a business key, which determines the uniqueness of separate instances of the same type. Each specific type (class) that models a separate entity from the target domain (e.g. a vehicle in a fleet system) should have a business key, represented by one or more class fields. The equals() and hashCode() methods should both be implemented using the fields that make up the business key. This ensures that the two methods are consistent with each other.
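Here is a minimal sketch of the business-key idea, assuming a hypothetical fleet Vehicle identified by its VIN. The mileage changes as the vehicle is driven, so it stays out of both methods. Only the business key feeds equals and hashCode, so a vehicle stays findable in a HashSet after it is driven.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class Vehicle {
    private final String vin; // business key: immutable and non-null
    private int mileage;      // mutable state, deliberately not part of equality

    Vehicle(String vin, int mileage) {
        this.vin = Objects.requireNonNull(vin);
        this.mileage = mileage;
    }

    void drive(int km) {
        mileage += km;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Vehicle)) return false;
        return vin.equals(((Vehicle) obj).vin);
    }

    @Override
    public int hashCode() {
        return vin.hashCode(); // built from exactly the business-key field
    }

    public static void main(String[] args) {
        Vehicle truck = new Vehicle("VIN123", 0);
        Set<Vehicle> fleet = new HashSet<>();
        fleet.add(truck);
        truck.drive(120);                         // mileage changes, identity does not
        System.out.println(fleet.contains(truck)); // true
    }
}
```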