HashSet Collisions in Java - java

I have a program for my Java class where I want to use hashSets to compare a directory of text documents. Essentially, my plan is to create a hashSet of strings for each paper, and then add two of the papers hashSets together into one hashSet and find the number of same 6-word sequences.
My question is, do I have to manually check for, and handle, collisions, or does Java do that for me?

Java's hash maps and sets handle hash collisions automatically. That is why it is important to override both the equals and the hashCode methods: sets use the two together to tell duplicate entries from unique ones.
It is also important to note that hash collisions have a performance impact, since multiple objects then end up in the same bucket.
public class MyObject {
    private String name;
    // getters and setters
    public String getName() { return name; }
    @Override
    public int hashCode() {
        // Derive the hash from the same field that equals() compares
        return java.util.Objects.hashCode(name);
    }
    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj instanceof MyObject) {
            // Cast first, then call getName(); Objects.equals is null-safe
            return java.util.Objects.equals(name, ((MyObject) obj).getName());
        }
        return false;
    }
}
Note: Standard Java Objects such as String have already implemented hashCode and equals so you only have to do that for your own kind of Data Objects.

I think you did not ask for hash collisions, right? The question is what happens when HashSet a and HashSet b are added into a single set e.g. by a.addAll(b).
The answer is that a will contain all elements and no duplicates. For Strings this means you can count the number of strings common to both sets as (a.size() before the add) + b.size() - (a.size() after the add).
It does not even matter if some of the Strings have the same hash code but are not equal.
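For the original task, counting shared 6-word sequences could be sketched as below. The class and method names (`ShingleOverlap`, `shingles`, `countCommon`) are hypothetical, and splitting the text on whitespace is an assumption about how words are delimited. It uses `retainAll` on a copy rather than the size arithmetic, so neither input set is modified.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ShingleOverlap {
    // Builds the set of all n-word sequences in a text.
    // Splitting on whitespace is an assumption about the input format.
    static Set<String> shingles(String text, int n) {
        String[] words = text.trim().split("\\s+");
        Set<String> result = new HashSet<>();
        for (int i = 0; i + n <= words.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return result;
    }

    // Counts the elements present in both sets without modifying either one.
    static int countCommon(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    public static void main(String[] args) {
        Set<String> a = shingles("the quick brown fox jumps over the lazy dog", 6);
        Set<String> b = shingles("a quick brown fox jumps over the lazy cat", 6);
        System.out.println(countCommon(a, b)); // 2
    }
}
```

No collision handling appears anywhere in this code: `HashSet` and `String` take care of it.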

Related

How do I create a Set with a Tuple value which will read the points in the Tuple [duplicate]

Recently I read through this
Developer Works Document.
The document is all about defining hashCode() and equals() effectively and correctly, however I am not able to figure out why we need to override these two methods.
How can I take the decision to implement these methods efficiently?
Joshua Bloch says on Effective Java
You must override hashCode() in every class that overrides equals(). Failure to do so will result in a violation of the general contract for Object.hashCode(), which will prevent your class from functioning properly in conjunction with all hash-based collections, including HashMap, HashSet, and Hashtable.
Let's try to understand it with an example of what would happen if we override equals() without overriding hashCode() and attempt to use a Map.
Say we have a class like this and that two objects of MyClass are equal if their importantField is equal (with hashCode() and equals() generated by eclipse)
public class MyClass {
private final String importantField;
private final String anotherField;
public MyClass(final String equalField, final String anotherField) {
this.importantField = equalField;
this.anotherField = anotherField;
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result
+ ((importantField == null) ? 0 : importantField.hashCode());
return result;
}
@Override
public boolean equals(final Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
final MyClass other = (MyClass) obj;
if (importantField == null) {
if (other.importantField != null)
return false;
} else if (!importantField.equals(other.importantField))
return false;
return true;
}
}
Imagine you have this
MyClass first = new MyClass("a","first");
MyClass second = new MyClass("a","second");
Override only equals
If only equals is overridden, then when you call myMap.put(first,someValue), first will hash to some bucket, and when you call myMap.put(second,someOtherValue), second will hash to some other bucket (as they have different hash codes). So, although they are equal, they don't hash to the same bucket, the map can't detect the duplicate, and both of them stay in the map.
Although it is not necessary to override equals() if we override hashCode(), let's see what would happen in this particular case where we know that two objects of MyClass are equal if their importantField is equal but we do not override equals().
Override only hashCode
If you only override hashCode then when you call myMap.put(first,someValue) it takes first, calculates its hashCode and stores it in a given bucket. Then when you call myMap.put(second,someOtherValue) it should replace first with second as per the Map Documentation because they are equal (according to the business requirement).
But the problem is that equals was not redefined, so when the map hashes second and iterates through the bucket looking if there is an object k such that second.equals(k) is true it won't find any as second.equals(first) will be false.
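The hashCode-only case can be reproduced with a small sketch. The `Key` class here is a hypothetical stand-in for MyClass with equals() deliberately left out; it is not the code from the answer above.

```java
import java.util.HashMap;
import java.util.Map;

class HashOnlyDemo {
    // Hypothetical key class: hashCode() is overridden, equals() is NOT.
    static final class Key {
        final String importantField;
        Key(String importantField) { this.importantField = importantField; }
        @Override
        public int hashCode() { return importantField.hashCode(); }
    }

    static int sizeAfterTwoPuts() {
        Map<Key, String> map = new HashMap<>();
        map.put(new Key("a"), "first");
        // Same hash code, but the inherited equals() only compares references,
        // so the map treats this as a second, distinct key.
        map.put(new Key("a"), "second");
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterTwoPuts()); // 2
    }
}
```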
Hope it was clear
Collections such as HashMap and HashSet use the hash code of an object to determine where it should be stored inside the collection, and the hash code is used again to locate the object in that collection.
Hashing retrieval is a two-step process:
Find the right bucket (using hashCode())
Search the bucket for the right element (using equals() )
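The two steps can be sketched with a toy set. `TinyHashSet` is a hypothetical class written only for illustration: it has a fixed number of buckets, never resizes, and does not support null elements, so it is much simpler than the real `java.util.HashSet`.

```java
import java.util.ArrayList;
import java.util.List;

class TinyHashSet<E> {
    private final List<List<E>> buckets = new ArrayList<>();

    TinyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Step 1: hashCode() picks the bucket.
    private List<E> bucketFor(Object o) {
        return buckets.get(Math.floorMod(o.hashCode(), buckets.size()));
    }

    // Step 2: equals() searches within that one bucket only.
    boolean contains(Object o) {
        for (E e : bucketFor(o)) {
            if (e.equals(o)) {
                return true;
            }
        }
        return false;
    }

    // Rejects an element that is already present according to equals().
    boolean add(E e) {
        if (contains(e)) {
            return false;
        }
        bucketFor(e).add(e);
        return true;
    }
}
```

If two equal objects returned different hash codes, step 1 could send them to different buckets, and step 2 would never get the chance to compare them.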
Here is a small example of why we should override equals() and hashCode().
Consider an Employee class which has two fields: age and name.
public class Employee {
String name;
int age;
public Employee(String name, int age) {
this.name = name;
this.age = age;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public int getAge() {
return age;
}
public void setAge(int age) {
this.age = age;
}
@Override
public boolean equals(Object obj) {
if (obj == this)
return true;
if (!(obj instanceof Employee))
return false;
Employee employee = (Employee) obj;
// Compare names with equals(), not ==, which only checks reference identity
return employee.getAge() == this.getAge()
&& java.util.Objects.equals(employee.getName(), this.getName());
}
// commented
/* @Override
public int hashCode() {
int result=17;
result=31*result+age;
result=31*result+(name!=null ? name.hashCode():0);
return result;
}
*/
}
Now create a class, insert Employee object into a HashSet and test whether that object is present or not.
public class ClientTest {
public static void main(String[] args) {
Employee employee = new Employee("rajeev", 24);
Employee employee1 = new Employee("rajeev", 25);
Employee employee2 = new Employee("rajeev", 24);
HashSet<Employee> employees = new HashSet<Employee>();
employees.add(employee);
System.out.println(employees.contains(employee2));
System.out.println("employee.hashCode(): " + employee.hashCode()
+ " employee2.hashCode():" + employee2.hashCode());
}
}
It will print the following:
false
employee.hashCode(): 321755204 employee2.hashCode():375890482
Now uncomment the hashCode() method, run the same code, and the output will be:
true
employee.hashCode(): -938387308 employee2.hashCode():-938387308
Now can you see why if two objects are considered equal, their hashcodes must
also be equal? Otherwise, you'd never be able to find the object since the default
hashcode method in class Object virtually always comes up with a unique number
for each object, even if the equals() method is overridden in such a way that two
or more objects are considered equal. It doesn't matter how equal the objects are if
their hashcodes don't reflect that. So one more time: If two objects are equal, their
hashcodes must be equal as well.
You must override hashCode() in every
class that overrides equals(). Failure
to do so will result in a violation of
the general contract for
Object.hashCode(), which will prevent
your class from functioning properly
in conjunction with all hash-based
collections, including HashMap,
HashSet, and Hashtable.
   from Effective Java, by Joshua Bloch
By defining equals() and hashCode() consistently, you can improve the usability of your classes as keys in hash-based collections. As the API doc for hashCode explains: "This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable."
The best answer to your question about how to implement these methods efficiently is suggesting you to read Chapter 3 of Effective Java.
Why we override equals() method
In Java we cannot overload how operators like ==, +=, -= behave. Their behavior is fixed. So let's focus on the operator == for our case here.
How operator == works.
It checks if 2 references that we compare point to the same instance in memory. Operator == will resolve to true only if those 2 references represent the same instance in memory.
So now let's consider the following example
public class Person {
private Integer age;
private String name;
// getters, setters, constructors
}
So let's say that in your program you have built 2 Person objects on different places and you wish to compare them.
Person person1 = new Person("Mike", 34);
Person person2 = new Person("Mike", 34);
System.out.println(person1 == person2); // prints false
From a business perspective those 2 objects look the same, right? For the JVM they are not the same. Since they are both created with the new keyword, they are separate instances at different locations in memory. Therefore the operator == will return false.
But if we cannot override the == operator, how can we tell the JVM that we want those 2 objects to be treated as the same? This is where the .equals() method comes into play.
You can override equals() to check if some objects have same values for specific fields to be considered equal.
You can select which fields you want to be compared. If we say that 2 Person objects will be the same if and only if they have the same age and same name, then the IDE will create something like the following for automatic generation of equals()
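@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Person person = (Person) o;
// age is an Integer, so == would compare references; Objects.equals compares values and is null-safe
return Objects.equals(age, person.age) &&
Objects.equals(name, person.name);
}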
Let's go back to our previous example
Person person1 = new Person("Mike", 34);
Person person2 = new Person("Mike", 34);
System.out.println(person1 == person2); // prints false
System.out.println(person1.equals(person2)); // prints true
So we can not overload == operator to compare objects the way we want but Java gave us another way, the equals() method, which we can override as we want.
Keep in mind however, if we don't provide our custom version of .equals() (aka override) in our class then the predefined .equals() from Object class and == operator will behave exactly the same.
Default equals() method which is inherited from Object will check whether both compared instances are the same in memory!
Why we override hashCode() method
Some Data Structures in java like HashSet, HashMap store their elements based on a hash function which is applied on those elements. The hashing function is the hashCode()
If we have a choice of overriding .equals() method then we must also have a choice of overriding hashCode() method. There is a reason for that.
Default implementation of hashCode() which is inherited from Object considers all objects in memory unique!
Let's get back to those hash data structures. There is a rule for those data structures.
HashSet can not contain duplicate values and HashMap can not contain duplicate keys
HashSet is implemented with a HashMap behind the scenes where each value of a HashSet is stored as a key in a HashMap.
So we have to understand how a HashMap works.
In a simple way a HashMap is a native array that has some buckets. Each bucket has a linkedList. In that linkedList our keys are stored. HashMap locates the correct linkedList for each key by applying hashCode() method and after that it iterates through all elements of that linkedList and applies equals() method on each of these elements to check if that element is already contained there. No duplicate keys are allowed.
When we put something inside a HashMap, the key is stored in one of those linked lists. Which linked list the key goes into is determined by the result of the hashCode() method on that key, reduced to a valid array index. So if the hash of key1 maps to index 4, then key1 will be stored in the bucket at index 4 of the array, in the linked list that exists there.
By default hashCode() method returns a different result for each different instance. If we have the default equals() which behaves like == which considers all instances in memory as different objects we don't have any problem.
But in our previous example we said we want Person instances to be considered equal if their ages and names match.
Person person1 = new Person("Mike", 34);
Person person2 = new Person("Mike", 34);
System.out.println(person1.equals(person2)); // prints true
Now let's create a map to store those instances as keys with some string as pair value
Map<Person, String> map = new HashMap<>();
map.put(person1, "1");
map.put(person2, "2");
In the Person class we have not overridden the hashCode method, but we have overridden the equals method. Since the default hashCode gives different results for different instances, person1.hashCode() and person2.hashCode() will very likely differ.
Our map might end with those persons in different linkedLists.
This is against the logic of a HashMap
A HashMap is not allowed to have multiple equal keys!
But ours now has and the reason is that the default hashCode() which was inherited from Object Class was not enough. Not after we have overridden the equals() method on Person Class.
That is the reason why we must override hashCode() method after we have overridden equals method.
Now let's fix that. Let's override our hashCode() method to consider the same fields that equals() considers, namely age, name
public class Person {
private Integer age;
private String name;
// getters, setters, constructors
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Person person = (Person) o;
// age is an Integer, so == would compare references; Objects.equals compares values and is null-safe
return Objects.equals(age, person.age) &&
Objects.equals(name, person.name);
}
@Override
public int hashCode() {
return Objects.hash(name, age);
}
}
Now let's try again to save those keys in our HashMap
Map<Person, String> map = new HashMap<>();
map.put(person1, "1");
map.put(person2, "2");
person1.hashCode() and person2.hashCode() will definitely be the same. Let's say it is 0.
HashMap will go to bucket 0 and in that linked list will save person1 as key with the value "1". For the second put, HashMap is intelligent enough: when it goes again to bucket 0 to save the person2 key with value "2", it will see that an equal key already exists there. So it will keep the existing key and overwrite its value with "2". In the end only one entry will exist in our HashMap: the person1 key mapped to "2".
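A quick way to check what the map holds after the two put calls is the sketch below. The nested `Person` class is a minimal stand-in for the one above (a constructor is assumed, since the original elides it). With `java.util.HashMap`, a put with an equal key replaces the value and returns the old one, while the key object already stored stays in place.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class PutReplacesValueDemo {
    // Minimal stand-in for the Person class; the constructor is an assumption.
    static final class Person {
        final String name;
        final Integer age;
        Person(String name, Integer age) { this.name = name; this.age = age; }
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            Person p = (Person) o;
            return Objects.equals(age, p.age) && Objects.equals(name, p.name);
        }
        @Override
        public int hashCode() { return Objects.hash(name, age); }
    }

    public static void main(String[] args) {
        Person person1 = new Person("Mike", 34);
        Person person2 = new Person("Mike", 34);
        Map<Person, String> map = new HashMap<>();
        map.put(person1, "1");
        String previous = map.put(person2, "2"); // returns the replaced value

        System.out.println(map.size());       // 1
        System.out.println(map.get(person1)); // 2
        System.out.println(previous);         // 1
        // The key object held by the map is still the first one inserted.
        System.out.println(map.keySet().iterator().next() == person1); // true
    }
}
```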
Now we are aligned with the rule of Hash Map that says no multiple equal keys are allowed!
Identity is not equality.
The == operator tests identity.
The equals(Object obj) method tests equality (i.e. we need to define equality by overriding the method).
Why do I need to override the equals and hashCode methods in Java?
First we have to understand the use of equals method.
In order to identify differences between two objects, we need to override the equals method.
For example:
Customer customer1=new Customer("peter");
Customer customer2=customer1;
customer1.equals(customer2); // returns true by default, i.e. both refer to the same object
------------------------------
Customer customer1=new Customer("peter");
Customer customer2=new Customer("peter");
customer1.equals(customer2); // returns false by default, i.e. we have two different peter customers
------------------------------
Now I have overridden the Customer class equals method as follows:
@Override
public boolean equals(Object obj) {
if (this == obj) // it checks references
return true;
if (obj == null) // checks null
return false;
if (getClass() != obj.getClass()) // both object are instances of same class or not
return false;
Customer other = (Customer) obj;
if (name == null) {
if (other.name != null)
return false;
} else if (!name.equals(other.name)) // uses the built-in String equals to detect a difference
return false;
return true;
}
Customer customer1=new Customer("peter");
Customer customer2=new Customer("peter");
Instead of relying on the default notion of object equality, we can define it ourselves by overriding the equals method.
customer1.equals(customer2); // returns true by our own logic
Now the hashCode method is easy to understand.
hashCode produces an integer that is used to store the object in data structures like HashMap and HashSet.
Assume we have overridden the equals method of Customer as above,
customer1.equals(customer2); // returns true by our own logic
While working with these data structures, we store objects in buckets (bucket is a fancy name for folder). If we use the built-in hash technique, it generates two different hash codes for the above two customers. So we end up storing the same logical object in two different places. To avoid this kind of issue, we should also override the hashCode method, based on the following principles.
Unequal instances may have the same hash code.
Equal instances must return the same hash code.
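The first principle can be seen with plain strings. "Aa" and "BB" are a well-known pair of unequal strings whose `String.hashCode()` values coincide, and a `HashSet` still keeps them apart because it falls back to equals(). The class name `CollisionDemo` is just for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class CollisionDemo {
    public static void main(String[] args) {
        // Unequal strings that share a hash code: a genuine collision.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println("Aa".equals("BB"));                  // false

        Set<String> set = new HashSet<>();
        set.add("Aa");
        set.add("BB");
        // Both elements survive; the collision only costs an extra equals() call.
        System.out.println(set.size());         // 2
        System.out.println(set.contains("BB")); // true
    }
}
```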
Simply put, the equals method in Object checks for reference equality, whereas two instances of your class could still be semantically equal when their properties are equal. This is, for instance, important when putting your objects into a container that utilizes equals and hashCode, like HashMap and Set. Let's say we have a class like:
public class Foo {
String id;
String whatevs;
Foo(String id, String whatevs) {
this.id = id;
this.whatevs = whatevs;
}
}
We create two instances with the same id:
Foo a = new Foo("id", "something");
Foo b = new Foo("id", "something else");
Without overriding equals we are getting:
a.equals(b) is false because they are two different instances
a.equals(a) is true since it's the same instance
b.equals(b) is true since it's the same instance
Correct? Well maybe, if this is what you want. But let's say we want objects with the same id to be the same object, regardless if it's two different instances. We override the equals (and hashcode):
public class Foo {
String id;
String whatevs;
Foo(String id, String whatevs) {
this.id = id;
this.whatevs = whatevs;
}
@Override
public boolean equals(Object other) {
if (other instanceof Foo) {
return ((Foo)other).id.equals(this.id);
}
return false; // not a Foo (or null), so not equal
}
@Override
public int hashCode() {
return this.id.hashCode();
}
}
As for implementing equals and hashcode I can recommend using Guava's helper methods
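The Guava helpers are not shown here. As a JDK-only alternative (a swapped-in technique, not what the answer refers to), `java.util.Objects` offers null-safe `equals` and `hashCode` helpers. A minimal sketch, with the hypothetical class name `IdentifiedFoo` standing in for Foo:

```java
import java.util.Objects;

class IdentifiedFoo {
    final String id;
    final String whatevs;

    IdentifiedFoo(String id, String whatevs) {
        this.id = id;
        this.whatevs = whatevs;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof IdentifiedFoo)) return false;
        // Objects.equals is null-safe, so a null id does not throw
        return Objects.equals(id, ((IdentifiedFoo) other).id);
    }

    @Override
    public int hashCode() {
        // Hash exactly the fields that equals() compares
        return Objects.hashCode(id);
    }

    public static void main(String[] args) {
        IdentifiedFoo a = new IdentifiedFoo("id", "something");
        IdentifiedFoo b = new IdentifiedFoo("id", "something else");
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```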
Let me explain the concept in very simple words.
First, from a broader perspective, we have collections, and HashMap is one of the data structures in the collections framework.
To understand why we have to override both the equals and hashCode methods, we first need to understand what a HashMap is and what it does.
A HashMap is a data structure that stores key-value pairs in an array. Let's say a[], where each element in 'a' is a key-value pair.
Also, each index in the above array can hold a linked list, thereby holding more than one entry at one index.
Now why is a hashmap used?
If we have to search a large array, checking each element will not be efficient. The hashing technique tells us to preprocess the array with some logic and group the elements based on that logic, i.e. hashing.
E.g. we have the array 1,2,3,4,5,6,7,8,9,10,11 and we apply the hash function mod 10, so 1 and 11 will be grouped together. If we had to search for 11 in the original array, we would have to iterate over the complete array, but when we group it we limit the scope of iteration, thereby improving speed. The data structure used to store all the above information can be thought of as a 2D array for simplicity.
Apart from the above, a HashMap also guarantees that it won't add any duplicate keys. And this is the main reason why we have to override equals and hashCode.
So when asked to explain the internal working of a HashMap, we need to look at what methods the HashMap has and how they follow the rules I explained above.
The HashMap has a method called put(K,V), and it should follow the above rules of efficiently distributing entries over the array and not adding any duplicates.
What put does is first generate the hash code for the given key to decide which index the value should go in. If nothing is present at that index, the new value is added there. If something is already present there, the new value is added at the end of the linked list at that index. But remember, no duplicates should be added, as per the desired behavior of the HashMap. So let's say you have two objects of your own class, aa and bb, both holding the value 11 (Integer itself already overrides both methods).
As every class derives from the Object class, the default implementation for comparing two objects compares the references and not the values inside the objects. So in the above case both objects, though semantically equal, will fail the equality test, and two keys with the same hash code and the same values can both exist in the map, thereby creating duplicates. If we override both methods, we avoid adding duplicates.
You could also refer to Detail working
import java.util.HashMap;
public class Employee {
String name;
String mobile;
public Employee(String name,String mobile) {
this.name = name;
this.mobile = mobile;
}
@Override
public int hashCode() {
System.out.println("calling hashCode method of Employee");
String str = this.name;
int sum = 0;
for (int i = 0; i < str.length(); i++) {
sum = sum + str.charAt(i);
}
return sum;
}
@Override
public boolean equals(Object obj) {
// TODO Auto-generated method stub
System.out.println("calling equals method of Employee");
Employee emp = (Employee) obj;
if (this.mobile.equalsIgnoreCase(emp.mobile)) {
System.out.println("returning true");
return true;
} else {
System.out.println("returning false");
return false;
}
}
public static void main(String[] args) {
// TODO Auto-generated method stub
Employee emp = new Employee("abc", "hhh");
Employee emp2 = new Employee("abc", "hhh");
HashMap<Employee, Employee> h = new HashMap<>();
//for (int i = 0; i < 5; i++) {
h.put(emp, emp);
h.put(emp2, emp2);
//}
System.out.println("----------------");
System.out.println("size of hashmap: "+h.size());
}
}
hashCode() :
If you override only hashCode(), nothing changes for hash-based collections: the equals() inherited from Object still compares references, so two distinct instances are never considered equal, even when their hash codes match.
equals() :
If you override only equals(), then a.equals(b) being true requires a.hashCode() and b.hashCode() to be the same, but that will not hold, since you did not override hashCode().
Note : the hashCode() method of the Object class typically returns a different hash code for each distinct object.
So when you need to use your object in the hashing based collection, you must override both equals() and hashCode().
Java sets a rule that
"If two objects are equal according to the equals(Object) method, then the hashCode method should give the same value for these two objects."
So, if in our class we override equals(), we should also override the hashCode() method to follow this rule.
Both methods, equals() and hashcode(), are used in Hashtable, for example, to store values as key-value pairs. If we override one and not the other, there is a possibility that the Hashtable may not work as we want, if we use such object as a key.
Adding to @Lombo's answer
When will you need to override equals() ?
The default implementation of Object's equals() is
public boolean equals(Object obj) {
return (this == obj);
}
which means two objects will be considered equal only if they are the same instance, which will be true only if you are comparing an object with itself.
But you might want to consider two objects the same if they have the same value for one or more of their properties (refer to the example given in @Lombo's answer).
So you will override equals() in these situations and you would give your own conditions for equality.
I have successfully implemented equals() and it is working great. So why are they asking to override hashCode() as well?
Well, as long as you don't use hash-based collections with your user-defined class, it is fine.
But some time in the future you might want to use HashMap or HashSet, and if you don't override and correctly implement hashCode(), these hash-based collections won't work as intended.
Override only equals (addition to @Lombo's answer)
myMap.put(first,someValue)
myMap.containsKey(second); // Should be found, since the keys are equal, but it returns false. How?
First of all, HashMap checks whether the hashCode of second is the same as that of first.
Only if the hash codes are the same will it proceed to check equality within the same bucket.
But here the hashCode is different for these 2 objects (they are different instances, and the default implementation is identity-based).
Hence it will not even bother to check for equality.
If you put a breakpoint inside your overridden equals() method, it won't be hit if the objects have different hashCodes.
containsKey() checks hashCode() first, and only if the hash codes are the same does it call your equals() method.
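The breakpoint experiment can be approximated by counting equals() calls. In this sketch the `Key` class is hypothetical and takes its hash code as a constructor argument, which deliberately breaks the equals/hashCode contract so that the effect is deterministic. The behavior shown is what the standard `java.util.HashMap` does: stored hashes are compared before equals() is consulted.

```java
import java.util.HashMap;
import java.util.Map;

class EqualsCallCounter {
    static int equalsCalls = 0;

    // Hypothetical key with an explicitly supplied hash code (contract-breaking on purpose).
    static final class Key {
        final String name;
        final int hash;
        Key(String name, int hash) { this.name = name; this.hash = hash; }
        @Override
        public int hashCode() { return hash; }
        @Override
        public boolean equals(Object o) {
            equalsCalls++; // record every invocation
            return o instanceof Key && ((Key) o).name.equals(name);
        }
    }

    // Performs one lookup and reports how many times equals() ran during it.
    static int lookupAndCountEquals(Map<Key, String> map, Key probe) {
        equalsCalls = 0;
        map.containsKey(probe);
        return equalsCalls;
    }

    public static void main(String[] args) {
        Map<Key, String> map = new HashMap<>();
        map.put(new Key("a", 1), "value");

        // Equal by name but different hash: equals() is never reached.
        System.out.println(lookupAndCountEquals(map, new Key("a", 2))); // 0
        // Same hash: equals() is consulted and the key is found.
        System.out.println(map.containsKey(new Key("a", 1)));           // true
    }
}
```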
Why can't we make the HashMap check for equality in all the buckets? So there is no necessity for me to override hashCode() !!
Then you are missing the point of Hash based Collections.
Consider the following :
Your hashCode() implementation : intObject%9.
The following are the keys stored in the form of buckets.
Bucket 1 : 1,10,19,... (in thousands)
Bucket 2 : 2,20,29...
Bucket 3 : 3,21,30,...
...
Say you want to know if the map contains the key 10.
Would you want to search all the buckets, or would you want to search only one bucket?
Based on the hashCode, you would identify that if 10 is present, it must be present in Bucket 1.
So only Bucket 1 will be searched!
Because if you do not override them, you will be using the default implementation in Object.
Given that instance equality and hash code values generally require knowledge of what makes up an object, they generally need to be redefined in your class to have any tangible meaning.
In order to use our own class objects as keys in collections like HashMap, Hashtable etc., we should override both methods (hashCode() and equals()) with an awareness of the internal working of the collection. Otherwise, it leads to wrong results that we do not expect.
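class A {
    int i;
    // Hashing algorithm: even numbers hash to 0, odd numbers to 1
    @Override
    public int hashCode() { return (i % 2 == 0) ? 0 : 1; }
    // Equals algorithm: equal when the other object is an A with the same i
    @Override
    public boolean equals(Object o) { return o instanceof A && ((A) o).i == this.i; }
}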
put('key','value') will calculate the hash value using hashCode() to determine the
bucket and uses the equals() method to find whether the key is already
present in the bucket. If not, the entry is added; otherwise the existing value is replaced with the current value.
get('key') will use hashCode() to find the Entry (bucket) first and
equals() to find the value in the Entry.
if Both are overridden,
Map<A>
Map.Entry 1 --> 1,3,5,...
Map.Entry 2 --> 2,4,6,...
if equals is not overridden
Map<A>
Map.Entry 1 --> 1,3,5,...,1,3,5,... // Duplicate values as equals not overridden
Map.Entry 2 --> 2,4,6,...,2,4,..
If hashCode is not overridden
Map<A>
Map.Entry 1 --> 1
Map.Entry 2 --> 2
Map.Entry 3 --> 3
Map.Entry 4 --> 1
Map.Entry 5 --> 2
Map.Entry 6 --> 3 // Same values are stored under different hash codes, which violates Contract 1
So on...
HashCode Equal Contract
Two keys equal according to equal method should generate same hashCode
Two Keys generating same hashCode need not be equal (In above example all even numbers generate same hash Code)
1) The common mistake is shown in the example below.
public class Car {
private String color;
public Car(String color) {
this.color = color;
}
public boolean equals(Object obj) {
if(obj==null) return false;
if (!(obj instanceof Car))
return false;
if (obj == this)
return true;
return this.color.equals(((Car) obj).color);
}
public static void main(String[] args) {
Car a1 = new Car("green");
Car a2 = new Car("red");
//hashMap stores Car type and its quantity
HashMap<Car, Integer> m = new HashMap<Car, Integer>();
m.put(a1, 10);
m.put(a2, 20);
System.out.println(m.get(new Car("green")));
}
}
the green Car is not found
2. Problem caused by hashCode()
The problem is caused by the un-overridden method hashCode(). The contract between equals() and hashCode() is:
If two objects are equal, then they must have the same hash code.
If two objects have the same hash code, they may or may not be equal.
public int hashCode(){
return this.color.hashCode();
}
It is useful when using Value Objects. The following is an excerpt from the Portland Pattern Repository:
Examples of value objects are things
like numbers, dates, monies and
strings. Usually, they are small
objects which are used quite widely.
Their identity is based on their state
rather than on their object identity.
This way, you can have multiple copies
of the same conceptual value object.
So I can have multiple copies of an
object that represents the date 16 Jan
1998. Any of these copies will be equal to each other. For a small
object such as this, it is often
easier to create new ones and move
them around rather than rely on a
single object to represent the date.
A value object should always override
.equals() in Java (or = in Smalltalk).
(Remember to override .hashCode() as
well.)
Assume you have class (A) that aggregates two other (B) (C), and you need to store instances of (A) inside hashtable. Default implementation only allows distinguishing of instances, but not by (B) and (C). So two instances of A could be equal, but default wouldn't allow you to compare them in correct way.
Consider collection of balls in a bucket all in black color. Your Job is to color those balls as follows and use it for appropriate game,
For Tennis - Yellow, Red.
For Cricket - White
Now bucket has balls in three colors Yellow, Red and White. And that now you did the coloring Only you know which color is for which game.
Coloring the balls - Hashing.
Choosing the ball for game - Equals.
If you did the coloring and some one chooses the ball for either cricket or tennis they wont mind the color!!!
I was looking into the explanation " If you only override hashCode then when you call myMap.put(first,someValue) it takes first, calculates its hashCode and stores it in a given bucket. Then when you call myMap.put(first,someOtherValue) it should replace first with second as per the Map Documentation because they are equal (according to our definition)." :
I think 2nd time when we are adding in myMap then it should be the 'second' object like myMap.put(second,someOtherValue)
The methods equals and hashCode are defined in the Object class. A hash-based collection first uses the hash code to locate a bucket, and only then calls equals on the candidates in that bucket. Two objects are treated as the same only if their hash codes match and equals returns true. So if you override only the equals method, then even though the overridden equals method indicates that 2 objects are equal, the default hashCode will usually differ for them, and the collection will not treat them as equal. So we need to override hashCode as well.
Equals and Hashcode methods in Java
They are methods of java.lang.Object class which is the super class of all the classes (custom classes as well and others defined in java API).
Implementation:
public boolean equals(Object obj)
public int hashCode()
public boolean equals(Object obj)
This method simply checks if two object references x and y refer to the same object. i.e. It checks if x == y.
It is reflexive: for any reference value x, x.equals(x) should return true.
It is symmetric: for any reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
It is transitive: for any reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
It is consistent: for any reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the object is modified.
For any non-null reference value x, x.equals(null) should return
false.
public int hashCode()
This method returns the hash code value for the object on which this method is invoked. This method returns the hash code value as an integer and is supported for the benefit of hashing based collection classes such as Hashtable, HashMap, HashSet etc. This method must be overridden in every class that overrides the equals method.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified.
This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
Equal objects must produce the same hash code as long as they are equal, however unequal objects need not produce distinct hash codes.
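The contract above can be spot-checked with a small value class. This is only a sketch: Money, ContractDemo and equalImpliesSameHash are made-up names for illustration, and a handful of checks like these do not prove that an implementation is correct in general.

```java
import java.util.Objects;

class ContractDemo {
    // Hypothetical value class: two Money objects are equal when both fields match.
    static final class Money {
        private final long cents;
        private final String currency;

        Money(long cents, String currency) {
            this.cents = cents;
            this.currency = Objects.requireNonNull(currency);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;                 // reflexive shortcut
            if (!(o instanceof Money)) return false;    // also covers o == null
            Money other = (Money) o;
            return cents == other.cents && currency.equals(other.currency);
        }

        @Override
        public int hashCode() {
            // Built from exactly the fields that equals() compares.
            return 31 * Long.hashCode(cents) + currency.hashCode();
        }
    }

    // The one hard rule of the contract: equal objects must have equal hash codes.
    static boolean equalImpliesSameHash(Object a, Object b) {
        return !a.equals(b) || a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        Money a = new Money(500, "EUR");
        Money b = new Money(500, "EUR");
        Money c = new Money(700, "EUR");
        System.out.println(a.equals(b) && b.equals(a));   // symmetric
        System.out.println(a.hashCode() == b.hashCode()); // equal objects, equal hashes
        System.out.println(a.equals(null));               // must be false
        System.out.println(equalImpliesSameHash(a, c));   // holds for unequal objects too
    }
}
```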
Resources:
JavaRanch
Picture
If you override equals() and not hashCode(), you will not notice any problem unless you or someone else uses that class in a hashed collection like HashSet.
People before me have clearly explained the documented theory multiple times; I am just here to provide a very simple example.
Consider a class whose equals() needs to mean something customized:
public class Rishav {
private String rshv;
public Rishav(String rshv) {
this.rshv = rshv;
}
/**
* @return the rshv
*/
public String getRshv() {
return rshv;
}
/**
* @param rshv the rshv to set
*/
public void setRshv(String rshv) {
this.rshv = rshv;
}
@Override
public boolean equals(Object obj) {
if (obj instanceof Rishav) {
// Two Rishav objects are equal when their rshv strings are equal
return this.rshv.equals(((Rishav) obj).getRshv());
}
return false;
}
@Override
public int hashCode() {
return rshv.hashCode();
}
}
Now consider this main class :-
import java.util.HashSet;
import java.util.Set;
public class TestRishav {
public static void main(String[] args) {
Rishav rA = new Rishav("rishav");
Rishav rB = new Rishav("rishav");
System.out.println(rA.equals(rB));
System.out.println("-----------------------------------");
Set<Rishav> hashed = new HashSet<>();
hashed.add(rA);
System.out.println(hashed.contains(rB));
System.out.println("-----------------------------------");
hashed.add(rB);
System.out.println(hashed.size());
}
}
This will yield the following output :-
true
-----------------------------------
true
-----------------------------------
1
I am happy with the results. But if I had not overridden hashCode(), it would cause a nightmare: Rishav objects with the same member content would no longer be treated as equal by the HashSet, because the default hashCode() generally differs between instances. Here is what the output would be:
true
-----------------------------------
false
-----------------------------------
2
In the example below, if you comment out the override for equals or hashcode in the Person class, this code will fail to look up Tom's order. Using the default implementation of hashcode can cause failures in hashtable lookups.
What I have below is a simplified code that pulls up people's order by Person. Person is being used as a key in the hashtable.
public class Person {
String name;
int age;
String socialSecurityNumber;
public Person(String name, int age, String socialSecurityNumber) {
this.name = name;
this.age = age;
this.socialSecurityNumber = socialSecurityNumber;
}
@Override
public boolean equals(Object p) {
//Person is same if social security number is same
if ((p instanceof Person) && this.socialSecurityNumber.equals(((Person) p).socialSecurityNumber)) {
return true;
} else {
return false;
}
}
@Override
public int hashCode() { //I am using a hashing function in String.java instead of writing my own.
return socialSecurityNumber.hashCode();
}
}
public class Order {
String[] items;
public void insertOrder(String[] items)
{
this.items=items;
}
}
import java.util.Hashtable;
public class Main {
public static void main(String[] args) {
Person p1=new Person("Tom",32,"548-56-4412");
Person p2=new Person("Jerry",60,"456-74-4125");
Person p3=new Person("Sherry",38,"418-55-1235");
Order order1=new Order();
order1.insertOrder(new String[]{"mouse","car charger"});
Order order2=new Order();
order2.insertOrder(new String[]{"Multi vitamin"});
Order order3=new Order();
order3.insertOrder(new String[]{"handbag", "iPod"});
Hashtable<Person,Order> hashtable=new Hashtable<Person,Order>();
hashtable.put(p1,order1);
hashtable.put(p2,order2);
hashtable.put(p3,order3);
//The lookup below returns null (and the loop then throws a NullPointerException) if Person does not override hashCode()
Order tomOrder= hashtable.get(new Person("Tom", 32, "548-56-4412"));
for(String item:tomOrder.items)
{
System.out.println(item);
}
}
}
The hashCode() method is used to get an integer for a given object; this integer is not guaranteed to be unique. It is used for determining the bucket location when the object needs to be stored in a Hashtable- or HashMap-like data structure. By default, Object's hashCode() typically returns an integer derived from the object's identity (often described as its memory address, although that is an implementation detail).
The hashCode() method of objects is used when we insert them into a HashTable, HashMap or HashSet. More about HashTables on Wikipedia.org for reference.
To insert an entry into a map data structure, we need both a key and a value. If both key and value are user-defined data types, the hashCode() of the key determines where the entry is stored internally. When the object is later looked up from the map, the hash code of the key likewise determines where to search for it.
The hash code only points to a certain "area" (or list, bucket etc) internally. Since different key objects could potentially have the same hash code, the hash code itself is no guarantee that the right key is found. The HashTable then iterates this area (all keys with the same hash code) and uses the key's equals() method to find the right key. Once the right key is found, the object stored for that key is returned.
So, as we can see, a combination of the hashCode() and equals() methods are used when storing and when looking up objects in a HashTable.
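The lookup procedure described above can be sketched with a toy hash table. This is not how java.util.Hashtable or HashMap is actually written; TinyHashTable is a hypothetical class with a fixed number of buckets, no resizing and no support for null keys, meant only to show where hashCode() and equals() each come in.

```java
import java.util.ArrayList;
import java.util.List;

class TinyHashTable<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final List<List<Entry<K, V>>> buckets = new ArrayList<>();

    TinyHashTable(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Step 1: hashCode() only selects the bucket ("area") to look in.
    private List<Entry<K, V>> bucketFor(K key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    void put(K key, V value) {
        List<Entry<K, V>> bucket = bucketFor(key);
        for (Entry<K, V> e : bucket) {
            if (e.key.equals(key)) {   // Step 2: equals() finds the exact key
                e.value = value;       // existing key: replace the value
                return;
            }
        }
        bucket.add(new Entry<>(key, value));
    }

    V get(K key) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;                   // no equal key in the bucket
    }

    public static void main(String[] args) {
        TinyHashTable<String, Integer> table = new TinyHashTable<>(4);
        table.put("Aa", 1);
        table.put("BB", 2);            // same hash code as "Aa", so same bucket
        System.out.println(table.get("Aa"));
        System.out.println(table.get("BB"));
    }
}
```

"Aa" and "BB" are used because their String hash codes happen to be identical, so both entries land in one bucket and only equals() can tell them apart.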
NOTES:
Always use same attributes of an object to generate hashCode() and equals() both. As in our case, we have used employee id.
equals() must be consistent (if the objects are not modified, then it must keep returning the same value).
Whenever a.equals(b), then a.hashCode() must be same as b.hashCode().
If you override one, then you should override the other.
http://parameshk.blogspot.in/2014/10/examples-of-comparable-comporator.html
The String class and the wrapper classes implement equals() and hashCode() differently from the Object class. The equals() method of Object compares the references of the objects, not their contents. The hashCode() method of Object typically returns a distinct hash code for each object, regardless of whether the contents are the same.
This leads to problems when you use a Map and the key is of a persistent (entity) type or a StringBuffer/StringBuilder type. Since these types, unlike String, don't override equals() and hashCode(), equals() returns false when you compare two different objects even though both have the same contents. The HashMap then stores several keys with the same content, which defeats the purpose of a Map, because a Map is not supposed to hold duplicate keys.
Therefore you override equals() as well as hashCode() in your class and provide an implementation (an IDE can generate these methods) so that they work like String's equals() and hashCode() and prevent keys with the same content.
You have to override hashCode() along with equals() because hash-based collections only call equals() on objects whose hash codes already match.
Moreover, overriding hashCode() along with equals() keeps the equals()-hashCode() contract intact: "If two objects are equal, then they must have the same hash code."
When do you need to write custom implementation for hashCode()?
As we know, HashMap works internally on the principle of hashing. There are a number of buckets in which the entries get stored. You customize the hashCode() implementation according to your requirements so that objects you consider equal end up at the same index.
When you store values in a Map using the put(k, v) method, the internal implementation of put() is roughly:
put(k, v){
hash = hash(k);
index = hash & (n-1); // n is the number of buckets
}
That is, it generates an index, and the index is based on the hash code of the key object. So make hashCode() generate the hash code according to your requirements, because entries with the same hash code are stored in the same bucket or index.
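The index calculation can be written out as runnable Java. The sketch below mirrors what OpenJDK 8 and later do inside HashMap (XOR the high 16 bits of the hash code into the low ones, then mask with n - 1), but that is an implementation detail rather than documented API. BucketIndexDemo, spread and indexFor are illustrative names, and n is assumed to be a power of two.

```java
class BucketIndexDemo {
    // Mixes the high 16 bits of the hash code into the low 16 bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a key in a table of n buckets (n must be a power of two).
    static int indexFor(Object key, int n) {
        return spread(key.hashCode()) & (n - 1);
    }

    public static void main(String[] args) {
        // Keys with the same hash code always map to the same index.
        System.out.println(indexFor("Aa", 16));
        System.out.println(indexFor("BB", 16));
        // Integer's hash code is its own value, so 17 maps to 17 & 15 = 1.
        System.out.println(indexFor(17, 16));
    }
}
```

Masking with n - 1 is a cheap replacement for the modulo operation, which only works because n is a power of two.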
That's it!
IMHO, it is as the rule says: if two objects are equal then they should have the same hash, i.e., equal objects should produce equal hash values.
Given the above, the default equals() in Object is ==, which compares references, and the default hashCode() returns an integer derived from the object's identity (typically from its address), which is again usually distinct for distinct objects.
If you need to use custom objects in hash-based collections, you need to override both equals() and hashCode(). For example, if I want to maintain a HashSet of Employee objects and I don't use a strong enough hashCode and equals, I may end up treating two different Employee objects as the same one. This happens when equality and hashing are based on age; I should instead be using a unique value such as the employee ID.
To check for duplicate objects, we need a custom equals and hashCode.
Since a hash code is always a number, it is fast to locate an object by number rather than by comparing an alphabetic key. How does it work? Assume we create a new object by passing a value that is already held by some other object. The new object will return the same hash value as the other object because the value passed is the same. Given the same hash value, the collection goes to the same bucket every time, and if more than one object is present for that hash value it uses the equals() method to identify the correct object.
When you want to store and retrieve your custom object as a key in Map, then you should always override equals and hashCode in your custom Object .
Eg:
Person p1 = new Person("A",23);
Person p2 = new Person("A",23);
HashMap<Person, String> map = new HashMap<>();
map.put(p1,"value 1");
map.put(p2,"value 2");
Here, provided Person overrides equals and hashCode based on its fields, p1 and p2 are treated as the same key, so the map size is only 1 and the second put replaces "value 1" with "value 2".
public class Employee {
private int empId;
private String empName;
public Employee(int empId, String empName) {
super();
this.empId = empId;
this.empName = empName;
}
public int getEmpId() {
return empId;
}
public void setEmpId(int empId) {
this.empId = empId;
}
public String getEmpName() {
return empName;
}
public void setEmpName(String empName) {
this.empName = empName;
}
@Override
public String toString() {
return "Employee [empId=" + empId + ", empName=" + empName + "]";
}
@Override
public int hashCode() {
return empId + empName.hashCode();
}
@Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
// Check the argument, not 'this'; this also rejects null
if (!(obj instanceof Employee)) {
return false;
}
Employee emp = (Employee) obj;
return this.getEmpId() == emp.getEmpId() && this.getEmpName().equals(emp.getEmpName());
}
}
Test Class
public class Test {
public static void main(String[] args) {
Employee emp1 = new Employee(101,"Manash");
Employee emp2 = new Employee(101,"Manash");
Employee emp3 = new Employee(103,"Ranjan");
System.out.println(emp1.hashCode());
System.out.println(emp2.hashCode());
System.out.println(emp1.equals(emp2));
System.out.println(emp1.equals(emp3));
}
}
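In the Object class, equals(Object obj) compares references, which is why comparing two distinct objects with the default equals gives false. Once we override equals() (together with hashCode()), the comparison is based on content and gives the expected result: in the Test class, emp1 and emp2 have the same hash code, emp1.equals(emp2) is true and emp1.equals(emp3) is false.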
Both methods are defined in the Object class, in their simplest form. When you need different behavior from these methods, you have to override them in your class.
For example, the equals() method in Object only checks reference equality. If you need to compare state as well, you can override it, as is done in the String class.
There's no mention in this answer of testing the equals/hashcode contract.
I've found the EqualsVerifier library to be very useful and comprehensive. It is also very easy to use.
Also, building equals() and hashCode() methods from scratch involves a lot of boilerplate code. The Apache Commons Lang library provides the EqualsBuilder and HashCodeBuilder classes. These classes greatly simplify implementing equals() and hashCode() methods for complex classes.
As an aside, it's worth considering overriding the toString() method to aid debugging. Apache Commons Lang library provides the ToStringBuilder class to help with this.

Is there any chance for the hash codes of two different objects of being same? [duplicate]

In Java, obj.hashCode() returns some value. What is the use of this hash code in programming?
hashCode() is used for bucketing in Hash implementations like HashMap, HashTable, HashSet, etc.
The value received from hashCode() is used to derive the bucket number for storing elements of the set/map. This bucket number is the position of the element inside the internal table of the set/map.
When you do contains() it will take the hash code of the element, then look for the bucket where hash code points to. If more than 1 element is found in the same bucket (multiple objects can have the same hash code), then it uses the equals() method to evaluate if the objects are equal, and then decide if contains() is true or false, or decide if element could be added in the set or not.
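A short sketch of this behavior with two strings that are known to collide: by the formula documented for String.hashCode(), "Aa" and "BB" both hash to 2112. CollisionDemo is just an illustrative name.

```java
import java.util.HashSet;
import java.util.Set;

class CollisionDemo {
    // Builds a set from two distinct strings that share one hash code.
    static Set<String> build() {
        Set<String> set = new HashSet<>();
        set.add("Aa");
        set.add("BB");
        return set;
    }

    public static void main(String[] args) {
        Set<String> set = build();
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true: a collision
        System.out.println("Aa".equals("BB"));                  // false: different objects
        System.out.println(set.size());                         // 2: both are kept
        System.out.println(set.contains("BB"));                 // true: equals() picks the right one
    }
}
```

So two different objects can have the same hash code, and the set still treats them as separate elements because equals() has the final say.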
From the Javadoc:
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java programming language.)
hashCode() is a function that takes an object and outputs a numeric value. The hashcode for an object is always the same if the object doesn't change.
Classes like HashMap, HashTable, HashSet, etc. that need to store objects will use a hashCode modulo the size of their internal array to choose in what "memory position" (i.e. array position) to store the object.
There are some cases where collisions may occur (two objects end up with the same hashcode), and that, of course, needs to be solved carefully.
The value returned by hashCode() is the object's hash code, which is the object's memory address in hexadecimal.
By definition, if two objects are equal, their hash code must also be equal. If you override the equals() method, you change the way two objects are equated and Object's implementation of hashCode() is no longer valid. Therefore, if you override the equals() method, you must also override the hashCode() method as well.
This answer is from the java SE 8 official tutorial documentation
A hashcode is a number generated from any object.
This is what allows objects to be stored/retrieved quickly in a Hashtable.
Imagine the following simple example:
On the table in front of you, you have nine boxes, each marked with a number 1 to 9. You also have a pile of wildly different objects to store in these boxes, but once they are in there you need to be able to find them as quickly as possible.
What you need is a way of instantly deciding which box you have put each object in. It works like an index: you decide to find the cabbage, so you look up which box the cabbage is in, then go straight to that box to get it.
Now imagine that you don't want to bother with the index, you want to be able to find out immediately from the object which box it lives in.
In the example, let's use a really simple way of doing this - the number of letters in the name of the object. So the cabbage goes in box 7, the pea goes in box 3, the rocket in box 6, the banjo in box 5 and so on.
What about the rhinoceros, though? It has 10 characters, so we'll change our algorithm a little and "wrap around" so that 10-letter objects go in box 1, 11 letters in box 2 and so on. That should cover any object.
Sometimes a box will have more than one object in it, but if you are looking for a rocket, it's still much quicker to compare a peanut and a rocket, than to check a whole pile of cabbages, peas, banjos, and rhinoceroses.
That's a hash code. A way of getting a number from an object so it can be stored in a Hashtable. In Java, a hash code can be any integer, and each object type is responsible for generating its own. Lookup the "hashCode" method of Object.
Source - here
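The box analogy above translates almost directly into code. In this sketch, BoxDemo and boxFor are made-up names; the "hash code" is the number of letters in the name, wrapped around so that it always lands in one of the nine boxes.

```java
class BoxDemo {
    // Maps a name to a box numbered 1..9 using its length, wrapping around after 9.
    static int boxFor(String name) {
        return Math.floorMod(name.length() - 1, 9) + 1;
    }

    public static void main(String[] args) {
        System.out.println(boxFor("cabbage"));    // 7 letters
        System.out.println(boxFor("pea"));        // 3 letters
        System.out.println(boxFor("rocket"));     // 6 letters
        System.out.println(boxFor("peanut"));     // 6 letters: shares a box with the rocket
        System.out.println(boxFor("rhinoceros")); // 10 letters: wraps around to box 1
    }
}
```

The rocket and the peanut end up in the same box, which is the collision case from the analogy: you still have to compare the few objects in that box to find the right one.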
Although the hash code has nothing to do with your business logic, we have to take care of it in most cases, because when your object is put into a hash-based container (HashSet, HashMap...), the container uses the element's hash code to store and retrieve it.
hashCode() returns an integer code for an object. By default the JVM generates it per object, and it is usually, but not necessarily, distinct for distinct objects.
We use hashCode() in hashing-based data structures such as Hashtable, HashMap etc.
The advantage of hashCode() is that it makes searching easy: when we search for an object, its hash code narrows down where to look.
But we can't say hashCode() is the address of an object. It is a code generated by the JVM for each object, and it is not guaranteed to be unique.
That is why hashing is such a popular lookup technique nowadays.
One of the uses of hashCode() is building a caching mechanism.
Look at this example:
class Point
{
public int x, y;
public Point(int x, int y)
{
this.x = x;
this.y = y;
}
@Override
public boolean equals(Object o)
{
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Point point = (Point) o;
if (x != point.x) return false;
return y == point.y;
}
@Override
public int hashCode()
{
int result = x;
result = 31 * result + y;
return result;
}
}
class Line
{
public Point start, end;
public Line(Point start, Point end)
{
this.start = start;
this.end = end;
}
@Override
public boolean equals(Object o)
{
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Line line = (Line) o;
if (!start.equals(line.start)) return false;
return end.equals(line.end);
}
@Override
public int hashCode()
{
int result = start.hashCode();
result = 31 * result + end.hashCode();
return result;
}
}
class LineToPointAdapter implements Iterable<Point>
{
private static int count = 0;
private static Map<Integer, List<Point>> cache = new HashMap<>();
private int hash;
public LineToPointAdapter(Line line)
{
hash = line.hashCode();
if (cache.get(hash) != null) return; // we already have it
System.out.println(
String.format("%d: Generating points for line [%d,%d]-[%d,%d] (no caching)",
++count, line.start.x, line.start.y, line.end.x, line.end.y));
}

Reason behind JVM's default Object.HashCode() implementation

I am trying to understand why the JVM's default implementation does not return the same hashCode() value for all objects.
I have written a program where I have overridden equals() but not hashCode(), and the consequences are scary.
HashSet adds two objects even though they are equal according to equals().
TreeSet throws an exception with a Comparable implementation.
And many more.
Had the default hashCode() implementation in Object returned the same int value, all these issues could have been avoided.
I understand there's a lot written and discussed about hashCode() and equals(), but I am not able to understand why this can't be handled by default. It is error-prone, and the consequences can be really bad and scary.
Here's my sample program..
import java.util.HashSet;
import java.util.Set;
public class HashcodeTest {
public static void main(String...strings ) {
Car car1 = new Car("honda", "red");
Car car2 = new Car("honda", "red");
Set<Car> set = new HashSet<Car>();
set.add(car1);
set.add(car2);
System.out.println("size of Set : "+set.size());
System.out.println("hashCode for car1 : "+car1.hashCode());
System.out.println("hashCode for car2 : "+car2.hashCode());
}
}
class Car{
private String name;
private String color;
public Car(String name, String color) {
super();
this.name = name;
this.color = color;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getColor() {
return color;
}
public void setColor(String color) {
this.color = color;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Car other = (Car) obj;
if (color == null) {
if (other.color != null)
return false;
} else if (!color.equals(other.color))
return false;
if (name == null) {
if (other.name != null)
return false;
} else if (!name.equals(other.name))
return false;
return true;
}
}
Output:
size of Set : 2
hashCode for car1 : 330932989
hashCode for car2 : 8100393
It seems that you want to propose to calculate hashCode by default just by taking all the object fields and combining their hashCodes using some formula. Such approach is wrong and may lead to many unpleasant circumstances. In your case it would work, because your object is very simple. But real life objects are much more complex. A few examples:
Objects are connected into double-linked list (every object has previous and next fields). How default implementation would calculate the hashCode? If it should check the fields, it will end up with infinite recursion.
Ok, suppose that we can detect infinite recursion. Let's just have single-linked list. In this case the hashCode of every node should be calculated from all the successor nodes? What if this list contains millions of nodes? All of them should be checked to generate the hashCode?
Suppose you have two HashSet objects. First is created like:
HashSet<Integer> a = new HashSet<>();
a.add(1);
The second is created like this:
HashSet<Integer> b = new HashSet<>();
for(int i=1; i<1000; i++) b.add(i);
for(int i=2; i<1000; i++) b.remove(i);
From user's point of view both contain only one element. But programmatically the second one holds big hash-table inside (like array of 2048 entries of which only one is not null), because when you added many elements, the hash-table was resized. In contrast, the first one holds small hash-table inside (e.g. 16 elements). So programmatically objects are very different: one has big array, other has small array. But they are equal and have the same hashCode, thanks to custom implementation of hashCode and equals.
Suppose you have different List implementations. For example, ArrayList and LinkedList. Both contain the same elements and from the user's point of view they are equal and should have the same hashCode. And they are indeed equal and have the same hashCode. However their internal structure is completely different: ArrayList contains an array while LinkedList contains pointers to the objects representing head and tail. So you cannot just generate the hashCode based on their fields: it surely will be different.
Some object may contain the field which is lazily initialized (initialized to null and calculated from other fields only when necessary). What if you have two otherwise equal objects and one has its lazy field initialized while other is not? We should exclude this lazy field from hashCode calculation.
So, there are many cases when universal hashCode approach would not work and may even produce problems (like making your program crash with StackOverflowError or stuck enumerating all the linked objects). Due to this the simplest implementation was selected which is based on object identity. Note that the contract of hashCode and equals requires them to be consistent, and it's fulfilled by default implementation. If you redefine equals, you just must redefine hashCode as well.
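Two of the cases above are easy to check with the standard collections. This is a small sketch (LogicalEqualityDemo is an illustrative name) showing that collections with very different internals still compare equal and share a hash code, because their equals and hashCode are defined on the elements rather than on the internal fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

class LogicalEqualityDemo {
    // ArrayList and LinkedList with the same elements: equal, same hash code.
    static boolean listsAgree() {
        List<Integer> array = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> linked = new LinkedList<>(Arrays.asList(1, 2, 3));
        return array.equals(linked) && array.hashCode() == linked.hashCode();
    }

    // A fresh set versus one that grew and shrank: equal, same hash code.
    static boolean setsAgree() {
        Set<Integer> a = new HashSet<>();
        a.add(1);
        Set<Integer> b = new HashSet<>();
        for (int i = 1; i < 1000; i++) b.add(i);
        for (int i = 2; i < 1000; i++) b.remove(i);
        return a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(listsAgree()); // true
        System.out.println(setsAgree());  // true
    }
}
```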
You broke the contract.
hashCode and equals should be written in such a way that when equals returns true, the objects have the same hash code.
If you override equals, then you must provide a hashCode that works properly.
The default implementation can't handle this, because it doesn't know which fields are important. An automatic implementation would not do it efficiently either: the purpose of the hashCode function is to speed up operations like lookups in data structures, and if it is implemented improperly, performance will suffer.
From the Docs
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java programming language.)
From documentation:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
So if you override how equals() behaves, you must override hashCode() as well.
Also, from docs of equals() -
Note that it is generally necessary to override the hashCode
method whenever this method is overridden, so as to maintain the
general contract for the hashCode method, which states
that equal objects must have equal hash codes.
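Applied to the Car class from the question, the fix is to add a hashCode() built from the same fields that equals() compares. The sketch below is a trimmed-down version of that class (getters and setters left out, fields made final) and uses java.util.Objects for the null-safe comparisons; CarFixDemo is just a wrapper name for the example.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class CarFixDemo {
    static final class Car {
        private final String name;
        private final String color;

        Car(String name, String color) {
            this.name = name;
            this.color = color;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (obj == null || getClass() != obj.getClass()) return false;
            Car other = (Car) obj;
            // Objects.equals handles null fields, like the longer version in the question.
            return Objects.equals(name, other.name) && Objects.equals(color, other.color);
        }

        @Override
        public int hashCode() {
            // Same fields as equals(), so equal cars get equal hash codes.
            return Objects.hash(name, color);
        }
    }

    // Adds two equal cars to a HashSet and reports how many it keeps.
    static int distinctCars() {
        Set<Car> set = new HashSet<>();
        set.add(new Car("honda", "red"));
        set.add(new Car("honda", "red"));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println("size of Set : " + distinctCars());
    }
}
```

With hashCode() in place, the set keeps a single element instead of two.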
From javadoc of Object class:
Returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by HashMap.
Thus if default implementation provides the same hash, it defeats the purpose.
And a default implementation cannot assume that every class is a value class, hence the last sentence from the doc:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects.

Java, Date, Array, hashcode() [duplicate]

In Java, obj.hashCode() returns some value. What is the use of this hash code in programming?
hashCode() is used for bucketing in Hash implementations like HashMap, HashTable, HashSet, etc.
The value received from hashCode() is used as the bucket number for storing elements of the set/map. This bucket number is the address of the element inside the set/map.
When you do contains() it will take the hash code of the element, then look for the bucket where hash code points to. If more than 1 element is found in the same bucket (multiple objects can have the same hash code), then it uses the equals() method to evaluate if the objects are equal, and then decide if contains() is true or false, or decide if element could be added in the set or not.
From the Javadoc:
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java programming language.)
hashCode() is a function that takes an object and outputs a numeric value. The hashcode for an object is always the same if the object doesn't change.
Functions like HashMap, HashTable, HashSet, etc. that need to store objects will use a hashCode modulo the size of their internal array to choose in what "memory position" (i.e. array position) to store the object.
There are some cases where collisions may occur (two objects end up with the same hashcode), and that, of course, needs to be solved carefully.
The value returned by hashCode() is the object's hash code, which is the object's memory address in hexadecimal.
By definition, if two objects are equal, their hash code must also be equal. If you override the equals() method, you change the way two objects are equated and Object's implementation of hashCode() is no longer valid. Therefore, if you override the equals() method, you must also override the hashCode() method as well.
This answer is from the java SE 8 official tutorial documentation
A hashcode is a number generated from any object.
This is what allows objects to be stored/retrieved quickly in a Hashtable.
Imagine the following simple example:
On the table in front of you. you have nine boxes, each marked with a number 1 to 9. You also have a pile of wildly different objects to store in these boxes, but once they are in there you need to be able to find them as quickly as possible.
What you need is a way of instantly deciding which box you have put each object in. It works like an index. you decide to find the cabbage so you look up which box the cabbage is in, then go straight to that box to get it.
Now imagine that you don't want to bother with the index, you want to be able to find out immediately from the object which box it lives in.
In the example, let's use a really simple way of doing this - the number of letters in the name of the object. So the cabbage goes in box 7, the pea goes in box 3, the rocket in box 6, the banjo in box 5 and so on.
What about the rhinoceros, though? It has 10 characters, so we'll change our algorithm a little and "wrap around" so that 10-letter objects go in box 1, 11 letters in box 2 and so on. That should cover any object.
Sometimes a box will have more than one object in it, but if you are looking for a rocket, it's still much quicker to compare a peanut and a rocket, than to check a whole pile of cabbages, peas, banjos, and rhinoceroses.
That's a hash code. A way of getting a number from an object so it can be stored in a Hashtable. In Java, a hash code can be any integer, and each object type is responsible for generating its own. Lookup the "hashCode" method of Object.
Source - here
Although hashcode does nothing with your business logic, we have to take care of it in most cases. Because when your object is put into a hash based container(HashSet, HashMap...), the container puts/gets the element's hashcode.
hashCode() returns an int hash code for an object. Unless a class overrides it, the JVM supplies a default value for each object.
We use hashCode() in hashing-based data structures such as Hashtable, HashMap, etc.
The advantage of hashCode() is that it makes searching fast: the hash code tells the collection which bucket to look in, so only a few candidates have to be compared with equals().
But we can't say hashCode() is the address of an object, and it is not guaranteed to be unique: two different objects may return the same hash code.
That is why hash-based lookup is one of the most widely used search techniques today.
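Note that a hash code is not guaranteed to be unique. A small sketch with two well-known colliding strings shows this; the class name HashCollisionDemo and the helper distinctCount are made up for the example, while the hash values follow from the formula documented for String.hashCode().

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class HashCollisionDemo {
    // Number of distinct strings according to a HashSet.
    static int distinctCount(String... values) {
        Set<String> set = new HashSet<>(Arrays.asList(values));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode());           // 2112
        System.out.println("BB".hashCode());           // 2112
        System.out.println("Aa".equals("BB"));         // false
        // Same hash code, but equals() tells them apart, so both are kept.
        System.out.println(distinctCount("Aa", "BB")); // 2
    }
}
```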
One of the uses of hashCode() is building a caching mechanism.
Look at this example:
class Point
{
public int x, y;
public Point(int x, int y)
{
this.x = x;
this.y = y;
}
@Override
public boolean equals(Object o)
{
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Point point = (Point) o;
if (x != point.x) return false;
return y == point.y;
}
@Override
public int hashCode()
{
int result = x;
result = 31 * result + y;
return result;
}
}
class Line
{
public Point start, end;
public Line(Point start, Point end)
{
this.start = start;
this.end = end;
}
@Override
public boolean equals(Object o)
{
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Line line = (Line) o;
if (!start.equals(line.start)) return false;
return end.equals(line.end);
}
@Override
public int hashCode()
{
int result = start.hashCode();
result = 31 * result + end.hashCode();
return result;
}
}
class LineToPointAdapter implements Iterable<Point>
{
private static int count = 0;
private static Map<Integer, List<Point>> cache = new HashMap<>();
private int hash;
public LineToPointAdapter(Line line)
{
hash = line.hashCode();
if (cache.get(hash) != null) return; // we already have it
System.out.println(
String.format("%d: Generating points for line [%d,%d]-[%d,%d] (no caching)",
++count, line.start.x, line.start.y, line.end.x, line.end.y));
}
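The snippet above stops before the cache is filled, so here is a separate, self-contained sketch of the same caching idea. It is not the original author's code: the names SegmentCacheSketch, Pt, Segment and pointsFor are hypothetical, and the "expensive" work is only a placeholder that collects the two endpoints. One deliberate difference is that the cache is keyed by the object itself rather than by its int hash code, so that two different segments that happen to share a hash code cannot end up sharing a cache entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SegmentCacheSketch {
    static final class Pt {
        final int x, y;
        Pt(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Pt)) return false;
            Pt p = (Pt) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() { return 31 * x + y; }
    }

    static final class Segment {
        final Pt start, end;
        Segment(Pt start, Pt end) { this.start = start; this.end = end; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Segment)) return false;
            Segment s = (Segment) o;
            return start.equals(s.start) && end.equals(s.end);
        }

        @Override
        public int hashCode() { return 31 * start.hashCode() + end.hashCode(); }
    }

    static int generated = 0; // how many times the "expensive" work actually ran
    private static final Map<Segment, List<Pt>> CACHE = new HashMap<>();

    // Placeholder for expensive work: here it only collects the two endpoints.
    static List<Pt> pointsFor(Segment segment) {
        return CACHE.computeIfAbsent(segment, key -> {
            generated++;
            List<Pt> points = new ArrayList<>();
            points.add(key.start);
            points.add(key.end);
            return points;
        });
    }

    public static void main(String[] args) {
        pointsFor(new Segment(new Pt(0, 0), new Pt(3, 0)));
        pointsFor(new Segment(new Pt(0, 0), new Pt(3, 0))); // equal key: served from the cache
        System.out.println(generated); // 1
    }
}
```

The second call finds the entry because HashMap first uses hashCode() to pick the bucket and then equals() to confirm the key, which is why both methods have to agree.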

data members to consider while overriding hashcode and equals

I know that, per the contract, we need to override hashCode() whenever equals() is overridden.
Why should I use the same fields for computing the hash code as for the equals comparison?
Is it to improve performance, by avoiding too many objects mapping to the same bucket, as in the case below?
That is, all objects created on the same "date" would map to the same bucket, and the linear comparison with equals() to check whether an object exists would take time?
If the above is true, what other potential issues does the code below have, apart from performance? Is performance the only reason to compute the hash code from the same fields/members that equals uses? Please share. Thanks.
class MyClass {
int date;
int pay;
int id;
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false; // null and same-class check
MyClass obj = (MyClass) o;
return (date == obj.date && pay == obj.pay && id == obj.id);
}
public int hashCode() {
int hash = 7;
return (31 * hash + date);
}
}
(Please pardon any syntax errors; I typed this without using an IDE.)
My intention is to use all fields in equals(). I want to know why the same fields should be used in hashCode(), and what happens if only a few of them are used.
Clarification:
When only "date" is used to compute the hash code, the lookup still goes to the right bucket (do you agree?). The collection then iterates over the items in that bucket and uses equals() to check whether a particular object exists, and my definition of equals() is "all fields must be the same". Given this, I believe my code works correctly and the only problem is performance. Please point out where I am wrong. Thank you.
For your example, I suggest you use just id for equality, and that you annotate the methods as overrides. Also, I like to override toString():
@Override
public boolean equals(Object o) {
if (o instanceof MyClass) {
return (id == ((MyClass) o).id);
}
return false;
}
@Override
public int hashCode() {
return id;
}
@Override
public String toString() {
return String.format("MyClass (id=%d, date=%d, pay=%d)", id, date, pay);
}
That way you can update the date and/or the pay while the object is stored in a HashSet or HashMap, without having to remove and re-add it. Also, id is what appears to be unique about instances.
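The point about mutable fields can be illustrated with a small sketch. The classes ById and ByAllFields are hypothetical, simplified versions of MyClass (the date field is left out), one hashing on id only and one hashing on every field.

```java
import java.util.HashSet;
import java.util.Set;

class MutableHashDemo {
    // Identity is the id alone; pay can change freely.
    static final class ById {
        final int id;
        int pay;
        ById(int id, int pay) { this.id = id; this.pay = pay; }

        @Override
        public boolean equals(Object o) {
            return o instanceof ById && id == ((ById) o).id;
        }

        @Override
        public int hashCode() { return id; }
    }

    // Every field takes part in equals() and hashCode(), including the mutable pay.
    static final class ByAllFields {
        final int id;
        int pay;
        ByAllFields(int id, int pay) { this.id = id; this.pay = pay; }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ByAllFields)) return false;
            ByAllFields other = (ByAllFields) o;
            return id == other.id && pay == other.pay;
        }

        @Override
        public int hashCode() { return 31 * id + pay; }
    }

    public static void main(String[] args) {
        ById a = new ById(1, 1000);
        Set<ById> byId = new HashSet<>();
        byId.add(a);
        a.pay = 2000; // hash code is unaffected
        System.out.println(byId.contains(a)); // true

        ByAllFields b = new ByAllFields(1, 1000);
        Set<ByAllFields> byAll = new HashSet<>();
        byAll.add(b);
        b.pay = 2000; // hash code changes while b sits in the set
        System.out.println(byAll.contains(b)); // false
    }
}
```

In the second case the object is still physically in the set, but it was filed under its old hash code, so a lookup with the new one misses it. Hashing on a field that never changes avoids this.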
I found the answer in Effective Java by Joshua Bloch, 2nd edition, page 49: "Do not be tempted to exclude significant parts of an object from the hash code computation to improve performance". The resulting hash function may be faster to compute, but its poor quality may degrade hash tables' performance.
So my guess was right: many unequal objects will get the same hash code and map to the same bucket.
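A quick sketch that checks this reasoning, assuming a hypothetical class DateHashed shaped like MyClass from the question (equals() on all three fields, hashCode() on date only). The contract still holds, so the set gives correct answers; the cost is that every element created on the same date shares a single hash code.

```java
import java.util.HashSet;
import java.util.Set;

class SubsetHashDemo {
    static final class DateHashed {
        final int date, pay, id;
        DateHashed(int date, int pay, int id) { this.date = date; this.pay = pay; this.id = id; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            DateHashed other = (DateHashed) o;
            return date == other.date && pay == other.pay && id == other.id;
        }

        @Override
        public int hashCode() { return 31 * 7 + date; } // only date, as in the question
    }

    // Builds a set of objects that differ only by id.
    static Set<DateHashed> sameDate(int count) {
        Set<DateHashed> set = new HashSet<>();
        for (int id = 0; id < count; id++) {
            set.add(new DateHashed(20240101, 100, id));
        }
        return set;
    }

    public static void main(String[] args) {
        Set<DateHashed> set = sameDate(1000);
        System.out.println(set.size());                                       // 1000
        System.out.println(set.contains(new DateHashed(20240101, 100, 42)));   // true
        System.out.println(set.contains(new DateHashed(20240101, 100, 5000))); // false

        Set<Integer> hashes = new HashSet<>();
        for (DateHashed d : set) hashes.add(d.hashCode());
        System.out.println(hashes.size()); // 1: every element collides
    }
}
```

The opposite mistake, using a field in hashCode() that equals() ignores, is the one that breaks correctness, because two equal objects could then produce different hash codes.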
Additional information:
http://www.javaranch.com/journal/2002/10/equalhash.html
Since the class members/variables num and data do participate in the
equals method comparison, they should also be involved in the
calculation of the hash code. Though, this is not mandatory. You can
use subset of the variables that participate in the equals method
comparison to improve performance of the hashCode method. Performance
of the hashCode method indeed is very important.
