java LinkedHashSet - java

I've been studying for OCJP (former SCJP) and I came across the following example which uses LinkedHashSet:
public class Test{
int size;
public Test(int s){
this.size = s;
}
@Override
public boolean equals(Object obj) {
return (this.size == ((Test)obj).size);
}
public static void main(String[] args) {
LinkedHashSet<Test> s = new LinkedHashSet<Test>();
s.add(new Test(1));
s.add(new Test(2));
s.add(new Test(1));
System.out.println(s.size());
}
}
Now, the question is what is displayed if :
1) implementation stays as is
2) override of hashCode is inserted in the class Test as follows:
public int hashCode() { return size / 5; }
Compiling and running the code shows that the size of the set is 3 in the first case and 2 in the second.
Why?
In case 1, although the equals method is overridden, it is never invoked. Does that mean that the add() method does not check for object equality if the hashCode method is not overridden?
In case 2, hashCode with the given implementation and the given set of Test objects always returns the same number. How is that different from the default hashCode implementation, and why does it cause equals to be invoked?

If you don't override hashCode(), each instance gets the hash code computed by the default implementation in the Object class. Distinct instances will therefore most likely have different hash codes (although this is not guaranteed), which means that each instance typically ends up in its own bucket.
Now, even if your overridden equals() method makes two instances equal based on some attribute, their hash codes are still different.
A hash-based set never treats two instances with different hash codes as equal: it does not even call equals() on them. So the set contains no duplicates as far as it can tell, and its size is 3.
But when you override hashCode() with the following implementation:
public int hashCode() { return size / 5; }
It returns the same value for instances with the same size (and, because of integer division, for every size from 0 to 4). Instances with the same size therefore have the same hash code, land in the same bucket, and are compared with equals(), which reports them as equal. The second Test(1) is treated as a duplicate and is not added, so Set.size() is 2.
Moral: always override hashCode() whenever you override equals(), to maintain the general contract between the two methods.
General contract between hashCode() and equals():
When two objects are equal, their hash codes must be equal.
When two objects are not equal, their hash codes can still be equal.
hashCode() must return the same value for the same object as long as the fields used in equals() do not change.
If the hash codes of two objects differ, the objects cannot be equal.
Calculate hashCode() from the same attributes that equals() compares (or from a subset of them), never from attributes that equals() ignores.
Strongly suggested to read at least once: -
Effective Java -
Item#9: Always override hashCode when you override equals

Hashing structures rely on a hashing algorithm, which in Java is represented by hashCode(). When you put something into a HashMap (or a LinkedHashSet in your case), the collection invokes hashCode() on the object being inserted. When it is not overridden, the default hashCode() from the Object class is used. It is identity-based, so distinct instances almost always get different hash codes and are never compared with each other, even if equals() would report them as equal.
When you override hashCode() the way it is shown in your example, all of the objects in your example get into the same bucket and are then (when added one after another) compared with equals(). That's why you get a size of 3 in the first case (where equals() is never called) and 2 in the second.
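The two cases can be reproduced side by side. The following is only a sketch: the class names EqualsOnly and EqualsAndHashCode and the equalsCalls counter are made up for illustration and are not part of the exam question.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class EqualsWithoutHashCodeDemo {

    /** Case 1: overrides equals() only, like the class in the question. */
    static class EqualsOnly {
        final int size;

        EqualsOnly(int size) { this.size = size; }

        @Override
        public boolean equals(Object obj) {
            return obj instanceof EqualsOnly && ((EqualsOnly) obj).size == size;
        }
        // hashCode() is deliberately NOT overridden here.
    }

    /** Case 2: overrides both equals() and hashCode(). */
    static class EqualsAndHashCode {
        static int equalsCalls = 0; // hypothetical counter: how often equals() ran
        final int size;

        EqualsAndHashCode(int size) { this.size = size; }

        @Override
        public boolean equals(Object obj) {
            equalsCalls++;
            return obj instanceof EqualsAndHashCode
                    && ((EqualsAndHashCode) obj).size == size;
        }

        @Override
        public int hashCode() {
            return size / 5; // integer division: 1 / 5 and 2 / 5 are both 0
        }
    }

    static int sizeWithEqualsOnly() {
        Set<EqualsOnly> s = new LinkedHashSet<>();
        s.add(new EqualsOnly(1));
        s.add(new EqualsOnly(2));
        s.add(new EqualsOnly(1));
        return s.size();
    }

    static int sizeWithBoth() {
        Set<EqualsAndHashCode> s = new LinkedHashSet<>();
        s.add(new EqualsAndHashCode(1));
        s.add(new EqualsAndHashCode(2));
        s.add(new EqualsAndHashCode(1)); // same hash, equals() says duplicate
        return s.size();
    }

    public static void main(String[] args) {
        // Almost always 3: identity hash codes of distinct objects rarely coincide.
        System.out.println(sizeWithEqualsOnly());
        // 2: all three objects hash to 0, so equals() decides.
        System.out.println(sizeWithBoth());
        // true: equals() was consulted once the hash codes matched.
        System.out.println(EqualsAndHashCode.equalsCalls > 0);
    }
}
```

The first result is "almost always" rather than "always" 3 because the default hash codes are not guaranteed to be distinct.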

Related

hashCode() method [duplicate]

I need some help in better understanding the hashCode() method in a theoretical way. I've read (emphasis mine):
When hashCode() is called on two separate objects (which are equal according to the equals() method) it returns the same hash code value. However, if it is called on two unequal objects, it will not necessarily return different integer values.
Where can said exceptions occur?
Suppose you have a class with two String fields, and that its hashcode is calculated by summing the hashcodes of those two fields. Further suppose you have an equals that simply checks whether the class fields values are equal.
class Test {
String a;
String b;
public Test(String a, String b) {
this.a = a;
this.b = b;
}
@Override
public int hashCode() {
return a.hashCode() + b.hashCode();
}
@Override
public boolean equals(Object o) { // simplified
Test other = (Test)o;
return a.equals(other.a) && b.equals(other.b);
}
}
Let's see if non-equal instances can have the same hashcode
Test t1 = new Test("hello", "world");
Test t2 = new Test("world", "hello");
System.out.println(t1.equals(t2)); // false
System.out.println(t1.hashCode() == t2.hashCode()); // true
Are we still respecting hashCode's contract?
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
Well, yes, since it only depends on a and b and we're using their hashCode method which we can assume respects the contract.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It does:
Test t1 = new Test("hello", "world");
Test t2 = new Test("hello", "world");
System.out.println(t1.equals(t2)); // true
System.out.println(t1.hashCode() == t2.hashCode()); // true
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
That's what we were trying to demonstrate in the first place. It's not a requirement.
public int hashCode() {
return 27;
}
Believe it or not, this is a valid implementation of hashCode, although not an efficient one, since it respects the contract with the equals method. With this implementation every object has the same hash code, which is exactly the case you describe.
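To see that a constant hash code is still correct, here is a small sketch. The Point class is hypothetical and only serves as an example; the set behaves correctly, it just has to fall back on equals() for every comparison.

```java
import java.util.HashSet;
import java.util.Set;

class ConstantHashDemo {

    /** Hypothetical value class whose hashCode() is a constant. */
    static class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof Point)) {
                return false;
            }
            Point other = (Point) obj;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            return 27; // legal, but every element collides with every other one
        }
    }

    static Set<Point> build() {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        set.add(new Point(3, 4));
        set.add(new Point(1, 2)); // rejected: equals() finds the duplicate
        return set;
    }

    public static void main(String[] args) {
        Set<Point> set = build();
        System.out.println(set.size());                    // 2
        System.out.println(set.contains(new Point(3, 4))); // true
        System.out.println(set.contains(new Point(5, 6))); // false
    }
}
```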
The hashCode is used to limit the number of cases to compare to.
For instance, if in a high school, you are looking for a student. You only know the name, gender and age of this student. Are you going to look through all the students, or only the ones with that age and gender?
The hashCode does the same, for certain data structures. When looking for an item, it will first make a sub-list/collection of the items with an identical hashCode, and then, it searches for the exact item within that sub-list/collection.
The more specific your hash code, the more efficient the search.
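The "narrow down by hash code, then search the sub-list" idea can be sketched as a toy set. This is a deliberately simplified illustration and not how java.util.HashSet is actually written; TinyHashSet and bucketFor are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

class TinyHashSetDemo {

    /** A toy hash set: a fixed number of buckets, each a plain list. */
    static class TinyHashSet<E> {
        private final List<List<E>> buckets = new ArrayList<>();

        TinyHashSet(int bucketCount) {
            for (int i = 0; i < bucketCount; i++) {
                buckets.add(new ArrayList<>());
            }
        }

        /** Step 1: the hash code picks the sub-list to look at. */
        private List<E> bucketFor(Object o) {
            // floorMod keeps the index non-negative for negative hash codes.
            return buckets.get(Math.floorMod(o.hashCode(), buckets.size()));
        }

        /** Step 2: equals() (used by List.contains) searches that sub-list. */
        boolean contains(Object o) {
            return bucketFor(o).contains(o);
        }

        boolean add(E e) {
            List<E> bucket = bucketFor(e);
            if (bucket.contains(e)) {
                return false; // already present, sets hold no duplicates
            }
            bucket.add(e);
            return true;
        }
    }

    public static void main(String[] args) {
        TinyHashSet<String> set = new TinyHashSet<>(16);
        System.out.println(set.add("alice"));      // true
        System.out.println(set.add("alice"));      // false
        System.out.println(set.contains("alice")); // true
        System.out.println(set.contains("bob"));   // false
    }
}
```

The better the hash codes spread elements over the buckets, the shorter each sub-list is and the fewer equals() calls a lookup needs.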

Should hashCode() only use the subset of immutable fields of those used in equals()?

Situation
I needed to override equals() and, as is recommended, I also overrode the hashCode() method using the same fields. Then, when I was looking at a set that contained only the one object, I got the frustrating result of
set.contains(object)
=> false
while
set.stream().findFirst().get().equals(object)
=> true
I understand now, that this is due to changes that were made to object after it was added to set which again changed its hashCode. contains then looks at the wrong key and can't find the object.
My requirements for the implementation are
mutable fields are needed to correctly implement equals()
use these objects safely in hash-based Collections or Maps such as HashSet, even if they are prone to changes.
which conflicts with the convention that
equals() and hashCode() should use the same fields in order to avoid surprises (as argued here: https://stackoverflow.com/a/22827702).
Question
Are there any dangers to using only a subset of fields which are used in equals() to calculate hashCode() instead of using all?
More specifically this would mean: equals() uses a number of fields of the object whereas hashCode() only uses those fields that are used in equals() and that are immutable.
I think this should be okay, because
the contract is fulfilled: equal objects will produce the same hashCode, while the same hashCode does not necessarily mean that the objects are the same.
The hashCode of an object stays the same, even if an object is exposed to changes and therefore will be found in a HashSet before and after those changes.
Related posts that helped me understand my problem but not how to solve it: What issues should be considered when overriding equals and hashCode in Java? and Different fields for equals and hashcode
It's ok for hashCode() to use a subset of the fields that equals() uses, although it may possibly give you a slight performance drop.
Your problem seems to be caused by modifying the object, while still inside the set, in a way that alters the functioning of hashCode() and/or equals(). Whenever you add an object to a HashSet (or as the key in a HashMap), you must not subsequently modify any fields of that object that are used by equals() and/or hashCode(). Ideally, all fields used by equals() should be final. If they can't be, you must treat them as though they are final whilst the object is in the set.
The same goes for TreeSet/TreeMap, too, but applies to fields used by compareTo().
If you really need to modify the fields that are used by equals() (or by compareTo() in the case of a TreeSet/TreeMap), you must:
First, remove that object from the set;
Then modify the object;
And finally add it back to the set.
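The remove, modify, re-add sequence above can be sketched as follows. Item is a hypothetical class whose only field is used by both equals() and hashCode(); the "false" result is what OpenJDK's HashSet gives, since the element stays filed under its old hash code.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class MutateInSetDemo {

    /** Hypothetical element with a mutable field used in equals()/hashCode(). */
    static class Item {
        int id;

        Item(int id) { this.id = id; }

        @Override
        public boolean equals(Object obj) {
            return obj instanceof Item && ((Item) obj).id == id;
        }

        @Override
        public int hashCode() {
            return Objects.hash(id);
        }
    }

    /** Wrong: the object is changed while it is still inside the set. */
    static boolean containsAfterInPlaceChange() {
        Set<Item> set = new HashSet<>();
        Item item = new Item(1);
        set.add(item);
        item.id = 2;               // the set still files it under the old hash
        return set.contains(item); // looked up under the new hash
    }

    /** Right: remove first, then modify, then add back. */
    static boolean containsAfterRemoveModifyAdd() {
        Set<Item> set = new HashSet<>();
        Item item = new Item(1);
        set.add(item);
        set.remove(item);
        item.id = 2;
        set.add(item);
        return set.contains(item);
    }

    public static void main(String[] args) {
        System.out.println(containsAfterInPlaceChange());   // false
        System.out.println(containsAfterRemoveModifyAdd()); // true
    }
}
```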
The contract would indeed be fulfilled. The contract requires that equal objects ALWAYS have the same hashCode(). The opposite doesn't have to be true, and I wonder about the obsession of some people and IDEs with enforcing exactly that. If unequal objects could always be given different hash codes, you would have discovered the perfect hash function.
BTW, IntelliJ offers a nice wizard when generating hashCode and equals by treating those two methods separately and allowing to differentiate your selection. Obviously, the opposite, aka offering more fields in the hashCode() and less fields in the equals() would violate the contract.
For HashSet and similar collections/maps, it's a valid solution to have hashCode() use only a subset of the fields from the equals() method. Of course, you have to think about how useful the hash code is to reduce collisions in the map.
But be aware that the problem comes back if you want to use ordered collections like TreeSet. Then you need a comparator that never gives collisions (returns zero) for "different" objects, meaning that the set can only contain one of the colliding elements. Your equals() description implies that multiple objects will exist that differ only in the mutable fields, and then you lose:
Including the mutable fields in the compareTo() method can change the comparison sign, so that the object needs to move to a different branch in the tree.
Excluding the mutable fields in the compareTo() method limits you to have maximum one colliding element in the TreeSet.
So I'd strongly recommend that you think again about your class's concept of equality and mutability.
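The second TreeSet pitfall (a comparison that ignores the mutable fields) is easy to demonstrate. This is a sketch with a hypothetical Account class, where the comparator looks only at the immutable owner field.

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

class TreeSetCollisionDemo {

    /** Hypothetical class: owner is immutable, balance is mutable. */
    static class Account {
        final String owner;
        int balance;

        Account(String owner, int balance) {
            this.owner = owner;
            this.balance = balance;
        }
    }

    static int sizeWhenComparingOwnerOnly() {
        // The comparator ignores the mutable field entirely.
        Set<Account> set = new TreeSet<>(Comparator.comparing((Account a) -> a.owner));
        set.add(new Account("bob", 100));
        set.add(new Account("bob", 200)); // compares as 0, so it is not added
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeWhenComparingOwnerOnly()); // 1
    }
}
```

A TreeSet never calls equals(); a comparison result of 0 is its only notion of "duplicate".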
That's perfectly valid to me. Suppose you have a Person:
final String name; // used in hashCode
int income; // name + income used in equals
name decides where the entry will go (think HashMap) or which bucket will be chosen.
You put a Person as a key inside a HashMap: according to its hash code it goes to some bucket, the second one for example. You then search the map with a Person that has the same name but a different income. According to the hash code it must be in the second bucket, but according to equals it's not there:
static class Person {
private final String name;
private int income;
public Person(String name) {
super();
this.name = name;
}
public int getIncome() {
return income;
}
public void setIncome(int income) {
this.income = income;
}
public String getName() {
return name;
}
@Override
public int hashCode() {
return name.hashCode();
}
@Override
public boolean equals(Object other) {
Person right = (Person) other;
return getIncome() == right.getIncome() && getName().equals(right.getName());
}
}
And a test:
HashSet<Person> set = new HashSet<>();
Person bob = new Person("bob");
bob.setIncome(100);
set.add(bob);
Person sameBob = new Person("bob");
sameBob.setIncome(200);
System.out.println(set.contains(sameBob)); // false
What you are missing, I think, is that hashCode decides which bucket an entry goes into (there could be many entries in that bucket), and that's only the first step; equals then decides whether an entry in that bucket is actually equal.
The example that you provide is perfectly legal; but the one you linked is the other way around: it uses more fields in hashCode than in equals, which makes it incorrect.
Once you understand that hashCode is first used to find where an entry might reside, and only then are the entries in that bucket compared via equals, your example makes sense.

Does it matter if two hashCodes are equal, even if the two objects aren't from the same type?

Let's say I have two types A and B that both have a unique id field, here is how I usually implement the equals() and hashCode() methods:
@Override
public boolean equals(Object obj) {
return obj instanceof ThisType && obj.hashCode() == hashCode();
}
@Override
public int hashCode() {
return Arrays.hashCode(new Object[] { id });
}
In that case, given that A and B both have a 1-arg constructor to set their respective id field,
new A(1).equals(new A(1)) // prints true as expected,
new A(1).equals(new A(2)) // prints false as expected,
new A(1).equals(new B(1)) // prints false as expected.
But also,
new A(1).hashCode() == new B(1).hashCode() // prints true.
I wonder if it matters if two hashCodes are equal, even if the two objects aren't from the same type? Could hashCode() be used somewhere else than in equals()? If yes, to what purpose?
I thought about implementing the two methods as follow:
@Override
public boolean equals(Object obj) {
return obj != null && obj.hashCode() == hashCode();
}
@Override
public int hashCode() {
return Arrays.hashCode(new Object[] { getClass(), id });
}
Adding the class to the hashCode generation would solve this potential problem. What do you think? Is it necessary?
For objects of different classes the same hashCode() does not matter. The hashCode() only says that the objects are possibly the same. If e.g. HashSet encounters the same hashCode() it will test for equality with equals().
The rule is simple:
A.equals(B) implies B.hashCode() == A.hashCode()
B.hashCode() != A.hashCode() implies !A.equals(B)
There should be no other relations between the two. If you use hashcode() inside equals(), you should have a warning.
Hashcode is definitely not used in equals; it is used by collections based on the data structure called a hash table. It is always OK from the correctness standpoint for two hashcodes to equal each other; this is called a hash collision, it is unavoidable in the general case, and the only consequence is weaker performance.
Nothing wrong with two different objects (even of the same type) to have the equal hash code, but your second variant of equals() looks odd to me. It will work only if you can guarantee that your objects will be compared only to the objects of the same type.
Could hashCode() be used somewhere else than in equals()?
From the Javadoc: This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
Also
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
Note that, when A extends B, or B extends A, your equals method is faulty, since:
a.equals(b) != b.equals(a)
if a and b happen to have the same hash code.
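A sketch of that asymmetry, using the instanceof-plus-hashCode style of equals() from the question. The classes A, B and Safe are hypothetical, and Safe shows one common symmetric alternative (compare the exact class and the field itself), not the only possible fix.

```java
class AsymmetricEqualsDemo {

    static class B {
        final int id;

        B(int id) { this.id = id; }

        @Override
        public boolean equals(Object obj) {
            return obj instanceof B && obj.hashCode() == hashCode();
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    static class A extends B {
        A(int id) { super(id); }

        @Override
        public boolean equals(Object obj) {
            return obj instanceof A && obj.hashCode() == hashCode();
        }
    }

    /** Symmetric alternative: exact class check plus a direct field comparison. */
    static class Safe {
        final int id;

        Safe(int id) { this.id = id; }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) {
                return true;
            }
            if (obj == null || getClass() != obj.getClass()) {
                return false;
            }
            return id == ((Safe) obj).id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    public static void main(String[] args) {
        B b = new B(1);
        A a = new A(1);
        System.out.println(b.equals(a)); // true: an A is also a B
        System.out.println(a.equals(b)); // false: a plain B is not an A
        System.out.println(new Safe(1).equals(new Safe(1))); // true
    }
}
```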

Java: overriding equals method doesn't do the trick when looking for a key of hashtable?

I have a hashtable looking like this:
Hashtable<Mapping, Integer> mappingCount = new Hashtable<Mapping, Integer>();
I want to use this code:
if (mappingCount.get(currentMapping) != null)
mappingCount.put(currentMapping, mappingCount.get(currentMapping) + 1);
else
mappingCount.put(currentMapping, 1);
In order to be able to get the value from the hashtable, for the class Mapping I did the following:
@Override
public boolean equals(Object obj) {
return ((Mapping)obj).mappingXML.equals(this.mappingXML);
}
However, this doesn't do the trick since mappingCount.get(currentMapping) always results in null. To be sure that something's not wrong, I did the following:
if (aaa.contains(currentMapping.getMappingXML()))
System.out.println("found it!");
else
aaa.add(currentMapping.getMappingXML());
where aaa is List<String> aaa = new ArrayList<String>(). Of course, found it is printed many times. What am I doing wrong?
You also need to override the hashCode() method.
From the JavaDocs:
To successfully store and retrieve objects from a hashtable, the objects used as keys must implement the hashCode method and the equals method.
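The reason for this is that Hashtable uses hashCode as a preliminary test to see if two objects are equal. If the hash codes match, it then uses equals to tell a real match from a collision.
The default implementation of hashCode() is identity-based (it is typically, though not necessarily, derived from the object's memory address), so two distinct instances usually get different hash codes even when equals() says they are equal. That breaks the rule that equal objects must have equal hash codes.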
Also look at the general contract for hashCode().
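Applied to the question, a fix could look like the sketch below. The Mapping class here is a hypothetical stand-in with only the mappingXML field from the question, and the count helper is made up for the example; Map.merge replaces the get/put pair.

```java
import java.util.Hashtable;
import java.util.Map;

class MappingCountDemo {

    /** Hypothetical stand-in for the question's Mapping class. */
    static class Mapping {
        private final String mappingXML;

        Mapping(String mappingXML) { this.mappingXML = mappingXML; }

        @Override
        public boolean equals(Object obj) {
            return obj instanceof Mapping
                    && ((Mapping) obj).mappingXML.equals(mappingXML);
        }

        @Override
        public int hashCode() {
            // Built from the same field that equals() compares.
            return mappingXML.hashCode();
        }
    }

    /** Counts how often each mapping occurs. */
    static Map<Mapping, Integer> count(String... xmls) {
        Map<Mapping, Integer> mappingCount = new Hashtable<>();
        for (String xml : xmls) {
            // Inserts 1 for a new key, otherwise adds 1 to the stored value.
            mappingCount.merge(new Mapping(xml), 1, Integer::sum);
        }
        return mappingCount;
    }

    public static void main(String[] args) {
        Map<Mapping, Integer> counts = count("<a/>", "<b/>", "<a/>");
        System.out.println(counts.get(new Mapping("<a/>"))); // 2
        System.out.println(counts.get(new Mapping("<b/>"))); // 1
    }
}
```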
All of the recommendations to override equals and hash code correctly are spot on; Joshua Bloch tells you how to do it properly.
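But an equally important requirement is that keys in maps should be effectively immutable: the fields used by equals and hashCode must not change while the object is a key. If your class can change those values, then its equality and hash code can change after you add it to the map; disaster ensues.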
Whenever you override equals, you must override hashCode as well.
You need to override hashCode as well.
From the Object#hashCode doc:
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
You have to implement hashcode() as well!
Example:
public class Employee{
int employeeId;
String name;
Department dept;
// other methods would be in here
@Override
public int hashCode() {
int hash = 1;
hash = hash * 17 + employeeId;
hash = hash * 31 + name.hashCode();
hash = hash * 13 + (dept == null ? 0 : dept.hashCode());
return hash;
}
}
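The example above shows only hashCode(); a matching equals() has to compare the same fields. A possible sketch using java.util.Objects follows. It assumes, purely for illustration, that dept is a String rather than a Department, and that name and dept may be null.

```java
import java.util.Objects;

class EmployeeDemo {

    static class Employee {
        final int employeeId;
        final String name;
        final String dept; // assumption: simplified to String for this sketch

        Employee(int employeeId, String name, String dept) {
            this.employeeId = employeeId;
            this.name = name;
            this.dept = dept;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) {
                return true;
            }
            if (!(obj instanceof Employee)) {
                return false;
            }
            Employee other = (Employee) obj;
            // Objects.equals is null-safe, unlike name.equals(other.name).
            return employeeId == other.employeeId
                    && Objects.equals(name, other.name)
                    && Objects.equals(dept, other.dept);
        }

        @Override
        public int hashCode() {
            // Same three fields as equals(); null fields contribute 0.
            return Objects.hash(employeeId, name, dept);
        }
    }

    public static void main(String[] args) {
        Employee e1 = new Employee(1, "Ann", null);
        Employee e2 = new Employee(1, "Ann", null);
        System.out.println(e1.equals(e2));                  // true
        System.out.println(e1.hashCode() == e2.hashCode()); // true
    }
}
```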

How do sets differentiate between objects?

How does a set differentiate between objects in both Java and C++? Or do sets not differentiate them at all?
Take these for example:
C++
std::set<A> aset;
A a(1, 2); // Assume A has only two elements, and this constructor sets them both
aset.insert(a);
A a2(1, 2); // This would initialise a `A' object to the same values as `a', but a different object
aset.count(a2); // Would this return 1 or 0?
Java
Set<A> aset = new HashSet<>();
A a = new A(1, 2); // Assume A has only two elements, and this constructor sets them both
aset.add(a);
A a2 = new A(1, 2); // This would initialise a `A' object to the same values as `a', but a different object
aset.contains(a2); // Would this return true or false?
In C++ the set depends on operator<() being defined for the class A, or that you supply a comparison object providing strict weak ordering to the set.
For Java it depends on the equals, hashcode contract.
For the Java part,
The method in charge of determining whether two objects are equal is:
public boolean equals(Object other)
Not to be confused with
public int hashCode()
Whose contract states that two equals objects must return the same number, but two objects that returned the same number may be, but are not necessarily, equal.
The default implementation of the equals method compares references (equality by memory address); therefore, if class A does not override equals, the contains method will return false.
To have the set.contains(a2) method return true you must override the equals and hashCode method to comply as so:
public boolean equals(Object other) {
return other instanceof A && ((A) other).elem1 == this.elem1 && ((A) other).elem2 == this.elem2;
}
public int hashCode() {
return elem1 * 31 + elem2;
}
The hashCode is required (assuming you're using a HashSet) for the set to identify where in the internal representation of the set the object may be (i.e. where to look for it).
Search for HashSet/HashMap to understand the internal representation if you're interested.
As for the C++ part, If I remember correctly it depends on the correct operator overloading, but my C++ is rusty at best.
EDIT: I noticed you specifically asked about sets, so I'll elaborate a bit more on how that works:
While the equals method is what determines equality between two objects, some preliminary steps in the set implementation used (e.g. HashSet or TreeSet) might rely on something extra:
For example, HashSet uses hashCode() to find the internal location the item might be in, so if A does not override or correctly implement hashCode(), set.contains(a2) may return true or false (with the default implementation the result is not deterministic: it depends on the identity hash codes the two objects happen to get).
For a TreeSet to correctly find items within it, either the items contained must implement the Comparable interface properly or the TreeSet itself must be supplied with a properly implemented Comparator instance.
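For the TreeSet case, a minimal sketch could look like this. It assumes the two elements of A are ints named elem1 and elem2, as in the equals() example above; the ordering (elem1 first, then elem2) is an arbitrary choice for the example.

```java
import java.util.Set;
import java.util.TreeSet;

class TreeSetContainsDemo {

    static class A implements Comparable<A> {
        final int elem1;
        final int elem2;

        A(int elem1, int elem2) {
            this.elem1 = elem1;
            this.elem2 = elem2;
        }

        @Override
        public int compareTo(A other) {
            // Order by elem1, break ties with elem2.
            int byFirst = Integer.compare(elem1, other.elem1);
            return byFirst != 0 ? byFirst : Integer.compare(elem2, other.elem2);
        }
    }

    static boolean containsEqualValue() {
        Set<A> aset = new TreeSet<>();
        aset.add(new A(1, 2));
        // A different object with the same values: compareTo returns 0.
        return aset.contains(new A(1, 2));
    }

    public static void main(String[] args) {
        System.out.println(containsEqualValue()); // true
    }
}
```

In real code, equals() and hashCode() should also be overridden so that they agree with compareTo(); they are left out here only to show that TreeSet relies on the comparison alone.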
For C++, according to set::insert in the C++ Reference:
Because set containers do not allow for duplicate values, the insertion operation checks for each element inserted whether another element exists already in the container with the same value, if so, the element is not inserted and -if the function returns a value- an iterator to it is returned.
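So C++ sets compare values (through operator< or the supplied comparison object), whereas Java compares references by default and compares values only if equals() (and, for hash-based sets, hashCode()) is overridden.
In Java at least, a HashSet first compares hash codes, which by default are identity-based (typically derived from the object's location in memory). So in the Java part of the question, aset.contains(a2) would return false unless A overrides equals() and hashCode(), because a2 is a different object from a.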
I'm afraid I can't comment on how C++ works!
Java calls the object's equals method, which, if you haven't overridden it, simply compares references (the same as a == a2).
