Putting objects into a hash-based collection

Suppose I have the below class.
import java.util.HashMap;
import java.util.Map;

class S {
    String txt = null;

    S(String i) {
        txt = i;
    }

    public static void main(String[] args) {
        S s1 = new S("a");
        S s2 = new S("b");
        S s3 = new S("a");
        Map<S, String> m = new HashMap<>();
        m.put(s1, "v11");
        m.put(s2, "v22");
        m.put(s3, "v33");
        System.out.println(m.size());
    }

    // just a plain implementation
    @Override
    public boolean equals(Object o) {
        S cc = (S) o;
        return this.txt.equals(cc.txt);
    }

    @Override
    public int hashCode() {
        return 222;
    }
}
Running the above prints a size of 2, which is fine. If we comment out hashCode() it prints 3, which is also correct. But if we comment out equals and keep hashCode, it should print 2, right? Instead it prints 3. When putting objects into a HashMap, the map checks the hash code of the object, and if it is the same it replaces the previous value in the map with the new one, right?
Thank You.

But if we comment the equals and keep the hashCode it should return 2
right? instead it returns 3.
3 items is the correct behaviour. All 3 objects are hashed to the same bucket, but because all 3 are different, this bucket will contain a chain of entries (a linked list for HashMap in Java) whose keys have the same hash code but are not equal to each other.
When putting objects to hashmap map will check the hash code of an object and if its same
it will replace the previous value of the map to the new one right?
Being hashed to the same bucket doesn't mean that one value will replace another. The keys are then compared for equality. If they are equal, the old value is replaced; if they are not, the new entry is added to the tail of the linked list for this bucket.
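The two cases described above can be reproduced with a minimal sketch. The class names `HashOnlyKey` and `FullKey` are hypothetical and exist only for this illustration; both use the constant hash code 222 from the question.

```java
import java.util.HashMap;
import java.util.Map;

class PutDemo {
    // Hypothetical key: overrides hashCode only, so equals is still identity-based.
    static class HashOnlyKey {
        final String txt;
        HashOnlyKey(String txt) { this.txt = txt; }
        @Override public int hashCode() { return 222; }
    }

    // Hypothetical key: overrides both equals and hashCode.
    static class FullKey {
        final String txt;
        FullKey(String txt) { this.txt = txt; }
        @Override public boolean equals(Object o) {
            return o instanceof FullKey && txt.equals(((FullKey) o).txt);
        }
        @Override public int hashCode() { return 222; }
    }

    static int sizeWithHashOnly() {
        Map<HashOnlyKey, String> m = new HashMap<>();
        m.put(new HashOnlyKey("a"), "v11");
        m.put(new HashOnlyKey("b"), "v22");
        m.put(new HashOnlyKey("a"), "v33"); // same bucket, but not equal: a new entry is added
        return m.size();
    }

    static int sizeWithBoth() {
        Map<FullKey, String> m = new HashMap<>();
        m.put(new FullKey("a"), "v11");
        m.put(new FullKey("b"), "v22");
        m.put(new FullKey("a"), "v33"); // equal to the first key: the value is replaced
        return m.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeWithHashOnly()); // 3
        System.out.println(sizeWithBoth());     // 2
    }
}
```

The hash code only narrows the search to a bucket; whether an existing entry is replaced is decided by equals.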

The hashcode is simply used to determine the bucket in which to place the object. Each bucket can contain more than one object. So hashcode must be implemented to ensure that equal objects go in the same bucket. In other words, equal objects must have the same hashcode, but objects with the same hashcode aren't necessarily equal.

When you override only hashcode, nothing really changes. You are just putting every object in the same bucket by returning 222. So the HashMap is less efficient, but its contract doesn't change.

The hashcode is the first, quick way to find out whether two objects can be equal. It is used by hash containers to decide in which "slot" the object may go, and to retrieve it without checking all of the objects in all of the slots.
If your hashcode is always the same, then all the objects will be directed to the same slot. This is called a collision. Insertions will be slower, because after the collision the container has to check whether the objects already in that slot match the new one (using equals). Retrieval will also be slower, because it has to check all of them sequentially until it finds the right one (equals again). Finally, memory will probably be wasted on slots that are never used.
In essence, by not implementing a sensible hashcode you are converting the hash containers into lists (and inefficient ones).
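A small sketch of that point, assuming a hypothetical `Key` class whose hashCode is always 222: the map still gives correct answers, it just has to resolve every operation with equals inside a single slot.

```java
import java.util.HashMap;
import java.util.Map;

class ConstantHashDemo {
    // Hypothetical key whose hashCode is the same for every instance.
    static class Key {
        final int id;
        Key(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }
        @Override public int hashCode() { return 222; } // every key collides
    }

    static Map<Key, Integer> build(int n) {
        Map<Key, Integer> m = new HashMap<>();
        for (int i = 0; i < n; i++) {
            m.put(new Key(i), i); // each put has to be resolved against colliding keys
        }
        return m;
    }

    public static void main(String[] args) {
        Map<Key, Integer> m = build(1000);
        System.out.println(m.size());            // 1000
        System.out.println(m.get(new Key(500))); // 500
    }
}
```

Correctness is preserved because equals makes the final decision; only the cost per operation grows.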

If we comment the hashCode() it return 3 which is also correct.
This is not correct! There are only 2 different objects: "a" and "b". The equals method says what is equal and what is not. The expected size is 2. But, because the equals-hashcode contract is broken, the returned size is 3.
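The broken-contract case can be sketched as follows, using a hypothetical `Key` class that overrides equals but not hashCode. The size is almost always 3, because the two "a" keys get different identity hash codes; the exact values are not specified, so the sketch does not state an expected output.

```java
import java.util.HashMap;
import java.util.Map;

class EqualsOnlyDemo {
    static class Key {
        final String txt;
        Key(String txt) { this.txt = txt; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && txt.equals(((Key) o).txt);
        }
        // hashCode is deliberately NOT overridden, so Object's identity-based hash is used.
    }

    static int size() {
        Map<Key, String> m = new HashMap<>();
        m.put(new Key("a"), "v11");
        m.put(new Key("b"), "v22");
        m.put(new Key("a"), "v33"); // equal to the first key, but very likely a different hash
        return m.size();
    }

    public static void main(String[] args) {
        System.out.println(size());
    }
}
```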

Related

HashSet.contains returns false when it shouldn't

I have this code:
import java.util.HashSet;
import java.util.Set;

public class Tray {
    private Set<Block> blocks;
    private int numColumns;
    private int numRows;

    // constructor
    public Tray(int numRows, int numColumns) {
        this.numColumns = numColumns;
        this.numRows = numRows;
        blocks = new HashSet<>();
    }

    public boolean satisfiesGoal(Tray other) {
        Block thisBlock = this.blocks.iterator().next();
        Block otherBlock = other.blocks.iterator().next();
        boolean hashesEqual = thisBlock.hashCode() == otherBlock.hashCode(); // this is true
        boolean areEqual = thisBlock.equals(otherBlock) && otherBlock.equals(thisBlock); // this is true
        boolean contains = this.blocks.contains(otherBlock); // this is false, why?
        return contains;
    }
}
In the main method I have added the 2 Blocks to their respective Trays. According to the debugger, the variables "hashesEqual" and "areEqual" are true, but the boolean "contains" is false. Any ideas as to why the hashes of 2 objects would be equal as well as equal according to the "equals" method, but would not contain the equal object in a HashSet?

This problem happens if you modify the objects in a way that affects their equality and hash codes after adding them to the HashSet. The set will malfunction because the objects are not found in the correct slot of the hash table corresponding to their new value.
Likewise, a HashMap will malfunction if you modify objects used as keys. Similarly with TreeSet and TreeMap. All these data structures can locate objects quickly because each object's value dictates its storage location. The structure becomes wrong if those objects are then modified.
Immutable objects are nicer as set elements and map keys because they avoid this complication.
If you must modify fields that are part of an object's equality, you'll need to temporarily remove it from the set first, and add it again afterwards. Or, use a list as the main container for your objects, and construct a temporary set only when needed.
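A minimal sketch of the remove, modify, re-add pattern. The question does not show the `Block` class, so a hypothetical mutable `Item` class stands in for it here.

```java
import java.util.HashSet;
import java.util.Set;

class MutateInSetDemo {
    // Hypothetical mutable element whose equality and hash depend on a field.
    static class Item {
        int size;
        Item(int size) { this.size = size; }
        @Override public boolean equals(Object o) {
            return o instanceof Item && ((Item) o).size == size;
        }
        @Override public int hashCode() { return Integer.hashCode(size); }
    }

    // Unsafe: the element is modified while it is inside the set.
    static boolean mutateInPlace() {
        Set<Item> set = new HashSet<>();
        Item item = new Item(1);
        set.add(item);
        item.size = 2;             // hash code changes, the entry stays where it was
        return set.contains(item); // false: the lookup uses the new hash code
    }

    // Safe: remove first, modify, then add again.
    static boolean removeMutateAdd() {
        Set<Item> set = new HashSet<>();
        Item item = new Item(1);
        set.add(item);
        set.remove(item);          // found, because the hash code is still the old one
        item.size = 2;
        set.add(item);             // stored under the new hash code
        return set.contains(item); // true
    }

    public static void main(String[] args) {
        System.out.println(mutateInPlace());   // false
        System.out.println(removeMutateAdd()); // true
    }
}
```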

Are Block objects mutable?
HashSet stores its contents as keys in a HashMap, with an arbitrary Object as the value.
As per this article,
If an object's hashCode() value can change based on its state, then we must be careful when using such objects as keys in hash-based collections [emphasis mine] to ensure that we don't allow their state to change when they are being used as hash keys. All hash-based collections assume that an object's hash value does not change while it is in use as a key in the collection. If a key's hash code were to change while it was in a collection, some unpredictable and confusing consequences could follow. This is usually not a problem in practice -- it is not common practice to use a mutable object like a List as a key in a HashMap.

The code for contains in openjdk is pretty simple - it eventually just calls HashMap.getEntry which uses the hash code and equals to check if the key exists. My guess is that your error is in thinking that the item is in the set already. But you could easily confirm that that is wrong by directly declaring a Set in the code you have posted and adding the items to that collection.
Try adding the following unit test:
Set<Block> blocks = new HashSet<>();
blocks.add(thisBlock);
assertTrue(thisBlock.hashCode() == otherBlock.hashCode() && thisBlock.equals(otherBlock));
assertTrue(blocks.contains(otherBlock));
If the first assertion passes and the second fails then you've found a bug in Java. I find that pretty unlikely.
Also make sure you have the openjdk source code available so you can step into Java methods while debugging. That way you can step into contains and check exactly where it is failing.
Also note that your code this.blocks.iterator().next() creates a new iterator each time the function is called and then returns the first item in the iteration. In other words it picks the first item in the set (note that this is not the least by natural order). If you are trying to iterate through the two sets in sequence and compare values then that's not what your code does at the moment.

HashCode - What will happen if equal object happened to hash in the same bucket?

I know this has been asked many times, but I can't find an exact answer to my question.
In chapter 3 of Effective Java, there is a scenario that shows and explains why hashCode should be overridden together with the equals method. I get most of it, but there is one part that I can't understand.
There is a given class there that override the equals method but not the hashCode method. The object is put as a key in a map
Map<PhoneNumber, String> m = new HashMap<PhoneNumber, String>();
m.put(new PhoneNumber(707, 867, 5309), "Jenny");
I understand that if we call get with another equal object (m.get(new PhoneNumber(707, 867, 5309))), it will return null simply because hashCode is not overridden to return equal values for equal objects (so it will search for the object in another bucket because of the different hashcode).
But according to my understanding, in that situation, there is no guarantee that the hashcodes of the two objects will always return distinct. What if they happen to return the same hashcode?
I think it is explained in this part
Even if the two instances happen to hash to the same bucket, the get
method will almost certainly return null, as HashMap has an
optimization that caches the hash code associated with each entry and
doesn’t bother checking for object equality if the hash codes don’t
match.
I just don't get the cache thing. Can someone explain this in detail?
Also, I already did my homework and found a related question:
Influence of HashMap optimization that caches the hash code associated with each entry to its get method
But I'm not that satisfied with the answer accepted, also, the answerer says in the comment that
A hash code can be an arbitrary int, thus each hash code can't have
its own bucket. Consequently, some objects with different hash codes
end up in the same bucket.
I completely disagree with this. To my understanding, different hashcodes will never end up in the same bucket.

Take a look at how java.util.HashMap calculates a bucket number for a key by hashCode:
/**
* Returns index for hash code h.
*/
static int indexFor(int h, int length) {
return h & (length-1);
}
If hashtable length = 16 then both 128 and 256 will get in bucket #0. Hashtable is an array of entries:
Entry<K,V>[] table
...
class Entry<K,V> {
K key;
V value;
Entry<K,V> next;
int hash;
...
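Entries may form a chain (a singly linked list). If bucket #0 (table[0]) is empty (null) then the new entry is placed directly there; otherwise HashMap links the new entry into the existing chain. In JDK 8 and later the new entry is appended after the last entry in the chain; older versions, such as the one quoted above, insert it at the head of the chain.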
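The masking step can be tried in isolation. This sketch only copies the `indexFor` expression quoted above; the real HashMap also mixes the bits of the hash code before this step, so the bucket numbers here illustrate the masking only, not where a given key lands in an actual map.

```java
class BucketIndexDemo {
    // Same expression as the quoted indexFor; length must be a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(128, 16)); // 0
        System.out.println(indexFor(256, 16)); // 0
        System.out.println(indexFor(17, 16));  // 1
    }
}
```

With a table length of 16 only the lowest four bits of `h` survive the mask, which is why different hash values such as 128 and 256 share a bucket.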

When the book says "Even if the two instances happen to hash to the same bucket", it doesn't mean that they have the same hashcode. Even different hashcodes can map to the same bucket [read about hashing].
So even if the keys hash to the same bucket, .equals may not be invoked for the relevant element (due to the caching optimization), since the hash codes don't match. Thus, even if the relevant element resides in the same bucket, it may never be compared through .equals, and thus not "found".
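The cached-hash shortcut can be sketched with a single hand-built bucket. This is an illustration, not the JDK code: `Entry`, `Key` and `findInBucket` are hypothetical names, and `Key` deliberately breaks the contract (equal names, different hash codes) to mimic a class that overrides equals but not hashCode.

```java
import java.util.ArrayList;
import java.util.List;

class CachedHashLookupDemo {
    static int equalsCalls = 0;

    // Hypothetical key with an explicitly chosen hash code; equals counts its calls.
    static class Key {
        final String name;
        final int hash;
        Key(String name, int hash) { this.name = name; this.hash = hash; }
        @Override public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Key && name.equals(((Key) o).name);
        }
        @Override public int hashCode() { return hash; }
    }

    // One chain entry: the key plus the hash code cached when it was inserted.
    static class Entry {
        final Key key;
        final int hash;
        Entry(Key key) { this.key = key; this.hash = key.hashCode(); }
    }

    // Searches one bucket, comparing the cached hash before calling equals.
    static boolean findInBucket(List<Entry> bucket, Key wanted) {
        int h = wanted.hashCode();
        for (Entry e : bucket) {
            if (e.hash == h && (e.key == wanted || wanted.equals(e.key))) {
                return true;
            }
        }
        return false;
    }

    static boolean lookupEqualKeyWithDifferentHash() {
        equalsCalls = 0;
        List<Entry> bucket = new ArrayList<>();
        bucket.add(new Entry(new Key("jenny", 128)));       // 128 & 15 == 0
        return findInBucket(bucket, new Key("jenny", 256)); // 256 & 15 == 0 as well
    }

    public static void main(String[] args) {
        System.out.println(lookupEqualKeyWithDifferentHash()); // false
        System.out.println(equalsCalls);                       // 0
    }
}
```

Both keys would share bucket 0 of a 16-slot table, yet the lookup fails without equals ever being called, because the cached hash 128 differs from 256.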

What happens to the lookup in a Hashmap or Hashset when the objects Hashcode changes

In a HashMap, the hash code of the key provided is used to place the value in the hash table. In a HashSet, the object's hashcode is used to place the value in the underlying hash table. That is, the advantage of the HashMap is that you have the flexibility of deciding what you want as the key, so you can do nice things like this:
Map<String,Player> players = new HashMap<String,Player>();
This can map a string such as the players name to a player itself.
My question is: what happens to the lookup when the key's hashcode changes?
I expect this isn't such a major concern for a HashMap, as I wouldn't expect nor want the key to change. In the previous example, if the player's name changes he is no longer that player. However, I can look a player up using the key, change other fields that aren't the name, and future lookups will still work.
In a HashSet, however, the entire object's hashcode is used to place the item, so if someone slightly changes an object, future lookups of that object will no longer resolve to the same position in the hash table. Does this mean that once data is in a HashSet it shouldn't be changed? Or does it need to be rehashed? Or is that done automatically? What is going on?

In your example, a String is immutable so its hashcode cannot change. But hypothetically, if the hashcode of an object did change while it was a key in a hash table, then it would probably disappear as far as hashtable lookups were concerned. I went into more detail in this Answer to a related question: https://stackoverflow.com/a/13114376/139985 . (The original question is about a HashSet, but a HashSet is really a HashMap under the covers, so the answer covers this case too.)
It is safe to say that if the keys of either a HashMap or a TreeMap are mutated in a way that affects their respective hashcode() / equals(Object) or compare(...) or compareTo(...) contracts, then the data structure will "break".
Does this mean that once data is in a Hashset it shouldn't be changed.
Yes.
Or does it need to be rehashed? or is it done automatically etc?
It won't be automatically rehashed. The HashMap won't notice that the hashcode of a key has changed. Indeed, you won't even get recomputation of the hashcode when the HashMap resizes. The data structure remembers the original hashcode value to avoid having to recalculate all of the hashcodes when the hash table resizes.
If you know that the hashcode of a key is going to change you need to remove the entry from the table BEFORE you mutate the key, and add it back afterwards. (If you try to remove / put it after mutating the key, the chances are that the remove will fail to find the entry.)
What is going on?
What is going on is that you violated the contract. Don't do that!
The contract consists of two things:
1) The standard hashcode / equals contract as specified in the javadoc for Object.
2) An additional constraint that an object's hashcode must not change while it is a key in a hash table.
The latter constraint is not stated specifically in the HashMap javadoc, but the javadoc for Map says this:
Note: great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.
A change that affects equality (typically) also affects the hashcode. At the implementation level, if a HashMap entry's key's hashcode changes, the entry will typically now be in the wrong hash bucket and will be invisible to HashMap methods that perform lookups.
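A short sketch of that invisibility, and of the remove-before-mutate fix described earlier in this answer. `Key` is a hypothetical mutable class whose hash code is simply its `id` field.

```java
import java.util.HashMap;
import java.util.Map;

class MutatedKeyDemo {
    // Hypothetical mutable key: equality and hash code both depend on id.
    static class Key {
        int id;
        Key(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }
        @Override public int hashCode() { return id; }
    }

    // Returns {found via mutated key, found via an old-valued key, entry still counted}.
    static boolean[] mutateWhileInMap() {
        Map<Key, String> m = new HashMap<>();
        Key k = new Key(1);
        m.put(k, "value");
        k.id = 2;                                       // contract violated
        boolean byMutatedKey = m.containsKey(k);        // false: hash 2, entry was stored with hash 1
        boolean byOldValue = m.containsKey(new Key(1)); // false: hash matches, equals does not
        return new boolean[] { byMutatedKey, byOldValue, m.size() == 1 };
    }

    // Remove BEFORE mutating, then put the value back under the changed key.
    static boolean removeThenMutate() {
        Map<Key, String> m = new HashMap<>();
        Key k = new Key(1);
        m.put(k, "value");
        String v = m.remove(k);
        k.id = 2;
        m.put(k, v);
        return m.containsKey(k) && "value".equals(m.get(new Key(2)));
    }

    public static void main(String[] args) {
        boolean[] r = mutateWhileInMap();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // false false true
        System.out.println(removeThenMutate());             // true
    }
}
```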

In your example, the keys are String which are immutable. So the hashcode of the keys won't change. What happens when the hashcode of the keys changes is undefined and leads to "weird" behaviour. See the example below, which prints 1, false and 2. The object remains in the set, but the set looks like it is broken (contains returns false).
Extract from Set's javadoc:
Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.
public static void main(String args[]) {
Set<MyObject> set = new HashSet<>();
MyObject o1 = new MyObject(1);
set.add(o1);
o1.i = 2;
System.out.println(set.size()); //1
System.out.println(set.contains(o1)); //false
for (MyObject o : set) {
System.out.println(o.i); //2
}
}
private static class MyObject {
private int i;
public MyObject(int i) {
this.i = i;
}
@Override
public int hashCode() {
return i;
}
@Override
public boolean equals(Object obj) {
if (obj == null) return false;
if (getClass() != obj.getClass()) return false;
final MyObject other = (MyObject) obj;
if (this.i != other.i) return false;
return true;
}
}

With Java's hashes, the original reference is simply not found. It is searched for in the bucket corresponding to the current hashcode, and not found there.
To recover from this after the fact, the keySet of the hash must be iterated over, and any key which is not found by the contains method must be removed through the iterator. It is preferable to remove the key from the map, then store the value with the new key.

The HashSet is backed by a HashMap.
From the javadocs.
This class implements the Set interface, backed by a hash table
(actually a HashMap instance).
So if you change the hashcode, I doubt whether you can access the object.
Internal Implementation Details
The add implementation of HashSet is
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
The key is the elem and value is just a dummy Object called PRESENT
and the contains implementation is
public boolean contains(Object o) {
return map.containsKey(o);
}

Java HashMap containsKey always false

I have the funny situation, that I store a Coordinate into a HashMap<Coordinate, GUIGameField>.
Now, the strange thing about it is, that I have a fragment of code, which should guard, that no coordinate should be used twice. But if I debug this code:
if (mapForLevel.containsKey(coord)) {
throw new IllegalStateException("This coordinate is already used!");
} else {
...do stuff...
}
... the containsKey always returns false, although I stored a coordinate with a hashcode of 9731 into the map and the current coord also has the hashcode 9731.
After that, the mapForLevel.entrySet() looks like:
(java.util.HashMap$EntrySet) [(270,90)=gui.GUIGameField@29e357, (270,90)=gui.GUIGameField@ca470]
What could I have possibly done wrong? I ran out of ideas. Thanks for any help!
public class Coordinate {
int xCoord;
int yCoord;
public Coordinate(int x, int y) {
...store params in attributes...
}
...getters & setters...
@Override
public int hashCode() {
int hash = 1;
hash = hash * 41 + this.xCoord;
hash = hash * 31 + this.yCoord;
return hash;
}
}

You should override equals in addition to hashCode for it to work correctly.
EDIT : I have wrongly stated that you should use hashCode in your equals - this was not correct. While hashCode must return the same result for two equal objects, it still may return the same result for different objects.

It seems that you forgot to implement the equals() method for your Coordinate class. This is required by the contract. HashMap compares two entries with the same hash code using equals. In your case Object.equals() is called, which always returns false for two different objects because it is based on the object's reference in memory.
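A minimal sketch of the fix, assuming `Coordinate` has just the two int fields shown in the question. The hashCode is the one from the question; the equals method, the final fields and the String value type are assumptions added for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

class CoordinateDemo {
    static class Coordinate {
        final int xCoord;
        final int yCoord;

        Coordinate(int x, int y) {
            this.xCoord = x;
            this.yCoord = y;
        }

        // Assumed equality: two coordinates are equal when both fields match.
        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Coordinate)) return false;
            Coordinate other = (Coordinate) o;
            return xCoord == other.xCoord && yCoord == other.yCoord;
        }

        // Unchanged from the question.
        @Override public int hashCode() {
            int hash = 1;
            hash = hash * 41 + this.xCoord;
            hash = hash * 31 + this.yCoord;
            return hash;
        }
    }

    static boolean alreadyUsed() {
        Map<Coordinate, String> mapForLevel = new HashMap<>();
        mapForLevel.put(new Coordinate(270, 90), "field");
        return mapForLevel.containsKey(new Coordinate(270, 90));
    }

    public static void main(String[] args) {
        System.out.println(new Coordinate(270, 90).hashCode()); // 9731
        System.out.println(alreadyUsed());                      // true
    }
}
```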

The reason you need to implement equals alongside hashCode is the way hash tables work.
Hash tables associate an integer value (the hash of the Key) with the Value.
Think of it as an array of Value objects. When you insert in this table, value is stored at position key.hashCode().
This allows you to find any object in the table "straight away". You just have to compute the hashCode for that object and you'll know where it is in the table. (Compare this with a tree, in which you would need to navigate the tree to find the object.)
However, there is a problem with this approach: there might be more than one object with the same hash code.
This would cause you to mistakenly associate two (or more) keys with the same value. This is called a collision.
There's a simple way to solve this problem: instead of mapping each hashcode to one Value, you can map it to a list of pairs Key-Value.
Now every time you're looking for an object in the hash map, after computing the hash you need to go through that list (the list of 'values' that are related to that hashcode) and find the correct one.
This is why you always need to implement equals on the key of the hash map.
Note: Hash tables are actually a bit more complex than this, but the idea is the same. You can read more about collision resolution here.

Define hashCode method in your Coordinate class. Make sure it returns unique code for unique objects and same for same objects.

How to ensure hashCode() is consistent with equals()?

When overriding the equals() function of java.lang.Object, the javadocs suggest that,
it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.
The hashCode() method must return a unique integer for each object (this is easy to do when comparing objects based on memory location, simply return the unique integer address of the object)
How should a hashCode() method be overridden so that it returns a unique integer for each object based only on that object's properties?
public class People{
public String name;
public int age;
public int hashCode(){
// How to get a unique integer based on name and age?
}
}
/*******************************/
public class App{
public static void main( String args[] ){
People mike = new People();
People melissa = new People();
mike.name = "mike";
mike.age = 23;
melissa.name = "melissa";
melissa.age = 24;
System.out.println( mike.hashCode() ); // output?
System.out.println( melissa.hashCode() ); // output?
}
}

It doesn't say the hashcode for an object has to be completely unique, only that the hashcode for two equal objects returns the same hashcode. It's entirely legal to have two non-equal objects return the same hashcode. However, the more unique a hashcode distribution is over a set of objects, the better performance you'll get out of HashMaps and other operations that use the hashCode.
IDEs such as IntelliJ Idea have built-in generators for equals and hashCode that generally do a pretty good job at coming up with "good enough" code for most objects (and probably better than some hand-crafted overly-clever hash functions).
For example, here's a hashCode function that Idea generates for your People class:
public int hashCode() {
int result = name != null ? name.hashCode() : 0;
result = 31 * result + age;
return result;
}

I won't go in to the details of hashCode uniqueness as Marc has already addressed it. For your People class, you first need to decide what equality of a person means. Maybe equality is based solely on their name, maybe it's based on name and age. It will be domain specific. Let's say equality is based on name and age. Your overridden equals would look like
public boolean equals(Object obj) {
if (this==obj) return true;
if (obj==null) return false;
if (!getClass().equals(obj.getClass())) return false;
People other = (People)obj;
return (name==null ? other.name==null : name.equals(other.name)) &&
age==other.age;
}
Any time you override equals you must override hashCode. Furthermore, hashCode can't use any more fields in its computation than equals did. Most of the time it is enough to add or exclusive-or the hash codes of the various fields (hashCode should be fast to compute). So a valid hashCode method might look like:
public int hashCode() {
return (name==null ? 17 : name.hashCode()) ^ age;
}
Note that the following is not valid as it uses a field that equals didn't (height). In this case two "equals" objects could have a different hash code.
public int hashCode() {
return (name==null ? 17 : name.hashCode()) ^ age ^ height;
}
Also, it's perfectly valid for two non-equals objects to have the same hash code:
public int hashCode() {
return age;
}
In this case Jane age 30 is not equal to Bob age 30, yet both their hash codes are 30. While valid this is undesirable for performance in hash-based collections.
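As an alternative sketch, not taken from the answer above, the same name-and-age equality can be written with the standard `java.util.Objects` helpers, which handle null names. The constructor and final fields are assumptions made to keep the example short.

```java
import java.util.Objects;

class PeopleDemo {
    static class People {
        final String name;
        final int age;

        People(String name, int age) {
            this.name = name;
            this.age = age;
        }

        // Equality is assumed to be based on name and age.
        @Override public boolean equals(Object obj) {
            if (this == obj) return true;
            if (obj == null || getClass() != obj.getClass()) return false;
            People other = (People) obj;
            return age == other.age && Objects.equals(name, other.name);
        }

        // Uses exactly the fields that equals uses.
        @Override public int hashCode() {
            return Objects.hash(name, age);
        }
    }

    public static void main(String[] args) {
        People a = new People("mike", 23);
        People b = new People("mike", 23);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```

Using the same field list in both methods is what keeps the two consistent.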

Another question asks if there are some basic low-level things that all programmers should know, and I think hash lookups are one of those. So here goes.
A hash table (note that I'm not using an actual classname) is basically an array of linked lists. To find something in the table, you first compute the hashcode of that something, then mod it by the size of the table. This is an index into the array, and you get a linked list at that index. You then traverse the list until you find your object.
Since array retrieval is O(1), and linked list traversal is O(n), you want a hash function that creates as random a distribution as possible, so that objects will be hashed to different lists. Every object could return the value 0 as its hashcode, and a hash table would still work, but it would essentially be a long linked-list at element 0 of the array.
You also generally want the array to be large, which increases the chances that the object will be in a list of length 1. The Java HashMap, for example, increases the size of the array when the number of entries in the map is > 75% of the size of the array. There's a tradeoff here: you can have a huge array with very few entries and waste memory, or a smaller array where each element in the array is a list with > 1 entries, and waste time traversing. A perfect hash would assign each object to a unique location in the array, with no wasted space.
The term "perfect hash" is a real term, and in some cases you can create a hash function that provides a unique number for each object. This is only possible when you know the set of all possible values. In the general case, you can't achieve this, and there will be some values that return the same hashcode. This is simple mathematics: if you have a string that's more than 4 bytes long, you can't create a unique 4-byte hashcode.
One interesting tidbit: hash arrays are generally sized based on prime numbers, to give the best chance for random allocation when you mod the results, regardless of how random the hashcodes really are.
Edit based on comments:
1) A linked list is not the only way to represent the objects that have the same hashcode, although that is the method used by the JDK 1.5 HashMap. Although less memory-efficient than a simple array, it does arguably create less churn when rehashing (because the entries can be unlinked from one bucket and relinked to another).
2) As of JDK 1.4, the HashMap class uses an array sized as a power of 2; prior to that it used 2^N+1 (which is not prime for every N; for example 2^3+1 = 9). This does not speed up array indexing per se, but does allow the array index to be computed with a bitwise AND rather than a division, as noted by Neil Coffey. Personally, I'd question this as premature optimization, but given the list of authors on HashMap, I'll assume there is some real benefit.
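The "array of linked lists" description above can be sketched as a toy table. `ToyHashTable` is a hypothetical class for illustration only: it never resizes, does not support null keys, and is not how java.util.HashMap is actually implemented.

```java
import java.util.LinkedList;

class ToyHashTable<K, V> {
    // One key/value pair stored in a bucket's list.
    private static class Node<K, V> {
        final K key;
        V value;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Node<K, V>>[] buckets;
    private int size;

    @SuppressWarnings("unchecked")
    ToyHashTable(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // hashCode mod table size; floorMod keeps the index non-negative.
    private int indexFor(Object key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    // Replaces the value if an equal key exists, otherwise appends a new node.
    V put(K key, V value) {
        LinkedList<Node<K, V>> bucket = buckets[indexFor(key)];
        for (Node<K, V> n : bucket) {
            if (n.key.equals(key)) {
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        bucket.add(new Node<>(key, value));
        size++;
        return null;
    }

    // Array access picks the bucket, then the list is traversed using equals.
    V get(Object key) {
        for (Node<K, V> n : buckets[indexFor(key)]) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }

    int size() { return size; }

    public static void main(String[] args) {
        ToyHashTable<String, Integer> t = new ToyHashTable<>(4);
        t.put("a", 1);
        t.put("b", 2);
        t.put("a", 3);
        System.out.println(t.size());   // 2
        System.out.println(t.get("a")); // 3
    }
}
```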

In general the hash code cannot be unique, as there are more values than possible hash codes (integers).
A good hash code distributes the values well over the integers.
A bad one could always give the same value and still be logically correct, it would just lead to unacceptably inefficient hash tables.
Equal values must have the same hash value for hash tables to work correctly.
Otherwise you could add a key to a hash table, then try to look it up via an equal value with a different hash code and not find it.
Or you could put an equal value with a different hash code and have two equal values at different places in the hash table.
In practice you usually select a subset of the fields to be taken into account in both the hashCode() and the equals() method.

I think you misunderstood it. The hashcode does not have to be unique to each object (after all, it is a hash code) though you obviously don't want it to be identical for all objects. You do, however, need it to be identical to all objects that are equal, otherwise things like the standard collections would not work (e.g., you'd look up something in the hash set but would not find it).
For straightforward attributes, some IDEs have hashcode function builders.
If you don't use IDEs, consider using Apache Commons and the class HashCodeBuilder.

The only contractual obligation for hashCode is for it to be consistent. The fields used in creating the hashCode value must be the same or a subset of the fields used in the equals method. This means returning 0 for all values is valid, although not efficient.
One can check whether hashCode is consistent via a unit test. I have written an abstract class called EqualityTestCase, which does a handful of hashCode checks. One simply has to extend the test case and implement two or three factory methods. The test does a very crude job of testing whether the hashCode is efficient.
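The EqualityTestCase class itself is not shown, so the following is only a rough sketch of the kind of check such a test might perform. `HashContractCheck` and `Broken` are hypothetical names introduced for this illustration.

```java
import java.util.Arrays;
import java.util.List;

class HashContractCheck {
    // Returns true if every pair of equal sample objects also has equal hash codes.
    static boolean equalObjectsHaveEqualHashes(List<?> samples) {
        for (Object a : samples) {
            for (Object b : samples) {
                if (a.equals(b) && a.hashCode() != b.hashCode()) {
                    return false;
                }
            }
        }
        return true;
    }

    // Hypothetical class that violates the contract: all instances are equal,
    // but each one reports whatever hash code it was given.
    static class Broken {
        final int h;
        Broken(int h) { this.h = h; }
        @Override public boolean equals(Object o) { return o instanceof Broken; }
        @Override public int hashCode() { return h; }
    }

    public static void main(String[] args) {
        System.out.println(equalObjectsHaveEqualHashes(
                Arrays.asList("a", new String("a"), "b")));     // true
        System.out.println(equalObjectsHaveEqualHashes(
                Arrays.asList(new Broken(1), new Broken(2))));  // false
    }
}
```

A check like this can only find violations among the samples it is given; it cannot prove that a hashCode is correct in general.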

This is what documentation tells us as for hash code method
# javadoc
Whenever it is invoked on
the same object more than once during
an execution of a Java application,
the hashCode method must consistently
return the same integer, provided no
information used in equals comparisons
on the object is modified. This
integer need not remain consistent
from one execution of an application
to another execution of the same
application.

There is a notion of a business key, which determines the uniqueness of separate instances of the same type. Each specific type (class) that models a separate entity from the target domain (e.g. a vehicle in a fleet system) should have a business key, which is represented by one or more class fields. The methods equals() and hashCode() should both be implemented using the fields that make up the business key. This ensures that both methods are consistent with each other.
