I want to maintain a list of objects such that each object in the list is unique. I also want to retrieve it at some point. There are thousands of objects and I can't modify their source to add a unique id. Hash codes are also unreliable.
My approach was to utilize the key uniqueness of a map.
Say I maintain a map like:
HashMap<Object, Integer> uniqueObjectMap;
I will add each object to the map as a key and set a random int as the value. But how does Java determine if the object is unique when used as a key?
Say,
List<Object> listOne = new ArrayList<>();
List<Object> listTwo = new ArrayList<>();
Object o = new Object();
listOne.add(o);
listTwo.add(o);
uniqueObjectMap.put(listOne.get(0), randomInt()); // --> line 1
uniqueObjectMap.put(listTwo.get(0), randomInt()); // --> line 2
Will line 2 give a unique key violation error since both refer to the same object o?
Edit
So will uniqueObjectMap.containsKey(listTwo.get(0)) return true? How are objects determined to be equal here? Is a field-by-field comparison done? Can I rely on this to make sure only one copy of ANY type of object is maintained in the map as a key?
Will line 2 give a unique key violation error since both refer to the same object o?
- No. If a key is already present, its value is overwritten with the new one.
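A minimal sketch of that behaviour, reusing the question's names (listOne, listTwo, uniqueObjectMap). The class name PutOverwriteDemo and the fixed values 1 and 2 standing in for randomInt() are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PutOverwriteDemo {
    // Puts the same object into the map twice, reached through two different lists.
    static Map<Object, Integer> build() {
        List<Object> listOne = new ArrayList<>();
        List<Object> listTwo = new ArrayList<>();
        Object o = new Object();
        listOne.add(o);
        listTwo.add(o);

        Map<Object, Integer> uniqueObjectMap = new HashMap<>();
        // line 1: the key is absent, so put() returns null
        Integer first = uniqueObjectMap.put(listOne.get(0), 1);
        // line 2: no error; the value is replaced and the previous value is returned
        Integer second = uniqueObjectMap.put(listTwo.get(0), 2);
        System.out.println(first + " " + second);
        return uniqueObjectMap;
    }

    public static void main(String[] args) {
        Map<Object, Integer> map = build();
        System.out.println(map.size());    // still a single key
        System.out.println(map.values());  // only the value from line 2 remains
    }
}
```

The return value of put() is a convenient way to detect that a key was already present, as long as null values are not stored.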
Next, HashMap has a separate hash() method that applies a supplemental hash function to the hashCode of the key object, which defends against poor-quality hash functions.
It does so by calling the object's hashCode() method.
The default implementation is roughly equivalent to the object's unique identifier (much like a memory address); however, some objects are compared by value. For a compare-by-value object, hashCode() is overridden to compute a number based on the values, such that two identical values yield the same hashCode() number.
As for hash-based collections, the put(...) operation is fine with putting something over the original location. In short, if two objects yield the same hashCode() and a positive equals(...) result, then operations assume that they are, for all practical purposes, the same object. Thus, put may replace the old with the new, or do nothing, as the object is considered the same.
It may not store two copies in the same "place" as it makes no sense to store two copies at the same location. Thus, sets will only contain one copy, as will map keys; however, lists will possibly contain two copies, depending on how you added the second copy.
How are objects determined to be equal here ?
By using the equals() and hashCode() methods of the key's class (inherited from Object unless overridden).
Is a field by field comparison done?
No. If you don't implement equals() and hashCode(), Java compares the references of your objects.
Can I rely on this to make sure only one copy of ANY type of object is maintained in the map as key ?
No.
Using a Set is a better approach than using a Map because it removes duplicates on its own, but it won't help in this case either, because a Set detects duplicates the same way a Map does with its keys.
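To illustrate the difference, here is a small sketch that counts "unique" objects two ways: once with a HashSet (which relies on equals() and hashCode()), and once with a set built on java.util.IdentityHashMap, which compares keys by reference. The identity-based set is not something the question mentions; it is shown as one possible option when the classes cannot be modified and their hash codes cannot be trusted. The class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

class IdentitySetDemo {
    // Counts distinct elements the way HashSet does: via hashCode() and equals().
    static int countByEquals(Object... items) {
        Set<Object> set = new HashSet<>();
        Collections.addAll(set, items);
        return set.size();
    }

    // Counts distinct elements by reference (==), ignoring equals()/hashCode() overrides.
    static int countByIdentity(Object... items) {
        Set<Object> set = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        Collections.addAll(set, items);
        return set.size();
    }

    public static void main(String[] args) {
        Object plain = new Object();    // no overrides: compared by reference
        String a = new String("same");  // String overrides equals() and hashCode()
        String b = new String("same");  // a distinct object that is equal by value

        System.out.println(countByEquals(plain, plain, a, b));   // 2
        System.out.println(countByIdentity(plain, plain, a, b)); // 3
    }
}
```

Which of the two counts is "right" depends on whether two equal-by-value objects should be treated as one entry or as two.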
If both refer to the same object, no error is thrown: when a HashMap is given a key it already contains, the associated value is overwritten.
If you want to check whether a key or value already exists, you can use containsKey() and containsValue(), for example:
hashMap.containsKey(0);
This returns true if the key 0 already exists, otherwise false.
The hash value is obtained using hash(key.hashCode()).
HashMap has an inner class Entry with attributes
final K key;
V value;
Entry<K ,V> next;
final int hash;
The hash value is used to calculate the index in the array where the Entry object is stored. Two unequal objects can have the same hash value.
Entry objects are stored as a linked list: in case of a collision, all entries that map to the same index are stored in the same linked list, and the equals method tests for true equality. In this way, HashMap ensures the uniqueness of keys.
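A quick way to see this is a key class whose hashCode() is deliberately constant, so every key collides. The BadKey class below is hypothetical and exists only for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Hypothetical key type: every instance reports the same hash code,
    // so all entries end up in the same bucket.
    static final class BadKey {
        final String name;

        BadKey(String name) { this.name = name; }

        @Override public int hashCode() { return 42; }

        @Override public boolean equals(Object other) {
            return other instanceof BadKey && ((BadKey) other).name.equals(name);
        }
    }

    static Map<BadKey, String> build() {
        Map<BadKey, String> map = new HashMap<>();
        map.put(new BadKey("a"), "first");
        map.put(new BadKey("b"), "second"); // same hash, not equal: a second entry
        map.put(new BadKey("a"), "third");  // same hash and equal: value replaced
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, String> map = build();
        System.out.println(map.size());               // 2
        System.out.println(map.get(new BadKey("a"))); // third
        System.out.println(map.get(new BadKey("b"))); // second
    }
}
```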
Everyone knows a HashMap contains unique keys. But I want to know how it maintains uniqueness.
Suppose we inserted 100 entries into the HashMap, and after that we insert a duplicate key and value into the same HashMap. I know it will override the value. But I want to know whether it checks all the previous keys already stored inside the HashMap before overriding.
If it checks all the previous keys every time, it will take more time. So please tell me the correct answer.
You need to know the internal representation of the HashMap.
Actually it is an array of lists of key-value items. Graphically it can be represented like this:
bucket
position
--------
0 NULL
1 --> (K1, V1) --> (K47, V47)
2 NULL
3 NULL
...
54 --> (K89, V89)
...
When you perform a put operation put(key, value), the code first retrieves the hashCode of the key. This value, reduced with a modulo operation, selects the bucket whose list will be searched.
Then it searches that list element by element, calling the equals method to check whether the key is already present.
If the key is present, it replaces only the value; if not, it adds a new key-value pair at the end of the list.
The pseudo code of the HashMap is similar to the following:
public class HashMap<K, V> {
    private List<List<KeyValue<K, V>>> keyValuesBuckets;

    public void put(K key, V value) {
        int hash = key.hashCode();
        // floorMod keeps the index non-negative even for negative hash codes
        int bucketPosition = Math.floorMod(hash, keyValuesBuckets.size());
        for (KeyValue<K, V> kv : keyValuesBuckets.get(bucketPosition)) {
            if (kv.getKey().equals(key)) {
                // Key is present: change the value and exit
                kv.setValue(value);
                return;
            }
        }
        // Key is not present
        keyValuesBuckets.get(bucketPosition).add(new KeyValue<>(key, value));
    }
}
Note that the code is not the real code. There is no check on the null values for example, but it gives you the idea on how the equality is checked using both hashCode and equals methods.
For more in-depth details on how it works, start from the definition of [hashCode][1]:
Returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by HashMap.
and [equals][2]
In the javadoc you can find the contract that equals and hashCode must supply so that a class can be used as a key in a HashMap:
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
[1]: https://docs.oracle.com/javase/8/docs/api/java/lang/Object.html#hashCode--
[2]: https://docs.oracle.com/javase/8/docs/api/java/lang/Object.html#equals-java.lang.Object-
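As a sketch of a class that follows this contract, the hypothetical Point class below derives both equals() and hashCode() from the same two fields, so a different but equal instance finds the stored entry. The names Point, lookup and ContractDemo are assumptions for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class ContractDemo {
    // Hypothetical value class: equality and hash code use the same fields.
    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) { this.x = x; this.y = y; }

        @Override public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Point)) return false;
            Point p = (Point) other;
            return x == p.x && y == p.y;
        }

        @Override public int hashCode() {
            return Objects.hash(x, y); // equal points always produce the same value
        }
    }

    // Stores one point and looks it up with a freshly created instance.
    static String lookup(int x, int y) {
        Map<Point, String> labels = new HashMap<>();
        labels.put(new Point(1, 2), "start");
        return labels.get(new Point(x, y));
    }

    public static void main(String[] args) {
        System.out.println(lookup(1, 2)); // start
        System.out.println(lookup(2, 1)); // null
    }
}
```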
Take a look at the following image taken from here (https://www.bigocheatsheet.com/)
Hash collections have O(1) average insertion complexity, meaning that the operation does not depend on how many data points are already present in the data structure.
As the name suggests, keys are hashed into buckets.
If you want to look up the value for a key, the key is taken and a hash function is applied, mapping it to a specific memory location. There is no need to look at any other value.
The exception is hash collisions, which are bound to happen due to dimensionality reduction. After applying the hash function, different keys may resolve to the same hash and thus land in the same bucket. In this case further checks have to take place (this is the reason why you always have to override hashCode and equals).
Further information about this can be found here: https://stackoverflow.com/a/19691998/3244464
In Java, I understand that if two keys map to one value, linear chaining occurs due to collision.
For Example:
Map myMap = new HashMap(); // Let's say both of them get mapped to the same bucket-A and
myMap.put("John", "Sydney"); // linear chaining has occurred.
myMap.put("Mary", "Mumbai"); // {key1=John}--->[val1=Sydney]--->[val2=Mumbai]
So when I do:
myMap.get("John"); // or myMap.get("Mary")
What does the JVM return since bucket-A contains two values?
Does it return the ref to "chain"? Does it return "Sydney"? Or does it return "Mumbai"?
Linear chaining happens when your keys land in the same bucket (for example, because they have the same hash code), not when two keys map to one value.
So when I do: myMap.get("John"); // or myMap.get("Mary")
map.get("John") gives you Sydney
map.get("Mary") gives you Mumbai
What does the JVM return since bucket-A contains two values?
If the same bucket contains two values, then the equals method of the key is used to determine the correct value to return.
It is worthwhile mentioning the worst-case scenario of storing (K,V) pairs all having the same hashCode for Key. Your hashmap degrades to a linked list in that scenario.
The hashCode of your key determines what 'bucket' (aka list, aka 'linear chain') it will be put in. The equals method determines which object will actually be picked from the 'bucket' in the case of a collision. This is why it's important to properly implement both methods on all objects you intend to use as keys in any kind of hash map.
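"John" and "Mary" are not guaranteed to collide, so the sketch below swaps in the keys "Aa" and "BB", two strings whose hashCode() is the same in Java (both are 2112). The class name is hypothetical; the values are the ones from the question.

```java
import java.util.HashMap;
import java.util.Map;

class SameHashLookupDemo {
    static Map<String, String> build() {
        Map<String, String> myMap = new HashMap<>();
        myMap.put("Aa", "Sydney");
        myMap.put("BB", "Mumbai"); // same hash code as "Aa", so same bucket
        return myMap;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        Map<String, String> myMap = build();
        // equals() on the key picks the right entry inside the shared bucket
        System.out.println(myMap.get("Aa")); // Sydney
        System.out.println(myMap.get("BB")); // Mumbai
    }
}
```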
Your keys are different.
First some terminology
key: the first parameter in the put
value: the second parameter in the put
entry: an Object that holds both the key & the value
When you put into a HashMap the map will call hashCode() on the key and work out which hash bucket the entry needs to go into. If there is something in this bucket already then a LinkedList is formed of entries in the bucket.
When you get from a HashMap the map will call hashCode() on the key and work out which hash bucket to get the entry from. If there is more than one entry in the bucket, the map will walk along the LinkedList until it finds an entry with a key that equals() the key supplied.
A map will always return the Object tied to that key, the value from the entry. Map performance degrades rapidly if hashCode() returns the same (or similar) values for different keys.
You need to use java generics, so your code should really read
Map<String, String> myMap = new HashMap<String, String>();
This will tell the map that you want it to store String keys and values.
From my understanding, the Map first resolves the correct bucket (identified by the hashcode of the key). If there's more than one key in the same bucket, the equals method is used to find the right value in the bucket.
Looking at your example, what confuses you is that you think values are chained for a given key. In fact, Map.Entry objects are chained for a given bucket. The hashCode of the key gives you the bucket, then you look at the chained entries to find the one with the equal key.
While going through Kathy Sierra's book I stumbled across this code fragment:
m.put("k1", new Dog("aiko")); // add some key/value pairs
m.put("k2", Pets.DOG);
m.put(Pets.CAT, "CAT key");
Dog d1 = new Dog("clover");
m.put(d1, "Dog key");
m.put(new Cat(), "Cat key");
Maps are used to store stuff in key and value format. Would someone tell me what is actually stored in the key when we enter "k1" or new Cat() as a key? Are references to these objects stored, or the value of the hash code? I am totally confused by this. Please advise.
And it would be appreciated if you could point me towards further reading material.
The map is an array of N buckets.
The put() method starts by calling hashCode() on your key. From this hash code, it uses a modulo to get the index of the bucket in the map.
Then, it iterates through the entries stored in the linked list associated with the found bucket, and compares each entry key with your key, using the equals() method.
If one entry has a key equal to your key, its value is replaced by the new value. Else, a new entry is created with the new key and the new value, and stored in the linked list associated with the bucket.
Since Cat instances and String instances are never equal, a value associated with a String key will never be modified by putting a value associated with a Cat key.
It will be defined by your object.
You have to create a hashCode() and an equals() method so it can be stored in your hash table.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)
See the javadoc at java.lang.Object http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/Object.html#hashCode()
or you can read this for an explanation
http://www.javaworld.com/javaworld/javaqa/2002-06/01-qa-0621-hashtable.html
I hope it helps
Storing a value in a HashMap depends on the key's hashCode() and equals() methods, and so does retrieving it with get().
When a HashMap is used, the keys in it are unique. This uniqueness is checked using the equals() and hashCode() methods provided by the class of the key objects.
This is done by first using hashCode() to find the bucket and then comparing with equals() against the keys already in that bucket. Note that two references to the same object always produce the same hash code and are equal to each other.
Hence, once the equals() test passes, the key is not inserted a second time, since the map must have unique keys; only its value is replaced. Each bucket covers a range of hashCode values, and the keys are grouped into buckets accordingly.
EDIT to provide an example:
For example, let us consider that two objects have a String attribute with values "hello" and "hlleo" and suppose the hashCode() function is programmed such that the hash code of an object is the sum of the ASCII values of the characters in the String attribute and the equals() method returns true if the values of the String attribute are equal.
So, in the above case, equals() returns false as the strings are not equal, but the hashCode will be the same. So the two objects will be placed in the same hash code bucket.
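Here is that example as a rough sketch. The Word class, with its sum-of-characters hashCode(), is hypothetical and written only to mirror the description above.

```java
import java.util.HashMap;
import java.util.Map;

class AsciiSumDemo {
    // Hypothetical class: hash code is the sum of the character values,
    // equality is based on the String attribute.
    static final class Word {
        final String text;

        Word(String text) { this.text = text; }

        @Override public int hashCode() {
            int sum = 0;
            for (char c : text.toCharArray()) {
                sum += c;
            }
            return sum;
        }

        @Override public boolean equals(Object other) {
            return other instanceof Word && ((Word) other).text.equals(text);
        }
    }

    static Map<Word, String> build() {
        Map<Word, String> map = new HashMap<>();
        map.put(new Word("hello"), "first");
        map.put(new Word("hlleo"), "second"); // same hash code, but equals() is false
        return map;
    }

    public static void main(String[] args) {
        Word a = new Word("hello");
        Word b = new Word("hlleo");
        System.out.println(a.hashCode() == b.hashCode()); // true
        System.out.println(a.equals(b));                  // false
        System.out.println(build().size());               // 2
    }
}
```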
Hope that helps.
Apart from the fact that HashSet does not allow duplicate values, what is the difference between HashMap and HashSet?
I mean implementation wise? It's a little bit vague because both use hash tables to store values.
HashSet is a set, e.g. {1,2,3,4,5}
HashMap is a key -> value (key to value) map, e.g. {a -> 1, b -> 2, c -> 2, d -> 1}
Notice in my example above that in the HashMap there must not be duplicate keys, but it may have duplicate values.
In the HashSet, there must be no duplicate elements.
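A short sketch of both rules, using the values from the example above. The class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class DuplicatesDemo {
    // Adds the same value twice and reports the result of the second add().
    static boolean secondAddSucceeds(Set<Integer> set, int value) {
        set.add(value);
        return set.add(value); // false: the element is already present
    }

    static Map<String, Integer> buildMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 2); // duplicate value under a different key is fine
        map.put("d", 1);
        return map;
    }

    public static void main(String[] args) {
        Set<Integer> set = new HashSet<>();
        System.out.println(secondAddSucceeds(set, 3)); // false
        System.out.println(set.size());                // 1

        Map<String, Integer> map = buildMap();
        System.out.println(map.size());                         // 4 keys
        System.out.println(new HashSet<>(map.values()).size()); // 2 distinct values
    }
}
```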
They are entirely different constructs. A HashMap is an implementation of Map. A Map maps keys to values. The key look up occurs using the hash.
On the other hand, a HashSet is an implementation of Set. A Set is designed to match the mathematical model of a set. A HashSet does use a HashMap to back its implementation, as you noted. However, it implements an entirely different interface.
When you are looking for what will be the best Collection for your purposes, this Tutorial is a good starting place. If you truly want to know what's going on, there's a book for that, too.
HashSet
HashSet class implements the Set interface.
In HashSet, we store objects (elements or values), e.g. a HashSet of string elements could look like this: {"Hello", "Hi", "Bye", "Run"}
HashSet does not allow duplicate elements, which means you cannot store duplicate values in a HashSet.
HashSet permits a single null value.
HashSet is not synchronized, which means it is not suitable for thread-safe operations unless synchronized explicitly.
          add    contains   next     notes
HashSet   O(1)   O(1)       O(h/n)   h is the table
HashMap
HashMap class implements the Map interface.
HashMap is used for storing key & value pairs. In short, it maintains the mapping of key & value (the HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls). This is how you could represent HashMap elements with an integer key and a value of String type: {1->"Hello", 2->"Hi", 3->"Bye", 4->"Run"}
HashMap does not allow duplicate keys; however, it allows duplicate values.
HashMap permits a single null key and any number of null values.
HashMap is not synchronized, which means it is not suitable for thread-safe operations unless synchronized explicitly.
          get    containsKey   next     notes
HashMap   O(1)   O(1)          O(h/n)   h is the table
Please refer to this article to find more information.
It's really a shame that both their names start with Hash. That's the least important part of them. The important parts come after the Hash - the Set and Map, as others have pointed out. What they are, respectively, are a Set - an unordered collection - and a Map - a collection with keyed access. They happen to be implemented with hashes - that's where the names come from - but their essence is hidden behind that part of their names.
Don't be confused by their names; they are deeply different things.
HashSet internally uses a HashMap. If you look at the internal implementation, the values inserted into a HashSet are stored as keys in the HashMap, and the value is a dummy object of class Object.
The differences between HashMap and HashSet are:
HashMap contains key-value pairs and each value can be accessed by its key, whereas HashSet has no get method: you can test membership with contains(), but retrieving an element requires iterating.
HashMap implements the Map interface and allows one null key and multiple null values, whereas HashSet implements the Set interface and allows only one null element and no duplicate values. (Remember that one null key is allowed in a HashMap, hence one null element in a HashSet, as HashSet uses a HashMap internally.)
HashSet and HashMap do not maintain insertion order while iterating.
HashSet allows us to store objects in the set, whereas HashMap allows us to store objects on the basis of key and value. Every stored value has a key.
As the names imply, a HashMap is an associative Map (mapping from a key to a value), a HashSet is just a Set.
Differences between HashSet and HashMap in Java
1) The first and most significant difference between HashMap and HashSet is that HashMap is an implementation of the Map interface while HashSet is an implementation of the Set interface, which means HashMap is a key-value based data structure and HashSet guarantees uniqueness by not allowing duplicates. In reality HashSet is a wrapper around HashMap in Java; if you look at the code of the add(E e) method of HashSet.java you will see the following code:
public boolean add(E e)
{
return map.put(e, PRESENT)==null;
}
where it puts the object into the map as the key, and the value is a final object PRESENT, which is a dummy.
2) The second difference between HashMap and HashSet is that we use the add() method to put elements into a Set, but we use the put() method to insert a key and value into a HashMap in Java.
3) HashSet allows only one null element, but HashMap allows one null key and multiple null values.
That's all on the difference between HashSet and HashMap in Java. In summary, HashSet and HashMap are two different types of collection, one being a Set and the other a Map.
Differences between HashSet and HashMap in Java
HashSet internally uses a HashMap to store objects. When add(String) is called, it calls HashMap's put(key, value) method, where the key is the String object and the value is a dummy Object. So it maintains no duplicates, because the set's elements are the map's keys.
The objects stored as keys in a HashSet/HashMap should override hashCode and equals according to their contract.
Keys used to access/store value objects in a HashMap should be immutable (for example, with final fields), because if a key is modified after insertion the value object can no longer be located and get returns null.
A HashMap is to add, get, remove, ... objects indexed by a custom key of any type.
A HashSet is to add elements, remove elements and check if elements are present, using their hashes to locate them and equals() to compare them.
So a HashMap contains keys with associated values, and a HashSet contains only the elements themselves.
A HashSet uses a HashMap internally to store its entries. Each entry in the internal HashMap is keyed by a single Object, so all entries hash into the same bucket. I don't recall what the internal HashMap uses to store its values, but it doesn't really matter since that internal container will never contain duplicate values.
EDIT: To address Matthew's comment, he's right; I had it backwards. The internal HashMap is keyed with the Objects that make up the Set elements. The values of the HashMap are an Object that's just simply stored in the HashMap buckets.
Differences:
with respect to hierarchy:
HashSet implements Set.
HashMap implements Map and stores a mapping of keys and values.
A use of HashSet and HashMap with respect to database would help you understand the significance of each.
HashSet: is generally used for storing a collection of unique objects.
E.g.: it might be used as the implementation class for storing a one-to-many relationship between class Item and class Bid (where an Item has many Bids).
HashMap: is used to map a key to a value. The value may be null or any Object / list of Objects (which is an object in itself).
A HashSet is implemented in terms of a HashMap. It's a mapping between the key and a PRESENT object.
HashMap is a Map implementation, allowing duplicate values but not duplicate keys. For adding an object, a key/value pair is required. Null keys and null values are allowed. E.g.:
{The->3,world->5,is->2,nice->4}
HashSet is a Set implementation, which does not allow duplicates. If you try to add a duplicate object by calling the public boolean add(Object o) method, the set remains unchanged and the call returns false. E.g.:
[The,world,is,nice]
Basically, in a HashMap the user has to provide both key and value, whereas in a HashSet you provide only the element. The element itself is used as the key, with a dummy object as the value, so a HashSet can be stored as a HashMap internally.
HashSet and HashMap both store pairs internally; the difference is that in a HashMap you specify the value for each key, while in a HashSet the element is the key and the value is a fixed placeholder.
HashMaps allow one null key and null values. They are not synchronized, which increases efficiency. If required, you can make them synchronized using Collections.synchronizedMap().
Hashtables don't allow null keys and are synchronized.
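A small sketch of these two points: a HashMap wrapped with Collections.synchronizedMap() still accepts a null key, while a Hashtable rejects it with a NullPointerException. The helper acceptsNullKey and the class name are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullKeyDemo {
    // Returns true if the map accepts a null key, false if put() throws NullPointerException.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Synchronized wrapper around a HashMap: every method call locks on the wrapper
        Map<String, String> syncMap = Collections.synchronizedMap(new HashMap<String, String>());
        System.out.println(acceptsNullKey(syncMap));                         // true
        System.out.println(acceptsNullKey(new Hashtable<String, String>())); // false
    }
}
```

Note that with the synchronized wrapper, iterating over the map still needs manual synchronization on the wrapper object, as described in the Collections.synchronizedMap() documentation.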
The main difference between them you can find as follows:
HashSet
It does not allow duplicate elements.
It is not synchronized, so it has better performance.
It allows a null element.
HashSet can be used when you want to maintain a unique list.
HashSet implements the Set interface and is backed by a hash table (actually a HashMap instance).
HashSet stores objects.
HashSet doesn't allow duplicate elements, but a null value is allowed.
This class doesn't guarantee that the order will remain constant over time.
HashMap
It does not allow duplicate keys, but it allows duplicate values.
It is not synchronized, so it has better performance.
HashMap does not maintain insertion order.
The order is defined by the hash function.
It is not thread safe.
It allows one null key and as many null values as you like.
HashMap is a hash table-based implementation of the Map interface.
HashMap stores objects as key and value pairs.
The ordering of the elements is not guaranteed over time.
EDIT - this answer isn't correct. I'm leaving it here in case other people have a similar idea. b.roth and justkt have the correct answers above.
--- original ---
you pretty much answered your own question - hashset doesn't allow duplicate values. it would be trivial to build a hashset using a backing hashmap (and just a check to see if the value already exists). i guess the various java implementations either do that, or implement some custom code to do it more efficiently.
HashMap is a implementation of Map interface
HashSet is an implementation of Set Interface
HashMap Stores data in form of key value pair
HashSet Store only objects
The put method is used to add an element to a map
The add method is used to add an element to a Set
In a HashMap the hash code value is calculated using the key object
In a HashSet the member object is used for calculating the hash code value, which can be the same for two objects, so the equals() method is used to check for equality; if it returns false, the two objects are different.
HashMap and HashSet have essentially the same lookup performance, since HashSet is backed by a HashMap
I have been trying to understand the internal implementation of java.util.HashMap and java.util.HashSet.
The following questions have been on my mind for a while:
What is the importance of the @Override public int hashCode() in a HashMap/HashSet? Where is this hash code used internally?
I have generally seen the key of the HashMap be a String like myMap<String,Object>. Can I map the values against someObject (instead of String) like myMap<someObject, Object>? What contracts do I need to obey for this to work?
Thanks in advance !
EDIT:
Are we saying that the hash code of the key (check!) is the actual thing against which the value is mapped in the hash table? And when we do myMap.get(someKey); java is internally calling someKey.hashCode() to get the number in the Hash table to be looked for the resulting value?
Answer: Yes.
EDIT 2:
In a java.util.HashSet, from where is the key generated for the Hash table? Is it from the object that we are adding eg. mySet.add(myObject); then myObject.hashCode() is going to decide where this is placed in the hash table? (as we don't give keys in a HashSet).
Answer: The object added becomes the key. The value is dummy!
The answer to question 2 is easy - yes you can use any Object you like. Maps that have String type keys are widely used because they are typical data structures for naming services. But in general, you can map any two types like Map<Car,Vendor> or Map<Student,Course>.
For the hashCode() method it's as answered before: whenever you override equals(), you have to override hashCode() to obey the contract. On the other hand, if you're happy with the standard implementation of equals(), then you shouldn't touch hashCode() (because that could break the contract and result in different hash codes for equal objects).
Practical sidenote: Eclipse (and probably other IDEs as well) can auto-generate a pair of equals() and hashCode() implementations for your class, based just on the class members.
Edit
For your additional question: yes, exactly. Look at the source code for HashMap.get(Object key); it calls key.hashCode() to calculate the position (bin) in the internal hash table and returns the value at that position (if there is one).
But be careful with 'handmade' hashCode/equals methods: if you use an object as a key, make sure that the hash code doesn't change afterwards, otherwise you won't find the mapped values anymore. In other words, the fields you use to calculate equals and hashCode should be final (or 'unchangeable' after creation of the object).
Assume we have a contact with String name and String phonenumber, and we use both fields to calculate equals() and hashCode(). Now we create "John Doe" with his mobile phone number and map him to his favorite Donut shop. hashCode() is used to calculate the index (bin) in the hash table, and that's where the donut shop is stored.
Now we learn that he has a new phone number and we change the phone number field of the John Doe object. This results in a new hash code. And this hash code resolves to a new hash table index, which usually isn't the position where John Doe's favorite Donut shop was stored.
The problem is clear: in this case we wanted to map "John Doe" to the Donut shop, and not "John Doe with a specific phone number". So we have to be careful with autogenerated equals/hashCode to make sure they're what we really want, because they might use unwanted fields, introducing trouble with HashMaps and HashSets.
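The John Doe scenario can be reproduced with a few lines. This is only a sketch: the Contact class, its field names and the phone numbers are invented for the example, and both fields are deliberately used in equals() and hashCode().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class MutableKeyDemo {
    // Hypothetical contact class; phoneNumber is mutable and takes part in equals()/hashCode().
    static final class Contact {
        final String name;
        String phoneNumber;

        Contact(String name, String phoneNumber) {
            this.name = name;
            this.phoneNumber = phoneNumber;
        }

        @Override public boolean equals(Object other) {
            if (!(other instanceof Contact)) return false;
            Contact c = (Contact) other;
            return name.equals(c.name) && phoneNumber.equals(c.phoneNumber);
        }

        @Override public int hashCode() {
            return Objects.hash(name, phoneNumber);
        }
    }

    // Returns what the map finds for the contact after its phone number has changed.
    static String lookupAfterChange() {
        Map<Contact, String> favoriteShop = new HashMap<>();
        Contact john = new Contact("John Doe", "555-0100");
        favoriteShop.put(john, "Donut shop");

        john.phoneNumber = "555-0199"; // the key's hash code changes after insertion
        return favoriteShop.get(john); // the entry is still stored under the old hash
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterChange()); // null
    }
}
```

Using only the name in equals() and hashCode(), or making the key immutable, avoids the lost entry.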
Edit 2
If you add an object to a HashSet, the Object is the key for the internal hash table, the value is set but unused (just a static instance of Object). Here's the implementation from the openjdk 6 (b17):
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
private transient HashMap<E,Object> map;
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
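To make the idea concrete, here is a stripped-down sketch of a set built on top of a HashMap in the same way. SimpleSet is a hypothetical class for illustration only, not the JDK implementation; it adds contains() and size() so the behaviour can be tried out.

```java
import java.util.HashMap;
import java.util.Map;

class SimpleSet<E> {
    // Dummy value stored for every element; only the keys matter.
    private static final Object PRESENT = new Object();

    private final Map<E, Object> map = new HashMap<>();

    // Returns true if the element was not present before.
    boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    boolean contains(Object o) {
        return map.containsKey(o);
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        SimpleSet<String> set = new SimpleSet<>();
        System.out.println(set.add("myObject"));      // true
        System.out.println(set.add("myObject"));      // false, already a key in the map
        System.out.println(set.contains("myObject")); // true
        System.out.println(set.size());               // 1
    }
}
```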
Hashing containers like HashMap and HashSet provide fast access to elements stored in them by splitting their contents into "buckets".
For example the list of numbers: 1, 2, 3, 4, 5, 6, 7, 8 stored in a List would look (conceptually) in memory something like: [1, 2, 3, 4, 5, 6, 7, 8].
Storing the same set of numbers in a Set would look more like this: [1, 2] [3, 4] [5, 6] [7, 8]. In this example the list has been split into 4 buckets.
Now imagine you want to find the value 6 in both the List and the Set. With a list you would have to start at the beginning of the list and check each value until you get to 6; this will take 6 steps. With a set you find the correct bucket, then check each of the items in that bucket (only 2 in our example), making this a 3 step process. The value of this approach increases dramatically the more data you have.
But wait, how did we know which bucket to look in? That is where the hashCode method comes in. To determine the bucket in which to look for an item, Java hashing containers call hashCode and then apply some function to the result. This function tries to balance the number of buckets and the number of items for the fastest lookup possible.
During lookup, once the correct bucket has been found, each item in that bucket is compared one at a time, as in a list. That is why when you override hashCode you must also override equals. So if an object of any type has both an equals and a hashCode method, it can be used as a key in a Map or an entry in a Set. There is a contract that must be followed to implement these methods correctly; the canonical text on this is from Josh Bloch's great book Effective Java: Item 8: Always override hashCode when you override equals
What is the importance of the @Override public int hashCode() in a HashMap/HashSet?
This allows the instance of the map to produce a useful hash code depending on the content of the map. Two maps with the same content will produce the same hash code. If the content is different, the hash code will be different.
Where is this hash code used internally?
Never. This code only exists so you can use a map as a key in another map.
Can I map the values against someObject (instead of String) like myMap<someObject, Object>?
Yes but someObject must be a class, not an object (your name suggests that you want to pass in object; it should be SomeObject to make it clear you're referring to the type).
What all contracts do I need to obey for this happen successfully?
The class must implement hashCode() and equals().
[EDIT]
Are we saying that the hash code of the key (check!) is the actual thing against which the value is mapped in the hash table?
Yes.
Yes. You can use any object as the key in a HashMap. In order to do so following are the steps you have to follow.
Override equals.
Override hashCode.
The contracts for both the methods are very clearly mentioned in documentation of java.lang.Object. http://java.sun.com/javase/6/docs/api/java/lang/Object.html
And yes hashCode() method is used internally by HashMap and hence returning proper value is important for performance.
Here is the put() method from HashMap, which shows where the key's hashCode() is called:
    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }
It is clear from the above code that the hashCode of each key is not just used for the hashCode() of the map, but also for finding the bucket in which to place the key/value pair. That is why hashCode() affects the performance of the HashMap.
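The replace-or-insert behaviour of put() can be observed from the outside through its return value. A minimal sketch (the class name PutReplaceDemo is just for this example):

```java
import java.util.HashMap;
import java.util.Map;

public class PutReplaceDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();

        Integer before = map.put("key", 1);   // no previous mapping
        Integer replaced = map.put("key", 2); // equal key: value is overwritten

        System.out.println(before);         // null
        System.out.println(replaced);       // 1
        System.out.println(map.get("key")); // 2
        System.out.println(map.size());     // 1
    }
}
```

No exception is thrown for the second put; the old value is simply returned and replaced.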
Any Object in Java must have a hashCode() method; HashMap and HashSet are no exceptions. This hash code is used if you insert the hash map/set into another hash map/set.
Any class type can be used as the key in a HashMap/HashSet. This requires that the hashCode() method returns equal values for equal objects, and that the equals() method is implemented according to contract (reflexive, transitive, symmetric). The default implementations from Object already obey these contracts, but you may want to override them if you want value equality instead of reference equality.
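To tie this back to the original question, here is a small sketch using plain Object instances, which keep the default reference-based equals() and hashCode(). The class name IdentityKeyDemo and the int values are placeholders.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IdentityKeyDemo {
    public static void main(String[] args) {
        List<Object> listOne = new ArrayList<>();
        List<Object> listTwo = new ArrayList<>();
        Object o = new Object();
        listOne.add(o);
        listTwo.add(o);

        Map<Object, Integer> uniqueObjectMap = new HashMap<>();
        uniqueObjectMap.put(listOne.get(0), 1);
        uniqueObjectMap.put(listTwo.get(0), 2); // same reference: value replaced
        uniqueObjectMap.put(new Object(), 3);   // distinct instance: new entry

        System.out.println(uniqueObjectMap.size());                      // 2
        System.out.println(uniqueObjectMap.containsKey(listTwo.get(0))); // true
        System.out.println(uniqueObjectMap.get(o));                      // 2
    }
}
```

With the default implementations no field-by-field comparison happens; two keys match only if they are the very same instance.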
There is an intricate relationship between equals(), hashCode() and hash tables in general in Java (and .NET too, for that matter). To quote from the documentation:
public int hashCode()
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
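The third point of the contract (unequal objects may share a hash code) can be seen with two short strings. The class name CollisionDemo is made up; the strings "Aa" and "BB" are a well-known colliding pair under String's documented hash formula.

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    public static void main(String[] args) {
        // Same hash code, but not equal.
        System.out.println("Aa".hashCode());     // 2112
        System.out.println("BB".hashCode());     // 2112
        System.out.println("Aa".equals("BB"));   // false

        // The map still keeps them apart, because equals() has the final say.
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.size());          // 2
        System.out.println(map.get("Aa"));       // 1
    }
}
```

A collision like this costs an extra equals() comparison inside the bucket, which is why distinct hash codes help performance but are not needed for correctness.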
The line
@Override public int hashCode()
just tells you that the hashCode() method is overridden. This is usually a sign that it's safe to use the type as a key in a HashMap.
And yes, you can easily use any object which obeys the contract for equals() and hashCode() as a key in a HashMap.
In answer to question 2: although any class can be used as the key in a HashMap, the best practice is to use immutable classes as keys. Or at the least, if your hashCode() and equals() implementations depend on some of the attributes of your class, then you should take care not to provide methods that alter these attributes.
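A small sketch of what can go wrong with a mutable key. It uses an ArrayList as the key, since its hashCode() depends on its elements; the class name MutableKeyDemo is just for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MutableKeyDemo {
    public static void main(String[] args) {
        List<String> key = new ArrayList<>();
        key.add("a");

        Map<List<String>, String> map = new HashMap<>();
        map.put(key, "value");
        System.out.println(map.get(key)); // value

        // Mutating the key changes its hash code after it was stored.
        key.add("b");

        // The entry is still in the map, but the lookup no longer finds it,
        // because the map remembers the hash from the time of insertion.
        System.out.println(map.get(key)); // null
        System.out.println(map.size());   // 1
    }
}
```

The entry is effectively lost until the key is restored to its original state, which is the reason for preferring immutable keys.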
Aaron Digulla is absolutely correct. An interesting additional note that people don't seem to realise is that the value returned by the key object's hashCode() method is not used verbatim. It is, in fact, rehashed by the HashMap, i.e. it calls hash(someKey.hashCode()), where hash() is an internal hashing method.
To see this, have a look at the source: http://kickjava.com/src/java/util/HashMap.java.htm
The reason for this is that some people implement hashCode() poorly and the hash() function gives a better hash distribution. It's basically done for performance reasons.
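To give a feel for why rehashing helps, here is an illustrative sketch. The spread() function below is an assumption chosen for simplicity (it folds the high 16 bits into the low 16 bits) and is not meant as the exact function of any particular JDK version; indexFor() assumes a power-of-two table length.

```java
public class SpreadDemo {
    // Illustrative rehash step: mix the high bits into the low bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Pick a bucket using only the low bits (tableLength must be a power of two).
    static int indexFor(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        // Two "poor" hash codes that differ only in their high bits.
        int a = 0x10000;
        int b = 0x20000;

        // Without rehashing both fall into bucket 0 of a 16-slot table.
        System.out.println(indexFor(a, 16));         // 0
        System.out.println(indexFor(b, 16));         // 0

        // After spreading they land in different buckets.
        System.out.println(indexFor(spread(a), 16)); // 1
        System.out.println(indexFor(spread(b), 16)); // 2
    }
}
```

Because the bucket index only looks at the low bits, hash codes that vary mostly in the high bits would otherwise pile up in a few buckets.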
The hashCode method for collection classes like HashSet, Hashtable, HashMap etc.: hashCode returns an integer for the object, which is used for the purpose of hashing. The default implementation in Object is typically derived from the internal address of the object. The hashCode method should be overridden in every class that overrides the equals method.
Three general contracts for the hashCode method:
If two objects are equal according to the equals method, then calling hashCode on both objects must produce the same integer value.
If it is called several times on a single object, it should return the same integer value each time, provided nothing used in equals comparisons has changed.
If two objects are unequal according to the equals method, then calling hashCode on both objects is not required to produce distinct values.