Object's **hashCode** function - how does the JDK use it? - java

I know that whenever you override the equals method you should also override the hashCode method.
What I'm not sure about is how the JDK uses it.
For example, HashSet and HashMap are set/map implementations built on a hash table. So is it correct to say that this table uses the object's hash code as the key for its hash function?

So is it correct to say that this table uses the object's hash code as the key for its hash function?
Almost. hashCode() is actually the hash function. Whenever HashMap tries to find or put a key, it calls the key's hashCode() method and uses the result (with some bit masking) to find the proper slot in the hash table.
Also note that it's not used directly by the JVM, just by some classes.
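As a rough sketch of what "with some bit masking" means: OpenJDK's HashMap (Java 8 and later) mixes the high bits of hashCode() into the low bits and then masks the result with the table length minus one. The class and method names below are made up for illustration; only the arithmetic follows the JDK source, and other JDK versions may differ.

```java
class BucketIndexDemo {
    // Fold the upper 16 bits of the hash code into the lower 16 bits,
    // so that small tables still see some influence from the high bits.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Assumes tableLength is a power of two, so (tableLength - 1) is a bit mask.
    static int bucketIndex(Object key, int tableLength) {
        return spread(key) & (tableLength - 1);
    }

    public static void main(String[] args) {
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> bucket " + bucketIndex(key, 16));
        }
    }
}
```

Two keys with the same hashCode() always land in the same bucket; equals() is then needed to tell them apart.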

The answer to this is readily found in the documentation:
If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.
So yes, HashMap uses hashCode.
You can also see the source code, as the JDK is open source. (You'll find it in src.jar in your JDK installation.)
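To make the documentation's warning concrete, here is a small sketch with a hypothetical key type (BadKey, not part of the JDK) whose hashCode() is legal but returns the same value for every instance. The map stays correct because equals() resolves the collisions, but every entry ends up in a single bucket, which is the slow case the documentation describes.

```java
import java.util.HashMap;
import java.util.Map;

class CollidingKeysDemo {
    // Hypothetical key type: equals() is fine, hashCode() is constant.
    static final class BadKey {
        final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override
        public int hashCode() {
            return 42; // every key collides
        }
    }

    // Builds a map with n colliding keys, each mapped to its own id.
    static Map<BadKey, Integer> fill(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = fill(1_000);
        System.out.println(map.get(new BadKey(500))); // 500
        System.out.println(map.size());               // 1000
    }
}
```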

Related

Hashtable implementation in C and Java

In Java, HashMap and Hashtable both implement the Map interface and store key/value pairs using a hash function and an array/linked-list implementation. In C, a hash table can also be implemented with arrays and linked lists, but there is no built-in concept of a key/value pair like Map.
So my question is: is a hash table implementation in C similar to Hashtable in Java, or is it closer to HashSet in Java (apart from the unique-elements condition)?
Both semantics (Hashtable and HashSet) can be implemented in C, but neither comes in the Standard C library. You can find many different hash table implementations on the Internet, each with its own advantages and drawbacks. Implementing this yourself may prove difficult as there are many traps and pitfalls.
I previously used BSD's red-black tree implementation. It's relatively easy to use once you start to understand how it works.
The really great thing about it is that you only have to copy one header file and include it where it's needed; there is no need to link to libraries.
It has similar functionality to HashSets: you can find by key with the RB_FIND() macro, enumerate elements with RB_FOREACH(), insert new ones with RB_INSERT() and so on.
You can find more info in its man page or the source code itself.
The difference (in Java) between a Hashtable and a HashSet is in which object supplies the hash value. In a HashSet the key is the stored instance itself, and hashCode() is applied to the complete instance (Object provides both the hashCode() and equals(Object) methods). With an external key, equals(Object) and hashCode() are taken from the separate key instance instead of from the stored data value. The two are closely related in the JDK, but the relationship is composition rather than inheritance: HashSet is backed internally by a HashMap (and TreeSet by a TreeMap), while Hashtable is a separate class that extends Dictionary and keeps its pairs in an internal implementation of the Map.Entry<K,V> interface.
Implementing a hash table in C is not too difficult, but you first need to understand what the key is (if external), the difference between the key and the value, the difference between calculating a hashCode() and comparing for equality, how you are going to distinguish the key from the value, and how you manage keys and hashes internally in order to handle collisions.
I recently started an implementation of a hash table in C (not yet finished). My hash_table constructor needs to store two function pointers in the instance record: an equality comparison routine (playing the role of Java's equals() method, which lets you detect collisions, that is, two entries with the same hash that compare as different) and the hash function applied to keys. In my implementation I will probably store the full hash value returned by the hash function (before reducing it to the table's size), so that when the table grows the elements can be placed in the new table without recalculating all the hashes.
I don't know if these hints can be of help to you, but it's my two cents. :)
My implementation uses a table of prime numbers to select the capacity (approximately doubling the size at each step) when resizing the table once the number of collisions becomes unacceptable (whatever that means to you; I have no clear idea yet). Resizing is a time-consuming operation that happens rarely, so it must be carefully specified, and it is something that Java's Hashtable does. But if you don't plan to grow your hash table, or plan to do it manually, the implementation is easier: just add the number of entries to the constructor parameter list and create some grow_hash_table(new_cap) method.
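The design described above (caller-supplied hash and equality routines, a cached full hash per entry, and a manual grow step) can be sketched roughly as follows. This is written in Java rather than C to match the rest of the page, and it is not the author's unfinished implementation: CachedHashTable, its fields and its methods are all hypothetical names chosen for the illustration.

```java
import java.util.function.BiPredicate;
import java.util.function.ToIntFunction;

class CachedHashTable<K, V> {
    private static final class Node<K, V> {
        final K key;
        final int hash; // full hash, cached so grow() never calls the hash function again
        V value;
        Node<K, V> next;

        Node(K key, int hash, V value, Node<K, V> next) {
            this.key = key;
            this.hash = hash;
            this.value = value;
            this.next = next;
        }
    }

    private final ToIntFunction<K> hasher;     // stands in for the C hash function pointer
    private final BiPredicate<K, K> equality;  // stands in for the C equality function pointer
    private Node<K, V>[] buckets;
    private int size;

    @SuppressWarnings("unchecked")
    CachedHashTable(int capacity, ToIntFunction<K> hasher, BiPredicate<K, K> equality) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be positive");
        this.buckets = (Node<K, V>[]) new Node[capacity];
        this.hasher = hasher;
        this.equality = equality;
    }

    // Reduce the full hash to a bucket index; floorMod copes with negative hashes.
    private static int indexFor(int hash, int length) {
        return Math.floorMod(hash, length);
    }

    V get(K key) {
        int hash = hasher.applyAsInt(key);
        for (Node<K, V> n = buckets[indexFor(hash, buckets.length)]; n != null; n = n.next) {
            if (n.hash == hash && equality.test(n.key, key)) return n.value;
        }
        return null;
    }

    void put(K key, V value) {
        int hash = hasher.applyAsInt(key);
        int i = indexFor(hash, buckets.length);
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (n.hash == hash && equality.test(n.key, key)) {
                n.value = value; // same key: overwrite
                return;
            }
        }
        buckets[i] = new Node<>(key, hash, value, buckets[i]);
        size++;
    }

    // Move every node into a larger array using only the cached hashes.
    @SuppressWarnings("unchecked")
    void grow(int newCapacity) {
        if (newCapacity < 1) throw new IllegalArgumentException("capacity must be positive");
        Node<K, V>[] bigger = (Node<K, V>[]) new Node[newCapacity];
        for (Node<K, V> head : buckets) {
            while (head != null) {
                Node<K, V> next = head.next;
                int i = indexFor(head.hash, newCapacity);
                head.next = bigger[i];
                bigger[i] = head;
                head = next;
            }
        }
        buckets = bigger;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        CachedHashTable<String, Integer> table =
            new CachedHashTable<String, Integer>(7, String::hashCode, String::equals);
        table.put("one", 1);
        table.put("two", 2);
        table.grow(17); // first prime after doubling 7
        System.out.println(table.get("two")); // 2
    }
}
```

Deciding when to call grow() (for example, from a load-factor threshold) is left out on purpose, since the answer itself leaves that policy open.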

Can hash set internally use some other collection instead of hash map

Why does HashSet internally use only a HashMap? Is it something related to performance?
A HashSet can be thought of as a special case of a HashMap in which you don't actually care about the type of values, only whether a value is associated with a particular key.
So, it made sense to just implement one on top of the other.
A HashMap is a good choice if your key type has a good hash function.
Similarly, TreeSet is implemented using TreeMap, which can be effective if your keys are ordered/comparable.
You can implement the Set interface in many other ways, but these are the typical ones.
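A minimal sketch of the "set as a special case of a map" idea, assuming a shared dummy value for every key. MapBackedSet is a hypothetical class written for this illustration; OpenJDK's HashSet follows the same general approach, but this is not its source.

```java
import java.util.AbstractSet;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class MapBackedSet<E> extends AbstractSet<E> {
    // Shared dummy value: only the presence of the key matters.
    private static final Object PRESENT = new Object();

    private final Map<E, Object> map = new HashMap<>();

    @Override
    public boolean add(E e) {
        // put() returns null when the key was not present before.
        return map.put(e, PRESENT) == null;
    }

    @Override
    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    @Override
    public boolean remove(Object o) {
        return map.remove(o) != null;
    }

    @Override
    public Iterator<E> iterator() {
        return map.keySet().iterator();
    }

    @Override
    public int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MapBackedSet<String> set = new MapBackedSet<>();
        System.out.println(set.add("a")); // true
        System.out.println(set.add("a")); // false, already present
        System.out.println(set.size());   // 1
    }
}
```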
Nah, it's just convenient, and not actually any less efficient on most VMs. So Java -- at least in some implementations -- doesn't bother doing anything fancier.
Since HashMap and HashSet basically use the same algorithm, it is simpler not to implement it twice, so it isn't surprising that many, if not all, JDK implementations do it this way.
It is also doable with LinkedHashMap/Set, TreeMap/Set and others.
More generally, it is possible to create any Set implementation from any Map implementation by choosing the value as being the same as the key, or be a constant.
The loss in memory storage is negligible.
By the way, the JDK provides a Collections.newSetFromMap method which does exactly this: it converts a Map<E,Boolean> into a Set<E> by mapping all keys to Boolean.TRUE. When there is no corresponding Set implementation to a Map one, this utility method is very useful, for example for ConcurrentHashMap.
The opposite, creating a Map implementation from a Set one, is also doable, though it's slightly more difficult.
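For instance, a concurrent set can be obtained from a ConcurrentHashMap with Collections.newSetFromMap. The wrapper class and helper method names below are hypothetical; the two JDK calls are real.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class SetFromMapDemo {
    // The backing map must be empty when it is handed to newSetFromMap.
    static <E> Set<E> newConcurrentSet() {
        return Collections.newSetFromMap(new ConcurrentHashMap<E, Boolean>());
    }

    public static void main(String[] args) {
        Set<String> users = newConcurrentSet();
        users.add("alice");
        users.add("alice"); // duplicate, ignored
        users.add("bob");
        System.out.println(users.size());            // 2
        System.out.println(users.contains("alice")); // true
    }
}
```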

Mechanism of Java HashMap

I'm reading an algorithms book and need to grasp the concept of a hash table. It covers hashing with separate chaining and hashing with linear probing. I guess Java's HashMap is a hash table, so I'm wondering which mechanism HashMap uses (chaining or probing)?
I need to implement the simplest HashMap with get, put, remove. Could you point me to good material to read on that?
When the unique keys used for the Map are custom objects, we need to implement the hashCode() function in the corresponding type. Did I get that right, or when is hashCode() needed?
Unfortunately the book does not answer all my questions, even though I understand that for many of you these questions are basic.
1: Before Java 1.8, HashMap uses separate chaining with linked lists to resolve collisions; there is a linked list for every bucket. (Since Java 8 it still chains, but a bucket that collects many colliding entries is converted to a balanced tree.)
2: hmmmmmm maybe this one?
3: Yes, you are right: hashCode() is used to calculate the hash of the key. The hash code is then transformed to a number between 0 and the number of buckets - 1.
This is one of the most confusing questions for many of us in interviews, but it's not that complex.
We know:
HashMap stores each key-value pair in a Map.Entry (we all know this).
HashMap works on a hashing algorithm and uses the hashCode() and equals() methods in put() and get() (we know this too).
When we call put with a key-value pair, HashMap uses the key's **hashCode()** with hashing to **find the index** at which to store the pair (this is important).
The Entry is **stored in a linked list**, so if the bucket already holds entries, it uses the **equals() method to check whether the passed key already exists** (this is important too).
If it does, it overwrites the value; otherwise it creates a new Entry and stores the key-value pair.
When we call get with a key, it again uses hashCode() to find the index in the array and then uses equals() to find the correct Entry and return its value (now this is obvious).
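The put/get steps above, plus remove, can be sketched as a very small separate-chaining map. This is a teaching sketch with a fixed number of buckets, not how java.util.HashMap is actually written; SimpleHashMap and all of its members are hypothetical names.

```java
import java.util.Objects;

class SimpleHashMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;

        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Entry<K, V>[] table; // each slot is the head of a linked list
    private int size;

    @SuppressWarnings("unchecked")
    SimpleHashMap(int buckets) {
        table = (Entry<K, V>[]) new Entry[buckets];
    }

    // hashCode() picks the bucket; floorMod keeps the index non-negative.
    private int indexOf(Object key) {
        return Math.floorMod(Objects.hashCode(key), table.length);
    }

    V put(K key, V value) {
        int i = indexOf(key);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) { // equals() finds an existing key
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        table[i] = new Entry<>(key, value, table[i]);
        size++;
        return null;
    }

    V get(Object key) {
        for (Entry<K, V> e = table[indexOf(key)]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    V remove(Object key) {
        int i = indexOf(key);
        Entry<K, V> prev = null;
        for (Entry<K, V> e = table[i]; e != null; prev = e, e = e.next) {
            if (Objects.equals(e.key, key)) {
                if (prev == null) table[i] = e.next; // unlink the head
                else prev.next = e.next;             // unlink from the middle or end
                size--;
                return e.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        SimpleHashMap<String, Integer> map = new SimpleHashMap<>(16);
        map.put("one", 1);
        map.put("two", 2);
        System.out.println(map.get("two"));    // 2
        System.out.println(map.remove("one")); // 1
        System.out.println(map.size());        // 1
    }
}
```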
HashMap works on the principle of hashing. Its working is twofold.
First, it maintains a linked list for entries whose keys hash to the same bucket; within that list, equals() identifies the matching key.
Second, it has a collection of these linked lists whose heads are held in an array.
For more information, refer to the blog Java Collection Internal Working.

Is it possible to implement a MyHashMap backed by a given HashSet implementation?

As we all know, in the Sun (Oracle) JDK, HashSet is implemented on top of a HashMap, to reuse the complicated algorithm and data structure.
But is it possible to implement a MyHashMap using java.util.HashSet as its backing store?
If possible, how? If not, why?
Please note that this question is only a discussion of coding skill, not applicable for production scenarios.
Trove bases its Map on its Set implementation. However, Trove's Set has one critical method which is missing from java.util.Set: a get() method.
Without a get(Element) method, a HashSet cannot perform a lookup, which is a key function of a Map (pardon the pun). The only option Set has is contains(), which could be hacked to perform a get(), but it would not be ideal.
You can have:
a Set where each element is an Entry holding a key and a value.
you define entries as being equal when their keys are the same.
you hack the equals() method so that, when there is a match, on a "put" the value portion of an entry is updated, and on a "get" the value portion is copied.
Set could have been designed to be extended as Map, but it wasn't, and it wouldn't be a good idea to use HashSet or the existing Set implementations to create a Map.
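Purely as an exercise, the equals() hack described above might look like the sketch below. It assumes that the set compares a lookup ("probe") entry against the stored entry through equals(), which is how Set.contains is specified, and it deliberately gives equals() a side effect, so it is not something to use in real code. SetBackedMap and its Entry class are hypothetical names.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class SetBackedMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        final boolean probe; // true for throwaway entries used only for lookups

        Entry(K key, V value, boolean probe) {
            this.key = key;
            this.value = value;
            this.probe = probe;
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(key); // only the key takes part in hashing
        }

        // The hack: entries are equal when their keys are equal, and a matching
        // probe picks up the stored entry's value as a side effect.
        @SuppressWarnings("unchecked")
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Entry)) return false;
            Entry<K, V> other = (Entry<K, V>) o;
            if (!Objects.equals(key, other.key)) return false;
            if (probe && !other.probe) {
                value = other.value;
            } else if (other.probe && !probe) {
                other.value = value;
            }
            return true;
        }
    }

    private final Set<Entry<K, V>> entries = new HashSet<>();

    V get(K key) {
        Entry<K, V> p = new Entry<>(key, null, true);
        return entries.contains(p) ? p.value : null;
    }

    V put(K key, V value) {
        V old = get(key);
        Entry<K, V> e = new Entry<>(key, value, false);
        entries.remove(e); // drop any entry with an equal key
        entries.add(e);
        return old;
    }

    V remove(K key) {
        Entry<K, V> p = new Entry<>(key, null, true);
        return entries.remove(p) ? p.value : null;
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        SetBackedMap<String, Integer> map = new SetBackedMap<>();
        map.put("a", 1);
        map.put("a", 2);
        System.out.println(map.get("a")); // 2
        System.out.println(map.size());   // 1
    }
}
```

A cleaner alternative would be to scan the set for the matching entry, at the cost of turning every lookup into a linear search.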

What does Java use to determine if a key is a duplicate in a Map?

My first instinct is to say each key is an object, and has a hash code, which is what is used to determine if a duplicate is being inserted. However, I can't find anything to back that up for sure. Can someone provide a link that says that, or provide the real answer here? Thanks!
The Map interface specifies that if two keys are null they are duplicates, otherwise if there's a key k such that key.equals(k), then there is a duplicate. See the contains or get method here:
http://java.sun.com/javase/6/docs/api/java/util/Map.html#containsKey(java.lang.Object)
However, it's up to the Map implementation how to go about performing that check, and a HashMap will use a hash code to narrow the potential keys it will check with the equals method. So in practice, for a typical hash based map, to check for duplicates a map will take the hashcode (probably mod some size), and use equals to compare against any keys whose hashcode mod the same size gives the same remainder.
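A short illustration of that two-step check with a HashMap: the strings "Aa" and "BB" happen to have the same hashCode() but are not equal, so they are distinct keys, while a second, equal "Aa" is treated as a duplicate. The demo class and helper method are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

class DuplicateKeyDemo {
    // Inserts every key and reports how many distinct keys the map ended up with.
    static int distinctKeys(String... keys) {
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            map.put(keys[i], i);
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true: same hash code
        System.out.println(distinctKeys("Aa", "BB"));           // 2: equals() says they differ

        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        // A different String object that equals "Aa" counts as the same key.
        Integer previous = map.put(new String("Aa"), 3);
        System.out.println(previous);   // 1: the old value is returned
        System.out.println(map.size()); // 1
    }
}
```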
Read the question wrong, but the person's answer above is correct and my link provides the answer as to how it is determined (the equals method). Look at the contains and get methods in the link.
How a map inserts:
There cannot be a duplicate key in a Map. If it finds a duplicate key, it replaces the old value with the new value. Here is a link to the Map interface. In addition, the documentation of the put(K key, V value) method explains how a map works. Hope that helps.
It uses the equals() method on the key. The hashCode() method just helps efficiently store the keys for the map.
I'm assuming you're referring to java.util.Map, which is an interface provided in the standard Java libraries. The method of determining if a key is duplicate is left up to the specific implementation. A java.util.HashMap uses equals and hashCode, for example. You can write your own implementation of Map that uses something totally different.
Careful with an edge case here. Null keys are not always duplicates. In fact, null keys turn out to cause much frustration across Map implementations (see my post on Consistency).
For example, null keys are OK in a HashMap, but not in a TreeMap that uses natural ordering, or in a ConcurrentHashMap, where null keys are forbidden. The problem is that these classes throw unchecked exceptions from many of their methods if you use a null key, and that introduces scary run-time bugs when you switch implementations during refactoring.
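The difference is easy to reproduce. In this sketch the demo class and the acceptsNullKey helper are hypothetical; the behavior shown is that of the standard java.util classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

class NullKeyDemo {
    // Returns true if the map accepts a null key, false if put() throws NullPointerException.
    static boolean acceptsNullKey(Map<String, Integer> map) {
        try {
            map.put(null, 1);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(new HashMap<String, Integer>()));           // true
        System.out.println(acceptsNullKey(new TreeMap<String, Integer>()));           // false
        System.out.println(acceptsNullKey(new ConcurrentHashMap<String, Integer>())); // false
    }
}
```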
