Hashtable implementation in C and Java - java

In Java, HashMap and Hashtable both implement the Map interface and store key/value pairs using a hash function and an array/linked-list implementation. In C, a hash table can also be implemented with arrays and linked lists, but there is no built-in concept of a key/value pair like Map.
So my question is: is a hash table implementation in C similar to Hashtable in Java, or is it closer to HashSet in Java (apart from the unique-elements condition)?

Both semantics (Hashtable and HashSet) can be implemented in C, but neither comes in the standard C library. You can find many different hash table implementations on the Internet, each with its own advantages and drawbacks. Implementing this yourself may prove difficult, as there are many traps and pitfalls.

I previously used BSD's red-black tree implementation. It's relatively easy to use once you understand how it works.
The really great thing about it is that you only have to copy one header file and include it where it's needed; there is no need to link against a library.
It has functionality similar to a HashSet (although it is an ordered tree, not a hash table): you can find by key with the RB_FIND() macro, enumerate elements with RB_FOREACH(), insert new ones with RB_INSERT(), and so on.
You can find more info in its man page or in the source code itself.

The difference (in Java) between a Hashtable and a HashSet is in which object is used as the key to calculate the hash value. In a HashSet the key is the stored instance itself, and hashCode() is applied to the complete instance (Object provides both the hashCode() and equals(Object) methods). With an external key, as in Hashtable, equals(Object) and hashCode() are called on the separate key instance instead of on the stored value. The two are closely related for that reason, although not by inheritance: Hashtable is not a subclass of HashSet. In the JDK it is the other way around: HashSet is backed by a HashMap whose keys are the set's elements, and the map types store their pairs as implementations of the Map.Entry<K,V> interface.
Implementing a hash table in C is not too difficult, but you first need to decide what the key is (if external) and understand the difference between the key and the value, the difference between calculating a hash code and comparing for equality, how you are going to distinguish the key from the value, and how you will manage keys and hashes internally in order to handle collisions.
I recently started an implementation of a hash table in C (not yet finished), and my hash_table constructor needs to store two function pointers in the instance record: an equality comparison routine (the counterpart of Java's equals() method, which lets you resolve collisions, that is, two entries with the same hash that compare as different) and the hash function applied to keys to get the hash. In my implementation I will probably store the full hash value returned by the hash function (before reducing it to the table's size), so that when the table grows the elements can be placed in the new table without recalculating all the hashes.
I don't know if these hints can be of help to you, but it's my two cents. :)
My implementation uses a table of prime numbers to select the capacity (approximately doubling the size at each step) and resizes the table when the number of collisions becomes unacceptable (whatever that means to you; I have no clear idea yet). Resizing is a time-consuming operation, but it happens rarely, so the trigger must be specified carefully; it is something that Java's Hashtable does too. If you don't plan to grow your hash table, or plan to do it manually, the implementation is easier: just add the number of entries to the constructor's parameter list, and provide some grow_hash_table(new_cap) function.
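The idea of caching the full hash so that growing never re-hashes the keys can be sketched as follows. This is only an illustration in Java rather than the answerer's C code: the class name GrowableTable, the 3/4 growth trigger, and the 2n+1 growth step (a stand-in for a lookup in a table of primes) are all assumptions made for the example.

```java
import java.util.Objects;

// Hypothetical sketch: a chained hash table that caches each key's full hash.
class GrowableTable<K, V> {
    private static final class Node<K, V> {
        final int hash;   // full hash, kept so that growing never calls hashCode() again
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private Node<K, V>[] buckets;
    private int size;

    @SuppressWarnings("unchecked")
    GrowableTable(int capacity) {
        buckets = (Node<K, V>[]) new Node[capacity];
    }

    V put(K key, V value) {
        int hash = Objects.hashCode(key);
        int i = Math.floorMod(hash, buckets.length);   // works for any capacity and negative hashes
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            // Same hash but unequal keys is a collision; equal keys mean an update.
            if (n.hash == hash && Objects.equals(n.key, key)) {
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        buckets[i] = new Node<>(hash, key, value, buckets[i]);
        // Assumed trigger: grow once the table is more than 3/4 full.
        if (++size > buckets.length * 3 / 4) {
            grow(buckets.length * 2 + 1);              // assumed stand-in for "next prime"
        }
        return null;
    }

    V get(K key) {
        int hash = Objects.hashCode(key);
        for (Node<K, V> n = buckets[Math.floorMod(hash, buckets.length)]; n != null; n = n.next) {
            if (n.hash == hash && Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }

    @SuppressWarnings("unchecked")
    void grow(int newCapacity) {
        Node<K, V>[] old = buckets;
        buckets = (Node<K, V>[]) new Node[newCapacity];
        for (Node<K, V> head : old) {
            while (head != null) {
                Node<K, V> next = head.next;
                int i = Math.floorMod(head.hash, newCapacity);   // reuses the cached hash
                head.next = buckets[i];
                buckets[i] = head;
                head = next;
            }
        }
    }

    int size() { return size; }

    int capacity() { return buckets.length; }
}
```

Storing the hash costs one int per entry, but it also makes lookups cheaper, because equals() only has to run when the cached hashes already match.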

Related

Object's **hashCode** function - how does jdk uses it?

I know that whenever you override the equals method you should also override the hashCode method.
But what I'm not sure about is how the JDK uses it.
For example, HashSet and HashMap are set/map implementations that use a hash table. So is it correct to say that this table uses the object's hash code as the key for its hash function?
So is it correct to say that this table uses the object's hash code as the key for its hash function?
Almost. hashCode() is actually the hash function. So whenever HashMap tries to find or put a key, it calls the key's hashCode() method and uses the result (with some bit masking) to find the proper slot in the hash table.
Also note that it's not used directly by the JVM, just by some classes.
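The "bit mask" step can be illustrated with a short sketch. It mirrors what OpenJDK 8 and later do inside HashMap (fold the high bits of the hash code into the low bits, then mask with the table length minus one), but the class and method names here are made up for the example, and the masking only works because the table length is assumed to be a power of two.

```java
// Hypothetical helper showing how a hash code becomes a bucket index.
class BucketIndex {
    // Fold the upper 16 bits into the lower 16 so that small tables still see them.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // tableLength is assumed to be a power of two, as in HashMap.
    static int indexFor(Object key, int tableLength) {
        int h = (key == null) ? 0 : spread(key.hashCode());
        return h & (tableLength - 1);   // keeps only the low bits: 0 .. tableLength - 1
    }

    public static void main(String[] args) {
        System.out.println(indexFor("hello", 16));
        System.out.println(indexFor(17, 16));
    }
}
```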
The answer to this is readily found in the documentation:
If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.
So yes, HashMap uses hashCode.
You can also see the source code, as the JDK is open source. (You'll find it in src.zip in your JDK installation.)

Mechanism of Java HashMap

Reading Algorithms book, need to grasp the concept of a hashtable. They write about hashing with separate chaining and hashing with linear probing. I guess Java's HashMap is a hashtable, therefore I'm wondering what mechanism does HashMap use (chaining or probing)?
I need to implement simplest HashMap with get, put, remove. Could you point me at the good material to read that?
When the unique keys used for the Map are custom objects, we need to implement hashCode() function inside the corresponding type. Did I get it right or when is hashCode() needed?
Unfortunately the book does not answer all questions, even though I understand that for many of you these questions are low level.
1: HashMap uses separate chaining to resolve collisions. Before Java 8 every bucket was a linked list; since Java 8 a bucket that collects many entries may be converted to a balanced tree.
2: hmmmmmm maybe this one?
3: yes, you are right, hashCode() is used to calculate the hash of the Key. Then the hash code will be transformed to a number between 0 and number of buckets - 1.
This is one of the most confusing interview questions for many of us, but it's not that complex.
We know:
HashMap stores each key-value pair in a Map.Entry (we all know this).
HashMap works on a hashing algorithm and uses the hashCode() and equals() methods in put() and get() (we know this too).
When we call put() with a key-value pair, HashMap uses the key's **hashCode()** with hashing to **find the index** at which to store the pair (this is important).
The Entry is **stored in a linked list**, so if the bucket already holds entries, HashMap uses the **equals() method to check whether the passed key already exists** (this is important too).
If it does, the value is overwritten; otherwise a new Entry is created and stored.
When we call get() with a key, HashMap again uses hashCode() to find the index in the array and then uses equals() to find the correct Entry and return its value (now this is obvious).
HashMap works on the principle of hashing. Its working is twofold.
First, it maintains a linked list for the entries whose keys hash to the same bucket; within that list, equals() identifies the matching key.
Second, it keeps a collection of these linked lists, whose heads are stored in an array.
For more information, refer to the blog post Java Collection Internal Working.
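For the "simplest HashMap with get, put, remove" asked about in the question, the flow described in the answers above can be sketched with separate chaining. This is a minimal illustration and not the JDK's code: the class name SimpleChainedMap is made up, the number of buckets is fixed at construction, and there is no resizing.

```java
import java.util.Objects;

// Hypothetical minimal map using separate chaining; not the JDK implementation.
class SimpleChainedMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;

        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Entry<K, V>[] table;
    private int size;

    @SuppressWarnings("unchecked")
    SimpleChainedMap(int buckets) {
        table = (Entry<K, V>[]) new Entry[buckets];
    }

    // hashCode() picks the bucket; floorMod keeps the index non-negative.
    private int indexOf(Object key) {
        return Math.floorMod(Objects.hashCode(key), table.length);
    }

    V put(K key, V value) {
        int i = indexOf(key);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) {   // equals() decides whether the key already exists
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        table[i] = new Entry<>(key, value, table[i]);   // new entry at the head of the chain
        size++;
        return null;
    }

    V get(Object key) {
        for (Entry<K, V> e = table[indexOf(key)]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    V remove(Object key) {
        int i = indexOf(key);
        Entry<K, V> prev = null;
        for (Entry<K, V> e = table[i]; e != null; prev = e, e = e.next) {
            if (Objects.equals(e.key, key)) {
                if (prev == null) {
                    table[i] = e.next;      // unlink the head of the chain
                } else {
                    prev.next = e.next;     // unlink from the middle or end
                }
                size--;
                return e.value;
            }
        }
        return null;
    }

    int size() { return size; }
}
```

Constructing the map with a single bucket forces every key into one chain, which is a convenient way to check that collisions are resolved by equals() alone.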

Map and HashCode

What is the reason to make hashCode unique for a hash-based collection to work faster? And what about not making hashCode mutable?
I read it here but didn't understand, so I read on some other resources and ended up with this question.
Thanks.
Hashcodes don't have to be unique, but they work better if distinct objects have distinct hashcodes.
A common use for hashcodes is for storing and looking objects in data structures like HashMap. These collections store objects in "buckets" and the hashcode of the object being stored is used to determine which bucket it's stored in. This speeds up retrieval. When looking for an object, instead of having to look through all of the objects, the HashMap uses the hashcode to determine which bucket to look in, and it looks only in that bucket.
You asked about mutability. I think what you're asking about is the requirement that an object stored in a HashMap not be mutated while it's in the map, or preferably that the object be immutable. The reason is that, in general, mutating an object will change its hashcode. If an object were stored in a HashMap, its hashcode would be used to determine which bucket it gets stored in. If that object is mutated, its hashcode would change. If the object were looked up at this point, a different hashcode would result. This might point HashMap to the wrong bucket, and as a result the object might not be found even though it was previously stored in that HashMap.
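A small sketch of that failure, using a made-up Key class whose hash code depends on a mutable field. The entry stays in the map, but it can no longer be found through the mutated key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: mutating a key after insertion makes its entry unreachable.
class MutableKeyDemo {
    static final class Key {
        int id;   // mutable on purpose, to show the problem

        Key(int id) { this.id = id; }

        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }

        @Override public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    static boolean foundAfterMutation() {
        Map<Key, String> map = new HashMap<>();
        Key k = new Key(1);
        map.put(k, "value");   // stored under the hash code for id == 1
        k.id = 2;              // the hash code changes while the key is in the map
        return map.containsKey(k);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation());
    }
}
```

Looking up a fresh `new Key(1)` fails as well: it reaches the right bucket, but equals() now compares it against a stored key whose id is 2.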
Hash codes are not required to be unique; a good hash function just makes collisions unlikely.
As to hash codes being immutable, that is required only if an object is going to be used as a key in a HashMap. The hash code tells the HashMap where to do its initial probe into the bucket array. If a key's hash code were to change, then the map would no longer look in the correct bucket and would be unable to find the entry.
hashcode() is basically a function that converts an object into a number. In the case of hash based collections, this number is used to help lookup the object. If this number changes, it means the hash based collection may be storing the object incorrectly, and can no longer retrieve it.
Uniqueness of hash values allows a more even distribution of objects within the internals of the collection, which improves the performance. If everything hashed to the same value (worst case), performance may degrade.
The wikipedia article on hash tables provides a good read that may help explain some of this as well.
It has to do with the way items are stored in a hash table. A hash table will use the element's hash code to store and retrieve it. It's somewhat complicated to fully explain here but you can learn about it by reading this section: http://www.brpreiss.com/books/opus5/html/page206.html#SECTION009100000000000000000
Why searching by hashing is faster?
Let's say you have some unique objects as values and Strings as their keys. Each key should be unique, so that when the key is searched for, you find the object it holds as its value.
Now let's say you have 1000 such key/value pairs, and you want to search for a particular key and retrieve its value. If you don't have hashing, you need to compare your key with all the entries in your table to find it.
But with hashing, you hash your key and put the corresponding object in a certain bucket on insertion. When you later search for a particular key, that key is hashed and its hash value determines the bucket. You can go straight to that bucket and pick your object without having to search through all the keys.
hashCode is a tricky method. It is supposed to provide a shorthand to equality (which is what maps and sets care about). If many objects in your map share the same hashcode, the map will have to check equals frequently - which is generally much more expensive.
Check the javadoc for equals - that method is very tricky to get right even for immutable objects, and using a mutable object as a map key is just asking for trouble (since the object is stored for its "old" hashcode)
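As a hedged illustration of a key that avoids these problems, here is a small immutable class (PointKey is a made-up name) whose equals() and hashCode() are computed from the same final fields, so equal keys always produce equal hash codes and the hash code can never change while the key is in a map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical immutable key: equals() and hashCode() use the same final fields.
final class PointKey {
    private final int x;
    private final int y;

    PointKey(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PointKey)) return false;
        PointKey other = (PointKey) o;
        return x == other.x && y == other.y;
    }

    @Override public int hashCode() {
        return Objects.hash(x, y);
    }
}

class PointKeyDemo {
    // A distinct but equal key instance finds the stored value.
    static String lookup() {
        Map<PointKey, String> map = new HashMap<>();
        map.put(new PointKey(1, 2), "a");
        return map.get(new PointKey(1, 2));
    }
}
```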
As long as you are working with collections from which you retrieve elements by index (0, 1, 2, ... collection.size()-1), you don't need a hash code. However, if we are talking about associative collections like maps, or simply asking a collection whether it contains some element, then we are talking about expensive operations.
A hash code is like a digest of the object. It should be stable and well distributed, although it is not guaranteed to be unique. Hash codes are generally used for cheap binary comparisons: comparing the int hash code of each member of a collection is much less expensive than comparing every object property by property (more than one operation for sure). A hash code should work like a fingerprint: one value per entity, and it should not change while the object is in the collection.
The basic idea of hashing is that if one is looking in a collection for an object whose hash code differs from that of 99% of the objects in that collection, one only need examine the 1% of objects whose hash code matches. If the hashcode differs from that of 99.9% of the objects in the collection, one only need examine 0.1% of the objects. In many cases, even if a collection has a million objects, a typical object's hash code will only match a very tiny fraction of them (in many cases, less than a dozen). Thus, a single hash computation may eliminate the need for nearly a million comparisons.
Note that it's not necessary for hash values to be absolutely unique, but performance may be very bad if too many instances share the same hash code. Note that what's important for performance is not the total number of distinct hash values, but rather the extent to which they're "clumped". Searching a collection of a million things, in which half the items all have one hash value and each of the remaining items has a different value, will require examining on average about 250,000 items when the object sought is in the clumped half. By contrast, if there were 100,000 different hash values, each returned by ten items, searching for an object would require examining about five.
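The arithmetic behind these figures can be checked with a short sketch. It assumes that the object sought is present, that each stored item is equally likely to be the one sought, and that finding an item in a chain of s items with the same hash inspects (s + 1) / 2 of them on average. Under those assumptions, the clumped collection averages roughly 125,000 examined items over all lookups (about 250,000 for an item inside the clump, 1 for the others), while the evenly spread collection averages 5.5. The class and method names are made up for the example.

```java
// Hypothetical calculator for the average number of items examined per successful lookup.
class ClumpingEstimate {
    // Each row of groups is {number of distinct hash values, items sharing each of those values}.
    static double averageComparisons(long[][] groups) {
        double comparisons = 0;
        long items = 0;
        for (long[] g : groups) {
            long hashValues = g[0];
            long perValue = g[1];
            // The s items sharing one hash value cost 1 + 2 + ... + s = s * (s + 1) / 2 in total.
            comparisons += hashValues * (perValue * (perValue + 1) / 2.0);
            items += hashValues * perValue;
        }
        return comparisons / items;
    }

    public static void main(String[] args) {
        // Half a million items share one hash value; the other half are all distinct.
        System.out.println(averageComparisons(new long[][] {{1, 500_000}, {500_000, 1}}));
        // 100,000 hash values, ten items each.
        System.out.println(averageComparisons(new long[][] {{100_000, 10}}));
    }
}
```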
You can define a custom class that extends HashMap. Then you override the methods (get, put, remove, containsKey, containsValue) so that they compare keys and values with the equals method only. Then you add some constructors. Overriding the hashCode method correctly is very hard.
I hope this helps everybody who wants an easy way to use a HashMap.

100% Accurate key,value HashMap

According to the webpage http://www.javamex.com/tutorials/collections/hash_codes_advanced.shtml
"hash codes do not uniquely identify an object. They simply narrow down the choice of matching items, but it is expected that in normal use, there is a good chance that several objects will share the same hash code. When looking for a key in a map or set, the fields of the actual key object must therefore be compared to confirm a match."
First, does this mean that keys used in a hash map may point to more than one value as well? I assume that it does.
If this is the case, how can I create an "always accurate" hash map or similar key/value object?
My key needs to be a String and my value needs to be a String as well. I need around 4,000 to 10,000 key/value pairs.
A standard hashmap will guarantee unique keys. A hashcode is not equivalent to a key. It is just a means of quickly reducing the set of possible values down to objects (strings in your case) that have a specific hashcode.
First, let it be noted: Java's HashMaps work. Assuming the hash function is implemented correctly, you'll always get the same value for the same key.
Now, in a hash map, the key's hash code determines the bucket in which the value will be placed (read about hash tables if you're not familiar with the term). The performance of the map depends on how well the hash codes are distributed and how balanced the number of values in each bucket is. Since you're using String, rest assured: HashMap will be "always accurate".
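A quick way to convince yourself is to use two Strings that are known to share a hash code: "Aa" and "BB" both hash to 2112. The sketch below (the class name is made up) stores them as separate keys and gets each value back unchanged, because HashMap falls back to equals() when hash codes collide.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: keys with the same hash code are still kept apart by equals().
class CollidingKeysDemo {
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put("Aa", "first");
        map.put("BB", "second");   // same hashCode() as "Aa", but a different key
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println("Aa".hashCode() == "BB".hashCode());
        System.out.println(map.get("Aa") + " " + map.get("BB") + " " + map.size());
    }
}
```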

Why HashSet internally implemented as HashMap [duplicate]

Possible Duplicate:
Why does HashSet implementation in Sun Java use HashMap as its backing?
I know what a HashSet and a HashMap are; I'm pretty well versed in them.
There is one thing that really puzzled me.
Example:
Set<String> testing = new HashSet<String>();
Now if you debug it in Eclipse right after the above statement, under the debugger's Variables tab, you will notice that the set 'testing' is internally implemented as a HashMap.
Why does it need a HashMap, since there are no key/value pairs involved in a Set collection?
It's an implementation detail. The HashMap is actually used as the backing store for the HashSet. From the docs:
This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element.
(emphasis mine)
The answer is right in the API docs
"This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element.
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important."
So you don't even need the debugger to know this.
In answer to your question: it is an implementation detail. It doesn't need to use a HashMap, but it is probably just good code re-use. If you think about it, in this case the only difference is that a Set has different semantics from a Map. Namely, maps have a get(key) method, and Sets do not. Sets do not allow duplicates, Maps allow duplicate values, but they must be under different keys.
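It is probably really easy to use a HashMap as the backing of a HashSet, because all you have to do is use the element you are putting in the Set as the map key; the map then uses hashCode (defined on all objects) and equals on that key to determine whether it is a dupe. Keying the map by the hash code itself would not work, since distinct elements with the same hash code would overwrite each other. So it is probably just doing something like
backingHashMap.put(toInsert, PRESENT);
to insert items into the Set, where PRESENT is some shared dummy value.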
In most cases the Set is implemented as a wrapper for the keySet() of a Map. This avoids duplicate implementations. If you look at the source you will see how it does this.
You might also look at the method Collections.newSetFromMap(), which can be used to wrap a ConcurrentHashMap, for example.
The very first sentence of the class's Javadoc states that it is backed by a HashMap:
This class implements the Set interface, backed by a hash table (actually a HashMap instance).
If you look at the source code of HashSet, you'll see that the key it stores in the map is the element you add, and the value is a mere marker Object (named PRESENT).
Why is it backed by a HashMap? Because this is the simplest way to store a set of items in a (conceptual) hashtable and there is no need for HashSet to re-invent an implementation of a hashtable data structure.
It's just a matter of convenience that the standard Java class library implements HashSet using a HashMap -- they only need to implement one data structure and then HashSet stores its data in a HashMap with the actual set objects as the key and a dummy value (a shared marker object in the JDK's HashSet; Collections.newSetFromMap uses Boolean.TRUE) as the value.
HashMap has already all the functionality that HashSet requires. There would be no sense to duplicate the same algorithms.
It allows you to easily and quickly determine whether an object is already in the set or not.
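The delegation described in these answers can be sketched in a few lines. This is a simplified illustration of the idea and not the JDK source: the class name MapBackedSet is made up, and only add, contains, remove, and size are shown.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a set that stores its elements as the keys of a HashMap.
class MapBackedSet<E> {
    // One shared dummy value for every key; only the keys carry information.
    private static final Object PRESENT = new Object();

    private final Map<E, Object> map = new HashMap<>();

    // put() returns null when the key was absent, so that means "newly added".
    boolean add(E element) {
        return map.put(element, PRESENT) == null;
    }

    boolean contains(Object element) {
        return map.containsKey(element);
    }

    // remove() returns the old value, which is PRESENT if the element was in the set.
    boolean remove(Object element) {
        return map.remove(element) == PRESENT;
    }

    int size() {
        return map.size();
    }
}
```

Uniqueness comes for free: adding an element that is already present just overwrites PRESENT with PRESENT under the same key.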
