I've always been certain that a 'bucket' in a Java HashMap contains either a linked list or a tree of some kind. Indeed, you can read in many places on the web how the bucket holds this list and how the map then iterates over it, using the equals method to find the right entry among those stored in the same bucket (i.e. those whose hash codes put them in the same bucket). Bearing this in mind, can someone explain why the following trivial code doesn't work as expected?
private class MyString {
    String internalString;

    MyString(String string) {
        internalString = string;
    }

    @Override
    public int hashCode() {
        return internalString.length(); // rubbish hashCode but perfectly legal
    }
}
...
Map<MyString, String> map = new HashMap<>();
map.put(new MyString("key1"), "val1");
map.put(new MyString("key2"), "val2");
String retVal = map.get(new MyString("key1"));
System.out.println("Val returned = "+retVal);
In this example I would have expected the two map entries to end up in the list in the same bucket, and for retVal to equal "val1"; however, it is null.
A quick debug shows why: the bucket does not contain a list at all, just a single entry.
I thought I was going mad until I read this on the Baeldung website (https://www.baeldung.com/java-map-duplicate-keys):
...However, none of the existing Java core Map implementations allow a Map to handle multiple values for a single key.
What is going on? Does a bucket in a HashMap contain a list or not?
Does a Java HashMap bucket really contain a list?
It depends.
For older implementations (Java 7 and earlier), yes, it really does contain a list. (It is a singly linked list built from an internal entry type.)
For newer implementations (Java 8 and later), it can contain either a list or a binary tree, depending on how many entries hash to that particular bucket. If the number is small, a singly linked list is used. If the number exceeds a hard-coded threshold (8 in Java 8), the HashMap converts the list to a balanced binary tree, so that bucket searches are O(log N) instead of O(N). This mitigates the effects of a hash function that generates a lot of collisions (or one where collisions can be forced by choosing keys in a particular way).
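You can see the "list or tree in a bucket" behaviour from the public API side with a minimal sketch like the one below (the class and key names are my own, not from the question): a key type with a deliberately constant hashCode but a correct equals forces every entry into one bucket, yet every entry remains retrievable, because the bucket's list (or tree, once the threshold is crossed) is searched with equals.

import java.util.HashMap;
import java.util.Map;

public class CollidingKeyDemo {
    // Key with a deliberately terrible hashCode but a correct equals(),
    // so every instance lands in the same bucket.
    static final class BadKey {
        final String value;
        BadKey(String value) { this.value = value; }

        @Override
        public int hashCode() { return 42; } // every key collides

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).value.equals(value);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 20; i++) {
            map.put(new BadKey("key" + i), i);  // 20 entries, one bucket
        }
        // All 20 entries are still individually retrievable: the bucket's
        // list/tree is walked and equals() picks out the right entry.
        System.out.println(map.size());                  // 20
        System.out.println(map.get(new BadKey("key7"))); // 7
    }
}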
If you want to learn more about how HashMap works, read the source code. (It is well commented, and the comments explain the rationale as well as the nitty-gritty how-it-works details. It is worth the time ... if you are interested in this kind of thing.)
However, none of the existing Java core Map implementations allow a Map to handle multiple values for a single key.
That is something else entirely. That is about multiple values for a key rather than multiple keys in a bucket.
The article is correct. And this doesn't contradict my "a bucket contains a list or tree" statement.
Put simply, a HashMap bucket can contain multiple key / value pairs, where the keys are all different.
The only point on which I would fault the quoted text is that it seems to imply that it is implementations of Map that have the one value per key restriction. In reality, it is the Map API itself that imposes this restriction ... unless you use (say) a List as the map's value type.
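As a rough illustration of that last point (my own example, not from the article): if you need multiple values per key, you make the value a collection, for instance with computeIfAbsent.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiValueSketch {
    public static void main(String[] args) {
        // The Map API still has exactly one value per key;
        // that single value just happens to be a List.
        Map<String, List<String>> map = new HashMap<>();
        map.computeIfAbsent("key1", k -> new ArrayList<>()).add("val1");
        map.computeIfAbsent("key1", k -> new ArrayList<>()).add("val2");
        System.out.println(map.get("key1")); // [val1, val2]
    }
}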
Related
My question is about the bucket array of a hash table (call it A, for example). Since there will sometimes be collisions in a hash map, it's common to use the simple chaining solution: each bucket points to a linked list of entries, and when a collision happens, the new entry that hashed to that bucket is simply added to the list.
But according to my data structures book, when a bucket A[i] is empty it stores null, and if A[i] holds just a single entry (key, value), we can have A[i] point directly to that entry rather than to a one-element list. Therefore the hash table would be holding two different kinds of objects (Entry objects and List objects).
I have had trouble implementing this approach. I chose to declare a new, empty abstract class (call it "Super", for example) which is extended by both the List class and the Entry class. As a result, the hash table now holds only one type, "Super", which can refer to both kinds of objects I need. However, I have to use instanceof to find out what exactly the bucket A[i] points to before doing operations like adding a new entry. I've heard that using instanceof too much is not appropriate, and there will be many places requiring a cast. So should I copy the code from the Entry and List classes into the "Super" class to avoid so many casts?
There is nothing wrong with storing a single entry as a linked list that has just one link. After all, the difference between a bare entry (which contains just the key and the value) and a link of a linked list (which contains the key, the value, and a reference to the next link) is a single reference to the next link. That's what the JDK implementation of HashMap does for buckets holding a small number of entries.
This way you don't have to worry about storing different types of objects in your table.
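Here is a minimal sketch of that idea (the names and fields are my own, not taken from any particular book or from the JDK): a single node type doubles as both "entry" and "list link", so an empty bucket is null, a one-entry bucket is a node whose next is null, and a collision just prepends another node.

// A toy chained hash table: every bucket is either null or the head
// of a singly linked list of nodes, even when it holds only one entry.
class ToyHashTable<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;   // null when this node is the last (or only) one
        Node(K key, V value, Node<K, V> next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = new Node[16];

    private int indexFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % table.length;
    }

    public void put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.key.equals(key)) { n.value = value; return; } // replace value
        }
        table[i] = new Node<>(key, value, table[i]);             // prepend new node
    }

    public V get(Object key) {
        for (Node<K, V> n = table[indexFor(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }
}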
On the other hand, the implementation of HashMap in Java 8 uses two entry implementations to store the entries of a bucket: a Node (linked list) and a TreeNode (a node of a tree). If you look at the implementation, you'll see they use e instanceof TreeNode to check the specific type of a given node. So as you can see, even the JDK uses instanceof to find out what exactly the bucket A[i] is pointing to.
I have a couple of scenarios related to how a HashMap stores its entries, and I am not sure how to accomplish them.
Case 1: Objects are saved into buckets, and the hash code is taken into consideration when saving them. Now say there are 5 buckets and I want control over which bucket an object is saved in. Is there a way to achieve that? Say the internal mechanism would have saved it into bucket 4, but I want to save that particular object into bucket 1.
Case 2: Similarly, if I see that out of the 5 buckets, one bucket is getting much more load than the others, and I want to do a load-balancing kind of job by moving entries to different buckets, how can that be accomplished?
There is fundamentally no way to achieve load balancing in a hashtable. The quintessential property of this structure is direct access to exactly the bucket which must hold the requested key. Any balancing scheme would involve reshuffling the objects among buckets and destroy this property. This is the reason why good-quality hashcodes are vital to the proper operation of a hashtable.
Additionally note that you can't even control bucket selection by manipulating the hashCode() method of your objects, because hashcodes of any two equal objects must match, and because any self-respecting hashtable implementation will additionally shuffle the bits of the value retrieved from hashCode() to ensure better dispersion.
The implementations are designed so that you shouldn't have to worry about these details.
If you wanted to control these more carefully, then you can create your own class implementing Map.
With HashMap, and with all collections whose names start with Hash, the most important part is the hashCode generated by the domain object you are trying to store. That's why every object has a hashCode implementation (implicitly inherited from Object.hashCode(), or explicitly overridden).
First of all, HashMap already tries to accomplish what you stated in case 2 (sort of). If your hashCode implementation is good, meaning it produces evenly dispersed hash codes for a variety of objects, then the load on the buckets of the HashMap is more or less evenly distributed, and you don't have to do anything (other than write a good hashCode function). You can also manipulate the balance to some extent by implementing your hashCode accordingly, producing the same hash code for objects that you want to end up in the same bucket, as sketched below.
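A minimal sketch of that idea (the Item class and its category field are hypothetical, not from the question): basing hashCode on a "grouping" field means all objects sharing that field value get the same hash code and will therefore land in the same bucket.

// Hypothetical domain object: hashCode is derived only from the category,
// so all items of one category share a hash code and thus a bucket.
final class Item {
    final String category;
    final String name;

    Item(String category, String name) {
        this.category = category;
        this.name = name;
    }

    @Override
    public int hashCode() {
        return category.hashCode(); // deliberately coarse: groups by category
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Item)) return false;
        Item other = (Item) o;
        return category.equals(other.category) && name.equals(other.name);
    }
}

Note that this only influences which keys end up together; it does not let you choose the numbered bucket, because the map still spreads the hash bits and picks the index itself.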
If you want complete control over the internals of the HashMap, then you should implement your own by implementing the Map interface.
The underlying mechanisms for bucket creation and placement are abstracted away.
For case 1, bucket placement is driven entirely by the keys you use. For case 2, you cannot see or change the actual placement of objects directly.
What you can do, though, is use a Multimap and treat the keys as if they were buckets. It's basically a map from keys to collections. You can check any given key (bucket) and see how many items you have placed in it, which satisfies the requirements of both cases. This is probably as close as you're going to get without actually tampering with the internal bucketing mechanism.
From the link, here is a snippet:
import java.util.Collection;

import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.Multimap;

public class MultiMapTest {
    public static void main(String... args) {
        Multimap<String, String> myMultimap = ArrayListMultimap.create();

        // Adding some key/value pairs
        myMultimap.put("Fruits", "Bannana");
        myMultimap.put("Fruits", "Apple");
        myMultimap.put("Fruits", "Pear");
        myMultimap.put("Vegetables", "Carrot");

        // Getting the size
        int size = myMultimap.size();
        System.out.println(size); // 4

        // Getting values
        Collection<String> fruits = myMultimap.get("Fruits");
        System.out.println(fruits); // [Bannana, Apple, Pear]

        Collection<String> vegetables = myMultimap.get("Vegetables");
        System.out.println(vegetables); // [Carrot]

        // Iterating over the entire Multimap
        for (String value : myMultimap.values()) {
            System.out.println(value);
        }

        // Removing a single value
        myMultimap.remove("Fruits", "Pear");
        System.out.println(myMultimap.get("Fruits")); // [Bannana, Apple]

        // Remove all values for a key
        myMultimap.removeAll("Fruits");
        System.out.println(myMultimap.get("Fruits")); // [] (empty collection)
    }
}
I have a HashMap which I am using to store objects of type SplitCriteria, using a String as the key:
Map<String, SplitCriteria> criteriaMap = new HashMap<String, SplitCriteria>();
A sample SplitCriteria object contains something like the following:
SplitCriteria [
id=4,
criteriaName="Location",
criteriaAbrevName="Loc",
fieldName="LOCATION",
isMandatory=false
]
with id being a long, isMandatory being a boolean, and the rest being Strings.
I am looping over a previously populated array of the same object type (total count is 7), adding each to the HashMap using the fieldName attribute as the key:
for (SplitCriteria split : selectedCriteria) {
    String fieldName = split.getFieldName();
    criteriaMap.put(fieldName, split);
}
After this loop has finished, the size of the map appears to be 7, but looking at the table contents there are only 6 objects present.
From researching the issue, I have come to understand that if there is a clash with keys, the clashing objects are "chained" together using the next attribute of the entry in the Map.
From the image below, you can see this is what is happening in my scenario, but the two keys are completely different!
Also I read this in the docs for the put method
If the map previously contained a mapping for the key, the old value is replaced by the specified value
and
Returns:
the previous value associated with key, or null if there was no mapping for key.
So if the keys were clashing, I would expect the old entry to be returned, but it is not.
I have no clue how this is happening, as each key I am using is completely different to the next.
Any help in resolving this would be greatly appreciated.
Paddy
EDIT:
When I try to retrieve the object at a later stage, I am getting a null response:
SplitCriteria criteria = (SplitCriteria) criteriaMap.get(key);
but looking at the table contents there are only 6 objects present
Nope, look at size - it's 7. You've just got two values in the same bucket. They don't collide by exact hash value, but they do collide by bucket. That's fine.
You won't be able to observe that when you use the map - if you just use the public API, you'll see all 7 entries with no hint of anything untoward. This is why I would generally recommend avoiding digging into the internal details of an object using the debugger until you're really really sure there's a problem.
HashMap is organized into buckets.
Every bucket has a linked list with entries for that bucket.
In your case, you have sixteen buckets (the size of table), six of them are filled (objects in table), and your seven entries are in those six lists (which means that one of them has length two).
If you open those HashMap$Entry objects, you will find one that has a pointer to the "next" entry.
"LOCATION" and "PAY_FREQUENCY" happen to be in the same bucket.
If you continue to shove more entries into the map, it will eventually resize itself to have more buckets (to avoid running into issues with long lists).
Two different keys may be assigned to the same bin of the HashMap (the same array entry in Java 6 implementation). In that case they will be chained in a linked list. However, neither of these two keys overrides the other, since they are not equal to each other.
The size of your HashMap is 7, which means it contains 7 key-value pairs (even though 2 of them are stored in the same bin).
A clash happens when two different keys produce the same hash value. This hashed value is used in the HashMap to quickly navigate to the elements. So this means, that when two keys clash, they are different but both produce the same hash value. The algorithm that is used to calculate the hash value is internal to the HashMap.
Take a look at this blog post: http://javahungry.blogspot.com/2013/08/hashing-how-hash-map-works-in-java-or.html
The table only has 16 entries. This means that keys are assigned to buckets only based on 4 bits, so two entries in the same bucket isn't that unlikely at all. The table will grow as you add more entries.
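As a rough sketch of how that index is derived (this mirrors the Java 8-style computation of spreading the hash and masking with table.length - 1; the exact details vary between JDK versions, so treat it as illustrative and run it to see which buckets the two keys from the question actually land in):

public class BucketIndexSketch {
    // Java 8-style hash spreading: XOR the high bits into the low bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int capacity = 16; // default table size; only the low 4 bits matter
        for (String key : new String[] {"LOCATION", "PAY_FREQUENCY"}) {
            int index = spread(key.hashCode()) & (capacity - 1);
            System.out.println(key + " -> bucket " + index);
        }
    }
}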
You don't need to care about these details. All you should care about is that the map has 7 entries.
Since the bucket location works out the same for both keys, a collision occurs in the HashMap. Because HashMap uses a linked list to store the entries of a bucket, both entries (Map.Entry objects comprising key and value) end up in that list.
HashMap uses the key object's hash code to find the bucket location and retrieve the value object; here two entries are stored in the same bucket, and HashMap stores both key and value in each linked-list node.
After finding the bucket location, the key's equals() method is called to identify the correct node in the linked list and return the associated value object for that key.
I'm having some trouble when using .put(Integer, String) in Java.
To my understanding, when a collision happens the HashMap checks whether the two values are the same with .equals(Object), and if they are not, the two values are stored in a LinkedList. Nevertheless, size() is 1 and iterating over the map only shows one result, the last one.
Apart from this, the Java HashMap API docs state for put:
public V put(K key, V value)
Associates the specified value with the specified key in this map. If
the map previously contained a mapping for the key, the old value is
replaced.
THIS IS NOT WHAT I HAVE READ EVERYWHERE.
Thoughts?
import java.util.HashMap;

public class HashProblema {
    public static void main(String[] args) {
        HashMap<Integer, String> hash = new HashMap<>();
        hash.put(1, "sdaaaar");
        hash.put(1, "bjbh");
        System.out.println(hash.size());
        for (String value : hash.values()) {
            System.out.println(value);
        }
    }
}
The output is:
1
bjbh
Since a mapping for the key already exists, it is replaced and the size remains 1.
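You can observe the replacement directly, because put returns the previous value for the key (or null if there was none). A small sketch based on the code above:

import java.util.HashMap;

public class PutReturnsOldValue {
    public static void main(String[] args) {
        HashMap<Integer, String> hash = new HashMap<>();

        String first = hash.put(1, "sdaaaar");  // no previous mapping
        String second = hash.put(1, "bjbh");    // replaces "sdaaaar"

        System.out.println(first);   // null
        System.out.println(second);  // sdaaaar
        System.out.println(hash);    // {1=bjbh}
    }
}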
The value gets overwritten by the put with the same key; the size remains one and the value changes. This is how it works, as keys are always unique: you can't map multiple values to one key.
The API is the definitive reference and that is what you must believe.
A collision occurs when the hash of a key already exists in the HashMap. The keys themselves are then compared, and if they are different, the entries are placed in a linked list. If the keys are equal, then the old value in the HashMap is overwritten.
API documentation should normally be treated as authoritative unless there is very good reason to doubt its accuracy.
You should almost certainly ignore any claim that contradicts the documentation unless it explicitly flags itself as knowingly at odds with the documentation and provides testable evidence.
I humbly suggest you might be confused about the role of the linked 'collision' list. As it happens, HashMap in Java uses a linked list to store entries whose keys' hash codes place them in the same 'bucket' as one or more other keys; it is not used to store multiple values for the same key.
A HashMap in Java always stores exactly one value per key. No linked list is involved when you put the same key twice: the chaining you are describing (often taught in computer science class) only applies to different keys that land in the same bucket. For a given key you will always have one value only, the last one you put for that key.
However, you are free to define a HashMap whose values are List objects. You then have to manage duplicates within those lists on your own.
Map<Integer,String> m1 = new HashMap<>();
m1.put(5, "gfd");
m1.put(1,"sandy");
m1.put(3, "abc");
m1.put(2, "def");
m1.put(1, "ijk");
m1.put(10, "bcd");
m1.put(0, "ssdfsd");
When I print the map, the output is {0=ssdfsd, 1=ijk, 2=def, 3=abc, 5=gfd, 10=bcd}.
But how is the output in sorted order even though I have used a HashMap?
You can easily see this in the implementation. If you look into the source of HashMap.put(), you can see that the hash table index of the object is determined like this:
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
The methods hash() and indexFor() only ensure that hash values don't collide too much and that the index doesn't exceed the length of the hash table.
Now if you take a look at Integer.hashCode(), you'll see that the hash is the integer itself:
public int hashCode() {
return value;
}
So the integer which has the value 0 ends up in index 0 of the hash table and so on. At least as long as the hash table is big enough.
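A small sketch of that last point (using a Java 8-style index computation for illustration; older versions use a slightly different spreading function, but for small integers the effect is the same): each key in the example maps to a bucket index equal to itself while it is below the table size of 16.

public class IntegerBucketOrderSketch {
    public static void main(String[] args) {
        int capacity = 16; // default HashMap table size
        int[] keys = {5, 1, 3, 2, 10, 0};
        for (int key : keys) {
            int h = Integer.valueOf(key).hashCode(); // == key
            int index = (h ^ (h >>> 16)) & (capacity - 1);
            System.out.println("key " + key + " -> bucket " + index);
        }
        // For keys below 16 the spreading step changes nothing, so the
        // bucket index equals the key itself, and iterating the table
        // front to back happens to visit the keys in ascending order.
    }
}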
The toString() method of HashMap iterates over the hash table and the elements in each index of the hash table. So in your case the order is preserved.
As others mentioned correctly, this behavior is not ensured by implementations of Map. It just works in this special case.
A Map provides you with an interface for storing and retrieving objects with a key attached to them.
What each implementation does internally is fully up to it, including the order in which the key/value pairs are stored in the internal structure. See @Seshoumaro's answer for the quote from the javadoc.
HashMap hashes the key (which is an Integer in this case) and uses that hash as an array index. Since the hashCode for Integer is fairly simple to write yourself, it's not surprising that the array indices for each one are in the same order as the key itself.
What all that means is: you shouldn't be surprised that the HashMap is acting this way.
The HashMap provides no guarantee as to the order of the items stored. It may even be in order in some cases. Taken from the javadoc:
This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
So it's working as intended. If you're curious as to why this particular example is placed in order, you could check out the source code itself.
It's not restricted to Integer keys; you may observe the same thing with String keys at times.
It just happens to turn out that way sometimes, and you will find numerous instances of it.
As others have suggested, HashMap never guarantees any particular order when you read entries back. Since the official docs say not to rely on the order, you will find occasions when it is not retained, so code accordingly.
See this for more