I have a HashMap which I am using to store objects of type SplitCriteria using a String as the key
Map<String, SplitCriteria> criteriaMap = new HashMap<String, SplitCriteria>();
A sample SplitCriteria object contains something like the following:
SplitCriteria [
id=4,
criteriaName="Location",
criteriaAbrevName="Loc",
fieldName="LOCATION",
isMandatory=false
]
Here id is a long, isMandatory is a boolean, and the rest are Strings.
I am looping over a previously populated array of the same object type (total count is 7), adding each to the HashMap using the fieldName attribute as the key:
for (SplitCriteria split : selectedCriteria) {
    String fieldName = split.getFieldName();
    criteriaMap.put(fieldName, split);
}
After this loop has finished, the size of the map appears to be 7, but looking at the table contents there are only 6 objects present.
From researching the issue, I have come to understand that if there is a clash with keys, the clashing objects are "chained" together using the next attribute of the entry in the Map.
From the image below, you can see this is what is happening in my scenario, but the two keys are completely different!
Also I read this in the docs for the put method
If the map previously contained a mapping for the key, the old value is replaced by the specified value
and
Returns:
the previous value associated with key, or null if there was no mapping for key.
So if the keys were clashing, I would expect the old entry to be returned, but it is not.
I have no clue how this is happening, as each key I am using is completely different to the next.
Any help in resolving this would be greatly appreciated.
Paddy
EDIT:
When I try to retrieve the object at a later stage I am getting a null response
SplitCriteria criteria = (SplitCriteria) criteriaMap.get(key);
but looking at the table contents there are only 6 objects present
Nope, look at size - it's 7. You've just got two values in the same bucket. They don't collide by exact hash value, but they do collide by bucket. That's fine.
You won't be able to observe that when you use the map - if you just use the public API, you'll see all 7 entries with no hint of anything untoward. This is why I would generally recommend avoiding digging into the internal details of an object using the debugger until you're really really sure there's a problem.
HashMap is organized into buckets.
Every bucket has a linked list with entries for that bucket.
In your case, you have sixteen buckets (the size of table), six of them are filled (objects in table), and your seven entries are in those six lists (which means that one of them has length two).
If you open those HashMap$Entry objects, you will find one that has a pointer to the "next" entry.
"LOCATION" and "PAY_FREQUENCY" happen to be in the same bucket.
If you continue to shove more entries into the map, it will eventually resize itself to have more buckets (to avoid running into issues with long lists).
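If it helps to see where that bucket overlap can come from, here is a rough sketch of the index computation. It assumes the OpenJDK Java 8 approach (spread the key's hashCode, then mask it with the table size minus one); other versions differ in detail, so treat the printed bucket numbers as illustrative rather than a statement about your exact JDK.
import java.util.Arrays;
import java.util.List;

public class BucketIndexDemo {
    // Mirrors the Java 8 "spreading" step: XOR the high 16 bits of the hashCode into the low 16.
    static int spread(int hashCode) {
        return hashCode ^ (hashCode >>> 16);
    }

    public static void main(String[] args) {
        int tableSize = 16; // default initial capacity of HashMap
        List<String> keys = Arrays.asList("LOCATION", "PAY_FREQUENCY");
        for (String key : keys) {
            int bucket = (tableSize - 1) & spread(key.hashCode());
            System.out.println(key + " -> bucket " + bucket);
        }
    }
}
Two very different strings can still come out with the same bucket index here, because only four bits of the spread hash survive the mask while the table is this small.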
Two different keys may be assigned to the same bin of the HashMap (the same array slot in the Java 6 implementation). In that case they will be chained in a linked list. However, neither of these two keys overwrites the other, since they are not equal to each other.
The size of your HashMap is 7, which means it contains 7 key-value pairs (even though 2 of them are stored in the same bin).
A clash happens when two different keys produce the same hash value. The hash value is what the HashMap uses to navigate quickly to the right bucket, so when two keys clash they are different keys that nevertheless end up at the same location. The algorithm used to turn a key's hashCode into that location is internal to the HashMap.
Take a look at this blog post: http://javahungry.blogspot.com/2013/08/hashing-how-hash-map-works-in-java-or.html
The table only has 16 slots. This means that keys are assigned to buckets based on only 4 bits of the hash, so two entries in the same bucket isn't that unlikely at all. The table will grow as you add more entries.
You don't need to care about these details. All you should care about is that the map has 7 entries.
When the hash codes of two keys map to the same bucket location, a collision occurs in the HashMap. Since HashMap uses a linked list to store the entries of a bucket, each entry (a Map.Entry object comprising the key and the value) is stored as a node in that list.
HashMap uses the key object's hashCode to find the bucket location and retrieve the value object; when two entries land in the same bucket, both key-value pairs are kept there, each in its own linked-list node.
After finding the bucket location, the keys' equals() method is called to identify the correct node in the linked list and return the value object associated with that key.
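To make that lookup path concrete, here is a heavily simplified sketch of a chained map: pick the bucket from the hash, then walk the chain comparing keys with equals(). This is not the JDK source; the class and method names are made up purely for illustration.
import java.util.Objects;

class SimpleChainedMap<K, V> {
    // One node per key/value pair; nodes in the same bucket are chained via 'next'.
    static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = new Node[16];

    private int bucketIndex(Object key) {
        return (table.length - 1) & Objects.hashCode(key);
    }

    public V get(Object key) {
        // 1. Find the bucket from the key's hash code.
        // 2. Walk the chain, using equals() to find the matching key.
        for (Node<K, V> node = table[bucketIndex(key)]; node != null; node = node.next) {
            if (Objects.equals(node.key, key)) {
                return node.value;
            }
        }
        return null; // no mapping for this key
    }

    public V put(K key, V value) {
        int i = bucketIndex(key);
        for (Node<K, V> node = table[i]; node != null; node = node.next) {
            if (Objects.equals(node.key, key)) {
                V old = node.value;
                node.value = value;     // equal key: replace the value
                return old;
            }
        }
        table[i] = new Node<>(key, value, table[i]); // new key: chain a new node
        return null;
    }

    public static void main(String[] args) {
        SimpleChainedMap<String, Integer> map = new SimpleChainedMap<>();
        map.put("LOCATION", 1);
        map.put("PAY_FREQUENCY", 2);
        System.out.println(map.get("LOCATION"));      // 1
        System.out.println(map.get("PAY_FREQUENCY")); // 2
    }
}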
Related
I've always been certain that a 'bucket' in a Java hash map contains either a linked list or a tree of some kind; indeed, you can read in many places on the web how the bucket holds this list and the entries are then iterated over using the equals function to find the right one among those stored in the same bucket (i.e. those whose keys hash there). Bearing this in mind, can someone explain why the following trivial code doesn't work as expected:
private class MyString {
    String internalString;

    MyString(String string) {
        internalString = string;
    }

    @Override
    public int hashCode() {
        return internalString.length(); // rubbish hashcode but perfectly legal
    }
}
...
Map<MyString, String> map = new HashMap<>();
map.put(new MyString("key1"), "val1");
map.put(new MyString("key2"), "val2");
String retVal = map.get(new MyString("key1"));
System.out.println("Val returned = "+retVal);
In this example I would have expected the two map entries to be in the list in the (same) bucket and for retVal to equal "val1", however it equals null?
A quick debug shows why: the bucket does not contain a list at all, just a single entry...
I thought I was going mad until I read this on the Baeldung website (https://www.baeldung.com/java-map-duplicate-keys):
...However, none of the existing Java core Map implementations allow a Map to handle multiple values for a single key.
What is going on? Does a bucket in a hash map contain a list or not?
Does a java hashmap bucket really contain a list?
It depends.
For older implementations (Java 7 and earlier), yes, it really does contain a list. (It is a singly linked list of an internal Node type.)
For newer implementations (Java 8 and later), it can contain either a list or a binary tree, depending on how many entries hash to the particular bucket. If the number is small, a singly linked list is used. If the number is larger than a hard-coded threshold (8 in Java 8), then the HashMap converts the list to a balanced binary tree ... so that bucket searches are O(logN) instead of O(N). This mitigates the effects of a hash code function that generates a lot of collisions (or one where this can be made to occur by choosing keys in a particular way.)
If you want to learn more about how HashMap works, read the source code. (It is well commented, and the comments explain the rationale as well as the nitty-gritty how-it-works details. It is worth the time ... if you are interested in this kind of thing.)
However, none of the existing Java core Map implementations allow a Map to handle multiple values for a single key.
That is something else entirely. That is about multiple values for a key rather than multiple keys in a bucket.
The article is correct. And this doesn't contradict my "a bucket contains a list or tree" statement.
Put simply, a HashMap bucket can contain multiple key / value pairs, where the keys are all different.
The only point on which I would fault the quoted text is that it seems to imply that it is implementations of Map that have the one value per key restriction. In reality, it is the Map API itself that imposes this restriction ... unless you use (say) a List as the map's value type.
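As a small experiment backing up the "a bucket can contain multiple key / value pairs" point (the key type and names below are made up for illustration), a key class with a deliberately terrible hashCode still behaves correctly through the public API, because equals() is what picks the right entry out of the shared bucket:
import java.util.HashMap;
import java.util.Map;

public class SameBucketDemo {
    // Hypothetical key type: every instance has the same hash code, so all
    // entries land in the same bucket, but equals() still tells them apart.
    static final class BadHashKey {
        final String name;
        BadHashKey(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadHashKey && ((BadHashKey) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return 42; // legal, just slow: every key collides
        }
    }

    public static void main(String[] args) {
        Map<BadHashKey, String> map = new HashMap<>();
        map.put(new BadHashKey("key1"), "val1");
        map.put(new BadHashKey("key2"), "val2");
        map.put(new BadHashKey("key3"), "val3");

        System.out.println(map.size());                      // 3
        System.out.println(map.get(new BadHashKey("key1"))); // val1
        System.out.println(map.get(new BadHashKey("key3"))); // val3
    }
}
The equals() override is doing the real work here: without it, a lookup with a freshly constructed key would fall back to reference equality and return null.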
My question is about the bucket array of a hash table (call it A, for example). Since there will sometimes be collisions in a hash map, it's common to use the simple chaining solution: make each bucket point to a linked list containing entries, and if a collision happens, just add the new entry to that list.
But according to the data structure book, when a bucket A[i] is empty it stores null, and if A[i] stores just a single entry (key, value), we can simply have A[i] point directly to the entry (key, value) rather than to a list-based map holding only the one entry. Therefore I think the hash table will be holding two different kinds of objects (Entry objects and List objects).
I have had trouble implementing this method. I chose to declare a new abstract class (called "Super", for example) which is extended by both the List class and the Entry class, and there is nothing in this new class. As a result, the hash table now holds only one type, "Super", which can point to both kinds of objects I need. However, I have to use instanceof to know what exactly the bucket A[i] is pointing to in order to do operations like adding a new entry. I've heard that using instanceof too much is not appropriate, and there will be many places requiring a cast. So should I copy the code from the Entry and List classes into the "Super" class so as to avoid using so many casts?
There is nothing wrong with storing a single entry as a linked list having just a single link. After all, the difference between an entry (which contains just the key and the value) and a link of a linked list that contains an entry (which contains the key, the value and a reference to the next link) is a single reference to the next link. That's what the JDK implementation of HashMap does for buckets having a small number of entries.
This way you don't have to worry about storing different types of objects in your table.
On the other hand, the implementation of HashMap in Java 8 uses two entry implementations to store the entries of a bucket: a Node (linked list) and a TreeNode (a node of a tree). If you look at the implementation, you'll see it uses e instanceof TreeNode to check the specific type of a given node. So as you can see, even the JDK uses instanceof to know what exactly the bucket A[i] is pointing to.
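For what it's worth, here is a tiny sketch of that dispatch style. The node types below are invented for illustration (they are not the JDK's classes); the point is just that checking the concrete bucket representation with instanceof is a perfectly normal technique.
class Bin<K, V> {                        // plain chained node
    final K key;
    V value;
    Bin<K, V> next;
    Bin(K key, V value, Bin<K, V> next) {
        this.key = key;
        this.value = value;
        this.next = next;
    }
}

class TreeBin<K, V> extends Bin<K, V> {  // stand-in for a tree-structured bin
    Bin<K, V> left, right;               // a real tree bin would keep these balanced
    TreeBin(K key, V value) {
        super(key, value, null);
    }
}

public class BinDispatchDemo {
    static <K, V> String describe(Bin<K, V> first) {
        // Same idea as the JDK's "e instanceof TreeNode" check.
        if (first instanceof TreeBin) {
            return "tree bin rooted at key " + first.key;
        }
        int length = 0;
        for (Bin<K, V> b = first; b != null; b = b.next) {
            length++;
        }
        return "list bin of length " + length;
    }

    public static void main(String[] args) {
        Bin<String, Integer> second = new Bin<>("b", 2, null);
        Bin<String, Integer> list = new Bin<>("a", 1, second);
        Bin<String, Integer> tree = new TreeBin<String, Integer>("root", 0);
        System.out.println(describe(list)); // list bin of length 2
        System.out.println(describe(tree)); // tree bin rooted at key root
    }
}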
Why is a linked list required when a hash collision occurs, given that HashMap does not allow duplicate keys? I was trying to understand the following points about HashMap:
HashMap does not guarantee the order of elements, yet for the following elements I get insertion order, so how is LinkedHashMap different from HashMap?
Map<String, Integer> ht2=new HashMap<String, Integer>();
ht2.put("A", 20);
ht2.put("B", 10);
ht2.put("C", 30);
ht2.put("D", 50);
ht2.put("E", 40);
ht2.put("F", 60);
ht2.put("G", 70);
for (Entry<String, Integer> e : ht2.entrySet()) {
    System.out.println(e.getKey() + "<<<key HashMap value>>>" + e.getValue());
}
HashMap does not allow duplicate keys, and yes, I can get the expected output. When we store an object as a key we have to override the equals method based on its attributes, so the same object (or an object with the same information) will not be duplicated. So every bucket should have only one entry; if a new entry equals the previous one, the previous one is overwritten. I don't understand how multiple entries end up in the same bucket when, on a collision, the previous value is overwritten. Why is a linked list required here when duplicates are not allowed? Please look at the example below.
HashMap<Employee, Integer> hashMap = new HashMap<Employee, Integer>(4);
for (int i = 0; i < 100; i++) {
    Employee myKey = new Employee(i, "XYZ", new Date());
    hashMap.put(myKey, i);
}
System.out.println("myKey Size ::"+hashMap.size());
Here I am creating 100 Employee objects, so 100 buckets are created. I can see different values when the hash codes are printed. So how do linked lists come into it here, and how do multiple entries end up in the same bucket?
There is a difference between the number of buckets and the number of entries in the HashMap.
Two keys of the HashMap may have the same hashCode, even if they are not equal to each other, which means both of them will be stored in the same bucket. Therefore the linked list (or some other structure that can hold multiple entries) is required.
Even two keys having different hashCode may be stored in the same bucket, since the number of buckets is much smaller than the number of possible hashCode values. For example, if the HashMap has 16 buckets, keys with hashCode 0 and 16 will be mapped to the same bucket. Therefore the bucket must be able to hold multiple entries.
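Here is a tiny sketch of that arithmetic, assuming the bucket index is computed by masking the hash with (table size - 1), as the OpenJDK implementation does:
public class IndexMaskDemo {
    public static void main(String[] args) {
        int buckets = 16;
        int mask = buckets - 1;        // 0b1111: only the low 4 bits matter
        System.out.println(0 & mask);  // 0
        System.out.println(16 & mask); // 0  -> same bucket as hash 0
        System.out.println(5 & mask);  // 5
        System.out.println(21 & mask); // 5  -> 21 = 16 + 5, same bucket again
    }
}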
The first part of your question is not clear. If you meant to ask why you see different iteration order in HashMap vs. LinkedHashMap, the reason is HashMap doesn't maintain insertion order, and LinkedHashMap does maintain insertion order. If for some input you are seeing an iteration order matching the insertion order in HashMap, that's just coincidence (depending on the buckets that the inserted keys happen to be mapped to).
When a HashMap collision occurs, .equals() is involved, as you said in your question. The linked list is used like this:
1. If a collision occurs and .equals() returns true for the existing key, then the old value (if the references are not identical, of course) is replaced by the new one; the existing key object is kept.
2. If .equals() returns false against the existing key and only one entry is in the current bucket, the HashMap adds the new entry to that bucket's linked list. Note that in Java's standard HashMap implementation, the entries in this linked list are entirely internal; you wouldn't even be able to access the list under normal circumstances.
3. If there is more than one entry in the current bucket, it continues down the list until it finds an existing key for which .equals() returns true and replaces that entry's value, or it reaches the end of the list, in which case point 2 applies and the new entry is added.
So you technically don't have to worry about the list; just make sure that your hashCode() minimizes the amount of collisions.
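If it helps, the replace-on-equals case in point 1 is directly observable through the public API; put() even hands back the value it displaced (the key name below is just borrowed from the earlier question for familiarity):
import java.util.HashMap;
import java.util.Map;

public class ReplaceOnEqualsDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();

        Integer previous = map.put("LOCATION", 1);
        System.out.println(previous);            // null: no mapping existed yet

        previous = map.put("LOCATION", 2);       // equal key: value is replaced
        System.out.println(previous);            // 1: the displaced value
        System.out.println(map.size());          // 1
        System.out.println(map.get("LOCATION")); // 2
    }
}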
I'm having some trouble when using .put(Integer, String) in Java.
To my understanding, when a collision happens the HashMap asks whether the two values are the same with .equals(Object), and if they are not, the two values are stored in a linked list. Nevertheless, size() is 1 and iterating the map only shows one result, the last one.
Apart from this, the Java HashMap API states for put:
public V put(K key, V value)
Associates the specified value with the specified key in this map. If
the map previously contained a mapping for the key, the old value is
replaced.
THIS IS NOT WHAT I HAVE READ EVERYWHERE.
Thoughts?
import java.util.HashMap;

public class HashProblema {
    public static void main(String[] args) {
        HashMap<Integer, String> hash = new HashMap<>();
        hash.put(1, "sdaaaar");
        hash.put(1, "bjbh");
        System.out.println(hash.size());
        for (Object value : hash.values()) {
            System.out.println(value);
        }
    }
}
The output is:
1
bjbh
Since a mapping for the key already exists, it is replaced and the size remains 1.
The value gets overwritten when the same key is put again; the size remains one and only the value changes. This is how it works, as keys are always unique: you can't map multiple values to one key.
The API is the definitive reference and that is what you must believe.
A collision occurs when the hash of a key already exists in the HashMap. Then the keys themselves are compared, and if they are different, the entries are placed in a linked list. If the keys are equal, then the old key-value pair in the HashMap is overwritten.
API documentation should normally be treated as authoritative unless there is very good reason to doubt its accuracy.
You should almost certainly ignore any claim that doesn't flag itself as knowingly at odds with the documentation and provide testable evidence.
I humbly suggest you might be confused about the role of the linked 'collision' list. As it happens, HashMap in Java uses a linked list to store multiple entries whose keys' hash codes place them in the same 'bucket' as one or more other keys.
A HashMap in Java will always store one value per key; from the caller's point of view there are no linked lists involved. What you are describing is internal collision handling, which you never see through the API. Here, you will always have one value per key only (the last one you put for that key).
However, you are free to define a HashMap whose values are List objects. Though you then have to keep track of duplicates and collisions on your own.
According to the webpage http://www.javamex.com/tutorials/collections/hash_codes_advanced.shtml
"hash codes do not uniquely identify an object. They simply narrow down the choice of matching items, but it is expected that in normal use, there is a good chance that several objects will share the same hash code. When looking for a key in a map or set, the fields of the actual key object must therefore be compared to confirm a match."
First, does this mean that keys used in a hash map may point to more than one value as well? I assume that it does.
If this is the case, how can I create an "Always Accurate" HashMap or similar key/value object?
My key needs to be a String and my value needs to be a String as well. I need around 4,000 to 10,000 key-value pairs.
A standard hashmap will guarantee unique keys. A hashcode is not equivalent to a key. It is just a means of quickly reducing the set of possible values down to objects (strings in your case) that have a specific hashcode.
First, let it be noted: Java's HashMaps work. Assuming the hash function is implemented correctly, you'll always get the same value for the same key.
Now, in a hash map, the key's hash code determines the bucket in which the value will be placed (read about hash tables if you're not familiar with the term). The performance of the map depends on how well the hash codes are distributed and how balanced the number of values in each bucket is. Since you're using String keys, rest assured: HashMap will be "Always Accurate".
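As a concrete check, "Aa" and "BB" are a well-known pair of Strings with the same hashCode, and a HashMap still keeps them as two independent entries, because String.equals() tells them apart:
import java.util.HashMap;
import java.util.Map;

public class StringKeyDemo {
    public static void main(String[] args) {
        // "Aa" and "BB" produce the same hashCode (2112), so they share a hash value,
        // yet the map stores them as two separate keys because they are not equal.
        System.out.println("Aa".hashCode() + " " + "BB".hashCode());

        Map<String, String> map = new HashMap<>();
        map.put("Aa", "first value");
        map.put("BB", "second value");

        System.out.println(map.size());    // 2
        System.out.println(map.get("Aa")); // first value
        System.out.println(map.get("BB")); // second value
    }
}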