What's the fastest way to iterate over a LinkedHashMap? I need the last 100 keys and the first 20 keys. In most cases the size of this map will be in the range of 500-1500 entries. Thanks.
LinkedHashMap<Integer, Float> doc_1 = service5.getMatchingValue(query);
If you iterate over a HashMap you still have O(n) runtime, just as you would when iterating over any other data structure.
If you only need specific entries of a HashMap, you might want to keep track of the required keys and only loop over those. Access to an element in a HashMap by its key is O(1) (or at least amortized O(1)), so accessing M entries by their keys results in O(M) runtime, with M << N.
You could perhaps keep the last 100 keys in a cache and just loop over (a copy of) that cache, to get the best possible access performance in combination with your HashMap.
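A minimal sketch of that single-pass idea, assuming the doc_1 map from the question (the class and the headAndTailKeys helper are made-up names for illustration): one O(n) walk over the key set that collects the first 20 keys as it goes and keeps the last 100 keys seen so far in a small deque.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;

public class HeadTailKeys {

    // Collects the first `head` keys and the last `tail` keys in a single pass.
    // LinkedHashMap has no index-based access, so one O(n) walk over the key set
    // is about as good as it gets without maintaining a separate cache.
    // Note: for maps smaller than head + tail, the two groups overlap.
    static List<Integer> headAndTailKeys(LinkedHashMap<Integer, Float> map, int head, int tail) {
        List<Integer> first = new ArrayList<>(head);
        Deque<Integer> last = new ArrayDeque<>(tail);
        for (Integer key : map.keySet()) {
            if (first.size() < head) {
                first.add(key);
            }
            last.addLast(key);
            if (last.size() > tail) {
                last.removeFirst();   // keep only the most recent `tail` keys seen so far
            }
        }
        first.addAll(last);
        return first;
    }

    public static void main(String[] args) {
        LinkedHashMap<Integer, Float> doc_1 = new LinkedHashMap<>();
        for (int i = 0; i < 1000; i++) {
            doc_1.put(i, i * 0.5f);
        }
        List<Integer> keys = headAndTailKeys(doc_1, 20, 100);
        System.out.println(keys.size());   // 120
    }
}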
How does a HashMap identify that a bucket is full and that it needs rehashing? It stores colliding values in a linked list when two hash codes are the same, and as far as I understand that linked list has no fixed size and can store as many elements as it needs, so a bucket will never really be "full". How, then, does the map identify that it needs rehashing?
In a ConcurrentHashMap, a red-black tree (for a large number of elements) or a linked list (for a small number of elements) is actually used when there is a collision (i.e. two different keys land in the same bucket). But you are right that the linked list (or red-black tree) can grow indefinitely (assuming you have infinite memory and heap space).
The basic idea of a HashMap or ConcurrentHashMap is that you want to retrieve a value by its key in O(1) time. In reality collisions do happen, and when they do, the colliding nodes are put in a list or tree hanging off the bucket (array cell). Java could have created a HashMap whose array size remains fixed and where rehashing never happens, but then all your key-value pairs would have to be accommodated within that fixed-size array (along with their attached lists or trees).
Let's say you have such a HashMap with the array size fixed at 16 and you push 1000 key-value pairs into it. You then have at most 16 distinct buckets, which means (1000 - 16) of your puts will collide; those nodes end up in the lists or trees and can no longer be fetched in O(1). Searching a tree takes O(log n).
To make sure this doesn't happen, the HashMap uses a load factor to determine how much of the array is filled with key-value pairs. If it is 75% full (the default setting), any new put creates a new, larger array, copies the existing content into it, and thereby provides more buckets (more hash code space). This ensures that in most cases collisions won't happen, the tree won't be required, and you will fetch most keys in O(1) time.
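As a hedged illustration of that capacity/load-factor interplay: HashMap's public constructor lets you pre-size the table so that rehashing never happens while you fill it (the variable names and numbers below are just examples).

import java.util.HashMap;
import java.util.Map;

public class PresizedMap {
    public static void main(String[] args) {
        int expectedEntries = 1000;

        // Default: 16 buckets, load factor 0.75 -> resizes several times on the way to 1000 entries.
        Map<String, Integer> grows = new HashMap<>();

        // Pre-sized: enough buckets that expectedEntries stays under the 0.75 threshold,
        // so no rehashing happens while filling the map.
        Map<String, Integer> presized = new HashMap<>((int) Math.ceil(expectedEntries / 0.75), 0.75f);

        for (int i = 0; i < expectedEntries; i++) {
            grows.put("key" + i, i);
            presized.put("key" + i, i);
        }
    }
}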
A HashMap maintains O(1) complexity for putting data into and getting data from the map, but the 13th put is no longer O(1): as soon as the map sees the 13th element arrive, 75% of the (default 16-bucket) map is filled.
It will first double the bucket (array) capacity and then rehash. Rehashing means re-computing the bucket positions of the 12 key-value pairs already placed and moving them to their new indexes, which takes time.
From what I understand, the time complexity to iterate through a Hash table with capacity "m" and number of entries "n" is O(n+m). I was wondering, intuitively, why this is the case? For instance, why isn't it n*m?
Thanks in advance!
You are absolutely correct. Iterating a HashMap is an O(n + m) operation, with n being the number of elements contained in the HashMap and m being its capacity. Actually, this is clearly stated in the docs:
Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings).
Intuitively (and conceptually), this is because a HashMap consists of an array of buckets, with each element of the array pointing to either nothing (i.e. to null), or to a list of entries.
So if the array of buckets has size m, and if there are n entries in the map in total (I mean, n entries scattered throughout all the lists hanging from some bucket), then, iterating the HashMap is done by visiting each bucket, and, for buckets that have a list with entries, visiting each entry in the list. As there are m buckets and n elements in total, iteration is O(m + n).
Note that this is not the case for all hash table implementations. For example, a LinkedHashMap is like a HashMap, except that it also keeps all its entries connected in a doubly-linked list (to preserve either insertion or access order). To iterate a LinkedHashMap there is no need to visit each bucket: it is enough to visit the very first entry and then follow its link to the next entry, and so on until the last one. Thus, iterating a LinkedHashMap is just O(n), with n being the total number of entries.
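A small sketch that illustrates the difference (the class name and sample keys are arbitrary): both maps are given plenty of spare buckets, but only the LinkedHashMap's iteration follows the entry links and reports insertion order.

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class IterationOrderDemo {
    public static void main(String[] args) {
        Map<String, Integer> hashMap = new HashMap<>(64);         // many empty buckets to walk
        Map<String, Integer> linkedMap = new LinkedHashMap<>(64); // entries also form a linked list

        for (String key : new String[]{"banana", "apple", "cherry"}) {
            hashMap.put(key, key.length());
            linkedMap.put(key, key.length());
        }

        // HashMap iteration visits every bucket (O(capacity + size)); the order is unspecified.
        System.out.println("HashMap:       " + hashMap.keySet());

        // LinkedHashMap iteration follows the entry links (O(size)), in insertion order.
        System.out.println("LinkedHashMap: " + linkedMap.keySet());
    }
}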
Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings)
n = the number of buckets
m = the number of key-value mappings
The complexity of iterating a HashMap is O(n + m) because in the worst-case scenario one array element contains the whole linked list of entries, which can occur when the key's hashCode() implementation is flawed.
Visualise the worst-case scenario: every entry hangs off a single bucket while the remaining buckets are empty.
To iterate in this scenario, Java first needs to walk the complete array, O(n), and then walk the linked list, O(m); combining these gives O(n + m).
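A hedged sketch of how such a worst case can be produced on purpose: a key class whose hashCode() always returns the same value, so every entry lands in the same bucket (BadKey is an invented name for illustration).

import java.util.HashMap;
import java.util.Map;

public class WorstCaseKey {

    // A deliberately bad key: every instance hashes to the same bucket,
    // so all entries pile up in one bucket's list/tree.
    static final class BadKey {
        private final int id;
        BadKey(int id) { this.id = id; }

        @Override public int hashCode() { return 42; }   // constant hash -> every put collides
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        Map<BadKey, String> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            map.put(new BadKey(i), "value" + i);          // all entries share one bucket
        }
        // Lookups now degrade from O(1) toward O(log m) (treeified bucket) or O(m).
        System.out.println(map.get(new BadKey(9_999)));
    }
}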
I have a Java class which contains two Strings, for example the name of a person and the name of the group.
I also have a list of groups (about 10) and a list of persons (about 100). The list of my data objects is larger; it can exceed 10,000 items.
Now I would like to search through my data objects such that I find all objects having a person from the person list and a group from the group list.
My question is: what is the best data structure for the person and group list?
I could use an ArrayList and simply iterate until I find a match, but that is obviously inefficient. A HashSet or HashMap would be much better.
Are there even more efficient ways to solve this? Please advise.
Every data structure has pros and cons.
A Map is used to retrieve data in O(1) if you have the access key.
A List is used to maintain an order between elements, but accessing an element by key is not possible; you need to loop over the whole list, which takes O(n).
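Here is a rough sketch of that idea applied to the question above, assuming a simple Record class for the data objects (the class and method names are made up): put the ~10 groups and ~100 persons into HashSets, then make one pass over the ~10,000 records with O(1) membership checks.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MatchFilter {

    // The data object from the question: a person name plus a group name.
    static final class Record {
        final String person;
        final String group;
        Record(String person, String group) { this.person = person; this.group = group; }
    }

    // One pass over the records; each contains() call is O(1) on average.
    static List<Record> findMatches(List<Record> records,
                                    Set<String> persons, Set<String> groups) {
        List<Record> matches = new ArrayList<>();
        for (Record r : records) {
            if (persons.contains(r.person) && groups.contains(r.group)) {
                matches.add(r);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Set<String> persons = new HashSet<>(List.of("Alice", "Bob"));
        Set<String> groups = new HashSet<>(List.of("Admins"));
        List<Record> data = List.of(new Record("Alice", "Admins"), new Record("Carol", "Users"));
        System.out.println(findMatches(data, persons, groups).size());   // 1
    }
}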
A good data structure for storing and looking up strings is a Trie:
It's essentially a tree structure which uses characters or substrings to denote paths to follow.
Advantages over hash-maps (quote from Wikipedia):
Looking up data in a trie is faster in the worst case, O(m) time (where m is the length of a search string), compared to an imperfect hash table. An imperfect hash table can have key collisions. A key collision is the hash function mapping of different keys to the same position in a hash table. The worst-case lookup speed in an imperfect hash table is O(N) time, but far more typically is O(1), with O(m) time spent evaluating the hash.
There are no collisions of different keys in a trie.
Buckets in a trie, which are analogous to hash table buckets that store key collisions, are necessary only if a single key is associated with more than one value.
There is no need to provide a hash function or to change hash functions as more keys are added to a trie.
A trie can provide an alphabetical ordering of the entries by key.
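For illustration only, a minimal trie sketch in Java (insert and exact-match lookup, no deletion); a production implementation would look different, but it shows the O(m) lookup the quote refers to, where m is the length of the key.

import java.util.HashMap;
import java.util.Map;

// A minimal trie sketch: insert and exact-match lookup only.
public class Trie {
    private final Node root = new Node();

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean endOfWord;
    }

    // O(m) where m is the length of the word, regardless of how many words are stored.
    public void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.endOfWord = true;
    }

    public boolean contains(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return false;
            }
        }
        return node.endOfWord;
    }

    public static void main(String[] args) {
        Trie trie = new Trie();
        trie.insert("alice");
        System.out.println(trie.contains("alice")); // true
        System.out.println(trie.contains("ali"));   // false, only a prefix
    }
}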
I agree with Davide's answer. If we want fast lookup as well as to maintain the order, then we can go for the LinkedHashMap implementation of Map.
By using it, we get both things:
Fast data retrieval, if we have the access key.
The insertion order is maintained, so while iterating we get the data in the same order it was inserted.
Depending on the scenario (If you have the data before receiving lists of groups/people), preprocessing the data would save you time.
Comparing the data to the groups/people lists will require at least 10,000+ lookups. Comparing the groups/people lists to the data will require at most 10 * 100 = 1,000 lookups, or fewer if you compare against each group one at a time (10 + 100 = 110 lookups).
I have a gigantic data set which I have to store in a collection, and I need to find out whether there are any duplicates in it or not.
The data size could be more than 1 million entries. I know I can store more elements in an ArrayList compared to a Map.
My questions are:
Is searching for a key in a Map faster than searching in a sorted ArrayList?
Is searching for a key in a HashMap faster than in a TreeMap?
Purely in terms of the space required to store n elements, which would be more efficient: a TreeMap or a HashMap?
1) Yes. Searching an ArrayList is O(n) on average. The performance of key lookups in a Map depends on the specific implementation. You could write an implementation of Map that is O(n) or worse if you really wanted to, but all the implementations in the standard library are faster than O(n).
2) Yes. HashMap is O(1) on average for simple key lookups. TreeMap is O(log(n)).
Class HashMap<K,V>
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets.
Class TreeMap<K,V>
This implementation provides guaranteed log(n) time cost for the containsKey, get, put and remove operations. Algorithms are adaptations of those in Cormen, Leiserson, and Rivest's Introduction to Algorithms.
3) The space requirements will be O(n) in both cases. I'd guess the TreeMap requires slightly more space, but only by a constant factor.
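Relating this back to the duplicate-detection question above, a small sketch (the hasDuplicates helper is an invented name): a HashSet gives an O(n) single-pass check, since Set.add reports whether the element was already present.

import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateCheck {

    // Set.add returns false if the element is already present, so one pass
    // over the data (O(n), with O(1) per add for a HashSet) finds the first duplicate.
    static <T> boolean hasDuplicates(Iterable<T> data) {
        Set<T> seen = new HashSet<>();
        for (T item : data) {
            if (!seen.add(item)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasDuplicates(List.of(1, 2, 3)));    // false
        System.out.println(hasDuplicates(List.of(1, 2, 2, 3))); // true
    }
}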
It depends on the type of Map you're using.
A HashMap has a constant-time average lookup (O(1)), while a TreeMap's average lookup time is based on the depth of the tree (O(log(n))), so a HashMap is faster.
The difference is probably moot. Both data structures require some amount of constant overhead in space complexity by design (both exhibit O(n) space complexity).
I just did some benchmark testing of lookup performance between a HashMap and a sorted ArrayList. The answer is that the HashMap is much faster as the size increases; I am talking about 10x, 20x, 30x faster. In a test with 1 million entries, the sorted ArrayList's get and add operations took seconds to complete, whereas the HashMap's get and put took only around 50 ms.
Here are some things I found or observed:
For a sorted ArrayList, you have to sort it first to be able to search it efficiently (with binarySearch, for example). In practice you don't have a purely static list (it changes via add and remove), so the add and get methods also need to do a binary search to stay efficient. Even then, add and get become slower and slower as the list grows.
A HashMap, on the other hand, does not show much change in put and get times as it grows. The problem with a HashMap is memory overhead; if you can live with that, go with the HashMap.
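A very rough sketch of such a measurement, not a proper JMH benchmark, and it only times lookups (the sorted list's real weakness, keeping itself sorted on every add, is not measured here); absolute numbers will vary by machine.

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RoughLookupBenchmark {
    public static void main(String[] args) {
        final int n = 1_000_000;

        Map<Integer, Integer> map = new HashMap<>();
        List<Integer> sorted = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            map.put(i, i);
            sorted.add(i);   // already sorted here; in practice every add must keep it sorted
        }

        long sink = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sink += map.get(i);                          // O(1) average per lookup
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sink += Collections.binarySearch(sorted, i); // O(log n) per lookup
        }
        long t2 = System.nanoTime();

        System.out.printf("HashMap.get:  %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("binarySearch: %d ms%n", (t2 - t1) / 1_000_000);
        System.out.println("sink=" + sink + " (printed so the JIT cannot discard the loops)");
    }
}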
I need a Map<Integer, String> where the main requirement is fast retrieval of values by key. However, I also need to retrieve a List of all entries (key-value pairs) whose keys are in a range (n1 to n2). No sorting of the list is required.
The map would hold at least 10,000 such entries.
I initially thought of using a TreeMap, but that doesn't help with fast retrievals (get() is O(log n)). Is it possible to get a list of entries from a HashMap whose keys are in the range n1 to n2?
What would be my best bet to go with?
The two implementations of NavigableMap (which allow you to retrieve sub-maps or subsets based on key ranges) are TreeMap and ConcurrentSkipListMap, both of which offer O(log n) access time.
Assuming you require O(1) access time as with a regular HashMap, I suggest implementing your own (inefficient) "key range" methods. In other words, sacrifice the performance of the key-range operation for the improved access time you get with a regular HashMap. There isn't really a way around this: the NavigableMap methods inherently depend on the data being stored in sorted order, which means you will never achieve O(1) access time with them.
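For reference, the sorted-map route looks like this with TreeMap's NavigableMap.subMap (a real method of the standard library); the trade-off is O(log n) instead of O(1) for plain get() calls.

import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class RangeQueryDemo {
    public static void main(String[] args) {
        NavigableMap<Integer, String> map = new TreeMap<>();
        for (int i = 0; i < 10_000; i++) {
            map.put(i, "value" + i);
        }

        // subMap locates the range in O(log n); the view contains every entry
        // with a key in [n1, n2], both ends inclusive here.
        int n1 = 100, n2 = 200;
        Map<Integer, String> inRange = map.subMap(n1, true, n2, true);
        System.out.println(inRange.size());   // 101
    }
}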
How closely are the keys distributed? With 10,000 elements equally distributed over 20,000 possibilities (say 0 to 19999), I could imagine that a search for elements from, say, 4 to 14 would be fine; you would only miss at about a 50% rate.
I also wonder why a TreeMap "doesn't help with faster retrievals"; O(log n) for get() operations is still fast.
If you have a tree with smaller values to the left and bigger ones to the right, you can return big parts of subtrees. Does it need to be both a Map and a List?
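A minimal sketch of the probing idea hinted at above (the valuesInRange helper is an invented name): keep the HashMap for O(1) get() and simply probe every key in [n1, n2], accepting the misses when the keys are sparse.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashMapRangeProbe {

    // Keeps HashMap's O(1) get() and pays O(n2 - n1) probes for a range query.
    // Only worthwhile when the range is narrow and the keys are densely populated;
    // otherwise most probes miss (the "50% miss rate" idea above).
    static List<String> valuesInRange(Map<Integer, String> map, int n1, int n2) {
        List<String> result = new ArrayList<>();
        for (int key = n1; key <= n2; key++) {
            String value = map.get(key);
            if (value != null) {
                result.add(value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        for (int i = 0; i < 20_000; i += 2) {      // every second key present -> ~50% misses
            map.put(i, "value" + i);
        }
        System.out.println(valuesInRange(map, 4, 14).size());   // 6
    }
}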