HashMap vs array performance difference in following approach - java

To solve Dynamic programming problem I used two approaches to store table entries, one using multi dimension array ex:tb[m][n][p][q] and other using hashmap and using indexes of 1st approach to make string to be used as key as in "m,n,p,q". But on one input first approach completes in 2 minutes while other takes more than 3 minutes.
If access time of both hashmap and array is asymptotically equal than why so big difference in performance ?

Like mentioned here:
HashMap uses an array underneath so it can never be faster than using
an array correctly.
You are right, array's and HashMap's access time is in O(1) but this just says it is independent on input size or the current size of your collection. But it doesn't say anything about the actual work which has to be done for each action.
To access an entry of an array you have to calculate the memory address of your entry. This is easy as array's memory address + (index * size of entity).
To access an entry of a HashMap, you first have to hash the given key (which needs many cpu cycles), then access the entry of the HashMap's array using the hash which holds a list (depends on implementation details of the HashMap), and last you have to linear search the list for the correct entry (those lists are very short most of the time, so it is treated as O(1)).
So you see it is more like O(10) for arrays and O(5000) hash maps. Or more precise T(Array access) for arrays and T(hashing) + T(Array access) + T(linear search) for HashMaps with T(X) as actual time of action x.

Related

How hashmap identifies when it needs rehashing

How Hashmap identifies that this bucket is full and it needs rehashing as it stores the value in linked list if two hashcodes are same, then as per my understanding this linkedlist does not have any fixed size and it can store as many elements it can so this bucket will never be full then how it will identify that it needs rehashing?
In a ConcurrentHashMap actually a red-black tree (for large number of elements) or a linked list (for small number of elements) is used when there is a collision (i.e. two different keys have the same hash code). But you are right when you say, the linked list (or red-black tree) can grow infinitely (assuming you have infinite memory and heap size).
The basic idea of a HashMap or ConcurrentHashMap is, you want to retrieve a value based on its key in O(1) time complexity. But in reality, collisions do happen and when that happens we put the nodes in a tree linked to the bucket (or array cell). So, Java could create a HashMap where the array size would remain fixed and rehashing would never happen, but if they did that all your key values need to be accommodated within the fixed-size array (along with their linked trees).
Let's say you have that kind of HashMap where your array size is fixed to 16 and you push 1000 key-value pairs in it. In that case, you can have at most 16 distinct hash codes. This in turn means that you would have collisions in all (1000-16) puts and those new nodes will end up in the tree and they can no longer be fetched in O(1) time complexity. In tree you'd need O(log n) time to search for keys.
To make sure that this doesn't happen, the HashMap uses a load factor calculation to determine how much of the array is filled with key-value pairs. If it is 75% (default settings) full, any new put would create a new larger array, copy the existing content into it and thus have more buckets or hash code space. This ensures that in most of the cases collisions won't happen and the tree won't be required and you would fetch most keys in O(1) time.
Hashmap maintain complexity of O(1) while inserting data in and
getting data from hashmap, but for 13th key-value pair, put request
will no longer be O(1), because as soon as map will realize that 13th
element came in, that is 75% of map is filled.
It will first double the bucket(array) capacity and then it will go
for Rehash. Rehashing requires re-computing hashcode of already placed
12 key-value pairs again and putting them at new index which requires
time.
Kindly refer this link, this will be helpful for you.

How to quickly know the indexes in a massive ArrayList of a very large number of strings from this ArrayList in Java?

Suppose that I have a collection of 50 million different strings in a Java ArrayList. Let foo be a set of 40 million arbitrarily chosen (but fixed) strings from the previous collection. I want to know the index of every string in foo in the ArrayList.
An obvious way to do this would be to iterate through the whole ArrayList until we found a match for the first string in foo, then for the second one and so on. However, this solution would take an extremely long time (considering also that 50 million was an arbitrary large number that I picked for the example, the collection could be in the order of hundreds of millions or even billions but this is given from the beginning and remains constant).
I thought then of using a Hashtable of fixed size 50 million in order to determine the index of a given string in foo using someStringInFoo.hashCode(). However, from my understanding of Java's Hashtable, it seems that this will fail if there are collisions as calling hashCode() will produce the same index for two different strings.
Lastly, I thought about first sorting the ArrayList with the sort(List<T> list) in Java's Collections and then using binarySearch(List<? extends T> list,T key,Comparator<? super T> c) to obtain the index of the term. Is there a more efficient solution than this or is this as good as it gets?
You need additional data structure that is optimized for searching strings. It will map string to it's index. The idea is that you iterate your original list populating your data structure and then iterate your set, performing searches in that data structure.
What structure should you choose?
There are three options worth considering:
Java's HashMap
TRIE
Java's IdentityHashMap
The first option is simple to implement but provides not the best possible performance. But still, it's population time O(N * R) is better than sorting the list, which is O(R * N * log N). Searching time is better then in sorted String list (amortized O(R) compared to O(R log N).
Where R is the average length of your strings.
The second option is always good for maps of strings, providing guaranteed population time for your case of O(R * N) and guaranteed worst-case searching time of O(R). The only disadvantage of it is that there is no out-of-box implementation in Java standard libraries.
The third option is a bit tricky and suitable only for your case. In order to make it work you need to ensure that strings from the first list are literally used in second list (are the same objects). Using IdentityHashMap eliminates String's equals cost (the R above), as IdentityHashMap compares strings by address, taking only O(1). Population cost will be amortized O(N) and search cost amortized O(1). So this solution provides the best performance and out-of-box implementation. However please note that this solution will work only if there are no duplicates in the original list.
If you have any questions please let me know.
You can use a Java Hashtable with no problems. According to the Java Documentation "in the case of a "hash collision", a single bucket stores multiple entries, which must be searched sequentially."
I think you have a misconception on how hash tables work. Hash collisions do NOT ruin the implementation. A hash table is simply an array of linked-lists. Each key goes through a hash function to determine the index in the array which the element will be placed. If a hash collision occurs, the element will be placed at the end of the linked-list at the index in the hash-table array. See link below for diagram.

How key is mapped to value in a hasmap in java with O(1) get() method?

Consider an int array variable x[]. The variable X will have starting address reference. When array is accessed with index 2 that is x[2].then its memory location is calculated as
address of x[2] is starting addr + index * size of int.
ie. x[2]=x + 2*4.
But in case of hashmap how the memory address will be mapped internally.
By reading many previous posts I observed that HashMap uses a linked list to store the key value list. But if that is the case, then to find a key, it generates a hashcode then it will checks for equal hash code in list and retrieves the value..
This takes O(n) complexity.
If i am wrong in above observation Please correct me... I am a beginner.
Thank you
The traditional implementation of a HashMap is to use a function to generate a key, then use that key to access a value directly. Think of it as generating something that will translate into an array index. It does not look through the hashmap comparing element hashes to the generated hash; it generates the hash, and uses the hash to access an element directly.
What I think you're talking about is the case where two values in the HashMap generate the SAME key. THEN it uses a list of those, and has to look through them to determine which one it wants. But that's not O(n) where n is the number of elements in the HashMap, just O(m) where m is the number of elements with the same hash. Clearly the name of the game is to find a hash function where the generated hash is unique for all the elements, as much as is feasible.
--- edit to expand on the explanation ---
In your post, you state:
By reading many previous posts I observed that HashMap uses a linked
list to store the key value list.
This is wrong for the overall HashMap. For a HashMap to work reasonably, there must be a way to use the key to calculate a way to access the corresponding element directly, not by searching through all the values in the HashMap.
A "perfect" hash calculation would translate each possible key into hash value that was not calculated for any other key. This is usually not feasible, and so it is usually possible that two different keys will result in the same result from the hash calculation. In this case, the HashMap implementation could use a linked list of values, and would need to look through all such values to find the one that it was looking for. But this number is FAR less than the number of values in the overall HashMap.
You can make a hash where strings are the keys, and in which the first character of the string is converted to a number which is then used as an array index. As long as your strings all have different first characters, then accessing the value is a simple calc plus an array access -- O(1). Or you could add all the character values of the string indices together and take the last two (or three) digits, and THAT would be your hash calc. As long as the addition produced unique values for each index string, you don't ever have to look through a list; again, O(1).
And, in fact, as long as the hash calculation is approximately perfect, the lookup is still O(1) overall, because the limited number of times you have to look through a short list does not alter the overall efficiency.

What is the Clear Purpose of SparseBooleanArray?? [ I referred official android site for this ]

I referred the android doc site for "SparseBooleanArray" class but still not getting idea of that class about what is the purpose of that class?? For what purpose we need to use that class??
Here is the Doc Link
http://developer.android.com/reference/android/util/SparseBooleanArray.html
From what I get from the documentation it is for mapping Integer values to booleans.
That is, if you want to map, if for a certain userID a widget should be shown and some userIDs have already been deleted, you would have gaps in your mapping.
Meaning, with a normal array, you would create an array of size=maxID and add a boolean value to element at index=userID. Then when iterating over the array, you would have to iterate over maxID elements in the worst case and have to check for null if there is no boolean for that index (eg. the ID does not exist). That is really inefficient.
When using a hashmap to do that you could map the ID to the boolean, but with the added overhead of generating the hashvalue for the key (that is why it is called *hash*map), which would ultimately hurt performance firstly in CPU cycles as well as RAM usage.
So that SparseBooleanArray seems like a good middleway of dealing with such a situation.
NOTE: Even though my example is really contrieved, I hope it illustrates the situation.
Like the javadoc says, SparseBooleanArrays map integers to booleans which basically means that it's like a map with Integer as a key and a boolean as value (Map).
However it's more efficient to use in this particular case It is intended to be more efficient than using a HashMap to map Integers to Booleans
Hope this clears out any issues you had with the description.
I found a very specific and wonderful use for the sparse boolean array.
You can put a true or false value to be associated with a position in a list.
For example: List item #7 was clicked, so putting 7 as the key and true as the value.
There can be three ways to store resource id's
1 Array
Boolean array containing id's as indexes.If we have used that id set it to true else false
Though all the operations are fast but this implementation will require huge amount of space.So it can't be used
High Space Complexity
2 HashMap
Key-ID
Value-Boolean True/False
Using this we need to process each id using the hashing function which will consume memory.Also there may be some empty locations where no id will be stored and we also need to deal with crashes.So due to usage complexity and medium space complexity, it is not used.
Medium Space Complexity
3 SparseBooleanArray
It is middle way.It uses mapping and Array Implementation
Key - ID
Value - Boolean True/False
It is an ArrayList which stores id's in an increasing order.So minimum space is used as it only contains id's which are being used.For searching an id binary search is used.
Though Binary Search O(logn) is slower than hashing O(1) or Array O(1),i.e. all the operations Insertion, Deletion, Searching will take more time but there is least memory wastage.So to save memory we prefer SparseBoolean Array
Least Space Complexity

What is the general purpose of using hashtables as a collection? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
What exactly are hashtables?
I understand the purpose of using hash functions to securely store passwords. I have used arrays and arraylists for class projects for sorting and searching data. What I am having trouble understanding is the practical value of hashtables for something like sorting and searching.
I got a lecture on hashtables but we never had to use them in school, so it hasn't clicked. Can someone give me a practical example of a task a hashtable is useful for that couldn't be done with a numerical array or arraylist? Also, a very simple low level example of a hash function would be helpful.
There are all sorts of collections out there. Collections are used for storing and retrieving things, so one of the most important properties of a collection is how fast these operations are. To estimate "fastness" people in computer science use big-O notation which sort of means how many individual operations you have to accomplish to invoke a certain method (be it get or set for example). So for example to get an element of an ArrayList by an index you need exactly 1 operation, this is O(1), if you have a LinkedList of length n and you need to get something from the middle, you'll have to traverse from the start of the list to the middle, taking n/2 operations, in this case get has complexity of O(n). The same comes to key-value stores as hastable. There are implementations that give you complexity of O(log n) to get a value by its key whereas hastable copes in O(1). Basically it means that getting a value from hashtable by its key is really cheap.
Basically, hashtables have similar performance characteristics (cheap lookup, cheap appending (for arrays - hashtables are unordered, adding to them is cheap partly because of this) as arrays with numerical indices, but are much more flexible in terms of what the key may be. Given a continuous chunck of memory and a fixed size per item, you can get the adress of the nth item very easily and cheaply. That's thanks to the indices being integers - you can't do that with, say, strings. At least not directly. Hashes allows reducing any object (that implements it) to a number and you're back to arrays. You still need to add checks for hash collisions and resolve them (which incurs mostly a memory overhead, since you need to store the original value), but with a halfway decent implementation, this is not much of an issue.
So you can now associate any (hashable) object with any (really any) value. This has countless uses (although I have to admit, I can't think of one that's applyable to sorting or searching). You can build caches with small overhead (because checking if the cache can help in a given case is O(1)), implement a relatively performant object system (several dynamic languages do this), you can go through a list of (id, value) pairs and accumulate the values for identical ids in any way you like, and many other things.
Very simple. Hashtables are often called "associated arrays." Arrays allow access your data by index. Hash tables allow access your data by any other identifier, e.g. name. For example
one is associated with 1
two is associated with 2
So, when you got word "one" you can find its value 1 using hastable where key is one and value is 1. Array allows only opposite mapping.
For n data elements:
Hashtables allows O(k) (usually dependent only on the hashing function) searches. This is better than O(log n) for binary searches (which follow an n log n sorting, if data is not sorted you are worse off)
However, on the flip side, the hashtables tend to take roughly 3n amount of space.

Categories

Resources