Suppose we wish to repeatedly search a linked list of length N elements, each of which contains a very long string key. How might we take advantage of the hash value when searching the list for an element with a given key?
Insert the keys into a hash table. Then you can search in (theoretically) constant time.
You need to have the hashes computed before searching the list, and you need to be able to access the hash of each string in constant time. Then you can compare the hashes first and compare the strings themselves only when the hashes match. Alternatively, you can use a hash table instead of a linked list.
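A minimal sketch of the "compare hashes first" idea, assuming each list element stores its hash next to the key. The class names HashedKey and HashFirstSearch are made up for illustration and are not from the question.

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical element type: keeps the key together with its precomputed hash.
class HashedKey {
    final String key;
    final int hash; // computed once, when the element is created

    HashedKey(String key) {
        this.key = key;
        this.hash = key.hashCode();
    }
}

class HashFirstSearch {
    // Returns the index of the first element whose key equals target, or -1.
    static int indexOf(List<HashedKey> list, String target) {
        int targetHash = target.hashCode(); // hashed once per search
        int i = 0;
        for (HashedKey e : list) {
            // Cheap int comparison first; the long string comparison runs
            // only when the two hashes agree.
            if (e.hash == targetHash && e.key.equals(target)) {
                return i;
            }
            i++;
        }
        return -1;
    }

    public static void main(String[] args) {
        List<HashedKey> list = new LinkedList<>();
        list.add(new HashedKey("alpha"));
        list.add(new HashedKey("beta"));
        list.add(new HashedKey("gamma"));
        System.out.println(indexOf(list, "beta"));  // prints 1
        System.out.println(indexOf(list, "delta")); // prints -1
    }
}
```

The search is still O(N) in the number of elements, but most mismatches are rejected by a single int comparison. Note that java.lang.String already caches its own hash code after the first hashCode() call, so the explicit field mainly makes the idea visible.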
The hash value of a String (hashCode) is a bit like an id for the string. It is not guaranteed to be unique, but collisions are usually rare. You can use a HashMap to store the String keys and their values. HashMap, as its name suggests, uses the Strings' hash values to efficiently store and retrieve values.
Not sure what constraints you're working under here (e.g. this might take way too much memory depending on how big the strings are), but if you have to maintain the linked list you could create a HashMap that maps each string to its position in the list (for instance, a reference to its node), which would allow you to retrieve any string from the list with two constant-time operations.
Put the keys in a HashSet. Its lookups will use the hash of each String value inserted.
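Here is one way the "keep the list, add an index" idea could look. It is only a sketch: IndexedList, Node and payload are hypothetical names, and the index stores a reference to the node rather than a numeric position, because reaching position i in a linked list is itself O(i).

```java
import java.util.HashMap;
import java.util.Map;

class IndexedList {
    // Minimal singly linked node; "payload" is a stand-in for the element data.
    static final class Node {
        final String key;
        final String payload;
        Node next;

        Node(String key, String payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    private Node head;
    private Node tail;
    // Side index: key -> node holding that key.
    private final Map<String, Node> index = new HashMap<>();

    void append(String key, String payload) {
        Node n = new Node(key, payload);
        if (head == null) {
            head = n;
        } else {
            tail.next = n;
        }
        tail = n;
        index.putIfAbsent(key, n); // for duplicate keys, keep the first node
    }

    // Expected constant time: one hash lookup, no list traversal.
    Node find(String key) {
        return index.get(key);
    }

    public static void main(String[] args) {
        IndexedList list = new IndexedList();
        list.append("k1", "first");
        list.append("k2", "second");
        System.out.println(list.find("k2").payload); // prints second
        System.out.println(list.find("nope"));       // prints null
    }
}
```

The cost is the extra memory for the map, which is the constraint the answer above warns about.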
I see that HashSet in Java internally uses a HashMap to check whether the HashSet contains an element or not. Can't it just use a bitmap for storing all the hash results from the strings? E.g. the String abc hashes to, say, index 12, and we just set the bit at this index to show that it is present. It would save a lot of space compared to a HashMap, as we don't have to store the actual keys in the data.
If a HashSet were used for contains() lookups only, an optimization like that might be possible. It would still be dangerous, because hash collisions can always occur. I think what you are looking for is a Bloom Filter (note that a Bloom Filter doesn't give exact answers: it can report false positives, but never false negatives).
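To make the collision problem concrete, here is a small sketch of the bitmap idea from the question. HashBitmap is a hypothetical class written for this illustration (effectively a Bloom filter with a single hash function), not something from the JDK. The strings "Aa" and "BB" are a known pair with the same String.hashCode().

```java
import java.util.BitSet;

class HashBitmap {
    private final BitSet bits;
    private final int size;

    HashBitmap(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    // Maps the hash code to a bit position; floorMod keeps it non-negative.
    private int slot(String s) {
        return Math.floorMod(s.hashCode(), size);
    }

    void add(String s) {
        bits.set(slot(s));
    }

    // true means "possibly present", false means "definitely absent".
    boolean mightContain(String s) {
        return bits.get(slot(s));
    }

    public static void main(String[] args) {
        HashBitmap set = new HashBitmap(1 << 16);
        set.add("Aa");
        System.out.println(set.mightContain("Aa")); // prints true
        System.out.println(set.mightContain("BB")); // prints true, although "BB" was never added
        System.out.println(set.mightContain("C"));  // prints false
    }
}
```

Because the keys themselves are not stored, the structure cannot tell "Aa" from "BB", and it cannot return its elements either.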
A HashSet is a collection, and a collection must be able to return the values stored in it. Hashes are not reversible: you can't calculate the original string from its hash.
I have a Java class which contains two Strings, for example the name of a person and the name of the group.
I also have a list of groups (about 10) and a list of persons (about 100). The list of my data objects is larger; it can exceed 10,000 items.
Now I would like to search through my data objects such that I find all objects having a person from the person list and a group from the group list.
My question is: what is the best data structure for the person and group list?
I could use an ArrayList and simply iterate until I find a match, but that is obviously inefficient. A HashSet or HashMap would be much better.
Are there even more efficient ways to solve this? Please advise.
Every data structure has pros and cons.
A Map is used to retrieve data in O(1) if you have an access key.
A List is used to maintain an order between elements, but accessing an element by key is not possible; you need to loop over the whole list, which takes O(n).
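For the person/group question above, a hash-based lookup could look like the following sketch. Membership and MembershipFilter are hypothetical names standing in for the asker's data class, which is not shown in the post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical data object holding the two Strings from the question.
class Membership {
    final String person;
    final String group;

    Membership(String person, String group) {
        this.person = person;
        this.group = group;
    }
}

class MembershipFilter {
    // Keeps every object whose person AND group are both in the given sets.
    static List<Membership> filter(List<Membership> data,
                                   Set<String> persons,
                                   Set<String> groups) {
        List<Membership> result = new ArrayList<>();
        for (Membership m : data) {
            // Each contains() is an expected O(1) hash lookup.
            if (persons.contains(m.person) && groups.contains(m.group)) {
                result.add(m);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> persons = new HashSet<>(Arrays.asList("ann", "bob"));
        Set<String> groups = new HashSet<>(Arrays.asList("admins"));
        List<Membership> data = Arrays.asList(
                new Membership("ann", "admins"),
                new Membership("ann", "guests"),
                new Membership("eve", "admins"));
        System.out.println(filter(data, persons, groups).size()); // prints 1
    }
}
```

With roughly 10,000 data objects this is one pass over the data with two set lookups per object.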
A good data structure for storing and looking up strings is a Trie:
It's essentially a tree structure which uses characters or substrings to denote paths to follow.
Advantages over hash-maps (quote from Wikipedia):
Looking up data in a trie is faster in the worst case, O(m) time (where m is the length of a search string), compared to an imperfect hash table. An imperfect hash table can have key collisions. A key collision is the hash function mapping of different keys to the same position in a hash table. The worst-case lookup speed in an imperfect hash table is O(N) time, but far more typically is O(1), with O(m) time spent evaluating the hash.
There are no collisions of different keys in a trie.
Buckets in a trie, which are analogous to hash table buckets that store key collisions, are necessary only if a single key is associated with more than one value.
There is no need to provide a hash function or to change hash functions as more keys are added to a trie.
A trie can provide an alphabetical ordering of the entries by key.
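A bare-bones trie might look like this. It is a sketch only: the Trie class below is written for illustration and stores its child links in a HashMap per node for brevity.

```java
import java.util.HashMap;
import java.util.Map;

class Trie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a stored key ends at this node
    }

    private final Node root = new Node();

    // Walks (and creates) one node per character: O(m) for a key of length m.
    void insert(String key) {
        Node cur = root;
        for (int i = 0; i < key.length(); i++) {
            cur = cur.children.computeIfAbsent(key.charAt(i), c -> new Node());
        }
        cur.terminal = true;
    }

    // Follows the path spelled by the key; fails as soon as a link is missing.
    boolean contains(String key) {
        Node cur = root;
        for (int i = 0; i < key.length(); i++) {
            cur = cur.children.get(key.charAt(i));
            if (cur == null) {
                return false;
            }
        }
        return cur.terminal;
    }

    public static void main(String[] args) {
        Trie t = new Trie();
        t.insert("group");
        t.insert("groups");
        System.out.println(t.contains("group"));  // prints true
        System.out.println(t.contains("grou"));   // prints false (only a prefix)
        System.out.println(t.contains("person")); // prints false
    }
}
```

Replacing the per-node HashMap with a TreeMap would be one way to get the alphabetical traversal mentioned above, at the price of a slower step per character.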
I agree with @Davide's answer. If we want fast lookup and also want to maintain the order, then we can go for the LinkedHashMap implementation of Map.
By using it, we get both things:
Fast data retrieval, if we have the access key.
Insertion order is maintained, so while iterating we get the data in the same order in which it was inserted.
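A short demonstration of both properties, using only java.util.LinkedHashMap. The keys and values are arbitrary sample data, and InsertionOrderDemo is just a name for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InsertionOrderDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("zeta", 1);
        m.put("alpha", 2);
        m.put("mid", 3);
        return m;
    }

    // Iteration follows insertion order, not alphabetical or hash order.
    static List<String> keysInOrder() {
        return new ArrayList<>(sample().keySet());
    }

    public static void main(String[] args) {
        System.out.println(sample().get("alpha")); // prints 2
        System.out.println(keysInOrder());         // prints [zeta, alpha, mid]
    }
}
```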
Depending on the scenario (If you have the data before receiving lists of groups/people), preprocessing the data would save you time.
Comparing the data to the groups/people lists will require at least 10,000 lookups. Comparing the groups/people lists to the data will require at most 10*100 = 1,000 lookups, or fewer if you compare against each group one at a time (10+100 = 110 lookups).
I have coded a standard Hash Table class in java. It has a large array of buckets, and to insert, retrieve or delete elements, I simply calculate the hash of the element and look at the appropriate index in the array to get the right bucket.
However, I would like to implement some sort of iterator. Is there another way than looping through all the indices in the array and ignoring those that are empty? My hash table might contain hundreds of empty entries and only a few elements that have been hashed and inserted. Is there an O(n) way to iterate instead of O(size of table) when n << size of table?
To implement findMin, I could simply save the smallest element each time I insert a new one, but I want to use the iterator approach.
Thanks!
You can maintain a linked list of the map entries, like LinkedHashMap does in the standard library.
Or you can make your hash table ensure that the capacity is always at most kn, for some suitable value of k. This will ensure iteration is linear in n.
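The first suggestion (threading a linked list through the entries) could be sketched as below. LinkedHashTable is a made-up class for this answer, it fixes String keys and int values, and it leaves out deletion and resizing to stay short. Supporting removal in O(1) would need the insertion chain to be doubly linked.

```java
import java.util.ArrayList;
import java.util.List;

class LinkedHashTable {
    private static final class Entry {
        final String key;
        int value;
        Entry nextInBucket; // collision chain inside one bucket
        Entry nextInserted; // insertion-order chain across the whole table

        Entry(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    private final Entry[] buckets;
    private Entry first; // oldest entry
    private Entry last;  // newest entry

    LinkedHashTable(int capacity) {
        buckets = new Entry[capacity];
    }

    private int indexFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(String key, int value) {
        int i = indexFor(key);
        for (Entry e = buckets[i]; e != null; e = e.nextInBucket) {
            if (e.key.equals(key)) {
                e.value = value; // existing key: update in place
                return;
            }
        }
        Entry e = new Entry(key, value);
        e.nextInBucket = buckets[i];
        buckets[i] = e;
        if (first == null) {
            first = e;
        } else {
            last.nextInserted = e;
        }
        last = e;
    }

    Integer get(String key) {
        for (Entry e = buckets[indexFor(key)]; e != null; e = e.nextInBucket) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    // Visits only the n stored entries, never the empty buckets.
    List<String> keys() {
        List<String> out = new ArrayList<>();
        for (Entry e = first; e != null; e = e.nextInserted) {
            out.add(e.key);
        }
        return out;
    }
}
```

A findMin built on top of keys() (or on an equivalent iterator) would then cost O(n) regardless of how large the bucket array is.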
You could store a sorted list of the non-empty buckets, and insert a bucket's id into the list (if it's not already there) when you insert something in the hash table.
But maybe it's not too expensive to search through a few hundred empty buckets, if it's not buried too deep inside a loop. A little inefficiency might be better than a more complex design.
If order is important to you, you should consider using a Binary Search Tree (a left-leaning red-black tree, for example) or a Skip List to implement your Dictionary. They are better suited to the job in these cases.
Consider an int array variable x[]. The variable x holds the starting address of the array. When the array is accessed with index 2, that is x[2], its memory location is calculated as
address of x[2] = starting address + index * size of int,
i.e. address of x[2] = x + 2*4 (in bytes, assuming a 4-byte int).
But in the case of a HashMap, how is the memory address mapped internally?
By reading many previous posts I observed that HashMap uses a linked list to store the key value list. But if that is the case, then to find a key, it generates a hash code, then checks for an equal hash code in the list and retrieves the value.
This takes O(n) complexity.
If I am wrong in the above observation, please correct me. I am a beginner.
Thank you
The traditional implementation of a HashMap is to use a function to generate a hash from the key, then use that hash to access a value directly. Think of it as generating something that will translate into an array index. It does not look through the hashmap comparing element hashes to the generated hash; it generates the hash, and uses the hash to access an element directly.
What I think you're talking about is the case where two keys in the HashMap generate the SAME hash. THEN it uses a list of those, and has to look through them to determine which one it wants. But that's not O(n) where n is the number of elements in the HashMap, just O(m) where m is the number of elements with the same hash. Clearly the name of the game is to find a hash function where the generated hash is unique for all the elements, as much as is feasible.
--- edit to expand on the explanation ---
In your post, you state:
By reading many previous posts I observed that HashMap uses a linked
list to store the key value list.
This is wrong for the overall HashMap. For a HashMap to work reasonably, there must be a way to use the key to calculate a way to access the corresponding element directly, not by searching through all the values in the HashMap.
A "perfect" hash calculation would translate each possible key into a hash value that was not calculated for any other key. This is usually not feasible, and so it is usually possible that two different keys will produce the same result from the hash calculation. In this case, the HashMap implementation could use a linked list of values, and would need to look through all such values to find the one that it was looking for. But this number is FAR less than the number of values in the overall HashMap.
You can make a hash where strings are the keys, and in which the first character of the string is converted to a number which is then used as an array index. As long as your strings all have different first characters, accessing the value is a simple calculation plus an array access: O(1). Or you could add the character values of the key string together and take the last two (or three) digits, and THAT would be your hash calculation. As long as the addition produced a unique value for each key string, you would never have to look through a list; again, O(1).
And, in fact, as long as the hash calculation is approximately perfect, the lookup is still O(1) overall, because the limited number of times you have to look through a short list does not alter the overall efficiency.
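The "add the character values and keep the last two digits" scheme can be tried out directly. ToyHash is a hypothetical name, and this is a teaching toy rather than what java.util.HashMap actually computes.

```java
class ToyHash {
    // Sums the char codes of the key and keeps the last two decimal digits,
    // giving an array index in the range 0..99.
    static int sumHash(String key) {
        int sum = 0;
        for (int i = 0; i < key.length(); i++) {
            sum += key.charAt(i);
        }
        return sum % 100;
    }

    public static void main(String[] args) {
        System.out.println(sumHash("cat")); // prints 12 (99 + 97 + 116 = 312)
        System.out.println(sumHash("dog")); // prints 14 (100 + 111 + 103 = 314)
        System.out.println(sumHash("act")); // prints 12, the same slot as "cat"
    }
}
```

Any two anagrams collide under this function, which is why a real table still needs a short per-slot list (or another collision strategy) as described above.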
I have a HashMap relating Keys to Strings, and I need to compare some of the Strings against each other. However, some of the Strings may or may not be in the HashMap.
Example: Let's say I have 4 Strings that I plan to compare to each other if possible, but only 3 of them end up in the HashMap. How can I compare the Strings that are present without trying to compare them to the String that isn't, and without doing a bunch of nested ifs and elses?
edit: Alohci's solution was easy and fast, and it worked.
Loop through the .values collection of the HashMap
Store the first entry.
Compare each remaining entry with the stored one.
As soon as you find one that doesn't match, throw your error.
If you reach the end of the loop then all the strings match.
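The steps above can be sketched as follows. AllValuesEqual is a hypothetical helper, and it returns false instead of throwing, so the caller decides how to report the mismatch.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Objects;

class AllValuesEqual {
    // Returns true when every value present in the map equals the first one.
    // Strings that never made it into the map are simply not visited.
    static boolean allEqual(Map<?, String> map) {
        Iterator<String> it = map.values().iterator();
        if (!it.hasNext()) {
            return true; // nothing to compare
        }
        String first = it.next(); // store the first entry
        while (it.hasNext()) {
            // Objects.equals also copes with null values.
            if (!Objects.equals(first, it.next())) {
                return false; // first mismatch ends the loop
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> m = new HashMap<>();
        m.put("k1", "same");
        m.put("k2", "same");
        System.out.println(allEqual(m)); // prints true
        m.put("k3", "different");
        System.out.println(allEqual(m)); // prints false
    }
}
```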
It sounds like you need a reverse mapping, that maps all the values to their set of keys.
Map<Key,Value> forwardMap;
Map<Value, Set<Key>> reverseMap;
You can then see if all of the entries you are looking at are in the set. Make sure that you put the reverse mapping in when you add/remove the forward mapping.
The benefit of this approach is that the test will be O(n), where n is the number of keys you are testing, and not O(m), where m is the size of the forward map.
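A rough sketch of keeping the two maps in step. BiIndex and its method names are invented for this example, and the only point is that every put/remove touches both maps.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class BiIndex<K, V> {
    private final Map<K, V> forward = new HashMap<>();
    private final Map<V, Set<K>> reverse = new HashMap<>();

    void put(K key, V value) {
        remove(key); // drop a stale reverse entry if the key was already mapped
        forward.put(key, value);
        reverse.computeIfAbsent(value, v -> new HashSet<>()).add(key);
    }

    void remove(K key) {
        if (!forward.containsKey(key)) {
            return;
        }
        V old = forward.remove(key);
        Set<K> keys = reverse.get(old);
        keys.remove(key);
        if (keys.isEmpty()) {
            reverse.remove(old); // no key maps to this value any more
        }
    }

    V get(K key) {
        return forward.get(key);
    }

    // All keys currently mapped to the given value (empty set if none).
    Set<K> keysFor(V value) {
        return reverse.getOrDefault(value, Collections.emptySet());
    }
}
```

To check whether a group of keys all share one value, look up the value of the first key that is present and test the remaining keys against keysFor(value).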