This question already has answers here:
How does a hash table work?
(17 answers)
Closed 8 years ago.
If hash table is an array of linked-list elements and hash code is index of the element in the array then why hash table does not maintain the order of insertion ??
To be brief, a hash table (dictionary) does not maintain a total order of insertions because it doesn't need to. The abstract data type supports ammortized O(1) insertions, deletions, and searches, but does not support enumeration, and does not impose any order on the elements in the key set.
HashTable implements a dictionary, and total order of insertions is not retained because insertions with different hash values map to different chains. In the case of a dictionary implementation using chaining, keys with colliding hashes are stored in a linked list (as stated in the question), and are indeed maintained in order of insertion. There exist many other (faster) implementations of dictionaries that do not have this property. Please see the thees lecture notes for a discussion of open addressing (a dictionary implementation pattern that does not retain insertion order of colliding elements). Open addressing with double hashing, in particular, does not impose any order on the elements stored in the key set.
Because it's designed so, try LinkedHashSet instead
Related
I have a Java class which contains two Strings, for example the name of a person and the name of the group.
I also have a list of groups (about 10) and a list of persons (about 100). The list of my data objects is larger, it can exceed 10.000 items.
Now I would like to search through my data objects such that I find all objects having a person from the person list and a group from the group list.
My question is: what is the best data structure for the person and group list?
I could use an ArrayList and simply iterate until I find a match, but that is obviously inefficient. A HashSet or HashMap would be much better.
Are there even more efficient ways to solve this? Please advise.
Every data structure has pro and cons.
A Map is used to retrieve data in O(1) if you have an access key.
A List is used to mantain an order between elements, but accessing an element using a key is not possible and you need to loop the whole list that happens in O(n).
A good data-structure for storing and lookup strings is a Trie:
It's essentially a tree structure which uses characters or substrings to denote paths to follow.
Advantages over hash-maps (quote from Wikipedia):
Looking up data in a trie is faster in the worst case, O(m) time (where m is the length of a search string), compared to an imperfect hash table. An imperfect hash table can have key collisions. A key collision is the hash function mapping of different keys to the same position in a hash table. The worst-case lookup speed in an imperfect hash table is O(N) time, but far more typically is O(1), with O(m) time spent evaluating the hash.
There are no collisions of different keys in a trie.
Buckets in a trie, which are analogous to hash table buckets that store key collisions, are necessary only if a single key is associated with more than one value.
There is no need to provide a hash function or to change hash functions as more keys are added to a trie.
A trie can provide an alphabetical ordering of the entries by key.
I agree with #Davide answer..If we want fast lookup as well as to maintain the order too, then we can go for LinkedHashMap implementation of Map.
By using it, we can have both things:
Data retrieval, If we have access key.
We can maintain the insertion order, so while iterating we will get the data in the same order as of during insertion.
Depending on the scenario (If you have the data before receiving lists of groups/people), preprocessing the data would save you time.
Comparing the data to the groups/people lists will require at least 10,000+ lookups. Comparing the groups/people lists to the data will require a max 10*100 = 1,000 lookups,.. less if you compare against each group one at a time (10+100 = 110 lookups).
I have coded a standard Hash Table class in java. It has a large array of buckets, and to insert, retrieve or delete elements, I simply calculate the hash of the element and look at the appropriate index in the array to get the right bucket.
However, I would like to implement some sort of iterator. Is there an other way than looping through all the indices in the array and ignoring those that are empty? Because my hash table might contain hundreds of empty entries, and only a few elements that have been hashed and inserted. Is there a O(n) way to iterate instead of O(size of table) when n<<size of table?
To implement findMin, I could simply save the smallest element each time I insert a new one, but I want to use the iterator approach.
Thanks!
You can maintain a linked list of the map entries, like LinkedHashMap does in the standard library.
Or you can make your hash table ensure that the capacity is always at most kn, for some suitable value of k. This will ensure iteration is linear in n.
You could store a sorted list of the non-empty buckets, and insert a bucket's id into the list (if it's not already there) when you insert something in the hash table.
But maybe it's not too expensive to search through a few hundred empty buckets, if it's not buried too deep inside a loop. A little inefficiency might be better than a more complex design.
If order is important to you you should consider using a Binary Search Tree (a left leaning red black tree for example) or a Skip List to implement your Dictionary. They are better for the job in these cases.
I'm looking at this website that lists Big O complexities for various operations. For Dynamic Arrays, the removal complexity is O(n), while for Hash Tables it's O(1).
For Dynamic Arrays like ArrayLists to be O(n), that must mean the operation of removing some value from the center and then shifting each index over one to keep the block of data contiguous. Because if we're just deleting the value stored at index k and not shifting, it's O(1).
But in Hash Tables with linear probing, deletion is the same thing, you just run your value through the Hash function, go to the Dynamic Array holding your data, and delete the value stored in it.
So why do Hash Tables get O(1) credit while Dynamic Arrays get O(n)?
This is explained here. The key is that the number of values per Dynamic Array is kept under a constant value.
Edit: As Dukeling pointed out, my answer explains why a Hash Table with separate chaining has O(1) removal complexity. I should add that, on the website you were looking at, Hash Tables are credited with O(1) removal complexity because they analyse a Hash Table with separate chaining and not linear probing.
The point of hash tables is that they keep close to the best case, where the best case means a single entry per bucket. Clearly, you have no trouble accepting that to remove the sole entry from a bucket takes O(1) time.
When there are many hash conflicts, you certainly need to do a lot of shifting when using linear probing.
But the complexity for hash tables are under the assumption of Simply Uniform Hashing, meaning that it assumes that there will be a minimal number of hash conflicts.
When this happens, we only need to delete some value and shift either no values or a small (essentially constant) amount of values.
When you talk about the complexity of an algorithm, you actually need to discuss a concrete implementation.
There is no Java class called a "Hash Table" (obviously!) or "HashTable".
There are Java classes called HashMap and Hashtable, and these do indeed have O(1) deletion.
But they don't work the way that you seem to think (all?) hash tables work. Specifically, HashMap and Hashtable are organized as an array of pointers to "chains".
This means that deletion consists of finding the appropriate chain, and then traversing the chain to find the entry to remove. The first step is constant time (including the time to calculate the hash code. The second step is proportional to the length of the hash chains. But assuming that the hash function is good, the average length of the hash chain is a small constant. Hence the total time for deletion is O(1) on average.
The reason that the hash chains are short on average is that the HashMap and Hashtable classes automatically resize the main hash array when the "load factor" (the ratio of the array size to the number of entries) exceeds a predetermined value. Assuming that the hash function distributes the (actual) keys pretty evenly, you will find that the chains are roughly the same length. Assuming that the array size is proportional to the total number of entries, the actual load factor will the average hash chain length.
This reasoning breaks down if the hash function does not distribute the keys evenly. This leads to a situation where you get a lot of hash collisions. Indeed, the worst-case behaviour is when all keys have the same hash value, and they all end up on a single hash chain with all N entries. In that case, deletion involves searching a chain with N entries ... and that makes it O(N).
It turns out that the same reasoning can be applied to other forms of hash table, including those where the entries are stored in the hash array itself, and collisions are handled by rehashing scanning. (Once again, the "trick" is to expand the hash table when the load factor gets too high.)
This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
The fundamentals of Hash tables?
I am trying to implement a Simple Hash Table probably with simple java arrays. But 1st I will need to somehow have an associative array or sorts? How might a simple Hash Table implementation look like? It should still be able to do add/delete/get in O(1)
A hash table basically takes an input key, hashes it with a function to find a bucket ID, and then uses that bucket ID to either store or retrieve the data associated with that key.
In other words, for your case, you just have to provide a hashing function on your data that will give you a bucket ID of your array index.
Perhaps the simplest (and most naive) would be exclusive ORing together all the characters of your key then doing a modulus operation to bring it to the desired range. For example, say you have a structure containing:
Name
Address
Phone
All sorts of other details
You could generate a hash as follows:
set hashval to zero
for each character in Name:
hashval = hashval xor character
hashval = hashval mod 256
This would give you a bucket ID of between 0 and 255 inclusive.
Just keep in mind that the bucket may contain more than one item so you can't just use the bucket ID as an array index. Each bucket will need to be a structure containing possibly multiple items (such as a linked list or even another hashtable).
Read any text book on data structures and algorithms, or just Wikipedia "Hash Table" entry
The implementation which comes packaged with your JDK is pretty good for self study (though I admit in no way minimalistic). Have a look at it here.
This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
What exactly are hashtables?
I understand the purpose of using hash functions to securely store passwords. I have used arrays and arraylists for class projects for sorting and searching data. What I am having trouble understanding is the practical value of hashtables for something like sorting and searching.
I got a lecture on hashtables but we never had to use them in school, so it hasn't clicked. Can someone give me a practical example of a task a hashtable is useful for that couldn't be done with a numerical array or arraylist? Also, a very simple low level example of a hash function would be helpful.
There are all sorts of collections out there. Collections are used for storing and retrieving things, so one of the most important properties of a collection is how fast these operations are. To estimate "fastness" people in computer science use big-O notation which sort of means how many individual operations you have to accomplish to invoke a certain method (be it get or set for example). So for example to get an element of an ArrayList by an index you need exactly 1 operation, this is O(1), if you have a LinkedList of length n and you need to get something from the middle, you'll have to traverse from the start of the list to the middle, taking n/2 operations, in this case get has complexity of O(n). The same comes to key-value stores as hastable. There are implementations that give you complexity of O(log n) to get a value by its key whereas hastable copes in O(1). Basically it means that getting a value from hashtable by its key is really cheap.
Basically, hashtables have similar performance characteristics (cheap lookup, cheap appending (for arrays - hashtables are unordered, adding to them is cheap partly because of this) as arrays with numerical indices, but are much more flexible in terms of what the key may be. Given a continuous chunck of memory and a fixed size per item, you can get the adress of the nth item very easily and cheaply. That's thanks to the indices being integers - you can't do that with, say, strings. At least not directly. Hashes allows reducing any object (that implements it) to a number and you're back to arrays. You still need to add checks for hash collisions and resolve them (which incurs mostly a memory overhead, since you need to store the original value), but with a halfway decent implementation, this is not much of an issue.
So you can now associate any (hashable) object with any (really any) value. This has countless uses (although I have to admit, I can't think of one that's applyable to sorting or searching). You can build caches with small overhead (because checking if the cache can help in a given case is O(1)), implement a relatively performant object system (several dynamic languages do this), you can go through a list of (id, value) pairs and accumulate the values for identical ids in any way you like, and many other things.
Very simple. Hashtables are often called "associated arrays." Arrays allow access your data by index. Hash tables allow access your data by any other identifier, e.g. name. For example
one is associated with 1
two is associated with 2
So, when you got word "one" you can find its value 1 using hastable where key is one and value is 1. Array allows only opposite mapping.
For n data elements:
Hashtables allows O(k) (usually dependent only on the hashing function) searches. This is better than O(log n) for binary searches (which follow an n log n sorting, if data is not sorted you are worse off)
However, on the flip side, the hashtables tend to take roughly 3n amount of space.