(Java) data structure for fast insertion, deletion, and RANDOM SELECTION

I need a data structure that supports the following operations in O(1):
1. myList.add(Item)
2. myList.remove(Item.ID) ==> It actually requires random access
3. myList.getRandomElement() (with equal probability)
--(Please note that getRandomElement() does not mean random access, it just means: "Give me one of the items at random, with equal probability")
Note that my items are unique, so I don't care if a List or Set is used.
I checked some java data structures, but it seems that none of them is the solution:
HashSet supports 1,2 in O(1), but it cannot give me a random element in O(1). I need to call mySet.iterator().next() to select a random element, which takes O(n).
ArrayList does 1,3 in O(1), but it needs to do a linear search to find the element I want to delete, which takes O(n).
Any suggestions? Which functions should I call?
If Java does not have such a data structure, which algorithm should I use for this purpose?

You can use a combination of a HashMap and an ArrayList, if memory permits, as follows:
Store the numbers in an ArrayList arr as they come.
Use a HashMap to maintain the mapping arr[i] => i.
To pick a random element, select a random index from the ArrayList.
Deleting num:
Look up i in the HashMap for num => i
swap(i, arr.size()-1)
HashMap.remove(num)
HashMap.put(arr[i], i)
arr.remove(arr.size()-1)
All operations are O(1), at the cost of O(N) extra space.
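The steps above can be sketched in Java as follows (a minimal sketch; the class name RandomizedBag is made up, and elements are ints for simplicity):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// ArrayList holds the elements; HashMap maps each element to its
// current index in the list, so removal can find it in O(1).
class RandomizedBag {
    private final List<Integer> arr = new ArrayList<>();
    private final Map<Integer, Integer> index = new HashMap<>();
    private final Random rnd = new Random();

    // O(1): append to the list and record the index.
    public void add(int num) {
        index.put(num, arr.size());
        arr.add(num);
    }

    // O(1): pop the last element and move it into the freed slot.
    public void remove(int num) {
        int i = index.remove(num);
        int last = arr.remove(arr.size() - 1);
        if (i < arr.size()) {          // num was not itself the last element
            arr.set(i, last);
            index.put(last, i);
        }
    }

    // O(1): a uniform random pick from the backing list.
    public int getRandomElement() {
        return arr.get(rnd.nextInt(arr.size()));
    }
}
```

This sketch does not handle removal of an absent element (index.remove would return null); a real implementation would check for that.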

You can use a HashMap (of ID to array index) in conjunction with an array (or ArrayList).
add could be done in O(1) by simply adding to the array and adding the ID and index to the HashMap.
remove could be done in O(1) by doing a lookup (and removal) in the HashMap to find the index, then moving the last element of the array to that index, updating that element's index in the HashMap, and decreasing the array size by one.
getRandomElement could be done in O(1) by returning a random element from the array.
Example:
Array: [5,3,2,4]
HashMap: [5->0, 3->1, 2->2, 4->3]
To remove 3:
Look up (and remove) key 3 in the HashMap (giving 3->1)
Swap 3 with the last element, 4, in the array
Update 4's index in the HashMap to 1
Decrease the size of the array by 1
Array: [5,4,2]
HashMap: [5->0, 2->2, 4->1]
To add 6:
Simply add it to the array and HashMap
Array: [5,4,2,6]
HashMap: [5->0, 2->2, 4->1, 6->3]
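The trace above can be reproduced directly in Java (a sketch; the class and field names are just for illustration):

```java
import java.util.*;

// Reproduces the trace above, starting from
// Array: [5,3,2,4] and HashMap: {5->0, 3->1, 2->2, 4->3}.
class IndexTrace {
    static List<Integer> arr = new ArrayList<>(Arrays.asList(5, 3, 2, 4));
    static Map<Integer, Integer> idx = new HashMap<>();
    static {
        for (int i = 0; i < arr.size(); i++) idx.put(arr.get(i), i);
    }

    // O(1) remove: move the last element into the removed slot.
    static void remove(int num) {
        int i = idx.remove(num);
        int last = arr.remove(arr.size() - 1);
        if (i < arr.size()) {      // num was not itself the last element
            arr.set(i, last);
            idx.put(last, i);      // the moved element's new index
        }
    }

    // O(1) add: append and record the index.
    static void add(int num) {
        idx.put(num, arr.size());
        arr.add(num);
    }
}
```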


Iteration through a HashMap (Complexity)

From what I understand, the time complexity to iterate through a Hash table with capacity "m" and number of entries "n" is O(n+m). I was wondering, intuitively, why this is the case? For instance, why isn't it n*m?
Thanks in advance!
You are absolutely correct. Iterating a HashMap is an O(n + m) operation, with n being the number of elements contained in the HashMap and m being its capacity. Actually, this is clearly stated in the docs:
Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings).
Intuitively (and conceptually), this is because a HashMap consists of an array of buckets, with each element of the array pointing to either nothing (i.e. to null), or to a list of entries.
So if the array of buckets has size m, and if there are n entries in the map in total (I mean, n entries scattered throughout all the lists hanging from some bucket), then, iterating the HashMap is done by visiting each bucket, and, for buckets that have a list with entries, visiting each entry in the list. As there are m buckets and n elements in total, iteration is O(m + n).
Note that this is not the case for all hash table implementations. For example, LinkedHashMap is like a HashMap, except that it also has all its entries connected in a doubly-linked list fashion (to preserve either insertion or access order). If you are to iterate a LinkedHashMap, there's no need to visit each bucket. It would be enough to just visit the very first entry and then follow its link to the next entry, and then proceed to the next one, etc, and so on until the last entry. Thus, iterating a LinkedHashMap is just O(n), with n being the total number of entries.
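As a rough illustration of the capacity term, a HashMap created with a deliberately oversized capacity still yields only its n entries when iterated, but the iterator has to skip every empty bucket along the way (a sketch; the capacity value is arbitrary):

```java
import java.util.HashMap;
import java.util.Map;

// A HashMap with far more buckets than entries: the entrySet iterator
// still yields only n entries, but internally it must walk past all the
// empty buckets, which is where the O(m) term comes from.
class SparseIteration {
    static int countEntries() {
        Map<Integer, String> map = new HashMap<>(1 << 20);  // ~1M buckets
        for (int i = 0; i < 3; i++) map.put(i, "v" + i);
        int count = 0;
        for (Map.Entry<Integer, String> e : map.entrySet()) count++;
        return count;  // 3 entries, despite ~1M buckets traversed
    }
}
```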
Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings)
n = the number of buckets
m = the number of key-value mappings
The complexity of iterating a HashMap is O(n+m) because in the worst case one array element contains the whole linked list, which can occur with a flawed hashCode implementation for the key type.
Visualise the worst-case scenario: every entry chained from a single bucket.
To iterate this scenario, Java first needs to iterate the complete array, O(n), and then iterate the linked list, O(m); combining these gives O(n+m).

How to implement a HashMap data structure in Java?

I want to implement a HashMap data structure, but I can't quite figure out what to do with underlying array structure.
If I'm not mistaken, in HashMap each key is hashed and converted into an integer which is used to refer to the array index. Search time is O(1) because of direct referring.
Let's say K is the key and V is the value. We can create an array of size n of type V, and each value will reside at the index produced by the hash(K) function. However, hash(K) doesn't produce consecutive indices, and Java's arrays aren't sparse. We could create a very large array to hold the elements, but that wouldn't be efficient: it would hold a lot of null elements.
One solution is to store the elements in consecutive order; to find an element we would then have to search the entire array and check each element's key, but that takes linear time. I want to achieve direct access.
Thanks, beforehand.
Borrowed from the Wikipedia article on hash tables, you can use a smaller array for underlying storage by taking the hash modulo the array size like so:
hash = hashfunc(key)
index = hash % array_size
This is a good solution because you can keep the underlying array relatively dense, you don't have to modify the hash function, and it does not affect the time complexity.
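One wrinkle when translating this to Java: hashCode() can be negative, so a plain % may produce a negative index. A small sketch using Math.floorMod (the helper name is made up):

```java
// Maps an arbitrary key to a bucket index in [0, arraySize), as in
// index = hash % array_size above, but safe for negative hash codes.
class Indexer {
    static int indexFor(Object key, int arraySize) {
        return Math.floorMod(key.hashCode(), arraySize);
    }
}
```

(Java's own HashMap instead masks the hash with a power-of-two table size, which achieves the same thing faster.)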
You can look at the source code for all your problems.
To spare you a trip through a pretty large amount of code: the solution to your problem is to use modulo. You can choose n = 64, then store an element x with h(x) mod 64 = 2 at index 2 of the array, and so on.
If two elements have the same hash modulo n, you store them next to each other in that bucket (Java's HashMap chains them in a list, converting long chains to a tree). Another solution would be to increase n.
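A minimal chained hash table along those lines (a sketch under simplifying assumptions: the names are made up, and real implementations also resize and handle null keys):

```java
import java.util.LinkedList;

// An array of n buckets; each bucket is a linked list of entries whose
// hash maps to that slot (separate chaining).
class TinyMap<K, V> {
    private static class Entry<K, V> {
        final K key; V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    TinyMap(int n) {
        buckets = new LinkedList[n];
        for (int i = 0; i < n; i++) buckets[i] = new LinkedList<>();
    }

    // hash modulo the array size, kept non-negative.
    private int slot(K key) { return Math.floorMod(key.hashCode(), buckets.length); }

    void put(K key, V value) {
        for (Entry<K, V> e : buckets[slot(key)]) {
            if (e.key.equals(key)) { e.value = value; return; }  // overwrite
        }
        buckets[slot(key)].add(new Entry<>(key, value));
    }

    V get(K key) {
        for (Entry<K, V> e : buckets[slot(key)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }
}
```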

Hash Table implement iterate and findMin

I have coded a standard Hash Table class in java. It has a large array of buckets, and to insert, retrieve or delete elements, I simply calculate the hash of the element and look at the appropriate index in the array to get the right bucket.
However, I would like to implement some sort of iterator. Is there another way than looping through all the indices in the array and ignoring those that are empty? My hash table might contain hundreds of empty entries and only a few elements that have been hashed and inserted. Is there an O(n) way to iterate instead of O(size of table) when n << size of table?
To implement findMin, I could simply save the smallest element each time I insert a new one, but I want to use the iterator approach.
Thanks!
You can maintain a linked list of the map entries, like LinkedHashMap does in the standard library.
Or you can make your hash table ensure that the capacity is always at most kn, for some suitable value of k. This will ensure iteration is linear in n.
You could store a sorted list of the non-empty buckets, and insert a bucket's id into the list (if it's not already there) when you insert something in the hash table.
But maybe it's not too expensive to search through a few hundred empty buckets, if it's not buried too deep inside a loop. A little inefficiency might be better than a more complex design.
If order is important to you, you should consider using a Binary Search Tree (a left-leaning red-black tree, for example) or a Skip List to implement your Dictionary. They are better suited for the job in these cases.
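For findMin specifically, the JDK's TreeMap (a red-black tree) already behaves this way; a minimal sketch (the helper name is made up):

```java
import java.util.TreeMap;

// TreeMap keeps keys sorted: insert/delete are O(log n), firstKey()
// returns the minimum, and iteration touches only real entries,
// never empty buckets.
class MinDemo {
    static int findMin(int[] values) {
        TreeMap<Integer, Boolean> map = new TreeMap<>();
        for (int v : values) map.put(v, true);
        return map.firstKey();   // smallest key currently in the map
    }
}
```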

Having a huge list of numbers and an order with unique order numbers, how to make both O(1) accessible?

Imagine I have a huge list with values
123
567
2355
479977
....
These are say ordered ascending
so
123 - 1
567 - 2
2355 - 3
479977 - 4
...
I want to have a single object that gives me access with the order number (1 or 2 or 3 ...) for the value, as well as with the actual value (123 or 567 or ...) for the order number. Does a structure like this exist?
EDIT: insertions and deletions should be possible.
If I have 2 Hashmaps, I need twice the memory and have to perform the operations twice.
You can maintain an ArrayList<Integer>, which has O(1) index lookup, to store all your ints and the (index -> int) relationship, and a HashMap<Integer, Integer>, which also has O(1) lookup, to store the (int -> index) relationship.
Doing so, you have O(1) for each lookup direction.
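A minimal sketch of that two-structure scheme (class and method names are illustrative; orders are 0-based here, while the question numbers from 1):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// The list answers "order -> value"; the map answers "value -> order".
// Both lookups are O(1).
class OrderedValues {
    private final List<Integer> byOrder = new ArrayList<>();
    private final Map<Integer, Integer> orderMap = new HashMap<>();

    // Values are assumed to arrive already in ascending order.
    void append(int value) {
        orderMap.put(value, byOrder.size());
        byOrder.add(value);
    }

    int valueAt(int order) { return byOrder.get(order); }
    int orderOf(int value) { return orderMap.get(value); }
}
```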
If you can have multiple datastructures, I would recommend having 2 Maps :
1) Since the data is in an array, you can directly access the values present at a given order (position). Considering the update in the question that insertions and deletions need to be supported, this can be achieved by using a HashMap where key = order, value = value at that order
2) Another reverse HashMap where key = value , value = order
Then you can have O(1) lookup time for both the cases.
If I understand correctly, you're looking for something like a HashMap. You can read about them on Oracle's website: JavaDocs
Using a sorted tree, you can get O(log n) time for both. Granted, O(log n) is not O(1), but it is close in practice: with 4 billion elements, that is about 32 steps (theoretically, though not always) versus 1.
If you need both to be O(1), you should create an array of sorted ints / longs / BigIntegers or whatever you need for lookup one way, and then a HashMap of ints / longs / BigIntegers to its position index
Answer for static data:
O(1) is not possible for value to order. (A HashMap does not have true O(1) complexity.)
But O(1) is easy for order to value.
About value to order:
The cost will be about log2(N). You need either a binary search (log2(n)) or a HashMap with similar effort. I expect the binary search to be faster (less object overhead) than the standard HashMap implementation, where you map the number to its order value.
Update: why the binary search:
With binary search you find the array position of the element, which is equal to the order. This does not need any additional memory, and needs far less memory than the standard HashMap implementation.
For dynamic data, it depends how often one inserts and how often one searches.
For a high number of searches and few insertions, I would stay with an Array(List).
About order to value:
Simply store the values ascending in an array (or ArrayList) a[]; a[orderNr] gives the value.

picking without replacement in java

I often* find myself in need of a data structure which has the following properties:
can be initialized with an array of n objects in O(n).
one can obtain a random element in O(1), after this operation the picked
element is removed from the structure.
(without replacement)
one can undo p 'picking without replacement' operations in O(p)
one can remove a specific object (eg by id) from the structure in O(log(n))
one can obtain an array of the objects currently in the structure in
O(n).
the complexity (or even possibility) of other actions (eg insert) does not matter. Besides the asymptotic complexity, it should also be efficient for small values of n.
Can anyone give me guidelines on implementing such a structure? I currently implemented a structure having all above properties, except the picking of the element takes O(d) with d the number of past picks (since I explicitly check whether it is 'not yet picked'). I can figure out structures allowing picking in O(1), but these have higher complexities on at least one of the other operations.
BTW:
note that O(1) above implies that the complexity is independent from #earlier picked elements and independent from total #elements.
*in monte carlo algorithms (iterative picks of p random elements from a 'set' of n elements).
HashMap has complexity O(1) both for insertion and removal.
You specify a lot of operations, but all of them are nothing other than insertion, removal, and traversal:
can be initialized with an array of n objects in O(n).
n * O(1) insertion. HashMap is fine
one can obtain a random element in
O(1), after this operation the picked
element is removed from the structure.
(without replacement)
This is the only op that requires O(n).
one can undo p 'picking without
replacement' operations in O(p)
it's an insertion operation: O(1).
one can remove a specific object (eg
by id) from the structure in O(log(n))
O(1).
one can obtain an array of the objects
currently in the structure in O(n).
you can traverse a HashMap in O(n)
EDIT:
example of picking a random element in O(n):
HashMap<String, Object> map = ...;
int randomIndex = ...; // a random int in [0, map.size())
Collection<Object> collection = map.values();
Object[] values = collection.toArray();
Object picked = values[randomIndex];
Ok, same answer as 0verbose with a simple fix to get the O(1) random lookup. Create an array which stores the same n objects. Now, in the HashMap, store the (object, index) pairs. For example, say your objects (strings for simplicity) are:
{"abc" , "def", "ghi"}
Create a list:
List<String> array = new ArrayList<>(Arrays.asList("abc", "def", "ghi"));
Create a HashMap map with the following values:
for (int i = 0; i < array.size(); i++)
{
    map.put(array.get(i), i);
}
O(1) random lookup is easily achieved by picking any index in the array. The only complication that arises is when you delete an object. For that, do:
Find the object in map and get its array index. Let's call this index i (i = map.get(object)) - O(1)
Swap array[i] with array[size of array - 1] (the last element in the array). Reduce the size of the array by 1 (since there is one less number now) - O(1)
Update the index of the object now in position i of the array in map (map.put(array.get(i), i)) - O(1)
I apologize for the mix of java and cpp notation, hope this helps
Here's my analysis of using Collections.shuffle() on an ArrayList:
✔ can be initialized with an array of n objects in O(n).
Yes, although the cost is amortized unless n is known in advance.
✔ one can obtain a random element in O(1), after this operation the picked element is removed from the structure, without replacement.
Yes, choose the last element in the shuffled array; replace the array with a subList() of the remaining elements.
✔ one can undo p 'picking without replacement' operations in O(p).
Yes, append the element to the end of this list via add().
❍ one can remove a specific object (eg by id) from the structure in O(log(n)).
No, it looks like O(n).
✔ one can obtain an array of the objects currently in the structure in O(n).
Yes, using toArray() looks reasonable.
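A minimal sketch of the shuffle-based approach from the checklist above (the class name is made up; removing the last element of an ArrayList is O(1)):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Shuffle once up front in O(n); each pick then removes the last
// element in O(1), and undoing a pick is an O(1) add().
class ShuffledPicker<T> {
    private final List<T> items;

    ShuffledPicker(List<T> source) {
        items = new ArrayList<>(source);
        Collections.shuffle(items);   // O(n), done once at init
    }

    T pick()            { return items.remove(items.size() - 1); }  // O(1)
    void unpick(T item) { items.add(item); }                        // O(1)
    int size()          { return items.size(); }
}
```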
How about an array (or ArrayList) that's divided into "picked" and "unpicked"? You keep track of where the boundary is, and to pick, you generate a random index below the boundary, then (since you don't care about order), swap the item at that index with the last unpicked item, and decrement the boundary. To unpick, you just increment the boundary.
Update: Forgot about O(log(n)) removal. Not that hard, though, just a little memory-expensive, if you keep a HashMap of IDs to indices.
If you poke around on line you'll find various IndexedHashSet implementations that all work on more or less this principle -- an array or ArrayList plus a HashMap.
(I'd love to see a more elegant solution, though, if one exists.)
Update 2: Hmm... or does the actual removal become O(n) again, if you have to either recopy the arrays or shift them around?
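The boundary scheme above can be sketched as follows (names are illustrative; the O(log n) removal via an id-to-index HashMap is omitted, given the doubts about it):

```java
import java.util.Random;

// items[0..boundary) are unpicked; items[boundary..n) are picked, in
// pick order. Picking swaps a random unpicked item to the boundary;
// unpicking just moves the boundary back over the most recent pick.
class PartitionedPicker {
    private final int[] items;
    private int boundary;                  // number of unpicked items
    private final Random rnd = new Random();

    PartitionedPicker(int[] source) {      // O(n) initialization
        items = source.clone();
        boundary = items.length;
    }

    int pick() {                           // O(1), uniform over unpicked
        int i = rnd.nextInt(boundary);
        int picked = items[i];
        items[i] = items[--boundary];      // move an unpicked item down
        items[boundary] = picked;          // parked in the picked zone
        return picked;
    }

    void unpick() { boundary++; }          // O(1) undo of the last pick

    int unpickedCount() { return boundary; }
}
```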
