Storing separate values for duplicate keys in a search tree - java

I'm trying to make a search tree that can store duplicate keys and have different values for said keys.
Basically, I have a hash code in the form of a bit string. The value is that bit string, and the key is the number of 1's that appear in the bit string.
For example, I could have two bit strings:
bitstring1 = "00001111";
bitstring2 = "11110000";
So they have identical keys:
key1 = 4;
key2 = 4;
Is it possible to implement this into a search tree?

One possibility is a binary search tree in which the nodes contain a list of values that match the key. That's a very simple addition to a standard binary search tree, but could become inefficient if there are a lot of values for a key because each access would require an O(n) (where n is the number of values for that key) sequential search of that key's values.
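A minimal sketch of that approach, assuming bit-string values keyed by their popcount; the class and method names are illustrative:

import java.util.ArrayList;
import java.util.List;

// One node per distinct key; all bit strings sharing that key live in a list.
class Node {
    final int key;                                  // number of 1 bits
    final List<String> values = new ArrayList<>();  // every bit string with this key
    Node left, right;
    Node(int key) { this.key = key; }
}

class PopcountTree {
    private Node root;

    // Key = number of '1' characters in the bit string (its popcount).
    void insert(String bitString) {
        int key = (int) bitString.chars().filter(c -> c == '1').count();
        root = insert(root, key, bitString);
    }

    // Standard BST insert, except an equal key appends to the existing
    // node's list instead of creating a duplicate node.
    private Node insert(Node n, int key, String value) {
        if (n == null) n = new Node(key);
        if (key < n.key) n.left = insert(n.left, key, value);
        else if (key > n.key) n.right = insert(n.right, key, value);
        else n.values.add(value);
        return n;
    }
}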
If lookups are much more frequent than additions or deletions, you could make the values list sorted. Insertions and deletions would still require that sequential scan, but lookups could do binary search. Depending on the application, this is a good compromise.
Another possibility is to make that list of values another binary search tree. So your nodes would contain a binary tree of values. That would be more efficient than a sorted list, but it would cost more memory.

Related

How to implement a HashMap data structure in Java?

I want to implement a HashMap data structure, but I can't quite figure out what to do with the underlying array structure.
If I'm not mistaken, in a HashMap each key is hashed and converted into an integer which is used as the array index. Search time is O(1) because of this direct indexing.
Let's say K is the key and V is the value. We can create an array of V of size n, and each value will reside at the index produced by the hash(K) function. However, hash(K) doesn't produce consecutive indices, and Java's arrays aren't sparse. We could create a very large array to hold the elements, but that wouldn't be efficient; it would hold a lot of NULL elements.
One solution is to store the elements in consecutive order; then, to find an element, we would have to search the entire array and check each element's key, but that takes linear time. I want to achieve direct access.
Thanks in advance.
Borrowed from the Wikipedia article on hash tables, you can use a smaller array for underlying storage by taking the hash modulo the array size like so:
hash = hashfunc(key)
index = hash % array_size
This is a good solution because you can keep the underlying array relatively dense, you don't have to modify the hash function, and it does not affect the time complexity.
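In Java, that indexing step might look like the following sketch (the helper name is illustrative); note that hashCode() can return a negative value, so plain % needs care:

// Map an arbitrary hash code onto a bucket index of a fixed-size array.
static int indexFor(Object key, int arraySize) {
    // Math.floorMod keeps the result in [0, arraySize) even when
    // hashCode() is negative, which the plain % operator would not.
    return Math.floorMod(key.hashCode(), arraySize);
}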
You can look at the HashMap source code, but it is fairly large, so here is the short version: the solution to your problem is to use modulo. You can choose n = 64, then store an element x with h(x) mod 64 = 2 at index 2 of the array, and so on.
If two elements have the same hash modulo n, then you store them next to each other in that slot (Java's HashMap chains them in a linked list, switching to a tree for large buckets). Another solution would be to increase n.

Approximate hashtable, does such a data structure exist?

I can't figure out how to implement a special hash table.
The idea is that the hash table gives an approximate match. An exact hash table (such as the one found in java.util) just gives a map, such that:
Hashtable h = new Hashtable();
...
x = h.get(y);
Here x is the result of applying the map h to the argument y; in mathematical terms it is a function, namely x = h(y). Now, for the approximate match, what about a data structure that quickly gives me:
x = h(k) where k = max { z <= y | h(z) != null }
The problem is that k can be very far away from the given y. For example, y could be 2000, and the next occupied slot k could be 1000. A linear search would be costly; the data structure should do the job more quickly.
I know how to do it with a tree(*), but can something with a hash also work? Or maybe combine some tree and hash properties in the sought-after data structure? Some data structure that tends toward O(1) access?
Bye
(*) You can use a tree ordered by y, and find the next element below or equal to y.
This is known as Spatial hashing. Keep in mind it has to be tailored for your specific domain.
It can be used when the hash tells you something about logical arrangement of objects. So when |hash(a)-hash(b)| < |hash(a)-hash(c)| means b is closer/more similar to a than c is.
Then the basic idea is that you divide the space into buckets (e.g. drop the least significant digits of the hash -- the naive approach), and your spatial hash is this bucket ID. You have to take care of the edge cases where objects are very near each other while sitting on opposite sides of a bucket boundary (e.g. h(1999) = 1 but h(2000) = 2). You can solve this with two overlapping hashes and two separate hash maps, querying both of them, or by looking into the neighboring buckets, etc.
As I said in the beginning, this has to be thought through very well.
The tree (e.g. KD-tree for higher dimensions) isn't so demanding in the design phase and is generally a more convenient approach to nearest neighbor(s) querying.
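To make the bucketing idea concrete, here is a minimal one-dimensional sketch; the class name and bucket width are illustrative, and the neighbor scan stands in for the overlapping-hash trick described above:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SpatialHash {
    private static final int BUCKET_WIDTH = 1000;  // illustrative choice
    private final Map<Integer, List<Integer>> buckets = new HashMap<>();

    // Drop the low digits of the key: this is the bucket ID.
    private static int bucketId(int key) {
        return Math.floorDiv(key, BUCKET_WIDTH);
    }

    void put(int key) {
        buckets.computeIfAbsent(bucketId(key), b -> new ArrayList<>()).add(key);
    }

    // Collect candidates near `key`: its own bucket plus both neighbors,
    // which handles values sitting right on a bucket boundary.
    List<Integer> near(int key) {
        List<Integer> result = new ArrayList<>();
        int id = bucketId(key);
        for (int b = id - 1; b <= id + 1; b++) {
            List<Integer> bucket = buckets.get(b);
            if (bucket != null) result.addAll(bucket);
        }
        return result;
    }
}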
The specific formula you give suggests you want a set that can retrieve the greatest item less than a given input.
One simple approach to achieving that would be to keep a sorted list of the items and perform a binary search to locate the position at which the given element would be inserted, then return the element at or immediately before that position (the greatest element less than or equal to the input).
As always, any set can be converted into a map by using a pair object to wrap the key-value pair, or by maintaining a parallel data structure for the values.
For an array-based approach, the runtime will be O(log n) for retrieval and O(n) for insertion of a single element. If 'add all' sorts the added elements and then merges them, it can be O(n log n).
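For the array-based variant, a minimal Java sketch of the floor lookup, assuming integer elements in an already-sorted list; the helper name is illustrative:

import java.util.Collections;
import java.util.List;

// Return the greatest element <= y, or null if every element is greater.
static Integer floor(List<Integer> sorted, int y) {
    int i = Collections.binarySearch(sorted, y);
    if (i >= 0) return sorted.get(i);     // exact match
    int insertionPoint = -i - 1;          // index of first element greater than y
    return insertionPoint == 0 ? null : sorted.get(insertionPoint - 1);
}

For reference, java.util.TreeMap already provides this operation directly via floorKey/floorEntry at O(log n) for both lookup and insertion, matching the tree approach the question mentions.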
It's not possible1 to have a constant-time algorithm that can answer what the first element less than a given element is using a hashing approach. A good hashing algorithm spreads out similar (but non-equal) items, to avoid having many similar items fall into the same bucket and destroy the desired constant-time retrieval behavior. This means the elements of a hash set (or map) are very deliberately not even remotely close to sorted order; they are as close to randomly distributed as we can achieve while using an efficient, repeatable hashing algorithm.
1. Of course, proving that it's not possible is difficult, since one can't easily prove that there isn't a simple repeatable constant-time request that will reliably convince an oracle (or God, if God were that easy to manipulate) to give you the answer to the question you want, but it seems unlikely.

fastest Java collection for string lookup?

I have a Java class which contains two Strings, for example the name of a person and the name of the group.
I also have a list of groups (about 10) and a list of persons (about 100). The list of my data objects is larger; it can exceed 10,000 items.
Now I would like to search through my data objects such that I find all objects having a person from the person list and a group from the group list.
My question is: what is the best data structure for the person and group list?
I could use an ArrayList and simply iterate until I find a match, but that is obviously inefficient. A HashSet or HashMap would be much better.
Are there even more efficient ways to solve this? Please advise.
Every data structure has pros and cons.
A Map is used to retrieve data in O(1) if you have an access key.
A List is used to maintain an order between elements, but accessing an element by key is not possible, and you need to loop over the whole list, which takes O(n).
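Applied to this question, a minimal sketch of the lookup-based filtering, assuming a Java 16+ record as the data object; all names are illustrative. It builds O(1) membership sets once, then scans the 10,000+ data objects a single time:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

record Entry(String person, String group) {}  // illustrative data object

class Matcher {
    // Build O(1) membership sets, then do one pass over the data.
    static List<Entry> match(List<Entry> data,
                             List<String> persons, List<String> groups) {
        Set<String> personSet = new HashSet<>(persons);
        Set<String> groupSet = new HashSet<>(groups);
        List<Entry> result = new ArrayList<>();
        for (Entry e : data) {
            if (personSet.contains(e.person()) && groupSet.contains(e.group())) {
                result.add(e);
            }
        }
        return result;
    }
}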
A good data-structure for storing and lookup strings is a Trie:
It's essentially a tree structure which uses characters or substrings to denote paths to follow.
Advantages over hash-maps (quote from Wikipedia):
Looking up data in a trie is faster in the worst case, O(m) time (where m is the length of a search string), compared to an imperfect hash table. An imperfect hash table can have key collisions. A key collision is the hash function mapping of different keys to the same position in a hash table. The worst-case lookup speed in an imperfect hash table is O(N) time, but far more typically is O(1), with O(m) time spent evaluating the hash.
There are no collisions of different keys in a trie.
Buckets in a trie, which are analogous to hash table buckets that store key collisions, are necessary only if a single key is associated with more than one value.
There is no need to provide a hash function or to change hash functions as more keys are added to a trie.
A trie can provide an alphabetical ordering of the entries by key.
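To make the quoted description concrete, here is a minimal trie sketch for lowercase ASCII words, supporting insert and exact lookup only; a prefix query would walk to the prefix's node and enumerate its subtree:

class Trie {
    private static class TrieNode {
        final TrieNode[] children = new TrieNode[26];  // one slot per letter a-z
        boolean isWord;                                // marks the end of a stored word
    }

    private final TrieNode root = new TrieNode();

    void insert(String word) {
        TrieNode n = root;
        for (char c : word.toCharArray()) {
            int i = c - 'a';
            if (n.children[i] == null) n.children[i] = new TrieNode();
            n = n.children[i];           // follow (or create) the path for c
        }
        n.isWord = true;
    }

    boolean contains(String word) {
        TrieNode n = root;
        for (char c : word.toCharArray()) {
            n = n.children[c - 'a'];
            if (n == null) return false; // path breaks: word not present
        }
        return n.isWord;
    }
}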
I agree with @Davide's answer. If we want fast lookup as well as to maintain the order, then we can go for the LinkedHashMap implementation of Map.
By using it, we can have both things:
Data retrieval in O(1), if we have the access key.
It maintains the insertion order, so while iterating we will get the data in the same order as during insertion.
Depending on the scenario (If you have the data before receiving lists of groups/people), preprocessing the data would save you time.
Comparing the data to the groups/people lists will require at least 10,000+ lookups. Comparing the groups/people lists to the data will require at most 10 * 100 = 1,000 lookups, or fewer if you compare against each group one at a time (10 + 100 = 110 lookups).

ArrayList Searching Multiple Words

int index = Collections.binarySearch(myList, SearchWord);
System.out.println(myList.get(index));
Actually, I stored 1 million words in an ArrayList, and now I need to search for a particular word through a key. The result is not a single word; it may contain multiple words.
For example, suppose I type "A"; the output should be [Aarhus, Aaron, Ababa, ...]. The result depends on the search word. How can I do it, and which sorting algorithm in Collections is best?
Options:
If you want to stick to the ArrayList, sort it before you search. Then find the first key that matches your search criteria and iterate onward from it until you find a key that does not match. Collect all matching keys into some buffer structure. Bingo, you have your answer (see the sketch after this list).
Change the data structure to a tree. Either
a simple binary search tree - you get all keys sorted automatically. Navigate the tree in depth-first fashion until you find a key that does not match.
a fancy trie structure. That way you get all keys sorted automatically, plus a significant performance boost due to efficient storage. The rest is the same: navigate the tree, collect matching keys.
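A minimal sketch of the first option, assuming the list is already sorted (e.g. via Collections.sort); Collections.binarySearch returns -(insertionPoint) - 1 when the prefix itself is absent, which is exactly where matching words begin:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Return all words in a sorted list that start with the given prefix.
static List<String> prefixSearch(List<String> sortedWords, String prefix) {
    int i = Collections.binarySearch(sortedWords, prefix);
    if (i < 0) i = -i - 1;  // insertion point: the first candidate match
    List<String> matches = new ArrayList<>();
    while (i < sortedWords.size() && sortedWords.get(i).startsWith(prefix)) {
        matches.add(sortedWords.get(i++));
    }
    return matches;
}

Sorting the million words once costs O(n log n); each prefix query is then O(log n + k), where k is the number of matches.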

How is a key mapped to a value in a HashMap in Java with an O(1) get() method?

Consider an int array variable x[]. The variable x holds a reference to the array's starting address. When the array is accessed with index 2, that is x[2], its memory location is calculated as:
address of x[2] = starting address + index * size of int
i.e. x[2] is at x + 2*4.
But in the case of a HashMap, how is the memory address mapped internally?
By reading many previous posts I observed that HashMap uses a linked list to store the key-value list. But if that is the case, then to find a key it generates a hash code, then checks for an equal hash code in the list and retrieves the value.
This takes O(n) complexity.
If I am wrong in the above observation, please correct me... I am a beginner.
Thank you
The traditional implementation of a HashMap is to use a function to generate a hash from the key, then use that hash to access a value directly. Think of it as generating something that translates into an array index. It does not look through the HashMap comparing element hashes to the generated hash; it generates the hash and uses the hash to access an element directly.
What I think you're talking about is the case where two keys in the HashMap generate the SAME hash. THEN it uses a list of those entries, and has to look through them to determine which one it wants. But that's not O(n) where n is the number of elements in the HashMap, just O(m) where m is the number of elements with the same hash. Clearly the name of the game is to find a hash function where the generated hash is unique for all the elements, as much as is feasible.
--- edit to expand on the explanation ---
In your post, you state:
By reading many previous posts I observed that HashMap uses a linked list to store the key-value list.
This is wrong for the overall HashMap. For a HashMap to work reasonably, there must be a way to use the key to calculate a way to access the corresponding element directly, not by searching through all the values in the HashMap.
A "perfect" hash calculation would translate each possible key into hash value that was not calculated for any other key. This is usually not feasible, and so it is usually possible that two different keys will result in the same result from the hash calculation. In this case, the HashMap implementation could use a linked list of values, and would need to look through all such values to find the one that it was looking for. But this number is FAR less than the number of values in the overall HashMap.
You can make a hash where strings are the keys, and in which the first character of the string is converted to a number which is then used as an array index. As long as your strings all have different first characters, then accessing the value is a simple calc plus an array access -- O(1). Or you could add all the character values of the string indices together and take the last two (or three) digits, and THAT would be your hash calc. As long as the addition produced unique values for each index string, you don't ever have to look through a list; again, O(1).
And, in fact, as long as the hash calculation is approximately perfect, the lookup is still O(1) overall, because the limited number of times you have to look through a short list does not alter the overall efficiency.
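A toy version of the first-character scheme described above, just to show a hash calculation feeding a direct array access; it deliberately ignores collisions, which a real HashMap would handle with its per-bucket lists:

// Toy map: hash = first character, used directly as an array index.
class FirstCharMap {
    private final String[] keys = new String[26];
    private final Object[] values = new Object[26];

    private static int hash(String key) {
        return key.charAt(0) - 'a';  // "apple" -> 0, "zebra" -> 25
    }

    // O(1): one hash calculation plus one array access, no searching.
    void put(String key, Object value) {
        int i = hash(key);
        keys[i] = key;      // a colliding key simply overwrites here;
        values[i] = value;  // a real HashMap would chain it in a bucket
    }

    Object get(String key) {
        int i = hash(key);
        return key.equals(keys[i]) ? values[i] : null;
    }
}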
