ArrayList Searching Multiple Words - java

int index = Collections.binarySearch(myList, SearchWord);
System.out.println(myList.get(index));
I have stored 1 million words in an ArrayList, and now I need to search it for a particular key. The result is not a single word; it may contain multiple words.
For example, if I type "A", the output should be [Aarhus, Aaron, Ababa, ...]. The result depends on the search word. How can I do this, and which sorting algorithm in Collections is best?

Options:
If you want to stick with the ArrayList, sort it before you search. Then find the first key that matches your search criteria and iterate from it onward until you find a key that does not match. Collect all matching keys into a buffer structure, and you have your answer.
Change the data structure to a tree. Either:
a simple binary search tree: you get all keys sorted automatically. Navigate the tree depth-first (in order) until you find a key that does not match.
a fancy trie structure: you get all keys sorted automatically, plus a significant performance boost due to efficient storage. The rest is the same: navigate the tree and collect matching keys.
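The first option (sort, locate the first match, then scan forward) could be sketched as below. This is a minimal illustration, not the asker's code: the class name PrefixSearch and the method withPrefix are hypothetical, and it assumes the list is sorted in natural (case-sensitive) String order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PrefixSearch {

    // Returns every word in sortedWords that starts with prefix.
    // Assumption: sortedWords is sorted in natural String order.
    static List<String> withPrefix(List<String> sortedWords, String prefix) {
        int pos = Collections.binarySearch(sortedWords, prefix);
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        int start = pos >= 0 ? pos : -pos - 1;
        // With duplicates, binarySearch may land on any equal element; step back to the first.
        while (start > 0 && sortedWords.get(start - 1).equals(prefix)) {
            start--;
        }
        List<String> matches = new ArrayList<>();
        for (int i = start; i < sortedWords.size(); i++) {
            String word = sortedWords.get(i);
            if (!word.startsWith(prefix)) {
                break; // sorted order means no later word can match
            }
            matches.add(word);
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(
                Arrays.asList("Ababa", "Zurich", "Aaron", "Aarhus", "Berlin"));
        Collections.sort(words);
        System.out.println(withPrefix(words, "A")); // prints [Aarhus, Aaron, Ababa]
    }
}
```

Note that Collections.binarySearch returns a negative value when the key itself is not in the list, so passing its result straight to get(index) would throw for a prefix such as "A".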

Related

Building a Java TreeMap from a sorted list of elements

I have a text file containing a sorted list of words being my dictionary.
I would like to use a TreeMap so that checking whether a word belongs to the dictionary (that is, containsKey) costs O(log n) on average.
I have read that a Red-Black tree sits behind the scenes of TreeMap, so it is self-balancing.
My question is: which is the best way to feed the TreeMap with the list of words?
I mean: feeding it a sorted list should be the worst-case scenario for a binary tree, because it has to rebalance on almost every other word, doesn't it?
The list of words can vary from 7K to 150K in number.
TreeMap hides its implementation details, as good OO design prescribes, so to really optimize for your use case will probably be hard.
However, if it is an option to read all items into an array/list before adding them to your TreeMap, you can add them "inside out": the middle element of the list will become the root, so add it first, and then recursively add the first half and second half in the same manner. In fact, this is the strategy that the TreeMap(SortedMap) constructor follows.
If it is not an option to read all items, I think you have no other option than to simply put your entries into the map consecutively, or to write your own tree implementation so that you have more control over how it is generated. If you at least know the number of items beforehand, you should be able to generate a balanced tree without ever having to rebalance.
If you do not need the extra features of a TreeMap, you might also consider using a HashMap, which (given a good hash function for your keys) offers O(1) expected access.
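The "inside out" insertion order described above could be sketched as follows. The class BalancedFill and the method addInsideOut are hypothetical names, and the Boolean value is only a placeholder. Whether this order is measurably faster than plain sequential insertion is not something this sketch demonstrates; a red-black tree keeps each insertion at O(log n) in either case.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

class BalancedFill {

    // Inserts the middle element of sorted[lo..hi] first, then recurses into each half.
    static void addInsideOut(List<String> sorted, int lo, int hi, TreeMap<String, Boolean> map) {
        if (lo > hi) {
            return;
        }
        int mid = (lo + hi) >>> 1;
        map.put(sorted.get(mid), Boolean.TRUE); // placeholder value; only the key matters here
        addInsideOut(sorted, lo, mid - 1, map);
        addInsideOut(sorted, mid + 1, hi, map);
    }

    public static void main(String[] args) {
        // Assumption: the dictionary has already been read into a sorted list.
        List<String> words = Arrays.asList("ant", "bee", "cat", "dog", "eel", "fox", "gnu");
        TreeMap<String, Boolean> dictionary = new TreeMap<>();
        addInsideOut(words, 0, words.size() - 1, dictionary);
        System.out.println(dictionary.containsKey("dog"));
        System.out.println(dictionary.keySet());
    }
}
```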

Storing separate values for duplicate keys in a search tree

I'm trying to make a search tree that can store duplicate keys and have different values for said keys.
Basically, I have a hash code in the form of a bit string. The value is that bit string, and the key is the number of 1's that appear in the bit string.
For example, I could have two bit strings:
bitstring1 = "00001111";
bitstring2 = "11110000";
So they have identical keys:
key1 = 4;
key2 = 4;
Is it possible to implement this into a search tree?
One possibility is a binary search tree in which the nodes contain a list of values that match the key. That's a very simple addition to a standard binary search tree, but could become inefficient if there are a lot of values for a key because each access would require an O(n) (where n is the number of values for that key) sequential search of that key's values.
If lookups are much more frequent than additions or deletions, you could keep the values list sorted. Insertions and deletions would still require that sequential scan, but lookups could use binary search. Depending on the application, this can be a good compromise.
Another possibility is to make that list of values another binary search tree. So your nodes would contain a binary tree of values. That would be more efficient than a sorted list, but it would cost more memory.
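The first possibility (one tree node per key, each holding a list of values) could be sketched with the standard library's TreeMap standing in for a hand-written binary search tree. The class BitCountIndex and its methods are hypothetical names, and the sketch assumes Java 8 for computeIfAbsent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

class BitCountIndex {

    // Key: number of '1' characters. Value: every bit string with that count.
    private final TreeMap<Integer, List<String>> tree = new TreeMap<>();

    static int countOnes(String bits) {
        int ones = 0;
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                ones++;
            }
        }
        return ones;
    }

    void add(String bits) {
        // Creates the per-key list on first use, then appends to it.
        tree.computeIfAbsent(countOnes(bits), k -> new ArrayList<>()).add(bits);
    }

    List<String> get(int ones) {
        return tree.getOrDefault(ones, Collections.emptyList());
    }

    public static void main(String[] args) {
        BitCountIndex index = new BitCountIndex();
        index.add("00001111");
        index.add("11110000");
        System.out.println(index.get(4)); // prints [00001111, 11110000]
    }
}
```

Finding one particular value under a key is still a sequential scan of that key's list, which is the inefficiency the answer warns about.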

Fastest way to build an index (list of substrings with lines of occurrence) of a String?

Problem:
Essentially, my goal is to build an ArrayList of IndexEntry objects from a text file. An IndexEntry has the following fields: String word, representing this unique word in the text file, and ArrayList numsList, a list containing the lines of the text file in which word occurs.
The ArrayList I build must keep the IndexEntries sorted so that their word fields are in alphabetical order. However, I want to do this in the fastest way possible. Currently, I visit each word as it appears in the text file and use binary search to determine if an IndexEntry for that word already exists in order to add the current line number to its numsList. In the case of an IndexEntry not existing I create a new one in the appropriate spot in order to maintain alphabetical order.
Example:
_
One
Two
One
Three
_
Would yield an ArrayList of IndexEntries whose output as a String (in the order of word, numsList) is:
One [1, 5], Three [7], Two [3]
Keep in mind that I am working with much larger text files, with many occurrences of the same word.
Question:
Is binary search the fastest way to approach this problem? I am still a novice at programming in Java, and am curious about searching algorithms that might perform better in this scenario or the relative time complexity of using a Hash Table when compared with my current solution.
You could try a TreeMap or a ConcurrentSkipListMap which will keep your index sorted.
However, if you only need a sorted list at the end of your indexing, a good old HashMap<String, List> is the way to go (an ArrayList as the value is probably a safe bet as well).
When you are done, take the entries of the map and sort them once by key.
That should be good enough for a couple hundred megabytes of text files.
If you are on Java 8, use the neat computeIfAbsent and computeIfPresent methods.
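The HashMap approach with computeIfAbsent could be sketched as below. The names WordIndex and build are hypothetical, and two details are assumptions rather than part of the question: words are split on whitespace, and a line number is recorded only once per word even if the word repeats on that line. The sample input assumes blank lines between the words, which is what the line numbers in the question's expected output suggest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordIndex {

    // Builds word -> line numbers (1-based), sorted by word once at the end.
    static Map<String, List<Integer>> build(List<String> lines) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            int lineNo = i + 1;
            for (String word : lines.get(i).split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // blank line or leading whitespace
                }
                List<Integer> nums = index.computeIfAbsent(word, k -> new ArrayList<>());
                // Assumption: record each line at most once per word.
                if (nums.isEmpty() || nums.get(nums.size() - 1) != lineNo) {
                    nums.add(lineNo);
                }
            }
        }
        return new TreeMap<>(index); // single sort by key after indexing
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("One", "", "Two", "", "One", "", "Three");
        System.out.println(build(lines)); // prints {One=[1, 5], Three=[7], Two=[3]}
    }
}
```

Each word costs one expected O(1) hash lookup instead of a binary search plus a possible ArrayList insertion that shifts later elements.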

fastest Java collection for string lookup?

I have a Java class which contains two Strings, for example the name of a person and the name of the group.
I also have a list of groups (about 10) and a list of persons (about 100). The list of my data objects is larger; it can exceed 10,000 items.
Now I would like to search through my data objects such that I find all objects having a person from the person list and a group from the group list.
My question is: what is the best data structure for the person and group list?
I could use an ArrayList and simply iterate until I find a match, but that is obviously inefficient. A HashSet or HashMap would be much better.
Are there even more efficient ways to solve this? Please advise.
Every data structure has pros and cons.
A Map is used to retrieve data in O(1) if you have an access key.
A List is used to maintain an order between elements, but accessing an element by key is not possible; you need to loop over the whole list, which takes O(n).
A good data-structure for storing and lookup strings is a Trie:
It's essentially a tree structure which uses characters or substrings to denote paths to follow.
Advantages over hash-maps (quote from Wikipedia):
Looking up data in a trie is faster in the worst case, O(m) time (where m is the length of a search string), compared to an imperfect hash table. An imperfect hash table can have key collisions. A key collision is the hash function mapping of different keys to the same position in a hash table. The worst-case lookup speed in an imperfect hash table is O(N) time, but far more typically is O(1), with O(m) time spent evaluating the hash.
There are no collisions of different keys in a trie.
Buckets in a trie, which are analogous to hash table buckets that store key collisions, are necessary only if a single key is associated with more than one value.
There is no need to provide a hash function or to change hash functions as more keys are added to a trie.
A trie can provide an alphabetical ordering of the entries by key.
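A minimal trie along these lines might look like the sketch below. The class and method names are hypothetical. Children are kept in a TreeMap so that traversal yields keys in alphabetical order; that choice makes each step cost a small tree lookup rather than a constant-time array index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class Trie {

    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>(); // sorted by character
        boolean isWord;
    }

    private final Node root = new Node();

    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    boolean contains(String word) {
        Node node = find(word);
        return node != null && node.isWord;
    }

    // Returns all stored words that start with prefix, in alphabetical order.
    List<String> withPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        Node node = find(prefix);
        if (node != null) {
            collect(node, new StringBuilder(prefix), out);
        }
        return out;
    }

    // Follows one character per level; returns null if the path does not exist.
    private Node find(String s) {
        Node node = root;
        for (char c : s.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return null;
            }
        }
        return node;
    }

    private void collect(Node node, StringBuilder path, List<String> out) {
        if (node.isWord) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            path.append(e.getKey().charValue());
            collect(e.getValue(), path, out);
            path.setLength(path.length() - 1); // backtrack
        }
    }

    public static void main(String[] args) {
        Trie trie = new Trie();
        trie.insert("anna");
        trie.insert("bob");
        trie.insert("ann");
        System.out.println(trie.contains("ann"));   // prints true
        System.out.println(trie.withPrefix("an"));  // prints [ann, anna]
    }
}
```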
I agree with @Davide's answer. If we want fast lookup and also want to maintain order, we can use the LinkedHashMap implementation of Map.
With it, we get both:
Data retrieval by key, if we have an access key.
Insertion order is maintained, so iteration returns the data in the same order in which it was inserted.
Depending on the scenario (if you have the data before receiving the lists of groups/people), preprocessing the data would save you time.
Comparing the data to the groups/people lists will require at least 10,000 lookups. Comparing the groups/people lists to the data will require at most 10*100 = 1,000 lookups, or fewer if you compare against each group one at a time (10+100 = 110 lookups).
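The HashSet approach the question mentions could be sketched as follows: one pass over the data objects with two expected O(1) membership checks each. The Membership class and its field names are hypothetical stand-ins for the asker's two-string data class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MembershipFilter {

    // Hypothetical data object holding the two strings from the question.
    static final class Membership {
        final String person;
        final String group;

        Membership(String person, String group) {
            this.person = person;
            this.group = group;
        }
    }

    // Keeps the objects whose person is in persons and whose group is in groups.
    static List<Membership> filter(List<Membership> data, Set<String> persons, Set<String> groups) {
        List<Membership> hits = new ArrayList<>();
        for (Membership m : data) {
            if (persons.contains(m.person) && groups.contains(m.group)) {
                hits.add(m);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Membership> data = Arrays.asList(
                new Membership("ann", "chess"),
                new Membership("bob", "golf"),
                new Membership("eve", "chess"));
        Set<String> persons = new HashSet<>(Arrays.asList("ann", "bob"));
        Set<String> groups = new HashSet<>(Arrays.asList("chess"));
        for (Membership m : filter(data, persons, groups)) {
            System.out.println(m.person + " / " + m.group); // prints ann / chess
        }
    }
}
```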

Hash Table implement iterate and findMin

I have coded a standard Hash Table class in java. It has a large array of buckets, and to insert, retrieve or delete elements, I simply calculate the hash of the element and look at the appropriate index in the array to get the right bucket.
However, I would like to implement some sort of iterator. Is there another way than looping through all the indices in the array and ignoring those that are empty? My hash table might contain hundreds of empty entries and only a few elements that have been hashed and inserted. Is there an O(n) way to iterate instead of O(size of table) when n << size of table?
To implement findMin, I could simply save the smallest element each time I insert a new one, but I want to use the iterator approach.
Thanks!
You can maintain a linked list of the map entries, like LinkedHashMap does in the standard library.
Or you can make your hash table ensure that the capacity is always at most kn, for some suitable value of k. This will ensure iteration is linear in n.
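The linked-list idea could be sketched as below: each entry sits in its bucket chain and is also threaded onto an insertion-order list, so iteration touches only the n stored entries. This is a simplified illustration with hypothetical names, storing int keys only and omitting deletion (which would need a doubly linked list to stay O(1)); it is not how LinkedHashMap is actually implemented internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

class LinkedHashTable {

    private static final class Entry {
        final int key;
        Entry nextInBucket;  // chain inside a single bucket
        Entry nextInserted;  // insertion-order list across all buckets

        Entry(int key) {
            this.key = key;
        }
    }

    private final Entry[] buckets;
    private Entry head;
    private Entry tail;

    LinkedHashTable(int capacity) {
        buckets = new Entry[capacity];
    }

    // Returns false if the key was already present.
    boolean insert(int key) {
        int b = Math.floorMod(Integer.hashCode(key), buckets.length);
        for (Entry e = buckets[b]; e != null; e = e.nextInBucket) {
            if (e.key == key) {
                return false;
            }
        }
        Entry e = new Entry(key);
        e.nextInBucket = buckets[b];
        buckets[b] = e;
        if (tail == null) {
            head = e;
        } else {
            tail.nextInserted = e;
        }
        tail = e;
        return true;
    }

    // Walks the linked list only: O(n) in stored entries, independent of capacity.
    List<Integer> keys() {
        List<Integer> out = new ArrayList<>();
        for (Entry e = head; e != null; e = e.nextInserted) {
            out.add(e.key);
        }
        return out;
    }

    int findMin() {
        if (head == null) {
            throw new NoSuchElementException("table is empty");
        }
        int min = head.key;
        for (Entry e = head.nextInserted; e != null; e = e.nextInserted) {
            min = Math.min(min, e.key);
        }
        return min;
    }

    public static void main(String[] args) {
        LinkedHashTable table = new LinkedHashTable(1000);
        table.insert(42);
        table.insert(7);
        table.insert(99);
        System.out.println(table.keys());    // prints [42, 7, 99]
        System.out.println(table.findMin()); // prints 7
    }
}
```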
You could store a sorted list of the non-empty buckets, and insert a bucket's id into the list (if it's not already there) when you insert something in the hash table.
But maybe it's not too expensive to search through a few hundred empty buckets, if it's not buried too deep inside a loop. A little inefficiency might be better than a more complex design.
If order is important to you, you should consider using a binary search tree (a left-leaning red-black tree, for example) or a skip list to implement your dictionary. They are better suited to the job in these cases.
