I need to add strings to a data structure (DS). Later I need to find a string and then remove it based on the same condition.
A HashSet seems like the best fit here, as it offers O(1) average complexity for searching for and removing a given element; removal only needs to unlink the entry from its bucket. In an ArrayList/array, search is O(n) and removal is O(n) as well.
Per my understanding, a HashSet will be better here since I need to search a large number of elements and, if found, remove them.
My question: is a HashSet, or some other DS, the better choice here?
Such tasks are usually best handled by a trie data structure and its variations.
Alternatively, you can use a hash table; however, it does not guarantee good worst-case complexity.
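A minimal trie sketch along those lines (the class, field and method names here are illustrative, not from any library):

import java.util.HashMap;
import java.util.Map;

// A minimal trie sketch for exact insert/search/delete of strings.
class SimpleTrie {
    private static class Node {
        Map<Character, Node> children = new HashMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    // Insert a word in O(length) time.
    void insert(String word) {
        Node current = root;
        for (char c : word.toCharArray()) {
            current = current.children.computeIfAbsent(c, k -> new Node());
        }
        current.isWord = true;
    }

    // Exact-match lookup in O(length) time.
    boolean contains(String word) {
        Node node = find(word);
        return node != null && node.isWord;
    }

    // Lazy deletion: just clear the word flag (empty nodes are not pruned).
    boolean remove(String word) {
        Node node = find(word);
        if (node == null || !node.isWord) {
            return false;
        }
        node.isWord = false;
        return true;
    }

    private Node find(String word) {
        Node current = root;
        for (char c : word.toCharArray()) {
            current = current.children.get(c);
            if (current == null) {
                return null;
            }
        }
        return current;
    }
}

Insert, lookup and removal are each O(L) in the length of the word, independent of how many words are stored; the deletion here is deliberately lazy and does not prune empty nodes.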
As usual, it depends on your needs:
if you just need to add String instances, then retrieve and delete them, this is a no-brainer: HashSet is your choice (TreeSet has worse asymptotic complexity and is suitable only when you need the String instances ordered for some reason, e.g. alphabetically); see the sketch after this list
if you wish to store and efficiently search all String instances with a specified prefix, use a trie, as correctly mentioned in Serge Rogatch's answer
if you wish to check whether a pattern exists in a specified String instance, use a suffix tree
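For the plain add/find/remove case from the first point, a small sketch of the standard java.util.HashSet calls (the sample strings are made up):

import java.util.HashSet;
import java.util.Set;

public class HashSetDemo {
    public static void main(String[] args) {
        Set<String> words = new HashSet<>();

        // add() is O(1) on average
        words.add("alpha");
        words.add("beta");

        // contains() and remove() are also O(1) on average
        if (words.contains("alpha")) {
            words.remove("alpha");
        }

        System.out.println(words); // [beta]
    }
}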
I need to have a data structure keyed by the name of the person, which implies that duplicate keys will have to be stored. I would like insertions, deletions and searches to take at most O(log n) time.
Ideally, I would have a Hashtable keyed by a unique identifier generated upon insertion. That way, insertions take constant time. With an auxiliary ordered balanced tree whose entries are references into the Hashtable, I could search/delete by name in logarithmic time, and print all entries in linear time, ordered by name.
Is there a way to do this in Java by reusing the available Collections? Or at least a solution with similar complexity...
In this question, it is suggested that a Map will solve this problem. But from my understanding, no Map can deal with repeated keys properly.
If I understand the question right, you want a MultiMap. There are a few implementations in Guava and Apache Commons Collections, or you could roll your own using e.g. a HashMap (or a TreeMap if you suspect your .hashCode() implementation is too slow or too poor, but I'd benchmark first) with a List or Set implementation as the value type.
Note that if you want to iterate over the values in name order, you're best off using a tree-based multimap: if the name is already the key, the order will be what you want without your needing to do anything else.
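A rough roll-your-own sketch along those lines, using a TreeMap keyed by name with a List of values so that name-ordered iteration comes for free (the Person record and the sample data are placeholders):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class NameMultimapDemo {
    record Person(long id, String name) {}

    public static void main(String[] args) {
        // TreeMap keeps the keys (names) sorted, so iteration is already in name order.
        Map<String, List<Person>> byName = new TreeMap<>();

        add(byName, new Person(1, "Alice"));
        add(byName, new Person(2, "Bob"));
        add(byName, new Person(3, "Alice")); // duplicate name is fine

        // O(log n) lookup by name, returning all matching people
        List<Person> alices = byName.getOrDefault("Alice", List.of());
        System.out.println(alices);

        // Delete one entry for a name, removing the key when its list empties
        List<Person> bobs = byName.get("Bob");
        if (bobs != null) {
            bobs.removeIf(p -> p.id() == 2);
            if (bobs.isEmpty()) {
                byName.remove("Bob");
            }
        }
    }

    static void add(Map<String, List<Person>> map, Person p) {
        map.computeIfAbsent(p.name(), k -> new ArrayList<>()).add(p);
    }
}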
As you said yourself, a Map will help you. For repeated keys, you should store a list of objects as the value instead of the objects themselves. You can leverage generics here so that the value type is either a single object or a list of objects.
Edit: this degenerates to O(n) search complexity in the worst case (all entries have the same name), but that case should be very rare, so you can probably ignore it.
If you agree to generating a unique ID per entity, then you should use a composite key consisting of the name and the unique ID. Make the key comparable by name, then by ID to break ties.
Then you just use a plain TreeMap for storage and retrieval. Especially take note of the NavigableMap API, which will allow you to find the best match for a key disregarding the differences in the ID part.
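A sketch of that composite-key idea, assuming an illustrative Key record and a simple counter for the unique IDs:

import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

public class CompositeKeyDemo {
    // Composite key: ordered by name first, then by unique id to break ties.
    record Key(String name, long id) {}

    private static final Comparator<Key> BY_NAME_THEN_ID =
            Comparator.comparing(Key::name).thenComparingLong(Key::id);

    private static final AtomicLong NEXT_ID = new AtomicLong();

    public static void main(String[] args) {
        NavigableMap<Key, String> people = new TreeMap<>(BY_NAME_THEN_ID);

        people.put(new Key("Alice", NEXT_ID.incrementAndGet()), "first Alice");
        people.put(new Key("Alice", NEXT_ID.incrementAndGet()), "second Alice");
        people.put(new Key("Bob", NEXT_ID.incrementAndGet()), "Bob");

        // All entries for a given name: a range over that name with any id.
        NavigableMap<Key, String> alices = people.subMap(
                new Key("Alice", Long.MIN_VALUE), true,
                new Key("Alice", Long.MAX_VALUE), true);
        System.out.println(alices.values()); // [first Alice, second Alice]
    }
}

The subMap range query is what the NavigableMap API buys you: it returns every entry for a name regardless of its ID, in O(log n) plus the size of the result.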
I have this.
private ArrayList<String> words;
It's a dictionary, so the words are already sorted. From past study I know that a binary search should be very quick, and I suppose Java already implements what is necessary.
So, what is the most efficient way to find out whether a certain string exists inside a sorted ArrayList?
Or should I use a different type?
Thank you.
Or should I use a different type?
Try using a HashSet<String> instead. Its contains method has O(1) lookup assuming that there are not too many hash collisions. From the documentation:
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets.
A binary search on a sorted ArrayList is only O(log n). This is still very fast, but it is not as fast as using a HashSet.
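For comparison, a small sketch of both lookups (the word list is made up and already sorted):

import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LookupDemo {
    public static void main(String[] args) {
        // Already sorted, as the dictionary in the question is.
        List<String> sortedWords = List.of("apple", "banana", "cherry");

        // O(log n): Collections.binarySearch returns a non-negative index on a hit.
        boolean foundByBinarySearch =
                Collections.binarySearch(sortedWords, "banana") >= 0;

        // O(1) average: build a HashSet once, then use contains().
        Set<String> wordSet = new HashSet<>(sortedWords);
        boolean foundByHashSet = wordSet.contains("banana");

        System.out.println(foundByBinarySearch + " " + foundByHashSet); // true true
    }
}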
A binary search will be the fastest in a sorted array. Testing for existence can be done in constant time if you are using a hash set.
That depends on how many times you're going to try to find a specific string. You might want to try a HashMap<String, String>, as it will remain fast as the map grows.
If you're going to be doing binary searches I would suggest you reorganize your data into a Binary Search Tree
ArrayLists are often used for sequential operations and random access. If you're going to be doing searches and want the fastest lookup, it's best to organize your data from the get-go. This also has the advantage of facilitating faster inserts/removals and all the other operations you'd hope to accomplish in the fastest possible time.
There are tons of guides on google and elsewhere to get you started.
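Note that Java already ships a balanced binary search tree as TreeSet, so you may not need to build one yourself; a minimal sketch:

import java.util.TreeSet;

public class TreeSetDemo {
    public static void main(String[] args) {
        // TreeSet is backed by a red-black tree, so add/contains/remove are O(log n).
        TreeSet<String> words = new TreeSet<>();
        words.add("cherry");
        words.add("apple");
        words.add("banana");

        System.out.println(words.contains("banana")); // true
        System.out.println(words.first());            // apple (kept sorted)
    }
}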
I know you can find the first and last elements in a TreeSet. What if I wanted to know what the second or third element was, without iterating? Or, preferably, given an element, figure out its rank in the TreeSet.
Thanks
EDIT: I think you can do it using tailSet, i.e. compare the size of the original set with the size of the tail set. How efficient is tailSet?
TreeSet does not provide an efficient rank method. I suspect (you can confirm by looking at its source) that TreeSet does not even maintain the extra bookkeeping (i.e. the counts of elements in the left and right subtrees of each node) that one would need to answer such queries in O(log n) time. So there does not appear to be any fast way to find the rank of an element in a TreeSet.
If you really need it, you can either implement your own SortedSet on top of a balanced binary search tree that supports such queries, or modify the TreeSet implementation to create a new one augmented to allow them. Refer to the chapter on augmenting data structures in CLRS for details on how this can be done.
According to the source of the Sun JDK6 implementation, tailSet(E).size() iterates over the tail set and counts the elements, so this call is O(tail set size).
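If an O(n) answer is acceptable, the same subset idea gives a one-liner for rank; a sketch (headSet's size() walks the view just like tailSet's, so this is linear, not logarithmic):

import java.util.TreeSet;

public class RankDemo {
    public static void main(String[] args) {
        TreeSet<Integer> set = new TreeSet<>();
        for (int i = 10; i <= 50; i += 10) {
            set.add(i);
        }

        // Rank of 30 = number of strictly smaller elements.
        // headSet(e) is a view; its size() walks the view, so this is O(n), not O(log n).
        int rank = set.headSet(30).size();
        System.out.println(rank); // 2
    }
}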
There is no way other than using an Iterator.
Edited:
Try this:
treeSet.higher(treeSet.first());
This should give the second element of the TreeSet. I'm not sure whether this is more optimized than just using an Iterator.
I'm working with a very large LinkedList of custom objects, and I'm trying to determine whether an object I'm about to add to the list is already in there.
The issue is that the item I am searching for is a unique object containing:
A 1st String
A 2nd String
A unique Count #
I'm trying to find out if there is an item in my linked list that contains the (1st String) and (2nd String), while ignoring (the unique Count #).
This can be done the dumb way (the way I tried first) by going through each item of the linked list, but that takes way too long. I'm trying to speed it up! I figured using indexOf would help, but I don't know how I can customize what it searches for.
Any ideas?
indexOf() has O(n) performance as well because it progressively scans the List until it finds the element you're looking for.
Is the list sorted? If so, you might be able to look for the element with a binary search.
If you need constant time access for random elements, I don't think a Linked List is your best bet.
Do you NEED to use a LinkedList? If it's not legacy code, I would recommend either HashSet or LinkedHashMap. Both will give you constant-time lookup, and if you still need insertion-order iteration, LinkedHashMap has an internal LinkedList running through the keys.
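One way to make any of the hash-based suggestions work here is to base equals() and hashCode() on just the two strings, ignoring the count; a rough sketch with a made-up Entry type:

import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class DedupDemo {
    // Illustrative type: equality deliberately ignores the count field.
    static final class Entry {
        final String first;
        final String second;
        final int count;

        Entry(String first, String second, int count) {
            this.first = first;
            this.second = second;
            this.count = count;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Entry)) return false;
            Entry e = (Entry) o;
            return first.equals(e.first) && second.equals(e.second);
        }

        @Override
        public int hashCode() {
            return Objects.hash(first, second);
        }
    }

    public static void main(String[] args) {
        Set<Entry> seen = new HashSet<>();
        seen.add(new Entry("foo", "bar", 1));

        // O(1) average membership test; same strings, different count -> still found
        System.out.println(seen.contains(new Entry("foo", "bar", 99))); // true
    }
}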
Unfortunately the "dumb way" is the most effiecient way to do so, although you could use
if ( linkedList.contains(objectThatMayBeInList) ) { //do something }
The problem is that searching a LinkedList is O(N), where N is the size of the list: in the worst case, any given search has to look at every element. Linked lists are not the best data structure for that kind of operation, but at the same time it's not that bad, and it shouldn't be too slow; computers are good at this. Can you give us more specifics, such as the size of the list?
Basically you want to find out if object A exists in linked list L. This is the search problem, and if the list is unordered you cannot do it faster than O(n).
If you kept the list sorted (making insertion slower), you could do a binary search to see if A is in the list, which would be much faster.
Perhaps you could also keep a Map (HashMap or TreeMap for instance) in addition to the list, where you keep track of what stuff is in the list.
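A small sketch of that last idea, keeping a HashSet alongside the LinkedList so membership checks stay O(1) while the list preserves insertion order (the "first|second" key format is an arbitrary choice and assumes '|' never appears in the strings):

import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

public class ParallelIndexDemo {
    public static void main(String[] args) {
        List<String[]> items = new LinkedList<>();   // keeps insertion order
        Set<String> index = new HashSet<>();         // tracks the string pairs seen so far

        addIfAbsent(items, index, "foo", "bar");
        addIfAbsent(items, index, "foo", "bar");     // skipped: already present

        System.out.println(items.size()); // 1
    }

    static void addIfAbsent(List<String[]> items, Set<String> index,
                            String first, String second) {
        String key = first + "|" + second;
        if (index.add(key)) {            // add() returns false if the key was already present
            items.add(new String[] {first, second});
        }
    }
}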
Is there a way to first sort and then search for an object within a linked list of objects?
I thought of just using one of the sorting methods and then a binary search. What do you think?
Thanks
This is not a good approach, IMO. If you use Collections.sort(list), where the list is a LinkedList, this copies the list to a temporary array, sorts it, and then copies it back to the list; i.e. O(NlogN) for the sort plus 2 * O(N) copies. But when you then try to do a binary search (e.g. using Collections.binarySearch(list)), each search will do O(N) list traversal operations. So you might as well not have bothered sorting the list!
Another approach would be to convert the list to an array or an ArrayList, and then sort and search that array/ArrayList. That costs one copy plus one sort to set up, and O(logN) for each search.
But neither of these is the best approach. That depends on how many times you need to perform search operations.
If you simply want to do one search on the list, then calling list.contains(...) is O(N) ... and that is better than anything involving sorting and binary searching.
If you want to do multiple searches on a list that never changes, you're probably better off putting the list entries into a HashSet. Constructing a HashSet is O(N) and searching is O(1). (This assumes you don't need your own comparator.)
If you want to do multiple searches on a list that keeps changing where the order IS NOT significant, replace the list with a HashSet. The incremental cost of updating the HashSet will be O(1) for each addition/removal, and O(1) for each search.
If you want to do multiple searches on a list that keeps changing where the order IS significant, replace the list with an insertion-ordered LinkedHashMap. That will be O(1) for each addition/removal, and O(1) for each search ... but with larger constants of proportionality than for a HashSet.
java.util.Collections#sort()
java.util.Collections#binarySearch()
The Collections class has lots of other useful methods to make a programmer's life easier.
Note that the sort method's implementation will indeed convert the list to an array, but you need not explicitly convert the list into an array before calling the method. :)
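A quick sketch of those two calls on an ArrayList (the contents are made up):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortAndSearchDemo {
    public static void main(String[] args) {
        List<String> words = new ArrayList<>(List.of("pear", "apple", "mango"));

        Collections.sort(words);                             // O(n log n)
        int idx = Collections.binarySearch(words, "mango");  // O(log n) on a random-access list

        System.out.println(idx >= 0); // true: a non-negative index means the element was found
    }
}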
You may want to question whether searching a sorted list is the best option for your use case, since it does not perform especially well: the sort is O(NlogN) and each binary search is O(logN). If you just want to see whether an element exists, consider building a Set from your list elements and searching it via the contains method, which is O(1). It would be much easier to advise you on which collection to use if you could explain more about your use case.
EDIT: Consider performance issues of List sorting if you plan to do this for large lists.