find in ArrayList<String> - java

I have this.
private ArrayList<String> words;
It's a dictionary, so the words are already sorted. From what I studied long ago, a binary search should be very quick, and I assume Java already provides what is needed.
So, what is the most efficient way of finding out whether a certain string exists inside a sorted ArrayList?
Or should I use a different type?
Thank you.

Or should I use a different type?
Try using a HashSet<String> instead. Its contains method has O(1) lookup assuming that there are not too many hash collisions. From the documentation:
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets.
A binary search on a sorted ArrayList is only O(log n). This is still very fast, but it is not as fast as using a HashSet.
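As a minimal sketch of the two options, assuming the list is sorted in natural String order: Collections.binarySearch works directly on the sorted ArrayList, while the HashSet has to be built once from the same words. The class and method names (DictionaryLookup, containsSorted, containsHashed) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DictionaryLookup {
    // Binary search on an already sorted list: O(log n) comparisons.
    static boolean containsSorted(List<String> sortedWords, String word) {
        // binarySearch returns a non-negative index only when the key is present.
        return Collections.binarySearch(sortedWords, word) >= 0;
    }

    // Hash lookup: expected O(1) per query, after an O(n) one-time build.
    static boolean containsHashed(Set<String> wordSet, String word) {
        return wordSet.contains(word);
    }

    public static void main(String[] args) {
        // Sample data only; the list must already be sorted for binarySearch.
        List<String> words = new ArrayList<>(Arrays.asList("apple", "banana", "cherry", "date"));
        Set<String> wordSet = new HashSet<>(words);

        System.out.println(containsSorted(words, "cherry"));
        System.out.println(containsHashed(wordSet, "cherry"));
        System.out.println(containsSorted(words, "zebra"));
    }
}
```

If the list is sorted with a custom Comparator, the same Comparator has to be passed to the three-argument Collections.binarySearch overload.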

A binary search will be the fastest in a sorted array. Testing for existence can be done in constant time if you are using a hash set.

Depends how many times you're going to try and find a specific string. You might want to try a HashMap<String, String> as this will remain fast as the map grows.

If you're going to be doing binary searches, I would suggest you reorganize your data into a binary search tree.
ArrayLists are often used for sequential operations and random access. If you're going to be doing a search and want the fastest lookup, it's best to organize your data that way from the get-go. A tree also makes inserts and removals faster than in a sorted ArrayList, where elements have to be shifted.
There are tons of guides on Google and elsewhere to get you started.
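If you don't want to write the tree yourself, one ready-made option in Java is java.util.TreeSet, which is backed by a balanced (red-black) tree and gives O(log n) add, remove and contains. A small sketch, with the class and helper names (TreeLookup, build, firstAtOrAfter) invented for the example:

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class TreeLookup {
    // Builds a sorted tree set from any list; the input does not need to be sorted.
    static TreeSet<String> build(List<String> words) {
        return new TreeSet<>(words);
    }

    // Smallest stored word that is >= probe, or null if there is none.
    static String firstAtOrAfter(TreeSet<String> words, String probe) {
        return words.ceiling(probe);
    }

    public static void main(String[] args) {
        TreeSet<String> words = build(Arrays.asList("pear", "apple", "fig"));
        words.add("kiwi");                                // O(log n) insert, order is maintained
        System.out.println(words.contains("fig"));        // O(log n) lookup
        System.out.println(firstAtOrAfter(words, "g"));   // ordered query a HashSet cannot answer
        System.out.println(words);                        // iterates in sorted order
    }
}
```

The ordered queries are the main reason to pick a tree over a hash set; for a plain existence test the hash set is usually the simpler choice.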

Related

Datastructure to find/search and add?

I need to add strings to a data structure (DS). Later I need to find a string and then remove it based on the same condition.
HashSet seems the best fit here, as it provides O(1) search and removal of a given element; removal will also just require updating the parent's right or left node. In an ArrayList or array, search is O(n), and removal is the same.
Per my understanding, HashSet will be better here, as I need to search for a large number of elements and remove each one that is found.
My question: is HashSet or some other DS better here?
Usually such tasks are best handled by Trie data structure and the variations of it.
Alternatively you can use a hash table, however it doesn't guarantee worst-case complexity.
As usual, it depends on your needs:
if you just need to add String instances, retrieve and delete them, this is a no-brainer: HashSet is your choice (TreeSet has worse asymptotic complexity, O(log n) per operation, and is suitable when you need the String instances kept in some order, e.g. alphabetically)
if you wish to store and efficiently search all String instances with a specified prefix, use a Trie, as correctly mentioned in Serge Rogatch's answer
if you wish to check whether a pattern exists in a specified String instance, use a suffix tree
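For the prefix case, a trie can be sketched roughly like this. It is a bare-bones illustration, not a tuned implementation: the class name Trie and its methods are invented here, children are kept in a HashMap per node, and removal prunes branches that no longer lead to a word.

```java
import java.util.HashMap;
import java.util.Map;

class Trie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord; // true if a stored word ends at this node
    }

    private final Node root = new Node();

    // Adds a word in O(length of word).
    void add(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    // Exact-match lookup.
    boolean contains(String word) {
        Node node = find(word);
        return node != null && node.isWord;
    }

    // True if at least one stored word starts with the prefix.
    boolean hasPrefix(String prefix) {
        Node node = find(prefix);
        return node != null && (node.isWord || !node.children.isEmpty());
    }

    // Removes a word; returns false if it was not stored.
    boolean remove(String word) {
        if (!contains(word)) {
            return false;
        }
        removeFrom(root, word, 0);
        return true;
    }

    // Unmarks the word and reports whether this node has become useless.
    private boolean removeFrom(Node node, String word, int i) {
        if (i == word.length()) {
            node.isWord = false;
        } else {
            char c = word.charAt(i);
            if (removeFrom(node.children.get(c), word, i + 1)) {
                node.children.remove(c); // prune the now-empty branch
            }
        }
        return !node.isWord && node.children.isEmpty();
    }

    // Walks down the trie along s; null if the path does not exist.
    private Node find(String s) {
        Node node = root;
        for (char c : s.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return null;
            }
        }
        return node;
    }
}
```

If only exact add, find and remove are needed, HashSet<String> does the same job with far less code.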

How to implement immutable collection with constant append and random access time?

I'm looking for a solution like this one proposed by Eric Lippert. It is a great implementation, as it is immutable and the append time is O(1), but its downside is the O(i) random access time.
On the other side there is a great implementation of a collection with O(1) for both append and random access. The only problem is that it strongly relies on mutability.
My question is how to implement a collection which combines the benefits of both solutions. That is:
immutability
O(1) append time
O(1) random access time
Memory complexity is not that big issue for me.
I do not know of a way to implement a list which has all your requirements -- immutability, persistence, O(1) insertion, O(1) removal, O(1) random access.
My suggestion to you is that (1) if you are interested in this topic, read Chris Okasaki's book. (Or, get a copy of his thesis, which was the basis of the book.) And (2) Chris Okasaki suggests the data structure described here for your purposes:
http://www.codeproject.com/Articles/9680/Persistent-Data-Structures#RandomAccessLists
This list is O(1) for insert and removal at the head, and O(lg n) for random access.
I'm not sure how you would get O(1) append and O(1) random access unless you include another data structure.
Typically, if you want to be able to append elements, you can either copy the source collection, which keeps O(1) random access but gives you O(n) append; or you can do what Eric did, and retain the old list segment(s), which gives you O(1) append time but O(n) random access. Assuming the constant append time is critical, that leaves you with the option of incorporating a second data structure to provide constant-time random access.
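To make the second trade-off concrete, here is a rough Java sketch of a list that shares its old segment on every append. It is not Eric's implementation, just an illustration of the idea under the assumption that each version keeps a reference to the previous one; the name PersistentList and its methods are hypothetical.

```java
final class PersistentList<T> {
    private static final PersistentList<?> EMPTY = new PersistentList<>(null, null, 0);

    private final T last;                     // most recently appended element
    private final PersistentList<T> previous; // older version, shared and never copied
    private final int size;

    private PersistentList(T last, PersistentList<T> previous, int size) {
        this.last = last;
        this.previous = previous;
        this.size = size;
    }

    @SuppressWarnings("unchecked")
    static <T> PersistentList<T> empty() {
        return (PersistentList<T>) EMPTY;
    }

    // O(1): allocates one node and leaves the existing list untouched.
    PersistentList<T> append(T value) {
        return new PersistentList<>(value, this, size + 1);
    }

    // O(size - index): walks back from the newest element.
    T get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        PersistentList<T> node = this;
        for (int steps = size - 1 - index; steps > 0; steps--) {
            node = node.previous;
        }
        return node.last;
    }

    int size() {
        return size;
    }
}
```

Appending to a list leaves every earlier version valid, which is what makes the structure persistent; the price is the linear walk in get.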
The Scala documentation claims "effectively constant" add and lookup times for its immutable HashMap. If true, I suggest looking at their implementation. You could take a solution like Eric's and add an efficient immutable map of indexes to the elements themselves. This would add a bit of memory overhead, though, and while append operations would be efficient, insertions would not.
I am, however, a bit skeptical of the Scala HashMap performance claims. Other immutable map implementations claim log32(n) complexity, which presumably applies to both add and lookup operations. My gut tells me that you're not going to get better than logarithmic complexity, though log32(n) is pretty reasonable.
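var bag = new HashBag<int>
{
    1,
    2,
    3
};
var g = new GuardedCollection<int>(bag); // read-only view over the same bag
bag.Add(4); // allowed: the underlying bag is still mutable
g.Add(5);   // rejected: the guarded wrapper does not permit modification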
The HashBag remains mutable, but you can still pass it to another consumer as an immutable GuardedCollection.

sort while inserting or copy and sort

I have an iterator that gives me n elements. Currently I copy them one by one into an ArrayList and then call Collections.sort() on that list to obtain a sorted ArrayList. This takes n log(n) + n operations. Is there a faster way to do it, i.e. can I already use the insertion operation to a certain degree?
The iterator does not give any sorting, the elements occur pretty much randomly.
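If you have only that iterator, I don't see a faster solution. Note that n log n + n is also O(n log n).
If you want to "sort while inserting", you need to do a binary search on each insertion to find the position. The searches cost O(n log n) comparisons in total, but each insertion into an ArrayList also shifts the elements after the insertion point, so the overall worst case is O(n^2). I don't think it would be faster than what you have.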
TreeSet can save you from the binary search implementation, but basically it is the same logic.
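For reference, "sort while inserting" into an ArrayList could look roughly like the sketch below. Collections.binarySearch finds the position in O(log n) comparisons, and List.add(int, E) then shifts the tail of the list, which is the part that costs O(n) per insert on an ArrayList. The names SortedInsert and insertSorted are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class SortedInsert {
    // Inserts value into an already sorted list, keeping it sorted.
    static <T extends Comparable<? super T>> void insertSorted(List<T> sorted, T value) {
        int pos = Collections.binarySearch(sorted, value);
        if (pos < 0) {
            // Not found: binarySearch returns -(insertionPoint) - 1.
            pos = -(pos + 1);
        }
        sorted.add(pos, value); // shifts every element from pos onwards
    }

    public static void main(String[] args) {
        Iterator<Integer> it = Arrays.asList(5, 1, 4, 1, 3).iterator();
        List<Integer> sorted = new ArrayList<>();
        while (it.hasNext()) {
            insertSorted(sorted, it.next());
        }
        System.out.println(sorted);
    }
}
```

Unlike a TreeSet, this keeps duplicates, but for a one-off sort of n elements the copy followed by Collections.sort is the simpler choice.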
Since an iterator is neither a collection nor a container, it is not possible to sort directly in the iterator, as you already noticed. The method that you are using seems to be the best solution in this case.
If your elements are unique you could drop them into a TreeSet and then copy them out of the TreeSet into an ArrayList. That may not actually be any faster than what you are already doing though.
Beyond that you are unlikely to be able to optimise further than you already have. Writing your own insertion sort would almost certainly be slower than just using the highly optimised Java sort routines.
You could consider looking at the new Java Streams API in Java 8 though. That would allow you to do this by opening the iterator as a stream, sorting it, then collecting it into your final collection.
http://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html
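A sketch of the stream version, assuming the elements are Comparable. An Iterator has no stream() method of its own, so it is wrapped with Spliterators.spliteratorUnknownSize and StreamSupport.stream; the class and method names (IteratorSort, sorted) are invented for illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

class IteratorSort {
    // Drains the iterator into a sequential stream, sorts it, and collects a List.
    static <T extends Comparable<? super T>> List<T> sorted(Iterator<T> it) {
        return StreamSupport
                .stream(Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Iterator<String> it = Arrays.asList("pear", "apple", "fig").iterator();
        System.out.println(sorted(it));
    }
}
```

This is mostly a matter of style: sorted() still has to buffer all elements before it can emit any, so the work is the same O(n log n) as copying and sorting.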
If your array holds objects rather than a primitive type (such as int or double), the cost of copying the objects must be considered. In this situation, sorting an array of indices may be a better way. Using a search data structure such as a map or set is better only when you need to sort and insert at the same time.

customize an indexof call for a linkedlist (java)

I'm working with a very large (custom Object) linkedlist, and I'm trying to determine if an object that I'm trying to add to the list is already in there.
The issue is that the item I am searching for is a unique object containing:
A 1st String
A 2nd String
A unique Count #
I'm trying to find out if there is an item in my linked list that contains the (1st String) and (2nd String), but ignore (the unique Count #).
This can be done the dumb way (the way I tried it first) by going through each individual linkedlist item - but this takes way too long. I'm trying to speed it up! I figured using (indexOf) would help, but I don't know how I can customize what it is searching for.
Any ideas?
indexOf() has O(n) performance as well because it progressively scans the List until it finds the element you're looking for.
Is the list sorted? If so, you might be able to search for an element using something like binary search.
If you need constant time access for random elements, I don't think a Linked List is your best bet.
Do you NEED to use a LinkedList? If it's not legacy code, I would recommend either HashSet or LinkedHashMap. Both will give you constant-time lookup, and if you still need insertion-order iteration, LinkedHashMap has an internal linked list running through its entries.
Unfortunately the "dumb way" is the most efficient way to do so, although you could use
if (linkedList.contains(objectThatMayBeInList)) { /* do something */ }
The problem is that a LinkedList has a worst-case search of O(N), where N is the size of the list. That means that on any given search you may need up to N comparisons. Linked lists are not the best data structure for that kind of operation, but at the same time it's not that bad, and it shouldn't be too slow; computers are good at doing that. Are there more specifics you can give us as to the size of the list?
Basically you want to find out if object A exists in linked list L. This is the search problem, and if the list is unordered you cannot do it faster than O(n).
If you kept the list sorted (making insertion slower), you could do a binary search to see if A is in the list, which would be much faster.
Perhaps you could also keep a Map (HashMap or TreeMap for instance) in addition to the list, where you keep track of what stuff is in the list.
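A sketch of that last idea, under the assumption that the item looks like the one described in the question (two strings plus a unique count). The lookup key implements equals and hashCode over the two strings only, so the count is ignored. All class and field names here (PairIndex, Item, Key) are hypothetical.

```java
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class PairIndex {
    // Stand-in for the custom object from the question.
    static final class Item {
        final String first;
        final String second;
        final int count;

        Item(String first, String second, int count) {
            this.first = first;
            this.second = second;
            this.count = count;
        }
    }

    // Lookup key: equality is defined on the two strings only.
    private static final class Key {
        final String first;
        final String second;

        Key(String first, String second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return Objects.equals(first, other.first) && Objects.equals(second, other.second);
        }

        @Override
        public int hashCode() {
            return Objects.hash(first, second);
        }
    }

    private final List<Item> items = new LinkedList<>(); // keeps the original list
    private final Set<Key> seen = new HashSet<>();       // expected O(1) membership test

    // Adds the item unless one with the same two strings already exists.
    boolean addIfAbsent(Item item) {
        if (!seen.add(new Key(item.first, item.second))) {
            return false;
        }
        items.add(item);
        return true;
    }

    boolean contains(String first, String second) {
        return seen.contains(new Key(first, second));
    }

    int size() {
        return items.size();
    }
}
```

The set has to be updated wherever the list is modified; if items can also be removed, the matching key has to be removed at the same time.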

searching an unordered list without converting it to an array

Is there a way to first sort and then search for an object within a linked list of objects?
I thought I would just use one of the sorting methods and then a binary search. What do you think?
Thanks
This is not a good approach, IMO. If you use Collections.sort(list), where the list is a LinkedList, this copies the list to a temporary array, sorts it, and then copies it back to the list; i.e. O(NlogN) to sort plus 2 * O(N) copies. But when you then try to do a binary search (e.g. using Collections.binarySearch(list, key)), each search will do O(N) list traversal operations. So you may as well have not bothered sorting the list!
Another approach would be to convert the list to an array or an ArrayList, and then sort and search that array / ArrayList. That gives one copy plus one sort to setup, and O(logN) for each search.
But neither of these is necessarily the best approach. That depends on how many times you need to perform search operations:
If you simply want to do one search on the list, then calling list.contains(...) is O(N) ... and that is better than anything involving sorting and binary searching.
If you want to do multiple searches on a list that never changes, you're probably better off putting the list entries into a HashSet. Constructing a HashSet is O(N) and searching is O(1). (This assumes you don't need your own comparator.)
If you want to do multiple searches on a list that keeps changing where the order IS NOT significant, replace the list with a HashSet. The incremental cost of updating the HashSet will be O(1) for each addition/removal, and O(1) for each search.
If you want to do multiple searches on a list that keeps changing and the order IS significant, replace the list with an insertion-ordered LinkedHashMap. That will be O(1) for each addition/removal, and O(1) for each search ... but with larger constants of proportionality than for a HashSet.
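A small sketch of the insertion-ordered case. It uses LinkedHashSet rather than LinkedHashMap because only membership is needed here; that is a substitution on my part, though LinkedHashSet offers the same insertion-order iteration. Duplicate entries are dropped, and the names OrderedLookup and index are invented for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

class OrderedLookup {
    // Copies the list into a set that remembers insertion order: O(N) once.
    static <T> Set<T> index(List<T> list) {
        return new LinkedHashSet<>(list);
    }

    public static void main(String[] args) {
        List<String> list = new LinkedList<>(Arrays.asList("c", "a", "b", "a"));
        Set<String> ordered = index(list);

        System.out.println(ordered.contains("b")); // expected O(1), no traversal
        ordered.add("d");                          // expected O(1), appended to the iteration order
        ordered.remove("a");                       // expected O(1)
        System.out.println(ordered);               // iterates in insertion order
    }
}
```

If the list legitimately contains duplicates, or positional access is needed, a set is not a drop-in replacement and the list would have to be kept alongside it.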
java.util.Collections#sort()
java.util.Collections#binarySearch()
The Collections class has lots of other useful methods to make programmers' lives easier.
Note that the sort method's implementation will indeed convert the list to an array, but you need not explicitly convert the list into an array before calling the method :)
You may want to question if searching over a sorted list is the best option for your use-case as this does not perform well. The list sort is O(NlogN) and the binary search is O(logN). You might consider making a Set out of your list elements and then searching that via the contains method, which is O(1), if you just want to see if an element exists. It would be much easier to give you some advice on what collection you might consider if you could explain more about your use-case.
EDIT: Consider performance issues of List sorting if you plan to do this for large lists.
