customize an indexof call for a linkedlist (java) - java

I'm working with a very large (custom Object) linkedlist, and I'm trying to determine if an object that I'm trying to add to the list is already in there.
The issue is that the item I am searching for is a unique object containing:
A 1st String
A 2nd String
A unique Count #
I'm trying to find out if there is an item in my linked list that contains the (1st String) and (2nd String), but ignore (the unique Count #).
This can be done the dumb way (the way I tried it first) by going through each individual linkedlist item - but this takes way too long. I'm trying to speed it up! I figured using (indexOf) would help, but I don't know how I can customize what it is searching for.
Any ideas?

indexOf() has O(n) performance as well because it progressively scans the List until it finds the element you're looking for.
Is the list sorted? If so, you might be able to search for an element using something like quicksort.
If you need constant time access for random elements, I don't think a Linked List is your best bet.

Do you NEED to use a LinkedList? If it's not legacy code, I would recommend either HashSet or LinkedHashMap. Both will give you constant-time lookup, and if you still need insertion-order iteration, LinkedHashMap has an internal LinkedList running through the keys.

Unfortunately the "dumb way" is the most effiecient way to do so, although you could use
if ( linkedList.contains(objectThatMayBeInList) ) { //do something }
The problem is that a LinkedList has a best case search of O(N) where N is the size of the list. That means that on any given search you have a worst case scenario of N computations. Linked lists are not the best data structure for that kind of an operation, but at the same time, it's not that bad, and it shouldn't be too slow, computers are good at doing that. Is there more specifics you can give us as to the size of the list?

Basically you want to find out if object A exists in linked list L. This is the search problem, and if the list is unordered you cannot do it faster than O(n).
If you kept the list sorted (making insertion slower), you could do a binary search to see if A is in the list, which would be much faster.
Perhaps you could also keep a Map (HashMap or TreeMap for instance) in addition to the list, where you keep track of what stuff is in the list.

Related

a collection data structure to keep items sorted

I got a program which is using an ArrayList<T> and that type T also implements Comparable<T>. I need to keep that list sorted.
For now, when I insert a new item, I add it to the ArrayList and then invoke Collections.sort(myArrayList).
Is sorting with Collections.sort every time I insert a new item seriously hurt run time complexity?
Is there a better suited data structure I can use to always keep the list sorted? I know of a structure called a PriorityQueue but I also need to be able to get the list's elements by index.
EDIT:
In my specific case, inserting a new item happens much less than geting an already existing item, so eventually a good advice could also be to stay with the ArrayList since it got a constant time complexity of getting an item. But if you know of anything else...
It seems like Collection.Sort is actually the way to go here since when the collection is already almost sorted, the sorting will take not longer than O(n) in the worst case.
Instead of using Collections.sort(myArrayList) after every insertion you might want to do something a bit smarter as you know that every time you insert an element your collection is already ordered.
Collections.sort(myArrayList) takes 0(nlogn) time, you could do an ordered insert in an ordered collection in O(n) time using Collections.binarySearch. If the collection is ordered in ascending order Collections.binarySearch returns the index of the element you are looking for if it exists or (-(insertion point) - 1). Before inserting an element you can look for it with Collections.binarySearch (O(logn) time). Done that you can derive the index at which inserting the new element. You can then add the element with addAt in O(n) time. The whole insertion complexity is bounded by the addAt so you can do an ordered insert in an ArrayList in O(n) time.
List is an ordered collection, which means you need to have the ability to access with index. If a collection internally shuffles or sorts the elements, the insertion order wont be same as the order of the elements in the internal data structure. So you cannot depend on index based access anymore. Hence Sun didn't provide a SortedList or a TreeList class. That is why you use Collections.sort(..)
Apache commons-collections does provide a TreeList class but it is not a sorted List and is called so because it uses a tree data structure to store the elements internally. Check its documentation here - http://commons.apache.org/proper/commons-collections/javadocs/api-3.2.1/org/apache/commons/collections/list/TreeList.html
A single data structure cannot provide both sorting and index based retrieval. If the input set is limited to a few hundreds or thousands, you can keep two data structures in parallel.
For example, ArrayList for index based retrieval and TreeMap (or priority queue) for sorting.

sort while inserting or copy and sort

I have an iterator that gives me n elements. Currently I copy them one by one into an ArrayList and then call Collections.Sort() on that list to obtain a sorted ArrayList. This takes nlog(n)+n operations. Is there a faster way to do it, i.e. can I already use the insertion operation to a certain degree?
The iterator does not give any sorting, the elements occur pretty much randomly.
if you have only that iterator, I don't see faster solutions. note that nlogn+n is also O(nlogn).
if you want to "sort while inserting", you need do binary search on each insertion, it would be O(nlogn) too. I don't think it would be much faster than what you have.
TreeSet can save you from the binary search implementation, but basically it is the same logic.
Since an iterator is not a collection nor container, it is not possible to sort directly in the iterator, like you already noticed. The method that you are using seems to be the best solution in this case.
If your elements are unique you could drop them into a TreeSet and then copy them out of the TreeSet into an ArrayList. That may not actually be any faster than what you are already doing though.
Beyond that you are unlikely to be able to optimise further than you already have. Writing your own insertion sort would almost certainly be slower than just using the highly optimised Java sort routines.
You could consider looking at the new Java Streams API in Java 8 though. That would allow you to do this by opening the iterator as a stream, sorting it, then collating it to your final collection.
http://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html
If you have an object rather than raw data type (such as int , double) in your array, the cost of the object copy must be considered. In this situation, sort the array index may be a better way. Use search data structure map/set is better only when you need to process sorting and inserting simultaneously.

How to find the rank of an element in a TreeSet

I know you can find the first and last elements in a treeset. What if I wanted to know what the second or third element was without iterating? Or, more preferable, given an element, figure out it's rank in the treeset.
Thanks
EDIT: I think you can do it using tailset, ie. compare the size of the original set with the size of the tailset. How efficient is tailset?
TreeSet does not provide an efficient rank method. I suspect (you can confirm by looking at its source) that TreeSet does not even maintain any extra bits of information (i.e. counts of elements on the left and right subtrees of each node) that one would need to perform such queries in O(log(n)) time. So there does not appear to be any fast method of finding the rank of an element of TreeSet.
If you really really need it, you can either implement your own SortedSet with a balanced binary search tree which allows such queries or modify the TreeSet implementation to create a new implementation which is augmented to allow such queries. Refer to the chapter on augmenting data structures in CLRS for more details about how this can actually be done.
According to the source of the Sun JDK6 implementation, tailSet(E).size() iterates over the tail set and counts the elements, so this call is O(tail set size).
There is no other way than Iterator.
Edited:
Try this:
treeSet.higher(treeSet.first());
This should give second element on TreeSet. I'm not sure if this is more optimized then just using Iterator.

Which implementation of List to use?

In my program I often use collections to store lists of objects. Currently I use ArrayList to store objects.
My question is: is this a best choice? May be its better to use LinkedList? Or something else?
Criteria to consider are:
Memory usage
Performance
Operations which I need are:
Add element to collection
Iterate through the elements
Any thoughts?
Update: my choice is : ArrayList :) Basing on this discussion as well as the following ones:
When to use LinkedList over ArrayList?
List implementations: does LinkedList really perform so poorly vs. ArrayList and TreeList?
I always default to ArrayList, and would in your case as well, except when
I need thread safety (in which case I start looking at List implementations in java.util.concurrent)
I know I'm going to be doing lots of insertion and manipulation to the List or profiling reveals my usage of an ArrayList to be a problem (very rare)
As to what to pick in that second case, this SO.com thread has some useful insights: List implementations: does LinkedList really perform so poorly vs. ArrayList and TreeList?
I know I'm late but, maybe, this page can help you, not only now, but in the future...
Linked list is faster for adding/removing inside elements (ie not head or tail)
Arraylist is faster for iterating
It's a classic tradeoff between insert vs. retrieve optimization. Common choice for the task as you describe it is the ArrayList.
ArrayList is fine for your (and most other) purposes. It has a very small memory overhead and has good amortized performance for most operations. The cases where it is not ideal are relatively rare:
The list ist very large
You frequently need to do one of these operations:
Add/remove items during iteration
Remove items from the beginning of the list
If you're only adding at the end of the list, ArrayList should be ok. From the documentation of ArrayList:
The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost
and ArrayList should also use less memory than a linked list as you don't need to use space for the links.
It depends on your usage profile.
Do you add to the end of the list? Both are fine for this.
Do you add to the start of the list? LinkedList is better for this.
Do you require random access (will you ever call get(n) on it)? ArrayList is better for this.
Both are good at iterating, both Iterator implementations are O(1) for next().
If in doubt, test your own app with each implementation and make your own choice.
Given your criteria, you should be using the LinkedList.
LinkedList implements the Deque interface which means that it can add to the start or end of the list in constant time (1). In addition, both the ArrayList and LinkedList will iterate in (N) time.
You should NOT use the ArrayList simply because the cost of adding an element when the list is full. In this case, adding the element would be (N) because of the new array being created and copying all elements from one array to the other.
Also, the ArrayList will take up more memory because the size of your backing array might not be completely filled.

searching an unorder list without converting it to an array

Is there a way to first sort then search for an objects within a linked list of objects.
I thought just to you one of the sorting way and a binary search what do you think?
Thanks
This is not a good approach, IMO. If you use Collections.sort(list), where the list is a LinkedList, this copies the list to a temporary array, sorts it, and then copies back to the list' i.e. O(NlogN) to sort plus 2 * O(N) copies. But when you then try to do an binary search (e.g. using Collections.binarySearch(list), each search will do O(N) list traversal operations. So you may as well have not bothered sorting the list!
Another approach would be to convert the list to an array or an ArrayList, and then sort and search that array / ArrayList. That gives one copy plus one sort to setup, and O(logN) for each search.
But neither of these is the best approach. That depends on how many times you need to perform search operations.
If you simply want to do one search on the list, then calling list.contains(...) is O(N) ... and that is better than anything involving sorting and binary searching.
If you want to do multiple searches on a list that never changes, you're probably better off putting the list entries into a HashSet. Constructing a HashSet is O(N) and searching is O(1). (This assumes you don't need your own comparator.)
If you want to do multiple searches on a list that keeps changing where the order IS NOT significant, replace the list with a HashSet. The incremental cost of updating the HashSet will be O(1) for each addition/removal, and O(1) for each search.
If you want to do multiple searches on a list that keeps changing and the order IS significant, replace the list with an insertion-ordered LinkedHashMap. That will be O(1) for each addition/removal, and O(1) for each search ... but with large constants of proportionality than for a HashSet.
java.util.Collections#sort()
java.util.Collections#binarySearch()
The Collections class has lots of other amazing methods to make programmers life easier.
Note that the sort method's implementation will indeed convert the list to array, but from you need not explicitly convert the list in to array before calling the method:)
You may want to question if searching over a sorted list is the best option for your use-case as this does not perform well. The list sort is O(NlogN) and the binary search is O(logN). You might consider making a Set out of your list elements and then searching that via the contains method, which is O(1), if you just want to see if an element exists. It would be much easier to give you some advice on what collection you might consider if you could explain more about your use-case.
EDIT: Consider performance issues of List sorting if you plan to do this for large lists.

Categories

Resources