I know you can find the first and last elements in a TreeSet. What if I wanted to know what the second or third element was without iterating? Or, preferably, given an element, figure out its rank in the TreeSet.
Thanks
EDIT: I think you can do it using tailSet, i.e. compare the size of the original set with the size of the tail set. How efficient is tailSet?
TreeSet does not provide an efficient rank method. I suspect (you can confirm by looking at its source) that TreeSet does not even maintain the extra information (e.g. counts of elements in the left and right subtrees of each node) that one would need to answer such queries in O(log(n)) time. So there does not appear to be any fast way of finding the rank of an element of a TreeSet.
If you really really need it, you can either implement your own SortedSet with a balanced binary search tree which allows such queries or modify the TreeSet implementation to create a new implementation which is augmented to allow such queries. Refer to the chapter on augmenting data structures in CLRS for more details about how this can actually be done.
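To make the augmentation idea concrete, here is a minimal sketch (OrderStatisticTree is a hypothetical class, not a JDK type) in which each node also stores the size of its subtree, so rank and select need only one root-to-leaf walk. It deliberately omits rebalancing to stay short, so it is only O(log n) while the tree happens to stay balanced; a real version would use red-black (or similar) rotations that also update the size fields, as CLRS describes.

```java
/**
 * Order-statistic tree sketch: a binary search tree whose nodes also store
 * the size of their subtree. Rebalancing is deliberately omitted, so every
 * operation is O(height), which is O(log n) only if the tree stays balanced.
 */
class OrderStatisticTree<E extends Comparable<E>> {
    private static final class Node<T> {
        final T key;
        Node<T> left, right;
        int size = 1; // number of nodes in the subtree rooted here
        Node(T key) { this.key = key; }
    }

    private Node<E> root;

    private static int size(Node<?> n) { return n == null ? 0 : n.size; }

    int size() { return size(root); }

    boolean contains(E key) {
        Node<E> n = root;
        while (n != null) {
            int c = key.compareTo(n.key);
            if (c == 0) return true;
            n = c < 0 ? n.left : n.right;
        }
        return false;
    }

    /** Inserts key with set semantics; returns false if it was already present. */
    boolean add(E key) {
        if (contains(key)) return false; // keeps the size counts exact
        root = insert(root, key);
        return true;
    }

    private Node<E> insert(Node<E> n, E key) {
        if (n == null) return new Node<>(key);
        if (key.compareTo(n.key) < 0) n.left = insert(n.left, key);
        else n.right = insert(n.right, key);
        n.size++; // the new node lies somewhere below n
        return n;
    }

    /** Number of stored elements strictly smaller than key (key need not be present). */
    int rank(E key) {
        int r = 0;
        Node<E> n = root;
        while (n != null) {
            int c = key.compareTo(n.key);
            if (c < 0) {
                n = n.left;
            } else if (c == 0) {
                return r + size(n.left);
            } else {
                r += size(n.left) + 1; // everything in n.left, plus n itself, is smaller
                n = n.right;
            }
        }
        return r;
    }

    /** The k-th smallest element, 0-based (so select(1) is the second element). */
    E select(int k) {
        if (k < 0 || k >= size()) throw new IndexOutOfBoundsException("k = " + k);
        Node<E> n = root;
        while (true) {
            int leftSize = size(n.left);
            if (k < leftSize) {
                n = n.left;
            } else if (k == leftSize) {
                return n.key;
            } else {
                k -= leftSize + 1; // skip the left subtree and n
                n = n.right;
            }
        }
    }

    public static void main(String[] args) {
        OrderStatisticTree<Integer> t = new OrderStatisticTree<>();
        for (int x : new int[] {50, 20, 70, 10, 30, 60}) t.add(x);
        System.out.println(t.rank(30));  // 2: only 10 and 20 are smaller
        System.out.println(t.select(2)); // 30, the third-smallest element
    }
}
```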
According to the source of the Sun JDK6 implementation, tailSet(E).size() iterates over the tail set and counts the elements, so this call is O(tail set size).
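The tailSet idea from the question can be sketched like this, assuming a TreeSet<Integer> and an element that is actually present; rank is a hypothetical helper name. headSet(e).size() would give the same number by counting the elements before e instead. Either way, per the answer above, the size() call counts by iterating, so this is not an O(log n) rank query.

```java
import java.util.List;
import java.util.TreeSet;

class TailSetRank {
    /**
     * 0-based position of e in set, computed as the question suggests:
     * whole-set size minus tail-set size. Assumes e is in the set.
     * tailSet(e).size() walks the tail, so the cost grows with the tail size.
     */
    static <E> int rank(TreeSet<E> set, E e) {
        return set.size() - set.tailSet(e).size(); // tailSet(e) includes e itself
    }

    public static void main(String[] args) {
        TreeSet<Integer> set = new TreeSet<>(List.of(40, 10, 30, 20));
        System.out.println(rank(set, 30)); // 2: 10 and 20 come before 30
    }
}
```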
There is no other way than Iterator.
Edited:
Try this:
treeSet.higher(treeSet.first());
This should give the second element of the TreeSet. I'm not sure whether this is more efficient than just using an Iterator.
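For comparison, a sketch of both options on a TreeSet<String>; the helper names are hypothetical. higher(first()) costs two O(log n) lookups, while the iterator version walks k successors, so neither becomes a general O(log n) lookup for arbitrary positions.

```java
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

class NthElement {
    /** Second-smallest element: first() and higher() are each one tree search. */
    static <E> E secondViaHigher(TreeSet<E> set) {
        return set.higher(set.first()); // null if the set has only one element
    }

    /** k-th smallest element (0-based) by stepping an iterator k times. */
    static <E> E nthViaIterator(TreeSet<E> set, int k) {
        Iterator<E> it = set.iterator();
        for (int i = 0; i < k; i++) it.next(); // skip the first k elements
        return it.next(); // throws NoSuchElementException if k >= size
    }

    public static void main(String[] args) {
        TreeSet<String> set = new TreeSet<>(List.of("d", "a", "c", "b"));
        System.out.println(secondViaHigher(set));   // b
        System.out.println(nthViaIterator(set, 2)); // c, the third element
    }
}
```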
I have an algorithm where I pass through nodes in a graph in a certain way, occasionally passing through the same node several times, and I need to form a list of the nodes passed, such that a node appears once for the last time I passed it.
For instance, if I passed through nodes A -> B -> C -> A -> C, the list I need in the end is [B, A, C].
What I wanted to do was to use a LinkedList, such that every node in the graph would contain a reference to its node in the LinkedList. Then, every time I pass through a node, I would remove its corresponding node from the LinkedList and insert it again at the end of the LinkedList, so each such operation would be only O(1).
However, when I began implementing this, I ran into a problem: apparently, the Java class LinkedList does not let me see its actual list nodes. Using the regular remove methods of LinkedList to remove the list node containing a given graph node will be O(n) instead of O(1), negating the whole point of using a LinkedList to begin with.
Naturally, I can implement LinkedList myself, but I would rather avoid that. It seems to me that if I have to implement LinkedList in Java, I'm doing something wrong.
So, is there a way to solve this problem without implementing LinkedList on my own? Is there something that I'm missing?
It seems you are expecting a built-in approach; I don't think there is any Collection that provides such functionality. You will have to implement it on your own, as @MartijinCourteaux suggested. Or:
Use a sorted Set collection such as TreeSet<E>, which costs O(log n) for add, remove and contains.
Use LinkedHashSet<E>. Like HashSet<E>, it offers O(1) expected performance for add, contains and remove, although its performance is likely to be slightly below that of HashSet due to the added expense of maintaining the linked list; you still avoid the higher cost associated with TreeSet. Beware, however, that insertion order is not affected if an element is re-inserted into the set, so remove the element first and then re-insert it.
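The remove-then-add pattern from the LinkedHashSet suggestion can be sketched as follows, using strings as stand-ins for graph nodes (an assumption; real node objects need consistent equals() and hashCode()). lastVisitOrder is a hypothetical helper name. The walk A -> B -> C -> A -> C from the question yields [B, A, C].

```java
import java.util.LinkedHashSet;
import java.util.List;

class LastVisitOrder {
    /**
     * Returns each node once, ordered by the time it was last visited.
     * Re-adding an existing element does NOT move it in a LinkedHashSet,
     * so it is removed first and then appended again (both expected O(1)).
     */
    static <T> List<T> lastVisitOrder(Iterable<T> visits) {
        LinkedHashSet<T> order = new LinkedHashSet<>();
        for (T node : visits) {
            order.remove(node); // no-op if the node has not been visited yet
            order.add(node);    // (re-)append at the end
        }
        return List.copyOf(order); // preserves the set's iteration order
    }

    public static void main(String[] args) {
        System.out.println(lastVisitOrder(List.of("A", "B", "C", "A", "C"))); // [B, A, C]
    }
}
```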
LinkedHashMap keeps the order in which entries were inserted and lets you remove an entry by its key and then put it back at the end. I think that is all you need.
Unless your linked list is large, just using a regular ArrayList will give fast performance, even with the shuffling. You should also consider using a HashSet if order is not important, a LinkedHashSet if insertion order matters, or a TreeSet if you want the elements sorted. They don't allow duplicate values but have good big-O performance for insert, delete and contains.
What are the advantages of each structure?
In my program I will be performing these steps and I was wondering which data structure above I should be using:
1. Taking in an unsorted array and adding its elements to a sorted structure.
2. Traversing through the sorted data and removing the right one.
3. Adding data (never removing) and returning that structure as an array.
How do you decide when to use a TreeSet or a LinkedList? What are the advantages of each structure?
In general, you decide on a collection type based on the structural and performance properties that you need it to have. For instance, a TreeSet is a Set, and therefore does not allow duplicates and does not preserve insertion order of elements. By contrast, a LinkedList is a List and therefore does allow duplicates and does preserve insertion order. On the performance side, TreeSet gives you O(logN) insertion and deletion, whereas LinkedList gives O(1) insertion at the beginning or end, and O(N) insertion or deletion at a selected position.
The details are all spelled out in the respective class and interface javadocs, but a useful summary may be found in the Java Collections Cheatsheet.
In practice though, the choice of collection type is intimately connected to algorithm design. The two need to be done in parallel. (It is no good deciding that your algorithm requires a collection with properties X, Y and Z, and then discovering that no such collection type exists.)
In your use-case, it looks like TreeSet would be a better fit. There is no efficient way (i.e. better than O(N^2)) to sort a large LinkedList that doesn't involve turning it into some other data structure to do the sorting. There is no efficient way (i.e. better than O(N)) to insert an element into the correct position in a previously sorted LinkedList. The third part (copying to an array) works equally well with a LinkedList or TreeSet; it is an O(N) operation in both cases.
[I'm assuming that the collections are large enough that the big O complexity predicts the actual performance accurately ... ]
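Applied to the three steps in the question, a TreeSet-based version might look like the sketch below. It assumes Integer data, that the element to remove in step 2 is already known (the question doesn't say how "the right one" is picked), and that dropping duplicate values is acceptable, since a TreeSet keeps only one copy; process is a hypothetical name.

```java
import java.util.Arrays;
import java.util.TreeSet;

class SortedWorkflow {
    static Integer[] process(Integer[] unsorted, Integer toRemove, Integer toAdd) {
        // Step 1: load the unsorted array; the TreeSet keeps it sorted (O(log N) per element).
        TreeSet<Integer> sorted = new TreeSet<>(Arrays.asList(unsorted));
        // Step 2: remove a known element directly, O(log N), no manual traversal.
        sorted.remove(toRemove);
        // Step 3: add more data, then copy out as an array, O(N) for the copy.
        sorted.add(toAdd);
        return sorted.toArray(new Integer[0]);
    }

    public static void main(String[] args) {
        Integer[] result = process(new Integer[] {42, 7, 19, 3, 25}, 19, 11);
        System.out.println(Arrays.toString(result)); // [3, 7, 11, 25, 42]
    }
}
```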
The genuine power and advantage of TreeSet lies in the interface it implements: NavigableSet.
Why is it so powerful, and in which cases?
The NavigableSet interface adds, for example, these three useful methods:
headSet(E toElement, boolean inclusive)
tailSet(E fromElement, boolean inclusive)
subSet(E fromElement, boolean fromInclusive, E toElement, boolean toInclusive)
These methods make efficient (very fast) range searches possible.
Example: we need to find all the names that sort between "Milla" (inclusive) and "Wladimir" (exclusive):
TreeSet<String> authors = new TreeSet<String>();
authors.add("Andreas Gryphius");
authors.add("Fjodor Michailowitsch Dostojewski");
authors.add("Alexander Puschkin");
authors.add("Ruslana Lyzhichko");
authors.add("Wladimir Klitschko");
authors.add("Andrij Schewtschenko");
authors.add("Wayne Gretzky");
authors.add("Johann Jakob Christoffel");
authors.add("Milla Jovovich");
authors.add("Taras Schewtschenko");
System.out.println(authors.subSet("Milla", "Wladimir"));
output:
[Milla Jovovich, Ruslana Lyzhichko, Taras Schewtschenko, Wayne Gretzky]
TreeSet doesn't go over all the elements: it locates the two range boundaries in the tree and returns a view, backed by the original set, containing all the elements in that range.
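Because subSet returns a live view rather than a copy, later changes to the backing set show up in it. A small sketch using the inclusive NavigableSet overload and a few of the names above (between is a hypothetical helper):

```java
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class SubSetView {
    /** Elements from 'from' to 'to', with both bounds inclusive. */
    static NavigableSet<String> between(TreeSet<String> names, String from, String to) {
        return names.subSet(from, true, to, true); // a view, not a copy
    }

    public static void main(String[] args) {
        TreeSet<String> names = new TreeSet<>(List.of(
                "Milla Jovovich", "Ruslana Lyzhichko", "Wayne Gretzky", "Wladimir Klitschko"));
        NavigableSet<String> range = between(names, "Milla", "Wladimir Klitschko");
        System.out.println(range); // [Milla Jovovich, Ruslana Lyzhichko, Wayne Gretzky, Wladimir Klitschko]

        names.add("Taras Schewtschenko"); // modify the backing set...
        System.out.println(range);        // ...and the view reflects the new element
    }
}
```

Per the NavigableSet Javadoc, trying to insert an element outside the range through the view itself throws IllegalArgumentException.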
TreeSet:
TreeSet uses a red-black tree underneath (it is backed by a TreeMap). So the set can be thought of as a dynamic search tree. When you need a structure that is read and written frequently and must also keep its elements ordered, TreeSet is a good choice.
If you want to keep it sorted and it's append-mostly, a TreeSet with a Comparator is your best bet. With a LinkedList, your code would have to traverse the list from the beginning to decide where to place each item. LinkedList = O(n) for sorted insertion and search; TreeSet = O(log(n)) for the basic operations.
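A sketch of a TreeSet with a custom Comparator, assuming the goal is to keep words ordered by length and then alphabetically (an arbitrary example ordering; sortedWords is a hypothetical name). Worth knowing: the TreeSet treats two elements as duplicates whenever the comparator returns 0, so the tie-breaker matters.

```java
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class ComparatorTreeSet {
    /** Orders words by length, breaking ties alphabetically. */
    static final Comparator<String> BY_LENGTH_THEN_ALPHA =
            Comparator.comparingInt(String::length).thenComparing(Comparator.naturalOrder());

    static TreeSet<String> sortedWords(List<String> words) {
        TreeSet<String> set = new TreeSet<>(BY_LENGTH_THEN_ALPHA);
        set.addAll(words); // each add finds its slot by tree search, no list scan
        return set;
    }

    public static void main(String[] args) {
        System.out.println(sortedWords(List.of("pear", "fig", "banana", "kiwi", "apple")));
        // [fig, kiwi, pear, apple, banana]
    }
}
```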
The most important point when choosing a data structure is its inherent limitations. For example, if you use a TreeSet to store objects and, at run time, your algorithm changes attributes of these objects that affect comparisons while the object is an element of the set, get ready for some strange bugs.
The Javadoc for the Set interface states:
Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.
Interface Set Java Doc
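The kind of strange bug meant here is easy to reproduce. In this sketch, Item is a hypothetical, deliberately mutable class ordered by its value field; for a TreeSet the relevant comparison is compareTo (or the Comparator), not equals. After the mutation the element is still stored in the set, but a search for it goes down the wrong branch.

```java
import java.util.TreeSet;

class MutableKeyPitfall {
    /** Deliberately mutable element, ordered by its 'value' field. */
    static final class Item implements Comparable<Item> {
        int value;
        Item(int value) { this.value = value; }
        @Override public int compareTo(Item o) { return Integer.compare(value, o.value); }
    }

    public static void main(String[] args) {
        TreeSet<Item> set = new TreeSet<>();
        Item a = new Item(1);
        set.add(a);
        set.add(new Item(5));
        set.add(new Item(9));
        System.out.println(set.contains(a)); // true

        a.value = 10; // mutate while in the set: a is still stored where 1 belonged
        System.out.println(set.contains(a)); // false: the search now looks to the right
        System.out.println(set.size());      // 3: the element was never removed
    }
}
```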
I need to iterate both forwards and backwards in a sorted set. If I use NavigableSet, I get a strictly-forward iterator and a strictly-backward iterator (iterator() and descendingIterator()) but none that can move forward and backward.
What's the time complexity of NavigableSet.lower() and higher() ? I can use those instead, but am reluctant to do so if they are inefficient.
Depending on your exact needs you could convert the sorted set to a list, say an array list, and use a list iterator for traversal. It can be used in both directions via the next() and previous() methods, which may be mixed freely.
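A minimal sketch of the list-iterator approach, assuming a snapshot is acceptable (changes made to the set afterwards are not visible through the copy); bidirectional is a hypothetical helper name. Note the ListIterator convention: calling previous() right after next() returns the same element again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;
import java.util.SortedSet;
import java.util.TreeSet;

class BidirectionalTraversal {
    /** Copies the sorted set into an ArrayList (O(n)) and iterates over the copy. */
    static <E> ListIterator<E> bidirectional(SortedSet<E> set) {
        List<E> snapshot = new ArrayList<>(set); // elements arrive in sorted order
        return snapshot.listIterator();
    }

    public static void main(String[] args) {
        TreeSet<Integer> sorted = new TreeSet<>(List.of(30, 10, 20, 40));
        ListIterator<Integer> it = bidirectional(sorted);
        System.out.println(it.next());     // 10
        System.out.println(it.next());     // 20
        System.out.println(it.previous()); // 20 again: previous() returns the element just passed
        System.out.println(it.previous()); // 10
    }
}
```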
There are only two implementations of NavigableSet in the JDK (TreeSet and ConcurrentSkipListSet). Assuming you opted for TreeSet: while I don't have the source handy, the Javadoc says it is based on a TreeMap, which provides O(log(n)) for get/put/containsKey/remove. At worst, lower()/higher() would perform one search to locate the element whose lower/higher neighbour we want, plus an additional search to get the next/previous value, giving O(2log(n)) = O(log(n)).
A binary search tree is worst-case O(n) for search in the event that it degenerates into a list, but in general search is O(height). TreeSet's red-black tree is balanced, so its height is O(log n).
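If the traversal must stay live, the lower()/higher() approach can be wrapped in a small cursor. This is a sketch with a hypothetical Cursor class, assuming a TreeSet without null elements; each move is one O(log n) search, and unlike the ListIterator convention, previous() after next() moves to the neighbouring element.

```java
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class CursorByLookup {
    /** Moves both ways through a NavigableSet by re-searching the tree on each step. */
    static final class Cursor<E> {
        private final NavigableSet<E> set;
        private E current;

        Cursor(NavigableSet<E> set, E start) { this.set = set; this.current = start; }

        /** Next larger element, or null (staying put) if there is none. */
        E next() {
            E n = set.higher(current); // O(log n) on a TreeSet
            if (n != null) current = n;
            return n;
        }

        /** Next smaller element, or null (staying put) if there is none. */
        E previous() {
            E p = set.lower(current); // O(log n) on a TreeSet
            if (p != null) current = p;
            return p;
        }
    }

    public static void main(String[] args) {
        TreeSet<Integer> set = new TreeSet<>(List.of(10, 20, 30, 40));
        Cursor<Integer> c = new Cursor<>(set, set.first());
        System.out.println(c.next());     // 20
        System.out.println(c.next());     // 30
        System.out.println(c.previous()); // 20
    }
}
```

Because every step searches from the root, the cursor also sees elements added to or removed from the set after it was created.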
I'm working with a very large LinkedList of custom objects, and I'm trying to determine whether an object that I'm trying to add to the list is already in there.
The issue is that the item I am searching for is a unique object containing:
A 1st String
A 2nd String
A unique Count #
I'm trying to find out if there is an item in my linked list that contains the (1st String) and (2nd String), but ignore (the unique Count #).
This can be done the dumb way (the way I tried first) by going through each individual LinkedList item, but this takes way too long. I'm trying to speed it up! I figured using indexOf would help, but I don't know how I can customize what it is searching for.
Any ideas?
indexOf() has O(n) performance as well because it progressively scans the List until it finds the element you're looking for.
Is the list sorted? If so, you might be able to search for an element using something like binary search.
If you need constant-time access to arbitrary elements, I don't think a LinkedList is your best bet.
Do you NEED to use a LinkedList? If it's not legacy code, I would recommend either a HashSet or a LinkedHashMap. Both will give you expected constant-time lookup, and if you still need insertion-order iteration, LinkedHashMap maintains an internal doubly-linked list running through its entries.
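A sketch of the HashSet suggestion for the question's case, assuming the custom object (here a hypothetical CustomObject class) may define equality by its two strings alone. The key step is overriding equals() and hashCode() consistently so that the count is ignored; LinkedList.contains() and indexOf() would rely on the same equals(). If insertion order is also needed, a LinkedHashSet can be swapped in.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class CustomObject {
    final String first;
    final String second;
    final int count; // the unique count: deliberately ignored by equals/hashCode

    CustomObject(String first, String second, int count) {
        this.first = first;
        this.second = second;
        this.count = count;
    }

    // Equality is defined by the two strings only, so lookups ignore the count.
    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CustomObject)) return false;
        CustomObject other = (CustomObject) o;
        return Objects.equals(first, other.first) && Objects.equals(second, other.second);
    }

    // Must agree with equals(): hash only the fields that equals() compares.
    @Override public int hashCode() { return Objects.hash(first, second); }

    public static void main(String[] args) {
        Set<CustomObject> seen = new HashSet<>();
        seen.add(new CustomObject("foo", "bar", 1));
        // Same strings, different count: found in expected O(1), no list scan.
        System.out.println(seen.contains(new CustomObject("foo", "bar", 99))); // true
    }
}
```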
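Unfortunately, the "dumb way" is the most efficient way to do so, although you could use the following (note that contains() relies on equals(), so equals() must compare only the two strings):
if (linkedList.contains(objectThatMayBeInList)) { /* do something */ }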
The problem is that a LinkedList search is O(N), where N is the size of the list: on any given search you have a worst case of N comparisons. Linked lists are not the best data structure for that kind of operation, but at the same time it's not that bad, and it shouldn't be too slow; computers are good at doing that. Are there more specifics you can give us, such as the size of the list?
Basically you want to find out if object A exists in linked list L. This is the search problem, and if the list is unordered you cannot do it faster than O(n).
If you kept the list sorted (making insertion slower), you could do a binary search to see if A is in the list, which would be much faster.
Perhaps you could also keep a Map (HashMap or TreeMap for instance) in addition to the list, where you keep track of what stuff is in the list.
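Keeping a side index next to the list might look like this sketch. It uses a HashSet rather than a Map, keys it on List.of(first, second) so the count is ignored, and assumes both strings are non-null (List.of rejects nulls). IndexedList and Item are hypothetical names, and any code that removes from the list would also have to remove the matching key from the index.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

class IndexedList {
    /** Hypothetical stand-in for the question's custom object. */
    static final class Item {
        final String first, second;
        final int count;
        Item(String first, String second, int count) {
            this.first = first;
            this.second = second;
            this.count = count;
        }
    }

    private final LinkedList<Item> items = new LinkedList<>(); // keeps insertion order
    // Side index of (first, second) pairs; List.of gives value-based equals/hashCode.
    private final Set<List<String>> index = new HashSet<>();

    /** Appends item unless an item with the same two strings is already present. */
    boolean addIfAbsent(Item item) {
        if (!index.add(List.of(item.first, item.second))) return false; // expected O(1)
        items.add(item);
        return true;
    }

    boolean containsStrings(String first, String second) {
        return index.contains(List.of(first, second)); // no list scan
    }

    List<Item> items() { return Collections.unmodifiableList(items); }

    public static void main(String[] args) {
        IndexedList list = new IndexedList();
        list.addIfAbsent(new Item("foo", "bar", 1));
        System.out.println(list.addIfAbsent(new Item("foo", "bar", 2))); // false: same strings
        System.out.println(list.items().size());                          // 1
    }
}
```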
Is there a way to first sort and then search for an object within a linked list of objects?
I thought I would just use one of the sorting methods and then a binary search. What do you think?
Thanks
This is not a good approach, IMO. If you use Collections.sort(list), where the list is a LinkedList, this copies the list to a temporary array, sorts it, and then copies it back to the list, i.e. O(NlogN) to sort plus 2 * O(N) copies. But when you then try to do a binary search (e.g. using Collections.binarySearch(list, key)), each search will do O(N) list traversal operations. So you may as well not have bothered sorting the list!
Another approach would be to convert the list to an array or an ArrayList, and then sort and search that array / ArrayList. That gives one copy plus one sort to setup, and O(logN) for each search.
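A sketch of the convert-then-sort approach, assuming Integer elements: one O(N) copy into an ArrayList and one O(N log N) sort, after which each Collections.binarySearch is O(log N) because ArrayList supports random access. SortThenSearch, prepare and contains are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

class SortThenSearch {
    /** One-time setup: copy into an ArrayList, then sort the copy. */
    static List<Integer> prepare(LinkedList<Integer> source) {
        List<Integer> sorted = new ArrayList<>(source);
        Collections.sort(sorted);
        return sorted;
    }

    static boolean contains(List<Integer> sorted, int key) {
        // binarySearch returns the index if found, otherwise a negative insertion point.
        return Collections.binarySearch(sorted, key) >= 0;
    }

    public static void main(String[] args) {
        LinkedList<Integer> data = new LinkedList<>(List.of(42, 7, 19, 3, 25));
        List<Integer> sorted = prepare(data);
        System.out.println(contains(sorted, 19)); // true
        System.out.println(contains(sorted, 20)); // false
    }
}
```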
But neither of these is necessarily the best approach; that depends on how many times you need to perform search operations.
If you simply want to do one search on the list, then calling list.contains(...) is O(N) ... and that is better than anything involving sorting and binary searching.
If you want to do multiple searches on a list that never changes, you're probably better off putting the list entries into a HashSet. Constructing a HashSet is O(N) and searching is O(1). (This assumes you don't need your own comparator.)
If you want to do multiple searches on a list that keeps changing where the order IS NOT significant, replace the list with a HashSet. The incremental cost of updating the HashSet will be O(1) for each addition/removal, and O(1) for each search.
If you want to do multiple searches on a list that keeps changing and the order IS significant, replace the list with an insertion-ordered LinkedHashMap. That will be O(1) for each addition/removal, and O(1) for each search ... but with larger constants of proportionality than for a HashSet.
java.util.Collections#sort()
java.util.Collections#binarySearch()
The Collections class has lots of other useful methods to make programmers' lives easier.
Note that the sort method's implementation will indeed convert the list to an array, but you need not explicitly convert the list into an array before calling the method :)
You may want to question whether searching over a sorted list is the best option for your use case, as this does not perform well. The list sort is O(NlogN) and the binary search is O(logN). You might consider making a Set out of your list elements and then searching it via the contains method, which is O(1) on average for a HashSet, if you just want to see whether an element exists. It would be much easier to give you advice on which collection to consider if you could explain more about your use case.
EDIT: Consider performance issues of List sorting if you plan to do this for large lists.