Java: SortedSet "cursor"-style iterator

Java: SortedSet "cursor"-style iterator - java

I need to iterate both forwards and backwards in a sorted set. If I use NavigableSet, I get a strictly-forward iterator and a strictly-backward iterator (iterator() and descendingIterator()) but none that can move forward and backward.
What's the time complexity of NavigableSet.lower() and higher() ? I can use those instead, but am reluctant to do so if they are inefficient.

Depending on your exact needs you could convert the sorted set to a list, say an array list, and use a list iterator for traversal. It can be used in both directions via the next() and previous() methods, which may be mixed freely.

There are only two implementations of the NavigableSet. Saying you opted for the TreeSet, while I don't have the source handy, the Javadoc says that it is based on a TreeMap providing O(log(n)) for get/put/containsKey/remove. At worst this would perform one get to find the value of we're finding the lower/higher for, plus an additional search to get the next/previous value, providing O(2log(n)) = O(log(n)).
Trees are worst case O(n) for search in the event it is actually a list, but in general, O(height).

Related

Why not use ListIterator for full LinkedList Operation?

My main question is if ListIterator or Iterator class reduces the time taken for removal of the elements from a given LinkedList and the same can be said while adding elements in the same given LinkedList using any one of the following classes above. What's the point of using the inbuilt functions of LinkedList class itself? Why should we perform any of the operations through LinkedList functions when we can use the ListIterator functions for better performance?

A ListIterator can indeed efficiently remove the node on which it is positioned. You can thus create a ListIterator, use next() two times to move the cursor, and then remove the node instantly. But evidently you did a lot of work before the actual removal.
Using ListIterator.remove is not more efficient "time complexity"-wise than removing through the LinkedList.remove(int index) if you need to construct the iterator. The LinkedList.remove method takes O(k) time, with k the index of the item you wish to remove. Removing this element with the ListIterator has the same timecomplexity since: (a) we create a ListIterator in constant time; (b) we call .next() k times, each operation in O(1); and (c) we call .remove() which is again O(1). But since we call .next() k times, this is thus an O(k) operation as well.
A similar situation happens for .add(..) on an arbitrary location (an "insert"), except that we here of course insert a node, not remove one.
Now since the two have the same time complexity, one might wonder why a LinkedList has such remove(int index) objects in the first place. The main reason is programmer's convenience. It is more convenient to call mylist.remove(5), than to create an iterator, use a loop to move five places, and then call remove. Furthermore the methods on a linked list guard against some edge-cases like a negative index, etc. By doing this manually you might end removing the first element, which might not be the intended behaviour. Finally code written is sometimes read multiple times. If a future reader reads mylist.remove(5), they understand that it removes the fifth element, wheres a solution with looping will require some extra brain cycles to understand what that part is doing.
As #Andreas says, furthermore the List interface defines these methods, and hence the LinkedList<T> should implement these.

Why don't we count linear search cost as a prerequisite bottleneck for the insertion operation of a linked list, compared to ArrayList?

I have had this question for a while but I have been unsatisfied with the answers because the distinctions appear to be arbitrary and more like conventional wisdom that is sort of blindly accepted rather than assessed critically.
In an ArrayList it is said that insertion cost (for a single element) is linear. If we are inserting at index p for 0 <= p < n where n is the size of the list, then the remaining n-p elements are shifted over first before the new element is copied into position p.
In a LinkedList, it is said that insertion cost (for a single element) is constant. For instance if we already have a node and we want to insert after it, we rearrange some pointers and it's done quickly. But getting this node in the first place, I don't see how it can be done other than a linear search first (assuming it isn't a trivial case like prepending at the start of the list or appending at the end).
And yet in the case of the LinkedList, we don't count that initial search time. To me this is confusing because it's sort of like saying "The ice cream is free... after you pay for it." It's like, well, of course it is... but that sort of skips the hard part of paying for it. Of course inserting in a LinkedList is going to be constant time if you already have the node you want, but getting that node in the first place may take some extra time! I could easily say that inserting in an ArrayList is constant time... after I move the remaining n-p elements.
So I don't understand why this distinction is made for one but not the other. You could argue that insertion is considered constant for LinkedLists because of the cases where you insert at the front or back where linear time operations are not required, whereas in an ArrayList, insertion requires copying of the suffix array after position p, but I could easily counter that by saying if we insert at the back of an ArrayList, it is amortized constant time and doesn't require extra copying in most cases unless we reach capacity.
In other words we separate the linear stuff from the constant stuff for LinkedList, but we don't separate them for the ArrayList, even though in both cases, the linear operations may not be invoked or not invoked.
So why do we consider them separate for LinkedList and not for ArrayList? Or are they only being defined here in the context where LinkedList is overwhelmingly used for head/tail appends and prepends as opposed to elements in the middle?

This is basically a limitation of the Java interface for List and LinkedList, rather than a fundamental limitation of linked lists. That is, in Java there is no convenient concept of "a pointer to a list node".
Every type of list has a few different concepts loosely associated with the idea of pointing to a particular item:
The idea of a "reference" to a specific item in a list
The integer position of an item in the list
The value of a item that may be in the list (possibly multiple times)
The most general concept is the first one, and is usually encapsulated in the idea of an iterator. As it happens, the simple way to implement an iterator for an array backed list is simply to wrap an integer which refers to the position of the item in a list. So for array lists only, the first and second ways of referring to items are pretty tightly bound.
For other list types, however, and even for most other container types (trees, hashes, etc) that is not the case. The generic reference to an item is usually something like a pointer to the wrapper structure around one item (e.g., HashMap.Entry or LinkedList.Entry). For these structures the idea of accessing the nth element isn't necessary natural or even possible (e.g., unordered collections like sets and many hash maps).
Perhaps unfortunately, Java made the idea of getting an item by its index a first-class operation. Many of the operations directly on List objects are implemented in terms of list indexes: remove(int index), add(int index, ...), get(int index), etc. So it's kind of natural to think of those operations as being the fundamental ones.
For LinkedList though it's more fundamental to use a pointer to a node to refer to an object. Rather than passing around a list index, you'd pass around the pointer. After inserting an element, you'd get a pointer to the element.
In C++ this concept is embodied in the concept of the iterator, which is the first class way to refer to items in collections, including lists. So does such a "pointer" exist in Java? It sure does - it's the Iterator object! Usually you think of an Iterator as being for iteration, but you can also think of it as pointing to a particular object.
So the key observation is: given an pointer (iterator) to an object, you can remove and add from linked lists in constant time, but from an array-like list this takes linear time in general. There is no inherent need to search for an object before deleting it: there are plenty of scenarios where you can maintain or take as input such a reference, or where you are processing the entire list, and here the constant time deletion of linked lists does change the algorithmic complexity.
Of course, if you need to do something like delete the first entry containing the value "foo" that implies both a search and a delete operation. Both array-based and linked lists taken O(n) for search, so they don't vary here - but you can meaningfully separate the search and delete operations.
So you could, in principle, pass around Iterator objects rather than list indexes or object values - at least if your use case supports it. However, at the top I said that "Java has no convenient notion of a pointer to a list node". Why?
Well because actually using Iterator is actually very inconvenient. First of all, it's tough to get an Iterator to an object in the first place: for example, and unlike C++, the add() methods don't return an Iterator - so to get a pointer to the item you just added, you need to go ahead and iterate over the list or use the listIterator(int index) call, which is inherently inefficient for linked lists. Many methods (e.g., subList()) support only a version that takes indexes, but not Iterators - even when such a method could be efficiently supported.
Add to that the restrictions around iterator invalidation when the list is modified, and they actually become pretty useless for referring to elements except in immutable lists.
So Java's support of pointers to list elements is pretty half-hearted an so it's tough to leverage the constant time operations that linked list offers, except in cases such as adding to the front of a list, or deleting items during iteration.
It's not limited to lists, either - the ConcurrentQueue is also a linked structure which supports constant time deletes, but you can't reliably use that ability from Java.

If you're using a LinkedList, chances are you're not going to use it for a random access insert. LinkedList offers constant time for push (insert at the beginning) or add (because it has a ref to the final element IIRC). You are correct in your suspicion that an insert into a random index (e.g. insert sorted) will take linear time - not constant.
ArrayList, by contrast, is worst case linear. Most of the time it simply does an arraycopy to shift the indices (which is a low-level shift that is constant time). Only when you need to resize the backing array will it take linear time.

How to efficiently remove an element from java LinkedList

I have an algorithm where I pass through nodes in a graph in a certain way, occasionally passing through the same node several times, and I need to form a list of the nodes passed, such that a node appears once for the last time I passed it.
For instance, if I passed through nodes A -> B -> C -> A -> C, the list I need in the end is [B, A, C].
What I wanted to do was to use a LinkedList, such that every node in the graph will contain a reference to its node in the LinkedList. Then, every time I pass through a node, I will remove its corresponding node from the LinkedList and insert it again into the end of the LinkedList, and the complexity of the operation will only be O(1).
However, when I began implementing this, I ran into a problem: apparently, the java class LinkedList does not allow me to see its actual list nodes. Using the regular remove functions of LinkedList to remove the list node containing a given graph node will be O(n) instead O(1), negating the whole point of using a LinkedList to begin with.
Naturally, I can implement LinkedList myself, but I would rather avoid that - it seems to me that if I have to implement LinkedList in java, I'm doing something wrong.
So, is there a way to solve this problem without implementing LinkedList on my own? Is there something that I'm missing?

As it seems, you are expecting a built-in approach, i don't think there is any Collection which provides such functionality. You will have to implement it on your own as #MartijinCourteaux suggested. Or:
use Sorted Set collection like TreeSet<E> with supporting cost of O(log n) for operations: add, remove and contains.
LinkedHashSet<E> But beware unlike HashSet<E>, LinkedHashSet can have O(1) expected performance for operations: add, contains, remove but the performance is likely to be just slightly below that of HashSet, due to the added expense of maintaining the linked list. But we can use it without incurring the increased cost associated with TreeSet. However, insertion order is not affected if an element is re-inserted into the set so try removing the first insertion of an element before re-inserting it.

LinkedHashMap keeps order of entered values and allows remove node by its key and then put back to the end. I think that it is all you need.

Unless your linked list is large just using a regular array list will give fast performance even with the shuffling. You should also consider using hash sets, if order is not important, linked hash set if the order of insert matters, or tree set if you want it sorted. They don't allow duplicate values but have good O performance for insert, delete and contains.

How to find the rank of an element in a TreeSet

I know you can find the first and last elements in a treeset. What if I wanted to know what the second or third element was without iterating? Or, more preferable, given an element, figure out it's rank in the treeset.
Thanks
EDIT: I think you can do it using tailset, ie. compare the size of the original set with the size of the tailset. How efficient is tailset?

TreeSet does not provide an efficient rank method. I suspect (you can confirm by looking at its source) that TreeSet does not even maintain any extra bits of information (i.e. counts of elements on the left and right subtrees of each node) that one would need to perform such queries in O(log(n)) time. So there does not appear to be any fast method of finding the rank of an element of TreeSet.
If you really really need it, you can either implement your own SortedSet with a balanced binary search tree which allows such queries or modify the TreeSet implementation to create a new implementation which is augmented to allow such queries. Refer to the chapter on augmenting data structures in CLRS for more details about how this can actually be done.

According to the source of the Sun JDK6 implementation, tailSet(E).size() iterates over the tail set and counts the elements, so this call is O(tail set size).

There is no other way than Iterator.
Edited:
Try this:
treeSet.higher(treeSet.first());
This should give second element on TreeSet. I'm not sure if this is more optimized then just using Iterator.

When do you know when to use a TreeSet or LinkedList?

What are the advantages of each structure?
In my program I will be performing these steps and I was wondering which data structure above I should be using:
Taking in an unsorted array and
adding them to a sorted structure1.
Traversing through sorted data and removing the right one
Adding data (never removing) and returning that structure as an array

When do you know when to use a TreeSet or LinkedList? What are the advantages of each structure?
In general, you decide on a collection type based on the structural and performance properties that you need it to have. For instance, a TreeSet is a Set, and therefore does not allow duplicates and does not preserve insertion order of elements. By contrast a LinkedList is a List and therefore does allow duplicates and does preserve insertion order. On the performance side, TreeSet gives you O(logN) insertion and deletion, whereas LinkedList gives O(1) insertion at the beginning or end, and O(N) insertion at a selected position or deletion.
The details are all spelled out in the respective class and interface javadocs, but a useful summary may be found in the Java Collections Cheatsheet.
In practice though, the choice of collection type is intimately connected to algorithm design. The two need to be done in parallel. (It is no good deciding that your algorithm requires a collection with properties X, Y and Z, and then discovering that no such collection type exists.)
In your use-case, it looks like TreeSet would be a better fit. There is no efficient way (i.e. better than O(N^2)) to sort a large LinkedList that doesn't involve turning it into some other data structure to do the sorting. There is no efficient way (i.e. better than O(N)) to insert an element into the correct position in a previously sorted LinkedList. The third part (copying to an array) works equally well with a LinkedList or TreeSet; it is an O(N) operation in both cases.
[I'm assuming that the collections are large enough that the big O complexity predicts the actual performance accurately ... ]

The genuine power and advantage of TreeSet lies in interface it realizes - NavigableSet
Why is it so powerfull and in which case?
Navigable Set interface add for example these 3 nice methods:
headSet(E toElement, boolean inclusive)
tailSet(E fromElement, boolean inclusive)
subSet(E fromElement, boolean fromInclusive, E toElement, boolean toInclusive)
These methods allow to organize effective search algorithm(very fast).
Example: we need to find all the names which start with Milla and end with Wladimir:
TreeSet<String> authors = new TreeSet<String>();
authors.add("Andreas Gryphius");
authors.add("Fjodor Michailowitsch Dostojewski");
authors.add("Alexander Puschkin");
authors.add("Ruslana Lyzhichko");
authors.add("Wladimir Klitschko");
authors.add("Andrij Schewtschenko");
authors.add("Wayne Gretzky");
authors.add("Johann Jakob Christoffel");
authors.add("Milla Jovovich");
authors.add("Taras Schewtschenko");
System.out.println(authors.subSet("Milla", "Wladimir"));
output:
[Milla Jovovich, Ruslana Lyzhichko, Taras Schewtschenko, Wayne Gretzky]
TreeSet doesn't go over all the elements, it finds first and last elemenets and returns a new Collection with all the elements in the range.

TreeSet:
TreeSet uses Red-Black tree underlying. So the set could be thought as a dynamic search tree. When you need a structure which is operated read/write frequently and also should keep order, the TreeSet is a good choice.

If you want to keep it sorted and it's append-mostly, TreeSet with a Comparator is your best bet. The JVM would have to traverse the LinkedList from the beginning to decide where to place an item. LinkedList = O(n) for any operations, TreeSet = O(log(n)) for basic stuff.

The most important point when choosing a data structure are its inherent limitations. For example if you use TreeSet to store objects and during run-time your algorithm changes attributes of these objects which affect equal comparisons while the object is an element of the set, get ready for some strange bugs.
The Java Doc for Set interface state that:
Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.
Interface Set Java Doc

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.