Seeking further understanding on Iterators in java

If I am using a for loop (the standard for loop, not an enhanced for statement), I fail to see how an iterator increases efficiency when searching through a collection. If I have a statement such as:
(Assuming that aList is a List of generic objects, type E, nextElement refers to the next element within the list)
for (int index = 0; index < aList.size(); index++) {
    E nextElement = aList.get(index);
    // do something with nextElement...
}
and I have the get method that looks something like:
Node<E> nodeRef = head;
for (int i = 0; i < index; i++) {
    nodeRef = nodeRef.next;
    // possible other code
}
this would essentially be searching through the List, one element at a time. However, if I use an iterator, will it not be doing the same operation? I know an iterator is supposed to be O(1) speed, but wouldn't it be O(n) if it has to search through the entire list anyway?

It's not primarily about efficiency, IMO. It's about abstraction. Using an index ties you to collections which can retrieve an item for a given index efficiently (so it won't work well with a linked list, say)... and it doesn't express what you're trying to do, which is iterate over the list.
With an iterator, you can express the idea of iterating over a sequence of items whether that sequence can easily be indexed or not, whether the size is known in advance or not, and even in cases where it's effectively infinite.
Your second case is still written using a for loop which increments an index, which isn't the idiomatic way of thinking about it - it should simply be testing whether or not it's reached the end. For example, it might be:
for (Node<E> nodeRef = head; nodeRef != null; nodeRef = nodeRef.next)
{
}
Now we have the right abstraction: the loop expresses where we start (the head), when we stop (when there are no more elements) and how we go from one element to the next (using the next field). This expresses the idea of iterating more effectively than "I've got a counter starting at 0, and I'm going to ask for the value at the particular counter on each iteration until the value of the counter is greater than some value which happens to be the length of the list."
We're fairly used to the latter way of expressing things, but it doesn't really say what we mean nearly as well as the iterator approach.
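As a rough illustration of the abstraction point, here is a minimal sketch in which one summing routine works on a list, a set, and a sequence that is generated on demand. The class name IterationSketch and the helper countTo are made up for this example and do not come from the question.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.NoSuchElementException;

class IterationSketch {
    // Works for any Iterable: the caller never needs an index or a size.
    static int sum(Iterable<Integer> items) {
        int total = 0;
        for (int item : items) { // uses iterator(), hasNext() and next() behind the scenes
            total += item;
        }
        return total;
    }

    // Hypothetical lazily generated sequence 1..limit; no elements are stored.
    static Iterable<Integer> countTo(int limit) {
        return () -> new Iterator<Integer>() {
            private int next = 1;

            @Override
            public boolean hasNext() {
                return next <= limit;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return next++;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3)));                // prints 6
        System.out.println(sum(new HashSet<>(Arrays.asList(1, 2, 3)))); // prints 6
        System.out.println(sum(countTo(3)));                            // prints 6
    }
}
```

The same sum method would not compile against a Set if it were written with get(index), which is the practical face of the abstraction argument.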

Iterators are not about increasing efficiency; they're about abstraction in the object-oriented sense. Implementation-wise, the iterator is doing something similar to what you're doing: going through your collection one element at a time, at least if the collection is index-based. It's supposed to be O(1) when retrieving the next element, not when traversing the entire list. Iterators also help mask what collection is underneath: it could be a linked list, a set, etc., but you don't have to know.
Also, notice how connected your for loop is to your specific logic that you want to do on each element, while with an iterator you can abstract out the looping logic from whatever action you want to do.

I think the question you are asking refers to the efficiency of iterators vs. a for-loop using an explicit get on the collection object.
If you write code with a naive version of get, and you iterate through your list using it, then it takes you
one step to "get" the first element
two steps to "get" the second
three steps to get the third
...
n steps to get the last
for a total of 1 + 2 + ... + n = n(n+1)/2 operations, which is O(n^2).
But if you used an iterator which internally kept track of the next element (i.e. one step to advance), then iterating the whole list is O(n), a big improvement.
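To make the difference concrete, the following sketch counts node visits on a hand-rolled singly linked chain, once with a naive get(index) per element and once with a cursor that remembers where it is. The class and method names are invented for this illustration; the count of 1 + 2 + ... + n visits follows from charging one visit for the head plus one per hop.

```java
class GetVersusIterator {
    static final class Node<E> {
        final E value;
        Node<E> next;

        Node(E value) {
            this.value = value;
        }
    }

    // Builds a chain of n nodes holding 0..n-1 and returns its head.
    static Node<Integer> build(int n) {
        Node<Integer> head = null;
        for (int i = n - 1; i >= 0; i--) {
            Node<Integer> node = new Node<>(i);
            node.next = head;
            head = node;
        }
        return head;
    }

    // Counts node visits when every element is fetched with a naive get(index).
    static long visitsWithGet(Node<Integer> head, int n) {
        long visits = 0;
        for (int index = 0; index < n; index++) {
            Node<Integer> ref = head;
            visits++; // visiting the head
            for (int i = 0; i < index; i++) {
                ref = ref.next;
                visits++; // one more hop toward the requested index
            }
        }
        return visits;
    }

    // Counts node visits when a cursor simply follows the next references.
    static long visitsWithCursor(Node<Integer> head) {
        long visits = 0;
        for (Node<Integer> ref = head; ref != null; ref = ref.next) {
            visits++;
        }
        return visits;
    }

    public static void main(String[] args) {
        Node<Integer> head = build(1000);
        System.out.println(visitsWithGet(head, 1000)); // prints 500500
        System.out.println(visitsWithCursor(head));    // prints 1000
    }
}
```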

Like Jon said, iterators have nothing to do with efficiency; they just abstract the concept of being able to iterate over a collection. So you are right: if you are just searching through a list, there is no real benefit to an iterator over a for loop. But in some cases iterators provide convenient ways of doing things that would be difficult with a simple for loop. For example:
while (itr.hasNext()) {
    if (itr.next().equals(somethingBad)) {
        itr.remove();
    }
}
In other cases iterators provide a way to traverse the elements of a collection that you cannot fetch by index (e.g. a HashSet). In this case an index-based for loop is not an option.
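A self-contained version of that removal loop might look like the sketch below. It accepts any Iterable, so the same code handles a HashSet (no index available) and a list. The class and method names are placeholders, and Objects.equals is used here only so that null elements do not cause a NullPointerException.

```java
import java.util.Iterator;
import java.util.Objects;

class RemoveWhileIterating {
    // Removes every element equal to somethingBad, using only the iterator.
    // Assumes the underlying collection supports Iterator.remove().
    static <E> void removeAllEqualTo(Iterable<E> items, E somethingBad) {
        Iterator<E> itr = items.iterator();
        while (itr.hasNext()) {
            if (Objects.equals(itr.next(), somethingBad)) {
                itr.remove(); // removes the element most recently returned by next()
            }
        }
    }
}
```

Calling it with a HashSet and with an ArrayList works the same way; an unmodifiable collection would throw UnsupportedOperationException on the first removal.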

Remember that it's also a Design Pattern.
"The Iterator Pattern allows traversal of the elements of an aggregate without exposing the underlying implementation. It also places the task of traversal on the iterator object, not on the aggregate, which simplifies the aggregate interface and implementation, and places the responsibility where it should be." (From: Head First Design Patterns)
It's about encapsulation and also the 'single responsibility' principle.
Cheers,
Wim

You are using a linked list here. Iterating over that list without an iterator takes O(n^2) steps, where n is the size of the list. O(n) for iterating over the list and O(n) each time for finding the next element.
The iterator, on the other hand, remembers the node it has visited the last time, and therefore needs only O(1) to find the next element. So eventually the complexity is O(n), which is faster.

Related

Why not use ListIterator for full LinkedList Operation?

My main question is whether the ListIterator or Iterator class reduces the time taken to remove elements from a given LinkedList, and whether the same can be said for adding elements to that LinkedList through either of these classes. If so, what's the point of using the built-in methods of the LinkedList class itself? Why should we perform any of these operations through LinkedList methods when we can use the ListIterator methods for better performance?

A ListIterator can indeed efficiently remove the node on which it is positioned. You can thus create a ListIterator, use next() two times to move the cursor, and then remove the node instantly. But evidently you did a lot of work before the actual removal.
Using ListIterator.remove is not more efficient in terms of time complexity than removing through LinkedList.remove(int index) if you need to construct the iterator first. The LinkedList.remove method takes O(k) time, with k the index of the item you wish to remove. Removing this element with the ListIterator has the same time complexity, since: (a) we create a ListIterator in constant time; (b) we call .next() k times, each operation in O(1); and (c) we call .remove(), which is again O(1). Since we call .next() k times, this is an O(k) operation as well.
A similar situation happens for .add(..) on an arbitrary location (an "insert"), except that we here of course insert a node, not remove one.
Now since the two have the same time complexity, one might wonder why a LinkedList has methods such as remove(int index) in the first place. The main reason is programmer's convenience. It is more convenient to call mylist.remove(5) than to create an iterator, use a loop to move to the right position, and then call remove. Furthermore, the methods on a linked list guard against some edge cases, like a negative index. By doing this manually you might end up removing the first element, which might not be the intended behaviour. Finally, code is usually read more often than it is written. If a future reader sees mylist.remove(5), they understand that it removes the element at index 5, whereas a solution with looping will require some extra brain cycles to understand what that part is doing.
As @Andreas says, the List interface furthermore defines these methods, and hence LinkedList<T> should implement them.
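The comparison can be sketched as follows. The helper removeWithIterator is a made-up name; it walks a ListIterator to a zero-based index k (which takes k + 1 calls to next()) and then removes, so the work is proportional to k just as with remove(int index).

```java
import java.util.List;
import java.util.ListIterator;

class RemoveAtIndex {
    // Removes and returns the element at zero-based index k via a ListIterator.
    // Each next() is O(1) on a LinkedList, but we need k + 1 of them.
    static <E> E removeWithIterator(List<E> list, int k) {
        ListIterator<E> it = list.listIterator();
        E removed = null;
        for (int i = 0; i <= k; i++) {
            removed = it.next(); // throws NoSuchElementException if k is too large
        }
        it.remove(); // throws IllegalStateException if k was negative (next() never called)
        return removed;
    }
}
```

On a list containing a, b, c, d, both removeWithIterator(list, 2) and list.remove(2) take out "c"; the index-based call is simply shorter and validates its argument with a clearer exception.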

Why don't we count linear search cost as a prerequisite bottleneck for the insertion operation of a linked list, compared to ArrayList?

I have had this question for a while but I have been unsatisfied with the answers because the distinctions appear to be arbitrary and more like conventional wisdom that is sort of blindly accepted rather than assessed critically.
In an ArrayList it is said that insertion cost (for a single element) is linear. If we are inserting at index p for 0 <= p < n where n is the size of the list, then the remaining n-p elements are shifted over first before the new element is copied into position p.
In a LinkedList, it is said that insertion cost (for a single element) is constant. For instance if we already have a node and we want to insert after it, we rearrange some pointers and it's done quickly. But getting this node in the first place, I don't see how it can be done other than a linear search first (assuming it isn't a trivial case like prepending at the start of the list or appending at the end).
And yet in the case of the LinkedList, we don't count that initial search time. To me this is confusing because it's sort of like saying "The ice cream is free... after you pay for it." It's like, well, of course it is... but that sort of skips the hard part of paying for it. Of course inserting in a LinkedList is going to be constant time if you already have the node you want, but getting that node in the first place may take some extra time! I could easily say that inserting in an ArrayList is constant time... after I move the remaining n-p elements.
So I don't understand why this distinction is made for one but not the other. You could argue that insertion is considered constant for LinkedLists because of the cases where you insert at the front or back where linear time operations are not required, whereas in an ArrayList, insertion requires copying of the suffix array after position p, but I could easily counter that by saying if we insert at the back of an ArrayList, it is amortized constant time and doesn't require extra copying in most cases unless we reach capacity.
In other words, we separate the linear stuff from the constant stuff for LinkedList, but we don't separate them for the ArrayList, even though in both cases the linear operations may or may not be invoked.
So why do we consider them separate for LinkedList and not for ArrayList? Or are they only being defined here in the context where LinkedList is overwhelmingly used for head/tail appends and prepends as opposed to elements in the middle?

This is basically a limitation of the Java interface for List and LinkedList, rather than a fundamental limitation of linked lists. That is, in Java there is no convenient concept of "a pointer to a list node".
Every type of list has a few different concepts loosely associated with the idea of pointing to a particular item:
The idea of a "reference" to a specific item in a list
The integer position of an item in the list
The value of an item that may be in the list (possibly multiple times)
The most general concept is the first one, and is usually encapsulated in the idea of an iterator. As it happens, the simple way to implement an iterator for an array backed list is simply to wrap an integer which refers to the position of the item in a list. So for array lists only, the first and second ways of referring to items are pretty tightly bound.
For other list types, however, and even for most other container types (trees, hashes, etc.), that is not the case. The generic reference to an item is usually something like a pointer to the wrapper structure around one item (e.g., HashMap.Entry or LinkedList.Entry). For these structures the idea of accessing the nth element isn't necessarily natural or even possible (e.g., unordered collections like sets and many hash maps).
Perhaps unfortunately, Java made the idea of getting an item by its index a first-class operation. Many of the operations directly on List objects are implemented in terms of list indexes: remove(int index), add(int index, ...), get(int index), etc. So it's kind of natural to think of those operations as being the fundamental ones.
For LinkedList though it's more fundamental to use a pointer to a node to refer to an object. Rather than passing around a list index, you'd pass around the pointer. After inserting an element, you'd get a pointer to the element.
In C++ this concept is embodied in the concept of the iterator, which is the first class way to refer to items in collections, including lists. So does such a "pointer" exist in Java? It sure does - it's the Iterator object! Usually you think of an Iterator as being for iteration, but you can also think of it as pointing to a particular object.
So the key observation is: given a pointer (iterator) to an object, you can remove and add from linked lists in constant time, but from an array-like list this takes linear time in general. There is no inherent need to search for an object before deleting it: there are plenty of scenarios where you can maintain or take as input such a reference, or where you are processing the entire list, and here the constant time deletion of linked lists does change the algorithmic complexity.
Of course, if you need to do something like delete the first entry containing the value "foo", that implies both a search and a delete operation. Both array-based and linked lists take O(n) for the search, so they don't vary here, but you can meaningfully separate the search and delete operations.
So you could, in principle, pass around Iterator objects rather than list indexes or object values - at least if your use case supports it. However, at the top I said that "Java has no convenient notion of a pointer to a list node". Why?
Well, because using Iterator this way is actually very inconvenient. First of all, it's tough to get an Iterator to an object in the first place: for example, and unlike C++, the add() methods don't return an Iterator, so to get a pointer to the item you just added, you need to go ahead and iterate over the list or use the listIterator(int index) call, which is inherently inefficient for linked lists. Many methods (e.g., subList()) support only a version that takes indexes, but not Iterators, even when such a method could be efficiently supported.
Add to that the restrictions around iterator invalidation when the list is modified, and they actually become pretty useless for referring to elements except in immutable lists.
So Java's support of pointers to list elements is pretty half-hearted, and so it's tough to leverage the constant time operations that a linked list offers, except in cases such as adding to the front of a list, or deleting items during iteration.
It's not limited to lists, either: ConcurrentLinkedQueue is also a linked structure which could support constant time deletes, but you can't reliably use that ability from Java.
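One of the cases that does work well is modifying the list at the cursor during a single pass. The sketch below uses ListIterator.add to insert an element right after every match; on a LinkedList each insertion is O(1) because the iterator already sits at the node. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.ListIterator;

class CursorInsert {
    // Inserts 'extra' immediately after every element equal to 'target'.
    // Assumes the list contains no null elements.
    static void insertAfterEach(List<String> list, String target, String extra) {
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            if (it.next().equals(target)) {
                // add() places the element before the cursor, so the next call
                // to next() skips it and the loop cannot revisit what it inserted.
                it.add(extra);
            }
        }
    }
}
```

With an ArrayList the same code is still correct, but each add has to shift the tail of the backing array, so the pass as a whole can become quadratic.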

If you're using a LinkedList, chances are you're not going to use it for a random access insert. LinkedList offers constant time for push (insert at the beginning) or add (because it has a ref to the final element IIRC). You are correct in your suspicion that an insert into a random index (e.g. insert sorted) will take linear time - not constant.
ArrayList, by contrast, is worst case linear. Most of the time it simply does an arraycopy to shift the elements after the insertion point, which is a fast low-level bulk copy but still proportional to the number of elements moved. Appending at the end needs no shift, and only takes linear time when the backing array has to be resized.

Node reference into linked list like .Net has, to enable O(1) item insertion

.Net's LinkedList has a nice basic linked list feature that allows me to keep a node reference, a "pointer" into a linked list so to speak, and use that reference to navigate and manipulate the linked list from there in an O(1) fashion. To wit:
LinkedList<string> linkedList = new LinkedList<string>();
LinkedListNode<string> cur = linkedList.First;
LinkedListNode<string> rememberThis = null;
do
{
    if (...)
        rememberThis = cur;
} while ((cur = cur.Next) != null);
if (rememberThis != null)
    linkedList.AddAfter(rememberThis, "added-value");
I'm failing to see how I can do the same in Java, namely
Iterating through a LinkedList (this of course is O(n))
Making note of a list node
Use that node reference even after further iteration for O(1) insertion
Java does give me access to a ListIterator, which allows me to do manipulation of the list around the item where I'm at, but I cannot seem to iterate on, while holding on to a previous node.
Am I missing something?

Am I missing something?
No. The LinkedList#ListItr class has no bookmark facility, so you cannot keep iterating while holding on to a previous node.
There's no O(1) method addAfter(Node node, E element) in LinkedList, because LinkedList#Node is private. There's add(int index, E element), but it is O(n). Too bad.
A workaround is to use two ListIterators. One keeps iterating; the other stops at the position you want to remember. Then you can call ListIterator#add(E e) on the second one at the end, which is O(1). But the first one must not modify the list, otherwise it will break the second one.
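A possible shape for that two-iterator workaround is sketched below. The names TwoIteratorBookmark and addAfterLastMatch are invented here, and the sketch assumes nothing else modifies the list while it runs. The bookmark only ever moves forward, so the whole pass stays O(n) and the final add is O(1).

```java
import java.util.LinkedList;
import java.util.ListIterator;
import java.util.function.Predicate;

class TwoIteratorBookmark {
    // Adds 'value' right after the last element that satisfies 'condition'.
    // Returns false (and leaves the list untouched) when nothing matches.
    static boolean addAfterLastMatch(LinkedList<String> list,
                                     Predicate<String> condition,
                                     String value) {
        ListIterator<String> scan = list.listIterator();     // keeps iterating
        ListIterator<String> bookmark = list.listIterator(); // lags behind as the "remembered" spot
        boolean found = false;
        while (scan.hasNext()) {
            String element = scan.next();
            if (condition.test(element)) {
                // Catch the bookmark up so it sits just after the matching element.
                while (bookmark.nextIndex() < scan.nextIndex()) {
                    bookmark.next();
                }
                found = true;
            }
        }
        if (found) {
            bookmark.add(value); // only now is the list modified, and only via the bookmark
        }
        return found;
    }
}
```

After bookmark.add the scan iterator is stale: using it again would throw ConcurrentModificationException, which is why the insertion happens only once the scan has finished.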

No, don't do that; it will break. If you ever modify the LinkedList structurally later (other than through that same iterator), a ConcurrentModificationException will be thrown the next time you move the ListIterator's cursor, and there is no way around it. This is known as fail-fast behavior.
Anyway, Iterators aren't meant to hold a cursor in a list for a long time. Currently there is no way to hold a cursor to a certain position in a list, including a LinkedList, for a long time, even in OpenJDK 9 EA, IIRC. The reason behind it may be the ambiguity of how to move the existing cursors when the list changes. This may be obvious in many situations, but not always.
After all, it's (almost) impossible to add this to a superinterface of LinkedList (Queue, Deque, List) now. (This is clearly an API design fault!) You can create your own version of LinkedList to implement it.
If you really want to keep a reference somehow, you will have to hack the internals with reflection, which may not be worth it at all.
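For the "create your own version" route, a minimal sketch could expose the node as a handle and offer an O(1) addAfter, roughly mirroring what the .NET snippet in the question does. Everything here (NodeList, Node, addAfter, toList) is a hypothetical design, it is singly linked for brevity, and it does not verify that a node actually belongs to the list it is passed to.

```java
import java.util.ArrayList;
import java.util.List;

class NodeList<E> {
    // Public handle that callers may keep for as long as they like.
    static final class Node<E> {
        E value;
        Node<E> next;

        Node(E value) {
            this.value = value;
        }
    }

    private Node<E> head;
    private Node<E> tail;

    Node<E> first() {
        return head;
    }

    // Appends in O(1) and returns the handle of the new node.
    Node<E> addLast(E value) {
        Node<E> node = new Node<>(value);
        if (tail == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
        return node;
    }

    // Inserts after a remembered node in O(1); no search is needed.
    Node<E> addAfter(Node<E> node, E value) {
        Node<E> added = new Node<>(value);
        added.next = node.next;
        node.next = added;
        if (node == tail) {
            tail = added;
        }
        return added;
    }

    // Copies the values into a java.util.List, mainly for inspection.
    List<E> toList() {
        List<E> values = new ArrayList<>();
        for (Node<E> ref = head; ref != null; ref = ref.next) {
            values.add(ref.value);
        }
        return values;
    }
}
```

The price of this design is that the list no longer hides its structure, so callers can corrupt it by editing next directly; a production version would likely keep the links private and expose only the handle type.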

Remove list elements - my approach for best performance in Java

If I need to remove elements in a list, will the following be better than using LinkedList:
int j = 0;
List list = new ArrayList(1000000);
...
// fill in the list code here
...
for (Iterator i = list.listIterator(); i.hasNext(); j++) {
    if (checkCondition) {
        i.remove();
        i = list.listIterator(j);
    }
}
?
LinkedList does "remove and add elements" more efficiently than ArrayList, but LinkedList, as a doubly-linked list, needs more memory, since each element is wrapped in an Entry object. I only need a one-directional List interface, because I'm traversing in ascending order of index.

The answer is: it depends on the frequency and distribution of your adds and removes. If you only do a single remove infrequently, then you might use a linked list. However, the main advantage of an ArrayList over a LinkedList is constant-time random access. You can't really do this with a normal linked list (however, look at a skip list for some inspiration). If, instead, you're removing elements relative to other elements (for example, you need to remove the next element), then you should use a linked list.

There is no simple answer to this:
It depends on what you are optimizing for. Do you care more about the time taken to perform the operations, or the space used by the lists?
It depends on how long the lists are.
It depends on the proportion of elements that you are removing from the lists.
It depends on the other things that you do to the list.
The chances are that one or more of these determining factors is not predictable up-front; i.e. you don't really know. So my advice would be to put this off for now; i.e. just pick one or the other based on gut feeling (or a coin toss). You can revisit the decision later, if you have a quantifiable performance problem in this area ... as demonstrated by cpu or memory usage profiling.

What's the replacement for a for-each loop for filtering?

Though the for-each loop has many advantages, the problem is that it doesn't work when you want to filter a List (filtering means removing elements from the List). Can you please suggest a replacement, as even traversing by index is not a good option?

What do you mean by "filtering"? Removing certain elements from a list? If so, you can use an iterator:
for (Iterator<MyElement> it = list.iterator(); it.hasNext(); ) {
    MyElement element = it.next();
    if (some condition) {
        it.remove();
    }
}
Update (based on comments):
Consider the following example to illustrate how iterator works. Let's say we have a list that contains 'A's and 'B's:
A A B B A
We want to remove all those pesky Bs. So, using the above loop, the code will work as follows:
1. hasNext()? Yes. next(). element points to 1st A.
2. hasNext()? Yes. next(). element points to 2nd A.
3. hasNext()? Yes. next(). element points to 1st B. remove(). iterator counter does NOT change, it still points to a place where B was (technically that's not entirely correct but logically that's how it works). If you were to call remove() again now, you'd get an exception (because list element is no longer there).
4. hasNext()? Yes. next(). element points to 2nd B. The rest is the same as #3
5. hasNext()? Yes. next(). element points to 3rd A.
6. hasNext()? No, we're done. List now has 3 elements.
Update #2: the remove() operation is indeed optional on an iterator, but only because it is optional on the underlying collection. The bottom line here is: if your collection supports it (and the standard modifiable collections in the Java Collections Framework do), so will the iterator. If your collection doesn't support it, you're out of luck anyway.
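The A A B B A walkthrough can be reproduced with a small runnable sketch. The class name and the removeEvery helper are made up for this example; the loop itself is the iterator loop from the answer with a concrete condition filled in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class FilterWithIterator {
    // Removes every occurrence of 'unwanted' in a single pass.
    static void removeEvery(List<Character> list, char unwanted) {
        for (Iterator<Character> it = list.iterator(); it.hasNext(); ) {
            char element = it.next();
            if (element == unwanted) {
                it.remove(); // legal exactly once per call to next()
            }
        }
    }

    public static void main(String[] args) {
        List<Character> list = new ArrayList<>(Arrays.asList('A', 'A', 'B', 'B', 'A'));
        removeEvery(list, 'B');
        System.out.println(list); // prints [A, A, A]
    }
}
```

Note that Arrays.asList on its own returns a fixed-size list whose iterator does not support remove(), which is why the sketch copies it into an ArrayList first.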

ChssPly76's answer is the right approach here - but I'm intrigued as to your thinking behind "traversing through index is not a good option". In many cases - the common case in particular being that of an ArrayList - it's extremely efficient. (In fact, in the ArrayList case, I believe that repeated calls to get(i++) are marginally faster than using an Iterator, though nowhere near enough to sacrifice readability.)
Broadly speaking, if the object in question implements java.util.RandomAccess, then accessing sequential elements via an index should be roughly the same speed as using an Iterator. If it doesn't (e.g. LinkedList would be a good counterexample) then you're right; but don't dismiss the option out of hand.
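The RandomAccess check can be expressed directly in code. In this sketch (class and method names are illustrative only) a list is traversed by index when it advertises RandomAccess and by iterator otherwise; both branches compute the same result, and only the cost on a LinkedList differs.

```java
import java.util.List;
import java.util.RandomAccess;

class TraversalChoice {
    // True for ArrayList and similar; false for LinkedList.
    static boolean prefersIndex(List<?> list) {
        return list instanceof RandomAccess;
    }

    static long sum(List<Integer> list) {
        long total = 0;
        if (prefersIndex(list)) {
            for (int i = 0; i < list.size(); i++) {
                total += list.get(i); // O(1) per call on a RandomAccess list
            }
        } else {
            for (int value : list) {  // iterator keeps its position between calls
                total += value;
            }
        }
        return total;
    }
}
```

In everyday code the enhanced for loop alone is usually enough; the explicit branch is shown only to make the marker interface's role visible.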

I have had success using the
filter(java.util.Collection collection, Predicate predicate)
method of CollectionUtils in commons collections.
http://commons.apache.org/collections/api-2.1.1/org/apache/commons/collections/CollectionUtils.html#filter(java.util.Collection,%20org.apache.commons.collections.Predicate)

If you, like me, don't like modifying a collection while iterating through its elements, or if the iterator just doesn't provide an implementation for remove, you can use a temporary collection to collect the elements you want to delete. Yes, yes, it's less efficient compared to removing through the iterator, but to me it's clearer to understand what's happening:
List<Object> data = getListFromSomewhere();
List<Object> filter = new ArrayList<Object>();
// create Filter
for (Object item : data) {
    if (throwAway(item)) {
        filter.add(item);
    }
}
// use Filter
for (Object item : filter) {
    data.remove(item);
}
filter.clear();
filter = null;
