Can an iterator change the collection it is iterating over? Java - java

I'm attempting to use the number of iterations from an iterator as a counter, but I'm wondering about the ramifications of doing so.
private int length(Iterator<?> it) {
    int i = 0;
    while (it.hasNext()) {
        it.next();
        i++;
    }
    return i;
}
This works fine, but I'm worried about what the iterator may do behind the scenes. Perhaps as I'm iterating over a stack, it pops the items off the stack, or if I'm using a priority queue, and it modifies the priority.
The Javadoc says this about Iterator:
next
E next()
Returns the next element in the iteration.
Returns:
the next element in the iteration
Throws:
NoSuchElementException - if the iteration has no more elements
I don't see a guarantee that iterating over this unknown collection won't modify it. Am I thinking of unrealistic edge cases, or is this a concern? Is there a better way?

The Iterator simply provides an interface into some sort of stream, therefore not only is it perfectly possible for next() to destroy data in some way, but it's even possible for the data in an Iterator to be unique and irreplaceable.
We could come up with more direct examples, but an easy one is the Iterator in DirectoryStream. While a DirectoryStream is technically Iterable, it only allows one Iterator to be constructed, so if you tried to do the following:
Path dir = ...
try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
int count = length(stream.iterator());
for (Path entry: stream) {
...
}
}
You would get an exception in the foreach block, because the stream can only be iterated once. So in summary, it is possible for your length() method to change objects and lose data.
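A minimal sketch of that failure, assuming a scratch temporary directory; the class and method names are made up for illustration. DirectoryStream.iterator() is documented to throw IllegalStateException if it is called a second time, and the for-each loop makes that second call implicitly:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;

class DirectoryStreamOnce {
    // Same counting helper as in the question.
    static int length(Iterator<?> it) {
        int i = 0;
        while (it.hasNext()) {
            it.next();
            i++;
        }
        return i;
    }

    // Returns true if the second traversal of the stream was rejected.
    static boolean secondTraversalFails() {
        try {
            Path dir = Files.createTempDirectory("iter-demo"); // hypothetical scratch directory
            Path file = Files.createFile(dir.resolve("a.txt"));
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
                length(stream.iterator());   // the one iterator this stream will hand out
                for (Path entry : stream) {  // implicitly calls stream.iterator() again
                    System.out.println(entry);
                }
                return false;
            } catch (IllegalStateException e) {
                return true;
            } finally {
                Files.deleteIfExists(file);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Counting the entries used up the only traversal the stream offers, which is exactly the kind of data loss the question was worried about.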
Furthermore, there's no reason an Iterator has to be associated with some separate data-store. Take for example an answer I gave a few months ago providing a clean way to select n random numbers. By using an infinite Iterator we are able to provide, filter, and pass around arbitrarily large amounts of random data lazily, with no need to store it all at once, or even compute it until it's needed. Because the Iterator isn't backed by any data structure, querying it is obviously destructive.
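To make that concrete, here is a rough sketch of an Iterator with no collection behind it. This is not the code from the linked answer; RandomNumbers and take are hypothetical names. Every value is computed on demand and is gone once it has been read:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

class RandomNumbers implements Iterator<Integer> {
    private final Random random;
    private final int bound;

    RandomNumbers(long seed, int bound) {
        this.random = new Random(seed);
        this.bound = bound;
    }

    @Override
    public boolean hasNext() {
        return true; // never exhausted, so length(it) from the question would never return
    }

    @Override
    public Integer next() {
        return random.nextInt(bound); // computed now, stored nowhere
    }

    // Pulls the next n values; they cannot be read from the same iterator again.
    static List<Integer> take(Iterator<Integer> it, int n) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < n && it.hasNext(); i++) {
            out.add(it.next());
        }
        return out;
    }
}
```

Passing such an iterator to a counting method would loop forever, so the caller has to know what kind of source it is holding.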
Now that said, these examples don't make your method bad. Notice that the Guava library (which everyone should be using) provides an Iterators class with exactly the behavior you detail above, called size() to conform with the Collections Framework. The burden is then on the user of such methods to be aware of what sort of data they're working with, and avoid making careless calls such as trying to count the number of results in an Iterator that they know cannot be replaced.

As far as I can tell, the Collection specification does not explicitly state that iterating over a collection does not modify it, but no classes in the standard library show that behaviour (actually at least one does, see dimo414's answer), so any class that did would be highly suspect. I don't think you need to worry about this.
Note that the Guava library implements Iterators.size() and Iterables.size() in the same way that you are, so clearly they find it safe in the general case.

No, iterating over a collection will not modify the collection. The Iterator interface does have a remove() method, which is the only safe way of removing an element from a collection during iteration. But simply calling hasNext() and next() will not modify the collection.
Keep in mind that if you modify the object returned by next(), those changes will be present in your collection.
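A small sketch of that last point, assuming a list of mutable StringBuilder elements (SharedElements and appendToAll are illustrative names). The iterator hands out references to the stored objects, not copies:

```java
import java.util.Iterator;
import java.util.List;

class SharedElements {
    // Appends a suffix to every element reached through the iterator.
    static void appendToAll(List<StringBuilder> items, String suffix) {
        Iterator<StringBuilder> it = items.iterator();
        while (it.hasNext()) {
            it.next().append(suffix); // mutates the element; the list structure is untouched
        }
    }
}
```

The list still has the same size and order afterwards, but every element it holds has changed.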

Think about it -- methods that return things are (if they are written correctly) accessor methods, meaning that they just return data. They do not modify it (they are not mutator methods).
Here's an example I had on my disk of how an iterator might be implemented. As you can see, no values are actually modified.
public class ArraySetIterator implements Iterator
{
private int nextIndex;
private ArraySet theArraySet;
public ArraySetIterator (ArraySet a)
{
this.nextIndex = 0;
this.theArraySet = a;
}
public boolean hasNext ()
{
return this.nextIndex < this.theArraySet.size();
}
public Object next()
{
return this.theArraySet.get(this.nextIndex++);
}
}

Related

Efficiently removing an element added to a ConcurrentQueue

In principle it is easy to remove an element from ConcurrentLinkedQueue or similar implementation. For example, the Iterator for that class supports efficient O(1) removal of the current element:
public void remove() {
Node<E> l = lastRet;
if (l == null) throw new IllegalStateException();
// rely on a future traversal to relink.
l.item = null;
lastRet = null;
}
I want to add an element to the queue with add(), and then later delete that exact element from the queue. The only options I can see are:
Save a reference to the object and call ConcurrentLinkedQueue.remove(Object o) with the object - but this forces a traversal of the whole queue in the worst case (and half on average with a random add and removal pattern).
This has the further issue that it doesn't necessarily remove the same object I inserted. It removes an equal object, which may very well be a different one if multiple objects in my queue are equal.
Use ConcurrentLinkedDeque instead, then addLast() my element, then immediately grab a descendingIterator() and iterate until I find my element (which will often be the first one, but may be later since I'm effectively swimming against the tide of concurrent additions).
In addition to being awkward and potentially quite slow, this forces me to use a Deque class, which in this case is much more complex and slower for many operations (check out Iterator.remove() for that class!).
Furthermore, this solution still has a subtle failure mode if identical (i.e., == identity) objects can be inserted, because I might find the object inserted by someone else, but that can be ignored in the usual case where that is not possible.
Both solutions seem really awkward, but deleting an arbitrary element in these kind of structures seems like a core operation. What am I missing?
It occurs to me that this is a general issue with other concurrent lists and deques, and even with non-concurrent structures like LinkedList.
C++ offers it in the form of methods like insert.
Nope, there's not really any way of doing this in the Java APIs; it's not really considered a core operation.
For what it's worth, there are some significant technical difficulties to how you would do it in the first place. For example, consider ArrayList. If adding an element to an ArrayList gave you another reference object that told you where that element was...
you'd be adding an allocation to every add operation
each object would have to keep track of one (or more!) references to its "pointer" object, which would at least double the memory consumption of the data structure
every time you inserted an element into the ArrayList, you'd have to update the pointer objects for every other element whose position shifted
In short, while this might be a reasonable operation for linked-list-based data structures, there's not really any good way of fitting it into a more general API, so it's pretty much omitted.
If you specifically need this capability (e.g. you are going to have massive collections) then you will likely need to implement your own collection that returns a reference to the entry on add and has the ability to remove in O(1).
class MyLinkedList<V> {
public class Entry {
private Entry next;
private Entry prev;
private V value;
}
public Entry add(V value) {
...
}
public void remove(Entry entry) {
...
}
}
In other words you are not removing by value but by reference to the collection entry:
MyLinkedList<Integer> intList = new MyLinkedList<Integer>();
MyLinkedList<Integer>.Entry entry = intList.add(15);
intList.remove(entry);
That's obviously a fair amount of work to implement.
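For a sense of scale, here is a rough single-threaded sketch of such a list. It is an assumption-laden illustration, not a concurrent queue: HandleList and its members are hypothetical names, there is no synchronization, and remove assumes the entry came from this same list.

```java
import java.util.ArrayList;
import java.util.List;

class HandleList<V> {
    // A node that doubles as the caller's removal handle.
    final class Entry {
        private Entry prev;
        private Entry next;
        private final V value;
        private boolean linked = true;

        private Entry(V value) {
            this.value = value;
        }
    }

    private Entry head;
    private Entry tail;

    // Appends at the tail and hands back the node itself.
    Entry add(V value) {
        Entry e = new Entry(value);
        if (tail == null) {
            head = e;
        } else {
            e.prev = tail;
            tail.next = e;
        }
        tail = e;
        return e;
    }

    // Unlinks exactly this node in O(1): no traversal and no equals() call.
    void remove(Entry e) {
        if (!e.linked) {
            return; // already removed
        }
        if (e.prev == null) head = e.next; else e.prev.next = e.next;
        if (e.next == null) tail = e.prev; else e.next.prev = e.prev;
        e.prev = null;
        e.next = null;
        e.linked = false;
    }

    // Copies the current contents, head to tail.
    List<V> toList() {
        List<V> out = new ArrayList<>();
        for (Entry e = head; e != null; e = e.next) {
            out.add(e.value);
        }
        return out;
    }
}
```

Because removal goes through the handle, two equal values in the list are never confused with each other. Making this safe under concurrent access is the hard part and is not attempted here.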

Need to manually synchronize the Synchronized list while iteration when it could be avoided?

My question is about the synchronizedList method of the Collections class.
Javadocs say:
It is imperative that the user manually synchronize on the returned list when iterating over it:
List list = Collections.synchronizedList(new ArrayList());
...
synchronized(list) {
Iterator i = list.iterator(); // Must be in synchronized block
while (i.hasNext())
foo(i.next());
}
Manual synchronization is not required for other methods, though. I looked into the source code of the Collections class
and found that synchronization has already been taken care of for all methods like add
public boolean add(E e) {
synchronized(list) {return c.add(e);}
}
but not for the iterator method. I think the iterator method could also have handled synchronization in the same fashion
as the above method (it would have avoided the extra work, i.e. manual synchronization, for programmers). I am sure there
must be some concrete reason behind it, but I am missing it.
public Iterator<E> iterator() {
return c.iterator(); // Must be manually synched by user!
}
A way to spare the programmer manual synchronization:
public Iterator<E> iterator() {
synchronized(list) {
return c.iterator(); // No need to manually synched by user!
}
}
I think iterator method could have also handled synchronization in the same fashion as above method
No, it absolutely couldn't.
The iterator has no control over what your code does between calls to the individual methods on it. That's the point. Your iteration code will call hasNext() and next() repeatedly, and synchronization during those calls is feasible but irrelevant - what's important is that no other code tries to modify the list across the whole time you're iterating.
So imagine a timeline of:
t = 0: call iterator()
t = 1: call hasNext()
t = 2: call next()
// Do lots of work with the returned item
t = 10: call hasNext()
The iterator can't synchronize between the end of the call to next() at t=2 and the call to hasNext() at t=10. So if another thread tries to (say) add an item to the list at t=7, how is the iterator meant to stop it from doing so?
This is the overall problem with synchronized collections: each individual operation is synchronized, whereas typically you want a whole chunky operation to be synchronized.
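A sketch of what holding the lock across the whole traversal looks like, with a second thread appending in the background (SynchronizedIteration and iterateWhileWriting are illustrative names). The writer's add() calls block while the reader is inside the synchronized block, so the loop never sees a half-modified list:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SynchronizedIteration {
    // Returns {elements seen during the traversal, final list size}.
    static int[] iterateWhileWriting(int initial, int extra) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < initial; i++) {
            list.add(i);
        }

        Thread writer = new Thread(() -> {
            for (int i = 0; i < extra; i++) {
                list.add(i); // each add() takes the list's lock on its own
            }
        });
        writer.start();

        int seen = 0;
        synchronized (list) { // held from iterator() through the last next()
            for (Integer ignored : list) {
                seen++;
            }
        }

        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { seen, list.size() };
    }
}
```

How many elements the reader sees depends on timing, but it always sees a consistent list. Without the synchronized block the same loop could fail with a ConcurrentModificationException.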
If you don't synchronize the entire iteration, another thread could modify the collection as you iterate, leading to a ConcurrentModificationException.
Also, the returned iterator is not thread-safe.
They could fix that by wrapping the iterator in a SynchronizedIterator that locks every method in the iterator, but that wouldn't help either – another thread could still modify the collection between two iterations, and break everything.
This is one of the reasons that the Collections.synchronized*() methods are completely useless.
For more information about proper thread-safe collection usage, see my blog.
If you want to avoid manual synchronization, you have to use a Collection like java.util.concurrent.CopyOnWriteArrayList. Every time an object is added to the list, the underlying data structure is copied to avoid a concurrent modification exception.
The reason you need manual synchronization on the Iterator in your example is that the Iterator uses the same internal data structure as the list, but they are independent objects, and both the Iterator and the list can be accessed by different threads at any moment.
Another approach would be to make a local copy of the list and iterate over the copy.
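A single-threaded sketch of the snapshot behaviour (SnapshotIteration and visitWhileAdding are made-up names). The loop adds to the list while iterating over it, which a CopyOnWriteArrayList tolerates because its iterator keeps reading the array that existed when the iterator was created:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SnapshotIteration {
    // Returns the elements the loop actually visited.
    static List<String> visitWhileAdding(List<String> initial) {
        List<String> list = new CopyOnWriteArrayList<>(initial);
        List<String> visited = new ArrayList<>();
        for (String s : list) {
            visited.add(s);
            list.add(s + "-copy"); // copies the backing array; the iterator keeps its snapshot
        }
        return visited;
    }
}
```

The loop visits only the original elements, and the additions show up in later traversals. The price is a full array copy on every write, and the iterators of this class do not support remove().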

Need of Iterator class in Java?

The question might be pretty vague I know. But the reason I ask this is because the class must have been made with some thought in mind.
This question came into my mind while browsing through a few questions here on SO.
Consider the following code:
class A
{
private int myVar;
A(int varAsArg)
{
myVar = varAsArg;
}
public static void main(String args[])
{
List<A> myList = new LinkedList<A>();
myList.add(new A(1));
myList.add(new A(2));
myList.add(new A(3));
//I can iterate manually like this:
for(A obj : myList)
System.out.println(obj.myVar);
//Or I can use an Iterator as well:
for(Iterator<A> i = myList.iterator(); i.hasNext();)
{
A obj = i.next();
System.out.println(obj.myVar);
}
}
}
So as you can see from the above code, I have a substitute for iterating using a for loop, whereas, I could do the same using the Iterator class' hasNext() and next() method. Similarly there can be an example for the remove() method. And the experienced users had commented on the other answers to use the Iterator class instead of using the for loop to iterate through the List. Why?
What confuses me even more is that the Iterator class has only three methods. And the functionality of those can be achieved with writing a little different code as well.
Some people might argue that the functionality of many classes can be achieved by writing one's own code instead of using the class made for the purpose. Yes, true. But as I said, the Iterator class has only three methods. So why go through the hassle of creating an extra class when the same job can be done with a simple block of code that is not too complicated to understand either?
EDIT:
While I'm at it, since many of the answers say that I can't achieve the remove functionality without using Iterator, I would just like to know whether the following is wrong or will have some undesirable result.
for(A obj : myList)
{
if(obj.myVar == 1)
myList.remove(obj);
}
Doesn't the above code snippet do the same thing as remove() ?
Iterator came long before the for statement that you show in the evolution of Java. So that's why it's there. Also if you want to remove something, using Iterator.remove() is the only way you can do it (you can't use the for statement for that).
First of all, the for-each construct actually uses the Iterator interface under the covers. It does not, however, expose the underlying Iterator instance to user code, so you can't call methods on it.
This means that there are some things that require explicit use of the Iterator interface, and cannot be achieved by using a for-each loop.
Removing the current element is one such use case.
For other ideas, see the ListIterator interface. It is a bidirectional iterator that supports inserting elements and changing the element under the cursor. None of this can be done with a for-each loop.
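As a quick illustration of what ListIterator adds (ListIteratorDemo and rewrite are hypothetical names), this sketch replaces each element in place with set() and inserts a new one with add(), neither of which a for-each loop can do:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;

class ListIteratorDemo {
    // Upper-cases every element in place and inserts a marker after each "b".
    static List<String> rewrite(List<String> input) {
        List<String> list = new ArrayList<>(input);
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            String s = it.next();
            it.set(s.toUpperCase());  // replaces the element just returned by next()
            if (s.equals("b")) {
                it.add("after-b");    // inserted before the cursor, so the loop will not visit it
            }
        }
        return list;
    }
}
```

For the input a, b, c this yields A, B, after-b, C.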
for(A obj : myList)
{
if(obj.myVar == 1)
myList.remove(obj);
}
Doesn't the above code snippet do the same thing as remove() ?
No, it does not. All standard containers that I know of will throw ConcurrentModificationException when you try to do this. Even if it were allowed to work, it is ambiguous (what if obj appears in the list twice?) and inefficient (for linked lists, it would require linear instead of constant time).
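A small sketch contrasting the two approaches on an ArrayList (RemoveDuringIteration is an illustrative name, and Integer elements stand in for the question's A objects):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class RemoveDuringIteration {
    // Removes through the list inside a for-each loop; reports whether that failed.
    static boolean forEachRemoveThrows() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        try {
            for (Integer value : list) {
                if (value == 1) {
                    list.remove(value); // remove(Object): changes the list behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // thrown by the hidden iterator's next() on the following pass
        }
    }

    // Removes through the iterator, which keeps its own bookkeeping in step.
    static List<Integer> iteratorRemove() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            if (it.next() == 1) {
                it.remove();
            }
        }
        return list;
    }
}
```

The check is best-effort. With ArrayList, removing the second-to-last element inside a for-each loop happens to end the loop quietly and skip the last element, which is arguably worse than the exception.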
The foreach construct (for (X x: list)) actually uses Iterator as its implementation internally. You can feed it any Iterable as a source of elements.
And, as others already remarked: Iterator has been in Java longer than foreach, and it provides remove().
Also: how else would you implement your own provider class (myList in your example)? You make it Iterable and implement a method that creates an Iterator.
For one thing, Iterator was created way before the foreach loop (shown in your code sample above) was introduced into Java. (The former came in Java2, the latter only in Java5).
Since Java5, indeed the foreach loop is the preferred idiom for the most common scenario (when you are iterating through a single Iterable at a time, in the default order, and do not need to remove or index elements). Note though that the foreach uses an iterator in the background for standard collection classes; in other words it is just syntactic sugar.
Iterator and ListIterator give the user different levels of access: ListIterator has 9 methods while Iterator has only 3, yet Iterator still provides remove(), which you can't achieve with a for-each loop. Enumeration is a further alternative that gives read-only access.
Iterator is an implementation of the classical GoF design pattern. In that way you can achieve clear behaviour separation from the 'technical code' which iterates (the Iterator) and your business code.
Imagine you have to change the 'next' behaviour (say, by getting not the next element but the next EVEN element). If you rely only on for loops you will have to change manually every single for loop, in a way like this
for (int i = 0; i < list.size(); i = i + 2)
while if you use an Iterator you can simply override/rewrite the "next()" and "hasNext()" methods and the change will be visible everywhere in your application.
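One way to sketch that idea is a wrapper that changes what "next" means once, in one place. EveryOtherIterator is a hypothetical name, and this version wraps an existing Iterator rather than subclassing one:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class EveryOtherIterator<T> implements Iterator<T> {
    private final Iterator<T> source;

    EveryOtherIterator(Iterator<T> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public T next() {
        T value = source.next();  // throws NoSuchElementException when exhausted
        if (source.hasNext()) {
            source.next();        // silently skip the element in between
        }
        return value;
    }

    // Drains any iterator into a list, for demonstration.
    static <E> List<E> collect(Iterator<E> it) {
        List<E> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

Code that consumes the iterator stays unchanged and simply receives the elements at positions 0, 2, 4 and so on.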
I think the answer to your question is abstraction. Iterator exists to abstract iteration over different kinds of collections.
Every collection has different methods for iterating over its elements. ArrayList has indexed access. Queues have poll and peek methods. Stack has pop and peek.
Usually you only need to iterate over the elements, and that is where Iterator comes into play. You do not care which type of Collection you are iterating over. You just call the iterator() method and use the Iterator object itself to do this.
You might ask why these methods aren't put on the Collection interface itself, to avoid creating an extra object. The reason is that iteration needs to track the current position. If next() lived on the collection, every call would advance one shared position (simplifying, since every collection has a different implementation), so using the same collection in two places at once would make each of them skip elements. Also, if the collection supports concurrency, then you cannot write a thread-safe next() method on the collection.
It is usually not safe to remove an object from a collection while iterating over it by any means other than the iterator. The Iterator.remove() method is the safest way to do it. For example, with an ArrayList:
for(int i=0;i

What's the replacement of For-Each loop for filtering?

Though the for-each loop has many advantages, the problem is that it doesn't work when you want to filter a List (filtering means removing elements from the List). Can you please suggest a replacement, as even traversing by index is not a good option?
What do you mean by "filtering"? Removing certain elements from a list? If so, you can use an iterator:
for(Iterator<MyElement> it = list.iterator(); it.hasNext(); ) {
MyElement element = it.next();
if (some condition) {
it.remove();
}
}
Update (based on comments):
Consider the following example to illustrate how iterator works. Let's say we have a list that contains 'A's and 'B's:
A A B B A
We want to remove all those pesky Bs. So, using the above loop, the code will work as follows:
1. hasNext()? Yes. next(). element points to 1st A.
2. hasNext()? Yes. next(). element points to 2nd A.
3. hasNext()? Yes. next(). element points to 1st B. remove(). iterator counter does NOT change, it still points to a place where B was (technically that's not entirely correct but logically that's how it works). If you were to call remove() again now, you'd get an exception (because list element is no longer there).
4. hasNext()? Yes. next(). element points to 2nd B. The rest is the same as #3.
5. hasNext()? Yes. next(). element points to 3rd A.
6. hasNext()? No, we're done. List now has 3 elements.
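The walkthrough above can be run as a small sketch, assuming an ArrayList of strings (RemoveBs is an illustrative name):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class RemoveBs {
    // Runs the A A B B A example and returns what is left.
    static List<String> withoutBs() {
        List<String> list = new ArrayList<>(Arrays.asList("A", "A", "B", "B", "A"));
        for (Iterator<String> it = list.iterator(); it.hasNext(); ) {
            String element = it.next();
            if (element.equals("B")) {
                it.remove(); // removes the element just returned by next()
            }
        }
        return list;
    }

    // Calling remove() twice without an intervening next() is rejected.
    static boolean secondRemoveThrows() {
        List<String> list = new ArrayList<>(Arrays.asList("B"));
        Iterator<String> it = list.iterator();
        it.next();
        it.remove();
        try {
            it.remove();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```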
Update #2: remove() operation is indeed optional on iterator - but only because it is optional on an underlying collection. The bottom line here is - if your collection supports it (and all collections in Java Collection Framework do), so will the iterator. If your collection doesn't support it, you're out of luck anyway.
ChssPly76's answer is the right approach here, but I'm intrigued as to your thinking behind "traversing through index is not a good option". In many cases, the common case in particular being that of an ArrayList, it's extremely efficient. (In fact, in the ArrayList case, I believe that repeated calls to get(i++) are marginally faster than using an Iterator, though nowhere near enough to sacrifice readability.)
Broadly speaking, if the object in question implements java.util.RandomAccess, then accessing sequential elements via an index should be roughly the same speed as using an Iterator. If it doesn't (e.g. LinkedList would be a good counterexample) then you're right; but don't dismiss the option out of hand.
I have had success using the
filter(java.util.Collection collection, Predicate predicate)
method of CollectionUtils in commons collections.
http://commons.apache.org/collections/api-2.1.1/org/apache/commons/collections/CollectionUtils.html#filter(java.util.Collection,%20org.apache.commons.collections.Predicate)
If you, like me, don't like modifying a collection while iterating through its elements, or if the iterator just doesn't provide an implementation for remove, you can use a temporary collection to collect the elements you want to delete. Yes, yes, it's less efficient compared to modifying the iterator, but to me it's clearer what's happening:
List<Object> data = getListFromSomewhere();
List<Object> filter = new ArrayList<Object>();
// create Filter
for (Object item: data) {
if (throwAway(item)) {
filter.add(item);
}
}
// use Filter
for (Object item:filter) {
data.remove(item);
}
filter.clear();
filter = null;
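If you are on Java 8 or later, which is newer than anything this thread assumes, Collection.removeIf does the same filtering in one call. A minimal sketch, with FilterInPlace and dropMatching as hypothetical names:

```java
import java.util.Collection;
import java.util.function.Predicate;

class FilterInPlace {
    // Removes every element the predicate accepts; returns true if anything was removed.
    static <T> boolean dropMatching(Collection<T> data, Predicate<? super T> throwAway) {
        return data.removeIf(throwAway);
    }
}
```

Usage would look like dropMatching(data, item -> throwAway(item)), with throwAway being the same test as in the loop above.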

How to safely remove other elements from a Collection while iterating through the Collection

I'm iterating over a JRE Collection which enforces the fail-fast iterator concept, and thus will throw a ConcurrentModificationException if the Collection is modified while iterating, other than by using the Iterator.remove() method. However, I need to remove an object's "logical partner" if the object meets a condition, thus preventing the partner from also being processed. How can I do that? Perhaps by using a better collection type for this purpose?
Example.
myCollection<BusinessObject>
for (BusinessObject anObject : myCollection)
{
if (someConditionIsTrue)
{
myCollection.remove(anObjectsPartner); // throws ConcurrentModificationException
}
}
Thanks.
It's not a fault of the collection, it's the way you're using it. Modifying the collection while halfway through an iteration leads to this error (which is a good thing as the iteration would in general be impossible to continue unambiguously).
Edit: Having reread the question this approach won't work, though I'm leaving it here as an example of how to avoid this problem in the general case.
What you want is something like this:
for (Iterator<BusinessObject> iter = myCollection.iterator(); iter.hasNext(); )
{
    BusinessObject anObject = iter.next();
    if (someConditionIsTrue)
    {
        iter.remove();
    }
}
If you remove objects through the Iterator itself, it's aware of the removal and everything works as you'd expect. Note that while I think all standard collections work nicely in this respect, Iterators are not required to implement the remove() method so if you have no control over the class of myCollection (and thus the implementation class of the returned iterator) you might need to put more safety checks in there.
An alternative approach (say, if you can't guarantee the iterator supports remove() and you require this functionality) is to create a copy of the collection to iterate over, then remove the elements from the original collection.
Edit: You can probably use this latter technique to achieve what you want, but then you still end up coming back to the reason why iterators throw the exception in the first place: What should the iteration do if you remove an element it hasn't yet reached? Removing (or not) the current element is relatively well-defined, but you talk about removing the current element's partner, which I presume could be at a random point in the iterable. Since there's no clear way that this should be handled, you'll need to provide some form of logic yourself to cope with this. In which case, I'd lean towards creating and populating a new collection during the iteration, and then assigning this to the myCollection variable at the end. If this isn't possible, then keeping track of the partner elements to remove and calling myCollection.removeAll would be the way to go.
You want to remove an item from a list and continue to iterate on the same list. Can you implement a two-step solution where in step 1 you collect the items to be removed in an interim collection and in step 2 remove them after identifying them?
Some thoughts (it depends on what exactly the relationship is between the two objects in the collection):
A Map with the object as the key and the partner as the value.
A CopyOnWriteArrayList, but you have to notice when you hit the partner
Make a copy into a different Collection object, and iterate over one, removing from the other. If the original Collection can be a Set, that would certainly be helpful in removal.
You could try finding all the items to remove first and then remove them once you have finished processing the entire list. Skipping over the deleted items as you find them.
myCollection<BusinessObject>
List<BusinessObject> deletedObjects = new ArrayList<BusinessObject>(myCollection.size());
for (BusinessObject anObject : myCollection)
{
if (!deletedObjects.contains(anObject))
{
if (someConditionIsTrue)
{
deletedObjects.add(anObjectsPartner);
}
}
}
myCollection.removeAll(deletedObjects);
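A runnable sketch of this deferred-removal idea. The BusinessObject class here is a hypothetical stand-in, its removesPartner flag plays the role of someConditionIsTrue, and the answer's ArrayList is swapped for a HashSet so the contains() check does not scan a list each time:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PartnerRemoval {
    // Hypothetical stand-in for the question's BusinessObject.
    static final class BusinessObject {
        final String name;
        final boolean removesPartner; // stands in for someConditionIsTrue
        BusinessObject partner;       // may be null

        BusinessObject(String name, boolean removesPartner) {
            this.name = name;
            this.removesPartner = removesPartner;
        }
    }

    // Processes the collection, skipping partners already marked, then prunes them.
    // Returns the names that were actually processed.
    static List<String> processAndPrune(List<BusinessObject> myCollection) {
        Set<BusinessObject> deleted = new HashSet<>(); // identity-based: equals() is not overridden
        List<String> processed = new ArrayList<>();
        for (BusinessObject anObject : myCollection) {
            if (deleted.contains(anObject)) {
                continue; // its partner already ruled it out
            }
            processed.add(anObject.name);
            if (anObject.removesPartner && anObject.partner != null) {
                deleted.add(anObject.partner);
            }
        }
        myCollection.removeAll(deleted); // safe: the iteration has finished
        return processed;
    }
}
```

Note that a partner which comes earlier in the iteration order has already been processed by the time it is marked. It is still removed at the end, but it cannot be un-processed.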
CopyOnWriteArrayList will do what you want.
Why not use a Collection of all the original BusinessObjects and then a separate structure (such as a Map) which associates them (i.e. creates the partner relationship)? Put both of these as composite elements in their own class so that you can always remove the partner when a BusinessObject is removed. Don't make it the responsibility of the caller every time they need to remove a BusinessObject from the Collection.
I.e.
class BusinessObjectCollection implements Collection<BusinessObject> {
Collection<BusinessObject> objects;
Map<BusinessObject, BusinessObject> associations;
public void remove(BusinessObject o) {
...
// remove from collection and disassociate...
}
}
The best answer is the second, use an iterator.
