ConcurrentModificationException with vector and clear() - java

I have a very simple snippet of code that populates a vector, iterates through it, and then clears it. Here is basically what I'm trying in principle:
vector v = new Vector();
v.add(1);
v.add(2);
v.add(3);
ListIterator iter = v.listIterator();
while (iter.hasNext()) {
System.out.println(iter.next());
}
v.clear()
But I get a ConcurrentModificationException.
From reading up on this, apparently using "synchronized" in some fashion is the solution. But I'm seeing a couple different approaches and am wondering what is the best, simplest way to resolve this in my case (with no explicit threads involved)?

If no threads are involved, it's likely that the example you posted is incomplete.
This ConcurrentModificationException can happen simply if you modify the collection while iterating over it.
Note that this exception does not always indicate that an object has been concurrently modified by a different thread. If a single thread issues a sequence of method invocations that violates the contract of an object, the object may throw this exception. For example, if a thread modifies a collection directly while it is iterating over the collection with a fail-fast iterator, the iterator will thow this exception.
And the docs for Vector.listIterator() explicitly state that:
Note that the list iterator returned by this implementation will throw an UnsupportedOperationException in response to its remove, set and add methods unless the list's remove(int), set(int, Object), and add(int, Object) methods are overridden.
(Well actually that the docs for AbstractList.listIterator(int); Vector doesn't redefine that.)

Let me shorten that code for you:
Vector<Integer> v = new Vector<Integer>();
v.add(1);
v.add(2);
v.add(3);
for (Integer i: v) {
System.out.println(i);
}
v.clear();
You shouldn't use listIterator() unless you do forward and backward navigation. For simple iteration use the iterator() method, preferably in the for-each style I show above (where the actual iterator object is hidden from you).
And don't use Vector!!! Use ArrayList instead.

Related

Can a iterator change the collection it is iterating over? Java

I'm attempting to use the number of iterations from an iterator as a counter, but was wondering the ramifications of doing so.
private int length(Iterator<?> it) {
int i = 0;
while(it.hasNext()) {
it.next();
i++;
}
return i;
}
This works fine, but I'm worried about what the iterator may do behind the scenes. Perhaps as I'm iterating over a stack, it pops the items off the stack, or if I'm using a priority queue, and it modifies the priority.
The javadoc say this about iterator:
next
E next()
Returns the next element in the iteration.
Returns:
the next element in the iteration
Throws:
NoSuchElementException - if the iteration has no more elements
I don't see a guarantee that iterating over this unknown collection won't modify it. Am I thinking of unrealistic edge cases, or is this a concern? Is there a better way?
The Iterator simply provides an interface into some sort of stream, therefore not only is it perfectly possible for next() to destroy data in some way, but it's even possible for the data in an Iterator to be unique and irreplaceable.
We could come up with more direct examples, but an easy one is the Iterator in DirectoryStream. While a DirectoryStream is technically Iterable, it only allows one Iterator to be constructed, so if you tried to do the following:
Path dir = ...
try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
int count = length(stream.iterator());
for (Path entry: stream) {
...
}
}
You would get an exception in the foreach block, because the stream can only be iterated once. So in summary, it is possible for your length() method to change objects and lose data.
Furthermore, there's no reason an Iterator has to be associated with some separate data-store. Take for example an answer I gave a few months ago providing a clean way to select n random numbers. By using an infinite Iterator we are able to provide, filter, and pass around arbitrarily large amounts of random data lazily, no need to store it all at once, or even compute them until they're needed. Because the Iterator doesn't back any data structure, querying it is obviously destructive.
Now that said, these examples don't make your method bad. Notice that the Guava library (which everyone should be using) provides an Iterators class with exactly the behavior you detail above, called size() to conform with the Collections Framework. The burden is then on the user of such methods to be aware of what sort of data they're working with, and avoid making careless calls such as trying to count the number of results in an Iterator that they know cannot be replaced.
As far as I can tell, the Collection specification does not explicitly state that iterating over a collection does not modify it, but no classes in the standard library show that behaviour (actually at least one does, see dimo414's answer), so any class that did would be highly suspect. I don't think you need to worry about this.
Note that the Guava library implements Iterators.size() and Iterables.size() in the same way that you are, so clearly they find it safe in the general case.
No, iterating over a collection will not modify the collection. The Iterator class does have a remove() method, which is the only safe way of removing an element from a collection during iteration. But simply calling hasNext() and next() will not modify the collection.
Keep in mind that if you modify the object returned by next(), those changes will be present in your collection.
Think about it -- methods that return things are (if they are written correctly) accessor methods, meaning that they just return data. They do not modify it (they are not mutator methods).
Here's an example I had on my disk of how an iterator might be implemented. As you can see, no values are actually modified.
public class ArraySetIterator implements Iterator
{
private int nextIndex;
private ArraySet theArraySet;
public ArraySetIterator (ArraySet a)
{
this.nextIndex = 0;
this.theArraySet = a;
}
public boolean hasNext ()
{
return this.nextIndex < this.theArraySet.size();
}
public Object next()
{
return this.theArraySet.get(this.nextIndex++);
}
}

Need to manually synchronize the Synchronized list while iteration when it could be avoided?

My question is about synchronizedList method Collections Class.
Javadocs say:
It is imperative that the user manually synchronize on the returned list when iterating over it:
List list = Collections.synchronizedList(new ArrayList());
...
synchronized(list) {
Iterator i = list.iterator(); // Must be in synchronized block
while (i.hasNext())
foo(i.next());
}
Though manually synchroniziation is not required for other methods. I looked into the source code of Collections class
and found shyncronization has already been taken care for all methods like add
public boolean add(E e) {
synchronized(list) {return c.add(e);}
}
but not for iterator method. I think iterator method could have also handled synchronization in the same fashion
as above method (it would have avoided the extra work i.e manual synchronization for programmers). i am sure there
must be some concrete reason behind it but i am missing it?
public Iterator<E> iterator() {
return c.iterator(); // Must be manually synched by user!
}
A way to avoid manual synchronization from Programmer
public Iterator<E> iterator() {
synchronized(list) {
return c.iterator(); // No need to manually synched by user!
}
}
I think iterator method could have also handled synchronization in the same fashion as above method
No, it absolutely couldn't.
The iterator has no control over what your code does between calls to the individual methods on it. That's the point. Your iteration code will call hasNext() and next() repeatedly, and synchronization during those calls is feasible but irrelevant - what's important is that no other code tries to modify the list across the whole time you're iterating.
So imagine a timeline of:
t = 0: call iterator()
t = 1: call hasNext()
t = 2: call next()
// Do lots of work with the returned item
t = 10: call hasNext()
The iterator can't synchronize between the end of the call to next() at t=2 and the call to hasNext() at t=10. So if another thread tries to (say) add an item to the list at t=7, how is the iterator meant to stop it from doing so?
This is the overall problem with synchronized collections: each individual operation is synchronized, whereas typically you want a whole chunky operation to be synchronized.
If you don't synchronize the entire iteration, another thread could modify the collection as you iterate, leading to a ConccurentModificationException.
Also, the returned iterator is not thread-safe.
They could fix that by wrapping the iterator in a SynchronizedIterator that locks every method in the iterator, but that wouldn't help either – another thread could still modify the collection between two iterations, and break everything.
This is one of the reasons that the Collections.synchronized*() methods are completely useless.
For more information about proper thread-safe collection usage, see my blog.
If you want to avoid manual synchronization, you have to use a Collection like java.util.concurrent.CopyOnWriteArrayList. Every time an object is added to the list, the underlying datastructure is copyied to avaoid a concurrent modification exception.
The reason why you need manual serialization on the Iterator in your example is that the Iterator uses the same internal datastructure as the list but they are independend objects and both Iterator and list can be accessed by different threads at any arbitrary moment in time.
Another aproach would be to make a local copy of the list and iterate over the copy.

Need of Iterator class in Java?

The question might be pretty vague I know. But the reason I ask this is because the class must have been made with some thought in mind.
This question came into my mind while browsing through a few questions here on SO.
Consider the following code:
class A
{
private int myVar;
A(int varAsArg)
{
myVar = varAsArg;
}
public static void main(String args[])
{
List<A> myList = new LinkedList<A>();
myList.add(new A(1));
myList.add(new A(2));
myList.add(new A(3));
//I can iterate manually like this:
for(A obj : myList)
System.out.println(obj.myVar);
//Or I can use an Iterator as well:
for(Iterator<A> i = myList.iterator(); i.hasNext();)
{
A obj = i.next();
System.out.println(obj.myVar);
}
}
}
So as you can see from the above code, I have a substitute for iterating using a for loop, whereas, I could do the same using the Iterator class' hasNext() and next() method. Similarly there can be an example for the remove() method. And the experienced users had commented on the other answers to use the Iterator class instead of using the for loop to iterate through the List. Why?
What confuses me even more is that the Iterator class has only three methods. And the functionality of those can be achieved with writing a little different code as well.
Some people might argue that the functionality of many classes can be achieved by writing one's own code instead of using the class made for the purpose. Yes,true. But as I said, Iterator class has only three methods. So why go through the hassle of creating an extra class when the same job can be done with a simple block of code which is not way too complicated to understand either.
EDIT:
While I'm at it, since many of the answers say that I can't achieve the remove functionality without using Iterator,I would just like to know if the following is wrong, or will it have some undesirable result.
for(A obj : myList)
{
if(obj.myVar == 1)
myList.remove(obj);
}
Doesn't the above code snippet do the same thing as remove() ?
Iterator came long before the for statement that you show in the evolution of Java. So that's why it's there. Also if you want to remove something, using Iterator.remove() is the only way you can do it (you can't use the for statement for that).
First of all, the for-each construct actually uses the Iterator interface under the covers. It does not, however, expose the underlying Iterator instance to user code, so you can't call methods on it.
This means that there are some things that require explicit use of the Iterator interface, and cannot be achieved by using a for-each loop.
Removing the current element is one such use case.
For other ideas, see the ListIterator interface. It is a bidirectional iterator that supports inserting elements and changing the element under the cursor. None of this can be done with a for-each loop.
for(A obj : myList)
{
if(obj.myVar == 1)
myList.remove(obj);
}
Doesn't the above code snippet do the same thing as remove() ?
No, it does not. All standard containers that I know of will throw ConcurrentModificationException when you try to do this. Even if it were allowed to work, it is ambiguous (what if obj appears in the list twice?) and inefficient (for linked lists, it would require linear instead of constant time).
The foreach construct (for (X x: list)) actually uses Iterator as its implementation internally. You can feed it any Iterable as a source of elements.
And, as others already remarked: Iterator is longer in Java than foreach, and it provides remove().
Also: how else would you implement your own provider class (myList in your example)? You make it Iterable and implement a method that creates an Iterator.
For one thing, Iterator was created way before the foreach loop (shown in your code sample above) was introduced into Java. (The former came in Java2, the latter only in Java5).
Since Java5, indeed the foreach loop is the preferred idiom for the most common scenario (when you are iterating through a single Iterable at a time, in the default order, and do not need to remove or index elements). Note though that the foreach uses an iterator in the background for standard collection classes; in other words it is just syntactic sugar.
Iterator, listIterator both are used to allow different permission to user, like list iterator have 9 methods but iterator have only 3 methods, but have remove functionality which you can't achieve with for loop. Enumeration is another thing which is also used to give only read permissions.
Iterator is an implementation of the classical GoF design pattern. In that way you can achieve clear behaviour separation from the 'technical code' which iterates (the Iterator) and your business code.
Imagine you have to change the 'next' behaviour (say, by getting not the next element but the next EVEN element). If you rely only on for loops you will have to change manually every single for loop, in a way like this
for (int i; i < list.size(); i = i+2)
while if you use an Iterator you can simply override/rewrite the "next()" and "hasNext()" methods and the change will be visible everywhere in your application.
I think answer to your question is abstraction. Iterator is written because to abstract iterating over different set of collections.
Every collection has different methods to iterate over their elements. ArrayList has indexed access. Queues has poll and peek methods. Stack has pop and peek.
Usually you only need to iterate over elements so Iterator comes into play. You do not care about which type of Collection you need to iterate. You only call iterator() method and user Iterator object itself to do this.
If you ask why not put same methods on Collection interface and get rid of extra object creation. You need to know your current position in collection so you can not implement next method in Collection because you can not use it on different locations because every time you call next() method it will increment index (simplifying every collection has different implementation) so you will skip some objects if you use same collection at different places. Also if collection support concurrency than you can not write a multi-thread safe next() method in collection.
It is usually not safe to remove an object from collection iterating by other means than iterator. Iterator.remove() method is safest way to do it. For ArrayList example:
for(int i=0;i

Why does it.next() throw java.util.ConcurrentModificationException?

final Multimap<Term, BooleanClause> terms = getTerms(bq);
for (Term t : terms.keySet()) {
Collection<BooleanClause> C = new HashSet(terms.get(t));
if (!C.isEmpty()) {
for (Iterator<BooleanClause> it = C.iterator(); it.hasNext();) {
BooleanClause c = it.next();
if(c.isSomething()) C.remove(c);
}
}
}
Not a SSCCE, but can you pick up the smell?
The Iterator for the HashSet class is a fail-fast iterator. From the documentation of the HashSet class:
The iterators returned by this class's iterator method are fail-fast:
if the set is modified at any time after the iterator is created, in
any way except through the iterator's own remove method, the Iterator
throws a ConcurrentModificationException. Thus, in the face of
concurrent modification, the iterator fails quickly and cleanly,
rather than risking arbitrary, non-deterministic behavior at an
undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed
as it is, generally speaking, impossible to make any hard guarantees
in the presence of unsynchronized concurrent modification. Fail-fast
iterators throw ConcurrentModificationException on a best-effort
basis. Therefore, it would be wrong to write a program that depended
on this exception for its correctness: the fail-fast behavior of
iterators should be used only to detect bugs.
Note the last sentence - the fact that you are catching a ConcurrentModificationException implies that another thread is modifying the collection. The same Javadoc API page also states:
If multiple threads access a hash set concurrently, and at least one
of the threads modifies the set, it must be synchronized externally.
This is typically accomplished by synchronizing on some object that
naturally encapsulates the set. If no such object exists, the set
should be "wrapped" using the Collections.synchronizedSet method. This
is best done at creation time, to prevent accidental unsynchronized
access to the set:
Set s = Collections.synchronizedSet(new HashSet(...));
I believe the references to the Javadoc are self explanatory in what ought to be done next.
Additionally, in your case, I do not see why you are not using the ImmutableSet, instead of creating a HashSet on the terms object (which could possibly be modified in the interim; I cannot see the implementation of the getTerms method, but I have a hunch that the underlying keyset is being modified). Creating a immutable set will allow the current thread to have it's own defensive copy of the original key-set.
Note, that although a ConcurrentModificationException can be prevented by using a synchronized Set (as noted in the Java API documentation), it is a prerequisite that all threads access the synchronized collection and not the backing collection directly (which might be untrue in your case as the HashSet is probably created in one thread, while the underlying collection for the MultiMap is modified by other threads). The synchronized collection classes actually maintain an internal mutex for threads to acquire access to; since you cannot access the mutex directly from other threads (and it would be quite ridiculous to do so here), you ought to look at using a defensive copy of either the keyset or of the MultiMap itself using the unmodifiableMultimap method of the MultiMaps class (you'll need to return an unmodifiable MultiMap from the getTerms method). You could also investigate the necessity of returning a synchronized MultiMap, but then again, you'll need to ensure that the mutex must be acquired by any thread to protect the underlying collection from concurrent modifications.
Note, I have deliberately omitted mentioning the use of a thread-safe HashSet for the sole reason that I'm unsure of whether concurrent access to the actual collection will be ensured; it most likely will not be the case.
Edit: ConcurrentModificationExceptions thrown on Iterator.next in a single-threaded scenario
This is with respect to the statement: if(c.isSomething()) C.remove(c); that was introduced in the edited question.
Invoking Collection.remove changes the nature of the question, for it now becomes possible to have ConcurrentModificationExceptions thrown even in a single-threaded scenario.
The possibility arises out of the use of the method itself, in conjunction with the use of the Collection's iterator, in this case the variable it that was initialized using the statement : Iterator<BooleanClause> it = C.iterator();.
The Iterator it that iterates over Collection C stores state pertinent to the current state of the Collection. In this particular case (assuming a Sun/Oracle JRE), a KeyIterator (an internal inner class of the HashMap class that is used by the HashSet) is used to iterate through the Collection. A particular characteristic of this Iterator is that it tracks the number of structural modifications performed on the Collection (the HashMap in this case) via it's Iterator.remove method.
When you invoke remove on the Collection directly, and then follow it up with an invocation of Iterator.next, the iterator throws a ConcurrentModificationException, as Iterator.next verifies whether any structural modifications of the Collection have occurred that the Iterator is unaware of. In this case, Collection.remove causes a structural modification, that is tracked by the Collection, but not by the Iterator.
To overcome this part of the problem, you must invoke Iterator.remove and not Collection.remove, for this ensures that the Iterator is now aware of the modification to the Collection. The Iterator in this case, will track the structural modification occurring through the remove method. Your code should therefore look like the following:
final Multimap<Term, BooleanClause> terms = getTerms(bq);
for (Term t : terms.keySet()) {
Collection<BooleanClause> C = new HashSet(terms.get(t));
if (!C.isEmpty()) {
for (Iterator<BooleanClause> it = C.iterator(); it.hasNext();) {
BooleanClause c = it.next();
if(c.isSomething()) it.remove(); // <-- invoke remove on the Iterator. Removes the element returned by it.next.
}
}
}
The reason is that you are trying to modify the collection outside iterator.
How it works :
When you create an iterator the collection maintains a modificationNum-variable for both the collection and the iterator independently.
1. The variable for collection is being incremented for each change made to the collection and and iterator.
2. The variable for iterator is being incremented for each change made to the iterator.
So when you call it.remove() through iterator that increases the value of both the modification-number-variable by 1.
But again when you call collection.remove() on collection directly, that increments only the value of the modification-numbervariable for the collection, but not the variable for the iterator.
And rule is : whenever the modification-number value for the iterator does not match with the original collection modification-number value, it gives ConcurrentModificationException.
Vineet Reynolds has explained in great details the reasons why collections throw a ConcurrentModificationException (thread-safety, concurrency). Swagatika has explained in great details the implementation details of this mechanism (how collection and iterator keep count of the number of modifications).
Their answers were interesting, and I upvoted them. But, in your case, the problem does not come from concurrency (you have only one thread), and implementation details, while interesting, should not be considered here.
You should only consider this part of the HashSet javadoc:
The iterators returned by this class's iterator method are fail-fast:
if the set is modified at any time after the iterator is created, in
any way except through the iterator's own remove method, the Iterator
throws a ConcurrentModificationException. Thus, in the face of
concurrent modification, the iterator fails quickly and cleanly,
rather than risking arbitrary, non-deterministic behavior at an
undetermined time in the future.
In your code, you iterate over your HashSet using its iterator, but you use the HashSet's own remove method to remove elements ( C.remove(c) ), which causes the ConcurrentModificationException. Instead, as explained in the javadoc, you should use the Iterator's own remove() method, which removes the element being currently iterated from the underlying collection.
Replace
if(c.isSomething()) C.remove(c);
with
if(c.isSomething()) it.remove();
If you want to use a more functional approach, you could create a Predicate and use Guava's Iterables.removeIf() method on the HashSet:
Predicate<BooleanClause> ignoredBooleanClausePredicate = ...;
Multimap<Term, BooleanClause> terms = getTerms(bq);
for (Term term : terms.keySet()) {
Collection<BooleanClause> booleanClauses = Sets.newHashSet(terms.get(term));
Iterables.removeIf(booleanClauses, ignoredBooleanClausePredicate);
}
PS: note that in both cases, this will only remove elements from the temporary HashSet. The Multimap won't be modified.

why iterator.remove() has been described as optional operation?

I went through the documentation(http://java.sun.com/javase/6/docs/api/java/util/Iterator.html) of Iterator.remove()
there remove() was described as
void remove()
Removes from the underlying collection the last element returned
by the iterator (optional operation).
This method can be called only once
per call to next. The behavior of an
iterator is unspecified if the
underlying collection is modified
while the iteration is in progress in
any way other than by calling this
method.
So can anybody tell what "optional" means.
Does this affect the robustness of operation?(Like c++ ,it does not guarantee the robustness of the operations.)
Why "optional" has been specified categorically here.
What does "modification" mean in the second line of documentation
behavior of an iterator is unspecified if the underlying collection is modified
#1: Optional means you can implement it or throw an UnsupportedOperationException
#2: This operation is optional because sometimes you just don't want your iterator's content to be modified. Or what do you understand by "robustness of operation"?
EDIT #4: behavior of an iterator is unspecified if the underlying collection is modified
Normally, you use an iterator by executing
List<String> c = new ArrayList<String>();
c.add("Item 1");
c.add("Item 2");
c.add("Item 3");
...
for (Iterator<String> i = c.iterator(); i.hasNext();)
{
String s = i.next();
...
}
If you now would want to remove an item while iterating through the list, and you would call
c.remove("Item 2");
this is not clean, possibly corrupts data in your List/Collection/... and should be avoided. Instead, remove() the item through the iterator:
i.remove();
First of all java.util.Iterator is an interface i.e. an agreement how classes that implement this interface interract with the rest of the world. It's their responsibility how they'll implement interaface's methods.
If the underlying data structure doesn't allow removal then remove() will throw an UnsupportedOperationException. For example, if you are iterating through a result set retrieved from a DB it does make sense not to implement this method.
If you iterate over some collection which is shared between concurrent threads and the other thread modifies the data iterating thread then will return undeterministic results.
It is described as being optional because not all collection classes that can give you an iterator implement the remove() method in the iterator they return. If the returned iterator doesn't implement it, an UnsupportedOperationException will be thrown.
The normal java.util.ArrayList, java.util.LinkedList and other standard collection classes all implement the remove() method in their iterators, so you can use it safely.

Categories

Resources