I went through the documentation(http://java.sun.com/javase/6/docs/api/java/util/Iterator.html) of Iterator.remove()
there remove() was described as
void remove()
Removes from the underlying collection the last element returned
by the iterator (optional operation).
This method can be called only once
per call to next. The behavior of an
iterator is unspecified if the
underlying collection is modified
while the iteration is in progress in
any way other than by calling this
method.
So can anybody tell what "optional" means.
Does this affect the robustness of operation?(Like c++ ,it does not guarantee the robustness of the operations.)
Why "optional" has been specified categorically here.
What does "modification" mean in the second line of documentation
behavior of an iterator is unspecified if the underlying collection is modified
#1: Optional means you can implement it or throw an UnsupportedOperationException
#2: This operation is optional because sometimes you just don't want your iterator's content to be modified. Or what do you understand by "robustness of operation"?
EDIT #4: behavior of an iterator is unspecified if the underlying collection is modified
Normally, you use an iterator by executing
List<String> c = new ArrayList<String>();
c.add("Item 1");
c.add("Item 2");
c.add("Item 3");
...
for (Iterator<String> i = c.iterator(); i.hasNext();)
{
String s = i.next();
...
}
If you now would want to remove an item while iterating through the list, and you would call
c.remove("Item 2");
this is not clean, possibly corrupts data in your List/Collection/... and should be avoided. Instead, remove() the item through the iterator:
i.remove();
First of all java.util.Iterator is an interface i.e. an agreement how classes that implement this interface interract with the rest of the world. It's their responsibility how they'll implement interaface's methods.
If the underlying data structure doesn't allow removal then remove() will throw an UnsupportedOperationException. For example, if you are iterating through a result set retrieved from a DB it does make sense not to implement this method.
If you iterate over some collection which is shared between concurrent threads and the other thread modifies the data iterating thread then will return undeterministic results.
It is described as being optional because not all collection classes that can give you an iterator implement the remove() method in the iterator they return. If the returned iterator doesn't implement it, an UnsupportedOperationException will be thrown.
The normal java.util.ArrayList, java.util.LinkedList and other standard collection classes all implement the remove() method in their iterators, so you can use it safely.
Related
I'm having a problem when testing a class that uses a hashset, when I iterate over the elements, I'm getting a ConcurrentModificationException even though, as far as I can tell (Single threaded app), only one thread ever accesses the class at any time. This breaks when two identical entries are added to the list.
private final HashSet<?> entries = new HashSet<>(10);
/**
* Updates an existing entry if it exists, if not, adds it to the library.
*
* #param <T> The type to add
* #param object The object to test for existence of and to update to
* #param key The class of the object
*/
public <T> void add(T object, Class<T> key) {
this.entries.stream().filter((entry) -> (object.equals(entry.getStorage()))).forEach(this.entries::remove);
this.entries.add(new ClanLibraryEntry<>(object, key));
}
Note the javadoc for HashSet:
The iterators returned by this class's iterator method are fail-fast:
if the set is modified at any time after the iterator is created, in
any way except through the iterator's own remove method, the Iterator
throws a ConcurrentModificationException.
This does not actually require multiple threads. It only requires the set being modified by something other than the Iterator itself.
You're using streams, so this code violates the general streams principle against non-interference. See the java.util.stream package documentation in the section on "Non-interference". This is essentially a generalization of the fail-fast property of iterators that others have cited. The streams implementation doesn't necessarily use the source collection's iterator, so that rule doesn't apply directly; however, the concept is the same: you mustn't modify the stream's source while the stream is in operation. The penalty may be a ConcurrentModificationException if you're lucky (because this tells you the code is doing something wrong) or if you're not lucky, inconsistent or incorrect results.
It looks the intent is simply to remove certain matching elements. The Collection interface has a default method removeIf that, given a predicate, does exactly this. You can use this instead of streams. Here's the code:
this.entries.removeIf(entry -> object.equals(entry.getStorage()));
The following are from the HashSet manual (http://docs.oracle.com/javase/7/docs/api/java/util/HashSet.html):
The iterators returned by this class's iterator method are fail-fast: if the set is modified at any time after the iterator is created, in any way except through the iterator's own remove method, the Iterator throws a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
My question is about synchronizedList method Collections Class.
Javadocs say:
It is imperative that the user manually synchronize on the returned list when iterating over it:
List list = Collections.synchronizedList(new ArrayList());
...
synchronized(list) {
Iterator i = list.iterator(); // Must be in synchronized block
while (i.hasNext())
foo(i.next());
}
Though manually synchroniziation is not required for other methods. I looked into the source code of Collections class
and found shyncronization has already been taken care for all methods like add
public boolean add(E e) {
synchronized(list) {return c.add(e);}
}
but not for iterator method. I think iterator method could have also handled synchronization in the same fashion
as above method (it would have avoided the extra work i.e manual synchronization for programmers). i am sure there
must be some concrete reason behind it but i am missing it?
public Iterator<E> iterator() {
return c.iterator(); // Must be manually synched by user!
}
A way to avoid manual synchronization from Programmer
public Iterator<E> iterator() {
synchronized(list) {
return c.iterator(); // No need to manually synched by user!
}
}
I think iterator method could have also handled synchronization in the same fashion as above method
No, it absolutely couldn't.
The iterator has no control over what your code does between calls to the individual methods on it. That's the point. Your iteration code will call hasNext() and next() repeatedly, and synchronization during those calls is feasible but irrelevant - what's important is that no other code tries to modify the list across the whole time you're iterating.
So imagine a timeline of:
t = 0: call iterator()
t = 1: call hasNext()
t = 2: call next()
// Do lots of work with the returned item
t = 10: call hasNext()
The iterator can't synchronize between the end of the call to next() at t=2 and the call to hasNext() at t=10. So if another thread tries to (say) add an item to the list at t=7, how is the iterator meant to stop it from doing so?
This is the overall problem with synchronized collections: each individual operation is synchronized, whereas typically you want a whole chunky operation to be synchronized.
If you don't synchronize the entire iteration, another thread could modify the collection as you iterate, leading to a ConccurentModificationException.
Also, the returned iterator is not thread-safe.
They could fix that by wrapping the iterator in a SynchronizedIterator that locks every method in the iterator, but that wouldn't help either – another thread could still modify the collection between two iterations, and break everything.
This is one of the reasons that the Collections.synchronized*() methods are completely useless.
For more information about proper thread-safe collection usage, see my blog.
If you want to avoid manual synchronization, you have to use a Collection like java.util.concurrent.CopyOnWriteArrayList. Every time an object is added to the list, the underlying datastructure is copyied to avaoid a concurrent modification exception.
The reason why you need manual serialization on the Iterator in your example is that the Iterator uses the same internal datastructure as the list but they are independend objects and both Iterator and list can be accessed by different threads at any arbitrary moment in time.
Another aproach would be to make a local copy of the list and iterate over the copy.
I am confused by one specific point regarding synchronization feature of LinkedHashMap. Here are the related Javadoc where I am confused. My confusion point is, why remove method is special here, which is mentioned by -- "except through the iterator's own remove method"?
http://docs.oracle.com/javase/6/docs/api/java/util/LinkedHashMap.html
The iterators returned by the iterator method of the collections returned by all of this class's collection view methods are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
thanks in advance,
Lin
Basically, you're not allowed to structurally modify a map while iterating over it, since doing so would invalidate the iterator.
Iterator.remove() is specifically exempt from this, to enable you to easily write code like this:
Iterator<E> it = coll.iterator();
while (it.hasNext()) {
E val = it.next();
if (some_condition) {
it.remove();
}
}
It isn't special, it's the normal way of removing some item from a collection when using an iterator. If an element is removed "outside" the iterator, the iterator view of the given collection becomes inconsistent as compared to the actual collection since the iterator has no way of knowing why or how the element was removed.
Iterators can be of different types. There are iterators which "operate" on a given state of the collection in which case modifying the collection outside the iterator makes no difference. This is common for immutable collections. The other type of iterator is a "fail-fast" one which throws up an exception as soon as it finds out that the iterator is now looking at an old state of the collection (i.e. the collection was modified from outside). As mentioned in the docs, LinkedHashMap uses a fail-fast iterator.
It's not special, it acquires a sort of lock on the LinkedHashMap so that you are able to remove elements just through its remove method.
This because, as specified before:
A structural modification is any operation that adds or deletes one or more mappings or, in the case of access-ordered linked hash maps, affects iteration order.
This means that you have a pending iterator on the data structure, allowing any modification would generate problems with respect to deterministic behavior (since the iterator would become inconsistent, as the underlying structure changed). So the implementation just make any structural modification fail with an exception as soon as you invoke any method if you have an open iterator on it.
final Multimap<Term, BooleanClause> terms = getTerms(bq);
for (Term t : terms.keySet()) {
Collection<BooleanClause> C = new HashSet(terms.get(t));
if (!C.isEmpty()) {
for (Iterator<BooleanClause> it = C.iterator(); it.hasNext();) {
BooleanClause c = it.next();
if(c.isSomething()) C.remove(c);
}
}
}
Not a SSCCE, but can you pick up the smell?
The Iterator for the HashSet class is a fail-fast iterator. From the documentation of the HashSet class:
The iterators returned by this class's iterator method are fail-fast:
if the set is modified at any time after the iterator is created, in
any way except through the iterator's own remove method, the Iterator
throws a ConcurrentModificationException. Thus, in the face of
concurrent modification, the iterator fails quickly and cleanly,
rather than risking arbitrary, non-deterministic behavior at an
undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed
as it is, generally speaking, impossible to make any hard guarantees
in the presence of unsynchronized concurrent modification. Fail-fast
iterators throw ConcurrentModificationException on a best-effort
basis. Therefore, it would be wrong to write a program that depended
on this exception for its correctness: the fail-fast behavior of
iterators should be used only to detect bugs.
Note the last sentence - the fact that you are catching a ConcurrentModificationException implies that another thread is modifying the collection. The same Javadoc API page also states:
If multiple threads access a hash set concurrently, and at least one
of the threads modifies the set, it must be synchronized externally.
This is typically accomplished by synchronizing on some object that
naturally encapsulates the set. If no such object exists, the set
should be "wrapped" using the Collections.synchronizedSet method. This
is best done at creation time, to prevent accidental unsynchronized
access to the set:
Set s = Collections.synchronizedSet(new HashSet(...));
I believe the references to the Javadoc are self explanatory in what ought to be done next.
Additionally, in your case, I do not see why you are not using the ImmutableSet, instead of creating a HashSet on the terms object (which could possibly be modified in the interim; I cannot see the implementation of the getTerms method, but I have a hunch that the underlying keyset is being modified). Creating a immutable set will allow the current thread to have it's own defensive copy of the original key-set.
Note, that although a ConcurrentModificationException can be prevented by using a synchronized Set (as noted in the Java API documentation), it is a prerequisite that all threads access the synchronized collection and not the backing collection directly (which might be untrue in your case as the HashSet is probably created in one thread, while the underlying collection for the MultiMap is modified by other threads). The synchronized collection classes actually maintain an internal mutex for threads to acquire access to; since you cannot access the mutex directly from other threads (and it would be quite ridiculous to do so here), you ought to look at using a defensive copy of either the keyset or of the MultiMap itself using the unmodifiableMultimap method of the MultiMaps class (you'll need to return an unmodifiable MultiMap from the getTerms method). You could also investigate the necessity of returning a synchronized MultiMap, but then again, you'll need to ensure that the mutex must be acquired by any thread to protect the underlying collection from concurrent modifications.
Note, I have deliberately omitted mentioning the use of a thread-safe HashSet for the sole reason that I'm unsure of whether concurrent access to the actual collection will be ensured; it most likely will not be the case.
Edit: ConcurrentModificationExceptions thrown on Iterator.next in a single-threaded scenario
This is with respect to the statement: if(c.isSomething()) C.remove(c); that was introduced in the edited question.
Invoking Collection.remove changes the nature of the question, for it now becomes possible to have ConcurrentModificationExceptions thrown even in a single-threaded scenario.
The possibility arises out of the use of the method itself, in conjunction with the use of the Collection's iterator, in this case the variable it that was initialized using the statement : Iterator<BooleanClause> it = C.iterator();.
The Iterator it that iterates over Collection C stores state pertinent to the current state of the Collection. In this particular case (assuming a Sun/Oracle JRE), a KeyIterator (an internal inner class of the HashMap class that is used by the HashSet) is used to iterate through the Collection. A particular characteristic of this Iterator is that it tracks the number of structural modifications performed on the Collection (the HashMap in this case) via it's Iterator.remove method.
When you invoke remove on the Collection directly, and then follow it up with an invocation of Iterator.next, the iterator throws a ConcurrentModificationException, as Iterator.next verifies whether any structural modifications of the Collection have occurred that the Iterator is unaware of. In this case, Collection.remove causes a structural modification, that is tracked by the Collection, but not by the Iterator.
To overcome this part of the problem, you must invoke Iterator.remove and not Collection.remove, for this ensures that the Iterator is now aware of the modification to the Collection. The Iterator in this case, will track the structural modification occurring through the remove method. Your code should therefore look like the following:
final Multimap<Term, BooleanClause> terms = getTerms(bq);
for (Term t : terms.keySet()) {
Collection<BooleanClause> C = new HashSet(terms.get(t));
if (!C.isEmpty()) {
for (Iterator<BooleanClause> it = C.iterator(); it.hasNext();) {
BooleanClause c = it.next();
if(c.isSomething()) it.remove(); // <-- invoke remove on the Iterator. Removes the element returned by it.next.
}
}
}
The reason is that you are trying to modify the collection outside iterator.
How it works :
When you create an iterator the collection maintains a modificationNum-variable for both the collection and the iterator independently.
1. The variable for collection is being incremented for each change made to the collection and and iterator.
2. The variable for iterator is being incremented for each change made to the iterator.
So when you call it.remove() through iterator that increases the value of both the modification-number-variable by 1.
But again when you call collection.remove() on collection directly, that increments only the value of the modification-numbervariable for the collection, but not the variable for the iterator.
And rule is : whenever the modification-number value for the iterator does not match with the original collection modification-number value, it gives ConcurrentModificationException.
Vineet Reynolds has explained in great details the reasons why collections throw a ConcurrentModificationException (thread-safety, concurrency). Swagatika has explained in great details the implementation details of this mechanism (how collection and iterator keep count of the number of modifications).
Their answers were interesting, and I upvoted them. But, in your case, the problem does not come from concurrency (you have only one thread), and implementation details, while interesting, should not be considered here.
You should only consider this part of the HashSet javadoc:
The iterators returned by this class's iterator method are fail-fast:
if the set is modified at any time after the iterator is created, in
any way except through the iterator's own remove method, the Iterator
throws a ConcurrentModificationException. Thus, in the face of
concurrent modification, the iterator fails quickly and cleanly,
rather than risking arbitrary, non-deterministic behavior at an
undetermined time in the future.
In your code, you iterate over your HashSet using its iterator, but you use the HashSet's own remove method to remove elements ( C.remove(c) ), which causes the ConcurrentModificationException. Instead, as explained in the javadoc, you should use the Iterator's own remove() method, which removes the element being currently iterated from the underlying collection.
Replace
if(c.isSomething()) C.remove(c);
with
if(c.isSomething()) it.remove();
If you want to use a more functional approach, you could create a Predicate and use Guava's Iterables.removeIf() method on the HashSet:
Predicate<BooleanClause> ignoredBooleanClausePredicate = ...;
Multimap<Term, BooleanClause> terms = getTerms(bq);
for (Term term : terms.keySet()) {
Collection<BooleanClause> booleanClauses = Sets.newHashSet(terms.get(term));
Iterables.removeIf(booleanClauses, ignoredBooleanClausePredicate);
}
PS: note that in both cases, this will only remove elements from the temporary HashSet. The Multimap won't be modified.
I have a very simple snippet of code that populates a vector, iterates through it, and then clears it. Here is basically what I'm trying in principle:
vector v = new Vector();
v.add(1);
v.add(2);
v.add(3);
ListIterator iter = v.listIterator();
while (iter.hasNext()) {
System.out.println(iter.next());
}
v.clear()
But I get a ConcurrentModificationException.
From reading up on this, apparently using "synchronized" in some fashion is the solution. But I'm seeing a couple different approaches and am wondering what is the best, simplest way to resolve this in my case (with no explicit threads involved)?
If no threads are involved, it's likely that the example you posted is incomplete.
This ConcurrentModificationException can happen simply if you modify the collection while iterating over it.
Note that this exception does not always indicate that an object has been concurrently modified by a different thread. If a single thread issues a sequence of method invocations that violates the contract of an object, the object may throw this exception. For example, if a thread modifies a collection directly while it is iterating over the collection with a fail-fast iterator, the iterator will thow this exception.
And the docs for Vector.listIterator() explicitly state that:
Note that the list iterator returned by this implementation will throw an UnsupportedOperationException in response to its remove, set and add methods unless the list's remove(int), set(int, Object), and add(int, Object) methods are overridden.
(Well actually that the docs for AbstractList.listIterator(int); Vector doesn't redefine that.)
Let me shorten that code for you:
Vector<Integer> v = new Vector<Integer>();
v.add(1);
v.add(2);
v.add(3);
for (Integer i: v) {
System.out.println(i);
}
v.clear();
You shouldn't use listIterator() unless you do forward and backward navigation. For simple iteration use the iterator() method, preferably in the for-each style I show above (where the actual iterator object is hidden from you).
And don't use Vector!!! Use ArrayList instead.