HashSet's Entries Give ConcurrentModificationException When Iterated Over - java

I'm having a problem when testing a class that uses a hashset, when I iterate over the elements, I'm getting a ConcurrentModificationException even though, as far as I can tell (Single threaded app), only one thread ever accesses the class at any time. This breaks when two identical entries are added to the list.
private final HashSet<?> entries = new HashSet<>(10);
/**
* Updates an existing entry if it exists, if not, adds it to the library.
*
* #param <T> The type to add
* #param object The object to test for existence of and to update to
* #param key The class of the object
*/
public <T> void add(T object, Class<T> key) {
this.entries.stream().filter((entry) -> (object.equals(entry.getStorage()))).forEach(this.entries::remove);
this.entries.add(new ClanLibraryEntry<>(object, key));
}

Note the javadoc for HashSet:
The iterators returned by this class's iterator method are fail-fast:
if the set is modified at any time after the iterator is created, in
any way except through the iterator's own remove method, the Iterator
throws a ConcurrentModificationException.
This does not actually require multiple threads. It only requires the set being modified by something other than the Iterator itself.

You're using streams, so this code violates the general streams principle against non-interference. See the java.util.stream package documentation in the section on "Non-interference". This is essentially a generalization of the fail-fast property of iterators that others have cited. The streams implementation doesn't necessarily use the source collection's iterator, so that rule doesn't apply directly; however, the concept is the same: you mustn't modify the stream's source while the stream is in operation. The penalty may be a ConcurrentModificationException if you're lucky (because this tells you the code is doing something wrong) or if you're not lucky, inconsistent or incorrect results.
It looks the intent is simply to remove certain matching elements. The Collection interface has a default method removeIf that, given a predicate, does exactly this. You can use this instead of streams. Here's the code:
this.entries.removeIf(entry -> object.equals(entry.getStorage()));

The following are from the HashSet manual (http://docs.oracle.com/javase/7/docs/api/java/util/HashSet.html):
The iterators returned by this class's iterator method are fail-fast: if the set is modified at any time after the iterator is created, in any way except through the iterator's own remove method, the Iterator throws a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

Related

What's the difference between returning an Iterator and an unmodifiable list?

From what I understand, both are used to return unmodifiable objects, but why would one be used over the other?
Edit: My mistake, Iterators are not unmodifiable. But I'm still unclear on why an Iterator might be used instead of a List.
Return a List when you need a list; return an Iterator when you need an iterator. They are completely different data structures that serve different purposes.
As an aside, an Iterator might be able to modify the underlying collection through the remove() method (which removes the last item returned by next()).
Edit (in response to OP's edit): You would want to return an Iterator if you specifically wanted to prevent client code from seeing the underlying list, or you want to force sequential access to the elements. It's also required that you write an iterator() method that returns an Iterator when your class implements Iterable. There's also the interesting case of returning an Iterator that iterates through an undetermined collection of items, that might not even exist in memory until they are requested. (Think, for instance, of state space searching in an AI-based application.)

How to prevent an ArrayList to be Structually Modified?

I am mapping a Table... say Employee to an ArrayList<Employee> which is a class level variable and I will be using it in multiple places instead of hitting the Data Base each time.
I want to keep it as read only, ie. no one can add or remove an element from the ArrayList once populated.
Can someone suggest a way to achieve this?
Edit: On modification I want some kind of Exception to be thrown.
Thanks in advance
You can use Collections.unmodifiableList. It will pass through reads to the backing list, so any updates to the backing (original) list will affect the immutable view that other classes see.
If you want an unmodifiable list that is not updated when the master list is updated, you'll need to call a copy constructor:
Collections.unmodifiableList(new ArrayList<Employee>(masterList));
Guava's immutable list is also an option in this case.
unmodifiable list is what you want here is the doc,and guava has an immutable list
You can provide a getter which will return a copy of the existing list.
Use a copy constructor for that:
class Employee {
private String id;
...
public Employee(Employee other) {
this.id = other.id;
...
}
}
List<Employee> getEmployeeData()
{
//create a new list using the existing one via copy constructor
return "Newly constructed list";
}
Other approach which comes to my mind is to get a private Iterator on the List after populating it, so if the list is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove or add methods, the iterator will throw a ConcurrentModificationException. But note that the fail-fast behavior of an iterator cannot be guaranteed.
From javaDoc:
The iterators returned by this class's iterator and listIterator methods are fail-fast: if the list is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove or add methods, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness

synchronization issue on LinkedHashMap

I am confused by one specific point regarding synchronization feature of LinkedHashMap. Here are the related Javadoc where I am confused. My confusion point is, why remove method is special here, which is mentioned by -- "except through the iterator's own remove method"?
http://docs.oracle.com/javase/6/docs/api/java/util/LinkedHashMap.html
The iterators returned by the iterator method of the collections returned by all of this class's collection view methods are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
thanks in advance,
Lin
Basically, you're not allowed to structurally modify a map while iterating over it, since doing so would invalidate the iterator.
Iterator.remove() is specifically exempt from this, to enable you to easily write code like this:
Iterator<E> it = coll.iterator();
while (it.hasNext()) {
E val = it.next();
if (some_condition) {
it.remove();
}
}
It isn't special, it's the normal way of removing some item from a collection when using an iterator. If an element is removed "outside" the iterator, the iterator view of the given collection becomes inconsistent as compared to the actual collection since the iterator has no way of knowing why or how the element was removed.
Iterators can be of different types. There are iterators which "operate" on a given state of the collection in which case modifying the collection outside the iterator makes no difference. This is common for immutable collections. The other type of iterator is a "fail-fast" one which throws up an exception as soon as it finds out that the iterator is now looking at an old state of the collection (i.e. the collection was modified from outside). As mentioned in the docs, LinkedHashMap uses a fail-fast iterator.
It's not special, it acquires a sort of lock on the LinkedHashMap so that you are able to remove elements just through its remove method.
This because, as specified before:
A structural modification is any operation that adds or deletes one or more mappings or, in the case of access-ordered linked hash maps, affects iteration order.
This means that you have a pending iterator on the data structure, allowing any modification would generate problems with respect to deterministic behavior (since the iterator would become inconsistent, as the underlying structure changed). So the implementation just make any structural modification fail with an exception as soon as you invoke any method if you have an open iterator on it.

Why does it.next() throw java.util.ConcurrentModificationException?

final Multimap<Term, BooleanClause> terms = getTerms(bq);
for (Term t : terms.keySet()) {
Collection<BooleanClause> C = new HashSet(terms.get(t));
if (!C.isEmpty()) {
for (Iterator<BooleanClause> it = C.iterator(); it.hasNext();) {
BooleanClause c = it.next();
if(c.isSomething()) C.remove(c);
}
}
}
Not a SSCCE, but can you pick up the smell?
The Iterator for the HashSet class is a fail-fast iterator. From the documentation of the HashSet class:
The iterators returned by this class's iterator method are fail-fast:
if the set is modified at any time after the iterator is created, in
any way except through the iterator's own remove method, the Iterator
throws a ConcurrentModificationException. Thus, in the face of
concurrent modification, the iterator fails quickly and cleanly,
rather than risking arbitrary, non-deterministic behavior at an
undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed
as it is, generally speaking, impossible to make any hard guarantees
in the presence of unsynchronized concurrent modification. Fail-fast
iterators throw ConcurrentModificationException on a best-effort
basis. Therefore, it would be wrong to write a program that depended
on this exception for its correctness: the fail-fast behavior of
iterators should be used only to detect bugs.
Note the last sentence - the fact that you are catching a ConcurrentModificationException implies that another thread is modifying the collection. The same Javadoc API page also states:
If multiple threads access a hash set concurrently, and at least one
of the threads modifies the set, it must be synchronized externally.
This is typically accomplished by synchronizing on some object that
naturally encapsulates the set. If no such object exists, the set
should be "wrapped" using the Collections.synchronizedSet method. This
is best done at creation time, to prevent accidental unsynchronized
access to the set:
Set s = Collections.synchronizedSet(new HashSet(...));
I believe the references to the Javadoc are self explanatory in what ought to be done next.
Additionally, in your case, I do not see why you are not using the ImmutableSet, instead of creating a HashSet on the terms object (which could possibly be modified in the interim; I cannot see the implementation of the getTerms method, but I have a hunch that the underlying keyset is being modified). Creating a immutable set will allow the current thread to have it's own defensive copy of the original key-set.
Note, that although a ConcurrentModificationException can be prevented by using a synchronized Set (as noted in the Java API documentation), it is a prerequisite that all threads access the synchronized collection and not the backing collection directly (which might be untrue in your case as the HashSet is probably created in one thread, while the underlying collection for the MultiMap is modified by other threads). The synchronized collection classes actually maintain an internal mutex for threads to acquire access to; since you cannot access the mutex directly from other threads (and it would be quite ridiculous to do so here), you ought to look at using a defensive copy of either the keyset or of the MultiMap itself using the unmodifiableMultimap method of the MultiMaps class (you'll need to return an unmodifiable MultiMap from the getTerms method). You could also investigate the necessity of returning a synchronized MultiMap, but then again, you'll need to ensure that the mutex must be acquired by any thread to protect the underlying collection from concurrent modifications.
Note, I have deliberately omitted mentioning the use of a thread-safe HashSet for the sole reason that I'm unsure of whether concurrent access to the actual collection will be ensured; it most likely will not be the case.
Edit: ConcurrentModificationExceptions thrown on Iterator.next in a single-threaded scenario
This is with respect to the statement: if(c.isSomething()) C.remove(c); that was introduced in the edited question.
Invoking Collection.remove changes the nature of the question, for it now becomes possible to have ConcurrentModificationExceptions thrown even in a single-threaded scenario.
The possibility arises out of the use of the method itself, in conjunction with the use of the Collection's iterator, in this case the variable it that was initialized using the statement : Iterator<BooleanClause> it = C.iterator();.
The Iterator it that iterates over Collection C stores state pertinent to the current state of the Collection. In this particular case (assuming a Sun/Oracle JRE), a KeyIterator (an internal inner class of the HashMap class that is used by the HashSet) is used to iterate through the Collection. A particular characteristic of this Iterator is that it tracks the number of structural modifications performed on the Collection (the HashMap in this case) via it's Iterator.remove method.
When you invoke remove on the Collection directly, and then follow it up with an invocation of Iterator.next, the iterator throws a ConcurrentModificationException, as Iterator.next verifies whether any structural modifications of the Collection have occurred that the Iterator is unaware of. In this case, Collection.remove causes a structural modification, that is tracked by the Collection, but not by the Iterator.
To overcome this part of the problem, you must invoke Iterator.remove and not Collection.remove, for this ensures that the Iterator is now aware of the modification to the Collection. The Iterator in this case, will track the structural modification occurring through the remove method. Your code should therefore look like the following:
final Multimap<Term, BooleanClause> terms = getTerms(bq);
for (Term t : terms.keySet()) {
Collection<BooleanClause> C = new HashSet(terms.get(t));
if (!C.isEmpty()) {
for (Iterator<BooleanClause> it = C.iterator(); it.hasNext();) {
BooleanClause c = it.next();
if(c.isSomething()) it.remove(); // <-- invoke remove on the Iterator. Removes the element returned by it.next.
}
}
}
The reason is that you are trying to modify the collection outside iterator.
How it works :
When you create an iterator the collection maintains a modificationNum-variable for both the collection and the iterator independently.
1. The variable for collection is being incremented for each change made to the collection and and iterator.
2. The variable for iterator is being incremented for each change made to the iterator.
So when you call it.remove() through iterator that increases the value of both the modification-number-variable by 1.
But again when you call collection.remove() on collection directly, that increments only the value of the modification-numbervariable for the collection, but not the variable for the iterator.
And rule is : whenever the modification-number value for the iterator does not match with the original collection modification-number value, it gives ConcurrentModificationException.
Vineet Reynolds has explained in great details the reasons why collections throw a ConcurrentModificationException (thread-safety, concurrency). Swagatika has explained in great details the implementation details of this mechanism (how collection and iterator keep count of the number of modifications).
Their answers were interesting, and I upvoted them. But, in your case, the problem does not come from concurrency (you have only one thread), and implementation details, while interesting, should not be considered here.
You should only consider this part of the HashSet javadoc:
The iterators returned by this class's iterator method are fail-fast:
if the set is modified at any time after the iterator is created, in
any way except through the iterator's own remove method, the Iterator
throws a ConcurrentModificationException. Thus, in the face of
concurrent modification, the iterator fails quickly and cleanly,
rather than risking arbitrary, non-deterministic behavior at an
undetermined time in the future.
In your code, you iterate over your HashSet using its iterator, but you use the HashSet's own remove method to remove elements ( C.remove(c) ), which causes the ConcurrentModificationException. Instead, as explained in the javadoc, you should use the Iterator's own remove() method, which removes the element being currently iterated from the underlying collection.
Replace
if(c.isSomething()) C.remove(c);
with
if(c.isSomething()) it.remove();
If you want to use a more functional approach, you could create a Predicate and use Guava's Iterables.removeIf() method on the HashSet:
Predicate<BooleanClause> ignoredBooleanClausePredicate = ...;
Multimap<Term, BooleanClause> terms = getTerms(bq);
for (Term term : terms.keySet()) {
Collection<BooleanClause> booleanClauses = Sets.newHashSet(terms.get(term));
Iterables.removeIf(booleanClauses, ignoredBooleanClausePredicate);
}
PS: note that in both cases, this will only remove elements from the temporary HashSet. The Multimap won't be modified.

why iterator.remove() has been described as optional operation?

I went through the documentation(http://java.sun.com/javase/6/docs/api/java/util/Iterator.html) of Iterator.remove()
there remove() was described as
void remove()
Removes from the underlying collection the last element returned
by the iterator (optional operation).
This method can be called only once
per call to next. The behavior of an
iterator is unspecified if the
underlying collection is modified
while the iteration is in progress in
any way other than by calling this
method.
So can anybody tell what "optional" means.
Does this affect the robustness of operation?(Like c++ ,it does not guarantee the robustness of the operations.)
Why "optional" has been specified categorically here.
What does "modification" mean in the second line of documentation
behavior of an iterator is unspecified if the underlying collection is modified
#1: Optional means you can implement it or throw an UnsupportedOperationException
#2: This operation is optional because sometimes you just don't want your iterator's content to be modified. Or what do you understand by "robustness of operation"?
EDIT #4: behavior of an iterator is unspecified if the underlying collection is modified
Normally, you use an iterator by executing
List<String> c = new ArrayList<String>();
c.add("Item 1");
c.add("Item 2");
c.add("Item 3");
...
for (Iterator<String> i = c.iterator(); i.hasNext();)
{
String s = i.next();
...
}
If you now would want to remove an item while iterating through the list, and you would call
c.remove("Item 2");
this is not clean, possibly corrupts data in your List/Collection/... and should be avoided. Instead, remove() the item through the iterator:
i.remove();
First of all java.util.Iterator is an interface i.e. an agreement how classes that implement this interface interract with the rest of the world. It's their responsibility how they'll implement interaface's methods.
If the underlying data structure doesn't allow removal then remove() will throw an UnsupportedOperationException. For example, if you are iterating through a result set retrieved from a DB it does make sense not to implement this method.
If you iterate over some collection which is shared between concurrent threads and the other thread modifies the data iterating thread then will return undeterministic results.
It is described as being optional because not all collection classes that can give you an iterator implement the remove() method in the iterator they return. If the returned iterator doesn't implement it, an UnsupportedOperationException will be thrown.
The normal java.util.ArrayList, java.util.LinkedList and other standard collection classes all implement the remove() method in their iterators, so you can use it safely.

Categories

Resources