synchronization issue on LinkedHashMap - java

I am confused by one specific point about the synchronization behavior of LinkedHashMap. In the Javadoc excerpt below, why is the remove method singled out by the phrase "except through the iterator's own remove method"?
http://docs.oracle.com/javase/6/docs/api/java/util/LinkedHashMap.html
The iterators returned by the iterator method of the collections returned by all of this class's collection view methods are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
thanks in advance,
Lin

Basically, you're not allowed to structurally modify a map while iterating over it, since doing so would invalidate the iterator.
Iterator.remove() is specifically exempt from this, to enable you to easily write code like this:
Iterator<E> it = coll.iterator();
while (it.hasNext()) {
    E val = it.next();
    if (some_condition) {
        it.remove();
    }
}

It isn't special, it's the normal way of removing some item from a collection when using an iterator. If an element is removed "outside" the iterator, the iterator view of the given collection becomes inconsistent as compared to the actual collection since the iterator has no way of knowing why or how the element was removed.
Iterators can be of different types. There are iterators which "operate" on a given state of the collection in which case modifying the collection outside the iterator makes no difference. This is common for immutable collections. The other type of iterator is a "fail-fast" one which throws up an exception as soon as it finds out that the iterator is now looking at an old state of the collection (i.e. the collection was modified from outside). As mentioned in the docs, LinkedHashMap uses a fail-fast iterator.
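To make the difference concrete, here is a minimal sketch of both cases on a LinkedHashMap. The class name FailFastDemo, its method names, and the sample data are made up for illustration; only the java.util calls are real.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

final class FailFastDemo {
    // Hypothetical sample data: three mappings in insertion order.
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Removes even values through the iterator; no exception is thrown.
    static Map<String, Integer> removeViaIterator() {
        Map<String, Integer> map = sample();
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 == 0) {
                it.remove(); // the iterator knows about this removal
            }
        }
        return map;
    }

    // Removes through the map while a key-set iterator is open.
    // Returns true if the iterator reported the modification.
    static boolean removeViaMapThrows() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.remove("c"); // structural change behind the iterator's back
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeViaIterator());  // {a=1, c=3}
        System.out.println(removeViaMapThrows()); // true
    }
}
```

Note that the exception comes from the iterator's next call after the removal, not from map.remove itself.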

It isn't special. While an iterator is open, its own remove method is simply the only way to modify the LinkedHashMap structurally without invalidating that iterator; no actual lock is acquired.
This is because, as the Javadoc specifies:
A structural modification is any operation that adds or deletes one or more mappings or, in the case of access-ordered linked hash maps, affects iteration order.
This means that while you have a pending iterator on the data structure, allowing arbitrary modifications would cause problems with deterministic behavior (the iterator would become inconsistent, because the underlying structure changed). So the implementation makes the iterator fail with an exception the next time you use it after any structural modification that did not go through the iterator.

Related

How does ConcurrentSkipListSet have iterators that are weakly consistent? Understanding the meaning of 'weakly consistent'

A fail-fast iterator iterates over the collection itself. If the collection is modified while iterating, we get an exception. The opposite applies to fail-safe iterators, where iteration happens on one collection while write operations happen on a copy of it (e.g. CopyOnWriteArrayList).
Can someone explain how ConcurrentSkipListSet manages to be fail-safe? No copies are made when modifying the collection (as the CopyOnWrite classes do), so how does it work? I read that it is because its Iterator is weakly consistent. I read the docs and still don't understand (but I do know what visibility and the happens-before relation in concurrency are).
Does anyone have a logical and easy-to-remember explanation, as I am a beginner?
// Example:
ConcurrentSkipListSet<Integer> set = new ConcurrentSkipListSet<>();
set.add(1);
set.add(2);
set.add(3);
set.add(4);
Iterator<Integer> iterator = set.iterator();
while (iterator.hasNext()) {
    System.out.println(iterator.next());
    set.remove(4);
}
OUTPUT:
1
2
3
I was expecting a ConcurrentModificationException to be thrown here. Please help :(
The "weakly consistent" term is defined in the java.util.concurrent package description:
Most concurrent Collection implementations (including most Queues)
also differ from the usual java.util conventions in that their
Iterators and Spliterators provide weakly consistent rather than
fast-fail traversal:
they may proceed concurrently with other operations
they will never throw ConcurrentModificationException
they are guaranteed to traverse elements as they existed upon construction exactly once, and may (but are not guaranteed to) reflect
any modifications subsequent to construction.
In this case with ConcurrentSkipListSet, the iterator does not have a "fast-fail" property, instead it reflects the modification of 4 having been removed from the set.
Internally, ConcurrentSkipListSet is implemented with ConcurrentSkipListMap, and its iterators are implemented by keeping track of which skip list node should be traversed next. This naturally gives you the "weakly consistent" property: If the next item is deleted, the iterator will still return it. If items beyond the next are deleted, the iterator will reflect those changes.
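As a rough side-by-side, the sketch below runs the loop from the question against both a ConcurrentSkipListSet and a plain TreeSet. The class and method names are invented for this example; the null return value is just a convention chosen here to signal that the iterator threw.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

final class WeaklyConsistentDemo {
    // Fills the set with 1..4, then removes 4 after every step of the iteration.
    // Returns the elements visited, or null if the iterator threw
    // ConcurrentModificationException.
    static List<Integer> visitWhileRemoving(NavigableSet<Integer> set) {
        set.addAll(Arrays.asList(1, 2, 3, 4));
        List<Integer> visited = new ArrayList<>();
        try {
            for (Integer value : set) {
                visited.add(value);
                set.remove(4); // modification outside the iterator
            }
            return visited;
        } catch (ConcurrentModificationException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Weakly consistent: no exception, and the removal of 4 is reflected.
        System.out.println(visitWhileRemoving(new ConcurrentSkipListSet<Integer>())); // [1, 2, 3]
        // Fail-fast: the second call to next() detects the change.
        System.out.println(visitWhileRemoving(new TreeSet<Integer>())); // null
    }
}
```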

How to prevent an ArrayList from being structurally modified?

I am mapping a table, say Employee, to an ArrayList<Employee> that is a class-level variable, and I will use it in multiple places instead of hitting the database each time.
I want to keep it read-only, i.e. no one can add or remove an element from the ArrayList once it is populated.
Can someone suggest a way to achieve this?
Edit: On modification I want some kind of Exception to be thrown.
Thanks in advance
You can use Collections.unmodifiableList. It passes reads through to the backing list, so any updates to the backing (original) list will be visible through the unmodifiable view that other classes see.
If you want an unmodifiable list that is not updated when the master list is updated, you'll need to call a copy constructor:
Collections.unmodifiableList(new ArrayList<Employee>(masterList));
Guava's immutable list is also an option in this case.
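A small sketch of the two variants, using a list of strings instead of Employee objects to keep it self-contained. The class name, method names, and sample values are hypothetical. It shows that writes through the wrapper fail with UnsupportedOperationException, and that only the copied variant is isolated from later changes to the master list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ReadOnlyListDemo {
    // Returns true if adding through the unmodifiable view is rejected.
    static boolean addIsRejected() {
        List<String> master = new ArrayList<>();
        master.add("alice");
        List<String> view = Collections.unmodifiableList(master);
        try {
            view.add("bob");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // Returns {size seen through the view, size of the copied list}
    // after the master list has grown by one element.
    static int[] sizesAfterMasterGrows() {
        List<String> master = new ArrayList<>();
        master.add("alice");
        List<String> view = Collections.unmodifiableList(master);
        List<String> frozen = Collections.unmodifiableList(new ArrayList<>(master));
        master.add("bob"); // still allowed for whoever holds the master list
        return new int[] { view.size(), frozen.size() };
    }

    public static void main(String[] args) {
        System.out.println(addIsRejected()); // true
        int[] sizes = sizesAfterMasterGrows();
        System.out.println(sizes[0] + " " + sizes[1]); // 2 1
    }
}
```

Either way, the elements themselves stay mutable unless the element class is immutable too.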
An unmodifiable list is what you want; see the documentation for Collections.unmodifiableList. Guava also has an immutable list (ImmutableList).
You can provide a getter which will return a copy of the existing list.
Use a copy constructor for that:
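class Employee {
    private String id;
    ...
    public Employee(Employee other) {
        this.id = other.id;
        ...
    }
}
List<Employee> getEmployeeData()
{
    // create a new list from the existing one via the copy constructor;
    // "employees" is a placeholder name for the class-level list
    List<Employee> copy = new ArrayList<Employee>(employees.size());
    for (Employee e : employees) {
        copy.add(new Employee(e));
    }
    return copy;
}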
Another approach that comes to mind is to obtain a private Iterator on the List after populating it. If the list is then structurally modified at any time after the iterator is created, in any way except through the iterator's own remove or add methods, the iterator will throw a ConcurrentModificationException the next time it is used. But note that the fail-fast behavior of an iterator cannot be guaranteed.
From javaDoc:
The iterators returned by this class's iterator and listIterator methods are fail-fast: if the list is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove or add methods, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.

Why is there no error when using iterator.remove?

If I use Iterator.remove(), everything is fine. If I use ArrayList.remove(), I always receive the error java.util.ConcurrentModificationException.
Can anyone point out the reason?
The javadocs say it all:
This exception may be thrown by methods that have detected concurrent
modification of an object when such modification is not permissible.
For example, it is not generally permissible for one thread to modify
a Collection while another thread is iterating over it. In general,
the results of the iteration are undefined under these circumstances.
Some Iterator implementations (including those of all the general
purpose collection implementations provided by the JRE) may choose to
throw this exception if this behavior is detected. Iterators that do
this are known as fail-fast iterators, as they fail quickly and
cleanly, rather that risking arbitrary, non-deterministic behavior at
an undetermined time in the future.
You're modifying a collection and iterating over it at the same time.
Basically because that's how it's designed to work. If you delete an element from the list directly, the iterator doesn't know about it, and when it tries to access another element in the list, it detects that the list has changed and an error is raised. But if you remove an element through the iterator, then the iterator knows about the removal, makes the appropriate adjustment to its data structures, and continues.
This is exactly what the docs say in the passage quoted above; have a look at the ConcurrentModificationException class.
If you need to remove an element while you are iterating through a list, you should always use the iterator.
e.g.
List<String> list = //...
//...
Iterator<String> iter = list.iterator();
while (iter.hasNext()) {
    String str = iter.next();
    //...
    if (/* ... */) {
        iter.remove(); // takes no argument; removes the element last returned by next()
    }
}
That's the safest way of removing an element from a list while you are iterating through the list.
Note: You should not do something like the following:
List<String> list = //...
//...
for (String str : list) {
    if (/* ... */) {
        list.remove(str); // <-- should not do this
    }
}
Here you will most likely end up with a ConcurrentModificationException. The enhanced for loop uses an iterator under the hood, and you are removing the element from the list directly instead of going through that iterator. That's why you get the exception.
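The word "mostly" matters here. As a sketch (class name and sample data are invented for illustration), the following shows that with a three-element ArrayList the exception is thrown when the first or last element is removed inside the enhanced for loop, but not when the second-to-last one is removed. In that case hasNext() already returns false, so the loop silently ends early and the last element is never visited.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

final class ForEachRemoveDemo {
    // Removes the given element directly from the list inside a for-each loop.
    // Returns true if a ConcurrentModificationException was thrown.
    static boolean throwsWhenRemoving(String victim) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        try {
            for (String s : list) {
                if (s.equals(victim)) {
                    list.remove(s); // bypasses the hidden iterator
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsWhenRemoving("a")); // true
        System.out.println(throwsWhenRemoving("b")); // false: loop ends early, "c" is skipped
        System.out.println(throwsWhenRemoving("c")); // true
    }
}
```

This is one reason not to rely on the exception: the absence of a ConcurrentModificationException does not mean the loop did what you wanted.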

How does an Iterator throw ConcurrentModificationException on add

How does Iterator throw ConcurrentModificationException when we are adding some object after current node or removing some object after current node. Does Iterator maintain a copy or reference to the underlying collection?
The iterator maintains a reference to the underlying collection. If you add or remove an element, the iterator might be left at an impossible index, or the collection might change "out from underneath" the iterator.
Therefore, instead of letting the iterator get corrupted without letting you know, most collections do the courtesy of throwing a ConcurrentModificationException when you try to modify the collection while iterating, so you don't wind up with unpredictably corrupted iterators.
By contract, you are not allowed to modify the collection while iterating over it (except by using Iterator.remove() et al).
Instead of randomly failing when you do this, the collection is nice enough to keep track of how many times it's been modified, and throw ConcurrentModificationException when it detects concurrent modification.
That ConcurrentModificationException is probably your friend and you ought to learn to live with it. However, just for completeness:
There are non-Oracle collections out there that don't throw ConcurrentModificationException. They're faster (because they don't spend time checking) and, obviously, more flexible, but they require greater care when using.
Oracle has four (at last count) "Concurrent" classes that don't throw it either in java.util.concurrent (ConcurrentHashMap, ConcurrentLinkedQueue, ConcurrentSkipListMap, and ConcurrentSkipListSet). They're marginally slower than their non-concurrent equivalents, but they're thread-safe and they don't block. They won't scramble your data no matter what you do, but they won't stop you from scrambling it.
For removing you can use iterator.remove(), as follows:
for (Iterator iterator = list.iterator(); iterator.hasNext();) {
    Object object = iterator.next();
    /* ... */
    if (condition) {
        iterator.remove();
    }
}
For adding, you can replace the plain Iterator with a ListIterator, as follows:
ListIterator<Object> iterator = list.listIterator();
iterator.add(new Object());
Of course an iterator has a link to the underlying collection, this avoids the copy. If you look for example at the source code of ArrayList iterator (ListItr), you'll see it mostly has a link to the list and a cursor.
So, don't share an iterator between threads and don't modify a collection on which you're iterating.
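For the adding case, a runnable sketch might look like the following. The class name, method name, and sample values are made up for the example. ListIterator.add inserts the new element right after the one just returned by next(), and the iteration does not revisit it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

final class ListIteratorAddDemo {
    // Returns a copy of source with marker inserted after every occurrence of target.
    static List<String> insertAfter(List<String> source, String target, String marker) {
        List<String> list = new ArrayList<>(source);
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            if (it.next().equals(target)) {
                it.add(marker); // goes through the iterator, so no exception
            }
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(insertAfter(Arrays.asList("a", "b", "c"), "b", "X")); // [a, b, X, c]
    }
}
```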

Why does it.next() throw java.util.ConcurrentModificationException?

final Multimap<Term, BooleanClause> terms = getTerms(bq);
for (Term t : terms.keySet()) {
    Collection<BooleanClause> C = new HashSet(terms.get(t));
    if (!C.isEmpty()) {
        for (Iterator<BooleanClause> it = C.iterator(); it.hasNext();) {
            BooleanClause c = it.next();
            if(c.isSomething()) C.remove(c);
        }
    }
}
Not a SSCCE, but can you pick up the smell?
The Iterator for the HashSet class is a fail-fast iterator. From the documentation of the HashSet class:
The iterators returned by this class's iterator method are fail-fast:
if the set is modified at any time after the iterator is created, in
any way except through the iterator's own remove method, the Iterator
throws a ConcurrentModificationException. Thus, in the face of
concurrent modification, the iterator fails quickly and cleanly,
rather than risking arbitrary, non-deterministic behavior at an
undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed
as it is, generally speaking, impossible to make any hard guarantees
in the presence of unsynchronized concurrent modification. Fail-fast
iterators throw ConcurrentModificationException on a best-effort
basis. Therefore, it would be wrong to write a program that depended
on this exception for its correctness: the fail-fast behavior of
iterators should be used only to detect bugs.
Note the last sentence - the fact that you are catching a ConcurrentModificationException suggests that the collection is being modified behind the iterator's back, possibly by another thread. The same Javadoc API page also states:
If multiple threads access a hash set concurrently, and at least one
of the threads modifies the set, it must be synchronized externally.
This is typically accomplished by synchronizing on some object that
naturally encapsulates the set. If no such object exists, the set
should be "wrapped" using the Collections.synchronizedSet method. This
is best done at creation time, to prevent accidental unsynchronized
access to the set:
Set s = Collections.synchronizedSet(new HashSet(...));
I believe the references to the Javadoc are self explanatory in what ought to be done next.
Additionally, in your case, I do not see why you are not using an ImmutableSet instead of creating a HashSet on the terms object (which could possibly be modified in the interim; I cannot see the implementation of the getTerms method, but I have a hunch that the underlying key set is being modified). Creating an immutable set will give the current thread its own defensive copy of the original key set.
Note that although a ConcurrentModificationException can be prevented by using a synchronized Set (as noted in the Java API documentation), it is a prerequisite that all threads access the synchronized collection and not the backing collection directly (which might be untrue in your case, as the HashSet is probably created in one thread while the underlying collection of the Multimap is modified by other threads). The synchronized collection classes maintain an internal mutex that threads must acquire for access; since you cannot access the mutex directly from other threads (and it would be quite ridiculous to do so here), you ought to look at using a defensive copy of either the key set or of the Multimap itself, using the unmodifiableMultimap method of the Multimaps class (you'll need to return an unmodifiable Multimap from the getTerms method). You could also investigate the necessity of returning a synchronized Multimap, but then again, you'll need to ensure that every thread acquires the mutex in order to protect the underlying collection from concurrent modifications.
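One way to picture the defensive-copy idea with plain JDK classes (no Guava) is sketched below. The class and method names are assumptions for this example, and it runs in a single thread only to keep the outcome deterministic. The copy is taken while holding the lock of the synchronized wrapper, as the Collections.synchronizedSet documentation requires for iteration, and the loop then walks the snapshot rather than the shared set.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SnapshotIterationDemo {
    // Copies the shared set while holding the wrapper's lock, so that no other
    // thread going through the same wrapper can modify it mid-copy.
    static <T> List<T> snapshot(Set<T> synchronizedSet) {
        synchronized (synchronizedSet) {
            return new ArrayList<>(synchronizedSet);
        }
    }

    // Iterates over a snapshot while emptying the shared set.
    // Returns {elements visited, elements left in the shared set}.
    static int[] visitedAndRemaining() {
        Set<Integer> shared = Collections.synchronizedSet(new HashSet<Integer>());
        shared.add(1);
        shared.add(2);
        shared.add(3);
        int visited = 0;
        for (Integer value : snapshot(shared)) {
            shared.remove(value); // safe: the loop iterates over the copy
            visited++;
        }
        return new int[] { visited, shared.size() };
    }

    public static void main(String[] args) {
        int[] result = visitedAndRemaining();
        System.out.println(result[0] + " visited, " + result[1] + " remaining"); // 3 visited, 0 remaining
    }
}
```

This only helps if every thread goes through the synchronized wrapper; anyone holding the raw backing set can still modify it without taking the lock.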
Note, I have deliberately omitted mentioning the use of a thread-safe HashSet for the sole reason that I'm unsure of whether concurrent access to the actual collection will be ensured; it most likely will not be the case.
Edit: ConcurrentModificationExceptions thrown on Iterator.next in a single-threaded scenario
This is with respect to the statement: if(c.isSomething()) C.remove(c); that was introduced in the edited question.
Invoking Collection.remove changes the nature of the question, for it now becomes possible to have ConcurrentModificationExceptions thrown even in a single-threaded scenario.
The possibility arises out of the use of the method itself, in conjunction with the use of the Collection's iterator, in this case the variable it that was initialized using the statement : Iterator<BooleanClause> it = C.iterator();.
The Iterator it that iterates over Collection C stores state pertinent to the current state of the Collection. In this particular case (assuming a Sun/Oracle JRE), a KeyIterator (an internal inner class of the HashMap class that is used by the HashSet) is used to iterate through the Collection. A particular characteristic of this Iterator is that it tracks the number of structural modifications performed on the Collection (the HashMap in this case) via its Iterator.remove method.
When you invoke remove on the Collection directly, and then follow it up with an invocation of Iterator.next, the iterator throws a ConcurrentModificationException, as Iterator.next verifies whether any structural modifications of the Collection have occurred that the Iterator is unaware of. In this case, Collection.remove causes a structural modification, that is tracked by the Collection, but not by the Iterator.
To overcome this part of the problem, you must invoke Iterator.remove and not Collection.remove, for this ensures that the Iterator is now aware of the modification to the Collection. The Iterator in this case, will track the structural modification occurring through the remove method. Your code should therefore look like the following:
final Multimap<Term, BooleanClause> terms = getTerms(bq);
for (Term t : terms.keySet()) {
    Collection<BooleanClause> C = new HashSet(terms.get(t));
    if (!C.isEmpty()) {
        for (Iterator<BooleanClause> it = C.iterator(); it.hasNext();) {
            BooleanClause c = it.next();
            if(c.isSomething()) it.remove(); // <-- invoke remove on the Iterator. Removes the element returned by it.next.
        }
    }
}
The reason is that you are trying to modify the collection outside the iterator.
How it works:
When you create an iterator, a modification counter is maintained for the collection and for the iterator independently (in the JDK sources these are named modCount and expectedModCount).
1. The collection's counter is incremented for every change made to the collection, whether directly or through the iterator.
2. The iterator's counter is updated only for changes made through the iterator.
So when you call it.remove() through the iterator, both counters advance together and stay equal.
But when you call collection.remove() on the collection directly, only the collection's counter is incremented, not the iterator's.
And the rule is: whenever the iterator's counter does not match the collection's counter, the iterator throws ConcurrentModificationException.
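The counter rule can be sketched with a toy collection. This is not the JDK source, just a minimal illustration of the same technique; CountingBag and its fixed capacity of 16 are made up for the example, while the field names modCount and expectedModCount are borrowed from the JDK.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

final class CountingBag implements Iterable<String> {
    private final String[] items = new String[16]; // toy fixed capacity
    private int size;
    private int modCount; // incremented on every structural change

    void add(String s) {
        if (size == items.length) throw new IllegalStateException("bag is full");
        items[size++] = s;
        modCount++;
    }

    boolean remove(String s) {
        for (int i = 0; i < size; i++) {
            if (items[i].equals(s)) {
                removeAt(i);
                return true;
            }
        }
        return false;
    }

    private void removeAt(int index) {
        // Shift the tail left by one and clear the vacated slot.
        System.arraycopy(items, index + 1, items, index, size - index - 1);
        items[--size] = null;
        modCount++;
    }

    int size() {
        return size;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int cursor;                      // index of the next element
            private int last = -1;                   // index of the last element returned
            private int expectedModCount = modCount; // the iterator's own counter

            private void checkForComodification() {
                if (modCount != expectedModCount) {
                    throw new ConcurrentModificationException();
                }
            }

            @Override
            public boolean hasNext() {
                return cursor < size;
            }

            @Override
            public String next() {
                checkForComodification();
                if (cursor >= size) throw new NoSuchElementException();
                last = cursor;
                return items[cursor++];
            }

            @Override
            public void remove() {
                if (last < 0) throw new IllegalStateException();
                checkForComodification();
                removeAt(last);              // bumps the collection's counter
                cursor = last;
                last = -1;
                expectedModCount = modCount; // and the iterator catches up
            }
        };
    }
}
```

Calling bag.remove("x") directly bumps only modCount, so the next call to next() on an open iterator sees a mismatch and throws; calling it.remove() keeps the two counters equal.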
Vineet Reynolds has explained in great detail the reasons why collections throw a ConcurrentModificationException (thread-safety, concurrency). Swagatika has explained in great detail the implementation of this mechanism (how the collection and the iterator keep count of the number of modifications).
Their answers were interesting, and I upvoted them. But, in your case, the problem does not come from concurrency (you have only one thread), and implementation details, while interesting, should not be considered here.
You should only consider this part of the HashSet javadoc:
The iterators returned by this class's iterator method are fail-fast:
if the set is modified at any time after the iterator is created, in
any way except through the iterator's own remove method, the Iterator
throws a ConcurrentModificationException. Thus, in the face of
concurrent modification, the iterator fails quickly and cleanly,
rather than risking arbitrary, non-deterministic behavior at an
undetermined time in the future.
In your code, you iterate over your HashSet using its iterator, but you use the HashSet's own remove method to remove elements ( C.remove(c) ), which causes the ConcurrentModificationException. Instead, as explained in the javadoc, you should use the Iterator's own remove() method, which removes the element being currently iterated from the underlying collection.
Replace
if(c.isSomething()) C.remove(c);
with
if(c.isSomething()) it.remove();
If you want to use a more functional approach, you could create a Predicate and use Guava's Iterables.removeIf() method on the HashSet:
Predicate<BooleanClause> ignoredBooleanClausePredicate = ...;
Multimap<Term, BooleanClause> terms = getTerms(bq);
for (Term term : terms.keySet()) {
    Collection<BooleanClause> booleanClauses = Sets.newHashSet(terms.get(term));
    Iterables.removeIf(booleanClauses, ignoredBooleanClausePredicate);
}
PS: note that in both cases, this will only remove elements from the temporary HashSet. The Multimap won't be modified.
