Yet another ConcurrentModificationException question - java

I am currently trying to learn how to properly handle multi-threaded access to Collections, so I wrote the following Java application.
As you can see, I create a synchronized ArrayList which I try to access once from within a Thread and once without.
I iterate over the ArrayList using a for loop. In order to prevent multiple access on the List at the same time, I wrapped the loop into a synchronized block.
public class ThreadTest {
Collection<Integer> data = Collections.synchronizedList(new ArrayList<Integer>());
final int MAX = 999;
/**
* Default constructor
*/
public ThreadTest() {
initData();
startThread();
startCollectionWork();
}
private int getRandom() {
Random randomGenerator = new Random();
return randomGenerator.nextInt(100);
}
private void initData() {
for (int i = 0; i < MAX; i++) {
data.add(getRandom());
}
}
private void startCollectionWork() {
System.out.println("\nStarting to work on data outside of thread");
synchronized (data) {
System.out.println("\nEntered synchronized block outside of thread");
for (int value : data) { // ConcurrentModificationException here!
if (value % 5 == 1) {
System.out.println(value);
data.remove(value);
data.add(value + 1);
} else {
System.out.println("value % 5 = " + value % 5);
}
}
}
System.out.println("Done working on data outside of thread");
}
private void startThread() {
Thread thread = new Thread() {
@Override
public void run() {
System.out.println("\nStarting to work on data in a new thread");
synchronized (data) {
System.out.println("\nEntered synchronized block in thread");
for (int value : data) { // ConcurrentModificationException
if (value % 5 == 1) {
System.out.println(value);
data.remove(value);
data.add(value + 1);
} else {
System.out.println("value % 5 = " + value % 5);
}
}
}
System.out.println("Done working on data in a new thread");
}
};
thread.start();
}
}
But every time one of the for loops is entered, I get a ConcurrentModificationException. This is my console output (which changes with every run):
Starting to work on data outside of thread
Entered synchronized block outside of thread
51
Starting to work on data in a new thread
Entered synchronized block in thread
value % 5 = 2
value % 5 = 2
value % 5 = 4
value % 5 = 3
value % 5 = 2
value % 5 = 2
value % 5 = 0
21
Exception in thread "main" java.util.ConcurrentModificationException
at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
at java.util.AbstractList$Itr.next(AbstractList.java:343)
at ThreadTest.startCollectionWork(ThreadTest.java:50)
at ThreadTest.<init>(ThreadTest.java:32)
at MultiThreadingTest.main(MultiThreadingTest.java:18)
Exception in thread "Thread-1" java.util.ConcurrentModificationException
at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
at java.util.AbstractList$Itr.next(AbstractList.java:343)
at ThreadTest$1.run(ThreadTest.java:70)
What's wrong?
Ps: Please don't just post links to multi-threaded how-to's since I've already read enough about that. I am just curious why my application doesn't run as I want it to.
Update: I replaced the for(x : y) syntax with an explicit Iterator and a while loop. The problem remains though..
synchronized(data){
Iterator<Integer> i = data.iterator();
while (i.hasNext()) {
int value = i.next(); // ConcurrentModificationException here!
if (value % 5 == 1) {
System.out.println(value);
i.remove();
data.add(value + 1);
} else {
System.out.println("value % 5 = " + value % 5);
}
}
}

Once you iterate over a collection, you have a contract between the iterating block of code and the collection as it exists at that moment in time. The contract basically states that you'll get each item in the collection once, in the order of the iteration.
The problem is that if you modify the Collection while something is iterating, you cannot maintain that contract. Deletions in a collection will remove the element from the collection, and that element might be required to be present for the initial iteration to satisfy the contract. Insertions in a Collection will likewise present issues if the element might be detected by the iteration that started prior to the element existing in the collection.
While it is easier to break this contract with multiple threads, you can break the contract with a single thread (if you choose to do so).
This is typically implemented by having the collection keep a "revision number". Before the iterator grabs the "next" element in the collection, it checks whether the collection's revision number is still the same as it was when the iterator started. This is just one way of implementing it; there are others.
So, if you want to iterate over something that you might want to change, an appropriate technique is to make a copy of the collection and iterate over that copy. That way you can modify the original collection and yet not alter the count, position, and presence of the items you were planning to process. Yes, there are other techniques, but conceptually they all fall into the "protect the copy you're iterating across while changing something else that the iterator doesn't access".
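To make the "iterate over a copy" idea concrete, here is a minimal single-threaded sketch. The class and method names (SnapshotIteration, bumpOnes) are made up for illustration, and it assumes the goal from the question: every value with value % 5 == 1 is removed and value + 1 is appended.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SnapshotIteration {
    // Hypothetical helper: iterates over a snapshot so the live list can be modified freely.
    static List<Integer> bumpOnes(List<Integer> data) {
        List<Integer> snapshot = new ArrayList<>(data); // copy taken before the loop starts
        for (Integer value : snapshot) {                // the iterator belongs to the copy
            if (value % 5 == 1) {
                data.remove(value);                     // Integer argument: removes by value, not by index
                data.add(value + 1);                    // safe, the live list is not being iterated
            }
        }
        return data;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>(Arrays.asList(1, 2, 6, 10));
        System.out.println(bumpOnes(data));
    }
}
```

With several threads you would still need to hold the list's lock while copying and while modifying; the copy only removes the conflict between the loop and the modifications made inside it.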

The ConcurrentModificationException appears because you're modifying the list while iterating over it. It has nothing to do with multiple threads in this case.
for (int value : data) { // ConcurrentModificationException here!
if (value % 5 == 1) {
System.out.println(value);
data.remove(value); // you cannot do this
data.add(value + 1); // or that
} else {
System.out.println("value % 5 = " + value % 5);
}
}

The enhanced for loop
for (int value : data) {
uses a Java Iterator under the covers. Iterators are fail-fast, so if the underlying Collection gets modified (e.g. by removing an element) while the Iterator is active, you get the exception. Your code here causes such a change to the underlying Collection:
data.remove(value);
data.add(value + 1);
Change your code to use java.util.Iterator explicitly and use its remove method. If you need to add elements to the Collection while iterating, you may want to look at a suitable data structure from the java.util.concurrent package, e.g. BlockingQueue: its take method blocks until there is data present, while new objects can be added via the offer method. (This is a very simplistic overview; search for more.)
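As a rough sketch of the Iterator-based variant: removals go through Iterator.remove(), and the additions are collected in a second list and applied once the loop is over. The names IteratorRemoveDemo, process and toAdd are invented for this example; deferring the additions is one possible workaround, not the only one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IteratorRemoveDemo {
    // Hypothetical helper: removes matching values via the iterator, adds replacements afterwards.
    static List<Integer> process(List<Integer> data) {
        List<Integer> toAdd = new ArrayList<>();   // additions are deferred until iteration ends
        Iterator<Integer> it = data.iterator();
        while (it.hasNext()) {
            int value = it.next();
            if (value % 5 == 1) {
                it.remove();                       // removal through the iterator keeps it consistent
                toAdd.add(value + 1);
            }
        }
        data.addAll(toAdd);                        // no iterator is active any more
        return data;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>(Arrays.asList(1, 2, 6, 10));
        System.out.println(process(data));
    }
}
```

If data is a Collections.synchronizedList shared between threads, the whole method body would still have to run inside synchronized (data), as in the question.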

The problem is not caused by multithreading; you made that part safe. It occurs only because you modified the collection while iterating over it.

While you are iterating over a collection, you are only allowed to remove items from it through the iterator, so you can get a ConcurrentModificationException with only one thread.
Updated reply after updated question:
You aren't allowed to add elements to the list while you are iterating.

As has been pointed out, you are modifying the Collection while iterating over it, which causes your problem.
Change data to a List and then use a regular for loop to step over it. Then you will no longer have an Iterator to deal with, thus eliminating your problem.
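List<Integer> data = ...
for (int i = 0; i < data.size(); i++) {
    int value = data.get(i);
    if (value % 5 == 1) {
        System.out.println(value);
        data.remove(i);      // int argument: removes the element at index i
        i--;                 // the following element has shifted into slot i
        data.add(value + 1); // appended at the end, visited later by this loop
    } else {
        System.out.println("value % 5 = " + value % 5);
    }
}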

ConcurrentModificationException is not eliminated by using a synchronized block around the collection being iterated. The exception occurs on the following sequence of steps:
1. Obtain an iterator from a collection (by calling its iterator method or via the for loop construct).
2. Begin iterating (by calling next() or inside the for loop).
3. Have the collection modified by any means other than the iterator's own methods, either in the same thread or in a different one. This is what is happening in your code. Note that the modification may happen in a thread-safe manner, i.e. sequentially, one operation after another; it does not matter, it will still lead to a CME.
4. Continue iterating with the same iterator obtained earlier (before the modification in step 3).
In order to avoid the exception, you must make sure that you do not modify the collection in either of the threads after you start your for loop, until the loop is finished.
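The four steps can be reproduced with a single thread, inside a synchronized block, which shows that locking is not the issue. This is only a small demonstration; the class and method names are invented for it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;

class SingleThreadCme {
    // Returns true if modifying the list inside its own for-each loop triggered a CME.
    static boolean modifyWhileIterating() {
        List<Integer> backing = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        List<Integer> data = Collections.synchronizedList(backing);
        try {
            synchronized (data) {                // the lock is held the whole time
                for (int value : data) {         // steps 1 and 2: obtain the iterator, start iterating
                    data.add(value + 10);        // step 3: modification that bypasses the iterator
                }                                // step 4: the next call to next() detects it
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("CME thrown: " + modifyWhileIterating());
    }
}
```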

Related

Java Multithreading: threads shall access list

I'm running with five threads, and I have a list of objects (which I initialize independently of the threads).
The objects in the list use a boolean as a flag, so I know if they have been handled by another Thread already. Also, my Thread has an Integer for its "ID" (so I know which thread is currently working).
The problem: The first thread that gets a hand on the for-loop will handle all objects in the list, but I want the threads to alternate. What am I doing wrong?
the run() method looks similar to this:
void run() {
for (int i = 0; i < list.size(); i++) {
ListObject currentObject = list.get(i);
synchronized (currentObject) {
if (currentObject.getHandled == false) {
currentObject.setHandled(true);
System.out.println("Object is handled by " + this.getID());
} else {
continue;
}
}
}
}
TL;DR: Explicitly or implicitly divide the list among the threads, and use synchronization only if it is really needed.
The problem: The first thread that gets a hand on the for-loop will
handle all objects in the list, but i want the threads to alternate.
What am I doing wrong?
That is to be expected. This entire block of code
for (int i = 0; i < list.size(); i++) {
ListObject currentObject = list.get(i);
synchronized (currentObject) {
....
}
}
is basically being executed sequentially, since each thread synchronizes in every iteration on the implicit lock of the object currentObject. All five threads enter the run method, but one of them enters synchronized (currentObject) first, and all the others wait in turn for it to release the implicit lock of currentObject. When that thread is finished, it moves on to the next iteration while the remaining threads are still in the previous one. Hence, the first thread to enter synchronized (currentObject) has a head start, stays a few steps ahead of the other threads, and will likely compute all the remaining iterations. Consequently:
The first thread that gets a hand on the for-loop will handle all
objects in the list,
As it is you would be better off performance-wise and readability-wise executing the code sequentially.
Assumption
I am assuming that
the objects stored on the list are not being accessed elsewhere at the same time that those threads are iterating through the list;
the list does not contain multiple references to the same object;
I would suggest that, instead of every thread iterating over the entire list and synchronizing in every iteration (which performs very poorly and actually defeats the point of parallelism), every thread computes a different chunk of the list (e.g., by dividing the iterations of the for loop among the threads). For instance:
Approach 1: Using Parallel Stream
If you don't have to explicitly parallelize your code, then consider using a parallel stream:
list.parallelStream().forEach(this::setHandled);
private void setHandled(ListObject currentObject) {
if (!currentObject.getHandled) {
currentObject.setHandled(true);
System.out.println("Object is handled by " + this.getID());
}
}
Approach 2: If you have to explicitly parallelize the code using executors
I'm running five threads,
(as first illustrated by ernest_k)
ExecutorService ex = Executors.newFixedThreadPool(5);
for (ListObject l : list)
ex.submit(() -> setHandled(l));
...
private void setHandled(ListObject currentObject) {
if (!currentObject.getHandled) {
currentObject.setHandled(true);
System.out.println("Object is handled by " + this.getID());
}
}
Approach 3: If you have to explicitly use the Threads
void run() {
for (int i = threadID; i < list.size(); i += total_threads) {
ListObject currentObject = list.get(i);
if (currentObject.getHandled == false) {
currentObject.setHandled(true);
System.out.println("Object is handled by " + this.getID());
}
}
}
In this approach, I am splitting the iterations of the for loop among the threads in a round-robin fashion, assuming that total_threads is the number of threads that will execute the run method and that each thread has a unique threadID ranging from 0 to total_threads - 1. Other ways of distributing the iterations among the threads are also feasible, for instance distributing them dynamically:
void run() {
for (int i = task.getAndIncrement(); i < list.size(); i = task.getAndIncrement()) {
ListObject currentObject = list.get(i);
if (currentObject.getHandled == false) {
currentObject.setHandled(true);
System.out.println("Object is handled by " + this.getID());
}
}
}
where task would be an atomic integer (i.e., AtomicInteger task = new AtomicInteger();).
In all approaches the idea is the same: assign different chunks of the list to the threads so that they can process those chunks independently of each other.
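For what it's worth, here is a self-contained sketch of the round-robin split from Approach 3 that you can actually run. It records which thread handled which index in a plain int array instead of using ListObject; RoundRobinDemo, handleAll and handledBy are names invented for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RoundRobinDemo {
    // Worker t handles indices t, t + totalThreads, t + 2 * totalThreads, ...
    static int[] handleAll(int size, int totalThreads) {
        int[] handledBy = new int[size];              // handledBy[i] = id of the thread that handled item i
        Arrays.fill(handledBy, -1);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < totalThreads; t++) {
            final int threadId = t;
            Thread worker = new Thread(() -> {
                for (int i = threadId; i < size; i += totalThreads) {
                    handledBy[i] = threadId;          // no two threads ever touch the same index
                }
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();                        // join also makes the workers' writes visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handledBy;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(handleAll(10, 3)));
    }
}
```

Because the index sets are disjoint, no synchronized block is needed inside the loop, and the work alternates between the threads by construction.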
If assumptions 1 and 2 cannot be made, you can still apply the aforementioned logic of splitting the iterations among the threads, but you will need to add synchronization; in my examples, to the following block of code:
private void setHandled(ListObject currentObject) {
if (!currentObject.getHandled) {
currentObject.setHandled(true);
System.out.println("Object is handled by " + this.getID());
}
}
As it is, you can just turn the handled flag of ListObject into an AtomicBoolean, as follows:
private void setHandled(ListObject currentObject) {
if (currentObject.getHandled.compareAndSet(false, true)) {
System.out.println("Object is handled by " + this.getID());
}
}
otherwise use the synchronized clause:
private void setHandled(ListObject currentObject) {
synchronized (currentObject) {
if (!currentObject.getHandled) {
currentObject.setHandled(true);
System.out.println("Object is handled by " + this.getID());
}
}
}
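To check that the compareAndSet variant really hands each object to exactly one thread, here is a small runnable sketch. Item is a hypothetical stand-in for ListObject, and raceToClaim is a name made up for this test harness; every thread deliberately walks the whole list so that they all compete for the same objects.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class ClaimOnceDemo {
    static final class Item {                          // hypothetical stand-in for ListObject
        final AtomicBoolean handled = new AtomicBoolean(false);
    }

    // Lets several threads race over the same items and returns the number of successful claims.
    static int raceToClaim(int items, int threads) {
        Item[] list = new Item[items];
        for (int i = 0; i < items; i++) {
            list[i] = new Item();
        }
        AtomicInteger claims = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (Item item : list) {
                    // Only one thread can flip the flag from false to true.
                    if (item.handled.compareAndSet(false, true)) {
                        claims.incrementAndGet();
                    }
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return claims.get();
    }

    public static void main(String[] args) {
        System.out.println(raceToClaim(100, 5) + " claims for 100 items");
    }
}
```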

Does Collection.stream() have internal synchronization?

I have been trying to reproduce (and solve) a ConcurrentModificationException when an instance of HashMap is being read and written by multiple Threads.
Disclaimer: I know that HashMap is not thread-safe.
In the following code:
import java.util.*;
public class MyClass {
public static void main(String args[]) throws Exception {
java.util.Map<String, Integer> oops = new java.util.HashMap<>();
oops.put("1", 1);
oops.put("2", 2);
oops.put("3", 3);
Runnable read = () -> {
System.out.println("Entered read thread");
/*
* ConcurrentModificationException possibly occurs
*
for (int i = 0; i < 100; i++) {
List<Integer> numbers = new ArrayList<>();
numbers.addAll(oops.values());
System.out.println("Size " + numbers.size());
}
*/
for (int i = 0; i < 100; i++) {
List<Integer> numbers = new ArrayList<>();
numbers.addAll(oops.values()
.stream()
.collect(java.util.stream.Collectors.toList()));
System.out.println("Size " + numbers.size());
}
};
Runnable write = () -> {
System.out.println("Entered write thread");
for (int i = 0; i < 100; i++) {
System.out.println("Put " + i);
oops.put(Integer.toString(i), i);
}
};
Thread writeThread = new Thread(write, "write-thread");
Thread readThread = new Thread(read, "read-thread");
readThread.start();
writeThread.start();
readThread.join();
writeThread.join();
}
}
Basically, I make two threads: one keeps putting elements into a HashMap, the other is iterating on HashMap.values().
In the read thread, if I'm using numbers.addAll(oops.values()), the ConcurrentModificationException occurs randomly, and the lines of both threads are printed interleaved, as expected.
But if I switch to numbers.addAll(oops.values().stream()..., I don't get any error. However, I have observed a strange phenomenon: all the lines of the read thread are printed after the lines of the write thread.
My question is: does Collection.stream() somehow have internal synchronization?
UPDATE:
Using JDoodle https://www.jdoodle.com/a/IYy, it seems on JDK9 and JDK10, I will get ConcurrentModificationException as expected.
Thanks!
What you are seeing is purely by chance. Bear in mind that System.out.println is synchronized internally, so that may be what makes the results appear in order.
I have not looked too deeply into your code, because analyzing why HashMap, which is not thread-safe, is misbehaving is most probably futile; as you know, it is documented to be non-thread-safe.
As for the ConcurrentModificationException, the documentation states that it is thrown only on a best-effort basis; so either Java 8 was weaker on this point, or this was again an accident.
I was able to get a ConcurrentModificationException with streams on Java 8, but with some changes to the code: I increased the number of iterations and the number of elements added to the map in the separate thread from 100 to 10000. I also added a CyclicBarrier so that the loops in the reader and writer threads start at more or less the same time. I've also checked the source code of the spliterator for HashMap.values(), and it throws ConcurrentModificationException if any modifications were made to the map:
// modCount is the map's number of modifications; mc is the expected
// modification count, stored before trying to fetch the next element
if (m.modCount != mc)
throw new ConcurrentModificationException();
I've looked quickly at the Java 8 source code, and it does throw ConcurrentModificationException.
HashMap's values() method returns a subclass of AbstractCollection whose spliterator() method returns a ValueSpliterator, which throws ConcurrentModificationException.
For information, Collection.stream() uses a spliterator to traverse or partition the elements of a source.
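You can see the spliterator's check without any threads at all. The following sketch adds a new key to the map from inside the stream's forEach, which is a structural modification during traversal. The class and method names are invented for the example, and since the fail-fast check is documented as best-effort, treat the result as an observation about the HashMap implementation discussed above rather than a guarantee.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class StreamCmeDemo {
    // Returns true if the stream over values() noticed the modification and threw a CME.
    static boolean streamDetectsModification() {
        Map<String, Integer> map = new HashMap<>();
        map.put("1", 1);
        map.put("2", 2);
        try {
            // Putting a key that is not yet present changes the map's modification count.
            map.values().stream().forEach(v -> map.put("new", 0));
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("CME thrown: " + streamDetectsModification());
    }
}
```

So the stream does not add any synchronization; it just performs its modification check at a different point than the plain iterator does, which changes how likely a racing writer is to be noticed.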

Iterating over single List in parallel without duplicates in Java

I have an ArrayList filled with 'someObject'. I need to iterate over this list with 4 different threads (using Futures & Callables). Each thread will keep the 5 highest-valued objects it comes across. I first tried creating a parallel stream, but that didn't work out so well. Is there some obvious thing I'm not thinking of, so that each thread can iterate over the objects without possibly grabbing the same object twice?
You can use an AtomicInteger to iterate over the list:
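import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class MyRunnable implements Runnable {
    final List<SomeObject> list;
    final AtomicInteger counter; // shared by all runnables, initialized to 0

    MyRunnable(List<SomeObject> list, AtomicInteger counter) {
        this.list = list;
        this.counter = counter;
    }

    public void run() {
        while (true) {
            int index = counter.getAndIncrement();
            if (index < list.size()) {
                SomeObject item = list.get(index);
                // do something with item
            } else {
                return;
            }
        }
    }
}
So long as every MyRunnable holds the same AtomicInteger reference, they won't duplicate indices.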
You don't need AtomicInteger or any other synchronization for that matter.
You should simply logically partition your list (whose size is known upfront) based on the number of processing threads (whose number is also known upfront) and let each of them operate on its own section of [from, to) of the list.
This avoids the need for any synchronization at all (even an optimized one such as AtomicInteger), which is what you should always strive for (as long as it's safe).
Pseudo code
// Preconditions is assumed to be Guava's com.google.common.base.Preconditions
abstract class Worker<T> implements Runnable {
    final List<T> toProcess;

    protected Worker(List<T> list, int fromInc, int toExcl) {
        // note this does not allow passing an empty list or specifying an empty work section but you can relax that if you wish
        // this also implicitly checks the list for null
        Preconditions.checkArgument(fromInc >= 0 && fromInc < list.size());
        Preconditions.checkArgument(toExcl > 0 && toExcl <= list.size());
        // note: this does not create a copy, but only a view so it's very cheap
        toProcess = list.subList(fromInc, toExcl);
    }

    @Override
    public final void run() {
        for (final T t : toProcess) {
            process(t);
        }
    }

    protected abstract void process(T t);
}
As with the AtomicInteger solution (really any solution which does not involve copying the list), this solution also assumes that you will not be modifying the list once you have handed it off to each thread and processing has commenced. Modifying the list while processing is in progress will result in undefined behavior.
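Since the question mentions Futures and Callables, here is a rough runnable sketch of the same partitioning idea with an ExecutorService. It sums integers instead of picking the top 5 objects, purely to keep the example short; PartitionDemo and parallelSum are invented names, and the ceiling-division chunk size is just one way to split the range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartitionDemo {
    // Sums the list by giving each task its own [from, to) slice; assumes workers >= 1.
    static long parallelSum(List<Integer> list, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = (list.size() + workers - 1) / workers;   // ceiling division
            List<Future<Long>> parts = new ArrayList<>();
            for (int from = 0; from < list.size(); from += chunk) {
                int to = Math.min(from + chunk, list.size());
                List<Integer> slice = list.subList(from, to);    // a view, not a copy
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (int v : slice) {
                        sum += v;
                    }
                    return sum;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();                             // waits for that slice to finish
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 4));
    }
}
```

For the top-5 case, each task would return its local top 5 and the caller would merge the partial results, in the same way the partial sums are merged here.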

Multiple threads checking map size and concurrency

I have a method that's supposed to feed a map from a queue, and it only does that if the map size does not exceed a certain number. This caused a concurrency problem, as the size each thread sees is not globally consistent. I replicated the problem with this code:
import java.sql.Timestamp;
import java.util.Date;
import java.util.concurrent.ConcurrentHashMap;
public class ConcurrenthashMapTest {
private ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<Integer, Integer>();
private ThreadUx[] tArray = new ThreadUx[999];
public void parallelMapFilling() {
for ( int i = 0; i < 999; i++ ) {
tArray[i] = new ThreadUx( i );
}
for ( int i = 0; i < 999; i++ ) {
tArray[i].start();
}
}
public class ThreadUx extends Thread {
private int seq = 0;
public ThreadUx( int i ) {
seq = i;
}
@Override
public void run() {
while ( map.size() < 2 ) {
map.put( seq, seq );
System.out.println( Thread.currentThread().getName() + " || The size is: " + map.size() + " || " + new Timestamp( new Date().getTime() ) );
}
}
}
public static void main( String[] args ) {
new ConcurrenthashMapTest().parallelMapFilling();
}
}
Normally I should have only one line of output and the size not exceeding 1, but I do have some stuff like this
Thread-1 || The size is: 2 || 2016-06-07 18:32:55.157
Thread-0 || The size is: 2 || 2016-06-07 18:32:55.157
I tried marking the whole run method as synchronized but that didn't work, only when I did this
@Override
public void run() {
synchronized ( map ) {
if ( map.size() < 1 ) {
map.put( seq, seq );
System.out.println( Thread.currentThread().getName() + " || The size is: " + map.size() + " || " + new Timestamp( new Date().getTime() ) );
}
}
}
It worked. Why does only the synchronized block work and not the synchronized method? Also, I don't want to use something as old as a synchronized block, as I am working on a Java EE app; is there a Spring or Java EE task executor or annotation that can help?
From Java Concurrency in Practice:
The semantics of methods of ConcurrentHashMap that operate on the entire Map, such as size and isEmpty, have been slightly weakened to reflect the concurrent nature of the collection. Since the result of size could be out of date by the time it is computed, it is really only an estimate, so size is allowed to return an approximation instead of an exact count. While at first this may seem disturbing, in reality methods like size and isEmpty are far less useful in concurrent environments because these quantities are moving targets. So the requirements for these operations were weakened to enable performance optimizations for the most important operations, primarily get, put, containsKey, and remove.
The one feature offered by the synchronized Map implementations but not by ConcurrentHashMap is the ability to lock the map for exclusive access. With Hashtable and synchronizedMap, acquiring the Map lock prevents any other thread from accessing it. This might be necessary in unusual cases such as adding several mappings atomically, or iterating the Map several times and needing to see the same elements in the same order. On the whole, though, this is a reasonable tradeoff: concurrent collections should be expected to change their contents continuously.
Solutions:
1. Refactor the design and do not use the size method under concurrent access.
2. To use methods such as size and isEmpty, you can use a synchronized collection from Collections.synchronizedMap. Synchronized collections achieve their thread safety by serializing all access to the collection's state. The cost of this approach is poor concurrency; when multiple threads contend for the collection-wide lock, throughput suffers. Also, you will need to synchronize the check-and-put block on the map instance, because it is a compound action.
3. Use a third-party implementation or write your own:
public class BoundConcurrentHashMap <K,V> {
private final Map<K, V> m;
private final Semaphore semaphore;
public BoundConcurrentHashMap(int size) {
m = new ConcurrentHashMap<K, V>();
semaphore = new Semaphore(size);
}
public V get(Object key) {
return m.get(key);
}
public boolean put(K key, V value) {
boolean hasSpace = semaphore.tryAcquire();
if(hasSpace) {
m.put(key, value);
}
return hasSpace;
}
public void remove(Object key) {
// release a permit only if a mapping was actually removed
if (m.remove(key) != null) {
semaphore.release();
}
}
// approximation, do not trust this method
public int size(){
return m.size();
}
}
The BoundConcurrentHashMap class is as efficient as ConcurrentHashMap and almost thread-safe: removing an element and releasing the semaphore in the remove method do not happen atomically, as they ideally should, but in this case that is tolerable. The size method still returns an approximate value, but the put method will not allow the map to exceed its size limit.
You are using ConcurrentHashMap, and according to the API doc:
Bear in mind that the results of aggregate status methods including
size, isEmpty, and containsValue are typically useful only when a map
is not undergoing concurrent updates in other threads. Otherwise the
results of these methods reflect transient states that may be adequate
for monitoring or estimation purposes, but not for program control.
Which means you cannot get an accurate result unless you explicitly synchronize access to size().
Adding synchronized to the run method does not work because the threads are not synchronizing on the same lock object: each one acquires a lock on itself.
Synchronizing on the map itself definitely works, but IMHO it's not a good choice, because then you lose the performance advantage that ConcurrentHashMap can provide.
In conclusion you need to reconsider the design.
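To illustrate why the lock object matters, here is a minimal sketch in which every thread performs the size check and the put under one shared lock, so the two steps form a single atomic action. BoundedFillDemo, fill and the separate lock object are names chosen for the example; a plain HashMap is used because all access is guarded by the lock anyway.

```java
import java.util.HashMap;
import java.util.Map;

class BoundedFillDemo {
    // Starts the given number of threads; each tries to add one entry while size < limit.
    static int fill(int threads, int limit) {
        Map<Integer, Integer> map = new HashMap<>();   // only accessed while holding the lock
        Object lock = new Object();                    // one lock object shared by every thread
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int seq = i;
            workers[i] = new Thread(() -> {
                synchronized (lock) {                  // check and put happen as one atomic step
                    if (map.size() < limit) {
                        map.put(seq, seq);
                    }
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        synchronized (lock) {
            return map.size();
        }
    }

    public static void main(String[] args) {
        System.out.println("Final size: " + fill(50, 2));
    }
}
```

A synchronized run() method on each Thread subclass would lock 50 different objects instead of this one, which is why it had no effect in the question.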

Iterating over synchronized collection

I asked here a question about iterating over a Vector, and I have been answered with some good solutions. But I read about another simpler way to do it. I would like to know if it is good solution.
synchronized(mapItems) {
Iterator<MapItem> iterator = mapItems.iterator();
while(iterator.hasNext())
iterator.next().draw(g);
}
mapItems is a synchronized collection: a Vector. Does that make iterating over the Vector safe from ConcurrentModificationException?
Yes, it will make it safe from ConcurrentModificationException at the expense of everything essentially being single-threaded.
Yes, I believe that this will prevent a ConcurrentModificationException. You are synchronizing on the Vector. All methods on Vector that modify it are also synchronized, which means that they would also lock on that same object. So no other thread could change the Vector while you're iterating over it.
Also, you are not modifying the Vector yourself while you're iterating over it.
Simply synchronizing on the entire collection would not prevent a ConcurrentModificationException. This will still throw a CME:
synchronized(mapItems) {
for(MapItem item : mapItems){
mapItems.add(new MapItem());
}
}
You may want to consider using a ReadWriteLock.
For processes which iterate over the list without modifying its contents, get a read lock on the shared ReentrantReadWriteLock. This allows multiple threads to read the list at the same time.
For processes which will modify the list, acquire the write lock on the shared lock. This will prevent all other threads from accessing the list (even read-only) until you release the write lock.
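A minimal sketch of that read/write split might look like the following. GuardedList is a made-up wrapper class for illustration; it only exposes add and forEach, and all access to the underlying ArrayList goes through the lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

class GuardedList<T> {
    private final List<T> items = new ArrayList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Writers take the exclusive write lock.
    public void add(T item) {
        lock.writeLock().lock();
        try {
            items.add(item);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers share the read lock, so several threads may iterate at once.
    public void forEach(Consumer<? super T> action) {
        lock.readLock().lock();
        try {
            for (T item : items) {
                action.accept(item);
            }
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        GuardedList<String> list = new GuardedList<>();
        list.add("a");
        list.add("b");
        list.forEach(System.out::println);
    }
}
```

One caveat: the action passed to forEach must not call add on the same GuardedList, because ReentrantReadWriteLock does not allow upgrading a read lock to a write lock and the thread would block forever.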
Does that make iterating over the Vector safe from
ConcurrentModificationException?
Yes, it makes iterating over the Vector safe from ConcurrentModificationException. If the block is not synchronized and you are accessing the Vector from several threads, then as soon as another thread structurally modifies the Vector at any time after the iterator is created, the iterator will throw ConcurrentModificationException.
Consider running this code:
import java.util.*;
class VVector
{
static Vector<Integer> mapItems = new Vector<Integer>();
static
{
for (int i = 0 ; i < 200 ; i++)
{
mapItems.add(i);
}
}
public static void readVector()
{
Iterator<Integer> iterator = mapItems.iterator();
try
{
while(iterator.hasNext())
{
System.out.print(iterator.next() + "\t");
}
}
catch (Exception ex){ex.printStackTrace();System.exit(0);}
}
public static void main(String[] args)
{
VVector v = new VVector();
Thread th = new Thread( new Runnable()
{
public void run()
{
int counter = 0;
while ( true )
{
mapItems.add(345);
counter++;
if (counter == 100)
{
break;
}
}
}
});
th.start();
v.readVector();
}
}
On my system it prints the following output when executed:
0 1 2 3 4 5 6 7 8 9
java.util.ConcurrentModificationException
at java.util.AbstractList$Itr.checkForComodification(Unknown Source)
at java.util.AbstractList$Itr.next(Unknown Source)
at VVector.readVector(VVector.java:19)
at VVector.main(VVector.java:38)
On the other hand, if you synchronize the block of code that contains the Iterator, using mapItems as the lock, it will prevent other threads from executing any of the Vector's methods until that synchronized block has completed.
If we invoke the add method inside the while loop, an exception is thrown:
synchronized(mapItems) {
    Iterator<MapItem> iterator = mapItems.iterator();
    while(iterator.hasNext()) {
        iterator.next();
        mapItems.add(new MapItem()); // the following iterator.next() throws ConcurrentModificationException
    }
}
