Does Collection.stream() have internal synchronization? - java

I have been trying to reproduce (and solve) a ConcurrentModificationException when an instance of HashMap is being read and written by multiple Threads.
Disclaimer: I know that HashMap is not thread-safe.
In the following code:
import java.util.*;
public class MyClass {
public static void main(String args[]) throws Exception {
java.util.Map<String, Integer> oops = new java.util.HashMap<>();
oops.put("1", 1);
oops.put("2", 2);
oops.put("3", 3);
Runnable read = () -> {
System.out.println("Entered read thread");
/*
* ConcurrentModificationException possibly occurs
*
for (int i = 0; i < 100; i++) {
List<Integer> numbers = new ArrayList<>();
numbers.addAll(oops.values());
System.out.println("Size " + numbers.size());
}
*/
for (int i = 0; i < 100; i++) {
List<Integer> numbers = new ArrayList<>();
numbers.addAll(oops.values()
.stream()
.collect(java.util.stream.Collectors.toList()));
System.out.println("Size " + numbers.size());
}
};
Runnable write = () -> {
System.out.println("Entered write thread");
for (int i = 0; i < 100; i++) {
System.out.println("Put " + i);
oops.put(Integer.toString(i), i);
}
};
Thread writeThread = new Thread(write, "write-thread");
Thread readThread = new Thread(read, "read-thread");
readThread.start();
writeThread.start();
readThread.join();
writeThread.join();
}
}
Basically, I make two threads: one keeps putting elements into a HashMap, the other is iterating on HashMap.values().
In the read thread, if I'm using numbers.addAll(oops.values()), the ConcurrentModificationException randomly occurs. Though the lines are printed randomly as expected.
But if I switch to numbers.addAll(oops.values().stream().., I don't get any error. However, I have observed a strange phenomenon. All the lines by the read thread are printed after the lines printed by the write thread.
My question is, does Collection.stream() have somehow internal synchronization?
UPDATE:
Using JDoodle https://www.jdoodle.com/a/IYy, it seems on JDK9 and JDK10, I will get ConcurrentModificationException as expected.
Thanks!

What you are seeing is purely by chance. Bear in mind that System.out.println is synchronized internally, so that may be what makes the results appear to be in order.
I have not looked too deeply into your code, because analyzing why HashMap, which is not thread-safe, is misbehaving is most probably futile; as you know, it is documented to be non-thread-safe.
As for the ConcurrentModificationException, the documentation says explicitly that it is thrown on a best-effort basis only; so either Java 8 was weaker on this point, or this was again by accident.

I was able to get a ConcurrentModificationException with streams on Java 8, but only after some changes to the code: I increased the number of iterations and the number of elements added to the map in the separate thread from 100 to 10000, and I added a CyclicBarrier so that the loops in the reader and writer threads start at more or less the same time. I also checked the source code of the spliterator for HashMap.values(), and it throws ConcurrentModificationException if the map was modified:
if (m.modCount != mc) //modCount is number of modifications mc is expected modifications count which is stored before trying to fetch next element
throw new ConcurrentModificationException();

I had a quick look at the Java 8 source code, and it does throw ConcurrentModificationException.
HashMap's values() method returns a subclass of AbstractCollection, whose spliterator() method returns a ValueSpliterator, which throws ConcurrentModificationException.
For information, Collection.stream() uses a spliterator to traverse or partition the elements of a source.
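The fail-fast check described above can be observed without any threads. The following is a minimal sketch, assuming a JDK 8 or later HashMap; the class and method names are made up for illustration. It adds a new key while the values stream is being traversed, which should trip the spliterator's modCount check:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class StreamFailFastSketch {
    // Returns true if traversing the values stream detects the modification.
    static boolean streamDetectsModification() {
        Map<String, Integer> map = new HashMap<>();
        map.put("1", 1);
        map.put("2", 2);
        map.put("3", 3);
        try {
            // Adding a new key is a structural modification made during traversal.
            map.values().stream().forEach(v -> map.put("added", 0));
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("CME detected: " + streamDetectsModification());
    }
}
```

With two real threads the same check applies, but whether it fires depends on timing and on memory visibility between the threads, so a run without an exception proves nothing about synchronization.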

Related

Java ArrayList thread unsafe example explanation

class ThreadUnsafe {
static final int THREAD_NUMBER = 2;
static final int LOOP_NUMBER = 200;
public static void main(String[] args) {
ThreadUnsafe test = new ThreadUnsafe();
for (int i = 0; i < THREAD_NUMBER; i++) {
new Thread(() -> {
test.method1(LOOP_NUMBER);
}, "Thread" + i).start();
}
}
ArrayList<String> list = new ArrayList<>();
public void method1(int loopNumber) {
for (int i = 0; i < loopNumber; i++) {
method2();
method3();
}
}
private void method2() {
list.add("1");
}
private void method3() {
list.remove(0);
}
}
The code above throws
java.lang.IndexOutOfBoundsException: Index: 0, Size: 1
I know ArrayList is not thread-safe, but in this example I think every remove() call is guaranteed to be preceded by at least one add() call, so the code should be OK even if the order is mixed up like the following:
thread0: method2()
thread1: method2()
thread1: method3()
thread0: method3()
Some explanations needed here, please.
If always one add() or remove() call is completely finished before another one is started, your reasoning is correct. But ArrayList doesn't guarantee that as its methods aren't synchronized. So, it can happen that two threads are in the middle of some modifying calls at the same time.
Let's look at the internals of e.g. the add() method to understand one possible failure mode.
When adding an element, ArrayList increases the size using size++. And this is not atomic.
Now imagine the list being empty, and two threads A and B adding an element at exactly the same moment, doing the size++ in parallel (maybe in different CPU cores). Let's imagine things happen in the following order:
A reads size as 0.
B reads size as 0.
A adds one to its value, giving 1.
B adds one to its value, giving 1.
A writes its new value back into the size field, resulting in size=1.
B writes its new value back into the size field, resulting in size=1.
Although we had 2 add() calls, the size is only 1. If now you try to remove 2 elements (and this time it happens sequentially), the second remove() will fail.
To achieve thread safety, no other thread should be able to mess around with the internals like size (or the elements array) while one access is currently in progress.
Multi-threading is inherently complex in that the calls from multiple threads can not only happen in any (expected or unexpected) order, but that they can also overlap, unless protected by some mechanism like synchronized. On the other hand, excessive use of the synchronization can easily lead to poor multi-thread performance, and also to dead-locks.
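The six-step interleaving described above can be replayed by hand on a single thread, which makes the lost update deterministic. This is only an illustration of the read-modify-write problem; the class and variable names are invented, and a plain int field stands in for ArrayList's internal size:

```java
class LostUpdateSketch {
    static int size;

    // Replays the interleaving of threads A and B step by step.
    static int replayInterleaving() {
        size = 0;
        int seenByA = size;        // A reads size as 0
        int seenByB = size;        // B reads size as 0
        int newForA = seenByA + 1; // A adds one to its value, giving 1
        int newForB = seenByB + 1; // B adds one to its value, giving 1
        size = newForA;            // A writes 1
        size = newForB;            // B writes 1, overwriting A's update
        return size;
    }

    public static void main(String[] args) {
        System.out.println("size after two increments: " + replayInterleaving());
    }
}
```

Two increments were performed, yet the final value is 1, which is exactly the state in which a second remove() fails.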
As a supplement to @RalfKleberhoff's answer,
I think every remove() call is guaranteed to be preceded by at least one add() call,
Yes.
so the code should be OK even the order is messed up
No, that is not a valid inference with respect to a multithreaded program.
Your program contains data races: two threads access the same shared, non-atomic object without appropriate synchronization, and some of those accesses are writes. The Java Memory Model gives very few guarantees about code that contains data races, so in practice you cannot draw reliable conclusions about the behavior of such a program.
Do not try to cheat or scrimp on synchronization. Do minimize the amount of it that you need by limiting your use of shared objects, but where you need it, you need it, and the rules for determining when and where you need it are not that hard to learn.
ArrayList in java docs says,
Note that this implementation is not synchronized. If multiple threads
access an ArrayList instance concurrently, and at least one of the
threads modifies the list structurally, it must be synchronized
externally.
Why is this code not thread-safe?
Multiple threads running on a machine run independently of each other.
public void method1(int loopNumber) {
for (int i = 0; i < loopNumber; i++) {
method2();
method3();
}
}
Here method2() and method3() are executed sequentially within each thread, but not across threads. The ArrayList list is shared by both threads, so on a multi-core system it can end up in an inconsistent state.
An interesting test is to add an empty check in method3() and set LOOP_NUMBER = 10000:
private void method3()
{
if (!list.isEmpty())
list.remove(0);
}
As a result you should get the same runtime exception, something like java.lang.IndexOutOfBoundsException: Index: 0, Size: 1 or java.lang.IndexOutOfBoundsException: Index: 0, Size: 0, for the same reason: the inconsistent state of the list's internal variables, i.e. size.
To fix this issue you could add synchronized as below, or use a synchronized list:
public void method1(int loopNumber)
{
for (int i = 0; i < loopNumber; i++)
{
synchronized (list)
{
method2();
method3();
}
}
}
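To check the synchronized variant, here is a small self-contained sketch. The class and method names are hypothetical and the thread and loop counts are arbitrary. Each add/remove pair runs while holding the list's monitor, so the pairs cannot overlap and the list should be empty once all threads have finished:

```java
import java.util.ArrayList;
import java.util.List;

class SynchronizedPairSketch {
    // Runs add/remove pairs in several threads and returns the final list size.
    static int run(int threads, int loops) {
        List<String> list = new ArrayList<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < loops; i++) {
                    synchronized (list) { // one add/remove pair at a time
                        list.add("1");
                        list.remove(0);
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join() also makes the workers' writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println("final size: " + run(2, 10000));
    }
}
```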

java.util.ConcurrentModificationException: Unexpected List modification while multithreading?

I'm using multithreading to process a List of Strings in batches, however I'm getting this error when the Runnable task is iterating over the List to process each String.
For example the code roughly follows this structure:
public class RunnableTask implements Runnable {
private List<String> batch;
RunnableTask(List<String> batch){
this.batch = batch;
}
@Override
public void run() {
for(String record : batch){
entry = record.split(",");
m = regex.matcher(entry[7]);
if (m.find() && m.group(3) != null){
currentKey = m.group(3).trim();
currentValue = Integer.parseInt(entry[4]);
if ( resultMap.get(currentKey) == null ){
resultMap.put(currentKey, currentValue);
} else {
resultMap.put(currentKey, resultMap.get(currentKey) + currentValue);
}
}
}
}
}
Where the thread that is passing these batches for processing never modifies "batch" and NO CHANGES to batch are made inside the for loop. I understand that this exception ConcurrentModificationException is due to modifying the List during iteration but as far as I can tell that isn't happening. Is there something I'm missing?
Any help is appreciated,
Thank you!
UPDATE1: It seems instance variables aren't thread-safe. I attempted to use CopyOnWriteArrayList in place of the ArrayList, but I received inconsistent results, suggesting that the full iteration doesn't complete before the list is modified in some way and that not every element is being processed.
UPDATE2: Locking on the loop with synchronized and/or a ReentrantLock still gives the same exception.
I need a way to pass Lists to Runnable tasks and iterate over those lists without new threads causing concurrency issues with that list.
I understand that this exception ConcurrentModificationException is due to modifying the List during iteration but as far as I can tell that isn't happening
OK, consider what happens when you create a new thread and pass it a RunnableTask instance that was initialized with a different list as its constructor parameter: the list reference now points to a different list. Also consider what happens when, at the same time, a different thread inside the run() method is changing the list at any point. At some point this will throw ConcurrentModificationException.
Instance variables are not thread-safe.
Try this in your code:
public void run() {
for(String record : new ArrayList<>(batch)){
//do processing with record
}
}
There seems to be a problem with all your threads processing the same list (is the list modified during processing?), but it is difficult to tell from the code you have provided.
The problem is due to multiple threads concurrently modifying the structure of the source List. I would suggest dividing the source list into new sublists (according to size) and passing one to each thread.
Say your source List has 100 elements and you are running 5 concurrent threads.
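int chunkSize = srcList.size() / numberOfthread;
List<TObject> tempList = new ArrayList<>();
for (TObject obj : srcList) {
    tempList.add(obj);
    if (tempList.size() == chunkSize) {
        // the chunk is full: hand it to a task and start a new list
        RunnableTask task = new RunnableTask(tempList);
        tempList = new ArrayList<>();
    }
}
// if the sizes do not divide evenly, tempList still holds the remainder here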
In this case your original list would not be modified.
You need to lock the list before accessing its elements, because List is not thread-safe. Try this:
public void run() {
    synchronized (batch) {
        for (String record : batch) {
            // do processing with record
        }
    }
}
Yes, you are getting ConcurrentModificationException because your List is being modified during iteration. If performance is not a critical issue, I suggest using synchronization.
public class RunnableTask implements Runnable {
    private List<String> batch = new ArrayList<String>();
    RunnableTask(List<String> batch) {
        this.batch = batch;
    }
    public void run() {
        synchronized (batch) {
            for (String record : batch) {
                // do processing with record
            }
        }
    }
}
or even better use ReentrantLock.
Your followups indicate that you are trying to reuse the same List multiple times. Your caller must create a new List for each Runnable.
Obviously something else is changing the content of the list, outside the code you have shown (assuming you are sure that the ConcurrentModificationException is complaining about the batch list and not resultMap, and that you are actually showing all the code in RunnableTask).
Search your code for places that update the content of the list, and check whether they can run concurrently with your RunnableTask.
Simply synchronizing in the RunnableTask is not going to help; you need to synchronize all access to the list, which is obviously happening somewhere else.
If performance matters so much that you cannot synchronize on the batch list (which prevents multiple RunnableTasks from executing concurrently), consider using a ReadWriteLock: RunnableTask acquires the read lock, while the list update logic acquires the write lock.
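If the exception turns out to come from resultMap rather than from batch, as the parenthetical above allows, the get-then-put sequence in the question is itself a race when several tasks share one map. The following is a sketch under stated assumptions: the question does not show resultMap's declaration, so a Map<String, Integer> shared by all tasks is assumed here, and the class and method names are invented. It uses ConcurrentHashMap.merge to make each update atomic:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class MergeCounterSketch {
    // Several threads add 1 to the same key; merge() performs each update atomically.
    static int sumWithThreads(int threads, int addsPerThread) {
        Map<String, Integer> resultMap = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < addsPerThread; i++) {
                    // replaces: if (get(key) == null) put(...) else put(key, get(key) + value)
                    resultMap.merge("key", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return resultMap.getOrDefault("key", 0);
    }

    public static void main(String[] args) {
        System.out.println("total: " + sumWithThreads(4, 1000));
    }
}
```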

Iterating over synchronized collection

I asked here a question about iterating over a Vector, and I have been answered with some good solutions. But I read about another simpler way to do it. I would like to know if it is good solution.
synchronized(mapItems) {
Iterator<MapItem> iterator = mapItems.iterator();
while(iterator.hasNext())
iterator.next().draw(g);
}
mapItems is a synchronized collection: a Vector. Does that make iterating over the Vector safe from ConcurrentModificationException?
Yes, it will make it safe from ConcurrentModificationException at the expense of everything essentially being single-threaded.
Yes, I believe that this will prevent a ConcurrentModificationException. You are synchronizing on the Vector. All methods on Vector that modify it are also synchronized, which means that they would also lock on that same object. So no other thread could change the Vector while you're iterating over it.
Also, you are not modifying the Vector yourself while you're iterating over it.
Simply synchronizing on the entire collection does not by itself prevent a ConcurrentModificationException. This will still throw a CME:
synchronized(mapItems) {
for(MapItem item : mapItems){
mapItems.add(new MapItem());
}
}
You may want to consider using a ReadWriteLock.
For processes which iterate over the list without modifying its contents, get a read lock on the shared ReentrantReadWriteLock. This allows multiple threads to have read access to the lock.
For processes which will modify the list, acquire the write lock on the shared lock. This will prevent all other threads from accessing the list (even read-only) until you release the write lock.
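A minimal sketch of that read/write split might look like the following. GuardedItems is a hypothetical wrapper class, not something from the question; only ReadWriteLock and ReentrantReadWriteLock are real JDK types:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class GuardedItems {
    private final List<Integer> items = new ArrayList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Writers take the exclusive write lock.
    void add(int item) {
        lock.writeLock().lock();
        try {
            items.add(item);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers share the read lock, so several threads may iterate at once.
    int sum() {
        lock.readLock().lock();
        try {
            int total = 0;
            for (int item : items) {
                total += item;
            }
            return total;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        GuardedItems guarded = new GuardedItems();
        guarded.add(1);
        guarded.add(2);
        System.out.println("sum: " + guarded.sum());
    }
}
```

This only works if every access to the list goes through the wrapper; any code that touches the underlying list directly bypasses the lock.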
Is that make the iterating over the Vector safe from
ConcurrentModificationException?
Yes, it makes iterating over the Vector safe from ConcurrentModificationException. If the iteration is not synchronized and some other thread structurally modifies the Vector at any time after the iterator is created, the iterator will throw ConcurrentModificationException.
Consider running this code:
import java.util.*;
class VVector
{
static Vector<Integer> mapItems = new Vector<Integer>();
static
{
for (int i = 0 ; i < 200 ; i++)
{
mapItems.add(i);
}
}
public static void readVector()
{
Iterator<Integer> iterator = mapItems.iterator();
try
{
while(iterator.hasNext())
{
System.out.print(iterator.next() + "\t");
}
}
catch (Exception ex){ex.printStackTrace();System.exit(0);}
}
public static void main(String[] args)
{
VVector v = new VVector();
Thread th = new Thread( new Runnable()
{
public void run()
{
int counter = 0;
while ( true )
{
mapItems.add(345);
counter++;
if (counter == 100)
{
break;
}
}
}
});
th.start();
v.readVector();
}
}
On my system it shows the following output during execution:
0 1 2 3 4 5 6 7 8 9
java.util.ConcurrentModificationException
at java.util.AbstractList$Itr.checkForComodification(Unknown Source)
at java.util.AbstractList$Itr.next(Unknown Source)
at VVector.readVector(VVector.java:19)
at VVector.main(VVector.java:38)
On the other hand, if you put the code that iterates over the Vector in a block synchronized on mapItems, the Vector's other synchronized methods cannot execute in other threads until that block has completed.
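Here is a sketch of the synchronized variant of the test, with invented class and method names. The writer thread is started from inside the synchronized block, so its add() calls have to wait for the monitor until the iteration is done:

```java
import java.util.Vector;

class SynchronizedIterationSketch {
    // Returns {elements visited by the iteration, final size of the Vector}.
    static int[] iterateWhileWriting() {
        Vector<Integer> mapItems = new Vector<>();
        for (int i = 0; i < 200; i++) {
            mapItems.add(i);
        }
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 100; i++) {
                mapItems.add(345); // Vector.add locks the same monitor
            }
        });
        int visited = 0;
        synchronized (mapItems) {
            writer.start(); // the writer blocks until this block is left
            for (Integer item : mapItems) {
                visited++;
            }
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { visited, mapItems.size() };
    }

    public static void main(String[] args) {
        int[] result = iterateWhileWriting();
        System.out.println("visited " + result[0] + ", final size " + result[1]);
    }
}
```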
If we invoke the add method inside the while loop, the exception is thrown:
synchronized(mapItems) {
    Iterator<MapItem> iterator = mapItems.iterator();
    while (iterator.hasNext()) {
        iterator.next(); // the call after the add below throws ConcurrentModificationException
        mapItems.add(new MapItem()); // structural modification during iteration
    }
}

Yet another ConcurrentModificationException question

I am currently trying to learn how to properly handle multi-threaded access to Collections, so I wrote the following Java application.
As you can see, I create a synchronized ArrayList which I try to access once from within a Thread and once without.
I iterate over the ArrayList using a for loop. In order to prevent multiple access on the List at the same time, I wrapped the loop into a synchronized block.
public class ThreadTest {
Collection<Integer> data = Collections.synchronizedList(new ArrayList<Integer>());
final int MAX = 999;
/**
* Default constructor
*/
public ThreadTest() {
initData();
startThread();
startCollectionWork();
}
private int getRandom() {
Random randomGenerator = new Random();
return randomGenerator.nextInt(100);
}
private void initData() {
for (int i = 0; i < MAX; i++) {
data.add(getRandom());
}
}
private void startCollectionWork() {
System.out.println("\nStarting to work on data outside of thread");
synchronized (data) {
System.out.println("\nEntered synchronized block outside of thread");
for (int value : data) { // ConcurrentModificationException here!
if (value % 5 == 1) {
System.out.println(value);
data.remove(value);
data.add(value + 1);
} else {
System.out.println("value % 5 = " + value % 5);
}
}
}
System.out.println("Done working on data outside of thread");
}
private void startThread() {
Thread thread = new Thread() {
@Override
public void run() {
System.out.println("\nStarting to work on data in a new thread");
synchronized (data) {
System.out.println("\nEntered synchronized block in thread");
for (int value : data) { // ConcurrentModificationException
if (value % 5 == 1) {
System.out.println(value);
data.remove(value);
data.add(value + 1);
} else {
System.out.println("value % 5 = " + value % 5);
}
}
}
System.out.println("Done working on data in a new thread");
}
};
thread.start();
}
}
But every time one of the for loops is entered, I get a ConcurrentModificationException. This is my console output (which changes with every new run):
Starting to work on data outside of thread
Entered synchronized block outside of thread
51
Starting to work on data in a new thread
Entered synchronized block in thread
value % 5 = 2
value % 5 = 2
value % 5 = 4
value % 5 = 3
value % 5 = 2
value % 5 = 2
value % 5 = 0
21
Exception in thread "main" java.util.ConcurrentModificationException
at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
at java.util.AbstractList$Itr.next(AbstractList.java:343)
at ThreadTest.startCollectionWork(ThreadTest.java:50)
at ThreadTest.<init>(ThreadTest.java:32)
at MultiThreadingTest.main(MultiThreadingTest.java:18)
Exception in thread "Thread-1" java.util.ConcurrentModificationException
at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
at java.util.AbstractList$Itr.next(AbstractList.java:343)
at ThreadTest$1.run(ThreadTest.java:70)
What's wrong?
Ps: Please don't just post links to multi-threaded how-to's since I've already read enough about that. I am just curious why my application doesn't run as I want it to.
Update: I replaced the for(x : y) syntax with an explicit Iterator and a while loop. The problem remains though..
synchronized(data){
Iterator<Integer> i = data.iterator();
while (i.hasNext()) {
int value = i.next(); // ConcurrentModificationException here!
if (value % 5 == 1) {
System.out.println(value);
i.remove();
data.add(value + 1);
} else {
System.out.println("value % 5 = " + value % 5);
}
}
}
Once you iterate over a collection, you have a contract between the iterating block of code and the collection as it exists at that moment in time. The contract basically states that you'll get each item in the collection once, in the order of the iteration.
The problem is that if you modify the Collection while something is iterating, you cannot maintain that contract. Deletions in a collection will remove the element from the collection, and that element might be required to be present for the initial iteration to satisfy the contract. Insertions in a Collection will likewise present issues if the element might be detected by the iteration that started prior to the element existing in the collection.
While it is easier to break this contract with multiple threads, you can break the contract with a single thread (if you choose to do so).
How this is typically implemented is the collection contains a "revision number", and prior to the iterator grabbing the "next" element in the collection, it checks to see if the collection's revision number is still the same as it was when the iterator started. This is just one way of implementing it, there are others.
So, if you want to iterate over something that you might want to change, an appropriate technique is to make a copy of the collection and iterate over that copy. That way you can modify the original collection and yet not alter the count, position, and presence of the items you were planning to process. Yes, there are other techniques, but conceptually they all fall into the "protect the copy you're iterating across while changing something else that the iterator doesn't access".
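As a sketch of the copy-then-iterate technique, the loop from the question can run over a snapshot while the original list is modified. The class and method names are invented; the rule (remove values with value % 5 == 1 and append value + 1) is taken from the question:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SnapshotIterationSketch {
    static List<Integer> bumpValues(List<Integer> input) {
        List<Integer> data = new ArrayList<>(input);
        // Iterate over a copy; the iterator never sees the changes made to data.
        for (Integer value : new ArrayList<>(data)) {
            if (value % 5 == 1) {
                data.remove(value);  // remove(Object): removes the first occurrence
                data.add(value + 1);
            }
        }
        return data;
    }

    public static void main(String[] args) {
        System.out.println(bumpValues(Arrays.asList(1, 2, 6, 5)));
    }
}
```

The cost is one extra copy of the collection per pass, which is usually acceptable for lists of this size.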
The ConcurrentModificationException appears because you're modifying the list while iterating over it. It has nothing to do with multiple threads in this case.
for (int value : data) { // ConcurrentModificationException here!
    if (value % 5 == 1) {
        System.out.println(value);
        data.remove(value); // you cannot do this
        data.add(value + 1); // or that
    } else {
        System.out.println("value % 5 = " + value % 5);
    }
}
The enhanced for loop
for (int value : data) {
uses a Java Iterator under the covers. Iterators are fail-fast, so if the underlying Collection gets modified (i.e. by removing an element) while the Iterator is active you get the Exception. Your code here causes such a change to the underlying Collection:
data.remove(value);
data.add(value + 1);
Change your code to use java.util.Iterator explicitly and use its remove method. If you need to add elements to the Collection while iterating you may want to look at a suitable data structure from java.util.concurrent package e.g. BlockingQueue, where you can call its take method which will block until there is data present; but new Objects can be added via the offer method (very simplistic overview - Google for more)
The problem is not caused by multithreading; you made that part safe. It happens only because you modified the collection while iterating over it.
While you are iterating over a collection, you are only allowed to remove items through the iterator, so you can get a ConcurrentModificationException with only one thread.
Updated reply after updated question:
You aren't allowed to add elements to the list while you are iterating.
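One way around that restriction, shown here only as a sketch with invented names, is to remove through the iterator and collect the additions in a separate list that is appended after the loop:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class DeferredAddSketch {
    static List<Integer> bumpValues(List<Integer> input) {
        List<Integer> data = new ArrayList<>(input);
        List<Integer> pending = new ArrayList<>(); // additions deferred until after the loop
        Iterator<Integer> it = data.iterator();
        while (it.hasNext()) {
            int value = it.next();
            if (value % 5 == 1) {
                it.remove();            // removal through the iterator is allowed
                pending.add(value + 1); // do not touch data here
            }
        }
        data.addAll(pending);
        return data;
    }

    public static void main(String[] args) {
        System.out.println(bumpValues(Arrays.asList(1, 2, 6, 5)));
    }
}
```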
As has been pointed out, you are modifying the Collection while iterating over it, which causes your problem.
Change data to a List and then use a regular for loop to step over it. Then you will no longer have an Iterator to deal with, thus eliminating your problem.
List<Integer> data =...
for (int i = 0; i < data.size(); i++) {
    int value = data.get(i);
    if (value % 5 == 1) {
        System.out.println(value);
        data.remove(i); // remove by index
        data.add(value + 1);
        i--; // the next element has shifted into slot i
    } else {
        System.out.println("value % 5 = " + value % 5);
    }
}
ConcurrentModificationException is not eliminated by putting a synchronized block around the collection being iterated. The exception occurs after the following sequence of steps:
1. Obtain an iterator from a collection (by calling its iterator() method or through the for-each construct).
2. Begin iterating (by calling next() or in the for loop).
3. Have the collection modified by any means other than the iterator's own methods (either in the same thread or a different one: this is what is happening in your code). Note that this modification may happen in a thread-safe manner, i.e. sequentially, one after another; it does not matter, it will still lead to a CME.
4. Continue iterating with the same iterator obtained earlier (before the modification in step 3).
To avoid the exception, you must make sure that the collection is not modified in either thread from the moment your for loop starts until it has finished.
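The sequence above can be reproduced on a single thread with an ArrayList. This is a minimal sketch with invented names:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class FourStepsSketch {
    // Returns true if the sequence ends in a ConcurrentModificationException.
    static boolean reproduce() {
        List<Integer> data = new ArrayList<>();
        data.add(1);
        data.add(2);
        data.add(3);
        Iterator<Integer> it = data.iterator(); // obtain an iterator
        it.next();                              // begin iterating
        data.add(4);                            // modify the list, not through the iterator
        try {
            it.next();                          // continue with the same iterator
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("CME thrown: " + reproduce());
    }
}
```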

I am confused -- Will this code always work?

I have written this piece of code
public class Test{
public static void main(String[] args) {
List<Integer> list = new ArrayList<Integer>();
for(int i = 1;i<= 4;i++){
new Thread(new TestTask(i, list)).start();
}
while(list.size() != 4){
// this while loop required so that all threads complete their work
}
System.out.println("List "+list);
}
}
class TestTask implements Runnable{
private int sequence;
private List<Integer> list;
public TestTask(int sequence, List<Integer> list) {
this.sequence = sequence;
this.list = list;
}
@Override
public void run() {
list.add(sequence);
}
}
This code works and prints all the four elements of list on my machine.
My question is: will this code always work? I think there might be an issue when two or more threads add an element to the list at the same moment. In that case the while loop would never end and the code would fail.
Can anybody suggest a better way to do this? I am not very good at multithreading and don't know which concurrent collection I can use.
Thanks, Shekhar
Use this in order to get a real thread-safe list:
List<Integer> list = Collections.synchronizedList(new ArrayList<Integer>());
Depending on your usage, a CopyOnWriteArrayList could also be interesting for you, specifically when traversal operations vastly outnumber mutations of the list.
AFAIR, the standard List implementations such as ArrayList are not thread-safe in Java, so you might get anything from working to crashing. Use synchronized access to the list in order to get well-defined behaviour:
@Override
public void run() {
synchronized(list) {
list.add(sequence);
}
}
This way, access to the list is only possible for a single thread at a time.
Also, I'd use Thread.join() to wait for the threads to finish (you have to keep them in a separate list to do that...)
I think you need to do two things. First of all, you need to join the threads, because at the moment the loop in main will sometimes run before the threads have completed.
You have to do it like this:
Thread[] threads = new Thread[4];
for (int i = 0; i < 4; i++) {
    // array indices run 0..3; the task sequence numbers stay 1..4
    threads[i] = new Thread(new TestTask(i + 1, list));
    threads[i].start();
}
// wait for all threads to finish (join() throws InterruptedException)
for (int i = 0; i < 4; i++) {
    threads[i].join();
}
while(list.size() != 4){
// this while loop required so that all threads complete their work
}
and you can make your ArrayList thread-safe by wrapping it in:
List<Integer> list = Collections.synchronizedList(new ArrayList<Integer>());
No, there are no guarantees. The simplest solution would be to join with each thread.
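Putting the two suggestions together (a synchronized list plus join()), a complete version could look like this sketch. The class and method names are invented, and a lambda stands in for the question's TestTask:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class JoinSketch {
    // Starts one thread per sequence number and waits for all of them with join().
    static List<Integer> collect(int threadCount) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            int sequence = i + 1;
            threads[i] = new Thread(() -> list.add(sequence));
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join(); // no busy-waiting loop needed
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        // Insertion order depends on scheduling, so sort a copy for a stable result.
        List<Integer> sorted = new ArrayList<>(list);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println("List " + collect(4));
    }
}
```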
Read up on wait() and notify() instead of having a busy-waiting while-loop.
