Concurrent threads adding to ArrayList at same time - what happens? - java

We have multiple threads calling add(obj) on an ArrayList.
My theory is that when add is called concurrently by two threads, only one of the two objects is actually added to the ArrayList. Is this plausible?
If so, how do you get around this? Use a synchronized collection like Vector?

There is no guaranteed behavior for what happens when add is called concurrently by two threads on ArrayList. However, it has been my experience that both objects have been added fine. Most of the thread safety issues related to lists deal with iteration while adding/removing. Despite this, I strongly recommend against using vanilla ArrayList with multiple threads and concurrent access.
Vector used to be the standard for concurrent lists, but now the standard is to use the Collections synchronized list.
Also I highly recommend Java Concurrency in Practice by Goetz et al if you're going to be spending any time working with threads in Java. The book covers this issue in much better detail.

Any number of things could happen. You could get both objects added correctly. You could get only one of the objects added. You could get an ArrayIndexOutOfBoundsException because the size of the underlying array was not adjusted properly. Or other things may happen. Suffice it to say that you cannot rely on any behavior occurring.
As alternatives, you could use Vector, you could use Collections.synchronizedList, you could use CopyOnWriteArrayList, or you could use a separate lock. It all depends on what else you are doing and what kind of control you have over access to the collection.
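As a rough illustration of the first two alternatives, here is a minimal sketch. The names SafeAddDemo and fillConcurrently are made up for this example and are not from any library. It adds to a list from several threads and reports the final size; with a synchronized wrapper or a Vector, no additions should be lost.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SafeAddDemo {
    // Hypothetical helper: each of `threads` workers adds `perThread` items to `target`.
    static int fillConcurrently(List<Integer> target, int threads, int perThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    target.add(i);
                }
            });
        }
        pool.shutdown();
        try {
            // Wait for every worker so the size we read is the final one.
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return target.size();
    }

    public static void main(String[] args) {
        List<Integer> wrapped = Collections.synchronizedList(new ArrayList<>());
        System.out.println("synchronizedList: " + fillConcurrently(wrapped, 8, 10_000));
        System.out.println("Vector:           " + fillConcurrently(new Vector<>(), 8, 10_000));
    }
}
```

Passing a plain new ArrayList<>() to the same helper is the unsafe case discussed above: it may report a smaller size, or look fine on a given run.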

You could also get a null, an ArrayIndexOutOfBoundsException, or something left up to the implementation. HashMaps have been observed to go into an infinite loop in production systems. You don't really need to know what might go wrong; just don't do it.
You could use Vector, but in practice its interface tends not to be rich enough. You will probably find that you want a different data structure in most cases.

I came up with the following code to mimic a somewhat real-world scenario.
100 tasks are run in parallel and they update their completed status to the main program. I use a CountDownLatch to wait for task completion.
import java.util.concurrent.*;
import java.util.*;
public class Runner {
// Should be replaced with Collections.synchronizedList(new ArrayList<Integer>())
public List<Integer> completed = new ArrayList<Integer>();
/**
* @param args
*/
public static void main(String[] args) {
Runner r = new Runner();
ExecutorService exe = Executors.newFixedThreadPool(30);
int tasks = 100;
CountDownLatch latch = new CountDownLatch(tasks);
for (int i = 0; i < tasks; i++) {
exe.submit(r.new Task(i, latch));
}
try {
latch.await();
System.out.println("Summary:");
System.out.println("Number of tasks completed: "
+ r.completed.size());
} catch (InterruptedException e) {
e.printStackTrace();
}
exe.shutdown();
}
class Task implements Runnable {
private int id;
private CountDownLatch latch;
public Task(int id, CountDownLatch latch) {
this.id = id;
this.latch = latch;
}
public void run() {
Random r = new Random();
try {
Thread.sleep(r.nextInt(5000)); //Actual work of the task
} catch (InterruptedException e) {
e.printStackTrace();
}
completed.add(id);
latch.countDown();
}
}
}
When I ran the application 10 times, at least 3 to 4 of the runs did not print the correct number of completed tasks. Ideally it should print 100 (if no exceptions happen), but in some cases it printed 98, 99, etc.
This shows that concurrent updates to an ArrayList can give incorrect results.
If I replace the ArrayList with a synchronized version, the program outputs the correct results.

You can use List l = Collections.synchronizedList(new ArrayList()); if you want a thread-safe version of ArrayList.

The behavior is probably undefined since ArrayList isn't thread-safe. If you modify the list while an Iterator is iterating over it, you may get a ConcurrentModificationException. You can wrap the ArrayList with Collections.synchronizedList, use a thread-safe collection (there are many), or just put the add calls in a synchronized block.
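One detail worth sketching: the wrapper from Collections.synchronizedList makes individual calls such as add safe, but its documentation requires callers to synchronize on the returned list themselves while iterating. The class below is a hypothetical example (GuardedIteration and totalLength are illustrative names only).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class GuardedIteration {
    private final List<String> items = Collections.synchronizedList(new ArrayList<>());

    // A single call is already synchronized by the wrapper.
    void add(String s) {
        items.add(s);
    }

    // Iteration spans many calls, so hold the wrapper's lock for the whole loop.
    int totalLength() {
        int total = 0;
        synchronized (items) {
            for (String s : items) {
                total += s.length();
            }
        }
        return total;
    }
}
```

Without the synchronized block, another thread could add an element in the middle of the loop and the iterator could fail or see inconsistent state.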

Instead of new ArrayList() you could use:
Collections.synchronizedList(new ArrayList());
or
new Vector();
I find synchronizedList preferable because it:
is 50-100% faster
can work with already existing ArrayLists

In my recent experience, adding new elements to an ArrayList from different threads will miss a few of them; using Collections.synchronizedList(new ArrayList()) avoids that issue.
List<String> anotherCollection = new ArrayList<>();
List<String> list = new ArrayList<>();
// if 'anotherCollection' is big enough, some elements will be missed.
anotherCollection.parallelStream().forEach(el -> list.add("element" + el));
List<String> listSync = Collections.synchronizedList(new ArrayList<>());
// regardless of how big 'anotherCollection' is, all the elements will be added.
anotherCollection.parallelStream().forEach(el -> listSync.add("element" + el));
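For the parallel stream case specifically, a different technique from the one in this answer is to let the stream build the result instead of adding to a shared list from forEach. This is only a sketch; ParallelCollectDemo and label are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class ParallelCollectDemo {
    // Builds the result inside the stream pipeline instead of mutating a shared list.
    static List<String> label(List<String> source) {
        return source.parallelStream()
                .map(el -> "element" + el)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            source.add(String.valueOf(i));
        }
        System.out.println(label(source).size());
    }
}
```

Because no lambda touches shared mutable state, no synchronized wrapper is needed, and for a List source the collected result keeps the source order.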

java.util.concurrent has a thread-safe array list (CopyOnWriteArrayList). The standard ArrayList is not thread-safe, and the behavior when multiple threads update it at the same time is undefined. There can also be odd behavior with multiple readers while one or more threads are writing.

http://java.sun.com/j2se/1.4.2/docs/api/java/util/ArrayList.html
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally.
Since there is no synchronization internally, you cannot count on the specific outcome you theorize.
Things simply get out of sync, with unpleasant and unpredictable results.

Related

Lock List when in use

How do I lock a data structure (such as List) when someone is iterating over it?
For example, let's say I have this class with a list in it:
class A{
private List<Integer> list = new ArrayList<>();
public A() {
// initialize this.list
}
public List<Integer> getList() {
return list;
}
}
And I run this code:
public static void main(String[] args) {
A a = new A();
Thread t1 = new Thread(()->{
a.getList().forEach(System.out::println);
});
Thread t2 = new Thread(()->{
a.getList().removeIf(e->e==1);
});
t1.start();
t2.start();
}
I don't have a single block of code that uses the list, so I can't use synchronized().
I was thinking of locking the getList() method after it has been called but how can I know if the caller has finished using it so I could unlock it?
And I don't want to use CopyOnWriteArrayList because I care about performance.
"...after it has been called but how can I know if the caller has finished using it so I could unlock it?"
That's impossible. The iterator API fundamentally doesn't require that you explicitly 'close' them, so, this is simply not something you can make happen. You have a problem here:
Iterating over the same list from multiple threads is an issue if anybody modifies that list in between. Actually, threads are immaterial; if you modify a list then interact with an iterator created before the modification, you get ConcurrentModificationException guaranteed. Involve threads, and you merely usually get a CoModEx; you may get bizarre behaviour if you haven't set up your locking properly.
Your chosen solution is "I shall lock the list.. but how do I do that? Better ask SO". But that's not the correct solution.
You have a few options:
Use a lock
It's not specifically the iteration that you need to lock, it's "whatever interacts with this list". Make an actual lock object, and define that any interaction of any kind with this list must occur in the confines of this lock.
Thread t1 = new Thread(() -> {
a.acquireLock();
try {
a.getList().forEach(System.out::println);
} finally {
a.releaseLock();
}
});
t1.start();
Where acquireLock and releaseLock are methods you write that use a ReadWriteLock to do their thing.
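A minimal sketch of what those methods might look like, assuming a ReentrantReadWriteLock. The names LockableA and LockDemo and the split into read and write variants are choices made for this example, not something the question's code defines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockableA {
    private final List<Integer> list = new ArrayList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    List<Integer> getList() { return list; }

    // Readers can hold the read lock together; a writer needs exclusive access.
    void acquireReadLock() { lock.readLock().lock(); }
    void releaseReadLock() { lock.readLock().unlock(); }
    void acquireWriteLock() { lock.writeLock().lock(); }
    void releaseWriteLock() { lock.writeLock().unlock(); }
}

class LockDemo {
    static int run() {
        LockableA a = new LockableA();
        a.acquireWriteLock();
        try {
            for (int i = 0; i < 5; i++) a.getList().add(i);
        } finally {
            a.releaseWriteLock();
        }
        Thread t1 = new Thread(() -> {
            a.acquireReadLock();
            try {
                a.getList().forEach(System.out::println);
            } finally {
                a.releaseReadLock();
            }
        });
        Thread t2 = new Thread(() -> {
            a.acquireWriteLock();
            try {
                a.getList().removeIf(e -> e == 1);
            } finally {
                a.releaseWriteLock();
            }
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        a.acquireReadLock();
        try {
            return a.getList().size(); // one element (the value 1) was removed
        } finally {
            a.releaseReadLock();
        }
    }
}
```

This only works if every caller follows the convention; any code that touches getList() without taking the lock reintroduces the race.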
Use CopyOnWriteArrayList
COWList is an implementation of java.util.List with the property that it copies the backing store anytime you change anything about it. This has the benefit that any iterator you made is guaranteed to never throw ConcurrentModificationException: When you start iterating over it, you will end up iterating each value that was there as the list was when you began the iteration. Even if your code, or any other thread, starts modifying that list halfway through. The downside is, of course, that it is making lots of copies if you make lots of modifications, so this is not a good idea if the list is large and you're modifying it a lot.
Get rid of the getList() method, move the tasks into the object itself.
I don't know what a is (the object you call .getList() on), but apparently one of the functions it should expose is some job that you really can't do with a getList() call. It's not just that you want the contents: you want to get the contents in a stable fashion (perhaps the class should instead have a method that gives you a copy of the list), or perhaps you want to do a thing to each element inside it (e.g. instead of getting the list and calling .forEach(System.out::println) on it, pass System.out::println to a and let it do the work). You can then focus your locks or other solutions to avoid clashes in that code, and not in callers of a.
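A sketch of that idea, under the assumption that plain synchronized methods are acceptable for the workload. EncapsulatedList and its method names are hypothetical; the point is that the list never escapes, so all locking lives in one class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

class EncapsulatedList {
    private final List<Integer> list = new ArrayList<>();

    synchronized void add(Integer value) {
        list.add(value);
    }

    synchronized boolean removeIf(Predicate<Integer> test) {
        return list.removeIf(test);
    }

    // The action runs while the lock is held, so it should be short.
    synchronized void forEach(Consumer<Integer> action) {
        list.forEach(action);
    }

    // Callers get a stable copy they can iterate without any locking.
    synchronized List<Integer> snapshot() {
        return new ArrayList<>(list);
    }
}
```

Usage would then be a.forEach(System.out::println) in one thread and a.removeIf(e -> e == 1) in another, with no getList() call anywhere.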
Make a copy yourself
This doesn't actually work, even though it seems like it: Immediately clone the list after you receive it. This doesn't work, because cloning the list is itself an operation that iterates, just like .forEach(System.out::println) does, so if another thread interacts with the list while you are making your clone, it fails. Use one of the above 3 solutions instead.

Why this code doesn't throw ConcurrentModificationException when multiple threads work on same arraylist at the same time using iterator

Here two threads work on the same ArrayList: one thread reads the elements and another removes a specific element. I expected this to throw ConcurrentModificationException, but it does not. Why?
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
public class IteratorStudies {
public static final ArrayList<String> arr ;
static{
arr = new ArrayList<>();
for(int i=0;i<100;i++) {
arr.add("someCommonValue");
}
arr.add("someSpecialValue");
}
private static Integer initialValue = 4;
public static void main(String x[]) {
Thread t1 = new Thread(){
@Override
public void start(){
Iterator<String> arrIter = arr.iterator();
while(arrIter.hasNext()){
try {
String str = arrIter.next();
System.out.println("value :" + str);
}catch(ConcurrentModificationException e){
e.printStackTrace();
}
}
System.out.println("t1 complete:"+arr);
}
};
Thread t2 = new Thread(){
@Override
public void start(){
Iterator<String> arrIter = arr.iterator();
while(arrIter.hasNext()){
String str = arrIter.next();
if(str.equals("someSpecialValue")){
arrIter.remove();
}
}
System.out.println("t2 complete:"+arr);
}
};
t2.start();
t1.start();
}
}
You've made 2 somewhat common mistakes.
ConcurrentModificationException is not about concurrency
You'd think, given the name, that CoModEx is about concurrency. It's not. As in, you don't need threads to get it. Here, this trivial code will throw it:
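void example() {
    var list = new ArrayList<String>();
    list.add("a");
    list.add("b");
    list.add("c"); // needed: with only two elements the loop ends before the iterator is asked for another element
    for (String elem : list) {
        if (elem.equals("a")) list.remove(elem);
    }
}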
That's because CoModEx is thrown by iterators and simply means this happened:
Somebody made an iterator.
Somebody changed the list somehow (and not via the iterator's .remove() method)
Somebody runs any relevant method on the iterator made in #1
So, in the above, the foreach loop implicitly makes an iterator (#1), then the list.remove method is invoked (#2), then by hitting the foreach loop again, we call a relevant method on that iterator (.next()), and, voila, CoModEx occurs.
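The three steps can be reproduced in a single thread with a sketch like the following. CoModDemo and its method names are made up for illustration; removeIf is shown as one safe alternative alongside the iterator's own remove().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

class CoModDemo {
    // Returns true if removing through the list inside a foreach loop throws.
    static boolean loopRemovalThrows() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        try {
            for (String elem : list) {      // #1: the loop creates an iterator
                if (elem.equals("a")) {
                    list.remove(elem);      // #2: the list changes behind the iterator's back
                }
            }                               // #3: the iterator is used again and notices
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Lets the list do the removal itself, so no outside iterator is involved.
    static List<String> removeSafely() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        list.removeIf("a"::equals);
        return list;
    }
}
```

No thread is started anywhere in this sketch, which is the point: the exception is about an iterator being invalidated, not about concurrency.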
In fact, multithreaded is less likely: After all, you should assume that if you interact with some object from multiple threads, that it is broken, in that behaviour is unspecified, thus, you have a bug, and worse, a hard to test for one. If you modify a plain jane ArrayList from another thread whilst iterating over it, you are not guaranteed a CoModEx. You may get it. You may not. The computer may walk off the desk and try its luck on Broadway. "Unspecified behaviour" is a nice way of saying: "Don't, seriously. It'll hurt the whole time because you cannot test it; this will work fine the entire time you are developing it, and juuust as you're giving that important demo to big wig client, it'll fail on you, in embarrassing ways".
The way to interact with one object from multiple threads is very carefully: Check that the docs of the specific object explicitly state what happens (i.e. use stuff from the java.util.concurrent package, which is specifically designed with 'interact with it from more than one thread' use cases in mind), and failing that, use locking. These are tricky things, so the usual way to do multi-threading in Java is to not have shared state in the first place. Isolate as much as you can, invert control, and use messaging strategies that have built-in transactional intrinsics, such as message queues (rabbitmq and friends) and databases (which have transactions).
How to use Threads
You override the run() method, and then start the thread by calling the start method. Or better yet, don't override run, pass a Runnable instance along as you create the thread instance.
That's how you use thread. You didn't - you overrode start, which means starting these threads doesn't make a new thread at all, it just runs the payload in your thread. That explains your specific case, but what you're trying to do (witness CoModEx by messing with a list from another thread) doesn't get you a CoModEx either - it gets you unspecified behaviour, which means anything goes.
You have overridden the start method of both thread instances instead of run, and these methods complete within the main execution thread. Thus, no simultaneous thread execution takes place and no ConcurrentModificationException can occur here.
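The difference can be seen by recording which thread actually runs the payload. This is a small sketch with hypothetical names (ThreadStartDemo, runPayload, overriddenStart).

```java
class ThreadStartDemo {
    // Correct usage: pass a Runnable, then call start(); the payload runs on the new thread.
    static String runPayload() {
        String[] executedOn = new String[1];
        Thread worker = new Thread(
                () -> executedOn[0] = Thread.currentThread().getName(), "worker-1");
        worker.start();
        try {
            worker.join(); // wait for the worker so its write is visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return executedOn[0];
    }

    // The mistake from the question: overriding start() means no thread is ever spawned.
    static String overriddenStart() {
        String[] executedOn = new String[1];
        Thread broken = new Thread("worker-2") {
            @Override
            public void start() {
                executedOn[0] = Thread.currentThread().getName();
            }
        };
        broken.start(); // just an ordinary method call on the current thread
        return executedOn[0];
    }
}
```

runPayload() reports the worker's name, while overriddenStart() reports the name of whichever thread called it.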

AtomicReference<ArrayList> difficulties and alternatives

So in various programs that I have been writing for fun, I have come across concurrent modification exceptions.
In my naive attempt to solve this problem I used an AtomicReference<ArrayList> instead of some sort of concurrent collection. I am somewhat familiar with why this is causing errors. Essentially the individual elements of the ArrayList are not synchronized and can be modified at whim by different threads.
Could someone summarize for me
What errors occur when I try to make an atomic reference to a collection?
What is a legitimate use case for an atomic reference to an array or list?
What are some better alternatives for storing instances for a game which can be used by multiple threads?
Using AtomicReference to store an object such as a collection is not enough to make it thread safe. If the object you put in the AtomicReference is not thread safe, like an ArrayList, using it is still unsafe when multiple threads try to modify its state concurrently. A good approach is to put an immutable object into your AtomicReference, so that its state cannot be modified by multiple threads anymore. For a collection you can, for example, use the Collections.unmodifiable* methods, such as Collections.unmodifiableList(List), to put a read-only version of your collection into the AtomicReference. Note that these methods return a view, so the underlying list must not be modified afterwards either.
If you need thread-safe collections, you should have a look at the classes in the package java.util.concurrent, where you will find collections that are natively thread safe. For example, if you mostly read and rarely modify your List, you can use the thread-safe and efficient CopyOnWriteArrayList.
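To make the immutable-snapshot idea concrete, here is a minimal sketch. SnapshotHolder is a hypothetical class; it copies the current list on every add and swaps the reference atomically with updateAndGet, which is essentially a hand-rolled copy-on-write scheme.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class SnapshotHolder {
    private final AtomicReference<List<String>> ref =
            new AtomicReference<>(Collections.emptyList());

    // Builds a new read-only list and swaps it in; retried automatically on contention.
    void add(String item) {
        ref.updateAndGet(current -> {
            List<String> copy = new ArrayList<>(current);
            copy.add(item);
            return Collections.unmodifiableList(copy);
        });
    }

    // Readers always see a complete list that never changes afterwards.
    List<String> current() {
        return ref.get();
    }
}
```

The private copy is never exposed, so nothing can modify it behind the unmodifiable view. For frequent writes this copying gets expensive, which is the same trade-off CopyOnWriteArrayList makes.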
I think you are confusing what a ConcurrentModification means...
The most common occurrence for this is when you iterate over a collection and modify it in the loop.
For instance if you do the following
public static void main(String[] args) {
List<String> l = new LinkedList<>();
for(int i=0; i < 100; i++) {
l.add("banana"+i);
}
for (String s : l) {
if("banana10".equals(s)) {
l.remove(s);
}
}
}
...this will give you a ConcurrentModificationException. Note, I have not spawned any threads.
The correct way to do the same is as follows:
public static void main(String[] args) {
List<String> l = new LinkedList<>();
for(int i=0; i < 100; i++) {
l.add("banana"+i);
}
for (Iterator<String> iterator = l.iterator(); iterator.hasNext();) {
String s = iterator.next();
if("banana10".equals(s)) {
iterator.remove();
}
}
}
Note the use of an iterator to modify the collection whilst you are looping over it.
So, I don't think you have a concurrency issue!
If you want to make your collection thread safe, you need to look at the semantics of the thread safety. If you want to allow multiple threads to access the same collection, a concurrent list implementation such as CopyOnWriteArrayList would be a good approach. If you want a list reference which is atomically set, as a whole, you can use an AtomicReference.

Java - concurrent clear of the list

I am trying to find a good way to achieve the following API:
void add(Object o);
void processAndClear();
The class would store the objects and upon calling processAndClear would iterate through the currently stored ones, process them somehow, and then clear the store. This class should be thread safe.
The obvious approach is to use locking, but I wanted to be more "concurrent". This is the approach I would use:
class Store{
private AtomicReference<CopyOnWriteArrayList<Object>> store = new AtomicReference<>(new CopyOnWriteArrayList <>());
void add(Object o){
store.get().add(o);
}
void processAndClear(){
CopyOnWriteArrayList<Object> objects = store.get();
store.compareAndSet(objects, new CopyOnWriteArrayList<>());
for (Object object : objects) {
//do sth
}
}
}
This would allow threads that try to add objects to proceed almost immediately without any locking/waiting for the clearing to complete. Is this a more or less correct approach?
Your above code is not thread-safe. Imagine the following:
Thread A is put on hold at add() right after store.get()
Thread B is in processAndClear(), replaces the list, processes all elements of the old one, then returns.
Thread A resumes and adds a new item to the now obsolete list that will never be processed.
Probably the easiest solution here would be to use a LinkedBlockingQueue, which would also simplify the task a lot:
class Store{
final LinkedBlockingQueue<Object> queue = new LinkedBlockingQueue<>();
void add(final Object o) throws InterruptedException { // put() can be interrupted while waiting
queue.put(o); // blocks until there is free space in the optionally bounded queue
}
void processAndClear(){
Object element;
while ((element = queue.poll()) != null) { // does not block on empty list but returns null instead
doSomething(element);
}
}
}
Edit: How to do this with synchronized:
class Store{
final LinkedList<Object> queue = new LinkedList<>(); // has to be final for synchronized to work
void add(final Object o){
synchronized(queue) { // on the queue as this is the shared object in question
queue.add(o);
}
}
void processAndClear() {
final LinkedList<Object> elements = new LinkedList<>(); // temporary local list
synchronized(queue) { // here as well, as every access needs to be properly synchronized
elements.addAll(queue);
queue.clear();
}
for (Object e : elements) {
doSomething(e); // this is thread-safe as only this thread can access these now local elements
}
}
}
Why this is not a good idea
Although this is thread-safe, it is much slower compared to the concurrent version. Assume that you have a system with 100 threads that frequently call add, while one thread calls processAndClear. Then the following performance bottlenecks will occur:
If one thread calls add the other 99 are put on hold in the meantime.
During the first part of processAndClear all 100 threads are put on hold.
If you assume that those 100 adding threads have nothing else to do, you can easily show that the application runs at the same speed as a single-threaded application minus the cost of synchronization. That means adding will effectively be slower with 100 threads than with 1. This is not the case if you use a concurrent collection as in the first example.
There will however be a minor performance gain with the processing thread, as doSomething can be run on the old elements while new ones are added. But again the concurrent example could be faster, as you could have multiple threads do the processing simultaneously.
Effectively, synchronized can be used as well, but you will automatically introduce performance bottlenecks, potentially causing the application to run slower than a single-threaded one and forcing you to do complicated performance tests. In addition, extending the functionality always carries a risk of introducing threading issues, as locking needs to be done manually. A concurrent collection, in contrast, solves all these problems without additional code, and the code can easily be changed or extended later on.
The class would store the objects and upon calling processAndClear would iterate through the currently stored ones, process them somehow, and then clear the store.
This seems like you should use a BlockingQueue for this task. Your add(...) method would add to the queue and your consumer would call take() which blocks waiting for the next item. The BlockingQueue (ArrayBlockingQueue is a typical implementation) takes care of all of the synchronization and signaling for you.
This means that you need neither a CopyOnWriteArrayList nor an AtomicReference. What you would lose is a collection that you can iterate through for reasons other than those your post currently articulates.
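If the batch-style processAndClear API should be kept, one possible shape is to drain the queue into a local list instead of calling take() in a loop. This is only a sketch: QueueStore, the Consumer parameter, and the returned count are assumptions added for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

class QueueStore {
    // Default capacity is Integer.MAX_VALUE, so add() effectively never fails for lack of space.
    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();

    void add(Object o) {
        queue.add(o);
    }

    // Moves everything currently queued into a private batch, then processes the batch.
    int processAndClear(Consumer<Object> handler) {
        List<Object> batch = new ArrayList<>();
        queue.drainTo(batch);
        batch.forEach(handler);
        return batch.size();
    }
}
```

Elements added while a batch is being processed simply stay in the queue for the next call, so nothing ends up in an obsolete list.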
