i have a list of string(tagList) which need to shared among multiple threads for reading, so i create a unmodifiable version of it and pass it to threads, i m not sure if it's thread safe, since threads only read that list so i guess it should be ok?
also when i pass that unmodifialble list to the threads, does it pass a single copy and shared by threads or does it create multiple copy and pass one copy to each thread?
here is my code:
final List<String> tList = Collections.unmodifiableList(tagList);
List<Future<Void>> calls = new ArrayList<Future<Void>>();
FileStatus[] fsta = _fileSystem.listStatus(p);
for (FileStatus sta : fsta) {
final Path path = new Path(sta.getPath(), "data.txt");
if (!_fileSystem.exists(path)) {
continue;
}
else {
calls.add(_exec.submit(new Callable<Void>() {
#Override
public Void call() throws Exception {
filterData(path, tList);
return null;
}
}));
}
}
This completely depends on whether underlying list is thread safe on read operations. Unmodifiable list just passes all read calls, such as size (), get (int) etc to underlying list without additional synchronization.
Imagine, for example, implementation of List which caches hash code instead of calculating it each time it is needed. For such implementation, hashCode () method is actually not read-only, because it may modify internally cached hash code value.
Another example is a flavor of LinkedList which caches reference to last accessed entry together with its index, so further attempts to access nearby elements will be performed much faster. For such implementation, method get (int) will be not read-only because it updates cached reference and index, and thus will probably be not thread safe.
It is thread safe (since it can't be modified). It passes the same copy around.
However, the wrapped List (tagList) is still NOT thread safe. You must not modify the wrapped List while it's being shared. The only reason the list returned by unmodifiableList() is safe is because modifications are not allowed through it to the wrapped List.
As you say that the list is unmodifialble; hence it will be thread safe.
When you pass an object you actually pass the reference to the object (and not the actual object). As there is a single copy and its not getting modified so it will remain thread safe.
Just one caution; never access tagList directly. Access it always through wrapped unmodifiable collection named as tList. This can be achieved if you encapsulate it properly.
Related
How do I lock a data structure (such as List) when someone is iterating over it?
For example, let's say I have this class with a list in it:
class A{
private List<Integer> list = new ArrayList<>();
public MyList() {
// initialize this.list
}
public List<Integer> getList() {
return list;
}
}
And I run this code:
public static void main(String[] args) {
A a = new A();
Thread t1 = new Thread(()->{
a.getList().forEach(System.out::println);
});
Thread t2 = new Thread(()->{
a.getList().removeIf(e->e==1);
});
t1.start();
t2.start();
}
I don't have a single block of code that uses the list, so I can't use synchronized().
I was thinking of locking the getList() method after it has been called but how can I know if the caller has finished using it so I could unlock it?
And I don't want to use CopyOnWriteArrayList because of I care about my performance;
after it has been called but how can I know if the caller has finished using it so I could unlock it?
That's impossible. The iterator API fundamentally doesn't require that you explicitly 'close' them, so, this is simply not something you can make happen. You have a problem here:
Iterating over the same list from multiple threads is an issue if anybody modifies that list in between. Actually, threads are immaterial; if you modify a list then interact with an iterator created before the modification, you get ConcurrentModificationException guaranteed. Involve threads, and you merely usually get a CoModEx; you may get bizarre behaviour if you haven't set up your locking properly.
Your chosen solution is "I shall lock the list.. but how do I do that? Better ask SO". But that's not the correct solution.
You have a few options:
Use a lock
It's not specifically the iteration that you need to lock, it's "whatever interacts with this list". Make an actual lock object, and define that any interaction of any kind with this list must occur in the confines of this lock.
Thread t1 = new Thread(() -> {
a.acquireLock();
try {
a.getList().forEach(System.out::println);
} finally {
a.releaseLock();
}
});
t1.start();
Where acquireLock and releaseLock are methods you write that use a ReadWriteLock to do their thing.
Use CopyOnWriteArrayList
COWList is an implementation of java.util.List with the property that it copies the backing store anytime you change anything about it. This has the benefit that any iterator you made is guaranteed to never throw ConcurrentModificationException: When you start iterating over it, you will end up iterating each value that was there as the list was when you began the iteration. Even if your code, or any other thread, starts modifying that list halfway through. The downside is, of course, that it is making lots of copies if you make lots of modifications, so this is not a good idea if the list is large and you're modifying it a lot.
Get rid of the getList() method, move the tasks into the object itself.
I don't know what a is (the object you call .getList() on, but apparently one of the functions that whatever this is should expose is some job that you really can't do with a getList() call: It's not just that you want the contents, you want to get the contents in a stable fashion (perhaps the method should instead have a method that gives you a copy of the list), or perhaps you want to do a thing to each element inside it (e.g. instead of getting the list and calling .forEach(System.out::println) on it, instead pass System.out::println to a and let it do the work. You can then focus your locks or other solutions to avoid clashes in that code, and not in callers of a.
Make a copy yourself
This doesn't actually work, even though it seems like it: Immediately clone the list after you receive it. This doesn't work, because cloning the list is itself an operation that iterates, just like .forEach(System.out::println) does, so if another thread interacts with the list while you are making your clone, it fails. Use one of the above 3 solutions instead.
I have read some questions & answers on visibility of Java array elements from multiple threads, but I still can't really wrap my head around some cases. To demonstrate what I'm having trouble with, I have come up with a simple scenario: Assume that I have a simple collection that adds elements into one of its n buckets by hashing them into one (Bucket is like a list of some sort). And each bucket is separately synchronized. E.g. :
private final Object[] locks = new Object[10];
private final Bucket[] buckets = new Bucket[10];
Here a bucket i is supposed to be guarded by lock[i]. Here is how add elements code looks like:
public void add(Object element) {
int bucketNum = calculateBucket(element); //hashes element into a bucket
synchronized (locks[bucketNum]) {
buckets[bucketNum].add(element);
}
}
Since 'buckets' is final, this would not have any visibility problem even without synchronization. My guess is, with synchronization, this wouldn't have any visibility problems without the final either, is this correct?
And finally, a bit trickier part. Assume I want to copy out and merge the contents of all buckets and empty the whole data structure, from an arbitrary thread, like this:
public List<Bucket> clear() {
List<Bucket> allBuckets = new List<>();
for(int bucketNum = 0; bucketNum < buckets.length; bucketNum++) {
synchronized (locks[bucketNum]) {
allBuckets.add(buckets[bucketNum]);
buckets[bucketNum] = new Bucket();
}
}
return allBuckets;
}
I basically swap the old bucket with a newly created one and return the old one. This case is different from the add() one because we are not modifying the object referred by the reference in the array but we are directly changing the array/reference.
Note that I do not care if bucket 2 is modified while I'm holding the lock for bucket 1, I don't need the structure to be fully synchronized and consistent, just visibility and near consistency is enough.
So assuming every bucket[i] is only ever modified under lock[i], would you say that this code works? I hope to be able to learn why and why nots and have a better grasp of visibility, thanks.
First question.
Thread safety in this case depends on whether the reference to the object containing locks and buckets (let's call it Container) is properly shared.
Just imagine: one thread is busy instantiating a new Container object (allocating memory, instantiating arrays, etc.), while another thread starts using this half-instantiated object where locks and buckets are still null (they have not been instantiated by the first thread yet). In this case this code:
synchronized (locks[bucketNum]) {
becomes broken and throws NullPointerException. The final keyword prevents this and guarantees that by the time the reference to Container is not null, its final fields have already been initialized:
An object is considered to be completely initialized when its
constructor finishes. A thread that can only see a reference to an
object after that object has been completely initialized is guaranteed
to see the correctly initialized values for that object's final
fields. (JLS 17.5)
Second question.
Assuming that locks and buckets fields are final and you don't care about consistency of the whole array and "every bucket[i] is only ever modified under lock[i]", this code is fine.
Just to add to Pavel's answer:
In your first question you ask
Since 'buckets' is final, this would not have any visibility problem even without synchronization. My guess is, with synchronization, this wouldn't have any visibility problems without the final either, is this correct?
I'm not sure what you mean by 'visibility problems', but for sure, without the synchronized this code would be incorrect, if multiple threads would access buckets[i] with one of them modifying it (e.g. writing to it). There would be no guarantee that what one thread have written, becomes visible to another one. This also involves internal structures of the bucket which might be modified by the call to add.
Remember that final on buckets pertain only to the single reference to the array itself, not to its cells.
In thread A, an ArrayList is created. It is managed from thread A only.
In thread B, I want to copy that to a new instance.
The requirement is that copyList should not fail and should return a consistent version of the list (= existed at some time at least during the copying process).
My approach is this:
public static <T> ArrayList<T> copyList(ArrayList<? extends T> list) {
List<? extends T> unmodifiableList = Collections.unmodifiableList(list);
return new ArrayList<T>(unmodifiableList);
}
Q1: Does that satisfy the requirements?
Q2: How can I do the same without Collections.unmodifiableList with proably iterators and try-catch blocks?
UPD. That is an interview question I was asked a year ago. I understand this a bad idea to use non-thread-safe collections like ArrayList in multithreaded environment
No. ArrayList is not thread safe and you are not using an explicit synchronization.
While you are executing the method unmodifiableList the first thread can modify the original list and you will have a not valid unmodifiable list.
The simplest way I think is the following:
Replace the List with a synchronized version of it.
On the copy list synchronize on the arrayList and make a copy
For example, something like:
List<T> l = Collections.synchronizedList(new ArrayList<T>());
...
public static <T> List<T> copyList(List<? extends T> list) {
List<T> copyList = null;
synchronized(list) {
copyList = new ArrayList<T>(list);
}
return copyList;
}
You should synchronize access to the ArrayList, or replace ArrayList with a concurrent collection like CopyOnWriteArrayList.
Without doing that you might end up with a copy that is inconsistent.
There is absolutely no way to create a copy of a plain ArrayList if the "owning" thread does not offer some protocol to do so.
Without any protocol, thread A can modify the list potentially at any time, meaning thread B never gets a chance to ensure that is sees a consistent state of the list.
To actually allow a consistent copy to be made, thread A must ensure that any modifications it has made are written to memory and are visible to other threads.
Normally, the VM is allowed to reorder instructions, reads and writes as it sees fit, provided no difference can be observed from within the thread executing the program. This includes, for example, delaying writes by holding values in CPU registers or on the local stack.
The only way to ensure that everything is consistently written to main menory, is for thread A to execute an instruction that presents a reordering barrier to the VM (e.g. synchronized block or volatile field access).
So without some cooperation from thread A, there is no way to ensure above conditions are guaranteed to be fulfilled.
Common methods of circumventing this are to synchronize access to the List by only using it in a safely wrapped form (Collections.synchronizedCollection), or use of a List implementation that has these guarantees built in (any type of concurrent list implementation).
The javadoc for Collections.unmodifiableList(...) says, "Returns an unmodifiable view of the specified list."
The key word there is "view". That means it does not copy the data. All it does is create a wrapper for the given list with mutators that all throw exceptions rather than modify the base list.
Yes, but I acually create new ArrayList(Collections.unmodif...), wouldn't this work?
Oops! I missed that. If you're going to copy the list, then there's no point in calling unmodifiableList(). The only code that will ever access the unmodifiable view is the code that's right there in the same method where it's created. You don't have to worry about that code modifying the list contents because you wrote it.
On the other hand, if you're going to copy the list when other threads could be updating the list, then you're going to need synchronized all around. Every place where code could update the list needs to be in a synchronized block, as does the code that makes the copy. Of course, all of those synchronized blocks must synchronize on the same object.
Some programmers will use the list object itself as the lock object. Others will prefer to use a separate object.
Q1: Does that satisfy the requirements?
If the provided list is modified while copying it using new ArrayList<T>(unmodifiableList), you will get a ConcurrentModificationException even if you wrapped it using Collections.unmodifiableList because the Iterator of an UnmodifiableList simply calls the Iterator of the wrapped list and here as it is a non thread safe list you can still get a ConcurrentModificationException.
What you could do is indeed use CopyOnWriteArrayList instead as it is a thread safe list implementation that provides consistent snapshots of the List when you try to iterate over it. Another way could be to make the Thread A push for other threads regularly a safe copy of it using new ArrayList<T>(myList) as it is the only thread that modifies it we know that while creating the copy no other thread will modify it so it would be safe.
Q2: How can I do the same without Collections.unmodifiableList with
probably iterators and try-catch blocks?
As mentioned above Collections.unmodifiableList is not helping here to make it thread safe, for me the only thing that could make sense is actually the opposite: the thread A (the only thread that can modify the list) creates a safe copy of your ArrayList using new ArrayList<T>(list) then it pushes to other threads an unmodified list of it using Collections.unmodifiableList(list).
Generally speaking you should avoid specifying implementations in your method's definition especially public ones, you should only use interfaces or abstract classes because otherwise you would provide an implementation details to the users of your API which is not expected. So here it should be List or Collection not ArrayList.
I have few ArrayList<T> containing user defined objects (e.g. List<Student>, List<Teachers>). The objects are immutable in nature, i.e. no setters are provided - and also, due to the nature of the problem, "no one" will ever attempt to modify these objects. Once the 'ArrayList' is populated, no further addition/removal of objects is allowed/possible. So List will not dynamically change.
With such given condition, can this data structures (i.e. ArraList) be safely used by multiple threads (simultaneously)? Each of the thread will just read the object-properties, but there is no "set" operation possible.
So, my question is can I rely on ArrayList? If not, what other less expensive data structures can be used in such scenario?
You can share any objects or data structures between threads if they are never modified after a safe publication. As mentioned in the comments, there must be a * happen-before* relationship between the writes that initialize the ArrayList and the read by which the other threads acquire the reference.
E.g. if you setup the ArrayList completely before starting the other threads or before submitting the tasks working on the list to an ExecutorService you are safe.
If the threads are already running you have to use one of the thread safe mechanisms to hand-over the ArrayList reference to the other threads, e.g. by putting it on a BlockingQueue.
Even simplest forms like storing the reference into a static final or volatile field will work.
Keep in mind that your precondition of never modifying the object afterwards must always hold. It’s recommended to enforce that constraint by wrapping the list using Collections.unmodifiableList(…) and forget about the original list reference before publishing:
class Example {
public static final List<String> THREAD_SAFE_LIST;
static {
ArrayList<String> list=new ArrayList<>();
// do the setup
THREAD_SAFE_LIST=Collections.unmodifiableList(list);
}
}
or
class Example {
public static final List<String> THREAD_SAFE_LIST
=Collections.unmodifiableList(Arrays.asList("foo", "bar"));
}
Yes, you should be able to pass the array into each thread. There should be no access errors so long as the array is finished being written to before any thread is possibly getting the information.
The javadocs are not clear about this: are unmodifiable sets thread safe? Or should I worry about internal state concurrency problems?
Set<String> originalSet = new HashSet<>();
originalSet.add("Will");
originalSet.add("this");
originalSet.add("be");
originalSet.add("thread");
originalSet.add("safe?");
final Set<String> unmodifiableSet = Collections.unmodifiableSet(originalSet);
originalSet = null; // no other references to the originalSet
// Can unmodifiableSet be shared among several threads?
I have stumbled upon a piece of code with a static, read-only Set being shared against multiple threads... The original author wrote something like this:
mySet = Collections.synchronizedSet(Collections.unmodifiableSet(originalSet));
And then every thread access it with code such as:
synchronized (mySet) {
// Iterate and read operations
}
By this logic only one thread can operate on the Set at once...
So my question is, for a unmodifiable set, when using operations such as for each, contains, size, etc, do I really need to synchronize access?
If it's an unmodifiable Set<String>, as per your example, then you're fine; because String objects are immutable. But if it's a set of something that's not immutable, you have to be careful about two threads both trying to change the same object inside the set.
You also have to be careful about whether there's a reference somewhere to the Set, that's not unmodifiable. It's possible for a variable to be unmodifiable, but still be referring to a Set which can be modified via a different variable; but your example seems to have that covered.
Objects that are "de facto" immutable are thread safe. i.e. objects that never change their state. That includes objects that could theoretically change but never do.
However all the objects contained inside must also be "de facto" immutable.
Furthermore the object only starts to become thread safe when you stop modifying it.
And it needs to be passed to the other threads in a safe manner. There are 2 ways to do that.
1.) you start the other threads only after you stopped modifying your object. In that case you don't need any synchronization at all.
2.) the other threads are already running while you are modifying the object, but once you completed constructing the object, you pass it to them through a synchronized mechanism e.g. a ConcurrentLinkedDeque. After that you don't need any further synchronization.