Preconditions (generic description):
1. static class field
static List<String> ids = new ArrayList<>();
2. CompletableFuture#runAsync(Runnable runnable, Executor executor) called within the static void main(String[] args) method
3. elements added to the collection from step 1 inside the runAsync call from step 2
Code snippet (specific description):
private static List<String> ids = new ArrayList<>();
public static void main(String[] args) throws ExecutionException, InterruptedException {
    //...
    final List<String> lines = Files.lines(path).collect(Collectors.toList());
    for (List<String> chunk : CollectionUtils.split(1024, lines)) {
        CompletableFuture<Void> future = CompletableFuture.runAsync(() -> {
            List<User> users = buildUsers();
            populate(users);
        }, executorService);
        futures.add(future);
    }
    //...
}
private static void populate(List<User> users) {
    //...
    for (User user : users) {
        ids.add(user.getId());
    }
    //...
}
Problem description:
As I understand it, from a concurrency point of view a static variable could NOT be shared between threads, so data can be lost somehow.
Should it be changed to volatile, or would it be reasonable to use ConcurrentSkipListSet<String>?
Based on the code snippet:
volatile is not required here because it works at the reference level: the tasks don't update the reference to the collection object, they mutate its state. If the reference were updated, either volatile or AtomicReference could have been used.
A static object can be shared between threads, but the object must be thread-safe. A concurrent collection will do the job for light to medium load.
But the modern way to do this would involve streams instead of using a shared collection:
List<CompletableFuture<List<String>>> futures = lines.stream()
        .map(line -> CompletableFuture.supplyAsync(() -> buildUsers().stream()
                        .map(User::getId)
                        .collect(Collectors.toList()),
                executorService))
        .collect(Collectors.toList());
ids.addAll(futures.stream()
        .map(CompletableFuture::join)
        .flatMap(List::stream)
        .collect(Collectors.toList()));
In your particular case, there are two ways to guarantee thread safety for ids:
1. Use a thread-safe collection (for example, ConcurrentSkipListSet, CopyOnWriteArrayList, Collections.synchronizedList(new ArrayList<>()) or Collections.newSetFromMap(new ConcurrentHashMap<>())).
2. Use synchronization, as shown below.
Examples of synchronization:
private static synchronized void populate(List<User> users) {
    //...
    for (User user : users) {
        ids.add(user.getId());
    }
    //...
}
private static void populate(List<User> users) {
    //...
    synchronized (ids) {
        for (User user : users) {
            ids.add(user.getId());
        }
    }
    //...
}
I assume that Collections.newSetFromMap(new ConcurrentHashMap<>()) would be the fastest option if you expect a lot of user ids. Otherwise, ConcurrentSkipListSet should be fine.
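A minimal, self-contained sketch of the thread-safe-collection option, assuming the ids are unique strings (a set silently drops duplicates, which a List would keep). The class name ConcurrentIdCollector, the pool size and the generated id format are hypothetical. ConcurrentHashMap.newKeySet() returns the same kind of concurrent set as Collections.newSetFromMap(new ConcurrentHashMap<>()).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ConcurrentIdCollector {
    // Shared static field, as in the question, but backed by a concurrent set
    // (same kind of set as Collections.newSetFromMap(new ConcurrentHashMap<>())).
    private static final Set<String> ids = ConcurrentHashMap.newKeySet();

    /** Runs `tasks` async tasks that each add `idsPerTask` distinct ids; returns the set size. */
    static int collect(int tasks, int idsPerTask) {
        ids.clear();
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                final int taskNo = t;
                futures.add(CompletableFuture.runAsync(() -> {
                    for (int i = 0; i < idsPerTask; i++) {
                        ids.add("task" + taskNo + "-id" + i); // concurrent adds are safe here
                    }
                }, executor));
            }
            // Wait for all tasks; their completion makes the writes visible to this thread.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            return ids.size();
        } finally {
            executor.shutdown();
        }
    }
}
```

Calling ConcurrentIdCollector.collect(8, 1000) returns 8000: no addition is lost, which is exactly what a plain ArrayList cannot promise here.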
volatile is a bad option here. volatile guarantees visibility, but not atomicity. A typical example of volatile usage is:
volatile int a = 1;
void threadOne() {
    if (a == 1) {
        // do something
    }
}
void threadTwo() {
    // do something
    a = 2;
}
In that case, each thread performs only a single read or a single write. Because a is volatile, each thread is guaranteed to see (read) exactly 1 or 2, and once threadTwo has written 2, subsequent reads in threadOne see it.
Another (bad) example:
void threadOne() {
    if (a == 1) {
        // do something
        a++;
    }
}
void threadTwo() {
    if (a == 1) {
        // do something
        a = 2;
    } else if (a == 2) {
        a++;
    }
}
Here we perform increments (a read followed by a write), so the final value of a can differ from run to run, because these operations are not atomic. That's why AtomicInteger, AtomicLong, etc. exist. In your case, all threads would see the same ids reference, but they would write different values, and if you look inside the add method of ArrayList, you will see something like:
elementData[size++] = e;
Nothing guarantees atomicity of the size++ update, so two threads can write different ids into the same array cell, and one of them is lost.
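To see the difference between visibility and atomicity, a small sketch (CounterDemo and run are hypothetical names) can hammer a volatile int and an AtomicInteger from several threads. The atomic counter always reaches the expected total; the volatile one may fall short, because two threads can read the same value and both write back value + 1.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CounterDemo {
    static volatile int volatileCounter = 0;                 // visible to all threads, but ++ is not atomic
    static final AtomicInteger atomicCounter = new AtomicInteger();

    /** Starts `threads` threads that each increment both counters `perThread` times. */
    static int[] run(int threads, int perThread) {
        volatileCounter = 0;
        atomicCounter.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    volatileCounter++;                      // separate read and write: updates can be lost
                    atomicCounter.incrementAndGet();        // one atomic read-modify-write
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] {volatileCounter, atomicCounter.get()};
    }
}
```

The first element of the result is at most threads * perThread and often smaller; the second is always exactly threads * perThread.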
In terms of thread safety, it doesn't matter whether the variable is static or not.
What matters is:
Visibility of shared state between threads.
Safety (preserving class invariants) when an object of the class is used by multiple threads through its methods.
Your code sample is fine from a visibility perspective, because ids is static and will be initialized during class initialization. However, it's better to mark it as final or volatile, depending on whether the ids reference can be changed. But safety is violated, because ArrayList by design doesn't preserve its invariants in a multithreaded environment. So you should use a collection that is designed to be accessed by multiple threads. This topic should help with the choice.
Is the compute() function thread safe? Will multiple threads loop correctly over the list?
class Foo {
    private List<Integer> list;
    public Foo(List<Integer> list) {
        this.list = list;
    }
    public void compute() {
        for (Integer i : list) {
            // do something with it
            // NO LIST modifications
        }
    }
}
Considering that the data does not mutate (as you mentioned in the comment), there will not be any dirty or phantom reads.
If the list is created specifically for the purposes of that method, then you're good to go. That is, if the list isn't modified in any other method or class, then that code is thread safe, since you're only reading.
A general recommendation is to make a read-only copy of the collection, if you're not sure the argument comes from a trustworthy origin (and even if you are sure).
this.list = Collections.unmodifiableList(new ArrayList<Integer>(list));
Note, however, that the elements of the list must also be thread-safe. If, in your real scenario, the list contains some mutable structure instead of Integer objects (which are immutable), you should make sure that any modifications to the elements are also thread-safe.
If you can guarantee that the list is not modified elsewhere while you're iterating over it, that code is thread-safe.
I would create a read-only copy of the list, though, to be absolutely sure that it won't be modified elsewhere:
class Foo {
    private final List<Integer> list;
    public Foo(List<Integer> list) {
        // read-only copy; the final field also guarantees safe publication to other threads
        this.list = Collections.unmodifiableList(new ArrayList<>(list));
    }
    public void compute() {
        for (Integer i : list) {
            // do something with it
            // NO LIST modifications
        }
    }
}
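A runnable sketch of the read-only-copy approach, under the assumption that compute only reads (here it returns the sum as a hypothetical stand-in for "do something with it"). SafeFoo and computeInParallel are illustrative names. Because the copy is private, a change the caller makes to its own list afterwards is not seen, and several threads can iterate concurrently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SafeFoo {
    private final List<Integer> list; // final: safely published once the constructor finishes

    SafeFoo(List<Integer> list) {
        // Private, read-only copy: later changes to the caller's list are not seen here.
        this.list = Collections.unmodifiableList(new ArrayList<>(list));
    }

    /** Only reads the list, so concurrent calls are safe. Returns the sum as a stand-in result. */
    int compute() {
        int sum = 0;
        for (Integer i : list) {
            sum += i;
        }
        return sum;
    }

    /** Runs compute() concurrently on `threads` pool threads and collects the results. */
    static List<Integer> computeInParallel(SafeFoo foo, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<Integer> task = foo::compute;
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : pool.invokeAll(Collections.nCopies(threads, task))) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, constructing SafeFoo from [1, 2, 3] and then adding 100 to the original list still makes every thread compute 6.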
If you don't mind adding a dependency to your project, I suggest using Guava's ImmutableList:
this.list = ImmutableList.copyOf(list);
It is also a good idea to use Guava's immutable collections wherever you're using collections that don't change, since they are inherently thread-safe due to being immutable.
You can easily inspect the behavior with, for example, two threads:
import java.util.Arrays;
public class Test {
    public static void main(String[] args) {
        // both threads iterate over the list of the same Foo instance
        Foo foo = new Foo(Arrays.asList(1, 2, 3));
        Runnable task1 = foo::compute;
        Runnable task2 = foo::compute;
        new Thread(task1).start();
        new Thread(task2).start();
    }
}
If the list is guaranteed not to be changed anywhere else, iterating over it is thread-safe. If you implement compute to simply print the list contents, debugging your code should help you understand why it is thread-safe.
There is a thread-safe list in the java.util.concurrent package: CopyOnWriteArrayList. If you want thread-safe collections, always use the classes from that package.
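One property worth knowing about CopyOnWriteArrayList: every write copies the backing array, and an iterator works on the snapshot that existed when it was created. The hypothetical sketch below (CowDemo is an illustrative name) adds to the list inside a for-each loop; with an ArrayList this would throw ConcurrentModificationException, while here the loop simply does not see the new elements. The copying makes writes expensive, so this class suits read-mostly lists.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class CowDemo {
    /** Adds to the list while iterating over it; returns {elements seen by the loop, final size}. */
    static int[] iterateWhileAdding() {
        List<Integer> list = new CopyOnWriteArrayList<>(Arrays.asList(1, 2, 3));
        int seen = 0;
        for (Integer i : list) {   // iterator reads the snapshot taken when the loop started
            list.add(i * 10);      // allowed: no ConcurrentModificationException
            seen++;
        }
        return new int[] {seen, list.size()};
    }
}
```

Here the loop sees 3 elements while the list ends up with 6.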
This version
class Foo {
    private final List<Integer> list;
    public Foo(List<Integer> list) {
        this.list = new ArrayList<>(list);
    }
    public void compute() {
        for (Integer i : list) {
            // ...
        }
    }
}
is thread-safe if the following holds:
the list argument to the constructor can't be modified while the constructor runs (e.g., it is a local variable in the caller) or is itself thread-safe (e.g., CopyOnWriteArrayList);
compute won't modify the list contents (just as the OP stated). I guess compute should return some numeric value rather than void to be of any use...
When using Java threading primitives to construct a thread-safe bounded queue, what's the difference between these two constructs?
1. Creating an explicit lock object.
2. Using the list as the lock and waiting on it.
Example of 1
private final Object lock = new Object();
private ArrayList<String> list = new ArrayList<String>();
public String dequeue() throws InterruptedException {
    synchronized (lock) {
        while (list.size() == 0) {
            lock.wait();
        }
        String value = list.remove(0);
        lock.notifyAll();
        return value;
    }
}
public void enqueue(String value) throws InterruptedException {
    synchronized (lock) {
        while (list.size() == maxSize) {
            lock.wait();
        }
        list.add(value);
        lock.notifyAll();
    }
}
Example of 2
private ArrayList<String> list = new ArrayList<String>();
public String dequeue() throws InterruptedException {
    synchronized (list) { // lock on list
        while (list.size() == 0) {
            list.wait(); // wait on list
        }
        String value = list.remove(0);
        list.notifyAll();
        return value;
    }
}
public void enqueue(String value) throws InterruptedException {
    synchronized (list) { // lock on list
        while (list.size() == maxSize) {
            list.wait(); // wait on list
        }
        list.add(value);
        list.notifyAll();
    }
}
Note
This is a bounded list
No other operation is being performed apart from enqueue and dequeue.
I could use a blocking queue, but this question is more for improving my limited knowledge of threading.
If this question is repeated please let me know.
The short answer is, no, there is no functional difference, other than the extra memory overhead of maintaining that extra lock object. However, there are a couple of semantics-related items I would consider before making a final decision.
Will I ever need to perform synchronized operations on more than just my internal list?
Let's say you wanted to maintain a parallel data structure to your ArrayList, such that all operations on the list and that parallel data structure needed to be synchronized. In this case, it might be best to use the external lock, as locking on either the list or the structure might be confusing to future development efforts on this class.
Will I ever give access to my list outside of my queue class?
Let's say you wanted to provide an accessor method for your list, or make it visible to extensions of your Queue class. If you were using an external lock object, classes that retrieved references to the list would never be able to perform thread-safe operations on that list. In that case, it'd be better to synchronize on the list and make it clear in the API that external accesses/modifications to the list must also synchronize on that list.
I'm sure there are more reasons why you might choose one over the other, but these are the two big ones I can think of.
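For reference, a minimal runnable version of the first construct (a private lock object), assuming a generic element type and a capacity passed to the constructor; BoundedQueue and produceAndConsume are hypothetical names. It uses an ArrayDeque instead of an ArrayList, since remove(0) on an ArrayList shifts every remaining element. Because the lock is private, no outside code can accidentally synchronize or wait on it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class BoundedQueue<T> {
    private final Object lock = new Object();   // private: outside code cannot lock on it
    private final Deque<T> items = new ArrayDeque<>();
    private final int maxSize;

    BoundedQueue(int maxSize) {
        this.maxSize = maxSize;
    }

    T dequeue() throws InterruptedException {
        synchronized (lock) {
            while (items.isEmpty()) {
                lock.wait();                    // releases the monitor while waiting
            }
            T value = items.removeFirst();
            lock.notifyAll();                   // wake producers waiting for free space
            return value;
        }
    }

    void enqueue(T value) throws InterruptedException {
        synchronized (lock) {
            while (items.size() == maxSize) {
                lock.wait();
            }
            items.addLast(value);
            lock.notifyAll();                   // wake consumers waiting for items
        }
    }

    /** Demo: one producer thread, consumer on the calling thread; returns the sum consumed. */
    static long produceAndConsume(int count, int capacity) {
        BoundedQueue<Integer> queue = new BoundedQueue<>(capacity);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= count; i++) {
                    queue.enqueue(i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        long sum = 0;
        try {
            for (int i = 0; i < count; i++) {
                sum += queue.dequeue();
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }
}
```

With count = 1000 and a capacity of 8, the producer blocks repeatedly on the full queue, yet the consumer still receives every value (sum 500500).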
I'm looking for a way to synchronize a method based on the parameter it receives, something like this:
public synchronized void doSomething(String name) {
    //some code
}
I want the method doSomething to be synchronized based on the name parameter like this:
Thread 1: doSomething("a");
Thread 2: doSomething("b");
Thread 3: doSomething("c");
Thread 4: doSomething("a");
Thread 1, Thread 2 and Thread 3 will execute the code without being synchronized, but Thread 4 will wait until Thread 1 has finished the code because it has the same "a" value.
Thanks
UPDATE
Based on Tudor's explanation, I think I'm facing another problem. Here is a sample of the new code:
private HashMap<String, Object> locks = new HashMap<>();
public void doSomething(String name) {
    locks.put(name, new Object());
    synchronized (locks.get(name)) {
        // ...
    }
    locks.remove(name);
}
The reason I don't populate the locks map in advance is that name can have any value.
With the sample above, a problem can appear when multiple threads add or remove values in the HashMap at the same time, since HashMap is not thread-safe.
So my question is: if I make the HashMap a ConcurrentHashMap, which is thread-safe, will the synchronized block stop other threads from accessing locks.get(name)?
TL;DR:
I use ConcurrentReferenceHashMap from the Spring Framework. Please check the code below.
Although this thread is old, it is still interesting, so I would like to share my approach using the Spring Framework.
What we are trying to implement is called a named mutex/lock. As suggested by Tudor's answer, the idea is to have a Map that stores the lock name and the lock object. The code looks like the following (copied from his answer):
Map<String, Object> locks = new HashMap<String, Object>();
locks.put("a", new Object());
locks.put("b", new Object());
However, this approach has 2 drawbacks:
The OP already pointed out the first one: how to synchronize the access to the locks hash map?
How to remove some locks which are not necessary anymore? Otherwise, the locks hash map will keep growing.
The first problem can be solved by using ConcurrentHashMap. For the second problem, we have 2 options: manually check and remove locks from the map, or somehow let the garbage collector know which locks are no longer used so that the GC removes them. I will go with the second way.
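For comparison, the first option (removing locks manually) can be made safe with reference counting. This is a hedged illustration, not part of the Spring-based approach in this answer: NamedLocks, runLocked and demo are hypothetical names, and the sketch relies on ConcurrentHashMap.compute running atomically per key, so registering a user, unregistering and removing the entry cannot race with a thread that is just arriving.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

class NamedLocks {
    private static final class Entry {
        final ReentrantLock lock = new ReentrantLock();
        int users; // only read and written inside compute(), which is atomic per key
    }

    private final ConcurrentHashMap<String, Entry> entries = new ConcurrentHashMap<>();

    void runLocked(String name, Runnable action) {
        // Atomically create the entry if needed and register this thread as a user.
        Entry entry = entries.compute(name, (k, e) -> {
            if (e == null) e = new Entry();
            e.users++;
            return e;
        });
        entry.lock.lock();
        try {
            action.run();
        } finally {
            entry.lock.unlock();
            // Atomically unregister; returning null removes the mapping once nobody uses it.
            entries.compute(name, (k, e) -> --e.users == 0 ? null : e);
        }
    }

    int size() {
        return entries.size();
    }

    /** Demo: threads increment a plain counter under the lock for one name; returns {count, map size}. */
    static int[] demo(int threads, int perThread) {
        NamedLocks locks = new NamedLocks();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    locks.runLocked("a", () -> counter[0]++);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] {counter[0], locks.size()};
    }
}
```

The unsynchronized counter ends at exactly threads * perThread, and the map is empty afterwards, so it does not keep growing.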
When we use HashMap or ConcurrentHashMap, it creates strong references. To implement the solution discussed above, weak references should be used instead (to understand what strong and weak references are, please refer to this article or this post).
So, I use ConcurrentReferenceHashMap from the Spring Framework. As described in the documentation:
A ConcurrentHashMap that uses soft or weak references for both keys
and values.
This class can be used as an alternative to
Collections.synchronizedMap(new WeakHashMap<K, Reference<V>>()) in
order to support better performance when accessed concurrently. This
implementation follows the same design constraints as
ConcurrentHashMap with the exception that null values and null keys
are supported.
Here is my code. The MutexFactory manages all the locks, where K is the type of the key.
@Component
public class MutexFactory<K> {
    private ConcurrentReferenceHashMap<K, Object> map;
    public MutexFactory() {
        this.map = new ConcurrentReferenceHashMap<>();
    }
    public Object getMutex(K key) {
        return this.map.compute(key, (k, v) -> v == null ? new Object() : v);
    }
}
Usage:
@Autowired
private MutexFactory<String> mutexFactory;
public void doSomething(String name) {
    synchronized (mutexFactory.getMutex(name)) {
        // ...
    }
}
Unit test (this test uses the awaitility library for some methods, e.g. await(), atMost(), until()):
public class MutexFactoryTests {
    private final int THREAD_COUNT = 16;
    @Test
    public void singleKeyTest() {
        MutexFactory<String> mutexFactory = new MutexFactory<>();
        String id = UUID.randomUUID().toString();
        final int[] count = {0};
        IntStream.range(0, THREAD_COUNT)
                .parallel()
                .forEach(i -> {
                    synchronized (mutexFactory.getMutex(id)) {
                        count[0]++;
                    }
                });
        await().atMost(5, TimeUnit.SECONDS)
                .until(() -> count[0] == THREAD_COUNT);
        Assert.assertEquals(THREAD_COUNT, count[0]); // expected value first, then actual
    }
}
Use a map to associate strings with lock objects:
Map<String, Object> locks = new HashMap<String, Object>();
locks.put("a", new Object());
locks.put("b", new Object());
// etc.
then:
public void doSomething(String name) {
    synchronized (locks.get(name)) {
        // ...
    }
}
Tudor's answer is fine, but it's static and not scalable. My solution is dynamic and scalable, but it comes with increased implementation complexity. The outside world can use this class just like a Lock, as this class implements that interface. You get an instance of a parameterized lock via the factory method getCanonicalParameterLock.
package lock;
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
public final class ParameterLock implements Lock {
    /** Holds a WeakKeyLockPair for each parameter. The mapping may be deleted upon garbage collection
     * if the canonical key is not strongly referenced anymore (by the threads using the Lock). */
    private static final Map<Object, WeakKeyLockPair> locks = new WeakHashMap<>();
    private final Object key;
    private final Lock lock;
    private ParameterLock(Object key, Lock lock) {
        this.key = key;
        this.lock = lock;
    }
    private static final class WeakKeyLockPair {
        /** The weakly-referenced parameter. If it were strongly referenced, the entries of
         * the lock Map would never be garbage collected, causing a memory leak. */
        private final Reference<Object> param;
        /** The actual lock object on which threads will synchronize. */
        private final Lock lock;
        private WeakKeyLockPair(Object param, Lock lock) {
            this.param = new WeakReference<>(param);
            this.lock = lock;
        }
    }
    public static Lock getCanonicalParameterLock(Object param) {
        Object canonical = null;
        Lock lock;
        synchronized (locks) {
            WeakKeyLockPair pair = locks.get(param);
            if (pair != null) {
                canonical = pair.param.get(); // could return null!
            }
            if (canonical == null) { // no such entry or the reference was cleared in the meantime
                canonical = param; // the first thread (the current thread) delivers the new canonical key
                pair = new WeakKeyLockPair(canonical, new ReentrantLock());
                locks.put(canonical, pair);
            }
            // The canonical key is strongly referenced now, so the pair cannot be cleared.
            // Read the lock while still holding the monitor: WeakHashMap is not thread-safe,
            // and even get() modifies it internally (it expunges stale entries).
            lock = pair.lock;
        }
        // ... but the key must be kept strongly referenced after this method returns,
        // so wrap it in the Lock implementation, which a thread of course needs
        // to be able to synchronize. This forces a thread to hold a strong reference
        // to the key without being aware of it (as this method declares to return a
        // Lock rather than a ParameterLock).
        return new ParameterLock(canonical, lock);
    }
    @Override
    public void lock() {
        lock.lock();
    }
    @Override
    public void lockInterruptibly() throws InterruptedException {
        lock.lockInterruptibly();
    }
    @Override
    public boolean tryLock() {
        return lock.tryLock();
    }
    @Override
    public boolean tryLock(long time, TimeUnit unit) throws InterruptedException {
        return lock.tryLock(time, unit);
    }
    @Override
    public void unlock() {
        lock.unlock();
    }
    @Override
    public Condition newCondition() {
        return lock.newCondition();
    }
}
Of course, you need a canonical key for a given parameter; otherwise threads would not be synchronized, as they would be using different Locks. Canonicalization is the equivalent of the interning of Strings in Tudor's solution. Whereas String.intern() is itself thread-safe, my 'canonical pool' is not, so I need extra synchronization on the WeakHashMap.
This solution works for any type of Object. However, make sure to implement equals and hashCode correctly in custom classes, because if not, threading issues will arise as multiple threads could be using different Lock objects to synchronize on!
The choice of a WeakHashMap is explained by the ease of memory management it brings. How else could one know that no thread is using a particular Lock anymore? And if this could be known, how could you safely delete the entry from the Map? You would need to synchronize upon deletion, because there is a race condition between an arriving thread wanting to use the Lock and the action of deleting the Lock from the Map. All of this is solved by using weak references, so the VM does the work for you, which simplifies the implementation a lot. If you inspect the API of WeakReference, you will find that relying on weak references is thread-safe.
Now inspect this test program (you need to run it from inside the ParameterLock class, due to private visibility of some fields):
public static void main(String[] args) {
    Runnable run1 = new Runnable() {
        @Override
        public void run() {
            sync(new Integer(5));
            System.gc();
        }
    };
    Runnable run2 = new Runnable() {
        @Override
        public void run() {
            sync(new Integer(5));
            System.gc();
        }
    };
    Thread t1 = new Thread(run1);
    Thread t2 = new Thread(run2);
    t1.start();
    t2.start();
    try {
        t1.join();
        t2.join();
        while (locks.size() != 0) {
            System.gc();
            System.out.println(locks);
        }
        System.out.println("FINISHED!");
    } catch (InterruptedException ex) {
        // those threads won't be interrupted
    }
}
private static void sync(Object param) {
    Lock lock = ParameterLock.getCanonicalParameterLock(param);
    lock.lock();
    try {
        System.out.println("Thread=" + Thread.currentThread().getName() + ", lock=" + ((ParameterLock) lock).lock);
        // do some work while having the lock
    } finally {
        lock.unlock();
    }
}
Chances are very high that you would see that both threads are using the same lock object, and so they are synchronized. Example output:
Thread=Thread-0, lock=java.util.concurrent.locks.ReentrantLock@8965fb[Locked by thread Thread-0]
Thread=Thread-1, lock=java.util.concurrent.locks.ReentrantLock@8965fb[Locked by thread Thread-1]
FINISHED!
However, there is some chance that the two threads do not overlap in execution, in which case they are not required to use the same lock. You could easily enforce this behavior in debugging mode by setting breakpoints at the right locations, forcing the first or second thread to stop wherever necessary. You will also notice that after the garbage collection on the main thread, the WeakHashMap is cleared, which is of course correct, as the main thread waited for both worker threads to finish their job by calling Thread.join() before calling the garbage collector. This indeed means that no strong reference to the (Parameter)Lock can exist anymore inside a worker thread, so the reference can be cleared from the weak hashmap. If another thread now wants to synchronize on the same parameter, a new Lock will be created in the synchronized part of getCanonicalParameterLock.
Now repeat the test with any pair that has the same canonical representation (= they are equal, so a.equals(b)), and see that it still works:
sync("a");
sync(new String("a"));
sync(new Boolean(true));
sync(new Boolean(true));
etc.
Basically, this class offers you the following functionality:
Parameterized synchronization
Encapsulated memory management
The ability to work with any type of object (provided that equals and hashCode are implemented properly)
Implements the Lock interface
This Lock implementation has been tested by modifying an ArrayList concurrently with 10 threads iterating 1000 times, each doing this: adding 2 items, then deleting the last found list entry by iterating over the full list. A lock is requested per iteration, so in total 10*1000 locks will be requested. No ConcurrentModificationException was thrown, and after all worker threads had finished, the total number of items was 10*1000. On every single modification, a lock was requested by calling ParameterLock.getCanonicalParameterLock(new String("a")), so a new parameter object is used to test the correctness of the canonicalization.
Please note that you shouldn't use String literals or primitive types as parameters. As String literals are automatically interned, they are always strongly referenced, so if the first thread arrives with a String literal as its parameter, the lock pool will never be freed of that entry, which is a memory leak. The same goes for autoboxed primitives: e.g., Integer has a caching mechanism that reuses existing Integer objects during autoboxing, which also causes a strong reference to exist. Addressing this, however, is a different story.
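A quick, hypothetical check of the canonical instances mentioned above (CanonicalDemo is an illustrative name): autoboxed Integer values from -128 to 127 are cached (the JLS guarantees identical references in that range), and string literals are interned, so those shared objects stay strongly reachable and their WeakHashMap entries are never cleared.

```java
class CanonicalDemo {
    /** Identity checks showing which objects are shared canonical instances. */
    static boolean[] checks() {
        Integer boxedA = 5;                 // autoboxing calls Integer.valueOf(5)
        Integer boxedB = 5;
        String literal = "a";               // string literals are interned by the JVM
        String copy = new String("a");      // explicitly creates a distinct String object
        return new boolean[] {
            boxedA == boxedB,               // true: -128..127 come from the Integer cache
            literal == copy,                // false: different objects, although equal
            literal == copy.intern(),       // true: intern() returns the canonical instance
            literal.equals(copy)            // true: equal by value
        };
    }
}
```

This is why the test above deliberately uses new Integer(5) and new String("a"): they produce fresh objects that only the worker threads reference.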
Check out this framework. It seems you're looking for something like this:
public class WeatherServiceProxy {
    ...
    private final KeyLockManager lockManager = KeyLockManagers.newManager();
    public void updateWeatherData(String cityName, Date samplingTime, float temperature) {
        lockManager.executeLocked(cityName, new LockCallback() {
            public void doInLock() {
                delegate.updateWeatherData(cityName, samplingTime, temperature);
            }
        });
    }
}
https://code.google.com/p/jkeylockmanager/
I've created a tokenProvider based on the IdMutexProvider of McDowell.
The manager uses a WeakHashMap which takes care of cleaning up unused locks.
You can find my implementation here.
I've found a proper answer through another Stack Overflow question: How to acquire a lock by a key
I copied the answer here:
Guava has something like this being released in 13.0; you can get it out of HEAD if you like.
Striped more or less allocates a specific number of locks, and then assigns strings to locks based on their hash code. The API looks more or less like
Striped<Lock> locks = Striped.lock(stripes);
Lock l = locks.get(string);
l.lock();
try {
    // do stuff
} finally {
l.unlock();
}
More or less, the controllable number of stripes lets you trade concurrency against memory usage, because allocating a full lock for each string key can get expensive; essentially, you only get lock contention when you get hash collisions, which are (predictably) rare.
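To illustrate the striping idea without the Guava dependency, here is a deliberately simplified, hypothetical sketch (SimpleStriped is an illustrative name). It is not Guava's implementation, which offers more variants; it only shows how equal keys always map to the same lock while a fixed number of locks bounds memory.

```java
import java.util.concurrent.locks.ReentrantLock;

class SimpleStriped {
    private final ReentrantLock[] stripes;

    SimpleStriped(int stripeCount) {
        stripes = new ReentrantLock[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new ReentrantLock();
        }
    }

    /** Equal keys always get the same lock; unequal keys share one only if they land on the same stripe. */
    ReentrantLock get(Object key) {
        int h = key.hashCode();
        h ^= (h >>> 16);                                  // spread high bits into the low bits
        return stripes[Math.floorMod(h, stripes.length)];
    }
}
```

It is used exactly like the Guava snippet: obtain the lock for the key, call lock(), and unlock() in a finally block.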
Just extending Triet Doan's answer: we also need to take care of the case where the MutexFactory is used in multiple places, because with the currently suggested code we will end up with the same MutexFactory instance everywhere it is used.
For example:
@Autowired
MutexFactory<CustomObject1> mutexFactory1;
@Autowired
MutexFactory<CustomObject2> mutexFactory2;
Both mutexFactory1 and mutexFactory2 will refer to the same factory instance even though their types differ. This is because Spring creates a single (singleton-scoped) instance of MutexFactory during application startup and injects that same instance into both mutexFactory1 and mutexFactory2.
So here is the extra @Scope annotation needed to avoid the above case:
@Component
@Scope(ConfigurableBeanFactory.SCOPE_PROTOTYPE)
public class MutexFactory<K> {
    private ConcurrentReferenceHashMap<K, Object> map;
    public MutexFactory() {
        this.map = new ConcurrentReferenceHashMap<>();
    }
    public Object getMutex(K key) {
        return this.map.compute(key, (k, v) -> v == null ? new Object() : v);
    }
}
I've used a cache to store lock objects. My cache expires objects after a period, which really only needs to be longer than the time it takes the synchronized process to run.
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
...
private final Cache<String, Object> mediapackageLockCache = CacheBuilder.newBuilder().expireAfterWrite(DEFAULT_CACHE_EXPIRE, TimeUnit.SECONDS).build();
...
public void doSomething(foo) throws ExecutionException {
    // get(key, loader) creates the lock atomically, so two threads arriving at the
    // same time cannot end up with different lock objects for the same key
    Object lock = mediapackageLockCache.get(foo.toString(), Object::new);
    synchronized (lock) {
        // execute code on foo
        ...
    }
}
I have a much simpler, scalable implementation akin to @timmons' post, taking advantage of Guava's LoadingCache with weakValues(). You will want to read the documentation on "equality" to understand the suggestion I have made.
Define the following cache with weak values:
private final LoadingCache<String, String> syncStrings = CacheBuilder.newBuilder().weakValues().build(new CacheLoader<String, String>() {
    public String load(String x) {
        return new String(x);
    }
});
public void doSomething(String x) {
    // getUnchecked: load() declares no checked exception, so there is no ExecutionException to handle
    x = syncStrings.getUnchecked(x);
    synchronized (x) {
        // ... whatever it is you want to do
    }
}
Now we do not have to worry about the cache growing too large: it only holds the cached strings as long as they are strongly referenced somewhere, and the garbage collector and Guava do the heavy lifting.
Will the following code snippet of a synchronized ArrayList work in a multi-threaded environment?
class MyList {
    private final ArrayList<String> internalList = new ArrayList<String>();
    void add(String newValue) {
        synchronized (internalList) {
            internalList.add(newValue);
        }
    }
    boolean find(String match) {
        synchronized (internalList) {
            for (String value : internalList) {
                if (value.equals(match)) {
                    return true;
                }
            }
        }
        return false;
    }
}
I'm concerned that one thread won't be able to see changes made by another thread.
Your code will work and is thread-safe, but not concurrent. You may want to consider using ConcurrentLinkedQueue or other concurrent thread-safe data structures, such as ConcurrentHashMap or CopyOnWriteArraySet as suggested by notnoop, and use their contains method.
class MyList {
    private final ConcurrentLinkedQueue<String> internalList =
            new ConcurrentLinkedQueue<String>();
    void add(String newValue) {
        internalList.add(newValue);
    }
    boolean find(String match) {
        return internalList.contains(match);
    }
}
This should work, because synchronizing on the same object establishes a happens-before relationship, and writes that happen-before reads are guaranteed to be visible.
See the Java Language Specification, section 17.4.5 for details on happens-before.
It will work fine, because all access to the list is synchronized. However, you can use CopyOnWriteArrayList to improve concurrency by avoiding locks on reads (especially if you have many threads executing find).
It will work, but a better solution is to create the List by calling Collections.synchronizedList().
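One caveat with Collections.synchronizedList(): each individual call is synchronized, but iteration consists of many calls, so its Javadoc requires the caller to synchronize on the returned list while iterating. A hedged sketch of MyList rewritten that way (SyncListDemo is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SyncListDemo {
    private final List<String> list = Collections.synchronizedList(new ArrayList<>());

    void add(String value) {
        list.add(value);                  // a single call: synchronized by the wrapper itself
    }

    boolean find(String match) {
        // Iteration spans many calls, so it must be guarded manually
        // by synchronizing on the wrapper list (not on the inner ArrayList).
        synchronized (list) {
            for (String value : list) {
                if (value.equals(match)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

So the wrapper removes the synchronized block from add, but find still needs one.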
You may want to consider using a Set (TreeSet or HashSet) for your data, as you are doing lookups by value. Their contains methods will be much faster than your current find method.
HashSet<String> set = new HashSet<String>();
boolean result = set.contains(match); // O(1) expected time