Let's say I have the following field inside of a class:
ConcurrentHashMap<SomeClass, Set<SomeOtherClass>> myMap = new ConcurrentHashMap<SomeClass, Set<SomeOtherClass>>();
An instance of this class is shared among many threads.
If I want to add or remove an element from a set associated with a key, I could do:
Set<SomeOtherClass> setVal = myMap.get(someKeyValue);
setVal.add(new SomeOtherClass());
The get operation is atomic, therefore thread-safe. However, there's no guarantee that, in between the get and the add instruction, some other thread won't modify the structure messing with the first one's execution.
What would be the best way to make the whole operation atomic?
Here's what I've come up with, but I don't believe it to be very efficient (or to be making the best use out of Java's structures):
I have a ReentrantLock field, so my class looks like this:
class A {
ReentrantLock lock = new ReentrantLock();
ConcurrentHashMap<SomeClass, Set<SomeOtherClass>> myMap = new ConcurrentHashMap<SomeClass, Set<SomeOtherClass>>();
}
Then the method call looks something like this:
lock.lock();
Set<SomeOtherClass> setVal = myMap.get(someKeyValue);
synchronized(setVal) {
lock.unlock();
setVal.add(new SomeOtherClass());
}
The idea being that we let go of the lock once we have made sure no one else will access the Set we're trying to modify. However, I don't think this is making the best use of the ConcurrentMap or that it makes much sense to have a lock, a concurrent structure, and a synchronized block all used to achieve one operation.
Is there a better way to go about this?
ConcurrentHashMap guarantees that the entire method invocation of compute (or computeIfAbsent or computeIfPresent) is done atomically. So, e.g., you could do something like this:
myMap.compute(someKeyValue, (k, v) -> {v.add(new SomeOtherClass()); return v;});
Note:
Using compute is analogous to the original snippet, which assumes that someKeyValue is present in the map. Using computeIfPresent is probably safer, though.
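For example, a minimal sketch of both variants (assuming Java 8+; ConcurrentHashMap.newKeySet() is used here to obtain a thread-safe Set and is not part of the original question):

// Only runs when the key is already present; the whole update happens atomically.
myMap.computeIfPresent(someKeyValue, (k, set) -> {
    set.add(new SomeOtherClass());
    return set;
});

// Creates the set on demand. The add() runs outside the atomic compute step,
// so this is only safe because newKeySet() returns a concurrent Set.
myMap.computeIfAbsent(someKeyValue, k -> ConcurrentHashMap.newKeySet())
     .add(new SomeOtherClass());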
Related
How do I lock a data structure (such as List) when someone is iterating over it?
For example, let's say I have this class with a list in it:
class A{
private List<Integer> list = new ArrayList<>();
public A() {
// initialize this.list
}
public List<Integer> getList() {
return list;
}
}
And I run this code:
public static void main(String[] args) {
A a = new A();
Thread t1 = new Thread(()->{
a.getList().forEach(System.out::println);
});
Thread t2 = new Thread(()->{
a.getList().removeIf(e->e==1);
});
t1.start();
t2.start();
}
I don't have a single block of code that uses the list, so I can't use synchronized().
I was thinking of locking the getList() method after it has been called but how can I know if the caller has finished using it so I could unlock it?
And I don't want to use CopyOnWriteArrayList because I care about performance.
after it has been called but how can I know if the caller has finished using it so I could unlock it?
That's impossible. The iterator API fundamentally doesn't require that you explicitly 'close' an iterator, so this is simply not something you can make happen. You have a problem here:
Iterating over the same list from multiple threads is an issue if anybody modifies that list in between. Actually, threads are immaterial: if you modify a list and then interact with an iterator created before the modification, you get a ConcurrentModificationException guaranteed. Involve threads, and you merely usually get a CoModEx; you may get bizarre behaviour if you haven't set up your locking properly.
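A minimal illustration of the single-threaded case (a sketch, not part of the original answer):

List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
Iterator<Integer> it = list.iterator();
list.add(4);   // structural modification after the iterator was created
it.next();     // throws ConcurrentModificationException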
Your chosen solution is "I shall lock the list.. but how do I do that? Better ask SO". But that's not the correct solution.
You have a few options:
Use a lock
It's not specifically the iteration that you need to lock, it's "whatever interacts with this list". Make an actual lock object, and define that any interaction of any kind with this list must occur in the confines of this lock.
Thread t1 = new Thread(() -> {
a.acquireLock();
try {
a.getList().forEach(System.out::println);
} finally {
a.releaseLock();
}
});
t1.start();
Where acquireLock and releaseLock are methods you write that use a ReadWriteLock to do their thing.
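A possible shape for those methods (a sketch; the read/write split and the names acquireReadLock/acquireWriteLock are assumptions — the snippet above would use the read-lock pair in the iterating thread and the write-lock pair around removeIf):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class A {
    private final List<Integer> list = new ArrayList<>();
    private final ReadWriteLock rwLock = new ReentrantReadWriteLock();

    // Readers (e.g. the iterating thread) share the read lock.
    public void acquireReadLock()  { rwLock.readLock().lock(); }
    public void releaseReadLock()  { rwLock.readLock().unlock(); }

    // Writers (e.g. the thread calling removeIf) take the exclusive write lock.
    public void acquireWriteLock() { rwLock.writeLock().lock(); }
    public void releaseWriteLock() { rwLock.writeLock().unlock(); }

    public List<Integer> getList() { return list; }
}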
Use CopyOnWriteArrayList
COWList is an implementation of java.util.List with the property that it copies the backing store any time you change anything about it. This has the benefit that any iterator you make is guaranteed never to throw ConcurrentModificationException: when you start iterating over it, you will iterate over exactly the elements that were in the list at the moment the iteration began, even if your code, or any other thread, starts modifying that list halfway through. The downside is, of course, that it makes lots of copies if you make lots of modifications, so this is not a good idea if the list is large and you're modifying it a lot.
Get rid of the getList() method, move the tasks into the object itself.
I don't know what a is (the object you call .getList() on), but apparently one of the functions it should expose is some job that you really can't do well with a getList() call. Either you want the contents in a stable fashion (perhaps the class should have a method that gives you a copy of the list), or you want to do a thing to each element inside it: e.g. instead of getting the list and calling .forEach(System.out::println) on it, pass System.out::println to a and let it do the work, as sketched below. You can then focus your locks or other solutions on that code, and not on callers of a.
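A sketch of that idea (the method names forEachElement and removeElementsIf are made up for illustration):

import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

class A {
    private final List<Integer> list = new ArrayList<>();

    // Callers pass the action in; every access to the list stays behind one lock.
    public synchronized void forEachElement(Consumer<Integer> action) {
        list.forEach(action);
    }

    public synchronized void removeElementsIf(Predicate<Integer> filter) {
        list.removeIf(filter);
    }
}

The two threads from the question would then call a.forEachElement(System.out::println) and a.removeElementsIf(e -> e == 1) respectively.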
Make a copy yourself
This seems like it should work, but it doesn't: immediately cloning the list after you receive it fails, because cloning the list is itself an operation that iterates, just like .forEach(System.out::println) does. If another thread interacts with the list while you are making your clone, the copy fails in the same way. Use one of the three solutions above instead.
I stumbled upon the following piece of code:
public static final Map<String, Set<String>> fooCacheMap = new ConcurrentHashMap<>();
This cache is accessed from a REST controller method:
public void fooMethod(String fooId) {
Set<String> fooSet = fooCacheMap.computeIfAbsent(fooId, k -> new ConcurrentSet<>());
//operations with fooSet
}
Is ConcurrentSet really necessary, when I know for sure that the set is accessed only in this method?
Since you use it in a controller, multiple threads can call your method simultaneously (e.g. multiple parallel requests can call your method).
As this method is not synchronized in any way, a ConcurrentSet is probably necessary here.
Is ConcurrentSet really necessary?
Possibly, possibly not. We don't know how this code is being used.
However, assuming that it is being used in a multithreaded way (specifically: that two threads can invoke fooMethod concurrently), yes.
The atomicity in ConcurrentHashMap is only guaranteed for each invocation of computeIfAbsent. Once this completes, the lock is released, and other threads are able to invoke the method. As such, access to the return value is not atomic, and so you can get thread interference when accessing that value.
In terms of the question "do I need ConcurrentSet?": no, you can do it so that accesses to the set are atomic:
fooCacheMap.compute(fooId, (k, fooSet) -> {
if (fooSet == null) fooSet = new HashSet<>();
// Operations with fooSet
return fooSet;
});
Using a concurrent map will not guarantee thread safety. Additions to the Map need to be performed in a synchronized block to ensure that two threads don't attempt to add the same key to the map. Therefore, the concurrent map is not really needed, especially because the Map itself is static and final. Furthermore, if the code modifies the Set inside the Map, which appears likely, that needs to be synchronized as well.
The correct approach with the Map is to check for the key. If it does not exist, enter a synchronized block and check the key again. This guarantees that the key really is absent before you add it, without entering a synchronized block every time.
Set modifications should typically occur in a synchronized block as well, as in the sketch below.
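A sketch of the pattern described above, reusing the question's fooCacheMap and fooId (the choice of lock objects here is one possible arrangement, not the only one):

Set<String> fooSet = fooCacheMap.get(fooId);
if (fooSet == null) {
    synchronized (fooCacheMap) {
        fooSet = fooCacheMap.get(fooId);   // check again while holding the lock
        if (fooSet == null) {
            fooSet = new HashSet<>();
            fooCacheMap.put(fooId, fooSet);
        }
    }
}
synchronized (fooSet) {
    // operations with fooSet
}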
I found the following code snippet in luaj, and I started to wonder whether changes made to the Map after it has been constructed might not be visible to other threads, since there is no synchronization in place.
I know that since the Map is declared final, the values it was initialized with during construction are visible to other threads after construction, but what about changes that happen after that?
Some might also realize that this class is so not thread-safe that calling coerce in a multi-threaded environment might even cause infinite loop in the HashMap, but my question is not about that.
public class CoerceJavaToLua {
static final Map COERCIONS = new HashMap(); // this map is visible to all threads after construction, since it's final
public static LuaValue coerce(Object paramObject) {
...;
if (localCoercion == null) {
localCoercion = ...;
COERCIONS.put(localClass, localCoercion); // visible?
}
return ...;
}
...
}
You're correct that changes to the Map may not be visible to other threads. Every method that accesses COERCIONS (both reading and writing) should be synchronized on the same object. Alternatively, if you never need sequences of accesses to be atomic, you could use a synchronized collection.
(BTW, why are you using raw types?)
This code is actually bad and may cause many problems (probably not an infinite loop, which is more common with TreeMap; with HashMap you're more likely to get silent data loss due to an overwrite, or possibly some random exception). And you're right, it's not guaranteed that changes made in one thread will be visible to another.
Here the problem may not look very big, as this Map is used for caching purposes, so silent overwrites or visibility lag don't lead to real problems (just two distinct coercion instances being used for the same class, which is probably ok in this case). However, it's still possible that such code will break your program. If you like, you can submit a patch to the LuaJ team.
Two options:
// Synchronized (since Java 1.2)
static final Map COERCIONS = Collections.synchronizedMap(new HashMap());
// Concurrent (since Java 5)
static final Map COERCIONS = new ConcurrentHashMap();
They each have their pros and cons.
ConcurrentHashMap's pro is that it doesn't lock reads. Its con is that compound operations are not atomic: e.g. an Iterator in one thread and a call to putAll in another will allow the iterator to see some of the values added.
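For example (a sketch; the maps and loop bodies are placeholders):

// The synchronized wrapper still requires manual locking while iterating,
// otherwise a ConcurrentModificationException is possible:
Map<String, String> syncMap = Collections.synchronizedMap(new HashMap<>());
synchronized (syncMap) {
    for (String key : syncMap.keySet()) {
        // ...
    }
}

// ConcurrentHashMap iterators need no external lock, but they are weakly
// consistent: a concurrent putAll may be only partially visible to them.
Map<String, String> concMap = new ConcurrentHashMap<>();
for (String key : concMap.keySet()) {
    // ...
}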
I have several threads trying to increment a counter for a certain key in a not thread-safe custom data structure (which you can image to be similiar to a HashMap). I was wondering what the right way to increment the counter in this case would be.
Is it sufficient to synchronize the increment function or do I also need to synchronize the get operation?
public class Example {
private MyDataStructure<Key, Integer> datastructure = new MyDataStructure<Key, Integer>();
private class MyThread implements Runnable {
private synchronized void incrementCnt(Key key) {
// from the datastructure documentation: if a value already exists for the given key, the
// previous value will be replaced by this value
datastructure.put(key, getCnt(key)+1);
// or can I do it without using the getCnt() function? like this:
datastructure.put(key, datastructure.get(key)+1);
}
private synchronized int getCnt(Key key) {
return datastructure.get(key);
}
// run method...
}
}
If I have two threads t1, t2, for example, I would do something like:
t1.incrementCnt();
t2.incrementCnt();
Can this lead to any kind of deadlock? Is there a better way to solve this?
The main issue with this code is that it's likely to fail at synchronizing access to datastructure, since the accessing code synchronizes on this of an inner class, which is different for different instances of MyThread, so no mutual exclusion will happen.
The more correct way is to make datastructure a final field and then synchronize on it:
private final MyDataStructure<Key, Integer> datastructure = new MyDataStructure<Key, Integer>();
private class MyThread implements Runnable {
private void incrementCnt(Key key) {
synchronized (datastructure) {
// or can I do it without using the getCnt() function? like this:
datastructure.put(key, datastructure.get(key)+1);
}
}
}
As long as all data access is done using synchronized (datastructure), the code is thread-safe and it's safe to just use datastructure.get(...). There should be no deadlocks, since a deadlock can only occur when there's more than one lock to compete for.
As the other answer told you, you should synchronize on your data structure, rather than on the thread/runnable object. It is a common mistake to try to use synchronized methods in the thread or runnable object. Synchronization locks are instance-based, not class-based (unless the method is static), and when you are running multiple threads, this means that there are actually multiple thread instances.
It's less clear-cut about Runnables: you could be using a single instance of your Runnable class with several threads. So in principle you could synchronize on it. But I still think it's bad form because in the future you may want to create more than one instance of it, and get a really nasty bug.
So the general best practice is to synchronize on the actual item that you are accessing.
Furthermore, the design conundrum of whether or not to use two methods should be solved by moving the whole thing into the data structure itself, if you can do so (i.e. if the class source is under your control). This is an operation that is confined to the data structure and applies only to it, and doing the increment outside of it is not good encapsulation. If your data structure exposes a synchronized incrementCnt method (a sketch follows this list), then:
It synchronizes on itself, which is what you wanted.
It can use its own private fields directly, which means you don't actually need to call a getter and a setter.
It is free to have the implementation changed to one of the atomic structures in the future if it becomes possible, or add other implementation details (such as logging increment operations separately from setter access operations).
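A sketch of what that could look like (the class name and the internal HashMap are assumptions about the custom structure, purely for illustration):

import java.util.HashMap;
import java.util.Map;

public class MyCountingStructure<Key> {
    private final Map<Key, Integer> counts = new HashMap<>();

    // The structure owns the lock, so callers cannot forget to synchronize.
    public synchronized void incrementCnt(Key key) {
        counts.merge(key, 1, Integer::sum);
    }

    public synchronized int getCnt(Key key) {
        return counts.getOrDefault(key, 0);
    }
}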
Java Gurus,
Currently we have a HashMap<String, SomeApplicationObject> which is read frequently and modified occasionally, and we are having issues that during the modification/reloading, a read operation returns null, which is not acceptable.
To fix this I have following options:
A. Use ConcurrentHashMap
This looks like the first choice, but the operation we are talking about is reload(), meaning clear() followed by replaceAll(). So if the Map is read after clear() and before replaceAll(), it returns null, which is not desirable. Even if I synchronize, this doesn't resolve the issue.
B. Create another implementation based upon ReentrantReadWriteLock
Where I would acquire the write lock before the reload() operation. This seems more appropriate, but I feel there must be something already available for this and I need not reinvent the wheel.
What is the best way out?
EDIT: Is any Collection already available with such a feature?
Since you are reloading the map, I would replace it on a reload.
You can do this by using a volatile Map, which you replace in full when it is updated.
It seems you are not sure as to how what Peter Lawrey suggests can be implemented. It could look like this:
class YourClass {
private volatile Map<String, SomeApplicationObject> map;
//constructors etc.
public void reload() {
Map<String,SomeApplicationObject> newMap = getNewValues();
map = Collections.unmodifiableMap(newMap);
}
}
There are no concurrency issues because:
The new map is created via a local variable, which by definition is not shared - getNewValues does not need to be synchronized or atomic
The assignment to map is atomic
map is volatile, which guarantees that other threads will see the change
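For completeness, the read path then just dereferences the volatile field (a sketch; the method name is illustrative):

public SomeApplicationObject get(String key) {
    // Always sees either the previous complete map or the new complete map,
    // never a half-reloaded state.
    return map.get(key);
}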
This sounds a lot like Guava's Cache, though it really depends how you're populating the map, and how you compute the values. (Disclosure: I contribute to Guava.)
The real question is whether or not you can specify how to compute your SomeApplicationObject given the input String. Just based on what you've told us so far, it might look something like this...
LoadingCache<String, SomeApplicationObject> cache = CacheBuilder.newBuilder()
.build(
new CacheLoader<String, SomeApplicationObject>() {
public SomeApplicationObject load(String key) throws AnyException {
return computeSomeApplicationObject(key);
}
});
Then, whenever you wanted to rebuild the cache, you just call cache.invalidateAll(). With a LoadingCache, you can then call cache.get(key) and if it hasn't computed the value already, it'll get recomputed. Or maybe after calling cache.invalidateAll(), you can call cache.loadAll(allKeys), though you'd still need to be able to load single elements at a time in case any queries come in between the invalidateAll and loadAll.
If this isn't acceptable -- if you can't load one value individually, you have to load them all at once -- then I'd go ahead with Peter Lawrey's approach -- keep a volatile reference to a map (ideally an ImmutableMap), recompute the whole map and assign the new map to the reference when you're done.