I have a critical process in which I must make sure that, at any one time, no two equivalent MyObject instances are being processed (they can be different instances but logically equal). The following code demonstrates the idea:
public class MyClass {
public static ConcurrentMap<MyObject, String> concurrentMap = new ConcurrentHashMap<>();
public void process(MyObject myObject) {
String id = UUID.randomUUID().toString();
String existingId = concurrentMap.putIfAbsent(myObject, id);
synchronized (id) {
if (existingId == null) { // no others, start working right away
// do work
} else { // an equivalent myObject is under processing, wait on it
synchronized (existingId) {
// finally can start doing work
}
}
}
}
}
The code above works by synchronizing on a random string, but it has two issues:
Every call creates a new random id, which goes unused if an existing id is already linked to an equivalent MyObject. The sole purpose of the id is to act as a unique lock that other threads can discover. Could it be replaced by an actual lock object?
Nothing in this code knows when a MyObject should be removed from the concurrentMap. This does not affect the result, but a concurrentMap that keeps growing may become a problem. Could something like a counter of lock holders be used here?
Thank you
I don't think I really understand the use case for this idea, and that's a big red flag. But it seems to me that all of this code is unnecessary. If the only idea here is to obtain a unique lock for myObject, then you already have that: it's myObject.
public class MyClass {
public void process(MyObject myObject) {
synchronized (myObject) {
// finally can start doing work
}
}
}
The rest of the code is just dead weight.
However this is still fraught. Since you are relying on an external procedure to synchronize your object, any other bit of code with a reference to myObject can do whatever they like and you have no control over it. It's a really weak form of synchronization. It can work, if everyone in the code base understands the need to synchronize on their MyObject, but this could be tough to achieve in practice.
I think this library is what you are looking for, particularly the StripedKeyLockManager and CountingLock classes, which address your questions. You can either use the library in your project or adjust its source code to your needs. Guava also provides similar functionality via its Striped class.
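The two concerns in the question (a real lock object instead of a random string, and a counter that tells when an entry can be removed) can be sketched with the JDK alone. This is a minimal illustration, not the API of the library mentioned above: the names KeyedLocks, runExclusively and trackedKeys are hypothetical, and it assumes the key type implements equals and hashCode consistently.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper: one lock per logically-equal key, removed when unused.
final class KeyedLocks<K> {

    private static final class CountingLock {
        final ReentrantLock lock = new ReentrantLock();
        int users; // only touched inside compute(), which is atomic per key
    }

    private final ConcurrentMap<K, CountingLock> locks = new ConcurrentHashMap<>();

    void runExclusively(K key, Runnable work) {
        // Register interest and fetch (or create) the lock in one atomic step.
        CountingLock entry = locks.compute(key, (k, existing) -> {
            CountingLock e = (existing == null) ? new CountingLock() : existing;
            e.users++;
            return e;
        });
        entry.lock.lock();
        try {
            work.run();
        } finally {
            entry.lock.unlock();
            // Returning null from compute() removes the mapping.
            locks.compute(key, (k, existing) -> (--existing.users == 0) ? null : existing);
        }
    }

    int trackedKeys() {
        return locks.size();
    }
}
```

With this shape, process(myObject) would reduce to a call such as keyedLocks.runExclusively(myObject, () -> doWork(myObject)), where doWork is a placeholder for the real processing, and the map shrinks back once no thread is interested in a key.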
Related
I have two methods as follows:
class A{
void method1(){
someObj.setSomeAttribute(true);
someOtherObj.callMethod(someObj);
}
void method2(){
someObj.setSomeAttribute(false);
someOtherObj.callMethod(someObj);
}
}
where in another place that attribute is evaluated:
class B{
void callMethod(Foo someObj){
if(someObj.getAttribute()){
//do one thing
} else{
//do another thing
}
}
}
Note that A.method1 and A.method2 are updating the attribute of the same object. If those 2 methods are run in 2 threads, will this work or will there be unexpected results?
Will there be unexpected results? Yes, guaranteed: things you would never want to affect your app (the phase of the moon, the song currently playing in Winamp, whether your dog is cuddling near the CPU, whether it's the 5th Tuesday of the month, and so on) may now have an effect on its behaviour. Which you don't want.
What you've described is a so-called violation of the Java memory model: the end result is that any Java implementation is free to return any of multiple values and nevertheless that VM is operating properly according to the Java specification, even if it does so seemingly arbitrarily.
As a general rule, each thread gets an unfair coin. Unfair, in that it will try to mess with you: It'll flip correctly every time when you test it out, and then in production, and only when you're giving a demo to that crucial customer, it'll get ya.
Every time it reads from or writes to any field, it will flip this mean coin. On heads, it will use the actual field. On tails, it will use a local copy it made.
That's oversimplifying the model quite a bit, but it's a good start to try to get your head around how this works.
The way out is to force so-called 'comes before' relationships (the Java Language Specification calls this relation happens-before). What Java will do is ensure that what you can observe matches these relationships: if event A is defined as having a comes-before relationship vs. event B, then anything A did will be observed, exactly as is, by B, guaranteed. No more coin flips.
Examples of establishing comes-before relationships involve using volatile, synchronized, and any methods that use these things internally.
NB: Of course, if your setSomeAttribute method, which you did not paste, includes some comes-before-establishing act, then there's no problem here, but as a rule a method called setX will not be doing that.
An example of one that doesn't:
class Example {
private String attr;
public void setAttr(String attr) {
this.attr = attr;
}
}
Some examples of ones that do:
Let's say method B.callMethod is executed in the same thread as method1 - then you are guaranteed to at least observe the change method1 made, though it's still a coin flip (whether you actually see what method2 did or not). What would not be possible is seeing the value of that attribute before either method1 or method2 runs, because code running in a single thread has comes-before across the entire run (any line that is executed before another in the same thread has a comes-before relationship).
The set method looks like:
class Example {
private String attr;
private final Object lock = new Object();
public void setAttr(String attr) {
synchronized (lock) {
this.attr = attr;
}
}
public String getAttr() {
synchronized (lock) {
return this.attr;
}
}
}
Now the get and set ops lock on the same object, that's one of the ways to establish comes-before. Which thread got to a lock first is observable behaviour; if method1's set got there before B's get, then you are guaranteed to observe method1's set.
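The synchronized version above is one option. For a field that is only ever read or overwritten (never read-modify-written), volatile is the other mechanism named earlier. A minimal sketch, with VolatileExample as a made-up class name:

```java
// Hypothetical class: volatile gives visibility without a lock.
final class VolatileExample {
    // A write to a volatile field is visible to every later read of that
    // field from any thread.
    private volatile boolean attr;

    void setAttr(boolean attr) {
        this.attr = attr;
    }

    boolean getAttr() {
        return attr;
    }
}
```

Note that this only guarantees visibility of the latest write. It does not make "set the attribute, then call another method that reads it" an atomic pair, so two threads can still interleave between those two steps.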
More generally, sharing state between threads is extremely tricky and you should endeavour not to do so. The alternatives are:
Initialize all state before starting a thread, then do the job, and only when it is finished, relay all results back. Fork/join does this.
Use a messaging system that has great concurrency fundamentals, such as a database, which has transactions, or message queue libraries.
If you must share state, try to write things in terms of the nice classes in j.u.concurrent.
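The first alternative in the list above (hand each task its input up front, collect results only when it is done) can be sketched with an ExecutorService. The class and method names here are illustrative assumptions, and squaring numbers stands in for real work:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: no field is shared between the worker threads.
final class NoSharedState {

    static int sumOfSquares(List<Integer> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (Integer n : inputs) {
                final int value = n; // each task gets its own input
                futures.add(pool.submit(() -> value * value));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get(); // results are relayed back only on completion
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each task only touches its own local value and the caller only reads results through Future.get(), there is no mutable field for two threads to fight over.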
I assume you expect that when you call A.method1, someObj.getAttribute() returns true in B.callMethod, and when you call A.method2, it returns false.
Unfortunately, this will not work, because between the setSomeAttribute call and the callMethod call, another thread may have changed the value of the attribute.
If you only use the attribute in callMethod, why not just pass the attribute instead of the Foo object? The code would be as follows:
class A{
void method1(){
someOtherObj.callMethod(true);
}
}
class B{
void callMethod(boolean flag){
if(flag){
//do one thing
} else{
//do another thing
}
}
}
If you must use Foo as the parameter, you can make the setSomeAttribute call and the callMethod call atomic as a pair.
The easiest way to achieve this is to make the methods synchronized. The code would be as follows:
synchronized void method1(){
someObj.setSomeAttribute(true);
someOtherObj.callMethod(someObj);
}
synchronized void method2(){
someObj.setSomeAttribute(false);
someOtherObj.callMethod(someObj);
}
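But this may perform badly, because synchronized instance methods lock on the A instance itself, so every call to method1 or method2 on that instance is serialized. You can achieve the same effect with a more fine-grained lock.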
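One possible reading of "more fine-grained" is to lock on the shared Foo rather than on the A instance, so that only callers working on the same Foo exclude each other. This is a sketch under that assumption; Foo and B are filled in with placeholder bodies, and callMethod returns a String only so the result can be observed:

```java
// Placeholder versions of the classes from the question.
final class Foo {
    private boolean attribute;
    void setSomeAttribute(boolean value) { attribute = value; }
    boolean getAttribute() { return attribute; }
}

final class B {
    String callMethod(Foo someObj) {
        return someObj.getAttribute() ? "one thing" : "another thing";
    }
}

final class A {
    private final Foo someObj;
    private final B someOtherObj = new B();

    A(Foo someObj) {
        this.someObj = someObj;
    }

    String method1() {
        synchronized (someObj) { // set + call form one atomic unit per Foo
            someObj.setSomeAttribute(true);
            return someOtherObj.callMethod(someObj);
        }
    }

    String method2() {
        synchronized (someObj) {
            someObj.setSomeAttribute(false);
            return someOtherObj.callMethod(someObj);
        }
    }
}
```

This also keeps working when two different A instances share one Foo, which synchronized instance methods on A would not cover.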
I have a portion of code that needs to be thread safe. It is code that loads and modifies an object from the database based on its ID. I want to avoid synchronizing on just the Integer ID variable, so I am attempting to implement the solution offered in this thread: https://stackoverflow.com/a/659939/3561422
However, I am not creating a cache so I have nothing in place to manage the objects added to the map. I want to avoid a memory leak situation. I have looked into using a WeakHashMap, but that is apparently not thread-safe. I have created a map as follows, but the GC does not appear to be cleaning up the references I create.
private static Map<Integer, Object> locks = Collections.synchronizedMap(new WeakHashMap<Integer, Object>());
Is there something I am missing here that would make this solution work? Is WeakHashMap actually safe for me to use here?
Some example code:
public static void mainMethod(Integer id){
Object lockObject = getMapObject(id);
Object dbObj; // declared outside the synchronized block so it is still in scope for actOnObj
synchronized (lockObject) {
dbObj = loadDBObjFromDB(id);
//Do pre execution checks
if (dbObj.isInUse()) {
//fail here
}
dbObj.setAsInUseAndCommitToDB();
}
actOnObj(dbObj);
}
private static Object getMapObject(final Integer id) {
locks.putIfAbsent(id, new Object());
return locks.get(id);
}
Basically, I need to mark something in the database as in use. If another thread comes in and wants to do something on it, it needs to see if it is already in use. If it is, I fail and give the user feedback. I need to lock around loading, checking if it is in use, and updating that it is in use. I would like to use the map to avoid locking on an Integer object
I think that what you are looking for here is an implementation of ConcurrentHashSet (there are several out there; I'd look at Guava's). It is the same idea as a ConcurrentHashMap without needing a value (in fact, Guava's is backed by a ConcurrentHashMap per the documentation). Another alternative is simply doing what you are doing, and only using a single, statically created object as the value (since the value here is irrelevant):
private static final Object MAP_VALUE = new Object();
private static Object getMapObject(final Integer id) {
locks.putIfAbsent(id, MAP_VALUE);
return locks.get(id);
}
For the map, just make it a regular ConcurrentHashMap. No need to worry about weak references or weak hashmaps.
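One thing worth checking against the question's mainMethod (my reading, not something the answer states): two synchronized blocks exclude each other only when they use the same monitor object, so whether ids share a value object decides whether unrelated ids contend with each other. If independent ids should not block one another, a per-id lock object can be created atomically with computeIfAbsent. A minimal sketch, with IdLocks and lockFor as hypothetical names:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper: exactly one lock object per distinct id.
final class IdLocks {
    private static final ConcurrentMap<Integer, Object> LOCKS = new ConcurrentHashMap<>();

    static Object lockFor(Integer id) {
        // Atomic: all threads asking for an equal id get the same Object.
        return LOCKS.computeIfAbsent(id, k -> new Object());
    }
}
```

Entries are never removed in this sketch, so it suits a bounded set of ids. For an unbounded set, some form of reference counting or striping would be needed to keep the map from growing.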
I need to know when I should add some synchronization block to my code when using ConcurrentHashMap. Let's say I have a method like:
private static final ConcurrentMap<String, MyObjectWrapper> myObjectsCache = new ConcurrentHashMap<>(CACHE_INITIAL_CAPACITY);
public List<MyObject> aMethod(List<String> ids, boolean b) {
List<MyObject> result = new ArrayList<>(ids.size());
for (String id : ids) {
if (id == null) {
continue;
}
MyObjectWrapper myObjectWrapper = myObjectsCache.get(id);
if (myObjectWrapper == null) {
continue;
}
if (myObjectWrapper.getObject() instanceof MyObjectSub) {
((MyObjectSub) myObjectWrapper.getObject()).clearAField();
myObjectWrapper.getObject().setTime(System.currentTimeMillis());
}
result.add(myObjectWrapper.getObject());
if (b) {
final MyObject obj = new MyObject(myObjectWrapper.getObject());
addObjectToDb(obj);
}
}
return result;
}
How should I efficiently make this method concurrent?
I think that the "get" is safe, but once I get the value from the cache and update the cached object's fields, there can be problems because another thread could get the same wrapper and try to update the same underlying object... Should I add synchronization? And if so, should I synchronize from "get" to the end of the loop iteration, or the entire loop?
Maybe someone could share some more specific guidelines of proper and efficient use of ConcurrentHashMap when some more operations need to be done on the map keys/values inside loops etc...
I would be really grateful.
EDIT:
Some context for the question:
I'm currently refactoring some DAO classes in production code, and a few of the classes used HashMaps for caching data retrieved from the database. All methods that used the cache (for writes or reads) had their entire content inside a synchronized(cache) block (playing safe?). I don't have much experience with concurrency and I really want to use this opportunity to learn. I naively changed the HashMaps to ConcurrentHashMaps and now want to remove the synchronized blocks where they're not necessary. All caches are used for writes and reads. The presented method is based on one of the methods that I've changed, and now I'm trying to learn when and to what extent to synchronize. The method clearAField just changes the value of one of the fields of the wrapped POJO, and addObjectToDb tries to add the object to the database.
Other example would be refilling of the cache:
public void findAll() throws SQLException{
// get data from database into a list
List<Data> data=getAllDataFromDatabase();
cacheCHM.clear();
cacheCHM.putAll(data);
}
In which case I should put the clear and putAll inside a synchronized (cacheCHM) block, right?
I've tried to find and read some posts/articles about the proper and efficient usage of CHM but most deal with single operations, without loops etc.... The best I've found would be:
http://www.javamadesoeasy.com/2015/04/concurrenthashmap-in-java.html
You've not mentioned what concurrency you expect to happen within your app, so I'm going to assume you have multiple threads calling aMethod, and nothing else.
You only have a single call to the ConcurrentHashMap: myObjectsCache.get(id), and this is fine. In fact, since nothing is writing data into your myObjectsCache [see assumption above], you don't even need a ConcurrentHashMap! You'd be fine with any immutable collection. You have a suspicious line at the end: addObjectToDb(obj). Does this method also affect your cache? If so, it's still safe (probably; we'd have to see the method to be certain), but you definitely need the ConcurrentHashMap.
The danger is where you change the objects, here:
myObjectWrapper.getObject().clearAField();
myObjectWrapper.getObject().setTime(System.currentTimeMillis());
It's possible for multiple threads to call these methods on the same object at the same time. Without knowing what these methods do, we can't say whether this is safe. If these methods are both marked synchronized, or if you took care to ensure that it is safe for them to run concurrently, then you're fine (but beware: the methods may still run in a different order from what you might intuitively expect!). If you weren't so careful, there is a potential for data corruption.
A better approach to thread safety and caches is to use immutable objects. Here's what the MyObjectSub class might look like if it were immutable [not sure why you need the wrapper - I'd omit that completely if possible]:
//Provided by way of example. You should consider generating these
//using http://immutables.github.io/ or similar
public class MyImmutableObject {
//If all members are final primitives or immutable objects
//then this class is threadsafe.
final String field;
final long time;
public MyImmutableObject(String field, long time) {
this.field = field;
this.time = time;
}
public MyImmutableObject clearField() {
//Since final fields can never be changed, our only option is to
//return a copy.
return new MyImmutableObject("", this.time);
}
public MyImmutableObject setTime(long newtime) {
return new MyImmutableObject(this.field, newtime);
}
}
If your objects are immutable then thread safety is a lot simpler. Your method would look something like this:
public List<Result> typicalCacheUsage(String key) {
MyImmutableObject obj = myObjectsCache.get(key);
obj = obj.clearField();
obj = obj.setTime(System.currentTimeMillis());
//If you need to put the object back in the cache you can do this:
myObjectsCache.put(key, obj);
List<Result> res = generateResultFromObject(obj);
return res;
}
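One detail to be aware of in a get-then-put sequence on a ConcurrentHashMap: each call is atomic on its own, but another thread can put a different value between the two, and that update would then be overwritten. If that matters, the replacement can be done in one atomic step with computeIfPresent. The sketch below uses stand-in names (CacheEntry, ImmutableCache, touch) rather than the classes from the answer:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Stand-in immutable value, in the spirit of MyImmutableObject above.
final class CacheEntry {
    final String field;
    final long time;

    CacheEntry(String field, long time) {
        this.field = field;
        this.time = time;
    }

    CacheEntry clearField() { return new CacheEntry("", time); }
    CacheEntry withTime(long newTime) { return new CacheEntry(field, newTime); }
}

final class ImmutableCache {
    private final ConcurrentMap<String, CacheEntry> cache = new ConcurrentHashMap<>();

    void put(String key, CacheEntry entry) {
        cache.put(key, entry);
    }

    // Replaces the cached entry atomically; returns the new entry,
    // or null when the key is not cached.
    CacheEntry touch(String key, long now) {
        return cache.computeIfPresent(key, (k, old) -> old.clearField().withTime(now));
    }
}
```

The remapping function should stay short and free of side effects, since ConcurrentHashMap may block other updates to the same bin while it runs.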
public class ObjectA {
private void foo() {
MutableObject mo = new MutableObject();
Runnable objectB = new ObjectB(mo);
new Thread(objectB).start();
}
}
public class ObjectB implements Runnable {
private MutableObject mo;
public ObjectB(MutableObject mo) {
this.mo = mo;
}
public void run() {
//read some field from mo
}
}
As you can see from the code sample above, I pass a mutable object to a class that implements Runnable and will use the mutable object in another thread. This is dangerous because ObjectA.foo() can still alter the mutable object's state after starting the new thread. What is the preferred way to ensure thread safety here? Should I make copy of the MutableObject when passing it to ObjectB? Should the mutable object ensure proper synchronization internally? I've come across this many times before, especially when trying to use SwingWorker in a number of GUI applications. I usually try to make sure that ONLY immutable object references are passed to a class that will use them in another thread, but sometimes this can be difficult.
This is a hard question, and the answer, unfortunately, is 'it depends'. You have three choices when it comes to thread-safety of your class:
Make it Immutable, then you don't have to worry. But this isn't what you're asking.
Make it thread-safe. That is, provide enough concurrency control internal to the class that client code doesn't have to worry about concurrent threads modifying the object.
Make it not thread-safe, and force client code to have some kind of external synchronization.
You're essentially asking whether you should use #2 or #3. You are worried about the case where another developer uses the class and doesn't know that it requires external synchronization. I like using the JCIP annotations @ThreadSafe, @Immutable and @NotThreadSafe as a way to document the concurrency intentions. This isn't bullet-proof, as developers still have to read the documentation, but if everyone on the team understands these annotations and consistently applies them, it does make things clearer.
For your example, if you want to make the class not thread-safe, you could use AtomicReference to make it clear and provide synchronization.
public class ObjectA {
private void foo() {
MutableObject mo = new MutableObject();
Runnable objectB = new ObjectB(new AtomicReference<>( mo ) );
new Thread(objectB).start();
}
}
public class ObjectB implements Runnable {
private AtomicReference<MutableObject> mo;
public ObjectB(AtomicReference<MutableObject> mo) {
this.mo = mo;
}
public void run() {
//read some field from mo
mo.get().readSomeField();
}
}
I think you are overcomplicating it. If it is as the example (a local variable of which no reference is kept) you should trust that nobody will try to write to it. If it is more complicated (A.foo() has more LOC) if possible, create it only to pass to the thread.
new Thread(new ObjectB(new MutableObject())).start();
If not (due to initializations), declare it in a block so it gets out of scope immediately, even maybe in a separate private method.
{
MutableObject mo = new MutableObject();
Runnable objectB = new ObjectB(mo);
new Thread(objectB).start();
}
....
Copy the object. You won't have any weird visibility problems because you pass the copy to a new Thread. Thread.start always happens before the new thread enters its run method. If you change this code to pass the object to an existing thread, you need proper synchronization. I recommend a blocking queue from java.util.concurrent.
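Both suggestions (copy the object, and use a blocking queue when the receiving thread already exists) can be combined. The following is a rough sketch: Settings stands in for MutableObject, and HandOff, submit and awaitNext are hypothetical names.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Stand-in for the question's MutableObject.
final class Settings {
    private String name;

    Settings(String name) { this.name = name; }
    Settings(Settings other) { this.name = other.name; } // copy constructor

    void setName(String name) { this.name = name; }
    String getName() { return name; }
}

final class HandOff {
    private final BlockingQueue<Settings> queue = new LinkedBlockingQueue<>();

    // Producer side: enqueue a private copy, so later changes to the
    // original are never seen by the consumer.
    void submit(Settings original) {
        queue.add(new Settings(original));
    }

    // Consumer side: blocks until a copy is available; returns null if
    // the waiting thread is interrupted.
    Settings awaitNext() {
        try {
            return queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

The queue takes care of safe publication between the two threads, and the copy means the producer can keep mutating its own instance afterwards.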
Without knowing your exact situation, this question will be difficult to answer precisely. The answer totally depends on what the MutableObject represents, how many other threads may modify it simultaneously, and whether or not the threads that read the object care whether its state changes while they are reading it.
With respect to thread-safety, internally synchronizing all reads and writes to MutableObject is provably the "safest" thing to do, but it comes at the cost of performance. If contention is really high on reads and writes, then your program may suffer performance issues. You can get better performance by sacrificing some guarantees on mutual exclusion - whether those sacrifices are worth the performance increases totally depends on the specific problem you're trying to solve.
You can also play some games with how you go about "internally synchronizing" your MutableObject, if that's what you end up doing. If you haven't already, I'd recommend reading up on the differences between volatile and synchronized and understand how each can be used to ensure thread safety for different situations.
Updated the question; please check the second part of the question.
I need to build up a master list of book ids. I have multiple threaded tasks, each of which brings up a subset of book ids. As soon as each task completes, I need to add its ids to the master list of book ids. Hence I am planning to pass an instance of the aggregator class below to all of my tasks and have them call the updateBookIds() method. To ensure it's thread-safe, I have kept the addAll code in a synchronized block.
Can anyone tell me whether this is the same as a synchronized list? Can I just say Collections.newSynchronizedList and call addAll to that list from all thread tasks? Please clarify.
public class SynchronizedBookIdsAggregator {
private List<String> bookIds;
public SynchronizedBookIdsAggregator(){
bookIds = new ArrayList<String>();
}
public void updateBookIds(List<String> ids){
synchronized (this) {
bookIds.addAll(ids);
}
}
public List<String> getBookIds() {
return bookIds;
}
public void setBookIds(List<String> bookIds) {
this.bookIds = bookIds;
}
}
Thanks,
Harish
Second Approach
After the discussion below, I am currently planning to go with the following approach. Please let me know if I am doing anything wrong here:
public class BooksManager{
private static Logger logger = LoggerFactory.getLogger();
private List<String> fetchMasterListOfBookIds(){
List<String> masterBookIds = Collections.synchronizedList(new ArrayList<String>());
List<String> libraryCodes = getAllLibraries();
ExecutorService libraryBookIdsExecutor = Executors.newFixedThreadPool(BookManagerConstants.LIBRARY_BOOK_IDS_EXECUTOR_POOL_SIZE);
for(String libraryCode : libraryCodes){
LibraryBookIdsCollectionTask libraryTask = new LibraryBookIdsCollectionTask(libraryCode, masterBookIds);
libraryBookIdsExecutor.execute(libraryTask);
}
libraryBookIdsExecutor.shutdown();
//Now the fetching of master list is complete.
//So I will just continue my processing of the master list
}
}
public class LibraryBookIdsCollectionTask implements Runnable {
private String libraryCode;
private List<String> masterBookIds;
public LibraryBookIdsCollectionTask(String libraryCode,List<String> masterBookIds){
this.libraryCode = libraryCode;
this.masterBookIds = masterBookIds;
}
public void run(){
List<String> bookids = new ArrayList<String>();//TODO get this list from iconnect call
synchronized (masterBookIds) {
masterBookIds.addAll(bookids);
}
}
}
Thanks,
Harish
Can I just say Collections.newSynchronizedList and call addAll to that list from all thread tasks?
If you're referring to Collections.synchronizedList, then yes, that would work fine. That will give you an object that implements the List interface where all of the methods from that interface are synchronized, including addAll.
Consider sticking with what you have, though, since it's arguably a cleaner design. If you pass the raw List to your tasks, then they get access to all of the methods on that interface, whereas all they really need to know is that there's an addAll method. Using your SynchronizedBookIdsAggregator keeps your tasks decoupled from design dependence on the List interface, and removes the temptation for them to call something other than addAll.
In cases like this, I tend to look for a Sink interface of some sort, but there never seems to be one around when I need it...
The code you have implemented does not create a synchronization point for someone who accesses the list via getBookIds(), which means they could see inconsistent data. Furthermore, someone who has retrieved the list via getBookIds() must perform external synchronization before accessing the list. Your question also doesn't show how you are actually using the SynchronizedBookIdsAggregator class, which leaves us with not enough information to fully answer your question.
Below would be a safer version of the class:
public class SynchronizedBookIdsAggregator {
private List<String> bookIds;
public SynchronizedBookIdsAggregator() {
bookIds = new ArrayList<String>();
}
public void updateBookIds(List<String> ids){
synchronized (this) {
bookIds.addAll(ids);
}
}
public List<String> getBookIds() {
// synchronized here for memory visibility of the bookIds field
synchronized(this) {
return bookIds;
}
}
public void setBookIds(List<String> bookIds) {
// synchronized here for memory visibility of the bookIds field
synchronized(this) {
this.bookIds = bookIds;
}
}
}
As alluded to earlier, the above code still has a potential problem with some thread accessing the ArrayList after it has been retrieved by getBookIds(). Since the ArrayList itself is not synchronized, accessing it after retrieving it should be synchronized on the chosen guard object:
public class SomeOtherClass {
public void run() {
SynchronizedBookIdsAggregator aggregator = getAggregator();
List<String> bookIds = aggregator.getBookIds();
// Access to the bookIds list must happen while synchronized on the
// chosen guard object -- in this case, aggregator
synchronized(aggregator) {
<work with the bookIds list>
}
}
}
I can imagine using Collections.synchronizedList as part of the design of this aggregator, but it is not a panacea. Concurrency design really requires an understanding of the underlying concerns, more than "picking the right tool / collection for the job" (although the latter is not unimportant).
Another potential option to look at is CopyOnWriteArrayList.
As skaffman alluded to, it might be better to not allow direct access to the bookIds list at all (e.g., remove the getter and setter). If you enforce that all access to the list must run through methods written in SynchronizedBookIdsAggregator, then SynchronizedBookIdsAggregator can enforce all concurrency control of the list. As my answer above indicates, allowing consumers of the aggregator to use a "getter" to get the list creates a problem for the user of that list: to write correct code they must have knowledge of the synchronization strategy / guard object, and furthermore they must also use that knowledge to actively synchronize externally and correctly.
Regarding your second approach. What you have shown looks technically correct (good!).
But, presumably you are going to read from masterBookIds at some point, too? And you don't show or describe that part of the program! So when you start thinking about when and how you are going to read masterBookIds (i.e. the return value of fetchMasterListOfBookIds()), just remember to consider concurrency concerns there too! :)
If you make sure all tasks/worker threads have finished before you start reading masterBookIds, you shouldn't have to do anything special.
But, at least in the code you have shown, you aren't ensuring that.
Note that libraryBookIdsExecutor.shutdown() returns immediately. So if you start using the masterBookIds list immediately after fetchMasterListOfBookIds() returns, you will be reading masterBookIds while your worker threads are actively writing data to it, and this entails some extra considerations.
Maybe this is what you want -- maybe you want to read the collection while it is being written to, to show realtime results or something. But then you must consider synchronizing properly on the collection if you want to iterate over it while it is being written to.
If you would just like to make sure all writes to masterBookIds by worker threads have completed before fetchMasterListOfBookIds() returns, you could use ExecutorService.awaitTermination (in combination with .shutdown(), which you are already calling).
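A minimal sketch of that shutdown-then-wait pattern follows. MasterListFetcher and fetchAll are hypothetical names, the one-minute timeout is an arbitrary choice, and appending "-book" stands in for the real lookup each task would perform:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical example: return the list only after every worker is done.
final class MasterListFetcher {

    static List<String> fetchAll(List<String> libraryCodes) {
        List<String> master = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String code : libraryCodes) {
            pool.execute(() -> master.add(code + "-book")); // placeholder work
        }
        pool.shutdown(); // stop accepting tasks; returns immediately
        try {
            // Block until the queued tasks finish or the timeout elapses.
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return master;
    }
}
```

Checking the boolean returned by awaitTermination matters: false means the timeout expired while tasks were still running, in which case the list may be incomplete.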
Collections.SynchronizedList (which is the wrapper type you'd get) would synchronize almost every method on either itself or a mutex object you pass to the constructor (or Collections.synchronizedList(...) ). Thus it would basically be the same as your approach.
All the methods called using the wrapper returned by Collections.synchronizedList() will be synchronized. This means that the addAll method of normal List when called by this wrapper will be something like this :-
public boolean addAll(Collection<? extends E> c) {
synchronized (mutex) { return list.addAll(c); }
}
So, every method call for the list (using the reference returned and not the original reference) will be synchronized.
However, there is no synchronization between different method calls.
Consider following code snippet :-
List<String> l = Collections.synchronizedList(new ArrayList<String>());
l.add("Hello");
l.add("World");
When multiple threads run the same code, it is quite possible that after Thread A has added "Hello", Thread B starts, adds both "Hello" and "World" to the list, and only then does Thread A resume. The list would then contain ["Hello", "Hello", "World", "World"] instead of the expected ["Hello", "World", "Hello", "World"]. This is just an example to show that the list is not thread-safe across separate method calls. If we want the above code to produce the desired result, it should be inside a synchronized block that locks on the list (or on this).
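A short sketch of the synchronized block described above, locking on the list returned by Collections.synchronizedList (PairAppender, addPair and snapshot are hypothetical names):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example: two add() calls grouped into one atomic step.
final class PairAppender {
    private final List<String> list = Collections.synchronizedList(new ArrayList<>());

    void addPair(String first, String second) {
        // Same monitor the wrapper uses internally, so no other thread
        // can add between the two calls.
        synchronized (list) {
            list.add(first);
            list.add(second);
        }
    }

    List<String> snapshot() {
        // Copying iterates the list, so it is done under the same lock.
        synchronized (list) {
            return new ArrayList<>(list);
        }
    }
}
```

Locking on the wrapper itself is what the Collections.synchronizedList documentation asks callers to do when iterating, which is why the same object is used as the guard here.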
However, with your design there is only one method call, so it is the same as using Collections.synchronizedList().
Moreover, as Mike Clark rightly pointed out, you should also synchronize getBookIds() and setBookIds(). Synchronizing on the List itself would be clearer, since it amounts to locking the list before operating on it and unlocking it afterwards, so that nothing in between can use the List.