How to know if a method is thread safe - java

Suppose I have a method that checks for a id in the db and if the id doesn't exit then inserts a value with that id. How do I know if this is thread safe and how do I ensure that its thread safe. Are there any general rules that I can use to ensure that it doesn't contain race conditions and is generally thread safe.
public TestEntity save(TestEntity entity) {
if (entity.getId() == null) {
entity.setId(UUID.randomUUID().toString());
}
Map<String, TestEntity > map = dbConnection.getMap(DB_NAME);
map.put(entity.getId(), entity);
return map.get(entity.getId());
}

This is a how long is a piece of string question...
A method will be thread safe if it uses the synchronized keyword in its declaration.
However, even if your setId and getId methods used synchronized keyword, your process of setting the id (if it has not been previously initialized) above is not. .. but even then there is an "it depends" aspect to the question. If it is impossible for two threads to ever get the same object with an uninitialised id then you are thread safe because you would never be attempting to concurrently modifying the id.
It is entirely possible, given the code in your question, that there could be two calls to the thread safe getid at the same time for the same object. One by one they get the return value (null) and immediately get pre-empted to let the other thread run. This means both will then run the thread safe setId method - again one by one.
You could declare the whole save method as synchronized, but if you do that the entire method will be single threaded which defeats the purpose of using threads in the first place. You tend to want to minimize the synchronized code to the bare minimum to maximize concurrency.
You could also put a synchronized block around the critical if statement and minimise the single threaded part of the processing, but then you would also need to be careful if there were other parts of the code that might also set the Id if it wasn't previously initialized.
Another possibility which has various pros and cons is to put the initialization of the Id into the get method and make that method synchronized, or simply assign the Id when the object is created in the constructor.
I hope this helps...
Edit...
The above talks about java language features. A few people mentioned facilities in the java class libraries (e.g. java.util.concurrent) which also provide support for concurrency. So that is a good add on, but there are also whole packages which address the concurrency and other related parallel programming paradigms (e.g. parallelism) in various ways.
To complete the list I would add tools such as Akka and Cats-effect (concurrency) and more.
Not to mention the books and courses devoted to the subject.
I just reread your question and noted that you are asking about databases. Again the answer is it depends. Rdbms' usually let you do this type of operation with record locks usually in a transaction. Some (like teradata) use special clauses such as locking row for write select * from some table where pi_cols = 'somevalues' which locks the rowhash to you until you update it or certain other conditions. This is known as pessimistic locking.
Others (notebly nosql) have optimistic locking. This is when you read the record (like you are implying with getid) there is no opportunity to lock the record. Then you do a conditional update. The conditional update is sort of like this: write the id as x provided that when you try to do so the Id is still null (or whatever the value was when you checked). These types of operations are usually down through an API.
You can also do optimistics locking in an RDBMs as follows:
SQL
Update tbl
Set x = 'some value',
Last_update_timestamp = current_timestamp()
Where x = bull AND last_update_timestamp = 'same value as when I last checked'
In this example the second part of the where clause is the critical bit which basically says "only update the record if no one else did and I trust that everyone else will update the last update to when they do". The "trust" bit can sometimes be replaced by triggers.
These types of database operations (if available) are guaranteed by the database engine to be "thread safe".
Which loops me back to the "how long is a piece of string" observation at the beginning of this answer...

Test-and-set is unsafe
a method that checks for a id in the db and if the id doesn't exit then inserts a value with that id.
Any test-and-set pair of operations on a shared resource is inherently unsafe, vulnerable to a race condition. If the two operations are separate (not atomic), then they must be protected as a pair. While one thread completes the test but has not yet done the set, another thread could sneak in and do both the test and the set. The first thread now completes its set without knowing a duplicate action has occurred.
Providing that necessary protection is too broad a topic for an Answer on Stack Overflow, as others have said here.
UPSERT
However, let me point out that an alternative approach to to make the test-and-set atomic.
In the context of a database, that can be done using the UPSERT feature. Also known as a Merge operation. For example, in Postgres 9.5 and later we have the INSERT INTO … ON CONFLICT command. See this explanation for details.
In the context of a Boolean-style flag, a semaphore makes the test-and-set atomic.

In general, when we say "a method is thread-safe" when there is no race-condition to the internal and external data structure of the object it belongs to. In other words, the order of the method calls are strictly enforced.
For example, let's say you have a HashMap object and two threads, thread_a and thread_b.
thread_a calls put("a", "a") and thread_b calls put("a", "b").
The put method is not thread-safe (refer to its documentation) in the sense that while thread_a is executing its put, thread_b can also go in and execute its own put.
A put contains reading and writing part.
thread_a.read("a")
thread_b.read("a")
thread_b.write("a", "b")
thread_a.write("a", "a")
If above sequence happens, you can say ... a method is not thread-safe.
How to make a method thread-safe is by ensuring the state of the whole object cannot be perturbed while the thread-safe method is executing. An easier way is to put "synchronized" keyword in method declarations.
If you are worried about performance, use manual locking using synchronized blocks with a lock object. Further performance improvement can be achieved using a very well designed semaphores.

Related

java, when (and for how long) can a thread cache the value of a non-volatile variable?

From this post: http://www.javamex.com/tutorials/synchronization_volatile_typical_use.shtml
public class StoppableTask extends Thread {
private volatile boolean pleaseStop;
public void run() {
while (!pleaseStop) {
// do some stuff...
}
}
public void tellMeToStop() {
pleaseStop = true;
}
}
If the variable were not declared volatile (and without other
synchronization), then it would be legal for the thread running the
loop to cache the value of the variable at the start of the loop and
never read it again.
In Java 5 or later:
is the last paragraph correct?
So, exactly at what moment can a thread cache the value of the pleaseStop variable (and for how long)? just before calling one of StoppableTask's functions (run, tellMeTpStop) of the object? (and the thread must update the variable when exiting the function at the latest?)
can you point me to a documentation/tutorial reference about this (Java 5 or later)?
Update: here it is my compilation of answers posted on this question:
Without using volatile nor synchronized, there are actually two problems with the above program:
1- Threads can cache the variable pleaseStop since the very first moment that the thread starts and don't update it never again. so, the loop would keep going forever. this can be solved by either using volatile or synchronized. This thread cache mechanism does not exist in C.
2- The java compiler can optimise the code, and replace while(!pleaseStop) {...} to if (!pleaseStop) { while (true) {...}}. so, the loop would keep going forever. again, this can be solved by either using volatile or synchronized. This compiler optimisation exists also in C.
Some more info:
https://www.ibm.com/developerworks/library/j-5things15/
When can it cache?
As for your question about "when can it cache" the value, the answer to that is "always". To understand what that means, read on. Processors have storage called caches, which make it possible for the running thread to access values in memory by reading from the cache rather than from memory. The running thread can also write to this cache as if it were writing the value to memory. Thus, so long as the thread is running, it could be using the cache to store the data it's using. Something has to explicitly happen to flush the value from the cache to memory. For a single-threaded process, this is all well and dandy, but if you have another thread, it might be trying to read the data from memory while the other thread is plugging away reading and writing it to the processor cache without flushing to memory.
How long can it cache?
As for the "for how long" part- the answer is unfortunately forever unless you do something about it. Synchronizing on the data in question is one way to force a flush from the cache so that all threads see the updates to the value. For more detail about ways to cause a flush, see the next section.
Where's some Documentation?
As for the "where's the documentation" question, a good place to start is here. For specifically how you can force a flush, java refers to this by discussing whether one action (such as a data write) appears to "happen before" another (like a data read). For more about this, see here.
What about volatile?
volatile in essence prevents the type of processor caching described above. This ensures that all writes to a variable are visible from other threads. To learn more, the tutorial you linked to in your post seems like a good start.
The relevant documentation is on the volatile keyword (Java Language Specification, Chapter 8.3.1.4) here and the Java memory model (Java Language Specification, Chapter 17.4) here
Declaring the parameter volatile ensures that there is some synchronization of actions by other threads that might change its value. Without declaring volatile, Java can reorder operations taken on a parameter by different threads.
As the Spec says (see 8.3.1.4), for parameters declared volatile,"accesses ... occur exactly as many times, and in exactly the same order, as they appear to occur during execution of the program text by each thread..."
So the caching you speak of can happen anytime if the parameter is not volatile. But there is enforcement of consistent access to that parameter by the Java memory model if the parameter is declared volatile. But no such enforcement would take place if not (unless the threads are synchronized).
The official documentation is in section 17 of the Java Language Specification, especially 17.4 Memory Model.
The correct viewpoint is to start by assuming multi-threaded code won't work, and try to force it to work whether it likes it or not. Without the volatile declaration, or similar, there would be nothing forcing the read of pleaseStop to ever see the write if it happens in another thread.
I agree with the Java Concurrency in Practice recommendation. It is a good distillation of the implications of the JLS material for practical Java programming.

In Java can I depend on reference assignment being atomic to implement copy on write?

If I have an unsynchronized java collection in a multithreaded environment, and I don't want to force readers of the collection to synchronize[1], is a solution where I synchronize the writers and use the atomicity of reference assignment feasible? Something like:
private Collection global = new HashSet(); // start threading after this
void allUpdatesGoThroughHere(Object exampleOperand) {
// My hypothesis is that this prevents operations in the block being re-ordered
synchronized(global) {
Collection copy = new HashSet(global);
copy.remove(exampleOperand);
// Given my hypothesis, we should have a fully constructed object here. So a
// reader will either get the old or the new Collection, but never an
// inconsistent one.
global = copy;
}
}
// Do multithreaded reads here. All reads are done through a reference copy like:
// Collection copy = global;
// for (Object elm: copy) {...
// so the global reference being updated half way through should have no impact
Rolling your own solution seems to often fail in these type of situations, so I'd be interested in knowing other patterns, collections or libraries I could use to prevent object creation and blocking for my data consumers.
[1] The reasons being a large proportion of time spent in reads compared to writes, combined with the risk of introducing deadlocks.
Edit: A lot of good information in several of the answers and comments, some important points:
A bug was present in the code I posted. Synchronizing on global (a badly named variable) can fail to protect the syncronized block after a swap.
You could fix this by synchronizing on the class (moving the synchronized keyword to the method), but there may be other bugs. A safer and more maintainable solution is to use something from java.util.concurrent.
There is no "eventual consistency guarantee" in the code I posted, one way to make sure that readers do get to see the updates by writers is to use the volatile keyword.
On reflection the general problem that motivated this question was trying to implement lock free reads with locked writes in java, however my (solved) problem was with a collection, which may be unnecessarily confusing for future readers. So in case it is not obvious the code I posted works by allowing one writer at a time to perform edits to "some object" that is being read unprotected by multiple reader threads. Commits of the edit are done through an atomic operation so readers can only get the pre-edit or post-edit "object". When/if the reader thread gets the update, it cannot occur in the middle of a read as the read is occurring on the old copy of the "object". A simple solution that had probably been discovered and proved to be broken in some way prior to the availability of better concurrency support in java.
Rather than trying to roll out your own solution, why not use a ConcurrentHashMap as your set and just set all the values to some standard value? (A constant like Boolean.TRUE would work well.)
I think this implementation works well with the many-readers-few-writers scenario. There's even a constructor that lets you set the expected "concurrency level".
Update: Veer has suggested using the Collections.newSetFromMap utility method to turn the ConcurrentHashMap into a Set. Since the method takes a Map<E,Boolean> my guess is that it does the same thing with setting all the values to Boolean.TRUE behind-the-scenes.
Update: Addressing the poster's example
That is probably what I will end up going with, but I am still curious about how my minimalist solution could fail. – MilesHampson
Your minimalist solution would work just fine with a bit of tweaking. My worry is that, although it's minimal now, it might get more complicated in the future. It's hard to remember all of the conditions you assume when making something thread-safe—especially if you're coming back to the code weeks/months/years later to make a seemingly insignificant tweak. If the ConcurrentHashMap does everything you need with sufficient performance then why not use that instead? All the nasty concurrency details are encapsulated away and even 6-months-from-now you will have a hard time messing it up!
You do need at least one tweak before your current solution will work. As has already been pointed out, you should probably add the volatile modifier to global's declaration. I don't know if you have a C/C++ background, but I was very surprised when I learned that the semantics of volatile in Java are actually much more complicated than in C. If you're planning on doing a lot of concurrent programming in Java then it'd be a good idea to familiarize yourself with the basics of the Java memory model. If you don't make the reference to global a volatile reference then it's possible that no thread will ever see any changes to the value of global until they try to update it, at which point entering the synchronized block will flush the local cache and get the updated reference value.
However, even with the addition of volatile there's still a huge problem. Here's a problem scenario with two threads:
We begin with the empty set, or global={}. Threads A and B both have this value in their thread-local cached memory.
Thread A obtains obtains the synchronized lock on global and starts the update by making a copy of global and adding the new key to the set.
While Thread A is still inside the synchronized block, Thread B reads its local value of global onto the stack and tries to enter the synchronized block. Since Thread A is currently inside the monitor Thread B blocks.
Thread A completes the update by setting the reference and exiting the monitor, resulting in global={1}.
Thread B is now able to enter the monitor and makes a copy of the global={1} set.
Thread A decides to make another update, reads in its local global reference and tries to enter the synchronized block. Since Thread B currently holds the lock on {} there is no lock on {1} and Thread A successfully enters the monitor!
Thread A also makes a copy of {1} for purposes of updating.
Now Threads A and B are both inside the synchronized block and they have identical copies of the global={1} set. This means that one of their updates will be lost! This situation is caused by the fact that you're synchronizing on an object stored in a reference that you're updating inside your synchronized block. You should always be very careful which objects you use to synchronize. You can fix this problem by adding a new variable to act as the lock:
private volatile Collection global = new HashSet(); // start threading after this
private final Object globalLock = new Object(); // final reference used for synchronization
void allUpdatesGoThroughHere(Object exampleOperand) {
// My hypothesis is that this prevents operations in the block being re-ordered
synchronized(globalLock) {
Collection copy = new HashSet(global);
copy.remove(exampleOperand);
// Given my hypothesis, we should have a fully constructed object here. So a
// reader will either get the old or the new Collection, but never an
// inconsistent one.
global = copy;
}
}
This bug was insidious enough that none of the other answers have addressed it yet. It's these kinds of crazy concurrency details that cause me to recommend using something from the already-debugged java.util.concurrent library rather than trying to put something together yourself. I think the above solution would work—but how easy would it be to screw it up again? This would be so much easier:
private final Set<Object> global = Collections.newSetFromMap(new ConcurrentHashMap<Object,Boolean>());
Since the reference is final you don't need to worry about threads using stale references, and since the ConcurrentHashMap handles all the nasty memory model issues internally you don't have to worry about all the nasty details of monitors and memory barriers!
According to the relevant Java Tutorial,
We have already seen that an increment expression, such as c++, does not describe an atomic action. Even very simple expressions can define complex actions that can decompose into other actions. However, there are actions you can specify that are atomic:
Reads and writes are atomic for reference variables and for most primitive variables (all types except long and double).
Reads and writes are atomic for all variables declared volatile (including long and double variables).
This is reaffirmed by Section §17.7 of the Java Language Specification
Writes to and reads of references are always atomic, regardless of whether they are implemented as 32-bit or 64-bit values.
It appears that you can indeed rely on reference access being atomic; however, recognize that this does not ensure that all readers will read an updated value for global after this write -- i.e. there is no memory ordering guarantee here.
If you use an implicit lock via synchronized on all access to global, then you can forge some memory consistency here... but it might be better to use an alternative approach.
You also appear to want the collection in global to remain immutable... luckily, there is Collections.unmodifiableSet which you can use to enforce this. As an example, you should likely do something like the following...
private volatile Collection global = Collections.unmodifiableSet(new HashSet());
... that, or using AtomicReference,
private AtomicReference<Collection> global = new AtomicReference<>(Collections.unmodifiableSet(new HashSet()));
You would then use Collections.unmodifiableSet for your modified copies as well.
// ... All reads are done through a reference copy like:
// Collection copy = global;
// for (Object elm: copy) {...
// so the global reference being updated half way through should have no impact
You should know that making a copy here is redundant, as internally for (Object elm : global) creates an Iterator as follows...
final Iterator it = global.iterator();
while (it.hasNext()) {
Object elm = it.next();
}
There is therefore no chance of switching to an entirely different value for global in the midst of reading.
All that aside, I agree with the sentiment expressed by DaoWen... is there any reason you're rolling your own data structure here when there may be an alternative available in java.util.concurrent? I figured maybe you're dealing with an older Java, since you use raw types, but it won't hurt to ask.
You can find copy-on-write collection semantics provided by CopyOnWriteArrayList, or its cousin CopyOnWriteArraySet (which implements a Set using the former).
Also suggested by DaoWen, have you considered using a ConcurrentHashMap? They guarantee that using a for loop as you've done in your example will be consistent.
Similarly, Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration.
Internally, an Iterator is used for enhanced for over an Iterable.
You can craft a Set from this by utilizing Collections.newSetFromMap like follows:
final Set<E> safeSet = Collections.newSetFromMap(new ConcurrentHashMap<E, Boolean>());
...
/* guaranteed to reflect the state of the set at read-time */
for (final E elem : safeSet) {
...
}
I think your original idea was sound, and DaoWen did a good job getting the bugs out. Unless you can find something that does everything for you, it's better to understand these things than hope some magical class will do it for you. Magical classes can make your life easier and reduce the number of mistakes, but you do want to understand what they are doing.
ConcurrentSkipListSet might do a better job for you here. It could get rid of all your multithreading problems.
However, it is slower than a HashSet (usually--HashSets and SkipLists/Trees hard to compare). If you are doing a lot of reads for every write, what you've got will be faster. More importantly, if you update more than one entry at a time, your reads could see inconsistent results. If you expect that whenever there is an entry A there is an entry B, and vice versa, the skip list could give you one without the other.
With your current solution, to the readers, the contents of the map are always internally consistent. A read can be sure there's an A for every B. It can be sure that the size() method gives the precise number of elements that will be returned by the iterator. Two iterations will return the same elements in the same order.
In other words, allUpdatesGoThroughHere and ConcurrentSkipListSet are two good solutions to two different problems.
Can you use the Collections.synchronizedSet method? From HashSet Javadoc http://docs.oracle.com/javase/6/docs/api/java/util/HashSet.html
Set s = Collections.synchronizedSet(new HashSet(...));
Replace the synchronized by making global volatile and you'll be alright as far as the copy-on-write goes.
Although the assignment is atomic, in other threads it is not ordered with the writes to the object referenced. There needs to be a happens-before relationship which you get with a volatile or synchronising both reads and writes.
The problem of multiple updates happening at once is separate - use a single thread or whatever you want to do there.
If you used a synchronized for both reads and writes then it'd be correct but the performance may not be great with reads needing to hand-off. A ReadWriteLock may be appropriate, but you'd still have writes blocking reads.
Another approach to the publication issue is to use final field semantics to create an object that is (in theory) safe to be published unsafely.
Of course, there are also concurrent collections available.

How to make cache thread safe

I have a instance of a object which performs very complex operation.
So in the first case I create an instance and save it it my own custom cache.
From next times whatever thread comes if he finds that a ready made object is already present in the cache they take it from the cache so as to be good in performance wise.
I was worried about what if two threads have the same instance. IS there a chance that the two threads can corrupt each other.
Map<String, SoftReference<CacheEntry<ClassA>>> AInstances= Collections.synchronizedMap(new HashMap<String, SoftReference<CacheEntry<ClassA>>>());
There are many possible solutions:
Use an existing caching solution like EHcache
Use the Spring framework which got an easy way to cache results of a method with a simple #Cacheable annotation
Use one of the synchronized maps like ConcurrentHashMap
If you know all keys in advance, you can use a lazy init code. Note that everything in this code is there for a reason; change anything in get() and it will break eventually (eventually == "your unit tests will work and it will break after running one year in production without any problem whatsoever").
ConcurrentHashMap is most simple to set up but it has simple way to say "initialize the value of a key once".
Don't try to implement the caching by yourself; multithreading in Java has become a very complex area with Java 5 and the advent of multi-core CPUs and memory barriers.
[EDIT] yes, this might happen even though the map is synchronized. Example:
SoftReference<...> value = cache.get( key );
if( value == null ) {
value = computeNewValue( key );
cache.put( key, value );
}
If two threads run this code at the same time, computeNewValue() will be called twice. The method calls get() and put() are safe - several threads can try to put at the same time and nothing bad will happen, but that doesn't protect you from problems which arise when you call several methods in succession and the state of the map must not change between them.
Assuming you are talking about singletons, simply use the "demand on initialization holder idiom" to make sure your "check" works across all JVM's. This will also make sure all threads which are requesting the same object concurrently wait till the initialization is over and be given back only valid object instance.
Here I'm assuming you want a single instance of the object. If not, you might want to post some more code.
Ok If I understand your problem correctly, you are worried that 2 objects changing the state of the shared object will corrupt each other.
The short answer is yes they will.
If the object is expensive in creation but is needed in a read only manner. I suggest you make it immutable, this way you get the benefit of it being fast in access and at the same time thread safe.
If the state should be writable but you don't actually need threads to see each others updates. You can simply load the object once in an immutable cache and just return copies to anyone who asks for the object.
Finally if your object needs to be writable and shared (for other reasons than it just being expensive to create). Then my friend you need to handle thread safety, I don't know your case but you should take a look at the synchronized keyword, Locks and java 5 concurrency features, Atomic types. I am sure one of them will satisfy your need and I sincerely wish that your case is one of the first 2 :)
If you only have a single instance of the Object, have a quick look at:
Thread-safe cache of one object in java
Other wise I can't recommend the google guava library enough, in particular look at the MapMaker class.

Why ConcurrentHashMap.putifAbsent is safe?

I have been reading for concurency since yesterday and i dont know much things... However some things are starting to getting clear...
I understand why double check locking isnt safe (i wonder what is the propability the rare condition to occur) but volatile fixes the issue in 1.5 +....
But i wonder if this occurs with putifAbsent
like...
myObj = new myObject("CodeMonkey");
cHashM.putIfAbsent("keyy",myObj);
Then does this ensures that myObj would be 100% intialiased when another thread does a cHashM.get() ??? Because it could have a reference isnt completely initialised (the double check lock problem)
If you invoke concurrentHashMap.get(key) and it returns an object, that object is guaranteed to be fully initialized. Each put (or putIfAbsent) will obtain a bucket specific lock and will append the element to the bucket's entries.
Now you may go through the code and notice that the get method doesnt obtain this same lock. So you can argue that there can be an out of date read, that isn't true either. The reason here is that value within the entry itself is volatile. So you will be sure to get the most up to date read.
putIfAbsent method in ConcurrentHashMap is check-if-absent-then-set method. It's an atomic operation. But to answer the following part: "Then does this ensures that myObj would be 100% intialiased when another thread does a cHashM.get() ", it would depend on when the object is put into the HashMap. Usually there is a happens-before precedence, i.e., if the caller gets first before the object is placed in the map, then null would be returned, else the value would be returned.
The relevant part of the documentation is this:
Memory consistency effects: As with
other concurrent collections, actions
in a thread prior to placing an object
into a ConcurrentMap as a key or value
happen-before actions subsequent to
the access or removal of that object
from the ConcurrentMap in another
thread.
-- java.util.ConcurrentMap
So, yes you have your happens-before relationship.
I'm not an expert on this, but looking at the implementation of Segment in ConcurrentHashMap I see that the volatile field count appears to be used to ensure proper visibility between threads. All read operations have to read the count field and all write operations have to write to it. From comments in the class:
Read operations can thus proceed without locking, but rely
on selected uses of volatiles to ensure that completed
write operations performed by other threads are
noticed. For most purposes, the "count" field, tracking the
number of elements, serves as that volatile variable
ensuring visibility. This is convenient because this field
needs to be read in many read operations anyway:
- All (unsynchronized) read operations must first read the
"count" field, and should not look at table entries if
it is 0.
- All (synchronized) write operations should write to
the "count" field after structurally changing any bin.
The operations must not take any action that could even
momentarily cause a concurrent read operation to see
inconsistent data. This is made easier by the nature of
the read operations in Map. For example, no operation
can reveal that the table has grown but the threshold
has not yet been updated, so there are no atomicity
requirements for this with respect to reads.

Should volatile be used for attributes of domain model classes in Java web apps?

Here's my thinking:
Even though a HTTP request cycle is essentially handled by a 'single thread', each time a HTTP request is processed for that same session it is likely to be processed by a different thread from the thread pool.
Without the volatile keyword being used on a domain model object, whose lifecycle extends across multiple HTTP requests for the same session, then, according to my understanding, isn't it possible that the attribute could be thread local cached (an optimization by the compiler) in the thread that serviced the first HTTP request? If the second HTTP request is serviced by another thread then that second thread may not see the changes in that attribute that were made by the first thread.
Does this spell "Danger Will Robinson"? Or am I missing a vital plot point about the use (or not) of the volatile keyword?
I think you are forgetting that the threads handling the HTTP request first need to retrieve the instance of the domain model object from the HttpSession provided by your application server. The thread handling request 2 in the scenario you describe does not already have an instance of this domain model - it has to retrieve it from the session implementation at the start of handling each and every request.
I think it is completely reasonable to assume that the session-handling implementation in your application server is handling session data in such a way that memory model visibility issues are avoided. Apache Tomcat's default (non-clustered) HttpSession implementation, for example, stores the session attributes in a ConcurrentHashMap.
Adding volatile seems completely unnecessary to me. I have never seen this done for domain model objects handled by HTTP requests in a Servlet environment in any project I have worked in.
This would be a different story if thread-1 and thread-2 had references to the same object instance simulatenously while processing two different requests, and you were concerned about changes in one thread being visible to the other as each are processing the request, but this does not sound like what you are asking about.
Yes, if you are sharing an object between different threads, you may have race conditions. Without a happens before relationship, writes made by one thread may not be seen by a read in another thread.
Doing a volatile write in one thread and doing a volatile read of the same field in another thread establishes a happens before relationship between the two threads, and ensures visibility of the write.
This is a complicated problem, simply using a volatile keyword is probably not a good solution.
I think your understanding of it is correct. Given your description I would say it should be used. If its something more than a primitive type I would rather synchronize.
Good information on volatile:
http://www.javamex.com/tutorials/synchronization_volatile_when.shtml
If you have a mutable object in session, that is trouble. But usually the solution is not to guard individual fields; rather the entire object should be swapped.
Say you have the user object in the session. Most requests simply retrieve it, read it and display it.
There is a request that can modify user information. It would be a really bad idea to retrieve the user object, modify it. It's better to create complete new user object, and insert it into session.
In that case, fields in User don't need any protection; thread safety is guaranteed by session setAttribute() - getAttribute()
If you have concurrency issues, just adding 'volatile' probably won't help you.
As for keeping the object as an attribute of Session, I'd recommend you to keep just the object's ID, and use it to retrieve a 'live' instance when you need it (if you use Hibernate, successive retrieves will return the same object, so this shouldn't cause performance problems). Encapsulate all modification logic to this specific object into a single façade, and do the control concurrency there, using dababase locking.
Or, if you really, really, really want to use memory-based locking, and are really sure that you'll never have two instances of the application running in a cluster, make sure that your façade logic is synchronized at the right level. If your synchronization is too fine grained (low-level operations, such as volatile variables), it probably won't be enough to make your code thread-safe. For example, java.util.Hashtable is fully synchronized, but it doesn't mean anything if you have logic like this:
01 if (!hashtable.containsKey(key)) {
02 hashtable.put(key, calculate(key));
03 }
If two threads, say, t1 and t2, hit this block at the same time, t1 may execute line 01, then t2 may also execute 01, and then 02, and t1 then will execute 02, overwriting what t2 had done. The operations containsKey() and put() are atomic individually, but what should be atomic is the whole block.
Sometimes recalculating a value doesn't matter, but sometimes it does, and it will break.
When it comes to concurrency, there's no magic. I mean, seam some crappy frameworks try to sell you the idea that they solve this problem for you. They don't. Even if it works 99% of the time, it will break spectacularly when you go to production and start to get heavy traffic. Or (much, much) worse, it will silently generate wrong results.
Concurrency is one of the most complex problems in programming. And the only way to handle it is to avoid it. All this functional programming trend is not about dealing with concurrency, is about avoiding it altogether.
It turns out that volatile was not needed in the end. The problem that "appeared" to be fixed with volatile was actually a very subtle timing sensitive bug that was fixed in a much more elegant and proper way ;)
So sbrigdes was correct when he said "simply using a volatile keyword is probably not a good solution."

Categories

Resources