I'm getting a little unsure about what to expect from Concurrent Collections (e.g. ConcurrentMap) regarding visibility of the data in the collection.
A: Thread1 puts a complex object and Thread2 gets it. Will all attributes be visible in Thread2?
B: Thread1 puts a complex object and later changes some attributes. Then Thread2 gets it, will all changes be visible in Thread2?
I guess B is false, and if so I should synchronize every access on the complex object?
Pushing to a concurrent collection is defined as publishing it. See "Memory Consistency Properties" in the Package description.
This means that if you just change a stored object, you do not automatically get a happens-before relationship. You would need to make those changes synchronized/volatile or use a concurrent primitive.
A: If the object is immutable or if the object is mutable but all the properties are set before the object is added to the collection then yes, they will be all visible.
B: If no synchronisation mechanisms are in place then it is not guaranteed; it depends on when thread 2 accesses the object.
If you need this sort of behaviour guaranteed (i.e. the reading thread to be guaranteed to see all the modifications made by the mutator thread in a transactional-like manner) I suggest you set up a semaphoring mechanism. Even better, it would be simpler if you use immutable objects.
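As a rough illustration of case A, here is a minimal sketch in which every field is set before the object is put into the map. The class names `MapPublicationDemo` and `Point` and the key "p" are made up for this example and are not from the question.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class MapPublicationDemo {
    // Hypothetical mutable value type with plain (non-volatile, non-final) fields.
    static final class Point {
        int x;
        int y;
    }

    // Returns the {x, y} values observed by the reader thread.
    static int[] publishAndRead() {
        ConcurrentMap<String, Point> map = new ConcurrentHashMap<>();
        int[] seen = new int[2];

        Thread reader = new Thread(() -> {
            Point p;
            // Spin until the writer's put() becomes visible.
            while ((p = map.get("p")) == null) {
                Thread.yield();
            }
            // Case A: these reads see the values written before put().
            seen[0] = p.x;
            seen[1] = p.y;
        });
        reader.start();

        Point point = new Point();
        point.x = 1;          // all fields are set ...
        point.y = 2;
        map.put("p", point);  // ... before the object is published
        // Case B: writing point.x = 99 here, after put(), would be a data race
        // with the reader unless both sides synchronize on something.

        try {
            reader.join();    // join() makes the reader's writes to 'seen' visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        int[] seen = publishAndRead();
        System.out.println(seen[0] + ", " + seen[1]);
    }
}
```

The reader can only leave its loop after observing the put(), so the writes to x and y that preceded the put() are visible to it.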
In the book Java Concurrency in Practice by Brian Goetz et al.:
If you do not ensure that publishing the shared reference happens-before another thread loads that shared reference, then the write of the reference to the new object can be reordered (from the perspective of the thread consuming the object) with writes to its fields. In that case, another thread could see an up-to-date value for the object reference but out-of-date values for some or all of that object's state: a partially constructed object.
Does this mean that: in the thread publishing the object, the write of the reference to the new object is not reordered with writes to its fields; the write to its fields happens before the write of the reference. However, that publishing thread may flush the updated reference to main memory before it flushes the updated object fields. Therefore, the thread consuming the object may see a non-null reference for the object, yet see outdated values for the object fields? And in that sense, the operations are reordered for the consuming thread.
Yes.
The answer to your question is right there in the paragraph that you quoted, and you seem to echo the answer in your question.
One comment though: You said that, "[the] publishing thread may flush the updated reference to main memory before it flushes the updated object fields." If you're talking about Java code, then it's best to stick with what is written in the Java Language Specification (JLS).
The JLS tells you how a Java program is allowed to behave. It says nothing about "main memory" or "caches" or "flushing." It only says that without explicit synchronization, the updates that one thread performs in a certain order on two or more variables may seem to have happened in a different order when viewed from the perspective of some other thread. How or why that can happen is "implementation details."
in the thread publishing the object, the write of the reference to the new object is not reordered with writes to its fields; the write to its fields happens before the write of the reference.
Yes. Within a single thread these actions are ordered by program order, which rules out any reordering that the thread itself could observe: "If x and y are actions of the same thread and x comes before y in program order, then hb(x, y)." (https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4.5). To rephrase you a bit: "the write of the reference to the new object is not reordered with writes to its fields" means that if the publishing thread itself reads the reference to the object, it is guaranteed to see all the field values it wrote before.
the thread consuming the object may see a non-null reference for the object, yet see outdated values for the object fields?
Yes, it may, when you publish the object in an unsafe manner, without the appropriate HB edges (implemented with memory barriers). Literally speaking, in the absence of the HB/membars you get undefined behavior. This means that in another thread you can see/read anything (except out-of-thin-air (OoTA) values, which are explicitly forbidden by the JMM). Safe publication makes all the values written before the publication visible to all readers that observed the published object. There are a few popular and simple ways to make the publication safe:
Publish the reference through a properly locked field (https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.4.5)
Use the static initializer to do the initializing stores (http://docs.oracle.com/javase/specs/jls/se8/html/jls-12.html#jls-12.4)
Publish the reference via a volatile field (https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.4.5), or as the consequence of this rule, via the AtomicX classes
Initialize the value into a final field, which leads to the freeze action (https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.5).
You can also use other actions which produce HB edges, like Thread.start() etc., but my day-to-day favorites are:
the final fields for immutable data
volatile/AtomicXXX fields and locks (explicit synchronized block/ReadWriteLock, implicit locks in BlockingQueue) for mutable data.
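To make the two favorites concrete, here is a minimal sketch that uses final fields for immutable data and a volatile field for mutable data. All class and field names (`SafePublicationDemo`, `Endpoint`, `Settings`, `Holder`) are hypothetical and chosen only for this illustration.

```java
class SafePublicationDemo {
    // Immutable data: final fields are frozen at the end of the constructor.
    static final class Endpoint {
        final String host;
        final int port;

        Endpoint(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    // Mutable data: plain fields, made visible through a volatile reference.
    static final class Settings {
        int retries;
        long timeoutMillis;
    }

    static final class Holder {
        final Endpoint endpoint = new Endpoint("localhost", 8080);
        volatile Settings settings; // stays null until the writer publishes it
    }

    static String publishAndRead() {
        Holder holder = new Holder();
        String[] seen = new String[1];

        Thread reader = new Thread(() -> {
            Settings s;
            // The volatile read pairs with the volatile write below.
            while ((s = holder.settings) == null) {
                Thread.yield();
            }
            seen[0] = holder.endpoint.host + ":" + holder.endpoint.port
                    + " retries=" + s.retries + " timeout=" + s.timeoutMillis;
        });
        reader.start();

        Settings settings = new Settings();
        settings.retries = 3;
        settings.timeoutMillis = 500L;
        holder.settings = settings; // volatile write publishes the fields set above

        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead());
    }
}
```

Later changes to `settings.retries` would not be covered by the earlier volatile write; they would need their own synchronization or a fresh publication.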
What happens if an object is modified outside of the concurrent map in java?
Say, I have a concurrent hash map and in one thread I retrieve a value from that map and modify its state. Will the other threads see the modification without an additional synchronization?
The key concept in the Java Memory Model is the happens before relation. You can only rely on things happening between threads if there is a happens before relationship between the two of them.
In the case of ConcurrentHashMap, there is a happens-before relationship between a put and a subsequent get of the value for the same key: updating the value and putting it into the map happens before getting the value and reading its state. Because of that happens before relationship, the update happens before the reading of the state, so you will see the updated state.
So, if you update the state of an object, and then put it into the map, if you subsequently get it from the map, you are guaranteed to see that updated state (until such time as you put again).
But, if you have a reference to that object outside the context of a ConcurrentHashMap, there is no automatic happens-before relationship. You have to create that relationship for yourself.
One way of doing this is with synchronization (as in using synchronized, on the same object in all threads); other ways include:
writing and reading a volatile variable
using a Lock
putting the object into the map again, and then getting from the map before you start using it in the other thread.
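A minimal sketch of the first option (synchronizing on the same object in all threads) might look like the following. The `Counter` class and the key "hits" are assumptions made for this example. The map only hands out the reference, and the value's own lock provides the happens-before edges for its state.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class SharedValueDemo {
    // Hypothetical value type that guards its own state with its monitor.
    static final class Counter {
        private int count;

        synchronized void increment() {
            count++;
        }

        synchronized int get() {
            return count;
        }
    }

    static int incrementFromTwoThreads(int perThread) {
        ConcurrentMap<String, Counter> map = new ConcurrentHashMap<>();
        map.put("hits", new Counter());

        // Each thread retrieves the same object and modifies it outside the map.
        Runnable work = () -> {
            Counter counter = map.get("hits");
            for (int i = 0; i < perThread; i++) {
                counter.increment();
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();

        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // Reading through the synchronized getter uses the same monitor as the writers.
        return map.get("hits").get();
    }

    public static void main(String[] args) {
        System.out.println(incrementFromTwoThreads(1000));
    }
}
```

Without the synchronized methods, the two threads would race on `count` even though the map itself is thread-safe.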
Short answer is no.
A concurrent map will only synchronize the access to the map. That is, if one thread writes the map, all other threads can see that without additional synchronization.
If you retrieve an object from the map and modify it without synchronization, and if another thread retrieves the same object to read it, then you have a race without explicit synchronization between those threads.
I have worker threads that generate objects and push them into a thread-safe Set. A processing thread periodically reads the Set and processes the elements.
While the object references themselves will be successfully retrieved from the set, the objects' variables are not thread-safe if accessed from the processing thread. Is there some pattern to do this, apart from making all the objects' internals volatile etc.? The objects may become more complex in the future, containing nested objects etc.
Assuming that no object will be modified externally once it has been placed into the Set, is there some way to establish a "happens-before" for whatever is currently in the Set before I begin processing it? The processing thread is already running; it is not created only after the Set has been populated.
The objects themselves are just data containers and have no inherent thread-safety. I can't make all the fields final since they may be modified multiple times before being placed into the Set.
If you have a thread-safe set, adding to it and reading from it establishes a happens-before relationship for the writes made before the add, so you don't have to worry about whether the object itself is thread-safe. This assumes that your producer doesn't alter or read the object after putting it in the collection.
If you make the objects immutable, this will make the relationship clearer; however, I am assuming that once you pass the object to the shared storage, the writing thread no longer alters the object and only the consuming thread reads or alters it.
BTW I would pass the tasks via a queue using an ExecutorService as it is more efficient and written for you.
Volatile isn't quite the magic bullet in this case. Look at the possibility of switching to immutable objects for those passed between threads. Also, a threadsafe data structure that is queue based will give you better performance than most set implementations.
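As a rough sketch of the queue-based hand-off suggested in these answers, the following uses a `LinkedBlockingQueue` with plain threads instead of an ExecutorService, to keep it short. The `Result` type and its fields are hypothetical stand-ins for the asker's data containers.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueHandoffDemo {
    // Hypothetical data container with plain, non-final fields.
    static final class Result {
        int value;
        String label;
    }

    // Starts the given number of workers and returns the sum of the values consumed.
    static int produceAndConsume(int workers) {
        BlockingQueue<Result> queue = new LinkedBlockingQueue<>();

        for (int i = 1; i <= workers; i++) {
            final int id = i;
            new Thread(() -> {
                Result result = new Result();
                result.value = id * 10;        // fields may be set several times ...
                result.label = "worker-" + id;
                queue.add(result);             // ... but not after the hand-off
            }).start();
        }

        int sum = 0;
        try {
            for (int i = 0; i < workers; i++) {
                // take() blocks until an element is available; writes made
                // before the matching add() are visible after it returns.
                sum += queue.take().value;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(produceAndConsume(3));
    }
}
```

The same visibility reasoning applies to a thread-safe Set; the queue mainly spares the processing thread from polling.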
My situation is that I have two threads. The 1st thread produces a number of objects which the 2nd thread does not have access to until all of them are created. After that the 2nd thread reads fields in those objects but does so concurrently with the 1st. At this point no thread is changing the values of the fields of the objects.
The objects are not synchronized. Should I synchronize them or not?
What I would recommend is to use an AtomicReference<Collection<SomeObject>>. The first thread would produce the collection of objects and call reference.set(collection). The 2nd thread would see the objects (via reference.get()) only after they have been set on the AtomicReference. Here are the javadocs for AtomicReference. You could also store your objects in an array or any type of collection such as a List.
It is important to realize that after you set the collection (or array) on the AtomicReference you cannot make any changes to the collection. You can't add additional items, clear it, etc. If you want true concurrent access to a collection of objects then you should look into ConcurrentHashMap and friends.
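A minimal sketch of this suggestion, assuming a hypothetical `SomeObject` with a single plain field. The unmodifiable wrapper is an extra precaution added in this example, not something the answer requires.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class SnapshotDemo {
    // Hypothetical unsynchronized object with a plain field.
    static final class SomeObject {
        int value;
    }

    // Builds 'count' objects in one thread and sums their values in another.
    static int publishAndSum(int count) {
        AtomicReference<List<SomeObject>> reference = new AtomicReference<>();
        int[] sum = new int[1];

        Thread reader = new Thread(() -> {
            List<SomeObject> objects;
            // The 2nd thread has no access until the whole collection is set.
            while ((objects = reference.get()) == null) {
                Thread.yield();
            }
            for (SomeObject object : objects) {
                sum[0] += object.value;
            }
        });
        reader.start();

        List<SomeObject> built = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            SomeObject object = new SomeObject();
            object.value = i;
            built.add(object);
        }
        // Publish once; neither the list nor the objects are changed afterwards.
        reference.set(Collections.unmodifiableList(built));

        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println(publishAndSum(5));
    }
}
```

After the set(), both threads may read the objects concurrently without further synchronization, as long as nobody writes to them.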
Should I synchronize them or not?
If the objects are not going to be mutated at all after they are put in your collection then you do not need to make them synchronized.
There's nothing wrong with reading data from multiple threads at the same time. Issues arise when you attempt to modify that data. So long as the objects are fully initialized and the values are such that the second thread receives the actual value (no issues with caching etc.), there is no problem with reading data from multiple threads concurrently.
Assuming that I have the following code:
final Catalog catalog = createCatalog();
for (int i = 0; i < 100; i++) {
new Thread(new CatalogWorker(catalog)).start();
}
"Catalog" is an object structure, and the method createCatalog() and the "Catalog" object structure has not been written with concurrency in mind. There are several non-final, non-volatile references within the product catalog, there may even be mutable state (but that's going to have to be handled)
The way I understand the memory model, this code is not thread-safe. Is there any simple way to make it safe? (The generalized version of this problem is really about single-threaded construction of shared structures that are created before the threads explode into action.)
No, there's no simple way to make it safe. Concurrent use of mutable data types is always tricky. In some situations, making each operation on Catalog synchronized (preferably on a privately-held lock) may work, but usually you'll find that a thread actually wants to perform multiple operations without risking any other threads messing around with things.
Just synchronizing every access to variables should be enough to make the Java memory model problem less relevant - you would always see the most recent values, for example - but the bigger problem itself is still significant.
Any immutable state in Catalog should be fine already: there's a "happens-before" between the construction of the Catalog and the new thread being started. From section 17.4.5 of the spec:
A call to start() on a thread happens-before any actions in the started thread.
(And the construction finishing happens before the call to start(), so the construction happens before any actions in the started thread.)
You need to synchronize every method that changes the state of Catalog to make it thread-safe.
public synchronized <return type> method(<parameter list>){
...
}
Assuming you handle the "non-final, non-volatile references [and] mutable state" (presumably by not actually mutating anything while these threads are running) then I believe this is thread-safe. From the JSR-133 FAQ:
When one action happens before another, the first is guaranteed to be ordered before and visible to the second. The rules of this ordering are as follows:
Each action in a thread happens before every action in that thread that comes later in the program's order.
An unlock on a monitor happens before every subsequent lock on that same monitor.
A write to a volatile field happens before every subsequent read of that same volatile.
A call to start() on a thread happens before any actions in the started thread.
All actions in a thread happen before any other thread successfully returns from a join() on that thread.
Since the threads are started after the call to createCatalog, the result of createCatalog should be visible to those threads without any problems. It's only changes to the Catalog objects that occur after start() is called on the thread that would cause trouble.
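The start() rule can be sketched as follows, assuming the catalog is never modified once the workers are running. The `Catalog` class here is a hypothetical stand-in built on a plain HashMap, not the asker's actual structure.

```java
import java.util.HashMap;
import java.util.Map;

class StartPublicationDemo {
    // Hypothetical catalog written without concurrency in mind.
    static final class Catalog {
        final Map<String, Integer> prices = new HashMap<>();
    }

    static Catalog createCatalog() {
        Catalog catalog = new Catalog();
        catalog.prices.put("apple", 3);
        catalog.prices.put("pear", 4);
        catalog.prices.put("plum", 5);
        return catalog;
    }

    // Each worker sums all prices; the return value is the total over all workers.
    static int readFromWorkers(int workers) {
        final Catalog catalog = createCatalog(); // fully built before any start()
        int[] totals = new int[workers];
        Thread[] threads = new Thread[workers];

        for (int i = 0; i < workers; i++) {
            final int slot = i;
            threads[i] = new Thread(() -> {
                int total = 0;
                // Read-only access: start() happens-before these reads.
                for (int price : catalog.prices.values()) {
                    total += price;
                }
                totals[slot] = total;
            });
            threads[i].start();
        }

        int sum = 0;
        try {
            for (int i = 0; i < workers; i++) {
                threads[i].join(); // join() makes the worker's write to totals visible
                sum += totals[i];
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(readFromWorkers(4));
    }
}
```

Any write to `catalog.prices` after the first start() call would break this reasoning and would need locking or a concurrent map.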