I have worker threads that generate objects and push them into a thread-safe Set. A processing thread periodically reads the Set and processes the elements.
While the object references themselves will be successfully retrieved from the set, the objects' variables are not thread-safe if accessed from the processing thread. Is there some pattern to do this, apart from making all the objects' internals volatile etc.? The objects may become more complex in the future, containing nested objects etc.
Assuming that no object will be externally modified once it has been placed into the Set, is there some way to establish a "happens-before" relationship for whatever is currently in the Set before I begin processing it? The processing thread is already running; it is not created only after the Set has been populated.
The objects themselves are just data containers and have no inherent thread-safety. I can't make all the fields final since they may be modified multiple times before being placed into the Set.
If you have a thread-safe Set, adding an object to it establishes a happens-before relationship for the writes made before the add, so you don't have to worry about whether the object itself is thread-safe. This assumes that your producer doesn't alter or read the object after putting it in the collection.
If you make the objects immutable, this will make the relationship clearer. However, I am assuming that once you pass the object to the shared storage, the writing thread no longer alters the object and only the consuming thread reads or alters it.
BTW I would pass the tasks via a queue using an ExecutorService, as it is more efficient and already written for you.
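As a rough sketch of that hand-off (the Reading class, the field names and the use of ConcurrentHashMap.newKeySet() are my own assumptions, not the asker's code): the producer finishes every write, adds the object to the concurrent Set, and never touches it again. The java.util.concurrent package documentation states that actions before placing an object into a concurrent collection happen-before actions after another thread accesses or removes that element.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical plain data container with no thread-safety of its own.
final class Reading {
    int value;      // deliberately neither final nor volatile
    String label;
}

final class SetPublicationDemo {
    // Thread-safe Set view backed by a ConcurrentHashMap.
    static final Set<Reading> SHARED = ConcurrentHashMap.newKeySet();

    // Producer side: complete all mutation first, then publish.
    static void produce(int value, String label) {
        Reading r = new Reading();
        r.value = value;
        r.label = label;
        SHARED.add(r);  // after this line the producer must not touch r again
    }

    // Consumer side: process and remove whatever is currently in the Set.
    static int drainAndSum() {
        int sum = 0;
        for (Reading r : SHARED) {      // weakly consistent iterator, safe during adds
            if (SHARED.remove(r)) {
                sum += r.value;         // sees the values written before add()
            }
        }
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread producer = new Thread(() -> {
            for (int i = 1; i <= 100; i++) {
                produce(i, "r" + i);
            }
        });
        producer.start();
        producer.join();  // only here so the demo total is deterministic
        System.out.println(drainAndSum());
    }
}
```

The join() in main is there purely to make the printed total repeatable; in the real setup the consumer would call drainAndSum() periodically while producers are still running.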
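A minimal sketch of the ExecutorService variant, assuming a hypothetical Job container and a process method that stands in for the real work. The relevant guarantee from the java.util.concurrent documentation is that actions in a thread prior to submitting a task to an Executor happen-before the task's execution begins.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical plain data container.
final class Job {
    int input;  // plain field, no volatile
}

final class ExecutorHandOffDemo {
    // Stand-in for the real processing logic.
    static int process(Job job) {
        return job.input * 2;
    }

    static int submitAndWait(int input) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Job job = new Job();
            job.input = input;  // written before submit, so visible to the worker
            Future<Integer> result = pool.submit(() -> process(job));
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(submitAndWait(21));
    }
}
```

A real application would keep one long-lived pool rather than creating one per call; the per-call pool just keeps the sketch self-contained.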
Volatile isn't quite the magic bullet in this case. Look at the possibility of switching to immutable objects for those passed between threads. Also, a threadsafe data structure that is queue based will give you better performance than most set implementations.
Related
What happens if an object is modified outside of the concurrent map in java?
Say, I have a concurrent hash map and in one thread I retrieve a value from that map and modify its state. Will the other threads see the modification without an additional synchronization?
The key concept in the Java Memory Model is the happens before relation. You can only rely on things happening between threads if there is a happens before relationship between the two of them.
In the case of ConcurrentHashMap, there is a happens-before relationship between a put and a subsequent get of the value for the same key: updating the value and putting it into the map happens before getting the value and reading its state. Because of that happens before relationship, the update happens before the reading of the state, so you will see the updated state.
So, if you update the state of an object, and then put it into the map, if you subsequently get it from the map, you are guaranteed to see that updated state (until such time as you put again).
But, if you have a reference to that object outside the context of a ConcurrentHashMap, there is no automatic happens-before relationship. You have to create that relationship for yourself.
One way of doing this is with synchronization (as in using synchronized, on the same object in all threads); other ways include:
writing and reading a volatile variable
using a Lock
putting the object into the map again, and then getting from the map before you start using it in the other thread.
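To illustrate the volatile option from the list above, here is a small sketch (Box, VolatileHandOff and the -1 sentinel are names I made up for the example). The plain write to the payload comes before the volatile write, and the reader performs the volatile read before the plain read, which is what creates the happens-before edge.

```java
// Hypothetical plain data holder.
final class Box {
    int payload;  // plain field
}

final class VolatileHandOff {
    private final Box box = new Box();
    private volatile boolean ready;  // the volatile write/read pair creates the edge

    void publish(int value) {
        box.payload = value;  // 1. plain write
        ready = true;         // 2. volatile write, must come last
    }

    // Returns the payload once published, or -1 if nothing is visible yet.
    int tryRead() {
        if (ready) {              // 3. volatile read, must come first
            return box.payload;   // 4. guaranteed to see the write from step 1
        }
        return -1;
    }

    public static void main(String[] args) {
        VolatileHandOff handOff = new VolatileHandOff();
        new Thread(() -> handOff.publish(7)).start();

        int seen;
        while ((seen = handOff.tryRead()) == -1) {
            Thread.onSpinWait();  // Java 9+; busy-wait only for the demo
        }
        System.out.println(seen);
    }
}
```

The same ordering rule applies to the other options: the writer does its updates before releasing the lock (or before the put), and the reader acquires the same lock (or does the get) before reading.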
Short answer is no.
A concurrent map will only synchronize access to the map itself. That is, if one thread writes to the map, all other threads can see that write without additional synchronization.
If you retrieve an object from the map and modify it without synchronization, and if another thread retrieves the same object to read it, then you have a race without explicit synchronization between those threads.
I'm getting a little unsure about what to expect from Concurrent Collections (e.g. ConcurrentMap) regarding visibility of the data in the collection.
A: Thread1 puts a complex object and Thread2 gets it. Will all attributes be visible in Thread2?
B: Thread1 puts a complex object and later changes some attributes. Then Thread2 gets it, will all changes be visible in Thread2?
I guess B is false, and if so I should synchronize every access on the complex object?
Pushing to a concurrent collection is defined as publishing it. See "Memory Consistency Properties" in the Package description.
This means that if you just change a stored object, you do not automatically get a happens-before relationship. You would need to make those changes synchronized or volatile, or use a concurrency primitive for them.
A: If the object is immutable or if the object is mutable but all the properties are set before the object is added to the collection then yes, they will be all visible.
B: If no synchronisation mechanisms are in place then it is not guaranteed; it depends on when Thread2 accesses the object.
If you need this sort of behaviour guaranteed (i.e. the reading thread to be guaranteed to see all the modifications made by the mutator thread in a transactional-like manner) I suggest you set up a semaphoring mechanism. Even better, it would be simpler if you use immutable objects.
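One way the immutable-object suggestion could look for case B, as a sketch under my own assumptions (the Profile class and the recordVisit helper are hypothetical): instead of mutating the stored object, replace it in the map with an updated copy, so every change goes through the map and gets its happens-before guarantee.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical immutable value: all fields final, no setters.
final class Profile {
    private final String name;
    private final int visits;

    Profile(String name, int visits) {
        this.name = name;
        this.visits = visits;
    }

    String name() { return name; }
    int visits() { return visits; }

    // A "modification" produces a new instance instead of changing this one.
    Profile withOneMoreVisit() {
        return new Profile(name, visits + 1);
    }
}

final class ReplaceInsteadOfMutate {
    static final ConcurrentMap<String, Profile> PROFILES = new ConcurrentHashMap<>();

    // Atomically swaps the stored value for an updated copy.
    // Does nothing if the key is absent.
    static void recordVisit(String key) {
        PROFILES.computeIfPresent(key, (k, old) -> old.withOneMoreVisit());
    }

    public static void main(String[] args) {
        PROFILES.put("alice", new Profile("Alice", 0));
        recordVisit("alice");
        recordVisit("alice");
        System.out.println(PROFILES.get("alice").visits());
    }
}
```

A reader that calls get() sees either the old Profile or the new one, never a half-updated object.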
My situation is that I have two threads. The 1st thread produces a number of objects which the 2nd thread does not have access to until all of them are created. After that the 2nd thread reads fields in those objects but does so concurrently with the 1st. At this point no thread is changing the values of the fields of the objects.
The objects are not synchronized. Should I synchronize them or not?
What I would recommend is to use an AtomicReference<Collection<SomeObject>>. The first thread would produce the collection of objects and do a reference.set(collection). The 2nd thread would see the objects (via reference.get()) only after they have been set on the AtomicReference. See the javadocs for AtomicReference. You could also hold your objects in an array or in any type of collection, such as a List.
It is important to realize that after you set the collection (or array) on the AtomicReference you cannot make any changes to the collection. You can't add additional items, clear it, etc. If you want true concurrent access to a collection of objects then you should look into ConcurrentHashMap and friends.
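A short sketch of the AtomicReference hand-off, assuming a trivial SomeObject with one int field (the field and the helper names are made up for illustration). The batch is wrapped in an unmodifiable view before publishing, which is one way to enforce the "no changes after set" rule on the reader side.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical element type; the field is plain on purpose.
final class SomeObject {
    int field;

    SomeObject(int field) {
        this.field = field;
    }
}

final class AtomicReferenceHandOff {
    static final AtomicReference<List<SomeObject>> REF = new AtomicReference<>();

    // 1st thread: build everything, then publish with a single set().
    static void produce(int count) {
        List<SomeObject> batch = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            batch.add(new SomeObject(i));
        }
        REF.set(Collections.unmodifiableList(batch));  // no changes after this point
    }

    // 2nd thread: returns -1 until the batch has been published.
    static int sumIfAvailable() {
        List<SomeObject> batch = REF.get();
        if (batch == null) {
            return -1;
        }
        int sum = 0;
        for (SomeObject o : batch) {
            sum += o.field;
        }
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread producer = new Thread(() -> produce(4));
        producer.start();
        producer.join();
        System.out.println(sumIfAvailable());
    }
}
```

AtomicReference.set() and get() have the memory effects of a volatile write and read, so everything written to the list and its elements before set() is visible to a thread that obtains the list through get().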
Should I synchronize them or not?
If the objects are not going to be mutated at all after they are put in your collection then you do not need to make them synchronized.
There's nothing wrong with reading data from multiple threads at the same time. Issues arise when you attempt to modify that data. So long as the objects are fully initialized and the values are safely published so that the second thread sees the actual values (no issues with caching etc.), there is no problem with reading data from multiple threads concurrently.
I would like multiple threads to iterate through the elements in a LinkedList. I do not need to write into the LinkedList. Is it safe to do so? Or do I need a synchronized list to make it work?
Thank you!
They can do this safely, PROVIDED THAT:
they synchronize with (all of) the threads that have written the list BEFORE they start the iterations, and
no threads modify the list during the iterations.
The first point is necessary, because unless there is proper synchronization before you start, there is a possibility that one of the "writing" threads has unflushed changes for the list data structures in local cache memory or registers, or one of the reading threads has stale list state in its cache or registers.
(This is one of those cases where a solid understanding of the Java memory model is needed to know whether the scenario is truly thread-safe.)
Or do I need a synchronized list to make it work
You don't necessarily need to go that far. All you need to do is to ensure that there is a "happens-before" relationship at the appropriate point, and there are a variety of ways to achieve that. For instance, the list can be created and written by the writer thread, which then passes the list to the reader thread objects before calling start() on them.
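The start() approach could be sketched like this (the class and method names are hypothetical). The writer fills a plain LinkedList, and only then starts the readers; a call to Thread.start() happens-before any action in the started thread, so each reader sees the fully populated list without any locking.

```java
import java.util.LinkedList;
import java.util.List;

final class StartHappensBeforeDemo {
    // Fills a LinkedList with 1..size, then lets several threads iterate it.
    // Returns the sum each reader computed.
    static long[] readConcurrently(int size, int readers) {
        List<Integer> list = new LinkedList<>();
        for (int i = 1; i <= size; i++) {
            list.add(i);  // all writes finish before any reader is started
        }

        long[] sums = new long[readers];
        Thread[] threads = new Thread[readers];
        for (int t = 0; t < readers; t++) {
            final int slot = t;
            threads[t] = new Thread(() -> {
                long sum = 0;
                for (int v : list) {  // read-only iteration, no locking needed
                    sum += v;
                }
                sums[slot] = sum;     // each reader writes only its own slot
            });
            threads[t].start();       // start() publishes the list contents
        }

        for (Thread thread : threads) {
            try {
                thread.join();        // join() makes sums[] visible to the caller
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        for (long sum : readConcurrently(100, 3)) {
            System.out.println(sum);
        }
    }
}
```

This only holds while nobody modifies the list after the readers have been started, which matches the second condition in the answer.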
From the Java documentation:
Note that this implementation is not synchronized. If multiple threads access a linked list concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more elements; merely setting the value of an element is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the list. If no such object exists, the list should be "wrapped" using the Collections.
In other words, if you are truly just iterating through then you're alright, just be careful.
I am a bit confused about the requirements for synchronizing access to private instance variables in Java.
I have an application which executes scheduled tasks on multiple threads. These tasks (instances of a class) have an instance variable that holds a value object. Further, these tasks have run methods that execute the task by calling some other classes that hold the execution logic (they in turn use more value objects as part of the processing).
Now at a high level it looks like all the parallel threads will spawn a chain of these tasks, instance variables, implementation classes and value objects. Do all these need to be made thread safe? All instance variables in all the possible classes and value objects that can potentially be invoked in parallel?
You need to make objects thread safe if multiple threads are going to access them at the same time and if their state is going to change.
It sounds like your task objects are not multi-threaded in that different threads won't access the same task. If that is true you wouldn't need to make your task objects thread safe.
Are the value objects mutable, and are they shared in such a way that the same value object instance could be accessed by multiple threads at the same time? If the answer to both is yes then you need to make them thread safe.
The easiest way to make an object thread safe is to make it immutable. If its internal state can't change after the object is constructed then it is inherently thread safe. If you can't make your objects immutable then you need to synchronize access to any instance variables whose state could be changed.
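For what it's worth, an immutable value object that contains a nested collection might look roughly like this (TaskData and its fields are invented for the example). The fields are final, there are no setters, and the constructor takes a defensive copy so the caller cannot change the contents afterwards.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable value object.
final class TaskData {
    private final String id;
    private final List<String> steps;

    TaskData(String id, List<String> steps) {
        this.id = id;
        // Defensive copy: later changes to the caller's list cannot leak in.
        this.steps = Collections.unmodifiableList(new ArrayList<>(steps));
    }

    String id() {
        return id;
    }

    // Safe to hand out: the view rejects every modification attempt.
    List<String> steps() {
        return steps;
    }
}

final class ImmutableValueDemo {
    public static void main(String[] args) {
        List<String> source = new ArrayList<>();
        source.add("load");

        TaskData data = new TaskData("t1", source);
        source.add("added later");  // does not affect data

        System.out.println(data.steps().size());
    }
}
```

Because every field is final and `this` does not escape the constructor, any thread that obtains a reference to a TaskData sees its fully constructed state.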