I am a bit confused about the requirements for synchronizing access to private instance variables in Java.
I have an application that executes scheduled tasks on multiple threads. These tasks (instances of a class) have an instance variable that holds a value object. Each task's run method executes the task by calling other classes that hold the execution logic (which in turn use more value objects as part of the processing).
So, at a high level, it looks like the parallel threads will each spawn a chain of these tasks, instance variables, implementation classes and value objects. Do all of these need to be made thread safe? Does every instance variable in every class and value object that could potentially be invoked in parallel need protection?
You need to make objects thread safe if multiple threads are going to access them at the same time and if their state is going to change.
It sounds like your task objects are not shared between threads, in that different threads won't access the same task. If that is true, you don't need to make your task objects thread safe.
Are the value objects mutable, and are they shared in such a way that the same value object instance could be accessed by multiple threads at the same time? If both answers are yes, then you need to make them thread safe.
The easiest way to make an object thread safe is to make it immutable. If its internal state can't change after the object is constructed then it is inherently thread safe. If you can't make your objects immutable then you need to synchronize access to any instance variables whose state could be changed.
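A minimal sketch of both options, using hypothetical classes Money (immutable) and MutableCounter (synchronized); neither name comes from the question. The immutable class has only final fields and returns a new instance instead of changing itself; the mutable one guards its field by synchronizing every access on the same lock.

```java
// Immutable: all fields are final and there are no setters, so once the
// constructor returns, any thread can read an instance without locks.
final class Money {
    private final String currency;
    private final long cents;

    Money(String currency, long cents) {
        this.currency = currency;
        this.cents = cents;
    }

    String currency() { return currency; }
    long cents() { return cents; }

    // A "change" produces a new instance; this one is never modified.
    Money plus(long moreCents) {
        return new Money(currency, cents + moreCents);
    }
}

// Mutable alternative: every read and write of the field goes through
// methods synchronized on the same lock (this).
class MutableCounter {
    private long value;

    synchronized void increment() { value++; }
    synchronized long get() { return value; }
}

class ImmutabilityDemo {
    public static void main(String[] args) throws InterruptedException {
        Money base = new Money("EUR", 100);
        Money more = base.plus(50);
        System.out.println(base.cents() + " " + more.cents()); // prints "100 150"

        MutableCounter counter = new MutableCounter();
        Runnable work = () -> {
            for (int i = 0; i < 10_000; i++) counter.increment();
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(counter.get()); // prints "20000"
    }
}
```

If get() were not synchronized, a reader could see a stale value even though increment() is locked; the lock only helps when both sides use it.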
What is the use of ThreadLocal when a thread normally works on a variable by keeping it in its local cache?
That means thread1 does not know the value of the same variable in thread2, even if no ThreadLocal is used.
With multiple threads, although you have to do work to make sure you read the "most recent" value of a variable, you expect there to be effectively one variable per instance (assuming we're talking about instance fields here). You might read an out of date value unless you're careful, but basically you've got one variable.
With ThreadLocal, you're explicitly wanting to have one value per thread that reads the variable. That's typically for the sake of context. For example, a web server with some authentication layer might set a thread-local variable early in request handling so that any code within the execution of that request can access the authentication details, without needing any explicit reference to a context object. So long as all the handling is done on the one thread, and that's the only thing that thread does, you're fine.
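A minimal sketch of the authentication example, assuming a hypothetical RequestContext holder that stores only a user name and a hypothetical handleRequest entry point; real frameworks have their own context classes. The auth layer sets the value at the start of the request, deeply nested code reads it without a context parameter, and remove() clears it afterwards so a pooled thread does not leak it into the next request.

```java
// Hypothetical per-request context held in a ThreadLocal.
final class RequestContext {
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    static void setUser(String user) { CURRENT_USER.set(user); }
    static String currentUser() { return CURRENT_USER.get(); }

    // Must run when the request ends, especially on pooled threads,
    // or the next request handled by this thread would see stale data.
    static void clear() { CURRENT_USER.remove(); }
}

class RequestHandlerDemo {
    // Deep in the call chain: no explicit reference to a context object.
    static String auditMessage(String action) {
        return RequestContext.currentUser() + " performed " + action;
    }

    static String handleRequest(String user, String action) {
        RequestContext.setUser(user); // set early by the (hypothetical) auth layer
        try {
            return auditMessage(action);
        } finally {
            RequestContext.clear();
        }
    }

    public static void main(String[] args) {
        System.out.println(handleRequest("alice", "login")); // prints "alice performed login"
    }
}
```

This only works while the whole request is handled on one thread; handing work to another thread (an executor, for example) leaves the context behind.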
A thread doesn't have to keep variables in its local cache -- it's just that it's allowed to, unless you tell it otherwise.
So:
If you want to force a thread to share its state with other threads, you have to use synchronization of some sort (including synchronized blocks, volatile variables, etc).
If you want to prevent a thread from sharing its state with other threads, you have to use ThreadLocal (assuming the object that holds the variable is known to multiple threads -- if it's not, then everything is thread-local anyway!).
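A small sketch contrasting the two cases, using a hypothetical class SharingVsIsolation: a volatile static field is one variable seen by all threads, while a ThreadLocal gives each thread its own value. Here Thread.start() alone would already make the main thread's earlier write visible; volatile matters when the write happens while the other thread is already running.

```java
class SharingVsIsolation {
    // Shared: one field for all threads; volatile makes each write visible
    // to reads that happen afterwards in other threads.
    static volatile String shared = "initial";

    // Isolated: each thread sees only what it set itself (or the initial value).
    static final ThreadLocal<String> PER_THREAD = ThreadLocal.withInitial(() -> "initial");

    // Writes both variables on the calling thread, then reports what a
    // second thread sees: {shared value, thread-local value}.
    static String[] observeFromOtherThread() {
        shared = "written by main";
        PER_THREAD.set("written by main");
        String[] seen = new String[2];
        Thread other = new Thread(() -> {
            seen[0] = shared;           // the caller's write
            seen[1] = PER_THREAD.get(); // this thread's own initial value
        });
        other.start();
        try {
            other.join(); // join() makes the other thread's writes to seen[] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        String[] seen = observeFromOtherThread();
        System.out.println(seen[0] + " / " + seen[1]); // prints "written by main / initial"
    }
}
```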
It's kind of a global variable for the thread itself, so that any code running in the thread can access it directly. (A "really" global variable can be accessed by any code running in the "process"; we could call it ProcessLocal:)
Are global variables bad? Maybe; they should be avoided if we can. But sometimes we have no choice: we cannot pass the object through method parameters, and ThreadLocal proves useful in many designs without causing too much trouble.
ThreadLocal is useful when an object is not thread-safe but you want to avoid synchronizing access to it: each thread stores its own instance in its own thread-local storage. Without it, an object reachable from several threads is shared between them by default.
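A minimal sketch of the usual SimpleDateFormat case, assuming a hypothetical DateFormatting helper. ThreadLocal.withInitial creates one formatter per thread on first use, so no formatter is ever touched by two threads and no lock is needed. The UTC time zone is set only to make the output predictable.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class DateFormatting {
    // SimpleDateFormat is not thread-safe; one instance per thread avoids
    // both data races and synchronization.
    private static final ThreadLocal<SimpleDateFormat> FORMAT = ThreadLocal.withInitial(() -> {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd");
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    });

    static String format(Date date) {
        return FORMAT.get().format(date); // this thread's private formatter
    }

    public static void main(String[] args) {
        System.out.println(format(new Date(0L))); // prints "1970-01-01"
    }
}
```

On Java 8 and later, the immutable java.time.format.DateTimeFormatter is another way to avoid the problem entirely.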
I have worker threads that generate objects and push them into a thread-safe Set. A processing thread periodically reads the Set and processes the elements.
While the object references themselves will be successfully retrieved from the set, the objects' variables are not thread-safe if accessed from the processing thread. Is there some pattern to do this, apart from making all the objects' internals volatile etc.? The objects may become more complex in the future, containing nested objects etc.
Assuming that no object will be modified externally once it is placed into the Set, is there some way to establish a happens-before relationship for whatever is currently in the Set before I begin processing it? The processing thread is already running; it is not created after the Set has been populated.
The objects themselves are just data containers and have no inherent thread-safety. I can't make all the fields final since they may be modified multiple times before being placed into the Set.
If you have a thread-safe set, adding an object to it and reading that object from it establishes a happens-before relationship between the producer's writes and the consumer's reads, so you don't have to worry about whether the object itself is thread safe. This assumes that your producer doesn't alter or read the object after putting it in the collection.
If you make the objects immutable, the relationship becomes clearer. However, I am assuming that once you pass the object to the shared storage, the writing thread no longer alters the object and only the consuming thread reads or alters it.
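A minimal sketch of the Set handoff, assuming a hypothetical data container Measurement with plain (non-volatile, non-final) fields. ConcurrentHashMap.newKeySet() returns a thread-safe Set, and the java.util.concurrent memory consistency rules guarantee that actions before adding an element happen-before actions after another thread accesses or removes it. The producer may mutate the object freely before add(), but not afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical plain data container: no volatile, no final, no locks.
class Measurement {
    String name;
    int value;
}

class SetHandoff {
    static final Set<Measurement> pending = ConcurrentHashMap.newKeySet();

    // Producer: fields may be written any number of times before add()...
    static void produce(String name, int value) {
        Measurement m = new Measurement();
        m.name = name;
        m.value = value;
        m.value += 1;
        pending.add(m); // ...but never after this point
    }

    // Consumer: removes each element before processing it, so every element
    // is handled once; the weakly consistent iterator tolerates removals.
    static List<String> drain() {
        List<String> out = new ArrayList<>();
        for (Measurement m : pending) {
            if (pending.remove(m)) {
                out.add(m.name + "=" + m.value);
            }
        }
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread producer = new Thread(() -> produce("temperature", 20));
        producer.start();
        producer.join(); // only to make the demo deterministic
        System.out.println(drain()); // prints "[temperature=21]"
    }
}
```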
BTW, I would pass the tasks via a queue using an ExecutorService, as it is efficient and already written for you.
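A minimal sketch of the ExecutorService route, assuming a hypothetical mutable Job object. Submitting a task happens-before the task runs, and the task's completion happens-before Future.get() returns, so the worker sees everything written to the Job before submit() and the caller sees the result without extra synchronization.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical mutable data container, built and then handed off.
class Job {
    final StringBuilder payload = new StringBuilder();
}

class ExecutorHandoff {
    static String process(ExecutorService pool, Job job) {
        try {
            // The worker thread reads job.payload; no lock is needed because
            // submit() publishes the caller's earlier writes to the worker.
            return pool.submit(() -> job.payload.toString().toUpperCase()).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Job job = new Job();
            job.payload.append("hello"); // mutate freely before the handoff
            job.payload.append(" world");
            System.out.println(process(pool, job)); // prints "HELLO WORLD"
        } finally {
            pool.shutdown();
        }
    }
}
```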
Volatile isn't quite the magic bullet in this case. Look at the possibility of switching to immutable objects for those passed between threads. Also, a thread-safe, queue-based data structure will give you better performance than most set implementations.
I have been reading about threadlocal and scenarios where it is useful.
I like the concept but was wondering how is it different from cloning?
So a ThreadLocal will return a new copy of a variable, which means that we do not have to use synchronization. A good example is SimpleDateFormat, which is not thread safe; ThreadLocal provides a good way to use it.
But why can't we simply create a new copy of the variable using clone?
What is the value add provided by ThreadLocal class as compared to cloning?
ThreadLocal is not a replacement for synchronization or thread-safe object access. If the same object is assigned to a ThreadLocal from different threads, then the program is no more thread-safe than it was before: the same object will still be shared among the different threads.
ThreadLocal acts like a variable; that is, it "names" or "refers to" an object:
[ThreadLocal] provides thread-local variables [.. such that] each thread that accesses one (via its get or set method) has its own, independently initialized copy of the variable.
That is, what ThreadLocal does is it provides get/set isolation between threads that use the same ThreadLocal object. So each thread could assign/retrieve its own different object to the ThreadLocal; but this would still require a "clone" or new instantiation to assign the different objects to begin with!
Remember, an assignment (or method invocation) never creates an implicit clone/copy/duplicate of an object - and this extends to ThreadLocal.
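A small demonstration of that point, using a hypothetical class NoImplicitCopy. Two threads store the same list in a ThreadLocal and both end up modifying that one list, because set() stores a reference, not a copy. Separate objects per thread appear only when something actually creates them, here withInitial(ArrayList::new). The two writer threads run one after the other so the demo itself has no data race.

```java
import java.util.ArrayList;
import java.util.List;

class NoImplicitCopy {
    static final ThreadLocal<List<String>> LOCAL = new ThreadLocal<>();
    static final ThreadLocal<List<String>> OWN = ThreadLocal.withInitial(ArrayList::new);

    static void runAndWait(Runnable task) {
        Thread t = new Thread(task);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Both threads set the SAME list; run concurrently, these add() calls
    // would race on the ArrayList, which is why they run sequentially here.
    static List<String> storeSameObjectFromTwoThreads() {
        List<String> sharedList = new ArrayList<>();
        runAndWait(() -> { LOCAL.set(sharedList); LOCAL.get().add("a"); });
        runAndWait(() -> { LOCAL.set(sharedList); LOCAL.get().add("b"); });
        return sharedList; // holds both "a" and "b": nothing was copied
    }

    // With withInitial, each thread gets a list created just for it.
    static int[] separateInstances() {
        int[] sizes = new int[2];
        runAndWait(() -> {
            OWN.get().add("x");
            sizes[0] = OWN.get().size(); // the worker's own list
        });
        sizes[1] = OWN.get().size(); // the calling thread's list, still empty
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(storeSameObjectFromTwoThreads()); // prints "[a, b]"
        int[] sizes = separateInstances();
        System.out.println(sizes[0] + " " + sizes[1]); // prints "1 0"
    }
}
```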
By using ThreadLocal you create as many variables as there are threads, without the need for any further checking. Remember however, that the storage itself does not guarantee thread-safety. You must make sure that each object stored in local storage is used only from that thread!
Should you clone objects manually, you would have to clone an object every time it is used, or check which thread you are in and then clone.
Besides, is the cloning operation itself thread-safe? What would happen if two different threads attempted to clone the same object at once? I actually do not know, but I think it would not be good practice.
Using ThreadLocal is typically faster: the SimpleDateFormat instance stored in a ThreadLocal can be reused many times on the same thread, while cloning means creating a new object every time.
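A rough sketch of the allocation difference, using a hypothetical class CloneVsThreadLocal with counters added only for illustration. The ThreadLocal path creates one SimpleDateFormat per thread and reuses it; the clone path creates a new object on every call. The template is conservatively locked while cloning, since the documentation does not promise that clone() is safe to call concurrently.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.concurrent.atomic.AtomicInteger;

class CloneVsThreadLocal {
    static final AtomicInteger threadLocalCreations = new AtomicInteger();
    static final AtomicInteger clones = new AtomicInteger();

    // Template used only as a source for clones, never for formatting.
    private static final SimpleDateFormat TEMPLATE = new SimpleDateFormat("yyyy-MM-dd");

    // Supplier runs once per thread, on that thread's first get().
    private static final ThreadLocal<SimpleDateFormat> PER_THREAD = ThreadLocal.withInitial(() -> {
        threadLocalCreations.incrementAndGet();
        return new SimpleDateFormat("yyyy-MM-dd");
    });

    static String formatWithThreadLocal(Date d) {
        return PER_THREAD.get().format(d); // same instance on every call in this thread
    }

    static String formatWithClone(Date d) {
        SimpleDateFormat copy;
        synchronized (TEMPLATE) { // conservative: guard the template while cloning
            copy = (SimpleDateFormat) TEMPLATE.clone();
        }
        clones.incrementAndGet(); // a new object on every call
        return copy.format(d);
    }

    public static void main(String[] args) {
        Date now = new Date();
        for (int i = 0; i < 3; i++) {
            formatWithThreadLocal(now);
            formatWithClone(now);
        }
        System.out.println(threadLocalCreations.get() + " vs " + clones.get()); // prints "1 vs 3"
    }
}
```

Whether the saved allocations matter in practice depends on the workload; measuring is the only reliable way to tell.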
The StringBuffer class has methods which are thread safe? OK, but I have a question: when a particular method is called, it is loaded onto the stack, and the stack is thread safe, so why do we need thread-safe methods?
When a method runs, its local variables live on the calling thread's stack, but the StringBuffer object they refer to lives on the heap, and a reference to it can be handed to other threads. It's quite possible to share a given StringBuffer instance across different threads, in which case multiple threads will end up modifying (mutating) the StringBuffer's internal state. This is why StringBuffer's append methods are explicitly synchronized.
But you are right. If you don't plan on sharing the instance across thread boundaries (or, as it is called, "publishing" the instance), it is more logical to just create a StringBuilder (the non-synchronized sibling of StringBuffer) inside a given method call and throw it away (or rather, let the GC take care of it) after the method call ends.
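A minimal sketch of both situations, using a hypothetical class BufferVsBuilder. In the first method one StringBuffer is shared by several threads, and its synchronized append keeps the length consistent. In the second, the StringBuilder never leaves the method, so no other thread can reach it and no synchronization is needed.

```java
class BufferVsBuilder {
    // Shared: all worker threads append to the same heap object, so the
    // synchronized StringBuffer.append is what keeps its state consistent.
    static int appendConcurrently(int threads, int appendsPerThread) {
        StringBuffer shared = new StringBuffer();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < appendsPerThread; j++) shared.append('x');
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.length();
    }

    // Confined: the builder is never published outside this call.
    static String joinLocally(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            if (sb.length() > 0) sb.append(',');
            sb.append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(appendConcurrently(4, 10_000)); // prints "40000"
        System.out.println(joinLocally("a", "b", "c"));    // prints "a,b,c"
    }
}
```

Swapping StringBuffer for StringBuilder in appendConcurrently would make the final length unreliable (or even throw), because StringBuilder's append is not synchronized.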
There is another aspect which comes into play when you absolutely have to share instances across threads and at the same time feel that the cost of synchronizing each operation is way too much -- thread locals. Basically, the idea in this case is to make each thread have its own copy of a "mutable" entity. There is no locking required because the moment some other thread tries to access a thread local variable, you hand across a fresh/pre-configured instance. This is commonly used for stuff like sharing StringBuilder and DateFormat instances to boost performance.
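A minimal sketch of the thread-local StringBuilder pattern, using a hypothetical class ReusableBuilder. Each thread gets its own builder on first use and reuses it on later calls; setLength(0) clears whatever the previous call on that thread left behind. The performance gain is an assumption worth measuring, since modern JVMs allocate short-lived objects cheaply.

```java
class ReusableBuilder {
    // One StringBuilder per thread; no lock needed because no other thread
    // ever receives this thread's instance.
    private static final ThreadLocal<StringBuilder> BUILDER =
            ThreadLocal.withInitial(() -> new StringBuilder(64));

    static String describe(String name, int value) {
        StringBuilder sb = BUILDER.get();
        sb.setLength(0); // reset leftovers from the previous call on this thread
        return sb.append(name).append('=').append(value).toString();
    }

    // Exposed only so a caller can check that the instance is reused.
    static StringBuilder instanceForCurrentThread() {
        return BUILDER.get();
    }

    public static void main(String[] args) {
        System.out.println(describe("a", 1)); // prints "a=1"
        System.out.println(describe("b", 2)); // prints "b=2"
    }
}
```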
If you want to compare between raw/unsafe sharing of a mutable object between threads versus using a thread local, take a look at the snippet I have hosted on Bitbucket.
My situation is that I have two threads. The 1st thread produces a number of objects which the 2nd thread does not have access to until all of them are created. After that the 2nd thread reads fields in those objects but does so concurrently with the 1st. At this point no thread is changing the values of the fields of the objects.
The objects are not synchronized. Should I synchronize them or not?
What I would recommend is to use an AtomicReference<Collection<SomeObject>>. The first thread would produce the collection of objects and then call reference.set(collection). The 2nd thread would see the objects (via reference.get()) only after they have been set on the AtomicReference. See the Javadoc for AtomicReference. You could also store your objects in an array or any other type of collection, such as a List.
It is important to realize that after you set the collection (or array) on the AtomicReference, you cannot make any changes to it: you can't add items, clear it, etc. If you want true concurrent access to a collection of objects, then you should look into ConcurrentHashMap and friends.
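A minimal sketch of that approach, using a hypothetical Item class and AtomicPublish holder. The producer builds the whole list, wraps it in an unmodifiable view, and publishes it with a single set(); a reader that gets a non-null list is guaranteed to see the fully built Items. The unmodifiable wrapper only blocks changes through the published reference, so the producer must also stop touching the original list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical plain object, not synchronized.
class Item {
    String label;

    Item(String label) {
        this.label = label;
    }
}

class AtomicPublish {
    static final AtomicReference<List<Item>> ref = new AtomicReference<>();

    // Producer: build everything first, then publish exactly once.
    static void publish(int count) {
        List<Item> items = new ArrayList<>();
        for (int i = 0; i < count; i++) items.add(new Item("item-" + i));
        ref.set(Collections.unmodifiableList(items)); // set() has volatile-write semantics
    }

    // Consumer: null until published; afterwards every Item is fully visible.
    static int countVisible() {
        List<Item> items = ref.get();
        if (items == null) return 0;
        int n = 0;
        for (Item it : items) {
            if (it.label != null) n++;
        }
        return n;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countVisible()); // prints "0": nothing published yet
        Thread producer = new Thread(() -> publish(3));
        producer.start();
        producer.join();
        System.out.println(countVisible()); // prints "3"
    }
}
```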
Should I synchronize them or not?
If the objects are not going to be mutated at all after they are put in your collection then you do not need to make them synchronized.
There's nothing wrong with reading data from multiple threads at the same time. Issues arise when you attempt to modify that data. So long as the objects are fully initialized and the second thread is guaranteed to see the actual values (no issues with caching, etc.), there is no problem with reading data from multiple threads concurrently.
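A minimal sketch of safe concurrent reading, using hypothetical Point and ConcurrentReaders classes. The objects are fully built before any reader starts, and Thread.start() provides the happens-before edge that makes them visible; after that, several threads read the same unsynchronized objects at once, which is fine because nobody writes to them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical plain object: non-final fields, no synchronization.
class Point {
    int x;
    int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

class ConcurrentReaders {
    static long sumWithReaders(int readers) {
        // 1. Build and fully initialize all objects before any reader exists.
        List<Point> points = new ArrayList<>();
        for (int i = 0; i < 1000; i++) points.add(new Point(i, i));

        // 2. Start the readers. Thread.start() happens-before every action
        //    of the started thread, so each reader sees the finished objects.
        AtomicLong total = new AtomicLong();
        List<Thread> threads = new ArrayList<>();
        for (int r = 0; r < readers; r++) {
            Thread t = new Thread(() -> {
                long sum = 0;
                for (Point p : points) sum += p.x + p.y; // concurrent reads, no locks
                total.addAndGet(sum);
            });
            threads.add(t);
            t.start();
        }

        // 3. Wait for the readers; no thread ever writes to the Points.
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println(sumWithReaders(4)); // prints "3996000"
    }
}
```

If the objects were handed to an already running thread instead, the handoff itself would need a happens-before edge, such as a volatile field, an AtomicReference, or a concurrent collection.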