Questions about using ThreadLocal in a Spring singleton scoped service


In my singleton scoped service class below, all methods in the class require some user context that is known when Service.doA() is called. Instead of passing the info around between methods, I was thinking about storing those values in a ThreadLocal. I have two questions about this approach:
1) Does the implementation below use ThreadLocal correctly? That is, it is thread-safe and the correct values will be read/written into the ThreadLocal?
2) Does ThreadLocal userInfo need to be cleaned up explicitly to prevent any memory leaks? Will it be garbage collected?
@Service
public class Service {
private static final ThreadLocal<UserInfo> userInfo = new ThreadLocal<>();
public void doA() {
// finds user info
userInfo.set(new UserInfo(userId, name));
doB();
doC();
}
private void doB() {
// needs user info
UserInfo userInfo = userInfo.get();
}
private void doC() {
// needs user info
UserInfo userInfo = userInfo.get();
}
}

1) The example code is ok, except for the name clashes in doB and doC where you're using the same name for the static variable referencing the ThreadLocal as you are for the local variable holding what you pull from the ThreadLocal.
2) The object you store in the ThreadLocal stays attached to that thread until explicitly removed. If your service executes in a servlet container, for instance, the thread returns to the pool when a request completes. If you haven't cleaned up the thread's ThreadLocal contents, that data will stick around and accompany whatever request the thread is allocated to next. Each thread is a GC root, so thread-local variables attached to the thread won't be garbage-collected until after the thread dies. According to the API doc:
Each thread holds an implicit reference to its copy of a thread-local variable as long as the thread is alive and the ThreadLocal instance is accessible; after a thread goes away, all of its copies of thread-local instances are subject to garbage collection (unless other references to these copies exist).
If your context information is limited to the scope of one service, you're better off passing the information around through parameters rather than using ThreadLocal. ThreadLocal is for cases where information needs to be available across different services or in different layers. If it will be used by only one service, you're only overcomplicating your code. If, on the other hand, you have data that would be used by AOP advice on several unrelated objects, putting that data in a ThreadLocal could be a valid usage.
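For the single-service case, passing the context explicitly might look like the following minimal sketch. The fields of the value object, the class names, and the strings returned by doB and doC are assumptions made for illustration; the question does not show them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value object standing in for the question's UserInfo.
final class UserContext {
    final String userId;
    final String name;

    UserContext(String userId, String name) {
        this.userId = userId;
        this.name = name;
    }
}

// Same shape as the question's Service, but the context travels as a parameter.
final class ParameterPassingService {

    List<String> doA(String userId, String name) {
        UserContext user = new UserContext(userId, name); // "finds user info"
        List<String> log = new ArrayList<>();
        log.add(doB(user));
        log.add(doC(user));
        return log;
    }

    private String doB(UserContext user) {
        return "doB for " + user.name;
    }

    private String doC(UserContext user) {
        return "doC for " + user.userId;
    }
}
```

Nothing is stored on the thread here, so there is nothing to clean up and no state can carry over to another request.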
To perform the clean-up typically you would identify a point where the thread is done with the current processing, for instance in a servlet filter, where the threadlocal variable can be removed before the thread is returned to the threadpool. You wouldn't use a try-finally block because the place where you insert the threadlocal object is nowhere near where you are cleaning it up.

When you use a ThreadLocal you need to make sure that you clean it up whatever happens because:
It effectively creates a memory leak, because an object is eligible for GC if and only if no object holds a hard reference to it, directly or indirectly. Here the value stays reachable through the ThreadLocalMap of the thread that set it, and the only way to drop that hard reference is to call ThreadLocal#remove(), which removes the entry from the thread's ThreadLocalMap. The other way to make the value eligible for GC would be for the ThreadLocal instance itself to become eligible, but here it is a constant in the class Service, so it will never be collected, which is what you want in your case. So the only expected way is to call ThreadLocal#remove().
It creates bugs that are hard to find. Most of the time the thread that uses your ThreadLocal belongs to a thread pool and will be reused for another request. If the ThreadLocal has not been cleaned up properly, the thread will see the object stored by the previous request, which may have nothing to do with the new one. Here, for example, you could return the data of a different user just because the ThreadLocal was not cleaned up.
So the pattern is the following:
try {
userInfo.set(new UserInfo(userId, name));
// Some code here
} finally {
// Clean up your thread local whatever happens
userInfo.remove();
}
About thread safety: it is thread safe even if UserInfo itself is not, because each thread uses its own instance of UserInfo. No instance stored in the ThreadLocal is accessed or modified by several threads, since a ThreadLocal value is scoped to the current thread.
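The stale-value bug can be reproduced without a web container. The sketch below is an illustration only: it assumes a single-thread executor as a stand-in for a request thread pool, and the class and method names are made up. Two "requests" run on the same pooled thread, and the second one never sets the ThreadLocal.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class StaleThreadLocalDemo {
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    // Runs two "requests" on one pooled thread and returns what the second one reads.
    static String secondRequestSees(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable firstRequest = () -> {
                try {
                    CURRENT_USER.set("alice");
                    // ... request handling would happen here ...
                } finally {
                    if (cleanUp) {
                        CURRENT_USER.remove();
                    }
                }
            };
            pool.submit(firstRequest).get();

            // The second request only reads; it never calls set().
            Callable<String> secondRequest = CURRENT_USER::get;
            return pool.submit(secondRequest).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(secondRequestSees(false)); // previous request's value
        System.out.println(secondRequestSees(true));  // null, the value was removed
    }
}
```

Without the remove() call the second request reads "alice"; with it, the read returns null.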

It definitely should be cleaned up after use. ThreadLocals leak memory extremely easily, both heap memory and permgen/metaspace memory via leaking classloaders. In your case the best way would be:
public void doA() {
// finds user info
userInfo.set(new UserInfo(userId, name));
try {
doB();
doC();
} finally {
userInfo.remove();
}
}

Related

What is the use of ThreadLocal?

What is the use of ThreadLocal when a thread normally works on a variable by keeping it in its local cache?
That means thread1 does not know the value of the same variable in thread2 even if no ThreadLocal is used.

With multiple threads, although you have to do work to make sure you read the "most recent" value of a variable, you expect there to be effectively one variable per instance (assuming we're talking about instance fields here). You might read an out of date value unless you're careful, but basically you've got one variable.
With ThreadLocal, you're explicitly wanting to have one value per thread that reads the variable. That's typically for the sake of context. For example, a web server with some authentication layer might set a thread-local variable early in request handling so that any code within the execution of that request can access the authentication details, without needing any explicit reference to a context object. So long as all the handling is done on the one thread, and that's the only thing that thread does, you're fine.
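As a rough illustration of the "one value per thread" idea, the sketch below starts two threads that each set the same ThreadLocal and then read it back from a helper that takes no context parameter. The names AUTH_USER, whoAmI, and the thread names are assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class PerThreadValueDemo {
    // One variable, but every thread that touches it sees its own value.
    private static final ThreadLocal<String> AUTH_USER = new ThreadLocal<>();

    // Code deep in the call stack reads the context without a parameter.
    private static String whoAmI() {
        return AUTH_USER.get();
    }

    // Returns a map of thread name to the user that thread observed.
    static Map<String, String> run() {
        Map<String, String> seen = new ConcurrentHashMap<>();
        String[] users = {"alice", "bob"};
        Thread[] workers = new Thread[users.length];
        for (int i = 0; i < users.length; i++) {
            String user = users[i];
            workers[i] = new Thread(() -> {
                AUTH_USER.set(user); // plays the role of the authentication layer
                try {
                    seen.put(Thread.currentThread().getName(), whoAmI());
                } finally {
                    AUTH_USER.remove();
                }
            }, "request-" + user);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen;
    }
}
```

Each thread reads back only the value it set itself, regardless of how the two threads interleave.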

A thread doesn't have to keep variables in its local cache -- it's just that it's allowed to, unless you tell it otherwise.
So:
If you want to force a thread to share its state with other threads, you have to use synchronization of some sort (including synchronized blocks, volatile variables, etc).
If you want to prevent a thread from sharing its state with other threads, you have to use ThreadLocal (assuming the object that holds the variable is known to multiple threads -- if it's not, then everything is thread-local anyway!).

It's kind of a global variable for the thread itself, so that any code running in the thread can access it directly. (A "really" global variable can be accessed by any code running in the "process"; we could call it ProcessLocal:)
Is global variable bad? Maybe; it should be avoided if we can. But sometimes we have no choice, we cannot pass the object through method parameters, and ThreadLocal proves to be useful in many designs without causing too much trouble.

Use of ThreadLocal is when an object is not thread-safe, but you want to avoid synchronizing access. So each thread stores data on its own Thread local storage memory. By default, data is shared between threads.
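A common instance of this is java.text.SimpleDateFormat, which is documented as not thread-safe. The sketch below keeps one formatter per thread instead of synchronizing on a shared one; the class name, the pattern, and the UTC time zone are assumptions chosen for the example.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class PerThreadFormatter {
    // Each thread lazily creates its own formatter on first use.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        ThreadLocal.withInitial(() -> {
            SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
            format.setTimeZone(TimeZone.getTimeZone("UTC"));
            return format;
        });

    // Safe to call from any thread without locking.
    static String format(Date date) {
        return FORMAT.get().format(date);
    }
}
```

Calling PerThreadFormatter.format(new Date(0L)) yields "1970-01-01" on whichever thread makes the call.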

How to get a reference to a thread, and verification on ThreadLocal

I'm looking for verification on the following use of ThreadLocal.
I have a service, say ServiceA running on a set of processes, say processSetX in the system.
Which processSetX will be on ServiceA isn't known until runtime and may vary.
The processes in processSetX may run on different threads.
ServiceA has to recognize all processes in processSetX the same way.
For this, I'm supposed to write an ID value, say, of type String, to thread local storage (TLS) of a new thread and read this value later on when needed.
So, the ID of the first thread invoking ServiceA will be this ID for ServiceA to recognize them all. When this first thread starts another thread, it'll go onto this new thread's TLS and write this ID. From there on, every thread in this chain will pass this ID to the new one.
I'm looking to verify ThreadLocal is the way to work this.
I haven't used it before - I want to make sure.
TIA.
//==================
EDIT:
is there a way to get the calling thread's reference?
eg.:
a thread, say threadX is making a call to, say methodA(). is there a way for methodA() to know "who" is calling it?
if so - methodA() is able to invoke a getter method of threadX to read a value from its thread-local storage.
TIA
//=================
EDIT-2:
Thread.currentThread() returns something like Thread[main,5,main]. this may collide across threads.

I think at first, you just need a normal member variable.
For example:
// Thread
public class CalledThread extends Thread {
public String myId;
public void run() {
....
// Caller
CalledThread t = new CalledThread();
t.myId = "the thread ID";
t.start();
However, you won't be able to access myId once you start calling your service classes, so for that you could use a ThreadLocal.
In CalledThread assign the myId to your ThreadLocal in run.
threadLocal.set(myId)
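Put together, the idea could look like the sketch below. It deviates from the snippet above in one respect, which is an assumption of this example: it wraps the work in a Runnable instead of subclassing Thread. The names CHAIN_ID, IdCarryingTask, and serviceA are hypothetical.

```java
final class ThreadIdPropagationDemo {
    private static final ThreadLocal<String> CHAIN_ID = new ThreadLocal<>();

    // Stand-in for ServiceA: reads the ID from the calling thread's storage.
    static String serviceA() {
        return CHAIN_ID.get();
    }

    // Carries the ID into a new thread and publishes it there before the work runs.
    static final class IdCarryingTask implements Runnable {
        private final String id;
        private final Runnable body;

        IdCarryingTask(String id, Runnable body) {
            this.id = id;
            this.body = body;
        }

        @Override
        public void run() {
            CHAIN_ID.set(id);
            try {
                body.run();
            } finally {
                CHAIN_ID.remove();
            }
        }
    }

    private static void join(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Returns the ID seen by ServiceA on the first thread and on the thread it started.
    static String[] run() {
        String[] seen = new String[2];
        Thread first = new Thread(new IdCarryingTask("chain-1", () -> {
            seen[0] = serviceA();
            // The first thread hands its own ID to the thread it starts.
            Thread next = new Thread(
                new IdCarryingTask(CHAIN_ID.get(), () -> seen[1] = serviceA()));
            next.start();
            join(next);
        }));
        first.start();
        join(first);
        return seen;
    }
}
```

The JDK also provides InheritableThreadLocal, which copies the parent thread's value into a child thread when the child is created; whether that fits depends on how the threads in the chain are created.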

Is initialization of objects Thread Safe in Java

I've written similar code to this in one of my applications but I'm not sure whether it is thread safe.
public class MyClass {
private MyObject myObject = new MyObject();
public void setObject(MyObject o) {
myObject = o;
}
public MyObject getObject() {
return myObject;
}
}
The setObject() and getObject() methods will be called by different threads. The getObject() method is going to be called by a thread that keeps drawing on a Canvas. For optimum FPS and smooth motion, I don't want that thread to be kept waiting for a synchronization lock. Hence, I want to avoid using synchronization unless it is really necessary. So is it really necessary here? Or is there any other better way to solve this problem?
And by the way, it doesn't matter if the thread receives an older copy of the object.

As for the status of your current version, it is definitely not thread-safe because concurrent access of myObject will establish a data race.
You didn't specify this, but if MyObject is not thread-safe itself, then your program will not be thread-safe, regardless of what you do to the code you have shown.
it doesn't matter if the thread receives an older copy of the object.
The Java Memory Model allows much worse things than that to happen to objects accessed via a data race:
a thread may receive always the same object (the first one it happened to read);
a thread may observe the object with only some of the reachable values initialized (a torn object).
For optimum FPS and smooth motion, I don't want that thread to be kept waiting for a synchronization lock.
Have you spent any effort to actually measure how much time your threads are waiting for the lock? My guess: you didn't because that time is so short as to be undetectable.
However, your case doesn't even call for locks: just making your instance variable volatile will be enough to guarantee safe sharing of the object between threads.

No it's not thread safe - this could happen:
a thread may not see the latest version of myObject when calling getObject (which you can live with apparently)
a thread may never see any updates made to myObject by other threads when calling getObject
a thread may see an updated reference to a MyObject that is in an inconsistent state (partially constructed for example)
The easiest way to solve these issues is to mark myObject volatile.
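A minimal sketch of that fix follows. The nested MyObject with a single final field and the demo method are assumptions added so the example compiles on its own; the join in demo is there only to make the result deterministic.

```java
final class VolatileHolder {
    // Hypothetical immutable payload standing in for the question's MyObject.
    static final class MyObject {
        final int version;

        MyObject(int version) {
            this.version = version;
        }
    }

    // volatile: a write by one thread is visible to later reads by other threads,
    // and a reader never sees a partially constructed MyObject through this field.
    private volatile MyObject myObject = new MyObject(0);

    void setObject(MyObject o) {
        myObject = o;
    }

    // No lock is taken, so the drawing thread is never blocked here.
    MyObject getObject() {
        return myObject;
    }

    // Writes from another thread, waits for it, then reads the new value.
    static int demo() {
        VolatileHolder holder = new VolatileHolder();
        Thread writer = new Thread(() -> holder.setObject(new MyObject(1)));
        writer.start();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return holder.getObject().version;
    }
}
```

This only makes the reference safe to share. If MyObject has mutable state that several threads modify, that state needs its own protection.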

You actually have a number of complications here:
myObject needs to be volatile. (Otherwise other threads may NEVER see changes).
The initial value of myObject will be fully constructed before MyClass is accessed so that is safe in this case, however in general you need to be careful about combining construction of objects and multi threading.

Yes, it is thread-safe under the said conditions, so long as you mark myObject as volatile. You will always get a correct MyObject instance from getObject().

You should make the shared variable volatile, to let a thread know that other threads/processes/etc may change its value.
Beside that, there's no concurrency issue in your code.
The second line, which creates an instance of MyObject when an instance of MyClass is created, is perfectly fine. No one will have access to the shared variable until the instance of MyObject is fully constructed (unless you leak the shared variable from within the constructor).
The setObject method is also fine - all it does is assign an object to the shared variable myObject. And since reference assignments are atomic, there's nothing to worry about.

ThreadLocal garbage collection

From javadoc
Each thread holds an implicit reference to its copy of a thread-local variable as long as the thread is alive and the ThreadLocal instance is accessible; after a thread goes away, all of its copies of thread-local instances are subject to garbage collection (unless other references to these copies exist).
from that it seems that objects referenced by a ThreadLocal variable are garbage collected only when thread dies. But what if ThreadLocal variable a is no more referenced and is subject for garbage collection? Will object references only by variable a be subject to garbage collection if thread that holds a is still alive?
for example there is following class with ThreadLocal variable:
public class Test {
private static final ThreadLocal a = ...; // references object b
}
This class references some object and this object has no other references to it. Then during context undeploy application classloader becomes a subject for garbage collection, but thread is from a thread pool so it does not die. Will object b be subject for garbage collection?

TL;DR : You cannot count on the value of a ThreadLocal being garbage collected when the ThreadLocal object is no longer referenced. You have to call ThreadLocal.remove or cause the thread to terminate
(Thanks to @Lii)
Detailed answer:
from that it seems that objects referenced by a ThreadLocal variable are garbage collected only when thread dies.
That is an over-simplification. What it actually says is two things:
The value of the variable won't be garbage collected while the thread is alive (hasn't terminated), AND the ThreadLocal object is strongly reachable.
The value will be subject to normal garbage collection rules when the thread terminates.
There is an important third case where the thread is still live but the ThreadLocal is no longer strongly reachable. That is not covered by the javadoc. Thus, the GC behaviour in that case is unspecified, and could potentially be different across different Java implementations.
In fact, for OpenJDK Java 6 through OpenJDK Java 8 (and other implementations derived from those code-bases) the actual behaviour is rather complicated. The values of a thread's thread-locals are held in a ThreadLocalMap object. The comments say this:
ThreadLocalMap is a customized hash map suitable only for maintaining thread local values. [...] To help deal with very large and long-lived usages, the hash table entries use WeakReferences for keys. However, since reference queues are not used, stale entries are guaranteed to be removed only when the table starts running out of space.
If you look at the code, stale map entries (with broken WeakReferences) may also be removed in other circumstances. If stale entry is encountered in a get, set, insert or remove operation on the map, the corresponding value is nulled. In some cases, the code does a partial scan heuristic, but the only situation where we can guarantee that all stale map entries are removed is when the hash table is resized (grows).
So ...
Then during context undeploy application classloader becomes a subject for garbage collection, but thread is from a thread pool so it does not die. Will object b be subject for garbage collection?
The best we can say is that it may be ... depending on how the application manages other thread locals on the thread in question.
So yes, stale thread-local map entries could be a storage leak if you redeploy a webapp, unless the web container destroys and recreates all of the request threads in the thread pool. (You would hope that a web container would / could do that, but AFAIK it is not specified.)
The other alternative is to have your webapp's Servlets always clean up after themselves by calling ThreadLocal.remove on each one on completion (successful or otherwise) of each request.

ThreadLocal variables are held in the Thread field
ThreadLocal.ThreadLocalMap threadLocals;
which is initialized lazily on the first ThreadLocal.set/get invocation in the current thread. The Thread holds a reference to the map for as long as it is alive. However, ThreadLocalMap uses WeakReferences for keys, so its entries may be removed when the ThreadLocal is no longer referenced from anywhere else. See the ThreadLocal.ThreadLocalMap javadoc for details.

If the ThreadLocal itself is collected because it's not accessible anymore (there's an "and" in the quote), then all its content can eventually be collected, depending on whether it's also referenced somewhere else and other ThreadLocal manipulations happen on the same thread, triggering the removal of stale entries (see for example the replaceStaleEntry or expungeStaleEntry methods in ThreadLocalMap). The ThreadLocal is not (strongly) referenced by the threads, it references the threads: think of ThreadLocal<T> as a WeakHashMap<Thread, T>.
In your example, if the classloader is collected, it will unload the Test class as well (unless you have a memory leak), and the ThreadLocal a will be collected.

ThreadLocal contains a reference to a WeakHashMap that holds key-value pairs

It depends. It will not be garbage collected if you are referencing it from a static field or a singleton and your class is not unloaded. That is why, in an application server environment with ThreadLocal values, you have to use a listener or request filter to be sure that you dereference all thread-local variables at the end of request processing, or else use the request-scope functionality of your framework.
You can look here for some other explanations.
EDIT: In the context of a thread pool as asked, of course if the Thread is garbage collected, its thread locals are too.

Object b will not be subject for garbage collection if it somehow refers to your Test class. It can happen without your intention. For example if you have a code like this:
public class Test {
private static final ThreadLocal<Set<Integer>> a =
new ThreadLocal<Set<Integer>>(){
@Override public Set<Integer> initialValue(){
return new HashSet<Integer>(){{add(5);}};
}
};
}
The double brace initialization {{add(5);}} will create an anonymous class which refers to your Test class so this object will never be garbage collected even if you don't have reference to your Test class anymore. If that Test class is used in a web app then it will refer to its class loader which will prevent all other classes to be GCed.
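The anonymous class is easy to observe directly. The sketch below is an illustration under assumed names (DoubleBraceDemo, doubleBrace, PLAIN): it shows that a double-brace set is an instance of an application-defined subclass rather than of java.util.HashSet, and gives one possible alternative whose stored value is a plain JDK object.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class DoubleBraceDemo {
    // The outer braces declare an anonymous HashSet subclass defined by this
    // application; the inner braces are its instance initializer.
    static Set<Integer> doubleBrace() {
        return new HashSet<Integer>() {{ add(5); }};
    }

    // Alternative sketch: the value kept on the thread is a plain java.util.HashSet,
    // so the value itself carries no reference to application classes.
    static final ThreadLocal<Set<Integer>> PLAIN =
        ThreadLocal.withInitial(() -> new HashSet<Integer>(Arrays.asList(5)));

    public static void main(String[] args) {
        System.out.println(doubleBrace().getClass() == HashSet.class); // false
        System.out.println(doubleBrace().getClass().isAnonymousClass()); // true
        System.out.println(PLAIN.get().getClass() == HashSet.class); // true
        PLAIN.remove();
    }
}
```

Removing the value when the work is done is still advisable; avoiding the anonymous class only removes one way for a leftover value to pin the class loader.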
Moreover, if your b object is a simple object it will not be immediately subject for GC. Only when ThreadLocal.ThreadLocalMap in Thread class is resized you will have your object b subject for GC.
However I created a solution for this problem so when you redeploy your web app you will never have class loader leaks.

Why is java.lang.ThreadLocal a map on Thread instead on the ThreadLocal?

Naively, I expected a ThreadLocal to be some kind of WeakHashMap of Thread to the value type. So I was a little puzzled when I learned that the values of a ThreadLocal is actually saved in a map in the Thread. Why was it done that way? I would expect that the resource leaks associated with ThreadLocal would not be there if the values are saved in the ThreadLocal itself.
Clarification: I was thinking of something like
public class AlternativeThreadLocal<T> {
private final Map<Thread, T> values =
Collections.synchronizedMap(new WeakHashMap<Thread, T>());
public void set(T value) { values.put(Thread.currentThread(), value); }
public T get() { return values.get(Thread.currentThread());}
}
As far as I can see this would prevent the weird problem that neither the ThreadLocal nor its leftover values can be garbage-collected until the Thread dies if the value somehow strongly references the ThreadLocal itself.
(Probably the most devious form of this occurs when the ThreadLocal is a static variable on a class the value references. Now you have a big resource leak on redeployments in application servers since neither the objects nor their classes can be collected.)

Sometimes you get enlightened by just asking a question. :-) Now I just saw one possible answer: thread-safety. If the map with the values is in the Thread object, the insertion of a new value is trivially thread-safe. If the map is on the ThreadLocal you have the usual concurrency issues, which could slow things down. (Of course you would use a ReadWriteLock instead of synchronized, but the problem remains.)

You seem to be misunderstanding the problem of ThreadLocal leaks. ThreadLocal leaks occur when the same thread is used repeatedly, such as in a thread pool, and the ThreadLocal state is not cleared between usages. They're not a consequence of the ThreadLocal remaining when the Thread is destroyed, because nothing references the ThreadLocal Map aside from the thread itself.
Having a weakly referenced map of Thread to thread-local objects would not prevent the ThreadLocal leak problem because the thread still exists in the thread pool, so the thread-local objects are not eligible for collection when the thread is reused from the pool. You'd still need to manually clear the ThreadLocal to avoid the leak.
As you said in your answer, concurrency control is simplified with the ThreadLocal Map being a single instance per thread. It also makes it impossible for one thread to access another's thread local objects, which might not be the case if the ThreadLocal object exposed an API on the Map you suggest.

I remember some years ago Sun changed the implementation of thread locals to its current form. I don't remember what version it was and what the old impl was like.
Anyway, for a variable that each thread should have a slot for, Thread is the natural container of choice. If we could, we would also add our thread local variable directly as a member of Thread class.

Why would the Map be on ThreadLocal? That doesn't make a lot of sense. So it'd be a Map of ThreadLocals to objects inside a ThreadLocal?
The simple reason it's a Map of Threads to Objects is because:
It's an implementation detail ie that Map isn't exposed in any way;
It's always easy to figure out the current thread (with Thread.currentThread()).
Also the idea is that a ThreadLocal can store a different value for each Thread that uses it so it makes sense that it is based on Thread, doesn't it?
