ThreadLocal garbage collection - java

From javadoc
Each thread holds an implicit reference to its copy of a thread-local variable as long as the thread is alive and the ThreadLocal instance is accessible; after a thread goes away, all of its copies of thread-local instances are subject to garbage collection (unless other references to these copies exist).
From that, it seems that objects referenced by a ThreadLocal variable are garbage collected only when the thread dies. But what if the ThreadLocal variable a is no longer referenced and is itself subject to garbage collection? Will an object referenced only by variable a be subject to garbage collection if the thread that holds a is still alive?
For example, consider the following class with a ThreadLocal variable:
public class Test {
private static final ThreadLocal a = ...; // references object b
}
The ThreadLocal references some object b, and nothing else references b. Then, during context undeploy, the application classloader becomes subject to garbage collection, but the thread comes from a thread pool, so it does not die. Will object b be subject to garbage collection?

TL;DR: You cannot count on the value of a ThreadLocal being garbage collected when the ThreadLocal object is no longer referenced. You have to call ThreadLocal.remove or cause the thread to terminate.
(Thanks to @Lii)
Detailed answer:
from that it seems that objects referenced by a ThreadLocal variable are garbage collected only when thread dies.
That is an over-simplification. What it actually says is two things:
The value of the variable won't be garbage collected while the thread is alive (hasn't terminated), AND the ThreadLocal object is strongly reachable.
The value will be subject to normal garbage collection rules when the thread terminates.
There is an important third case where the thread is still live but the ThreadLocal is no longer strongly reachable. That is not covered by the javadoc. Thus, the GC behaviour in that case is unspecified, and could potentially be different across different Java implementations.
In fact, for OpenJDK Java 6 through OpenJDK Java 8 (and other implementations derived from those code-bases) the actual behaviour is rather complicated. The values of a thread's thread-locals are held in a ThreadLocalMap object. The comments say this:
ThreadLocalMap is a customized hash map suitable only for maintaining thread local values. [...] To help deal with very large and long-lived usages, the hash table entries use WeakReferences for keys. However, since reference queues are not used, stale entries are guaranteed to be removed only when the table starts running out of space.
If you look at the code, stale map entries (with broken WeakReferences) may also be removed in other circumstances. If a stale entry is encountered in a get, set, insert or remove operation on the map, the corresponding value is nulled. In some cases, the code does a partial scan heuristic, but the only situation where we can guarantee that all stale map entries are removed is when the hash table is resized (grows).
So ...
Then during context undeploy application classloader becomes a subject for garbage collection, but thread is from a thread pool so it does not die. Will object b be subject for garbage collection?
The best we can say is that it may be, depending on how the application manages other thread locals on the thread in question.
So yes, stale thread-local map entries could be a storage leak if you redeploy a webapp, unless the web container destroys and recreates all of the request threads in the thread pool. (You would hope that a web container would / could do that, but AFAIK it is not specified.)
The other alternative is to have your webapp's Servlets always clean up after themselves by calling ThreadLocal.remove on each one on completion (successful or otherwise) of each request.

ThreadLocal values are held in a field of Thread:
ThreadLocal.ThreadLocalMap threadLocals;
The map is initialized lazily on the first ThreadLocal.set/get invocation in the current thread, and the thread holds a reference to it for as long as the thread is alive. However, ThreadLocalMap uses WeakReferences for keys, so its entries may be removed when the ThreadLocal is no longer referenced from anywhere else. See the ThreadLocal.ThreadLocalMap javadoc for details.

If the ThreadLocal itself is collected because it's not accessible anymore (there's an "and" in the quote), then all its content can eventually be collected, depending on whether it's also referenced somewhere else and other ThreadLocal manipulations happen on the same thread, triggering the removal of stale entries (see for example the replaceStaleEntry or expungeStaleEntry methods in ThreadLocalMap). The ThreadLocal is not (strongly) referenced by the threads, it references the threads: think of ThreadLocal<T> as a WeakHashMap<Thread, T>.
In your example, if the classloader is collected, it will unload the Test class as well (unless you have a memory leak), and the ThreadLocal a will be collected.

Each Thread holds a reference to a ThreadLocalMap, a weak-keyed hash map (similar in spirit to a WeakHashMap) that holds the key-value pairs.

It depends. It will not be garbage collected if you are referencing it from a static field or a singleton and your class is not unloaded. That is why, in an application server environment with ThreadLocal values, you have to use a listener or request filter to make sure that you dereference all thread-local variables at the end of request processing. Alternatively, use the request scope functionality of your framework.
You can look here for some other explanations.
EDIT: In the context of a thread pool, as asked: of course, if the Thread is garbage collected, its thread locals are too.

Object b will not be subject to garbage collection if it somehow refers to your Test class. This can happen without your intention, for example if you have code like this:
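import java.util.HashSet;
import java.util.Set;

public class Test {
    private static final ThreadLocal<Set<Integer>> a =
        new ThreadLocal<Set<Integer>>(){
            @Override public Set<Integer> initialValue(){
                // Double brace initialization: creates an anonymous HashSet subclass
                return new HashSet<Integer>(){{add(5);}};
            }
        };
}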
The double brace initialization {{add(5);}} creates an anonymous class which refers to your Test class, so this object will never be garbage collected, even if you no longer hold a reference to your Test class. If that Test class is used in a web app, then it refers to its class loader, which prevents all the other classes from being GCed.
Moreover, even if your b object is a simple object, it will not immediately be subject to GC. Only when the ThreadLocal.ThreadLocalMap in the Thread class is resized is your object b guaranteed to become subject to GC.
However I created a solution for this problem so when you redeploy your web app you will never have class loader leaks.
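To see why the double brace form matters, here is a minimal sketch that compares it with a plain HashSet. The names DoubleBraceDemo, LEAKY, PLAIN and loadedByAppLoader are made up for this illustration. It only inspects which class loader defined the class of the stored value and does not try to trigger a garbage collection.

```java
import java.util.HashSet;
import java.util.Set;

class DoubleBraceDemo {
    // Value is an instance of an anonymous HashSet subclass defined by this application.
    static final ThreadLocal<Set<Integer>> LEAKY = new ThreadLocal<Set<Integer>>() {
        @Override protected Set<Integer> initialValue() {
            return new HashSet<Integer>() {{ add(5); }};
        }
    };

    // Value is a plain java.util.HashSet, whose class belongs to the JDK.
    static final ThreadLocal<Set<Integer>> PLAIN = ThreadLocal.withInitial(() -> {
        Set<Integer> set = new HashSet<>();
        set.add(5);
        return set;
    });

    /** True if the value's class was defined by the same loader as this demo class. */
    static boolean loadedByAppLoader(Object value) {
        return value.getClass().getClassLoader() == DoubleBraceDemo.class.getClassLoader();
    }

    public static void main(String[] args) {
        System.out.println(LEAKY.get().getClass() + " app loader: " + loadedByAppLoader(LEAKY.get()));
        System.out.println(PLAIN.get().getClass() + " app loader: " + loadedByAppLoader(PLAIN.get()));
    }
}
```

A value whose class comes from the application's class loader keeps that loader reachable for as long as the value itself stays in a pooled thread's map, which is the chain described above. Storing only JDK types (or calling remove) avoids that particular link.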

Related

Class level, instance level and local ThreadLocals

I understand how class level thread locals makes sense. Being associated with thread, we need thread locals to be shared among different instances and classes across that thread. So we need to make them class level. If we want to share thread local across different instances of same class, we can make them private static. If we want to share thread local across different classes, we can make them public static.
Q0. correct me if am wrong with above
My doubts are about instance scoped (non-static) thread locals and local (defined inside some method) thread locals:
Q1. Is there any valid use case for instance scoped (non-static) thread locals?
Q2. Is there any valid use case for local (defined inside some method) thread locals?
Q3. Are instance scoped (non-static) thread locals deleted when an instance is garbage collected?
Q4. Are local (defined inside some method) thread locals deleted when method returns?
ThreadLocal when implemented correctly as a static variable acts essentially as an instance variable for all threads that have access to it. Even though there's a single ThreadLocal variable, the mechanism makes it so that each thread has its own instance of the value in it.
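As a small illustration of that point, the sketch below shares one static ThreadLocal between two threads and records what each of them reads back. The class name PerThreadValueDemo and the thread names are assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PerThreadValueDemo {
    // A single ThreadLocal object, shared by every thread that touches this class.
    private static final ThreadLocal<String> NAME = new ThreadLocal<>();

    /** Starts two threads that each set and read NAME; returns what each thread observed. */
    static Map<String, String> run() {
        Map<String, String> seen = new ConcurrentHashMap<>();
        Runnable task = () -> {
            String me = Thread.currentThread().getName();
            NAME.set("value-of-" + me);   // stored in the current thread's own map
            seen.put(me, NAME.get());     // reads back this thread's value only
            NAME.remove();                // clean up before the thread finishes
        };
        Thread t1 = new Thread(task, "t1");
        Thread t2 = new Thread(task, "t2");
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Each thread only ever sees the value it stored itself, even though both go through the same static field.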
Therefore
Q1. No, it doesn't make sense to have an instance scoped ThreadLocal. This doesn't mean you couldn't write code that uses an instance scoped TL, but you would need to keep track (in your developer mind) of both the instance and the thread being used for it to work correctly. Even if you found a use case that such code would solve, there would be a much better way to handle it.
Q2. No. As a local variable can never have more than a single thread access it, it would not differ from a regular local variable.
Q3. The ThreadLocal<> wrapper becomes unreachable, but the actual variable is still contained in the thread's map, as you correctly said. This causes a resource/memory leak, as it can't be cleared until the thread stops.
Q4. Same as with Q3, if you lose the wrapper reference, it's an instant leak. If you assign the reference somewhere, it's just weird programming. A method local ThreadLocal variable would be extremely worrying code.
The class is not something you'd want to use too much anyway in modern code (or even older code), and it's not compatible with reactive programming, but if you do use it, the usage is straightforward. A single ThreadLocal is most easily implemented as a class level variable.
Q2. Is there any valid use case for local (defined inside some method) thread locals?
First, let's just be clear. If you say "a local Foobar" (for any class Foobar), then it's not entirely clear what you are talking about. Variables can be "class level" (i.e., static), "instance level," or "local," but a Foobar instance is not a variable. The variables in a Java program can only refer to Foobar instances that are allocated on the heap. It's very easy, and very common, to have more than one variable in a program refer to the same instance.
ThreadLocal is a class, and instances of ThreadLocal are objects on the heap. The same ThreadLocal object could be referenced by a static ThreadLocal variable and also, at the same time, referenced by local variables in one or more threads.
When you say "a local ThreadLocal," you could be talking about a local variable that holds a reference to a ThreadLocal instance that is shared with other threads, -OR- you could be talking about a ThreadLocal instance that is only referenced by one local variable. The second case would not make any sense because that instance could not be shared by multiple threads.
Q1. Is there any valid use case for instance scoped (non-static) thread locals?
Maybe so, but I would call it a "code smell." (That is, a reason to look closely at the code and see whether it could be better organized.) I personally would never use ThreadLocal in new code. The only times I have ever used it were while porting older, single-threaded code into a multi-threaded system, and when I did, the variables in question were always static (i.e., class level) variables.
I personally try never to use static in new code except in cases where some function is clearly labelled as returning a reference to a "singleton object."
Q3., Q4. [...when are instances deleted...]?
An instance will be eligible to be deleted when there is no "live" variable in the program that refers to it. One way that can happen is if the only variable that refers to it is a local variable of some function, and that function returns. A second way it can happen is if the only variable that refers to the instance is assigned to refer to some other instance. A third way is if the only references to it are from instance variables of other objects, and all of those other objects are themselves eligible to be deleted.
In most Java implementations, the instance will not be immediately deleted when it becomes eligible for deletion. The actual deletion will happen some time later. When, depends on the strategies employed by the JRE's garbage collector and on the patterns of object use by the program.

Questions about using ThreadLocal in a Spring singleton scoped service

In my singleton scoped service class below, all methods in the class require some user context that is known when Service.doA() is called. Instead of passing the info around between methods, I was thinking about storing those values in a ThreadLocal. I have two questions about this approach:
1) Does the implementation below use ThreadLocal correctly? That is, it is thread-safe and the correct values will be read/written into the ThreadLocal?
2) Does ThreadLocal userInfo need to be cleaned up explicitly to prevent any memory leaks? Will it be garbage collected?
@Service
public class Service {
    private static final ThreadLocal<UserInfo> userInfo = new ThreadLocal<>();
    public void doA() {
        // finds user info
        userInfo.set(new UserInfo(userId, name));
        doB();
        doC();
    }
    private void doB() {
        // needs user info
        UserInfo userInfo = userInfo.get();
    }
    private void doC() {
        // needs user info
        UserInfo userInfo = userInfo.get();
    }
}
1) The example code is ok, except for the name clashes in doB and doC where you're using the same name for the static variable referencing the ThreadLocal as you are for the local variable holding what you pull from the ThreadLocal.
2) The object you store in the ThreadLocal stays attached to that thread until explicitly removed. If your service executes in a servlet container, for instance, when a request completes its thread returns to the pool. If you haven't cleaned up the thread's ThreadLocal variable contents, then that data will stick around to accompany whatever request the thread gets allocated for next. Each thread is a GC root, so thread-local values still attached to the thread won't get garbage-collected until after the thread dies. According to the API doc:
Each thread holds an implicit reference to its copy of a thread-local variable as long as the thread is alive and the ThreadLocal instance is accessible; after a thread goes away, all of its copies of thread-local instances are subject to garbage collection (unless other references to these copies exist).
If your context information is limited to the scope of one service, you're better off passing the information around through parameters rather than using ThreadLocal. ThreadLocal is for cases where information needs to be available across different services or in different layers, it seems like you're only overcomplicating your code if it will be used by only one service. Now if you have data that would be used by AOP advice on different disparate objects, putting that data in a threadlocal could be a valid usage.
To perform the clean-up typically you would identify a point where the thread is done with the current processing, for instance in a servlet filter, where the threadlocal variable can be removed before the thread is returned to the threadpool. You wouldn't use a try-finally block because the place where you insert the threadlocal object is nowhere near where you are cleaning it up.
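A servlet filter needs the servlet API, so the sketch below uses a plain Runnable wrapper as a stand-in for such a boundary: whatever the wrapped task sets, the wrapper clears before the thread goes back to the pool. RequestContext, CURRENT_USER and withCleanup are hypothetical names chosen for this example, not part of any framework.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class RequestContext {
    // Hypothetical per-request context, set somewhere deep inside request handling.
    static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    /** Wraps a task so the thread-local is cleared when the task ends, even if it fails. */
    static Runnable withCleanup(Runnable task) {
        return () -> {
            try {
                task.run();
            } finally {
                CURRENT_USER.remove();
            }
        };
    }

    /** Runs a failing "request", then returns what the next task on the same thread sees. */
    static String valueSeenAfterFailedRequest() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable failingRequest = () -> {
                CURRENT_USER.set("alice");
                throw new IllegalStateException("request failed");
            };
            try {
                pool.submit(withCleanup(failingRequest)).get();
            } catch (ExecutionException expected) {
                // The request failed on purpose; the wrapper's cleanup has still run.
            }
            Callable<String> nextRequest = CURRENT_USER::get;
            return pool.submit(nextRequest).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(valueSeenAfterFailedRequest());
    }
}
```

The code that sets the value never has to know about the cleanup; the wrapper sits at the one place where the thread is known to be done with the current work.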
When you use a ThreadLocal you need to make sure that you clean it up whatever happens because:
It effectively creates a memory leak, because the value cannot be collected by the GC: an object is eligible for GC only when no live object holds a hard reference to it, directly or indirectly. Here, the thread indirectly holds a hard reference to your value through its internal ThreadLocalMap, and the only way to get rid of this hard reference is to call ThreadLocal#remove(), which removes the value from the ThreadLocalMap. The other potential way to make your value eligible for GC would be for the ThreadLocal instance itself to become eligible for GC, but here it is a constant in the class Service, so it will never be eligible, which is what we want in your case. So the only expected way is to call ThreadLocal#remove().
It creates bugs that are hard to find, because most of the time the thread that uses your ThreadLocal is part of a thread pool and will be reused for another request. If your ThreadLocal has not been properly cleaned up, the thread will reuse the object stored in your ThreadLocal, which may not even be related to the new request. Here, for example, we could get the result of a different user just because the ThreadLocal has not been cleaned up.
So the pattern is the following:
try {
    userInfo.set(new UserInfo(userId, name));
    // Some code here
} finally {
    // Clean up your thread local whatever happens
    userInfo.remove();
}
About thread safety: it is thread safe even if UserInfo itself is not, because each thread uses its own instance of UserInfo. No instance of UserInfo stored in a ThreadLocal will be accessed or modified by several threads, because a ThreadLocal value is scoped to the current thread.
It definitely should be cleaned up after use. ThreadLocals leak memory extremely easily, both heap memory and permgen/metaspace memory via classloaders leaking. In your case the best way would be:
public void doA() {
    // finds user info
    userInfo.set(new UserInfo(userId, name));
    try {
        doB();
        doC();
    } finally {
        userInfo.remove();
    }
}

Run thread until object/parent is dereferenced

I have a Java class which needs a monitor running parallel when instantiated. I want to keep running this monitor until the instance is not running any more or it is not referenced.
Usually I tend to use a active flag as a variable, which is closed when the class is shutdown/closed, however this has to be managed carefully and it has to be called when closing.
I am also aware of the finalize member of Object but as I remember it is not safe to use it or is it for this purpose?
Additionally a monitor might have circular references to the monitored object of course, but this might be an other issue.
You could link the object to be monitored to the thread using a WeakReference. This allows the garbage collector to collect and destroy the object.
In the thread, you would have to check whether the referenced object still exists every time you perform your checks. If it no longer exists, you can safely exit the thread.
As the garbage collector does not immediately destroy objects, there may be an unknown time span where the thread is still active but the monitored object is no longer used.
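A minimal sketch of that idea follows. WeakMonitor, its polling interval and the demo method are assumptions made for illustration, and the demo calls WeakReference.clear() as a deterministic stand-in for the garbage collector actually collecting the monitored object.

```java
import java.lang.ref.WeakReference;

class WeakMonitor implements Runnable {
    private final WeakReference<Object> target;
    private final long pollMillis;

    WeakMonitor(WeakReference<Object> target, long pollMillis) {
        this.target = target;
        this.pollMillis = pollMillis;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            Object monitored = target.get(); // strong reference only during one check
            if (monitored == null) {
                return;                      // object is gone: stop monitoring
            }
            // ... inspect 'monitored' here ...
            monitored = null;                // drop the strong reference before sleeping
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                return;
            }
        }
    }

    /** Starts a daemon thread that polls until the weakly referenced target disappears. */
    static Thread start(WeakReference<Object> target, long pollMillis) {
        Thread thread = new Thread(new WeakMonitor(target, pollMillis), "weak-monitor");
        thread.setDaemon(true);
        thread.start();
        return thread;
    }

    /** Demo: clearing the reference simulates the GC having collected the object. */
    static boolean stopsOnceTargetIsGone() {
        Object monitored = new Object();
        WeakReference<Object> ref = new WeakReference<>(monitored);
        Thread monitor = start(ref, 10);
        ref.clear();
        try {
            monitor.join(2_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !monitor.isAlive();
    }
}
```

The monitor must not keep any strong reference to the monitored object between checks, otherwise the object can never be collected and the thread never exits.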

How to properly dispose of ThreadLocal variables?

What is the cleanest way to dispose of ThreadLocal variables so that they are subject to garbage collection? I read from the docs that:
...after a thread goes away, all of its copies of thread-local instances are subject to garbage collection (unless other references to these copies exist).
But sometimes threads can be pooled or are not expected to die. Does the ThreadLocal#remove() method actually make the value subject to garbage collection?
ThreadLocal.remove() does indeed remove a reference to the value, and if there is no other live reference to it, the value becomes eligible for garbage collection.
When the thread dies, it is removed from the GC roots, so its thread-local entries become subject to GC, and so do the values of those entries. But once again, if you have another live reference to the value, it won't be garbage collected.
If the thread is reused (because it is part of a pool, for example), it is important to call remove() so that the value can be garbage collected, but also to avoid unexpected behavior when a new job is executed on a recycled thread (the new job should not see the value used by the previous job).
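The recycled-thread effect can be reproduced with a single-thread executor, as in the sketch below. StaleValueDemo and JOB_DATA are names invented for this example; the demo only shows visibility of the old value and makes no claim about when the GC reclaims it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class StaleValueDemo {
    private static final ThreadLocal<String> JOB_DATA = new ThreadLocal<>();

    /** Runs two jobs on the same pooled thread and returns what the second job reads. */
    static String nextJobSees(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable firstJob = () -> {
                JOB_DATA.set("data-from-job-1");
                if (cleanUp) {
                    JOB_DATA.remove(); // detach the value before the thread is reused
                }
            };
            pool.submit(firstJob).get();

            Callable<String> secondJob = JOB_DATA::get; // runs on the same worker thread
            return pool.submit(secondJob).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without remove(): " + nextJobSees(false));
        System.out.println("with remove():    " + nextJobSees(true));
    }
}
```

Without remove(), the second job reads the first job's data; with remove(), it reads null, and the old value is no longer referenced from the thread's map.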

Why is java.lang.ThreadLocal a map on Thread instead on the ThreadLocal?

Naively, I expected a ThreadLocal to be some kind of WeakHashMap of Thread to the value type. So I was a little puzzled when I learned that the values of a ThreadLocal is actually saved in a map in the Thread. Why was it done that way? I would expect that the resource leaks associated with ThreadLocal would not be there if the values are saved in the ThreadLocal itself.
Clarification: I was thinking of something like
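import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

public class AlternativeThreadLocal<T> {
    // Weak keys: an entry disappears once its Thread is no longer strongly reachable
    private final Map<Thread, T> values =
        Collections.synchronizedMap(new WeakHashMap<Thread, T>());
    public void set(T value) { values.put(Thread.currentThread(), value); }
    public T get() { return values.get(Thread.currentThread()); }
}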
As far as I can see, this would prevent the weird problem that neither the ThreadLocal nor its leftover values can ever be garbage-collected until the Thread dies if the value somehow strongly references the ThreadLocal itself.
(Probably the most devious form of this occurs when the ThreadLocal is a static variable on a class the value references. Now you have a big resource leak on redeployments in application servers since neither the objects nor their classes can be collected.)
Sometimes you get enlightened by just asking a question. :-) Now I just saw one possible answer: thread-safety. If the map with the values is in the Thread object, the insertion of a new value is trivially thread-safe. If the map is on the ThreadLocal, you have the usual concurrency issues, which could slow things down. (Of course you would use a ReadWriteLock instead of synchronized, but the problem remains.)
You seem to be misunderstanding the problem of ThreadLocal leaks. ThreadLocal leaks occur when the same thread is used repeatedly, such as in a thread pool, and the ThreadLocal state is not cleared between usages. They're not a consequence of the ThreadLocal remaining when the Thread is destroyed, because nothing references the ThreadLocal Map aside from the thread itself.
Having a weakly referenced map of Thread to thread-local objects would not prevent the ThreadLocal leak problem, because the thread still exists in the thread pool, so the thread-local objects are not eligible for collection when the thread is reused from the pool. You'd still need to manually clear the ThreadLocal to avoid the leak.
As you said in your answer, concurrency control is simplified with the ThreadLocal Map being a single instance per thread. It also makes it impossible for one thread to access another's thread local objects, which might not be the case if the ThreadLocal object exposed an API on the Map you suggest.
I remember some years ago Sun changed the implementation of thread locals to its current form. I don't remember what version it was and what the old impl was like.
Anyway, for a variable that each thread should have a slot for, Thread is the natural container of choice. If we could, we would also add our thread local variable directly as a member of Thread class.
Why would the Map be on ThreadLocal? That doesn't make a lot of sense. So it'd be a Map of ThreadLocals to objects inside a ThreadLocal?
The simple reason it's a Map of Threads to Objects is because:
It's an implementation detail, i.e., that Map isn't exposed in any way;
It's always easy to figure out the current thread (with Thread.currentThread()).
Also the idea is that a ThreadLocal can store a different value for each Thread that uses it so it makes sense that it is based on Thread, doesn't it?
