Whether ThreadLocal value is GCed in Java ThreadPool?

In my code, I want to test how ThreadLocal values are garbage collected. I use two approaches: one uses a ThreadPool, the other uses threads I create myself. In the first scenario, the JVM does not seem to GC the threads' ThreadLocalMap (there is no finalize() output). The other works well.
I found that in October 2007, Josh Bloch (co-author of java.lang.ThreadLocal along with Doug Lea) wrote:
"The use of thread pools demands extreme care. Sloppy use of thread
pools in combination with sloppy use of thread locals can cause
unintended object retention, as has been noted in many places."
I guess ThreadPool may be dangerous to use ThreadLocal.
Here is my code (JDK 8 environment):
public class ThreadLocalDemo_Gc {
static volatile ThreadLocal<SimpleDateFormat> tl = new ThreadLocal<SimpleDateFormat>(){
// override finalize so that the message is printed when GC happens.
protected void finalize() throws Throwable{
System.out.println(this.toString() + " is gc(threadlocal)");
}
};
// Let the main thread wait for all workers.
static volatile CountDownLatch cd = new CountDownLatch(10);
public static class ParseDate implements Runnable{
int i = 0;
public ParseDate(int i) {
super();
this.i = i;
}
@Override
public void run() {
try {
if(tl.get() == null){
tl.set(new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"){
// override finalize so that the message is printed when GC happens.
protected void finalize() throws Throwable {
System.out.println(this.toString() + " is gc(sdf)");
}
});
// new sdf object is created in ThreadLocalMap
System.out.println(Thread.currentThread().getId() + ":create SimpleDateFormat");
}
Date t = tl.get().parse("2017-3-26 17:03:" + i % 60);
} catch (ParseException e) {
e.printStackTrace();
} finally {
cd.countDown();
}
}
}
// code with ThreadPool
// public static void main(String[] args) throws InterruptedException {
// ExecutorService es = Executors.newFixedThreadPool(10);
//
// for(int i = 0; i < 10; i++){
// es.execute(new ParseDate(i));
// }
// cd.await();
//
// System.out.println("mission complete");
//
// tl = null; // free the weak reference
// System.gc();
// System.out.println("First GC complete");
// es.shutdown();
// }
// not pooling threads
public static void main(String[] args) throws InterruptedException {
Thread[] all = new Thread[10];
for(int i = 0; i < 10; i++){
all[i] = new Thread(new ParseDate(i));
}
for(int i = 0; i < 10; i++){
all[i].start();
}
cd.await();
tl = null;
System.gc();
System.out.println("First GC complete");
}
}
After running the first main() function, none of the SimpleDateFormat objects is GCed. With the second main() function, they are collected as expected.
Edit #1
Thanks to Gray's reminder. The real reason there was no output from finalize() is that the ThreadPool's worker threads may not have terminated yet. The test code only called shutdown(), which does not wait for the worker threads to finish. A safer way is to also invoke awaitTermination(). That call blocks until the pool has terminated, after which the resources belonging to the worker threads, specifically their ThreadLocalMap, can be collected.
Here is the revised main() with the ThreadPool:
// code with ThreadPool
public static void main(String[] args) throws InterruptedException {
ExecutorService es = Executors.newFixedThreadPool(10);
for(int i = 0; i < 10; i++){
es.execute(new ParseDate(i));
}
cd.await();
es.shutdown();
es.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
System.gc();
}
This version of main() works well: all the collection messages from the finalize() methods are printed.
Finally, the GC may not collect an Entry's value even when the Entry's key no longer has a strong reference. Because ThreadLocalMap holds its key through a weak reference, the Entry's key becomes null, but the Entry's value is not GCed. My test appears to confirm this.

I guess ThreadPool may be dangerous to use ThreadLocal.
I wouldn't go this far. I would say that you need to take into account that the ThreadLocal storage won't be reaped unless the thread itself terminates.
But in looking at your test code, there are a lot of problems with both the ExecutorService and direct thread main methods. In both cases you are not properly joining with the completed threads. Ditch the CountDownLatch and do the following before the gc() call:
for (int i = 0; i < 10; i++) {
all[i].join();
}
or
es.shutdown();
es.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
But the real problem with your code is that you have a race condition with the Finalizer thread. The GC run finishes, but the actual finalizing of the objects happens in a separate "Finalizer" thread after the GC has completed. If you just put a 1 second sleep at the end of main() you should see all 10 SDFs reaped.
What this really demonstrates is that it is hard to force objects to be GC'd in such a way. Putting System.out.println(...) calls in a finalize() method gives me the chills even thinking about it, even though I know you are doing it to learn more about ThreadLocal's memory usage.
I think that storing things in ThreadLocals if done carefully shouldn't be a problem. In your thread's run method, I would just do a try / finally block and make sure to do a threadLocal.remove() in the finally so the thread cleans itself up before exiting. But I don't even bother with that if I have a background thread which is running for the life of my application. It is really only threads that come and go that you need to be particularly worried about.
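The try / finally cleanup described above could look roughly like the following. This is only a sketch: the class name, the pool size of 2, and the leftovers counter are made up for illustration and are not part of the original test code.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ThreadLocalCleanupSketch {
    // One formatter per thread; static final, no need for volatile.
    private static final ThreadLocal<SimpleDateFormat> FORMAT = new ThreadLocal<>();

    // Hypothetical counter: how often a task found a value left behind by an earlier task.
    static final AtomicInteger leftovers = new AtomicInteger();

    static void parseTask(int i) {
        try {
            if (FORMAT.get() != null) {
                leftovers.incrementAndGet();
            }
            FORMAT.set(new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"));
            FORMAT.get().parse("2017-03-26 17:03:" + (i % 60));
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        } finally {
            FORMAT.remove(); // drop this thread's entry before the task returns
        }
    }

    // Runs ten tasks on a small pool and reports how many saw a leftover value.
    static int runAll() {
        leftovers.set(0);
        ExecutorService es = Executors.newFixedThreadPool(2);
        for (int i = 0; i < 10; i++) {
            final int n = i;
            es.execute(() -> parseTask(n));
        }
        es.shutdown();
        try {
            es.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return leftovers.get();
    }

    public static void main(String[] args) {
        System.out.println("tasks that saw a leftover value: " + runAll());
    }
}
```

Because every task removes its entry in the finally block, a pooled thread that picks up the next task starts with an empty slot, so nothing is retained between tasks.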
Lastly, there is no need for a ThreadLocal field to be volatile and it should be a static within the ParseDate if possible.
Hope this helps.

Instances of ThreadLocal itself are merely a view into a map stored on the thread itself. The instance being collected does not actually guarantee that the reference is severed.
It can be approximated as threadInstance.privateField = WeakHashMap<ThreadLocal<T>,T>.
That means that if the Thread instance becomes unreachable, so do all the associated values held by ThreadLocal. On the other hand, when the ThreadLocal instance becomes unreachable, that only means the map key is nulled (being a weak reference); the value is still kept alive by the map until some later access to the map cleans it out. The map cleaning is performed lazily, so dropping ThreadLocal references does not have the same effect as letting threads terminate.
The third way of cleaning it is calling threadLocal.remove() from within the thread.
And of course it's a common pattern to have shared static final ThreadLocal<T> tl accessors within a class. When combined with a thread pool, that means those values will stay alive as long as the thread pool does, unless you use remove().
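To make the "values live as long as the pooled thread" point concrete, here is a small sketch. The class and method names are invented for illustration; it runs two tasks on a single-thread executor and reports what the second task finds in the ThreadLocal.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class PooledThreadLocalSketch {
    private static final ThreadLocal<String> TL = new ThreadLocal<>();

    // Runs two tasks on one pooled thread and returns what the second task reads.
    static String secondTaskSees(boolean cleanUp) {
        ExecutorService es = Executors.newSingleThreadExecutor();
        try {
            Runnable first = () -> {
                TL.set("left by task 1");
                if (cleanUp) {
                    TL.remove(); // explicit cleanup from within the thread
                }
            };
            Callable<String> second = () -> TL.get();
            es.submit(first).get();        // wait for the first task to finish
            return es.submit(second).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            es.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without remove(): " + secondTaskSees(false));
        System.out.println("with remove():    " + secondTaskSees(true));
    }
}
```

Without remove(), the second task reads the value the first task stored, because both ran on the same worker thread; with remove(), it reads null.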

A Thread in a ThreadPool may never terminate until the ThreadPool does. That's the whole point of ThreadPool. So it never gets GC'd. So of course the Thread's ThreadLocal doesn't get GC'd either.

Related

Java code exits after some seconds due to concurrency

I am writing the same code as in a tutorial. In the tutorial the program never exits, but on my computer it exits after 4 seconds. Why?
tutorial with exact time where this code is shown: https://youtu.be/vzBw1LPupnA?t=169
public class Main {
private static boolean stopRequested;
public static void main(String[] args) throws InterruptedException {
Thread backgroundThread = new Thread(() -> {
int i = 0;
while (!stopRequested) {
i++;
System.out.println("i = " + i);
}
});
backgroundThread.start();
TimeUnit.SECONDS.sleep(1);
stopRequested = true;
}
}

The reason that you are seeing different behavior on your machine and in the video is because the program has unspecified behavior. (Or to put it another way, it is not thread-safe.)
You have two threads accessing and updating a shared variable without taking the necessary steps that will guarantee that changes made by one thread are visible to the other. What happens in that case is not specified.
In some cases (e.g. on some platforms) the changes will be visible, either immediately or within a short time.
On others, the changes may never be visible.
In technical terms, there must be a happens-before relationship between the write by one thread and the subsequent read by the other thread. This can be provided by both threads synchronizing on the same mutex or lock, by using a volatile variable, and in other ways. But this code doesn't do any of those things, so there is no guarantee that the state change will be visible.
For more details, read about the Java Memory Model.
The above is sufficient to explain the difference, but there may be a more direct explanation.
In practice, something like a System.out.println can lead to changes in the visibility. Underneath the covers, the println call will typically result in synchronization on the output stream's buffers. That can result in a serendipitous happens-before that is sufficient to guarantee visibility. But this behavior is not specified, so you should not rely on it.
At any rate, adding trace statements can change the behavior of multi-threaded code. And the fact that you (apparently) added them in your version is a second possible explanation for the difference.
The bottom line here is that a program with a memory visibility flaw is broken, but you may not be able to demonstrate that it is broken.
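As one way to establish the happens-before relationship mentioned above, the flag can be declared volatile. The sketch below is an illustration under that assumption; the class name, the 100 ms delay, and the 5 second join timeout are arbitrary choices, not taken from the question.

```java
import java.util.concurrent.TimeUnit;

final class VolatileStopSketch {
    // volatile makes the write in one thread visible to reads in another.
    private static volatile boolean stopRequested;

    // Returns true if the background thread stopped within the join timeout.
    static boolean runAndStop() {
        stopRequested = false;
        Thread background = new Thread(() -> {
            long i = 0;
            while (!stopRequested) {
                i++; // busy loop with no printing, so nothing else synchronizes for us
            }
        });
        background.start();
        try {
            TimeUnit.MILLISECONDS.sleep(100);
            stopRequested = true;      // volatile write
            background.join(5000);     // give the loop time to observe it
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !background.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("background thread stopped: " + runAndStop());
    }
}
```

With the volatile modifier the loop is guaranteed to see the update; without it, the outcome is unspecified, which is the difference the question ran into.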

As the excellent Answer by Stephen C says, your code is not thread-safe.
Establishing an AtomicBoolean early on addresses the visibility problem explained in that other Answer. This class is a thread-safe wrapper around its payload boolean value.
The volatile keyword is another solution. But I find the Atomic… classes simpler and more obvious.
Also, in modern Java we rarely need to address the Thread class directly. Instead, use the Executors framework. Define your task as a Runnable or Callable, and submit to an executor service.
Something like this untested code.
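import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
public class Main {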
private static final AtomicBoolean stopRequested = new AtomicBoolean( false ) ;
public static void main(String[] args) throws InterruptedException {
Runnable task = () -> {
int i = 0;
while ( ! stopRequested.get() ) {
i++;
System.out.println("i = " + i);
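try {
TimeUnit.MILLISECONDS.sleep(100); // Don't spin too fast.
} catch (InterruptedException e) {
Thread.currentThread().interrupt(); // Preserve interrupt status and stop the task.
return;
}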
}
};
ExecutorService es = Executors.newSingleThreadExecutor() ;
es.submit( task ) ;
TimeUnit.SECONDS.sleep(1);
stopRequested.set( true ) ;
TimeUnit.SECONDS.sleep(1);
// Shut down here executor service. Boilerplate taken from Javadoc.
es.shutdown(); // Disable new tasks from being submitted
try {
// Wait a while for existing tasks to terminate
if (!es.awaitTermination(60, TimeUnit.SECONDS)) {
es.shutdownNow(); // Cancel currently executing tasks
// Wait a while for tasks to respond to being cancelled
if (!es.awaitTermination(60, TimeUnit.SECONDS))
System.err.println("Executor service did not terminate");
}
} catch (InterruptedException ex) {
// (Re-)Cancel if current thread also interrupted
es.shutdownNow();
// Preserve interrupt status
Thread.currentThread().interrupt();
}
}
}

My java unit test failed if there is a call to wait method inside a synchronized method

I have recently been learning multithreaded programming in Java, and I don't understand why the following test case fails. Any explanation will be much appreciated.
Here is MyCounter.java.
public class MyCounter {
private int count;
public synchronized void incrementSynchronized() throws InterruptedException {
int temp = count;
wait(100); // <-----
count = temp + 1;
}
public int getCount() {
return count;
}
}
This is my unit test class.
public class MyCounterTest {
@Test
public void testSummationWithConcurrency() throws InterruptedException {
int numberOfThreads = 100;
ExecutorService service = Executors.newFixedThreadPool(10);
CountDownLatch latch = new CountDownLatch(numberOfThreads);
MyCounter counter = new MyCounter();
for (int i = 0; i < numberOfThreads; i++) {
service.submit(() -> {
try {
counter.incrementSynchronized();
} catch (InterruptedException e) {
e.printStackTrace();
}
latch.countDown();
});
}
latch.await();
assertEquals(numberOfThreads, counter.getCount());
}
}
But if I remove wait(100) from the synchronized method incrementSynchronized, the test will succeed. I don't understand why wait(100) will affect the result.

Solomon's suggestion to use sleep is a good one. If you use sleep instead of wait, you should see the test pass.
Using wait causes the thread to relinquish the lock, allowing other threads to proceed and overwrite the value in count. When the thread's wait times out, it acquires the lock again, then writes a value to count that may be stale by now.
The typical usage of wait is when your thread can't do anything useful until some condition is met. Some other thread eventually satisfies that condition and a notification gets sent that will inform the thread it can resume work. In the meantime, since there is nothing useful the thread can do, it releases the lock it is holding (because other threads need the lock in order to make progress meeting the condition that the thread is waiting for) and goes dormant.
Sleep doesn't release the lock so there won't be interference from other threads. For either the sleeping case or the case where you delete the wait call, the lock is held for the duration of the operation, nothing else can change count, so it is threadsafe.
Be aware that in real life, outside of learning exercises, sleeping with a lock held is usually not a great idea. You want to minimize the time that a task holds a lock so you can get more throughput. Threads denying each other the use of a lock is not helpful.
Also be aware that getCount needs to be synchronized as well, since it is reading a value written by another thread.
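The difference can be sketched as follows. This is a hypothetical variant of MyCounter (the class name, the 1 ms sleep, and the helper method are made up) that uses Thread.sleep inside the synchronized method and also synchronizes the getter.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SleepCounterSketch {
    private int count;

    synchronized void increment() throws InterruptedException {
        int temp = count;
        Thread.sleep(1); // sleep keeps the monitor; wait(1) would release it here
        count = temp + 1;
    }

    // Synchronized so the reader sees values written by other threads.
    synchronized int getCount() {
        return count;
    }

    // Submits the given number of increments to a pool of 10 threads and returns the final count.
    static int runConcurrently(int tasks) {
        ExecutorService service = Executors.newFixedThreadPool(10);
        SleepCounterSketch counter = new SleepCounterSketch();
        for (int i = 0; i < tasks; i++) {
            service.execute(() -> {
                try {
                    counter.increment();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        service.shutdown();
        try {
            service.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter.getCount();
    }

    public static void main(String[] args) {
        System.out.println("count = " + runConcurrently(100));
    }
}
```

Since the monitor is held for the whole read-modify-write, no increment is lost. As noted above, holding a lock while sleeping is only reasonable in a learning exercise like this one.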

Why jvm recreate thread pool in case fixedThreadPool and don't do it in case of cachedThreadPool?

I have the code sample:
public class ThreadPoolTest {
public static void main(String[] args) throws InterruptedException {
for (int i = 0; i < 100; i++) {
if (test() != 5 * 100) {
throw new RuntimeException("main");
}
}
test();
}
private static long test() throws InterruptedException {
ExecutorService executorService = Executors.newFixedThreadPool(100);
CountDownLatch countDownLatch = new CountDownLatch(100 * 5);
Set<Thread> threads = Collections.synchronizedSet(new HashSet<>());
AtomicLong atomicLong = new AtomicLong();
for (int i = 0; i < 5 * 100; i++) {
Thread.sleep(100);
executorService.submit(new Runnable() {
@Override
public void run() {
try {
threads.add(Thread.currentThread());
atomicLong.incrementAndGet();
countDownLatch.countDown();
Thread.sleep(1000);
} catch (Exception e) {
System.out.println(e);
}
}
});
}
executorService.shutdown();
countDownLatch.await();
if (threads.size() != 100) {
throw new RuntimeException("test");
}
return atomicLong.get();
}
}
I deliberately made the application run for a long time.
I watched it in jvisualVM.
At each interval the thread pool was recreated.
After several minutes I see:
but if I use newCachedThreadPool instead of newFixedThreadPool I see constant picture:
Can you explain this behaviour?
P.S.
The problem was that an exception occurred in the code and the second iteration never started.

To answer your question; just look here:
private static long test() throws InterruptedException {
ExecutorService executorService = Executors.newFixedThreadPool(100);
The JVM creates a new ThreadPool during each run of test(), because you tell it to do so.
In other words: if you intend to re-use the same threadpool, then avoid creating/shutting down your instances all the time.
In that sense, the simple fix is: move the creation of that ExecutorService into your main() method; and pass the service as argument to your test() method.
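A rough sketch of that fix is shown below. The pool size of 4, the 20 tasks per run, and the class and method names are placeholders chosen to keep the example quick, not values from the question.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

final class SharedPoolSketch {
    static final int POOL_SIZE = 4;      // hypothetical; the question uses 100
    static final int TASKS_PER_RUN = 20; // hypothetical; the question uses 500

    // Same idea as the question's test(), but the pool is passed in instead of created here.
    static long test(ExecutorService pool, Set<Thread> seen) {
        CountDownLatch latch = new CountDownLatch(TASKS_PER_RUN);
        AtomicLong counter = new AtomicLong();
        for (int i = 0; i < TASKS_PER_RUN; i++) {
            pool.execute(() -> {
                seen.add(Thread.currentThread());
                counter.incrementAndGet();
                latch.countDown();
            });
        }
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter.get();
    }

    // Runs test() several times against one pool and returns the number of distinct worker threads.
    static int distinctThreads(int runs) {
        ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
        Set<Thread> seen = Collections.synchronizedSet(new HashSet<>());
        try {
            for (int r = 0; r < runs; r++) {
                if (test(pool, seen) != TASKS_PER_RUN) {
                    throw new IllegalStateException("a task did not run");
                }
            }
        } finally {
            pool.shutdown(); // shut down once, after all runs
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct worker threads: " + distinctThreads(5));
    }
}
```

Because the pool is created once and only shut down at the end, the number of distinct worker threads stays bounded by the pool size no matter how many times test() runs.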
Edit: regarding your last comment on cached vs. fixed threadpool; you probably want to look into this question.

Because you asked it to, in your code ? :) Try moving the Pool creation code outside the test.

From docs:
newFixedThreadPool
Creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue. At any point, at most nThreads threads will be active processing tasks. If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available. If any thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks. The threads in the pool will exist until it is explicitly shutdown.
newCachedThreadPool
Creates a thread pool that creates new threads as needed, but will reuse previously constructed threads when they are available. These pools will typically improve the performance of programs that execute many short-lived asynchronous tasks. Calls to execute will reuse previously constructed threads if available. If no existing thread is available, a new thread will be created and added to the pool. Threads that have not been used for sixty seconds are terminated and removed from the cache. Thus, a pool that remains idle for long enough will not consume any resources. Note that pools with similar properties but different details (for example, timeout parameters) may be created using ThreadPoolExecutor constructors.

When is runnable object garbage collected in ExecutorService?

I have a runnable object A which exchanges heart beat signals with a server on instantiation. I submit n such objects to a executor service with fixed thread pool size of n. When the run method encounters exception it would return. For a given case, all my threads encounter exception and return, but the object created remains alive and keeps on exchanging the heart beat signals. How do I mark such objects up for garbage collection so that they would stop the heart beat signals exchange?
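import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
class A implements Runnable {
public void run() {
try {
// throws error
} catch (Exception e) {
// returns
}
}
public static void main(String[] args) {
int n = 4; // pool size; placeholder value, the question leaves n unspecified
ExecutorService executor = Executors.newFixedThreadPool(n);
for (int i = 1; i <= n; i++) {
A a = new A();
executor.submit(a);
}
}
}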
Should I put an awaitTermination call at the end of my main and then return?
Edit:
Putting the question another way, one way to terminate the executorservice after all the threads return would be to call shutdown() after the for loop and call awaitTermination with Integer.MAX long seconds which is roughly 70 years (which is a time constraint I am reluctant to impose). Is there any other alternative?

one way to terminate the executorservice after all the threads return would be to call shutdown() after the for loop and call awaitTermination with Integer.MAX long seconds which is roughly 70 years
As the documentation says, the awaitTermination method blocks until:
all tasks have completed execution after a shutdown request
or the timeout occurs,
or the current thread is interrupted, whichever happens first
So it returns as soon as any one of those three events occurs, rather than having to wait 70 years.
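A quick way to convince yourself of this is a sketch like the following. The class name, the pool size of 4, and the one-hour timeout are arbitrary choices for illustration; the tasks fail fast the same way the question's run() method does.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class AwaitTerminationSketch {
    // Returns the milliseconds spent waiting, or -1 if the timeout expired or the wait was interrupted.
    static long shutdownAndWait() {
        int n = 4; // hypothetical pool size
        ExecutorService executor = Executors.newFixedThreadPool(n);
        for (int i = 0; i < n; i++) {
            executor.execute(() -> {
                try {
                    throw new Exception("Error");
                } catch (Exception e) {
                    // returns
                }
            });
        }
        executor.shutdown(); // no new tasks; already submitted tasks still run
        long start = System.nanoTime();
        try {
            // Generous timeout; the call returns as soon as the pool has terminated.
            boolean terminated = executor.awaitTermination(1, TimeUnit.HOURS);
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            return terminated ? elapsedMs : -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("waited " + shutdownAndWait() + " ms");
    }
}
```

The timeout is only an upper bound: once every task has finished after shutdown(), awaitTermination returns true without waiting for the rest of the hour.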

Calling shutdown() on the pool means the pool will no longer accept any new tasks for execution, but the current ones will run without interruption.
Calling awaitTermination(timeout) blocks the calling thread until the pool has terminated. If the timeout elapses first, it returns false (it throws InterruptedException only if the waiting thread is interrupted); either way, the tasks in the pool are not affected.
If your runnable throws an uncaught exception while being run by the thread pool, then this runnable is no longer running, and the thread pool usually doesn't hold any reference to it.
If you use a fixed thread pool, the pool creates up to as many threads as you requested and will not stop any of them until you call shutdown() on the pool.
If you don't hold a reference to the runnable object that threw the exception, it behaves like any other unreferenced object and can be garbage collected.
If you call shutdown() and then awaitTermination() on the thread pool and your program still doesn't stop, that means not all instances of your runnable have thrown an exception, and some are still running and thus blocking the pool from shutting down completely.
In Java you can't kill or stop a running thread just like that (you can only kill the entire JVM using e.g. System.exit(0), but not just a chosen thread). If you need such functionality, you have to write the body of the runnable so that you can communicate with it somehow, e.g. through a "volatile boolean" variable, and so that it responds to a change in the value of this variable. That means adding "if checks" on this variable in the body of the run() method, which returns when it should.

The tasks themselves are eligible for garbage collecting as soon as their execution is complete. If and when they are actually collected depends on the garbage collector.
Example code:
public class Main implements Runnable {
@Override
protected void finalize() throws Throwable {
super.finalize();
System.out.println("finalize");
}
@Override
public void run() {
try {
throw new Exception("Error");
} catch (Exception e) {
//returns
}
}
public static void main(String args[]) {
int n = 8;
ExecutorService executor = Executors.newFixedThreadPool(n);
for (int i = 0 ; i < n; ++i) {
Main a = new Main();
executor.submit(a);
}
System.gc();
System.out.println("end");
}
}

Is this synchronized block needed?

Is the synchronized block around System.out.println(number); needed in the following code?
import java.util.concurrent.CountDownLatch;
public class Main {
private static final Object LOCK = new Object();
private static long number = 0L;
public static void main(String[] args) throws InterruptedException {
CountDownLatch doneSignal = new CountDownLatch(10);
for (int i = 0; i < 10; i++) {
Worker worker = new Worker(doneSignal);
worker.start();
}
doneSignal.await();
synchronized (LOCK) { // Is this synchronized block needed?
System.out.println(number);
}
}
private static class Worker extends Thread {
private final CountDownLatch doneSignal;
private Worker(CountDownLatch doneSignal) {
this.doneSignal = doneSignal;
}
@Override
public void run() {
synchronized (LOCK) {
number += 1;
}
doneSignal.countDown();
}
}
}
I think it's needed because there is a possibility of reading a cached value.
But some people say:
It's unnecessary.
Because by the time the main thread reads the variable number, all of the worker threads have already written their updates to number in memory.

doneSignal.await() is a blocking call, so your main() will only proceed when all your Worker threads have called doneSignal.countDown(), making it reach 0, which is what makes the await() method return.
There is no point adding that synchronized block before the System.out.println(), all your threads are already done at that point.
Consider using an AtomicInteger for number instead of synchronizing against a lock to call += 1.

It is not necessary:
CountDownLatch doneSignal = new CountDownLatch(10);
for (int i = 0; i < 10; i++) {
Worker worker = new Worker(doneSignal);
worker.start();
}
doneSignal.await();
// here the only thread running is the main thread
Just before dying, each thread counts down the CountDownLatch:
@Override
public void run() {
synchronized (LOCK) {
number += 1;
}
doneSignal.countDown();
}
Only when all 10 threads have finished their job will execution get past the doneSignal.await(); line.

It is not necessary because you are waiting for the "done" signal. That flushes memory in such a way that all values written by the awaited threads become visible to the main thread.
However, you can test that easily: inside the run method, perform a computation that takes several million steps and doesn't get optimized away by the compiler. If you then see a value different from the final value you expect, your final value was not yet visible to the main thread. Of course, the critical part here is making sure the computation doesn't get optimized away, and a simple "increment" is likely to be. In general this is useful for testing concurrent code when you are not sure you have the correct memory barriers, so it may come in handy later.

synchronized is not needed around System.out.println(number);, but not because the PrintStream.println() implementations are internally synchronized or because by the time doneSignal.await() unblocks all the worker threads have finished.
synchronized is not needed because there's a happens-before edge between everything before each call to doneSignal.countDown and the completion of doneSignal.await(). This guarantees that you'll successfully see the correct value of number.
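The following sketch illustrates that reasoning. It is a trimmed-down variant of the question's code (the class and method names are invented): the workers still update number under LOCK, but the final read relies only on the latch.

```java
import java.util.concurrent.CountDownLatch;

final class LatchVisibilitySketch {
    private static final Object LOCK = new Object();
    private static long number = 0L;

    // Starts the given number of workers and returns the value read after await().
    static long runWorkers(int workers) {
        number = 0L; // written before the threads start, so they all see it
        CountDownLatch doneSignal = new CountDownLatch(workers);
        for (int i = 0; i < workers; i++) {
            new Thread(() -> {
                synchronized (LOCK) { // still needed: workers race with each other
                    number += 1;
                }
                doneSignal.countDown();
            }).start();
        }
        try {
            doneSignal.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // No synchronized block here: every countDown() happens-before the return of await().
        return number;
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(10));
    }
}
```

The lock inside the workers protects the increments from each other; the read in the main thread is covered by the happens-before edge that CountDownLatch documents between countDown() and a successful return from await().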

Needed
No.
However, as there is no (documented) guarantee that there will not be any interleaving it is possible to find log entries interleaved.
System.out.println("ABC");
System.out.println("123");
could print:
AB1
23C
Worthwhile
Almost certainly not. Most JVMs will implement println with a lock; OpenJDK does.
Edge case
As suggested by @DimitarDimitrov, there is one further possible use for that lock, and it is to ensure a memory barrier is crossed before accessing number. If that is the concern then you do not need to lock; all you need to do is make number volatile.
private static volatile long number = 0L;
