I'm looking at a jstack log and this is what I see:
"com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#2" #250 daemon prio=5 os_prio=0 tid=0x00007f9de0016000 nid=0x7e54 runnable [0x00007f9d6495a000]
java.lang.Thread.State: RUNNABLE
at com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:534)
- locked <0x00000006fa818a38> (a com.mchange.v2.async.ThreadPoolAsynchronousRunner)
"com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#1" #249 daemon prio=5 os_prio=0 tid=0x00007f9de000c000 nid=0x7e53 waiting for monitor entry [0x00007f9d649db000]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000006fa818a38> (a com.mchange.v2.async.ThreadPoolAsynchronousRunner)
at com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:534)
- locked <0x00000006fa818a38> (a com.mchange.v2.async.ThreadPoolAsynchronousRunner)
"com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#0" #248 daemon prio=5 os_prio=0 tid=0x00007f9de001a000 nid=0x7e52 waiting for monitor entry [0x00007f9d64a5c000]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000006fa818a38> (a com.mchange.v2.async.ThreadPoolAsynchronousRunner)
at com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:534)
- locked <0x00000006fa818a38> (a com.mchange.v2.async.ThreadPoolAsynchronousRunner)
So, in this log, each of these three threads appears to have acquired the same lock, and yet the bottom two threads are blocked waiting for that same lock.
Can someone please explain to me what this stack log means?
The last two threads are waiting to be notified by using the instance of ThreadPoolAsynchronousRunner as monitor, so the source of that will look something like this:
synchronized(asyncRunner) {
// ...
asyncRunner.wait();
// ...
}
As soon as you call wait, the synchronization on asyncRunner is "released", i.e. other parts of the application can enter a block that is synchronized on that instance. In your particular case it seems that this has happened and the first thread's wait-call returned and it's currently processing some data that comes from it. You still see multiple locked-lines in the thread-dump to show you that the code is currently within a synchronized-block but as said, the "lock" is released when calling wait.
The technique you see in this thread dump was quite common before the java.util.concurrent package was added to the JDK; it avoids costly thread creation by reusing a fixed set of worker threads. Your thread dump looks like this kind of implementation. Here is a simplified sketch of how it might look "under the hood":
// class ThreadPoolAsynchronousRunner
private Deque<AsyncMessage> queue;
public synchronized void addAsyncMessage(AsyncMessage msg) {
queue.add(msg);
notifyAll();
}
public void start() {
for (int i = 0; i < 4; i++) {
PoolThread pt = new PoolThread(this);
pt.start();
}
}
The ThreadPoolAsynchronousRunner starts the PoolThreads and calls notifyAll when a new message to be processed is added.
// PoolThread
public PoolThread(ThreadPoolAsynchronousRunner parent) {
this.parent = parent;
}
public void run() {
try {
while (true) {
AsyncMessage msg = null;
synchronized(parent) {
parent.wait();
if (!parent.queue.isEmpty()) {
msg = parent.queue.removeFirst();
}
}
if (msg != null) {
processMsg(msg);
}
}
}
catch(InterruptedException ie) {
// exit
}
}
notifyAll causes the wait calls of all threads to return, so each thread has to check whether the queue in the parent still contains data (wait sometimes returns even without a notification having taken place, so you need this check even if you are not using notifyAll). If it does, you start the processing method. You should do that outside the synchronized block; otherwise your async-processing class only processes one message at a time (unless that's what you want, but then why run multiple PoolThread instances?).
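The check described above is usually written as a loop around wait, so that a message added before a worker starts waiting is not missed. The following is a minimal sketch under that assumption, not the c3p0 source; the class GuardedWaitDemo and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the runner/worker pair sketched above.
public class GuardedWaitDemo {
    private final Deque<String> queue = new ArrayDeque<>();
    private final List<String> processed =
            Collections.synchronizedList(new ArrayList<>());

    // Producer side: enqueue and wake up every waiting worker.
    public synchronized void add(String msg) {
        queue.add(msg);
        notifyAll();
    }

    // Consumer side: re-check the condition in a loop, because wait() can
    // return after a notifyAll that another worker already served, or spuriously.
    private synchronized String take() throws InterruptedException {
        while (queue.isEmpty()) {
            wait(); // releases the monitor on 'this' while waiting
        }
        return queue.removeFirst();
    }

    private void workerLoop() {
        try {
            while (true) {
                String msg = take();
                processed.add(msg); // "processing" happens outside the monitor
            }
        } catch (InterruptedException e) {
            // interrupted: leave the loop and let the thread end
        }
    }

    // Enqueues the messages BEFORE the workers start, then waits for them to drain.
    public static List<String> runDemo(int messages) {
        GuardedWaitDemo demo = new GuardedWaitDemo();
        for (int i = 0; i < messages; i++) {
            demo.add("msg-" + i);
        }
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            Thread t = new Thread(demo::workerLoop, "worker-" + i);
            t.setDaemon(true);
            t.start();
            workers.add(t);
        }
        long deadline = System.currentTimeMillis() + 5000;
        try {
            while (demo.processed.size() < messages
                    && System.currentTimeMillis() < deadline) {
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        for (Thread t : workers) {
            t.interrupt();
        }
        List<String> result;
        synchronized (demo.processed) {
            result = new ArrayList<>(demo.processed);
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(runDemo(5));
    }
}
```

With an unconditional wait, the messages enqueued before the workers started would sit in the queue until the next notifyAll; the loop lets each worker see them immediately.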
Only Thread-#2 has acquired the object lock successfully, and it is in the RUNNABLE state. The other two threads, Thread-#0 and Thread-#1, are waiting for that lock to be released by Thread-#2. As long as Thread-#2 holds the lock, Thread-#0 and Thread-#1 will remain in the BLOCKED state.
If you have access to the source code, you can review it to make sure that locking and unlocking are done in the proper order and that the lock is held only for the part of the code where it is necessary. Remember that these two threads are not in the WAITING state but in the BLOCKED state: they have already left the wait and are one step away from becoming RUNNABLE as soon as the lock is available.
There is no problem observed in this log snippet. This is not a deadlock scenario yet.
What I can see and understand is that
Thread-#2 is in Runnable state and has acquired a lock on an Object
"com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#2"
java.lang.Thread.State: RUNNABLE
Thread-#1 and Thread-#0 are waiting for that Object lock to be released and hence blocked right now.
"com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#1"
"com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#0"
java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000006fa818a38>
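The two states discussed in these answers can be reproduced with a small experiment. This is a sketch with hypothetical names (WaitStateDemo, observe), not code from c3p0: a thread inside wait() reports WAITING and no longer owns the monitor, and once it has been notified while another thread still holds the monitor, it reports BLOCKED, much like the two pool threads in the dump.

```java
public class WaitStateDemo {

    // Returns { state while inside wait(), state after notifyAll while main holds the monitor }.
    static Thread.State[] observe() {
        final Object monitor = new Object();
        final boolean[] released = {false}; // guarded by 'monitor'

        Thread waiter = new Thread(() -> {
            synchronized (monitor) {
                try {
                    while (!released[0]) {
                        monitor.wait(); // gives up the monitor until notified
                    }
                } catch (InterruptedException e) {
                    // give up waiting
                }
            }
        }, "waiter");
        waiter.setDaemon(true);
        waiter.start();

        Thread.State inWait = awaitState(waiter, Thread.State.WAITING);

        Thread.State afterNotify;
        // Entering this block succeeds only because wait() released the monitor.
        synchronized (monitor) {
            released[0] = true;
            monitor.notifyAll();
            // The waiter is awake now but cannot re-enter until we leave this block.
            afterNotify = awaitState(waiter, Thread.State.BLOCKED);
        }
        return new Thread.State[] {inWait, afterNotify};
    }

    // Polls for up to five seconds until the thread reports the wanted state.
    private static Thread.State awaitState(Thread t, Thread.State wanted) {
        long deadline = System.currentTimeMillis() + 5000;
        while (t.getState() != wanted && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return t.getState();
    }

    public static void main(String[] args) {
        Thread.State[] states = observe();
        System.out.println("inside wait():   " + states[0]);
        System.out.println("after notifyAll: " + states[1]);
    }
}
```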
I am using Log4j2 with async logging. There are more than 10 threads running parallel in my application and one thread is blocking all other threads.
Thread dump shows below.
"Scheduler_Worker-7" #143 prio=5 os_prio=0 tid=0x00007fae2ac55800 nid=0x18fc runnable [0x00007faceab12000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136)
at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:105)
at com.lmax.disruptor.RingBuffer.publishEvent(RingBuffer.java:465)
at com.lmax.disruptor.dsl.Disruptor.publishEvent(Disruptor.java:331)
at org.apache.logging.log4j.core.async.AsyncLoggerDisruptor.enqueueLogMessageWhenQueueFull(AsyncLoggerDisruptor.java:236)
- locked <0x00000005522ce890> (a java.lang.Object)
at org.apache.logging.log4j.core.async.AsyncLogger.handleRingBufferFull(AsyncLogger.java:246)
at org.apache.logging.log4j.core.async.AsyncLogger.publish(AsyncLogger.java:230)
at org.apache.logging.log4j.core.async.AsyncLogger.logWithThreadLocalTranslator(AsyncLogger.java:202)
at org.apache.logging.log4j.core.async.AsyncLogger.access$100(AsyncLogger.java:67)
at org.apache.logging.log4j.core.async.AsyncLogger$1.log(AsyncLogger.java:157)
at org.apache.logging.log4j.core.async.AsyncLogger.logMessage(AsyncLogger.java:130)
at org.apache.logging.log4j.spi.ExtendedLoggerWrapper.logMessage(ExtendedLoggerWrapper.java:222)
at org.apache.logging.log4j.spi.AbstractLogger.log(AbstractLogger.java:2117)
at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2205)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2159)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2142)
at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2034)
at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1899)
at com.ragi.common.util.Log4jWrapper.debug(Log4jWrapper.java:106)
and other threads stack is like below
"Scheduler_Worker-6" #142 prio=5 os_prio=0 tid=0x00007fae2ad2c800 nid=0x18fb waiting for monitor entry [0x00007faceac13000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.logging.log4j.core.async.AsyncLoggerDisruptor.enqueueLogMessageWhenQueueFull(AsyncLoggerDisruptor.java:236)
- waiting to lock <0x00000005522ce890> (a java.lang.Object)
at org.apache.logging.log4j.core.async.AsyncLogger.handleRingBufferFull(AsyncLogger.java:246)
at org.apache.logging.log4j.core.async.AsyncLogger.publish(AsyncLogger.java:230)
at org.apache.logging.log4j.core.async.AsyncLogger.logWithThreadLocalTranslator(AsyncLogger.java:202)
at org.apache.logging.log4j.core.async.AsyncLogger.access$100(AsyncLogger.java:67)
at org.apache.logging.log4j.core.async.AsyncLogger$1.log(AsyncLogger.java:157)
at org.apache.logging.log4j.core.async.AsyncLogger.logMessage(AsyncLogger.java:130)
at org.apache.logging.log4j.spi.ExtendedLoggerWrapper.logMessage(ExtendedLoggerWrapper.java:222)
at org.apache.logging.log4j.spi.AbstractLogger.log(AbstractLogger.java:2117)
at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2205)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2159)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2142)
at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2022)
at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1875)
at com.ragi.common.util.Log4jWrapper.debug(Log4jWrapper.java:94)
This clearly shows that the Scheduler_Worker-7 thread is holding the lock on 0x00000005522ce890 and that other threads (Scheduler_Worker-6, for example) are waiting to lock 0x00000005522ce890.
Also, from heap dumps, I observed that the log ring buffer is full. As I am using the default DefaultAsyncQueueFullPolicy, I expected new log events to go directly to the appender instead of the buffer when the buffer is full.
Why is the Scheduler_Worker-7 thread holding the lock on the object for so long while in the TIMED_WAITING state, and why are the other threads that are trying to log stuck in the BLOCKED state?
Note: There are no deadlocks observed from thread dump.
I'm new to Java multithreading and curious to know the state of an idle thread in a ThreadPoolExecutor. Is it RUNNABLE or WAITING?
If the idle threads are in the RUNNABLE state, how are new tasks attached to idle threads? AFAIK we assign a Runnable/Callable object to a thread or pool. But my question is: how does ThreadPoolExecutor assign queued Runnable objects to an idle thread?
It's easy enough to find out:
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.io.IOException;
public class ThreadExample {
public static void main(String[] args) throws IOException {
Executor executor = Executors.newFixedThreadPool(5);
// force the threads to be started
for (int i = 0; i < 5; i++) {
executor.execute(() -> {
try {Thread.sleep(1000);} catch (InterruptedException e) {
}
});
}
// don't terminate
System.in.read();
}
}
Run it:
$ javac ThreadExample.java
$ java ThreadExample
In another console, having waited at least one second for the tasks to complete:
$ ps
PID TTY TIME CMD
3640 ttys000 0:00.25 -bash
5792 ttys000 0:00.15 java ThreadExample
5842 ttys001 0:00.05 -bash
$ jstack 5792
...
"pool-1-thread-1" #12 prio=5 os_prio=31 cpu=1.77ms elapsed=13.37s tid=0x00007fe99f833800 nid=0xa203 waiting on condition [0x00007000094b2000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.2/Native Method)
- parking to wait for <0x000000061ff9e998> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(java.base@11.0.2/LockSupport.java:194)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.2/AbstractQueuedSynchronizer.java:2081)
at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.2/LinkedBlockingQueue.java:433)
at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@11.0.2/ThreadPoolExecutor.java:1054)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.2/ThreadPoolExecutor.java:1114)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.2/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base@11.0.2/Thread.java:834)
...
All the pool threads are in that state.
Curious to know the state of idle thread in case of ThreadPoolExecutor. Is it in RUNNABLE/WAITING?
It will be WAITING. It is waiting (in a Queue.take() call) for a new task to appear on the work queue. In the current implementations this involves a mechanism similar to wait / notify.
Your second question is therefore moot.
However, it is worth noting the following:
No "idle" thread will ever be RUNNABLE.
In current generation HotSpot JVMs, the actual scheduling (deciding which threads get priority and assigning them a core to run on) is handled by the operating system.
In Loom JVMs (Loom is still an Incubator project), light-weight virtual threads ("fibres") are scheduled (to a native thread) by the JVM rather than the OS.
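The same observation can be made from inside the JVM instead of with jstack. The sketch below is an illustration only; IdlePoolStates and idleWorkerStates are hypothetical names, and the custom thread factory exists solely so that the worker threads can be inspected afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class IdlePoolStates {

    // Starts a fixed pool, lets every worker run one trivial task,
    // then reports the state of each (now idle) worker thread.
    static List<Thread.State> idleWorkerStates(int poolSize) {
        List<Thread> workers = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            workers.add(t); // remember the worker so its state can be read later
            return t;
        });
        try {
            // A fixed pool creates one new worker per submission until it is full.
            for (int i = 0; i < poolSize; i++) {
                pool.execute(() -> { });
            }
            long deadline = System.currentTimeMillis() + 5000;
            while (!allWaiting(workers) && System.currentTimeMillis() < deadline) {
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            List<Thread.State> states = new ArrayList<>();
            for (Thread t : workers) {
                states.add(t.getState());
            }
            return states;
        } finally {
            pool.shutdownNow();
        }
    }

    private static boolean allWaiting(List<Thread> threads) {
        return threads.stream().allMatch(t -> t.getState() == Thread.State.WAITING);
    }

    public static void main(String[] args) {
        System.out.println(idleWorkerStates(5));
    }
}
```

Each idle worker is parked inside the work queue's take() call, which is why the reported state is WAITING rather than RUNNABLE.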
I am using Java's Thread to connect via SMTP to our mail provider, as this can take some time to finish and I don't want the request to wait.
But it looks like the threads are not closed after they have finished.
I noticed this in the debug mode of Eclipse:
Each time I create a new Thread(), it adds one running thread, but it does not seem to close it (at least I assume so, as Eclipse still shows Running).
This is my code:
Thread mailThread = new Thread() {
public void run() {
System.out.println("Does it work?");
try {
Transport t = session.getTransport("smtp");
t.connect("user","pass");
t.sendMessage(message,message.getAllRecipients());
t.close();
System.out.println("SENT");
return;
} catch (MessagingException e) {
// TODO Auto-generated catch block
e.printStackTrace();
return;
}
}
};
mailThread.start();
Is this working as intended? Or does Running in eclipse mean something different?
I suggest not relying only on the debugger to see which threads exist at a certain point in time. Debuggers might display threads that are active during a breakpoint but would not be there under normal conditions.
It is preferable to use the command-line tool jstack to create thread dumps. It dumps all the threads in a JVM at a certain point in time.
Here are some instructions on how to use it: https://helpx.adobe.com/uk/experience-manager/kb/TakeThreadDump.html
Another thing that could help you debug and find threads in the dump: give each thread a name by passing a string to one of the constructors.
new Thread("foo")
Then it becomes easier to find these in the thread dump.
If you call a thread "foo" then it will show up in a thread dump like this:
"foo" #16 prio=5 os_prio=0 tid=0x0000000041970800 nid=0x41f8 waiting on condition [0x000000004244e000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(java.base@9/Native Method)
at stackoverflow.ThreadReferenceTest$1.run(ThreadReferenceTest.java:14)
Locked ownable synchronizers:
- None
"Service Thread" #15 daemon prio=9 os_prio=0 tid=0x0000000041914000 nid=0x3d90 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
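A named thread can also be looked up from within the program, which gives a quick way to check whether a finished thread is really gone. This is a small sketch; the thread name "mail-sender" and the class NamedThreadLookup are made up for the example.

```java
import java.util.Optional;

public class NamedThreadLookup {

    // Searches the live threads of this JVM for one with the given name.
    static Optional<Thread> findByName(String name) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && t.getName().equals(name))
                .findFirst();
    }

    static String describe(String name) {
        return findByName(name)
                .map(t -> name + " " + t.getState())
                .orElse(name + " not found");
    }

    // Returns { description while the thread sleeps, description after it has ended }.
    static String[] runDemo() {
        Thread sender = new Thread(() -> {
            try {
                Thread.sleep(60_000); // stands in for a slow SMTP call
            } catch (InterruptedException e) {
                // finish early
            }
        }, "mail-sender");
        sender.setDaemon(true);
        sender.start();

        long deadline = System.currentTimeMillis() + 5000;
        try {
            while (sender.getState() != Thread.State.TIMED_WAITING
                    && System.currentTimeMillis() < deadline) {
                Thread.sleep(1);
            }
            String whileRunning = describe("mail-sender");
            sender.interrupt();
            sender.join(5000);
            String afterFinish = describe("mail-sender");
            return new String[] {whileRunning, afterFinish};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new String[] {describe("mail-sender"), describe("mail-sender")};
        }
    }

    public static void main(String[] args) {
        for (String line : runDemo()) {
            System.out.println(line);
        }
    }
}
```

Once run() has returned, the thread no longer appears among the live threads, even if a debugger view has not refreshed yet.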
I have been trying to use the LMAX Disruptor to buffer the content produced by one of my programs and publish it to another program as a batch of records (I have not yet managed to get the consumer batching part done). Even without batching the records, it works as it should. My problem is that even though I call
`disruptor.shutdown()` and `executorService.shutdownNow()`
as shown in one of the examples, the program does not stop executing. It even executes the statements below those method calls. When I print
executorService.isShutdown();
it returns true. Can someone help me with this?
Edit
"pool-1-thread-1" prio=10 tid=0x00007f57581b9800 nid=0x1bec waiting on condition [0x00007f573eb0d000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000d9110148> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at com.lmax.disruptor.BlockingWaitStrategy.waitFor(BlockingWaitStrategy.java:45)
at com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:55)
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:123)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Some tips that help me:
1. Set daemon flag on ExecutorService threads
Do this with a ThreadFactory (or Guava's ThreadFactoryBuilder).
Example:
final ThreadFactory threadFactory =
new ThreadFactory() {
@Override
public Thread newThread(Runnable r) {
final ThreadFactory threadFactory = Executors.defaultThreadFactory();
final Thread thread = threadFactory.newThread(r);
thread.setDaemon(true);
return thread;
}
};
final ExecutorService executorService =
Executors.newFixedThreadPool(threadCount, threadFactory);
2. Shutdown order
Disruptor.shutdown(long, TimeUnit)
Disruptor.halt()
ExecutorService.shutdown()
ExecutorService.awaitTermination(long, TimeUnit)
Impatient shutdown example:
try {
disruptor.shutdown(0, TimeUnit.NANOSECONDS);
// if shutdown is successful:
// 1. exception is not thrown (obviously)
// 2. Disruptor.halt() is called automatically (less obvious)
}
catch (TimeoutException e) {
disruptor.halt();
}
executorService.shutdown();
executorService.awaitTermination(0, TimeUnit.NANOSECONDS);
3. Use a Shutdown Hook
These are called even when System.exit(int) is called, but not if your JVM is killed with SIGKILL (or the equivalent on non-POSIX platforms).
Runtime.getRuntime()
.addShutdownHook(
new Thread(
() -> {
// shutdown here
}));
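The executor half of tips 1 and 2 can be tried with the JDK alone. The sketch below deliberately leaves the Disruptor out and only illustrates a daemon thread factory followed by shutdown() and awaitTermination(); the class name and the five-second timeout are arbitrary choices for the example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ExecutorShutdownDemo {

    // Runs a few tasks on daemon workers, then shuts the pool down in order.
    // Returns true if all tasks ran and the pool terminated within the timeout.
    static boolean shutdownCleanly() {
        ThreadFactory daemonFactory = r -> {
            Thread t = Executors.defaultThreadFactory().newThread(r);
            t.setDaemon(true); // daemon workers never keep the JVM alive
            return t;
        };
        ExecutorService executor = Executors.newFixedThreadPool(2, daemonFactory);

        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < 4; i++) {
            executor.execute(done::incrementAndGet);
        }

        executor.shutdown(); // stop accepting work, let queued tasks finish
        try {
            boolean terminated = executor.awaitTermination(5, TimeUnit.SECONDS);
            if (!terminated) {
                executor.shutdownNow(); // give up and interrupt the workers
            }
            return terminated && done.get() == 4;
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("clean shutdown: " + shutdownCleanly());
    }
}
```

Note that isShutdown() only reports that shutdown was requested; awaitTermination (or isTerminated) is what tells you whether the worker threads have actually finished.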
Your Java process only stops when all non-daemon threads have finished.
Probably some thread is still running, maybe waiting on a lock, maybe in a loop.
To see which threads are still running you can use the JDK tools:
Use jps to get the ids of running Java processes:
C:\DVE\jdk\jdk8u45x64\jdk1.8.0_45\bin>jps
4112 TestExMain
With the right id for your program use the command jstack:
C:\DVE\jdk\jdk8u45x64\jdk1.8.0_45\bin>jstack 4112
2015-09-17 09:12:45
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.45-b02 mixed mode):
"Service Thread" #9 daemon prio=9 os_prio=0 tid=0x000000001d208800 nid=0x1b7c runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"main" #1 prio=5 os_prio=0 tid=0x0000000002260800 nid=0x1324 waiting on condition [0x000000000224f000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.example.TestExMain.main(TestExMain.java:8)
For example, here you'll see a thread named Service Thread, which is a daemon; this thread won't stop your program from shutting down.
The thread main is not a daemon thread; the Java process will wait for this thread to finish before it stops.
For each thread you'll see a stack trace showing where the thread currently is; with that you can find the code that possibly keeps the thread from finishing.
The specific thread you posted is parked waiting for something (I'm not sure why; it could be a wait() call, a synchronized block or some other locking mechanism). If that thread doesn't stop when you call disruptor.shutdown(), it might be a bug in the library you use.
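If attaching jstack is inconvenient, the non-daemon threads that would keep the process alive can also be listed from inside the program. This is a sketch with made-up names (NonDaemonThreads, "keeps-jvm-alive"), meant only to illustrate the daemon distinction described above.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.stream.Collectors;

public class NonDaemonThreads {

    // Names of all live threads that would prevent the JVM from exiting.
    static List<String> liveNonDaemonThreadNames() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && !t.isDaemon())
                .map(Thread::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    // Starts a non-daemon thread that blocks, lists the threads, then releases it.
    static List<String> runDemo() {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread blocker = new Thread(() -> {
            started.countDown();
            try {
                release.await(); // stays alive until released
            } catch (InterruptedException e) {
                // exit
            }
        }, "keeps-jvm-alive");
        blocker.start();
        try {
            started.await();
            return liveNonDaemonThreadNames();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return liveNonDaemonThreadNames();
        } finally {
            release.countDown(); // let the thread finish so the JVM can exit
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```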
I'm seeing a problem with multiple threads deadlocking on the same line of code.
I cannot reproduce the problem locally or in any test, yet thread dumps from production show the problem quite clearly.
I can't see why the threads would become blocked on the synchronized line below, since there is no other synchronization on the object in the call stack or in any other thread. Does anyone have any idea what is going on, or how I can reproduce this issue? (I am currently trying with 15 threads all hitting trim() in a loop while processing 2000 tasks through my queue, but I am unable to reproduce it.)
In the Thread dump below, I think the multiple Threads with the 'locked' status may be a manifestation of Java Bug: http://bugs.java.com/view_bug.do?bug_id=8047816 where JStack reports Threads in wrong state.
(I'm using JDK Version: 1.7.0_51)
Cheers!
Here is a view of the Threads in the Thread dump.....
"xxx>Job Read-3" daemon prio=10 tid=0x00002aca001a6800 nid=0x6a3b waiting for monitor entry [0x0000000052ec4000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- locked <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Locked ownable synchronizers:
- <0x00002aaf5f9c2680> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"xxx>Job Read-2" daemon prio=10 tid=0x00002aca001a5000 nid=0x6a3a waiting for monitor entry [0x0000000052d83000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- locked <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Locked ownable synchronizers:
- <0x00002aaf5f9ed518> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"xxx>Job Read-1" daemon prio=10 tid=0x00002aca00183000 nid=0x6a39 waiting for monitor entry [0x0000000052c42000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- waiting to lock <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Locked ownable synchronizers:
- <0x00002aaf5f9ecde8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"xxx>Job Read-0" daemon prio=10 tid=0x0000000006a83000 nid=0x6a36 waiting for monitor entry [0x000000005287f000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- waiting to lock <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Here is the Java code extracted, which shows where the error is...
public class Deadlock {
final Deque<Object> delegate = new ArrayDeque<>();
final long maxSize = Long.MAX_VALUE;
private final AtomicLong totalExec = new AtomicLong();
private final Map<Object, AtomicLong> totals = new HashMap<>();
private final Map<Object, Deque<Long>> execTimes = new HashMap<>();
public void trim() {
//Possible optimization is evicting in chunks, segmenting by arrival time
while (this.totalExec.longValue() > this.maxSize) {
final Object t = this.delegate.peek();
final Deque<Long> execTime = this.execTimes.get(t);
final Long exec = execTime.peek();
if (exec != null && this.totalExec.longValue() - exec > this.maxSize) {
//If Job Started Inside of Window, remove and re-loop
remove();
}
else {
//Otherwise exit the loop
break;
}
}
}
public Object remove() {
Object removed;
synchronized (this.delegate) { //4 Threads deadlocking on this line !
removed = this.delegate.pollFirst();
}
if (removed != null) {
itemRemoved(removed);
}
return removed;
}
public void itemRemoved(final Object t) {
//Decrement Total & Queue
final AtomicLong catTotal = this.totals.get(t);
if (catTotal != null) {
if (!this.execTimes.get(t).isEmpty()) {
final Long exec = this.execTimes.get(t).pollFirst();
if (exec != null) {
catTotal.addAndGet(-exec);
this.totalExec.addAndGet(-exec);
}
}
}
}
}
From the documentation for HashMap
Note that this implementation is not synchronized. If multiple threads
access a hash map concurrently, and at least one of the threads
modifies the map structurally, it must be synchronized externally.
(Emphasis theirs)
You are both reading and writing to/from the Maps in an unsynchronized manner.
I see no reason to assume that your code is thread safe.
I suggest that you have an infinite loop in trim caused by this lack of thread safety.
Entering a synchronized block is relatively slow, so it's likely that a thread dump will always show at least a few threads waiting to obtain the lock.
Your first thread is holding the lock while waiting for pollFirst.
"xxx>Job Read-3" daemon prio=10 tid=0x00002aca001a6800 nid=0x6a3b waiting for monitor entry [0x0000000052ec4000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- locked <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
The other threads are waiting to obtain the lock.
You will need to provide the entire thread dump to determine which thread actually holds the monitor <0x00002aae6465a650> (the value in square brackets, 0x0000000052ec4000, is a stack address, not a lock), since that is what is keeping the other threads out of the synchronized block.
In order to deadlock, you need at least two threads, each holding one lock while trying to acquire a lock held by the other, which is something the code you posted doesn't appear to do. The bug you point to may apply, but as I read it, it's a cosmetic issue: the threads have not 'locked' the object but are waiting to acquire a lock on it (the ArrayDeque). You should see a "deadlock" message in the thread dump if you have a deadlock. It will call out the threads that are blocking each other.
I don't believe the thread dump says there are deadlocks. It's simply telling you how many threads are waiting on the monitor at the moment you took the dump. Since only one thread may have the monitor at a given moment, it shouldn't be very surprising.
What behavior are you seeing in your application that leads you to believe you have a deadlock? There's a lot missing from your code, particularly where the objects in the delegate Deque are coming from. My guess is you don't have an outright deadlock but some other issue that may look like a deadlock.
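For comparison, here is what a real monitor deadlock looks like and how the JVM can report it. This is a deliberately broken sketch with hypothetical names: two threads take two locks in opposite order, and ThreadMXBean.findMonitorDeadlockedThreads() is polled until it reports the cycle. The threads are daemons so that the demo can still exit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockDetectDemo {

    // Creates a two-thread lock-ordering deadlock and returns how many
    // threads the JVM reports as deadlocked on monitors (0 if none is found).
    static int deadlockedThreadCount() {
        Object a = new Object();
        Object b = new Object();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);

        Thread t1 = new Thread(() -> lockInOrder(a, b, bothHoldFirst), "dl-1");
        Thread t2 = new Thread(() -> lockInOrder(b, a, bothHoldFirst), "dl-2");
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + 5000;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findMonitorDeadlockedThreads(); // null when there is no deadlock
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return 0;
    }

    // Takes 'first', waits until the other thread holds its own first lock,
    // then tries to take 'second', which the other thread will never release.
    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await();
            } catch (InterruptedException e) {
                return;
            }
            synchronized (second) {
                // never reached once both threads hold their first lock
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + deadlockedThreadCount());
    }
}
```

Threads that are merely queued behind one busy lock holder, as in the dump above, are not reported by this check, because no cycle exists.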
Thanks to the responses here, it became clear that the issue was non-thread-safe usage of multiple collections.
To resolve the issue, I've made the trim method synchronized and replaced HashMap with ConcurrentHashMap and ArrayDeque with LinkedBlockingDeque.
(Concurrent Collections FTW!)
A further planned enhancement is to change the usage of 2 separate Maps into a single Map containing a Custom Object, that way keeping the operations (in itemRemoved) atomic.
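One possible shape for that enhancement is sketched below. It is an assumption about how the single map could look, not the poster's actual code: the Stats class, the record method and the demo are hypothetical, and ConcurrentHashMap.compute is used so that the per-key total and the queue of execution times change together.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

public class CategoryStatsDemo {

    // Hypothetical per-key record replacing the separate 'totals' and 'execTimes' maps.
    static final class Stats {
        long total;
        final Deque<Long> execTimes = new ArrayDeque<>();
    }

    private final ConcurrentMap<Object, Stats> stats = new ConcurrentHashMap<>();
    private final AtomicLong totalExec = new AtomicLong();

    void record(Object key, long exec) {
        // compute() runs atomically per key, so both fields change together.
        stats.compute(key, (k, s) -> {
            if (s == null) {
                s = new Stats();
            }
            s.execTimes.addLast(exec);
            s.total += exec;
            return s;
        });
        totalExec.addAndGet(exec);
    }

    void itemRemoved(Object key) {
        stats.computeIfPresent(key, (k, s) -> {
            Long exec = s.execTimes.pollFirst();
            if (exec != null) {
                s.total -= exec;
                totalExec.addAndGet(-exec);
            }
            return s;
        });
    }

    long totalFor(Object key) {
        long[] out = {0};
        // Read inside computeIfPresent so the value is seen under the same per-key lock.
        stats.computeIfPresent(key, (k, s) -> {
            out[0] = s.total;
            return s;
        });
        return out[0];
    }

    // Four threads each record 1000 units and then remove 500 of them.
    static long[] runDemo() {
        CategoryStatsDemo demo = new CategoryStatsDemo();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    demo.record("A", 1);
                }
                for (int n = 0; n < 500; n++) {
                    demo.itemRemoved("A");
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new long[] {demo.totalFor("A"), demo.totalExec.get()};
    }

    public static void main(String[] args) {
        long[] r = runDemo();
        System.out.println("per-key total = " + r[0] + ", overall total = " + r[1]);
    }
}
```

Keeping both values in one object means there is no moment at which the total and the queue for a key disagree, which was the weak point of updating two separate maps.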