I'm seeing a problem with multiple Threads deadlocking on the same line of code.
I cannot reproduce the problem locally or in any test, but yet Thread Dumps from Production have show the problem quite clearly.
I can't see why the Threads would become blocked on the synchronized line below, since there is no other synchronization on the Object in the call stack or in any other Thread. Does anyone have any idea what is going on, or how I can even reproduce this issue (Currently trying with 15 Threads all hitting trim() in a loops, while processing 2000 tasks through my Queue - But unable to reproduce)
In the Thread dump below, I think the multiple Threads with the 'locked' status may be a manifestation of Java Bug: http://bugs.java.com/view_bug.do?bug_id=8047816 where JStack reports Threads in wrong state.
(I'm using JDK Version: 1.7.0_51)
Cheers!
Here is a view of the Threads in the Thread dump.....
"xxx>Job Read-3" daemon prio=10 tid=0x00002aca001a6800 nid=0x6a3b waiting for monitor entry [0x0000000052ec4000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- locked <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Locked ownable synchronizers:
- <0x00002aaf5f9c2680> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"xxx>Job Read-2" daemon prio=10 tid=0x00002aca001a5000 nid=0x6a3a waiting for monitor entry [0x0000000052d83000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- locked <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Locked ownable synchronizers:
- <0x00002aaf5f9ed518> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"xxx>Job Read-1" daemon prio=10 tid=0x00002aca00183000 nid=0x6a39 waiting for monitor entry [0x0000000052c42000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- waiting to lock <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Locked ownable synchronizers:
- <0x00002aaf5f9ecde8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"xxx>Job Read-0" daemon prio=10 tid=0x0000000006a83000 nid=0x6a36 waiting for monitor entry [0x000000005287f000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- waiting to lock <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Here is the Java code extracted, which shows where the error is...
public class Deadlock {
final Deque<Object> delegate = new ArrayDeque<>();
final long maxSize = Long.MAX_VALUE;
private final AtomicLong totalExec = new AtomicLong();
private final Map<Object, AtomicLong> totals = new HashMap<>();
private final Map<Object, Deque<Long>> execTimes = new HashMap<>();
public void trim() {
//Possible optimization is evicting in chunks, segmenting by arrival time
while (this.totalExec.longValue() > this.maxSize) {
final Object t = this.delegate.peek();
final Deque<Long> execTime = this.execTimes.get(t);
final Long exec = execTime.peek();
if (exec != null && this.totalExec.longValue() - exec > this.maxSize) {
//If Job Started Inside of Window, remove and re-loop
remove();
}
else {
//Otherwise exit the loop
break;
}
}
}
public Object remove() {
Object removed;
synchronized (this.delegate) { //4 Threads deadlocking on this line !
removed = this.delegate.pollFirst();
}
if (removed != null) {
itemRemoved(removed);
}
return removed;
}
public void itemRemoved(final Object t) {
//Decrement Total & Queue
final AtomicLong catTotal = this.totals.get(t);
if (catTotal != null) {
if (!this.execTimes.get(t).isEmpty()) {
final Long exec = this.execTimes.get(t).pollFirst();
if (exec != null) {
catTotal.addAndGet(-exec);
this.totalExec.addAndGet(-exec);
}
}
}
}
}
From the documentation for HashMap
Note that this implementation is not synchronized. If multiple threads
access a hash map concurrently, and at least one of the threads
modifies the map structurally, it must be synchronized externally.
(Emphasis theirs)
You are both reading and writing to/from the Maps in an unsynchronized manner.
I see no reason to assume that your code is thread safe.
I suggest that you have an infinite loop in trim caused by this lack of thread safety.
Entering a synchronized block is relatively slow, so it's likely that a thread dump will always show at least a few threads waiting to obtain the lock.
Your first thread is holding the lock while waiting for pollFirst.
"xxx>Job Read-3" daemon prio=10 tid=0x00002aca001a6800 nid=0x6a3b waiting for monitor entry [0x0000000052ec4000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- locked <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
The other threads are waiting to obtain the lock.
You will need to provide the entire thread dump to determine which thread is holding the lock on 0x0000000052ec4000, which is what is preventing your pollFirst call from returning.
In order to deadlock, you need at least two be locking on at least two objects in the same thread at the same time which is something the code you posted doesn't appear to do. The bug you point to may apply but as I read it, it's a cosmetic issue and that the threads are not 'locked', but waiting to acquire a lock on the object in question (the ArrayDeque). You should see a "deadlock" message in your logs if you have a deadlock. It will call out the two threads that are blocking each other.
I don't believe the thread dump says there are deadlocks. It's simply telling you how many threads are waiting on the monitor at the moment you took the dump. Since only one thread may have the monitor at a given moment, it shouldn't be very surprising.
What behavior are you seeing in your application that lead you to believe you have a deadlock? There's a lot missing from your code particularly where the objects in the delegate Dequeue are coming from. My guess is you don't have an outright deadlock but some other issue that may look like a deadlock.
Thanks to the responses here, it became clear that the issue was none Thread Safe usage of multiple Collections.
To resolve the issue, I've made the trim method synchronized and replaced usage of HashMap with ConcurrentHashMap and ArrayDeque with LinkedBlockingDeque
(Concurrent Collections FTW!)
A further planned enhancement is to change the usage of 2 separate Maps into a single Map containing a Custom Object, that way keeping the operations (in itemRemoved) atomic.
Related
I'm looking at a jstack log and this is what i see:
"com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#2" #250 daemon prio=5 os_prio=0 tid=0x00007f9de0016000 nid=0x7e54 runnable [0x00007f9d6495a000]
java.lang.Thread.State: RUNNABLE
at com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:534)
- locked <0x00000006fa818a38> (a com.mchange.v2.async.ThreadPoolAsynchronousRunner)
"com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#1" #249 daemon prio=5 os_prio=0 tid=0x00007f9de000c000 nid=0x7e53 waiting for monitor entry [0x00007f9d649db000]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000006fa818a38> (a com.mchange.v2.async.ThreadPoolAsynchronousRunner)
at com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:534)
- locked <0x00000006fa818a38> (a com.mchange.v2.async.ThreadPoolAsynchronousRunner)
"com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#0" #248 daemon prio=5 os_prio=0 tid=0x00007f9de001a000 nid=0x7e52 waiting for monitor entry [0x00007f9d64a5c000]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000006fa818a38> (a com.mchange.v2.async.ThreadPoolAsynchronousRunner)
at com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:534)
- locked <0x00000006fa818a38> (a com.mchange.v2.async.ThreadPoolAsynchronousRunner)
So, in this log, each of these three threads has managed to get the same lock and the bottom two threads are actually blocked waiting for the same lock.
Can someone please explain to me what this stack log means?
The last two threads are waiting to be notified by using the instance of ThreadPoolAsynchronousRunner as monitor, so the source of that will look something like this:
synchronized(asyncRunner) {
// ...
asyncRunner.wait();
// ...
}
As soon as you call wait, the synchronization on asyncRunner is "released", i.e. other parts of the application can enter a block that is synchronized on that instance. In your particular case it seems that this has happened and the first thread's wait-call returned and it's currently processing some data that comes from it. You still see multiple locked-lines in the thread-dump to show you that the code is currently within a synchronized-block but as said, the "lock" is released when calling wait.
The technique you see here as a thread-dump is quite common before the concurrent-package was added to the JDK to avoid costly thread-creations. And your thread-dump looks like this kind of implementation. Here is a simple implementation how it might look like "under the hood":
// class ThreadPoolAsynchronousRunner
private Deque<AsyncMessage> queue;
public synchronized void addAsyncMessage(AsyncMessage msg) {
queue.add(msg);
notifyAll();
}
public void start() {
for (int i = 0; i < 4; i++) {
PoolThread pt = new PoolThread(this);
pt.start();
}
}
The ThreadPoolAsynchronousRunner`` starts PoolThreads and does a notifyAll if a new message to be processed is added.
// PoolThread
public PoolThread(ThreadPoolAsynchronousRunner parent) {
this.parent = parent;
}
public void run() {
try {
while (true) {
AsyncMessage msg = null;
synchronized(parent) {
parent.wait();
if (!parent.queue.isEmpty()) {
msg = queue.removeFirst();
}
}
if (msg != null) {
processMsg(msg);
}
}
}
catch(InterruptedException ie) {
// exit
}
}
notifyAll will lead all wait-methods of all threads to return, so you have to check if the queue in the parent still contains data (sometimes wait returns even without a notification taken place, so you need this check even if not using notifyAll). If that's the case you start the processing method. You should do that outside the synchronized-block otherwise your async-processing class only processes one message at the time (unless, that's what you want - but then why run multiple PoolThread-instances?)
Only Thread-#2 has managed to get Object lock successfully and it is in RUNNABLE state. Other 2 threads, i.e., Thread-#0 and Thread-#1 are waiting for that lock to be released by Thread-#2. As long as Thread-#2 holds the lock, Thread-#0 and Thread-#1 will remain locked and will be in a state BLOCKED.
If you have access to source code, You can review that code just to ensure if locking and unlocking is done in proper order and lock has been been held only for part of code where it is necessary. Remember these 2 threads are not in WAIT state but in BLOCKED state which is a step after WAIT state and just a step before getting in to RUNNABLE state as soon as lock is available.
There is no problem observed in this log snippet. This is not a deadlock scenario yet.
What I can see and understand is that
Thread-#2 is in Runnable state and has acquired a lock on an Object
"com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#2"
java.lang.Thread.State: RUNNABLE
Thread-#1 and Thread-#0 are waiting for that Object lock to be released and hence blocked right now.
"com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#1"
"com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#0"
java.lang.Thread.State: BLOCKED (on object monitor) at
java.lang.Object.wait(Native Method) -
waiting on <0x00000006fa818a38>
I've got a thread dump for a deadlock and I can't see the cause. On first inspection it looks like some client code simply fails to acquire the lock on a ReentrantLock which is owned by MyClass:
"qtp1450652220-77" Id=77 WAITING on java.util.concurrent.locks.ReentrantLock$NonfairSync#1e319fef owned by "pool-2-thread-2" Id=1651
at sun.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.locks.ReentrantLock$NonfairSync#1e319fef
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at com.mycode.MyClass.methodName(MyClass.java:1008)
However the owning thread's dump is:
"pool-2-thread-2" Id=1651 WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject#56171f7a
at sun.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject#56171f7a
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Number of locked synchronizers = 1
- java.util.concurrent.locks.ReentrantLock$NonfairSync#1e319fef
Sure enough the lock on the ReentrantLock is listed at the bottom. But what surprises me is there's none of my client code in the thread dump. There's no indication as to how that ReentrantLock was acquired in the first place, so how can I fix it?
The code in MyClass is:
public Collection<String> methodName() {
interruptLock.lock();
try {
/* do stuff */
return tagsToReturn;
} finally {
interruptLock.unlock();
}
}
Line 1008 is the interruptLock.lock(); line.
It is possible that you have to capture the thread stack with jstack and -l option:
https://docs.oracle.com/javase/8/docs/technotes/tools/unix/jstack.html
-l
Long listing. Prints additional information about locks such as a list of owned java.util.concurrent ownable synchronizers. See the
AbstractOwnableSynchronizer class description at
http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/AbstractOwnableSynchronizer.html
I am using javas Thread to connect via SMTP to our mailprovider as this can take some time until it finishes and I dont want the request to wait.
But it looks like the threads are not closed after they are finished.
I noticed this in the debug mode of Eclipse:
For each time I create a new Thread(), it adds one running thread, but it is not closing it (at least I assume this, as eclipse still shows Running).
This is my code:
Thread mailThread = new Thread() {
public void run() {
System.out.println("Does it work?");
try {
Transport t = session.getTransport("smtp");
t.connect("user","pass");
t.sendMessage(message,message.getAllRecipients());
t.close();
System.out.println("SENT");
return;
} catch (MessagingException e) {
// TODO Auto-generated catch block
e.printStackTrace();
return;
}
}
};
mailThread.start();
Is this working as intended? Or does Running in eclipse mean something different?
I suggest not only to use the debugger to see, to see which threads you have at a certain point in time. Debuggers might display threads which are active during a break point but should not be there under normal conditions.
It is preferrable to use the command line tool jstack to create thread dumps. This will dump all the threads in a JVM at a certain point in time.
Here are some instructions on how to use it: https://helpx.adobe.com/uk/experience-manager/kb/TakeThreadDump.html
Another thing could help you debugging and finding threads in the dump: give threads a name using the string in one of the constructor.
new Thread("foo")
Then it becomes easier to find these in the thread dump.
If you call a thread "foo" then it will show up in a thread dump like this:
"foo" #16 prio=5 os_prio=0 tid=0x0000000041970800 nid=0x41f8 waiting on condition [0x000000004244e000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(java.base#9/Native Method)
at stackoverflow.ThreadReferenceTest$1.run(ThreadReferenceTest.java:14)
Locked ownable synchronizers:
- None
"Service Thread" #15 daemon prio=9 os_prio=0 tid=0x0000000041914000 nid=0x3d90 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
My application listen on kafka topic and dump data into cassandra. Threads loads some information from mongo too. Lag in kafka topic getting increased. I have seen that mostly threads are blocked while loading some class. I am attaching my thread_dump below.
"KafkaConsumer-49" prio=10 tid=0x00007f1178fdd000 nid=0x78e0 waiting for monitor entry [0x00007f1155fb5000]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.ClassLoader.loadClass(ClassLoader.java:403)
- waiting to lock <0x00000006c0655b58> (a java.lang.Object)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at org.springframework.util.ClassUtils.forName(ClassUtils.java:258)
at org.springframework.data.convert.SimpleTypeInformationMapper.resolveTypeFrom(SimpleTypeInformationMapper.java:56)
at org.springframework.data.convert.DefaultTypeMapper.readType(DefaultTypeMapper.java:103)
at org.springframework.data.convert.DefaultTypeMapper.getDefaultedTypeToBeUsed(DefaultTypeMapper.java:144)
at org.springframework.data.convert.DefaultTypeMapper.readType(DefaultTypeMapper.java:121)
at org.springframework.data.mongodb.core.convert.MappingMongoConverter.read(MappingMongoConverter.java:186)
at org.springframework.data.mongodb.core.convert.MappingMongoConverter.read(MappingMongoConverter.java:176)
at org.springframework.data.mongodb.core.convert.MappingMongoConverter.read(MappingMongoConverter.java:172)
at org.springframework.data.mongodb.core.convert.MappingMongoConverter.read(MappingMongoConverter.java:75)
at org.springframework.data.mongodb.core.MongoTemplate$ReadDbObjectCallback.doWith(MongoTemplate.java:1840)
at org.springframework.data.mongodb.core.MongoTemplate.executeFindMultiInternal(MongoTemplate.java:1536)
at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1336)
at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1322)
at org.springframework.data.mongodb.core.MongoTemplate.find(MongoTemplate.java:495)
at org.springframework.data.mongodb.core.MongoTemplate.find(MongoTemplate.java:486)
at com.snapdeal.coms.timemachine.mao.TimeMachineMao.getVendorProductsForUploadId(TimeMachineMao.java:32)
at com.snapdeal.coms.timemachine.service.TimeMachineService.getVendorProductsForUploadIdAndSupc(TimeMachineService.java:35)
at com.snapdeal.coms.timemachine.event.SupcUploadIdStateUpdateEventHandler.handleEvent(SupcUploadIdStateUpdateEventHandler.java:40)
KafkaConsumer-48" prio=10 tid=0x00007f1178fdb000 nid=0x78df waiting for monitor entry [0x00007f11560b6000]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.ClassLoader.loadClass(ClassLoader.java:403)
- waiting to lock <0x00000006c0655b58> (a java.lang.Object)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at org.springframework.util.ClassUtils.forName(ClassUtils.java:258)
at org.springframework.data.convert.SimpleTypeInformationMapper.resolveTypeFrom(SimpleTypeInformationMapper.java:56)
at org.springframework.data.convert.DefaultTypeMapper.readType(DefaultTypeMapper.java:103)
at org.springframework.data.convert.DefaultTypeMapper.getDefaultedTypeToBeUsed(DefaultTypeMapper.java:144)
at org.springframework.data.convert.DefaultTypeMapper.readType(DefaultTypeMapper.java:121)
at org.springframework.data.mongodb.core.convert.MappingMongoConverter.read(MappingMongoConverter.java:186)
at org.springframework.data.mongodb.core.convert.MappingMongoConverter.read(MappingMongoConverter.java:176)
at org.springframework.data.mongodb.core.convert.MappingMongoConverter.read(MappingMongoConverter.java:172)
at org.springframework.data.mongodb.core.convert.MappingMongoConverter.read(MappingMongoConverter.java:75)
at org.springframework.data.mongodb.core.MongoTemplate$ReadDbObjectCallback.doWith(MongoTemplate.java:1840)
at org.springframework.data.mongodb.core.MongoTemplate.executeFindMultiInternal(MongoTemplate.java:1536)
at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1336)
at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1322)
at org.springframework.data.mongodb.core.MongoTemplate.find(MongoTemplate.java:495)
at org.springframework.data.mongodb.core.MongoTemplate.find(MongoTemplate.java:486)
at com.snapdeal.coms.timemachine.mao.TimeMachineMao.getVendorProductsForUploadId(TimeMachineMao.java:32)
at com.snapdeal.coms.timemachine.service.TimeMachineService.getVendorProductsForUploadIdAndSupc(TimeMachineService.java:35)
at com.snapdeal.coms.timemachine.event.SupcUploadIdStateUpdateEventHandler.handleEvent(SupcUploadIdStateUpdateEventHandler.java:40)
at com.snapdeal.coms.timemachine.TimeMachine.onEvent(TimeMachine.java:109)
"KafkaConsumer-47" prio=10 tid=0x00007f1178fd9800 nid=0x78de waiting for monitor entry [0x00007f11561b7000]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.ClassLoader.loadClass(ClassLoader.java:403)
- waiting to lock <0x00000006c0655b58> (a java.lang.Object)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at org.springframework.util.ClassUtils.forName(ClassUtils.java:258)
at org.springframework.data.convert.SimpleTypeInformationMapper.resolveTypeFrom(SimpleTypeInformationMapper.java:56)
at org.springframework.data.convert.DefaultTypeMapper.readType(DefaultTypeMapper.java:103)
at org.springframework.data.convert.DefaultTypeMapper.getDefaultedTypeToBeUsed(DefaultTypeMapper.java:144)
at org.springframework.data.convert.DefaultTypeMapper.readType(DefaultTypeMapper.java:121)
at org.springframework.data.mongodb.core.convert.MappingMongoConverter.read(MappingMongoConverter.java:186)
at org.springframework.data.mongodb.core.convert.MappingMongoConverter.read(MappingMongoConverter.java:176)
at org.springframework.data.mongodb.core.convert.MappingMongoConverter.read(MappingMongoConverter.java:172)
at org.springframework.data.mongodb.core.convert.MappingMongoConverter.read(MappingMongoConverter.java:75)
at org.springframework.data.mongodb.core.MongoTemplate$ReadDbObjectCallback.doWith(MongoTemplate.java:1840)
at org.springframework.data.mongodb.core.MongoTemplate.executeFindMultiInternal(MongoTemplate.java:1536)
at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1336)
at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1322)
at org.springframework.data.mongodb.core.MongoTemplate.find(MongoTemplate.java:495)
at org.springframework.data.mongodb.core.MongoTemplate.find(MongoTemplate.java:486)
"KafkaConsumer-46" prio=10 tid=0x00007f1178fd8000 nid=0x78dd waiting for monitor entry [0x00007f11562b8000]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.ClassLoader.loadClass(ClassLoader.java:403)
- waiting to lock <0x00000006c0655b58> (a java.lang.Object)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at org.springframework.util.ClassUtils.forName(ClassUtils.java:258)
at org.springframework.data.convert.SimpleTypeInformationMapper.resolveTypeFrom(SimpleTypeInformationMapper.java:56)
at org.springframework.data.convert.DefaultTypeMapper.readType(DefaultTypeMapper.java:103)
at org.springframework.data.convert.DefaultTypeMapper.getDefaultedTypeToBeUsed(DefaultTypeMapper.java:144)
at org.springframework.data.convert.DefaultTypeMapper.readType(DefaultTypeMapper.java:121)
at org.springframework.data.mongodb.core.convert.MappingMongoConverter.read(MappingMongoConverter.java:186)
at org.springframework.data.mongodb.core.convert.MappingMongoConverter.read(MappingMongoConverter.java:176)
at org.springframework.data.mongodb.core.convert.MappingMongoConverter.read(MappingMongoConverter.java:172)
at org.springframework.data.mongodb.core.convert.MappingMongoConverter.read(MappingMongoConverter.java:75)
at org.springframework.data.mongodb.core.MongoTemplate$ReadDbObjectCallback.doWith(MongoTemplate.java:1840)
at org.springframework.data.mongodb.core.MongoTemplate.executeFindMultiInternal(MongoTemplate.java:1536)
at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1336)
at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1322)
at org.springframework.data.mongodb.core.MongoTemplate.find(MongoTemplate.java:495)
at org.springframework.data.mongodb.core.MongoTemplate.find(MongoTemplate.java:486)
at com.snapdeal.coms.timemachine.mao.TimeMachineMao.getVendorProductsForUploadId(TimeMachineMao.java:32)
at com.snapdeal.coms.timemachine.service.TimeMachineService.getVendorProductsForUploadIdAndSupc(TimeMachineService.java:35)
at com.snapdeal.coms.timemachine.event.SupcUploadIdStateUpdateEventHandler.handleEvent(SupcUploadIdStateUpdateEventHandler.java:40)
I am not sure why all the threads are blocked. I thought class get loaded only one time and later no need to take any lock .
Did you try using the ConsumerOffsetChecker to see if your consumers are still alive ? you can try the following command from inside your $KAFKA_ROOT_DIR/ folder
bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group consumer-group1 --zkconnect zkhost:zkport --topic topic1
Here's few note taken from their FAQ page
If consumer offset is not moving after some time, then consumer is likely to have stopped. If consumer offset is moving, but consumer lag (difference between the end of the log and the consumer offset) is increasing, the consumer is slower than the producer. If the consumer is slow, the typical solution is to increase the degree of parallelism in the consumer. This may require increasing the number of partitions of a topic.
the above faq pages also explains possible reasons behind your consumer getting blocked, might worth take a look at it.
Problem was with the fetching data from mongo. there was huge data and pagination was not implemented and there was no socket timeout on the particular request hence threads were getting blocked.
Using ehCache 2.4.4, I seem to have gotten into a deadlock on the ehCache Segment object. From other logging, I know that the 'waiting thread', 1694 last ran anything 9 hours before this stack trace was generated. In the meantime, 1696 has gone and done a lot of other work, so this lock is definitely being held errantly.
I'm pretty confident that I am not directly locking any Segment instances directly, so I assume this is some kind of issue internal to the library. Any ideas?
"Model Executor - 1696" Id=1696 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject#92eb1ed
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
at java.util.concurrent.PriorityBlockingQueue.poll(Unknown Source)
at com.rtrms.application.modeling.local.BlockingTaskList.takeTask(BlockingTaskList.java:20)
at com.rtrms.application.modeling.local.ModelExecutor.executeNextTask(ModelExecutor.java:71)
at com.rtrms.application.modeling.local.ModelExecutor.run(ModelExecutor.java:46)
Locked synchronizers: count = 1
- java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync#4a3d767f
"Model Executor - 1694" Id=1694 in WAITING on lock=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync#4a3d767f
owned by Model Executor - 1696 Id=1696
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(Unknown Source)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(Unknown Source)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(Unknown Source)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(Unknown Source)
at net.sf.ehcache.store.compound.Segment.unretrievedGet(Segment.java:248)
at net.sf.ehcache.store.compound.CompoundStore.unretrievedGet(CompoundStore.java:191)
at net.sf.ehcache.store.compound.impl.DiskPersistentStore.containsKeyInMemory(DiskPersistentStore.java:72)
at net.sf.ehcache.Cache.searchInStoreWithStats(Cache.java:1884)
at net.sf.ehcache.Cache.get(Cache.java:1549)
at com.rtrms.amoeba.cache.DistributedModeledSecurities.get(DistributedModeledSecurities.java:57)
at com.rtrms.amoeba.modeling.AssertPersistedModeledSecurities.get(AssertPersistedModeledSecurities.java:44)
at com.rtrms.application.modeling.tasks.ExpandableModelingTask.getNextUnexecutedTask(ExpandableModelingTask.java:35)
at com.rtrms.application.modeling.local.BlockingTaskList.takeTask(BlockingTaskList.java:36)
at com.rtrms.application.modeling.local.ModelExecutor.executeNextTask(ModelExecutor.java:71)
at com.rtrms.application.modeling.local.ModelExecutor.run(ModelExecutor.java:46)
Locked synchronizers: count = 0
Turns out that calls like Cache.acquireWriteLockOnKey end up obtaining a lock on the internal Segment, so this apparent deadlock was caused by a .unlock call that wasn't in a finally block.
Editorial comment: It also implies that you can get contention trying to lock two different keys that just happened to be in the same Segment, which is pretty unfortunate.