Deadlock acquiring locks - java

I've got a thread dump for a deadlock and I can't see the cause. On first inspection it looks like some client code simply fails to acquire the lock on a ReentrantLock which is owned by MyClass:
"qtp1450652220-77" Id=77 WAITING on java.util.concurrent.locks.ReentrantLock$NonfairSync#1e319fef owned by "pool-2-thread-2" Id=1651
at sun.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.locks.ReentrantLock$NonfairSync#1e319fef
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at com.mycode.MyClass.methodName(MyClass.java:1008)
However the owning thread's dump is:
"pool-2-thread-2" Id=1651 WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject#56171f7a
at sun.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject#56171f7a
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Number of locked synchronizers = 1
- java.util.concurrent.locks.ReentrantLock$NonfairSync#1e319fef
Sure enough the lock on the ReentrantLock is listed at the bottom. But what surprises me is there's none of my client code in the thread dump. There's no indication as to how that ReentrantLock was acquired in the first place, so how can I fix it?
The code in MyClass is:
public Collection<String> methodName() {
    interruptLock.lock();
    try {
        /* do stuff */
        return tagsToReturn;
    } finally {
        interruptLock.unlock();
    }
}
Line 1008 is the interruptLock.lock(); line.

You may need to capture the thread dump with jstack using the -l option:
https://docs.oracle.com/javase/8/docs/technotes/tools/unix/jstack.html
-l
Long listing. Prints additional information about locks such as a list of owned java.util.concurrent ownable synchronizers. See the
AbstractOwnableSynchronizer class description at
http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/AbstractOwnableSynchronizer.html
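Besides jstack -l, the same information is available at runtime through the java.lang.management API, which lists the ownable synchronizers each thread currently owns. A minimal sketch (the class name is illustrative):

import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class SynchronizerDump {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // (lockedMonitors=true, lockedSynchronizers=true) is the programmatic
        // equivalent of "jstack -l": it reports owned ownable synchronizers per thread.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            LockInfo[] owned = info.getLockedSynchronizers();
            if (owned.length > 0) {
                System.out.println(info.getThreadName() + " (" + info.getThreadState() + ") owns:");
                for (LockInfo lock : owned) {
                    System.out.println("  - " + lock);
                }
            }
        }
    }
}

Calling this (or jstack -l) while the lock is held shows which thread owns the ReentrantLock and what that thread is doing at that moment.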

Related

Deadlock while using ORMLite

I have a multithreaded server Java application which receives requests and does queries/updates to a Postgres DB through OrmLite. Under load, several requests come in which are interested in the same DB row. Thread1 might select, change values and then update. At the same time Thread2 tries something similar. This is currently not synchronized and not done inside a transaction. Unsurprisingly, the update of Thread1 might not be seen by Thread2. That's OK (Thread2 can overwrite results from Thread1) and is not my problem.
However, when running this application for some time, I get to a deadlock situation, which results in all available DB connections being used up (and then a crash). It seems it is not a standard deadlock (with a circular lock dependency); instead, most threads are waiting on a lock, and the thread holding this lock seems to be waiting for a socket read (which probably never happens, see below).
Using
OrmLite 5.1,
JVM is Java 1.8.0_251 Hotspot Client VM,
Postgres JDBC 42.2.9
How should I go forward to fix this?
Below are relevant parts of the thread dump (analyzed by https://spotify.github.io/threaddump-analyzer )
The thread holding the main lock (0x00000000c0179e18) seems to be waiting on a socket:
"RaspService-2089": running, holding [0x00000000c0179e18, 0x00000000c2c1f6c0]
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:140)
at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:109)
at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:67)
at org.postgresql.core.PGStream.receiveChar(PGStream.java:335)
at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:505)
at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:141)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:192)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:211)
at org.postgresql.Driver.makeConnection(Driver.java:458)
at org.postgresql.Driver.connect(Driver.java:260)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at com.j256.ormlite.jdbc.JdbcConnectionSource.makeConnection(JdbcConnectionSource.java:266)
at com.j256.ormlite.jdbc.JdbcPooledConnectionSource.getReadWriteConnection(JdbcPooledConnectionSource.java:140)
at com.j256.ormlite.dao.BaseDaoImpl.update(BaseDaoImpl.java:408)
at vgs.vigi.servlet.OrmLite.update(OrmLite.java:361)
at vgs.vigi.servlet.CachedDao.update(CachedDao.java:287)
at vgs.vigi.ble.RaspClient.run(RaspClient.java:177)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
26 threads waiting to free connections wait on that lock with stacks like:
"pool-4-thread-96": waiting to acquire [0x00000000c0179e18], holding [0x00000000c0b250a8]
at com.j256.ormlite.jdbc.JdbcPooledConnectionSource.releaseConnection(JdbcPooledConnectionSource.java:168)
at com.j256.ormlite.dao.BaseDaoImpl.create(BaseDaoImpl.java:331)
at vgs.vigi.servlet.OrmLite.create(OrmLite.java:181)
at vgs.vigi.servlet.CachedDao.create(CachedDao.java:126)
at vgs.vigi.logic.Notification.sendNotification(Notification.java:491)
at vgs.vigi.logic.Notification$1.run(Notification.java:640)
at vgs.lib.MyTimer$2.run(MyTimer.java:103)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
More threads waiting to release connections
"RaspService-828": waiting to acquire [0x00000000c0179e18], holding [0x00000000c1187f88]
at com.j256.ormlite.jdbc.JdbcPooledConnectionSource.releaseConnection(JdbcPooledConnectionSource.java:168)
at com.j256.ormlite.dao.BaseDaoImpl.update(BaseDaoImpl.java:412)
at vgs.vigi.servlet.OrmLite.update(OrmLite.java:361)
at vgs.vigi.servlet.CachedDao.update(CachedDao.java:287)
at vgs.vigi.ble.CmdRaspExcutor$8.exec(CmdRaspExcutor.java:318)
at vgs.vigi.ble.RaspClient.run(RaspClient.java:182)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
While many also try to acquire connections:
"RaspService-991": waiting to acquire [0x00000000c0179e18], holding [0x00000000c1624058]
at com.j256.ormlite.jdbc.JdbcPooledConnectionSource.getReadWriteConnection(JdbcPooledConnectionSource.java:125)
at com.j256.ormlite.dao.BaseDaoImpl.update(BaseDaoImpl.java:408)
at vgs.vigi.servlet.OrmLite.update(OrmLite.java:361)
at vgs.vigi.servlet.CachedDao.update(CachedDao.java:287)
at vgs.vigi.ble.RaspClient.run(RaspClient.java:177)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Some GC is happening ("inconsistent" means the thread is "BLOCKED (on object monitor)" without waiting for anything):
"qtp1719311117-931": inconsistent?, holding [0x00000000eabfc510]
at java.lang.Runtime.gc(Native Method)
at java.lang.System.gc(System.java:993)
at vgs.vigi.servlet.OrmLite.clearCache(OrmLite.java:33)
at vgs.vigi.servlet.OrmLite.dao(OrmLite.java:215)
at vgs.vigi.servlet.OrmLite.getAll(OrmLite.java:300)
at vgs.vigi.servlet.CachedDao.getAll(CachedDao.java:227)
at vgs.lib.Ajax.sGetAll(Ajax.java:101)
...
And also GC in another thread (the call is explicitly coded in our code; not sure why, though):
"RaspService-1882": running, holding [0x00000000c05c84c8, 0x00000000c2bed7b0]
at java.lang.Runtime.gc(Native Method)
at java.lang.System.gc(System.java:993)
at vgs.vigi.servlet.OrmLite.clearCache(OrmLite.java:33)
at vgs.vigi.servlet.OrmLite.dao(OrmLite.java:215)
at vgs.vigi.servlet.OrmLite.update(OrmLite.java:360)
at vgs.vigi.servlet.CachedDao.update(CachedDao.java:285)
at vgs.vigi.ble.RaspClient.run(RaspClient.java:177)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Can I expect OrmLite to be safe with a multithreaded approach as described?
Are there best practices to avoid this issue (while still keeping the multithreaded nature of the server)?
Update
I have a thread dump of a second run, which looks a bit different.
Here the thread holding the lock that everyone is waiting for is marked inconsistent:
"RaspService-1405": inconsistent?, holding [0x00000000c01da9b8, 0x00000000c1cfed28]
With a raw stack of:
"RaspService-1405" #1469 prio=5 os_prio=0 tid=0x0000000021579800 nid=0xa2f4 waiting for monitor entry [0x000000002b36e000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.j256.ormlite.jdbc.JdbcPooledConnectionSource.getReadWriteConnection(JdbcPooledConnectionSource.java:125)
- locked <0x00000000c01da9b8> (a java.lang.Object)
at com.j256.ormlite.dao.BaseDaoImpl.update(BaseDaoImpl.java:408)
at vgs.vigi.servlet.OrmLite.update(OrmLite.java:361)
at vgs.vigi.servlet.CachedDao.update(CachedDao.java:287)
at vgs.vigi.ble.RaspClient.run(RaspClient.java:177)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- <0x00000000c1cfed28> (a java.util.concurrent.ThreadPoolExecutor$Worker)
There is also a RUNNING thread which reads from a connection. Not sure whether that is blocked:
"RaspService-1410": running, holding [0x00000000ed1877c8, 0x00000000c1cfe208]
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:140)
at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:109)
at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:67)
at org.postgresql.core.PGStream.receiveChar(PGStream.java:335)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2008)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:310)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:447)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:368)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:158)
at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:124)
at com.j256.ormlite.jdbc.JdbcDatabaseConnection.update(JdbcDatabaseConnection.java:294)
at com.j256.ormlite.jdbc.JdbcDatabaseConnection.update(JdbcDatabaseConnection.java:217)
at com.j256.ormlite.stmt.mapped.MappedUpdate.update(MappedUpdate.java:101)
at com.j256.ormlite.stmt.StatementExecutor.update(StatementExecutor.java:472)
at com.j256.ormlite.dao.BaseDaoImpl.update(BaseDaoImpl.java:410)
at vgs.vigi.servlet.OrmLite.update(OrmLite.java:361)
at vgs.vigi.servlet.CachedDao.update(CachedDao.java:287)
at vgs.vigi.ble.RaspClient.run(RaspClient.java:177)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Update 2:
Here's a graph of sessions as seen from pgAdmin
Regarding "seems to be waiting on a socket": it waits until a new connection is created, while also holding the lock inside the pool.
Do you have a session limit in Postgres? If you do, I suggest you set it slightly bigger than the pool size in Java.
Otherwise it is easy to get a deadlock if the pool size is equal to the session limit:
All connections are taken (the Java pool limit is reached, the session count limit is reached).
The application tries to get a new connection; it takes the pool lock and is blocked by PG.
The application tries to release a connection; it cannot take the pool lock, so it cannot release the connection to PG, and the session limit stays reached.
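If lifting the Postgres session limit is not an option, the same rule can be approximated on the Java side by capping how many threads may hold a DB connection at once, keeping that cap below the session limit. A minimal sketch under that assumption (the class, the permit count and the wrapper are illustrative, not ORMLite API):

import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

public class DbGate {
    // Hypothetical cap: keep concurrent DB sessions comfortably below the Postgres
    // max_connections (e.g. limit 100 in Postgres, 90 permits here), so a thread that
    // needs to release a connection is never starved by the session limit itself.
    private static final Semaphore PERMITS = new Semaphore(90);

    public static <T> T withDb(Callable<T> dbWork) throws Exception {
        PERMITS.acquire();
        try {
            return dbWork.call();   // e.g. dao.update(...) or dao.create(...)
        } finally {
            PERMITS.release();
        }
    }
}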

Java: threads of pool get blocked without finishing work

I have a "blocked thread" problem. Let me explain how this application work. This is a file processor, it reads files from directories, do something with them and writes an output to another directory. It has threads that read files from different directories an add them to queues. Each of this "reading thread" has an ExecutorService built with Executors.newFixedThreadPool(someNumber) of threads that consumes files from queues, process them and write the output.
For example, now is configured in this way:
ReadingThread1 with 2 ProcessingThreads (pool-2-thread-x)
ReadingThread2 with 5 ProcessingThreads (pool-3-thread-x)
ReadingThread3 with 1 ProcessingThread (pool-4-thread-1)
ReadingThread4 with 1 ProcessingThread (pool-5-thread-1)
This application has been working for years, and now I have added another behavior to the file processing. The problem is that after many hours of running (sometimes 6, 8 or 20 hours) some of the threads get blocked and we have to roll back to the previous version.
These two threads were blocked at more or less the same time.
Name: pool-4-thread-1
State: WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject#64336985
Total blocked: 512 Total waited: 7,797
Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
Last logged line:
13:32:49:830 [pool-4-thread-1] DEBUG ClassB:79 - MessageB
Name: pool-2-thread-2
State: WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject#65ffe5b2
Total blocked: 528 Total waited: 7,795
Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
Last logged line:
13:35:18:997 [pool-2-thread-2] INFO ClassA:110 - MessageA
These three threads were blocked at more or less the same time.
Name: pool-3-thread-1
State: WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject#2d9e6700
Total blocked: 1,506 Total waited: 7,421
Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
Last logged line:
11:55:22:477 [pool-3-thread-1] INFO ClassA:110 - MessageA
Name: pool-3-thread-4
State: WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject#2d9e6700
Total blocked: 1,969 Total waited: 7,456
Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
Last logged line:
11:55:03:962 [pool-3-thread-4] INFO ClassA:110 - MessageA
Name: pool-3-thread-5
State: WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject#2d9e6700
Total blocked: 1,747 Total waited: 7,306
Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
Last logged line:
11:53:32:769 [pool-3-thread-5] INFO ClassC:117 - MessageC
The threads are blocked at sun.misc.Unsafe.park(Native Method). I have read that this just means the thread is waiting for a new task to process. But in this case, when the application starts, the thread pools are created and each is assigned a task that does not end until the application is stopped. Reading the logs, I can see that the threads went into that waiting state without finishing their job (MessageA, MessageB and MessageC are not the last messages they should log).
I don't understand how those threads stopped doing what they were doing (at different lines of code) and entered that waiting state.
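One way to narrow this down: a worker only ends up back in ThreadPoolExecutor.getTask() after its task's run() method has returned, either normally or because an exception escaped it. A small wrapper (names hypothetical) that logs how the long-running task exited can show which of the two happened:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LoggingTaskWrapper {
    // Wraps a long-running task so any exit, normal or exceptional, leaves a trace
    // in the logs instead of the worker silently going back to getTask()/park().
    public static Runnable wrap(final Runnable longRunningTask, final String name) {
        return () -> {
            try {
                longRunningTask.run();   // expected to loop until shutdown
                System.err.println(name + " exited its loop normally");
            } catch (Throwable t) {
                System.err.println(name + " died with an exception:");
                t.printStackTrace();
            }
        };
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Hypothetical stand-in for the real file-processing loop.
        pool.execute(wrap(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                // take a file from the queue and process it
            }
        }, "ProcessingThread-1"));
    }
}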
Thank you!

Thread is in WAITING state: java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

My application is under heavy load, and I am getting the logs below from
sudo -u tomcat jstack <java_process_id>
The thread below consumes messages from Kafka, and it got stuck. Since this thread is in the WAITING state, no more Kafka messages are being consumed.
"StreamThread-3" #91 daemon prio=5 os_prio=0 tid=0x00007f9b5c606000 nid=0x1e4d waiting on condition [0x00007f9b506c5000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000073aad9718> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353)
at ch.qos.logback.core.AsyncAppenderBase.put(AsyncAppenderBase.java:160)
at ch.qos.logback.core.AsyncAppenderBase.append(AsyncAppenderBase.java:148)
at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:383)
at ch.qos.logback.classic.Logger.error(Logger.java:538)
at com.abc.system.solr.repo.AbstractSolrRepository.doSave(AbstractSolrRepository.java:316)
at com.abc.system.solr.repo.AbstractSolrRepository.save(AbstractSolrRepository.java:295)
I also found this post:
WAITING at sun.misc.Unsafe.park(Native Method)
but it didn't help in my case.
What else could I investigate to get more details in such a case?
I ran into the same problem, but luckily I got my issue resolved by playing around with the size of the pool and the number of producers and consumers.
Check whether there is any way to configure the following:
The size of your thread pool
The number of consumers/producers (if that can be configured in Kafka)
Make sure the thread pool has enough threads to serve both the consumers and the producers.
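As a rough illustration of the last point, one pool sized for both roles (the counts below are placeholders to tune for your load):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {
    public static void main(String[] args) {
        // Placeholder counts: match them to your actual number of Kafka consumer
        // threads and producer tasks.
        int consumerThreads = 4;
        int producerThreads = 2;

        // One pool sized for both roles, so consumers can never occupy every
        // thread and leave producers (or vice versa) waiting forever.
        ExecutorService pool = Executors.newFixedThreadPool(consumerThreads + producerThreads);
        pool.shutdown();
    }
}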

Too many parking to wait threads

I am analyzing an application hang, and in the thread dumps, 90% of the worker threads are in this state:
"pool-3-thread-352" #13082 prio=5 os_prio=0 tid=0x00007ff6407fc800
nid=0x1e94 waiting on condition [0x00007ff5a53b4000]
java.lang.Thread.State: TIMED_WAITING (parking) at
sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000044af6bcd0> (a java.util.concurrent.SynchronousQueue$TransferStack) at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
at
java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
at
java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"pool-21-thread-214" #13081 prio=5 os_prio=0 tid=0x0000000002e6a800
nid=0x1e92 waiting on condition [0x00007ff5a54b5000]
java.lang.Thread.State: TIMED_WAITING (parking) at
sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000004ad95fba8> (a java.util.concurrent.SynchronousQueue$TransferStack) at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
at
java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
at
java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
As per my understanding, these are basically request worker threads on a Tomcat server, waiting on a blocking queue until a request comes in. When a request comes in, one thread gets a permit and runs to execute the request.
So if no tasks are available, these threads wait (park) on the queue. When a task becomes available, one worker thread gets a permit, becomes a running thread and executes the task.
But these threads can still cause problems if too many of them are created in the thread pool, since they eat up resources.
Zero deadlocks were found, but the app is still hanging, with exceptions almost everywhere of this type:
javax.ws.rs.ProcessingException: RESTEASY004655: Unable to invoke request
at org.jboss.resteasy.client.jaxrs.engines.ApacheHttpClient4Engine.invoke(ApacheHttpClient4Engine.java:287)
at com.agfa.orbis.core.client.service.rest.ClientHttpEngineWrapper.invoke(ClientHttpEngineWrapper.java:59)
at org.jboss.resteasy.client.jaxrs.internal.ClientInvocation.invoke(ClientInvocation.java:436)
at org.jboss.resteasy.client.jaxrs.internal.ClientInvocationBuilder.get(ClientInvocationBuilder.java:159)
at com.agfa.hap.crs.commons.client.rest.RestClient.getResponse(RestClient.java:238)
at com.agfa.hap.crs.commons.client.rest.RestClient.get(RestClient.java:70)
at com.agfa.hap.crs.alertsystem.client.orbis.ForwardedUserAlertsMonitor.getSharedAlertState(ForwardedUserAlertsMonitor.java:88)
at com.agfa.hap.crs.alertsystem.client.orbis.ForwardedUserAlertsMonitor.getCurrentAlertState(ForwardedUserAlertsMonitor.java:79)
at com.agfa.hap.crs.alertsystem.client.orbis.AbstractAlertMonitor.requestMonitorUpdate(AbstractAlertMonitor.java:275)
at com.agfa.hap.crs.alertsystem.client.orbis.AbstractAlertMonitor$10.execute(AbstractAlertMonitor.java:823)
at com.agfa.hap.crs.alertsystem.client.orbis.AbstractAlertMonitor$Task.call(AbstractAlertMonitor.java:952)
at com.agfa.hap.crs.alertsystem.client.orbis.AbstractAlertMonitor$Task.call(AbstractAlertMonitor.java:942)
at com.agfa.hap.crs.alertsystem.client.orbis.AbstractAlertMonitor$TaskWrapper.call(AbstractAlertMonitor.java:925)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:992)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:535)
at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:403)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:177)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:304)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:611)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:446)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
at org.jboss.resteasy.client.jaxrs.engines.ApacheHttpClient4Engine.invoke(ApacheHttpClient4Engine.java:283)
... 16 more
Caused by: java.io.EOFException: SSL peer shut down incorrectly
at sun.security.ssl.InputRecord.read(InputRecord.java:505)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
... 29 more
I am looking to link these exceptions to the thread activity. Any idea why the connection is being closed incorrectly?
These threads are waiting for something to happen. As you wrote:
these are basically request worker threads on a Tomcat server, waiting on a blocking queue until a request comes in
As far as I understand, this happens under low load, so a too-large thread pool will not be a problem. If you're really worried about it, you can configure a maxIdleTime for the thread pool; Tomcat will then kill old idle threads until the pool is down to minSpareThreads.
This is the thread pool documentation for Tomcat 8.
This is the thread pool documentation for Tomcat 7.
This is the thread pool documentation for Tomcat 6.
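For a plain java.util.concurrent pool, rather than Tomcat's connector executor, the analogous idle-thread reclamation looks roughly like the sketch below; the numbers stand in for minSpareThreads / maxThreads / maxIdleTime and are only illustrative:

import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class IdleReclaimingPool {
    public static ThreadPoolExecutor create() {
        // Core of 10 "spare" threads, up to 200 on demand; extra threads that sit
        // idle for 60 seconds are reclaimed again, down to the core size.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                10, 200, 60L, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
        // pool.allowCoreThreadTimeOut(true);  // let even the core threads time out if fully idle
        return pool;
    }
}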

Java Deadlock during a synchronized on a local resource?

I'm seeing a problem with multiple Threads deadlocking on the same line of code.
I cannot reproduce the problem locally or in any test, yet thread dumps from production show the problem quite clearly.
I can't see why the threads would become blocked on the synchronized line below, since there is no other synchronization on the object anywhere in the call stack or in any other thread. Does anyone have any idea what is going on, or how I can even reproduce this issue? (I am currently trying with 15 threads all hitting trim() in a loop while processing 2000 tasks through my queue, but I am unable to reproduce it.)
In the thread dump below, I think the multiple threads with the 'locked' status may be a manifestation of this Java bug: http://bugs.java.com/view_bug.do?bug_id=8047816 where jstack reports threads in the wrong state.
(I'm using JDK Version: 1.7.0_51)
Cheers!
Here is a view of the Threads in the Thread dump.....
"xxx>Job Read-3" daemon prio=10 tid=0x00002aca001a6800 nid=0x6a3b waiting for monitor entry [0x0000000052ec4000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- locked <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Locked ownable synchronizers:
- <0x00002aaf5f9c2680> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"xxx>Job Read-2" daemon prio=10 tid=0x00002aca001a5000 nid=0x6a3a waiting for monitor entry [0x0000000052d83000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- locked <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Locked ownable synchronizers:
- <0x00002aaf5f9ed518> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"xxx>Job Read-1" daemon prio=10 tid=0x00002aca00183000 nid=0x6a39 waiting for monitor entry [0x0000000052c42000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- waiting to lock <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Locked ownable synchronizers:
- <0x00002aaf5f9ecde8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"xxx>Job Read-0" daemon prio=10 tid=0x0000000006a83000 nid=0x6a36 waiting for monitor entry [0x000000005287f000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- waiting to lock <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
at com.mycompany.collections.CustomQueue.itemProcessed(CustomQueue.java:302)
at com.mycompany.collections.CustomQueue.trackCompleted(CustomQueue.java:147)
at java.util.concurrent.ThreadPoolExecutor.afterExecute(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Here is the Java code extracted, which shows where the error is...
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class Deadlock {

    final Deque<Object> delegate = new ArrayDeque<>();
    final long maxSize = Long.MAX_VALUE;
    private final AtomicLong totalExec = new AtomicLong();
    private final Map<Object, AtomicLong> totals = new HashMap<>();
    private final Map<Object, Deque<Long>> execTimes = new HashMap<>();

    public void trim() {
        //Possible optimization is evicting in chunks, segmenting by arrival time
        while (this.totalExec.longValue() > this.maxSize) {
            final Object t = this.delegate.peek();
            final Deque<Long> execTime = this.execTimes.get(t);
            final Long exec = execTime.peek();
            if (exec != null && this.totalExec.longValue() - exec > this.maxSize) {
                //If Job Started Inside of Window, remove and re-loop
                remove();
            } else {
                //Otherwise exit the loop
                break;
            }
        }
    }

    public Object remove() {
        Object removed;
        synchronized (this.delegate) { //4 Threads deadlocking on this line !
            removed = this.delegate.pollFirst();
        }
        if (removed != null) {
            itemRemoved(removed);
        }
        return removed;
    }

    public void itemRemoved(final Object t) {
        //Decrement Total & Queue
        final AtomicLong catTotal = this.totals.get(t);
        if (catTotal != null) {
            if (!this.execTimes.get(t).isEmpty()) {
                final Long exec = this.execTimes.get(t).pollFirst();
                if (exec != null) {
                    catTotal.addAndGet(-exec);
                    this.totalExec.addAndGet(-exec);
                }
            }
        }
    }
}
From the documentation for HashMap
Note that this implementation is not synchronized. If multiple threads
access a hash map concurrently, and at least one of the threads
modifies the map structurally, it must be synchronized externally.
(Emphasis theirs)
You are both reading and writing to/from the Maps in an unsynchronized manner.
I see no reason to assume that your code is thread safe.
I suggest that you have an infinite loop in trim caused by this lack of thread safety.
Entering a synchronized block is relatively slow, so it's likely that a thread dump will always show at least a few threads waiting to obtain the lock.
Your first thread is holding the lock while waiting for pollFirst.
"xxx>Job Read-3" daemon prio=10 tid=0x00002aca001a6800 nid=0x6a3b waiting for monitor entry [0x0000000052ec4000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.mycompany.collections.CustomQueue.remove(CustomQueue.java:101)
- locked <0x00002aae6465a650> (a java.util.ArrayDeque)
at com.mycompany.collections.CustomQueue.trim(CustomQueue.java:318)
The other threads are waiting to obtain the lock.
You will need to provide the entire thread dump to determine which thread is holding the lock on 0x0000000052ec4000, which is what is preventing your pollFirst call from returning.
In order to deadlock, you need at least two threads each locking at least two objects at the same time, which is something the code you posted doesn't appear to do. The bug you point to may apply, but as I read it, it's a cosmetic issue: the threads are not 'locked' but waiting to acquire the lock on the object in question (the ArrayDeque). You should see a "deadlock" message in your logs if you have a deadlock. It will call out the two threads that are blocking each other.
I don't believe the thread dump says there are deadlocks. It's simply telling you how many threads are waiting on the monitor at the moment you took the dump. Since only one thread may have the monitor at a given moment, it shouldn't be very surprising.
What behavior are you seeing in your application that leads you to believe you have a deadlock? There's a lot missing from your code, particularly where the objects in the delegate Deque are coming from. My guess is you don't have an outright deadlock but some other issue that may look like one.
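For contrast, a genuine deadlock needs at least two threads nesting at least two locks in opposite order; jstack then prints a "Found one Java-level deadlock" section naming the threads involved. A minimal sketch of that pattern, which is nothing like the posted remove():

public class TwoLockDeadlock {
    private static final Object LOCK_A = new Object();
    private static final Object LOCK_B = new Object();

    public static void main(String[] args) {
        // Thread 1 nests A then B, thread 2 nests B then A: each ends up holding one
        // lock while waiting for the other, which is what a real deadlock requires.
        new Thread(new Runnable() {
            public void run() {
                synchronized (LOCK_A) { pause(); synchronized (LOCK_B) { } }
            }
        }).start();
        new Thread(new Runnable() {
            public void run() {
                synchronized (LOCK_B) { pause(); synchronized (LOCK_A) { } }
            }
        }).start();
    }

    static void pause() {
        try { Thread.sleep(200); } catch (InterruptedException ignored) { }
    }
}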
Thanks to the responses here, it became clear that the issue was non-thread-safe usage of multiple collections.
To resolve the issue, I've made the trim method synchronized and replaced usage of HashMap with ConcurrentHashMap and ArrayDeque with LinkedBlockingDeque
(Concurrent Collections FTW!)
A further planned enhancement is to change the usage of 2 separate Maps into a single Map containing a Custom Object, that way keeping the operations (in itemRemoved) atomic.
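A minimal sketch of the fix described above (synchronized trim plus concurrent collections); the field names follow the extracted code, the rest is illustrative rather than the exact production class:

import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.atomic.AtomicLong;

public class TrimFixed {
    final Deque<Object> delegate = new LinkedBlockingDeque<>();                    // was ArrayDeque
    final long maxSize = Long.MAX_VALUE;
    private final AtomicLong totalExec = new AtomicLong();
    private final Map<Object, AtomicLong> totals = new ConcurrentHashMap<>();      // was HashMap
    private final Map<Object, Deque<Long>> execTimes = new ConcurrentHashMap<>();  // was HashMap

    // Making trim() synchronized keeps the peek / compare / remove sequence atomic,
    // so two threads can no longer interleave on stale totals and spin forever.
    public synchronized void trim() {
        while (totalExec.longValue() > maxSize) {
            Object t = delegate.peek();
            if (t == null) {
                break;                    // ConcurrentHashMap rejects null keys
            }
            Deque<Long> execTime = execTimes.get(t);
            Long exec = (execTime == null) ? null : execTime.peek();
            if (exec != null && totalExec.longValue() - exec > maxSize) {
                remove();
            } else {
                break;
            }
        }
    }

    public Object remove() {
        // LinkedBlockingDeque is itself thread safe, so no synchronized block is needed here.
        Object removed = delegate.pollFirst();
        // ... itemRemoved(removed) bookkeeping as in the original ...
        return removed;
    }
}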
