Java DNS resolution hangs forever - java

I am using the Curator framework to connect to a ZooKeeper server, but I am running into a weird DNS resolution issue. Here is the jstack dump for the thread:
#21 prio=5 os_prio=0 tid=0x0000000001888800 nid=0x3a46 runnable [0x00007f25e69f3000]
java.lang.Thread.State: RUNNABLE
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
at java.net.InetAddress.getAllByName(InetAddress.java:1192)
at java.net.InetAddress.getAllByName(InetAddress.java:1126)
at org.apache.zookeeper.client.StaticHostProvider.resolveAndShuffle(StaticHostProvider.java:117)
at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:81)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:1096)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:1006)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:804)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:679)
at com.netflix.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:72)
- locked <0x00000000fd761f40> (a com.netflix.curator.HandleHolder$1)
at com.netflix.curator.HandleHolder.getZooKeeper(HandleHolder.java:46)
at com.netflix.curator.ConnectionState.reset(ConnectionState.java:122)
at com.netflix.curator.ConnectionState.start(ConnectionState.java:95)
at com.netflix.curator.CuratorZookeeperClient.start(CuratorZookeeperClient.java:137)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.start(CuratorFrameworkImpl.java:167)
The thread seems to be stuck in the native method and never returns. It also occurs very randomly, so I haven't been able to reproduce it consistently. Any ideas?

We are also trying to solve this problem. It looks like this is due to the glibc bug https://bugzilla.kernel.org/show_bug.cgi?id=99671 or the kernel bug https://bugzilla.redhat.com/show_bug.cgi?id=1209433, depending on who you ask ;)
Also worth reading: https://access.redhat.com/security/cve/cve-2013-7423 and https://alas.aws.amazon.com/ALAS-2015-617.html
To confirm that this is indeed the case, attach gdb to the Java process:
gdb --pid <JavaProcessPid>
then from gdb:
info threads
find a thread that is sitting in recvmsg and switch to it:
thread <HangingThreadId>
and then
backtrace
and if you see something like this, then you know that a glibc/kernel upgrade will help:
#0 0x00007fc726ff27cd in recvmsg () from /lib64/libc.so.6
#1 0x00007fc727018765 in make_request () from /lib64/libc.so.6
#2 0x00007fc727018b9a in __check_pf () from /lib64/libc.so.6
#3 0x00007fc726fdbd57 in getaddrinfo () from /lib64/libc.so.6
#4 0x00007fc706dd9635 in Java_java_net_Inet6AddressImpl_lookupAllHostAddr () from /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-0.b17.el6_7.x86_64/jre/lib/amd64/libnet.so
Update: Looks like the kernel wins. Please see this thread: http://www.gossamer-threads.com/lists/linux/kernel/2264958 for details.
Also, there is a tool to verify that your system is affected by the kernel bug; you can use this simple program: https://gist.github.com/stevenschlansker/6ad46c5ccb22bc4f3473
To verify:
curl -o pf_dump.c https://gist.githubusercontent.com/stevenschlansker/6ad46c5ccb22bc4f3473/raw/22cfe72f6708de1e3468c1e0fa3888aafae42db4/pf_dump.c
gcc pf_dump.c -pthread -o pf_dump
./pf_dump
And if the output is:
[26170] glibc: check_pf: netlink socket read timeout
Aborted
Then the system is affected. If the output is something like:
exit success
[7618] exit success
[7265] exit success
then the system is ok.
In the AWS context, upgrading the AMIs to 2016.3.2, which ships the new kernel, seems to have fixed the problem.
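If you cannot roll out the fixed kernel or glibc right away, an application-level guard is to run the lookup on a disposable thread and give up after a timeout. Below is a minimal sketch of that idea (GuardedResolver and its thread pool are my own invention, not part of Curator or ZooKeeper); note that it only unblocks the caller, since the native lookup ignores interruption and the helper thread may stay stuck:

import java.net.InetAddress;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public final class GuardedResolver {

    // Daemon threads, so a permanently stuck lookup cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "dns-lookup");
        t.setDaemon(true);
        return t;
    });

    public static InetAddress[] resolve(String host, long timeoutMillis) throws Exception {
        Future<InetAddress[]> lookup = POOL.submit(() -> InetAddress.getAllByName(host));
        try {
            return lookup.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            lookup.cancel(true); // best effort; the native call may ignore this
            throw e;
        }
    }
}

This only mitigates lookups you control; ZooKeeper's StaticHostProvider still resolves on its own, so it is no substitute for the kernel/glibc fix.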

Related

What's the defined_classes in a Java thread dump?

Running jstack -e produces a dump like this (at least in Java 19):
"Thread-0" #25 [23276] prio=5 os_prio=0 cpu=0.00ms elapsed=593.30s allocated=6720B defined_classes=1 tid=0x0000023dafe60b20 nid=23276 waiting for monitor entry [0x000000796a4ff000]
What does "defined_classes" mean here?
This output comes from the enhancement JDK-8200720. Its implementation defines this value as follows:
defined_classes=... : The number of classes defined by this thread
This might hint at a thread that loads too many classes.
It was added in commit d1b24f2ceca5 on 25 Jun 2018.
This attribute gives the number of classes defined by this thread.
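To see the counter move, here is a small hypothetical demo (my own code, not from the JDK; jstack -e needs Java 11+): a dedicated thread defines the same class a few times through throwaway class loaders and then sleeps, so jstack -e should report defined_classes of at least 3 for the "definer" thread.

import java.io.InputStream;

public class DefinedClassesDemo {

    // A throwaway loader that re-defines an already-compiled class from its
    // .class resource; each defineClass call is attributed to the calling thread.
    static class CopyingLoader extends ClassLoader {
        Class<?> defineCopy(String name) throws Exception {
            InputStream in = ClassLoader.getSystemResourceAsStream(
                    name.replace('.', '/') + ".class");
            try {
                byte[] bytes = in.readAllBytes();
                return defineClass(name, bytes, 0, bytes.length);
            } finally {
                in.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Thread definer = new Thread(() -> {
            try {
                for (int i = 0; i < 3; i++) {
                    new CopyingLoader().defineCopy("DefinedClassesDemo");
                }
                Thread.sleep(60_000); // stay alive so jstack -e can sample the thread
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }, "definer");
        definer.start();
        definer.join();
    }
}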

beanshell - Deadlock issue

Has anyone got any experience of deadlocks with beanshell? This is something we have been encountering recently in our production system, where script execution blocks other threads due to its lock on classloading via Tomcat. The following is the stack trace for the lock owner in the thread dump:
"Thread-64" : 150 : BLOCKED : cpu=37812500000 : cpuLoad= 0.0
BlockedCount:93354 BlockedTime:-1 LockName:java.lang.Object#219d66b6 LockOwnerID:151 LockOwnerName:Thread-65
WaitedCount:13 WaitedTime:-1 InNative:false IsSuspended:false
at org.apache.catalina.webresources.AbstractSingleArchiveResourceSet.getArchiveEntries(AbstractSingleArchiveResourceSet.java:66)
at org.apache.catalina.webresources.AbstractArchiveResourceSet.getResource(AbstractArchiveResourceSet.java:262)
at org.apache.catalina.webresources.StandardRoot.getResourceInternal(StandardRoot.java:281)
at org.apache.catalina.webresources.Cache.getResource(Cache.java:62)
at org.apache.catalina.webresources.StandardRoot.getResource(StandardRoot.java:216)
at org.apache.catalina.webresources.StandardRoot.getClassLoaderResource(StandardRoot.java:225)
at org.apache.catalina.loader.WebappClassLoaderBase.findClassInternal(WebappClassLoaderBase.java:2173)
at org.apache.catalina.loader.WebappClassLoaderBase.findClass(WebappClassLoaderBase.java:811)
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1260)
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1119)
at java.lang.Class.forName0(Class.java:-2)
at java.lang.Class.forName(Class.java:348)
at bsh.classpath.ClassManagerImpl.classForName(null:-1)
at bsh.NameSpace.classForName(null:-1)
at bsh.NameSpace.getImportedClassImpl(null:-1)
at bsh.NameSpace.getClassImpl(null:-1)
at bsh.NameSpace.getClass(null:-1)
at bsh.Name.consumeNextObjectField(null:-1)
at bsh.Name.toObject(null:-1)
at bsh.BSHAmbiguousName.toObject(null:-1)
at bsh.BSHAmbiguousName.toObject(null:-1)
at bsh.BSHPrimaryExpression.eval(null:-1)
at bsh.BSHPrimaryExpression.eval(null:-1)
at bsh.BSHVariableDeclarator.eval(null:-1)
at bsh.BSHTypedVariableDeclaration.eval(null:-1)
at bsh.Interpreter.eval(null:-1)
at bsh.Interpreter.eval(null:-1)
at bsh.Interpreter.eval(null:-1)
at my.package.MyClassFile(MyClassFile:2332)
I see that Groovy is a more popular choice for Java scripting, but I haven't seen many posts saying that bsh can cause deadlocks.
It would be good to get some ideas from SO users.
Regards,
There's a fix for one deadlock, the one described in "GUI does not start in Java 8", in the (almost latest) Beanshell version 2.0b5.
You can open a new issue in the Beanshell project.
It may be connected to ClassManagerImpl:
Bsh has a multi-tiered class loading architecture. No class loader is
created unless/until a class is generated, the classpath is modified,
or a class is reloaded.
Note: we may need some synchronization in here
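Until that synchronization materializes upstream, one stopgap (my own sketch, with an obvious throughput cost, and it only helps if the deadlock is between two evaluating script threads) is to funnel all evaluation through a single lock, so two scripts can never trigger classloading in opposite lock orders:

import bsh.Interpreter;

public final class SerializedBsh {

    // One global lock: at most one script evaluates (and thus triggers
    // classloading through bsh.classpath.ClassManagerImpl) at a time.
    private static final Object EVAL_LOCK = new Object();

    public static Object eval(String script) throws Exception {
        synchronized (EVAL_LOCK) {
            Interpreter interpreter = new Interpreter(); // fresh namespace per call
            return interpreter.eval(script);
        }
    }
}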

JRockit : Thread Stuck at jrockit/vm/Locks.park0

Seeing very strange behaviour. My code is executing fine, but at some point one method calls another method and the other method never gets called (I can't see the log statement that is on the first line of the called method).
"jaxws-engine-1-thread-2" id=447 idx=0x73c tid=4031 prio=5 alive, parked, native_blocked, daemon
at jrockit/vm/Locks.park0(J)V(Native Method)
at jrockit/vm/Locks.park(Locks.java:2230)
at sun/misc/Unsafe.park(ZJ)V(Native Method)
at java/util/concurrent/locks/LockSupport.parkNanos(LockSupport.java:196)
at java/util/concurrent/SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:424)
at java/util/concurrent/SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:323)
at java/util/concurrent/SynchronousQueue.poll(SynchronousQueue.java:874)
at java/util/concurrent/ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:955)
at java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:917)
at java/lang/Thread.run(Thread.java:682)
at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
-- end of trace
Code -
public static void startMicroSessionTimer(TimerName timerName, Data data) {
    logger.debug("Starting a micro-timer for timer name: " + timerName);
    //Start a micro timer to process the soap response in worker thread
    SipApplicationSession applicationSession = Util.getAppSession((String) data.get(DataAttribute.ID));
    Util.AbcTimer(applicationSession, 1L, timerName.getTimerName());
}

public static void AbcTimer(SipApplicationSession appSession,
        long timeInMillies, String timerName) {
    logger.debug("Inside AbcTimer");
    //Some Logic
}
Logs -
16 May 2018 09:13:07,506 [jaxws-engine-1-thread-12] DEBUG -----SOME LOGS…..
16 May 2018 09:13:07,506 [jaxws-engine-1-thread-12] DEBUG [AbcUtils] [ODhlNjQ0ZjAzMTMzN2U5MGNhMTE2MTgxOTg2MTdmYjA.] Starting a micro-timer for timer name: HAHAHA
I am not able to see any log after the above line for thread jaxws-engine-1-thread-12. As per the logs, the line "Inside AbcTimer" should appear, since it is at the start of the called method, i.e. AbcTimer. No exception occurred.
I have taken a thread dump as well, which I have posted above.
I am not sure, but I think it is a machine-specific issue. I also googled it and saw that this type of issue has happened to other people as well, but I didn't find a solution.
Using the JRockit version below:
java version "1.6.0_141"
Java(TM) SE Runtime Environment (build 1.6.0_141-b12)
Oracle JRockit(R) (build R28.3.13-15-173128-1.6.0_141-20161219-1845-linux-x86_64, compiled mode)

Why do WebSphere's threads hang?

I have WAS 7 and FileNet CE 5.1 and am having trouble.
Why do WebSphere's threads hang? Is it a JDBC driver error?
Could you kindly advise me? Thanks a lot!
[22.06.16 13:14:58:921 YEKT] 0000001d ThreadMonitor W WSVR0605W: Thread "WebContainer : 15" (00000047) was active for 631301 msec and can be hanged up. Total threads that can be hang up: 69.
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:140)
at com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:1782)
at com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:4838)
at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:6150)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:402)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:350)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:5696)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1715)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:180)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:155)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.execute(SQLServerPreparedStatement.java:332)
at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.pmiExecute(WSJdbcPreparedStatement.java:942)
at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.execute(WSJdbcPreparedStatement.java:618)
at com.filenet.engine.dbpersist.DBExecutionElement.execute(DBExecutionElement.java:218)
at com.filenet.engine.dbpersist.DBExecutionContext.getNextResult(DBExecutionContext.java:106)
at com.filenet.engine.dbpersist.DBStatementList.executeStatements(DBStatementList.java:161)
at com.filenet.engine.persist.DBStatementList2.executeStatementsNoResult(DBStatementList2.java:57)
at com.filenet.engine.persist.IndependentPersister.executeChangeWork(IndependentPersister.java:409)
at com.filenet.engine.persist.IndependentPersister.executeChange(IndependentPersister.java:225)
at com.filenet.engine.persist.SubscribablePersister.executeChange(SubscribablePersister.java:172)
at com.filenet.engine.jca.impl.RequestBrokerImpl.executeChanges(RequestBrokerImpl.java:1266)
at com.filenet.engine.jca.impl.RequestBrokerImpl.executeChanges(RequestBrokerImpl.java:1146)
at com.filenet.engine.ejb.EngineCoreBean._executeChanges(EngineCoreBean.java:618)
The stack indicates that the thread is waiting to receive data from your database.
Possible causes include:
the database is down (or unable to communicate over the network)
a deadlock has occurred in the database
you are fetching some really big data set and/or doing so inefficiently, such that the statement is taking an excessive amount of time. You never mentioned whether your query ever completes, but if it does, I suspect this option is the culprit (see the sketch below).
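If the query does complete, just slowly, one defensive measure is to cap statement execution time so a WebContainer thread cannot sit in socketRead0 indefinitely. A generic JDBC sketch (my own example, not FileNet-specific; the 60-second limit is arbitrary):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public final class BoundedQuery {

    public static void runBounded(Connection con, String sql) throws SQLException {
        PreparedStatement ps = con.prepareStatement(sql);
        try {
            // Standard JDBC: the driver cancels the statement once the limit
            // passes, and the execute/fetch call fails with a timeout error.
            ps.setQueryTimeout(60);
            ResultSet rs = ps.executeQuery();
            try {
                while (rs.next()) {
                    // consume rows; a slow consumer also keeps the thread "active"
                }
            } finally {
                rs.close();
            }
        } finally {
            ps.close();
        }
    }
}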

Infinite 100% CPU usage at java.io.FileInputStream.readBytes(Native Method)

I'm right now debugging a program which has two threads per external process; those two threads keep reading Process.getErrorStream() and Process.getInputStream() using a while ((i = in.read(buf, 0, buf.length)) >= 0) loop.
Sometimes when the external process crashes due to a JVM crash (see these hs_err_pid.log files), those threads which read the stdout/stderr of that external process begin consuming 100% CPU and never exit. The loop body is not being executed (I've added a logging statement there), so the infinite loop appears to be inside the native method java.io.FileInputStream.readBytes.
I've reproduced this on both Windows 7 64-bit (jdk1.6.0_30 64-bit, jdk1.7.0_03 64-bit), and Linux 2.6.18 (jdk1.6.0_21 32-bit). The code in question is here and it is used like this. See those links for the full code - here are the interesting bits:
private final byte[] buf = new byte[256];
private final InputStream in;
...
int i;
while ((i = this.in.read(this.buf, 0, this.buf.length)) >= 0) {
...
}
The stack traces look like
"PIT Stream Monitor" daemon prio=6 tid=0x0000000008869800 nid=0x1f70 runnable [0x000000000d7ff000]
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:220)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
- locked <0x00000007c89d6d90> (a java.io.BufferedInputStream)
at org.pitest.util.StreamMonitor.readFromStream(StreamMonitor.java:38)
at org.pitest.util.StreamMonitor.process(StreamMonitor.java:32)
at org.pitest.util.AbstractMonitor.run(AbstractMonitor.java:19)
Locked ownable synchronizers:
- None
or
"PIT Stream Monitor" daemon prio=6 tid=0x0000000008873000 nid=0x1cb8 runnable [0x000000000e3ff000]
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:220)
at org.pitest.util.StreamMonitor.readFromStream(StreamMonitor.java:38)
at org.pitest.util.StreamMonitor.process(StreamMonitor.java:32)
at org.pitest.util.AbstractMonitor.run(AbstractMonitor.java:19)
Locked ownable synchronizers:
- None
With the Sysinternals Process Explorer I was able to get native stack traces of those threads. Most often, over 80% of the time, the stack trace looks like this:
ntdll.dll!NtReadFile+0xa
KERNELBASE.dll!ReadFile+0x7a
kernel32.dll!ReadFile+0x59
java.dll!handleRead+0x2c
java.dll!VerifyClassCodesForMajorVersion+0x1d1
java.dll!Java_java_io_FileInputStream_readBytes+0x1d
This also happens quite often:
ntdll.dll!RtlNtStatusToDosErrorNoTeb+0x52
ntdll.dll!RtlNtStatusToDosError+0x23
KERNELBASE.dll!GetCurrentThreadId+0x2c
KERNELBASE.dll!CreatePipe+0x21a
kernel32.dll!ReadFile+0x59
java.dll!handleRead+0x2c
java.dll!VerifyClassCodesForMajorVersion+0x1d1
java.dll!Java_java_io_FileInputStream_readBytes+0x1d
ntdll.dll!RtlNtStatusToDosErrorNoTeb+0x42
ntdll.dll!RtlNtStatusToDosError+0x23
KERNELBASE.dll!GetCurrentThreadId+0x2c
KERNELBASE.dll!CreatePipe+0x21a
kernel32.dll!ReadFile+0x59
java.dll!handleRead+0x2c
java.dll!VerifyClassCodesForMajorVersion+0x1d1
java.dll!Java_java_io_FileInputStream_readBytes+0x1d
And sometimes it's executing this part of the code:
java.dll!VerifyClassCodesForMajorVersion+0xc3
java.dll!Java_java_io_FileInputStream_readBytes+0x1d
java.dll!Java_sun_io_Win32ErrorMode_setErrorMode+0x847c
java.dll!VerifyClassCodesForMajorVersion+0xd7
java.dll!Java_java_io_FileInputStream_readBytes+0x1d
jvm.dll!JNI_GetCreatedJavaVMs+0x1829f
java.dll!VerifyClassCodesForMajorVersion+0x128
java.dll!Java_java_io_FileInputStream_readBytes+0x1d
jvm.dll+0x88c1
jvm.dll!JNI_GetCreatedJavaVMs+0x182a7
java.dll!VerifyClassCodesForMajorVersion+0x128
java.dll!Java_java_io_FileInputStream_readBytes+0x1d
java.dll!VerifyClassCodesForMajorVersion+0x10b
java.dll!Java_java_io_FileInputStream_readBytes+0x1d
jvm.dll!JNI_CreateJavaVM+0x1423
java.dll!VerifyClassCodesForMajorVersion+0x190
java.dll!Java_java_io_FileInputStream_readBytes+0x1d
jvm.dll+0x88bf
jvm.dll!JNI_CreateJavaVM+0x147d
java.dll!VerifyClassCodesForMajorVersion+0x190
java.dll!Java_java_io_FileInputStream_readBytes+0x1d
java.dll!VerifyClassCodesForMajorVersion+0x1aa
java.dll!Java_java_io_FileInputStream_readBytes+0x1d
java.dll!VerifyClassCodesForMajorVersion+0x1c3
java.dll!Java_java_io_FileInputStream_readBytes+0x1d
java.dll!VerifyClassCodesForMajorVersion+0x224
java.dll!Java_java_io_FileInputStream_readBytes+0x1d
Any ideas how to solve this problem? Is this a known problem with the JVM? Is there a workaround?
I've not yet been able to reproduce this locally, but two possible workarounds might be:
Play around with in.available() (see the sketch below).
Redirect stdout and stderr in the external process to a socket and read from that socket in the controlling process instead.
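For the first idea, here is roughly what such a loop could look like (my own sketch, not pitest's actual StreamMonitor; the 50 ms backoff is arbitrary, and the exit check can race with output that arrives just before the child dies):

import java.io.IOException;
import java.io.InputStream;

final class PollingStreamReader {

    // Only enter the (possibly misbehaving) native read when available()
    // says there is data, so the loop can also notice that the child
    // process has died instead of spinning inside readBytes.
    static void drain(InputStream in, Process child)
            throws IOException, InterruptedException {
        byte[] buf = new byte[256];
        while (true) {
            int avail = in.available();
            if (avail > 0) {
                int n = in.read(buf, 0, Math.min(avail, buf.length));
                if (n < 0) {
                    return; // normal end of stream
                }
                // ... process buf[0..n) here ...
            } else if (hasExited(child)) {
                return; // child is gone; stop instead of re-entering read()
            } else {
                Thread.sleep(50); // nothing to read yet; back off briefly
            }
        }
    }

    // Process.isAlive() only exists since Java 8; this also works on 6/7.
    private static boolean hasExited(Process p) {
        try {
            p.exitValue();
            return true;
        } catch (IllegalThreadStateException e) {
            return false;
        }
    }
}

The point is that the thread never blocks in the native read unless data is known to be available, so it can observe the child's death and leave the loop.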
