I have a Java process, run from a shell script on Ubuntu 14.04, that crashes abnormally without any visible reason and without any logs. The Java program uses Twitter's Userstream API. I've been looking for traces in /var/log but did not find anything that explicitly points to a problem. Please advise how to approach this issue and find any useful log that could indicate the problem.
Also, this is my Java version:
Java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
The first step in troubleshooting HotSpot crashes is to locate the crash log, also called the fatal error log. By default, these are named hs_err_pid%pid.log, with %pid being the process ID of the crashed VM, and can usually be found in the current working directory (cwd) from which the process was started. The name and location of the log can also be set through the -XX:ErrorFile VM parameter. Example:
-XX:ErrorFile=/var/log/java/java_error%pid.log.
You can find more information about the crash logs themselves here.
The contents of that log can give an indication of what happened, and approximately where. Be prepared, though: HotSpot crashes are usually not caused by mere bugs in the hosted Java program. An extensive guide to interpreting these crash logs can be found here.
If no such log can be located after a crash, odds are the VM did not crash, but terminated normally. In that case, a remote debugging session might be in order. Remote debugging is detailed here and also has some topics on SO already.
(There's a very remote chance that circumstances do not permit the log to be written, e.g. no available file handles during the crash.)
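As a starting point for the search, here is a minimal sketch that lists files matching the default naming scheme described above in a given directory. The class name HsErrFinder is made up for illustration, and the sketch assumes the default file name was not overridden with -XX:ErrorFile.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK.
final class HsErrFinder {
    // Default HotSpot naming scheme: hs_err_pid<pid>.log
    private static final Pattern DEFAULT_NAME = Pattern.compile("hs_err_pid\\d+\\.log");

    static boolean isCrashLog(String fileName) {
        return DEFAULT_NAME.matcher(fileName).matches();
    }

    static List<File> find(File dir) {
        File[] hits = dir.listFiles((d, name) -> isCrashLog(name));
        // listFiles returns null if dir does not exist or is not a directory.
        return hits == null ? new ArrayList<File>() : Arrays.asList(hits);
    }

    public static void main(String[] args) {
        // Pass the directory the JVM was started from; defaults to the current one.
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (File log : find(dir)) {
            System.out.println(log.getAbsolutePath());
        }
    }
}
```

Run it against the working directory of the shell script that launches the process. If a custom -XX:ErrorFile pattern is in use, the regular expression has to be adapted accordingly.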
The problem was that my process terminated when I disconnected SSH from the server. To run the process in the background and prevent it from being terminated on disconnect, use 'nohup':
~$ nohup process_name &
I found the answer in this thread.
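A termination like this leaves no hs_err file because the VM shuts down in an orderly way rather than crashing. One way to get at least a trace of it is a shutdown hook that appends a line to a file, sketched below. The class name ExitTrace and the file name exit-trace.log are assumptions for illustration. As far as I know, HotSpot runs shutdown hooks on SIGHUP, SIGINT and SIGTERM (unless started with -Xrs), but not on SIGKILL or a hard crash.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Date;

// Hypothetical helper, not part of the JDK.
final class ExitTrace {
    // Registers a hook that appends one line to traceFile during orderly shutdown.
    static Thread install(final File traceFile) {
        Thread hook = new Thread(() -> {
            // Write to a file: stdout/stderr may be gone once the terminal is closed.
            try (PrintWriter out = new PrintWriter(new FileWriter(traceFile, true))) {
                out.println(new Date() + " JVM is shutting down (exit or termination signal)");
            } catch (IOException e) {
                // Nothing sensible left to do this late in the shutdown.
            }
        }, "exit-trace");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) throws InterruptedException {
        install(new File("exit-trace.log")); // assumed file name
        Thread.sleep(60_000); // stand-in for the real work of the application
    }
}
```

If the trace file gains a line every time the process disappears, the VM was asked to stop from outside, which points to something like the SSH hang-up rather than a VM crash.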
Related
I've encountered an interesting problem when running the following piece of Java code:
File.createTempFile("temp.cnt.ent", "cnt.feat.tmp", directory);
The following exception is thrown:
Exception in thread "main" java.io.IOException: Identifier removed
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.checkAndCreate(File.java:1704)
at java.io.File.createTempFile(File.java:1792)
I have never had this problem before and Google doesn't seem to have much for me. The system runs Scientific Linux release 5.8 (Linux 2.6.18-274.3.1.el5 x86_64) and the Java version is
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
The file system (Lustre) has 80TB of free space.
Any suggestions are greatly appreciated.
You are encountering synchronisation errors between the various instances. Lustre doesn't support file locking, which is probably what java.io.UnixFileSystem.createFileExclusively uses to avoid concurrency woes. (I say "probably" because it doesn't appear to be documented anywhere.)
Without locking it's only a matter of time until file operations interfere with each other. Reducing the number of instances is not a solution because it just makes it less likely to occur.
I believe the solution is to ensure that each instance creates its files in a different sub-directory.
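A minimal sketch of that suggestion, using only java.io.File so that it also works on Java 6. The class name PerInstanceTemp and the "instance-" prefix are made up for illustration, and whether this actually avoids the error on Lustre is untested.

```java
import java.io.File;
import java.io.IOException;
import java.util.UUID;

// Hypothetical helper, not part of the JDK.
final class PerInstanceTemp {
    // Returns (creating it if necessary) a sub-directory reserved for one instance.
    static File instanceDirectory(File base, String instanceId) {
        File dir = new File(base, "instance-" + instanceId);
        // The second isDirectory() check covers a concurrent mkdirs() by another process.
        if (!dir.isDirectory() && !dir.mkdirs() && !dir.isDirectory()) {
            throw new IllegalStateException("Cannot create " + dir);
        }
        return dir;
    }

    // Same call as in the question, but targeting the per-instance directory.
    static File createTemp(File base, String instanceId) throws IOException {
        return File.createTempFile("temp.cnt.ent", "cnt.feat.tmp",
                instanceDirectory(base, instanceId));
    }

    public static void main(String[] args) throws IOException {
        File base = new File(System.getProperty("java.io.tmpdir"));
        // A random UUID serves as the instance id here; a job or process id would also do.
        System.out.println(createTemp(base, UUID.randomUUID().toString()));
    }
}
```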
I guess that you see an EIDRM. At least the error message looks like that. The IOException wraps an error message from the underlying native libraries.
This is not a real answer to your problem, but maybe a useful hint.
http://docs.oracle.com/cd/E19455-01/806-1075/msgs-1432/index.html has some information and additional pointers.
The problem seems to be related to having too many instances of the application running at a time (each in a separate VM). For some unknown reason the OS refuses to allow the creation of a temp file. Workaround: run fewer instances.
One day ago, after a few months of working normally, our Java app started to crash occasionally with the following error:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (safepoint.cpp:247), pid=2075, tid=140042095163136
# guarantee(PageArmed == 0) failed: invariant
#
# JRE version: 6.0_23-b05
# Java VM: Java HotSpot(TM) 64-Bit Server VM (19.0-b09 mixed mode linux-amd64 compressed oops)
# An error report file with more information is saved as:
# /var/chat/jSocketer/build/hs_err_pid2075.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
#
I looked in hs_err_pid2075.log and saw that there was an active thread that was processing network communication. However, there weren't any application or environment changes in the last few months, and there wasn't any growth in load either.
What can I do to understand the reason for the crash? Are there any common steps for investigating a JVM crash?
UPD
http://www.wuala.com/ubear/public
The crash is in the JVM, not in external native code. However, the operation it crashed on was initiated by an external DLL.
This line in the hs_err_pid file explains the operation that crashed:
VM_Operation (0x00007f5e16e35450): GetAllStackTraces, mode: safepoint, requested by thread 0x0000000040796000
Now, thread 0x0000000040796000 is
0x0000000040796000 JavaThread "YJPAgent-Telemetry" daemon [_thread_blocked, id=2115, stack(0x00007f5e16d36000,0x00007f5e16e37000)]
which is a thread created by Yourkit. "GetAllStackTraces" is something that a profiler needs to call in order to do sampling. If you remove the profiler, the crash will not happen.
With this information it's not possible to say what causes the crash, but you can try the following: remove all -XX VM parameters, -verbose:gc and the debugging VM parameters. They might interfere with the profiling interface of the JVM.
Update
Code that calls java.lang.Thread#getAllStackTraces() or java.lang.Thread#getStackTrace() may trigger the same crash
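For reference, application code of the kind the update refers to typically looks like the sketch below (the class name ThreadDump is made up). If the application contains something similar, for example in a watchdog or monitoring thread, it is worth disabling it temporarily along with the profiler.

```java
import java.util.Map;

// Hypothetical example class, shown only to illustrate the API call in question.
final class ThreadDump {
    // Builds a plain-text dump of all live threads and their stack frames.
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            sb.append('"').append(entry.getKey().getName()).append("\"\n");
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```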
The two times I've witnessed recurring JVM crashes were both due to hardware failure, namely RAM. Running a memtest utility is the first thing I'd try.
I can see from the error report that you have the YourKit agent loaded. Its telemetry thread is mentioned as the requester for the operation that appears to fail. Try running the application without the YJP agent to see if you can still reproduce the crash.
Generally, JVM crashes are pretty hard to diagnose. They could happen due to a bug in some JNI code or in the JRE itself. If you suspect the latter, it may be worth submitting a bug report to Oracle.
Either way, I'd recommend upgrading to the latest release of Java 6 to make sure it's not a known issue that's already been fixed. At the time of this writing the current release is Java 6 update 29.
If you're not messing with anything that would cause this directly (which basically means using native code or libraries that call native code) then it's almost always down to a bug in the JVM or hardware issue.
If it's been running fine for ages and has now started to crash then it seems to me like the hardware issue is the most likely of the two. Can you run it on another machine to rule out the issue? Of course, it definitely wouldn't hurt to upgrade to the latest Java update as well.
Switching to another version of the Linux kernel "fixes" the JVM crash problem (http://forum.proxmox.com/threads/6998-Best-strategy-to-handle-strange-JVM-errors-inside-VPS?p=40286#post40286). It helped me on my real server, which was running Ubuntu Server 10.04 LTS with kernel version 2.6.32-33. The kernel update resolved the issue, and the JVM no longer crashes.
I frequently use the sendsignal tool for WebSphere Application Server processes (server, launchClient, wsadmin, etc.) in order to generate heap dumps. However, sendsignal doesn't work on 64-bit machines. Does anyone know of an alternative for sending the ctrl-break to a remote process?
Update: Bengt points out that this is basically the same as the following question. So, I guess my question is: has anyone found a way around this limitation in the past two years?
Can I send a ctrl-C (SIGINT) to an application on Windows?
Why use a control break for the dumps? Why not use the commands that are provided precisely for this kind of activity?
https://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp?topic=/com.ibm.websphere.express.doc/info/exp/ae/tprf_generatingheapdumps.html
$AdminControl invoke $objectName generateHeapDump
This provides you with the required info and you get the dumps that you are after.
HTH
Manglu
http://www.latenighthacking.com/projects/2003/sendSignal/
The latest comments there link to working x64 versions for Windows 2003 and Windows 2008 too:
(2013-9-26) : Both, the 32-bit and 64-bit EXE versions can be
downloaded from the following link:
https://github.com/walware/statet/tree/master/de.walware.statet.r.console.core/win32
-- GeorgeP (2014-3-7) : I built both 32-bit and 64-bit version with Ctrl-C, it's called SendSignalCtrlC.exe and you can download it at:
https://dl.dropboxusercontent.com/u/49065779/sendsignalctrlc/x86/SendSignalCtrlC.exe
https://dl.dropboxusercontent.com/u/49065779/sendsignalctrlc/x86_64/SendSignalCtrlC.exe
-- Juraj Michalak
In certain well-understood circumstances, our application will open too many sockets (database connections) and reach the maximum open files that the OS allows. We understand this; we are fixing the issue and also bumping up the limit.
What we can't explain is why parts of our application don't recover even after the number of connections abates and we're well within the limit.
In this case, it's an application running under Tomcat.
When this happens, we first start seeing "Too many open files" errors:
SEVERE: Socket accept failed
java.net.SocketException: Too many open files
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
at java.net.ServerSocket.implAccept(ServerSocket.java:453)
at java.net.ServerSocket.accept(ServerSocket.java:421)
at org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:61)
at org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:310)
at java.lang.Thread.run(Thread.java:619)
Eventually, we start seeing NoClassDefFoundErrors inside an application thread that's trying to open HTTP connections:
java.lang.NoClassDefFoundError: org/apache/commons/httpclient/protocol/ControllerThreadSocketFactory
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:128)
at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1349)
[...]
Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.protocol.ControllerThreadSocketFactory
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1387)
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1233)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
... 8 more
When the errant connections go away, the server starts accepting connections again, and everything seems ok, but we're left with the latter error constantly being spewed to stderr.
Although the application typically logs unloaded classes to stdout, I don't see any such logs just before, during or after the "Too many open files" errors.
My initial theory was that the Hotspot JVM would unload seemingly unused classes when it encounters "Too many open files," but if so, it doesn't log the fact.
Edit: As Stephen C indicates below, if it is unloading the class, and encounters an error the first time it reloads, that could explain why it never recovers. I think that's a good working theory. Is it documented in the Sun docs? Why would it not log that the class is being unloaded the way unloading a class usually is?
Platform details:
Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)
Apache Tomcat Version 6.0.18
I think that the reason you are getting repeated ClassNotFoundExceptions is that the first attempted class initialization of ControllerThreadSocketFactory failed due to the socket leakage problem. Your code is now repeatedly doing things that retrigger class initialization for the class, and each attempt reports the original problem.
If a class initialization fails first time, that's it. The JVM will not try to do it again.
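The general rule can be reproduced in isolation with the sketch below. The class names are made up, and a thrown IllegalStateException stands in for the resource failure. It illustrates the behaviour of failed class initialization, not necessarily what happened inside Tomcat's class loader.

```java
// Hypothetical demo class.
final class InitFailureDemo {
    static final class Broken {
        // Not a compile-time constant, so reading it triggers class initialization.
        static final int VALUE = fail();

        private static int fail() {
            throw new IllegalStateException("simulated resource failure");
        }
    }

    // Touches Broken and reports the type of the error that comes back.
    static String attempt() {
        try {
            return "ok " + Broken.VALUE;
        } catch (Throwable t) {
            return t.getClass().getName();
        }
    }

    public static void main(String[] args) {
        System.out.println(attempt()); // first use: initialization runs and fails
        System.out.println(attempt()); // later uses: initialization is not retried
    }
}
```

The first attempt reports java.lang.ExceptionInInitializerError. Every later attempt reports java.lang.NoClassDefFoundError, even if the original cause has long since gone away, until the class is loaded again by a fresh class loader.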
I'm facing the same issue with WebLogic 8.1 / JRockit R27.2 and a bunch of webapps that try to load resource bundles and then fail due to the limit on the number of open files. Stopping and starting the application (i.e. unloading and loading classloaders) makes things work again.
I spent the last 4 hours trying to set up Eclipse TPTP memory
profiling on a Tomcat instance that must be run remotely (i.e. not in
Eclipse). This should be possible according to the TPTP and Agent
Controller docs.
I installed the TPTP components (4.6.0) into my Eclipse (Galileo)
workbench, along with the Agent Controller according to the
instructions on the website. To enable the agent, I added the
following options to the command line that starts the Tomcat instance:
-agentlib:JPIBootLoader=JPIAgent:server=enabled;HeapProf:allocsites=true
and added the following directories to the front of the PATH:
D:\dev\tools\ac\plugins\org.eclipse.tptp.javaprofiler
D:\dev\tools\ac\bin
When attempting to start Tomcat I consistently got the following error
message:
ERROR: JDWP unable to get necessary JVMTI capabilities. ["debugInit.c",L279]
I did a lot of Googling but found nothing relevant; I tried
reinstalling TPTP and various versions of the Agent Controller.
In the end the problem turned out to be that I was starting Tomcat
with the "jpda" option, which catalina.bat translates into
-Xdebug -Xrunjdwp:transport=.....
Removing the "jpda" command argument caused JVMTI to start working.
SO, the question is: I found nothing during any of my searches to
indicate that a JVMTI agent is incompatible with debugging. Can
someone explain what is going on and why JVMTI + JDWP is not a valid
setup?
None of the answers so far are correct and this is the first hit that comes up on Google if you query the error mentioned, so I feel some clarification is needed.
JVMTI and JDWP do work together, in fact they generally must be used together. You will get ERROR: JDWP unable to get necessary JVMTI capabilities if -Xrunjdwp (and/or possibly -agentlib:jdwp) is specified more than once on the command line. To fix it, make sure you only have one of -Xrunjdwp or -agentlib:jdwp in your command line.
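Because the duplicate is often hidden in a variable expanded by a start script, it can help to count the JDWP options in the fully assembled argument list. The following is a minimal sketch; the class name JdwpOptionCheck is made up, and it only recognises the two spellings mentioned above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the JDK.
final class JdwpOptionCheck {
    // Counts options that load the JDWP agent in a list of JVM arguments.
    static int countJdwpOptions(List<String> jvmArgs) {
        int count = 0;
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xrunjdwp") || arg.startsWith("-agentlib:jdwp")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Pass the JVM options exactly as the start script assembles them.
        int n = countJdwpOptions(Arrays.asList(args));
        System.out.println("JDWP agent options found: " + n);
        if (n > 1) {
            System.out.println("More than one JDWP option: keep only one of them.");
        }
    }
}
```

The check takes the options as program arguments rather than inspecting its own JVM, since a VM started with the duplicate option fails during startup and would never get to run it.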
For more details, read on...
JVMTI (Java Virtual Machine Tool Interface) is the successor to JVMDI (Java Virtual Machine Debug Interface) and JVMPI (Java Virtual Machine Profiling Interface). It incorporates the functionality of both JVMDI and JVMPI, both of which were deprecated in Java 5 and removed in Java 6. It is the API that exposes the internals of the JVM for the purposes of debugging and profiling.
JDWP (Java Debug Wire Protocol) is a protocol that describes a simple mechanism for transmitting commands and responses. As far as I know, it is the only way for a debugger sitting outside the JVM to communicate with it and to interface with the JVMTI.
JDI (Java Debugger Interface) is a client-side (debugger-side) API which exposes some of the features of JVMTI while making use of JDWP more or less transparently.
The bug mentioned in Bob Dobbs's answer concerns the misleading error message, and the fact that the JVM will try to load JDWP once for every time it is specified on the command line. It doesn't state anywhere that JDWP and JVMTI cannot be used together.
More info here: https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_74/rzaha/jpdebuga.htm
I ran into the same problem as you, but I came across a JVM bug report (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6354345) that cast some light on the issue. It basically comes down to the Java agent library never having been intended to be loaded twice into the same VM. Sucks, but it seems like it's a basic limitation of the agent system that you can't do both at the same time.
For me it was the same issue as in Code Bling's post: -Xrunjdwp was specified twice. I didn't realize there was a second -Xrunjdwp because it was hidden in the variable %JAVA_OPTIONS%. Check your application server's start script.