How do I investigate the cause of a JVM crash?

How do I investigate the cause of a JVM crash? - java

One day ago, after a few months of normal working, our java app starts to crash occasionally with the following error:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (safepoint.cpp:247), pid=2075, tid=140042095163136
# guarantee(PageArmed == 0) failed: invariant
#
# JRE version: 6.0_23-b05
# Java VM: Java HotSpot(TM) 64-Bit Server VM (19.0-b09 mixed mode linux-amd64 compressed oops)
# An error report file with more information is saved as:
# /var/chat/jSocketer/build/hs_err_pid2075.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
#
I looked in hs_err_pid2075.log and saw that there was an active thread, that processed a network communication. However there wasn't any application or environment changes done in the last few months. Also there wasn't any load growth.
What can I do to understand, what is the reason of crash? Are there any common steps to investigate a jvm crash?
UPD
http://www.wuala.com/ubear/public

The crash is in the JVM, not in external native code. However, the operation it crashed on has been initiated by and external DLL.
This line in the hs_err_pid file explains the operation that crashed:
VM_Operation (0x00007f5e16e35450): GetAllStackTraces, mode: safepoint, requested by thread 0x0000000040796000
Now, thread 0x0000000040796000 is
0x0000000040796000 JavaThread "YJPAgent-Telemetry" daemon [_thread_blocked, id=2115, stack(0x00007f5e16d36000,0x00007f5e16e37000)]
which is a thread created by Yourkit. "GetAllStackTraces" is something that a profiler needs to call in order to do sampling. If you remove the profiler, the crash will not happen.
With this information It's not possible to say what causes the crash, but you can try the following: Remove all -XX VM parameters, -verbose:gc and the debugging VM parameters. They might interfere with the profiling interface of the JVM.
Update
Code that calls java.lang.Thread#getAllStackTraces() or java.lang.Thread#getStackTrace() may trigger the same crash

The two times I've witnessed recurring JVM crashes were both due to hardware failure, namely RAM. Running a memtest utility is the first thing I'd try.

I can see from the error report that you have the YourKit agent loaded. Its telemetry thread is mentioned as the requester for the operation that appears to fail. Try running the application without the YJP agent to see if you can still reproduce the crash.
Generally, JVM crashes are pretty hard to diagnose. They could happen due to a bug in some JNI code or in the JRE itself. If you suspect the latter, it may be worth submitting a bug report to Oracle.
Either way, I'd recommend to upgrade to the latest release of Java 6 to make sure it's not a known issue that's already been fixed. At the time of this writing the current release is Java 6 update 29.

If you're not messing with anything that would cause this directly (which basically means using native code or libraries that call native code) then it's almost always down to a bug in the JVM or hardware issue.
If it's been running fine for ages and has now started to crash then it seems to me like the hardware issue is the most likely of the two. Can you run it on another machine to rule out the issue? Of course, it definitely wouldn't hurt to upgrade to the latest Java update as well.

Switching to another version of linux-kernel "fixes" the JVM crush problem (http://forum.proxmox.com/threads/6998-Best-strategy-to-handle-strange-JVM-errors-inside-VPS?p=40286#post40286). It helped me with my real server. There was Ubuntu server 10.04 LTS OS on it with kernel 2.6.32-33 version. So kernel update resolved this issue. JVM has no crash anymore.

Related

EXCEPTION_ACCESS_VIOLATION Tomcat 9 outside JVM

I'm writing a messaging app as my final project for 12 grade and I have an error that I have no idea how to even approach. I disabled and re-enabled features of the web app one by one to see when the error occurs, and I observed that when I try to bring the active conversations that one user has, my server crashes. This doesn't happen constantly, it seems like random occurrences. Here is the error message by the server:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x000000007034537d, pid=13536, tid=9944
#
# JRE version: Java(TM) SE Runtime Environment 18.9 (11.0.10+8) (build 11.0.10+8-LTS-162)
# Java VM: Java HotSpot(TM) 64-Bit Server VM 18.9 (11.0.10+8-LTS-162, mixed mode, tiered, compressed oops, g1 gc, windows-amd64)
# Problematic frame:
SELECT * FROM conversations WHERE id1 = '1' OR id2 = '1' ORDER BY lastUpdate DESC LIMIT 9 OFFSET 0;#
CSELECT * FROM accounts WHERE id = 16;
[sqlite-3.30.1-03ac4057-b812-49a1-9a10-f02ba1c22986-sqlitejdbc.dll+0x537d]
#
# No core dump will be written. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
# C:\Users\paRa\AppData\Local\Temp\\hs_err_pid13536.log
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
From what I saw on other posts, this error message should point where the error is located, in my case if I go to the specific lines where those SQL queries are executed, I have no error.
What it should be noted is that when I put a break point in the API of the activeConversations function, the server never crashes, so I was thinking that it had something to do with how much information is being handled. However, there isn't much information transferred at the moment since I don't have that much information in the database.

The most likely explanation (outside of a bug in sqlite) is that the JVM is running the webserver with a lot of threads but the sqlite configuration is using single threaded mode and thus causing memory corruption.
Check that you are starting/configuring the sqlite with at least multi threaded mode.
https://sqlite.org/threadsafe.html

Ok, so I want to mention that #AlBlue message was really helpful in resolving my problem. As he said my database (SQLite) was handling multiple requests at the same time, and so it crashed. Since I did not know how to configure sqlite to multi threading, I used a deffered object in jquery to make a function run after another. You can see what i used by clicking on this link: Jquery Deffered Function (site is not seccured btw, so be carefull, I will also post the code down below in the case some of you do not want to access the site:
var functionOne = function() {
var r = $.Deferred();
// Do your whiz bang jQuery stuff here
console.log('Function One');
return r;
};
var functionTwo = function() {
// Do your whiz bang jQuery stuff here
console.log('Function Two');
};
// Now call the functions one after the other using the done method
functionOne().done( functionTwo() );
Again thank you to everyone for the answers! I am recent to this platform and I do not how this procedure works. Since my answer is a solution to my problem, and not to the error itself, then I should give the correct answer to #AlBlue correct? I'll update the answer once somebody comments. I want to give propper credit!
Edit: I read the tooltip of the button =))

An access violation means that native code (in this case inside SQLite) is trying to access memory it is not allowed to access, this usually indicates a bug in that native library (or in another native component it is depending on). It is unlikely - though not impossible - the problem is in your own code.
As a first step, try to see if using a newer version of SQLite solves the problem. If that doesn't solve it, try to reduce the code necessary to reproduce the problem and report it either to the authors or SQLite or the SQLite JDBC driver you're using.

Java process crashes on Linux (ubuntu 14.04)

I have a Java process ran from shell script on Ubuntu 14.04, that crashes abnormally without any visible reason and no logs. The Java program uses Twitter's Userstream API. I've been looking for traces in /var/log but did not find anything that could explicitly point to a problem. Please advise how approach this issue and find any useful log that could indicate the problem.
Also, this is my Java version:
Java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)

The first step in troubleshooting HotSpot crashes is to locate a crash log, alternatively called fatal error log. By default, these are named hs_err_pid%pid.log, with %pid being the process ID of the crashed VM, and can usually be found in the cwd where the process started. The name and location of said log can also be influenced through the -XX:ErrorFile-VM parameter. Example:
-XX:ErrorFile=/var/log/java/java_error%pid.log.
You can find more information about the crash logs themselves here.
The contents of that log can give an indication on what happened, and where approximately. Be prepared tho, that situations in which a HotSpot crash occurs, are usually not caused by mere bugs in the hosted Java program. An extensive guide to interpreting these crash logs can be found here here.
If no such log can be located after a crash, odds are the VM did not crash, but terminated normally. In that case, a remote debugging session might be in order. Remote debugging is detailed here and also has some topics on SO already.
(There's the very remote chance that circumstances do not permit the log to be written, i.e. no available file handles during the crash.)

The problem was that my process terminated when I disconnected SSH from the server. In order to run the process in background and prevent process termination when disconnected use 'nohup':
~$ nohup process_name &
I found the answer in this thread.

JMonkeyEngine crash on Intel video adapter

I am using JME in my application and sometimes it crashes with the following message:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x3d601ad7, pid=168, tid=4012
#
# JRE version: 6.0_29-b11
# Java VM: Java HotSpot(TM) Client VM (20.4-b02 mixed mode, sharing windows-x86)
# Problematic frame:
# C [ig4dev32.dll+0x21ad7]
#
# An error report file with more information is saved as:
# C:\...\hs_err_pid168.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
The log file can be found by this link: http://sergpank.heliohost.org/log.html
The strangest thing is that in my case I get crashes only ob builded code, but when I am launcing it from the Eclipse, everything works fine on my machine. On machines with AMD video adapters nothing crashes. On other machines with Intel videocard sometimes crashes appear and on the debug stage.
I am starting to suppose that this happens because of incorrect ant setup (in startup.ini the following path is set up: -Djava.library.path=lib/dlls so dlls is seen for the project). But still can't get why it works almost perfectly on AMD and crashes on Intel.
Maybe it is something related to the ant, and I have to add dlls to the manfest... looking through the documentation and can't find the way how it can be done.
Solution:
On 64 bit system is necessary to use the corresponding JVM(64-bit) and then nothing crashes =))

Crash happens because 32-bit JVM was used on 64-bit OS. It seems that in this case 32-bit dlls was loaded and that's why crash happened.
Issue is reproducible only on Intel video cards, I think it can be considered as a serious bug. If Intel would like to fix it or propose a working solution/workaround this could be great! =)

Avoid doing the heavy work of the OpenGL stuff in the Swing Event Dispatch Thread (note the thread which crashes the JVM: =>0x3a88e000 JavaThread "AWT-EventQueue-0" [_thread_in_native, id=5228, stack(0x3b170000,0x3b1c0000)). I believe OpenGL stuff should be done in the thread offered by JMonkeyEngine using the event dispatch mechanism it has. If you're using somebody else's API for Swing rendering, you might need to change it or do it a different way.
Edit: it looks like AWTGLCanvas does this, changing the context to the current thread. It looks like the Intel drivers may have trouble with context switches if normal fullscreen 3D stuff works. Strictly speaking, the GL thread context stuff shouldn't be necessary as you can always dispatch work to be executed to a singular OpenGL thread which should be fine as long as you only have a single OpenGL rendering viewport. The LWJGL Canvas implementation assumes you will want multiple viewports when this isn't necessarily the case. You can recode this to only support one viewport if that is fine, and you shouldn't get crashes as the code is simpler.

How serious is the Java7 "Solr/Lucene" bug?

Apparently Java7 has some nasty bug regarding loop optimization: Google search.
From the reports and bug descriptions I find it hard to judge how significant this bug is (unless you use Solr or Lucene).
What I'd like to know:
How likely is it that my (any) program is affected?
Is the bug deterministic enough that normal testing will catch it?
Note: I can't make users of my program use -XX:-UseLoopPredicate to avoid the problem.

The problem with any hotspot bugs, is that you need to reach the compilation threshold (e.g. 10000) before it can get you: so if your unit tests are "trivial", you probably won't catch it.
For example, we caught the incorrect results issue in lucene, because this particular test creates 20,000 document indexes.
In our tests we randomize different interfaces (e.g. different Directory implementations) and indexing parameters and such, and the test only fails 1% of the time, of course its then reproducable with the same random seed. We also run checkindex on every index that tests create, which do some sanity tests to ensure the index is not corrupt.
For the test we found, if you have a particular configuration: e.g. RAMDirectory + PulsingCodec + payloads stored for the field, then after it hits the compilation threshold, the enumeration loop over the postings returns incorrect calculations, in this case the number of returned documents for a term != the docFreq stored for the term.
We have a good number of stress tests, and its important to note the normal assertions in this test actually pass, its the checkindex part at the end that fails.
The big problem with this, is that lucene's incremental indexing fundamentally works by merging multiple segments into one: because of this, if these enums calculate invalid data, this invalid data is then stored into the newly merged index: aka corruption.
I'd say this bug is much sneakier than previous loop optimizer hotspot bugs we have hit (e.g. sign-flip stuff, https://issues.apache.org/jira/browse/LUCENE-2975). In that case we got wacky negative document deltas, which make it easy to catch. We also only had to manually unroll a single method to dodge it. On the other hand, the only "test" we had initially for that was a huge 10GB index of http://www.pangaea.de/, so it was painful to narrow it down to this bug.
In this case, I spent a good amount of time (e.g. every night last week) trying to manually unroll/inline various things, trying to create some workaround so we could dodge the bug and not have the possibility of corrupt indexes being created. I could dodge some cases, but there were many more cases I couldn't... and I'm sure if we can trigger this stuff in our tests there are more cases out there...

Simple way to reproduce the bug. Open eclipse (Indigo in my case), and Go to Help/Search. Enter a search string, you will notice that eclipse crashes. Have a look at the log.
# Problematic frame:
# J org.apache.lucene.analysis.PorterStemmer.stem([CII)Z
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
#
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
#
--------------- T H R E A D ---------------
Current thread (0x0000000007b79000): JavaThread "Worker-46" [_thread_in_Java, id=264, stack(0x000000000f380000,0x000000000f480000)]
siginfo: ExceptionCode=0xc0000005, reading address 0x00000002f62bd80e
Registers:

The problem, still exist as of Dec 2, 2012
in both Oracle JDK
java -version
java version "1.7.0_09"
Java(TM) SE Runtime Environment (build 1.7.0_09-b05)
Java HotSpot(TM) 64-Bit Server VM (build 23.5-b02, mixed mode)
and openjdk
java version "1.7.0_09-icedtea"
OpenJDK Runtime Environment (fedora-2.3.3.fc17.1-x86_64)
OpenJDK 64-Bit Server VM (build 23.2-b09, mixed mode)
Strange that individually any of
-XX:-UseLoopPredicate or -XX:LoopUnrollLimit=1
option prevent bug from happening,
but when used together - JDK fails
see e.g.
https://bugzilla.redhat.com/show_bug.cgi?id=849279

Well it's two years later and I believe this bug (or a variation of it) is still present in 1.7.0_25-b15 on OSX.
Through very painful trial and error I have determined that using Java 1.7 with Solr 3.6.2 and autocommit <maxTime>30000</maxTime> seems to cause index corruption. It only seems to happen w/ 1.7 and maxTime at 30000- if I switch to Java 1.6, I have no problems. If I lower maxTime to 3000, I have no problems.
The JVM does not crash, but it causes RSolr to die with the following stack trace in Ruby:
https://gist.github.com/armhold/6354416. It does this reliably after saving a few hundred records.
Given the many layers involved here (Ruby, Sunspot, Rsolr, etc) I'm not sure I can boil this down into something that definitively proves a JVM bug, but it sure feels like that's what's happening here. FWIW I have also tried JDK 1.7.0_04, and it also exhibits the problem.

As I understand it, this bug is only found in the server jvm. If you run your program on the client jvm, you are in the clear. If you run your program on the server jvm it depends on the program how serious the problem can be.

Java Crashed application - how to read crash file generated by JVM?

I have java aplication and it started to crash sudently, without no exception. But sometimes JVM creates crash log file, which has name like: "hs_err_pid10930.log". Can anybody read it and tell me what is wrong? I am not able to find out what si wrong. The only reasonable info which I find here is that swap size is 0. I that a problem? How could it occure?
You can find the file here: http://chessfriends-release.s3.amazonaws.com/logs/hs_err_pid10930.log?AWSAccessKeyId=AKIAJP5BYGKOCMCDVZHA&Expires=1305128715&Signature=XEZMuJ0xNSM6YTcdwsI04ahhiYk%3D
Thanks.
Libor

Whenever you get a crash like this it's almost never the Java programmer's fault because the JVM is crashing which it shouldn't. By looking at your log file, it looks like it's crashing somewhere in the OpenJDK's JVM; I don't know what specifically is causing it. I would suggest you try out Oracle's official JDK rather than OpenJDK.
I'm not an expert on reading these sorts of crash dumps, but this is the part I use to identify what is causing the problem:
# Problematic frame:
# V [libjvm.so+0x64d62d]
This is at the top of the dump. It's not always libjvm.so; I've seen some with like libGL.so.
If you would like to file a bug, the dump includes this statement:
# If you would like to submit a bug report, please include
# instructions how to reproduce the bug and visit:
# https://bugs.launchpad.net/ubuntu/+source/openjdk-6/
I don't know what it is you're doing that causes the crash, and maybe there is a workaround. But under no circumstances should the JVM crash, so this is a bug in the JVM you're using.
Edit
The log says you're running Ubuntu 9.10; there have been two Ubuntu releases since then so I doubt filing a bug would do any good unless you test this out on either Ubuntu 10.04 or 10.10. I don't know if you're able to upgrade to a newer version, but your problem may have already been fixed.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.