We have now twice encountered a SIGSEGV crash in the Oracle 1.8.0_60 JVM where the error log doesn't implicate any library; it just says that the crash happened in native code:
# Problematic frame:
# C 0x00007f6d04000000
#
and
# Problematic frame:
# C 0x00007fc6ec048ff0
#
Both times the crashing thread was an application thread: either a ForkJoinPool thread or a Tomcat ajp-bio thread.
What could've gone wrong? Normally, when a failure occurs in a native library, the name of the library is included.
The sanitized hs_err logs for the first crash and the second crash (full version for comparison) don't give me many more ideas about what could have gone wrong, as there seems to be enough memory. Only the metaspace is anywhere near running out.
The environment runs on 64-bit Linux:
[foo@bar ~]$ uname -a
Linux bar 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 06:48:29 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
You only posted one full log, so it's not really possible to spot a pattern here, but the C frame is in a non-executable memory region and outside the code space. The VM events also show a flurry of re/deoptimizations and a bias revocation. So my guess is that it might be a miscompilation.
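To check for yourself whether a crash address lies in an executable region, you can compare it against the memory map in the hs_err log. The following is only a minimal sketch: it assumes the "Dynamic libraries:" section has the same layout as /proc/<pid>/maps on Linux, and the sample map and the names find_mapping and describe_frame are made up for illustration, not taken from the posted logs.

```python
import re

# One line of a /proc/<pid>/maps style listing:
#   start-end perms offset dev inode [path]
MAP_LINE = re.compile(
    r"^([0-9a-fA-F]+)-([0-9a-fA-F]+)\s+([rwxsp-]{4})\s+\S+\s+\S+\s+\S+\s*(.*)$"
)


def find_mapping(maps_text, address):
    """Return (perms, path) of the mapping that contains address, or None."""
    for line in maps_text.splitlines():
        match = MAP_LINE.match(line.strip())
        if not match:
            continue  # skip headers and anything that is not a mapping line
        start = int(match.group(1), 16)
        end = int(match.group(2), 16)
        if start <= address < end:
            return match.group(3), match.group(4).strip()
    return None


def describe_frame(maps_text, address):
    """Summarize what kind of memory a crash address points into."""
    mapping = find_mapping(maps_text, address)
    if mapping is None:
        return "unmapped"
    perms, path = mapping
    kind = "executable" if "x" in perms else "non-executable"
    return f"{kind} mapping ({perms}) {path or '[anonymous]'}"


# Hypothetical sample data, NOT copied from the logs in the question.
SAMPLE_MAPS = """\
00400000-00401000 r-xp 00000000 fd:01 1234 /usr/bin/java
7f6d04000000-7f6d04021000 rw-p 00000000 00:00 0
7f6d10000000-7f6d10a00000 r-xp 00000000 fd:01 5678 /usr/lib/jvm/lib/amd64/server/libjvm.so
"""

if __name__ == "__main__":
    print(describe_frame(SAMPLE_MAPS, 0x00007F6D04000000))
```

An address that lands in an anonymous rw-p mapping, as in the sample, is data rather than code, which is why no library name can be printed for the frame.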
Things you can try:
Update your JVM. 8.0_60-b27 is not the latest patch level.
Try -XX:-UseBiasedLocking -XX:-TieredCompilation
Try -XX:-UseBiasedLocking -XX:TieredStopAtLevel=1
If updating the VM does not fix it but one of the options does, then it's probably a VM bug and you should file it with your Linux distribution or Oracle.
I followed the installation instructions at http://bendemott.blogspot.de/2013/11/installing-pylucene-4-451.html for PyLucene, using the latest pylucene-4.9.0.0.
When I tried to call lucene.initVM(), I got the following error:
alvas@ubi:~$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lucene
>>> lucene.initVM()
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007ffba22808b8, pid=5189, tid=140718811092800
#
# JRE version: OpenJDK Runtime Environment (7.0_65-b32) (build 1.7.0_65-b32)
# Java VM: OpenJDK 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea 2.5.3
# Distribution: Ubuntu 14.04 LTS, package 7u71-2.5.3-0ubuntu0.14.04.1
# Problematic frame:
# V [libjvm.so+0x6088b8] jni_RegisterNatives+0x58
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/alvas/hs_err_pid5189.log
#
# If you would like to submit a bug report, please include
# instructions on how to reproduce the bug and visit:
# http://icedtea.classpath.org/bugzilla
#
Aborted (core dumped)
The error report file is at http://pastebin.com/6B8FyC4Z
Is there something wrong with my IcedTea configuration, my JDK, or my JRE?
How should I resolve the problem?
So I took a look at your stack trace, and I don't think the issue is specific to PyLucene. In the stack trace, you see this error:
siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x0000000000000000
If you look at the first part, SIGSEGV, that means a segmentation fault occurred in the process. SEGV_MAPERR is the specific error code; it means the faulting address (si_addr, here 0x0) is not mapped into the process's address space, so something dereferenced a null pointer. This could've been caused by not enough memory, a bad pagefile/virtual memory, bad address space, or even a bad library. Why it worked on another machine could be anything. Core dumps are really useful, so if you can run
ulimit -c unlimited
that will help give you something to look at. Was this in a VM or on a physical machine? I've seen random SIGSEGVs in my Ubuntu VMs if they don't have enough memory allocated for various Java tasks. I saw this on my ESXi hypervisors specifically, and I noticed it most when ESXi started to perform memory swapping. I was able to resolve this by increasing memory, rebooting the VM, and making sure my hypervisor wasn't swapping memory. Let me know if that helps. :)
Edit: I also noticed that poor performance of the underlying storage provider slowed down swapping, and I feel that also contributed to the SIGSEGV issues.
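If you have several crash logs to compare, the siginfo line quoted above can be pulled apart mechanically. This is a rough sketch only: the function names are mine, the regular expression is fitted to the one siginfo line shown in this answer (other JVM versions may format it differently), and the 4096-byte page size used to flag a probable null pointer dereference is an assumption.

```python
import re

# Fitted to the siginfo line format shown in this thread; an assumption,
# since other JVM versions may print the fields differently.
SIGINFO = re.compile(
    r"si_signo=(?P<signal>\w+).*?"
    r"si_code=(?P<code>\d+)\s*\((?P<code_name>\w+)\).*?"
    r"si_addr=(?P<address>0x[0-9a-fA-F]+)"
)


def parse_siginfo(line):
    """Extract signal name, si_code and faulting address, or return None."""
    match = SIGINFO.search(line)
    if match is None:
        return None
    return {
        "signal": match.group("signal"),
        "code": int(match.group("code")),
        "code_name": match.group("code_name"),
        "address": int(match.group("address"), 16),
    }


def summarize(info, page_size=4096):
    """Give a rough reading of a SIGSEGV si_code (page_size is assumed)."""
    if info["code_name"] == "SEGV_MAPERR" and info["address"] < page_size:
        return "unmapped address in the first page: looks like a null pointer dereference"
    if info["code_name"] == "SEGV_MAPERR":
        return "address is not mapped into the process"
    if info["code_name"] == "SEGV_ACCERR":
        return "address is mapped, but the access violated its permissions"
    return "unrecognized si_code"


if __name__ == "__main__":
    line = ("siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), "
            "si_addr=0x0000000000000000")
    info = parse_siginfo(line)
    print(info)
    print(summarize(info))
```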
I ran a script in MATLAB and it worked fine. When I tried to run the script again, MATLAB got stuck in the "busy" state. I found a file "hs_err_pid1124" in my working directory that contains the following:
A fatal error has been detected by the Java Runtime Environment:
#
# java.lang.OutOfMemoryError: requested 16384000 bytes for GrET in
C:\BUILD_AREA\jdk6_17\hotspot\src\share\vm\utilities\growableArray.cpp. Out of swap
space?
#
# Internal Error (allocation.inline.hpp:39), pid=1124, tid=1380
# Error: GrET in
C:\BUILD_AREA\jdk6_17\hotspot\src\share\vm\utilities\growableArray.cpp
#
# JRE version: 6.0_17-b04
# Java VM: Java HotSpot(TM) Client VM (14.3-b01 mixed mode windows-x86 )
.
.
.
My computer has 4 GB of RAM. I increased the system swap space, but the problem is still not solved.
Thanks,
The most likely suspect here is your code. I suspect you are doing something unusual (opening a file and not closing it later? Reading each file into a continuously growing variable?).
However, without the code this is hard to diagnose.
Here is what you can do:
Evaluate the visible memory usage: Put a breakpoint somewhere halfway through, and inspect the size of the largest variables. Also check the total size. (If the error is a regular MATLAB error, you could also use dbstop if error.)
Persuade MATLAB to release memory: If step 1 yields nothing, you may actually be doing things right, but perhaps MATLAB is not managing its memory properly. This is rare, but it sometimes occurs when repeating simple tasks many times. In this case you can place the pack command somewhere in your code. It will probably help.
This is the first time I am in this situation with Java.
Java just core dumps with the following error:
#
# A fatal error has been detected by the Java Runtime Environment:
#
[thread 140213457409792 also had an error]# Internal Error (safepoint.cpp:300), pid=4327
, tid=140213211031296
# guarantee(PageArmed == 0) failed: invariant
#
# JRE version: 6.0_24-b24
# Java VM: OpenJDK 64-Bit Server VM (20.0-b12 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea6 1.11.4
# Distribution: Ubuntu 12.04 LTS, package 6b24-1.11.4-1ubuntu0.12.04.1
# An error report file with more information is saved as:
# /tmp/hs_err_pid4327.log
#
# If you would like to submit a bug report, please include
# instructions how to reproduce the bug and visit:
# https://bugs.launchpad.net/ubuntu/+source/openjdk-6/
When I tried running it on Mac OS, it core dumped at the same place (and the JREs must be different), so it must be something related to the code. I have no idea how to debug this: it is not an exception, and the log file mentioned above does not give me much information. Any ideas on what I can do to find the bug?
The /tmp/hs_err_pid4327.log file should contain a stack trace of where the crash occurred. Unless you are making a JNI call, it is probably a Java bug.
The crash message is telling you what you should do:
If you would like to submit a bug report, please include
instructions how to reproduce the bug and visit:
https://bugs.launchpad.net/ubuntu/+source/openjdk-6/
A quick look makes me think this is already reported.
The bug probably isn't in your code, per se. It's most likely an environmental issue - perhaps a JVM bug, perhaps some unusual condition, and most likely, both - a bug that occurs rarely, under an odd circumstance.
Google for the key elements in the message (e.g. "safepoint.cpp:300"), look at the other reports, and look for things you have in common or workarounds that may apply. In this case, one set of reports suggests that heavy multithreading may contribute to the problem.
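As a small illustration of picking out the "key elements" to search for, here is a sketch that extracts the internal error location and the failed guarantee/assert from a crash header. The function name search_terms and the two patterns are my own and are fitted to the header quoted in this question; other crash types will need other patterns.

```python
import re


def search_terms(header_text):
    """Collect short, distinctive strings from an hs_err header to search for."""
    terms = []

    # e.g. "Internal Error (safepoint.cpp:300)" -> "safepoint.cpp:300"
    location = re.search(r"Internal Error \(([^)]+)\)", header_text)
    if location:
        terms.append(location.group(1))

    # e.g. "guarantee(PageArmed == 0) failed"
    check = re.search(r"(?:guarantee|assert)\(.*?\) failed", header_text)
    if check:
        terms.append(check.group(0))

    return terms


if __name__ == "__main__":
    header = (
        "[thread 140213457409792 also had an error]"
        "# Internal Error (safepoint.cpp:300), pid=4327\n"
        "# guarantee(PageArmed == 0) failed: invariant\n"
    )
    for term in search_terms(header):
        print(f'"{term}"')
```

Searching for each term in quotes, together with the JVM version, tends to narrow the results to reports of the same failure.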
Check whether you have an hprof file in your application directory. Alternatively, you can take a heap dump at any time using
jmap -dump:file=<file_name> <pid>
and then analyze the dump using MAT: http://www.eclipse.org/mat/
You could also consider the other tools listed here:
Tool for analyzing java core dump
I have been working on a large Java application. It is quite parallel, and uses several fixedThreadPools (each with 8 threads). I am running it on a computer with 2 processors, each with 4 cores. My program analyzes large sets of data, and the analysis is saved (serialized) after every set. Because the analysis works across data sets, it is re-loaded every time I run a new one (and then saved again).
My problem is this: after running 4-5 data sets (takes about 2 days, and I'm pretty happy with my coding efficiency) it will crash, after exactly the same amount of time on the 5th set (no matter which data set I use). The program is repetitive, and so there is nothing new in the code going on at this time. It is reproducible, and I am not sure what to do. I can post the full error log if that would help... I understand that this problem is ambiguous without a lot more detailed information, but if there are any go-to suggestions, it would be greatly appreciated.
I have been testing different settings to see if anything helps, and right now I am running with the following arguments.
-Xmx6g -Xmx12g -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC
Thanks,
Joe
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x0000000000000000, pid=18454, tid=140120548144896
#
# JRE version: 7.0_03-b147
# Java VM: OpenJDK 64-Bit Server VM (22.0-b10 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea7 2.1.1pre
# Distribution: Ubuntu precise (development branch), package 7~u3-2.1.1~pre1-1ubuntu2
# Problematic frame:
# C 0x0000000000000000
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please include
# instructions on how to reproduce the bug and visit:
# https://bugs.launchpad.net/ubuntu/+source/openjdk-7/
#
Just a wild guess...
It might be that it is not able to create more files.
If you are running this on Linux, try running
ulimit -c unlimited
before you run your Java program. Note that this setting only lifts the limit on the size of core files, so its effect is:
If any error occurs, a core dump will be created.
The limit on the number of open files is a separate setting (ulimit -n).
Also check how many files the program has open while it is running.
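To see which limits actually apply, you can read them from the shell that launches the JVM. The sketch below uses Python's standard resource module (Unix only); the helper names are made up for illustration. It reports the core file size limit, which is what ulimit -c controls, next to the open files limit, which is what ulimit -n controls.

```python
import resource


def describe_limit(value):
    """Render an rlimit value the way ulimit does."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return soft and hard limits for core file size and open files."""
    wanted = (
        ("core file size (ulimit -c)", resource.RLIMIT_CORE),
        ("open files (ulimit -n)", resource.RLIMIT_NOFILE),
    )
    limits = {}
    for label, key in wanted:
        soft, hard = resource.getrlimit(key)
        limits[label] = (describe_limit(soft), describe_limit(hard))
    return limits


if __name__ == "__main__":
    for label, (soft, hard) in current_limits().items():
        print(f"{label}: soft={soft} hard={hard}")
```

Child processes inherit these limits, so run it from the same shell (and as the same user) that starts the Java program.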
I'd instrument it with something like VisualVM. It'll show what's happening with memory, threads, CPU, objects created, etc. in real time as your app runs.
The nice version that I have is for Oracle/Sun JVMs only. There's one that ships with the JDK, but I don't believe it shows as much detail as the version 1.6.3 with all plugins installed.
Just add -Dorg.eclipse.swt.browser.DefaultType=mozilla to the eclipse.ini file.
My JVM (1.6.0_29) keeps crashing on intensive use when indexing documents with Lucene.
I get:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00002b6b196d767c, pid=26417, tid=1183217984
#
# JRE version: 6.0_29-b11
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.4-b02 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# J org.apache.lucene.store.DataInput.readVInt()I
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
#
Environment:
JDK: 1.6u29 (same issue with 1.6_02)
Lucene Version 3.4.0
vm_info: Java HotSpot(TM) 64-Bit Server VM (20.4-b02) for linux-amd64 JRE (1.6.0_29-b11), built on Oct 3 2011 01:19:20 by "java_re" with gcc 3.2.2 (SuSE Linux)
OS:CentOS release 5.0 (Final)
jvm_args: -Dcatalina.home=/var/local/tomcat-8081 -Dcatalina.base=/var/local/tomcat-8081 -Djava.io.tmpdir=/var/tmp -Dfile.encoding=UTF-8 -Xmx1024M -XX:MaxPermSize=96m
It seems to be a JDK issue that was fixed in JDK 1.7, but other issues were introduced.
https://issues.apache.org/jira/browse/LUCENE-3335
"Java 7 contains a fix to the readVInt issue since 1.6.0_21 (approx, LUCENE-2975)"
So, how can I fix this issue using JDK 1.6?
Should I upgrade to jdk 1.7?
These JDK issues are also fixed in 1.6.0_29 (not only 1.7.0u1). readVInt can no longer crash. So your crash is not related to any of the "famous java6/7 bugs" (the vint bug does not crash your JVM at all; it just corrupts your index by returning wrong values, and this one is definitely fixed since Lucene 3.1).
But there is another way you can crash your JVM: you are on a 64-bit platform (Linux), so the default directory implementation is MMapDirectory. Lucene uses a hack to be able to unmap mapped files from the virtual address space. The JVM itself does not allow this and instead makes unmapping dependent on the garbage collector, which is a problem for Lucene. By default MMapDirectory unmaps the files after closing the IndexInputs. MMapDirectory is not synchronized at all, so when another thread tries to access the IndexInput after unmapping, it will access an unmapped address and will SIGSEGV.
If your code were correct this could not happen, but it looks like you are using an already closed IndexReader/IndexWriter to access the index. Before Lucene 3.5 (which will come out soon), missing checks in IndexReader make it possible for an already closed IndexReader, with all its closed (and unmapped) IndexInputs, to try to access index data and segfault.
In 3.5 we added additional safety checks to prevent this illegal access, but it's not 100% safe (as synchronization is missing). I would review the code and check that nothing accesses a closed index.
A simple check to see if this is your issue would be to use NIOFSDirectory (slower on Linux) instead of MMapDirectory. If it does not crash and possibly throws AlreadyClosedExceptions, the bug is accessing closed indexes.
According to this article the following could cause it in Java 6 as well:
Please note: Also Java 6 users are affected, if they use one of those
JVM options, which are not enabled by default:
-XX:+OptimizeStringConcat or
-XX:+AggressiveOpts
Are you using any of them?
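A quick way to answer that from a crash log is to look at its jvm_args line. The following is only a sketch: the function name risky_flags_in is made up, and it assumes the log contains a line starting with "jvm_args:" like the one in the question. It reports only the two options named in the quote above.

```python
# The two non-default options named in the quoted article.
RISKY_FLAGS = ("-XX:+OptimizeStringConcat", "-XX:+AggressiveOpts")


def risky_flags_in(hs_err_text):
    """Return the risky flags present on the jvm_args line, if any."""
    for line in hs_err_text.splitlines():
        if line.startswith("jvm_args:"):
            # Splitting on whitespace is a simplification: an argument
            # that itself contains spaces would be broken apart.
            args = line[len("jvm_args:"):].split()
            return [flag for flag in RISKY_FLAGS if flag in args]
    return []


if __name__ == "__main__":
    log = (
        "jvm_args: -Dcatalina.home=/var/local/tomcat-8081 "
        "-Dcatalina.base=/var/local/tomcat-8081 -Djava.io.tmpdir=/var/tmp "
        "-Dfile.encoding=UTF-8 -Xmx1024M -XX:MaxPermSize=96m"
    )
    print(risky_flags_in(log) or "none of the listed flags are set")
```

For the jvm_args posted in the question this reports none of them, so those options would not explain this particular crash.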