I've encountered an interesting problem when running the following piece of Java code:
File.createTempFile("temp.cnt.ent", "cnt.feat.tmp", directory);
The following exception is thrown:
Exception in thread "main" java.io.IOException: Identifier removed
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.checkAndCreate(File.java:1704)
at java.io.File.createTempFile(File.java:1792)
I have never had this problem before and Google doesn't seem to have much for me. The system runs Scientific Linux release 5.8 (Linux 2.6.18-274.3.1.el5 x86_64) and the Java version is
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
The file system (Lustre) has 80TB of free space.
Any suggestions are greatly appreciated.
You are encountering synchronisation errors between the various instances. Lustre doesn't support file locking, which is probably what java.io.UnixFileSystem.createFileExclusively uses to avoid concurrency woes. (I say "probably" because it doesn't appear to be documented anywhere.)
Without locking it's only a matter of time until file operations interfere with each other. Reducing the number of instances is not a solution because it just makes it less likely to occur.
I believe the solution is to ensure that each instance creates files in a different sub-directory.
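A minimal sketch of that idea (the base directory path and naming scheme here are purely illustrative): each JVM instance creates its own sub-directory, e.g. from a random UUID, and only ever creates temp files inside it.

import java.io.File;
import java.io.IOException;
import java.util.UUID;

public class PerInstanceTempDir {
    public static void main(String[] args) throws IOException {
        // Hypothetical base directory on the shared Lustre file system.
        File baseDir = new File("/lustre/scratch/myjob");
        // Give this JVM instance a private sub-directory so concurrent
        // instances never create files in the same directory.
        File instanceDir = new File(baseDir, "instance-" + UUID.randomUUID());
        if (!instanceDir.mkdirs() && !instanceDir.isDirectory()) {
            throw new IOException("Could not create " + instanceDir);
        }
        File tmp = File.createTempFile("temp.cnt.ent", "cnt.feat.tmp", instanceDir);
        System.out.println("Created " + tmp.getAbsolutePath());
    }
}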
I guess that you are seeing an EIDRM; at least the error message looks like it. The IOException wraps an error message from the underlying native libraries.
This is not a real answer to your problem, but maybe a useful hint.
http://docs.oracle.com/cd/E19455-01/806-1075/msgs-1432/index.html has some information and additional pointers.
The problem seems to be related to having too many instances of the application running at a time (each in a separate VM). For some unknown reason the OS refuses to allow the creation of a temp file. Workaround: run fewer instances.
I have a Java process run from a shell script on Ubuntu 14.04 that crashes abnormally without any visible reason and without any logs. The Java program uses Twitter's Userstream API. I've been looking for traces in /var/log but did not find anything that could explicitly point to a problem. Please advise how to approach this issue and find any useful log that could indicate the problem.
Also, this is my Java version:
Java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
The first step in troubleshooting HotSpot crashes is to locate a crash log, alternatively called a fatal error log. By default these are named hs_err_pid<pid>.log, with <pid> being the process ID of the crashed VM, and they can usually be found in the working directory the process was started from. The name and location of the log can also be changed through the -XX:ErrorFile VM parameter, where %p expands to the process ID. Example:
-XX:ErrorFile=/var/log/java/java_error%p.log
You can find more information about the crash logs themselves here.
The contents of that log can give an indication of what happened, and where, approximately. Be prepared, though, that the situations in which a HotSpot crash occurs are usually not caused by mere bugs in the hosted Java program. An extensive guide to interpreting these crash logs can be found here.
If no such log can be located after a crash, odds are the VM did not crash, but terminated normally. In that case, a remote debugging session might be in order. Remote debugging is detailed here and also has some topics on SO already.
(There's the very remote chance that circumstances did not permit the log to be written, e.g. no available file handles during the crash.)
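As for the remote debugging mentioned above, a hedged example of how it is typically enabled: start the JVM with the standard JDWP agent and attach a debugger to the chosen port (the port number and jar name here are just placeholders).

java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -jar myapp.jar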
The problem was that my process terminated when I disconnected SSH from the server. To run the process in the background and prevent it from terminating when you disconnect, use 'nohup':
~$ nohup process_name &
I found the answer in this thread.
I have written a short application that converts files from their raw data to XML (ECGs). I have about 350,000 files to convert, and the conversion itself is done via a library that I got from the manufacturer of the ECG devices. To make use of the multiple processors and cores in the machine I'm using, I wrote a "wrapper application" that creates a pool of threads, which is then used to do the conversion in separate threads. It works reasonably well, but unfortunately I do get random errors that cause the whole application to stop (85,000 files have been converted over the past 3-4 days and I have had four of those errors):
A fatal error has been detected by the Java Runtime Environment:
EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x71160a6c, pid=1468, tid=1396
JRE version: Java(TM) SE Runtime Environment (8.0_20-b26) (build 1.8.0_20-b26)
Java VM: Java HotSpot(TM) Client VM (25.20-b23 mixed mode windows-x86 )
Problematic frame:
C [msvcr100.dll+0x10a6c]
I would suspect that it's the library I'm using that causes these, so I don't think I can do all that much to fix it. When that error happens, I run the program again and let it start where it left off before crashing. Right now I have to do that manually, but I was hoping there is some way to let Eclipse restart the program (with an argument of the filename where it should start). Does anyone know if there is some way to do that?
Thanks!
It is not entirely clear, but I think you are saying that you have a 3rd party Java library (with a native code component) that you are running within one JVM using multiple threads.
If so, I suspect that the problem is that the native part of the 3rd-party application is not properly multi-threaded, and that is the root cause of the crashes. (I don't expect that you want to track down the cause of the problem ...)
Instead of using one JVM with multiple converter threads, use multiple JVMs each with a single converter thread. You can spread the conversions across the JVMs either by partitioning the work statically, or by some form of queueing mechanism.
Or ... you could modify your existing wrapper so that the threads launch the converter in separate JVMs using ProcessBuilder. If a converter JVM crashes, the wrapper thread that launched it could just launch it again. Alternatively, it could just make a note of the failed conversion and move on to the next one. (You need to be a bit careful with retrying, in case it is something about the file you are converting that is triggering the JVM crash.)
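A rough sketch of that second approach, assuming a hypothetical Converter class whose main() takes the file to convert as its only argument; the wrapper launches one child JVM per file and retries once if the child exits with a non-zero status (or crashes).

import java.io.File;
import java.io.IOException;

public class ConverterLauncher {
    // Launches a separate JVM for one conversion and retries it once on failure.
    public static void convert(File input) throws IOException, InterruptedException {
        String javaBin = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        for (int attempt = 1; attempt <= 2; attempt++) {
            Process p = new ProcessBuilder(
                    javaBin,
                    "-cp", System.getProperty("java.class.path"),
                    "Converter",                    // hypothetical converter main class
                    input.getAbsolutePath())
                .inheritIO()
                .start();
            if (p.waitFor() == 0) {
                return;                             // conversion succeeded
            }
            System.err.println("Converter JVM failed for " + input + " (attempt " + attempt + ")");
        }
    }

    public static void main(String[] args) throws Exception {
        for (String path : args) {
            convert(new File(path));
        }
    }
}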
For the record, I don't know of an existing "off the shelf" solution.
It seems that you are using the x86 (32-bit) version of Java. Maybe you could try it with the x64 (64-bit) version. That has sometimes worked for me in the past.
The problem seems to be in the native library, but maybe if you try it with 64-bit Java, it will use a 64-bit version of the native library?
A day ago, after a few months of working normally, our Java app started to crash occasionally with the following error:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (safepoint.cpp:247), pid=2075, tid=140042095163136
# guarantee(PageArmed == 0) failed: invariant
#
# JRE version: 6.0_23-b05
# Java VM: Java HotSpot(TM) 64-Bit Server VM (19.0-b09 mixed mode linux-amd64 compressed oops)
# An error report file with more information is saved as:
# /var/chat/jSocketer/build/hs_err_pid2075.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
#
I looked in hs_err_pid2075.log and saw that there was an active thread that was processing network communication. However, there weren't any application or environment changes in the last few months. There also wasn't any load growth.
What can I do to understand the reason for the crash? Are there any common steps for investigating a JVM crash?
UPD
http://www.wuala.com/ubear/public
The crash is in the JVM, not in external native code. However, the operation it crashed on was initiated by an external DLL.
This line in the hs_err_pid file explains the operation that crashed:
VM_Operation (0x00007f5e16e35450): GetAllStackTraces, mode: safepoint, requested by thread 0x0000000040796000
Now, thread 0x0000000040796000 is
0x0000000040796000 JavaThread "YJPAgent-Telemetry" daemon [_thread_blocked, id=2115, stack(0x00007f5e16d36000,0x00007f5e16e37000)]
which is a thread created by Yourkit. "GetAllStackTraces" is something that a profiler needs to call in order to do sampling. If you remove the profiler, the crash will not happen.
With this information it's not possible to say what causes the crash, but you can try the following: remove all -XX VM parameters, -verbose:gc and the debugging VM parameters. They might interfere with the profiling interface of the JVM.
Update
Code that calls java.lang.Thread#getAllStackTraces() or java.lang.Thread#getStackTrace() may trigger the same crash.
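For example, a hypothetical monitoring snippet like the one below (not taken from the crashing application) goes through the same GetAllStackTraces VM operation at a safepoint, so it could in principle hit the same crash.

import java.util.Map;

public class StackDumper {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            // Each call requests a GetAllStackTraces VM operation at a safepoint,
            // the same operation the YJPAgent-Telemetry thread was requesting.
            Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
            System.out.println("Captured stacks for " + traces.size() + " threads");
            Thread.sleep(1000);
        }
    }
}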
The two times I've witnessed recurring JVM crashes were both due to hardware failure, namely RAM. Running a memtest utility is the first thing I'd try.
I can see from the error report that you have the YourKit agent loaded. Its telemetry thread is mentioned as the requester for the operation that appears to fail. Try running the application without the YJP agent to see if you can still reproduce the crash.
Generally, JVM crashes are pretty hard to diagnose. They could happen due to a bug in some JNI code or in the JRE itself. If you suspect the latter, it may be worth submitting a bug report to Oracle.
Either way, I'd recommend upgrading to the latest release of Java 6 to make sure it's not a known issue that has already been fixed. At the time of this writing the current release is Java 6 update 29.
If you're not messing with anything that would cause this directly (which basically means using native code, or libraries that call native code), then it's almost always down to a bug in the JVM or a hardware issue.
If it's been running fine for ages and has now started to crash, then a hardware issue seems to me the more likely of the two. Can you run it on another machine to rule that out? Of course, it definitely wouldn't hurt to upgrade to the latest Java update as well.
Switching to another version of the Linux kernel "fixes" the JVM crash problem (http://forum.proxmox.com/threads/6998-Best-strategy-to-handle-strange-JVM-errors-inside-VPS?p=40286#post40286). It helped me with my real server, which was running Ubuntu Server 10.04 LTS with kernel version 2.6.32-33. A kernel update resolved the issue, and the JVM no longer crashes.
Apparently Java 7 has some nasty bug regarding loop optimization: Google search.
From the reports and bug descriptions I find it hard to judge how significant this bug is (unless you use Solr or Lucene).
What I'd like to know:
How likely is it that my (any) program is affected?
Is the bug deterministic enough that normal testing will catch it?
Note: I can't make users of my program use -XX:-UseLoopPredicate to avoid the problem.
The problem with any hotspot bugs is that you need to reach the compilation threshold (e.g. 10,000) before they can get you: so if your unit tests are "trivial", you probably won't catch it.
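A hedged illustration of that point (the 10,000 figure is just the typical default threshold for the server VM, and the method here is purely made up): a test that calls a method only a handful of times exercises the interpreted version, and only a loop count well past the threshold runs the JIT-compiled code where such a bug would live.

public class WarmupExample {
    // Some loop-heavy method that the JIT would compile once it is hot.
    static long compute(long x) {
        long sum = 0;
        for (int i = 0; i < 100; i++) {
            sum += (x + i) % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        long total = 0;
        // A "trivial" unit test might call compute() ten times and never reach
        // the compilation threshold; 20000 calls make the method hot.
        for (int i = 0; i < 20000; i++) {
            total += compute(i);
        }
        System.out.println(total);
    }
}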
For example, we caught the incorrect results issue in lucene, because this particular test creates 20,000 document indexes.
In our tests we randomize different interfaces (e.g. different Directory implementations) and indexing parameters and such, and the test only fails 1% of the time; of course it's then reproducible with the same random seed. We also run checkindex on every index that the tests create, which does some sanity checks to ensure the index is not corrupt.
For the test we found, if you have a particular configuration, e.g. RAMDirectory + PulsingCodec + payloads stored for the field, then after it hits the compilation threshold the enumeration loop over the postings returns incorrect calculations; in this case the number of returned documents for a term != the docFreq stored for the term.
We have a good number of stress tests, and it's important to note that the normal assertions in this test actually pass; it's the checkindex part at the end that fails.
The big problem with this is that Lucene's incremental indexing fundamentally works by merging multiple segments into one: because of this, if these enums calculate invalid data, that invalid data is then stored in the newly merged index: aka corruption.
I'd say this bug is much sneakier than previous loop optimizer hotspot bugs we have hit (e.g. the sign-flip stuff, https://issues.apache.org/jira/browse/LUCENE-2975). In that case we got wacky negative document deltas, which made it easy to catch. We also only had to manually unroll a single method to dodge it. On the other hand, the only "test" we had initially for that was a huge 10GB index of http://www.pangaea.de/, so it was painful to narrow it down to this bug.
In this case, I spent a good amount of time (e.g. every night last week) trying to manually unroll/inline various things, trying to create some workaround so we could dodge the bug and not have the possibility of corrupt indexes being created. I could dodge some cases, but there were many more cases I couldn't... and I'm sure if we can trigger this stuff in our tests there are more cases out there...
A simple way to reproduce the bug: open Eclipse (Indigo in my case) and go to Help/Search. Enter a search string and you will notice that Eclipse crashes. Have a look at the log.
# Problematic frame:
# J org.apache.lucene.analysis.PorterStemmer.stem([CII)Z
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
#
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
#
--------------- T H R E A D ---------------
Current thread (0x0000000007b79000): JavaThread "Worker-46" [_thread_in_Java, id=264, stack(0x000000000f380000,0x000000000f480000)]
siginfo: ExceptionCode=0xc0000005, reading address 0x00000002f62bd80e
Registers:
The problem still exists as of Dec 2, 2012
in both Oracle JDK
java -version
java version "1.7.0_09"
Java(TM) SE Runtime Environment (build 1.7.0_09-b05)
Java HotSpot(TM) 64-Bit Server VM (build 23.5-b02, mixed mode)
and openjdk
java version "1.7.0_09-icedtea"
OpenJDK Runtime Environment (fedora-2.3.3.fc17.1-x86_64)
OpenJDK 64-Bit Server VM (build 23.2-b09, mixed mode)
Strangely, either the -XX:-UseLoopPredicate or the -XX:LoopUnrollLimit=1 option on its own prevents the bug from happening, but when they are used together the JDK fails. See e.g.
https://bugzilla.redhat.com/show_bug.cgi?id=849279
Well it's two years later and I believe this bug (or a variation of it) is still present in 1.7.0_25-b15 on OSX.
Through very painful trial and error I have determined that using Java 1.7 with Solr 3.6.2 and autocommit <maxTime>30000</maxTime> seems to cause index corruption. It only seems to happen with 1.7 and maxTime at 30000; if I switch to Java 1.6, I have no problems. If I lower maxTime to 3000, I have no problems.
The JVM does not crash, but it causes RSolr to die with the following stack trace in Ruby:
https://gist.github.com/armhold/6354416. It does this reliably after saving a few hundred records.
Given the many layers involved here (Ruby, Sunspot, Rsolr, etc) I'm not sure I can boil this down into something that definitively proves a JVM bug, but it sure feels like that's what's happening here. FWIW I have also tried JDK 1.7.0_04, and it also exhibits the problem.
As I understand it, this bug is only found in the server JVM. If you run your program on the client JVM, you are in the clear. If you run your program on the server JVM, how serious the problem can be depends on the program.
Hi, is there any cache or JVM setting that can speed up method calls?
For example: I have a web service, and when I call it once every 10 minutes or so it is quite slow; processing takes around 8-10 s. In comparison, when I call it once every 20 seconds the result is roughly 5 s.
Nothing else except this is running on the server. Is there a way to speed it up? (I cannot cache any objects or the like.)
I used JProfiler, and I call the service with the same parameters, so it is doing exactly the same thing. The difference is in how long the server is idle between calls: 1 minute versus 30 minutes makes the difference.
Thanks
EDIT: platform is: AIX
java: IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 AIX ppc64-64 ..
server: tomcat
Factors which could explain such behavior:
Swapping - A low-activity OS process gets swapped out
Priority - OS process priority can be a bit mysterious sometimes, especially if a process is mostly idle.
JIT - The more you call a method, the more it gets optimized (see the sketch after this list)
GC - The garbage collector takes some time to stabilize, which can result in different behaviour under different loads. The -server option is basically a preset configuration for the JVM.
Pooling - Threads and other resources get pooled. Under low activity the pool may shrink and objects will need to be re-allocated
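As a rough illustration of the JIT point above (the numbers are illustrative and were not measured on your AIX/J9 setup), the first call to a method runs interpreted, while later calls run the compiled version and are typically much faster:

public class JitWarmup {
    static double work(int n) {
        double acc = 0;
        for (int i = 1; i <= n; i++) {
            acc += Math.sqrt(i);
        }
        return acc;
    }

    public static void main(String[] args) {
        // First call: likely interpreted.
        long t0 = System.nanoTime();
        work(1000000);
        System.out.println("cold call: " + (System.nanoTime() - t0) / 1000000 + " ms");

        // Call the method repeatedly so the JIT compiles it.
        for (int i = 0; i < 50; i++) {
            work(1000000);
        }

        // Same call again: usually much faster once compiled.
        long t1 = System.nanoTime();
        work(1000000);
        System.out.println("warm call: " + (System.nanoTime() - t1) / 1000000 + " ms");
    }
}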
Investigating this kind of issue can be hard. I would suggest you try to correlate OS-level information with JVM-level information. A profiler is maybe not the best tool; try JConsole, mem, etc.
Try also to identify the deviation between different typical scenarios, e.g. first request after startup, request under heavy load, request under medium load, etc. Try to identify when the response time changes.
Use a Profiler.
JProfiler would be a good choice.
Using a profiler you can identify hotspots in your code.
Once you have identified those hotspots you can think about various ways to improve them in terms of space and time.
Apart from improving the code itself, make sure the JVM that runs the server is the server vm and not the client vm. Use the -server parameter.
client vm:
Java HotSpot(TM) Client VM (build 14.1-b02, mixed mode, sharing)
server vm:
Java HotSpot(TM) Server VM (build 14.0-b16, mixed mode)
To find out what you run on production you can programmatically do:
System.getProperty("java.vm.name");
which should give you something like: Java HotSpot(TM) 64-Bit Server VM
Otherwise (if you don't want to touch the code) you can do a thread dump and have a look at the top for something like: Full thread dump Java HotSpot(TM) Server VM (16.0-b13 mixed mode)
Slowness after an idle period is usually caused by the process being swapped out to disk. Modern systems are never idle, and systems like Windows aggressively swap out background processes (not having focus is enough), which frequently happened to Eclipse users.
Please edit your question with information about your platform.
How much load is the server under? After 10 minutes a heavily loaded server has probably swapped out your service to disk or (depending on the server software) shut down the instance of your service that served the last request. This means spooling up the service for your new request is really slow.
A couple of ways you could solve this:
Move the service to a server that isn't as heavily loaded
Investigate better or more appropriate software to run your service - you haven't mentioned what you are using (e.g. Tomcat? JBoss?).
Do you have swap enabled on the machine? If so, have you tried turning it off? This seems a clear case of your memory being swapped out.