My Java application is being "killed" after some time of work.
The application is started from an SH script under Linux and runs for a while; then the PID is displayed and the word "Killed" appears.
Like this:
runMyServer.sh: line 3: 3593 Killed java -Xmx2024m -cp ...
There is information about an out-of-memory event in the system log, so it looks like an out-of-memory problem.
My question is: in which cases can the process be killed without an OutOfMemoryError exception being thrown?
You probably have too little memory on your system, or you run processes that eat up all RAM and swap. When GNU/Linux runs out of memory, the kernel's OOM killer kills processes that use a lot of memory. This is basically just a kill of the process, so it is not your Java process running out of heap, but rather the OS running out of memory.
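You can confirm it was the kernel's OOM killer by checking the kernel log, for example (the exact message wording varies between kernel versions):
dmesg | grep -i "killed process"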
To avoid your Java application being killed by the OOM killer, add enough swap to your system and disable memory overcommitting:
dd if=/dev/zero of=/swapfile bs=1M count=2048
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo "/swapfile swap swap defaults 0 0" >> /etc/fstab
echo 2 > /proc/sys/vm/overcommit_memory
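To verify the swap is active and to keep the overcommit setting across reboots, something like this should work (the sysctl.conf location may differ on your distribution):
free -h                                               # the new 2 GB swap should be listed here
echo "vm.overcommit_memory = 2" >> /etc/sysctl.conf   # make the setting persistent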
My Java process stopped responding. I tried jstack, but it failed with the error below:
21039: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
Then I used the -F option, but it only reported "No deadlocks found."
Other info: output of java -version, jmap, jstat and jinfo (omitted).
Can anyone have a look and share some pointers on troubleshooting this kind of Java "not responding" issue?
Possible reasons for the Unable to open socket file problem:
The target PID is not a HotSpot JVM process.
This is obviously not your case, since jinfo PID works fine.
JVM is started with -XX:+DisableAttachMechanism option.
This can be also verified by jinfo PID.
Attach socket /tmp/.java_pidNNN has been deleted.
It is common practice to clean /tmp automatically with a scheduled script. In this case you should configure the cleanup software not to delete .java_pid* files.
How to check: run lsof -p PID | grep java_pid
If it lists a socket file, but the file does not exist, then this is exactly the described problem.
Credentials of the current user (euid/egid) do not match the owner of the attach socket. Make sure you run jstack as the same user as the JVM. Attach won't work if you run jstack as a different user, even if that user is root.
The /tmp directory of the target process is not the same as the /tmp of your shell. This may happen in the following cases:
JVM is started in a different mount namespace. Typically this happens when JVM runs in a Docker container. Running jstack from within the same container will help.
JVM is started in chroot environment. For example, LXC containers may use chroot.
How to check: run readlink -f /proc/PID/root/tmp to see if it points to /tmp or to some other directory.
The current working directory of the target JVM belongs to a file system that does not allow changing permissions. CIFS and DrvFs (WSL) are examples of such file systems.
How to check: run umask 077; touch /proc/PID/cwd/somefile.tmp, then verify that the file owner is yourself and the file permissions are 600.
JVM is busy and cannot reach a safepoint. For instance, JVM is in the middle of long-running garbage collection.
How to check: run kill -3 PID. JVM should print a thread dump and heap info in its console. If JVM does not dump anything, but the process consumes almost 100% CPU or shows high I/O utilization, then this looks like the described problem.
JVM process is suspended.
How to check: run ps PID. The STAT column of a live JVM process should be Sl.
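A compact sketch pulling several of these checks together (assuming a Linux shell and a hypothetical target pid; run it as the same user as the JVM):
PID=1234   # hypothetical pid of the target JVM
tr '\0' ' ' < /proc/$PID/cmdline | grep -o -- '-XX:+DisableAttachMechanism'   # attach disabled?
ls -l /tmp/.java_pid$PID                  # does the attach socket still exist?
lsof -p $PID | grep java_pid              # is a (possibly deleted) socket file still open?
readlink -f /proc/$PID/root/tmp           # does the target's /tmp match yours?
kill -3 $PID                              # a healthy JVM prints a thread dump to its own console
ps -o pid,stat,user,cmd -p $PID           # STAT should be Sl; USER should match the user running jstack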
More about the internals of jstack.
There is also the jattach project, which is a better alternative to jstack / jmap. It can automatically handle the credentials issue, works with Docker containers, supports chroot'ed JVMs and handles uncommon file systems.
I'm running a conversion project from SVN to Git. As the application is single-threaded, I'm moving the project to a faster PC.
Without any options apart from httpSpooling = true; it runs OK on a VM with 4 CPUs and 20 GB of RAM.
RAM usage with two separate instances is 8 GB, hitting a maximum of 9.8 GB.
The jobs were paused, zipped and SCP'd to the new machine: a bare-metal build of Debian 9 (same as the VM), i7 (8 effective CPUs), 16 GB RAM.
However, when starting just one instance of SubGit, I get either a Java out-of-memory error or GC Overhead Limit Exceeded.
I've tried adding the following permutations to the [daemon] section of repo.git/subgit/config:
javaOptions = -noverify -client -Djava.awt.headless=true -Xmx8g -XX:+UseParallelGC -XX:-UseGCOverheadLimit (this gives a GC Overhead Limit Exceeded error)
#javaOptions = -noverify -client -Djava.awt.headless=true -Xmx8g -XX:+UseParallelGC -XX:-UseGCOverheadLimit (options commented out; this gives an out-of-memory error)
javaOptions = -noverify -client -Djava.awt.headless=true -Xmx12g -XX:-UseGCOverheadLimit (this gives out-of-memory errors)
I've tried other settings too, including changing -client to -server, but that appears to be aimed more at two-way conversion, which is not something I'm trying to do.
Based on the application's usage on a system where it runs successfully there should be plenty of RAM, so unless SubGit is ignoring some of these values, I can't tell what's wrong.
The 'javaOptions' in the [daemon] section may indeed be ignored depending on the operation you run: those Java options affect the SubGit daemon, but not the 'subgit install' or 'subgit fetch' operations. Since you've mentioned that the repositories were moved to another machine, I believe you have invoked one of those two commands to restart the mirror, and that is why 'daemon.javaOptions' is ignored. To tune SubGit's Java options, edit the EXTRA_JVM_ARGUMENTS line right in the SubGit launching script:
EXTRA_JVM_ARGUMENTS="-Dsun.io.useCanonCaches=false -Djava.awt.headless=true -Djna.nosys=true -Dsvnkit.http.methods=Digest,Basic,NTLM,Negotiate -Xmx512m"
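For example, to give 'subgit install' / 'subgit fetch' an 8 GB heap you could raise only the -Xmx value in that line (a sketch; the exact location of the launch script depends on your installation):
EXTRA_JVM_ARGUMENTS="-Dsun.io.useCanonCaches=false -Djava.awt.headless=true -Djna.nosys=true -Dsvnkit.http.methods=Digest,Basic,NTLM,Negotiate -Xmx8g"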
As for the memory consumption itself, it depends on which operations are being run. It's not completely clear how you paused the jobs on the virtual machine (with 'subgit shutdown' or in another way?), which operations were running at that time (the initial translation or regular fetches), and how you restarted the jobs on the new machine.
I'm tuning our product for G1GC, and as part of that testing, I'm experiencing regular segfaults on my Spark Workers, which of course causes the JVM to crash. When this happens, the Spark Worker/Executor JVM automagically restarts itself, which then overwrites the GC logs that were written for the previous Executor JVM.
To be honest, I'm not quite sure of the mechanism by which the Executor JVM restarts itself, but I launch the Spark Driver service via init.d, which in turn calls a bash script. I do use a timestamp in that script that gets appended to the GC log filename:
today=$(date +%Y%m%dT%H%M%S%3N)
SPARK_HEAP_DUMP="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${SPARK_LOG_HOME}/heapdump_$$_${today}.hprof"
SPARK_GC_LOGS="-Xloggc:${SPARK_LOG_HOME}/gc_${today}.log -XX:LogFile=${SPARK_LOG_HOME}/safepoint_${today}.log"
GC_OPTS="-XX:+UnlockDiagnosticVMOptions -XX:+LogVMOutput -XX:+PrintFlagsFinal -XX:+PrintJNIGCStalls -XX:+PrintTLAB -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=15 -XX:GCLogFileSize=48M -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime -XX:+PrintAdaptiveSizePolicy -XX:+PrintHeapAtGC -XX:+PrintGCCause -XX:+PrintReferenceGC -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1"
I think the problem is that this script sends these options along to the Spark Driver, which then passes them off to the Spark Executors (via the -Dspark.executor.extraJavaOptions argument), which are all separate servers, and when the Executor JVM crashes, it simply uses the command that had been originally sent to start back up, which would mean that the timestamp portion of the GC log filename is static:
SPARK_STANDALONE_OPTS=`property ${SPARK_APP_CONFIG}/spark.properties "spark-standalone.extra.args"`
SPARK_STANDALONE_OPTS="$SPARK_STANDALONE_OPTS $GC_OPTS $SPARK_GC_LOGS $SPARK_HEAP_DUMP"
exec java ${SPARK_HEAP_DUMP} ${GC_OPTS} ${SPARK_GC_LOGS} \
${DRIVER_JAVA_OPTIONS} \
-Dspark.executor.memory=${EXECUTOR_MEMORY} \
-Dspark.executor.extraJavaOptions="${SPARK_STANDALONE_OPTS}" \
-classpath ${CLASSPATH} \
com.company.spark.Main >> ${SPARK_APP_LOGDIR}/${SPARK_APP_LOGFILE} 2>&1 &
This is making it difficult for me to debug the cause of the segfaults, since I'm losing the activity and state of the Workers that led up to the JVM crash. Any ideas for how I can handle this situation and keep the GC logs on the Workers, even after a JVM crash/segfault?
If you are using Java 8 or above, you may be able to get around this by adding %p to the log file name: the JVM expands it to the process ID, which is effectively unique per crash/restart.
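A hedged sketch based on the snippet above: with %p in the paths, the JVM substitutes the process ID at startup (something like pid12345), so a restarted Executor no longer overwrites the previous files:
SPARK_HEAP_DUMP="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${SPARK_LOG_HOME}/heapdump_%p_${today}.hprof"
SPARK_GC_LOGS="-Xloggc:${SPARK_LOG_HOME}/gc_%p_${today}.log -XX:LogFile=${SPARK_LOG_HOME}/safepoint_%p_${today}.log"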
I had to run jmap in order to take a heap dump of my process, but the JVM returned:
Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
So I used the -F:
./jmap -F -dump:format=b,file=heap.bin 10330
Attaching to process ID 10331, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.51-b03
Dumping heap to heap.bin ...
Is using -F all right for taking a heap dump?
I have been waiting 20 minutes and it has not finished yet. Any ideas why?
jmap vs. jmap -F, as well as jstack vs. jstack -F, use completely different mechanisms to communicate with the target JVM.
jmap / jstack
When run without -F these tools use Dynamic Attach Mechanism. This works as follows.
Before connecting to Java process 1234, jmap creates a file .attach_pid1234 at the working directory of the target process or at /tmp.
Then jmap sends SIGQUIT to the target process. When JVM catches the signal and finds .attach_pid1234, it starts AttachListener thread.
AttachListener thread creates UNIX domain socket /tmp/.java_pid1234 to listen to commands from external tools.
For security reasons, when a connection (from jmap) is accepted, the JVM verifies that the credentials of the socket peer are equal to the euid and egid of the JVM process. That's why jmap will not work if run by a different user (even by root).
jmap connects to the socket, and sends dumpheap command.
This command is read and executed by the AttachListener thread of the JVM. All output is sent back to the socket. Since the heap dump is made in-process directly by the JVM, the operation is really fast. However, the JVM can do this only at safepoints. If a safepoint cannot be reached (e.g. the process is hung, not responding, or a long GC is in progress), jmap will time out and fail.
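You can watch this handshake from a shell (a rough sketch assuming Linux, a hypothetical target pid 1234, and that you are the same user as the JVM):
touch /proc/1234/cwd/.attach_pid1234    # the trigger file jmap would normally create
kill -QUIT 1234                         # SIGQUIT makes the JVM start the AttachListener thread
sleep 1
ls -l /tmp/.java_pid1234                # the UNIX domain socket should now exist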
Let's summarize the benefits and the drawbacks of Dynamic Attach.
Pros.
Heap dump and other operations are run collaboratively by JVM at the maximum speed.
You can use any version of jmap or jstack to connect to any other version of JVM.
Cons.
The tool should be run by the same user (euid/egid) as the target JVM.
Can be used only on live and healthy JVM.
Will not work if the target JVM is started with -XX:+DisableAttachMechanism.
jmap -F / jstack -F
When run with -F the tools switch to a special mode that uses the HotSpot Serviceability Agent. In this mode the target process is frozen; the tools read its memory via OS debugging facilities, namely ptrace on Linux.
jmap -F invokes PTRACE_ATTACH on the target JVM. The target process is unconditionally suspended in response to SIGSTOP signal.
The tool reads JVM memory using PTRACE_PEEKDATA. ptrace can read only one word at a time, so reading the large heap of the target process requires a huge number of calls. This is very, very slow.
The tool reconstructs JVM internal structures based on the knowledge of the particular JVM version. Since different versions of JVM have different memory layout, -F mode works only if jmap comes from the same JDK as the target Java process.
The tool creates heap dump itself and then resumes the target process.
Pros.
No cooperation from target JVM is required. Can be used even on a hung process.
ptrace works whenever OS-level privileges are enough. E.g. root can dump processes of all other users.
Cons.
Very slow for large heaps.
The tool and the target process should be from the same version of JDK.
A safepoint is not guaranteed when the tool attaches in forced mode. Though jmap tries to handle all special cases, it may sometimes happen that the target JVM is not in a consistent state.
Note
There is a faster way to take heap dumps in forced mode. First, create a coredump with gcore, then run jmap over the generated core file. See the related question.
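A sketch of that approach (assuming a JDK 8 jmap, a hypothetical pid 1234, and that the process was started from /usr/bin/java):
gcore -o core 1234                                          # snapshot the process memory; writes core.1234
jmap -dump:format=b,file=heap.bin /usr/bin/java core.1234   # build the heap dump from the core file offline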
I just found that jmap (and presumably jvisualvm when using it to generate a heap dump) enforces that the user running jmap must be the same user running the process being dumped.
In my case the JVM I want a heap dump for is being run by the Linux user "jboss". So where sudo jmap -dump:file.bin <pid> was reporting "Unable to open socket", I was able to grab my heap dump using:
sudo -u jboss jmap -dump:file.bin <pid>
If your application is running as a systemd service, open the service file under /usr/lib/systemd/system/ that is named after your service and check whether the PrivateTmp attribute is true.
If it is true, change it to false, then reload and restart the service as follows:
systemctl daemon-reload
systemctl restart [servicename]
If you want to run jmap/jcmd before the restart, you can make use of the ExecStop directive in the service file: just put the command there and execute systemctl stop [servicename].
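A minimal sketch of the relevant unit-file entries (the service name and dump path are hypothetical):
# /usr/lib/systemd/system/myapp.service
[Service]
PrivateTmp=false
# or, to capture a dump while the service is being stopped:
ExecStop=/usr/bin/jmap -dump:format=b,file=/var/tmp/myapp-heap.bin $MAINPID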
Just like ben_wing said, you can run with:
sudo -u jboss-as jmap -dump:file.bin <pid>
(in my case the user is jboss-as, but yours could be jboss or some other.)
But it was not enough, because it asked me for a password ([sudo] password for ec2-user:), even though I could run other commands with sudo without being prompted for a password.
I found the solution here, and I just needed to add another sudo first:
sudo sudo -u jboss-as jmap -dump:file.bin <pid>
It works with other commands like jcmd and jinfo too.
It's usually solved with -F.
As stated in the message:
The -F option can be used when the target process is not responding
I encountered a situation where a full GC made it impossible to execute the command.
I would like to stop Cassandra from dumping hprof files, as I do not need them.
I also have very limited disk space (50GB out of 100 GB is used for data), and these files swallow up all the disk space before I can say "stop".
How should I go about it?
Is there a shell script that I could use to erase these files from time to time?
It happens because Cassandra starts with the -XX:+HeapDumpOnOutOfMemoryError Java option, which is useful if you want to analyze the dumps. Also, if you are getting lots of heap dumps, that indicates you should probably tune the memory available to Cassandra.
I haven't tried it, but to disable this option, comment out the following line in $CASSANDRA_HOME/conf/cassandra-env.sh:
JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
Optionally, you may comment out this block as well, though it is probably not required. This block seems to be available from version 1.0+; I can't find it in 0.7.3.
# set jvm HeapDumpPath with CASSANDRA_HEAPDUMP_DIR
if [ "x$CASSANDRA_HEAPDUMP_DIR" != "x" ]; then
JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=$CASSANDRA_HEAPDUMP_DIR/cassandra-`date +%s`-pid$$.hprof"
fi
Let me know if this worked.
Update
...I guess it is JVM throwing it out when Cassandra crashes / shuts down. Any way to prevent that one from happening?
If you want to disable JVM heap dumps altogether, see here: how to disable creating java heap dump after VM crashes?
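As a sketch of that approach: boolean HotSpot flags can be negated with -XX:-..., and the last occurrence on the command line wins, so appending the negated flag in cassandra-env.sh turns heap dumps off even if the option is set earlier:
JVM_OPTS="$JVM_OPTS -XX:-HeapDumpOnOutOfMemoryError"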
I'll admit I haven't used Cassandra, but from what I can tell, it shouldn't be dumping any hprof files unless you enable it at compile time, or the program experiences an OutOfMemoryError. So try looking there.
In terms of a shell script, if the files are being dumped to a specific location, you can use this command to delete all *.hprof files:
find /my/location/ -name '*.hprof' -delete
This uses the -delete action of find, which deletes all files that match the search. Look at the man page for find for more search options if you need to narrow it down further.
You can use cron to run a script at a given time, which would satisfy your "from time to time" requirement. Most Linux distros have cron installed and work off a crontab file; you can find out more about it with man crontab.
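For example, a crontab entry like this (the path is hypothetical) would remove hprof files older than a day every night at 02:00:
0 2 * * * find /var/lib/cassandra -name '*.hprof' -mtime +1 -delete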
Even if you update cassandra-env.sh to point to a different heap dump path, it will still not work. The reason is that the startup script /etc/init.d/cassandra contains this line, which sets the default heap dump path:
start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -b -p "$PIDFILE" -- \
-p "$PIDFILE" -H "$heap_dump_f" -E "$error_log_f" >/dev/null || return 2
I'm not an init-script expert, but what I did was simply remove the parameter that creates the duplicate. Another odd observation: when checking the Cassandra process via ps aux, you'll notice that some parameters appear twice. If you source cassandra-env.sh and print $JVM_OPTS, those variables look fine.