We have an open beta of an app which occasionally causes the heapspace to overflow. The JVM reacts by going on a permanent vacation.
To analyze this I would like to peek into the memory at the point where it failed. Java does not want me to do this. The process is still in memory but it doesn't seem to be recognized as a java process.
The server in question is a Debian Lenny server running Java 6u14.
/opt/jdk/bin# ./jmap -F -dump:format=b,file=/tmp/apidump.hprof 11175
Attaching to process ID 11175, please wait...
sun.jvm.hotspot.debugger.NoSuchSymbolException: Could not find symbol "gHotSpotVMTypeEntryTypeNameOffset" in any of the known library names (libjvm.so, libjvm_g.so, gamma_g)
at sun.jvm.hotspot.HotSpotTypeDataBase.lookupInProcess(HotSpotTypeDataBase.java:390)
at sun.jvm.hotspot.HotSpotTypeDataBase.getLongValueFromProcess(HotSpotTypeDataBase.java:371)
at sun.jvm.hotspot.HotSpotTypeDataBase.readVMTypes(HotSpotTypeDataBase.java:102)
at sun.jvm.hotspot.HotSpotTypeDataBase.<init>(HotSpotTypeDataBase.java:85)
at sun.jvm.hotspot.bugspot.BugSpotAgent.setupVM(BugSpotAgent.java:568)
at sun.jvm.hotspot.bugspot.BugSpotAgent.go(BugSpotAgent.java:494)
at sun.jvm.hotspot.bugspot.BugSpotAgent.attach(BugSpotAgent.java:332)
at sun.jvm.hotspot.tools.Tool.start(Tool.java:163)
at sun.jvm.hotspot.tools.HeapDumper.main(HeapDumper.java:77)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at sun.tools.jmap.JMap.runTool(JMap.java:179)
at sun.tools.jmap.JMap.main(JMap.java:110)
Debugger attached successfully.
sun.jvm.hotspot.tools.HeapDumper requires a java VM process/core!
The solution was very simple. I was running the jmap as root, but I had to run it as the user who started the jvm. I will now go hide my head in shame.
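For anyone who wants to script that fix, here is a minimal sketch that looks up who owns the JVM and runs jmap as that user. It assumes Linux with /proc and GNU stat, and that sudo is allowed to switch to that user. The function names pid_owner_uid and jmap_as_owner are made up for this example.

```shell
# Print the numeric uid that owns a process (Linux: the owner of /proc/<pid>).
pid_owner_uid() {
    stat -c %u "/proc/$1"
}

# Run jmap against a pid as the user that owns it.
# Calls jmap directly when we already are that user, otherwise goes through sudo.
jmap_as_owner() (
    pid=$1; shift
    owner=$(pid_owner_uid "$pid") || exit 1
    if [ "$owner" = "$(id -u)" ]; then
        jmap "$@" "$pid"
    else
        sudo -u "#$owner" jmap "$@" "$pid"
    fi
)
```

Usage would look like: jmap_as_owner 11175 -dump:format=b,file=/tmp/apidump.hprof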
I was running jmap and the application as the same user and still got the error.
The solution was to run this command before jmap:
echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
After that, jmap works fine:
jmap -heap 17210
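Before changing the flag it can help to see what it is currently set to. The sketch below maps the Yama ptrace_scope values to a short description; the wording is my own summary of the kernel documentation, and the function names are invented for this example. Note that a value written with echo is a runtime setting and is reset at reboot.

```shell
# Map a Yama ptrace_scope value to a short description.
describe_ptrace_scope() {
    case "$1" in
        0) echo "classic: any process may attach to another process of the same user" ;;
        1) echo "restricted: only a parent (or a declared tracer) may attach" ;;
        2) echo "admin-only: attaching requires CAP_SYS_PTRACE" ;;
        3) echo "disabled: no process may attach until reboot" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Report the current setting, if the kernel exposes it.
show_ptrace_scope() {
    f=/proc/sys/kernel/yama/ptrace_scope
    if [ -r "$f" ]; then
        describe_ptrace_scope "$(cat "$f")"
    else
        echo "Yama ptrace_scope not available on this kernel"
    fi
}
```

Running show_ptrace_scope before jmap tells you whether the attach is likely to be blocked (anything other than 0 restricts attaching to a process that is not your child).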
For anyone trying to get a heap dump of a Java application running in a Docker container, this is the only solution that worked for me:
docker exec <container-name> jcmd 1 GC.heap_dump /tmp/docker.hprof
It dumps the heap of the process with PID 1 using jcmd.
See https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr006.html
Future Googlers:
This could also happen if you installed the JDK while the process you're trying to jmap was running.
If that's the case, restart the java process.
Follow the steps below to take thread and heap dumps from a Docker container.
Run the command below to open a bash shell in the container, replacing CONTAINER_NAME as appropriate:
docker exec -it CONTAINER_NAME bash
Then run jps to list the Java processes and find the PID of your application:
jps
Then run the below command to get the thread dump. Please change the PID appropriately
jstack PID > threadDump.tdump
Then run the below command to get the Heap dump. Please change the PID appropriately
jmap -dump:live,format=b,file=heapDump.hprof PID
Then exit from the docker container and download the threadDump.tdump and heapDump.hprof from the docker container by running the below command. Please change the CONTAINER_NAME appropriately
sudo docker cp CONTAINER_NAME:threadDump.tdump .
sudo docker cp CONTAINER_NAME:heapDump.hprof .
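The steps above can be rolled into one function. This is only a sketch: the /tmp paths inside the container, the default PID of 1 and the DRY_RUN switch are my assumptions, not part of the steps above. With DRY_RUN=1 it prints the commands instead of running them, which is a cheap way to check them first.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Take a thread dump and a heap dump inside a container, then copy both out.
collect_dumps() (
    container=$1
    pid=${2:-1}    # assumption: the JVM is often PID 1 inside a container
    run docker exec "$container" sh -c "jstack $pid > /tmp/threadDump.tdump" &&
    run docker exec "$container" jmap "-dump:live,format=b,file=/tmp/heapDump.hprof" "$pid" &&
    run docker cp "$container:/tmp/threadDump.tdump" . &&
    run docker cp "$container:/tmp/heapDump.hprof" .
)
```

For example, DRY_RUN=1 collect_dumps CONTAINER_NAME 1 shows the four commands, and dropping DRY_RUN runs them.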
What happens if you just run
./jmap -heap 11175
And are you sure the application JVM is identical to the JMAP JVM? (same version, etc)
You need to use the jmap that comes with the JVM.
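One way to make sure of that is to derive the jmap path from the java binary the process is actually running. A rough sketch, assuming Linux (it reads /proc/PID/exe) and the usual JDK layout where the tools sit in the bin directory; tool_for_java and jmap_for_pid are names I made up.

```shell
# Given the path of a running JVM's java binary, print the path where the
# matching JDK tool (jmap, jstack, jcmd, ...) would be expected.
tool_for_java() (
    java_bin=$1
    tool=${2:-jmap}
    bin_dir=$(dirname "$java_bin")
    case "$bin_dir" in
        # JDK 8 and earlier: java may run from <jdk>/jre/bin, tools live in <jdk>/bin
        */jre/bin) bin_dir=${bin_dir%/jre/bin}/bin ;;
    esac
    echo "$bin_dir/$tool"
)

# Resolve the java binary behind a pid (Linux only) and print its jmap.
jmap_for_pid() {
    tool_for_java "$(readlink -f "/proc/$1/exe")" jmap
}
```

Then "$(jmap_for_pid 11175)" -heap 11175 runs the jmap that belongs to the same installation as the target JVM, provided that installation is a full JDK and not only a JRE.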
I got the same jmap error on a Linux machine that has two different OpenJDKs installed. First I installed OpenJDK 1.6 and after that OpenJDK 1.7.
Calling ...
/usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -XshowSettings:properties -version
# produces the following output ...
...
java.library.path = /usr/java/packages/lib/amd64
/usr/lib/x86_64-linux-gnu/jni
/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu
/usr/lib/jni
/lib
/usr/lib
...
java version "1.7.0_65"
Because '/usr/lib' is on the library path, every program started with OpenJDK 1.7.* loads the libraries of the first installed JDK (in my case OpenJDK 1.6.*). So both the Java 6 and the Java 7 versions of jmap failed.
After I changed the start command of the Java 7 programs to put the OpenJDK 1.7 libraries first ...
/usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -Djava.library.path=/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib ...
I was able to access the process with the Java 7 version of jmap, but it needs sudo to run.
I have the same problem, I'm trying to find a memory leak in a process running inside a Docker container. I wasn't able to use jmap, instead I used this:
jcmd <pid> GC.class_histogram
This gives you a per-class histogram of the objects in memory. And from the Oracle documentation:
It is recommended to use the latest utility, jcmd instead of jmap utility for enhanced diagnostics and reduced performance overhead. https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/memleaks004.html
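When hunting a leak, a single histogram is less telling than the difference between two taken some time apart. Here is a small sketch that compares two saved outputs of jcmd <pid> GC.class_histogram and lists the classes whose byte count grew. It assumes the usual row layout (rank, instance count, bytes, class name); histogram_growth is a hypothetical helper name.

```shell
# Compare two saved class histograms and print "<bytes grown> <class>" for
# every class that uses more bytes in the second file, largest growth first.
histogram_growth() {
    awk '
        # data rows look like "   1:   1234   56789  java.lang.String"
        $1 ~ /^[0-9]+:$/ {
            if (FNR == NR) before[$4] = $3 + 0          # rows of the first file
            else if ($3 + 0 > before[$4] + 0) print $3 - before[$4], $4
        }
    ' "$1" "$2" | sort -rn
}
```

Typical use: save jcmd <pid> GC.class_histogram to before.txt, wait a while, save it again to after.txt, then run histogram_growth before.txt after.txt. Classes that keep climbing between snapshots are the first candidates to look at.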
1. Run "docker ps" to list the container IDs of all services, and note the container ID for TSC.
2. Run "docker exec -it CONTAINER_ID bash" (replace CONTAINER_ID with the TSC container ID).
3. In the bash shell that opens, run "jps" to get the PID of the process (it will be 1 for the jar).
4. Run "jstack PID > threadDump.tdump" (replace PID with the process ID from step 3; it should be 1).
5. Run "jmap -dump:format=b,file=heapDump.hprof PID" (replace PID with the process ID from step 3; it should be 1).
6. Leave the bash shell with the "exit" command.
7. Run "sudo docker cp CONTAINER_ID:heapDump.hprof ." from the EC2 command line to copy the heap dump into the current working directory on the EC2 machine.
8. Run "sudo docker cp CONTAINER_ID:threadDump.tdump ." from the EC2 command line to copy the thread dump into the current working directory on the EC2 machine.
When none of these work, or if you don't want to change sensitive OS flags such as ptrace_scope, you can trigger the heap dump over JMX instead.
Either use jconsole/jvisualvm, or run a JMX client directly from the console on the machine that needs the dump, which is faster because it runs locally:
echo 'jmx_invoke -m com.sun.management:type=HotSpotDiagnostic dumpHeap heapdump-20160309.hprof false' | java -jar jmxsh.jar -h $LOCALHOST_OR_IP -p $JMX_PORT
For this example I downloaded jmxsh.jar with wget from https://github.com/davr/jmxsh/raw/master/jmxsh.jar.
What worked for me was to simply issue the command with sudo as in:
sudo jmap -heap 21797
In my case it is not as simple as checking the user :(
I have a script called collectd-java which invokes jstat and jmap. I've checked with top that the script is launched, as expected, by the user owning the JVM. However, jstat gives me what I need while jmap can't attach. Here is the script (the echo lines just produce the format I need to present the values):
HOSTNAME="${COLLECTD_HOSTNAME:-localhost}"
INTERVAL="${COLLECTD_INTERVAL:-60}"
MAIN_CLASS="my.fully.qualified.MainClass"
PID=$(pgrep -f ${MAIN_CLASS})
get_jstat_classloaderdata() {
VALUE=`jstat -class $PID 1 1 | awk '{print $1}' | grep -vi loaded`
echo "PUTVAL \"$HOSTNAME/exec-cecoco/gauge-java_classloader_loaded\" interval=$INTERVAL N:$VALUE"
VALUE=`jstat -class $PID 1 1 | awk '{print $2}' | grep -vi bytes`
echo "PUTVAL \"$HOSTNAME/exec-cecoco/gauge-java_classloader_bytesload\" interval=$INTERVAL N:$VALUE"
VALUE=`jstat -class $PID 1 1 | awk '{print $3}' | grep -vi unload`
echo "PUTVAL \"$HOSTNAME/exec-cecoco/gauge-java_classloader_unloaded\" interval=$INTERVAL N:$VALUE"
VALUE=`jstat -class $PID 1 1 | awk '{print $4}' | grep -vi bytes`
echo "PUTVAL \"$HOSTNAME/exec-cecoco/gauge-java_classloader_bytesunload\" interval=$INTERVAL N:$VALUE"
VALUE=`jstat -class $PID 1 1 | awk '{print $5}' | grep -vi time`
echo "PUTVAL \"$HOSTNAME/exec-cecoco/gauge-java_classloader_time\" interval=$INTERVAL N:$VALUE"
}
get_jmap_heapdata() {
VALUE=$(jmap -heap ${PID} | grep MinHeapFreeRatio |awk '{print $3}')
echo "PUTVAL \"$HOSTNAME/exec-cecoco/gauge-jmap_minheapfreeratio\" interval=$INTERVAL N:$VALUE"
VALUE=$(jmap -heap ${PID} | grep MaxHeapFreeRatio|awk '{print $3}')
echo "PUTVAL \"$HOSTNAME/exec-cecoco/gauge-jmap_maxheapfreeratio\" interval=$INTERVAL N:$VALUE"
VALUE=$(jmap -heap ${PID} | grep MaxHeapSize|awk '{print $3}')
echo "PUTVAL \"$HOSTNAME/exec-cecoco/gauge-jmap_maxheapsize\" interval=$INTERVAL N:$VALUE"
}
##Do it
get_jmap_heapdata
get_jstat_classloaderdata
jstat succeeds and jmap fails. Does anyone understand why?
I am not sure why a plain "jmap PID" fails when I docker exec -it into my container, which runs CentOS 7 with systemd and a Java service, but the jmap options below worked for me. Thanks to:
https://dkbalachandar.wordpress.com/2016/07/05/thread-dump-from-a-docker-container/
[root@b29924306cfe /]# jmap 170
Attaching to process ID 170, please wait...
Error attaching to process: sun.jvm.hotspot.debugger.DebuggerException: Can't attach to the process: ptrace(PTRACE_ATTACH, ..) failed for 170: Operation not permitted
sun.jvm.hotspot.debugger.DebuggerException: sun.jvm.hotspot.debugger.DebuggerException: Can't attach to the process: ptrace(PTRACE_ATTACH, ..) failed for 170: Operation not permitted
[root@b29924306cfe /]# jmap -dump:live,format=b,file=heapDump.hprof 170
Dumping heap to /heapDump.hprof ...
Heap dump file created
I am using Ubuntu with OpenJDK installed, and I want an alert to be sent if heap usage goes beyond 80%.
I know the command for this is
jcmd <pid> GC.heap_info
but the problem is that the process ID keeps changing on Ubuntu.
Can anyone suggest a script for this?
You can use jps to find your java process:
jps -l
2770 org.netbeans.Main
5144 jdk.jcmd/sun.tools.jps.Jps
So if you want the process ID of NetBeans, for example, you can do something like:
jps -l | grep org.netbeans.Main | cut -f1 -d ' '
You can pass this to jcmd:
jcmd `jps -l | grep org.netbeans.Main | cut -f1 -d ' '` GC.heap_info
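To turn that into the 80% check, the GC.heap_info text has to be reduced to a number. The sketch below is one way to do it, under loud assumptions: it expects heap lines of the form "... total 258048K, used 3671K ..." (what I see for G1 and the parallel collector; other collectors or JDK versions may print something different), and "total" there is the currently committed heap, not the -Xmx maximum. The function names are invented.

```shell
# Read GC.heap_info text on stdin and print heap usage as a whole percentage.
# Adds up every "total <n>K, used <m>K" pair it finds.
heap_used_percent() {
    awk '
        {
            for (i = 1; i + 3 <= NF; i++) {
                if ($i == "total" && $(i + 2) == "used") {
                    total += $(i + 1) + 0    # "258048K," counts as 258048
                    used  += $(i + 3) + 0
                }
            }
        }
        END { if (total > 0) printf "%d\n", used * 100 / total }
    '
}

# Succeeds when usage read from stdin is at or above the threshold (default 80).
heap_over_threshold() {
    pct=$(heap_used_percent)
    [ -n "$pct" ] && [ "$pct" -ge "${1:-80}" ]
}
```

Combined with the jps lookup above, a cron job could run something like: jcmd "$(jps -l | grep org.netbeans.Main | cut -f1 -d ' ')" GC.heap_info | heap_over_threshold 80 && echo "heap above 80%" and replace the echo with whatever sends your alert.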
I have a BI application (Looker) that runs on a Linux VM.
To be able to restart the service, I need to clear the existing java process.
In the screenshot, taken after running the command below, there is a java process, but it does not show up in the list when I run jps. What is the reason, and how can I properly terminate this java process?
ps aux | grep java
Have you tried these?
killall java
or
kill $(pidof java)
As you can see from your image, the process ID changes on each run (9287, then 9304): it belongs to the grep java command itself, not to a Java VM!
A common fix is to filter the grep command out of the ps results with grep -v, such as:
ps aux | grep java | grep -v --regexp=grep.\*java
If the command above returns results, you can extend it to read the process IDs and pass them to kill:
kill -TERM $(ps aux | grep java | grep -v --regexp=grep.\*java | awk '{print $2}')
Note: the above kills every process with "java" in its command line, so it is not very useful if several Java services run under the same account. You may need to add a filter for the specific Java VM.
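A filter along those lines might look like the sketch below. It only accepts lines whose executable is java, so the grep process never matches, and it also requires some text from the command line of the service you are after. java_pids is a made-up name, and "looker" in the usage line is just an example pattern.

```shell
# Read "PID COMMAND ARGS..." lines on stdin (for example from
# `ps -eo pid=,args=`) and print the PIDs whose executable is java and whose
# command line contains the given text.
java_pids() {
    awk -v want="$1" '$2 ~ /(^|\/)java$/ && index($0, want) { print $1 }'
}
```

Usage: ps -eo pid=,args= | java_pids looker prints the matching PIDs, which you can check by eye before passing them to kill -TERM.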
I am working on a Gradle Java project, which starts Tomcat for testing and stops it later.
I am supposed to kill this Tomcat instance when the test fails.
I tried using "ps aux | grep tomcat | grep -v grep | awk '{print $2}'" to get the process ID and kill the process.
But in production there will be many Tomcat processes running simultaneously for many users, and I only want the Tomcat process started by my build.gradle for test execution.
How can I accomplish this? Please give me some guidelines.
You need to find a unique string in the output of 'ps aux' which differentiates your test tomcat and others'.
I currently use the script below, which runs 'shutdown.sh' first and then kills the PID, because most of the time the application stops but the process does not.
# Find the PID of the Tomcat whose command line mentions $TOMCAT_LOC/conf
PID=$(ps -ef | grep "$JAVA_HOME/bin/java" | grep "$TOMCAT_LOC/conf" | grep -v grep | awk '{ print $2 }')
if [ -n "$PID" ]; then
    echo "tomcat is running with PID:$PID."
    # Ask Tomcat to stop, then kill whatever is left
    if [ -f "$TOMCAT_LOC/bin/shutdown.sh" ]; then
        [ -x "$TOMCAT_LOC/bin/shutdown.sh" ] || chmod a+x "$TOMCAT_LOC/bin/shutdown.sh"
        "$TOMCAT_LOC/bin/shutdown.sh" >>/dev/null
        sleep 20
    fi
    # PID is left unquoted on purpose in case ps matched more than one process
    kill -9 $PID
    sleep 3
else
    echo "tomcat is not running"
fi
You may also look at configuring a PID file by editing the 'catalina.sh' which you can read later to find out your PID.
# CATALINA_PID (Optional) Path of the file which should contains the pid
# of the catalina startup java process, when start (fork) is
# used
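Once a PID file exists, stopping becomes much simpler than grepping ps. A minimal sketch, assuming the file holds a single PID; stop_by_pidfile and its 20 second default are my own choices for this example, not something Tomcat provides.

```shell
# Stop the process recorded in a PID file: ask politely, wait, then force.
stop_by_pidfile() (
    pidfile=$1
    timeout=${2:-20}                  # seconds to wait before SIGKILL
    [ -f "$pidfile" ] || { echo "no pid file: $pidfile"; exit 0; }
    pid=$(cat "$pidfile")
    kill -TERM "$pid" 2>/dev/null
    # kill -0 sends no signal, it only checks that the process still exists
    while [ "$timeout" -gt 0 ] && kill -0 "$pid" 2>/dev/null; do
        sleep 1
        timeout=$((timeout - 1))
    done
    kill -0 "$pid" 2>/dev/null && kill -KILL "$pid"
    rm -f "$pidfile"
)
```

With CATALINA_PID pointing at, say, "$TOMCAT_LOC/tomcat.pid" (a path I picked for illustration), stop_by_pidfile "$TOMCAT_LOC/tomcat.pid" only ever touches the instance that wrote that file.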
The JDK has a tool called jps in the $JAVA_HOME/bin folder.
It is similar to the Unix ps command, but for Java processes only.
You can use it to determine the exact Java process you need.
Using this tool is recommended, and it is especially useful when more than one Java application is running on your host.
For example, I have an H2 database and many other apps running, but want to kill only H2, so I can use jps to get its PID:
$ jps
17810 GradleDaemon
17798 GradleWrapperMain
17816 h2-1.4.197.jar
17817 GradleDaemon
17818 GradleDaemon
18011 Jps
16479
and then just kill the process you need:
kill -9 17816
All other Java apps will continue to work normally. I am not sure about Tomcat, but I think it can be done in a similar way, something like this:
kill -9 $(jps | grep tomcat | awk '{print $1}')
Lastly, a little off topic but specific to your case: the correct way would be to use the start/stop/restart scripts provided by Tomcat.
The correct way to terminate a Tomcat instance is via its own shutdown command. You should not be thinking of processes, or PIDs, or kills, at all.
If you want to kill the Tomcat started by the user you are logged in as, try the following and let me know if it helps.
ps -ef | grep -v grep | grep `whoami` | grep tomcat
Here ps -ef lists all processes, grep -v grep removes the grep command's own process, grep `whoami` keeps only the lines for the currently logged-in user, and grep tomcat keeps only Tomcat processes. Test it once, and if all is well you can kill the process.
By the way, how about Tomcat's stop script? If it is there, you could use that too.
You can use shell variable $!. It represents the PID of the most recent background command.
yourCommand &
CMD_PID=$!
echo $CMD_PID
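Building on $!, a trap can guarantee the background process is killed even when the test fails. This is a sketch only: with_background_service and the SERVICE_PID variable are names I invented, and it assumes the service command stays in the foreground (it is started with exec so that $! is the PID of the service itself).

```shell
# Run a test command while a background service is up; the service is killed
# when this function returns, whether the test passed or failed.
with_background_service() (
    service_cmd=$1
    test_cmd=$2
    sh -c "exec $service_cmd" &
    service_pid=$!
    # Runs when this subshell exits, whatever the reason
    trap 'kill "$service_pid" 2>/dev/null; wait "$service_pid" 2>/dev/null' EXIT
    # The test can read the service PID from SERVICE_PID if it needs it
    SERVICE_PID=$service_pid sh -c "$test_cmd"
)
```

For example, with_background_service "yourCommand" "./run-tests.sh" returns the exit status of the test script and leaves no stray process behind (run-tests.sh is a placeholder).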
I have a java app on my (Ubuntu) server. If I do this, then it starts correctly:
/usr/bin/java -cp /home/jenkins/veta/lily.jar com.sugarapp.lily.Main
but I don't know how to get its PID. I don't know how to stop it with an init.d script.
I have a different app, written in Clojure, and for it I was able to write an init.d script that works great. So I tried to refashion that init.d script for my Java app, and this is what I got:
WORK_DIR="/home/jenkins/veta"
NAME="lily"
JAR="lily.jar"
USER="jenkins"
DAEMON="/usr/bin/java"
DAEMON_ARGS=" -cp /home/jenkins/veta/lily.jar com.sugarapp.lily.Main"
start () {
echo "Starting lily..."
if [ ! -f $WORK_DIR/lily.pid ]; then
/sbin/start-stop-daemon --start --verbose --background --chdir $WORK_DIR --exec $DAEMON --pidfile $WORK_DIR/lily.pid --chuid "$USER" --make-pidfile -- $DAEMON_ARGS
else
echo "lily is already running..."
fi
}
stop () {
echo "Stopping lily..."
/sbin/start-stop-daemon --stop --exec $DAEMON --pidfile $WORK_DIR/lily.pid
rm $WORK_DIR/lily.pid
}
But this doesn't work. Although the PID in $WORK_DIR/lily.pid changes every time I run the script, no process with that PID ever seems to run. If I try:
ps aux | grep java
I don't see this app, nor if I try using the PID.
So is there a way I can use the first command, but somehow capture the PID, so I can store it for later?
I just want a reliable way to stop and start this app. That can be by PID or some other factor. I'm open to suggestions.
UPDATE:
Maybe my question is unclear? Something like jps or ps will give me too many answers. If I do something like "ps aux | grep java" I'll see that there are 5 different java apps running on the server. The start-stop-daemon won't know which PID belongs to this particular app, nor can I figure out what I should feed into my init.d script.
If your system has a JDK installed, there is a utility called jps in jdk/bin that lists the running Java processes. Make use of it.
If no JDK is installed on your machine, you have to grep for the java process in the output of ps -eaf.
If you want the pid from the command line, this might work:
myCommand & echo $!
I copied this from the accepted answer to a very similar question on Server Fault: https://serverfault.com/a/205504
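Applied to the question, the same idea can write the PID file directly, without start-stop-daemon. A rough sketch, with function names of my own choosing; it also treats a PID file whose process is gone as stale instead of refusing to start.

```shell
# Start a command in the background and record its PID; refuse to start twice.
start_with_pidfile() {
    pidfile=$1; shift
    if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "already running with PID $(cat "$pidfile")"
        return 1
    fi
    # nohup replaces itself with the command, so $! is the command's own PID
    nohup "$@" >/dev/null 2>&1 &
    echo $! > "$pidfile"
}

# Succeeds if the PID in the file belongs to a live process.
is_running() {
    [ -f "$1" ] && kill -0 "$(cat "$1")" 2>/dev/null
}
```

For the app in the question this would be start_with_pidfile /home/jenkins/veta/lily.pid /usr/bin/java -cp /home/jenkins/veta/lily.jar com.sugarapp.lily.Main, and stopping is kill "$(cat /home/jenkins/veta/lily.pid)". If is_running fails right after the start, the JVM itself is exiting, and it is worth sending its output to a log file instead of /dev/null to see why.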
I am running Elasticsearch (ES) on a small AWS Ubuntu box, and working on tuning the performance of the box overall.
After a recent deploy using Saltstack, I noticed the number of running instances went from two to three -- after being at two for several months. The uptick in instances seems to correspond to an uptick in memory usage.
I confirmed with ps that there are three java processes running on the box:
PID TTY TIME CMD
9295 ? 00:02:08 java
14398 ? 00:00:12 java
26175 ? 00:40:48 java
When I stop ES with command "sudo service elasticsearch stop", I was still left with two ES processes running according to ps:
PID TTY TIME CMD
9295 ? 00:02:08 java
26175 ? 00:40:48 java
I restarted the service and then I had three again. This seems really strange to me because it seemed like two of the services were unresponsive to the stop command. (Could this be a so-called Zombie or Orphan process?)
I manually killed all three processes and restarted ES, and now have only a single ES instance. I wondered if these wayward java processes were related to some other service, but after killing all three, New Relic confirmed a large drop in memory usage and processes, so they were definitely all ES-related processes.
My question is why, after a deploy, would the number of running instances go up?
Is there a functional Elasticsearch reason for this, or was this a bug?
What would cause either Elasticsearch or any service on Ubuntu in general to go into this unresponsive state?
Any insight is greatly appreciated!
What do you get when running this command:
lsof -i :9200-9399 | tail -n +2 | awk '{print $2}' | xargs ps -p
lsof -i :9200-9399 will list all open files on the port range 9200-9399, i.e. the default port range used by ES. Change the range if your configuration differs.
tail -n +2 will drop the first line of the lsof output, which only contains the column headers
awk '{print $2}' will fetch only the process id (PID) from the lsof output
Finally, xargs ps -p will run the ps command to find out what process runs under the PID fetched by the awk command.
You should get an output like below and that might get you started in your investigation.
PID TTY TIME CMD
21199 ttys011 4:39.26 /usr/bin/java -Xms256m -Xmx1g....
22234 ttys012 5:12.22 /usr/bin/java -Xms256m -Xmx1g....
23444 ttys013 3:33.54 /usr/bin/java -Xms256m -Xmx1g....
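One thing to watch for with the pipeline above: lsof prints one row per open socket, so the same PID can show up several times. A small sketch that reduces the lsof output to one line per PID; unique_pids is a hypothetical helper name.

```shell
# Read `lsof -i` output on stdin and print each PID once, in numeric order.
unique_pids() {
    awk 'NR > 1 { print $2 }' | sort -un
}
```

For example: for p in $(lsof -i :9200-9399 | unique_pids); do ps -o pid,ppid,lstart,args -p "$p"; done. The ppid and lstart columns (as printed by the procps version of ps) show each process's parent and start time, which may help match the extra instances to the deploy.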