I'm considering whipping up a script to
run once a minute (or every five minutes)
run jstack against a running JVM in production
parse the jstack output and tally up things I'm interested in
export the results for graphing 24/365 by a centralized Cacti installation on another server
But I've no clue as to how expensive or invasive jstack is on a running JVM. How expensive is it to execute jstack on a running JVM? Am I setting myself up for a world of hurt?
I know this answer is late to the party, but the expensive part of jstack is attaching to the debugger interface, not generating the stack traces themselves, with one important exception (and the heap size does not matter at all):
Arbitrary stack traces can be generated only at a safe point or while the thread is waiting (i.e. outside Java scope). If the thread is waiting/outside Java scope, the thread requesting the stack will carry out the task by doing the stack walk on its own. However, you might not wish to "interrupt" a thread to walk its own stack, especially while it is holding a lock (or doing some busy wait). Since there is no way to control the safe points, that's a risk that needs to be considered.
Another option, which avoids attaching to the debugging interface at all: call Thread.getAllStackTraces() or use the ThreadMXBean in-process, save the result to a file, and use an external tool to poll that file.
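A minimal sketch of that in-process route, assuming Java 8+ and an invented output path (/tmp/thread-states.txt): a daemon scheduler tallies thread states via the ThreadMXBean and writes the counts to a file an external poller can read.

import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Tally thread states in-process and write them to a file that an external
// poller (cron, Cacti script, etc.) can read - no debugger attach involved.
public class ThreadStateExporter {
    public static void start(final String outputFile, long periodSeconds) {
        final ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "thread-state-exporter");
            t.setDaemon(true); // never keep the JVM alive just for monitoring
            return t;
        });
        scheduler.scheduleAtFixedRate(() -> {
            Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
            // false, false: skip locked monitor/synchronizer details - cheaper
            for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
            try {
                Files.write(Paths.get(outputFile),
                        counts.toString().getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                e.printStackTrace(); // keep sampling even if one write fails
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        start("/tmp/thread-states.txt", 60);
        Thread.sleep(5 * 60 * 1000); // demo only: sample for five minutes
    }
}

Since it never attaches a debugger, the overhead is essentially the cost of dumpAllThreads itself.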
Last note: I love jstack; it's immensely useful on a production system.
Measure. One of the time variants (/usr/bin/time I believe) has a -p option which allows you to see the cpu resources used:
ravn:~ ravn$ /usr/bin/time -p echo Hello World
Hello World
real 0.32
user 0.00
sys 0.00
ravn:~ ravn$
This means that it took 0.32 seconds wall time, using 0.00 seconds of cpu time in user space and 0.00 seconds in kernel space.
Create a test scenario where you have a program running but not doing anything, and try comparing the resource usage WITH and WITHOUT jstack attaching, e.g. every second. Then you have hard numbers and can experiment to see what would give a reasonable overhead.
My hunch would be that once every five minutes is negligible.
Depending on the number of threads and the size of your heap, jstack could possibly be quite expensive. JStack is meant for troubleshooting and wasn't designed for stats gathering. It might be better to use some form of instrumentation or expose a JMX interface to get the information you want directly, rather than have to parse stack traces.
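As a sketch of that JMX route (the ObjectName, interface and class names below are made up for illustration), you can register a small MXBean that exposes just the numbers you want to graph, and let any JMX-capable poller read them directly:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import javax.management.MXBean;
import javax.management.ObjectName;

// Expose the numbers you care about over JMX so a poller can read them
// directly instead of parsing jstack output.
public class ThreadStats implements ThreadStats.StatsMXBean {

    @MXBean
    public interface StatsMXBean {
        int getLiveThreadCount();
        int getPeakThreadCount();
    }

    private final ThreadMXBean threads = ManagementFactory.getThreadMXBean();

    public int getLiveThreadCount() { return threads.getThreadCount(); }
    public int getPeakThreadCount() { return threads.getPeakThreadCount(); }

    public static void register() throws Exception {
        // The ObjectName is arbitrary; pick whatever naming scheme your poller expects.
        ManagementFactory.getPlatformMBeanServer().registerMBean(
                new ThreadStats(), new ObjectName("com.example:type=ThreadStats"));
    }
}

Note that the platform already exposes java.lang:type=Threading and java.lang:type=Memory over JMX, so for simple counts a custom bean may not even be necessary.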
Related
I have a Java Application (web-based) that at times shows very high CPU Utilization (almost 90%) for several hours. Linux TOP command shows this. On application restart, the problem goes away.
So to investigate:
I take a thread dump to find out what the threads are doing. Several threads are found in the 'RUNNABLE' state, some in a few other states. On taking repeated thread dumps, I do see some threads that are always present in the 'RUNNABLE' state. So, they appear to be the culprit.
But I am unable to tell for sure which thread is hogging the CPU or has gone into an infinite loop (thereby causing high CPU utilization).
Logs don't necessarily help, as the offending code may not be logging anything.
How do I investigate which part of the application, or which thread, is causing the high CPU utilization? Any other ideas?
If a profiler is not applicable in your setup, you may try to identify the thread by following the steps in this post.
Basically, there are three steps:
run top -H and get the PID of the thread with the highest CPU usage.
convert that PID to hex.
look for the thread with the matching hex PID (the nid field) in your thread dump.
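To make step 2 concrete, a trivial sketch (the example PID is arbitrary) that converts the decimal thread id reported by top -H into the hex nid form used in the thread dump:

// Convert the decimal thread id reported by `top -H` into the hex
// form used by the nid= field in a jstack thread dump.
public class NidConverter {
    public static void main(String[] args) {
        long pid = Long.parseLong(args[0]);                    // e.g. 5108
        System.out.println("nid=0x" + Long.toHexString(pid));  // -> nid=0x13f4
    }
}

For example, decimal 5108 becomes nid=0x13f4, which is exactly the string you grep for in the dump.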
You may be a victim of a garbage collection problem.
When your application requires memory and is getting low on what it's configured to use, the garbage collector will run often, which consumes a lot of CPU cycles.
If it can't collect anything, your memory will stay low, so it will run again and again.
When you redeploy your application the memory is cleared and garbage collection won't happen more than required, so the CPU utilization stays low until it's full again.
You should check that there is no possible memory leak in your application and that it's well configured for memory (check the -Xmx parameter; see What does Java option -Xmx stand for?).
Also, what are you using as a web framework? JSF relies a lot on sessions and consumes a lot of memory; consider being as stateless as possible!
In the thread dump you can find the line number, as shown below.
For the main thread, which is currently running:
"main" #1 prio=5 os_prio=0 tid=0x0000000002120800 nid=0x13f4 runnable [0x0000000001d9f000]
java.lang.Thread.State: **RUNNABLE**
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:313)
at com.rana.samples.**HighCPUUtilization.main(HighCPUUtilization.java:17)**
During these peak CPU times, what is the user load like? You say this is a web based application, so the culprits that come to mind are memory utilization issues. If you store a lot of stuff in the session, for instance, and the session count gets high enough, the app server will start thrashing about. This is also a case where the GC might make matters worse depending on the scheme you are using. More information about the app and the server configuration would be helpful in pointing towards more debugging ideas.
Flame graphs can be helpful in identifying the execution paths that are consuming the most CPU time.
In short, the following are the steps to generate flame graphs:
yum -y install perf
wget https://github.com/jvm-profiling-tools/async-profiler/releases/download/v1.8.3/async-profiler-1.8.3-linux-x64.tar.gz
tar -xvf async-profiler-1.8.3-linux-x64.tar.gz
chmod -R 777 async-profiler-1.8.3-linux-x64
cd async-profiler-1.8.3-linux-x64
echo 1 > /proc/sys/kernel/perf_event_paranoid
echo 0 > /proc/sys/kernel/kptr_restrict
JAVA_PID=`pgrep java`
./profiler.sh -d 30 -f flame-graph.svg $JAVA_PID
flame-graph.svg can be opened in a browser, and in short, the width of an element in the flame graph indicates, relative to the rest, the number of sampled stack traces that contain that execution flow.
There are a few other approaches to generating them:
By introducing -XX:+PreserveFramePointer as the JVM options as described here
Using async-profiler with -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints as described here
But using async-profiler without providing any options, though not very accurate, can be leveraged with no changes to the running Java process and with low CPU overhead to the process.
Their wiki provides details on how to leverage it. And more about flame graphs can be found here
Your first approach should be to find all references to Thread.sleep and check that:
Sleeping is the right thing to do - you should use some sort of wait mechanism if possible - perhaps careful use of a BlockingQueue would help (a minimal sketch follows this answer).
If sleeping is the right thing to do, are you sleeping for the right amount of time - this is often a very difficult question to answer.
The most common mistake in multi-threaded design is to believe that all you need to do when waiting for something to happen is to check for it and sleep for a while in a tight loop. This is rarely an effective solution - you should always try to wait for the occurrence.
The second most common issue is to loop without sleeping. This is even worse and is a little less easy to track down.
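To make the BlockingQueue suggestion in the first point concrete, here is a minimal sketch (class and queue names are invented) of a consumer that blocks on take() instead of sleep-polling in a loop:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Instead of polling a shared structure in a loop with Thread.sleep(),
// block on a queue: take() parks the consumer until work actually arrives,
// so it burns no CPU while idle.
public class QueueWaitExample {
    public static void main(String[] args) throws InterruptedException {
        final BlockingQueue<String> work = new ArrayBlockingQueue<>(100);

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String item = work.take();   // waits, does not spin
                    System.out.println("processed " + item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit cleanly
            }
        }, "consumer");
        consumer.setDaemon(true);
        consumer.start();

        work.put("job-1");
        work.put("job-2");
        Thread.sleep(500); // give the consumer a moment before the demo exits
    }
}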
You did not add the "linux" tag to the question, but you mentioned Linux top. And thus this might be helpful:
Use the small Linux tool threadcpu to identify the threads using the most CPU. It calls jstack to get the thread names, and with "sort -n" in a pipe you get the list of threads ordered by CPU usage.
More details can be found here:
http://www.tuxad.com/blog/archives/2018/10/01/threadcpu_-_show_cpu_usage_of_threads/index.html
And if you still need more details then create a thread dump or run strace on the thread.
I am analyzing the differences between approaches for taking thread dumps. Below are a couple of them I am researching:
Defining a JMX bean which triggers jstack through Runtime.exec() on invoking a declared bean operation.
Daemon thread executing "ManagementFactory.getThreadMXBean().dumpAllThreads(true, true)" repeatedly after a predefined interval.
Comparing the thread dump outputs between the two, I see the below disadvantages with approach 2
Thread dumps logged with approach 2 cannot be parsed by open source thread dump analyzers like TDA
The output does not include the native thread id, which could be useful in analyzing high CPU issues (right?)
Any more?
I would appreciate suggestions/input on:
Are there any disadvantages to executing jstack through Runtime.exec() in production code? Any compatibility issues on various operating systems (Windows, Linux)?
Any other approach to take thread dumps?
Thank you.
Edit -
A combined approach of 1 and 2 seems to be the way to go. We can have a dedicated thread running in the background and printing the thread dumps to the log file in a format understood by the thread dump analyzers.
If any extra information is needed (like, say, the native thread id), which is logged only in the jstack output, we collect it manually as required.
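A minimal sketch of that dedicated background thread, built on ThreadMXBean.dumpAllThreads(true, true). Note the layout below only approximates jstack's format - a truly TDA-friendly formatter would need more work - and the native thread id (nid) is still not available from Java:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Approach 2 as a dedicated daemon thread: periodically dump all threads
// via ThreadMXBean and print them in a jstack-like layout.
public class ThreadDumpLogger implements Runnable {
    private final long intervalMillis;

    public ThreadDumpLogger(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    @Override
    public void run() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        try {
            while (true) {
                StringBuilder dump = new StringBuilder("Full thread dump (in-process):\n");
                for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
                    dump.append('"').append(info.getThreadName()).append('"')
                        .append(" tid=").append(info.getThreadId())
                        .append("\n   java.lang.Thread.State: ")
                        .append(info.getThreadState()).append('\n');
                    for (StackTraceElement frame : info.getStackTrace()) {
                        dump.append("\tat ").append(frame).append('\n');
                    }
                    dump.append('\n');
                }
                System.out.println(dump); // replace with your logging framework
                Thread.sleep(intervalMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void start(long intervalMillis) {
        Thread t = new Thread(new ThreadDumpLogger(intervalMillis), "thread-dump-logger");
        t.setDaemon(true); // never keep the JVM alive just for dumping
        t.start();
    }
}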
You can use
jstack {pid} > stack-trace.log
running as the user on the box where the process is running.
If you run this multiple times you can use a diff to see which threads are active more easily.
For analysing the stack traces I use the following, sampled periodically in a dedicated thread.
Map<Thread, StackTraceElement[]> allStackTraces = Thread.getAllStackTraces();
Using this information you can obtain the thread's id, run state and compare the stack traces.
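As a sketch of the comparison step (the 5-second interval is arbitrary): take two snapshots of Thread.getAllStackTraces() and report the threads that were RUNNABLE in both samples.

import java.util.HashMap;
import java.util.Map;

// Take two samples of Thread.getAllStackTraces() a few seconds apart and
// report threads that were RUNNABLE in both - a rough "diff" of activity.
public class RunnableDiff {
    public static void main(String[] args) throws InterruptedException {
        Map<Long, String> first = sampleRunnable();
        Thread.sleep(5000);
        Map<Long, String> second = sampleRunnable();

        for (Map.Entry<Long, String> e : second.entrySet()) {
            if (first.containsKey(e.getKey())) {
                System.out.println("still RUNNABLE: " + e.getValue()
                        + " (id " + e.getKey() + ")");
            }
        }
    }

    // Map of thread id -> thread name for every RUNNABLE thread right now.
    private static Map<Long, String> sampleRunnable() {
        Map<Long, String> result = new HashMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getState() == Thread.State.RUNNABLE) {
                result.put(t.getId(), t.getName());
            }
        }
        return result;
    }
}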
With Java 8 in the picture, jcmd is the preferred approach.
jcmd <PID> Thread.print
Following is a snippet from the Oracle documentation:
The release of JDK 8 introduced Java Mission Control, Java Flight Recorder, and jcmd utility for diagnosing problems with JVM and Java applications. It is suggested to use the latest utility, jcmd instead of the previous jstack utility for enhanced diagnostics and reduced performance overhead.
However, shipping this with the application may have licensing implications, of which I am not sure.
If it's a *nix I'd try kill -3 <PID>, but then you need to know the process id and maybe you don't have access to the console?
I'd suggest you do all the heap analysis on a staging environment if there is such an env, then reflect your required Application Server tuning on production if any. If you need the dumps for analysis of your application's memory utilization, then perhaps you should consider profiling it for a better analysis.
Heap dumps are usually generated as a result of OutOfMemoryErrors resulting from memory leaks and bad memory management.
Check your Application Server's documentation, most modern servers have means for producing dumps at runtime aside from the normal cause I mentioned earlier, the resulting dump might be vendor specific though.
My question is: do JVMs share some kind of resource related to threading or processes that could cause ProcessBuilder performance to spike after a month or more of normal usage? We are using Java 6 update 21 for all apps.
Over the past several months, we've noticed that a single server in our data center (a Sparc M4000 running Solaris 10) can go about 6-8 weeks with no problems. Quickly, however, an application that utilizes the ProcessBuilder class to run scripts takes a huge performance hit - with ProcessBuilder.start sometimes taking over a minute to return. After a reboot, and for several weeks after, the normal return time is in the 10s or maybe 100 millisecond range.
I wrote a separate small application that creates 5 threads, and each thread runs the 'ls' command using ProcessBuilder 10 times serially, then I gather stats from that in order to monitor the original problem. This application exits after each run, and is run from cron only once an hour. It usually only takes a second or two.
Last night, ProcessBuilder times spiked again to over a minute for each ProcessBuilder.start call, after 45 days of uptime and normal behavior.
top shows no memory or CPU hogs. I did try to do a jstack on the test app, but got the error 'Can't create thread_db agent'.
Any ideas?
We had a similar problem with our application, which runs on Linux. The Linux JVM code uses fork, which means the address space gets mapped and copied each time you exec. We were executing many small, short-lived processes. It appears a main difference from your app is that we had a relatively large heap (around 240GB), so I'm sure that had an impact. We ended up implementing our own spawning code using JNI and posix_spawn. Here is a link to the question/answer: Slowing process creation under java
I have a single JVM [1] with a large heap (up to 240GB, though in the 20-40GB range for most of this phase of execution) running under Linux [2] on a server with 24 cores. We have tens of thousands of objects that have to be processed by an external executable, and we then load the data created by those executables back into the JVM. Each executable produces about half a megabyte of data (on disk) that, when read right in after the process finishes, is of course larger.
Our first implementation was to have each executable handle only a single object. This involved the spawning of twice as many executables as we had objects (since we called a shell script that called the executable). Our CPU utilization would start off high, but not necessarily 100%, and slowly worsen. As we began measuring to see what was happening we noticed that the process creation time [3] continually slows. While starting at sub-second times it would eventually grow to take a minute or more. The actual processing done by the executable usually takes less than 10 seconds.
Next we changed the executable to take a list of objects to process in an attempt to reduce the number of processes created. With batch sizes of a few hundred (~1% of our current sample size), the process creation times start out around 2 seconds & grow to around 5-6 seconds.
Basically, why is it taking so long to create these processes as execution continues?
[1] Oracle JDK 1.6.0_22
[2] Red Hat Enterprise Linux Advanced Platform 5.3, Linux kernel 2.6.18-194.26.1.el5 #1 SMP
[3] Creation of the ProcessBuilder object, redirecting the error stream, and starting it.
My guess is that you MIGHT be running into problems with fork/exec, if Java is using the fork/exec system calls to spawn subprocesses.
Normally fork/exec is fairly efficient, because fork() does very little - all pages are copy-on-write. This stops being so true with very large processes (i.e. those with gigabytes of pages mapped) because the page tables themselves take a relatively long time to create - and of course, destroy, as you immediately call exec.
As you're using a huge amount of heap, this might be affecting you. The more pages you have mapped in, the worse it may become, which could be what's causing the progressive slowdown.
Consider either:
Using posix_spawn, if that is NOT implemented by fork/exec in libc
Using a single subprocess which is responsible for creating / reaping others; spawn this once and use some IPC (pipes etc) to tell it what to do (a sketch follows below).
NB: This is all speculation; you should probably do some experiments to see whether this is the case.
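For the second option, a minimal sketch under the assumption that the external tool can be wrapped to speak a line-oriented protocol (the /usr/local/bin/worker path is a placeholder): spawn one long-lived helper and talk to it over its stdin/stdout pipes, so no further forks of the big JVM are needed.

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

// Fork the external program once (ideally while the heap is still small),
// then send it one request per line and read one result line back per request.
public class LongLivedWorker implements AutoCloseable {
    private final Process process;
    private final BufferedWriter toWorker;
    private final BufferedReader fromWorker;

    public LongLivedWorker() throws Exception {
        process = new ProcessBuilder("/usr/local/bin/worker")
                .redirectErrorStream(true)   // merge stderr into stdout
                .start();
        toWorker = new BufferedWriter(new OutputStreamWriter(process.getOutputStream()));
        fromWorker = new BufferedReader(new InputStreamReader(process.getInputStream()));
    }

    // One request/response round trip; no fork happens here.
    public synchronized String process(String request) throws Exception {
        toWorker.write(request);
        toWorker.newLine();
        toWorker.flush();
        return fromWorker.readLine();
    }

    @Override
    public void close() {
        process.destroy();
    }
}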
Most likely you are running out of a resource. Are your disks getting busier as you create these processes? Do you ensure you have fewer processes than you have cores (to minimise context switches)? Is your load average below 24?
If your CPU consumption is dropping you are likely to be hitting IO (disk/network) contention i.e. the processes cannot get/write data fast enough to keep them busy. If you have 24 cores, how many disks do you have?
I would suggest you have one process per CPU (in your case I imagine 4). Give each JVM six tasks to run concurrently to use all its cores without overloading the system.
You would be much better off using a set of long lived processes pulling your data off of queues and sending it back than constantly forking new processes for each event, especially from the host JVM with that enormous heap.
Forking a 240GB image is not free; it consumes a large amount of virtual resources, even if only for a second. The OS doesn't know how long the new process will be around, so it must prepare itself as if the entire process will be long lived; thus it sets up the virtual clone of all 240GB before obliterating it with the exec call.
If instead you had a long lived process that you could send objects to via some queue mechanism (and there are many for both Java and C, etc.), that would relieve you of some of the pressure of the forking process.
I don't know how you are transferring the data from the JVM to the external program. But if your external program can work with stdin/stdout, then (assuming you're using unix) you could leverage inetd. Here you make a simple entry in the inetd configuration file for your process, and assign it a port. Then you open up a socket, pour the data down into it, then read back from the socket. Inetd handles the networking details for you and your program works as simply with stdin and stdout. Mind you, you'll have an open socket on the network, which may or may not be secure in your deployment. But it's pretty trivial to set up even legacy code to run via a network service.
You could use a simple wrapper like this:
#!/bin/sh
# Capture stdin to a temp file, run the real executable on it,
# then echo its output back to stdout and clean up.
infile=/tmp/$$.in
outfile=/tmp/$$.out
cat > $infile
/usr/local/bin/process -input $infile -output $outfile
cat $outfile
rm $infile $outfile
It's not the highest performing server on the planet, designed for zillions of transactions, but it's sure a lot faster than forking 240GB over and over and over.
I mostly agree with Peter. You are most probably suffering from IO bottlenecks. Once you have many processes, the OS has to work harder for trivial tasks too, hence the exponential performance penalty.
So the 'solution' could be to create 'consumer' processes, only initialising a certain few; as Peter suggested, one per CPU or more. Then use some form of IPC to 'transfer' these objects to the consumer processes.
Your 'consumer' processes should manage the creation of sub-processes (the processing executable, which I presume you don't have access to); this way you don't clutter the OS with too many executables and the 'job' will "eventually" complete.
I have an application running on WebSphere Application Server 6.0 and it crashes nearly every day because of an Out-Of-Memory error. From verbose GC it is certain there are memory leaks (many of them).
Unfortunately the application is provided by an external vendor and getting things fixed is a slow & painful process. As part of the process I need to gather the logs and heap dumps each time the OOM occurs.
Now I'm looking for some way to automate it. The fundamental problem is how to detect the OOM condition. One way would be to create a shell script which periodically searches for new heap dumps. This approach seems kinda dirty to me. Another approach might be to leverage JMX somehow. But I have little or no experience in this area and don't have much idea how to do it.
Or is there some kind of trigger/hook for this in WAS? Thank you very much for any advice!
You can pass the following arguments to the JVM on startup and a heap dump will be automatically generated on an OutOfMemoryError. The second argument lets you specify the path for the heap dump file. By using this at least you could check for the existence of a specific file to see if a heap dump has occurred.
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=<value>
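To automate the "check for the existence of a specific file" idea, here is a sketch (Java 7+, with a placeholder dump directory) that watches the -XX:HeapDumpPath directory for new .hprof files:

import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

// Watch the directory given to -XX:HeapDumpPath and react as soon as a new
// .hprof file appears (e.g. kick off log collection). The directory is a placeholder.
public class HeapDumpWatcher {
    public static void main(String[] args) throws Exception {
        Path dumpDir = Paths.get("/var/dumps");
        WatchService watcher = FileSystems.getDefault().newWatchService();
        dumpDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

        while (true) {
            WatchKey key = watcher.take();     // blocks until something is created
            for (WatchEvent<?> event : key.pollEvents()) {
                Path created = (Path) event.context();
                if (created.toString().endsWith(".hprof")) {
                    System.out.println("Heap dump detected: " + dumpDir.resolve(created));
                    // TODO: trigger log collection / alerting here
                }
            }
            key.reset();
        }
    }
}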
I see two options if you want heap dumping automated but @Mark's solution with a heap dump on OOM isn't satisfactory:
You can use the MemoryMXBean to detect high memory pressure, and then programmatically create a heap dump if the usage (or usage delta) seems high (a sketch follows this answer).
You can periodically get memory usage info and generate heap dumps with a cron'd shell script using jmap (works both locally and remotely).
It would be nice if you could have a callback on OOM, but, uhm, that callback probably would just crash with an OOM error. :)
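For option 1, a minimal sketch: poll the MemoryMXBean and trigger a programmatic heap dump once usage crosses a threshold. It relies on the HotSpot-specific HotSpotDiagnosticMXBean, so it is not portable to every JVM, and the 90% threshold and file name are just examples.

import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Poll heap usage and write a heap dump the first time it crosses the threshold.
public class HighMemoryDumper implements Runnable {
    private static final double THRESHOLD = 0.90; // 90% of max heap
    private volatile boolean dumped = false;

    @Override
    public void run() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        try {
            while (!dumped) {
                MemoryUsage heap = memory.getHeapMemoryUsage();
                if (heap.getMax() > 0 && (double) heap.getUsed() / heap.getMax() > THRESHOLD) {
                    dumpHeap("/tmp/high-memory-" + System.currentTimeMillis() + ".hprof");
                    dumped = true; // dump once, then stop
                }
                Thread.sleep(10000);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void dumpHeap(String file) throws Exception {
        HotSpotDiagnosticMXBean diagnostic = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        diagnostic.dumpHeap(file, true); // true = only live objects
    }
}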
Have you looked at JConsole? It uses JMX to give you visibility of a variety of JVM metrics, including memory info. It would probably be worth monitoring your application using this to begin with, to get a feel for how/when the memory is consumed. You may find the memory is consumed uniformly over the day, or when using certain features.
Take a look at the detecting low memory section of the above link.
If you need to, you can then write a JMX client to watch the application automatically and trigger whatever actions are required. JConsole will indicate which JMX methods you need to poll.
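As a starting point for such a client, a minimal sketch that connects over remote JMX and reads heap usage; the host, port 9999 and the absence of authentication are assumptions about how the target JVM was started (com.sun.management.jmxremote.* options).

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Minimal remote JMX poller: connect to a JVM exposing remote JMX and read heap usage.
public class RemoteMemoryPoller {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    connection, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            System.out.println("Heap: " + memory.getHeapMemoryUsage());
        } finally {
            connector.close();
        }
    }
}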
An alternative to waiting until the application has crashed may be to script a controlled restart, say every night, if you're optimistic that it can survive for twelve hours.
Maybe WebSphere can even do that for you!?
You could add a listener class (a session-scoped or application-scoped attribute listener) that would be called each time a new object is added in session/app scope.
In this, you can attempt to check the total memory used by the app (log it) as well as request a gc run (note that invoking it does not imply gc will always run).
(The above is for the logging part and gc based on usage growth)
For scheduled gc:
In addition you can keep a timer task class that runs every few hours and requests a gc.
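A minimal sketch of such a scheduled task (the four-hour interval is arbitrary); keep in mind that System.gc() is only a request and routinely forcing gc is usually a workaround rather than a fix.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Every few hours, log current heap usage and request a collection.
public class ScheduledGcTask {
    public static void start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            Runtime rt = Runtime.getRuntime();
            long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
            System.out.println("Used heap before gc request: " + usedMb + " MB");
            System.gc(); // request only; the JVM may ignore it
        }, 4, 4, TimeUnit.HOURS);
    }
}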
Our experience with ITCAM has been less than stellar from the monitoring perspective. We dumped it in favor of CA Wily Introscope.
Have you had a look at the jvisualvm tool in the latest Java 6 JDKs?
It is great for inspecting running code.
I'd dispute that you need the heap dumps when the OOM occurs. Periodic gathering of the information over time should give the picture of what's going on.
As has been observed, various tools exist for analysing these problems. I have had success with ITCAM for WebSphere; as an IBMer I have ready access to that. We were very quickly able to identify the exact lines of code in our problem situation.
If there's any way you can get a tool of that nature then that's the way to go.
It should be possible to write a simple program to get the process list from the kernel and scan it to see if your WAS process is still running. On a Unix box you could probably whip up something in Perl in a few minutes (if you know Perl), not sure how difficult it would be under Windows. Run it as a scheduled task every five minutes or so, and if the process doesn't show up you could have it fork off another process that would deal with the heap dump and re-start WAS.