I was reading about thread dumps and found various ways to take them, but no article explains on what basis (which issues or symptoms) the dumps should be taken. We perform a load test against a particular server/JVM and take thread dumps when we observe high CPU utilization or hogging threads. Is this correct? Can someone shed some light on when thread dumps should be taken, either normally or during load tests?
We use the jstack command to capture the dump:
/app/jdk/jdk1.7.0_111/bin/jstack -l <ProcessID> > <PathToSaveTheFile>
TIA.
Thread dumps are used for post-mortem debugging. IMO, what you are doing is right. I don't see a reason for taking a dump under normal conditions.
Less invasive debugging
You can take thread dumps whenever you see fit to analyse anything to do with lock contention, deadlock detection, system resource contention, and so on.
This is why there are tools to facilitate taking thread dumps whenever we see fit, not only after a JVM crash. Analysing multiple dumps over time paints a fuller picture than the last crash dump alone.
This means you can perform less invasive thread debugging without attaching a profiler, which in most cases slows down (and possibly alters some dynamic properties of) the application's execution.
Related
I understand that we can use killall -3 java to get the thread dump.
My question is:
If multiple Java processes are running, which process's thread dump is taken?
Or are thread dumps taken for all processes?
Thread dumps aren't "taken" in the situation you describe. Java JVMs generally respond to the signal by writing the dump to stdout. It may be captured and stored, but that doesn't change the fundamental principle.
Consequently, you can signal all JVMs on a host to produce a thread dump if you wish.
In many cases, it's more productive to use a utility like jstack to collect the thread dump, because it will allow better control of where the dump is actually written.
I have a Java application (web-based) that at times shows very high CPU utilization (almost 90%) for several hours; the Linux top command shows this. On application restart, the problem goes away.
So to investigate:
I take a thread dump to find out what the threads are doing. Several threads are found in the 'RUNNABLE' state, some in a few other states. On taking repeated thread dumps, I do see some threads that are always present in the 'RUNNABLE' state, so they appear to be the culprit.
But I am unable to tell for sure which thread is hogging the CPU, or which one has gone into an infinite loop (thereby causing the high CPU utilization).
Logs don't necessarily help, as the offending code may not be logging anything.
How do I investigate which part of the application, or which thread, is causing the high CPU utilization? Any other ideas?
If a profiler is not applicable in your setup, you may try to identify the thread by following the steps in this post.
Basically, there are three steps:
run top -H and get the PID of the thread with the highest CPU usage.
convert that PID to hex.
look for the thread with the matching hex id (the nid field) in your thread dump.
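If you can run code inside the JVM, a rough complement to the top -H procedure is to rank threads by the CPU time reported by ThreadMXBean. This is only a sketch (it assumes thread CPU time measurement is supported, and it uses the JVM thread id rather than the OS-level nid that the steps above match):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.Comparator;

public class TopCpuThreads {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (mx.isThreadCpuTimeSupported() && !mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        ThreadInfo[] infos = mx.dumpAllThreads(false, false);
        // Sort threads by accumulated CPU time, highest first
        Arrays.sort(infos, Comparator.comparingLong(
                (ThreadInfo ti) -> mx.getThreadCpuTime(ti.getThreadId())).reversed());
        for (int i = 0; i < Math.min(5, infos.length); i++) {
            ThreadInfo ti = infos[i];
            long cpuMillis = mx.getThreadCpuTime(ti.getThreadId()) / 1_000_000;
            // Note: getThreadId() is the JVM thread id, not the OS-level nid shown by jstack
            System.out.printf("%s state=%s cpu=%d ms%n",
                    ti.getThreadName(), ti.getThreadState(), cpuMillis);
            for (StackTraceElement frame : ti.getStackTrace()) {
                System.out.println("    at " + frame);
            }
        }
    }
}

This avoids leaving the JVM, but top -H remains the more direct way to tie OS-level CPU usage back to the nid values in a jstack dump.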
You may be victim of a garbage collection problem.
When your application requires memory and is running low on what it is configured to use, the garbage collector will run often, which consumes a lot of CPU cycles.
If it can't collect anything, memory stays low, so it runs again and again.
When you redeploy your application the memory is cleared, and garbage collection won't happen more than required, so CPU utilization stays low until the heap fills up again.
You should check that there is no memory leak in your application and that it is configured with enough memory (check the -Xmx parameter; see What does Java option -Xmx stand for?).
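As a quick sanity check, you can also log from inside the application how close the heap is to its configured maximum; a minimal sketch using Runtime:

public class HeapHeadroom {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        long maxMb = rt.maxMemory() / (1024 * 1024); // roughly the -Xmx ceiling
        System.out.printf("heap used: %d MB of %d MB max%n", usedMb, maxMb);
    }
}

If the used figure stays close to the maximum for long periods, the GC-pressure explanation above becomes much more likely.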
Also, what are you using as a web framework? JSF relies a lot on sessions and consumes a lot of memory; consider being as stateless as possible!
In the thread dump you can find the line number as shown below, for the main thread which is currently running:
"main" #1 prio=5 os_prio=0 tid=0x0000000002120800 nid=0x13f4 runnable [0x0000000001d9f000]
java.lang.Thread.State: **RUNNABLE**
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:313)
at com.rana.samples.**HighCPUUtilization.main(HighCPUUtilization.java:17)**
During these peak CPU times, what is the user load like? You say this is a web-based application, so the culprit that comes to mind is memory utilization issues. If you store a lot of state in the session, for instance, and the session count gets high enough, the app server will start thrashing. This is also a case where GC might make matters worse depending on the collection scheme you are using. More information about the app and the server configuration would be helpful in pointing towards more debugging ideas.
Flame graphs can be helpful in identifying the execution paths that are consuming the most CPU time.
In short, the following are the steps to generate flame graphs:
yum -y install perf
wget https://github.com/jvm-profiling-tools/async-profiler/releases/download/v1.8.3/async-profiler-1.8.3-linux-x64.tar.gz
tar -xvf async-profiler-1.8.3-linux-x64.tar.gz
chmod -R 777 async-profiler-1.8.3-linux-x64
cd async-profiler-1.8.3-linux-x64
echo 1 > /proc/sys/kernel/perf_event_paranoid
echo 0 > /proc/sys/kernel/kptr_restrict
JAVA_PID=`pgrep java`
./profiler.sh -d 30 -f flame-graph.svg $JAVA_PID
flame-graph.svg can be opened in a browser; the width of an element in the graph is proportional to the number of samples (thread stacks) that contain that execution path.
There are a few other approaches to generating them:
By introducing -XX:+PreserveFramePointer as the JVM options as described here
Using async-profiler with -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints as described here
Using async-profiler without any of these options is less accurate, but it can be used with no changes to the running Java process and adds low CPU overhead.
Their wiki provides details on how to use it, and more about flame graphs can be found here.
Your first approach should be to find all references to Thread.sleep and check that:
Sleeping is the right thing to do - you should use some sort of wait mechanism if possible; perhaps careful use of a BlockingQueue would help (see the sketch below).
If sleeping is the right thing to do, are you sleeping for the right amount of time - this is often a very difficult question to answer.
The most common mistake in multi-threaded design is to believe that all you need to do when waiting for something to happen is to check for it and sleep for a while in a tight loop. This is rarely an effective solution - you should always try to wait for the occurrence.
The second most common issue is to loop without sleeping at all. This is even worse and a little harder to track down.
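To illustrate the difference, here is a minimal sketch contrasting a poll-and-sleep loop with a BlockingQueue consumer that blocks until work arrives (the queue, its capacity and the task type are hypothetical):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueVsPolling {
    // Hypothetical work queue shared between producer and consumer threads
    private static final BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(100);

    // Anti-pattern: poll-and-sleep either burns CPU or adds latency, depending on the sleep time
    static void pollingConsumer() throws InterruptedException {
        while (true) {
            Runnable task = queue.poll();
            if (task != null) {
                task.run();
            } else {
                Thread.sleep(50); // a guess at how long to wait
            }
        }
    }

    // Preferred: take() blocks until an element is available, consuming no CPU while waiting
    static void blockingConsumer() throws InterruptedException {
        while (true) {
            Runnable task = queue.take();
            task.run();
        }
    }
}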
You did not tag the question with "linux", but you mentioned "Linux top", and thus this might be helpful:
Use the small Linux tool threadcpu to identify the threads using the most CPU. It calls jstack to get the thread names, and piping its output through "sort -n" gives you the list of threads ordered by CPU usage.
More details can be found here:
http://www.tuxad.com/blog/archives/2018/10/01/threadcpu_-_show_cpu_usage_of_threads/index.html
And if you still need more details then create a thread dump or run strace on the thread.
I am analyzing the differences between approaches for taking thread dumps. Below are a couple of them I am researching:
Defining a JMX bean which triggers jstack through Runtime.exec() when a declared bean operation is invoked.
A daemon thread executing "ManagementFactory.getThreadMXBean().dumpAllThreads(true, true)" repeatedly at a predefined interval.
Comparing the thread dump outputs between the two, I see the following disadvantages with approach 2:
Thread dumps logged with approach 2 cannot be parsed by open source thread dump analyzers like TDA
The output does not include the native thread id, which could be useful in analyzing high CPU issues (right?)
Any more?
I would appreciate suggestions/inputs on:
Are there any disadvantages of executing jstack through Runtime.exec() in production code? Any compatibility issues on various operating systems - Windows, Linux?
Any other approach to take thread dumps?
Thank you.
Edit -
A combined approach of 1 and 2 seems to be the way to go: have a dedicated thread running in the background, printing thread dumps to the log file in a format understood by thread dump analyzers.
If any extra information is needed (say, the native thread id) that is logged only in the jstack output, we capture it manually as required.
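A minimal sketch of that background-thread approach, assuming a fixed interval and writing to standard output (in practice you would append to a dedicated log file, ideally in a format your analyzer understands):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicThreadDumper {
    public static void start(long intervalSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "thread-dumper");
            t.setDaemon(true); // don't keep the JVM alive just for dumping
            return t;
        });
        scheduler.scheduleAtFixedRate(() -> {
            StringBuilder dump = new StringBuilder("=== Thread dump at " + Instant.now() + " ===\n");
            for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(true, true)) {
                dump.append(info); // ThreadInfo.toString() truncates very deep stack traces
            }
            System.out.println(dump); // in practice, append to a dedicated log file instead
        }, 0, intervalSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        start(60);                    // hypothetical interval
        Thread.sleep(5 * 60 * 1000L); // keep the demo alive for a few samples
    }
}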
You can use
jstack {pid} > stack-trace.log
running as the user on the box where the process is running.
If you run this multiple times you can use a diff to see which threads are active more easily.
For analysing the stack traces I use the following, sampled periodically in a dedicated thread.
Map<Thread, StackTraceElement[]> allStackTraces = Thread.getAllStackTraces();
Using this information you can obtain the thread's id, run state and compare the stack traces.
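As an example, a one-shot sample using that call might look like the following sketch; run it periodically and diff the output to see which threads change:

import java.util.Map;

public class StackSampler {
    public static void main(String[] args) {
        // One sample; in practice this would run periodically in a dedicated thread
        Map<Thread, StackTraceElement[]> allStackTraces = Thread.getAllStackTraces();
        for (Map.Entry<Thread, StackTraceElement[]> entry : allStackTraces.entrySet()) {
            Thread t = entry.getKey();
            System.out.printf("id=%d name=%s state=%s%n", t.getId(), t.getName(), t.getState());
            for (StackTraceElement frame : entry.getValue()) {
                System.out.println("    at " + frame);
            }
        }
    }
}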
With Java 8 in the picture, jcmd is the preferred approach.
jcmd <PID> Thread.print
The following is a snippet from the Oracle documentation:
The release of JDK 8 introduced Java Mission Control, Java Flight Recorder, and jcmd utility for diagnosing problems with JVM and Java applications. It is suggested to use the latest utility, jcmd instead of the previous jstack utility for enhanced diagnostics and reduced performance overhead.
However, shipping this with the application may have licensing implications, of which I am not sure.
If it's a *nix system I'd try kill -3 <PID>, but then you need to know the process id, and maybe you don't have access to the console?
I'd suggest you do all the heap analysis on a staging environment if there is such an env, then reflect your required Application Server tuning on production if any. If you need the dumps for analysis of your application's memory utilization, then perhaps you should consider profiling it for a better analysis.
Heap dumps are usually generated as a result of OutOfMemoryErrors caused by memory leaks and bad memory management (the HotSpot JVM will write one automatically if started with -XX:+HeapDumpOnOutOfMemoryError).
Check your Application Server's documentation; most modern servers have means for producing dumps at runtime aside from the cause I mentioned earlier, though the resulting dump might be vendor specific.
I'm using a thread pool that should be able to execute hundreds of concurrent tasks. However the tasks usually do very little computation and spend most of their time waiting on some server response. So if the thread pool size contains hundreds of threads just a few of them will be active while most of them will be waiting.
I know in general this is not a good practice for thread pools usage but the current design does not permit making my tasks asynchronous so that they can return the control without waiting for the server's response. So given this limitation I guess my biggest problem is increased memory consumption for the threads stack space.
So is there any way to use some kind of light-weight threads that does not consume much memory?
I know there's a JVM option -Xss to control the stack memory, but it seems there's no way to control this per thread pool or per thread, as opposed to changing it for all the threads inside the VM, right?
Also do you have any suggestions for a better solution to my problem?
I know in general this is not a good practice for thread pools usage
I disagree. I think this is a perfectly good practice. Are you seeing problems with this approach? Because otherwise, switching away from standard threads smacks of premature optimization to me.
So is there any way to use some kind of light-weight threads that does not consume much memory?
I think you are already there. Threads are relatively lightweight already and I see no reason to worry about hundreds of them unless you are working in a very constrained JVM.
Also do you have any suggestions for a better solution to my problem?
Any solution that I see would be a lot more complicated and would again be the definition of premature optimization. For example, you could use NIO and do your own scheduling of the thread when the server response was available but this is the sort of thing that you get for free with threads.
So is there any way to use some kind of light-weight threads that does not consume much memory?
Using plain threads in a thread pool is likely to be lightweight enough.
I know there's a JVM option -Xss to control the stack memory, but it seems there's no way to control this per thread pool or per thread, as opposed to changing it for all the threads inside the VM, right?
This is the maximum size per thread. It's the size at which you want to get a StackOverflowError rather than keep running. IMHO, there is little benefit in tuning this on a per-thread basis.
The thread stack uses main memory for the portion which is actually used and virtual memory for the rest. Virtual memory is cheap if you have a 64-bit JVM. If this is a concern I would switch to 64-bit.
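For completeness, the Thread constructor does accept a per-thread stack size hint (which the JVM is free to ignore on some platforms), so a pool could be given its own ThreadFactory. A rough sketch with an arbitrary 256 KB hint, though as noted above there is usually little benefit in doing so:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class SmallStackThreadFactory implements ThreadFactory {
    private static final long STACK_SIZE_BYTES = 256 * 1024; // hypothetical; a hint the JVM may ignore
    private final AtomicInteger count = new AtomicInteger();

    @Override
    public Thread newThread(Runnable r) {
        // The four-argument Thread constructor takes a per-thread stack size hint
        return new Thread(null, r, "worker-" + count.incrementAndGet(), STACK_SIZE_BYTES);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(200, new SmallStackThreadFactory());
        pool.submit(() -> System.out.println("running on " + Thread.currentThread().getName()));
        pool.shutdown();
    }
}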
Also do you have any suggestions for a better solution to my problem?
If you have thousands of threads you might consider using non-blocking IO, but it doesn't sound like you need to worry. In tests I have done, having 10,000 active threads that are otherwise not doing anything consumes about one CPU, so for every hundred threads you could be wasting roughly 1% of one CPU. This is unlikely to be a problem if you have spare CPU.
I sometimes have to look at thread dumps from a Tomcat server. However, this is a very slow process as my application uses thread pools with a couple of hundred threads. Most of the thread dumps I look at include the same stack trace for many of the threads as they are idle waiting for work.
Are there any tools which would parse a thread dump and only show me the unique stack traces along with a count of the number of threads in each state? This would allow me to quickly ignore the tens or hundreds of threads which are waiting in a common location for work.
I have tried the Thread Dump Analyzer but this doesn't do any summarisation of common stack traces.
I have written a tool to do what I wanted.
Java Thread Dump Analysis Tool
Haven't used it for a while but Samurai might be of interest.
Here's an online tool that does exactly what you ask for, no installation necessary:
http://spotify.github.io/threaddump-analyzer/