Troubleshooting Java process with very high CPU usage - Tomcat application

I have a java application that runs on Tomcat (which runs as a service on Windows), the java process for which continues to eat up CPU before eventually requiring me to restart the Tomcat service.
First my setup:
Windows 2003 server
Tomcat 6, running as service using Wrapper
JDK: 1.6.0_20
I was seeing catch issues here and there leading up to yesterday. I had to restart midday yesterday, then again at 2:30 this morning, and today I could barely restart the application and open jconsole to monitor it before it hit 99% CPU usage again. Through a combination of things I'm not quite sure of, I seem to have gotten the JVM to cycle itself, and the app hovered in the 10-30% CPU usage range for a couple of hours. Then it started to creep up again, finally going back into its 99% CPU usage breakdown. I was also having trouble with high memory usage, but that has stayed fairly normal and steady since I got the JVM to "cycle" (bad terminology perhaps, but this is really what it seemed to do, and the wrapper log showed a dump of all the classes it was reloading afterward).
Then I was digging around some more and found a JRE 6 Update 24 installed on the server (I didn't install it, as I do thorough testing with each Java update, but maybe my server admin did). I tried to uninstall it but can't. As a result, java -version and javac -version report different versions:
java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) Client VM (build 19.1-b02, mixed mode, sharing)
javac -version
javac 1.6.0_20
Could this difference be causing a JVM conflict of sorts? JAVA_HOME and my PATH variables both point to the correct JDK installation.
Hoping for more stability, I decided to change my app to run on the previous JDK that was still installed - JDK 1.6.0_04. I changed the wrapper.conf, set env variables, cleaned and rebuilt, and started. This does seem more stable and has been up for about 4 hours. The CPU usage has climbed to the 90s, then it seems to clear itself out again.
I've taken heap dumps and run them through the Memory Analyzer in Eclipse (nothing new found there), and I've used jconsole with jtop to look at threads. Nothing jumps out, which is why I keep wondering whether it's a Java/JVM issue. So, I know this is a long post, but I don't really know where to go from here. Any ideas?
(I've done exhaustive web searching on this and some articles have pointed to possibly a Quartz issue or Hibernate queries not flushing. Nothing has changed in the app since I started seeing the CPU issues, so I'm not sure where to start troubleshooting if it could indeed be linked to either.)

This isn't an easy problem. You are doing all of the basics to see if something jumps out. It sounds like there is either a slow leak that builds up over time to the point where the app can't operate, in which case GC is thrashing and the app becomes unresponsive, or one or more runaway background jobs that eat CPU and never complete, which might explain the long delay before the problem shows up. You could try turning off any Quartz jobs to see whether the app stays up longer, which might point you in a direction, or crank up their frequency so the problem shows up sooner.
I know you've done some jconsole watching, but I think you need to revisit it and watch your memory usage, the threads' run time, how much time you're spending in GC, and which portions of memory are being eaten up (is it Eden or Tenured that's running out?).
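If you'd rather log these numbers than watch them in jconsole, something along these lines might do. It is only a sketch: the class name GcSnapshot is made up, it assumes Java 8 or later, and it reports on the JVM it runs in, so it would have to be called from inside the webapp (for example from a scheduled task) to say anything about Tomcat. Pool and collector names vary with the JVM and the GC settings.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not from the original post): reports per-pool memory
// usage and cumulative GC counts/time for the JVM it runs in.
final class GcSnapshot {

    static String describe() {
        StringBuilder out = new StringBuilder();
        // One entry per memory pool. The names depend on the collector in use,
        // for example an Eden pool and a tenured/old generation pool.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage();
            if (u == null) {
                continue; // the pool is no longer valid
            }
            out.append(String.format("pool %-30s used=%d KB max=%d KB%n",
                    pool.getName(),
                    u.getUsed() / 1024,
                    u.getMax() < 0 ? -1 : u.getMax() / 1024)); // -1 means "undefined"
        }
        // Collection count and accumulated collection time since JVM start.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.append(String.format("gc   %-30s count=%d time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        // Uptime lets you turn the GC time into a rough percentage.
        out.append(String.format("uptime=%d ms%n",
                ManagementFactory.getRuntimeMXBean().getUptime()));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Logging this every minute or so and comparing the GC time against the uptime should show whether the CPU is going into garbage collection or into application threads.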
I'd make sure you are writing out start and end messages for your background jobs running in Quartz. Then you can correlate when they start and finish with when this problem starts. It will also tell you whether your jobs are finishing or not.
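As a rough illustration of the start/end messages, here is a sketch that wraps a job body in a Runnable. TimedJob is a made-up name and not part of Quartz; in a real Quartz job you would put the same two log calls around the body of your execute method. The List stands in for whatever logger you use, and the code assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper, not part of Quartz: records when a job starts, when
// it ends, and how long it took, even if the job throws.
final class TimedJob implements Runnable {
    private final String name;
    private final Runnable body;
    private final List<String> log; // stand-in for a real logger

    TimedJob(String name, Runnable body, List<String> log) {
        this.name = name;
        this.body = body;
        this.log = log;
    }

    @Override
    public void run() {
        long start = System.nanoTime();
        log.add("START " + name);
        try {
            body.run();
            log.add("END " + name + " ok after " + elapsedMillis(start) + " ms");
        } catch (RuntimeException e) {
            // Log the failure, then let the scheduler see the exception too.
            log.add("END " + name + " FAILED after " + elapsedMillis(start) + " ms: " + e);
            throw e;
        }
    }

    private static long elapsedMillis(long startNanos) {
        return (System.nanoTime() - startNanos) / 1000000L;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        new TimedJob("nightly-report", () -> { /* job work goes here */ }, log).run();
        for (String line : log) {
            System.out.println(line);
        }
    }
}
```

A START line without a matching END line around the time the CPU climbs would point at that job.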
It's probably time to drop it into a profiler (instead of jconsole) so you can see where in the code it's spending time or what's blowing up memory. A real profiler will let you see all that data mapped onto your code and classes. My favorite is JProfiler, but YourKit is also good. You can get a 7-30 day trial, so you'll have plenty of time to profile and figure your issue out without having to buy it.
Start this early in the morning so you'll hopefully see something by early night.

Related

How to enable java HotSpot VM compiler

I am using java 1.8.0_05, Java HotSpot(TM) 64-Bit Server VM
I am running a java web app on tomcat 8.0.43
I recently deployed my .war file by dropping it in the webapps folder.
This resulted in the following message being logged:
Java HotSpot(TM) 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.
Java HotSpot(TM) 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=
CodeCache: size=245760Kb used=244058Kb max_used=244079Kb free=1701Kb
 bounds [...]
 total_blobs=48344 nmethods=47669 adapters=584
 compilation: disabled (not enough contiguous free space left)
How can I check what the current status of the compiler is now, to see if it's still disabled?
How can I enable the compiler? Can I simply restart tomcat?
There doesn't seem to be any noticeable difference in how my application is running (eg: in terms of speed).
Interestingly, I didn't get this message when deploying the same application to an identical server. This is why I would like to first just turn the compiler back on rather than changing settings (eg: ReservedCodeCacheSize) as the message recommends.
Then, if the problem persists I can see which settings I need to change.
Addressing your individual questions + 1 recommendation:
How to check if the JIT compiler is still disabled?
The easiest thing to do is to start jvisualvm (it ships with the JDK), then check the used code cache space. If your CodeCache is full, the JIT compiler will remain disabled. To check the Code Cache memory pool:
install the MBeans plugin for jvisualvm.
go to MBeans
open java.lang/MemoryPool/Code Cache
check the attribute "Usage" (double-click)
This will give you an overview of where you are.
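The same attribute can also be read from code through the standard java.lang.management API. The sketch below is an illustration only: CodeCacheUsage is a made-up class name, and matching pools whose name starts with "Code" is my assumption about how HotSpot names them ("Code Cache" on Java 8, several "CodeHeap ..." pools on newer versions). It inspects the JVM it runs in, so it would need to run inside the Tomcat process to tell you anything about your webapp.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports usage of the JIT code cache pool(s) of the
// current JVM, the same numbers the MBeans tab shows under "Usage".
final class CodeCacheUsage {

    // Assumption: HotSpot code cache pools have names starting with "Code".
    static boolean isCodeCachePool(String poolName) {
        return poolName.startsWith("Code");
    }

    static List<String> report() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!isCodeCachePool(pool.getName())) {
                continue;
            }
            MemoryUsage u = pool.getUsage();
            if (u == null) {
                continue; // the pool is no longer valid
            }
            lines.add(pool.getName()
                    + ": used=" + (u.getUsed() / 1024) + "Kb"
                    + " committed=" + (u.getCommitted() / 1024) + "Kb"
                    + " max=" + (u.getMax() / 1024) + "Kb");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : report()) {
            System.out.println(line);
        }
    }
}
```

If "used" sits right at "max", as in the warning from the question, the cache is still full.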
How can I enable the compiler? Can I simply restart tomcat?
Yes, a restart will certainly reset the state of the cache. The only other way to get your compiler running again would be if you had already started the JVM with the right parameters (enabling UseCodeCacheFlushing).
No difference in how my application is running?
JIT optimizes your code, but depending on your application and the way you use it, you might not see any noticeable difference. Assuming you run a webapp (because of Tomcat), the network transmission or your browser rendering pages is likely orders of magnitude slower than anything JIT gains you in terms of core Java speed.
"I didn't get this message when deploying the same application"
JIT compiling is dependent on the code that is being executed at that moment. The same application might run quite differently under the hood on the level where JIT works. When it comes to low-level functions, the more 3rd party libraries you use, the less you can be sure about what is happening on all those threads you have no control over.
the suggestion:
Please upgrade that Java version. It is very rare, and quite dangerous, to be on such an early JDK 8 build (u05). Java 8 was not the most stable release when it came out, and had easily reproducible bugs even in later updates. There have been over 1000 bugs fixed in JDK 8, many of them directly addressing JIT issues. If you have any control over the environment you are talking about, upgrade it. If you do not, notify the responsible person.
I had this issue a while ago and this is what I can tell you:
Once the Code Cache becomes full the compiler is automatically disabled.
Will it be automatically restarted?
No. And it will stay down until the JVM is restarted.
Can I simply restart tomcat?
Yes. But it will probably happen again.
There doesn't seem to be any noticeable difference in how my application is running (eg: in terms of speed).
In the long run there will be some issues since code that could be cached and optimized can no longer be compiled and stored there.
What can you do?
You could increase -XX:ReservedCodeCacheSize a bit.
You could enable -XX:+UseCodeCacheFlushing. The drawback is that if your CodeCache size is way too low, and you constantly hit the flushing threshold, the performance will be affected since you are spending CPU resources in the flushing process.
I would increase the code cache size a bit, enable the flushing, and monitor the app with VisualVM or something else that lets you look at the current state of the CodeCache. Monitoring will help you understand whether you are reaching the thresholds once in a while or whether it happens a lot.
Remember that CodeCache is separated from the Heap, so looking at HeapSize won't help you.
Edit:
Regarding VisualVM, here are the official steps to connect to a remote JVM:
https://docs.oracle.com/javase/8/docs/technotes/guides/visualvm/applications_remote.html
Just make sure JMX is enabled and it should work right away.
Regarding the issue with many apps running at the same time... Well yeah, technically Standard Tomcat starts one JVM for all the apps. Cache Space will be shared.
You could also monitor this case by Attaching VisualVM to the JVM, undeploying an app and checking if the space has been freed.
You could also consider using an Enterprise container which will let you create one JVM per App.

How can I debug a non-responsive server, when the profiler can't collect samples?

I have been having occasional problems with a server I wrote. It's in Clojure, but I don't think that matters, and we can pretend it's in Java. Anyway, it works fine for hours at a time, but goes into fits where it behaves very badly: all activity stops, for around fifteen seconds, and then it works normally for a few seconds, then stops for fifteen seconds...and so on for (usually) about ten minutes or so, after which it goes back to behaving normally.
I've done a lot of profiling of it with YourKit, and I've ruled out a number of plausible suspects:
It's not a garbage collection issue: I'm running it with -XX:+UseConcMarkSweepGC, and I've verified that the server continues to run just fine during both minor and major collections, due to the concurrent nature of this garbage collector. And we're not thrashing as we run out of total memory or something: the current heap size is well below its max.
I don't think it's a locking/synchronization issue, but I'm not 100% sure on that. The YourKit profiler shows threads waiting sometimes, eg competing over the lock for System.out to produce log messages, but the only long waits are for worker threads in threadpools when there's nothing to do. And of course YourKit says it's never detected any deadlocks.
It's not something caused by having the profiler attached, because it still happens even if I boot the server up and then leave it alone without ever attaching the profiler.
It's not some other process on the system taking up all the CPU time: top shows CPU usage at 100% for my java process, and basically 0% for everything else.
My biggest problem is that I can't see what the server is doing during these strange funks, because the profiler stops receiving samples. Here's a graph of the CPU usage chart:
The left side of the graph is normal operation, during which we get profiler samples every second or so. The right side is "broken", and is very spiky because the profiler is only getting samples every ten seconds or so. In the samples it does get, the server seems to be doing its usual business: responding to requests and so on; and the logs confirm that it is doing normal stuff, but only at the times the profiler has samples for: during the upward-sloping "straight lines" on the graph, for which the profiler has no samples, the server is doing nothing at all.
So, does this graph look familiar to anyone? Have you had this problem before and fixed it? Or can you point me in the direction of a tool that can figure out what my server is doing during the time when YourKit can't? In case it matters, the server machine is running Ubuntu 10.04, and
$ java -version
java version "1.6.0_22"
OpenJDK Runtime Environment (IcedTea6 1.10.10) (rhel-1.28.1.10.10.el5_8-x86_64)
OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)
Okay, from the comments it seems clear to me we are not going to be able to figure this out with the information you've given so far. The best we can do is to give suggestions on how to debug it...
I would try to use jstack during one of the spikes and see if you can use that to figure out where it hangs.
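If you can get code into the server, a rough in-process complement to jstack is to ask the JVM which threads have burned the most CPU and what they are doing right now. This is only a sketch: BusyThreads and Row are names I made up, it assumes Java 8 or later, and per-thread CPU timing is not available on every JVM (the code reports -1 in that case).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the live threads of the current JVM, the one
// with the most accumulated CPU time first, with each thread's top frame.
final class BusyThreads {

    static final class Row {
        final long cpuMillis; // -1 when CPU timing is unavailable
        final String text;

        Row(long cpuMillis, String text) {
            this.cpuMillis = cpuMillis;
            this.text = text;
        }
    }

    static List<Row> snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        boolean cpuTiming = mx.isThreadCpuTimeSupported() && mx.isThreadCpuTimeEnabled();
        List<Row> rows = new ArrayList<>();
        // Ask for at most 5 stack frames per thread to keep the call cheap.
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds(), 5)) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            long nanos = cpuTiming ? mx.getThreadCpuTime(info.getThreadId()) : -1;
            long millis = nanos < 0 ? -1 : nanos / 1000000L;
            StackTraceElement[] stack = info.getStackTrace();
            String top = stack.length > 0 ? stack[0].toString() : "(no Java frames)";
            rows.add(new Row(millis, String.format("%8d ms  %-13s %s at %s",
                    millis, info.getThreadState(), info.getThreadName(), top)));
        }
        // Highest accumulated CPU time first.
        rows.sort((a, b) -> Long.compare(b.cpuMillis, a.cpuMillis));
        return rows;
    }

    public static void main(String[] args) {
        for (Row row : snapshot()) {
            System.out.println(row.text);
        }
    }
}
```

Taking two snapshots a few seconds apart during a spike and comparing the CPU times shows which threads were actually running in between.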
If you have no chance to measure or debug in code, try to look from the outside.
I would first try to reproduce the problem. In other words, is there an external event that produces the behavior? Try to change the load on the server. Switch everything you can to reproduce the problem.
Maybe it's also a good idea to sniff the network traffic (tcpdump) to find something interesting around the time when your server hangs.
You can also run it on another operating system to check whether it depends on your installation environment.
If you can't reproduce a situation where the problem occurs, try to find situations where you don't get the problem. For instance, remove the server from the network. Shut down all other services.
If none of that changes the behavior of your program, try to reduce the complexity of your working code and see if you can find an internal module that seems to be related to the problem.
Have you had this problem before and fixed it? Or can you point me in the direction of a tool that can figure out what my server is doing during the time when YourKit can't?
If you have shell access on the server and can see stdout, try taking a thread dump when the server becomes unresponsive. Not sure if this will give you anything different than what jstack (mentioned in the other answer) would give you or not.
On Ubuntu: kill -QUIT <java-pid> (will not actually kill the Java process).
http://www.crazysquirrel.com/computing/java/basics/java-thread-dump.jspx
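If you can't see stdout, a similar dump can be produced from inside the process and written wherever your logs go. The sketch below uses the standard ThreadMXBean API (available since Java 6); ThreadDumper is a made-up name, and note that ThreadInfo's string form only prints the first few frames of each stack, so it is less complete than what kill -QUIT gives you.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: builds a thread dump of the current JVM as a String,
// preceded by the number of deadlocked threads the JVM can detect.
final class ThreadDumper {

    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean synchronizers = mx.isSynchronizerUsageSupported();

        // Both calls return null when no deadlock is found.
        long[] deadlocked = synchronizers
                ? mx.findDeadlockedThreads()
                : mx.findMonitorDeadlockedThreads();

        StringBuilder out = new StringBuilder();
        out.append("deadlocked threads: ")
           .append(deadlocked == null ? 0 : deadlocked.length)
           .append('\n');
        // Include lock information only where the JVM supports it.
        for (ThreadInfo info : mx.dumpAllThreads(monitors, synchronizers)) {
            out.append(info); // name, state, lock owner and the first frames
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Calling dump() from a small watchdog thread every few seconds and logging the result with a timestamp might show what the server was doing just before and after one of the gaps.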

Tomcat dies suddenly

Trying to diagnose some bizarre Tomcat (7.0.21) and/or JVM errors on a 64-bit linux (CentOS) machine.
I'm load testing our server application and tried hitting it with 100K messages. Launched jvisualvm and kept my eye on the heap the whole time. Everything was looking great* (see below) until I got to about 93K processed messages and then Tomcat just died. Ran a ps on Tomcat's PID number to confirm it was dead.
Up until this crash:
Load test had been running for about 90 minutes (should have finished shortly thereafter since we were at 93K/100K)
CPU was holding strong around 45%
Used heap was around 2GB (plus or minus a bunch after GCs) but heap size grew from 4GB to MAX_HEAP after about 30 minutes
Class loading/unloading was cycling normally
Thread dumps were normal
Nowhere in the server code are any calls to System.exit() - so we can rule that right out (and yes I've double-checked!!!).
I'm not sure if this is Tomcat crashing or the JVM (how do I tell?). And even if I did know, I can't seem to find any indication of what went wrong:
All of the server app's logs just stop without any ERROR messages (even though we have logging universally set to DEBUG and higher)
Tomcat's catalina.out and respective localhost_access_* files just stop without any info
I've heard it is possible to have Tomcat log a coredump when it dies, but I'm not sure how to do that and online examples aren't helping much.
How would SO go about diagnosing this? What steps should I take to start ruling out all of the possible factors?
Thanks in advance!
If the JVM crashes, you should have a hs_err_pidNNN.log file; you don't have to do anything to enable this. Its location depends on your OS and how you are running Tomcat. On Windows, they can show up on your desktop, unless you are running as a service. Otherwise, they should be in the current working directory of the crashed process.
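To check a directory for such files without eyeballing it, a few lines of Java will do. This is a sketch with made-up names (HsErrFinder), assuming Java 8 or later and the default file name pattern hs_err_pid followed by the process id and ".log"; pass it the working directory Tomcat was started from.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: looks for HotSpot crash logs (hs_err_pid<pid>.log)
// directly inside a directory. It does not search subdirectories.
final class HsErrFinder {

    static boolean isCrashLog(String fileName) {
        return fileName.matches("hs_err_pid\\d+\\.log");
    }

    static List<File> find(File dir) {
        List<File> found = new ArrayList<>();
        File[] children = dir.listFiles(); // null if dir is not a readable directory
        if (children == null) {
            return found;
        }
        for (File child : children) {
            if (child.isFile() && isCrashLog(child.getName())) {
                found.add(child);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Default to the current working directory when no argument is given.
        File dir = new File(args.length > 0 ? args[0] : System.getProperty("user.dir"));
        for (File log : find(dir)) {
            System.out.println(log.getAbsolutePath());
        }
    }
}
```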
Your operating system probably provides additional tools for process monitoring; you could describe your environment more, or perhaps ask at serverfault.com.
It's also possible that jvisualvm is actually causing the crash.
I'd try reproducing the problem, and progressively simplify the scenario to help isolate the cause.
Another possibility is that the OS is running out of memory and the OOM Killer is killing your process. In this case, the JVM wouldn't get an opportunity to write a heap dump, or an hs_err_pid file.
You can start the JVM with the option -XX:+HeapDumpOnOutOfMemoryError to have it write a heap dump when an OutOfMemoryError is thrown.
More details here: Using HeapDumpOnOutOfMemoryError parameter for heap dump for JBoss.
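To confirm the flag actually reached the JVM (it is easy to set it in a script that Tomcat doesn't use), you can ask the running JVM for its value. This sketch relies on com.sun.management.HotSpotDiagnosticMXBean, which is HotSpot-specific, so treat it as an assumption that your JVM provides it; HeapDumpFlagCheck is a made-up name and the code needs Java 7 or later.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reads the current value of a HotSpot VM option from
// the JVM this code runs in. Assumes a HotSpot-based JVM.
final class HeapDumpFlagCheck {

    static String flagValue(String flagName) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Throws IllegalArgumentException if the JVM has no such option.
        return hotspot.getVMOption(flagName).getValue();
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError = "
                + flagValue("HeapDumpOnOutOfMemoryError"));
        // An empty HeapDumpPath means the dump goes to the working directory.
        System.out.println("HeapDumpPath = '" + flagValue("HeapDumpPath") + "'");
    }
}
```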
Sorry I had to remove the green check from @erickson. I finally figured out what was killing Tomcat.
It looks like a profiler plugin is not configured correctly with VisualVM and attempting to run a profile on the Tomcat process killed it.
Investigating why right now, and will update this answer once I know more.

Netbeans Java Debugger crashing with Out of Memory Errors

Recently, while working on a JSF web app, using Netbeans 6.8, I am constantly getting PermGen: Out Of Memory Errors. I have also noticed that this is not related to hot swapping the code, as some people suggested on the forums; I generally restart my local web server, Tomcat 6.0, whenever I redeploy the code. This used to happen to me once in a while, but as of late, it has been occurring constantly. I usually can't go more than two minutes before it crashes.
The important observation I've made about this problem, is that it only seems to happen when running the debugger. If I launch the server, regularly, it will run indefinitely. As soon as I run in debug mode, this problem occurs.
I've tried all the tips I've found so far for increasing the JAVA_OPTS memory settings for Java in Tomcat; I've tried increasing the available memory for Netbeans in netbeans.conf. Still no luck. If you want to see the specific configuration changes I've made, I can post those as well.
I've also read that this can be a result of memory leaks in Java. I've tried running Netbeans' profiler, but it would generally crash as well before I could do anything really useful. Additionally, when it did run, all the object allocations with ridiculous generations were things in Java libraries, or primitives -- char[]s were the biggest memory hog of the app, for example, with the largest generations.
I would really like to know if anyone has had a similar problem before, and if so, how they solved it. This is starting to seriously impede my ability to do my work.
Thanks for any help.
Add this entry to catalina.sh (or catalina.bat); it worked for me:
JAVA_OPTS="-Djava.awt.headless=true -Dfile.encoding=UTF-8
-server -Xms1536m -Xmx1536m
-XX:NewSize=256m -XX:MaxNewSize=512m -XX:PermSize=512m
-XX:MaxPermSize=512m -XX:+DisableExplicitGC"
Something I have found useful to track down memory leaks without running a profiler or a debugger is the "jmap -histo <pid>" command (it comes with the JDK). Save the output of this program to a file. Run this every few minutes while your application is running. Collect the outputs and look for objects that are always increasing in number and size. I even wrote a quick app to graph selected objects over time to really highlight runaway objects, just to make it easier to see where leaks might be occurring.
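The comparison step can be sketched in a few lines. This is not the app mentioned above, just an illustration with made-up names (HistoDiff), assuming Java 8 or later and the usual jmap -histo row layout of rank, instance count, byte count and class name. The two histograms in main are invented sample data, not real output.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: compares two saved "jmap -histo" outputs and reports
// the classes whose instance count grew between them.
final class HistoDiff {

    // Assumed row layout: "  12:   3456   789012  com.example.Foo"
    private static final Pattern ROW =
            Pattern.compile("^\\s*\\d+:\\s+(\\d+)\\s+(\\d+)\\s+(\\S+)");

    // Parses histogram text into class name -> instance count.
    static Map<String, Long> parse(String histo) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : histo.split("\\r?\\n")) {
            Matcher m = ROW.matcher(line);
            if (m.find()) { // header, separator and total lines do not match
                counts.put(m.group(3), Long.parseLong(m.group(1)));
            }
        }
        return counts;
    }

    // Returns class name -> increase in instance count, for growing classes only.
    static Map<String, Long> growth(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> grown = new TreeMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long delta = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            if (delta > 0) {
                grown.put(e.getKey(), delta);
            }
        }
        return grown;
    }

    public static void main(String[] args) {
        // Invented sample data for illustration only.
        String first = "   1:   100   4000  [C\n   2:    50   1200  java.lang.String\n";
        String second = "   1:   100   4000  [C\n   2:    80   1920  java.lang.String\n";
        System.out.println(growth(parse(first), parse(second)));
    }
}
```

A class that shows up as growing in every comparison over a long run is a good candidate for a closer look in a heap dump.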

How to gather profiling information for a Java 1.4 application?

A Java application I support that runs on JRE 1.4.2_12 is hanging near midnight every night. I'd like to try and record as much profiling information as I can to discover if there is an issue in the JVM or external to the app.
I'd like to use HPROF to collect as much information as possible.
Is there a way to have HPROF dump its cpu sample and memory allocation report every minute instead of at the termination of the JVM?
Is there a different, more appropriate profiler that can collect information like this?
Rather than relying on dump files, I would try hooking up a profiler to the VM and leave it attached until the hang up occurs. Then use the profiler to introspect the state of the threads.
The use of Java 1.4 is a minor issue here, since 1.4's debug interface is not great, but some profilers still support it. I can particularly recommend YourKit, which is commercial, but offers an evaluation licence. It's the best profiler I've used, by some margin.
First things first: did you analyze the thread dump when your application hangs? A lot of the time that has enough information to troubleshoot a hanging java app...
Ctrl-Break in the process window on Windows, or kill -QUIT [pid] on Linux.
I would first try to determine if it's actually your app or something else.
Are there any other apps on the box? If so, do they run any batch jobs around midnight? Your app could be suffering from a lack of resources because other things running on the box are using them up or chewing up bandwidth.
Was this always the case, or did it start recently? If this is new, look at what changed on the box as a whole, not just in your own app.
