I'm having trouble with a Jetty 9 server application that seems to go into some kind of resting state after a longer period of idleness. Normally the memory usage of the Java process is ~500 MB, but after being idle for some time it drops to less than 50 MB. The first request that comes in takes up to several seconds to respond, whereas requests are normally on the scale of tens of milliseconds. But after one or two requests the application seems to be back to its normal responsive state.
I'm running on the 32-bit Oracle Java 8 JVM. My JVM configuration is very basic:
java -server -jar start.jar
I was hoping that this issue might be solvable through JVM configuration. Does anyone know if there's any particular parameter to disable this type of behavior?
edit: Based on the comment from Ivan, I was able to identify the source of the issue. Turns out Windows was swapping parts of the Java process out to disk. See my own answer below for a description of my solution.
As noted in the edit above, the source of the issue was Windows swapping parts of the Java process out to disk. This was clearly visible when comparing the private working set to the commit size in Task Manager.
My solution to this was two-fold. First, I made a simple scheduled job inside my server app that runs every minute and does a simple test run to make sure that the important services never go inactive for long periods. I'm hoping this should ensure that Windows doesn't regard the related pages as inactive.
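For reference, the job itself is nothing fancy; roughly like this (a minimal sketch, where pingImportantServices() is a hypothetical stand-in for my actual test run):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Exercise the important services once a minute so their pages stay
// "recently used" from the OS's point of view and don't get swapped out.
ScheduledExecutorService keepAlive = Executors.newSingleThreadScheduledExecutor();
keepAlive.scheduleAtFixedRate(() -> pingImportantServices(), 1, 1, TimeUnit.MINUTES);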
Afterwards, I also noticed that the process was executing with "Below normal" priority. So I changed the script that starts the server to ensure that it runs with "High" priority going forward. Process priority seems likely to affect swapping behavior and may well have been enough to resolve the issue on its own, but since I only found it after already deploying my first solution, that remains unclear. In any case, everything seems to be working as it should now.
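In case it helps anyone: the priority can be set right in the startup script; on Windows something along these lines should do it (assuming cmd.exe's start command; the quoted window title is just a placeholder):

start "jetty" /high java -server -jar start.jar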
Related
I have a number of Java processes using OpenJDK 11 running on Windows Server 2019. The server is an HP machine with two physical processors and 36 cores in total. When I start my processes, I see work allocated across all the cores in Task Manager. This is good. However, after the processes run for some period of time (not a consistent amount of time), the machine begins to utilize only half the cores.
I am working off a few theories:
The JDK has some problem that is preventing it from consistently accessing all the cores.
Something in Windows Server 2019 is causing a problem that prevents Java from accessing all the cores.
There is a thermal management problem and one processor is getting too hot and the OS is directing all the processing to the other processor.
There is some issue with hyper-threading and the 'logical' processors that is preventing the process from utilizing all the cores.
I've tried searching for JDK issues and haven't found anything like this mentioned. I went down to the server and while it's running a little warm, it didn't appear excessively hot. I have not yet tried disabling hyper-threading. I have tried a number of parameters to force the JVM to use all the cores and indeed the process initially does use all the cores; I can see the activity in Task Manager.
Anyone have any thoughts? This is a really baffling problem and I'd appreciate any ideas.
UPDATE: I am able to make it use the other processor by using Task Manager to assign one of the java.exe processes to the other processor. This also works from the java invocation on the command line, with an argument for which socket to use.
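For reference, the command-line variant looks roughly like this (assuming cmd.exe's start command, which takes a NUMA node number via its /NODE switch; myapp.jar is a placeholder):

start "worker" /NODE 1 java -jar myapp.jar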
Now that said, this feels like a hack. I don't see why I should have to manually assign a socket to each of my java processes; that job should be left to the OS. I'm still not sure exactly where the problem is, if it's the OS or what.
I have been having occasional problems with a server I wrote. It's in Clojure, but I don't think that matters, and we can pretend it's in Java. Anyway, it works fine for hours at a time, but goes into fits where it behaves very badly: all activity stops, for around fifteen seconds, and then it works normally for a few seconds, then stops for fifteen seconds...and so on for (usually) about ten minutes or so, after which it goes back to behaving normally.
I've done a lot of profiling of it with YourKit, and I've ruled out a number of plausible suspects:
It's not a garbage collection issue: I'm running it with -XX:+UseConcMarkSweepGC, and I've verified that the server continues to run just fine during both minor and major collections, due to the concurrent nature of this garbage collector (the GC-logging flags noted after this list are one way to see that). And we're not thrashing as we run out of total memory or something: the current heap size is well below its max.
I don't think it's a locking/synchronization issue, but I'm not 100% sure on that. The YourKit profiler shows threads waiting sometimes, e.g. competing over the lock on System.out to produce log messages, but the only long waits are for worker threads in thread pools when there's nothing to do. And of course YourKit says it has never detected any deadlocks.
It's not something caused by having the profiler attached, because it still happens even if I boot the server up and then leave it alone without ever attaching the profiler.
It's not some other process on the system taking up all the CPU time: top shows CPU usage at 100% for my java process, and basically 0% for everything else.
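For the GC point above, HotSpot's standard logging flags make the collector's pauses visible directly on stdout; something like this (server.jar stands in for the real launch command):

java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -jar server.jar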
My biggest problem is that I can't see what the server is doing during these strange funks, because the profiler stops receiving samples. Here's the CPU usage chart:
The left side of the graph is normal operation, during which we get profiler samples every second or so. The right side is "broken", and is very spiky because the profiler is only getting samples every ten seconds or so. In the samples it does get, the server seems to be doing its usual business: responding to requests and so on; and the logs confirm that it is doing normal stuff, but only at the times the profiler has samples for: during the upward-sloping "straight lines" on the graph, for which the profiler has no samples, the server is doing nothing at all.
So, does this graph look familiar to anyone? Have you had this problem before and fixed it? Or can you point me in the direction of a tool that can figure out what my server is doing during the time when YourKit can't? In case it matters, the server machine is running Ubuntu 10.04, and
$ java -version
java version "1.6.0_22"
OpenJDK Runtime Environment (IcedTea6 1.10.10) (rhel-1.28.1.10.10.el5_8-x86_64)
OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)
Okay, from the comments it seems clear to me we are not going to be able to figure this out with the information you've given so far. The best we can do is to give suggestions on how to debug it...
I would try to use jstack during one of the spikes and see if you can use that to figure out where it hangs.
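The invocation is just this, with the pid of the Java process; capturing a few dumps during a spike and diffing them is usually the quickest way to spot a stuck thread:

jstack <java-pid>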
If you have no way to measure or debug in code, try to look at it from the outside.
First, try to reproduce the problem. In other words, find out whether there is an external event that produces the behavior. Try changing the load on the server. Change everything you can to reproduce the problem.
It may also be a good idea to sniff the network traffic (tcpdump) to look for something interesting around the time your server hangs.
You can also run it on another operating system to check whether it depends on your installation environment.
If you can't reproduce a situation where the problem occurs, try to find situations where you don't get the problem. For instance, remove the server from the network, or shut down all other services.
If none of that changes the behavior of your program, try reducing the complexity of your working code and see if you can find an internal module that seems to be related to the problem.
Have you had this problem before and fixed it? Or can you point me in the direction of a tool that can figure out what my server is doing during the time when YourKit can't?
If you have shell access on the server and can see stdout, try taking a thread dump when the server becomes unresponsive. I'm not sure whether this will give you anything different from what jstack (mentioned in the other answer) would give you.
On Ubuntu: kill -QUIT <java-pid> (will not actually kill the Java process).
http://www.crazysquirrel.com/computing/java/basics/java-thread-dump.jspx
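If you'd rather capture the same information from inside the process (say, from your own stall detector), the standard ThreadMXBean API can produce a full dump too; a minimal sketch:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class StackDumper {
    // Prints a stack trace for every live thread, including monitor and
    // synchronizer info, which is what you want for spotting contention.
    public static void dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            System.err.print(info);
        }
    }
}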
I'm seeing strange behavior (file missing, file outdated) in a Java program of mine that has to save some information at shutdown (using shutdown hooks), which in turn rely on the TERM signal.
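For context, the saving is wired up roughly like this (a minimal sketch; saveState() is a hypothetical stand-in for the real persistence code):

Runtime.getRuntime().addShutdownHook(new Thread() {
    @Override
    public void run() {
        // Runs when the JVM shuts down in an orderly way, e.g. on SIGTERM.
        // If init follows up with SIGKILL before this finishes, data is lost.
        saveState(); // hypothetical stand-in for the real persistence code
    }
});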
The obvious workaround is to save as soon as that info is modified, but for performance reasons I'd like to avoid this.
The thing is, it seems to me that the tolerance value is set ridiculously short, and init (I think that's the name of the watchdog process) is actually killing the JVM before it can terminate. I don't think it's a bug in my app, because I used a test case that waited at least 20 seconds and was still terminated almost instantly.
You can see this behavior in shutdown and logout, and also in NetBeans and its open tabs (it won't save them, at least recent 7.1 on Java 7).
Is this something I can't avoid and have to work around?
The documentation for telinit(8) says that the init process waits 5 seconds between sending the SIGTERM and SIGKILL signals. This delay can be changed through the -t option.
The same -t option is supported by shutdown(8) and relayed to telinit. Therefore, if you want to increase the delay globally on your system, you'll have to edit either your /etc/inittab configuration file or the helper files in /etc/init.d, depending on your distribution.
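For a one-off test, the delay can also be passed straight to shutdown; something like this should give your hooks half a minute (assuming a classic sysvinit system):

shutdown -t 30 -r now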
I am having trouble determining what is wrong with my software.
The situation is:
- The program is always running in the background and performs some actions every X minutes.
- Right now it is set to check a certain directory every minute and see if there are new files in it.
- If there are new files, they are processed and moved somewhere else.
- If not, it simply logs the event and goes idle again.
I assume that when new files appear, CPU usage can be somewhat high.
The problem is that even if I don't put new files in the directory for many days, the CPU usage will rise to ~90% every minute when it checks for new entries, then after a few seconds return to <1% usage.
The same process under Windows seems fairly stable, always staying at low CPU usage.
If I monitor the CPU activity over a month, I can see that the average CPU usage for my Java process keeps growing (without putting in new files to 'activate' the rest of the process), and I have to restart the process for it to return to lower CPU usage levels.
I really don't understand this behaviour, so I don't really know what may be affecting it.
If the log file is somewhat 'big', like 10-20 MB, would it require that much CPU to log a new entry every minute?
If there are many libraries loaded in the classpath for this process, will the CPU usage be increased even though most of these libraries won't be used most of the time?
Excuse me if I haven't been very clear in my question; I am somewhat new to this.
Thanks everyone in advance, regards.
--edit--
I've noted your advice; I will do some monitoring and then post some code/results to share with you, and we'll see what you can come up with!
I am really lost right now!
If your custom monitoring code is causing a problem, you could always use something standard like Apache Commons IO's FileAlterationMonitor. It's simple to set up and it might be faster than fixing your current code.
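A minimal sketch of what that looks like (assuming commons-io 2.x on the classpath; the directory path and the processing step are placeholders):

import java.io.File;
import org.apache.commons.io.monitor.FileAlterationListenerAdaptor;
import org.apache.commons.io.monitor.FileAlterationMonitor;
import org.apache.commons.io.monitor.FileAlterationObserver;

public class DirWatcher {
    public static void main(String[] args) throws Exception {
        FileAlterationObserver observer = new FileAlterationObserver(new File("/path/to/watched/dir"));
        observer.addListener(new FileAlterationListenerAdaptor() {
            @Override
            public void onFileCreate(File file) {
                // Process the new file and move it somewhere else.
            }
        });
        // Poll once a minute, matching the original schedule.
        FileAlterationMonitor monitor = new FileAlterationMonitor(60 * 1000, observer);
        monitor.start();
    }
}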
Are you talking about a simple console application or a Swing/AWT app?
Is the application run every minute by the OS's scheduler, or is it a single long-running server process?
If the process runs as a server, how do you launch the VM? (server VM or client VM - the -server switch on the command line)
You may also want to check your garbage collector; sometimes logging frameworks use up too many objects without releasing their references.
Regards
M.
[UPDATE: I forgot to add that this 30 sec. freezing problem only happens the first time I try to load a file from the server. Subsequent loads are very quick. Maybe some strange reverse DNS lookup? I am hosting on Google App Engine.]
I recently started a little project called http://www.chartle.net which is built around an applet.
Startup time is an important factor in the user's experience of an applet. I collect statistics and am shocked to often find very long startup times (a factor of 50 to 100 higher than necessary).
The applet starts in 1-3 seconds depending on the speed of your computer and connection. Still, for some users it takes up to 100 sec.
I have mixed results from my own tests. Mostly it is very fast, but sometimes it freezes the browser for a long time and the Java console doesn't tell me why. My best guess is that it stalls when loading a saved chart.
Please help me figure this out - best tested by opening an already saved chart (click on one of the 'create' links at http://www.chartle.net/gallery).
Cheers,
Dieter
This is generic help rather than specific to your demo (which loaded fine for me in a few attempts).
Freezing applets
In the JDK bin directory there is a very handy programme called jstack. Refresh your browser window until it crashes and then run:
jstack <process_id>
This will give you the stack trace of any frozen Java process. If Java is not a separate process then you can use the browser's process (e.g. for Opera).
The following few problems were/are common for me:
I recommend you use invokeLater rather than invokeAndWait in the init method (although you can't do this if you use the start/stop methods); see the sketch after this list
Opera's custom java plugin acts very poorly...
Deadlocks caused by synchronized blocks and invokeAndWait calls
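A minimal sketch of the invokeLater variant in an applet's init method (my reading of the advice above, not chartle's actual code):

import javax.swing.JApplet;
import javax.swing.JLabel;
import javax.swing.SwingUtilities;

public class MyApplet extends JApplet {
    @Override
    public void init() {
        // invokeLater returns immediately, so init() never sits blocked
        // waiting for the event dispatch thread the way invokeAndWait can.
        SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                add(new JLabel("UI built on the EDT"));
            }
        });
    }
}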
Slow applets
Possibly the browser is fetching resources from the server, unable to use the jar file?
It may be that only the old plugin causes these problems. That means basically all people running on OSX and other users with Java prior to 1.6_update_10.
So, I would really appreciate it if people with such setups would watch their Java console and describe the first startup behaviour.
Cheers,
Dieter