I can set the value of the parameter forkCount to any desired number, say 12, and I'd expect to have 12 new Java processes of type surefirebooter when running tests like these. But ps shows that I only sometimes get the 12 expected Java processes (to be precise: I get them extremely rarely). Instead I typically get fewer, sometimes even only three or four. Execution of my hundreds of unit tests also appears to be slow then.
The running processes also often disappear from the ps output (terminate, I assume) before the unit tests are done. In some cases all of them disappear, and then the execution hangs indefinitely.
Documentation wasn't too clear about this, but I'd expect to have the given number of processes all the time until all unit tests are done.
Maybe the surefirebooter processes run into some problem and terminate prematurely. I see no error message, though. Where should I see them? Can I switch on some debug logging? Switching on the debug mode of Surefire changed the test results, so I didn't follow that path very far.
I'm running ~1600 unit tests in ~400 classes, which takes ~7 minutes in the best case. Execution time varies greatly; sometimes the whole thing only terminates after more than an hour.
In some cases, on the other hand, the surefirebooter processes continue to run after execution has finished (successfully) and put massive load on the system (so they seem to be busy-waiting for something).
Questions:
Can anybody explain these observed phenomena?
Can anybody give advice on what to change in order to have a more proper execution? (I.e. with the desired number of surefirebooter processes at all times.)
Can anybody give advice on how to debug the situation? See messages about what happens with the surefirebooter processes? (I tried using strace but that also changed the behavior so dramatically that the call didn't terminate anymore.)
My hypothesis #1 would be that the oom_killer is the culprit. #2 would be that the forked processes go into swap and/or spend a crazy amount of time garbage collecting.
To debug:
Which platform are you running this on?
If it's something of the *nix kind, could you please check dmesg or /var/log/messages for messages about killed processes after the run?
In cases where you have processes busy-waiting, could you please try to a) collect stack traces with jstack (both for the forked processes and the main one) and b) quantify the massive load on the system in terms of CPU / memory usage / amount of stuff paged in / paged out?
If none of those proves useful, I'd try forking Surefire's ForkStarter, adding more logging events, and comparing the logs of successful runs and failed ones for more clues (use the --debug or -X argument to Maven to output debug messages).
I started working with JMH lately and wondered if there is a way to debug it.
The first thing I did was try to debug it like any other program, but it threw "No transports initialized", so I couldn't debug it in the old-fashioned way.
The next thing I did was search the internet, and I found someone who said that you need to set the forks to 0; I tried it and it worked.
Unfortunately, I couldn't really understand why the forks affect debugging, or how forks change what I see in JMH.
All I know so far is that .forks(number) on the OptionsBuilder sets how many processes the benchmark will run in. But if I put .forks(3), is it running each @Benchmark method on 3 processes asynchronously?
An explanation of .forks(number) and .threads(number), how they change the way benchmarks run and how they impact debugging, would really clear things up.
So a normal JMH run has two processes running at any given time. The first ("host") process handles the run. The second ("forked") process runs one trial of a given benchmark, which achieves isolation. Requesting N forks, either via the annotation or the command line (which takes precedence over annotations), makes N consecutive invocations of the forked process. Requesting zero forks runs the workload straight in the host process itself.
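To make those settings concrete, here is a minimal sketch of a JMH launcher using the OptionsBuilder API; the MyBenchmark class name is just a placeholder for your own benchmark class.

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class BenchmarkLauncher {
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(MyBenchmark.class.getSimpleName()) // placeholder benchmark class
                .forks(3)   // 3 consecutive forked JVMs; each runs its own full trial
                .threads(4) // 4 benchmark threads inside each forked JVM
                //.forks(0) // would run the benchmark inside this host process instead (handy for debugging)
                .build();
        new Runner(opt).run();
    }
}

Note that with forks(3) the three forked JVMs run one after another, not in parallel; threads(number) controls the concurrency within each of those JVMs.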
So, there are two things to do with debugging:
a) Normally, you can attach to the forked process and debug it there. It helps to configure the workload to run longer so you have plenty of time to attach and look around. The forked process usually has ForkedMain as its entry point, visible in jps.
b) If the above does not work, ask for -f 0 and attach to the host process itself.
This was quite a pain in the soft side to get to work (now that I wrote this, it sounds trivial)
First of all, I was trying to debug DTraceAsmProfiler (perfasm, but for Mac), using Gradle.
To start with, I have a Gradle task called jmh that looks like this:
task jmh(type: JavaExec, dependsOn: jmhClasses) {
    // the actual class to run
    main = 'com.eugene.AdditionPollution'
    classpath = sourceSets.jmh.compileClasspath + sourceSets.jmh.runtimeClasspath

    // I build JMH from sources, thus need this
    repositories {
        mavenLocal()
        maven {
            url '/Users/eugene/.m2/repository'
        }
    }
}
Then I needed to create a Debug Configuration in IntelliJ (nothing unusual).
And then simply run:
sudo gradle -Dorg.gradle.debug=true jmh --debug-jvm
Imagine you have a command-line application that takes an input file and does something with it. Now imagine you want to sample/profile this application. If it were Visual Studio, you would just select a profiling method (sampling/instrumentation) and VS would run the application for you and collect data while the program completes.
But as far as I can see there is no similar functionality in VisualVM. You have to run your application, then select it in VisualVM and then explicitly start sampling/profiling. The problem is that sometimes the execution of the program with certain input data takes less time than is required to set up VisualVM. Also, with such an approach there is no possibility to batch-profile the application.
Someone has suggested starting the application in debug mode from Eclipse, setting a breakpoint somewhere at the beginning of the main() method, then setting up VisualVM and continuing execution. But I have a suspicion that running in Debug vs Release mode has performance implications of its own.
Suggestions?
There is a new Startup Profiler plugin for VisualVM 1.3.6, which allows you to profile your application from its startup. See this article for additional information.
If the program does I/O, the Visual Studio sampler will not see the I/O because it is a "CPU Sampler" (even if nearly all of the time is spent waiting for I/O).
If you use Instrumentation, you won't see any line-level information because it only summarizes at the function level.
I use this technique.
If the program runs too quickly to sample, just put a temporary outer loop around it of, say, 100 or 1000 iterations.
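For instance, a throwaway wrapper along these lines (runOnce is a hypothetical stand-in for whatever the program actually does with its input) keeps the process alive long enough for the sampler to attach and collect a useful number of samples:

public class ProfilingWrapper {
    public static void main(String[] args) throws Exception {
        // Temporary outer loop purely for profiling; remove it afterwards.
        for (int i = 0; i < 1000; i++) {
            runOnce(args); // hypothetical: the real work the program does on one input file
        }
    }

    private static void runOnce(String[] args) {
        // ... the original program logic would go here ...
    }
}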
The difference between Debug and Release mode will be next to nothing unless you are spending a good fraction of time in tight loops, in your code, where the loops do not contain any function calls, OR if you are doing data structure operations that do a lot of validation in the libraries.
If you are, then your samples will show that you are, and you will know that Release will make a speed difference.
As far as batch profiling is concerned, I don't. I just keep an eye on the program's overall throughput rate. If there is some input that seems to make it take too long, then I do the sampling procedure on the program with that input, see what the problem is, and fix it.
We are creating a bunch of PoSH scripts and running them in an orderly manner from Java
(building a process, handling all input/output streams that may cause hangups, and then invoking it using local admin privileges).
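(For reference, the launching side looks roughly like the sketch below; the script path and flags are placeholders, and the admin-privilege handling is omitted. Draining stdout/stderr continuously is what prevents the hangups mentioned above.)

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class PoshRunner {
    public static void main(String[] args) throws Exception {
        // Placeholder script path; the real scripts and arguments come from the deployment.
        ProcessBuilder pb = new ProcessBuilder(
                "powershell.exe", "-NoProfile", "-ExecutionPolicy", "Bypass",
                "-File", "C:\\deploy\\scripts\\example.ps1");
        pb.redirectErrorStream(true); // merge stderr into stdout so a single reader drains everything

        long start = System.currentTimeMillis();
        Process p = pb.start();

        // Read the output continuously so the child never blocks on a full pipe buffer.
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line);
            }
        }
        int exit = p.waitFor();
        System.out.println("Exit code " + exit + " after " + (System.currentTimeMillis() - start) + " ms");
    }
}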
We are facing an interesting occurrence in some of our deployments.
When tracking the log file of this operation we sometimes see a considerable delay. The delay is consistent and clocks in at 35 seconds: if there is a log entry just prior to the process building and the next log entry is written from the invoked PoSH script, the delay between them is consistently 35 seconds.
It is consistent for all scripts invoked on that machine.
This behavior is not consistent across machines. We have several (unrelated) machines that exhibit it, but also many others that show 2-5 seconds, which we accept as a normal time for building the process.
Our Java process is a 32-bit process, and most of the delaying machines are 64-bit Win2K8 Server R2 VMs. Most of these VMs use domain authentication and policies (different ones for different customers).
We tried running PowerShell via various methods (such as PsTools) but with no real findings: it always starts within a few seconds. Comparing machine installations did not bring us any insights either. Performance does not seem to be an issue either, though I'll admit we haven't analyzed it too deeply, just looked closely at the Task Manager.
It is important to mention that the process never hangs: it will run, and run swiftly, once started. The delay happens during the startup of the PoSH process.
Any ideas, advice or speculative directions will be more than welcome.
Thanks,
Yaron
I am running a Java program in the command prompt.
Normally, after successfully executing the program, it comes back to the prompt. What are the possible reasons it would not come back to the prompt after successfully executing the program?
Why is it not coming back to the prompt after execution?
Usually it comes back, but sometimes it doesn't...
This sounds like a race condition. Something in your application's shutdown sequence is non-deterministic, and it works or does not work depending on various platform specific (and possibly external) factors. There is probably no point figuring out what those factors are (or might be), since it won't help you fix the problem.
The only difference is in RAM and hard disk capacity; mine is slower. Can that be a possible reason?
These could be factors, but they are not the cause of the problem. So focus on figuring out what makes your application non-deterministic.
As others have said, without more information (and relevant code) we can only guess.
When the application has failed to shut down, get it to give you a thread dump. Or try shutting it down while it is attached to a debugger. These may allow you to get some clues as to what is going wrong.
Finally, the brute-force solution is simply to have the main method (or whatever) call System.exit(0) on its way out. But beware of the possibility of files not being flushed, etc., if you do that.
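A minimal sketch of that brute-force exit; the result file here is just an illustration of a resource that needs to be flushed and closed before the hard exit.

import java.io.PrintWriter;

public class Main {
    public static void main(String[] args) throws Exception {
        try (PrintWriter out = new PrintWriter("result.txt")) { // illustrative output file
            // ... the real work ...
            out.println("work done");
        } // try-with-resources flushes and closes the file before we pull the plug
        System.exit(0); // terminates any lingering non-daemon threads that would otherwise keep the JVM alive
    }
}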
Because it's not finishing. If it's sometimes happening and sometimes not, my instinct is that you have some sort of race condition. Probably one of your cleanup steps is hanging if another action has or hasn't been taken.
Without source code this will be hard to debug.
There could be an active thread still running which is not in "daemon" mode. For example, if you have a Swing GUI and all of the frames are closed the Event Dispatch thread is still active so the JVM will not exit.
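A minimal illustration of that situation (the thread name and sleep are arbitrary): the program below never returns to the prompt because of the non-daemon thread, unless the setDaemon line is uncommented.

public class LingeringThreadDemo {
    public static void main(String[] args) {
        Thread worker = new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(60_000); // pretend background work
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "background-worker");
        // worker.setDaemon(true); // uncomment and the JVM exits as soon as main() returns
        worker.start();
        System.out.println("main() is done, but the JVM keeps running because of the non-daemon thread");
    }
}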
I'm considering whipping up a script to
run once a minute (or every five minutes)
run jstack against a running JVM in production
parse the jstack output and tally up things I'm interested in
export the results for graphing 24/365 by a centralized Cacti installation on another server
But I've no clue as to how expensive or invasive jstack is on a running JVM. How expensive is it to execute jstack on a running JVM? Am I setting myself up for a world of hurt?
I know this answer is late to the party, but the expensive part of jstack comes from attaching to the debugger interface, not generally from generating the stack traces, with one important exception (and the heap size does not matter at all):
Arbitrary stack traces can be generated only at a safepoint or while the thread is waiting (i.e. outside Java scope). If the thread is waiting / outside Java scope, the stack-requesting thread will carry out the task by doing the stack walk on its own. However, you might not wish to "interrupt" a thread to walk its own stack, especially while it is holding a lock (or doing some busy wait). Since there is no way to control the safepoints, that's a risk that needs to be considered.
Another option, which avoids attaching to the debugging interface altogether: use Thread.getAllStackTraces() or the ThreadMXBean, run it in the process, save the result to a file, and use an external tool to poll that file.
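A rough sketch of that in-process approach, assuming you are free to add a small background thread to the application; the output file path and interval are arbitrary.

import java.io.PrintWriter;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class InProcessStackDumper {
    public static void start(String outputFile, long intervalMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try (PrintWriter out = new PrintWriter(outputFile)) {
                    // false, false = skip locked monitor/synchronizer details to keep it cheap
                    for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
                        out.println("\"" + info.getThreadName() + "\" " + info.getThreadState());
                        for (StackTraceElement frame : info.getStackTrace()) {
                            out.println("    at " + frame);
                        }
                        out.println();
                    }
                } catch (Exception e) {
                    // ignore and retry on the next interval
                }
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "stack-dumper");
        t.setDaemon(true); // don't keep the JVM alive just for the dumper
        t.start();
    }
}

An external tool can then poll and parse that file without ever attaching to the JVM.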
Last note: I love jstack; it's immensely useful on production systems.
Measure. One of the time variants (/usr/bin/time I believe) has a -p option which allows you to see the cpu resources used:
ravn:~ ravn$ /usr/bin/time -p echo Hello World
Hello World
real 0.32
user 0.00
sys 0.00
ravn:~ ravn$
This means that it took 0.32 seconds wall time, using 0.00 seconds of cpu time in user space and 0.00 seconds in kernel space.
Create a test scenario where you have a program running but not doing anything, and try comparing the resource usage WITH and WITHOUT jstack attaching, e.g. every second. Then you have hard numbers and can experiment to see what would give a reasonable overhead.
My hunch would be that once every five minutes is negligible.
Depending on the number of threads and the size of your heap, jstack could possibly be quite expensive. JStack is meant for troubleshooting and wasn't designed for stats gathering. It might be better to use some form of instrumentation or expose a JMX interface to get the information you want directly, rather than having to parse stack traces.