Why does my Java app run faster with a profiler attached?

I am developing a Java 8 SE application in NetBeans. A new feature I added recently was running too slowly (about a minute until the calculations finished). So I fired up the profiler to see where the major bottleneck was. To my surprise, the calculations completed in about 7 seconds.
Couldn't believe it at first, but the results were correct.
I tried it a few more times, and the app always ran about 10 times faster with the profiler attached. I also tried running the compiled .jar file directly from the Windows command line, and the computations consistently took about a minute again.
How is it possible that the attached profiler provides such a massive boost to performance? What changes does it make to the JVM or the application?
BTW, I am using native OpenCV in these calculations via the provided Java wrapper, if that makes any difference.
//Edit - Additional info: I am using the built-in NetBeans 8.1 profiler, which I believe is basically VisualVM. As for the profiling method, I chose to monitor "Methods", i.e. their execution times and invocation counts. The performance bump happens with both instrumented and sampled profiling.

Unfortunately there probably isn't one single answer that will explain why this is the case. It will, of course, depend on what the program is doing as well as how the program is being launched. For example, if you're using the profiler to launch the application (as opposed to connecting afterwards), then it may be that the profiler is launching the JVM with a different configuration (heap size, garbage collector, etc.) and that is the cause of the difference.
If you run jcmd you should see a list of Java processes. You can then run jcmd <id> VM.flags to see what the JVM has been configured with, and verify that the flags are the same when the application runs under the profiler and when it doesn't.
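For example, assuming jcmd reports a process id of 12345:
jcmd
jcmd 12345 VM.flags
Run that once for the normal launch and once under the profiler, then diff the two outputs.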
Another possibility is that your program does a lot of locking, and this contention is causing thrashing in your application when the profiler isn't attached. With the profiler attached, each lock acquisition may be slower, which can accidentally make the application threads co-operate and ultimately make faster progress.
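A purely hypothetical sketch (none of these names come from the question) of the kind of code that can behave this way: many threads hammering one cheap lock, where making each acquisition slightly slower reduces the thrashing.
class ContentionDemo {
    private static final Object LOCK = new Object();
    private static long counter = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[16];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 1_000_000; i++) {
                    synchronized (LOCK) {
                        counter++; // tiny critical section, heavy contention
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println("counter = " + counter);
    }
}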
However, these are just suggestions for how you can investigate further; it's quite likely that the problem you're seeing has another, as-yet-undiscovered cause that is completely different (e.g. the run defaulting to a different level of logging ...)

Related

Java Mission Control says "few profiling samples", why, and what are my other options?

I'm profiling a Java application using Java Mission Control, and it's saying on the main page of the flight recording that "This recording contains few profiling samples even though CPU load is high. The profiling data is thus likely not relevant."
It seems to be telling the truth. I asked it to sample every 10 ms for 3 minutes which should be 18000 samples, but I only see 996 samples.
It goes on to explain: "The profiling data is thus likely not relevant. This might be because the application is running a lot of JNI code or that the JVM is spending a lot of time in GC, class loading, JIT compilation etc."
Hmm, I don't have any native methods, and it shouldn't be loading classes or doing any JIT at the stage I recorded (well into the repetitive number-crunching part of the code). It doesn't look like it's spending an inordinate amount of time garbage collecting either.
We used to use hprof to profile this product, with much success. Hprof helped immensely in figuring out where we were relying on main-thread execution, so we could parallelize the hotspots across multiple threads. But that tool was discontinued in Java 9, so we're moving on to Java Mission Control. It has a lot going for it, but if it can't identify what line numbers the VM threads are sitting on at random sample times, it's not very useful. Is there some other tool to use? Or is there a way to debug this further from within Java Mission Control? It also looks like JVisualVM is no longer included in Java 9.
If you have many more running threads than cores, the sampling thread could be starved and not able to wake up at the interval you specified.
The answer is probably as simple as you having more threads than cores, and thus most of them not being scheduled on a CPU at the time of sampling. The JFR method sampler will only keep samples of threads actually on CPU. The idea is to provide you with a view of where the time is actually being spent executing your Java code.
Now, we know that there are cases where you want to get random samples of all threads, no matter what they are doing. We are adding new profiling capabilities/events in JDK 10.
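For reference, a recording like the one described can also be started from the command line at launch (on Oracle JDK 9/10 commercial features must still be unlocked; the application name here is hypothetical):
java -XX:+UnlockCommercialFeatures -XX:StartFlightRecording=duration=180s,filename=recording.jfr MyApp
or attached to an already-running process:
jcmd <pid> JFR.start duration=180s filename=recording.jfr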

Profiling application with VisualVM

Imagine you have a command-line application that takes an input file and does something with it. Now imagine you want to sample/profile this application. If it were Visual Studio, you would just select the profiling method (sampling/instrumentation) and VS would run the application for you and collect data while the program completes.
But as far as I can see, there is no similar functionality in VisualVM. You have to run your application, then select it in VisualVM, and then explicitly start sampling/profiling. The problem is that sometimes executing the program with certain input data takes less time than is required to set up VisualVM. Also, with such an approach there is no way to batch-profile the application.
Someone suggested starting the application in debug mode from Eclipse and setting a breakpoint somewhere at the beginning of the main() method, then setting up VisualVM and continuing execution. But I suspect that running in Debug vs Release mode has performance implications of its own.
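To illustrate, the breakpoint approach amounts to something like this hypothetical pause at the top of main(), with no debugger involved:
public class Main {
    public static void main(String[] args) throws java.io.IOException {
        // Hypothetical: block until Enter is pressed, giving time to attach VisualVM.
        System.out.println("Attach the profiler, then press Enter...");
        System.in.read();
        process(args); // hypothetical stand-in for the real work
    }

    private static void process(String[] args) {
        // ... the actual work on the input file ...
    }
}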
Suggestions?
There is a new Startup Profiler plugin for VisualVM 1.3.6, which allows you to profile your application from its startup. See this article for additional information.
If the program does I/O, the Visual Studio sampler will not see the I/O because it is a "CPU Sampler" (even if nearly all of the time is spent waiting for I/O).
If you use Instrumentation, you won't see any line-level information because it only summarizes at the function level.
I use this technique.
If the program runs too quickly to sample, just put a temporary outer loop around it of, say, 100 or 1000 iterations.
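A minimal sketch of that temporary wrapper (doWork is a hypothetical stand-in for the real work):
public static void main(String[] args) {
    // Temporary scaffolding: repeat the real work so the sampler
    // gets enough wall-clock time to collect samples.
    for (int i = 0; i < 1000; i++) {
        doWork(args); // hypothetical: the code you actually want to profile
    }
}

private static void doWork(String[] args) {
    // ... the computation being profiled ...
}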
The difference between Debug and Release mode will be next to nothing unless you are spending a good fraction of time in tight loops, in your code, where the loops do not contain any function calls, OR if you are doing data structure operations that do a lot of validation in the libraries.
If you are, then your samples will show that you are, and you will know that Release will make a speed difference.
As far as batch profiling is concerned, I don't. I just keep an eye on the program's overall throughput rate. If there is some input that seems to make it take too long, then I do the sampling procedure on the program with that input, see what the problem is, and fix it.

Java JVM or Eclipse Start-up Overhead

I have a threaded network application that I run under Eclipse (Indigo) and Java 1.7.x. For quite a while I have noticed that the first run of the application produces front- and end-loaded degradation in performance: for example, if I load up the application and then hit it (using a test harness) with, say, 100 network packets, the first few responses are heavily erratic, as are the last few. [edit] Without unloading the application, just running the test harness again, the application performs normally. [end edit]
I decided to try and get to the bottom of it, and loaded up VisualVM 1.3.5 to profile the behaviour. The CPU usage shows distinct spikes, going from 10% to over 50% at the beginning of the run. After the spikes, everything appears normal, and as stated above, subsequent runs do not have the leading spikes in CPU utilisation; the profile of subsequent runs is identical to the profile between the spikes of the first run. There doesn't appear to be any evidence that the number of threads is causing it, although there is a small rise. Heap usage increases from 100 MB to 200 MB, but other than that everything appears normal.
Any thoughts would be welcome.
Thanks
It's fairly typical for system performance to be erratic the first time you run a test. This is due to the operating system reading libraries, JAR files, and other data off disk and storing them in cache. Once this has been done, all subsequent runs will be much faster and more consistent.
Also, keep in mind that the JVM tends to be slower right after it starts up. Due to its hotspot analysis and just-in-time compiling, the code will need to run a little while before the JVM optimizes the bytecode for your particular workload.
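If you want to watch the warm-up happening, HotSpot can log JIT activity; expect a burst of output during the initial spike (the jar name is hypothetical):
java -XX:+PrintCompilation -jar yourapp.jar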
This is typical for OSGi environments, where bundles may be initialized lazily upon first access of a bundle's class or services.
You can figure out if this is the case in your scenario by starting Eclipse with the -console -consolelog arguments.
When the console opens and the application was loaded, issue the ss command and note which bundles are marked LAZY. Then, run your test, issue ss again, and see if one of the LAZY bundles now became ACTIVE. If so, you can force eager start of your bundles via the configuration/config.ini file. This can also be accomplished via the IStartup extension point.
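For example, an entry of this form in configuration/config.ini (the bundle name is hypothetical) marks a bundle to start eagerly at start level 2:
osgi.bundles=com.example.mybundle@2:start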

Profiling short-lived Java applications

Is there any Java profiler that allows profiling short-lived applications? The profilers I have found so far seem to work only with applications that keep running until user termination. However, I want to profile applications that work like command-line utilities: they run and exit immediately. Tools like VisualVM or the NetBeans Profiler do not even recognize that the application was run.
I am looking for something similar to Python's cProfile, in that the profiler result is returned when the application exits.
You can profile your application using the JVM's built-in HPROF agent.
It provides two methods:
sampling the active methods on the stack
timing method execution using injected bytecode (BCI, bytecode injection)
Sampling
This method reveals how often methods were found on top of the stack.
java -agentlib:hprof=cpu=samples,file=profile.txt ...
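The sampler can also be tuned, e.g. to sample every 10 ms with a stack depth of 10 (interval and depth are standard HPROF options):
java -agentlib:hprof=cpu=samples,interval=10,depth=10,file=profile.txt ...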
Timing
This method counts the actual invocations of a method. The instrumentation code is injected by the JVM beforehand.
java -agentlib:hprof=cpu=times,file=profile.txt ...
Note: this method slows down execution drastically.
For both methods, the default filename is java.hprof.txt if the file= option is not present.
Full help can be obtained using java -agentlib:hprof=help or can be found in Oracle's documentation.
Sun Java 6 has the java -Xprof switch that'll give you some profiling data.
-Xprof output cpu profiling data
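Usage is just the extra switch on the launch line (the class name is hypothetical); the flat profile is printed to stdout when the program exits:
java -Xprof MyMainClass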
A program that runs for 30 seconds is not short-lived. What you want is a profiler that can start your program, instead of you having to attach to a running process. I believe most profilers can do that, but you would most likely prefer one integrated into an IDE. Have a look at NetBeans.
Profiling a short-running Java application has a couple of technical difficulties:
Profiling tools typically work by sampling the processor's SP or PC register periodically to see where the application is currently executing. If your application is short-lived, insufficient samples may be taken to get an accurate picture.
You can address this by modifying the application to run a number of times in a loop, as suggested by @Mike. You'll have problems if your app calls System.exit(), but the main problem is ...
The performance characteristics of a short-lived Java application are likely to be distorted by JVM warm-up effects. A lot of time will be spent in loading the classes required by your app. Then your code (and library code) will be interpreted for a bit, until the JIT compiler has figured out what needs to be compiled to native code. Finally, the JIT compiler will spend time doing its work.
I don't know if profilers attempt to compensate for JVM warm-up effects. But even if they do, these effects influence your application's real behavior, and there is not a great deal that the application developer can do to mitigate them.
Returning to my previous point ... if you run a short-lived app in a loop, you are actually doing something that modifies its normal execution pattern and removes the JVM warm-up component. So when you optimize the method that takes (say) 50% of the execution time in the modified app, that is really 50% of the time excluding JVM warm-up. If JVM warm-up takes (say) 80% of the execution time when the app is executed normally, you are actually optimizing 50% of the remaining 20% ... and that may not be worth the effort.
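To put concrete numbers on that: if a normal run takes 10 seconds, 8 s (80%) of which is warm-up and 2 s (20%) application code, then halving a method that shows up as 50% of the looped, warm-up-free profile saves about 1 s: only 10% of the real end-to-end run.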
If it doesn't take long enough, just wrap a loop around it, an infinite loop if you like. That will have no effect on the inclusive time percentages spent either in functions or in lines of code. Then, given that it's taking plenty of time, I just rely on this technique. That tells which lines of code, whether they are function calls or not, are costing the highest percentage of time and would therefore gain the most if they could be avoided.
Start your application with profiling turned on, waiting for the profiler to attach. Any profiler that conforms to the Java profiling architecture should work; I've tried this with NetBeans's profiler.
Basically, when your application starts, it waits for a profiler to be attached before executing anything. So, technically, even the very first line of code can be profiled.
With this approach, you can profile all kinds of things: threads, memory, CPU, method/class invocation times/durations...
http://profiler.netbeans.org/
The SD Java Profiler can capture statement block execution-count data no matter how short your run is. Relative execution counts will tell you where the time is spent.
You can use a measurement (metering) recording: http://www.jinspired.com/site/case-study-scala-compiler-part-9
You can also inspect the resulting snapshots: http://www.jinspired.com/site/case-study-scala-compiler-part-10
Disclaimer: I am the architect of JXInsight/OpenCore.
I suggest you try YourKit. It can profile from the start and dump the results when the program finishes. You have to pay for it, but you can get an eval licence or use the EAP version without one (time-limited).
YourKit can take a snapshot of a profiling session, which can later be analyzed in the YourKit GUI. I use this feature to profile a short-lived command-line application I work on. See my answer to this question for details.

How to gather profiling information for a Java 1.4 application?

A Java application I support that runs on JRE 1.4.2_12 is hanging near midnight every night. I'd like to try and record as much profiling information as I can to discover if there is an issue in the JVM or external to the app.
I'd like to use HPROF to collect as much information as possible.
Is there a way to have HPROF dump its CPU sampling and memory allocation reports every minute instead of at JVM termination?
Is there a different, more appropriate profiler that can collect information like this?
Rather than relying on dump files, I would try hooking a profiler up to the VM and leaving it attached until the hang-up occurs. Then use the profiler to introspect the state of the threads.
The use of Java 1.4 is a minor issue here, since 1.4's debug interface is not great, but some profilers still support it. I can particularly recommend YourKit, which is commercial but offers an evaluation licence. It's the best profiler I've used, by some margin.
First things first: did you analyze the thread dump when your application hangs? A lot of the time it has enough information to troubleshoot a hanging Java app...
Ctrl-Break in the process window on Windows, or kill -QUIT [pid] on Linux.
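Since the hang occurs at a predictable time, you could also capture dumps on a schedule around midnight; a rough sketch for Linux (the pid is hypothetical):
while true; do kill -QUIT 12345; sleep 60; done   # thread dumps appear on the app's stdout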
I would first try to determine if it's actually your app or something else.
Are there any other apps on the box? If so, do they run any batch jobs around midnight? Your app could be suffering from a lack of resources due to other things running on the box or chewing up bandwidth.
Was this always the case, or did it start recently? If this is new, look at what changed on the box as a whole, not just your own app.
