Is there any Java profiler that allows profiling short-lived applications? The profilers I found so far seem to work with applications that keep running until user termination. However, I want to profile applications that work like command-line utilities, it runs and exits immediately. Tools like visualvm or NetBeans Profiler do not even recognize that the application was ran.
I am looking for something similar to Python's cProfile, in that the profiler result is returned when the application exits.
You can profile your application using the JVM builtin HPROF.
It provides two methods:
sampling the active methods on the stack
timing method execution times using injected bytecode (BCI, byte codee injection)
Sampling
This method reveals how often methods were found on top of the stack.
java -agentlib:hprof=cpu=samples,file=profile.txt ...
Timing
This method counts the actual invocations of a method. The instrumenting code has been injected by the JVM beforehand.
java -agentlib:hprof=cpu=times,file=profile.txt ...
Note: this method will slow down the execution time drastically.
For both methods, the default filename is java.hprof.txt if the file= option is not present.
Full help can be obtained using java -agentlib:hprof=help or can be found on Oracles documentation
Sun Java 6 has the java -Xprof switch that'll give you some profiling data.
-Xprof output cpu profiling data
A program running 30 seconds is not shortlived. What you want is a profiler which can start your program instead of you having to attach to a running system. I believe most profilers can do that, but you would most likely like one integrated in an IDE the best. Have a look at Netbeans.
Profiling a short running Java applications has a couple of technical difficulties:
Profiling tools typically work by sampling the processor's SP or PC register periodically to see where the application is currently executing. If your application is short-lived, insufficient samples may be taken to get an accurate picture.
You can address this by modifying the application to run a number of times in a loop, as suggested by #Mike. You'll have problems if your app calls System.exit(), but the main problem is ...
The performance characteristics of a short-lived Java application are likely to be distorted by JVM warm-up effects. A lot of time will be spent in loading the classes required by your app. Then your code (and library code) will be interpreted for a bit, until the JIT compiler has figured out what needs to be compiled to native code. Finally, the JIT compiler will spend time doing its work.
I don't know if profilers attempt to compensate to for JVM warmup effects. But even if they do, these effects influence your applications real behavior, and there is not a great deal that the application developer can do to mitigate them.
Returning to my previous point ... if you run a short lived app in a loop you are actually doing something that modifies its normal execution pattern and removes the JVM warmup component. So when you optimize the method that takes (say) 50% of the execution time in the modified app, that is really 50% of the time excluding JVM warmup. If JVM warmup is using (say) 80% of the execution time when the app is executed normally, you are actually optimizing 50% of 20% ... and that is not worth the effort.
If it doesn't take long enough, just wrap a loop around it, an infinite loop if you like. That will have no effect on the inclusive time percentages spent either in functions or in lines of code. Then, given that it's taking plenty of time, I just rely on this technique. That tells which lines of code, whether they are function calls or not, are costing the highest percentage of time and would therefore gain the most if they could be avoided.
start your application with profiling turned on, waiting for profiler to attach. Any profiler that conforms to Java profiling architecture should work. i've tried this with NetBeans's profiler.
basically, when your application starts, it waits for a profiler to be attached before execution. So, technically even line of code execution can be profiled.
with this approach, you can profile all kinds of things from threads, memory, cpu, method/class invocation times/duration...
http://profiler.netbeans.org/
The SD Java Profiler can capture statement block execution-count data no matter how short your run is. Relative execution counts will tell you where the time is spent.
You can use a measurement (metering) recording: http://www.jinspired.com/site/case-study-scala-compiler-part-9
You can also inspect the resulting snapshots: http://www.jinspired.com/site/case-study-scala-compiler-part-10
Disclaimer: I am the architect of JXInsight/OpenCore.
I suggest you try yourkit. It can profile from the start and dump the results when the program finishes. You have to pay for it but you can get an eval license or use the EAP version without one. (Time limited)
YourKit can take a snapshot of a profile session, which can be later analyzed in the YourKit GUI. I use this to feature to profile a command-line short-lived application I work on. See my answer to this question for details.
Related
I am hoping to do a profiling analysis on my Java project. To get the results I want to add a "hook" to the JVM so that every time a heap access occurs, the "hook" is called and does some tracing. I have been looking into JVMTI but this does not seem to give me what I expect.
I have several questions:
Is it possible to add such a hook?
If possible, what are the correct tools/interfaces that I should use?
If there is no existing tools that do this, can I achieve this by modifying the JVM codebase?
Thanks.
I want to add a "hook" to the JVM so that every time a heap access occurs
You can't really do this in the Java as the hook itself would access the heap and cal itself. Even if you work around this, it would make the program impossibly slow.
What you can do is use the debugging interface to breakpoint after each instruction, inspect the instruction and see if it accessed the heap or not. This would be perhaps 10,000x slower than normal.
An alternative is to translate the bytecode using Instrumentation to trace each memory access. This might be only a few hundred times slower.
To do what you propose efficiently, you could use https://software.intel.com/en-us/articles/intel-performance-counter-monitor which used by tools such as perf on Linux. This requires in-depth knowledge of the processor you are using
I'm profiling a Java application using Java Mission Control, and it's saying on the main page of the flight recording that "This recording contains few profiling samples even though CPU load is high.
The profiling data is thus likely not relevant."
It seems to be telling the truth. I asked it to sample every 10 ms for 3 minutes which should be 18000 samples, but I only see 996 samples.
It goes on to explain "The profiling data is thus likely not relevant. This might be because the application is running a lot JNI code or that the JVM is spending a lot of time in GC, class loading, JIT compilation etc."
Hmm, I don't have any native methods, and it shouldn't be loading classes or doing any JIT at the stage I recorded (well into the repetitive number crunching part of the code.) It doesn't look like it's spending an inordinate amount of time garbage collecting either.
We used to use hprof to profile this product, with much success. Hprof helped immensely in figuring out where we were relying on the main thread execution, so we could parallelize the hotspots into multiple threads. But that tool got discontinued in Java 9 so we're moving onward to Java Mission Control. It has a lot going for it, but if it can't identify what line numbers the VM threads are sitting on at random sample times, it's not very useful. Is there some other tool to use? Or, is there a way to debug this further from within Java Mission Control? It also looks like JVisualVM is no longer included in Java 9.
If you have many more running threads than cores, the sampling thread could be starved and not able to wake up at the interval you specified.
The answer is probably as simple as you having more threads than cores, and thus most of them not being scheduled on CPU at the time of sampling. The JFR method sampler will only keep samples of threads actually on CPU. The idea is to provide you of a view of where you are actually spending the time executing your Java code.
Now, we know that there are cases where you want to get random samples of all threads, no matter what they are doing. We are adding new profiling capabilities/events in JDK 10.
I am developing a Java 8 SE application in Netbeans. A new feature I added recently to the app was running too slowly (about a minute, until the calculations stopped). So I fired up the profiler to see what is the major bottleneck. To my surprise, the calculations completed in about 7 seconds.
Couldn't believe it at first, but the results were correct.
Tried it a few times again, but the app always ran 10 times faster with the profiler attached to it. I also tried to run the compiled .jar file directly from the Windows command line, but the computations took about a minute again and again.
How is it possible, that the attached profiler provides such a massive boost to the performance? What changes does it do to the JVM or application?
BTW, I am using native OpenCV in these calculations with provided Java wrapper, if it makes any difference.
//Edit - Additional info: I am using the built-in Netbeans 8.1 profiler, which I believe is basically VisualVM. As for a profiling method I chose to monitor "Methods" and their execution times and invocation counts. The performance bump happens both with instrumented and sampled profiling.
Unfortunately there probably isn't one single answer that will explain why this is the case. Of course, it will depend on what the program is doing as well as how the program is being launched. For example, if you're using the profiler to launch the application (as opposed to connecting afterwards) then it may be that the profiler is launching with different configuration (heap size, garbage collector etc.) and that is the cause of the difference.
If you run jcmd you should see a list of processes. You can then run jcmd <id> VM.flags to see what the JVM has been configured with, and verify that the same are for the application when under a profiler and when it isn't.
Another possibility is that your program is excessively locking, and this excessive locking is causing thrashing in your application when the profiler isn't attached. With it attached the locking may be slower, resulting in the application threads co-operating and ultimately making faster progress.
However these are just suggestions of how you can investigate further; it's quite likely that there is another as yet undiscovered problem that you're seeing which is completely different (e.g. it's defaulting to a different level of logging ...)
Imagine you have command-line application that takes input file and does something with it. Now imagine you want to sample/profile this application. If it were Visual Studio you would just select profiling method (sampling/instrumentation) and VS would run application for you and collect data while program completes. But as far as I can see there is no similar functionality in VisualVM. You have to run your application, then select it in VisualVM and then explicitly start sampling/profiling. The problem is that sometimes execution of program with certain input data takes less time than it is required to setup VisualVM. Also with such an approach there is no possibility to batch profile application. Someone has suggested to start application in debug mode from Eclipse and set breakpoint somewhere in the beginning of main() method. Then setup VisualVM and continue execution. But I have suspicion that running in Debug vs Release mode has performance implications on its own.
Suggestions?
There is a new Startup Profiler plugin for VisualVM 1.3.6, which allows you to profile your application from its startup. See this article for additional information.
If the program does I/O, the Visual Studio sampler will not see the I/O because it is a "CPU Sampler" (even if nearly all of the time is spent waiting for I/O).
If you use Instrumentation, you won't see any line-level information because it only summarizes at the function level.
I use this technique.
If the program runs too quickly to sample, just put a temporary outer loop around it of, say, 100 or 1000 iterations.
The difference between Debug and Release mode will be next to nothing unless you are spending a good fraction of time in tight loops, in your code, where the loops do not contain any function calls, OR if you are doing data structure operations that do a lot of validation in the libraries.
If you are, then your samples will show that you are, and you will know that Release will make a speed difference.
As far as batch profiling is concerned, I don't. I just keep an eye on the program's overall throughput rate. If there is some input that seems to make it take too long, then I do the sampling procedure on the program with that input, see what the problem is, and fix it.
I have started to use visualvm for profiling my application which I launch in Eclipse. Then I launch visualvm which initially gives believable results.
After some time two processes appear in the monitor which consume huge amounts of time.
I have not deliberately invoked these. After a time they disappear. Are they an artefact of the profiling process and do I need to worry?
Very few of my routines appear in the profile, mainly the libraries they call. Is there a way of showing which routines call the most heavily used ones?
It is better to start with CPU sampling, if you don't know which part of the code is slow. Once you know better (based on the sampling results) what is going on, you can profile just part of your application, which is slow. You need to set profiling roots and instrumentation filter and don't forget to take the snapshot of collected results. See Profiling With VisualVM, Part 1 and Profiling With VisualVM, Part 2 to get more information about profiling and how to set profiling roots and instrumentation filter.
VisualVM uses Java to perform it's work. This means you will see some artefacts which relate to the RMI calls it makes. You can ignore them.
I use YourKit which doesn't do this, but it's not free ;)
VisualVM will track all methods being called by the java program it's monitoring, so either your program or one of its libraries is calling those methods. VisualVM is also connecting to it so there might be some small artifacts.
As for searching, probably the easiest way is to filter by your own packages. There is a space at the bottom where you can enter those so you can see which of your own methods is really taking time. Also you should pay attention to which thread you are in, usually you will want to look at the whatever is your "main" thread. The other threads are interesting but won't always give you the best idea of how your program behaves.