I'm in the process of benchmarking an app I've written. I ran my app through the benchmark 10 times in a loop (to get 10 results instead of only 1). Each time, the first iteration seems to take some 50-100 milliseconds longer than the rest of the iterations.
Is this related to the JIT compiler and is there anything one could do to "reset" the state so that you would get the initial "lag" included with all iterations?
To benchmark a long-running application you should allow for an initialization pass (the first pass). That's because classes have to be loaded, code has to be generated, in web apps JSPs are compiled to servlets, and so on. The JIT of course plays its role too. Sometimes a pass can also take longer if a garbage collection occurs.
It is probably caused by the JIT kicking in. However, you probably want to ignore the initial lag anyway. At least most benchmarks try to, because it heavily distorts the statistics.
You can't "uncompile" code that has been compiled but you can turn compiling off completely by using the -Xint command line switch.
The first pass will probably always be slower because of the JIT. I'd even expect to see differences when more runs are made because of possible incremental compilation or better branch prediction.
For benchmarking, follow the recommendations given in the other answers (except I wouldn't turn off the JIT, because your app will be running with the JIT in a production environment).
In any case use a profiler such as JVisualVM (included in JDK).
Is this related to the JIT compiler
Probably yes, though there are other potential sources of "lag":
Bootstrapping the JVM and creation of the initial classloader.
Reading and loading the application's classes, and the library classes that are used.
Initializing the classes.
JIT compilation.
Heap warmup effects; e.g. the overheads of having a heap that is initially too small. (This can result in the GC running more often than normal ... until the heap reaches a size that matches the application's peak working set size.)
Virtual memory warmup effects; e.g. the OS overheads incurred when the JVM grows the process address space and physical pages are allocated.
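A rough way to see which of these sources dominates is to log a few JVM counters around each iteration. The following is only a sketch: the class name and the work() method are placeholders for the real benchmark body, and the counters come from the standard java.lang.management beans.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

class WarmupSources {
    // Hypothetical stand-in for one benchmark iteration.
    static long work() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += Integer.toString(i).hashCode();
        }
        return sum;
    }

    public static void main(String[] args) {
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        // May be null if the JVM has no compilation system; timing is optional too.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        boolean jitTimed = jit != null && jit.isCompilationTimeMonitoringSupported();

        for (int run = 1; run <= 10; run++) {
            long classesBefore = classes.getTotalLoadedClassCount();
            long jitBefore = jitTimed ? jit.getTotalCompilationTime() : 0;
            long start = System.nanoTime();
            work();
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            long newClasses = classes.getTotalLoadedClassCount() - classesBefore;
            // Approximate accumulated compile time in ms; -1 means "not available".
            long jitMs = jitTimed ? jit.getTotalCompilationTime() - jitBefore : -1;
            System.out.printf("run %d: %d ms, %d classes loaded, %d ms JIT%n",
                    run, elapsedMs, newClasses, jitMs);
        }
    }
}
```

If class loading and JIT time are concentrated in the first run, that supports the explanation above. The counters are JVM-wide, so background activity in other threads is included.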
... and is there anything one could do to "reset" the state so that you would get the initial "lag" included with all iterations?
There is nothing you can do, apart from starting the JVM over again.
However, there are things that you can do to remove some of these sources of "lag"; e.g. turning off JIT compilation, using a large initial heap size, and running on an otherwise idle machine.
Also, the link that @Joachim contributed above is worth a thorough read.
There are certain structures you might have in your code, such as singletons, which are initialized only once and consume system resources. A database connection pool is one example. On top of that comes the time needed to initialize the Java classes themselves. For these reasons, I think you should discard that first value and keep only the rest.
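Discarding the first value can be done directly in the benchmark loop. This is a minimal sketch, assuming a placeholder work() method in place of the real application; the class and method names are made up for illustration.

```java
import java.util.Arrays;

class DiscardFirstRun {
    // Hypothetical stand-in for the code being benchmarked.
    static long work() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += Integer.toString(i).hashCode();
        }
        return sum;
    }

    // Mean of all timings except the first (warm-up) one.
    static double meanExcludingFirst(long[] timings) {
        if (timings.length < 2) {
            throw new IllegalArgumentException("need at least two timings");
        }
        return Arrays.stream(timings, 1, timings.length).average().getAsDouble();
    }

    public static void main(String[] args) {
        long[] millis = new long[10];
        for (int i = 0; i < millis.length; i++) {
            long start = System.nanoTime();
            work();
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        System.out.println("all runs (ms): " + Arrays.toString(millis));
        System.out.println("mean without first run (ms): " + meanExcludingFirst(millis));
    }
}
```

Printing all the raw timings next to the trimmed mean makes it easy to check that only the first run is the outlier.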
Related
I have a performance-critical method that is called often when my app starts up. Eventually, it gets JIT-compiled, but only after it has spent a noticeable amount of time running in the interpreter.
Is there any way I can tell the JVM that I want this method compiled right from the start (without tweaking other internals with stuff like -XX:CompileThreshold)?
The only way I know of is the -Xcomp flag, but that is not generally advisable to use. It forces immediate JIT compilation of ALL classes and methods first time they are run. The downside is that you will see a performance decrease on initial startup (due to increased JIT activity). The other major limitation with this flag is that it appears to disable the incremental profiling-based optimization that JIT would normally do. In standard mixed mode, the JIT compiler can (and will) deoptimize and re-compile parts of the code continually based on profiling and run-time information collected. This allows it to "correct" faulty optimizations like boundary checks that were omitted but turned out to be needed, sub-optimal inlinings etc. -Xcomp disables the profiling-based optimization and depending on program, can cause significant performance losses overall for only a small or no real gain in startup, which is why it's not recommended to use.
Beyond -Xcomp (which is pretty brutal) and -XX:CompileThreshold (which controls how many executions of a given method the JIT will run in interpreted mode to gather stats before compiling/optimizing it), there is also -Xbatch. This forces JIT compilation to the "foreground", essentially blocking calls to a method until it has been compiled, rather than compiling it in the background as it normally does.
You didn't specify which Java version you are using, but if Java 7 is an option for you, it introduces a new JIT model called "tiered compilation" (activated with the -XX:+TieredCompilation switch). Tiered compilation allows an initial, smaller compilation pass on the first use of a method and then an additional, larger compilation/optimization pass later, based on collected profiling data. That sounds like it should be interesting to you.
It supposedly requires some additional tweaking and parameters/configurations, but I've not got around to checking it out further.
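When experimenting with these switches it helps to confirm which ones the running JVM actually received. The sketch below reads the JVM arguments through RuntimeMXBean; the class and helper names are made up for illustration, and the "last mode switch wins" handling is an assumption about how the launcher resolves conflicting options.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class JitFlags {
    // Reports the execution mode implied by -Xint / -Xcomp / -Xmixed.
    // Assumption: if several are given, the last one takes effect.
    static String compilationMode(List<String> jvmArgs) {
        String mode = "mixed (default)";
        for (String arg : jvmArgs) {
            if (arg.equals("-Xint")) {
                mode = "interpreted only (-Xint)";
            } else if (arg.equals("-Xcomp")) {
                mode = "compile on first call (-Xcomp)";
            } else if (arg.equals("-Xmixed")) {
                mode = "mixed (default)";
            }
        }
        return mode;
    }

    // True if -Xbatch was passed (compilation in the foreground).
    static boolean foregroundCompilation(List<String> jvmArgs) {
        return jvmArgs.contains("-Xbatch");
    }

    // Returns the value of -XX:CompileThreshold=N, or null if it was not set.
    static String compileThreshold(List<String> jvmArgs) {
        String prefix = "-XX:CompileThreshold=";
        String value = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                value = arg.substring(prefix.length());
            }
        }
        return value;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments:     " + jvmArgs);
        System.out.println("mode:              " + compilationMode(jvmArgs));
        System.out.println("-Xbatch:           " + foregroundCompilation(jvmArgs));
        System.out.println("CompileThreshold:  " + compileThreshold(jvmArgs));
        System.out.println("tiered requested:  " + jvmArgs.contains("-XX:+TieredCompilation"));
    }
}
```

Run it with, for example, java -Xbatch -XX:CompileThreshold=1500 JitFlags. It only reports what was passed on the command line, not the defaults the JVM chose on its own.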
I'm not sure if it'll completely precompile the code, but you could add your class with the critical method to the JVM's shared data dump. See this question for more details.
Also, have you considered JNI? If your method is very CPU-intensive, it might speed things up considerably.
I have a threaded network application that I run under Eclipse (Indigo) and Java 1.7.x. For quite a while I have noticed that the first run of the application shows degraded performance at the start and at the end. For example, if I load the application and then hit it (using a test harness) with, say, 100 network packets, the first few responses and the last few are heavily erratic. [edit] Without unloading the application, and just running the test harness again, the application performs normally. [end edit]
I decided to try to get to the bottom of it and loaded up VisualVM 1.3.5 to profile the behaviour. The CPU usage has a distinct spike going from 10% to over 50% at the beginning of the run. After the spikes, everything appears normal, and as stated above subsequent runs do not have the leading spikes in CPU utilisation, and the profile of subsequent runs is identical to the profile between the spikes of the first run. There doesn't appear to be any evidence that the number of threads is causing it, but there is a small rise. Heap space increases from 100MB to 200MB, but other than that everything appears normal.
Any thoughts would be welcome.
Thanks
It's fairly typical for system performance to be erratic the first time you run a test. This is due to the operating system reading libraries, JAR files, and other data off disk and storing them in cache. Once this has been done the first time, all subsequent runs will be much faster and more consistent.
Also, keep in mind that the JVM will also tend to be slower right after it starts up. Due to its hotspot analysis and just-in-time compiling, the code will need to run a little while before the JVM optimizes the byte code for your particular workload.
This is typical for OSGi environments, where bundles may be initialized lazily upon first access to a bundle's classes or services.
You can figure out if this is the case in your scenario by starting Eclipse with the -console -consolelog arguments.
When the console opens and the application has loaded, issue the ss command and note which bundles are marked LAZY. Then run your test, issue ss again, and see if one of the LAZY bundles has become ACTIVE. If so, you can force an eager start of your bundles via the configuration/config.ini file. This can also be accomplished via the IStartup extension point.
Is there any Java profiler that allows profiling short-lived applications? The profilers I have found so far seem to work only with applications that keep running until the user terminates them. However, I want to profile applications that work like command-line utilities: they run and exit immediately. Tools like VisualVM or NetBeans Profiler do not even recognize that the application was run.
I am looking for something similar to Python's cProfile, in that the profiler result is returned when the application exits.
You can profile your application using the JVM's built-in HPROF.
It provides two methods:
sampling the active methods on the stack
timing method execution using injected bytecode (BCI, byte code injection)
Sampling
This method reveals how often methods were found on top of the stack.
java -agentlib:hprof=cpu=samples,file=profile.txt ...
Timing
This method counts the actual invocations of a method. The instrumenting code has been injected by the JVM beforehand.
java -agentlib:hprof=cpu=times,file=profile.txt ...
Note: this method will slow down the execution time drastically.
For both methods, the default filename is java.hprof.txt if the file= option is not present.
Full help can be obtained using java -agentlib:hprof=help or can be found in Oracle's documentation.
Sun Java 6 has the java -Xprof switch that'll give you some profiling data.
-Xprof output cpu profiling data
A program running 30 seconds is not short-lived. What you want is a profiler which can start your program instead of you having to attach to a running system. I believe most profilers can do that, but you would most likely prefer one integrated into an IDE. Have a look at NetBeans.
Profiling short-running Java applications has a couple of technical difficulties:
Profiling tools typically work by sampling the processor's SP or PC register periodically to see where the application is currently executing. If your application is short-lived, insufficient samples may be taken to get an accurate picture.
You can address this by modifying the application to run a number of times in a loop, as suggested by @Mike. You'll have problems if your app calls System.exit(), but the main problem is ...
The performance characteristics of a short-lived Java application are likely to be distorted by JVM warm-up effects. A lot of time will be spent in loading the classes required by your app. Then your code (and library code) will be interpreted for a bit, until the JIT compiler has figured out what needs to be compiled to native code. Finally, the JIT compiler will spend time doing its work.
I don't know if profilers attempt to compensate for JVM warmup effects. But even if they do, these effects influence your application's real behavior, and there is not a great deal that the application developer can do to mitigate them.
Returning to my previous point ... if you run a short lived app in a loop you are actually doing something that modifies its normal execution pattern and removes the JVM warmup component. So when you optimize the method that takes (say) 50% of the execution time in the modified app, that is really 50% of the time excluding JVM warmup. If JVM warmup is using (say) 80% of the execution time when the app is executed normally, you are actually optimizing 50% of 20% ... and that is not worth the effort.
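The arithmetic in that last paragraph can be written down as a one-line calculation. This is only a worked illustration of the 80% / 50% figures used above; the class and method names are made up.

```java
class WarmupShare {
    // Share of the TOTAL run time taken by a method, given the fraction of the
    // run spent in JVM warm-up and the method's share of the remaining time.
    static double shareOfTotal(double warmupFraction, double shareAfterWarmup) {
        return (1.0 - warmupFraction) * shareAfterWarmup;
    }

    public static void main(String[] args) {
        double share = shareOfTotal(0.80, 0.50);
        // With 80% warm-up, a method that takes 50% of the looped profile
        // accounts for roughly 10% of a normal, single execution.
        System.out.printf("share of a normal run: %.0f%%%n", share * 100);
    }
}
```

So even removing that method entirely would shorten a normal run by only about a tenth.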
If it doesn't take long enough, just wrap a loop around it, an infinite loop if you like. That will have no effect on the inclusive time percentages spent either in functions or in lines of code. Then, given that it's taking plenty of time, I just rely on this technique. That tells which lines of code, whether they are function calls or not, are costing the highest percentage of time and would therefore gain the most if they could be avoided.
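The idea of looking at what is on the stack at random moments can be approximated from inside the JVM with Thread.getStackTrace(). The sketch below is not the technique linked above, just a small in-process stack sampler under simplifying assumptions: busy() and step() are placeholder workloads, and each method is counted at most once per sample, so the numbers are inclusive percentages.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class StackSampler {
    static volatile boolean running;
    static volatile long sink; // keeps the computed value observable

    // Hypothetical workload: spins until 'running' is cleared.
    static void busy() {
        long x = 1;
        while (running) {
            x = step(x);
        }
        sink = x;
    }

    static long step(long x) {
        for (int i = 0; i < 1_000; i++) {
            x = x * 31 + i;
        }
        return x;
    }

    // Samples the worker's stack 'samples' times and counts, per method,
    // in how many samples that method appeared anywhere on the stack.
    static Map<String, Integer> profileBusyLoop(int samples, long intervalMs) {
        running = true;
        Thread worker = new Thread(StackSampler::busy, "worker");
        worker.setDaemon(true);
        worker.start();
        Map<String, Integer> hits = new HashMap<>();
        try {
            for (int i = 0; i < samples; i++) {
                Thread.sleep(intervalMs);
                Set<String> onStack = new HashSet<>();
                for (StackTraceElement frame : worker.getStackTrace()) {
                    onStack.add(frame.getClassName() + "." + frame.getMethodName());
                }
                for (String method : onStack) {
                    hits.merge(method, 1, Integer::sum);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            running = false; // lets the worker loop exit
        }
        return hits;
    }

    public static void main(String[] args) {
        int samples = 50;
        Map<String, Integer> hits = profileBusyLoop(samples, 10);
        hits.forEach((method, count) ->
                System.out.printf("%-45s %3d%%%n", method, 100 * count / samples));
    }
}
```

Methods that show up in most samples are the ones worth looking at first. A real profiler samples all threads and records line numbers as well.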
Start your application with profiling turned on, so that it waits for a profiler to attach. Any profiler that conforms to the Java profiling architecture should work. I've tried this with NetBeans's profiler.
Basically, when your application starts, it waits for a profiler to be attached before executing. So, technically, every line of code executed can be profiled.
With this approach, you can profile all kinds of things: threads, memory, CPU, method/class invocation times/durations...
http://profiler.netbeans.org/
The SD Java Profiler can capture statement block execution-count data no matter how short your run is. Relative execution counts will tell you where the time is spent.
You can use a measurement (metering) recording: http://www.jinspired.com/site/case-study-scala-compiler-part-9
You can also inspect the resulting snapshots: http://www.jinspired.com/site/case-study-scala-compiler-part-10
Disclaimer: I am the architect of JXInsight/OpenCore.
I suggest you try YourKit. It can profile from the start and dump the results when the program finishes. You have to pay for it, but you can get an eval license or use the EAP version without one (time limited).
YourKit can take a snapshot of a profile session, which can later be analyzed in the YourKit GUI. I use this feature to profile a short-lived command-line application I work on. See my answer to this question for details.
I have an interpreter written in Java. I am trying to test the performance results of various optimisations in the interpreter. To do this I parse the code and then repeatedly run the interpreter over the code; this continues until I get 5 runs which differ by a very small margin (0.1s in the times below), and then the mean is taken and printed. No I/O or randomness happens in the interpreter. If I run the interpreter again I get different run times:
91.8s
95.7s
93.8s
97.6s
94.6s
94.6s
107.4s
I have tried, to no avail, the server and client VMs, the serial and parallel GC, large tables, and Windows and Linux. These runs are on a 1.6.0_14 JVM. The computer has no processes running in the background. So I am asking: what may be causing these large variations, and how can I find out?
The actual issue was caused by the program having to iterate to a fixed-point solution while the values were stored in a HashSet. The hashed values differed between runs, resulting in a different ordering, which in turn led to a change in the number of iterations needed to reach the solution.
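One way such run-to-run differences can arise is when the elements do not override hashCode(), so the set falls back to identity hash codes. The post does not say what was stored or how the problem was fixed, so the following is only an illustration under that assumption, with a made-up Node type; it shows that a LinkedHashSet iterates in insertion order regardless of hash values.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class IterationOrder {
    // Hypothetical element type that does NOT override hashCode(), so it uses
    // the identity hash, which is not guaranteed to be the same on every run.
    static final class Node {
        final String name;

        Node(String name) {
            this.name = name;
        }
    }

    // Adds one Node per name to the set, then returns the names in iteration order.
    static List<String> visitOrder(Set<Node> set, String... names) {
        for (String name : names) {
            set.add(new Node(name));
        }
        List<String> order = new ArrayList<>();
        for (Node node : set) {
            order.add(node.name);
        }
        return order;
    }

    public static void main(String[] args) {
        // HashSet order depends on the hash values and may differ between runs.
        System.out.println("HashSet:       " + visitOrder(new HashSet<Node>(), "a", "b", "c", "d"));
        // LinkedHashSet always iterates in insertion order.
        System.out.println("LinkedHashSet: " + visitOrder(new LinkedHashSet<Node>(), "a", "b", "c", "d"));
    }
}
```

If an algorithm's iteration count depends on visiting order, a deterministic container removes that source of variation from the benchmark.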
"Wall clock time" is rarely a good measurement for benchmarking. A modern OS is extremely unlikely to "[have] no processes running in the background" -- for all you know, it could be writing dirty block buffers to disk, because it's decided that there's no other contention.
Instead, I recommend using ThreadMXBean to track actual CPU consumption.
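A minimal sketch of that suggestion, using the standard ThreadMXBean methods; the class name, the cpuNanos helper and the sample workload are made up for illustration. It measures only the calling thread, so work done on other threads (including GC and JIT threads) is not counted.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class CpuTimeBenchmark {
    // CPU time in nanoseconds consumed by the current thread while running 'task'.
    static long cpuNanos(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("CPU time is not supported on this JVM");
        }
        if (!threads.isThreadCpuTimeEnabled()) {
            threads.setThreadCpuTimeEnabled(true);
        }
        long before = threads.getCurrentThreadCpuTime();
        task.run();
        return threads.getCurrentThreadCpuTime() - before;
    }

    public static void main(String[] args) {
        long wallStart = System.nanoTime();
        long cpu = cpuNanos(() -> {
            // Hypothetical workload standing in for one interpreter run.
            long sum = 0;
            for (int i = 0; i < 5_000_000; i++) {
                sum += Integer.toString(i).hashCode();
            }
            if (sum == 42) {
                System.out.println("unlikely"); // uses 'sum' so the loop has an effect
            }
        });
        long wall = System.nanoTime() - wallStart;
        System.out.printf("wall: %d ms, cpu: %d ms%n", wall / 1_000_000, cpu / 1_000_000);
    }
}
```

A large gap between wall time and CPU time suggests the thread was waiting or descheduled rather than computing.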
Your variations don't look that large. It's simply the nature of the beast that there are other things running outside of your direct control, both in the OS and the JVM, and you're not likely to get exact results.
Things that could affect runtime:
If your test runs are creating objects (this may be invisible to you, within library calls, etc.) then your repeats may trigger a GC.
Different GC algorithms and settings will react differently, for example with different thresholds for incremental GC. You could try calling System.gc() before every run, although the JVM is not guaranteed to collect when you call it (although it always has when I've played with it). Depending on the size of your test, and how many iterations you're running, this may be an unpleasantly (and nearly uselessly) slow thing to wait for.
Are you doing any sort of randomization within your tests? e.g. if you're testing integers, values < |128| may be handled slightly differently in memory.
Ultimately I don't think it's possible to get an exact figure, probably the best you can do is an average figure around the cluster of results.
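To check whether GC activity lines up with the slow runs, the collection counters can be read before and after each run. This is a sketch using the standard GarbageCollectorMXBean API; the class and helper names and the allocation-heavy workload are placeholders, and the System.gc() call is only a hint, as noted above.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcPerRun {
    // Sum of collection counts over all collectors in this JVM.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Requests a GC, runs the task, and returns how many collections happened during it.
    static long collectionsDuring(Runnable run) {
        System.gc(); // a request only; the JVM may ignore it
        long before = totalCollections();
        run.run();
        return totalCollections() - before;
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 5; i++) {
            long start = System.nanoTime();
            long collections = collectionsDuring(() -> {
                // Hypothetical allocation-heavy workload.
                StringBuilder sb = new StringBuilder();
                for (int n = 0; n < 200_000; n++) {
                    sb.setLength(0);
                    sb.append(new int[64].length).append(n);
                }
            });
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("run %d: %d ms, %d collections%n", i, ms, collections);
        }
    }
}
```

The measured time here includes the System.gc() call itself, so for real measurements the timer would need to start after it.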
The garbage collection may be responsible. Even though your logic is the same, it may be that the GC logic is being scheduled on external clock/events.
But I don't know that much about the JVM's GC implementation.
This seems like a significant variation to me. I would try running with -verbose:gc.
You should be able to get the variation down to much less than a second if your process has no I/O, output or network activity of any significance.
I suggest profiling your application; it is highly likely to reveal significant savings if you haven't done this already.
I know that the JVM can do some pretty serious optimizations at runtime, especially in -server mode. Of course, it takes a little while for the JVM to settle down and reach peak performance. Is there any way to take a snapshot of those optimizations so they can be applied immediately the next time you run your app?
"Hey JVM! Great job optimizing my code. Could you write that down for me for later?"
Basically not yet with Sun's VM, but they have it in mind.
See various postings/comments under here:
http://blogs.oracle.com/fatcatair/category/Java
(Sorry: I can't find quite the right one about retaining stats over restart for immediate C1 compilation of known-hot-at-startup methods.)
But I don't know where all this stuff is right now.
Note that optimisations appropriate in steady-state may well not be appropriate at start-up and might indeed reduce start-up performance, and indeed two runs may not have the same hotspots...
Perhaps this might help: http://wikis.sun.com/display/HotSpotInternals/PrintAssembly.