Is stopwatch benchmarking acceptable? - java

Does anyone ever use stopwatch benchmarking, or should a performance tool always be used? Are there any good free tools available for Java? What tools do you use?
To clarify my concerns, stopwatch benchmarking is subject to error due to operating system scheduling. On a given run of your program the OS might schedule another process (or several) in the middle of the function you're timing. In Java things are a little worse still if you're trying to time a threaded application, as the JVM's thread scheduling adds yet more randomness to the mix.
How do you address operating system scheduling when benchmarking?

Stopwatch benchmarking is fine, provided you measure enough iterations to be meaningful. Typically, I require a total elapsed time of at least a few seconds. Otherwise, your results are easily and significantly skewed by scheduling and other OS interruptions to your process.
For this I use a little set of static methods I built a long time ago, which are based on System.currentTimeMillis().
For the profiling work I have used JProfiler for a number of years and have found it very good. I have recently looked over YourKit, which seems great from its website, but I've not used it at all personally.
To answer the question on scheduling interruptions, I find that doing repeated runs until consistency is achieved/observed works in practice to weed out anomalous results from process scheduling. I also find that thread scheduling has no practical impact for runs of between 5 and 30 seconds. Lastly, once you pass the few-seconds threshold, scheduling has, in my experience, negligible impact on the results; I find that a 5-second run consistently averages out to the same time per iteration as a 5-minute run.
You may also want to consider pre-running the tested code about 10,000 times to "warm up" the JIT, depending on how many times you expect the tested code to run in real life.
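Purely as an illustration (my actual helper methods aren't shown here), a minimal sketch of a currentTimeMillis()-based run with a warm-up phase might look like this, with testedCode() standing in for the real code under test:

    public class WarmedUpTiming {

        public static void main(String[] args) {
            long sink = 0;                         // keep results live so the JIT cannot discard the work
            for (int i = 0; i < 10000; i++) {      // warm-up: let the JIT compile the hot path first
                sink += testedCode();
            }

            int iterations = 1000000;              // pick a count that gives several seconds of total time
            long start = System.currentTimeMillis();
            for (int i = 0; i < iterations; i++) {
                sink += testedCode();
            }
            long elapsed = System.currentTimeMillis() - start;

            System.out.println(elapsed + " ms total, "
                    + (elapsed * 1000.0 / iterations) + " us/iteration (sink=" + sink + ")");
        }

        private static long testedCode() {
            return System.nanoTime() & 0xff;       // stand-in for the real code being measured
        }
    }

Adjust the iteration count so the timed section runs for at least a few seconds, per the advice above.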

It's totally valid as long as you measure large enough intervals of time. I would execute 20-30 runs of what you intend to test so that the total elapsed time is over 1 second. I've noticed that time calculations based on System.currentTimeMillis() tend to come out as either 0 ms or ~30 ms; I don't think you can get anything more precise than that. You may want to try System.nanoTime() if you really need to measure a small time interval:
documentation: http://java.sun.com/javase/6/docs/api/java/lang/System.html#nanoTime()
SO question about measuring small time spans, since System.nanoTime() has some issues, too: How can I measure time with microsecond precision in Java?
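For illustration, a small sketch of the granularity difference between the two clocks; doSmallThing() is a hypothetical very short operation:

    public class ClockGranularity {

        public static void main(String[] args) {
            long msStart = System.currentTimeMillis();
            doSmallThing();
            long msElapsed = System.currentTimeMillis() - msStart;  // usually 0 ms: the clock is too coarse

            long nsStart = System.nanoTime();
            doSmallThing();
            long nsElapsed = System.nanoTime() - nsStart;           // nanoTime values are only meaningful as differences

            System.out.println(msElapsed + " ms vs " + nsElapsed + " ns");
        }

        private static void doSmallThing() {
            Math.sqrt(12345.678);                                   // stand-in for a very short operation
        }
    }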

Stopwatch is actually the best benchmark!
The real end to end user response time is the time that actually matters.
It is not always possible to obtain this time using the available tools; for instance, most testing tools do not include the time it takes for a browser to render a page, so an overly complex page with badly written CSS will show sub-second response times to the testing tools but a five-second-plus response time to the user.
The tools are great for automated testing and for problem determination, but don't lose sight of what you really want to measure.

A profiler gives you more detailed information, which can help to diagnose and fix performance problems.
In terms of actual measurement, stopwatch time is what users notice so if you want to validate that things are within acceptable limits, stopwatch time is fine.
When you want to actually fix problems, however, a profiler can be really helpful.

You need to test a realistic number of iterations as you will get different answers depending on how you test the timing. If you only perform an operation once, it could be misleading to take the average of many iterations. If you want to know the time it takes after the JVM has warmed up you might run many (e.g. 10,000) iterations which are not included in the timings.
I also suggest you use System.nanoTime() as it's much more precise. If your test time is around 10 microseconds or less, you don't want to call it too often or it can change your result. (E.g. if I am testing for, say, 5 seconds and I want to know when that time is up, I only read nanoTime every 1000 iterations, provided I know an iteration is very quick.)
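A sketch of that pattern; codeUnderTest() is a hypothetical fast operation, and the clock is read only once per 1000 iterations so the nanoTime() calls themselves don't distort the result:

    public class SampledNanoTiming {

        public static void main(String[] args) {
            long testNanos = 5L * 1000 * 1000 * 1000;       // run for roughly 5 seconds
            long start = System.nanoTime();
            long iterations = 0;

            while (System.nanoTime() - start < testNanos) { // clock checked once per batch of 1000
                for (int i = 0; i < 1000; i++) {
                    codeUnderTest();
                }
                iterations += 1000;
            }

            long elapsed = System.nanoTime() - start;
            System.out.println(iterations + " iterations, "
                    + ((double) elapsed / iterations) + " ns/iteration");
        }

        private static void codeUnderTest() {
            // stand-in for the ~10 microsecond operation being measured
        }
    }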

How do you address operating system scheduling when benchmarking?
Benchmark for long enough on a system which is representative of the machine you will be using. If your OS slows down your application, then that should be part of the result.
There is no point in saying, my program would be faster, if only I didn't have an OS.
If you are using Linux, you can use tools such as numactl, chrt and taskset to control how CPUs are used and the scheduling.

Profilers can get in the way of timings, so I would use a combination of stopwatch timing to identify overall performance problems, then use the profiler to work out where the time is being spent. Repeat the process as required.

After all, it's probably the second most popular form of benchmarking, right after "no-watch benchmarking" - where we say "this activity seems slow, that one seems fast."
Usually what's most important to optimize is whatever interferes with the user experience - which is most often a function of how frequently you perform the action, and whatever else is going on at the same time. Other forms of benchmarking often just help zero in on these.

I think a key question is the complexity and length of time of the operation.
I sometimes even use physical stopwatch measurements to see if something takes minutes, hours, days, or even weeks to compute (I am working with an application where run times on the orders of several days are not unheard of, even if seconds and minutes are the most common time spans).
However, the automation afforded by calls to any kind of clock system on the computer, like the java millis call referred to in the linked article, is clearly superior to manually seeing how long something runs.
Profilers are nice, when they work, but I have had problems applying them to our application, which usually involves dynamic code generation, dynamic loading of DLLs, and work performed in the two built-in just-in-time-compiled scripting languages of my application. Profilers quite often assume a single source language, among other expectations that are unrealistic for complex software.

I ran a program today that searched through and collected information from a bunch of dBase files; it took just over an hour to run. I took a look at the code, made an educated guess at what the bottleneck was, made a minor improvement to the algorithm, and re-ran the program; this time it completed in 2.5 minutes.
I didn't need any fancy profiling tools or benchmark suites to tell me the new version was a significant improvement. If I needed to further optimize the running time I probably would have done some more sophisticated analysis but this wasn't necessary. I find that this sort of "stopwatch benchmarking" is an acceptable solution in quite a number of cases and resorting to more advanced tools would actually be more time-consuming in these cases.

I don't think stopwatch benchmarking is too horrible, but if you can get onto a Solaris or OS X machine you should check out DTrace. I've used it to get some great information about timing in my applications.

I always use stopwatch benchmarking as it is so much easier. The results don't need to be very accurate for me though. If you need accurate results then you shouldn't use stopwatch benchmarking.

I do it all the time. I'd much rather use a profiler, but the vendor of the domain-specific language I'm working with doesn't provide one.

Related

Java Optimization Paradigm

I am currently doing some Java optimization. It seems the best approach to assess my progress is to do repeated runs and collect run time statistics using System.nanoTime.
Most of my background has been with embedded DSP applications. In my embedded development I would have access to CPU cycle counters (which are a great measure for optimization). The JRE acts like a synthetic CPU; is there a way to get information on the number of instructions or JRE clock equivalents that were executed?
Thanks in advance for any hints. J.R.
The bytecode count is almost certainly useless for what you want. Bytecodes are entirely virtual/notional in terms of performance at run time. The only thing which matters is the elapsed time for code which has been warmed up.
I suggest you have a look at JMH for micro-benchmarking your code. http://openjdk.java.net/projects/code-tools/jmh/
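For reference, a minimal sketch of what a JMH benchmark might look like; the class name, state, and workload below are purely illustrative, not taken from the question:

    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public class SumBenchmark {

        private int[] data;

        @Setup
        public void setUp() {
            data = new int[10000];
            for (int i = 0; i < data.length; i++) {
                data[i] = i;
            }
        }

        @Benchmark
        public long sum() {
            long total = 0;
            for (int v : data) {
                total += v;
            }
            return total;   // returning the result stops the JIT from eliminating the loop
        }
    }

JMH takes care of warm-up iterations, forking and averaging, which is exactly the book-keeping that hand-rolled timing loops tend to get wrong.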
If you come from the DSP/realtime background I suggest looking at the latency distribution and minimising your allocation rate.
NOTE: The JVM injects "safe points" between instructions to check if the thread needs to stop e.g. to perform garbage collection. A garbage collection can take seconds. However safepoints are often optimised away to avoid slowing down your program.
In short, this means the time between instructions might be nothing, might be something, or might be seconds or even minutes, so I wouldn't bother counting them.

Is it possible to use a micro-benchmark framework to only time some statements?

I am planning to micro-benchmark my Java code, which involves several calls to local as well as remote databases. I was about to use System.nanoTime() but started reading about micro-benchmarking frameworks such as jmh and caliper. Use of these frameworks is definitely recommended, but from whatever (little) I have read, it seems that we can benchmark only a complete method, and that they let us do this non-invasively (w.r.t. existing code), i.e., we need not litter existing code with jmh/caliper code or annotations.
I want to benchmark only specific pieces of code (statements) within some methods. Is it possible to do this with any of micro benchmarking frameworks? Please provide some insights into this.
I guess calls to a DB are usually expensive enough to eliminate most of the problems with microbenchmarking. So your approach was probably fine. If you're measuring it in production, repeating the measurement many times, and don't care about a few nanoseconds, stick with System.nanoTime.
You're doing something very different from microbenchmarking like e.g. I did here. You're not trying to optimize a tiny piece of code and you don't want to eliminate external influences.
Microbenchmarking a part of a method makes no sense to me, as a method gets optimized as a whole (and possibly also inlined). It's a different level.
I don't think any framework could help; all they can do in your case is automate the work, which you don't seem to need. Note that System.nanoTime may take several hundred cycles (which is probably fine in your case).
You can try using Metrics from Codahale.
I found it easy to use and low-overhead if you use it in a certain configuration, i.e. with an exponentially decaying reservoir.
Micro-level, precise benchmarking does come with an associated cost, i.e. memory overhead at run time for sampling, and the benchmark itself may take time for calculation and stats generation (ideally you would offset that from the stats).
But if you want to benchmark a DB connection, which I don't think should be very frequent, Metrics might be appropriate. I found it easy to use, and yes, it is a bit invasive, but configurable.
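As a hedged sketch only (assuming the Codahale Metrics library is on the classpath; runQuery() is a stand-in for the real database call), a Timer backed by an exponentially decaying reservoir could be wired up like this:

    import java.util.concurrent.TimeUnit;

    import com.codahale.metrics.ConsoleReporter;
    import com.codahale.metrics.ExponentiallyDecayingReservoir;
    import com.codahale.metrics.MetricRegistry;
    import com.codahale.metrics.Timer;

    public class DbTiming {

        private static final MetricRegistry registry = new MetricRegistry();
        // Timer backed by an exponentially decaying reservoir, as mentioned above.
        private static final Timer dbTimer =
                registry.register("db.query", new Timer(new ExponentiallyDecayingReservoir()));

        static void queryDatabase() {
            Timer.Context context = dbTimer.time();
            try {
                runQuery();              // hypothetical call to the local or remote database
            } finally {
                context.stop();          // records the elapsed time in the reservoir
            }
        }

        public static void main(String[] args) {
            for (int i = 0; i < 100; i++) {
                queryDatabase();
            }
            ConsoleReporter.forRegistry(registry)
                    .convertDurationsTo(TimeUnit.MILLISECONDS)
                    .build()
                    .report();           // prints count, mean, percentiles, etc.
        }

        private static void runQuery() {
            // stand-in for the real database call being benchmarked
        }
    }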

Detecting and pinpointing performance regressions

Are there any known techniques (and resources related to them, like research papers or blog entries) which describe how to dynamically and programmatically detect the part of the code that caused a performance regression, if possible on the JVM or some other virtual machine environment (where techniques such as instrumentation can be applied relatively easily)?
In particular, when having a large codebase and a bigger number of committers to a project (like, for example, an OS, language or some framework), it is sometimes hard to find out the change that caused a performance regression. A paper such as this one goes a long way in describing how to detect performance regressions (e.g. in a certain snippet of code), but not how to dynamically find the piece of the code in the project that got changed by some commit and caused the performance regression.
I was thinking that this might be done by instrumenting pieces of the program to detect the exact method which causes the regression, or at least narrowing the range of possible causes of the performance regression.
Does anyone know about anything written about this, or any project using such performance regression detection techniques?
EDIT:
I was referring to something along these lines, but doing further analysis into the codebase itself.
Perhaps not entirely what you are asking, but on a project I've worked on with extreme performance requirements, we wrote performance tests using our unit testing framework, and glued them into our continuous integration environment.
This meant that every check-in, our CI server would run tests that validated we hadn't slowed down the functionality beyond our acceptable boundaries.
It wasn't perfect - but it did allow us to keep an eye on our key performance statistics over time, and it caught check-ins that affected the performance.
Defining "acceptable boundaries" for performance is more an art than a science - in our CI-driven tests, we took a fairly simple approach, based on the hardware specification; we would fail the build if the performance tests exceeded a response time of more than 1 second with 100 concurrent users. This caught a bunch of lowhanging fruit performance issues, and gave us a decent level of confidence on "production" hardware.
We explicitly didn't run these tests before check-in, as that would slow down the development cycle - forcing a developer to run through fairly long-running tests before checking in encourages them not to check in too often. We also weren't confident we'd get meaningful results without deploying to known hardware.
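Our actual tests aren't shown here; purely as an illustration, a crude JUnit-style sketch of that kind of CI gate (callTheFunctionalityUnderTest() is a hypothetical request) might look like this:

    import static org.junit.Assert.assertTrue;

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    import org.junit.Test;

    public class ResponseTimePerformanceTest {

        private static final int CONCURRENT_USERS = 100;
        private static final long MAX_AVERAGE_MILLIS = 1000;

        @Test
        public void averageResponseTimeStaysWithinBudget() throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(CONCURRENT_USERS);
            final AtomicLong totalMillis = new AtomicLong();
            final CountDownLatch done = new CountDownLatch(CONCURRENT_USERS);

            for (int i = 0; i < CONCURRENT_USERS; i++) {
                pool.execute(new Runnable() {
                    public void run() {
                        long start = System.currentTimeMillis();
                        callTheFunctionalityUnderTest();   // hypothetical request to the system under test
                        totalMillis.addAndGet(System.currentTimeMillis() - start);
                        done.countDown();
                    }
                });
            }
            assertTrue("performance test did not finish in time", done.await(5, TimeUnit.MINUTES));
            pool.shutdown();

            long average = totalMillis.get() / CONCURRENT_USERS;
            assertTrue("average response time " + average + " ms exceeds budget",
                    average <= MAX_AVERAGE_MILLIS);
        }

        private void callTheFunctionalityUnderTest() {
            // stand-in for the real request the CI test would exercise
        }
    }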
With tools like YourKit you can take a snapshot of the performance breakdown of a test or application. If you run the application again, you can compare performance breakdowns to find differences.
Performance profiling is more of an art than a science. I don't believe you will find a tool which tells you exactly what the problem is; you have to use your judgement.
For example, say you have a method which is taking much longer than it used to. Is it because the method has changed, or because it is being called a different way, or much more often? You have to use some judgement of your own.
JProfiler allows you to see a list of instrumented methods which you can sort by average execution time, inherent time, number of invocations, etc. I think if this information is saved across releases one can get some insight into regressions. Of course the profiling data will not be accurate if the tests are not exactly the same.
Some people are aware of a technique for finding (as opposed to measuring) the cause of excess time being taken.
It's simple, but it's very effective.
Essentially it is this:
If the code is slow it's because it's spending some fraction F (like 20%, 50%, or 90%) of its time doing something X unnecessary, in the sense that if you knew what it was, you'd blow it away, and save that fraction of time.
During the general time it's being slow, at any random nanosecond the probability that it's doing X is F.
So just drop in on it, a few times, and ask it what it's doing.
And ask it why it's doing it.
Typical apps are spending nearly all their time either waiting for some I/O to complete, or some library function to return.
If there is something in your program taking too much time (and there is), it is almost certainly one or a few function calls, that you will find on the call stack, being done for lousy reasons.
Here's more on that subject.
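This is normally done by pausing the program in a debugger or taking a few jstack snapshots rather than in code; purely as an illustration, a crude programmatic approximation that dumps all thread stacks a few times while the slow code runs might look like this:

    import java.util.Map;

    public class StackSampler {

        // Starts a daemon thread that prints every thread's stack a few times,
        // spaced intervalMillis apart, while the slow code is running.
        public static void startSampling(final int samples, final long intervalMillis) {
            Thread sampler = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < samples; i++) {
                        try {
                            Thread.sleep(intervalMillis);
                        } catch (InterruptedException e) {
                            return;
                        }
                        System.out.println("---- sample " + (i + 1) + " ----");
                        for (Map.Entry<Thread, StackTraceElement[]> entry
                                : Thread.getAllStackTraces().entrySet()) {
                            System.out.println(entry.getKey().getName());
                            for (StackTraceElement frame : entry.getValue()) {
                                System.out.println("    at " + frame);
                            }
                        }
                    }
                }
            });
            sampler.setDaemon(true);
            sampler.start();
        }
    }

Call StackSampler.startSampling(5, 1000) just before kicking off the slow operation, then read the handful of stacks by eye: whatever you keep seeing on the stack is where the time is going.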

Can a profiler change how long recursive calls take to run in Java?

I'm working on refactoring some code in Java, so I'm timing things to make sure the code doesn't get any slower. However, the new refactored code seems to take more time than the original code. Remarkably, when I run the code with a profiler, the new code is significantly faster than the old code. The primary difference is that the old code is recursive, while the new code is iterative. Can a profiler affect the recursive code by a factor of several hundred thousand while only affecting the iterative code by a factor of 1.5?
I'm running on Mac OS X 10.6.6, 3 GB RAM, 2.4 GHz CPU, using the default NetBeans 6.9 profiler with Java 1.6.0_22 64-Bit Server.
(Both methods have self-timing code using System.currentTimeMillis() to allow me to compare the times when not using a profiler, but this shouldn't affect things noticeably.)
Yes. Most profilers do instrumentation at the level of method invocations. In the recursive form, the profiler must take a lot more measurements than in the iterative form. While profilers do try to subtract their overhead from the reported numbers, this is very difficult to do reliably. Different profilers will be better or worse at this.
I'm working on refactoring some code in Java, so I'm timing things to make sure the code doesn't get any slower. However, the new refactored code seems to take more time than the original code.
Yes. Code typically runs slower under a profiler.
Therefore, you should compare times of old / new version of your application, either both run under the profiler, or both run normally.
Also be aware that a profiler can actually distort performance characteristics. And different profilers may disagree about where the code hotspots are. So it is a good idea to run/compare versions of your application without profiling before you adopt an optimization that you are trialing.
(Both methods have self-timing code using System.currentTimeMillis() to allow me to compare the times when not using a profiler, but this shouldn't affect things noticeably.)
There are traps here too:
The best possible granularity of currentTimeMillis() is 1 millisecond, but on some OSes it might be tens of milliseconds.
If you don't take steps to avoid this, your manual timings can include distorting overheads such as the JIT compilation times and the GC times. The effects could be quite subtle.
I would say if you want to measure speed, just measure speed, don't profile. They're not the same thing. Instrumenting profilers put a lot of overhead into each function call, and if all you want is an overall speed difference, it won't be accurate because you're partly measuring the cost of the instrumentation itself.
If you want to find out what is taking the time, that is different from measuring. A wall-clock-time stack-sampling profiler (not instrumentation) that reports line-level percentages is your best bet. It doesn't matter if it slows the program down, because its purpose is not to measure speed; its purpose is to find out where the time is going and why, on a percentage basis. It would be OK if it or something else slowed the program down by 10%, or 10 times, so long as it showed you where the time was being taken, independent of speed.
I only say this because lots of people are confused about this point, and the confusion gets solidified into lots of profilers.
More on that subject.

Java performance Inconsistent

I have an interpreter written in Java. I am trying to test the performance results of various optimisations in the interpreter. To do this I parse the code and then repeatedly run the interpreter over it; this continues until I get 5 runs which differ by a very small margin (0.1 s in the times below), then the mean is taken and printed. No I/O or randomness happens in the interpreter. If I run the interpreter again I get different run times:
91.8s
95.7s
93.8s
97.6s
94.6s
94.6s
107.4s
I have tried, to no avail, the server and client VM, the serial and parallel GC, large tables, and Windows and Linux. These are on the 1.6.0_14 JVM. The computer has no processes running in the background. So I'm asking: what may be causing these large variations, or how can I find out what is?
The actual issue was that the program had to iterate to a fixed-point solution and the values were stored in a HashSet. The hash values differed between runs, resulting in a different ordering, which in turn led to a change in the number of iterations needed to reach the solution.
"Wall clock time" is rarely a good measurement for benchmarking. A modern OS is extremely unlikely to "[have] no processes running in the background" -- for all you know, it could be writing dirty block buffers to disk, because it's decided that there's no other contention.
Instead, I recommend using ThreadMXBean to track actual CPU consumption.
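A short sketch of that idea, assuming CPU-time measurement is supported on your JVM/OS; runInterpreter() stands in for one run of the interpreter over the parsed code:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;

    public class CpuTimedRun {

        public static void main(String[] args) {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            if (!threads.isCurrentThreadCpuTimeSupported()) {
                System.out.println("CPU time measurement not supported on this JVM/OS");
                return;
            }
            long startCpu = threads.getCurrentThreadCpuTime();   // ns of CPU actually used by this thread
            runInterpreter();                                    // hypothetical workload being measured
            long cpuNanos = threads.getCurrentThreadCpuTime() - startCpu;
            System.out.printf("CPU time consumed: %.3f s%n", cpuNanos / 1e9);
        }

        private static void runInterpreter() {
            // stand-in for one run of the interpreter
        }
    }

Unlike wall-clock time, this only counts time the thread actually spent on a CPU, so time lost to other processes being scheduled does not inflate the number.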
Your variations don't look that large. It's simply the nature of the beast that there are other things running outside of your direct control, both in the OS and the JVM, and you're not likely to get exact results.
Things that could affect runtime:
if your test runs are creating objects (may be invisible to you, within library calls, etc) then your repeats may trigger a GC
Different GC algorithms and configurations will react differently, with different thresholds for incremental GC. You could try running System.gc() before every run, although the JVM is not guaranteed to collect when you call that (although it always has when I've played with it). Depending on the size of your test and how many iterations you're running, this may be an unpleasantly (and nearly uselessly) slow thing to wait for.
Are you doing any sort of randomization within your tests? e.g. if you're testing integers, values < |128| may be handled slightly differently in memory.
Ultimately I don't think it's possible to get an exact figure, probably the best you can do is an average figure around the cluster of results.
The garbage collector may be responsible. Even though your logic is the same, it may be that the GC logic is being scheduled on external clocks/events.
But I don't know that much about the JVM's GC implementation.
This seems like a significant variation to me; I would try running with -verbosegc.
You should be able to get the variation to much less than a second if your process has no IO, output or network of any significance.
I suggest profiling your application, there is highly likely to be significant saving if you haven't done this already.
