Would anyone have an explanation, or even better a suggested fix, for why the time taken to execute Mockito mocks is so erratic? The simplest SSCCE I could come up with for this is below:
import static org.mockito.Mockito.mock;

public class TestSimpleMockTiming
{
    public static final void main (final String args [])
    {
        final Runnable theMock = mock (Runnable.class);

        int tookShort = 0;
        int tookMedium = 0;
        int tookLong = 0;
        int tookRidiculouslyLong = 0;
        long longest = 0;

        for (int n = 0; n < 2000000; n++)
        {
            final long startTime = System.nanoTime ();
            theMock.run ();
            final long duration = System.nanoTime () - startTime;

            if (duration < 1000000)             // 0.001 seconds
                tookShort++;
            else if (duration < 100000000)      // 0.1 seconds
                tookMedium++;
            else if (duration < 1000000000)     // 1 second !!!
                tookLong++;
            else
                tookRidiculouslyLong++;

            longest = Math.max (longest, duration);
        }

        System.out.println (tookShort + ", " + tookMedium + ", " + tookLong + ", " + tookRidiculouslyLong);
        System.out.println ("Longest duration was " + longest + " ns");
    }
}
If I run this (from within Eclipse, using JDK 1.7.45 on Win 7 x64) typical output looks like:
1999983, 4, 9, 4
Longest duration was 5227445252 ns
So, while in the majority of situations the mock executes very fast, there are several executions that take even longer than 1 second. That's an eternity for a method that does nothing. From my experimenting with this, I don't believe the issue is the accuracy of System.nanoTime (); I think the mock really does take that long to execute. Is there anything I can do to improve on this and make the timing behave more consistently?
(FYI, why this is an issue is that I have a Swing app which contains various frames, and I try to write JUnit tests for the frames so that I can test that the layoutManagers behave correctly without having to fire up the whole app and navigate to the correct screen. In one such test, the screen uses a javax.swing.Timer to implement scrolling, so the display will pan around an area when the mouse is held near the end of the frame. I noticed the behaviour of this was very erratic, and the scrolling while usually fine would periodically freeze for up to a second and it looked dreadful. I wrote an SSCCE around this, thinking the problem was that Swing Timers can't be depended on to fire at a consistent rate, and in the SSCCE it worked perfectly.
After hours of tearing my hair out trying to spot differences between my real code and the scrolling demo SSCCE, I started putting nano timers around blocks of code that ran repeatedly, noticed the time taken by my paintComponent method to be very erratic, and eventually narrowed it down to a mock call. Testing the screen by running the real app, the scrolling behaves smoothly; it's only a problem in the JUnit test because of the mock call, which led me to test a simple mock in isolation with the SSCCE posted above.)
Many thanks!
This test is flawed in multiple ways. If you want to benchmark properly I strongly suggest using JMH; it is written by Alexey Shipilev, someone much smarter than us and definitely more knowledgeable about the JVM than most people doing Java on our beloved planet.
Here are the most notable ways the test is flawed.
The test ignores what the JVM is doing, like the warmup phase, the C1 and C2 compiler threads, GC, threading issues (even though this code is not multi-threaded, the JVM/OS may have to do something else), etc.
The test seems to ignore whether the actual OS/JVM/CPU combination offers a proper resolution down to the nanosecond.
Even though there's a System.nanoTime(), are you sure the JVM and the OS have the proper resolution? On Windows, for example, the JVM doesn't have access to real nanoseconds, but instead to some counter, not wall-clock time. The javadoc states as much; here's a snippet:
This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time. The value returned represents nanoseconds since some fixed but arbitrary origin time (perhaps in the future, so values may be negative). The same origin is used by all invocations of this method in an instance of a Java virtual machine; other virtual machine instances are likely to use a different origin.
This method provides nanosecond precision, but not necessarily nanosecond resolution (that is, how frequently the value changes) - no guarantees are made except that the resolution is at least as good as that of currentTimeMillis().
The test also ignores how Mockito works.
Mockito stores every invocation in its own model in order to be able to verify these calls after executing the scenario. So on every iteration of the loop Mockito stores another invocation, up to 2M invocations, which will impact the JVM (the mock instance may survive several collections and be promoted to the tenured generation, which is definitely more costly for the GC). That means the more iterations there are, the more this code stresses the JVM, not Mockito.
I believe it's not yet released (there are dev binaries on JCenter, however), but Mockito will offer a setting that makes a mock stub-only, so that it does not store invocations; that may allow Mockito to fit well in a scenario like this one.
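If your Mockito version already ships that setting (MockSettings.stubOnly() in recent releases), the SSCCE's mock could be created like this. This is a sketch, assuming Mockito is on the classpath; note that verify() cannot be used on a stub-only mock:

```java
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.withSettings;

public class StubOnlyExample
{
    public static void main (final String args [])
    {
        // A stub-only mock does not record invocations, so the 2M-iteration
        // loop no longer accumulates 2M invocation objects on the heap
        final Runnable theMock = mock (Runnable.class, withSettings ().stubOnly ());

        for (int n = 0; n < 2000000; n++)
            theMock.run ();
    }
}
```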
The test lacks proper statistical analysis.
Interestingly enough, the code of the test has a pseudo-percentile approach, which is good! Although it doesn't work like that, and in this case it cannot catch the big issue. Instead the test should record every measurement in order to extract the trend of the time Mockito spends as the iteration count advances.
And if you want, it's a good idea to store every recorded measurement, so they can be fed to a proper statistical analysis tool like R in order to extract graphs, percentile data, etc.
On that statistical matter it would certainly be interesting to use HdrHistogram. Outside a microbenchmark, of course, as it will impact memory and alter the result of the microbenchmark. Let's keep that for JMH.
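If pulling in HdrHistogram is overkill for a first look, the record-everything-then-extract-percentiles idea can be sketched with the plain JDK (class and method names here are mine, purely illustrative):

```java
import java.util.Arrays;

public class PercentileSketch {
    /** p in [0, 100]; samples must be sorted ascending. */
    static long percentile(long[] sorted, double p) {
        int idx = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, Math.min(idx, sorted.length - 1))];
    }

    public static void main(String[] args) {
        // Stand-in for recorded call durations; in the real benchmark,
        // record every measurement instead of bucketing on the fly
        long[] samples = new long[1000];
        for (int n = 0; n < samples.length; n++)
            samples[n] = n + 1;

        Arrays.sort(samples);
        System.out.println("p50 = " + percentile(samples, 50) + " ns, p99 = "
            + percentile(samples, 99) + " ns, max = " + samples[samples.length - 1] + " ns");
    }
}
```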
Both points 1 and 2 can be addressed if you change the code to use JMH.
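For illustration, a minimal JMH sketch of the same measurement; this assumes the JMH dependency and its annotation processor are set up, so it is not runnable standalone:

```java
import java.util.concurrent.TimeUnit;

import org.mockito.Mockito;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.SampleTime)        // samples a distribution, including percentiles
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class MockBenchmark {
    Runnable theMock;

    @Setup
    public void setup() {
        theMock = Mockito.mock(Runnable.class);
    }

    @Benchmark
    public void callMock() {
        theMock.run();  // JMH handles warmup iterations and forking for you
    }
}
```

JMH then reports p50/p90/p99/max sample times, which is exactly the information the hand-rolled buckets were trying to approximate.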
Hope that helps.
A JVM is a very complex thing that does a lot of optimization at runtime (including caching and bytecode optimization). Thus, when measuring the execution time of Java programs, you should first of all do a warmup phase before the actual benchmark.
I expect that your first few runs took the longest time and that afterwards the execution time became better and better.
Execute your benchmark a few hundred or thousand times before you actually start profiling. Afterwards, I expect your measurement results will become more stable.
Related
Sorting-Algorithms get faster (in Java)?!
I have implemented some sorting algorithms and a method getNanoTime which returns the nano time taken by a sorting algorithm.
I wanted to calculate an average value. I noticed that the average times are different from the time measured when testing the algorithm a single time.
I thought I did something wrong.
But then I found it.
When doing:
int length = 5000;
int bereich = 1000;
long time;
time = Bubblesort.getNanoTime(length, bereich);
System.out.println("BUBBLESORT: " + (1.0 * time / 1_000_000) + " ms");
time = Insertionsort.getNanoTime(length, bereich);
System.out.println("INSERTIONSORT: " + (1.0 * time / 1_000_000) + " ms");
time = Mergesort.getNanoTime(length, bereich);
System.out.println("MERGESORT: " + (1.0 * time / 1_000_000) + " ms");
time = Quicksort.getNanoTime(length, bereich);
System.out.println("QUICKSORT: " + (1.0 * time / 1_000_000) + " ms");
time = Selectionsort.getNanoTime(length, bereich);
System.out.println("SELECTIONSORT: " + (1.0 * time / 1_000_000) + " ms");
I got:
BUBBLESORT: 75.7835 ms
INSERTIONSORT: 27.250875 ms
MERGESORT: 17.450083 ms
QUICKSORT: 7.092709 ms
SELECTIONSORT: 967.638792 ms
But when doing for example:
for (int i = 0; i < 20; i++) {
    System.out.println(1.0 * Bubblesort.getNanoTime(5000, 1000) / 1_000_000);
}
I got:
85.473625 ms
62.681959 ms
68.866542 ms
48.737333 ms
47.402708 ms
47.368708 ms
47.567792 ms
47.018042 ms
45.1795 ms
47.871416 ms
49.570208 ms
50.285875 ms
56.37975 ms
50.342917 ms
50.262833 ms
50.036959 ms
50.286542 ms
51.752708 ms
50.342458 ms
51.511541 ms
The first time is always high (here the first time is 85 ms) and the times after the first are lower.
So, I think, the machine learns, and it becomes faster
Could that be?
Do you know more?
I think, the machine learns, and it becomes faster
Yup.
Look up Just-in-time compilation and while you're at it, spend a few weeks becoming a rocket scientist so that you can fully understand how CPU caches work.
Or, if you don't want to spend the next 10 weeks studying but you do want a much better idea of how any of this works, read the rest of this answer, then go look at this talk by Douglas Hawkins about JVM performance puzzlers. I bet after watching those 40 minutes you will feel fully enlightened about this conundrum.
Two things are going on here (a JIT warmup effect, and a cache-page effect), possibly more:
The JIT is 'warming up': The way java works, is that it runs your class file code in the dumbest, slowest, stupidest way possible, and is in addition wasting even more time maintaining a boatload of bookkeeping on the code, such as 'how often does this if block enter vs how often it is skipped?' for no good reason. Everything runs slow as molasses. Intentionally so, really.
But then... because of all that bookkeeping the JVM at some point goes: Huh. Literally (and I'm not overstating the case here, this is very common!) 99% of the CPU's time is spent on this 0.1% of the entire code base.
It then takes some time to analyse the daylights out of that 0.1%, creating a very finely tuned machine code version that is exactly perfect for the actual CPU you're running on. This takes a lot of time, will use all that bookkeeping (which is, after all, not so pointless!) to do things such as re-order code so that the most-often taken 'branch' in an if/else block is the one that can run without code jumps (which are slow due to pipeline resets), and will even turn currently-observed-truths-which-may-not-hold-later into assumptions. As in, the code is 'compiled' into machine code that will straight up not work if these assumptions (that so far have been, due to all that bookkeeping, observed to always be true) end up being false, and hooks are then added throughout the entire VM so that if any code happens to break an assumption, the finely crafted machine code is flagged as invalid and will no longer be used.
For example, if you have a class that is not final, it can be extended. And java is always dynamic dispatch: if you invoke foo.hello(), you get the hello() implementation of the actual type of the object the foo variable is pointing at, not the type of the foo expression itself. Classloading in java is inherently dynamic (classes can be loaded at any time; a JVM never knows that it is 'done loading classes'). This means a lookup table must be involved. A pricey annoyance! But the hotspot optimizer gets around it and eliminates the table: if the optimizer figures out that a non-final class is nevertheless not currently extended, or that no extension overrides the implementation in question, then it can omit the lookup table and link a method call straight to the implementation. It also adds hooks to the classloader so that if any class is ever loaded that extends the targeted class (and changes the impl of the relevant method), the machine code that jumps straight to the impl is invalidated.
That method's actual performance takes a nosedive again, as the JVM goes back to the slow-as-molasses way. If it's still being run a lot, no worries. hotspot will do another pass, this time taking into account there are multiple implementations.
Once this machine code is available, all calls to this method are redirected to run using this finely tuned machine code. Which is incredibly fast; usually faster than -O3 compiled C code, in fact, because the JVM gets the benefit of taking runtime behaviour into account which a C compiler can never do.
Nevertheless usually only about 1% of all code in a VM is actually running in this mode. The simple truth is that almost all code of any app is irrelevant performance wise. It doesn't do anything complicated, isn't running during 'hot' moments, and it just. doesn't. matter. The smartypants analysis is left solely for the 1% or so that is actually going to run lots.
And that's what likely explains a large chunk of that difference: A bunch of your sort algorithm's loops are running in dogslow (not-hotspotted) mode, whereas once the hotspotting is done during your first sort run, the next sort runs get the benefit of the hotspotted code from the start.
Secondarily, data needs to be in a cache page for the CPU to really do it quickly. Often repeating a calculation means the first run through gets the penalty of the CPU having to swap in a bunch of cache pages, whereas all future runs don't need to pay that price as the relevant parts of memory are already in the caches.
The conclusion is simple: micro-benchmarking like this is incredibly complicated, you can't just time it with System.nanoTime. JVMs are incredibly complicated, CPUs are incredibly complicated, and even the seasoned engineers who spend their days writing the JVM itself are on record as stating they are far too stupid to guess performance like this. So you stand absolutely no chance whatsoever.
The solution is, fortunately, also very very simple. Those same JVM engineers wanna know how fast stuff runs so they wrote an entire framework that lets you micro-benchmark, actively checking for hotspot warmups, doing a bunch of dry runs, ensuring that the optimizer doesn't optimize away your entire algorithm (which can happen, if you sort a list and then toss the list in the garbage, the optimizer might just figure out that the entire sort op is best optimized by skipping it entirely, because, hey, if nobody actually cares about the results of the sort, why sort, right? You need a 'sink' to ensure that the optimizer doesn't conclude it can just junk the whole thing due to the data being discarded!) - and it is called JMH. Rewrite your benchmark in JMH and let it rip. You'll find that it times consistently, and those times are generally meaningful (vs what you wrote, which means mostly nothing).
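A rough sketch of what such a JMH rewrite could look like for a sort benchmark (assumes JMH on the classpath; the Blackhole parameter is the 'sink' mentioned above):

```java
import java.util.Arrays;
import java.util.Random;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Thread)
public class SortBenchmark {
    int[] data;

    @Setup
    public void setup() {
        // Fixed seed so every fork sorts the same data
        data = new Random(42).ints(5000, 0, 1000).toArray();
    }

    @Benchmark
    public void sort(Blackhole sink) {
        int[] copy = Arrays.copyOf(data, data.length); // sort a fresh copy each call
        Arrays.sort(copy); // stand-in for the sort under test (Bubblesort etc.)
        sink.consume(copy); // the sink: stops the JIT from deleting the whole sort
    }
}
```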
I have developed an image processing algorithm in core Java (without using any third party API). Now I have to calculate the execution time of that algorithm; for this I have used System.currentTimeMillis() like this:
public class MyAlgo {
    public MyAlgo(String imagePath) {
        long stTime = System.currentTimeMillis();
        // ..........................
        // My Algorithm
        // ..........................
        long endTime = System.currentTimeMillis();
        System.out.println("Time ==> " + (endTime - stTime));
    }

    public static void main(String args[]) {
        new MyAlgo("d:\\myImage.bmp");
    }
}
But the problem is that each time I run this program I get a different execution time. Can anyone please suggest how I can do this?
If you don't want to use external profiling libraries just wrap your algorithm in a for() loop that executes it 1000 times and divide the total time by 1000. The result will be much more accurate since all the other tasks/processes will even out.
Note: the measured time will reflect the expected time for the algorithm to finish, not just the time that the algorithm's code instructions require.
For example, if your algorithm uses a lot of memory and on average the Java VM calls the garbage collector twice per execution of the algorithm, then you should take the time of the garbage collector into account as well.
Averaging over a for() loop does exactly that, so you will get good results.
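That loop-and-divide approach can be sketched as a small helper (the names are mine, and the Math.sqrt call merely stands in for the real image processing algorithm):

```java
public class AverageTimer {
    /** Runs task repeatedly and returns the average wall-clock time per run, in nanoseconds. */
    static long averageNanos(Runnable task, int runs) {
        long startTime = System.nanoTime();
        for (int n = 0; n < runs; n++)
            task.run();
        return (System.nanoTime() - startTime) / runs;
    }

    public static void main(String[] args) {
        // Stand-in workload for the algorithm being measured
        long avg = averageNanos(() -> Math.sqrt(123456789), 1000);
        System.out.println("Average ==> " + avg + " ns");
    }
}
```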
You cannot get a reliable result from one execution alone; Java (well, JVMs) does runtime optimizations, plus there are other processes competing for CPU time/resource access. Also, are you sure your algorithm runs in constant time whatever the inputs?
Your best bet to have a calculation as reliable as possible is to use a library dedicated to performance measurements; one of them is caliper.
Set up a benchmark with different inputs/outputs etc and run it.
You need to apply some statistical analysis over multiple executions of your algorithm. For instance, execute it 1000 times and analyze min, max and average time.
Multiple executions in different scenarios might provide insights too, for instance, in different hardware or with images with different resolution.
I suppose your algorithm can be divided in multiple steps. You can monitor the steps independently to understand the impact of each one.
Marvin Image Processing Framework, for instance, provides methods to monitor and analyze the time and the number of executions of each algorithm step.
I am trying to measure the complexity of an algorithm using a timer to measure the execution time, whilst changing the size of the input array.
The code I have at the moment is rather simple:
private long start;

public void start() {
    start = System.nanoTime();
}

public long stop() {
    long time = System.nanoTime() - start;
    start = 0;
    return time;
}
It appears to work fine, up until the size of the array becomes very large, and what I expect to be an O(n) complexity algorithm turns out appearing to be O(n^2). I believe that this is due to the threading on the CPU, with other processes cutting in for more time during the runs with larger values for n.
Basically, I want to measure how much time my process has been running for, rather than how long it has been since I invoked the algorithm. Is there an easy way to do this in Java?
Measuring execution time is a really interesting, but also complicated topic. To do it right in Java, you have to know a little bit about how the JVM works. Here is a good article from developerWorks about benchmarking and measuring. Read it, it will help you a lot.
The author also provides a small framework for doing benchmarks. You can use this framework. It will give you exactly what you need: the CPU time consumed, instead of just two timestamps from before and after. The framework will also handle the JVM warm-up and will keep track of just-in-time compilations.
You can also use a performance monitor like this one for Eclipse. The problem with such a performance monitor is that it doesn't perform a benchmark. It just tracks the time, memory, and other resources that your application currently uses. But that's not a real measurement; it's just a snapshot at a specific time.
Benchmarking in Java is a hard problem, not least because the JIT can have weird effects as your method gets more and more heavily optimized. Consider using a purpose-built tool like Caliper. Examples of how to use it and to measure performance on different input sizes are here.
If you want the actual CPU time of the current thread (or indeed, any arbitrary thread) rather than the wall clock time then you can get this via ThreadMXBean. Basically, do this at the start:
ThreadMXBean thx = ManagementFactory.getThreadMXBean();
thx.setThreadCpuTimeEnabled(true);
Then, whenever you want to get the elapsed CPU time for the current thread:
long cpuTime = thx.getCurrentThreadCpuTime();
You'll see that ThreadMXBean has calls to get CPU time and other info for arbitrary threads too.
Other comments about the complexities of timing also apply. The timing of the individual invocation of a piece of code can depend among other things on the state of the CPU and on what the JIT compiler decides to do at that particular moment. The overall scalability behaviour of an algorithm is generally a trend that emerges across a number of invocations and you will always need to be prepared for some "outliers" in your timings.
Also, remember that just because a particular timing is expressed in nanoseconds (or indeed milliseconds) does not mean that the timing actually has that granularity.
First of all I have to admit that these are very basic and primitive questions... I want to demonstrate different algorithms in Java for sorting and searching, and to get a value for the runtime. There are issues I cannot solve:
there's HotSpot compiling - which is a runtime optimization I need to deactivate (I guess).
How do I get time values (seconds) for runtimes? Starting a timer before the execution and stopping it afterwards... seems a little primitive. And the timer object itself consumes runtime... I need to avoid that.
Is there anything in the Java API one could utilize to solve these problems?
Thanks,
Klaus
You can disable HotSpot with -Xint on the command line, to get an order of magnitude decrease in performance. However, why don't you want to measure real world performance? Different things can become bottlenecks when you compile.
Generally for microbenchmarks:
use System.nanoTime to get a time measurement at the start and end
run for a reasonable length of time
do the measurement a number of times over (there's some "warm up")
don't interleave measurements of different algorithms
don't do any I/O in the measured segment
use the result (HotSpot can completely optimise away trivial operations)
do it in a real world situation (or as close as possible)
remember dual core is the norm, and more cores will become normal
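The steps above can be sketched in plain Java; this is a rough harness under those assumptions, not a substitute for a real benchmarking framework:

```java
import java.util.function.LongSupplier;

public class MicroHarness {
    // Writing the result to a volatile field is the "use the result" step:
    // it stops HotSpot from optimising the measured work away entirely
    static volatile long sink;

    static long measureNanos(LongSupplier work, int reps) {
        long startTime = System.nanoTime();
        for (int i = 0; i < reps; i++)
            sink = work.getAsLong();
        return (System.nanoTime() - startTime) / reps;
    }

    public static void main(String[] args) {
        LongSupplier work = () -> {
            long sum = 0;
            for (int i = 0; i < 10_000; i++)
                sum += i;
            return sum;       // no I/O in the measured segment
        };

        measureNanos(work, 20_000); // warm-up pass, result discarded
        for (int run = 0; run < 5; run++)
            System.out.println("run " + run + ": " + measureNanos(work, 20_000) + " ns/op");
    }
}
```

Run each algorithm in its own JVM invocation to avoid interleaving measurements.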
Use -Xint JVM flag. Other options can be seen here.
Use the ThreadMXBean API to get CPU/User times for your thread. An example can be seen here.
Using System.nanoTime() twice consumes less than 1 microsecond. I suggest you run any benchmark for a number of seconds and take an average, so a microsecond error won't be significant.
Overall, I would suggest not making things more complicated than you need it to be.
To have a built-in warm up I often ignore the first 10%-20% of iterations. Something like

long start = 0;
int count = 1000000;
for (int i = -count / 5; i < count; i++) {
    if (i == 0) start = System.nanoTime();
    // do tested code
}
long time = System.nanoTime() - start;
long average = time / count;
System.out.printf("Average time was %,d micro-seconds%n", average / 1000);
This question already has answers here:
Is stopwatch benchmarking acceptable?
(13 answers)
Closed 7 years ago.
I want to do some timing tests on a Java application. This is what I am currently doing:
long startTime = System.currentTimeMillis();
doSomething();
long finishTime = System.currentTimeMillis();
System.out.println("That took: " + (finishTime - startTime) + " ms");
Is there anything "wrong" with performance testing like this? What is a better way?
Duplicate: Is stopwatch benchmarking acceptable?
The one flaw in that approach is that the "real" time doSomething() takes to execute can vary wildly depending on what other programs are running on the system and what its load is. This makes the performance measurement somewhat imprecise.
One more accurate way of tracking the time it takes to execute code, assuming the code is single-threaded, is to look at the CPU time consumed by the thread during the call. You can do this with the JMX classes; in particular, with ThreadMXBean. You can retrieve an instance of ThreadMXBean from java.lang.management.ManagementFactory, and, if your platform supports it (most do), use the getCurrentThreadCpuTime method in place of System.currentTimeMillis to do a similar test. Bear in mind that getCurrentThreadCpuTime reports time in nanoseconds, not milliseconds.
Here's a sample (Scala) method that could be used to perform a measurement:
def measureCpuTime(f: => Unit): java.time.Duration = {
  import java.lang.management.ManagementFactory.getThreadMXBean
  if (!getThreadMXBean.isThreadCpuTimeSupported)
    throw new UnsupportedOperationException(
      "JVM does not support measuring thread CPU-time")
  var finalCpuTime: Option[Long] = None
  val thread = new Thread {
    override def run(): Unit = {
      f
      finalCpuTime = Some(getThreadMXBean.getThreadCpuTime(
        Thread.currentThread.getId))
    }
  }
  thread.start()
  while (finalCpuTime.isEmpty && thread.isAlive) {
    Thread.sleep(100)
  }
  java.time.Duration.ofNanos(finalCpuTime.getOrElse {
    throw new Exception("Operation never returned, and the thread is dead " +
      "(perhaps an unhandled exception occurred)")
  })
}
(Feel free to translate the above to Java!)
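A possible Java translation, a sketch using the same ThreadMXBean API (CpuTimer is just an illustrative name):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;

public class CpuTimer {
    /** Runs f on a fresh thread and returns the CPU time that thread consumed. */
    static Duration measureCpuTime(Runnable f) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isThreadCpuTimeSupported())
            throw new UnsupportedOperationException(
                "JVM does not support measuring thread CPU-time");
        if (!bean.isThreadCpuTimeEnabled())
            bean.setThreadCpuTimeEnabled(true);

        AtomicLong cpuNanos = new AtomicLong(-1);
        Thread thread = new Thread(() -> {
            f.run();
            // Ask for this thread's own CPU time just before it exits
            cpuNanos.set(bean.getThreadCpuTime(Thread.currentThread().getId()));
        });
        thread.start();
        try {
            thread.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        if (cpuNanos.get() < 0)
            throw new IllegalStateException("Operation never returned its CPU time "
                + "(perhaps an unhandled exception occurred)");
        return Duration.ofNanos(cpuNanos.get());
    }

    public static void main(String[] args) {
        Duration d = measureCpuTime(() -> {
            double x = 0;
            for (int i = 0; i < 5_000_000; i++)
                x += Math.sqrt(i);
        });
        System.out.println("CPU time: " + d.toMillis() + " ms");
    }
}
```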
This strategy isn't perfect, but it's less subject to variations in system load.
The code shown in the question is not a good performance measuring code:
The compiler might choose to optimize your code by reordering statements. Yes, it can do that. That means your entire test might fail. It can even choose to inline the method under test and reorder the measuring statements into the now-inlined code.
The hotspot might choose to reorder your statements, inline code, cache results, delay execution...
Even assuming the compiler/hotspot didn't trick you, what you measure is "wall time". What you should be measuring is CPU time (unless you use OS resources and want to include these as well, or you measure lock contention in a multi-threaded environment).
The solution? Use a real profiler. There are plenty around, both free profilers and demos / time-locked trials of commercial-strength ones.
Using a Java Profiler is the best option and it will give you all the insight that you need into the code. viz Response Times, Thread CallTraces, Memory Utilisations, etc
I suggest JENSOR, an open source Java profiler, for its ease of use and low CPU overhead. You can download it, instrument the code, and get all the info you need about your code.
You can download it from: http://jensor.sourceforge.net/
Keep in mind that the resolution of System.currentTimeMillis() varies between different operating systems. I believe Windows is around 15 msec. So if your doSomething() runs faster than the time resolution, you'll get a delta of 0. You could run doSomething() in a loop multiple times, but then the JVM may optimize it.
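You can probe that granularity directly by spinning until the clock value changes (a quick check, not a benchmark):

```java
public class ClockGranularity {
    /** Spins until System.currentTimeMillis() changes and returns the size of that tick. */
    static long millisTick() {
        long t0 = System.currentTimeMillis();
        long t1;
        do {
            t1 = System.currentTimeMillis();
        } while (t1 == t0);
        return t1 - t0;
    }

    public static void main(String[] args) {
        // Often ~1 ms on Linux; historically ~10-16 ms on Windows
        System.out.println("currentTimeMillis tick: " + millisTick() + " ms");
    }
}
```

Any doSomething() that finishes inside one tick will report a delta of 0 ms.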
Have you looked at the profiling tools in NetBeans and Eclipse? These tools give you a better handle on what is REALLY taking up all the time in your code. I have found problems that I did not realize existed by using these tools.
Well that is just one part of performance testing. Depending on the thing you are testing you may have to look at heap size, thread count, network traffic or a whole host of other things. Otherwise I use that technique for simple things that I just want to see how long they take to run.
That's good when you are comparing one implementation to another or trying to find a slow part in your code (although it can be tedious). It's a really good technique to know and you'll probably use it more than any other, but be familiar with a profiling tool as well.
I'd imagine you'd want to doSomething() before you start timing too, so that the code is JITted and "warmed up".
Japex may be useful to you, either as a way to quickly create benchmarks, or as a way to study benchmarking issues in Java through the source code.