Why does Method access seem faster than Field access?

I was doing some tests to find out what the speed differences are between using getters/setters and direct field access. I wrote a simple benchmark application like this:
public class FieldTest {
    private int value = 0;
    public void setValue(int value) {
        this.value = value;
    }
    public int getValue() {
        return this.value;
    }
    public static void doTest(int num) {
        FieldTest f = new FieldTest();
        // test direct field access
        long start1 = System.nanoTime();
        for (int i = 0; i < num; i++) {
            f.value = f.value + 1;
        }
        f.value = 0;
        long diff1 = System.nanoTime() - start1;
        // test method field access
        long start2 = System.nanoTime();
        for (int i = 0; i < num; i++) {
            f.setValue(f.getValue() + 1);
        }
        f.setValue(0);
        long diff2 = System.nanoTime() - start2;
        // print results
        System.out.printf("Field Access: %d ns\n", diff1);
        System.out.printf("Method Access: %d ns\n", diff2);
        System.out.println();
    }
    public static void main(String[] args) throws InterruptedException {
        int num = 2147483647;
        // wait for the VM to warm up
        Thread.sleep(1000);
        for (int i = 0; i < 10; i++) {
            doTest(num);
        }
    }
}
Whenever I run it, I get consistent results such as these: http://pastebin.com/hcAtjVCL
I was wondering if someone could explain to me why field access seems to be slower than getter/setter method access, and also why the last 8 iterations execute incredibly fast.
Edit: Having taken assylias' and Stephen C's comments into account, I have changed the code to http://pastebin.com/Vzb8hGdc, where I got slightly different results: http://pastebin.com/wxiDdRix .

The explanation is that your benchmark is broken.
The first iteration is done using the interpreter.
Field Access: 1528500478 ns
Method Access: 1521365905 ns
The second iteration is done by the interpreter to start with and then we flip to running JIT compiled code.
Field Access: 1550385619 ns
Method Access: 47761359 ns
The remaining iterations are all done using JIT compiled code.
Field Access: 68 ns
Method Access: 33 ns
etcetera
The reason they are unbelievably fast is that the JIT compiler has optimized the loops away. It has detected that they were not contributing anything useful to the computation. (It is not clear why the first number is consistently larger than the second, but I doubt that the optimized code is measuring field versus method access in any meaningful way.)
Re the UPDATED code / results: it is obvious that the JIT compiler is still optimizing the loops away.
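For what it's worth, one way to keep the JIT from throwing the loops away is to make each loop's result observable, e.g. by folding it into a value that is printed at the end, and to warm both paths up before measuring. A minimal sketch of that idea (the sink accumulator, the method names and the iteration counts are mine, not from the original code):
public class FieldAccessSketch {
    private int value = 0;
    public void setValue(int value) { this.value = value; }
    public int getValue() { return this.value; }
    // printed at the end so the JIT cannot prove the loop results are unused
    static long sink = 0;
    static long timeFieldAccess(FieldAccessSketch f, int num) {
        long start = System.nanoTime();
        for (int i = 0; i < num; i++) {
            f.value = f.value + 1;
        }
        sink += f.value;                  // keep the loop's effect alive
        return System.nanoTime() - start;
    }
    static long timeMethodAccess(FieldAccessSketch f, int num) {
        long start = System.nanoTime();
        for (int i = 0; i < num; i++) {
            f.setValue(f.getValue() + 1);
        }
        sink += f.getValue();             // keep the loop's effect alive
        return System.nanoTime() - start;
    }
    public static void main(String[] args) {
        int num = 100_000_000;
        FieldAccessSketch f = new FieldAccessSketch();
        // warm both paths up so the measured runs use JIT-compiled code
        for (int i = 0; i < 20; i++) {
            timeFieldAccess(f, num);
            timeMethodAccess(f, num);
        }
        System.out.printf("Field Access:  %d ns%n", timeFieldAccess(f, num));
        System.out.printf("Method Access: %d ns%n", timeMethodAccess(f, num));
        System.out.println("sink = " + sink);  // only here to defeat dead-code elimination
    }
}
Even then the optimizer may reduce the counting loop to a single addition, so for serious measurements a harness like JMH is the safer route.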

Related

During a performance test (JMeter) of a Spring Boot application, the tested method has some extremely short runtimes

I am currently working on my Bachelor's thesis on the advantages/disadvantages of GraalVM Native Image compared to a JAR running on the JVM.
During one of the tests I am calling a certain method which allocates and populates an array of size 10^6. After that, the function loops over the array and performs arithmetic operations (it is a variant of the Ackley function). The usual runtime of this method was between 3 and 4 seconds, but sometimes the method would complete after just 50 ms (when running either as the Native Image or as the JAR file on the JVM).
Since the array is populated using Math.random(), I don't think it is due to caching, and the Native Image rules out JIT compilation as the source of these outliers.
The endpoint looks like this, where dtno is the Data Transfer Object containing the "range" variable:
@PostMapping(path = "/ackley")
public static @ResponseBody long calculateackley(@RequestBody dtno d) {
    long start = System.nanoTime();
    double res = ackley(d.range);
    long end = System.nanoTime();
    System.out.println("Ackley function took: " + res);
    return end - start;
}
The ackley function looks like this:
public static long ackley(int range) {
    long start = System.nanoTime();
    if (range != 0) {
        double[] a = new double[range];
        int counter = 0;
        for (int i = -range / 2; i < range / 2; i++) {
            a[counter++] = Math.random() * range * i;
        }
        double sum1 = 0.0;
        double sum2 = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum1 += a[i] * a[i];
            sum2 += (Math.cos(2 * Math.PI * a[i]));
        }
        double result = -20.0 * Math.exp(-0.2 * Math.sqrt(sum1 / ((double) a.length))) + 20
                - Math.exp(sum2 / ((double) a.length)) + Math.exp(1.0);
    }
    long end = System.nanoTime();
    return end - start;
}
As already mentioned, the range variable in the test was 10^6. What I am also suspecting is that, since result, sum1 and sum2 are never actually used to calculate the return value, the program decides to skip everything between the for loops and the declaration of "end".
In the graph from the JMeter test you can see that these fast execution times were all during ramp-up and at the very end of the test run.
Test Results Performance Test
In the summary Report you can see the huge deviation from the average runtime.
Performance Test Summary Report
If anyone could give me a hint, or point me to a good source where I could find out what is going on, I would be very thankful.
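One way to test the dead-code suspicion without a profiler would be to make the computed value escape the method, so the compiler is not allowed to discard the two loops. A hedged sketch of that reshaping (returning the Ackley value and timing at the call site is my change, not the code from the thesis project):
// Sketch: return the computed value instead of a self-measured duration,
// so the summing loops can no longer be treated as dead code.
// Assumes range > 0, as in the test (10^6).
public static double ackleyValue(int range) {
    double[] a = new double[range];
    int counter = 0;
    for (int i = -range / 2; i < range / 2; i++) {
        a[counter++] = Math.random() * range * i;
    }
    double sum1 = 0.0;
    double sum2 = 0.0;
    for (int i = 0; i < a.length; i++) {
        sum1 += a[i] * a[i];
        sum2 += Math.cos(2 * Math.PI * a[i]);
    }
    return -20.0 * Math.exp(-0.2 * Math.sqrt(sum1 / a.length)) + 20
            - Math.exp(sum2 / a.length) + Math.exp(1.0);
}
If the 50 ms outliers disappear once the result is actually consumed (printed or returned in the response), dead-code elimination was a plausible cause; if they persist in the Native Image as well, the explanation must lie elsewhere.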

Time how long a function runs (short duration)

I'm relatively new to Java programming, and I'm running into an issue calculating the amount of time it takes for a function to run.
First some background - I've got a lot of experience with Python, and I'm trying to recreate the functionality of the Jupyter Notebook/Lab %%timeit function, if you're familiar with that. Here's a pic of it in action (sorry, not enough karma to embed yet):
Snip of Jupyter %%timeit
What it does is run the contents of the cell (in this case a recursive function) either 1k, 10k, or 100k times, and give you the average run time of the function, and the standard deviation.
My first implementation (using the same recursive function) used System.nanoTime():
public static void main(String[] args) {
    long t1, t2, diff;
    long[] times = new long[1000];
    int t;
    for (int i = 0; i < 1000; i++) {
        t1 = System.nanoTime();
        t = triangle(20);
        t2 = System.nanoTime();
        diff = t2 - t1;
        System.out.println(diff);
        times[i] = diff;
    }
    long total = 0;
    for (int j = 0; j < times.length; j++) {
        total += times[j];
    }
    System.out.println("Mean = " + total / 1000.0);
}
But the mean is wildly thrown off -- for some reason, the first iteration of the function (on many runs) takes upwards of a million nanoseconds:
Pic of initial terminal output
Every iteration after the first dozen or so takes either 395 nanos or 0 -- so there could be a problem there too... not sure what's going on!
Also -- the code of the recursive function I'm timing:
static int triangle(int n) {
    if (n == 1) {
        return n;
    } else {
        return n + triangle(n - 1);
    }
}
Initially I had the line n = Math.abs(n) on the first line of the function, but then I removed it because... meh. I'm the only one using this.
I tried a number of different suggestions brought up in this SO post, but they each have their own problems... which I can go into if you need.
Anyway, thank you in advance for your help and expertise!
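Not a definitive answer, but the usual shape of a hand-rolled %%timeit-style harness is: warm the method up first, time it in batches (a single call to a method this small is close to the resolution and overhead of System.nanoTime), and report mean and standard deviation over the batch averages. A rough sketch, with illustrative names and counts of my own choosing:
public class TimeitSketch {
    static int triangle(int n) {
        return (n == 1) ? n : n + triangle(n - 1);
    }
    public static void main(String[] args) {
        final int warmup = 20_000;    // let the JIT compile triangle() before measuring
        final int batches = 100;      // number of timed samples
        final int perBatch = 10_000;  // calls per sample; amortizes timer overhead
        long guard = 0;               // consume results so calls are not optimized away
        for (int i = 0; i < warmup; i++) {
            guard += triangle(20);
        }
        double[] nsPerCall = new double[batches];
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            for (int i = 0; i < perBatch; i++) {
                guard += triangle(20);
            }
            nsPerCall[b] = (System.nanoTime() - start) / (double) perBatch;
        }
        double mean = 0;
        for (double v : nsPerCall) mean += v;
        mean /= batches;
        double variance = 0;
        for (double v : nsPerCall) variance += (v - mean) * (v - mean);
        double stdDev = Math.sqrt(variance / batches);
        System.out.printf("mean = %.1f ns/call, std dev = %.1f ns (guard=%d)%n",
                mean, stdDev, guard);
    }
}
For anything beyond quick experiments, a harness such as JMH takes care of warm-up, dead-code elimination and the statistics for you.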

Vector taking less time to get populated than ArrayList

I was going through the following article:
Understanding Collections and Thread Safety in Java
The article says:
You know, Vector and Hashtable are the two collections exist early in Java history, and they are designed for thread-safe from the start (if you have chance to look at their source code, you will see their methods are all synchronized!). However, they quickly expose poor performance in multi-threaded programs. As you may know, synchronization requires locks which always take time to monitor, and that reduces the performance.
[I've also done a benchmark using Caliper; please hear me out on this]
A sample code has also been provided:
public class CollectionsThreadSafeTest {
    public void testVector() {
        long startTime = System.currentTimeMillis();
        Vector<Integer> vector = new Vector<>();
        for (int i = 0; i < 10_000_000; i++) {
            vector.addElement(i);
        }
        long endTime = System.currentTimeMillis();
        long totalTime = endTime - startTime;
        System.out.println("Test Vector: " + totalTime + " ms");
    }
    public void testArrayList() {
        long startTime = System.currentTimeMillis();
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 10_000_000; i++) {
            list.add(i);
        }
        long endTime = System.currentTimeMillis();
        long totalTime = endTime - startTime;
        System.out.println("Test ArrayList: " + totalTime + " ms");
    }
    public static void main(String[] args) {
        CollectionsThreadSafeTest tester = new CollectionsThreadSafeTest();
        tester.testVector();
        tester.testArrayList();
    }
}
The output they have provided for the above code is as follows:
Test Vector: 9266 ms
Test ArrayList: 4588 ms
But when I ran it in my machine, it gave me the following result:
Test Vector: 521 ms
Test ArrayList: 2273 ms
I found this to be quite odd. I thought doing a micro benchmark would be better. So, I wrote a benchmark for the above using caliper. The code is as follows:
public class CollectionsThreadSafeTest extends SimpleBenchmark {
    public static final int ELEMENTS = 10_000_000;
    public void timeVector(int reps) {
        for (int i = 0; i < reps; i++) {
            Vector<Integer> vector = new Vector<>();
            for (int k = 0; k < ELEMENTS; k++) {
                vector.addElement(k);
            }
        }
    }
    public void timeArrayList(int reps) {
        for (int i = 0; i < reps; i++) {
            List<Integer> list = new ArrayList<>();
            for (int k = 0; k < ELEMENTS; k++) {
                list.add(k);
            }
        }
    }
    public static void main(String[] args) {
        String[] classesToTest = { CollectionsThreadSafeTest.class.getName() };
        Runner.main(classesToTest);
    }
}
But I got a similar result:
0% Scenario{vm=java, trial=0, benchmark=ArrayList} 111684174.60 ns; σ=18060504.25 ns @ 10 trials
50% Scenario{vm=java, trial=0, benchmark=Vector} 67701359.18 ns; σ=17924728.23 ns @ 10 trials
benchmark    ms linear runtime
ArrayList 111.7 ==============================
   Vector  67.7 ==================
vm: java
trial: 0
I'm kinda confused. What is happening here? Am I doing something wrong here (that would be really embarrassing)?
If this is the expected behavior, then what is the explanation behind this?
Update #1
After reading @Kayaman's answer, I ran the caliper tests by changing the values of the initial capacities of both the Vector and the ArrayList. Following are the timings (in ms):
Initial Capacity Vector ArrayList
-------------------------------------
10_000_000 49.2 67.1
10_000_001 48.9 71.2
10_000_010 48.1 61.2
10_000_100 43.9 70.1
10_001_000 45.6 70.6
10_010_000 44.8 68.0
10_100_000 52.8 64.6
11_000_000 52.7 71.7
20_000_000 74.0 51.8
-------------------------------------
Thanks for all the inputs :)
You're not really testing the add() method here. You're testing the different ways that a Vector and an ArrayList grow. A Vector doubles in size when it's full, but an ArrayList has some more refined logic to prevent the internal array from growing exponentially and wasting memory.
If you run your test with a > 10000000 initial capacity for both classes, they won't need to resize and you'll be profiling just the adding part.
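A minimal sketch of that variant, pre-sizing both collections so neither ever has to copy its backing array (the capacity constant and class name are illustrative):
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;
public class PresizedAddTest {
    static final int ELEMENTS = 10_000_000;
    public static void main(String[] args) {
        // pre-size both collections so add() never triggers a resize/copy
        Vector<Integer> vector = new Vector<>(ELEMENTS + 1);
        List<Integer> list = new ArrayList<>(ELEMENTS + 1);
        long t0 = System.currentTimeMillis();
        for (int i = 0; i < ELEMENTS; i++) vector.addElement(i);
        System.out.println("Vector:    " + (System.currentTimeMillis() - t0) + " ms");
        long t1 = System.currentTimeMillis();
        for (int i = 0; i < ELEMENTS; i++) list.add(i);
        System.out.println("ArrayList: " + (System.currentTimeMillis() - t1) + " ms");
    }
}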
Vector is expected to be slower in a multithreaded environment; in your single-threaded case the locking is uncontended and therefore expected to be lightweight. Better to do the test by adding these items from 10000 different threads.
Both ArrayList and Vector have essentially the same add method:
    ensureCapacity();
    elementData[elementCount++] = newElement;
There is only one difference: Vector's add method is synchronized and ArrayList's is not, and in theory synchronized methods are slower than non-synchronized ones.
To improve the performance of add, specify initialCapacity in the constructor or call ensureCapacity yourself. That creates the internal array as large as you need up front, so it never has to be recreated.
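To make the "only difference is synchronized" point concrete, here is a toy sketch of the two add shapes; it is an illustration, not the actual JDK source:
// Toy illustration only; the real ArrayList/Vector implementations differ in detail.
class GrowableIntList {
    private int[] elementData = new int[16];
    private int elementCount;
    void add(int newElement) {                          // ArrayList-style: no lock
        ensureCapacity(elementCount + 1);
        elementData[elementCount++] = newElement;
    }
    synchronized void addSynchronized(int newElement) { // Vector-style: synchronized
        ensureCapacity(elementCount + 1);
        elementData[elementCount++] = newElement;
    }
    private void ensureCapacity(int minCapacity) {
        if (minCapacity > elementData.length) {
            elementData = java.util.Arrays.copyOf(elementData, elementData.length * 2);
        }
    }
}
On a single thread the lock is uncontended, and HotSpot can make such locks very cheap, which is why the synchronized version is not automatically the slower one here; the growth policy dominates the original test.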

Why does this method not get optimized away?

This Java method gets used in benchmarks for simulating slow computation:
static int slowItDown() {
    int result = 0;
    for (int i = 1; i <= 1000; i++) {
        result += i;
    }
    return result;
}
This is IMHO a very bad idea, as its body could be replaced by return 500500. Yet this never seems to happen;¹ probably because such an optimization is irrelevant for real code, as Jon Skeet stated.
Interestingly, a slightly simpler method with result += 1; gets fully optimized away (caliper reports 0.460543 ns).
But even when we agree that optimizing away methods returning a constant result is useless for real code, there's still loop unrolling, which could lead to something like
static int slowItDown() {
    int result = 0;
    for (int i = 1; i <= 1000; i += 2) {
        result += 2 * i + 1;
    }
    return result;
}
So my question remains: Why is no optimization performed here?
¹ Contrary to what I wrote originally; I must have seen something that wasn't there.
Well, the JVM does optimize away such code. The question is how many times it has to be detected as a real hotspot (benchmarks usually do more than this single method) before it will be analyzed this way. In my setup it required 16830 invocations before the execution time went to (almost) zero.
It's correct that such code does not appear in real code. However, it might remain after several inlining operations of other hotspots dealing with values that are not compile-time constants but runtime constants or de-facto constants (values that could change in theory but don't in practice). When such a piece of code remains, it's a great benefit to optimize it away entirely, but that is not expected to happen right away, i.e. when calling directly from the main method.
Update: I simplified the code and the optimization came even earlier.
public static void main(String[] args) {
    final int inner = 10;
    final float innerFrac = 1f / inner;
    int count = 0;
    for (int j = 0; j < Integer.MAX_VALUE; j++) {
        long t0 = System.nanoTime();
        for (int i = 0; i < inner; i++) slowItDown();
        long t1 = System.nanoTime();
        count += inner;
        final float dt = (t1 - t0) * innerFrac;
        System.out.printf("execution time: %.0f ns%n", dt);
        if (dt < 10) break;
    }
    System.out.println("after " + count + " invocations");
    System.out.println(System.getProperty("java.version"));
    System.out.println(System.getProperty("java.vm.version"));
}
static int slowItDown() {
    int result = 0;
    for (int i = 1; i <= 1000; i++) {
        result += i;
    }
    return result;
}
…
execution time: 0 ns
after 15300 invocations
1.7.0_13
23.7-b01
(64Bit Server VM)
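To illustrate the inlining scenario described in the answer above, here is a hypothetical sketch where the loop bound is only a de-facto constant at the call site (both methods are mine, for illustration only):
// Hypothetical sketch: 'limit' is not a compile-time constant, but every hot
// caller happens to pass the same value. Once the JIT inlines sum() into such
// a caller, the bound becomes a known constant and the whole loop is a
// candidate for being folded to 500500.
static int sum(int limit) {
    int result = 0;
    for (int i = 1; i <= limit; i++) {
        result += i;
    }
    return result;
}
static int caller() {
    return sum(1000);  // de-facto constant argument
}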

Java iterative vs recursive

Can anyone explain why the following recursive method is faster than the iterative one (both are doing string concatenation)? Isn't the iterative approach supposed to beat the recursive one? Plus, each recursive call adds a new frame on top of the stack, which can be very space inefficient.
private static void string_concat(StringBuilder sb, int count) {
    if (count >= 9999) return;
    string_concat(sb.append(count), count + 1);
}
public static void main(String[] arg) {
    long s = System.currentTimeMillis();
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 9999; i++) {
        sb.append(i);
    }
    System.out.println(System.currentTimeMillis() - s);
    s = System.currentTimeMillis();
    string_concat(new StringBuilder(), 0);
    System.out.println(System.currentTimeMillis() - s);
}
I ran the program multiple times, and the recursive one always ends up 3-4 times faster than the iterative one. What could be the main reason causing the iterative one to be slower?
See my comments.
Make sure you learn how to properly microbenchmark. You should be timing many iterations of both and averaging these for your times. Aside from that, you should make sure the VM isn't giving the second an unfair advantage by not compiling the first.
In fact, the default HotSpot compilation threshold (configurable via -XX:CompileThreshold) is 10,000 invokes, which might explain the results you see here. HotSpot doesn't really do any tail optimizations so it's quite strange that the recursive solution is faster. It's quite plausible that StringBuilder.append is compiled to native code primarily for the recursive solution.
I decided to rewrite the benchmark and see the results for myself.
public final class AppendMicrobenchmark {
    static void recursive(final StringBuilder builder, final int n) {
        if (n > 0) {
            recursive(builder.append(n), n - 1);
        }
    }
    static void iterative(final StringBuilder builder) {
        for (int i = 10000; i >= 0; --i) {
            builder.append(i);
        }
    }
    public static void main(final String[] argv) {
        /* warm-up */
        for (int i = 200000; i >= 0; --i) {
            new StringBuilder().append(i);
        }
        /* recursive benchmark */
        long start = System.nanoTime();
        for (int i = 1000; i >= 0; --i) {
            recursive(new StringBuilder(), 10000);
        }
        System.out.printf("recursive: %.2fus\n", (System.nanoTime() - start) / 1000000D);
        /* iterative benchmark */
        start = System.nanoTime();
        for (int i = 1000; i >= 0; --i) {
            iterative(new StringBuilder());
        }
        System.out.printf("iterative: %.2fus\n", (System.nanoTime() - start) / 1000000D);
    }
}
Here are my results...
C:\dev\scrap>java AppendMicrobenchmark
recursive: 405.41us
iterative: 313.20us
C:\dev\scrap>java -server AppendMicrobenchmark
recursive: 397.43us
iterative: 312.14us
These are times for each approach averaged over 1000 trials.
Essentially, the problems with your benchmark are that it doesn't average over many trials (law of large numbers), and that it is highly dependent on the ordering of the individual benchmarks. The original result I was given for yours:
C:\dev\scrap>java StringBuilderBenchmark
80
41
This made very little sense to me. Recursion on the HotSpot VM is more than likely not going to be as fast as iteration because as of yet it does not implement any sort of tail optimization that you might find used for functional languages.
Now, the funny thing that happens here is that the default HotSpot JIT compilation threshold is 10,000 invokes. Your iterative benchmark will more than likely be executing for the most part before append is compiled. On the other hand, your recursive approach should be comparatively fast since it will more than likely enjoy append after it is compiled. To eliminate this from influencing the results, I passed -XX:CompileThreshold=0 and found...
C:\dev\scrap>java -XX:CompileThreshold=0 StringBuilderBenchmark
8
8
So, when it comes down to it, they're both roughly equal in speed. Note however that the iterative appears to be a bit faster if you average with higher precision. Order might still make a difference in my benchmark, too, as the latter benchmark will have the advantage of the VM having collected more statistics for its dynamic optimizations.
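One cheap way to reduce that ordering effect, short of using a real harness, is to interleave the two variants across the trials instead of running one complete block after the other. A rough sketch of an alternative main method, reusing the recursive and iterative methods from the class above:
public static void main(final String[] argv) {
    final int trials = 1000;
    long recursiveTotal = 0, iterativeTotal = 0;
    for (int t = 0; t < trials; t++) {
        // alternate within each trial so neither variant always runs "second"
        long start = System.nanoTime();
        recursive(new StringBuilder(), 10000);
        recursiveTotal += System.nanoTime() - start;
        start = System.nanoTime();
        iterative(new StringBuilder());
        iterativeTotal += System.nanoTime() - start;
    }
    System.out.printf("recursive: %.2fus/trial%n", recursiveTotal / (trials * 1000D));
    System.out.printf("iterative: %.2fus/trial%n", iterativeTotal / (trials * 1000D));
}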
