JMH - why JIT does not eliminate my dead-code

JMH - why JIT does not eliminate my dead-code - java

I wrote two benchmarks to demonstrate that JIT can be a problem with writing fine benchmark (Please skip that I doesnt use #State here):
#Fork(value = 1)
#Warmup(iterations = 2, time = 10)
#Measurement(iterations = 3, time = 2)
#BenchmarkMode(Mode.AverageTime)
public class DeadCodeTraps {
#Benchmark
#OutputTimeUnit(TimeUnit.MICROSECONDS)
public static void summaryStatistics_standardDeviationForFourNumbers() {
final SummaryStatistics summaryStatistics = new SummaryStatistics();
summaryStatistics.addValue(10.0);
summaryStatistics.addValue(20.0);
summaryStatistics.addValue(30.0);
summaryStatistics.addValue(40.0);
summaryStatistics.getStandardDeviation();
}
#Benchmark
#OutputTimeUnit(TimeUnit.MICROSECONDS)
public static void summaryStatistics_standardDeviationForTenNumbers() {
final SummaryStatistics summaryStatistics = new SummaryStatistics();
summaryStatistics.addValue(10.0);
summaryStatistics.addValue(20.0);
summaryStatistics.addValue(30.0);
summaryStatistics.addValue(40.0);
summaryStatistics.addValue(50.0);
summaryStatistics.addValue(60.0);
summaryStatistics.addValue(70.0);
summaryStatistics.addValue(80.0);
summaryStatistics.addValue(90.0);
summaryStatistics.addValue(100.0);
summaryStatistics.getStandardDeviation();
}
}
I thought that JIT will eliminate dead code, so two methods will be executed at the same time. But in the end, I have:
summaryStatistics_standardDeviationForFourNumbers 0.158 ± 0.046
DeadCodeTraps.summaryStatistics_standardDeviationForTenNumbers 0.359 ± 0.294
Why JIT does not optimize it? The result of summaryStatistics.getStandardDeviation(); is not used anywhere outside the method and it is not returned by it.
(I am using OpenJDK build 10.0.2+13-Ubuntu-1ubuntu0.18.04.4)

If you're talking about the Apache Commons Math SummaryStatistics class, then it's a massive class. Its construction will most certainly not be inlined. To see why, run with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -XX:-BackgroundCompilation
Dead code elimination happens after inlining.
Unused objects will back-propagate, but the non-inlined constructor will break the chain because the JIT optimizer can no longer be sure there are no side effects.
In other words, the code you expect to be eliminated is too big.

Related

Why does running multiple lambdas in loops suddenly slow down?

Consider the following code:
public class Playground {
private static final int MAX = 100_000_000;
public static void main(String... args) {
execute(() -> {});
execute(() -> {});
execute(() -> {});
execute(() -> {});
}
public static void execute(Runnable task) {
Stopwatch stopwatch = Stopwatch.createStarted();
for (int i = 0; i < MAX; i++) {
task.run();
}
System.out.println(stopwatch);
}
}
This currently prints the following on my Intel MBP on Temurin 17:
3.675 ms
1.948 ms
216.9 ms
243.3 ms
Notice the 100* slowdown for the third (and any subsequent) execution. Now, obviously, this is NOT how to write benchmarks in Java. The loop code doesn't do anything, so I'd expect it to be eliminated for all and any repetitions. Also I could not repeat this effect using JMH which tells me the reason is tricky and fragile.
So, why does this happen? Why would there suddenly be such a catastrophic slowdown, what's going on under the hood? An assumption is that C2 bails on us, but which limitation are we bumping into?
Things that don't change the behavior:
using anonymous inner classes instead of lambdas,
using 3+ different nested classes instead of lambdas.
Things that "fix" the behavior. Actually the third invocation and all subsequent appear to be much faster, hinting that compilation correctly eliminated the loops completely:
using 1-2 nested classes instead of lambdas,
using 1-2 lambda instances instead of 4 different ones,
not calling task.run() lambdas inside the loop,
inlining the execute() method, still maintaining 4 different lambdas.

You can actually replicate this with JMH SingleShot mode:
#BenchmarkMode(Mode.SingleShotTime)
#Warmup(iterations = 0)
#Measurement(iterations = 1)
#Fork(1)
public class Lambdas {
#Benchmark
public static void doOne() {
execute(() -> {});
}
#Benchmark
public static void doFour() {
execute(() -> {});
execute(() -> {});
execute(() -> {});
execute(() -> {});
}
public static void execute(Runnable task) {
for (int i = 0; i < 100_000_000; i++) {
task.run();
}
}
}
Benchmark Mode Cnt Score Error Units
Lambdas.doFour ss 0.446 s/op
Lambdas.doOne ss 0.006 s/op
If you look at -prof perfasm profile for doFour test, you would get a fat clue:
....[Hottest Methods (after inlining)]..............................................................
32.19% c2, level 4 org.openjdk.Lambdas$$Lambda$44.0x0000000800c258b8::run, version 664
26.16% c2, level 4 org.openjdk.Lambdas$$Lambda$43.0x0000000800c25698::run, version 658
There are at least two hot lambdas, and those are represented by different classes. So what you are seeing is likely monomorphic (one target), then bimorphic (two targets), then polymorphic virtual call at task.run.
Virtual call has to choose which class to call the implementation from. The more classes you have, the worse it gets for optimizer. JVM tries to adapt, but it gets worse and worse as the run progresses. Roughly like this:
execute(() -> {}); // compiles with single target, fast
execute(() -> {}); // recompiles with two targets, a bit slower
execute(() -> {}); // recompiles with three targets, slow
execute(() -> {}); // continues to be slow
Now, the elimination of the loop requires seeing through the task.run(). In monomorphic and bimorphic cases it is easy: one or both targets are inlined, their empty body is discovered, done. In both cases, you would have to do typechecks, which means it is not completely free, with bimorphic costing a bit extra. When you experience a polymorphic call, there is no such luck at all: it is opaque call.
You can add two more benchmarks in the mix to see it:
#Benchmark
public static void doFour_Same() {
Runnable l = () -> {};
execute(l);
execute(l);
execute(l);
execute(l);
}
#Benchmark
public static void doFour_Pair() {
Runnable l1 = () -> {};
Runnable l2 = () -> {};
execute(l1);
execute(l1);
execute(l2);
execute(l2);
}
Which would then yield:
Benchmark Mode Cnt Score Error Units
Lambdas.doFour ss 0.445 s/op ; polymorphic
Lambdas.doFour_Pair ss 0.016 s/op ; bimorphic
Lambdas.doFour_Same ss 0.008 s/op ; monomorphic
Lambdas.doOne ss 0.006 s/op
This also explains why your "fixes" help:
using 1-2 nested classes instead of lambdas,
Bimorphic inlining.
using 1-2 lambda instances instead of 4 different ones,
Bimorphic inlining.
not calling task.run() lambdas inside the loop,
Avoids polymorphic (opaque) call in the loop, allows loop elimination.
inlining the execute() method, still maintaining 4 different lambdas.
Avoids a single call site that experiences multiple call targets. In other words, turns a single polymorphic call site into a series of monomorphic call sites each with its own target.

MethodHandles.lookup().lookupClass() vs getClass()

Can anyone tell me the (subtle) differences of
version 1:
protected final Logger log = Logger.getLogger(getClass());
vs
version 2:
protected final Logger log = Logger.getLogger(MethodHandles.lookup().lookupClass());
Is version 2 in general faster than version 1?
I guess version 1 uses reflection (on runtime) to determine the current class while version 2 does not need to use reflection, or (is the check done on build time)?

There is no reflection involved in you first case. Object#getClass() is mapped to JVM's native method.
Your second case is not drop-in replacement for Object#getClass(), it is used to lookup method handles.
So subtle difference is, they are used for completely different purposes.

These are entirely different things. The documentation of lookupClass, specifically says:
Tells which class is performing the lookup. It is this class against which checks are performed for visibility and access permissions
So it's the class which performs the lookup. It's not necessarily the class where you call MethodHandles.lookup(). What I mean by that is that this:
Class<?> c = MethodHandles.privateLookupIn(String.class, MethodHandles.lookup()).lookupClass();
System.out.println(c);
will print String.class and not the class where you define this code.
The only "advantage" (besides confusing every reader of this code), is that if you copy/paste that log creation line across various source files, it will use the proper class, if you, by accident, don't edit it (which probably happens).
Also notice that:
protected final Logger log = Logger.getLogger(getClass());
should be a static field, usually, and you can't call getClass if it is.
A JMH test shows that there is no performance gain to obfuscate your code that much:
#State(Scope.Benchmark)
#BenchmarkMode(Mode.AverageTime)
#OutputTimeUnit(TimeUnit.NANOSECONDS)
#Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
#Measurement(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
public class LookupTest {
private static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
public static void main(String[] args) throws RunnerException {
Options opt = new OptionsBuilder().include(LookupTest.class.getSimpleName())
.verbosity(VerboseMode.EXTRA)
.build();
new Runner(opt).run();
}
#Benchmark
#Fork(3)
public Class<?> getClassCall() {
return getClass();
}
#Benchmark
#Fork(3)
public Class<?> methodHandlesInPlaceCall() {
return MethodHandles.lookup().lookupClass();
}
#Benchmark
#Fork(3)
public Class<?> methodHandlesCall() {
return LOOKUP.lookupClass();
}
}
results:
Benchmark Mode Cnt Score Error Units
LookupTest.getClassCall avgt 15 2.264 ± 0.044 ns/op
LookupTest.methodHandlesCall avgt 15 2.262 ± 0.030 ns/op
LookupTest.methodHandlesInPlaceCall avgt 15 4.890 ± 0.783 ns/op

Why does .toString() seem to fix an OutOfMemoryError exception for StringBuilder?

I am learning how to microbenchmark things with JMH. I started with something seemingly simple: string concatenation for StringBuilder vs String +=.
From my understanding, I should make a State object that contains an instance of StringBuilder because I don't want to benchmark its constructor (nor do I want to an empty one every iteration anyway). Same goes for the String += test - I want a String object in my State to be concatenated with new strings.
This is my code:
#State(Scope.Thread)
#BenchmarkMode(Mode.AverageTime)
#OutputTimeUnit(TimeUnit.NANOSECONDS)
public class Test {
#State(Scope.Thread)
public static class BenchmarkState {
public StringBuilder builder;
public String regularString;
#Setup(Level.Iteration)
public void setup() {
builder = new StringBuilder();
regularString = "";
}
}
#Benchmark
public String stringTest(BenchmarkState state) {
state.regularString += "hello";
return state.regularString;
}
#Benchmark
public String stringBuilderTest(BenchmarkState state) {
state.builder.append("hello");
return state.builder.toString();
}
public static void main(String[] args) throws RunnerException {
Options opt = new OptionsBuilder()
.include(Test.class.getSimpleName())
.forks(1)
.timeUnit(TimeUnit.MILLISECONDS)
.mode(Mode.Throughput)
.measurementTime(TimeValue.seconds(10))
.build();
new Runner(opt).run();
}
}
It works, but I was thinking - I don't want to call .toString() at the end of every iteration. I am testing concatenation only. So I decided to remove it by just returning null instead.
But then, this happens during the first warmup iteration:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
at java.lang.StringBuilder.append(StringBuilder.java:136)
I understand that I would run out of memory pretty quickly if JMH is appending to StringBuilder as fast as it can, so I'm not surprised by the OutOfMemoryError issue. But I don't understand why does builder.toString() fix it.
So my questions are:
Why does builder.toString() avoid an OutOfMemoryError issue? Doesn't StringBuilder still keep all the characters in memory regardless?
Assuming that I do NOT want neither StringBuilder's constructor nor its .toString() method to be part of the benchmark, how do I properly write this test?

Calling toString() takes time, and generates garbage, requiring GC runs, further slowing down the code.
Since testing has a time limit, those slowdowns likely cause test to stop before it consumes all memory. If you increase the time limit, the code will likely fail with OOM even with the toString, it will just take a LOT longer.

How to run JMH from inside JUnit tests?

How can I run JMH benchmarks inside my existing project using JUnit tests? The official documentation recommends making a separate project, using Maven shade plugin, and launching JMH inside the main method. Is this necessary and why is it recommended?

I've been running JMH inside my existing Maven project using JUnit with no apparent ill effects. I cannot answer why the authors recommend doing things differently. I have not observed a difference in results. JMH launches a separate JVM to run benchmarks to isolate them. Here is what I do:
Add the JMH dependencies to your POM:
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-core</artifactId>
<version>1.21</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-generator-annprocess</artifactId>
<version>1.21</version>
<scope>test</scope>
</dependency>
Note that I've placed them in scope test.
In Eclipse, you may need to configure the annotation processor manually. NetBeans handles this automatically.
Create your JUnit and JMH class. I've chosen to combine both into a single class, but that is up to you. Notice that OptionsBuilder.include is what actually determines which benchmarks will be run from your JUnit test!
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.TimeUnit;
import org.junit.Test;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.*;
public class TestBenchmark
{
#Test public void
launchBenchmark() throws Exception {
Options opt = new OptionsBuilder()
// Specify which benchmarks to run.
// You can be more specific if you'd like to run only one benchmark per test.
.include(this.getClass().getName() + ".*")
// Set the following options as needed
.mode (Mode.AverageTime)
.timeUnit(TimeUnit.MICROSECONDS)
.warmupTime(TimeValue.seconds(1))
.warmupIterations(2)
.measurementTime(TimeValue.seconds(1))
.measurementIterations(2)
.threads(2)
.forks(1)
.shouldFailOnError(true)
.shouldDoGC(true)
//.jvmArgs("-XX:+UnlockDiagnosticVMOptions", "-XX:+PrintInlining")
//.addProfiler(WinPerfAsmProfiler.class)
.build();
new Runner(opt).run();
}
// The JMH samples are the best documentation for how to use it
// http://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-samples/src/main/java/org/openjdk/jmh/samples/
#State (Scope.Thread)
public static class BenchmarkState
{
List<Integer> list;
#Setup (Level.Trial) public void
initialize() {
Random rand = new Random();
list = new ArrayList<>();
for (int i = 0; i < 1000; i++)
list.add (rand.nextInt());
}
}
#Benchmark public void
benchmark1 (BenchmarkState state, Blackhole bh) {
List<Integer> list = state.list;
for (int i = 0; i < 1000; i++)
bh.consume (list.get (i));
}
}
JMH's annotation processor seems to not work well with compile-on-save in NetBeans. You may need to do a full Clean and Build whenever you modify the benchmarks. (Any suggestions appreciated!)
Run your launchBenchmark test and watch the results!
-------------------------------------------------------
T E S T S
-------------------------------------------------------
Running com.Foo
# JMH version: 1.21
# VM version: JDK 1.8.0_172, Java HotSpot(TM) 64-Bit Server VM, 25.172-b11
# VM invoker: /usr/lib/jvm/java-8-jdk/jre/bin/java
# VM options: <none>
# Warmup: 2 iterations, 1 s each
# Measurement: 2 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 2 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: com.Foo.benchmark1
# Run progress: 0.00% complete, ETA 00:00:04
# Fork: 1 of 1
# Warmup Iteration 1: 4.258 us/op
# Warmup Iteration 2: 4.359 us/op
Iteration 1: 4.121 us/op
Iteration 2: 4.029 us/op
Result "benchmark1":
4.075 us/op
# Run complete. Total time: 00:00:06
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark Mode Cnt Score Error Units
Foo.benchmark1 avgt 2 4.075 us/op
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.013 sec
Runner.run even returns RunResult objects on which you can do assertions, etc.

A declarative approach using annotations:
#State(Scope.Benchmark)
#Threads(1)
public class TestBenchmark {
#Param({"10","100","1000"})
public int iterations;
#Setup(Level.Invocation)
public void setupInvokation() throws Exception {
// executed before each invocation of the benchmark
}
#Setup(Level.Iteration)
public void setupIteration() throws Exception {
// executed before each invocation of the iteration
}
#Benchmark
#BenchmarkMode(Mode.AverageTime)
#Fork(warmups = 1, value = 1)
#Warmup(batchSize = -1, iterations = 3, time = 10, timeUnit = TimeUnit.MILLISECONDS)
#Measurement(batchSize = -1, iterations = 10, time = 10, timeUnit = TimeUnit.MILLISECONDS)
#OutputTimeUnit(TimeUnit.MILLISECONDS)
public void test() throws Exception {
Thread.sleep(ThreadLocalRandom.current().nextInt(0, iterations));
}
#Test
public void benchmark() throws Exception {
String[] argv = {};
org.openjdk.jmh.Main.main(argv);
}
}

#State(Scope.Benchmark)
#Threads(1)
#Fork(1)
#OutputTimeUnit(TimeUnit.MICROSECONDS)
#Warmup(iterations = 5, time = 1)
#Measurement(iterations = 5, time = 1)
#BenchmarkMode(Mode.All)
public class ToBytesTest {
public static void main(String[] args) {
ToBytesTest test = new ToBytesTest();
System.out.println(test.string()[0] == test.charBufferWrap()[0] && test.charBufferWrap()[0] == test.charBufferAllocate()[0]);
}
#Test
public void benchmark() throws Exception {
org.openjdk.jmh.Main.main(new String[]{ToBytesTest.class.getName()});
}
char[] chars = new char[]{'中', '国'};
#Benchmark
public byte[] string() {
return new String(chars).getBytes(StandardCharsets.UTF_8);
}
#Benchmark
public byte[] charBufferWrap() {
return StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars)).array();
}
#Benchmark
public byte[] charBufferAllocate() {
CharBuffer cb = CharBuffer.allocate(chars.length).put(chars);
cb.flip();
return StandardCharsets.UTF_8.encode(cb).array();
}
}

Java Reflection Performance

Does creating an object using reflection rather than calling the class constructor result in any significant performance differences?

Yes - absolutely. Looking up a class via reflection is, by magnitude, more expensive.
Quoting Java's documentation on reflection:
Because reflection involves types that are dynamically resolved, certain Java virtual machine optimizations can not be performed. Consequently, reflective operations have slower performance than their non-reflective counterparts, and should be avoided in sections of code which are called frequently in performance-sensitive applications.
Here's a simple test I hacked up in 5 minutes on my machine, running Sun JRE 6u10:
public class Main {
public static void main(String[] args) throws Exception
{
doRegular();
doReflection();
}
public static void doRegular() throws Exception
{
long start = System.currentTimeMillis();
for (int i=0; i<1000000; i++)
{
A a = new A();
a.doSomeThing();
}
System.out.println(System.currentTimeMillis() - start);
}
public static void doReflection() throws Exception
{
long start = System.currentTimeMillis();
for (int i=0; i<1000000; i++)
{
A a = (A) Class.forName("misc.A").newInstance();
a.doSomeThing();
}
System.out.println(System.currentTimeMillis() - start);
}
}
With these results:
35 // no reflection
465 // using reflection
Bear in mind the lookup and the instantiation are done together, and in some cases the lookup can be refactored away, but this is just a basic example.
Even if you just instantiate, you still get a performance hit:
30 // no reflection
47 // reflection using one lookup, only instantiating
Again, YMMV.

Yes, it's slower.
But remember the damn #1 rule--PREMATURE OPTIMIZATION IS THE ROOT OF ALL EVIL
(Well, may be tied with #1 for DRY)
I swear, if someone came up to me at work and asked me this I'd be very watchful over their code for the next few months.
You must never optimize until you are sure you need it, until then, just write good, readable code.
Oh, and I don't mean write stupid code either. Just be thinking about the cleanest way you can possibly do it--no copy and paste, etc. (Still be wary of stuff like inner loops and using the collection that best fits your need--Ignoring these isn't "unoptimized" programming, it's "bad" programming)
It freaks me out when I hear questions like this, but then I forget that everyone has to go through learning all the rules themselves before they really get it. You'll get it after you've spent a man-month debugging something someone "Optimized".
EDIT:
An interesting thing happened in this thread. Check the #1 answer, it's an example of how powerful the compiler is at optimizing things. The test is completely invalid because the non-reflective instantiation can be completely factored out.
Lesson? Don't EVER optimize until you've written a clean, neatly coded solution and proven it to be too slow.

You may find that A a = new A() is being optimised out by the JVM.
If you put the objects into an array, they don't perform so well. ;)
The following prints...
new A(), 141 ns
A.class.newInstance(), 266 ns
new A(), 103 ns
A.class.newInstance(), 261 ns
public class Run {
private static final int RUNS = 3000000;
public static class A {
}
public static void main(String[] args) throws Exception {
doRegular();
doReflection();
doRegular();
doReflection();
}
public static void doRegular() throws Exception {
A[] as = new A[RUNS];
long start = System.nanoTime();
for (int i = 0; i < RUNS; i++) {
as[i] = new A();
}
System.out.printf("new A(), %,d ns%n", (System.nanoTime() - start)/RUNS);
}
public static void doReflection() throws Exception {
A[] as = new A[RUNS];
long start = System.nanoTime();
for (int i = 0; i < RUNS; i++) {
as[i] = A.class.newInstance();
}
System.out.printf("A.class.newInstance(), %,d ns%n", (System.nanoTime() - start)/RUNS);
}
}
This suggest the difference is about 150 ns on my machine.

If there really is need for something faster than reflection, and it's not just a premature optimization, then bytecode generation with ASM or a higher level library is an option. Generating the bytecode the first time is slower than just using reflection, but once the bytecode has been generated, it is as fast as normal Java code and will be optimized by the JIT compiler.
Some examples of applications which use code generation:
Invoking methods on proxies generated by CGLIB is slightly faster than Java's dynamic proxies, because CGLIB generates bytecode for its proxies, but dynamic proxies use only reflection (I measured CGLIB to be about 10x faster in method calls, but creating the proxies was slower).
JSerial generates bytecode for reading/writing the fields of serialized objects, instead of using reflection. There are some benchmarks on JSerial's site.
I'm not 100% sure (and I don't feel like reading the source now), but I think Guice generates bytecode to do dependency injection. Correct me if I'm wrong.

"Significant" is entirely dependent on context.
If you're using reflection to create a single handler object based on some configuration file, and then spending the rest of your time running database queries, then it's insignificant. If you're creating large numbers of objects via reflection in a tight loop, then yes, it's significant.
In general, design flexibility (where needed!) should drive your use of reflection, not performance. However, to determine whether performance is an issue, you need to profile rather than get arbitrary responses from a discussion forum.

There is some overhead with reflection, but it's a lot smaller on modern VMs than it used to be.
If you're using reflection to create every simple object in your program then something is wrong. Using it occasionally, when you have good reason, shouldn't be a problem at all.

Yes there is a performance hit when using Reflection but a possible workaround for optimization is caching the method:
Method md = null; // Call while looking up the method at each iteration.
millis = System.currentTimeMillis( );
for (idx = 0; idx < CALL_AMOUNT; idx++) {
md = ri.getClass( ).getMethod("getValue", null);
md.invoke(ri, null);
}
System.out.println("Calling method " + CALL_AMOUNT+ " times reflexively with lookup took " + (System.currentTimeMillis( ) - millis) + " millis");
// Call using a cache of the method.
md = ri.getClass( ).getMethod("getValue", null);
millis = System.currentTimeMillis( );
for (idx = 0; idx < CALL_AMOUNT; idx++) {
md.invoke(ri, null);
}
System.out.println("Calling method " + CALL_AMOUNT + " times reflexively with cache took " + (System.currentTimeMillis( ) - millis) + " millis");
will result in:
[java] Calling method 1000000 times reflexively with lookup took 5618 millis
[java] Calling method 1000000 times reflexively with cache took 270 millis

Interestingly enough, settting setAccessible(true), which skips the security checks, has a 20% reduction in cost.
Without setAccessible(true)
new A(), 70 ns
A.class.newInstance(), 214 ns
new A(), 84 ns
A.class.newInstance(), 229 ns
With setAccessible(true)
new A(), 69 ns
A.class.newInstance(), 159 ns
new A(), 85 ns
A.class.newInstance(), 171 ns

Reflection is slow, though object allocation is not as hopeless as other aspects of reflection. Achieving equivalent performance with reflection-based instantiation requires you to write your code so the jit can tell which class is being instantiated. If the identity of the class can't be determined, then the allocation code can't be inlined. Worse, escape analysis fails, and the object can't be stack-allocated. If you're lucky, the JVM's run-time profiling may come to the rescue if this code gets hot, and may determine dynamically which class predominates and may optimize for that one.
Be aware the microbenchmarks in this thread are deeply flawed, so take them with a grain of salt. The least flawed by far is Peter Lawrey's: it does warmup runs to get the methods jitted, and it (consciously) defeats escape analysis to ensure the allocations are actually occurring. Even that one has its problems, though: for example, the tremendous number of array stores can be expected to defeat caches and store buffers, so this will wind up being mostly a memory benchmark if your allocations are very fast. (Kudos to Peter on getting the conclusion right though: that the difference is "150ns" rather than "2.5x". I suspect he does this kind of thing for a living.)

Yes, it is significantly slower. We were running some code that did that, and while I don't have the metrics available at the moment, the end result was that we had to refactor that code to not use reflection. If you know what the class is, just call the constructor directly.

In the doReflection() is the overhead because of Class.forName("misc.A") (that would require a class lookup, potentially scanning the class path on the filsystem), rather than the newInstance() called on the class. I am wondering what the stats would look like if the Class.forName("misc.A") is done only once outside the for-loop, it doesn't really have to be done for every invocation of the loop.

Yes, always will be slower create an object by reflection because the JVM cannot optimize the code on compilation time. See the Sun/Java Reflection tutorials for more details.
See this simple test:
public class TestSpeed {
public static void main(String[] args) {
long startTime = System.nanoTime();
Object instance = new TestSpeed();
long endTime = System.nanoTime();
System.out.println(endTime - startTime + "ns");
startTime = System.nanoTime();
try {
Object reflectionInstance = Class.forName("TestSpeed").newInstance();
} catch (InstantiationException e) {
e.printStackTrace();
} catch (IllegalAccessException e) {
e.printStackTrace();
} catch (ClassNotFoundException e) {
e.printStackTrace();
}
endTime = System.nanoTime();
System.out.println(endTime - startTime + "ns");
}
}

Often you can use Apache commons BeanUtils or PropertyUtils which introspection (basically they cache the meta data about the classes so they don't always need to use reflection).

I think it depends on how light/heavy the target method is. if the target method is very light(e.g. getter/setter), It could be 1 ~ 3 times slower. if the target method takes about 1 millisecond or above, then the performance will be very close. here is the test I did with Java 8 and reflectasm :
public class ReflectionTest extends TestCase {
#Test
public void test_perf() {
Profiler.run(3, 100000, 3, "m_01 by refelct", () -> Reflection.on(X.class)._new().invoke("m_01")).printResult();
Profiler.run(3, 100000, 3, "m_01 direct call", () -> new X().m_01()).printResult();
Profiler.run(3, 100000, 3, "m_02 by refelct", () -> Reflection.on(X.class)._new().invoke("m_02")).printResult();
Profiler.run(3, 100000, 3, "m_02 direct call", () -> new X().m_02()).printResult();
Profiler.run(3, 100000, 3, "m_11 by refelct", () -> Reflection.on(X.class)._new().invoke("m_11")).printResult();
Profiler.run(3, 100000, 3, "m_11 direct call", () -> X.m_11()).printResult();
Profiler.run(3, 100000, 3, "m_12 by refelct", () -> Reflection.on(X.class)._new().invoke("m_12")).printResult();
Profiler.run(3, 100000, 3, "m_12 direct call", () -> X.m_12()).printResult();
}
public static class X {
public long m_01() {
return m_11();
}
public long m_02() {
return m_12();
}
public static long m_11() {
long sum = IntStream.range(0, 10).sum();
assertEquals(45, sum);
return sum;
}
public static long m_12() {
long sum = IntStream.range(0, 10000).sum();
assertEquals(49995000, sum);
return sum;
}
}
}
The complete test code is available at GitHub:ReflectionTest.java

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

JMH - why JIT does not eliminate my dead-code - java

Related

Why does running multiple lambdas in loops suddenly slow down?

MethodHandles.lookup().lookupClass() vs getClass()

Why does .toString() seem to fix an OutOfMemoryError exception for StringBuilder?

How to run JMH from inside JUnit tests?

Java Reflection Performance

Categories

Resources