I have the following Java code:
public int sign(int a) {
if(a<0) return -1;
else if (a>0) return 1;
else return 0;
}
which when compiled generated the following bytecode:
public int sign(int);
Code:
0: iload_1
1: ifge 6
4: iconst_m1
5: ireturn
6: iload_1
7: ifle 12
10: iconst_1
11: ireturn
12: iconst_0
13: ireturn
I want to know how the byte offset count (the first column) is calculated, in particular, why is the byte count for the ifge and ifle instructions 3 bytes when all the other instructions are single byte instructions?
As already pointed out in the comment: The ifge and ifle instructions have an additional offset.
The Java Virtual Machine Instruction Set specification for ifge and ifle contains the relevant hint here:
Format
if<cond>
branchbyte1
branchbyte2
This indicates that there are two additional bytes associated with this instruction, namely the "branch bytes". These bytes are composed to a single short value to determine the offset - namely, how far the instruction pointer should "jump" when the condition is satisfied.
Edit:
The comments made me curious: The offset is defined to be a signed 16 bit value, limiting the jumps to the range of +/- 32k. This does not cover the whole range of a possible method, which may contain up to 65535 bytes according to the code_length in the class file.
So I created a test class, to see what happens. This class looks like this:
class FarJump
{
public static void main(String args[])
{
call(0, 1);
}
public static void call(int x, int y)
{
if (x < y)
{
y++;
y++;
... (10921 times) ...
y++;
y++;
}
System.out.println(y);
}
}
Each of the y++ lines will be translated into a iinc instruction, consisting of 3 bytes. So the resulting byte code is
public static void call(int, int);
Code:
0: iload_0
1: iload_1
2: if_icmpge 32768
5: iinc 1, 1
8: iinc 1, 1
...(10921 times) ...
32762: iinc 1, 1
32765: iinc 1, 1
32768: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
32771: iload_1
32772: invokevirtual #4 // Method java/io/PrintStream.println:(I)V
32775: return
One can see that it still uses an if_icmpge instruction, with an offset of 32768 (Edit: It is an absolute offset. The relative offset is 32766. Also see this question)
By adding a single more y++ in the original code, the compiled code suddenly changes to
public static void call(int, int);
Code:
0: iload_0
1: iload_1
2: if_icmplt 10
5: goto_w 32781
10: iinc 1, 1
13: iinc 1, 1
....
32770: iinc 1, 1
32773: iinc 1, 1
32776: goto_w 32781
32781: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
32784: iload_1
32785: invokevirtual #4 // Method java/io/PrintStream.println:(I)V
32788: return
So it reverses the condition from if_icmpge to if_icmplt, and handles the far jump with a goto_w instruction, that contains four branch bytes and can thus cover (more than) a full method range.
The byte offsets can easily be calculated by summing up the size of each instruction before it. Instruction sizes are documented in the JVM specs.
The if<cond> instructions take up more space than the others because in addition to the single byte opcode, they have two extra bytes that specify the offset to jump to if the condition it true.
If you want to experiment further, you could for instance try using larger constants (like, say, 20) in your code. You'll see that the instructions to load those will also use up extra bytes to store the constant value. Commonly used small numbers have one-byte encodings (such as iconst_1) for efficiency.
Related
As far as I understood the JVM bytecode specification,
aload index pushes a reference found at index onto the current stackframe. In my case, I need to aload a reference found at an index which is stored in a local variable - something like this: aload (iload 2).
(How) Is this possible?
This is not possible. Otherwise verifier would not be able to prove the correctness of the bytecode.
To access values by index use array objects with the corresponding bytecodes, e.g. aaload.
There is no dedicated bytecode instruction to perform this kind of selector, so if you want to load a local variable based on the contents of another variable, you have to code this operation yourself, e.g.
// pick #1 or #2 based on #0 in [0..1]
0: iload_0
1: ifne 8
4: aload_1
5: goto 9
8: aload_2
9: // value is now on the stack, ready for use
or
// pick #1, #2, or #3 based on #0 in [0..2]
0: iload_0
1: ifne 8
4: aload_1
5: goto 18
8: iload_0
9: iconst_1
10: if_icmpne 17
13: aload_2
14: goto 18
17: aload_3
18: // value is now on the stack, ready for use
or, to have a variant that scales with higher numbers of variables:
// pick #1, #2, or #3 based on #0 in [0..2]
0: iload_0
1: tableswitch { // 0 to 2
0: 28
1: 32
2: 36
default: 36
}
28: aload_1
29: goto 37
32: aload_2
33: goto 37
36: aload_3
37: // value is now on the stack, ready for use
The last variant has been generated with the ASM library and the following helper method, calling it like loadDynamic(…, 0, 1, 3);
/**
* Load variable #(firstChoice + value(#selector))
*/
public static void loadDynamic(
MethodVisitor target, int selector, int firstChoice, int numChoices) {
Label[] labels = new Label[numChoices];
for(int ix = 0; ix < numChoices; ix++) labels[ix] = new Label();
Label end = new Label();
target.visitVarInsn(Opcodes.ILOAD, selector);
target.visitTableSwitchInsn(0, numChoices - 1, labels[numChoices - 1], labels);
target.visitLabel(labels[0]);
target.visitVarInsn(Opcodes.ALOAD, firstChoice);
for(int ix = 1; ix < numChoices; ix++) {
target.visitJumpInsn(Opcodes.GOTO, end);
target.visitLabel(labels[ix]);
target.visitVarInsn(Opcodes.ALOAD, firstChoice + ix);
}
target.visitLabel(end); // choosen value is now on the stack
}
Obviously, array based code using the intrinsic index based access instructions is much simpler and more efficient in most cases.
Why does Java have an IINC bytecode instruction?
There is already an IADD bytecode instruction that can be used to accomplish the same.
So why does IINC exist?
Only the original designers of Java can answer why they made particular design decisions. However, we can speculate:
IINC does not let you do anything that can't already be accomplished by a ILOAD/SIPUSH/IADD/ISTORE combo. The difference is that IINC is a single instruction, which only takes 3 or 6 bytes, while the 4 instruction sequence is obviously longer. So IINC slightly reduces the size of bytecode that uses it.
Apart from that, early versions of Java used an interpreter, where every instruction has overhead during execution. In this case, using a single IINC instruction could be faster than the equivalent alternative bytecode sequence. Note that JITting has made this largely irrelevant, but IINC dates back to the original version of Java.
As already pointed out a single iinc instruction is shorter than the a iload, sipush, iadd, istore sequence. There is also evidence, that performing a common-case code size reduction was an important motivation.
There are specialized instructions for dealing with the first four local variables, e.g. aload_0 does the same as aload 0 and it will be used often for loading the this reference on the operand stack. There’s an ldc instruction being able to refer to one of the first 255 constant pool items whereas all of them could be handled by ldc_w, branch instructions use two bytes for offsets, so only overly large methods have to resort to goto_w, and iconst_n instructions for -1 to 5 exist despite these all could be handled by bipush which supports values which all also could all be handled by sipush, which could be superseded by ldc.
So asymmetric instructions are the norm. In typical applications, there are a lot of small methods with only a few local variables and smaller numbers are more common than larger numbers. iinc is a direct equivalent to stand-alone i++ or i+=smallConstantNumber expressions (applied to local variables) which often occur within loops. By being able to express common code idioms in more compact code without loosing the ability to express all code, you’ll get great savings in overall code size.
As also already pointed out, there is only a slight opportunity for faster execution in interpreted executions which is irrelevant for compiled/optimized code execution.
Looking at this table, there are a couple important differences.
iinc: increment local variable #index by signed byte const
iinc uses a register instead of the stack.
iinc can only increment by a signed byte value. If you want to add [-128,127] to an integer, then you could use iinc, but as soon as you want to add a number outside that range you would need to use isub, iadd, or multiple iinc instructions.
E1:
TL;DR
I was basically right, except that the limit is signed short values (16 bits [-32768,32767]). There's a wide bytecode instruction which modifies iinc (and a couple other instructions) to use 16 bit numbers instead of 8 bit numbers.
Additionally, consider adding two variables together. If one of the variables is not constant, the compiler will never be able to inline its value to bytecode, so it cannot use iinc; it will have to use iadd.
package SO37056714;
public class IntegerIncrementTest {
public static void main(String[] args) {
int i = 1;
i += 5;
}
}
I'm going to be experimenting with the above piece of code. As it is, is uses iinc, as expected.
$ javap -c IntegerIncrementTest.class
Compiled from "IntegerIncrementTest.java"
public class SO37056714.IntegerIncrementTest {
public SO37056714.IntegerIncrementTest();
Code:
0: aload_0
1: invokespecial #8 // Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: iconst_1
1: istore_1
2: iinc 1, 5
5: return
}
i += 127 uses iinc as expected.
$ javap -c IntegerIncrementTest.class
Compiled from "IntegerIncrementTest.java"
public class SO37056714.IntegerIncrementTest {
public SO37056714.IntegerIncrementTest();
Code:
0: aload_0
1: invokespecial #8 // Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: iconst_1
1: istore_1
2: iinc 1, 127
5: return
}
i += 128 does not use iinc anymore, but instead iinc_w:
$ javap -c IntegerIncrementTest.class
Compiled from "IntegerIncrementTest.java"
public class SO37056714.IntegerIncrementTest {
public SO37056714.IntegerIncrementTest();
Code:
0: aload_0
1: invokespecial #8 // Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: iconst_1
1: istore_1
2: iinc_w 1, 128
8: return
}
i -= 601 also uses iinc_w:
$ javap -c IntegerIncrementTest.class
Compiled from "IntegerIncrementTest.java"
public class SO37056714.IntegerIncrementTest {
public SO37056714.IntegerIncrementTest();
Code:
0: aload_0
1: invokespecial #8 // Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: iconst_1
1: istore_1
2: iinc_w 1, -601
8: return
}
The _w suffix refers to the wide bytecode, which allows for constants up to 16 bits ([-32768, 32767]).
If we try i += 32768, we will see what I predicted above:
$ javap -c IntegerIncrementTest.class
Compiled from "IntegerIncrementTest.java"
public class SO37056714.IntegerIncrementTest {
public SO37056714.IntegerIncrementTest();
Code:
0: aload_0
1: invokespecial #8 // Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: iconst_1
1: istore_1
2: iload_1
3: ldc #16 // int 32768
5: iadd
6: istore_1
7: return
}
Additionally, consider the case where we are adding another variable to i (i += c). The compiler doesn't know if c is constant or not, so it cannot inline c's value to bytecode. It will use iadd for this case too:
int i = 1;
byte c = 3;
i += c;
$ javap -c IntegerIncrementTest.class
Compiled from "IntegerIncrementTest.java"
public class SO37056714.IntegerIncrementTest {
public SO37056714.IntegerIncrementTest();
Code:
0: aload_0
1: invokespecial #8 // Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: iconst_1
1: istore_1
2: iconst_3
3: istore_2
4: iload_1
5: iload_2
6: iadd
7: istore_1
8: return
}
This question already has answers here:
Performance difference between post- and pre- increment operators? [closed]
(2 answers)
Closed 7 years ago.
Is there in JAVA a performance difference between i++; and i--;
I'm not able to evaluate bytecode for this, and I think that simple benchmarks are not reliable because of dependence on a specific algorithm.
im not able to evaluate bytecode
Besides the duplicate which I linked and which shows some general things to consider when asking performance related questions:
Given the following sample code (System.err.println is essentially necessary so that the compiler does not optimize away the unused variable):
public class IncDec {
public static void main(String[] args) {
int i = 5;
i++;
System.err.println(i);
i--;
System.err.println(i);
}
}
Disassembled code:
> javap -c IncDec
Compiled from "IncDec.java"
public class IncDec {
public IncDec();
Code:
0: aload_0
1: invokespecial #8 // Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: iconst_5
1: istore_1 // int i = 5
2: iinc 1, 1 // i++
5: getstatic #16 // Field java/lang/System.err:Ljava/io/PrintStream;
8: iload_1
9: invokevirtual #22 // Method java/io/PrintStream.println:(I)V
12: iinc 1, -1 // i--
15: getstatic #16 // Field java/lang/System.err:Ljava/io/PrintStream;
18: iload_1
19: invokevirtual #22 // Method java/io/PrintStream.println:(I)V
22: return
}
So, no, there is no performance difference in this particular case on a bytecode level - both statements are compiled to the same bytecode instruction.
The JIT compiler could be free to do any additional optimization though.
In Java, there isn't a difference in speed between the two. At the most basic level, subtraction is simply addition. That is, taking the 2's complement and adding it.
The "Performance Tips" section in the Android documentation has a pretty bold claim:
one() is faster. It pulls everything out into local variables, avoiding the lookups. Only the array length offers a performance benefit.
where it refers to this code snippet:
int len = localArray.length;
for (int i = 0; i < len; ++i) {
sum += localArray[i].mSplat;
}
This surprised me a lot because localArray.length is just accessing an integer and if you'd use an intermediate variable, you'd have to do that exact same step again. Are we really saying that an intermediate variable that only has to go to x instead of y.x is faster?
I took a look over at this question which is about the same idea but uses an arraylist and its subsequent .size() method instead. Here the concensus seemed to be that there would be no difference since that method call is probably just going to be inlined to an integer access anyway (which is exactly the scenario we have here).
So I took to the bytecode to see if that could tell me anything.
Given the following source code:
public void MethodOne() {
int[] arr = new int[5];
for (int i = 0; i < arr.length; i++) { }
}
public void MethodTwo() {
int[] arr = new int[5];
int len = arr.length;
for (int i = 0; i < len; i++) { }
}
I get the following bytecode:
public void MethodOne();
Code:
0: iconst_5
1: newarray int
3: astore_1
4: iconst_0
5: istore_2
6: iload_2
7: aload_1
8: arraylength
9: if_icmpge 18
12: iinc 2, 1
15: goto 6
18: return
public void MethodTwo();
Code:
0: iconst_5
1: newarray int
3: astore_1
4: aload_1
5: arraylength
6: istore_2
7: iconst_0
8: istore_3
9: iload_3
10: iload_2
11: if_icmpge 20
14: iinc 3, 1
17: goto 9
20: return
They differ in the following instructions:
Method one
6: iload_2
7: aload_1
8: arraylength
9: if_icmpge 18
12: iinc 2, 1
15: goto 6
18: return
Method two
9: iload_3
10: iload_2
11: if_icmpge 20
14: iinc 3, 1
17: goto 9
20: return
Now, I'm not 100% sure how I have to interpret 8: arraylength but I think that just indicates the field you're accessing. The first method loads the index counter and the array and accesses the arraylength field while the second method loads the index counter and the intermediate variable.
I benchmarked the two methods as well with JMH (10 warmups, 10 iterations, 5 forks) which gives me the following benchmarking result:
c.m.m.Start.MethodOne thrpt 50 3447184.351 19973.900 ops/ms
c.m.m.Start.MethodTwo thrpt 50 3435112.281 32639.755 ops/ms
which tells me that the difference is negligible to inexistant.
On what is the Android documentation's claim that using an intermediate variable in a loop condition, based?
You misunderstood the documentation. They are not referring to what you have described (although I don't blame you, they should've put more effort into those docs :)).
It pulls everything out into local variables, avoiding the lookups.
By avoiding the lookups they refer to field vs local variable access cost. Accessing the field (mArray in the example in the docs) requires loading this first, then loading the field based on the fixed offset from this.
After a while, JIT will probably figure out what's going on and optimize the field access as well (if the field is not volatile or some other form of synchronization happens in the loop) and rewrite the code so that all of the variables participating in the loop are accessed/changed in the CPU registers and caches until the end of the loop.
Generally, it could be potentially much more work for JIT to figure out whether it is safe to optimize the access to the length of the array referenced from a field compared to the reference stored in a local variable. Let's say we have the following loop:
for (int i = 0; i < array.length; ++i) {
process(array[i]);
}
If array is a field and process invokes thousands of lines of complex code, then JIT may find it difficult to check whether the array field is changed somewhere in the loop to reference some other array which has a different length.
Obviously, it is much easier to check whether the local variable is changed in this case (three lines of code).
Actually no it didn't make the loop faster, your idea is right when using the String.length()
the difference is that
array.length is just a field which have a value that you just use it directly..
String.length() is a method which need a time to be executed.
I'm going to make some investigations about dividing large arrays/matrix computations among multiple threads. But I need to know the relative time complexity of Java basic operations.
For instance:
int a = 23498234;
int b = -34234;
int[] array = new int[10000];
int c = a + b; // 1
int c = array[234]; // 2
String 1 (summary of two integers) is 10+ times faster than string 2 (memory access)
or (i & 1) == 0 is 10+ faster than i % 2 == 0.
Question: Can you supppose time relations between next operations:
+, * and / operands (suppose on Integer type)
memory access
starting new thread
For performance timing, there are many confounding factors. Rather than try to get exact timings, it's better to understand what's going on and measure what you can.
The time utility will give you detailed stats on an executable, but keep in mind you're timing the JVM which is running the code, not just your code.
You might try the javap disassembler too -- ultimately you'll want to know how your individual operations break down into java bytecode, and the amount of time it takes to execute certain key bits.
Example source code:
public class T {
public static void main(String [] args) {
int x=2;
int y=3;
int z=x+y;
System.out.println(""+x);
}
}
Compiled, then disassembled:
$ javap -c T
Compiled from "T.java"
public class T {
public T();
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: iconst_2
1: istore_1
2: iconst_3
3: istore_2
4: iload_1
5: iload_2
6: iadd
7: istore_3
8: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
11: new #3 // class java/lang/StringBuilder
14: dup
15: invokespecial #4 // Method java/lang/StringBuilder."<init>":()V
18: ldc #5 // String
20: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
23: iload_1
24: invokevirtual #7 // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
27: invokevirtual #8 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
30: invokevirtual #9 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
33: return
}
Look at code #6 - that's where the actual addition is happening.
One thing you need to establish is how the operations you're interested in turn into bytecode.
Within the JVM itself, you can use System.getCurrentTimeMillis() as a way of timing, but it won't give you sub-ms resolution. You can also use System.nanoTime(); to get higher precision time, (in the sense that it's sub-ms resolution) but it's less accurate.