ASM - strange localVar index using newLocal from LocalVariablesSorter

I'm adding new locals via newLocal from LocalVariablesSorter. The method I'm adding the locals to is an instance method with a long parameter. I'm adding two locals: one long, one object. There are no other local vars in the sample code.
As a result I would have expected the following slots / indexes:
0 - this
1 - the long param
3 - my 1st local added via `newLocal` - using two slots as it is a long
5 - my 2nd local added via `newLocal`
What I do get as return from newLocal is 3 and 7 though. Why such a big gap?
And to make things even more strange, when I add xSTORE instructions using those indexes and check the result with javap it shows me:
LSTORE 5
ASTORE 8
Note: Not only are the values different than the ones I've passed to the xSTORE instruction, the gap between them is now also 3 instead of 4 as before.
The resulting code works though. I would just like to understand what magic is happening here and why.
Thanks

The LocalVariablesSorter class has a design that makes it very easy to misuse.
When calling methods defined by the MethodVisitor API on it, the local variables undergo the renumbering mentioned in the class documentation.
So when it is used with a ClassReader, the visited old code gets transformed. Since you do not want the injected new code to undergo this transformation, but to use the newly defined variable(s), you have to bypass the LocalVariablesSorter and call methods on the underlying target MethodVisitor.
When you call visitVarInsn(LSTORE, 3) on the LocalVariablesSorter, it gets handled like an old instruction referring to index 3, and since you injected a new variable occupying index 3 and 4, the “old variable” at index 3 gets remapped to the next free index, which is 5 (and 6). Then, when you define your next new variable, it gets index 7, and calling visitVarInsn(ASTORE, 7) on the LocalVariablesSorter is handled like an old variable which conflicts with your new variable, so it gets remapped to 8.
This behavior matches exactly what the first sentence of the class documentation states:
LocalVariablesSorter
A MethodVisitor that renumbers local variables in their order of appearance.
So while you have to call newLocal on the LocalVariablesSorter to create a new variable that won’t get remapped, you have to call the visit… methods on the original, wrapped MethodVisitor to use it. When you use the subclass GeneratorAdapter, you can use its newly defined methods (those not starting with visit…) to create new instructions which don’t get transformed. To me, though, this makes matters even worse: the same class then has methods for transforming instructions and methods for creating untransformed instructions, and you always need to keep in mind that the visit… prefix makes the difference. For some methods, you would still need to access the original method visitor, as discussed in this answer, which deals with visitLocalVariable to create debug information for the created variable.
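The index arithmetic described above can be replayed without ASM on the classpath. The following is only a simplified model of the bookkeeping, not the ASM source: the class SorterModel and its methods are made up for illustration, and slot sizes are passed as plain ints instead of Type objects.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-alone model of the slot bookkeeping described above.
// This is NOT the ASM source; class and method names are made up for illustration.
class SorterModel {
    private final int firstLocal;  // first slot after "this" and the parameters
    private int nextLocal;         // next free slot
    private final Map<Integer, Integer> remapped = new HashMap<>();

    SorterModel(int firstLocal) {
        this.firstLocal = firstLocal;
        this.nextLocal = firstLocal;
    }

    // Models newLocal: reserves `size` fresh slots and returns the first one.
    int newLocal(int size) {
        int index = nextLocal;
        nextLocal += size;
        return index;
    }

    // Models visitVarInsn called ON the sorter: the index is treated as an
    // index from the old code and is moved to a fresh slot on first use.
    int visitOnSorter(int index, int size) {
        if (index + size <= firstLocal) {
            return index; // "this" and the parameters keep their slots
        }
        Integer mapped = remapped.get(index);
        if (mapped == null) {
            mapped = newLocal(size);
            remapped.put(index, mapped);
        }
        return mapped;
    }

    // Replays the question: instance method with a single long parameter.
    static int[] replayQuestion() {
        SorterModel sorter = new SorterModel(3); // slot 0 = this, slots 1-2 = long parameter
        int first = sorter.newLocal(2);                     // long local
        int firstStore = sorter.visitOnSorter(first, 2);    // LSTORE sent through the sorter
        int second = sorter.newLocal(1);                    // object local
        int secondStore = sorter.visitOnSorter(second, 1);  // ASTORE sent through the sorter
        return new int[] {first, firstStore, second, secondStore};
    }

    // Same scenario, but the stores go to the wrapped visitor, so nothing is remapped.
    static int[] replayBypassing() {
        SorterModel sorter = new SorterModel(3);
        int first = sorter.newLocal(2);
        int second = sorter.newLocal(1);
        return new int[] {first, second};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(replayQuestion()));   // [3, 5, 7, 8]
        System.out.println(Arrays.toString(replayBypassing()));  // [3, 5]
    }
}
```

The first line reproduces the numbers from the question (newLocal returning 3 and 7, stores landing on 5 and 8); the second shows the 3 and 5 the asker expected, which is what you get when the store instructions are not routed through the sorter.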

Related

What exactly happens in the JVM when invoking an object's instance method?

I think I have finally found a way to word what has been giving me so much trouble: how can the virtual machine access a class's methods and apply them to one particular instance (object), given that it only receives the reference/pointer variable?
This was compounded by the fact that most visualizations of methods interacting with the stack and heap (the ones shown to most beginner Java programmers) don’t go into the depth I am looking for.
I have done a lot of research and want to summarize what I have learned. Please correct me where I am wrong (or elaborate further if you think there is more that could be said)! Note that I am using this portion of an article I found as a visual reference (I understand that some of the text in the article does not pertain to the question), so please take a look at it before reading onward:
So let’s say I have a reference/pointer variable foo1 of type Foo (created using a constructor called Foo). foo1 is stored on the stack, but the object it points to is stored on the heap (the Foo object having an instance variable int size;).
So I understand how foo1.size would give the integer value of size: the value of foo1 is dereferenced to get the value of the size field (the reference/pointer variable has a direct address to where the size field is stored on the heap in the object).
But when foo1.bar() is run, what exactly does its bytecode translate to? And how is this method call performed at runtime (would it be correct to say the value of foo1 is being dereferenced to get method bar())?
Does it match the diagram in the image above? That is, all within the JVM: does the call go from the reference/pointer variable foo1 on the stack to the object on the heap, which holds a pointer to the full class data in the method area, where a method table (just an array of pointers, one for each instance method that can be invoked on objects of that class) in turn points to the actual bytecode of each method?
I apologize for how long-winded this post is, but I want to be extremely specific since I have had major trouble the past week trying to word my question properly. I know I sound sceptical of the article I am referencing, but it seems there are a lot of junk visualizations out there and I want to be sure that I’m continuing my Java programming correctly, and not on incorrect notions.

Ordinary instance method invocations get compiled to invokevirtual instructions.
This has been described in JVMS, §3.7. Invoking Methods:
The normal method invocation for a instance method dispatches on the run-time type of the object. (They are virtual, in C++ terms.) Such an invocation is implemented using the invokevirtual instruction, which takes as its argument an index to a run-time constant pool entry giving the internal form of the binary name of the class type of the object, the name of the method to invoke, and that method's descriptor (§4.3.3). To invoke the addTwo method, defined earlier as an instance method, we might write:
int add12and13() {
    return addTwo(12, 13);
}
This compiles to:
Method int add12and13()
0   aload_0             // Push local variable 0 (this)
1   bipush 12           // Push int constant 12
3   bipush 13           // Push int constant 13
5   invokevirtual #4    // Method Example.addtwo(II)I
8   ireturn             // Return int on top of operand stack;
                        // it is the int result of addTwo()
The invocation is set up by first pushing a reference to the current instance, this, on to the operand stack. The method invocation's arguments, int values 12 and 13, are then pushed. When the frame for the addTwo method is created, the arguments passed to the method become the initial values of the new frame's local variables. That is, the reference for this and the two arguments, pushed onto the operand stack by the invoker, will become the initial values of local variables 0, 1, and 2 of the invoked method.
It’s up to the particular JVM implementation how to perform the invocation at runtime, but using a vtable is very common. This basically matches the graphic in your question. The reference to the receiver object, which will become the this reference for the invoked method, is used to retrieve a method table.
In the HotSpot JVM, the metadata structure is called Klass (actually a common name, even across different implementations). See “Object header layout” on the OpenJDK Wiki:
An object header consists of a native-sized mark word, a klass word, a 32-bit length word (if the object is an array), a 32-bit gap (if required by alignment rules), and then zero or more instance fields, array elements, or metadata fields. (Interesting Trivia: Klass metaobjects contain a C++ vtable immediately after the klass word.)
When resolving a symbolic reference to a method, its corresponding index in the table will be identified and remembered for subsequent invocations, as it never changes. Then, the entry of the actual object’s class can be used for the invocation. Subclasses will have the entries of the superclass, new methods appended to the end, with the entries of overridden methods replaced.
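The table layout rule from the previous paragraph (inherit the superclass entries, replace overridden ones in place, append new ones) can be sketched in plain Java. This is a toy model only: VTableModel and its method names are invented here, and a real JVM keeps such tables in native metadata, not in Java lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Toy model of a vtable; it only mirrors the layout rule described above.
class VTableModel {
    private final List<String> names = new ArrayList<>();
    private final List<Supplier<String>> entries = new ArrayList<>();

    VTableModel() {
    }

    // A subclass table starts out as a copy of the superclass table.
    VTableModel(VTableModel parent) {
        names.addAll(parent.names);
        entries.addAll(parent.entries);
    }

    // Overrides replace the inherited entry in place; new methods are appended.
    void define(String name, Supplier<String> code) {
        int index = names.indexOf(name);
        if (index >= 0) {
            entries.set(index, code);
        } else {
            names.add(name);
            entries.add(code);
        }
    }

    int indexOf(String name) {
        return names.indexOf(name);
    }

    String invoke(int index) {
        return entries.get(index).get();
    }

    static String demo() {
        VTableModel animal = new VTableModel();
        animal.define("name", () -> "animal");
        animal.define("sound", () -> "...");

        VTableModel dog = new VTableModel(animal);
        dog.define("sound", () -> "woof");   // override: keeps the index it has in animal
        dog.define("fetch", () -> "stick");  // new method: appended at the end

        // The index is looked up once and is then valid for every subclass table.
        int soundIndex = animal.indexOf("sound");
        return animal.invoke(soundIndex) + " " + dog.invoke(soundIndex) + " " + dog.indexOf("fetch");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "... woof 2"
    }
}
```

Because an overriding method keeps the slot of the method it overrides, the caller only needs the remembered index and the receiver's own table to reach the right code.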
This is the simple, unoptimized scenario. Most runtime optimizations work better when methods are inlined, to have the context of caller and callee in one piece of code to transform. Therefore, the HotSpot JVM will attempt inlining even for invokevirtual instructions to potentially overridable methods. As the wiki says:
Virtual (and interface) invocations are often demoted to "special" invocations, if the class hierarchy permits it. A dependency is registered in case further class loading spoils things.
Virtual (and interface) invocations with a lopsided type profile are compiled with an optimistic check in favor of the historically common type (or two types).
Depending on the profile, a failure of the optimistic check will either deoptimize or run through a (slow) vtable/itable call.
On the fast path of an optimistically typed call, inlining is common. The best case is a de facto monomorphic call which is inlined. Such calls, if back-to-back, will perform the receiver type check only once.
This aggressive or optimistic inlining will sometimes require Deoptimization but will usually yield an overall higher performance.

How to increment a value in Java Stream?

I want to increment the value of index by 1 with each iteration. This is easy to achieve in a for-loop. The variable image is an array of ImageView.
Here is my for-loop.
for (Map.Entry<String, Item> entry : map.entrySet()) {
    image[index].setImage(entry.getValue().getImage());
    index++;
}
To practise with streams, I tried to rewrite it using the Stream API:
map.entrySet().stream()
    .forEach(e -> image[index++].setImage(e.getValue().getImage()));
This causes the error:
error: local variables referenced from a lambda expression must be final or effectively final
How can I rewrite this as a Stream while still incrementing the variable index?

You shouldn't. These two look similar, but they are conceptually different. The loop is just a loop, but a forEach instructs the library to perform the action on each element, specifying neither the order of the actions (for parallel streams) nor the threads which will execute them. If you use forEachOrdered, there are still no guarantees about threads, but at least you have the guarantee of a happens-before relationship between actions on subsequent elements.
Note especially that the docs say:
For any given element, the action may be performed at whatever time and in whatever thread the library chooses. If the action accesses shared state, it is responsible for providing the required synchronization.
As @Marko noted in the comments below, though, this only applies to parallel streams, even if the wording is a bit confusing. Nevertheless, using a loop means that you don't even have to worry about all this complicated stuff!
So the bottom line is: use loops if that logic is a part of the function it's in, and use forEach if you just want to tell Java to “do this and that” to elements of the stream.
That was about forEach vs loops. Now on the topic of why the variable needs to be final in the first place, and why you can do that to class fields and array elements. It's because, like it says, Java has the limitation that anonymous classes and lambdas can't access a local variable unless it never changes. This means that not only can they not change it themselves, but you can't change it outside them either. But that only applies to local variables, which is why it works for everything else like class fields or array elements.
The reason for this limitation, I think, is lifetime issues. A local variable exists only while the block containing it is executing. Everything else exists while there are references to it, thanks to garbage collection. And that everything else includes lambdas and anonymous classes too, so if they could modify local variables which have a different lifetime, that could lead to problems similar to dangling references in C++. So Java took the easy way out: it simply copies the local variable at the time the lambda / anonymous class is created. But that would lead to confusion if you could change that variable (because the copy wouldn't change, and since the copy is invisible it would be very confusing). So Java just forbids any changes to such variables, and that's that.
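To make the difference between a captured local and an array element concrete, here is a minimal sketch; the class name CaptureDemo is made up. It compiles because the local variable total is never reassigned, only the element it refers to. It is not a recommendation for streams: with a parallel stream this unsynchronized update would be a data race.

```java
import java.util.Arrays;
import java.util.List;

class CaptureDemo {
    static int sum(List<Integer> values) {
        // int total = 0;
        // values.forEach(v -> total += v);   // rejected: total is not effectively final

        int[] total = {0};                     // the local 'total' itself is never reassigned
        values.forEach(v -> total[0] += v);    // only the array element changes
        return total[0];
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3))); // prints 6
    }
}
```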
There are many questions on the final variables and anonymous classes discussed already, like this one.

Some kind of "zip" operation would be helpful here, though the standard Stream API lacks it. Some third-party libraries extending the Stream API provide it, including my free StreamEx library:
IntStreamEx.ints() // get stream of numbers 0, 1, 2, ...
.boxed() // box them
.zipWith(StreamEx.ofValues(map)) // zip with map values
.forKeyValue((index, item) -> image[index].setImage(item.getImage()));
See zipWith documentation for more details. Note that your map should have meaningful order (like LinkedHashMap), otherwise this would be pretty useless...
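For comparison, the same positional pairing can be approximated with the standard library alone by streaming over the indexes instead of the entries. This is a sketch under assumptions: plain String values stand in for the Item and ImageView types from the question, and IndexedCopy and fill are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.IntStream;

class IndexedCopy {
    // Copies the map values, in iteration order, into the target array by position.
    static void fill(Map<String, String> map, String[] target) {
        List<String> values = new ArrayList<>(map.values());
        IntStream.range(0, Math.min(values.size(), target.length))
                 .forEach(i -> target[i] = values.get(i)); // i is a lambda parameter, not a captured local
    }

    static String demo() {
        Map<String, String> map = new LinkedHashMap<>(); // predictable iteration order
        map.put("a", "first");
        map.put("b", "second");
        String[] target = new String[2];
        fill(map, target);
        return String.join(",", target);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "first,second"
    }
}
```

Each index is handled independently, so no shared counter has to be mutated from inside the lambda.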

Access variable/constant values in method call

I want to view arguments for method calls. So if I call foo:
x = 4;
y = 5;
...
foo(x, y, 20, 25);
I want to print the arguments (4, 5, 20, 25).
I understand these arguments are pushed onto the stack before the method is invoked. How do I get the value (if initialized or a constant) from the method's local variable array?
visitVarInsn() and VarInsnNode do not have a way to lookup the actual value from the array.
Do I need to use an Analyzer and Interpreter to do this, or is there an easier way?
EDIT: Figured out how to do this.
I modified BasicValue and BasicInterpreter to account for bytecode instruction arguments.
So Values representing instructions like BIPUSH contain information about the value being pushed, instead of only type information.
Frames are examined the same way with an Analyzer.

Constant numeric values passed directly to the method call (20 and 25) are easy to retrieve statically - they will result in push instructions that you can read in visitIntInsn. Smaller values will result in const instructions you can catch with visitInsn, large values can be caught with visitLdcInsn.
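As a rough guide to which callback sees which constant, the helper below maps an int value to the instruction a Java compiler typically emits for it. The class PushInstruction is hypothetical; the ranges are those of the JVM instruction set (iconst_m1 to iconst_5, bipush for byte range, sipush for short range, ldc otherwise).

```java
class PushInstruction {
    // Returns the mnemonic typically emitted to push the given int constant.
    static String forInt(int value) {
        if (value >= -1 && value <= 5) {
            return value == -1 ? "iconst_m1" : "iconst_" + value;  // seen in visitInsn
        }
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "bipush";                                        // seen in visitIntInsn
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "sipush";                                        // seen in visitIntInsn
        }
        return "ldc";                                               // seen in visitLdcInsn
    }

    public static void main(String[] args) {
        System.out.println(forInt(20));     // prints "bipush"
        System.out.println(forInt(4));      // prints "iconst_4"
        System.out.println(forInt(100000)); // prints "ldc"
    }
}
```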
I don't believe it is generally possible to determine the values bound to variables at the point of the method call statically. You will need to do a dataflow analysis (using Analyzer and Interpreter as you suggest) which should be able to provide the range of possible values for each variable. This won't give you definite values in the general case, but will in the specific cases of variables that are only assigned once or assigned multiple times, but unconditionally.

This is not related to ASM and bytecode manipulation, but just in case:
if method foo is declared in an interface that its class implements, you can use a Proxy to wrap the implementation and intercept the method calls together with their arguments.
Also, you may find this answer useful for ASM bytecode modifications.
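A minimal sketch of the Proxy approach mentioned above, using java.lang.reflect.Proxy. The interface Calculator, the class ArgumentLogger and the LOG list are assumptions made up for this example; only the Proxy and InvocationHandler calls are real JDK API.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArgumentLogger {
    // Hypothetical interface standing in for the type that declares foo.
    interface Calculator {
        int foo(int a, int b, int c, int d);
    }

    static final List<String> LOG = new ArrayList<>();

    // Wraps the target so that every call is recorded with its actual argument values.
    static Calculator wrap(Calculator target) {
        InvocationHandler handler = (proxy, method, args) -> {
            LOG.add(method.getName() + Arrays.toString(args));
            return method.invoke(target, args);
        };
        return (Calculator) Proxy.newProxyInstance(
                Calculator.class.getClassLoader(),
                new Class<?>[] {Calculator.class},
                handler);
    }

    static String demo() {
        LOG.clear();
        Calculator real = (a, b, c, d) -> a + b + c + d;
        int x = 4;
        int y = 5;
        int result = wrap(real).foo(x, y, 20, 25);
        return LOG.get(0) + " -> " + result;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "foo[4, 5, 20, 25] -> 54"
    }
}
```

This observes the values at run time, so variables and constants look the same to the handler; it only works for calls made through the interface type.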

Visiting arrays access using ASM

I'd like to know if it's possible to trace access to an array using ASM API.
My goal is to determine which index of an array is accessed, and when (this part is easy, using System.nanoTime()). I just couldn't find a way to determine which index is being accessed.
I have tried the following without any success: visitFieldInsn (for static and non-static fields of a class), visitVarInsn (for local variables), and visitMultiANewArrayInsn, none of which really recognized any array access.

The particular index is not part of the instruction. You have to peek at the value at top of the operand stack to find out which index the instruction refers to. See the JVM reference.
You don't want to wreak havoc on the operand stack, however. So when you encounter an array load instruction, emit a DUP to duplicate the top of the stack (the index the instruction refers to), then print the value or do whatever you like with it, and then continue by visiting the original instruction. For the store instructions, the value to be stored lies on top of the index, so a plain DUP would copy the value rather than the index.
You should know, however, that there are multiple different instructions to access an array:
aaload, iaload, laload, faload, saload, baload, caload and daload for reading, and
aastore, iastore, lastore, fastore, sastore, bastore, castore and dastore for writing

It's worth noting that nanoTime() takes about 100x longer than the array access itself. This could significantly skew results.
Have you tried looking at your code with the ASMifier? This should show you which events are triggered by your code.
BTW, you can replace the array lookups with method calls, e.g.
public static int arrayGet(int[] array, int index)
This will allow you to put in Java whatever you want it to do when an int[] is accessed.
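A possible body for such a helper is sketched below; the class ArrayTrace and the ACCESSED list are hypothetical. The parameter order (array, then index) matches the operand stack layout before an iaload, so, as an assumption about how you would wire it up, an invokestatic with descriptor ([II)I could take the place of the original instruction.

```java
import java.util.ArrayList;
import java.util.List;

class ArrayTrace {
    // Indexes recorded so far, in access order.
    static final List<Integer> ACCESSED = new ArrayList<>();

    // Stand-in for an int[] read: records the index, then performs the read.
    public static int arrayGet(int[] array, int index) {
        ACCESSED.add(index);
        return array[index];
    }

    static String demo() {
        ACCESSED.clear();
        int[] data = {10, 20, 30};
        int sum = arrayGet(data, 2) + arrayGet(data, 0);
        return sum + " " + ACCESSED;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "40 [2, 0]"
    }
}
```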

Code analyzers: PMD & FindBugs

1. Regarding PMD:
1.1 How do I configure the PMD checks to ignore some of them, like "Variable name is too short, or too long", "Remove empty constructor", etc.? And if I do that, another warning appears that says the class must have some static methods. Basically, the class is empty, reserved for later development, and I would like to leave it that way for now.
1.2 Is it necessary to follow this warning advice?
A class which only has private constructors should be final
1.3 What is that supposed to mean?
The class 'Dog' has a Cyclomatic Complexity of 3 (Highest = 17)
1.4 What about this one? I would love to change this, but nothing crosses my mind at the moment regarding the change:
Assigning an Object to null is a code smell. Consider refactoring.
2. Regarding FindBugs:
2.1 Is it really that bad to write to a static field, at some point later than its declaration? The following code gives me a warning:
Main.appCalendar = Calendar.getInstance();
Main.appCalendar.setTimeInMillis(System.currentTimeMillis());
where appCalendar is a static variable.
2.2 This code:
strLine = objBRdr.readLine().trim();
gives the warning:
Immediate dereference of the result of readLine()
where objBRdr is a BufferedReader(FileReader). What could happen? Could readLine() return null?
The code is nested in a while (objBRdr.ready()) loop, and so far, I have had zero problems there.
Update1: 2.2 was fixed when I replaced the code with:
strLine = objBRdr.readLine();
if (strLine != null) {
    strLine = strLine.trim();
}

1.1 How do I set the PMD checks [...]
PMD stores rule configuration in a special repository referred to as the Ruleset XML file. This configuration file carries information about currently installed rules and their attributes.
These files are located in the rulesets directory of the PMD distribution. When using PMD with Eclipse, check Customizing PMD.
1.2 Is it necessary to follow this warning advice?
A class which only has private constructors should be final
All constructors always begin by calling a superclass constructor. If the constructor explicitly contains a call to a superclass constructor, that constructor is used. Otherwise the no-argument constructor is implied. If the no-argument constructor does not exist or is not visible to the subclass, you get a compile-time error.
So it's actually not possible to derive a subclass from a class whose every constructor is private (other than from a class nested in the same top-level class). Marking such a class as final is thus a good idea (but not necessary) as it explicitly prevents subclassing.
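A small sketch of what the rule is asking for. Temperatures is a made-up utility class; the point is only that the final modifier states explicitly what the private constructor already implies for outside code.

```java
// Hypothetical utility class: its only constructor is private, so no code
// outside this class can instantiate or extend it; 'final' documents that.
final class Temperatures {
    private Temperatures() {
        // no instances
    }

    static int celsiusToFahrenheit(int celsius) {
        return celsius * 9 / 5 + 32;
    }

    public static void main(String[] args) {
        System.out.println(celsiusToFahrenheit(100)); // prints 212
    }
}
```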
1.3 What is that supposed to mean?
The class 'Dog' has a Cyclomatic Complexity of 3 (Highest = 17)
The complexity is the number of decision points in a method plus one for the method entry. The decision points are 'if', 'while', 'for', and 'case labels'. Generally, 1-4 is low complexity, 5-7 indicates moderate complexity, 8-10 is high complexity, and 11+ is very high complexity.
That said, I'll just quote some parts of Aggregate Cyclomatic complexity is meaningless:
[...] This metric only has meaning in the context of a single method. Mentioning that a class has a Cyclomatic complexity of X is essentially useless.
Because Cyclomatic complexity measures pathing in a method, every method has at least a Cyclomatic complexity of 1, right? So, the following getter method has a CCN value of 1:
public Account getAccount(){
    return this.account;
}
It’s clear from this boogie method that account is a property of this class. Now imagine that this class has 15 properties and follows the typical getter/setter paradigm for each property and those are the only methods available. That means the class has 30 simple methods, each with a Cyclomatic complexity value of 1. The aggregate value of the class is then 30.
Does this value have any meaning, man? Of course, watching it over time may yield something interesting; however, on its own, as an aggregate value, it is essentially meaningless. 30 for the class means nothing, 30 for a method means something though.
The next time you find yourself reading a copasetic aggregate Cyclomatic complexity value for a class, make sure you understand how many methods the class contains. If the aggregate Cyclomatic complexity value of a class is 200, it shouldn’t raise any red flags until you know the count of methods. What’s more, if you find that the method count is low yet the Cyclomatic complexity value is high, you will almost always find the complexity localized to a method.
Right on!
So to me, this PMD rule should be taken with care (and is actually not very valuable).
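To make the per-method count from 1.3 tangible, here is a hypothetical method annotated with the counting rule given above (one for the method entry plus one per if, while, for or case label). A particular tool version may count slightly differently, so treat the total as an illustration of the rule, not as PMD's exact output.

```java
class ComplexityExample {
    // Counted by the rule quoted above: 1 (entry) + 1 (for) + 1 (if) + 1 (if) = 4.
    static int countPositiveEven(int[] values) {   // +1 method entry
        int count = 0;
        for (int v : values) {                      // +1 for
            if (v > 0) {                            // +1 if
                if (v % 2 == 0) {                   // +1 if
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countPositiveEven(new int[] {2, -4, 3, 6})); // prints 2
    }
}
```

A class consisting of four trivial getters would reach the same aggregate value of 4, which is why the number says more about a single method than about a whole class.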
1.4 What about this one? I would love to change this, but nothing crosses my mind at the moment regarding the change:
Assigning an Object to null is a code smell. Consider refactoring.
Not sure what you don't get about this one.
2.1 Is it really that bad to write to a static field, at some point later than its declaration? [...]
My guess is that you get a warning because the method contains an unsynchronized lazy initialization of a non-volatile static field. And because the compiler or processor may reorder instructions, threads are not guaranteed to see a completely initialized object, if the method can be called by multiple threads. You can make the field volatile to correct the problem.
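One common way to write such a lazy initialization safely is sketched below, under the assumption that the field really has to be created lazily. AppClock and calendar() are hypothetical names; the volatile field is what makes the double-checked pattern valid.

```java
import java.util.Calendar;

class AppClock {
    // volatile ensures other threads never see a partially initialized Calendar
    private static volatile Calendar appCalendar;

    static Calendar calendar() {
        Calendar local = appCalendar;          // single volatile read on the fast path
        if (local == null) {
            synchronized (AppClock.class) {
                local = appCalendar;           // re-check under the lock
                if (local == null) {
                    local = Calendar.getInstance();
                    appCalendar = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        System.out.println(calendar() == calendar()); // prints true
    }
}
```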
2.2 [...] Immediate dereference of the result of readLine()
If there are no more lines of text to read, readLine() will return null and dereferencing that will generate a null pointer exception. So you need indeed to check if the result is null.
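The usual idiom folds the null check into the loop condition, which also makes a separate ready() test unnecessary. In this sketch ReadLines is a made-up class and a StringReader stands in for the FileReader from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ReadLines {
    // Reads until readLine() returns null, which signals the end of the stream.
    static List<String> trimmedLines(BufferedReader reader) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(line.trim());
        }
        return lines;
    }

    static String demo() {
        try (BufferedReader reader = new BufferedReader(new StringReader("  a \n b\n"))) {
            return String.join("|", trimmedLines(reader));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "a|b"
    }
}
```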

Here are some ideas / answers:
1.4 What is the reason for assigning null to an object? If you reuse the same variable, there's no reason to set it to null first.
2.1 The reason for this warning is to make sure that all instances of the class Main see the same static field. In your Main class, you could have
static Calendar appCalendar = Calendar.getInstance();
As for 2.2, you're right: with the null check you can be sure you won't get a NullPointerException. You never know when your BufferedReader might block or fail; this doesn't happen often (in my experience), but you never know when a hard drive will crash.
