Visiting arrays access using ASM

I'd like to know if it's possible to trace access to an array using ASM API.
My goal is to determine which index of an array is accessed, and when (this part is easy, using System.nanoTime()). I just couldn't find a way to determine which index is being accessed.
I have tried the following without any success: visitFieldInsn (for static and non-static fields of a class), visitVarInsn (for local variables), and visitMultiANewArrayInsn, which didn't really recognize any array.

The particular index is not part of the instruction. You have to peek at the operand stack to find out which index the instruction refers to. See the JVM reference.
You don't want to disturb the operand stack, however. When you encounter an array load instruction, the index is on top of the stack, so perform a DUP to duplicate it, then print the value or do whatever you like with it, and then continue by visiting the original instruction. For the store instructions the value to be stored sits on top of the index, so a plain DUP would copy that value instead of the index.
You should know, however, that there are multiple different instructions to access an array:
aaload, iaload, laload, faload, saload, baload, caload and daload for reading, and
aastore, iastore, lastore, fastore, sastore, bastore, castore and dastore for writing.

It's worth noting that nanoTime() takes about 100x longer than the array access itself. This could significantly skew results.
Have you tried looking at your code with the ASMifier? This should show you what events are triggered by your code.
BTW you can replace the array lookups with method calls, e.g.
public static int arrayGet(int[] array, int index)
This will allow you to put in Java whatever you want it to do when an int[] is accessed.
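A minimal sketch of such a hook, assuming the bytecode rewrite itself (replacing each iaload with a call to the helper) is done elsewhere. The class name ArrayAccessLog and the READ_INDICES list are made up for illustration; only the arrayGet signature comes from the suggestion above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: none of these names are part of ASM.
final class ArrayAccessLog {
    // Indices read so far, in program order (not thread-safe).
    static final List<Integer> READ_INDICES = new ArrayList<>();

    // Stand-in for a rewritten iaload: record the index, then do the real load.
    static int arrayGet(int[] array, int index) {
        READ_INDICES.add(index);
        return array[index];
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        int sum = arrayGet(data, 2) + arrayGet(data, 0);
        System.out.println(sum + " " + READ_INDICES); // prints "40 [2, 0]"
    }
}
```

A helper like this takes the array reference and the index in the same order in which the load instruction expects them on the stack, which is what makes the substitution straightforward.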

Related

ASM - strange localVar index using newLocal from LocalVariableSorter

I'm adding new locals via newLocal from LocalVariablesSorter. The method I'm adding the locals to is an instance method with a long parameter. I'm adding two locals: one long, one object. There are no other local vars in the sample code.
As a result I would have expected the following slots / indexes:
0 - this
1 - the long param
3 - my 1st local added via `newLocal` - using two slots as it is a long
5 - my 2nd local added via `newLocal`
What I do get as return from newLocal is 3 and 7 though. Why such a big gap?
And to make things even more strange, when I add xSTORE instructions using those indexes and check the result with javap it shows me:
LSTORE 5
ASTORE 8
Note: Not only are the values different than the ones I've passed to the xSTORE instruction, the gap between them is also now 3 instead of 4 as before.
The resulting code works though. I would just like to understand what magic is happening here and why.
Thanks

The LocalVariablesSorter class has a design that makes it very easy to use incorrectly.
When calling methods defined by the MethodVisitor API on it, the local variables undergo the renumbering mentioned in the class documentation.
So when being used with a ClassReader, the visited old code gets transformed. Since you do not want the injected new code to undergo this transformation, but to use the newly defined variable(s), you have to bypass the LocalVariablesSorter and call methods on the underlying target MethodVisitor.
When you call visitVarInsn(LSTORE, 3) on the LocalVariablesSorter, it gets handled like an old instruction referring to index 3, and since you injected a new variable occupying indexes 3 and 4, the “old variable” at index 3 gets remapped to the next free index, which is 5 (and 6). Then, when you define your next new variable, it gets index 7, and calling visitVarInsn(ASTORE, 7) on the LocalVariablesSorter is handled like an access to an old variable that conflicts with your new variable, so it gets remapped to 8.
This behavior matches exactly what the first sentence of the class documentation states:
LocalVariablesSorter
A MethodVisitor that renumbers local variables in their order of appearance.
So while you have to call newLocal on the LocalVariablesSorter to create a new variable that won’t get remapped, you have to call the visit… methods on the original, wrapped MethodVisitor to use it. When you use the subclass GeneratorAdapter, you can use its newly defined methods (those not starting with visit…) to create new instructions which don’t get transformed. To me, though, this makes matters even worse: the same class then has methods for transforming instructions and methods for creating untransformed instructions, and you always need to keep in mind that the visit… prefix makes the difference. For some methods, you would still need to access the original method visitor, as discussed in this answer which deals with visitLocalVariable to create debug information for the created variable.

How to increment a value in Java Stream?

I want to increment the value of index by 1 with each iteration. This is easy to achieve in a for loop. The variable image is an array of ImageView.
Here is my for-loop.
for (Map.Entry<String, Item> entry : map.entrySet()) {
    image[index].setImage(entry.getValue().getImage());
    index++;
}
In order to practise Stream, I have tried to rewrite it to the Stream:
map.entrySet().stream()
    .forEach(e -> image[index++].setImage(e.getValue().getImage()));
Causing me the error:
error: local variables referenced from a lambda expression must be final or effectively final
How can I rewrite this as a Stream while still incrementing the variable index?

You shouldn't. These two look similar, but they are conceptually different. The loop is just a loop, but forEach instructs the library to perform the action on each element without specifying either the order of the actions (for parallel streams) or the threads that will execute them. If you use forEachOrdered, there are still no guarantees about threads, but at least you have the guarantee of a happens-before relationship between actions on subsequent elements.
Note especially that the docs say:
For any given element, the action may be performed at whatever time
and in whatever thread the library chooses. If the action accesses
shared state, it is responsible for providing the required
synchronization.
As @Marko noted in the comments below, though, it only applies to parallel streams, even if the wording is a bit confusing. Nevertheless, using a loop means that you don't even have to worry about all this complicated stuff!
So the bottom line is: use loops if that logic is a part of the function it's in, and use forEach if you just want to tell Java to “do this and that” to elements of the stream.
That was about forEach vs loops. Now on the topic of why the variable needs to be final in the first place, and why you can get away with class fields and array elements. As the error says, Java has the limitation that anonymous classes and lambdas can't access a local variable unless it never changes. That means not only that they can't change it themselves, but also that you can't change it outside them. But that only applies to local variables, which is why it works for everything else, like class fields or array elements.
The reason for this limitation, I think, is lifetime issues. A local variable exists only while the block containing it is executing. Everything else exists while there are references to it, thanks to garbage collection. And that everything else includes lambdas and anonymous classes too, so if they could modify local variables which have different lifetime, that could lead to problems similar to dangling references in C++. So Java took the easy way out: it simply copies the local variable at the time the lambda / anonymous class is created. But that would lead to confusion if you could change that variable (because the copy wouldn't change, and since the copy is invisible it would be very confusing). So Java just forbids any changes to such variables, and that's that.
There are many questions on the final variables and anonymous classes discussed already, like this one.
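To make the copy-at-capture rule concrete, here is a small sketch. It assumes a sequential Iterable.forEach over a List, and the class name CaptureDemo and method fill are made up for illustration. The lambda captures the reference to a one-element array, which never changes, while the element it points to lives on the heap and can be updated freely. This shows the mechanism only; it is not a recommendation over the plain loop, and it would not be safe with a parallel stream.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; names are illustrative only.
final class CaptureDemo {
    // Copies each value into a new array, using a one-element array as the counter.
    static String[] fill(List<String> values) {
        String[] target = new String[values.size()];
        // The reference 'index' is effectively final; the element index[0] is not a local variable.
        int[] index = {0};
        // int i = 0; values.forEach(v -> target[i++] = v);  // would not compile: i is modified
        values.forEach(v -> target[index[0]++] = v);
        return target;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fill(Arrays.asList("a", "b", "c")))); // prints "[a, b, c]"
    }
}
```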

Some kind of "zip" operation would be helpful here, though the standard Stream API lacks it. Some third-party libraries extending the Stream API provide it, including my free StreamEx library:
IntStreamEx.ints() // get stream of numbers 0, 1, 2, ...
.boxed() // box them
.zipWith(StreamEx.ofValues(map)) // zip with map values
.forKeyValue((index, item) -> image[index].setImage(item.getImage()));
See zipWith documentation for more details. Note that your map should have meaningful order (like LinkedHashMap), otherwise this would be pretty useless...
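For comparison, roughly the same pairing can be sketched with the plain JDK by streaming over the indices instead of the entries. This is an assumption-laden illustration, not the StreamEx approach: Slot is a made-up stand-in for ImageView, the map values are plain strings instead of Item objects, and the map is assumed to be a LinkedHashMap so that the order is meaningful.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.IntStream;

// Hypothetical demo class; names are illustrative only.
final class IndexedZip {
    // Made-up stand-in for ImageView: just remembers what was set.
    static final class Slot {
        String image;
        void setImage(String image) { this.image = image; }
    }

    // Assigns the i-th map value to the i-th slot.
    static void assign(Slot[] slots, Map<String, String> map) {
        List<String> values = new ArrayList<>(map.values());
        IntStream.range(0, values.size())
                 .forEach(i -> slots[i].setImage(values.get(i)));
    }

    public static void main(String[] args) {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("a", "cat.png");
        map.put("b", "dog.png");
        Slot[] slots = {new Slot(), new Slot()};
        assign(slots, map);
        System.out.println(slots[0].image + " " + slots[1].image); // prints "cat.png dog.png"
    }
}
```

Because the index is the stream element itself, no shared counter is mutated, so the lambda needs no captured mutable state.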

Optimizing Java Array Copy

So for my research group I am attempting to convert some old C++ code to Java and am running into an issue where in the C++ code it does the following:
method(array+i, other parameters)
Now I know that Java does not support pointer arithmetic, so I got around this by copying the subarray from array+i to the end of array into a new array, but this causes the code to run horribly slow (i.e. 100x slower than the C++ version). Is there a way to get around this? I saw someone mention a built-in method on here, but is that any faster?

Not only does your code become slower, it also changes the semantics of what is happening: when you make the call in C++, no array copying is done, so any change the method applies to the array happens in the original, not in a throw-away copy.
To achieve the same effect in Java change the signature of your function as follows:
void method(array, offset, other parameters)
Now the caller has to pass the position in the array that the method should consider the "virtual zero" of the array. In other words, instead of writing something like
for (int i = 0 ; i != N ; i++)
...
you would have to write
for (int i = offset ; i != offset+N ; i++)
...
This would preserve the C++ semantics of passing an array to a member function.
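A small sketch of the offset-based signature, with made-up names (OffsetDemo, scale) and a made-up operation, since the original method body is not shown. It illustrates both points above: no copy is made, and changes land in the caller's array.

```java
import java.util.Arrays;

// Hypothetical demo class; names and the doubling operation are illustrative only.
final class OffsetDemo {
    // Doubles 'length' elements of 'array' in place, starting at 'offset'.
    static void scale(double[] array, int offset, int length) {
        for (int i = offset; i != offset + length; i++) {
            array[i] *= 2.0;
        }
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4};
        // Equivalent in spirit to the C++ call scale(data + 2, 2).
        scale(data, 2, 2);
        System.out.println(Arrays.toString(data)); // prints "[1.0, 2.0, 6.0, 8.0]"
    }
}
```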

The C++ function probably relied on processing from the beginning of the array. In Java it should be configured to run from an offset into the array so the array doesn't need to be copied. Copying the array, even with System.arraycopy, would take a significant amount of time.
It could be defined as a Java method with something like this:
void method(<somearraytype> array, int offset, other parameters)
Then the method would start at the offset into the array, and it would be called something like this:
method(array, i, other parameters);

If you wish to pass a sub-array to a method, an alternative to copying the sub-array into a new array would be to pass the entire array with an additional offset parameter that indicates the first relevant index of the array. This would require changes in the implementation of method, but if performance is an issue, that's probably the most efficient way.

The right way to handle this is to refactor the method to take the signature
method(int[] array, int i, other parameters)
so that you pass the whole array (by reference), and then tell the method where to start its processing from. Then you don't need to do any copying.

Access variable/constant values in method call

I want to view arguments for method calls. So if I call foo:
x = 4;
y = 5;
...
foo(x, y, 20, 25);
I want to print the arguments (4, 5, 20, 25).
I understand these arguments are pushed onto the stack before the method is invoked. How do I get the value (if initialized or a constant) from the method's local variable array?
visitVarInsn() and VarInsnNode do not have a way to lookup the actual value from the array.
Do I need to use an Analyzer and Interpreter to do this, or is there an easier way?
EDIT: Figured out how to do this.
I modified BasicValue and BasicInterpreter to account for bytecode instruction arguments.
So Values representing instructions like BIPUSH contain information about the value being pushed, instead of only type information.
Frames are examined the same way with an Analyzer.

Constant numeric values passed directly to the method call (20 and 25) are easy to retrieve statically: they will result in push instructions (bipush, sipush) that you can read in visitIntInsn. Smaller values will result in const instructions you can catch with visitInsn, and larger values can be caught with visitLdcInsn.
I don't believe it is generally possible to determine the values bound to variables at the point of the method call statically. You will need to do a dataflow analysis (using Analyzer and Interpreter as you suggest) which should be able to provide the range of possible values for each variable. This won't give you definite values in the general case, but will in the specific cases of variables that are only assigned once or assigned multiple times, but unconditionally.

This is not related to ASM and bytecode manipulation, but just in case:
if method foo belongs to a class that implements an interface declaring foo, you can use java.lang.reflect.Proxy to wrap the interface implementation and intercept the method calls.
Also, you may find this answer useful for ASM bytecode modifications.
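A rough sketch of the Proxy idea, under the assumption that foo is declared on an interface. The interface Calc, the class ArgLoggingProxy and the CALLS list are all made up for illustration. Note that this observes argument values at run time when the call happens; it does not analyse the bytecode statically.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; names are illustrative only.
final class ArgLoggingProxy {
    // Made-up interface standing in for the one that declares foo.
    interface Calc {
        int foo(int a, int b, int c, int d);
    }

    // Calls seen so far, formatted as name[args] (not thread-safe).
    static final List<String> CALLS = new ArrayList<>();

    // Wraps 'target' so that every call is recorded before being forwarded.
    static Calc wrap(Calc target) {
        InvocationHandler handler = (proxy, method, args) -> {
            CALLS.add(method.getName() + Arrays.toString(args));
            return method.invoke(target, args);
        };
        return (Calc) Proxy.newProxyInstance(
                Calc.class.getClassLoader(), new Class<?>[] {Calc.class}, handler);
    }

    public static void main(String[] args) {
        Calc calc = wrap((a, b, c, d) -> a + b + c + d);
        int x = 4;
        int y = 5;
        System.out.println(calc.foo(x, y, 20, 25) + " " + CALLS); // prints "54 [foo[4, 5, 20, 25]]"
    }
}
```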

How to inspect the stack using an ASM visitor?

I am attempting to use the Java byte code engineering library ASM to perform static analysis. I have the situation where I would like to inspect the variables being assigned to a field.
I have a MethodVisitor which implements the visitFieldInsn() method. I am specifically looking for the putfield instruction. That is no problem. The problem is that when I encounter putfield, I want to be able to access the variable that's going to be assigned to the field. Specifically, I want to access information about the type of the variable.
At the moment I really only need to look at what's at the top of the stack, but if there's a more general way to inspect it that's even better.
Is there a way using ASM to inspect the variables on the stack?

First of all, if you can assume that the bytecode is valid, the type of the value assigned to a field should match the field type, which you can read in advance using the ClassReader API.
However, if you need to track where each individual value on the stack or in a variable slot came from at a given instruction, you can use the Analyzer API with SourceInterpreter. Basically, it lets you find the instruction that produced a given value, and you can use information about that instruction to deduce a type (e.g. if it reads from a variable which corresponds to a method parameter, or if the value was returned from a method call, you can get the type from the method descriptor in both cases). Also see my old blog post that has an example of using SourceInterpreter.

I am not familiar with ASM, but I have done something that sounds similar with the Eclipse Java AST framework. To know about variables, I had to keep track of variable declarations myself in the appropriate visitX() methods of the AST visitor. It wasn't very difficult once I knew which AST nodes corresponded to variable declarations.
