Access variable/constant values in method call - java

I want to view arguments for method calls. So if I call foo:
x = 4;
y = 5;
...
foo(x, y, 20, 25);
I want to print the arguments (4, 5, 20, 25).
I understand these arguments are pushed onto the stack before the method is invoked. How do I get the value (if initialized or a constant) from the method's local variable array?
visitVarInsn() and VarInsnNode do not have a way to look up the actual value from the array.
Do I need to use an Analyzer and Interpreter to do this, or is there an easier way?
EDIT: Figured out how to do this.
I modified BasicValue and BasicInterpreter to account for bytecode instruction arguments.
So Values representing instructions like BIPUSH contain information about the value being pushed, instead of only type information.
Frames are examined the same way with an Analyzer

Constant numeric values passed directly to the method call (20 and 25) are easy to retrieve statically: they compile to push instructions (BIPUSH, SIPUSH) that you can read in visitIntInsn. Small int values (-1 to 5) compile to const instructions such as ICONST_4, which you can catch with visitInsn, and large values are loaded from the constant pool, which you can catch with visitLdcInsn.
I don't believe it is generally possible to determine the values bound to variables at the point of the method call statically. You will need to do a dataflow analysis (using Analyzer and Interpreter as you suggest) which should be able to provide the range of possible values for each variable. This won't give you definite values in the general case, but will in the specific cases of variables that are only assigned once or assigned multiple times, but unconditionally.

It's not related to ASM and bytecode manipulation, but just in case:
if method foo is declared in an interface that its class implements, you can use a java.lang.reflect.Proxy to wrap the implementation and intercept each call, including the method and its arguments.
Also, you may find this answer useful for ASM bytecode modifications.
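A minimal sketch of the Proxy approach, assuming foo is declared in an interface. The names Calculator, SimpleCalculator and ArgumentLoggingDemo are hypothetical and exist only for this illustration. Note that this captures argument values at run time rather than through static bytecode analysis.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical interface and implementation, used only for illustration.
interface Calculator {
    int foo(int a, int b, int c, int d);
}

class SimpleCalculator implements Calculator {
    @Override
    public int foo(int a, int b, int c, int d) {
        return a + b + c + d;
    }
}

class ArgumentLoggingDemo {
    // One "name[args]" entry per intercepted call.
    static final List<String> LOG = new ArrayList<>();

    // Wraps target in a dynamic proxy that records each call, then delegates.
    static Calculator wrap(Calculator target) {
        InvocationHandler handler = (proxy, method, args) -> {
            LOG.add(method.getName() + Arrays.toString(args));
            return method.invoke(target, args);
        };
        return (Calculator) Proxy.newProxyInstance(
                Calculator.class.getClassLoader(),
                new Class<?>[] {Calculator.class},
                handler);
    }

    public static void main(String[] argv) {
        int x = 4;
        int y = 5;
        Calculator calc = wrap(new SimpleCalculator());
        int result = calc.foo(x, y, 20, 25);
        System.out.println(LOG + " -> " + result);
    }
}
```

The handler sees the boxed argument values in args, so variables such as x and y show up with the values they held at the time of the call.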

Related

Do something when a variable is (re)assigned Java

This is a far-fetched question and I am not sure how to approach this problem, so I am open to other workarounds or proposals. As far as I am aware, what I am trying to do is impossible, but I'd like a second input.
Assume we have the following Java code:
int val = 4;
I am curious as to if some sort of function is called when this statement is executed. An overridable function that assigns a given memory location to this value, or something of that nature.
My objective would be to override that function and store this data here and in a file elsewhere as well.
This would need to work for all data types and for reassignments such as that shown below.
val = getNumber(); // Returns 6;
I would have some sort of direction if I was working with Python, but unfortunately, that is not the case.
My best idea for a solution is to call a function that simply returns a provided argument. Due to the application of this, I'd like to avoid this and keep the usage of this framework as conventional as possible.
Thanks!
I don't think any function is called when a value is assigned. When you assign a value to a local variable of a primitive type (int, double, ...), the value is stored on the stack. If the variable is of a reference type (String, ...), the object is stored on the heap and only the reference is stored on the stack. Whenever you assign to that variable again, the new value overwrites the previous one in the same place. So there is no assignment method that you could override.
If you want to deny access to a variable outside the class but still change its state, you can use the OOP concept of encapsulation in Java: keep the field private and change it through a method.
For further clarification, refer to this article about stack vs. heap.
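The encapsulation suggestion can be sketched as follows. Tracked and TrackedDemo are hypothetical names, and the in-memory history list stands in for whatever extra storage (such as a file) is needed. Every write has to go through set(), so this is the "call a function" workaround the question hoped to avoid, not a hook on plain assignment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder class: Java has no assignment hook, so the value is
// kept private and every write goes through set(), which can record it.
final class Tracked<T> {
    private T value;
    private final List<T> history = new ArrayList<>();

    Tracked(T initial) {
        set(initial);
    }

    // Stands in for "val = ..."; the extra bookkeeping happens here.
    void set(T newValue) {
        value = newValue;
        history.add(newValue); // could equally be appended to a file
    }

    T get() {
        return value;
    }

    // Returns a copy so callers cannot alter the recorded history.
    List<T> history() {
        return new ArrayList<>(history);
    }
}

class TrackedDemo {
    static int getNumber() {
        return 6;
    }

    public static void main(String[] args) {
        Tracked<Integer> val = new Tracked<>(4); // int val = 4;
        val.set(getNumber());                    // val = getNumber();
        System.out.println(val.get() + " " + val.history());
    }
}
```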

What exactly happens in the JVM when invoking an object's instance method?

I think I have finally found out how to word what is giving me so much trouble in understanding: how the virtual machine can access a class's methods and apply them only to a given instance (object), with the catch that the virtual machine is only given the reference/pointer variable.
This was compounded by the fact that most visualizations of methods interacting with the stack/heap (the ones shown to most beginner Java programmers) don’t go as deep as I am looking for.
I have done a lot of research, and I want to give a summary of what I learned and ask you to please correct me where I am wrong (or elaborate further if you think there is more that could be said)! Note that I am using this portion of an article I found (I am using it more as a visual reference; I understand some of the text in the article does not pertain to the question), so please take a look at it before reading onward:
So let’s say I have a reference/pointer variable foo1 that is of type Foo (created using a constructor called Foo). foo1 is stored on the stack, but the object it points to is stored on the heap (the Foo object has an instance variable int size;).
So I understand how foo1.size would give the integer value of size, because the value of foo1 is dereferenced to get the value of the field size (the reference/pointer variable has a direct address to where the size field is stored on the heap in the object).
But when foo1.bar() is run, what exactly does its bytecode translate to? And how is this method call performed at runtime (would it be correct to say the value of foo1 is being dereferenced to get method bar())?
Does it relate correctly to the diagram in the image above? That is, all within the JVM: does it go from the reference/pointer variable foo1 on the stack to the object on the heap, which holds a pointer to the full class data in the method area (containing a method table, which is just an array of pointers to the data for each instance method that can be invoked on objects of that class), which in turn has "pointer variables" to the actual method bytecode?
I apologize for how long-winded this post is, but I want to be extremely specific since I have had major trouble the past week trying to word my question properly. I know I sound sceptical of the article I am referencing, but it seems there are a lot of junk visualizations out there, and I want to be sure that I’m continuing my Java programming correctly and not on incorrect notions.
Ordinary instance method invocations get compiled to invokevirtual instructions.
This has been described in JVMS, §3.7. Invoking Methods:
The normal method invocation for an instance method dispatches on the run-time type of the object. (They are virtual, in C++ terms.) Such an invocation is implemented using the invokevirtual instruction, which takes as its argument an index to a run-time constant pool entry giving the internal form of the binary name of the class type of the object, the name of the method to invoke, and that method's descriptor (§4.3.3). To invoke the addTwo method, defined earlier as an instance method, we might write:
int add12and13() {
return addTwo(12, 13);
}
This compiles to:
Method int add12and13()
0 aload_0 // Push local variable 0 (this)
1 bipush 12 // Push int constant 12
3 bipush 13 // Push int constant 13
5 invokevirtual #4 // Method Example.addtwo(II)I
8 ireturn // Return int on top of operand stack;
// it is the int result of addTwo()
The invocation is set up by first pushing a reference to the current instance, this, on to the operand stack. The method invocation's arguments, int values 12 and 13, are then pushed. When the frame for the addTwo method is created, the arguments passed to the method become the initial values of the new frame's local variables. That is, the reference for this and the two arguments, pushed onto the operand stack by the invoker, will become the initial values of local variables 0, 1, and 2 of the invoked method.
How the invocation is performed at runtime is up to the particular JVM implementation, but using a vtable is very common. This basically matches the graphic in your question. The reference to the receiver object, which will become the this reference for the invoked method, is used to retrieve a method table.
In the HotSpot JVM, the metadata structure is called Klass (actually a common name, even across different implementations). See “Object header layout” on the OpenJDK Wiki:
An object header consists of a native-sized mark word, a klass word, a 32-bit length word (if the object is an array), a 32-bit gap (if required by alignment rules), and then zero or more instance fields, array elements, or metadata fields. (Interesting Trivia: Klass metaobjects contain a C++ vtable immediately after the klass word.)
When resolving a symbolic reference to a method, its corresponding index in the table will be identified and remembered for subsequent invocations, as it never changes. Then, the entry of the actual object’s class can be used for the invocation. Subclasses will have the entries of the superclass, new methods appended to the end, with the entries of overridden methods replaced.
This is the simple, unoptimized scenario. Most runtime optimizations work better when methods are inlined, to have the context of caller and callee in one piece of code to transform. Therefore, the HotSpot JVM will attempt inlining even for invokevirtual instructions to potentially overridable methods. As the wiki says:
Virtual (and interface) invocations are often demoted to "special" invocations, if the class hierarchy permits it. A dependency is registered in case further class loading spoils things.
Virtual (and interface) invocations with a lopsided type profile are compiled with an optimistic check in favor of the historically common type (or two types).
Depending on the profile, a failure of the optimistic check will either deoptimize or run through a (slow) vtable/itable call.
On the fast path of an optimistically typed call, inlining is common. The best case is a de facto monomorphic call which is inlined. Such calls, if back-to-back, will perform the receiver type check only once.
This aggressive or optimistic inlining will sometimes require deoptimization but will usually yield higher overall performance.
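The dispatch on the run-time type described in this answer can be observed from plain Java. In this small sketch, SubFoo and DispatchDemo are hypothetical names, while Foo and bar() mirror the question. The single call site in callBar is compiled once, yet the method that runs depends on the receiver's actual class.

```java
// Hypothetical classes; Foo and bar() mirror the names used in the question.
class Foo {
    int size = 1;

    String bar() {
        return "Foo.bar";
    }
}

class SubFoo extends Foo {
    @Override
    String bar() {
        return "SubFoo.bar";
    }
}

class DispatchDemo {
    // javac emits one invokevirtual referring to Foo.bar for this call;
    // the JVM selects the implementation from the receiver's run-time class.
    static String callBar(Foo receiver) {
        return receiver.bar();
    }

    public static void main(String[] args) {
        System.out.println(callBar(new Foo()));
        System.out.println(callBar(new SubFoo()));
    }
}
```

Running javap -c on the compiled DispatchDemo class shows the invokevirtual instruction in callBar.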

How do I list all local variables within a Java method / function?

My main question: I know you can generically output class fields with reflection, even if you do not know the variable names, types, or even how many there are. However, is there a way to list all variables within the current function or current scope, assuming I do not know what the variable names are?
In other words:
int x = 5;
int y = 42;
// some more code
//Now I want to println x and y, but assuming I cannot use "x" or "y".
I'd also be happy with an answer to this question:
Let's say I'm allowed to store the names of all variables, does that help? e.g.:
Set<String> varNames = new HashSet<String>();
int x = 5;
varNames.add("x");
int y = 42;
varNames.add("y");
// some more code
//Now with varNames, can I output x and y without using "x" or "y"?
Why am I asking this? I am translating XYZ language(s) to java using ANTLR, and I would like to provide a simple method to output the entire state of the program at any point in time.
Third possible solution I'd be happy with: If this is not possible in Java, is there any way I can write byte-code for a function that visits the calling function and examines the stack? This would also solve the problem.
What would be amazing is if Java had the equivalent of Python's eval() or php's get_defined_vars().
If it makes a difference, I'm using Java 6, but anything for Java 5, 6, or 7 should be good.
Thanks!
You can't, as far as I'm aware. At least, not with normal Java code. If you're able to run the bytecode through some sort of post-processor before running it, and assuming you're still building with the debug symbols included, then you could autogenerate the code to do it - but I don't believe there's any way of accessing local variables in the current stack frame via reflection.
If you do not want to use this as part of the normal execution path of your program but just for debugging, then use the Java platform debugger architecture (JPDA). Essentially, you would write your own debugger, set a breakpoint and use the JDI API to query the state of the program. Local variables can be listed with StackFrame#visibleVariables().
If the above is not an option, it will be very difficult to achieve. To get the variable names, you could parse the class file and read the local variable table attribute of the method. However, the only way to get the value of a local variable is via the aload/iload/etc. bytecode instructions. These have to be present in the method that you want to analyze, so you cannot put this functionality into a different helper method.
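Since the question notes that class fields can be listed with reflection, one workaround (not proposed in the answers above, and only a sketch) is for the generated code to keep the translated program's variables as fields of a state object instead of as locals. ProgramState and StateDump are hypothetical names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical state holder: the translator would emit fields here instead
// of local variables, because fields are visible to reflection.
class ProgramState {
    int x = 5;
    int y = 42;
}

class StateDump {
    // Returns the declared instance fields of the object as name -> value,
    // sorted by name so the result does not depend on reflection order.
    static Map<String, Object> dump(Object state) {
        Map<String, Object> values = new TreeMap<>();
        for (Field field : state.getClass().getDeclaredFields()) {
            if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
                continue; // skip compiler-generated and static fields
            }
            try {
                field.setAccessible(true);
                values.put(field.getName(), field.get(state));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot read " + field.getName(), e);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(dump(new ProgramState()));
    }
}
```

The cost is that every variable access in the generated code becomes a field access on the state object.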

Visiting arrays access using ASM

I'd like to know if it's possible to trace access to an array using ASM API.
My goal is to determine which index of an array is accessed, and when (this part is easy, using System.nanoTime()). I just couldn't find a way to determine which index is being accessed.
I have been trying to use the following without any success: visitFieldInsn (for static and non-static fields of a class), visitVarInsn (for local variables), and visitMultiANewArrayInsn, none of which really recognized any array access.
The particular index is not part of the instruction. You have to peek at the value at top of the operand stack to find out which index the instruction refers to. See the JVM reference.
You don't want to wreck the operand stack, however, so when you encounter an array-load instruction, emit a DUP to duplicate the top of the stack (the index the instruction refers to), then print the copy or do whatever you like with it, and then continue by visiting the original instruction. Note that for array-store instructions the value to be stored sits above the index on the stack, so a single DUP would copy the value rather than the index.
You should know, however, that there are multiple different instructions to access an array:
aaload, iaload, laload, faload, daload, saload, baload and caload for reading, and
aastore, iastore, lastore, fastore, dastore, sastore, bastore and castore for writing
It's worth noting that nanoTime() takes about 100x longer than the array access itself. This could significantly skew results.
Have you tried looking at your code with the ASMifier? This should show you what events are triggered by your code.
BTW, you can replace the array lookups with method calls, e.g.
public static int arrayGet(int[] array, int index)
This will allow you to write in Java whatever you want to happen when an int[] is accessed.
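A minimal sketch of such a helper, assuming the instrumented code calls it in place of each int[] read. The ArrayTrace class and its ACCESSED list are hypothetical, and the list is not thread-safe.

```java
import java.util.ArrayList;
import java.util.List;

class ArrayTrace {
    // Indices seen so far, in access order.
    static final List<Integer> ACCESSED = new ArrayList<>();

    // Drop-in replacement for the expression array[index]:
    // records the index, then performs the real read.
    public static int arrayGet(int[] array, int index) {
        ACCESSED.add(index);
        return array[index];
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        int sum = arrayGet(data, 2) + arrayGet(data, 0);
        System.out.println(sum + " " + ACCESSED);
    }
}
```

Because the helper takes the array and the index in the same order in which iaload expects them on the operand stack, a rewriter could swap each iaload for a static call to it.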

How to inspect the stack using an ASM visitor?

I am attempting to use the Java byte code engineering library ASM to perform static analysis. I have the situation where I would like to inspect the variables being assigned to a field.
I have a MethodVisitor which implements the visitFieldInsn() method. I am specifically looking for the putfield instruction. That is no problem. The problem is that when I encounter putfield, I want to be able to access the variable that's going to be assigned to the field. Specifically, I want to access information about the type of the variable.
At the moment I really only need to look at what's at the top of the stack, but if there's a more general way to inspect it that's even better.
Is there a way using ASM to inspect the variables on the stack?
First of all, if you can assume that the bytecode is valid, the type of the value assigned to a field should match the field type, which you can read in advance using the ClassReader API.
However, if you need to track where each individual value in a stack or variable slot at a given instruction came from, you can use the Analyzer API with SourceInterpreter. Basically, it lets you find the instruction that produced a given value, and you can use information about that instruction to deduce the type (e.g. if it reads from a variable that corresponds to a method parameter, or if the value was returned from a method call, then in both cases you can get the type from the method descriptor). Also see my old blog post that has an example of using SourceInterpreter.
I am not familiar with ASM, but I have done something that sounds similar with the Eclipse Java AST framework. To know about variables, I had to keep track of variable declarations myself in the appropriate visitX() methods of the AST visitor. It wasn't very difficult once I knew which AST nodes corresponded to variable declarations.
