How to inspect the stack using an ASM visitor?

I am attempting to use the Java byte code engineering library ASM to perform static analysis. I have the situation where I would like to inspect the variables being assigned to a field.
I have MethodVisitor which implements the visitFieldInsn() method. I am specifically looking for the putfield command. That is no problem. The problem is that when I encounter putfield, I want to be able to access the variable that's going to be assigned to the field. Specifically I want to access information about the type of the variable.
At the moment I really only need to look at what's at the top of the stack, but if there's a more general way to inspect it that's even better.
Is there a way using ASM to inspect the variables on the stack?

First of all, if you can assume that the bytecode is valid, the type of the value assigned to a field must be compatible with the field type, which you can read in advance using the ClassReader API.
However, if you need to track where each individual value in a stack or local variable slot came from at a given instruction, you can use the Analyzer API with SourceInterpreter. It lets you find the instruction that produced a given value, and you can use information about that instruction to deduce a type (e.g. if it reads a variable that corresponds to a method parameter, or if the value was returned from a method call, you can get the type from the method descriptor in both cases). Also see my old blog post that has an example of using SourceInterpreter.
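To make the idea of tracking stack contents concrete, here is a toy model in plain Java. It does not use ASM at all: `StackTypeSketch`, `Op` and `Insn` are hypothetical names invented for this sketch, and the two-opcode instruction set is a drastic simplification. It only illustrates the principle that a dataflow analyzer simulates the operand stack so that, on reaching a putfield, the type of the value on top is known.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Toy model only: none of these types belong to ASM.
class StackTypeSketch {
    enum Op { PUSH, PUTFIELD }

    // A simplified instruction: PUSH carries the pushed type, PUTFIELD the field name.
    static final class Insn {
        final Op op;
        final String arg;
        Insn(Op op, String arg) { this.op = op; this.arg = arg; }
    }

    // Simulates the operand stack and records the type on top at each PUTFIELD.
    static List<String> typesAssigned(List<Insn> code) {
        Deque<String> stack = new ArrayDeque<>();
        List<String> report = new ArrayList<>();
        for (Insn insn : code) {
            if (insn.op == Op.PUSH) {
                stack.push(insn.arg);
            } else {
                String valueType = stack.pop(); // value being stored
                stack.pop();                    // object reference
                report.add(insn.arg + " <- " + valueType);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        List<Insn> code = Arrays.asList(
                new Insn(Op.PUSH, "LOwner;"),            // e.g. ALOAD 0 (this)
                new Insn(Op.PUSH, "Ljava/lang/String;"), // e.g. ALOAD 1 (a parameter)
                new Insn(Op.PUTFIELD, "name"));
        System.out.println(typesAssigned(code));
    }
}
```

With ASM, the same information would come from the frame computed for the putfield instruction rather than from a hand-written simulation.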

I am not familiar with ASM, but I have done something that sounds similar with the Eclipse Java AST framework. To know about variables, I had to keep track of variable declarations myself in the appropriate visitX() methods of the AST visitor. It wasn't very difficult once I knew which AST nodes corresponded to variable declarations.

Related

Do something when a variable is (re)assigned Java

This is a far-fetched question and I am not sure how to approach this problem, so I am open to other workarounds or proposals. As far as I am aware, what I am trying to do is impossible, but I'd like a second input.
Assume we have the following Java code:
int val = 4;
I am curious as to if some sort of function is called when this statement is executed. An overridable function that assigns a given memory location to this value, or something of that nature.
My objective would be to override that function and store this data here and in a file elsewhere as well.
This would need to work for all data types and for reassignments such as that shown below.
val = getNumber(); // Returns 6;
I would have some sort of direction if I was working with Python, but unfortunately, that is not the case.
My best idea for a solution is to call a function that simply returns a provided argument. Due to the application of this, I'd like to avoid this and keep the usage of this framework as conventional as possible.
Thanks!

I don't think any kind of function is called when a value is assigned. When you assign a value to a local variable of a primitive type (int, double, ...), the value is stored in the method's stack frame. If the variable is of a reference type (String, ...), the object itself lives on the heap and only the reference is stored in the variable. Fields are stored inside their object on the heap. Whenever you assign to the variable again, the new value overwrites the previous one in place. So there is no method involved in assignment that you could override.
If you want to deny access to a variable from outside the class but still change its state, you can use encapsulation, a core OOP concept in Java.
For further clarification, refer to this article about stack vs. heap.
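As a rough illustration of the encapsulation approach, the sketch below hides the value behind a setter so that every reassignment can be observed. The class name `Tracked` and its methods are hypothetical, and the in-memory list merely stands in for the file the question wants to write to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical holder: every assignment goes through set(), which notifies listeners.
class Tracked<T> {
    private T value;
    private final List<Consumer<T>> listeners = new ArrayList<>();

    Tracked(T initial) { this.value = initial; }

    // Registers a callback that runs on each later assignment.
    void onAssign(Consumer<T> listener) { listeners.add(listener); }

    T get() { return value; }

    void set(T newValue) {
        value = newValue;
        for (Consumer<T> listener : listeners) {
            listener.accept(newValue);
        }
    }

    public static void main(String[] args) {
        List<Integer> log = new ArrayList<>();   // stands in for "a file elsewhere"
        Tracked<Integer> val = new Tracked<>(4); // int val = 4;
        val.onAssign(log::add);
        val.set(6);                              // val = getNumber();
        System.out.println(val.get() + " " + log);
    }
}
```

The trade-off is the one the question hoped to avoid: callers must write `val.set(...)` instead of a plain assignment, because Java offers no hook on the assignment operator itself.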

Create dynamic classes with reserved words as variables

This question was once asked without a satisfactory answer besides "why would you want to do this" at Reserved words as variable or method names. I'm going to ask it again, and provide context that explains why it is necessary, and even the direction to a proper solution.
I am writing code that builds classes dynamically to match the schema of a database, which I have no control over. For the most part, the code is working cleanly, but in about .1% of the column cases, there are reserved words in Java being used as column names. The following code is being used to create the dynamic field in the class:
evalClass.addField(CtField.make("public " + columnType + " " + columnName + ";", evalClass));
Now, with Java the language, this results in an issue, however in JVM byte code, this should be perfectly legal, so there should be a way to dynamically create this field and access it using byte-code operations. Does anybody have any examples of how this would be done in a way that would support arbitrary strings, including spaces and reserved words? Thanks!

It's not clear which part you are stuck on. Any bytecode manipulation library should let you do this.
For example, using ASM, you just pass your string directly to visitField. There are no hoops to jump through.
Note that even at the bytecode level, there are still a few restrictions on field names. In particular, they can't be more than 65535 bytes long in modified UTF-8 (MUTF-8) encoding.

You picked the only way where this doesn’t work—Javassist’s source level API. It should be obvious to you that if you use the identifier to construct source code, the identifier must adhere to the source code rules. Besides, using the already known intended structure to construct source code which has to be parsed again to reconstitute the intention, is the most inefficient way of processing byte code.
You can use the Bytecode level API to overcome these limitations. As a side note, most other byte code processing libraries do not have a source level API at all, so with them you would have used a byte code level API right from the start.
That said, you should rethink your premise. Generated classes whose fields can only be accessed via Reflection or other generated code, do not offer any advantage over, e.g. a HashMap mapping from identifiers to values or arrays intrinsically associating columns with positions.
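The map-based alternative can be sketched as follows. `Row` and its methods are hypothetical names for illustration; the point is only that column names are ordinary strings, so reserved words and names with spaces need no special handling.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical row type: column names are plain map keys, not Java identifiers.
class Row {
    private final Map<String, Object> columns = new LinkedHashMap<>();

    // Stores a column value and returns this row so calls can be chained.
    Row set(String column, Object value) {
        columns.put(column, value);
        return this;
    }

    // Returns the value cast to the requested type (ClassCastException on mismatch).
    <T> T get(String column, Class<T> type) {
        return type.cast(columns.get(column));
    }

    public static void main(String[] args) {
        Row row = new Row().set("class", "economy").set("order total", 42);
        System.out.println(row.get("class", String.class));
        System.out.println(row.get("order total", Integer.class));
    }
}
```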

Access variable/constant values in method call

I want to view arguments for method calls. So if I call foo:
x = 4;
y = 5;
...
foo(x, y, 20, 25);
I want to print the arguments(4,5,20,25)
I understand these arguments are pushed onto the stack before the method is invoked. How do I get the value(if initialized or a constant) from the method's local variable array?
visitVarInsn() and VarInsnNode do not have a way to lookup the actual value from the array.
Do I need to use an Analyzer and Interpreter to do this, or is there an easier way?
EDIT: Figured out how to do this.
I modified BasicValue and BasicInterpreter to account for bytecode instruction arguments.
So Values representing instructions like BIPUSH contain information about the value being pushed, instead of only type information.
Frames are examined the same way with an Analyzer

Constant numeric values passed directly to the method call (20 and 25) are easy to retrieve statically: they result in push instructions (BIPUSH, SIPUSH) that you can read in visitIntInsn. Small values (-1 to 5 for int) result in const instructions that you can catch with visitInsn, and large values are loaded with LDC, which you can catch with visitLdcInsn.
I don't believe it is generally possible to determine the values bound to variables at the point of the method call statically. You will need to do a dataflow analysis (using Analyzer and Interpreter as you suggest) which should be able to provide the range of possible values for each variable. This won't give you definite values in the general case, but will in the specific cases of variables that are only assigned once or assigned multiple times, but unconditionally.

This is not related to ASM and bytecode manipulation, but just in case:
if method foo is declared in an interface, you can use Proxy to wrap the interface implementation and intercept method calls.
Also, you may find this answer useful for ASM bytecode modifications.
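A minimal sketch of the Proxy approach, assuming foo is declared in an interface. The interface `Calculator` and the helper `logging` are hypothetical names made up for this example; only `java.lang.reflect.Proxy` and `InvocationHandler` are real JDK APIs.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArgLoggingSketch {
    // Hypothetical interface standing in for the type that declares foo.
    interface Calculator {
        int foo(int a, int b, int c, int d);
    }

    // Wraps target in a proxy that records "method[args]" before delegating.
    static Calculator logging(Calculator target, List<String> calls) {
        InvocationHandler handler = (proxy, method, args) -> {
            calls.add(method.getName() + Arrays.toString(args));
            return method.invoke(target, args);
        };
        return (Calculator) Proxy.newProxyInstance(
                Calculator.class.getClassLoader(),
                new Class<?>[] { Calculator.class },
                handler);
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        Calculator real = (a, b, c, d) -> a + b + c + d;
        Calculator wrapped = logging(real, calls);
        int x = 4, y = 5;
        wrapped.foo(x, y, 20, 25);
        System.out.println(calls);
    }
}
```

Unlike static analysis, this sees the actual runtime values of x and y, but it only works for calls made through the interface.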

Visiting arrays access using ASM

I'd like to know if it's possible to trace access to an array using ASM API.
My goal is to determine which index of an array is accessed, and when (this part is easy, using System.nanoTime()). I just couldn't find a way to determine which index is being accessed.
I have been trying to use those following without any success - visitFieldInsn (for static and non static vars of a class ), visitVarInsn ( for static and nonstatic local variables ), and visitMultiANewArrayInsn - which didn't really recognize any array.

The particular index is not part of the instruction. You have to peek at the operand stack to find out which index the instruction refers to. See the JVM reference.
You don't want to wreak havoc on the operand stack, however. For an array load, the index is on top of the stack, so when you encounter the instruction, perform a DUP to duplicate the index, then print the value or do whatever you like with it, and then continue by visiting the original instruction. For an array store, the value to be stored is on top and the index sits just below it, so a plain DUP is not enough there.
You should know, however, that there are multiple different instructions to access an array:
aaload, iaload, laload, faload, daload, saload, baload and caload for reading, and
aastore, iastore, lastore, fastore, dastore, sastore, bastore and castore for writing

It's worth noting that nanoTime() takes about 100x longer than the array access itself. This could significantly skew results.
Have you tried looking at your code with the ASMifier? This should show you what events are triggered by your code.
BTW you can replace the array lookups with method calls, e.g.
public static int arrayGet(int[] array, int index)
This will allow you to put in Java whatever you want it to do when an int[] is accessed.
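A possible shape for such helpers is sketched below. The class name `ArrayTrace`, the `accessed` list and the `arraySet` counterpart are assumptions added for illustration; the instrumented bytecode would call these static methods in place of the original iaload and iastore instructions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that instrumented code would call instead of IALOAD/IASTORE.
class ArrayTrace {
    // Indices seen so far; a real tracer might also store a timestamp per access.
    static final List<Integer> accessed = new ArrayList<>();

    public static int arrayGet(int[] array, int index) {
        accessed.add(index);
        return array[index];
    }

    public static void arraySet(int[] array, int index, int value) {
        accessed.add(index);
        array[index] = value;
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        arraySet(data, 2, arrayGet(data, 0) + 1); // data[2] = data[0] + 1
        System.out.println(accessed + " " + data[2]);
    }
}
```

The argument order matches the operand stack layout (array reference, index, then value for stores), so the replacement needs no stack shuffling.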

Java source refactoring of 7000 references

I need to change the signature of a method used all over the codebase.
Specifically, the method void log(String) will take two additional arguments (Class c, String methodName), which need to be provided by the caller, depending on the method where it is called. I can't simply pass null or similar.
To give an idea of the scope, Eclipse found 7000 references to that method, so if I change it the whole project will go down. It will take weeks for me to fix it manually.
As far as I can tell, Eclipse's refactoring plugin is not up to the task, but I really want to automate it.
So, how can I get the job done?

Great, I can copy a previous answer of mine and I just need to edit a tiny little bit:
I think what you need to do is use a source code parser like javaparser to do this.
For every java source file, parse it to a CompilationUnit, create a Visitor, probably using ModifierVisitor as base class, and override (at least) visit(MethodCallExpr, arg). Then write the changed CompilationUnit to a new File and do a diff afterwards.
I would advise against changing the original source file, but creating a shadow file tree may be a good idea (e.g. old file: src/main/java/com/mycompany/MyClass.java, new file: src/main/refactored/com/mycompany/MyClass.java; that way you can diff the entire directories).

Eclipse can do that using Refactor -> Change Method Signature, which lets you provide default values for the new parameters.
For the class parameter, the default value should be this.getClass(), but you are right in your comment: I don't know how to do it for the method name parameter.

IntelliJ IDEA shouldn't have any trouble with this.
I'm not a Java expert, but something like this could work. It's not a perfect solution (it may even be a very bad solution), but it could get you started:
Change the method signature with IntelliJ's refactoring tools, and specify default values for the 2 new parameters:
c: this.getClass()
methodName: Thread.currentThread().getStackTrace()[1].getMethodName()
or better yet, simply specify null as the default values.

I think that there are several steps to dealing with this, as it is not just a technical issue but a 'situation':
Decline to do it in short order due to the risk.
Point out the issues caused by not using standard frameworks but reinventing the wheel (as Paul says).
Insist on using Log4j or equivalent if making the change.
Use Eclipse refactoring in sensible chunks to make the changes and deal with the varying defaults.
I have used Eclipse refactoring on quite large changes for fixing old smelly code - nowadays it is fairly robust.

Maybe I'm being naive, but why can't you just overload the method name?
void thing(String paramA) {
    // old signature delegates to the new one with default values
    thing(paramA, THE_DEFAULT_B, THE_DEFAULT_C);
}
void thing(String paramA, Class<?> paramB, String paramC) {
    // new method
}

Do you really need to change the calling code and the method signature? What I'm getting at is it looks like the added parameters are meant to give you the calling class and method to add to your log data. If the only requirement is just adding the calling class/method to the log data then Thread.currentThread().getStackTrace() should work. Once you have the StackTraceElement[] you can get the class name and method name for the caller.
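A minimal sketch of that idea, with the hypothetical class name `CallerLog`. Instead of Thread.currentThread().getStackTrace(), it uses new Throwable().getStackTrace(), a closely related call chosen here because its first frame is the method that created the Throwable, which makes the caller's position easy to reason about.

```java
class CallerLog {
    // Builds a log line prefixed with the class and method that called log().
    static String log(String text) {
        // Frame 0 of a fresh Throwable is log() itself; frame 1 is its caller.
        StackTraceElement caller = new Throwable().getStackTrace()[1];
        String line = caller.getClassName() + "." + caller.getMethodName() + ": " + text;
        System.out.println(line);
        return line;
    }

    static String doWork() {
        return log("working");
    }

    public static void main(String[] args) {
        doWork();
    }
}
```

With this approach the 7000 call sites stay untouched, at the cost of capturing a stack trace on every log call.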

If the lines you need replaced fall into a small number of categories, then what you need is Perl:
find . -name '*.java' | xargs perl -pi -e 's/log\(([^,)]*?)\)/log($1, "foo", "bar")/g'
I'm guessing that it wouldn't be too hard to hack together a script which would put the classname (derived from the filename) in as the second argument. Getting the method name in as the third argument is left as an exercise to the reader.

Try refactoring with IntelliJ. It has a feature called SSR (Structural Search and Replace). You can refer to classes, method names, etc. for context. (seanizer's answer is more promising, I upvoted it)

I agree with Seanizer's answer that you want a tool that can parse Java. That's necessary but not sufficient; what you really want is a tool that can carry out a reliable mass-change.
To do this, you want a tool that can parse Java, can pattern match against the parsed code, install the replacement call, and spit out the answer without destroying the rest of the source code.
Our DMS Software Reengineering Toolkit can do all of this for a variety of languages, including Java. It parses complete Java source systems and builds abstract syntax trees (for the entire set of code).
DMS can apply pattern-directed, source-to-source transformations to achieve the desired change.
To achieve the OP's effect, he would apply the following program transformation:
rule replace_legacy_log(s:STRING): expression -> expression
" log(\s) " -> " log( \s, \class\(\), \method\(\) ) "
What this rule says is, find a call to log which has a single string argument, and replace it with a call to log with two more arguments determined by auxiliary functions class and method.
These functions determine the containing method name and containing class name for the AST node root where the rule finds a match.
The rule is written in "source form", but actually matches against the AST and replaces found ASTs with the modified AST.
To get back the modified source, you ask DMS to simply prettyprint (to make a nice layout) or fidelity print (if you want the layout of the old code preserved). DMS preserves comments, number radixes, etc.
If the existing application has more than one definition of the "log" function, you'll need to add a qualifier:
... if IsDesiredLog().
where IsDesiredLog uses DMS's symbol table and inheritance information to determine if the specific log refers to the definition of interest.

In fact, your problem cannot be solved by a click-and-play engine that replaces all occurrences of
log("some weird message");
with
log("some weird message", this.getClass(), new Exception().getStackTrace()[0].getMethodName());
because that has little chance of working in all cases (static methods, for example).
I would suggest you take a look at Spoon. This tool allows source code parsing and transformation, letting you carry out the change in a code-based way that is slower but controlled.
However, you could also consider having your existing method explore the stack trace to get this information or, even better, use log4j internally with a log formatter that displays the correct information.

I would search and replace log( with log(#class, #method,
Then write a little script in any language (even Java) to find the class name and the method names and to replace the #class and #method tokens...
Good luck

If the class and method name are required for "where did this log come from?" type data, then another option is to print out a stack trace in your log method. E.g.
import java.io.PrintWriter;
import java.io.StringWriter;
public void log(String text)
{
    // Capture the current stack trace as a string
    StringWriter sw = new StringWriter();
    PrintWriter pw = new PrintWriter(sw, true);
    new Throwable().printStackTrace(pw);
    pw.flush();
    String stackTraceAsLog = sw.toString();
    // do something with text and stackTraceAsLog
}
