I am trying to use the ASM bytecode tree API for static analysis of Java Code.
I have a ClassNode cn, MethodNode m and the list of instructions in that method say InsnList list.
Suppose that for a given instruction (i.e. an AbstractInsnNode) s, I need to find all the definitions/assignments of the variable used at s in the above instruction list. To make it clearer, suppose a variable var is defined and initialized on line 2, then assigned some other value on line 8, and then used on line 12. Line 12 is my s in this case. Also, assume there is a lot of conditional code in the lines in between.
Is this possible with ASM? If so, how?
Thanks and Regards,
SJ
For clarity,
public void funcToAnalyze(String k, SomeClass v) {
int numIter = 0;
/*
Do cool stuff here.... modifies member variables and passed params too
*/
if (v.rank > 1 || numIter>200) {
magicFunction(k, 1);
}
}
Here, suppose the conditional is the JumpInsnNode (current instruction) and I need to find if (and where) any of the variables in the conditional (v.rank and numIter in this case) are modified or assigned anywhere in the above code. Keep it simple, just member variables (no static function or delegation to function of another class).
The SourceInterpreter computes a SourceValue for every local variable and stack slot in the Frame of each instruction in a MethodNode. Basically, it tells you which instructions could have placed a value into a given variable or stack slot.
Also see the ASM User Guide for more information about the ASM analysis package.
However, if you just need to detect whether a certain local variable has been assigned, then all you have to do is look for xSTORE instructions (and IINC, which also modifies a local variable) with the corresponding variable index. Fields such as v.rank are written by PUTFIELD instructions instead.
I am trying to understand the Interpreter design pattern in Java. I took the following code from Wikipedia:
interface Expression {
public int interpret(Map<String,Expression> variables);
}
Could you explain what is going on here, given that Expression is the value type of the Map, which is itself a parameter of a method in the Expression interface? Is it something like a recursive call?
To answer your question: yes, the same method interpret() is called again and again, but on objects of different classes, so it is not actually a recursive function.
I used the Wikipedia code to explain the Interpreter pattern, but you will need to go back and forth to the Wikipedia page to see the whole picture:
http://en.wikipedia.org/wiki/Interpreter_pattern
In the Interpreter pattern, every variable and every operator in a given expression is represented by a separate class, and the evaluation then happens on objects of these classes.
(Below, Expression with a capital E refers to the class, and expression refers to ((y+z)-x).)
In the Wikipedia example you pointed out, when you call the constructor of Evaluator in main(), only the expression is constructed (it is again another Expression object) and saved in the syntaxTree reference variable of Evaluator.
To give the gist of what happens there with the expression x y z + -:
First, the variables x, y and z are stored as they are in the expressionStack variable.
When you encounter +, (y+z) is pushed onto the expressionStack.
After the - token, the ((y+z)-x) Expression object is on the expressionStack (check the push and pop that happen for the operator in Evaluator).
So once the constructor of Evaluator is done, you have an Expression object whose implementation is again an Expression, denoted as ((y+z)-x).
Now comes the interesting part: in main(), you substitute values for the variables (x, y, z) using the Number class, and it happens in this order:
main () sentence.interpret(variables);
Evaluator.syntaxTree.interpret(variables);
Variable.interpret(variables) // Here the actual values(5,10,42) gets substituted for x, y, z.
and then the Expression is evaluated.
If you look at interpret() of the Variable class, it is slightly different: it gets the corresponding Number object for the variable from the context passed in. This is the actual substitution of variables with numbers, using the context object passed in main(). This in turn calls interpret() of Number, which just returns the number, and the operation happens as ((10+5)-42) = -27.
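The call chain described above can be sketched in a few lines. This is a minimal, hand-built version and not the Wikipedia code itself: there is no Evaluator or parser, the tree for ((y+x)-z) is assembled directly, and the class InterpreterSketch and its evaluate() method are names invented for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Every node of the syntax tree implements the same interface.
interface Expression {
    int interpret(Map<String, Expression> variables);
}

// Terminal node: a literal value that ignores the context.
class Number implements Expression {
    private final int value;
    Number(int value) { this.value = value; }
    public int interpret(Map<String, Expression> variables) { return value; }
}

// Terminal node: looks its own name up in the context and delegates to what it finds.
class Variable implements Expression {
    private final String name;
    Variable(String name) { this.name = name; }
    public int interpret(Map<String, Expression> variables) {
        Expression bound = variables.get(name);
        return bound == null ? 0 : bound.interpret(variables);
    }
}

// Non-terminal nodes: each one interprets its children and combines the results.
class Plus implements Expression {
    private final Expression left, right;
    Plus(Expression left, Expression right) { this.left = left; this.right = right; }
    public int interpret(Map<String, Expression> variables) {
        return left.interpret(variables) + right.interpret(variables);
    }
}

class Minus implements Expression {
    private final Expression left, right;
    Minus(Expression left, Expression right) { this.left = left; this.right = right; }
    public int interpret(Map<String, Expression> variables) {
        return left.interpret(variables) - right.interpret(variables);
    }
}

// Hypothetical driver class, standing in for main() in the Wikipedia example.
class InterpreterSketch {
    static int evaluate() {
        // Hand-built tree for ((y + x) - z); a parser would normally produce this.
        Expression tree = new Minus(
                new Plus(new Variable("y"), new Variable("x")),
                new Variable("z"));

        // The context maps variable names to Expression objects (here: Numbers).
        Map<String, Expression> variables = new HashMap<String, Expression>();
        variables.put("x", new Number(5));
        variables.put("y", new Number(10));
        variables.put("z", new Number(42));

        return tree.interpret(variables); // (10 + 5) - 42
    }

    public static void main(String[] args) {
        System.out.println(evaluate()); // prints -27
    }
}
```

The Map holds Expression values rather than plain integers so that a Variable can delegate to whatever it is bound to through the same interpret() method that every other node uses.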
Advantage:
By using this technique you can keep on adding operations (plus, minus) without affecting the existing operations and one operation is independent of other. It is used in SQL queries and other interpreters.
Thanks,
Prasanna V.
An interface defines methods that a class has to provide if it implements the interface.
class MathExpression implements Expression {
    public int interpret(Map<String,Expression> variables) {
        // insert code here
        return 0; // placeholder so that the class compiles
    }
}
I wouldn't describe it as a recursive call. A more accurate description is that the interface refers to itself in a method signature.
Expression expression = new MathExpression();
expression.interpret(stringToExpressionMap);
The advantage of this is that you can call a behavior on that object without having to know its specific implementation.
I have a rather simple question about variable scope.
I am familiar with the Enhanced For-Loops but I do not get why I should declare a new variable to keep each element. One example might clarify my question:
int[] ar = {1, 2, 3};
int i = 0;
for(i : ar) { // this causes an error if I do not declare a new variable: int i
// for(int i : ar) // this works fine
System.out.println(i);
}
So why I should declare this new variable? After all i is accessible inside the for loop. I did not want to use any previous value of i, just did not want to declare a new variable. (I guessed for other iterable items it might be faster using the same variable).
I guess that's how Enhanced For-Loops were built, but doesn't this break the whole idea of scope?
A further question arises from this behavior: does the compiler use the same variable for the whole for loop and just update its value, or does it create a new variable for each iteration?
An interesting part is that if I keep both declarations of int i (before and inside the for loop), I even get a compiler error about
Duplicate local variable i
which makes (at least for me) things a bit more strange. So I cannot use the previous declared variable i inside the for loop but neither can I declare a new one inside it with the same name.
So why I should declare this new variable?
Because that's the way the syntax is defined.
After all i is accessible inside the for loop.
That's semantics. It's irrelevant to syntax.
I did not want to use any previous value of i, just did not want to declare a new variable. (I guessed for other iterable items it might be faster using the same variable).
Don't guess about performance. Test and measure. But in this case there's nothing to measure, because any working code is faster than any non-working code.
Does this mean that I have a local variable that gets different values, or a different variable in each loop?
From a language point of view you have a different variable in each iteration. That’s why you can write:
for(final ItemType item: iterable) {
…
}
which makes a great difference, as you can create inner class instances within the loop that refer to the current element. With Java 8 you can use lambdas as well and even omit the final modifier, but the semantics do not change: you don’t get surprising results like those in C#.
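A small sketch of what "a different variable in each iteration" means in practice, assuming Java 8 or later. The class name LoopCaptureSketch and the method capturedValues() are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class LoopCaptureSketch {
    static List<Integer> capturedValues() {
        int[] ar = {1, 2, 3};
        List<Supplier<Integer>> suppliers = new ArrayList<>();

        for (int i : ar) {
            // i is effectively final here: each iteration has its own i,
            // so each lambda captures the element of its own iteration.
            suppliers.add(() -> i);
        }

        // The lambdas are only invoked after the loop has finished.
        List<Integer> results = new ArrayList<>();
        for (Supplier<Integer> s : suppliers) {
            results.add(s.get());
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(capturedValues()); // prints [1, 2, 3]
    }
}
```

If the loop reused one variable declared outside of it, the lambda could not capture it at all, because a variable that is reassigned on every iteration is not effectively final.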
I guessed for other iterable items it might be faster using the same variable
That’s nonsense. As long as you don’t have a clue what the produced code looks like, you shouldn’t even guess.
But if you are interested in the details of Java byte code: within a stack frame local variables are addressed by a number rather than by a name. And the local variables of your program are mapped to these storage locations by reusing the storage of local variables that went out of scope. It makes no difference whether the variable exists during the entire loop or is “recreated” on every iteration. It will still occupy just one slot within the stack frame. Hence, trying to “reuse local variables” on a source code level makes no sense at all. It just makes your program less readable.
Just to have the reference here: The JLS Section 14.14.2, The enhanced for statement defines the enhanced for-loop to have the following structure (relevant for this question):
EnhancedForStatement:
for ( {VariableModifier} UnannType VariableDeclaratorId : Expression ) Statement
where UnannType can be summarized to be "a type" (primitive, reference...). So giving the type of the loop variable is simply obligatory according to the language specification - causing the (admittedly: somewhat confusing) observations described in the question.
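For arrays, the same JLS section describes the enhanced for statement as equivalent to a basic for loop that declares the loop variable inside its body. The sketch below writes that expansion out by hand; the names a and idx stand in for the compiler-generated identifiers, and the class EnhancedForExpansion is invented for this illustration.

```java
class EnhancedForExpansion {
    // The enhanced form as written in source code.
    static int sumEnhanced(int[] ar) {
        int sum = 0;
        for (int i : ar) {
            sum += i;
        }
        return sum;
    }

    // Roughly what the specification says the loop above means for an array.
    static int sumExpanded(int[] ar) {
        int sum = 0;
        int[] a = ar; // the array expression is evaluated once
        for (int idx = 0; idx < a.length; idx++) {
            int i = a[idx]; // the loop variable is declared anew in every iteration
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] ar = {1, 2, 3};
        System.out.println(sumEnhanced(ar) + " " + sumExpanded(ar)); // prints 6 6
    }
}
```

Seen this way, the declaration inside for ( ... : ... ) is the declaration of i in the loop body, which is why an i from the enclosing scope cannot be reused and why redeclaring it produces the "Duplicate local variable" error.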
The int i declared before the loop is visible to the for loop, and to any other for loops below it in the same scope. But the i inside for(int i : ar) is local to that for loop, so it ends once the execution of the loop is over. That's the syntax defined for the foreach loop: "you have to use a variable with scope limited to the loop".
So why I should declare this new variable? After all i is accessible inside the for loop. I did not want to use any previous value of i, just did not want to declare a new variable. (I guessed for other iterable items it might be faster using the same variable).
Why would there be any considerable performance benefit from using the same tiny primitive variable over and over, versus creating one only when needed that is destroyed after the loop ends?
I don't think anyone has answered the original question beyond just declaring that that is the syntax. We all know that that is the syntax. The question is, logically speaking, why?
After all, you can use a variable defined just before a loop as the loop variable, as long as the loop is a non-enhanced for loop!
I want to view arguments for method calls. So if I call foo:
x = 4;
y = 5;
...
foo(x, y, 20, 25);
I want to print the arguments (4, 5, 20, 25).
I understand these arguments are pushed onto the stack before the method is invoked. How do I get the value (if it is initialized or a constant) from the method's local variable array?
visitVarInsn() and VarInsnNode do not have a way to lookup the actual value from the array.
Do I need to use an Analyzer and Interpreter to do this, or is there an easier way?
EDIT: Figured out how to do this.
I modified BasicValue and BasicInterpreter to account for bytecode instruction arguments.
So Values representing instructions like BIPUSH contain information about the value being pushed, instead of only type information.
Frames are examined the same way with an Analyzer
Constant numeric values passed directly to the method call (20 and 25) are easy to retrieve statically: they result in push instructions (BIPUSH/SIPUSH) that you can read in visitIntInsn. Small values (-1 to 5 for ints) result in ICONST_x instructions you can catch with visitInsn, and values too large for SIPUSH are loaded with LDC, which you can catch with visitLdcInsn.
I don't believe it is generally possible to determine the values bound to variables at the point of the method call statically. You will need to do a dataflow analysis (using Analyzer and Interpreter as you suggest) which should be able to provide the range of possible values for each variable. This won't give you definite values in the general case, but will in the specific cases of variables that are only assigned once or assigned multiple times, but unconditionally.
This is not related to ASM and bytecode manipulation, but just in case:
if the method foo is declared in an interface that its class implements, you may use java.lang.reflect.Proxy to wrap the interface implementation and intercept the method calls, including their arguments.
Also, you may find this answer useful for ASM bytecode modifications.
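The Proxy idea can be sketched as follows. This observes the arguments at run time rather than by static analysis, and it only works when foo is reachable through an interface. The Foo interface, the ArgumentLoggingSketch class and its LOG list are all hypothetical names for this illustration, and the lambdas assume Java 8 or later.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical interface that declares the method we want to observe.
interface Foo {
    void foo(int a, int b, int c, int d);
}

class ArgumentLoggingSketch {
    // Collected "methodName[arg, arg, ...]" entries, kept in a list for easy inspection.
    static final List<String> LOG = new ArrayList<>();

    // Wraps any Foo so that each call is recorded before being forwarded.
    static Foo logging(Foo target) {
        InvocationHandler handler = (proxy, method, args) -> {
            LOG.add(method.getName() + Arrays.toString(args));
            return method.invoke(target, args);
        };
        return (Foo) Proxy.newProxyInstance(
                Foo.class.getClassLoader(),
                new Class<?>[] {Foo.class},
                handler);
    }

    public static void main(String[] args) {
        Foo real = (a, b, c, d) -> { /* the real work would happen here */ };
        Foo wrapped = logging(real);

        int x = 4;
        int y = 5;
        wrapped.foo(x, y, 20, 25);

        System.out.println(LOG); // prints [foo[4, 5, 20, 25]]
    }
}
```

The handler sees the actual values of x and y because it runs during the call, which is exactly what a purely static pass over the bytecode cannot provide in general.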
So I have been using Javassist a bit lately, and I have run into a question I haven't been able to find an answer to. The insertAt method of CtMethod allows you to insert code at a specific line number, but does it overwrite that line or keep it, and how do I make it do the opposite of what it does by default? I have an application which modifies source just before runtime with Javassist, based on 'hooks' in an XML file. I want to make it so that a line can be overridden, or a line can be placed above the line instead of overriding it. Obviously there are hackish ways to do that, but I'd rather use a proper way.
The easy part
The method insertAt(int lineNumber, String src) present in CtMethod object allows injecting the code written in src before the code that was in the given line.
For instance, take the following (simple) example program:
public class TestSubject {
public static void main(String[] args) {
TestSubject testSubject = new TestSubject();
testSubject.print();
}
private void print() {
System.out.println("One"); // line 9
System.out.println("Two"); // line 10
System.out.println("Three"); // line 11
}
}
By simply coding the following (keep in mind that the method variable must be the CtMethod representation of the print method):
// notice that I said line 10, which is where the sysout of "Two" is
method.insertAt(10, true, "System.out.println(\"one and an half\");");
you will inject a new sysout instruction into the class. The output of the new class will be:
One
one and an half
Two
Three
The hard part
Javassist does not provide an easy way to remove a line of code, so if you really want to replace it you'll have no choice but to hack your way through.
How to do it? Well, let me introduce you to your new friend (if you don't know it yet), the CodeAttribute object.
The CodeAttribute object is responsible for holding the bytecode that represents the method flow. Besides that, the code attribute also has another attribute called LineNumberAttribute, which helps you map the line numbers into the bytecode array. So, summing up, this object has everything you need!
The idea in the following example is quite simple: relate the bytes in the bytecode array to the line that should be removed, and substitute those bytes with a no-operation code.
Once again, method is the CtMethod representation of the print method.
// let's erase the sysout "Two"
int lineNumberToReplace = 10;
// Access the code attribute
CodeAttribute codeAttribute = method.getMethodInfo().getCodeAttribute();
// Access the LineNumberAttribute
LineNumberAttribute lineNumberAttribute = (LineNumberAttribute) codeAttribute.getAttribute(LineNumberAttribute.tag);
// Index in bytecode array where the instruction starts
int startPc = lineNumberAttribute.toStartPc(lineNumberToReplace);
// Index in the bytecode array where the following instruction starts
int endPc = lineNumberAttribute.toStartPc(lineNumberToReplace+1);
System.out.println("Modifying from " + startPc + " to " + endPc);
// Let's now get the bytecode array
byte[] code = codeAttribute.getCode();
for (int i = startPc; i < endPc; i++) {
// change byte to a no operation code
code[i] = CodeAttribute.NOP;
}
Running this modification on the original TestSubject class would result in an injected class with the following output:
One
Three
Summing Up
When you need to add a line and still keep the existing one, you just need to use the example given in the easy part. If you want to replace the line, you first have to remove the existing line using the example given in the hard part and then inject the new line using the first example.
Also keep in mind that in the examples I assumed you were already comfortable with the basics of Javassist, showing you only the juicy parts instead of the whole deal. That's why, for instance, there is no ctClass.writeFile... in the examples. You still need to do it; I just left it out because I expect you to know you have to do it.
If you need any extra help in the code examples, just ask. I'll be glad to help.
Recently I refactored the code of a 3rd party hash function from C++ to C. The process was relatively painless, with only a few changes of note. Now I want to write the same function in Java and I came upon a slight issue.
In the C/C++ code there is a C preprocessor macro that takes a few integer variables names as arguments and performs a bunch of bitwise operations with their contents and a few constants. That macro is used in several different places, therefore its presence avoids a fair bit of code duplication.
In Java, however, there is no equivalent for the C preprocessor. There is also no way to affect any basic type passed as an argument to a method - even autoboxing produces immutable objects. Coupled with the fact that Java methods return a single value, I can't seem to find a simple way to rewrite the macro.
Avenues that I considered:
Expand the macro by hand everywhere: It would work, but the code duplication could make things interesting in the long run.
Write a method that returns an array: This would also work, but it would repeatedly result in code like this:
long tmp[] = bitops(k, l, m, x, y, z);
k = tmp[0];
l = tmp[1];
m = tmp[2];
x = tmp[3];
y = tmp[4];
z = tmp[5];
Write a method that takes an array as an argument: This would mean that all variable names would be reduced to array element references - it would be rather hard to keep track of which index corresponds to which variable.
Create a separate class e.g. State with public fields of the appropriate type and use that as an argument to a method: This is my current solution. It allows the method to alter the variables, while still keeping their names. It has the disadvantage, however, that the State class will get more and more complex, as more macros and variables are added, in order to avoid copying values back and forth among different State objects.
How would you rewrite such a C macro in Java? Is there a more appropriate way to deal with this, using the facilities provided by the standard Java 6 Development Kit (i.e. without 3rd party libraries or a separate preprocessor)?
Option 3: create your own MutableInteger wrapper class.
class MutableInteger {
    public int value;
    public MutableInteger(int v) { this.value = v; }
}
// Rotates the three values: k takes m's old value, l takes k's, m takes l's.
public void swap3(MutableInteger k, MutableInteger l, MutableInteger m) {
    int t = m.value;
    m.value = l.value;
    l.value = k.value;
    k.value = t;
}
Create a separate class e.g. State with public fields of the appropriate type and use that as an argument to a method
This, but as an intermediate step. Then continue refactoring: ideally, class State should have private fields. Replace the macros with methods that update this state. Then replace all the rest of your code with methods that update the state, until eventually your program looks like:
System.out.println(new State(System.in).hexDigest());
Finally, rename State to SHA1 or whatever ;-)
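A minimal sketch of where that refactoring might end up, using only plain Java that also compiles on Java 6. The class MixState and its mix() method are invented for this illustration, and the bit operations are made up: they are not the macro from the original hash function, only a stand-in for "several variables updated in one step".

```java
// Hypothetical state holder: the variables the C macro used to modify
// become private fields, and the macro becomes an instance method.
class MixState {
    private long k;
    private long l;
    private long m;

    MixState(long k, long l, long m) {
        this.k = k;
        this.l = l;
        this.m = m;
    }

    // Plays the role of the macro: updates several variables in a single call.
    // The operations below are illustrative only, not a real hash round.
    void mix() {
        k ^= l;
        l = Long.rotateLeft(l, 7) + m;
        m ^= k;
    }

    long k() { return k; }
    long l() { return l; }
    long m() { return m; }

    public static void main(String[] args) {
        MixState state = new MixState(1, 2, 3);
        state.mix();
        System.out.println(state.k() + " " + state.l() + " " + state.m()); // prints 3 259 0
    }
}
```

Because the fields keep their names inside the class, the body of mix() reads much like the original macro body, and callers no longer need to copy values in and out of temporary arrays.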