I am working on a library where we want to determine how much of our library is being used, i.e., we want to know how many methods in our library are public but never called.
Goal:
Static Analysis
Determine how many lines of code call each public method in package A in the current project. If the number of calls is zero, the method should be reported as such.
I believe you are looking for this Eclipse plugin: UCDetector
From the documentation (note the second bullet point):
Unnecessary (dead) code
Code where the visibility could be changed to protected, default or private
Methods or fields, which can be final
On a larger scale, if you want to do object-level static analysis, look at this tool from IBM: Structural Analysis for Java. It is really helpful for object analysis of libraries, APIs, etc.
Not exactly what you are looking for, but:
Something similar can be done with code coverage tools (like Cobertura). They do not do static inspection of the source code, but instrument the bytecode to gather metrics at runtime. Of course, you need to drive the application in a way that exercises all usage patterns, and you might still miss the rarer code paths.
On the static analysis front, maybe these tools can help you (the Apache project uses them to check for API compatibility for new releases, seems like that task is somewhat related to what you are trying to do):
Clirr is a tool that checks Java libraries for binary and source compatibility with older releases. Basically you give it two sets of jar files and Clirr dumps out a list of changes in the public api.
JDiff is a Javadoc doclet which generates an HTML report of all the packages, classes, constructors, methods, and fields which have been removed, added or changed in any way, including their documentation, when two APIs are compared.
Client use of reflective calls is one hole in static analysis to consider, as there's no way to know for sure that a particular method isn't being called via some bizarre reflection scheme. So a combination of runtime and static analysis might be best.
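To make the reflection hole concrete, here is a minimal sketch using only the JDK. The classes Library and ReflectiveClient are hypothetical names invented for this illustration. The method name is assembled at runtime, so a scan for invoke instructions targeting Library.unusedLookingMethod() finds nothing, even though the method does get called.

```java
import java.lang.reflect.Method;

// Hypothetical library class, used only for this illustration.
class Library {
    public static String unusedLookingMethod() {
        return "called";
    }
}

class ReflectiveClient {
    // The method name is built from two parts at runtime, so this class contains
    // no invoke instruction that targets Library.unusedLookingMethod().
    static String callByName(String prefix, String suffix) {
        try {
            Method m = Library.class.getMethod(prefix + suffix);
            return (String) m.invoke(null); // static method, so no receiver object
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Reflective call failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callByName("unusedLooking", "Method"));
    }
}
```

A static call counter would report the method as unused here, which is why a runtime check is a useful complement.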
I don't think you are able to measure how "often" a class or a function is needed.
There are some simple questions:
What defines whether a usage statistic of your game library is "normal" or an "outlier"? Is it wrong to kill yourself in the game too often? You would then use the "killScreen" class more frequently than a good gamer would.
What defines "much"? Time or usage count? POJOs consume very little time, but are used pretty frequently.
Conclusion:
I don't know what you are trying to accomplish.
If you want to display your code dependencies, there are other tools for doing this. If you're trying to measure your code execution, there are profilers and benchmarks for Java. If you are a statistics geek, you'll be happy with RapidMiner ;)
Good luck with that!
I would suggest JDepend, which shows you the dependencies between packages and classes and is excellent for finding cyclic dependencies:
http://clarkware.com/software/JDepend.html
(It has an Eclipse plugin: http://andrei.gmxhome.de/jdepend4eclipse/)
and also PMD for other metrics:
http://pmd.sourceforge.net/
IntelliJ has a tool to detect methods, fields and classes which can have more restricted modifiers. It also has a quick fix to apply these changes, which can save you a lot of work as well. If you don't want to pay for it, you can get the 30-day eval license, which is more than enough time to change your code; it's not something you should need to do very often.
BTW: IntelliJ has about 650 code inspections to improve code quality, and about half have automatic fixes, so I suggest spending a couple of days using it to refactor/tidy up your code.
Please take a look at Dead Code Detector. It claims to do just what you are looking for: finding unused code using static analysis.
Here are a few lists of Java code coverage tools. I haven't used any of these personally, but they might get you started:
http://java-source.net/open-source/code-coverage
http://www.codecoveragetools.com/index.php/coverage-process/code-coverage-tools-java.html
ProGuard may be an option too (http://proguard.sourceforge.net/):
"Some uses of ProGuard are:
...
Listing dead code, so it can be removed from the source code.
... "
See also http://proguard.sourceforge.net/manual/examples.html#deadcode
You could write your own utility for that (within an hour of reading this) using the ASM bytecode analysis library (http://asm.ow2.org). You'll need to implement a ClassVisitor and a MethodVisitor. You'll use a ClassReader to parse the class files in your library.
Your ClassVisitor's visitMethod(..) will be called for each declared method.
Your MethodVisitor's visitMethodInsn(..) will be called for each called method.
Maintain a Map to do the counting. The keys represent the methods (see below). Here's some code:
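    // Written against ASM 9; use the Opcodes.ASMx constant that matches your ASM version.
    class MyClassVisitor extends ClassVisitor {
        private final Map<String, Integer> map;
        private String className;

        MyClassVisitor(Map<String, Integer> map) {
            super(Opcodes.ASM9);
            this.map = map;
        }

        @Override
        public void visit(int version, int access, String name, String signature,
                          String superName, String[] interfaces) {
            this.className = name;
        }

        @Override
        public MethodVisitor visitMethod(int access, String name, String desc,
                                         String signature, String[] exceptions) {
            // Register every declared method with an initial count of zero.
            String key = className + "." + name + "#" + desc;
            if (!map.containsKey(key)) {
                map.put(key, 0);
            }
            return new MyMethodVisitor(map);
        }
    }

    class MyMethodVisitor extends MethodVisitor {
        private final Map<String, Integer> map;

        MyMethodVisitor(Map<String, Integer> map) {
            super(Opcodes.ASM9);
            this.map = map;
        }

        @Override
        public void visitMethodInsn(int opcode, String owner, String name,
                                    String desc, boolean itf) {
            // Count one more call to the target method.
            String key = owner + "." + name + "#" + desc;
            if (!map.containsKey(key)) {
                map.put(key, 0);
            }
            map.put(key, map.get(key) + 1);
        }
    }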
Basically that's it. You start the show with something like this:
    Map<String, Integer> map = new HashMap<String, Integer>();
    for (File classFile : libraryClassFiles) { // all .class files of your library
        try (InputStream input = new FileInputStream(classFile)) {
            new ClassReader(input).accept(new MyClassVisitor(map), 0);
        }
    }
    for (Map.Entry<String, Integer> entry : map.entrySet()) {
        if (entry.getValue() == 0) {
            System.out.println("Unused method: " + entry.getKey());
        }
    }
Enjoy!
I need to merge two similar huge projects (1000+ classes). The second one is a fork of the first one, and it contains some country-specific behavior. The two projects diverge a lot, because svn versioning was handled very poorly.
It often happens that two classes are semantically identical. Their source codes only differ in terms of warnings, import statements, the order of some methods or variables, code formatting, comments, etc.
Is there a way to automatically check if two classes are semantically identical?
You should consider using a program analysis tool like Soot. Soot has some excellent APIs for analyzing code that are well suited to your purpose. For example, to check whether two classes are "semantically identical", you can consider (1) whether both classes have the same (or similar) fields and (2) whether both classes have the same (or similar) methods.
Fields are represented as SootField in Soot. A SootField object holds all the information you need for the comparison. To check the semantic similarity of two methods, you can check whether their control flow graphs (CFGs) are similar (details are in Section 5.7 of this guide).
Tips on how you can use Soot:
If your source dir is srcDir, Java Home is javaHome and the list of classes is classNames, then you can use the following code snippet to programmatically load your classes in Soot toolset.
    String sootClassPath = srcDir + ":"
            + javaHome + "/jre/lib/rt.jar:"
            + javaHome + "/jre/lib/jce.jar";
    Options.v().set_output_format(Options.output_format_jimple);
    Options.v().set_src_prec(Options.src_prec_java);
    for (String className : classNames) { // "className" is like a.b.Myclass
        Options.v().classes().add(className);
    }
    Options.v().set_keep_line_number(true);
    Options.v().set_allow_phantom_refs(true);
    Scene.v().setSootClassPath(sootClassPath);
    Scene.v().loadBasicClasses();
When your classes are loaded, you can access a class like below:
SootClass sClass = Scene.v().loadClassAndSupport(className); // "className" is like a.b.Myclass
Now you can access the fields and methods of sClass like below:
Chain<SootField> fieldList = sClass.getFields(); // import soot.util.Chain;
List<SootMethod> methods = sClass.getMethods();
You can iterate over the CFG of a method, as below, to get its list of instructions:
    if (method.isConcrete()) {
        List<Unit> instructionList = new ArrayList<>();
        Body b = method.retrieveActiveBody();
        DirectedGraph<Unit> g = new ExceptionalUnitGraph(b);
        for (Unit unit : g) {
            instructionList.add(unit);
        }
    }
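As a rough first pass before a CFG comparison, the "same fields, same methods" check can be sketched with plain java.lang.reflect instead of Soot. This is a swapped-in technique, not Soot's API, and it only compares declared signatures, never method bodies. ShapeComparator, ForkA, ForkB and ForkC are hypothetical names used for illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class ShapeComparator {
    // Collects one string per declared field and method. Using a set means that
    // the declaration order in the source file does not matter.
    static Set<String> shapeOf(Class<?> c) {
        Set<String> shape = new TreeSet<>();
        for (Field f : c.getDeclaredFields()) {
            if (f.isSynthetic()) continue; // skip compiler- or tool-generated members
            shape.add("field " + f.getType().getName() + " " + f.getName());
        }
        for (Method m : c.getDeclaredMethods()) {
            if (m.isSynthetic()) continue;
            shape.add("method " + m.getReturnType().getName() + " " + m.getName()
                    + Arrays.toString(m.getParameterTypes()));
        }
        return shape;
    }

    // True when both classes declare the same fields and method signatures.
    static boolean sameShape(Class<?> a, Class<?> b) {
        return shapeOf(a).equals(shapeOf(b));
    }
}

// Hypothetical classes: ForkA and ForkB have the same members in a different order.
class ForkA { int id; String name; int getId() { return id; } String getName() { return name; } }
class ForkB { String name; int id; String getName() { return name; } int getId() { return id; } }
class ForkC { int id; int getId() { return id; } }
```

Classes that pass this check are only candidates for being identical; their method bodies still need a comparison such as the CFG approach described above.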
Maybe first convert both projects' code into UML diagrams using a tool like Architexa.
This may help identify the real function of classes in the context of the system objective.
Suspected equivalent classes can then be compared in detail.
I cannot understand why the Java compiler does not shorten names of variables, parameters, method names, by replacing them with some unique IDs.
For instance, given the class
public class VeryVeryVeryVeryVeryLongClass {
private int veryVeryVeryVeryVeryLongInt = 3;
public void veryVeryVeryVeryVeryLongMethod(int veryVeryVeryVeryVeryLongParamName) {
this.veryVeryVeryVeryVeryLongInt = veryVeryVeryVeryVeryLongParamName;
}
}
the compiled file contains all these very long names:
Wouldn't simple unique IDs speed the parsing, and also provide a first obfuscation?
You assume that obfuscation is always desired, but it isn't:
Reflection would break, and with it JavaBeans and many frameworks reliant on it
Stack traces would become completely unreadable
If you tried to code against a compiled JAR, you'd end up with code like String name = p.a1() instead of String name = p.getName()
Obfuscation is normally the very last step taken, when you're delivering the finished app, and even then it's not used particularly often except when the target platform has severe memory constraints.
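The reflection point can be illustrated with a small sketch of how a JavaBeans-style framework finds a getter by name. Person and PropertyReader are hypothetical names; real frameworks are more elaborate, but they rely on the same name-based lookup.

```java
import java.lang.reflect.Method;

// Hypothetical bean, used only for this illustration.
class Person {
    private final String name;
    Person(String name) { this.name = name; }
    public String getName() { return name; }
}

class PropertyReader {
    // Derives the getter name from the property name and looks it up by that name.
    static Object read(Object bean, String property) {
        String getter = "get" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
        try {
            Method m = bean.getClass().getMethod(getter);
            return m.invoke(bean);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No readable property '" + property + "'", e);
        }
    }
}
```

If the compiler renamed getName() to something like a1(), the lookup for the property "name" would fail at runtime, even though the code compiled.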
When you use a class, you refer to its methods by their name. Therefore, the compiler needs to preserve those names.
In any event, I don't see why the compiler should aim to obfuscate anything. Rather, it should aim to do exactly the opposite: be as transparent as possible.
The JVM does use numeric IDs internally.
Class files cannot be obfuscated like that because Java is dynamically linked: names of members must be publicly readable or other classes cannot use your code.
Wouldn't simple unique IDs speed the parsing?
No. It would add a mapping that would probably slow it down.
and also provide a first obfuscation
Yes, but who wants the compiler to do obfuscation by default? Not me.
Your suggestion has no merit.
I am writing a Java code analysis snippet which will find out how variables are used in a method (to be specific, how many times a global class variable is read and written in a method). Can this be done using JavaParser? Would anyone have any other recommendations? Does anyone know how class metrics are calculated? They probably deal with similar things.
Thanks guys. Both your answers led me towards a solution to this problem using the AST implementation in JavaParser. Here's a code snippet to help others:
    class CatchNameExpr extends VoidVisitorAdapter<Object> {
        HashMap<String, ArrayList<Integer>> variableLineNumMap;
        ArrayList<String> variableList;
        boolean functionParsing = false;

        public CatchNameExpr(ArrayList<String> classVariables) {
            variableList = classVariables;
        }

        public void visit(MethodDeclaration method, Object arg) {
            System.out.println("---------------");
            System.out.println(method.getName());
            System.out.println("---------------");
            variableLineNumMap = new HashMap<String, ArrayList<Integer>>();
            System.out.println();
            functionParsing = true;
            visit(method.getBody(), arg);
            // Analyze lines for variable usage. Add to the list of vars after checking whether it is read, written or unknown.
            functionParsing = false;
        }

        public void visit(NameExpr n, Object arg) {
            if (!functionParsing)
                return;
            // TODO: check whether this var was declared above as a local var of the function; if so, return
            ArrayList<Integer> setOfLineNum;
            System.out.println(n.getBeginLine() + " NameExpr " + n.getName());
            if (!variableList.contains(n.getName()) || n.getName().length() == 0)
                return;
            if (!variableLineNumMap.containsKey(n.getName())) {
                setOfLineNum = new ArrayList<Integer>();
                setOfLineNum.add(n.getBeginLine());
                variableLineNumMap.put(n.getName(), setOfLineNum);
            } else {
                setOfLineNum = variableLineNumMap.get(n.getName());
                setOfLineNum.add(n.getBeginLine());
                variableLineNumMap.put(n.getName(), setOfLineNum);
            }
        }
    }
Instantiate the class and visit the compilation unit:
CatchNameExpr nameExp = new CatchNameExpr(classVariables);
nameExp.visit(classCompilationUnit, null);
In a similar manner you can visit the AST for the other expressions, statements, conditions, etc. listed here:
http://www.jarvana.com/jarvana/view/com/google/code/javaparser/javaparser/1.0.8/javaparser-1.0.8-javadoc.jar!/japa/parser/ast/visitor/VoidVisitorAdapter.html
I am well aware that a byte-code processor would be more efficient and would do the job better than I can hope to. But given the time limit, this option suited me best.
Thanks guys,
Jasmeet
For the task of finding usages of variables, a parser built with ANTLR should also produce an AST. I am almost sure you can find a ready-made AST builder, but I don't know where.
Another approach is to analyze class files with ASM, BCEL or other class file analyzer. I think it is easier, and would work faster. Besides, it would work for other jvm languages (e.g. Scala).
To ask questions as to whether a variable read is "global" or not, you need what amounts to a full Java compiler front end, that parses code, build symbol tables and related type information.
To the extent the compiler has actually recorded this information in your class files, you may be able to execute "reflection" operations to get your hands on it. To the extent that such information is present in .class files, you can access it with a class-file byte-code processor such as those mentioned in Kaigorodov's answer.
ANTLR has a grammar for Java, but I don't believe any support for symbol table construction.
You can't fake this yourself; Java's rules are too complex. You might be able to extend the ANTLR parser to do this, but for the same reason it would be a LOT of work.
I understand the Java compiler offers some kind of name/type accurate access to its internal structures; you might be able to use that.
Our DMS Software Reengineering Toolkit has full Java parsers, with name and type resolution, and could be used for this purpose.
Is there any way of inserting code at runtime to log return values, for instance, using instrumentation?
So far, I managed to insert code when a method exits, but I would like to log something like "method foo returned HashMap { 1 -> 2, 2 -> 3 }"
I'm looking for a general approach that can also deal with, for instance, java.io.* classes. (So in general I'll have no access to the code).
I tried using a custom classloader too, but a lot of difficulties arise as I cannot modify java.* classes.
Thanks for the help!
Sergio
Check out BTrace. It's Java, and I believe it'll do what you want.
Have you considered AOP (aspect-oriented programming)? If by "I cannot modify java.* classes" you mean you don't have access to the uncompiled code and cannot add configuration, etc., then it probably won't work for you. In any other case, check this link for examples using Spring AOP:
http://static.springsource.org/spring/docs/2.5.x/reference/aop.html
If not, you could consider solutions based on remote-debugging, or profiling. But they all involve "some" access to the original code, if only to enable / disable JMX access.
Well, since you're looking for everything, the only thing I can think of is using a machine agent. Machine agents hook into the low levels of the JVM itself and can be used to monitor these things.
I have not used DTrace, but it sounds like it would be able to do what you need. Adam Leventhal wrote a nice blog post about it. The link to DTrace in the blog is broken, but I'm sure a quick search will turn it up.
Take a look at Spring AOP, which is quite powerful, and flexible. To start you off on the method foo, you can apply an AfterReturning advice to it as:
    @Aspect
    public class AfterReturningExample {
        @AfterReturning(
            pointcut = "execution(* package.of.your.choice.YourClassName.foo(..))",
            returning = "retVal")
        public void logTheFoo(Object retVal) {
            // ... logger.trace("method 'foo' returned " + retVal); // convert retVal to a String representation if needed
        }
    }
The pointcut syntax is really flexible so you can target all the sub packages, components, methods, return values given the expression.
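For a feel of what proxy-based after-returning advice does, here is a minimal sketch with a plain JDK dynamic proxy rather than Spring AOP (a swapped-in technique, not Spring's API). Foo, SampleFoo and ReturnLogger are hypothetical names. Note the limitation: it only works for objects you can wrap behind an interface, so it does not cover java.io.* classes you do not construct yourself.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical interface and implementation, used only for this illustration.
interface Foo {
    Map<Integer, Integer> foo();
}

class SampleFoo implements Foo {
    public Map<Integer, Integer> foo() {
        Map<Integer, Integer> m = new TreeMap<>(); // sorted, so the printed order is stable
        m.put(1, 2);
        m.put(2, 3);
        return m;
    }
}

class ReturnLogger {
    // Collected log lines; a real setup would write to a logger instead.
    static final List<String> LOG = new ArrayList<>();

    // Wraps target in a proxy that records the return value of every call.
    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                Object result = method.invoke(target, args);
                LOG.add("method " + method.getName() + " returned " + result);
                return result;
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow what the target itself threw
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}
```

Calling ReturnLogger.wrap(new SampleFoo(), Foo.class).foo() records one line describing the returned map, while the caller still receives the original value.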
I need to change the signature of a method used all over the codebase.
Specifically, the method void log(String) will take two additional arguments (Class c, String methodName), which need to be provided by the caller, depending on the method where it is called. I can't simply pass null or similar.
To give an idea of the scope, Eclipse found 7000 references to that method, so if I change it the whole project will go down. It will take weeks for me to fix it manually.
As far as I can tell, Eclipse's refactoring plugin is not up to the task, but I really want to automate it.
So, how can I get the job done?
Great, I can copy a previous answer of mine and I just need to edit a tiny little bit:
I think what you need to do is use a source code parser like javaparser to do this.
For every java source file, parse it to a CompilationUnit, create a Visitor, probably using ModifierVisitor as base class, and override (at least) visit(MethodCallExpr, arg). Then write the changed CompilationUnit to a new File and do a diff afterwards.
I would advise against changing the original source files, but creating a shadow file tree may be a good idea (e.g. old file: src/main/java/com/mycompany/MyClass.java, new file: src/main/refactored/com/mycompany/MyClass.java; that way you can diff the entire directories).
Eclipse is able to do that using Refactor -> Change Method signature and provide default values for the new parameters.
For the class parameter, the default value should be this.getClass(), but you are right in your comment: I don't know how to do it for the method name parameter.
IntelliJ IDEA shouldn't have any trouble with this.
I'm not a Java expert, but something like this could work. It's not a perfect solution (it may even be a very bad solution), but it could get you started:
Change the method signature with IntelliJ's refactoring tools, and specify default values for the 2 new parameters:
c: this.getClass()
methodName: Thread.currentThread().getStackTrace()[1].getMethodName()
or better yet, simply specify null as the default values.
I think that there are several steps to dealing with this, as it is not just a technical issue but a 'situation':
Decline to do it in short order due to the risk.
Point out the issues caused by not using standard frameworks but reinventing the wheel (as Paul says).
Insist on using Log4j or equivalent if making the change.
Use Eclipse refactoring in sensible chunks to make the changes and deal with the varying defaults.
I have used Eclipse refactoring on quite large changes for fixing old smelly code - nowadays it is fairly robust.
Maybe I'm being naive, but why can't you just overload the method name?
    void thing(paramA) {
        thing(paramA, THE_DEFAULT_B, THE_DEFAULT_C);
    }
    void thing(paramA, paramB, paramC) {
        // new method
    }
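Applied to the log method from the question, the overload idea might look like the sketch below. LegacyLog and the placeholder defaults (Object.class and "unknown") are assumptions made for illustration; since the question rules out meaningless defaults, this mainly buys time to migrate call sites in chunks.

```java
import java.util.ArrayList;
import java.util.List;

class LegacyLog {
    // Collected output; a real logger would write somewhere else.
    static final List<String> LINES = new ArrayList<>();

    // Old signature, kept so that existing call sites still compile.
    static void log(String message) {
        log(message, Object.class, "unknown"); // placeholder defaults (assumption)
    }

    // New signature for call sites that have already been migrated.
    static void log(String message, Class<?> c, String methodName) {
        LINES.add(c.getSimpleName() + "." + methodName + ": " + message);
    }
}
```

Once all call sites use the three-argument version, the one-argument overload can be deprecated and removed.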
Do you really need to change the calling code and the method signature? What I'm getting at is it looks like the added parameters are meant to give you the calling class and method to add to your log data. If the only requirement is just adding the calling class/method to the log data then Thread.currentThread().getStackTrace() should work. Once you have the StackTraceElement[] you can get the class name and method name for the caller.
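A minimal sketch of that idea, keeping the one-argument signature: the log method looks for its own frame in the stack trace and takes the next one as the caller. CallerAwareLog and OrderService are hypothetical names, and the sketch assumes the JVM does not omit frames (the Javadoc allows some VMs to do so), so it falls back to "unknown".

```java
class CallerAwareLog {
    // Last formatted line, kept in a field so the result is easy to inspect.
    static String lastLine;

    static void log(String message) {
        StackTraceElement[] stack = Thread.currentThread().getStackTrace();
        StackTraceElement caller = null;
        // Find this method's own frame, then take the frame after it (the caller).
        for (int i = 0; i < stack.length - 1; i++) {
            if (stack[i].getClassName().equals(CallerAwareLog.class.getName())
                    && stack[i].getMethodName().equals("log")) {
                caller = stack[i + 1];
                break;
            }
        }
        String where = (caller == null)
                ? "unknown"
                : caller.getClassName() + "." + caller.getMethodName();
        lastLine = where + ": " + message;
    }
}

// Hypothetical caller, used only for this illustration.
class OrderService {
    static void placeOrder() {
        CallerAwareLog.log("order placed");
    }
}
```

Searching for the frame by name avoids hard-coding an index into the array. Capturing a stack trace on every call has a cost, so this is worth measuring if log is called in hot paths.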
If the lines you need replaced fall into a small number of categories, then what you need is Perl:
find . -name '*.java' | xargs perl -pi -e 's/log\(([^,)]*?)\)/log($1, "foo", "bar")/g'
I'm guessing that it wouldn't be too hard to hack together a script which would put the classname (derived from the filename) in as the second argument. Getting the method name in as the third argument is left as an exercise to the reader.
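The class-name half of that script could be sketched in Java as follows. LogCallRewriter is a hypothetical name, and the "TODO" placeholder for the method name is an assumption. Like the Perl one-liner, this is a naive regex rewrite: it skips calls whose argument contains a comma or parentheses, and it assumes one top-level class per file, named after the file.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogCallRewriter {
    // Matches log(<one argument without commas or parentheses>); the word boundary
    // keeps identifiers that merely end in "log" (e.g. catalog) from matching.
    private static final Pattern SINGLE_ARG_LOG = Pattern.compile("\\blog\\(([^,()]+)\\)");

    // Adds "<ClassName>.class" and a placeholder method name to each matching call.
    static String rewrite(String source, String fileName) {
        String className = fileName.replaceFirst("\\.java$", "");
        String replacement = "log($1, " + Matcher.quoteReplacement(className) + ".class, \"TODO\")";
        return SINGLE_ARG_LOG.matcher(source).replaceAll(replacement);
    }
}
```

Writing the result to a separate directory and diffing it against the original tree makes it easier to spot calls that the pattern missed or mangled.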
Try refactoring with IntelliJ. It has a feature called SSR (Structural Search and Replace). You can refer to classes, method names, etc. for context. (seanizer's answer is more promising; I upvoted it.)
I agree with Seanizer's answer that you want a tool that can parse Java. That's necessary but not sufficient; what you really want is a tool that can carry out a reliable mass-change.
To do this, you want a tool that can parse Java, can pattern match against the parsed code, install the replacement call, and spit out the answer without destroying the rest of the source code.
Our DMS Software Reengineering Toolkit can do all of this for a variety of languages, including Java. It parses complete Java source systems and builds abstract syntax trees for the entire set of code.
DMS can apply pattern-directed, source-to-source transformations to achieve the desired change.
To achieve the OP's effect, he would apply the following program transformation:
rule replace_legacy_log(s:STRING): expression -> expression
" log(\s) " -> " log( \s, \class\(\), \method\(\) ) "
What this rule says is, find a call to log which has a single string argument, and replace it with a call to log with two more arguments determined by auxiliary functions class and method.
These functions determine the containing method name and containing class name for the AST node root where the rule finds a match.
The rule is written in "source form", but actually matches against the AST and replaces found ASTs with the modified AST.
To get back the modified source, you ask DMS to simply prettyprint (to make a nice layout) or fidelity print (if you want the layout of the old code preserved). DMS preserves comments, number radixes, etc.
If the existing application has more than one definition of the "log" function, you'll need to add a qualifier:
... if IsDesiredLog().
where IsDesiredLog uses DMS's symbol table and inheritance information to determine if the specific log refers to the definition of interest.
In fact, your problem is not one for a click'n'play engine that replaces all occurrences of
log("some weird message");
by
log(this.getClass(), new Exception().getStackTrace()[1].getMethodName());
as that has little chance of working in all cases (static methods, for example).
I would suggest you take a look at Spoon. This tool allows source code parsing and transformation, letting you carry out your change in a code-based (and therefore slower) but controlled way.
However, you could also consider changing your actual method into one that explores the stack trace to get the information or, even better, use log4j internally with a log formatter that displays the correct information.
I would search and replace log( with log(#class, #methodname,
Then write a little script in any language (even Java) to find the class name and the method names and to replace the #class and #methodname tokens...
Good luck
If the class and method name are required for "where did this log come from?" type data, then another option is to print out a stack trace in your log method. E.g.
    public void log(String text)
    {
        StringWriter sw = new StringWriter();
        PrintWriter pw = new PrintWriter(sw, true);
        new Throwable().printStackTrace(pw);
        pw.flush();
        sw.flush();
        String stackTraceAsLog = sw.toString();
        // do something with text and stackTraceAsLog
    }