parsing incomplete Java source code

For a problem I am working on, I need to parse a Java source code fragment that may be incomplete. For example, the code can refer to variables that are not defined in the fragment.
Even so, I would like to parse such incomplete Java code, transform it into a convenient, inspectable representation, and be able to generate source code from that abstract representation.
What is the right tool for this? In this post I found suggestions to use ANTLR, JavaCC or the Eclipse JDT.
However, I did not find any reference to dealing with incomplete Java source code fragments, hence this question (in addition, the linked question is more than two years old, so I am wondering whether anything new has appeared).
As an example, the code could be something like the following expression:
"myMethod(aVarName)"
In that case, I would like to be able to somehow detect that the variable aVarName is referenced in the code.

Uhm... This question does not have anything even vaguely like a simple answer. Any of the parser technologies above will let you do what you want, provided you write the correct grammar and set the parser up for fallback parsing, that is, skipping over tokens it does not recognise.
The least work is probably either ANTLR, which has resumable parsing and comes with a reasonably complete Java 7 grammar, or whatever you can pull out of the Eclipse JDT (which the Eclipse IDE uses for its error and intention annotations and syntax highlighting).
Note that none of this is easy: you will be writing thousands of lines of code, not just importing a class and telling it to go.
Beyond a certain degree of incorrectness or incompleteness, all of these strategies will fail, simply because no computer (or person, for that matter) can discern what you mean unless you at least vaguely say it correctly.
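For the simplest kind of incompleteness, a fragment that is syntactically fine but refers to undefined names, it may help to see that a parser never needs the definitions at all. The sketch below is not one of the tools named above: as a stand-in it uses the javac Tree API that ships with the JDK (javax.tools and com.sun.source) and stops after the parse phase. The class FragmentNames, the method referencedNames and the dummy wrapper class Snippet are invented for this illustration, and the code assumes it runs on a full JDK (Java 9 or later) rather than a bare JRE.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.ExpressionTree;
import com.sun.source.tree.IdentifierTree;
import com.sun.source.tree.MemberSelectTree;
import com.sun.source.tree.MethodInvocationTree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical helper: lists the plain names an expression refers to, excluding method names.
final class FragmentNames {
    static List<String> referencedNames(String expression) {
        // Wrap the fragment so that it forms a syntactically complete compilation unit.
        String source = "class Snippet { Object probe = " + expression + "; }";
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Snippet.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null on a bare JRE
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, new DiagnosticCollector<JavaFileObject>(), null, null, List.of(file));

        List<String> names = new ArrayList<>();
        try {
            // parse() is syntax only: undefined methods and variables are not an error here.
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitVariable(VariableTree variable, Void unused) {
                        // Skip the declared type ("Object") and look at the initializer only.
                        return scan(variable.getInitializer(), unused);
                    }

                    @Override
                    public Void visitMethodInvocation(MethodInvocationTree call, Void unused) {
                        ExpressionTree target = call.getMethodSelect();
                        if (target instanceof MemberSelectTree) {
                            // For receiver.method(...), the receiver expression is still a reference.
                            scan(((MemberSelectTree) target).getExpression(), unused);
                        }
                        // A bare identifier target is the method name itself, so it is skipped.
                        scan(call.getArguments(), unused);
                        return null;
                    }

                    @Override
                    public Void visitIdentifier(IdentifierTree id, Void unused) {
                        names.add(id.getName().toString());
                        return null;
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

Calling FragmentNames.referencedNames("myMethod(aVarName)") collects aVarName without any definition of myMethod or aVarName being available. Fragments that are syntactically broken are a different matter; there the error-recovery caveats above apply.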

Eclipse contains just that: a compiler that can cope with incomplete Java code (basically, that was one reason for the Eclipse developers to implement their own Java compiler; see here for a better explanation).
There are several tutorials that explain the ASTParser, here is one.

If you just want basic parsing - an undecorated AST - you can use existing Java parsers. But from your question I understand you're interested in deeper inspection of the partial code. First, be aware the problem you are trying to solve is far from simple, especially because partial code introduces a lot of ambiguities.
But there is an existing solution - I needed to solve a similar problem, and found that a nice fellow called Barthélémy Dagenais has worked on it, producing a paper and a pair of open-source tools - one based on Soot and the other (which is generally preferable) on Eclipse. I have used both and they work, though they have their own limitations - don't expect miracles.
Here's a direct link to a quick tutorial on how to start with the Eclipse-based tool.

I needed to solve a similar problem in my recent work. I have tried many tools, including the Eclipse JDT ASTParser, the Python library javalang, and PPA, and I'd like to share my experience. To sum up, they can all parse code fragments to some extent, but each of them occasionally fails when the fragment is too ambiguous.
Eclipse JDT ASTParser
Eclipse JDT ASTParser is the most powerful and widely-used tool. This is a code snippet to parse the method invocation node.
// Needs org.eclipse.jdt.core on the classpath (org.eclipse.jdt.core.dom.* and org.eclipse.jdt.core.JavaCore).
ASTParser parser = ASTParser.newParser(AST.JLS8);
parser.setResolveBindings(true);
parser.setKind(ASTParser.K_STATEMENTS);   // the source is a sequence of statements
parser.setBindingsRecovery(true);
Map<String, String> options = JavaCore.getOptions();
parser.setCompilerOptions(options);
parser.setUnitName("test");
String src = "System.out.println(\"test\");";
String[] sources = { };
String[] classpath = {"C:/Users/chenzhi/AppData/Local/Programs/Java/jdk1.8.0_131"};
parser.setEnvironment(classpath, sources, new String[] { }, true);
parser.setSource(src.toCharArray());
// For K_STATEMENTS the parser returns a Block that wraps the parsed statements.
final Block block = (Block) parser.createAST(null);
block.accept(new ASTVisitor() {
    @Override
    public boolean visit(MethodInvocation node) {
        System.out.println(node);
        return false;   // do not descend into the children of the invocation
    }
});
Pay attention to parser.setKind(ASTParser.K_STATEMENTS): it sets the kind of construct to be parsed from the source. ASTParser defines four kinds (K_COMPILATION_UNIT, K_CLASS_BODY_DECLARATIONS, K_EXPRESSION, K_STATEMENTS); see this javadoc to understand the difference between them.
javalang
javalang is a simple Python library. This is a code snippet to parse the method invocation node.
import javalang

src = 'System.out.println("test");'
tokens = javalang.tokenizer.tokenize(src)
parser = javalang.parser.Parser(tokens)
try:
    ast = parser.parse_expression()
    if isinstance(ast, javalang.tree.MethodInvocation):
        print(ast)
except javalang.parser.JavaSyntaxError as err:
    print("wrong syntax", err)
Pay attention to ast = parser.parse_expression(): just as with parser.setKind() in the Eclipse JDT ASTParser, you have to pick the parsing function that matches your fragment, or you will get a javalang.parser.JavaSyntaxError exception. You can read the source code to figure out which function to use.
PPA
Partial Program Analysis for Java (PPA) is a static analysis framework that transforms the source code of an incomplete Java program into a typed Abstract Syntax Tree. As @Oak said, this tool came from academia.
PPA comes as a set of Eclipse plug-ins, which means it needs to run with Eclipse. It provides a headless mode that runs without displaying the Eclipse GUI or requiring any user input, but it is still very heavyweight.
String src = "System.out.println(\"test\");";
ASTNode node = PPAUtil.getSnippet(src, new PPAOptions(), false);
// Walk through the compilation unit.
node.accept(new ASTVisitor() {
    public boolean visit(MethodInvocation node) {
        System.out.println(node);
        return false;
    }
});

Related

Could use some help implementing AST rules for Java ANTLR grammar

For a programming project, I am tasked with taking a set of ANTLR grammar rules for Java and extending them such that they also contain AST rules for the Eclipse JDT API DOM.
For example:
param
    : type ID
    ;
Would become:
param returns [SingleVariableDeclaration result = ast.newSingleVariableDeclaration()]
    : paramType=type { result.setType($paramType.result); }
      ID { result.setName(ast.newSimpleName($ID.text)); }
    ;
The first part of the project was creating the grammar rules themselves, and that wasn't too bad, but this part is really throwing me for a loop. Are there any useful resources, examples, or pointers someone could give me as far as adding the AST rules is concerned?
One of the tips I was given was to use the AST viewer in Eclipse to help pinpoint which parts of the API to look at in the Eclipse documentation, but I'm not sure how this helps.
Some of the rules I still need to implement are array access, for loops, and so on.
Thanks!

Analyzing a variable inside a method. JavaParser/ANTLR or something else?

I am writing a Java code analysis snippet that finds out how variables are used in a method (specifically, how many times a class-level "global" variable is read and written in a method). Can this be done using JavaParser? Would anyone have other recommendations? Does anyone know how class metrics are calculated? They probably deal with similar things.

Thanks guys. Both your answers led me towards a solution to this problem using the AST implementation in javaparser. Here's a code snippet to help others:
import japa.parser.ast.body.MethodDeclaration;
import japa.parser.ast.expr.NameExpr;
import japa.parser.ast.visitor.VoidVisitorAdapter;
import java.util.ArrayList;
import java.util.HashMap;

class CatchNameExpr extends VoidVisitorAdapter<Object> {
    // Maps each tracked variable name to the lines on which it appears in the current method.
    HashMap<String, ArrayList<Integer>> variableLineNumMap;
    ArrayList<String> variableList;
    boolean functionParsing = false;

    public CatchNameExpr(ArrayList<String> classVariables) {
        variableList = classVariables;
    }

    @Override
    public void visit(MethodDeclaration method, Object arg) {
        System.out.println("---------------");
        System.out.println(method.getName());
        System.out.println("---------------");
        variableLineNumMap = new HashMap<String, ArrayList<Integer>>();
        System.out.println();
        functionParsing = true;
        if (method.getBody() != null) {   // abstract and interface methods have no body
            visit(method.getBody(), arg);
        }
        // Analyze lines for variable usage. Add to list of vars after checking if it is read, written or unknown.
        functionParsing = false;
    }

    @Override
    public void visit(NameExpr n, Object arg) {
        if (!functionParsing)
            return;
        // TODO: check if this var was declared above it, as a local var of the method. If yes, return.
        System.out.println(n.getBeginLine() + " NameExpr " + n.getName());
        if (!variableList.contains(n.getName()) || n.getName().length() == 0)
            return;
        ArrayList<Integer> setOfLineNum = variableLineNumMap.get(n.getName());
        if (setOfLineNum == null) {
            // First occurrence of this variable in the current method.
            setOfLineNum = new ArrayList<Integer>();
            variableLineNumMap.put(n.getName(), setOfLineNum);
        }
        setOfLineNum.add(n.getBeginLine());
    }
}
Instantiate the class and run it over the compilation unit:
CatchNameExpr nameExp = new CatchNameExpr(classVariables);
nameExp.visit(classCompilationUnit, null);
In a similar manner you can visit the other AST node types (expressions, statements, conditions and so on); see the VoidVisitorAdapter javadoc:
http://www.jarvana.com/jarvana/view/com/google/code/javaparser/javaparser/1.0.8/javaparser-1.0.8-javadoc.jar!/japa/parser/ast/visitor/VoidVisitorAdapter.html
I am well aware that a byte-code processor would be more efficient and would do the job better than I can hope to. But given the time limit, this option suited me best.
Thanks guys,
Jasmeet

To find usages of variables, a parser built with ANTLR would also have to produce an AST. I am almost sure you can find a ready-made AST builder, but I don't know where.
Another approach is to analyze class files with ASM, BCEL or another class file analyzer. I think it is easier, and it would work faster. Besides, it would work for other JVM languages (e.g. Scala).

To answer questions such as whether a variable read is "global" or not, you need what amounts to a full Java compiler front end, one that parses code and builds symbol tables and related type information.
To the extent that the compiler has actually recorded this information in your class files, you may be able to use reflection to get your hands on it, or you can access it with a class-file byte-code processor such as those mentioned in Kaigorodov's answer.
ANTLR has a grammar for Java, but I don't believe it has any support for symbol table construction.
You can't fake this yourself; Java's rules are too complex. You might be able to extend the ANTLR parser to do this, but it would be a LOT of work, for the same reason.
I understand the Java compiler offers some kind of name/type accurate access to its internal structures; you might be able to use that.
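As a rough illustration of that compiler-provided access, here is a minimal sketch that uses the JDK's javax.tools and com.sun.source APIs to let javac resolve names and then counts how often each field is referenced by its simple name. The class FieldUseCounter and the method countFieldUses are made up for this example; it assumes a full JDK (Java 9 or later), counts only unqualified references (this.total is a member select and is not counted), and makes no attempt to separate reads from writes.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.IdentifierTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePathScanner;
import com.sun.source.util.Trees;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.lang.model.element.Element;
import javax.lang.model.element.ElementKind;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical helper: counts unqualified references to fields, using javac's own name resolution.
final class FieldUseCounter {
    static Map<String, Integer> countFieldUses(String className, String source) {
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler().getTask(
                null, null, new DiagnosticCollector<JavaFileObject>(),
                List.of("-proc:none"), null, List.of(file)); // no annotation processing

        Map<String, Integer> uses = new TreeMap<>();
        try {
            Iterable<? extends CompilationUnitTree> units = task.parse();
            task.analyze(); // resolves names and types; no class files are written
            Trees trees = Trees.instance(task);
            for (CompilationUnitTree unit : units) {
                new TreePathScanner<Void, Void>() {
                    @Override
                    public Void visitIdentifier(IdentifierTree id, Void unused) {
                        // Ask the compiler what this name resolved to.
                        Element element = trees.getElement(getCurrentPath());
                        if (element != null && element.getKind() == ElementKind.FIELD) {
                            uses.merge(id.getName().toString(), 1, Integer::sum);
                        }
                        return null;
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return uses;
    }
}
```

Because the compiler does the resolution, a local variable that shadows a field is not counted as a field use, which is exactly the part that is hard to fake by hand.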
Our DMS Software Reengineering Toolkit has full Java parsers, with name and type resolution, and could be used for this purpose.

Java source refactoring of 7000 references

I need to change the signature of a method used all over the codebase.
Specifically, the method void log(String) will take two additional arguments (Class c, String methodName), which need to be provided by the caller, depending on the method where it is called. I can't simply pass null or similar.
To give an idea of the scope, Eclipse found 7000 references to that method, so if I change it the whole project will go down. It will take weeks for me to fix it manually.
As far as I can tell, Eclipse's refactoring support is not up to the task, but I really want to automate it.
So, how can I get the job done?

Great, I can copy a previous answer of mine and I just need to edit a tiny little bit:
I think what you need to do is use a source code parser like javaparser to do this.
For every java source file, parse it to a CompilationUnit, create a Visitor, probably using ModifierVisitor as base class, and override (at least) visit(MethodCallExpr, arg). Then write the changed CompilationUnit to a new File and do a diff afterwards.
I would advise against changing the original source file; creating a shadow file tree may be a good idea instead (e.g. old file: src/main/java/com/mycompany/MyClass.java, new file: src/main/refactored/com/mycompany/MyClass.java; that way you can diff the entire directories).
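The shadow-tree part of this needs nothing beyond the JDK. The sketch below is only an illustration of that bookkeeping: ShadowTree, shadowPath and rewriteInto are invented names, and the transform parameter stands in for whatever rewriting you do with the parsed CompilationUnit (it is not a javaparser API).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: writes transformed copies of .java files into a parallel directory tree.
final class ShadowTree {
    /** Maps a file under sourceRoot to the same relative location under shadowRoot. */
    static Path shadowPath(Path sourceRoot, Path shadowRoot, Path file) {
        return shadowRoot.resolve(sourceRoot.relativize(file));
    }

    /** Applies transform to every .java file under sourceRoot; the originals are never modified. */
    static void rewriteInto(Path sourceRoot, Path shadowRoot, UnaryOperator<String> transform) {
        try {
            List<Path> javaFiles;
            try (Stream<Path> paths = Files.walk(sourceRoot)) {
                javaFiles = paths.filter(p -> p.toString().endsWith(".java"))
                        .collect(Collectors.toList());
            }
            for (Path file : javaFiles) {
                Path target = shadowPath(sourceRoot, shadowRoot, file);
                Files.createDirectories(target.getParent());
                String original = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                Files.write(target, transform.apply(original).getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the two trees side by side, a recursive diff of src/main/java against src/main/refactored shows every change before anything is committed.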

Eclipse can do that with Refactor -> Change Method Signature, which lets you provide default values for the new parameters.
For the class parameter the default value should be this.getClass(), but you are right in your comment: I don't know how to do it for the method name parameter.

IntelliJ IDEA shouldn't have any trouble with this.
I'm not a Java expert, but something like this could work. It's not a perfect solution (it may even be a very bad solution), but it could get you started:
Change the method signature with IntelliJ's refactoring tools, and specify default values for the 2 new parameters:
c: this.getClass()
methodName: Thread.currentThread().getStackTrace()[1].getMethodName()
or better yet, simply specify null as the default values.

I think that there are several steps to dealing with this, as it is not just a technical issue but a 'situation':
Decline to do it in short order due to the risk.
Point out the issues caused by not using standard frameworks but reinventing the wheel (as Paul says).
Insist on using Log4j or equivalent if making the change.
Use Eclipse refactoring in sensible chunks to make the changes and deal with the varying defaults.
I have used Eclipse refactoring on quite large changes for fixing old smelly code - nowadays it is fairly robust.

Maybe I'm being naive, but why can't you just overload the method name?
void thing(paramA) {
    thing(paramA, THE_DEFAULT_B, THE_DEFAULT_C);
}
void thing(paramA, paramB, paramC) {
    // new method
}

Do you really need to change the calling code and the method signature? What I'm getting at is it looks like the added parameters are meant to give you the calling class and method to add to your log data. If the only requirement is just adding the calling class/method to the log data then Thread.currentThread().getStackTrace() should work. Once you have the StackTraceElement[] you can get the class name and method name for the caller.
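A minimal sketch of that idea, keeping the existing log(String) signature: StackLog and OrderService are invented names for the demonstration, and log returns the formatted line only so that the result is easy to inspect. Rather than hard-coding a stack index, the code looks for its own frame and takes the one after it, since the number of leading frames is an implementation detail.

```java
// Hypothetical logger that finds its caller from the stack trace instead of taking extra parameters.
final class StackLog {
    static String log(String message) {
        StackTraceElement[] stack = Thread.currentThread().getStackTrace();
        String where = "unknown";
        for (int i = 0; i + 1 < stack.length; i++) {
            boolean isLogFrame = stack[i].getClassName().equals(StackLog.class.getName())
                    && stack[i].getMethodName().equals("log");
            if (isLogFrame) {
                // The frame right after log() belongs to whoever called it.
                StackTraceElement caller = stack[i + 1];
                where = caller.getClassName() + "." + caller.getMethodName();
                break;
            }
        }
        String line = where + ": " + message;
        System.out.println(line);
        return line;
    }
}

// Example caller; none of the existing call sites would need to change.
final class OrderService {
    static String placeOrder() {
        return StackLog.log("order saved");
    }
}
```

Capturing a stack trace on every call is not free, so this trades some run-time cost for leaving all 7000 call sites untouched.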

If the lines you need replaced fall into a small number of categories, then what you need is Perl:
find . -name '*.java' | xargs perl -pi -e 's/log\(([^,)]*?)\)/log($1, "foo", "bar")/g'
I'm guessing that it wouldn't be too hard to hack together a script which would put the classname (derived from the filename) in as the second argument. Getting the method name in as the third argument is left as an exercise to the reader.
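For comparison, here is the same kind of textual rewrite sketched in Java, with the class name derived from the file name. LogCallRewriter and the FIXME_METHOD placeholder are made up for this illustration, and the method name is still left open. It shares the weaknesses of any regex approach: arguments containing commas or parentheses are skipped, and anything else that looks like a one-argument log(...), including the declaration of log itself or Math.log(x), is rewritten too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex-based rewriter; a heuristic, not a parser.
final class LogCallRewriter {
    // log( followed by one argument that contains no comma or parenthesis, followed by ).
    private static final Pattern SINGLE_ARG_LOG = Pattern.compile("\\blog\\(([^,()]+)\\)");

    /** Derives a class name from a path such as src/OrderService.java (forward slashes assumed). */
    static String classNameOf(String fileName) {
        String base = fileName.substring(fileName.lastIndexOf('/') + 1);
        return base.endsWith(".java") ? base.substring(0, base.length() - ".java".length()) : base;
    }

    /** Appends a class literal and a placeholder method name to every one-argument log call. */
    static String rewrite(String source, String fileName) {
        String extraArgs = ", " + classNameOf(fileName) + ".class, \"FIXME_METHOD\"";
        return SINGLE_ARG_LOG.matcher(source)
                .replaceAll("log($1" + Matcher.quoteReplacement(extraArgs) + ")");
    }
}
```

Running the result through a compiler and a diff is the only real safety net here, which is why this only makes sense when the call sites fall into a few simple shapes.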

Try refactoring with IntelliJ. It has a feature called SSR (Structural Search and Replace), in which you can refer to classes, method names, etc. for context. (seanizer's answer is more promising; I upvoted it.)

I agree with Seanizer's answer that you want a tool that can parse Java. That's necessary but not sufficient; what you really want is a tool that can carry out a reliable mass-change.
To do this, you want a tool that can parse Java, can pattern match against the parsed code, install the replacement call, and spit out the answer without destroying the rest of the source code.
Our DMS Software Reengineering Toolkit can do all of this for a variety of languages, including Java. It parses complete Java source systems and builds abstract syntax trees for the entire set of code.
DMS can apply pattern-directed, source-to-source transformations to achieve the desired change.
To achieve the OP's effect, he would apply the following program transformation:
rule replace_legacy_log(s:STRING): expression -> expression
" log(\s) " -> " log( \s, \class\(\), \method\(\) ) "
What this rule says is, find a call to log which has a single string argument, and replace it with a call to log with two more arguments determined by auxiliary functions class and method.
These functions determine the containing method name and containing class name for the AST node root where the rule finds a match.
The rule is written in "source form", but actually matches against the AST and replaces found ASTs with the modified AST.
To get back the modified source, you ask DMS to simply prettyprint (to make a nice layout) or fidelity print (if you want the layout of the old code preserved). DMS preserves comments, number radixes, etc.
If the existing application has more than one definition of the "log" function, you'll need to add a qualifier:
... if IsDesiredLog().
where IsDesiredLog uses DMS's symbol table and inheritance information to determine if the specific log refers to the definition of interest.

In fact, your problem is not one for a click-and-play engine that would let you replace all occurrences of
log("some weird message");
by
log("some weird message", this.getClass(), new Exception().getStackTrace()[0].getMethodName());
as that has little chance of working in all cases (static methods, for example).
I would suggest taking a look at Spoon. This tool parses and transforms source code, letting you carry out the operation in a slow (it is code-based, obviously) but controlled way.
However, you could also consider replacing your current method with one that explores the stack trace to get the information or, even better, using log4j internally with a log formatter that displays the correct information.

I would search and replace log( with log(#class, #methodname,
Then write a little script in any language (even Java) to find the class and method names and replace the #class and #methodname tokens...
Good luck

If the class and method name are required for "where did this log come from?" type data, then another option is to print out a stack trace in your log method. E.g.
// needs java.io.PrintWriter and java.io.StringWriter
public void log(String text)
{
    StringWriter sw = new StringWriter();
    PrintWriter pw = new PrintWriter(sw, true);
    new Throwable().printStackTrace(pw);   // captures the stack at the point where log() was called
    pw.flush();
    sw.flush();
    String stackTraceAsLog = sw.toString();
    // do something with text and stackTraceAsLog
}

how to count all Operators and Operands in java class file

How can I count all operators and operands in a Java class file? Does anyone have an idea?

Doing this kind of thing using regexes is unreliable. The syntax of Java is sufficiently complex that there are bound to be tricky corner cases that will cause your regexes to miscount.
Similarly using a bytecode analyser is liable to give you incorrect results because there isn't necessarily a one-to-one correspondence between source code operators / operands and bytecode instructions. The Java compiler may reorganize and rewrite the code in non-obvious ways.
The best way to do this sort of thing is to find a decent Java AST library, use that to parse your source code, and then traverse the AST to extract the information you need. (In this case, you need to count the operator and operand nodes.)
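As one possible shape for such a traversal, the sketch below counts operator nodes by kind. It is only an illustration: instead of a third-party AST library it uses the javac Tree API bundled with the JDK (javax.tools and com.sun.source), the class OperatorCounter is an invented name, a full JDK (Java 9 or later) is assumed, and only binary, unary, assignment and compound assignment operators are covered (the conditional operator and instanceof, for example, are not).

```java
import com.sun.source.tree.AssignmentTree;
import com.sun.source.tree.BinaryTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.CompoundAssignmentTree;
import com.sun.source.tree.Tree;
import com.sun.source.tree.UnaryTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical helper: counts operator nodes in a source string, keyed by Tree.Kind name.
final class OperatorCounter {
    static Map<String, Integer> countOperators(String source) {
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Input.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler().getTask(
                null, null, new DiagnosticCollector<JavaFileObject>(), null, null, List.of(file));

        Map<String, Integer> counts = new TreeMap<>();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Void>() {
                    private void record(Tree node) {
                        // The node kind names the operator, e.g. PLUS or POSTFIX_INCREMENT.
                        counts.merge(node.getKind().name(), 1, Integer::sum);
                    }

                    @Override
                    public Void visitBinary(BinaryTree node, Void unused) {
                        record(node);
                        return super.visitBinary(node, unused); // keep descending into operands
                    }

                    @Override
                    public Void visitUnary(UnaryTree node, Void unused) {
                        record(node);
                        return super.visitUnary(node, unused);
                    }

                    @Override
                    public Void visitAssignment(AssignmentTree node, Void unused) {
                        record(node);
                        return super.visitAssignment(node, unused);
                    }

                    @Override
                    public Void visitCompoundAssignment(CompoundAssignmentTree node, Void unused) {
                        record(node);
                        return super.visitCompoundAssignment(node, unused);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }
}
```

Operands could be counted in the same scanner by also overriding visitIdentifier and visitLiteral, though you would then have to decide whether type names count as operands.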

Forget regex (you'll never get that right without getting false positives like operators in comments etc), you're going to have to run a visitor over your code that counts operators. Now you can either use a source code parser or a byte code parser to do that.
For source code parsing I'd suggest the javaparser project. There, you'd create a custom Visitor extending VoidVisitorAdapter and overriding several relevant methods like this:
public void visit(AssignExpr n, A arg) {
    // track the operator here
    super.visit(n, arg); // resume visitor
}
On the byte code side, you'd probably use ASM and extend ClassAdapter to create your visitor. Both versions should work equally well. Or maybe not, as Stephen C writes (the compiler may have added or removed some operations).

You could try to analyze the bytecode of your class using a library like bcel.
Or use the sourceforge project lachesis (I haven't tried it):
Lachesis Analysis is a Software Complexity Measurement program for Object-Oriented source code. Analysis for Java source code and Java byte-code only is currently available.

How would one go about adding (minor) syntactic sugars to Java?

Suppose I want to add minor syntactic sugars to Java. Just little things like adding regex pattern literals, or perhaps base-2 literals, or multiline strings, etc. Nothing major grammatically (at least for now).
How would one go about doing this?
Do I need to extend the bytecode compiler? (Is that possible?)
Can I write Eclipse plugins to do simple source code transforms before feeding it to the standard Java compiler?

I would take a look at Project Lombok and try to reuse the approach they take. They use Java 5 annotations to hook in a Java agent which can manipulate the abstract syntax tree before the code is compiled. They are currently working on creating an API to allow custom transformers to be written which can be used with javac, or the major IDEs such as Eclipse and NetBeans. As well as annotations which trigger code to be generated, they are also planning on adding syntax changes (possibly mixin or pre-Java 7 closure syntax).
(I may have some of the details slightly off, but I think I'm pretty close).
Lombok is open source so studying their code and trying to build on that would probably be a good start.
Failing that, you could attempt to change the javac compiler. Though from what I've heard that's likely to be a hair-pulling exercise in frustration for anyone who is not a compiler and Java expert.

You can hack javac, notably with JSR 269 (pluggable annotation processing). You can hook into the visitor that traverses the statements in the source code and transform them.
Here, for instance, is the core of a transformation that adds support for Roman numerals in Java (read the complete post for more details, of course). It seems relatively easy.
public class Transform extends TreeTranslator {
    @Override
    public void visitIdent(JCIdent tree) {
        String name = tree.getName().toString();
        if (isRoman(name)) {
            result = make.Literal(numberize(name));
            result.pos = tree.pos;
        } else {
            super.visitIdent(tree);
        }
    }
}
Here are additional resources:
Hacker's guide to the java compiler
Javac hacker resources
I don't know if project Lombok (cited in the other answer) uses the same technique, but I guess yes.
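The Transform snippet above calls isRoman and numberize without showing them. The following is only a guess at what such helpers might look like, not the code from the linked post: it assumes uppercase numerals in standard subtractive notation up to 3999, and RomanNumerals is an invented class name.

```java
import java.util.regex.Pattern;

// Hypothetical helpers for the Roman numeral transformation; assumed, not taken from the original post.
final class RomanNumerals {
    // Thousands, hundreds, tens and units in standard subtractive notation.
    private static final Pattern ROMAN = Pattern.compile(
            "M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})");

    /** True if the identifier is a non-empty, well-formed Roman numeral. */
    static boolean isRoman(String name) {
        return !name.isEmpty() && ROMAN.matcher(name).matches();
    }

    /** Converts a well-formed Roman numeral to its integer value. */
    static int numberize(String roman) {
        int total = 0;
        int previous = 0;
        // Read right to left: a symbol smaller than its right neighbour is subtracted.
        for (int i = roman.length() - 1; i >= 0; i--) {
            int value = symbolValue(roman.charAt(i));
            total += value < previous ? -value : value;
            previous = value;
        }
        return total;
    }

    private static int symbolValue(char symbol) {
        switch (symbol) {
            case 'I': return 1;
            case 'V': return 5;
            case 'X': return 10;
            case 'L': return 50;
            case 'C': return 100;
            case 'D': return 500;
            case 'M': return 1000;
            default: throw new IllegalArgumentException("not a Roman numeral symbol: " + symbol);
        }
    }
}
```

Note the obvious hazard of this particular sugar: with helpers like these, an ordinary identifier such as I, X or MIX would also be treated as a number.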

Charles Nutter, the tech lead of JRuby, extended javac with literal regular expressions. He had to change about 3 lines of code, as far as I recall.
See http://twitter.com/headius/status/1319031705
And here is an awesome tutorial on how to add a new operator to javac: http://www.ahristov.com/tutorial/java-compiler.html
For more links like that, see my list of links for javac hackers.
