How to access comments from the Java Compiler Tree API generated AST?

I've used the Java Compiler Tree API to generate the AST for Java source files. However, I'm unable to access the comments in the source files.
So far I've been unable to find a way to extract comments from the source files. Is there a way to do this using the Compiler API or some other tool?

Our SD Java Front End is a Java parser that builds ASTs (and optionally symbol tables). It captures comments directly on tree nodes.
The Java Front End is a member of a family of compiler language front ends (C, C++, C#, COBOL, JavaScript, ...), all of which are supported by the DMS Software Reengineering Toolkit. DMS is designed to process languages for the purpose of transformation, and thus can capture comments, layout and formats to enable regeneration of code that preserves the original layout as much as possible.
EDIT 3/29/2012 (in contrast to the answer posted for doing this with ANTLR):
To get a comment on an AST node in DMS, one calls the DMS (Lisp-like) function
(AST:GetComments <node>)
which provides access to the array of comments associated with the AST node. One can inquire about the length of this array (it may be null), or, for each array element, ask for any of these properties via the corresponding AST:Get... accessor: FileIndex, Line, Column, EndLine, EndColumn, String (the exact Unicode comment content).

The comments obtained through the getCommentList method of CompilationUnit will not have the comment body. Also, the comments will not be visited during an AST visit. In order to visit the comments, we have to call the accept method on each comment in the comment list.
for (Comment comment : (List<Comment>) compilationUnit.getCommentList()) {
    comment.accept(new CommentVisitor(compilationUnit, classSource.split("\n")));
}
The body of the comments can be obtained with some simple logic. In the AST visitor for comments below, we need to supply the CompilationUnit and the source code of the class during initialization.
import org.eclipse.jdt.core.dom.ASTNode;
import org.eclipse.jdt.core.dom.ASTVisitor;
import org.eclipse.jdt.core.dom.BlockComment;
import org.eclipse.jdt.core.dom.CompilationUnit;
import org.eclipse.jdt.core.dom.LineComment;

public class CommentVisitor extends ASTVisitor {

    private CompilationUnit compilationUnit;
    private String[] source;

    public CommentVisitor(CompilationUnit compilationUnit, String[] source) {
        super();
        this.compilationUnit = compilationUnit;
        this.source = source;
    }

    @Override
    public boolean visit(LineComment node) {
        // Map the comment's start offset to a line number and read that line.
        int startLineNumber = compilationUnit.getLineNumber(node.getStartPosition()) - 1;
        String lineComment = source[startLineNumber].trim();
        System.out.println(lineComment);
        return true;
    }

    @Override
    public boolean visit(BlockComment node) {
        // Block comments can span several lines; collect all of them.
        int startLineNumber = compilationUnit.getLineNumber(node.getStartPosition()) - 1;
        int endLineNumber = compilationUnit.getLineNumber(node.getStartPosition() + node.getLength()) - 1;
        StringBuilder blockComment = new StringBuilder();
        for (int lineCount = startLineNumber; lineCount <= endLineNumber; lineCount++) {
            String blockCommentLine = source[lineCount].trim();
            blockComment.append(blockCommentLine);
            if (lineCount != endLineNumber) {
                blockComment.append("\n");
            }
        }
        System.out.println(blockComment.toString());
        return true;
    }

    @Override
    public void preVisit(ASTNode node) {
    }
}
Edit: Moved splitting of source out of the visitor.
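For context, here is a minimal sketch of how the visitor above might be driven. The ASTParser setup is standard JDT; reading the source from a file path, the JLS level and the variable names are illustrative assumptions.

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

import org.eclipse.jdt.core.dom.AST;
import org.eclipse.jdt.core.dom.ASTParser;
import org.eclipse.jdt.core.dom.Comment;
import org.eclipse.jdt.core.dom.CompilationUnit;

public class CommentExtractor {
    public static void main(String[] args) throws Exception {
        // Illustrative: read the source of the class whose comments we want.
        String classSource = new String(Files.readAllBytes(Paths.get(args[0])));

        // Standard JDT setup: parse the source into a CompilationUnit.
        ASTParser parser = ASTParser.newParser(AST.JLS8); // or whichever JLS level your JDT provides
        parser.setKind(ASTParser.K_COMPILATION_UNIT);
        parser.setSource(classSource.toCharArray());
        CompilationUnit compilationUnit = (CompilationUnit) parser.createAST(null);

        // getCommentList() returns the comments, but they must be visited explicitly.
        String[] sourceLines = classSource.split("\n");
        for (Comment comment : (List<Comment>) compilationUnit.getCommentList()) {
            comment.accept(new CommentVisitor(compilationUnit, sourceLines));
        }
    }
}

The same classSource string is what gets split into lines for the visitor, so the line numbers computed by getLineNumber line up with the array indices.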

Just for the record: now with Java 8 you have a whole interface to play with comments and documentation details, described here (the com.sun.source.doctree API).

You might want to use a different tool, like ANTLR's Java grammar. javac has no use for comments and is likely to discard them completely. The parsers upon which tools like IDEs are built are more likely to retain comments in their ASTs.
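To make that concrete, here is a hedged sketch of pulling comments out with ANTLR 4. It assumes a JavaLexer generated from one of the publicly available Java grammars (e.g. antlr/grammars-v4) that routes comments to the hidden channel; the class name and the channel routing are assumptions about that grammar, not something ANTLR itself guarantees.

import java.util.List;

import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Token;

// JavaLexer is assumed to be generated from an ANTLR 4 Java grammar
// that sends comment tokens to the hidden channel.
public class CommentDump {
    public static void main(String[] args) throws Exception {
        JavaLexer lexer = new JavaLexer(CharStreams.fromFileName(args[0]));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        tokens.fill();

        for (Token t : tokens.getTokens()) {
            // Hidden-channel tokens are skipped by the parser but still
            // available here; with the assumed grammar these are the comments.
            if (t.getChannel() == Token.HIDDEN_CHANNEL) {
                System.out.println(t.getLine() + ": " + t.getText());
            }
        }
    }
}

CommonTokenStream.getHiddenTokensToLeft(index) can then be used to attach each comment to the token (and hence parse-tree node) that follows it.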

Managed to solve the problem by using getSourcePositions() and some string manipulation (no regex was needed).
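The poster did not share their code, so the following is only a guess at the shape of that approach: use Trees.getSourcePositions() from the Compiler Tree API to get the start offset of a declaration, then inspect the raw source text before that offset for // or /* markers. The file handling and the choice of scanning only top-level type declarations are illustrative assumptions.

import java.io.IOException;

import javax.tools.JavaCompiler;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.SourcePositions;
import com.sun.source.util.Trees;

public class CommentPositions {
    public static void main(String[] args) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        StandardJavaFileManager fm = compiler.getStandardFileManager(null, null, null);
        JavacTask task = (JavacTask) compiler.getTask(
                null, fm, null, null, null, fm.getJavaFileObjects(args[0]));

        SourcePositions positions = Trees.instance(task).getSourcePositions();

        for (CompilationUnitTree unit : task.parse()) {
            String source = unit.getSourceFile().getCharContent(true).toString();
            for (Tree typeDecl : unit.getTypeDecls()) {
                // Offset where the declaration starts; any comment attached to it
                // sits in the source text just before this offset.
                int start = (int) positions.getStartPosition(unit, typeDecl);
                String before = source.substring(0, start);
                System.out.println("declaration at offset " + start
                        + ", last // at " + before.lastIndexOf("//")
                        + ", last /* at " + before.lastIndexOf("/*"));
            }
        }
    }
}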

Related

How do I track variable dependencies in Nashorn?

I would like to use the Nashorn engine as a general computation engine. It is powerful, fast, has plenty of built-in functions, and new functions are very easy to add using @FunctionalInterface or static methods. Even better, it also provides value-adds like cyclic dependency checking, syntax checking, etc.
However I need to automatically update "output" variables when a dependency changes.
The general idea is that in Java, I'll have something like:
class CalculationEngine {
    Data addData(String name, Number value){
        ...
    }
    Data addData(String name, String formula){
        ...
    }
    String getScript(){
        ...
    }
}
CalculationEngine engine = new CalculationEngine();
Data datum1 = engine.addData("datum1", 1); // Constant integer 1
Data datum2 = engine.addData("datum2", 2); // Constant integer 2
Data datum3 = engine.addData("datum3", "datum1*10");
Data datum4 = engine.addData("datum4", "datum3+datum2");
The CalculationEngine service class knows how to use Nashorn to create a script string out of the Data objects that looks like this:
final String script = engine.getScript(); // "var datum1=1; var datum2=2; var datum3=datum1*10; var datum4=datum3+datum2;"
I know I can parse the script with the Nashorn Parser:
final CompilationUnitTree tree = parser.parse("test", script, null);
But how do I extract the dependencies:
List<Data> whatDependsOn(Data input){
    // Process the parsed tree
    return list;
}
such that whatDependsOn(datum2) returns [datum4] and whatDependsOn(datum1) returns [datum3, datum4] ?
Or the inverse function getReferencedVariables such that getReferencedVariables(datum3) returns [datum1] and getReferencedVariables(datum4) returns [datum2, datum3] (and I can recursively query getReferencedVariables until all referenced variables have been found).
Basically, when the "value" of one of my Data objects changes (due to an external event), how do I determine which of my script formulae are affected and need to be recomputed?
I know that the Nashorn script can be parsed, but I cannot figure out how to use the SimpleTreeVisitorES6 to build up a variable dependency graph:
final CompilationUnitTree tree = parser.parse("test", script, null);
if (tree != null) {
    tree.accept(new SimpleTreeVisitorES6<Void, Void>() {
        @Override
        public Void visitVariable(VariableTree tree, Void v) {
            final Kind kind = tree.getKind();
            System.out.println("Found a variable: " + kind);
            System.out.println("  name: " + kind.toString());
            IdentifierTree binding = (IdentifierTree) tree.getBinding();
            System.out.println("  kind: " + binding.getKind().name());
            System.out.println("  name: " + binding.getName());
            System.out.println("  val:  " + kind.name());
            return null;
        }
    }, null);
}
One of the Nashorn devs here. What you are trying to do is compute the so-called def-use relations on source code (well, more likely their transitive closure, but I digress). That's a well-understood compiler-theory concept. The good news is that CompilationUnitTree and friends should give you enough information to implement an algorithm for computing this information. The bad news is that you'll have to roll up your sleeves and roll your own implementation, I'm afraid. You'll basically have to gather this information and produce merges at control-flow join points (back edges and exits of loops, ends of if statements; you'll also have to handle more exotic stuff like switch/case with their fallthrough semantics, and try/catch/finally, which is the least fun of these, as control can basically transfer from anywhere in a try block to a catch block). Your algorithm will also have to repeatedly evaluate loop bodies until the static information you're gathering reaches a fixpoint.
FWIW, while writing Nashorn I had to implement these kinds of things a few times using Nashorn's internal parser API (which is different from, but similar to, the public one). If you want some inspiration, you can look into the source code of the Nashorn static type analyzer, which infers the types of local variables in a JavaScript function and which I wrote some years ago. If nothing else, it will give you an idea of how to walk an AST and keep track of control-flow edges and partially computed static analysis data at those edges.
I wish there were an easier way to do this… FWIW, a generalized static analyzer that helps you with the bookkeeping of control flow could be possible. Good luck.
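That said, for the flat script in the question (a sequence of var x = expr; statements with no control flow), the transitive-closure machinery isn't needed yet, and a single pass with the public parser API is enough to build the raw dependency edges. This is only a sketch: the class and method names are mine, and it handles nothing but top-level var declarations.

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import jdk.nashorn.api.tree.CompilationUnitTree;
import jdk.nashorn.api.tree.IdentifierTree;
import jdk.nashorn.api.tree.Parser;
import jdk.nashorn.api.tree.SimpleTreeVisitorES6;
import jdk.nashorn.api.tree.VariableTree;

public class DependencyExtractor {
    // Maps each declared variable to the identifiers referenced in its initializer.
    public static Map<String, List<String>> referencedVariables(String script) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        CompilationUnitTree tree = Parser.create().parse("script", script, null);
        if (tree == null) {
            return deps;
        }
        tree.accept(new SimpleTreeVisitorES6<Void, Void>() {
            @Override
            public Void visitVariable(VariableTree node, Void v) {
                String name = ((IdentifierTree) node.getBinding()).getName();
                List<String> referenced = new ArrayList<>();
                if (node.getInitializer() != null) {
                    // Collect every identifier mentioned in the initializer expression.
                    node.getInitializer().accept(new SimpleTreeVisitorES6<Void, Void>() {
                        @Override
                        public Void visitIdentifier(IdentifierTree id, Void unused) {
                            referenced.add(id.getName());
                            return null;
                        }
                    }, null);
                }
                deps.put(name, referenced);
                return null;
            }
        }, null);
        return deps;
    }
}

For the example script this yields datum3 -> [datum1] and datum4 -> [datum3, datum2]; inverting that map (and taking its transitive closure) gives whatDependsOn.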

Extend Eclipse CDT to build an additional language

I would like to extend the CDT language plugin for GNU C to create some kind of new language based on it.
The new language should have a different visual appearance in the editor. I would like to color a method body gray if there is a special preprocessor directive in front of it (like an annotation).
Does anybody know where to extend the GCC language for a modification like this?
EDIT1:
As an example, I want the colour of the method body of specialFunction to be gray, because of an example annotation, #annotation:
#annotation
int specialFunction(){
    return 1;
}
EDIT2:
What I have tried so far is to build an "extended language". The plan was to highlight a preprocessor position and also save the position so that the method below it would be colored. I managed to get the preprocessor keyword colored, but I have not figured out how to color the method bodies.
public class OwnKeyWords extends GCCLanguage implements ICLanguageKeywords {
    @Override
    public String[] getPreprocessorKeywords() {
        //System.out.println("Called keywords" + timesPre++);
        return new String[]{
            "hide",
            "show"
        };
    }
}
Example to be colored:
#hide
int specialFunction(){
    return 1;
}
In the example above "hide" would be highlighted.
EDIT3:
I tried implementing ISemanticHighlighter and experimented with a few ways to highlight my code:
CVariable
CFunction
ObjectStyleMacro
...
But none of them were suitable for highlighting a method body that has a preprocessor directive (or anything else) on top of it.
Also, this note in ISemanticHighlighter:
* NOTE: Implementors are not allowed to keep a reference on the token or on any object retrieved from the
* token.
is not what I want, because I would like to keep a reference to the highlighted object for later operations.
Maybe org.eclipse.cdt.ui.text.folding.DefaultCFoldingStructureProvider would be an option too; with it I couldn't colour the instrumentation, but I could hide it instead.
This does not sound like a new language so much as semantic highlighting.
CDT has a great extension point called org.eclipse.cdt.ui.semanticHighlighting that allows you to define custom semantic highlighting rules.
Here is an example of the plugin.xml entry:
<extension
      point="org.eclipse.cdt.ui.semanticHighlighting">
   <semanticHighlighting
         class="com.example.SemanticHighlighter"
         defaultBold="true"
         defaultEnabled="true"
         defaultTextColor="35,0,35"
         displayName="Example Semantic Highlighter"
         id="com.example.SemanticHighlighter"
         preferenceKey="com.example.SemanticHighlighter.pref"
         priority="5">
   </semanticHighlighting>
</extension>
Then in your com.example.SemanticHighlighter you implement the org.eclipse.cdt.ui.text.ISemanticHighlighter interface. There is only one method, consumes, which takes an ISemanticToken. Analyse the token to see if it is relevant to your highlighter and return true to have it highlighted.
Here is a trivial implementation of the method:
@Override
public boolean consumes(ISemanticToken token) {
    IBinding binding = token.getBinding();
    if (binding instanceof IFunction) {
        IASTNode node = token.getNode();
        if (node instanceof IASTName && ((IASTName) node).isReference()) {
            String n = binding.getName();
            if ("MySpecialFunction".equals(n)) {
                return true;
            }
        }
    }
    return false;
}
Once implemented, users can modify the colour and applicability via the preference page C/C++ - Editor - Syntax Coloring.
You should be able to use ISemanticHighlighter to do the kind of highlighting you want.
For your example of coloring the body of a function with a particular annotation, it could work like this:
class MyHighlighter implements ISemanticHighlighter {
    @Override
    public boolean consumes(ISemanticToken token) {
        IASTNode node = token.getNode();
        // Walk up the AST to determine if 'node' is inside a function body.
        // If it's not, return false.
        // Navigate the AST some more to examine what comes before the
        // function's declaration. If it's the annotation in question,
        // return true. Otherwise, return false.
        return false; // placeholder until the navigation above is filled in
    }
}
I left out the details of how to navigate the AST, but CDT has a pretty rich AST API, so it's definitely doable; one possible starting point is sketched below. Feel free to ask specific questions if you have them.
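To sketch one possible shape of that navigation: walk up to the enclosing IASTFunctionDefinition, then look for a #hide preprocessor statement that ends on the line just above it. The CDT types used here exist, but a non-standard directive like #hide may surface as a preprocessor problem node rather than a regular statement, so treat this as a sketch that may need adjusting, not a tested recipe.

import org.eclipse.cdt.core.dom.ast.IASTFileLocation;
import org.eclipse.cdt.core.dom.ast.IASTFunctionDefinition;
import org.eclipse.cdt.core.dom.ast.IASTNode;
import org.eclipse.cdt.core.dom.ast.IASTPreprocessorStatement;
import org.eclipse.cdt.core.dom.ast.IASTTranslationUnit;

public class HideAnnotationHelper {

    /** Walks up the AST until an enclosing function definition is found. */
    static IASTFunctionDefinition enclosingFunction(IASTNode node) {
        for (IASTNode n = node; n != null; n = n.getParent()) {
            if (n instanceof IASTFunctionDefinition) {
                return (IASTFunctionDefinition) n;
            }
        }
        return null;
    }

    /** True if a #hide directive sits on the line right above the function. */
    static boolean isHidden(IASTFunctionDefinition function) {
        IASTFileLocation funcLoc = function.getFileLocation();
        if (funcLoc == null) {
            return false;
        }
        IASTTranslationUnit tu = function.getTranslationUnit();
        for (IASTPreprocessorStatement stmt : tu.getAllPreprocessorStatements()) {
            IASTFileLocation loc = stmt.getFileLocation();
            if (loc != null
                    && stmt.getRawSignature().startsWith("#hide")
                    && loc.getEndingLineNumber() == funcLoc.getStartingLineNumber() - 1) {
                return true;
            }
        }
        return false;
    }
}

With these helpers, consumes(token) could reduce to finding enclosingFunction(token.getNode()) and returning isHidden on it when it is non-null.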

How to extract one boolean field from XML?

I have a model in XML format, as shown below, and I need to parse the XML and check whether it has the internal-flag field set to true or not. In my other models, it is possible that internal-flag is set to false. Sometimes the field won't be there at all, in which case my code treats it as false by default.
<?xml version="1.0"?>
<ClientMetadata
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.google.com client.xsd"
    xmlns="http://www.google.com">
    <client id="200" version="13">
        <name>hello world</name>
        <description>hello hello</description>
        <organization>TESTER</organization>
        <author>david</author>
        <internal-flag>true</internal-flag>
        <clock>
            <clock>
                <for>
                    <init>val(tmp1) = 1</init>
                    <clock>
                        <eval><![CDATA[result("," + $convert(val(tmp1)))]]></eval>
                    </clock>
                </for>
                <for>
                    <incr>val(tmp1) -= 1</incr>
                    <clock>
                        <eval><![CDATA[result("," + $convert(val(tmp1)))]]></eval>
                    </clock>
                </for>
            </clock>
        </clock>
    </client>
</ClientMetadata>
I have a POJO in which I am storing my above model -
public class ModelMetadata {
    private int modelId;
    private String modelValue; // this string will have my above XML data as string
    // setters and getters here
}
Now what is the best way to determine whether my model has internal-flag set as true or not?
// this list will have all my Models stored
List<ModelMetadata> metadata = getModelMetadata();

for (ModelMetadata model : metadata) {
    // my model will be stored in below variable in XML format
    String modelValue = model.getModelValue();
    // now parse modelValue variable and extract `internal-flag` field property
}
Do I need to use XML parsing for this or is there any better way to do this?
Update:
I have started using StAX, and this is what I have tried so far, but I am not sure how I can extract that field:
InputStream is = new ByteArrayInputStream(modelValue.getBytes());
XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(is);

while (r.hasNext()) {
    // now what should I do here?
}
There is an easy solution using XMLBeam (Disclosure: I'm affiliated with that project), just a few lines:
public class ReadBoolean {
public interface ClientMetaData {
#XBRead("//xbdefaultns:internal-flag")
boolean hasFlag();
}
public static void main(String[] args) throws IOException {
ClientMetaData clientMetaData = new XBProjector().io().url("res://xmlWithBoolean.xml").read(ClientMetaData.class);
System.out.println("Has flag:"+clientMetaData.hasFlag());
}
}
This program prints out
Has flag:true
for your XML.
You could also do some simple string parsing, but this will only work for small cases with well-formed XML and only if there is a single <internal-flag> element. It is a simple solution that avoids any XML parsing utilities; other solutions may be more robust or powerful.
Find the index of the string literal <internal-flag>. If it doesn't exist, return false.
Go forward "<internal-flag>".length() (15) characters, then read up to the next </internal-flag>; the text in between should be the string true or false.
Take that string and use Boolean.parseBoolean(String) to get a boolean value.
If you want me to help you out with the code just drop a comment!
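For what it's worth, a direct transcription of those three steps, using nothing but java.lang.String and Boolean (class and method names are illustrative):

public class FlagByString {
    /** Naive extraction of <internal-flag>; only safe for well-formed XML
        with at most one such element and no comments or CDATA around it. */
    static boolean internalFlag(String xml) {
        String open = "<internal-flag>";
        int start = xml.indexOf(open);
        if (start < 0) {
            return false;                        // step 1: element missing -> false
        }
        int valueStart = start + open.length();  // step 2: skip past the opening tag
        int end = xml.indexOf("</internal-flag>", valueStart);
        if (end < 0) {
            return false;                        // malformed: no closing tag
        }
        String value = xml.substring(valueStart, end).trim();
        return Boolean.parseBoolean(value);      // step 3: "true"/"false" -> boolean
    }
}

Note that this silently misreads commented-out or CDATA-wrapped occurrences, which is exactly why a real XML parser is the safer default.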
If you are willing to consider adding Groovy to your mix (e.g. see the book Making Java Groovy), then using Groovy's XmlParser and its associated classes will make this simple.
If you need to stick to Java, let me put in a shameless plug for my Xen library, which mimics a lot of the "Groovy way". The answer to your question would be:
Xen doc = new XenParser().parseText(YOUR_XML_STRING);
String internalFlag = doc.getText(".client.internal-flag");
boolean isSet = "true".equals(internalFlag);
If the XML comes from a File, Stream, or URI, that can be handled too.
Caveat emptor: (even though it is free) this is a fairly new library, written solely by a random person (me), and not thoroughly tested on all the crazy XML out there. If anybody knows of a similar, more "mainstream" library, I'd be very interested in hearing about it.

How to merge two ASTs?

I'm trying to implement a tool for merging different versions of some source code. Given two versions of the same source code, the idea is to parse them, generate the respective Abstract Syntax Trees (ASTs), and finally merge them into a single output source while keeping grammatical consistency - the lexer and parser are those of the question ANTLR: How to skip multiline comments.
I know there is a class ParserRuleReturnScope that should help... but getStop() and getStart() always return null :-(
Here is a snippet that illustrates how I modified my parser to get rules printed:
parser grammar CodeTableParser;

options {
    tokenVocab = CodeTableLexer;
    backtrack = true;
    output = AST;
}

@header {
    package ch.bsource.ice.parsers;
}

@members {
    private void log(ParserRuleReturnScope rule) {
        System.out.println("Rule: " + rule.getClass().getName());
        System.out.println("  getStart(): " + rule.getStart());
        System.out.println("  getStop(): " + rule.getStop());
        System.out.println("  getTree(): " + rule.getTree());
    }
}

parse
    : codeTabHeader codeTable endCodeTable eof { log(retval); }
    ;

codeTabHeader
    : comment CodeTabHeader^ { log(retval); }
    ;
...
Assuming you have the ASTs (often difficult to get in the first place, parsing real languages is often harder than it looks), you first have to determine what they have in common, and build a mapping collecting that information. That's not as easy as it looks; do you count a block of code that has moved, but is the same exact subtree, as "common"? What about two subtrees that are the same except for consistent renaming of an identifier? What about changed comments? (most ASTs lose the comments; most programmers will think this is a really bad idea).
You can build a variation of the "Longest Common Substring" algorithm to compare trees. I've used that in tools that I have built.
Finally, after you've merged the trees, you need to regenerate the text, ideally preserving most of the layout of the original code. (Programmers hate it when you change the layout they so lovingly produced.) So your ASTs need to capture position information, and your regeneration has to honor that where it can.
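As a concrete starting point for the "determine what they have in common" step, here is a small, language-agnostic sketch: give every subtree a structural fingerprint and keep the fingerprints that occur in both versions. The Node class is a stand-in for whatever node type your parser produces; a real tree diff (handling renames, moves, and fingerprints that occur several times) is considerably more work.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Stand-in AST node: a label (node type plus token text) and ordered children.
class Node {
    final String label;
    final List<Node> children = new ArrayList<>();
    Node(String label) { this.label = label; }
}

public class CommonSubtrees {

    // Recursively computes a structural fingerprint for each subtree and
    // records which nodes carry which fingerprint.
    static String fingerprint(Node node, Map<String, List<Node>> index) {
        StringBuilder sb = new StringBuilder(node.label).append('(');
        for (Node child : node.children) {
            sb.append(fingerprint(child, index)).append(',');
        }
        String fp = sb.append(')').toString();
        index.computeIfAbsent(fp, k -> new ArrayList<>()).add(node);
        return fp;
    }

    // Fingerprints present in both versions: candidate "common" subtrees
    // from which a node-to-node mapping can be built.
    static Set<String> common(Node left, Node right) {
        Map<String, List<Node>> leftIndex = new HashMap<>();
        Map<String, List<Node>> rightIndex = new HashMap<>();
        fingerprint(left, leftIndex);
        fingerprint(right, rightIndex);
        Set<String> shared = new HashSet<>(leftIndex.keySet());
        shared.retainAll(rightIndex.keySet());
        return shared;
    }
}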
The call to log(retval) in your parser code looks like it's going to happen at the end of the rule, but it's not. You'll want to move the call into an @after block.
I changed log to spit out a message as well as the scope information and added calls to it to my own grammar like so:
script
    @init  {log("@init", retval);}
    @after {log("@after", retval);}
    : statement* EOF {log("after last rule reference", retval);}
    -> ^(STMTS statement*)
    ;
Parsing test input produced the following output:
Logging from @init
    getStart(): [@0,0:4='Print',<10>,1:0]
    getStop(): null
    getTree(): null
Logging from after last rule reference
    getStart(): [@0,0:4='Print',<10>,1:0]
    getStop(): null
    getTree(): null
Logging from @after
    getStart(): [@0,0:4='Print',<10>,1:0]
    getStop(): [@4,15:15='<EOF>',<-1>,1:15]
    getTree(): STMTS
The call in the @after block has both the stop and tree fields populated.
I can't say whether this will help you with your merging tool, but I think this will at least get you past the problem with the half-populated scope object.

Convert a list of URLs to a Tree

Just to be sure I'm not reinventing the wheel, I want to see if there is some known algorithm or class that can help me solve my problem. I have a huge list of URLs from an application, and I'd like to feed those URLs into a tree to create a sitemap-like data structure.
It seems like something like this must have been done before. However, everything I find in my searches goes from XML to a tree. Ideally I'd like the answer in Java, but I'm sure I could translate it myself if necessary. If I need to do it myself, I'd probably take each URL and break it into indexes.
www.site.com/dir1/dir2/file.html
[root]      [0]  [1]  [1] - file
www.site.com/dirabc/dir2/file.html
[root]      [0]    [1]  [1]
So I'd parse each URL into offsets [0], [1], [2], etc., and those offsets would be the depth in the tree at which to add each segment. That was at least my initial plan. I'm open to any and all suggestions!
You could define your UrlTree as nested HashMaps
public class UrlTree {

    private final Map<String, UrlTree> branches = new HashMap<String, UrlTree>();

    public void add(String[] tokens, int i) {
        if (i >= tokens.length) {
            return;
        }
        final String token = tokens[i];
        UrlTree branch = branches.get(token);
        if (branch == null) {
            branch = new UrlTree();
            branches.put(token, branch);
        }
        branch.add(tokens, i + 1);
    }

    ...
}
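A possible way to feed URLs into it, assuming absolute URLs with a scheme so java.net.URI can split host and path (the tokenize helper is an illustration, not part of the answer above):

import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class UrlTreeDemo {

    // Turns "http://www.site.com/dir1/dir2/file.html" into
    // ["www.site.com", "dir1", "dir2", "file.html"].
    static String[] tokenize(String url) {
        URI uri = URI.create(url);
        List<String> tokens = new ArrayList<>();
        tokens.add(uri.getHost());
        for (String part : uri.getPath().split("/")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        UrlTree root = new UrlTree();
        root.add(tokenize("http://www.site.com/dir1/dir2/file.html"), 0);
        root.add(tokenize("http://www.site.com/dirabc/dir2/file.html"), 0);
        // root now holds www.site.com -> {dir1 -> {dir2 -> {file.html}},
        //                                 dirabc -> {dir2 -> {file.html}}}
    }
}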
You'll need to implement TreeModel in a way that reflects the hierarchy of your observed directory structure. FileTreeModel is an example, and ac.Name is a simple class that parses paths for a vintage file system. See also How to Use Trees. An instance of NetBeans Outline, illustrated here, would make a nice alternative view.
