Antlr3: building parse tree for qualified names - java

I couldn't find a question/answer that comes close to helping with my issue. Therefore, I am posting this question here.
I am trying to build a parse tree for qualified names. The below example shows an example.
E.g.,
foo_boo.aaa.ccc1_c
Here I have dot separated words. I am using antlr3 and below is my grammer.
parse
: expr
;
list_expr : <I removed the grammar here>
SimpleType : ('a'..'z'|'A'..'Z'|'_')('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
QualifiedType : SimpleType | SimpleType ('\.' SimpleType)+;
expr : list_expr
| QualifiedType
| union_expr;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; } ;
Here, SympleType represents grammar for a word. My requirement is to build the grammar for the QualifiedType. The current grammar given in above is not working as expected (QualifiedType : SimpleType | SimpleType ('\.'SimpleType)+;). How to write correct grammar for Qualified names (Dot separated words)?

Make QualifiedType a parser rule instead of a lexer rule:
qualifiedType : SimpleType ('.' SimpleType)*;
Also, '\.' does not need an escape: '.' is OK.
EDIT
You'll have to set the output to AST and apply some tree rewrite rules to make it work properly. Here's a quick demo:
grammar T;
options {
output=AST;
}
tokens {
Root;
QualifiedName;
}
parse
: qualifiedType EOF -> ^(Root qualifiedType)
;
qualifiedType
: SimpleType ('.' SimpleType)* -> ^(QualifiedName SimpleType+)
;
SimpleType
: ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')*
;
And if you now run the code:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.CommonTree;
import org.antlr.runtime.tree.DOTTreeGenerator;
import org.antlr.stringtemplate.StringTemplate;
public class Main {
public static void main(String[] args) throws Exception {
TLexer lexer = new TLexer(new ANTLRStringStream("foo_boo.aaa.ccc1_c"));
TParser parser = new TParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.parse().getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}
you'll get some DOT output, which corresponds to the following AST:

Related

Making generated parser work in Java for ANTLR 4.8

I've been having trouble getting my generated parser to work in Java for ANTLR 4.8. There are other answers to this question, but it seems that ANTLR has changed things since 4.7 and all the other answers are before this change. My code is:
String formula = "(fm.a < fm.b) | (fm.a = fm.b)";
CharStream input = CharStreams.fromString(formula);
Antlr.LogicGrammerLexer lexer = new Antlr.LogicGrammerLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
Antlr.LogicGrammerParser parser = new Antlr.LogicGrammerParser(tokens);
ParseTree pt = new ParseTree(parser);
It appears to be reading in the formula correctly into the CharStream, but anything I try to do past that just isn't working at all. For example, if I try to print out the parse tree, nothing will be printed. The following line will print out nothing:
System.out.println(lexer._input.getText(new Interval(0, 100)));
Any advice appreciated.
EDIT: added the grammar file:
grammar LogicGrammer;
logicalStmt: BOOL_EXPR | '('logicalStmt' '*LOGIC_SYMBOL' '*logicalStmt')';
BOOL_EXPR: '('IDENTIFIER' '*MATH_SYMBOL' '*IDENTIFIER')';
IDENTIFIER: CHAR+('.'CHAR*)*;
CHAR: 'a'..'z' | 'A'..'Z' | '1'..'9';
LOGIC_SYMBOL: '~' | '|' | '&';
MATH_SYMBOL: '<' | '≤' | '=' | '≥' | '>';
This line:
ParseTree pt = new ParseTree(parser);
is incorrect. You need to call the start rule method on your parser object to get your parse tree
Antlr.LogicGrammerParser parser = new Antlr.LogicGrammerParser(tokens);
ParseTree pt = parser.logicalStmt();
So far as printing out your input, generally fields starting with an _ (like _input) are not intended for external use. Though I suspect the failure may be that you don't have 100 characters in your input stream, so the Interval is invalid. (I haven't tried it to see the exact failure)
I you include your grammar, one of us could easily attempt to generate and compile and, perhaps, be more specific.
Using your grammar, this works for me:
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.misc.Interval;
import org.antlr.v4.runtime.tree.ParseTree;
public class Logic {
public static void main(String... args) {
String formula = "(fm.a < fm.b) | (fm.a = fm.b)";
CharStream input = CharStreams.fromString(formula);
LogicGrammerLexer lexer = new LogicGrammerLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
LogicGrammerParser parser = new LogicGrammerParser(tokens);
ParseTree pt = parser.logicalStmt();
System.out.println(pt.toStringTree());
System.out.println(input.getText(new Interval(1, 28)));
}
}
output:
([] (fm.a < fm.b))
fm.a < fm.b) | (fm.a = fm.b)
BTW, a couple of minor suggestions for your grammar:
set up a rule to skip whitespace WS: [ \t\r\n]+ -> skip;
change BOOL_EXPR to a parser rule (since it's made up of a composition of tokens from other lexer rules:
grammar LogicGrammer
;
logicalStmt
: boolExpr
| '(' logicalStmt LOGIC_SYMBOL logicalStmt ')'
;
boolExpr: '(' IDENTIFIER MATH_SYMBOL IDENTIFIER ')';
IDENTIFIER: CHAR+ ('.' CHAR*)*;
CHAR: 'a' ..'z' | 'A' ..'Z' | '1' ..'9';
LOGIC_SYMBOL: '~' | '|' | '&';
MATH_SYMBOL: '<' | '≤' | '=' | '≥' | '>';
WS: [ \t\r\n]+ -> skip;
The BOOL_EXPR shouldn't be a lexer rule. I suggest you do something like this instead:
grammar LogicGrammer;
parse
: logicalStmt EOF
;
logicalStmt
: logicalStmt LOGIC_SYMBOL logicalStmt
| logicalStmt MATH_SYMBOL logicalStmt
| '(' logicalStmt ')'
| IDENTIFIER
;
IDENTIFIER
: CHAR+ ( '.'CHAR+ )*
;
LOGIC_SYMBOL
: [~|&]
;
MATH_SYMBOL
: [<≤=≥>]
;
SPACE
: [ \t\r\n] -> skip
;
fragment CHAR
: [a-zA-Z1-9]
;
which can be tested by running the following code:
String formula = "(fm.a < fm.b) | (fm.a = fm.b)";
LogicGrammerLexer lexer = new LogicGrammerLexer(CharStreams.fromString(formula));
LogicGrammerParser parser = new LogicGrammerParser(new CommonTokenStream(lexer));
ParseTree root = parser.parse();
System.out.println(root.toStringTree(parser));

Visitor methods for Java grammar not working in ANTLR 4.4

I am new to ANTLR framework. I have been working around this for a week.
Now am in a situation where i need to parse the Java file and extract the data.
Am using ANTLR 4 for parsing. I create the Lexer, Parser and Visitor files using ANTLR in built tool.
When I try to over ride the Visitor method I doesn't gets called and returns null value.
Here is the coding.
I have generated JavaLexer, JavaParser, JavaVisitor, JavaBaseVisitor, JavaListener
package com.antlr;
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;
import java.io.FileInputStream;
import java.io.InputStream;
public class ExtractInterfaceVisitor {
public static class AnnVisitor extends JavaBaseVisitor<String> {
#Override
public String visitAnnotation (JavaParser.AnnotationContext ctx)
{
System.out.println("Annotation");
return ctx.getText();
}
#Override
public String visitClassDeclaration( JavaParser.ClassDeclarationContext ctx)
{
System.out.println("Class Declaration");
return ctx.getText();
}
}
public static void main(String[] args) throws Exception {
String inputFile = null;
inputFile = "C:/Users/User/Desktop/antlr/java1/Demo.java"; //Contains a Java File
InputStream is = System.in;
is = new FileInputStream(inputFile);
ANTLRInputStream input = new ANTLRInputStream(is);
JavaLexer lexer = new JavaLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
JavaParser parser = new JavaParser(tokens);
parser.setBuildParseTree(true); // tell ANTLR to build a parse tree
ParseTree tree = parser.compilationUnit(); // parse
// show tree in text form
//System.out.println(tree.toStringTree(parser));
AnnVisitor Visitor = new AnnVisitor();
String result = Visitor.visit(tree);
System.out.println("visitor result = "+result);
}
}
Demo.java
#ClassAnnotation(Value="Class")
public class Demo {
#MethodAnnotation(Value="Method")
void MethodName(int x, String y) { }
int x;
int[ ] g(/*no args*/)
{ }
int average()
{ }
List<Map<String, Integer>>[] h() { return null; }
}
Java.g4
/** Java 1.6 grammar (ANTLR v4). Derived from
http://docs.oracle.com/javase/specs/jls/se7/jls7.pdf
and JavaParser.g from ANTLR v3
*/
grammar Java;
#lexer::members {
protected boolean enumIsKeyword = true;
protected boolean assertIsKeyword = true;
}
// starting point for parsing a java file
compilationUnit
: packageDeclaration? importDeclaration* typeDeclaration*
EOF
;
packageDeclaration
: 'package' qualifiedName ';'
;
importDeclaration
: 'import' 'static'? qualifiedName ('.' '*')? ';'
;
typeDeclaration
: classOrInterfaceModifier*
( classDeclaration
| interfaceDeclaration
| enumDeclaration
)
| ';'
;
classDeclaration
: 'class' Identifier typeParameters? ('extends' type)?
('implements' typeList)?
classBody
;
enumDeclaration
: ENUM Identifier ('implements' typeList)? enumBody
;
interfaceDeclaration
: normalInterfaceDeclaration
| annotationTypeDeclaration
;
classOrInterfaceModifier
: annotation // class or interface
| 'public' // class or interface
| 'protected' // class or interface
| 'private' // class or interface
| 'abstract' // class or interface
| 'static' // class or interface
| 'final' // class only -- does not apply to interfaces
| 'strictfp' // class or interface
;
modifiers
: modifier*
;
typeParameters
: '<' typeParameter (',' typeParameter)* '>'
;
typeParameter
: Identifier ('extends' typeBound)?
;
typeBound
: type ('&' type)*
;
enumBody
: '{' enumConstants? ','? enumBodyDeclarations? '}'
;
enumConstants
: enumConstant (',' enumConstant)*
;
enumConstant
: annotations? Identifier arguments? classBody?
;
enumBodyDeclarations
: ';' (classBodyDeclaration)*
;
normalInterfaceDeclaration
: 'interface' Identifier typeParameters? ('extends' typeList)? interfaceBody
;
typeList
: type (',' type)*
;
classBody
: '{' classBodyDeclaration* '}'
;
interfaceBody
: '{' interfaceBodyDeclaration* '}'
;
classBodyDeclaration
: ';'
| 'static'? block
| modifiers member
;
member
: genericMethodDeclaration
| methodDeclaration
| fieldDeclaration
| constructorDeclaration
| interfaceDeclaration
| classDeclaration
;
methodDeclaration
: type Identifier formalParameters ('[' ']')* methodDeclarationRest
| 'void' Identifier formalParameters methodDeclarationRest
;
methodDeclarationRest
: ('throws' qualifiedNameList)?
( methodBody
| ';'
)
;
genericMethodDeclaration
: typeParameters methodDeclaration
;
fieldDeclaration
: type variableDeclarators ';'
;
constructorDeclaration
: typeParameters? Identifier formalParameters
('throws' qualifiedNameList)? constructorBody
;
interfaceBodyDeclaration
: modifiers interfaceMemberDecl
| ';'
;
interfaceMemberDecl
: interfaceMethodOrFieldDecl
| interfaceGenericMethodDecl
| 'void' Identifier voidInterfaceMethodDeclaratorRest
| interfaceDeclaration
| classDeclaration
;
interfaceMethodOrFieldDecl
: type Identifier interfaceMethodOrFieldRest
;
interfaceMethodOrFieldRest
: constantDeclaratorsRest ';'
| interfaceMethodDeclaratorRest
;
voidMethodDeclaratorRest
: formalParameters ('throws' qualifiedNameList)?
( methodBody
| ';'
)
;
interfaceMethodDeclaratorRest
: formalParameters ('[' ']')* ('throws' qualifiedNameList)? ';'
;
interfaceGenericMethodDecl
: typeParameters (type | 'void') Identifier
interfaceMethodDeclaratorRest
;
voidInterfaceMethodDeclaratorRest
: formalParameters ('throws' qualifiedNameList)? ';'
;
constantDeclarator
: Identifier constantDeclaratorRest
;
variableDeclarators
: variableDeclarator (',' variableDeclarator)*
;
variableDeclarator
: variableDeclaratorId ('=' variableInitializer)?
;
constantDeclaratorsRest
: constantDeclaratorRest (',' constantDeclarator)*
;
constantDeclaratorRest
: ('[' ']')* '=' variableInitializer
;
variableDeclaratorId
: Identifier ('[' ']')*
;
variableInitializer
: arrayInitializer
| expression
;
arrayInitializer
: '{' (variableInitializer (',' variableInitializer)* (',')? )? '}'
;
modifier
: annotation
| 'public'
| 'protected'
| 'private'
| 'static'
| 'abstract'
| 'final'
| 'native'
| 'synchronized'
| 'transient'
| 'volatile'
| 'strictfp'
;
packageOrTypeName
: qualifiedName
;
enumConstantName
: Identifier
;
typeName
: qualifiedName
;
type: classOrInterfaceType ('[' ']')*
| primitiveType ('[' ']')*
;
classOrInterfaceType
: Identifier typeArguments? ('.' Identifier typeArguments? )*
;
primitiveType
: 'boolean'
| 'char'
| 'byte'
| 'short'
| 'int'
| 'long'
| 'float'
| 'double'
;
variableModifier
: 'final'
| annotation
;
typeArguments
: '<' typeArgument (',' typeArgument)* '>'
;
typeArgument
: type
| '?' (('extends' | 'super') type)?
;
qualifiedNameList
: qualifiedName (',' qualifiedName)*
;
formalParameters
: '(' formalParameterDecls? ')'
;
formalParameterDecls
: variableModifiers type formalParameterDeclsRest
;
formalParameterDeclsRest
: variableDeclaratorId (',' formalParameterDecls)?
| '...' variableDeclaratorId
;
methodBody
: block
;
constructorBody
: '{' explicitConstructorInvocation? blockStatement* '}'
;
explicitConstructorInvocation
: nonWildcardTypeArguments? ('this' | 'super') arguments ';'
| primary '.' nonWildcardTypeArguments? 'super' arguments ';'
;
qualifiedName
: Identifier ('.' Identifier)*
;
literal
: integerLiteral
| FloatingPointLiteral
| CharacterLiteral
| StringLiteral
| booleanLiteral
| 'null'
;
integerLiteral
: HexLiteral
| OctalLiteral
| DecimalLiteral
;
booleanLiteral
: 'true'
| 'false'
;
// ANNOTATIONS
annotations
: annotation+
;
annotation
: '#' annotationName ( '(' ( elementValuePairs | elementValue )? ')' )?
;
annotationName
: Identifier ('.' Identifier)*
;
elementValuePairs
: elementValuePair (',' elementValuePair)*
;
elementValuePair
: Identifier '=' elementValue
;
elementValue
: expression
| annotation
| elementValueArrayInitializer
;
elementValueArrayInitializer
: '{' (elementValue (',' elementValue)*)? (',')? '}'
;
annotationTypeDeclaration
: '#' 'interface' Identifier annotationTypeBody
;
annotationTypeBody
: '{' (annotationTypeElementDeclaration)* '}'
;
annotationTypeElementDeclaration
: modifiers annotationTypeElementRest
;
annotationTypeElementRest
: type annotationMethodOrConstantRest ';'
| classDeclaration ';'?
| normalInterfaceDeclaration ';'?
| enumDeclaration ';'?
| annotationTypeDeclaration ';'?
;
annotationMethodOrConstantRest
: annotationMethodRest
| annotationConstantRest
;
annotationMethodRest
: Identifier '(' ')' defaultValue?
;
annotationConstantRest
: variableDeclarators
;
defaultValue
: 'default' elementValue
;
// STATEMENTS / BLOCKS
block
: '{' blockStatement* '}'
;
blockStatement
: localVariableDeclarationStatement
| classDeclaration
| interfaceDeclaration
| statement
;
localVariableDeclarationStatement
: localVariableDeclaration ';'
;
localVariableDeclaration
: variableModifiers type variableDeclarators
;
variableModifiers
: variableModifier*
;
statement
: block
| ASSERT expression (':' expression)? ';'
| 'if' parExpression statement ('else' statement)?
| 'for' '(' forControl ')' statement
| 'while' parExpression statement
| 'do' statement 'while' parExpression ';'
| 'try' block
( catches 'finally' block
| catches
| 'finally' block
)
| 'switch' parExpression switchBlock
| 'synchronized' parExpression block
| 'return' expression? ';'
| 'throw' expression ';'
| 'break' Identifier? ';'
| 'continue' Identifier? ';'
| ';'
| statementExpression ';'
| Identifier ':' statement
;
catches
: catchClause (catchClause)*
;
catchClause
: 'catch' '(' formalParameter ')' block
;
formalParameter
: variableModifiers type variableDeclaratorId
;
switchBlock
: '{' switchBlockStatementGroup* switchLabel* '}'
;
switchBlockStatementGroup
: switchLabel+ blockStatement*
;
switchLabel
: 'case' constantExpression ':'
| 'case' enumConstantName ':'
| 'default' ':'
;
forControl
: enhancedForControl
| forInit? ';' expression? ';' forUpdate?
;
forInit
: localVariableDeclaration
| expressionList
;
enhancedForControl
: variableModifiers type Identifier ':' expression
;
forUpdate
: expressionList
;
// EXPRESSIONS
parExpression
: '(' expression ')'
;
expressionList
: expression (',' expression)*
;
statementExpression
: expression
;
constantExpression
: expression
;
expression
: primary
| expression '.' Identifier
| expression '.' 'this'
| expression '.' 'super' '(' expressionList? ')'
| expression '.' 'new' Identifier '(' expressionList? ')'
| expression '.' 'super' '.' Identifier arguments?
| expression '.' explicitGenericInvocation
| expression '[' expression ']'
| expression '(' expressionList? ')'
| expression ('++' | '--')
| ('+'|'-'|'++'|'--') expression
| ('~'|'!') expression
| '(' type ')' expression
| 'new' creator
| expression ('*'|'/'|'%') expression
| expression ('+'|'-') expression
| expression ('<' '<' | '>' '>' '>' | '>' '>') expression
| expression ('<' '=' | '>' '=' | '>' | '<') expression
| expression 'instanceof' type
| expression ('==' | '!=') expression
| expression '&' expression
| expression '^' expression
| expression '|' expression
| expression '&&' expression
| expression '||' expression
| expression '?' expression ':' expression
| expression
('^='<assoc=right>
|'+='<assoc=right>
|'-='<assoc=right>
|'*='<assoc=right>
|'/='<assoc=right>
|'&='<assoc=right>
|'|='<assoc=right>
|'='<assoc=right>
|'>' '>' '='<assoc=right>
|'>' '>' '>' '='<assoc=right>
|'<' '<' '='<assoc=right>
|'%='<assoc=right>
)
expression
;
primary
: '(' expression ')'
| 'this'
| 'super'
| literal
| Identifier
| type '.' 'class'
| 'void' '.' 'class'
;
creator
: nonWildcardTypeArguments createdName classCreatorRest
| createdName (arrayCreatorRest | classCreatorRest)
;
createdName
: classOrInterfaceType
| primitiveType
;
innerCreator
: nonWildcardTypeArguments? Identifier classCreatorRest
;
explicitGenericInvocation
: nonWildcardTypeArguments Identifier arguments
;
arrayCreatorRest
: '['
( ']' ('[' ']')* arrayInitializer
| expression ']' ('[' expression ']')* ('[' ']')*
)
;
classCreatorRest
: arguments classBody?
;
nonWildcardTypeArguments
: '<' typeList '>'
;
arguments
: '(' expressionList? ')'
;
// LEXER
HexLiteral : '0' ('x'|'X') HexDigit+ IntegerTypeSuffix? ;
DecimalLiteral : ('0' | '1'..'9' '0'..'9'*) IntegerTypeSuffix? ;
OctalLiteral : '0' ('0'..'7')+ IntegerTypeSuffix? ;
fragment
HexDigit : ('0'..'9'|'a'..'f'|'A'..'F') ;
fragment
IntegerTypeSuffix : ('l'|'L') ;
FloatingPointLiteral
: ('0'..'9')+ '.' ('0'..'9')* Exponent? FloatTypeSuffix?
| '.' ('0'..'9')+ Exponent? FloatTypeSuffix?
| ('0'..'9')+ Exponent FloatTypeSuffix?
| ('0'..'9')+ FloatTypeSuffix
| ('0x' | '0X') (HexDigit )*
('.' (HexDigit)*)?
( 'p' | 'P' )
( '+' | '-' )?
( '0' .. '9' )+
FloatTypeSuffix?
;
fragment
Exponent : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
fragment
FloatTypeSuffix : ('f'|'F'|'d'|'D') ;
CharacterLiteral
: '\'' ( EscapeSequence | ~('\''|'\\') ) '\''
;
StringLiteral
: '"' ( EscapeSequence | ~('\\'|'"') )* '"'
;
fragment
EscapeSequence
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
| UnicodeEscape
| OctalEscape
;
fragment
OctalEscape
: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
| '\\' ('0'..'7') ('0'..'7')
| '\\' ('0'..'7')
;
fragment
UnicodeEscape
: '\\' 'u' HexDigit HexDigit HexDigit HexDigit
;
ENUM: 'enum' {if (!enumIsKeyword) setType(Identifier);}
;
ASSERT
: 'assert' {if (!assertIsKeyword) setType(Identifier);}
;
Identifier
: Letter (Letter|JavaIDDigit)*
;
/**I found this char range in JavaCC's grammar, but Letter and Digit overlap.
Still works, but...
*/
fragment
Letter
: '\u0024' |
'\u0041'..'\u005a' |
'\u005f' |
'\u0061'..'\u007a' |
'\u00c0'..'\u00d6' |
'\u00d8'..'\u00f6' |
'\u00f8'..'\u00ff' |
'\u0100'..'\u1fff' |
'\u3040'..'\u318f' |
'\u3300'..'\u337f' |
'\u3400'..'\u3d2d' |
'\u4e00'..'\u9fff' |
'\uf900'..'\ufaff'
;
fragment
JavaIDDigit
: '\u0030'..'\u0039' |
'\u0660'..'\u0669' |
'\u06f0'..'\u06f9' |
'\u0966'..'\u096f' |
'\u09e6'..'\u09ef' |
'\u0a66'..'\u0a6f' |
'\u0ae6'..'\u0aef' |
'\u0b66'..'\u0b6f' |
'\u0be7'..'\u0bef' |
'\u0c66'..'\u0c6f' |
'\u0ce6'..'\u0cef' |
'\u0d66'..'\u0d6f' |
'\u0e50'..'\u0e59' |
'\u0ed0'..'\u0ed9' |
'\u1040'..'\u1049'
;
COMMENT
: '/*' .*? '*/' -> channel(HIDDEN) // match anything between /* and */
;
WS : [ \r\t\u000C\n]+ -> channel(HIDDEN)
;
LINE_COMMENT
: '//' ~[\r\n]* '\r'? '\n' -> channel(HIDDEN)
;
The Final out put which i get is: Visitor result = null
I don't know where i went wrong. It doesn't call the visit methods also. Please correct me.
The generated parse tree visitor extends AbstractParseTreeVisitor, which has two methods which would be helpful to override to get the result you are looking for.
Firstly, AbstractParseTreeVisitor#defaultResult() returns the default result for every node in the parse tree you visit. By default, it returns null.
Second, AbstractParseTreeVisitor#aggregateResult(T,T) aggregates the last node's visited result with the total result so far.
You have not overridden either of these methods, so aggregateResult(T,T) is returning the default result of the last parse tree node visited, which is giving you a null result.
So, if you want to fix this, I would override defaultResult to look something like this:
#Override
public String aggregateResult(String aggregate, String nextResult) {
if (aggregate == null) {
return nextResult;
}
if (nextResult == null) {
return aggregrate;
}
StringBuilder sb = new StringBuilder(aggregate);
sb.append(" ");
sb.append(nextResult);
return sb.toString();
}
If you don't want to do the null checks in your aggregateResult override, you could override defaultResult to return the empty String and then have aggregateResult append every result to the aggregate, but I personally prefer the first solution.
I also used ANTLR before but I ended up using Eclipse JDT is updated for JAVA8 and its really simple to get started with the AST.
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.eclipse.jdt.core.dom.AST;
import org.eclipse.jdt.core.dom.ASTParser;
import org.eclipse.jdt.core.dom.ASTVisitor;
import org.eclipse.jdt.core.dom.CompilationUnit;
import org.eclipse.jdt.core.dom.MarkerAnnotation;
import org.eclipse.jdt.core.dom.NormalAnnotation;
import org.eclipse.jdt.core.dom.SimpleName;
import org.eclipse.jdt.core.dom.SingleMemberAnnotation;
import org.eclipse.jdt.core.dom.VariableDeclarationFragment;
/**
*
* #author Lethe
*/
public class Parser {
public Parser() {
}
public String readFile(String path, Charset encoding)
throws IOException {
byte[] encoded = Files.readAllBytes(Paths.get(path));
return new String(encoded, encoding);
}
public void parseSource() {
try {
ASTParser parser = ASTParser.newParser(AST.JLS3);
String source = readFile("test.java", Charset.defaultCharset());
//parser.setSource("public class A { int i = 9; \n int j; \n ArrayList<Integer> al = new ArrayList<Integer>();j=1000; }".toCharArray());
parser.setSource(source.toCharArray());
parser.setKind(ASTParser.K_COMPILATION_UNIT);
//ASTNode node = parser.createAST(null);
final CompilationUnit cu = (CompilationUnit) parser.createAST(null);
cu.accept(new ASTVisitor() {
Set names = new HashSet();
public boolean visit(VariableDeclarationFragment node) {
SimpleName name = node.getName();
this.names.add(name.getIdentifier());
System.out.println("Declaration of '" + name + "' at line" + cu.getLineNumber(name.getStartPosition()));
return false; // do not continue to avoid usage info
}
public boolean visit(SimpleName node) {
if (this.names.contains(node.getIdentifier())) {
System.out.println("Usage of '" + node + "' at line " + cu.getLineNumber(node.getStartPosition()));
}
return true;
}
public boolean visit(SingleMemberAnnotation annotation) {
System.out.println(annotation.getTypeName());
System.out.println(annotation.getValue());
return true;
}
public boolean visit(NormalAnnotation annotation) {
System.out.println(annotation.getTypeName());
return true;
}
public boolean visit(MarkerAnnotation annotation) {
System.out.println(annotation.getTypeName());
return true;
}
});
} catch (IOException ex) {
Logger.getLogger(Parser.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
Hope this helps with your AST http://www.eclipse.org/jdt/apt/index.php ^_^
First some advice:
You should check out the definitive antlr reference. One week is enough of time for that. In there, it says and I quote
"Visitors walk parse trees by explicitly calling interface ParseTreeVisitor’s visit()
method on child nodes."
The class "ExtractInterfaceVisitor" is quite redundant, since You only have one inner class inside of it...
Try debugging the code, put breakpoints in your visit methods and see what happens. Chances are that only top-level-rule visitor will be executed.
If You don't see the output "Class Declaration", I could only assume that Demo.java does not start with a class declaration.
EDIT: The first visit method that is called is the one corresponding to the parser rule that matches the first line in Demo.java.
I was facing the exact same issue on Eclipse Neon and I resolved it in the following way.
Step1: I removed Antlr4 sdk from eclipse. This is a plugin to auto generate Visitor and Listener classes after parsing Java.g4 grammar file.
Step2: Created a source folder in my project
src/main/antlr4
Inside this I created the package
com.helper.tony.grammar
I placed my Java.g4 file inside this package. This was done to create my Visitors and Listeners with the same package structure in the output directory. Output directory location is
target/generated-sources/antlr4
as indicated by output directory tag in pom.xml.
I also added this into my build path.
Step3: Setting pom file properly.I added tag into my pom file and did set maven compiler plugin source and target version to Java 1.8.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.helper.mani</groupId>
<artifactId>builder_factory</artifactId>
<version>0.0.1-SNAPSHOT</version>
<properties>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
</properties>
<dependencies>
<!-- https://mvnrepository.com/artifact/org.antlr/antlr4 -->
<dependency>
<groupId>org.antlr</groupId>
<artifactId>antlr4-runtime</artifactId>
<version>4.6</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.antlr</groupId>
<artifactId>antlr4-maven-plugin</artifactId>
<version>4.6</version>
<configuration>
<grammars>Java.g4</grammars>
<visitor>true</visitor>
<listener>true</listener>
<inputEncoding>UTF-8</inputEncoding>
<outputDirectory>${project.build.directory}/generated-sources/antlr4</outputDirectory>
</configuration>
<executions>
<execution>
<goals>
<goal>antlr4</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.6.0</version>
<configuration>
<source>${maven.compiler.source}</source>
<target>${maven.compiler.target}</target>
</configuration>
</plugin>
</plugins>
</build>
Step4: Executed the following maven goal to generate Visitor and Listener classes.
org.antlr:antlr4-maven-plugin:4.6:antlr4
With all the files in place, I was able to write my parser easily.
EuReKA!!!
I know this is an old post but I found it because I had the same problem. I re-generated my parser class while making sure the '-visitor' switch was in the command line. Dropping only the parser class into the existing project fixed the issue.
By default, ANTLR generates listeners and does not generate visitors.
If you are using the ANTLR 4 tool in Eclipse, you should open Project properties, select ANTLR 4 / Tool and check the "Generate parse tree visitors (-visitor)" option. You may need to touch (edit) the grammar file to force the ANTLR 4 tool to re-generate the sources.
If you use maven, you should explicitly enable visitor generation:
<plugin>
<groupId>org.antlr</groupId>
<artifactId>antlr4-maven-plugin</artifactId>
<version>${antlr.version}</version>
<configuration>
<listener>false</listener>
<visitor>true</visitor>
</configuration>
<executions>
<execution>
<goals>
<goal>antlr4</goal>
</goals>
</execution>
</executions>
</plugin>

Understanding the context data structure in Antlr4

I'm trying to write a code translator in Java with the help of Antlr4 and had great success with the grammar part so far. However I'm now banging my head against a wall wrapping my mind around the parse tree data structure that I need to work on after my input has been parsed.
I'm trying to use the visitor template to go over my parse tree. I'll show you an example to illustrate the points of my confusion.
My grammar:
grammar pqlc;
// Lexer
//Schlüsselwörter
EXISTS: 'exists';
REDUCE: 'reduce';
QUERY: 'query';
INT: 'int';
DOUBLE: 'double';
CONST: 'const';
STDVECTOR: 'std::vector';
STDMAP: 'std::map';
STDSET: 'std::set';
C_EXPR: 'c_expr';
INTEGER_LITERAL : (DIGIT)+ ;
fragment DIGIT: '0'..'9';
DOUBLE_LITERAL : DIGIT '.' DIGIT+;
LPAREN : '(';
RPAREN : ')';
LBRACK : '[';
RBRACK : ']';
DOT : '.';
EQUAL : '==';
LE : '<=';
GE : '>=';
GT : '>';
LT : '<';
ADD : '+';
MUL : '*';
AND : '&&';
COLON : ':';
IDENTIFIER : JavaLetter JavaLetterOrDigit*;
fragment JavaLetter : [a-zA-Z$_]; // these are the "java letters" below 0xFF
fragment JavaLetterOrDigit : [a-zA-Z0-9$_]; // these are the "java letters or digits" below 0xFF
WS
: [ \t\r\n\u000C]+ -> skip
;
COMMENT
: '/*' .*? '*/' -> skip
;
LINE_COMMENT
: '//' ~[\r\n]* -> skip
;
// Parser
//start_rule: query;
query :
quant_expr
| qexpr+
| IDENTIFIER // order IDENTIFIER and qexpr+?
| numeral
| c_expr //TODO
;
c_type : INT | DOUBLE | CONST;
bin_op: AND | ADD | MUL | EQUAL | LT | GT | LE| GE;
qexpr:
LPAREN query RPAREN bin_op_query?
// query bin_op query
| IDENTIFIER bin_op_query? // copied from query to resolve left recursion problem
| numeral bin_op_query? // ^
| quant_expr bin_op_query? // ^
|c_expr bin_op_query?
// query.find(query)
| IDENTIFIER find_query? // copied from query to resolve left recursion problem
| numeral find_query? // ^
| quant_expr find_query?
|c_expr find_query?
// query[query]
| IDENTIFIER array_query? // copied from query to resolve left recursion problem
| numeral array_query? // ^
| quant_expr array_query?
|c_expr array_query?
// | qexpr bin_op_query // bad, resolved by quexpr+ in query
;
bin_op_query: bin_op query bin_op_query?; // resolve left recursion of query bin_op query
find_query: '.''find' LPAREN query RPAREN;
array_query: LBRACK query RBRACK;
quant_expr:
quant id ':' query
| QUERY LPAREN match RPAREN ':' query
| REDUCE LPAREN IDENTIFIER RPAREN id ':' query
;
match:
STDVECTOR LBRACK id RBRACK EQUAL cm
| STDMAP '.''find' LPAREN cm RPAREN EQUAL cm
| STDSET '.''find' LPAREN cm RPAREN
;
cm:
IDENTIFIER
| numeral
| c_expr //TODO
;
quant :
EXISTS;
id :
c_type IDENTIFIER
| IDENTIFIER // Nach Seite 2 aber nicht der Übersicht. Laut übersicht id -> aber dann wäre Regel 1 ohne +
;
numeral :
INTEGER_LITERAL
| DOUBLE_LITERAL
;
c_expr:
C_EXPR
;
Now let's parse the following string:
double x: x >= c_expr
Visually I'll get this tree:
Let's say my visitor is in the visitQexpr(#NotNull pqlcParser.QexprContext ctx) routine when it hits the branch Qexpr(x bin_op_query).
My question is, how can I tell that the left children ("x" in the tree) is a terminal node, or more specifically an "IDENTIFIER"? There are no visiting rules for Terminal nodes since they aren't rules.
ctx.getChild(0) has no RuleIndex. I guess I could use that to check if I'm in a terminal or not, but that still wouldn't tell me if I was in IDENTIFIER or another kind of terminal token. I need to be able to tell the difference somehow.
I had more questions but in the time it took me to write the explanation I forgot them :<
Thanks in advance.
You can add labels to tokens and access them/check if they exist in the surrounding context:
id :
c_type labelA = IDENTIFIER
| labelB = IDENTIFIER
;
You could also do this to create different visits:
id :
c_type IDENTIFIER #idType1 //choose more appropriate names!
| IDENTIFIER #idType2
;
This will create different visitors for the two alternatives and I suppose (i.e. have not verified) that the visitor for id will not be called.
I prefer the following approach though:
id :
typeDef
| otherId
;
typeDef: c_type IDENTIFIER;
otherId : IDENTIFIER ;
This is a more heavily typed system. But you can very specifically visit nodes. Some rules of thumb I use:
Use | only when all alternatives are parser rules.
Wrap each Token in a parser rule (like otherId) to give them "more meaning".
It's ok to mix parser rules and tokens, if the tokens are not really important (like ;) and therefore not needed in the parse tree.

Regular Expressions - tree grammar Antlr Java

I'm trying to write a program in ANTLR (Java) concerning simplifying regular expression. I have already written some code (grammar file contents below)
grammar Regexp_v7;
options{
language = Java;
output = AST;
ASTLabelType = CommonTree;
backtrack = true;
}
tokens{
DOT;
REPEAT;
RANGE;
NULL;
}
fragment
ZERO
: '0'
;
fragment
DIGIT
: '1'..'9'
;
fragment
EPSILON
: '#'
;
fragment
FI
: '%'
;
ID
: EPSILON
| FI
| 'a'..'z'
| 'A'..'Z'
;
NUMBER
: ZERO
| DIGIT (ZERO | DIGIT)*
;
WHITESPACE
: ('\r' | '\n' | ' ' | '\t' ) + {$channel = HIDDEN;}
;
list
: (reg_exp ';'!)*
;
term
: ID -> ID
| '('! reg_exp ')'!
;
repeat_exp
: term ('{' range_exp '}')+ -> ^(REPEAT term (range_exp)+)
| term -> term
;
range_exp
: NUMBER ',' NUMBER -> ^(RANGE NUMBER NUMBER)
| NUMBER (',') -> ^(RANGE NUMBER NULL)
| ',' NUMBER -> ^(RANGE NULL NUMBER)
| NUMBER -> ^(RANGE NUMBER NUMBER)
;
kleene_exp
: repeat_exp ('*'^)*
;
concat_exp
: kleene_exp (kleene_exp)+ -> ^(DOT kleene_exp (kleene_exp)+)
| kleene_exp -> kleene_exp
;
reg_exp
: concat_exp ('|'^ concat_exp)*
;
My next goal is to write down tree grammar code, which is able to simplify regular expressions (e.g. a|a -> a , etc.). I have done some coding (see text below), but I have troubles with defining rule that treats nodes as subtrees (in order to simplify following kind of expressions e.g.: (a|a)|(a|a) to a, etc.)
tree grammar Regexp_v7Walker;
options{
language = Java;
tokenVocab = Regexp_v7;
ASTLabelType = CommonTree;
output=AST;
backtrack = true;
}
tokens{
NULL;
}
bottomup
: ^('*' ^('*' e=.)) -> ^('*' $e) //a** -> a*
| ^('|' i=.* j=.* {$i.tree.toStringTree() == $j.tree.toStringTree()} )
-> $i // There are 3 errors while this line is up and running:
// 1. CommonTree cannot be resolved,
// 2. i.tree cannot be resolved or is not a field,
// 3. i cannot be resolved.
;
Small driver class:
public class Regexp_Test_v7 {
public static void main(String[] args) throws RecognitionException {
CharStream stream = new ANTLRStringStream("a***;a|a;(ab)****;ab|ab;ab|aa;");
Regexp_v7Lexer lexer = new Regexp_v7Lexer(stream);
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
Regexp_v7Parser parser = new Regexp_v7Parser(tokenStream);
list_return list = parser.list();
CommonTree t = (CommonTree) list.getTree();
System.out.println("Original tree: " + t.toStringTree());
CommonTreeNodeStream nodes = new CommonTreeNodeStream(t);
Regexp_v7Walker s = new Regexp_v7Walker(nodes);
t = (CommonTree)s.downup(t);
System.out.println("Simplified tree: " + t.toStringTree());
Can anyone help me with solving this case?
Thanks in advance and regards.
Now, I'm no expert, but in your tree grammar:
add filter=true
change the second line of bottomup rule to:
^('|' i=. j=. {i.toStringTree().equals(j.toStringTree()) }? ) -> $i }
If I'm not mistaken by using i=.* you're allowing i to be non-existent and you'll get a NullPointerException on conversion to a String.
Both i and j are of type CommonTree because you've set it up this way: ASTLabelType = CommonTree, so you should call i.toStringTree().
And since it's Java and you're comparing Strings, use equals().
Also to make the expression in curly brackets a predicate, you need a question mark after the closing one.

lexer that takes "not" but not "not like"

I need a small trick to get my parser completely working.
I use antlr to parse boolean queries.
a query is composed of elements, linked together by ands, ors and nots.
So I can have something like :
"(P or not Q or R) or (( not A and B) or C)"
Thing is, an element can be long, and is generally in the form :
a an_operator b
for example :
"New-York matches NY"
Trick, one of the an_operator is "not like"
So I would like to modify my lexer so that the not checks that there is no like after it, to avoid parsing elements containing "not like" operators.
My current grammar is here :
// save it in a file called Logic.g
grammar Logic;
options {
output=AST;
}
// parser/production rules start with a lower case letter
parse
: expression EOF! // omit the EOF token
;
expression
: orexp
;
orexp
: andexp ('or'^ andexp)* // make `or` the root
;
andexp
: notexp ('and'^ notexp)* // make `and` the root
;
notexp
: 'not'^ atom // make `not` the root
| atom
;
atom
: ID
| '('! expression ')'! // omit both `(` andexp `)`
;
// lexer/terminal rules start with an upper case letter
ID : ('a'..'z' | 'A'..'Z')+;
Space : (' ' | '\t' | '\r' | '\n')+ {$channel=HIDDEN;};
Any help would be appreciated.
Thanks !
Here's a possible solution:
grammar Logic;
options {
output=AST;
}
tokens {
NOT_LIKE;
}
parse
: expression EOF!
;
expression
: orexp
;
orexp
: andexp (Or^ andexp)*
;
andexp
: fuzzyexp (And^ fuzzyexp)*
;
fuzzyexp
: (notexp -> notexp) ( Matches e=notexp -> ^(Matches $fuzzyexp $e)
| Not Like e=notexp -> ^(NOT_LIKE $fuzzyexp $e)
| Like e=notexp -> ^(Like $fuzzyexp $e)
)?
;
notexp
: Not^ atom
| atom
;
atom
: ID
| '('! expression ')'!
;
And : 'and';
Or : 'or';
Not : 'not';
Like : 'like';
Matches : 'matches';
ID : ('a'..'z' | 'A'..'Z')+;
Space : (' ' | '\t' | '\r' | '\n')+ {$channel=HIDDEN;};
which will parse the input "A not like B or C like D and (E or not F) and G matches H" into the following AST:

Categories

Resources