ANTLR NoViableAltException with Java

In ANTLRWorks, my grammar produces a NoViableAltException for rules like if and while, which need matching left and right brackets. However, when I run the generated parser from Java, I don't get a NoViableAltException.
loop_statement: (WHILE LPAREN expr RPAREN statement)
| (DO statement WHILE LPAREN expr RPAREN);
condition_statement
: IF LPAREN expr RPAREN statement (options {greedy=true;}: ELSE statement)?
;
Inside the statement rule I have a block rule:
statement_blocks
: (LBRACE statement* RBRACE)
;
And the statement rule is:
statement
: var_dec
| statement_blocks
| condition_statement
| loop_statement
| expr_statement
;
Before posting this, I checked some examples. I think I need to add EOF at the end of each rule. When I add EOF to those rules, I get different errors. For example:
loop_statement: ((WHILE LPAREN expr RPAREN statement)
| (DO statement WHILE LPAREN expr RPAREN)) EOF;
condition_statement
: (
(IF LPAREN expr RPAREN statement (options {greedy=true;}: ELSE statement)?
) EOF
;
This is what I get for the following inputs:
if(s==d){
d=s;
if(a=n){
s=h;
}
a=g;
}
line 6:0 missing EOF at 'a'
When I remove the left brace after the first "if":
if(s==d)
d=s;
if(a=n){
s=h;
}
a=g;
}
testcases/new file line 3:0 missing EOF at 'if',
testcases/new file line 6:0 missing EOF at 'a'
while(s==d){
d=s;
while(a=n){
s=h;
}
a=g;
}
line 6:0 missing EOF at 'a'
When I remove the left brace after the first "while":
while(s==d)
d=s;
while(a=n){
s=h;
}
a=g;
}
testcases/new file line 3:0 missing EOF at 'while'
testcases/new file line 6:0 missing EOF at 'a'

No, you need to place EOF only at the end of your "main" (entry) parser rule, not after individual statement rules. If you add it to those rules, the parser expects the end of the file right after such a statement (which is, of course, not correct).
My guess is that your entry rule does not end with EOF. That causes the parser to stop prematurely instead of throwing an error/exception when it stumbles upon invalid input.
Here's a demo (note the EOF at the end of the parse rule):
T.g
grammar T;
parse
: statement+ EOF
;
statement
: var_dec
| statement_blocks
| c=condition_statement {System.out.println("parsed :: " + $c.text);}
;
var_dec
: ID '=' ID ';'
;
statement_blocks
: LBRACE statement* RBRACE
;
condition_statement
: IF LPAREN expr RPAREN statement (options {greedy=true;}: ELSE statement)?
;
expr
: ID '==' ID
;
IF : 'if';
ELSE : 'else';
ID : 'a'..'z'+;
LBRACE : '{';
RBRACE : '}';
LPAREN : '(';
RPAREN : ')';
SPACE : (' ' | '\t' | '\r' | '\n')+ {skip();};
which can be tested with the class:
Main.java
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
TLexer lexer = new TLexer(new ANTLRFileStream("in.txt"));
TParser parser = new TParser(new CommonTokenStream(lexer));
parser.parse();
}
}
Testing it all
If you now parse the input file (in.txt):
if(s==d) {
d=s;
if(a==n){
s=h;
}
a=g;
}
there's no problem, as you can see:
java -cp antlr-3.3.jar org.antlr.Tool T.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
parsed :: if(a==n){s=h;}
parsed :: if(s==d){d=s;if(a==n){s=h;}a=g;}
And if you remove a ( or ) from the file in.txt, you will get the following (similar) error:
in.txt line 1:8 missing RPAREN at '{'
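The difference EOF makes can also be sketched outside ANTLR. The following plain-Java recursive-descent parser is a hypothetical analogue of a subset of T.g. The class EofDemo, its regex tokenizer and the requireEof flag are made up for illustration, and this is not what ANTLR generates. The statement* and statement+ loops stop as soon as the next token cannot start a statement. Leftover tokens therefore only become an error when the entry rule insists on EOF.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hand-written analogue of part of T.g; not ANTLR-generated code.
public class EofDemo {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private EofDemo(String input) {
        // Tokens: lowercase words (IF, ELSE, ID), '==', or any other single non-space character.
        Matcher m = Pattern.compile("[a-z]+|==|\\S").matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private void expect(String expected) {
        if (!peek().equals(expected)) {
            throw new IllegalArgumentException("expected '" + expected + "' but found '" + peek() + "'");
        }
        pos++;
    }

    private static boolean isId(String t) {
        return t.matches("[a-z]+") && !t.equals("if") && !t.equals("else");
    }

    private void id() {
        if (!isId(peek())) {
            throw new IllegalArgumentException("expected ID but found '" + peek() + "'");
        }
        pos++;
    }

    // Loops like statement* decide whether to continue by checking if the next token can start a statement.
    private boolean startsStatement() {
        return peek().equals("{") || peek().equals("if") || isId(peek());
    }

    private void statement() {
        if (peek().equals("{")) {            // statement_blocks : '{' statement* '}'
            expect("{");
            while (startsStatement()) {
                statement();
            }
            expect("}");
        } else if (peek().equals("if")) {    // condition_statement
            expect("if");
            expect("(");
            id();
            expect("==");                    // expr : ID '==' ID
            id();
            expect(")");
            statement();
            if (peek().equals("else")) {     // greedy: an ELSE binds to the nearest IF
                expect("else");
                statement();
            }
        } else {                             // var_dec : ID '=' ID ';'
            id();
            expect("=");
            id();
            expect(";");
        }
    }

    /** parse : statement+ EOF ; with requireEof == false, trailing tokens are silently left unread. */
    public static void parse(String input, boolean requireEof) {
        EofDemo p = new EofDemo(input);
        do {
            p.statement();
        } while (p.startsStatement());
        if (requireEof) {
            p.expect("<EOF>");
        }
    }

    public static void main(String[] args) {
        String input = "if(s==d) d=s; if(a==n){ s=h; } a=g; }"; // '{' after the first if removed
        parse(input, false);                                    // no exception: the stray '}' is never read
        try {
            parse(input, true);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());                 // expected '<EOF>' but found '}'
        }
    }
}
```

Without the EOF check, the stray closing brace is simply never looked at. This is the same silent behaviour the question describes when the entry rule lacks EOF.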

Related

Antlr3: building parse tree for qualified names

I couldn't find a question/answer that comes close to helping with my issue. Therefore, I am posting this question here.
I am trying to build a parse tree for qualified names, for example:
foo_boo.aaa.ccc1_c
Here I have dot-separated words. I am using ANTLR 3, and below is my grammar.
parse
: expr
;
list_expr : <I removed the grammar here>
SimpleType : ('a'..'z'|'A'..'Z'|'_')('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
QualifiedType : SimpleType | SimpleType ('\.' SimpleType)+;
expr : list_expr
| QualifiedType
| union_expr;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; } ;
Here, SimpleType represents the grammar for a word. My requirement is to build the grammar for QualifiedType. The grammar given above is not working as expected (QualifiedType : SimpleType | SimpleType ('\.'SimpleType)+;). How do I write a correct grammar for qualified names (dot-separated words)?
Make QualifiedType a parser rule instead of a lexer rule:
qualifiedType : SimpleType ('.' SimpleType)*;
Also, '\.' does not need an escape: '.' is OK.
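The shape of that parser rule can be illustrated outside ANTLR: one SimpleType, followed by zero or more pairs of '.' and SimpleType. The following plain-Java sketch is a hypothetical analogue, not what ANTLR generates. The class QualifiedNameDemo and its parse method are made up, and it splits the string on '.' instead of working on a token stream. It returns the individual SimpleType parts of the name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java analogue of: qualifiedType : SimpleType ('.' SimpleType)* ;
public class QualifiedNameDemo {
    // Same character classes as the SimpleType lexer rule.
    private static final String SIMPLE_TYPE = "[a-zA-Z_][a-zA-Z0-9_]*";

    /** Returns the SimpleType parts of a dot-separated name, or throws if a part is invalid. */
    public static List<String> parse(String input) {
        List<String> parts = new ArrayList<>();
        // limit -1 keeps empty segments, so "a.", ".a" and "a..b" are rejected below
        for (String part : input.split("\\.", -1)) {
            if (!part.matches(SIMPLE_TYPE)) {
                throw new IllegalArgumentException("expected SimpleType but found '" + part + "'");
            }
            parts.add(part);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("foo_boo.aaa.ccc1_c")); // [foo_boo, aaa, ccc1_c]
    }
}
```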
EDIT
You'll have to set the output to AST and apply some tree rewrite rules to make it work properly. Here's a quick demo:
grammar T;
options {
output=AST;
}
tokens {
Root;
QualifiedName;
}
parse
: qualifiedType EOF -> ^(Root qualifiedType)
;
qualifiedType
: SimpleType ('.' SimpleType)* -> ^(QualifiedName SimpleType+)
;
SimpleType
: ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')*
;
And if you now run the code:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.CommonTree;
import org.antlr.runtime.tree.DOTTreeGenerator;
import org.antlr.stringtemplate.StringTemplate;
public class Main {
public static void main(String[] args) throws Exception {
TLexer lexer = new TLexer(new ANTLRStringStream("foo_boo.aaa.ccc1_c"));
TParser parser = new TParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.parse().getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}
you'll get some DOT output. It corresponds to an AST with a Root node whose single QualifiedName child holds the three SimpleType tokens foo_boo, aaa and ccc1_c.

ANTLR4 Grammar only matching first part of parser rule

I'm using ANTLR 4 to try and parse task definitions. The task definitions look a little like the following:
task = { priority = 10; };
My grammar file then looks like the following:
grammar TaskGrammar;
/* Parser rules */
task : 'task' ASSIGNMENT_OP block EOF;
logical_entity : (TRUE | FALSE) # LogicalConst
| IDENTIFIER # LogicalVariable
;
numeric_entity : DECIMAL # NumericConst
| IDENTIFIER # NumericVariable
;
block : LBRACE (statement)* RBRACE SEMICOLON;
assignment : IDENTIFIER ASSIGNMENT_OP DECIMAL SEMICOLON
| IDENTIFIER ASSIGNMENT_OP block SEMICOLON
| IDENTIFIER ASSIGNMENT_OP QUOTED_STRING SEMICOLON
| IDENTIFIER ASSIGNMENT_OP CONSTANT SEMICOLON;
functionCall : IDENTIFIER LPAREN (parameter)*? RPAREN SEMICOLON;
parameter : DECIMAL
| QUOTED_STRING;
statement : assignment
| functionCall;
/* Lexer rules */
IF : 'if' ;
THEN : 'then';
AND : 'and' ;
OR : 'or' ;
TRUE : 'true' ;
FALSE : 'false' ;
MULT : '*' ;
DIV : '/' ;
PLUS : '+' ;
MINUS : '-' ;
GT : '>' ;
GE : '>=' ;
LT : '<' ;
LE : '<=' ;
EQ : '==' ;
ASSIGNMENT_OP : '=' ;
LPAREN : '(' ;
RPAREN : ')' ;
LBRACE : '{' ;
RBRACE : '}' ;
SEMICOLON : ';' ;
// DECIMAL, IDENTIFIER, COMMENTS, WS are set using regular expressions
DECIMAL : '-'?[0-9]+('.'[0-9]+)? ;
IDENTIFIER : [a-zA-Z_][a-zA-Z_0-9]* ;
Value: STR_EXT | QUOTED_STRING | SINGLE_QUOTED
;
STR_EXT
:
[a-zA-Z0-9_/\.,\-:=~+!?$&^*\[\]#|]+;
Comment
:
'#' ~[\r\n]*;
CONSTANT : StringCharacters;
QUOTED_STRING
:
'"' StringCharacters? '"'
;
fragment
StringCharacters
: (~["\\] | EscapeSequence)+
;
fragment
EscapeSequence
: '\\' [btnfr"'\\]?
;
SINGLE_QUOTED
:
'\'' ~['\\]* '\'';
// COMMENT and WS are stripped from the output token stream by sending
// to a different channel 'skip'
COMMENT : '//' .+? ('\n'|EOF) -> skip ;
WS : [ \r\t\u000C\n]+ -> skip ;
This grammar compiles fine in ANTLR, but when it comes to trying to use the parser, I get the following error:
line 1:0 mismatched input 'task = { priority = 10; return = AND; };' expecting 'task'
org.antlr.v4.runtime.InputMismatchException
It looks like the parser isn't recognising the block part of the definition, but I can't quite see why. The block parse rule definition should match as far as I can tell. I would expect to have a TaskContext, with a child BlockContext containing a single AssignmentContext. I get the TaskContext, but it has the above exception.
Am I missing something here? This is my first attempt at using ANTLR, so I may be getting confused between lexer and parser rules...
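Your lexer rules are too greedy. ANTLR's lexer always tries to match as many characters as possible, and the longest match wins. CONSTANT goes through StringCharacters, which matches any character except " and \ (spaces, braces and semicolons included). It therefore consumes the entire input, which is why the parser sees one big token instead of 'task'. That rule has to go, or at least be changed to consume fewer characters.
STR_EXT has the same problem on a smaller scale: it overlaps with IDENTIFIER, DECIMAL and '='. So even once CONSTANT is gone, something written without spaces, such as priority=10, would still become a single STR_EXT token. That rule has to go too.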
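The effect can be reproduced with a tiny model of how an ANTLR-style lexer picks a token. At each position every rule is tried, the longest match wins, and on a tie the rule defined first wins. This is a hypothetical plain-Java sketch, not ANTLR's actual implementation. The class MaximalMunchDemo and its simplified regex versions of a few TaskGrammar rules are assumptions, and EscapeSequence is ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of a lexer: longest match wins, ties go to the rule defined first.
public class MaximalMunchDemo {

    static List<String> lex(String input, LinkedHashMap<String, Pattern> rules) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // lookingAt() anchors the match at pos; '>' (not '>=') keeps the earlier rule on ties
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            if (!bestName.equals("WS")) {   // WS is dropped, like '-> skip'
                tokens.add(bestName);
            }
            pos = bestEnd;
        }
        return tokens;
    }

    /** A few rules from TaskGrammar, in definition order; CONSTANT is optional. */
    static LinkedHashMap<String, Pattern> taskRules(boolean includeConstant) {
        LinkedHashMap<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("'task'", Pattern.compile("task"));
        rules.put("ASSIGNMENT_OP", Pattern.compile("="));
        rules.put("LBRACE", Pattern.compile("\\{"));
        rules.put("RBRACE", Pattern.compile("\\}"));
        rules.put("SEMICOLON", Pattern.compile(";"));
        rules.put("DECIMAL", Pattern.compile("-?[0-9]+(\\.[0-9]+)?"));
        rules.put("IDENTIFIER", Pattern.compile("[a-zA-Z_][a-zA-Z_0-9]*"));
        if (includeConstant) {
            // CONSTANT : StringCharacters ; simplified to: one or more chars other than " and \
            rules.put("CONSTANT", Pattern.compile("[^\"\\\\]+"));
        }
        rules.put("WS", Pattern.compile("[ \\r\\t\\n]+"));
        return rules;
    }

    public static void main(String[] args) {
        String input = "task = { priority = 10; };";
        System.out.println(lex(input, taskRules(true)));   // [CONSTANT]
        System.out.println(lex(input, taskRules(false)));
    }
}
```

With CONSTANT in the rule set, the whole line becomes a single token, which matches the "mismatched input ... expecting 'task'" error. Without it, the same line splits into the 'task', '=', '{' ... tokens that the task rule expects.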

Visitor methods for Java grammar not working in ANTLR 4.4

I am new to the ANTLR framework and have been working on this for a week.
Now I am in a situation where I need to parse a Java file and extract data from it. I am using ANTLR 4 for parsing, and I create the lexer, parser and visitor files using the ANTLR built-in tool.
When I override a visitor method, it doesn't get called, and the visitor returns null.
Here is the code.
I have generated JavaLexer, JavaParser, JavaVisitor, JavaBaseVisitor and JavaListener.
package com.antlr;
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;
import java.io.FileInputStream;
import java.io.InputStream;
public class ExtractInterfaceVisitor {
public static class AnnVisitor extends JavaBaseVisitor<String> {
@Override
public String visitAnnotation (JavaParser.AnnotationContext ctx)
{
System.out.println("Annotation");
return ctx.getText();
}
@Override
public String visitClassDeclaration( JavaParser.ClassDeclarationContext ctx)
{
System.out.println("Class Declaration");
return ctx.getText();
}
}
public static void main(String[] args) throws Exception {
String inputFile = null;
inputFile = "C:/Users/User/Desktop/antlr/java1/Demo.java"; //Contains a Java File
InputStream is = System.in;
is = new FileInputStream(inputFile);
ANTLRInputStream input = new ANTLRInputStream(is);
JavaLexer lexer = new JavaLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
JavaParser parser = new JavaParser(tokens);
parser.setBuildParseTree(true); // tell ANTLR to build a parse tree
ParseTree tree = parser.compilationUnit(); // parse
// show tree in text form
//System.out.println(tree.toStringTree(parser));
AnnVisitor Visitor = new AnnVisitor();
String result = Visitor.visit(tree);
System.out.println("visitor result = "+result);
}
}
Demo.java
#ClassAnnotation(Value="Class")
public class Demo {
#MethodAnnotation(Value="Method")
void MethodName(int x, String y) { }
int x;
int[ ] g(/*no args*/)
{ }
int average()
{ }
List<Map<String, Integer>>[] h() { return null; }
}
Java.g4
/** Java 1.6 grammar (ANTLR v4). Derived from
http://docs.oracle.com/javase/specs/jls/se7/jls7.pdf
and JavaParser.g from ANTLR v3
*/
grammar Java;
@lexer::members {
protected boolean enumIsKeyword = true;
protected boolean assertIsKeyword = true;
}
// starting point for parsing a java file
compilationUnit
: packageDeclaration? importDeclaration* typeDeclaration*
EOF
;
packageDeclaration
: 'package' qualifiedName ';'
;
importDeclaration
: 'import' 'static'? qualifiedName ('.' '*')? ';'
;
typeDeclaration
: classOrInterfaceModifier*
( classDeclaration
| interfaceDeclaration
| enumDeclaration
)
| ';'
;
classDeclaration
: 'class' Identifier typeParameters? ('extends' type)?
('implements' typeList)?
classBody
;
enumDeclaration
: ENUM Identifier ('implements' typeList)? enumBody
;
interfaceDeclaration
: normalInterfaceDeclaration
| annotationTypeDeclaration
;
classOrInterfaceModifier
: annotation // class or interface
| 'public' // class or interface
| 'protected' // class or interface
| 'private' // class or interface
| 'abstract' // class or interface
| 'static' // class or interface
| 'final' // class only -- does not apply to interfaces
| 'strictfp' // class or interface
;
modifiers
: modifier*
;
typeParameters
: '<' typeParameter (',' typeParameter)* '>'
;
typeParameter
: Identifier ('extends' typeBound)?
;
typeBound
: type ('&' type)*
;
enumBody
: '{' enumConstants? ','? enumBodyDeclarations? '}'
;
enumConstants
: enumConstant (',' enumConstant)*
;
enumConstant
: annotations? Identifier arguments? classBody?
;
enumBodyDeclarations
: ';' (classBodyDeclaration)*
;
normalInterfaceDeclaration
: 'interface' Identifier typeParameters? ('extends' typeList)? interfaceBody
;
typeList
: type (',' type)*
;
classBody
: '{' classBodyDeclaration* '}'
;
interfaceBody
: '{' interfaceBodyDeclaration* '}'
;
classBodyDeclaration
: ';'
| 'static'? block
| modifiers member
;
member
: genericMethodDeclaration
| methodDeclaration
| fieldDeclaration
| constructorDeclaration
| interfaceDeclaration
| classDeclaration
;
methodDeclaration
: type Identifier formalParameters ('[' ']')* methodDeclarationRest
| 'void' Identifier formalParameters methodDeclarationRest
;
methodDeclarationRest
: ('throws' qualifiedNameList)?
( methodBody
| ';'
)
;
genericMethodDeclaration
: typeParameters methodDeclaration
;
fieldDeclaration
: type variableDeclarators ';'
;
constructorDeclaration
: typeParameters? Identifier formalParameters
('throws' qualifiedNameList)? constructorBody
;
interfaceBodyDeclaration
: modifiers interfaceMemberDecl
| ';'
;
interfaceMemberDecl
: interfaceMethodOrFieldDecl
| interfaceGenericMethodDecl
| 'void' Identifier voidInterfaceMethodDeclaratorRest
| interfaceDeclaration
| classDeclaration
;
interfaceMethodOrFieldDecl
: type Identifier interfaceMethodOrFieldRest
;
interfaceMethodOrFieldRest
: constantDeclaratorsRest ';'
| interfaceMethodDeclaratorRest
;
voidMethodDeclaratorRest
: formalParameters ('throws' qualifiedNameList)?
( methodBody
| ';'
)
;
interfaceMethodDeclaratorRest
: formalParameters ('[' ']')* ('throws' qualifiedNameList)? ';'
;
interfaceGenericMethodDecl
: typeParameters (type | 'void') Identifier
interfaceMethodDeclaratorRest
;
voidInterfaceMethodDeclaratorRest
: formalParameters ('throws' qualifiedNameList)? ';'
;
constantDeclarator
: Identifier constantDeclaratorRest
;
variableDeclarators
: variableDeclarator (',' variableDeclarator)*
;
variableDeclarator
: variableDeclaratorId ('=' variableInitializer)?
;
constantDeclaratorsRest
: constantDeclaratorRest (',' constantDeclarator)*
;
constantDeclaratorRest
: ('[' ']')* '=' variableInitializer
;
variableDeclaratorId
: Identifier ('[' ']')*
;
variableInitializer
: arrayInitializer
| expression
;
arrayInitializer
: '{' (variableInitializer (',' variableInitializer)* (',')? )? '}'
;
modifier
: annotation
| 'public'
| 'protected'
| 'private'
| 'static'
| 'abstract'
| 'final'
| 'native'
| 'synchronized'
| 'transient'
| 'volatile'
| 'strictfp'
;
packageOrTypeName
: qualifiedName
;
enumConstantName
: Identifier
;
typeName
: qualifiedName
;
type: classOrInterfaceType ('[' ']')*
| primitiveType ('[' ']')*
;
classOrInterfaceType
: Identifier typeArguments? ('.' Identifier typeArguments? )*
;
primitiveType
: 'boolean'
| 'char'
| 'byte'
| 'short'
| 'int'
| 'long'
| 'float'
| 'double'
;
variableModifier
: 'final'
| annotation
;
typeArguments
: '<' typeArgument (',' typeArgument)* '>'
;
typeArgument
: type
| '?' (('extends' | 'super') type)?
;
qualifiedNameList
: qualifiedName (',' qualifiedName)*
;
formalParameters
: '(' formalParameterDecls? ')'
;
formalParameterDecls
: variableModifiers type formalParameterDeclsRest
;
formalParameterDeclsRest
: variableDeclaratorId (',' formalParameterDecls)?
| '...' variableDeclaratorId
;
methodBody
: block
;
constructorBody
: '{' explicitConstructorInvocation? blockStatement* '}'
;
explicitConstructorInvocation
: nonWildcardTypeArguments? ('this' | 'super') arguments ';'
| primary '.' nonWildcardTypeArguments? 'super' arguments ';'
;
qualifiedName
: Identifier ('.' Identifier)*
;
literal
: integerLiteral
| FloatingPointLiteral
| CharacterLiteral
| StringLiteral
| booleanLiteral
| 'null'
;
integerLiteral
: HexLiteral
| OctalLiteral
| DecimalLiteral
;
booleanLiteral
: 'true'
| 'false'
;
// ANNOTATIONS
annotations
: annotation+
;
annotation
: '#' annotationName ( '(' ( elementValuePairs | elementValue )? ')' )?
;
annotationName
: Identifier ('.' Identifier)*
;
elementValuePairs
: elementValuePair (',' elementValuePair)*
;
elementValuePair
: Identifier '=' elementValue
;
elementValue
: expression
| annotation
| elementValueArrayInitializer
;
elementValueArrayInitializer
: '{' (elementValue (',' elementValue)*)? (',')? '}'
;
annotationTypeDeclaration
: '#' 'interface' Identifier annotationTypeBody
;
annotationTypeBody
: '{' (annotationTypeElementDeclaration)* '}'
;
annotationTypeElementDeclaration
: modifiers annotationTypeElementRest
;
annotationTypeElementRest
: type annotationMethodOrConstantRest ';'
| classDeclaration ';'?
| normalInterfaceDeclaration ';'?
| enumDeclaration ';'?
| annotationTypeDeclaration ';'?
;
annotationMethodOrConstantRest
: annotationMethodRest
| annotationConstantRest
;
annotationMethodRest
: Identifier '(' ')' defaultValue?
;
annotationConstantRest
: variableDeclarators
;
defaultValue
: 'default' elementValue
;
// STATEMENTS / BLOCKS
block
: '{' blockStatement* '}'
;
blockStatement
: localVariableDeclarationStatement
| classDeclaration
| interfaceDeclaration
| statement
;
localVariableDeclarationStatement
: localVariableDeclaration ';'
;
localVariableDeclaration
: variableModifiers type variableDeclarators
;
variableModifiers
: variableModifier*
;
statement
: block
| ASSERT expression (':' expression)? ';'
| 'if' parExpression statement ('else' statement)?
| 'for' '(' forControl ')' statement
| 'while' parExpression statement
| 'do' statement 'while' parExpression ';'
| 'try' block
( catches 'finally' block
| catches
| 'finally' block
)
| 'switch' parExpression switchBlock
| 'synchronized' parExpression block
| 'return' expression? ';'
| 'throw' expression ';'
| 'break' Identifier? ';'
| 'continue' Identifier? ';'
| ';'
| statementExpression ';'
| Identifier ':' statement
;
catches
: catchClause (catchClause)*
;
catchClause
: 'catch' '(' formalParameter ')' block
;
formalParameter
: variableModifiers type variableDeclaratorId
;
switchBlock
: '{' switchBlockStatementGroup* switchLabel* '}'
;
switchBlockStatementGroup
: switchLabel+ blockStatement*
;
switchLabel
: 'case' constantExpression ':'
| 'case' enumConstantName ':'
| 'default' ':'
;
forControl
: enhancedForControl
| forInit? ';' expression? ';' forUpdate?
;
forInit
: localVariableDeclaration
| expressionList
;
enhancedForControl
: variableModifiers type Identifier ':' expression
;
forUpdate
: expressionList
;
// EXPRESSIONS
parExpression
: '(' expression ')'
;
expressionList
: expression (',' expression)*
;
statementExpression
: expression
;
constantExpression
: expression
;
expression
: primary
| expression '.' Identifier
| expression '.' 'this'
| expression '.' 'super' '(' expressionList? ')'
| expression '.' 'new' Identifier '(' expressionList? ')'
| expression '.' 'super' '.' Identifier arguments?
| expression '.' explicitGenericInvocation
| expression '[' expression ']'
| expression '(' expressionList? ')'
| expression ('++' | '--')
| ('+'|'-'|'++'|'--') expression
| ('~'|'!') expression
| '(' type ')' expression
| 'new' creator
| expression ('*'|'/'|'%') expression
| expression ('+'|'-') expression
| expression ('<' '<' | '>' '>' '>' | '>' '>') expression
| expression ('<' '=' | '>' '=' | '>' | '<') expression
| expression 'instanceof' type
| expression ('==' | '!=') expression
| expression '&' expression
| expression '^' expression
| expression '|' expression
| expression '&&' expression
| expression '||' expression
| expression '?' expression ':' expression
| expression
('^='<assoc=right>
|'+='<assoc=right>
|'-='<assoc=right>
|'*='<assoc=right>
|'/='<assoc=right>
|'&='<assoc=right>
|'|='<assoc=right>
|'='<assoc=right>
|'>' '>' '='<assoc=right>
|'>' '>' '>' '='<assoc=right>
|'<' '<' '='<assoc=right>
|'%='<assoc=right>
)
expression
;
primary
: '(' expression ')'
| 'this'
| 'super'
| literal
| Identifier
| type '.' 'class'
| 'void' '.' 'class'
;
creator
: nonWildcardTypeArguments createdName classCreatorRest
| createdName (arrayCreatorRest | classCreatorRest)
;
createdName
: classOrInterfaceType
| primitiveType
;
innerCreator
: nonWildcardTypeArguments? Identifier classCreatorRest
;
explicitGenericInvocation
: nonWildcardTypeArguments Identifier arguments
;
arrayCreatorRest
: '['
( ']' ('[' ']')* arrayInitializer
| expression ']' ('[' expression ']')* ('[' ']')*
)
;
classCreatorRest
: arguments classBody?
;
nonWildcardTypeArguments
: '<' typeList '>'
;
arguments
: '(' expressionList? ')'
;
// LEXER
HexLiteral : '0' ('x'|'X') HexDigit+ IntegerTypeSuffix? ;
DecimalLiteral : ('0' | '1'..'9' '0'..'9'*) IntegerTypeSuffix? ;
OctalLiteral : '0' ('0'..'7')+ IntegerTypeSuffix? ;
fragment
HexDigit : ('0'..'9'|'a'..'f'|'A'..'F') ;
fragment
IntegerTypeSuffix : ('l'|'L') ;
FloatingPointLiteral
: ('0'..'9')+ '.' ('0'..'9')* Exponent? FloatTypeSuffix?
| '.' ('0'..'9')+ Exponent? FloatTypeSuffix?
| ('0'..'9')+ Exponent FloatTypeSuffix?
| ('0'..'9')+ FloatTypeSuffix
| ('0x' | '0X') (HexDigit )*
('.' (HexDigit)*)?
( 'p' | 'P' )
( '+' | '-' )?
( '0' .. '9' )+
FloatTypeSuffix?
;
fragment
Exponent : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
fragment
FloatTypeSuffix : ('f'|'F'|'d'|'D') ;
CharacterLiteral
: '\'' ( EscapeSequence | ~('\''|'\\') ) '\''
;
StringLiteral
: '"' ( EscapeSequence | ~('\\'|'"') )* '"'
;
fragment
EscapeSequence
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
| UnicodeEscape
| OctalEscape
;
fragment
OctalEscape
: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
| '\\' ('0'..'7') ('0'..'7')
| '\\' ('0'..'7')
;
fragment
UnicodeEscape
: '\\' 'u' HexDigit HexDigit HexDigit HexDigit
;
ENUM: 'enum' {if (!enumIsKeyword) setType(Identifier);}
;
ASSERT
: 'assert' {if (!assertIsKeyword) setType(Identifier);}
;
Identifier
: Letter (Letter|JavaIDDigit)*
;
/**I found this char range in JavaCC's grammar, but Letter and Digit overlap.
Still works, but...
*/
fragment
Letter
: '\u0024' |
'\u0041'..'\u005a' |
'\u005f' |
'\u0061'..'\u007a' |
'\u00c0'..'\u00d6' |
'\u00d8'..'\u00f6' |
'\u00f8'..'\u00ff' |
'\u0100'..'\u1fff' |
'\u3040'..'\u318f' |
'\u3300'..'\u337f' |
'\u3400'..'\u3d2d' |
'\u4e00'..'\u9fff' |
'\uf900'..'\ufaff'
;
fragment
JavaIDDigit
: '\u0030'..'\u0039' |
'\u0660'..'\u0669' |
'\u06f0'..'\u06f9' |
'\u0966'..'\u096f' |
'\u09e6'..'\u09ef' |
'\u0a66'..'\u0a6f' |
'\u0ae6'..'\u0aef' |
'\u0b66'..'\u0b6f' |
'\u0be7'..'\u0bef' |
'\u0c66'..'\u0c6f' |
'\u0ce6'..'\u0cef' |
'\u0d66'..'\u0d6f' |
'\u0e50'..'\u0e59' |
'\u0ed0'..'\u0ed9' |
'\u1040'..'\u1049'
;
COMMENT
: '/*' .*? '*/' -> channel(HIDDEN) // match anything between /* and */
;
WS : [ \r\t\u000C\n]+ -> channel(HIDDEN)
;
LINE_COMMENT
: '//' ~[\r\n]* '\r'? '\n' -> channel(HIDDEN)
;
The final output I get is: visitor result = null
I don't know where I went wrong. The visit methods don't seem to be called either. Please correct me.
The generated parse tree visitor extends AbstractParseTreeVisitor, which has two methods which would be helpful to override to get the result you are looking for.
Firstly, AbstractParseTreeVisitor#defaultResult() returns the default result for every node in the parse tree you visit. By default, it returns null.
Second, AbstractParseTreeVisitor#aggregateResult(T,T) aggregates the last node's visited result with the total result so far.
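You have not overridden either of these methods, so the default aggregateResult(T,T) simply returns the result of the last child visited. The last child of compilationUnit is the EOF token, and visiting it returns defaultResult(), i.e. null. That is why you get a null result.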
So, if you want to fix this, I would override aggregateResult to look something like this:
@Override
public String aggregateResult(String aggregate, String nextResult) {
// Ignore null results so they don't overwrite what has been collected so far
if (aggregate == null) {
return nextResult;
}
if (nextResult == null) {
return aggregate;
}
// Join the non-null results with a space
StringBuilder sb = new StringBuilder(aggregate);
sb.append(" ");
sb.append(nextResult);
return sb.toString();
}
If you don't want to do the null checks in your aggregateResult override, you could override defaultResult to return the empty String and then have aggregateResult append every result to the aggregate, but I personally prefer the first solution.
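The difference between the default and the overridden aggregateResult can be shown without ANTLR at all. The sketch below is a hypothetical model; the names AggregateDemo, FoldingVisitor, DefaultVisitor and JoiningVisitor are made up. It folds a list of child results the way visitChildren does. It ignores details of the real AbstractParseTreeVisitor, such as shouldVisitNextChild and calling accept on each child.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical simplification of how a visitor's visitChildren combines child results.
public class AggregateDemo {

    static abstract class FoldingVisitor {
        String defaultResult() {
            return null;                        // ANTLR's default result is null as well
        }

        abstract String aggregateResult(String aggregate, String nextResult);

        // Folds the children's results left to right, starting from defaultResult().
        String visitChildren(List<String> childResults) {
            String result = defaultResult();
            for (String childResult : childResults) {
                result = aggregateResult(result, childResult);
            }
            return result;
        }
    }

    /** Models the default behaviour: the latest child's result replaces everything before it. */
    static class DefaultVisitor extends FoldingVisitor {
        @Override
        String aggregateResult(String aggregate, String nextResult) {
            return nextResult;
        }
    }

    /** Models the override from the answer: join non-null results with a space. */
    static class JoiningVisitor extends FoldingVisitor {
        @Override
        String aggregateResult(String aggregate, String nextResult) {
            if (aggregate == null) {
                return nextResult;
            }
            if (nextResult == null) {
                return aggregate;
            }
            return aggregate + " " + nextResult;
        }
    }

    public static void main(String[] args) {
        // Children of compilationUnit: one type declaration, then the EOF token (whose result is null).
        List<String> children = Arrays.asList("class Demo", null);
        System.out.println(new DefaultVisitor().visitChildren(children)); // null
        System.out.println(new JoiningVisitor().visitChildren(children)); // class Demo
    }
}
```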
I also used ANTLR before, but I ended up using Eclipse JDT: it is updated for Java 8, and it's really simple to get started with its AST.
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.eclipse.jdt.core.dom.AST;
import org.eclipse.jdt.core.dom.ASTParser;
import org.eclipse.jdt.core.dom.ASTVisitor;
import org.eclipse.jdt.core.dom.CompilationUnit;
import org.eclipse.jdt.core.dom.MarkerAnnotation;
import org.eclipse.jdt.core.dom.NormalAnnotation;
import org.eclipse.jdt.core.dom.SimpleName;
import org.eclipse.jdt.core.dom.SingleMemberAnnotation;
import org.eclipse.jdt.core.dom.VariableDeclarationFragment;
/**
*
* @author Lethe
*/
public class Parser {
public Parser() {
}
public String readFile(String path, Charset encoding)
throws IOException {
byte[] encoded = Files.readAllBytes(Paths.get(path));
return new String(encoded, encoding);
}
public void parseSource() {
try {
ASTParser parser = ASTParser.newParser(AST.JLS3);
String source = readFile("test.java", Charset.defaultCharset());
//parser.setSource("public class A { int i = 9; \n int j; \n ArrayList<Integer> al = new ArrayList<Integer>();j=1000; }".toCharArray());
parser.setSource(source.toCharArray());
parser.setKind(ASTParser.K_COMPILATION_UNIT);
//ASTNode node = parser.createAST(null);
final CompilationUnit cu = (CompilationUnit) parser.createAST(null);
cu.accept(new ASTVisitor() {
Set<String> names = new HashSet<>();
public boolean visit(VariableDeclarationFragment node) {
SimpleName name = node.getName();
this.names.add(name.getIdentifier());
System.out.println("Declaration of '" + name + "' at line " + cu.getLineNumber(name.getStartPosition()));
return false; // do not continue to avoid usage info
}
public boolean visit(SimpleName node) {
if (this.names.contains(node.getIdentifier())) {
System.out.println("Usage of '" + node + "' at line " + cu.getLineNumber(node.getStartPosition()));
}
return true;
}
public boolean visit(SingleMemberAnnotation annotation) {
System.out.println(annotation.getTypeName());
System.out.println(annotation.getValue());
return true;
}
public boolean visit(NormalAnnotation annotation) {
System.out.println(annotation.getTypeName());
return true;
}
public boolean visit(MarkerAnnotation annotation) {
System.out.println(annotation.getTypeName());
return true;
}
});
} catch (IOException ex) {
Logger.getLogger(Parser.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
Hope this helps with your AST http://www.eclipse.org/jdt/apt/index.php ^_^
First, some advice:
You should check out The Definitive ANTLR 4 Reference; one week is enough time for that. In it, it says, and I quote:
"Visitors walk parse trees by explicitly calling interface ParseTreeVisitor’s visit()
method on child nodes."
The class "ExtractInterfaceVisitor" is redundant, since you only have one inner class inside it.
Try debugging the code: put breakpoints in your visit methods and see what happens. Chances are that only the visitor method for the top-level rule will be executed.
If you don't see the output "Class Declaration", I can only assume that Demo.java does not start with a class declaration.
EDIT: The first visit method that is called is the one corresponding to the parser rule that matches the first line in Demo.java.
I was facing the exact same issue on Eclipse Neon and I resolved it in the following way.
Step 1: I removed the ANTLR 4 SDK from Eclipse. This is a plugin that auto-generates Visitor and Listener classes after parsing the Java.g4 grammar file.
Step 2: I created a source folder in my project:
src/main/antlr4
Inside this I created the package
com.helper.tony.grammar
I placed my Java.g4 file inside this package. This was done to create my Visitors and Listeners with the same package structure in the output directory. Output directory location is
target/generated-sources/antlr4
as indicated by output directory tag in pom.xml.
I also added this into my build path.
Step 3: I set up the pom file properly. I added the necessary tags to my pom file and set the Maven compiler plugin's source and target versions to Java 1.8.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.helper.mani</groupId>
<artifactId>builder_factory</artifactId>
<version>0.0.1-SNAPSHOT</version>
<properties>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
</properties>
<dependencies>
<!-- https://mvnrepository.com/artifact/org.antlr/antlr4 -->
<dependency>
<groupId>org.antlr</groupId>
<artifactId>antlr4-runtime</artifactId>
<version>4.6</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.antlr</groupId>
<artifactId>antlr4-maven-plugin</artifactId>
<version>4.6</version>
<configuration>
<grammars>Java.g4</grammars>
<visitor>true</visitor>
<listener>true</listener>
<inputEncoding>UTF-8</inputEncoding>
<outputDirectory>${project.build.directory}/generated-sources/antlr4</outputDirectory>
</configuration>
<executions>
<execution>
<goals>
<goal>antlr4</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.6.0</version>
<configuration>
<source>${maven.compiler.source}</source>
<target>${maven.compiler.target}</target>
</configuration>
</plugin>
</plugins>
</build>
</project>
Step 4: I executed the following Maven goal to generate the Visitor and Listener classes:
org.antlr:antlr4-maven-plugin:4.6:antlr4
With all the files in place, I was able to write my parser easily.
EuReKA!!!
I know this is an old post but I found it because I had the same problem. I re-generated my parser class while making sure the '-visitor' switch was in the command line. Dropping only the parser class into the existing project fixed the issue.
By default, ANTLR generates listeners and does not generate visitors.
If you are using the ANTLR 4 tool in Eclipse, you should open Project properties, select ANTLR 4 / Tool and check the "Generate parse tree visitors (-visitor)" option. You may need to touch (edit) the grammar file to force the ANTLR 4 tool to re-generate the sources.
If you use maven, you should explicitly enable visitor generation:
<plugin>
<groupId>org.antlr</groupId>
<artifactId>antlr4-maven-plugin</artifactId>
<version>${antlr.version}</version>
<configuration>
<listener>false</listener>
<visitor>true</visitor>
</configuration>
<executions>
<execution>
<goals>
<goal>antlr4</goal>
</goals>
</execution>
</executions>
</plugin>

Remove extra symbol from the repetitive ANTLR rule

Consider the following simple grammar.
grammar test;
options {
language = Java;
output = AST;
}
//imaginary tokens
tokens{
}
parse
: declaration
;
declaration
: forall
;
forall
:'forall' '('rule1')' '[' (( '(' rule2 ')' '|' )* ) ']'
;
rule1
: INT
;
rule2
: ID
;
ID
: ('a'..'z' | 'A'..'Z'|'_')('a'..'z' | 'A'..'Z'|'0'..'9'|'_')*
;
INT
: ('0'..'9')+
;
WHITESPACE
: ('\t' | ' ' | '\r' | '\n' | '\u000C')+ {$channel = HIDDEN;}
;
and here is the input
forall (1) [(first) | (second) | (third) | (fourth) | (fifth) |]
The grammar works fine for the above input, but I want it to work without the extra pipe symbol (the second-to-last character of the input).
Any thoughts/ideas?
My ANTLR syntax is a bit rusty, but you should try something like this:
forall
:'forall' '('rule1')' '[' ('(' rule2 ')' ('|' '(' rule2 ')' )* )? ']'
;
That is, instead of (r '|')* write (r ('|' r)*)?. The latter allows zero, one or many rules with pipes in between, and no trailing pipe.
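To check that the (r ('|' r)*)? pattern really rejects a trailing pipe, here is a plain-Java recursive-descent sketch of the corrected forall rule. It is hypothetical: the class ForallDemo and its crude tokenizer are assumptions, not ANTLR output. The optional group becomes an if, and the ('|' r)* loop only consumes a pipe when another item must follow. An EOF check is also added, so trailing input is reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hand-written analogue of the corrected forall rule; not ANTLR-generated code.
public class ForallDemo {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private ForallDemo(String input) {
        // Tokens: runs of letters, digits and '_' (keywords, ID, INT) or any single non-space character.
        Matcher m = Pattern.compile("[A-Za-z_0-9]+|\\S").matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private void expect(String literal) {
        if (!peek().equals(literal)) {
            throw new IllegalArgumentException("expected '" + literal + "' but found '" + peek() + "'");
        }
        pos++;
    }

    private String expectMatching(String regex, String tokenName) {
        String t = peek();
        if (!t.matches(regex)) {
            throw new IllegalArgumentException("expected " + tokenName + " but found '" + t + "'");
        }
        pos++;
        return t;
    }

    // '(' rule2 ')' with rule2 : ID
    private String item() {
        expect("(");
        String id = expectMatching("[A-Za-z_][A-Za-z_0-9]*", "ID");
        expect(")");
        return id;
    }

    /** forall : 'forall' '(' INT ')' '[' ( item ('|' item)* )? ']' EOF ; returns the IDs. */
    public static List<String> parse(String input) {
        ForallDemo p = new ForallDemo(input);
        List<String> ids = new ArrayList<>();
        p.expect("forall");
        p.expect("(");
        p.expectMatching("[0-9]+", "INT");       // rule1 : INT
        p.expect(")");
        p.expect("[");
        if (p.peek().equals("(")) {              // ( ... )? : zero items when ']' follows directly
            ids.add(p.item());
            while (p.peek().equals("|")) {       // ('|' item)* : each pipe must be followed by an item
                p.expect("|");
                ids.add(p.item());
            }
        }
        p.expect("]");
        p.expect("<EOF>");
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(parse("forall (1) [(first) | (second) | (third)]")); // [first, second, third]
    }
}
```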

How can I create a simple input validator by using ANTLR?

I wrote my grammar in ANTLRWorks. It worked pretty well, and then I generated the lexer and parser.
The code executes and there's no error.
But it drives me crazy that everything is fine even with wrong input: parser.prog() executes without complaint. So where is the information that I should get as the result? I just want to check the input to find out whether or not it is a propositional logic statement.
I first used the command below to generate the code, but it failed with errors such as not being able to find the main class:
java antlr.jar org.antlr.Tool PropLogic.g
But this command worked:
java -cp antlr.jar org.antlr.Tool PropLogic.g
Here's the Grammar :
grammar PropLogic;
NOT : '!' ;
OR : '+' ;
AND : '.' ;
IMPLIES : '->' ;
SYMBOLS : ('a'..'z') | '~' ;
OP : '(' ;
CP : ')' ;
prog : formula ;
formula : NOT formula
| OP formula( AND formula CP | OR formula CP | IMPLIES formula CP)
| SYMBOLS ;
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; } ;
Here's my code:
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
public class Tableaux {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream("a b c");
PropLogicLexer lexer = new PropLogicLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
PropLogicParser parser = new PropLogicParser(tokens);
parser.prog();
}
}
Given the following test class:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream(args[0]);
PropLogicLexer lexer = new PropLogicLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
PropLogicParser parser = new PropLogicParser(tokens);
parser.prog();
}
}
which can be invoked on *nix/MacOS like this:
java -cp .:antlr-3.2.jar Main "a b c"
or on Windows
java -cp .;antlr-3.2.jar Main "a b c"
does not produce any errors, because your parser and lexer are "content" with the input. The lexer tokenizes the input into three tokens: a, b and c (spaces are ignored). And the parser rule:
prog
: formula
;
matches a single formula, which in its turn matches a SYMBOLS token. Note that although you named it SYMBOLS (plural), it only matches a single lower case letter, or tilde (~):
SYMBOLS : ('a'..'z') | '~' ;
So, in short, from the input source "a b c", only a is being parsed by your parser. You probably want your parser to consume the entire token stream, which can be done by adding the EOF (end of file) token after the entry point of your grammar:
prog
: formula EOF
;
If you run the test class again and provide "a b c" as input, the following error is produced:
line 1:2 missing EOF at 'b'
EDIT
I tested your grammar including the EOF token:
grammar PropLogic;
prog
: formula EOF
;
formula
: NOT formula
| OP formula (AND formula CP | OR formula CP | IMPLIES formula CP)
| SYMBOLS
;
NOT : '!' ;
OR : '+' ;
AND : '.' ;
IMPLIES : '->' ;
SYMBOLS : ('a'..'z') | '~' ;
OP : '(' ;
CP : ')' ;
WHITESPACE : ('\t' | ' ' | '\r' | '\n'| '\u000C')+ { $channel = HIDDEN; } ;
with the class including the ANTLRStringStream:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream("a b c");
PropLogicLexer lexer = new PropLogicLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
PropLogicParser parser = new PropLogicParser(tokens);
parser.prog();
}
}
with both ANTLR 3.2 and ANTLR 3.3:
java -cp antlr-3.2.jar org.antlr.Tool PropLogic.g
javac -cp antlr-3.2.jar *.java
java -cp .:antlr-3.2.jar Main
line 1:2 missing EOF at 'b'
java -cp antlr-3.3.jar org.antlr.Tool PropLogic.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
line 1:2 missing EOF at 'b'
And as you can see, both produce the error message:
line 1:2 missing EOF at 'b'
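If all you need is a yes/no answer to "is this a propositional logic formula?", the same idea can be sketched by hand: consume one formula, then insist on the end of the input. The following plain-Java recognizer is hypothetical; the class PropLogicValidator and its isFormula method are not part of ANTLR. It mirrors the prog and formula rules of the grammar, and it reports failure by returning false instead of printing an error.

```java
// Hypothetical hand-written recognizer mirroring prog : formula EOF ; not ANTLR-generated code.
public class PropLogicValidator {
    private final String input;
    private int pos = 0;

    private PropLogicValidator(String input) {
        this.input = input;
    }

    /** True if the entire input is exactly one formula. */
    public static boolean isFormula(String input) {
        PropLogicValidator v = new PropLogicValidator(input);
        try {
            v.formula();
            v.skipSpaces();
            return v.pos == input.length();      // the EOF check: nothing may be left over
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    private void skipSpaces() {                  // WHITESPACE is hidden from the parser
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
    }

    private boolean accept(String token) {
        skipSpaces();
        if (input.startsWith(token, pos)) {
            pos += token.length();
            return true;
        }
        return false;
    }

    // formula : NOT formula | OP formula (AND | OR | IMPLIES) formula CP | SYMBOLS ;
    private void formula() {
        if (accept("!")) {                       // NOT formula
            formula();
        } else if (accept("(")) {                // OP formula ... CP
            formula();
            if (!(accept(".") || accept("+") || accept("->"))) {
                throw new IllegalArgumentException("expected '.', '+' or '->' at index " + pos);
            }
            formula();
            if (!accept(")")) {
                throw new IllegalArgumentException("expected ')' at index " + pos);
            }
        } else {                                 // SYMBOLS : 'a'..'z' | '~' (a single character)
            skipSpaces();
            char c = pos < input.length() ? input.charAt(pos) : '\0';
            if ((c >= 'a' && c <= 'z') || c == '~') {
                pos++;
            } else {
                throw new IllegalArgumentException("expected a symbol at index " + pos);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(isFormula("a b c"));     // false
        System.out.println(isFormula("(a . !b)"));  // true
    }
}
```

With the generated ANTLR 3 parser, a comparable yes/no answer should be obtainable by calling parser.prog() on the EOF version of the grammar. You would then check whether parser.getNumberOfSyntaxErrors() is 0.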
