ANTLR4 - Returning specific rule objects


I would like to return an ExprData from a rule. ExprData is a class inside my project. When I try to compile the grammar I get:
SASGrammarParser.java:684: error: cannot find symbol
It is an import problem. How do I fix it, and how do I instantiate the ExprData?
expr returns [ExprData exprData]
: expr AND expr #AndExpr
| expr OR expr #OrExpr
| expr IN '(' constant_list ')' #InExpr
| expr (EQ | ASSIGN) expr #EqualExpr
| expr op=(MULT | DIV) expr #DivMultExpr
| expr op=(PLUS | MINUS) expr #PlusMinusExpr
| expr LTEQ expr #LessEqualExpr
| expr LT expr #LessExpr
| expr GT expr #GreaterExpr
| expr GTEQ expr #GreaterEqualExpr
| '-' expr #MinusExpr
| '(' expr ')' #SimpleExpr
| variable #VariableExp
| constant #ConstantExp
| function #FunctionExp
;

If you want to use a class in the grammar (and therefore in the generated parser), you need to import it in the grammar with
@parser::header {
import packageName.ExprData;
}
I'm not sure what you mean by how to instantiate it. exprData is the return variable here, so you can assign to it from an action by referring to it as $exprData. Just off the top of my head (note that in ANTLR 4 the alternative label has to come last, after the action):
expr OR expr {$exprData = someFunctionThatReturnsExprDataObject();} #OrExpr
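As a rough illustration of the instantiation part, here is a sketch of what a project-side ExprData class might look like. The fields (kind, text) and the static or(...) factory are assumptions made up for this example; the question does not show the real class. With such a factory, the grammar action would just call it, for example left=expr OR right=expr {$exprData = ExprData.or($left.exprData, $right.exprData);} #OrExpr, where the labels left and right are also hypothetical.

```java
// Hypothetical stand-in for the project's ExprData class. In a real project it
// would sit in the package that the grammar's header section imports.
public class ExprData {
    private final String kind; // assumed field: which operator or operand this is
    private final String text; // assumed field: a printable form of the expression

    public ExprData(String kind, String text) {
        this.kind = kind;
        this.text = text;
    }

    // Assumed factory that a grammar action could call to build the result of "a OR b".
    public static ExprData or(ExprData left, ExprData right) {
        return new ExprData("OR", "(" + left.text + " OR " + right.text + ")");
    }

    public String getKind() {
        return kind;
    }

    public String getText() {
        return text;
    }
}
```

The point is only that the action is ordinary Java: anything that evaluates to an ExprData can be assigned to $exprData.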

Related

Prevent left recursion in ANTLR 4 from matching invalid inputs

I am making a simple programming language. It has the following grammar:
program: declaration+;
declaration: varDeclaration
| statement
;
varDeclaration: 'var' IDENTIFIER ('=' expression)?';';
statement: exprStmt
| assertStmt
| printStmt
| block
;
exprStmt: expression';';
assertStmt: 'assert' expression';';
printStmt: 'print' expression';';
block: '{' declaration* '}';
//expression without left recursion
/*
expression: assignment
;
assignment: IDENTIFIER '=' assignment
| equality;
equality: comparison (op=('==' | '!=') comparison)*;
comparison: addition (op=('>' | '>=' | '<' | '<=') addition)* ;
addition: multiplication (op=('-' | '+') multiplication)* ;
multiplication: unary (op=( '/' | '*' ) unary )* ;
unary: op=( '!' | '-' ) unary
| primary
;
*/
//expression with left recursion
expression: IDENTIFIER '=' expression
| expression op=('==' | '!=') expression
| expression op=('>' | '>=' | '<' | '<=') expression
| expression op=('-' | '+') expression
| expression op=( '/' | '*' ) expression
| op=( '!' | '-' ) expression
| primary
;
primary: intLiteral
| booleanLiteral
| stringLiteral
| identifier
| group
;
intLiteral: NUMBER;
booleanLiteral: value=('True' | 'False');
stringLiteral: STRING;
identifier: IDENTIFIER;
group: '(' expression ')';
TRUE: 'True';
FALSE: 'False';
NUMBER: [0-9]+ ;
STRING: '"' ~('\n'|'"')* '"' ;
IDENTIFIER : [a-zA-Z]+ ;
This left recursive grammar is useful because it ensures every node in the parse tree has at most 2 children. For example,
var a = 1 + 2 + 3 will turn into two nested addition expressions, rather than one addition expression with three children. That behavior is useful because it makes writing an interpreter easy, since I can just do (highly simplified):
public Object visitAddition(AdditionContext ctx) {
return visit(ctx.addition(0)) + visit(ctx.addition(1));
}
instead of iterating through all the child nodes.
However, this left recursive grammar has one flaw, which is that it accepts invalid statements.
For example:
var a = 3;
var b = 4;
a = b == b = a;
is valid under this grammar, even though the expected behavior would be:
b == b is parsed first, since == has higher precedence than assignment (=).
Because b == b is parsed first, the rest of the expression becomes incoherent and parsing fails.
Instead, the following undesired behavior occurs: the final line is parsed as (a = b) == (b = a).
How can I prevent the left-recursive grammar from parsing incoherent statements such as a = b == b = a?
The non-left-recursive grammar recognizes that this input is incorrect and throws a parsing error, which is the desired behavior.
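To make the desired behavior concrete, here is a hand-written recursive-descent sketch of the commented-out, non-left-recursive rules. It is not ANTLR output: the class name StratifiedExprParser, the whitespace-separated token format, and the parenthesized string it returns are all assumptions for illustration. It also shows that an operand (op operand)* loop can be folded to the left, so every node still has exactly two children.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

public class StratifiedExprParser {
    private final List<String> tokens;
    private int pos = 0;

    private StratifiedExprParser(String source) {
        // Assumption: tokens are separated by whitespace, e.g. "a = b == b".
        tokens = Arrays.asList(source.trim().split("\\s+"));
    }

    /** Parses one expression and returns it fully parenthesized; throws on bad or leftover input. */
    public static String parse(String source) {
        StratifiedExprParser p = new StratifiedExprParser(source);
        String tree = p.assignment();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("unexpected token: " + p.peek(0));
        }
        return tree;
    }

    private String peek(int offset) {
        int i = pos + offset;
        return i < tokens.size() ? tokens.get(i) : "<EOF>";
    }

    // assignment: IDENTIFIER '=' assignment | equality
    private String assignment() {
        if (peek(0).matches("[a-zA-Z]+") && peek(1).equals("=")) {
            String name = peek(0);
            pos += 2;
            return "(" + name + " = " + assignment() + ")";
        }
        return equality();
    }

    private String equality() { return binary(this::comparison, Set.of("==", "!=")); }
    private String comparison() { return binary(this::addition, Set.of(">", ">=", "<", "<=")); }
    private String addition() { return binary(this::multiplication, Set.of("-", "+")); }
    private String multiplication() { return binary(this::unary, Set.of("/", "*")); }

    // operand (op operand)*, folded to the left so each node has two children.
    private String binary(Supplier<String> operand, Set<String> ops) {
        String left = operand.get();
        while (ops.contains(peek(0))) {
            String op = peek(0);
            pos++;
            left = "(" + left + " " + op + " " + operand.get() + ")";
        }
        return left;
    }

    // unary: ('!' | '-') unary | primary
    private String unary() {
        if (peek(0).equals("!") || peek(0).equals("-")) {
            String op = peek(0);
            pos++;
            return "(" + op + unary() + ")";
        }
        return primary();
    }

    // primary: NUMBER | IDENTIFIER | '(' expression ')'
    private String primary() {
        String t = peek(0);
        if (t.matches("[0-9]+|[a-zA-Z]+")) {
            pos++;
            return t;
        }
        if (t.equals("(")) {
            pos++;
            String inner = assignment();
            if (!peek(0).equals(")")) {
                throw new IllegalArgumentException("expected ')' but found " + peek(0));
            }
            pos++;
            return inner;
        }
        throw new IllegalArgumentException("unexpected token: " + t);
    }
}
```

With this sketch, parse("a = b == b") succeeds, while parse("a = b == b = a") throws because the trailing "= a" is left over once b == b has been consumed.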

Convert the concrete syntax of expression from ML to Java

I have the following two concrete syntaxes, for expressions and declarations, written in ML. I'm not familiar with ML, so I am wondering if anyone can help me convert them to Java. Thanks very much.
Exp::=
Ide |
"if" Exp "then" Exp "else" Exp |
"fun" "(" Ide ")" Exp |
Exp "(" Exp ")" |
"let" Decl "in" Exp |
"(" Exp ")"
Decl ::=
Ide "=" Exp |
Decl "then" Decl |
"rec" Decl |
"(" Decl ")"

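One possible reading of "convert it to Java" is to model each alternative of the two rules as its own class. The sketch below does that with Java records (Java 16 or later). All type and field names are invented for this example, and the parenthesized alternatives get no class of their own because they only group.

```java
public class MlSyntax {
    interface Exp {}
    interface Decl {}

    // Exp ::= Ide | "if" Exp "then" Exp "else" Exp | "fun" "(" Ide ")" Exp
    //       | Exp "(" Exp ")" | "let" Decl "in" Exp | "(" Exp ")"
    record Ide(String name) implements Exp {}
    record If(Exp condition, Exp thenBranch, Exp elseBranch) implements Exp {}
    record Fun(Ide parameter, Exp body) implements Exp {}
    record Apply(Exp function, Exp argument) implements Exp {}
    record Let(Decl declaration, Exp body) implements Exp {}

    // Decl ::= Ide "=" Exp | Decl "then" Decl | "rec" Decl | "(" Decl ")"
    record Bind(Ide name, Exp value) implements Decl {}
    record Then(Decl first, Decl second) implements Decl {}
    record Rec(Decl declaration) implements Decl {}

    // Builds the tree for: let id = fun (x) x in id (y)
    static Exp example() {
        Ide x = new Ide("x");
        Decl bindId = new Bind(new Ide("id"), new Fun(x, x));
        return new Let(bindId, new Apply(new Ide("id"), new Ide("y")));
    }
}
```

A parser for the concrete syntax would then produce these objects; the classes themselves say nothing about how the text is parsed.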
Choice Conflict Involving Two Expansions:

I'm trying to create my own analyser/parser.
I have a problem which I understand why it doesn't work but I'm unsure of how to solve it.
This is the code for the problem part of my parser.
void Expression() : {}{
Term() ((<PLUS> | <MINUS>) Term())*
}
void Term() : {}{
Factor()((<MULTIPLY> | <DIVIDE>) Factor())*
}
void Factor() : {}{
(<ID> | <NUMBER> | ((<PLUS> | <MINUS>)?<OPEN_PARENTHESIS> Expression() <CLOSE_PARENTHESIS>))
}
void Condition() : {}{
(
(<NOT> Condition()) |
(<OPEN_PARENTHESIS> Condition() (<AND> | <OR>) Condition() <CLOSE_PARENTHESIS>) |
(Expression() (<EQUAL_CHECK> | <NOT_EQUAL> | <LESS> | <LESS_EQUAL> | <GREATER> | <GREATER_EQUAL>) Expression())
)
}
As you can see, the problem is in the Condition() method, in the last two of the three alternatives. Expression() can eventually expand to "( Expression() )", so both the second and the third alternative can begin with an open parenthesis token.
However, I'm unsure how to solve this. I solved a similar problem earlier in the parser, but I can't apply the same logic here without it becoming extremely messy, because Expression() --> Term() --> Factor() and the problematic code sits all the way down in Factor().
Any advice would be greatly appreciated.
Thanks,
Thomas.
EDIT:
For more info, I'll provide two code examples that should work with this parser but will not, due to the bug explained above.
fun successful_method()
start
var i = 1;
if(i > 0 and i < 2)
do
i = 2;
stop
stop
start
successful_method()
stop
The above method would run successfully as it uses the second alternative of the Condition() method.
fun successful_method()
start
var i = 1;
if(i > 0)
do
i = 2;
stop
stop
start
successful_method()
stop
The above method would fail, as it requires use of the third alternative, however it cannot access this due to the '(' causing the parser to call the second alternative.

You can solve this with syntactic lookahead.
void CompOp() : {} { <EQUAL_CHECK> | <NOT_EQUAL> | <LESS> | <LESS_EQUAL> | <GREATER> | <GREATER_EQUAL> }
void Condition() : {}{
<NOT> Condition()
|
LOOKAHEAD(Expression() CompOp())
Expression()
CompOp()
Expression()
|
<OPEN_PARENTHESIS>
Condition()
(<AND> | <OR>)
Condition()
<CLOSE_PARENTHESIS>
}
It is slightly more efficient to use the lookahead only when the next token is a (.
void Condition() : {}{
<NOT> Condition()
| LOOKAHEAD( <OPEN_PARENTHESIS> )
(
LOOKAHEAD(Expression() CompOp())
Expression()
CompOp()
Expression()
|
<OPEN_PARENTHESIS>
Condition()
(<AND> | <OR>)
Condition()
<CLOSE_PARENTHESIS>
)
|
Expression()
CompOp()
Expression()
}
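For readers who want to see the mechanism, here is a hand-written Java sketch of the same idea as the second version: when the next token is (, scan ahead for Expression() CompOp() without committing, then rewind and pick the alternative. This is not JavaCC-generated code; the class name, the whitespace-separated token format, and the lowercase spellings not, and, or are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

public class ConditionLookahead {
    private static final Set<String> COMP_OPS = Set.of("==", "!=", "<", "<=", ">", ">=");
    private static final Set<String> KEYWORDS = Set.of("not", "and", "or");
    private final List<String> tokens;
    private int pos = 0;

    private ConditionLookahead(String input) {
        // Assumption: tokens are separated by whitespace, e.g. "( i + 1 ) > 0".
        tokens = Arrays.asList(input.trim().split("\\s+"));
    }

    public static boolean accepts(String input) {
        ConditionLookahead p = new ConditionLookahead(input);
        try {
            p.condition();
            return p.pos == p.tokens.size();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private void expect(Set<String> allowed) {
        if (!allowed.contains(peek())) {
            throw new IllegalStateException("unexpected token: " + peek());
        }
        pos++;
    }

    private void condition() {
        if (peek().equals("not")) {
            pos++;
            condition();
        } else if (peek().equals("(") && !comparisonAhead()) {
            // "(" Condition (and | or) Condition ")"
            expect(Set.of("("));
            condition();
            expect(Set.of("and", "or"));
            condition();
            expect(Set.of(")"));
        } else {
            // Expression CompOp Expression
            expression();
            expect(COMP_OPS);
            expression();
        }
    }

    // Speculatively scans Expression CompOp, then rewinds regardless of the outcome.
    private boolean comparisonAhead() {
        int saved = pos;
        try {
            expression();
            expect(COMP_OPS);
            return true;
        } catch (IllegalStateException e) {
            return false;
        } finally {
            pos = saved;
        }
    }

    private void expression() {
        term();
        while (Set.of("+", "-").contains(peek())) { pos++; term(); }
    }

    private void term() {
        factor();
        while (Set.of("*", "/").contains(peek())) { pos++; factor(); }
    }

    private void factor() {
        String t = peek();
        if (t.matches("[A-Za-z_]+|[0-9]+") && !KEYWORDS.contains(t)) {
            pos++;
            return;
        }
        if (t.equals("+") || t.equals("-")) pos++;
        expect(Set.of("("));
        expression();
        expect(Set.of(")"));
    }
}
```

Both "( i > 0 and i < 2 )" and "( i + 1 ) > 0" start with an open parenthesis, and the speculative scan is what tells them apart.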

Using a single grammar for all expressions and defining precedence for all operators should solve your problem, at the expense of adding semantic checks for the type of expressions.
Expr -> AndExpr (<OR> AndExpr)*
AndExpr -> NotExpr (<AND> NotExpr)*
NotExpr -> <NOT>* RelExpr
RelExpr -> NumExpr (<RELOP> NumExpr)?
NumExpr -> Term ((<PLUS>|<MINUS>) Term)*
Term -> Factor ((<MULTIPLY>|<DIVIDE>) Factor)*
Factor -> (<PLUS>|<MINUS>)* Atom
Atom -> <ID> | <NUMBER> | <OPEN_PARENTHESIS> Expr <CLOSE_PARENTHESIS>
The token <RELOP> represents your relational operators.
Note that this grammar lets you mix boolean and numerical expressions, so you should check for type errors.
For example, for Expr -> AndExpr the type returned is the type of AndExpr. But for AndExpr <OR> AndExpr you should check that both AndExprs are boolean expressions, and the type returned by Expr is then boolean.
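The semantic checks could look roughly like the sketch below. The Type enum, the method names, and the error message are assumptions for illustration; how these methods get called from the parser's actions is left open.

```java
public class ExprTypeCheck {
    enum Type { NUMBER, BOOLEAN }

    // AndExpr <OR> AndExpr and NotExpr <AND> NotExpr: both operands must be boolean.
    static Type logical(String op, Type left, Type right) {
        require(Type.BOOLEAN, left, op);
        require(Type.BOOLEAN, right, op);
        return Type.BOOLEAN;
    }

    // <NOT> RelExpr: the operand must be boolean.
    static Type not(Type operand) {
        require(Type.BOOLEAN, operand, "not");
        return Type.BOOLEAN;
    }

    // NumExpr <RELOP> NumExpr: numeric operands, boolean result.
    static Type relational(String op, Type left, Type right) {
        require(Type.NUMBER, left, op);
        require(Type.NUMBER, right, op);
        return Type.BOOLEAN;
    }

    // Term and Factor operators: numeric operands, numeric result.
    static Type arithmetic(String op, Type left, Type right) {
        require(Type.NUMBER, left, op);
        require(Type.NUMBER, right, op);
        return Type.NUMBER;
    }

    private static void require(Type expected, Type actual, String op) {
        if (actual != expected) {
            throw new IllegalArgumentException(
                "operator " + op + " expects " + expected + " but got " + actual);
        }
    }
}
```

When a rule matches only its first operand (for example Expr -> AndExpr with no <OR>), no check is needed and the operand's type is passed through unchanged.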

Antlr4 grammar with basic arithmetic and signed expressions

I'm learning Antlr4 to write a language for basic arithmetic. Currently, I have written a grammar with Antlr4 for the basic arithmetic operators * + - /.
Here is my grammar:
grammar Expr; // rename to distinguish from Expr.g4
prog: stat (';' stat)* ;
stat: ID '=' expr (';'|',')? # assign
| expr (';')? # printExpr
;
expr: op=('-'|'+') expr # signed
| expr op=('*'|'/') expr # MulDiv
| expr op=('+'|'-') expr # AddSub
| ID # id
| DOUBLE # Double
| '(' expr ')' # parens
;
MUL : '*' ; // assigns token name to '*' used above in grammar
DIV : '/' ;
ADD : '+' ;
SUB : '-' ;
ID : [a-zA-Z]+ [0-9]* ; // match identifiers
DOUBLE : [0-9]+ ('.' [0-9]+)? ;
WS : [ \t\r\n]+ -> skip ;
The problem is that my grammar accepts inputs like 2++++3 because of the rule op=('-'|'+') expr. However, I haven't found another way to implement signed expressions such as -2 + 3, x = 6; y = -x, +3 -2.
How can I fix the bug?

Try breaking up your grammar; right now expr is a bit of a monster rule. You probably don't want to sign an entire expression, but rather a single value. How about something like this:
expr: add value
| expr mult expr
| expr add expr
| value
;
value: ID
| DOUBLE
| '(' expr ')'
;
add: '+' | '-';
mult: '*' | '/';
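This way, you can build signed expressions like -2, +x or -(2+3), but stacked signs such as 2+++3 or 2++++3 are rejected. Note that 2++3 still parses, as 2 + (+3), because the second + is read as the sign of 3.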
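To check which inputs a grammar of this shape accepts without running ANTLR, here is a small hand-written recognizer for the same language: an optionally signed value, followed by any number of operator and optionally signed value pairs. The class name and the restriction to input without whitespace are assumptions of this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SignedExprRecognizer {
    // ID or DOUBLE, using the token shapes from the question's lexer rules.
    private static final Pattern ATOM =
        Pattern.compile("[a-zA-Z]+[0-9]*|[0-9]+(\\.[0-9]+)?");
    private final String src;
    private int pos = 0;

    private SignedExprRecognizer(String input) {
        src = input; // assumption: the input contains no whitespace
    }

    public static boolean accepts(String input) {
        SignedExprRecognizer r = new SignedExprRecognizer(input);
        return r.expr() && r.pos == r.src.length();
    }

    private boolean isAt(String chars) {
        return pos < src.length() && chars.indexOf(src.charAt(pos)) >= 0;
    }

    // expr: operand (('+' | '-' | '*' | '/') operand)*
    private boolean expr() {
        if (!operand()) return false;
        while (isAt("+-*/")) {
            pos++;
            if (!operand()) return false;
        }
        return true;
    }

    // operand: ('+' | '-')? value   (at most one sign per value)
    private boolean operand() {
        if (isAt("+-")) pos++;
        return value();
    }

    // value: ID | DOUBLE | '(' expr ')'
    private boolean value() {
        if (isAt("(")) {
            pos++;
            if (!expr() || !isAt(")")) return false;
            pos++;
            return true;
        }
        Matcher m = ATOM.matcher(src).region(pos, src.length());
        if (!m.lookingAt()) return false;
        pos = m.end();
        return true;
    }
}
```

Under these assumptions accepts("-2+3") and accepts("-(2+3)") are true and accepts("2++++3") is false, while accepts("2++3") is true because one sign per value is allowed.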

Issues with ANTLR rewrite statement (simple?)

I keep getting MissingTokenException, NullPointerException, and, if I remember correctly, NoViableAltException. The log file and console output from ANTLRWorks are not helpful enough for me.
What I'm after is a rewrite such as the following:
(expression | FLOAT) '(' -> (expression | FLOAT) '*('
Here below is a sample of my grammar that I snatched out to create a test file with.
grammar Test;
expression
: //FLOAT '(' -> (FLOAT '*(')+
| add EOF!
;
term
:
| '(' add ')'
| FLOAT
| IMULT
;
IMULT
: (add ('(' add)*) -> (add ('*' add)*)
;
negation
: '-'* term
;
unary
: ('+' | '-')* negation
;
mult
: unary (('*' | '/') unary)*
;
add
: mult (('+' | '-') mult)*
;
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
FLOAT
: ('0'..'9')+ '.' ('0'..'9')*// EXPONENT?
| '.' ('0'..'9')+ //EXPONENT?
| ('0'..'9')+ //EXPONENT
;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
I've also tried :
imult
: FLOAT '(' -> FLOAT '*('
;
And this:
IMULT / imult
: expression '(' -> expression '*'
;
I have also tried countless other versions (hacks) that I have lost track of.
Can anyone help me out with this?

I've run into this problem before. The basic answer is that ANTLR doesn't allow you to use tokens on the right-hand side of a '->' rewrite that weren't present on the left-hand side. However, you can use extra (imaginary) tokens defined specifically for ASTs.
Just create a tokens block before the grammar rules, as follows:
tokens { ABSTRACTTOKEN; }
You can then use them on the right-hand side of a rewrite like this:
imult
: FLOAT '(' -> ^(ABSTRACTTOKEN FLOAT)
;
Hope that helps.
