Recursively Processing Rules in ANTLR - java

Okay, for my third ANTLR question in two days:
My grammar is meant to parse boolean statements, something like this:
AGE > 21 AND AGE < 35
Since this is a relatively simple grammar, I embedded the code rather than using an AST. The rule looks like this:
: a=singleEvaluation { $evalResult = $a.evalResult;}
(('AND') b=singleEvaluation {$evalResult = $evalResult && $b.evalResult;})+
{
// code
}
;
Now I need to implement order of operations using parentheses, to parse something like this:
AGE >= 21 AND (DEPARTMENT=1000 OR DEPARTMENT=1001)
or even worse:
AGE >= 21 AND (DEPARTMENT=1000 OR (EMPID=1000 OR EMPID=1001))
Can anyone suggest a way to implement the recursion needed? I'd rather not switch to an AST at this late stage, and I'm still a relative noob at this.
Jason

Since some of your rules evaluate to a boolean, and others to an integer (or only compare integers), you'd best let your rules return a generic Object, and cast accordingly.
Here's a quick demo (including making a recursive call in case of parenthesized expressions):
grammar T;
@parser::members {
private java.util.Map<String, Integer> memory = new java.util.HashMap<String, Integer>();
}
parse
@init{
// initialize some test values
memory.put("AGE", 42);
memory.put("DEPARTMENT", 999);
memory.put("EMPID", 1001);
}
: expression EOF {System.out.println($text + " -> " + $expression.value);}
;
expression returns [Object value]
: logical {$value = $logical.value;}
;
logical returns [Object value]
: e1=equality {$value = $e1.value;} ( 'AND' e2=equality {$value = (Boolean)$value && (Boolean)$e2.value;}
| 'OR' e2=equality {$value = (Boolean)$value || (Boolean)$e2.value;}
)*
;
equality returns [Object value]
: r1=relational {$value = $r1.value;} ( '=' r2=relational {$value = $value.equals($r2.value);}
| '!=' r2=relational {$value = !$value.equals($r2.value);}
)*
;
relational returns [Object value]
: a1=atom {$value = $a1.value;} ( '>=' a2=atom {$value = (Integer)$a1.value >= (Integer)$a2.value;}
| '>' a2=atom {$value = (Integer)$a1.value > (Integer)$a2.value;}
| '<=' a2=atom {$value = (Integer)$a1.value <= (Integer)$a2.value;}
| '<' a2=atom {$value = (Integer)$a1.value < (Integer)$a2.value;}
)?
;
atom returns [Object value]
: INTEGER {$value = Integer.valueOf($INTEGER.text);}
| ID {$value = memory.get($ID.text);}
| '(' expression ')' {$value = $expression.value;}
;
INTEGER : '0'..'9'+;
ID : ('a'..'z' | 'A'..'Z')+;
SPACE : ' ' {$channel=HIDDEN;};
Parsing the input "AGE >= 21 AND (DEPARTMENT=1000 OR (EMPID=1000 OR EMPID=1001))" would result in the following output:
AGE >= 21 AND (DEPARTMENT=1000 OR (EMPID=1000 OR EMPID=1001)) -> true
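To make the recursion in the atom rule easier to see, here is a rough hand-written Java sketch of the same rule chain (logical, equality, relational, atom). It is not generated by ANTLR; the class name BoolExprSketch, the regex tokenizer and the evaluate helper are my own assumptions for illustration. The point is only that atom calls back into the top-level rule when it sees an opening parenthesis.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hand-written mirror of the demo grammar (not ANTLR output).
class BoolExprSketch {
    private static final Pattern TOKEN =
        Pattern.compile("\\s*(>=|<=|!=|[()=<>]|[0-9]+|[A-Za-z]+)");

    private final List<String> tokens = new ArrayList<>();
    private final Map<String, Integer> memory;
    private int pos = 0;

    BoolExprSketch(String input, Map<String, Integer> memory) {
        this.memory = memory;
        Matcher m = TOKEN.matcher(input);
        while (m.find()) tokens.add(m.group(1));
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }
    private String next() { return tokens.get(pos++); }

    // logical : equality (('AND' | 'OR') equality)*
    Object logical() {
        Object value = equality();
        while (peek().equals("AND") || peek().equals("OR")) {
            String op = next();
            Object right = equality();
            value = op.equals("AND") ? (Boolean) value && (Boolean) right
                                     : (Boolean) value || (Boolean) right;
        }
        return value;
    }

    // equality : relational (('=' | '!=') relational)*
    Object equality() {
        Object value = relational();
        while (peek().equals("=") || peek().equals("!=")) {
            String op = next();
            boolean same = value.equals(relational());
            value = op.equals("=") ? same : !same;
        }
        return value;
    }

    // relational : atom (('>=' | '>' | '<=' | '<') atom)?
    Object relational() {
        Object left = atom();
        String op = peek();
        if (op.equals(">=") || op.equals(">") || op.equals("<=") || op.equals("<")) {
            next();
            int l = (Integer) left;
            int r = (Integer) atom();
            switch (op) {
                case ">=": return l >= r;
                case ">":  return l > r;
                case "<=": return l <= r;
                default:   return l < r;
            }
        }
        return left;
    }

    // atom : INTEGER | ID | '(' expression ')'
    Object atom() {
        String t = next();
        if (t.equals("(")) {
            Object value = logical(); // recursive call for the parenthesized part
            next();                   // consume ')'
            return value;
        }
        if (Character.isDigit(t.charAt(0))) return Integer.valueOf(t);
        return memory.get(t);         // variable lookup, as in the grammar's memory map
    }

    static Object evaluate(String input, Map<String, Integer> memory) {
        return new BoolExprSketch(input, memory).logical();
    }

    public static void main(String[] args) {
        Map<String, Integer> memory = new HashMap<>();
        memory.put("AGE", 42);
        memory.put("DEPARTMENT", 999);
        memory.put("EMPID", 1001);
        System.out.println(evaluate(
            "AGE >= 21 AND (DEPARTMENT=1000 OR (EMPID=1000 OR EMPID=1001))", memory));
    }
}
```

Like the grammar, this treats AND and OR as having the same precedence and evaluates them left to right; there is no error handling for malformed input.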

I would do it like this:
program : a=logicalExpression {System.out.println($a.evalResult);}
;
logicalExpression returns [boolean evalResult] : a=andExpression { $evalResult = $a.evalResult;} (('OR') b=andExpression {$evalResult = $evalResult || $b.evalResult;})*
;
andExpression returns [boolean evalResult] : a=atomicExpression { $evalResult = $a.evalResult;} (('AND') b=atomicExpression {$evalResult = $evalResult && $b.evalResult;})*
;
atomicExpression returns [boolean evalResult] : a=singleEvaluation {$evalResult = $a.evalResult;}
| '(' b=logicalExpression ')' {$evalResult = $b.evalResult;}
;
singleEvaluation returns [boolean evalResult ] : 'TRUE' {$evalResult = true;}
| 'FALSE' {$evalResult = false;}
;
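The layering in these rules is what gives AND a higher precedence than OR: each level only calls the level below it, and the parenthesized alternative jumps back to the top. A small plain-Java sketch of the same three levels (the class name PrecedenceSketch and the whitespace tokenizer are assumptions of mine, not part of the grammar) shows the effect:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical mirror of logicalExpression -> andExpression -> atomicExpression.
class PrecedenceSketch {
    private final List<String> tokens;
    private int pos = 0;

    PrecedenceSketch(String input) {
        // Pad parentheses with spaces so a plain whitespace split can tokenize.
        this.tokens = Arrays.asList(
            input.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+"));
    }

    private boolean accept(String expected) {
        if (pos < tokens.size() && tokens.get(pos).equals(expected)) {
            pos++;
            return true;
        }
        return false;
    }

    // logicalExpression : andExpression ('OR' andExpression)*
    boolean logicalExpression() {
        boolean result = andExpression();
        while (accept("OR")) {
            boolean right = andExpression(); // parse first, then combine
            result = result || right;
        }
        return result;
    }

    // andExpression : atomicExpression ('AND' atomicExpression)*
    boolean andExpression() {
        boolean result = atomicExpression();
        while (accept("AND")) {
            boolean right = atomicExpression();
            result = result && right;
        }
        return result;
    }

    // atomicExpression : 'TRUE' | 'FALSE' | '(' logicalExpression ')'
    boolean atomicExpression() {
        if (accept("(")) {
            boolean inner = logicalExpression(); // recursion for nested parentheses
            if (!accept(")")) throw new IllegalArgumentException("missing ')'");
            return inner;
        }
        if (accept("TRUE")) return true;
        if (accept("FALSE")) return false;
        throw new IllegalArgumentException("unexpected token at position " + pos);
    }

    static boolean evaluate(String input) {
        return new PrecedenceSketch(input).logicalExpression();
    }
}
```

With this layering, TRUE OR FALSE AND FALSE groups as TRUE OR (FALSE AND FALSE), while (TRUE OR FALSE) AND FALSE forces the other grouping.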

Related

Integrate OOPath Syntax in a drools like rule, using Xtext

I am trying to implement OOPath syntax in a Drools-like rule, but I have issues with OOPath expressions that are not bound to a variable. For example, here is what I am trying to write in the when block:
when $rt : string(Variables
$yr1t : /path1/F
$yr2t : /path2/F
$yr3t : path3.path4/PATH5
$yr4t : path3
$yr5t : /path3
Conditions
$yr4t == $yr5t + 3
$yr3t != $yr2t
//FROM HERE IS THE PROBLEM:
$yr3t == p/path/f
$yr3t == /g/t
/path2/F[g==$yr1t]
)
The problem I am facing is that my grammar doesn't support this format and I don't know how to modify the existing one in order to support even the last 3 statements.
Here is what I've tried so far:
Model:
declarations+=Declaration*
;
Temp:
elementType=ElementType
;
ElementType:
typeName=('string'|'int'|'boolean');
Declaration:
Rule
;
@Override
terminal ID: ('^'|'$')('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
terminal PATH: ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
Rule:
'Filter'
'rule' ruleDescription=STRING
'#specification' QualifiedNameSpecification
'ruleflow-group' ruleflowDescription=STRING
'when' (name += ID ':' atribute += Temp '(' 'Variables'?
//(varName += ID ':' QualifiedNameVariabilePath)*
(variablesList += Variable)*
'Conditions'?
(exp += EvalExpression)*
')'
)*
;
QualifiedNameSpecification: '(' STRING ')';
QualifiedNameVariabilePath: (('/'|'.')? PATH)* ;
ExpressionsModel:
elements += AbstractElement*;
AbstractElement:
Variable | EvalExpression ;
Variable:
//'var'
name=ID ':' QualifiedNameVariabilePath; //expression=Expression;
EvalExpression:
//'eval'
expression=Expression;
Expression: Or;
Or returns Expression:
And ({Or.left=current} "||" right=And)*
;
And returns Expression:
Equality ({And.left=current} "&&" right=Equality)*
;
Equality returns Expression:
Comparison (
{Equality.left=current} op=("=="|"!=")
right=Comparison
)*
;
Comparison returns Expression:
PlusOrMinus (
{Comparison.left=current} op=(">="|"<="|">"|"<")
right=PlusOrMinus
)*
;
PlusOrMinus returns Expression:
MulOrDiv (
({Plus.left=current} '+' | {Minus.left=current} '-')
right=MulOrDiv
)*
;
MulOrDiv returns Expression:
Primary (
{MulOrDiv.left=current} op=('*'|'/')
right=Primary
)*
;
Primary returns Expression:
'(' Expression ')' |
{Not} "!" expression=Primary |
Atomic
;
Atomic returns Expression:
{IntConstant} value=INT |
{StringConstant} value=STRING |
{BoolConstant} value=('true'|'false') |
{VariableRef} variable=[Variable]
;
EDIT: To make it clearer, the question is how to modify my grammar so that it supports OOPath expressions that are not bound to a variable, something like a temporary object.

Indentation management in ANTLR4 for a python interpreter

I'm implementing a Python interpreter using ANTLR4 as the lexer and parser generator. I used the grammar defined at this link:
https://github.com/antlr/grammars-v4/blob/master/python3/Python3.g4.
However, the implementation of indentation with the INDENT and DEDENT tokens in @lexer::members does not work when I define a compound statement.
For example, if I define the following statement:
x=10
while x>2 :
    print("hello")
        x=x-3
So on the line where I reassign the variable x, I should get an indentation error, which I do not get in my current state.
Should I change something in the lexer code, or somewhere else?
This is the grammar I'm using, with the @lexer::members block and the NEWLINE rule taken from the link above.
grammar python;
tokens { INDENT, DEDENT }
@lexer::members {
// A queue where extra tokens are pushed on (see the NEWLINE lexer rule).
private java.util.LinkedList<Token> tokens = new java.util.LinkedList<>();
// The stack that keeps track of the indentation level.
private java.util.Stack<Integer> indents = new java.util.Stack<>();
// The amount of opened braces, brackets and parenthesis.
private int opened = 0;
// The most recently produced token.
private Token lastToken = null;
@Override
public void emit(Token t) {
super.setToken(t);
tokens.offer(t);
}
@Override
public Token nextToken() {
// Check if the end-of-file is ahead and there are still some DEDENTS expected.
if (_input.LA(1) == EOF && !this.indents.isEmpty()) {
// Remove any trailing EOF tokens from our buffer.
for (int i = tokens.size() - 1; i >= 0; i--) {
if (tokens.get(i).getType() == EOF) {
tokens.remove(i);
}
}
// First emit an extra line break that serves as the end of the statement.
this.emit(commonToken(pythonParser.NEWLINE, "\n"));
// Now emit as much DEDENT tokens as needed.
while (!indents.isEmpty()) {
this.emit(createDedent());
indents.pop();
}
// Put the EOF back on the token stream.
this.emit(commonToken(pythonParser.EOF, "<EOF>"));
//throw new Exception("indentazione inaspettata in riga "+this.getLine());
}
Token next = super.nextToken();
if (next.getChannel() == Token.DEFAULT_CHANNEL) {
// Keep track of the last token on the default channel.
this.lastToken = next;
}
return tokens.isEmpty() ? next : tokens.poll();
}
private Token createDedent() {
CommonToken dedent = commonToken(pythonParser.DEDENT, "");
dedent.setLine(this.lastToken.getLine());
return dedent;
}
private CommonToken commonToken(int type, String text) {
int stop = this.getCharIndex() - 1;
int start = text.isEmpty() ? stop : stop - text.length() + 1;
return new CommonToken(this._tokenFactorySourcePair, type, DEFAULT_TOKEN_CHANNEL, start, stop);
}
// Calculates the indentation of the provided spaces, taking the
// following rules into account:
//
// "Tabs are replaced (from left to right) by one to eight spaces
// such that the total number of characters up to and including
// the replacement is a multiple of eight [...]"
//
// -- https://docs.python.org/3.1/reference/lexical_analysis.html#indentation
static int getIndentationCount(String spaces) {
int count = 0;
for (char ch : spaces.toCharArray()) {
switch (ch) {
case '\t':
count += 8 - (count % 8);
break;
default:
// A normal space char.
count++;
}
}
return count;
}
boolean atStartOfInput() {
return super.getCharPositionInLine() == 0 && super.getLine() == 1;
}
}
parse
:( NEWLINE parse
| block ) EOF
;
block
: (statement NEWLINE?| functionDecl)*
;
statement
: assignment
| functionCall
| ifStatement
| forStatement
| whileStatement
| arithmetic_expression
;
assignment
: IDENTIFIER indexes? '=' expression
;
functionCall
: IDENTIFIER OPAREN exprList? CPAREN #identifierFunctionCall
| PRINT OPAREN? exprList? CPAREN? #printFunctionCall
;
arithmetic_expression
: expression
;
ifStatement
: ifStat elifStat* elseStat?
;
ifStat
: IF expression COLON NEWLINE INDENT block DEDENT
;
elifStat
: ELIF expression COLON NEWLINE INDENT block DEDENT
;
elseStat
: ELSE COLON NEWLINE INDENT block DEDENT
;
functionDecl
: DEF IDENTIFIER OPAREN idList? CPAREN COLON NEWLINE INDENT block DEDENT
;
forStatement
: FOR IDENTIFIER IN expression COLON NEWLINE INDENT block DEDENT elseStat?
;
whileStatement
: WHILE expression COLON NEWLINE INDENT block DEDENT elseStat?
;
idList
: IDENTIFIER (',' IDENTIFIER)*
;
exprList
: expression (COMMA expression)*
;
expression
: '-' expression #unaryMinusExpression
| '!' expression #notExpression
| expression '**' expression #powerExpression
| expression '*' expression #multiplyExpression
| expression '/' expression #divideExpression
| expression '%' expression #modulusExpression
| expression '+' expression #addExpression
| expression '-' expression #subtractExpression
| expression '>=' expression #gtEqExpression
| expression '<=' expression #ltEqExpression
| expression '>' expression #gtExpression
| expression '<' expression #ltExpression
| expression '==' expression #eqExpression
| expression '!=' expression #notEqExpression
| expression '&&' expression #andExpression
| expression '||' expression #orExpression
| expression '?' expression ':' expression #ternaryExpression
| expression IN expression #inExpression
| NUMBER #numberExpression
| BOOL #boolExpression
| NULL #nullExpression
| functionCall indexes? #functionCallExpression
| list indexes? #listExpression
| IDENTIFIER indexes? #identifierExpression
| STRING indexes? #stringExpression
| '(' expression ')' indexes? #expressionExpression
| INPUT '(' STRING? ')' #inputExpression
;
list
: '[' exprList? ']'
;
indexes
: ('[' expression ']')+
;
PRINT : 'print';
INPUT : 'input';
DEF : 'def';
IF : 'if';
ELSE : 'else';
ELIF : 'elif';
RETURN : 'return';
FOR : 'for';
WHILE : 'while';
IN : 'in';
NULL : 'null';
OR : '||';
AND : '&&';
EQUALS : '==';
NEQUALS : '!=';
GTEQUALS : '>=';
LTEQUALS : '<=';
POW : '**';
EXCL : '!';
GT : '>';
LT : '<';
ADD : '+';
SUBTRACT : '-';
MULTIPLY : '*';
DIVIDE : '/';
MODULE : '%';
OBRACE : '{' {opened++;};
CBRACE : '}' {opened--;};
OBRACKET : '[' {opened++;};
CBRACKET : ']' {opened--;};
OPAREN : '(' {opened++;};
CPAREN : ')' {opened--;};
SCOLON : ';';
ASSIGN : '=';
COMMA : ',';
QMARK : '?';
COLON : ':';
BOOL
: 'true'
| 'false'
;
NUMBER
: INT ('.' DIGIT*)?
;
IDENTIFIER
: [a-zA-Z_] [a-zA-Z_0-9]*
;
STRING
: ["] (~["\r\n] | '\\\\' | '\\"')* ["]
| ['] (~['\r\n] | '\\\\' | '\\\'')* [']
;
SKIPS
: ( SPACES | COMMENT | LINE_JOINING ){firstLine();} -> skip
;
NEWLINE
: ( {atStartOfInput()}? SPACES
| ( '\r'? '\n' | '\r' | '\f' ) SPACES?
)
{
String newLine = getText().replaceAll("[^\r\n\f]+", "");
String spaces = getText().replaceAll("[\r\n\f]+", "");
int next = _input.LA(1);
if (opened > 0 || next == '\r' || next == '\n' || next == '\f' || next == '#') {
// If we're inside a list or on a blank line, ignore all indents,
// dedents and line breaks.
skip();
}
else {
emit(commonToken(NEWLINE, newLine));
int indent = getIndentationCount(spaces);
int previous = indents.isEmpty() ? 0 : indents.peek();
if (indent == previous) {
// skip indents of the same size as the present indent-size
skip();
}
else if (indent > previous) {
indents.push(indent);
emit(commonToken(pythonParser.INDENT, spaces));
}
else {
// Possibly emit more than 1 DEDENT token.
while(!indents.isEmpty() && indents.peek() > indent) {
this.emit(createDedent());
indents.pop();
}
}
}
}
;
fragment INT
: [1-9] DIGIT*
| '0'
;
fragment DIGIT
: [0-9]
;
fragment SPACES
: [ \t]+
;
fragment COMMENT
: '#' ~[\r\n\f]*
;
fragment LINE_JOINING
: '\\' SPACES? ( '\r'? '\n' | '\r' | '\f' )
;
No, this should not be handled in the grammar. The lexer should simply emit the (faulty) INDENT token. The parser should, at runtime, produce an error. Something like this:
String source = "x=10\n" +
"while x>2 :\n" +
"    print(\"hello\")\n" +
"        x=x-3\n";
Python3Lexer lexer = new Python3Lexer(CharStreams.fromString(source));
Python3Parser parser = new Python3Parser(new CommonTokenStream(lexer));
// Remove default error-handling
parser.removeErrorListeners();
// Add custom error-handling
parser.addErrorListener(new BaseErrorListener() {
@Override
public void syntaxError(Recognizer<?, ?> recognizer, Object o, int i, int i1, String s, RecognitionException e) {
CommonToken token = (CommonToken) o;
if (token.getType() == Python3Parser.INDENT) {
// The parser encountered an unexpected INDENT token
// TODO throw your exception
}
// TODO handle other errors
}
});
// Trigger the error
parser.file_input();
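To see why the lexer can simply emit the extra INDENT and leave the complaint to the parser, here is a stripped-down plain-Java sketch of the indent-stack logic from the NEWLINE rule. It is a simplification under my own assumptions (the class name IndentSketch, string tokens instead of real Token objects, and a source where the last line is indented deeper than the print line), not the lexer from the grammar:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model of the INDENT/DEDENT bookkeeping.
class IndentSketch {
    // Same tab rule as getIndentationCount: a tab advances to the next multiple of 8.
    static int indentationOf(String line) {
        int count = 0;
        for (char ch : line.toCharArray()) {
            if (ch == '\t') count += 8 - (count % 8);
            else if (ch == ' ') count++;
            else break; // first non-whitespace character ends the indentation
        }
        return count;
    }

    static List<String> tokenize(String source) {
        List<String> out = new ArrayList<>();
        Deque<Integer> indents = new ArrayDeque<>();
        for (String line : source.split("\n")) {
            if (line.trim().isEmpty()) continue; // blank lines do not change indentation
            int indent = indentationOf(line);
            int previous = indents.isEmpty() ? 0 : indents.peek();
            if (indent > previous) {
                // Deeper than before: always an INDENT, whether or not it is legal here.
                indents.push(indent);
                out.add("INDENT");
            } else {
                // Possibly more than one DEDENT.
                while (!indents.isEmpty() && indents.peek() > indent) {
                    indents.pop();
                    out.add("DEDENT");
                }
            }
            out.add("LINE(" + line.trim() + ")");
            out.add("NEWLINE");
        }
        // Close any blocks still open at end of input.
        while (!indents.isEmpty()) {
            indents.pop();
            out.add("DEDENT");
        }
        return out;
    }

    public static void main(String[] args) {
        String source = "x=10\n"
            + "while x>2 :\n"
            + "    print(\"hello\")\n"
            + "        x=x-3\n";
        System.out.println(tokenize(source));
    }
}
```

The second INDENT lands directly in front of the x=x-3 line. Nothing in the block rule expects an INDENT at that point, so that is the token the error listener would be called with.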

ANTLR4 doesn't manage with left recursion properly

I'm trying to describe the grammar for logical expressions using ANTLR4.
The grammar has direct left recursion, which, as far as I have read, ANTLR4 supports.
grammar Logic;
@header {
package parser;
import expression.*;
}
expression returns [Expression value] : disjunction {$value = $disjunction.value;}
| disjunction IMPLIES expression {$value = new Implication($disjunction.value, $expression.value);};
disjunction returns [Expression value] : conjunction {$value = $conjunction.value;}
| disjunction OR conjunction {$value = new Disjunction($disjunction.value, $conjunction.value);};
conjunction returns [Expression value] : negation {$value = $negation.value;}
| conjunction AND negation {$value = new Conjunction($conjunction.value, $negation.value);};
negation returns [Expression value] : variable {$value = $variable.value;}
| NOT negation {$value = new Negation($negation.value);}
| OB expression CB {$value = $expression.value;};
variable returns [Expression value] : VAR {$value = new Variable($VAR.text);};
IMPLIES : '->';
OR : '|';
AND : '&';
NOT : '!';
OB : '(';
CB : ')';
VAR : [A-Z]([0-9])*;
But when I run ANTLR to generate a parser for my grammar, it reports some strange errors:
error(65): Logic.g4:5:125: unknown attribute value for rule disjunction in $disjunction.value
error(65): Logic.g4:5:125: unknown attribute value for rule conjunction in $conjunction.value
When I swap disjunction and conjunction in disjunction rule, making disjunction right-associative, it works, but this can cause some mistakes in my work.
In case it matters, I generate the parser using the ANTLR4 plugin for IntelliJ IDEA.
What am I doing wrong?
Thank you in advance.
Look at how your code refers to disjunction:
disjunction returns [Expression value]
: conjunction {$value = $conjunction.value;}
| disjunction OR conjunction {$value = new Disjunction($disjunction.value, $conjunction.value);}
;
Inside the rule disjunction, $disjunction refers to the enclosing rule itself, not to the disjunction matched in the alternative disjunction OR ....
To refer to the disjunction matched in disjunction OR ..., label it and use the label:
disjunction returns [Expression value]
: c1=conjunction {$value = $c1.value;}
| d=disjunction OR c2=conjunction {$value = new Disjunction($d.value, $c2.value);}
;
Here's a full working (tested) example:
grammar Logic;
@header {
import expression.*;
}
expression returns [Expression value]
: d1=disjunction {$value = $d1.value;}
| d2=disjunction IMPLIES e=expression {$value = new Implication($d2.value, $e.value);}
;
disjunction returns [Expression value]
: c1=conjunction {$value = $c1.value;}
| d=disjunction OR c2=conjunction {$value = new Disjunction($d.value, $c2.value);}
;
conjunction returns [Expression value]
: n1=negation {$value = $n1.value;}
| c=conjunction AND n2=negation {$value = new Conjunction($c.value, $n2.value);}
;
negation returns [Expression value]
: variable {$value = $variable.value;}
| NOT n=negation {$value = new Negation($n.value);}
| OB expression CB {$value = $expression.value;}
;
variable returns [Expression value]
: VAR {$value = new Variable($VAR.text);}
;
IMPLIES : '->';
OR : '|';
AND : '&';
NOT : '!';
OB : '(';
CB : ')';
VAR : [A-Z]([0-9])*;
SPACE : [ \t\r\n] -> skip;
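On the associativity worry from the question: with the labeled left-recursive alternative, A | B | C nests to the left, while the right-recursive IMPLIES alternative nests to the right. A tiny plain-Java sketch of the two nesting shapes (the class and method names are made up for illustration, and strings stand in for the Expression objects):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the tree shapes; strings replace Expression nodes.
class AssociativitySketch {
    // Left fold: the shape built by  d=disjunction OR c2=conjunction
    static String leftAssoc(List<String> operands, String op) {
        String tree = operands.get(0);
        for (int i = 1; i < operands.size(); i++) {
            tree = "(" + tree + op + operands.get(i) + ")";
        }
        return tree;
    }

    // Right fold: the shape built by  d2=disjunction IMPLIES e=expression
    static String rightAssoc(List<String> operands, String op) {
        int last = operands.size() - 1;
        String tree = operands.get(last);
        for (int i = last - 1; i >= 0; i--) {
            tree = "(" + operands.get(i) + op + tree + ")";
        }
        return tree;
    }

    public static void main(String[] args) {
        List<String> vars = Arrays.asList("A", "B", "C");
        System.out.println(leftAssoc(vars, "|"));   // ((A|B)|C)
        System.out.println(rightAssoc(vars, "->")); // (A->(B->C))
    }
}
```

So keeping disjunction and conjunction left-recursive gives the left-associative trees you wanted, and only implication ends up right-associative.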

ANTLR: minus expression precedence and different results with Grun

I have a grammar like this:
/* entry point */
parse: expr EOF;
expr
: value # argumentArithmeticExpr
| l=expr operator=(MULT|DIV) r=expr # multdivArithmeticExpr
| l=expr operator=(PLUS|MINUS) r=expr # addsubtArithmeticExpr
| operator=('-'|'+') r=expr # minusPlusArithmeticExpr
| IDENTIFIER '(' (expr ( COMMA expr )* ) ? ')'# functionExpr
| LPAREN expr RPAREN # parensArithmeticExpr
;
value
: number
| variable
| string // contains date
| bool
| null_value
;
/* Atomes */
bool
: BOOL
;
variable
: VARIABLE
;
string
: STRING_LITERAL
;
number
: ('+'|'-')? NUMERIC_LITERAL
;
null_value
: NULL // TODO: test this
;
IDENTIFIER
: ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
;
NUMERIC_LITERAL
: DIGIT+ ( '.' DIGIT* )? ( E [-+]? DIGIT+ )? // ex: 0.05e3
| '.' DIGIT+ ( E [-+]? DIGIT+ )? // ex: .05e3
;
INT: DIGIT+;
STRING_LITERAL
: '\'' ( ~'\'' | '\'\'' )* '\''
| '"' ( ~'"' | '""' )* '"'
;
VARIABLE
: LBRACKET ( ~']' | ' ')* RBRACKET
;
Now, I want to parse this:
-1.3 * 5 + -2 * 7
With Grun, I get this:
antlr4 formula.g4 && javac *.java && time grun formula parse -gui
-1.3*5 + -2*7
^D
Which looks OK and I would be happy with that.
But in my Java code, I get called like this using the Visitor pattern:
visitMinusPlusArithmeticExpr -1.3*5+-2*7 // ugh ?? sees "- (1.3 * 5 + - 2 * 7 )" instead of "(-1.3*5) + (-2*7)"
visitAddsubtArithmeticExpr 1.3*5+-2*7
visitMultdivArithmeticExpr 1.3*5
visitArgumentArithmeticExpr 1.3
visitNumber 1.3
visitArgumentArithmeticExpr 5
visitValue 5
visitNumber 5
visitMinusPlusArithmeticExpr -2*7 // UHG? should see a MultDiv with -2 and 7
visitMultdivArithmeticExpr 2*7
visitArgumentArithmeticExpr 2
visitValue 2
visitNumber 2
visitArgumentArithmeticExpr 7
visitValue 7
visitNumber 7
Which means that I don't get my negative number (-1.3), but rather the 'minus expression', which I should not get.
Why is my Java result different from Grun ? I have verified that the grammar is recompiled and I use my parser like this:
formulaLexer lexer = new formulaLexer(new ANTLRInputStream(s));
formulaParser parser = new formulaParser(new CommonTokenStream(lexer));
parser.getInterpreter().setPredictionMode(PredictionMode.SLL);
parser.setErrorHandler(new BailErrorStrategy()); // will throw exceptions on failure
formula = tryParse(parser);
if( formula == null && errors.isEmpty() ){
// the parsing failed, retry in LL mode
parser.getInterpreter().setPredictionMode(PredictionMode.LL);
parser.reset();
tryParse(parser);
}
I have disabled the SLL mode to verify if this was not the problem, and the result was the same.
I thought this could be a problem of precedence, but in my expr I have specified to match a value first, and then only a minusPlusArithmeticExpr.
I can't understand how I will detect this 'minus' expression instead of my 'negative value'. Can you check this?
Also, why does Grun show the correct behavior and not my Java code?
EDIT
Following the advice in the comments, I modified the grammar to look like this:
expr
: value # argumentArithmeticExpr
| (PLUS|MINUS) expr # plusMinusExpr
| l=expr operator=(MULT|DIV) r=expr # multdivArithmeticExpr
| l=expr operator=(PLUS|MINUS) r=expr # addsubtArithmeticExpr
| function=IDENTIFIER '(' (expr ( COMMA expr )* ) ? ')'# functionExpr
| '(' expr ')' # parensArithmeticExpr
;
But now, I want to optimize the case where I have a single "-1.3" somewhere.
I don't know how to do it correctly, since when I land in the visitor method for the unary plus/minus expression, I have to check whether the terminal node is a number.
Here is what I get while debugging:
ctx = {formulaParser$PlusMinusExprContext@929} "[16]"
children = {ArrayList@955} size = 2
0 = {TerminalNodeImpl@962} "-"
1 = {formulaParser$ArgumentArithmeticExprContext@963} "[21 16]"
children = {ArrayList@967} size = 1
0 = {formulaParser$ValueContext@990} "[22 21 16]"
children = {ArrayList@992} size = 1
0 = {formulaParser$NumberContext@997} "[53 22 21 16]"
children = {ArrayList@999} size = 1
0 = {TerminalNodeImpl@1004} "1.3"
I suspect I should walk down the tree and tell if the terminal node is a number, but it seems cumbersome. Do you have any idea on how to do that without compromising legibility of my code?
Ok, for those interested, Lucas and Bart got the answer, and my implementation is like this:
expr
: value # argumentArithmeticExpr
| (PLUS|MINUS) expr # plusMinusExpr
| l=expr operator=(MULT|DIV) r=expr # multdivArithmeticExpr
| l=expr operator=(PLUS|MINUS) r=expr # addsubtArithmeticExpr
| function=IDENTIFIER '(' (expr ( COMMA expr )* ) ? ')'# functionExpr
| '(' expr ')' # parensArithmeticExpr
;
And in the visitor of plusMinusExpr:
@Override
public Formula visitPlusMinusExpr(formulaParser.PlusMinusExprContext ctx) {
if( debug ) LOG.log(Level.INFO, "visitPlusMinusExpr " + ctx.getText());
Formula formulaExpr = visit(ctx.expr());
if( ctx.MINUS() == null ) return formulaExpr;
else {
if(formulaExpr instanceof DoubleFormula){
// optimization for numeric values: we don't return "(0.0 MINUS THEVALUE)" but directly "-THEVALUE"
Double v = - ((DoubleFormula) formulaExpr).getValue();
return new DoubleFormula( v );
} else {
return ArithmeticOperator.MINUS( 0, formulaExpr);
}
}
}
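For anyone who wants to try the folding idea outside of the visitor, here is a minimal self-contained sketch. The types FormulaNode, Num and Neg are hypothetical stand-ins for my Formula, DoubleFormula and the "0 MINUS value" node; only the decision logic matches the visitor above.

```java
// Hypothetical stand-ins for the real formula classes; only the folding logic matters.
interface FormulaNode { }

final class Num implements FormulaNode {
    final double value;
    Num(double value) { this.value = value; }
}

final class Neg implements FormulaNode {
    final FormulaNode operand;
    Neg(FormulaNode operand) { this.operand = operand; }
}

class UnaryMinusFolding {
    // Mirrors the visitor: the operand is visited first, then the sign is applied.
    static FormulaNode applySign(boolean isMinus, FormulaNode operand) {
        if (!isMinus) return operand;               // unary plus changes nothing
        if (operand instanceof Num) {
            // Constant operand: fold the sign into the literal.
            return new Num(-((Num) operand).value);
        }
        return new Neg(operand);                    // anything else keeps an explicit negation
    }

    public static void main(String[] args) {
        FormulaNode folded = applySign(true, new Num(1.3));
        System.out.println(folded instanceof Num);  // true
        System.out.println(((Num) folded).value);   // -1.3
    }
}
```

Checking the type of the already visited child is what avoids walking down the parse tree to look for a number token.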

Regular Expressions - tree grammar Antlr Java

I'm trying to write a program in ANTLR (Java) that simplifies regular expressions. I have already written some code (the grammar file contents are below):
grammar Regexp_v7;
options{
language = Java;
output = AST;
ASTLabelType = CommonTree;
backtrack = true;
}
tokens{
DOT;
REPEAT;
RANGE;
NULL;
}
fragment
ZERO
: '0'
;
fragment
DIGIT
: '1'..'9'
;
fragment
EPSILON
: '#'
;
fragment
FI
: '%'
;
ID
: EPSILON
| FI
| 'a'..'z'
| 'A'..'Z'
;
NUMBER
: ZERO
| DIGIT (ZERO | DIGIT)*
;
WHITESPACE
: ('\r' | '\n' | ' ' | '\t' ) + {$channel = HIDDEN;}
;
list
: (reg_exp ';'!)*
;
term
: ID -> ID
| '('! reg_exp ')'!
;
repeat_exp
: term ('{' range_exp '}')+ -> ^(REPEAT term (range_exp)+)
| term -> term
;
range_exp
: NUMBER ',' NUMBER -> ^(RANGE NUMBER NUMBER)
| NUMBER (',') -> ^(RANGE NUMBER NULL)
| ',' NUMBER -> ^(RANGE NULL NUMBER)
| NUMBER -> ^(RANGE NUMBER NUMBER)
;
kleene_exp
: repeat_exp ('*'^)*
;
concat_exp
: kleene_exp (kleene_exp)+ -> ^(DOT kleene_exp (kleene_exp)+)
| kleene_exp -> kleene_exp
;
reg_exp
: concat_exp ('|'^ concat_exp)*
;
My next goal is to write a tree grammar that can simplify regular expressions (e.g. a|a -> a). I have done some coding (see below), but I have trouble defining a rule that treats nodes as subtrees, so that expressions such as (a|a)|(a|a) are simplified to a.
tree grammar Regexp_v7Walker;
options{
language = Java;
tokenVocab = Regexp_v7;
ASTLabelType = CommonTree;
output=AST;
backtrack = true;
}
tokens{
NULL;
}
bottomup
: ^('*' ^('*' e=.)) -> ^('*' $e) //a** -> a*
| ^('|' i=.* j=.* {$i.tree.toStringTree() == $j.tree.toStringTree()} )
-> $i // There are 3 errors while this line is up and running:
// 1. CommonTree cannot be resolved,
// 2. i.tree cannot be resolved or is not a field,
// 3. i cannot be resolved.
;
Small driver class:
public class Regexp_Test_v7 {
public static void main(String[] args) throws RecognitionException {
CharStream stream = new ANTLRStringStream("a***;a|a;(ab)****;ab|ab;ab|aa;");
Regexp_v7Lexer lexer = new Regexp_v7Lexer(stream);
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
Regexp_v7Parser parser = new Regexp_v7Parser(tokenStream);
list_return list = parser.list();
CommonTree t = (CommonTree) list.getTree();
System.out.println("Original tree: " + t.toStringTree());
CommonTreeNodeStream nodes = new CommonTreeNodeStream(t);
Regexp_v7Walker s = new Regexp_v7Walker(nodes);
t = (CommonTree)s.downup(t);
System.out.println("Simplified tree: " + t.toStringTree());
}
}
Can anyone help me with solving this case?
Thanks in advance and regards.
Now, I'm no expert, but in your tree grammar:
add filter=true to the options block;
change the second alternative of the bottomup rule to:
^('|' i=. j=. {i.toStringTree().equals(j.toStringTree())}? ) -> $i
If I'm not mistaken, by using i=.* you allow i to match nothing, in which case it is null and converting it to a String throws a NullPointerException.
Both i and j are of type CommonTree, because you set ASTLabelType = CommonTree, so you should call i.toStringTree() directly.
Since this is Java and you're comparing Strings, use equals() instead of ==.
Finally, to turn the expression in curly braces into a predicate, add a question mark after the closing brace.
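If it helps to see the intended rewrites without the ANTLR machinery, here is a plain-Java sketch of a bottom-up pass over a tiny tree type. RegexNode is a hypothetical stand-in for CommonTree, and its toStringTree() only imitates the LISP-style output; the two rewrites are a** -> a* and x|x -> x, compared with equals() as suggested above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal tree node; stands in for CommonTree in this sketch.
class RegexNode {
    final String label;
    final List<RegexNode> children;

    RegexNode(String label, RegexNode... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }

    // LISP-style rendering, similar in spirit to CommonTree.toStringTree().
    String toStringTree() {
        if (children.isEmpty()) return label;
        StringBuilder sb = new StringBuilder("(" + label);
        for (RegexNode c : children) sb.append(' ').append(c.toStringTree());
        return sb.append(')').toString();
    }

    static RegexNode simplify(RegexNode node) {
        // Simplify the children first so that rewrites apply bottom-up.
        RegexNode[] kids = new RegexNode[node.children.size()];
        for (int i = 0; i < kids.length; i++) {
            kids[i] = simplify(node.children.get(i));
        }
        // a** -> a*
        if (node.label.equals("*") && kids.length == 1 && kids[0].label.equals("*")) {
            return kids[0];
        }
        // x|x -> x, comparing the rendered subtrees with equals(), not ==
        if (node.label.equals("|") && kids.length == 2
                && kids[0].toStringTree().equals(kids[1].toStringTree())) {
            return kids[0];
        }
        return new RegexNode(node.label, kids);
    }

    public static void main(String[] args) {
        RegexNode a = new RegexNode("a");
        RegexNode nested = new RegexNode("|", new RegexNode("|", a, a), new RegexNode("|", a, a));
        System.out.println(simplify(nested).toStringTree()); // a
    }
}
```

Because the children are simplified before the parent is inspected, (a|a)|(a|a) collapses in one pass: both inner alternations become a, and the outer one then sees two identical subtrees.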
