Recognize fractional numbers in JFlex 1.4.3 (Java)

In my SL.lex file I have these regular expression macros for fractional numbers:
Digit = [1-9]
Digit0 = 0|{Digit}
Num = {Digit} {Digit0}*
Frac = {Digit0}* {Digit}
Pos = {Num} | '.' {Frac} | 0 '.' {Frac} | {Num} '.' {Frac}
PosOrNeg = -{Pos} | {Pos}
Numbers = 0 | {PosOrNeg}
and then, in the rules section:
/* literals */
{Numbers} { return new Token(yytext(), sym.NUM, getLineNumber()); }
But every time I try to recognize a number with a dot, it fails and I get an error.
Instead of '.' I also tried \\., \., and ".", but it fails every time.

You are right that . needs to be escaped, otherwise it matches any character except a line terminator.
But quoting characters in JFlex is done with double quotes, not single quotes:
Pos = {Num} | "." {Frac} | 0 "." {Frac} | {Num} "." {Frac}
If you do that, the input:
123.45
works as expected:
java -cp target/classes/ Yylex src/test/resources/test.txt
line: 1 match: --123.45--
action [29] { return new Yytoken(zzAction, yytext(), yyline+1, 0, 0); }
Text : 123.45
index : 10
line : 1
null
Also, regular expressions are more powerful than plain unions; you can make the definitions more concise:
Digit = [1-9]
Digit0 = 0 | {Digit}
Num = {Digit} {Digit0}*
Frac = {Digit0}* {Digit}
Pos = 0? "." {Frac} | {Num} ("." {Frac})?
PosOrNeg = -?{Pos}
Number = {PosOrNeg} | 0
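If you want to sanity-check what the condensed macros accept, here is a minimal sketch that mirrors them as a plain java.util.regex pattern. This only illustrates the macro semantics, it is not the JFlex-generated scanner; the class name and the test strings are made up for the example.
import java.util.regex.Pattern;

public class NumberMacroCheck {
    public static void main(String[] args) {
        // Mirrors: Num = {Digit}{Digit0}*, Frac = {Digit0}*{Digit},
        // Pos = 0? "." {Frac} | {Num} ("." {Frac})?, Number = -?{Pos} | 0
        String num = "[1-9][0-9]*";
        String frac = "[0-9]*[1-9]";
        String pos = "(?:0?\\." + frac + "|" + num + "(?:\\." + frac + ")?)";
        Pattern number = Pattern.compile("-?" + pos + "|0");
        // Prints true for 0, 123.45, -0.5 and .75; false for 12. and 007.
        for (String s : new String[] {"0", "123.45", "-0.5", ".75", "12.", "007"}) {
            System.out.println(s + " -> " + number.matcher(s).matches());
        }
    }
}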

Indentation management in ANTLR4 for a Python interpreter

I'm implementing a Python interpreter using ANTLR4 as the lexer and parser generator. I used the BNF defined at this link:
https://github.com/antlr/grammars-v4/blob/master/python3/Python3.g4.
However, the indentation handling with the INDENT and DEDENT tokens within lexer::members does not work when I define a compound statement.
For example, if I define the following statements:
x=10
while x>2 :
print("hello")
x=x-3
So on the line where I reassign the x variable I should get an indentation error, which I don't get in my current state.
Should I edit something in the lexer code, or what?
This is the grammar I'm using, with the lexer::members and NEWLINE rules defined as in the link above.
grammar python;
tokens { INDENT, DEDENT }
@lexer::members {
// A queue where extra tokens are pushed on (see the NEWLINE lexer rule).
private java.util.LinkedList<Token> tokens = new java.util.LinkedList<>();
// The stack that keeps track of the indentation level.
private java.util.Stack<Integer> indents = new java.util.Stack<>();
// The amount of opened braces, brackets and parenthesis.
private int opened = 0;
// The most recently produced token.
private Token lastToken = null;
@Override
public void emit(Token t) {
super.setToken(t);
tokens.offer(t);
}
@Override
public Token nextToken() {
// Check if the end-of-file is ahead and there are still some DEDENTS expected.
if (_input.LA(1) == EOF && !this.indents.isEmpty()) {
// Remove any trailing EOF tokens from our buffer.
for (int i = tokens.size() - 1; i >= 0; i--) {
if (tokens.get(i).getType() == EOF) {
tokens.remove(i);
}
}
// First emit an extra line break that serves as the end of the statement.
this.emit(commonToken(pythonParser.NEWLINE, "\n"));
// Now emit as much DEDENT tokens as needed.
while (!indents.isEmpty()) {
this.emit(createDedent());
indents.pop();
}
// Put the EOF back on the token stream.
this.emit(commonToken(pythonParser.EOF, "<EOF>"));
//throw new Exception("indentazione inaspettata in riga "+this.getLine());
}
Token next = super.nextToken();
if (next.getChannel() == Token.DEFAULT_CHANNEL) {
// Keep track of the last token on the default channel.
this.lastToken = next;
}
return tokens.isEmpty() ? next : tokens.poll();
}
private Token createDedent() {
CommonToken dedent = commonToken(pythonParser.DEDENT, "");
dedent.setLine(this.lastToken.getLine());
return dedent;
}
private CommonToken commonToken(int type, String text) {
int stop = this.getCharIndex() - 1;
int start = text.isEmpty() ? stop : stop - text.length() + 1;
return new CommonToken(this._tokenFactorySourcePair, type, DEFAULT_TOKEN_CHANNEL, start, stop);
}
// Calculates the indentation of the provided spaces, taking the
// following rules into account:
//
// "Tabs are replaced (from left to right) by one to eight spaces
// such that the total number of characters up to and including
// the replacement is a multiple of eight [...]"
//
// -- https://docs.python.org/3.1/reference/lexical_analysis.html#indentation
static int getIndentationCount(String spaces) {
int count = 0;
for (char ch : spaces.toCharArray()) {
switch (ch) {
case '\t':
count += 8 - (count % 8);
break;
default:
// A normal space char.
count++;
}
}
return count;
}
boolean atStartOfInput() {
return super.getCharPositionInLine() == 0 && super.getLine() == 1;
}
}
parse
:( NEWLINE parse
| block ) EOF
;
block
: (statement NEWLINE?| functionDecl)*
;
statement
: assignment
| functionCall
| ifStatement
| forStatement
| whileStatement
| arithmetic_expression
;
assignment
: IDENTIFIER indexes? '=' expression
;
functionCall
: IDENTIFIER OPAREN exprList? CPAREN #identifierFunctionCall
| PRINT OPAREN? exprList? CPAREN? #printFunctionCall
;
arithmetic_expression
: expression
;
ifStatement
: ifStat elifStat* elseStat?
;
ifStat
: IF expression COLON NEWLINE INDENT block DEDENT
;
elifStat
: ELIF expression COLON NEWLINE INDENT block DEDENT
;
elseStat
: ELSE COLON NEWLINE INDENT block DEDENT
;
functionDecl
: DEF IDENTIFIER OPAREN idList? CPAREN COLON NEWLINE INDENT block DEDENT
;
forStatement
: FOR IDENTIFIER IN expression COLON NEWLINE INDENT block DEDENT elseStat?
;
whileStatement
: WHILE expression COLON NEWLINE INDENT block DEDENT elseStat?
;
idList
: IDENTIFIER (',' IDENTIFIER)*
;
exprList
: expression (COMMA expression)*
;
expression
: '-' expression #unaryMinusExpression
| '!' expression #notExpression
| expression '**' expression #powerExpression
| expression '*' expression #multiplyExpression
| expression '/' expression #divideExpression
| expression '%' expression #modulusExpression
| expression '+' expression #addExpression
| expression '-' expression #subtractExpression
| expression '>=' expression #gtEqExpression
| expression '<=' expression #ltEqExpression
| expression '>' expression #gtExpression
| expression '<' expression #ltExpression
| expression '==' expression #eqExpression
| expression '!=' expression #notEqExpression
| expression '&&' expression #andExpression
| expression '||' expression #orExpression
| expression '?' expression ':' expression #ternaryExpression
| expression IN expression #inExpression
| NUMBER #numberExpression
| BOOL #boolExpression
| NULL #nullExpression
| functionCall indexes? #functionCallExpression
| list indexes? #listExpression
| IDENTIFIER indexes? #identifierExpression
| STRING indexes? #stringExpression
| '(' expression ')' indexes? #expressionExpression
| INPUT '(' STRING? ')' #inputExpression
;
list
: '[' exprList? ']'
;
indexes
: ('[' expression ']')+
;
PRINT : 'print';
INPUT : 'input';
DEF : 'def';
IF : 'if';
ELSE : 'else';
ELIF : 'elif';
RETURN : 'return';
FOR : 'for';
WHILE : 'while';
IN : 'in';
NULL : 'null';
OR : '||';
AND : '&&';
EQUALS : '==';
NEQUALS : '!=';
GTEQUALS : '>=';
LTEQUALS : '<=';
POW : '**';
EXCL : '!';
GT : '>';
LT : '<';
ADD : '+';
SUBTRACT : '-';
MULTIPLY : '*';
DIVIDE : '/';
MODULE : '%';
OBRACE : '{' {opened++;};
CBRACE : '}' {opened--;};
OBRACKET : '[' {opened++;};
CBRACKET : ']' {opened--;};
OPAREN : '(' {opened++;};
CPAREN : ')' {opened--;};
SCOLON : ';';
ASSIGN : '=';
COMMA : ',';
QMARK : '?';
COLON : ':';
BOOL
: 'true'
| 'false'
;
NUMBER
: INT ('.' DIGIT*)?
;
IDENTIFIER
: [a-zA-Z_] [a-zA-Z_0-9]*
;
STRING
: ["] (~["\r\n] | '\\\\' | '\\"')* ["]
| ['] (~['\r\n] | '\\\\' | '\\\'')* [']
;
SKIPS
: ( SPACES | COMMENT | LINE_JOINING ){firstLine();} -> skip
;
NEWLINE
: ( {atStartOfInput()}? SPACES
| ( '\r'? '\n' | '\r' | '\f' ) SPACES?
)
{
String newLine = getText().replaceAll("[^\r\n\f]+", "");
String spaces = getText().replaceAll("[\r\n\f]+", "");
int next = _input.LA(1);
if (opened > 0 || next == '\r' || next == '\n' || next == '\f' || next == '#') {
// If we're inside a list or on a blank line, ignore all indents,
// dedents and line breaks.
skip();
}
else {
emit(commonToken(NEWLINE, newLine));
int indent = getIndentationCount(spaces);
int previous = indents.isEmpty() ? 0 : indents.peek();
if (indent == previous) {
// skip indents of the same size as the present indent-size
skip();
}
else if (indent > previous) {
indents.push(indent);
emit(commonToken(pythonParser.INDENT, spaces));
}
else {
// Possibly emit more than 1 DEDENT token.
while(!indents.isEmpty() && indents.peek() > indent) {
this.emit(createDedent());
indents.pop();
}
}
}
}
;
fragment INT
: [1-9] DIGIT*
| '0'
;
fragment DIGIT
: [0-9]
;
fragment SPACES
: [ \t]+
;
fragment COMMENT
: '#' ~[\r\n\f]*
;
fragment LINE_JOINING
: '\\' SPACES? ( '\r'? '\n' | '\r' | '\f' )
;
No, this should not be handled in the grammar. The lexer should simply emit the (faulty) INDENT token. The parser should, at runtime, produce an error. Something like this:
String source = "x=10\n" +
"while x>2 :\n" +
" print(\"hello\")\n" +
" x=x-3\n";
Python3Lexer lexer = new Python3Lexer(CharStreams.fromString(source));
Python3Parser parser = new Python3Parser(new CommonTokenStream(lexer));
// Remove default error-handling
parser.removeErrorListeners();
// Add custom error-handling
parser.addErrorListener(new BaseErrorListener() {
@Override
public void syntaxError(Recognizer<?, ?> recognizer, Object o, int i, int i1, String s, RecognitionException e) {
CommonToken token = (CommonToken) o;
if (token.getType() == Python3Parser.INDENT) {
// The parser encountered an unexpected INDENT token
// TODO throw your exception
}
// TODO handle other errors
}
});
// Trigger the error
parser.file_input();
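If you would rather abort the parse outright than just report the problem, you can register a listener that throws. Here is a minimal sketch under that assumption; the class name is illustrative (the next question on this page refers to a listener of the same shape as ThrowingErrorListener).
import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;
import org.antlr.v4.runtime.misc.ParseCancellationException;

public class ThrowingErrorListener extends BaseErrorListener {
    public static final ThrowingErrorListener INSTANCE = new ThrowingErrorListener();

    @Override
    public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol,
                            int line, int charPositionInLine, String msg,
                            RecognitionException e) {
        // Any syntax error (including an unexpected INDENT) becomes an exception.
        throw new ParseCancellationException("line " + line + ":" + charPositionInLine + " " + msg);
    }
}
Attach it with parser.addErrorListener(ThrowingErrorListener.INSTANCE) after removing the default listeners, and catch ParseCancellationException around parser.file_input().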

ANTLR parsing is not finding correct lexer parts

I am a complete newcomer to ANTLR.
I have the following ANTLR grammar:
grammar DrugEntityRecognition;
// Parser Rules
derSentence : ACTION (INT | FRACTION | RANGE) FORM TEXT;
// Lexer Rules
ACTION : 'TAKE' | 'INFUSE' | 'INJECT' | 'INHALE' | 'APPLY' | 'SPRAY' ;
INT : [0-9]+ ;
FRACTION : [1] '/' [1-9] ;
RANGE : INT '-' INT ;
FORM : ('TABLET' | 'TABLETS' | 'CAPSULE' | 'CAPSULES' | 'SYRINGE') ;
TEXT : ('A'..'Z' | WHITESPACE | ',')+ ;
WHITESPACE : ('\t' | ' ' | '\r' | '\n' | '\u000C')+ -> skip ;
And when I try to parse a sentence as follows:
String upperLine = line.toUpperCase();
org.antlr.v4.runtime.CharStream stream = new ANTLRInputStream(upperLine);
DrugEntityRecognitionLexer lexer = new DrugEntityRecognitionLexer(stream);
lexer.removeErrorListeners();
lexer.addErrorListener(ThrowingErrorListener.INSTANCE);
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
DrugEntityRecognitionParser parser = new DrugEntityRecognitionParser(tokenStream);
try {
DrugEntityRecognitionParser.DerSentenceContext ctx = parser.derSentence();
StringBuilder sb = new StringBuilder();
sb.append("ACTION: ").append(ctx.ACTION());
sb.append(", ");
sb.append("FORM: ").append(ctx.FORM());
sb.append(", ");
sb.append("INT: ").append(ctx.INT());
sb.append(", ");
sb.append("FRACTION: ").append(ctx.FRACTION());
sb.append(", ");
sb.append("RANGE: ").append(ctx.RANGE());
System.out.println(upperLine);
System.out.println(sb.toString());
} catch (ParseCancellationException e) {
//e.printStackTrace();
}
An example of the input to the lexer:
take 10 Tablet (25MG) by oral route every week
In this case, the ACTION node is not getting populated: take is recognized only as a TEXT node, not as ACTION. 10 is recognized as an INT node, however.
How can I modify this grammar so that the ACTION node is populated correctly (as well as FORM, which is not being populated either)?
There are several problems in your grammar:
Your TEXT rule only matches uppercase letters. Same for ACTION.
You shouldn't mix punctuation and text in a single text rule (here the comma), otherwise you cannot freely allow whitespaces between tokens.
You don't match parentheses at all, hence (25MG) is not valid input and the parser ends up in an error state.
You did not check for syntax errors to learn what went wrong during recognition.
Also, when in doubt, always print your token sequence from the token source to see if the input has actually been tokenized as you expect. Start there to fix your grammar before you go to the parser.
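For example, a throwaway main like the one below dumps every token the lexer produces. This is a sketch only: it assumes the generated DrugEntityRecognitionLexer is on the classpath and uses CharStreams, which is available from ANTLR 4.7 on.
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Token;

public class DumpTokens {
    public static void main(String[] args) {
        DrugEntityRecognitionLexer lexer = new DrugEntityRecognitionLexer(
                CharStreams.fromString("TAKE 10 TABLET (25MG) BY ORAL ROUTE EVERY WEEK"));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        tokens.fill();
        for (Token t : tokens.getTokens()) {
            // The default Token.toString() shows text, type number and position,
            // which is enough to see which lexer rule matched each piece of input.
            System.out.println(t);
        }
    }
}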
About case sensitivity: typically (if your language is case-insensitive) you have rules like these:
fragment A: [aA];
fragment B: [bB];
fragment C: [cC];
fragment D: [dD];
...
to match a letter in either case and then define your keywords so:
ACTION : T A K E | I N F U S E | I N J E C T | I N H A L E | A P P L Y | S P R A Y;

ANTLR: minus expression precedence and different results with Grun

I have a grammar like this:
/* entry point */
parse: expr EOF;
expr
: value # argumentArithmeticExpr
| l=expr operator=(MULT|DIV) r=expr # multdivArithmeticExpr
| l=expr operator=(PLUS|MINUS) r=expr # addsubtArithmeticExpr
| operator=('-'|'+') r=expr # minusPlusArithmeticExpr
| IDENTIFIER '(' (expr ( COMMA expr )* ) ? ')'# functionExpr
| LPAREN expr RPAREN # parensArithmeticExpr
;
value
: number
| variable
| string // contains date
| bool
| null_value
;
/* Atomes */
bool
: BOOL
;
variable
: VARIABLE
;
string
: STRING_LITERAL
;
number
: ('+'|'-')? NUMERIC_LITERAL
;
null_value
: NULL // TODO: test this
;
IDENTIFIER
: ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
;
NUMERIC_LITERAL
: DIGIT+ ( '.' DIGIT* )? ( E [-+]? DIGIT+ )? // ex: 0.05e3
| '.' DIGIT+ ( E [-+]? DIGIT+ )? // ex: .05e3
;
INT: DIGIT+;
STRING_LITERAL
: '\'' ( ~'\'' | '\'\'' )* '\''
| '"' ( ~'"' | '""' )* '"'
;
VARIABLE
: LBRACKET ( ~']' | ' ')* RBRACKET
;
Now, I want to parse this:
-1.3 * 5 + -2 * 7
With Grun, I get this:
antlr4 formula.g4 && javac *.java && time grun formula parse -gui
-1.3*5 + -2*7
^D
Which looks OK and I would be happy with that.
But in my Java code, using the visitor pattern, my visit methods get called like this:
visitMinusPlusArithmeticExpr -1.3*5+-2*7 // ugh ?? sees "- (1.3 * 5 + - 2 * 7 )" instead of "(-1.3*5) + (-2*7)"
visitAddsubtArithmeticExpr 1.3*5+-2*7
visitMultdivArithmeticExpr 1.3*5
visitArgumentArithmeticExpr 1.3
visitNumber 1.3
visitArgumentArithmeticExpr 5
visitValue 5
visitNumber 5
visitMinusPlusArithmeticExpr -2*7 // UHG? should see a MultDiv with -2 and 7
visitMultdivArithmeticExpr 2*7
visitArgumentArithmeticExpr 2
visitValue 2
visitNumber 2
visitArgumentArithmeticExpr 7
visitValue 7
visitNumber 7
Which means that I don't get my negative number (-1.3), but rather the 'minus expression', which I should not get.
Why is my Java result different from Grun? I have verified that the grammar is recompiled, and I use my parser like this:
formulaLexer lexer = new formulaLexer(new ANTLRInputStream(s));
formulaParser parser = new formulaParser(new CommonTokenStream(lexer));
parser.getInterpreter().setPredictionMode(PredictionMode.SLL);
parser.setErrorHandler(new BailErrorStrategy()); // will throw exceptions on failure
formula = tryParse(parser);
if( formula == null && errors.isEmpty() ){
// the parsing failed, retry in LL mode
parser.getInterpreter().setPredictionMode(PredictionMode.LL);
parser.reset();
tryParse(parser);
}
I have disabled SLL mode to check whether it was the problem, and the result was the same.
I thought this could be a problem of precedence, but in my expr rule I have specified to match a value first, and only then a minusPlusArithmeticExpr.
I can't understand why I get this 'minus' expression instead of my 'negative value'. Can you check this?
Also, why does Grun show the correct behavior and not my Java code?
EDIT
Following the advice in the comments, I modified the grammar to look like this:
expr
: value # argumentArithmeticExpr
| (PLUS|MINUS) expr # plusMinusExpr
| l=expr operator=(MULT|DIV) r=expr # multdivArithmeticExpr
| l=expr operator=(PLUS|MINUS) r=expr # addsubtArithmeticExpr
| function=IDENTIFIER '(' (expr ( COMMA expr )* ) ? ')'# functionExpr
| '(' expr ')' # parensArithmeticExpr
;
But now I want to optimize the case where I have a single "-1.3" somewhere.
I don't know how to do it correctly, since when I land in visitPlusMinusExpr, I have to check whether the terminal node is a number.
Here is what I get while debugging:
ctx = {formulaParser$PlusMinusExprContext#929} "[16]"
children = {ArrayList#955} size = 2
0 = {TerminalNodeImpl#962} "-"
1 = {formulaParser$ArgumentArithmeticExprContext#963} "[21 16]"
children = {ArrayList#967} size = 1
0 = {formulaParser$ValueContext#990} "[22 21 16]"
children = {ArrayList#992} size = 1
0 = {formulaParser$NumberContext#997} "[53 22 21 16]"
children = {ArrayList#999} size = 1
0 = {TerminalNodeImpl#1004} "1.3"
I suspect I should walk down the tree and tell if the terminal node is a number, but it seems cumbersome. Do you have any idea on how to do that without compromising legibility of my code?
OK, for those interested, Lucas and Bart got the answer; my implementation looks like this:
expr
: value # argumentArithmeticExpr
| (PLUS|MINUS) expr # plusMinusExpr
| l=expr operator=(MULT|DIV) r=expr # multdivArithmeticExpr
| l=expr operator=(PLUS|MINUS) r=expr # addsubtArithmeticExpr
| function=IDENTIFIER '(' (expr ( COMMA expr )* ) ? ')'# functionExpr
| '(' expr ')' # parensArithmeticExpr
;
And in the visitor of plusMinusExpr:
@Override
public Formula visitPlusMinusExpr(formulaParser.PlusMinusExprContext ctx) {
if( debug ) LOG.log(Level.INFO, "visitPlusMinusExpr " + ctx.getText());
Formula formulaExpr = visit(ctx.expr());
if( ctx.MINUS() == null ) return formulaExpr;
else {
if(formulaExpr instanceof DoubleFormula){
// optimization for numeric values: we don't return "(0.0 MINUS THEVALUE)" but directly "-THEVALUE"
Double v = - ((DoubleFormula) formulaExpr).getValue();
return new DoubleFormula( v );
} else {
return ArithmeticOperator.MINUS( 0, formulaExpr);
}
}
}

Matching ${123...456} and extracting 2 numbers in Java?

What is the simplest, most succinct way to extract 2 integers from a String when I know the format will always be ${INT1...INT2}? E.g. "Hello ${123...456}" would extract 123 and 456.
I would go with a Pattern with capturing groups and read the matches off the Matcher.
Here's an example:
String input = "Hello ${123...456}, bye ${789...101112}";
// | escaped "$"
// | | escaped "{"
// | | | first group (any number of digits)
// | | | | 3 escaped dots
// | | | | | second group (same as 1st)
// | | | | | | escaped "}"
Pattern p = Pattern.compile("\\$\\{(\\d+)\\.{3}(\\d+)\\}");
Matcher m = p.matcher(input);
// iterating over matcher's find for multiple matches
while (m.find()) {
System.out.println("Found...");
System.out.println("\t" + m.group(1));
System.out.println("\t" + m.group(2));
}
Output
Found...
123
456
Found...
789
101112
Alternatively, without regular expressions, indexOf and substring will do:
final String string = "${123...456}";
final String firstPart = string.substring(string.indexOf("${") + "${".length(), string.indexOf("..."));
final String secondPart = string.substring(string.indexOf("...") + "...".length(), string.indexOf("}"));
final Integer first = Integer.valueOf(firstPart);
final Integer second = Integer.valueOf(secondPart);

Regular expressions - tree grammar ANTLR Java

I'm trying to write a program in ANTLR (Java) for simplifying regular expressions. I have already written some code (grammar file contents below).
grammar Regexp_v7;
options{
language = Java;
output = AST;
ASTLabelType = CommonTree;
backtrack = true;
}
tokens{
DOT;
REPEAT;
RANGE;
NULL;
}
fragment
ZERO
: '0'
;
fragment
DIGIT
: '1'..'9'
;
fragment
EPSILON
: '#'
;
fragment
FI
: '%'
;
ID
: EPSILON
| FI
| 'a'..'z'
| 'A'..'Z'
;
NUMBER
: ZERO
| DIGIT (ZERO | DIGIT)*
;
WHITESPACE
: ('\r' | '\n' | ' ' | '\t' ) + {$channel = HIDDEN;}
;
list
: (reg_exp ';'!)*
;
term
: ID -> ID
| '('! reg_exp ')'!
;
repeat_exp
: term ('{' range_exp '}')+ -> ^(REPEAT term (range_exp)+)
| term -> term
;
range_exp
: NUMBER ',' NUMBER -> ^(RANGE NUMBER NUMBER)
| NUMBER (',') -> ^(RANGE NUMBER NULL)
| ',' NUMBER -> ^(RANGE NULL NUMBER)
| NUMBER -> ^(RANGE NUMBER NUMBER)
;
kleene_exp
: repeat_exp ('*'^)*
;
concat_exp
: kleene_exp (kleene_exp)+ -> ^(DOT kleene_exp (kleene_exp)+)
| kleene_exp -> kleene_exp
;
reg_exp
: concat_exp ('|'^ concat_exp)*
;
My next goal is to write a tree grammar that simplifies regular expressions (e.g. a|a -> a, etc.). I have done some coding (see below), but I have trouble defining a rule that treats nodes as subtrees (in order to simplify expressions such as (a|a)|(a|a) to a, etc.).
tree grammar Regexp_v7Walker;
options{
language = Java;
tokenVocab = Regexp_v7;
ASTLabelType = CommonTree;
output=AST;
backtrack = true;
}
tokens{
NULL;
}
bottomup
: ^('*' ^('*' e=.)) -> ^('*' $e) //a** -> a*
| ^('|' i=.* j=.* {$i.tree.toStringTree() == $j.tree.toStringTree()} )
-> $i // There are 3 errors while this line is up and running:
// 1. CommonTree cannot be resolved,
// 2. i.tree cannot be resolved or is not a field,
// 3. i cannot be resolved.
;
Small driver class:
public class Regexp_Test_v7 {
public static void main(String[] args) throws RecognitionException {
CharStream stream = new ANTLRStringStream("a***;a|a;(ab)****;ab|ab;ab|aa;");
Regexp_v7Lexer lexer = new Regexp_v7Lexer(stream);
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
Regexp_v7Parser parser = new Regexp_v7Parser(tokenStream);
list_return list = parser.list();
CommonTree t = (CommonTree) list.getTree();
System.out.println("Original tree: " + t.toStringTree());
CommonTreeNodeStream nodes = new CommonTreeNodeStream(t);
Regexp_v7Walker s = new Regexp_v7Walker(nodes);
t = (CommonTree)s.downup(t);
System.out.println("Simplified tree: " + t.toStringTree());
}
}
Can anyone help me with solving this case?
Thanks in advance and regards.
Now, I'm no expert, but in your tree grammar:
add filter = true to the options, and
change the second alternative of the bottomup rule to:
^('|' i=. j=. {i.toStringTree().equals(j.toStringTree())}? ) -> $i
If I'm not mistaken, by using i=.* you allow i to be non-existent, and you'll get a NullPointerException on conversion to a String.
Both i and j are of type CommonTree because you've set it up this way: ASTLabelType = CommonTree, so you should call i.toStringTree().
And since it's Java and you're comparing Strings, use equals().
Also, to make the expression in curly brackets a predicate, you need a question mark after the closing bracket.
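As a side note on the equals() point, here is a tiny Java sketch of why == would not work here: toStringTree() builds a fresh String on every call, so identity comparison fails even when the contents are identical.
public class StringEqualsDemo {
    public static void main(String[] args) {
        // Two distinct String objects with the same contents,
        // much like two toStringTree() calls on equal subtrees.
        String a = new StringBuilder("(| a a)").toString();
        String b = new StringBuilder("(| a a)").toString();
        System.out.println(a == b);      // false: different objects
        System.out.println(a.equals(b)); // true: same character sequence
    }
}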
