Currently I'm using (part of) this ANTLR4 grammar to parse strings and numbers. Here is the summarized grammar:
gramm
: expr SCOL
;
expr
: literal #LiteralExpression
;
literal
: NUMERIC_LITERAL
| STRING_LITERAL
;
NUMERIC_LITERAL
: DIGIT+ ( '.' DIGIT* )? ( E [-+]? DIGIT+ )?
| '.' DIGIT+ ( E [-+]? DIGIT+ )?
;
STRING_LITERAL
: '\'' ( ~'\'' | '\'\'' )* '\''
;
SPACES
: [ \u000B\t\r\n] -> channel(HIDDEN)
;
fragment DIGIT : [0-9];
So, I'm implementing a GrammBaseVisitor<Void>.
I can't quite figure out how to check whether a literal is a NUMERIC_LITERAL or a STRING_LITERAL.
As far as I've been able to get, I've overridden visitLiteral() and visitLiteralExpression():
@Override
public Void visitLiteral(LiteralContext ctx) {
// TODO What should I do here in order to check whether
// ctx contains a STRING_LITERAL or a NUMERIC_LITERAL?
return super.visitLiteral(ctx);
}
@Override
public Void visitLiteralExpression(LiteralExpressionContext ctx) {
return super.visitLiteralExpression(ctx);
}
What's the difference between visitLiteral and visitLiteralExpression()?
Your literal production consists of two possible terminals, the numeric and the string literal. You can determine which one the parsed input contains with null checks inside visitLiteral:
@Override
public Object visitLiteral(LiteralContext ctx) {
TerminalNode numeric = ctx.NUMERIC_LITERAL();
TerminalNode string = ctx.STRING_LITERAL();
if (numeric != null) {
System.out.println(numeric.getSymbol().getType());
} else if (string != null) {
System.out.println(string.getSymbol().getType());
}
return super.visitLiteral(ctx);
}
You can visit all terminals by overriding visitTerminal.
@Override
public Object visitTerminal(TerminalNode node) {
int type = node.getSymbol().getType(); // matches a constant in your parser
switch (type) {
case GrammParser.NUMERIC_LITERAL:
System.out.println("numeric literal");
break;
case GrammParser.STRING_LITERAL:
System.out.println("string literal");
break;
}
System.out.println(node.getSymbol().getText());
return super.visitTerminal(node);
}
What's the difference between visitLiteral and visitLiteralExpression()?
The former represents your literal production and the latter represents your expr production. Note that the # symbol has special meaning in ANTLR 4 syntax: it introduces a label, i.e. a name for an alternative inside a production. It is not a comment. Since your expr only has one alternative, it becomes visitLiteralExpression. Try commenting it out (//) and watch your generated code change.
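As a side note, visitLiteralExpression usually doesn't need to do anything clever: since the labeled alternative only wraps the literal rule, it can simply descend into it. A minimal sketch, assuming the GrammBaseVisitor<Void> from the question:
@Override
public Void visitLiteralExpression(LiteralExpressionContext ctx) {
    // the #LiteralExpression alternative contains nothing but the literal rule
    return visit(ctx.literal());
}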
Related
I would like to parse two types of boolean expressions:
- the first is an init expression with a boolean, like: init : false
- the second is a derive expression with a boolean, like: derive : !express or (express and (amount >= 100))
My idea is to put semantic predicates in a set of rules. The goal is that when I'm parsing a boolean expression beginning with the word 'init', it should be allowed to take only one of the proposed alternatives, namely boolliteral, the last alternative in boolExpression. If the expression begins with the word 'derive', it should have access to all alternatives of boolExpression.
I know that I could make two kinds of boolExpression without semantic predicates, like boolExpressionInit and boolExpressionDerive... But I would like to see whether my idea could work with a single boolExpression and semantic predicates.
Here's my grammar
grammar TestExpression;
@header
{
package testexpressionparser;
}
@parser::members {
int vConstraintType;
}
/* SYNTAX RULES */
textInput : initDefinition
| derDefinition ;
initDefinition : t=INIT {vConstraintType = $t.type;} ':' boolExpression ;
derDefinition : t=DERIVE {vConstraintType = $t.type;} ':' boolExpression ;
boolExpression : {vConstraintType != INIT || vConstraintType == DERIVE}? boolExpression (boolOp|relOp) boolExpression
| {vConstraintType != INIT || vConstraintType == DERIVE}? NOT boolExpression
| {vConstraintType != INIT || vConstraintType == DERIVE}? '(' boolExpression ')'
| {vConstraintType != INIT || vConstraintType == DERIVE}? attributeName
| {vConstraintType != INIT || vConstraintType == DERIVE}? numliteral
| {vConstraintType == INIT || vConstraintType == DERIVE}? boolliteral
;
boolOp : OR | AND ;
relOp : EQ | NEQ | GT | LT | GEQT | LEQT ;
attributeName : WORD;
numliteral : intliteral | decliteral;
intliteral : INT ;
decliteral : DEC ;
boolliteral : BOOLEAN;
/* LEXICAL RULES */
INIT : 'init';
DERIVE : 'derive';
BOOLEAN : 'true' | 'false' ;
BRACKETSTART : '(' ;
BRACKETSTOP : ')' ;
BRACESTART : '{' ;
BRACESTOP : '}' ;
EQ : '=' ;
NEQ : '!=' ;
NOT : '!' ;
GT : '>' ;
LT : '<' ;
GEQT : '>=' ;
LEQT : '<=' ;
OR : 'or' ;
AND : 'and' ;
DEC : [0-9]* '.' [0-9]* ;
INT : ZERO | POSITIF;
ZERO : '0';
POSITIF : [1-9] [0-9]* ;
WORD : [a-zA-Z] [_0-9a-zA-Z]* ;
WS : (SPACE | NEWLINE)+ -> skip ;
SPACE : [ \t] ; /* Space or tab */
NEWLINE : '\r'? '\n' ; /* Carriage return and new line */
I expected that the grammar would build successfully, but what I receive is: "error(119): TestExpression.g4::: The following sets of rules are mutually left-recursive [boolExpression]
1 error(s)
BUILD FAIL"
Apparently ANTLR4's support for (direct) left-recursion does not work when a predicate appears before a left-recursive rule invocation. So you can fix the error by moving the predicate after the first boolExpression in the left-recursive alternatives.
That said, it seems like the predicates aren't really necessary in the first place - at least not in the example you've shown us (or the one before your edit as far as I could tell). Since a boolExpression with the constraint type INIT can apparently only match boolLiteral, you can just change initDefinition as follows:
initDefinition : t=INIT ':' boolliteral ;
Then boolExpression will always have the constraint type DERIVE and no predicates are necessary anymore.
Generally, if you want to allow different alternatives in non-terminal x based on whether it was invoked by y or z, you should simply have multiple versions of x and then call one from y and the other from z. That's usually a lot less hassle than littering the code with actions and predicates.
Similarly it can also make sense to have a rule that matches more than it should and then detect illegal expressions in a later phase instead of trying to reject them at the syntax level. Specifically beginners often try to write grammars that only allow well-typed expressions (rejecting something like 1+true with a syntax error) and that never works out well.
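To make that last point concrete for this grammar: if you kept initDefinition with a plain boolExpression (and dropped the predicates), one way to reject illegal init constraints in a later phase is a small listener pass over the parse tree. This is only a rough sketch, assuming the TestExpressionBaseListener generated from the grammar above:
public class InitConstraintValidator extends TestExpressionBaseListener {
    @Override
    public void enterInitDefinition(TestExpressionParser.InitDefinitionContext ctx) {
        TestExpressionParser.BoolExpressionContext expr = ctx.boolExpression();
        // every alternative other than boolliteral is illegal after 'init'
        if (expr.boolliteral() == null) {
            System.err.println("line " + ctx.getStart().getLine()
                    + ": only a boolean literal is allowed in an init constraint");
        }
    }
}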
With the help of this SO question How to create AST with ANTLR4? I was able to create the AST Nodes, but I'm stuck at coding the BuildAstVisitor as depicted in the accepted answer's example.
I have a grammar that starts like this:
mini: (constDecl | varDef | funcDecl | funcDef)* ;
I can't assign a label to the block (ANTLR4 says "label X assigned to a block which is not a set"), and I have no idea how to visit the next node.
public Expr visitMini(MiniCppParser.MiniContext ctx) {
return visitConstDecl(ctx.constDecl());
}
I have the following problems with the code above: I don't know how to decide whether it's a constDecl, a varDef, or one of the other options, and ctx.constDecl() returns a List<ConstDeclContext>, whereas the visitConstDecl function only takes a single element.
edit:
More grammar rules:
mini: (constDecl | varDef | funcDecl | funcDef)* ;
//--------------------------------------------------
constDecl: 'const' type ident=ID init ';' ;
init: '=' ( value=BOOLEAN | sign=('+' | '-')? value=NUMBER ) ;
// ...
//--------------------------------------------------
OP_ADD: '+';
OP_SUB: '-';
OP_MUL: '*';
OP_DIV: '/';
OP_MOD: '%';
BOOLEAN : 'true' | 'false' ;
NUMBER : '-'? INT ;
fragment INT : '0' | [1-9] [0-9]* ;
ID : [a-zA-Z]+ ;
// ...
I'm still not entirely sure how to implement the BuildAstVisitor. I now have something along the lines of the following, but it certainly doesn't look right to me...
@Override
public Expr visitMini(MiniCppParser.MiniContext ctx) {
for (MiniCppParser.ConstDeclContext constDeclCtx : ctx.constDecl()) {
visit(constDeclCtx);
}
return null;
}
@Override
public Expr visitConstDecl(MiniCppParser.ConstDeclContext ctx) {
visit(ctx.type());
return visit(ctx.init());
}
If you want to handle the individual subrules, implement the visitXXX functions for them (visitConstDecl(), visitVarDef(), etc.) instead of the visitMini() function. They will only be called if there's really a match for them in the input, so you don't need to do any checks for their occurrence.
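A rough sketch of what that can look like for the constDecl branch (ConstDeclNode is a hypothetical Expr subclass, made up here for illustration; the accessors follow from the grammar rules above):
@Override
public Expr visitConstDecl(MiniCppParser.ConstDeclContext ctx) {
    // constDecl: 'const' type ident=ID init ';'
    String name = ctx.ident.getText();   // the ident=ID label becomes a Token field
    Expr initValue = visit(ctx.init());  // descends into the init rule
    return new ConstDeclNode(name, initValue);
}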
I have a Lexer Rule as follows:
PREFIX : [abcd]'_';
EXTRA : ('xyz' | 'XYZ' );
SUFFIX : [ab];
TCHAN : PREFIX EXTRA? DIGIT+ SUFFIX?;
and a parser rule:
tpin : TCHAN
;
In the exit_tpin() listener method, is there a syntax where I can extract the DIGIT component of the token? Right now I can get the ctx.TCHAN() element, but this is a single string. I just want the digit portion of TCHAN.
Or should I remove TCHAN as a TOKEN and move that rule to be tpin (i.e)
tpin : PREFIX EXTRA? DIGIT+ SUFFIX?
Where I know how to extract DIGIT from the listener.
My guess is that by the time the token is presented to the parser it is too late to deconstruct it... but I was wondering if some ANTLR gurus out there know of a technique.
If I re-write my tokenizer, there is a possibility that TCHAN tokens will be missed for INT/ID tokens (I think that's why I ended up parsing the way I do).
I can always do some regexp work in the listener method... but that seemed like bad form, as I had the individual components earlier. I'm just lazy, and was wondering if a technique other than refactoring the parsing grammar was possible.
In The Definitive ANTLR Reference you can find examples of complex lexers where much of the work is done in the lexer. But when learning ANTLR, I would advise treating the lexer mostly as something that splits the input stream into small tokens, and then doing the big work in the parser. In the present case I would do:
grammar Question;
/* extract digit */
question
: tpin EOF
;
tpin
// : PREFIX EXTRA? DIGIT+ SUFFIX?
// {System.out.println("The only useful information is " + $DIGIT.text);}
: PREFIX EXTRA? number SUFFIX?
{System.out.println("The only useful information is " + $number.text);}
;
number
: DIGIT+
;
PREFIX : [abcd]'_';
EXTRA : ('xyz' | 'XYZ' );
DIGIT : [0-9] ;
SUFFIX : [ab];
WS : [ \t\r\n]+ -> skip ;
Say the input is d_xyz123456b. With the first version
: PREFIX EXTRA? DIGIT+ SUFFIX?
you get
$ grun Question question -tokens data.txt
[#0,0:1='d_',<PREFIX>,1:0]
[#1,2:4='xyz',<EXTRA>,1:2]
[#2,5:5='1',<DIGIT>,1:5]
[#3,6:6='2',<DIGIT>,1:6]
[#4,7:7='3',<DIGIT>,1:7]
[#5,8:8='4',<DIGIT>,1:8]
[#6,9:9='5',<DIGIT>,1:9]
[#7,10:10='6',<DIGIT>,1:10]
[#8,11:11='b',<SUFFIX>,1:11]
[#9,13:12='<EOF>',<EOF>,2:0]
The only useful information is 6
Because the parsing of DIGIT+ translates to a loop which reuses DIGIT
setState(12);
_errHandler.sync(this);
_la = _input.LA(1);
do {
{
{
setState(11);
((TpinContext)_localctx).DIGIT = match(DIGIT);
}
}
setState(14);
_errHandler.sync(this);
_la = _input.LA(1);
} while ( _la==DIGIT );
and $DIGIT.text translates to ((TpinContext)_localctx).DIGIT.getText(), only the last digit is retained. That's why I define a subrule number
: PREFIX EXTRA? number SUFFIX?
which makes it easy to capture the value:
[#0,0:1='d_',<PREFIX>,1:0]
[#1,2:4='xyz',<EXTRA>,1:2]
[#2,5:5='1',<DIGIT>,1:5]
[#3,6:6='2',<DIGIT>,1:6]
[#4,7:7='3',<DIGIT>,1:7]
[#5,8:8='4',<DIGIT>,1:8]
[#6,9:9='5',<DIGIT>,1:9]
[#7,10:10='6',<DIGIT>,1:10]
[#8,11:11='b',<SUFFIX>,1:11]
[#9,13:12='<EOF>',<EOF>,2:0]
The only useful information is 123456
You can even make it simpler:
tpin
: PREFIX EXTRA? INT SUFFIX?
{System.out.println("The only useful information is " + $INT.text);}
;
PREFIX : [abcd]'_';
EXTRA : ('xyz' | 'XYZ' );
INT : [0-9]+ ;
SUFFIX : [ab];
WS : [ \t\r\n]+ -> skip ;
$ grun Question question -tokens data.txt
[#0,0:1='d_',<PREFIX>,1:0]
[#1,2:4='xyz',<EXTRA>,1:2]
[#2,5:10='123456',<INT>,1:5]
[#3,11:11='b',<SUFFIX>,1:11]
[#4,13:12='<EOF>',<EOF>,2:0]
The only useful information is 123456
In the listener you have direct access to these values through the rule context TpinContext:
public static class TpinContext extends ParserRuleContext {
public Token INT;
public TerminalNode PREFIX() { return getToken(QuestionParser.PREFIX, 0); }
public TerminalNode INT() { return getToken(QuestionParser.INT, 0); }
public TerminalNode EXTRA() { return getToken(QuestionParser.EXTRA, 0); }
public TerminalNode SUFFIX() { return getToken(QuestionParser.SUFFIX, 0); }
// ... rest of the generated class omitted
}
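So in a listener the digits can be read straight off that context. A minimal sketch along those lines (the class name DigitListener is made up; the rest follows from the grammar named Question above):
public class DigitListener extends QuestionBaseListener {
    @Override
    public void exitTpin(QuestionParser.TpinContext ctx) {
        if (ctx.INT() != null) {
            System.out.println("digits: " + ctx.INT().getText());
        }
    }
}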
I have a string I would like to rewrite. The string contains substrings that look like "DDT" plus four digits. I'll call these blocks. It also contains connectives like "&" and "|", where | represents "or", as well as parentheses.
Now I would like to rewrite this string such that blocks separated by &s should be written as "min(x(block1), x(block2), etc.)", whereas blocks separated by |s should be written as "max(x(block1), x(block2), etc.)".
Looking at an example should help:
public class test{
public static void main(String[] arg) throws Exception {
String str = "(DDT1453 & DDT1454) | (DDT3524 & DDT3523 & DDT3522 & DDT3520)";
System.out.println(str.replaceAll("DDT\\d+","x($0)"));
}
}
My desired output is:
max(min(x(DDT1453),x(DDT1454)),min(x(DDT3524),x(DDT3523),x(DDT3522),x(DDT3520)))
As you can see, I performed an initial substitution to include the x(block) part of the output, but I cannot get the rest. Any ideas on how to achieve my desired output?
Just doing string substitution is the wrong way to go about this. Use recursive-descent parsing instead.
First you want to define what symbols produce what, for example:
program -> LiteralArg|fn(x)|program
LiteralArg -> LiteralArg
LiteralArg&LiteralArg -> fn(LiteralArg) & fn'(LiteralArg)
fn(x) -> fn(x)
fn(x) |fn(y) -> fn(x),fn(y)
From there you write functions which will recursively parse your data, expecting certain things to happen. For example:
String finalResult = "";
function parse(baseString) {
if(baseString.isLiteralArg)
{
if(peekAheadToCheckForAmpersand())
{
expectAnotherLiteralArgAfterAmpersandOtherwiseThrowError();
finalResult += fn(LiteralArg) & fn'(LiteralArg)
parse(baseString - recentToken);
}
else
{
finalResult += literalArg;
parse(baseString - recentToken);
}
}
else if(baseString.isFunction())
{
if(peekAheadToCheckForPipe())
{
expectAnotherFunctionAfterAmpersandOtherwiseThrowError();
finalResult += fn(x),fn(y)
parse(baseString - recentToken);
}
else
{
finalResult += fn(x)
parse(baseString - recentToken);
}
}
}
As you find tokens, take them off the string and call the parse function on the remaining string.
This is a rough example based on a project I did years ago. Here is the relevant lecture:
http://faculty.ycp.edu/~dhovemey/fall2009/cs340/lecture/lecture7.html
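To make the idea concrete, here is a rough, self-contained Java sketch of such a recursive-descent rewriter for this particular input format. All names are made up for illustration (orExpr/andExpr/atom mirror the '|' and '&' levels), and it does handle nesting:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DdtRewriter {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    DdtRewriter(String input) {
        // split the input into the only tokens we care about: DDT blocks, parentheses, & and |
        Matcher m = Pattern.compile("DDT\\d+|[()&|]").matcher(input);
        while (m.find()) tokens.add(m.group());
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }
    private String next() { return tokens.get(pos++); }

    // orExpr : andExpr ('|' andExpr)*   ->   max(...)
    private String orExpr() {
        List<String> parts = new ArrayList<>();
        parts.add(andExpr());
        while ("|".equals(peek())) { next(); parts.add(andExpr()); }
        return parts.size() == 1 ? parts.get(0) : "max(" + String.join(",", parts) + ")";
    }

    // andExpr : atom ('&' atom)*   ->   min(...)
    private String andExpr() {
        List<String> parts = new ArrayList<>();
        parts.add(atom());
        while ("&".equals(peek())) { next(); parts.add(atom()); }
        return parts.size() == 1 ? parts.get(0) : "min(" + String.join(",", parts) + ")";
    }

    // atom : DDT block | '(' orExpr ')'
    private String atom() {
        String t = next();
        if ("(".equals(t)) {
            String inner = orExpr();
            next(); // consume ')'
            return inner;
        }
        return "x(" + t + ")";
    }

    public static void main(String[] args) {
        String str = "(DDT1453 & DDT1454) | (DDT3524 & DDT3523 & DDT3522 & DDT3520)";
        System.out.println(new DdtRewriter(str).orExpr());
        // prints max(min(x(DDT1453),x(DDT1454)),min(x(DDT3524),x(DDT3523),x(DDT3522),x(DDT3520)))
    }
}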
If you insist on using regex substitutions, then the following code seems to work:
str = str.replaceAll("\\([^)]*\\)", "min$0");
str = str.replaceAll("DDT\\d+","x($0)");
str = str.replaceAll("&|\\|",",");
str = "max(" + str + ")";
However, I would consider what the others suggest: using parsing logic instead.
This way you can extend your grammar easily in the future, and you'll also be able to validate the input and report meaningful error messages.
--EDIT--
The solution above assumes there's no nesting. If nesting is legal, then you definitely can't use the regex solution.
If you are interested in learning and using ANTLR, the following ANTLR grammar
grammar DDT;
options {
output = AST;
ASTLabelType = CommonTree;
}
tokens { DDT; AMP; PIPE;}
@members {}
expr : op1=amp (oper=PIPE^ op2=amp)*;
amp : op1=atom (oper=AMP^ op2=atom)*;
atom : DDT! INT | '('! expr ')'!;
fragment
Digit : '0'..'9';
PIPE : '|' ;
AMP : '&';
DDT : 'DDT';
INT : Digit Digit*;
produces an AST (abstract syntax tree) for the input (DDT1 | DDT2) & (DDT3 | DDT4) & DDT5.
That syntax tree (CommonTree) can then be walked in the intended order (optionally using StringTemplates) to obtain the desired result.
A full-blown parser for such a small grammar could be overkill, especially when the OP obviously has no prior experience with parsers. Even using parser generators like ANTLR or JavaCC doesn't seem like a good idea here.
It's not easy to elaborate more with the current information. OP, please provide information requested as comments to your question.
Tentative grammar:
maxExpr ::= maxExpr '|' '(' minExpr ')'
maxExpr ::= '(' minExpr ')'
minExpr ::= minExpr '&' ITEM
minExpr ::= ITEM
ITEM ::= 'DDT\d{4}'
I realized that, true, the grammar is excessive for a regex, but only for a single regex. Nobody says we can't use more than one. In fact, even the simplest regex substitution can be regarded as a step in a Turing machine, and thus the problem is solvable using them. So...
str= str.replaceAll("\\s+", "" ) ;
str= str.replaceAll("&", "," ) ;
str= str.replaceAll("\\([^)]+\\)", "-$0" ) ;
str= str.replaceAll("\\|", "," ) ;
str= str.replaceAll(".+", "+($0)" ) ;
str= str.replaceAll("\\w+", "x($0)" ) ;
str= str.replaceAll("\\+", "max" ) ;
str= str.replaceAll("-", "min" ) ;
I didn't take many shortcuts. The general idea is that "+" equates to a production of maxExpr and "-" to one of minExpr.
I tested this with input
str= "(DDT1453 & DDT1454 & DDT1111) | (DDT3524 & DDT3523 & DDT3522 & DDT3520)" ;
Output is:
max(min(x(DDT1453),x(DDT1454),x(DDT1111)),min(x(DDT3524),x(DDT3523),x(DDT3522),x(DDT3520)))
Back to the idea of a grammar, it's easy to recognize that its only truly significant elements are the ITEMs and '|'. All the rest (parentheses and '&') is just decoration.
Simplified grammar:
maxExpr ::= maxExpr '|' minExpr
maxExpr ::= minExpr
minExpr ::= minExpr ITEM
minExpr ::= ITEM
ITEM ::= 'DDT\d{4}'
From here, a very simple finite automaton:
<start>
maxExpr= new List() ;
minExpr= new List() ;
"Expecting ITEM" (BEFORE_ITEM):
ITEM -> minExpr.add(ITEM) ; move to "Expecting ITEM, |, or END"
"Expecting ITEM, |, or END" (AFTER_ITEM):
ITEM -> minExpr.add(ITEM) ; move to "Expecting ITEM, |, or END"
| -> maxExpr.add(minExpr); minExpr= new List(); move to "Expecting ITEM"
END -> maxExpr.add(minExpr); move to <finish>
... and the corresponding implementation:
// requires: import java.util.*; and import java.util.regex.Pattern; (the members below live in one class)
static Pattern pattern= Pattern.compile("(\\()|(\\))|(\\&)|(\\|)|(\\w+)|(\\s+)") ;
static enum TokenType { OPEN, CLOSE, MIN, MAX, ITEM, SPACE, _END_, _ERROR_ };
static enum State { BEFORE_ITEM, AFTER_ITEM, END }
public static class Token {
TokenType type;
String value;
public Token(TokenType type, String value) {
this.type= type ;
this.value= value ;
}
}
public static class Lexer {
Scanner scanner;
public Lexer(String input) {
this.scanner= new Scanner(input) ;
}
public Token getNext() {
String tokenValue= scanner.findInLine(pattern) ;
TokenType tokenType;
if( tokenValue == null ) tokenType= TokenType._END_ ;
else if( tokenValue.matches("\\s+") ) tokenType= TokenType.SPACE ;
else if( "(".equals(tokenValue) ) tokenType= TokenType.OPEN ;
else if( ")".equals(tokenValue) ) tokenType= TokenType.CLOSE ;
else if( "&".equals(tokenValue) ) tokenType= TokenType.MIN ;
else if( "|".equals(tokenValue) ) tokenType= TokenType.MAX ;
else if( tokenValue.matches("\\w+") ) tokenType= TokenType.ITEM ;
else tokenType= TokenType._ERROR_ ;
return new Token(tokenType,tokenValue) ;
}
public void close() {
scanner.close();
}
}
public static String formatColl(String pre,Collection<?> coll,String sep,String post) {
StringBuilder result= new StringBuilder() ;
result.append(pre);
boolean first= true ;
for(Object item: coll ) {
if( ! first ) result.append(sep);
result.append(item);
first= false ;
}
result.append(post);
return result.toString() ;
}
public static void main(String... args) {
String str= "(DDT1453 & DDT1454) | (DDT3524 & DDT3523 & DDT3522 & DDT3520)" ;
Lexer lexer= new Lexer(str) ;
State currentState= State.BEFORE_ITEM ;
List<List<String>> maxExpr= new LinkedList<List<String>>() ;
List<String> minExpr= new LinkedList<String>() ;
while( currentState != State.END ) {
Token token= lexer.getNext() ;
switch( currentState ) {
case BEFORE_ITEM:
switch( token.type ) {
case ITEM:
minExpr.add("x("+token.value+")") ;
currentState= State.AFTER_ITEM ;
break;
case _END_:
maxExpr.add(minExpr) ;
currentState= State.END ;
break;
default:
// Ignore; preserve currentState, of course
break;
}
break;
case AFTER_ITEM:
switch( token.type ) {
case ITEM:
minExpr.add("x("+token.value+")") ;
currentState= State.AFTER_ITEM ;
break;
case MAX:
maxExpr.add(minExpr) ;
minExpr= new LinkedList<String>() ;
currentState= State.BEFORE_ITEM ;
break;
case _END_:
maxExpr.add(minExpr) ;
currentState= State.END ;
break;
default:
// Ignore; preserve currentState, of course
break;
}
break;
}
}
lexer.close();
System.out.println(maxExpr);
List<String> maxResult= new LinkedList<String>() ;
for(List<String> minItem: maxExpr ) {
maxResult.add( formatColl("min(",minItem,",",")") ) ;
}
System.out.println( formatColl("max(",maxResult,",",")") );
}
Regex is not the best choice to do this - or to say it right away: it's not possible (in Java).
Regex might be able to change the formatting of a given String using backreferences, but it cannot generate content-aware backreferences. In other words: you would need some kind of recursion (or an iterative solution) to resolve an arbitrary depth of nested parentheses.
Therefore, you would need to write your own parser that is able to handle your input.
While replacing the DDT1234 strings with the appropriate x(DDT1234) representation is easily doable (it's a single backreference for ALL occurrences), you need to take care of correct nesting on your own.
For parsing nested expressions, you may want to have a look at this example:
Parsing an Infix Expression with Parentheses (like ((2*4-6/3)*(3*5+8/4))-(2+3))
http://www.smccd.net/accounts/hasson/C++2Notes/ArithmeticParsing.html
It's just a (verbal) example of how to handle such a string.
Let's suppose the following scenarios with two ANTLR grammars:
1)
expr : antExpr+;
antExpr : '{' T '}' ;
T : 'foo';
2)
expr : antExpr;
antExpr : '{' T* '}' ;
T : 'bar';
In both cases I need to know how to iterate over antExpr+ and T*, because I need to build an ArrayList of each of their elements. Of course my grammar is more complex, but I think this example explains what I need. Thank you!
Production rules in ANTLR can have one or more return values which you can reference inside a loop (a (...)* or (...)+). So, let's say you want to print the text of each T that the antExpr rule matches. This could be done like this:
expr
: (antExpr {System.out.println($antExpr.str);} )+
;
antExpr returns [String str]
: '{' T '}' {$str = $T.text;}
;
T : 'foo';
The same principle holds for example grammar #2:
expr : antExpr;
antExpr : '{' (T {System.out.println($T.text);} )* '}' ;
T : 'bar';
EDIT
Note that you're not restricted to returning a single reference. Running the parser generated from:
grammar T;
parse
: ids {System.out.println($ids.firstId + "\n" + $ids.allIds);}
;
ids returns [String firstId, List<String> allIds]
@init{$allIds = new ArrayList<String>();}
@after{$firstId = $allIds.get(0);}
: (ID {$allIds.add($ID.text);})+
;
ID : ('a'..'z' | 'A'..'Z')+;
SPACE : ' ' {skip();};
on the input "aaa bbb ccc" would print the following:
aaa
[aaa, bbb, ccc]
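As a side note, if you are on ANTLR 4 and prefer to keep actions out of the grammar, the generated rule contexts already expose repeated elements as lists, so you can collect them from a plain listener instead. A rough sketch for grammar #2 (the grammar name Scenario2 is assumed here, not taken from the examples above):
import java.util.ArrayList;
import java.util.List;
import org.antlr.v4.runtime.tree.TerminalNode;

public class AntExprCollector extends Scenario2BaseListener {
    final List<String> values = new ArrayList<>();

    @Override
    public void exitAntExpr(Scenario2Parser.AntExprContext ctx) {
        for (TerminalNode t : ctx.T()) {   // T* generates a List<TerminalNode> accessor
            values.add(t.getText());
        }
    }
}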