Making generated parser work in Java for ANTLR 4.8


I've been having trouble getting my generated parser to work in Java for ANTLR 4.8. There are other answers to this question, but it seems that ANTLR has changed things since 4.7 and all the other answers are before this change. My code is:
String formula = "(fm.a < fm.b) | (fm.a = fm.b)";
CharStream input = CharStreams.fromString(formula);
Antlr.LogicGrammerLexer lexer = new Antlr.LogicGrammerLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
Antlr.LogicGrammerParser parser = new Antlr.LogicGrammerParser(tokens);
ParseTree pt = new ParseTree(parser);
It appears to be reading in the formula correctly into the CharStream, but anything I try to do past that just isn't working at all. For example, if I try to print out the parse tree, nothing will be printed. The following line will print out nothing:
System.out.println(lexer._input.getText(new Interval(0, 100)));
Any advice appreciated.
EDIT: added the grammar file:
grammar LogicGrammer;
logicalStmt: BOOL_EXPR | '('logicalStmt' '*LOGIC_SYMBOL' '*logicalStmt')';
BOOL_EXPR: '('IDENTIFIER' '*MATH_SYMBOL' '*IDENTIFIER')';
IDENTIFIER: CHAR+('.'CHAR*)*;
CHAR: 'a'..'z' | 'A'..'Z' | '1'..'9';
LOGIC_SYMBOL: '~' | '|' | '&';
MATH_SYMBOL: '<' | '≤' | '=' | '≥' | '>';

This line:
ParseTree pt = new ParseTree(parser);
is incorrect: ParseTree is an interface, so it cannot be instantiated with new. You need to call the start rule method on your parser object to get your parse tree:
Antlr.LogicGrammerParser parser = new Antlr.LogicGrammerParser(tokens);
ParseTree pt = parser.logicalStmt();
As for printing out your input: fields starting with an underscore (like _input) are generally not intended for external use. I suspect the failure is that your input stream does not contain 100 characters, so the Interval is invalid (I haven't tried it to see the exact failure).
If you include your grammar, one of us could easily generate and compile it and perhaps be more specific.
Using your grammar, this works for me:
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.misc.Interval;
import org.antlr.v4.runtime.tree.ParseTree;
public class Logic {
public static void main(String... args) {
String formula = "(fm.a < fm.b) | (fm.a = fm.b)";
CharStream input = CharStreams.fromString(formula);
LogicGrammerLexer lexer = new LogicGrammerLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
LogicGrammerParser parser = new LogicGrammerParser(tokens);
ParseTree pt = parser.logicalStmt();
System.out.println(pt.toStringTree());
System.out.println(input.getText(new Interval(1, 28)));
}
}
output:
([] (fm.a < fm.b))
fm.a < fm.b) | (fm.a = fm.b)
BTW, a couple of minor suggestions for your grammar:
set up a rule to skip whitespace: WS: [ \t\r\n]+ -> skip;
change BOOL_EXPR to a parser rule (since it's made up of a composition of tokens from other lexer rules):
grammar LogicGrammer;
logicalStmt
: boolExpr
| '(' logicalStmt LOGIC_SYMBOL logicalStmt ')'
;
boolExpr: '(' IDENTIFIER MATH_SYMBOL IDENTIFIER ')';
IDENTIFIER: CHAR+ ('.' CHAR*)*;
CHAR: 'a' ..'z' | 'A' ..'Z' | '1' ..'9';
LOGIC_SYMBOL: '~' | '|' | '&';
MATH_SYMBOL: '<' | '≤' | '=' | '≥' | '>';
WS: [ \t\r\n]+ -> skip;
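To see what the two suggestions buy you, it can help to look at the token stream on its own. The following is a minimal sketch, assuming a hand-written regex tokenizer as a stand-in for the generated lexer (the class name LogicTokenSketch and the token kind names are made up for illustration and are not part of ANTLR). It shows that once whitespace is skipped, "(fm.a < fm.b)" is just a sequence of small tokens that a parser rule such as boolExpr can combine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written stand-in for a lexer; this is NOT the ANTLR-generated LogicGrammerLexer.
class LogicTokenSketch {
    // One alternative per token kind; WS is matched but never emitted ("-> skip").
    private static final Pattern TOKEN = Pattern.compile(
            "(?<WS>[ \\t\\r\\n]+)"
            + "|(?<IDENTIFIER>[a-zA-Z1-9]+(?:\\.[a-zA-Z1-9]*)*)"
            + "|(?<LOGIC>[~|&])"
            + "|(?<MATH>[<\\u2264=\\u2265>])"
            + "|(?<PAREN>[()])");

    private static final String[] KINDS = {"IDENTIFIER", "LOGIC", "MATH", "PAREN"};

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("no token matches at index " + pos);
            }
            // Record every kind except WS, which is silently dropped.
            for (String kind : KINDS) {
                if (m.group(kind) != null) {
                    tokens.add(kind + ":" + m.group(kind));
                }
            }
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [PAREN:(, IDENTIFIER:fm.a, MATH:<, IDENTIFIER:fm.b, PAREN:)]
        System.out.println(tokenize("(fm.a < fm.b)"));
    }
}
```

With the real lexer the idea is the same: the parser only ever sees the non-whitespace tokens, so the ' '* pieces in the original rules are no longer needed.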

The BOOL_EXPR shouldn't be a lexer rule. I suggest you do something like this instead:
grammar LogicGrammer;
parse
: logicalStmt EOF
;
logicalStmt
: logicalStmt LOGIC_SYMBOL logicalStmt
| logicalStmt MATH_SYMBOL logicalStmt
| '(' logicalStmt ')'
| IDENTIFIER
;
IDENTIFIER
: CHAR+ ( '.'CHAR+ )*
;
LOGIC_SYMBOL
: [~|&]
;
MATH_SYMBOL
: [<≤=≥>]
;
SPACE
: [ \t\r\n] -> skip
;
fragment CHAR
: [a-zA-Z1-9]
;
which can be tested by running the following code:
String formula = "(fm.a < fm.b) | (fm.a = fm.b)";
LogicGrammerLexer lexer = new LogicGrammerLexer(CharStreams.fromString(formula));
LogicGrammerParser parser = new LogicGrammerParser(new CommonTokenStream(lexer));
ParseTree root = parser.parse();
System.out.println(root.toStringTree(parser));

Related

Antlr3: building parse tree for qualified names

I couldn't find a question/answer that comes close to helping with my issue. Therefore, I am posting this question here.
I am trying to build a parse tree for qualified names, i.e. dot-separated words such as:
foo_boo.aaa.ccc1_c
I am using ANTLR 3 and below is my grammar.
parse
: expr
;
list_expr : <I removed the grammar here>
SimpleType : ('a'..'z'|'A'..'Z'|'_')('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
QualifiedType : SimpleType | SimpleType ('\.' SimpleType)+;
expr : list_expr
| QualifiedType
| union_expr;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; } ;
Here, SimpleType represents the grammar for a single word. My requirement is to build the grammar for QualifiedType. The current rule given above (QualifiedType : SimpleType | SimpleType ('\.'SimpleType)+;) is not working as expected. How do I write a correct grammar for qualified names (dot-separated words)?

Make QualifiedType a parser rule instead of a lexer rule:
qualifiedType : SimpleType ('.' SimpleType)*;
Also, '\.' does not need an escape: '.' is OK.
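If you only need to check and split such names outside of ANTLR, the same shape (one SimpleType followed by zero or more '.' SimpleType groups) can be sketched with a regular expression. This is an illustration only: the class QualifiedNames and its methods are hypothetical helpers, not ANTLR API, and unlike the parser rule it does not tolerate whitespace around the dots.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ANTLR: checks and splits a dot-separated name.
class QualifiedNames {
    // Same character classes as the SimpleType lexer rule.
    private static final String SIMPLE = "[A-Za-z_][A-Za-z0-9_]*";
    // One SimpleType followed by zero or more ('.' SimpleType) groups.
    private static final Pattern QUALIFIED =
            Pattern.compile(SIMPLE + "(?:\\." + SIMPLE + ")*");

    static boolean isQualifiedName(String text) {
        return QUALIFIED.matcher(text).matches();
    }

    static List<String> parts(String text) {
        if (!isQualifiedName(text)) {
            throw new IllegalArgumentException("not a qualified name: " + text);
        }
        return Arrays.asList(text.split("\\."));
    }

    public static void main(String[] args) {
        // prints [foo_boo, aaa, ccc1_c]
        System.out.println(parts("foo_boo.aaa.ccc1_c"));
    }
}
```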
EDIT
You'll have to set the output to AST and apply some tree rewrite rules to make it work properly. Here's a quick demo:
grammar T;
options {
output=AST;
}
tokens {
Root;
QualifiedName;
}
parse
: qualifiedType EOF -> ^(Root qualifiedType)
;
qualifiedType
: SimpleType ('.' SimpleType)* -> ^(QualifiedName SimpleType+)
;
SimpleType
: ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')*
;
And if you now run the code:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.CommonTree;
import org.antlr.runtime.tree.DOTTreeGenerator;
import org.antlr.stringtemplate.StringTemplate;
public class Main {
public static void main(String[] args) throws Exception {
TLexer lexer = new TLexer(new ANTLRStringStream("foo_boo.aaa.ccc1_c"));
TParser parser = new TParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.parse().getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}
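you'll get some DOT output, which corresponds to an AST with Root at the top, a QualifiedName node below it, and the three SimpleType tokens (foo_boo, aaa, ccc1_c) as its leaves.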

Why is ANTLR omitting the final token and not producing an error?

I have a grammar like this (anything which looks convoluted is a result of it being a subset of the actual grammar which contains more red herrings):
grammar Query;
startExpression
: WS? expression WS? EOF
;
expression
: maybeDefaultBooleanExpression
;
maybeDefaultBooleanExpression
: defaultBooleanExpression
| queryFragment
;
defaultBooleanExpression
: nested += queryFragment (WS nested += queryFragment)+
;
queryFragment
: unquotedQuery
| quotedQuery
;
unquotedQuery
: UNQUOTED
;
quotedQuery
: QUOTED
;
UNQUOTED
: UnquotedStartChar
UnquotedChar*
;
fragment
UnquotedStartChar
: EscapeSequence
| ~( ' ' | '\r' | '\t' | '\u000C' | '\n' | '\\' | ':'
| '"' | '\u201C' | '\u201D' // DoubleQuote
| '\'' | '\u2018' | '\u2019' // SingleQuote
| '(' | ')' | '[' | ']' | '{' | '}' | '~'
| '&' | '|' | '!' | '^' | '?' | '*' | '/' | '+' | '-' | '$' )
;
fragment
UnquotedChar
: EscapeSequence
| ~( ' ' | '\r' | '\t' | '\u000C' | '\n' | '\\' | ':'
| '"' | '\u201C' | '\u201D' // DoubleQuote
| '\'' | '\u2018' | '\u2019' // SingleQuote
| '(' | ')' | '[' | ']' | '{' | '}' | '~'
| '&' | '|' | '!' | '^' | '?' | '*' )
;
QUOTED
: '"'
QuotedChar*
'"'
;
fragment
QuotedChar
: ~( '\\'
| '"' | '\u201C' | '\u201D' // DoubleQuote
| '\r' | '\n' | '?' | '*' )
;
WS : ( ' ' | '\r' | '\t' | '\u000C' | '\n' )+;
If I call the lexer myself directly:
CharStream input = CharStreams.fromString("A \"");
QueryLexer lexer = new QueryLexer(input);
lexer.removeErrorListeners();
CommonTokenStream tokens = new CommonTokenStream(lexer);
System.out.println(tokens.LT(0));
System.out.println(tokens.LT(1));
System.out.println(tokens.LT(2));
System.out.println(tokens.LT(3));
I get:
java.lang.StringIndexOutOfBoundsException: String index out of range: 4
at java.lang.String.checkBounds(String.java:385)
at java.lang.String.<init>(String.java:462)
at org.antlr.v4.runtime.CodePointCharStream$CodePoint8BitCharStream.getText(CodePointCharStream.java:160)
at org.antlr.v4.runtime.Lexer.notifyListeners(Lexer.java:360)
at org.antlr.v4.runtime.Lexer.nextToken(Lexer.java:144)
at org.antlr.v4.runtime.BufferedTokenStream.fetch(BufferedTokenStream.java:169)
at org.antlr.v4.runtime.BufferedTokenStream.sync(BufferedTokenStream.java:152)
at org.antlr.v4.runtime.CommonTokenStream.LT(CommonTokenStream.java:100)
This makes some kind of sense, though I think a proper ANTLR exception might have been better.
What I really don't get, though, is that when I feed this through the complete parser:
QueryParser parser = new QueryParser(tokens);
parser.removeErrorListeners();
parser.addErrorListener(LoggingErrorListener.get());
parser.setErrorHandler(new BailErrorStrategy());
// Performance hack as per the ANTLR v4 FAQ
parser.getInterpreter().setPredictionMode(PredictionMode.SLL);
ParseTree expression;
try
{
expression = parser.startExpression();
}
catch (Exception e)
{
// It catches a StringIndexOutOfBoundsException here.
parser.reset();
parser.getInterpreter().setPredictionMode(PredictionMode.LL);
expression = parser.startExpression();
}
I get:
tokens = {org.antlr.v4.runtime.CommonTokenStream#1811}
channel = 0
tokenSource = {MyQueryLexer#1810}
tokens = {java.util.ArrayList#1816} size = 3
0 = {org.antlr.v4.runtime.CommonToken#1818} "[#0,0:0='A',<13>,1:0]"
1 = {org.antlr.v4.runtime.CommonToken#1819} "[#1,1:1=' ',<32>,1:1]"
2 = {org.antlr.v4.runtime.CommonToken#1820} "[#2,3:2='<EOF>',<-1>,1:3]"
p = 2
fetchedEOF = true
expression = {MyQueryParser$StartExpressionContext#1813} "[]"
children = {java.util.ArrayList#1827} size = 3
0 = {MyQueryParser$ExpressionContext#1831} "[87]"
1 = {org.antlr.v4.runtime.tree.TerminalNodeImpl#1832} " "
2 = {org.antlr.v4.runtime.tree.TerminalNodeImpl#1833} "<EOF>"
Here I would have expected to get a RecognitionException, but somehow the parsing succeeds, and is missing the invalid bit of the token data at the end.
Questions are:
(1) Is this by design?
(2) If so, how can I detect this and have it treated as a syntax error?
Further investigation
When I went looking for whoever was catching the StringIndexOutOfBoundsException and eating it, it turned out that it propagates all the way out to our catch block. So I guess ANTLR never got a chance to finish building that last invalid token?
I'm not entirely sure what's supposed to happen, but I expected that ANTLR would have caught it, created an invalid token and continued.
I then drilled further in and found that Lexer#nextToken() was throwing the exception, and the docs made it seem like that wasn't supposed to happen, so I ended up filing a ticket about that.

Until very recent builds, ANTLR4's adaptive mechanism had the "feature" of being able to recover from single-missing-token and single-extra-token parses if there was only one viable alternative at that point in the token stream. Apparently that behavior has changed recently, so if you're using an older build, as I am, you'll still see the adaptive recovery. Maybe Parr and Harwell will fix that.
Like you, I recognized the need for a perfect input stream and zero parse errors, "overlooked" or not. To create a "strict parser" follow these steps:
Make a class called, say, "StrictErrorStrategy" that extends DefaultErrorStrategy. You need to override the Recover, RecoverInline, and Sync methods. The bottom line is that we throw an exception for anything that goes wrong and make no attempt to re-sync the parser after an extra or missing token. Here's my C# code; your Java will look very similar:
public class StrictErrorStrategy : DefaultErrorStrategy
{
public override void Recover(Parser recognizer, RecognitionException e)
{
IToken token = recognizer.CurrentToken;
string message = string.Format("parse error at line {0}, position {1} right before {2} ", token.Line, token.Column, GetTokenErrorDisplay(token));
throw new Exception(message, e);
}
public override IToken RecoverInline(Parser recognizer)
{
IToken token = recognizer.CurrentToken;
string message = string.Format("parse error at line {0}, position {1} right before {2} ", token.Line, token.Column, GetTokenErrorDisplay(token));
throw new Exception(message, new InputMismatchException(recognizer));
}
public override void Sync(Parser recognizer) { /* do nothing to resync */}
}
Make a new lexer that implements a single method:
public class StrictLexer : <YourGeneratedLexerNameHere>
{
public override void Recover(LexerNoViableAltException e)
{
string message = string.Format("lex error after token {0} at position {1}", _lasttoken.Text, e.StartIndex);
throw new ParseCanceledException(BasicEnvironment.SyntaxError);
}
}
Use your lexer and strategy:
AntlrInputStream inputStream = new AntlrInputStream(stream);
StrictLexer lexer = new StrictLexer(inputStream);
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
LISBASICParser parser = new LISBASICParser(tokenStream);
parser.RemoveErrorListeners();
parser.ErrorHandler = new StrictErrorStrategy();
This works great; it is actual code from one of my projects, which has a "zero-tolerance rule" about syntax errors. I got the code and ideas from Terence Parr's great book on ANTLR4.

ANTLR parsing is not finding correct lexer parts

I am a complete newcomer to ANTLR.
I have the following ANTLR grammar:
grammar DrugEntityRecognition;
// Parser Rules
derSentence : ACTION (INT | FRACTION | RANGE) FORM TEXT;
// Lexer Rules
ACTION : 'TAKE' | 'INFUSE' | 'INJECT' | 'INHALE' | 'APPLY' | 'SPRAY' ;
INT : [0-9]+ ;
FRACTION : [1] '/' [1-9] ;
RANGE : INT '-' INT ;
FORM : ('TABLET' | 'TABLETS' | 'CAPSULE' | 'CAPSULES' | 'SYRINGE') ;
TEXT : ('A'..'Z' | WHITESPACE | ',')+ ;
WHITESPACE : ('\t' | ' ' | '\r' | '\n' | '\u000C')+ -> skip ;
And when I try to parse a sentence as follows:
String upperLine = line.toUpperCase();
org.antlr.v4.runtime.CharStream stream = new ANTLRInputStream(upperLine);
DrugEntityRecognitionLexer lexer = new DrugEntityRecognitionLexer(stream);
lexer.removeErrorListeners();
lexer.addErrorListener(ThrowingErrorListener.INSTANCE);
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
DrugEntityRecognitionParser parser = new DrugEntityRecognitionParser(tokenStream);
try {
DrugEntityRecognitionParser.DerSentenceContext ctx = parser.derSentence();
StringBuilder sb = new StringBuilder();
sb.append("ACTION: ").append(ctx.ACTION());
sb.append(", ");
sb.append("FORM: ").append(ctx.FORM());
sb.append(", ");
sb.append("INT: ").append(ctx.INT());
sb.append(", ");
sb.append("FRACTION: ").append(ctx.FRACTION());
sb.append(", ");
sb.append("RANGE: ").append(ctx.RANGE());
System.out.println(upperLine);
System.out.println(sb.toString());
} catch (ParseCancellationException e) {
//e.printStackTrace();
}
An example of the input to lexer:
take 10 Tablet (25MG) by oral route every week
In this case ACTION node is not getting populated, but take is getting recognized only as a TEXT node, not an ACTION node. 10 is being recognized as an INT node, however.
How can I modify this grammar to work correctly, where ACTION node is populated correctly (as well as FORM, which is not being populated either)?

There are several problems in your grammar:
Your TEXT rule only matches uppercase letters. Same for ACTION.
You shouldn't mix punctuation and text in a single text rule (here the comma), otherwise you cannot freely allow whitespaces between tokens.
You don't match parentheses at all, hence (25MG) is not valid input and the parser returns in an error state.
You did not check for any syntax errors, to learn what went wrong during recognition.
Also, when in doubt, always print your token sequence from the token source to see if the input has actually been tokenized as you expect. Start there to fix your grammar before you go to the parser.
About case sensitivity: typically (if your language is case-insensitive) you have rules like these:
fragment A: [aA];
fragment B: [bB];
fragment C: [cC];
fragment D: [dD];
...
to match a letter in either case and then define your keywords so:
ACTION : T A K E | I N F U S E | I N J E C T | I N H A L E | A P P L Y | S P R A Y;
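Writing out 26 fragments and spelling every keyword letter by letter is tedious, so one option is to generate that grammar text. The sketch below is only an illustration of the idea: CaseInsensitiveRules and its methods are hypothetical helpers that build strings, and nothing here calls into ANTLR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical generator for the case-insensitive rule text shown above; not ANTLR API.
class CaseInsensitiveRules {
    // Produces "fragment A: [aA];" through "fragment Z: [zZ];".
    static List<String> letterFragments() {
        List<String> lines = new ArrayList<>();
        for (char c = 'A'; c <= 'Z'; c++) {
            lines.add("fragment " + c + ": [" + Character.toLowerCase(c) + c + "];");
        }
        return lines;
    }

    // Spells a keyword as a sequence of letter fragments: "take" -> "T A K E".
    static String keyword(String word) {
        StringBuilder sb = new StringBuilder();
        for (char c : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') {
                throw new IllegalArgumentException("letters only: " + word);
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds a full lexer rule from a rule name and its keywords.
    static String rule(String name, String... words) {
        List<String> alternatives = new ArrayList<>();
        for (String w : words) {
            alternatives.add(keyword(w));
        }
        return name + " : " + String.join(" | ", alternatives) + " ;";
    }

    public static void main(String[] args) {
        letterFragments().forEach(System.out::println);
        System.out.println(rule("ACTION", "take", "infuse", "inject", "inhale", "apply", "spray"));
    }
}
```

The generated text would still have to be pasted into the grammar file by hand (or by a build step of your own).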

Regular Expressions - tree grammar Antlr Java

I'm trying to write a program in ANTLR (Java) that simplifies regular expressions. I have already written some code (grammar file contents below):
grammar Regexp_v7;
options{
language = Java;
output = AST;
ASTLabelType = CommonTree;
backtrack = true;
}
tokens{
DOT;
REPEAT;
RANGE;
NULL;
}
fragment
ZERO
: '0'
;
fragment
DIGIT
: '1'..'9'
;
fragment
EPSILON
: '#'
;
fragment
FI
: '%'
;
ID
: EPSILON
| FI
| 'a'..'z'
| 'A'..'Z'
;
NUMBER
: ZERO
| DIGIT (ZERO | DIGIT)*
;
WHITESPACE
: ('\r' | '\n' | ' ' | '\t' ) + {$channel = HIDDEN;}
;
list
: (reg_exp ';'!)*
;
term
: ID -> ID
| '('! reg_exp ')'!
;
repeat_exp
: term ('{' range_exp '}')+ -> ^(REPEAT term (range_exp)+)
| term -> term
;
range_exp
: NUMBER ',' NUMBER -> ^(RANGE NUMBER NUMBER)
| NUMBER (',') -> ^(RANGE NUMBER NULL)
| ',' NUMBER -> ^(RANGE NULL NUMBER)
| NUMBER -> ^(RANGE NUMBER NUMBER)
;
kleene_exp
: repeat_exp ('*'^)*
;
concat_exp
: kleene_exp (kleene_exp)+ -> ^(DOT kleene_exp (kleene_exp)+)
| kleene_exp -> kleene_exp
;
reg_exp
: concat_exp ('|'^ concat_exp)*
;
My next goal is to write a tree grammar that simplifies regular expressions (e.g. a|a -> a). I have done some coding (see below), but I have trouble defining a rule that treats nodes as subtrees, which is needed to simplify expressions such as (a|a)|(a|a) to a.
tree grammar Regexp_v7Walker;
options{
language = Java;
tokenVocab = Regexp_v7;
ASTLabelType = CommonTree;
output=AST;
backtrack = true;
}
tokens{
NULL;
}
bottomup
: ^('*' ^('*' e=.)) -> ^('*' $e) //a** -> a*
| ^('|' i=.* j=.* {$i.tree.toStringTree() == $j.tree.toStringTree()} )
-> $i // There are 3 errors while this line is up and running:
// 1. CommonTree cannot be resolved,
// 2. i.tree cannot be resolved or is not a field,
// 3. i cannot be resolved.
;
Small driver class:
public class Regexp_Test_v7 {
public static void main(String[] args) throws RecognitionException {
CharStream stream = new ANTLRStringStream("a***;a|a;(ab)****;ab|ab;ab|aa;");
Regexp_v7Lexer lexer = new Regexp_v7Lexer(stream);
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
Regexp_v7Parser parser = new Regexp_v7Parser(tokenStream);
list_return list = parser.list();
CommonTree t = (CommonTree) list.getTree();
System.out.println("Original tree: " + t.toStringTree());
CommonTreeNodeStream nodes = new CommonTreeNodeStream(t);
Regexp_v7Walker s = new Regexp_v7Walker(nodes);
t = (CommonTree)s.downup(t);
System.out.println("Simplified tree: " + t.toStringTree());
}
}
Can anyone help me with solving this case?
Thanks in advance and regards.

Now, I'm no expert, but in your tree grammar:
add filter=true
change the second line of bottomup rule to:
^('|' i=. j=. {i.toStringTree().equals(j.toStringTree()) }? ) -> $i
If I'm not mistaken by using i=.* you're allowing i to be non-existent and you'll get a NullPointerException on conversion to a String.
Both i and j are of type CommonTree because you've set it up this way: ASTLabelType = CommonTree, so you should call i.toStringTree().
And since it's Java and you're comparing Strings, use equals().
Also to make the expression in curly brackets a predicate, you need a question mark after the closing one.
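The effect of comparing rendered subtrees with equals(), applied bottom-up, can be sketched without ANTLR at all. The RegexNode class below is a hand-written stand-in (it is not CommonTree, and simplify is not what the generated tree walker does internally); it only illustrates why (a|a)|(a|a) collapses to a once the inner alternatives have been simplified first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hand-written stand-in for an AST node; this is NOT ANTLR's CommonTree.
class RegexNode {
    final String text;
    final List<RegexNode> children;

    RegexNode(String text, RegexNode... children) {
        this.text = text;
        this.children = new ArrayList<>(Arrays.asList(children));
    }

    // LISP-style rendering, similar in spirit to toStringTree().
    String toStringTree() {
        if (children.isEmpty()) {
            return text;
        }
        StringBuilder sb = new StringBuilder("(").append(text);
        for (RegexNode c : children) {
            sb.append(' ').append(c.toStringTree());
        }
        return sb.append(')').toString();
    }

    // Simplify the children first, then apply the rules at this node (bottom-up).
    static RegexNode simplify(RegexNode n) {
        RegexNode[] kids = new RegexNode[n.children.size()];
        for (int i = 0; i < kids.length; i++) {
            kids[i] = simplify(n.children.get(i));
        }
        // a|a -> a : compare the rendered subtrees by content with equals(), not ==.
        if (n.text.equals("|") && kids.length == 2
                && kids[0].toStringTree().equals(kids[1].toStringTree())) {
            return kids[0];
        }
        // a** -> a* : a star directly wrapping another star is redundant.
        if (n.text.equals("*") && kids.length == 1
                && kids[0].text.equals("*") && kids[0].children.size() == 1) {
            return kids[0];
        }
        return new RegexNode(n.text, kids);
    }

    public static void main(String[] args) {
        RegexNode left = new RegexNode("|", new RegexNode("a"), new RegexNode("a"));
        RegexNode right = new RegexNode("|", new RegexNode("a"), new RegexNode("a"));
        // prints a
        System.out.println(simplify(new RegexNode("|", left, right)).toStringTree());
    }
}
```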

How can I create a simple input validator by using ANTLR?

I wrote my grammar in ANTLRWorks, where it worked pretty well, and then I generated the lexer and parser.
The code executes and there's no error.
What drives me crazy is that even with wrong input everything appears to be fine: parser.prog() executes without complaint. So where is the information that I should get as the result? I just want to check whether the input is a propositional logic statement or not.
I used the command below to generate the code, but it produced errors such as not being able to find the main class:
java antlr.jar org.antlr.Tool PropLogic.g
But this command worked:
java -cp antlr.jar org.antlr.Tool PropLogic.g
Here's the Grammar :
grammar PropLogic;
NOT : '!' ;
OR : '+' ;
AND : '.' ;
IMPLIES : '->' ;
SYMBOLS : ('a'..'z') | '~' ;
OP : '(' ;
CP : ')' ;
prog : formula ;
formula : NOT formula
| OP formula( AND formula CP | OR formula CP | IMPLIES formula CP)
| SYMBOLS ;
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; } ;
Here's my code:
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
public class Tableaux {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream("a b c");
PropLogicLexer lexer = new PropLogicLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
PropLogicParser parser = new PropLogicParser(tokens);
parser.prog();
}
}

Given the following test class:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream(args[0]);
PropLogicLexer lexer = new PropLogicLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
PropLogicParser parser = new PropLogicParser(tokens);
parser.prog();
}
}
which can be invoked on *nix/MacOS like this:
java -cp .:antlr-3.2.jar Main "a b c"
or on Windows
java -cp .;antlr-3.2.jar Main "a b c"
Running this does not produce any errors, because your parser and lexer are "content" with the input. The lexer tokenizes the input into three tokens, a, b and c (spaces are ignored), and the parser rule:
prog
: formula
;
matches a single formula, which in its turn matches a SYMBOLS token. Note that although you named it SYMBOLS (plural), it only matches a single lower case letter, or tilde (~):
SYMBOLS : ('a'..'z') | '~' ;
So, in short, from the input source "a b c", only a is being parsed by your parser. You probably want your parser to consume the entire token stream, which can be done by adding the EOF (end of file) token after the entry point of your grammar:
prog
: formula EOF
;
If you run the test class again and provide "a b c" as input, the following error is produced:
line 1:2 missing EOF at 'b'
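The difference between the two versions of prog can also be sketched with a small hand-written recursive-descent recognizer. This is only a stand-in for the grammar above, not ANTLR-generated code, and the names PropLogicSketch, acceptsPrefix and acceptsWhole are made up for the illustration: the first method stops after one formula, the second additionally checks that nothing but whitespace is left.

```java
// Hand-written recursive-descent stand-in for the PropLogic grammar; NOT ANTLR-generated code.
class PropLogicSketch {
    private final String src;
    private int pos = 0;

    private PropLogicSketch(String src) {
        this.src = src;
    }

    // Mirrors "prog : formula ;" and stops after the first formula.
    static boolean acceptsPrefix(String input) {
        return new PropLogicSketch(input).formula();
    }

    // Mirrors "prog : formula EOF ;" and also requires that no input is left.
    static boolean acceptsWhole(String input) {
        PropLogicSketch p = new PropLogicSketch(input);
        if (!p.formula()) {
            return false;
        }
        p.skipWs();
        return p.pos == p.src.length();
    }

    private void skipWs() {
        while (pos < src.length() && " \t\r\n\f".indexOf(src.charAt(pos)) >= 0) {
            pos++;
        }
    }

    // Consumes the literal if it is next (after whitespace); otherwise leaves it alone.
    private boolean eat(String literal) {
        skipWs();
        if (src.startsWith(literal, pos)) {
            pos += literal.length();
            return true;
        }
        return false;
    }

    // formula : NOT formula | OP formula (AND | OR | IMPLIES) formula CP | SYMBOLS ;
    private boolean formula() {
        if (eat("!")) {
            return formula();
        }
        if (eat("(")) {
            if (!formula()) {
                return false;
            }
            if (!(eat(".") || eat("+") || eat("->"))) {
                return false;
            }
            return formula() && eat(")");
        }
        skipWs();
        if (pos < src.length()) {
            char c = src.charAt(pos);
            if ((c >= 'a' && c <= 'z') || c == '~') {
                pos++;
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(acceptsPrefix("a b c")); // prints true: only "a" was looked at
        System.out.println(acceptsWhole("a b c"));  // prints false: "b" is left over
    }
}
```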
EDIT
I tested your grammar including the EOF token:
grammar PropLogic;
prog
: formula EOF
;
formula
: NOT formula
| OP formula (AND formula CP | OR formula CP | IMPLIES formula CP)
| SYMBOLS
;
NOT : '!' ;
OR : '+' ;
AND : '.' ;
IMPLIES : '->' ;
SYMBOLS : ('a'..'z') | '~' ;
OP : '(' ;
CP : ')' ;
WHITESPACE : ('\t' | ' ' | '\r' | '\n'| '\u000C')+ { $channel = HIDDEN; } ;
with the class including the ANTLRStringStream:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream("a b c");
PropLogicLexer lexer = new PropLogicLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
PropLogicParser parser = new PropLogicParser(tokens);
parser.prog();
}
}
with both ANTLR 3.2, and ANTLR 3.3:
java -cp antlr-3.2.jar org.antlr.Tool PropLogic.g
javac -cp antlr-3.2.jar *.java
java -cp .:antlr-3.2.jar Main
line 1:2 missing EOF at 'b'
java -cp antlr-3.3.jar org.antlr.Tool PropLogic.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
line 1:2 missing EOF at 'b'
And as you can see, both produce the error message:
line 1:2 missing EOF at 'b'
