How can I create a simple input validator by using ANTLR? - java

I wrote my grammar in ANTLRWorks and it worked pretty well and then I generated lexer and parser.
Well the code executes and there's no error.
But it makes me crazy even with a wrong input everything is fine. By this I mean that parser.prog() executes just fine. So where is the information that I should get as the result? I just want to check the input to figure it out that if it is a propositional logic statement or not?
I used the below to generate the code but it had some errors like it can not find the main class!
java antlr.jar org.antlr.Tool PropLogic.g
But this code worked :
java -cp antlr.jar org.antlr.Tool PropLogic.g
Here's the Grammar :
grammar PropLogic;
NOT : '!' ;
OR : '+' ;
AND : '.' ;
IMPLIES : '->' ;
SYMBOLS : ('a'..'z') | '~' ;
OP : '(' ;
CP : ')' ;
prog : formula ;
formula : NOT formula
| OP formula( AND formula CP | OR formula CP | IMPLIES formula CP)
| SYMBOLS ;
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; } ;
Here's my code:
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
public class Tableaux {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream("a b c");
PropLogicLexer lexer = new PropLogicLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
PropLogicParser parser = new PropLogicParser(tokens);
parser.prog();
}
}

Given the following test class:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream(args[0]);
PropLogicLexer lexer = new PropLogicLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
PropLogicParser parser = new PropLogicParser(tokens);
parser.prog();
}
}
which can be invoked on *nix/MacOS like this:
java -cp .:antlr-3.2.jar Main "a b c"
or on Windows
java -cp .;antlr-3.2.jar Main "a b c"
does not produce any errors because your parser and lexer are "content" with the input. The lexer tokenizes the input into the following 3 tokens a, b and c (spaces are ignored). And the parser rule:
prog
: formula
;
matches a single formula, which in its turn matches a SYMBOLS token. Note that although you named it SYMBOLS (plural), it only matches a single lower case letter, or tilde (~):
SYMBOLS : ('a'..'z') | '~' ;
So, in short, from the input source "a b c", only a is being parsed by your parser. You probably want your parser to consume the entire token stream, which can be done by adding the EOF (end of file) token after the entry point of your grammar:
prog
: formula EOF
;
If you run the test class again and provide "a b c" as input, the following error is produced:
line 1:2 missing EOF at 'b'
EDIT
I tested you grammar including the EOF token:
grammar PropLogic;
prog
: formula EOF
;
formula
: NOT formula
| OP formula (AND formula CP | OR formula CP | IMPLIES formula CP)
| SYMBOLS
;
NOT : '!' ;
OR : '+' ;
AND : '.' ;
IMPLIES : '->' ;
SYMBOLS : ('a'..'z') | '~' ;
OP : '(' ;
CP : ')' ;
WHITESPACE : ('\t' | ' ' | '\r' | '\n'| '\u000C')+ { $channel = HIDDEN; } ;
with the class including the ANTLRStringStream:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream("a b c");
PropLogicLexer lexer = new PropLogicLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
PropLogicParser parser = new PropLogicParser(tokens);
parser.prog();
}
}
with both ANTLR 3.2, and ANTLR 3.3:
java -cp antlr-3.2.jar org.antlr.Tool PropLogic.g
javac -cp antlr-3.2.jar *.java
java -cp .:antlr-3.2.jar Main
line 1:2 missing EOF at 'b'
java -cp antlr-3.3.jar org.antlr.Tool PropLogic.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
line 1:2 missing EOF at 'b'
And as you can see, both produce the error message:
line 1:2 missing EOF at 'b'

Related

Making generated parser work in Java for ANTLR 4.8

I've been having trouble getting my generated parser to work in Java for ANTLR 4.8. There are other answers to this question, but it seems that ANTLR has changed things since 4.7 and all the other answers are before this change. My code is:
String formula = "(fm.a < fm.b) | (fm.a = fm.b)";
CharStream input = CharStreams.fromString(formula);
Antlr.LogicGrammerLexer lexer = new Antlr.LogicGrammerLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
Antlr.LogicGrammerParser parser = new Antlr.LogicGrammerParser(tokens);
ParseTree pt = new ParseTree(parser);
It appears to be reading in the formula correctly into the CharStream, but anything I try to do past that just isn't working at all. For example, if I try to print out the parse tree, nothing will be printed. The following line will print out nothing:
System.out.println(lexer._input.getText(new Interval(0, 100)));
Any advice appreciated.
EDIT: added the grammar file:
grammar LogicGrammer;
logicalStmt: BOOL_EXPR | '('logicalStmt' '*LOGIC_SYMBOL' '*logicalStmt')';
BOOL_EXPR: '('IDENTIFIER' '*MATH_SYMBOL' '*IDENTIFIER')';
IDENTIFIER: CHAR+('.'CHAR*)*;
CHAR: 'a'..'z' | 'A'..'Z' | '1'..'9';
LOGIC_SYMBOL: '~' | '|' | '&';
MATH_SYMBOL: '<' | '≤' | '=' | '≥' | '>';
This line:
ParseTree pt = new ParseTree(parser);
is incorrect. You need to call the start rule method on your parser object to get your parse tree
Antlr.LogicGrammerParser parser = new Antlr.LogicGrammerParser(tokens);
ParseTree pt = parser.logicalStmt();
So far as printing out your input, generally fields starting with an _ (like _input) are not intended for external use. Though I suspect the failure may be that you don't have 100 characters in your input stream, so the Interval is invalid. (I haven't tried it to see the exact failure)
I you include your grammar, one of us could easily attempt to generate and compile and, perhaps, be more specific.
Using your grammar, this works for me:
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.misc.Interval;
import org.antlr.v4.runtime.tree.ParseTree;
public class Logic {
public static void main(String... args) {
String formula = "(fm.a < fm.b) | (fm.a = fm.b)";
CharStream input = CharStreams.fromString(formula);
LogicGrammerLexer lexer = new LogicGrammerLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
LogicGrammerParser parser = new LogicGrammerParser(tokens);
ParseTree pt = parser.logicalStmt();
System.out.println(pt.toStringTree());
System.out.println(input.getText(new Interval(1, 28)));
}
}
output:
([] (fm.a < fm.b))
fm.a < fm.b) | (fm.a = fm.b)
BTW, a couple of minor suggestions for your grammar:
set up a rule to skip whitespace WS: [ \t\r\n]+ -> skip;
change BOOL_EXPR to a parser rule (since it's made up of a composition of tokens from other lexer rules:
grammar LogicGrammer
;
logicalStmt
: boolExpr
| '(' logicalStmt LOGIC_SYMBOL logicalStmt ')'
;
boolExpr: '(' IDENTIFIER MATH_SYMBOL IDENTIFIER ')';
IDENTIFIER: CHAR+ ('.' CHAR*)*;
CHAR: 'a' ..'z' | 'A' ..'Z' | '1' ..'9';
LOGIC_SYMBOL: '~' | '|' | '&';
MATH_SYMBOL: '<' | '≤' | '=' | '≥' | '>';
WS: [ \t\r\n]+ -> skip;
The BOOL_EXPR shouldn't be a lexer rule. I suggest you do something like this instead:
grammar LogicGrammer;
parse
: logicalStmt EOF
;
logicalStmt
: logicalStmt LOGIC_SYMBOL logicalStmt
| logicalStmt MATH_SYMBOL logicalStmt
| '(' logicalStmt ')'
| IDENTIFIER
;
IDENTIFIER
: CHAR+ ( '.'CHAR+ )*
;
LOGIC_SYMBOL
: [~|&]
;
MATH_SYMBOL
: [<≤=≥>]
;
SPACE
: [ \t\r\n] -> skip
;
fragment CHAR
: [a-zA-Z1-9]
;
which can be tested by running the following code:
String formula = "(fm.a < fm.b) | (fm.a = fm.b)";
LogicGrammerLexer lexer = new LogicGrammerLexer(CharStreams.fromString(formula));
LogicGrammerParser parser = new LogicGrammerParser(new CommonTokenStream(lexer));
ParseTree root = parser.parse();
System.out.println(root.toStringTree(parser));

Antlr3: building parse tree for qualified names

I couldn't find a question/answer that comes close to helping with my issue. Therefore, I am posting this question here.
I am trying to build a parse tree for qualified names. The below example shows an example.
E.g.,
foo_boo.aaa.ccc1_c
Here I have dot separated words. I am using antlr3 and below is my grammer.
parse
: expr
;
list_expr : <I removed the grammar here>
SimpleType : ('a'..'z'|'A'..'Z'|'_')('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
QualifiedType : SimpleType | SimpleType ('\.' SimpleType)+;
expr : list_expr
| QualifiedType
| union_expr;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; } ;
Here, SympleType represents grammar for a word. My requirement is to build the grammar for the QualifiedType. The current grammar given in above is not working as expected (QualifiedType : SimpleType | SimpleType ('\.'SimpleType)+;). How to write correct grammar for Qualified names (Dot separated words)?
Make QualifiedType a parser rule instead of a lexer rule:
qualifiedType : SimpleType ('.' SimpleType)*;
Also, '\.' does not need an escape: '.' is OK.
EDIT
You'll have to set the output to AST and apply some tree rewrite rules to make it work properly. Here's a quick demo:
grammar T;
options {
output=AST;
}
tokens {
Root;
QualifiedName;
}
parse
: qualifiedType EOF -> ^(Root qualifiedType)
;
qualifiedType
: SimpleType ('.' SimpleType)* -> ^(QualifiedName SimpleType+)
;
SimpleType
: ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')*
;
And if you now run the code:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.CommonTree;
import org.antlr.runtime.tree.DOTTreeGenerator;
import org.antlr.stringtemplate.StringTemplate;
public class Main {
public static void main(String[] args) throws Exception {
TLexer lexer = new TLexer(new ANTLRStringStream("foo_boo.aaa.ccc1_c"));
TParser parser = new TParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.parse().getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}
you'll get some DOT output, which corresponds to the following AST:

Handling EOF, white spaces, and new lines in ANTLR

I'm trying to write a grammar to handle binary numbers and compute their values:
grammar T;
options
{
backtrack=true;
}
prog :
(b2 = binarynum NEWLINE)+ EOF {System.out.println($binarynum.value);}
|
b1 = binarynum EOF {System.out.println($binarynum.value);}
;
binarynum returns [double value] :
s1=string '.' s2=string
{$value = $s1.value + $s2.value/Math.pow(2.0,$s2.length);}
|
string
{$value = $string.value;}
;
string returns [double value, int length] :
bit s2=string
{$value = $bit.value*Math.pow(2.0,$s2.length)+$s2.value; $length = $s2.length+1; }
|
bit
{$value = $bit.value; $length = 1; }
;
bit returns [double value] :
'0'
{ $value = 0;}
|
'1'
{ $value = 1;}
;
NEWLINE: ('\r')? '\n' {skip();} ;
Java code:
import org.antlr.runtime.*;
public class TestT {
public static void main(String[] args) throws Exception {
// Create an TLexer that feeds from that stream
//TLexer lexer = new TLexer(new ANTLRInputStream(System.in));
TLexer lexer = new TLexer(new ANTLRFileStream("input.txt"));
// Create a stream of tokens fed by the lexer
CommonTokenStream tokens = new CommonTokenStream(lexer);
// Create a parser that feeds off the token stream
TParser parser = new TParser(tokens);
// Begin parsing at rule prog
parser.prog();
}
}
Input File ("input.txt") contains:
11111.111
1000
1000.1
Error: line 3:4 missing EOF at '.'
I first tested the code with having just one input with the prog statement as the following:
prog :
binarynum EOF {System.out.println($binarynum.value);}
;
Everything works out just fine when I do the above modification with one input, however I can't seem to get the hang of it when using multiple inputs separated by new lines.
Can someone please help me out and tell me where I went wrong.
I also have another question, when should the EOF not be included in the grammar? When I tested the grammar for one input after removing the EOF from the grammar I received no errors and a correct output.
Can someone please help me out and tell me where I went wrong.
Your lexer is skipping line breaks while your parser uses them. Remove {skip();} from the lexer rule.
I also have another question, when should the EOF not be included in the grammar?
You'll usually have it at the end of your top level parser rule, which will force the parser to consume the entire input.

Fetching expressions from a string using ANTLR in JAVA

Given a String like..
(a+(a+b)), (d*e) :- (e-f)
Note: (d*e) and (e-f) are different expressions. How can I fetch the expressions from this string. I have the grammar defined as..
parse returns [String value]
: addExp {$value=$addExp.value;} EOF
;
addExp returns [String value]
: multExp {$value=$multExp.value;} (('+' | '-' | '*') multExp{$value+= '+' + $multExp.value;})*
;
multExp returns [String value]
: atom {$value=$atom.value;} (('*' | '/') atom {$value+=$atom.value;)*
;
atom returns [String value]
: x=ID {$value=$x.text;}
| '(' addExp ')' {$value='('+$addExp.value+')';}
;
ID : 'a'..'z' | 'A'..'Z';
I tried..
ANTLRStringStream a=new ANTLRStringStream("(a+(a+b)), (d*e) :- (e-f)");
SLexer l=new SLexer(a);
CommonTokenStream c=new CommonTokenStream(l);
SParser p=new Sparser(c);
String exp;
while(exp = p.parse())
{
System.out.println(exp);
}
I'm thinking of something like hasNext() and then fetching.
Your lexer rules TEXT possibly matches an empty string, causing the lexer to create an infinite amount of tokens. Also, you don't need all those return statements after your rule: you can simply grab what a parser (or lexer) rule matched by adding .text after it.
You could let your parser return a List<String>, or let it return a single String repeatedly invoke that parser rule until EOF is encountered.
A little demo:
grammar T;
#parser::members {
public static void main(String[] args) throws Exception {
String src = "likes(a, b) :- likes(a, X), likes(X, b). hates(a, b) " +
":- hates(a,X), hates(X,b). likes(a,b) :- says(god, likes(a,b)).";
TLexer lexer = new TLexer(new ANTLRStringStream(src));
TParser parser = new TParser(new CommonTokenStream(lexer));
List<String> statements = parser.parse();
for(String s : statements) {
System.out.println(s);
}
}
}
parse returns [List<String> statements]
#init{$statements = new ArrayList<String>();}
: (statement {$statements.add($statement.text);} ~TEXT+)+ EOF
;
statement
: TEXT OPAR params CPAR
;
params
: (param (COMMA param)*)?
;
param
: TEXT
| statement
;
COMMA : ',';
OPAR : '(';
CPAR : ')';
TEXT : ('a'..'z' | 'A'..'Z')+;
SPACE : (' ' | '\t') {$channel=HIDDEN;};
OTHER : . ;
Note that ~TEXT+ in the parse rule matches one or more tokens other than TEXT.
If you now create a lexer and parser and run the TParser class:
*nix/MacOS
java -cp antlr-3.3.jar org.antlr.Tool T.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar TParser
or
Windows
java -cp antlr-3.3.jar org.antlr.Tool T.g
javac -cp antlr-3.3.jar *.java
java -cp .;antlr-3.3.jar TParser
you will see the following being printed to your console:
likes(a, b)
likes(a, X)
likes(X, b)
hates(a, b)
hates(a,X)
hates(X,b)
likes(a,b)
says(god, likes(a,b))
EDIT
And here's how to return a single String opposed to a List<String>:
#parser::members {
public static void main(String[] args) throws Exception {
String src = "likes(a, b) :- likes(a, X), likes(X, b). hates(a, b) " +
":- hates(a,X), hates(X,b). likes(a,b) :- says(god, likes(a,b)).";
TLexer lexer = new TLexer(new ANTLRStringStream(src));
TParser parser = new TParser(new CommonTokenStream(lexer));
String s;
while((s = parser.parse()) != null) {
System.out.println(s);
}
}
}
parse returns [String s]
: statement ~(TEXT| EOF)* {$s = $statement.text;}
| EOF {$s = null;}
;
You should just be able to call sentence() repeatedly until you hit the end of input.

ANTLR NoViableAltException with JAVA

In my grammar with antlrworks, I can get noviablealtexception for rules like if, while which need corresponding right and left brackets. However, in java, i cannot get noviablealtexception.
loop_statement: (WHILE LPAREN expr RPAREN statement)
| (DO statement WHILE LPAREN expr RPAREN);
condition_statement
: IF LPAREN expr RPAREN statement (options {greedy=true;}: ELSE statement)?
In statement rule I have a block rule which is,
statement_blocks
: (LBRACE statement* RBRACE)
;
And statement rule is below,
statement
: var_dec
| statement_blocks
| condition_statement
| loop_statement
| expr_statement
;
Before posting this I've checked some examples. I think i need to add EOF at the end of each rule. When I add EOF for those rules, I get different errors. For example,
loop_statement: ((WHILE LPAREN expr RPAREN statement)
| (DO statement WHILE LPAREN expr RPAREN)) EOF;
condition_statement
: (
(IF LPAREN expr RPAREN statement (options {greedy=true;}: ELSE statement)?
)EOF
These are what I get for the following inputs;
if(s==d){
d=s;
if(a=n){
s=h;
}
a=g;
}
line 6:0 missing EOF at 'a'
When I remove the first left bracket from the first "if"
if(s==d)
d=s;
if(a=n){
s=h;
}
a=g;
}
testcases/new file line 3:0 missing EOF at 'if',
testcases/new file line 6:0 missing EOF at 'a'
while(s==d){
d=s;
while(a=n){
s=h;
}
a=g;
}
line 6:0 missing EOF at 'a'
When I remove the first left bracket from the first "while"
while(s==d)
d=s;
while(a=n){
s=h;
}
a=g;
}
testcases/new file line 3:0 missing EOF at 'while'
testcases/new file line 6:0 missing EOF at 'a'
No, you need to place EOF at the end of your "main" parser rule, not after more than one statement. By doing so, the parser expects the end of the file after such statements (which is not correct, of course).
My guess is that your entry point does not contain EOF causing the parser to stop prematurely instead of throwing an error/exception when it stumbles upon invalid input.
Here's a demo (note the EOF after the parse rule):
T.g
grammar T;
parse
: statement+ EOF
;
statement
: var_dec
| statement_blocks
| c=condition_statement {System.out.println("parsed :: " + $c.text);}
;
var_dec
: ID '=' ID ';'
;
statement_blocks
: LBRACE statement* RBRACE
;
condition_statement
: IF LPAREN expr RPAREN statement (options {greedy=true;}: ELSE statement)?
;
expr
: ID '==' ID
;
IF : 'if';
ELSE : 'else';
ID : 'a'..'z'+;
LBRACE : '{';
RBRACE : '}';
LPAREN : '(';
RPAREN : ')';
SPACE : (' ' | '\t' | '\r' | '\n')+ {skip();};
which can be tested with the class:
Main.java
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
TLexer lexer = new TLexer(new ANTLRFileStream("in.txt"));
TParser parser = new TParser(new CommonTokenStream(lexer));
parser.parse();
}
}
Testing it all
If you now parse the input file (in.txt):
if(s==d) {
d=s;
if(a==n){
s=h;
}
a=g;
}
there's no problem, as you can see:
java -cp antlr-3.3.jar org.antlr.Tool T.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
parsed :: if(a==n){s=h;}
parsed :: if(s==d){d=s;if(a==n){s=h;}a=g;}
And if you remove a ( or ) from the file in.txt, you will get the following (similar) error:
in.txt line 1:8 missing RPAREN at '{'

Categories

Resources