ANTLR4 won't display anything in Eclipse on Mac OS X - java

I am working through "The Definitive ANTLR 4 Reference" and I'm trying to run the ArrayInit.g4 example (pages 29 and 30). I have provided everything that is necessary, but when I run the example and enter the values into the console, nothing happens.
Here is the grammar :
grammar ArrayInit;
init : '{' value ( ',' value)* '}';
value : init | INT ;
INT : [0-9]+ ;
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
And here is the Test.java
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;
public class Test {
public static void main(String[] args) throws Exception {
ANTLRInputStream input = new ANTLRInputStream(System.in);
ArrayInitLexer lexer = new ArrayInitLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
ArrayInitParser parser = new ArrayInitParser(tokens);
ParseTree tree = parser.init();
System.out.println(tree.toStringTree(parser));
}
}
Given input is : {1,{2,3},4}
The expected output is : (init { (value 1) , (value (init { (value 2) , (value 3) })) , (value 4) })

Related

Handling EOF, white spaces, and new lines in ANTLR

I'm trying to write a grammar to handle binary numbers and compute their values:
grammar T;
options
{
backtrack=true;
}
prog :
(b2 = binarynum NEWLINE)+ EOF {System.out.println($binarynum.value);}
|
b1 = binarynum EOF {System.out.println($binarynum.value);}
;
binarynum returns [double value] :
s1=string '.' s2=string
{$value = $s1.value + $s2.value/Math.pow(2.0,$s2.length);}
|
string
{$value = $string.value;}
;
string returns [double value, int length] :
bit s2=string
{$value = $bit.value*Math.pow(2.0,$s2.length)+$s2.value; $length = $s2.length+1; }
|
bit
{$value = $bit.value; $length = 1; }
;
bit returns [double value] :
'0'
{ $value = 0;}
|
'1'
{ $value = 1;}
;
NEWLINE: ('\r')? '\n' {skip();} ;
Java code:
import org.antlr.runtime.*;
public class TestT {
public static void main(String[] args) throws Exception {
// Create a TLexer that feeds from that stream
//TLexer lexer = new TLexer(new ANTLRInputStream(System.in));
TLexer lexer = new TLexer(new ANTLRFileStream("input.txt"));
// Create a stream of tokens fed by the lexer
CommonTokenStream tokens = new CommonTokenStream(lexer);
// Create a parser that feeds off the token stream
TParser parser = new TParser(tokens);
// Begin parsing at rule prog
parser.prog();
}
}
Input File ("input.txt") contains:
11111.111
1000
1000.1
Error: line 3:4 missing EOF at '.'
I first tested the code with having just one input with the prog statement as the following:
prog :
binarynum EOF {System.out.println($binarynum.value);}
;
Everything works fine with the above modification and a single input, but I can't get it to work with multiple inputs separated by new lines.
Can someone please help me out and tell me where I went wrong?
I also have another question: when should the EOF not be included in the grammar? When I tested the grammar for one input after removing the EOF from the grammar, I received no errors and the correct output.
Your lexer is skipping line breaks while your parser uses them. Remove {skip();} from the lexer rule.
I also have another question, when should the EOF not be included in the grammar?
You'll usually have it at the end of your top level parser rule, which will force the parser to consume the entire input.
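To see why the skipped line breaks lead to exactly this error, here is a rough sketch in plain Java (no ANTLR involved; the class and method names are made up for illustration). It mimics a lexer that emits one token per character and either drops or keeps line breaks. With the line breaks dropped, the three input lines run together, so the parser meets a second '.' where it expects EOF, which is consistent with the reported position of line 3:4.

```java
import java.util.ArrayList;
import java.util.List;

class NewlineSkipDemo {
    // Splits the input into single-character tokens, mimicking a lexer whose
    // NEWLINE rule either skips line breaks or emits them as NEWLINE tokens.
    static List<String> tokenize(String input, boolean skipNewlines) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\r') {
                continue; // treated as the optional part of a line break
            }
            if (c == '\n') {
                if (!skipNewlines) {
                    tokens.add("NEWLINE");
                }
                continue;
            }
            tokens.add(String.valueOf(c));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "11111.111\n1000\n1000.1";
        // With skipping, the parser effectively sees one long line:
        System.out.println(String.join("", tokenize(input, true))); // 11111.11110001000.1
        // Without skipping, the NEWLINE tokens separate the numbers:
        System.out.println(String.join(" ", tokenize(input, false)));
    }
}
```

In the first printed line the second '.' is the one from "1000.1" on line 3, which is where the parser gives up.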

Antlr NoViableAltException Thrown In Java With White Spaces

I have a simple grammar defined in Antlr 3 as shown below:
grammar StringProcessor;
options {
output=AST;
}
@header {
package com.processor;
}
@rulecatch {
// ANTLR does not generate its normal rule try/catch
catch(RecognitionException e) {
throw e;
}
}
truevalue : 'true';
falsevalue : 'false';
nullvalue : 'null';
simpleValue : truevalue | falsevalue | nullvalue | STRING | INTEGER | FLOAT;
INTEGER : '0'..'9'+;
FLOAT : INTEGER'.'INTEGER;
QUOTE : '"';
SPECIALCHAR : '-'|':'|';'|'('|')'|'£'|'&'|'#'|','|'!'|'['|']'|'{'|'}'|'#'|'^'|'*'|'+'|'='|'_'|'<'|'>'|'€'|'$'|'%'|'/'|'.'|'?'|'~'|'|';
STRING : QUOTE('a'..'z'|'A'..'Z'|INTEGER|SPECIALCHAR|WS)+QUOTE;
WS : (' '|'\t'|'\f'|'\n'|'\r')+ {skip();}; // handle white space between keywords
When I try the following STRING in the ANTLRWorks interpreter:
"5Java Developer"
This works, and it includes the white space. But when I try to parse it from the Java program, it throws a NoViableAltException. I have seen other posts, but those solutions do not apply to my problem: the WS is part of the STRING. The Java program fails to parse anything containing white space, whereas the interpreter displays it correctly.
An example to show the Exception:
public static void main(String[] args) throws Exception {
String input = ("\"5Java Developer\"");
StringProcessorParser parser = buildParser(input);
CommonTree commonTree = (CommonTree) parser.simpleValue().getTree(); // exception thrown
}
public static StringProcessorParser buildParser(String query) {
CharStream cs = new ANTLRStringStream(query);
// the input needs to be lexed
StringProcessorLexer lexer = new StringProcessorLexer(cs);
CommonTokenStream tokens = new CommonTokenStream();
StringProcessorParser parser = new StringProcessorParser(tokens);
tokens.setTokenSource(lexer);
// use the ASTTreeAdaptor so that the grammar is aware to build tree in AST format
parser.setTreeAdaptor((TreeAdaptor) new ASTTreeAdaptor().getASTTreeAdaptor());
return parser;
}
By contrast, the following input parses correctly:
input = new String("\"5JavaDeveloper\"");
Any idea why this is not working?
EDIT:
I have also tried adding $channel = HIDDEN; but it still does not work:
WS : (' '|'\t'|'\f'|'\n'|'\r')+ { $channel = HIDDEN; skip();}; // handle white space between keywords
Removing the skip() has fixed my problem.
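A likely explanation, assuming the ANTLR 3 behaviour that an action in a lexer rule also runs when that rule is invoked from another lexer rule: the skip() inside WS fires while STRING is still being matched, so the whole STRING token is discarded and the parser receives nothing to match. The plain-Java model below is a simplified illustration of that effect, not ANTLR's actual implementation; every name in it is hypothetical.

```java
class SkipInsideTokenDemo {
    private final String input;
    private int pos;
    private boolean skipped; // set when a skip()-style action runs

    SkipInsideTokenDemo(String input) {
        this.input = input;
    }

    // Models the WS rule: consumes white space and optionally runs skip().
    private void ws(boolean callSkip) {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        if (callSkip) {
            skipped = true; // marks the token currently being built as skipped
        }
    }

    // Models the STRING rule: a quote, any characters (white space is matched
    // via ws), and a closing quote. Returns the matched text, or null when the
    // token was marked as skipped while it was being matched.
    String string(boolean wsCallsSkip) {
        skipped = false;
        int start = pos;
        pos++; // opening quote
        while (pos < input.length() && input.charAt(pos) != '"') {
            if (Character.isWhitespace(input.charAt(pos))) {
                ws(wsCallsSkip);
            } else {
                pos++;
            }
        }
        pos = Math.min(pos + 1, input.length()); // closing quote
        return skipped ? null : input.substring(start, pos);
    }

    public static void main(String[] args) {
        String src = "\"5Java Developer\"";
        System.out.println(new SkipInsideTokenDemo(src).string(true));  // null
        System.out.println(new SkipInsideTokenDemo(src).string(false)); // "5Java Developer"
    }
}
```

In this model a string without white space never reaches ws, which matches the observation that "5JavaDeveloper" parses while "5Java Developer" does not.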

In antlr, is there a way to get parsed text of a CommonTree in AST mode?

A simple example:
(grammar):
stat: ID '=' expr NEWLINE -> ^('=' ID expr);
expr: atom '+' atom -> ^('+' atom atom);
atom: INT | ID;
...
(input text): a = 3 + 5
The corresponding CommonTree for '3 + 5' contains a '+' token and two children (3, 5).
At this point, what is the best way to recover the original input text that parsed into this tree ('3 + 5')?
I've got the text, position and the line number of individual tokens in the CommonTree object, so theoretically it's possible to make sure only white space tokens are discarded and piece them together using this information, but it looks error prone.
Is there a better way to do this?
Better, I don't know. There is another way, of course. You decide what's better.
Another option would be to create a custom AST node class (and corresponding node-adapter) and add the matched text to this AST node during parsing. The trick here is not to use skip(), which discards the token in the lexer, but to put the token on the HIDDEN channel. The parser ignores the token either way, but the text matched by these (hidden) tokens is still available in the parser.
A quick demo: put these 3 files in a directory named demo:
demo/T.g
grammar T;
options {
output=AST;
ASTLabelType=XTree;
}
@parser::header {
package demo;
import demo.*;
}
@lexer::header {
package demo;
import demo.*;
}
parse
: expr EOF -> expr
;
expr
@after{$expr.tree.matched = $expr.text;}
: Int '+' Int ';' -> ^('+' Int Int)
;
Int
: '0'..'9'+
;
Space
: ' ' {$channel=HIDDEN;}
;
demo/XTree.java
package demo;
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
public class XTree extends CommonTree {
protected String matched;
public XTree(Token t) {
super(t);
matched = null;
}
}
demo/Main.java
package demo;
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
public class Main {
public static void main(String[] args) throws Exception {
String source = "12 + 42 ;";
TLexer lexer = new TLexer(new ANTLRStringStream(source));
TParser parser = new TParser(new CommonTokenStream(lexer));
parser.setTreeAdaptor(new CommonTreeAdaptor(){
@Override
public Object create(Token t) {
return new XTree(t);
}
});
XTree root = (XTree)parser.parse().getTree();
System.out.println("tree : " + root.toStringTree());
System.out.println("matched : " + root.matched);
}
}
You can run this demo by opening a shell, cd-ing to the directory that holds the demo directory, and executing the following:
java -cp demo/antlr-3.3.jar org.antlr.Tool demo/T.g
javac -cp demo/antlr-3.3.jar demo/*.java
java -cp .:demo/antlr-3.3.jar demo.Main
which will produce the following output:
tree : (+ 12 42)
matched : 12 + 42 ;
Another possibility is to use TokenRewriteStream which has several toString() methods.
To borrow from Bart Kiers' example, in demo/Main.java:
TokenRewriteStream tokens = new TokenRewriteStream(lexer);
TParser parser = new TParser(tokens);
...
tokens.toString(n.getTokenStartIndex(), n.getTokenStopIndex() + 1).trim()
So, given any node 'n' of your tree, calling toString() on the token stream with that node's start and stop token indexes, as above, will produce the string that "generated" this node.
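Both answers rest on the same idea: white space tokens are kept in the token stream but hidden from the parser, so the original text of any token range can be rebuilt later. The following is a minimal sketch of that idea in plain Java, without the ANTLR runtime; the Tok class and the method names are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class HiddenChannelDemo {
    // A token with its text and a flag that plays the role of the HIDDEN channel.
    static final class Tok {
        final String text;
        final boolean hidden;

        Tok(String text, boolean hidden) {
            this.text = text;
            this.hidden = hidden;
        }
    }

    // Lexes runs of digits as one token and every other character as its own
    // token; spaces stay in the list but are marked hidden instead of dropped.
    static List<Tok> lex(String src) {
        List<Tok> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) {
                    i++;
                }
                tokens.add(new Tok(src.substring(start, i), false));
            } else {
                tokens.add(new Tok(String.valueOf(c), c == ' '));
                i++;
            }
        }
        return tokens;
    }

    // What a parser would see: only the tokens that are not hidden.
    static String visibleText(List<Tok> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : tokens) {
            if (!t.hidden) {
                sb.append(t.text);
            }
        }
        return sb.toString();
    }

    // Original text for the token range [start, stop], hidden tokens included.
    static String originalText(List<Tok> tokens, int start, int stop) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i <= stop; i++) {
            sb.append(tokens.get(i).text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Tok> tokens = lex("12 + 42 ;");
        System.out.println(visibleText(tokens));                        // 12+42;
        System.out.println(originalText(tokens, 0, tokens.size() - 1)); // 12 + 42 ;
    }
}
```

Had the spaces been dropped in lex, as skip() would do, originalText could only ever return "12+42;".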

Fetching expressions from a string using ANTLR in JAVA

Given a String like..
(a+(a+b)), (d*e) :- (e-f)
Note: (d*e) and (e-f) are different expressions. How can I fetch the expressions from this string? I have the grammar defined as..
parse returns [String value]
: addExp {$value=$addExp.value;} EOF
;
addExp returns [String value]
: multExp {$value=$multExp.value;} (('+' | '-' | '*') multExp{$value+= '+' + $multExp.value;})*
;
multExp returns [String value]
: atom {$value=$atom.value;} (('*' | '/') atom {$value+=$atom.value;})*
;
atom returns [String value]
: x=ID {$value=$x.text;}
| '(' addExp ')' {$value='('+$addExp.value+')';}
;
ID : 'a'..'z' | 'A'..'Z';
I tried..
ANTLRStringStream a=new ANTLRStringStream("(a+(a+b)), (d*e) :- (e-f)");
SLexer l=new SLexer(a);
CommonTokenStream c=new CommonTokenStream(l);
SParser p=new SParser(c);
String exp;
while((exp = p.parse()) != null)
{
System.out.println(exp);
}
I'm thinking of something like hasNext() and then fetching.
Your lexer rule TEXT can match an empty string, which causes the lexer to create an infinite number of tokens. Also, you don't need all those returns clauses on your rules: you can simply grab the text a parser (or lexer) rule matched by appending .text to it.
You could let your parser return a List<String>, or let it return a single String and repeatedly invoke that parser rule until EOF is encountered.
A little demo:
grammar T;
@parser::members {
public static void main(String[] args) throws Exception {
String src = "likes(a, b) :- likes(a, X), likes(X, b). hates(a, b) " +
":- hates(a,X), hates(X,b). likes(a,b) :- says(god, likes(a,b)).";
TLexer lexer = new TLexer(new ANTLRStringStream(src));
TParser parser = new TParser(new CommonTokenStream(lexer));
List<String> statements = parser.parse();
for(String s : statements) {
System.out.println(s);
}
}
}
parse returns [List<String> statements]
@init{$statements = new ArrayList<String>();}
: (statement {$statements.add($statement.text);} ~TEXT+)+ EOF
;
statement
: TEXT OPAR params CPAR
;
params
: (param (COMMA param)*)?
;
param
: TEXT
| statement
;
COMMA : ',';
OPAR : '(';
CPAR : ')';
TEXT : ('a'..'z' | 'A'..'Z')+;
SPACE : (' ' | '\t') {$channel=HIDDEN;};
OTHER : . ;
Note that ~TEXT+ in the parse rule matches one or more tokens other than TEXT.
If you now create a lexer and parser and run the TParser class:
*nix/MacOS
java -cp antlr-3.3.jar org.antlr.Tool T.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar TParser
or
Windows
java -cp antlr-3.3.jar org.antlr.Tool T.g
javac -cp antlr-3.3.jar *.java
java -cp .;antlr-3.3.jar TParser
you will see the following being printed to your console:
likes(a, b)
likes(a, X)
likes(X, b)
hates(a, b)
hates(a,X)
hates(X,b)
likes(a,b)
says(god, likes(a,b))
EDIT
And here's how to return a single String as opposed to a List<String>:
@parser::members {
public static void main(String[] args) throws Exception {
String src = "likes(a, b) :- likes(a, X), likes(X, b). hates(a, b) " +
":- hates(a,X), hates(X,b). likes(a,b) :- says(god, likes(a,b)).";
TLexer lexer = new TLexer(new ANTLRStringStream(src));
TParser parser = new TParser(new CommonTokenStream(lexer));
String s;
while((s = parser.parse()) != null) {
System.out.println(s);
}
}
}
parse returns [String s]
: statement ~(TEXT| EOF)* {$s = $statement.text;}
| EOF {$s = null;}
;
You should just be able to call sentence() repeatedly until you hit the end of input.
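The "call it repeatedly until the end of input" pattern can be sketched without ANTLR as a small hand-written puller that returns one statement per call and null when nothing is left. This is only an illustration under simplifying assumptions (statements look like name(...), anything between them is a separator); StatementPuller and next() are hypothetical names, not part of the generated parser.

```java
class StatementPuller {
    private final String src;
    private int pos;

    StatementPuller(String src) {
        this.src = src;
    }

    // Returns the next statement of the form name(...), with balanced
    // parentheses, or null once the end of the input is reached.
    String next() {
        // Skip separators such as ":-", "," and "." between statements.
        while (pos < src.length() && !Character.isLetter(src.charAt(pos))) {
            pos++;
        }
        if (pos >= src.length()) {
            return null;
        }
        int start = pos;
        while (pos < src.length() && Character.isLetter(src.charAt(pos))) {
            pos++;
        }
        // Consume a parenthesised argument list, tracking the nesting depth.
        if (pos < src.length() && src.charAt(pos) == '(') {
            int depth = 0;
            do {
                char c = src.charAt(pos++);
                if (c == '(') {
                    depth++;
                } else if (c == ')') {
                    depth--;
                }
            } while (depth > 0 && pos < src.length());
        }
        return src.substring(start, pos);
    }

    public static void main(String[] args) {
        StatementPuller p = new StatementPuller("likes(a, b) :- likes(a, X), likes(X, b).");
        String s;
        while ((s = p.next()) != null) {
            System.out.println(s);
        }
    }
}
```

The loop in main has the same shape as the while loop around parser.parse() in the single-String version of the grammar.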

How can I create a simple input validator by using ANTLR?

I wrote my grammar in ANTLRWorks, where it worked pretty well, and then I generated the lexer and parser.
The code executes and there is no error.
But it drives me crazy that everything is fine even with wrong input: parser.prog() executes without complaint. So where is the information that I should get as the result? I just want to check whether the input is a propositional logic statement or not.
I first used the command below to generate the code, but it failed with errors such as not being able to find the main class:
java antlr.jar org.antlr.Tool PropLogic.g
But this code worked :
java -cp antlr.jar org.antlr.Tool PropLogic.g
Here's the Grammar :
grammar PropLogic;
NOT : '!' ;
OR : '+' ;
AND : '.' ;
IMPLIES : '->' ;
SYMBOLS : ('a'..'z') | '~' ;
OP : '(' ;
CP : ')' ;
prog : formula ;
formula : NOT formula
| OP formula( AND formula CP | OR formula CP | IMPLIES formula CP)
| SYMBOLS ;
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; } ;
Here's my code:
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
public class Tableaux {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream("a b c");
PropLogicLexer lexer = new PropLogicLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
PropLogicParser parser = new PropLogicParser(tokens);
parser.prog();
}
}
Given the following test class:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream(args[0]);
PropLogicLexer lexer = new PropLogicLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
PropLogicParser parser = new PropLogicParser(tokens);
parser.prog();
}
}
which can be invoked on *nix/MacOS like this:
java -cp .:antlr-3.2.jar Main "a b c"
or on Windows
java -cp .;antlr-3.2.jar Main "a b c"
does not produce any errors, because your parser and lexer are "content" with the input. The lexer tokenizes the input into 3 tokens: a, b and c (spaces are ignored). And the parser rule:
prog
: formula
;
matches a single formula, which in turn matches a SYMBOLS token. Note that although you named it SYMBOLS (plural), it only matches a single lowercase letter or a tilde (~):
SYMBOLS : ('a'..'z') | '~' ;
So, in short, from the input source "a b c", only a is being parsed by your parser. You probably want your parser to consume the entire token stream, which can be done by adding the EOF (end of file) token after the entry point of your grammar:
prog
: formula EOF
;
If you run the test class again and provide "a b c" as input, the following error is produced:
line 1:2 missing EOF at 'b'
EDIT
I tested your grammar including the EOF token:
grammar PropLogic;
prog
: formula EOF
;
formula
: NOT formula
| OP formula (AND formula CP | OR formula CP | IMPLIES formula CP)
| SYMBOLS
;
NOT : '!' ;
OR : '+' ;
AND : '.' ;
IMPLIES : '->' ;
SYMBOLS : ('a'..'z') | '~' ;
OP : '(' ;
CP : ')' ;
WHITESPACE : ('\t' | ' ' | '\r' | '\n'| '\u000C')+ { $channel = HIDDEN; } ;
with the class including the ANTLRStringStream:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream("a b c");
PropLogicLexer lexer = new PropLogicLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
PropLogicParser parser = new PropLogicParser(tokens);
parser.prog();
}
}
with both ANTLR 3.2, and ANTLR 3.3:
java -cp antlr-3.2.jar org.antlr.Tool PropLogic.g
javac -cp antlr-3.2.jar *.java
java -cp .:antlr-3.2.jar Main
line 1:2 missing EOF at 'b'
java -cp antlr-3.3.jar org.antlr.Tool PropLogic.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
line 1:2 missing EOF at 'b'
And as you can see, both produce the error message:
line 1:2 missing EOF at 'b'
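The effect of the EOF token can also be reproduced with a small hand-written recognizer for the same formula rule. The sketch below is plain Java and only approximates the generated parser: it strips white space up front instead of lexing properly, and the class name and the requireEof flag are assumptions made for this illustration.

```java
class PropLogicCheck {
    private final String tokens; // input with all white space removed
    private int pos;

    private PropLogicCheck(String input) {
        this.tokens = input.replaceAll("\\s+", "");
    }

    // formula : NOT formula | OP formula (AND | OR | IMPLIES) formula CP | SYMBOLS
    private boolean formula() {
        if (pos >= tokens.length()) {
            return false;
        }
        char c = tokens.charAt(pos);
        if (c == '!') {
            pos++;
            return formula();
        }
        if (c == '(') {
            pos++;
            if (!formula() || !operator() || !formula()) {
                return false;
            }
            if (pos < tokens.length() && tokens.charAt(pos) == ')') {
                pos++;
                return true;
            }
            return false;
        }
        if ((c >= 'a' && c <= 'z') || c == '~') {
            pos++;
            return true;
        }
        return false;
    }

    // Matches one of the binary operators '.', '+' or '->'.
    private boolean operator() {
        if (tokens.startsWith("->", pos)) {
            pos += 2;
            return true;
        }
        if (pos < tokens.length() && (tokens.charAt(pos) == '.' || tokens.charAt(pos) == '+')) {
            pos++;
            return true;
        }
        return false;
    }

    // requireEof == false models "prog : formula ;"
    // requireEof == true  models "prog : formula EOF ;"
    static boolean accepts(String input, boolean requireEof) {
        PropLogicCheck p = new PropLogicCheck(input);
        boolean ok = p.formula();
        return ok && (!requireEof || p.pos == p.tokens.length());
    }

    public static void main(String[] args) {
        System.out.println(accepts("a b c", false));    // true: only 'a' is consumed
        System.out.println(accepts("a b c", true));     // false: 'b' is left over
        System.out.println(accepts("(a -> !b)", true)); // true
    }
}
```

Without the end-of-input check the recognizer stops after the first formula and reports success, which is the behaviour the question describes.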
