ANTLR java test file can't create object of tree grammar - java

I am creating a parser using ANTLR 3.x that targets Java. I have written both a parser grammar (for creating the Abstract Syntax Tree, AST) and a tree grammar (for performing operations on the AST). Finally, to test both grammar files, I have written a test file in Java.
Have a look at the code below.
protocol grammar
grammar protocol;
options {
language = Java;
output = AST;
}
tokens{ //imaginary tokens
PROT;
INITIALP;
PROC;
TRANSITIONS;
}
@header {
import twoprocess.Configuration;
package com.javadude.antlr3.x.tutorial;
}
@lexer::header {
package com.javadude.antlr3.x.tutorial;
}
/*
parser rules, in lowercase letters
*/
program
: declaration+
;
declaration
:protocol
|initialprocess
|process
|transitions
;
protocol
:'protocol' ID ';' -> ^(PROT ID)
;
initialprocess
:'pin' '=' INT ';' -> ^(INITIALP INT)
;
process
:'p' '=' INT ';' -> ^(PROC INT)
;
transitions
:'transitions' '=' INT ('(' INT ',' INT ')') + ';' -> ^(TRANSITIONS INT INT INT*)
;
/*
lexer rules (tokens), in upper case letters
*/
ID
: ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')*;
INT
: ('0'..'9')+;
WHITESPACE
: ('\t' | ' ' | '\r' | '\n' | '\u000C')+ {$channel = HIDDEN;};
protocolWalker
grammar protocolWalker;
options {
language = Java;
//Error, eclipse can't access tokenVocab named protocol
tokenVocab = protocol; //import tokens from protocol.g i.e, from protocol.tokens file
ASTLabelType = CommonTree;
}
@header {
import twoprocess.Configuration;
package com.javadude.antlr3.x.tutorial;
}
program
: declaration+
;
declaration
:protocol
|initialprocess
|process
|transitions
;
protocol
:^(PROT ID)
{System.out.println("create protocol " +$ID.text);}
;
initialprocess
:^(INITIALP INT)
{System.out.println("");}
;
process
:^(PROC INT)
{System.out.println("");}
;
transitions
:^(TRANSITIONS INT INT INT*)
{System.out.println("");}
;
Protocoltest.java
package com.javadude.antlr3.x.tutorial;
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.runtime.tree.CommonTree;
import org.antlr.runtime.tree.CommonTreeNodeStream;
public class Protocoltest {
/**
* @param args
*/
public static void main(String[] args) throws Exception {
//create input stream from standard input
ANTLRInputStream input = new ANTLRInputStream(System.in);
//create a lexer attached to that input stream
protocolLexer lexer = new protocolLexer(input);
//create a stream of tokens pulled from the lexer
CommonTokenStream tokens = new CommonTokenStream(lexer);
//create a parser attached to the token stream
protocolParser parser = new protocolParser(tokens);
//invoke the program rule and get the return value
protocolParser.program_return r = parser.program();
CommonTree t = (CommonTree)r.getTree();
//output the extracted tree to the console
System.out.println(t.toStringTree());
//walk resulting tree; create treenode stream first
CommonTreeNodeStream nodes = new CommonTreeNodeStream(t);
//AST nodes have payloads that point into token stream
nodes.setTokenStream(tokens);
//create a tree walker attached to the nodes stream
//Error, can't create TreeGrammar object called walker
protocolWalker walker = new protocolWalker(nodes);
//invoke the start symbol, rule program
walker.program();
}
}
Problems:
In protocolWalker, I can't access the tokens (protocol.tokens)
//Error, eclipse can't access tokenVocab named protocol
tokenVocab = protocol; //import tokens from protocol.g i.e, from protocol.tokens file
In protocolWalker, can I create an object of a Java class, called Configuration, in the action list?
protocol
:^(PROT ID)
{System.out.println("create protocol " +$ID.text);
Configuration conf = new Configuration();
}
;
In Protocoltest.java
//create a tree walker attached to the nodes stream
//Error, can't create TreeGrammar object called walker
protocolWalker walker = new protocolWalker(nodes);
The object of protocolWalker can't be created. I have seen in the examples and the tutorials that such an object is created.

In protocolWalker, I can't access the tokens (protocol.tokens)...
It seems to be accessing protocol.tokens fine: changing tokenVocab to something else produces an error that it doesn't produce now. The problem with protocolWalker.g is that it's defined as a token parser (grammar protocolWalker) but it's being used like a tree parser. Defining the grammar as tree grammar protocolWalker took away the errors that I was seeing about the undefined tokens.
In protocolWalker, can I create an object of a Java class, called Configuration, in the action list?
Yes, you can. The normal Java programming caveats apply about importing the class and so on, but it's as available to you as code like System.out.println.
In Protocoltest.java ... Object of protocolWalker can't be created.
protocolWalker.g (as it is now) produces a token parser named protocolWalkerParser. When you change it to a tree grammar, it'll produce a tree parser named protocolWalker instead.
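Concretely, the change is to the first line of protocolWalker.g; a sketch of the fixed declaration and options, with the rules below them unchanged:

```
tree grammar protocolWalker;
options {
language = Java;
tokenVocab = protocol;     // import tokens from protocol.tokens
ASTLabelType = CommonTree;
}
```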
Thanks a lot for posting the whole grammars. That made answering the question much easier.

Thank you for your reply, that was a silly mistake.
The tokens problem and creating the protocolWalker object are resolved now, but whenever I change either grammar (protocol.g or protocolWalker.g), I have to write the package name again (every time) in protocolParser.java and protocolWalker.java. I had the same problem with the lexer file before, but that was overcome by the following declaration.
@header {
package com.javadude.antlr3.x.tutorial;
}
but I don't know how to overcome this problem for the generated parser and tree walker.
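One likely cause, a guess based on the posted grammars rather than something confirmed in the thread: in the header blocks of protocol.g and protocolWalker.g, the import twoprocess.Configuration line comes before the package line, but Java requires the package declaration to be the first statement in a file, so the generated protocolParser.java and protocolWalker.java do not compile until the package is fixed by hand. Keeping the package first in both header blocks should make regeneration safe:

```
@header {
package com.javadude.antlr3.x.tutorial;
import twoprocess.Configuration;
}
@lexer::header {
package com.javadude.antlr3.x.tutorial;
}
```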
Also, I have developed a GUI in Java using Swing, where I have a text area. In that text area, the user will write the input; for my grammar, the user will write:
protocol test;
pin = 5;
p = 3;
transitions = 2(5,0) (5,1);
How can I process this input in the Java Swing GUI and produce the output there?
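The text area's contents can be fed to the same lexer/parser chain by reading them with getText() and wrapping the string in an ANTLRStringStream instead of ANTLRInputStream(System.in). A minimal headless sketch; the GuiInput class and its placeholder process method are illustrative, not from the original code:

```java
import javax.swing.JTextArea;

public class GuiInput {
    // Placeholder for the real pipeline. In the actual app, replace the body with:
    //   ANTLRStringStream input = new ANTLRStringStream(text);
    //   protocolLexer lexer = new protocolLexer(input);
    //   ... (the same chain as in Protocoltest, returning the walker's result)
    static String process(String text) {
        long statements = text.chars().filter(c -> c == ';').count();
        return statements + " statement(s) read";
    }

    public static void main(String[] args) {
        JTextArea input = new JTextArea();           // the GUI text area
        input.setText("protocol test;\npin = 5;");
        String result = process(input.getText());    // feed its contents to the pipeline
        System.out.println(result);                  // display this in the GUI, e.g. a second text area
    }
}
```

The key point is only the first line of the placeholder comment: parse a String from the widget rather than System.in, then show whatever the walker produces back in the GUI.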
Moreover, if I change the following section of protocolWalker.g to
protocol
:^(PROT ID)
{
System.out.println("create protocol " +$ID.text);
Configuration conf = new Configuration();
conf.showConfiguration();
}
;
initialprocess
:^(INITIALP INT)
{System.out.println("create initial process (with state) ");}
;
process
:^(PROC INT)
{System.out.println("create processes ");}
;
and run the test file with the following input,
protocol test;
pin = 5;
p = 3;
transitions = 2(5,0) (5,1);
I get the following output
(PROT test) (INITIALP 5) (PROC 3) (TRANSITIONS 2 5 0 5 1)
create protocol test
Why are the second and third println outputs from protocolWalker.g not shown?
Any thoughts/help?
Thank you once again.

Related

Formatting string content xtext 2.14

Given a grammar (simplified version below) where I can enter arbitrary text in a section of the grammar, is it possible to format the content of the arbitrary text? I understand how to format the position of the arbitrary text in relation to the rest of the grammar, but not whether it is possible to format the content string itself?
Sample grammar
Model:
'content' content=RT
terminal RT: // (returns ecore::EString:)
'RT>>' -> '<<RT';
Sample content
content RT>>
# Some sample arbitrary text
which I would like to format
<<RT
you can add a custom ITextReplacer to the region of the string.
assuming you have a grammar like
Model:
greetings+=Greeting*;
Greeting:
'Hello' name=STRING '!';
you can do something like the following in the formatter
def dispatch void format(Greeting model, extension IFormattableDocument document) {
model.prepend[newLine]
val region = model.regionFor.feature(MyDslPackage.Literals.GREETING__NAME)
val r = new AbstractTextReplacer(document, region) {
override createReplacements(ITextReplacerContext it) {
val text = region.text
var int index = text.indexOf(SPACE);
val offset = region.offset
while (index >=0){
it.addReplacement(region.textRegionAccess.rewriter.createReplacement(offset+index, SPACE.length, "\n"))
index = text.indexOf(SPACE, index+SPACE.length()) ;
}
it
}
}
addReplacer(r)
}
this will turn this model
Hello "A B C"!
into
Hello "A
B
C"!
of course you need to come up with a more sophisticated formatter logic.
see How to define different indentation levels in the same document with Xtext formatter too

Nashorn Abstract Syntax Tree Traversal

I am attempting to parse this Javascript via Nashorn:
function someFunction() { return b + 1 };
and navigate to all of the statements, including statements inside the function.
The code below just prints:
"function {U%}someFunction = [] function {U%}someFunction()"
How do I "get inside" the function node to its body "return b + 1"? I presume I need to traverse the tree with a visitor and get the child node?
I have been following the second answer to the following question:
Javascript parser for Java
import jdk.nashorn.internal.ir.Block;
import jdk.nashorn.internal.ir.FunctionNode;
import jdk.nashorn.internal.ir.Statement;
import jdk.nashorn.internal.parser.Parser;
import jdk.nashorn.internal.runtime.Context;
import jdk.nashorn.internal.runtime.ErrorManager;
import jdk.nashorn.internal.runtime.Source;
import jdk.nashorn.internal.runtime.options.Options;
import java.util.List;
public class Main {
public static void main(String[] args){
Options options = new Options("nashorn");
options.set("anon.functions", true);
options.set("parse.only", true);
options.set("scripting", true);
ErrorManager errors = new ErrorManager();
Context context = new Context(options, errors, Thread.currentThread().getContextClassLoader());
Source source = Source.sourceFor("test", "function someFunction() { return b + 1; } ");
Parser parser = new Parser(context.getEnv(), source, errors);
FunctionNode functionNode = parser.parse();
Block block = functionNode.getBody();
List<Statement> statements = block.getStatements();
for(Statement statement: statements){
System.out.println(statement);
}
}
}
Using private/internal implementation classes of the Nashorn engine is not a good idea. With a security manager on, you'll get an access exception. With JDK 9 and beyond, you'll get a module access error with or without a security manager (since the jdk.nashorn.internal.* packages are not exported from the nashorn module).
You have two options to parse JavaScript using Nashorn:
Nashorn parser API: https://docs.oracle.com/javase/9/docs/api/jdk/nashorn/api/tree/Parser.html
To use the Parser API, you need JDK 9+.
For JDK 8, you can use parser.js:
load("nashorn:parser.js");
and call the "parse" function from a script. This function returns a JSON object that represents the AST of the parsed script.
See this sample: http://hg.openjdk.java.net/jdk8u/jdk8u-dev/nashorn/file/a6d0aec77286/samples/astviewer.js

how to get details of the pl sql package after parsing in java

I have a .pkb file. It contains a package, and under that package it has multiple functions.
I have to get the following details out of it:
package name
function names (for all functions one by one)
params in function
return type of function
Approach: I am parsing the pkb file. I have taken the grammar from these sources:
Presto
ANTLRv4 grammar for PL/SQL
After getting these grammars I downloaded antlr-4.5.3-complete.jar. Then, using
java -cp antlr-4.5.3-complete.jar org.antlr.v4.Tool grammar.g
I executed this command on each grammar separately to generate the listener, lexer, parser, and other files.
After this I created two projects in Eclipse, one for each grammar. I imported the generated files into the respective projects and added antlr-4.5.3-complete.jar to the build path. Then I used the following code to check whether my .pkb file is correct:
public static void parse(String file) {
try {
SqlBaseLexer lex = new SqlBaseLexer(new org.antlr.v4.runtime.ANTLRFileStream(file));
CommonTokenStream tokens = new CommonTokenStream(lex);
SqlBaseParser parser = new SqlBaseParser(tokens);
System.err.println(parser.getNumberOfSyntaxErrors()+" Errors");
} catch (RecognitionException e) {
System.err.println(e.toString());
} catch (java.lang.OutOfMemoryError e) {
System.err.println(file + ":");
System.err.println(e.toString());
} catch (java.lang.ArrayIndexOutOfBoundsException e) {
System.err.println(file + ":");
System.err.println(e.toString());
}
}
I am not getting any error in parsing the file.
But after this I am stuck with next steps. I need to get all the package name, functions, params etc.
How to get these details?
Also, is my approach correct for attaining the required output?
The Presto grammar is a generic SQL grammar which is not suitable for parsing Oracle packages. The ANTLRv4 grammar for PL/SQL is the right tool for your task.
Generally, an ANTLR grammar as such works as a validator. When you want to do some additional processing while parsing, you should use ANTLR actions (see the overview slide in this presentation). These are blocks of text written in the target language (e.g. Java) and enclosed in curly braces (see the documentation).
There are at least two ways to solve your task with ANTLR actions.
Stdout output
The simplest way is to add println()s for certain rules.
To print the package name, modify the package_body rule in plsql.g4 as follows:
package_body
: BODY package_name (IS | AS) package_obj_body*
(BEGIN seq_of_statements | END package_name?)
{System.out.println("Package name is "+$package_name.text);}
;
Similarly, to print information about a function's arguments and return type, add println()s in the create_function_body rule. But there is an issue with printing the parameters. If you use $parameter.text it will return the name, type specification, and default value according to the parameter rule without spaces (as a token sequence). If you add println() to the parameter rule and use $parameter_name.text it will print all parameters' names (including parameters of procedures, not only functions). So you can add an ANTLR return value to the parameter rule and assign $parameter_name.text to it:
parameter returns [String p_name]
: parameter_name (IN | OUT | INOUT | NOCOPY)*
type_spec? default_value_part?
{$p_name=$parameter_name.text;}
;
Thus, in the context of create_function_body, we can access the parameter's name via $parameter.p_name:
create_function_body
: (CREATE (OR REPLACE)?)? FUNCTION function_name
{System.out.println("Parameters of function "+$function_name.text+":");}
('(' parameter {System.out.println($parameter.p_name);}
(',' parameter {System.out.println($parameter.p_name);})* ')')?
RETURN type_spec
(invoker_rights_clause|parallel_enable_clause|result_cache_clause|DETERMINISTIC)*
((PIPELINED? (IS | AS) (DECLARE? declare_spec* body | call_spec))
| (PIPELINED | AGGREGATE) USING implementation_type_name) ';'
{System.out.println("Return type of function "
+$function_name.text+" is "
+ $type_spec.text);}
;
Accumulation
Also, you can save some intermediate results to variables and access them as parser class members. E.g. you can accumulate function names in the variable func_name. For this, add a @members section at the beginning of the grammar:
grammar plsql;
@members {
String func_name = "";
}
And modify function_name rule as follows:
function_name
: id ('.' id_expression)? {func_name = func_name+$id.text + " ";}
;
Using lexer and parser classes
Here is an example application, parse.java, that runs your parser:
import org.antlr.v4.runtime.*;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
public class parse {
static String readFile(String path) throws IOException
{
byte[] encoded = Files.readAllBytes(Paths.get(path));
return new String(encoded, "UTF-8");
}
public static void main(String[] args) throws Exception {
// create input stream `in`
ANTLRInputStream in = new ANTLRInputStream( readFile(args[0]) );
// create lexer `lex` with `in` at input
plsqlLexer lex = new plsqlLexer(in);
// create token stream `tokens` with `lex` at input
CommonTokenStream tokens = new CommonTokenStream(lex);
// create parser with `tokens` at input
plsqlParser parser = new plsqlParser(tokens);
// call start rule of parser
parser.sql_script();
// print func_name
System.out.println("Function names: "+parser.func_name);
}
}
Compile and run
After this, generate the Java code with ANTLR:
java org.antlr.v4.Tool plsql.g4
and compile your Java code:
javac plsqlLexer.java plsqlParser.java plsqlListener.java parse.java
then run it for some .pkb file:
java parse green_tools.pkb
You can find modified parse.java, plsql.g4 and green_tools.pkb here.

Traversal of tokens using ParserRuleContext in listener - ANTLR4

While iterating over the tokens using a listener, how can I use the ParserRuleContext to peek at the next token, or the next few tokens, in the token stream?
In the code below I am trying to peek at all the tokens after the current token till the EOF:
@Override
public void enterSemicolon(JavaParser.SemicolonContext ctx) {
Token tok, semiColon = ctx.getStart();
int currentIndex = semiColon.getStartIndex();
int reqInd = currentIndex+1;
TokenSource tokSrc= semiColon.getTokenSource();
CharStream srcStream = semiColon.getInputStream();
srcStream.seek(currentIndex);
while(true){
tok = tokSrc.nextToken() ;
System.out.println(tok);
if(tok.getText()=="<EOF>"){break;}
srcStream.seek(reqInd++);
}
}
But the output I get is:
.
.
.
[#-1,131:130='',<-1>,13:0]
[#-1,132:131='',<-1>,13:0]
[#-1,133:132='',<-1>,13:0]
...
[#-1,160:159='',<-1>,13:0]
[#-1,161:160='<EOF>',<-1>,13:0]
(a similar run of empty tokens ending in <EOF> repeats for each semicolon)
.
.
.
We see that although I am able to traverse through all the tokens till EOF, I am unable to get the actual content or type of the tokens. I would like to know if there is a neat way of doing this using listener traversal.
Hard to be certain, but
tok = tokSrc.nextToken() ;
appears to be rerunning the lexer, starting at a presumed proper token boundary, but without having reset the lexer. The lexer throwing errors might explain the observed behavior.
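A separate pitfall in the posted loop, unrelated to the lexer issue: tok.getText()=="<EOF>" compares object references, not string contents, so the break may or may not fire depending on string interning. "<EOF>".equals(tok.getText()), or better, checking the token type against Token.EOF, is the safe form. A minimal stdlib illustration:

```java
public class StringCompare {
    public static void main(String[] args) {
        String a = "<EOF>";
        String b = new String("<EOF>");   // distinct object, same contents
        System.out.println(a == b);       // false: reference comparison
        System.out.println(a.equals(b));  // true: content comparison
    }
}
```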
Still, a better approach would be to simply recover the existing Token stream:
public class Walker implements YourJavaListener {
CommonTokenStream tokens;
public Walker(JavaParser parser) {
tokens = (CommonTokenStream) parser.getTokenStream();
}
then access the stream to get the desired tokens:
@Override
public void enterSemicolon(JavaParser.SemicolonContext ctx) {
TerminalNode semi = ctx.semicolon(); // adjust as needed for your impl.
Token tok = semi.getSymbol();
int idx = tok.getTokenIndex();
while(tok.getType() != IntStream.EOF) {
System.out.println(tok);
tok = tokens.get(++idx); // advance past the current token before reading
}
}
An entirely different approach that might serve your ultimate purpose is to get a limited set of tokens directly from the parent context:
ParserRuleContext pctx = ctx.getParent();
List<TerminalNode> nodes = pctx.getTokens(pctx.getStart(), pctx.getStop());

Using RapidMiner Textprocessing plugin in Java - Not able to use 'Document' object in the code

I am using RapidMiner 5. I want to make a text preprocessing module to use with a categorization system. I created a process in RapidMiner with these steps.
Tokenize
Transform Case
Stemming
Filtering stopwords
Generating n-grams
I want to write a script to do spell correction for these words, so I used the 'Execute Script' operator and wrote a Groovy script to do this (from here: raelcunha). This is the code (with help from the RapidMiner community) I wrote in the Execute Script operator of RapidMiner:
Document doc=input[0]
List<Token> newTokens = new LinkedList<Token>();
nWords=train("set2.txt")
for (Token token : doc.getTokenSequence()) {
//String output=correct((String)token.getToken(),nWords)
println token.getToken();
Token nToken = new Token(correct("garbge",nWords), token);
newTokens.add(nToken);
}
doc.setTokenSequence(newTokens);
return doc;
This is the code for spell correction (thanks to Norvig):
import com.rapidminer.operator.text.Document;
import com.rapidminer.operator.text.Token;
import java.util.List;
import java.util.LinkedList;
def train(f){
def n = [:]
new File(f).eachLine{it.toLowerCase().eachMatch(/\w+/){n[it]=n[it]?n[it]+1:1}}
n
}
def edits(word) {
def result = [], n = word.length()-1
for(i in 0..n) result.add(word[0..<i] + word.substring(i+1))
for(i in 0..n-1) result.add(word[0..<i] + word[i+1] + word[i, i+1] + word.substring(i+2))
for(i in 0..n) for(c in 'a'..'z') result.add(word[0..<i] + c + word.substring(i+1))
for(i in 0..n) for(c in 'a'..'z') result.add(word[0..<i] + c + word.substring(i))
result
}
def correct(word, nWords) {
if(nWords[word]) return word
def list = edits(word), candidates = [:]
for(s in list) if(nWords[s]) candidates[nWords[s]] = s
if(candidates.size() > 0) return candidates[candidates.keySet().max()]
for(s in list) for(w in edits(s)) if(nWords[w]) candidates[nWords[w]] = w
return candidates.size() > 0 ? candidates[candidates.keySet().max()] : word
}
I am getting String index out of bounds exception while calling edits method.
And I do not know how to debug this, because RapidMiner just tells me there is an issue in the Execute Script operator without saying which line of the script caused it.
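A guess at the exception, not confirmed in the thread: in Groovy, for a one-character word, n = word.length()-1 is 0, so the transposition range 0..n-1 becomes 0..-1, which is a non-empty reverse range; the loop then evaluates word[i+1] past the end of the string. Empty tokens break the deletion loop the same way. A bounds-safe Java sketch of the same edits logic (the class name Edits is illustrative):

```java
import java.util.HashSet;
import java.util.Set;

public class Edits {
    // All candidate strings at edit distance 1: deletions, transpositions,
    // replacements, and insertions, with loops that are safe for short words.
    static Set<String> edits(String word) {
        Set<String> result = new HashSet<>();
        int n = word.length();
        for (int i = 0; i < n; i++)        // deletions
            result.add(word.substring(0, i) + word.substring(i + 1));
        for (int i = 0; i < n - 1; i++)    // transpositions (skipped entirely for n < 2)
            result.add(word.substring(0, i) + word.charAt(i + 1)
                    + word.charAt(i) + word.substring(i + 2));
        for (int i = 0; i < n; i++)        // replacements
            for (char c = 'a'; c <= 'z'; c++)
                result.add(word.substring(0, i) + c + word.substring(i + 1));
        for (int i = 0; i <= n; i++)       // insertions
            for (char c = 'a'; c <= 'z'; c++)
                result.add(word.substring(0, i) + c + word.substring(i));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(edits("ab").contains("ba")); // transposition is generated
        System.out.println(edits("a").size());          // no exception for 1-char words
    }
}
```

Guarding the Groovy loops the same way (skip transpositions when the word is shorter than two characters, skip empty tokens) should make the script survive whatever tokens RapidMiner feeds it.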
So I am planning to do the same thing by creating an operator in Java, as mentioned here: How to extend RapidMiner.
The things I did:
Included all JAR files from the RapidMiner lib folder (C:\Program Files (x86)\Rapid-I\RapidMiner5\lib) in the build path of my Java project.
Started coding using the same guide the link to which is given above.
The input for my operator is a Document (com.rapidminer.operator.text.Document), as in the script.
But I am not able to use this Document object in the code. Can you tell me why? Where are the Text Processing plugin JARs located?
To use the plugin JARs, should I add some other locations to the build path?
