I used the Java.g grammar from the ANTLR wiki to produce a lexer and parser for Java source files. I then use the following code to generate an abstract syntax tree (AST).
ANTLRInputStream input = new ANTLRInputStream(new FileInputStream(fileName));
JavaLexer lexer = new JavaLexer(input); // create lexer
// create a buffer of tokens pulled from the lexer
CommonTokenStream tokens = new CommonTokenStream(lexer);
JavaParser parser = new JavaParser(tokens); // create parser
JavaParser.javaSource_return r = parser.javaSource(); // parse rule 'javaSource'
/*RuleReturnScope result = parser.compilationUnit();
CommonTree t = (CommonTree) result.getTree();*/
// WALK TREE
// get the tree from the return structure for rule prog
CommonTree t = (CommonTree)r.getTree();
Then I modify the AST. For example, I replace "File file = new File(filepath, fileType);" with
"S3Object _file = new S3Object(_fileName);" by modifying the AST node. After this, I want to translate the AST back to Java source code. I modified JavaTreeParser.g, wrote a StringTemplate group file, and use the following code to get the Java source:
FileReader groupFileR = new FileReader("src/com/googlecode/zcg/templates/JavaTemplate.stg");
StringTemplateGroup templates = new StringTemplateGroup(groupFileR);
groupFileR.close();
// create a stream of tree nodes from AST built by parser
CommonTreeNodeStream nodes = new CommonTreeNodeStream(t);
// tell it where it can find the token objects
nodes.setTokenStream(tokens);
JavaTreeParser walker = new JavaTreeParser(nodes); // create the tree Walker
walker.setTemplateLib(templates); // where to find templates
// invoke rule prog, passing in information from parser
JavaTreeParser.javaSource_return r2 = walker.javaSource();
// EMIT BYTE CODES
// get template from return values struct
StringTemplate output = (StringTemplate)r2.getTemplate();
System.out.println(output.toString()); // render full template
If I don't modify the AST, I get the correct Java source code, but after I modify the AST, the generated source is wrong (even though the AST itself was modified correctly). For example, if I take the following source code as input, translate it to an AST, and then change "File file = new File(filepath, fileType);" to "S3Object _file = new S3Object(_fileName);":
public void methodname(String address){
String filepath = "file";
int fileType = 3;
File file = new File(filepath, fileType);
}
the result will be the following:
public void methodname( String address)
{
String filepath="file";
int fileType=3;
methodname (Stringaddress){Stringfilepath;//it's not what I wanted
}
Am I doing it wrong? Is there a more proper way for me to solve this problem?
Unfortunately, I cannot recommend doing source-to-source translation by rewriting abstract syntax trees; try using parse trees instead. If I remember correctly, ANTLR 3 can also generate those easily.
Ter
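One way to read the advice above: rather than regenerating the whole file from a modified tree, keep the original text and replace only the character span covered by the statement you want to change. ANTLR 3's TokenRewriteStream works along broadly similar lines. The sketch below is a plain-Java stand-in, not ANTLR API: SpanRewriter and its methods are hypothetical names, and the span offsets are found with indexOf here, whereas a real tool would presumably take them from the start and stop tokens of the matched subtree.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of ANTLR: records replacements against the
// original text and applies them without touching anything else.
class SpanRewriter {
    private static final class Edit {
        final int start;   // inclusive offset into the original text
        final int end;     // exclusive offset into the original text
        final String text; // replacement text
        Edit(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    private final String source;
    private final List<Edit> edits = new ArrayList<>();

    SpanRewriter(String source) {
        this.source = source;
    }

    // Queue a replacement of source[start, end) with the given text.
    void replace(int start, int end, String text) {
        edits.add(new Edit(start, end, text));
    }

    // Apply queued edits from right to left so earlier offsets stay valid.
    // Overlapping edits are not handled in this sketch.
    String render() {
        List<Edit> sorted = new ArrayList<>(edits);
        sorted.sort(Comparator.comparingInt((Edit e) -> e.start).reversed());
        StringBuilder out = new StringBuilder(source);
        for (Edit e : sorted) {
            out.replace(e.start, e.end, e.text);
        }
        return out.toString();
    }

    // Rewrites the statement from the question and leaves the rest untouched.
    static String rewriteExample() {
        String src = "String filepath = \"file\";\n"
                   + "int fileType = 3;\n"
                   + "File file = new File(filepath, fileType);\n";
        String oldStmt = "File file = new File(filepath, fileType);";
        // Assumption: in a real tool the offsets come from the matched node's tokens.
        int start = src.indexOf(oldStmt);
        SpanRewriter rewriter = new SpanRewriter(src);
        rewriter.replace(start, start + oldStmt.length(),
                "S3Object _file = new S3Object(_fileName);");
        return rewriter.render();
    }

    public static void main(String[] args) {
        System.out.print(rewriteExample());
    }
}
```

Because untouched text is copied verbatim, whitespace and comments survive, which is the main thing a template-based regeneration from an AST tends to lose.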
I am trying to compile a parse tree pattern for an HTML grammar. The code below shows how to parse a string containing an htmlAttribute rule:
String code = "href=\"val\"";
CharStream chars = CharStreams.fromString(code);
Lexer lexer = new HTMLLexer(chars);
lexer.pushMode(HTMLLexer.TAG);
TokenStream tokens = new CommonTokenStream(lexer);
HTMLParser parser = new HTMLParser(tokens);
parser.htmlAttribute();
But when I try to compile the same string as a pattern:
ParseTreePatternMatcher matcher = new ParseTreePatternMatcher(lexer, parser);
matcher.compile(code, HTMLParser.RULE_htmlAttribute);
it fails with error:
line 1:0 no viable alternative at input 'href="val"'
org.antlr.v4.runtime.NoViableAltException
at org.antlr.v4.runtime.atn.ParserATNSimulator.noViableAlt(ParserATNSimulator.java:2026)
at org.antlr.v4.runtime.atn.ParserATNSimulator.execATN(ParserATNSimulator.java:467)
at org.antlr.v4.runtime.atn.ParserATNSimulator.adaptivePredict(ParserATNSimulator.java:393)
at org.antlr.v4.runtime.ParserInterpreter.visitDecisionState(ParserInterpreter.java:316)
at org.antlr.v4.runtime.ParserInterpreter.visitState(ParserInterpreter.java:223)
at org.antlr.v4.runtime.ParserInterpreter.parse(ParserInterpreter.java:194)
at org.antlr.v4.runtime.tree.pattern.ParseTreePatternMatcher.compile(ParseTreePatternMatcher.java:205)
When I tried:
List<? extends Token> tokenList = matcher.tokenize(code);
The result contained a single token, the same as when using the lexer with DEFAULT_MODE. Is there some way to fix this?
The problem was the following code from ParseTreePatternMatcher::tokenize:
TextChunk textChunk = (TextChunk)chunk;
ANTLRInputStream in = new ANTLRInputStream(textChunk.getText());
lexer.setInputStream(in);
Token t = lexer.nextToken();
Lexer::setInputStream clears _modeStack and sets _mode to 0. One possible solution is to extend ParseTreePatternMatcher, override method tokenize and insert lexer.pushMode(lexerMode) after lexer.setInputStream(in):
TextChunk textChunk = (TextChunk)chunk;
ANTLRInputStream in = new ANTLRInputStream(textChunk.getText());
lexer.setInputStream(in);
lexer.pushMode(lexerMode);
Token t = lexer.nextToken();
But the tokenize method uses Chunk and TextChunk, which cannot be accessed from outside their package, so the extension class has to be defined in the same package as ParseTreePatternMatcher.
Another solution I'm considering is to modify the bytecode of the method using ASM.
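The mechanism described above can be modeled without ANTLR. The following is a minimal sketch in plain Java: ToyLexer, setInput and the two helper methods are hypothetical names that only imitate the mode bookkeeping (a mode stack that is cleared whenever new input is set), so it illustrates why the pushed TAG mode disappears and why pushing it again after the reset helps. It does not tokenize anything.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a moded lexer; it only models the mode bookkeeping.
class ModeResetDemo {
    static class ToyLexer {
        static final int DEFAULT_MODE = 0;
        static final int TAG = 1;

        private final Deque<Integer> modeStack = new ArrayDeque<>();
        private int mode = DEFAULT_MODE;
        private String input = "";

        // Mirrors the behavior described above: new input resets the mode state.
        void setInput(String input) {
            this.input = input;
            modeStack.clear();
            mode = DEFAULT_MODE;
        }

        // Remember the current mode and switch to the new one.
        void pushMode(int newMode) {
            modeStack.push(mode);
            mode = newMode;
        }

        int currentMode() {
            return mode;
        }
    }

    // What the stock tokenize effectively does: a mode pushed earlier is lost.
    static int modeAfterPlainReset(ToyLexer lexer, String chunk) {
        lexer.setInput(chunk);
        return lexer.currentMode();
    }

    // The workaround: push the wanted mode again after every reset.
    static int modeAfterResetAndPush(ToyLexer lexer, String chunk, int lexerMode) {
        lexer.setInput(chunk);
        lexer.pushMode(lexerMode);
        return lexer.currentMode();
    }

    public static void main(String[] args) {
        ToyLexer lexer = new ToyLexer();
        lexer.pushMode(ToyLexer.TAG); // set up by the caller, as in the question
        System.out.println("plain reset:    mode " + modeAfterPlainReset(lexer, "href=\"val\""));
        System.out.println("reset and push: mode "
                + modeAfterResetAndPush(lexer, "href=\"val\"", ToyLexer.TAG));
    }
}
```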
I have a Lexer and a Parser called y86 Lexer and Parser which work as far as I know. But I have a file with y86 commands and I want to parse them using Java. So far I have code as follows.
y86Lexer y86 = null;
CommonTokenStream tokenStream = null;
y86Parser y86p = null;
try
{
y86 = new y86Lexer(CharStreams.fromFileName("C:\\Users\\saigbomian\\Documents"
+ "\\LearnANTLR\\src\\sum.ys"));
tokenStream = new CommonTokenStream(y86);
y86p = new y86Parser(tokenStream);
}
catch (IOException e)
{
log.error("Error occurred while reading from file");
e.printStackTrace();
}
I'm not sure how to do the parsing. I have seen people use something like y86Parser.CompilationUnitContext, but I can't seem to find that class. I have tried printing from the listeners ANTLR creates, but I don't know how to trigger these listeners.
For each rule ruleName in your grammar, the y86Parser class will contain a class named RuleNameContext and a method named ruleName(), which will parse the input according to that rule and return an instance of the RuleNameContext class containing the parse tree. You can then use listeners or visitors to walk that parse tree.
So if you don't have a compilationUnit method or a CompilationUnitContext class, your grammar probably just doesn't have a rule named compilationUnit. Instead you should pick a rule that you do have and call the method corresponding to that rule.
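To make the naming pattern concrete, here is a hand-written toy that mirrors the shape described above: one method per rule, each returning a matching context object. It is not ANTLR output. The grammar in the comment, the class ToyParser and the sample input are all made up for illustration, and real generated context classes carry much more than this.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written toy, only to mirror the shape of an ANTLR-generated parser.
// Hypothetical grammar:  program : instr* ;   instr : NAME NUMBER ;
class ToyParser {
    static class ProgramContext {
        final List<InstrContext> instrs = new ArrayList<>();
    }

    static class InstrContext {
        final String name;
        final int number;
        InstrContext(String name, int number) {
            this.name = name;
            this.number = number;
        }
    }

    private final String[] tokens;
    private int pos = 0;

    ToyParser(String input) {
        String trimmed = input.trim();
        this.tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Rule 'program': the entry point that consumes the whole input.
    ProgramContext program() {
        ProgramContext ctx = new ProgramContext();
        while (pos < tokens.length) {
            ctx.instrs.add(instr());
        }
        return ctx;
    }

    // Rule 'instr': can also be called directly to parse a single instruction.
    InstrContext instr() {
        if (pos + 1 >= tokens.length) {
            throw new IllegalStateException("expected NAME NUMBER at token " + pos);
        }
        String name = tokens[pos++];
        int number = Integer.parseInt(tokens[pos++]);
        return new InstrContext(name, number);
    }

    public static void main(String[] args) {
        // Pick the rule to start from by calling its method.
        ProgramContext tree = new ToyParser("load 4 store 7").program();
        for (InstrContext i : tree.instrs) {
            System.out.println(i.name + " " + i.number);
        }
    }
}
```

The point is only the correspondence: if the grammar has no rule called compilationUnit, there is no compilationUnit() method and no CompilationUnitContext class to find.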
I am trying to read the contents of a PDF file using Java-Selenium. Below is my code. getWebDriver is a custom method in the framework. It returns the webdriver.
URL urlOfPdf = new URL(this.getWebDriver().getCurrentUrl());
BufferedInputStream fileToParse = new BufferedInputStream(urlOfPdf.openStream());
PDFParser parser = new PDFParser((RandomAccessRead) fileToParse);
parser.parse();
String output = new PDFTextStripper().getText(parser.getPDDocument());
The PDFParser line gives a compile-time error if I don't cast the stream to RandomAccessRead.
And when I do cast it, I get this runtime error:
java.lang.ClassCastException: java.io.BufferedInputStream cannot be cast to org.apache.pdfbox.io.RandomAccessRead
I need help with getting rid of these errors.
First off, unless you want to interfere in the PDF loading process, there is no need to use the PDFParser class explicitly. You can instead use a static PDDocument.load method:
URL urlOfPdf = new URL(this.getWebDriver().getCurrentUrl());
BufferedInputStream fileToParse = new BufferedInputStream(urlOfPdf.openStream());
PDDocument document = PDDocument.load(fileToParse);
String output = new PDFTextStripper().getText(document);
Otherwise, if you do want to interfere in the loading process, you have to create a RandomAccessRead instance for your BufferedInputStream. You cannot simply cast it, because the two classes are not related.
You can do that like this:
URL urlOfPdf = new URL(this.getWebDriver().getCurrentUrl());
BufferedInputStream fileToParse = new BufferedInputStream(urlOfPdf.openStream());
MemoryUsageSetting memUsageSetting = MemoryUsageSetting.setupMainMemoryOnly();
ScratchFile scratchFile = new ScratchFile(memUsageSetting);
PDFParser parser;
try
{
RandomAccessRead source = scratchFile.createBuffer(fileToParse);
parser = new PDFParser(source);
parser.parse();
}
catch (IOException ioe)
{
IOUtils.closeQuietly(scratchFile);
throw ioe;
}
String output = new PDFTextStripper().getText(parser.getPDDocument());
(This essentially is copied and pasted from the source of PDDocument.load.)
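The difference between casting and wrapping can be shown without PDFBox. In the sketch below, RandomRead is a hypothetical interface standing in for a random-access reader type, and buffer is a made-up adapter that copies the stream into memory; neither is PDFBox API. The cast compiles because the target is an interface, but fails at runtime because the stream's class does not implement it, which is the same situation as in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class CastVersusWrap {
    // Hypothetical interface standing in for a random-access reader type.
    interface RandomRead {
        int length();
        int byteAt(int position);
    }

    // Adapter: copies the stream into memory so any position can be read.
    static RandomRead buffer(InputStream in) {
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                copy.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        final byte[] data = copy.toByteArray();
        return new RandomRead() {
            @Override public int length() { return data.length; }
            @Override public int byteAt(int position) { return data[position] & 0xFF; }
        };
    }

    // A cast only relabels a reference; it throws when the object's class
    // does not implement the target type.
    static boolean castFails(InputStream in) {
        try {
            RandomRead unused = (RandomRead) in;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream("%PDF".getBytes(StandardCharsets.US_ASCII));
        System.out.println("cast fails: " + castFails(in));
        System.out.println("buffered length: " + buffer(in).length());
    }
}
```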
Antlr4 has a new class ParseTreeWalker. But how do I use it? I am looking for a minimal working example. My grammar file is 'gram.g4' and I want to parse a file 'program.txt'
Here is my code so far. (This assumes ANTLR has run my grammar file and created all of the gramBaseListener, gramLexer, etc etc):
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;
import static org.antlr.v4.runtime.CharStreams.fromFileName;
public class launch{
public static void main(String[] args) {
CharStream cs = fromFileName("gram.g4"); //load the file
gramLexer lexer = new gramLexer(cs); //instantiate a lexer
CommonTokenStream tokens = new CommonTokenStream(lexer); //scan stream for tokens
gramParser parser = new gramParser(tokens); //parse the tokens
// Now what?? How do I connect the above with the below?
ParseTreeWalker walker = new ParseTreeWalker(); // how do I use this to parse program.txt??
}}
I am using java but I assume it is similar in other languages.
The ANTLR documentation (http://www.antlr.org/api/Java/index.html) is short on examples. There are many tutorials on the internet, but they are mostly for ANTLR version 3. The few that use version 4 don't work or are outdated (for example, there is no parser.init() function, and classes like ANTLRInputStream are deprecated).
Thanks in advance to anyone who can help.
For each of your parser rules in your grammar the generated parser will have a corresponding method with that name. Calling that method will start parsing at that rule.
Therefore, if your "root rule" is named start, you'd start parsing via gramParser.start(), which returns a ParseTree. This tree can then be fed into the ParseTreeWalker together with the listener you want to use.
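Conceptually, the walker does a depth-first traversal of the tree and calls the listener's enter callback before a node's children and its exit callback after them. The following is a toy model of that idea in plain Java, not the ANTLR classes: Node, Listener, walk and trace are hypothetical names, and the real ParseTreeWalker also handles terminals and error nodes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a tree walk; Node, Listener and walk are hypothetical names.
class WalkerSketch {
    static class Node {
        final String rule;
        final List<Node> children = new ArrayList<>();
        Node(String rule, Node... kids) {
            this.rule = rule;
            for (Node k : kids) {
                children.add(k);
            }
        }
    }

    interface Listener {
        void enter(Node node);
        void exit(Node node);
    }

    // Depth-first walk: enter the node, walk children left to right, then exit.
    static void walk(Listener listener, Node node) {
        listener.enter(node);
        for (Node child : node.children) {
            walk(listener, child);
        }
        listener.exit(node);
    }

    // Collects the callback order so it can be inspected or printed.
    static List<String> trace(Node root) {
        final List<String> events = new ArrayList<>();
        walk(new Listener() {
            @Override public void enter(Node n) { events.add("enter " + n.rule); }
            @Override public void exit(Node n) { events.add("exit " + n.rule); }
        }, root);
        return events;
    }

    public static void main(String[] args) {
        Node tree = new Node("start", new Node("stmt"), new Node("expr"));
        for (String event : trace(tree)) {
            System.out.println(event);
        }
    }
}
```

The listener never drives the traversal itself; it only reacts to the callbacks, which is why nothing happens until a walker (or the parser, via a parse listener) is handed both the tree and the listener.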
All in all it could look something like this (EDITED BY OP):
import java.io.IOException;
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;
import static org.antlr.v4.runtime.CharStreams.fromFileName;
public class launch{
public static void main(String[] args) throws IOException { // fromFileName can throw IOException
CharStream cs = fromFileName("program.txt"); //load the file
gramLexer lexer = new gramLexer(cs); //instantiate a lexer
CommonTokenStream tokens = new CommonTokenStream(lexer); //scan stream for tokens
gramParser parser = new gramParser(tokens); //parse the tokens
ParseTree tree = parser.start(); // parse the content and get the tree
Mylistener listener = new Mylistener();
ParseTreeWalker walker = new ParseTreeWalker();
walker.walk(listener,tree);
}}
************ NEW FILE Mylistener.java ************
import org.antlr.v4.runtime.ParserRuleContext;
public class Mylistener extends gramBaseListener {
@Override public void enterEveryRule(ParserRuleContext ctx) { //see gramBaseListener for allowed functions
System.out.println("rule entered: " + ctx.getText()); //code that executes per rule
}
}
Of course, you have to replace Mylistener with your own implementation of gramBaseListener.
And just one small side note: in Java it is convention to start class names with a capital letter, and I'd advise you to stick to that to make the code more readable for other people.
The following example should work with ANTLR 4.8.
Below the example you can find references for setting up your Java environment, the API, and listeners.
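import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
public class Launch {
public static void main(String[] args) throws IOException { // opening, reading and closing the file can throw IOException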
InputStream inputStream = null;
MyprogramLexer programLexer = null;
try {
File file = new File("/program.txt");
inputStream = new FileInputStream(file);
programLexer = new MyprogramLexer(CharStreams.fromStream(inputStream)); // read your program input and create lexer instance
} finally {
if (inputStream != null) {
inputStream.close();
}
}
/* assuming a basic grammar:
myProgramStart: TOKEN1 otherRule TOKEN2 ';' | TOKENX finalRule ';'
...
*/
CommonTokenStream tokens = new CommonTokenStream(programLexer); // get tokens
MyParser parser = new MyParser(tokens);
MyProgramListener listener = new MyProgramListener(); // your custom extension from BaseListener
parser.addParseListener(listener);
parser.myProgramStart(); // myProgramStart is your grammar rule to parse; parsing fires the parse listener registered above
// what did we build?
MyProgram myProgramInstance = listener.getMyProgram(); // in your listener implementation populate a MyProgram instance
System.out.println(myProgramInstance.toString());
}
}
References:
https://www.antlr.org/api/Java/
https://tomassetti.me/antlr-mega-tutorial/#java-setup
https://riptutorial.com/antlr/example/16571/listener-events-using-labels
I am using ANTLR v4 to extract the parse tree of Java programs for other purposes. I started from this sample: ANTLR v4 visitor sample
I followed the steps in the given link to check that it works, and everything went right:
java Run
a = 1+2
b = a^2
c = a+b*(a-1)
a+b+c
^Z
Result: 33.0
Then I wrote my own project to parse Java programs, with the structure below:
|_Java.g4
|_Java.tokens
|_JavaBaseVisitor.java
|_JavaLexer.java
|_JavaLexer.tokens
|_JavaParser.java
|_JavaTreeExtractorVisitor.java
|_JavaVisitor.java
|_Run.java
And the Run.java is as below:
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;
public class Run {
public static void main(String[] args) throws Exception {
CharStream input = CharStreams.fromFileName("F:\\Projects\\Java\\Netbeans\\ASTProj\\JavaTreeExtractor\\prog.java");
JavaLexer lexer = new JavaLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
JavaParser parser = new JavaParser(tokens);
ParseTree tree = parser.getContext();
JavaTreeExtractorVisitor calcVisitor = new JavaTreeExtractorVisitor();
String result = calcVisitor.visit(tree);
System.out.println("Result: " + result);
}
}
But at the statement ParseTree tree = parser.getContext(); the tree object is null.
As I am new to ANTLR, are there any suggestions for what to check, or any solution?
(If more info is required, just notify me).
TG.
Assuming you're using the grammar here, you want the starting point for parsing a Java file to be
ParseTree tree = parser.compilationUnit();
(For anyone not using that grammar, you want whatever you named your top-level parser rule.)
Shouldn't you be doing:
ParseTree tree = parser.input();
as in the calculator example?