Antlr pattern matching and lexer modes

I am trying to compile a parse tree pattern for an HTML grammar. The code below shows how I parse a string that matches the htmlAttribute rule:
String code = "href=\"val\"";
CharStream chars = CharStreams.fromString(code);
Lexer lexer = new HTMLLexer(chars);
lexer.pushMode(HTMLLexer.TAG);
TokenStream tokens = new CommonTokenStream(lexer);
HTMLParser parser = new HTMLParser(tokens);
parser.htmlAttribute();
But when I try to compile the same string as a pattern:
ParseTreePatternMatcher matcher = new ParseTreePatternMatcher(lexer, parser);
matcher.compile(code, HTMLParser.RULE_htmlAttribute);
it fails with error:
line 1:0 no viable alternative at input 'href="val"'
org.antlr.v4.runtime.NoViableAltException
at org.antlr.v4.runtime.atn.ParserATNSimulator.noViableAlt(ParserATNSimulator.java:2026)
at org.antlr.v4.runtime.atn.ParserATNSimulator.execATN(ParserATNSimulator.java:467)
at org.antlr.v4.runtime.atn.ParserATNSimulator.adaptivePredict(ParserATNSimulator.java:393)
at org.antlr.v4.runtime.ParserInterpreter.visitDecisionState(ParserInterpreter.java:316)
at org.antlr.v4.runtime.ParserInterpreter.visitState(ParserInterpreter.java:223)
at org.antlr.v4.runtime.ParserInterpreter.parse(ParserInterpreter.java:194)
at org.antlr.v4.runtime.tree.pattern.ParseTreePatternMatcher.compile(ParseTreePatternMatcher.java:205)
When I tried:
List<? extends Token> tokenList = matcher.tokenize(code);
The result contained a single token, the same as when using the lexer with DEFAULT_MODE. Is there some way to fix this?

The problem was the following code from ParseTreePatternMatcher::tokenize:
TextChunk textChunk = (TextChunk)chunk;
ANTLRInputStream in = new ANTLRInputStream(textChunk.getText());
lexer.setInputStream(in);
Token t = lexer.nextToken();
Lexer::setInputStream resets the lexer: it clears _modeStack and sets _mode to 0 (DEFAULT_MODE), so the TAG mode pushed earlier is lost. One possible solution is to extend ParseTreePatternMatcher, override the tokenize method, and insert lexer.pushMode(lexerMode) after lexer.setInputStream(in):
TextChunk textChunk = (TextChunk)chunk;
ANTLRInputStream in = new ANTLRInputStream(textChunk.getText());
lexer.setInputStream(in);
lexer.pushMode(lexerMode);
Token t = lexer.nextToken();
But tokenize uses Chunk and TextChunk, which cannot be accessed from outside their package, so the subclass has to be defined in the same package as ParseTreePatternMatcher.
Another solution I'm considering is to modify the bytecode of the method using ASM.
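To make the reset behaviour concrete, here is a minimal stand-alone sketch. ToyLexer is a hypothetical stand-in that only tracks modes the way the answer describes (it is not ANTLR's Lexer), and the TAG mode number is made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ModeResetDemo {
    public static final int DEFAULT_MODE = 0;
    public static final int TAG = 1; // hypothetical mode number

    /** Toy model of a moded lexer; an assumption for illustration, not ANTLR code. */
    public static class ToyLexer {
        private final Deque<Integer> modeStack = new ArrayDeque<>();
        private int mode = DEFAULT_MODE;
        private String input = "";

        public void pushMode(int newMode) {
            modeStack.push(mode); // remember the mode we came from
            mode = newMode;
        }

        /** Mirrors the behaviour described above: new input resets all mode state. */
        public void setInput(String text) {
            input = text;
            modeStack.clear();
            mode = DEFAULT_MODE;
        }

        public int currentMode() { return mode; }
        public String input() { return input; }
    }

    /** Returns the mode the lexer is in when it starts tokenizing the new input. */
    public static int modeAfterReset(boolean pushModeAgain) {
        ToyLexer lexer = new ToyLexer();
        lexer.pushMode(TAG);               // what the question does up front
        lexer.setInput("href=\"val\"");    // what tokenize does internally
        if (pushModeAgain) {
            lexer.pushMode(TAG);           // the proposed fix
        }
        return lexer.currentMode();
    }

    public static void main(String[] args) {
        System.out.println(modeAfterReset(false)); // 0: the TAG mode was lost
        System.out.println(modeAfterReset(true));  // 1: the TAG mode was restored
    }
}
```

The point of the sketch is only the ordering: any mode pushed before the input is replaced is gone, so the push has to happen after it.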

Related

Parse a file using ANTLR4

I have a Lexer and a Parser called y86 Lexer and Parser which work as far as I know. But I have a file with y86 commands and I want to parse them using Java. So far I have code as follows.
y86Lexer y86 = null;
CommonTokenStream tokenStream = null;
y86Parser y86p = null;
try
{
    y86 = new y86Lexer(CharStreams.fromFileName("C:\\Users\\saigbomian\\Documents"
            + "\\LearnANTLR\\src\\sum.ys"));
    tokenStream = new CommonTokenStream(y86);
    y86p = new y86Parser(tokenStream);
}
catch (IOException e)
{
    log.error("Error occurred while reading from file");
    e.printStackTrace();
}
I'm not sure how to do the parsing. I have seen people use something like y86Parser.CompilationUnitContext, but I can't seem to find that class. I have tried printing from the listeners ANTLR creates, but I don't know how to trigger these listeners.

For each rule ruleName in your grammar, the y86Parser class will contain a class named RuleNameContext and a method named ruleName(), which will parse the input according to that rule and return an instance of the RuleNameContext class containing the parse tree. You can then use listeners or visitors to walk that parse tree.
So if you don't have a compilationUnit method or a CompilationUnitContext class, your grammar probably just doesn't have a rule named compilationUnit. Instead you should pick a rule that you do have and call the method corresponding to that rule.
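As a rough illustration of that naming convention, the sketch below derives the context class name from a rule name and uses reflection to check whether a parser class has a matching rule method. ToyParser and its program rule are hypothetical stand-ins for a generated parser, so that the example runs without ANTLR on the classpath.

```java
import java.lang.reflect.Method;

public class RuleLookup {
    /** Hypothetical stand-in for a generated parser with a single rule named "program". */
    public static class ToyParser {
        public static class ProgramContext { }
        public ProgramContext program() { return new ProgramContext(); }
    }

    /** Rule "ruleName" maps to a context class named "RuleNameContext". */
    public static String contextClassName(String ruleName) {
        return Character.toUpperCase(ruleName.charAt(0)) + ruleName.substring(1) + "Context";
    }

    /** True if parserClass has a no-argument method ruleName() returning RuleNameContext. */
    public static boolean hasRule(Class<?> parserClass, String ruleName) {
        try {
            Method method = parserClass.getMethod(ruleName);
            return method.getReturnType().getSimpleName().equals(contextClassName(ruleName));
        } catch (NoSuchMethodException e) {
            return false; // the grammar has no rule with that name
        }
    }

    public static void main(String[] args) {
        System.out.println(contextClassName("compilationUnit"));     // CompilationUnitContext
        System.out.println(hasRule(ToyParser.class, "program"));         // true
        System.out.println(hasRule(ToyParser.class, "compilationUnit")); // false
    }
}
```

With a real generated parser you would normally just look at the grammar file for the top-level rule; the reflection check is only a quick way to see which rule methods exist.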

ANTLR v4, JavaLexer and JavaParser returning null as parse tree

I am using ANTLR v4 to extract the parse tree of Java programs. I started from this sample: ANTLR v4 visitor sample
I followed the steps in that link to check that it works, and everything went fine:
java Run
a = 1+2
b = a^2
c = a+b*(a-1)
a+b+c
^Z
Result: 33.0
Then I wrote my own project to parse Java programs, with the structure below:
|_Java.g4
|_Java.tokens
|_JavaBaseVisitor.java
|_JavaLexer.java
|_JavaLexer.tokens
|_JavaParser.java
|_JavaTreeExtractorVisitor.java
|_JavaVisitor.java
|_Run.java
And the Run.java is as below:
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;

public class Run {
    public static void main(String[] args) throws Exception {
        CharStream input = CharStreams.fromFileName("F:\\Projects\\Java\\Netbeans\\ASTProj\\JavaTreeExtractor\\prog.java");
        JavaLexer lexer = new JavaLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        JavaParser parser = new JavaParser(tokens);
        ParseTree tree = parser.getContext();
        JavaTreeExtractorVisitor calcVisitor = new JavaTreeExtractorVisitor();
        String result = calcVisitor.visit(tree);
        System.out.println("Result: " + result);
    }
}
But after the statement ParseTree tree = parser.getContext(); the tree object is null.
As I am new to ANTLR, are there any suggestions for what to check, or any solution?
(If more info is required, just notify me).
TG.

Assuming you're using the grammar here, you want the starting point for parsing a Java file to be
ParseTree tree = parser.compilationUnit();
(For anyone not using that grammar, you want whatever you named your top-level parser rule.)

Shouldn't you be doing:
ParseTree tree = parser.input();
as in the calculator example?

Antlr4 CommonTokenStream Constructor

I took this source code from the ANTLR4 documentation site.
JavaLexer lexer = new JavaLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
JavaParser parser = new JavaParser(tokens);
JavaParser.CompilationUnitContext tree = parser.compilationUnit();// parse a compilationUnit
But
new CommonTokenStream(lexer)
is problematic, because there is no constructor for new CommonTokenStream(lexer). There are constructors new CommonTokenStream() and new CommonTokenStream(TokenStream), but many examples on the internet, including the one on the ANTLR4 documentation site shown above, use that constructor.
Also there is no constructor for JavaParser(CommonTokenStream).
Thanks

There are constructors new CommonTokenStream() and new CommonTokenStream(TokenStream)
No, there are two constructors in CommonTokenStream:
CommonTokenStream(TokenSource tokenSource)
CommonTokenStream(TokenSource tokenSource, int channel)
not the two you mention. See: http://www.antlr.org/api/Java/org/antlr/v4/runtime/CommonTokenStream.html
Because there is no constructor for new CommonTokenStream(lexer).
Every generated lexer extends ANTLR4's Lexer class, which implements TokenSource, so doing new CommonTokenStream(lexer) will be just fine.
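The type relationship can be sketched without ANTLR on the classpath. The nested types below are simplified stand-ins that only borrow the names from the answer (they are assumptions for illustration, not the real ANTLR classes), and they show why a constructor declared with a TokenSource parameter accepts a generated lexer.

```java
public class TokenSourceDemo {
    // Simplified stand-ins; the real ANTLR types have many more members.
    interface TokenSource { String nextToken(); }

    static abstract class Lexer implements TokenSource { }

    static class JavaLexer extends Lexer {
        @Override
        public String nextToken() { return "EOF"; }
    }

    static class CommonTokenStream {
        private final TokenSource source;

        // Declared with the interface type, so any Lexer subclass is accepted.
        CommonTokenStream(TokenSource source) { this.source = source; }

        String first() { return source.nextToken(); }
    }

    public static String demo() {
        JavaLexer lexer = new JavaLexer();
        // Compiles because JavaLexer is a Lexer, and Lexer is a TokenSource.
        return new CommonTokenStream(lexer).first();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // EOF
    }
}
```

If the real code does not compile, a likely cause is a mismatch between the runtime jar and the version used to generate the lexer, but that is a guess rather than something the question confirms.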

Csv: search for String and replace with another string

I have a .csv file that contains:
scenario, custom, master_data
1, ${CUSTOM}, A_1
I have a string:
a, b, c
and I want to replace 'custom' with 'a, b, c'. How can I do that and save to the existing .csv file?

Probably the easiest way is to read in one file and write to another file as you go, modifying it line by line.
You could try something with tokenizers. This may not be completely correct for your input/output, but you can adapt it to your CSV file formatting:
BufferedReader reader = new BufferedReader(new FileReader("input.csv"));
BufferedWriter writer = new BufferedWriter(new FileWriter("output.csv"));
String custom = "custom"; // the field to look for; adapt it to your file, e.g. "${CUSTOM}"
String replace = "a, b, c";
for (String line = reader.readLine(); line != null; line = reader.readLine())
{
    StringBuilder output = new StringBuilder();
    StringTokenizer tokenizer = new StringTokenizer(line, ",");
    while (tokenizer.hasMoreTokens())
    {
        // trim() drops the space that follows each comma in the sample file
        String token = tokenizer.nextToken().trim();
        if (output.length() > 0)
            output.append(", ");
        output.append(token.equals(custom) ? replace : token);
    }
    writer.write(output.toString());
    writer.newLine();
}
reader.close();
writer.close();
If this is a one-off thing, it also has the benefit of not having to research regular expressions (which are quite powerful and useful, and good to know, but maybe for a later date?).

Have a look at Can you recommend a Java library for reading (and possibly writing) CSV files?
Once the values have been read, search for strings that start with ${ and end with }. Use a Java regular expression like \$\{(\w+)\}. Then use a map to look up the found key and its related value. Java Properties would be a good candidate.
Then write a new csv file.
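A minimal sketch of that lookup, assuming the placeholder names consist of word characters only. The class and method names are made up for illustration, and unknown keys are left untouched, which is a design choice of this sketch rather than something the answer specifies.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderReplacer {
    // Matches ${KEY} and captures KEY in group 1.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    /** Replaces every ${KEY} in line with the value stored for KEY in values. */
    public static String expand(String line, Properties values) {
        Matcher matcher = PLACEHOLDER.matcher(line);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String key = matcher.group(1);
            // Fall back to the original text when the key is unknown.
            String value = values.getProperty(key, matcher.group());
            // quoteReplacement keeps '$' and '\' in the value from being interpreted.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties values = new Properties();
        values.setProperty("CUSTOM", "a, b, c");
        System.out.println(expand("1, ${CUSTOM}, A_1", values)); // 1, a, b, c, A_1
    }
}
```

Note that the replacement value contains commas, so the resulting line has more fields than the original unless the value is quoted when the CSV is written.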

Since your replacement string is quite unique you can do it quickly without complicated parsing by just reading your file into a buffer, and then converting that buffer into a string. Replace all occurrences of the text you wish to replace with your target text. Then convert the string to a buffer and write that back to the file...
Pattern.quote is required because replaceAll treats its first argument as a regular expression, and ${CUSTOM} contains characters that have a special meaning in regular expressions. If you don't quote it you may run into unexpected results.
Also, it's generally not smart to overwrite your source file directly. It is better to create a new file, then delete the old one and rename the new file to the old name. An error halfway through will then not destroy your data.
final Path yourPath = Paths.get("Your path");
byte[] buff = Files.readAllBytes(yourPath);
String s = new String(buff, Charset.defaultCharset());
s = s.replaceAll(Pattern.quote("${CUSTOM}"), "a, b, c");
Files.write(yourPath, s.getBytes());
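The safer variant described above (write a new file, then swap it in) could look roughly like this. It is a sketch under a few assumptions: the file is UTF-8, it is small enough to read into memory, and the class and method names are invented for the example. String.replace is used instead of replaceAll because it treats the target literally, so no quoting is needed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SafeReplace {
    /** Replaces every literal occurrence of target in file, going through a temp file. */
    public static void replaceInFile(Path file, String target, String replacement)
            throws IOException {
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        String updated = content.replace(target, replacement); // literal, not a regex

        // Create the temp file next to the original so the final move stays on one file system.
        Path directory = file.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(directory, "replace", ".tmp");
        try {
            Files.write(temp, updated.getBytes(StandardCharsets.UTF_8));
            // The original is only replaced once the new content is fully written.
            Files.move(temp, file, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(temp); // cleans up if writing or moving failed
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("scenario", ".csv");
        Files.write(file, "scenario, custom, master_data\n1, ${CUSTOM}, A_1\n"
                .getBytes(StandardCharsets.UTF_8));
        replaceInFile(file, "${CUSTOM}", "a, b, c");
        System.out.print(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```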

ANTLR:Translate the modified AST to java source code by stringtemplate

I use the grammar Java.g from the ANTLR wiki to produce a lexer and parser for Java source files. Then I use the following code to generate an abstract syntax tree (AST).
ANTLRInputStream input = new ANTLRInputStream(new FileInputStream(fileName));
JavaLexer lexer = new JavaLexer(input); // create lexer
// create a buffer of tokens pulled from the lexer
CommonTokenStream tokens = new CommonTokenStream(lexer);
JavaParser parser = new JavaParser(tokens); // create parser
JavaParser.javaSource_return r = parser.javaSource(); // parse rule 'javaSource'
/*RuleReturnScope result = parser.compilationUnit();
CommonTree t = (CommonTree) result.getTree();*/
// WALK TREE
// get the tree from the return structure for rule prog
CommonTree t = (CommonTree)r.getTree();
Then I modify the AST. For example, I replace "File file = new File(filepath, fileType);" with
"S3Object _file = new S3Object(_fileName);" by modifying the AST node. After this, I want to translate the AST back to Java source code. I modified JavaTreeParser.g, wrote a StringTemplate group, and use the following code to get the Java source code:
FileReader groupFileR = new FileReader("src/com/googlecode/zcg/templates/JavaTemplate.stg");
StringTemplateGroup templates = new StringTemplateGroup(groupFileR);
groupFileR.close();
// create a stream of tree nodes from AST built by parser
CommonTreeNodeStream nodes = new CommonTreeNodeStream(t);
// tell it where it can find the token objects
nodes.setTokenStream(tokens);
JavaTreeParser walker = new JavaTreeParser(nodes); // create the tree Walker
walker.setTemplateLib(templates); // where to find templates
// invoke rule prog, passing in information from parser
JavaTreeParser.javaSource_return r2 = walker.javaSource();
// EMIT BYTE CODES
// get template from return values struct
StringTemplate output = (StringTemplate)r2.getTemplate();
System.out.println(output.toString()); // render full template
If I don't modify the AST, I get the Java source code back correctly, but after I modify the AST, the generated source code is wrong (even though the AST itself was modified correctly). For example, if I input the following source code, translate it to an AST, and then change "File file = new File(filepath, fileType);" to "S3Object _file = new S3Object(_fileName);":
public void methodname(String address){
String filepath = "file";
int fileType = 3;
File file = new File(filepath, fileType);
}
the result will be the following:
public void methodname( String address)
{
String filepath="file";
int fileType=3;
methodname (Stringaddress){Stringfilepath;//it's not what I wanted
}
Am I doing it wrong? Is there a more proper way for me to solve this problem?

Unfortunately, I cannot recommend doing source-to-source translation by rewriting abstract syntax trees; try using parse trees instead. If I remember correctly, ANTLR 3 can also generate those easily.
Ter
