Antlr: Decision can match with multiple alternatives (starting with an illegal token?) - java

I have a grammar in ANTLR to parse the format of a file I save. I reduced the grammar to the part that is not working, and I hope someone can clarify. Here is the grammar:
grammar OptFile;
parseFile returns [java.util.List<java.util.List<java.util.List<String>>> list] :
{ list = new java.util.ArrayList<List<List<String>>>(); }
vc=VARIABLESCAPTION v=variables oc=OBJECTIVECAPTION o=objective
{ list.add($v.list); list.add($o.list); }
;
variables returns [java.util.List<java.util.List<String>> list] :
{ list = new java.util.ArrayList<List<String>>(); }
(v=variable { list.add($v.list); } )*
;
variable returns [java.util.List<String> list] :
{ list = new java.util.ArrayList<String>(); }
n=characters ';' t=characters ';' lb=characters ';' ub=characters ';'
{ list.add($n.string); list.add($t.string); list.add($lb.string); list.add($ub.string); }
;
objective returns [java.util.List<String> list] :
{ list = new java.util.ArrayList<String>(); }
t=characters ';' { list.add($t.string); }
(
'PIECEWISE;' pw=piecewisefunction { list.add($pw.string); }
| 'REGULAR;' rf=characters ';' { list.add($rf.string); }
);
piecewisefunction returns [String string] :
( characters ';' characters ';' characters ';' characters ';' )*
{ string = getText(); }
;
characters returns [String string] :
( ~(';') )* { string = getText(); }
;
VARIABLESCAPTION : '--Variables:--' ;
OBJECTIVECAPTION : '--ObjectiveFunction:--' ;
A valid input should look like this:
--Variables--x;INTEGER;0;INFTY;y;CONTINUOUS;-12;13;--ObjectiveFunction--MAX;13x^27+SIN(y);
or like this:
--Variables--x;INTEGER;12;20;--ObjectiveFunction--MAX;x;12;x;16;0,5x;16;x;20;
After '--Variables--' there can be arbitrarily many variables with four fields each; after '--ObjectiveFunction--' there is one field and then either one more field or arbitrarily many "packs" of four fields.
When compiling with ANTLR, I get the following warning:
warning(200): OptFile.g:26:37:
Decision can match input such as "OBJECTIVECAPTION {OBJECTIVECAPTION..VARIABLESCAPTION, 'PIECEWISE;'..'REGULAR;'} ';' 'PIECEWISE;' {OBJECTIVECAPTION..VARIABLESCAPTION, 'PIECEWISE;'..'REGULAR;'} ';' {OBJECTIVECAPTION..VARIABLESCAPTION, 'PIECEWISE;'..'REGULAR;'} ';' {OBJECTIVECAPTION..VARIABLESCAPTION, 'PIECEWISE;'..'REGULAR;'} ';' OBJECTIVECAPTION ';' 'PIECEWISE;'" using multiple alternatives: 1,2
As a result, alternative(s) 2 were disabled for that input
My questions are:
How can the input even start with OBJECTIVECAPTION? As far as I understand, the input for my grammar has to start with VARIABLESCAPTION.
What do I need to change to get this grammar running?

The error message may be a bit cryptic, but the problem is in the production variables: it defines zero or more occurrences of variable. Because characters is defined as ( ~(';') )*, it matches any token other than ';', including OBJECTIVECAPTION, so a variable can begin with the input shown in the error message. But variables can also be followed by that same input in its invocation environment (in parseFile, variables is followed by OBJECTIVECAPTION objective). Thus there is a problem deciding between continuing in variables (alternative 1) and completing it (alternative 2).
So the error message does not refer to the complete input, but to an input fragment that would be matched by variables. The line number shown should point you to the production that causes the problem.
To fix it, you could introduce a delimiter for the list, so that it becomes clear when to stop collecting more occurrences of variable, e.g.
parseFile : VARIABLESCAPTION variables '.' OBJECTIVECAPTION objective ;
EDIT by Asker:
I tried the approach and it works great, but only if the dot used as the separation symbol is added to the tokens that characters must not match, i.e. the line for characters has to be modified:
characters : ( ~(';' | '.') )*;
After that, it works just fine.

Related

How to check for brackets outside two quotes in a String

I have, for instance, the String: if('{' == '{'){
I want it not to detect the brackets inside the ' ',
but I do want it to detect the one at the end, which is not inside the quotes.
What is the best way of doing this? I tried looping through all the chars, but is there a quicker method?
You can simply remove those quoted braces altogether or replace them with something meaningful. To just remove the '{' and '}' character literals from the string, you could use the String#replaceAll() method, which accepts a regular expression (regex), for example:
// Original String:
String strg1 = "if('{' == '{'){";
/* Apply change and place into a new string (strg2)
so to keep old string original (in strg1). */
String strg2 = strg1.replaceAll("'\\{'|'\\}'", "");
// Display the new String strg2:
System.out.println(strg2);
The console window will display:
if( == ){
The regex "'\\{'|'\\}'" used above does the following:
It replaces every occurrence of '{' (including the single quotes) OR (|) every occurrence of '}' in the string with an empty string "" (nothing). Within the expression, the curly braces are escaped (\\{) because they have special meaning in a regular expression (they delimit quantifiers such as {2,}); escaping them tells the regex engine that we mean the literal character.
If you want to replace those '{' and '}' character combinations with something meaningful like specific values then you can replace those items separately, for example:
// Original String:
String strg1 = "if('{' == '}'){";
int value1 = 10; // An open brace ( '{' ) will be converted to this value
int value2 = 24; // A close brace ( '}' ) will be converted to this value
/* Apply change and place into a new string (strg2)
so to keep old string original (in strg1). */
String strg2 = strg1.replaceAll("'\\{'", String.valueOf(value1))
.replaceAll("'\\}'", String.valueOf(value2));
// Display the new String strg2:
System.out.println(strg2);
The console window will display:
if(10 == 24){
In the above code you will have noticed that the String#valueOf() method was used in the replacement argument of String#replaceAll(). This is because the replacement values were declared as int, while replaceAll() requires a String replacement argument, so we convert those int values to String.
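The answer above removes or replaces the quoted braces; if you instead want to find the braces that sit outside quotes (what the question asks), a single pass with an "inside quotes" flag is hard to beat, since a regex-based approach also has to scan every character. A minimal sketch, assuming quotes are plain single quotes with no escaped \' inside them; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class BraceFinder {
    // Returns the indices of '{' and '}' that are not between single quotes.
    // Assumption: quotes are unescaped and always come in pairs.
    static List<Integer> bracesOutsideQuotes(String s) {
        List<Integer> result = new ArrayList<>();
        boolean inQuote = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\'') {
                inQuote = !inQuote; // entering or leaving a quoted section
            } else if (!inQuote && (c == '{' || c == '}')) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Only the final '{' (index 14) is outside the quotes.
        System.out.println(bracesOutsideQuotes("if('{' == '{'){")); // [14]
    }
}
```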

How can I find all "((" and replace them with "("?

I'm given a string, and I want to replace all open parentheses that occur in succession with a single one:
((5)) → (5)
((((5)))) → (5)
I tried
str = str.replaceAll("((", "(");
and got a regex pattern error.
Then I tried
str = str.replaceAll("\\((", "(");
Then I tried
str = str.replaceAll("\\\\((", "(");
I keep getting the same error!
Have you tried this?
str = str.replaceAll("\\({2,}", "(");
The '\' is the escape character, so every special character must be preceded by it (written "\\" inside a Java string literal). Without it, the regex engine reads the character as an opening parenthesis used for grouping and expects a closing parenthesis.
Edit: Originally, I thought he was trying to match exactly 2.
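To see why the question's attempts all raise the same error: in "\\((" only the first parenthesis is escaped, so the second still opens a group that is never closed, and "\\\\((" escapes the backslash rather than either parenthesis. The sketch below (hypothetical class and method names) checks those patterns with Pattern.compile and contrasts a literal replacement of "((" with a quantified one:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class ParenEscapes {
    // True if the string compiles as a java.util.regex pattern.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Pattern.quote makes "((" literal, but replaceAll replaces non-overlapping
    // matches in a single pass, so longer runs are only halved.
    static String literalPairs(String s) {
        return s.replaceAll(Pattern.quote("(("), "(");
    }

    // "\\(+" is the regex \(+ : one or more literal '(' collapsed into one.
    static String collapseRuns(String s) {
        return s.replaceAll("\\(+", "(");
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("(("));        // false
        System.out.println(isValidRegex("\\(("));      // false
        System.out.println(isValidRegex("\\\\(("));    // false
        System.out.println(isValidRegex("\\(\\(+"));   // true
        System.out.println(literalPairs("((((5))))")); // ((5))))
        System.out.println(collapseRuns("((((5))))")); // (5))))
    }
}
```

Note that collapseRuns leaves the closing parentheses alone; collapsing both kinds needs a pattern like the back-reference one shown in the next answer.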
You need to escape each parenthesis and add + to account for successive occurrences:
str = str.replaceAll("\\(\\(+","(");
Assuming the parentheses don't need to be paired, e.g. ((((5)) should become (5), then the following will do:
str = str.replaceAll("([()])\\1+", "$1");
Test
for (String str : new String[] { "(5)", "((5))", "((((5))))", "((((5))" }) {
str = str.replaceAll("([()])\\1+", "$1");
System.out.println(str);
}
Output
(5)
(5)
(5)
(5)
Explanation
( Start capture group
[()] Match a '(' or a ')'. In a character class, '(' and ')'
have no special meaning, so they don't need to be escaped
) End capture group, i.e. capture the matched '(' or ')'
\1+ Match 1 or more of the text from capture group #1. As a
Java string literal, the `\` was escaped (doubled)
$1 Replace with the text from capture group #1
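See also regex101.com for a demo.
I am not sure whether the brackets are fixed or dynamic, but assuming they may be dynamic, you could use replaceAll() to strip them and then String.format() to wrap the result in a single pair. Note that this removes every parenthesis in the string, so it only suits inputs like these that contain a single parenthesized value.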
Hope it helps
public class HelloWorld{
public static void main(String []args){
String str = "((((5))))";
String abc = str.replaceAll("\\(", "").replaceAll("\\)","");
abc = String.format("(%s)", abc);
System.out.println(abc);
}
}
Output: (5)
I have tried the above code with ((5)) and (((5))) and it produces the same output.

ANTLR4 Java: Is it possible to extract fragments from the lexer at the parser listener point?

I have lexer rules as follows:
PREFIX : [abcd]'_';
EXTRA : ('xyz' | 'XYZ' );
SUFFIX : [ab];
TCHAN : PREFIX EXTRA? DIGIT+ SUFFIX?;
and a parser rule:
tpin : TCHAN
;
In the exitTpin() listener method, is there a syntax with which I can extract the DIGIT component of the token? Right now I can get the ctx.TCHAN() element, but its text is the whole string. I just want the digit portion of TCHAN.
Or should I remove TCHAN as a token and move that rule into tpin, i.e.
tpin : PREFIX EXTRA? DIGIT+ SUFFIX?
where I know how to extract DIGIT from the listener?
My guess is that by the time the token is presented to the parser it is too late to deconstruct it... but I was wondering if some ANTLR gurus out there knew of a technique.
If I rewrite my tokenizer, there is a possibility that TCHAN tokens will be lexed as INT/ID tokens instead (I think that's why I ended up parsing as I do).
I can always do some regexp work in the listener method... but that seemed like bad form, as I had the individual components earlier. I'm just lazy, and was wondering if a technique other than refactoring the parsing grammar was possible.
In The Definitive ANTLR Reference you can find examples of complex lexers that do much of the work. But when learning ANTLR, I would advise treating the lexer mostly as a way of splitting the input stream into small tokens, and doing the big work in the parser. In the present case I would do:
grammar Question;
/* extract digit */
question
: tpin EOF
;
tpin
// : PREFIX EXTRA? DIGIT+ SUFFIX?
// {System.out.println("The only useful information is " + $DIGIT.text);}
: PREFIX EXTRA? number SUFFIX?
{System.out.println("The only useful information is " + $number.text);}
;
number
: DIGIT+
;
PREFIX : [abcd]'_';
EXTRA : ('xyz' | 'XYZ' );
DIGIT : [0-9] ;
SUFFIX : [ab];
WS : [ \t\r\n]+ -> skip ;
Say the input is d_xyz123456b. With the first version
: PREFIX EXTRA? DIGIT+ SUFFIX?
you get
$ grun Question question -tokens data.txt
[#0,0:1='d_',<PREFIX>,1:0]
[#1,2:4='xyz',<EXTRA>,1:2]
[#2,5:5='1',<DIGIT>,1:5]
[#3,6:6='2',<DIGIT>,1:6]
[#4,7:7='3',<DIGIT>,1:7]
[#5,8:8='4',<DIGIT>,1:8]
[#6,9:9='5',<DIGIT>,1:9]
[#7,10:10='6',<DIGIT>,1:10]
[#8,11:11='b',<SUFFIX>,1:11]
[#9,13:12='<EOF>',<EOF>,2:0]
The only useful information is 6
This is because parsing DIGIT+ translates into a loop that reuses the DIGIT field:
setState(12);
_errHandler.sync(this);
_la = _input.LA(1);
do {
{
{
setState(11);
((TpinContext)_localctx).DIGIT = match(DIGIT);
}
}
setState(14);
_errHandler.sync(this);
_la = _input.LA(1);
} while ( _la==DIGIT );
Since $DIGIT.text translates to ((TpinContext)_localctx).DIGIT.getText() and the loop reassigns that field on every iteration, only the last digit is retained. That's why I define a subrule number:
: PREFIX EXTRA? number SUFFIX?
which makes it easy to capture the value:
[#0,0:1='d_',<PREFIX>,1:0]
[#1,2:4='xyz',<EXTRA>,1:2]
[#2,5:5='1',<DIGIT>,1:5]
[#3,6:6='2',<DIGIT>,1:6]
[#4,7:7='3',<DIGIT>,1:7]
[#5,8:8='4',<DIGIT>,1:8]
[#6,9:9='5',<DIGIT>,1:9]
[#7,10:10='6',<DIGIT>,1:10]
[#8,11:11='b',<SUFFIX>,1:11]
[#9,13:12='<EOF>',<EOF>,2:0]
The only useful information is 123456
You can even make it simpler:
tpin
: PREFIX EXTRA? INT SUFFIX?
{System.out.println("The only useful information is " + $INT.text);}
;
PREFIX : [abcd]'_';
EXTRA : ('xyz' | 'XYZ' );
INT : [0-9]+ ;
SUFFIX : [ab];
WS : [ \t\r\n]+ -> skip ;
$ grun Question question -tokens data.txt
[#0,0:1='d_',<PREFIX>,1:0]
[#1,2:4='xyz',<EXTRA>,1:2]
[#2,5:10='123456',<INT>,1:5]
[#3,11:11='b',<SUFFIX>,1:11]
[#4,13:12='<EOF>',<EOF>,2:0]
The only useful information is 123456
In the listener you have direct access to these values through the rule context TpinContext:
public static class TpinContext extends ParserRuleContext {
public Token INT;
public TerminalNode PREFIX() { return getToken(QuestionParser.PREFIX, 0); }
public TerminalNode INT() { return getToken(QuestionParser.INT, 0); }
public TerminalNode EXTRA() { return getToken(QuestionParser.EXTRA, 0); }
public TerminalNode SUFFIX() { return getToken(QuestionParser.SUFFIX, 0); }
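If you would rather keep TCHAN as a single token, the regexp fallback mentioned in the question is small. A minimal sketch, assuming DIGIT is [0-9] (its definition isn't shown in the question) and using a hypothetical helper name; in exitTpin you would pass it ctx.TCHAN().getText():

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TchanDigits {
    // Mirrors TCHAN : PREFIX EXTRA? DIGIT+ SUFFIX? with PREFIX = [abcd]'_',
    // EXTRA = 'xyz' | 'XYZ', SUFFIX = [ab]; DIGIT is assumed to be [0-9].
    private static final Pattern TCHAN =
            Pattern.compile("[abcd]_(?:xyz|XYZ)?([0-9]+)[ab]?");

    // Returns the DIGIT+ part of a TCHAN token's text, or null if it doesn't match.
    static String digitsOf(String tokenText) {
        Matcher m = TCHAN.matcher(tokenText);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(digitsOf("d_xyz123456b")); // 123456
        System.out.println(digitsOf("a_42"));         // 42
    }
}
```

The downside, as the question notes, is that the pattern duplicates the lexer rule and must be kept in sync with it by hand.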

ANTLR: parse configuration file

I'm missing some basic knowledge. I started playing around with ANTLR today, but I can't find any source telling me how to do the following:
I'd like to parse a configuration file a program of mine currently reads in a very ugly way. Basically it looks like:
A [Data] [Data]
B [Data] [Data] [Data]
where A/B/... are objects followed by their associated data (a variable number of values, only simple digits).
A grammar should not be that hard, but how do I use ANTLR with it?
Lexer only: A/B are tokens and I ask the lexer for the tokens it read. How do I ask for them, and how do I detect malformed input?
Lexer & parser: A/B are parser rules and... how do I know the parser successfully processed A/B? The same object could appear multiple times in the file and I need to consider every single one. It's more like listing instances in the config file.
Edit:
My problem is not the grammar but how to get the parser/lexer to tell me what they actually found/parsed. Best would be: invoke a function upon recognition of a rule, like in recursive descent.
ANTLR production rules can have return value(s) you can use to get the contents of your configuration file.
Here's a quick demo:
grammar T;
parse returns [java.util.Map<String, List<Integer>> map]
@init{$map = new java.util.HashMap<String, List<Integer>>();}
: (line {$map.put($line.key, $line.values);} )+ EOF
;
line returns [String key, List<Integer> values]
: Id numbers (NL | EOF)
{
$key = $Id.text;
$values = $numbers.list;
}
;
numbers returns [List<Integer> list]
@init{$list = new ArrayList<Integer>();}
: (Num {$list.add(Integer.parseInt($Num.text));} )+
;
Num : '0'..'9'+;
Id : ('a'..'z' | 'A'..'Z')+;
NL : '\r'? '\n' | '\r';
Space : (' ' | '\t')+ {skip();};
If you run the class below:
import org.antlr.runtime.*;
import java.util.*;
public class Main {
public static void main(String[] args) throws Exception {
String input = "A 12 34\n" +
"B 5 6 7 8\n" +
"C 9";
TLexer lexer = new TLexer(new ANTLRStringStream(input));
TParser parser = new TParser(new CommonTokenStream(lexer));
Map<String, List<Integer>> values = parser.parse();
System.out.println(values);
}
}
the following will be printed to the console:
{A=[12, 34], B=[5, 6, 7, 8], C=[9]}
The grammar should be something like this (it's pseudocode, not ANTLR):
FILE ::= STATEMENT ('\n' STATEMENT)*
STATEMENT ::= NAME ITEM*
ITEM ::= '[' \d+ ']'
NAME ::= \w+
If you are looking for a way to execute code when something is parsed, you should use either actions or an AST (look them up in the documentation).
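The pseudocode grammar above can be tried out with a hand-written recognizer before moving to ANTLR. A minimal sketch that uses java.util.regex rather than ANTLR, assuming whitespace may separate NAME and the items (the pseudocode doesn't say); it returns a list rather than a map because the same object may appear more than once, and it throws on a malformed line. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConfigSketch {
    // STATEMENT ::= NAME ITEM*  with NAME = \w+ and ITEM = '[' \d+ ']'
    // (optional whitespace between the parts is an assumption)
    private static final Pattern STATEMENT =
            Pattern.compile("(\\w+)((?:\\s*\\[\\d+\\])*)\\s*");
    private static final Pattern ITEM = Pattern.compile("\\[(\\d+)\\]");

    // FILE ::= STATEMENT ('\n' STATEMENT)*
    // Keeps every statement in order, so repeated names are not lost.
    static List<Map.Entry<String, List<Integer>>> parse(String file) {
        List<Map.Entry<String, List<Integer>>> result = new ArrayList<>();
        String[] lines = file.split("\n");
        for (int i = 0; i < lines.length; i++) {
            Matcher m = STATEMENT.matcher(lines[i]);
            if (!m.matches()) {
                throw new IllegalArgumentException(
                        "line " + (i + 1) + " is malformed: " + lines[i]);
            }
            List<Integer> items = new ArrayList<>();
            Matcher item = ITEM.matcher(m.group(2));
            while (item.find()) {
                items.add(Integer.parseInt(item.group(1)));
            }
            result.add(Map.entry(m.group(1), items));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [A=[12, 34], B=[5, 6, 7], A=[9]]
        System.out.println(parse("A [12] [34]\nB [5] [6] [7]\nA [9]"));
    }
}
```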

Simple ANTLR error

I'm starting with ANTLR, but I get some errors and I really don't understand why.
Here is my really simple grammar:
grammar Expr;
options {backtrack=true;}
@header {}
@members {}
expr returns [String s]
: (LETTER SPACE DIGIT | TKDC) {$s = $DIGIT.text + $TKDC.text;}
;
// TOKENS
SPACE : ' ' ;
LETTER : 'd' ;
DIGIT : '0'..'9' ;
TKDC returns [String s] : 'd' SPACE 'C' {$s = "d C";} ;
This is the Java source, where I only ask for the expr result:
import org.antlr.runtime.*;
class Testantlr {
public static void main(String[] args) throws Exception {
ExprLexer lex = new ExprLexer(new ANTLRFileStream(args[0]));
CommonTokenStream tokens = new CommonTokenStream(lex);
ExprParser parser = new ExprParser(tokens);
try {
System.out.println(parser.expr());
} catch (RecognitionException e) {
e.printStackTrace();
}
}
}
The problem occurs when my input file has the following content: d 9.
I get the following error:
x line 1:2 mismatched character '9' expecting 'C'
x line 1:3 no viable alternative at input '<EOF>'
Does anyone know what the problem is here?
There are a few things wrong with your grammar:
lexer rules can only return Tokens, so returns [String s] is ignored after TKDC;
backtrack=true in your options section does not apply to lexer rules, that is why you get mismatched character '9' expecting 'C' (no backtracking there!);
the contents of your expr rule: (LETTER SPACE DIGIT | TKDC) {$s = $DIGIT.text + $TKDC.text;} doesn't make much sense (to me). You either want to match LETTER SPACE DIGIT or TKDC, yet you're trying to grab the text of both choices: $DIGIT.text and $TKDC.text.
It looks to me like TKDC needs to be "promoted" to a parser rule instead.
I think you dumbed down your example a bit too much to illustrate the problem you were facing. Perhaps it's a better idea to explain your actual problem instead: what are you trying to parse exactly?
