ANTLR4 no viable alternative; problem with spaces

grammar Hello;
@parser::header {
import java.util.*;
}
@parser::members {
Map<Integer, String> map = new HashMap<Integer, String>();
int A = 0;
int B = 0;
int max = 0;
}
prog
@after {
List<Integer> msgs = new ArrayList<>(map.keySet());
Collections.sort(msgs);
for (int i=0; i< msgs.size(); i++){
System.out.println(map.get(msgs.get(i)));
}
System.out.println("Alice: "+ A +", Bob: "+B);
System.out.println(max);
}
: stat+;
stat: message NL | message;
message: h=T_NUM ':' m=T_NUM ':' s=T_NUM 'A' ':' T_MSG {
String msg = $h.getText() + ":" + $m.getText() + ":" + $s.getText() + " A: " + $T_MSG.getText();
int len = $T_MSG.getText().length();
if (len > max) max = len;
A++;
int id = Integer.parseInt($h.getText()) * 3600 + Integer.parseInt($m.getText()) * 60 + Integer.parseInt($s.getText());
map.put(id, msg);
} | h=T_NUM ':' m=T_NUM ':' s=T_NUM 'B' ':' T_MSG {
String msg = $h.getText() + ":" + $m.getText() + ":" + $s.getText() + " B: " + $T_MSG.getText();
int len = $T_MSG.getText().length();
if (len > max) max = len;
B++;
int id = Integer.parseInt($h.getText()) * 3600 + Integer.parseInt($m.getText()) * 60 + Integer.parseInt($s.getText());
map.put(id, msg);
};
T_NUM: [0-9][0-9];
T_MSG: [A-Za-z0-9.,!? ]+;
NL: [\n]+;
WS : [ \t\r]+ -> skip ; // skip spaces, tabs, carriage returns
I have a task to write a grammar and parser in ANTLR4 that recognizes this kind of input:
00:10:11 A: Message 1
23:12:12 B: Message 5
11:12:13 A: Message 2
12:21:12 B: Message 4
11:12:15 A: Message 3
As output, it has to print the messages sorted by time. My problem is with spaces: I want to be able to recognize spaces in messages, but I get an error:
line 1:6 no viable alternative at input '00:10:11 A'
Alice: 0, Bob: 0
0
When I remove the space from the T_MSG token (and from the input), it works. But I don't know how to make the grammar recognize spaces in messages.

Always dump your token stream to see what the Lexer produces for the Parser to consume.
For the first line of your test input (using grun Hello prog -tokens < Hello.txt), I get:
[@0,0:1='00',<T_NUM>,1:0]
[@1,2:2=':',<':'>,1:2]
[@2,3:4='10',<T_NUM>,1:3]
[@3,5:5=':',<':'>,1:5]
[@4,6:9='11 A',<T_MSG>,1:6]
[@5,10:10=':',<':'>,1:10]
[@6,11:21=' Message 1 ',<T_MSG>,1:11]
[@7,22:22='\n',<NL>,1:22]
[@8,23:22='<EOF>',<EOF>,2:0]
line 1:6 no viable alternative at input '00:10:11 A'
Alice: 0, Bob: 0
0
In particular, notice the line
[@4,6:9='11 A',<T_MSG>,1:6]
Your parser isn't seeing the stream of tokens that your parser rules assume it will see.
This is because "11 A" matches the T_MSG Lexer rule. Note: even though the T_NUM rule matches the "11" input, ANTLR's Lexer will use the Lexer rule that consumes the most input, so ANTLR will produce a T_MSG token.
That's why you're getting the observed error.
There are ways around this: using lexer modes, not skipping WS (which means accounting for every place where WS can occur in your parser rules), or maybe a couple of other techniques.
That said, you're really applying the wrong tool to the job. Reading this input in line by line and applying a Regex with capture groups will be MUCH simpler. There's nothing about your input that requires a full-fledged parser.
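A minimal sketch of that line-by-line regex approach, assuming the input format shown in the question. The class name ChatLog and the method sortByTime are hypothetical names chosen for illustration. Like the grammar's map keyed by seconds, two messages with the same timestamp would overwrite each other here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChatLog {
    // hh:mm:ss, speaker A or B, a colon, then the message (spaces allowed)
    private static final Pattern LINE =
            Pattern.compile("^(\\d{2}):(\\d{2}):(\\d{2})\\s+([AB]):\\s*(.*?)\\s*$");

    /** Returns the well-formed lines sorted by time of day; other lines are skipped. */
    static List<String> sortByTime(String input) {
        // TreeMap keeps its keys (seconds since midnight) in ascending order
        Map<Integer, String> byTime = new TreeMap<>();
        for (String line : input.split("\\R")) {
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                continue; // not a chat line
            }
            int seconds = Integer.parseInt(m.group(1)) * 3600
                    + Integer.parseInt(m.group(2)) * 60
                    + Integer.parseInt(m.group(3));
            byTime.put(seconds, line.trim());
        }
        return new ArrayList<>(byTime.values());
    }

    public static void main(String[] args) {
        String input = "00:10:11 A: Message 1\n"
                + "23:12:12 B: Message 5\n"
                + "11:12:13 A: Message 2\n"
                + "12:21:12 B: Message 4\n"
                + "11:12:15 A: Message 3\n";
        // prints Message 1, 2, 3, 4, 5 in that order
        sortByTime(input).forEach(System.out::println);
    }
}
```

The speaker is available as m.group(4) and the message text as m.group(5), so the per-speaker counts and the maximum message length from the grammar actions could be collected in the same loop.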
If you push forward with ANTLR, you're probably also much better off just working out the grammar to get the correct parse tree and then using a listener to handle the results. All of the @parser* blocks, the prog @after {...} block, and the parser rule actions are distractions at best if you're not yet building the correct parse tree.

Related

How to pattern match and transform string to generate certain output?

The code below takes input that contains a lot of whitespace between, before, and after the important strings. So far I have been able to filter the whitespace out; what I want to do next is process the prepared string.
Here is an example of the input I may get and the output I want:
Input
+--------------+
EDIT example.mv Starter web-onyx-01.example.net.mv
Notice the whitespace before and after the domain; the amount of whitespace is arbitrary.
Output
+--------------+
example.mv. in ns web-onyx-01.example.net.mv.
In the output, the important part is the whitespace between the domain (Example.), the keywords (in, ns), and the host (web-onyx-01.example.net.mv.).
Also notice the period (".") after the domain and the host. In addition, if the domain is under the .mv ccTLD, that part has to be removed from the string.
What I would like to achieve is this transformation over multiple lines of text: I want to batch process an unordered, messy list of strings and produce clean-looking output.
The code is by no means good design, but it is what I have come up with so far. NOTE: I am a beginner who is still learning to program. I would welcome suggestions to improve the code as well as to solve the problem at hand, i.e. transforming the input into the desired output.
P.S. The output is for DNS zone files, so errors can be very problematic.
So far my code accepts text from one textarea and writes the result into another textarea.
My code works as long as the array length is 2 or 3 but fails for anything larger. How do I process the input dynamically, however large the list/array may become in the future?
String s = jTextArea1.getText();
Pattern p = Pattern.compile("ADD|EDIT|DELETE|Domain|Starter|Silver|Gold|ADSL Business|Pro|Lite|Standard|ADSL Multi|Pro Plus", Pattern.MULTILINE);
Matcher m = p.matcher(s);
s = m.replaceAll("");
String ms = s.replaceAll("(?m)(^\\s+|[\\t\\f ](?=[\\t\\f ])|[\\t\\f ]$|\\s+\\z)", "");
String[] last = ms.split(" ");
for (String test : last){
System.out.println(test);
}
System.out.println("The length of array is: " +last.length);
if (str.isContain(last[0], ".mv")) {
if (last.length == 2) {
for(int i = 0; i < last.length; i++) {
last[0] = last[0].replaceFirst(".mv", "");
System.out.println(last[0]);
last[i] += ".";
if (last[i] == null ? last[0] == null : last[i].equals(last[0])) {
last[i]+= " in ns ";
}
String str1 = String.join("", last);
jTextArea2.setText(str1);
System.out.println(str1);
}
}
else if (last.length == 3) {
for(int i = 0; i < last.length; i++) {
last[0] = last[0].replaceFirst(".mv", "");
System.out.println(last[0]);
last[i] += ".";
if (last[i] == null ? last[0] == null : last[i].equals(last[0])) {
last[i]+= " in ns ";
}
if (last[i] == null ? last[1] == null : last[i].equals(last[1])){
last[i] += "\n";
}
if (last[i] == null ? last[2] == null : last[i].equals(last[2])){
last[i] = last[0] + last[2];
}
String str1 = String.join("", last);
jTextArea2.setText(str1);
System.out.println(str1);
}
}
}

As I understand your question you have multiple lines of input in the following form:
whitespace[command]whitespace[domain]whitespace[label]whitespace[target-domain]whitespace
You want to convert that to the following form such that multiple lines are aligned nicely:
[domain]. in ns [target-domain].
To do that I'd suggest the following:
1. Split your input into lines.
2. Use a regular expression to check the line format (e.g. for a valid command) and extract the domains.
3. Store the maximum length of both domains separately.
4. Build a format string using the maximum lengths.
5. Iterate over the extracted domains and build the string for each line using the format defined in step 4.
Example:
String input = " EDIT domain1.mv Starter example.domain1.net.mv \n" +
" DELETE long-domain1.mv Silver long-example.long-domain1.net.mv \n" +
" ADD short-domain1.mv ADSL Business ex.sdomain1.net.mv \n";
//step 1: split the input into lines
String[] lines = input.split( "\n" );
//step 2: build a regular expression to check the line format and extract the domains - which are the (\S+) parts
Pattern pattern = Pattern.compile( "^\\s*(?:ADD|EDIT|DELETE)\\s+(\\S+)\\s+(?:Domain|Starter|Silver|Gold|ADSL Business|Pro|Lite|Standard|ADSL Multi|Pro Plus)\\s+(\\S+)\\s*$" );
List<String[]> lineList = new LinkedList<>();
int maxLengthDomain = 0;
int maxLengthTargetDomain = 0;
for( String line : lines )
{
//step 2: check the line
Matcher matcher = pattern.matcher( line );
if( matcher.matches() ) {
//step 2: extract the domains
String domain = matcher.group( 1 );
String targetDomain = matcher.group( 2 );
//step 3: get the maximum length of the domains
maxLengthDomain = Math.max( maxLengthDomain, domain.length() );
maxLengthTargetDomain = Math.max( maxLengthTargetDomain, targetDomain.length() );
lineList.add( new String[] { domain, targetDomain } );
}
}
//step 4: build the format string with variable lengths
String formatString = String.format( "%%-%ds in ns %%-%ds", maxLengthDomain + 5, maxLengthTargetDomain + 2 );
//step 5: build the output
for( String[] line : lineList ) {
System.out.println( String.format( formatString, line[0] + ".", line[1] + "." ) );
}
Result:
domain1.mv.           in ns example.domain1.net.mv.
long-domain1.mv.      in ns long-example.long-domain1.net.mv.
short-domain1.mv.     in ns ex.sdomain1.net.mv.

How do I use printf to format separate strings into one line?

I am using a while loop and getting data from a text file and using classes to reference each string. I don't have any issues getting the values for each string and printing it out.
However, I am confused on how to use System.out.printf(....) to put all of the strings I need in one line while using a loop.
For example, let's say the text file was:
I
like
to
use
computers
I want to use a loop to print out the words into one string and I may have different spacing between each word.
The code I have so far:
while (!readyOrder.isEmpty()) {
s = readyOrder.poll();
System.out.printf(s.getQuantity() + " x " + s.getName()
+ "(" + s.getType() + ")" + " "
+ s.getPrice() * s.getQuantity());
System.out.println(" ");
total = total + s.getPrice() * s.getQuantity();
}
And the output should be:
1_x_The Shawshank Redemption_______(DVD)________________19.95
The underscores mark where the spaces should be and how many there should be.
How can I use printf to do that?

I think you need to use the string padding functionality of printf. For example %-30s formats to width of 30 characters, - means left justify.
for (Stock s : Arrays.asList(
new Stock(1, "The Shawshank Redemption", 100, "DVD"),
new Stock(2, "Human Centipede", 123, "VHS"),
new Stock(1, "Sharknado 2", 123, "Blu ray"))) {
System.out.printf("%2d x %-30s (%-7s) %5.2f\n",
s.getQuantity(), s.getName(), s.getType(),
s.getPrice() * s.getQuantity());
}
Output
 1 x The Shawshank Redemption       (DVD    ) 100.00
 2 x Human Centipede                (VHS    ) 246.00
 1 x Sharknado 2                    (Blu ray) 123.00
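The Stock class used above is not shown, so here is a self-contained sketch of the same padding technique that takes plain values instead. The class name PrintfDemo and the method formatLine are hypothetical, and the second price in main is made up for illustration. Locale.ROOT is an addition so the decimal separator stays a dot on any system.

```java
import java.util.Locale;

class PrintfDemo {
    /** Formats one order line: quantity, padded name and type, then the line total. */
    static String formatLine(int quantity, String name, String type, double unitPrice) {
        // %2d   -> quantity right-justified in 2 columns
        // %-30s -> name left-justified and padded to 30 columns
        // %-7s  -> type left-justified and padded to 7 columns
        // %5.2f -> total with 2 decimals, at least 5 columns wide
        return String.format(Locale.ROOT, "%2d x %-30s (%-7s) %5.2f",
                quantity, name, type, unitPrice * quantity);
    }

    public static void main(String[] args) {
        System.out.println(formatLine(1, "The Shawshank Redemption", "DVD", 19.95));
        System.out.println(formatLine(2, "Sharknado 2", "Blu ray", 4.50));
    }
}
```

Because every field has a fixed width, the opening parenthesis and the totals line up in the same columns no matter how long each name is, as long as the name fits in 30 characters.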

Insert a space after every given character - java

I need to insert a space after every given character in a string.
For example "abc.def..."
Needs to become "abc. def. . . "
So in this case the given character is the dot.
My search on google brought no answer to that question
I really should go and get some serious regex knowledge.
EDIT : ----------------------------------------------------------
String test = "0:;1:;";
test.replaceAll( "\\:", ": " );
System.out.println(test);
// output: 0:;1:;
// so didnt do anything
SOLUTION: -------------------------------------------------------
String test = "0:;1:;";
test = test.replaceAll( "\\:", ": " ); // assign the result; Strings are immutable
System.out.println(test);

You could use String.replaceAll():
String input = "abc.def...";
String result = input.replaceAll( "\\.", ". " );
// result will be "abc. def. . . "
Edit:
String test = "0:;1:;";
result = test.replaceAll( ":", ": " );
// result will be "0: ;1: ;" (test is still unmodified)
Edit:
As said in other answers, String.replace() is all you need for this simple substitution. Only if it's a regular expression (like you said in your question), you have to use String.replaceAll().
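A short sketch of the difference between the two methods, using the strings from the question. The class name ReplaceDemo and the helper spaceAfter are hypothetical names for illustration.

```java
import java.util.regex.Pattern;

class ReplaceDemo {
    /** Inserts a space after every occurrence of the literal text. */
    static String spaceAfter(String text, String literal) {
        // replace() treats both arguments as plain text, so no escaping is needed
        return text.replace(literal, literal + " ");
    }

    public static void main(String[] args) {
        String input = "abc.def...";

        // Literal replacement: "abc. def. . . "
        System.out.println(spaceAfter(input, "."));

        // Regex replacement with the dot escaped: same result
        System.out.println(input.replaceAll("\\.", ". "));

        // Pattern.quote escapes an arbitrary literal for use as a regex: same result
        System.out.println(input.replaceAll(Pattern.quote("."), ". "));

        // Unescaped dot matches ANY character, so every character becomes ". "
        System.out.println(input.replaceAll(".", ". "));
    }
}
```

Both methods return a new String and leave the original unchanged, which is why the result has to be assigned.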

You can use replace.
text = text.replace(".", ". ");
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace%28java.lang.CharSequence,%20java.lang.CharSequence%29

If you want a simple brute-force technique, the following code will do it:
String input = "abc.def...";
StringBuilder output = new StringBuilder();
for (int i = 0; i < input.length(); i++) {
    char c = input.charAt(i);
    output.append(c);
    // add the space only after the given character
    if (c == '.') {
        output.append(' ');
    }
}
return output.toString();

Java Regex causes hung thread

Pattern:
"(([^",\n ]*[,\n ])*([^",\n ]*"{2})*)*[^",\n ]*"[ ]*,[ ]*|[^",\n]*[ ]*,[ ]*|"(([^",\n ]*[,\n ])*([^",\n ]*"{2})*)*[^",\n ]*"[ ]*|[^",\n]*[ ]*
This Regex is for parsing CSV file. But when it goes into Pattern.matcher, I encounter a hung thread exception. Appreciate it if someone can help fine tune this pattern.
[7/1/13 16:45:26:745 GMT+08:00] 00000029 ThreadMonitor W WSVR0605W: Thread "MessageListenerThreadPool : 0" (00000035) has been active for 691836 milliseconds and may be hung. There is/are 1 thread(s) in total in the server that may be hung.
at java.util.regex.Pattern$Curly.match(Pattern.java:4233)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4606)
at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4752)
at java.util.regex.Pattern$Prolog.match(Pattern.java:4689)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4606)
at java.util.regex.Pattern$Loop.match(Pattern.java:4733)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4665)
at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4754)
at java.util.regex.Pattern$Prolog.match(Pattern.java:4689)
at java.util.regex.Pattern$Loop.match(Pattern.java:4742)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4665)
at java.util.regex.Pattern$BitClass.match(Pattern.java:2912)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4278)
at java.util.regex.Pattern$Curly.match(Pattern.java:4233)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4606)
at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4752)
at java.util.regex.Pattern$Prolog.match(Pattern.java:4689)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4606)

Description
The problem appears to be the sheer amount of backtracking being done to accomplish the match.
If your CSV is well formed, you could use a simpler regex to parse each line. Note that this will only separate the quoted and comma-delimited values within one string, so you'd need to pass each line through .matcher with this regex and iterate over each of the matches.
regex: (?:^|,)"?((?<=")[^"]*|[^,"]*)"?(?=,|$)
Java Code Example:
Live example: http://ideone.com/NBmzrk
Sample Text
"root",test1,1111,"22,22",,fdsa
Code
import java.util.regex.Pattern;
import java.util.regex.Matcher;
class Module1{
public static void main(String[] asd){
String sourcestring = "\"root\",test1,1111,\"22,22\",,fdsa"; // the sample text above
Pattern re = Pattern.compile("(?:^|,)\"?((?<=\")[^\"]*|[^,\"]*)\"?(?=,|$)",Pattern.CASE_INSENSITIVE);
Matcher m = re.matcher(sourcestring);
int mIdx = 0;
while (m.find()){
for( int groupIdx = 0; groupIdx < m.groupCount()+1; groupIdx++ ){
System.out.println( "[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
}
mIdx++;
}
}
}
Capture Group 1
[0] => root
[1] => test1
[2] => 1111
[3] => 22,22
[4] =>
[5] => fdsa

Regex word boundary

I am splitting a string by word boundary.
What I am expecting is:
TOKEN 0
TOKEN 1 0
TOKEN 2
TOKEN 3 +Ve
and, what I am getting is,
TOKEN 0
TOKEN 1 0
TOKEN 2 +
TOKEN 3 Ve
public void StringExample(){
String str = " 0 +Ve";
String[] token = str.split("\\b");
System.out.println("TOKEN 0 " + token[0]);
System.out.println("TOKEN 1 " + token[1]);
System.out.println("TOKEN 2 " + token[2]);
System.out.println("TOKEN 3 " + token[3]);
}
Can someone give a clue as to where it's going wrong, and possible corrections, if any?

Both @pb2q and @Hovercraft have already explained why word boundary doesn't work in your situation. An alternative is to use a Pattern and capture each group, which will give you what you want:
String str = " 0 +Ve";
Pattern p = Pattern.compile("( |[^ ]+)");
Matcher m = p.matcher(str);
List<String> tokens = new ArrayList<String>();
while (m.find()) {
tokens.add(m.group(1));
}
System.out.println("TOKEN 0 " + tokens.get(0));
System.out.println("TOKEN 1 " + tokens.get(1));
System.out.println("TOKEN 2 " + tokens.get(2));
System.out.println("TOKEN 3 " + tokens.get(3));

Nothing is going wrong, and the results are as should be expected. Word boundaries match before the first character of a String and after the last character of a String (when that character is a word character), and between two characters in the string where one is a word character and the other is not. The last rule will result in a match between '+' and 'V', and so your results make perfect sense.
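To see where the boundaries fall, the sketch below lists the offsets at which \b matches in the question's string and then splits on them. The class name BoundaryDemo and the method boundaryOffsets are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    /** Returns every offset in s at which the zero-width \b assertion matches. */
    static List<Integer> boundaryOffsets(String s) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = Pattern.compile("\\b").matcher(s);
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        String str = " 0 +Ve";
        // prints [1, 2, 4, 6]: around '0', between '+' and 'V', and after 'e'
        System.out.println(boundaryOffsets(str));
        // splitting at those offsets leaves '+' attached to the preceding space
        for (String token : str.split("\\b")) {
            System.out.println("\"" + token + "\"");
        }
    }
}
```

There is no boundary at offset 3, between the space and '+', because neither is a word character. That is why "+Ve" cannot come out as a single token.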
Perhaps you want to use look ahead and look behind to match anything next to a space. For example:
public class Foo001 {
// private static final String REGEX1 = "\\b";
private static final String REGEX2 = "(?= )|(?<= )";
public static void main(String[] args) {
String str = " 0 +Ve";
String[] tokens = str.split(REGEX2);
for (int i = 0; i < tokens.length; i++) {
System.out.printf("token %d: \"%s\"%n", i, tokens[i]);
}
}
}
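On Java 7 and earlier, this will also match to the left of the first space, giving an extra empty token (since Java 8, String.split no longer produces a leading empty string for a zero-width match at the start, so the empty token 0 is dropped and the remaining tokens shift down by one):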
token 0: ""
token 1: " "
token 2: "0"
token 3: " "
token 4: "+Ve"

+ is not counted as a word char for word boundaries. Word chars are [a-zA-Z_0-9], that is, alphanumeric, and underscore
Unless your strings get more complex than your example, this is another instance where you can just split around the space:
" 0 +Ve".split(" ");
This yields the array ["", "0", "+Ve"]: the leading space acts as a delimiter, so the first element is an empty string.
This doesn't quite match the token list that you expect, but may suit your purposes. The empty first token tells you that there is a leading space character, and you can infer a space between "0" and "+Ve".
A problem with splitting this way is that runs of multiple space characters will yield additional empty "" tokens in the resulting array.
