How to program a context-free grammar? - java

I have two classes here.
The CFG class takes a string array in its constructor that defines the context-free grammar. The SampleTest class tests the CFG class by passing the grammar (C) to it, reading a string from the user, and checking whether that string can be generated by the context-free grammar.
The problem I'm running into is a stack overflow (obviously). I'm assuming that I just created a never-ending recursive function.
Could someone take a look at the processData() function and help me figure out how to set it up correctly? I'm basically using recursion to generate every string the CFG can produce, returning true if one of them matches the user's input (inString). The wkString parameter is simply the string generated so far by the grammar at each recursive step.
public class SampleTest {
public static void main(String[] args) {
// Language: strings that contain 0+ b's, followed by 2+ a's,
// followed by 1 b, and ending with 2+ a's.
String[] C = { "S=>bS", "S=>aaT", "T=>aT", "T=>bU", "U=>Ua", "U=>aa" };
String inString, startWkString;
boolean accept1;
CFG CFG1 = new CFG(C);
if (args.length >= 1) {
// Input string is command line parameter
inString = args[0];
char[] startNonTerm = new char[1];
startNonTerm[0] = CFG1.getStartNT();
startWkString = new String(startNonTerm);
accept1 = CFG1.processData(inString, startWkString);
System.out.println(" Accept String? " + accept1);
}
} // end main
} // end class
public class CFG {
private String[] code;
private char startNT;
CFG(String[] c) {
this.code = c;
setStartNT(c[0].charAt(0));
}
void setStartNT(char startNT) {
this.startNT = startNT;
}
char getStartNT() {
return this.startNT;
}
boolean processData(String inString, String wkString) {
if (inString.equals(wkString)) {
return true;
} else if (wkString.length() > inString.length()) {
return false;
}
// search for non-terminal in the working string
boolean containsNT = false;
for (int i = 0; i < wkString.length(); i++) {
// if one of the characters in the working string is a non-terminal
if (Character.isUpperCase(wkString.charAt(i))) {
// mark containsNT as true, and exit the for loop
containsNT = true;
break;
}
}
// if there isn't a non-terminal in the working string
if (containsNT == false) {
return false;
}
// for each production rule
for (int i = 0; i < this.code.length; i++) {
// for each character on the RHS of the production rule
for (int j = 0; j <= this.code[i].length() - 3; j++) {
if (Character.isUpperCase(this.code[i].charAt(j))) {
// make substitution for non-terminal, creating a new working string
String newWk = wkString.replaceFirst(Character.toString(this.code[i].charAt(0)), this.code[i].substring(3));
if (processData(inString, newWk) == true) {
return true;
}
}
}
} // end for loop
return false;
} // end processData
} // end class

Your grammar contains a left-recursive rule
U=>Ua
Recursive-descent parsers can't handle left-recursion, as you've just discovered.
You have two options: alter your grammar so that it is no longer left-recursive, or use a parsing algorithm that can handle left recursion, such as LR(1). In your case, U is matching "at least two a characters", so we can just move the recursion to the right.
U=>aU
and everything will be fine. This isn't always possible to do in such a nice way, but in your case, avoiding left-recursion is the easy solution.

You don't need this for loop: "for (int j = 0; j <= this.code[i].length() - 3; j++)". Just create a variable to hold the capital letter found in the non-terminal search you did above. Then run your outer for loop and, for each production rule in the String[] that starts with that non-terminal, do your substitution and recursion.
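A minimal sketch of that suggestion, assuming the same "X=>rhs" rule format as the question. The class name CfgSketch and the method name derives are made up for illustration. Only rules whose left-hand side equals the first non-terminal in the working string are applied, so every recursive call actually changes the working string. For the grammar in the question, every right-hand side has at least two characters, so the working string grows on each step and the length check ends the recursion, even with the left-recursive rule U=>Ua. Grammars with empty or single-non-terminal right-hand sides would need extra care.

```java
class CfgSketch {
    private final String[] rules; // each rule looks like "S=>aaT"

    CfgSketch(String[] rules) {
        this.rules = rules;
    }

    // Returns true if 'target' can be derived from 'working' by always
    // expanding the leftmost non-terminal (an upper-case letter).
    boolean derives(String target, String working) {
        if (working.equals(target)) {
            return true;
        }
        if (working.length() > target.length()) {
            return false; // right-hand sides never shrink the string here
        }
        // locate the leftmost non-terminal
        int pos = -1;
        for (int i = 0; i < working.length(); i++) {
            if (Character.isUpperCase(working.charAt(i))) {
                pos = i;
                break;
            }
        }
        if (pos < 0) {
            return false; // only terminals left, and they did not match
        }
        char nonTerminal = working.charAt(pos);
        for (String rule : rules) {
            // apply only the rules whose left-hand side is that non-terminal
            if (rule.charAt(0) == nonTerminal) {
                String next = working.substring(0, pos)
                        + rule.substring(3)
                        + working.substring(pos + 1);
                if (derives(target, next)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] c = { "S=>bS", "S=>aaT", "T=>aT", "T=>bU", "U=>Ua", "U=>aa" };
        CfgSketch cfg = new CfgSketch(c);
        System.out.println(cfg.derives(args.length > 0 ? args[0] : "aabaa", "S"));
    }
}
```

Building the new working string with substring, rather than replaceFirst, also avoids treating the non-terminal as a regular expression.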

Related

Convert a String to customised version of Snake case and capitalise first character before every underscore in Java

Suppose I have string like FOO_BAR or foo_bar or Foo_Bar, I want to convert it to customised snake case like below
`FOO_BAR` -> `Foo Bar`
`foo_bar` -> `Foo Bar`
`Foo_Bar` -> `Foo Bar`
As you can see, all three inputs give the same output: capitalise the first character of the string and the first character after every underscore, lower-case the rest, and replace each _ (underscore) with a space.
I looked into various libraries such as Guava and Apache Commons, but since this is a customised format I couldn't find an out-of-the-box solution.
I tried the code below and got it partly working, but it looks a bit complex:
str.replaceAll("([a-z])([A-Z])", "$1_$2").replaceAll("_", " ")
The output of the above code is FOO BAR, with every character still in uppercase. I could fix that in another pass, but I'm looking for something simpler and more efficient.
Just for a bit of fun, here is a stream-based answer:
var answer = Arrays.stream(s.split("_"))
.map(i -> i.substring(0, 1).toUpperCase() + i.substring(1).toLowerCase())
.collect(Collectors.joining(" "));
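As written, the snippet above throws a StringIndexOutOfBoundsException when a segment is empty, for example with a doubled or leading underscore. The sketch below is one way to guard against that. It assumes empty segments should simply be dropped, and the names TitleCaseSketch and toTitleWords are made up for illustration.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

class TitleCaseSketch {
    // Splits on '_', drops empty segments, and capitalises each remaining word.
    static String toTitleWords(String s) {
        return Arrays.stream(s.split("_"))
                .filter(part -> !part.isEmpty())
                .map(part -> part.substring(0, 1).toUpperCase(Locale.ROOT)
                        + part.substring(1).toLowerCase(Locale.ROOT))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(toTitleWords("FOO_BAR"));
        System.out.println(toTitleWords("foo__bar"));
    }
}
```

Locale.ROOT keeps the case conversion independent of the machine's default locale.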
Here's a simple implementation. I would add a few more test cases before I trusted it.
It does not handle Unicode characters outside the Basic Multilingual Plane, which take two char values in a Java String.
public class Snakify {
public static String toSnake(String in) {
boolean first = true;
boolean afterUnderscore = false;
char[] chars = in.toCharArray();
for (int i = 0; i < chars.length; i++) {
if ((first || afterUnderscore) && Character.isAlphabetic(chars[i])) {
chars[i] = Character.toUpperCase(chars[i]);
first = false;
afterUnderscore = false;
} else if (chars[i] == '_') {
chars[i] = ' ';
afterUnderscore = true;
} else if (Character.isAlphabetic(chars[i])) {
chars[i] = Character.toLowerCase(chars[i]);
}
}
return new String(chars);
}
public static void main(String[] args) {
System.out.println(toSnake("FOO_BAR").equals("Foo Bar"));
System.out.println(toSnake("foo_bar").equals("Foo Bar"));
System.out.println(toSnake("Foo_Bar").equals("Foo Bar"));
System.out.println(toSnake("àèì_òù_ÀÈÌ").equals("Àèì Òù Àèì"));
}
}

Create a linked-list of characters on a stack?

I have an assignment that I'm struggling with.
Write code based on referenced-based stack to implement the balance check of a user input string with ‘{’, ‘}’, ‘(’, ‘)’, and ‘[’, and ‘]’. For instance, if user inputs “(abc[d]e{f})”, your code should say that the expression is balanced.
I have the functions push / pop already written:
public void push(Object newItem) {
top = new Node(newItem, top);
} // end push
public Object pop(){
if (!isEmpty()) {
Node temp = top;
top = top.getNext();
return temp.getItem();
} else {
System.out.print("StackError on " +
"pop: stack empty");
return null;
} // end if
} // end pop
However, what I am struggling with is understanding how to create a new node for each character. Could somebody please help me?
Since your assignment instructions ask you to "Write code based on referenced-based stack", it seems your question is really about how to turn each character of the user's input string into a node. In that case, you can first convert the string to an array of chars, simply like this:
public class Main {
public static void main(String[] args){
String str = new String("[(a)bcde]");
System.out.println(str.toCharArray());
}
}
You can then use the ASCII table to tell whether a character is one of the brackets. For example, with the code above:
(int) str.toCharArray()[0] // will show ASCII code of '[', 91
Some useful implementations about Reference-based Stack
The isbalanced mechanism simplified for [,],(,):
Always add(push) a [, or (
When you get to a ) check the last added character was a (. If it was remove(pop) it, otherwise mark as unbalanced.
When you get to a ] check the last added character was a [. If it was remove(pop) it, otherwise mark as unbalanced.
If the stack is empty by the end of the string, it is balanced.
In response to your comment
Based off an answer for iterate through the characters of a string
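boolean unbalanced = false;
for (int i = 0; i < s.length(); i++)
{
    char c = s.charAt(i);
    if (c == '[')
    {
        push(Character.valueOf(c)); // push takes an Object, so box the char
    }
    if (c == ']')
    {
        // a ']' with an empty stack, or with something other than '[' on top, is unbalanced
        if (isEmpty() || !((Character) pop()).equals(Character.valueOf('[')))
        {
            unbalanced = true;
            break;
        }
    }
}
if (!isEmpty()) // a leftover '[' was never closed
{
    unbalanced = true;
}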
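The same mechanism extends to all three bracket pairs from the assignment. The following is a self-contained sketch, not the assignment's reference solution. It uses its own tiny linked stack of chars instead of the Object-based push/pop from the question, and the names BalanceCheck and isBalanced are illustrative.

```java
class BalanceCheck {
    // Minimal reference-based (linked) stack: each pushed char gets its own node.
    private static final class Node {
        final char item;
        final Node next;

        Node(char item, Node next) {
            this.item = item;
            this.next = next;
        }
    }

    private Node top;

    private void push(char c) {
        top = new Node(c, top);
    }

    private boolean isEmpty() {
        return top == null;
    }

    private char pop() {
        char c = top.item;
        top = top.next;
        return c;
    }

    static boolean isBalanced(String s) {
        BalanceCheck stack = new BalanceCheck();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '(': case '[': case '{':
                    stack.push(c); // always push an opening bracket
                    break;
                case ')': case ']': case '}':
                    if (stack.isEmpty()) {
                        return false; // closing bracket with nothing open
                    }
                    char open = stack.pop();
                    if ((c == ')' && open != '(')
                            || (c == ']' && open != '[')
                            || (c == '}' && open != '{')) {
                        return false; // wrong kind of bracket on top
                    }
                    break;
                default:
                    break; // every other character is ignored
            }
        }
        return stack.isEmpty(); // leftovers mean something was never closed
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("(abc[d]e{f})"));
    }
}
```

Checking isEmpty() before every pop means a stray closing bracket is reported as unbalanced rather than causing a stack error.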
Here is what the professor was looking for:
... }
if(currChar.equals("["))
{
myStackRef.push("[");
}
if(currChar.equals("}") && myStackRef.peek().equals("{"))
{
myStackRef.pop();
}
if(currChar.equals(")") && myStackRef.peek().equals("("))
{
myStackRef.pop();
}
if(currChar.equals("]") && myStackRef.peek().equals("["))
{
myStackRef.pop();
}
}
if(myStackRef.isEmpty())
{
System.out.println("Balanced");
}
else
{
System.out.println("Unbalanced");
}
}
}

Removing leading zero in java code

How can I remove leading zeros in Java code? I tried several approaches, such as the regex methods
s.replaceFirst("^0+(?!$)", "") / s.replaceAll("^0*", "");
but it seems they are not supported at my current compiler compliance level (1.3): I get a red underline stating that the method replaceFirst(String, String) is undefined for the type String.
Part of My Java code
public String proc_MODEL(Element recElement)
{
    String SEAT = "";
    try
    {
        SEAT = setNullToString(recElement.getChildText("SEAT")); // xml value =0000500
        if (SEAT.length()>0)
        {
            SEAT = SEAT.replaceFirst("^0*", ""); //I need to remove leading zero to only 500
        }
    }
    catch (Exception e)
    {
        e.printStackTrace();
        return "501 Exception in proc_MODEL";
    }
    return SEAT; // assumed: the excerpt does not show the method's normal return
}
I'd appreciate any help.
If you want to remove leading zeros, you could parse the value to an Integer and convert it back to a String in one line, like
String seat = "001";// setNullToString(recElement.getChildText("SEAT"));
seat = Integer.valueOf(seat).toString();
System.out.println(seat);
Output is
1
Of course if you intend to use the value it's probably better to keep the int
int s = Integer.parseInt(seat);
System.out.println(s);
replaceFirst() was introduced in 1.4 and your compiler pre-dates that.
One possibility is to use something like:
public class testprog {
public static void main(String[] args) {
String s = "0001000";
while ((s.length() > 1) && (s.charAt(0) == '0'))
s = s.substring(1);
System.out.println(s);
}
}
It's not the most efficient code in the world but it'll get the job done.
A more efficient segment without unnecessary string creation could be:
public class testprog {
public static void main(String[] args) {
String s = "0001000";
int pos = 0;
int len = s.length();
while ((pos < len-1) && (s.charAt(pos) == '0'))
pos++;
s = s.substring(pos);
System.out.println(s);
}
}
Both of those also handle the degenerate cases of an empty string and a string containing only 0 characters.
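To make those edge cases easy to check, the index-based loop can be wrapped in a method. This is only a sketch: the names LeadingZeros and strip are made up, and it sticks to String methods that already existed before Java 1.4.

```java
class LeadingZeros {
    // Drops leading '0' characters but always keeps the last character,
    // so "000" becomes "0" rather than an empty string.
    static String strip(String s) {
        int pos = 0;
        int last = s.length() - 1;
        while (pos < last && s.charAt(pos) == '0') {
            pos++;
        }
        return s.substring(pos);
    }

    public static void main(String[] args) {
        System.out.println(strip("0000500"));
        System.out.println(strip("000"));
    }
}
```

An empty input comes back unchanged, because the loop condition is false from the start.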
Using the Java method str.replaceAll("^0+(?!$)", "") would be simple, although replaceAll, like replaceFirst, is only available from Java 1.4:
First parameter: regex -- the regular expression to which this string is to be matched.
Second parameter: replacement -- the string which would replace the matched expression.
As stated in the Java documentation, 'replaceFirst' has only existed since Java 1.4 http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replaceFirst(java.lang.String,%20java.lang.String)
Use this function instead:
String removeLeadingZeros(String str) {
while (str.indexOf("0")==0)
str = str.substring(1);
return str;
}

java find if the string contains 2 other strings

I have two strings, "test" and "bet", and another string a="tbtetse". I need to check whether "tbtetse" contains the other two strings.
I was thinking I could find all the anagrams of string a and then look for the other two strings in those, but it doesn't work that way, and my anagram code also fails for a long string.
Could you please suggest any other ways to solve it?
Assuming you're trying to test whether the letters in a can be used to form an anagram of the test strings test and bet: I recommend making a dictionary (HashMap or whatever) of character counts from string a, indexed by character. Build a similar dictionary for the words you're testing. Then make sure that a has at least as many instances of each character from the test strings as they have.
Edit: Alcanzar suggests arrays of length 26 for holding the counts (one slot for each letter). Assuming you're dealing with only English letters, that is probably less of a hassle than dictionaries. If you don't know the number of allowed characters, the dictionary route is necessary.
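A sketch of the dictionary approach, under the assumption that both words must be formed from separate letters of a, so their counts are added together. The names LetterCounts, count and canForm are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

class LetterCounts {
    // Counts how often each character occurs across all the given words.
    static Map<Character, Integer> count(String... words) {
        Map<Character, Integer> counts = new HashMap<>();
        for (String word : words) {
            for (char c : word.toCharArray()) {
                counts.merge(c, 1, Integer::sum);
            }
        }
        return counts;
    }

    // True if 'pool' has at least as many of every character as the words need in total.
    static boolean canForm(String pool, String... words) {
        Map<Character, Integer> have = count(pool);
        for (Map.Entry<Character, Integer> need : count(words).entrySet()) {
            if (have.getOrDefault(need.getKey(), 0) < need.getValue()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canForm("tbtetse", "test", "bet"));
    }
}
```

Because the map is keyed by Character, this works for any characters, not just the 26 lower-case English letters.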
Check below code, it may help you.
public class StringTest {
public static void main(String[] args) {
String str1 = "test";
String str2 = "bev";
String str3 = "tbtetse";
System.out.println(isStringPresent(str1, str2, str3));
}
private static boolean isStringPresent(String str1, String str2, String str3) {
if ((str1.length() + str2.length()) != str3.length()) {
return false;
} else {
String[] str1Arr = str1.split("");
String[] str2Arr = str2.split("");
for (String string : str1Arr) {
if (!str3.contains(string)) {
return false;
}
}
for (String string : str2Arr) {
if (!str3.contains(string)) {
return false;
}
}
}
return true;
}
}
Basically, you need to count the characters in both sets and compare them:
void fillInCharCounts(String word,int[] counts) {
for (int i = 0; i<word.length(); i++) {
char ch = word.charAt(i);
int index = ch - 'a';
counts[index]++;
}
}
int[] counts1 = new int[26];
int[] counts2 = new int[26];
fillInCharCounts("test",counts1);
fillInCharCounts("bet",counts1);
fillInCharCounts("tbtetse",counts2);
boolean failed = false;
for (int i = 0; i<counts1.length; i++) {
if (counts1[i] > counts2[i]) {
failed = true;
}
}
if (failed) {
    // the letters of "tbtetse" cannot form both words
} else {
    // both words can be formed from the letters of "tbtetse"
}
If you are generalizing it, don't forget to call .toLowerCase() on the word before sending it in (or fix the counting method).
Pseudo code:
Make a copy of string "tbtetse".
Loop through each character in "test".
Do an indexOf() search for the character in your copied string and remove it if found.
If not found, fail.
Do the same for the string "bet".
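The pseudo code above could look like this in Java. This is a sketch: the names RemoveAsYouGo and canForm are made up, and a StringBuilder serves as the mutable copy of the string.

```java
class RemoveAsYouGo {
    // Removes each needed character from a copy of 'pool'; fails on the first miss.
    static boolean canForm(String pool, String... words) {
        StringBuilder remaining = new StringBuilder(pool); // the copy
        for (String word : words) {
            for (int i = 0; i < word.length(); i++) {
                int at = remaining.indexOf(String.valueOf(word.charAt(i)));
                if (at < 0) {
                    return false; // character not available (any more)
                }
                remaining.deleteCharAt(at);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canForm("tbtetse", "test", "bet"));
    }
}
```

Each lookup scans the remaining characters, so this is quadratic in the worst case, which is fine for short strings like the ones in the question.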
class WordLetter {
char letter;
int nth; // Occurrence of that letter
...
}
One now can use Sets
Set<WordLetter>
// "test" = { t0 e0 s0 t1 }
Then testing reduces to set operations. If both words need to be present, a union can be tested. If both words must be formed from separate letters, a set of the concatenation can be tested.

Retrieve method source code from class source code file

I have a String that contains the source code of a class, and another String that contains the full name of a method in this class. The method name is e.g.
public void (java.lang.String test)
Now I want to retrieve the source code of this method from the string holding the class's source code. How can I do that? With String#indexOf(methodName) I can find the start of the method's source code, but how do I find the end?
====EDIT====
I used the count curly-braces approach:
internal void retrieveSourceCode()
{
int startPosition = parentClass.getSourceCode().IndexOf(this.getName());
if (startPosition != -1)
{
String subCode = parentClass.getSourceCode().Substring(startPosition, parentClass.getSourceCode().Length - startPosition);
for (int i = 0; i < subCode.Length; i++)
{
String c = subCode.Substring(0, i);
int open = c.Split('{').Count() - 1;
int close = c.Split('}').Count() - 1;
if (open == close && open != 0)
{
sourceCode = c;
break;
}
}
}
Console.WriteLine("SourceCode for " + this.getName() + "\n" + sourceCode);
}
This works more or less fine. However, if a method is defined without a body, it fails. Any hints on how to solve that?
Counting braces and stopping when the count decreases to 0 is indeed the way to go. Of course, you need to take into account braces that appear as literals and should thus not be counted, e.g. braces in comments and strings.
Overall this is kind of a thankless endeavour, comparable in complexity to say, building a command line parser if you want to get it working really reliably. If you know you can get away with it you could cut some corners and just count all the braces, although I do not recommend it.
Update:
Here's some sample code to do the brace counting. As I said, this is a thankless job and there are tons of details you have to get right (in essence, you're writing a mini-lexer). It's in C#, as this is the closest to Java I can write code in with confidence.
The code below is not complete and probably not 100% correct (for example: verbatim strings in C# do not allow spaces between the @ and the opening quote, but did I know that for a fact or just forgot about it?)
// sourceCode is a string containing all the source file's text
var sourceCode = "...";
// startIndex is the index of the char AFTER the opening brace
// for the method we are interested in
var methodStartIndex = 42;
var openBraces = 1;
var insideLiteralString = false;
var insideVerbatimString = false;
var insideBlockComment = false;
var lastChar = ' '; // White space is ignored by the C# parser,
// so a space is a good "neutral" character
for (var i = methodStartIndex; openBraces > 0; ++i) {
var ch = sourceCode[i];
switch (ch) {
case '{':
if (!insideBlockComment && !insideLiteralString && !insideVerbatimString) {
++openBraces;
}
break;
case '}':
if (!insideBlockComment && !insideLiteralString && !insideVerbatimString) {
--openBraces;
}
break;
case '"':
if (insideBlockComment) {
continue;
}
if (insideLiteralString) {
// "Step out" of the string if this is the closing quote
insideLiteralString = lastChar == '\\'; // stay inside only if the quote was escaped
}
else if (insideVerbatimString) {
// If this quote is part of a two-quote pair, do NOT step out
// (it means the string contains a literal quote)
// This can throw, but only for source files with syntax errors
// I'm ignoring this possibility here...
var nextCh = sourceCode[i + 1];
if (nextCh == '"') {
++i; // skip that next quote
}
else {
insideVerbatimString = false;
}
}
else {
if (lastChar == '@') {
insideVerbatimString = true;
}
else {
insideLiteralString = true;
}
}
break;
case '/':
if (insideLiteralString || insideVerbatimString) {
continue;
}
// TODO: parse this
// It can start a line comment, if followed by /
// It can start a block comment, if followed by *
// It can end a block comment, if preceded by *
// Line comments are intended to be handled by just incrementing i
// until you see a CR and/or LF, hence no insideLineComment flag.
break;
}
lastChar = ch;
}
// From the values of methodStartIndex and i we can now do sourceCode.Substring and get the method source
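Since the question is about Java source held in a String, here is a compact Java sketch of the same brace-counting idea. It is an illustration under stated assumptions, not a full lexer: it skips line comments, block comments, and ordinary string and char literals, and it does not handle Java text blocks ("""). The names MethodBodyFinder and findBlockEnd are made up.

```java
class MethodBodyFinder {
    // Returns the index just past the '}' that closes the block opened at
    // openIndex (which must point at a '{'), or -1 if the block never closes.
    static int findBlockEnd(String src, int openIndex) {
        int depth = 0;
        int i = openIndex;
        while (i < src.length()) {
            char ch = src.charAt(i);
            char next = i + 1 < src.length() ? src.charAt(i + 1) : '\0';
            if (ch == '/' && next == '/') {
                // line comment: jump to the character after the line break
                int newline = src.indexOf('\n', i);
                if (newline < 0) {
                    return -1;
                }
                i = newline + 1;
            } else if (ch == '/' && next == '*') {
                // block comment: jump past the closing "*/"
                int end = src.indexOf("*/", i + 2);
                if (end < 0) {
                    return -1;
                }
                i = end + 2;
            } else if (ch == '"' || ch == '\'') {
                // string or char literal: advance to the matching unescaped quote
                i++;
                while (i < src.length() && src.charAt(i) != ch) {
                    if (src.charAt(i) == '\\') {
                        i++; // skip the escaped character
                    }
                    i++;
                }
                i++; // step past the closing quote
            } else {
                if (ch == '{') {
                    depth++;
                } else if (ch == '}') {
                    depth--;
                    if (depth == 0) {
                        return i + 1;
                    }
                }
                i++;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String src = "void f() { String s = \"}\"; /* } */ if (x) { y(); } }";
        int open = src.indexOf('{');
        System.out.println(src.substring(open, findBlockEnd(src, open)));
    }
}
```

Given the index of the method's opening brace, src.substring(open, findBlockEnd(src, open)) yields the method body including its braces.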
Have a look at: Parser for C#
It recommends using NRefactory to parse and tokenise source code, you should be able to use that to navigate your class source and pick out methods.
You will probably have to know the order in which the methods appear in the code file. That way, you can look for the method's closing }, which should sit just above the start of the next method.
So your code might look like:
nStartOfMethod = String.indexOf(methodName)
nStartOfNextMethod = String.indexOf(NextMethodName)
Look for .LastIndexOf(yourMethodTerminator /* probably a } */, ...) within the substring between nStartOfMethod and nStartOfNextMethod.
If you don't know the order of the methods, you might end up skipping over a method in between while looking for the ending brace.
