Parse an Expression to its components and sub components


I need to parse an expression such as: neg(and(X,Y))
The output should be abstract stack machine code. For the example above:
LOAD X;
LOAD Y;
EXEC and;
EXEC neg;
But for now the machine code is not the issue. How can I parse / break up my input string of an expression into all its sub-expressions?
I have tried to find the first bracket and then take the substring from there to the last bracket, but that gives issues if you have an inner expression.
Code that I have tried (please note it is still very much in the development phase):
private boolean evaluateExpression(String expression) {
int brackets = 0;
int beginIndex = -1;
int endIndex = -1;
for (int i = 0; i < expression.length(); i++) {
if (expression.charAt(i) == '(') {
brackets++;
if (brackets == 0) {
endIndex = i;
System.out.println("the first expression ends at " + i);
}
}
if (expression.charAt(i) == ')') {
brackets--;
if (brackets == 0) {
endIndex = i;
System.out.println("the first expression ends at " + i);
}
}
}
// Check for 1st bracket
for (int i = 0; i < expression.length(); i++) {
if (expression.charAt(i) == '(') {
beginIndex = i;
break;
}
}
String subExpression = expression.substring(beginIndex, endIndex);
System.out.println("Sub expression: " + subExpression);
evaluateExpression(subExpression);
return false;
}
I am just looking for a basic solution, It only has to do: and, or, neg

The expressions you are trying to parse form a context-free language, which can be described by a context-free grammar (CFG).
You can write a CFG for this language of expressions and use a CFG parser to parse it.
One existing Java tool that does this (and more) is JavaCC, though it could be overkill here.
Another algorithm for parsing sentences with a CFG is CYK, which is fairly easy to program and use.
Here, the CFG representing the available expressions is:
S -> or(S,S)
S -> and(S,S)
S -> neg(S)
S -> x (for each variable x)
Note that although this is a relatively simple CFG, the language it describes is not regular, so if you were hoping to use a regex, that is probably not the way to go.
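For a grammar this small, a hand-written recursive descent parser is one option alongside the tools mentioned above. The following is a minimal sketch, not a definitive implementation: the class name ExprCompiler and its helper methods are made up for illustration, negation is spelled neg as in the question, and variable names are assumed to consist of letters and digits only. Each grammar rule becomes one branch of parseExpression, and the stack machine code is emitted as soon as an operator's arguments have been parsed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the names are illustrative only.
class ExprCompiler {
    private final String src;
    private int pos = 0;
    private final List<String> code = new ArrayList<>();

    private ExprCompiler(String src) {
        this.src = src.replaceAll("\\s+", ""); // ignore whitespace
    }

    /** Parses the whole expression and returns the emitted instructions. */
    static List<String> compile(String expression) {
        ExprCompiler c = new ExprCompiler(expression);
        c.parseExpression();
        if (c.pos != c.src.length()) {
            throw new IllegalArgumentException("Unexpected input at index " + c.pos);
        }
        return c.code;
    }

    // S -> or(S,S) | and(S,S) | neg(S) | x
    private void parseExpression() {
        String name = readName();
        if (pos < src.length() && src.charAt(pos) == '(') {
            int arity = arityOf(name);
            pos++; // consume '('
            for (int i = 0; i < arity; i++) {
                if (i > 0) expect(',');
                parseExpression(); // arguments are emitted before the operator
            }
            expect(')');
            code.add("EXEC " + name + ";");
        } else {
            code.add("LOAD " + name + ";"); // a bare name is a variable
        }
    }

    private static int arityOf(String name) {
        switch (name) {
            case "and": case "or": return 2;
            case "neg": return 1;
            default: throw new IllegalArgumentException("Unknown operator: " + name);
        }
    }

    private String readName() {
        int start = pos;
        while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("Expected a name at index " + pos);
        return src.substring(start, pos);
    }

    private void expect(char c) {
        if (pos >= src.length() || src.charAt(pos) != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at index " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        compile("neg(and(X,Y))").forEach(System.out::println);
    }
}
```

Because the recursion mirrors the nesting of the input, inner expressions need no special handling: each nested call consumes exactly its own brackets.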

If you want your parser to be strong enough to deal with most cases, use a tokenizer (Java ships with tokenizer classes such as java.util.StringTokenizer) to split the string into tokens first. Then recognize each expression, store operands and operators in a tree structure, and evaluate the tree recursively.
If you only want to deal with some simple situations, remember to use recursion; that is the core part.

Parsing things like this is typically done using syntax trees, with some rule of precedence for the order of operations. An example for what you have posted would be as follows.
Processing items left to right, the tree would be populated like this:
1arg_fcall(neg)
  2arg_fcall(and)
    Load Y
    Load X
Now we can recursively visit this tree bottom to top to get:
Load X
Load Y
EXEC and //on X and Y
EXEC neg //on result of and
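The bottom-up visit described here is a post-order traversal: emit the children first, then the node itself. A minimal sketch follows, assuming a hypothetical Node class in which a node without children is a variable and any other node is an operator call. The tree is built by hand so that the traversal stays the focus.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical classes for illustration; not taken from the answer above.
class SyntaxTreeDemo {
    /** A node is either a variable (no children) or an operator call with arguments. */
    static final class Node {
        final String name;
        final List<Node> children;

        Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    /** Post-order traversal: arguments first, then the operator that consumes them. */
    static void emit(Node node, List<String> out) {
        for (Node child : node.children) {
            emit(child, out);
        }
        out.add(node.children.isEmpty() ? "LOAD " + node.name : "EXEC " + node.name);
    }

    public static void main(String[] args) {
        // Tree for neg(and(X,Y))
        Node tree = new Node("neg", new Node("and", new Node("X"), new Node("Y")));
        List<String> out = new ArrayList<>();
        emit(tree, out);
        out.forEach(System.out::println);
    }
}
```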

Related

Get parameters of nested functions

I am trying to implement a parser in Java to extract the arguments of some functions.
When I have a function like:
max(1, 2, 3)
I can simply use a regular expression to extract the args.
But not all of my functions are like that. If I have a nested function, e.g.:
max(sum(1, max(1,2,sum(2,5))), 3, 5, mult(3,3))
I would like to obtain:
sum(1, max(1,2,sum(2,5)))
3
5
mult(3,3)
I tried using a regular expression, but I assume the language is not regular. Another approach was splitting by ',', but that did not work either.
Is there any method for extracting the arguments of a function? I do not really know how this type of problem can be solved, since there is no fixed pattern to use for extracting the arguments.
Any help or insight would be really appreciated. Thanks!!

Parsing source code into an abstract model is quite a complex topic, and how complex depends on the language.
But the first step is usually tokenization, where you read one character at a time and detect full tokens (variable names, function names, operators, literals, etc.).
Since the scope of your problem is very limited, you have a very small set of tokens:
name of a function
( and ) to indicate method call
, to separate arguments
numbers
Reading one character at a time, you should be able to detect very easily where one token ends and the next one begins. Your tokens are also very distinct (for example, you don't have to differentiate a function name from a variable name), so you can categorize them very easily.
Once you have the tokens, and since you know the grammar (you have only function calls), you can easily build a syntax tree (with the top-level function call at the root and its arguments as child nodes).
From that structure you can easily fetch whichever parts you wish.
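The tokenization step can be sketched in a few lines. This is only an illustration under stated assumptions: the class name CallTokenizer is made up, names and numbers are assumed to consist of letters and digits, and any other character apart from whitespace, brackets, and commas is rejected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tokenizer for the four token kinds listed above.
class CallTokenizer {
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace only separates tokens
            } else if (c == '(' || c == ')' || c == ',') {
                tokens.add(String.valueOf(c)); // single-character tokens
                i++;
            } else if (Character.isLetterOrDigit(c)) {
                int start = i; // a function name or a number runs until the next non-alphanumeric
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("max(1, sum(2,5))"));
    }
}
```

A parser can then walk this token list instead of the raw string, which keeps the bracket-matching logic free of character-level details.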
If you are more interested in how it works in the javac compiler, you can always check out its source code (it's open source after all):
https://github.com/openjdk/jdk/blob/master/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/JavacParser.java
https://github.com/openjdk/jdk/blob/master/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/JavaTokenizer.java
However, that may be quite a long read.

Finally found a method that works:
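// requires java.util.ArrayList and java.util.List
public List<String> parseArgs(String l) {
    int startIdx = l.indexOf("(") + 1;    // first character after the outermost '('
    int endIdx = l.lastIndexOf(")") - 1;  // last character before the outermost ')'
    int depth = 0;                        // bracket nesting depth inside the argument list
    int argIdx = startIdx;                // start of the argument currently being scanned
    List<String> args = new ArrayList<>();
    for (int i = startIdx; i < endIdx; i++) {
        char c = l.charAt(i);
        if (c == '(') {
            depth++;
        } else if (c == ')') {
            depth--;
        } else if (c == ',' && depth == 0) {
            // a comma at depth 0 separates two top-level arguments
            args.add(l.substring(argIdx, i).trim());
            argIdx = i + 1;
        }
    }
    args.add(l.substring(argIdx, endIdx + 1).trim()); // the last argument has no trailing comma
    return args;
}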
String l = "max(sum(1, max(1,2,sum(2,5))), 3, 5, mult(3,3))";
parseArgs(l).forEach(System.out::println);
//Prints
sum(1, max(1,2,sum(2,5)))
3
5
mult(3,3)

Find all valid words when given a string of characters (Recursion / Binary Search)

I'd like some feedback on a method I tried to implement that isn't working 100%. I'm making an Android app for practice where the user is given 20 random letters. The user then uses these letters to make a word of whatever size. It then checks a dictionary to see if it is a valid English word.
The part that's giving me trouble is showing a "hint". If the user is stuck, I want to display the possible words that can be made. I initially thought of recursion. However, with 20 letters this can take quite a long time to execute. So I also implemented a binary search to check whether the current recursion path is a prefix of anything in the dictionary. I do get valid hints as output, but it's not returning all possible words. Do I have a mistake in my recursion thinking? Also, is there a recommended, faster algorithm? I've seen a method in which you go through each word in a dictionary and check whether the available characters can make that word. However, I'd like to know how effective my method is compared to that one.
private static void getAllWords(String letterPool, String currWord) {
//Add to possibleWords when valid word
if (letterPool.equals("")) {
//System.out.println("");
} else if(currWord.equals("")){
for (int i = 0; i < letterPool.length(); i++) {
String curr = letterPool.substring(i, i+1);
String newLetterPool = (letterPool.substring(0, i) + letterPool.substring(i+1));
if(dict.contains(curr)){
possibleWords.add(curr);
}
boolean prefixInDic = binarySearch(curr);
if( !prefixInDic ){
break;
} else {
getAllWords(newLetterPool, curr);
}
}
} else {
//Every time we add a letter to currWord, delete from letterPool
//Attach new letter to curr and then check if in dict
for(int i=0; i<letterPool.length(); i++){
String curr = currWord + letterPool.substring(i, i+1);
String newLetterPool = (letterPool.substring(0, i) + letterPool.substring(i+1));
if(dict.contains(curr)) {
possibleWords.add(curr);
}
boolean prefixInDic = binarySearch(curr);
if( !prefixInDic ){
break;
} else {
getAllWords(newLetterPool, curr);
}
}
}
}
private static boolean binarySearch(String word){
int max = dict.size() - 1;
int min = 0;
int currIndex = 0;
boolean result = false;
while(min <= max) {
currIndex = (min + max) / 2;
if (dict.get(currIndex).startsWith(word)) {
result = true;
break;
} else if (dict.get(currIndex).compareTo(word) < 0) {
min = currIndex + 1;
} else if(dict.get(currIndex).compareTo(word) > 0){
max = currIndex - 1;
} else {
result = true;
break;
}
}
return result;
}

The simplest way to speed up your algorithm is probably to use a Trie (a prefix tree).
Trie data structures offer two relevant methods, isWord(String) and isPrefix(String), both of which take O(n) comparisons to determine whether a word or prefix exists in a dictionary (where n is the number of letters in the argument). This is really fast because it doesn't matter how large your dictionary is.
For comparison, your method for checking whether a prefix exists in your dictionary using binary search is O(n*log(m)), where n is the number of letters in the string and m is the number of words in the dictionary.
I coded up a similar algorithm to yours using a Trie and compared it to the code you posted (with minor modifications) in a very informal benchmark.
With 20-char input, the Trie took 9ms. The original code didn't complete in reasonable time so I had to kill it.
Edit:
As to why your code doesn't return all hints, you don't want to break if the prefix is not in your dict. You should continue to check the next prefix instead.
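A minimal sketch of the Trie approach follows. It is not the code that was benchmarked above: the class TrieHints and its method names are made up for illustration. Instead of separate isWord and isPrefix lookups, the search walks the trie node by node, so "is this still a prefix?" becomes "does a child node exist for this letter?". Note the continue where the question's code had break: a dead prefix only rules out that one letter, not the remaining letters in the pool.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class for illustration.
class TrieHints {
    static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    void insert(String word) {
        Node n = root;
        for (char c : word.toCharArray()) {
            n = n.next.computeIfAbsent(c, k -> new Node());
        }
        n.isWord = true;
    }

    /** Returns every dictionary word that can be built from the letters in the pool. */
    Set<String> findWords(String letterPool) {
        Set<String> found = new TreeSet<>();
        search(root, new StringBuilder(), letterPool.toCharArray(),
               new boolean[letterPool.length()], found);
        return found;
    }

    private void search(Node node, StringBuilder current, char[] pool,
                        boolean[] used, Set<String> found) {
        if (node.isWord) {
            found.add(current.toString());
        }
        for (int i = 0; i < pool.length; i++) {
            if (used[i]) continue;
            Node child = node.next.get(pool[i]);
            if (child == null) continue; // dead prefix: skip this letter, keep trying the others
            used[i] = true;
            current.append(pool[i]);
            search(child, current, pool, used, found);
            current.setLength(current.length() - 1); // backtrack
            used[i] = false;
        }
    }

    public static void main(String[] args) {
        TrieHints trie = new TrieHints();
        for (String w : new String[] {"cat", "act", "at", "tack", "dog"}) {
            trie.insert(w);
        }
        System.out.println(trie.findWords("tca")); // [act, at, cat]
    }
}
```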

Is there a recommended, faster algorithm?
See Wikipedia article on "String searching algorithm", in particular the section named "Algorithms using a finite set of patterns", where "finite set of patterns" is your dictionary.
The Aho–Corasick algorithm listed first might be a good choice.

Efficiently parse single digit arithmetic expression

How would you efficiently (optimizing for runtime but also keeping space at a minimum) parse and evaluate a single-digit arithmetic expression in Java?
The following arithmetic expressions are all valid:
eval("-5")=-5
eval("+4")=4
eval("4")=4
eval("-7+2-3")=-8
eval("5+7")=12
My approach is to iterate over all elements, keeping track of the current arithmetic operation using a flag, and evaluate digit by digit.
public int eval(String s){
int result = 0;
boolean add = true;
for(int i = 0; i < s.length(); i++){
char current = s.charAt(i);
if(current == '+'){
add = true;
} else if(current == '-'){
add = false;
} else {
if(add){
result += Character.getNumericValue(current);
} else {
result -= Character.getNumericValue(current);
}
}
}
return result;
}
Is this the only optimal solution? I have tried to use stacks to keep track of the arithmetic operator, but I am not sure this is any more efficient. I also have not tried regular expressions. I only ask because I gave the above solution in an interview and was told it is sub-optimal.

This seems a bit more compact. It certainly requires fewer lines and conditionals. The key is addition is the "default" behavior and each minus sign you encounter changes the sign of what you want to add; provided you remember to reset the sign after each addition.
public static int eval(String s){
int result = 0;
int sign = 1;
for(int i = 0; i < s.length(); i++){
char current = s.charAt(i);
switch (current)
{
case '+': break;
case '-': sign *= -1; break;
default:
result += sign * Character.getNumericValue(current);
sign = 1;
break;
}
}
return result;
}
As a note, I don't think yours produces correct results when subtracting a negative, e.g., "4--3". Your code produces 1, rather than the correct value of 7. On the other hand, mine allows expressions such as "5+-+-3", which would produce the result 8 (I suppose that's correct? :). However, you didn't list validation as a requirement, and neither of us is checking for sequential digits, alpha characters, white space, etc. If we assume the data is properly formatted, the above implementation should work. I don't see how adding data structures (such as stacks) could possibly be helpful here. I'm also assuming just addition and subtraction.
These test cases produce the following results:
System.out.println(eval("1+2+3+4"));
System.out.println(eval("1--3"));
System.out.println(eval("1+-3-2+4+-3"));
10
4
-3

You need to look up 'recursive descent expression parser' or the Dijkstra shunting-yard algorithm. Your present approach is doomed to failure the moment you have to cope with operator precedence or parentheses. You also need to forget about regular expressions and resign yourself to writing a proper scanner.
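To make the recursive descent suggestion concrete, here is a minimal sketch for single-digit operands. It goes beyond the question by assuming that '*' and parentheses are also allowed, purely to show how precedence falls out of the grammar. The class name DigitExpressionParser is hypothetical. The grammar assumed is: expr = term (('+' | '-') term)*, term = factor ('*' factor)*, factor = ('+' | '-') factor | digit | '(' expr ')'.

```java
// Hypothetical class; one method per grammar rule.
class DigitExpressionParser {
    private final String s;
    private int pos = 0;

    private DigitExpressionParser(String s) {
        this.s = s;
    }

    static int eval(String s) {
        DigitExpressionParser p = new DigitExpressionParser(s);
        int value = p.expr();
        if (p.pos != s.length()) {
            throw new IllegalArgumentException("Unexpected character at index " + p.pos);
        }
        return value;
    }

    // expr = term (('+' | '-') term)*
    private int expr() {
        int value = term();
        while (pos < s.length() && (s.charAt(pos) == '+' || s.charAt(pos) == '-')) {
            char op = s.charAt(pos++);
            int rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // term = factor ('*' factor)*  -- binds tighter than '+' and '-'
    private int term() {
        int value = factor();
        while (pos < s.length() && s.charAt(pos) == '*') {
            pos++;
            value *= factor();
        }
        return value;
    }

    // factor = ('+' | '-') factor | digit | '(' expr ')'
    private int factor() {
        if (pos >= s.length()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        char c = s.charAt(pos++);
        if (c == '-') return -factor();
        if (c == '+') return factor();
        if (c == '(') {
            int value = expr();
            if (pos >= s.length() || s.charAt(pos++) != ')') {
                throw new IllegalArgumentException("Expected ')'");
            }
            return value;
        }
        if (c >= '0' && c <= '9') return c - '0';
        throw new IllegalArgumentException("Unexpected character '" + c + "'");
    }

    public static void main(String[] args) {
        System.out.println(eval("-7+2-3"));  // -8
        System.out.println(eval("2+3*4"));   // 14
        System.out.println(eval("(2+3)*4")); // 20
    }
}
```

The parser still makes a single left-to-right pass over the input and uses no explicit data structure; the call stack plays the role of the operator stack.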

Changing recursive method to iterative

I have a recursive function which works fine. The problem is that it gives a StackOverflowError when the number of lines is huge. I want to make it iterative, probably using a for loop. I need some help doing that.
private TreeSet validate(int curLine, TreeSet errorSet) {
int increment = 0;
int nextLine = 0;
if (curLine == lines.length || errorSet.size() != 0) {
return errorSet;
} else {
String line = lines[curLine];
//validation starts. After validation, line is incremented as per the requirements
increment = 1; //As per requirement. Depends on validation results of the line
if (increment > 0) {
try{
Thread.currentThread().sleep(100);
}catch(Exception ex){
System.out.println(ex);
}
nextLine = (curLine + increment);
validate(nextLine, errorSet);
}
}
return errorSet;
}
Poster's description of the method:
The method validates text lines. Each line contains an instruction saying how many lines have to be skipped if the line is valid. So, if the line is valid, that many lines will be skipped using the increment. If the line is not valid, the increment will be 0.

I'm not sure why this was ever recursive in the first place. This is perfectly suited to a for loop. Use something like this:
private TreeSet validate(int curLine, TreeSet errorSet) {
    int increment = 0;
    if (errorSet.size() != 0)
        return errorSet;
    for (; curLine < lines.length; curLine += increment)
    {
        // put your processing logic in here
        // set the proper increment here; break if it is 0 or errorSet is no longer empty
    }
    return errorSet;
}
If the increment is always going to be 1, then you can just use curLine++ instead of curLine += increment.
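A self-contained version of the loop might look like the sketch below. The validation rule is invented for the sake of a runnable example (a valid line holds a positive integer giving the number of lines to advance), and the names LineValidator and incrementFor are hypothetical. The loop condition carries both stop conditions of the recursive version, so the stack depth stays constant no matter how many lines there are.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class; the real validation logic is not shown in the question.
class LineValidator {
    /** Assumed rule: a valid line holds a positive integer, the number of lines to advance. Returns 0 if invalid. */
    static int incrementFor(String line) {
        try {
            return Math.max(Integer.parseInt(line.trim()), 0);
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    /** Iterative replacement for the recursive validate: returns the indices of failing lines. */
    static Set<Integer> validate(String[] lines) {
        Set<Integer> errorSet = new TreeSet<>();
        int curLine = 0;
        while (curLine < lines.length && errorSet.isEmpty()) {
            int increment = incrementFor(lines[curLine]);
            if (increment > 0) {
                curLine += increment; // valid line: skip ahead as instructed
            } else {
                errorSet.add(curLine); // invalid line: record it; the loop condition then stops the loop
            }
        }
        return errorSet;
    }

    public static void main(String[] args) {
        System.out.println(validate(new String[] {"1", "2", "x", "1"})); // []
        System.out.println(validate(new String[] {"1", "x"}));           // [1]
    }
}
```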

for(String line : lines) {
// validate line here
if(!errorSet.isEmpty()) {
break;
}
}

The solution to your problem could be a simple for or while loop with a logical expression as the stop condition. Typically we use a for loop when we have to pass through all elements of an Iterable or array, and a while loop when we do not know in advance how many iterations there will be. The advantage of a for loop over a while loop is that the loop variables are local to the loop, so they cannot be used outside it, which reduces the chance of bugs.
Your problem is that the program has to stop on either of two conditions:
When errorSet is not empty.
When the array of lines has no more items.
Negating these, we can say that your program should continue:
while errorSet is empty,
and while the line number is smaller than the size of the array the lines are stored in.
This gives us two simple expressions:
errorSet.isEmpty()
lineNumber < lines.length
We can combine them with the logical operator && and use the result as the loop condition of a for loop.
for (int lineNumber = 0; errorSet.isEmpty() && lineNumber < lines.length; lineNumber++) {
//code to operate
}
Note:
Logical expressions are normally combined with &&, which short-circuits: when the left operand is false, the right operand is not evaluated. The alternative, &, always evaluates both operands. Using it here would be a bad idea: if the condition ever indexed into lines, & would still evaluate that access after lineNumber had run past the end and throw an ArrayIndexOutOfBoundsException. Swapping the two operands brings no gain either, since the first expression is evaluated the same number of times.

Is there a way to pre increment by more than 1 in Java?

In Java you can do a post increment of an integer i by more than one in this manner:
j + i += 2.
I would like to do the same thing with a pre increment.
e.g j + (2 += i) //This will not work

Just put the increment statement in parentheses. For example, the following will output pre: 2:
int i = 0;
System.out.println(
((i+=2) == 0)
? "post: " + i : "pre: " + i);
However, writing code like this borders on obfuscation. Splitting up the statement into multiple lines will significantly improve readability.
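The split-up form might look like the following sketch (the class and variable names are made up for illustration). A compound assignment in parentheses evaluates to the new value, so it behaves like a pre-increment by 2. To get post-increment behaviour, save the old value before updating.

```java
// Hypothetical demo class.
class IncrementDemo {
    /** Returns {pre, a, post, b} so the values can be inspected. */
    static int[] demo() {
        int j = 10;

        int a = 1;
        int pre = j + (a += 2); // a is updated first, so pre == 13 and a == 3

        int b = 1;
        int oldB = b;           // remember the value before the update
        b += 2;                 // b == 3
        int post = j + oldB;    // uses the old value, so post == 11

        return new int[] {pre, a, post, b};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(demo())); // [13, 3, 11, 3]
    }
}
```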

Not sure if there is a confusion in terminology going on, but += is not a post- or pre-increment operator! Java follows the C/C++ definition of post-increment/pre-increment, and they are well defined in the standard as unary operators. += is a shortcut for a binary operator. It evaluates to this:
lvalue1 += 5 ;
// is really (almost)
lvalue1 = lvalue1 + 5;
The "almost" is because the left-hand side is evaluated only once and the result is implicitly cast back to its type. The compiled code for the compound form doesn't look exactly like the expanded version, but at the level you're using Java you do not see that.
The post-increment/pre-increment are unary operators that function kind of like this:
i++ ; // is something like temp = i; i = i + 1; return temp;
++i; // is something like i = i + 1; return i;
This is just an example of how it works; the bytecode doesn't translate to multiple statements for the post-increment case.
In your example, you could say a post-increment occurs, but really it's just an increment, which is why I believe you have made the (incorrect) leap that it might be possible to have a pre-increment version of the same operation. But such a thing does not exist in C, C++, or Java.

Something like:
int increment = 42;
int oldI = i;
i += increment;
result = myMethod(oldI);
// Rest of code follows.
But why would you want to do this?

The += is a pre-increment. If you want a post-increment, simply create a wrapper with a postInc method that does this. If you really need this, it would be more readable than parentheses.
