Given an expression with operators, functions, and operands, such as:
2 + sin ( max ( 2, 3 ) / 3 * 3.1415 )
How can I programmatically validate the expression, such that any functions must have the correct number of parameters? For example abs,sin,cos must have exactly 1 parameter, whereas sum,avg,max,min have 2 or more.
Given that each parameter can itself be a very complicated expression, it seems non-trivial to programmatically determine this. I have already written a lexical tokenizer (lexer), and I've managed to convert the expression to postfix/RPN. (Which is: 2 3 max 3 / 3.1415 * sin 2 +). I am still no closer to a solution.
I would appreciate some code or pseudocode that will guide me in writing something from scratch. Java would be great.
Below is my shunting-yard code, which converts the lexer's tokens to RPN:
public static List<Token> shunt(List<Token> tokens) throws Exception {
List<Token> rpn = new ArrayList<Token>();
Iterator<Token> it = tokens.iterator();
Stack<Token> stack = new Stack<Token>();
while (it.hasNext()) {
Token token = it.next();
if (Type.NUMBER.equals(token.type))
rpn.add(token);
if (Type.FUNCTION.equals(token.type) || Type.LPAREN.equals(token.type))
stack.push(token);
if (Type.COMMA.equals(token.type)) {
while (!stack.isEmpty() && !Type.LPAREN.equals(stack.peek().type))
rpn.add(stack.pop());
if (stack.isEmpty())
throw new Exception("Missing left parenthesis!");
}
if (Type.OPERATOR.equals(token.type)) {
while (!stack.isEmpty() && Type.OPERATOR.equals(stack.peek().type))
rpn.add(stack.pop());
stack.add(token);
}
if (Type.RPAREN.equals(token.type)) {
while (!stack.isEmpty() && !Type.LPAREN.equals(stack.peek().type))
rpn.add(stack.pop());
if (stack.isEmpty())
throw new Exception("Missing left parenthesis!");
stack.pop();
if (!stack.isEmpty() && Type.FUNCTION.equals(stack.peek().type))
rpn.add(stack.pop());
}
}
while (!stack.isEmpty()) {
if (Type.LPAREN.equals(stack.peek().type) || Type.RPAREN.equals(stack.peek().type))
throw new Exception("Mismatched parenthesis!");
rpn.add(stack.pop());
}
return rpn;
}
What you want to do is implement a precise parser that knows the exact syntax of your language (including how many arguments each function takes).
It is easy to write such a parser for expressions. See https://stackoverflow.com/a/2336769/120163
You need to detect it during the shunting-yard pass. A quick idea: for each function entry on the operator stack, keep a counter of the commas seen for that call. Then, either on the closing parenthesis or at the very end, check the resulting number of arguments against what that function expects.
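The counter idea can be sketched roughly as follows. This is a minimal illustration, not a full validator: it assumes the tokens arrive as plain strings (the question's Token class is not shown in full), the class name ArityCheck and the method check are hypothetical, and it only counts arguments without verifying the rest of the syntax.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Set;

// Hypothetical helper: counts arguments per call while scanning the infix tokens.
class ArityCheck {
    // Arity rules taken from the question.
    static final Set<String> UNARY = Set.of("abs", "sin", "cos");
    static final Set<String> VARIADIC = Set.of("sum", "avg", "max", "min");

    // One frame per open parenthesis; function is null for plain grouping parentheses.
    private static final class Frame {
        final String function;
        int commas = 0;
        boolean empty = true;
        Frame(String function) { this.function = function; }
    }

    /** Returns null if every call has a legal argument count, otherwise an error message. */
    static String check(List<String> tokens) {
        Deque<Frame> open = new ArrayDeque<>();
        String pending = null; // function name seen on the previous token, if any
        for (String t : tokens) {
            String function = pending;
            pending = null;
            if (t.equals("(")) {
                if (!open.isEmpty()) open.peek().empty = false;
                open.push(new Frame(function));
            } else if (t.equals(",")) {
                if (open.isEmpty() || open.peek().function == null) return "comma outside a function call";
                open.peek().commas++;
            } else if (t.equals(")")) {
                if (open.isEmpty()) return "missing left parenthesis";
                Frame f = open.pop();
                if (f.function == null) continue; // grouping parentheses, nothing to check
                int args = f.empty ? 0 : f.commas + 1;
                if (UNARY.contains(f.function) && args != 1)
                    return f.function + " expects exactly 1 argument, got " + args;
                if (VARIADIC.contains(f.function) && args < 2)
                    return f.function + " expects at least 2 arguments, got " + args;
            } else {
                // Number, operator, or function name: the enclosing call is not empty.
                if (!open.isEmpty()) open.peek().empty = false;
                if (UNARY.contains(t) || VARIADIC.contains(t)) pending = t;
            }
        }
        return open.isEmpty() ? null : "missing right parenthesis";
    }
}
```

The same per-call counter could presumably live next to the function token on the operator stack inside shunt(); it is kept separate here only to keep the sketch short.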
An alternative might be to keep some more of the information as additional values for your RPN. e.g. keep the commas, you then get:
2 , 3 max 3 / 3.1415 * sin 2 +
When processing a function, it must not only eat values from the stack, it must also eat the correct number of commas. Too many commas will show up later on.
I fear this approach has some edge cases, though, like the one below, so a precise parser is probably better.
sin(1,2) * max (3)
1 , 2 sin 3 max *
Related
I am trying to implement a parser in Java to extract the arguments of some functions.
When I have a function like:
max(1, 2, 3)
I can simply use a regular expression to extract the arguments.
But not all my functions are like that. If I have a nested function, e.g.:
max(sum(1, max(1,2,sum(2,5))), 3, 5, mult(3,3))
I would like to obtain:
sum(1, max(1,2,sum(2,5)))
3
5
mult(3,3)
I tried using a regular expression, but I assume the language is not regular. Another approach was splitting by ',', but that did not work either.
Is there any method for extracting the arguments of a function? I do not really know how this type of problem can be solved, since there is no fixed pattern to use for extracting the arguments.
Any help or insight would be really appreciated. Thanks!!
Parsing source code into an abstract model is quite a complex topic, depending on the complexity of the language.
But the first step is usually tokenization, where you read one character at a time and detect full tokens (like variable names, function names, operators, literals etc).
Since you presented only a very limited scope for the problem, you have a very small set of tokens:
name of a function
( and ) to indicate method call
, to separate arguments
numbers
Reading one symbol at a time, you should be able to detect very easily when one token ends and the next one begins. Also, because your tokens are very distinct (i.e. you don't have to differentiate a function name from a variable name), you can very easily categorize them.
Once you have the tokens, and since you know the grammar (you have only function calls), you can easily build a syntax tree (where the root is the top-level function call and its arguments are the child nodes).
From that structure you can easily fetch whichever parts you wish.
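As a rough illustration of that idea, here is a small recursive-descent sketch for the restricted grammar described above (numbers and function calls only). The names CallParser and Node are hypothetical, and the sketch assumes every call has at least one argument.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for: expr := NUMBER | NAME '(' expr (',' expr)* ')'
class CallParser {
    /** A node is either a number (no children) or a function call with argument nodes. */
    static final class Node {
        final String text;
        final List<Node> args = new ArrayList<>();
        Node(String text) { this.text = text; }

        @Override
        public String toString() {
            if (args.isEmpty()) return text;
            List<String> parts = new ArrayList<>();
            for (Node a : args) parts.add(a.toString());
            return text + "(" + String.join(",", parts) + ")";
        }
    }

    private final String src;
    private int pos = 0;

    private CallParser(String src) {
        this.src = src.replaceAll("\\s+", ""); // whitespace carries no meaning here
    }

    static Node parse(String src) {
        CallParser p = new CallParser(src);
        Node root = p.expr();
        if (p.pos != p.src.length())
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        return root;
    }

    private Node expr() {
        // Token: a run of letters/digits is either a number or a function name.
        int start = pos;
        while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
        if (start == pos)
            throw new IllegalArgumentException("expected name or number at " + pos);
        Node node = new Node(src.substring(start, pos));
        if (pos < src.length() && src.charAt(pos) == '(') {
            do {
                pos++; // skip '(' or ','
                node.args.add(expr());
            } while (pos < src.length() && src.charAt(pos) == ',');
            if (pos >= src.length() || src.charAt(pos) != ')')
                throw new IllegalArgumentException("expected ')' at " + pos);
            pos++; // skip ')'
        }
        return node;
    }
}
```

With the tree in hand, the top-level arguments are simply the children of the root, e.g. CallParser.parse("max(sum(1, 2), 3)").args.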
If you are more interested in how it works in the javac compiler, you can always check out its source code (it's open source after all):
https://github.com/openjdk/jdk/blob/master/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/JavacParser.java
https://github.com/openjdk/jdk/blob/master/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/JavaTokenizer.java
However, that may be quite a long read.
Finally found a method that works:
public List<String> parseArgs(String l) {
    int startIdx = l.indexOf("(") + 1;   // first character after the outer "("
    int endIdx = l.lastIndexOf(")");     // position of the outer ")"
    int depth = 0;                       // nesting depth inside the outer call
    int argIdx = startIdx;               // start of the argument being read
    List<String> args = new ArrayList<>();
    for (int i = startIdx; i < endIdx; i++) {
        char ch = l.charAt(i);
        if (ch == '(') {
            depth++;
        } else if (ch == ')') {
            depth--;
        } else if (ch == ',' && depth == 0) {
            // A comma at depth 0 separates two top-level arguments.
            args.add(l.substring(argIdx, i).trim());
            argIdx = i + 1;
        }
    }
    args.add(l.substring(argIdx, endIdx).trim()); // last argument
    return args;
}
String l = "max(sum(1, max(1,2,sum(2,5))), 3, 5, mult(3,3))";
parseArgs(l).forEach(System.out::println);
//Prints
sum(1, max(1,2,sum(2,5)))
3
5
mult(3,3)
In my software, I need to decide the version of a feature based on 2 parameters. Eg.
Render version 1 -> if (param1 && param2) == true;
Render version 2 -> if (!param1 && !param2) == true;
Render version 3 -> if only param1 == true;
Render version 4 -> if only param2 == true;
So, to meet this requirement, I wrote code that looks like this:
if (param1 && param2) { // both are true
    version = 1;
}
else if (!param1 && !param2) { // both are false
    version = 2;
}
else if (!param2) { // means only param1 is true
    version = 3;
}
else { // means only param2 is true
    version = 4;
}
There are definitely multiple ways to code this but I finalised this approach after trying out different approaches because this is the most readable code I could come up with.
But this piece of code is definitely not scalable, because:
Suppose tomorrow we want to introduce a new parameter called param3. Then the number of checks will increase because of the multiple possible combinations.
For this software, I am pretty sure that we will have to accommodate new parameters in the future.
Can there be any scalable & readable way to code these requirements?
EDIT:
For a scalable solution define the versions for each parameter combination through a Map:
Map<List<Boolean>, Integer> paramsToVersion = Map.of(
List.of(true, true), 1,
List.of(false, false), 2,
List.of(true, false), 3,
List.of(false, true), 4);
Now finding the right version is a simple map lookup:
version = paramsToVersion.get(List.of(param1, param2));
The way I initialized the map works since Java 9. In older Java versions it’s a little more wordy, but probably still worth doing. Even in Java 9 you need to use Map.ofEntries if you have 4 or more parameters (for 16 combinations), which is a little more wordy too.
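For reference, the Map.ofEntries form mentioned above might look like the sketch below. It uses the same two-parameter table so the results can be compared directly; the class name VersionTable and the method versionFor are hypothetical.

```java
import static java.util.Map.entry;

import java.util.List;
import java.util.Map;

// Hypothetical holder for the lookup table; Map.ofEntries has no limit on the number of entries.
class VersionTable {
    static final Map<List<Boolean>, Integer> PARAMS_TO_VERSION = Map.ofEntries(
            entry(List.of(true, true), 1),
            entry(List.of(false, false), 2),
            entry(List.of(true, false), 3),
            entry(List.of(false, true), 4));

    static int versionFor(boolean param1, boolean param2) {
        Integer version = PARAMS_TO_VERSION.get(List.of(param1, param2));
        if (version == null) {
            // Only reachable if a combination was left out of the table.
            throw new IllegalArgumentException("no version defined for this combination");
        }
        return version;
    }
}
```

Adding a third parameter then means extending each key to three booleans and listing the new combinations, without touching the lookup itself.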
Original answer:
My taste would be for nested if/else statements and only testing each parameter once:
if (param1) {
if (param2) {
version = 1;
} else {
version = 3;
}
} else {
if (param2) {
version = 4;
} else {
version = 2;
}
}
But it scales poorly to many parameters.
If you have to enumerate all the possible combinations of Booleans, it's often simplest to convert them into a number:
// param1: F T F T
// param2: F F T T
static final int[] VERSIONS = new int[]{2, 3, 4, 1};
...
version = VERSIONS[(param1 ? 1:0) + (param2 ? 2:0)];
I doubt that there is a way that would be more compact, readable and scalable at the same time.
You express the conditions as minimized expressions, which are compact and may carry meaning (in particular, the irrelevant variables don't clutter them). But there is no systematic structure that you could exploit.
A quite systematic alternative could be truth tables, i.e. the explicit expansion of all combinations and the associated truth value (or version number), which can be very efficient in terms of running-time. But these have a size exponential in the number of variables and are not especially readable.
I am afraid there is no free lunch. Your current solution is excellent.
If you are after efficiency (i.e. avoiding the need to evaluate all expressions sequentially), then you can think of the truth table approach, but in the following way:
declare an array of version numbers, with 2^n entries;
use the code just like you wrote to initialize all table entries; to achieve that, enumerate all integers in [0, 2^n) and use their binary representation;
now for a query, form an integer index from the n input booleans and lookup the array.
Using the answer by Olevv, the table would be [2, 4, 3, 1]. A lookup would be like (false, true) => T[01b] = 4.
What matters is that the original set of expressions is still there in the code, for human reading. You can use it in an initialization function that will fill the array at run-time, and you can also use it to hard-code the table (and leave the code in comments; even better, leave the code that generates the hard-coded table).
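A minimal sketch of that initialization, assuming the two parameters and the rules from the question; the names TruthTable, versionByRules and version are hypothetical, and param1 is taken as the high bit of the index.

```java
// Hypothetical class: the readable rules stay in the code and are used only to fill the table.
class TruthTable {
    static final int N = 2;                     // number of boolean parameters
    static final int[] TABLE = new int[1 << N]; // 2^n entries

    // The original, human-readable conditions from the question.
    static int versionByRules(boolean param1, boolean param2) {
        if (param1 && param2) return 1;
        else if (!param1 && !param2) return 2;
        else if (!param2) return 3; // only param1 is true
        else return 4;              // only param2 is true
    }

    static {
        // Enumerate all integers in [0, 2^n) and decode their bits into booleans.
        for (int i = 0; i < TABLE.length; i++) {
            boolean param1 = (i & 0b10) != 0; // high bit
            boolean param2 = (i & 0b01) != 0; // low bit
            TABLE[i] = versionByRules(param1, param2);
        }
    }

    // Query: build the index from the booleans and look it up.
    static int version(boolean param1, boolean param2) {
        int index = (param1 ? 0b10 : 0) | (param2 ? 0b01 : 0);
        return TABLE[index];
    }
}
```

With these rules the static initializer fills TABLE with [2, 4, 3, 1], which matches the table quoted above.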
Your combination of parameters is nothing more than a binary number (like 01100), where a 0 indicates false and a 1 indicates true.
So your version can be easily calculated by using all the combinations of ones and zeroes. Possible combinations with 2 input parameters are:
11 -> both are true
10 -> first is true, second is false
01 -> first is false, second is true
00 -> both are false
So with this knowledge I've come up with a quite scalable solution using a "bit mask" (nothing more than a number) and "bit operations":
public static int getVersion(boolean... params) {
int length = params.length;
int mask = (1 << length) - 1;
for(int i = 0; i < length; i++) {
if(!params[i]) {
mask &= ~(1 << length - i - 1);
}
}
return mask + 1;
}
The most interesting line is probably this:
mask &= ~(1 << length - i - 1);
It does many things at once, so I will split it up. The part length - i - 1 calculates the position of the "bit" inside the bit mask, counted from the right (0-based, like in arrays).
The next part, 1 << (length - i - 1), shifts the number 1 that many positions to the left. So let's say we want the third position: the operation 1 << 2 (index 2 is the third position) yields the binary number 100.
The ~ sign is a bitwise inverse, so all the bits are inverted: every 0 is turned into a 1 and every 1 into a 0. With the previous example, the inverse of 100 would be 011.
The last part, mask &= n, is the same as mask = mask & n, where n is the previously computed value 011. This is nothing more than a bitwise AND, so only the bits that are set in both mask and n are kept, whereas all others are discarded.
All in all, this single line does nothing more than clear the "bit" at a given position of the mask if the input parameter is false.
If the version numbers are not sequential from 1 to 4, then a version lookup table may help you.
The whole code would need just a single adjustment in the last line:
return VERSIONS[mask];
Where your VERSIONS array consists of all the versions in order, but reversed. (index 0 of VERSIONS is where both parameters are false)
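Put together, the lookup variant could look like the sketch below. The table contents are an assumption based on the version numbers in the question, the class name MaskLookup is hypothetical, and the mask is built here by setting bits for true parameters rather than clearing bits for false ones, which yields the same value.

```java
// Hypothetical class combining the bit mask with a lookup table.
class MaskLookup {
    // Indexed by the mask; the first parameter is the highest bit:
    // 00 -> 2, 01 -> 4, 10 -> 3, 11 -> 1 (assumed from the question's requirements).
    static final int[] VERSIONS = {2, 4, 3, 1};

    // Builds the mask from left to right: each parameter becomes one bit.
    static int toMask(boolean... params) {
        int mask = 0;
        for (boolean p : params) {
            mask = (mask << 1) | (p ? 1 : 0);
        }
        return mask;
    }

    static int getVersion(boolean... params) {
        return VERSIONS[toMask(params)];
    }
}
```

For example, getVersion(true, false) computes the mask 10 (decimal 2) and returns VERSIONS[2], which is 3.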
I would have just gone with:
if (param1) {
if (param2) {
} else {
}
} else {
if (param2) {
} else {
}
}
Kind of repetitive, but each condition is evaluated only once, and you can easily find the code that executes for any particular combination. Adding a 3rd parameter will, of course, double the code. But if there are any invalid combinations, you can leave those out which shortens the code. Or, if you want to throw an exception for them, it becomes fairly easy to see which combination you have missed. When the IF's become too long, you can bring the actual code out in methods:
if (param1) {
if (param2) {
method_12();
} else {
method_1();
}
} else {
if (param2) {
method_2();
} else {
method_none();
}
}
Thus your whole switching logic takes up a function of itself and the actual code for any combination is in another method. When you need to work with the code for a particular combination, you just look up the appropriate method. The big IF maze is then rarely looked at, and when it is, it contains only the IFs themselves and nothing else potentially distracting.
In logical expressions, the remaining part is skipped if it is unnecessary:
boolean b = false && checkSomething(something);
// checkSomething() doesn't get called
What is a good way to achieve the same with arithmetic expressions?
int i = 0 * calculateSomething(something);
It is possible to add ifs before the *. But is there a more elegant way to solve this problem, without adding much to the expression, so that the expression itself stays as close to the original as possible?
Why i do not want to use ifs?
from
return calculateA() * calculateB()
it'll become bulky and unclear
int result;
int a = calculateA();
if (a != 0) {
    result = a * calculateB();
} else {
    result = 0;
}
return result;
8 lines of code instead of 1,
those expressions might be more complex than a*b
those expressions represent business logic so i want to keep them
clear and easily readable
there might be whole bunch of them
Why do i bother with this at all?
Because the calculation methods might be expensive
they use values from other places, where searches and sorts are happening
lots of those expressions can be executed at once (after a user event, and the user should see the result "instantly")
P( *0 in expression ) >0.5
&& and || are called short-circuit operators because they do not evaluate their right-hand operand when the JVM can already determine the value of the whole expression from the left-hand one. For example, the JVM does not have to evaluate the second part of the following expression to tell that it evaluates to true:
6 == (2 + 4) || 8 == 9
The JVM does not have to evaluate all of the following expression either to tell it evaluates to false:
9 == 8 && 7 == 7
The multiplication operator (*) is not a short-circuit operator. And so, it does not behave that way. You can do this as you mentioned using if statements. There is no predefined way to do this.
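One way to keep the if out of the business expressions is to hide it in a small helper. The sketch below is only an illustration: the class name ShortCircuit and the method times are hypothetical, and the right-hand operand is passed as an IntSupplier so that it runs only when needed.

```java
import java.util.function.IntSupplier;

// Hypothetical helper that emulates a short-circuiting multiplication.
class ShortCircuit {
    /** Returns a * b, but evaluates b only if a is non-zero. */
    static int times(int a, IntSupplier b) {
        return a == 0 ? 0 : a * b.getAsInt();
    }
}
```

A call would then read something like ShortCircuit.times(calculateA(), () -> calculateB()), which stays on one line at the cost of wrapping the second operand in a lambda.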
You can create a structure that uses lambdas to evaluate its arguments lazily:
import java.util.function.IntSupplier;

class LazyMul implements IntSupplier {
    private final IntSupplier[] args;

    private LazyMul(IntSupplier[] args) {
        // argument checking omitted for brevity :)
        this.args = args;
    }

    public static LazyMul of(IntSupplier... args) {
        return new LazyMul(args);
    }

    @Override
    public int getAsInt() {
        int res = 1;
        for (IntSupplier arg : args) {
            res *= arg.getAsInt();
            if (res == 0)
                break; // a zero factor makes the product zero; skip the remaining suppliers
        }
        return res;
    }
}
Of course this is even longer but using it is as simple as LazyMul.of(this::calculateA, this::calculateB), so if you use it several times, it's better than having an if every time around.
Unfortunately with complicated (particularly nested) expressions readability suffers, but these are the limitations of Java as a language.
I am just practicing lambdas in Java 8. My problem is as follows:
Sum all the digits of an integer until the result is less than 10 (i.e. a single digit is left), then check whether it is 1.
Sample Input 1
100
Sample Output 1
1 // true because its one
Sample Input 2
55
Sample Output 2
1 ie 5+5 = 10 then 1+0 = 1 so true
I wrote a code
System.out.println(Arrays.asList( String.valueOf(number).split("") ).stream()
.map(Integer::valueOf)
.mapToInt(i->i)
.sum() == 1);
It works for input 1 (100) but not for input 2 (55). I understand why: in the second case the output is 10, because the summation is not repeated.
So how can I make this lambda expression recursive so that it also works in the second case? I could put the lambda expression in a method and call it repeatedly until the return value is < 10, but I was wondering whether there is an approach using only lambdas.
Thanks
If you want a pure lambda solution, you should forget about making it recursive, as there is absolutely no reason to implement an iterative process as a recursion:
Stream.iterate(String.valueOf(number),
n -> String.valueOf(n.codePoints().map(Character::getNumericValue).sum()))
.filter(s -> s.length()==1)
.findFirst().ifPresent(System.out::println);
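The same iterative idea can also be kept in ints and wrapped in a method, which makes it easier to test. This is a sketch under the assumption that the input is non-negative; the names DigitSum, sumDigits and reduce are hypothetical.

```java
import java.util.stream.IntStream;

// Hypothetical helper: repeatedly replaces a number by the sum of its decimal digits.
class DigitSum {
    // One step: sum of the decimal digits of n (assumes n >= 0).
    static int sumDigits(int n) {
        return String.valueOf(n).chars().map(Character::getNumericValue).sum();
    }

    // Iterates n, sumDigits(n), sumDigits(sumDigits(n)), ... and stops at the first single digit.
    static int reduce(int number) {
        return IntStream.iterate(number, DigitSum::sumDigits)
                .filter(n -> n < 10)
                .findFirst()
                .getAsInt();
    }
}
```

The check from the question then becomes DigitSum.reduce(number) == 1; the stream is infinite, but findFirst() stops it as soon as a single digit appears.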
Making lambdas recursive in Java is not easy because of the "variable may be uninitialized" error, but it can be done. Here is a link to an answer describing one way of doing it.
When applied to your task, this can be done as follows:
// This comes from the answer linked above
class Recursive<I> {
public I func;
}
public static void main (String[] args) throws java.lang.Exception {
Recursive<Function<Integer,Integer>> sumDigits = new Recursive<>();
sumDigits.func = (Integer number) -> {
int s = Arrays.asList( String.valueOf(number).split("") )
.stream()
.map(Integer::valueOf)
.mapToInt(i->i)
.sum();
return s < 10 ? s : sumDigits.func.apply(s);
};
System.out.println(sumDigits.func.apply(100) == 1);
System.out.println(sumDigits.func.apply(101) == 1);
System.out.println(sumDigits.func.apply(55) == 1);
System.out.println(sumDigits.func.apply(56) == 1);
}
I took your code, wrapped it in { ... }s, and added a recursive invocation on the return line.
I randomly generate characters matching [0-9*]. The symbol '*' is the end-of-line symbol: when it is generated, I jump to the next line and fill it until '*' is generated again, and so on. But in some cases the first generated symbol is '*', and I jump to the next line immediately. For example:
116165464*
56465*
*
654*
64*
*
14*
and so on
...
..
.
As you can see, an end-of-line symbol alone on a line, as in line 3, is not useful, so I want to avoid it. How can I generate the numbers so that lines like 3 and 6 in my example are never produced? In other words, every generated line must contain at least one digit.
(Assume that I will delete all '*' symbols later; lines like 3 and 6 in my example would then be just empty.)
My code looks like this (it generates the symbols; c is of type char):
for (int i = 1; i < max; i++) {
    if (i == max - 1)
        c = '*'; // force an end-of-line symbol at the last position
    else
        c = numbers.charAt(rnd.nextInt(numbers.length()));
    listChar.add(c);
}
Thanks
I think I solved my problem like this:
I added an additional char variable, temp,
and stored my previous symbol in it.
Then I check this condition:
temp = c;
c = generate symbol;
...
if (temp == c)
continue;
otherwise
add(c)
(This is only pseudocode, to give the idea.)
One of my results:
123355666778999
98631
112339
7
8
88877431
169
99988765544443211
112444456788999
981
1345
98876655543211
334667899
85431
34569
876521
1112334556678
88764333211
With a small change to your code, what you want can be achieved.
I'm assuming that numbers = "0123456789*", based on your logic. If you want to achieve a non zero
// Next line will generate one number from 0 to 9 as we are ignoring last character.
listChar.add(numbers.charAt(rnd.nextInt(numbers.length()-1)));
for(int i = 2;i<max;i++){
c = numbers.charAt(rnd.nextInt(numbers.length()));
listChar.add(c);
if(c == '*')
break;
}
System.out.println(listChar);
The code below ensures that you avoid adding the asterisk.
// Next line will generate one number from 0 to 9 as we are ignoring last character.
listChar.add(numbers.charAt(rnd.nextInt(numbers.length()-1)));
for(int i = 2;i<max;i++){
c = numbers.charAt(rnd.nextInt(numbers.length()));
if(c == '*')
{
break;
}
else
{
listChar.add(c);
}
}
System.out.println(listChar);
However, there may well be a better algorithm to generate what you want, but this will work.
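public void genLine() {
    // The first character must be a digit, so draw from 0-9 only.
    int random = (int) (Math.random() * 10);
    System.out.print(random);
    while (random != 10) {
        // 0-9 are digits; the extra value 10 stands for the end-of-line '*'.
        random = (int) (Math.random() * 11);
        if (random == 10) System.out.print('*');
        else System.out.print(random);
    }
}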
This will generate one line for you. The first character is a special case because it cannot be a '*' so I just hard coded it. In every other case generate a random number with one more possibility. If that extra possibility is chosen, print '*'.
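A variant that returns the line instead of printing it, and that caps the line length, might look like this sketch. The names LineGenerator and maxDigits are hypothetical; java.util.Random is used so that a seed can make runs repeatable, and maxDigits is assumed to be at least 1.

```java
import java.util.Random;

// Hypothetical generator: every line has at least one digit and ends with '*'.
class LineGenerator {
    static String genLine(Random rnd, int maxDigits) {
        StringBuilder sb = new StringBuilder();
        sb.append(rnd.nextInt(10)); // first character: always a digit 0-9
        while (sb.length() < maxDigits) {
            int r = rnd.nextInt(11); // 0-9 is a digit, 10 means "end of line"
            if (r == 10) break;
            sb.append(r);
        }
        return sb.append('*').toString();
    }
}
```

Because the first character is drawn from the digits only, a line consisting of just '*' cannot occur.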