I have successfully implemented a shunting yard algorithm in java. The algorithm itself was simple however I am having trouble with the tokenizer. Currently the algorithm works with everything I want excluding one thing. How can I tell the difference between subtraction(-) and negative (-)
such as 4-3 is subtraction
but -4+3 is negative
I now know how to find out when it should be a negative and when it should be a minus, but where in the algorithm should it be placed because if you use it like a function it wont always work for example
3 + 4 * 2 / -( 1 − 5 ) ^ 2 ^ 3
when 1-5 becomes -4 it will become 4 before it gets squared and cubed
just like
3 + 4 * 2 / cos( 1 − 5 ) ^ 2 ^ 3 , you would take the cosine before squaring and cubing
but in real math you wouldn’t with a - because what your really saying is
3 + 4 * 2 / -(( 1 − 5 ) ^ 2 ^ 3) in order to have the right value
It sounds like you're doing a lex-then-parse style parser, where you're going to need a simple state machine in the lexer in order to get separate tokens for unary and binary minus. (In a PEG parser, this isn't something you have to worry about.)
In JavaCC, you would have a DEFAULT state, where you would consider the - character to be UNARY_MINUS. When you tokenized the end of a primary expression (either a closing paren, or an integer, based on the examples you gave), then you would switch to the INFIX state where - would be considered to be INFIX_MINUS. Once you encountered any infix operator, you would return to the DEFAULT state.
If you're rolling your own, it might be a bit simpler than that. Look at this Python code for a clever way of doing it. Basically, when you encounter a -, you just check to see if the previous token was an infix operator. That example uses the string "-u" to represent the unary minus token, which is convenient for an informal tokenization. Best I can tell, the Python example does fail to handle case where a - follows an open paren, or comes at the beginning of the input. Those should be considered unary as well.
In order for unary minus to be handled correctly in the shunting-yard algorithm itself, it needs to have higher precedence than any of the infix operators, and it needs to marked as right-associative. (Make sure you handle right-associativity. You may have left it out since the rest of your operators are left-associative.) This is clear enough in the Python code (although I would use some kind of struct rather than two separate maps).
When it comes time to evaluate, you will need to handle unary operators a little differently, since you only need to pop one number off the stack, rather than two. Depending on what your implementation looks like, it may be easier to just go through the list and replace every occurrence of "-u" with [-1, "*"].
If you can follow Python at all, you should be able to see everything I'm talking about in the example I linked to. I find the code to be a bit easier to read than the C version that someone else mentioned. Also, if you're curious, I did a little write-up a while back about using shunting-yard in Ruby, but I handled unary operators as a separate nonterminal, so they are not shown.
The answers to this question might be helpful.
In particular, one of those answers references a solution in C that handles unary minus.
Basically, you have to recognize a unary minus based on the appearance of the minus sign in positions where a binary operator can't be, and make a different token for it, as it has different precedence.
Dijkstra's original paper doesn't too clearly explain how he dealt with this, but the unary minus was listed as a separate operator.
This isn't in Java, but here is a library I wrote to specifically solve this problem after searching and not finding any clear answers.
This does all you want and more:
https://marginalhacks.com/Hacks/libExpr.rb/
It is a ruby library (as well as a testbench to check it) that runs a modified shunting yard algorithm that also supports unary ('-a') and ternary ('a?b:c') ops. It also does RPN, Prefix and AST (abstract syntax trees) - your choice, and can evaluate the expression, including the ability to yield to a block (a lambda of sorts) that can handle any variable evaluation. Only AST does the full set of operations, including the ability to handle short-circuit operations (such as '||' and '?:' and so on), but RPN does support unary. It also has a flexible precedence model that includes presets for precedence as done by C expressions or by Ruby expressions (not the same). The testbench itself is interesting as it can create random expressions which it can then eval() and also run through libExpr to compare results.
It's fairly documented/commented, so it shouldn't be too hard to convert the ideas to Java or some other language.
The basic idea as far as unary operators is that you can recognize them based on the previous token. If the previous token is either an operator or a left-paren, then the "unary-possible" operators (+ and -) are just unary and can be pushed with only one operand. It's important that your RPN stack distinguishes between the unary operator and the binary operator so it knows what to do on evaluation.
In your lexer, you can implement this pseudo-logic:
if (symbol == '-') {
if (previousToken is a number
OR previousToken is an identifier
OR previousToken is a function) {
currentToken = SUBTRACT;
} else {
currentToken = NEGATION;
}
}
You can set up negation to have a precedence higher than multiply and divide, but lower than exponentiation. You can also set it up to be right associative (just like '^').
Then you just need to integrate the precedence and associativity into the algorithm as described on Wikipedia's page.
If the token is an operator, o1, then: while there is an operator
token, o2, at the top of the stack, and either o1 is left-associative
and its precedence is less than or equal to that of o2, or o1 has
precedence less than that of o2, pop o2 off the stack, onto the output
queue; push o1 onto the stack.
I ended up implementing this corresponding code:
} else if (nextToken instanceof Operator) {
final Operator o1 = (Operator) nextToken;
while (!stack.isEmpty() && stack.peek() instanceof Operator) {
final Operator o2 = (Operator) stack.peek();
if ((o1.associativity == Associativity.LEFT && o1.precedence <= o2.precedence)
|| (o1.associativity == Associativity.RIGHT && o1.precedence < o2.precedence)) {
popStackTopToOutput();
} else {
break;
}
}
stack.push(nextToken);
}
Austin Taylor is quite right that you only need to pop off one number for a unary operator:
if (token is operator negate) {
operand = pop;
push operand * -1;
}
Example project:
https://github.com/Digipom/Calculator-for-Android/
Further reading:
http://en.wikipedia.org/wiki/Shunting-yard_algorithm
http://sankuru.biz/blog/1-parsing-object-oriented-expressions-with-dijkstras-shunting-yard-algorithm
I know it's an old post, but may be someone will find it useful .
I implemented this algorithm before, starting by toknizer using StreamTokenizer class
and it works fine. In StreamTokenizer in Java, there are some character with specific meaning. For example: ( is an operator, sin is a word,...
For your question, There is a method called "streamToknizer.ordinaryChar(..)" which it specifies that the character argument is "ordinary" in this tokenizer. It removes any special significance the character has as a comment character, word component, string delimiter, white space, or number character. Source here
So you can define - as ordinary character which means, it won't be considered as a sign for number.For example, if you have expression 2-3 , You will have [2,-,3], but if you didn't specify it as ordinary, so it will be [2,-3]
Related
One of my midterm review questions asks to parse this tree in different ways - pre/postfix etc. It asks these two ways as well though: In "Infix, Java precedence rules" and in "Infix, left-to-right precedence"
What is the difference between Java precedence rule and plain left-to-right infix rule? I thought if it was as Java precedence, something like "newline" may be needed like the actual java code but I really don't see what's really asked here. Thanks for your help in advance
Another question. How would you regard d and e nodes?
If it was postfix, (d e) f h * - would be appropriate for that portion of tree?
I think left-to-right precedence simply means that you just apply all the infix operators from left to right, so that
2 * 3 + 4 * 5
is interpreted as
((2 * 3) + 4) * 5 = 50
In Java and every other programming language I know of except APL, however, * is given higher precedence than + or -, which means the expression is interpreted as
(2 * 3) + (4 * 5) = 26
(Java has a lot more operators, so the order of precedence is pretty complicated. But if you're only going to see +, -, *, and /, all you need to know is that * and / have higher precedence; and that for operators with the same precedence, they're evaluated left to right.)
I'm guessing that the assignment is asking you how the tree would be represented using the two different precedence rules. Of course, you could put parentheses around everything, which means the precedence rules wouldn't apply at all:
(foo (a, (b + c) * ((d ? e) - (f * h)), (j * k)) - g
[The ? is there because there seems to be a box missing from the diagram.] So you're probably supposed to write it in a way without unnecessary parentheses, which means you need to know the precedence rules.
To answer your last question about d and e: you should ask your instructor, because I'm guessing it means "misprint". Unless they've come up with some new kinds of syntax tree diagrams since I studied this, it looks like a box is missing.
I needed some help with creating custom trees given an arithmetic expression. Say, for example, you input this arithmetic expression:
(5+2)*7
The result tree should look like:
*
/ \
+ 7
/ \
5 2
I have some custom classes to represent the different types of nodes, i.e. PlusOp, LeafInt, etc. I don't need to evaluate the expression, just create the tree, so I can perform other functions on it later.
Additionally, the negative operator '-' can only have one child, and to represent '5-2', you must input it as 5 + (-2).
Some validation on the expression would be required to ensure each type of operator has the correct the no. of arguments/children, each opening bracket is accompanied by a closing bracket.
Also, I should probably mention my friend has already written code which converts the input string into a stack of tokens, if that's going to be helpful for this.
I'd appreciate any help at all. Thanks :)
(I read that you can write a grammar and use antlr/JavaCC, etc. to create the parse tree, but I'm not familiar with these tools or with writing grammars, so if that's your solution, I'd be grateful if you could provide some helpful tutorials/links for them.)
Assuming this is some kind of homework and you want to do it yourself..
I did this once, you need a stack
So what you do for the example is:
parse what to do? Stack looks like
( push it onto the stack (
5 push 5 (, 5
+ push + (, 5, +
2 push 2 (, 5, +, 2
) evaluate until ( 7
* push * 7, *
7 push 7 +7, *, 7
eof evaluate until top 49
The symbols like "5" or "+" can just be stored as strings or simple objects, or you could store the + as a +() object without setting the values and set them when you are evaluating.
I assume this also requires an order of precedence, so I'll describe how that works.
in the case of: 5 + 2 * 7
you have to push 5 push + push 2 next op is higher precedence so you push it as well, then push 7. When you encounter either a ) or the end of file or an operator with lower or equal precedence you start calculating the stack to the previous ( or the beginning of the file.
Because your stack now contains 5 + 2 * 7, when you evaluate it you pop the 2 * 7 first and push the resulting *(2,7) node onto the stack, then once more you evaluate the top three things on the stack (5 + *node) so the tree comes out correct.
If it was ordered the other way: 5 * 2 + 7, you would push until you got to a stack with "5 * 2" then you would hit the lower precedence + which means evaluate what you've got now. You'd evaluate the 5 * 2 into a *node and push it, then you'd continue by pushing the + and 3 so you had *node + 7, at which point you'd evaluate that.
This means you have a "highest current precedence" variable that is storing a 1 when you push a +/-, a 2 when you push a * or / and a 3 for "^". This way you can just test the variable to see if your next operator's precedence is < = your current precedence.
if ")" is considered priority 4 you can treat it as other operators except that it removes the matching "(", a lower priority would not.
I wanted to respond to Bill K.'s answer, but I lack the reputation to add a comment there (that's really where this answer belongs). You can think of this as a addendum to Bill K.'s answer, because his was a little incomplete. The missing consideration is operator associativity; namely, how to parse expressions like:
49 / 7 / 7
Depending on whether division is left or right associative, the answer is:
49 / (7 / 7) => 49 / 1 => 49
or
(49 / 7) / 7 => 7 / 7 => 1
Typically, division and subtraction are considered to be left associative (i.e. case two, above), while exponentiation is right associative. Thus, when you run into a series of operators with equal precedence, you want to parse them in order if they are left associative or in reverse order if right associative. This just determines whether you are pushing or popping to the stack, so it doesn't overcomplicate the given algorithm, it just adds cases for when successive operators are of equal precedence (i.e. evaluate stack if left associative, push onto stack if right associative).
The "Five minute introduction to ANTLR" includes an arithmetic grammar example. It's worth checking out, especially since antlr is open source (BSD license).
Several options for you:
Re-use an existing expression parser. That would work if you are flexible on syntax and semantics. A good one that I recommend is the unified expression language built into Java (initially for use in JSP and JSF files).
Write your own parser from scratch. There is a well-defined way to write a parser that takes into account operator precedence, etc. Describing exactly how that's done is outside the scope of this answer. If you go this route, find yourself a good book on compiler design. Language parsing theory is going to be covered in the first few chapters. Typically, expression parsing is one of the examples.
Use JavaCC or ANTLR to generate lexer and parser. I prefer JavaCC, but to each their own. Just google "javacc samples" or "antlr samples". You will find plenty.
Between 2 and 3, I highly recommend 3 even if you have to learn new technology. There is a reason that parser generators have been created.
Also note that creating a parser that can handle malformed input (not just fail with parse exception) is significantly more complicated that writing a parser that only accepts valid input. You basically have to write a grammar that spells out the various common syntax errors.
Update: Here is an example of an expression language parser that I wrote using JavaCC. The syntax is loosely based on the unified expression language. It should give you a pretty good idea of what you are up against.
Contents of org.eclipse.sapphire/plugins/org.eclipse.sapphire.modeling/src/org/eclipse/sapphire/modeling/el/parser/internal/ExpressionLanguageParser.jj
the given expression (5+2)*7 we can take as infix
Infix : (5+2)*7
Prefix : *+527
from the above we know the preorder and inorder taversal of tree ... and we can easily construct tree from this.
Thanks,
Firstly I learnt that &, |, ^ are the bitwise operators, and now somebody mentioned them as logical operators with &&, ||, I am completely confused - the same operator has two names? There are already logical operators &&, ||, then why use &, |, ^?
The Java operators &, | and ^ are EITHER bitwise operators OR logical operators ... depending on the types of the operands. If the operands are integers, the operators are bitwise. If they are booleans, then the operators are logical.
And this is not just me saying this. The JLS describes these operators this way too; see JLS 15.22.
(This is just like + meaning EITHER addition OR string concatenation ... depending on the types of the operands. Or just like a "rose" meaning either a flower or a shower attachment. Or "cat" meaning either a furry animal or a UNIX command. Words mean different things in different contexts. And this is true for the symbols used in programming languages too.)
There are already logical operators &&, ||, why use &, |, ^?
In the case of the first two, it is because the operators have different semantics with regards to when / whether the operands get evaluated. The two different semantics are needed in different situations; e.g.
boolean res = str != null && str.isEmpty();
versus
boolean res = foo() & bar(); // ... if I >>need<< to call both methods.
The ^ operator has no short-circuit equivalent because it simply doesn't make sense to have one.
Having a language reference is one thing, interpreting it correctly is another.
We need to interpret things correctly.
Even if Java documented that & is both bitwise and logical, we could make an argument that & really didn't lost its logical-operator-ness mojo since time immemorial, since C. That is, & is first and foremost, an inherently logical operator(albeit a non-short-circuited one at that)
& parses lexically+logically as logical operation.
To prove the point, both of these lines behaves the same, eversince C and upto now(Java, C#, PHP, etc)
if (a == 1 && b)
if (a == 1 & b)
That is, the compiler will interpret those as these:
if ( (a == 1) && (b) )
if ( (a == 1) & (b) )
And even if both variables a and b are both integers. This...
if (a == 1 & b)
... will still be interpereted as:
if ( (a == 1) & (b) )
Hence, this will yield a compilation error on languages which doesn't facilitate integer/boolean duality, e.g. Java and C#:
if (a == 1 & b)
In fact, on the compilation error above, we could even make an argument that & didn't lost its logical(non-short-circuit) operation mojo, and we can conclude that Java continues the tradition of C making the & still a logical operation. Consequently, we could say it's the other way around, i.e. the & can be repurposed as bitwise operation (by applying parenthesis):
if ( a == (1 & b) )
So there we are, in another parallel universe, someone could ask, how to make the & expression become a bitmask operation.
How to make the following compile, I read in JLS that & is a bitwise
operation. Both a and b are integers, but it eludes me why the
following bitwise operation is a compilation error in Java:
if (a == 1 & b)
Or this kind of question:
Why the following didn't compile, I read in JLS that & is a bitwise
operation when both its operands are integers. Both a and b are
integers, but it eludes me why the following bitwise operation is a
compilation error in Java:
if (a == 1 & b)
In fact, I would not be surprised if there's already an existing stackoverflow questions similar to above questions that asked how to do that masking idiom in Java.
To make that logical operation interpretation by the language become bitwise, we have to do this (on all languages, C, Java, C#, PHP, etc):
if ( a == (1 & b) )
So to answer the question, it's not because JLS defined things such way, it's because Java(and other languages inspired by C)'s & operator is for all intents and purposes is still a logical operator, it retained C's syntax and semantics. It's the way it is since C, since time immemorial, since before I was even born.
Things just don't happen by chance, JLS 15.22 didn't happen by chance, there's a deep history around it.
In another parallel universe, where && was not introduced to the language, we will still be using & for logical operations, one might even ask a question today:
Is it true, we can use the logical operator & for bitwise operation?
& doesn't care if its operands are integers or not, booleans or not. It's still a logical operator, a non-short-circuited one. And in fact, the only way to force it to become a bitwise operator in Java(and even in C) is to put parenthesis around it. i.e.
if ( a == (1 & b) )
Think about it, if && was not introduced to C language(and any language who copied its syntax and semantics), anyone could be asking now:
how to use & for bitwise operations?
To sum it up, first and foremost Java & is inherently a logical operator(a non-short-circuited one), it doesn't care about its operands, it will do its business as usual(applying logical operation) even if both operands are integers(e.g. masking idiom). You can only force it to become bitwise operation by applying parenthesis. Java continues the C tradition
If Java's & really is a bitwise operation if its operands(integer 1 and integer variable b on example code below) are both integers, this should compile:
int b = 7;
int a = 1;
if (a == 1 & b) ...
They(& and |) were used for two purposes long time ago, logical operator and bitwise operator. If you'll check out the neonatal C (the language Java was patterned after), & and | were used as logical operator.
But since disambiguating the bitwise operations from logical operations in the same statement is very confusing, it prompted Dennis Ritchie to create a separate operator(&& and ||) for logical operator.
Check the Neonatal C section here: http://cm.bell-labs.com/who/dmr/chist.html
You can still use the bitwise operators as logical operators, its retained operator precedence is the evidence of that. Read out the history of bitwise operator's past life as logical operator on Neonatal C
Regarding the evidence, I made a blog post on comparing the logical operator and bitwise operator. It will be self-evident that the so called bitwise operators are still logical operators if you try contrasting them in an actual program: http://www.anicehumble.com/2012/05/operator-precedence-101.html
I also answered a question related to your question on What is the point of the logical operators in C?
So it's true, bitwise operators are logical operators too, albeit non-short-circuited version of short-circuited logical operators.
Regarding
There are already logical operators &&, ||, then why use &, |, ^?
The XOR can be easily answered, it's like a radio button, only one is allowed, code below returns false. Apology for the contrived code example below, the belief that drinking both beer and milk at the same time is bad was debunked already ;-)
String areYouDiabetic = "Yes";
String areYouEatingCarbohydrate = "Yes";
boolean isAllowed = areYouDiabetic == "Yes" ^ areYouEatingCarbohydrate == "Yes";
System.out.println("Allowed: " + isAllowed);
There's no short-circuit equivalent to XOR bitwise operator, as both sides of the expression are needed be evaluated.
Regarding why the need to use & and | bitwise operators as logical operators, frankly you'll be hard-pressed to find a need to use bitwise operators(a.k.a. non-short-circuit logical operators) as logical operators. A logical operation can be non-short-circuited (by using the bitwise operator, aka non-short-circuited logical operator) if you want to achieve some side effect and make your code compact(subjective), case in point:
while ( !password.isValid() & (attempts++ < MAX_ATTEMPTS) ) {
// re-prompt
}
The above can re-written as the following(removing the parenthesis), and still has exactly the same interpretation as the preceding code.
while ( !password.isValid() & attempts++ < MAX_ATTEMPTS ) {
// re-prompt
}
Removing the parenthesis and yet it still yields the same interpretation as the parenthesized one, can make the logical operator vestige of & more apparent. To run the risk of sounding superfluous, but I have to emphasize that the unparenthesized expression is not interpreted as this:
while ( ( !password.isValid() & attempts++ ) < MAX_ATTEMPTS ) {
// re-prompt
}
To sum it up, using & operator (more popularly known as bitwise operator only, but is actually both bitwise and logical(non-short-circuited)) for non-short-circuit logical operation to achieve side effect is clever(subjective), but is not encouraged, it's just one line of savings effect in exchange for readability.
Example sourced here: Reason for the exsistance of non-short-circuit logical operators
The Java type byte is signed which might be a problem for the bitwise operators. When negative bytes are extended to int or long, the sign bit is copied to all higher bits to keep the interpreted value. For example:
byte b1=(byte)0xFB; // that is -5
byte b2=2;
int i = b1 | b2<<8;
System.out.println((int)b1); // This prints -5
System.out.println(i); // This prints -5
Reason: (int)b1 is internally 0xFFFB and b2<<8 is 0x0200 so i will be 0xFFFB
Solution:
int i = (b1 & 0xFF) | (b2<<8 & 0xFF00);
System.out.println(i); // This prints 763 which is 0x2FB
Are there any popular frameworks for MBT in Java? Is anyone of you using MBT technique?
Groovy with ModelJUnit
From the site:
ModelJUnit is a Java library that extends JUnit to support model-based testing.
One way of approaching model based testing is to build a known-good implementation and compare against that. One of the best strategies for doing this is to use a higher-level language so that a short and concise implementation is easy to read and reason about.
I've had great success using Haskell and QuickCheck to do model-based testing of a query evaluation. I represented the query language as a Haskell data type, and then simply made this an instance of Arbitrary. Once that was done, QuickCheck could generate infinite numbers of queries, which I could then evaluate against both the server under test and my model implementation.
Scala and Scala-Check might allow you to do the same sort of thing on the JVM.
As a quick example of how this might be used, here's a quick and dirty example:
import Test.QuickCheck
import Control.Monad (liftM,liftM2)
-- An simple expression data type.
-- Expression is either (Add X Y, Minus X Y, Multiply X Y or just a plain value)
-- where X and Y are further expressions
data Expression = Add Expression Expression
| Minus Expression Expression
| Multiply Expression Expression
| Value Int
deriving Show
-- This is my model-based approach, high level
evaluate :: Expression -> Int
evaluate (Value x) = x
evaluate (Add lhs rhs) = evaluate lhs + evaluate rhs
evaluate (Minus lhs rhs) = evaluate lhs - evaluate rhs
evaluate (Multiply lhs rhs) = evaluate lhs * evaluate rhs
After this I have a known good implementation of a simple model. Just by inspecting the code I can be reasonable sure that this does what I think it should do. Now I can use Arbitrary to generate arbitrarily complicated expressions. The weightings below indicate the frequency (so 70 percent of the generated expressions will be Value and 10 percent will be arbitrary Add statements and so on )
instance Arbitrary Expression where
arbitrary = frequency [
(70,liftM Value arbitrary)
, (10,liftM2 Add arbitrary arbitrary)
, (10,liftM2 Minus arbitrary arbitrary)
, (10,liftM2 Multiply arbitrary arbitrary)
]
This definition gives QuickCheck the power to create me arbitrary expressions. For example:
> sample (arbitrary :: Gen Expression)
Multiply (Value 1) (Value 0)
Value 1
Add (Value (-3)) (Add (Value (-1)) (Minus (Value 3) (Value 0)))
Value (-4)
Value (-19)
Add (Value (-49)) (Value (-55))
Minus (Value 50) (Value 103)
Value 210
Value (-172)
Value (-907)
Value (-2703)
I think this is already pretty cool (but maybe I'm just weird). The real powerful bit is that now I can specify an invariant that holds over all generated values
prop_testAgainstModel :: Expression -> Bool
prop_testAgainstModel expr = evaluate expr == evaluate' expr
Where we assume that evaluate' is my super-complicated way of evaluating things that I want to verify against my model. QuickCheck will then generate lots of arbitrary expressions and try to disprove my assertion.
Here's an example run against one of my attempts to produce a more complicate evaluate function.
> quickCheck prop_testAgainstModel
*** Failed! Falsifiable (after 28 tests):
Multiply (Value (-53758)) (Value 125360)
I used this approach to test an external server written in C++. The Haskell model implementation was maybe a few 100 lines of code and I could compare this against the results of the web service with not much more code than that above.
Is there any way to interpret "normal" mathematical notation into Reverse Polish Notation(RPN)..?
eg
1) 2 + 3*4 - 1 = 234*+1-
2) 5 (4-8) = 548-
U can assume that BODMAS rule is followed and that inner brackets have to be calculated first etc.. i mean the normal maths to be applied here.. the answer should be in postfix notation..
Thanks
Yes; the shunting yard algorithm defines how to do this.
Each time you read a number, put it onto the output queue. Each time you read an operator, put it on the operator stack. These two structures form the basis of the algorithm.
So-called "normal" is rigorously called infix notation. There are also prefix and postfix notations, the latter being RPN.
The typical rearrangement of notation is done by constructing a parse tree and traversing specifically for the arrangement needed.
Here are some descriptions of how to do it: a b